How the two machines talk — 2026 — ITDT LLC
This is the working contract between the two halves of ITDT’s local video studio: a Mac “Director” that orchestrates and finishes a video, and a PC “Supervisor” that does the heavy GPU rendering. The API is deliberately small — a handful of HTTP endpoints for job control, a shared folder for moving media, and one common job record both machines understand. For the background on why the work is split this way, see the story.
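Three channels carry everything: the HTTP job API, the shared relay folder, and the common job record.
The work itself divides between the two machines like this: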
| Mac — Director | PC — Supervisor |
|---|---|
| Always-on local LLM that turns intent into a plan; voice (TTS) and music synthesis; diffusion still start-frames; lip-sync; compositing, captions, and final assembly; upscaling; the orchestrator that drives a whole run. | An HTTP job server fronting GPU scripts; diffusion video generation from a start frame; character-model (LoRA) training; staging finished clips back to the relay. |
The Supervisor exposes a tiny asynchronous job API. Because a GPU render can take many minutes,
nothing blocks: you POST a job, get an id back immediately, and poll for it. Concurrent
requests queue first-in-first-out so the single GPU is never double-booked.
Base URL on the reference setup: http://<pc-host>:8765
(the PC’s address on the local network).
| Method & path | Purpose |
|---|---|
| GET /scripts | List the runnable GPU scripts and their accepted arguments (from a server-side registry). |
| POST /run_script | Enqueue a job. Returns 202 with a job_id and queue position. Never blocks on the render. |
| GET /job/{id} | Poll one job: queued / running / complete / failed, plus its result on completion. |
| GET /queue | What is running, what is waiting, and recent history. |
| POST /cancel_job/{id} | Drop a queued job, or interrupt a running one and free the GPU. |
| POST /cancel_run | Halt an entire multi-clip run (every job sharing a run_id). |
| GET /system_stats, GET /state | Health and lifecycle checks. |
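The queueing behaviour behind these endpoints can be modelled in a few lines. The following is a minimal in-memory sketch, not the Supervisor's actual server code: the class `FifoJobQueue` and its method names are hypothetical, `queue_position` is assumed to count only the jobs waiting ahead, and a cancelled job is assumed to land in the `failed` state with an error note, since the text lists no separate cancelled state.

```python
import uuid
from collections import deque


class FifoJobQueue:
    """Toy model of a single-GPU, first-in-first-out job queue (illustrative only)."""

    def __init__(self):
        self.jobs = {}          # job_id -> job record
        self.waiting = deque()  # queued job ids, oldest first
        self.running = None     # at most one job holds the GPU

    def submit(self, script, args, run_id=None):
        """Enqueue and return at once, in the spirit of POST /run_script."""
        job_id = uuid.uuid4().hex[:12]
        self.jobs[job_id] = {"job_id": job_id, "script": script, "args": args,
                             "run_id": run_id, "state": "queued",
                             "result": None, "error": None}
        self.waiting.append(job_id)
        # Assumption: position = number of jobs waiting ahead of this one.
        return {"job_id": job_id, "queue_position": len(self.waiting) - 1}

    def start_next(self):
        """Hand the GPU to the oldest waiting job, but only if the GPU is free."""
        if self.running is None and self.waiting:
            self.running = self.waiting.popleft()
            self.jobs[self.running]["state"] = "running"
        return self.running

    def finish(self, result=None, error=None):
        """Record the outcome of the running job and free the GPU."""
        if self.running is None:
            return None
        job = self.jobs[self.running]
        job["state"] = "failed" if error else "complete"
        job["result"], job["error"] = result, error
        self.running = None
        return job

    def cancel_job(self, job_id):
        """Drop a queued job or interrupt the running one."""
        job = self.jobs[job_id]
        if job_id in self.waiting:
            self.waiting.remove(job_id)
        elif self.running == job_id:
            self.running = None  # a real server would also stop the GPU process
        else:
            return job           # already finished; nothing to cancel
        job["state"], job["error"] = "failed", "cancelled"
        return job

    def cancel_run(self, run_id):
        """Cancel every unfinished job that shares a run_id."""
        for job_id, job in list(self.jobs.items()):
            if job["run_id"] == run_id and job["state"] in ("queued", "running"):
                self.cancel_job(job_id)
```

Because `start_next` refuses to start anything while `running` is set, two jobs can never hold the GPU at the same time, which is the property the FIFO queue exists to guarantee.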
POST http://<pc-host>:8765/run_script
{
  "script": "render_base_clip",
  "args": {
    "start_frame": "host_start.png",   // staged to the relay by the Mac
    "lora": "<character-model>",       // identity model for the host
    "prompt": "<setting / backdrop only>",
    "seconds": 5,
    "seed": 42,
    "output_name": "scene01"
  },
  "run_id": "run_2026_0607",   // groups the clips of one video
  "stage_review": true,        // copy the result to the relay for review
  "run_total": 3               // how many clips this run will produce
}
-> 202 { "job_id": "a1b2c3d4e5f6", "queue_position": 0 }

GET http://<pc-host>:8765/job/a1b2c3d4e5f6
-> {
  "job_id": "a1b2c3d4e5f6",
  "state": "complete",
  "result": { "path": "scene01_00001.mp4", "sha256": "…", "wall_clock_s": 1880 },
  "error": null,
  "returncode": 0
}
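From the Mac side, the exchange above amounts to one POST followed by repeated GETs. A minimal client sketch using only the Python standard library might look like the following. The helper names `http_json` and `submit_and_wait`, the polling interval, and the timeout are assumptions made for illustration; the paths and field names are the ones shown in the example.

```python
import json
import time
import urllib.request

BASE_URL = "http://<pc-host>:8765"  # placeholder from the text; substitute the PC's address


def http_json(method, url, payload=None):
    """Send an optional JSON body and decode the JSON reply."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    req = urllib.request.Request(url, data=data, method=method,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def submit_and_wait(script, args, base_url=BASE_URL, interval_s=10.0,
                    timeout_s=3600.0, request=http_json, sleep=time.sleep, **extra):
    """POST a job, then poll GET /job/{id} until it is complete or failed.

    `extra` carries optional top-level fields such as run_id or stage_review.
    `request` and `sleep` are injectable so the loop can be tested offline.
    """
    job = request("POST", f"{base_url}/run_script",
                  {"script": script, "args": args, **extra})
    deadline = time.monotonic() + timeout_s
    while True:
        record = request("GET", f"{base_url}/job/{job['job_id']}")
        if record["state"] in ("complete", "failed"):
            return record
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job['job_id']} still {record['state']}")
        sleep(interval_s)
```

The sample render above reports a wall-clock time of 1880 seconds, so any timeout should be chosen generously; the one-hour default here is only a guess.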
Both machines write and read the same job-record shape, so the orchestrator treats a local Mac job and a remote PC job through one mental model:
{
  "job_id": "…", "label": "…", "script": "…", "args": { … },
  "state": "queued | running | complete | failed",
  "result": { … } | null,
  "error": "…" | null,
  "returncode": 0 | null,
  "created_at": "…"
}
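One way to hold this record in code is a small typed container that both local and remote jobs can be parsed into. This is a sketch under assumptions: the class name `JobRecord`, the `done` property, and the choice to ignore unknown keys are illustrative; only the field names and the four states come from the record above.

```python
from dataclasses import dataclass, field, fields
from typing import Optional

STATES = ("queued", "running", "complete", "failed")


@dataclass
class JobRecord:
    """Shared job-record shape; only job_id and state are treated as required here."""
    job_id: str
    state: str
    label: str = ""
    script: str = ""
    args: dict = field(default_factory=dict)
    result: Optional[dict] = None
    error: Optional[str] = None
    returncode: Optional[int] = None
    created_at: str = ""

    def __post_init__(self):
        if self.state not in STATES:
            raise ValueError(f"unknown job state: {self.state!r}")

    @property
    def done(self):
        """True once the job can no longer change state."""
        return self.state in ("complete", "failed")

    @classmethod
    def from_dict(cls, data):
        """Build a record from decoded JSON, ignoring keys this sketch does not model."""
        known = {f.name for f in fields(cls)}
        return cls(**{k: v for k, v in data.items() if k in known})
```

Parsing every reply through one type is what lets an orchestrator handle a Mac job and a PC job with the same code path.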
render_base_clip
This is the heart of the video API. Given a start frame (a single still of the host,
produced on the Mac) plus the host’s identity model and a short description of the setting, the PC
animates it into a few seconds of moving video on the GPU and returns the clip’s path, a checksum,
and the wall-clock time. With stage_review set, the clip is also copied into the relay
under its run_id so the person can watch it as soon as it lands and stop the run if it
is wrong. Long multi-clip runs are resumable: already-finished clips are reused rather than
re-rendered.
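Resumability can be sketched as a lookup before each render. The text does not say how finished clips are recognised, so the scheme below is an assumption: each clip is keyed by a SHA-256 hash of its arguments, so a clip re-renders only if its arguments change. The names `cache_key` and `run_clips` are hypothetical, and `render` stands in for whatever submits the job to the PC.

```python
import hashlib
import json


def cache_key(args):
    """Stable fingerprint of a clip's arguments (assumed cache scheme)."""
    canonical = json.dumps(args, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def run_clips(clips, finished, render):
    """Render only the clips not already in `finished`; reuse the rest.

    clips:    list of argument dicts, one per clip
    finished: dict mapping cache_key -> result, persisted between runs by the caller
    render:   callable that takes an argument dict and returns a result dict
    """
    results = []
    for args in clips:
        key = cache_key(args)
        if key not in finished:
            finished[key] = render(args)  # only new clips reach the GPU
        results.append(finished[key])
    return results
```

Stopping a run and starting it again with the same `finished` mapping repeats none of the completed renders.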
On the Mac side, each capability is a small command-line tool with a uniform contract: arguments
in, a single line of JSON out ({"status":"ok", …} or {"status":"error", …}),
progress to the side. The orchestrator and the local LLM call them the same way.
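A caller for this contract can be quite small. The sketch below assumes the single JSON line arrives on standard output and that "progress to the side" means standard error; the text does not specify either channel. `run_tool` and `DEMO_TOOL` are hypothetical names, and `DEMO_TOOL` is a stand-in command rather than one of the studio's real scripts.

```python
import json
import subprocess
import sys


def run_tool(argv):
    """Run a command-line tool and return its one-line JSON reply as a dict.

    Anything that breaks the contract is folded into a {"status": "error"} reply,
    so callers only ever deal with one shape.
    """
    proc = subprocess.run(argv, capture_output=True, text=True)
    lines = [ln for ln in proc.stdout.splitlines() if ln.strip()]
    if len(lines) != 1:
        return {"status": "error",
                "message": f"expected one JSON line on stdout, got {len(lines)}"}
    try:
        reply = json.loads(lines[0])
    except json.JSONDecodeError as exc:
        return {"status": "error", "message": f"invalid JSON: {exc}"}
    if not isinstance(reply, dict) or reply.get("status") not in ("ok", "error"):
        return {"status": "error", "message": "reply lacks a valid status field"}
    return reply


# Stand-in tool: writes progress to stderr and one JSON line to stdout.
DEMO_TOOL = [sys.executable, "-c",
             "import sys, json; print('50%', file=sys.stderr); "
             "print(json.dumps({'status': 'ok', 'path': 'out.png'}))"]
```

Calling `run_tool(DEMO_TOOL)` returns the decoded reply while the progress text stays out of the parsed output.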
| Script | What it does |
|---|---|
| generate_image | Diffusion still, including the identity-locked host start frame the PC animates. |
| generate_voiceover | Text-to-speech narration in the host’s locked voice. |
| generate_music | Generates a music bed from a text prompt. |
| lipsync_clip | Matches the host’s mouth to the narration audio. |
| compose_video | Composites a background, picture-in-picture host, overlays/captions, voice, and ducked music into one clip. |
| assemble_timeline | Turns a scene list into a finished film: per-scene audio, idle-fill, lip-sync, sequencing, and a single music track. |
| prepare_delivery | Formats a finished master to an exact platform/device size (e.g. App Store preview resolutions). |
| stage_to_relay | Hands a file to the shared folder for the PC, with a checksum and a journal note. |
| render_screenplay | The orchestrator: validates a plan, drives the PC for each new clip, then runs the Mac finishing steps. Resumable, with a dry-run cost estimate. |
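The hand-off performed by stage_to_relay (copy, checksum, journal note) could be approximated as follows. This is a sketch under assumptions, not the studio's script: the journal file name `journal.jsonl`, its one-JSON-object-per-line layout, and the entry fields are invented for illustration.

```python
import hashlib
import json
import shutil
import time
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large clips are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def stage_to_relay(src, relay_dir, note=""):
    """Copy `src` into the relay folder, verify the copy, and append a journal entry."""
    src, relay_dir = Path(src), Path(relay_dir)
    relay_dir.mkdir(parents=True, exist_ok=True)
    dest = relay_dir / src.name
    shutil.copy2(src, dest)
    checksum = sha256_of(dest)
    if checksum != sha256_of(src):
        return {"status": "error", "message": "checksum mismatch after copy"}
    entry = {"file": src.name, "sha256": checksum, "note": note,
             "staged_at": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime())}
    # Assumed journal format: one JSON object per line, appended in order.
    with open(relay_dir / "journal.jsonl", "a", encoding="utf-8") as journal:
        journal.write(json.dumps(entry) + "\n")
    return {"status": "ok", **entry}
```

The receiving machine can recompute the hash with `sha256_of` and compare it with the journal entry before using the file.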
The Mac stages the start frame to the shared folder with stage_to_relay.
The orchestrator POSTs render_base_clip to the PC and polls the job_id; the PC animates the frame on the GPU and stages the clip back for review.
Already-rendered clips are cached, so re-runs are cheap and the person can stop and resume at any point.
Every choice above is downstream of the hardware. The asynchronous submit-and-poll model, the first-in-first-out GPU queue, the shared-folder hand-off, and the always-on orchestrator all exist because this studio pairs an always-on, unified-memory Apple Silicon Mac with a discrete-GPU NVIDIA PC. That is the reference configuration — not a requirement.
The same contract re-targets cleanly to a different rig by redrawing only the functional split:
| If your hardware is… | …the split shifts to |
|---|---|
| A single multi-GPU workstation | One machine; the “remote” job server becomes a local queue, no relay folder needed. |
| Two GPU machines | Render is sharded across both job servers; the queue spans the pool. |
| A laptop plus a cloud GPU node | The Supervisor moves to the cloud; the relay becomes object storage; the contract is unchanged. |
| One modest machine | Heavier models swap for lighter ones; long jobs stay asynchronous so the UI never blocks. |
The endpoints, the job record, and the orchestration flow stay the same; only where each step runs changes. Claude Code can profile a given configuration and produce the matching split and API — which is the real product of this project: a video-generation API custom-fit to the hardware it runs on.
This page is a complete enough blueprint to act on. Point your own Claude Code at it, tell it what hardware you have, and it can design and build an equivalent pipeline for you — tailoring the functional split to your machines while keeping a contract shaped like this one. You would be re-deriving the implementation for your setup, not copying ours.