The Mac↔PC Video Generation API

How the two machines talk — 2026 — ITDT LLC

This is the working contract between the two halves of ITDT’s local video studio: a Mac “Director” that orchestrates and finishes a video, and a PC “Supervisor” that does the heavy GPU rendering. The API is deliberately small — a handful of HTTP endpoints for job control, a shared folder for moving media, and one common job record both machines understand. For the background on why the work is split this way, see the story.

Architecture at a glance

                       HUMAN (plain-language intent)
                                   |
                                   v
+-------------------------+      HTTP / LAN        +--------------------------+
| MAC - “Director”        | --- submit & poll ---> | PC - “Supervisor”        |
|                         |                        |                          |
| local LLM orchestrator  | <---- job status ----- | HTTP job server (8765)   |
| audio · stills · lips   |                        | diffusion video render   |
| compositing · finish    |                        | character training       |
+-----------+-------------+                        +------------+-------------+
            |                                                   |
            |        SHARED NETWORK FOLDER (“the relay”)        |
            +------------- media hand-off + journal ------------+
   (start frames Mac->PC, finished clips PC->Mac, append-only message log)

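Three channels carry everything: the HTTP job API over the LAN, the shared network folder (“the relay”) for media hand-off, and the append-only journal kept in that folder.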

Roles

Mac (Director)
  Always-on local LLM that turns intent into a plan; voice (TTS) and music synthesis; diffusion still start-frames; lip-sync; compositing, captions, and final assembly; upscaling; the orchestrator that drives a whole run.

PC (Supervisor)
  An HTTP job server fronting GPU scripts; diffusion video generation from a start frame; character-model (LoRA) training; staging finished clips back to the relay.

PC Supervisor — HTTP API

The Supervisor exposes a tiny asynchronous job API. Because a GPU render can take many minutes, nothing blocks: you POST a job, get an id back immediately, and poll that id until the job finishes. Concurrent requests queue first-in-first-out so the single GPU is never double-booked.
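The submit-and-queue behaviour described above can be sketched with a single worker thread draining a FIFO queue. This is a minimal illustration, not the Supervisor's actual code: the class name GpuJobQueue, the runner callable, and the way queue_position is counted are all assumptions made for the example.

```python
import queue
import threading
import uuid


class GpuJobQueue:
    """Hypothetical single-worker FIFO queue: submit() returns at once,
    and exactly one job runs at a time, in arrival order."""

    def __init__(self, runner):
        self._runner = runner          # callable(script, args) -> result dict
        self._pending = queue.Queue()  # queue.Queue is first-in-first-out
        self._lock = threading.Lock()
        self.jobs = {}                 # job_id -> job record
        threading.Thread(target=self._work, daemon=True).start()

    def submit(self, script, args):
        """Record the job, enqueue it, and return without waiting for it."""
        job_id = uuid.uuid4().hex[:12]
        with self._lock:
            self.jobs[job_id] = {
                "job_id": job_id, "script": script, "args": args,
                "state": "queued", "result": None, "error": None,
            }
            # Assumption: position = number of jobs still waiting ahead of this one.
            position = self._pending.qsize()
            self._pending.put(job_id)
        return {"job_id": job_id, "queue_position": position}

    def _work(self):
        """Worker loop: the only place a job ever runs, so the GPU is never shared."""
        while True:
            job_id = self._pending.get()
            job = self.jobs[job_id]
            job["state"] = "running"
            try:
                job["result"] = self._runner(job["script"], job["args"])
                job["state"] = "complete"
            except Exception as exc:  # a failed render must not kill the worker
                job["error"] = str(exc)
                job["state"] = "failed"
            finally:
                self._pending.task_done()

    def wait_idle(self):
        """Block until every submitted job has finished (handy in tests)."""
        self._pending.join()
```

An HTTP layer would call submit() from the POST /run_script handler and read self.jobs from GET /job/{id}; that wiring is left out here to keep the queueing idea visible.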

Base URL on the reference setup: http://<pc-host>:8765 (the PC’s address on the local network).

Method & path                    Purpose
GET /scripts                     List the runnable GPU scripts and their accepted arguments (from a server-side registry).
POST /run_script                 Enqueue a job. Returns 202 with a job_id and queue position. Never blocks on the render.
GET /job/{id}                    Poll one job: queued / running / complete / failed, plus its result on completion.
GET /queue                       What is running, what is waiting, and recent history.
POST /cancel_job/{id}            Drop a queued job, or interrupt a running one and free the GPU.
POST /cancel_run                 Halt an entire multi-clip run (every job sharing a run_id).
GET /system_stats, GET /state    Health and lifecycle checks.

Submitting a render

POST http://<pc-host>:8765/run_script
{
  "script": "render_base_clip",
  "args": {
    "start_frame": "host_start.png",   // staged to the relay by the Mac
    "lora":        "<character-model>",  // identity model for the host
    "prompt":      "<setting / backdrop only>",
    "seconds":     5,
    "seed":        42,
    "output_name": "scene01"
  },
  "run_id":      "run_2026_0607",       // groups the clips of one video
  "stage_review": true,                  // copy the result to the relay for review
  "run_total":    3                      // how many clips this run will produce
}

-> 202  { "job_id": "a1b2c3d4e5f6", "queue_position": 0 }
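As a rough client-side sketch of the submission above, the request can be built with Python's standard library. The function names, the Content-Type header, and the choice to treat run_id, stage_review, and run_total as optional are assumptions for illustration; the page does not state which fields the server requires.

```python
import json
import urllib.request


def build_run_script_request(base_url, script, args,
                             run_id=None, stage_review=False, run_total=None):
    """Build (but do not send) a POST /run_script request.

    Hypothetical helper: optional fields are only included when set.
    """
    payload = {"script": script, "args": args}
    if run_id is not None:
        payload["run_id"] = run_id
    if stage_review:
        payload["stage_review"] = True
    if run_total is not None:
        payload["run_total"] = run_total
    return urllib.request.Request(
        base_url.rstrip("/") + "/run_script",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumed, not documented
        method="POST",
    )


def submit(request, timeout_s=10.0):
    """Send the request and return the parsed body, e.g. job_id and queue_position."""
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        if response.status != 202:
            raise RuntimeError(f"expected 202 Accepted, got {response.status}")
        return json.loads(response.read().decode("utf-8"))
```

Splitting "build" from "send" keeps the payload easy to inspect or log before anything touches the GPU queue.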

Polling it

GET http://<pc-host>:8765/job/a1b2c3d4e5f6

-> { "job_id": "a1b2c3d4e5f6",
     "state":  "complete",
     "result": { "path": "scene01_00001.mp4",
                 "sha256": "…", "wall_clock_s": 1880 },
     "error":  null,
     "returncode": 0 }

The shared job record

Both machines write and read the same job-record shape, so the orchestrator treats a local Mac job and a remote PC job through one mental model:

{
  "job_id": "…", "label": "…", "script": "…", "args": { … },
  "state":  "queued | running | complete | failed",
  "result": { … } | null,
  "error":  "…"   | null,
  "returncode": 0 | null,
  "created_at": "…"
}
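One way to hold this record in code is a small dataclass that both sides parse into. This is a sketch under assumptions: the class name JobRecord, the finished and succeeded helpers, and the decision to tolerate missing label, script, args, and created_at fields (the polling example above omits them) are illustrative, not part of the documented contract.

```python
from dataclasses import dataclass, field
from typing import Optional

STATES = ("queued", "running", "complete", "failed")


@dataclass(frozen=True)
class JobRecord:
    """Hypothetical typed view of the shared job record."""
    job_id: str
    state: str
    label: str = ""
    script: str = ""
    args: dict = field(default_factory=dict)
    result: Optional[dict] = None
    error: Optional[str] = None
    returncode: Optional[int] = None
    created_at: Optional[str] = None

    @classmethod
    def from_dict(cls, data):
        """Parse a record, rejecting any state outside the four documented ones."""
        state = data.get("state")
        if state not in STATES:
            raise ValueError(f"unknown job state: {state!r}")
        return cls(
            job_id=data["job_id"],
            state=state,
            label=data.get("label", ""),
            script=data.get("script", ""),
            args=dict(data.get("args") or {}),
            result=data.get("result"),
            error=data.get("error"),
            returncode=data.get("returncode"),
            created_at=data.get("created_at"),
        )

    @property
    def finished(self):
        """True once the job can no longer change state."""
        return self.state in ("complete", "failed")

    @property
    def succeeded(self):
        return self.state == "complete"
```

Because local Mac jobs and remote PC jobs parse into the same type, orchestration code can branch on finished and succeeded without caring where the job ran.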

The core call: render_base_clip

This is the heart of the video API. Given a start frame (a single still of the host, produced on the Mac) plus the host’s identity model and a short description of the setting, the PC animates it into a few seconds of moving video on the GPU and returns the clip’s path, a checksum, and the wall-clock time. With stage_review set, the clip is also copied into the relay under its run_id so the person can watch it as soon as it lands and stop the run if it is wrong. Long multi-clip runs are resumable: already-finished clips are reused rather than re-rendered.
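Since the result carries a checksum, the Mac can confirm that the staged clip arrived intact before promoting it. The sketch below assumes the clip sits at <relay>/<run_id>/<result path> and that sha256 is a hex digest; both are guesses about the layout, and the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large clips are never loaded whole into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_staged_clip(relay_dir, run_id, result):
    """Return the clip's path if it exists and matches result["sha256"].

    Assumed layout: <relay_dir>/<run_id>/<result["path"]>.
    """
    clip = Path(relay_dir) / run_id / result["path"]
    if not clip.is_file():
        raise FileNotFoundError(f"clip not staged yet: {clip}")
    actual = sha256_of(clip)
    if actual != result["sha256"].lower():
        raise ValueError(f"checksum mismatch for {clip.name}")
    return clip
```

A mismatch usually means the copy over the network share was still in progress or was interrupted, so retrying after a short wait is a reasonable first response.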

Mac Director — the scripts it orchestrates

On the Mac side, each capability is a small command-line tool with a uniform contract: arguments in, a single line of JSON out ({"status":"ok", …} or {"status":"error", …}), with progress reported separately from that line. The orchestrator and the local LLM call them the same way.
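A caller for this contract could be as small as the sketch below. It assumes the JSON reply is the last non-empty line on standard output and that an error reply may carry a "message" field; neither detail is stated on this page, and ToolError, parse_tool_output, and run_tool are hypothetical names.

```python
import json
import subprocess


class ToolError(RuntimeError):
    """Raised when a tool reports status "error" or breaks the output contract."""


def parse_tool_output(stdout):
    """Extract and validate the single JSON reply line from a tool's stdout."""
    lines = [line for line in stdout.splitlines() if line.strip()]
    if not lines:
        raise ToolError("tool produced no output")
    try:
        reply = json.loads(lines[-1])  # assumption: the reply is the last line
    except json.JSONDecodeError as exc:
        raise ToolError(f"last line is not JSON: {lines[-1]!r}") from exc
    if not isinstance(reply, dict) or reply.get("status") not in ("ok", "error"):
        raise ToolError(f"reply has no valid status: {reply!r}")
    if reply["status"] == "error":
        # "message" is a hypothetical field name for the error text.
        raise ToolError(reply.get("message", "tool reported an error"))
    return reply


def run_tool(command):
    """Run one command-line tool (a list of argv strings) and return its reply."""
    completed = subprocess.run(command, capture_output=True, text=True, check=False)
    return parse_tool_output(completed.stdout)
```

Keeping the parsing separate from the subprocess call means the same validation applies whether the orchestrator or the local LLM invoked the tool.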

Script               What it does
generate_image       Diffusion still, including the identity-locked host start frame the PC animates.
generate_voiceover   Text-to-speech narration in the host’s locked voice.
generate_music       Generates a music bed from a text prompt.
lipsync_clip         Matches the host’s mouth to the narration audio.
compose_video        Composites a background, picture-in-picture host, overlays/captions, voice, and ducked music into one clip.
assemble_timeline    Turns a scene list into a finished film: per-scene audio, idle-fill, lip-sync, sequencing, and a single music track.
prepare_delivery     Formats a finished master to an exact platform/device size (e.g. App Store preview resolutions).
stage_to_relay       Hands a file to the shared folder for the PC, with a checksum and a journal note.
render_screenplay    The orchestrator: validates a plan, drives the PC for each new clip, then runs the Mac finishing steps. Resumable, with a dry-run cost estimate.
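To make the hand-off in the table concrete, here is a minimal sketch of what a staging step could do: copy the file, checksum the copy, and append one line to a journal. It is not the real stage_to_relay tool; the function name stage_file, the journal file name journal.jsonl, and the journal entry fields are assumptions.

```python
import hashlib
import json
import shutil
from datetime import datetime, timezone
from pathlib import Path


def stage_file(source, relay_dir, note="", journal_name="journal.jsonl"):
    """Copy a file into the relay, checksum the copy, and log the hand-off.

    Returns a reply shaped like the Mac tools' one-line JSON contract.
    """
    source = Path(source)
    relay = Path(relay_dir)
    relay.mkdir(parents=True, exist_ok=True)

    target = relay / source.name
    shutil.copy2(source, target)
    # Hash the copy, not the source, so the checksum describes what the PC will read.
    checksum = hashlib.sha256(target.read_bytes()).hexdigest()

    entry = {
        "at": datetime.now(timezone.utc).isoformat(),
        "file": source.name,
        "sha256": checksum,
        "note": note,
    }
    # Append-only: existing journal lines are never rewritten.
    with open(relay / journal_name, "a", encoding="utf-8") as journal:
        journal.write(json.dumps(entry) + "\n")

    return {"status": "ok", "path": str(target), "sha256": checksum}
```

One JSON object per line keeps the journal readable by either machine without locking the whole file, though two writers appending at the same instant over a network share would still need care.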

End-to-end flow

  1. A plain-language request becomes a small, schema-validated plan (a “screenplay”).
  2. For each new shot, the Mac generates an identity-locked start frame and hands it to the shared folder with stage_to_relay.
  3. The Mac POSTs render_base_clip to the PC and polls the job_id; the PC animates the frame on the GPU and stages the clip back for review.
  4. The Mac promotes the accepted clip, then runs voice, music, lip-sync, and compositing.
  5. The scenes are assembled into a master, then formatted to the target delivery size.

Already-rendered clips are cached, so re-runs are cheap and the person can stop and resume at any point.
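The resume behaviour could be sketched with a small manifest that records each finished clip, so a re-run only submits what is missing. How the real orchestrator tracks finished clips is not described here; the manifest file, its layout, and the function names below are assumptions.

```python
import json
from pathlib import Path


def load_manifest(path):
    """Return {output_name: result} for clips already rendered, or {} on a first run."""
    manifest_path = Path(path)
    if not manifest_path.is_file():
        return {}
    return json.loads(manifest_path.read_text(encoding="utf-8"))


def record_clip(path, output_name, result):
    """Add one finished clip to the manifest, replacing the file in a single step."""
    manifest = load_manifest(path)
    manifest[output_name] = result
    tmp = Path(str(path) + ".tmp")
    tmp.write_text(json.dumps(manifest, indent=2), encoding="utf-8")
    tmp.replace(path)  # rename over the old file so readers never see a half-written one


def pending_shots(shots, manifest):
    """Keep only the shots whose output_name has no recorded result yet."""
    return [shot for shot in shots if shot["output_name"] not in manifest]
```

With this shape, stopping mid-run loses at most the clip that was rendering: everything recorded before the stop is skipped on the next pass.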

Tailoring the API to your hardware

Every choice above is downstream of the hardware. The asynchronous submit-and-poll model, the first-in-first-out GPU queue, the shared-folder hand-off, and the always-on orchestrator all exist because this studio pairs an always-on, unified-memory Apple Silicon Mac with a discrete-GPU NVIDIA PC. That is the reference configuration — not a requirement.

The same contract re-targets cleanly to a different rig by redrawing only the functional split:

If your hardware is…                …the split shifts to
A single multi-GPU workstation      One machine; the “remote” job server becomes a local queue, no relay folder needed.
Two GPU machines                    Render is sharded across both job servers; the queue spans the pool.
A laptop plus a cloud GPU node      The Supervisor moves to the cloud; the relay becomes object storage; the contract is unchanged.
One modest machine                  Heavier models swap for lighter ones; long jobs stay asynchronous so the UI never blocks.

The endpoints, the job record, and the orchestration flow stay the same; only where each step runs changes. Claude Code can profile a given configuration and produce the matching split and API — which is the real product of this project: a video-generation API custom-fit to the hardware it runs on.

This page is a complete enough blueprint to act on. Point your own Claude Code at it, tell it what hardware you have, and it can design and build an equivalent pipeline for you — tailoring the functional split to your machines while keeping a contract shaped like this one. You would be re-deriving the implementation for your setup, not copying ours.