Verbatim verification procedure (Mac terminal) · 2026 · ITDT LLC · sanitized placeholders
Audience: the owner, running on the Mac.
Goal: independently confirm the 6 Mac build steps of the Screenplay Compiler work, then run the cross-machine integration test (Mac + PC together).
What this covers: generate_image start-frame mode, the screenplay schema + validator + library, generate_voiceover --voice, assemble_timeline, compose_video --pip-round/--pip-fade, the render_screenplay orchestrator + Elira tool wiring, and the full end-to-end run.
Each test = a command to copy/paste + what a PASS looks like. If a test fails, stop and tell me which test number + paste the output.
Companion docs: Design/Screenplay_Compiler_Mac_Build.md (what was built + why), Design/Screenplay_Compiler_Design.md (the locked feature spec), Relay/Design/Supervisor_API.md (the PC contract).
Time budget: Tests 1–9 are quick (seconds to ~2 min each, except the two marked SLOW). Test 10 (integration) includes a real ~31 min PC render + an ~11 min Mac lip-sync, so set aside ~50 min and run it when you can leave it.
You run these in Terminal, not in Elira. Open the macOS Terminal app (Applications → Utilities → Terminal), and keep it open for the whole session.
Each gray box is one command block. Run them one box at a time, top to bottom:
Paste each box whole — don’t run it line by line. Several boxes are a single multi-line command (you’ll see <<'EOF', <<'PYEOF', or a trailing \ at the end of a line). Those only work if pasted as one piece; splitting them mid-command will error.
What is and isn’t a command:
Lines like echo "--- valid ---" are just labels that print a heading so you can tell which part of the output belongs to which step. Safe to paste.
There are no # comment lines for you to type. (Your shell, zsh, does not by default treat a # typed at the prompt as a comment; it would try to run it and error. So the boxes use echo labels instead. The only # lines you’ll see are Python comments inside a python3 … <<'PYEOF' block, which are fed to Python, not the shell.)
Run the tests in order. Test 0 (setup) creates the working folder and a sample file that later tests reuse, so do it first. Within a test, run its boxes top-to-bottom.
If a command seems to hang: the slow ones are marked (Test 1 first run loads models; Test 10 is ~50 min). Everything else should finish in seconds. If something else hangs, press Control-C to stop it, and tell me which test.
Run this first. It moves into the project folder and creates the working directory that every later test writes into (/Users/<user>/dev/ITDT/Videos/verification/temp/sc_verify). Without this, the next command fails with "No such file or directory."
cd /Users/<user>/dev/ITDT/Videos
mkdir -p /Users/<user>/dev/ITDT/Videos/verification/temp/sc_verify
echo "ready: working dir is /Users/<user>/dev/ITDT/Videos/verification/temp/sc_verify"
PASS: prints ready: working dir is /Users/<user>/dev/ITDT/Videos/verification/temp/sc_verify and no error.
Now create a reusable valid 1-scene screenplay (used by several tests). Paste this whole box — it's one command that writes a file and then confirms it:
cat > /Users/<user>/dev/ITDT/Videos/verification/temp/sc_verify/mini.json <<'EOF'
{
"title": "Verify Mini",
"scenes": [
{
"id": "s1",
"character": "elira",
"outfit": "default",
"setting": "neon-lit city rooftop at dusk, wide skyline",
"presentation": "full",
"hold": 2,
"timeline": [
{ "say": "This is the way." },
{ "wait": 3 },
{ "say": "Follow me." }
]
}
]
}
EOF
echo "wrote /Users/<user>/dev/ITDT/Videos/verification/temp/sc_verify/mini.json"
generate_image start-frame mode (PuLID, 9:16)
Generates an identity-anchored Elira start frame from her reference still. Needs Mac ComfyUI (the script auto-starts it; first run is slower while models load).
cd /Users/<user>/dev/ITDT/Videos
REF=$(python3 -c "import json;print(json.load(open('character_library.json'))['elira']['outfits']['default']['reference'])")
python3 Tools/scripts/generate_image.py \
--prompt "elira_host, medium close-up chest up, against a neon-lit city rooftop at dusk, holding pose, looking directly into camera, gentle closed-mouth expression, photorealistic, cinematic lighting" \
--reference "$REF" --seed 7 \
--output /Users/<user>/dev/ITDT/Videos/verification/temp/sc_verify/startframe.png
PASS: stdout is one JSON line {"status": "ok", "mode": "start_frame", "size": "480x832", ...}. Confirm the dimensions and look:
sips -g pixelWidth -g pixelHeight /Users/<user>/dev/ITDT/Videos/verification/temp/sc_verify/startframe.png
open /Users/<user>/dev/ITDT/Videos/verification/temp/sc_verify/startframe.png
PASS: pixelWidth 480, pixelHeight 832 (portrait, close to 9:16), and the image is recognizably Elira: jet-black hair, magenta blazer, high-neck/modest, facing camera. (If the wardrobe is off, that's the start-frame review gate's job; see Test 1b.)
cd /Users/<user>/dev/ITDT/Videos
REF=$(python3 -c "import json;print(json.load(open('character_library.json'))['elira']['outfits']['default']['reference'])")
python3 Tools/scripts/generate_image.py \
--prompt "elira_host, medium close-up chest up, against a quiet modern office at night, holding pose, looking directly into camera, gentle closed-mouth expression, photorealistic" \
--reference "$REF" --candidates 3 \
--output /Users/<user>/dev/ITDT/Videos/verification/temp/sc_verify/candidates
ls -la /Users/<user>/dev/ITDT/Videos/verification/temp/sc_verify/candidates
PASS: JSON {"status":"ok","mode":"start_frame","count":3,...} and the directory holds 3 PNGs named candidate_01_seed*.png … candidate_03_*.
cd /Users/<user>/dev/ITDT/Videos
python3 Tools/scripts/generate_image.py --prompt x --reference /nope/none.png
python3 Tools/scripts/generate_image.py --prompt x --pulid
python3 Tools/scripts/generate_image.py --prompt x --size foo
PASS: each prints a single {"status":"error","message":"..."} line (missing reference / --pulid without --reference / bad size) and renders nothing.
--character) + negative prompt (new this session)
--character resolves Elira's reference + composed look from character_library.json (so --prompt is just the SETTING); --negative steers wardrobe/anatomy away (Flux ignores a negative at cfg 1.0, so passing --negative implies true CFG > 1 automatically). First the fail-loud checks (no GPU), then one real render with both:
cd /Users/<user>/dev/ITDT/Videos
echo "--- unknown character / outfit error cleanly (no ComfyUI) ---"
python3 Tools/scripts/generate_image.py --character nobody --prompt "on a rooftop"
python3 Tools/scripts/generate_image.py --character elira --outfit tuxedo --prompt "on a rooftop"
echo "--- real render: yourself, with a wardrobe negative ---"
python3 Tools/scripts/generate_image.py --character elira \
--prompt "on a neon-lit city rooftop at dusk, wide skyline" \
--negative "low-cut top, exposed cleavage, open blazer, deformed hands, extra fingers" \
--seed 7 --output /Users/<user>/dev/ITDT/Videos/verification/temp/sc_verify/char_neg.png
open /Users/<user>/dev/ITDT/Videos/verification/temp/sc_verify/char_neg.png
PASS: the two --character errors print {"status":"error",...} (character 'nobody' not in library / outfit 'tuxedo' not found ...) with no render; the real render prints {"status":"ok","mode":"start_frame","size":"480x832",...} and the image is recognizably Elira (jet-black hair, magenta blazer, high-neck/modest) on the rooftop.
explore_outfits: survey DIFFERENT outfits (new this session)
generate_image --candidates N re-rolls the SEED of ONE outfit (the wardrobe text is fixed by the library). To survey different outfits, explore_outfits.py holds the face / appearance / framing / quality constant (PuLID-anchored to the library reference) and varies ONLY the wardrobe text, one image per outfit. Each look composes in the exact start-frame order (trigger, framing, appearance, wardrobe, setting, gaze, quality).
cd /Users/<user>/dev/ITDT/Videos
echo "--- no outfits given: clean error, no GPU ---"
python3 Tools/scripts/explore_outfits.py --character elira
echo "--- real: three different outfits, one image each ---"
python3 Tools/scripts/explore_outfits.py --character elira \
--outfit "charcoal_suit=wearing a charcoal grey tailored business pantsuit over a crisp white blouse, modest high neckline" \
--outfit "navy_turtleneck=wearing a fitted navy blazer over a grey turtleneck, modest neckline" \
--outfit "techwear=wearing a matte-black structured sci-fi utility jacket, high collar" \
--output /Users/<user>/dev/ITDT/Videos/verification/temp/sc_verify/outfits
open /Users/<user>/dev/ITDT/Videos/verification/temp/sc_verify/outfits
PASS: the no-outfit run prints {"status":"error","message":"no outfits given ..."} with no render; the real run prints {"status":"ok","count":3,...,"outfits":[...]} and the folder holds 01_charcoal_suit.png, 02_navy_turtleneck.png, 03_techwear.png — the same Elira face in three visibly different outfits (NOT three magenta-blazer takes). If a stubborn garment/colour from the reference bleeds through, add --negative "magenta blazer, pink jacket" and/or lower --pulid-weight (default 0.9).
build_character_set + register_character (new this session)
The push-button path for a NEW character: build_character_set.py makes a PuLID-anchored training dataset (PNG stills + .txt captions, varied pose/angle/lighting, identity+outfit held constant) → the PC's train_character consumes it → register_character.py adds the trained character/outfit to character_library.json programmatically (preserving every other entry). Settings need no training; they're just text. Here is a 1-image smoke test (a real run = default 8 buckets × 3 seeds = 24 stills, minutes) and the library helper run against a TEMP copy (so the real library is untouched):
cd /Users/<user>/dev/ITDT/Videos
W=/Users/<user>/dev/ITDT/Videos/verification/temp/sc_verify
printf 'head | tight headshot, head facing forward, neutral expression, looking into the camera | in a soft-lit studio against a grey backdrop\n' > $W/buckets_smoke.txt
python3 Tools/scripts/build_character_set.py --character smoketest \
--reference Brand/Elira_Outfits/business_suit_480x832.png \
--appearance "a woman in her early 30s, jet-black hair" --wardrobe "wearing a charcoal business suit" \
--buckets-file $W/buckets_smoke.txt --seeds-per-bucket 1 --output $W/charset_smoke
echo "--- expect a .png AND a matching .txt ---"
ls $W/charset_smoke
cp character_library.json /tmp/lib_test.json
python3 Tools/scripts/register_character.py --library /tmp/lib_test.json --character nova \
--voice nova_blend --trigger nova_host --appearance "a woman in her late 20s, short auburn hair" \
--outfit default --reference Brand/Elira_Outfits/business_suit_480x832.png \
--wardrobe "wearing a teal flight jacket" --lora nova_lora_v1
echo "--- error path ---"
python3 Tools/scripts/register_character.py --library /tmp/lib_test.json --character x --wardrobe foo
PASS: the build prints {"status":"ok","count":1,"trigger":"smoketest_host","base_caption":...} and the dir holds a 1024×1024 PNG plus a same-stem .txt whose text is the base caption; the first register prints {"status":"ok",...,"created_character":true,"created_outfit":true} and /tmp/lib_test.json is valid JSON still containing _meta + elira (with business_suit) and a new nova; the last prints {"status":"error","message":"outfit fields ... require --outfit"}.
validate_screenplay + library
The validator gates malformed input and emits the work plan.
cd /Users/<user>/dev/ITDT/Videos
echo "--- valid ---"
python3 Tools/scripts/validate_screenplay.py /Users/<user>/dev/ITDT/Videos/verification/temp/sc_verify/mini.json
echo "--- pip without background (invalid) ---"
printf '{"scenes":[{"id":"s1","character":"elira","outfit":"default","setting":"office","presentation":"pip","timeline":[{"say":"Hi."}]}]}' > /Users/<user>/dev/ITDT/Videos/verification/temp/sc_verify/bad_pip.json
python3 Tools/scripts/validate_screenplay.py /Users/<user>/dev/ITDT/Videos/verification/temp/sc_verify/bad_pip.json
echo "--- unknown outfit (invalid) ---"
printf '{"scenes":[{"id":"s1","character":"elira","outfit":"tuxedo","setting":"office","presentation":"full","timeline":[{"say":"Hi."}]}]}' > /Users/<user>/dev/ITDT/Videos/verification/temp/sc_verify/bad_outfit.json
python3 Tools/scripts/validate_screenplay.py /Users/<user>/dev/ITDT/Videos/verification/temp/sc_verify/bad_outfit.json
echo "--- malformed JSON ---"
printf '{ "scenes": [ {"id":"s1", } ] }' > /Users/<user>/dev/ITDT/Videos/verification/temp/sc_verify/malformed.json
python3 Tools/scripts/validate_screenplay.py /Users/<user>/dev/ITDT/Videos/verification/temp/sc_verify/malformed.json
echo "--- missing file ---"
python3 Tools/scripts/validate_screenplay.py /Users/<user>/dev/ITDT/Videos/verification/temp/sc_verify/nope.json
PASS, in order:
1. "valid": true, "work_plan": [{"character":"elira","outfit":"default","setting":"neon-lit city rooftop at dusk, wide skyline"}], "new_base_clips": 1.
2. "valid": false, error mentions pip presentation requires 'background'.
3. "valid": false, error mentions outfit 'tuxedo' not found.
4. "valid": false, error mentions invalid JSON (a successful run reporting the problem, NOT a crash).
5. {"status":"error", ...} (this one is a tool error; exit code non-zero is expected).
generate_voiceover --voice
Per-character voice; default is Elira's locked blend.
cd /Users/<user>/dev/ITDT/Videos
echo "--- default voice ---"
python3 Tools/scripts/generate_voiceover.py --text "Your finances, decade by decade." --output /Users/<user>/dev/ITDT/Videos/verification/temp/sc_verify/vo.wav
echo "--- explicit elira_blend + pad to 5s ---"
python3 Tools/scripts/generate_voiceover.py --text "This is the way." --voice elira_blend --pad-to 5.0 --output /Users/<user>/dev/ITDT/Videos/verification/temp/sc_verify/vo_padded.wav
echo "--- bad voice blend (fail-loud) ---"
python3 Tools/scripts/generate_voiceover.py --text "x" --voice /nope/missing.pt
PASS:
1. {"status":"ok","voice":"elira_blend","duration_s":<~3>, ...}; afplay /Users/<user>/dev/ITDT/Videos/verification/temp/sc_verify/vo.wav sounds like Elira.
2. "duration_s": 5.0 exactly (the pad-to invariant lip-sync relies on).
3. {"status":"error",...}, no file produced.
assemble_timeline deterministic ops (no PC needed)
This exercises the editing engine (timeline math, continuous-take WAV, ping-pong fill, concat, music, final mix) on synthetic inputs: no PC base clip and no 11-min lip-sync. It's the fast confidence check that the mini-editor math is correct.
cd /Users/<user>/dev/ITDT/Videos
python3 - <<'PYEOF'
import sys; sys.path.insert(0, "Tools/scripts")
import assemble_timeline as A
from pathlib import Path
import subprocess
wd = Path("/Users/<user>/dev/ITDT/Videos/verification/temp/sc_verify/asm"); wd.mkdir(parents=True, exist_ok=True)
# synthetic say wavs + a 5s base clip
subprocess.run(["ffmpeg","-y","-f","lavfi","-i","sine=frequency=440:duration=2:sample_rate=24000","-ac","1",str(wd/"a.wav")],capture_output=True)
subprocess.run(["ffmpeg","-y","-f","lavfi","-i","sine=frequency=660:duration=1.5:sample_rate=24000","-ac","1",str(wd/"b.wav")],capture_output=True)
subprocess.run(["ffmpeg","-y","-f","lavfi","-i","testsrc=size=320x568:rate=30:duration=5","-pix_fmt","yuv420p",str(wd/"base.mp4")],capture_output=True)
pl, dur = A.compute_timeline([{"say":"a"},{"wait":3},{"say":"b"}], [2.0,1.5], hold=2.0)
assert dur == 8.5, dur # 2 + 3 + 1.5 + 2
sw = A.build_scene_wav([wd/"a.wav",wd/"b.wav"], pl, dur, wd/"scene.wav")
assert abs(A.probe_duration(sw) - 8.5) < 0.05, A.probe_duration(sw)
canvas = A.pingpong_fill(wd/"base.mp4", 8.5, 60, wd/"canvas.mp4", wd/"pp")
assert abs(A.probe_duration(canvas) - 8.5) < 0.1 and A.probe_fps(canvas) == 60.0
print("PASS: timeline=%.1fs scene_wav=%.2fs canvas=%.2fs@%dfps" % (
dur, A.probe_duration(sw), A.probe_duration(canvas), A.probe_fps(canvas)))
PYEOF
PASS: prints PASS: timeline=8.5s scene_wav=8.50s canvas=8.50s@60fps (no AssertionError).
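The 8.5 s figure asserted above follows from simple placement arithmetic. A minimal sketch, assuming each say clip is laid end to end, each wait adds silence, and hold is appended once at the end. timeline_duration is a hypothetical stand-in written for illustration; it is not the real assemble_timeline.compute_timeline and may differ from it in details.

```python
def timeline_duration(timeline, say_durations, hold):
    """Hypothetical stand-in (NOT assemble_timeline.compute_timeline).

    Returns ([(start_s, length_s), ...] for each say item, total_seconds).
    """
    durations = iter(say_durations)
    cursor = 0.0
    placements = []
    for item in timeline:
        if "say" in item:
            length = next(durations)          # measured length of this say clip
            placements.append((cursor, length))
            cursor += length
        elif "wait" in item:
            cursor += float(item["wait"])     # silence: the mouth stays at rest
    return placements, cursor + hold          # hold is appended once, at the end


placements, total = timeline_duration(
    [{"say": "a"}, {"wait": 3}, {"say": "b"}], [2.0, 1.5], hold=2.0
)
print(placements, total)  # [(0.0, 2.0), (5.0, 1.5)] 8.5
```

The second say starts at 5.0 s (2 s of speech plus the 3 s wait), which is the offset the continuous-take WAV has to honor for lip-sync to line up.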
compose_video --pip-round + --pip-fade
Rounded-corner + alpha-fade PIP overlay.
cd /Users/<user>/dev/ITDT/Videos
W=/Users/<user>/dev/ITDT/Videos/verification/temp/sc_verify
ffmpeg -y -f lavfi -i "testsrc=size=1280x720:rate=60:duration=4" -pix_fmt yuv420p $W/bg.mp4 >/dev/null 2>&1
ffmpeg -y -f lavfi -i "testsrc=size=480x832:rate=60:duration=4" -pix_fmt yuv420p $W/pip.mp4 >/dev/null 2>&1
python3 Tools/scripts/compose_video.py --background $W/bg.mp4 --pip $W/pip.mp4 \
--pip-pos top-right --pip-width 360 --pip-start 0.5 --pip-end 3.5 \
--pip-round 40 --pip-fade 0.5 --duration 4 --output $W/composite.mp4
ffmpeg -y -ss 2.0 -i $W/composite.mp4 -frames:v 1 $W/frame.png >/dev/null 2>&1
open $W/frame.png
PASS: JSON {"status":"ok","duration_s":4.0,"fps":60,...}; the extracted frame shows the PIP inset in the top-right with visibly rounded corners over the background.
fill_clip: extend a short clip to a target length (new this session)
Ping-pong fill (forward+reverse, seamless) a short clip to an exact duration. This is the step that turns a ~5s base clip into a ~10s lipsync canvas (VRT needs a face track as long as the audio). Thin CLI over the proven assemble_timeline.pingpong_fill().
cd /Users/<user>/dev/ITDT/Videos
W=/Users/<user>/dev/ITDT/Videos/verification/temp/sc_verify
ffmpeg -y -f lavfi -i "testsrc=size=480x832:rate=30:duration=3" -pix_fmt yuv420p $W/short3.mp4 >/dev/null 2>&1
python3 Tools/scripts/fill_clip.py --input $W/short3.mp4 --duration 10 --fps 60 --output $W/filled10.mp4
echo "--- error path ---"
python3 Tools/scripts/fill_clip.py --input $W/short3.mp4 --duration 0 --output $W/x.mp4
PASS: the first prints {"status":"ok","duration":10.0,"fps":60,"source_duration":3.0,...} (output is exactly the target length); the second prints {"status":"error","message":"--duration must be > 0 ..."}.
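To picture why a ping-pong fill has no visible seam, it can help to look at the order in which source frames are replayed. The sketch below is an illustration only: pingpong_indices is a hypothetical helper, and it assumes the turning frames are not repeated at each reversal. The real assemble_timeline.pingpong_fill() may build the loop differently (for example by concatenating whole forward and reversed segments with ffmpeg).

```python
def pingpong_indices(n_source, n_target):
    """Hypothetical helper: which source frame each output frame shows.

    Plays 0..n-1 forward, then n-2..1 backward, and repeats until
    n_target frames have been produced.
    """
    if n_source < 1 or n_target < 0:
        raise ValueError("need at least one source frame and a non-negative target")
    if n_source == 1:
        return [0] * n_target
    period = 2 * (n_source - 1)   # one forward pass plus one reverse pass
    out = []
    for i in range(n_target):
        k = i % period
        out.append(k if k < n_source else period - k)
    return out


print(pingpong_indices(4, 10))  # [0, 1, 2, 3, 2, 1, 0, 1, 2, 3]
```

Neighboring output frames always come from neighboring source frames, so motion reverses direction but never jumps, and the output length is set by the target rather than by the source.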
compose_video general multi-overlay + multi-audio (new this session)
The general path (triggered by any --overlay/--audio) places multiple timed overlays (videos and still images) and mixes multiple offset audio tracks. This is the App Store preview layout (two corner PIPs + a centered icon + two voiceovers at different offsets + a ducked music bed) as a single repeatable API. The legacy single --pip path (Test 5) is unchanged.
cd /Users/<user>/dev/ITDT/Videos
W=/Users/<user>/dev/ITDT/Videos/verification/temp/sc_verify
ffmpeg -y -f lavfi -i "color=c=navy:s=540x960:r=30:d=27" -pix_fmt yuv420p $W/g_bg.mp4 >/dev/null 2>&1
ffmpeg -y -f lavfi -i "testsrc=s=240x416:r=30:d=10" -pix_fmt yuv420p $W/g_intro.mp4 >/dev/null 2>&1
ffmpeg -y -f lavfi -i "testsrc=s=240x416:r=30:d=10" -pix_fmt yuv420p $W/g_outro.mp4 >/dev/null 2>&1
ffmpeg -y -f lavfi -i "color=c=white:s=200x200:d=1" -frames:v 1 $W/g_icon.png >/dev/null 2>&1
ffmpeg -y -f lavfi -i "sine=frequency=440:duration=10" $W/g_intro_vo.wav >/dev/null 2>&1
ffmpeg -y -f lavfi -i "sine=frequency=660:duration=8.5" $W/g_outro_vo.wav >/dev/null 2>&1
ffmpeg -y -f lavfi -i "sine=frequency=220:duration=25" $W/g_music.wav >/dev/null 2>&1
python3 Tools/scripts/compose_video.py --background $W/g_bg.mp4 --fps 30 --duration 27 --output $W/general.mp4 \
--overlay "src=$W/g_intro.mp4,pos=top-right,start=0,end=10,round=40,fade=0.5,width=240" \
--overlay "src=$W/g_outro.mp4,pos=top-right,start=17,end=27,round=40,fade=0.5,width=240" \
--overlay "src=$W/g_icon.png,type=image,pos=center,start=25,end=27,fade=1,width=200" \
--audio "src=$W/g_intro_vo.wav,delay=0" --audio "src=$W/g_outro_vo.wav,delay=17" --audio "src=$W/g_music.wav,gain=-12"
ffmpeg -y -ss 26 -i $W/general.mp4 -frames:v 1 $W/g_f26.png >/dev/null 2>&1
open $W/g_f26.png
PASS: JSON {"status":"ok","duration_s":27.0,"overlays":3,"audio_tracks":3,...}; the t=26 frame shows the outro PIP top-right AND the centered icon, a frame at t=12 shows neither (the gap), and the file has an audio stream. The Test 5 (legacy --pip) result is unchanged.
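The frame expectations above (two overlays visible at t=26, none at t=12) can be worked out from the --overlay strings alone. A minimal sketch, assuming each spec is a comma-separated list of key=value pairs with no commas inside values, and assuming an overlay is visible for start <= t < end. parse_spec and active_at are hypothetical helpers, not functions from compose_video.py, and the file names are shortened.

```python
def parse_spec(spec):
    """Hypothetical parser for 'key=value,key=value' strings (no commas in values)."""
    fields = {}
    for part in spec.split(","):
        key, sep, value = part.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {part!r}")
        fields[key.strip()] = value.strip()
    return fields


def active_at(specs, t):
    """Return the src of every overlay whose [start, end) window contains t."""
    names = []
    for spec in specs:
        fields = parse_spec(spec)
        if float(fields["start"]) <= t < float(fields["end"]):
            names.append(fields["src"])
    return names


overlays = [
    "src=g_intro.mp4,pos=top-right,start=0,end=10,round=40,fade=0.5,width=240",
    "src=g_outro.mp4,pos=top-right,start=17,end=27,round=40,fade=0.5,width=240",
    "src=g_icon.png,type=image,pos=center,start=25,end=27,fade=1,width=200",
]
print(active_at(overlays, 26))  # ['g_outro.mp4', 'g_icon.png']
print(active_at(overlays, 12))  # []
```

The empty result at t=12 is the gap between the intro PIP ending at 10 s and the outro PIP starting at 17 s.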
render_screenplay --dry-run (plan + cost, no GPU)
The orchestrator's planning path. Spends nothing.
cd /Users/<user>/dev/ITDT/Videos
echo "--- valid screenplay ---"
python3 Tools/scripts/render_screenplay.py --screenplay /Users/<user>/dev/ITDT/Videos/verification/temp/sc_verify/mini.json --dry-run
echo "--- invalid screenplay (must gate) ---"
python3 Tools/scripts/render_screenplay.py --screenplay /Users/<user>/dev/ITDT/Videos/verification/temp/sc_verify/bad_pip.json --dry-run
PASS:
1. {"status":"ok","valid":true,"dry_run":true,"scenes":1,"work_plan":[...1 entry...],"estimate":{"new_base_clips":1,"pc_minutes_est":31.0,"mac_minutes_est":12.0,"total_minutes_est":43.0},...}.
2. {"status":"ok","valid":false,"errors":["... pip presentation requires 'background'"]}, with no GPU touched.
cd /Users/<user>/dev/ITDT/Videos
python3 -c "import json; d=json.load(open('character_library.json'))['elira']['outfits']['default']; print('lora =', d['lora']); print('reference exists =', __import__('os').path.exists(d['reference']))"
PASS: lora = elira_lora_v2_lownoise and reference exists = True.
Confirms the rebuilt elira:latest advertises the new tools and the dispatch is wired.
cd /Users/<user>/dev/ITDT/Videos
echo "--- system prompt mentions the new tools ---"
ollama show elira --system | grep -o -E "render_screenplay|stop_pc_job|stop_generation|mount_relay|prepare_delivery" | sort -u
echo "--- tool registry + dispatch all consistent ---"
python3 - <<'PYEOF'
import re
src = open("Elira/elira_tools.py").read()
dispatch = set(re.findall(r'if name == "(\w+)"', src))
handlers = set(re.findall(r'def (t_\w+)\(', src))
# The 4 memory tools dispatch via the elira_memory module, not t_* handlers.
memory_tools = {"remember", "recall_memory", "list_memories", "forget_memory"}
names = []
for n in re.findall(r'"name": "(\w+)"', src):
    if n not in names: names.append(n)
missing = [n for n in names if n not in memory_tools
and not (n in dispatch and ("t_"+n) in handlers)]
print("tools:", len(names), "| missing wiring:", missing)
PYEOF
PASS: the grep prints all five names (mount_relay, prepare_delivery, render_screenplay, stop_generation, stop_pc_job); the python prints tools: 24 | missing wiring: [].
Only if Elira's chat shell is running and you want to confirm the model actually calls the tool. In Elira chat, say:
"Do a dry run of the screenplay at /Users/<user>/dev/ITDT/Videos/verification/temp/sc_verify/mini.json"
PASS: Elira calls render_screenplay with dry_run and reports back the plan (1 scene, 1 base clip, ~43 min estimate) rather than describing it in prose.
cd /Users/<user>/dev/ITDT/Videos
curl -s -m 5 http://<pc-host>:8765/healthz; echo
curl -s -m 5 http://<pc-host>:8765/queue | python3 -m json.tool | head -20
curl -s -m 5 http://<pc-host>:8765/scripts | python3 -c "import sys,json; print('render_base_clip present:', 'render_base_clip' in json.load(sys.stdin).get('scripts',{}))"
echo "--- mount the Relay share (unattended, uses saved credentials) ---"
python3 Tools/scripts/mount_relay.py
PASS: healthz responds (supervisor 0.4.0), /queue returns JSON, render_base_clip present: True, and mount_relay.py prints {"status":"ok","mounted":true,...} (either already:true or a fresh mount). If the PC is off or unreachable, power it on / start the Supervisor before Test 10. (You no longer have to log into the share by hand — render_screenplay also auto-mounts it when a PC render is needed.)
This is the one that proves the whole pipeline. It will: generate an Elira start frame on the Mac → stage it to the PC → render a ~5 s base clip on the PC (~31 min) → voiceover + single lip-sync pass (~11 min) + compose on the Mac → assemble the master.
Prereqs: Test 9 passed (PC up; Relay mountable). render_screenplay auto-mounts the share itself when it needs the PC, so you don't have to mount it by hand. Leave the machine alone while it runs; it's a "kick it off and come back" job.
Recommended: run the dry-run first (Test 6) so you see the plan, then:
cd /Users/<user>/dev/ITDT/Videos
python3 Tools/scripts/render_screenplay.py \
--screenplay /Users/<user>/dev/ITDT/Videos/verification/temp/sc_verify/mini.json \
--output /Users/<user>/dev/ITDT/Videos/verification/temp/sc_verify/master.mp4 \
2>/Users/<user>/dev/ITDT/Videos/verification/temp/sc_verify/render.log
echo "EXIT=$?"
While it runs you can watch progress from another Terminal window. This is the Mac-side step log — it now prints a timestamped line for each step (validating, start frame, staging, PC job id, poll progress, assembling, done):
tail -f /Users/<user>/dev/ITDT/Videos/verification/temp/sc_verify/render.log
And, in yet another window, to watch the PC render land in the review folder:
ls -la /Volumes/Relay/review/screenplay_*/ 2>/dev/null
(Press Control-C to stop a tail -f when you're done watching — it doesn't affect the render.)
PASS (the full chain):
1. The command prints {"status":"ok","valid":true,"path":".../master.mp4","scenes":1,"base_clips_generated":1,...}.
2. Watch the master:
open /Users/<user>/dev/ITDT/Videos/verification/temp/sc_verify/master.mp4
Elira on the rooftop, lips moving on “This is the way.” and “Follow me.” and at rest (mouth closed) during the 3 s gap, audio in her voice, ~8.5 s long, no seams at the ping-pong loop point. Her eyes should be steady, with no flickering/sparkling “twinkle” (the eye-restore composite handles this; if you see eye shimmer, note it). The output is 60 fps; head/background motion is from the 16 fps Wan source, the mouth is synced at 60 fps.
3. The library recorded the new base clip:
python3 -c "import json; print(json.load(open('character_library.json'))['elira']['outfits']['default']['base_clips'])"
shows the rooftop setting mapped to a .mp4 filename (no longer empty {}).
If it fails partway: the step log (/Users/<user>/dev/ITDT/Videos/verification/temp/sc_verify/render.log) and the final JSON message say which step. Common ones: PC unreachable (Test 9), the start frame failed (Test 1), or the lip-sync timed out. Tell me the failing step + the log tail.
prepare_delivery (publish-ready formats, no PC) ⏳ first run ~4 min
Upscales a finished master (Real-ESRGAN, no face-enhance → no eye flicker) and formats it per platform. Uses the master from Test 10 (or any master). The first platform does the upscale (~4 min for an ~8 s clip); the rest reuse the cache (~2 s). App Store is special: a preview is a composite, so it produces a PIP-ready element by default.
cd /Users/<user>/dev/ITDT/Videos
M=/Users/<user>/dev/ITDT/Videos/verification/temp/sc_verify/master.mp4
echo "--- YouTube 1080x1920@60 (does the upscale) ---"
python3 Tools/scripts/prepare_delivery.py --master "$M" --platform youtube
echo "--- X 1080x1920@30 (reuses the cache, fast) ---"
python3 Tools/scripts/prepare_delivery.py --master "$M" --platform x
echo "--- App Store: PIP-ready element (default) ---"
python3 Tools/scripts/prepare_delivery.py --master "$M" --platform appstore
echo "--- App Store: full-frame letterbox fallback ---"
python3 Tools/scripts/prepare_delivery.py --master "$M" --platform appstore --full-frame
open /Users/<user>/dev/ITDT/Videos/Generated/Delivery/
PASS: each prints {"status":"ok",...}. youtube → res:"1080x1920","fps":60; x → fps:30; appstore default → "mode":"pip_ready" + a note that the preview is a composite; appstore --full-frame → res:"1290x2796" (a tall device frame with visible letterbox — that's expected, it's why the composite is the real App Store path). The files open and play; the faces are sharper than the 480×832 master with no eye shimmer.
Optional — the App Store composite (talking head as a corner PIP over an app capture):
python3 Tools/scripts/prepare_delivery.py --master "$M" --platform appstore \
--background /Users/<user>/dev/ITDT/Videos/Episodes/AppStore_v1/capture_trimmed_27s.mp4 \
--pip-width 460
PASS: "mode":"composite"; the output shows the app screen full-frame with Elira as a rounded top-right inset.
⚠️ Verify the composite isn't silent or the wrong file. Two real defects shipped here before: the composite came out silent (the app-capture background has no audio and the compositor dropped the talking head's voice), and a near-identically-named intermediate …_composite_pip.mp4 (just the talking head, no app behind it) was left next to the real composite. Run these three checks on the actual output:
F=/Users/<user>/dev/ITDT/Videos/Generated/Delivery/master_appstore_composite.mp4
echo "--- (a) AUDIO PRESENT + not silent: the key regression guard ---"
echo "--- expect an aac stream and a real max_volume (around -8 dB), NOT mean/max_volume: -91 dB and NOT an empty result ---"
ffmpeg -v info -i "$F" -map 0:a -af volumedetect -f null - 2>&1 | grep -iE "mean_volume|max_volume"
echo "--- (b) only the deliverable is left, no leftover _pip / _voice files to open by mistake; expect ONLY master_appstore_composite.mp4 ---"
ls /Users/<user>/dev/ITDT/Videos/Generated/Delivery/ | grep -i composite
echo "--- (c) eyeball a frame from the middle: Elira should be a rounded corner PIP over the app ---"
ffmpeg -y -v error -ss 12 -i "$F" -frames:v 1 /tmp/appstore_check.png && open /tmp/appstore_check.png
PASS: (a) prints a real max_volume near −8 dB; (b) lists only master_appstore_composite.mp4; (c) the frame shows the rounded corner PIP over the app. If (a) shows no audio stream or −91 dB, the composite is silent — stop and report it.
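Check (a) can also be automated by parsing the volumedetect lines instead of reading them by eye. A minimal sketch, assuming the text comes from the ffmpeg command above (volumedetect reports mean_volume and max_volume in dB on stderr). The -60 dB floor is an assumed threshold chosen for illustration, and the sample lines below are made-up values, not real output from this project.

```python
import re

# Matches lines such as "[Parsed_volumedetect_0 @ 0x...] max_volume: -8.1 dB".
_VOLUME_RE = re.compile(r"(mean|max)_volume:\s*(-?\d+(?:\.\d+)?)\s*dB")


def parse_volumedetect(text):
    """Return {'mean': dB, 'max': dB} for whichever values appear in text."""
    return {kind: float(value) for kind, value in _VOLUME_RE.findall(text)}


def is_silent(text, floor_db=-60.0):
    """True if no max_volume was reported or the peak is at/below floor_db.

    floor_db is an assumption; tune it to taste.
    """
    levels = parse_volumedetect(text)
    return "max" not in levels or levels["max"] <= floor_db


good = ("[Parsed_volumedetect_0 @ 0x0] mean_volume: -27.3 dB\n"
        "[Parsed_volumedetect_0 @ 0x0] max_volume: -8.1 dB\n")
bad = ("[Parsed_volumedetect_0 @ 0x0] mean_volume: -91.0 dB\n"
       "[Parsed_volumedetect_0 @ 0x0] max_volume: -91.0 dB\n")
print(is_silent(good), is_silent(bad), is_silent(""))  # False True True
```

Treating an empty result as silent covers the case where the composite has no audio stream at all, which is the same failure from the viewer's point of view.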
The newer, broader system: one episode JSON (a list of typed shots — host, host_pip, narration, title, montage) becomes a full multi-minute master with overlays (chyrons / lower-thirds), a music bed + one-shot stings, transitions (cut / dissolve / fade), and camera moves. Driven by render_episode (orchestrator) → assemble_episode (editor), checked by validate_episode. Tests E1–E4 + E6–E7 need no PC; E5 is the long PC chain. (The Screenplay Compiler above is still the short single-scene path; the Episode Compiler reuses its base-clip pipeline underneath.)
validate_episode (structure + content lint, no GPU)
cd /Users/<user>/dev/ITDT/Videos
python3 Tools/scripts/validate_episode.py --episode Episodes/E01/E01.episode.json | python3 -m json.tool | head -20
PASS: "status":"ok", "valid": true, "summary": {"shots": 13}, errors empty, and warnings lists the not-yet-recorded app captures (e.g. capture_05_…mov … not found yet) — those are expected until you record them (a missing capture is a WARNING, not an error, so you can author and dry-run first). A malformed script → "valid": false with errors; off-brand narration trips the content lint.
Exercises every non-host shot type, both overlay kinds, a bed + a sting, and a transition — all on the Mac, no GPU/PC.
cd /Users/<user>/dev/ITDT/Videos
mkdir -p verification/temp/ep
cat > verification/temp/ep/verify.episode.json <<'JSON'
{
"title": "verify — no-host episode", "fps": 60, "canvas": { "w": 1920, "h": 1080 },
"music": { "bed": { "source": "prompt", "value": "warm synth bed", "duck_db": -12 },
"cues": [ { "id": "chime", "at_s": 1.5, "source": "prompt", "value": "short bright ui chime" } ] },
"overlays_global": [ { "text": "Next episode: coming soon.", "style": "lower_third", "pos": "lower_third", "from_s": 5.0, "to_s": 9.0 } ],
"shots": [
{ "id": "t", "type": "title", "duration_s": 4.0,
"elements": [ { "text": "Slipstream", "style": "title_magenta", "anim": "materialize" } ],
"overlays": [ { "text": "E01 — Slipstream", "style": "chyron", "pos": "lower_right" } ],
"transition_out": "dissolve" },
{ "id": "n", "type": "narration",
"background": { "source": "file", "value": "/Users/<user>/dev/ITDT/Videos/Episodes/E01/captures/placeholder_app_portrait.png" },
"voiceover": [ { "say": "Voice over the app capture." } ], "transition_out": "cut" },
{ "id": "m", "type": "montage",
"voiceover": [ { "say": "Holographic panels, dial by dial." } ],
"panels": [ { "kind": "income", "capture": "/Users/<user>/dev/ITDT/Videos/Episodes/E01/montage_panels/panel_income.png" },
{ "kind": "chart", "capture": "/Users/<user>/dev/ITDT/Videos/Episodes/E01/montage_panels/panel_chart.png" } ] }
]
}
JSON
python3 Tools/scripts/render_episode.py --episode verification/temp/ep/verify.episode.json \
--workdir verification/temp/ep/work --output verification/temp/ep/verify.mp4
echo "--- inspect ---"
ffprobe -v error -show_entries stream=codec_type,width,height -show_entries format=duration -of default=noprint_wrappers=1 verification/temp/ep/verify.mp4
echo "--- frame at 2 s: title card + chyron ---"
ffmpeg -y -v error -ss 2 -i verification/temp/ep/verify.mp4 -frames:v 1 /tmp/ep_title.png
echo "--- frame at 9 s: hologram panel + outro lower-third ---"
ffmpeg -y -v error -ss 9 -i verification/temp/ep/verify.mp4 -frames:v 1 /tmp/ep_montage.png
open /tmp/ep_title.png /tmp/ep_montage.png
PASS: {"status":"ok","valid":true,...}; the master is 1920×1080 (no host → assembled at the canvas), has a video + an aac audio stream, and runs ~13 s. The title frame shows the “SLIPSTREAM” card with the lower-right chyron; the later frame shows a glowing hologram panel with the “Next episode” lower-third. (volumedetect should show real audio — bed + VO + chime.)
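The PASS note mentions volumedetect, but the box above does not run it. The following is a minimal sketch of one way to do that check, assuming ffmpeg is on the PATH. The script name, the helper names, and the -60 dB "silence floor" are illustrative assumptions, not part of the project tooling.

```python
import re
import subprocess
import sys

# Assumption: a mean level above this counts as "real audio"; tune to taste.
SILENCE_FLOOR_DB = -60.0


def parse_volumedetect(log_text):
    """Return (mean_db, max_db) parsed from ffmpeg's volumedetect summary lines."""
    mean = re.search(r"mean_volume:\s*(-?\d+(?:\.\d+)?) dB", log_text)
    peak = re.search(r"max_volume:\s*(-?\d+(?:\.\d+)?) dB", log_text)
    if not (mean and peak):
        raise ValueError("no volumedetect summary found in ffmpeg output")
    return float(mean.group(1)), float(peak.group(1))


def measure(path):
    """Run the volumedetect audio filter; ffmpeg writes its report to stderr."""
    proc = subprocess.run(
        ["ffmpeg", "-hide_banner", "-nostats", "-i", path,
         "-vn", "-af", "volumedetect", "-f", "null", "-"],
        capture_output=True, text=True, check=True,
    )
    return parse_volumedetect(proc.stderr)


# Only runs when a file path is given on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    mean_db, max_db = measure(sys.argv[1])
    verdict = "real audio" if mean_db > SILENCE_FLOOR_DB else "SILENT (FAIL)"
    print(f"mean {mean_db} dB, max {max_db} dB: {verdict}")
```

Saved under a name of your choosing (for example check_audio.py, a hypothetical file), it would be run as python3 check_audio.py verification/temp/ep/verify.mp4.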
render_episode --dry-run on the real E01 (plan + extended-chain cost, no GPU)
cd /Users/<user>/dev/ITDT/Videos
python3 Tools/scripts/render_episode.py --episode Episodes/E01/E01.episode.json --dry-run 2>/dev/null | python3 -m json.tool | head -28
PASS: "valid": true, "host_shots": 6, base_clip_seconds ≈ 18.6 / 18.5 s per setting (the host VO is pre-synthesized on the Mac to size the clip — no GPU), and estimate shows ~8 pc_segments and ~248 pc_minutes_est — i.e. the long host shots are built as drift-managed chains of short segments, not one impossible long render. Nothing touches the GPU.
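As a sanity check on those numbers, the arithmetic can be reproduced by hand. This is a sketch under stated assumptions, not the estimator's actual code: it assumes the two base_clip_seconds values are two settings, that each is cut into segments of the default 5.06 s length (EXTEND_SEG_S, cited later in this document), and that one PC segment costs about 31 minutes (the figure from the time budget).

```python
import math

SEG_S = 5.06        # default segment length quoted for EXTEND_SEG_S
MIN_PER_SEG = 31    # assumption: ~31 PC minutes per segment render


def estimate(base_clip_seconds):
    """Return (segment_count, pc_minutes) for a list of per-setting clip lengths."""
    segments = sum(math.ceil(seconds / SEG_S) for seconds in base_clip_seconds)
    return segments, segments * MIN_PER_SEG


# Two settings at ~18.6 s and ~18.5 s need 4 segments each.
print(estimate([18.6, 18.5]))  # (8, 248)
```

If the dry-run figures drift far from this back-of-envelope result, that is worth flagging, though the real estimator may weight segments differently.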
cd /Users/<user>/dev/ITDT/Videos
python3 Tools/scripts/render_episode.py --episode Episodes/E01/E01_montage_holo.episode.json --workdir Episodes/E01/work_montage_holo
open Episodes/E01/master/E01_montage_holo.episode.mp4
PASS: ~17 s, 1920×1080; the four translucent 3D hologram panels materialize full-frame in sequence under the voiceover (you judge the look — this is style verification).
How a long host shot stays on-model. In production this is automatic: render_episode calls the orchestrator (render_screenplay.render_extended_base_clip) whenever a host shot's VO is longer than one segment. The orchestrator renders short segments on the PC, seeds each from the previous segment's seam frame (used raw at d0.0, or identity-re-anchored when denoise > 0), and concatenates them aligned. To verify the mechanism by hand without a full episode, use the test drivers:
cd /Users/<user>/dev/ITDT/Videos
echo "--- one segment from a seam frame (d0.00 = raw seed = smoothest seam; --seconds 3 = shorter/cheaper) ---"
python3 Episodes/E01/sample15/chain_step.py \
--seg1 Episodes/E01/base_clips/elira_rain_street_832x480_c3_5s.mp4 \
--reference Episodes/E01/start_frames/elira_shot01_rain_street_832x480_c3.png \
--setting "rain-slicked empty street at night, NY-style neon megalopolis towering behind, volumetric haze, magenta and cyan rim light" \
--denoise 0.0 --seconds 3 --out-prefix verify_chain
echo "--- multi-segment chain (5s seed clip + two 3s segments), then watch the join ---"
python3 Episodes/E01/sample15/chain_run.py --init-clip Episodes/E01/base_clips/elira_rain_street_832x480_c3_5s.mp4 \
--reference Episodes/E01/start_frames/elira_shot01_rain_street_832x480_c3.png \
--setting "rain-slicked empty street at night, NY-style neon megalopolis towering behind, volumetric haze, magenta and cyan rim light" \
--steps "3:0.0,3:0.0" --out-prefix verify_chain2
open Episodes/E01/sample15/chain_verify_chain2.mp4
PASS: the PC renders each segment (~16 min for a 3 s / 49-frame clip, ~30 min for 5 s); the concatenated clip plays continuously and the character holds together across the seams (you judge the video). The orchestrator's segment length and seam mode are the constants EXTEND_SEG_S / EXTEND_DENOISE in render_episode.py (default 5.06 s, raw seam).
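The chaining idea described above (each segment seeded from the previous segment's seam frame, then concatenated) can be pictured with a toy model. This is a sketch only: render_chain, toy_segment, and the reanchor callback are hypothetical names, frames are stand-in integers, and dropping the duplicated seam frame is an assumption about what "aligned" means, not the orchestrator's confirmed behavior.

```python
def render_chain(seed, n_segments, render_segment, denoise=0.0, reanchor=None):
    """Chain n_segments renders, seeding each from the previous segment's last frame."""
    clip = []
    for index in range(n_segments):
        # Raw seam at denoise 0.0; otherwise pass the seam frame through re-anchoring.
        if denoise == 0.0 or reanchor is None:
            start = seed
        else:
            start = reanchor(seed, denoise)
        frames = render_segment(start)
        # Assumption: a segment's first frame repeats its seed, so skip it after segment 0.
        clip.extend(frames if index == 0 else frames[1:])
        seed = frames[-1]  # the seam frame for the next segment
    return clip


def toy_segment(start):
    """Stand-in for a PC render: three 'frames' counting up from the seed."""
    return [start, start + 1, start + 2]


print(render_chain(0, 3, toy_segment))  # [0, 1, 2, 3, 4, 5, 6]
```

In the toy output the frame numbers run continuously across both seams, which is the property the by-hand test asks you to judge visually in the real clip.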
prepare_delivery upscale of an episode master (no PC) ⏳ ~3 min
cd /Users/<user>/dev/ITDT/Videos
python3 Tools/scripts/prepare_delivery.py --master Episodes/E01/master/E01_montage_holo.episode.mp4 --platform youtube_landscape
PASS: a …_youtube_landscape_1920x1080.mp4 under Generated/Delivery/, 1920×1080@60, audio intact (Real-ESRGAN upscale is the terminal step — assembly stays native, upscale is dead-last).
The long host chain can be paused at a segment boundary and resumed without losing GPU work (render_control.py; the chain checks the .render_paused sentinel before each segment). This verifies the control + the hold/release logic headlessly — no GPU.
cd /Users/<user>/dev/ITDT/Videos
echo "--- 1) the control CLI round-trips the sentinel ---"
echo "--- status: expect paused:false ---"
python3 Tools/scripts/render_control.py status
echo "--- pause: expect paused:true ---"
python3 Tools/scripts/render_control.py pause
echo "--- status: expect paused:true ---"
python3 Tools/scripts/render_control.py status
echo "--- resume: expect was_paused:true ---"
python3 Tools/scripts/render_control.py resume
ls Tools/scripts/.render_paused 2>/dev/null && echo "STILL THERE (FAIL)" || echo "sentinel gone (ok)"
echo "--- 2) the chain actually HOLDS while paused and continues when cleared (function-level, no GPU) ---"
python3 - <<'PY'
import sys, time, threading
sys.path.insert(0, "Tools/scripts")
import render_screenplay as rs
rs.PAUSE_POLL_S = 1 # snappier poll for the test
assert rs.pause_requested() is False
rs.PAUSE_SENTINEL.write_text("paused\n") # arm a pause
assert rs.pause_requested() is True
released = []
def clear(): time.sleep(2); rs.PAUSE_SENTINEL.unlink(); released.append(True)
threading.Thread(target=clear, daemon=True).start()
t0 = time.time(); rs._wait_if_paused("test boundary"); held = round(time.time() - t0, 1)
assert held >= 1.5 and released, f"did not wait for resume (held {held}s)"
print(f"PASS — held {held}s until resume, then continued")
PY
PASS: the CLI reports paused:false, then paused:true after pause, then was_paused:true on resume, and the sentinel file is gone afterward; the Python block prints PASS — held ~2s until resume, then continued (i.e. _wait_if_paused blocked while the sentinel was present and returned once it was removed).
Segment-resume (no separate test): re-run E5's chain, or re-run render_episode after a pause-then-kill, and the chain reuses any seg_NN.mp4 already on disk (the log shows "already rendered — reusing") instead of re-rendering finished segments. Elira drives the same controls via pause_render / resume_render / render_pause_status (Elira-chat doc Test E5).
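For intuition, the two behaviors verified here (holding at a segment boundary while a sentinel file exists, and skipping segments already on disk) can be sketched in a few lines. This is an illustrative model, not the code in render_screenplay: the function names, the two-digit zero-based seg_NN numbering, and the non-empty-file check are all assumptions.

```python
import time
from pathlib import Path


def wait_if_paused(sentinel: Path, poll_s: float = 1.0) -> float:
    """Block while the sentinel file exists; return how many seconds were spent waiting."""
    started = time.monotonic()
    while sentinel.exists():
        time.sleep(poll_s)  # check again at the next poll tick
    return time.monotonic() - started


def segments_to_render(workdir: Path, n_segments: int) -> list:
    """Return the indices whose seg_NN.mp4 is missing or empty, i.e. still needs a render."""
    todo = []
    for index in range(n_segments):
        path = workdir / f"seg_{index:02d}.mp4"
        # Assumption: an empty file is a killed render and must be redone.
        if not (path.exists() and path.stat().st_size > 0):
            todo.append(index)
    return todo
```

A chain loop built on these would call wait_if_paused before each index returned by segments_to_render, so a pause never interrupts a segment mid-render and a restart only pays for the segments that are missing.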
rm -rf /Users/<user>/dev/ITDT/Videos/verification/temp/sc_verify
rm -rf /Users/<user>/dev/ITDT/Videos/verification/temp/ep
The promoted base clip stays cached in character_library.json + the run workdir under Generated/Screenplay/. To force a clean re-render, clear that setting's entry from base_clips in the library.
| Test | Build step | Proves |
|---|---|---|
| 1 / 1b / 1c / 1d | 1 | PuLID start frame at 480×832; candidate gate; fail-loud; --character lib resolution + negative |
| 2 | 2 | schema + validator + library; work plan; gating |
| 3 | 3 | per-character voice; pad-to invariant; fail-loud |
| 4 | 4 | timeline math + scene WAV + ping-pong + concat + mix (no PC) |
| 5 | 5 | rounded-corner + alpha-fade PIP |
| 6 | 6 | orchestrator dry-run plan + cost; gating |
| 7 | 2/6 | library lora + reference correct |
| 8 / 8b | 6 | Elira tool wiring; model sees the tools |
| 9 | — | PC pre-flight + unattended Relay mount |
| 10 | all | the real cross-machine end-to-end + resume/cache + steady eyes |
| 11 | delivery | Real-ESRGAN upscale + platform formats (YouTube/X/App Store) + cache |