vo-sync
Install: `claude install-skill JagZ999/explainer-video`
# VO Sync
Make the visuals follow the voice: each cue (a card raising, a caption, a sound) lands **on the word that names it**, and each scene lasts about as long as its narration — so the piece is **tight** with no silent stretches.
## How it works
1. **Write VO that names things in appear-order**, then generate it **with word timestamps**:
`python3 scripts/elevenlabs.py tts script.json vo --voice <id>` → clips + `vo/durations.json` + `vo/alignments.json` (per-character start times). (`script.json` = `[{"id":"scene1","text":"…"}, …]`.)
2. **Compute timing:** edit the `CONFIG` (scene order + cue→keyword) in `scripts/compute_timing.js` and run it → `timing.json` = `{ total, head, scenes:[{id,start,dur,cues:[{key,at}]}] }`. `dur` follows the VO length; `at` is `head + word time`.
3. **Feed your engine** `timing.json`: load it at boot, either through a `timing.js` that sets `window.__TIMING=…` or with `fetch`, then use `dur` for each scene's length and `at` for the moment each cue fires.
4. **Build audio to the same timeline** so SFX hit on the words (see the `audio-stems` skill: place each event at `scene.start + at`).
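
The timing step above can be sketched roughly as follows. This is not the contents of `scripts/compute_timing.js`; it is a minimal illustration under assumptions. The alignment shape (`chars`, `starts`), the scene shape (`voLen`, `alignment`, `cues`), the `hold` option, and the helper names `wordTime`, `buildTiming`, and `absoluteCues` are all hypothetical. Only the output shape `{ total, head, scenes:[{id,start,dur,cues:[{key,at}]}] }` and the rules `at = head + word time` and `scene.start + at` come from the steps above.

```javascript
// Hypothetical alignment shape: { chars: string[], starts: number[] },
// where starts[i] is the start time (seconds into the clip) of chars[i].

// Time at which `keyword` is first spoken in the clip, or null if it never is.
// Assumes lowercasing does not change the text length (true for plain ASCII).
function wordTime(alignment, keyword) {
  const text = alignment.chars.join("").toLowerCase();
  const idx = text.indexOf(keyword.toLowerCase());
  return idx === -1 ? null : alignment.starts[idx];
}

// Build the timing object from an ordered scene list.
// Each scene (hypothetical shape): { id, voLen, alignment, cues: { cueKey: "keyword" } }.
function buildTiming(scenes, { head = 0.5, tail = 0.5, hold = 0.5 } = {}) {
  let start = 0;
  const out = [];
  for (const scene of scenes) {
    const cues = [];
    for (const [key, keyword] of Object.entries(scene.cues)) {
      const w = wordTime(scene.alignment, keyword);
      if (w !== null) cues.push({ key, at: head + w }); // at = head + word time
    }
    cues.sort((a, b) => a.at - b.at);
    const lastAt = cues.length ? cues[cues.length - 1].at : 0;
    // VO-driven duration, never shorter than the last cue plus a hold.
    const dur = Math.max(head + scene.voLen + tail, lastAt + hold);
    out.push({ id: scene.id, start, dur, cues });
    start += dur;
  }
  return { total: start, head, scenes: out };
}

// Flatten cues onto the global timeline: each event sits at scene.start + at.
function absoluteCues(timing) {
  return timing.scenes.flatMap((s) =>
    s.cues.map((c) => ({ scene: s.id, key: c.key, t: s.start + c.at }))
  );
}

// Example with a made-up alignment: one character every 0.25 s.
const alignment = {
  chars: "Open the card".split(""),
  starts: Array.from({ length: 13 }, (_, i) => i * 0.25),
};
const timing = buildTiming([
  { id: "scene1", voLen: 3.25, alignment, cues: { card: "card" } },
]);
console.log(JSON.stringify(timing));
console.log(absoluteCues(timing));
```

Cues whose keyword is never spoken are skipped here rather than guessed, so they can be placed by hand or interpolated. The same `absoluteCues` list could feed both the visual engine and the audio build, which keeps SFX and visuals on one timeline.
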
## Key moves
- **VO-driven durations** (`dur = head + voLen + tail`, with a floor so the scene never ends before its last cue/animation has finished) → continuous narration, no dead air.
- **Anchor on words**: a word spoken `w` seconds into a clip → cue `at = head + w`.
- **Spread** when you can't name every item: anchor named ones, interpolate the rest between anchors.
- **Snap-to-instant**: to make a whole list turn green the moment the V