transcribe-with-disfluencies

Transcribe a user-recorded video via AssemblyAI with disfluency tagging (um, uh, like, repeats, long pauses). Use as the first step of rough-cut when the user films themselves.
huodebing-alt/Agentic-Headshot-Video-Studio · ★ 0 · AI & Automation · score 71

Install: claude install-skill huodebing-alt/Agentic-Headshot-Video-Studio

# Transcribe With Disfluencies

## When to use

Use this skill on footage the user shot themselves when filler words need to be removed. AssemblyAI returns word-level timestamps and disfluency flags.

## Inputs

- `inputs/recording.mp4` or `.wav`.
- `ASSEMBLYAI_API_KEY`.
- `--language` (default: auto).

## Output

- `workspace/rough/transcript.json`: words, start_ms, end_ms, confidence, is_filler.
- `workspace/rough/transcript.srt` (raw).

## Steps

1. Extract the audio if the input is a video.
2. Submit to AssemblyAI with `disfluencies=true`, `auto_chapters=false`.
3. Poll until the job is done, then download the JSON.
4. Emit the normalized transcript JSON and SRT.

## Examples

```
claude run transcribe-with-disfluencies --input inputs/recording.mp4
```

## Failure recovery

- If AssemblyAI is unavailable, fall back to local whisper.cpp without disfluency tags and warn the user.
- If the audio is shorter than 5 s, abort.
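
The normalization in step 4 could look roughly like the sketch below. It is not the skill's actual implementation. It assumes each AssemblyAI word arrives as a dict with `text`, `start`, `end` (milliseconds), and `confidence`, and that `is_filler` is derived locally from a small filler list plus an immediate-repeat check. The `FILLERS` set, the function names, the `text` key in the output, and the eight-word cue size are all hypothetical choices made for illustration.

```python
import re

# Assumed filler vocabulary; the skill's actual list is not specified.
FILLERS = {"um", "uh", "like"}


def normalize_words(raw_words):
    """Convert AssemblyAI-style word dicts into transcript.json entries."""
    words = []
    prev = None
    for w in raw_words:
        # Compare on a lowercase token stripped of punctuation.
        token = re.sub(r"[^\w']", "", w["text"]).lower()
        # A word identical to the one before it counts as a repeat.
        repeated = bool(token) and token == prev
        words.append({
            "text": w["text"],
            "start_ms": w["start"],
            "end_ms": w["end"],
            "confidence": w["confidence"],
            "is_filler": token in FILLERS or repeated,
        })
        prev = token
    return {"words": words}


def ms_to_srt(ms):
    """Format milliseconds as an SRT timestamp (HH:MM:SS,mmm)."""
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def to_srt(transcript, max_words=8):
    """Group words into fixed-size cues and render raw SRT text."""
    words = transcript["words"]
    cues = []
    for i in range(0, len(words), max_words):
        chunk = words[i:i + max_words]
        text = " ".join(w["text"] for w in chunk)
        span = f"{ms_to_srt(chunk[0]['start_ms'])} --> {ms_to_srt(chunk[-1]['end_ms'])}"
        cues.append(f"{len(cues) + 1}\n{span}\n{text}\n")
    return "\n".join(cues)
```

Matching on a fixed word list is deliberately crude: it also flags legitimate uses of "like" and intentional repetitions, so a later rough-cut step would probably want to review the flags rather than cut on them blindly. Long-pause detection, mentioned in the skill description, is left out here; it could be added by comparing each word's `start_ms` with the previous word's `end_ms`.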