hyperframes-media

Asset preprocessing for HyperFrames compositions: text-to-speech narration (Kokoro), audio/video transcription (Whisper), and background removal for transparent overlays (u2net). Use it when generating voiceover from text, transcribing speech for captions, removing the background from a video or image to use as a transparent overlay, choosing a TTS voice or Whisper model, or chaining these steps (TTS → transcribe → captions). Each command downloads its own model on first run.

Install: claude install-skill tranbathanhtung/dddx

# HyperFrames Media Preprocessing

Three CLI commands that produce assets for compositions: `tts` (speech), `transcribe` (timestamps), and `remove-background` (transparent video). Each downloads a model on first run and caches it under `~/.cache/hyperframes/`. Drop the output into the project, then reference it from the composition HTML; see the `hyperframes` skill for the audio/video element conventions.

## Text-to-Speech (`tts`)

Generate speech audio locally with Kokoro-82M. No API key.

```bash
npx hyperframes tts "Text here" --voice af_nova --output narration.wav
npx hyperframes tts script.txt --voice bf_emma --output narration.wav
npx hyperframes tts --list  # all 54 voices
```

### Voice Selection

Match voice to content. Default is `af_heart`.

| Content type      | Voice                 | Why                           |
| ----------------- | --------------------- | ----------------------------- |
| Product demo      | `af_heart`/`af_nova`  | Warm, professional            |
| Tutorial / how-to | `am_adam`/`bf_emma`   | Neutral, easy to follow       |
| Marketing / promo | `af_sky`/`am_michael` | Energetic or authoritative    |
| Documentation     | `bf_emma`/`bm_george` | Clear British English, formal |
| Casual / social   | `af_heart`/`af_sky`   | Approachable, natural         |

### Multilingual

Voice IDs encode language in the first letter: `a`=American English, `b`=British English, `e`=Spanish, `f`=French, `h`=Hindi, `i`=Italian, `j`=Japane