wjs-transcribing-audio

Solid

Use when the user has audio or video and wants a timestamped transcript (SRT) in the source language. Routes by source language — Chinese defaults to Volcano (豆包) ASR; other languages (Spanish, English, Portuguese, French, Italian, Japanese, Korean, etc.) use OpenAI Whisper API with word-level timestamps and self-assembled cues. Outputs SRT with punctuation-bounded cues capped for on-screen reading. Triggers — "转写", "转成字幕", "做 SRT", "transcribe", "make subtitles", "speech to text", "出字幕".

AI & Automation · 108 stars · 15 forks · Updated 4 days ago · MIT


Quality Score: 86/100

Stars (20%)
Recency (20%): 100
Frontmatter (20%)
Documentation (15%): 100
Issue Health (10%)
License (10%): 100
Description (5%): 100

Skill Content

# wjs-transcribing-audio

Spoken audio in → timestamped SRT in the same language out.

**This skill stops at the source-language SRT.** Translation to another language is the next skill (`/wjs-translating-subtitles`).

## When to use

- User provides a video or audio file and wants a transcript / SRT in the source language.
- User already has a translated SRT and the source SRT is missing.
- User asks "做 SRT" / "make subtitles" / "出逐字稿" with no translation step requested yet.

## When NOT to use

- Source-language SRT already exists → skip straight to `/wjs-translating-subtitles`.
- User wants the transcript in a different language than spoken → run this skill first, then `/wjs-translating-subtitles`.
- User wants only the dub or burn-in → if SRT exists, skip; otherwise run this first.

## Routing: which engine

| Source language | Default engine | Why |
|---|---|---|
| Chinese (zh-CN, zh-HK, zh-TW) | **Volcano (豆包) ASR** | Materially better accuracy than Whisper for Chinese; this is the user's standing preference |
| Any other (es, en, pt, fr, it, ja, ko, …) | **OpenAI Whisper API** with word-level granularity | Whisper's multilingual support is strong; word timestamps let us assemble cues ourselves |
| Offline / no API access | Local `openai-whisper` (medium) | Quality floor; same loop/blob failure modes apply |

For Chinese, do **not** default to Whisper unless the user explicitly asks for it or Volcano is unavailable. This is a deliberate routing decision; see user's memory on Chinese ASR p...
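The routing table and the "assemble cues ourselves" step can be illustrated with a minimal Python sketch. This is not the skill's actual code: the engine labels, `pick_engine`, `Word`, `assemble_srt`, and the 42-character cap are all hypothetical names and values chosen for illustration. It also assumes each word already carries its trailing punctuation, which word-level ASR output may not provide without re-aligning against the full transcript text.

```python
from dataclasses import dataclass

# Hypothetical engine labels; the skill's real identifiers are not shown in the excerpt.
VOLCANO, WHISPER_API, WHISPER_LOCAL = "volcano", "whisper-api", "whisper-local"


def pick_engine(lang: str, online: bool = True,
                volcano_available: bool = True,
                force_whisper: bool = False) -> str:
    """Mirror the routing table: Chinese -> Volcano, others -> Whisper API, offline -> local."""
    if not online:
        return WHISPER_LOCAL
    is_chinese = lang.lower().split("-")[0] == "zh"
    if is_chinese and volcano_available and not force_whisper:
        return VOLCANO
    return WHISPER_API


@dataclass
class Word:
    text: str     # assumed to include trailing punctuation, e.g. "mundo."
    start: float  # seconds
    end: float    # seconds


SENTENCE_END = (".", "?", "!", "。", "？", "！", "…")
MAX_CHARS = 42  # assumed cap; the source only says cues are "capped for on-screen reading"


def fmt_ts(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def assemble_srt(words: list[Word], max_chars: int = MAX_CHARS) -> str:
    """Group words into cues that end at sentence punctuation or at the length cap."""
    cues: list[list[Word]] = []
    current: list[Word] = []
    for w in words:
        candidate = " ".join(x.text for x in current + [w])
        if current and len(candidate) > max_chars:
            cues.append(current)  # adding this word would exceed the cap
            current = []
        current.append(w)
        if w.text.endswith(SENTENCE_END):
            cues.append(current)  # sentence boundary closes the cue
            current = []
    if current:
        cues.append(current)
    blocks = [
        f"{i}\n{fmt_ts(c[0].start)} --> {fmt_ts(c[-1].end)}\n"
        + " ".join(x.text for x in c)
        for i, c in enumerate(cues, 1)
    ]
    return "\n\n".join(blocks) + "\n"
```

Under these assumptions, `pick_engine("zh-CN")` selects the Volcano label and `pick_engine("es")` selects the Whisper API label. Joining words with spaces suits space-delimited languages; Japanese or Chinese text would need a different join and width measure.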

Details

Author: jianshuo
Repository: jianshuo/claude-skills
Created: 2 months ago
Last Updated: 4 days ago
Language: Python
License: MIT

Integrates with

OpenAI · AI

Bundled in these plugins

claude-skills

Similar Skills

Semantically similar based on skill content — not just same category

AI & Automation Solid

wjs-translating-subtitles

Use when the user has an SRT (or transcript text) in one language and wants it translated to another, with punctuation-bounded re-segmentation so cues end at real sentence breaks. Simplified Chinese (zh-CN) and English (en) are first-class targets; other targets follow the same rules. Outputs a target-language SRT or bilingual SRT — no audio, no burn-in. Triggers — "翻译字幕", "翻成中文", "translate this SRT", "中英双语字幕", "把这个 SRT 翻译成 X", "bilingual subtitles".

108 stars · Updated 4 days ago

jianshuo

Code & Development Solid

wjs-dubbing-video

Use when the user has a video + a target-language SRT and wants the video to actually speak that language — generates a time-aligned TTS voice dub. Routes by voice ID — Volcano (豆包) TTS for Chinese, edge-tts neural for any language. Defaults to one voice (single-speaker); opt-in multi-speaker via visual diarization. Outputs `*_<lang>_dub.mp4` with the dub audio in place of the original. Final mixing (audio bed + burn-in) is handed off to `/wjs-burning-subtitles`. Triggers — "配音", "中文配音", "Chinese dub", "voice over this", "dub the video", "TTS this SRT", "different voice for each speaker".

108 stars · Updated 4 days ago

jianshuo

AI & Automation Solid

wjs-localizing-video

Thin orchestrator for the end-to-end video localization pipeline. Routes to the four focused sub-skills: /wjs-transcribing-audio, /wjs-translating-subtitles, /wjs-dubbing-video, /wjs-burning-subtitles. Use when the user asks for full localization in one go ("帮我把这个西班牙语视频做成中文字幕+配音", "translate and dub this video", "做完整的本地化"). For any individual step (just transcribe, just translate, just dub, just burn), invoke the sub-skill directly; it's faster and the boundary is cleaner.

108 stars · Updated 4 days ago

jianshuo