video-to-text

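Video/audio to transcript. Runs speech recognition on a video or audio file with SenseVoiceSmall, restores punctuation with ct-punc, and outputs a timestamped transcript.txt plus a structured transcript.json (segments + word_segments). Suitable for transcribing livestream replays, meeting recordings, and podcasts, or any other video/audio-to-text scenario.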
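As an illustration of how the two outputs relate, the following minimal sketch renders timestamped text lines from a structured transcript. The only schema detail taken from the description is the top-level `segments` key. The per-segment fields `start`, `end` (seconds) and `text`, the `HH:MM:SS.mmm` line format, and the helper names `format_timestamp` and `render_transcript` are all assumptions made for this example, and the skill's real output may differ.

```python
import json


def format_timestamp(seconds: float) -> str:
    """Format a non-negative offset in seconds as HH:MM:SS.mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def render_transcript(doc: dict) -> str:
    """Render one text line per segment.

    Assumes (hypothetically) that each segment is a dict with
    "start" and "end" in seconds and a "text" string.
    """
    lines = []
    for seg in doc.get("segments", []):
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        lines.append(f"[{start} --> {end}] {seg['text'].strip()}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hand-written sample data, not real output from the skill.
    sample = json.loads(
        '{"segments": [{"start": 0.0, "end": 2.5, "text": "Hello."}]}'
    )
    print(render_transcript(sample))
```

Working in integer milliseconds before splitting into hours, minutes, and seconds avoids floating-point drift in the displayed timestamps.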
jiabai/awesome-skills · ★ 1 · AI & Automation · score 62

Install: claude install-skill jiabai/awesome-skills

# Video to Text: Video/Audio Transcription

## Dependencies

- **ffmpeg / ffprobe**: video decoding, duration probing, audio activity detection

  | System | Install method |
  |------|---------|
  | **macOS** | `brew install ffmpeg` |
  | **Ubuntu/Debian** | `sudo apt install ffmpeg` |
  | **Fedora/CentOS** | `sudo dnf install ffmpeg` |
  | **Arch Linux** | `sudo pacman -S ffmpeg` |
  | **Windows (winget)** | `winget install ffmpeg` |
  | **Windows (manual)** | Download the Windows builds from [ffmpeg.org](https://ffmpeg.org/download.html), unzip them, and add the `bin\` directory to the system `PATH` environment variable |

  Verify after installing: `ffmpeg -version`

- **funasr** (includes SenseVoiceSmall + ct-punc): speech recognition + punctuation restoration

  ```bash
  pip install modelscope funasr
  ```

  - `modelscope`: provides the SenseVoiceSmall ASR pipeline (`iic/SenseVoiceSmall`, ~400MB on first download)
  - `funasr`: provides the ct-punc punctuation restoration model (~50MB on first download)
  - SenseVoice has a built-in ffmpeg backend, so there is no need to extract audio from video files manually

## Quick Start

For simple transcription tasks, run the script directly:

**Linux / macOS (background execution with nohup):**

```bash
nohup python3 {skillDir}/scripts/transcribe.py /path/to/video.mp4 \
  --output-dir /path/to/output \
  --output-name transcript \
  > /tmp/transcribe.log 2>&1 &
```

**Windows (cmd / PowerShell background execution):**

```batch
REM cmd.exe: use start /B to run in the background
start /B python {skillDir}\scripts\transcribe.py D:\path\to\video.mp4 ^
  --output-dir D:\path\to\output ^
  --output-name transcript ^
  > transcript.log 2>&1
```

```powershell
# PowerShell 7+: use the & background operator
python {skillDir}\scripts\transcribe.py D:\path\to\video.mp4 `
  --output-dir D:\path\to\output `
  --output-name transcript `
  *> transcript.log &
```

**Must run in the background**: transcribing long audio (>30 min) can take anywhere from tens of minutes to several hours, exec s