transcribir

Transcribe one or more audio or video files to a structured Markdown document with timestamps and speaker diarization, using Google Gemini 3.5 Flash. Use this skill whenever the user provides audio/video files and wants them transcribed, or when the user says "transcribe esto", "transcribe estos audios", "pasame esto a texto", "que dice este audio", "transcripción", "voice note", "audio nota", "WhatsApp PTT", or similar. Supports .ogg, .mp3, .m4a, .wav, .opus, .flac, .aac, .webm, .mp4, .mov, .mpeg, .mpga. Auto-detects the language (Spanish, English, etc.), identifies multiple speakers with timestamps, and writes a clean Markdown file next to each source file. Processes multiple files in parallel. Defaults to Gemini 3.5 Flash for the best speed/quality/cost balance (handles Spanish, Latin American accents, regionalisms, and multi-speaker diarization). Use the --pro flag to upgrade to gemini-pro-latest for maximum quality on critical audio, or --model X to specify any other model.
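The description above implies some orchestration around the model call: filtering inputs by extension, resolving the API key, writing one Markdown file per source, and running several files in parallel. What follows is a minimal Python sketch of that orchestration under stated assumptions, not the skill's actual `transcribe.py`. The extension list, the `GEMINI_API_KEY`/`GOOGLE_API_KEY` fallback and the `<audio.ext>.transcripcion.md` naming come from the skill's own documentation. The function names (`transcribe_all`, `transcribe_one`, `output_path`, `resolve_api_key`), the `max_workers=4` default and the decision to skip unsupported files silently are hypothetical. The Gemini request itself is left as an injected callable rather than guessing the script's prompt or model parameters.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable, Dict, Iterable, Mapping, Optional

# Extensions listed in the skill description (compared case-insensitively).
SUPPORTED_EXTENSIONS = {
    ".ogg", ".mp3", ".m4a", ".wav", ".opus", ".flac", ".aac",
    ".webm", ".mp4", ".mov", ".mpeg", ".mpga",
}


def resolve_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return GEMINI_API_KEY, falling back to GOOGLE_API_KEY."""
    env = os.environ if env is None else env
    key = env.get("GEMINI_API_KEY") or env.get("GOOGLE_API_KEY")
    if not key:
        raise RuntimeError("Set GEMINI_API_KEY (or GOOGLE_API_KEY) in the environment")
    return key


def output_path(source: Path) -> Path:
    """Map <audio.ext> to <audio.ext>.transcripcion.md in the same directory."""
    return source.with_name(source.name + ".transcripcion.md")


def is_supported(source: Path) -> bool:
    return source.suffix.lower() in SUPPORTED_EXTENSIONS


def transcribe_all(
    paths: Iterable[str],
    transcribe_one: Callable[[Path], str],
    max_workers: int = 4,
) -> Dict[Path, Path]:
    """Transcribe supported files in parallel; return {source: written .md path}.

    In this sketch, files with unsupported extensions are skipped silently.
    """
    sources = [Path(p) for p in paths if is_supported(Path(p))]

    def work(source: Path) -> Path:
        # Hypothetical hook: a real transcribe_one would send the file to Gemini
        # (using the key from resolve_api_key) and return the Markdown transcript.
        markdown = transcribe_one(source)
        target = output_path(source)
        target.write_text(markdown, encoding="utf-8")
        return target

    # Threads suit this workload: each job mostly waits on network I/O.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(zip(sources, pool.map(work, sources)))
```

Passing the transcription function in as a parameter keeps the file handling testable without network access; swapping in a real Gemini client only changes the `transcribe_one` argument.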
josuebustosn/gemini-transcribe · ★ 0 · Data & Documents · score 70

Install: claude install-skill josuebustosn/gemini-transcribe

# Skill: transcribir

Transcribes audio or video files to structured Markdown with timestamps and speaker identification, using Google Gemini 3.5 Flash (default) or any Gemini model selected via a flag.

## When to invoke this skill

Activate when the user:

- Provides one or more paths to audio/video files and wants them transcribed.
- Says any of: "transcribe esto", "transcribe estos audios", "pásame esto a texto", "qué dice este audio", "transcribir", "voice note", "audio nota", "PTT", "WhatsApp PTT".
- Types `/transcribir`.
- Mentions ElevenLabs Scribe, Whisper, or another transcription tool as a reference.

## How to invoke

The skill ships `transcribe.py` alongside this `SKILL.md`. Run it directly with Python.

### Basic syntax

```bash
python "<SKILL_DIR>/transcribe.py" "<path-audio-1>" ["<path-audio-2>" ...]
```

Here `<SKILL_DIR>` is the directory where the skill is installed (typically `~/.claude/skills/transcribir/` on Linux/macOS or `%USERPROFILE%\.claude\skills\transcribir\` on Windows).

The script reads `GEMINI_API_KEY` (or `GOOGLE_API_KEY` as a fallback) from the environment. Set it through your OS's environment-variable manager (see the project README).

Output: for each `<audio.ext>`, the script writes `<audio.ext>.transcripcion.md` next to the original file.

### Flags

- `--pro`: use `gemini-pro-latest` (an alias for Google's latest stable Pro model) instead of the default `gemini-3.5-flash`. More expensive and slower; better for critical audio with much