audiomind

Tired of juggling multiple audio APIs? This skill gives you one-command access to TTS, music generation, sound effects, and voice cloning. Use when you want to generate any audio without managing multiple API keys.
wells1137/media-skills · ★ 25 · AI & Automation · score 72

Install: claude install-skill wells1137/media-skills

# 🎙️ AudioMind

**Use when:** The user asks to generate speech, narrate text, create a voice-over, compose music, or produce a sound effect.

AudioMind is a smart audio dispatcher. It analyzes your request, routes it to the best available model (ElevenLabs for speech and music, fal.ai for fast SFX), and returns a ready-to-use audio URL.

---

## Quick Reference

| Request Type | Best Model | Latency |
|---|---|---|
| Narrate text / Voice-over | `elevenlabs-tts-v3` | ~3s |
| Low-latency TTS (real-time) | `elevenlabs-tts-turbo` | <1s |
| Background music | `cassetteai-music` | ~15s |
| Sound effect | `elevenlabs-sfx` | ~5s |
| Clone a voice from audio | `elevenlabs-voice-clone` | ~10s |

---

## How to Use

### 1. Start the AudioMind server (once per session)

```bash
bash {baseDir}/tools/start_server.sh
```

This starts the ElevenLabs MCP server on port 8124. The skill uses it for all audio generation.

### 2. Route the request

Analyze the user's request and call the appropriate tool via the MCP server:

**Text-to-Speech (TTS)**

When the user asks to "narrate", "read aloud", "say", or "create a voice-over":

```
Use MCP tool: text_to_speech
text: "<the text to narrate>"
voice_id: "JBFqnCBsd6RMkjVDRZzb"    # Default: "George" (professional, neutral)
model_id: "eleven_multilingual_v2"  # Use "eleven_turbo_v2_5" for low latency
```

**Music Generation**

When the user asks to "compose", "create background music", or "make a soundtrack":

```
Use MCP tool: text_to_sound_effects