Install: `claude install-skill wells1137/media-skills`
# 🎙️ AudioMind
**Use when:** The user asks to generate speech, narrate text, create a voice-over, compose music, or produce a sound effect.
AudioMind is a smart audio dispatcher. It analyzes your request, routes it to the best available model from ElevenLabs or fal.ai (the Quick Reference table below lists the model for each request type), and returns a ready-to-use audio URL.
---
## Quick Reference
| Request Type | Best Model | Latency |
|---|---|---|
| Narrate text / Voice-over | `elevenlabs-tts-v3` | ~3s |
| Low-latency TTS (real-time) | `elevenlabs-tts-turbo` | <1s |
| Background music | `cassetteai-music` | ~15s |
| Sound effect | `elevenlabs-sfx` | ~5s |
| Clone a voice from audio | `elevenlabs-voice-clone` | ~10s |
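
The table above can be read as a simple keyword lookup. The sketch below is one hypothetical way to express it in bash: the function name `route_request` and the keyword patterns are illustrative assumptions, not AudioMind's actual routing logic. Only the model names come from the table.

```bash
#!/usr/bin/env bash
# Hypothetical helper: map a free-text request to a model name from the
# Quick Reference table. The keyword patterns are illustrative assumptions.
route_request() {
  local request
  # Lowercase the request so the pattern matching is case-insensitive.
  request="$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')"
  case "$request" in
    *clone*)                                 echo "elevenlabs-voice-clone" ;;
    *real-time*|*low-latency*)               echo "elevenlabs-tts-turbo" ;;
    *music*|*soundtrack*|*compose*)          echo "cassetteai-music" ;;
    *"sound effect"*|*sfx*)                  echo "elevenlabs-sfx" ;;
    *narrate*|*voice-over*|*"read aloud"*)   echo "elevenlabs-tts-v3" ;;
    *)                                       echo "unknown"; return 1 ;;
  esac
}

route_request "Narrate this paragraph for my video"           # prints elevenlabs-tts-v3
route_request "Compose background music for a podcast intro"  # prints cassetteai-music
```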
---
## How to Use
### 1. Start the AudioMind server (once per session)
```bash
bash {baseDir}/tools/start_server.sh
```
This starts the ElevenLabs MCP server on port 8124. The skill uses it for all audio generation.
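Before routing a request, it can help to confirm that something is actually listening on port 8124. The following is a minimal sketch, assuming the server binds to `127.0.0.1` (the text above gives only the port) and that your bash build supports the `/dev/tcp` pseudo-device. The helper name `port_is_open` is hypothetical.

```bash
#!/usr/bin/env bash
# Assumption: the server listens on localhost; only the port number (8124)
# comes from this document.
port_is_open() {
  local host="$1" port="$2"
  # Bash's /dev/tcp pseudo-device attempts a TCP connection. Running it in a
  # subshell means the descriptor is closed again as soon as the check ends.
  (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null
}

if port_is_open 127.0.0.1 8124; then
  echo "AudioMind server is up on port 8124"
else
  echo "Nothing is listening on port 8124; run start_server.sh first"
fi
```

This only checks that the port accepts connections; it does not verify that the process behind it is the ElevenLabs MCP server.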
### 2. Route the request
Analyze the user's request and call the appropriate tool via the MCP server:
**Text-to-Speech (TTS)**
When the user asks to "narrate", "read aloud", "say", or "create a voice-over":
```
Use MCP tool: text_to_speech
text: "<the text to narrate>"
voice_id: "JBFqnCBsd6RMkjVDRZzb" # Default: "George" (professional, neutral)
model_id: "eleven_multilingual_v2" # Use "eleven_turbo_v2_5" for low latency
```
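The choice between the two `model_id` values can be made mechanically. The sketch below assembles the same parameter block shown above, switching to `eleven_turbo_v2_5` when low latency is requested. The helper name `tts_params` and its `fast` mode flag are assumptions for illustration; the tool name, voice ID, and model IDs are the ones given in this section.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the text_to_speech parameters for a request.
# Usage: tts_params "<text>" [fast]
tts_params() {
  local text="$1" mode="${2:-quality}"
  local voice_id="JBFqnCBsd6RMkjVDRZzb"    # default voice "George"
  local model_id="eleven_multilingual_v2"  # default model from this section
  if [ "$mode" = "fast" ]; then
    model_id="eleven_turbo_v2_5"           # low-latency alternative
  fi
  printf 'text: "%s"\nvoice_id: "%s"\nmodel_id: "%s"\n' \
    "$text" "$voice_id" "$model_id"
}

tts_params "Welcome to the show."
tts_params "Quick status update." fast
```

The helper does not escape double quotes inside the text, so treat it as a starting point for assembling the call rather than a complete implementation.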
**Music Generation**
When the user asks to "compose", "create background music", or "make a soundtrack":
```
Use MCP tool: text_to_sound_effects