oma-voice

Solid

Local-first text-to-speech and speech-to-text via the Voicebox MCP server. Generates speech from cloned or preset voice profiles for agent notifications, content voiceovers, and audio asset creation, and transcribes audio files for meeting notes or memos. Runs entirely on-device with no cloud, no API keys, no per-call cost. Use for voice generation, TTS, STT, transcription, voiceover, narration, dictation, audio asset work.

AI & Automation 1,042 stars 119 forks Updated today MIT

Quality Score: 93/100

Stars (20%): 100
Recency (20%): 100
Frontmatter (20%): 70
Documentation (15%): 100
Issue Health (10%): 50
License (10%): 100
Description (5%): 100

Skill Content

# Voice Skill - Local TTS and STT via Voicebox

## Scheduling

### Goal

Drive the Voicebox local app through its MCP server so any MCP-aware agent can speak (TTS) or listen (STT) without invoking cloud vendors. The skill standardizes intent routing, voice profile resolution, output layout, and guardrails while voicebox itself owns the engines, voice cloning UI, captures archive, and stories editor.

### Intent signature

- User asks to generate speech, narrate text, produce a voiceover, create an mp3 or wav from text.
- User wants an audio file transcribed into text, meeting notes, or a transcript.
- User asks for a voice notification when a long task completes or a workflow step is blocked.
- Another skill needs local audio generation infrastructure.

### When to use

- Generating short notification audio for agent task completion or blockers.
- Producing voiceover, narration, or audio assets (mp3 or wav) for apps and content.
- Transcribing local audio files (mp3, wav, m4a, webm, flac) to Markdown.
- Comparing voice profiles by re-running the same text against different profile ids.

### When NOT to use

- Cloud TTS or high-fidelity multilingual cloud voices -> out of scope; future multi-vendor extension.
- Real-time microphone dictation loop in the terminal -> use Voicebox app's built-in hotkey dictation.
- Voice cloning sample upload and profile creation -> done in the Voicebox desktop app UI.
- Video synthesis, music, sound design -> out of scope.
- Stories Editor multi-voi...
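
The routing described under "Intent signature" and the file formats listed under "When to use" could be pre-checked before any Voicebox call is made. The following is a minimal sketch under stated assumptions, not the skill's actual implementation: `isTranscribable`, `routeIntent`, and the `VoiceRequest` shape are hypothetical names introduced for illustration, and only the extension list (mp3, wav, m4a, webm, flac) comes from the text above.

```typescript
// Hypothetical helpers for illustration; not part of Voicebox or the skill.

// Audio formats the skill text lists as transcribable.
const TRANSCRIBABLE_EXTENSIONS: readonly string[] = ["mp3", "wav", "m4a", "webm", "flac"];

type Intent = "tts" | "stt" | "out-of-scope";

// Assumed request shape: either text to speak or a local audio path to transcribe.
interface VoiceRequest {
  text?: string;
  audioPath?: string;
}

// Returns true when the path ends in one of the listed extensions (case-insensitive).
function isTranscribable(path: string): boolean {
  const dot = path.lastIndexOf(".");
  if (dot < 0 || dot === path.length - 1) return false;
  const ext = path.slice(dot + 1).toLowerCase();
  return TRANSCRIBABLE_EXTENSIONS.indexOf(ext) !== -1;
}

// Routes a request to TTS or STT; anything else is treated as out of scope.
function routeIntent(request: VoiceRequest): Intent {
  if (request.audioPath !== undefined) {
    return isTranscribable(request.audioPath) ? "stt" : "out-of-scope";
  }
  if (request.text !== undefined && request.text.trim().length > 0) {
    return "tts";
  }
  return "out-of-scope";
}
```

Checking the extension first means an unsupported file is rejected locally, before the agent spends a call on the MCP server.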

Details

Author: first-fluke
Repository: first-fluke/oh-my-agent
Created: 4 months ago
Last Updated: today
Language: TypeScript
License: MIT

Similar Skills

Semantically similar based on skill content — not just same category

AI & Automation Listed

audiomind

Tired of juggling multiple audio APIs? This skill gives you one-command access to TTS, music generation, sound effects, and voice cloning. Use when you want to generate any audio without managing multiple API keys.

25 Updated 2 months ago
wells1137
AI & Automation Solid

speech

Use when the user asks for text-to-speech narration or voiceover, accessibility reads, audio prompts, or batch speech generation via the OpenAI Audio API; run the bundled CLI (`scripts/text_to_speech.py`) with built-in voices and require `OPENAI_API_KEY` for live calls. Custom voice creation is out of scope.

27,705 Updated today
davila7
AI & Automation Solid

speech

Use when the user asks for text-to-speech narration or voiceover, accessibility reads, audio prompts, or batch speech generation via the OpenAI Audio API; run the bundled CLI (`scripts/text_to_speech.py`) with built-in voices and require `OPENAI_API_KEY` for live calls. Custom voice creation is out of scope.

2,210 Updated 1 week ago
foryourhealth111-pixel
AI & Automation Listed

speech

Use when the user asks for text-to-speech narration or voiceover, accessibility reads, audio prompts, or batch speech generation via the OpenAI Audio API; run the bundled CLI (`scripts/text_to_speech.py`) with built-in voices and require `OPENAI_API_KEY` for live calls. Custom voice creation is out of scope.

1 Updated today
HGGodhand33
AI & Automation Listed

voice

Voice — text-to-speech and transcription. Triggers on /agent:voice, /agent:voice status, /agent:voice setup, /agent:voice test, "configurar voz", "prueba voz", "voice setup", "speak this", "read this aloud", "transcribe audio".

56 Updated today
crisandrews