oma-voice

Featured

Local-first text-to-speech and speech-to-text via the Voicebox MCP server. Generates speech from cloned or preset voice profiles for agent notifications, content voiceovers, and audio asset creation, and transcribes audio files for meeting notes or memos. Runs entirely on-device with no cloud, no API keys, no per-call cost. Use for voice generation, TTS, STT, transcription, voiceover, narration, dictation, audio asset work.

AI & Automation · 1,195 stars · 136 forks · Updated today · MIT

Quality Score: 92/100

Stars (20%): 100
Recency (20%): 100
Frontmatter (20%)
Documentation (15%): 100
Issue Health (10%)
License (10%): 100
Description (5%): 100

Skill Content

# Voice Skill - Local TTS and STT via Voicebox

## Scheduling

### Goal

Drive the Voicebox local app through its MCP server so any MCP-aware agent can speak (TTS) or listen (STT) without invoking cloud vendors. The skill standardizes intent routing, voice profile resolution, output layout, and guardrails while voicebox itself owns the engines, voice cloning UI, captures archive, and stories editor.

### Intent signature

- User asks to generate speech, narrate text, produce a voiceover, create an mp3 or wav from text.
- User wants an audio file transcribed into text, meeting notes, or a transcript.
- User asks for a voice notification when a long task completes or a workflow step is blocked.
- Another skill needs local audio generation infrastructure.

### When to use

- Generating short notification audio for agent task completion or blockers.
- Producing voiceover, narration, or audio assets (mp3 or wav) for apps and content.
- Transcribing local audio files (mp3, wav, m4a, webm, flac) to Markdown.
- Comparing voice profiles by re-running the same text against different profile ids.

### When NOT to use

- Cloud TTS or high-fidelity multilingual cloud voices -> out of scope; future multi-vendor extension.
- Real-time microphone dictation loop in the terminal -> use Voicebox app's built-in hotkey dictation.
- Voice cloning sample upload and profile creation -> done in the Voicebox desktop app UI.
- Video synthesis, music, sound design -> out of scope.
- Stories Editor multi-voi...
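The format lists under "When to use" can be turned into a small pre-flight check before a request goes to Voicebox. The sketch below is illustrative only: it does not call Voicebox or its MCP server, and the function names (`can_transcribe`, `can_generate`, `transcript_path`) and the "transcript next to the audio file" layout are assumptions, not part of the skill.

```python
from pathlib import PurePosixPath

# Formats taken from the "When to use" list; grouping them into sets is an assumption.
TRANSCRIBE_INPUTS = {".mp3", ".wav", ".m4a", ".webm", ".flac"}
SPEECH_OUTPUTS = {".mp3", ".wav"}


def can_transcribe(path: str) -> bool:
    """Return True if the extension is one of the listed STT input formats."""
    return PurePosixPath(path).suffix.lower() in TRANSCRIBE_INPUTS


def can_generate(path: str) -> bool:
    """Return True if the extension is one of the listed TTS output formats."""
    return PurePosixPath(path).suffix.lower() in SPEECH_OUTPUTS


def transcript_path(audio_path: str) -> str:
    """Hypothetical layout: write the Markdown transcript next to the audio file."""
    return str(PurePosixPath(audio_path).with_suffix(".md"))
```

A caller might reject `clip.mp4` up front with `can_transcribe("clip.mp4")` instead of handing an unsupported file to the transcription step.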

Details

Author: first-fluke
Repository: first-fluke/oh-my-agent
Created: 5 months ago
Last Updated: today
Language: TypeScript
License: MIT

Integrates with

Model Context Protocol · AI

Bundled in these plugins

oma

Similar Skills

Semantically similar based on skill content — not just same category

AI & Automation Solid

voice-master

Foundational voice authority and AI humanizer: writes content in the user's authentic voice with built-in AI detection, and supports stealth / anti-attribution writing (forum personas, anonymous posts, "write as not-me"). Use when asked to write/draft/generate content, invoke /voice, /write-as-me, or /humanize, run voice calibration, check "does this sound like me?", "make this sound human" / "de-AI this", "write a forum post as [persona]", or run AI detection ("does this sound like AI?", "check for AI patterns", "anti-slop check"). Do NOT use this skill for code, technical docs, or any output the user has not asked to be written in their voice; code styling defers to the separate code-voice skill.

89 Updated today

WingedGuardian

AI & Automation Listed

ai-voiceover

The AI narration / voiceover mini-skill (ElevenLabs-led). Use when someone wants an "AI voiceover," "narration," "text-to-speech for a video," "voice for my Reel/Short/explainer," "clone my voice," or to "dub a video into other languages." Picks the voice and model, writes for the ear, and directs the delivery; ElevenLabs generates the audio, the human mixes/reviews, WoopSocial schedules/publishes. Sits below the ai-video router, sibling to veo-3 and heygen. Consented voices only; disclose AI voice in ads/political.

5 Updated 3 days ago

social-media-skills

AI & Automation Solid

voice

Voice — text-to-speech and transcription. Triggers on /agent:voice, /agent:voice status, /agent:voice setup, /agent:voice test, "configurar voz", "prueba voz", "voice setup", "speak this", "read this aloud", "transcribe audio".

61 Updated today

crisandrews