
gemini-live-api

Expert developer skill for implementing real-time voice and video interactions using the Google Gemini Live API. This skill should be used when implementing bidirectional audio streaming, voice conversations with interruption handling, real-time transcription, function calling in live sessions, session management, or voice customization.
Hildegaardchiasmal966/claude-skills · ★ 3 · API & Backend · score 71
Install: claude install-skill Hildegaardchiasmal966/claude-skills
# Gemini Live API Developer Skill

## Overview

The Gemini Live API enables low-latency, real-time voice and video interactions with Gemini models. This skill provides comprehensive guidance for implementing natural voice conversations, handling interruptions, integrating tools and function calling, managing sessions, and customizing voice output.

## When to Use This Skill

Use this skill when:

- Implementing real-time two-way voice conversations between AI and users
- Building voice agents that can be interrupted naturally mid-response
- Adding function calling and real-time web search to live voice sessions
- Implementing text-to-speech with natural cadence and tone
- Debugging audio streaming issues or session management problems
- Customizing voice characteristics (cadence, tone, style, language)
- Managing WebSocket connections, session resumption, or context compression
- Implementing ephemeral tokens for client-side authentication

## Core Capabilities Covered

- **Bidirectional Audio Streaming**: 16kHz PCM input, 24kHz output with real-time processing
- **Voice Activity Detection (VAD)**: Automatic or manual speech detection with interruption handling
- **Function Calling**: Tool integration with async execution and scheduling parameters
- **Session Management**: Connection lifecycle, resumption tokens, context compression
- **Voice Customization**: Multiple voices, speech configuration, natural prosody
- **Audio Processing**: PCM encoding/decoding, sample rate conve