whisper

Solid

OpenAI's general-purpose speech recognition model. Supports 99 languages, transcription, translation to English, and language identification. Six model sizes from tiny (39M params) to large (1550M params). Use for speech-to-text, podcast transcription, or multilingual audio processing. Best for robust, multilingual ASR.

AI & Automation · 191,515 stars · 33,299 forks · Updated today · MIT

Quality Score: 93/100

Stars 20%: 100
Recency 20%: 100
Frontmatter 20%
Documentation 15%: 100
Issue Health 10%
License 10%: 100
Description 5%: 100

Skill Content

# Whisper - Robust Speech Recognition

OpenAI's multilingual speech recognition model.

## When to use Whisper

**Use when:**
- Speech-to-text transcription (99 languages)
- Podcast/video transcription
- Meeting notes automation
- Translation to English
- Noisy audio transcription
- Multilingual audio processing

**Metrics**:
- **72,900+ GitHub stars**
- 99 languages supported
- Trained on 680,000 hours of audio
- MIT License

**Use alternatives instead**:
- **AssemblyAI**: Managed API, speaker diarization
- **Deepgram**: Real-time streaming ASR
- **Google Speech-to-Text**: Cloud-based

## Quick start

### Installation

```bash
# Requires Python 3.8-3.11
pip install -U openai-whisper

# Requires ffmpeg
# macOS: brew install ffmpeg
# Ubuntu: sudo apt install ffmpeg
# Windows: choco install ffmpeg
```

### Basic transcription

```python
import whisper

# Load model
model = whisper.load_model("base")

# Transcribe
result = model.transcribe("audio.mp3")

# Print text
print(result["text"])

# Access segments
for segment in result["segments"]:
    print(f"[{segment['start']:.2f}s - {segment['end']:.2f}s] {segment['text']}")
```

## Model sizes

```python
# Available models
models = ["tiny", "base", "small", "medium", "large", "turbo"]

# Load specific model
model = whisper.load_model("turbo")  # Fastest, good quality
```

| Model | Parameters | English-only | Multilingual | Speed | VRAM |
|-------|------------|--------------|--------------|-------|------|
| tiny | 39M | ✓ | ✓ | ~32...
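
For the podcast and video use cases listed above, the segment dictionaries from the basic transcription example (`start`, `end`, `text`) can be turned into subtitles. The following is a minimal sketch, not part of the skill itself: `format_timestamp` and `segments_to_srt` are hypothetical helper names, and the sample segments are made-up stand-ins for `result["segments"]`.

```python
def format_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments) -> str:
    """Build SRT text from dicts that carry 'start', 'end', and 'text' keys."""
    blocks = []
    for index, segment in enumerate(segments, start=1):
        start = format_timestamp(segment["start"])
        end = format_timestamp(segment["end"])
        text = segment["text"].strip()
        blocks.append(f"{index}\n{start} --> {end}\n{text}\n")
    # Joining with a newline leaves one blank line between cues.
    return "\n".join(blocks)


if __name__ == "__main__":
    # Illustrative data only; a real run would pass result["segments"].
    sample = [
        {"start": 0.0, "end": 2.5, "text": " Hello and welcome."},
        {"start": 2.5, "end": 5.25, "text": " Thanks for listening."},
    ]
    print(segments_to_srt(sample))
```

With a real transcription, `segments_to_srt(result["segments"])` would replace the sample list, and the returned string could be written to a `.srt` file.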

Details

Author: NousResearch
Repository: NousResearch/hermes-agent
Created: 10 months ago
Last Updated: today
Language: Python
License: MIT

Integrates with

OpenAI · AI
Anthropic · AI

Similar Skills

Semantically similar based on skill content — not just same category

AI & Automation Solid

whisper

9,609 Updated 1 month ago

Orchestra-Research

AI & Automation Featured

whisper

27,984 Updated today

davila7

AI & Automation Listed

openai-whisper

Speech-to-text transcription via OpenAI Whisper. Supports two modes: Local CLI (no API key, runs on-device) and Cloud API (fast, scalable, requires OPENAI_API_KEY). Use when the user needs to transcribe audio files, translate speech, or convert audio to text.

3 Updated today

rkz91