Whisper
audio
open-weight
OpenAI Whisper (2022) is an encoder-decoder Transformer ASR model trained on 680K hours of multilingual audio.
Version: 1.0
Released: 09/21/2022
Architecture
- parameters: 1.55B (Whisper Large)
- context_length: 30-second audio segments
- training_data: 680,000+ hours of multilingual audio-transcript pairs
- model_type: Encoder-decoder Transformer
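The fixed 30-second context length means long audio is processed as a sequence of 30 s windows. A minimal sketch of that windowing, assuming Whisper's documented 16 kHz input sample rate (so 480,000 samples per segment); this is an illustration, not the library's internal code:

```python
# Sketch of Whisper-style input windowing. Assumed constants: audio is
# resampled to 16 kHz and the encoder consumes fixed 30 s segments.
SAMPLE_RATE = 16_000                              # Hz
SEGMENT_SECONDS = 30                              # fixed context length
SEGMENT_SAMPLES = SAMPLE_RATE * SEGMENT_SECONDS   # 480,000 samples

def segment(waveform):
    """Split a 1-D list of samples into 30 s chunks, zero-padding the last."""
    chunks = []
    for start in range(0, len(waveform), SEGMENT_SAMPLES):
        chunk = waveform[start:start + SEGMENT_SAMPLES]
        chunk = chunk + [0.0] * (SEGMENT_SAMPLES - len(chunk))  # pad to 30 s
        chunks.append(chunk)
    return chunks
```

A 45-second clip, for example, yields two segments: one full and one half-silence-padded.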
Capabilities
- Automatic speech recognition (ASR) in ~100 languages
- Speech translation (transcribes and translates to English)
- Robust to accents, background noise, and technical jargon
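Transcription and English translation are selected by special tokens in the decoder prompt, as described in the Whisper paper. A hedged sketch of that prompt layout (token strings follow the paper's naming; actual tokenizer ids are not shown):

```python
# Sketch of Whisper's decoder prompt: <|startoftranscript|>, a language
# token, a task token, and optionally <|notimestamps|>. Illustrative only;
# real decoding maps these strings to tokenizer ids.
def decoder_prompt(language="en", task="transcribe", timestamps=False):
    if task not in ("transcribe", "translate"):
        raise ValueError("task must be 'transcribe' or 'translate'")
    tokens = ["<|startoftranscript|>", f"<|{language}|>", f"<|{task}|>"]
    if not timestamps:
        tokens.append("<|notimestamps|>")
    return tokens
```

Swapping the task token from `<|transcribe|>` to `<|translate|>` is all that redirects the model from same-language transcription to English translation.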
Benchmarks
- Error rate: ~50% fewer errors than prior models on diverse zero-shot speech recognition benchmarks
Safety
- No built-in content filtering (transcribes all audible speech)
- Performance depends on training-data distribution, so some bias is possible
- License: MIT
Deployment
- regions: private / self-hosted
- hosting: runs locally on GPUs or via hosted APIs
- integrations: widely embedded in third-party speech-to-text applications
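For local hosting, a hedged sketch of inference with the open-source `whisper` Python package (`load_model` and `transcribe` are its documented entry points; assumes `pip install openai-whisper` plus ffmpeg, and the model/file names here are placeholders):

```python
# Sketch of local Whisper inference; requires `pip install openai-whisper`
# and ffmpeg on PATH. Model name and audio path are illustrative.
def transcribe_locally(audio_path, model_name="base", translate=False):
    import whisper  # imported lazily so this sketch loads without the package
    model = whisper.load_model(model_name)   # downloads weights on first use
    task = "translate" if translate else "transcribe"
    result = model.transcribe(audio_path, task=task)
    return result["text"]
```

Usage would look like `transcribe_locally("meeting.mp3", model_name="small")`; passing `translate=True` returns English text for non-English audio.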
Tags
speech recognition, ASR, multilingual, translation, open-source