Whisper

audio · open-weight

OpenAI Whisper (2022) is an open-weight encoder-decoder Transformer ASR model trained on 680,000 hours of multilingual audio.
Version: 1.0
Released: September 21, 2022

Architecture

  • parameters: 1.55B (Whisper Large)
  • context_length: 30-second audio segments
  • training_data: 680,000 hours of multilingual audio-transcript pairs
  • architecture: encoder-decoder Transformer
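The fixed 30-second context above means longer recordings are processed window by window. A minimal pure-Python sketch of that segmentation, using the 16 kHz sample rate Whisper resamples input to; the helper name and constants layout are illustrative, not the library's internals:

```python
SAMPLE_RATE = 16_000           # Whisper resamples all input to 16 kHz
CHUNK_SECONDS = 30             # fixed encoder context length
CHUNK_SAMPLES = SAMPLE_RATE * CHUNK_SECONDS

def chunk_bounds(n_samples: int) -> list[tuple[int, int]]:
    """Return (start, end) sample indices for each 30 s window."""
    return [(start, min(start + CHUNK_SAMPLES, n_samples))
            for start in range(0, n_samples, CHUNK_SAMPLES)]

# A 70-second clip yields three windows: 30 s + 30 s + 10 s.
bounds = chunk_bounds(70 * SAMPLE_RATE)
```

In the real pipeline each window is converted to a log-Mel spectrogram before it reaches the encoder; the sample arithmetic here only illustrates the windowing.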

Capabilities

  • Automatic speech recognition (ASR) in ~100 languages
  • Speech translation from any supported language into English
  • Robust to accents, background noise, and technical jargon
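The transcription and translation capabilities are both exposed through the open-source `openai-whisper` package via a `task` option. A hedged usage sketch, assuming the package is installed and the audio path exists; `task_for` and `transcribe` are illustrative helpers, not library functions:

```python
def task_for(to_english: bool) -> str:
    # Whisper's decoder is prompted with one of two tasks:
    # "transcribe" (same-language text) or "translate" (English text).
    return "translate" if to_english else "transcribe"

def transcribe(path: str, to_english: bool = False) -> str:
    """Transcribe an audio file, optionally translating it to English."""
    import whisper  # pip install openai-whisper
    model = whisper.load_model("large")  # the 1.55B-parameter checkpoint
    result = model.transcribe(path, task=task_for(to_english))
    return result["text"]
```

For example, `transcribe("interview.fr.mp3", to_english=True)` would return English text for French speech (the filename is a placeholder).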

Benchmarks

  • Word error rate: roughly 50% fewer errors than prior models when evaluated zero-shot on diverse speech recognition datasets
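The benchmark above is stated in terms of word error rate (WER): word-level edit distance between hypothesis and reference, divided by the reference length. A minimal sketch (`wer` is an illustrative helper, not taken from Whisper's evaluation code):

```python
def wer(ref: str, hyp: str) -> float:
    """Word error rate: word-level edit distance / reference word count."""
    r, h = ref.split(), hyp.split()
    # prev[j] holds the edit distance between r[:i-1] and h[:j].
    prev = list(range(len(h) + 1))
    for i, rw in enumerate(r, 1):
        cur = [i] + [0] * len(h)
        for j, hw in enumerate(h, 1):
            cur[j] = min(prev[j] + 1,                 # deletion
                         cur[j - 1] + 1,              # insertion
                         prev[j - 1] + (rw != hw))    # substitution
        prev = cur
    return prev[-1] / len(r)

score = wer("the cat sat", "the hat sat")  # one substitution in three words
```

Halving the error rate means moving, say, from a WER of 0.20 to 0.10 on the same reference transcripts.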

Safety

  • No built-in content filtering; transcribes all audible speech
  • Accuracy varies with the language and accent coverage of the training data, so some bias is possible
  • License: MIT

Deployment

  • regions: private
  • hosting: Runs locally on GPUs or via API
  • integrations: widely embedded in third-party speech-to-text applications
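For the local-GPU hosting option above, a common pattern is to prefer a CUDA device and fall back to CPU, then pass the choice to `whisper.load_model`. A sketch assuming PyTorch is available; `pick_device` is an illustrative helper:

```python
def pick_device() -> str:
    """Prefer a CUDA GPU for the large model; fall back to CPU."""
    try:
        import torch
        return "cuda" if torch.cuda.is_available() else "cpu"
    except ImportError:
        # No PyTorch installed; CPU inference is slow but works.
        return "cpu"

device = pick_device()
# Usage (assumes openai-whisper is installed):
#   model = whisper.load_model("large", device=device)
```

The large checkpoint needs roughly 10 GB of VRAM; the smaller checkpoints (`base`, `small`, `medium`) are the usual CPU fallback in practice.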

Tags

speech recognition · ASR · multilingual · translation · open-source
