Whisper

audio · open-weight

OpenAI Whisper (2022) is an open-weight encoder-decoder Transformer ASR model trained on 680,000 hours of multilingual audio.
Version: 1.0
Released: September 21, 2022

Architecture

  • parameters: 1.55B (Whisper Large)
  • context_length: 30-second audio segments
  • training_data: 680,000 hours of multilingual audio-transcript pairs
  • architecture: encoder-decoder Transformer
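The fixed 30-second context above means longer recordings are processed window by window. A minimal pure-Python sketch of that segmentation, using the 16 kHz sample rate Whisper resamples input to; the helper name and constants layout are illustrative, not the library's internals:

```python
SAMPLE_RATE = 16_000           # Whisper resamples all input to 16 kHz
CHUNK_SECONDS = 30             # fixed encoder context length
CHUNK_SAMPLES = SAMPLE_RATE * CHUNK_SECONDS

def chunk_bounds(n_samples: int) -> list[tuple[int, int]]:
    """Return (start, end) sample indices for each 30 s window."""
    return [(start, min(start + CHUNK_SAMPLES, n_samples))
            for start in range(0, n_samples, CHUNK_SAMPLES)]

# A 70-second clip yields three windows: 30 s + 30 s + 10 s.
bounds = chunk_bounds(70 * SAMPLE_RATE)
```

In the real pipeline each window is converted to a log-Mel spectrogram before it reaches the encoder; the sample arithmetic here only illustrates the windowing.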

Capabilities

  • Automatic speech recognition (ASR) in ~100 languages
  • Speech translation from any supported language into English
  • Robust to accents, background noise, and technical jargon
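The transcription and translation capabilities are both exposed through the open-source `openai-whisper` package via a `task` option. A hedged usage sketch, assuming the package is installed and the audio path exists; `task_for` and `transcribe` are illustrative helpers, not library functions:

```python
def task_for(to_english: bool) -> str:
    # Whisper's decoder is prompted with one of two tasks:
    # "transcribe" (same-language text) or "translate" (English text).
    return "translate" if to_english else "transcribe"

def transcribe(path: str, to_english: bool = False) -> str:
    """Transcribe an audio file, optionally translating it to English."""
    import whisper  # pip install openai-whisper
    model = whisper.load_model("large")  # the 1.55B-parameter checkpoint
    result = model.transcribe(path, task=task_for(to_english))
    return result["text"]
```

For example, `transcribe("interview.fr.mp3", to_english=True)` would return English text for French speech (the filename is a placeholder).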

Benchmarks

  • Word error rate: roughly 50% fewer errors than prior models when evaluated zero-shot on diverse speech recognition datasets
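The benchmark above is stated in terms of word error rate (WER): word-level edit distance between hypothesis and reference, divided by the reference length. A minimal sketch (`wer` is an illustrative helper, not taken from Whisper's evaluation code):

```python
def wer(ref: str, hyp: str) -> float:
    """Word error rate: word-level edit distance / reference word count."""
    r, h = ref.split(), hyp.split()
    # prev[j] holds the edit distance between r[:i-1] and h[:j].
    prev = list(range(len(h) + 1))
    for i, rw in enumerate(r, 1):
        cur = [i] + [0] * len(h)
        for j, hw in enumerate(h, 1):
            cur[j] = min(prev[j] + 1,                 # deletion
                         cur[j - 1] + 1,              # insertion
                         prev[j - 1] + (rw != hw))    # substitution
        prev = cur
    return prev[-1] / len(r)

score = wer("the cat sat", "the hat sat")  # one substitution in three words
```

Halving the error rate means moving, say, from a WER of 0.20 to 0.10 on the same reference transcripts.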

Safety

  • No built-in content filtering; transcribes all audible speech
  • Accuracy varies with the language and accent coverage of the training data, so some bias is possible
  • License: MIT

Deployment

  • regions: private
  • hosting: Runs locally on GPUs or via API
  • integrations: widely embedded in third-party speech-to-text applications
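For the local-GPU hosting option above, a common pattern is to prefer a CUDA device and fall back to CPU, then pass the choice to `whisper.load_model`. A sketch assuming PyTorch is available; `pick_device` is an illustrative helper:

```python
def pick_device() -> str:
    """Prefer a CUDA GPU for the large model; fall back to CPU."""
    try:
        import torch
        return "cuda" if torch.cuda.is_available() else "cpu"
    except ImportError:
        # No PyTorch installed; CPU inference is slow but works.
        return "cpu"

device = pick_device()
# Usage (assumes openai-whisper is installed):
#   model = whisper.load_model("large", device=device)
```

The large checkpoint needs roughly 10 GB of VRAM; the smaller checkpoints (`base`, `small`, `medium`) are the usual CPU fallback in practice.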

Tags

speech recognition · ASR · multilingual · translation · open-source
