Cohere Transcribe: The Open-Weight ASR Model That Outperforms Whisper with 5.42% Word Error Rate

Cohere has launched Transcribe, a new open-weight automatic speech recognition (ASR) model that achieves an average word error rate (WER) of just 5.42% – outperforming OpenAI’s Whisper Large v3, ElevenLabs Scribe v2, and other leading speech recognition systems.

The model, available under the Apache-2.0 license, represents a significant shift in the enterprise transcription landscape, offering production-grade accuracy with the flexibility of self-hosted deployment.

Breaking the Accuracy-Flexibility Tradeoff

Until recently, enterprise transcription forced a trade-off: closed APIs offered accuracy but created data residency risks, while open models offered deployment flexibility but lagged on performance.

Cohere’s Transcribe is built to compete on all four key differentiators: contextual accuracy, latency, control, and cost. Unlike Whisper, which launched as a research model under the MIT license, Transcribe is commercially ready from release and can run on an organization’s own local GPU infrastructure.

The company said it achieved this by extending what it calls the Pareto frontier: delivering state-of-the-art accuracy (low WER) while sustaining best-in-class throughput within the 1B+ parameter model cohort.

Performance Benchmarks

Transcribe currently tops the Hugging Face ASR leaderboard with an average word error rate of 5.42%, outperforming Whisper Large v3 at 7.44% WER, ElevenLabs Scribe v2 at 5.83% WER, and Qwen3-ASR-1.7B at 5.76% WER.

Transcribe also performed well on the individual datasets in the Hugging Face evaluation. On the AMI dataset, which measures meeting understanding and dialogue analysis, Transcribe logged a WER of 8.15%. On the VoxPopuli dataset, which tests understanding of different accents, the model scored 5.87% – beaten only by Zoom Scribe.
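
For readers unfamiliar with the metric behind these figures, word error rate is the number of word-level substitutions, deletions, and insertions needed to turn the model's transcript into the reference transcript, divided by the number of words in the reference. The Python sketch below illustrates that definition only; it is not Cohere's or Hugging Face's evaluation code, the function name `word_error_rate` is a hypothetical one chosen for this example, and it skips the text normalization (casing, punctuation) that benchmark pipelines usually apply before scoring.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference words.

    Illustrative sketch: words are split on whitespace and compared exactly,
    with no lowercasing or punctuation stripping.
    """
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")

    # Word-level Levenshtein distance, keeping only one row at a time.
    # prev[j] is the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr = [i] + [0] * len(hyp_words)
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion (reference word missed)
                curr[j - 1] + 1,                  # insertion (extra hypothesis word)
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr

    return prev[-1] / len(ref_words)


if __name__ == "__main__":
    # One substituted word out of six reference words.
    print(round(word_error_rate("the cat sat on the mat",
                                "the cat sat on a mat"), 4))  # → 0.1667
```

Read this way, a 5.42% WER means roughly five to six word errors for every 100 words of reference transcript, averaged across the benchmark datasets.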

Multi-Language Support

Transcribe is trained on 14 languages: English, French, German, Italian, Spanish, Greek, Dutch, Polish, Portuguese, Chinese, Japanese, Korean, Vietnamese, and Arabic. This broad language support makes it suitable for global enterprise deployments.

Enterprise Implications

Early users have flagged accuracy and local deployment as the standout factors – particularly for teams that have been routing audio data through external APIs and want to bring that workload in-house.

For engineering teams building RAG pipelines or agent workflows with audio inputs, Transcribe offers a path to production-grade transcription without the data residency and latency penalties of closed APIs.

The model has 2 billion parameters and can be accessed via Cohere’s API or in Cohere’s Model Vault as cohere-transcribe-03-2026. Organizations can also deploy it on their own infrastructure, where its size keeps the inference footprint manageable for local GPUs.

With speech-to-text a foundational component of voice-enabled automations, transcription pipelines, and audio search workflows, Cohere’s entry into this space signals intensifying competition in the enterprise speech recognition market.
