Mistral Launches Ultra-Fast Open Source Translation Models That Run on Your Phone
February 6, 2026
French AI startup Mistral has released Voxtral Transcribe 2, a family of speech-to-text models that can transcribe and translate across 13 languages with sub-200 millisecond latency. At just four billion parameters, the models are small enough to run on phones and laptops, posing a significant challenge to larger competitors like Google and OpenAI.
Mistral Takes on AI Giants with Lightning-Fast Translation Models
French AI startup Mistral has released Voxtral Transcribe 2, a new family of speech-to-text models that the company claims will pave the way for seamless real-time conversation between people speaking different languages.

The release includes two models: Voxtral Mini Transcribe V2 for batch processing and Voxtral Realtime for live applications. Both support translation across thirteen languages: English, Chinese, Hindi, Spanish, Arabic, French, Portuguese, Russian, German, Japanese, Korean, Italian, and Dutch.
Small Models, Big Performance
At just four billion parameters, the models are compact enough to run locally on phones and laptops, a claimed first in the speech-to-text field. Voxtral Realtime offers configurable latency down to sub-200 milliseconds, making it roughly ten times faster than Google's latest translation model, which operates at a two-second delay.

On the FLEURS multilingual speech benchmark, the models achieve approximately four percent word error rate, competitive with or superior to alternatives from OpenAI and Google. Voxtral Mini Transcribe V2 processes audio roughly three times faster than ElevenLabs' Scribe v2 at one-fifth the cost.
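For context on the benchmark figure: word error rate is the edit distance between a reference transcript and the model's output, divided by the reference length. A minimal sketch (the example strings are illustrative, not from FLEURS):

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count,
    computed via Levenshtein distance over words."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # delete all remaining reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # insert all remaining hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,         # deletion
                dp[i][j - 1] + 1,         # insertion
                dp[i - 1][j - 1] + cost,  # substitution or match
            )
    return dp[len(ref)][len(hyp)] / len(ref)

# One wrong word in a 25-word reference gives 0.04, i.e. the ~4% WER
# reported on FLEURS corresponds to roughly one error per 25 words.
```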
The Efficiency Philosophy
Pierre Stock, Mistral's Vice President of Science Operations, told WIRED that the models are laying the groundwork for fully seamless real-time speech-to-speech translation, predicting the problem would be solved in 2026. Stock also offered a pointed critique of the big-spending approach favoured by American AI labs: "Frankly, too many GPUs makes you lazy. You just blindly test a lot of things, but you don't think what's the shortest path to success."

Open Source and Privacy First
Voxtral Realtime is released under the Apache 2.0 open-weights licence, allowing developers to deploy the model freely. Because the models run on-device, private conversations stay local rather than being uploaded to cloud servers. The models support GDPR and HIPAA-compliant deployments, positioning Mistral as a European alternative to proprietary American AI systems.

Pricing starts at just 0.3 cents per minute for batch transcription and 0.6 cents per minute for real-time processing.
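To make those rates concrete, here is a quick cost sketch using only the per-minute prices quoted above (0.3 and 0.6 cents per minute; any volume discounts or rounding in Mistral's actual billing are not reflected):

```python
# Rates quoted in the article, expressed in US dollars per minute of audio.
BATCH_RATE_USD_PER_MIN = 0.003     # 0.3 cents/min, batch transcription
REALTIME_RATE_USD_PER_MIN = 0.006  # 0.6 cents/min, real-time processing

def transcription_cost(minutes: float, realtime: bool = False) -> float:
    """Estimated cost in USD for a recording of the given length."""
    rate = REALTIME_RATE_USD_PER_MIN if realtime else BATCH_RATE_USD_PER_MIN
    return minutes * rate

# A one-hour recording: $0.18 in batch mode, $0.36 in real time.
```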
Published February 6, 2026 at 1:52pm