Microsoft launched three new foundation models, MAI-Transcribe-1, MAI-Voice-1, and MAI-Image-2, the most concrete evidence yet that the $3 trillion software giant intends to compete directly with OpenAI, Google, and other frontier labs on model development, not just distribution.
The Three New Models
The trio covers speech-to-text, text-to-speech, and image generation, three of the most commercially valuable modalities in enterprise AI:
MAI-Transcribe-1 is a state-of-the-art speech-to-text model that posts the lowest average word error rate (WER) on the FLEURS benchmark across the top 25 languages by Microsoft product usage, at 3.8%. It beats OpenAI’s Whisper-large-v3 on all 25 languages and Google’s Gemini 3.1 Flash on 22 of 25. The model pairs a bidirectional audio encoder with a transformer-based text decoder, accepts MP3, WAV, and FLAC files up to 200 MB, and delivers batch transcription 2.5 times faster than existing Azure Fast offerings.
MAI-Voice-1 is Microsoft’s text-to-speech engine, capable of generating 60 seconds of natural-sounding audio in a single second. The model preserves speaker identity across long-form content and supports custom voice creation from just a few seconds of audio through Microsoft Foundry. It is priced at $22 per 1 million characters.
MAI-Image-2 debuted as a top-three model family on the Arena.ai leaderboard and generates images at least twice as fast on Foundry and Copilot as its predecessor. Microsoft is rolling it out across Bing and PowerPoint, pricing it at $5 per 1 million tokens for text input and $33 per 1 million tokens for image output. WPP is already building with MAI-Image-2 at scale.
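For readers who want to sanity-check the numbers above, two of them reduce to simple arithmetic: word error rate, and cost at the quoted per-million prices. The Python sketch below is illustrative only. The function names are hypothetical, the WER routine is the textbook word-level edit distance rather than the exact FLEURS scoring procedure (which may apply text normalization not shown here), and the token and character counts passed in are whatever the caller supplies, not figures from Microsoft.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Textbook WER: (substitutions + deletions + insertions) / reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # Word-level Levenshtein distance, keeping only the previous row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Prices as quoted in the article, in US dollars per 1 million units.
VOICE_USD_PER_M_CHARS = 22.0
IMAGE_TEXT_IN_USD_PER_M_TOKENS = 5.0
IMAGE_OUT_USD_PER_M_TOKENS = 33.0


def voice_cost_usd(characters: int) -> float:
    """Estimated MAI-Voice-1 cost for a given number of input characters."""
    return characters / 1_000_000 * VOICE_USD_PER_M_CHARS


def image_cost_usd(text_input_tokens: int, image_output_tokens: int) -> float:
    """Estimated MAI-Image-2 cost; token counts are supplied by the caller."""
    text_cost = text_input_tokens / 1_000_000 * IMAGE_TEXT_IN_USD_PER_M_TOKENS
    image_cost = image_output_tokens / 1_000_000 * IMAGE_OUT_USD_PER_M_TOKENS
    return text_cost + image_cost
```

As a usage example, word_error_rate("the cat sat on the mat", "the cat sat on a mat") counts one substitution against six reference words, so a 3.8% average WER corresponds to roughly one error in every 26 words. At the quoted price, voice_cost_usd(50_000) prices a 50,000-character script at about $1.10.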
The Contract Renegotiation That Made This Possible
Until October 2025, Microsoft was contractually prohibited from independently pursuing artificial general intelligence. The original 2019 deal with OpenAI gave Microsoft a license to OpenAI’s models in exchange for building cloud infrastructure — but when OpenAI sought to expand its compute footprint beyond Microsoft, the two companies renegotiated.
As Microsoft AI CEO Mustafa Suleyman explained in a December 2025 interview with Bloomberg, “up until a few weeks ago, Microsoft was not allowed — by contract — to pursue artificial general intelligence or superintelligence independently.” The revised terms freed Microsoft to build its own frontier models while retaining license rights to everything OpenAI builds through 2032.
“Back in September of last year, we renegotiated the contract with OpenAI, and that enabled us to independently pursue our own superintelligence,” Suleyman told VentureBeat. “Since then, we’ve been convening the compute and the team and buying up the data that we need.”
Efficiency as a Competitive Moat
Suleyman was quick to emphasize that the OpenAI partnership remains intact. But the message is clear: Microsoft no longer needs OpenAI to build world-class AI models. With MAI-Transcribe-1 specifically, Suleyman said the company is “able to deliver the model with half the GPUs of the state-of-the-art competition” — a significant efficiency claim that, if validated, could reshape the economics of the speech-to-text market.
A Precarious Moment for Microsoft
The announcement lands at a challenging time. Microsoft’s stock just closed its worst quarter since the 2008 financial crisis, as investors demand proof that hundreds of billions in AI infrastructure spending will translate into revenue. These models — priced aggressively and positioned to reduce Microsoft’s own cost of goods sold — are Suleyman’s first concrete answer to that pressure.