Corti's new Symphony for Speech-to-Text model beats OpenAI at medical terminology accuracy, highlighting the value of specialized AI

 

Today, Copenhagen-based healthcare AI company Corti is launching Symphony for Speech-to-Text, a new generation of clinical-grade speech recognition models engineered specifically for real-time dictation, conversational transcription, and batch audio processing, and their accuracy is the highest yet recorded for this specific use case.

"We are focused on ensuring our AI scribes can be trusted by physicians, medical practitioners and patients...the entire healthcare system," said Andreas Cleve, co-founder and CEO of Corti, in an exclusive video call interview with VentureBeat.

The performance data the company is bringing to the table paints a stark picture of the current state of enterprise AI: when it comes to highly regulated, specialized industries, domain-specific models can beat those from the foundation model providers.

In a newly published research paper, Corti revealed that its new clinical-grade speech models reduced word error rates (WER) by up to 93% when compared against leading generalist speech models and APIs on medical terminology. On English medical terminology, Symphony for Speech-to-Text achieved a remarkably low 1.4% WER. By comparison, OpenAI’s speech model registered a 17.7% WER, ElevenLabs hit 18.1%, Whisper recorded 17.4%, and Parakeet scored 18.9%.

Corti’s announcement marks a critical inflection point for healthcare builders. While general-purpose APIs like OpenAI’s Whisper are sufficient for broad-domain transcription, they frequently stumble over medical acronyms, complex medication dosages, shorthand, and noisy emergency room environments. Symphony for Speech-to-Text aims to solve this by providing developers with a highly specialized, production-grade API designed from the ground up for clinical workflows.

The agentic era demands flawless data inputs

The launch of Symphony for Speech-to-Text highlights a fundamental shift in how healthcare uses voice technology.
For decades, medical speech recognition was primarily about generating a static text document for human doctors to review: a digital replacement for a notepad. But as the healthcare industry hurtles into what technologists call the "agentic era," where autonomous AI agents actively assist in clinical decision-making, EHR navigation, and real-time support, the transcript is no longer the final product. It is the foundational data layer.

“Speech has always been one of healthcare’s most important inputs,” Cleve said in a statement provided to VentureBeat. “What is changing is what happens after the words are captured. In the agentic era, speech recognition requires more than simply producing a transcript - we need to give AI systems accurate clinical facts to reason from. If a model mishears a medication, dosage, or symptom, every downstream step becomes less reliable. Symphony for Speech-to-Text gives healthcare builders a speech layer accurate enough to thrive in clinical reality.”

This is where the compounding danger of high word error rates comes into play. If a general-purpose AI model hallucinates a transcription, turning "hyperthyroidism" into "hypothyroidism," or misinterpreting a critical medication dosage, every subsequent AI agent relying on that transcript will operate on corrupted data. Corti’s architecture mitigates this risk by producing structured, clinically usable output directly from the API, helping downstream AI applications reason over clean facts rather than messy, unformatted text.

Nowhere is this more evident than in Corti’s entity recall benchmarks. Symphony for Speech-to-Text reached an astonishing 98.3% recall rate on formatted clinical entities, such as dosages, measurements, and dates. In contrast, Corti reported that the strongest general-purpose baseline model maxed out at just 44.3% recall for the same entities.
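
To make the recall figures concrete, here is a minimal sketch of how recall on formatted entities could be computed, and how far apart the two reported numbers are. The function name `entity_recall` and the sample entities are hypothetical, and exact-string matching is an assumption; Corti's paper may match and normalize entities differently.

```python
from collections import Counter


def entity_recall(reference, hypothesis):
    """Share of reference entities that also appear in the hypothesis.

    Assumes exact string matching and counts duplicates (multiset match).
    """
    if not reference:
        raise ValueError("reference must contain at least one entity")
    ref_counts = Counter(reference)
    hyp_counts = Counter(hypothesis)
    matched = sum(min(n, hyp_counts[entity]) for entity, n in ref_counts.items())
    return matched / len(reference)


# Hypothetical entities, invented for illustration only.
reference = ["5 mg", "120/80 mmHg", "2024-03-01", "10 mL"]
hypothesis = ["5 mg", "120 over 80", "2024-03-01", "10 mL"]
sample_recall = entity_recall(reference, hypothesis)  # 3 of 4 matched -> 0.75

# Figures reported in the article (percent recall on formatted entities).
corti_recall = 98.3
baseline_recall = 44.3

gap_points = corti_recall - baseline_recall        # about 54 percentage points
relative_gain = gap_points / baseline_recall       # about 1.22, i.e. roughly 122% higher

print(f"sample recall: {sample_recall:.2f}")
print(f"gap: {gap_points:.1f} percentage points")
print(f"relative gain over baseline: {relative_gain:.0%}")
```

The distinction matters when reading the numbers: the difference between 98.3% and 44.3% is 54 percentage points, while in relative terms the higher figure is more than double the lower one.
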
For developers building ambient AI documentation tools, that 54-percentage-point gap is the difference between a tool that saves a physician time and a tool that constitutes a medical liability.

Dethroning the industry leaders

While Corti’s benchmarks against modern AI model builders like OpenAI and ElevenLabs are striking, the company is also taking aim at legacy medical transcription giants. For years, the gold standard for dedicated clinician dictation has been Dragon Medical One. However, these legacy systems were historically optimized strictly for intentional clinician dictation, not as underlying infrastructure for ambient AI, complex multi-party conversations, or real-time clinical support tools.

In evaluations of real-world English medical dictation, Corti achieved a 4.6% WER, outperforming Dragon’s 5.7% (a 19% relative improvement). Furthermore, Corti demonstrated a higher medical term recall than Dragon (93.5% versus 92.9%). By providing this level of accuracy via an API endpoint, Corti is enabling third-party developers, EHR vendors, and virtual care platforms to build their own custom dictation and ambient listening tools that outperform the industry's legacy incumbent.

"We want people to build apps atop our models," Cleve said. "The goal is to diffuse the technology as widely as it is needed so it can be as helpful as possible to patients and their doctors and professionals."

For Cleve and his co-founders, the mission is a personal one: Cleve's own mother, a healthcare professional, was attacked by a patient and spent years struggling to recover. He sought to improve healthcare processes as a way of honoring her sacrifice.
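
For readers who want to check the arithmetic behind the WER figures cited in this article, the sketch below uses the standard definition of word error rate: substitutions, deletions, and insertions divided by the number of words in the reference. The function names and the sample sentences are hypothetical, and splitting on whitespace with no text normalization is an assumption; Corti's evaluation may normalize text differently.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline: float, improved: float) -> float:
    """Fraction by which `improved` is lower than `baseline`."""
    return (baseline - improved) / baseline


# Hypothetical sentences: one substituted word out of eight -> 0.125.
sample_wer = word_error_rate(
    "patient has hyperthyroidism and takes 5 mg daily",
    "patient has hypothyroidism and takes 5 mg daily",
)

# WER figures reported in the article, in percent.
vs_dragon = relative_reduction(5.7, 4.6)     # about 19%
vs_parakeet = relative_reduction(18.9, 1.4)  # about 93%

print(f"sample WER: {sample_wer:.3f}")
print(f"relative reduction vs Dragon: {vs_dragon:.0%}")
print(f"relative reduction vs Parakeet: {vs_parakeet:.0%}")
```

The sample also shows why a low WER can still hide a serious error: a single wrong word in eight is a 12.5% WER, yet it flips the diagnosis.
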
