The acclaimed translation platform DeepL is moving beyond text-based services with the introduction of a real-time audio translation feature known as DeepL Voice. The service lets users listen to someone speaking in one language and see the speech translated immediately into another, aiming to meet growing demand for real-time, cross-language communication. It builds on DeepL’s reputation for nuanced and accurate translations, which has helped the company secure a $2 billion valuation and over 100,000 paying customers.
DeepL Voice Supports Real-Time Translations in Multiple Languages
DeepL Voice currently “hears” and transcribes speech in 13 languages, including English, German, Japanese, and Russian, then translates it into text in any of the 33 languages DeepL supports. This suits live interactions such as meetings and video calls. Rather than producing audio or video files, the service presents translations as text, suitable for subtitles in video conferencing apps or for mirrored text on smartphones during face-to-face conversations.
Meeting Industry Demand and Keeping Data Private
DeepL Voice is the company’s first voice-related product, but CEO Jarek Kutylowski hints it may not be the last. According to Kutylowski, DeepL Voice responds to what has been the company’s most frequent customer request since 2017. As AI startups rush to build similar capabilities, DeepL’s service stands out for its rapid, real-time translation, a critical feature that has been challenging for many AI translation models. DeepL’s approach is deliberate: instead of relying on existing language models, it developed its own LLM, which it claims surpasses competing offerings such as OpenAI’s GPT-4 and Google’s.
DeepL Voice also addresses user privacy. Kutylowski emphasized that while audio is processed on DeepL’s servers, no data is retained or used to train its LLMs. This privacy-focused model aims to comply with regulations like GDPR, addressing potential concerns among customers and end users.
Limited Integrations for Now, With More Expected
Currently, DeepL Voice integrates with Microsoft Teams, which covers most of the company’s B2B clientele. Integrations with Zoom and Google Meet have not yet been confirmed, but growing demand for real-time translation across industries could prompt further expansion. Beyond business meetings, DeepL sees potential in the service sector, envisioning frontline workers using the technology to communicate with customers from diverse linguistic backgrounds.
DeepL’s foray into audio translation places it in direct competition with tech giants such as Google, which recently introduced real-time translated captions in Google Meet, as well as with startups such as ElevenLabs and Panjaya, which specialize in AI-driven voice dubbing and deepfake translations. With its emphasis on speed and real-time accuracy, DeepL aims to capture a distinct market niche: users who need immediate, accurate translations in live scenarios.
For more details, visit TechCrunch.