
Breaking Language Barriers: Real-Time Speech Translation for Critical Environments

Updated: Sep 24

I worked on a project at Mabel AI developing real-time speech-to-speech translation software designed for critical environments such as hospitals, disaster zones, and conflict areas. In these settings, miscommunication can be deadly. For people in crisis, especially refugees and victims of conflict, being unable to communicate with the people who can help them can be devastating. When time is short and the stakes are high, doctors should not have to guess. They need accurate, real-time communication that works even without internet access.



Building Translation for Critical Environments

The software integrated speech recognition, machine translation, and speech synthesis into two distinct pipelines: one fully offline and one cloud-based. My role centered on the automatic speech recognition (ASR) component, where I was responsible for improving transcription accuracy in both pipelines.


When I joined the project, we were focused on the Ukrainian model. It was based on a self-supervised, transformer-style encoder architecture and achieved a Word Error Rate (WER) of 34%, roughly one error for every three spoken words. While promising, that accuracy was not yet reliable enough for high-stakes environments like hospitals, where transcription errors can have serious consequences. The fully offline version also posed technical challenges: it was unclear whether large machine learning models could run locally on a phone at all, let alone fast enough to provide real-time translation. To our knowledge, it had never been done before.
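WER is the standard ASR metric: the number of word substitutions, deletions, and insertions needed to turn the recognized transcript into the reference, divided by the number of words in the reference. The minimal Python sketch below computes it as a word-level edit distance. The function name is our own, and it assumes plain whitespace tokenization with no text normalization; real evaluations usually normalize case and punctuation first.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count,
    computed as a word-level Levenshtein distance."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j] (rolling row).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of an extra word
                prev[j - 1] + cost,  # substitution (or match when cost is 0)
            )
        prev = curr
    return prev[-1] / len(ref)


# One substitution ("has" -> "had") and one insertion ("today") over 5 reference words.
example = word_error_rate("the patient has chest pain",
                          "the patient had chest pain today")
```

Here `example` is 2 / 5 = 0.4. Because insertions count as errors, WER can exceed 100% on very poor transcripts.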


I re-engineered the ASR system around a fully convolutional encoder-decoder architecture with subword tokenization, trained with supervised rather than self-supervised learning. The new system was more efficient for real-time processing and significantly more amenable to quantization and latency optimization, making it better suited to low-resource devices running offline.
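To make "amenable to quantization" concrete: quantization stores weights as low-precision integers, and int8 weights take a quarter of the memory of float32 weights, which matters on a phone. The post does not say which scheme was used, so the Python sketch below assumes simple symmetric per-tensor int8 quantization; the function names are hypothetical. The `fake_quantize` round trip is the operation quantization-aware training inserts into the forward pass (gradients are typically passed through the rounding with a straight-through estimator, which is not shown).

```python
from typing import List, Tuple


def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor int8 quantization: each weight w is stored as an
    integer q in [-127, 127] with w approximately equal to scale * q."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp to the int8 range.
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize_int8(q: List[int], scale: float) -> List[float]:
    """Map stored integers back to approximate float weights."""
    return [scale * v for v in q]


def fake_quantize(weights: List[float]) -> List[float]:
    """Quantize-then-dequantize round trip, as used in quantization-aware
    training so the model learns to tolerate rounding error."""
    q, scale = quantize_int8(weights)
    return dequantize_int8(q, scale)


weights = [0.5, -1.27, 0.003, 1.0]
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)
# Rounding to the nearest step bounds each error by half a step (scale / 2).
max_error = max(abs(w - r) for w, r in zip(weights, restored))
```

Small weights such as 0.003 collapse to zero here; per-channel scales are a common refinement when a single tensor-wide scale loses too much precision.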

To further enhance decoding quality, I implemented a beam search decoder with integrated language model fusion, allowing the system to better incorporate contextual and domain-specific information during inference. I also developed an end-to-end optimization pipeline that included quantization-aware training, model pruning, and inference-level optimizations to ensure real-time performance on CPU-bound hardware without compromising accuracy.
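The post does not specify the fusion method. Shallow fusion is a common choice: at every decoding step the decoder ranks candidate tokens by log p_ASR(y_t | x, y_<t) + λ · log p_LM(y_t | y_<t), so a domain language model can overrule acoustically plausible but contextually unlikely words. The Python sketch below illustrates that idea under this assumption; `beam_search_shallow_fusion`, `lm_weight`, and the toy scorers are hypothetical, and a production decoder would score subword tokens in batched tensors and usually add length normalization, which this omits.

```python
import math
from typing import Callable, Dict, List, Tuple

# A scorer maps a token prefix to log-probabilities for the next token.
Scorer = Callable[[Tuple[str, ...]], Dict[str, float]]


def beam_search_shallow_fusion(
    asr_scorer: Scorer,
    lm_scorer: Scorer,
    beam_size: int = 4,
    lm_weight: float = 0.3,
    max_len: int = 20,
    eos: str = "</s>",
) -> List[str]:
    """Beam search scoring each step as log p_asr + lm_weight * log p_lm."""
    beams: List[Tuple[float, Tuple[str, ...]]] = [(0.0, ())]
    finished: List[Tuple[float, Tuple[str, ...]]] = []
    for _ in range(max_len):
        candidates = []
        for score, prefix in beams:
            asr_logprobs = asr_scorer(prefix)
            lm_logprobs = lm_scorer(prefix)
            for token, asr_lp in asr_logprobs.items():
                # Tokens the LM does not cover get a large penalty instead of -inf.
                fused = asr_lp + lm_weight * lm_logprobs.get(token, -1e4)
                candidates.append((score + fused, prefix + (token,)))
        # Keep only the best `beam_size` expansions across all hypotheses.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = []
        for score, prefix in candidates[:beam_size]:
            if prefix[-1] == eos:
                finished.append((score, prefix))
            else:
                beams.append((score, prefix))
        if not beams:
            break
    # If max_len was reached, unfinished hypotheses also compete.
    finished.extend(beams)
    if not finished:
        return []
    _, best = max(finished, key=lambda c: c[0])
    return [token for token in best if token != eos]


def toy_asr(prefix: Tuple[str, ...]) -> Dict[str, float]:
    # Acoustically "jest" and "chest" are close; the ASR model slightly prefers the wrong one.
    if len(prefix) == 0:
        return {"jest": math.log(0.55), "chest": math.log(0.45)}
    if len(prefix) == 1:
        return {"pain": math.log(0.9), "</s>": math.log(0.1)}
    return {"</s>": 0.0}


def toy_lm(prefix: Tuple[str, ...]) -> Dict[str, float]:
    # A hypothetical medical-domain LM that strongly prefers "chest" as the first word.
    if len(prefix) == 0:
        return {"chest": math.log(0.2), "jest": math.log(0.001)}
    return {"pain": math.log(0.5), "</s>": math.log(0.5)}


without_lm = beam_search_shallow_fusion(toy_asr, toy_lm, lm_weight=0.0)
with_lm = beam_search_shallow_fusion(toy_asr, toy_lm, lm_weight=0.5)
```

With `lm_weight=0.0` the decoder follows the acoustic model and returns `["jest", "pain"]`; with the domain LM weighted at 0.5 it returns `["chest", "pain"]`. In practice the weight λ is tuned on held-out data, since too large a value lets the LM override what was actually said.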





From 34% to 3%: Achieving Breakthrough Accuracy

These improvements significantly reduced the WER across many languages, most notably in Ukrainian, where it dropped from 34% to 3%, a relative reduction of more than 90%. This dramatic reduction fundamentally improved the usability of the system. The enhanced model was validated in live hospital environments, where it allowed staff to communicate with patients across language barriers at a level comparable to working with professional human interpreters.


The offline solution now runs on ordinary smartphones without a network connection, and the cloud-based solution can be deployed behind an organization’s own firewall. Both options are optimized for low-latency, high-accuracy performance and can be relied on in critical situations to deliver real-time results, even under hardware or network constraints. For refugees, this means no longer being silenced by language barriers in moments of vulnerability. They can explain symptoms, ask questions, and receive care without fear of being misunderstood, restoring a sense of agency, safety, and human connection when it’s needed most. For medical staff, it means being able to take immediate, informed action instead of relying on guesswork that could cost lives.



AI That Serves Real Human Needs

For me, this work was a reminder of what AI should be: useful, ethical, and grounded in real human needs. I had the chance to build something that didn’t just work in theory but had a real, measurable, life-saving impact, and I can think of no greater privilege than contributing something like this to the world. The solution I helped implement is now deployed and has been used thousands of times around the world to help those in need. That’s the kind of AI I intend to keep building.
