Speech-to-Text Solution


The client specializes in speech technology for dialectal Arabic and other under-resourced languages, with a focus on automatic speech recognition and text-to-speech. The client is part of the Mohammed Bin Rashid Innovation Fund (MBRIF), an initiative launched by the UAE Ministry of Finance to support innovation in the UAE. It set out to find a partner that could meet its solution development and scaling needs, and quickly found a fit with us.


  • Arabic is considered one of the more challenging languages for speech recognition systems because of its large lexical variety and complex morphology.
  • A significant challenge is the automatic detection and conversion of more than 19 Arabic dialects.
  • Lexicons must be built for various use cases such as media, call centers, and education.
  • Multiple file types must be supported: wav, mp3, mp4, aac, and more.
  • Both real-time and batch processing must be supported.


Quantilus developed the product based on automatic speech recognition, machine translation, and Natural Language Processing (NLP). The product can be deployed in the cloud, on-premises, or in a hybrid model. The Arabic speech recognition models have leading accuracy across the board. We also built speech-to-text features such as speaker detection, language switching, timestamps, and diarization.
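To illustrate how speaker labels, language tags, and timestamps can fit together in a transcript, here is a minimal Python sketch. It is not the product's actual output format or API: the `Segment` fields and the `merge_turns` helper are hypothetical names chosen for this example, and the sample text is invented.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    # Hypothetical shape of one recognized chunk of audio.
    speaker: str    # diarization label, e.g. "S1"
    language: str   # language or dialect tag, e.g. "ar" or "en"
    start: float    # start time in seconds
    end: float      # end time in seconds
    text: str       # recognized text


def merge_turns(segments):
    """Merge consecutive segments that share a speaker and language into one turn."""
    turns = []
    for seg in sorted(segments, key=lambda s: s.start):
        last = turns[-1] if turns else None
        if last and last.speaker == seg.speaker and last.language == seg.language:
            # Extend the previous turn instead of starting a new one.
            turns[-1] = Segment(
                last.speaker,
                last.language,
                last.start,
                max(last.end, seg.end),
                f"{last.text} {seg.text}",
            )
        else:
            turns.append(seg)
    return turns


# Invented sample data: speaker S1 switches turns with an English-speaking S2.
segments = [
    Segment("S1", "ar", 0.0, 1.8, "marhaba"),
    Segment("S1", "ar", 1.8, 3.1, "kayf halak"),
    Segment("S2", "en", 3.4, 4.9, "good morning"),
    Segment("S1", "ar", 5.0, 6.2, "shukran"),
]
turns = merge_turns(segments)
for turn in turns:
    print(f"[{turn.start:.1f}-{turn.end:.1f}] {turn.speaker} ({turn.language}): {turn.text}")
```

Keeping the speaker, language, and time range on every segment means a language switch or a change of speaker simply starts a new turn, so downstream reporting can work on turns without re-reading the audio.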


The solution won the 2021 GITEX Future Stars’ Supernova Challenge held in Dubai and was named the Best AI Innovator for its cutting-edge Arabic speech and voice technology. It was selected from more than 700 entries.


The solution provides trained, tailored transcriptions to the client’s customers at accuracy levels above 90%. Because we push our models to perform under complex, real-life conditions with background noise, multiple speakers, and diverse accents, the client’s customers achieve much higher accuracy rates without compromising transcription speed.


The client’s customers use built-in reporting to search for keywords and phrases directly in the collected audio data rather than relying on a faulty transcript. This lets them pinpoint specific timestamps and gather useful insights quickly.
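The source does not describe how the reporting works internally. As one possible illustration of keyword-to-timestamp lookup, the Python sketch below assumes a hypothetical keyword spotter that emits `(keyword, start_seconds, confidence)` hits; the `build_index` function, the 0.5 confidence threshold, and the sample hits are all assumptions made for this example.

```python
from collections import defaultdict


def build_index(hits, min_confidence=0.5):
    """Map each keyword (case-insensitive) to the sorted times at which it was detected.

    `hits` is an iterable of (keyword, start_seconds, confidence) tuples from a
    hypothetical keyword spotter; low-confidence hits are dropped.
    """
    index = defaultdict(list)
    for keyword, start, confidence in hits:
        if confidence >= min_confidence:
            index[keyword.casefold()].append(start)
    for times in index.values():
        times.sort()
    return index


# Invented sample hits for two keywords.
hits = [
    ("refund", 12.4, 0.91),
    ("invoice", 30.0, 0.42),   # below the threshold, so it is ignored
    ("Refund", 3.2, 0.77),
    ("invoice", 45.5, 0.88),
]
index = build_index(hits)
refund_times = index.get("refund", [])
invoice_times = index.get("invoice", [])
print(refund_times, invoice_times)
```

Filtering on confidence before indexing is one way to keep unreliable detections out of reports; the right threshold would depend on the recognizer and the use case.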


