NU Researchers Developed an Automatic Speech Recognition Technology for Turkic Languages

Automatic speech recognition (ASR) is the task of converting human speech into the corresponding text. Applications include voice assistants such as Siri and Alexa and dictation systems. Researchers from the Institute of Smart Systems and Artificial Intelligence at NU (NU ISSAI) have developed a new model that can recognize ten Turkic languages: Azerbaijani, Bashkir, Chuvash, Kazakh, Kyrgyz, Sakha, Tatar, Turkish, Uyghur, and Uzbek. The model can also recognize English and Russian speech.

“Our aim was to develop an ASR model for Turkic languages, for most of which very little speech data is publicly available on the Internet,” says NU ISSAI data scientist Saida Mussakhojayeva. “By utilizing the common features of Turkic languages in terms of lexis, phonology, and morphology, we sought to develop a robust joint model in which the ten Turkic languages in our study would reciprocally benefit from each other.”

The developed model makes very few recognition errors. “For Bashkir, Kazakh, Tatar, Turkish, Uyghur, and Uzbek, the percentage of errors in characters made by our model is below 5%,” says Kaisar Dauletbek, a fourth-year NU student and an ISSAI research assistant. “Our model takes advantage of the similarity of the Turkic languages. These results would not have been possible to achieve had we created separate models for each language.”
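
For readers unfamiliar with the metric, the "percentage of errors in characters" quoted above is commonly reported as the character error rate (CER): the number of character insertions, deletions, and substitutions needed to turn the recognized text into the reference transcript, divided by the length of the reference. The sketch below only illustrates that calculation. It is not ISSAI's evaluation code, and the function names and sample strings are hypothetical.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between a reference and a hypothesis string."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete a reference character
                curr[j - 1] + 1,     # insert a hypothesis character
                prev[j - 1] + cost,  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def character_error_rate(ref: str, hyp: str) -> float:
    """CER as a percentage of the reference length."""
    if not ref:
        raise ValueError("reference transcript must not be empty")
    return 100.0 * edit_distance(ref, hyp) / len(ref)


# Hypothetical example: one wrong character in a 16-character reference.
reference = "automatic speech"
recognized = "automatic speach"
print(f"{character_error_rate(reference, recognized):.2f}%")  # prints 6.25%
```

By this measure, a CER below 5% means fewer than five character errors per hundred reference characters.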

The multilingual ASR model developed by NU ISSAI can be tested free of charge on ISSAI’s website. In addition, all the models, datasets, and code developed in the research project are publicly available for download.

“We believe that the most important outcome of these projects is the training of highly qualified technical experts who will not only drive the technological development of Kazakhstan, but also willingly share and apply their professional knowledge and know-how to contribute to the advancement of technologies in other countries, thus creating a better world for future generations,” says Prof. Huseyin Atakan Varol, NU ISSAI Founding Director.

So far, the Institute’s researchers have already succeeded in creating the first open-source Kazakh speech corpora (KSC and KSC2), large-scale open-source Kazakh text-to-speech corpora (KazakhTTS and KazakhTTS2), and the largest publicly available Kazakh named entity recognition dataset (KazNERD).

“The Institute has put constant and considerable effort into promoting the Kazakh language in the digital world. However, our Institute’s interest in language and speech technologies also extends to other Turkic languages. In this way, our Institute will emerge as one of the scientific centers for artificial intelligence and data science in the Turkic world and Eurasia,” says Prof. Varol.

 © Nazarbayev University

Republic of Kazakhstan, Astana city, 53 Kabanbay Batyr Ave.