2024, December 12

Nazarbayev University’s ISSAI Presents Kazakh Large Language Model KAZ-LLM

Researchers from the Institute of Smart Systems and Artificial Intelligence (ISSAI) at Nazarbayev University (NU) presented the Kazakh large language model, ISSAI KAZ-LLM, to President of Kazakhstan Kassym-Jomart Tokayev.

Built on a neural network, the model is the foundation for a Kazakh counterpart to ChatGPT. It marks a pivotal milestone in Kazakhstan's journey into the global artificial intelligence (AI) arena.

Tailored to the country's unique multilingual and multicultural context, ISSAI KAZ-LLM supports Kazakh, Russian, and English, with additional capabilities in Turkish. This makes it a bridge across linguistic divides and a key tool for advancing generative AI in low-resource languages.

The ISSAI team meticulously collected, processed, synthesized, and translated more than 150 billion tokens (words and word fragments) of text to ensure robust language performance. The trained model achieves results in Kazakh, Russian, and English that are competitive with models from global AI leaders.

Beyond technological innovation, the ISSAI KAZ-LLM project provided hands-on training for local talent, bolstering national expertise in AI. Kazakhstani researchers were involved at every stage, from data preparation to model execution, laying the groundwork for sustainable AI innovation. Collaborations with leading institutions in Kazakhstan enabled linguistic specialists to develop benchmarking tools and datasets tailored to the Kazakh language, drawing on advanced machine translation techniques.

ISSAI KAZ-LLM has diverse applications, including Kazakh language translation, content creation, and large-scale text processing. Work on the project started in April 2024, and training the model took five months. Training data were sourced from publicly available resources, such as Kazakh websites, news articles, and online libraries, and supplemented by data contributed by various organizations.

"This model reflects Kazakhstan's commitment to innovation, autonomy, and the growth of its technological ecosystem. Our team prepared two versions of the ISSAI KAZ-LLM, with 8 billion and 70 billion parameters, built on the Meta Llama architecture and optimized for high-performance systems and resource-constrained environments. The models, which are made available on Hugging Face under a CC BY-NC license for non-commercial use, facilitate global academic and research collaboration. Thus, developers will be able to download and run our model on both sophisticated servers and laptops," said NU ISSAI Director Prof. Dr. Hüseyin Atakan Varol.
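
As a rough illustration of why the two model sizes suit different hardware, the sketch below estimates the memory needed just to store the weights. The storage widths (16-bit and 4-bit) are assumptions chosen for illustration; the announcement does not say which precisions the released models use, and actual memory use is higher once activations and caches are included.

```python
# Rough weight-only memory estimate. The precisions are illustrative
# assumptions, not details confirmed by ISSAI.
PARAM_COUNTS = {"8B": 8e9, "70B": 70e9}          # parameter counts from the article
BYTES_PER_PARAM = {"16-bit": 2.0, "4-bit": 0.5}  # assumed storage widths


def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Return the memory needed to store the weights, in gigabytes (10^9 bytes)."""
    return num_params * bytes_per_param / 1e9


for name, num_params in PARAM_COUNTS.items():
    for precision, width in BYTES_PER_PARAM.items():
        gb = weight_memory_gb(num_params, width)
        print(f"{name} at {precision}: ~{gb:.0f} GB")
```

Under these assumptions, the 8-billion-parameter version falls within reach of a well-equipped laptop once compressed, while the 70-billion-parameter version calls for server-class hardware.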

ISSAI intends to develop next-generation AI systems, including vision-language models, and to expand these models to support additional Turkic and regional languages. These efforts could strengthen regional connectivity, facilitate language integration, and foster significant economic and technological impact in Kazakhstan and beyond.

The ISSAI KAZ-LLM project was made possible by the support of the NU and NIS Development Fund, as well as Astana Hub and QazCode (Beeline), and was developed independently of government funding. Beeline Kazakhstan and its IT company QazCode were key partners in creating the national large language model. Additional computational support in the form of 8 DGX H100 cloud servers made it possible to complete the project in time for Kazakhstan's Independence Day, reducing the time for one training iteration from 3 years on the A100 server to only 50 days in the cloud.
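
The reduction from 3 years to 50 days can be expressed as a speedup factor. The short calculation below is only a back-of-the-envelope sketch; it assumes 365-day years and takes the two durations quoted above at face value.

```python
DAYS_PER_YEAR = 365  # assumption: calendar years, ignoring leap days


def speedup(baseline_years: float, accelerated_days: float) -> float:
    """Return how many times faster the accelerated run is than the baseline."""
    return baseline_years * DAYS_PER_YEAR / accelerated_days


# Figures quoted in the article: 3 years on the A100 server vs. 50 days in the cloud.
print(f"Approximate speedup: {speedup(3, 50):.1f}x")
```

By this estimate, the cloud servers shortened one training iteration by a factor of roughly 22.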

"Our team actively participated in the development and training of the KAZ-LLM model. The complex process, which included the creation of a Kazakh language-specific model and 50 days of computation, made it possible to improve context understanding and ensure high-quality interaction with users. Testing has shown that the model effectively solves technical problems while taking cultural specifics into account. We are confident that KAZ-LLM will become an important tool for the whole of Kazakhstan, helping to overcome the language barrier and improve the quality of digital services in the region," commented Alexei Sharavar, CEO of QazCode.

 

 
