Saudi Arabia leads the development of Arabic language models in 2025

In a development that reflects the progress of digital transformation strategies under the Kingdom’s Vision 2030, a recent study found that Saudi Arabia is at the forefront of countries developing Arabic language models in 2025. The results highlight the Kingdom's pivotal role in strengthening the presence of Arabic in digital spaces and in enabling artificial intelligence technologies to understand Arabic cultural and linguistic contexts accurately.
Context of technological development and Saudi leadership
This achievement was no coincidence. It is the result of sustained efforts led by the Saudi Data and Artificial Intelligence Authority (SDAIA) in collaboration with the King Salman Global Academy for the Arabic Language. The step carries strategic weight in the global race towards “digital sovereignty”: developing national language models is seen as a pressing necessity to reduce reliance on foreign models that may lack a deep understanding of the nuances of Arabic dialects and culture. These efforts aim to build an integrated artificial intelligence ecosystem that supports innovation in both public and private institutions.
From traditional rules to generative intelligence
The study reviewed the history of Arabic natural language processing, tracing the radical shift from strictly rule-based systems before 2000, through statistical models, to the current revolution represented by large language models (LLMs) and their generative applications between 2022 and 2025. This most recent period saw a qualitative leap, with the launch of dozens of models serving vital sectors such as education and technology.
Numbers and facts from the digital landscape
The study examined the state of Arabic language models up to the first quarter of 2025, documenting more than 53 Arabic language models. While the Kingdom emerged as a leading developer, the analysis also revealed a technical challenge: text-only (unimodal) models dominate at 81%, compared with a mere 7% for multimodal models (which also process images and audio), indicating significant room for future development.
Balsam benchmark and performance challenges
To assess model capabilities, the study relied on the “Balsam” benchmark issued by the King Salman Global Academy for the Arabic Language. The results showed a disparity in performance: international models still excel in reasoning and programming, while Arabic models have made remarkable progress and are strongly competitive in specific tasks such as summarization, creative writing, and reading comprehension, indicating a promising future for these national technologies.
A roadmap for the future
The study concluded by outlining an ambitious roadmap to close the gap with global models, emphasizing the need for large, high-quality Arabic datasets covering diverse dialects and scientific fields. It also called for increased investment in multimodal models and for the development of precise benchmarks, so that the Kingdom becomes a leading regional and global hub for exporting Arabic technological knowledge.



