The first academically verified Arabic heritage dataset company. Pre-Islamic and early Islamic classical Arabic content — structured, annotated, and authenticated by Arab scholars — for the sovereign Arabic AI programmes building the next generation.
The pre-Islamic Jahiliyya era produced the foundational texts of the Arabic language — the Mu'allaqat, the tribal histories, the poetry that preserved an entire civilisation's memory for over fourteen centuries. Arab scholars have verified, annotated, and taught this heritage for generations.
Arabic AI has never encountered it. The knowledge exists: it sits in university libraries, doctoral theses, and annotated manuscripts that only specialists know how to find. MYSHBAH.AI was built to bridge the gap from the shelf to the system.
"The question is not whether AI will mediate the world's relationship with Arabic heritage. It already does. The question is whether that mediation will be grounded in verified scholarly knowledge — or in Wikipedia and the contents of the internet."
Five AI systems, including two purpose-built Arabic LLMs, tested in Arabic on the War of al-Basus with no prior context (2026)
A verified dataset is not a technology product. It is a scholarly judgement, made permanent. When a professor of classical Arabic confirms that a root attribution is correct, that confirmation becomes part of what Arabic AI will learn. The scholar's name, institution, and published expertise are embedded in every licensing agreement.
The mishkah (المشكاة) holds the light. The misbah (المصباح) is the light. The scholar is not a service provider in this framework — the scholar is the product.
"The dataset does not exist without the scholar."
The pilot corpus is 'Antara ibn Shaddad's Mu'allaqa — one of the seven canonical pre-Islamic odes, among the most studied texts in Arabic literary scholarship, and the centrepiece of Arabic literature curricula across the Arab world.
This is not a test of whether the content matters. Every Arabic AI that encounters a question about 'Antara, pre-Islamic Arabian tribal culture, or the Mu'allaqat tradition will demonstrate the gap immediately. The pilot demonstrates that MYSHBAH.AI can close it.
"The difference between an AI that has read the poem and an AI that has studied under its greatest living interpreter."
There was a time when Arabic was to the world what English is today. From the 8th to the 13th century, to be a scholar anywhere between the Atlantic and the Indian Ocean was to read and write in Arabic. Persian scientists, Turkish administrators, and Jewish and Christian scholars in al-Andalus all conducted their intellectual lives in Arabic, because Arabic was where human knowledge lived.
That civilisation did not begin with Islam. It began before it — in the poetry and oral histories of the Jahiliyya era. The scholars who have spent careers in this heritage are the only ones who can ensure that Arabic AI learns it correctly.
We invite scholars to: