Abstract
This paper examines the development of Moroccan Darija Wikipedia since its launch in July 2020. It details the strategies employed by the Wikimedia Morocco user group, focusing on bot automation and editing contests, to foster growth within this low-resource language Wikipedia. The paper highlights the opportunities Darija Wikipedia presents for Artificial Intelligence research, particularly in Natural Language Processing, given its status as the largest online Darija dataset. It also explores how the standardization efforts undertaken by the user group enable valuable collaboration between volunteers, experts, and researchers, potentially setting a precedent for other similar language communities. Furthermore, the paper addresses key challenges, including ensuring community sustainability and mitigating vandalism, and analyzes the manifestation of diverse spelling conventions (phonetic, etymological) within the encyclopedia’s content.
License
Copyright (c) 2025 Anass Sedrati, Mounir Afifi, Reda Benkhadra

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
