Searching and Comparing Isim Ma’rifat with Diacritic Removal in the Quran and Sahih Muslim Hadiths


  • Ryan Fahreza Maliki, Telkom University, Bandung, Indonesia
  • Eko Darwiyanto, Telkom University, Bandung, Indonesia
  • Moch. Arif Bijaksana, Telkom University, Bandung, Indonesia



Keywords: Prefix; Sahih Muslim; Tokenizer; Quran; Diacritics


Few websites offer detailed lists of Isim Ma’rifat (definite nouns) in the Quran and the Sahih Muslim Hadith, which makes it difficult to study and compare Isim Ma’rifat across these two significant Islamic texts. To address this gap, the study develops a natural language processing approach that integrates a Java tokenizer program with a MySQL database containing the Sahih Muslim Hadith and Quranic texts. The program identifies words carrying the alif lam prefix and then removes their diacritics so that the two texts can be compared accurately. The research identifies alif lam prefixed Isim Ma’rifat that occur only in the Quran, those that occur only in the Sahih Muslim Hadith, and those shared by both. The analysis clarifies how alif lam prefixed Isim Ma’rifat differ and overlap between the Quran and Sahih Muslim. These findings provide input for the Al-Quran project and contribute to the development of comprehensive, accessible resources for Islamic studies. The research is expected to improve the understanding of Isim Ma’rifat in its religious and linguistic context and to contribute to natural language processing, especially for the Arabic language.
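The pipeline summarized in the abstract (find alif lam prefixed tokens, strip diacritics, compare the two word sets) might look roughly like the following minimal Java sketch. It is not the authors' program: the class and method names are hypothetical, the input is a pair of short in-memory strings rather than the MySQL tables, tokens are split on whitespace only, and the set of code points treated as diacritics is an assumption. A prefix check alone will also match words that merely begin with alif and lam, so a real system would need further filtering.

```java
import java.util.Set;
import java.util.TreeSet;

public class IsimMarifatSketch {

    // Sample tokens written with Unicode escapes so the file compiles in any encoding.
    public static final String AL_KITABU = "\u0627\u0644\u0652\u0643\u0650\u062A\u064E\u0627\u0628\u064F"; // al-kitabu
    public static final String AL_KITABI = "\u0627\u0644\u0652\u0643\u0650\u062A\u064E\u0627\u0628\u0650"; // al-kitabi
    public static final String AN_NURU   = "\u0627\u0644\u0646\u0651\u064F\u0648\u0631\u064F";             // an-nuru
    public static final String AL_ILMU   = "\u0627\u0644\u0652\u0639\u0650\u0644\u0652\u0645\u064F";       // al-'ilmu
    public static final String KITABUN   = "\u0643\u0650\u062A\u064E\u0627\u0628\u064C";                   // kitabun (no prefix)

    // Assumption: diacritics are U+064B..U+0652 (tanwin, short vowels, shadda, sukun),
    // the superscript alef U+0670, and the tatweel U+0640.
    public static String stripDiacritics(String token) {
        return token.replaceAll("[\\u064B-\\u0652\\u0670\\u0640]", "");
    }

    // Heuristic: the token begins with alef (or alef wasla, U+0671) followed by lam.
    public static boolean hasAlifLamPrefix(String token) {
        return token.length() > 2
                && (token.startsWith("\u0627\u0644") || token.startsWith("\u0671\u0644"));
    }

    // Collect the alif lam prefixed tokens of a text, with diacritics removed.
    public static Set<String> extract(String text) {
        Set<String> result = new TreeSet<>();
        for (String token : text.trim().split("\\s+")) {
            if (hasAlifLamPrefix(token)) {
                // Assumption: alef wasla is folded into plain alef before comparison.
                result.add(stripDiacritics(token).replace('\u0671', '\u0627'));
            }
        }
        return result;
    }

    // Words present in both sets.
    public static Set<String> shared(Set<String> a, Set<String> b) {
        Set<String> r = new TreeSet<>(a);
        r.retainAll(b);
        return r;
    }

    // Words present in a but not in b.
    public static Set<String> onlyIn(Set<String> a, Set<String> b) {
        Set<String> r = new TreeSet<>(a);
        r.removeAll(b);
        return r;
    }

    public static void main(String[] args) {
        // Made-up word lists standing in for the two corpora; not actual verses or hadiths.
        Set<String> first = extract(AL_KITABU + " " + AN_NURU + " " + KITABUN);
        Set<String> second = extract(AL_KITABI + " " + AL_ILMU);

        System.out.println("Shared:         " + shared(first, second));
        System.out.println("Only in first:  " + onlyIn(first, second));
        System.out.println("Only in second: " + onlyIn(second, first));
    }
}
```

In this sketch the two spellings of "al-kitab" differ only in their final vowel mark, so they count as the same word once diacritics are removed. This is the reason the comparison is performed on stripped forms.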







Published: 2023-08-10