LughaNet: Automated Arabic WordNet Construction and Evaluation Using Semantic Question Similarity

Ammar Al-Edhari; Mohsen Kahani

Authors

Ammar Al-Edhari Ferdowsi University of Mashhad
Mohsen Kahani Ferdowsi University of Mashhad

Keywords:

WordNet construction, Arabic WordNet, Synonym Extraction, Arabic Semantic Question Similarity

Abstract

Several efforts have sought to enrich the Arabic lexicon and address the limitations of Arabic WordNet. However, work toward a comprehensive, automatically constructed Arabic WordNet, comparable in scale and functionality to the English WordNet, has been insufficient. This study addresses that gap in Arabic NLP by introducing LughaNet, an automated Arabic WordNet developed in five stages. First, Princeton WordNet (PWN) synsets were aligned with Arabic words using a bilingual dictionary and pre-trained machine translation models. Second, Arabic words were extracted from resources such as Wikipedia and the existing Arabic WordNet. Third, natural language processing (NLP) methods, including Skip-gram with AraVec 2.0 embeddings, were applied to extract synonyms from Arabic Wikipedia. Fourth, the accuracy of synonym selection was improved using a pre-trained BERT model and cosine similarity. Finally, PWN glosses and examples were translated into Arabic. This process produced 85,991 synsets for the new Arabic WordNet. Evaluation showed 64.23% coverage of dictionary terms, and LughaNet's usability was assessed on Arabic Semantic Question Similarity (ASQS) tasks. On these tasks, LughaNet achieved an accuracy of 64.11%, a precision of 57.57%, a recall of 79.02%, and an F1 score of 66.61%, surpassing the original Arabic WordNet's F1 score of 56.62%. These findings indicate that LughaNet has the potential to be a helpful resource for Arabic NLP research and applications.
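
The fourth stage (filtering candidate synonyms by cosine similarity) and the reported F1 score can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy three-dimensional vectors, and the 0.8 threshold are assumptions made for illustration only. In practice the vectors would come from a pre-trained BERT model, which is not loaded here. The F1 helper applies the standard harmonic-mean definition to the precision and recall stated in the abstract.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_u * norm_v)


def filter_synonyms(target_vec, candidates, threshold=0.8):
    """Keep candidates whose similarity to the target meets the threshold.

    `candidates` maps a word to its embedding. The threshold is a
    hypothetical value, not one taken from the paper.
    """
    scored = [(word, cosine_similarity(target_vec, vec))
              for word, vec in candidates.items()]
    kept = [(word, score) for word, score in scored if score >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


def f1_score(precision, recall):
    """Harmonic mean of precision and recall (same units in and out)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy stand-ins for BERT embeddings (hypothetical values).
target = [1.0, 0.0, 1.0]
candidates = {
    "cand_a": [0.9, 0.1, 1.1],  # nearly parallel to the target
    "cand_b": [0.0, 1.0, 0.0],  # orthogonal to the target
    "cand_c": [1.0, 1.0, 0.0],  # similarity of 0.5
}

selected = filter_synonyms(target, candidates)
print([word for word, _ in selected])

# Consistency check on the figures reported in the abstract.
print(round(f1_score(57.57, 79.02), 2))
```

With these toy inputs only `cand_a` clears the threshold, and the harmonic mean of the stated precision and recall rounds to the reported F1 of 66.61%.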