Optimizing Sentiment Analysis of Indonesian Texts: Enhancing Deep Learning Models with Genetic Algorithm-Based Feature Selection

Noor Zuraidin Mohd Safar; Siti Mujilahwati; Ku Muhammad Naim Ku Halif; Nasyitah Ghazalli

Authors

Noor Zuraidin Mohd Safar, UTHM
Siti Mujilahwati, Universitas Islam Lamongan
Ku Muhammad Naim Ku Halif, Universiti Malaysia Pahang Al-Sultan Abdullah
Nasyitah Ghazalli, Thales UK

Keywords:

Automatic text classification, Feature selection, Genetic Algorithms, Sentiment analysis, Deep learning models

Abstract

Automatic text classification techniques are used to solve real-world problems such as spam filtering, sentiment analysis, and news classification. However, representing text as a high-dimensional term-document matrix makes the data harder to manage and increases the risk of overfitting. Feature selection (FS) plays an important role in increasing learning accuracy, eliminating irrelevant data, and reducing dimensionality. It is performed during data pre-processing to prevent irrelevant features from affecting the efficiency and accuracy of the classification model. Feature selection for text classification can be categorized into four main strategies: filter methods, wrapper approaches, embedded methods, and hybrid models. A commonly used hybrid model employs Genetic Algorithms (GA), which solve optimization problems by simulating natural selection: a population of candidate solutions evolves and adapts over generations. This article proposes enhancing the performance of deep learning models, namely LSTM, DNN, and CNN, in sentiment analysis of Indonesian-language texts by incorporating GA-based feature selection (FSGA). The FSGA+LSTM, FSGA+DNN, and FSGA+CNN models produced different results from the corresponding models without feature selection in tests carried out on 20,769 selected features. Feature selection significantly reduced the data dimensionality and increased model computing speed. With the same parameters, namely a learning rate of 0.0001 and a batch size of 64, sentiment classification accuracy increased by an average of 1% across the three models. FSGA+DNN achieved the highest accuracy at 89.53%, followed by FSGA+LSTM at 89.32% and FSGA+CNN at 85.16%.
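
The abstract does not specify the chromosome encoding, fitness function, or GA hyperparameters, so the following is only a minimal sketch of GA-based feature selection under stated assumptions: each individual is a binary mask over feature columns, fitness is the validation accuracy of a simple nearest-centroid classifier (a lightweight stand-in for the LSTM, DNN, and CNN models) minus a small penalty on the fraction of features kept, and the operators are size-2 tournament selection, uniform crossover, bit-flip mutation, and elitism. The names `fitness`, `ga_select`, and `make_toy_data`, the synthetic data, and all parameter values are hypothetical and are not taken from the article.

```python
import numpy as np

def fitness(mask, X_train, y_train, X_val, y_val, penalty=0.01):
    """Validation accuracy of a nearest-centroid classifier on the selected
    columns, minus a small penalty on the fraction of features kept.
    (Assumed fitness; the article's own fitness function is not given.)"""
    if not mask.any():
        return 0.0  # an empty feature set cannot classify anything
    Xt, Xv = X_train[:, mask], X_val[:, mask]
    classes = np.unique(y_train)
    centroids = np.stack([Xt[y_train == c].mean(axis=0) for c in classes])
    # Squared Euclidean distance from each validation row to each centroid.
    dists = ((Xv[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    pred = classes[dists.argmin(axis=1)]
    return float((pred == y_val).mean()) - penalty * mask.mean()

def ga_select(X_train, y_train, X_val, y_val, pop_size=20, generations=15,
              mutation_rate=0.05, seed=0):
    """Evolve binary feature masks; return the best mask and the best
    fitness recorded for each evaluated population."""
    rng = np.random.default_rng(seed)
    n_features = X_train.shape[1]
    pop = rng.random((pop_size, n_features)) < 0.5  # random initial masks

    def evaluate(population):
        return np.array([fitness(ind, X_train, y_train, X_val, y_val)
                         for ind in population])

    history = []
    for _ in range(generations):
        scores = evaluate(pop)
        history.append(scores.max())
        elite = pop[scores.argmax()].copy()
        # Tournament selection: the fitter of two random individuals wins.
        a = rng.integers(pop_size, size=pop_size)
        b = rng.integers(pop_size, size=pop_size)
        parents = pop[np.where(scores[a] >= scores[b], a, b)]
        # Uniform crossover between each parent and its neighbour.
        mates = np.roll(parents, 1, axis=0)
        take_parent = rng.random(parents.shape) < 0.5
        children = np.where(take_parent, parents, mates)
        # Bit-flip mutation.
        children = children ^ (rng.random(children.shape) < mutation_rate)
        children[0] = elite  # elitism: the best mask always survives
        pop = children
    scores = evaluate(pop)
    history.append(scores.max())
    return pop[scores.argmax()], history

def make_toy_data(seed=1, n=200, n_features=30, n_informative=5):
    """Synthetic two-class data where only the first columns carry signal."""
    rng = np.random.default_rng(seed)
    y = rng.integers(0, 2, size=n)
    X = rng.normal(size=(n, n_features))
    X[:, :n_informative] += 2.0 * y[:, None]
    split = int(0.75 * n)
    return X[:split], y[:split], X[split:], y[split:]

if __name__ == "__main__":
    X_tr, y_tr, X_va, y_va = make_toy_data()
    best_mask, history = ga_select(X_tr, y_tr, X_va, y_va)
    print("features kept:", int(best_mask.sum()), "of", best_mask.size)
    print("best fitness:", round(history[-1], 4))
```

In a setting like the one the abstract describes, the columns of `X_train` would be term features from the term-document matrix, and the selected mask would be applied before training the downstream model. Evaluating fitness with a cheap proxy classifier, as done here, is one design choice among several; using the target deep model inside the fitness function is closer to a pure wrapper approach but far more expensive.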

Author Biographies

Siti Mujilahwati, Universitas Islam Lamongan

Siti Mujilahwati received a Bachelor of Informatics Engineering from Universitas Trunojoyo Madura, Indonesia, in 2009 and a Master of Computer degree in Information Technology from the Institut Sains dan Teknologi Terpadu Surabaya (iSTTS), Indonesia, in 2013. She currently works at Universitas Islam Lamongan as a lecturer and researcher in informatics engineering.

Ku Muhammad Naim Ku Halif, Universiti Malaysia Pahang Al-Sultan Abdullah

Ku Muhammad Na’im Ku Khalif is currently a senior lecturer at the Centre for Mathematical Sciences, Universiti Malaysia Pahang, Malaysia. He received a PhD in Computational Intelligence from the University of Portsmouth, United Kingdom. His research interests are in the development of computational intelligence methods, focusing on decision-making support systems and their applications in modelling and simulating complex systems under fuzzy or uncertain environments. He is also interested in data mining and machine learning through probabilistic models.

Nasyitah Ghazalli, Thales UK

Nasyitah Ghazalli received a BSc in Communication Engineering from the International Islamic University Malaysia and an MSc in Communication Systems Engineering from the University of Portsmouth, United Kingdom. In 2019, she received a PhD in Radar Systems from Cranfield University, United Kingdom. She now works at Thales UK as a Research Engineer. Her research interests include decision support and decision making using artificial intelligence, machine learning, and reinforcement learning.