Optimizing Sentiment Analysis of Indonesian Texts: Enhancing Deep Learning Models with Genetic Algorithm-Based Feature Selection
Keywords:
Automatic text classification, Feature selection, Genetic Algorithms, Sentiment analysis, Deep learning models

Abstract
Automatic text classification techniques are used to solve real-world problems such as spam filtering, sentiment analysis, and news classification. However, representing text as a term-document matrix produces very high-dimensional data, which is difficult to manage and increases the risk of overfitting. Feature selection (FS), performed during data pre-processing, plays an important role in improving learning accuracy, eliminating irrelevant data, and reducing dimensionality before irrelevant features can degrade the efficiency and accuracy of the classification model. Feature selection strategies for text classification fall into four main categories: filter, wrapper, embedded, and hybrid methods. A commonly used hybrid approach employs Genetic Algorithms (GA), which solve optimization problems by simulating natural selection, where individuals evolve and adapt over generations. This article proposes improving the performance of deep learning models (LSTM, DNN, and CNN) in sentiment analysis of Indonesian-language texts by adding GA-based feature selection. In tests on a corpus of 20,769 features, the FSGA+LSTM, FSGA+DNN, and FSGA+CNN models produced different results from their counterparts without feature selection. The selection step significantly reduces the data dimensionality and speeds up model computation. With identical hyperparameters (learning rate 0.0001, batch size 64), sentiment classification accuracy increased by an average of about 1% across the three models. The highest accuracy was achieved by FSGA+DNN at 89.53%, followed by FSGA+LSTM at 89.32% and FSGA+CNN at 85.16%.
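The core idea of GA-based feature selection described above can be illustrated with a minimal sketch. This is not the authors' implementation: the feature count, the fitness function (a toy stand-in for classifier validation accuracy that rewards a hypothetical set of "informative" features and penalizes subset size), and all hyperparameters here are illustrative assumptions. Each chromosome is a binary mask over the feature space, and truncation selection with single-point crossover and bit-flip mutation evolves the population.

```python
import random

random.seed(42)

N_FEATURES = 20              # toy dimensionality (the paper's corpus has 20,769 features)
INFORMATIVE = set(range(5))  # hypothetical "useful" features for the toy fitness

def fitness(mask):
    # Toy stand-in for classifier accuracy: reward selecting informative
    # features, penalize subset size (mirrors the dimensionality-reduction goal).
    hits = sum(1 for i in INFORMATIVE if mask[i])
    return hits - 0.05 * sum(mask)

def crossover(a, b):
    # Single-point crossover of two binary masks.
    p = random.randrange(1, len(a))
    return a[:p] + b[p:]

def mutate(mask, rate=0.05):
    # Flip each bit independently with probability `rate`.
    return [bit ^ (random.random() < rate) for bit in mask]

def ga_select(pop_size=30, generations=40):
    # Evolve binary feature masks; keep the top half each generation (elitism)
    # and refill the population with mutated crossover offspring.
    pop = [[random.randint(0, 1) for _ in range(N_FEATURES)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        elite = pop[: pop_size // 2]
        children = [mutate(crossover(random.choice(elite), random.choice(elite)))
                    for _ in range(pop_size - len(elite))]
        pop = elite + children
    return max(pop, key=fitness)

best = ga_select()
print("selected features:", sorted(i for i, bit in enumerate(best) if bit))
```

In the actual hybrid setting, `fitness` would instead train and validate the downstream classifier (LSTM, DNN, or CNN) on the masked feature subset, which is far more expensive but directly optimizes the quantity of interest.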
License
Copyright (c) 2024 Journal of Soft Computing and Data Mining

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
