Demonstration of Efficacy of Exploiting ChatGPT Data to the Transformers-Based Models by Performing Bangla Intent Analysis

Authors

  • Al-Mahmud, Kyushu Institute of Technology
  • Kazutaka Shimada, Kyushu Institute of Technology

Keywords:

Intent analysis, Conventional machine-learning models, Transformers-based models, Combined data technique, Semi-supervised learning approach, Stepwise learning approach

Abstract

As online opinion sharing expands, an automatic approach to intent analysis is both necessary and useful in practice. Intent analysis examines the viewpoints of people and entities expressed in user-generated online texts. Conventional sentiment analysis deals with two classes: positive and negative. To extend that task and obtain deeper insights, this study treats intent analysis as a five-class problem: pessimism, optimism, suggestion, sarcastic, and miscellaneous. Intent analysis with machine learning needs a large amount of data to produce a robust model. However, manually accumulating training data is expensive, particularly for less dominant languages such as Bangla. Hence, to obtain sufficient training data, this study generates, collects, and pre-processes Bangla restaurant data for the task with the OpenAI ChatGPT API, using prompting and data augmentation. These data are called “source data”. Because no user-generated Bangla data is available in the literature, this study also prepares and validates a new Bangla intent analysis dataset by collecting real user-generated data. These data are referred to as “target data”. The source data are used to assist the target task (i.e., the main task), which is performed on the target data. Three approaches that use both source and target data are proposed: a combined data approach, semi-supervised learning, and stepwise learning. Experimental results demonstrate that the proposed semi-supervised learning with transformers-based models effectively improves performance on the target data by exploiting ChatGPT-generated source data. The best F1 score of the proposed semi-supervised learning is 0.74, while that of the baseline is 0.72. Additionally, several feature concatenation methods are proposed; with these, the highest F1 score is 0.75.

Published

27-11-2024

Section

Special Issue 2024: SAES2023 (E)

How to Cite

Al-Mahmud, & Kazutaka Shimada. (2024). Demonstration of Efficacy of Exploiting ChatGPT Data to the Transformers-Based Models by Performing Bangla Intent Analysis. International Journal of Integrated Engineering, 16(7), 12-25. https://penerbit.uthm.edu.my/ojs/index.php/ijie/article/view/18533