Demonstration of Efficacy of Exploiting ChatGPT Data to the Transformers-Based Models by Performing Bangla Intent Analysis

Al-Mahmud; Kazutaka Shimada

Authors

Al-Mahmud, Kyushu Institute of Technology
Kazutaka Shimada, Kyushu Institute of Technology

Keywords:

Intent analysis, Conventional machine-learning models, Transformers-based models, Combined data technique, Semi-supervised learning approach, Stepwise learning approach

Abstract

With the growth of online opinion sharing, an automatic approach to intent analysis is both necessary and useful in practice. Intent analysis examines the viewpoints of persons and entities expressed in user-generated online texts. Conventional sentiment analysis deals with two classes: positive and negative. To extend the conventional sentiment analysis task, this study treats intent analysis as a task with additional classes, in order to obtain deeper insights. Accordingly, this study deals with five classes: pessimism, optimism, suggestion, sarcastic, and miscellaneous. Intent analysis with machine learning needs a massive amount of data to produce a robust model. However, manually collecting training data is expensive, particularly in less dominant languages such as Bangla. Hence, to obtain sufficient training data, this study generates, collects, and pre-processes Bangla restaurant data for the task using the OpenAI ChatGPT API through prompting and data augmentation. These data are called “source data”. As no user-generated Bangla data is available in the literature, this study prepares and validates a new Bangla intent analysis dataset by collecting real user-generated data. These data are referred to as “target data”. The source data are used to assist the target task (i.e., the main task) performed on the target data. Using both source and target data, three approaches are proposed: a combined data approach, semi-supervised learning, and stepwise learning. Experimental results demonstrate that the proposed semi-supervised learning with transformers-based models is effective in improving performance on the target data by exploiting ChatGPT-generated source data. The best F1 score of the proposed semi-supervised learning is 0.74, while that of the baseline is 0.72. Additionally, we propose some feature concatenation methods. In this case, the highest F1 score is 0.75.
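
The F1 scores reported above summarize performance over five classes. As a minimal sketch of how such a score could be computed, the Python below derives a per-class F1 and then averages it across classes. The abstract does not state which averaging scheme was used, so macro averaging is an assumption here, and the function names and the toy labels are hypothetical and are not taken from the study.

```python
from collections import Counter

# The five intent classes named in the abstract.
CLASSES = ["pessimism", "optimism", "suggestion", "sarcastic", "miscellaneous"]


def per_class_f1(y_true, y_pred, classes=CLASSES):
    """Return {class: F1}, scoring each class one-vs-rest."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for gold, pred in zip(y_true, y_pred):
        if gold == pred:
            tp[gold] += 1
        else:
            fp[pred] += 1  # predicted class gains a false positive
            fn[gold] += 1  # gold class gains a false negative
    scores = {}
    for c in classes:
        denom = 2 * tp[c] + fp[c] + fn[c]
        # F1 = 2TP / (2TP + FP + FN); set to 0.0 when the class never occurs.
        scores[c] = 2 * tp[c] / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred, classes=CLASSES):
    """Unweighted mean of per-class F1 (assumed averaging scheme)."""
    scores = per_class_f1(y_true, y_pred, classes)
    return sum(scores.values()) / len(classes)


# Hypothetical labels for illustration only (not data from the study).
gold = ["optimism", "pessimism", "suggestion", "sarcastic", "miscellaneous", "optimism"]
pred = ["optimism", "pessimism", "suggestion", "pessimism", "miscellaneous", "suggestion"]
print(round(macro_f1(gold, pred), 3))
```

Macro averaging weights every class equally, so a rare class such as sarcastic affects the score as much as a frequent one. A weighted or micro average would give a different number on imbalanced data.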
