Diabetes Prediction Using The Smote-Cart Framework Model for Imbalanced Data Case

Authors

  • Farah Najidah Noorizan, Universiti Tun Hussein Onn Malaysia
  • Nur Anida Jumadi, Universiti Tun Hussein Onn Malaysia
  • Muhamad Amir Irfan Roslan, Universiti Tun Hussein Onn Malaysia
  • Ng Li Mun, Universiti Tun Hussein Onn Malaysia
  • Manveer Pal Singh, Putra Specialist Hospital Batu Pahat
  • Yukihiro Ishida, SECOND HEART Inc.

Keywords:

Diabetes Mellitus, Synthetic Minority Oversampling Technique, Classification and Regression Tree, Hyperparameter Tuning, Evaluation Metrics

Abstract

Diabetes mellitus (DM) is characterized by chronically high blood glucose levels, which can result in long-term damage, dysfunction, and organ failure. With advances in technology, many researchers now use machine learning to predict diabetes, collecting patients' demographic and health information and organizing it into a dataset. However, in most real-world data, non-diabetic cases outnumber diabetic cases, which biases models toward the majority class and results in poor prediction of diabetic cases. Therefore, the Synthetic Minority Oversampling Technique (SMOTE) is proposed to rebalance the dataset samples before training a Classification and Regression Tree (CART) model, with the aim of improving diabetes prediction. The proposed framework comprises a preprocessing step (SMOTE and categorical conversion), CART training, hyperparameter tuning, and evaluation with standard metrics. With a minimum of 8 observations per leaf node, a maximum of 10 splits, and deviance as the split criterion, the model achieves an overall accuracy of 98.72%, a precision of 98.94%, a sensitivity of 98.44%, and an F1-score of 98.67%. In conclusion, the proposed SMOTE-CART framework can effectively address class imbalance in a diabetes dataset and improve the accuracy of diabetes prediction.
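The abstract names two technical steps without showing them: SMOTE oversampling of the diabetic (minority) class and the four evaluation metrics. The following is a minimal, stdlib-only Python sketch of both, under stated assumptions. It follows the general SMOTE idea of placing each synthetic sample on the line segment between a minority sample and one of its k nearest minority neighbours, but it picks base samples at random and defaults to k = 5; both choices are assumptions, not values from the paper. The function names `smote` and `binary_metrics` are hypothetical, and this is not the authors' implementation; it also omits the categorical conversion and CART training steps.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Sketch of SMOTE-style oversampling (hypothetical helper).

    Each synthetic point is placed at a random position on the segment
    between a randomly chosen minority sample and one of its k nearest
    minority neighbours (Euclidean distance). k=5 is an assumed default.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Sort the other minority samples by distance to `base`
        # and keep the k nearest as candidate neighbours.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def binary_metrics(tp, fp, fn, tn):
    """Accuracy, precision, sensitivity (recall) and F1-score from the
    counts of a binary confusion matrix, with diabetic as the positive class."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return accuracy, precision, sensitivity, f1


if __name__ == "__main__":
    # Made-up toy data: 4 diabetic vs 10 non-diabetic samples, 2 features.
    diabetic = [[6.0, 148.0], [8.0, 183.0], [0.0, 137.0], [3.0, 150.0]]
    n_majority = 10
    new_points = smote(diabetic, n_majority - len(diabetic), k=3)
    print(len(diabetic) + len(new_points))  # 10 diabetic samples after SMOTE
    # Hypothetical confusion-matrix counts, not the paper's results.
    print(binary_metrics(tp=90, fp=5, fn=10, tn=95))
```

Oversampling only the training portion of the data (after the train/test split) is the usual way to keep synthetic points out of the evaluation set; the sketch leaves that split to the caller.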


Published

10-02-2026

Issue

Vol. 7 No. 1

Section

Special Issue 2025: ICESNANO2025

How to Cite

Noorizan, F. N., Jumadi, N. A., Roslan, M. A. I., Ng, L. M., Singh, M. P., & Ishida, Y. (2026). Diabetes Prediction Using The Smote-Cart Framework Model for Imbalanced Data Case. Journal of Soft Computing and Data Mining, 7(1), 1-10. https://penerbit.uthm.edu.my/ojs/index.php/jscdm/article/view/23279