Diabetes Prediction Using the SMOTE-CART Framework Model for Imbalanced Data Case
Keywords:
Diabetes Mellitus, Synthetic Minority Oversampling Technique, Classification and Regression Tree, Hyperparameter Tuning, Evaluation Metrics
Abstract
Diabetes mellitus (DM) is characterized by chronically high blood glucose levels, which can result in long-term damage, dysfunction, and organ failure. As a result of technological advancements, many researchers employ machine learning to predict diabetes, collecting patients’ demographic and health information and organizing it into a dataset. However, in most real-world data, non-diabetic cases outnumber diabetic cases, which biases the model toward the majority class and results in poor prediction of diabetic cases. Therefore, the Synthetic Minority Oversampling Technique (SMOTE) is applied to the dataset samples before training the Classification and Regression Tree (CART) model in order to improve diabetes prediction. The proposed framework involves preprocessing (SMOTE and categorical conversion), CART training, hyperparameter tuning, and evaluation. With a combination of 8 leaf numbers per node, a maximum of 10 splits, and deviance as the split criterion, the model achieves an overall accuracy of 98.72%, a precision of 98.94%, a sensitivity of 98.44%, and an F1-score of 98.67%. In conclusion, the proposed SMOTE-CART framework can effectively address the class imbalance in a diabetes dataset and improve the accuracy of diabetes prediction.
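To make two of the steps named in the abstract concrete, the following is a minimal pure-Python sketch of SMOTE-style oversampling and of the four reported evaluation metrics. It is an illustration under stated assumptions, not the authors' implementation: the function names `smote` and `classification_metrics`, the neighbour count `k`, and the toy data are all hypothetical, and CART training and hyperparameter tuning are omitted.

```python
import math
import random

def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples by interpolating between a
    randomly chosen minority sample and one of its k nearest minority
    neighbours (the core idea of SMOTE)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Nearest neighbours of `base` among the other minority samples.
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic

def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, sensitivity (recall), and F1-score for a
    binary problem, computed from confusion-matrix counts."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "sensitivity": sensitivity,
        "f1": f1,
    }

if __name__ == "__main__":
    # Hypothetical toy minority class (two made-up features per sample).
    diabetic = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2]]
    print(smote(diabetic, n_new=4, k=2))
    # Hypothetical labels: 1 = diabetic, 0 = non-diabetic.
    print(classification_metrics([1, 1, 1, 0, 0, 0, 0, 0],
                                 [1, 1, 0, 0, 0, 0, 0, 1]))
```

Because each synthetic sample lies on the line segment between two existing minority samples, oversampling of this kind adds diabetic cases without simply duplicating rows. In practice, oversampling is usually applied to the training split only, so that the evaluation metrics are computed on real samples; whether the study follows this practice is not stated in the abstract.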
License
Copyright (c) 2026 Journal of Soft Computing and Data Mining

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.









