Diabetes Prediction Through Classification Using Pima Dataset: Survey and Evaluation
Keywords:
Diabetes, Prediction, Diagnosis, Prognosis, Dataset, Classification, Evaluation, Pima
Abstract
Diabetes prediction using machine learning techniques has been extensively investigated in the literature, resulting in diverse prediction and evaluation approaches. This diversity often makes comparisons between approaches inconsistent. This paper offers a comprehensive survey of state-of-the-art diabetes prediction through classification, covering various processing techniques and machine learning methods. The primary objectives of this paper are as follows: 1) Analyzing the performance of existing machine learning methods and the advancements in diabetes prediction outcomes over time. 2) Establishing a baseline for evaluating and benchmarking different diabetes prediction approaches. 3) Assessing the performance of common machine learning methods. 4) Proposing future research directions to further enhance diabetes prognosis methodologies. 5) Providing state-of-the-art results for the performance of common machine learning methods on the Pima dataset. The review shows wide variation in the existing prediction and evaluation approaches. The results of the proposed evaluation approach show that scaling, feature selection, and over-sampling improve the results of the prediction approaches. In addition, the results of the machine learning techniques varied, with the best results mostly achieved by the random forest algorithm.
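As a rough illustration of two of the preprocessing steps named in the abstract, the sketch below applies min-max scaling and random over-sampling to a handful of made-up records. It is not the paper's pipeline: the abstract does not say which scaler or over-sampling method was used, so both choices, the helper names min_max_scale and random_oversample, and the toy values are assumptions made for this example only.

```python
import random
from collections import Counter


def min_max_scale(rows):
    """Scale each feature column to the [0, 1] range."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        [
            (value - lo) / (hi - lo) if hi > lo else 0.0
            for value, lo, hi in zip(row, lows, highs)
        ]
        for row in rows
    ]


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of smaller classes until every
    class has as many rows as the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, count in counts.items():
        pool = [row for row, label in zip(rows, labels) if label == cls]
        for _ in range(target - count):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


# Made-up toy records (glucose, BMI); NOT taken from the Pima dataset.
toy_rows = [[85, 26.6], [89, 28.1], [116, 25.6], [78, 31.0], [183, 23.3], [137, 43.1]]
toy_labels = [0, 0, 0, 0, 1, 1]

scaled_rows = min_max_scale(toy_rows)
balanced_rows, balanced_labels = random_oversample(scaled_rows, toy_labels)
print(Counter(balanced_labels))
```

In a real evaluation, over-sampling is normally applied to the training split only, so that duplicated records do not leak into the test data.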
License
Copyright (c) 2025 Journal of Soft Computing and Data Mining

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.









