Diabetes Prediction Through Classification Using Pima Dataset: Survey and Evaluation

Ahmad Abu-Shareha; Mosleh M. Abualhaj; Mohammad A. Alsharaiah; Adeeb Al-Saaidah; Anusha Achuthan

Authors

Ahmad Abu-Shareha, Al-Ahliyya Amman University
Mosleh M. Abualhaj, Al-Ahliyya Amman University
Mohammad A. Alsharaiah, Al-Ahliyya Amman University
Adeeb Al-Saaidah, Al-Ahliyya Amman University
Anusha Achuthan, Universiti Sains Malaysia

Keywords:

Diabetes, Prediction, Diagnosis, Prognosis, Dataset, Classification, Evaluation, Pima

Abstract

Diabetes prediction using machine learning techniques has been extensively investigated in the literature, resulting in diverse prediction and evaluation approaches. This diversity often makes comparisons between approaches inconsistent. This paper offers a comprehensive survey of state-of-the-art diabetes prediction through classification, covering various processing techniques and machine learning methods. The primary objectives of this paper are as follows: 1) Analyzing the performance of existing machine learning methods and the advancement of diabetes prediction outcomes over time. 2) Establishing a baseline for evaluating and benchmarking different diabetes prediction approaches. 3) Assessing the performance of common machine learning methods. 4) Proposing future research directions to further improve diabetes prognosis methodologies. 5) Providing state-of-the-art results for the performance of common machine learning methods on the Pima dataset. The review shows wide variation in the existing prediction and evaluation approaches. The results of the proposed evaluation approach show that scaling, feature selection, and over-sampling improve the results of the prediction approaches. In addition, the results of the machine learning techniques varied, with the best results mostly achieved by the random forest algorithm.
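To make two of the preprocessing steps named in the abstract concrete, the following is a minimal sketch of feature scaling and over-sampling in plain Python. It is an illustration only, not the authors' implementation: the abstract does not say which scaling or over-sampling variants were used, so min-max scaling and random duplication of minority-class records are assumptions here, and the helper names (min_max_scale, random_oversample) and the toy records are hypothetical.

```python
import random
from collections import Counter


def min_max_scale(rows):
    """Rescale each feature column to [0, 1]; constant columns map to 0.0."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until classes are balanced."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


# Toy records (made up for illustration): two features each, label 1 = positive.
X = [[85.0, 22.0], [90.0, 25.0], [100.0, 30.0], [110.0, 28.0], [160.0, 40.0]]
y = [0, 0, 0, 0, 1]

X_scaled = min_max_scale(X)
X_bal, y_bal = random_oversample(X_scaled, y)
print(Counter(y_bal))
```

In a real evaluation, over-sampling would normally be applied to the training split only, so that duplicated records do not leak into the test data.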