An Ensemble Learning Model for Multi-Type Cancer Prediction in Clinical Diagnostic Decision Support Systems

Salama A. Mostafa; Dhafar Fakhry Hasan; Maha A. Abdul-Jabar; Aida Mustapha

Authors

Salama A. Mostafa Department of Artificial Intelligence Engineering Techniques, College of Technical Engineering, Alnoor University, Mosul https://orcid.org/0000-0001-5348-502X
Dhafar Fakhry Hasan Computer Unit, College of Medicine, University of Mosul
Maha A. Abdul-Jabar Networks and Cybersecurity Department, Hourani Center for Applied Scientific Research, Al-Ahliyya Amman University
Aida Mustapha Centre for Artificial Intelligence and Data Science, Universiti Malaysia Pahang Al-Sultan Abdullah

Keywords:

Machine learning (ML), ensemble learning (EL), Decision Support System (DSS), multi-cancer diagnosis, breast cancer, liver cancer, cervical cancer, brain cancer

Abstract

Cancer can affect many parts of the human body, producing different effects and a wide range of health conditions. Diagnosing the type of cancer at an early stage is critically important because it enables timely clinical management of the patient, and early diagnosis has therefore been a central concern of related research. Machine Learning (ML) and Deep Learning (DL) algorithms are widely used to identify cancer cases from data drawn from various fields, including medicine, biomedicine, and bioinformatics. These algorithms can reveal important features and detect cancer cases within complex cancer datasets. This paper proposes an ensemble learning method that combines five widely used ML and DL algorithms to construct an ensemble-based Multi-Type Cancer Prediction (eMTCP) model. The algorithms used to build the eMTCP are Naive Bayes (NB), Random Forest (RF), Support Vector Machines (SVM), Convolutional Neural Networks (CNNs), and Long Short-Term Memory (LSTM). The eMTCP is deployed to develop a Cancer Diagnostic Clinical Decision Support System (CDCDSS). Four cancer diagnostic datasets, covering liver, breast, brain, and cervical cancer, are used to test the performance of the eMTCP model. The eMTCP (stacked ensemble) is the best-performing model, yielding the highest F1-scores on all datasets: breast (0.979), liver (0.765), brain (0.898), and cervical (0.482), which indicates better performance in multi-type cancer prediction. CNN and LSTM were highly stable, with strong F1-scores of 0.957 (breast), 0.669 (liver), and 0.853 (brain). CNN performed better on liver cancer and showed similar performance on the other cancer types. The SVM model is reported as the lowest-scoring model, with 0.969 (breast), 0.716 (liver), 0.844 (brain), and only 0.153 (cervical), indicating that it is not very reliable on the clinical-only dataset.
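
The stacking idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' eMTCP implementation: it assumes scikit-learn is available, stacks only the three classical learners named above (NB, RF, SVM) and leaves out CNN and LSTM to stay short, uses a logistic-regression meta-learner as an assumed choice, and runs on scikit-learn's bundled Wisconsin breast cancer dataset, which may differ from the breast cancer dataset used in the paper. The function names `build_stacked_ensemble` and `evaluate` are hypothetical.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def build_stacked_ensemble(random_state=0):
    """Stack NB, RF and SVM under a logistic-regression meta-learner.

    Assumption: the meta-learner and all hyperparameters are illustrative
    choices, not values taken from the paper.
    """
    base_learners = [
        ("nb", GaussianNB()),
        ("rf", RandomForestClassifier(n_estimators=100, random_state=random_state)),
        # SVMs are scale-sensitive, so standardise features inside the pipeline.
        ("svm", make_pipeline(StandardScaler(), SVC())),
    ]
    # cv=5: the meta-learner is trained on out-of-fold base-learner outputs,
    # which avoids leaking training labels into the second stage.
    return StackingClassifier(
        estimators=base_learners,
        final_estimator=LogisticRegression(max_iter=1000),
        cv=5,
    )


def evaluate(random_state=0):
    """Fit the ensemble on a stratified split and return the held-out F1-score."""
    X, y = load_breast_cancer(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=random_state
    )
    model = build_stacked_ensemble(random_state)
    model.fit(X_train, y_train)
    return f1_score(y_test, model.predict(X_test))


if __name__ == "__main__":
    print(f"Stacked-ensemble F1 on the held-out split: {evaluate():.3f}")
```

The score printed by this sketch is not comparable to the F1-scores reported above, since the data, preprocessing, and base learners differ. Its purpose is only to show how base-model predictions feed a second-stage learner.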

Author Biography

Salama A. Mostafa, Department of Artificial Intelligence Engineering Techniques, College of Technical Engineering, Alnoor University, Mosul

SALAMA A. MOSTAFA received a B.Sc. degree in Computer Science from the University of Mosul, Iraq, in 2003 and M.Sc. and Ph.D. degrees in Information and Communication Technology (ICT) from Universiti Tenaga Nasional (UNITEN), Malaysia, in 2011 and 2016, respectively. He is the former Head of the Center of Intelligent and Autonomous Systems (CIAS) at Universiti Tun Hussein Onn Malaysia (UTHM). He is currently the Chief Editor of the Journal of Soft Computing and Data Mining (JSCDM). Stanford University and Elsevier selected him as one of the World's Top 2% Scientists for 2022-2024. He has produced more than 222 Scopus-indexed articles published in journals, books, and conference proceedings. His current H-index in Scopus is 40. He has completed 14 industrial projects and 23 research projects. His specialization and research interests are in Artificial Intelligence, including machine learning, deep learning, soft computing, autonomous agents, adjustable autonomy, human-computer collaboration, optimization, computer vision, and their integration into various automated or autonomous systems.