An Ensemble Learning Model for Multi-Type Cancer Prediction in Clinical Diagnostic Decision Support Systems
Keywords:
Machine learning (ML), ensemble learning (EL), Decision Support System (DSS), multi-cancer diagnosis, breast cancer, liver cancer, cervical cancer, brain cancer
Abstract
Cancer can affect many parts of the human body, and its effects lead to a wide range of health conditions. Identifying the type of cancer at an early stage is critically important because it enables timely clinical management of the patient, and it has therefore been a persistent focus of research. Machine learning (ML) and deep learning (DL) algorithms are widely used to identify cancer cases from data collected across medical, biomedical, and bioinformatics sources. These algorithms can reveal important features and detect cancer cases within complex cancer datasets. In this paper, an ensemble learning method that combines five widely used ML and DL algorithms is proposed for constructing an ensemble-based Multi-Type Cancer Prediction (eMTCP) model. The algorithms used to build the eMTCP are Naive Bayes (NB), Random Forest (RF), Support Vector Machine (SVM), Convolutional Neural Network (CNN), and Long Short-Term Memory (LSTM). The eMTCP is deployed to develop a Cancer Diagnostic Clinical Decision Support System (CDCDSS). Four cancer diagnostic datasets (liver, breast, brain, and cervical cancer) are then used to test the performance of the eMTCP model. The eMTCP stacked ensemble is the most promising approach, yielding the highest F1-scores on all four datasets: breast (0.979), liver (0.765), brain (0.898), and cervical (0.482). CNN and LSTM were stable, with F1-scores of 0.957 (breast), 0.669 (liver), and 0.853 (brain); CNN had the edge on liver cancer and performed similarly on the other cancer types. The SVM model is reported as the weakest, with F1-scores of 0.969 (breast), 0.716 (liver), 0.844 (brain), and only 0.153 (cervical), indicating that it is not very reliable on the clinical-only dataset.
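To make the idea of a stacked ensemble concrete, the following is a minimal sketch and not the authors' eMTCP implementation. It assumes scikit-learn and uses its bundled Wisconsin breast cancer dataset, which may differ from the breast cancer dataset used in the paper. NB, RF, and SVM are used as base learners; a small multilayer perceptron (`MLPClassifier`) is a stand-in for the CNN and LSTM, which scikit-learn does not provide. The logistic-regression meta-learner, the train/test split, and all hyperparameters are likewise assumptions chosen for illustration.

```python
# Illustrative sketch of a stacked ensemble; NOT the paper's eMTCP code.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Bundled tabular dataset (assumption: not necessarily the paper's data).
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Base learners. The MLP is a stand-in for the CNN/LSTM components.
base_learners = [
    ("nb", GaussianNB()),
    ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
    ("svm", make_pipeline(StandardScaler(),
                          SVC(probability=True, random_state=0))),
    ("mlp", make_pipeline(StandardScaler(),
                          MLPClassifier(hidden_layer_sizes=(32,),
                                        max_iter=1000, random_state=0))),
]

# The meta-learner is trained on out-of-fold predictions of the base learners.
stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)
stack.fit(X_train, y_train)

# In this dataset, malignant tumours are encoded as class 0.
f1 = f1_score(y_test, stack.predict(X_test), pos_label=0)
print(f"F1 (malignant class): {f1:.3f}")
```

Stacking differs from simple voting in that the meta-learner learns how much to trust each base model, which is one plausible reason an ensemble can outperform its individual members across heterogeneous datasets.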
License
Copyright (c) 2025 Journal of Soft Computing and Data Mining

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.









