Identifying Genes Related to Diabetes Mellitus Using Penalized Logistic Regression

Authors

  • Masithoh Yessi Rochayani Department of Statistics, Universitas Diponegoro, Semarang, 50275, INDONESIA
  • Arief Rachman Hakim Department of Statistics, Universitas Diponegoro, Semarang, 50275, INDONESIA
  • Sugito Department of Statistics, Universitas Diponegoro, Semarang, 50275, INDONESIA

Keywords:

Diabetes Mellitus, gene expression, penalized logistic regression, Lasso

Abstract

Identification of genes associated with Diabetes Mellitus is important for the early detection of this disease. This study aims to find potential genes related to type 2 diabetes mellitus (T2DM). The dataset used was GSE25462, and the method used was penalized logistic regression, specifically the Lasso. The top eight selected genes were ABRA, EVX1, MIR7-3HG, SAYSD1, SLC26A1, SRGAP3, WFDC1, and 240244_at. On the training data, the model with 8 genes reaches an accuracy and a kappa of 1. However, when the model is applied to the testing data, the maximum accuracy is 0.9 and the maximum kappa is 0.615, both obtained with models containing 14 genes. This gap occurred because the dataset lacked samples of the positive class. Ensemble learning methods are recommended for combining the predictive results. The role in T2DM of some of the genes we found remains unclear, and biology researchers can study it further.
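The abstract names Lasso-penalized logistic regression for gene selection, with accuracy and Cohen's kappa for evaluation. The sketch below illustrates one common way to fit such a model: proximal gradient descent (ISTA), where an L1 soft-thresholding step drives uninformative coefficients to exactly zero. It uses plain Python on synthetic data. The solver, the step size, the penalty value `lam`, and the toy data are illustrative assumptions. They are not the authors' implementation, their tuning, or the GSE25462 expression data.

```python
import math
import random

def soft_threshold(z, t):
    """Proximal operator of t*|z|: shrink z toward zero by t (exact zero if |z| <= t)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def sigmoid(u):
    """Numerically stable logistic function."""
    if u >= 0:
        return 1.0 / (1.0 + math.exp(-u))
    e = math.exp(u)
    return e / (1.0 + e)

def fit_lasso_logistic(X, y, lam, step=0.1, iters=2000):
    """Minimise mean logistic loss + lam * ||beta||_1 by proximal gradient (ISTA).
    The intercept b0 is not penalised."""
    n, p = len(X), len(X[0])
    b0, beta = 0.0, [0.0] * p
    for _ in range(iters):
        # Gradient of the average negative log-likelihood.
        g0, g = 0.0, [0.0] * p
        for xi, yi in zip(X, y):
            r = sigmoid(b0 + sum(bj * xj for bj, xj in zip(beta, xi))) - yi
            g0 += r / n
            for j in range(p):
                g[j] += r * xi[j] / n
        # Plain gradient step for the intercept, proximal (soft-threshold) step for genes.
        b0 -= step * g0
        beta = [soft_threshold(beta[j] - step * g[j], step * lam) for j in range(p)]
    return b0, beta

def predict(X, b0, beta):
    """Class 1 when the linear predictor is >= 0 (probability >= 0.5)."""
    return [1 if b0 + sum(bj * xj for bj, xj in zip(beta, xi)) >= 0 else 0 for xi in X]

def accuracy_and_kappa(y_true, y_pred):
    """Observed agreement p_o and Cohen's kappa (p_o - p_e) / (1 - p_e)."""
    n = len(y_true)
    po = sum(1 for t, q in zip(y_true, y_pred) if t == q) / n
    # Chance agreement p_e from the marginal proportions of class 1.
    pt1, pp1 = sum(y_true) / n, sum(y_pred) / n
    pe = pt1 * pp1 + (1 - pt1) * (1 - pp1)
    kappa = (po - pe) / (1 - pe) if pe < 1 else float("nan")  # undefined if p_e == 1
    return po, kappa

# Hypothetical synthetic data (NOT GSE25462): 40 samples, 10 "genes";
# only genes 0 and 1 carry signal, the other eight are pure noise.
random.seed(0)
X, y = [], []
for i in range(40):
    label = i % 2
    row = [random.gauss(0, 1) for _ in range(10)]
    row[0] += 2.0 if label else -2.0
    row[1] -= 2.0 if label else -2.0
    X.append(row)
    y.append(label)

b0, beta = fit_lasso_logistic(X, y, lam=0.1)
selected = [j for j, b in enumerate(beta) if b != 0.0]
acc, kappa = accuracy_and_kappa(y, predict(X, b0, beta))
print("selected genes:", selected)
print("training accuracy:", acc, "kappa:", round(kappa, 3))
```

Genes whose coefficients are shrunk exactly to zero drop out of the model, and a larger `lam` keeps fewer genes. A perfect score on the training data does not guarantee similar performance on held-out data, which matches the gap between training and testing results that the abstract reports.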

Published

03-10-2023

Section

Articles

How to Cite

Rochayani, M. Y., Hakim, A. R., & Sugito. (2023). Identifying Genes Related to Diabetes Mellitus Using Penalized Logistic Regression. Journal of Soft Computing and Data Mining, 4(2), 35-42. https://penerbit.uthm.edu.my/ojs/index.php/jscdm/article/view/15264