Deep Learning-Based Prediction Models for the Detection of Vitamin D Deficiency and 25-Hydroxyvitamin D Levels Using Complete Blood Count Tests


Eşsiz U. E., Aci Ç. İ., Saraç E., Aci M.

Romanian Journal of Information Science and Technology, vol.27, no.3-4, pp.295-309, 2024 (SCI-Expanded)

  • Publication Type: Article / Article
  • Volume: 27 Issue: 3-4
  • Publication Date: 2024
  • DOI Number: 10.59277/romjist.2024.3-4.04
  • Journal Name: Romanian Journal of Information Science and Technology
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus
  • Page Numbers: pp.295-309
  • Keywords: 25(OH)D level, classification, deep learning, feature selection, prediction, vitamin D deficiency
  • Çukurova University Affiliated: Yes

Abstract

Vitamin D (VitD) is an essential nutrient that is critical for the well-being of both adults and children, and its deficiency is recognized as a precursor to several diseases. In previous studies, researchers have approached the problem of detecting vitamin D deficiency (VDD) as a single “sufficient/deficient” classification problem using machine learning or statistics-based methods. The main objective of this paper is to predict a patient’s VitD status (i.e., sufficiency, insufficiency, or deficiency), severity of VDD (i.e., mild, moderate, or severe), and 25-hydroxyvitamin D (25(OH)D) level in separate deep learning (DL)-based models. An original dataset consisting of complete blood count (CBC) tests from 907 patients, including 25(OH)D concentrations, collected from a public health laboratory was used for this purpose. CNN, RNN, LSTM, GRU, and Auto-encoder algorithms were used to develop the DL-based models. The top 25 features in the CBC tests were selected using the Extra Trees Classifier and Multi-task LASSO feature selection algorithms. The performance of the models was evaluated using metrics such as accuracy, F1-score, mean absolute error, root mean square error, and R-squared. All three models showed satisfactory results when compared to the existing literature; however, the CNN-based prediction models proved to be the most successful.
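
The regression metrics named in the abstract (mean absolute error, root mean square error, and R-squared) can be illustrated with a minimal sketch. This is not the authors' code: the function name `regression_metrics` and the sample values are hypothetical, chosen only to show how the three quantities are computed from measured and predicted 25(OH)D levels.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MAE, RMSE and R-squared for two equal-length sequences."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")

    errors = [t - p for t, p in zip(y_true, y_pred)]

    # Mean absolute error: average magnitude of the errors.
    mae = sum(abs(e) for e in errors) / n

    # Root mean square error: square root of the mean squared error.
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)

    # R-squared: 1 - (residual sum of squares / total sum of squares).
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1 - ss_res / ss_tot if ss_tot > 0 else float("nan")

    return {"mae": mae, "rmse": rmse, "r2": r2}


# Hypothetical measured and predicted values, not data from the paper.
measured = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 37.0]
metrics = regression_metrics(measured, predicted)
```

Lower MAE and RMSE and an R-squared closer to 1 indicate predictions that track the measured levels more closely.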