Predicting COVID-19 Infection Using Machine Learning Methods Combined with Feature Selection

Çetin U. A., Abut F.

Avrupa Bilim ve Teknoloji Dergisi, vol.37, pp.52-58, 2022 (Peer-Reviewed Journal)


COVID-19 is an infectious disease that has affected the world since December 31, 2019, and was declared a pandemic by the WHO in March 2020. In this study, Multi-Layer Perceptron (MLP), Tree Boost (TB), Radial Basis Function Network (RBF), Support Vector Machine (SVM), and K-Means Clustering (kMC) have each been combined with the minimum redundancy maximum relevance (mRMR) and Relief-F feature selection algorithms to construct new COVID-19 prediction models and to identify the variables most influential in predicting COVID-19 infection. The dataset covers 20,000 patients (10,000 positive and 10,000 negative) and includes several personal, symptomatic, and non-symptomatic variables. Accuracy, recall, and F1-score have been used to assess the models’ performance, and the generalization errors of the models were estimated using 10-fold cross-validation. The results show that, on average, mRMR performs slightly better than Relief-F in predicting whether a patient has a COVID-19 infection. In addition, mRMR is more successful than Relief-F in ranking the COVID-19 predictors by relative relevance. The mRMR algorithm emphasizes symptomatic variables such as fever and cough, whereas the Relief-F algorithm highlights non-symptomatic variables such as age and race. It has also been observed that, in general, MLP outperforms all other classifiers in predicting COVID-19 infection.
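
The evaluation protocol named in the abstract (accuracy, recall, and F1-score under 10-fold cross-validation) can be illustrated with a minimal sketch in plain Python. This is not the authors' code: the function names (classification_metrics, k_fold_indices, cross_validate) and the fit_predict callback are hypothetical, and the folds here are shuffled but not stratified, since the abstract does not say how the folds were drawn.

```python
import random

def classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy, recall, and F1-score for a binary task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    accuracy = (tp + tn) / len(pairs)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "recall": recall, "f1": f1}

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs; each sample is tested exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train_idx, test_idx

def cross_validate(fit_predict, X, y, k=10, seed=0):
    """Average the three metrics over k folds.

    fit_predict(X_train, y_train, X_test) is a hypothetical callback that
    trains any classifier and returns predicted labels for X_test.
    """
    totals = {"accuracy": 0.0, "recall": 0.0, "f1": 0.0}
    for train_idx, test_idx in k_fold_indices(len(y), k, seed):
        preds = fit_predict(
            [X[i] for i in train_idx],
            [y[i] for i in train_idx],
            [X[i] for i in test_idx],
        )
        scores = classification_metrics([y[i] for i in test_idx], preds)
        for name in totals:
            totals[name] += scores[name] / k
    return totals
```

In a setup like the one described, fit_predict would wrap one classifier (for example an MLP) trained on the subset of variables chosen by mRMR or Relief-F, with the feature selection repeated inside each training fold so that the test fold does not influence which variables are kept.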