Predicting COVID-19 Infection Using Machine Learning Methods Combined with Feature Selection

Çetin U. A., Abut F.

5th International Symposium on Innovative Approaches in Smart Technologies, Ankara, Türkiye, 28 - 29 May 2022, p. 38, (Abstract)

Publication Type: Conference Paper / Abstract
City of Publication: Ankara
Country of Publication: Türkiye
Pages: p. 38
Affiliated with Çukurova University: Yes

Abstract

COVID-19 is an infection that has affected the world since December 31, 2019, and was declared a pandemic by the WHO in March 2020. The COVID-19 pandemic has infected 465 million people and claimed more than 6 million lives. In this study, Multi-Layer Perceptron (MLP), Tree Boost (TB), Radial Basis Function Network (RBF), Support Vector Machine (SVM), and K-Means Clustering (kMC), each combined with minimum redundancy maximum relevance (mRMR) and Relief-F, have been used to construct new feature selection-based COVID-19 prediction models and to identify the variables that are most influential in predicting COVID-19 infection. The dataset contains records for 20,000 patients (10,000 positive, 10,000 negative) and includes the following variables: age, sex, race, pregnancy, fever, breathing difficulty, cough, runny nose, throat pain, diarrhea, headache, lung comorbidity, cardio comorbidity, renal comorbidity, diabetes comorbidity, smoking comorbidity, and obesity comorbidity. Accuracy, recall, and F1-score have been used to assess the models’ performance, and the generalization errors of the models were estimated using 10-fold cross-validation. The results show that the average performance of mRMR is slightly better than that of Relief-F in predicting whether a patient is infected with COVID-19. In addition, mRMR is more successful than Relief-F in finding the relative relevance order of the COVID-19 predictors. The mRMR algorithm emphasizes symptomatic variables such as fever and cough, whereas the Relief-F algorithm emphasizes non-symptomatic variables such as age and race. It has also been observed that, in general, MLP outperforms all other ML classifiers in predicting COVID-19 infection.
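The evaluation protocol named in the abstract (10-fold cross-validation scored by accuracy, recall, and F1-score) can be illustrated with a minimal sketch. The abstract does not say which tools were used or how the folds were formed, so everything below is an assumption made for illustration: the function names `k_fold_indices` and `binary_metrics` are hypothetical, the folds are contiguous and unshuffled, and label 1 is taken to mean a positive case.

```python
from typing import Dict, List, Sequence, Tuple


def k_fold_indices(n_samples: int, k: int = 10) -> List[Tuple[List[int], List[int]]]:
    """Return k (train, test) index pairs covering 0..n_samples-1.

    Hypothetical helper: folds are contiguous and unshuffled. A real
    experiment would usually shuffle or stratify the records first.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    base, extra = divmod(n_samples, k)
    folds = []
    start = 0
    for i in range(k):
        # The first `extra` folds take one additional sample each.
        size = base + (1 if i < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        folds.append((train, test))
        start += size
    return folds


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute accuracy, recall, and F1-score, treating label 1 as positive."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "recall": recall, "f1": f1}
```

With 20,000 records and k = 10, each fold holds out 2,000 records for testing and trains on the remaining 18,000. The per-fold metrics would then be averaged to estimate generalization performance.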