A Novel Comparative Approach: Logistic Regression Enhanced by Bat Optimization Versus Logistic Regression Enhanced by Deep Belief Network for Remote Homologous Protein Detection


Gemci F., Ibrikci T., ÇEVİK U.

IEEE Access, vol.13, pp.209723-209728, 2025 (SCI-Expanded, Scopus)

  • Publication Type: Article / Full Article
  • Volume: 13
  • Publication Date: 2025
  • DOI: 10.1109/access.2025.3641298
  • Journal Name: IEEE Access
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Compendex, INSPEC, Directory of Open Access Journals
  • Page Numbers: pp.209723-209728
  • Keywords: Bat algorithm, deep belief network, imbalanced data, logistic regression, protein remote homology, SMOTE-Tomek
  • Affiliated with Çukurova University: Yes

Abstract

Identifying remote homologous proteins is an important problem in computational biology. An experimental study was conducted to address it using machine learning and natural language processing algorithms. The SCOP 1.53 dataset, which contains 54 families, was used. Two new designs were developed in this study. As a preprocessing step, numerical features were extracted from protein sequences using the TF-IDF vectorization method. Then, data augmentation was performed using the SMOTE-Tomek algorithm. The same preprocessing steps were used in both methods. The first method is a two-stage classifier combining Logistic Regression and a Deep Belief Network (LR-DBN), which achieved an average accuracy of 77% and an F1 score of 75%. The second is a Logistic Regression classifier with Bat optimization (LR-B), which achieved an average accuracy of 84% and an F1 score of 86%. LR-B with the SMOTE-Tomek method performed best, with an ROC-AUC score of 89%. Although LR-DBN with the SMOTE-Tomek method performed slightly worse than LR-B with the SMOTE-Tomek method, it still performed well in detecting remote homologous proteins.
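
The TF-IDF preprocessing step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say how protein sequences were tokenized or which TF-IDF variant was used, so the following are assumptions made for illustration only: overlapping 3-mers are treated as "words", the IDF is the smoothed form ln((1 + N) / (1 + df)) + 1, and each vector is L2-normalized. The function names `kmers` and `tfidf_matrix` and the toy sequences are hypothetical and do not come from the paper.

```python
import math
from collections import Counter


def kmers(sequence, k=3):
    """Split a protein sequence into overlapping k-mers (assumed tokenization)."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def tfidf_matrix(sequences, k=3):
    """Return (vocabulary, rows); each row is an L2-normalized TF-IDF vector."""
    docs = [Counter(kmers(seq, k)) for seq in sequences]
    vocabulary = sorted(set().union(*docs))
    n_docs = len(docs)
    # Document frequency: number of sequences that contain each k-mer.
    df = Counter(term for doc in docs for term in doc)
    # Smoothed inverse document frequency (an assumed variant).
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1 for t in vocabulary}

    rows = []
    for doc in docs:
        # Term frequency is the raw k-mer count within the sequence.
        weights = [doc[t] * idf[t] for t in vocabulary]
        norm = math.sqrt(sum(w * w for w in weights)) or 1.0
        rows.append([w / norm for w in weights])
    return vocabulary, rows


# Made-up toy sequences, not taken from SCOP 1.53.
toy_sequences = ["MKVLAAGIV", "MKVLSAGIV", "GSHMTTPLE"]
vocab, X = tfidf_matrix(toy_sequences, k=3)
```

In this sketch, a 3-mer shared by several sequences (such as `MKV`) receives a lower weight than one that appears in a single sequence (such as `LAA`), which is the property that makes TF-IDF features useful for separating families. The resulting rows could then be passed to a resampling step and a classifier, as the abstract describes.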