Fuzzy C-Means based DNA motif discovery


KARABULUT M., İBRİKÇİ T.

4th International Conference on Intelligent Computing, Shanghai, China, 15 - 18 September 2008, vol.5226, pp.189-190

  • Volume: 5226
  • DOI: 10.1007/978-3-540-87442-3_24
  • City of Publication: Shanghai
  • Country of Publication: China
  • Pages: pp.189-190

Abstract

In this paper, we examine the problem of identifying motifs in DNA sequences. Transcription factor binding sites, which are functionally significant subsequences, are treated as motifs. To reveal such DNA motifs, our method uses fuzzy clustering of Position Weight Matrices. The Fuzzy C-Means (FCM) algorithm clearly predicted known motifs in the intergenic regions of the GAL4, CBF1 and GCN4 DNA sequences. The paper also compares FCM with other clustering methods such as the Self-Organizing Map and K-Means. The results of the FCM algorithm are additionally compared with those of the popular motif discovery tool Multiple Expectation Maximization for Motif Elicitation (MEME). We conclude that soft-clustering-based machine learning methods such as FCM are useful for finding patterns in biological sequences.
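
As an illustration of the soft-clustering idea, the following is a minimal sketch of the standard Fuzzy C-Means update rules applied to one-hot encoded k-mers. It is not the authors' implementation: the abstract does not describe the encoding or parameters, so the one-hot representation, the fuzzifier m = 2, the toy k-mers, and the names `one_hot` and `fuzzy_c_means` are all assumptions made for this example.

```python
import numpy as np

def one_hot(kmer):
    """Encode a DNA k-mer as a flat 4*k vector (columns in A, C, G, T order)."""
    idx = {"A": 0, "C": 1, "G": 2, "T": 3}
    vec = np.zeros((len(kmer), 4))
    for pos, base in enumerate(kmer):
        vec[pos, idx[base]] = 1.0
    return vec.ravel()

def fuzzy_c_means(X, c, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Standard FCM: returns (centers, U), where U[i, j] is the degree
    to which sample i belongs to cluster j (each row sums to 1)."""
    rng = np.random.default_rng(seed)
    # Random initial memberships, normalized per sample.
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        # Cluster centers: means of the samples weighted by U**m.
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        # Euclidean distance from every sample to every center.
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        dist = np.fmax(dist, 1e-12)  # avoid division by zero
        # Membership update: u_ij proportional to d_ij ** (-2 / (m - 1)).
        inv = dist ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U

# Toy input (made up for this sketch): two groups of similar 6-mers.
kmers = ["TGACTC", "TGACTC", "TGACTA", "CACGTG", "CACGTG", "CACGTT"]
X = np.array([one_hot(k) for k in kmers])
centers, U = fuzzy_c_means(X, c=2)
labels = U.argmax(axis=1)  # hard assignment, for inspection only
```

Under this encoding, each row of `centers` can be reshaped to `(6, 4)` and read as a per-position base profile, which is loosely analogous to a position weight matrix. Unlike K-Means, every k-mer keeps a graded membership in each cluster rather than a single hard label.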