Fuzzy C-Means based DNA motif discovery


KARABULUT M., İBRİKÇİ T.

4th International Conference on Intelligent Computing, Shanghai, China, 15 - 18 September 2008, vol.5226, pp.189-190

  • Publication Type: Conference Paper / Full Text
  • Volume: 5226
  • Doi Number: 10.1007/978-3-540-87442-3_24
  • City: Shanghai
  • Country: China
  • Page Numbers: pp.189-190

Abstract

In this paper, we examine the problem of identifying motifs in DNA sequences. Transcription factor binding sites, which are functionally significant subsequences, are treated as motifs. To reveal such DNA motifs, our method uses fuzzy clustering of Position Weight Matrices. The Fuzzy C-Means (FCM) algorithm clearly predicted known motifs in the intergenic regions of the GAL4, CBF1 and GCN4 DNA sequences. The paper also compares FCM with other clustering methods such as Self-Organizing Map and K-Means. The results of the FCM algorithm are also compared with those of the popular motif discovery tool Multiple Expectation Maximization for Motif Elicitation (MEME). We conclude that soft-clustering-based machine learning methods such as FCM are useful for finding patterns in biological sequences.
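
To make the soft-clustering idea concrete, the following is a minimal sketch of the standard Fuzzy C-Means update rules applied to DNA k-mers. It is not the authors' implementation: the abstract does not state how sequences are encoded or how the Position Weight Matrix is built, so the one-hot encoding, the toy k-mers, the fuzzifier m = 2 and the helper names `one_hot`, `sq_dist` and `fuzzy_c_means` are all assumptions made for illustration.

```python
import random

BASES = "ACGT"

def one_hot(kmer):
    """Encode a k-mer as a flat 4*k vector (assumed encoding, not from the paper)."""
    vec = []
    for base in kmer:
        vec.extend(1.0 if base == b else 0.0 for b in BASES)
    return vec

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def fuzzy_c_means(points, c, m=2.0, iters=100, tol=1e-6, seed=0):
    """Standard FCM. Returns (centers, u) where u[i][j] is the membership
    of point i in cluster j and each row of u sums to 1."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships, each row normalised to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])
    centers = []
    for _ in range(iters):
        # Center update: mean of the points weighted by u_ij^m.
        centers = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            total = sum(w)
            centers.append(
                [sum(w[i] * points[i][d] for i in range(n)) / total
                 for d in range(dim)]
            )
        # Membership update: u_ij proportional to (d_ij^2)^(-1/(m-1)).
        new_u = []
        for p in points:
            d2 = [max(sq_dist(p, ctr), 1e-12) for ctr in centers]
            inv = [v ** (-1.0 / (m - 1.0)) for v in d2]
            total = sum(inv)
            new_u.append([v / total for v in inv])
        shift = max(abs(new_u[i][j] - u[i][j])
                    for i in range(n) for j in range(c))
        u = new_u
        if shift < tol:
            break
    return centers, u

# Toy input (hypothetical k-mers, not data from the paper).
kmers = ["ACGT", "ACGT", "ACGA", "TTTT", "TTTA", "TTTT"]
centers, u = fuzzy_c_means([one_hot(k) for k in kmers], c=2)
for kmer, row in zip(kmers, u):
    print(kmer, [round(v, 3) for v in row])
```

Under this encoding each cluster center is a membership-weighted average of one-hot vectors, so the four values at every position sum to 1 and the center can be read as a position frequency matrix for that cluster. Unlike K-Means, every k-mer keeps a graded membership in each cluster rather than a single hard label.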