4th International Conference on Intelligent Computing, Shanghai, China, 15 - 18 September 2008, vol. 5226, pp. 189-190
In this paper, we examine the problem of identifying motifs in DNA sequences. Transcription factor binding sites, which are functionally significant subsequences, are treated as motifs. To reveal such DNA motifs, our method applies fuzzy clustering to position weight matrices. The Fuzzy C-Means (FCM) algorithm clearly predicted known motifs in the intergenic regions of GAL4, CBF1 and GCN4 DNA sequences. The paper also compares FCM with other clustering methods such as Self-Organizing Map and K-Means. The results of the FCM algorithm are also compared with those of the popular motif discovery tool Multiple Expectation Maximization for Motif Elicitation (MEME). We conclude that soft-clustering-based machine learning methods such as FCM are useful for finding patterns in biological sequences.
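To illustrate the general idea of soft clustering on sequence data, the following is a minimal sketch of the standard Fuzzy C-Means update rules applied to one-hot encoded k-mers. It is not the pipeline used in the paper: the encoding, the fuzzifier m = 2, the helper names (one_hot, fuzzy_c_means) and the six toy k-mers are all assumptions made for illustration. Each cluster center is a membership-weighted average of one-hot vectors, so when reshaped to (k, 4) every position sums to 1 and can be read as a position weight matrix.

```python
import numpy as np

ALPHABET = "ACGT"

def one_hot(kmer):
    """Encode a DNA k-mer as a flat one-hot vector of length 4 * k."""
    vec = np.zeros((len(kmer), 4))
    for i, base in enumerate(kmer):
        vec[i, ALPHABET.index(base)] = 1.0
    return vec.ravel()

def fuzzy_c_means(X, n_clusters, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Standard FCM: alternate center and membership updates until stable."""
    rng = np.random.default_rng(seed)
    # Random initial memberships; each row sums to 1.
    U = rng.random((X.shape[0], n_clusters))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        W = U ** m
        # Centers are membership-weighted means of the data points.
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        # Euclidean distance from every point to every center.
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        dist = np.maximum(dist, 1e-12)  # avoid division by zero
        inv = dist ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U

# Hypothetical toy k-mers, invented for this sketch (not data from the paper).
kmers = ["CGGAGC", "CGGAGT", "CGGCGC", "TGACTC", "TGACTA", "TGAGTC"]
X = np.array([one_hot(s) for s in kmers])

centers, U = fuzzy_c_means(X, n_clusters=2)
# Each center, reshaped to (k, 4), is a PWM-like matrix over A, C, G, T.
pwms = centers.reshape(2, len(kmers[0]), 4)
```

Unlike K-Means, each k-mer receives a graded membership in every cluster (the rows of U), which is what lets a soft-clustering method represent sites that only partially match a motif.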