A Fast Algorithm to Initialize Cluster Centroids in Fuzzy Clustering Applications


CEBECİ Z., CEBECİ Ç.

Information, vol.11, no.9, pp.446, 2020 (ESCI)

  • Publication Type: Article / Full Article
  • Volume: 11 Issue: 9
  • Publication Date: 2020
  • DOI: 10.3390/info11090446
  • Journal Name: Information
  • Journal Indexes: Emerging Sources Citation Index (ESCI), Scopus, Aerospace Database, Communication Abstracts, Compendex, INSPEC, Metadex, Directory of Open Access Journals, Civil Engineering Abstracts
  • Page Numbers: pp.446
  • Keywords: prototype-based clustering, partitioning, fuzzy clustering, soft clustering, initialization of centroids, FCM, C-MEANS, VALIDITY INDEX
  • Çukurova University Affiliated: Yes

Abstract

The goal of partitioning clustering analysis is to divide a dataset into a predetermined number of homogeneous clusters. The quality of the final clusters from a prototype-based partitioning algorithm is highly affected by the initially chosen centroids. In this paper, we propose InoFrep, a novel data-dependent initialization algorithm for improving computational efficiency and robustness in prototype-based hard and fuzzy clustering. InoFrep is a single-pass algorithm that uses the frequency polygon data of the feature with the highest peak count in a dataset. Using the Fuzzy C-means (FCM) clustering algorithm, we empirically compare the performance of InoFrep on one synthetic and six real datasets with that of two common initialization methods: random sampling of data points and K-means++. Our results show that the InoFrep algorithm significantly reduces the number of iterations and the computing time required by the FCM algorithm. Additionally, it can be applied to large multidimensional datasets because of its shorter initialization time and its independence from dimensionality, since it works with only the single feature that has the highest number of peaks.
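
The idea of seeding centroids from the peaks of one feature's frequency polygon can be illustrated with a minimal sketch. This is not the authors' InoFrep implementation: the function names `find_peak_bins` and `init_centroids_by_peaks`, the default of 20 bins, the simple local-maximum test, and the rule of taking the data row nearest to each peak midpoint are all assumptions made here for illustration.

```python
import numpy as np


def find_peak_bins(counts):
    """Indices of local maxima in a vector of bin counts (hypothetical helper)."""
    # Pad with zeros so that a maximum in the first or last bin is detected.
    padded = np.concatenate(([0], counts, [0]))
    rises = padded[1:-1] > padded[:-2]    # strictly higher than the left neighbour
    holds = padded[1:-1] >= padded[2:]    # not lower than the right neighbour
    return np.flatnonzero(rises & holds)


def init_centroids_by_peaks(X, c, bins=20):
    """Illustrative sketch only: seed up to c centroids from the feature
    whose histogram has the most peaks. Not the published algorithm."""
    X = np.asarray(X, dtype=float)
    best = None
    for j in range(X.shape[1]):
        counts, edges = np.histogram(X[:, j], bins=bins)
        peaks = find_peak_bins(counts)
        if best is None or len(peaks) > len(best[1]):
            mids = (edges[:-1] + edges[1:]) / 2   # bin midpoints
            best = (j, peaks, counts, mids)
    j, peaks, counts, mids = best
    # Keep the c peaks with the largest frequencies (fewer if fewer exist).
    top = peaks[np.argsort(counts[peaks])[::-1][:c]]
    # Assumption: use the data row closest to each peak midpoint on feature j.
    rows = [int(np.argmin(np.abs(X[:, j] - mids[p]))) for p in top]
    return X[rows], j


# Usage: two well-separated groups on feature 0, a constant feature 1.
X = np.array([[0.0, 1.0]] * 5 + [[10.0, 1.0]] * 5)
centroids, feature = init_centroids_by_peaks(X, c=2)
```

The returned rows could then be passed as starting centroids to any FCM or K-means implementation that accepts user-supplied initial prototypes. Note that when the selected feature has fewer than `c` peaks, this sketch returns fewer than `c` centroids, so a fallback would be needed in practice.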