Clustering Analysis Using Ensemble Methods in Machine Learning

Tekeli E.

1st International Applied Statistics Congress (I. Uluslararası Uygulamalı İstatistik Kongresi), Tokat, Türkiye, 1-4 October 2020, p. 1

  • Publication Type: Conference Paper / Full-Text Paper
  • City of Publication: Tokat
  • Country of Publication: Türkiye
  • Pages: p. 1


In recent years, multiple classifier systems, also called ensemble systems, have attracted growing interest in the computational intelligence and machine learning communities. This interest is well deserved, as ensemble systems have proved effective and versatile across a wide variety of problem areas and real-world applications. Ensemble systems were originally developed to reduce variance and thereby improve the accuracy of automated decision-making systems; they have since been used successfully to address a variety of machine learning problems such as feature selection, confidence estimation, missing features, incremental learning, error correction, and imbalanced class data. This study provides an overview of ensemble systems, their properties, and how they can be applied to such a wide range of problems. Bagged clustering, an ensemble method, was applied using the e1071 package in the R programming language. Models were built and analyzed on a real data set using bagged clustering with k-means as the base method. For comparison, the data set was also clustered with plain k-means and with three different hierarchical methods. Clusters were estimated with each method, and clustering performance was measured with three different criteria. Bagged clustering performed best according to the Rand index and the classification rate, whereas average-linkage hierarchical clustering performed best according to the silhouette criterion. The findings were further supported by a Monte Carlo simulation.
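
To make one of the evaluation criteria named in the abstract concrete, the following is a minimal sketch of the Rand index: the fraction of point pairs on which two partitions agree, meaning both place the pair in the same cluster or both place it in different clusters. The abstract does not say whether the plain or the adjusted Rand index was used, so the plain form here is an assumption, and the function name `rand_index` and the example labels are hypothetical. The study itself worked in R; Python is used here only for illustration.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Plain (unadjusted) Rand index between two partitions of the same points.

    Assumption: the abstract does not specify the variant; this is the
    unadjusted form, which ranges from 0 to 1.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same points")
    if len(labels_a) < 2:
        raise ValueError("at least two points are needed to form a pair")

    agreements = 0
    total_pairs = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_in_a = labels_a[i] == labels_a[j]
        same_in_b = labels_b[i] == labels_b[j]
        # A pair counts as an agreement when both partitions treat it alike.
        if same_in_a == same_in_b:
            agreements += 1
        total_pairs += 1
    return agreements / total_pairs


# Hypothetical labels: the index depends only on the grouping, not on the
# label names, so a relabeled but identical partition scores 1.0.
reference = [0, 0, 1, 1]
relabeled = [1, 1, 0, 0]
print(rand_index(reference, relabeled))  # 1.0
```

Because the index compares pairs rather than label values, it can score a clustering against known classes without first matching cluster labels to class labels, which is what makes it convenient for comparing several clustering methods on one data set.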