PAKISTAN JOURNAL OF STATISTICS, vol. 28, no. 1, pp. 141-158, 2012 (SCI-Expanded)
In cluster analysis, identifying the number of clusters in a dataset is one of the most important problems. Although many methods have been proposed for this purpose, there is no generally accepted procedure. Many previously proposed approaches or algorithms for this problem either require initial parameter values or are used with predefined clustering techniques that need complicated calculations. In this paper, a new method based on representative values is developed for choosing the number of clusters. The proposed method, which can be called the sequential minimum difference method, is straightforward and computationally efficient. We show its effectiveness for choosing the number of clusters on some well-known real datasets in cluster analysis, under both non-nested and nested cluster structures.