Hierarchical clustering of mixed variable panel data based on new distance

Akay, ÖZLEM; Yüksel, GÜZİN

doi:10.1080/03610918.2019.1588306

Akay Ö., Yüksel G.

COMMUNICATIONS IN STATISTICS-SIMULATION AND COMPUTATION, 2019 (SCI-Expanded)

Publication Type: Article / Full Article
Volume Number:
Publication Date: 2019
DOI Number: 10.1080/03610918.2019.1588306
Journal Name: COMMUNICATIONS IN STATISTICS-SIMULATION AND COMPUTATION
Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus
Affiliated with Çukurova Üniversitesi: Yes

Abstract

One of the important aspects of panel data is the poolability of the different units in the data set. Pooling the units is reasonable when the regression parameters (coefficients) are homogeneous across units. Clustering can be applied to the panel data to address this problem. In this study, we suggest a new distance for mixed variable panel data sets that contain a time-invariant binary variable, without performing variable conversion, in order to avoid information loss. In this approach, the mixed variable panel data set is divided into a pure categorical data set and a pure numerical data set. Distance measures are then calculated using simple matching for the categorical data set and a distance that normalizes all variables for the numerical data set. After the distance measures of the two data sets are combined, the new distance measure is integrated into agglomerative hierarchical clustering algorithms. The experimental analysis is illustrated on real data sets using the Stata and R software packages. The performance of the proposed distance is compared with the Gower and K-prototype distances by means of cluster validation methods. The experimental results demonstrate that the new approach provides better clustering results than the Gower and K-prototype approaches.
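The pipeline described in the abstract (split the variables, compute one distance per part, combine them, cluster agglomeratively) can be sketched roughly as below. This is a minimal illustration and not the authors' published formula. The abstract does not say how the numerical distance is normalized or how the two parts are combined, so the range-scaled absolute difference, the weighting by variable counts, the average-linkage rule, all function names, and the toy data are assumptions made only for this example.

```python
from itertools import combinations


def simple_matching_distance(a, b):
    """Share of categorical variables on which two units disagree."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def numeric_ranges(panels):
    """Range (max - min) of each numeric variable over all units and periods."""
    n_vars = len(panels[0][0])
    ranges = []
    for j in range(n_vars):
        values = [row[j] for panel in panels for row in panel]
        ranges.append((max(values) - min(values)) or 1.0)  # guard against zero range
    return ranges


def normalized_numeric_distance(p, q, ranges):
    """ASSUMPTION: range-scaled absolute difference, averaged over periods and variables."""
    total = 0.0
    for row_p, row_q in zip(p, q):
        for x, y, r in zip(row_p, row_q, ranges):
            total += abs(x - y) / r
    return total / (len(p) * len(ranges))


def combined_distance_matrix(categorical, panels):
    """ASSUMPTION: combine the two parts weighted by their number of variables."""
    n = len(panels)
    ranges = numeric_ranges(panels)
    n_cat, n_num = len(categorical[0]), len(ranges)
    dist = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d_cat = simple_matching_distance(categorical[i], categorical[j])
        d_num = normalized_numeric_distance(panels[i], panels[j], ranges)
        dist[i][j] = dist[j][i] = (n_cat * d_cat + n_num * d_num) / (n_cat + n_num)
    return dist


def average_linkage(dist, n_clusters):
    """Naive agglomerative clustering: merge the closest pair until n_clusters remain."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > n_clusters:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            d = sum(dist[i][j] for i in clusters[a] for j in clusters[b])
            d /= len(clusters[a]) * len(clusters[b])
            if best is None or d < best[0]:
                best = (d, a, b)
        _, a, b = best
        clusters[a] = sorted(clusters[a] + clusters[b])
        del clusters[b]  # b > a, so index a is unaffected
    return clusters


# Hypothetical toy panel: 4 units, 1 time-invariant binary variable,
# 2 numeric variables observed over 2 periods.
categorical = [[0], [0], [1], [1]]
panels = [
    [[1.0, 10.0], [1.2, 11.0]],
    [[1.1, 10.5], [1.3, 11.5]],
    [[5.0, 50.0], [5.5, 52.0]],
    [[5.2, 51.0], [5.4, 53.0]],
]
dist = combined_distance_matrix(categorical, panels)
clusters = average_linkage(dist, 2)
print(clusters)  # [[0, 1], [2, 3]]
```

Keeping the binary variable in its own simple matching term, rather than recoding it as a number, reflects the abstract's stated aim of avoiding variable conversion. The weights and the linkage rule are easy to swap out if a different combination rule is preferred.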