Clustering the mixed panel dataset using Gower's distance and k-prototypes algorithms


AKAY Ö., YÜKSEL G.

COMMUNICATIONS IN STATISTICS-SIMULATION AND COMPUTATION, vol.47, no.10, pp.3031-3041, 2018 (SCI-Expanded)

Abstract

Panel datasets are increasingly used in economics to analyze complex economic phenomena. Panel data is a two-dimensional array that combines cross-sectional and time series data. By constructing a panel data matrix, clustering can be applied to panel data analysis. This approach addresses the heterogeneity of the panel's dependent variable before the analysis. Clustering is a widely used statistical tool for identifying subsets within a dataset. In this article, a mixed panel dataset is clustered with agglomerative hierarchical algorithms based on Gower's distance and with the k-prototypes algorithm. The performance of these algorithms is studied on panel data with mixed numerical and categorical features, and their effectiveness is compared using cluster accuracy. An experimental analysis on a real dataset is carried out with Stata and R.
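The abstract names three techniques: agglomerative hierarchical clustering on Gower's distance, k-prototypes, and cluster accuracy. The pure-Python sketch below illustrates all three on a tiny made-up mixed dataset. It is not the authors' Stata/R code: the toy records, the function names (`gower`, `average_linkage`, `kprototypes`, `cluster_accuracy`), the choice of average linkage, the categorical weight `gamma=1.0`, and the brute-force accuracy mapping are all assumptions for illustration. It also ignores the time dimension; one possible assumption is that a panel is first reshaped into one row per cross-sectional unit.

```python
import itertools
import random
from collections import Counter

# Hypothetical toy records (growth, inflation, region, sector): two numeric
# features followed by two categorical ones. Not data from the article.
NUM = [0, 1]  # indices of numeric features
CAT = [2, 3]  # indices of categorical features
DATA = [
    (1.0, 2.0, "north", "agri"), (1.2, 2.1, "north", "agri"), (0.9, 1.8, "north", "agri"),
    (5.0, 7.0, "south", "ind"), (5.2, 6.8, "south", "ind"), (4.9, 7.1, "south", "ind"),
]
TRUTH = [0, 0, 0, 1, 1, 1]  # hypothetical reference classes


def gower(a, b, ranges):
    """Gower dissimilarity: range-scaled |difference| for numeric features,
    0/1 mismatch for categorical ones, averaged over all features."""
    parts = [abs(a[j] - b[j]) / ranges[j] if ranges[j] else 0.0 for j in NUM]
    parts += [0.0 if a[j] == b[j] else 1.0 for j in CAT]
    return sum(parts) / len(parts)


def average_linkage(data, k):
    """Naive agglomerative clustering (average linkage) on Gower distances."""
    ranges = {j: max(r[j] for r in data) - min(r[j] for r in data) for j in NUM}
    d = [[gower(x, y, ranges) for y in data] for x in data]
    clusters = [[i] for i in range(len(data))]

    def mean_dist(pair):
        a, b = clusters[pair[0]], clusters[pair[1]]
        return sum(d[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > k:
        # Merge the two clusters with the smallest mean pairwise distance (p < q).
        p, q = min(itertools.combinations(range(len(clusters)), 2), key=mean_dist)
        clusters[p].extend(clusters.pop(q))
    labels = [0] * len(data)
    for c, members in enumerate(clusters):
        for i in members:
            labels[i] = c
    return labels


def kprototypes(data, k, gamma=1.0, iters=20, seed=0):
    """Minimal k-prototypes: squared Euclidean distance on numeric features
    plus gamma times the number of categorical mismatches."""
    rng = random.Random(seed)
    protos = [list(r) for r in rng.sample(data, k)]

    def cost(x, p):
        return sum((x[j] - p[j]) ** 2 for j in NUM) + gamma * sum(x[j] != p[j] for j in CAT)

    labels = None
    for _ in range(iters):
        new = [min(range(k), key=lambda c: cost(x, protos[c])) for x in data]
        if new == labels:
            break  # assignments stable: converged
        labels = new
        for c in range(k):
            members = [x for x, lab in zip(data, labels) if lab == c]
            if not members:
                continue  # keep the old prototype for an empty cluster
            for j in NUM:  # numeric part of the prototype: mean
                protos[c][j] = sum(m[j] for m in members) / len(members)
            for j in CAT:  # categorical part of the prototype: mode
                protos[c][j] = Counter(m[j] for m in members).most_common(1)[0][0]
    return labels


def cluster_accuracy(truth, pred):
    """Share of objects correctly labelled under the best one-to-one
    mapping of clusters to classes (brute force; fine for small k)."""
    clusters = sorted(set(pred))
    classes = sorted(set(truth))
    # Pad with None so every cluster can be mapped even if k > number of classes.
    pool = classes + [None] * max(0, len(clusters) - len(classes))
    best = 0
    for perm in itertools.permutations(pool, len(clusters)):
        mapping = dict(zip(clusters, perm))
        best = max(best, sum(mapping[p] == t for p, t in zip(pred, truth)))
    return best / len(truth)


if __name__ == "__main__":
    # Both accuracies print 1.0 on this well-separated toy data.
    print("Gower + average linkage:", cluster_accuracy(TRUTH, average_linkage(DATA, 2)))
    print("k-prototypes:", cluster_accuracy(TRUTH, kprototypes(DATA, 2)))
```

In practice numeric features are usually standardized before k-prototypes, because its numeric part is an unscaled squared Euclidean distance; the toy values here are on similar scales, so that step is skipped. The accuracy measure shown is one common definition and may differ in detail from the one used in the article. For real data, `cluster::daisy(x, metric = "gower")` combined with `hclust`, and `kproto` from the `clustMixType` package, are established R options.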