ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, vol.20, no.4, 2021 (SCI-Expanded)
Online Social Networks (OSNs) are very popular platforms for social interaction. Data posted publicly on OSNs pose various threats to the individual privacy of OSN users. Adversaries can try to predict private attribute values, such as gender, as well as links/connections. Quantifying an adversary's capacity to infer the gender of an OSN user is an important first step towards privacy protection. Numerous studies have addressed the problem of predicting the gender of an author/user, especially in the context of the English language. In contrast, studies in this field are quite limited for the Turkish language, specifically in the domain of OSNs. Previous studies on gender prediction for Turkish OSN users have mostly relied on the content of tweets and Facebook comments. In this article, we propose using various features, not just user comments, for the gender prediction problem on the Facebook OSN. Unlike existing studies, we exploited features extracted from the profile, wall content, and network structure, as well as the wall interactions of the user. Our study thus differs from existing work in the breadth of the features considered, the machine learning and deep learning methods applied, and the size of the OSN dataset used in the experimental evaluation. Our results indicate that basic profile information provides better results; moreover, using this information together with wall interactions improves prediction quality. The best accuracy we measured was 0.982, obtained by combining profile data and wall interactions of Turkish OSN users. In the wall interactions model, we introduced 34 different features that provide better results than the existing content-based studies for Turkish.