INTERNATIONAL JOURNAL OF INFORMATION SECURITY SCIENCE, vol. 10, no. 1, pp. 1-15, 2021 (Peer-Reviewed Journal)
Abstract—User profile matching (also called user cross-referencing or user identification) aims to find accounts that belong to the same user across different websites or online social networks (OSNs). Solving this problem is useful for many operations and functionalities, such as friend recommendation and link prediction across OSNs. However, identifying users across OSNs may also enable an adversary to aggregate the incomplete information that each account reveals. An adversary can thereby extract and exploit users' online footprints to violate their privacy and security, exposing them to threats such as identity theft, online stalking, and blackmail. Usernames are indispensable elements of every website that requires user registration. Even though usernames are generally short strings, they can reflect users' characteristics and habits, such as political affiliation or hometown. In this study, we attempt to match users of distinct OSNs relying only on their usernames. We use two approaches to build our learning function: one based on machine learning and one based on vector-based username similarity. We also explore different feature spaces from the literature and investigate which approach produces better results. We conducted our experiments on a real-world username data set extracted from the OSN accounts of Turkish users that we crawled in our previous work. Our results show that building the learning function by binary classification outperforms the similarity approach, achieving a best F-score of 0.921 without feature selection and extension.
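To make the vector-based similarity idea concrete, the following is a minimal sketch only, not the method evaluated in this study. It assumes usernames are represented as character bigram count vectors and compared by cosine similarity against a fixed threshold. The function names, the bigram representation, the threshold of 0.5, and the sample usernames are all hypothetical choices made for illustration.

```python
from collections import Counter
from math import sqrt


def char_ngrams(username: str, n: int = 2) -> Counter:
    """Count overlapping character n-grams of a lower-cased username.

    Assumption: character bigrams stand in for the feature space;
    the abstract does not specify which features are used.
    """
    s = username.lower()
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse n-gram count vectors."""
    # Counter returns 0 for missing keys, so only shared n-grams contribute.
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a username shorter than n yields an empty vector
    return dot / (norm_a * norm_b)


def same_user(u1: str, u2: str, threshold: float = 0.5) -> bool:
    """Declare a match when similarity reaches a hypothetical threshold."""
    return cosine_similarity(char_ngrams(u1), char_ngrams(u2)) >= threshold


if __name__ == "__main__":
    # Hypothetical usernames, not taken from the study's data set.
    print(same_user("ali_veli", "ali.veli"))
    print(same_user("ali_veli", "qwxz9"))
```

The binary classification approach would instead feed features computed on a username pair (for example, similarity scores like the one above) into a trained classifier, so the decision boundary is learned rather than fixed by a hand-picked threshold.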