Improving Web Page Classification with Two Novel Approaches on Semi-Supervised Learning

Ünal H. E., Özel S. A.

İzmir International Conference on Technology and Social Sciences IICTSS 2022, İzmir, Türkiye, 17-19 August 2022, p. 25 (Abstract)

Publication Type: Conference Paper / Abstract
City of Publication: İzmir
Country of Publication: Türkiye
Page Numbers: p. 25
Open Archive Collection: AVESİS Open Access Collection
Çukurova University Affiliation: Yes

Abstract

The amount of information on the Web is growing at a tremendous rate, and most of it is unlabeled. Effective approaches are therefore needed to derive useful information from this large body of unlabeled data. In this study, two novel semi-supervised learning methods are proposed, and their results are compared with those of the Co-Training and Iterative Cross-Training methods from the literature. In the first proposed method, Incremental Parallel Training with Cross-Validation, the classifiers work in parallel and a validation rule is applied to enlarge the labeled set. In the second method, Incremental Serial Training, three classifiers are combined and unlabeled examples are used serially to form a labeled set. Experiments are conducted on nine binary classification datasets: the publicly available WebKB and Banksearch datasets and the individually collected Conference datasets. The results are analyzed statistically using SPSS. According to these analyses, both proposed methods perform very well, and the Incremental Parallel Training with Cross-Validation method achieves the highest classification performance among all methods.
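The abstract does not specify how the two proposed methods work internally, so the sketch below illustrates only the baseline Co-Training idea from the literature (introduced by Blum and Mitchell), not the authors' Incremental Parallel Training with Cross-Validation or Incremental Serial Training. It is a minimal, hypothetical Python sketch under loudly stated assumptions: each web page is already represented as two numeric feature vectors (two "views"), a toy nearest-centroid classifier stands in for real classifiers, and the names `NearestCentroid`, `co_train`, `per_round`, and `rounds` are illustrative inventions. Each round, one classifier per view is trained on the shared labeled pool, and each one moves its most confidently labeled unlabeled examples into that pool.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


class NearestCentroid:
    """Toy classifier (hypothetical stand-in, not the paper's classifiers).

    Predicts the class whose mean vector is closest; confidence is the gap
    between the nearest and second-nearest centroid distances.
    """

    def __init__(self) -> None:
        self.centroids: Dict[int, List[float]] = {}

    def fit(self, xs: List[Vector], ys: List[int]) -> "NearestCentroid":
        self.centroids = {}
        for label in set(ys):
            members = [x for x, y in zip(xs, ys) if y == label]
            dim = len(members[0])
            self.centroids[label] = [
                sum(m[i] for m in members) / len(members) for i in range(dim)
            ]
        return self

    def predict_with_confidence(self, x: Vector) -> Tuple[int, float]:
        ranked = sorted((math.dist(x, c), label) for label, c in self.centroids.items())
        best_dist, best_label = ranked[0]
        margin = ranked[1][0] - best_dist if len(ranked) > 1 else 0.0
        return best_label, margin


def co_train(
    view_a: List[Vector],
    view_b: List[Vector],
    labels: List[int],
    unlabeled_a: List[Vector],
    unlabeled_b: List[Vector],
    per_round: int = 1,
    rounds: int = 10,
) -> Tuple[NearestCentroid, NearestCentroid, List[int]]:
    """Hypothetical two-view co-training loop (illustrative sketch only).

    Each round, a classifier per view is trained on the shared labeled pool;
    each labels its `per_round` most confident unlabeled examples, which are
    then moved (in both views) into the labeled pool.
    """
    la, lb, ly = list(view_a), list(view_b), list(labels)
    pool = list(range(len(unlabeled_a)))  # indices of still-unlabeled examples
    for _ in range(rounds):
        if not pool:
            break
        clf_a = NearestCentroid().fit(la, ly)
        clf_b = NearestCentroid().fit(lb, ly)
        chosen: Dict[int, int] = {}  # unlabeled index -> predicted label
        for clf, view in ((clf_a, unlabeled_a), (clf_b, unlabeled_b)):
            ranked = sorted(
                pool,
                key=lambda i: clf.predict_with_confidence(view[i])[1],
                reverse=True,
            )
            for i in ranked[:per_round]:
                # If both views pick the same example, the first view's label is kept.
                chosen.setdefault(i, clf.predict_with_confidence(view[i])[0])
        for i, y in chosen.items():
            la.append(unlabeled_a[i])
            lb.append(unlabeled_b[i])
            ly.append(y)
        pool = [i for i in pool if i not in chosen]
    return NearestCentroid().fit(la, ly), NearestCentroid().fit(lb, ly), ly


if __name__ == "__main__":
    # Toy data: two labeled pages and four unlabeled ones, each with two views.
    clf_a, clf_b, final_labels = co_train(
        [[0.0, 0.0], [10.0, 10.0]], [[0.0], [10.0]], [0, 1],
        [[1.0, 1.0], [9.0, 9.0], [2.0, 1.0], [8.0, 9.0]],
        [[1.0], [9.0], [0.0], [10.0]],
    )
    print("labels after co-training:", final_labels)
    print("view A prediction for [1.5, 1.5]:", clf_a.predict_with_confidence([1.5, 1.5])[0])
```

The nearest-centroid model is chosen only to keep the example self-contained; a realistic setup would use stronger text classifiers on bag-of-words features, and the original Co-Training formulation adds a fixed number of positive and negative examples per round rather than the single most confident example per view used here.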