ARABIAN JOURNAL FOR SCIENCE AND ENGINEERING, vol. 48, no. 8, pp. 10457-10477, 2023 (SCI-Expanded)
Unlabeled data are abundant in many domains, and effective ways of applying machine learning techniques to them are badly needed. Semi-supervised learning (SSL) methods extract useful information from such unlabeled data. In this study, Incremental Parallel Training with Cross-Validation (IPT-CV) is proposed as a novel semi-supervised learning method. The method employs several classifiers and different views of the datasets to label the unlabeled data efficiently. In each round, the classifiers work in parallel and enlarge the labeled set according to a validation rule. The method was compared with two well-known SSL methods from the literature. The web was chosen as the experimental domain because it contains a vast number of unlabeled files. Nine binary classification datasets were used, drawn from the publicly available WebKB and Banksearch datasets and the individually collected Conference datasets. The results were analyzed statistically, and according to these analyses the proposed IPT-CV method achieved the highest classification accuracy among all the methods examined.
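
The abstract does not specify the classifiers, the views, or the validation rule, so the following is only a minimal sketch of the general idea under stated assumptions, not the published IPT-CV algorithm. It assumes one nearest-centroid classifier per view, a validation rule that admits a classifier only when its k-fold cross-validation accuracy on the current labeled set reaches a threshold, and a labeling rule that accepts an unlabeled sample only when all admitted classifiers agree. The names `ipt_cv_sketch`, `train_view`, `threshold`, and `max_rounds` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def centroid_fit(X, y):
    """Return {label: mean vector} for a nearest-centroid classifier."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {c: [v / counts[c] for v in s] for c, s in sums.items()}

def centroid_predict(model, x):
    """Predict the label whose centroid is closest in squared distance."""
    return min(model, key=lambda c: sum((a - b) ** 2 for a, b in zip(model[c], x)))

def cv_accuracy(X, y, k=3):
    """Plain k-fold cross-validation accuracy (assumes len(X) > k)."""
    correct = 0
    for fold in range(k):
        train = [i for i in range(len(X)) if i % k != fold]
        test = [i for i in range(len(X)) if i % k == fold]
        model = centroid_fit([X[i] for i in train], [y[i] for i in train])
        correct += sum(centroid_predict(model, X[i]) == y[i] for i in test)
    return correct / len(X)

def train_view(job):
    """Train one view's classifier; return its predictions, or None if it
    fails the (assumed) cross-validation rule."""
    X, y, X_unlabeled, threshold = job
    if cv_accuracy(X, y) < threshold:
        return None
    model = centroid_fit(X, y)
    return [centroid_predict(model, x) for x in X_unlabeled]

def ipt_cv_sketch(labeled, labels, unlabeled, threshold=0.8, max_rounds=10):
    """Each sample is a tuple of views; each view is a feature vector."""
    labeled, labels, unlabeled = list(labeled), list(labels), list(unlabeled)
    n_views = len(labeled[0])
    for _ in range(max_rounds):
        if not unlabeled:
            break
        # One job per view; the per-view classifiers run in parallel each round.
        jobs = [([s[v] for s in labeled], labels,
                 [s[v] for s in unlabeled], threshold) for v in range(n_views)]
        with ThreadPoolExecutor() as pool:
            votes = [p for p in pool.map(train_view, jobs) if p is not None]
        if not votes:
            break  # no classifier passed validation this round
        remaining, added = [], 0
        for i, sample in enumerate(unlabeled):
            proposed = {v[i] for v in votes}
            if len(proposed) == 1:  # assumed rule: validated classifiers agree
                labeled.append(sample)
                labels.append(proposed.pop())
                added += 1
            else:
                remaining.append(sample)
        unlabeled = remaining
        if added == 0:
            break  # the labeled set stopped growing
    return labeled, labels, unlabeled
```

In this sketch, a sample whose views disagree is never labeled and stays in the returned unlabeled list. The thread pool only mirrors the round-based parallel structure described in the abstract; the classifiers and the acceptance rule would need to be replaced with those defined in the paper.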