A Comparative Analysis of Training from Scratch, ImageNet-Based Transfer Learning, and In-Domain Self-Supervised Learning for Plant Disease Classification under Low-Label Regimes


Gökten A., Dönmez H. B., Tekeli E., Toklu F.

PAKISTAN JOURNAL OF AGRICULTURAL SCIENCES, vol.63, no.2, pp.429-438, 2026 (SCI-Expanded, Scopus)

  • Publication Type: Article / Full Article
  • Volume: 63 Issue: 2
  • Publication Date: 2026
  • DOI: 10.21162/pakjas/26.431
  • Journal Name: PAKISTAN JOURNAL OF AGRICULTURAL SCIENCES
  • Journal Indexes: Academic Search Ultimate (EBSCO), Biomedical Reference Collection: Corporate Edition (EBSCO), Scopus, Science Citation Index Expanded (SCI-EXPANDED), CAB Abstracts
  • Page Numbers: pp.429-438
  • Çukurova University Affiliated: Yes

Abstract

This study systematically compares three representation learning paradigms (supervised training from scratch, ImageNet pre-trained transfer learning, and SimCLR-based in-domain self-supervised learning [SSL]) under linear probing (LP) and full fine-tuning (FT) protocols. Experiments were conducted on a nine-class subset of the PlantVillage dataset comprising maize, potato, and pepper leaf images, at label ratios of 1%, 10%, and 100%, using three independent random seeds. Performance was evaluated with accuracy, Macro-F1, and Weighted-F1. The results show that, under the frozen-backbone protocol, in-domain SSL representations provide a clear advantage over ImageNet transfer and scratch training across all label budgets. Notably, at a 1% label budget, SSL led ImageNet transfer by an absolute 34.5 points in accuracy and 34.7 points in Macro-F1. Under full fine-tuning, the performance gap between the paradigms narrowed significantly; however, SSL with a 10% label budget delivered performance close to that of fully labelled training from scratch, which indicates that the need for and cost of expert labelling can be substantially reduced. These findings indicate that in-domain contrastive pre-training offers an effective and scalable alternative in low-label agricultural scenarios, particularly in regions where expert access is limited. As the experiments were conducted on controlled-environment data, further validation is required to confirm that these findings generalise to real-world field conditions.

Keywords: Smart agriculture, label efficiency, SimCLR, PlantVillage, contrastive learning
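The evaluation setup described in the abstract (label budgets of 1%, 10%, and 100% drawn with a fixed seed, scored by accuracy, Macro-F1, and Weighted-F1) can be illustrated with a minimal standard-library sketch. The abstract gives no implementation details, so everything here is an assumption: the function names `stratified_subset` and `f1_scores` are hypothetical, and per-class (stratified) sampling with at least one example per class is one plausible way to draw a label budget, not necessarily the authors' procedure.

```python
import random
from collections import Counter, defaultdict


def stratified_subset(labels, ratio, seed):
    """Return sorted indices keeping about `ratio` of each class.

    Assumption: at least one labelled example is kept per class, so that
    very small budgets (e.g. 1%) never drop a class entirely.
    """
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    chosen = []
    for label in sorted(by_class):
        indices = by_class[label]
        k = max(1, round(len(indices) * ratio))
        chosen.extend(rng.sample(indices, k))
    return sorted(chosen)


def f1_scores(y_true, y_pred):
    """Return (accuracy, macro_f1, weighted_f1) for two equal-length label lists."""
    classes = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)  # number of true examples per class
    per_class = {}
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        per_class[c] = 2 * tp / denom if denom else 0.0
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    # Macro-F1: unweighted mean over classes, so rare classes count equally.
    macro = sum(per_class.values()) / len(classes)
    # Weighted-F1: mean over classes weighted by class support.
    weighted = sum(per_class[c] * support[c] for c in classes) / len(y_true)
    return accuracy, macro, weighted
```

Repeating `stratified_subset` with three different `seed` values mirrors the three-seed protocol. Macro-F1 and Weighted-F1 diverge when classes are imbalanced, which is one reason to report both alongside accuracy.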