A New Large Language Model for Attribute Extraction in E-Commerce Product Categorization

Çiftlikçi, Mehmet; Çakmak, Yusuf; Kalaycı, Tolga; ABUT, FATİH; AKAY, MEHMET; Kızıldağ, Mehmet

doi:10.3390/electronics14101930

Çiftlikçi M. S., Çakmak Y., Kalaycı T. A., ABUT F., AKAY M. F., Kızıldağ M.

Electronics (Switzerland), vol.14, no.10, 2025 (SCI-Expanded, Scopus)

Publication Type: Article / Full Article
Volume: 14 Issue: 10
Publication Date: 2025
DOI Number: 10.3390/electronics14101930
Journal Name: Electronics (Switzerland)
Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Aerospace Database, Communication Abstracts, INSPEC, Metadex, Directory of Open Access Journals, Civil Engineering Abstracts
Keywords: attribute extraction, deep learning, e-commerce, large language models, named entity recognition, product catalog
Affiliated with Çukurova University: Yes

Abstract

In the rapidly evolving field of e-commerce, precise and efficient attribute extraction from product descriptions is crucial for enhancing search functionality, improving customer experience, and streamlining the listing process for sellers. This study proposes a large language model (LLM)-based approach for automated attribute extraction on Trendyol’s e-commerce platform. For comparison purposes, a deep learning (DL) model is also developed, leveraging a transformer-based architecture to efficiently identify explicit attributes. In contrast, the LLM, built on the Mistral architecture, demonstrates superior contextual understanding, enabling the extraction of both explicit and implicit attributes from unstructured text. The models are evaluated on an extensive dataset derived from Trendyol’s Turkish-language product catalog, using performance metrics such as precision, recall, and F1-score. Results indicate that the proposed LLM outperforms the DL model across most metrics, demonstrating superiority not only in direct single-model comparisons but also in average performance across all evaluated categories. This advantage is particularly evident in handling complex linguistic structures and diverse product descriptions. The system has been integrated into Trendyol’s platform with a scalable backend infrastructure, employing Kubernetes and Nvidia Triton Inference Server for efficient bulk processing and real-time attribute suggestions during the product listing process. This study not only advances attribute extraction for Turkish-language e-commerce but also provides a scalable and efficient NLP-based solution applicable to large-scale marketplaces. The findings offer critical insights into the trade-offs between accuracy and computational efficiency in large-scale multilingual NLP applications, contributing to the broader field of automated product classification and information retrieval in e-commerce ecosystems.
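
The abstract names precision, recall, and F1-score as the evaluation metrics but does not say how they are computed. The following is a minimal sketch of one common convention, micro-averaged scoring over (attribute, value) pairs, and is not taken from the paper. The function name `micro_prf`, the dictionary representation of attributes, and the sample products are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: one dict of attribute name -> value per product.
Attrs = Dict[str, str]


def micro_prf(gold: List[Attrs], pred: List[Attrs]) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall, and F1 over (attribute, value) pairs.

    A predicted pair counts as correct only if both the attribute name and
    its value match the gold annotation for the same product (an assumption;
    the paper may score matches differently).
    """
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        g_pairs = set(g.items())
        p_pairs = set(p.items())
        tp += len(g_pairs & p_pairs)   # pairs extracted correctly
        fp += len(p_pairs - g_pairs)   # pairs extracted but not in gold
        fn += len(g_pairs - p_pairs)   # gold pairs the model missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Made-up example data, for illustration only.
gold = [{"color": "red", "material": "cotton"}, {"color": "blue"}]
pred = [{"color": "red", "material": "wool"}, {"color": "blue", "sleeve": "long"}]
precision, recall, f1 = micro_prf(gold, pred)
```

With this made-up data there are two correct pairs, two spurious pairs, and one missed pair, so precision is 1/2, recall is 2/3, and F1 is 4/7.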