The Effect of Variable Selection Methods on Predicting Energy Consumption with Machine Learning Models


Ural N. B., Çetin M.

VI. International Applied Statistics Congress, Ankara, Türkiye, 14-16 May 2025, p. 40 (Abstract)

  • Publication Type: Conference Paper / Abstract
  • City of Publication: Ankara
  • Country of Publication: Türkiye
  • Pages: p. 40
  • Open Archive Collection: AVESİS Open Access Collection
  • Affiliated with Çukurova University: Yes

Abstract

This study examines how variable selection methods affect the predictive performance of machine learning models. In multivariate time-series data such as energy consumption records, appropriate variable selection significantly improves a model's capacity to generalize and helps prevent overfitting. The dataset used contains temperature, humidity, and energy consumption measurements recorded every 10 minutes over a 4.5-month period. The data were collected from sensors placed in various rooms and on the exterior of a household, and comprise 19,735 observations and 28 variables. Nine variable selection methods were applied: Correlation-Based Selection, Variance-Based Selection, Forward Selection, Backward Elimination, Stepwise Selection, Genetic Algorithms, Lasso, Ridge, and Robust Feature Selection. Using the variables selected by each method, predictive models were built with six machine learning algorithms: Linear Regression, Decision Trees, Random Forests, Support Vector Regression, Principal Component Analysis, and Artificial Neural Networks. Model performance was evaluated using metrics such as Mean Absolute Error (MAE), Mean Squared Error (MSE), and the Coefficient of Determination (R²). The results show that the choice of variable selection method has a significant impact on model accuracy, and that some methods yield better results when combined with specific algorithms. The study highlights the critical role of variable selection in machine learning workflows and offers methodological guidance for researchers working with energy data. By comparing methods and models side by side, it contributes both to practical modeling strategies and to a broader understanding of how variable selection influences predictive performance on complex, real-world datasets.
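
As a rough illustration of the workflow the abstract describes, the following minimal sketch applies one of the nine listed methods (Correlation-Based Selection) and then scores a one-variable linear fit with MAE, MSE, and R². It is not the authors' implementation. The data are synthetic, the column names (temperature, unrelated_sensor), the 0.3 threshold, and all helper functions are assumptions made for the example, and the model is scored in-sample for brevity rather than on held-out data.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear information
    return cov / math.sqrt(var_x * var_y)


def select_by_correlation(features, target, threshold=0.3):
    """Keep feature names whose |r| with the target is at least `threshold`."""
    return [name for name, column in features.items()
            if abs(pearson(column, target)) >= threshold]


def fit_line(x, y):
    """Ordinary least squares for one predictor; returns (slope, intercept)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
             / sum((a - mean_x) ** 2 for a in x))
    return slope, mean_y - slope * mean_x


def mae(y_true, y_pred):
    """Mean Absolute Error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true, y_pred):
    """Mean Squared Error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Synthetic, deterministic stand-in for sensor data (NOT the study's dataset).
temperature = [15.0 + (i % 10) for i in range(100)]
unrelated_sensor = [1.0 if (i // 10) % 2 == 0 else -1.0 for i in range(100)]
energy = [50.0 + 4.0 * t + 0.5 * u
          for t, u in zip(temperature, unrelated_sensor)]

features = {"temperature": temperature, "unrelated_sensor": unrelated_sensor}
selected = select_by_correlation(features, energy, threshold=0.3)

# Fit on the first selected variable and score the predictions in-sample.
slope, intercept = fit_line(features[selected[0]], energy)
predicted = [intercept + slope * v for v in features[selected[0]]]

print("selected:", selected)
print("MAE:", mae(energy, predicted))
print("MSE:", mse(energy, predicted))
print("R2:", r2(energy, predicted))
```

In a comparison like the one reported, the selection step would be swapped for each of the other methods and the fitted model for each of the other algorithms, with the same metrics computed on held-out data for every combination.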