A novel GAM-based method for predicting phosphate (PO4) concentrations in irrigated drainage/return-flow systems


Can M. E.

PeerJ, vol.14, 2026 (SCI-Expanded, Scopus)

  • Publication Type: Article / Full Article
  • Volume: 14
  • Publication Date: 2026
  • DOI: 10.7717/peerj.21181
  • Journal Name: PeerJ
  • Journal Indexed In: Science Citation Index Expanded (SCI-EXPANDED), Scopus, BIOSIS, CAB Abstracts, EMBASE, MEDLINE, Directory of Open Access Journals, Natural Science Collection (ProQuest), Biological Science Database (ProQuest)
  • Keywords: Agricultural pollution, Generalized additive model (GAM), Machine learning, Phosphate prediction, Water quality monitoring
  • Affiliated with Çukurova University: Yes

Abstract

Phosphate (PO4) pollution in irrigated catchments and their return-flow and drainage networks threatens water quality and agricultural sustainability, particularly under conditions of intensive fertilization and shallow groundwater. This study presents a predictive approach to estimate PO4 concentration using a Generalized Additive Model (GAM) based on daily monitoring data from the Akarsu Irrigation District in Türkiye’s Lower Seyhan Plain. Here, the modeled variable is PO4 in irrigation return-flow/drainage water, measured at the main drainage outlet (L4), which integrates excess irrigation water that has passed through the agricultural landscape and collected surface runoff and subsurface drainage. Downstream of L4, drainage water is conveyed by the main drainage channel; part is reused for irrigation, and the remainder flows toward lagoon and wetland areas and ultimately the Mediterranean Sea. The dataset comprised 522 daily observations from the 2022–2023 water years and included nitrate (NO3), nitrite (NO2), electrical conductivity (EC), pH (hydrogen ion activity), flow rate (Q), and precipitation (P) as predictors. Despite weak pairwise correlations of PO4 with individual variables (maximum r = 0.1293 with NO3), the GAM captured nonlinear multivariate relationships and produced good agreement between predicted and measured PO4 at the L4 outlet (mean squared error (MSE) = 0.019966; root mean squared error (RMSE) = 0.1413 mg L−1; mean error = −0.00457 mg L−1; error SD = 0.14136 mg L−1), indicating minimal bias and stable performance. In benchmark comparisons using identical inputs and the same time-structured validation design (80/20 split; random splits were used only for sensitivity analysis), the GAM substantially outperformed linear regression (LR), artificial neural network (ANN), and support vector machine (SVM), which showed very low predictive skill (R2 ≈ 0.03–0.05). 
Predictive performance was evaluated primarily using error-based metrics; R2 was reported only as a goodness-of-fit measure. The L4 outlet drains an intensively managed agricultural catchment dominated by irrigated cropland. Model fit, expressed as explained variance values (training R2 = 0.832; testing R2 = 0.788), indicated consistent performance without evidence of substantial overfitting. Overall, the findings demonstrate that GAM-based estimation can reliably reproduce both peak and moderate PO4 concentrations and serve as a practical screening tool for nutrient monitoring at irrigated drainage/return-flow outlets. By leveraging routinely monitored variables, the model can reduce the frequency of laboratory PO4 assays (which often require additional reagents, consumables, and handling time), thereby lowering analytical workload and spectrophotometric operating time while enabling near-real-time assessment of PO4 dynamics. These results support the use of data-driven estimation to inform nutrient management and reduce eutrophication risk in irrigated catchments by monitoring drainage exports.
Background. Phosphate pollution in irrigation areas, particularly in regions with shallow groundwater and intensive agriculture, poses serious environmental and agricultural risks, including eutrophication and water-quality degradation. Conventional methods for phosphate monitoring are often time-consuming, costly, and spatially limited, making them unsuitable for real-time applications. Furthermore, the complex, nonlinear interactions between phosphate concentrations and environmental variables, including nitrate, nitrite, pH, EC, flow rate, and precipitation, challenge traditional predictive approaches. While various machine learning models have been explored for phosphorus prediction, their computational demands and overfitting risks often limit their field-level applicability. Therefore, this study aimed to develop a robust, efficient, and interpretable method for predicting phosphate concentrations using a GAM and leveraging daily environmental data collected in a Mediterranean irrigation district in Türkiye.
Methods. Daily water samples were collected at the outlet of the L4 agricultural catchment in the Akarsu Irrigation District (AID) on the Lower Seyhan Plain, Türkiye, during the 2022 and 2023 water years. The area is characterized by intensively managed irrigated cropland and shallow groundwater conditions. A total of 522 daily observations were compiled, including PO4, NO3, NO2, EC, pH, flow rate (Q), and precipitation (P). Laboratory analyses were performed using spectrophotometric methods for nutrients and electrochemical measurements of EC and pH, while discharge data were obtained from an on-site automatic monitoring and sampling system. A GAM was developed to represent nonlinear relationships between PO4 and the predictor variables using penalized smoothing functions. Because the dataset is a daily time series, temporal dependence was addressed by including a smoother for time (date/time index) and by fitting the model with an AR(1) residual correlation structure (GAMM). To ensure realistic model evaluation under temporal dependence, predictive skill was assessed primarily using a time-structured (blocked, contiguous) 80/20 split, with the earlier 80% of observations used for training and the later 20% for testing. To assess robustness to the choice of partition (sensitivity analysis only), we additionally repeated the split–fit–evaluate procedure over 100 independent randomized 80/20 splits. These random-split results are reported as a secondary check and are not interpreted as the main estimate of predictive skill under autocorrelation. Model predictive performance was primarily assessed using error-based metrics (MSE, RMSE, bias/mean error, and error SD), while R2 was reported only as explained variance (goodness-of-fit). Residual diagnostics, including inspection of the residual distribution and autocorrelation (ACF), were used to evaluate model assumptions, stability, and potential overfitting.
Results. This study developed a data-driven method for estimating PO4 concentrations at the L4 drainage outlet using a Generalized Additive Model. Although same-day Pearson correlations between PO4 and routinely monitored predictors (EC, pH, Q, P, NO2, NO3) were weak (maximum r = 0.1293 for NO3), the GAM captured nonlinear and conditional multivariate effects and showed strong agreement between predicted and measured PO4 values. Model performance was evaluated primarily using error-based metrics, yielding MSE = 0.019966; RMSE = 0.1413 mg L−1; mean error (bias) = −0.00457 mg L−1; and error SD = 0.14136 mg L−1. R2 was reported only as explained variance (goodness-of-fit): training R2 = 0.8319; testing R2 = 0.7875. Because the dataset is a daily time series, temporal dependence was addressed by fitting a GAMM with a smooth function of time and an AR(1) residual structure, and generalization was assessed using a time-structured (blocked or contiguous) train–test split to reduce information leakage from autocorrelation. Repeated random 80/20 splits were used only as a sensitivity analysis and showed consistent performance (mean R2 = 0.772, SD = 0.0166 across 100 trials). In benchmark comparisons, the GAM substantially outperformed traditional alternatives (LR, ANN, SVM), which showed very low predictive skill for PO4 (R2 ≈ 0.03–0.05), highlighting the need for a flexible nonlinear structure to reproduce the observed phosphate dynamics. The model reproduced the overall temporal pattern of PO4, while some underestimation remained for the highest short-duration peaks, consistent with the sparse nature of extreme events in the dataset. Overall, the results support the use of the proposed GAM/GAMM framework as an outlet-scale screening tool for near-real-time identification of periods with elevated PO4, thereby helping to prioritize laboratory sampling and monitoring efforts when direct PO4 measurements are costly or intermittent.
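The evaluation design described in the abstract (a contiguous 80/20 split with the earlier observations used for training, followed by error-based metrics) can be sketched roughly as below. This is a minimal illustration and not the author's code. The function names `blocked_split` and `error_metrics` are hypothetical, the error sign convention (predicted minus measured) and the use of the sample SD (`ddof=1`) are assumptions, the series is synthetic placeholder data rather than the study data, and the training-period mean stands in for a fitted GAM.

```python
import numpy as np


def blocked_split(n_obs, train_frac=0.8):
    """Return (train_idx, test_idx) for a contiguous, time-ordered split.

    The earliest `train_frac` share of observations is used for training
    and the remainder for testing; no shuffling is applied.
    """
    n_train = int(n_obs * train_frac)
    idx = np.arange(n_obs)
    return idx[:n_train], idx[n_train:]


def error_metrics(measured, predicted):
    """Compute MSE, RMSE, mean error (bias), and error SD.

    Assumptions: error = predicted - measured; SD uses ddof=1 (sample SD).
    """
    measured = np.asarray(measured, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    err = predicted - measured
    mse = float(np.mean(err ** 2))
    return {
        "mse": mse,
        "rmse": float(np.sqrt(mse)),
        "bias": float(np.mean(err)),
        "error_sd": float(np.std(err, ddof=1)),
    }


# Synthetic placeholder series (NOT the study data): 522 daily values.
rng = np.random.default_rng(0)
po4 = rng.gamma(shape=2.0, scale=0.15, size=522)

# Earlier 80% for training, later 20% for testing.
train_idx, test_idx = blocked_split(len(po4))

# Placeholder predictor: the training-period mean stands in for a fitted model.
pred_test = np.full(test_idx.size, po4[train_idx].mean())

metrics = error_metrics(po4[test_idx], pred_test)
print(metrics)
```

Because RMSE is the square root of MSE, the reported RMSE of 0.1413 mg L−1 follows directly from the reported MSE of 0.019966. Keeping the split contiguous, rather than shuffled, means the test block contains no days interleaved with training days, which is the leakage concern the abstract raises for autocorrelated daily series.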