Abstract:
Air pollution, especially nitrogen dioxide (NO₂), poses significant risks to public health
and environmental quality. Traditional statistical models for predicting NO₂ levels often
fail to capture the complex, nonlinear relationships in environmental data and typically
do not provide uncertainty estimates. This study addresses these shortcomings with
Gaussian Process Regression (GPR), a probabilistic machine learning method, to improve
the accuracy and reliability of NO₂ predictions. Using a
comprehensive dataset of hourly averaged responses from chemical sensors in an urban
area in Italy, we developed and evaluated GPR models with various kernels, including
polynomial-like, rational quadratic, and combined kernels. These models were
compared with traditional regression models, such as Lasso and Ridge regression. The
results showed that GPR models, particularly those with optimized polynomial-like and
rational quadratic kernels, significantly outperformed the traditional models. The
polynomial-like kernel GPR model achieved a Mean Squared Error (MSE) of 0.034 and
an R-squared (R²) value of 0.959. Similarly, the rational quadratic kernel GPR model
achieved an MSE of 0.035 and an R² of 0.959. In contrast, Lasso and Ridge regression
models had higher MSEs and lower R² values. In addition, the GPR models provided an
uncertainty estimate with each prediction, allowing its reliability to be assessed. This study
demonstrates the effectiveness of GPR models in environmental monitoring and
underscores the importance of kernel optimization in improving model performance,
suggesting substantial potential for GPR in air quality prediction and environmental
management.
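
The comparison described above can be illustrated with a minimal sketch. It assumes
scikit-learn, which the abstract does not name, and uses synthetic data as a stand-in for
the sensor dataset. The kernel settings, the regularization strengths, and the train/test
split are illustrative assumptions rather than the configuration used in this study, and
the numbers it prints are unrelated to the results reported here.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RationalQuadratic, WhiteKernel
from sklearn.linear_model import Lasso, Ridge
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic, nonlinear stand-in data (NOT the study's sensor dataset).
rng = np.random.default_rng(0)
X = rng.uniform(-3.0, 3.0, size=(200, 2))
y = np.sin(X[:, 0]) * np.cos(X[:, 1]) + 0.1 * rng.normal(size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Rational quadratic kernel plus a white-noise term (assumed starting values).
# Fitting tunes the kernel hyperparameters by maximizing the log marginal likelihood.
kernel = RationalQuadratic(length_scale=1.0, alpha=1.0) + WhiteKernel(noise_level=0.1)
gpr = GaussianProcessRegressor(kernel=kernel, normalize_y=True, random_state=0)
gpr.fit(X_train, y_train)

# return_std=True yields a predictive standard deviation for every test point.
y_mean, y_std = gpr.predict(X_test, return_std=True)

results = {
    "GPR (rational quadratic)": (
        mean_squared_error(y_test, y_mean),
        r2_score(y_test, y_mean),
    )
}

# Linear baselines with assumed regularization strengths.
for name, model in [("Ridge", Ridge(alpha=1.0)), ("Lasso", Lasso(alpha=0.01))]:
    pred = model.fit(X_train, y_train).predict(X_test)
    results[name] = (mean_squared_error(y_test, pred), r2_score(y_test, pred))

for name, (mse, r2) in results.items():
    print(f"{name}: MSE={mse:.3f}, R2={r2:.3f}")
print("Mean predictive std (GPR):", float(y_std.mean()))
```

In this sketch the predictive standard deviation `y_std` plays the role of the uncertainty
estimate, which the Ridge and Lasso baselines do not provide.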