Identifying Effective Variables Using Mutual Information and Building Predictive Models of Sulfur Dioxide Concentration with Support Vector Machines
Sulfur dioxide (SO2) is an issue of increasing public concern due to its recognized adverse effects on human health. Accurate SO2 prediction models are therefore important tools in developing public warning strategies. The goal of this study is to identify the relevance of meteorological and air pollutant variables using a classical and widely used measure of dependence, Shannon's Mutual Information (MI), and to build an accurate SO2 prediction model using the relevant variables as inputs. Specifically, the features ranked by MI are tested for their joint predictive power on the target using support vector machines (SVM), a popular machine learning tool, and the results are compared with those of the multilayer perceptron (MLP), the machine learning tool most commonly used in previous studies for the prediction and analysis of air pollutants. The SVM model gave a higher correlation coefficient (r) and a lower root mean squared error (RMSE) than the MLP for both the test and validation sets. For both data sets, the predictive model used 6 input variables as the relevant features for predicting maximum SO2 concentration at time t+1: average SO2, maximum SO2, outdoor temperature (OT), average nitrogen dioxide (NO2), average ozone (O3), and average wind speed, all at time t. The results of this study indicate that MI can be used efficiently to determine the importance of input variables in the prediction of SO2 concentration and that SVM is well suited for use in air pollution modeling.
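The MI-based ranking step described above can be illustrated with a minimal sketch. The abstract does not say how MI was estimated, so the equal-width histogram (plug-in) estimator below is an assumption, as are the function names, the bin count, and the synthetic data, which only stands in for real measurements. In a workflow like the one described, the top-ranked variables would then be passed to an SVM or MLP regressor.

```python
import math
import random
from collections import Counter


def discretize(values, bins=8):
    """Map continuous values to equal-width bin indices (assumed estimator choice)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / bins
    # Clamp so the maximum value falls into the last bin.
    return [min(int((v - lo) / width), bins - 1) for v in values]


def mutual_information(x, y, bins=8):
    """Plug-in estimate of Shannon MI (in nats) between two samples."""
    xb, yb = discretize(x, bins), discretize(y, bins)
    n = len(xb)
    count_x, count_y = Counter(xb), Counter(yb)
    count_xy = Counter(zip(xb, yb))
    mi = 0.0
    for (i, j), c in count_xy.items():
        # p(i, j) * log( p(i, j) / (p(i) * p(j)) ), written with raw counts
        mi += (c / n) * math.log(c * n / (count_x[i] * count_y[j]))
    return mi


def rank_features(features, target, bins=8):
    """Return (name, MI) pairs sorted from most to least informative."""
    scores = {name: mutual_information(col, target, bins)
              for name, col in features.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Synthetic stand-in data (hypothetical, not the study's measurements):
# the target depends strongly on average SO2, weakly on wind speed,
# and not at all on the third variable.
random.seed(0)
n = 2000
so2_avg = [random.gauss(0, 1) for _ in range(n)]
wind = [random.gauss(0, 1) for _ in range(n)]
unrelated = [random.gauss(0, 1) for _ in range(n)]
so2_max_next = [s - 0.5 * w + 0.1 * random.gauss(0, 1)
                for s, w in zip(so2_avg, wind)]

features = {"so2_avg_t": so2_avg, "wind_speed_t": wind, "unrelated_t": unrelated}
ranking = rank_features(features, so2_max_next)
for name, score in ranking:
    print(f"{name}: {score:.3f} nats")
```

Histogram estimates of MI are biased upward for small samples and depend on the bin count, so the scores are best read as a relative ordering rather than as absolute values.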