B-13 (DD)
SPECULATING DAILY MAXIMUM
CARBON MONOXIDE (CO) LEVEL
Submitted By –
Kumar Parth (17803017)
Objective
Considering the increasing pollution levels in the city and their harmful effects on children's health, this
study aims to predict carbon monoxide (CO) levels from the available sensor readings. A CO level between
2 ppm and 9 ppm is considered tolerable.
Forecasting Description
The goal is to forecast the daily maximum carbon monoxide (CO) level for the following week (5th April
2005 to 11th April 2005), using data on various air pollutants, including CO, recorded from 10th March
2004 to 4th April 2005.
Data Description
The dataset contains 9358 instances of hourly averaged responses from an array of 5 metal oxide chemical
sensors embedded in an Air Quality Chemical Multisensor Device. Data were recorded from 10th March
2004 to 4th April 2005 (about one year). Ground-truth hourly averaged concentrations of CO, Non-Methane
Hydrocarbons (NMHC), Benzene, Total Nitrogen Oxides (NOx) and Nitrogen Dioxide (NO2) were provided by
a co-located certified reference analyzer.
Attribute Information
0 Date (DD/MM/YYYY)
1 Time (HH.MM.SS)
2 True hourly averaged concentration CO in mg/m^3 (reference analyzer)
3 PT08.S1 (tin oxide) hourly averaged sensor response (nominally CO targeted)
4 True hourly averaged overall Non-Methane Hydrocarbons (NMHC) concentration in micro g/m^3 (reference analyzer)
5 True hourly averaged Benzene concentration in micro g/m^3 (reference analyzer)
6 PT08.S2 (titania) hourly averaged sensor response (nominally NMHC targeted)
7 True hourly averaged NOx concentration in ppb (reference analyzer)
8 PT08.S3 (tungsten oxide) hourly averaged sensor response (nominally NOx targeted)
9 True hourly averaged NO2 concentration in micro g/m^3 (reference analyzer)
10 PT08.S4 (tungsten oxide) hourly averaged sensor response (nominally NO2 targeted)
11 PT08.S5 (indium oxide) hourly averaged sensor response (nominally O3 targeted)
12 Temperature in °C
13 Relative Humidity (%)
14 Absolute Humidity (AH)
Key Characteristics
Missing values are encoded in the data as “-200”. The data show monthly seasonality and also vary with
the day of the week, which could be because the number of automobiles (emitting air pollutants) differs
between weekdays and weekends.
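As a minimal sketch of the clean-up this implies (not necessarily the code used in this study), the “-200” sentinel can be dropped before taking each day's maximum. The function name `daily_max_co` and the sample readings are hypothetical.

```python
from collections import defaultdict

MISSING = -200.0  # sentinel the dataset uses for a missing reading


def daily_max_co(rows):
    """Return {date: maximum CO} from (date, hourly_co) pairs, ignoring missing readings.

    A day whose readings are all missing gets no entry at all.
    """
    by_day = defaultdict(list)
    for day, co in rows:
        if co != MISSING:
            by_day[day].append(co)
    return {day: max(values) for day, values in by_day.items()}


# Hypothetical hourly readings, for illustration only
sample = [
    ("10/03/2004", 2.6), ("10/03/2004", -200.0), ("10/03/2004", 2.2),
    ("11/03/2004", -200.0), ("11/03/2004", 1.2),
    ("12/03/2004", -200.0),
]
daily = daily_max_co(sample)
print(daily)
```

Days with no valid reading are dropped here rather than imputed; whether to impute them instead is a modelling choice this report does not specify.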
Plot of CO vs Time
X-axis: Days of the year (e.g. day 1 is 5th April ’04, and so on)
Y-axis: Concentration of CO in ppm
The plot suggests that CO is seasonal with respect to the day of the year. To account for this, we
introduce a dummy variable:
X4 = 1 if the day of the year is between 200 and 300
= 0 otherwise
The data also suggest that CO is seasonal with respect to the day of the week. To account for this, we
introduce a dummy variable:
X5 = 1 if the day is Monday, Tuesday, Saturday or Sunday
= 0 otherwise
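The two dummy variables can be sketched as follows. Treating the 200 to 300 range as inclusive and passing the day index as a plain integer are assumptions, and the function names are hypothetical.

```python
from datetime import date


def seasonal_dummy(day_index):
    """X4: 1 if the day index lies in [200, 300], else 0 (inclusive bounds assumed)."""
    return 1 if 200 <= day_index <= 300 else 0


def weekday_dummy(d):
    """X5: 1 for Monday, Tuesday, Saturday or Sunday, else 0.

    date.weekday() numbers Monday as 0 and Sunday as 6.
    """
    return 1 if d.weekday() in {0, 1, 5, 6} else 0


# 5th April 2005, the first forecast day, is a Tuesday
print(seasonal_dummy(250), weekday_dummy(date(2005, 4, 5)))
```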
Input Variables:
Linear correlation coefficients (r) computed among the analyzed species using field-recorded data:
Species pair    r
NMHC-C6H6       0.98
CO-NOx          0.78
CO-NO2          0.67
C6H6-NOx        0.72
C6H6-NO2        0.60
NOx-NO2         0.76
CO-C6H6         0.90
Note that the benzene-NMHC coefficient was computed using only the first 8 days of measurements, after
which the NMHC-targeted analyzer went out of service.
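For reference, a linear (Pearson) correlation coefficient between two species can be computed as in the sketch below. Dropping any pair in which either reading is the “-200” missing marker is an assumption about how gaps would be handled, and the readings are made up for illustration.

```python
import math

MISSING = -200  # missing-value marker used in the dataset


def pearson_r(x, y):
    """Pearson correlation of two equal-length series, skipping pairs with a missing value."""
    pairs = [(a, b) for a, b in zip(x, y) if a != MISSING and b != MISSING]
    n = len(pairs)
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    sxx = sum((a - mean_x) ** 2 for a, _ in pairs)
    syy = sum((b - mean_y) ** 2 for _, b in pairs)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical readings; the last pair is discarded because CO is missing
co = [1, 2, 3, 4, 5, -200]
nox = [2, 1, 4, 3, 5, 7]
r = pearson_r(co, nox)
print(round(r, 2))
```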
After examining the available variables, we decided that the following variables can affect CO levels:
Regressors:
• Daily maximum C6H6 (lag 7)
• Daily maximum T (lag 7)
• Daily maximum AH (lag 7)
• Monthly dummy variables
• Weekly dummy variable
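One reading of "lag 7" is that the regressor for day t is the value observed on day t - 7, so that every regressor is already known when forecasting a week ahead. The sketch below builds such a lagged series under that assumption; the helper `lag` and all values are hypothetical.

```python
def lag(series, k):
    """Shift a daily series by k days: position t holds the value from day t - k.

    The first k positions have no earlier value and are set to None.
    """
    return [series[t - k] if t >= k else None for t in range(len(series))]


# Hypothetical daily maxima over ten days
max_c6h6 = [10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
max_co = [2.1, 2.4, 1.9, 3.0, 2.8, 2.2, 2.5, 2.7, 3.1, 2.0]

lagged = lag(max_c6h6, 7)

# Pair each day's CO maximum with the benzene maximum from seven days earlier,
# dropping the first week, for which no lagged value exists
pairs = [(max_co[t], x) for t, x in enumerate(lagged) if x is not None]
print(pairs)
```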
Multiple Regression Analysis
Y = Xβ + ε (Model)
Full analysis:
1. Coefficient table
2. ANOVA
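The quantities behind a coefficient table and an ANOVA table can be sketched with NumPy as below. This runs on synthetic data and does not reproduce the tables of this study; the function name `ols_anova`, the true coefficients and the noise level are all assumptions made for illustration.

```python
import numpy as np


def ols_anova(X, y):
    """Least-squares fit of y = X b + e; X must already include an intercept column."""
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ beta
    sse = float(np.sum((y - fitted) ** 2))    # residual sum of squares
    sst = float(np.sum((y - y.mean()) ** 2))  # total sum of squares
    ssr = sst - sse                           # regression sum of squares
    k = p - 1                                 # number of regressors
    f_stat = (ssr / k) / (sse / (n - p))      # overall F statistic
    r2 = 1.0 - sse / sst
    r2_adj = 1.0 - (sse / (n - p)) / (sst / (n - 1))
    return {"beta": beta, "ssr": ssr, "sse": sse, "sst": sst,
            "f_stat": f_stat, "r2": r2, "r2_adj": r2_adj}


# Synthetic example: intercept plus three regressors
rng = np.random.default_rng(0)
n = 60
X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])
true_beta = np.array([2.0, 0.5, -0.3, 0.1])
y = X @ true_beta + rng.normal(scale=0.1, size=n)

fit = ols_anova(X, y)
print(fit["beta"], fit["f_stat"], fit["r2_adj"])
```

The adjusted R² penalizes the number of regressors, which is why it is the figure quoted when comparing models of different sizes.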
Residual Analysis:
Normal probability plot of the residuals: This is a graph designed so that the cumulative normal
distribution plots as a straight line. Let t[1] < t[2] < . . . < t[n] be the externally studentized residuals
ranked in increasing order. If we plot t[i] against the cumulative probability P_i = (i - 1/2)/n,
i = 1, 2, . . . , n, on the normal probability plot, the resulting points should lie approximately on a
straight line when the errors are normally distributed.
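The points of such a plot can be computed as in the sketch below, which uses the standard formula for externally studentized residuals and the plotting positions (i - 1/2)/n. The data are synthetic and the function names are hypothetical; this is an illustration, not the analysis carried out in this study.

```python
import numpy as np
from statistics import NormalDist


def externally_studentized(X, y):
    """Externally studentized residuals of an OLS fit; X includes the intercept column."""
    n, p = X.shape
    q, _ = np.linalg.qr(X)          # reduced QR factorization: q is n x p
    h = np.sum(q ** 2, axis=1)      # leverages (diagonal of the hat matrix)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ beta                # ordinary residuals
    sse = float(e @ e)
    # Residual variance estimated with observation i left out
    s2_deleted = (sse - e ** 2 / (1.0 - h)) / (n - p - 1)
    return e / np.sqrt(s2_deleted * (1.0 - h))


def normal_plot_points(t):
    """Ranked residuals, plotting positions (i - 1/2)/n and matching normal quantiles."""
    n = len(t)
    ranked = np.sort(t)
    probs = (np.arange(1, n + 1) - 0.5) / n
    quantiles = np.array([NormalDist().inv_cdf(p) for p in probs])
    return ranked, probs, quantiles


# Synthetic data, for illustration only
rng = np.random.default_rng(1)
n = 20
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 0.5, -0.2]) + rng.normal(scale=0.3, size=n)

t = externally_studentized(X, y)
ranked, probs, quantiles = normal_plot_points(t)
```

Plotting `ranked` against `quantiles` gives the normal probability plot; points close to a straight line support the normality assumption.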
Plot of residuals against the fitted values ŷ_i: A plot of the residuals (preferably the externally
studentized residuals, t_i) versus the corresponding fitted values ŷ_i is useful for detecting several
common types of model inadequacies.
Conclusions:
Y = 2.2 + 0.15 (Max C6H6) - 0.05 (Max T) - 0.02 (Max AH) + 0.31 (Monthly dummy) + 0.16 (Weekly dummy)
Adjusted R² = 0.656, so our model explains about 66% of the variability in the data.
The normal probability plot of the residuals behaves properly.
The plot of residuals against the fitted values ŷ_i behaves properly too.
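As a worked example of using the fitted equation, the sketch below plugs in one set of regressor values and checks the result against the 2 ppm to 9 ppm tolerable range stated in the Objective. The input values are hypothetical, the function names are illustrative, and the inputs are assumed to be the lag-7 daily maxima listed under Regressors.

```python
def predict_max_co(max_c6h6, max_t, max_ah, monthly_dummy, weekly_dummy):
    """Daily maximum CO from the fitted equation reported in the Conclusions."""
    return (2.2
            + 0.15 * max_c6h6
            - 0.05 * max_t
            - 0.02 * max_ah
            + 0.31 * monthly_dummy
            + 0.16 * weekly_dummy)


def is_tolerable(co_ppm, low=2.0, high=9.0):
    """True if the CO level falls inside the 2 ppm to 9 ppm range."""
    return low <= co_ppm <= high


# Hypothetical inputs: 2.2 + 3.0 - 0.9 - 0.02 + 0 + 0.16
co = predict_max_co(max_c6h6=20.0, max_t=18.0, max_ah=1.0,
                    monthly_dummy=0, weekly_dummy=1)
print(round(co, 2), is_tolerable(co))
```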
References:
1. S. De Vito, E. Massera, M. Piga, L. Martinotto, G. Di Francia, "On field calibration of an electronic nose for benzene estimation in an urban pollution monitoring scenario".
2. UCI Machine Learning Repository, Air Quality data set: https://archive.ics.uci.edu/ml/datasets/Air+Quality