# Geostatistical analysis of Particulate matter concentration: A case study of Pune

MAP INDIA, 19-21 Jan 2010

Sulochana Shekhar Associate Professor, NDA, Pune

Increasing Urbanisation

Increasing Population

Increase in number of vehicles and other industrial activities

Increasing Air Pollution

Increasing health hazards

Need for monitoring the air quality

Environment Management

GIS - Geostatistics

[Location maps: India - Maharashtra; Maharashtra - Pune District; Pune District - Pune City. Map labels: To Mumbai, Bhima River, Pune City, Mula Mutha river, NH-4. Scale: 1 cm to 150 km]

**Study area - Pune city**

[Map of Pune city: Kirkee Cant, Aundh, Kasba, Kothrud, Pune Cant, Hadapsar, Warje, Bibewadi]

On a list of 52 towns and cities ranked by annual average concentration of respirable suspended particulate matter (RSPM) measured in residential areas in 2004, Delhi comes 16th, Kolkata 21st and Mumbai 39th, but Pune is at the 13th spot.

Pune was selected as a demonstration city for the Urban Air Quality Management project taken up under the USEPA, MoEF and GOI agreement since 2002.

Air Pollution

Air Quality: SO2 < 80 µg/m3; NOx < 80 µg/m3; SPM < 200 µg/m3; RSPM < 100 µg/m3

[Figure: Air quality trends in Pune (source: PMC Environmental Cell)]

It is equally important to know that Pune city's pollution has always been a matter of particulate matter about 10 microns in size (1 micron = 10^-6 m), known as PM10. These particles are so small that they cannot be seen but enter the human respiratory system and affect it to a great extent. The much smaller particles in the size range of PM2.5 (2.5 microns) seriously affect health.

Some of the major effects of particulate pollutants are increased risk of respiratory death in infants less than 1 year old, deterioration in the rate of lung function development, aggravated asthma, other respiratory symptoms such as cough and bronchitis in children, and increasing deaths from cardiovascular and respiratory diseases and lung cancer.

The city's vehicular population is likely to hit the 19 lakh mark next month. Statistics provided by the Regional Transport Office (RTO) show that till December 2009, the number of registered vehicles was 18,91,929. In the last nine months alone, there has been an addition of over one lakh vehicles, which comes to around 370 vehicles a day on average. In Pune, over 74 per cent of the total vehicles are two-wheelers. Of the more than 18.9 lakh vehicles in the city, 14,10,821 are two-wheelers. Cars and jeeps account for about 14 per cent of the total vehicles.
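The percentages and the daily average quoted above can be checked with a few lines of arithmetic. This is only a sanity check on the slide's own figures; the 270-day length used for "nine months" is an assumption, not a value from the source.

```python
# Figures quoted on the slide (RTO statistics, till December 2009).
registered_vehicles = 1_891_929   # 18,91,929 in Indian digit grouping
two_wheelers = 1_410_821          # 14,10,821

# Share of two-wheelers in the registered fleet, in per cent.
two_wheeler_share = 100 * two_wheelers / registered_vehicles
print(f"Two-wheeler share: {two_wheeler_share:.1f}%")

# Assumption: "nine months" is taken as roughly 270 days,
# and "over one lakh" as exactly 100,000 vehicles.
added_vehicles = 100_000
per_day = added_vehicles / 270
print(f"Average additions per day: {per_day:.0f}")
```

Both results agree with the slide: a share a little above 74 per cent and about 370 vehicles a day.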

Contrary to popular belief, the growing number of vehicles alone is not responsible for the air pollution in the city. The tar plants, burning of garbage and plastic, resuspended dust from unpaved roadsides and other factors also aggravate air pollution.

Methodology
There are two main groups of interpolation techniques: deterministic and geostatistical. Deterministic interpolation techniques create surfaces from measured points based on specified mathematical formulas. The Inverse Distance Weighted (IDW) and Spline methods are referred to as deterministic interpolation methods. Geostatistical interpolation techniques utilize the statistical properties of the measured points. Geostatistical techniques quantify the spatial autocorrelation among measured points.

11/28/2010 ss
Geostatistics
It is a branch of applied statistics that deals with spatially distributed properties. Geostatistics was devised to treat problems that arise when conventional statistical theory is used in estimating changes in ore grade within a mine. However, it is now applicable to many circumstances in different areas of geology and other natural sciences.

Geostatistics
Regionalized variable, semivariance, semivariogram, kriging

Kriging
This estimation procedure is called kriging, named after D.G. Krige, a South African mining engineer and pioneer in the application of statistical techniques to mine evaluation. A kriging estimate requires prior knowledge in the form of a model of the semivariogram or the spatial covariance. It depends on mathematical and statistical models. The addition of a statistical model that includes probability separates kriging methods from the deterministic methods.

This is the general formula for interpolators: the prediction at an unmeasured location $s_0$ is a weighted sum of the $N$ measured values, $\hat{Z}(s_0) = \sum_{i=1}^{N} \lambda_i Z(s_i)$.

The weight, $\lambda_i$, in deterministic interpolation depends solely on the distance to the prediction location. In the kriging method, the weight, $\lambda_i$, depends on a fitted model to the measured points, the distance to the prediction location, and the spatial relationships among the measured values around the prediction location. Thus, in kriging, the weights are based not only on the distance but also on spatial autocorrelation.

Creating a prediction surface map with kriging
To make a prediction with the kriging interpolation method, two tasks are necessary: uncover the dependency rules, and make the predictions. To realize these two tasks, kriging goes through a two-step process. It creates the semivariogram to estimate the statistical dependence (called spatial autocorrelation) values that depend on the model of autocorrelation (fitting a model). It predicts the unknown values (making a prediction). It is because of these two distinct tasks that it has been said that kriging uses the data twice: the first time to estimate the spatial autocorrelation of the data and the second to make the predictions.

Semivariogram
The semivariogram is defined as:
$\gamma(s_i, s_j) = \tfrac{1}{2}\,\mathrm{var}\big(Z(s_i) - Z(s_j)\big)$
where var is the variance. If two locations, $s_i$ and $s_j$, are close to each other in terms of the distance measure $d(s_i, s_j)$, then you expect them to be similar, so the difference in their values, $Z(s_i) - Z(s_j)$, will be small. As $s_i$ and $s_j$ get farther apart, they become less similar, so the difference in their values, $Z(s_i) - Z(s_j)$, will become larger.

The empirical semivariogram is a graph of the averaged semivariogram values on the y-axis and the distance (or lag) on the x-axis.
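As a rough illustration of how such a graph could be computed, the sketch below averages half the squared difference of every pair of measurements, binned by separation distance. The function name, the fixed-width lag binning and the sample coordinates are assumptions made for the example; it is not the procedure ArcGIS applies internally.

```python
import math

def empirical_semivariogram(points, values, lag_width, n_lags):
    """Average 0.5 * (z_i - z_j)^2 over point pairs, binned by distance."""
    sums = [0.0] * n_lags
    counts = [0] * n_lags
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            d = math.dist(points[i], points[j])
            k = int(d // lag_width)          # index of the lag bin
            if k < n_lags:
                sums[k] += 0.5 * (values[i] - values[j]) ** 2
                counts[k] += 1
    # One (lag centre, semivariance, pair count) tuple per non-empty bin.
    return [((k + 0.5) * lag_width, sums[k] / counts[k], counts[k])
            for k in range(n_lags) if counts[k] > 0]

# Hypothetical sample: four stations on a line, values rising with x.
pts = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
vals = [10.0, 12.0, 14.0, 16.0]
for lag, gamma, n_pairs in empirical_semivariogram(pts, vals, 1.0, 4):
    print(lag, gamma, n_pairs)
```

In this toy data the semivariance grows with the lag, which is the behaviour the semivariogram is meant to reveal.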

A typical semivariogram
Understanding a semivariogram: range, sill, and nugget
The semivariogram depicts the spatial autocorrelation of the measured sample points. The variance of the difference increases with distance.

Covariance function
The covariance function is defined to be: $C(s_i, s_j) = \mathrm{cov}\big(Z(s_i), Z(s_j)\big)$, where cov is the covariance. Covariance is a scaled version of correlation. So, when two locations, $s_i$ and $s_j$, are close to each other, you expect them to be similar, and their covariance (a correlation) will be large. As $s_i$ and $s_j$ get farther apart, they become less similar, and their covariance becomes zero. This can be seen in the following figure, which shows the anatomy of a typical covariance function.

ArcGIS provides the following functions from which to choose for modeling the empirical semivariogram: Circular, Spherical, Exponential, Gaussian, Linear.
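To make the role of nugget, sill and range concrete, here is a minimal sketch of three of these model families. The parameter names and the factor 3 in the exponential and Gaussian forms (a common "practical range" convention) are assumptions for illustration; the exact parameterization used by the software should be checked in its documentation.

```python
import math

def spherical(h, nugget, partial_sill, rng):
    """Spherical model: reaches nugget + partial_sill at h = rng, flat beyond."""
    if h == 0:
        return 0.0
    if h >= rng:
        return nugget + partial_sill
    r = h / rng
    return nugget + partial_sill * (1.5 * r - 0.5 * r ** 3)

def exponential(h, nugget, partial_sill, rng):
    """Exponential model: approaches the sill asymptotically."""
    if h == 0:
        return 0.0
    return nugget + partial_sill * (1.0 - math.exp(-3.0 * h / rng))

def gaussian(h, nugget, partial_sill, rng):
    """Gaussian model: very smooth behaviour near the origin."""
    if h == 0:
        return 0.0
    return nugget + partial_sill * (1.0 - math.exp(-3.0 * (h / rng) ** 2))

# Hypothetical parameters: nugget 1, partial sill 4, range 10.
for h in (0, 2, 5, 10, 20):
    print(h, spherical(h, 1, 4, 10), exponential(h, 1, 4, 10), gaussian(h, 1, 4, 10))
```

Fitting one of these curves to the empirical semivariogram is the "first use of the data".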

Making a prediction
The first use of the data: uncovering the dependence or autocorrelation in our data.
The second use of the data: making a prediction using the fitted model.
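The second use of the data can be sketched as an ordinary kriging prediction at a single location. This is a minimal illustration under assumptions: the semivariogram model `gamma` is taken as already fitted, all function names and sample values are hypothetical, and a small Gaussian-elimination solver stands in for a proper linear algebra library.

```python
import math

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def ordinary_kriging(points, values, target, gamma):
    """Return (prediction, standard error, weights) at target."""
    n = len(points)
    # Semivariances between data points, bordered by the constraint that
    # the weights sum to 1 (enforced through a Lagrange multiplier).
    a = [[gamma(math.dist(points[i], points[j])) for j in range(n)] + [1.0]
         for i in range(n)]
    a.append([1.0] * n + [0.0])
    b = [gamma(math.dist(p, target)) for p in points] + [1.0]
    sol = solve(a, b)
    weights, multiplier = sol[:n], sol[n]
    prediction = sum(w * z for w, z in zip(weights, values))
    variance = sum(w * g for w, g in zip(weights, b[:n])) + multiplier
    return prediction, math.sqrt(max(variance, 0.0)), weights

def gamma(h):
    """Hypothetical fitted spherical model: nugget 0, sill 1, range 5."""
    if h >= 5.0:
        return 1.0
    return 1.5 * (h / 5.0) - 0.5 * (h / 5.0) ** 3

# Two hypothetical stations and a target midway between them.
pred, se, w = ordinary_kriging([(0.0, 0.0), (2.0, 0.0)], [10.0, 20.0],
                               (1.0, 0.0), gamma)
print(pred, se, w)
```

With the target midway between two stations, each receives a weight of one half, so the prediction is the average of the two measured values.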

Search radius
Establish our search radius or neighborhood by assuming that as the locations get farther from the prediction location, the measured values will have less spatial autocorrelation with the unknown value. The specified shape of the neighborhood restricts how far and where to look for the measured values to be used in the prediction. Search radius controls computational speed. The smaller the search radius, the faster the predictions can be made. The search radius can be fixed or variable.

y = -34.Geostatistical Wizard: Searching Neighborhood dialog box Neighbors to include = 5 Search strategy: circle with four quadrants.1 Coordinates of test point (x = 18.01) Estimated Value= 841.895
.9. Radius = 0.

Kriging Formula
$Z(s) = \mu(s) + \varepsilon(s)$
where $Z(s)$ is the variable of interest, decomposed into a deterministic trend $\mu(s)$ and a random, autocorrelated error term $\varepsilon(s)$. The symbol $s$ simply indicates the location, as containing the spatial x- (longitude) and y- (latitude) coordinates. The autocorrelation between $\varepsilon(s)$ and $\varepsilon(s + h)$ does not depend on the actual location $s$, but only on the displacement $h$ between the two. Variations on this formula form the basis for all of the different types of kriging.

Types of Kriging

Deterministic interpolation techniques can be divided into two groups, global and local. Global techniques calculate predictions using the entire dataset. Local techniques calculate predictions from the measured points within neighborhoods, which are smaller spatial areas within the larger study area. Geostatistical Analyst in ArcGIS provides Global Polynomial as a global interpolator and Inverse Distance Weighted, Local Polynomial, and Radial Basis Functions as local interpolators.

To predict a value for any unmeasured location, IDW will use the measured values surrounding the prediction location. Those measured values closest to the prediction location will have more influence on the predicted value than those farther away.
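The distance weighting described here can be written in a few lines. The sketch assumes the common inverse-power form with a default power of 2 and uses every supplied point as a neighbour; the function name and sample values are hypothetical.

```python
import math

def idw(points, values, target, power=2.0):
    """Inverse distance weighted prediction at target."""
    num = 0.0
    den = 0.0
    for p, z in zip(points, values):
        d = math.dist(p, target)
        if d == 0.0:
            return z              # a measured location returns its own value
        w = 1.0 / d ** power      # closer points get larger weights
        num += w * z
        den += w
    return num / den

# Two hypothetical stations; the target lies closer to the first one.
print(idw([(0.0, 0.0), (2.0, 0.0)], [10.0, 20.0], (0.5, 0.0)))
```

The prediction is pulled towards the value of the nearer station, and no measure of prediction error is produced.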

Global Polynomial (GP) is a quick deterministic interpolator that is smooth. It is best used for surfaces that change slowly and gradually. However, there is no assessment of prediction errors and it may be too smooth.

When the dataset exhibits short-range variation, Local Polynomial interpolation maps can capture the short-range variation. Local Polynomial interpolation is sensitive to the neighborhood distance. In this method, there is no assessment of prediction errors.

Radial Basis Functions (RBFs) are moderately quick deterministic interpolators that are exact. RBFs are used for calculating smooth surfaces from a large number of data points. The functions produce good results for gently varying surfaces such as elevation. There is no assessment of prediction errors.

Geostatistical interpolation techniques
Exploring data: things that are closer together tend to be more alike than things that are farther apart.

Ordinary kriging: a kriging method in which the weights of the values sum to unity. It uses an average of a subset of neighboring points to produce a particular interpolation point.

Simple kriging: a kriging method in which the weights of the values do not sum to unity. Simple kriging uses the average of the entire dataset and produces a smoother result.

Universal kriging: a kriging method often used on data with a significant spatial trend, such as a sloping surface. In universal kriging, the expected values of the sampled points are modeled as a polynomial trend. Kriging is carried out on the difference between this trend and the values of the sampled points.

In general, Disjunctive Kriging tries to do more than Ordinary Kriging. Disjunctive Kriging requires the bivariate normality assumption and approximations to the functions fi(Z(si)); the assumptions are difficult to verify, and the solutions are mathematically and computationally complicated.

Declustering of data: Often the spatial locations of our data are not randomly or regularly spaced. For various reasons, the data may have been sampled preferentially, with a higher density of sample points in some places than in others. Samples should be taken so they are representative of the entire surface. However, many times the samples are taken where the concentration is most severe, thus skewing the view of the surface. Declustering accounts for skewed representation of the samples by weighting them appropriately so that a more accurate surface can be created. To get the best result, declustering was done for the data, which were actually clustered in one part of the study area. NST can be useful for geostatistics because when the data is dependent, it may be easier to detect and model autocorrelation using the NST. Therefore the NST (Normal Score Transformation) method was used with cell and polygon options to decluster the data.
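The idea behind a normal score transformation can be sketched as ranking the values and mapping each rank to the matching quantile of a standard normal distribution. This is a simplified, unweighted version: the plotting position `(rank - 0.5) / n` is an assumed convention, ties are not averaged, and the cell or polygon declustering weights used in the study are not applied.

```python
from statistics import NormalDist

def normal_score_transform(values):
    """Map each value to the standard normal quantile of its rank."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    scores = [0.0] * n
    standard_normal = NormalDist()
    for rank, i in enumerate(order, start=1):
        # Assumption: plotting position (rank - 0.5) / n; ties are not averaged.
        scores[i] = standard_normal.inv_cdf((rank - 0.5) / n)
    return scores

# Hypothetical concentrations at three stations.
print(normal_score_transform([30.0, 10.0, 20.0]))
```

The transformed values keep the original ordering but follow a standard normal shape, which is what simple and disjunctive kriging assume.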

Prediction Standard error map
A standard error map quantifies the uncertainty of the prediction. If the data comes from a normal distribution, the true value will be within the prediction ± 2 times the prediction standard error approximately 95 percent of the time.
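The "prediction ± 2 standard errors" rule translates directly into an interval. The numbers below are hypothetical and only show the arithmetic.

```python
def prediction_interval(prediction, standard_error, k=2.0):
    """Approximate 95% interval under normality: prediction +/- k * SE."""
    return prediction - k * standard_error, prediction + k * standard_error

# Hypothetical prediction of 150 µg/m3 with a standard error of 11 µg/m3.
low, high = prediction_interval(150.0, 11.0)
print(low, high)
```

A wide interval at a location signals that the surface is poorly constrained there, typically because measured points are sparse.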


"How good" is the model?
Cross-validation
Cross-validation uses all of the data to estimate the trend and autocorrelation models. It removes each data location, one at a time, and predicts the associated data value. For example, the diagram shows 10 randomly distributed data points. Cross-validation omits a point (red point) and calculates the value of this location using the remaining nine points (blue points). The predicted and actual values at the location of the omitted point are compared. This procedure is repeated for a second point, and so on.
We can systematically compare each surface with another, eliminating the "worst" of the two being compared, until the two "best" surfaces remain and are compared with one another.
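The omit-and-predict loop can be sketched as follows. To keep the example short it uses a simple inverse distance weighted predictor rather than kriging, so no trend or autocorrelation model is involved; the function names and sample values are hypothetical.

```python
import math

def idw(points, values, target, power=2.0):
    """Inverse distance weighted prediction, used here as a stand-in predictor."""
    num = 0.0
    den = 0.0
    for p, z in zip(points, values):
        d = math.dist(p, target)
        if d == 0.0:
            return z
        w = 1.0 / d ** power
        num += w * z
        den += w
    return num / den

def leave_one_out(points, values, predict):
    """Prediction error (predicted - measured) at each omitted location."""
    errors = []
    for i in range(len(points)):
        rest_points = points[:i] + points[i + 1:]
        rest_values = values[:i] + values[i + 1:]
        errors.append(predict(rest_points, rest_values, points[i]) - values[i])
    return errors

# Three hypothetical stations on a line.
pts = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
vals = [10.0, 20.0, 30.0]
print(leave_one_out(pts, vals, idw))
```

The resulting errors are the raw material for the prediction error statistics used to compare methods.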

Therefore the goal should be to have: standardized mean prediction errors near 0; small root-mean-squared prediction errors; average standard error near the root-mean-squared prediction errors; standardized root-mean-squared prediction errors near 1; and a spread of points as close as possible around the dashed gray line.

Optimal Model and valid Model
The root-mean-squared prediction error may be smaller for a particular model. Therefore, one might conclude that it is the "optimal" model. However, when comparing to another model, the root-mean-squared prediction error may be closer to the average estimated prediction standard error. This is a more valid model, because when we predict at a point without data, we have only the estimated standard errors to assess our uncertainty of that prediction. When the average estimated prediction standard errors are close to the root-mean-squared prediction errors from cross-validation, we can be confident that the prediction standard errors are appropriate.

If the average standard errors are close to the root-mean-squared prediction errors, we are correctly assessing the variability in prediction. If the average standard errors are greater than the root-mean-squared prediction errors, we are overestimating the variability of our predictions. If the average standard errors are less than the root-mean-squared prediction errors, we are underestimating the variability in our predictions. If the root-mean-squared standardized errors are greater than 1, we are underestimating the variability in our predictions. If the root-mean-squared standardized errors are less than 1, we are overestimating the variability in our predictions.
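These diagnostics can be computed from the cross-validation output as sketched below. The formulas are the usual textbook definitions and are an assumption about how the reported statistics were obtained; in particular, the average standard error is taken here as the root of the mean squared standard error. Names and sample values are hypothetical.

```python
import math

def cross_validation_summary(measured, predicted, standard_errors):
    """Prediction error statistics from cross-validation results."""
    n = len(measured)
    errors = [p - m for p, m in zip(predicted, measured)]
    standardized = [e / s for e, s in zip(errors, standard_errors)]
    return {
        "mean": sum(errors) / n,
        "root_mean_square": math.sqrt(sum(e * e for e in errors) / n),
        # Assumption: root of the mean squared standard error.
        "average_standard_error": math.sqrt(
            sum(s * s for s in standard_errors) / n),
        "mean_standardized": sum(standardized) / n,
        "root_mean_square_standardized": math.sqrt(
            sum(z * z for z in standardized) / n),
    }

# Hypothetical measured values, predictions and prediction standard errors.
stats = cross_validation_summary([10.0, 20.0], [12.0, 18.0], [2.0, 2.0])
print(stats)
```

In this toy case the average standard error equals the root-mean-squared error and the standardized root-mean-square is 1, the situation described above as correctly assessing the variability.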

Deterministic Interpolation Techniques
These functions make no assumptions about the data, making them less flexible and more automatic than kriging. They do not allow us to investigate the autocorrelation of the data. These techniques are inappropriate when there are large changes in the surface values within a short horizontal distance and/or when we suspect the sample data is prone to error or uncertainty. Therefore these methods are not considered for the air quality model.

Geostatistical Interpolation Techniques
As per the prediction error statistics, the root-mean-square error of Ordinary kriging (OK), 10.75, is smaller than that of Simple kriging. But OK is only the optimal model.

As per the prediction error statistics, the root-mean-square error of Universal kriging (UK), 10.75, is smaller than that of Simple kriging. But UK is only the optimal model. In the case of the mean standardized error, UK has a value of -0.024, which is close to zero, whereas SK has a value of -0.13, which is a little far from zero. In both cases the root-mean-squared standardized errors are close to 1, which shows both methods are reasonably good predictors. The average standard error being greater than the root-mean-squared prediction error in UK shows that we are overestimating the variability of our predictions.

The average standard errors are above the root-mean-squared prediction errors in Universal Kriging (UK) and below them in Disjunctive Kriging (DK). The average standard error being greater than the root-mean-squared prediction error in UK shows that we are overestimating the variability of our predictions. Similarly, the average standard error being less than the root-mean-squared prediction error in DK shows that we are underestimating the variability of our predictions. In UK, the root-mean-squared standardized error is closer to 1, whereas in DK it is more than 1. That shows we are underestimating the variability in our predictions in the Disjunctive method.

After declustering the data by using the Normal Score Transformation in the Simple kriging and Disjunctive kriging methods, the cross-validation statistics are again compared to select the best method.
The average standard error being lower than the root-mean-squared prediction error in SK and DK shows that we are underestimating the variability of our predictions. The root-mean-squared standardized errors are more than 1. That shows we are underestimating the variability in our predictions in both methods.

The Simple kriging method using the Normal Score Transformation (NST) with the Polygon option was compared with Simple kriging without NST.
As per the prediction error statistics, the root-mean-square error of Simple kriging 6 (SK 6), 11.26, is smaller than that of Simple kriging (SK). The average standard errors are very close to the root-mean-squared prediction errors in SK 6 compared to the average standard errors in SK. In SK, the root-mean-squared standardized error is closer to 1, whereas in SK 6 it is a little more than 1. The average standard error being close to the root-mean-squared prediction error in both methods shows that both are reasonably good for predictions (valid models). That shows our predictions are better in the Simple kriging method than in other methods.

The graphs of Simple kriging prediction standard error of SK 6 and SK were compared to select the better one. If the errors of the predictions from their true values are normally distributed, the points should lie roughly along the dashed line. If the errors are normally distributed, one can be confident of using methods that rely on normality. In this case, in the Simple kriging_2 graph, the values are relatively closer to the dashed line than in other methods.

Comparing the results
[Table: cross-validation prediction error statistics (Mean, Root Mean Square, Average Standard Error, Mean Standardized, Root Mean Square Standardized) for Ordinary Kriging, Simple Kriging, Universal Kriging, Disjunctive Kriging, Simple Kriging 3, Disjunctive Kriging 3 and Simple Kriging 6]

Comparing the Measured value with the predicted value
[Table: measured value (MV) and predicted values from Simple Kriging (SK), Simple Kriging 6 (SK6), Ordinary Kriging (OK) and Universal Kriging (UK), with the value closest to MV and the method closest to MV, for the stations Karve Rd1, Karve Rd2, Nal Stop, Navipeth, Swargate, Mandai, Oasis, Jog Building, Koregaon park, Bosari2 and Bosari]

Valid Model
[Table: Mean, Root Mean Square, Average Standard Error, Mean Standardized and Root Mean Square Standardized for Simple Kriging and Simple Kriging 6]
The root-mean-squared prediction error was small for the UK and OK models. Therefore, one might conclude that those methods are the "optimal" models. However, when compared with them, the Simple kriging model has a root-mean-squared prediction error closer to the average estimated prediction standard error. This is a more valid model. Therefore, from cross-validation, we can be confident that the prediction standard errors in SK are appropriate. Both models also satisfy other criteria, such as standardized mean prediction errors near 0, standardized root-mean-squared prediction errors near 1, and a spread of points as close as possible around the dashed gray line.

Air quality dispersion models have an important place in air quality management. They are essential tools in the development of action plans for improving air quality. Knowledge about the spatial distribution of the pollutant concentrations in the area is therefore required, and geostatistical (kriging) models are the most appropriate tools to obtain this information. Based upon model estimates it may also be possible to design measurement networks (i.e. (re)locate stations) in a given area. Models improve the effectiveness of air quality management. There will always be a need for both measurements and models.

RS, GIS, GPS
Today's towns are tomorrow's cities; today's cities are the future of mankind.