
Geostatistical Analysis: Predicting the Values of Ozone Concentration in the State of California

Roya Olyazadeh

1. Introduction
The purpose of this work is to interpolate ozone concentration across the State of California. Ozone levels are known at 175 monitoring stations; the measurements were taken in 2005 by the California Air Resources Board (CARB) [1]. Because stations are costly and difficult to set up, ozone values must be estimated for all unmonitored locations in California. High ozone concentrations cause ozone pollution, which affects people's health. Figure 1 relates ambient ozone concentrations, in parts per billion (ppb), to health risk [2]. This work addresses the following environmental question: where in California are the areas at risk from ozone pollution? Geostatistical analysis is useful for interpolating ozone concentration because it builds predictions by examining the relationships between the known points. Two important interpolation methods are carried out in this research: Inverse Distance Weighting (IDW) and kriging. Both are implemented in ArcGIS. Compared with R packages and gstat [10], statistical analysis in ArcGIS is easier to use but more limited. This work focuses on most of the tools provided by the Geostatistical Analyst extension in ArcMap.

Figure 1: Air Quality index for Ozone

2. Data
The data comprise ozone levels at 175 locations within the State of California. These data were taken from the California Air Resources Board (CARB) [2]. Each record contains:

- Index = index of the observation;
- Date = acquisition date (2005);
- Site = acquisition site (location);
- Latitude = location latitude;
- Longitude = location longitude;
- O3 = ozone measurement.

Apart from the stations, shapefiles of the California outline and a hillshade are used for better visualization [3, 4].

Figure 2: Layout of California with observation stations

3. Methods
Geostatistical analysis can be regarded as a collection of numerical techniques for dealing with spatial attributes. The value of the target variable at a location $s$ is viewed as the sum of three components:

$Z(s) = Z^*(s) + \varepsilon'(s) + \varepsilon''$

where $Z^*(s)$ is the deterministic component, $\varepsilon'(s)$ is the spatially correlated random component and $\varepsilon''$ is pure noise [5]. A spatial prediction model defines the inputs, the outputs and the computational procedure used to derive the outputs from the given inputs:

$\hat{z}(s_0) = E\{ Z \mid z(s_i),\ q_k(s_0),\ \gamma(h) \}$

where $z(s_i)$ is the input point dataset, $q_k(s_0)$ is the list of deterministic predictors and $\gamma(h)$ is the covariance model defining the spatial autocorrelation structure. The individual interpolation models are in fact special cases of this more general prediction model. They can be divided into empirical and statistical models as follows [5]:

MECHANICAL/EMPIRICAL MODELS: no estimate of the model error is available, and usually no strict assumptions are made about the variability of the feature:
- Thiessen polygons;
- Inverse distance interpolation;
- Regression on coordinates;
- Splines;

STATISTICAL (PROBABILITY) MODELS: the predictions are accompanied by an estimate of the prediction error, and strict statistical assumptions must be satisfied:
- Kriging (plain geostatistics);
- Environmental correlation (e.g. regression-based);
- Bayesian-based models (e.g. Bayesian Maximum Entropy);
- Mixed models (regression-kriging);


Inverse Distance Weighting (IDW)

Inverse Distance Weighting (IDW) is one of the oldest methods for data interpolation [6]. The method uses a known, scattered set of points. The value assigned to an unknown point is a weighted average of the values at the known points, with weights that decrease with distance [7]. IDW interpolation implements the assumption that the closest measured values have the most influence [5]. To predict a value for any unmeasured location, IDW uses the measured values surrounding the prediction location [8]:

$\hat{z}(s_0) = \sum_{i=1}^{n} \lambda_i(s_0)\, z(s_i)$

$\lambda_i(s_0) = \dfrac{1 / d^{\beta}(s_0, s_i)}{\sum_{j=1}^{n} 1 / d^{\beta}(s_0, s_j)}$

where $\lambda_i$ is the weight for neighbor $i$, $d(s_0, s_i)$ is the distance from the new point to a known sampled point and $\beta$ is a coefficient that is used to adjust the weights.
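The weighting scheme described above can be sketched in a few lines of Python. This is a minimal illustration and not the ArcGIS implementation: the function name `idw_predict`, the default exponent `beta=2.0` (the coefficient that adjusts the weights) and the station coordinates and ozone values are assumptions made for the example.

```python
import math

def idw_predict(s0, points, values, beta=2.0):
    """Predict the value at s0 as an inverse-distance weighted average."""
    inverse = []
    for p, z in zip(points, values):
        d = math.dist(s0, p)
        if d == 0.0:
            return z  # s0 coincides with a station: return its measurement
        inverse.append(1.0 / d ** beta)
    total = sum(inverse)
    # Normalised weights sum to one, so the result is a weighted average
    return sum(w / total * z for w, z in zip(inverse, values))

# Hypothetical stations (x, y) and ozone values, for illustration only
stations = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
ozone = [0.03, 0.06, 0.09]
estimate = idw_predict((0.5, 0.5), stations, ozone)
```

Because the query point in this example is equally distant from all three stations, the weights are equal and the estimate is simply the mean of the three values. A larger `beta` would concentrate the weight on the nearest stations.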



Kriging

Kriging is a group of geostatistical techniques for interpolating the value of a random field $Z(s)$ at unobserved locations. It belongs to the family of linear least squares estimation algorithms [14]. The standard version of kriging is called ordinary kriging (OK), which assumes the model

$Z(s) = \mu + \varepsilon'(s)$

where $\mu$ is the constant stationary function (global mean) and $\varepsilon'(s)$ is the spatially correlated stochastic part of the variation. Kriging is divided into the following methods [5]:
- Simple kriging assumes a known constant trend;
- Ordinary kriging assumes an unknown constant trend, $\mu(s) = \mu$;
- Universal kriging assumes a general polynomial trend model, such as a linear trend model;
- IRFk-kriging assumes $\mu(s)$ to be an unknown polynomial in $s$;
- Indicator kriging uses indicator functions instead of the process itself, in order to estimate transition probabilities;
- Multiple-indicator kriging is a version of indicator kriging that works with a family of indicators;
- Disjunctive kriging is a nonlinear generalisation of kriging;
- Lognormal kriging interpolates positive data by means of logarithms.
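As a rough sketch of how ordinary kriging arrives at a prediction, the code below builds and solves the standard ordinary kriging system in semivariogram form, in which the weights are constrained to sum to one. It is not the ArcGIS implementation; the exponential variogram, its parameters, the helper names and the sample data are all assumptions chosen for illustration.

```python
import math

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][c] * x[c] for c in range(i + 1, n))) / m[i][i]
    return x

def exponential_variogram(h, sill=1.0, range_=1.0):
    """Assumed variogram model, used here only as a placeholder."""
    return sill * (1.0 - math.exp(-h / range_))

def ordinary_kriging(s0, points, values, gamma):
    """Return (prediction, kriging variance, weights) at location s0."""
    n = len(points)
    # Semivariances between stations, bordered by the unbiasedness constraint
    a = [[gamma(math.dist(points[i], points[j])) for j in range(n)] + [1.0]
         for i in range(n)]
    a.append([1.0] * n + [0.0])
    b = [gamma(math.dist(p, s0)) for p in points] + [1.0]
    solution = solve(a, b)
    weights, multiplier = solution[:n], solution[n]
    prediction = sum(w * z for w, z in zip(weights, values))
    variance = sum(w * g for w, g in zip(weights, b[:n])) + multiplier
    return prediction, variance, weights

# Hypothetical stations and ozone values, for illustration only
stations = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
ozone = [0.03, 0.06, 0.09]
pred, var, weights = ordinary_kriging((0.5, 0.5), stations, ozone,
                                      exponential_variogram)
```

Unlike a purely mechanical interpolator, this procedure also returns a variance for each prediction, which is what distinguishes the statistical models listed earlier.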

4. Results: ArcGIS Geostatistical Analyst

The ArcGIS Geostatistical Analyst tool makes spatial analysis straightforward. The extension must first be enabled in the Extensions dialog of ArcMap 10, and its toolbar then added from the Customize menu [9]. Geostatistical Analyst can perform the following interpolation methods:

1. Deterministic methods:
   a. Inverse Distance Weighting (IDW)
   b. Global Polynomial Interpolation
   c. Local Polynomial Interpolation
   d. Radial Basis Function
2. Geostatistical methods:
   a. Kriging/Cokriging
3. Interpolation with barriers:
   a. Kernel Smoothing
   b. Diffusion Kernel

This work focuses on IDW and kriging. Geostatistical Analyst also makes it possible to explore the data with the following tools:

- Histogram;
- Normal QQ plot;
- Trend Analysis;
- Voronoi Map;
- Semivariogram/Covariance Cloud.



4.1 Normal QQ Plot

A normal QQ plot is created by plotting the data values against the values of a standard normal distribution at which their cumulative distributions are equal. For a cumulative distribution, the median splits the data into halves, the quartiles split the data into quarters, and so on. The normal QQ plot is constructed by plotting the quantile values of the dataset against the quantile values of a standard normal distribution [11]. In Figure 3 the points lie close to a straight line, which indicates that the data are approximately normally distributed. It is important to check the normality of the data before kriging; if the points do not follow a straight line, the data should be adjusted (for example by a transformation) and checked again until they do. The only departure appears at the upper end of the distribution and can be neglected.
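The construction of the quantile pairs can be sketched as follows. This is an illustrative sketch only: the plotting position `(i - 0.5) / n` is one common convention and is an assumption here, as are the function name and the sample values; ArcGIS may use a different convention.

```python
from statistics import NormalDist

def normal_qq_points(data):
    """Return (theoretical, sample) quantile pairs for a normal QQ plot."""
    ordered = sorted(data)
    n = len(ordered)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    # Cumulative probability assigned to the i-th smallest value (assumed convention)
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, ordered))

# Hypothetical ozone values, for illustration only
sample = [0.031, 0.045, 0.052, 0.060, 0.068, 0.074, 0.090]
pairs = normal_qq_points(sample)
```

Plotting the second element of each pair against the first gives the QQ plot; the closer the points are to a straight line, the closer the data are to a normal distribution.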

Figure 3: Normal QQ plot of ozone concentration

4.2 Trend Analysis

Trend analysis identifies trends and outliers in the data. If the fitted line is flat, there is no trend. The light green line in Figure 4 shows that the values start low and then increase. The blue line indicates that the trend along the x axis is weaker than the trend along the y axis shown by the green line.
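The idea of comparing trend strength along the two axes can be illustrated with a least-squares slope of the measured value against each coordinate. This is a deliberate simplification: the lines in Figure 4 are curves fitted by the tool, whereas a straight line is assumed here, and the function name, coordinates and values are hypothetical.

```python
def linear_trend_slope(coords, values):
    """Least-squares slope of values regressed on a single coordinate."""
    n = len(coords)
    mean_c = sum(coords) / n
    mean_v = sum(values) / n
    sxy = sum((c - mean_c) * (v - mean_v) for c, v in zip(coords, values))
    sxx = sum((c - mean_c) ** 2 for c in coords)
    return sxy / sxx

# Hypothetical station coordinates and ozone values, for illustration only
xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.0, 2.0, 1.0, 3.0]
o3 = [0.02, 0.04, 0.06, 0.08]
slope_x = linear_trend_slope(xs, o3)
slope_y = linear_trend_slope(ys, o3)
```

A slope near zero corresponds to the flat line that indicates no trend; comparing the two slopes shows along which axis the trend is stronger.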

Figure 4: Trend Analysis



4.3 Semivariogram

In spatial statistics, the variogram is a function describing the degree of spatial dependence of a spatial random field or stochastic process $Z(s)$. It is represented as a graph of the variance of the difference between field values at two locations, taken across realizations of the field [12]. The semivariogram and the covariance both measure the strength of autocorrelation between two sample points as a function of the distance between them [5]. The variance of the difference increases with distance, so the semivariogram can be thought of as a dissimilarity function. Figure 5 shows the variogram modeling of the ozone points. Each red point represents one pair of locations, plotted by the distance between them. The plot makes it possible to identify pairs of locations whose data may be inaccurate: some pairs have high semivariogram values even though they lie close together on the x axis, so these points should be removed from the data.
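The semivariogram cloud and its binned average can be computed directly from the station data. The sketch below assumes the usual definition of the semivariance of a pair as half the squared difference of the two values; the function names, lag settings and sample data are hypothetical.

```python
import math
from itertools import combinations

def semivariogram_cloud(points, values):
    """Return (distance, semivariance) for every pair of sample points."""
    cloud = []
    for (p, zp), (q, zq) in combinations(zip(points, values), 2):
        cloud.append((math.dist(p, q), 0.5 * (zp - zq) ** 2))
    return cloud

def empirical_semivariogram(cloud, lag_size, n_lags):
    """Average the cloud within distance bins of width lag_size."""
    sums = [0.0] * n_lags
    counts = [0] * n_lags
    for h, g in cloud:
        k = int(h // lag_size)
        if k < n_lags:
            sums[k] += g
            counts[k] += 1
    # Report each non-empty bin by its centre distance and mean semivariance
    return [((k + 0.5) * lag_size, sums[k] / counts[k])
            for k in range(n_lags) if counts[k] > 0]

# Hypothetical stations along a line and ozone values, for illustration only
stations = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
ozone = [0.02, 0.04, 0.08]
cloud = semivariogram_cloud(stations, ozone)
binned = empirical_semivariogram(cloud, lag_size=1.0, n_lags=3)
```

A pair that is close in distance but has a large semivariance stands out in `cloud`, which is the kind of suspicious pair discussed above.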

Figure 5: Semivariogram

4.4 Voronoi map

Voronoi maps are constructed from a series of polygons, each enclosing the area that is closer to one sample point than to any other [9]. The value assigned to each polygon is calculated with one of the following methods: Simple, Mean, Mode, Cluster, Median and Standard Deviation [13].

[Figure 6 legend. Voronoi map type: Simple. Data source: OzonePoints. Attribute: O3. Class intervals: 0.019 to 0.043044, 0.043044 to 0.060595, 0.060595 to 0.073405, 0.073405 to 0.090956, 0.090956 to 0.115]

Figure 6: Voronoi Map of Ozone points



4.5 Inverse Distance Weighting (IDW)

As mentioned before, IDW is one of the methods available for interpolating data in ArcGIS. The tool follows three steps. Figure 7 shows the IDW parameters, with a maximum of 15 and a minimum of 10 neighbors. Figure 8 plots the predicted values against the measured values together with the fitted straight line. The prediction map of the ozone points produced by the IDW method is shown in Figure 9.
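Figure 8 appears to be the same kind of predicted-value comparison that the kriging results later call cross validation. A minimal sketch of that idea, assuming a leave-one-out scheme, is given below: each station is removed in turn and predicted from the remaining ones. The nearest-neighbour predictor is only a placeholder for whichever interpolator is being assessed, and all names and data are hypothetical.

```python
import math

def nearest_neighbour(s0, points, values):
    """Placeholder predictor: the value of the closest station."""
    i = min(range(len(points)), key=lambda k: math.dist(s0, points[k]))
    return values[i]

def leave_one_out(points, values, predict):
    """Predict each station from all the others; return predictions and RMSE."""
    predictions = []
    for i in range(len(points)):
        rest_points = points[:i] + points[i + 1:]
        rest_values = values[:i] + values[i + 1:]
        predictions.append(predict(points[i], rest_points, rest_values))
    squared = sum((p - z) ** 2 for p, z in zip(predictions, values))
    return predictions, math.sqrt(squared / len(values))

# Hypothetical stations and ozone values, for illustration only
stations = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0)]
ozone = [0.03, 0.05, 0.09]
predicted, rmse = leave_one_out(stations, ozone, nearest_neighbour)
```

Plotting `predicted` against `ozone` gives a predicted-versus-measured scatter; the closer the points lie to the 1:1 line and the smaller the root-mean-square error, the better the interpolator reproduces the measurements.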

Figure 7: Parameters for IDW

Figure 8: Predicted Value

Figure 9: Prediction map of ozone points by IDW



4.6 Kriging

In this work, ordinary kriging was applied. Semivariogram/covariance modeling determines the model that best fits the points in the semivariogram (the blue line in Figure 10). A spherical model was chosen. A good lag size also helps reveal spatial correlation: choosing a small lag size zooms in on the model fit at short distances, and there is little autocorrelation beyond the range. Figure 11 displays the predicted values and how well the model fits (cross validation). Figure 12 shows the prediction map of ozone levels produced by kriging.
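For reference, the spherical model mentioned above has a simple closed form: the semivariance rises from the nugget and levels off at the sill once the lag distance reaches the range. The sketch below uses the standard formula; the parameter values are hypothetical and are not the ones fitted in Figure 10.

```python
def spherical_variogram(h, nugget, partial_sill, range_):
    """Spherical semivariogram model evaluated at lag distance h."""
    if h <= 0.0:
        return 0.0  # semivariance of a point with itself
    if h >= range_:
        return nugget + partial_sill  # flat at the sill beyond the range
    r = h / range_
    return nugget + partial_sill * (1.5 * r - 0.5 * r ** 3)

# Hypothetical parameters, for illustration only
half_range = spherical_variogram(5.0, nugget=0.1, partial_sill=0.9, range_=10.0)
beyond = spherical_variogram(20.0, nugget=0.1, partial_sill=0.9, range_=10.0)
```

Beyond the range the function is constant, which corresponds to pairs of points that are no longer spatially correlated.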

Figure 10: Fit the model (Semivariogram)

Figure 11: Predicted Value of ozone points

Figure 12: Prediction map of Ozone level by Kriging methods

5. Conclusion
Both the kriging and IDW results show that the eastern part of California has the highest predicted ozone levels, which fall in the category that the air quality index for ozone classifies as unhealthy for sensitive people. Moving west, conditions become moderate, and near the sea they are good.

6. References

1. Data
2.
3.
4.
5. Hengl, T., 2007. A Practical Guide to Geostatistical Mapping of Environmental Variables. European Commission, Joint Research Centre, Institute for Environment and Sustainability.
6. Shepard, D., 1968. A two-dimensional interpolation function for irregularly-spaced data. In: Blue, R. B. S., Rosenberg, A. M. (Eds.), Proceedings of the 1968 ACM National Conference. ACM Press, New York, pp. 517-524.
7.
8. e_Distance_Weighted_(IDW)_interpolation_works
9. ArcGIS 9, ArcGIS Geostatistical Analyst Tutorial.
10.
11. _plot_and_general_QQ_plot
12. Cressie, N., 1993. Statistics for Spatial Data. Wiley Interscience.
13. aps
14.