
Modeling Spatially Continuous Data

Lecture 12
October 17, 2002
Bailey and Gatrell Chapter 5

Generalized Least Squares

- First fit this model to the observed data using ordinary least squares
- Then estimate a variogram model using the residuals from the ordinary least squares estimation
- Then compute a covariogram using the variogram model
- From the covariogram estimate the covariance matrix and refit the model using generalized least squares

The validity of the final model depends on:
- appropriate trend surface model choice
- appropriate variogram model choice
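
The refit step can be sketched in Python with NumPy. This is a minimal illustration, not the lecture's own code: the site coordinates, observations, and the exponential covariogram are hypothetical stand-ins, and the variogram-fitting step is skipped. The GLS estimate used is the standard one, beta = (X^T C^-1 X)^-1 X^T C^-1 y.

```python
import numpy as np

def ols_fit(X, y):
    """Ordinary least squares coefficients for y = X beta + error."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

def gls_fit(X, y, C):
    """Generalized least squares: beta = (X^T C^-1 X)^-1 X^T C^-1 y."""
    Ci_X = np.linalg.solve(C, X)   # C^-1 X without forming the inverse
    Ci_y = np.linalg.solve(C, y)   # C^-1 y
    return np.linalg.solve(X.T @ Ci_X, X.T @ Ci_y)

# Hypothetical 1-D sites and observations (illustration only).
s = np.array([0.0, 1.0, 2.5, 4.0, 6.0])
X = np.column_stack([np.ones_like(s), s])   # linear trend surface
y = np.array([1.2, 2.9, 6.1, 8.8, 13.1])

# Step 1: ordinary least squares fit and its residuals.
beta_ols = ols_fit(X, y)
residuals = y - X @ beta_ols   # these would feed the variogram estimation

# Steps 2-3 (assumed): a covariogram derived from a fitted variogram.
# Here an exponential covariance with sill 1 and scale 2 stands in for it.
h = np.abs(s[:, None] - s[None, :])
C = 1.0 * np.exp(-h / 2.0)

# Step 4: refit the trend by generalized least squares.
beta_gls = gls_fit(X, y, C)
print(beta_ols, beta_gls)
```

With C set to the identity matrix, `gls_fit` gives the same coefficients as `ols_fit`, which is a quick sanity check on the formula.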

Objectives

- Understanding and describing the nature of the spatial variation in an attribute
  Knowledge of the trend and covariance structure is sufficient
- Prediction of the attribute values at locations where it has not been sampled
  Want to predict a value for the attribute at location s
  How to use the derived model for prediction purposes

Developing Models for Prediction

Assume we have the model

$$Y(s) = x^T(s)\beta + U(s)$$

with U(s) a zero-mean process with covariance function C()

From this model we can do better than just a prediction based on the mean value

$$\mu(s) = x^T(s)\beta$$

We can add a local component to the mean, based on knowledge of the covariance structure and the observed values at sampled points

This approach is known as kriging

Kriging

Matheron coined the term in honor of D. G. Krige, a South African mining engineer

An optimal spatial linear prediction method
- Optimal in the sense that it is unbiased and minimizes the mean squared prediction error
- Based primarily on the second order properties of the process Y
- Prediction weights are based on the spatial dependence between observations modeled by the variogram

Kriging Models

- Simple Kriging
  Mean is known and constant throughout the study region
- Ordinary Kriging
  Mean is fixed but unknown and needs to be estimated
- Universal Kriging
  Mean varies, is an unknown linear combination of known functions

Simple Kriging

Assumes mean is known and does not need to be estimated

- Subtract the mean from the sample observations to derive a set of residuals $u_i$: $u_i = y_i - \mu(s_i)$
- We assume these zero-mean residuals also have a known variance $\sigma^2$ and covariance function C()
- Find an estimate $\hat{u}(s)$ for a value u(s) of the random variable U(s) at the location s, given observed values $u_i$ of the random variables $U(s_i)$ at the n sample locations $s_i$
- With an estimate $\hat{u}(s)$, the predicted value $\hat{y}(s)$ is derived by adding $\hat{u}(s)$ to the known trend $\mu(s)$ at the point s

Simple Kriging

Estimates are weighted linear combinations of the observed residuals: a weighted sum of the n random variables at sample sites $s_i$

$$\hat{U}(s) = \sum_{i=1}^{n} \lambda_i(s) U(s_i)$$

- $\lambda_i(s)$ are the weights; different weights are applied for different locations s
- $\hat{U}(s)$, as the sum of random variables, is a random variable itself
- $\hat{U}(s)$ should have a mean value of zero for any set of weights, since by assumption the mean of U(s) is zero and the weights are constants

[Figure: an unmeasured point $s_0$ to be estimated, surrounded by the observed point locations $s_1, \ldots, s_n$, each contributing with its own weight $\lambda_i$]

Simple Kriging

Assuming $\hat{U}(s)$ and U(s) both have zero mean, the expected mean square error is:

$$\begin{aligned}
E((\hat{U}(s) - U(s))^2) &= E(\hat{U}^2(s)) + E(U^2(s)) - 2E(\hat{U}(s)U(s)) \\
&= \sum_{i=1}^{n}\sum_{j=1}^{n} \lambda_i(s)\lambda_j(s)C(s_i, s_j) + \sigma^2 - 2\sum_{i=1}^{n} \lambda_i(s)C(s, s_i) \\
&= \lambda^T(s)C\lambda(s) + \sigma^2 - 2\lambda^T(s)c(s)
\end{aligned}$$

- C is an (n x n) matrix of covariances between all possible pairs of the n sample points
- c(s) is an (n x 1) column vector of covariances between the prediction point s and each of the n sample points

Want to choose weights to minimize the mean square error

Differentiating with respect to $\lambda(s)$ in order to minimize gives

$$\lambda(s) = C^{-1}c(s)$$

$$\hat{U}(s) = \lambda^T(s)U = c^T(s)C^{-1}U$$

The minimized expected mean square error corresponding to this choice of weights is

$$E((\hat{U}(s) - U(s))^2) = \sigma^2 - c^T(s)C^{-1}c(s)$$

This is the mean square prediction error or kriging variance $\sigma_e^2$
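
The differentiation step can be written out explicitly. The following is a sketch based on the matrix form of the mean square error, and it assumes C is symmetric and invertible (true for a valid covariance matrix of distinct sites):

```latex
\begin{aligned}
\frac{\partial}{\partial \lambda(s)}
  \left[ \lambda^T(s) C \lambda(s) + \sigma^2 - 2\lambda^T(s) c(s) \right]
  &= 2C\lambda(s) - 2c(s) = 0 \\
\Rightarrow \quad \lambda(s) &= C^{-1} c(s) \\[4pt]
\sigma_e^2
  &= c^T(s) C^{-1} C\, C^{-1} c(s) + \sigma^2 - 2 c^T(s) C^{-1} c(s) \\
  &= \sigma^2 - c^T(s) C^{-1} c(s)
\end{aligned}
```

The last two lines substitute the optimal weights back into the mean square error, which is where the kriging variance comes from.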

Worked Example Simple Kriging

Data (u is the residual from the known mean, 160.37)

ID    X     Y     Z        u
1     29    82    122   -38.37
2     73    105   183    22.63
3     82    65    148   -12.37
4     91    52    160    -0.37
5     22    22    176    15.63

Distances

          1        2        3       4      5
1         0
2      49.2        0
3      53.9     40.3        0
4      65.2     55.9     15.8       0
5    60.795     97.1    72.09    69.6      0

Worked Example Simple Kriging

Covariances

         1        2        3        4     5
1       20
2    4.571       20
3    3.970    5.970       20
4    2.828    3.739   12.450       20
5    3.228    1.086    2.300    2.479    20

C^{-1}

 0.054914  -0.01004   -0.00646  -0.00095   -0.00746
-0.01004    0.056751  -0.01508   0.000168   0.000252
-0.00646   -0.01508    0.087403 -0.05044   -0.00194
-0.00095    0.000168  -0.05044   0.082023  -0.00422
-0.00746    0.000252  -0.00194  -0.00422    0.051936

Worked Example Simple Kriging

Weights: $\lambda(s) = C^{-1}c(s)$

ID     c(s)     lambda(s)
1      6.895    0.225
2      5.032    0.066
3      10.15    0.351
4      8.083    0.128
5      4.079    0.107
Sum             0.877

Simple kriging weights need not sum to 1

Worked Example Simple Kriging

$$\hat{u}(s) = \sum_{i=1}^{n} \lambda_i(s) u_i = 0.225(-38.37) + 0.066(22.63) + 0.351(-12.37) + 0.128(-0.37) + 0.107(15.63) = -9.86$$

$$\hat{y}(s) = 160.37 - 9.86 = 150.5$$

$$\sigma_e^2 = \sigma^2 - c^T(s)C^{-1}c(s) = 20.0 - 6.92 = 13.08$$

95 percent confidence interval is $\hat{y}(s) \pm 1.96\,\sigma_e$

95 percent CI = 150.5 ± 7.09

[Figure: the predicted value 150.5 at $s_0$, with the weights 0.225, 0.066, 0.351, 0.128 and 0.107 attached to the observed locations $s_1$ to $s_5$]
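
The arithmetic of this example can be checked numerically. The sketch below, in Python with NumPy, copies the residuals, the covariance matrix, and c(s) from the tables above and solves for the weights directly rather than using the printed inverse; small differences in the third decimal place come from rounding in the tabulated values.

```python
import numpy as np

# Known mean and residuals u_i, copied from the worked example.
mean = 160.37
u = np.array([-38.37, 22.63, -12.37, -0.37, 15.63])

# Lower triangle of the tabulated covariance matrix (variance 20 on the diagonal).
lower = [
    [20.0],
    [4.571, 20.0],
    [3.970, 5.970, 20.0],
    [2.828, 3.739, 12.450, 20.0],
    [3.228, 1.086, 2.300, 2.479, 20.0],
]
n = len(lower)
C = np.zeros((n, n))
for i, row in enumerate(lower):
    for j, value in enumerate(row):
        C[i, j] = C[j, i] = value   # fill both halves of the symmetric matrix

# Covariances between the prediction point and each sample point.
c = np.array([6.895, 5.032, 10.15, 8.083, 4.079])
sigma2 = 20.0

weights = np.linalg.solve(C, c)      # lambda(s) = C^-1 c(s)
u_hat = weights @ u                  # predicted residual
y_hat = mean + u_hat                 # add back the known mean
krig_var = sigma2 - c @ weights      # sigma_e^2 = sigma^2 - c^T C^-1 c
half_width = 1.96 * np.sqrt(krig_var)

print(np.round(weights, 3), round(weights.sum(), 3))
print(round(y_hat, 1), round(krig_var, 2), round(half_width, 2))
```

Solving the linear system with `np.linalg.solve` avoids forming the inverse explicitly, which is the usual choice when only the weights are needed.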

Ordinary Kriging

Optimal local spatial prediction

- First order effects are implicitly estimated as part of the prediction process
- Prediction of y(s) occurs in one step using a weighted linear combination of the observed values $y_i$

$$\hat{Y}(s) = \sum_{i=1}^{n} \lambda_i(s) Y(s_i)$$

In this case we choose weights so that the mean value of $\hat{Y}(s)$ is constrained to be the mean $\mu$

Ordinary Kriging

The mean of Y(s) and each of the $Y(s_i)$ are all $\mu$

The mean of $\hat{Y}(s)$ will also be $\mu$ as long as the weights sum to one

With this constraint we minimize the mean squared error between values of Y(s) and $\hat{Y}(s)$

The expected mean square error is

$$E((\hat{Y}(s) - Y(s))^2) = \lambda^T(s)C\lambda(s) + \sigma^2 - 2\lambda^T(s)c(s)$$

C is an (n x n) matrix of covariances $C(s_i, s_j)$ between all possible pairs of the n sample sites, and c(s) is an (n x 1) column vector of covariances $C(s, s_i)$ between the prediction point s and each of the n sample sites

To carry out the optimization with constraints, use the method of Lagrange multipliers, with multiplier v(s), and minimize

$$\lambda^T(s)C\lambda(s) + \sigma^2 - 2\lambda^T(s)c(s) + 2(\lambda^T(s)\mathbf{1} - 1)v(s)$$

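Ordinary Kriging

Minimizing

$$\lambda^T(s)C\lambda(s) + \sigma^2 - 2\lambda^T(s)c(s) + 2(\lambda^T(s)\mathbf{1} - 1)v(s)$$

leads to 2 simultaneous equations

$$C\lambda(s) + \mathbf{1}v(s) = c(s)$$

$$\lambda^T(s)\mathbf{1} = 1$$

These equations are expressed using a modified matrix $C_+$ and vectors $\lambda_+(s)$ and $c_+(s)$

$$C_+ = \begin{pmatrix}
C(s_1, s_1) & \cdots & C(s_1, s_n) & 1 \\
\vdots & \ddots & \vdots & \vdots \\
C(s_n, s_1) & \cdots & C(s_n, s_n) & 1 \\
1 & \cdots & 1 & 0
\end{pmatrix}, \quad
\lambda_+(s) = \begin{pmatrix} \lambda_1(s) \\ \vdots \\ \lambda_n(s) \\ v(s) \end{pmatrix}, \quad
c_+(s) = \begin{pmatrix} C(s, s_1) \\ \vdots \\ C(s, s_n) \\ 1 \end{pmatrix}$$

so that $C_+\lambda_+(s) = c_+(s)$, and a solution

$$\lambda_+(s) = C_+^{-1}c_+(s)$$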
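
A minimal sketch of this solve in Python with NumPy. The function name `ordinary_kriging_weights` and the two-site covariances are hypothetical, chosen only so the answer can be checked by hand: with two sites that are equally correlated with the prediction point, symmetry forces both weights to 0.5.

```python
import numpy as np

def ordinary_kriging_weights(C, c):
    """Solve C_+ lambda_+ = c_+ and return (weights, Lagrange multiplier)."""
    n = len(c)
    C_plus = np.ones((n + 1, n + 1))   # border of ones for the constraint
    C_plus[:n, :n] = C                 # covariances between sample sites
    C_plus[n, n] = 0.0                 # zero in the bottom-right corner
    c_plus = np.append(c, 1.0)         # right-hand side ends with 1
    lam_plus = np.linalg.solve(C_plus, c_plus)
    return lam_plus[:n], lam_plus[n]

# Hypothetical two-site example (illustration only).
C = np.array([[1.0, 0.5],
              [0.5, 1.0]])
c = np.array([0.3, 0.3])

weights, v = ordinary_kriging_weights(C, c)
print(weights, v)
```

The returned pair satisfies both simultaneous equations: `C @ weights + v` reproduces `c`, and the weights sum to one.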

Ordinary Kriging

To obtain the prediction $\hat{y}(s)$:
- Solve the equation for $\lambda_+(s)$
- Extract from this the vector $\lambda(s)$
- Then $\hat{y}(s) = \lambda^T(s)\,y$

y is the vector of the original observations $y_i$

Source: http://www.msu.edu/~ashton/466/2002_notes/3-18-02/ok_ill.html

Worked Example Ordinary Kriging

Data

ID   x   y   z
1    2   4   3
2    4   7   4
3    8   9   2
4    7   4   4

The distance matrix for the 5 data points is:

      [1]    [2]    [3]    [4]    [5]
[1] 0.000  3.605  7.810  5.000  4.000
[2] 3.605  0.000  4.472  4.243  3.605
[3] 7.810  4.472  0.000  5.099  5.385
[4] 5.000  4.243  5.099  0.000  1.000
[5] 4.000  3.605  5.385  1.000  0.000

The distance vector between the data points and the unknown point is:

[1] 3.162  2.236  5.000  2.236  1.414

Worked Example Ordinary Kriging

Variogram model is a spherical model with nugget=2.5, sill=7.5, range=10.0

$$\gamma(h) = \begin{cases}
2.5 + (7.5 - 2.5)\left(\dfrac{3h}{20} - \dfrac{h^3}{2000}\right) & 0 < h \le 10 \\
0 & h = 0 \\
7.5 & \text{otherwise}
\end{cases}$$

$$\gamma(h) = \sigma^2 - C(h)$$

$$C(h) = 7.5 - \left[2.5 + (7.5 - 2.5)\left(\frac{3h}{20} - \frac{h^3}{2000}\right)\right]$$

The covariance matrix for the data points for the given model (row and column 6 carry the ones for the Lagrange multiplier):

          [1]      [2]       [3]      [4]      [5]  [6]
[1] 7.5000000 3.620065 0.5001733 2.343750 3.240000    1
[2] 3.6200650 7.500000 2.8043796 3.013076 3.620065    1
[3] 0.5001733 2.804380 7.5000000 2.260774 2.027458    1
[4] 2.3437500 3.013076 2.2607737 7.500000 6.378750    1
[5] 3.2400000 3.620065 2.0274579 6.378750 7.500000    1
[6] 1.0000000 1.000000 1.0000000 1.000000 1.000000    0

Worked Example Ordinary Kriging

The vector of covariances for the points to the unknown point is:

4.061023  5.026350  2.343750  5.026350  5.919616  1.000000

A vector of weights lambda, along with the Lagrange multiplier as element 6:

           [1]
[1]  0.17289193
[2]  0.26523729
[3]  0.05887157
[4]  0.16986833
[5]  0.33313088
[6] -0.13471033

Worked Example Ordinary Kriging

Calculate the variance as the transpose of c times the inverse of C times c, subtracted from the sill:

$\sigma_e^2$ = 5.135453

The kriging standard error is the square root of this:

2.266154

Then multiply the weight for each data point by the attribute value of that point to determine the ordinary kriging estimate:

4.375627
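
The solve for this example can be sketched in Python with NumPy. Two values below are assumptions rather than statements from the slides. First, the zero-distance covariance on the diagonal is taken as 10 (nugget 2.5 plus 7.5) instead of the 7.5 shown in the matrix, because only with 10 does the system reproduce the tabulated weights and variance. Second, the attribute value of the fifth data point, which the data table does not list, is taken as 6, the value implied by the tabulated weights and the final estimate. Off-diagonal covariances and the covariance vector are copied from the tables.

```python
import numpy as np

n = 5

# Off-diagonal covariances between the data points, copied from the table.
off_diag = {
    (0, 1): 3.620065, (0, 2): 0.5001733, (0, 3): 2.343750, (0, 4): 3.240000,
    (1, 2): 2.8043796, (1, 3): 3.013076, (1, 4): 3.620065,
    (2, 3): 2.2607737, (2, 4): 2.0274579,
    (3, 4): 6.378750,
}

# ASSUMPTION: covariance at zero distance is 10 (2.5 + 7.5), not the 7.5
# printed on the diagonal of the table; see the paragraph above.
c0 = 10.0

# Build the augmented matrix C_+ with a border of ones and a zero corner.
C_plus = np.zeros((n + 1, n + 1))
C_plus[:n, :n] = c0 * np.eye(n)
for (i, j), value in off_diag.items():
    C_plus[i, j] = C_plus[j, i] = value
C_plus[:n, n] = 1.0
C_plus[n, :n] = 1.0

# Augmented covariance vector c_+ (last element is the constraint's 1).
c_plus = np.array([4.061023, 5.026350, 2.343750, 5.026350, 5.919616, 1.0])

lam_plus = np.linalg.solve(C_plus, c_plus)
weights, multiplier = lam_plus[:n], lam_plus[n]

# ASSUMPTION: z for the fifth point is 6 (not listed in the data table).
z = np.array([3.0, 4.0, 2.0, 4.0, 6.0])

estimate = weights @ z                 # ordinary kriging estimate
variance = c0 - c_plus @ lam_plus      # c_+^T C_+^-1 c_+ subtracted from c0
std_error = np.sqrt(variance)

print(np.round(weights, 4), round(multiplier, 4))
print(round(estimate, 3), round(variance, 3), round(std_error, 3))
```

Under these assumptions the results agree with the tabulated weights, estimate, and variance to about three decimal places; the remaining differences are consistent with rounding of the distances.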

Ordinary versus Simple Kriging


- Ordinary kriging is primarily a local neighborhood estimator
- There is no need to estimate a first-order trend; instead, the mean is estimated from nearby data only
- Estimates are not as sensitive to non-stationarity (though the covariance model may be affected, it is not as strongly affected at short ranges as at long ones)

Universal Kriging

Includes a first order trend component

$$Y(s) = \beta_0 + \beta_1 x_1(s) + \beta_2 x_2(s) + \cdots + \beta_p x_p(s) + \varepsilon(s)$$

Forms a prediction for y in one step

$$\hat{Y}(s) = \sum_{i=1}^{n} \lambda_i(s) Y(s_i)$$

As for ordinary kriging, the weights are chosen to minimize mean squared error

$$MSE = E\,|\hat{Y}(s) - Y(s)|^2$$

subject to the constraint that $\hat{Y}(s)$ is unbiased for Y(s), that is $E\{\hat{Y}(s)\} = E\{Y(s)\}$

Universal Kriging

$\hat{Y}(s)$ is unbiased at the prediction location $s_0$ if and only if

$$\sum_{i=1}^{n} \lambda_i = 1 \quad \text{and} \quad \sum_{i=1}^{n} \lambda_i x_k(s_i) = x_k(s_0), \; k = 1, 2, \ldots, p$$

for all $\beta_0, \beta_1, \beta_2, \ldots, \beta_p$

To obtain the weights that minimize the mean square error subject to this constraint, again use the method of Lagrange multipliers

Taking the derivatives with respect to $\lambda$ and v, setting the expressions to zero and re-arranging terms, we get the kriging equations (using the variogram)

$$\sum_{i=1}^{n} \lambda_i \gamma(s_i - s_k) + v_0 + \sum_{j=1}^{p} v_j x_j(s_k) = \gamma(s_k - s_0); \quad k = 1, 2, \ldots, n$$

$$\sum_{i=1}^{n} \lambda_i = 1$$

$$\sum_{i=1}^{n} \lambda_i x_k(s_i) = x_k(s_0); \quad k = 1, 2, \ldots, p$$
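
The unbiasedness conditions follow from comparing expectations term by term. A short sketch, assuming the error term in the trend model has zero mean and writing $s_0$ for the prediction location:

```latex
\begin{aligned}
E\{\hat{Y}(s_0)\} &= \sum_{i=1}^{n} \lambda_i E\{Y(s_i)\}
  = \beta_0 \sum_{i=1}^{n} \lambda_i
  + \sum_{k=1}^{p} \beta_k \sum_{i=1}^{n} \lambda_i x_k(s_i) \\
E\{Y(s_0)\} &= \beta_0 + \sum_{k=1}^{p} \beta_k x_k(s_0)
\end{aligned}
```

The two expressions agree for every choice of $\beta_0, \ldots, \beta_p$ only if the coefficient of each $\beta$ matches, which gives $\sum_i \lambda_i = 1$ and $\sum_i \lambda_i x_k(s_i) = x_k(s_0)$ for $k = 1, \ldots, p$.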

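Universal Kriging

In matrix notation

$$\begin{pmatrix}
0 & \gamma(s_1, s_2) & \cdots & \gamma(s_1, s_n) & 1 & x_1(s_1) & \cdots & x_p(s_1) \\
\gamma(s_2, s_1) & 0 & \cdots & \gamma(s_2, s_n) & 1 & x_1(s_2) & \cdots & x_p(s_2) \\
\vdots & \vdots & \ddots & \vdots & \vdots & \vdots & & \vdots \\
\gamma(s_n, s_1) & \gamma(s_n, s_2) & \cdots & 0 & 1 & x_1(s_n) & \cdots & x_p(s_n) \\
1 & 1 & \cdots & 1 & 0 & 0 & \cdots & 0 \\
x_1(s_1) & x_1(s_2) & \cdots & x_1(s_n) & 0 & 0 & \cdots & 0 \\
\vdots & \vdots & & \vdots & \vdots & \vdots & & \vdots \\
x_p(s_1) & x_p(s_2) & \cdots & x_p(s_n) & 0 & 0 & \cdots & 0
\end{pmatrix}
\begin{pmatrix} \lambda_1 \\ \lambda_2 \\ \vdots \\ \lambda_n \\ v_0 \\ v_1 \\ \vdots \\ v_p \end{pmatrix}
=
\begin{pmatrix} \gamma(s_1 - s_0) \\ \gamma(s_2 - s_0) \\ \vdots \\ \gamma(s_n - s_0) \\ 1 \\ x_1(s_0) \\ \vdots \\ x_p(s_0) \end{pmatrix}$$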

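Universal Kriging

In terms of covariances, with a modified matrix $C_+$ and vectors $\lambda_+(s)$ and $c_+(s)$

$$C_+ = \begin{pmatrix}
C(s_1, s_1) & \cdots & C(s_1, s_n) & x_1(s_1) & \cdots & x_p(s_1) \\
\vdots & \ddots & \vdots & \vdots & & \vdots \\
C(s_n, s_1) & \cdots & C(s_n, s_n) & x_1(s_n) & \cdots & x_p(s_n) \\
x_1(s_1) & \cdots & x_1(s_n) & 0 & \cdots & 0 \\
\vdots & & \vdots & \vdots & \ddots & \vdots \\
x_p(s_1) & \cdots & x_p(s_n) & 0 & \cdots & 0
\end{pmatrix}, \quad
\lambda_+(s) = \begin{pmatrix} \lambda_1(s) \\ \vdots \\ \lambda_n(s) \\ v_1(s) \\ \vdots \\ v_p(s) \end{pmatrix}, \quad
c_+(s) = \begin{pmatrix} C(s, s_1) \\ \vdots \\ C(s, s_n) \\ x_1(s) \\ \vdots \\ x_p(s) \end{pmatrix}$$

The solution is

$$\lambda_+(s) = C_+^{-1}c_+(s)$$

Prediction is

$$\hat{y}(s) = \lambda^T(s)\,y$$

MSE

$$\sigma_e^2 = \sigma^2 - c_+^T(s)\,C_+^{-1}\,c_+(s)$$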
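
A minimal sketch of the covariance form of the universal kriging system in Python with NumPy. Everything in the example is hypothetical: the function name `universal_kriging_weights`, the four 1-D sites, the exponential covariance, and the trend basis, whose first column is the constant 1 so that it plays the role of the intercept.

```python
import numpy as np

def universal_kriging_weights(C, c, X, x0):
    """Solve [[C, X], [X^T, 0]] [lambda; v] = [c; x0] for weights and multipliers."""
    n, p = X.shape
    C_plus = np.zeros((n + p, n + p))
    C_plus[:n, :n] = C        # covariances between sample sites
    C_plus[:n, n:] = X        # trend functions at the sample sites
    C_plus[n:, :n] = X.T
    c_plus = np.concatenate([c, x0])
    lam_plus = np.linalg.solve(C_plus, c_plus)
    return lam_plus[:n], lam_plus[n:]

def cov(h):
    # Hypothetical exponential covariance with unit sill and unit scale.
    return np.exp(-np.abs(h))

# Hypothetical 1-D sites and prediction location.
sites = np.array([0.0, 1.0, 2.0, 3.0])
s0 = 1.5

# Trend basis: constant and linear term, at the sites and at s0.
X = np.column_stack([np.ones_like(sites), sites])
x0 = np.array([1.0, s0])

C = cov(sites[:, None] - sites[None, :])
c = cov(sites - s0)

weights, multipliers = universal_kriging_weights(C, c, X, x0)

# Observations that follow the trend exactly are predicted exactly.
y = 2.0 + 3.0 * sites
y_hat = weights @ y
print(weights, y_hat)
```

Because the constraints force `X.T @ weights` to equal `x0`, any data lying exactly on the trend surface is reproduced at the prediction point, here 2 + 3(1.5) = 6.5.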

Universal Kriging

- May make more sense to estimate the trend explicitly
- Need to estimate the trend to derive residuals for variogram modeling in any case, since it is only safe to estimate the variogram model from y when the mean is assumed constant
- Does not make sense to use it for local neighborhoods

Best to use the generalized least squares approach and remove the trend explicitly