Michael I. Jordan - An Introduction To Probabilistic Graphical Models
[Figure: (a) a model with parameter θ and nodes Z1, Z2, Z3, ..., ZN drawn explicitly; (b) the same model in plate notation, with Zn inside a plate labeled N.]
[Figure: a directed graphical model with parameter nodes W, θ, λ, β and, inside a plate labeled N, nodes Zi, Xi, Ci, Yi, Ui; K components.]
[Figure: a six-node graph on X1, X2, X3, X4, X5, X6.]
[Figure: a factor graph with variable nodes X1, ..., X5 and factor nodes fa, fb, fc, fd.]
[Figure: message passing on a tree. (a) Node j collects messages m_kj(x_j) and m_lj(x_j) from neighbors k and l and sends m_ji(x_i) to node i. (b) A four-node example over X1, ..., X4 with messages m_12(x_2), m_12(x_1), m_32(x_2), m_42(x_2), m_23(x_3), m_24(x_4).]
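The message labels m_ji(x_i) in the figure above belong to the sum-product (message-passing) algorithm on trees. A minimal sketch on a three-node chain, with all potential values invented for illustration, checking the message-based marginal against brute-force enumeration:

```python
import numpy as np

# Sum-product on a chain X1 - X2 - X3 of binary variables.
# psi are node potentials, phi are pairwise potentials; all values here
# are illustrative assumptions, not taken from the text.
psi = [np.array([0.6, 0.4]), np.array([0.5, 0.5]), np.array([0.3, 0.7])]
phi12 = np.array([[1.2, 0.4], [0.4, 1.2]])
phi23 = np.array([[0.9, 0.6], [0.6, 0.9]])

# Messages toward X2: m_{1->2}(x2) = sum_{x1} psi1(x1) phi12(x1, x2)
m12 = phi12.T @ psi[0]
m32 = phi23 @ psi[2]

# The marginal of X2 is proportional to its own potential times the
# product of incoming messages.
p_x2 = psi[1] * m12 * m32
p_x2 /= p_x2.sum()

# Brute-force check: enumerate all joint configurations.
joint = np.zeros((2, 2, 2))
for a in range(2):
    for b in range(2):
        for c in range(2):
            joint[a, b, c] = (psi[0][a] * psi[1][b] * psi[2][c]
                              * phi12[a, b] * phi23[b, c])
p_x2_brute = joint.sum(axis=(0, 2))
p_x2_brute /= p_x2_brute.sum()
print(np.allclose(p_x2, p_x2_brute))  # True
```

On a tree, computing all messages in both directions yields every single-node marginal at once, which is the efficiency the figure is illustrating.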
[Figure: a pedigree model at generation n — genotype (G), phenotype (P), and haplotype (H) nodes for individual i, with paternal (π(i), superscript f) and maternal (μ(i), superscript m) variables.]
[Figure: variables H_i (paternal and maternal), G, and P passing from generation n to generation n+1.]
[Figure: a factor graph on X1, ..., X4 with unary factors f1, ..., f4 and pairwise factors fa, fb, fc.]
[Figure: the latent Dirichlet allocation model — α, θ, Zn, Wn inside a plate labeled N, with β, and an outer plate labeled M.]
[Figure: two three-node graphs on X, Y, Z, panels (a) and (b).]
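Three-node structures over X, Y, and Z like those above are the canonical setting for conditional-independence statements such as X ⊥ Z | Y. A numerical check for the chain X → Y → Z, using random conditional probability tables (the CPTs are arbitrary, chosen only for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Random CPTs for a chain X -> Y -> Z of binary variables.
p_x = rng.dirichlet([1, 1])
p_y_x = rng.dirichlet([1, 1], size=2)   # rows indexed by x
p_z_y = rng.dirichlet([1, 1], size=2)   # rows indexed by y

# Joint p(x, y, z) = p(x) p(y|x) p(z|y).
joint = np.einsum('x,xy,yz->xyz', p_x, p_y_x, p_z_y)

# Check X independent of Z given Y: p(x, z | y) = p(x | y) p(z | y).
for y in range(2):
    p_xz_y = joint[:, y, :] / joint[:, y, :].sum()
    outer = np.outer(p_xz_y.sum(axis=1), p_xz_y.sum(axis=0))
    assert np.allclose(p_xz_y, outer)
```

The assertion holds for any CPTs, because conditioning on the middle node of a chain blocks the only path between the endpoints.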
[Figure: (a) a chain fragment X_{i-1} — X_i — X_{i+1}; (b) pairwise potential tables on x in {-1, 1}: ψ(x_{i-1}, x_i) and ψ(x_i, x_{i+1}) take the value 1.5 when the two arguments agree and 0.2 when they disagree.]
[Figure: a chain X1 — X2 — ... — X50.]
[Figure: a state-space model — hidden states X1, X2, X3, ..., XT with transition parameter q and observations Y1, Y2, Y3, ..., YT.]
[Table: a two-node example X -> Y]
x    p(x)    p(y=1|x)  p(y=2|x)  p(y=3|x)
1     .6        0         .6        .4
2     .2        1          0         0
3     .2        1          0         0
Marginal: p(y) = (.4, .36, .24) for y = 1, 2, 3.
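The p(y) column in the table follows from the marginalization p(y) = Σ_x p(y | x) p(x), which in matrix form is a single vector-matrix product:

```python
import numpy as np

# p(y) = sum_x p(x) p(y|x), reproducing the table's marginal column.
p_x = np.array([0.6, 0.2, 0.2])
p_y_given_x = np.array([[0.0, 0.6, 0.4],   # row for x = 1
                        [1.0, 0.0, 0.0],   # row for x = 2
                        [1.0, 0.0, 0.0]])  # row for x = 3
p_y = p_x @ p_y_given_x
print(p_y.round(2))  # [0.4  0.36 0.24]
```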
[Figure: Beta(a, b) densities on [0, 1] for (a, b) = (.5, .5), (1, 1), (2, 2), and (10, 30).]
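The curves correspond to Beta(a, b) densities for the labeled parameter pairs. A sketch of the density, with a midpoint-rule check (an approximation) that each choice of (a, b) integrates to one:

```python
from math import gamma

def beta_pdf(x, a, b):
    """Density of the Beta(a, b) distribution on (0, 1)."""
    const = gamma(a + b) / (gamma(a) * gamma(b))
    return const * x ** (a - 1) * (1 - x) ** (b - 1)

# Numerically check normalization for the parameter pairs in the figure.
for a, b in [(0.5, 0.5), (1, 1), (2, 2), (10, 30)]:
    n = 100_000
    total = sum(beta_pdf((i + 0.5) / n, a, b) for i in range(n)) / n
    assert abs(total - 1.0) < 1e-2, (a, b, total)
```

Beta(1, 1) is the uniform density; Beta(.5, .5) is U-shaped; larger (a, b) concentrate the mass, so Beta(10, 30) peaks near a/(a+b) = 0.25.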
[Figure: a mixture model with latent Zn and observed Xn inside a plate labeled N, and a plot of a mixture-of-Gaussians density over x in [-3, 3].]
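For a mixture of Gaussians like the density plotted above, the posterior responsibilities p(z = k | x) follow from Bayes' rule; a sketch with invented parameter values (the weights, means, and variances are assumptions for illustration):

```python
from math import exp, pi, sqrt

def normal_pdf(x, mu, var):
    """Density of a 1-D Gaussian with mean mu and variance var."""
    return exp(-(x - mu) ** 2 / (2 * var)) / sqrt(2 * pi * var)

# Illustrative two-component mixture parameters.
weights = [0.3, 0.7]
means = [-1.0, 1.5]
variances = [0.5, 1.0]

def responsibilities(x):
    """p(z = k | x) for each component k, via Bayes' rule."""
    joint = [w * normal_pdf(x, m, v)
             for w, m, v in zip(weights, means, variances)]
    total = sum(joint)
    return [j / total for j in joint]

r = responsibilities(0.0)
assert abs(sum(r) - 1.0) < 1e-12
```

These responsibilities are exactly the quantities computed in the E step of EM for mixture models: points near a component's mean are assigned mostly to that component.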
3 1886 ) /
!
0
2 9
7 ; +1 !
0 7 H2 !
0
2
J
$
*
"#
?
0
2
0
2 !
5
!
"
#
6
8
B0 2 7 0
2 011,2
0
$
2
!
/
9$ 011,2
3
"
#
*
7 ;
%
)
(
!
N
$
3
%/
1 8
%
5
B B
F 8
F 8
5
)
"
#5
"
#
4
"%
#
"%
#
9$ 011,2
/
9$ 01,,2 :
/
9$ 01,,2
9$ 011,2
/
%/
/
8
4
4J
) 4
! /
/
!
/
1
/
)
?
/
%
!
3 /
/
!
0
1+2
/
/
! 4
/
%/
5
/
02
)
"
# /
0
2 )
Æ
)
<
!
4 '/ Æ
"#
!
$
)
D
$
!
) 3 18
)
3 18 02
0 2
/
) "#
3 18 02
/
*
)
"#
! / /
%
/
%
>
Zn
Xn
N
Xn
N
(a) (b)
3 18 6 !
02 !
/
02 )
/
/
5 *
/
!
(
/
9
/
0 2
0 3 18+2
&
C
Y
3 18+6 )
Xn
Yn
N
0 2
&
!
?
%/ /
A
0
25 7 8
0A2
!
A
3 18,
.
0
2
H
y
x
x x
x x
x x x
x
x x x x
x x
x x
x x x
x
x
x
x
3 1816 !
/
%
)
/
:
> 8; 1
)
/
082
0 2
6
7
F
01112
*
!
/
$
P
Q 7
!
/
0 3 1812 !
%
".
#
0; 2 0 :
H " *
#
2
6
8 8
! 0
2 7 0 2 /
0
2
011>2
!
I
.
%
!
6
7 "0 2 F 011C2
"0 2 %/
%
!
4
/
6
05 2 7
2
0
011H2
*
0
2
0
2 7 0
7 8
20
7 8
2 011I2
9 /
0
7 8
2
4
! /
0
7 8
2 "#
! 3 18>02 /
4
*
/
! "
$#
0
:
8;2
! /
3 18>02 * /
3 18>02
G
0
2 ?
* 4 "
#
+;
Xn
y
x
x x
x
x
x
(a) Zn x x x
x
x
x
x
zn = 0 x x zn = 1
x x
x
x x x
x
x x
Yn x
N x x x
x
Xn
y
x
x x
x
x x x
x x
x
x
x x
(b) Zn x
x
x
x
x x x
x x x x
x x x
Yn x
N x x
3 18>6 !
02
!
0
2
) /
%
7 8
0
7 8
2
7 ;
0
7 ;
2 !
0
2
7 ;
7 8 02
!
/
+8
[Figure: a Bayesian regression model — hyperparameters τ², α, μ, β; parameters θ, σ²; and Xn, Yn inside a plate labeled N.]
[Figure: a sequence of LMS iterates θ(0), θ(1), ..., θ(6) converging toward θ* in the plane.]
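The iterate labels θ(0), θ(1), ..., θ* above depict the LMS (least-mean-squares) algorithm, whose stochastic update is θ ← θ + ρ (y_n − θᵀx_n) x_n. A sketch on synthetic data (the data, true parameter, and step size ρ are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic linear-regression data with a small amount of noise.
X = rng.normal(size=(200, 2))
true_theta = np.array([1.0, -2.0])
y = X @ true_theta + 0.01 * rng.normal(size=200)

# LMS: one stochastic-gradient step per data point.
theta = np.zeros(2)
rho = 0.05
for _ in range(50):                       # passes over the data
    for xn, yn in zip(X, y):
        theta += rho * (yn - theta @ xn) * xn

assert np.allclose(theta, true_theta, atol=0.05)
```

With a constant step size the iterates do not converge exactly but hover near θ* with a fluctuation proportional to ρ and the noise level, which is the zig-zag pattern such trajectory figures show.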
[Figure: least squares as orthogonal projection — the fitted values ŷ are the projection of y onto col(X).]
[Figure: a scatter of data in the (x1, x2) plane; and graphical models relating a label Y to features X1, X2, ..., Xm.]
[Figure: the logistic function φ(z) plotted for z in [-5, 5], with range (0, 1).]
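The curve above is the logistic function φ(z) = 1/(1 + e^(-z)). A few of its basic properties, visible in the plot:

```python
from math import exp

def sigmoid(z):
    """Logistic function phi(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + exp(-z))

# Range (0, 1), midpoint phi(0) = 1/2, and the symmetry
# phi(-z) = 1 - phi(z).
assert sigmoid(0) == 0.5
assert abs(sigmoid(2.0) + sigmoid(-2.0) - 1.0) < 1e-12
assert 0 < sigmoid(-5.0) < sigmoid(5.0) < 1
```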
[Figure: a noisy-OR style model — inputs X1, ..., Xm with intermediate nodes Z1, ..., Zm feeding an output Y.]
[Figure: three graphical models relating data X, a statistic T(X), and parameter θ, panels (a)-(c), illustrating sufficiency.]
[Figure: relationships among θ, f, ψ, ξ, μ, η, and x in the exponential family setting.]
[Figure: a mixture-style model — latent Z with children X1, X2, X3; panels (a) and (b).]
q0 q1 q2 qT
A A
π q q
y0 y1 y2 yT
, - . )
#!!
;
)
0
< = ) $
$ & <
#!!
0
" /
,
5
)
>
%
) '
( )
9
'
(
)
)
"
0
0
& #!!
) $
%
< - ;
?
%
0
)
<
+
.
<
- -
)
$
)
" "
·½ ,
.
·½ @ A
·½ -
B
5 %
"
C
¼ .
¼
@ A¼ - 7
5
"
0
·½ ¼
;$ -
- 7
D +
.
< ¼ ·½
- 8
)
&
$
F
$
$
'(
$
>
)
,
G
<
$
G $
6
<
)
;$ - 8 0
H
> "
.
<
·½
- :
¼ ½
) #!!
B
$
"
qt qt+1
A
yt yt +1
, - 7.
#!!
;
0
) "
"
"
)
, - 7 0
6
.
< - ?
< - E
,
" .
< - I
< - J
- -=
$
<
- --
$
J
/
$
;$ - J
.
<
- -
- -7
&
0
)
$
G
/
$
! "
, - 7 +
.
<
- -8
<
- -:
<
- -?
<
<
- -E
- -I
<
- -J
<
- =
<
- -
<
·½
-
)
< -
< $
9
'
( )
.
<
- 7
<
- 8
<
¼ - :
<
<
- ?
- E
<
·½
- I
<
·½
- J
<
·½
·½
- 7=
·½
.
< - 7-
<
- 7
<
- 77
<
- 78
"
"
;$ - 7-
%
Æ ! ;$ - -
--
"
Æ )
$
)
$
"
"
"
"
%
0
) 5
"
$ <
) "
"
, - 7 1
<
$
% "
.
<
- 7:
)
$
!
2
2
<
)
.
- 7?
<
·½
- 7E
<
·½
- 7I
<
·½
- 7J
·½
<
- 8=
·½
·½
< - 8-
·½
·½
2
C - I
Æ
)
+ C - I
"
$
G
- 8
<
- 87
<
<
·½
< - 88
·½
< - 8:
) &
#!! / $
$ "
$
6
*
) *
Æ &
0 ;
)
&
&
9
2
&
"
2
! &
;$ - 8-
&
&
<
·½
- 8?
¼ ½
1 &
+
&
&
;!
#!!
-8
) ;! #!!
Æ
"
" ,
%
;
$
&
0
%
<
- < - K
.
< @ A - 8E
" !
Æ ;
0 .
< ¼
- 8I
·½
·½
< L
L
- :=
,
Æ
Æ Æ )
"
.
M <
- :-
M <
- :
M < - :7
9
2 C
-:
,
0 ;
;! >
Æ
<
0 .
<
- :8
< < -
- ::
- :?
C Æ
.
<
- :E
<
- :I
- :J
¿
"
(
#(
""*
'(
(
+ %,&%
+ %,&,
#
+- '
-?
< -
M < < - ?-
< .
M < - ?
) ;!
!
"
;
! "
#
$ # " %
#
&
'
" (
$ $
% #' )
% ' # % ' %$*$'
% ' #
*
7
# 8 "
#
# ,
% &
6 '
9
#
(
%'
) %$* '
# ) % '% ' %$**'
9
"
+
+ 6+ $*$ &
(
: ) # %$*7'
) #
%$*;'
(
$
%
:' )
4
: %$*.'
) :
%$*<'
# ) :
%$*='
>
? 9
? "
@
$ $ $
) 4 A
#
(
# #
) #) %$*B'
# #
;
!
+
+ "
:
1 "
C 9
C
! +
+
1 "
2
+ 3
+
2
+ 3
D &
9
E
-
(
) %$*$$'
%D "
' !
(
0
)
0
%$*$ '
0 0
!
& "
&
"
6+ $*$ !
" 1
(
!
%6+ $*$7 6+ $*$.'
+ ! "(
%
'
)
4
% '
%$*$<'
%
'
)
% '
%$*$='
H 6+ $*$<
" 2 3 6+ $*$=
+ !"
"
D
6+ $*$0
"
" % ' ) % '% ' A
<
&
6+ $*$ (
$ # #
# #
$ 0 %## ' 0 # #
)
#
# 0 # 0
$
)
% # #
% '' %## ' % # #
% ''
$
% ' # % ' %$*$B'
6+ $*$;
(
$ $
) %$* 0'
% ' #
% %## # '
'
$ $
) %$* $'
% ' ##
% ' #
!
&
6+ $*$B
& 6+ $* $
%$* *'
! % # '
% # '
6+ $* 6+ $* * "
(
(
) %$* 7'
# ) # %$* ;'
(
) 4 # #
% ' %$* .'
# ) # # #
# %$* <'
=
! " " "
"
6
(
(
(
)
: %$**0'
: ) : %$**$'
I
" " )
"
(
$
% # ' ) # % ' #
% ' %$** '
) % ' # %$***'
(
$
J) %$**7'
I %# ' # (
$
%# ' ) # % ' #
% ' %$**;'
# !
+
(
K L ) K L ) KL %$**<'
"
K L ) %$**B'
)
%$*70'
KL ) %$*7$'
+ (
) K L ) K L ) %$*7 '
$0
(
$ M
) %$*7;'
M !
"
(
) M %$*7.'
! +
%
& '(
) %$'
%$*7<'
%
"
! ' / "
) #
$ K% ' #
% 'L %$*7B'
) #
$ K% '% ' #
L %$*;0'
$$
J
) $
#
% J
'% J
' %$*; '
!
(
J ) #
: J %$*;*'
J
J ) #
J
1 %$*;7'
K "
L
% &'
(
)
)
#
*
( %
(
#
(
( +$
,
%
+$
,
-
* &
$ .
- / + ,
$
+ , %
+ ,
'
&
y3
λ1
λ2
μ
y2
y1
%
/
12!
3 ( #
12! $
12! * &&
6 )0
6 ) 7 &
7
( ( 8
(
8
#
0
9
6 ) 6 &'
:
Y
% & 0
5 7
;- & #
$
0
9
6 ) 6 &&
9
6 ) 6 &:
9
&<
0
= 9 >
6 ) 6
6 ) 6
? &@
9 >) 6 ) 6 ? &A
9 ) ) 6 &B
9 )) 6 7
&5
2
: C
!
9 &
= 9 = 6 =
&
<
4
$
E ;- &
0
9
6 ) 9
&'
= 9 =
6 ) 6 7 &&
9 >) ) ?67 &:
9 )) 6 7F &<
#
;- &' 0
2 9 >
6 ) 6
? &@
9 > ) 6 ? &A
9 )
&B
DDD0
0
2 9 2
6 ) 9 ) &
2
8
(
5
0
)
G9
&
) )) 6 7
9 ) )) 6 7 ½
& '
H
;- 0
9 6 ) 7 ½
) ½
) 7 ½
& &
y3
y2
y1
½
67
½
½
) & :
9 6 ) 7 ) & <
;-
!
;- & <
% "
I
(
!
9
;- & &
;- & <
# +
8,
8
/
2
#
$
! $
L - ;- & @
;- & @
( $
$ 0
M 9 & B
B
;O
2
$ $
"
$
I
$
*
+
,/
2
(
"
% ;- &'
(
+ , ;
O
9 7 )
7 ½
)
&'5
!
) 7
" $
½
5
$
+ $, C
-
0
9 7 > )
7 ½
) ?
0
) )
&'
H
$
)
#
$
$
´µ E
0
7
7
´µ 9 &'' ½
#
/
- # 0
) ) 6 ) )
9
&'&
) )
6 )
)
9 &':
-
#
* &
Æ
- ;
0
9 &'<
9 = 6
&'@
0
>? 9
&'B
!
) ´µ 9
) )
6 ) ) 7
½
?
&&
/
- ¾
%
7
$
7 0
7 ´µ 9 7
7 ½
&&&
9
) )
´·½µ ´·½µ
6 )´·½µ )´·½µ
&&<
)
´·½µ
9 &&@
¾
ÖÓÛ
Ì
!
"
# $
$
%
#
&
'
& (' )*++,-
0
7 ´·½µ
9
)´·½µ
&&A
O
7
>*
?
> ?
- 0
>? 9 &&B
* ':
#
1
0
!
0
> ? 9 &::
- * & &
!
"#
$%""& '
%""
!
(
(
#
(
)*
+
,
),
%""
)
$--"&,
%"".
$
&
$
'
&
!
"#
%""
!
%""
%
%""
*
+
#
%""
*
+
/
#
#
0
$
'
2
2
/
--"
#
*
+
!
!
3
4
--"
%""
!
5
6
x0 x1 x2 xT
A A
Σ0
C C C C
y0 y1 y2 yT
7
4
--"
8
0
9
*
+
$
7
4& /
#
--"
%"" (
"
!
9
%""
'
:
#
;
·½ < ; $4&
)
,
'
/
!
'
'
'
·½
'
(
'
4
/
--"
'
!
'
7
¼ '
=
>¼
$
.
8!
???
&
A
A
(
+
·½
>
--"
%""
A
@
!
%""
(
%""
) ,
)# ,
(
A
#
A
/
+
--"
9
%""
)+
,
)
,
/
D
A
¼
$ ¼ &½
- '
'
--"
$
&
'
'
'
$
5&
$ ¼ &
'
! $
& 9
--"
+
#
/
+
A
/
E
A
¼
!
¼
.
E B ¼ C $4F&
B$ E &$ E & ¼ C
$4G&
(
+
¼ ½ (
E ½
! ½
*
+
7
4 (
¼
$ ¼ &.
E
/
¼ ½
/
·½
·½
E
E ·½ ·½
·½ ·½
H
+
I
A
$8A 4&
·½ < ; $4J&
½
¼
! "!
# $
"
%
&
" '
( '
( )) &
F
xt xt+1 xt x t+1
yt yt+1 yt yt+1
(a) (b)
7
4 $& 9
--"
$&
/
#
!
A
'
¼
E ·½ < E $4=&
- #
A
·½ < $ ·½ E ·½ &$ ·½ E ·½ & ¼ $4&
< $ ;
E &$ ;
E & ¼ $4 &
< ;
$45&
E ·½
K
#
·½
·½
·½
·½
·½
·½
)
,
·½
·½
B ·½ ¼ C < B ·½ ; ·½ ¼ C $46&
< E ·½ $44&
$ ·½ E ·½ &$ ·½ E ·½ & ¼
G
< $ ·½ ; ·½ E ·½ &$ ·½ ; ·½ E ·½ & ¼ $4D&
< ·½ ;
$4F&
$ ·½ E ·½ &$ ·½ E ·½ & ¼
< $ ·½ ; ·½ E ·½ &$ ·½ E ·½ & ¼ $4G&
< ·½ $4J&
/
¼
·½
·½
'
!
E ·½ ·½ ·½
$4 =&
E ·½ ·½ ·½ ;
"#
7
4 $&
'
·½ ·½ < ·½ ·½ $ ·½ ;
& ½
·½ $4 &
/
+
A
9
E
@
E ·½ ·½
·½ ·½
*
+
E¼ ½ < =
¼ ½ < ¼
8A 4 4
+
·½ ·½ $ ·½ ;
&½
$4 F&
L
"
!
!
(
8A 5F
8A 5G
·½ < ·½ $ ·½ ;
& ½
$4 J&
½ ;
& ½
½
< $ ·½
$45=&
< $ ·½ ; ·½ $ ·½ ;
&½ ·½ &
½ $45&
< ·½ ½
·½
$45 &
!
!
! ·½ ·½
*
+
A
H
A
8A 4 5
8A 4 4
8A 455
·½ (
E (
A
E
;
·½ .
$ ·½ E &
H"-
+
!
!
(
)
A
, ·½ < ;
)
,
H
7
!
8A 4
$
&
$
-
???& (
8A 4
<
; $456&
/
#
A
(
*
+
A
E ·½ < E ; ·½
½ $ ·½ E& $454&
½
!
/
A
·½
IH-
*
+
A
(
!
! ·½
½
8A 454
H"- $8A DD& H"-
!
*
+
/
=
!
H"-
A
A
.
*
+
9
*
+
H"-
!
·½
½
+
*
+
I
A
0
# $ !
& +
'
+
+
#
/
+
+
A
/
G
--"
# (
!
!
$8A 5F
8A 5G&
I
5
'
$ 8A 54& M < >½
<
>½ N
+
E ½
½
E
½ ½
½ /
A
8A 4 5
4 D
H
N
+
·½ ½
< ·½ $45D&
< $ ; &½ $45F&
< ½ ½ $ ½ ; ½ & ½ ½
$45G&
< ½ ½ $ ; ½ & ½ ½
$45J&
9
!
·½ ·½ < ·½½ ·½
$46=&
< $ ·½ ·½ $ ·½ ;
& ½ ·½ & ½
$46&
½ ;
½
< ·½
$46 &
< ·½ ;
½ $465&
E ·½ ½ E
< ·½ ·½ $466&
½ E
< ·½ $464&
½ E
< ·½ $46D&
< $ ; &½ E $46F&
< ½ $ ½ ; ½ & ½ E
$46G&
< ½ $ ; ½ & ½ E
$46J&
E ·½ ·½ < ·½½ ·½ E ·½ ·½
$44=&
½ $E
< ·½ ½
·½ ·½ ; ·½ ·½
$ ·½ E ·½ &&
$44&
< $ ·½½ ½ ½ E ½
·½
& ·½ ·½ ;
·½
$44 &
½ ;
½
< $ ·½
½ & ·½½ E ·½ ;
½ ·½
$445&
< E ·½ ;
½ ·½ $446&
/
+
A
9
E
@
E ·½ ·½
·½ ·½
E¼ ½ < O¼
¼ ½ < ¼
*
+
+
A
.
0
I
!
.
9
(
A
¼
¼
+
*
+
3
A
¼
¼
+
+
!
/
9
%""
A
xt xt+1 xt x t+1
yt yt+1 yt yt+1
(a) (b)
7
45 $& 9
--"
$&
·½
#
7
$
%""
& (
),
)I
-
$I-&
,
)
,
3
I-
7
4
7
45 /
·½
¼ I
E ·½ < E
B$ E &$ ·½ E ·½ & ¼ C < $44J&
!
E
$4D=&
E ·½ ·½
A
*
+
/
)# ,
(
·½
¼ L
'
5
$8A 5 D&
·½½
·½
·½
B ·½ ¼ C < B
·½ ¼ C $4D4&
< E ; $ ·½ E ·½ & $4DD&
A
A
.
·½
+
A
!
!
I
9
!
???
!
B C < B B C C $4DJ&
PB C < PB B C C ; BPB C C $4F=&
!
!
/
·½
¼
A
@
8A 4DD
#
!
¼
E B ¼ C $4F&
< B B ·½ ¼ C ¼ C $4F &
< BE ; $ ·½ E ·½ & ¼ C $4F5&
< E ; $ ·½ E ·½ & $4F6&
A
8A 4F6
·½
¼
6
8A 4F6
A
I-
/
+
E
·½
+
E ·½
!
A
/
#
A
$8A 4DG& L
8A 4F=
PB ¼ C $4F4&
< PB B ·½ ¼ C ¼ C ; BPB ·½ ¼ C ¼ C$4FD&
< PBE ; $ ·½ E ·½ & ¼ C ; B ·½ ¼ C $4FF&
< PB$ ·½ E ·½ & ¼ C ; ·½ $4FG&
< PB ·½ ¼ C ; ·½
$4FJ&
< ·½ ; ·½
$4G=&
< ; $ ·½ ·½ &
$4G&
!
#
¼
¼
/
I-
@
A
E ·½
·½ ½
+
E < E ; $
·½ E ·½ & $4G &
< ; $ ·½ ·½ & $4G5&
·½½
E
+
(
!
4
xt xt+1
A
C C
yt yt+1
7
46 $& 9
--"
(
!
#
½
(
#
½
$(
G
#
½ &
A
< ½ ·½ ½
$4G6&
#
),
.
·½ (
*
+
*
+
8A 4G6
7
46
$ ·½ &
H
A
> ·½ < > ;
(
! $ ·½ &
> >
$4G4&
> > ;
/
¾
A
$4GG&
> ·½
> ·½
½ &> ·½
K
!
½ $ ½
>·½
+
Q < ½ $ ½
>·½ ½& $4GJ&
Q> ·½
> ·½ Q
!
!
#
< Q ·½ ; Q Q ·½ $4J=&
Q
Q
$8A 4& (
Q < ½
$4J&
½ ·½
Q ·½ <
> ·½
$4J &
H
A
7
+
$
8!
???& Q ·½
),
·½
/
+
(
$
+
8A 45J 465 46J
446&
A
A
< ;
·½ < ; >½ ½ $ ·½ ·½ ;
½ > ½ & ½ ½
·½
$4J4&
< ½
·½ ;
$4JD&
E ·½ < ½ $ ·½ ·½ ; ½ > ·½
½ & ½ E
·½ ·½
$4JF&
E ½
< E ·½ ;
$4JG&
;
1
+
#
+
$ ·½ &
E ·½ < ½·½ E ·½
·½ < ½·½
F
+
$ ¼ &
$ ·½ &
$ ¼ &
A
+
(
.
9
E
E < $ ½ E ; ½·½ E ·½ & $4JJ&
!
½
< ½ ; ½·½ >½ $4==&
>½
A
+
#
.
/
>½
! !
H
½
¾ -
'
½
¾
/
$ ½ &
$ ¾ &
$ ½ ¾ &
H
½
¾
K
#
L
½
¾
7
44$&
A
$ &
$½ &
$¾ & 7
> 7
'
$ 8A
A'
&
½ < ½ ; ½ $4=&
¾ < ¾ ; ¾ $4= &
½
¾
'
½
¾
½
¾ K
½
¾
H
A
< ;
!
+
!
$ &
> >
$4=5&
> > ;
G
x x
z1 z2 z
(a) (b)
7
44 9
$&
½
¾
$&
½
¾
/
'
$8A 5 D
5 F&
N
E
½
E < > > ;
$4=6&
½
½ ; > ½
½
<
$4=4&
!
$8A 5G&
½
< > > > ;
> $4=D&
½
< >½ ;
½ $4=F&
!
$8A 5F&
½
¾
A
N
+
E $ &
½
½
½ ½ ½ ; > ½ ½
½ ½ ½
E½ <
$4=G&
½
¾
¾ ½ ¾ ; > ½ ¾
¾ ½ ¾
E¾ <
$4=J&
½
½ < ½
½ ½ ½ ; > ½
$4=&
½ ½
¾ < ¾
¾ ½ ¾ ; >
$4&
J
K
½
¾ '
½
¾
$ 7
44$&&
8A 4=4
4=F
½
½ =
¾
=
¾
$4 &
7
+
½
½ ½ = ½
½ ½ ½
;> ½
=
E < ½ ¾ ½ ¾
¾ ½
¾ ½
=
¾ =
¾
½
½
½ ½ ½ ; ¾
¾ ½ ¾ ; > ½ $½
½ ½ ½ ; ¾
¾ ½ ¾ &
<
$45&
½
< ½ ½ ; ¾ ½ > ½ $½ ½ E½ ; ¾ ½ E¾ &
$46&
/
!
8A 4=F
½
< ½ ½ ; ¾ ½ > ½
$44&
8A 46
&
½
¾
E
!
½
8A 4=J
½
½
- E ·½
8A 4=J
·½
8A 4
4 -
8A 4D
44
$8A 4JJ
4==&
CHAPTER 4. MARKOV PROPERTIES
so that, once the value of C has been fixed, subsequently learning the value of B tells us nothing further about the distribution of A. It should be emphasized that the property (4.1) must hold for every possible instantiation of C.
If we multiply both sides of (4.1) by P(B | C) we obtain an equivalent statement for the distribution P(A, B | C) in the form
P(A, B | C) = P(A | C) P(B | C)
and so B ⊥⊥ A | C.
A ⊥⊥ (B, D) | C  ⇒  A ⊥⊥ B | C and A ⊥⊥ D | C.    (4.5)
hence
P(A | B, C) = Σ_D P(A, D | B, C)
            = Σ_D P(A | B, C, D) P(D | B, C)
            = P(A | C) Σ_D P(D | B, C)
            = P(A | C)    (4.6)
A ⊥⊥ (B, D) | C  ⇒  A ⊥⊥ B | (C, D) and A ⊥⊥ D | (C, B).    (4.7)
A ⊥⊥ B | (C, D) and A ⊥⊥ D | C  ⇒  A ⊥⊥ (B, D) | C.    (4.8)
The following property does not hold universally. It does hold, how-
ever, for distributions which are strictly positive.
A ⊥⊥ B | (C, D) and A ⊥⊥ C | (B, D)  ⇒  A ⊥⊥ (B, C) | D.    (4.9)
P(A, B, C | D) = P(A | C, D) P(B, C | D) = P(A | D) P(B, C | D)    (4.14)
and hence A ⊥⊥ (B, C) | D.
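The decomposition property (4.5) can be checked numerically. The sketch below is an illustration of mine, not part of the original development: it uses NumPy with randomly generated conditional tables over binary variables to build a joint distribution in which A ⊥⊥ (B, D) | C holds by construction, and then confirms that A ⊥⊥ B | C follows.

```python
import numpy as np

rng = np.random.default_rng(0)

# Construct a joint in which A _||_ (B, D) | C holds by construction:
# P(A,B,C,D) = P(C) P(A|C) P(B,D|C), all variables binary.
pC = rng.dirichlet(np.ones(2))                              # P(C)
pA_C = rng.dirichlet(np.ones(2), size=2)                    # pA_C[c, a] = P(a|c)
pBD_C = rng.dirichlet(np.ones(4), size=2).reshape(2, 2, 2)  # pBD_C[c, b, d]

P = np.einsum('c,ca,cbd->abcd', pC, pA_C, pBD_C)            # axes (A, B, C, D)

# Decomposition (4.5) predicts A _||_ B | C.  Check P(A,B|C) = P(A|C)P(B|C):
P_ABC = P.sum(axis=3)                  # marginalize out D; axes (A, B, C)
P_C = P_ABC.sum(axis=(0, 1))           # P(C)
P_A_C = P_ABC.sum(axis=1) / P_C        # P(A|C), axes (A, C)
P_B_C = P_ABC.sum(axis=0) / P_C        # P(B|C), axes (B, C)
lhs = P_ABC / P_C                      # P(A,B|C)
rhs = P_A_C[:, None, :] * P_B_C[None, :, :]
print(np.allclose(lhs, rhs))  # True
```

The same template can be used to spot-check the weak union and contraction properties.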
While this holds true for arbitrary distributions, we can define a restricted family of distributions by introducing a factorization with respect to a given directed acyclic graph in the form
P(S) = Π_{i∈S} P(S_i | pa(S_i))    (4.16)
where pa(S_i) denotes the set of parents of S_i in the graph. A directed graph is acyclic if (and only if) the nodes can be numbered such that for every node all the parents of that node have a lower number than the node itself. For example, given the graph in Figure 4.2 we have the following factorization
Figure 4.2: A directed graph comprising five nodes.
Comparing (4.15) with (4.17) we see that the latter has variables missing from some of the conditioning sets of the conditional distributions, corresponding to missing links in the directed graph of Figure 4.2. This implies
that any distribution which factorizes according to (4.17) will exhibit some
conditional independence properties. A central goal in this section is to un-
cover these conditional independencies and to relate them quantitatively to
the factorization property.
In discussing Markov properties it may be helpful to regard the directed
acyclic graph as a filter. A graph with M nodes defines a particular factor-
ization of the joint distribution according to (4.16). Any given distribution
over M variables will only pass through the filter if it can be expressed in
terms of the corresponding factorization. Thus the graph defines a family
of distributions, namely the set of all distributions over M variables which
can be expressed in the form (4.16). We denote this factorization property
with respect to a directed graph by DF (for ‘directed factorization’).
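As a concrete illustration of the factorization (4.16), the short sketch below builds the joint distribution for a hypothetical three-node chain X1 → X2 → X3 (the graph and the numbers in the conditional tables are invented for illustration, not taken from the text) and confirms that the product of local conditionals is automatically a normalized distribution.

```python
import itertools

# Hypothetical chain X1 -> X2 -> X3 over binary variables.
parents = {'X1': [], 'X2': ['X1'], 'X3': ['X2']}
cpt = {
    # cpt[node][tuple of parent values] = P(node = 1 | parents)
    'X1': {(): 0.6},
    'X2': {(0,): 0.2, (1,): 0.7},
    'X3': {(0,): 0.5, (1,): 0.1},
}

def joint(assign):
    """P(S) = prod_i P(S_i | pa(S_i)), as in (4.16), for a full assignment."""
    p = 1.0
    for node, pa in parents.items():
        p1 = cpt[node][tuple(assign[q] for q in pa)]
        p *= p1 if assign[node] == 1 else 1.0 - p1
    return p

# The product of local conditionals sums to one over all assignments:
total = sum(joint(dict(zip(parents, values)))
            for values in itertools.product([0, 1], repeat=3))
print(round(total, 10))  # 1.0
```

This is the "filter" picture in miniature: the family defined by the graph is exactly the set of joints expressible through such a product.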
4.2.2 d-separation
In order to uncover the conditional independencies implied by a given
graph we consider some simple 3-node graphs, of the kind already dis-
cussed in Chapter ?. First we define a path from a node a to a node b as a
sequence of nodes starting with a and ending with b such that successive
nodes are connected by a link. Note that a path may involve traversing
directed links in either direction. We will use the notion of a path to intro-
duce the concept of nodes which are head-to-head, head-to-tail or tail-to-tail
with respect to a particular path through the graph. From this we obtain
the framework of d-separation which allows all of the conditional independencies implied by the graph to be obtained from the graph itself.
We begin by considering the directed graph shown in Figure 4.3 which
involves 3 nodes A, B , and C in which we have conditioned on C . As we
Figure 4.3: A directed graph of three nodes for which A ⊥⊥ B | C.
have already seen in Chapter ?, this graph satisfies the independence property A ⊥⊥ B | C. Conversely, in general A and B are not independent if C is not observed, so that A ⊥̸⊥ B | ∅. The node C is said to be head-to-tail with respect to the path A–C–B since the arrow on one of the links points towards node C while the arrow on the other link points away from C. Conditioning on the node C is said to block the path from A to B.
Similarly, consider the graph in Figure 4.4. As we discussed in Chapter ?, the node C is tail-to-tail with respect to the path A–C–B since both arrows point away from C. Again, we have the notion that conditioning on C has blocked the path from A to B, and rendered A and B conditionally independent.

Figure 4.4: Another directed graph for which A ⊥⊥ B | C.
Finally we consider the graph of Figure 4.5. In this case we have quite different behaviour.

Figure 4.5: A directed graph which does not imply A ⊥⊥ B | C.

We say that a path is blocked if it includes a node such that either
(a) the arrows on the path do not meet head-to-head at the node, and the node is in the conditioning set, or
(b) the arrows do meet head-to-head, and neither the node, nor any of its descendents, is in the conditioning set.
Given a set of conditioning nodes C, if every path from any node in a set A to any node in a set B is blocked, then A is said to be d-separated from B by C.
Figure 4.6: Illustration of d-separation. In graph (a) the path from A to B is not blocked by node w since it is not a head-to-head node and is not observed, nor is it blocked by node u since, although the latter is a head-to-head node, it has a descendent C which is in the conditioning set. Thus the conditional independence statement A ⊥⊥ B | C does not follow from this graph. In graph (b) the path from A to B is blocked by node C since this is a tail-to-tail node which is observed, and is also blocked by node u since u is a head-to-head node and neither it nor its descendents are in the conditioning set, and so A ⊥⊥ B | C will be satisfied by any distribution which factorizes according to this graph.
The converse of this theorem also holds, namely that if a joint distribu-
tion satisfies all of the conditional independence properties which can be
read from a graph by d-separation then the distribution must also factorize
according to that graph using (4.16). We give a formal proof of this result
in Section 4.4.
Of course, not all conditional independencies present in the distribution
can necessarily be determined from the graph by d-separation since there
may be additional independencies arising from the specific numerical values associated with the conditional distributions. However, it is possible
to construct an example model which is such that any conditional inde-
pendencies which do not correspond to d-separation are not present in the
distribution, showing that in general d-separation will find all of the con-
ditional independences which can be determined directly from the DAG.
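The d-separation criterion defined above can be checked mechanically. The following sketch is my own illustration (the node names, the dictionary encoding, and the brute-force path enumeration are not from the text); it applies the blocking conditions (a) and (b) to the three canonical graphs of Figures 4.3–4.5.

```python
# Graphs are dicts mapping each node to its list of children.
def descendants(graph, node):
    """All nodes reachable from `node` along directed links."""
    seen, stack = set(), [node]
    while stack:
        for child in graph.get(stack.pop(), []):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen

def all_paths(graph, a, b):
    """All undirected paths from a to b (links traversed either way)."""
    nbrs = {}
    for u, children in graph.items():
        for v in children:
            nbrs.setdefault(u, set()).add(v)
            nbrs.setdefault(v, set()).add(u)
    paths, stack = [], [[a]]
    while stack:
        path = stack.pop()
        if path[-1] == b:
            paths.append(path)
            continue
        for v in nbrs.get(path[-1], ()):
            if v not in path:
                stack.append(path + [v])
    return paths

def d_separated(graph, a, b, conditioning):
    """True if every path from a to b is blocked by the conditioning set."""
    for path in all_paths(graph, a, b):
        blocked = False
        for i in range(1, len(path) - 1):
            prev, node, nxt = path[i - 1], path[i], path[i + 1]
            head_to_head = (node in graph.get(prev, [])
                            and node in graph.get(nxt, []))
            if head_to_head:
                # condition (b): head-to-head, nothing downstream observed
                if (node not in conditioning
                        and not (descendants(graph, node) & conditioning)):
                    blocked = True
            elif node in conditioning:
                blocked = True     # condition (a): non-collider is observed
        if not blocked:
            return False           # found an active (unblocked) path
    return True

# The three canonical graphs of Figures 4.3-4.5:
chain = {'A': ['C'], 'C': ['B']}        # head-to-tail A -> C -> B
fork = {'C': ['A', 'B']}                # tail-to-tail A <- C -> B
collider = {'A': ['C'], 'B': ['C']}     # head-to-head A -> C <- B

print(d_separated(chain, 'A', 'B', {'C'}))     # True
print(d_separated(fork, 'A', 'B', {'C'}))      # True
print(d_separated(collider, 'A', 'B', set()))  # True (blocked by unobserved C)
print(d_separated(collider, 'A', 'B', {'C'}))  # False (conditioning opens path)
```

The exhaustive path enumeration is exponential in general; it is meant only to make the criterion concrete on small graphs.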
[here we return to the Bayes’ Ball algorithm introduced in Chapter ?,
and demonstrate its equivalence to d-separation]
In some probability distributions there may be deterministic relation-
ships between the variables. In this case there is probability one that the
relationship occurs, and combinations of variable values which do not re-
spect the relationship have probability zero. When the conditional distri-
bution of a node conditioned on its parents depends deterministically on
one of its parents, the corresponding edge in the directed acyclic graph is
sometimes denoted by a double arrow '⇒' connecting the nodes.
Since Theorem 4.6 does not require positivity of the joint distribution,
we can use d-separation to read off conditional independence properties
even when deterministic relations are present. However, we can extract additional independencies using D-separation, which is simply an extension
of d-separation in which, if a variable is observed, then all other variables
which are deterministically related to that variable are also considered to
be observed. An example is shown in Figure 4.7
Figure 4.7: An illustration of D-separation showing a directed acyclic graph over four variables in which C depends deterministically on D. The conditional independence statement A ⊥⊥ B | D does not follow from the standard d-separation criterion. In the D-separation criterion, however, we also infer that C is an observed node and hence that A ⊥⊥ B | D.
It is often convenient to consider undirected graphs. Not only do they have simpler semantics than directed graphs, but they will prove useful in formulating and understanding the Markov properties of directed graphs.
We first introduce the concept of graph separation on an undirected
graph as follows. Given three disjoint subsets A, B , and C of nodes on the
graph, we say that C separates A and B if every path on the graph from any
node in A to any node in B passes through at least one node in C . This is
illustrated in Figure 4.8.
Figure 4.8: An illustration of the concept of graph separation in undirected graphs. The set C separates A from B since every path from a node in A to a node in B must pass through a node in C.
4.3.2 Factorization
We saw in Chapter ? how a probability distribution could be constructed
from a product of conditional distributions defined with respect to a di-
rected graph. Here we introduce the corresponding factorization property
for undirected graphs, which we denote by F . First we define a set of nodes
to be complete if there is a link from each node to every other node in the
set. A probability distribution is said to factorize with respect to a given
undirected graph if it can be expressed as the product of functions over the
complete sets of nodes of the graph
P(S) = Π_{a complete} ψ_a(S_a)    (4.18)
Figure 4.9: Illustration of the concept of a clique. In this graph there are two cliques comprising the sets {A, B, C} and {B, C, D}.
Suppose we define the potential function for each clique in the graph to be initially unity. Then we take each of the potential functions ψ_a(S_a) in (4.18) and multiply it into the corresponding clique potential. We then obtain a representation of the form
P(S) = Π_C ψ_C(S_C)    (4.19)
where the product is over all cliques in the graph. Again, we can think
of the graph as a filter in which only joint distributions which can be ex-
pressed in the form (4.19) will be accepted.
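As a numerical sketch of the factorization (4.19), using the clique structure of Figure 4.9 with invented random positive potentials (the tables and variable names are mine), we can confirm that separation of A from D by {B, C} yields conditional independence in any distribution of this form:

```python
import numpy as np

rng = np.random.default_rng(1)

# Random positive clique potentials over binary variables for the cliques
# {A,B,C} and {B,C,D} of Figure 4.9 (the numeric tables are invented):
psi1 = rng.uniform(0.1, 1.0, size=(2, 2, 2))   # psi1[a, b, c]
psi2 = rng.uniform(0.1, 1.0, size=(2, 2, 2))   # psi2[b, c, d]

# P(A,B,C,D) proportional to psi1(A,B,C) * psi2(B,C,D), as in (4.19):
P = psi1[:, :, :, None] * psi2[None, :, :, :]  # axes (A, B, C, D)
P /= P.sum()                                   # normalize

# {B, C} separates A from D, so we expect A _||_ D | (B, C):
for b in range(2):
    for c in range(2):
        slab = P[:, b, c, :] / P[:, b, c, :].sum()    # P(A, D | b, c)
        assert np.allclose(slab, np.outer(slab.sum(axis=1), slab.sum(axis=0)))
print("A _||_ D | {B, C} verified")
```

The independence holds exactly here because, for fixed b and c, the joint is a product of a function of a and a function of d.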
We next show that, for any graph and any distribution, the factorization
property implies the global Markov property.
Theorem 4.7 (F ⇒ G) For any undirected graph, any distribution which satisfies the factorization property with respect to the graph will also respect the global Markov properties of the graph.
Proof: Consider three disjoint subsets of nodes A, B and S such that S separates A from B in the graph. Let V denote the set of all nodes in the graph, and let Ã denote the set of all nodes in A together with all nodes in V \ S which are connected to A. By the separation assumption Ã will not contain any nodes from B. We then let B̃ = (V \ S) \ Ã, so that Ã, B̃ and S form disjoint sets such that V = Ã ∪ B̃ ∪ S, and S separates Ã from B̃. It follows that any clique is composed either of nodes from S ∪ Ã or of nodes from S ∪ B̃, and so by the factorization property the joint distribution can be written as the product of a function of the variables in S ∪ Ã and a function of the variables in S ∪ B̃. We therefore have Ã ⊥⊥ B̃ | S. By noting that A is a subset of Ã, and B is a subset of B̃, and then applying Theorem 4.2 twice we obtain A ⊥⊥ B | S as required.
The converse of Theorem 4.7 does not hold for all distributions. How-
ever, the Hammersley-Clifford theorem states that, for strictly positive prob-
ability distributions, the global Markov property is equivalent to the fac-
torization property. The proof of this theorem is postponed to Section 4.5.3
since it makes use of additional Markov properties discussed in Section 4.5.
In the next section we exploit the formalism of undirected graphs to
gain further insights into the properties of directed graphs.
Figure 4.10: A directed graph of three nodes for which A ⊥⊥ B | C.

We have already seen in Section 4.2.2 that this graph satisfies the independence property A ⊥⊥ B | C. Note that this property could have been read off from an undirected graph obtained simply by dropping the arrows on the links, as shown in Figure 4.11.
Figure 4.11: The undirected graph obtained from Figure 4.3 by dropping the arrows on the links. This graph also satisfies A ⊥⊥ B | C.
Similarly, consider the graph in Figure 4.12. Again, the conditional independence property A ⊥⊥ B | C can be read off by graph separation from the corresponding undirected graph, again given by Figure 4.12.

Figure 4.12: Another directed graph for which A ⊥⊥ B | C.

However, the situation is different for the 'head-to-head' node shown in Figure 4.13.

Figure 4.13: A directed graph for which A ⊥̸⊥ B | C.

The undirected graph in this case is again given by Figure 4.12, but this now does not give the correct conditional independence result since the undirected graph predicts A ⊥⊥ B | C, whereas this result does not follow from the original directed graph. We can avoid the introduction of this spurious conditional independence property by first introducing an extra link between the two parents of the conditioning node C and then dropping the arrows on the links to give the undirected graph of Figure 4.14. The separation properties of this graph now no longer imply A ⊥⊥ B | C.

Figure 4.14: The undirected graph obtained by moralizing the directed graph of Figure 4.5.
Figure 4.15: An example of a directed acyclic graph (a), together with the corresponding moral graph (b).

We continue to use the term moralization throughout this book since it is now in widespread use.
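Moralization itself is mechanical: marry (link) all pairs of parents of each node, then drop the arrows. A minimal sketch of mine (the helper name and the dictionary encoding of the DAG are illustrative choices):

```python
from itertools import combinations

def moralize(dag_parents):
    """Marry all pairs of parents of each node, then drop the arrows.
    `dag_parents` maps each node to a list of its parents."""
    edges = set()
    for child, ps in dag_parents.items():
        for p in ps:
            edges.add(frozenset((p, child)))   # parent-child link, undirected
        for p, q in combinations(ps, 2):
            edges.add(frozenset((p, q)))       # marry co-parents
    return edges

# The head-to-head graph of Figure 4.5: A -> C <- B.
moral = moralize({'C': ['A', 'B'], 'A': [], 'B': []})
print(sorted(tuple(sorted(e)) for e in moral))
# [('A', 'B'), ('A', 'C'), ('B', 'C')]
```

The added A–B link is exactly the marriage that removes the spurious separation of A and B by C.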
We now ask whether the converse is true, that is whether every conditional independence statement implied by the directed graph holds on the moral graph. The answer is clearly that it does not, as illustrated in Figure 4.16. The problem in Figure 4.16 arises from the node W. Since this node is not part of the conditioning set it should really be removed from the graph before moralization to avoid the introduction of a spurious link.

Figure 4.16: Example of a directed graph (a) which exhibits a conditional independence statement A ⊥⊥ B | C, which is not represented in the corresponding moral graph (b).
We can see how to address this problem more generally by noting that the conditional independence statement A ⊥⊥ B | C is a property of the marginal distribution P(A, B, C) in which the remaining variables have been summed out. Any variables which are not part of A ∪ B ∪ C or their ancestors can trivially be removed from the distribution to leave a marginal distribution over the remaining variables simply by removing the corresponding conditional distributions from the product in (4.15).
This motivates the consideration of ancestral sets. A node A is said to be an ancestor of a node B if there is a directed path from A to B. Similarly B is then said to be a descendent of A. We say that a subset of nodes within a directed acyclic graph is an ancestral set if, for every node in the set, all ancestors of that node are also in the set. Using this concept we can now introduce a new definition for the global Markov property of directed graphs, which provides an alternative to d-separation.
The global Markov property for directed graphs, denoted by DG, says that A ⊥⊥ B | C whenever C separates A from B in the moral graph of the smallest ancestral DAG containing A, B and C. This is illustrated in Figure 4.17, using the example of Figure 4.16.
Figure 4.17: The moral graph of the smallest ancestral set from Figure 4.16(a) containing nodes A, B and C. In this case the conditional independence property A ⊥⊥ B | C follows correctly from undirected graph separation.
The directed local Markov property, denoted by DL, states that each node α is conditionally independent of its non-descendents, given its parents: α ⊥⊥ nd(α) \ pa(α) | pa(α), where nd(α) denotes the set of non-descendents of α.
We next show that the directed global Markov property implies the di-
rected local Markov property. If we think of these properties as filters, this
says that any distribution which passes the DG filter will also pass the DL
filter.
Theorem 4.10 (DG ) DL) If a probability distribution respects the directed
global Markov property for a given directed acyclic graph then it also respects the
directed local Markov property.
Proof: We first note that nd() is an ancestral set. Next we observe
that pa() separates from nd() n pa() within this ancestral set. The
directed local Markov property then follows as a special case of the global
directed Markov property.
We now complete the loop by showing that the directed local Markov property implies directed factorization.
Theorem 4.11 (DL ⇒ DF) For any directed graph, and any distribution, if the distribution satisfies the local Markov property with respect to the graph then it will also factorize according to the graph.
Proof: The proof is based on induction on the number N of vertices in the graph. Suppose the result holds for an arbitrary directed acyclic graph with N nodes. Now add an additional node X_{N+1} which is a terminal node of the graph. For an arbitrary joint distribution over the N + 1 variables we have
We next show that the directed local Markov property implies the directed pairwise Markov property for all graphs and for all distributions.
Theorem 4.13 (DL ⇒ DP) If a distribution satisfies the directed local Markov property with respect to some directed graph, then it also satisfies the directed pairwise Markov property with respect to the same graph.
Proof: From the definition (4.22) of the directed local Markov property we have
α ⊥⊥ (nd(α) \ pa(α)) | pa(α)
where we have made use of β ∈ nd(α). From Theorem 4.3 we then have
α ⊥⊥ β | (nd(α) \ {β}) ∪ pa(α).    (4.26)
Note that the converse of this theorem is not in general true. However, if we restrict attention to distributions which satisfy the intersection property (4.9), for example distributions which are strictly positive, then DP ⇒ DL.
Theorem 4.14 (DP ⇒ DL) If a distribution satisfies the directed pairwise Markov property with respect to some directed graph, and respects the intersection property (4.9), then it also satisfies the directed local Markov property with respect to the same graph.
Proof: We prove this result by induction as follows. Let the nodes in nd(α) which are not members of pa(α) be labelled β_1, ..., β_N, and suppose that
α ⊥⊥ (β_1, ..., β_m) | pa(α) ∪ {β_{m+1}, ..., β_N}.
This result clearly holds for m = 1 from the pairwise Markov property. Also from the pairwise Markov property we have
DF ⇔ DL ⇔ DG ⇔ d-separation.    (4.30)
Note that this confirms that the directed global Markov property defined through the moral graph of the smallest ancestral set is equivalent to the d-separation criterion. We have also shown that DL ⇒ DP (Theorem 4.13) and hence DF, DL, DG and d-separation all imply DP.
If we restrict attention to distributions satisfying the intersection property (4.9) then we have the further result DP ⇒ DL (Theorem 4.14) and hence, for such distributions, all of the Markov properties as well as d-separation and the factorization property are equivalent.
α ⊥⊥ β | S \ {α, β}    (4.31)
Figure 4.19: Example of a graph corresponding to case (i) in Theorem 4.15.
Figure 4.20: Example of a graph corresponding to case (ii) in Theorem 4.15.
Figure 4.21: Illustration of the local Markov property (4.32) in which the black nodes represent the boundary of α.
Clearly the local Markov property is a special case of the global Markov property, so G ⇒ L. We now prove that local Markov implies pairwise Markov.
Theorem 4.16 (L ⇒ P) For any undirected graph and any distribution, if the distribution satisfies the local Markov property with respect to a graph then it will also satisfy the pairwise Markov property for that graph.
Proof: Consider two nodes α and β, let bd(α) denote the boundary of α, and let B denote the remaining nodes including β, as illustrated in Figure 4.22. Then the local Markov property tells us that

Figure 4.22: Example of an undirected graph in which the black nodes denote the boundary of a particular node α. The remaining white nodes constitute the set B which includes a specific node β.

α ⊥⊥ B | bd(α).    (4.33)
Since β ∈ B we have B = {β} ∪ (B \ {β}). We can then apply the weak union property (4.7) to obtain
α ⊥⊥ β | bd(α) ∪ (B \ {β})    (4.34)
and since the union of bd(α) and (B \ {β}) represents the complete set S of nodes in the graph, less α and β, we have
α ⊥⊥ β | S \ {α, β}    (4.35)
Proof: We show that (4.37) implies (4.36), with the proof of the converse proceeding analogously. Consider
Σ_{b: b⊆a} φ(b) = Σ_{b: b⊆a} Σ_{c: c⊆b} (−1)^{|b\c|} ψ(c)
              = Σ_{c: c⊆a} { Σ_{b: c⊆b⊆a} (−1)^{|b\c|} } ψ(c)
              = Σ_{c: c⊆a} { Σ_{h: h⊆a\c} (−1)^{|h|} } ψ(c).
The final sum on the right hand side is zero unless a \ c = ∅ (i.e. c = a) since for any finite, non-empty set the number of subsets of even cardinality is the same as the number of subsets of odd cardinality (as is easily verified by induction).
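The subset-sum identity used in this proof is a form of Möbius inversion, and is easy to confirm numerically. The sketch below is my own check (the particular values assigned to ψ are arbitrary; φ and ψ stand for the generic functions of the lemma):

```python
from itertools import chain, combinations

def subsets(a):
    """All subsets of a, as tuples."""
    s = sorted(a)
    return chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))

# Arbitrary values for psi on the subsets of {0, 1, 2}:
psi = {frozenset(b): (i + 1) ** 2 for i, b in enumerate(subsets({0, 1, 2}))}

def phi(a):
    """phi(a) = sum over b <= a of (-1)^{|a \\ b|} psi(b), as in (4.39)."""
    return sum((-1) ** (len(a) - len(b)) * psi[frozenset(b)] for b in subsets(a))

# Inversion: summing phi over all subsets of a recovers psi(a).
ok = all(sum(phi(frozenset(b)) for b in subsets(a)) == psi[frozenset(a)]
         for a in subsets({0, 1, 2}))
print(ok)  # True
```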
where Ŝ_a has components Ŝ_a,γ = S_γ for γ ∈ a and Ŝ_a,γ = S*_γ for γ ∉ a, with S* a fixed reference configuration. Thus H_a(S) depends on S only through Ŝ_a. Now define the following set of functions for all a ⊆ S
φ_a(S) = Σ_{b: b⊆a} (−1)^{|a\b|} H_b(S).    (4.39)
which has the required form of a product over potential functions. The final step is to show that φ_a(S) vanishes unless the subset a is complete. To do this we make use of the assumed pairwise Markov property. Let α, β ∈ a be two nodes with no direct link between them, and let c = a \ {α, β}. If we let H_b denote H_b(S) then
φ_a(S) = Σ_{b: b⊆c} (−1)^{|c\b|} ( H_b − H_{b∪{α}} − H_{b∪{β}} + H_{b∪{α,β}} ).    (4.42)
H_{b∪{α,β}} − H_{b∪{β}} = ln [ P(S_b, S_α, S_β, S*_{d\b}) / P(S_b, S*_α, S_β, S*_{d\b}) ]
= ln [ P(S_α | S_b, S_β, S*_{d\b}) / P(S*_α | S_b, S_β, S*_{d\b}) ]
= ln [ P(S_α | S_b, S*_β, S*_{d\b}) / P(S*_α | S_b, S*_β, S*_{d\b}) ]
= ln [ P(S_b, S_α, S*_β, S*_{d\b}) / P(S_b, S*_α, S*_β, S*_{d\b}) ]
= H_{b∪{α}} − H_b,
where the middle step uses the pairwise Markov property α ⊥⊥ β | S \ {α, β}. Hence the sum of the terms in the brackets on the right hand side of (4.42) vanishes whenever we can find two nodes α and β having no direct connection between them. Thus φ_a(S) vanishes unless a is a complete set.
F ⇒ G ⇒ L ⇒ P.    (4.43)
For distributions which satisfy the intersection property (4.9), for example strictly positive distributions, we have also shown that P ⇒ G (Theorem 4.15) and G ⇒ F (Theorem 4.17), and so for such distributions the three Markov properties, as well as the factorization property, are all equivalent.
In Chapter ? we introduce the idea of triangulation of an undirected
graph. Essentially this involves the addition of extra links such that every
cycle of four or more nodes has a chord. It can be shown that the global
Markov and the factorization properties are equivalent for all distributions
(without restriction) if, and only if, the graph is triangulated.
Figure 4.23: Venn diagram illustrating the set of all distributions P over a given set of variables, together with the set of distributions D which can be represented as a perfect map using a directed graph, and the set U which can be represented as a perfect map using an undirected graph.
Figure 4.24: A directed graph whose conditional independence properties cannot be expressed using an undirected graph over the same three variables.
Figure 4.25: An undirected graph whose conditional independence properties cannot be expressed in terms of a directed graph over the same variables.
consider directed graphs, and discuss whether two different graphs can be
Markov equivalent, that is whether they can imply the same list of conditional independence statements. [to be completed]
Exercises
4.1 (?) Consider three binary variables A, B, C ∈ {0, 1} having the joint distribution given in Table 4.1. Show by direct evaluation that this distribution has the property that A and B are marginally dependent, so that P(A, B) ≠ P(A)P(B), but that they become independent when conditioned on C, so that P(A, B | C) = P(A | C)P(B | C) for both C = 0 and C = 1.
92 CHAPTER 4. MARKOV PROPERTIES
A  B  C  P(A, B, C)
0 0 0 ?
0 0 1 ?
0 1 0 ?
0 1 1 ?
1 0 0 ?
1 0 1 ?
1 1 0 ?
1 1 1 ?
4.2 (?) Using the result of Exercise 4.1, show that the joint distribu-
tion specified in Table 4.1 can be expressed in the form P(A, B, C) =
P(A)P(C | A)P(B | C). Draw the corresponding directed graph.
4.3 (?) Show that there are 2^{M(M−1)/2} distinct undirected graphs over M
variables. Draw the 8 possibilities for the case of M = 3.
4.4 (?) Consider the simple graph shown in Figure 4.26, together with the
probability distribution over three binary variables X, Y and Z such
that P(X = 0) = P(X = 1) = 0.5 and X = Y = Z. Show that for
this graph and distribution the pairwise Markov property is satisfied,
but not the local Markov property. Note that this distribution is
not strictly positive since, for instance, P(X = 0, Y = 1, Z = 1) = 0.
X Y Z
Figure 4.26: A graph which provides a counterexample to the claim P ⇒ L.
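The two facts that drive this counterexample can be checked numerically: in the distribution of Exercise 4.4, conditioning on any one variable renders the other two independent, while marginally the variables are fully dependent. The sketch below verifies both in a graph-agnostic way; the cross-multiplied form of conditional independence avoids dividing by zero probabilities, since the distribution is not strictly positive:

```python
from itertools import product

# Joint distribution with P(X=0) = P(X=1) = 0.5 and X = Y = Z.
p = {(x, y, z): (0.5 if x == y == z else 0.0)
     for x, y, z in product([0, 1], repeat=3)}

def marginal(keep):
    """Marginal over the variable indices in `keep` (0=X, 1=Y, 2=Z)."""
    m = {}
    for xyz, pr in p.items():
        key = tuple(xyz[i] for i in keep)
        m[key] = m.get(key, 0.0) + pr
    return m

# X ⊥ Y | Z holds: P(X,Y,Z) P(Z) = P(X,Z) P(Y,Z) for every configuration,
# because X and Y are deterministic once Z is given.
pz, pxz, pyz = marginal([2]), marginal([0, 2]), marginal([1, 2])
for (x, y, z), pr in p.items():
    assert abs(pr * pz[(z,)] - pxz[(x, z)] * pyz[(y, z)]) < 1e-12

# But X and Y are NOT marginally independent: P(X,Y) != P(X) P(Y).
pxy, px, py = marginal([0, 1]), marginal([0]), marginal([1])
assert abs(pxy[(0, 0)] - px[(0,)] * py[(0,)]) > 0.2   # 0.5 versus 0.25
```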
[Figure: elimination cliques formed over X1, . . . , X6, e.g. {X2, X5, X6}, {X2, X3, X5}, {X2, X4}, {X1, X2, X3}, {X1, X2}, {X1}; panels (a)–(c) show the graph and the corresponding clique chains. Caption unrecoverable in this extraction.]
[Figure: an undirected graph over X1, . . . , X6, panels (a) and (b). Caption unrecoverable in this extraction.]
[Figure: clique potentials ΨV and ΨW on cliques V and W, linked by a separator S carrying the potential ΦS. Caption unrecoverable in this extraction.]
[Figure: (a) a message μji(·) passed from clique Cj to clique Ci; (b) cliques Ci and Cj linked by a separator Sij carrying the messages μij and μji. Caption unrecoverable in this extraction.]
[Figure: (a) a hidden Markov model with states q0, q1, . . . , qT and observations y0, y1, . . . , yT; (b) the corresponding junction tree, with cliques (q0, y0), (q0, q1), (q1, q2), . . . , (qT−1, qT) and (q1, y1), (q2, y2), . . . , (qT, yT), clique potentials ψ(q0, y0), ψ(q0, q1), ψ(q1, q2), . . . , ψ(qT−1, qT), ψ(qt, yt), and separator potentials φ(q0), φ(q1), ζ(q1), ζ(q2), . . . , ζ(qT). Caption unrecoverable in this extraction.]
[Figure: the potentials involved in a single forward update: ψ(qt, qt+1) together with φ*(qt), φ(qt+1), ζ(qt+1), and ψ*(qt+1, yt+1). Caption unrecoverable in this extraction.]
[Figure: the potentials involved in a single backward update: ψ*(qt, qt+1) together with φ*(qt), φ**(qt+1), ζ(qt+1), and ψ*(qt+1, yt+1). Caption unrecoverable in this extraction.]
[Figure: the junction tree for a state-space model, with clique potentials ψ(x0, y0), ψ(x0, x1), ψ(x1, x2), . . . , ψ(xT−1, xT), ψ(xt, yt) and separator potentials φ(x0), φ(x1), ζ(x1), ζ(x2), . . . , ζ(xT). Caption unrecoverable in this extraction.]
[Figure: forward update potentials ψ(xt, xt+1), φ*(xt), φ(xt+1), ζ(xt+1), ψ*(xt+1, yt+1). Caption unrecoverable in this extraction.]
[Figure: backward update potentials ψ*(xt, xt+1), φ*(xt), φ**(xt+1), ζ(xt), ψ*(xt, yt). Caption unrecoverable in this extraction.]
"
x2
0 1
0 e θ1 1
x1 f 1 ( x 1, x 2 ) = δ ( x 1 = 0 , x 2 = 0 )
1 1 1
x2
0 1
0 1 eθ2
x1 f2 (x 1, x 2) = δ ( x 1 = 0 , x 2 = 1)
1 1 1
x2
0 1
0 1 1
x1 f 3 ( x 1, x 2 ) = δ ( x 1 = 1 , x 2 = 0 )
1 e θ3 1
x2
0 1
0 1 1
x1 f4 (x 1, x 2) = δ ( x 1 = 1 , x 2 = 1)
1 1 e θ4
x2
0 1
0 eθ1 eθ2
x1 ϕ12 (x 1, x 2) = e θ 1 f1 + θ 2 f 2 + θ 3 f 3 + θ 4 f4
1 e θ3 e θ4
4 7 5 , 01 , -
[Figure: the distribution p, its projection pM onto the exponential family manifold E, and a distribution q. Caption unrecoverable in this extraction.]
[Figure: (a) an undirected graph over X1, . . . , X5; (b) a clique tree with cliques such as {X1, X2, X3} and {X3, X4, X5}. Caption unrecoverable in this extraction.]
[Figure: iterative proportional fitting viewed as a sequence of projections p(0), p(1), . . . , p(5) between the constraint manifolds MC1 and MC2, converging to pM. Caption unrecoverable in this extraction.]
Chapter 1
Sampling Methods
There are many probabilistic models of practical interest for which exact inference is intractable.
One important class of inference algorithms is based on deterministic approximation schemes and
includes variational methods. Here we consider an alternative, very general and
widely used framework for approximate inference based on numerical sampling, also known as
the Monte Carlo technique.
Although for some applications of graphical models the posterior distribution over unobserved
variables will be of direct interest in itself, for most situations the posterior distribution is required
primarily for the purpose of evaluating expectations, for example in order to make predictions.
The fundamental problem which we therefore wish to address in this chapter involves finding
the expectation of some function with respect to a probability distribution . Here, the
components of might comprise discrete or continuous variables, or some combination of the
two. Thus in the case of continuous variables we wish to evaluate the expectation
This is illustrated schematically for a 1-dimensional distribution in Figure ??. We shall suppose
[Figure: a function f(x) and the distribution p(x) with respect to which its expectation is taken.]
that such expectations are too complex to be evaluated exactly using analytical techniques. In some
cases an expectation can be evaluated by first finding an analytic approximation to the posterior
distribution. One approach to finding an approximate posterior distribution is to use variational
methods, as discussed at length in Chapter ??. The Laplace approximation framework, described
in Chapter ??, has also been applied to this problem.
The general idea behind sampling methods is to obtain a set of samples (where
) drawn from the distribution . This allows the expectation (??) to be approximated by
a finite sum
(1.2)
The accuracy of such an approximation will depend on a variety of factors, some of which will be
discussed in greater detail later in this chapter. Here we note that as long as the samples are
drawn from the distribution then and so the estimator has the correct mean. The
variance of the estimator is easily seen to be
, where
(1.3)
is the variance of the function under the distribution . It is worth emphasizing that the ac-
curacy of the estimator therefore does not depend on the dimensionality of , and that, potentially,
high accuracy may be achievable with a relatively small number of samples . Furthermore, the
variance of the estimator will decrease with increasing number of samples.
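As a concrete illustration of the estimator just described, the following sketch (assuming a target we can sample from directly, here N(0, 1)) approximates an expectation by a finite sum over L samples:

```python
import random

def mc_expectation(f, sampler, L):
    """Estimate E[f(x)] by the finite sum (1/L) * sum_l f(x^(l)),
    where the x^(l) are drawn independently from the target distribution."""
    return sum(f(sampler()) for _ in range(L)) / L

random.seed(0)
draw = lambda: random.gauss(0.0, 1.0)      # samples from N(0, 1)

# E[x^2] = 1 exactly under N(0, 1); the estimate improves as L grows.
large = mc_expectation(lambda x: x * x, draw, 100_000)
assert abs(large - 1.0) < 0.05
```

Note that the accuracy depends on the variance of f under p and on L, not on the dimensionality of x, in line with the discussion above.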
One potential difficulty, however, is that the samples might not be independent, and
so the effective sample size might be much smaller than the apparent sample size. Also, referring
back to Figure ??, we note that if is small in regions where is large, and vice versa, then
the expectation may be dominated by regions of small probability, implying that relatively large
sample sizes will be required to achieve sufficient accuracy.
While sampling methods have wide applicability, we shall of course be primarily interested in
the case in which the distribution is specified in terms of a graphical model. In the case of a
directed graph with no observed variables it is straightforward to sample from the joint distribution
(assuming that it is possible to sample from the conditional distributions at each node) using the
following ancestral sampling approach. The joint distribution is specified by
(1.4)
where denotes the set of variables associated with the parents of . To obtain a sample from
the joint distribution we make one pass through the set of variables in the order sampling
from the conditional distributions . This is always possible since at each step all of the
parent values will have been instantiated. After one pass through the graph we will have obtained
a sample from the joint distribution.
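A minimal sketch of ancestral sampling on a three-node chain A → B → C; the conditional probability tables are made-up numbers purely for illustration:

```python
import random

random.seed(1)

# Toy directed model A -> B -> C over binary variables (invented CPTs).
p_A = {1: 0.6}                                   # P(A=1)
p_B_given_A = {0: 0.2, 1: 0.9}                   # P(B=1 | A=a)
p_C_given_B = {0: 0.3, 1: 0.7}                   # P(C=1 | B=b)

def bernoulli(q):
    return 1 if random.random() < q else 0

def ancestral_sample():
    """Sample (A, B, C) by visiting nodes in topological order, so every
    parent is instantiated before its child is sampled."""
    a = bernoulli(p_A[1])
    b = bernoulli(p_B_given_A[a])
    c = bernoulli(p_C_given_B[b])
    return a, b, c

samples = [ancestral_sample() for _ in range(50_000)]
# Sanity check: the empirical marginal of A should be close to 0.6.
freq_a = sum(s[0] for s in samples) / len(samples)
assert abs(freq_a - 0.6) < 0.02
```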
Now consider the case of a directed graph in which some of the nodes are instantiated with
observed values. We can in principle extend the above procedure, at least in the case of nodes
representing discrete variables, to give the following logic sampling approach, which can be seen
as a special case of ‘importance sampling’ discussed in Section ??. At each step, when a sampled
value is obtained for a variable whose value is observed, the sampled value is compared to
the observed value; if they agree, the sampled value is retained and the algorithm
proceeds to the next variable in turn. However, if the sampled value and the observed value disagree,
then the whole sample so far is discarded and the algorithm starts again with the first node in
the graph. This algorithm samples correctly from the posterior distribution since it corresponds
simply to drawing samples from the joint distribution of hidden variables and data variables and
then discarding those samples which disagree with the observed data (with the slight saving of
not continuing with the sampling from the joint distribution as soon as one contradictory value
is observed). However, the overall probability of accepting a sample from the posterior decreases
rapidly as the number of observed variables, and the number of states which those variables can
take, increases.
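The accept/restart logic can be sketched as follows for a two-node network with made-up probabilities; accepted samples are exact draws from the posterior, but the acceptance rate falls as evidence accumulates:

```python
import random

random.seed(2)

def bernoulli(q):
    return 1 if random.random() < q else 0

def joint_sample():
    """Ancestral sample from a toy chain A -> B (invented CPTs)."""
    a = bernoulli(0.5)                 # P(A=1) = 0.5
    b = bernoulli(0.8 if a else 0.1)   # P(B=1 | A)
    return a, b

def logic_sample(evidence_b, max_tries=100_000):
    """Logic sampling: resample from scratch until the sampled value of
    the observed node matches the evidence; each accepted sample is a
    draw from the posterior P(A | B = evidence_b)."""
    for _ in range(max_tries):
        a, b = joint_sample()
        if b == evidence_b:
            return a
    raise RuntimeError("evidence too improbable; no sample accepted")

draws = [logic_sample(1) for _ in range(20_000)]
# Exact posterior: P(A=1 | B=1) = 0.5*0.8 / (0.5*0.8 + 0.5*0.1) = 8/9.
assert abs(sum(draws) / len(draws) - 8 / 9) < 0.02
```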
In the case of probability distributions defined by an undirected graph there is no one-pass
sampling strategy which will sample even from the prior distribution with no observed variables.
Instead, computationally more expensive techniques must be employed, such as Gibbs sampling
which is discussed in Section ??.
As well as sampling from conditional distributions we may also require samples from a marginal
distribution. If we already have a strategy for sampling from a joint distribution then it is
straightforward to obtain samples from the marginal distribution simply by ignoring the val-
ues for in each sample.
We can use sampling methods to approximate this integral by a finite sum over samples
drawn from the current estimate for the posterior distribution , so that
(1.6)
The
function is then optimized in the usual way in the M-step. This procedure is called the Monte
Carlo EM algorithm.
It is straightforward to extend this to the problem of finding the mode of the posterior distribu-
tion over (the MAP estimate) when a prior distribution has been defined, simply by adding
to the function
before performing the M-step.
A particular instance of the Monte Carlo EM algorithm, called stochastic EM, arises if we con-
sider a finite mixture model, and draw just one sample at each E-step. Suppose the -dimensional
binary latent variable characterizes which of the components of the mixture is responsible for
generating each data point. In particular there is one vector-valued element of for each data
point in the data set, and each such vector has all of its elements equal to zero except for the par-
ticular element representing the corresponding component of the mixture. In the E-step a sample
of is taken from the posterior distribution where is the data set. This effectively
makes a hard assignment of each data point to one of the components in the mixture. In the M-step,
this sampled approximation to the posterior distribution is used to update the model parameters
in the usual way.
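A sketch of stochastic EM for a two-component, unit-variance Gaussian mixture (the synthetic data and initialization are invented for illustration); the E-step draws a single hard assignment per data point from the posterior responsibilities, and the M-step updates the parameters from those assignments:

```python
import math
import random

random.seed(3)

# Synthetic data from two well-separated unit-variance Gaussians.
data = [random.gauss(-2.0, 1.0) for _ in range(300)] + \
       [random.gauss(3.0, 1.0) for _ in range(300)]

def normal_pdf(x, mu):
    return math.exp(-0.5 * (x - mu) ** 2) / math.sqrt(2 * math.pi)

mu = [-1.0, 1.0]          # initial component means
pi = [0.5, 0.5]           # initial mixing proportions

for _ in range(30):
    # Stochastic E-step: draw ONE hard assignment z_n per point from the
    # posterior responsibilities, instead of keeping soft weights.
    z = []
    for x in data:
        r1 = pi[0] * normal_pdf(x, mu[0])
        r2 = pi[1] * normal_pdf(x, mu[1])
        z.append(0 if random.random() < r1 / (r1 + r2) else 1)
    # M-step: update parameters from the sampled hard assignments.
    for k in (0, 1):
        members = [x for x, zk in zip(data, z) if zk == k]
        if members:
            mu[k] = sum(members) / len(members)
            pi[k] = len(members) / len(data)

# The means should settle near the true component centres -2 and 3.
assert abs(min(mu) + 2.0) < 0.4 and abs(max(mu) - 3.0) < 0.4
```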
Now suppose we move from a maximum likelihood approach to a full Bayesian treatment in
which we wish to sample from the posterior distribution over . In principle we would like to draw
samples from the joint posterior , but we shall suppose that this is computationally
difficult. Suppose further that it is relatively straightforward to sample from the complete-data
parameter posterior . This inspires the data augmentation algorithm, which alternates
between two steps known as the I-step (imputation step, analogous to an E-step) and the P-step
(posterior step, analogous to an M-step).
I-step. We wish to sample from but we cannot do this directly. We therefore note the
relation
(1.7)
and hence for we first draw a sample from the current estimate for ,
and then use this to draw a sample from .
P-step. We use the samples obtained from the I-step to compute a revised estimate of the
posterior distribution over given by
(1.9)
Note that we are making a (somewhat artificial) distinction between parameters and hidden
variables . From now on we blur this distinction and focus simply on the problem of drawing
samples from a given joint posterior distribution.
to the initial value , so that changes in subsequent values due to a small change in would
grow exponentially as the sequence progresses. However, it turns out that a chaotic map need not
give rise to a practically acceptable random number generator, even when the limiting distribu-
tion is correct. For example, the finite precision used to represent continuous variables in a digital
computer can result in some functions leading to sequences which converge to a fixed value.
Instead we seek choices for
which take account of the finite representations of digital com-
puters. It is often convenient to consider integer representations for random numbers, so that the
output of the random number generator is an integer from the set , where is the
largest integer which can be stored by the computer. Corresponding continuous numbers from
can subsequently be obtained using
which is interpreted as a floating point opera-
tion. First we define the period of a random number generator to be the smallest integer such that
for all , in other words such that
is the identity operator. Clearly the largest
value which the period of a simple generator of the form can take is ,
but carelessness over the design of the random number generator can easily lead to significantly
smaller values. Longer periods can be obtained using extensions of the transformation approach,
such as using functions of and .
Many common random number generators are based on the linear congruential method which
defines
(1.10)
Considerable care must be exercised in the choice of the constants and in order to ensure a
generator with acceptable properties. One obviously desirable property is that the sequence have
its maximum period of .
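A sketch of a linear congruential generator, using the widely quoted constants a = 1664525, c = 1013904223, m = 2³², which satisfy the standard full-period conditions:

```python
def lcg(seed, a=1664525, c=1013904223, m=2**32):
    """Linear congruential generator: x_{n+1} = (a * x_n + c) mod m.
    With a full-period choice of (a, c, m), as here, the period is m."""
    x = seed
    while True:
        x = (a * x + c) % m
        yield x / m          # map the integer to a float in [0, 1)

gen = lcg(seed=42)
u = [next(gen) for _ in range(10_000)]
assert all(0.0 <= v < 1.0 for v in u)
assert abs(sum(u) / len(u) - 0.5) < 0.02      # roughly uniform mean
```

Careless choices of a and c can shrink the period dramatically, which is the pitfall the text warns about.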
We next consider how to generate random numbers from non-uniform distributions, assuming
that we already have available a source of uniformly distributed random numbers. Suppose that
is uniformly distributed over the interval , and suppose that we transform the values of
using some function
so that . The distribution of will be governed by
(1.11)
where, in this case, . Our goal is to choose the function such that the resulting
values
of have some specific desired distribution . Integrating (??) we obtain
which is the indefinite integral of . Thus, , and so we have to transform
the uniformly distributed random numbers using a function which is the inverse of the indefinite
integral of the desired distribution. This is illustrated in Figure ??.
Consider for example the exponential distribution
(1.12)
(1.13)
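For the exponential distribution p(y) = λ exp(−λy), the indefinite integral is F(y) = 1 − exp(−λy), so the required transformation is its inverse, y = −ln(1 − x)/λ. A sketch:

```python
import math
import random

random.seed(4)

def sample_exponential(lam):
    """Transformation method: pass a uniform variate through the inverse
    of the exponential's indefinite integral (its CDF).
    F(y) = 1 - exp(-lam * y)  =>  y = -ln(1 - x) / lam."""
    x = random.random()                # uniform on [0, 1)
    return -math.log(1.0 - x) / lam

lam = 2.0
ys = [sample_exponential(lam) for _ in range(100_000)]
# The exponential distribution has mean 1/lam.
assert abs(sum(ys) / len(ys) - 1.0 / lam) < 0.01
```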
Figure 1.2: Geometrical interpretation of the transformation method for generating non-uniformly
distributed random numbers. is the indefinite integral of the desired distribution . If a
uniformly distributed random variable is transformed using then will be distributed
according to .
In this case the inverse of the indefinite integral is the ‘tan’ function.
The generalization to multiple variables is straightforward and involves the Jacobian of the
change of variables, so that
(1.14)
As a final example of the transformation method we consider the Box-Muller method for gen-
erating samples from a Gaussian distribution. Suppose we generate pairs of uniformly distributed
random numbers (which we can do by transforming a variable distributed uni-
formly over using ). Next we discard each pair unless it satisfies
. This
leads to a uniform distribution of points inside the unit circle with
, as illustrated in
Figure ??. Then, for each pair we evaluate the quantities
Figure 1.3: The Box-Muller method for generating normally distributed random numbers involves
generating samples from a uniform distribution inside the unit circle.
1.1. BASIC SAMPLING ALGORITHMS 7
(1.15)
(1.16)
(1.18)
and so and are independent and each has a normal distribution with zero mean and unit
variance.
Clearly the transformation can be used to generate normally distributed random
numbers with mean and variance .
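A sketch of the polar form of the Box-Muller method described above: pairs are drawn uniformly in the square, rejected unless they fall inside the unit circle, and the surviving pair is scaled to yield two independent N(0, 1) variates:

```python
import math
import random

random.seed(5)

def box_muller_pair():
    """Polar Box-Muller: draw (x1, x2) uniformly inside the unit circle
    (by rejection from the enclosing square), then scale the pair into
    two independent standard normal variates."""
    while True:
        x1 = random.uniform(-1.0, 1.0)
        x2 = random.uniform(-1.0, 1.0)
        r2 = x1 * x1 + x2 * x2
        if 0.0 < r2 < 1.0:
            scale = math.sqrt(-2.0 * math.log(r2) / r2)
            return x1 * scale, x2 * scale

zs = [z for _ in range(50_000) for z in box_muller_pair()]
mean = sum(zs) / len(zs)
var = sum(z * z for z in zs) / len(zs) - mean ** 2
assert abs(mean) < 0.02 and abs(var - 1.0) < 0.05
```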
To generate vector-valued variables having a multi-variate normal distribution with mean
and covariance we can employ the eigenvector/eigenvalue decomposition of the covariance
matrix
(1.19)
It is then easily verified that, if are univariate and normally distributed, then the variable
(1.20)
has the required multi-variate distribution. This is illustrated geometrically in Figure ??. In practice
it is computationally more efficient, and also more robust, to use a Cholesky decomposition of the
form
. Then, if is a vector valued random variable whose components are independent
and normally distributed with zero mean and unit variance, then will have mean and
covariance .
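A stdlib-only sketch of the Cholesky route just described, written out explicitly for a 2 × 2 covariance (the parameter values are invented for illustration):

```python
import math
import random

random.seed(6)

# Target: N(mu, Sigma) in two dimensions (made-up parameters).
mu = (1.0, -1.0)
Sigma = ((2.0, 0.6),
         (0.6, 1.0))

# Cholesky factor L with Sigma = L L^T, written out for the 2x2 case.
l11 = math.sqrt(Sigma[0][0])
l21 = Sigma[1][0] / l11
l22 = math.sqrt(Sigma[1][1] - l21 * l21)

def sample_mvn():
    """If z has independent N(0,1) components, y = mu + L z has mean mu
    and covariance L L^T = Sigma."""
    z1, z2 = random.gauss(0, 1), random.gauss(0, 1)
    return (mu[0] + l11 * z1,
            mu[1] + l21 * z1 + l22 * z2)

ys = [sample_mvn() for _ in range(100_000)]
m0 = sum(y[0] for y in ys) / len(ys)
m1 = sum(y[1] for y in ys) / len(ys)
cov01 = sum((y[0] - m0) * (y[1] - m1) for y in ys) / len(ys)
assert abs(m0 - 1.0) < 0.02 and abs(m1 + 1.0) < 0.02
assert abs(cov01 - 0.6) < 0.05
```

For general dimensions one would of course call a library routine (e.g. a Cholesky factorization) rather than write the factor out by hand.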
Obviously the transformation technique depends for its success on the ability to calculate and
then invert the indefinite integral of the required distribution. Such operations will only be feasible
for a limited number of very simple distributions, and so we must turn to alternative approaches
in search of a more general strategy. Here we consider two techniques called rejection sampling
and importance sampling. Although mainly limited to univariate distributions and thus not directly
applicable to complex problems in many dimensions, they do form important components in more
general strategies.
Figure 1.4: Geometrical view of a procedure for generating samples from a multi-variate normal
distribution, given a source of univariate normally-distributed random numbers. The ellipse repre-
sents the one standard deviation contour of a Gaussian distribution with mean whose covariance
matrix has eigenvectors with corresponding eigenvalues . A sample from the multivariate
Gaussian distribution can be obtained using a linear combination of the eigenvectors with coeffi-
cients given by suitably scaled univariate Gaussian distributed random variables.
distributions considered so far, and that sampling directly from is difficult. Furthermore sup-
pose, as is often the case, that we are easily able to evaluate for any given value of , up to
some normalizing constant , so that
(1.21)
In order to apply rejection sampling we need some simpler distribution , sometimes called
a proposal distribution, from which we can readily draw samples. We next introduce a constant
whose value is chosen such that for all values of . The function is called the
comparison function, and is illustrated in Figure ??. Each step of the rejection sampler involves
generating two random numbers. First, we generate a number from the distribution . Next,
we generate a number from the uniform distribution over . This pair of random num-
bers has uniform distribution under the curve of the function . Finally, if then the
sample is rejected, otherwise is retained. Thus the pair is rejected if it lies in the grey shaded
region in Figure ??. The remaining pairs then have uniform distribution under the curve of ,
and hence the corresponding values are distributed according to , as desired.
We can see more formally that the rejection sampling procedure samples from the correct distri-
bution as follows. The original values of are generated with distribution and these samples
are then accepted with probability
and so the resulting distribution of is given by
Figure 1.5: In the rejection sampling method, samples are drawn from a simple distribution
and rejected if they fall in the grey area shown. The resulting samples are distributed according to
, which is the normalized version of .
normalization as
(1.22)
(1.23)
Thus the fraction of points which are rejected by this method depends on the ratio of the area
under the unnormalized distribution to the area under the curve . We therefore see that
the constant should be as small as possible subject to the limitation that must be nowhere
less than .
As an illustration of the use of rejection sampling, consider the task of sampling from the
Gamma distribution
(1.24)
where . This has, for , a bell-shaped form, as shown in Figure ??. A suitable proposal
distribution is therefore the Cauchy (??) since this too is bell-shaped and since we can use the
transformation method, discussed earlier, to sample from it. We need to generalize the Cauchy
slightly to ensure that it nowhere has a smaller value than the Gamma distribution. This can be
achieved by transforming a uniform random variable ! using ! ", which gives random
numbers distributed according to
(1.25)
"
Figure 1.6: Plot showing the Gamma distribution given by (??) as the solid curve, with a scaled
Cauchy proposal distribution given by the dashed curve. Samples from the Gamma distribution
can be obtained by sampling from the Cauchy and then applying the rejection sampling criterion.
The minimum reject rate is obtained by setting " , and choosing the constant
to be as small as possible while still satisfying the requirement . The resulting
comparison function is shown, together with the Gamma distribution, in Figure ??.
Figure 1.7: In the case of distributions which are log concave, an envelope function for use in
rejection sampling can be constructed using the tangent lines computed at a set of grid points.
If a sample point is rejected it is added to the set of grid points and used to refine the envelope
distribution.
The function and its gradient are evaluated at some initial set of grid points, and the
intersections of resulting tangent lines are used to construct the envelope function. Next a sample
value is drawn from the envelope distribution. This is straightforward (Exercise ??) since the log
of the envelope distribution is a succession of linear functions, and hence the envelope distribution
itself comprises a piecewise exponential distribution of the form
(1.26)
Once a sample has been drawn the usual rejection criterion can be applied. If the sample is accepted
then it will be a draw from the desired distribution. If, however, the sample is rejected, then it is
incorporated into the set of grid points, a new tangent line is computed and the envelope function
is thereby refined. As the number of grid points increases, so the envelope function becomes a
better approximation of the desired distribution and the probability of rejection decreases.
A variant of the algorithm exists which avoids the evaluation of derivatives. The adaptive
rejection sampling framework can also be extended to distributions which are not log concave,
simply by following each rejection sampling step with a Metropolis-Hastings step, giving rise to
adaptive rejection Metropolis sampling.
Clearly for rejection sampling to be of practical value we require that the comparison function
be close to the required distribution so that the rate of rejection is kept to a minimum. Now let
us examine what happens when we try to use rejection sampling in spaces of high dimensionality.
Consider, for the sake of illustration, a somewhat artificial problem in which we wish to
sample from a zero-mean multivariate normal distribution with covariance \sigma_p^2 I, where I is the
unit matrix, by rejection sampling from a proposal distribution which is itself a zero-mean normal
distribution having covariance \sigma_q^2 I. Obviously we must have \sigma_q^2 \geq \sigma_p^2 in order that there exists
a k such that k q(x) \geq p(x). In D dimensions the optimum value of k is given by k = (\sigma_q / \sigma_p)^D,
as illustrated for D = 1 in Figure ??. The acceptance rate will be the ratio of volumes under p(x)
and k q(x), which, because both distributions are normalized, is just 1/k, and hence diminishes
exponentially with dimensionality.
Figure 1.8: Illustrative example of rejection sampling involving sampling from a normal distribution
p(x), shown by the solid curve, by using rejection sampling from a proposal distribution q(x),
shown by the dashed curve, which is also normal.
This example emphasizes the difficulty of finding a good proposal distribution and comparison
function. Furthermore, the exponential decrease of the acceptance rate with dimensionality is a
generic feature of rejection sampling. Although rejection sampling can be a useful technique in one
or two dimensions, it is unsuited to problems of high dimensionality. It can, however, play a role
as a subroutine in more sophisticated algorithms for sampling in high-dimensional spaces.
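The exponential fall-off of the acceptance rate can be made concrete with a one-line calculation. The 1% mismatch between the proposal and target scales used below is our own illustrative choice, not a figure from the text.

```python
def rejection_acceptance_rate(sigma_p, sigma_q, D):
    """For zero-mean Gaussians with covariances sigma_p^2 * I and
    sigma_q^2 * I, the optimal constant is k = (sigma_q / sigma_p)^D;
    since both densities are normalized, the acceptance rate is 1/k."""
    return (sigma_p / sigma_q) ** D

# even a proposal only 1% too broad is hopeless in 1000 dimensions
rate = rejection_acceptance_rate(1.0, 1.01, 1000)
```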
1.1.4 Importance Sampling

One of the principal reasons for wishing to sample from complicated distributions is the evaluation
of expectations. A simple strategy would be to discretize x-space into a uniform grid and to
approximate the expectation by a sum of the form

E[f] \simeq \sum_{l=1}^{L} p(x^{(l)}) f(x^{(l)}).    (1.27)
An obvious problem with this approach is that the number of terms in the summation grows
exponentially with the dimensionality of x. Furthermore, as we have already noted, the kinds
of probability distributions of interest will often have much of their mass confined to relatively
small regions of x-space, and so uniform sampling will be very inefficient since in high-dimensional
problems only a very small proportion of the samples will make a significant contribution to the
sum. We would really like to choose the sample points to fall in regions where p(x) is large (or,
more precisely, where the product p(x) f(x) is large).
As in the case of rejection sampling, importance sampling is based on the use of a proposal
distribution q(x) from which it is easy to draw samples. This is illustrated in Figure ??. We can then express the
Figure 1.9: Importance sampling addresses the problem of evaluating the expectation of a function
f(x) with respect to a distribution p(x) from which it is difficult to draw samples directly. Instead,
samples are drawn from a simpler distribution q(x) and the corresponding terms in the
summation are weighted by the ratios p(x^{(l)}) / q(x^{(l)}).
expectation in the form of a finite sum over samples \{x^{(l)}\} drawn from q(x):

E[f] = \int f(x) p(x) \, dx = \int f(x) \frac{p(x)}{q(x)} q(x) \, dx \simeq \frac{1}{L} \sum_{l=1}^{L} \frac{p(x^{(l)})}{q(x^{(l)})} f(x^{(l)}).    (1.28)

The quantities r_l = p(x^{(l)}) / q(x^{(l)}) are known as importance weights, and they correct the bias
introduced by sampling from the wrong distribution. Note that, unlike rejection sampling, all of
the samples generated are retained.
It will often be the case that the distribution p(x) can only be evaluated up to a normalization
constant, so that p(x) = \tilde{p}(x) / Z_p, where \tilde{p}(x) can be evaluated easily, whereas Z_p is unknown.
Similarly, we may wish to use an importance sampling distribution q(x) = \tilde{q}(x) / Z_q which has the
same property. We then have

E[f] \simeq \frac{Z_q}{Z_p} \frac{1}{L} \sum_{l=1}^{L} \tilde{r}_l f(x^{(l)})    (1.29)

where \tilde{r}_l = \tilde{p}(x^{(l)}) / \tilde{q}(x^{(l)}). We can use the same sample set to evaluate the ratio Z_p / Z_q with the
result

\frac{Z_p}{Z_q} = \frac{1}{Z_q} \int \tilde{p}(x) \, dx = \int \frac{\tilde{p}(x)}{\tilde{q}(x)} q(x) \, dx \simeq \frac{1}{L} \sum_{l=1}^{L} \tilde{r}_l    (1.30)

and hence

E[f] \simeq \sum_{l=1}^{L} w_l f(x^{(l)}), \qquad \text{where} \quad w_l = \frac{\tilde{r}_l}{\sum_m \tilde{r}_m}.    (1.31)
As with rejection sampling, the success of the importance sampling approach depends crucially
on how well the sampling distribution q(x) matches the desired distribution p(x). If, as is often the
case, p(x) f(x) is strongly varying and has a significant proportion of its mass concentrated over
relatively small regions of x-space, then the set of importance weights \{r_l\} may be dominated
by a few weights having large values, with the remaining weights being relatively insignificant.
Thus the effective sample size can be much smaller than the apparent sample size L. The problem
is even more severe if none of the samples fall in the regions where p(x) f(x) is large. In that
case the apparent variances of r_l and r_l f(x^{(l)}) may be small even though the estimate of the
expectation may be severely wrong. Hence a major drawback of the importance sampling method
is the potential to produce results which are arbitrarily in error, with no diagnostic indication.
This also highlights a key requirement for the sampling distribution q(x), namely that it should not
be small or zero in regions where p(x) may be significant. In practice, for continuous densities, this
implies the requirement to use proposal distributions which are heavy-tailed.
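The self-normalized estimator of Eqs (1.29)-(1.31) can be sketched as follows. The target and proposal used in the example are our own choices (a standard normal target with a wider, and hence heavier-tailed, Gaussian proposal), purely for illustration.

```python
import math
import random

def importance_expectation(f, p_tilde, q_sampler, q_tilde, L):
    """Self-normalized importance sampling estimate of E_p[f], using only
    the unnormalized densities p~ and q~ as in Eqs (1.29)-(1.31)."""
    xs = [q_sampler() for _ in range(L)]
    r = [p_tilde(x) / q_tilde(x) for x in xs]   # importance ratios r~_l
    Z = sum(r)                                   # estimates L * Z_p / Z_q
    return sum(ri * f(xi) for ri, xi in zip(r, xs)) / Z

# Example: estimate E[x^2] = 1 for a standard normal target, proposing
# from the broader N(0, 2) so that the proposal dominates in the tails.
random.seed(0)
est = importance_expectation(
    f=lambda x: x * x,
    p_tilde=lambda x: math.exp(-0.5 * x * x),
    q_sampler=lambda: random.gauss(0.0, 2.0),
    q_tilde=lambda x: math.exp(-x * x / 8.0),
    L=20000,
)
```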
We can apply the importance sampling technique to distributions defined by graphical models
in various ways. For discrete variables a very simple approach is called uniform sampling. The
joint distribution for a directed graph is defined by (??). Each sample from the joint distribution
is obtained by first setting those variables which are in the evidence set equal to their observed
values. Each of the remaining variables is then sampled independently from a uniform distribution
over the space of possible instantiations. To determine the weight associated with a sample x
we note that the sampling distribution q(x) is uniform over the possible choices for x, and that
p(x | e) = p(x) / p(e), where e denotes the evidence associated with the instantiated variables, and
the equality follows from the fact that every sample which is generated is necessarily consistent
with the evidence. Thus the weights are simply proportional to p(x). Note that the variables can
be sampled in any order. This approach can again yield poor results if the posterior distribution is
far from uniform.
An improvement on this approach is called likelihood weighted sampling and is based on ancestral
sampling of the variables. For each variable in turn, if that variable is in the evidence set then
it is simply set to its instantiated value. If it is not in the evidence set, then it is sampled from the
conditional distribution p(x_i | pa_i) in which the conditioning variables are set to their currently
sampled values. The weight associated with the resulting sample x is then given by

r(x) = \prod_{x_i \in e} p(x_i \mid pa_i).    (1.32)
This method can be further extended using self-importance sampling in which the importance sam-
pling distribution is continually updated to reflect the current estimated posterior distribution. A
further refinement called Markov blanket scoring distributes fractions of samples to the states of a
node in proportion to the probability of these values conditioned on the Markov blanket of a node.
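Likelihood weighted sampling is easy to demonstrate on a toy two-node network A -> B with binary variables. The network and its probabilities below are our own illustrative choices, not an example from the text.

```python
import random

def likelihood_weighted(n, pA=0.3, pB1_given_A=(0.2, 0.9), evidence_B=1):
    """Likelihood-weighted estimate of P(A=1 | B=evidence_B) for the toy
    network A -> B. A is sampled ancestrally; B is in the evidence set,
    so it is clamped and the sample is weighted by p(B = e | A = a)."""
    num = den = 0.0
    for _ in range(n):
        a = 1 if random.random() < pA else 0      # ancestral sample of A
        pb1 = pB1_given_A[a]
        w = pb1 if evidence_B == 1 else 1.0 - pb1  # weight from Eq (1.32)
        num += w * a
        den += w
    return num / den

random.seed(1)
est = likelihood_weighted(50000)
# exact posterior: 0.3*0.9 / (0.3*0.9 + 0.7*0.2) = 0.27/0.41
```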
1.1.5 Sampling-Importance-Resampling
The rejection sampling method discussed in Section ?? depends in part for its success on the determination
of a suitable value for the constant k. For many pairs of distributions p(x) and q(x) it
will be impractical to determine a suitable value for k, in that any value which is sufficiently large
to guarantee a bound on the desired distribution will lead to impractically small acceptance rates.
As in the case of rejection sampling, the sampling-importance-resampling (SIR) approach also
makes use of a sampling distribution q(x), but avoids having to determine the constant k. There
are two stages to the scheme. In the first stage, L samples x^{(1)}, \ldots, x^{(L)} are drawn from q(x). Then
in the second stage, weights w_1, \ldots, w_L are constructed using

w_l = \frac{\tilde{p}(x^{(l)}) / q(x^{(l)})}{\sum_m \tilde{p}(x^{(m)}) / q(x^{(m)})}.    (1.33)

A second set of L samples is then drawn from the discrete distribution \{x^{(1)}, \ldots, x^{(L)}\} with
probabilities given by the weights w_1, \ldots, w_L.
The resulting L samples are only approximately distributed according to p(x), but the distribution
becomes correct in the limit L \to \infty. To see this we note that the cumulative distribution of
the resampled values is given by

p(x \leq a) = \sum_{l : x^{(l)} \leq a} w_l = \frac{\sum_l I(x^{(l)} \leq a) \, \tilde{p}(x^{(l)}) / q(x^{(l)})}{\sum_l \tilde{p}(x^{(l)}) / q(x^{(l)})}    (1.34)

where I(\cdot) is the indicator function (which equals 1 if its argument is true and 0 otherwise). Taking
the limit L \to \infty, and assuming suitable regularity of the distributions, we can replace the sums
by integrals weighted according to the original sampling distribution q(x):

p(x \leq a) = \frac{\int I(x \leq a) \{\tilde{p}(x)/q(x)\} q(x) \, dx}{\int \{\tilde{p}(x)/q(x)\} q(x) \, dx} = \frac{\int I(x \leq a) \tilde{p}(x) \, dx}{\int \tilde{p}(x) \, dx} = \int I(x \leq a) p(x) \, dx    (1.35)

which is the cumulative distribution function of p(x). Again we see that the normalization of p(x)
is not required.
is not required.
For a finite value of L, and a given initial sample set, the resampled values will only approximately
be drawn from the desired distribution. As with rejection sampling, the approximation
improves as the sampling distribution q(x) gets closer to the desired distribution p(x). When
q(x) = p(x), the initial samples x^{(1)}, \ldots, x^{(L)} have the desired distribution, and the weights become
w_l = 1/L, so that the resampled values also have the desired distribution.
If moments with respect to the distribution p(x) are required, then they can be evaluated directly
using the original samples together with the weights, since

E[f(x)] = \int f(x) p(x) \, dx \simeq \sum_{l=1}^{L} w_l f(x^{(l)}).    (1.36)
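The two SIR stages translate directly into code. This is a sketch; the target (a unit-variance Gaussian centred at 1) and the broad Gaussian proposal in the example are our own illustrative choices.

```python
import math
import random

def sir(p_tilde, q_sampler, q_pdf, L):
    """Sampling-importance-resampling: draw L samples from q(x), form the
    normalized weights of Eq (1.33), then resample L values from the
    discrete distribution {x^(l), w_l}. No bounding constant k is needed."""
    xs = [q_sampler() for _ in range(L)]
    w = [p_tilde(x) / q_pdf(x) for x in xs]
    total = sum(w)
    probs = [wi / total for wi in w]
    return random.choices(xs, weights=probs, k=L)  # weighted resampling

# Example: target p~(x) proportional to exp(-(x - 1)^2 / 2), proposal N(0, 2).
random.seed(0)
out = sir(
    p_tilde=lambda x: math.exp(-0.5 * (x - 1.0) ** 2),
    q_sampler=lambda: random.gauss(0.0, 2.0),
    q_pdf=lambda x: math.exp(-x * x / 8.0),  # unnormalized q is fine too
    L=20000,
)
```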
We can apply the resampling formalism of Section ?? to obtain a sequential Monte Carlo algorithm
known as the particle filter. Consider the class of distributions represented by the graphical
model in Figure ??. In many applications the observed data arrive sequentially and we wish
Figure 1.10: Directed graphical model in which a sequence of observations y_1, y_2, \ldots is explained
in terms of a hidden Markov chain of latent variables x_1, x_2, \ldots. Sampling from the
posterior distribution of the hidden variables can be performed sequentially using the particle filter
algorithm.
to update the posterior distribution of the hidden variables in the light of each new observation.
The probabilistic model is specified in terms of the transition probability p(x_{t+1} | x_t) and the
observation model p(y_t | x_t). This class of models includes the hidden Markov model (Chapter ??) and
the Kalman filter (Chapter ??) as special cases. The standard hidden Markov model uses discrete
distributions, while the simplest form of the Kalman filter is based on linear-Gaussian distributions,
and so both models are amenable to exact solution, as discussed already.
There is, however, considerable interest in extending such models to more complex choices for
the conditional probability distributions, particularly for the observation model p(y_t | x_t). This can
lead to highly complex posterior distributions over the hidden variables x_t which are no longer
analytically tractable. We therefore consider the application of sampling methods. In particular,
suppose we are given the observed values Y_t = (y_1, \ldots, y_t) and we wish to draw samples from
the posterior distribution p(x_t | Y_t), in order to evaluate the expectation of some function f(x_t), with
E[f(x_t) \mid Y_t] = \int f(x_t) p(x_t \mid Y_t) \, dx_t \simeq \sum_{l=1}^{L} w_t^{(l)} f(x_t^{(l)})    (1.37)

where \{x_t^{(l)}\} is a set of samples drawn from p(x_t \mid Y_{t-1}), and we have made use of the conditional
independence property p(y_t \mid x_t, Y_{t-1}) = p(y_t \mid x_t), which follows from the graph in Figure ??. The
sampling weights \{w_t^{(l)}\} are defined by

w_t^{(l)} = \frac{p(y_t \mid x_t^{(l)})}{\sum_{m=1}^{L} p(y_t \mid x_t^{(m)})}.    (1.38)

Thus the posterior distribution p(x_t \mid Y_t) is represented by the set of samples \{x_t^{(l)}\} together with
the corresponding weights \{w_t^{(l)}\}. Note that these weights satisfy 0 \leq w_t^{(l)} \leq 1 and \sum_l w_t^{(l)} = 1.
Since we wish to find a sequential sampling scheme, we suppose that a set of samples and
weights have been obtained at time step t, and that we have subsequently observed the value of
y_{t+1}, and we wish to find the samples and weights at time step t + 1. We first sample from the
distribution p(x_{t+1} \mid Y_t). This is straightforward since, again using Bayes' theorem,

p(x_{t+1} \mid Y_t) = \int p(x_{t+1} \mid x_t) p(x_t \mid Y_t) \, dx_t \simeq \sum_{l} w_t^{(l)} p(x_{t+1} \mid x_t^{(l)})    (1.39)
where we have made use of the conditional independence properties p(x_{t+1} \mid x_t, Y_t) = p(x_{t+1} \mid x_t)
and p(y_{t+1} \mid x_{t+1}, Y_t) = p(y_{t+1} \mid x_{t+1}), which follow from the application of the d-separation criterion to
the graph in Figure ??. The distribution given by (1.39) is a mixture distribution, and samples can
be drawn by choosing a component l with probability given by the mixing coefficients w_t^{(l)} and
then drawing a sample from the corresponding component.
In summary, we can view each step of the particle filter algorithm as comprising two stages.
At time step t we have a sample representation of the posterior distribution p(x_t \mid Y_t) expressed
as samples \{x_t^{(l)}\} with corresponding weights \{w_t^{(l)}\}. This can be viewed as a mixture representation
of the form (1.39). To obtain the corresponding representation for the next time step, we
first draw L samples from the mixture distribution (1.39), and then for each sample we use the new
observation y_{t+1} to evaluate the corresponding weights w_{t+1}^{(l)} \propto p(y_{t+1} \mid x_{t+1}^{(l)}). This is illustrated
schematically in Figure ??.
Figure 1.11: Schematic illustration of the operation of the particle filter. At time step t the posterior
p(x_t \mid Y_t) is represented as a mixture distribution, shown schematically as circles whose sizes are
proportional to the weights w_t^{(l)}. A set of L samples is then drawn from this distribution, and the
new weights w_{t+1}^{(l)} are evaluated using p(y_{t+1} \mid x_{t+1}^{(l)}).
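The two stages of each particle filter update can be sketched as follows. The toy tracking model used in the example (a Gaussian random-walk state with Gaussian observations, and all its numerical settings) is our own illustration.

```python
import math
import random

def particle_filter_step(particles, weights, y_new, transition, obs_lik):
    """One particle-filter update: draw from the mixture
    sum_l w_t^(l) p(x_{t+1} | x_t^(l)) by picking parents with probability
    w_t^(l) and propagating them through the transition model, then
    reweight using the new observation likelihood as in Eq (1.38)."""
    L = len(particles)
    parents = random.choices(particles, weights=weights, k=L)
    new_particles = [transition(x) for x in parents]
    lik = [obs_lik(y_new, x) for x in new_particles]
    total = sum(lik)
    new_weights = [v / total for v in lik]
    return new_particles, new_weights

# Toy model: x_{t+1} = x_t + N(0, 0.5^2), y_t = x_t + N(0, 0.5^2).
random.seed(0)
L = 5000
particles = [random.gauss(0.0, 1.0) for _ in range(L)]
weights = [1.0 / L] * L
for y in [0.0, 1.0, 2.0, 3.0]:  # observations arriving sequentially
    particles, weights = particle_filter_step(
        particles, weights, y,
        transition=lambda x: x + random.gauss(0.0, 0.5),
        obs_lik=lambda y_obs, x: math.exp(-(y_obs - x) ** 2 / (2 * 0.5 ** 2)),
    )
posterior_mean = sum(w * x for w, x in zip(weights, particles))
```

The filtered mean lags slightly behind the ramping observations, as expected of any Bayesian filter that balances the prediction against the new measurement.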
candidate sample x^* from the proposal distribution and then accept the sample according to an
appropriate criterion. In the basic Metropolis algorithm we assume that the proposal distribution
is symmetric, that is q(x_A \mid x_B) = q(x_B \mid x_A) for all values of x_A and x_B. The candidate sample is then
accepted with probability

A(x^*, x^{(\tau)}) = \min\left(1, \frac{\tilde{p}(x^*)}{\tilde{p}(x^{(\tau)})}\right).    (1.40)

This can be achieved by choosing a random number u with uniform distribution over the unit
interval and then accepting the sample if A(x^*, x^{(\tau)}) > u. Note that if the step from x^{(\tau)} to x^*
causes an increase in the value of \tilde{p}(x) then the candidate point is certain to be kept.
If the candidate sample is accepted then x^{(\tau+1)} = x^*; otherwise the candidate point is discarded,
x^{(\tau+1)} is set to x^{(\tau)}, and another candidate sample is drawn from the distribution
q(x \mid x^{(\tau+1)}).
This is in contrast to rejection sampling, where rejected samples are simply discarded. In the
Metropolis algorithm, when a candidate point is rejected the previous sample is instead included in
the final list of samples, leading to multiple copies of some samples. As we shall see, as long as q(x_A \mid x_B)
is positive for any values of x_A and x_B (this is a sufficient but not necessary condition), the distribution
of x^{(\tau)} tends to p(x) as \tau \to \infty. It should be emphasized, however, that the sequence x^{(1)}, x^{(2)}, \ldots
is not a set of independent samples from p(x), since successive samples are highly correlated. If
we wish to obtain independent samples then we can discard most of the sequence and just retain
every Mth sample. For M sufficiently large, the retained samples will for all practical purposes be
independent.
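The basic Metropolis algorithm is only a few lines of code. This is a minimal sketch: the Gaussian proposal width and the standard normal target in the example are our own illustrative choices.

```python
import math
import random

def metropolis(p_tilde, x0, n, step=1.0):
    """Basic Metropolis sampler with a symmetric Gaussian proposal.
    A rejected candidate contributes another copy of the current state,
    unlike rejection sampling where rejected points are simply discarded."""
    x = x0
    chain = []
    for _ in range(n):
        x_star = random.gauss(x, step)              # symmetric proposal
        a = min(1.0, p_tilde(x_star) / p_tilde(x))  # criterion of Eq (1.40)
        if random.random() < a:
            x = x_star                              # accept the move
        chain.append(x)                             # keep state either way
    return chain

# Example: sample a standard normal from its unnormalized density.
random.seed(0)
chain = metropolis(lambda x: math.exp(-0.5 * x * x), x0=0.0, n=50000)
```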
Figure ?? shows a simple illustrative example of sampling from a two-dimensional normal distribution
using the Metropolis algorithm in which the proposal distribution is an isotropic Gaussian.
Further insight into the nature of Markov chain Monte Carlo algorithms can be gleaned by
looking at the properties of a specific example, namely a simple random walk. Consider a state
space x consisting of the integers, with transition probabilities

p(x^{(\tau+1)} = x^{(\tau)}) = 0.5, \quad p(x^{(\tau+1)} = x^{(\tau)} + 1) = 0.25, \quad p(x^{(\tau+1)} = x^{(\tau)} - 1) = 0.25    (1.41)

where x^{(\tau)} denotes the state at step \tau. If the initial state is x^{(1)} = 0, then by symmetry the expected
state at time \tau will also be zero, E[x^{(\tau)}] = 0, and similarly it is easily seen that E[(x^{(\tau)})^2] = \tau/2. Thus
after \tau steps the random walk has only travelled a distance which on average is proportional to
the square root of \tau. This square-root dependence is typical of random walk behaviour and shows
that random walks are very inefficient in exploring the state space. As we shall see, a central goal
in designing Markov chain Monte Carlo methods is to avoid random walk behaviour.
Figure 1.12: A simple illustration of the use of the Metropolis algorithm to sample from a two-dimensional
normal distribution whose one standard-deviation contour is shown by the ellipse, using an
isotropic normal proposal distribution. Steps which are accepted are shown as solid lines, while
rejected steps are shown dashed.
A first-order Markov chain is defined to be a series of random variables x^{(1)}, \ldots, x^{(M)} such that
each variable depends only on the previous variable in the series. This of course can be represented
as a directed graph in the form of a chain. We can then specify the Markov chain by giving the
probability distribution p(x^{(0)}) for the initial variable, together
with the conditional probabilities for subsequent variables in the form of transition probabilities
T(x^{(\tau)}, x^{(\tau+1)}) \equiv p(x^{(\tau+1)} \mid x^{(\tau)}). A Markov chain is called homogeneous if the transition probabilities
are the same for all \tau. Initially we will restrict attention to finite, discrete state spaces for the
variables, and discuss the extension to countably infinite and continuous state spaces later.
The marginal probability for a particular variable can be expressed in terms of the marginal
probability for the previous variable in the chain in the form

p(x^{(\tau+1)}) = \sum_{x^{(\tau)}} p(x^{(\tau+1)} \mid x^{(\tau)}) p(x^{(\tau)}).    (1.43)

A distribution is said to be invariant, or stationary, with respect to a Markov chain if each step in
the chain leaves that distribution invariant. Thus, for a homogeneous Markov chain with transition
probabilities T(x', x), the distribution p^*(x) is invariant if

p^*(x) = \sum_{x'} T(x', x) p^*(x').    (1.44)
Note that a given Markov chain may have more than one invariant distribution. For instance, if
1.2. MARKOV CHAIN MONTE CARLO 21
the transition probabilities are given by the identity transformation, then any distribution will be
invariant.
A sufficient (but not necessary) condition for ensuring that the required distribution p(x) is
invariant is to choose the transition probabilities to satisfy the property of detailed balance, defined
by

p^*(x) T(x, x') = p^*(x') T(x', x)    (1.45)

for the particular distribution p^*(x). It is easily seen that a transition probability which satisfies
detailed balance with respect to a particular distribution will leave that distribution invariant, since

\sum_{x'} p^*(x') T(x', x) = \sum_{x'} p^*(x) T(x, x') = p^*(x) \sum_{x'} p(x' \mid x) = p^*(x).    (1.46)
Our goal is to use Markov chains to sample from a given distribution. We can achieve this if
we set up a Markov chain such that the desired distribution is invariant. However, we must also
require that for \tau \to \infty the distribution p(x^{(\tau)}) converges to the required invariant distribution
p^*(x), irrespective of the choice of initial distribution p(x^{(0)}). This property is called ergodicity, and
the invariant distribution is then called the equilibrium distribution. Clearly an ergodic Markov
chain can have only one equilibrium distribution.
In Appendix ?? we show that a homogeneous Markov chain will be ergodic, subject only to
weak restrictions on the invariant distribution and the transition probabilities.
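The fact that detailed balance implies invariance is easy to check numerically for a discrete chain. The three-state chain below, with its particular numbers, is a toy example of our own construction.

```python
def satisfies_detailed_balance(p, T, tol=1e-12):
    """Check p*(x) T(x, x') == p*(x') T(x', x) for every pair of states."""
    n = len(p)
    return all(abs(p[i] * T[i][j] - p[j] * T[j][i]) < tol
               for i in range(n) for j in range(n))

def chain_step(p, T):
    """One application of Eq (1.43): p'(x') = sum_x p(x' | x) p(x)."""
    n = len(p)
    return [sum(p[i] * T[i][j] for i in range(n)) for j in range(n)]

# A three-state chain in detailed balance with pi = (0.5, 0.3, 0.2);
# each row of T sums to one.
pi = [0.5, 0.3, 0.2]
T = [[0.70, 0.18, 0.12],
     [0.30, 0.50, 0.20],
     [0.30, 0.30, 0.40]]
```

Applying `chain_step` to `pi` returns `pi` unchanged, confirming invariance.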
In practice we often construct the transition probabilities from a set of 'base' transitions B_1, \ldots, B_K.
This can be achieved through a mixture distribution of the form

T(x', x) = \sum_{k=1}^{K} \alpha_k B_k(x', x)    (1.47)

for some set of mixing coefficients \alpha_1, \ldots, \alpha_K satisfying \alpha_k \geq 0 and \sum_k \alpha_k = 1. Alternatively, the
base transitions may be combined through successive application, so that

T(x', x) = \sum_{x_1} \cdots \sum_{x_{K-1}} B_1(x', x_1) \cdots B_{K-1}(x_{K-2}, x_{K-1}) B_K(x_{K-1}, x).    (1.48)

If a distribution is invariant with respect to each of the base transitions, then obviously it will
also be invariant with respect to either of the T(x', x) given by (1.47) or (1.48). For the case of the
mixture (1.47), if each of the base transitions satisfies detailed balance, then the mixture transition
T will also satisfy detailed balance. This does not hold for the transition probability constructed
using (1.48), although by symmetrizing the order of application of the base transitions, in the form
B_1, B_2, \ldots, B_K, B_K, \ldots, B_2, B_1, detailed balance can be restored.
A common example of the use of composite transition probabilities is where each base transi-
tion changes only a subset of the variables. Clearly each base transition itself will not be ergodic.
However, the composite probability will be ergodic, as long as condition (??) is satisfied.
Earlier we introduced the basic Metropolis algorithm, without actually demonstrating that it samples
from the required distribution. Before giving a proof, we first discuss a generalization, known
as the Metropolis-Hastings algorithm, to the case where the proposal distribution is no longer a
symmetric function of its arguments. In particular, at step \tau of the algorithm, in which the current
state is x^{(\tau)}, we draw a sample x^* from the distribution q_k(x \mid x^{(\tau)}) and then accept it with probability
A_k(x^*, x^{(\tau)}) where

A_k(x^*, x^{(\tau)}) = \min\left(1, \frac{\tilde{p}(x^*) \, q_k(x^{(\tau)} \mid x^*)}{\tilde{p}(x^{(\tau)}) \, q_k(x^* \mid x^{(\tau)})}\right).    (1.49)

Here k labels the members of the set of possible transitions being considered. This is generally
referred to as the Metropolis-Hastings algorithm. Obviously for a symmetric proposal distribution
this reduces to the standard Metropolis criterion given by (1.40).
We can show that p(x) is an invariant distribution of the Markov chain defined by the Metropolis-Hastings
algorithm by showing that detailed balance, defined by (1.45), is satisfied. Using (1.49) we
have

p(x) q_k(x' \mid x) A_k(x', x) = \min\left( p(x) q_k(x' \mid x), \; p(x') q_k(x \mid x') \right)
= \min\left( p(x') q_k(x \mid x'), \; p(x) q_k(x' \mid x) \right) = p(x') q_k(x \mid x') A_k(x, x')

as required.
It is worth noting that the evaluation of the acceptance criterion (1.49) does not require knowledge
of the normalizing constant Z_p in the probability distribution p(x) = \tilde{p}(x) / Z_p.
The specific choice of proposal distribution can have a marked effect on the performance of
the algorithm. For continuous state spaces a common choice is a Gaussian centred on the current
state, leading to an important trade-off in determining the width parameter of this distribution. If
the width is small then the proportion of accepted transitions will be high, but progress through
the state space takes the form of a slow random walk, leading to very long correlation times. However,
if the width parameter is large then the rejection rate will be high since, in the kind of complex
problems we are considering, many of the proposed steps will be to states for which the probability
p(x) is low. Consider a multivariate distribution p(x) having strong correlations between the components
of x, as illustrated in Figure ??. The scale \rho of the proposal distribution should be as large
as possible without incurring high rejection rates. This suggests that \rho should be of the same order
as the smallest length scale \sigma_{\min}. The system then explores the distribution along the more extended
direction by means of a random walk, and so the number of steps required to arrive at a state which
is more or less independent of the original state is of order (\sigma_{\max} / \sigma_{\min})^2. In fact in two dimensions
the increase in rejection rate as \rho increases is offset by the larger step sizes of those transitions
which are accepted, and more generally for a multivariate Gaussian the number of steps required
to obtain independent samples scales like (\sigma_{\max} / \sigma_2)^2, where \sigma_2 is the second-smallest standard
deviation. These details aside, it remains the case that if the length scales over which the distribution
varies are very different in different directions, then the Metropolis-Hastings algorithm can lead to
very slow convergence.
1.3. GIBBS SAMPLING 23
Figure 1.13: Schematic illustration of the use of an isotropic Gaussian proposal distribution (circle)
to sample from a correlated multivariate Gaussian distribution (dashed ellipse) having very different
standard deviations in different directions, using the Metropolis-Hastings algorithm. In order
to keep the rejection rate low, the scale \rho of the proposal distribution should be of the order of the
smallest standard deviation \sigma_{\min}, which leads to random walk behaviour in which independent
states are separated by roughly (\sigma_{\max} / \sigma_{\min})^2 steps, where \sigma_{\max} is the largest standard deviation.
Consider, for example, a distribution p(x_1, x_2, x_3) over three variables, and suppose that at step \tau
of the algorithm we have selected values x_1^{(\tau)}, x_2^{(\tau)}, x_3^{(\tau)}. We first replace x_1^{(\tau)} by a new value
x_1^{(\tau+1)} obtained by sampling from the conditional distribution

p(x_1 \mid x_2^{(\tau)}, x_3^{(\tau)}).    (1.51)

Next we replace x_2^{(\tau)} by a value x_2^{(\tau+1)} obtained by sampling from the conditional distribution

p(x_2 \mid x_1^{(\tau+1)}, x_3^{(\tau)})    (1.52)

so that the new value for x_1 is used straight away in subsequent sampling steps. Then we update
x_3 with a sample x_3^{(\tau+1)} drawn from

p(x_3 \mid x_1^{(\tau+1)}, x_2^{(\tau+1)})    (1.53)

and so on, cycling through the three variables in turn. The basic Gibbs sampling algorithm is
summarized in Figure ??.
To show that this procedure samples from the required distribution we first of all note that
1. Initialize \{x_i : i = 1, \ldots, M\}.
2. For \tau = 1, \ldots, T:
   - Sample x_1^{(\tau+1)} \sim p(x_1 \mid x_2^{(\tau)}, x_3^{(\tau)}, \ldots, x_M^{(\tau)}).
   - Sample x_2^{(\tau+1)} \sim p(x_2 \mid x_1^{(\tau+1)}, x_3^{(\tau)}, \ldots, x_M^{(\tau)}).
   - \vdots
   - Sample x_j^{(\tau+1)} \sim p(x_j \mid x_1^{(\tau+1)}, \ldots, x_{j-1}^{(\tau+1)}, x_{j+1}^{(\tau)}, \ldots, x_M^{(\tau)}).
   - \vdots
   - Sample x_M^{(\tau+1)} \sim p(x_M \mid x_1^{(\tau+1)}, \ldots, x_{M-1}^{(\tau+1)}).
the distribution p(x) is an invariant of each of the Gibbs sampling steps individually, and hence of
the whole Markov chain. This follows from the fact that when we sample from p(x_k \mid \{x_{i \neq k}\}), the
marginal distribution p(\{x_{i \neq k}\}) is clearly invariant, since the values of \{x_{i \neq k}\} are unchanged.
Also, each step by definition samples from the correct conditional distribution p(x_k \mid \{x_{i \neq k}\}). Since
these conditional and marginal distributions together specify the joint distribution, we see that the
joint distribution is itself invariant.
The second requirement to be satisfied, in order that the Gibbs sampling procedure samples
from the correct distribution, is that it be ergodic. A sufficient condition for ergodicity is that none
of the conditional distributions be anywhere zero. If this is the case, then any point in x-space
can be reached from any other point in a finite number of steps involving one update of each
of the component variables. If this requirement is not satisfied, so that some of the conditional
distributions have zeros, then ergodicity, if it applies, must be proven explicitly.
The distribution of initial states must also be specified in order to complete the algorithm, al-
though samples drawn after many iterations should become independent of this distribution. Of
course successive samples from the Markov chain will be highly correlated, and so to obtain sam-
ples which are nearly independent it will be necessary to sub-sample the sequence.
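A minimal Gibbs sampler for a correlated bivariate Gaussian illustrates the cycle of conditional updates. The target here (zero mean, unit variances, correlation rho) is our own example; its full conditionals happen to be available in closed form.

```python
import math
import random

def gibbs_bivariate_gaussian(rho, n, burn=1000):
    """Gibbs sampling for a zero-mean bivariate Gaussian with unit
    variances and correlation rho. Each full conditional is univariate
    Gaussian, x1 | x2 ~ N(rho * x2, 1 - rho^2) and symmetrically for x2,
    so each update is an exact draw from the conditional distribution."""
    x1 = x2 = 0.0
    sd = math.sqrt(1.0 - rho * rho)
    samples = []
    for t in range(n + burn):
        x1 = random.gauss(rho * x2, sd)   # sample p(x1 | x2)
        x2 = random.gauss(rho * x1, sd)   # new x1 used straight away
        if t >= burn:
            samples.append((x1, x2))
    return samples
```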
Where appropriate we can generalize the Gibbs procedure slightly to sample from sets of vari-
ables, conditioned on the remaining variables, at each step. Note that these sets do not need to be
disjoint.
We can obtain the Gibbs sampling procedure as a particular instance of the Metropolis-Hastings
algorithm as follows. Consider a Metropolis-Hastings sampling step involving the variable x_k, in
which the remaining variables x_{\setminus k} remain fixed, and for which the transition probability from x to
x^* is given by q_k(x^* \mid x) = p(x_k^* \mid x_{\setminus k}). We note that x_{\setminus k}^* = x_{\setminus k}, since these components
are unchanged by the sampling step. Also, p(x) = p(x_k \mid x_{\setminus k}) p(x_{\setminus k}). Thus the factor which
determines the acceptance probability in the Metropolis-Hastings criterion (1.49) is given by

A(x^*, x) = \frac{p(x^*) q_k(x \mid x^*)}{p(x) q_k(x^* \mid x)} = \frac{p(x_k^* \mid x_{\setminus k}^*) p(x_{\setminus k}^*) p(x_k \mid x_{\setminus k}^*)}{p(x_k \mid x_{\setminus k}) p(x_{\setminus k}) p(x_k^* \mid x_{\setminus k})} = 1.

Thus the Metropolis-Hastings steps are always accepted, and therefore this choice of proposal distribution
corresponds to the Gibbs sampling algorithm.
The practical applicability of Gibbs sampling depends on the ease with which samples can be
1.3. GIBBS SAMPLING 25
drawn from the conditional distributions p(x_k \mid x_{\setminus k}). In the case of graphical models, the conditional
distributions for individual nodes depend only on the variables in the corresponding Markov blankets,
as illustrated in Figure ??.

Figure 1.15: Gibbs sampling requires samples to be drawn from the conditional distribution of
a variable conditioned on the remaining variables. For graphical models, this conditional distribution
is a function only of the states of the nodes in the Markov blanket. In the case of an undirected
graph (a) this comprises the set of neighbours, while for a directed graph (b) the Markov blanket
comprises the parents, the children and the co-parents.

For directed graphs, a very broad choice of conditional distributions for the individual nodes
conditioned on their parents will lead to conditional distributions for Gibbs sampling which are
log concave. The adaptive rejection sampling methods discussed in Section ?? therefore provide
a framework for Monte Carlo sampling from directed graphs with broad applicability.
As with the Metropolis algorithm, we can gain some insight into the behaviour of Gibbs sampling
by investigating its application to a Gaussian distribution. Consider a correlated Gaussian in
two variables, as illustrated in Figure ??, having conditional distributions of width l and marginal
distributions of width L. The typical step size is governed by the conditional distributions and will
be of order l. Since the state evolves according to a random walk, the number of steps needed to
obtain independent samples from the distribution will be of order (L/l)^2. Of course if the Gaussian
distribution were uncorrelated then the Gibbs sampling procedure would be optimally efficient.
For this simple problem we could rotate the coordinate system in order to decorrelate the variables;
in practical applications, however, it will generally be infeasible to find such transformations.
One approach to reducing random walk behaviour in Gibbs sampling is called over-relaxation.
In its original form this applies to problems for which the conditional distributions are Gaussian.
Note that this represents a more general class of distributions than the multivariate Gaussian, since
there exist non-Gaussian joint distributions whose conditional distributions are all Gaussian.
At each step of the Gibbs sampling algorithm, the conditional distribution for a particular component
x_i has some mean \mu_i and some variance \sigma_i^2. In the over-relaxation framework the value of x_i
is replaced with

x_i' = \mu_i + \alpha (x_i - \mu_i) + \sigma_i (1 - \alpha^2)^{1/2} \nu    (1.55)

where \nu is a Gaussian random variate with zero mean and unit variance, and \alpha is a parameter such
that -1 < \alpha < 1. For \alpha = 0 the method is equivalent to standard Gibbs sampling. When \alpha is
negative the step is biased to the opposite side of the mean. It is easily seen that this step leaves the
desired distribution invariant since, if x_i has mean \mu_i and variance \sigma_i^2, then so too does x_i'. Also it
is clear that if the original Gibbs sampling is ergodic, then the over-relaxed version will be ergodic
Figure 1.16: Illustration of Gibbs sampling by alternate updates of two variables whose distribution
is a correlated Gaussian. The step size is governed by the standard deviation of the conditional
distribution, and is O(l), leading to slow progress in the direction of elongation of the distribution.
The number of steps needed to obtain an independent sample from the distribution is O((L/l)^2).
also. Thus the effect of over-relaxation is to tend to produce directed motion through state space
when the variables are highly correlated. The framework of ordered over-relaxation generalizes this
approach to non-Gaussian distributions.
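The over-relaxed update of Eq (1.55) is a one-line modification of a Gibbs step. A sketch (the default alpha = -0.9 is our own illustrative choice):

```python
import math
import random

def overrelaxed_update(x, mu, sigma, alpha=-0.9):
    """Over-relaxed replacement for a Gibbs update whose conditional
    distribution is N(mu, sigma^2), following Eq (1.55). alpha = 0
    recovers standard Gibbs sampling; negative alpha biases the move to
    the opposite side of the conditional mean, encouraging directed
    motion through state space."""
    nu = random.gauss(0.0, 1.0)
    return mu + alpha * (x - mu) + sigma * math.sqrt(1.0 - alpha * alpha) * nu
```

The invariance property is easy to check empirically: feeding in draws from N(mu, sigma^2) produces outputs with the same mean and variance.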
The simplicity of Gibbs sampling allows it to be applied to a broad range of models. In fact
it has been used as the basis for a general-purpose software package, BUGS ('Bayesian inference
Using Gibbs Sampling'), for sampling from models specified as directed acyclic graphs. The conditional
distribution for each node depends only on the states of the nodes in the corresponding
Markov blanket. If the graph is constructed using distributions from the exponential family, and if
the parent-child relationships preserve conjugacy, then the full conditional distributions arising in
Gibbs sampling will take the same form as the original conditional distributions (conditioned on
the parents) defining each node, and so standard sampling techniques can be employed. In general
the full conditional distributions will be of a complex form that does not permit the use of standard
sampling algorithms. However, if these conditionals are log concave, then sampling can be done
efficiently using adaptive rejection sampling (assuming the corresponding variable is a scalar).
Because the basic Gibbs sampling technique considers one variable at a time, there are strong
dependencies between successive samples. At the opposite extreme, if we could draw samples
directly from the joint distribution (an operation that we are supposing is intractable) then suc-
cessive samples would be independent. We can hope to improve on the simple Gibbs sampler by
sampling successively from groups of variables rather than individual variables. This is achieved
in the blocking Gibbs sampling algorithm by choosing blocks of variables, not necessarily disjoint,
and then sampling jointly from the variables in each block in turn, conditioned on the remaining
variables. This can be done in a way that preserves tractability while leading to a large block by
starting with the junction tree for the full graph in which the block comprises the complete set of
variables, and then removing variables from the block until a tractable structure is obtained.
1.4. SLICE SAMPLING 27
and so we can sample from p(x) by sampling from the joint distribution over (x, u) and then ignoring
the u values. This can be achieved by alternately sampling x and u. Given the value of x, we
evaluate \tilde{p}(x) and then sample u uniformly in the range 0 \leq u \leq \tilde{p}(x), which is straightforward.
Then we fix u and sample x uniformly from the 'slice' through the distribution defined by
\{x : \tilde{p}(x) > u\}. This is illustrated in Figure ??(a).
Figure 1.17: Illustration of slice sampling. (a) For a given value x^{(\tau)}, a value of u is chosen uniformly
in the region 0 \leq u \leq \tilde{p}(x^{(\tau)}), which defines a 'slice' through the distribution, shown by the solid
horizontal lines. (b) Since sampling directly from a slice is infeasible, a new sample of x is drawn
instead from a region x_{\min} \leq x \leq x_{\max} which contains the previous value x^{(\tau)}.
In practice it can be difficult to sample directly from a slice through the distribution, and so
instead we define a sampling scheme that leaves the uniform distribution over the slice invariant,
which can be achieved by ensuring that detailed balance is satisfied. Here we consider the case of
a univariate x.
Suppose the current value of x is denoted x(t) and that we have obtained a corresponding
sample u. The next value of x is sampled uniformly from a region xmin ≤ x ≤ xmax which contains
x(t). If the sample chosen lies within the slice, in other words if p̃(x) > u, then the sample is
retained and forms x(t+1).
It is in the choice of the region (xmin, xmax) that the adaptation to the characteristic length
scales of the distribution takes place. We want the region to encompass as much of the slice as
possible, so as to allow large moves in x space, while having as little as possible of this region lying
outside the slice, since this makes the sampling less efficient.
One approach to the choice of region involves starting with a region containing x(t) having
some width w and then testing each of the end points to see if they lie within the slice. If either
end point does, then the region is extended in that direction in increments of value w until the
end point lies outside the slice. A candidate value x′ is then chosen uniformly from this region,
and if it lies within the slice then it forms x(t+1). If it lies outside the slice then the region is shrunk
such that x′ forms an end point and such that the region still contains x(t). Then another candidate
point is drawn uniformly from this reduced region, and so on, until a value of x is found which lies
within the slice.
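The stepping-out and shrinkage procedure just described can be sketched as follows (an illustrative implementation, not from the text; the Gaussian target, the width w = 1, and the sample count are arbitrary choices):

```python
import math
import random

def slice_sample(p_tilde, x0, w, num_samples, seed=0):
    """Univariate slice sampling with stepping out and shrinkage.

    p_tilde: unnormalized density, evaluated pointwise.
    w: initial width of the bracketing region.
    """
    rng = random.Random(seed)
    x = x0
    samples = []
    for _ in range(num_samples):
        # Sample the auxiliary variable u uniformly under the density at x.
        u = rng.uniform(0.0, p_tilde(x))
        # Step out: place an interval of width w around x, then extend each
        # end in increments of w until it lies outside the slice.
        x_min = x - rng.uniform(0.0, w)
        x_max = x_min + w
        while p_tilde(x_min) > u:
            x_min -= w
        while p_tilde(x_max) > u:
            x_max += w
        # Shrinkage: draw candidates uniformly from (x_min, x_max), shrinking
        # the interval towards x after each rejection.
        while True:
            x_new = rng.uniform(x_min, x_max)
            if p_tilde(x_new) > u:
                break
            if x_new < x:
                x_min = x_new
            else:
                x_max = x_new
        x = x_new
        samples.append(x)
    return samples

# Example: sample from an unnormalized standard Gaussian.
p_tilde = lambda x: math.exp(-0.5 * x * x)
samples = slice_sample(p_tilde, x0=0.0, w=1.0, num_samples=20000)
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
print(mean, var)
```

The bracketing interval adapts automatically to the local length scale of the distribution through the stepping-out and shrinkage loops.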
Slice sampling can be applied to multivariate distributions by repeatedly sampling each vari-
able in turn, in the manner of Gibbs sampling. This requires that we are able to compute, for each
component x_i, a function which is proportional to the conditional distribution

p̃(x_i) ∝ p(x_i | x_{\i})    (1.56)

where x_{\i} denotes the variables other than x_i.
1.5 The Hybrid Monte Carlo Algorithm

The hybrid Monte Carlo algorithm is based on a physical analogy in which the state vector x is
augmented with auxiliary momentum variables r, where the x_i can be regarded as position variables in this dynamics perspective. Thus for each
position variable there is a corresponding momentum variable r_i, and the joint space of position and
momentum variables is called phase space.
Without loss of generality, we can write the probability distribution over x in the form

p(x) = (1/Z_p) exp(−E(x))    (1.57)

where E(x) is interpreted as the potential energy of the system when in state x. The system accelera-
tion is the rate of change of momentum and is given by the applied force, which itself is the negative
gradient of the potential energy:

dr_i/dτ = −∂E(x)/∂x_i.    (1.58)

The momentum variables determine the kinetic energy of the system, which we take to be

K(r) = (1/2) Σ_i r_i².    (1.59)

The total energy of the system is then the sum of its potential and kinetic energies

H(x, r) = E(x) + K(r)    (1.60)
which is called the Hamiltonian function. Using (1.57)–(1.60) we can now express the
dynamics of the system in terms of the Hamiltonian equations given by

dx_i/dτ = r_i = ∂H/∂r_i    (1.61)

dr_i/dτ = −∂E(x)/∂x_i = −∂H/∂x_i.    (1.62)
During the evolution of this dynamical system, the value of the Hamiltonian H is constant, as is
easily seen by differentiation:

dH/dτ = Σ_i { (∂H/∂x_i)(dx_i/dτ) + (∂H/∂r_i)(dr_i/dτ) }
      = Σ_i { (∂H/∂x_i)(∂H/∂r_i) − (∂H/∂r_i)(∂H/∂x_i) } = 0.    (1.64)

A second important property of the dynamics is that it preserves volume in phase space, a result
known as Liouville's theorem. This follows because the flow field, which is the vector of velocities
in phase space

V = (dx/dτ, dr/dτ)    (1.65)

has zero divergence:

div V = Σ_i { ∂(dx_i/dτ)/∂x_i + ∂(dr_i/dτ)/∂r_i }
      = Σ_i { ∂²H/(∂x_i ∂r_i) − ∂²H/(∂r_i ∂x_i) } = 0.    (1.66)

Now consider the joint distribution over phase space whose total energy is the Hamiltonian,
i.e. the distribution given by

p(x, r) = (1/Z_H) exp(−H(x, r)).    (1.67)
Using the two results of conservation of volume and conservation of H, it follows that the Hamil-
tonian dynamics will leave p(x, r) invariant. This can be seen by considering a small region of
phase space over which H is approximately constant. If we follow the evolution of the Hamilto-
nian equations for a finite time, then the volume of this region will remain unchanged, as will the
value of H in this region, and hence the probability density, which is a function only of H, will also
be unchanged.

Although H is invariant, the values of x and r will vary, and so by integrating the Hamiltonian
dynamics over a finite time duration it becomes possible to make large changes to x in a systematic
way which avoids random walk behaviour.

Evolution under the Hamiltonian dynamics will not, however, sample ergodically from p(x, r),
since the value of H is constant. In order to arrive at an ergodic sampling scheme we can
introduce additional moves in phase space which change the value of H while also leaving the
distribution p(x, r) invariant. The simplest way to achieve this is to replace the value of r with
one drawn from its distribution conditioned on x. This can be regarded as a Gibbs sampling step,
and hence from Section ?? we see that this leaves the desired distribution invariant. Noting that
x and r are independent in the distribution p(x, r), we see that the conditional distribution of r is a
Gaussian from which it is straightforward to sample.
In a practical application of this approach we have to address the problem of performing a
numerical integration of the Hamiltonian equations. This will necessarily introduce numerical
errors, and so we should devise a scheme which minimizes the impact of such errors. In fact it turns
out that integration schemes can be devised for which Liouville's theorem still holds exactly. This
property will be important in the hybrid Monte Carlo algorithm which is discussed in Section ??.
One scheme for achieving this is called the leapfrog discretization and involves alternately updating
discrete-time approximations x̂ and r̂ to the position and momentum variables using

r̂_i(τ + ε/2) = r̂_i(τ) − (ε/2) ∂E/∂x_i (x̂(τ))    (1.68)

x̂_i(τ + ε) = x̂_i(τ) + ε r̂_i(τ + ε/2)    (1.69)

r̂_i(τ + ε) = r̂_i(τ + ε/2) − (ε/2) ∂E/∂x_i (x̂(τ + ε)).    (1.70)
We see that this takes the form of a half-step update of the momentum variables with step size ε/2,
followed by a full-step update of the position variables with step size ε, followed by a second half-
step update of the momentum variables. If several leapfrog steps are applied in succession, it can
be seen that half-step updates to the momentum variables can be combined into full-step updates
with step size ε. The successive updates to position and momentum variables then leapfrog over
each other. In order to advance the dynamics by a time interval τ we need to take τ/ε steps. The
error involved in the discretized approximation to the continuous time dynamics will go to zero,
assuming a smooth function E(x), in the limit ε → 0. However, for a non-zero ε as used in practice,
some residual error will remain. We shall see in Section ?? how the effects of such errors can be
eliminated in the hybrid Monte Carlo algorithm.
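A minimal sketch of the leapfrog scheme (1.68)–(1.70), using the quadratic potential E(x) = x²/2 as an illustrative example (the step size and step count are arbitrary choices), shows both the approximate conservation of H and the exact time reversibility discussed below:

```python
def grad_E(x):
    """Gradient of E(x) = x**2 / 2, the potential of a unit Gaussian."""
    return x

def leapfrog(x, r, grad, eps, num_steps):
    """Leapfrog integration (1.68)-(1.70): half-step in r, full step in x,
    half-step in r, with the interior half-steps merged into full steps."""
    r = r - 0.5 * eps * grad(x)          # initial half-step in momentum
    for step in range(num_steps):
        x = x + eps * r                  # full step in position
        if step < num_steps - 1:
            r = r - eps * grad(x)        # two merged half-steps
    r = r - 0.5 * eps * grad(x)          # final half-step in momentum
    return x, r

def hamiltonian(x, r):
    return 0.5 * x * x + 0.5 * r * r

x0, r0 = 1.0, 0.0
x1, r1 = leapfrog(x0, r0, grad_E, eps=0.1, num_steps=100)
print(hamiltonian(x0, r0), hamiltonian(x1, r1))   # nearly equal

# Time reversibility: integrating backwards recovers the starting point.
xb, rb = leapfrog(x1, r1, grad_E, eps=-0.1, num_steps=100)
print(xb, rb)
```

The residual energy error is of order ε² for non-zero ε, while the backwards integration undoes the forward one exactly (up to floating-point round-off).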
In summary then, the Hamiltonian dynamical approach involves alternating between a series of
leapfrog updates and a re-sampling of the momentum variables from their marginal distribution.
Note that the Hamiltonian dynamics method, unlike the basic Metropolis algorithm, is able to
make use of information about the gradient of the log probability distribution as well as about the
distribution itself. An analogous situation is familiar from the domain of function optimization.
In most cases where gradient information is available it is highly advantageous to make use of it.
Informally, this follows from the fact that in a space of dimension D the additional computational
cost of evaluating a gradient, compared to evaluating the function itself, will typically be a fixed
factor independent of D, whereas the gradient vector conveys D pieces of information compared to
the one piece of information given by the function itself.
The hybrid Monte Carlo algorithm combines Hamiltonian dynamics with the Metropolis algorithm.
After each leapfrog trajectory taking the state (x, r) to a candidate state (x⋆, r⋆), the candidate is
accepted with probability

min{1, exp(H(x, r) − H(x⋆, r⋆))}.    (1.71)
If the leapfrog integration were to simulate the Hamiltonian dynamics perfectly, then every
such candidate step would automatically be accepted, since the value of H would be unchanged.
Due to numerical errors, the value of H may sometimes decrease, and we would like the Metropolis
criterion to remove any bias due to this effect and ensure that the resulting samples are indeed
drawn from the required distribution. In order for this to be the case, we need to ensure that the
update equations corresponding to the leapfrog integration satisfy detailed balance (??). This is
easily achieved by modifying the leapfrog scheme as follows. Before the start of each leapfrog
integration sequence we choose at random, with equal probability, whether to integrate forwards
in time (using step size ε) or backwards in time (using step size −ε). We first note that the leapfrog
integration scheme (1.68)–(1.70) is time-reversible, so that integration for L steps using step size −ε will
exactly undo the effect of integration for L steps using step size ε.
Next we show that the leapfrog integration preserves phase space volume exactly. This follows
from the fact that each step in the leapfrog scheme updates either the x_i variables or the r_i variables
by an amount which is a function only of the other variable. As shown in Figure 1.18, this has the
effect of shearing a region of phase space while not altering its volume.
Figure 1.18: Each step of the leapfrog algorithm (1.68)–(1.70) modifies either a position variable x_i or a
momentum variable r_i. Since the change to one variable is a function only of the other, any region
in phase space will be sheared without change of volume.
Finally, we use these results to show that detailed balance holds. Consider a small region R of
phase space which, under a sequence of L leapfrog iterations of step size ε, maps to a region R′.
Using conservation of volume under the leapfrog iteration, we see that if R has volume δV then
so too will R′. If we choose an initial point from the distribution (1.67) and then update it using L
leapfrog integrations, the probability of the transition going from R to R′ is given by

(1/Z_H) exp(−H(R)) δV (1/2) min{1, exp(H(R) − H(R′))}    (1.72)

where the factor of 1/2 arises from the random choice of direction of integration. Similarly, the
probability of starting in region R′ and integrating backwards in time to end up in region R is

(1/Z_H) exp(−H(R′)) δV (1/2) min{1, exp(H(R′) − H(R))}.    (1.73)

It is easily seen that the two probabilities (1.72) and (1.73) are equal, and hence detailed balance holds.²
It is not difficult to construct examples for which the leapfrog algorithm returns to its starting
position after a particular number of iterations. In such cases the random replacement of the mo-
mentum values before each leapfrog integration will not be sufficient to ensure ergodicity since
the position variables will never be updated. Such phenomena are easily avoided by choosing the
magnitude of the step size at random from some small interval, before each leapfrog integration.
We can gain some insight into the behaviour of the hybrid Monte Carlo algorithm by consid-
ering its application to a multivariate Gaussian. For convenience, consider a Gaussian distribution p(x)
with independent components, for which the Hamiltonian is given by

H(x, r) = (1/2) Σ_i x_i²/σ_i² + (1/2) Σ_i r_i².    (1.74)

²This proof ignores any overlap between the regions R and R′, but is easily generalized to allow for such overlap.

Our conclusions will be equally valid for a Gaussian distribution having correlated components
since the hybrid Monte Carlo algorithm exhibits rotational isotropy. During the leapfrog integra-
tion, each pair of phase space variables (x_i, r_i) evolves independently. However, the acceptance or
rejection of the candidate point is based on the value of H, which depends on the values of all
of the variables. Thus, a significant integration error in any one of the variables could lead to a
high probability of rejection. In order that the discrete leapfrog integration be a reasonably good
approximation to the true continuous-time dynamics, it is necessary for the leapfrog integration
scale ε to be smaller than the shortest length-scale over which the potential is varying significantly.
This is governed by the smallest value of σ_i, which we denote by σ_min. Recall that the goal of the
leapfrog integration in hybrid Monte Carlo is to move a substantial distance through phase space
to a new state which is relatively independent of the initial state, and still achieve a high probability
of acceptance. In order to achieve this, the leapfrog integration must be continued for a number of
iterations of order σ_max/σ_min, where σ_max denotes the largest of the σ_i.

By contrast, consider the behaviour of a simple Metropolis algorithm with an isotropic Gaussian
proposal distribution of variance s². In order to avoid high rejection rates, the value of s must be
of order σ_min. The exploration of the state space then proceeds by a random walk and takes of order
(σ_max/σ_min)² steps to arrive at a roughly independent state.
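Putting together the leapfrog integration, the momentum resampling, and the Metropolis acceptance step (1.71) gives the full algorithm. The following sketch is illustrative only (the standard Gaussian target and the settings ε = 0.2, 10 leapfrog steps, and the sample count are arbitrary choices), sampling from a univariate p(x) ∝ exp(−E(x)):

```python
import math
import random

def hmc_sample(grad_E, E, x0, eps, num_leapfrog, num_samples, seed=0):
    """Hybrid Monte Carlo for a univariate target p(x) proportional to exp(-E(x))."""
    rng = random.Random(seed)
    x = x0
    samples = []
    for _ in range(num_samples):
        r = rng.gauss(0.0, 1.0)              # resample momentum from its Gaussian
        H_old = E(x) + 0.5 * r * r
        x_new, r_new = x, r
        # Leapfrog trajectory: half-step, full steps, half-step.
        r_new -= 0.5 * eps * grad_E(x_new)
        for step in range(num_leapfrog):
            x_new += eps * r_new
            if step < num_leapfrog - 1:
                r_new -= eps * grad_E(x_new)
        r_new -= 0.5 * eps * grad_E(x_new)
        H_new = E(x_new) + 0.5 * r_new * r_new
        # Metropolis accept/reject based on the change in H, as in (1.71).
        if rng.random() < math.exp(min(0.0, H_old - H_new)):
            x = x_new
        samples.append(x)
    return samples

# Example: standard Gaussian target, E(x) = x^2 / 2.
samples = hmc_sample(grad_E=lambda x: x, E=lambda x: 0.5 * x * x,
                     x0=0.0, eps=0.2, num_leapfrog=10, num_samples=20000)
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
print(mean, var)
```

Because the leapfrog error is small for this ε, almost all candidates are accepted, and each trajectory moves a substantial distance through state space rather than taking a random-walk step.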
1.6 Estimating the Partition Function

As we have seen, most of the sampling algorithms considered in this chapter require the probability
distribution only up to a multiplicative constant. Thus if we write

p(x) = (1/Z) exp(−E(x))    (1.75)

then the value of the partition function Z is not needed in order to draw samples from p(x). How-
ever, knowledge of the value of Z can be useful for Bayesian model comparison, and so it is of
interest to consider how its value might be obtained. We assume that direct evaluation by sum-
ming, or integrating, the function exp(−E(x)) over the state space of x is intractable.
For model comparison it is actually the ratio of the partition functions for two models which is
required. Multiplication of this ratio by the ratio of prior probabilities gives the ratio of posterior
probabilities, which can then be used for model selection or model averaging.
One way to estimate a ratio of partition functions is to use importance sampling from a distri-
bution with energy function G(x):

Z_E / Z_G = Σ_x exp(−E(x)) / Σ_x exp(−G(x))
          = Σ_x exp(−E(x) + G(x)) exp(−G(x)) / Z_G
          ≃ (1/L) Σ_l exp(−E(x^(l)) + G(x^(l)))    (1.76)

where {x^(l)} are samples drawn from the distribution defined by G(x). If that distribution is one
for which the partition function Z_G can be evaluated analytically, for example a Gaussian, then the
absolute value of Z_E can be obtained.

This approach will only give accurate results if the importance-sampling distribution is closely
matched to the distribution of interest, which is difficult to achieve for the complex distributions
arising in practice. An alternative is to chain together a succession of intermediate distributions,
with energies interpolating between G(x) and E(x), and to write the required ratio as a product

Z_E / Z_G = (Z_1 / Z_G)(Z_2 / Z_1) ⋯ (Z_E / Z_{L−1})    (1.78)

in which the intermediate ratios can be determined using Monte Carlo methods as discussed above.
One way to construct such a sequence of intermediate systems is to use an energy function con-
taining a continuous parameter 0 ≤ α ≤ 1 which interpolates between the two distributions:

E_α(x) = (1 − α) G(x) + α E(x).    (1.79)
If the intermediate ratios in (1.78) are to be found using Monte Carlo, it may be more efficient to use
a single Markov chain run than to restart the Markov chain for each ratio. In this case the Markov
chain is run initially for the system with energy G(x) and then, after some suitable number of steps, moves on to
the next distribution in the sequence. Note, however, that the system must remain close to the
equilibrium distribution at each stage.
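The estimator (1.76) can be sketched as follows (an illustrative example, not from the text; the two Gaussian energy functions are chosen so that the exact ratio of partition functions is known to be 0.8):

```python
import math
import random

def partition_ratio(E, G, sample_G, num_samples, seed=0):
    """Estimate Z_E / Z_G by importance sampling from p_G, as in (1.76)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_samples):
        x = sample_G(rng)
        total += math.exp(-E(x) + G(x))
    return total / num_samples

# Example: two zero-mean Gaussian energies.  With E(x) = x^2 / (2 * 0.8^2)
# and G(x) = x^2 / 2, the partition functions are sigma * sqrt(2*pi), so
# the exact ratio Z_E / Z_G equals 0.8.
ratio = partition_ratio(E=lambda x: x * x / (2 * 0.8 ** 2),
                        G=lambda x: 0.5 * x * x,
                        sample_G=lambda rng: rng.gauss(0.0, 1.0),
                        num_samples=100000)
print(ratio)
```

This example works well because the two distributions overlap strongly; when they do not, the importance weights have high variance, which is the motivation for the chained estimator (1.78).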
In applying Markov chain Monte Carlo methods in practice, one choice that must be made is whether
to run one long chain or alternatively several short ones. With a single chain there is only one burn-in period. However,
such a chain may become stuck exploring a relatively isolated region of the distribution while
failing to discover other regions of significant probability.
An important concern in practice is the speed of convergence of a Markov chain to its equi-
librium distribution, as well as the decorrelation time between independent samples. This can
become particularly problematic in situations where the distribution has relatively isolated regions
of high probability such that transitions from one region to another are infrequent. One technique
for addressing this problem is called simulated annealing. Consider the problem of sampling from a
distribution of the form
p(x) ∝ exp(−E(x)).    (1.80)

In simulated annealing we introduce a temperature parameter T and sample instead from the
modified distribution

p_T(x) ∝ exp(−E(x)/T)    (1.81)

with T initially large, so that the chain can move freely between regions of high probability, and
with T gradually reduced to 1, at which point samples are drawn from the desired distribution.
Alternatively, only part of the energy may be annealed, using

p_T(x) ∝ exp(−E_0(x) − E_1(x)/T)    (1.82)

with initially T large, and again T is gradually reduced to 1. For example, E_0(x) may correspond to
the prior, and E_1(x) may represent the contribution from the likelihood function.
An important consideration in the use of simulated annealing is the choice of annealing sched-
ule for the reduction in T. In the simplest case this is pre-defined, for example a geometric reduc-
tion in T obtained by multiplying T by a fixed constant c, with 0 < c < 1, after each iteration, although in
more sophisticated approaches the reduction in T may be governed by the observed sequence of
sampled values.
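A geometric annealing schedule combined with a Metropolis sampler can be sketched as follows (illustrative only; the double-well energy, the initial temperature, the constant c = 0.9, and all other settings are arbitrary choices):

```python
import math
import random

def annealed_metropolis(E, x0, T0, c, steps_per_T, num_stages, step_size, seed=0):
    """Metropolis sampling of p_T(x) proportional to exp(-E(x)/T), with a
    geometric annealing schedule T -> c*T after each stage (0 < c < 1)."""
    rng = random.Random(seed)
    x, T = x0, T0
    for _ in range(num_stages):
        for _ in range(steps_per_T):
            x_new = x + rng.gauss(0.0, step_size)
            # Metropolis acceptance at the current temperature.
            if rng.random() < math.exp(min(0.0, (E(x) - E(x_new)) / T)):
                x = x_new
        T *= c
    return x, T

# Example: a double-well energy whose left well (near x = -1) is deeper.
E = lambda x: (x * x - 1.0) ** 2 + 0.3 * x
x_final, T_final = annealed_metropolis(E, x0=1.0, T0=5.0, c=0.9,
                                       steps_per_T=200, num_stages=60,
                                       step_size=0.5, seed=0)
print(x_final, T_final)
```

At high temperatures the chain crosses the barrier between the two wells freely; as T is reduced the chain settles into one of the wells, illustrating why the schedule must be slow enough for the system to remain close to equilibrium at each stage.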
Although simulated annealing does not quite sample from the correct distribution, the related
technique of simulated tempering avoids this problem. It involves the introduction of a fixed set
of temperature values, of which T = 1 is the lowest. The temperature becomes a
stochastic variable and is itself sampled as part of the Markov chain, and a modification to the
energy function encourages the system to spend roughly equal times at the different temperatures.
At higher temperature values the system can move more freely from one region of state space to
another, whereas when the system has temperature T = 1 the state space samples are drawn from
the required distribution.
Exercises
1.1 (⋆) Show that the finite sample estimator f̂ given by (??) has mean E[f] and variance (1/L) var[f], where
var[f] is defined by (??).
1.2 (⋆) Let z be a D-dimensional random variable having a Gaussian distribution with zero mean
and unit covariance matrix, and suppose that the positive definite symmetric matrix Σ has
the Cholesky decomposition Σ = LLᵀ. Show that the variable y = μ + Lz has a Gaussian
distribution with mean μ and covariance Σ. This provides a technique for generating samples
from a general multivariate Gaussian using samples from a univariate Gaussian having zero
mean and unit variance.
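The technique described in Exercise 1.2 can be illustrated numerically as follows (a sketch with arbitrary choices of μ and Σ):

```python
import numpy as np

# The technique of Exercise 1.2: y = mu + L z, where Sigma = L L^T.
rng = np.random.default_rng(0)
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])
L = np.linalg.cholesky(Sigma)            # lower-triangular, Sigma = L @ L.T
z = rng.standard_normal((100000, 2))     # unit-covariance Gaussian samples
y = mu + z @ L.T                         # transformed samples

print(y.mean(axis=0))                    # close to mu
print(np.cov(y.T))                       # close to Sigma
```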
1.3 (⋆) Suppose that z has a uniform distribution over the interval [0, 1]. Show that the variable
y = tan{π(z − 1/2)} has a Cauchy distribution given by (??).
1.4 (⋆⋆) Determine expressions for the coefficients in the envelope distribution (??) for adaptive
rejection sampling using the properties of continuity and normalization. Hence determine a
scheme for sampling from this distribution.
1.5 (⋆) Use the sum and product rules of probability to verify directly the conditional indepen-
dence properties for the graph in Figure ??.
1.6 (⋆) Show that the simple random walk over the integers defined by (??) has the property that
the expected squared distance E[(z^(τ))²] from the origin increases by a constant at each step, and
hence by induction that E[(z^(τ))²] grows linearly in τ, so that the distance traveled scales on
average as the square root of τ.
1.7 (⋆⋆) Show that the Gibbs sampling algorithm, discussed in Section ??, satisfies detailed balance
as defined by (??).
1.8 (⋆⋆) Consider the simple 3-node graph shown in Figure 1.19, in which the observed node x is
given by a Gaussian distribution N(x | μ, τ⁻¹) with mean μ and precision τ. Suppose that the
marginal distributions over the mean and the precision are given by a Gaussian distribution
N(μ | μ₀, s₀) and by Gam(τ | a, b), where Gam denotes a Gamma distribution. Write down
expressions for the conditional distributions p(μ | x, τ) and p(τ | x, μ) which would be required
in order to apply Gibbs sampling to the posterior distribution p(μ, τ | x).

Figure 1.19: A graph involving an observed Gaussian variable x with prior distributions over its
mean μ and precision τ.
1.9 (⋆) Verify that the over-relaxation update (??), in which z has mean μ and variance σ², and
where ν has zero mean and unit variance, gives a value z′ with mean μ and variance σ².
Ergodicity of Markov Chains

Consider a Markov chain with transition probabilities T(z, z′), together with a particular distribution
p⋆(z) which is an invariant distribution of T and which is such that

T(z, z′) ≥ ε p⋆(z′)    for some ε > 0 and all z, z′.    (84)

The corresponding Markov chain will be ergodic, so that for any choice of initial distribution p^(0)(z),

lim_{τ→∞} p^(τ)(z) = p⋆(z).    (85)
Proof: The proof uses induction to show that the distribution at step τ can be written as a
mixture of the invariant distribution and some other distribution. At each step the proportion
of the invariant component cannot decrease, while the condition (84) ensures that the remaining
component will generate some contribution to the invariant component.

We suppose that at step τ the following holds:

p^(τ)(z) = [1 − (1 − ε)^τ] p⋆(z) + (1 − ε)^τ q^(τ)(z)    (86)

where q^(τ)(z) is an arbitrary non-negative and normalized function, which therefore represents a valid
probability distribution. Since T and p⋆ are both normalized, we cannot have T(z, z′) > p⋆(z′) for all z′,
and so from (84) we must have 0 < ε ≤ 1; thus (86) expresses p^(τ) as a convex combination of two
distributions and hence p^(τ) is also a valid probability distribution. The result (86) clearly holds
for τ = 0, with q^(0) = p^(0). We now show that the equivalent result must hold at step τ + 1.
Using (84), define the residual kernel

R(z, z′) := [T(z, z′) − ε p⋆(z′)] / (1 − ε)

which is non-negative and normalized over z′, and hence is itself a valid transition probability. Then

p^(τ+1)(z′) = Σ_z T(z, z′) p^(τ)(z)
           = [1 − (1 − ε)^τ] p⋆(z′) + (1 − ε)^τ Σ_z T(z, z′) q^(τ)(z)
           = [1 − (1 − ε)^τ] p⋆(z′) + (1 − ε)^τ Σ_z [ε p⋆(z′) + (1 − ε) R(z, z′)] q^(τ)(z)
           = [1 − (1 − ε)^(τ+1)] p⋆(z′) + (1 − ε)^(τ+1) q^(τ+1)(z′)    (87)

where q^(τ+1)(z′) := Σ_z R(z, z′) q^(τ)(z). Clearly q^(τ+1)(z′) ≥ 0, and from the normalization of R we have
Σ_{z′} q^(τ+1)(z′) = 1, so that q^(τ+1) therefore represents a probability distribution.
This theorem easily generalizes to Markov chains over continuous state spaces.
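The geometric convergence rate (1 − ε)^τ established by this proof can be checked numerically on a small finite-state chain (an illustrative example, not from the text; the transition matrix is an arbitrary choice satisfying condition (84)):

```python
import numpy as np

# A small transition matrix T (rows: current state z, columns: next state z')
# whose rows sum to 1 and which mixes every pair of states.
T = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.6, 0.2],
              [0.3, 0.3, 0.4]])

# Invariant distribution p_star: left eigenvector of T with eigenvalue 1.
vals, vecs = np.linalg.eig(T.T)
p_star = np.real(vecs[:, np.argmax(np.real(vals))])
p_star = p_star / p_star.sum()

# Largest eps satisfying T(z, z') >= eps * p_star(z') for all z, z'.
eps = (T / p_star[None, :]).min()

# The proof's mixture representation (86) implies the L1 distance to p_star
# is bounded by 2 * (1 - eps)**tau for every tau.
p = np.array([1.0, 0.0, 0.0])            # arbitrary initial distribution
for tau in range(1, 21):
    p = p @ T
    assert np.abs(p - p_star).sum() <= 2 * (1 - eps) ** tau + 1e-12

print(p_star, eps)
```

In practice the chain often converges faster than this bound, since the bound uses only the minorization constant ε and ignores the detailed spectrum of T.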
#
@? " @? ," " " .
#
" 4
"
+
#"
,G . ?
@
, .
,00=.
4
"
,00=.
? > 0 0
/
, ,. ? > ,009.
:=
%
#
/
0
+ "
#
!
#
"
@? "
0 E #
I
"G
"
#"
4
A
"
@? U " / >
#"
4
#
? " "
;
% ? 0 #
? " "
%
!
#
, 0.
!
#
, "
0.
?
, 0. ?
: . % 7
&
# " ? / 1
&
% ? 0
2
"
%
&
2
2
$ 2
4
%C ,2
. @? "
,00<.
"
%
%C , .
4
&
%C ,2
. $ %C ,2 .
/1
? "
,00:. D
/1 #
" 9
," 9.
/1 ? ,0;>.
5 "
0
"
" L 9
;
&
) 4
/1
> 0
"
? ,0 . +
@
0
/1 @?
,0;0.
:9
%
/1
?
> 0
,
0 ,0 0. .
/ 1 ? / 1 ?
,009.
4
/1
@
5HD& @? "
/1 0 > ,0;;.
!
/1 F 6
@
0
! &. 7) %C ,2 .
)
!
F 8 ,
.
D 06@
2
%C ,2 .@
? >8
? 0 ; 7 ? ? >6 ? >0
,A
D 06
. 4
@
0 >8 >8 >8
/1 ? >8 >8
>8 >6 >0
>6 >8 >6
>8 >0 >6 >8
%
( 5HD& % F 8
( %C ,2 .
: *5
F 8
5HD&
%C ,2 .
#
!
, .
@
T,+(!,2 .. ? "
, . ? B
B +(!,2.
,0;7.
4
F 8
T ,5HD& .
%C , .
4
@
5HD& ,
'
.
K !
D ;7
/1
,0;>.
; ;
0
!
,0 . " >
/> 01 A
%C ,2
.
:<
#
"
!
T0 ,5HD& . ? %C ,2
. !
, / ;.
!
&
5HD&
%C ,2
. $ %C ,2 .
! &#
7
) F
5HD& 1 %C ,2 .
, .
, ? ;.
%C ,2 .
5
"
5HD&
? !
F
@
0
/1 ? 0 > ,0;6.
5
J
?
I
,0 0.
,0 / >. ,0 ;.
/> 01 %
#
,; . L ,; . > 3 FE5
/ 1 / ,; 0.1 &
/> 01
" ,; 0. ,0;8.
! & ;7
%C ,2 .
,0 0.
?
? ;
,> >. , >.
%C , .
%
? ;
/1
# "
!
/1
@
/1 ? ,0;:.
4 ? 0
/1
,0;>. A
/1
" L 9
;
3 &> %
) !
#
, .
"
? 0
/1
= =
@
0
/1 ?
,0;=.
/1
? > 0
&
,8 =.
/, . , .1 ? / 1 ?
%
F 8
? 0 /1
%C ,2
.
/1
@
5HD& @? "
/1 0 > ,0;9.
A
? 0
,0;9.
5HD& #
,0;;.
F 8
@
< (
% '>?
@.
6
=0
0
3 .
? 0 $
5HD&
%C ,2
. # $
2
$
T,5HD&.
%C , . #
%
-
@
* 3 &4 %
) .
> 0 0 $
5HD&
%C , .
(
#
2# &
C ? ,, , .
1 ,. @? , . &
,.
@
,. ? % % ,.
?
/
,07>.
,%
&
"
1 ,. @? , . -
C
1 ,.
"
!
, .
1 ,. C
"
-
1 ,. C
¼
1
C
1 1 ,. ? 0
@
1 ,.1 ,.
C ? C
) ? ,1 .
) 0 >
Æ
1 C
'
/1 ? )
) 0 >
/1 0 >
+
' !
#
/1
Æ
"
&
/1
? ?
!
I
0:
8 -
%C ,2 .
, . > 0
/1
6 6
0
=;
0
/1 ?
,070.
I
,6 6.
" > I
,7 6.
" >
" > $
,; 6.
" > &
/1
/1 ? 0 L
,07;.
!
/1 " >
'
0 L " > ,
0 L
/1
/1
#
. !
D =
D ;8
7 7
/1
,0 ; 7.
/1
"
%C ,2 .
0 1 2
&
&
C
5
607
%C , .
,. 4
+
'
I
0
" !
"
: <
%
$
0
,0.
#
&
0
2
,2.
#
"
; F # /1
,0.
,0.
"
,0.
A
# /1 "
/1 $ 0 /1 2
"
(
5HD&# @? T# # /1 0 >
,077.
+
5HD&#
"
5HD&
,0;9.G
5HD&
5HD&0 &
0
'
5HD&# , . @? T 5HD&#
,076.
5HD&#
"
, . 4 0 ?
5HD&, .
#
5HD&
=7
: #%
?
4
"
@
* 8 &> ) .
$
" 9
,!. &
( @? !
0 /1 $ 5
J
/8>1
/1
$ I
0:
0 /1
"
,!.
! ! '
"
I
0
+
' ,. !
'
Æ
I
0
I
0:
,
" . !
I
0=
"
" L 0
/1
" (, .
,. %
? 0
I
0=
5HD&,.
(,.
"
%C ,.
: 2
5
5
=
F+-%F, .
,<8.
5HD&, .
&
F+-%F, . ? %C , . ?" 5HD&, . ,07:.
,.
I
0 ,.
I
#
0=
@
* 9 .
$
5HD&, . % F+-%F, . #
2# !
I
0=
!
% @?
,!.
/1
0 /1 !
"
,!.
!
5HD&, .
F+-%F, .
=6
+
' ,. F /:>1
5HD&,2
.
F+-%F,2
.
? 0
2
F
/891 5#%
/981
,. %
#
I
09
F+-%F, .
#
:
0'
:
0'
:
: !
5HD&,0' .
0 /1 % ?
,:. 4
@
F+-%F, . ?
% 5HD&,0 . ' ,07=.
'
:
,
Æ
:
.
9 -
F+-%F, .
$
4
2
%C ,2 . +
? , .
? >
, . (
: -
7
'
/
;61
2
!,.
!,. ;0
,. L ;
,;
4. ,07<.
=8
,.
!
,07<.
,
2
. !
> 0
+
0 +1 0 +1
,. ,.
7
3
> <
$ & ;
@
,
@? L
2 / 1 !
& ;6 %
2
0
8
)
!,. ? 0,.
!& 0
#
1
#
2#
F ,.
,
F
. D,. #
,
> 0 . F#
, @? ,. / >
2
!,. ? ,.
,. $
,
'
#
,.
> 0 (
!
!,. ?
,.
,. ?
D,.
D,.
,.
,.
,.
D,.
D,.
: 0
D
#
C
5HD&
,0;;.
4 > 0
& +(!,2 .
$
%C ,2.
5HD& $ 5HD& ,2 . , .
# )
+
, . L 0
,. L 0 /> 6 1 L
,;
4. ,06>.
0 ; 0; ;
=:
/> 6 1 , L 0. , L 0.
#
+
' !
+(!,2 . % 5HD& ,2 .
,. ,
,.L /> 6 1.
"
#
,06>.
"
Æ
/ <;1
2 )
5+
!
,;:.
!
;,.
$ !
8
%C ,2.
"
,G ,... F
> 0
,G ,.. -
#
? L &
F = 0,. ? !,.G
,07<.
0
,. ? 0,.
,. L
,;
4.
; ; ,060.
(
,. ?
,.L
,. ?
,.L 6
,. ? 6
H
/0(; 0(;1
-
,060.
0
0
,.
,. L 6 L
,;
4.
;
0
0;
0
;
? ;
/1 L 0; ,> 6 . L ;
,;
4. ,06;.
5
/8>1 &
,06;.
,;:.
0 L 0;
,. L 0;0 /> 6 1 L ;
,;
4.
, .
0
0
0
L
,. L /> 6 1 L
,;
4.
; 0; ;
+(!,2 .
+
' P
$
,=<.
#
#
,06>.
#
#
,06>.
!
7
#
5
6;;
#
/<=1
+ ,
!
#
, .
? /,.1
,G .
,G .
==
3 4
&
&
!
"
,G . A
,G .
, .
,.
4
C
, . @?
,. ,. ,067.
, M. A
9 / > &
9 M
9 / > 5
,.
(
9 L
,. !
# )"
# *
!
,9 .
,. 5 ,9 .
9 L
,9 .(9
@
4 .
M $
+
,. ?
?
" ,9 . ,066.
!
9
$
$
#
2# %
,.
,G .
D ,.@ F 3
,.
+
3 ,.,. ,. ,.
,.,. ,.
,. ,068.
5
,068.
£
,. &
,.,. ,. ?
D ,.@ $ I
;
#
!
M
,9 .(9
,-
98; /=<1. E
Æ
, .
(
#
!
077
C
/=<1
2
$ !
;
?
, . ?
! &
2
!
? >
#
,.
,-
7;7; /=<1.
=9
+
' ,. !
6
#
"
, . @?
G
!
,.
I
= !
!
6
,.
5
AI#
!
"
/ < 6= =0 971 4
5
0>;;
!
6
, ,..
,G 9 .
,G .
2
.
9/> ,9. @? /,.1
$
,G 9 . ,G .
# %
M 4 , .
$
M
#
E/> 92 9 " 92
$
$
$
4 , . ?
£
@? , .
£
,9. ? >
$
£ #
2# &
9 / >
, . @? ,9 . &
!
6
M $ I
;
9 # L
2 , . ? /,.1 $
2
, .
4, .
,06=.
!
;68
C
/=<1
!
-
;
@
9 / >
Æ
"
, . ?
,.
"
, . ?
!
, .
, .
9 5
/=91
=<
3
&
+ *
!
6
!
5
6
%
= -
C
"
,
.
5
60; 6;; !
? L
, L 0. , L 0.
@
&, . @? ',> . ',, .. 0 ' ,.
,. @? ',. ,. ,069.
E ',. @? / 1
#
,. ? / 1
#
!
&, .
C
D 6 M ? , . >
I
:
"
? ,. & > &
F F @? , F . &
5
/<01 "
'
#
,
5
<.
!
;
(
I
:
#
" ,. 0 > $ 5
/8>1 F
,. ',.' ,. 0 > !
&, . ,.
,5HI.
4
M
5HI
', . ? /, .1 ', . , . ? ', .' , . ,06<.
', . $
!
"
"
A
$ ', .
!
6 & ;8
? 0
!
,06<.
# /0:1G 0 >
" >
0 > %
@? ,. ',.' ,. 0 > @? , . & >
, . ,.
, . ',.' ,. (
&, . ,. ? ;', . ',. L , . ,.
;', . ',. L , . ',.' ,. ,08>.
+
,08>.
',.
', . @? /, .1 ', . !
, .
,06<.
9>
7
B
% 1 <
1
>
1 >
9
B
#
E
#
5
0>77
=
C
5
607
"
"
!
,79.
4
%C , .
"
5 %C , .
,066.
,FI. %
!
6 & ;:
"
D
? ,.
!
"
%C , .
%C , . %
/<1
%C , .
& ;:,.
%I
£
,.
#
%I
FI
%C , .
4
%
5
607
%C , .
!
D 9
#
? , ,.. F , . , .
,
,6>..
5
90
%C , .
£
%C , .
,. ,.
9
B
+E9
!
+E9
%
,
£
,
%
(
@
, . @?
, . , . @?
, .
D 9
%C ,.
"
@
F+-%F,. @? " > , . ? 0
, . ? , .
% !
6
#
@
?
, . , . L
, . , . ,080.
F
F+-%F,.
(, L . ? (,.
I
FI
,080.
/<1
+
,080. #
5
;80 C #
)
*
, .
!
@
, . ?
, . L , .
, . ,08;.
A
,08;.
#
,96.
I
00
"
9;
!
#
!
;
,6<.
F+-%F,.
,
5
6;7. %
! ; 9 L ! 6
,.
,G.
#
9 L #
;
(
7 (
%
&
I
00
#
F
!
#
& ;= A
!
6
#
,080.
)"
# *
#
,6<.
,.
!
6 !
,G. & ;=
+
#
, . @? , .
, .
$
&
H # &
( +
H @? " >
, . ? 0 , . ? 0
,087.
/
&
+,G 5. @? L
5 , . , . L 5 , . , .
,086.
.
,08;. $
5 @?
* /,5. #
2#4
@
4
97
'
@
, . , . L
, . , . L , .
4
@
/,5. @?
) 3
, . , . L
, . , . , .
,088.
5
@
, . ? , . L 5 , .
,08:.
, . ?
, . L , . L , . L
5 , . L
5 , . ,08:.
!
H
,088.
/,5. ?
, . L
/ , . , .1
%
#
F
5 ?
4
#
,08;.
5 @?
, . ? , .
/ >
4
5
(
#
, . , .
, . , .
!
@
/ , . , .1 ?
/ , .1
, . ? , . , .
-
5 @
/,5 . ? , . L
, . , .
?
, . L
, . L , .
,08=.
,.
,08:.
A
, . @?
/ 1 , . ?
/ 1
/1
£ £ £
/ 1
?
G
£
, . , . L
, . , . L , . ? , . L
, . L , .
/,5 . !
/<1
, 5 .
#
+
' %
I
0<
96
3 5
4
!
6 !
"
"
%C , .
= +
?
,.
!
6
#
"
#
!
;,.
"
#
%I
* =
, . @? )
,.
,089.
+(!, .
%C , . $
# %
(
$
+
?
, .
, . @?
)
,08<.
2#
? G
+(!, .
? -
@
, . @?
,99 .
, . @?
,99 .
$ !
6
, . ?
5
,089.
'
#
,. @? L
( +(!, . 4
,2.
+(!, . %
'
#
G
!
077
C
/=<1
G
, . ? ) !
, .
, .
+(!, .
%C , .
+
' +
)
+(!, .
%C , . A
#
"
#
I
;>
&
$
#
I
;>
4
98
= 0
4
,FI.
%C , .
#
&
"
FI
G
/6= =01
!
Æ
FI
,79.
5
607 !
F+-%F, .
D 9
%C , . C
D 06
%C , .
#
!
F+-%F, .
@
,. ? )
,0:>.
5
F+-%F, . , %C , ..
CE5
,0:>.
-
/<1
,
.
F+-%F, .
%C , . %C , .
4
F+-%F, .
"
!
"
F+-%F, .@
* " %C , .
F+-%F, . #
%
$
$
F+-%F, .
%C , . #
2#
, ? > 0 0.
C&
F+-%F, .
? L !
F+-%F, .
" >
"
"
"
,
,60.. $ I
=
%C , .
1
C !
/1 1 , . ?
, . /1 1 , . ? , . , . !
1
F+-%F, .
9:
Æ
/<1
F+-%F, . 1
&
C , . ? >
% #, , . , . ? >
,% -. , #, . , #, . %
? , 0. L , 0. %
"
"
1
,
. E
"
? 0
, .
%
L
F+-%F, .
1
A
> 0
!
F+-%F, .
/=<1
%C , .
%C , . %
G
C
%C , . I
=
!
FI
,0:>.
F+-%F, .
@
,.
%C , .
#
,0:>.
,.
F+-%F, .
%C , .
,0:>.
& ;9
!
,.
%C , . !
F+-%F, .
F+-%F, .
: E
> 0
2 -
& ;<,.
?
2
9 >
? > 9
1 3
,. ,.
1
<2>
%
>
3
>
4 9 # >
?
9=
9 / >
?
4 9 / >
, .
,0:>.
9 # >
!
,0:>.
&
? > >
? 0 ; 7
9 ? 9 # >
, .
& ;<,. +
> 0
?
, .
!
%C , .
#"
9
, . ? # >
, .
9 # >
+
F+-%F, .
@
>
@? >8 >8
@? >8 >8
>
, .
" /> 01 D
"
" ? " ? " 0 " ? " " ? >8
!
+
' ,. ! #
,0:>.
#
F+-%F , .
5
= !
FI
> 0
/06 981
,. & /79 7=1 #
,0:>.
#
,FHI-.
!
FI#
#
, #
.
#
,.
I
00
I
0<
/<01
"
'
5
#
"
/ 6= 6: :71
/ 8< 89 =61
&
#
D 7
@
L
,0:0.
4
2
? >
, . ( $ !
6
,0:0.
%C ,2 . 5
Æ
"
#
5HD&
,0;;. H
@
0 "#$
,0:;.
CE5
,5HI. C
5
0>;0
&, .
,.
#
,5HI.
@
? 0 &, . ,. ,0:7.
"#$ ;
!
#
, .
,
'
? 0
.
4 5HI
,0:;.
%V#-(!
4
/6:1
%
5HI
, .
5
/ 881
+ #
"
'>A#2B
*
"C" . /
/
9<
-
//
11
<>
, .
* )
$
M# %
,.*
+
$ , . L
M
#
2# !
G
$
/091
F
#
&
J
/9>1
2
'
#
/9>1
!
"
M $
M @? " L ,0 .
M
/> 0. ,
! :0 /=<1. (
#
2
M
,-
0.
# 0
,
. " , . L , .
C# -#5"
, . ,
.
, . A
0 FE5
#
-
CE5
G
, . L
2
)
$
#
2# &
,;>. E
, .
M
,! 670 /6<1.
$
3
$ ,. ? 3
#
#
!
/$ ,.1 ? $ , .$ / >
$ MG
E
M
/6<1
* )
N
#
2#
?
,G . ? ,G . &
N, . ? N, .
N
#
#
-
-
0 &
, . , . / >
?
N
#
#
4 ! )
N
,
- N,M. ?
,# #$ -#
+
' !
Æ
%
#
#
5 % $
2 )
6+ 4
$
<0
-
,
. +
#
/
6< =<1@ ,.
# /, .1 ? , .G ,.
> , . 78
#"
*
* / >
N,M. % @ $
Æ
> N,M. F
M
N,
. ? > 4
#"
*
* / >
> , .
,.
&
*
M
Æ / >
,
L Æ*. M
(
2
M N,
. ? >
,
L Æ*. / ,
. L N,
. Æ* ? ,
. 5 Æ @? N,
L Æ*.
,
. / ,
L Æ*. L Æ Æ* !
'
Æ Æ * / ,
L Æ*. ,
. / >
5 Æ N,M. % *
> , .
% N,M.@ %
>
!
M
N, . ? , . ? > $
, .
!
, . ? L
L
&
#"
* E / >
04
2 @?
* ,. " E 5 >
#
Æ
E / > +
* ,. >
#
* >
, . $
> ( , . ? , .
&
M
,
L *. "
L * ,. ,. " E L
,. ,.
#
& #
'( )
, .
A
,
. /
,. / >
,04
2. / >
E
,
L *. ? L
*
D
@ &
Æ
F N N
$
'
N
$
4
N
4 3
,
- .
$ ,.
N ,.
# )
.
&
+
0,,G ,...
,. ? ,0:8.
L (
<;
.
@? # $
,. ?
/0,,G ,..1 $
#
,- %
$
+
, . ?
,. ,0::.
-
,. ( @ F
? ,. # L
2
4
.
&
I
;
#
&
!
8 ,M. ? $
-
;:60
C
/=<1
% %
5
? ?
!
2
,. ? L
(
-
,. # @ 5
'
#
!
,.
#
,. &
I
;
#
, . ?
, . ?
! %&' ,. I ,.
?
£
,;:.
4
* .
M ,7;.
$
+
,
-
#
,-
M?
&
(
#
<7
2# ,. !
$
3
$ ,. ? 3
# $
$ ? 3
#
,. $ !
6
5
0>
!
, .
!
#
F
" , .
,I
7;=G /6<1.
?
,. & > ,0:=.
2# $ 5
/8>1 ,L0. ,L0. ,.
,. ',.' ,.
$
%
-
,.
',.
,. ',.' ,.
* 8 %
$
$
?
$ 3 , ' ,0:9.
' # $
@? ,. #
2# !
,77.
,.
C
,.
!
#4
/=< 6<1
,7:. $
? ,.,.
? ,. , . / >
E
* 9 )
# $
+
,
- 1
$
,. ? N ,.#
,- #
,- .
$
5
I
;
<6
,Æ
/ 0
%
!
S
#
,
.G
5 /991
! I,: !.
(
@
I,: !. ? 0 : ! ,0:<.
>
! S
=
"
=,: :. ? 0
: + =,: ;.
;
: ; # ! @
=,: !. ?
=,: ;. ,0=>.
- '- 5(
Æ,: !.
0
9 &"A
7 (
) & W,!.
!
X
# '
+
,,!. @? : : !
!#
)
=
7
#
<8
Probabilistic Graphical Models
Michael I. Jordan
Christopher M. Bishop
Decision Graphs
In earlier chapters of this book we have developed extensive graphical machinery for solving the
inference problem, that is the task of evaluating conditional probabilities given evidence in the
form of observed values for random variables. For practical applications, however, the role of the
probabilities is an intermediate one in the task of making decisions, or equivalently taking actions.
In order to make decisions which are optimal according to some criterion, we combine the inferred
conditional probabilities with a utility (or cost) function. In the simplest case we have a single
decision to take, and the optimal decision is generally defined to be that which maximizes the
expected utility.
More complex decision scenarios involve multiple, sequential stages of decision making, in
which new information is revealed after each decision is made. In this chapter we review the
basic concepts of decision theory and then we demonstrate the role which graphical models can
play in arriving at optimal decisions for complex problems involving multiple decision stages. In
particular, we shall formulate the multi-stage decision problem in terms of directed acyclic graphs
having both decision nodes and utility nodes in addition to the usual stochastic nodes describing
random variables. We shall then develop an exact and efficient algorithm for finding the optimal
set of decisions together with the corresponding maximum utility. Our approach is based on an
extension of the elimination algorithm introduced in Chapter ??. We shall restrict attention in this
chapter to problems in which the random variables as well as the decision variables are all discrete.
1
2 CHAPTER 1. DECISION GRAPHS
(1.1)
Figure 1.1: An example of a utility matrix for the cancer treatment problem. The first and
second rows correspond to the states of the random variable ‘normal’ and ‘cancer’, while
first and second columns correspond to the states of the decision variable ‘no treatment’ and
‘treatment’.
allows us to choose the decision variable using the criterion of maximum expected utility. While
such a criterion is intuitively appealing, it can also be justified more formally from a frequentist
perspective as being the criterion which yields the greatest reward in a long run series of deci-
sions (for example in a gambling scenario). An analogous argument can be made from a Bayesian
perspective, in which case it applies also to the situation where there is only a single trial.
In order to find the expected utility for the cancer example, however, we need to know the proba-
bility that the patient has cancer. For example, suppose that
‘normal’ (1.2)
‘cancer’ (1.3)
The expected utilities associated with providing treatment or not providing treatment are then
easily evaluated to give
where the average is taken over the random variable x. Thus the expected utility is maximized
in this example by providing the cancer treatment, and the value of that expected utility is
.
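The maximum expected utility criterion can be sketched in a few lines of code. The probabilities and utility values below are hypothetical stand-ins (the actual entries of the utility matrix in (1.1) are not reproduced in this text):

```python
# A minimal sketch of the maximum expected utility criterion for a
# single decision.  The probabilities and utilities are hypothetical
# stand-ins for the entries of the utility matrix (1.1).

p = {"normal": 0.9, "cancer": 0.1}          # p(x)

U = {                                        # U(x, d)
    ("normal", "no treatment"): 100.0,
    ("normal", "treatment"):     60.0,
    ("cancer", "no treatment"): -500.0,
    ("cancer", "treatment"):     50.0,
}

def expected_utility(d):
    """E_x[U(x, d)] = sum_x p(x) U(x, d)."""
    return sum(p[x] * U[(x, d)] for x in p)

decisions = ["no treatment", "treatment"]
best = max(decisions, key=expected_utility)
print({d: expected_utility(d) for d in decisions}, "->", best)
```

With these invented numbers the expected utility of treating exceeds that of not treating, matching the conclusion reached in the text.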
We can represent the random, decision and utility variables in this problem as a simple directed
graph, as shown in Figure 1.2. The semantics of such graphs will be discussed more fully in Section 1.2.1.

Figure 1.2: A simple decision graph corresponding to the cancer treatment problem described in
the text. The circular node X represents a random variable describing whether or not the patient
has cancer, the square node d represents the decision variable describing whether or not the patient
receives treatment, and the diamond node U represents the utility function. Arrows from the
random and decision variables into the utility node indicate that the utility function depends on
both X and d.

1.1. DECISION THEORY 3

We shall subsequently use such decision graphs as the basis for a graphical solution to
the decision problem obtained by extending the elimination algorithm of Chapter ??.
Note that without loss of generality we can replace utility functions with cost (or loss) func-
tions, in which the goal is to make decisions which minimize the expected cost. A cost function
can always be defined as the negative of a utility, and so from now on we work exclusively with
utilities.
Figure 1.3: Decision graph corresponding to the extended cancer treatment example discussed in
the text. The decision nodes are d_1 (whether to test, with utility U_1 of testing) and d_2 (whether
to treat, with utility U_2 of treatment); the random nodes are X_1 (symptoms), X_2 (result of test)
and X_3 (cancer).
Without loss of generality we can exploit the ordering of the random variables to specify their
joint distribution as a factorization in terms of conditional distributions of the form
p(x_{k+1} | x_1, . . . , x_k, d_1, . . . , d_k)    (1.6)

where k = 0, . . . , K.
However, we must also impose a constraint to reflect the causality requirement that the value
of a decision variable cannot influence any of the random variables which appear earlier in the
decision sequence. This is equivalent to a set of conditional independence statements of the form
p(x_{k+1} | x_1, . . . , x_k, d_1, . . . , d_K) = p(x_{k+1} | x_1, . . . , x_k, d_1, . . . , d_k)    (1.7)
Any consistent model of a multi-stage decision problem must respect these constraints.
Thus the probabilistic model is specified by defining the following set of conditional probabili-
ties
p(x_1)    (1.8)
p(x_2 | x_1, d_1)    (1.9)
. . .
p(x_{K+1} | x_1, . . . , x_K, d_1, . . . , d_K)    (1.10)
Although these represent the most general situation, in any particular model there may be variables
missing from the conditioning sets in (1.8) – (1.10) corresponding to additional conditional inde-
pendence properties. When we develop the extended elimination algorithm for decision graphs
in Section 1.2 we will seek to exploit these conditional independences in order to obtain efficient
computation.
The decision problem specification is completed by defining the utility function, which in gen-
eral can depend on all of the random and decision variables
U(x_1, . . . , x_{K+1}, d_1, . . . , d_K)    (1.11)
This utility function often takes the form of the sum of a number of terms, each of which depends
only on a subset of the variables. Again, we seek an algorithm which can exploit this additive
structure to achieve efficient computation.
The solution to the decision problem can be expressed as a set of decision mapping tables of the
form

d_1(x_1)    (1.12)
d_2(x_1, x_2, d_1)    (1.13)
. . .
d_K(x_1, . . . , x_K, d_1, . . . , d_{K-1})    (1.14)
Given this set of tables any particular decision can be taken optimally since the required values of
previous decision variables and random variables will be known at the time the decision must be
taken.
The overall optimal decision is one in which the complete set of decision variables is chosen
jointly to maximize the expected utility. To do this the first decision is made optimally using (1.12),
and then this optimal decision d̂_1 is used to find the optimal value for the second decision variable
using (1.13) and so on. Thus the overall optimum can be found from the decision mapping by
substitution to give
d̂_1 = d_1(x_1)    (1.15)
d̂_2 = d_2(x_1, x_2, d̂_1)    (1.16)
. . .
d̂_K = d_K(x_1, . . . , x_K, d̂_1, . . . , d̂_{K-1})    (1.17)

Thus to find the optimum we must in general consider all possible combinations
of settings of all of the decision variables, and indeed this turns out to be an NP-hard problem.
We can evaluate the decision mapping tables by a backwards induction process starting with
the final decision. Thus we can first determine d_K as a function of x_1, . . . , x_K, d_1, . . . , d_{K-1}
by first marginalizing the utility function over x_{K+1} to define an expected utility function given by
Ũ(x_1, . . . , x_K, d_1, . . . , d_K) = Σ_{x_{K+1}} p(x_{K+1} | x_1, . . . , x_K, d_1, . . . , d_K) U(x_1, . . . , x_{K+1}, d_1, . . . , d_K)    (1.18)

The optimal final decision is then obtained as

d_K(x_1, . . . , x_K, d_1, . . . , d_{K-1}) = arg max_{d_K} Ũ(x_1, . . . , x_K, d_1, . . . , d_K)    (1.19)

The maximum itself defines a new expected utility function

Ũ′(x_1, . . . , x_K, d_1, . . . , d_{K-1}) = max_{d_K} Ũ(x_1, . . . , x_K, d_1, . . . , d_K)    (1.20)

which no longer involves x_{K+1} or d_K
and corresponds to a multi-stage decision problem over a reduced set of variables. Thus the above
procedure can be applied recursively until the complete set of decision mapping tables has been
evaluated.
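This backwards induction can be sketched as a brute-force recursion. The helper names (`p_next`, `U`, `value`) and the toy model at the bottom are illustrative assumptions, not the book's notation; the recursion assumes the variables alternate x_1, d_1, x_2, d_2, ..., x_{K+1}:

```python
# A brute-force sketch of backwards induction for the alternating
# sequence x_1, d_1, x_2, d_2, ..., x_{K+1}.  The helper names
# (p_next, U, value) are illustrative, not the book's notation.

def value(history, stage, K, decisions, p_next, U):
    """Maximum expected utility attainable from `history`, where
    history = (x_1, d_1, ..., x_stage, d_stage)."""
    total = 0.0
    for x, px in p_next(history).items():   # marginalize over x_{stage+1}
        h = history + (x,)
        if stage == K:                      # x_{K+1} is the last variable
            total += px * U(h)
        else:                               # choose d_{stage+1} optimally
            total += px * max(
                value(h + (d,), stage + 1, K, decisions, p_next, U)
                for d in decisions)
    return total

# Toy model with K = 1: x_1 is a noisy copy of the later x_2, and the
# utility rewards a decision d_1 that matches x_2.
def p_next(history):
    if not history:                         # distribution of x_1
        return {0: 0.5, 1: 0.5}
    x1, d1 = history                        # distribution of x_2 given (x_1, d_1)
    return {x1: 0.8, 1 - x1: 0.2}

U = lambda h: 1.0 if h[1] == h[2] else 0.0  # U(x1, d1, x2)

print(value((), 0, 1, [0, 1], p_next, U))
```

Because the recursion enumerates every combination of variable settings, its cost grows exponentially with K; the elimination algorithm developed later in the chapter avoids this.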
The full solution to the decision problem to find the jointly optimal set of decisions is therefore
given by
Σ_{x_1} max_{d_1} Σ_{x_2} max_{d_2} · · · max_{d_K} Σ_{x_{K+1}} p(x_1, . . . , x_{K+1} | d_1, . . . , d_K) U(x_1, . . . , x_{K+1}, d_1, . . . , d_K)    (1.21)

where the joint distribution factorizes as

p(x_1, . . . , x_{K+1} | d_1, . . . , d_K) = p(x_1) p(x_2 | x_1, d_1) · · · p(x_{K+1} | x_1, . . . , x_K, d_1, . . . , d_K)    (1.22)
Note that the quantity p(x | d) can only be interpreted as the conditional distribution of x con-
ditioned on d if the values of d are set externally. If the values of d are chosen to maximize expected
utility then they will depend on the values of x and so in this case p(x | d) can no longer be viewed
as the conditional distribution.
Our goal is to evaluate the successive summations and maximizations given by (1.21). The val-
ues which are assigned to the decision variables will then yield the decision mapping correspond-
ing to (1.12) – (1.14), and the final value of the quantity in (1.21) gives the corresponding maximum
value of the expected utility. Our approach will generalize the elimination algorithm of Chapter ??
and will be based on the idea of commuting the summation and maximization operations through
the various factors in (1.22)
in order to achieve efficient computations which act on small
groups of variables at a time. Before developing this framework, however, we consider a brute
force approach, called decision trees, which emphasises the nature of the computations which must
be performed.
Although decision trees exhaustively enumerate the problem, they serve a useful role in emphasising the underlying computations which must be performed.
Consider again the extended cancer treatment problem discussed in Section 1.1.2. The decision
tree is simply a graphical representation of all possible combinations of choices for the decision
variables and the random variables. In the cancer problem the first variable to be revealed is the
random variable x_1 corresponding to the presence or absence of symptoms. Subsequently a deci-
sion d_1 is taken as to whether a test should be conducted. As a consequence the result x_2 of the
test is revealed and then a decision d_2 can be made as to whether to provide treatment. The final
random variable x_3 describes whether the patient has cancer or not and is only known after both
possible paths through the states of these variables. Two such paths are shown in Figure 1.4. Note
[Decision tree levels, from root to leaves: Symptoms x_1; Test d_1; Result of test x_2; Treatment d_2; Cancer x_3; Utilities U_A, U_B, . . .]
Figure 1.4: Decision tree for the extended cancer treatment example discussed in Section 1.1.2. For
clarity only two of the possible paths from the root node to the utilities are shown.
that if decision d_1 is set so that no test is conducted then there will be no test result to observe. This
asymmetry, which is exploited efficiently in decision trees, is discussed further in Section 1.2.5.
By following each path in the decision tree from the root to a leaf we can evaluate the corre-
sponding utility at the leaf. It is then a simple matter to find, for any given decision node in the
tree, the decision which will give rise to the maximum expected utility. Consider for instance the
decision node d_2 in Figure 1.4 corresponding to the decision on whether to administer treatment
in the case of a positive test result. The expected utility for the choice ‘yes’ for d_2 is obtained by
averaging the two utilities U_A and U_B, weighted by the appropriate conditional probabilities for
the states of x_3. Thus the process of solving the decision problem using the decision tree frame-
work involves a backwards recursion starting with the final state utilities. So in Figure 1.4, we first
evaluate the marginal utilities at each of the ‘x_3’ nodes by computing sums of utilities weighted
by their corresponding probabilities (conditioned on the previous variables in the path). Next, for
each decision node ‘d_2’ we choose the maximum expected utility. These expected utilities are then
fed back to the x_2 nodes where again they are summed with weights given by the corresponding
conditional probabilities, to give an expected utility at each x_2 node. The optimal decision for the
d_1 node is found by choosing the maximum of the expected utilities at the x_2 nodes. Finally, the
value of the maximum expected utility is obtained by summing over the x_1 variable weighted by
its marginal probability.
Note that the conditional probabilities required to evaluate this decision tree are given by
p(x_3 | x_1, x_2, d_1, d_2),    p(x_2 | x_1, d_1),    p(x_1)
whereas we see from Figure 1.3 that the original problem is formulated most naturally using con-
ditional probabilities in the form
p(x_3),    p(x_1 | x_3),    p(x_2 | x_3, d_1)
In order to apply the decision tree technique to this problem we must therefore perform some initial
pre-processing using the sum and product rules of probability in order to recast the conditional
probabilities in the correct form. The requirement for such pre-processing is a general feature of
decision trees. As we shall see, the required manipulations are performed automatically when we
make use of the elimination algorithm, and so the need to pre-process the conditional probabilities
is avoided.
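This pre-processing can be sketched directly with the sum and product rules. The numerical values below are invented; the variable naming follows the text (x_3 cancer, x_1 symptoms, x_2 test result, d_1 test decision), and for simplicity the test result is fixed to a dummy value when no test is performed:

```python
# Recasting a model given as p(x3), p(x1|x3), p(x2|x3,d1) into the
# form a decision tree needs: p(x1), p(x2|x1,d1), p(x3|x1,x2,d1).
# All numerical values are invented for illustration.

X, D = (0, 1), (0, 1)
p3 = {0: 0.99, 1: 0.01}                        # p(x3): cancer prior
p1_g3 = {0: {0: 0.9, 1: 0.1},                  # p(x1 | x3): symptoms
         1: {0: 0.2, 1: 0.8}}
p2_g3d1 = {(0, 0): {0: 1.0, 1: 0.0},           # d1 = 0: no test, x2 = 0
           (1, 0): {0: 1.0, 1: 0.0},
           (0, 1): {0: 0.95, 1: 0.05},         # d1 = 1: test performed
           (1, 1): {0: 0.10, 1: 0.90}}

# sum rule: p(x1) = sum_x3 p(x3) p(x1|x3)
p1 = {x1: sum(p3[x3] * p1_g3[x3][x1] for x3 in X) for x1 in X}

# product rule (Bayes): p(x3|x1) = p(x3) p(x1|x3) / p(x1)
p3_g1 = {(x3, x1): p3[x3] * p1_g3[x3][x1] / p1[x1] for x3 in X for x1 in X}

# p(x2|x1,d1) = sum_x3 p(x3|x1) p(x2|x3,d1)
p2_g1d1 = {(x2, x1, d1):
               sum(p3_g1[(x3, x1)] * p2_g3d1[(x3, d1)][x2] for x3 in X)
           for x2 in X for x1 in X for d1 in D}

# p(x3|x1,x2,d1) = p(x3|x1) p(x2|x3,d1) / p(x2|x1,d1), defined
# only where the test outcome has non-zero probability
p3_g12d1 = {(x3, x1, x2, d1):
                p3_g1[(x3, x1)] * p2_g3d1[(x3, d1)][x2]
                / p2_g1d1[(x2, x1, d1)]
            for x3 in X for x1 in X for x2 in X for d1 in D
            if p2_g1d1[(x2, x1, d1)] > 0}

print(p1)
```

Each derived table is a proper (conditional) distribution, which can be checked by summing over its first argument.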
The principal disadvantage of decision trees, however, is that they exhaustively enumerate
every possible path and so fail to take advantage of conditional independences which might exist
in the model. This is resolved by moving to an elimination tree approach, discussed in Section 1.2.
performed over sets of low cardinality. The utility function itself may also comprise the sum of
several terms each of which only depends on a subset of the random and decision variables, and
again we wish to exploit such structure to achieve computational efficiency.
In the case of purely probabilistic models we were also able to choose an efficient order in which
to marginalize, and therefore eliminate, the variables. However, a key point to emphasise in the
case of decision problems is that the elimination ordering is restricted because it is not possible to
permute a summation with a maximization. To see this consider the cancer example described in
Section 1.1.1 whose utility function is shown in Figure 1.1. The value of the maximum expected
utility was obtained by first averaging over the two states of the random variable x for each possible
value of the decision variable d (in order to obtain the expected utility as a function of d), and then
subsequently choosing the value of d for which the expected utility was greater. This gave an
expected utility of
. Now suppose that instead we perform the maximization first, and then
subsequently marginalize over x. For x = ‘normal’ the maximum value of the utility
occurs for d = ‘no treatment’ and has the value , while for x = ‘cancer’ the maximum value
of the utility occurs for d = ‘treatment’ and has the value . Thus the expected value of this
maximum utility is given by
Thus the value of the maximum expected utility is different in the two cases, and also in the second
case the optimal decision on whether to provide treatment is dependent on the state of the variable
. Making the decision before averaging over x corresponds to the case where the state of health
of the patient is known before the decision to treat is taken. This of course leads to a higher expected
utility since the decision is better informed.
Thus we see that summations and maximizations cannot be commuted. We are still free to
choose the elimination ordering of the variables within each group of random variables (and
similarly within each group of decision variables, if these comprise more than one decision variable) but we cannot
interchange the groups.
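The non-interchangeability is easy to verify numerically. The probabilities and utilities below are hypothetical, in the spirit of the cancer example:

```python
# Numerical check that summation over a random variable and
# maximization over a decision cannot be interchanged.  The
# probabilities and utilities are hypothetical.

p = {"normal": 0.9, "cancer": 0.1}
U = {("normal", "no treatment"): 100.0, ("normal", "treatment"): 60.0,
     ("cancer", "no treatment"): -500.0, ("cancer", "treatment"): 50.0}
decisions = ["no treatment", "treatment"]

# max_d sum_x p(x) U(x, d): the decision is taken before x is seen.
max_of_sum = max(sum(p[x] * U[(x, d)] for x in p) for d in decisions)

# sum_x p(x) max_d U(x, d): the decision is taken after x is seen,
# so it is better informed and the value can only be larger.
sum_of_max = sum(p[x] * max(U[(x, d)] for d in decisions) for x in p)

print(max_of_sum, sum_of_max)
```

The second quantity is never smaller than the first, reflecting the extra information available when the decision is taken after observing x.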
dents of that decision node must appear later in the decision sequence than that decision variable.
To see that this implies the conditions (1.7) consider the joint distribution formed by taking
the product of the individual conditionals given by (1.6). For any given decision variable d_k we
can marginalize the joint distribution with respect to the descendents of d_k. Now from Chapter ??
we know that if we marginalize over a random node that has no descendents, the joint distribution
of the remaining variables factorizes with respect to the graph obtained by removing that node.
We can therefore work backwards from the leaf nodes and remove nodes one at a time from the
graph. Once we have removed all of the descendents of d_k, the node corresponding to d_k will
be isolated and so the joint distribution of the remaining variables will be independent of d_k. It is
easy to see that this implies that all of the factors which comprise that joint distribution will also be
independent of d_k, and so we have shown that the conditions (1.7) must hold.
Note that there is a closely related graphical representation of multi-stage decision problems
called influence diagrams. They differ from the directed decision graphs discussed here through the
addition of extra edges, called information arcs, leading into decision nodes which reflect the causal
consistency constraint. Specifically, for any decision node, the ancestors of the node in the directed
graph correspond to the variables (decision and random) whose values will be known at the time
the decision is taken. The causality requirement then translates into a requirement that the directed
graph be acyclic. Here we consider graphs without information arcs, but we must then maintain
separately a record of the partial ordering of the variables in the problem.
X1 X2 X3 XL
U1 U2 U3 UL
Figure 1.5: A graph having only random and utility nodes, for which the goal is to evaluate the
expected utility. This example motivates the use of potentials containing pairs of functions repre-
senting the probability and utility contributions separately.
evaluate the total expected utility. This requires that we evaluate the following expression, involving
summation over the random variables:

\langle U \rangle = \sum_{x_1} \sum_{x_2} \cdots \sum_{x_L} p(x_1)\, p(x_2 \mid x_1) \cdots p(x_L \mid x_{L-1}) \bigl[ U_1(x_1) + U_2(x_2) + \cdots + U_L(x_L) \bigr]   (1.24)
1.2. ELIMINATION ALGORITHM FOR DECISION GRAPHS 11
Clearly a very bad way to solve this problem would be to evaluate the complete function
for all combinations of variable values and then perform the successive summations directly, since
the cost of doing so grows exponentially with the length of the chain. An alternative would be to
evaluate the expectation separately for each term of the utility function, and then add the results
together. This again is inefficient, since many of the summation operations would be common to
the sub-problems and hence would be needlessly duplicated.
Now we exploit the additive structure of the utility function. Consider first the summation over
x_1, and define the following quantities

\phi^{\setminus 1}(x_2) = \sum_{x_1} p(x_1)\, p(x_2 \mid x_1)   (1.27)

\psi^{\setminus 1}(x_2) = \sum_{x_1} p(x_1)\, p(x_2 \mid x_1)\, U_1(x_1)   (1.28)

where the superscript denotes that a variable has been marginalized out. We shall interpret these
quantities shortly as messages passed between cliques of the elimination tree. Using (1.27) and
(1.28) we can write (1.26) in the form

\langle U \rangle = \sum_{x_2} \cdots \sum_{x_L} p(x_3 \mid x_2) \cdots p(x_L \mid x_{L-1}) \Bigl\{ \psi^{\setminus 1}(x_2) + \phi^{\setminus 1}(x_2) \bigl[ U_2(x_2) + \cdots + U_L(x_L) \bigr] \Bigr\}   (1.29)
We have therefore succeeded in eliminating the variable x_1. Note that this operation required the
evaluation of two different summations over x_1, given by (1.27) and (1.28). These quantities must
be computed for all values of the variable x_2. However, because of the conditional independence
property of x_3 and x_1 given x_2, corresponding to the absence of an edge from x_1 to x_3 in the graph in
Figure 1.5, we do not need to involve the variable x_3 (or indeed any of the remaining variables) at
this stage, so the summations involve storage and computation only over the variables x_1 and x_2.
For reasons which we shall discuss shortly, it is convenient to work with normalized expected
utilities², defined by

\tilde{\psi}^{\setminus 1}(x_2) = \frac{\psi^{\setminus 1}(x_2)}{\phi^{\setminus 1}(x_2)}   (1.30)

This allows us to pull out a common factor of \phi^{\setminus 1}(x_2) from the braces in (1.29) to give

\langle U \rangle = \sum_{x_2} \cdots \sum_{x_L} \phi^{\setminus 1}(x_2)\, p(x_3 \mid x_2) \cdots p(x_L \mid x_{L-1}) \bigl[ \tilde{\psi}^{\setminus 1}(x_2) + U_2(x_2) + \cdots + U_L(x_L) \bigr]   (1.31)

²Since this problem involves only random variables and no decision variables, we can consider any elimination order.
The key point to emphasize here is that we have to maintain separately two quantities resulting
from summations over x_1. The first of these, \phi^{\setminus 1}(x_2), involves only conditional probabilities, while
the second, \tilde{\psi}^{\setminus 1}(x_2), takes the form of an expected utility.
When we come to eliminate variable x_2 we can work in the joint space of (x_2, x_3). From (1.31) we
see that this requires us to have access to the quantities \phi^{\setminus 1}(x_2) and \tilde{\psi}^{\setminus 1}(x_2) separately, otherwise
we would be forced to work in the larger joint space of (x_1, x_2, x_3).
We can continue eliminating the variables one at a time, working our way along the chain and
exploiting its conditional independence properties. The elimination of the i-th variable involves a
summation of the form

\sum_{x_i} \phi^{\setminus i-1}(x_i)\, p(x_{i+1} \mid x_i)\, \bigl[ \tilde{\psi}^{\setminus i-1}(x_i) + U_i(x_i) + \cdots \bigr]   (1.32)

We now define the quantities

\phi_i(x_i, x_{i+1}) = \phi^{\setminus i-1}(x_i)\, p(x_{i+1} \mid x_i)   (1.33)

\psi_i(x_i, x_{i+1}) = \tilde{\psi}^{\setminus i-1}(x_i) + U_i(x_i)   (1.34)

which we shall interpret in Section 1.2.4 as the two components of a clique potential, modified by
absorption of the messages \phi^{\setminus i-1} and \tilde{\psi}^{\setminus i-1}. The total expected utility, after summing out variables
x_1 to x_{i-1}, then becomes

\langle U \rangle = \sum_{x_i} \cdots \sum_{x_L} \phi_i(x_i, x_{i+1})\, p(x_{i+2} \mid x_{i+1}) \cdots p(x_L \mid x_{L-1}) \bigl[ \psi_i(x_i, x_{i+1}) + U_{i+1}(x_{i+1}) + \cdots + U_L(x_L) \bigr]   (1.35)

Again there are two distinct summations over x_i to be performed, giving rise to the two quantities

\phi^{\setminus i}(x_{i+1}) = \sum_{x_i} \phi_i(x_i, x_{i+1})   (1.36)

\tilde{\psi}^{\setminus i}(x_{i+1}) = \frac{\sum_{x_i} \phi_i(x_i, x_{i+1})\, \psi_i(x_i, x_{i+1})}{\phi^{\setminus i}(x_{i+1})}   (1.37)

which, as we shall see in Section 1.2.5, are the messages transmitted to the next clique in the elimination
tree when the variable x_i is eliminated. Thus we see that, in order to maintain computational
efficiency, two types of quantities (and, as we shall see, only two) need to be maintained and
propagated.
Note that if there were only probability nodes and no utility (or decision) nodes, then we would
only need to propagate the probability messages \phi, as we have seen in previous chapters. They are updated and
marginalized using the standard elimination algorithm for probabilistic models, as discussed in
Chapter ??. The computational saving arising from the use of pairs of functions in the potentials
stems from the additive nature of the utility function.
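To make the bookkeeping concrete, here is a small numerical sketch of this chain computation. All names and the probability and utility tables are invented for illustration: a chain of discrete variables, each with K states, a prior p(x_1), transition tables p(x_{i+1} | x_i), and one additive utility U_i(x_i) per variable. The message-passing routine propagates exactly the pair of quantities (1.36) and (1.37), and can be checked against a brute-force sum over all joint configurations.

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(0)

# Hypothetical chain: L variables with K states each (tables invented)
L, K = 5, 3
prior = rng.dirichlet(np.ones(K))                                   # p(x1)
trans = [rng.dirichlet(np.ones(K), size=K) for _ in range(L - 1)]   # trans[i][a, b] = p(x_{i+1}=b | x_i=a)
utils = [rng.uniform(0.0, 1.0, size=K) for _ in range(L)]           # U_i(x_i)

def expected_utility_brute(prior, trans, utils):
    """Sum p(x) [U_1(x_1) + ... + U_L(x_L)] over all K**L configurations."""
    n, k = len(utils), prior.shape[0]
    total = 0.0
    for x in product(range(k), repeat=n):
        p = prior[x[0]]
        for i in range(n - 1):
            p *= trans[i][x[i], x[i + 1]]
        total += p * sum(u[xi] for u, xi in zip(utils, x))
    return total

def expected_utility_messages(prior, trans, utils):
    """Propagate the pair (phi, psi_tilde): the marginal p(x_{i+1}) and the
    normalized expected utility, as in equations (1.36) and (1.37)."""
    phi = prior.copy()                # phi(x_1) = p(x_1)
    psi = np.zeros_like(prior)       # psi_tilde(x_1) = 0 before any utility is absorbed
    for i in range(len(utils) - 1):
        joint = phi[:, None] * trans[i]                  # phi_i(x_i, x_{i+1}), eq (1.33)
        bracket = psi[:, None] + utils[i][:, None]       # psi_i(x_i, x_{i+1}), eq (1.34)
        phi_new = joint.sum(axis=0)                      # eq (1.36)
        psi = (joint * bracket).sum(axis=0) / phi_new    # eq (1.37)
        phi = phi_new
    return float((phi * (psi + utils[-1])).sum())
```

The brute-force sum costs O(K^L) operations, while the message-passing version costs O(L K^2); the gap between the two is precisely the saving that the pair-of-functions representation delivers.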
vertices. Set the variable i = N, and let all the vertices initially be unnumbered. While there are still
unnumbered vertices in the graph:

1. Choose an unnumbered vertex having the smallest number of unnumbered neighbours, and give it the label i.

2. Define the set C_i to consist of the vertex i together with all of its unnumbered neighbours.

3. Add edges to the graph as required such that the set C_i becomes fully connected.

4. Eliminate the vertex i from the graph and decrement i by 1.
At each step of this algorithm the variable to be eliminated is chosen so as to give rise to the
smallest elimination set. As we shall see, the corresponding computation involves the joint space
of the variables in the clique, and so it is here that the elimination approach for decision graphs
achieves its computational efficiency. This is, however, a greedy algorithm, and so it is not guaranteed to
find the optimal triangulation (recall from Chapter ?? that optimal triangulation is NP-hard).
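The greedy procedure can be sketched in a few lines. The graph representation (a dict of neighbour sets) and the function name are invented for illustration, and vertices are returned here in the order in which they are eliminated, rather than with the descending labels N, N-1, ..., 1 used in the text.

```python
# Sketch of the greedy one-step look-ahead elimination ordering:
# repeatedly pick the vertex with fewest remaining neighbours, record
# its elimination set, fully connect that set, and remove the vertex.
def greedy_elimination(adj):
    """adj: dict mapping each vertex to the set of its neighbours."""
    adj = {v: set(nbrs) for v, nbrs in adj.items()}  # work on a copy
    order, elim_sets = [], []
    while adj:
        # choose the vertex with the smallest number of remaining neighbours
        v = min(adj, key=lambda u: len(adj[u]))
        nbrs = adj[v]
        elim_sets.append({v} | nbrs)
        # fill-in: fully connect the neighbours of v, then drop v
        for a in nbrs:
            adj[a] |= nbrs - {a}
            adj[a].discard(v)
        del adj[v]
        order.append(v)
    # order[t] was eliminated t-th; the text would give it label N - t
    return order, elim_sets
```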
Next we treat the sets C_i as nodes in a graph, and add edges between pairs of nodes so as to
form a tree. This is achieved using the following procedure.

Algorithm 1.2 (Elimination tree construction) For i = N, ..., 1, if C_i contains more than one vertex,
add an edge between C_i and C_j, where j is the largest index of the vertices in C_i
(excluding vertex i itself).
Finally, add an extra node, containing no vertices, and connect it to all nodes having a single vertex.
This procedure ensures that when we eliminate a variable, the remaining variables in that elimination set are connected to a node in which those variables also appear. A consequence of this
is that the elimination tree will satisfy the running intersection property. To see this, note that for
each variable there will be a particular node at which it is eliminated, and that it will not appear in
any node which is nearer to the root than this node. If there is any other node in which the same
variable appears then, since it will not be eliminated in that node, it must appear in the next node
in the tree which is nearer the root. By induction the same variable must appear in every node
14 CHAPTER 1. DECISION GRAPHS
on the path which connects any two nodes in which the variable appears, and hence the tree has
the running intersection property. It is therefore a junction tree of sets of nodes. Note, however,
that although the cliques will appear as nodes of the tree, there will in general also be other nodes
comprising subsets of the cliques. Thus the elimination tree is not in general a junction tree of cliques.
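Algorithm 1.2 can be sketched directly. The representation (a dict mapping each label i to its elimination set C_i, with labels assigned as in algorithm 1.1) is an assumption made for illustration.

```python
# Sketch of Algorithm 1.2: attach each multi-vertex elimination set to
# the set indexed by the largest remaining label it contains; sets with
# a single vertex attach to an extra, empty root node.
def build_elimination_tree(C):
    """C: dict mapping label i -> elimination set C_i (a set of labels)."""
    ROOT = 0  # the extra node containing no vertices
    parent = {}
    for i in C:
        rest = C[i] - {i}
        parent[i] = max(rest) if rest else ROOT
    return parent  # parent[i] is the neighbour of C_i nearer the root
```

For example, the elimination sets C_4 = {4, 2}, C_3 = {3, 2, 1}, C_2 = {2, 1}, C_1 = {1} yield a tree in which both C_4 and C_3 attach to C_2, C_2 attaches to C_1, and C_1 attaches to the empty root, so the running intersection property can be verified by inspection.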
As an illustration of this procedure, consider the directed graph of Figure ?? in Chapter ??,
reproduced for convenience in Figure 1.6. Applying the triangulation algorithm 1.1 followed by
[Figure 1.6 shows the graph, with nodes X1 through X6.]

Figure 1.6: The graph introduced in Chapter ?? to illustrate the technique of elimination.
algorithm 1.2 we obtain the elimination tree shown in Figure 1.7. Note that there are several possible
[Figure 1.7 shows the elimination tree, whose nodes include (X4 : X2), (X2 : X1), (X1 :), and the empty root (:).]

Figure 1.7: The elimination tree resulting from the application of the elimination algorithm 1.1 to
the graph in Figure 1.6, followed by the tree construction procedure given by algorithm 1.2. The
notation associated with each node lists first the variable that is eliminated in that node, followed
by the remaining variables in the node. It is easily seen that this tree satisfies the
running intersection property.
elimination orderings which are consistent with algorithm 1.1. We choose an ordering which is
the same as that considered in Chapter ??.
In order to extend the construction of an elimination tree to the framework of decision graphs
we need to take account of both utility nodes and decision nodes. Each utility node is associated
with an additive term in the total utility function. Recall that utility nodes have only parents and
no descendants. The moralization step will add links between all pairs of parents for each utility
node. After the moralization step the utility nodes can be deleted from the graph, and each term
in the utility function can then be associated with the corresponding complete set of nodes in the
moral graph. As we saw in Section 1.1.4, the presence of decision variables places restrictions on
the order in which variables may be eliminated. This can be accommodated simply by altering the
triangulation algorithm 1.1 so as to consider only those elimination orderings which are consistent
with the causality constraints of the particular problem being solved. More formally we have the
following algorithm:
Algorithm 1.3 (One-step look-ahead triangulation for decision graphs) Consider a moralized decision
graph with the decision sequence (\Omega_1, d_1, \Omega_2, d_2, \ldots, \Omega_{K+1}). Proceed as in algorithm 1.1, except that
at each step the vertex to be eliminated is chosen, from among those whose elimination is consistent
with the partial ordering imposed by the decision sequence, so as to have the smallest number of
unnumbered neighbours.

The two components of a decision potential combine according to

(\phi_1, \psi_1) \otimes (\phi_2, \psi_2) = (\phi_1 \phi_2,\; \psi_1 + \psi_2)   (1.38)

so that the probability parts combine multiplicatively while the utility parts are additive.
Sum-marginalization We define the sum-marginalization of a decision potential (\phi, \psi) over a random variable x as

S_x (\phi, \psi) = \left( \sum_x \phi,\; \frac{\sum_x \phi\, \psi}{\sum_x \phi} \right)   (1.39)

so that the probability part is summed while the utility part becomes a normalized expected utility.
Similarly, the max-marginalization over a decision variable d is defined by

M_d (\phi, \psi) = \left( \phi,\; \max_d \psi \right)   (1.40)

With these definitions, the product over all nodes of the initialized decision potentials

\prod_c (\phi_c, \psi_c) = \bigl( p(x \mid d),\; U(x, d) \bigr)   (1.41)

will have a utility term which is equal to the total utility function.
introduce separator sets and separator potentials since, as we shall see, the solution to the decision
problem involves only one-way message passing (equivalent to a ‘CollectEvidence’ operation, with
no corresponding ‘DistributeEvidence’).
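A minimal sketch of decision potentials as (phi, psi) pairs follows, assuming tables stored as numpy arrays with one axis per variable; the function names are invented. Combination follows (1.38), sum-marginalization follows (1.39), and max-marginalization maximizes the utility part, which is justified when causal consistency makes the probability part independent of the decision axis.

```python
import numpy as np

def combine(pot_a, pot_b):
    """(phi_a, psi_a) x (phi_b, psi_b) = (phi_a*phi_b, psi_a+psi_b), eq (1.38)."""
    (phi_a, psi_a), (phi_b, psi_b) = pot_a, pot_b
    return phi_a * phi_b, psi_a + psi_b

def sum_marg(pot, axis):
    """Sum-marginalize a random variable: normalized expected utility, eq (1.39)."""
    phi, psi = pot
    phi_new = phi.sum(axis=axis, keepdims=True)
    with np.errstate(divide="ignore", invalid="ignore"):
        psi_new = (phi * psi).sum(axis=axis, keepdims=True) / phi_new
    return phi_new, np.nan_to_num(psi_new)

def max_marg(pot, axis):
    """Max-marginalize a decision variable: maximize the utility part."""
    phi, psi = pot
    # under causal consistency phi should not depend on the decision axis,
    # so taking its max along that axis leaves it effectively unchanged
    return phi.max(axis=axis, keepdims=True), psi.max(axis=axis, keepdims=True)
```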
Next we define a message passing step whereby a variable is eliminated either by summation
(probability variable) or maximization (decision variable). Suppose C is an elimination set of an
elimination tree, and let C_{j_1}, \ldots, C_{j_m} be the neighbours of C which are further from the root than C
itself. Then the action of absorption consists of the evaluation of a message from each of the sets C_{j_k},
given by

\text{message from } C_{j_k} = \begin{cases} M_d(\phi_{j_k}, \psi_{j_k}) & \text{if } C_{j_k} \text{ is the elimination set of a decision variable } d \\ S_x(\phi_{j_k}, \psi_{j_k}) & \text{if } C_{j_k} \text{ is the elimination set of a random variable } x \end{cases}   (1.43)

and the absorption of these messages by multiplication into the decision potential for clique C:

(\phi_C, \psi_C) \leftarrow (\phi_C, \psi_C) \otimes \prod_{k=1}^{m} \bigl[ \text{message from } C_{j_k} \bigr]   (1.44)
This absorption operation then forms the basic step of 'CollectEvidence', which is defined recursively.
When CollectEvidence is called at a particular node, it first calls CollectEvidence in any
neighbours which are further from the root, and subsequently absorbs messages from them. The
solution of the decision problem then involves a call to CollectEvidence at the root node of the
elimination tree. This leads to a sequence of absorptions starting at the leaves of the tree and
working inwards towards the root. Once this has terminated the root clique potential will contain
the maximum expected utility, and the optimal values for all of the decision variables will have
been determined during the various max-marginalizations. We give a formal proof of this result in
Section 1.2.6.
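The recursive structure of CollectEvidence can be sketched as follows. The tree representation and field names are invented for illustration, and the marginalization rules are inlined: normalized sum-marginalization as in (1.39) for random variables, and maximization of the utility part for decision variables.

```python
import numpy as np

def collect_evidence(tree, node):
    """Return the (phi, psi) message sent from `node` towards the root.

    tree[node] is assumed to hold:
      "potential": (phi, psi) numpy arrays with one axis per variable,
      "children":  neighbours further from the root,
      "axis":      the axis of the variable eliminated at this node,
      "kind":      "random" or "decision".
    """
    info = tree[node]
    phi, psi = info["potential"]
    # recursively absorb messages from children first: probability parts
    # multiply, utility parts add, as in the combination rule (1.38)
    for child in info["children"]:
        m_phi, m_psi = collect_evidence(tree, child)
        phi, psi = phi * m_phi, psi + m_psi
    ax = info["axis"]
    if info["kind"] == "random":
        # sum-marginalization with a normalized expected utility, eq (1.39)
        phi_new = phi.sum(axis=ax, keepdims=True)
        psi_new = (phi * psi).sum(axis=ax, keepdims=True) / phi_new
        return phi_new, psi_new
    # decision variable: maximize the utility part
    return phi.max(axis=ax, keepdims=True), psi.max(axis=ax, keepdims=True)
```

Calling this at the root performs the single inward pass described above; the utility part of the root potential then holds the maximum expected utility.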
One issue which is worth exploring in more detail concerns the definition of sum-marginalization
given by (1.39), which involved normalization of the expected utilities and hence
requires divisions to be performed. In the general formulation of the decision problem given by
(1.21) there are no division operations, so it is worth looking more closely at the motivation for this
definition. To do this we consider the elimination tree shown in Figure 1.8, in which the elimination
ordering is x_1, \ldots, x_L, z. This graph could be considered as part of a larger elimination tree.
[Figure 1.8 shows an elimination tree in which each of the nodes (X_i : Z), for i = 1, ..., L, is attached to the node (Z :), which in turn is attached to the empty root (:).]

Figure 1.8: An elimination tree used to motivate the use of normalized expected utilities in the
definition of message passing.
Note that in this instance there are only random variables and no decision variables. The key point
is that several different messages are to be absorbed into the same node. Associated with each
node (X_i : Z) of the tree we have initial potentials \phi_i(x_i, z) and \psi_i(x_i, z), for i = 1, \ldots, L, and with the
node (Z :) we have \phi_z(z) and \psi_z(z). Marginalization over the random variables to yield the total expected
utility corresponds to the following operations

\langle U \rangle = \sum_z \sum_{x_1} \cdots \sum_{x_L} \phi_z(z) \prod_{i=1}^{L} \phi_i(x_i, z) \left[ \psi_z(z) + \sum_{i=1}^{L} \psi_i(x_i, z) \right]   (1.45)

In the approach discussed here we seek a message passing scheme such that we can absorb from the
nodes sequentially while preserving all the required information using only the two local potentials
\phi and \psi in each node of the elimination tree. To see how this is achieved, suppose we first eliminate
variable x_1 from (1.45). We require that the result of this be a similar elimination tree with the node
(X_1 : Z) removed and with suitably modified potentials \phi'_z and \psi'_z in node (Z :) so that

\langle U \rangle = \sum_z \sum_{x_2} \cdots \sum_{x_L} \phi'_z(z) \prod_{i=2}^{L} \phi_i(x_i, z) \left[ \psi'_z(z) + \sum_{i=2}^{L} \psi_i(x_i, z) \right]   (1.46)

which is satisfied by the choices \phi'_z(z) = \phi_z(z) \sum_{x_1} \phi_1(x_1, z) and \psi'_z(z) = \psi_z(z) + \sum_{x_1} \phi_1(x_1, z)\, \psi_1(x_1, z) / \sum_{x_1} \phi_1(x_1, z), which is precisely the sum-marginalization (1.39).
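A quick numerical check (with invented tables) of why the normalization in (1.39) is what makes sequential absorption work: absorbing the normalized message pair from each neighbour of Z one at a time reproduces the full expression (1.45) exactly.

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(1)

# invented tables: L leaves, K states per variable; leaf i carries
# phi_i(x_i, z) and psi_i(x_i, z); node Z carries phi_z(z) and psi_z(z)
L, K = 3, 2
phis = [rng.uniform(0.1, 1.0, size=(K, K)) for _ in range(L)]
psis = [rng.uniform(0.0, 1.0, size=(K, K)) for _ in range(L)]
phi_z = rng.uniform(0.1, 1.0, size=K)
psi_z = rng.uniform(0.0, 1.0, size=K)

def direct():
    """Direct evaluation of the full expression (1.45)."""
    total = 0.0
    for xs in product(range(K), repeat=L):
        for z in range(K):
            p = phi_z[z] * np.prod([phis[i][xs[i], z] for i in range(L)])
            u = psi_z[z] + sum(psis[i][xs[i], z] for i in range(L))
            total += p * u
    return total

def sequential():
    """Absorb normalized (phi, psi) messages one neighbour at a time."""
    phi, psi = phi_z.copy(), psi_z.copy()
    for i in range(L):
        m_phi = phis[i].sum(axis=0)                      # sum over x_i
        m_psi = (phis[i] * psis[i]).sum(axis=0) / m_phi  # normalized, eq (1.39)
        phi, psi = phi * m_phi, psi + m_psi              # combination (1.38)
    return float((phi * psi).sum())
```

Without the division, adding the unnormalized utility messages from the different neighbours would double-count the probability factors, and the two routes would disagree.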
1. Determine the decision sequence

(\Omega_1, d_1, \Omega_2, d_2, \ldots, \Omega_{K+1})   (1.49)

implied by the causal consistency requirement, so that the maximum expected utility to be computed takes the form

\sum_{\Omega_1} \max_{d_1} \sum_{\Omega_2} \max_{d_2} \cdots \max_{d_K} \sum_{\Omega_{K+1}} p(x \mid d)\, U(x, d)   (1.50)

2. Moralize the decision graph, associating each term of the utility function with a complete subset of nodes in the moral graph.
3. Triangulate the graph using Algorithm 1.3 and then construct the elimination tree using Algorithm 1.2.
4. Initialize the decision potentials on the nodes of the elimination tree as described in Section 1.2.5.
5. Call CollectEvidence recursively starting from the root node of the elimination tree.
6. Read off the decision mapping and the maximum expected utility.
We now prove that this algorithm does indeed lead to the correct solution to the decision problem.
To do this we first introduce some notation. First of all we expand out the decision sequence
so that each of the probabilistic variables is represented individually,

(x^1_1, \ldots, x^1_{n_1}, d_1, x^2_1, \ldots, x^2_{n_2}, d_2, \ldots, x^{K+1}_1, \ldots, x^{K+1}_{n_{K+1}})   (1.51)

where n_l denotes the number of probabilistic variables in the group \Omega_l, such that the variables are
eliminated in the reverse order to that given. Next we define

(C^1_1, \ldots, C^1_{n_1}, D_1, C^2_1, \ldots, C^2_{n_2}, D_2, \ldots, C^{K+1}_1, \ldots, C^{K+1}_{n_{K+1}})   (1.52)

to be the corresponding sequence of elimination sets obtained by running algorithm 1.3. We also
define the subsets of variables which have not yet been eliminated,

e^k_j = (x^1_1, \ldots, x^1_{n_1}, d_1, \ldots, d_{k-1}, x^k_1, \ldots, x^k_j)   (1.53)

e^k_0 = (x^1_1, \ldots, x^1_{n_1}, d_1, \ldots, d_{k-1})   (1.54)

together with the quantities

\rho^k_j = \sum_{x^k_{j+1}} \cdots \sum_{x^k_{n_k}} \sum_{\Omega_{k+1}} \cdots \sum_{\Omega_{K+1}} p(x \mid d)   (1.55)

\upsilon^k_j = \sum_{x^k_{j+1}} \cdots \sum_{x^k_{n_k}} \max_{d_k} \sum_{\Omega_{k+1}} \max_{d_{k+1}} \cdots \max_{d_K} \sum_{\Omega_{K+1}} p(x \mid d)\, U(x, d)   (1.56)

so that \rho^k_j involves only the subsequence of sum-marginalizations and represents a conditional
probability, whereas \upsilon^k_j involves the complete sequence of sum and max marginalizations and
represents a partially maximized expected utility; both are functions of the variables in e^k_j. Armed
with this notation, we are now ready to proceed with the proof.
The key idea is to show that, after each elimination of a (probability or decision) variable, the
product of the decision potentials over the remaining cliques (corresponding to variables that have
yet to be eliminated) takes the form

\prod_{\text{remaining } c} (\phi_c, \psi_c) = \left( \rho^k_j,\; \frac{\upsilon^k_j}{\rho^k_j} \right)   (1.57)

Note that \rho^1_0 = 1 since, from (1.55), this is just the sum over all probability variables of the
conditional distribution p(x \mid d), and that \upsilon^1_0 is the globally maximum expected utility. Thus (1.57)
implies that, once all of the variables are eliminated, the decision potential for the null clique in the
elimination tree (the root) will be (\rho^1_0,\; \upsilon^1_0 / \rho^1_0) = (1,\; \upsilon^1_0), and so will contain the maximum expected utility.
Furthermore, we shall show that each of the max-marginalizations is equivalent to the corresponding
max-marginalization occurring in the general formulation (1.21), and so each of the decision
variables will be set to its optimal value.
We now prove by induction that (1.57) holds after each variable is eliminated. When the elimination
tree is initialized, before any variables have been eliminated, we have

\prod_c (\phi_c, \psi_c) = \bigl( p(x \mid d),\; U(x, d) \bigr) = \left( \rho^{K+1}_{n_{K+1}},\; \frac{\upsilon^{K+1}_{n_{K+1}}}{\rho^{K+1}_{n_{K+1}}} \right)   (1.58)

where \rho^{K+1}_{n_{K+1}} = p(x \mid d) and \upsilon^{K+1}_{n_{K+1}} = p(x \mid d)\, U(x, d), and so this provides a starting point for the
inductive proof.
Now suppose that (1.57) holds for some particular values of j and k, and that the next variable
to be eliminated is a probabilistic variable x^k_j. By definition we have

\prod_{\text{remaining } c} (\phi_c, \psi_c) = \left( \rho^k_j,\; \frac{\upsilon^k_j}{\rho^k_j} \right)   (1.59)

We now operate on both sides of this equation with the sum-marginalization operator for variable
x^k_j. On the left-hand side of (1.59) we have

S_{x^k_j} \left( \rho^k_j,\; \frac{\upsilon^k_j}{\rho^k_j} \right) = \left( \sum_{x^k_j} \rho^k_j,\; \frac{\sum_{x^k_j} \upsilon^k_j}{\sum_{x^k_j} \rho^k_j} \right)   (1.60)

= \left( \rho^k_{j-1},\; \frac{\upsilon^k_{j-1}}{\rho^k_{j-1}} \right)   (1.61)

where we have used the definitions (1.55) and (1.56). Now consider the sum-marginalization operator acting
on the right-hand side of (1.59). We can temporarily de-clutter the notation by writing x = x^k_j, letting
(\phi, \psi) denote the decision potential of the clique in which x is eliminated, and letting (\bar{\phi}, \bar{\psi}) denote
the product of the decision potentials over the other remaining cliques. Noting that (\bar{\phi}, \bar{\psi}) does not
depend on x, we then have

S_x \bigl[ (\bar{\phi}, \bar{\psi}) \otimes (\phi, \psi) \bigr] = \left( \bar{\phi} \sum_x \phi,\; \bar{\psi} + \frac{\sum_x \phi\, \psi}{\sum_x \phi} \right) = (\bar{\phi}, \bar{\psi}) \otimes S_x(\phi, \psi)   (1.62)

so that, combining (1.59), (1.61) and (1.62),

(\bar{\phi}, \bar{\psi}) \otimes S_{x^k_j}(\phi, \psi) = \left( \rho^k_{j-1},\; \frac{\upsilon^k_{j-1}}{\rho^k_{j-1}} \right)   (1.63)
The left-hand side of (1.63) represents the computation of a message in the clique where x^k_j is eliminated and the
absorption of that message in the neighbouring clique nearer the root. From (1.63) we see that the
resulting product of decision potentials over the remaining cliques indeed has the form (1.57). This
completes the inductive step over j for a given value of k in the case where we absorb a random
variable. For a given value of k we can apply this result repeatedly until we have eliminated all of
the random variables in the group \Omega_k, giving

\rho^k_0 = \sum_{x^k_1} \cdots \sum_{x^k_{n_k}} \rho^k_{n_k}, \qquad \upsilon^k_0 = \sum_{x^k_1} \cdots \sum_{x^k_{n_k}} \upsilon^k_{n_k}   (1.64)

At this stage we have the result

\prod_{\text{remaining } c} (\phi_c, \psi_c) = \left( \rho^k_0,\; \frac{\upsilon^k_0}{\rho^k_0} \right)   (1.65)
We now operate on both sides of this equation with the max-marginalization operator for the decision
variable d_{k-1}. On the left-hand side of (1.65) we can use the definition of the max-marginalization
operator, together with (1.55) and (1.56), to obtain

M_{d_{k-1}} \left( \rho^k_0,\; \frac{\upsilon^k_0}{\rho^k_0} \right) = \left( \rho^k_0,\; \frac{\max_{d_{k-1}} \upsilon^k_0}{\rho^k_0} \right) = \left( \rho^{k-1}_{n_{k-1}},\; \frac{\upsilon^{k-1}_{n_{k-1}}}{\rho^{k-1}_{n_{k-1}}} \right)   (1.66)

where we have used the causal consistency requirement, which implies that \rho^k_0 is independent of
d_{k-1} and hence that the max-marginalization leaves \rho^k_0 unchanged. To see this, note that \rho^k_0 is the product of conditional probabilities

\rho^k_0 = p(x^1_1, \ldots, x^{k-1}_{n_{k-1}} \mid d_1, \ldots, d_{k-1})   (1.67)

= \prod_{l=1}^{k-1} \prod_{i=1}^{n_l} p(x^l_i \mid x^1_1, \ldots, x^l_{i-1}, d_1, \ldots, d_{l-1})   (1.68)

which, from the conditional independence properties (1.7) implied by the causal consistency
requirement, is independent of d_{k-1}.
Again we can temporarily de-clutter the notation by writing
d = d_{k-1}, letting (\phi, \psi) denote the decision potential of the clique in which d is eliminated, and
letting (\bar{\phi}, \bar{\psi}) denote the product of the decision potentials over the other remaining cliques.
Noting that \bar{\phi}, \bar{\psi} and \phi do not depend on d, we then have

M_d \bigl[ (\bar{\phi}, \bar{\psi}) \otimes (\phi, \psi) \bigr] = \left( \bar{\phi}\, \phi,\; \bar{\psi} + \max_d \psi \right) = (\bar{\phi}, \bar{\psi}) \otimes M_d(\phi, \psi)   (1.69)
where \bar{\phi}\, \phi = \rho^{k-1}_{n_{k-1}} and \bar{\psi} + \max_d \psi = \upsilon^{k-1}_{n_{k-1}} / \rho^{k-1}_{n_{k-1}}. Combining (1.65), (1.66) and (1.69) we
have

(\bar{\phi}, \bar{\psi}) \otimes M_{d_{k-1}}(\phi, \psi) = \left( \rho^{k-1}_{n_{k-1}},\; \frac{\upsilon^{k-1}_{n_{k-1}}}{\rho^{k-1}_{n_{k-1}}} \right)   (1.70)
The operation on the left-hand side of (1.70) represents the evaluation of the message resulting from
the elimination of the decision variable d_{k-1} and the absorption of that message into the decision
potential of the next clique in the elimination tree. Again we see that this takes the general form
(1.57).
Thus we have reduced the index k by one, and so this completes the inductive step over k, and
hence the result (1.57) is proved for all permissible values of j and k.