International Journal of Mining and Geological Engineering, 1987, 5, 121-130

SHORT COMMUNICATION

Modelling of cyclical stratigraphy using Markov chains
Summary
The state of the art in modelling cyclical stratigraphy using first-order Markov chains is reviewed. Shortcomings of the presently available procedures are identified. A procedure which eliminates all the identified shortcomings is presented. The statistical tests required to perform this modelling are given in detail. An example is given to illustrate the procedure.

Introduction
Proper characterization of subsurface stratification is important in many disciplines, such as geotechnical engineering, petroleum engineering, mining engineering, mineral sciences, hydrology and water resources. Some stratigraphic sections show evidence of cyclical or recurrent sedimentation. Markov chains have been applied to model such stratigraphic sequences (Krumbein and Dacey, 1969; Ethier, 1975; Ali et al., 1980).

Two parameters, state and time, are needed to describe a Markov chain. In the application of Markov chains to analyse stratigraphy, the state parameter is used to identify different lithology types, and the time parameter is changed to a location parameter in space and is used to record the transitions among different lithologies in space. For a stratigraphic sequence to possess the Markov property, the lithology type observed at a location in space should depend probabilistically on the states the sequence occupied at previous locations. At this point it is important to identify the two extreme models which lie on either side of Markov models: (1) if the state of the system at any point in space can be predicted with 100% certainty, it is a deterministic model, and (2) if the state at any point in space is independent of previous states, it is known as a Poisson process.

If the transition of a Markov chain depends only on the immediately preceding state, the chain is a first order Markov chain. If the transition depends on more than one previous state, then it is a higher order model. If the transition model of a Markov chain does not vary with the spatial parameter, it is called a homogeneous or stationary chain. Homogeneous first order Markov chains have been used in modelling stratigraphy in the vertical direction. The papers by Krumbein and Dacey (1969) and Yu (1984) form the state of the art of modelling stratigraphy in one dimension. Two types of Markov chains have been employed in these studies.
Keywords: Cyclical stratigraphy; Markov chains; geomathematics; computer modelling.
0269-0316/87 $03.00+.12 © 1987 Chapman and Hall Ltd.

The first approach considers the stratification at discrete points that are spaced equally along a vertical profile. The points are numbered consecutively, and the use of the Markov chain is based on the assumption that the lithology or state at point n depends upon the lithology at the preceding point (n - 1). Because the same lithology may be observed at successive points, the transition matrix that gives the probability of going from one lithology to another generally has non-zero elements on the main diagonal. This type of Markov chain is known as a conventional or ordinary Markov chain. If stratigraphy follows a first order conventional Markov chain, then the thicknesses of lithologies should follow geometric distributions (Krumbein and Dacey, 1969). This important property can be used in testing whether a stratigraphy follows a first order conventional Markov chain.

The second approach considers only the succession of lithologies, and because each transition is to a different lithology within the system, the diagonal elements are all zero. This Markov chain is known as an embedded Markov chain. In this case, the distributions for lithologic thicknesses need not follow geometric distributions. Thus, some stratigraphic sequences may be modelled by using an embedded Markov chain to describe the transitions between different lithologies, and using different probability distributions to describe thicknesses of different lithologies. Such a process is known as a semi-Markov process.
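The geometric-thickness property of a conventional chain can be seen in one line. As a brief sketch, write $p_{ii}$ for the probability that lithology i is followed by lithology i at the next observation point, and $T_i$ for the number of consecutive points occupied by one bed of lithology i (both symbols are introduced here only for illustration). A bed spans exactly k points when the chain stays in state i for k - 1 steps and then leaves, so

$$P(T_i = k) = p_{ii}^{\,k-1}\,(1 - p_{ii}), \quad k = 1, 2, \ldots, \qquad E[T_i] = \frac{1}{1 - p_{ii}}$$

which is the geometric distribution. Multiplying $T_i$ by the spacing between observation points converts it to a thickness.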
Investigations performed so far on modelling stratigraphy in one dimension contain one or more of the following shortcomings: they (a) have assumed homogeneity without performing suitable statistical tests to check its applicability, (b) have used conventional Markov chains without satisfying the requirement that the thicknesses of lithologies follow geometric distributions, (c) have not considered semi-Markov chains when they are more suitable than conventional Markov chains, or (d) have not used proper statistical tests to check the Markov property of embedded Markov chains. The stratigraphy modelling procedure suggested in this paper eliminates all these shortcomings. The sections which follow describe the modelling procedure. An example in the 'Application' section illustrates the use of the modelling procedure.

Overview of the suggested procedure
The suggested procedure is applicable only to stratigraphy data which show cyclical sedimentation. The first step in the procedure is to check the homogeneity of the stratigraphic column. The test suggested by Anderson and Goodman (1957) for stationarity can be used in checking the homogeneity of conventional Markov chains. However, this test is not appropriate to check the homogeneity of embedded Markov chains. At present, no suitable test is available to check the stationarity of embedded Markov chains. Until such a test becomes available, the above-mentioned test may be used as the test for homogeneity. If the test result indicates non-homogeneity, then further investigations should be carried out to separate the entire stratigraphy into regions where the homogeneity property is applicable. Then each homogeneous stratigraphic section should be analysed for the other properties explained below.

Stratigraphic data generally fall into one of the following groups: (a) the observed data have a first order conventional Markov dependency in the succession of lithologies, and geometric distributions for all lithologic thicknesses, (b) the observed data have a first order embedded Markov dependency in the succession of lithologies, but they do not have geometric distributions for all lithologic thicknesses, (c) the observed data have neither first order conventional Markov dependency nor embedded Markov dependency in the succession of lithologies, but they do have geometric distributions for all lithologic thicknesses, or (d) the observed data neither satisfy Markov dependency in the succession of lithologies, nor do they have geometric distributions for lithologic thicknesses. If combination (a) holds, then first order conventional Markov chains should be used for modelling. If combination (b) holds, then the appropriate model is the semi-Markov chain. For combination (c), an independent event model such as the multinomial model is suitable. Combination (d) cannot be modelled appropriately by either the conventional or the semi-Markov model.

Tests given by Anderson and Goodman (1957) and Billingsley (1961) for the Markov property can be used to test the independence against dependence of states for the conventional Markov transition matrix. If the results of these two tests lead to the rejection of the null hypothesis, then it only implies the presence of some dependence between state transitions. If one wants to prove that the transition matrix follows a first order conventional Markov chain, then it is necessary to show that the distribution of thickness of each lithology in the stratigraphy sequence follows a geometric distribution. This can be checked by performing goodness-of-fit tests (Ang and Tang, 1975) on the thickness data. To check the presence of an embedded Markov chain, tests given by Yu (1984) seem the best. These tests originate from Goodman (1968) in the context of incomplete contingency tables.
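The four groups (a) to (d) amount to a small decision table. The sketch below is one possible encoding of it, assuming the outcomes of the statistical tests have already been reduced to three yes/no flags; the function name, argument names and returned strings are hypothetical and chosen only for illustration.

```python
def choose_model(conventional_dependency: bool,
                 embedded_dependency: bool,
                 geometric_thicknesses: bool) -> str:
    """Map test outcomes to the model suggested for groups (a)-(d)."""
    if conventional_dependency and geometric_thicknesses:
        return "first order conventional Markov chain"   # group (a)
    if embedded_dependency and not geometric_thicknesses:
        return "semi-Markov chain"                        # group (b)
    no_dependency = not conventional_dependency and not embedded_dependency
    if no_dependency and geometric_thicknesses:
        return "independent multinomial model"            # group (c)
    if no_dependency:
        return "neither conventional nor semi-Markov"     # group (d)
    # Combinations that the four groups do not describe explicitly.
    return "not covered by groups (a)-(d)"
```

The final branch is an addition of this sketch: it flags combinations of test outcomes that the text does not assign to any group.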

Stratigraphy modelling using first order conventional Markov chains

Basic concepts
In the case of a stratigraphic column, observations of the state are usually made starting at the bottom at discrete intervals of vertical distance. Each interval represents one step in the index space of the conventional Markov chain. A first order Markov chain is one where the transition from state i to state j depends only on the previous state of the chain. If the number of observed transitions from state i to state j is $f_{ij}$, then the tally matrix, F, is given by

$$F = [f_{ij}]; \quad i, j = 1, 2, \ldots, m \tag{1}$$

where m is the total number of states. The transition probability from state i to state j, $p_{ij}$, is given by

$$p_{ij} = \frac{f_{ij}}{\sum_{j=1}^{m} f_{ij}} \tag{2}$$

The transition probability matrix, P, is defined by

$$P = [p_{ij}]; \quad i, j = 1, 2, \ldots, m \tag{3}$$
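Equations (1) to (3) translate directly into a few lines of code. The following is a minimal sketch, assuming the column has been logged as a sequence of single-letter state labels read from the bottom upwards at equal intervals; the function names and the short example column are hypothetical.

```python
def tally_matrix(sequence, states):
    """Count transitions between successive observations, Equation (1)."""
    index = {s: k for k, s in enumerate(states)}
    m = len(states)
    f = [[0] * m for _ in range(m)]
    for prev, nxt in zip(sequence, sequence[1:]):
        f[index[prev]][index[nxt]] += 1
    return f


def transition_probabilities(f):
    """Divide each tally by its row total, Equation (2)."""
    p = []
    for row in f:
        total = sum(row)
        # A state that is never left has no defined row; zeros are used here.
        p.append([x / total if total else 0.0 for x in row])
    return p


# Hypothetical column with four lithologies A-D, observed at equal spacing.
column = list("AAABBBBCBBDAAB")
F = tally_matrix(column, "ABCD")
P = transition_probabilities(F)
```

Because successive points may fall in the same bed, the diagonal of F is generally non-zero, which is what distinguishes this conventional tally from the embedded one.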

If the transition matrix does not vary with the location of the space parameter, then the Markov chain is stationary or homogeneous. The diagonal probabilities, $p_{ii}$, are related to the relative thicknesses of the lithologic units (Harbaugh and Bonham-Carter, 1970). The transition probability matrix is sensitive to the interval employed. If the interval is too small, the resulting $p_{ii}$ tend to one for any finite sample, with zeros in the off-diagonals; if it is too large, some important layers may be missed. Therefore, one should carefully inspect the stratigraphic column before choosing an appropriate size for the interval.

Thicknesses of lithologies can be considered as either discrete random variables (Krumbein and Dacey, 1969) or continuous random variables (Ethier, 1975). If they are treated as discrete random variables, then they should follow geometric distributions for stratigraphy which can be modelled by conventional Markov chains (Krumbein and Dacey, 1969). If they are treated as continuous random variables, then they should follow exponential distributions.

So far we have considered only single-step transition probabilities. Multiple-step transition probabilities can be obtained by powering a single-step transition probability matrix. If a matrix of transition probabilities is successively powered until each row is the same as every other row, the resultant matrix is termed a regular or steady-state transition matrix (Harbaugh and Bonham-Carter, 1970; Ang and Tang, 1984). The fixed probability row vector of this matrix provides the proportion of each lithology.
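The powering operation described above can be sketched as follows. This is an illustration only: the three-state matrix is hypothetical, and the stopping rule (successive powers agreeing to within a tolerance) is an assumption of the sketch rather than a rule given in the paper.

```python
def mat_mul(a, b):
    """Multiply two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def steady_state(p, tol=1e-12, max_iter=10000):
    """Power P until successive powers agree to within tol; return one row."""
    n = len(p)
    current = [row[:] for row in p]
    for _ in range(max_iter):
        nxt = mat_mul(current, p)
        change = max(abs(nxt[i][j] - current[i][j])
                     for i in range(n) for j in range(n))
        current = nxt
        if change < tol:
            break
    # At convergence every row is (numerically) the same fixed vector.
    return current[0]


# Hypothetical three-lithology transition matrix; each row sums to one.
P = [[0.8, 0.1, 0.1],
     [0.2, 0.6, 0.2],
     [0.3, 0.3, 0.4]]
pi = steady_state(P)
```

The returned vector satisfies pi P = pi to within the tolerance, so its entries can be read as the long-run proportions of the lithologies.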

Testing for homogeneity
This is based on the test suggested by Anderson and Goodman (1957) for stationarity. The stratigraphic column is divided into T subintervals. Then the following statistic is computed.

$$S_1 = 2 \sum_{t=1}^{T} \sum_{i=1}^{m} \sum_{j=1}^{m} f_{ij}(t) \log_e \left[ \frac{p_{ij}(t)}{p_{ij}} \right] \tag{4}$$

where t refers to the tth subinterval, $f_{ij}(t)$ and $p_{ij}(t)$ are the tally and the transition probability for that subinterval, and $p_{ij}$ is the transition probability for the whole sequence. If the null hypothesis of homogeneity holds, then S1 is asymptotically chi-square distributed with (T - 1)m(m - 1) degrees of freedom (DF). The significance level at which S1 equals the theoretical value given in the chi-square table provides the maximum significance level (Benjamin and Cornell, 1970) at which the null hypothesis can be accepted.
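A minimal sketch of the homogeneity statistic of Equation (4) is given below. It assumes one tally matrix per subinterval is already available, estimates the whole-sequence probabilities from the pooled tallies, and treats cells with a zero tally as contributing nothing to the sum; the function name and the small two-state tallies are hypothetical.

```python
import math


def homogeneity_statistic(tallies):
    """Return (S1, DF) for a list of T m-by-m tally matrices."""
    T = len(tallies)
    m = len(tallies[0])
    pooled = [[sum(f[i][j] for f in tallies) for j in range(m)]
              for i in range(m)]
    pooled_rows = [sum(row) for row in pooled]
    s1 = 0.0
    for f in tallies:
        for i in range(m):
            row_total = sum(f[i])
            for j in range(m):
                if f[i][j] > 0:  # zero cells add nothing (0 log 0 = 0)
                    p_t = f[i][j] / row_total            # p_ij(t)
                    p_all = pooled[i][j] / pooled_rows[i]  # p_ij
                    s1 += 2.0 * f[i][j] * math.log(p_t / p_all)
    dof = (T - 1) * m * (m - 1)
    return s1, dof


# Hypothetical tallies for two subintervals of a two-state column.
t1 = [[8, 2], [3, 7]]
t2 = [[5, 5], [6, 4]]
s1, dof = homogeneity_statistic([t1, t2])
```

The statistic is then compared with the chi-square distribution for the returned degrees of freedom; with five subintervals and four lithologies the formula gives 48.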

Tests for Markov property of a conventional Markov chain
A test, recommended by Billingsley (1961), is the Pearson statistic, given by

$$S_2 = \sum_{i=1}^{m} \sum_{j=1}^{m} \frac{(f_{ij} - e_{ij})^2}{e_{ij}} \tag{5}$$

where $e_{ij}$ is the expected number of i to j transitions under the null hypothesis that the state transitions come from independent multinomial trials. The maximum likelihood estimate of $e_{ij}$ is given by

$$e_{ij} = \frac{f_{iR}\, f_{Cj}}{N} \tag{6a}$$

where

$$f_{iR} = \sum_{j=1}^{m} f_{ij} \quad \text{for } i = 1, 2, \ldots, m \tag{6b}$$

$$f_{Cj} = \sum_{i=1}^{m} f_{ij} \quad \text{for } j = 1, 2, \ldots, m \tag{6c}$$

and N is the total number of transitions. Another test, recommended by Anderson and Goodman (1957), is the likelihood-ratio statistic

$$S_3 = 2 \sum_{i=1}^{m} \sum_{j=1}^{m} f_{ij} \log_e \left( \frac{f_{ij}}{e_{ij}} \right) \tag{7}$$
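Equations (5) to (7) can be sketched in code as follows. The function name is hypothetical, and cells with a zero tally are taken to contribute nothing to S3. The example tally is the 240 cm matrix as read from the flattened Table 2 (rows taken as the state before a transition), so it is an interpretation of the printed table rather than independent data.

```python
import math


def independence_statistics(f):
    """Return (S2, S3, DF) for an m-by-m tally matrix."""
    m = len(f)
    row = [sum(f[i]) for i in range(m)]                        # f_iR
    col = [sum(f[i][j] for i in range(m)) for j in range(m)]   # f_Cj
    n = sum(row)                                               # N
    s2 = 0.0
    s3 = 0.0
    for i in range(m):
        for j in range(m):
            e = row[i] * col[j] / n                            # Equation (6a)
            if e > 0:
                s2 += (f[i][j] - e) ** 2 / e                   # Equation (5)
            if f[i][j] > 0:
                s3 += 2.0 * f[i][j] * math.log(f[i][j] / e)    # Equation (7)
    return s2, s3, (m - 1) ** 2


# 240 cm tally matrix as read from Table 2 (states A, B, C, D).
tally_240 = [[31, 18, 11, 7],
             [21, 72, 7, 19],
             [4, 10, 3, 6],
             [11, 19, 2, 4]]
s2, s3, dof = independence_statistics(tally_240)
```

With this reading of the table the sketch gives values close to the S2 = 33.4 and S3 = 32.9 reported in Table 2 for the 240 cm interval, with 9 degrees of freedom.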

Under the null hypothesis of independence both these statistics become asymptotically chi-square distributed with $(m - 1)^2$ DF. If the results of these two tests lead to the rejection of the null hypothesis, then it only implies the presence of some dependence between state transitions. To show that this dependency is a conventional first order Markov dependency it is necessary to prove that the thicknesses of lithologies follow geometric distributions. Chi-square and Kolmogorov-Smirnov goodness-of-fit tests (Ang and Tang, 1975) can be performed to check this.

Stratigraphy modelling using first order semi-Markov chains

In structuring an embedded transition probability matrix from observational data, only the lithology transitions are tallied. Hence, the number of entries in the embedded matrix is smaller than for the equal interval matrix. As a result, the off-diagonal probabilities have different numerical values, but the relative probabilities for $p_{ij}$ where i ≠ j are the same in the two types of matrices. For semi-Markov chains, the thicknesses of lithologies need not follow geometric distributions. In order to use the semi-Markov model, the stratigraphy data should satisfy the Markov property for embedded Markov chains.
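To illustrate how an embedded tally differs from an equal-interval tally, the sketch below collapses runs of identical observations into single beds and tallies only the changes of lithology. It assumes the column is available as a string of single-letter labels observed at equal spacing; the function names and the example column are hypothetical.

```python
from itertools import groupby


def collapse_runs(sequence):
    """Return [(lithology, number_of_points), ...] for consecutive beds."""
    return [(state, sum(1 for _ in run)) for state, run in groupby(sequence)]


def embedded_tally(sequence, states):
    """Tally bed-to-bed transitions only; the diagonal stays zero."""
    index = {s: k for k, s in enumerate(states)}
    m = len(states)
    f = [[0] * m for _ in range(m)]
    beds = [state for state, _ in collapse_runs(sequence)]
    for prev, nxt in zip(beds, beds[1:]):
        f[index[prev]][index[nxt]] += 1
    return f


# Hypothetical column with four lithologies A-D, observed at equal spacing.
column = "AAABBBBCBBDAAB"
runs = collapse_runs(column)
E = embedded_tally(column, "ABCD")
```

The run lengths, multiplied by the observation spacing, give the bed thicknesses whose distributions are fitted separately in a semi-Markov model.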

Tests for Markov property of an embedded Markov chain
In this case the Pearson statistic takes the following form (Yu, 1984)

$$S_4 = \sum_{i=1}^{m} \sum_{\substack{j=1 \\ j \neq i}}^{m} \frac{(f_{ij} - e_{ij})^2}{e_{ij}} \tag{8a}$$

where

$$e_{ij} = \begin{cases} a_i b_j & \text{for } i \neq j \\ 0 & \text{for } i = j \end{cases} \tag{8b}$$

Values of $a_i$ and $b_j$ are computed using the iterative scheme given below (Yu, 1984).

Step 1:

$$a_i^{(0)} = \frac{f_{iR}}{\sum_{j=1}^{m} \delta_{ij}} \quad \text{for } i = 1, 2, \ldots, m \tag{9a}$$

where

$$\delta_{ij} = \begin{cases} 0 & \text{for } i = j \\ 1 & \text{for } i \neq j \end{cases} \tag{9b}$$

Step 2n (n ≥ 1):

$$b_j^{(2n-1)} = \frac{f_{Cj}}{\sum_{i=1}^{m} \delta_{ij}\, a_i^{(2n-2)}} \quad \text{for } j = 1, 2, \ldots, m \tag{9c}$$

Step 2n + 1 (n ≥ 1):

$$a_i^{(2n)} = \frac{f_{iR}}{\sum_{j=1}^{m} \delta_{ij}\, b_j^{(2n-1)}} \tag{9d}$$

The iteration can be repeated until the required accuracy is obtained. The likelihood-ratio statistic in this case is given by (Yu, 1984):

$$S_5 = 2 \sum_{i=1}^{m} \sum_{\substack{j=1 \\ j \neq i}}^{m} f_{ij} \log_e f_{ij} - 2 \sum_{i=1}^{m} f_{iR} \log_e a_i - 2 \sum_{j=1}^{m} f_{Cj} \log_e b_j \tag{10}$$

Under the null hypothesis that the state transitions come from independent multinomial trials, both statistics approximately follow the chi-square distribution with $(m - 1)^2 - m$ degrees of freedom. Results should indicate rejection of the null hypothesis in order to satisfy the Markov property.
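The iterative scheme of Equations (9a) to (9d) and the two statistics can be sketched as below. Several choices are assumptions of the sketch: a fixed number of iterations replaces an accuracy check, every lithology is assumed to have at least one transition in and one out, and S5 is computed in the equivalent form 2 Σ f log(f / e) over the off-diagonal cells. The function names and the example tally are hypothetical.

```python
import math


def quasi_independence_fit(f, n_iter=500):
    """Fit e_ij = a_i * b_j (i != j) by alternating row and column scaling."""
    m = len(f)
    row = [sum(f[i][j] for j in range(m) if j != i) for i in range(m)]  # f_iR
    col = [sum(f[i][j] for i in range(m) if i != j) for j in range(m)]  # f_Cj
    a = [row[i] / (m - 1) for i in range(m)]                            # (9a)
    b = [0.0] * m
    for _ in range(n_iter):
        b = [col[j] / sum(a[i] for i in range(m) if i != j)
             for j in range(m)]                                         # (9c)
        a = [row[i] / sum(b[j] for j in range(m) if j != i)
             for i in range(m)]                                         # (9d)
    return a, b


def embedded_statistics(f, n_iter=500):
    """Return (S4, S5, DF) for an embedded tally matrix with zero diagonal."""
    m = len(f)
    a, b = quasi_independence_fit(f, n_iter)
    s4 = 0.0
    s5 = 0.0
    for i in range(m):
        for j in range(m):
            if i == j:
                continue
            e = a[i] * b[j]                                  # Equation (8b)
            s4 += (f[i][j] - e) ** 2 / e                     # Equation (8a)
            if f[i][j] > 0:
                s5 += 2.0 * f[i][j] * math.log(f[i][j] / e)  # same as (10)
    return s4, s5, (m - 1) ** 2 - m


# Hypothetical embedded tally built as an exact product a_i * b_j off the
# diagonal, so both statistics should come out essentially zero.
exact = [[0, 1, 3, 1],
         [4, 0, 6, 2],
         [6, 3, 0, 3],
         [8, 4, 12, 0]]
s4, s5, dof = embedded_statistics(exact)
```

For four lithologies the degrees of freedom come out as 5, in agreement with the value quoted in the Application section.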

Application
Table 1 provides stratigraphy data from the Oficina Formation of eastern Venezuela (Scherer, 1968). These data were used to illustrate the modelling procedure given in the paper. The stratigraphy column given in Table 1 was divided into five equal subintervals to perform the test for homogeneity. The following results were obtained: degrees of freedom = 48; S1 = 55.5; α = 0.22. This shows that homogeneity can be accepted at a fairly high significance level. This allows the whole stratigraphic column to be treated as a single homogeneous section.

Next, the tally matrices, transition probability matrices, and chi-square statistics for an ordinary Markov chain were computed for interval sizes of 60, 120 and 240 cm. Results are given in Table 2. The results clearly show the influence of interval size on the transition probability matrix and on the chi-square values. As the interval size increases, $p_{ii}$ decreases. In this particular example, the influence is most pronounced on lignite and siltstone. A careful inspection of Table 1 shows that both lignite and siltstone have high frequencies of thicknesses less than 120 and 240 cm. For this example, an interval size of 60 cm seems a good choice. The values obtained for α clearly show the strong rejection of the null hypothesis of independence.

The transition probability matrix obtained for the interval size of 60 cm was used to compute the regular transition matrix. The results provided the following proportions of lithologies for the stratigraphic column: Sandstone = 0.27; Shale = 0.49; Siltstone = 0.12; Lignite = 0.12.

Frequency distributions for lithology thicknesses are given in Fig. 1. Thickness data were subjected to chi-square and Kolmogorov-Smirnov (K & S) goodness-of-fit tests for geometric distributions; the results are given in Table 3.

Table 2. Transition probability matrices and chi-square statistics for modelling stratigraphy by conventional Markov chains. In each matrix the rows give the state before a transition and the columns the state after it.

Interval size used for observation: 60 cm

       Tally matrix               Transition probability matrix
       A      B      C      D     A      B      C      D
A      104    3      8      9     0.84   0.03   0.06   0.07
B      5      74     8      14    0.05   0.73   0.08   0.14
C      5      6      13     7     0.16   0.19   0.42   0.23
D      10     18     2      21    0.20   0.35   0.04   0.41

S2 = 247.6; DF = 9; α < 0.005
S3 = 251.0; DF = 9; α < 0.005

Interval size used for observation: 120 cm

       Tally matrix               Transition probability matrix
       A      B      C      D     A      B      C      D
A      84     22     14     10    0.64   0.17   0.11   0.08
B      26     171    21     25    0.11   0.70   0.09   0.10
C      9      21     13     12    0.16   0.38   0.24   0.22
D      11     29     7      15    0.18   0.47   0.11   0.24

S2 = 171.4; DF = 9; α < 0.005
S3 = 162.3; DF = 9; α < 0.005

Interval size used for observation: 240 cm

       Tally matrix               Transition probability matrix
       A      B      C      D     A      B      C      D
A      31     18     11     7     0.46   0.27   0.16   0.11
B      21     72     7      19    0.18   0.60   0.06   0.16
C      4      10     3      6     0.17   0.44   0.13   0.26
D      11     19     2      4     0.30   0.53   0.06   0.11

S2 = 33.4; DF = 9; α < 0.005
S3 = 32.9; DF = 9; α < 0.005

[Figure 1: one panel each for Sandstone (state A), Shale (state B), Siltstone (state C) and Lignite (state D). Horizontal axis: thickness (cm). Each panel compares the observed relative frequencies with the theoretical (geometric) distribution.]

Fig. 1. Observed relative frequencies and geometric distribution fittings on lithology thickness data from Table 1.

Table 3. Results of goodness-of-fit tests on thickness of lithologies.

Lithology type   Chi-square value   DF   α      K & S value   DF   α
Sandstone        3.78               7    0.80   0.04          15   > 0.95
Shale            8.05               9    0.50   0.06          18   > 0.95
Siltstone        6.36               4    0.19   0.05          7    > 0.95
Lignite          0.99               2    0.62   0.03          5    > 0.95

Using the tally matrix, S4 and S5 were computed according to Equations (8) and (10), respectively. The associated degrees of freedom and α values were also determined. The results are given below.

S4 = 14.3; DF = 5; α = 0.015
S5 = 14.8; DF = 5; α = 0.012

The results show a rejection of the null hypothesis of independence. However, this rejection is not as strong as the rejection indicated by Table 2. Therefore, it can be concluded that the conventional Markov chain is better than the embedded Markov chain for modelling the stratigraphy data considered.

Conclusions
The paper provides a procedure to analyse cyclical stratigraphy data. If lithology transitions satisfy the Markov property, then the stratigraphy can be modelled using either first-order conventional Markov chains or first-order embedded Markov chains. The type of Markov chain which should be used depends on the structure of the stratigraphy. The statistical tests which should be performed to choose the proper type of Markov chain are given in detail. If lithology transitions do not show any Markov dependence, then the stratigraphy should be modelled by an independent multinomial model. Once the model is constructed, it can be used to generate stratigraphy using a Monte Carlo simulation. Characterization of stratigraphy is an essential element in any geological or geotechnical engineering analysis or design.
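As an illustration of the Monte Carlo use mentioned above, the sketch below generates a synthetic column from a conventional first-order transition matrix. It is only one possible scheme: the function name, the starting state, the seed argument and the three-lithology matrix are all hypothetical, and a semi-Markov simulation would additionally draw each bed thickness from its own fitted distribution.

```python
import random


def simulate_column(p, states, n_steps, start, seed=None):
    """Generate n_steps transitions of a first-order chain from `start`."""
    rng = random.Random(seed)
    index = {s: k for k, s in enumerate(states)}
    current = start
    column = [current]
    for _ in range(n_steps):
        # Draw the next lithology using the row of P for the current one.
        current = rng.choices(states, weights=p[index[current]], k=1)[0]
        column.append(current)
    return column


# Hypothetical three-lithology transition matrix; each row sums to one.
states = ["A", "B", "C"]
P = [[0.8, 0.1, 0.1],
     [0.2, 0.6, 0.2],
     [0.3, 0.3, 0.4]]
synthetic = simulate_column(P, states, n_steps=50, start="A", seed=1)
```

Each entry of the generated list represents one observation interval, so the simulated column height is the number of entries times the interval size used to estimate the matrix.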

Acknowledgements
USAE Waterways Experiment Station provided financial assistance for this study. This support is gratefully acknowledged. Any opinions, findings, conclusions, or recommendations expressed in this paper are those of the author and do not necessarily reflect the views of the Waterways Experiment Station. Sue Wiedenbeck, a graduate student in Systems Engineering at the University of Arizona, assisted with most of the calculations. The writer also gratefully acknowledges John B. Palmerton of the Waterways Experiment Station for his assistance and interest over the course of the study.

References
Ali, E.M., Wu, T.H. and Chang, N.Y. (1980) Stochastic model of flow through stratified soils, Journal of the Geotechnical Engineering Division, ASCE 106, 593-610.
Anderson, T.W. and Goodman, L.A. (1957) Statistical inference about Markov chains, Ann. Math. Statist. 28, 89-110.
Ang, A.H-S. and Tang, W.H. (1975) Probability Concepts in Engineering Planning and Design 1, John Wiley and Sons.
Ang, A.H-S. and Tang, W.H. (1984) Probability Concepts in Engineering Planning and Design 2, John Wiley and Sons.
Benjamin, J.R. and Cornell, C.A. (1970) Probability, Statistics, and Decision for Civil Engineers, McGraw-Hill.
Billingsley, P. (1961) Statistical methods in Markov chains, Ann. Math. Statist. 32, 12-40.
Ethier, V.G. (1975) Application of Markov analysis to the Banff Formation (Mississippian), Alberta, Mathematical Geology 7, 47-61.
Goodman, L.A. (1968) The analysis of cross-classified data: independence, quasi-independence, and interactions in contingency tables with and without missing entries, Jour. Amer. Statist. Assoc. 63, 1091-131.
Harbaugh, J.W. and Bonham-Carter, G. (1970) Computer Simulation in Geology, John Wiley and Sons.
Krumbein, W.C. and Dacey, M.F. (1969) Markov chains and embedded Markov chains in geology, Journal of Mathematical Geology 1, 79-96.
Scherer, W. (1968) Application of Markov chains to cyclical sedimentation in the Oficina Formation, eastern Venezuela, unpublished MS thesis, Northwestern University, Evanston, Illinois.
Yu, J. (1984) Tests for quasi-independence of embedded Markov chains, Journal of Mathematical Geology 16, 267-82.

Department of Mining and Geological Engineering, University of Arizona, Tucson, Arizona 85721, USA
Received 30 May 1986

PINNADUWAH H.S.W. KULATILAKE