
Sedimentology - Elsevier Publishing Company, Amsterdam - Printed in The Netherlands

STATISTICAL MODELS IN SEDIMENTOLOGY¹


W. C. KRUMBEIN

Northwestern University, Evanston, Ill. (U.S.A.)

(Received September 12, 1967)

SUMMARY

Three stages of statistical development can be recognized in sedimentology. The first is descriptive statistics, in which the sample is the object of interest, and the second is analytical statistics, in which the population assumes major importance. A very large variety of statistical techniques is available for estimating mean values, degrees of variability, tests of differences among population means, linear relations (correlations) among the variables, and ways of evaluating areal variations (trends) in sedimentary phenomena. The third stage of statistical development is the application of stochastic process models to sedimentology, in which the objective is to discern the probabilistic elements in sedimentary processes, in part by simulation with the high-speed computer. Stochastic process models thus provide one way of examining sedimentary processes through time or over an area. In conjunction with deterministic models they provide a framework for exploring the underlying physical, chemical, and biological controls on sedimentary processes and deposits, with superimposed random fluctuations introduced by the built-in probabilistic mechanism.
INTRODUCTION

Applications of statistics in sedimentology can be divided into several categories, ranging from initial use of relatively simple charts, graphs, and tables for summarizing data, to advanced multivariate models and stochastic process models. I shall discuss these as three stages of evolution in statistical applications to sedimentary processes and sedimentary deposits. These stages overlap one another in time of application and from one aspect of sedimentology to another, but we may conveniently consider them in historical sequence. The first stage involves the development of methods for measuring sedimentary attributes (such as grain size and shape), and the development or adaptation of ways in which to present and interpret the resulting numerical data. In this first
¹ Paper presented by invitation at the Seventh International Sedimentological Congress, Reading, Berks., England, August, 1967. This work was supported by the Office of Naval Research (Geography Branch) under contract Nonr-1228(36), ONR Task No. 388-078. Reproduction in whole or in part is permitted for any purpose of the United States Government.

Sedimentology, 10 (1968) 7-23


stage the general approach may be referred to as descriptive statistics. The second stage of development uses formal statistical design and statistical inference in sedimentary analysis and interpretation. It is appropriately referred to as analytical statistics. The third stage of statistical evolution involves the use of stochastic process models. These are concerned with the patterns of behavior displayed by sedimentary processes and deposits through time or as they spread over an area. BARTLETT (1960, p.1) refers to this approach as the dynamic part of statistical theory (the statistics of change) as opposed to the static (i.e., conventional) statistical theory used in the first two evolutionary stages mentioned above. A stochastic process is defined (BARTLETT, 1960, p. 1) as some possible actual, e.g., physical process in the real world, that has some random or stochastic element involved in its structure. Stochastic process models represent an exciting development in statistical sedimentology. They provide, among other things, mechanisms for simulation of sedimentological processes and deposits through time or over areas, as recently described and computerized by HARBAUGH (1966). The main purposes of this paper are briefly to consider some aspects of descriptive and analytical statistics, and then to develop the new frameworks of thinking that have been introduced into sedimentology by stochastic process models.
DESCRIPTIVE AND ANALYTICAL STATISTICS

In concept, descriptive and analytical statistics are distinctly different. In the descriptive approach one asserts that a typical sample yields measurable facts about a sediment, and that within some given range of measurement error these facts form the logical basis for inferences about the deposit. Thus, it is argued, typical specimens may be collected on the basis of substantive judgment, from such localities as may reasonably be expected to yield maximum information about the problem at hand.

The analytical point of view asserts that in addition to variability introduced by measurement error, variability may also be introduced by sampling error. That is, the population from which a sediment sample is taken has some fixed value of mean grain size, but there is no objective way of judging whether a sample picked by personal judgment affords a good or poor estimate of the population mean. Without some randomization procedure in collecting the samples, there is no assurance that personal bias may not strongly influence the particular sample value that is observed. Thus, one important difference between descriptive and analytical statistics is that in the former the emphasis is on the sample, whereas in the latter the population is the target of interest. We may point up this difference by a fundamental dictum of analytical statistics that a sample is of value only for the insight it gives into the population of interest.

The tremendous expansion in analytical statistics from the late 1920s on, brought about by Sir Ronald Fisher's introduction of variance analysis, opened entirely new vistas in virtually all sciences. It was not until the mid-1940s, however,

that sedimentologists began to use these methods at all extensively. Allen, of the University of Reading, was among the first to apply analysis of variance to sedimentary problems. In North America, Griffiths and co-workers at Pennsylvania State University were leaders in developing and applying these methods. They demonstrated the advantages of using formal statistical models instead of taking the data as they came, which had characterized much of statistical endeavor in the first stage. The shift from sample to population brought out the importance of confidence intervals on means and variances, t-tests and F-tests for comparing suites of samples, and in general demonstrated the advantages of experimental design as a formal approach to sedimentary statistics.

One of the main contributions of analytical statistics to sedimentology is its emphasis on the importance of the variance of the population as well as the mean value in examining sedimentological data. A fictitious example, useful for class demonstration, is shown in Fig. 1. The two upper sets of pebbles (A and B) are noticeably different in their long dimensions, and simple inspection suggests that these could easily have come from two different well-sorted gravel beds. In the lower two sets of pebbles (C and D), however, the long dimensions in each group vary sufficiently among themselves so that these sets could easily have come from a single bed of poorly-sorted gravel. We can formalize these judgments statistically by comparing, for each pair, the variability between the two sets with the variability within the sets. Table I does this, and shows that for the top pair the variability (mean square) between the means is very much greater than the variability within the sets. For the lower pair the between-variability is very much smaller than the within-variability. It is evident from Fig. 1 and Table I that the pebbles in these two examples are

Fig. 1. Pebbles arranged in different ways to bring out the concept of within- and between-variance. In set A-B the between-variance is much larger than the within-variance; in set C-D the reverse is true (see also Table I).
TABLE I
VARIANCE ANALYSIS OF PEBBLE DIAMETERS IN FIG. 1

Sets A and B

Source          Sum of squares   d.f.   Mean square   F¹
Between sets         696.6         1       696.60     44.5
Within sets          282.0        18        15.67
Total                978.6        19

Sets C and D

Source          Sum of squares   d.f.   Mean square   F¹
Between sets           7.6         1         7.60     <1
Within sets          971.0        18        53.94
Total                978.6        19

¹ Critical F at 95% = 4.41.

the same; they have merely been mixed up to illustrate the simple principle of variation between and within groups, which lies at the heart of all conventional analysis-of-variance models, and is the basis on which we make our intuitive judgments. An excellent student exercise is to have various mixtures prepared, checking them each time with analysis of variance (or with a t-test) until there is a perceptible difference between them. Students quickly grasp the principle, and improve their ability to judge similarities and differences by inspection even in the field.

Realization that variability is as important as the average entered some aspects of sedimentology even in its first stage of statistical development. In size-frequency distributions of sediments the average grain size and the degree of sorting (a measure of grain-size variability within a sample) were early recognized as having importance in sediment description and interpretation. However, the realization that the variability among sample means can itself be partitioned and tested for significance of geographic, stratigraphic, or environmental factors, had to wait until formal models of variance analysis appeared on the scene. These models require considerably more computation than was needed with individual samples, where the mean grain size and sorting (as well as skewness and kurtosis) can be estimated by simple graphic procedures.

The fact that sedimentary deposits are composed of myriads of grains or crystals, each with its own size, shape, and mineral composition, early forced the attention of sedimentologists on the frequency distributions, as mentioned above. The study of frequency distributions is by definition a statistical endeavor, and it is facilitated by use of numerical data. The extension of statistical analysis from initial study of frequency distributions to consideration of sampling problems, areal variations in sedimentary properties, and to the analysis of complex aggregates of multivariate data, is a normal and expected growth.
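The partition into between-set and within-set variability summarized in Table I can be sketched in a few lines of Python. This is a minimal illustration only, not the procedure used to prepare the table. The function names `partition_sums_of_squares` and `f_ratio` are hypothetical, and only the sums of squares, degrees of freedom, and critical F are taken from Table I, because the individual pebble measurements are not listed in the text.

```python
def partition_sums_of_squares(groups):
    """Split the total sum of squares into between-set and within-set parts."""
    values = [x for group in groups for x in group]
    grand_mean = sum(values) / len(values)
    ss_between = 0.0
    ss_within = 0.0
    for group in groups:
        mean = sum(group) / len(group)
        ss_between += len(group) * (mean - grand_mean) ** 2
        ss_within += sum((x - mean) ** 2 for x in group)
    return ss_between, ss_within


def f_ratio(ss_between, ss_within, df_between, df_within):
    """Return (mean square between, mean square within, F)."""
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return ms_between, ms_within, ms_between / ms_within


CRITICAL_F = 4.41  # critical F at the 95% level for 1 and 18 d.f. (Table I)

# Sums of squares and degrees of freedom as listed in Table I.
f_ab = f_ratio(696.6, 282.0, 1, 18)[2]
f_cd = f_ratio(7.6, 971.0, 1, 18)[2]

print(f"Sets A and B: F = {f_ab:.1f}, exceeds critical F: {f_ab > CRITICAL_F}")
print(f"Sets C and D: F = {f_cd:.2f}, exceeds critical F: {f_cd > CRITICAL_F}")
```

When raw measurements are available, `partition_sums_of_squares` supplies the two sums of squares directly; in both halves of Table I they add up to the same total (978.6) because the same twenty pebbles are merely regrouped.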
IMPACT OF THE COMPUTER

Some sedimentologists began experimenting with the computer in 1955 and 1956, and by 1960 a number of computer programs was available for geological problems. Before 1965 the computer had become an essential instrument in sedimentological research among the statistically-minded because of its capability for handling very complex computations very quickly. The high-speed computer has unquestionably influenced the structure of sedimentological research by encouraging a multivariate approach. A typical study of a decade ago might involve three or four variables (such as grain size, sorting, grain shape, heavy mineral content and a few others), but it is not uncommon now to measure twenty or more attributes, including not only sedimentary characteristics, but observations on faunal content, trace elements, vector properties, and (for present-day environments) various process elements as well.

Statistical methods in sedimentology have kept abreast fairly well with the increasing needs for advanced statistical models applicable to multivariate studies. Table II is an attempt to summarize the major statistical objectives in the left column, with a list of methods on the right. The upper part of the table covers stages 1 and 2 of statistical applications, and the bottom portion refers to stage 3, which is discussed later in the text. For stages 1 and 2 most methods are concerned with the third block of the table (i.e., associations among measurable attributes of sedimentary processes or deposits). These attest to the importance given to the search for relations among sedimentological variables.

The wide choice of methods available for analysis of multivariate data raises questions regarding the optimum method to be used for any given problem. At the time of writing, for example, much discussion centers on factor analysis and the kinds of problems to which it is particularly suited. Similarly, in trend analysis the choice of the polynomial or the double Fourier series model involves decisions regarding the nonperiodic or periodic nature of the observed surfaces that are to be fitted. A striking feature of Table II is the pervasiveness of the general linear model in its geological and sedimentological applications. Analysis of variance, multiple correlation and regression, factor analysis, discriminant function analysis, and the polynomial and double Fourier models for map analysis all are variants of this same model.

The literature on statistical applications in virtually all fields of geology is so large that it is not feasible in this paper to include a comprehensive bibliography of applications, or of the large number of computer programs now available. It may be mentioned, however, that the State Geological Survey of Kansas (Lawrence, Kans., U.S.A.) published a Computer Contribution Series, which includes program listings for many of the methods in Table II. Similarly, the Sedimentological Research Laboratory of the University of Reading (Reading, Berks., U.K.) has copies of basic
TABLE II
SUMMARY OF STATISTICAL OBJECTIVES AND METHODS IN SEDIMENTOLOGY

Stages 1 and 2

Objective: Estimation of population mean and variance. General method is based on computation of moments or quartile measures.
Methods:
    frequency distribution analysis
    confidence intervals
    chi-square tests for goodness of fit
    nested analysis of variance

Objective: Detection of similarities or differences between population parameters. General method is analysis of variance.
Methods:
    t-tests
    analysis of variance
        single factor
        row-column
        multifactor
        nested models
    discriminant analysis

Objective: Estimation of associations among measurable properties of populations. General methods are correlation and regression to sort out important variables, or to develop predictor models. Classification is also an objective here.
Methods:
    simple linear correlation
    simple linear regression
    multiple and partial correlation
    multiple linear regression
    polynomial regression
    stepwise regression
    R-mode and Q-mode factor analysis
    cluster analysis
    time-series analysis
        autocorrelation
        spectral analysis
    discriminant analysis

Objective: Detection and evaluation of patterns of areal variation in measurable properties of populations. General method is trend analysis.
Methods:
    polynomial functions
        gridded map data
        nongridded data
    double Fourier series
        gridded map data
        nongridded data
    double power spectra
    canonical correlation
    stepwise regression

Stage 3

Objective: Description and analysis of probabilistic elements in sedimentary processes and deposits. General method uses stochastic processes.
Methods:
    systems analysis
    simulation studies
    use of transition probabilities and probability trees
    use of conditional probabilities
    independent-events trials

programs and maintains contact with computer-oriented sedimentologists. As for the statistical methods included in Table II, and the models on which they are based, these may be found in standard statistical texts and references, and in their geological context in MILLER and KAHN (1962) and KRUMBEIN and GRAYBILL (1965). Some

aspects of geological data processing are covered in SMITH (1966), and a forthcoming text by HARBAUGH and MERRIAM (1968) covers a wide variety of computer methods for stratigraphic and structural analysis. Another recent book, by GRIFFITHS (1967), is more specifically oriented toward methodological and statistical aspects in sedimentology.

The last class of objectives in Table II covers stochastic processes, as applied to the study of sedimentary processes through time or as they spread over an area (as in transgression-regression cycles), to discern the patterns of behavior of sediments successively deposited either vertically or laterally. These methods are new in sedimentology, and they are included in the table for completeness. Definitions and an example are discussed later in this report.
SEDIMENTOLOGICAL MODELS

In sedimentology one objective of analytical statistics is identification of the major variables involved in sedimentary processes. Once these are sorted out, the statistical generalizations need to be tied in with underlying physical, chemical, and biological controls that are operative during weathering, erosion, transportation, deposition, and lithification. It was recognized long ago that sets of samples collected along streams or beaches display the kinds of changes that sediments undergo during transportation. Such studies are facilitated by quantitative data, and even in the first stage of statistical study graphs of mean grain size, or of heavy mineral content of sands along lines of transport, proved to be very instructive. Such diagrams aided in development of conceptual models (mental images) of the processes taking place. As interest in sedimentary environments grew, it became apparent that environmental factors as well as the attributes of sediments had to be studied in order to understand why the deposits show the sorts of changes they display. Out of these considerations emerged the concept of a process-response (cause and effect) model, in which relations between process elements and sedimentary responses can be shown at least qualitatively. Process-response models may be implemented in various ways. The commonest approach is to isolate some specific process, such as stream transportation, and to study it in controlled flume experiments, or purely theoretically from underlying principles of physics. A field approach might be to find an ideal stream (i.e., where a sediment can be followed from a known source with minimum disturbances due to tributaries), in which to observe and measure attributes of the channel, of the flowing water, and of the sedimentary materials in transit or in stream deposits. 
The data can be examined in terms of physical theory, or purely statistically in terms of a predictor model (see Table II) based on multiple regression, that relates a particular response (such as pebble roundness) to the several most important independent variables in the process. Stream transportation and deposition can also be examined in the framework of systems analysis. BEER (1959, p. 7) defines a system as any cohesive collection of

items that are dynamically related. The items (elements of a model) may be considered as points connected by a network of relationships, which can be illustrated by drawing lines between the points. The pattern of lines may change through time as the system interacts within itself, in part through feedback mechanisms and other controls on the system. Systems analysis is essentially new in sedimentology, but AMOROCHO and HART (1964) give an excellent discussion with diagrams of this approach in hydrology, which is transferable to sedimentological stream studies. It is evident from the foregoing that a wide spectrum extends from the statistical study of a single particle-size frequency curve to the analysis of a complex multivariate data matrix. Moreover, there is a complete range from qualitative conceptual models through statistical predictor models, to increasingly analytical models that seek to explain a given process in terms of physically-meaningful functional relationships. Although many sedimentological models are expressed in qualitative form, there normally is a background of numerical data that supports the generalizations in the model. Thus, a diagrammatic model showing sediment dispersal paths by arrows of varying thickness to indicate relative contributions from several source areas is in effect a generalization that may involve detailed study of particle size distributions, cross-bedding directions, heavy mineral content of sand or lithologic types of pebbles, and so on. The point of emphasis here is that a given sedimentological model tends to integrate large amounts of information, some of which may be purely qualitative (color of beds, for example), some may be statistical (mean grain size, etc.), and some may involve application of physical principles of stream transportation. Hence, distinction between purely statistical models and other kinds of sedimentological models tends to become blurred. 
It would be instructive to compile a bibliography of sedimentological models, with references cross-indexed according to the structure of the models proposed. Such an endeavor is beyond the scope of this paper, but there is one aspect of model building that does bring out an important distinction between ways of structuring sedimentological data. This brings us to the final stage of our discourse.
THE THIRD STAGE OF STATISTICAL APPLICATION: PROBABILISTIC MODELS

As multivariate statistical models come into more general use, sufficient data become available for experimentation with new kinds of models. Among these, as mentioned in the introduction, are stochastic process models. In order to indicate their relation to other kinds of mathematical models, distinction may be made between deterministic models¹ and probabilistic models as devices for examining sedimentological phenomena.
¹ I use the term deterministic model for the conventional mathematical expression of a process without stochastic components, so that, given a set of values for the independent variables, the magnitude of the dependent variable can be computed exactly. Terms that may be synonymous include functional models, mechanistic models, analytical models, actualistic models, and perhaps others.


The goal of most scientific endeavor is to explain a phenomenon so thoroughly that every state of a system in time or space can be exactly predicted. In conventional language this means relating each effect to its cause in a continuum of space and time. The classical approach to this goal is that of mathematical physics, which starts from basic physical principles and derives a quantitative expression for the phenomenon under study. This expression commonly is a differential equation, which when integrated with suitable boundary conditions, becomes a deterministic model, whose coefficients, exponents, and other parameters are constants for the conditions stipulated in the model. By inserting selected values of the independent variables into the equation, the value of the dependent variable can be exactly predicted. Where the underlying process is thoroughly understood and well documented by observational or experimental data, it is possible to set up deterministic models for sedimentological phenomena. In a very large number of problems, however, it is not possible to evaluate all of the underlying controls on a sedimentary process. A good example is afforded by cyclical successions of sediments, in which an underlying pattern of rock succession can be discerned, but in which the actual sequence of sedimentary types and thicknesses cannot be exactly predicted. No doubt the placement and thickness of each bed was determined by one or more controls in the complex of processes that went on, but whether the control was exerted by local uplift in the source area, by lateral shifts of environment due to remote tectonic events in ocean basins, or by shifting streams that bring varying supplies and kinds of debris to the moving strand line may be indeterminate. Such controls in ancient rock successions are never directly observable, nor may there, in many cases, be sufficient evidence in the rocks to point unerringly to a specific controlling factor. 
Where a system is highly complex, or where the ultimate cause of each variation cannot be traced and evaluated, a probabilistic model provides a device by which the observed succession of beds can be structured in such a way as to bring

[Figure 2 diagram. Recoverable labels: MODELS; INDEPENDENT-EVENTS MODELS; STOCHASTIC PROCESS MODELS; PATH-DEPENDENT DETERMINISTIC MODELS; (STATISTICAL); (COMPLETELY RANDOM); (PARTIAL DEPENDENCE); (FULL DEPENDENCE); MEMORY.]

Fig. 2. Diagram of models in terms of dependence and memory.



out the major patterns of behavior. Because the variations are not exactly predictable in the sense that they cannot be directly related to a deterministic mechanism, these variations may appropriately be designated as random. But randomness is not to be thought of in this connection as completely haphazard, without rhyme or reason. A random variable is a mathematical entity that arises from probabilistic mechanisms, just as conventional nonstochastic (systematic) variables arise from deterministic mechanisms.

In a completely deterministic model the outcome of an experiment is exactly predictable, as stated above. In a probabilistic model the outcome of an experiment is controlled by an underlying set of probabilities, and hence it is predictable only in terms of relative likelihoods associated with a set of possible outcomes. Models that contain both deterministic and probabilistic elements define processes that follow the systematic path of the deterministic core, with superimposed random fluctuations introduced by the built-in probabilistic mechanism.

The probabilistic mechanism can be built into physical processes in various ways. At one extreme is the independent-events model, in which each experimental outcome is completely independent of preceding outcomes. This is followed by a broad spectrum of models showing partial dependence of each outcome on preceding outcomes, to the other extreme of the completely path-dependent deterministic model with no probabilistic element at all (see Fig. 2). It is the broad band in the middle of this spectrum that is being examined by statistically-minded geologists in various fields. Out of a variety of stochastic process models the first-order Markov chain has been more widely used in geology than any other, and it is illustrated here.
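The spectrum from independent events to full dependence can be made concrete with a small Python sketch. Everything in it is hypothetical and chosen only for illustration: the three states A, B, and C, the ideal cycle A-B-C, the weights, and the function names are assumptions and do not come from any data set discussed here. The middle case follows a deterministic cycle but departs from it at random, so each outcome depends only partially on the preceding one.

```python
import random

STATES = "ABC"  # hypothetical sediment types, for illustration only


def independent_events(n, weights, rng):
    """No memory: every outcome is drawn from the same fixed probabilities."""
    return "".join(rng.choices(STATES, weights=weights, k=n))


def deterministic_cycle(n, start="A"):
    """Full dependence: the starting state fixes the whole sequence."""
    first = STATES.index(start)
    return "".join(STATES[(first + k) % len(STATES)] for k in range(n))


def cycle_with_random_fluctuation(n, p_follow, rng, start="A"):
    """Deterministic core (the cycle A-B-C) with a probabilistic element.

    Each step follows the cycle with probability p_follow; otherwise the
    next state is drawn at random from all states.
    """
    sequence = [start]
    for _ in range(n - 1):
        ideal_next = STATES[(STATES.index(sequence[-1]) + 1) % len(STATES)]
        if rng.random() < p_follow:
            sequence.append(ideal_next)
        else:
            sequence.append(rng.choice(STATES))
    return "".join(sequence)


rng = random.Random(0)  # fixed seed so that a run can be repeated
print(independent_events(12, [0.5, 0.3, 0.2], rng))
print(deterministic_cycle(12))
print(cycle_with_random_fluctuation(12, 0.8, rng))
```

Setting `p_follow` to 1.0 recovers the purely deterministic cycle, and lowering it moves the model toward the random end of the spectrum.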
MARKOV MODELS

Markov chains occupy a position of partial dependence in the spectrum from independent-events models to deterministic models. Specifically, for a first-order Markov chain, the state of the system at time t_r is dependent only on the state of the system at time t_{r-1}. Hence, this model has a very short memory in contrast to the long memory of the classical deterministic model, where the state at time t_r depends upon all previous states. In this framework, the independent-events model has no memory at all.

The geological literature on Markov chains is expanding rapidly. They have been used in geomorphology, stratigraphy, sedimentology, paleontology, and petrology. The following list covers sedimentology and stratigraphy, the latter because most papers consider cyclical deposits. ALLÈGRE (1964), SCHWARZACHER (1964), CARR et al. (1966), GRIFFITHS (1966), HARBAUGH (1966), KRUMBEIN (1967), and POTTER and BLAKELY (1967b) are concerned with stratigraphic examples, and POTTER and BLAKELY (1967a) apply the model to synthesis of stream deposits. VISTELIUS (1949) was the first to use Markov models in stratigraphy, and his recent papers include VISTELIUS and FEIGELSON (1965) and VISTELIUS and FAAS (1965). References to papers in other geological fields are given in KRUMBEIN (1967). KEMENY and SNELL (1960) give an elementary introduction to the mathematics of Markov chains.

TABLE III
GENERALIZED TRANSITION PROBABILITY MATRIX, p_ij

                 To j →
From i ↓      1       2       3
    1        p_11    p_12    p_13
    2        p_21    p_22    p_23
    3        p_31    p_32    p_33

A Markov chain is normally structured as a transition probability matrix, which controls the transitions from a given state to itself or to another state. The elements of this matrix are the probabilities p_ij, which represent the probability of moving to state j at time t_r, given that the process is in state i at time t_{r-1}. These probabilities are the driving forces on the transitions, and the model is appropriately classified as a statistical model. A three-state transition probability matrix is shown in Table III. The diagonals, p_11, p_22, and p_33, represent the probability of a given state being succeeded by itself, whereas the off-diagonal elements p_12, p_13, etc., represent the probabilities of transitions from one state to another.

In sedimentology the states of a system may be defined as the kinds of sediments present in a given deposit. Thus in a stream deposit made up of massive sand, cross-bedded sand, and ripple-bedded sand, these sedimentary types may be designated as states A, B, and C, respectively. The transition matrix for such a deposit can be developed in several ways. One is to record the state of the system at short but fixed vertical intervals. If the observed sequence is AABBBABBCB..., then the first transition is from A to itself, followed by a transition from A to B, succeeded by several transitions from B to itself, leading to a transition to C, and so on. This method is described in KRUMBEIN (1967).

A second way of structuring the transition matrix is to note only the changes from one sediment type to another, and to keep a separate record of the thicknesses of each type. Thus, the preceding sequence would be structured simply as ABABCB..., representing transitions from massive sand (A) to cross-bedded sand (B), back to massive sand followed by cross-bedded sand, and then by ripple-bedded sand (C), and so on. In this instance the diagonal probabilities are zero. CARR et al. (1966) as well as POTTER and BLAKELY (1967a) describe this procedure, including a variant with multistory lithologies that introduce diagonal probabilities greater than zero.

These different ways of setting up transition probability matrices indicate, among other things, that the Markov model is flexible, and may be structured in various ways. Thus, the chain may be second-order or higher, in which case events at times t_{r-2}, t_{r-3}, ..., may be examined in addition to t_{r-1} in setting up the transition matrix. ALLÈGRE (1964) describes first- and second-order chains, and PATTISON (1965) used a sixth-order chain in a study of hourly rainfall rates. In addition to

flexibility in these respects, Markov chains may be structured as a system operating at equal or unequal intervals of time, as points along a line, or as unit areas over a surface. The states may also be defined in various discretized ways, so that even a continuous variable (particle size for instance) may be arranged as discrete diameter states.
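The two ways of structuring a transition matrix described above can be sketched in Python. This is a minimal illustration under stated assumptions: the function names are hypothetical, and only the ten observations AABBBABBCB quoted in the text are used, so the resulting probabilities describe that short fragment and nothing more.

```python
from collections import Counter
from itertools import groupby


def transition_counts(sequence):
    """Tally each (from, to) pair of successive states."""
    return Counter(zip(sequence, sequence[1:]))


def transition_matrix(sequence, states):
    """Convert the tallies to row-wise transition probabilities."""
    counts = transition_counts(sequence)
    matrix = {}
    for i in states:
        row_total = sum(counts[(i, j)] for j in states)
        matrix[i] = {
            j: (counts[(i, j)] / row_total if row_total else 0.0)
            for j in states
        }
    return matrix


def collapse_runs(sequence):
    """Keep only the changes of state (the second structuring method)."""
    return "".join(state for state, _ in groupby(sequence))


observed = "AABBBABBCB"  # fixed-interval record quoted in the text
fixed_interval = transition_matrix(observed, "ABC")
changes_only = transition_matrix(collapse_runs(observed), "ABC")

print(collapse_runs(observed))
for state in "ABC":
    print(state, fixed_interval[state], changes_only[state])
```

In the collapsed record no state can follow itself, so every diagonal probability in `changes_only` is zero, which is the property noted in the text for the second method.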
UPPER CAMBRIAN GLAUCONITIC SANDSTONES IN A MARKOV FRAMEWORK

Transition probability matrices have interesting properties when they are examined for the patterns of sedimentary successions that they disclose. The Upper Cambrian Mazomanie Formation of Wisconsin is dominantly white- to buff-colored quartzose sandstone, with abrupt incursions of dark green glauconitic sandstone (the Reno Formation) that seem to occur sporadically through the section. Field evidence suggests a cyclical pattern that starts with clean quartzose sandstone (A), followed upward by light buff dolomitic quartzose sandstone (B), and succeeded by glauconitic sandstone (C) which also is followed upward by dolomitized glauconitic sandstone (D).

The seemingly sporadic positioning of the glauconite beds and the abruptness of their occurrence suggested that relationships between the white and green sandstones might be brought out by structuring the data as a first-order Markov chain (COATES, 1967). A transition matrix, based on 103 observations at 1-foot vertical intervals through a continuous outcrop, is shown in Table IV, with the four states arranged in the presumed order of cyclical succession. The matrix clearly discloses two loops involving the white and buff sandstones (states A and B), and the green sandstone (states C and D).

The following points regarding the Mazomanie are pertinent for interpreting the matrix. In the first place, the white and buff sandstone represents on the average about 69% of the section, with green glauconitic sand making up the other 31%. The average bed thickness of the four kinds of sandstone is 1.8 ft. for clean white sandstone, 1.8 ft. for the buff dolomitic sandstone, 1.4 ft. for clean glauconitic sandstone, and 2.1 ft. for the dolomitic glauconitic sandstone. These relative thicknesses are fairly well shown by the probabilities in the diagonal

TABLE IV

TRANSITION PROBABILITY MATRIX OF MAZOMANIE (UPPER CAMBRIAN) FORMATION, SOUTHERN WISCONSIN
(After COATES, 1967)

From state                                     To state
                                               A      B      C      D
A  White quartzose sandstone                   0.47   0.47   0.03   0.0?
B  Buff dolomitic sandstone                    0.33   0.44   0.17   0.06
C  Green glauconitic sandstone                 0.23   0.15   0.31   0.31
D  Green dolomitic-glauconitic sandstone       0.21   0.10   0.16   0.53

of Table IV, in which state C is smallest, states A and B are intermediate, and state D is thickest. Thus the prominence of the green beds when they occur is attributable to their greater average thickness, despite their smaller proportion in the whole section. The off-diagonal elements in Table IV are instructive also. The transitions AA, AB, BA, and BB form a compact group, such that the probability of moving from state A to state C or D is very small, and the probability of moving from state B either to state C or D is not much larger. What this means is that when the system is in the white to buff sandstone loop it tends to remain there, but once it gets out (usually by transition BC in the ideal cycle) it tends to hover in states C and D, although the passage back to A or B is notably greater than the reverse. If the system moves to D, the relatively large probability (0.53) of staying there tends to produce the thick bursts of green sand in the dominantly white to buff sections. What this example demonstrates is that by structuring the data into a transition probability matrix, relationships observed on the outcrop become much clearer, and the pattern of succession is more apparent. Naturally, the question arises whether the process is Markovian or not; fortunately, statistical tests are available (KRUMBEIN, 1967; POTTER and BLAKELY, 1967a) and the data in Table III safely satisfy the criterion. When a Markov property is present the transition matrix may be used to simulate the cycles, and the long-term equilibrium proportions of each sediment type can be computed. Fig. 3 shows part of the actual section and part of a computer simulation. The agreement is evident, although such agreement does not necessarily demonstrate the adequacy of the model. Of equal importance is this question: given that the process is Markovian, what does this mean geologically, in terms of sedimentary environment, source area of the quartz and glauconite, and so on?
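The link between the diagonal of the matrix and bed thickness, and the computation of equilibrium proportions, may be made concrete with a short sketch. It is an illustration in Python and not the procedure of COATES (1967). It rests on two assumptions: the A-to-D entry, which is illegible in the table as reproduced, is set to 0.03 so that row A sums to 1, and the chain is treated as strictly first-order, so that the number of consecutive 1-ft steps spent in state i is geometric with mean 1/(1 - p_ii). The function names are hypothetical.

```python
# Transition matrix of Table IV (rows = from-state, columns = to-state).
# ASSUMPTION: the A-to-D entry is illegible in the source table; 0.03 is
# used here only because it makes row A sum to 1.
STATES = ["A", "B", "C", "D"]
P = [
    [0.47, 0.47, 0.03, 0.03],
    [0.33, 0.44, 0.17, 0.06],
    [0.23, 0.15, 0.31, 0.31],
    [0.21, 0.10, 0.16, 0.53],
]
STEP_FT = 1.0  # observations were taken at 1-foot vertical intervals


def mean_bed_thickness(matrix, step=STEP_FT):
    """Expected run length in each state of a first-order chain:
    step / (1 - p_ii), the mean of a geometric distribution."""
    return [step / (1.0 - matrix[i][i]) for i in range(len(matrix))]


def equilibrium(matrix, iterations=200):
    """Approximate the long-term state proportions by repeatedly
    multiplying a starting distribution by the transition matrix."""
    n = len(matrix)
    dist = [1.0 / n] * n
    for _ in range(iterations):
        dist = [sum(dist[i] * matrix[i][j] for i in range(n)) for j in range(n)]
    return dist


thickness = mean_bed_thickness(P)
proportions = equilibrium(P)
for name, t, p in zip(STATES, thickness, proportions):
    print(f"state {name}: mean thickness {t:.1f} ft, equilibrium share {p:.2f}")
```

Under these assumptions the computed run lengths rank the states as the text does (C thinnest, D thickest) and fall close to the reported field averages. The combined equilibrium share of states A and B comes out near the 69% quoted for the white and buff sandstone.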
[Figure 3: two stratigraphic columns, labeled OBSERVED and MARCHAIN. Legend: state A, quartzose ss; state B, dolo-qtz ss; state C, glauc ss; state D, dolo-glauc ss.]

Fig. 3. Observed and simulated stratigraphic sections of Mazomanie (Upper Cambrian) Formation. Section on right from matrix of Table IV with computer program Marchain (KRUMBEIN, 1967). Length of sections about 10 m.
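A simulated section of the kind shown in Fig. 3 can be sketched in a few lines. This is not the Marchain program, which was written in FORTRAN IV and is not reproduced in the paper. It is a minimal Python illustration with hypothetical function names, and it assumes that the illegible A-to-D entry of Table IV is 0.03, that the section starts in state A, and that a 10-m section corresponds to about 33 one-foot steps.

```python
import random

# Transition matrix of Table IV (rows = from-state, columns = to-state).
# ASSUMPTION: the illegible A-to-D entry is taken as 0.03 so row A sums to 1.
STATES = "ABCD"
P = [
    [0.47, 0.47, 0.03, 0.03],
    [0.33, 0.44, 0.17, 0.06],
    [0.23, 0.15, 0.31, 0.31],
    [0.21, 0.10, 0.16, 0.53],
]


def simulate_section(matrix, n_steps, start=0, seed=None):
    """Generate a vertical sequence of states, one per sampling step,
    drawing each next state from the row of the current state."""
    rng = random.Random(seed)
    state = start
    column = [state]
    for _ in range(n_steps - 1):
        state = rng.choices(range(len(matrix)), weights=matrix[state])[0]
        column.append(state)
    return column


def beds(column):
    """Collapse consecutive identical states into (state, thickness) beds,
    with thickness counted in sampling steps."""
    result = []
    for state in column:
        if result and result[-1][0] == state:
            result[-1] = (state, result[-1][1] + 1)
        else:
            result.append((state, 1))
    return result


# About 10 m of section at 1-ft steps is roughly 33 steps (assumption).
section = simulate_section(P, 33, seed=1967)
print("".join(STATES[s] for s in section))
for state, thick in beds(section):
    print(f"{STATES[state]}: {thick} ft")
```

The seed fixes the random stream so that a run can be repeated. Different seeds give different but statistically similar sections, which is why visual agreement with the outcrop does not by itself establish the adequacy of the model.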
Sedimentology, 10 (1968) 7-23

DETERMINISTIC AND PROBABILISTIC MODELS IN SEDIMENTOLOGY


The question just raised comes immediately to the nub of the problem as to whether probabilistic models can play a legitimate role in sedimentology as devices for explaining a process rather than merely describing it. There is some difference of opinion regarding the validity of invoking random processes, as well as of the geological meaningfulness of simulations obtained from transition probability matrices. To some it seems that reliance on random processes implies surrender of the basic methods of science; to others it seems equally apparent that for most natural processes the deterministic model is merely an oversimplified expression of processes that are basically probabilistic. COLEMAN's book on mathematical sociology (1964) presents an interesting and relatively unbiased discussion of deterministic and probabilistic models in a field not unlike geology in its complexity of interlocked variables. In chapter 18 Coleman discusses tactics and strategies in the use of mathematics, which includes a section on probabilistic and deterministic mathematics. He emphasizes that in general the deterministic model is mathematically simpler than its stochastic equivalent. In some cases at least, the deterministic model contains the mean values (= expected values) of the probability distributions in the corresponding stochastic process model. Thus, where interest is mainly in mean values, the deterministic model is to be preferred. In geology, BRIGGS and POLLACK (1967) conclude that ". . . when sufficient knowledge exists concerning the processes and the geology, there is no need to invoke random, probabilistic processes." Another viewpoint in the spectrum of opinion is that of HARBAUGH (1966) and HARBAUGH and WAHLSTEDT (1967), who present

[Figure 4: flow diagram titled RELATIONS AMONG MODELS. Boxes: conceptual model (mental image); deterministic model; conventional statistical analysis; statistical predictor model; stochastic process model; real-world data (re-loop as necessary).]

Fig. 4. Generalized diagram of some relations among deterministic, statistical, and stochastic process models. Although the cross-path from deterministic to stochastic process models is by way of predictor models, there is also a direct relation between them. See text for discussion.
computer programs for simulating sedimentary processes by use of stochastic process models. The probabilities used as input can be selected on the basis of their geological reasonableness for the process under study, without necessarily using real data. By changing the input, a variety of situations can be simulated including deltaic deposits, beaches, algal reefs, and other sedimentary features that ". . . develop progressively . . . with startling realism."
RELATIONS AMONG CURRENT APPROACHES

Fig. 4, based on a lantern slide used in the oral presentation, is inserted here to illustrate in a broad way how current sedimentological studies are interlocked in terms of deterministic, conventional statistical, and stochastic process models. If one starts with a conceptual model, he may follow the left-hand path in the diagram as a direct approach to the development of a deterministic model. The middle path represents a conventional statistical approach that leads normally to a statistical predictor model. The right-hand path represents one kind of probabilistic approach, leading for example to a Markov chain. In any case, the final model is ultimately tested with observational data to evaluate its adequacy in the real-life world. What the figure shows primarily is that all three paths are very closely interlocked in methodology, and there is considerable freedom of flow from one path to another in the diagram. Thus, conventional statistical analysis (some form of the general linear model, usually) may be used to sort out the more important variables or states in a system. These particular variables may then be used in formulating deterministic or stochastic process models. The choice of a particular path through the diagram is controlled by the purposes of the study, by the kinds of data available, and by the way in which the data are structured for analysis. In particular, the deterministic and stochastic process models endeavor to make explicit the physical meaning of coefficients and exponents, whereas in the statistical predictor model this is not necessarily a primary consideration. Predictor models can be very important in the search for natural resources, for example, even though the physical meaning of each element in the model may not be fully understood. In the long run the predictor model represents an intermediate stage in theoretical analysis, and the ultimate choice for theoretical sedimentology appears to lie between deterministic and stochastic process models.
CONCLUDING REMARKS

The divergence of opinion regarding the relative usefulness of deterministic and probabilistic approaches suggests that the truth lies somewhere between. This may not be necessarily so, however, as the theory of more complex or more comprehensive sedimentological processes is investigated, especially with succeeding generations of computers. No doubt some present unexplainable variations, now assigned to random fluctuations, will be accounted for by more sophisticated deterministic models. However, with greater computer capability the mathematically more complicated stochastic models can conveniently be used. It will always be possible to reduce the complexity of a system by taking the expected values of the distributions involved (thereby obtaining the corresponding deterministic model), but this will be at the risk of ignoring important stochastic components in the system. This present paper can hardly be used to support the foregoing judgment regarding the future of theoretical sedimentology, inasmuch as my own example of a Markov chain is largely descriptive. However, when the full range of stochastic process models is examined, it is apparent that they afford a wide choice of ways for analyzing the same problems as are now treated deterministically. This was illustrated very recently by CONOVER and MATALAS (1967) with respect to suspended-sediment concentrations in streams. Such models, developed in a probabilistic framework, have the advantages that accrue from including variability as well as mean values, which was seen to be basically important even in the first two stages of statistical applications in sedimentology.
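The remark about taking expected values can be illustrated with the Mazomanie matrix. The sketch below is an illustration only, with hypothetical function names. It assumes that the illegible A-to-D entry of Table IV is 0.03, that each section starts in state A, and that a section has 100 one-foot steps, close to the 103 observations reported. The expected-value calculation yields a single number for the share of green sandstone (states C and D), whereas repeated simulation also shows how far individual sections scatter about that number.

```python
import random
import statistics

# Transition matrix of Table IV (rows = from-state, columns = to-state).
# ASSUMPTION: the illegible A-to-D entry is taken as 0.03 so row A sums to 1.
P = [
    [0.47, 0.47, 0.03, 0.03],
    [0.33, 0.44, 0.17, 0.06],
    [0.23, 0.15, 0.31, 0.31],
    [0.21, 0.10, 0.16, 0.53],
]
GREEN = {2, 3}  # indices of states C and D


def expected_green_share(matrix, n_steps, start=0):
    """Expected-value (deterministic) calculation: propagate the state
    distribution step by step and average the probability of C or D."""
    n = len(matrix)
    dist = [0.0] * n
    dist[start] = 1.0
    total = 0.0
    for _ in range(n_steps):
        total += sum(dist[k] for k in GREEN)
        dist = [sum(dist[i] * matrix[i][j] for i in range(n)) for j in range(n)]
    return total / n_steps


def simulated_green_shares(matrix, n_steps, n_runs, start=0, seed=0):
    """Stochastic calculation: simulate many sections and record the
    fraction of steps spent in C or D in each one."""
    rng = random.Random(seed)
    shares = []
    for _ in range(n_runs):
        state, green = start, 0
        for _ in range(n_steps):
            green += state in GREEN
            state = rng.choices(range(len(matrix)), weights=matrix[state])[0]
        shares.append(green / n_steps)
    return shares


expected = expected_green_share(P, 100)
shares = simulated_green_shares(P, 100, 2000)
print(f"expected-value model: green share {expected:.3f}")
print(f"simulation: mean {statistics.mean(shares):.3f}, "
      f"standard deviation {statistics.pstdev(shares):.3f}")
```

The simulation mean should agree closely with the expected-value result. The standard deviation is the information that the expected-value model discards.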
REFERENCES

ALLÈGRE, C., 1964. Vers une logique mathématique des séries sédimentaires. Bull. Soc. Géol. France, 6: 214-218.
AMOROCHO, J. and HART, W. E., 1964. Critique of current methods in hydrologic systems investigation. Trans. Am. Geophys. Union, 45: 307-321.
BAILEY, N. T. J., 1957. The Mathematical Theory of Epidemics. Griffin, London.
BARTLETT, M. S., 1960. An Introduction to Stochastic Processes with Special Reference to Methods and Applications. Cambridge University Press, London, 312 pp.
BEER, S., 1959. Cybernetics and Management. Wiley, New York, N.Y., 214 pp.
BRIGGS, L. I. and POLLACK, H. N., 1967. Digital model of evaporite sedimentation. Science, 155: 453-456.
CARR, D. D., and others, 1966. Stratigraphic sections, bedding sequences, and random processes. Science, 154: 1162-1164.
COATES, M. S., 1967. Application of a Markov Chain to the Mazomanie-Reno Sandstone in South-Central Wisconsin. Thesis, Northwestern Univ., Evanston, Ill., 97 pp. (unpublished).
COLEMAN, J. S., 1964. Introduction to Mathematical Sociology. Free Press, Glencoe, Ill., 554 pp.
CONOVER, W. J. and MATALAS, N. C., 1967. A statistical model of sediment transport. U.S. Geol. Surv., Profess. Papers, 575-B: B60-B61.
GRIFFITHS, J. C., 1966. Future trends in geomathematics. Mineral Ind., Penna. State Univ., 35: 1-8.
GRIFFITHS, J. C., 1967. Scientific Methods in Analysis of Sediments. McGraw-Hill, New York, N.Y., 508 pp.
HARBAUGH, J. W., 1966. Mathematical simulation of marine sedimentation with IBM 7090/7094 computers. State Geol. Surv., Kans., Computer Contr., 1: 1-52.
HARBAUGH, J. W. and MERRIAM, D. F., 1968. Computer Applications in Stratigraphy. Wiley, New York, N.Y. (in press).
HARBAUGH, J. W. and WAHLSTEDT, W. J., 1967. FORTRAN IV program for mathematical simulation of marine sedimentation with IBM 7040 or 7094 computers. State Geol. Surv., Kans., Computer Contr., 9: 1-40.
KEMENY, J. G. and SNELL, J. L., 1960. Finite Markov Chains. Van Nostrand, Princeton, N.J., 210 pp.
KRUMBEIN, W. C., 1967. FORTRAN IV computer programs for Markov chain experiments in geology. State Geol. Surv., Kans., Computer Contr., 13: 1-38.
KRUMBEIN, W. C. and GRAYBILL, F. A., 1965. An Introduction to Statistical Models in Geology. McGraw-Hill, New York, N.Y., 475 pp.
MILLER, R. L. and KAHN, J. S., 1962. Statistical Analysis in the Geological Sciences. Wiley, New York, N.Y., 483 pp.
PATTISON, A., 1965. Synthesis of hourly rainfall data. Water Resources Res., 1: 489-498.
POTTER, P. E. and BLAKELY, R. F., 1967a. Generation of a synthetic vertical profile of a fluvial sandstone body. J. Soc. Petrol. Eng., 6: 243-251.
POTTER, P. E. and BLAKELY, R. F., 1967b. Random processes and lithologic transitions. J. Geol., in press.
SCHWARZACHER, W., 1964. An application of statistical time-series analysis of a limestone-shale sequence. J. Geol., 72: 195-213.
SMITH, F. G., 1966. Geological Data Processing: Using FORTRAN IV. Harper and Row, New York, N.Y., 284 pp.
VISTELIUS, A. B., 1949. On the question of the mechanism of the formation of strata. Dokl. Akad. Nauk S.S.S.R., 65: 191-194.
VISTELIUS, A. B. and FAAS, A. V., 1965. On the character of the alternation of strata in certain sedimentary rock masses. Dokl. Akad. Nauk S.S.S.R., 164: 629-632.
VISTELIUS, A. B. and FEIGELSON, T. S., 1965. On the theory of bed formation. Dokl. Akad. Nauk S.S.S.R., 164: 158-160.
