You are on page 1of 7

Statistics and Data Analysis in Geology — Chapter 4 : Clarification

STATISTICS AND DATA ANALYSIS IN GEOLOGY, 3rd ed.
by John C. Davis

Clarification of zonation procedure described on pp. 238-239
Because the notation used in this section (Eqs. 4.80 through 4.84) is inconsistent with that used elsewhere in the book, it may be confusing. This confusion can be resolved by some redefinition of the equations. Let xij represent the standardized value of log trace j measured at depth i, where there are n depths and m log traces. Each log trace is standardized by subtracting the mean of the trace from every value and dividing by the standard deviation of the trace. This transforms the log traces so that each trace has a mean of zero and variance of one. The vector xi consists of the standardized values for all log traces at depth i. For the purposes of demonstrating the clustering process, we will regard xi as a column vector and use inner product notation (x x) to represent sums of squares, even though xi is actually a row in an n × m matrix. Using this convention and keeping in mind that the log traces are standardized, the total sum of squares is given by SST =
n i

xi xi = m × (n − 1)

(4.80)

Because the log traces are standardized, the total sum of squares is simply the number of log traces times the number of degrees of freedom for the variance of each trace, which is the number of depth intervals minus one. Consequently, we do not have to calculate the sum of the cross-products to find SST . The zonation process consists of iteratively collecting adjacent depths into zones composed of intervals that have similar log characteristics. The measure of similarity (or rather, of dissimilarity) is the squared distance, E 2 . (A greater squared distance between two adjacent zones indicates a greater dissimilarity between them. If two zones are identical, the squared distance between them will be zero.) We will refer to zones by the subscript k, and the number of points within a zone as nk . Because on the first iteration each zone includes only a single depth (that is, nk = 1 for all k), the squared distance between every observation at depth i and its neighbor at depth i + 1 is given by 1 (4.81) (xi − xi+1 ) (xi − xi+1 ) 2 After the first iteration, some of the zones will include more than a single depth; these zones are henceforth treated as a single object defined by their vector mean,
2 2 Ek = Ei =

Xk =

1 nk

nk i

xi

(4.82)

again, treating these as column vectors. We also must calculate a measure of the variation within a zone in the form of a within-zone sum of squares, Wk . Wk =
nk i

xi − Xk

xi − Xk

(4.83)

Note that the summation extends over the set of nk depths included in zone k. If we sum this measure over all g zones, we will obtain the total sum of squares within zones, SSW . SSW =
g k

Wk =

g k

nk i

xi − X k

xi − Xk

(4.84)

238

Repeat until all intervals are combined into a single zone or some stopping criterion is satisfied. Operation of the zonation algorithm can be demonstrated using the hypothetical well log shown in Figure 1. As noted previously. (3) Replace the vectors of intervals in a zone with their mean vector. on the initial pass each zone 2 are consists of a single depth.Analysis of Sequences of Data — Zonation Procedures Variable 1 0 0 1 2 3 4 5 Variable 2 6 7 10 15 20 25 30 35 40 0 5 Variable 3 Variable 4 0 10 15 20 25 30 24 26 28 30 32 34 36 38 5 5 10 10 15 15 20 20 Figure 1. As iteration proceeds and depths become grouped into zones. the squared distance between zone k and zone k + 1 is 2 = Ek Xk − Xk+1 Xk − Xk+1 (1/nk + 1/nk+1 ) This is the amount by which the merger of zones k and k + 1 will increase the total sum of squares within zones. is (1) Calculate the distances between the vectors of every interval (or zone) and the underlying interval (or zone). Hypothetical well log with four traces. so nk = 1 for all k and the squared distances Ek given by Equation (4. (4) Return to (1). (2) Combine the intervals whose squared distance is smallest to form a zone. after standardizing the log traces. the data are listed in Table 1. which consists of four well log variables measured at 20 successive depths. That is.81). The means and standard 239 . the dissimilarity between adjacent zones is determined as the squared distance between the zone means weighted by the number of depths within each zone. A general summary of the steps in the zonation algorithm.

dev. var. 4 34 31 35 26 26 25 31 33 33 33 34 33 34 33 31 25 36 36 34 34 depth 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 deviations of the four traces are mean std.4254 0.6068 The twenty values for the log are given below.6068   1.0338 240 .8935 5.2735 7.2336 5.85 1. For depth 1. which is also the product 4 × 19 = 76. using Equation (4.1752  0.0813 1.5346 −1.4943 2.4242 0.0335 6.Statistics and Data Analysis in Geology — Chapter 4 : Clarification Table 1. var.1 14.1752 0. the sum of these values is SST = 76.0813   = 5. 1 var.1944 0. 3 23 18 24 12 10 11 12 9 8 5 5 7 3 13 12 12 26 27 23 22 var.9471 3.5312 8.2735 x1 x1 = −1.9318 4.0. 1 1 2 1 2 2 3 3 4 5 6 6 6 4 4 4 3 2 3 3 3 var.6253 6.80) as modified above. 2 var.1981 1.7870 5.5735 3.4112 3.5346    −1.3656 4. 4 3.5131 2. The first step is to calculate the total variance in the sequence.4161 7. giving the transformed values in Table 2. 3 var. 1.6566 5. 2 12 15 12 16 18 17 22 28 30 34 35 32 35 24 23 16 12 12 14 15 var.1 31.0234 3.35 21.5347 5. Hypothetical example of a “well log” consisting of four traces (variables) measured at 20 equally spaced depths.5433 which can be used to standardize the readings listed in Table 1. the multiplication is   −1.

5150 1.0813 −0.2773 −0.5414 −0.2286 −0.2286 −0.6068 0.8436 −0.6510 −1.1712 1.9332 1.6060 −1.0813 −0.5347 −0.7248 −1.1452 −0.  2 = E1 1 2 −0.7306 1.2340 0. Hypothetical well log with variables standardized by subtracting column means and dividing by column standard deviations.2773 −0.8054 −1.6531 −0.6531    −0.8467  −0.7062 2 = 0. the distance between every log measurement and the immediately underlying measurement is calculated.0575 1.6516 1.7306 0.2399 0.3246 0. as indicated in modified Equation (4.3246 0.8816 −0.8889 −1.2016 −0.3565 The distance between the log readings at these two depths is found by postmultiplying the vector of differences by its transpose.0813 −0.2286 −0.1712 0. 2 −1.8816 −0. That is.1752 0. 4 0.2016 −1.6068 − 0.2258 −0.0813 −1.4245 1.2286 0.0813 1.9375 −1.3246 0.2286 −0.2773 −0.2286 var.6510 −1.4093 −0.6068 0.2773 1.3246 −0.8467 −0. 241 .6602  0.1752 0.6602 0. 1 −1.8816 −0. 3 1. var.5328 1.5145 −1.9332 −0.81).6068 −0.0431 var.5347 −0.3565 0.8199 1.4245 −0.2951 1.3246 0.4245 0. we first find the vector difference between the measurements at depth 1 and depth 2: x1 − x2 = −1.3446 0.5713 1.6068 0.3565     0.5347 −0.Analysis of Sequences of Data — Zonation Procedures Table 2.7033 1.4656 −0. For example.6734 −0.4872 0.7248 var.6068 depth 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 During this first iteration.0775 1.6060 −0.7248 0.1752 1.3072 −0.8816 −0.6602 0.8467 = 1.8816 −1.2399 −1.7306 1.1069 0.6531 −0.2399 = −0.6516 0.4245 0.8531 After repeating this calculation for every depth interval in the log we obtain the 19 distances listed below.3683 −0.

The new.0431 0.0958 1.1752 1.6068 0.7842 0.7248 1. postmultiplying each difference vector by its transpose.2286 1.6068 0.6068 + −0.2286 −0.0594 1.4047 0.1752 0.6068 −0.0660 242 .6068 0.6068 1.2286 − −0.2502 0.9928 6. the last two values in the sequence.2278 4.8436 = −0.2286 1.Statistics and Data Analysis in Geology — Chapter 4 : Clarification depth interval 1 and 2 2 and 3 3 and 4 4 and 5 5 and 6 6 and 7 7 and 8 8 and 9 9 and 10 10 and 11 11 and 12 12 and 13 13 and 14 14 and 15 15 and 16 16 and 17 17 and 18 18 and 19 19 and 20 distance 0.1382 1. subtracting the zone mean.1092 0.0000 x20 − Xk = −0. we will combine them to form a new zone defined by the vector mean of the two original intervals.0594 −0.7248 −0.2689 1.0000 −0.2286 2 −0. from x19 and from x20 . we calculate a measure of variation within the new zone using modified Equation (4.0431 0.2286 − −0.0469 0. Since these two are most similar.8436 −0.7657 0.0631 0.0000 = 0.7842 1.6068 Finally.2220 0. Xk .7842 −0.8072 0.2286 −0. The difference vectors are: x19 − Xk = −0. and summing the products. combined zone is x19 + x20 = 2 −0.3270 0.0000 = 0.0660 0.83).1092 0.7051 0.6189 0.8535 0.0158 The smallest squared distance (greatest similarity) is between depths 19 and 20.1751 1.1092 0.8531 1.

0000    −0.0594 0. Order of combination of depth intervals into zones in successive iterations of the zonation algorithm.0660 + 0.0000 0.0660  0.0594     0.0000  0.0158. Iteration 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 1 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 19 2 1 2 3 4 5 6 7 8 9 10 10 11 12 13 14 15 16 17 18 18 3 1 2 3 4 4 5 6 7 8 9 9 10 11 12 13 14 15 16 17 17 4 1 2 3 4 4 5 6 7 8 9 9 9 10 11 12 13 14 15 16 16 5 1 2 3 4 4 5 6 7 8 9 9 9 10 11 11 12 13 14 15 15 6 1 2 3 4 4 5 6 7 8 9 9 9 10 11 11 12 13 13 14 14 7 1 2 3 4 4 5 6 7 7 8 8 8 9 10 10 11 12 12 13 13 8 1 2 3 4 4 4 5 6 6 7 7 7 8 9 9 10 11 11 12 12 9 10 11 12 13 14 15 16 17 18 1 2 3 4 4 4 5 6 6 7 7 7 8 9 9 10 11 11 11 11 1 1 2 3 3 3 4 5 5 6 6 6 7 8 8 9 10 10 10 10 1 1 1 2 2 2 3 4 4 5 5 5 6 7 7 8 9 9 9 9 1 1 1 2 2 2 3 4 4 5 5 5 5 6 6 7 8 8 8 8 1 1 1 2 2 2 3 4 4 4 4 4 4 5 5 6 7 7 7 7 1 1 1 2 2 2 2 3 3 3 3 3 3 4 4 5 6 6 6 6 1 1 1 2 2 2 2 3 3 3 3 3 3 4 4 4 5 5 5 5 1 1 1 2 2 2 2 3 3 3 3 3 3 3 3 3 4 4 4 4 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 3 3 3 3 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 243 .0594   =  0. During the initial iteration. one depth interval or zone is combined with an adjacent depth interval or zone.0000 −0. In each successive iteration.0000   0.0660  0.0079 = 0. there is only one such zone. so the number of zones decreases by one with each iteration.0000    −0. Wk is  0.Analysis of Sequences of Data — Zonation Procedures (Note that the deviations from the zone mean are symmetrical only when a zone consists of just two intervals.0000 −0.0594 0. Table 3 shows the successsive combination of zones during the 18 iterations needed to combine all of the depth intervals into two zones. so SSW = Wk = 0.) For combined depths 19 and 20.0660 0.0158 SSW is found simply by summing all the Wk for all zones that contain more than one original depth.0079 + 0.0000 Wk = 0. As an Table 3.

where individual values within zones have been replaced by the mean values of the zones. 244 . Variable 1 0 0 1 2 3 4 5 6 7 10 Variable 2 15 20 25 30 35 40 0 5 Variable 3 10 Variable 4 0 15 20 25 30 24 26 28 30 32 34 36 38 5 5 10 10 15 15 20 20 Figure 2.Statistics and Data Analysis in Geology — Chapter 4 : Clarification example. Figure 2 shows the log traces after 15 iterations. Log is shown after 15 iterations when log has been grouped into five zones. Well log from Figure 1 after replacement of values within each zone with the mean values for the zone.