When the image F is open under the reconstruction structuring element K and closed under K, that is, F = F ∘ K = F • K, the distance between F and its sampling F ∩ S can be no greater than r(K), where the size r(K) of the reconstruction structuring element is defined by

r(K) = min_{x ∈ K} max_{y ∈ K} ||x − y||

Since 0 ∈ K, r(K) ≤ max_{y ∈ K} ||y||. Why? It is certainly the case that F ∩ S ⊆ F ⊆ (F ∩ S) ⊕ K, and the minimal and maximal reconstructions differ only by a dilation by K. Since the distance between the minimal and maximal reconstructions is no greater than r(K), it is unsurprising that the distance between F and either of the reconstructions is no greater than r(K):

ρ_M[F, (F ∩ S) • K] ≤ r(K)  and  ρ_M[F, (F ∩ S) ⊕ K] ≤ r(K)

Now it immediately follows that if two sets are both open under the reconstruction structuring element K, then the distance between the sets must be no greater than the distance between their samplings plus the size of K.

Proposition 5.13: If A = A ∘ K and B = B ∘ K, then

(1) ρ(A, B) ≤ ρ(A ∩ S, B ∩ S) + r(K), and
(2) ρ(B, A) ≤ ρ(B ∩ S, A ∩ S) + r(K),

so that ρ_M(A, B) ≤ ρ_M(A ∩ S, B ∩ S) + r(K).

Proof: Since B ∩ S ⊆ B, we have ρ(A, B) ≤ ρ(A, B ∩ S). Since A = A ∘ K, A ⊆ (A ∩ S) ⊕ K, so that ρ(A, B ∩ S) ≤ ρ[(A ∩ S) ⊕ K, B ∩ S]. But (A ∩ S) ⊕ K and A ∩ S differ only by a dilation by K, so every point of (A ∩ S) ⊕ K lies within max_{y ∈ K} ||y|| of a point of A ∩ S, and for the reconstruction structuring element max_{y ∈ K} ||y|| = r(K). Hence ρ(A, B) ≤ ρ(A ∩ S, B ∩ S) + r(K). Similarly, since B = B ∘ K, ρ(B, A) ≤ ρ(B ∩ S, A ∩ S) + r(K). Taking the maximum of the two directed distances gives ρ_M(A, B) ≤ ρ_M(A ∩ S, B ∩ S) + r(K).


CHAPTER
NEIGHBORHOOD OPERATORS

The workhorse of low-level vision is the neighborhood operator. It can perform the jobs of conditioning, labeling, and grouping. In this chapter we describe how and why the neighborhood operator works. We begin with its definition.

The output of a neighborhood operator at a given pixel position is a function of the position, of the input pixel value at the given position, of the values of the input pixels in some neighborhood around the given input position, and possibly of some values of previously generated output pixels. Neighborhood operators whose output is only a function of an input image neighborhood related to the output pixel position are called nonrecursive neighborhood operators. Neighborhood operators whose output depends in part on previously generated output values are called recursive neighborhood operators.

Neighborhood operators can be classified according to type of domain, type of neighborhood, and whether or not they are recursive. The two types of domain consist of numeric or symbolic data. Operators that have a numeric domain are usually defined in terms of arithmetic operations, such as addition, subtraction, or computation of minima or maxima. Operators that have a symbolic domain are defined in terms of Boolean operations, such as AND, OR, or NOT, or table-look-up operations.

For a pixel whose (row, column) position is (r, c), we let N(r, c) designate the set of the neighboring pixel positions around position (r, c). For example, N(r, c) could include only one neighbor, such as a pixel's north or east neighbor; it could include the nearest four neighbors; it could be five pixels (Fig. 6.1); or it may be large enough to contain all the pixels in some symmetric N x N neighborhood centered around the given position. A neighborhood might be so small and asymmetric as to contain only one nearest neighbor, or it might consist of an M x M square of neighbors centered around (r, c), depending on the neighborhood operator. We begin our discussion with nonrecursive neighborhood operators.

Figure 6.1 A small and a large square neighborhood that might be used in a neighborhood operator.

A neighborhood might also consist of the set of all neighboring pixel positions whose Euclidean distance to pixel (r, c) is less than or equal to p0, the distance p0 possibly depending on (r, c), or of the set of all neighboring pixel positions whose Euclidean distance to pixel (r, c) is less than or equal to p0 and whose row index is less than or equal to r.

Letting f designate the input image and g the output image, we can write the action of a general nonrecursive neighborhood operator phi as

g(r, c) = phi[f(r', c') : (r', c') ∈ N(r, c)]

the notation f(r', c') : (r', c') ∈ N(r, c) meaning a list of the values f(r', c') for each of the (r', c') ∈ N(r, c) in a preagreed-upon order that the notation itself suppresses.

The most common nonrecursive neighborhood operator is the kind that is employed uniformly on the image. Its action on identical neighborhood spatial configurations is the same regardless of where on the image the neighborhood is located. This kind of neighborhood operator is called shift-invariant or position-invariant. Here the neighborhood N itself satisfies a shift invariance: (r', c') ∈ N(r, c) implies (r' − u, c' − v) ∈ N(r − u, c − v) for all (u, v).

One common nonrecursive neighborhood operator is the linear operator. The output value at pixel position (r, c) of a linear operator is given as a possibly position-dependent linear combination of all the input pixel values at positions in the neighborhood of (r, c).

It should be obvious that shift-invariant operators commute with translation operators. More precisely, the result of a shift-invariant operator on a translated image is the same as translating the result of the shift-invariant operator on the given image. To see this, suppose the translation is by (r0, c0). Then the translation of the given image f(r, c) is f(r − r0, c − c0), and the translation of the result g(r, c) is g(r − r0, c − c0). Now

g(r − r0, c − c0) = phi[f(r', c') : (r', c') ∈ N(r − r0, c − c0)]

Make a change of dummy variables. Let u = r' + r0 and v = c' + c0. Then

g(r − r0, c − c0) = phi[f(u − r0, v − c0) : (u − r0, v − c0) ∈ N(r − r0, c − c0)]
                 = phi[f(u − r0, v − c0) : (u, v) ∈ N(r, c)]

since the neighborhood N is shift-invariant. But phi[f(u − r0, v − c0) : (u, v) ∈ N(r, c)] is the result of the neighborhood operator acting on the translated input image, and g(r − r0, c − c0) is the translation of the neighborhood operator acting on the given image. Their equality states that the result of a shift-invariant operator on a translated image is the same as translating the result of the shift-invariant operator on the given image.

Compositions of shift-invariant operators are also shift-invariant. To see this, suppose

g(r, c) = phi1[f(r', c') : (r', c') ∈ N1(r, c)]  and  h(r, c) = phi2[g(r', c') : (r', c') ∈ N2(r, c)]

Then the composition phi2 phi1, first applying phi1 and then phi2, is shift-invariant: shifting f by (r0, c0) produces a g shifted by (r0, c0), since phi1 is shift-invariant; and the g shifted by (r0, c0), followed by phi2, results in an h that has been shifted by (r0, c0). So shifting f by (r0, c0) and then applying phi2 phi1 produces an h shifted by (r0, c0).

An important kind of shift-invariant neighborhood operator is the linear shift-invariant operator. The output value at pixel position (r, c) of a linear shift-invariant operator is given as a fixed linear combination of all input pixel values at positions in the neighborhood of (r, c). The neighborhood itself is shift-invariant, and the weights of the linear combination are the same from neighborhood to neighborhood. If we designate by W the neighborhood of all pixel positions around (0, 0), then the action of the linear shift-invariant operator on an image f having domain F can be written as

g(r, c) = Σ_{(r', c') ∈ W, (r + r', c + c') ∈ F} f(r + r', c + c') w(r', c')

The weight function w is called the kernel or the mask of weights, W is the domain of w, and the operation itself is called the cross-correlation of f with w. We write g = f ⊛ w.

Figures 6.2 and 6.3 illustrate common linear shift-invariant operators for noise cleaning. Figure 6.4 shows the indexing of the mask, the indexing of the image, and the application of the mask to the image. Figure 6.5 shows the application of a mask with weights to an image.

Figure 6.2 Common 3 x 3 masks used for noise cleaning. The mask in (a) is called the 3 x 3 box filter.

Figure 6.3 Common 5 x 5 masks used for noise cleaning. The mask in (a) comes from a repeated application of a 3 x 3 box filter; the mask in (b) from the function 8 − (r^2 + c^2), where r and c range between −2 and +2; and the mask in (c) from binomial coefficients, (n u) designating the binomial coefficient n!/[(n − u)! u!].
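The mechanics that Figs. 6.4 and 6.5 illustrate can also be written out as a short sketch. The following Python/NumPy fragment is not from the text; the function name and the zero-padding convention at the image boundary are our own choices, since the text handles the boundary through the domain F.

```python
import numpy as np

def cross_correlate(f, w):
    """Cross-correlation g(r,c) = sum over (r',c') in W of f(r+r', c+c') w(r',c').

    The mask w has odd dimensions and is indexed so that its center is (0,0).
    Pixels whose neighborhood extends outside f are computed with zero
    padding (one possible way to extend the domain F).
    """
    M, N = w.shape[0] // 2, w.shape[1] // 2
    fp = np.pad(f.astype(float), ((M, M), (N, N)))   # extended-domain image
    g = np.zeros(f.shape, dtype=float)
    for rp in range(-M, M + 1):
        for cp in range(-N, N + 1):
            # shift the padded image by (rp, cp) and accumulate weighted values
            g += w[rp + M, cp + N] * fp[M + rp : M + rp + f.shape[0],
                                        N + cp : N + cp + f.shape[1]]
    return g

# The 3 x 3 box filter mask of Fig. 6.2(a) applied to a small test image:
f = np.arange(25).reshape(5, 5)
w = np.ones((3, 3)) / 9.0
print(cross_correlate(f, w))
```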

Figure 6.4 Indexing of the mask (left) and of the image (right), and the computation of the cross-correlation for pixel (2, 2) of the image.

Figure 6.5 Application of a mask of weights to pixel (2, 2) of an image for the cross-correlation operation, with the input image given an extended domain. (Assume w(r', c') = 1 for all (r', c').)

The cross-correlation has a close relative, a linear shift-invariant operation called convolution. The convolution of f with w is denoted by f * w and is defined by

(f * w)(r, c) = Σ_{(r', c') ∈ W, (r − r', c − c') ∈ F} f(r − r', c − c') w(r', c')

The convolution has the same form as the cross-correlation, but it indexes differently over the pixels of the image: if the mask is symmetric, convolution and correlation are the same; otherwise the convolution operation flips the mask. Figure 6.6 illustrates the indexing of a nonsymmetric mask and an image and the operation of the convolution operator. The effect is like applying a cross-correlation operator with the L-shaped mask flipped. A more detailed discussion of linear shift-invariant operators is given in Section 6.4.

Figure 6.6 A nonsymmetric L-shaped mask and the application of the convolution operator to an image using this mask. (Assume all weights are 1.)

Another example of a shift-invariant operator is given by the gray scale morphological operators of dilation and erosion, whose action can be written as

d(r, c) = max_{(r', c') ∈ W} [f(r − r', c − c') + w(r', c')]
e(r, c) = min_{(r', c') ∈ W} [f(r + r', c + c') − w(r', c')]

respectively. Finally, for our last example we take the multiplicative maximum and minimum shift-invariant operators

m1(r, c) = max_{(r', c') ∈ W} f(r + r', c + c') w(r', c')
m2(r, c) = min_{(r', c') ∈ W} f(r + r', c + c') w(r', c')

defined by Ritter, Shrader-Frechette, and Wilson (1987), Ritter and Wilson (1987), and Wilson and Ritter (1987).
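The gray scale dilation and erosion formulas just given translate directly into code. The sketch below is illustrative only; the function names and the choice of padding with -inf/+inf at the boundary (so that out-of-image pixels never win the max or min) are assumptions not fixed by the text.

```python
import numpy as np

def gray_dilate(f, w):
    # d(r,c) = max over (r',c') in W of [ f(r - r', c - c') + w(r', c') ]
    M, N = w.shape[0] // 2, w.shape[1] // 2
    pad = np.pad(f.astype(float), ((M, M), (N, N)), constant_values=-np.inf)
    out = np.full(f.shape, -np.inf)
    for rp in range(-M, M + 1):
        for cp in range(-N, N + 1):
            shifted = pad[M - rp : M - rp + f.shape[0], N - cp : N - cp + f.shape[1]]
            out = np.maximum(out, shifted + w[rp + M, cp + N])
    return out

def gray_erode(f, w):
    # e(r,c) = min over (r',c') in W of [ f(r + r', c + c') - w(r', c') ]
    M, N = w.shape[0] // 2, w.shape[1] // 2
    pad = np.pad(f.astype(float), ((M, M), (N, N)), constant_values=np.inf)
    out = np.full(f.shape, np.inf)
    for rp in range(-M, M + 1):
        for cp in range(-N, N + 1):
            shifted = pad[M + rp : M + rp + f.shape[0], N + cp : N + cp + f.shape[1]]
            out = np.minimum(out, shifted - w[rp + M, cp + N])
    return out
```

A flat structuring element corresponds to a mask w of zeros, in which case dilation and erosion reduce to the moving maximum and minimum of the image.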

6.2 Symbolic Neighborhood Operators

In this section we discuss a variety of symbolic and symbolic-related neighborhood operators in a way that emphasizes their common form (Haralick, 1981). Gray (1971) examines symbolic-related neighborhood operators that compute local properties, most of which we do not address here. The symbolic operators are useful for image segmentation tasks as well as for the labeling of primitives involved in structural image analysis. The common form of the operators suggests the possibility of a large-scale integration hardware implementation in the VLSI device technology.

Recall that recursive neighborhood operators are those for which a previously generated output may be one of the inputs to the neighborhood. In this manner the output memory is also part of the input. Nonrecursive neighborhood operators are those using independent image memory for input and output. Previous outputs cannot influence current outputs. Rosenfeld and Pfaltz (1966) call recursive operators sequential and nonrecursive operators parallel.

The simple operators discussed here may use two basic neighborhoods: a 4-connected neighborhood and an 8-connected neighborhood. As illustrated in Fig. 6.7, the 4-connected neighborhood around a pixel consists of the pixel and its north, south, east, and west neighbors. The 8-connected neighborhood around a pixel consists of all the pixels in a 3 x 3 window whose center is the given pixel.

Figure 6.7 Indexing of the pixels in (a) 4-connected and (b) 8-connected neighborhoods of x0.

6.2.1 Region-Growing Operator

The region-growing operator is nonrecursive and often has a symbolic data domain. It changes all pixels whose label is the background label to the non-background label of neighboring pixels. It is based on a two-argument primitive function h, which is a projection operator whose output is either its first argument or its second argument. If the first argument is the special symbol g for background, then the output of the function is its second argument. Otherwise the output is the first argument. Hence

h(c, d) = c if c ≠ g
h(c, d) = d if c = g

The region-growing operator uses the primitive function h in the following way. For the operator in the 4-connected mode, let a0 = x0 and a_n = h(a_{n-1}, x_n), n = 1, ..., 4. Then the output symbol y is defined by y = a4. For the operator in the 8-connected mode, let a0 = x0 and a_n = h(a_{n-1}, x_n), n = 1, ..., 8. Then the output symbol y is defined by y = a8. Figure 6.8 shows the operation of the region-growing operator in the 8-connected mode.

The region-growing operator is related to the binary dilation morphological operator with a structuring element that is a 4- or 8-connected neighborhood. However, instead of working on the binary images on which the binary dilation operator works, the region-growing operator works on labeled regions. In this sense it is a generalization of dilation.
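A minimal sketch of one iteration of this operator follows. It is not from the text: the background value 0 standing in for the symbol g and the particular neighbor ordering are our own assumptions (the text leaves the scan order of the fold a0, a1, ..., a_n unspecified).

```python
import numpy as np

BG = 0  # assumed encoding of the background label g

def region_grow_once(labels, mode=8):
    """One nonrecursive application of the region-growing operator.

    A background pixel takes the first non-background neighbor label met in
    the fixed order, mirroring a0 = x0, a_n = h(a_{n-1}, x_n).
    """
    if mode == 4:
        offsets = [(-1, 0), (0, -1), (0, 1), (1, 0)]
    else:
        offsets = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                   if (dr, dc) != (0, 0)]
    out = labels.copy()
    R, C = labels.shape
    for r in range(R):
        for c in range(C):
            a = labels[r, c]
            if a == BG:
                for dr, dc in offsets:
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < R and 0 <= cc < C and labels[rr, cc] != BG:
                        a = labels[rr, cc]   # h(g, x) = x
                        break
            out[r, c] = a                     # h(a, x) = a for a != g
    return out
```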

A more sophisticated region-growing operator grows background border pixels to the region label that a majority of its neighbors have. In the 8-connected mode, such an operator sets a_n = x_n, n = 1, ..., 8, and defines the output symbol y by y = c, where #{n | a_n = c} > #{n | a_n = c'} for all c' ≠ c.

Figure 6.8 Operation of the region-growing operator in the 8-connected mode: the original symbolic image and the filled image (one iteration).

6.2.2 Nearest Neighbor Sets and Influence Zones

Given a symbolic image with background pixels labeled g and each connected set of nonbackground pixels labeled with a unique label, we can label each background pixel with the label of its closest nonbackground neighboring pixel. Serra (1982) calls these nearest neighbor sets influence zones. To accomplish this labeling, simply iteratively grow the nonbackground labels into the background labels, using the 8-neighborhood if the max distance is desired, the 4-neighborhood if the city-block distance is desired, or the 4-neighborhood and 8-neighborhood alternately, in a ratio approximating the square root of 2, if Euclidean distance is desired.

6.2.3 Region-Shrinking Operator

The region-shrinking operator is nonrecursive and has a symbolic data domain. It changes the label on all border pixels to the background label. The region-shrinking operator defined here can change the connectivity of a region and can even entirely delete a region upon repeated application.

The operator is based on a two-argument primitive function h that can recognize whether or not its arguments are identical. If the arguments are the same, h outputs the value of the argument. If the arguments differ, h outputs the special symbol g for background. Hence

h(c, d) = c if c = d
h(c, d) = g if c ≠ d

The region-shrinking operator uses the primitive function h in the following way. For the operator in the 4-connected mode, let a0 = x0 and a_n = h(a_{n-1}, x_n), n = 1, ..., 4. Then the output symbol y is defined by y = a4. For the operator in the 8-connected mode, let a0 = x0 and a_n = h(a_{n-1}, x_n), n = 1, ..., 8. Then the output symbol y is defined by y = a8. Figure 6.9 illustrates this operation. Region shrinking is related to binary erosion, except that region shrinking operates on labeled regions instead of binary 1s.

Figure 6.9 Operation of one iteration of the region-shrinking operator using an 8-connected neighborhood.

A more sophisticated region-shrinking operator shrinks border pixels only if they are connected to enough pixels of unlike regions. In the 8-connected mode it sets a_n = x_n, n = 1, ..., 8, and defines the output symbol y by

y = g if #{n | a_n ≠ x0} > k
y = x0 otherwise

As mentioned in the section on nearest neighbor sets, to obtain a region shrinking (or region growing) that is close to a Euclidean-distance region shrinking (or growing), the 4-neighborhood and the 8-neighborhood must be used alternately, approximating as closely as possible the ratio sqrt(2) (Rosenfeld and Pfaltz, 1968). A ratio of 4/3 can be obtained by the sequence 4:3 = <4,8,4,8,4,8,4>, and a ratio of 3/2 can be obtained by the sequence 3:2 = <4,8,4,8,4>. Alternating these two sequences gives a ratio of 7/5, just under sqrt(2); using one 4:3 sequence followed by two 3:2 sequences gives a ratio of 10/7, just over sqrt(2). Alternating between <4:3, 3:2> and <4:3, 3:2, 3:2> gives a ratio of 17/12, which differs from sqrt(2) by less than 2.5 x 10^-3, an approximation that should be good enough for most purposes.

The choice of 4-neighborhood or 8-neighborhood for the current iteration that best approximates the Euclidean distance can also be determined dynamically. Let N4 be the number of uses of the 4-neighborhood so far and N8 be the number of uses of the 8-neighborhood so far. If |(N4 + 1) − sqrt(2) N8| < |N4 − sqrt(2)(N8 + 1)|, then use the 4-neighborhood for the current iteration; otherwise use the 8-neighborhood.
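The dynamic rule is easy to state in code. The following sketch assumes the greedy criterion just described; the tie-breaking behavior (preferring the 4-neighborhood when the two errors are equal) is our own choice.

```python
import math

def neighborhood_schedule(iterations):
    """Pick 4- or 8-neighborhood each iteration so N4/N8 tracks sqrt(2)."""
    n4 = n8 = 0
    schedule = []
    for _ in range(iterations):
        if abs((n4 + 1) - math.sqrt(2) * n8) < abs(n4 - math.sqrt(2) * (n8 + 1)):
            schedule.append(4); n4 += 1
        else:
            schedule.append(8); n8 += 1
    return schedule

print(neighborhood_schedule(12))   # begins 4, 8, 4, 8, 4, 4, 8, ...
```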

6.2.4 Mark-Interior/Border-Pixel Operator

The mark-interior/border-pixel operator is nonrecursive and has a symbolic data domain. It marks all interior pixels with the label i, for interior, and all border pixels with the label b, for border. It is based on two primitive functions. One is a two-argument primitive function h very similar to that used in the region-shrinking operator. The other is a one-argument primitive function f. The two-argument primitive function h can recognize whether or not its arguments are identical. For identical arguments it outputs the argument. For nonidentical arguments it outputs the special symbol b. The one-argument primitive function f can recognize whether or not its argument is the special symbol b. If it is, it outputs b. If not, it outputs the special symbol i. Hence

h(c, d) = c if c = d
h(c, d) = b if c ≠ d

f(c) = i if c ≠ b
f(c) = b if c = b

The mark-interior/border-pixel operator uses the primitive functions in the following way. For the operator in the 4-connected mode, let a0 = x0 and a_n = h(a_{n-1}, x_n), n = 1, ..., 4. Then the output symbol y is defined by y = f(a4). For the operator in the 8-connected mode, let a0 = x0 and a_n = h(a_{n-1}, x_n), n = 1, ..., 8. Then the output symbol y is defined by y = f(a8).

6.2.5 Connectivity Number Operator

The connectivity number operator is nonrecursive and has a symbolic data domain. Its purpose is to classify the way a pixel is connected to its like neighbors. As shown in Fig. 6.10, there are six values of connectivity, five for border pixels and one for interior pixels. The border pixels consist of isolated pixels, edge pixels, connecting pixels, branching pixels, and crossing pixels. The connectivity number operator associates with each pixel a symbol called the connectivity number of the pixel. The symbol, though a number, is really a label; it has no arithmetic number properties. The number designates which of the six kinds of connectivity a pixel has with its like neighbors.

Figure 6.10 Connectivity number labeling of a binary image using 8-connectivity. Key: 0 isolated, 1 edge, 2 connecting, 3 branching, 4 crossing, 5 interior.

Yokoi Connectivity Number

The definition we give here of connectivity number is based on a slight generalization of the definitions suggested by Yokoi, Toriwaki, and Fukumura (1975). The operator, as we define it, uses an 8-connected neighborhood and can be defined for either 4-connectivity or 8-connectivity. For 4-connectivity, a pixel is an interior pixel if its value and that of each of its 4-connected neighbors are the same. In this case its 4-connectivity takes the index value 5. Otherwise the 4-connectivity of a pixel is given by the number of times a 4-connected neighbor has the same value but the corresponding three-pixel corner neighborhood does not. These corner neighbors are shown in Fig. 6.11. For 8-connectivity, a pixel is an interior pixel if its value and that of each of its 8-connected neighbors are the same. Otherwise the 8-connectivity of a pixel is given by the number of times a 4-connected neighbor has a different value and at least one pixel in the corresponding three-pixel corner neighborhood has the same value.

Figure 6.11 Corner neighborhoods corresponding to each of the east, north, west, and south neighbors of the center pixel: (a) indexing of pixels in a 3 x 3 neighborhood; (b) corner of x1; (c) corner of x2; (d) corner of x3; (e) corner of x4.

The connectivity operator requires two primitive functions: a function h that can determine whether a three-pixel corner neighborhood is connected in a particular way and a function f that basically counts the number of arguments having a particular value. For 4-connectivity, the function h of four arguments is defined by

h(b, c, d, e) = q if b = c and (d ≠ b or e ≠ b)
h(b, c, d, e) = r if b = c and (d = b and e = b)
h(b, c, d, e) = s otherwise

The function f of four arguments is defined by

f(a1, a2, a3, a4) = 5 if a1 = a2 = a3 = a4 = r
f(a1, a2, a3, a4) = n otherwise, where n = #{a_k | a_k = q}

The connectivity operator using 4-connectivity is then defined in the following way by letting

a1 = h(x0, x1, x6, x2), a2 = h(x0, x2, x7, x3), a3 = h(x0, x3, x8, x4), a4 = h(x0, x4, x5, x1)

Define the connectivity number y by y = f(a1, a2, a3, a4). For 8-connectivity, the function h is slightly different. It is defined by

h(b, c, d, e) = q if b ≠ c and (d = b or e = b)
h(b, c, d, e) = r if b = c and (d = b and e = b)
h(b, c, d, e) = s otherwise

Then, as before, the connectivity number y is defined by y = f(a1, a2, a3, a4). Figure 6.12 illustrates the computation of the Yokoi connectivity number.

Figure 6.12 Computation of the Yokoi connectivity number using 4-connectivity; the examples shown produce f(a1, a2, a3, a4) = 5 (interior), 4 (crossing), 3 (branching), and 1 (edge).

Rutovitz Connectivity Number

The Yokoi connectivity number is not the only definition of connectivity number. Another definition, given by Rutovitz (1966), is based on the number of transitions from one symbol to another as one travels around the 8-neighborhood of a pixel. The definition we give here of the Rutovitz connectivity number, sometimes called a crossing number, is based on a slight generalization of the definitions suggested by Rutovitz (1966). The Rutovitz connectivity number simply counts the number of transitions from symbols that are different from that of the center pixel to symbols that are the same as that of the center pixel as one travels around the 8-neighborhood of a pixel.

The Rutovitz connectivity number requires a three-argument primitive function h defined by

h(b, c, d) = 1 if c ≠ b and d = b
h(b, c, d) = 0 otherwise

Then, letting x1, ..., x8 denote the 8-neighbors taken in the cyclic order in which they are encountered around the center pixel x0, and taking x9 = x1, set

a_n = h(x0, x_n, x_{n+1}), n = 1, ..., 8

The output value y is then given by

y = Σ_{n=1}^{8} a_n
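Both connectivity numbers can be sketched compactly for a binary image. In the code below, the corner orderings follow the indexing just described, and pixels outside the image are treated as background; both of those details, and the function names, are our own assumptions.

```python
import numpy as np

def yokoi_4(image, r, c):
    """Yokoi connectivity number (4-connectivity) of pixel (r, c)."""
    def px(rr, cc):
        return image[rr, cc] if 0 <= rr < image.shape[0] and 0 <= cc < image.shape[1] else 0

    b = image[r, c]
    # (4-neighbor, two corner pixels) for the east, north, west, south corners
    corners = [((r, c + 1), (r - 1, c + 1), (r - 1, c)),
               ((r - 1, c), (r - 1, c - 1), (r, c - 1)),
               ((r, c - 1), (r + 1, c - 1), (r + 1, c)),
               ((r + 1, c), (r + 1, c + 1), (r, c + 1))]
    a = []
    for n4, d1, d2 in corners:
        cv, dv, ev = px(*n4), px(*d1), px(*d2)
        if b == cv and (dv != b or ev != b):
            a.append('q')
        elif b == cv and dv == b and ev == b:
            a.append('r')
        else:
            a.append('s')
    if all(x == 'r' for x in a):
        return 5                                  # interior
    return sum(1 for x in a if x == 'q')          # 0..4

def rutovitz(image, r, c):
    """Crossing number: unlike-to-like transitions around the 8-neighborhood."""
    ring = [(r - 1, c), (r - 1, c + 1), (r, c + 1), (r + 1, c + 1),
            (r + 1, c), (r + 1, c - 1), (r, c - 1), (r - 1, c - 1)]
    vals = [1 if (0 <= rr < image.shape[0] and 0 <= cc < image.shape[1]
                  and image[rr, cc] == image[r, c]) else 0
            for rr, cc in ring]
    return sum(1 for k in range(8) if vals[k - 1] == 0 and vals[k] == 1)
```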

6.2.6 Connected Shrink Operator

The connected shrink operator is a recursive operator having a symbolic data domain. It is similar in certain respects to the connectivity number operator and the region-shrinking operator. Instead of labeling all border pixels with the background symbol g, the connected shrink operator labels only those border pixels that can be deleted from a connected region without disconnecting the region. Basically a pixel's label is changed to g, for background, if upon deleting it from the region it belongs to, the region remains connected. Since the operator is applied recursively, pixels that are interior during one position of the scan may appear as border pixels at another position of the scan and eventually may be deleted by this operator. After one complete scan of the image, the set of pixels that get labeled as background is a strong function of the way in which the image is scanned with the operator. For example, a top-down, left-right scan will delete all edge pixels that are not right-boundary edge pixels, as illustrated in Fig. 6.13.

Figure 6.13 Connected shrink operator applied in a top-down, left-right scan using 4-connectivity: (a) input image; (b) output image.

The theoretical basis of the connected shrink operator was explored by Rosenfeld and Pfaltz (1966), Rosenfeld (1970), and Stefanelli and Rosenfeld (1971). The operator definition given here is based on Yokoi, Toriwaki, and Fukumura (1975). It requires two primitive functions: a function h that can determine whether the three-pixel corner of a neighborhood is connected and a function f that basically counts the number of arguments having certain values. The operator uses an 8-connected neighborhood and can be defined for deleting either 4-deletable or 8-deletable pixels.

In the 4-connectivity mode, the four-argument primitive function h is defined by

h(b, c, d, e) = 1 if b = c and (d ≠ b or e ≠ b)
h(b, c, d, e) = 0 otherwise

In the 8-connectivity mode, the four-argument primitive function h is defined by

h(b, c, d, e) = 1 if c ≠ b and (d = b or e = b)
h(b, c, d, e) = 0 otherwise

The five-argument primitive function f is defined by

f(a1, a2, a3, a4, x) = g if exactly one of a1, a2, a3, a4 = 1
f(a1, a2, a3, a4, x) = x otherwise

Using the indexing convention of Fig. 6.11, we define the connected shrink operator by letting

a1 = h(x0, x1, x6, x2), a2 = h(x0, x2, x7, x3), a3 = h(x0, x3, x8, x4), a4 = h(x0, x4, x5, x1)

The output symbol y is defined by y = f(a1, a2, a3, a4, x0). Figure 6.14 further illustrates the connected shrink operator.

Figure 6.14 Application of the connected shrink operator to an image using 4-connectivity: original image and result.
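A sketch of the recursive scan follows. As in the earlier sketches, the background encoding, out-of-image handling, and function names are our own assumptions; the essential point is that the scan reads values it has itself just written, which is what makes the operator recursive.

```python
import numpy as np

BG = 0  # background label g

def connected_shrink(labels):
    """Connected shrink, top-down left-right scan, 4-connectivity mode."""
    img = labels.copy()
    R, C = img.shape

    def px(r, c):
        return img[r, c] if 0 <= r < R and 0 <= c < C else BG

    def h(b, cv, d, e):
        # 1 when the 4-neighbor matches b but its corner is not entirely b
        return 1 if (b == cv and (d != b or e != b)) else 0

    for r in range(R):
        for c in range(C):
            b = img[r, c]
            if b == BG:
                continue
            a = [h(b, px(r, c + 1), px(r - 1, c + 1), px(r - 1, c)),
                 h(b, px(r - 1, c), px(r - 1, c - 1), px(r, c - 1)),
                 h(b, px(r, c - 1), px(r + 1, c - 1), px(r + 1, c)),
                 h(b, px(r + 1, c), px(r + 1, c + 1), px(r, c + 1))]
            if sum(a) == 1:         # f: exactly one connected corner -> deletable
                img[r, c] = BG      # later pixels in the scan see this change
    return img
```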

The earliest discussion of connectivity in digital pictures can be found in Rosenfeld (1971). Rutovitz (1966) preceded Rosenfeld in the use of crossing numbers but did not use connectivity in his development. Related algorithms and discussion of connectivity can be found in Levialdi (1972), who introduced a parallel or nonrecursive shrinking algorithm for the purpose of counting the number of components in a binary image. This iterative algorithm does not employ the 1-deletability of the Yokoi, Toriwaki, and Fukumura method; it uses a 2 x 2 window rather than a 3 x 3 window in the shrinking process but requires the detection of an isolated element during the iterative process so that it may be counted before it is made to disappear by the process. A three-dimensional extension to this nonrecursive algorithm can be found in Arcelli and Levialdi (1972). Lobregt, Verbeek, and Groen (1980) discuss a recursive operator for three-dimensional shrinking.

6.2.7 Pair Relationship Operator

The pair relationship operator is nonrecursive and has a symbolic data domain. It is a general operator that labels a pixel on the basis of whether it stands in the specified relationship with a neighboring pixel. An example of a pair relationship operator is one that relabels with a specified label all border pixels that are next to an interior pixel and either can relabel all other pixels with another specified label or can leave their labels alone.

Formally, a pair relationship operator marks a pixel with the specified label p if the pixel has a specified label l and neighbors enough pixels having a specified label m. All other pixels it either marks with another specified label or leaves unmodified. The pair relationship operator employs the two-argument primitive function h, which can recognize whether its first argument has the value of its second argument:

h(a, m) = 1 if a = m
h(a, m) = 0 otherwise

For the 4-connected mode, the output value y is defined by

y = p if x0 = l and Σ_{n=1}^{4} h(x_n, m) is large enough
y = q otherwise

and for the 8-connected mode by the analogous expression with the sum taken over n = 1, ..., 8, where q can be either a specified output label or the label x0, in which case the pixel is left unmodified.

6.2.8 Thinning Operator

The thinning operator discussed here is defined as a composition of three operators: the mark-interior/border-pixel operator, the pair relationship operator, and the marked-pixel connected shrink operator. It works by marking all border pixels that are next to interior pixels and then deleting (or shrinking) any marked pixel that is deletable.

The result of successively applying the thinning operator on a symbolic image is that all regions are symmetrically shrunk down until no interior pixels are left. What remains is their center lines, as shown in Fig. 6.15. This operator has the nice property that the center line is connected in exactly the same geometric and topologic way the original figure is connected.

Figure 6.15 Result of one application of the thinning operator using the 4-connectivity deletability condition: (a) input image; (b) thinned output image.

To implement the operator as the composition of three operators, the mark-interior/border-pixel operator examines the original symbolic image to produce an interior/border image. The interior/border image is examined by the pair relationship operator, which produces an image whose pixels are marked if on the original image they were border pixels and were next to interior pixels. The marked-pixel image and the original symbolic image constitute the input to the marked-pixel connected shrink operator, which is exactly like the connected shrink operator except that it shrinks only pixels that are deletable and marked.

The first discussions of thinning appeared in Hilditch (1969) and Deutsch (1969). These initial insights were later expanded by Fraser (1970), Deutsch (1972), Rosenfeld (1975), and Rosenfeld and Davis (1976). A brief comparison of thinning techniques can be found in Tamura (1978), who suggests that a smooth 8-connected thinning results if 8-deletable pixels are removed from thinning 4-connected curves. Tamura also notes that the thinning of Rosenfeld and Davis (1976) is very sensitive to contour noise when used in the 4-connected mode. For other similar operators that thin without changing geometry or topology, see Rosenfeld and Davis (1976), Stefanelli and Rosenfeld (1971), or Arcelli and Sanniti di Baja (1978).

6.2.9 Distance Transformation Operator

The distance transformation operator can be implemented as either a recursive or a nonrecursive operator. It requires a binary image whose border pixels are labeled 0 and whose interior pixels are labeled i. The purpose of the distance transformation operator is to produce a numeric image whose pixels are labeled with the distance between each of them and their closest border pixel. The distance between two pixels can be defined by the length of the shortest 4-connected path (city-block distance) or 8-connected path (max or chessboard distance) between them.

As a nonrecursive operator, the distance transformation can be achieved by successive application of the pair relationship operator. In the first application the pair relationship operator labels with a 1 all pixels whose label is i and that are next to a pixel whose label is 0; all other pixels keep their labels. In the nth application, the pair relationship operator labels with an n all pixels whose label is i and that are next to a pixel whose label is n − 1. When no pixel has the label i, an application of the pair relationship operator changes no pixel value, and the resulting image is the distance transformation image. This implementation is related to the one given by Rosenfeld and Pfaltz (1968).

Another way of implementing this operator nonrecursively is by the iterative application of the primitive function h defined by

h(a0, a1, ..., aN) = 0 if a0 = 0
h(a0, a1, ..., aN) = min{a_n | a_n ≠ i} + 1 if a0 = i and there exists n such that a_n ≠ i
h(a0, a1, ..., aN) = a0 otherwise

where i is the special interior pixel label. In the 4-connected mode the output symbol y is defined by y = h(x0, x1, x2, x3, x4), and in the 8-connected mode by y = h(x0, x1, ..., x8), where x0, ..., x8 are pixel values coming from the spatial positions defined in Fig. 6.7.

Another way (Rosenfeld and Pfaltz, 1966) of implementing the distance transformation involves the application of two recursive operators, the first operator being applied in a left-right, top-bottom scan and the second in a right-left, bottom-top scan. Both operators are based on similar primitive functions. For the first operator the primitive function h is defined by

h(a1, ..., aN, a0) = 0 if a0 = 0
h(a1, ..., aN, a0) = min{a1, ..., aN} + 1 otherwise

where a0 is the value of the pixel being scanned and a1, ..., aN are the output values, already computed, of the neighbors that precede the pixel in the scan: the north and west neighbors in the 4-connected mode, plus the two upper diagonal neighbors in the 8-connected mode. For the second operator the primitive function is simply the minimum function: the output y is the minimum of the pixel's current value and of the values, incremented by 1, of the neighbors that follow the pixel in the scan (the south and east neighbors in the 4-connected mode, plus the two lower diagonal neighbors in the 8-connected mode). Figure 6.16 illustrates the operation of the recursive distance operator (Rosenfeld and Pfaltz, 1968).
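The two-pass scheme is short enough to write out in full. The sketch below computes distances to the nearest 0 pixel, a common variant of the convention in the text (where 0 marks the region border); INF plays the role of the interior label i, and the function name is our own.

```python
import numpy as np

def distance_transform(binary, connectivity=4):
    """Two-pass recursive distance transform (Rosenfeld-Pfaltz style).

    binary is 1 on the region and 0 elsewhere; the result labels each
    1 pixel with its city-block (4-) or chessboard (8-) distance to the
    nearest 0 pixel.
    """
    INF = 10**9
    d = np.where(binary == 0, 0, INF).astype(np.int64)
    R, C = d.shape
    fwd = [(-1, 0), (0, -1)] if connectivity == 4 else \
          [(-1, 0), (0, -1), (-1, -1), (-1, 1)]
    bwd = [(-dr, -dc) for dr, dc in fwd]

    for r in range(R):                      # left-right, top-bottom pass
        for c in range(C):
            for dr, dc in fwd:
                rr, cc = r + dr, c + dc
                if 0 <= rr < R and 0 <= cc < C:
                    d[r, c] = min(d[r, c], d[rr, cc] + 1)
    for r in range(R - 1, -1, -1):          # right-left, bottom-top pass
        for c in range(C - 1, -1, -1):
            for dr, dc in bwd:
                rr, cc = r + dr, c + dc
                if 0 <= rr < R and 0 <= cc < C:
                    d[r, c] = min(d[r, c], d[rr, cc] + 1)
    return d
```

Replacing the "+ 1" by d1 for horizontal/vertical neighbors and d2 for diagonal neighbors gives the weighted (chamfer) transforms discussed next.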

Figure 6.16 Operation of the recursive Rosenfeld and Pfaltz distance operator, using 4-connectivity: the original image, the result of pass 1 (left-right, top-bottom), and the result of pass 2 (right-left, bottom-top).

It is possible to compute a distance transformation that gives distances closely approximating Euclidean distance. In the recursive two-pass framework this is done by propagating two local distances: a distance d1 added for a horizontal or vertical move and a distance d2 added for a diagonal move; for the second right-left, bottom-top pass, the output y is again given by the corresponding minimum. Montanari (1968) puts d1 = 1 and d2 = sqrt(2). This gives the correct Euclidean distance for all shortest paths that are vertical, horizontal, or diagonal. Barrow et al. (1977) and Nitzan and Agin (1979) use a scaled distance and put d1 = 2 and d2 = 3. Borgefors (1984) minimizes the maximum absolute value of the error between the correct Euclidean distance and the computed distance; for an integer approximation of the resulting scaled Euclidean distance, Borgefors recommends d1 = 3 and d2 = 4. Borgefors (1986) extends these ideas to using neighboring pixels whose Euclidean distance may be larger than sqrt(2) from the central pixel. Vossepoel (1988) suggests d1 = 0.9413 and d2 = 1.3513, whereas Beckers and Smeulders (1989) argue that for isotropic distance transformations d1 = 0.9445 and d2 = 1.3459. Verwer (1988) minimizes the maximum relative error and determines that d1 = 0.9604 and d2 = 1.3583; for an integer approximation of scaled Euclidean distance, Verwer recommends d1 = 5 and d2 = 7 or, for a better approximation, d1 = 12 and d2 = 17. Danielsson (1980) propagates the pair of vertical and horizontal distances and is able thereby to compute a distance transformation that is almost exactly Euclidean, a small error occurring only in a few situations.

6.2.10 Radius of Fusion

Let I be a binary image. The radius of fusion for any connected component of I is the radius p of a disk satisfying the condition that if the binary image I is morphologically closed with a disk of radius p, then the given connected region will fuse with some other connected region.

To determine the radius of fusion for every binary-1 pixel, we begin by computing the connected components image for the given binary image I. In this way every binary-1 pixel can be labeled with a unique label corresponding to the connected region to which it belongs. Exactly as we do to determine the nearest neighbor sets, we perform a region growing to label every binary-0 pixel with the label of its nearest labeled pixel. One iteration of a shrink operation on this image can label with a 0 all border pixels (pixels that have a neighbor with a different label). We use this image as the input to the distance transformation operator, which labels nonzero every pixel with its distance to the nearest border. Then we mask this distance transformation image with the original image.

Pixels having a nonbackground label on the original image get labeled with the distances associated with their spatial position on the distance transformation image; the binary-0 labeled pixels are given the background label. Then we give each pixel in a labeled region the minimum distance a pixel in the region has received.

The radius of fusion as defined here differs from the radius as defined by Rosenfeld and Pfaltz (1968). There the radius of fusion for a pixel is the smallest integer n such that after region-growing n iterations and region-shrinking n + 1 iterations, the pixel retains a nonbackground label. The difficulty with this definition for radius of fusion is that by using 4-neighborhoods, for example, it is possible for a pair or triangle of pixels never to fuse. The radius of fusion is therefore not defined, and defining it by some large number in such cases is artificial.

6.2.11 Number of Shortest Paths

Rosenfeld and Pfaltz (1968) describe the following neighborhood operator, which must be applied iteratively to count for each 0-pixel the number of shortest paths to the set that is represented as the binary-1 pixels on a binary image. Let the given binary image be denoted by p0(r, c). Define

p^{t+1}(r, c) = p^t(r, c) if p^t(r, c) ≠ 0
p^{t+1}(r, c) = Σ_{(r', c') ∈ N(r, c)} p^t(r', c') otherwise

where N can be the 4-neighborhood or the 8-neighborhood. Since the application of this operator leaves nonzero pixels with their values and changes some of the 0-pixels to nonzero-labeled pixels, eventually successive applications of the operator make no more changes. If this happens after M iterations, then the value in pixel (r, c) of p^M will be the number of shortest paths from that pixel to the 1-pixels of image p0, providing p0(r, c) = 0. An example is given in Fig. 6.17.

Figure 6.17 Successive iterations of the shortest-path-counting neighborhood operator.
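The iteration is direct to implement. The sketch below is illustrative; the iteration cap is our own safeguard, since the text simply iterates until no change occurs.

```python
import numpy as np

def count_shortest_paths(p0, max_iterations=1000, connectivity=4):
    """Iterate the Rosenfeld-Pfaltz path-counting operator until stable."""
    if connectivity == 4:
        nbrs = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    else:
        nbrs = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                if (dr, dc) != (0, 0)]
    p = p0.astype(np.int64).copy()
    R, C = p.shape
    for _ in range(max_iterations):
        nxt = p.copy()
        for r in range(R):
            for c in range(C):
                if p[r, c] == 0:      # nonzero pixels keep their values
                    nxt[r, c] = sum(p[r + dr, c + dc]
                                    for dr, dc in nbrs
                                    if 0 <= r + dr < R and 0 <= c + dc < C)
        if np.array_equal(nxt, p):
            break
        p = nxt
    return p
```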

6.3 Extremum-Related Neighborhood Operators

Finding pixels or regions that are relative minima and relative maxima can play an important role in the extraction of interesting points or region primitives for texture analysis or matching purposes. Operators for the determination of relative extrema and their reachability regions are more complex than simple 3 x 3 operators, precisely because a relative extremum can be a region of pixels larger than 3 x 3. In this section we discuss operators for the identification of nonminima and nonmaxima, for the identification of relative minima and maxima, and for the identification of the region of pixels reachable by ascending or descending paths from relative extrema regions.

6.3.1 Non-Minima-Maxima Operator

The non-minima-maxima operator is a nonrecursive operator that takes a numeric input image and produces a symbolic output image in which each pixel is labeled with an index 0, 1, 2, or 3 indicating whether the pixel is flat (interior to a connected set of equal-valued pixels), nonmaximum, nonminimum, or part of a transition region (a region having some neighboring pixels greater than and others less than its own value). A pixel whose value is the minimum of its neighborhood and that has one neighboring pixel with a value greater than its own may be a minimum or a transition pixel, but it is certainly a nonmaximum pixel. Figure 6.18 illustrates how a pixel can be its neighborhood maximum yet not be part of any relative maximum: in its 8-neighborhood the central 3 is a maximum, yet the flat of 3s it belongs to is a transition region.

Figure 6.18 Example of how a pixel can be its neighborhood maximum yet not be a part of any relative maximum.

The non-minima-maxima operator is based on the primitive functions min and max. For the 4-connected case, let a0 = b0 = x0 and define

a_n = min(a_{n-1}, x_n), b_n = max(b_{n-1}, x_n), n = 1, ..., 4

The output index l is defined by

l = 0 (flat) if a4 = x0 = b4
l = 1 (nonmaximum) if a4 = x0 < b4
l = 2 (nonminimum) if a4 < x0 = b4
l = 3 (transition) if a4 < x0 < b4

For the 8-connected case, let a0 = b0 = x0 and define a_n = min(a_{n-1}, x_n) and b_n = max(b_{n-1}, x_n), n = 1, ..., 8. The output index l is defined by

l = 0 (flat) if a8 = x0 = b8
l = 1 (nonmaximum) if a8 = x0 < b8
l = 2 (nonminimum) if a8 < x0 = b8
l = 3 (transition) if a8 < x0 < b8
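A direct sketch of the operator, with clipping at the image boundary as our own convention:

```python
import numpy as np

FLAT, NONMAX, NONMIN, TRANSITION = 0, 1, 2, 3

def non_minima_maxima(f, connectivity=4):
    """Label pixels flat / nonmaximum / nonminimum / transition."""
    R, C = f.shape
    if connectivity == 4:
        nbrs = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    else:
        nbrs = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                if (dr, dc) != (0, 0)]
    out = np.empty((R, C), dtype=np.uint8)
    for r in range(R):
        for c in range(C):
            vals = [f[r + dr, c + dc] for dr, dc in nbrs
                    if 0 <= r + dr < R and 0 <= c + dc < C]
            lo = min(min(vals), f[r, c])   # a_N
            hi = max(max(vals), f[r, c])   # b_N
            if lo == f[r, c] == hi:
                out[r, c] = FLAT
            elif lo == f[r, c] < hi:
                out[r, c] = NONMAX
            elif lo < f[r, c] == hi:
                out[r, c] = NONMIN
            else:
                out[r, c] = TRANSITION
    return out
```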

6.3.2 Relative Extrema Operator

Relative extrema operators consist of the relative maximum operator and the relative minimum operator. They are recursive operators and have a numeric data domain. They require an input image that needs to be accessed but is not changed and an output image that is successively modified. Initially the output image is a copy of the input image. The operator must be successively applied in a top-down, left-right scan and then a bottom-up, right-left scan until no further changes are made. Each pixel on the output image contains the value of the highest extremum that can reach it by a monotonic path. Pixels on the output image that have the same value as those on the input image are the relative extrema pixels.

The relative maxima operator works as follows: Values are gathered from those pixels on the output image that correspond to pixels on the input image that neighbor the given pixel and have input values greater than or equal to the input value of the given pixel. The maximum of these gathered values is propagated to the given output pixel. The relative minima operator works in an analogous fashion.

Figure 6.19 Pixel designations for the recursive operators that require forward and reverse scans over a numeric input image and that recursively produce an output image: (a), (b) the input and output designations for the left-right, top-bottom scan; (c), (d) those for the right-left, bottom-top scan.

The relative maxima operator uses two primitive functions h and max. The four-argument function h selects the maximum of its last two arguments if its second argument is greater than or equal to its first argument. Otherwise it selects the third argument. Hence

h(b, c, d, e) = max(d, e) if c ≥ b
h(b, c, d, e) = d if c < b

The primitive function max selects the maximum of its arguments. In the 4-connected mode the relative maxima operator lets a0 = l0 and a_n = h(x0, x_n, a_{n-1}, l_n), n = 1 and 2; the output value is defined by l = a2. In the 8-connected mode the operator lets a0 = l0 and a_n = h(x0, x_n, a_{n-1}, l_n), n = 1, 2, 3, and 4; the output value is defined by l = a4. Here the x's denote values of the input image and the l's values of the output image, with the pixel designations of Fig. 6.19. The relative minima operator is defined just like the relative maxima operator, with the max function replaced by the min function and all inequalities changed. Figure 6.20 illustrates the application of the relative maxima operator.

Figure 6.20 Application of the relative maxima operator in 4-connected mode to an image: the top-down and bottom-up passes are iterated until there are no more changes, giving the final result.

An alternative kind of relative extrema operator can be defined by using the symbolic image created by the non-minima-maxima operator in combination with the original numeric image. Such an operator is a recursive operator and is based on the fact that by appropriate label propagation, all flat regions can be relabeled as transition, minima, or maxima regions, and the pixels originally labeled as nonminima or nonmaxima can be relabeled as transition regions or true relative minima or maxima. The initial output image is taken to be the image created by the non-minima-maxima operator. Recursive propagation of the labels from one pixel to its neighbor on the output image is performed only if the two labels are not the same and the corresponding two gray levels on the original numeric image are equal. Let any two neighboring pixels be x and y, labeled L_x and L_y, respectively. As shown in Table 6.1, we have three cases to examine when L_x ≠ L_y and x = y:

1. One of L_x and L_y is a flat (0) and the other one is not (1, 2, or 3). In this case we propagate the nonflat label into the flat label, thereby eliminating pixels marked as flats.

2. One of L_x and L_y is a nonmaximum (1) and the other is a nonminimum (2). In this case we mark both pixels as transitions (3), since a region that is constant in value cannot simultaneously be a minimum and a maximum.

3. One of L_x and L_y is a transition (3). In this case we mark the nontransition pixel as a transition, thereby propagating the transition label, since a region that is constant in value and has one pixel marked as transition must be a transition region.

Table 6.1 Propagation table for the recursive relative extrema operator. The table gives the propagation label for any pair (a, b) of labels of neighboring pixels with equal gray values.

  a \ b        Flat         Nonmaximum   Nonminimum   Transition
  Flat         Flat         Nonmaximum   Nonminimum   Transition
  Nonmaximum   Nonmaximum   Nonmaximum   Transition   Transition
  Nonminimum   Nonminimum   Transition   Nonminimum   Transition
  Transition   Transition   Transition   Transition   Transition

This propagation rule requires one four-argument primitive function h, defined by

h(a, x, b, y) = 3 (transition) if x = y and [(a = 3) or (b = 3) or (a = 1 and b = 2) or (a = 2 and b = 1)]
h(a, x, b, y) = 1 (nonmaximum) if x = y and [(a = 1 and b = 1) or (a = 1 and b = 0) or (a = 0 and b = 1)]
h(a, x, b, y) = 2 (nonminimum) if x = y and [(a = 2 and b = 2) or (a = 2 and b = 0) or (a = 0 and b = 2)]
h(a, x, b, y) = 0 (flat) if x = y and (a = 0 and b = 0)

Values of pixels in the original numeric input image are denoted by x and y; values of pixels in the non-minima-maxima-labeled image are denoted by a and b.

6.3.3 Reachability Operator

Reachability operators consist of the descending reachability operator and the ascending reachability operator. They are recursive operators and require a numeric input image and a symbolic image used for both input and output. Initially the symbolic image has all relative extrema pixels marked with unique labels (relative maxima for the descending reachability case and relative minima for the ascending reachability case). The unique labeling of extrema can be obtained by the connected component operator operating on the relative extrema image. Pixels that are not relative extrema must be labeled with the background symbol g.

The operator works by successively propagating labels from all neighboring pixels that can reach the given pixel by monotonic paths. In case of conflicts the label c, for common region, is propagated. The resulting output image has each pixel labeled with the unique label of the relative extrema region that can reach it by a monotonic path if it can be reached by only one extremum; if more than one extremum can reach it by a monotonic path, it is labeled c. The reachability operator, like the connected component operator, must be iteratively and alternately applied in a top-down, left-right scan and a bottom-up, right-left scan until no further change is produced. Figure 6.19 illustrates the pixel designations for the reachability operator.

Employing these pixel designations, the descending reachability operator uses the four-argument primitive function h. Its first two arguments are labels from the output image, and its last two arguments are pixel values from the input image. It is defined by

h(a, b, x, y) = a if (b = g or a = b) and x < y
h(a, b, x, y) = b if a = g and x < y
h(a, b, x, y) = c if a ≠ g, b ≠ g, a ≠ b, and x < y
h(a, b, x, y) = a if x ≥ y

The operator uses the primitive function h in the 8-connected mode by letting a0 = l0 and defining a_n = h(a_{n-1}, l_n, x0, x_n), n = 1, 2, 3, and 4; the output label l is defined by l = a4. The 4-connected mode sets a0 = l0 and defines a_n = h(a_{n-1}, l_n, x0, x_n), n = 1 and 2; the output label l is defined by l = a2. The ascending reachability operator is defined just as the descending reachability operator is, except that the inequalities are changed.

6.4 Linear Shift-Invariant Neighborhood Operators

The linear shift-invariant neighborhood operator is the fundamental object of study in image and signal processing. Much has been written about the relationship between it, the discrete Fourier transform, and the frequency domain; see, for example, Pratt (1978), Rosenfeld and Kak (1982), Gonzalez and Wintz (1987), Jain (1989), and Hall (1979). In this section we limit our discussion to the spatial domain.

6.4.1 Convolution and Correlation

We begin by restating the definition for convolution. The convolution of an image f with kernel w is defined by

(f * w)(r, c) = Σ_{(r', c') ∈ W, (r − r', c − c') ∈ F} f(r − r', c − c') w(r', c')

Figure 6.21 illustrates a convolution in one dimension.

Figure 6.21 Example of how a one-dimensional convolution is performed. The kernel k is defined on the domain {−1, 0, 1}, and the function f to be convolved is defined on a domain beginning at index 4; the convolution f * k is then defined on the dilated domain beginning at index 3. Notice that as f slips through the window of length 3, it meets k reflected to form the sum of products.

Convolution has a number of mathematical properties, among which are:

1. u * v = v * u
2. (f * w) * v = f * (w * v)
3. (f + g) * w = f * w + g * w
4. (af) * w = a(f * w) for any constant a
5. (w * v)~ = w~ * v~

where f~ designates the reflection of f, f~(r, c) = f(−r, −c). We also prove that the reflection of a cross-correlation is the convolution of the reflections. We now proceed to establish each of these properties.

b)Ev $W $ V . b ) E V and ( a . Proposition 6. let a = a iandO=b+j. domain ( w ) = W . c )E F [(f * w ) * v ] ( r . Hence E the summations may be rearranged.Then 1. c -b)v(a. f . b ) E V .3: Suppose F = domain Cf) = domain ( g )= G.B .@) W @ V . [ ( f + g ) * w ] ( r .a . @ . @ ) W $ V implies that for some ( a . (f * w ) ( r . c ) .b ) E W . c ) + ( g * w > ( r . B y definition. c ) = (cr. c ) . 2 .2: Let domain (f) = F .294 Nrlghborhood Operators Proposition 6.Then + But ( a . c ) E F $ W Q V ) Proof: Let ( r .c ) = a(f * w)(r. c ) = C f * w ) ( r . c= [ f * ( w * v ) ] ( r . [ ( a ) * w]( r .( r . Then [ ( f * w ) * v ] ( r .a . And E ( a .c ) for any constant a . and domain ( v ) = V .b ) E W imply (a. ( a -a.b) Making a change of variables in the second summation.

Letting the reflection of a set A be denoted by A . we obtain A = {a 1 -a E A ) . Proposition 6.4 Linear Shift-Invariant Neighborhood Operators 295 The last property (w * v r = w *v' can be also demonstrated by direct calculation. By definition.6.4: (w * v )"= w Pmt: * 6. Then we have .

However. Consider ( J @ w ) @ u= C f @ ~ ) * f i =cf*it)*v' = f * ( a *fi) = f gI(lV*fi)" = f @ ( w*v) = f @(w@C) .c-bl)€F f ( r .b)EW (r+. It is not.bl)w(-a'.b') The cross-correlation off with w is defined by There is a direct relationship between convolution and cross-correlation: The demonstration is direct. f ( r .a'.b) i (a. we can determine whether crosscorrelation is associative..c+b)EF Making a change of variables.o'.b')3(a1. Then = lal.bl)C(a'. we let a' = -a and b' = -b.c+b)w(a.c .c -bl) €)L (-8'.al. V@w)(r.b l ) € W (r-al. c = (J * w)(r. b') = -bl)€W (r-a1. we let a' = -a and b' = -b. By definition. it has a close analog.c) Using the associative property for correlation and the relation f @w = f *w between convolution and cross-correlation. . Then (f g~w)(r. -b3 .b'~~V a ( r .296 ~elghborhood Operators Making a change of variables.c) J = (-a1.c .c)= x 4 f(r+a.

If the domain of w is a (2M+ 1) x (2N + 1) neighborhood. considerable savings can be obtained in the computation of the resultant image by performing the computation in a separable way. In this case we can structure the computation by computing first along the columns for each row of the image and then along the rows Figun 6. When f has the special form of a discrete impulse. the convolution off with w gives The result is just the kernel of the convolution. 6. When w has a product decomposition.2 Separability When the weight matrix w of a linear shift-invariant operator has a product decomposition. then a straightforward computation would require (2M + 1) x (2N + 1) multiplications and-additions for each output value generated. which suggests naming w as the impulse response function or the point spreadfunction.4. it may be written as as illustrated in Fig. Consider an image f defined on a domain F whose center is the origin (0. the comrnutivity of convolution implies that the cross-correlation of w with v is the reflection of the crosscorrelation of v with w.22.4 Linear Shift-invariant Neighborhood Operators 297 Finally. .O). 6.22 A separable mask w that can be decomposed into the two masks u and u as shown.6.

c + cl)w(rl.298 Neighborhood Operators for each column of the image resulting from the first computation: g(r. When put in vector representation.c ) can be written as a computation along the rows: M + g(r.r+c1)€F f ( r + r'. Letting we clearly see that the computation of each h(r. Then g(r. c') The bracketed term is a computation along the columns of row r + r'. weight matrices that are separable can be written as a Kronecker product of two vectors.ct)€w (r+r'. ) c Then the convolution off with k has an implementation as the sum of M separable convolutions since convolution is linear: = Cv * km)(r. C ) = (r'. More detailed discussion can be found in Chaper 7 on determining optimal weights from isotropic noise covariance matrices.c ) requires 2N 1 multiplications and additions. c ) = rl=-M h(r + r'.c) requires 2M + 1 multiplications and additions. Hence the total computation requires (2M + 1) (2N + 1 ) multiplications and additions. Next suppose that a kernel k has a representation as the sum of M separable kernels: + M k ( r .c ) m= l K - Such a separable represenbtion can be determined for any kernel defined on a rectangular neighborhood of support by the use of the singular-value decompositions . c)u(rl) The computation of each g(r.c ) = m=l km(r. one corresponding to the row function and one to the column function.

Show that 4 is a shift-imarbt ] operator. . Examining the case I < J (the case I 1 J can be examined in an exactly symmetric manner). UJ . subroutines for which are available in most libraries of linear routines. 1971). . we let the I columns of the matrix U be u.. c') E W .. The singular-value decomposition of an I x J matrix K is given by K = UAV where U is an I x I orthonormal matrix. Programming neighborhood operators to execute on images efficiently is important. . . Many neighborhood operators can be expressed in the iterative form. and we assume that they are arranged in order of decreasing value.2. we can express any kernel k defined on a I x J neighborhood as In practice. each column ui being an I x 1 vector and we let the rows of the matrix V be u . Exercises Suppose g(r. + .c') : (r'. c +c') : (rl. This provides for a relatively efficient implementation. 6.1. Show that is a shift-invariant operator. and V is a J x J orthonormal matrix.c') E W].each row u being a 1 x J vector Then we can rewrite the matrix multiplication UAV as Letting the rth row of ui be ui(r) and the cth column of v be v j(c). C)= 9lf(r . in determining a separable implementation for a given kernel k.u. The diagonal entries of A are nonnegative.c . and only the first few highest entries need be retained..Exercises 299 (Rao and Mitra. 6. c) = +lf(r +r'.. . the diagonal entries of A often decrease rapidly.r'. where 6. A is an I x J matrix all of whose nonzero entries are on the diagonal.3.. Suppose g(r. .

CHAPTER 7

CONDITIONING AND LABELING

7.1 Introduction

Conditioning and labeling are among the most common uses of neighborhood operators. Conditioning is based on a model that suggests the observed image is composed of an informative pattern modified by uninteresting variations that typically add to or multiply the informative pattern. Conditioning estimates the informative pattern on the basis of the observed image. Thus conditioning suppresses noise, which can be thought of as random, unpatterned variations affecting all measurements. Conditioning can also perform background normalization by suppressing uninteresting systematic or patterned variations. Conditioning is typically applied uniformly and is usually context independent.

Labeling, by contrast, is based on a model that suggests the informative pattern has structure as a spatial arrangement of events, each spatial event being a set of connected pixels. Labeling determines in what kinds of spatial events each pixel participates. For example, if the interesting spatial events of the informative pattern are only those of high-valued and low-valued pixels, then the thresholding operation can be considered a labeling operation. Other kinds of labeling operations include edge detection, corner finding, and identification of pixels that participate in various shape primitives.

This chapter is organized by application: we discuss neighborhood operators for the conditioning operations of noise cleaning and sharpening and for the labeling operations of edge and line detection.

7.2 Noise Cleaning

Noise cleaning is very often one of the first operators applied to an image in a computer vision task. Noise cleaning uses neighborhood spatial coherence and neighborhood pixel value homogeneity as its basis.

Noise-cleaning techniques detect lack of coherence and either replace the incoherent pixel value by something more spatially coherent, using some or all of the pixels in a neighborhood containing the given pixel, or average or smooth the pixel value with others in an appropriate neighborhood. Mastin (1985) reviews a number of approaches.

The operator that computes the equally weighted average is called the box filter operator; typical masks for the 3 x 3 and 5 x 5 neighborhoods are shown in Figs. 6.2 and 6.3 of the preceding chapter. The box filter operator with a (2M+1) x (2N+1) neighborhood smoothing the image f is defined by

k(r,c) = (1/((2M+1)(2N+1))) Σ_{r'=-M}^{M} Σ_{c'=-N}^{N} f(r+r', c+c').

The operator is important because of the ease with which it can be made to execute quickly. Not only is the operator separable, but by implementing it as a recursive operator, any sized neighborhood box filter can be implemented by using just a constant five operations per pixel, independent of neighborhood size (McDonnell, 1981). The operations are two additions, two subtractions, and one division.

The recursive calculation proceeds in the following way. Suppose row r has just been processed and we are beginning to process row r+1. The partial row sum function

h(r+1, c) = Σ_{c'=-N}^{N} f(r+1, c+c')

is computed for each c by the recursive relation

h(r+1, c) = h(r+1, c-1) + f(r+1, c+N) - f(r+1, c-N-1).

This requires two operations per pixel. Then the partial column sum function

g(r+1, c) = Σ_{r'=-M}^{M} h(r+1+r', c)

is recursively computed by

g(r+1, c) = g(r, c) + h(r+1+M, c) - h(r-M, c).

Again this takes two more operations per pixel. The partial row sums h(r-M, c),...,h(r+M, c) for the previous 2M+1 rows as well as for the current row must be kept available. Finally, the box filter output is calculated by

k(r,c) = g(r,c) / ((2M+1)(2N+1)),

a cost of one division per pixel.
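A direct transcription of the two recursions into NumPy, kept deliberately unoptimized; the zero-padded borders and the function name are assumptions of this sketch rather than details given in the text.

```python
import numpy as np

def box_filter(f, M, N):
    """Running (2M+1) x (2N+1) box filter: sliding row sums h, sliding column
    sums g, then one division per pixel."""
    R, C = f.shape
    fp = np.pad(f.astype(float), ((M, M), (N, N)))
    # h[r, c] = sum of fp[r, c-N..c+N]: add the entering sample, drop the
    # leaving one (two operations per pixel).
    h = np.zeros_like(fp)
    h[:, N] = fp[:, : 2 * N + 1].sum(axis=1)
    for c in range(N + 1, C + N):
        h[:, c] = h[:, c - 1] + fp[:, c + N] - fp[:, c - N - 1]
    # g[r, c] = sum of h[r-M..r+M, c]: the same recursion down the rows.
    g = np.zeros_like(fp)
    g[M, :] = h[: 2 * M + 1, :].sum(axis=0)
    for r in range(M + 1, R + M):
        g[r, :] = g[r - 1, :] + h[r + M, :] - h[r - M - 1, :]
    return g[M : M + R, N : N + C] / ((2 * M + 1) * (2 * N + 1))

f = np.random.rand(32, 32)
smoothed = box_filter(f, 1, 1)   # 3 x 3 smoothing
```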

A common linear smoother is the Gaussian filter. Here the weight matrix w is given by

w(r,c) = (1/k) exp(-(r² + c²)/(2σ²)) for all (r,c) ∈ W, where k = Σ_{(r,c)∈W} exp(-(r² + c²)/(2σ²)).

The neighborhood W must be big enough to allow the pixels on the periphery of the neighborhood to be a distance of two or three σ from the center. The two-dimensional filter can be written as a product of two one-dimensional Gaussians: if w(r,c) = w₁(r)w₂(c), then I*w = (I*w₁)*w₂, which makes it possible to implement the Gaussian filter efficiently and separably, first convolving a one-dimensional Gaussian in the row direction and then convolving a one-dimensional Gaussian in the column direction.

The application of linear noise-cleaning filters has the effect of defocusing the image: edges become blurred, and narrow lines become attenuated. All other things being equal, the larger the neighborhood of the filter, the greater the defocusing action. The proper determination of neighborhood size and mask values depends on the correct balance between the better job of noise removal that large neighborhood operators can provide and the loss of detail that this entails. Determination of the correct balance in turn depends on having a model that relates the observed neighborhood pixel values to the true underlying neighborhood pixel values and the effect the noise has on them. We discuss this relationship from a statistical point of view in the next subsection.
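A minimal separable Gaussian smoother along the lines just described, assuming NumPy. The "two or three sigma" rule fixes the mask half-width at ⌈3σ⌉ here, and the replicated-edge padding is an arbitrary choice; the sketch assumes σ is large enough that the half-width is at least 1.

```python
import numpy as np

def gaussian_mask(sigma):
    """1-D Gaussian weights, truncated at about three sigma and normalized."""
    half = int(np.ceil(3 * sigma))
    x = np.arange(-half, half + 1)
    w = np.exp(-x**2 / (2 * sigma**2))
    return w / w.sum()

def gaussian_smooth(f, sigma):
    """Convolve with the 2-D Gaussian as two 1-D passes (rows, then columns)."""
    w = gaussian_mask(sigma)
    half = len(w) // 2
    fp = np.pad(f.astype(float), half, mode="edge")
    # row-direction pass
    tmp = sum(w[i] * fp[:, i : i + f.shape[1]]
              for i in range(len(w)))[half:-half, :]
    # column-direction pass
    fp2 = np.pad(tmp, half, mode="edge")
    return sum(w[i] * fp2[i : i + f.shape[0], :]
               for i in range(len(w)))[:, half:-half]
```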

7.2.1 A Statistical Framework for Noise Removal

Suppose that we wish to apply an M x M shift-invariant linear neighborhood operator to an image in order to reduce noise. The pixel values in each image neighborhood can then be considered to constitute a vector of N = M² components, the pixels of the neighborhood being ordered in a left-to-right, top-to-bottom fashion.

Suppose we adopt the idealization that if there were no noise, the pixel values in each image neighborhood would be the same constant. Obviously this idealization stretches reality: since the neighborhoods overlap, if every pixel value in every neighborhood were the same constant, every pixel value in the image would be the same. We get around this trivialization by being deliberately inconsistent and treating each neighborhood independently, so that the constant value associated with any neighborhood is allowed to differ from the underlying constant value of any other neighborhood.

Let us fix our attention on one neighborhood having N pixels; in the order just given, the values of its pixels become the components of an N-dimensional vector x. Suppose that μ is the underlying unknown constant value that, if there were no noise, would be the value each pixel would take. The observed vector x can then be considered a random vector whose expected value is μ1, E[x] = μ1, where 1 designates the N-dimensional vector all of whose components are 1s.

The neighborhood operator we wish to employ is a linear shift-invariant operator that uses all the pixels in the central N = M x M neighborhood to estimate the value of μ it will assign to the output pixel whose position is at the neighborhood's center. We seek, therefore, an N-dimensional vector c such that μ̂ = c'x is the estimated value of μ computed from the observed neighborhood pixel values in the components of the vector x. We desire the estimator μ̂ to be good in the sense both of being unbiased and of having minimum variance.

Now by the algebra of expectations,

E[μ̂] = E[c'x] = c'E[x] = μ c'1.

Since we desire E[μ̂] = μ, for μ̂ to be unbiased the N-dimensional vector c must satisfy c'1 = 1.

Next we determine an expression for the variance of μ̂. Recognizing E[(x - μ1)(x - μ1)'] as the covariance matrix C for x, we obtain

V[μ̂] = E[(c'x - μ)²] = c'Cc.

To find the N-dimensional vector c that minimizes V[μ̂] = c'Cc subject to the constraint c'1 = 1, we can use the Lagrange multiplier technique. Let

f(c) = c'Cc + λ(c'1 - 1).

Taking partial derivatives of f with respect to each component of c and using the fact that C is symmetric, we find that the vector of partial derivatives is

∂f/∂c = 2Cc + λ1.

We set this equal to 0 and solve for c in terms of the Lagrange multiplier λ: c = -(λ/2) C⁻¹1.

Substituting this value of c into the constraint c'1 = 1, we may solve for λ. Since C = C', we also have C⁻¹ = (C⁻¹)', and

c'1 = -(λ/2) 1'C⁻¹1 = 1, so that -λ/2 = 1/(1'C⁻¹1).

Therefore

c = C⁻¹1 / (1'C⁻¹1).

The variance of the estimator can now be easily calculated:

V[μ̂] = c'Cc = 1/(1'C⁻¹1).

EXAMPLE 7.1 Consider the case when the neighborhood is 3 x 3 and the random noise is uncorrelated. To account for the reality that the greater the distance between a given pixel and the center pixel of the neighborhood, the greater its deviation from the expected value, we take, in a typical circumstance, the variance of the north, east, south, and west nearest neighbors of the center pixel to be α times the variance σ² of the center pixel, and the variance of the diagonal neighbors to be β times the variance of the center pixel, where α ≥ 1 and β ≥ α. The diagonal covariance matrix C is then given (in the component ordering of Fig. 7.1a) by

C = σ² diag(β, α, β, α, 1, α, β, α, β).

Since the optimal c is determined by c ∝ C⁻¹1, there results an unnormalized weight mask whose center entry is 1, whose north, east, south, and west entries are 1/α, and whose diagonal entries are 1/β. The normalization constant divides each entry by 1 + 4/α + 4/β.

Figure 7.1 (a) Correspondences between the components of the vector x' = (x₁, x₂,...,x₉) and their spatial positions in a 3 x 3 neighborhood. (b) Variances associated with each spatial position. (c) Unnormalized minimum-variance weight mask.
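A numerical check of the closed form c = C⁻¹1/(1'C⁻¹1) for the diagonal-covariance example, assuming NumPy; the α and β values below are the ones associated with the third mask of Fig. 7.2.

```python
import numpy as np

def min_variance_weights(C):
    """Unbiased minimum-variance weights c = C^{-1} 1 / (1' C^{-1} 1)."""
    ones = np.ones(C.shape[0])
    ci = np.linalg.solve(C, ones)
    return ci / (ones @ ci)

# Example 7.1 setup: uncorrelated noise, center variance 1, the four edge
# neighbors alpha, the four diagonal neighbors beta.
alpha, beta = 2.0, 4.0
variances = np.array([beta, alpha, beta,
                      alpha, 1.0,  alpha,
                      beta, alpha, beta])
c = min_variance_weights(np.diag(variances))
print(c.reshape(3, 3))   # entries proportional to 1, 1/alpha, 1/beta
```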

Figure 7.2 shows three common 3 x 3 masks for noise reduction together with their associated values of α and β: (a) α = β = 1; (b) α = β = 2; and (c) α = 2, β = 4.

EXAMPLE 7.2 We again consider a 3 x 3 neighborhood but assume that the noise is correlated in such a way that adjacent pixels have correlation ρ and pixels a distance n apart have correlation ρⁿ, n ≥ 1.

In this case the covariance matrix has the separable form C = σ² (S x S), the Kronecker product of

S = [ 1    ρ    ρ² ]
    [ ρ    1    ρ  ]
    [ ρ²   ρ    1  ]

with itself, so that the covariance of two pixels whose positions differ by (a,b) is σ² ρ^{|a|+|b|}. The inverse covariance matrix is then given by C⁻¹ = (1/σ²)(S⁻¹ x S⁻¹), where

S⁻¹ = (1/(1-ρ²)) [ 1     -ρ        0  ]
                 [ -ρ    1+ρ²     -ρ  ]
                 [ 0     -ρ        1  ],

from which it follows that the unnormalized minimum-variance weights C⁻¹1 form the separable 3 x 3 mask whose row (and column) profile is (1, 1-ρ, 1): corner entries 1, edge-midpoint entries 1-ρ, and center entry (1-ρ)². Figure 7.3 shows this result as well as some particular 3 x 3 linear operators with their associated values of ρ.

Figure 7.3 (a) General mask for the equal-variance correlated case; (b) mask for ρ = 0; (c) mask for ρ = -1/2; (d) mask for ρ = 1/2; (e) mask for ρ = 1; and (f) mask for ρ = -1.

7.2.2 Determining Optimal Weights from Isotropic Covariance Matrices

Let μ denote the expected value of the neighborhood mean, and let

σ(i, j, m, n) = E[(x_ij - μ)(x_mn - μ)].

The second-order statistics of an image are said to be stationary when the covariances for a pair of pixels that stand in the same translational relationship are identical regardless of the absolute location of the pair on the image; that is, when

σ(i, j, m, n) = σ(i+a, j+b, m+a, n+b)

for all a satisfying 1 ≤ i+a ≤ M and 1 ≤ m+a ≤ M and for all b satisfying 1 ≤ j+b ≤ N and 1 ≤ n+b ≤ N. In this case the covariance can be written more simply as a function σ(i-m, j-n). Figure 7.4 illustrates the naming convention for pixels in an M x N neighborhood, where M and N = 4.

When the image second-order statistics are stationary and, in addition, all pairs of pixels with the same distance between them have the same covariance, the second-order statistics are said to be isotropic. In this case the covariance can be written as a function σ[((i-m)² + (j-n)²)^{1/2}], where the distance involved is the Euclidean distance.

To illustrate this, consider Fig. 7.5, which shows the upper half of a covariance matrix associated with a 4 x 4 neighborhood for an image having nonstationary statistics. Each of the 136 distinct entries has a two-character label, and no two entries are identical. When the second-order statistics are stationary, the covariance matrix takes the form shown in Fig. 7.6; Table 7.1 lists all pairs of pixel locations in the 4 x 4 neighborhood that stand in the same translational relationship. When the statistics are isotropic as well, all pairs of pixels with the same distance between them share a covariance; Table 7.2 lists all pairs of pixel locations standing in the same Euclidean distance relationship. The resulting isotropic covariance matrix is shown in Fig. 7.7.

Figure 7.5 Covariance matrix associated with a 4 x 4 neighborhood for an image having nonstationary statistics. Figure 7.6 Covariance matrix associated with a 4 x 4 neighborhood for an image whose statistics satisfy the stationarity assumption.

7.7 can be written as a partitioned .3 12 Conditioning and labeling 2 For example. . all the pixel location pairs under column c are related by three rows down and one row across.) Then the isotropic covariance matrix of Fig. i ? 4 k j i b b i j k d c b a c d c b s 4 = ( ba b c d c d .

When each submatrix S₁, S₂, S₃, and S₄ is some multiple of the same submatrix S, say S₁ = v₁S, S₂ = v₂S, S₃ = v₃S, and S₄ = v₄S, where v₁, v₂, v₃, and v₄ are scalars, we can then write C as

C = [ v₁S  v₂S  v₃S  v₄S ]
    [ v₂S  v₁S  v₂S  v₃S ]
    [ v₃S  v₂S  v₁S  v₂S ]
    [ v₄S  v₃S  v₂S  v₁S ].

The composite matrix C can then be expressed as a matrix Kronecker product,

C = V x S, where V = [ v₁  v₂  v₃  v₄ ]
                     [ v₂  v₁  v₂  v₃ ]
                     [ v₃  v₂  v₁  v₂ ]
                     [ v₄  v₃  v₂  v₁ ].

Empirical experiments with image data indicate that actual covariance matrices do not often deviate too far from this form. The inverse of a matrix that can be represented as a Kronecker product is the Kronecker product of the inverses. Hence

C⁻¹ = V⁻¹ x S⁻¹.

Table 7.2 Pairs of pixel locations situated in the same distance relationship. All the pixel location pairs under a given column of the table are separated by the same Euclidean distance.

Figure 7.7 Isotropic covariance form.

The Kronecker product of a J x 1 vector u with an N x 1 vector v is a JN x 1 vector: if u' = (u₁,...,u_J), then

u x v = (u₁v', u₂v', ..., u_Jv')'.

The multiplication of a Kronecker product matrix with a Kronecker product vector is also a Kronecker product. More precisely, if A is an I x J matrix, B is an M x N matrix, u is a J x 1 vector, and v is an N x 1 vector, then

(A x B)(u x v) = (Au) x (Bv).

This makes the matrix C⁻¹ = V⁻¹ x S⁻¹ easy to apply: since the vector of 1s is itself a Kronecker product of vectors of 1s, C⁻¹1 = (V⁻¹1) x (S⁻¹1). Hence the desired weight vector c = C⁻¹1/(1'C⁻¹1) can be computed factor by factor.
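A small numerical illustration of the identity (A x B)(u x v) = (Au) x (Bv) and of the factored weight vector it yields, assuming NumPy; the matrices below are arbitrary symmetric positive-definite stand-ins for V and S, not values from the text.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((3, 3)); A = A @ A.T + 3 * np.eye(3)   # stand-in for V
B = rng.random((4, 4)); B = B @ B.T + 4 * np.eye(4)   # stand-in for S
u, v = np.ones(3), np.ones(4)

# (A x B)(u x v) == (A u) x (B v)
lhs = np.kron(A, B) @ np.kron(u, v)
rhs = np.kron(A @ u, B @ v)
assert np.allclose(lhs, rhs)

# Minimum-variance weights for C = V x S factor into a separable mask.
c = np.linalg.solve(np.kron(A, B), np.ones(12))
c /= c.sum()
c_sep = np.kron(np.linalg.solve(A, u), np.linalg.solve(B, v))
c_sep /= c_sep.sum()
assert np.allclose(c, c_sep)
```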

The weight vector c therefore has a Kronecker product form, which means that it can be written as a product of a row function times a column function, thereby permitting fast implementation as a separable computation. Afrait (1954), Andrews and Kane (1970), Andrews and Caspari (1971), and Haralick, Griswold, and Kattiyakulwanich (1975) give more details about the algebra associated with the separable implementation.

7.2.3 Outlier or Peak Noise

In outlier or peak noise, the value of a pixel is simply replaced by a random noise value typically having little to do with the underlying pixel's value. This can happen when there is a transmission error, when there is a bad spot on a vidicon or CCD array, when external noise affects the analog-to-digital conversion process, or when something small and irrelevant occludes the object of interest. A neighborhood operator can be employed to detect and reduce such noise.

The basis of the operator is to compare an estimate of the representative neighborhood value with the observed value of the neighborhood's center pixel; the center pixel value is an outlier if this difference is too large. Let y denote the center pixel value, and let x₁,...,x_N denote all the pixel values in the neighborhood with the exception of the center pixel value. Such a neighborhood is called the center-deleted neighborhood. The size of the neighborhood must be larger than the size of the noise structure and smaller than the size of the details to be preserved.

Let μ̂ be the estimate of the representative value of the center-deleted neighborhood. We take μ̂ to be the value that minimizes the sum of the squared differences ε = Σ_{n=1}^{N} (x_n - μ̂)²; the minimizing μ̂ is quickly found to be the mean of the center-deleted neighborhood, μ̂ = (1/N) Σ_{n=1}^{N} x_n. If y is reasonably close to μ̂, we can infer that y is not an outlier value; if y is statistically significantly different from μ̂, we can infer that y is an outlier value.

The output value of the outlier-removal neighborhood operator is defined by

z_outlier removal = y if |y - μ̂| < θ, and μ̂ otherwise.    (7.5)

If θ is too small, the edges will be blurred; if θ is too large, the noise cleaning will not be good.

Using |y - μ̂| as the test statistic poses some problems. "Statistically significant" implies a comparison of y - μ̂ to the center-deleted neighborhood variance

σ̂² = (1/(N-1)) Σ_{n=1}^{N} (x_n - μ̂)².

This suggests using the test statistic t = |y - μ̂|/σ̂, rejecting the hypothesis of significant difference when t is small enough. The output value of the operator with a contrast-dependent threshold is then

z_contrast-dependent outlier removal = y if |y - μ̂|/σ̂ < θ, and μ̂ otherwise.

This operator replaces a pixel value with the neighborhood mean if the pixel's value is sufficiently far from the neighborhood mean. Instead of making the replacement occur completely or not at all, we can define a smooth replacement: the output value z is a convex combination of the input pixel value y and the neighborhood mean μ̂, where the weights of the convex combination depend on t = |y - μ̂|/σ̂.
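A minimal sketch of the contrast-dependent detector just described, assuming NumPy and a 3 x 3 center-deleted neighborhood; the hard replacement (rather than the smooth convex combination) and the border handling are simplifications of this sketch.

```python
import numpy as np

def remove_peak_noise(f, theta):
    """Replace a pixel by its center-deleted 3 x 3 neighborhood mean when the
    pixel deviates from that mean by more than theta standard deviations."""
    R, C = f.shape
    g = f.astype(float)
    for r in range(1, R - 1):
        for c in range(1, C - 1):
            nbhd = f[r - 1 : r + 2, c - 1 : c + 2].flatten()
            x = np.delete(nbhd, 4)            # center-deleted neighborhood
            mu = x.mean()
            sigma = x.std(ddof=1)
            if sigma > 0 and abs(f[r, c] - mu) / sigma >= theta:
                g[r, c] = mu
    return g
```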

7.2.4 k-Nearest Neighbor

Davis and Rosenfeld (1978) suggest a k-nearest neighbor averaging technique for noise cleaning. All the pixel values in the neighborhood are compared with the central pixel value, and those k neighboring pixels having values closest to the observed central value are used in an equally weighted average: the k-nearest neighbor average. For 3 x 3 neighborhoods Davis and Rosenfeld found k = 6 to perform better than other values of k. They also found that iterating the operator for three or four iterations produced better results than just one application.

7.2.5 Gradient Inverse Weighted

Wang, Vagnucci, and Li (1981) suggest a gradient inverse weighted noise-cleaning technique that they empirically show reduces the sum-of-squares error within homogeneous regions while keeping the sums of squares between homogeneous regions relatively constant. The idea behind the operator is simple and is not unrelated to the k-nearest neighbor technique: in estimating the underlying value for a pixel, all the neighboring pixel values should contribute in accordance with how close they are to the observed value of the given pixel. At each pixel position the weights are inversely proportional to the absolute value of the difference between the given pixel's value and the neighboring pixel's value, with a bound applied so that a zero difference produces a finite weight. The weights are normalized to sum to one, and the noise-cleaned output g is given by the resulting convex combination of the pixel neighborhood values. The neighborhood operator is thus a linear operator whose weights are a function of the spatial configuration of the pixel values; this makes the operator a shift-invariant operator but not a linear shift-invariant operator. The gradient inverse weighted noise-cleaning operator may be iterated to provide improved performance.

7.2.6 Order Statistic Neighborhood Operators

The order statistic approach (Bovik, Huang, and Munson, 1983) takes linear combinations of the sorted values of all the neighborhood pixel values x₁,...,x_N. We denote the sorted neighborhood values by x₍₁₎,...,x₍N₎. Conceptually, we can visualize the operator as sorting the pixel values from smallest to largest and taking a linear combination of these sorted values: it produces an output value z for the position of the center of the neighborhood defined by

z = Σ_{n=1}^{N} w_n x₍n₎.    (7.10)

Order statistic filtering commutes with any linear monotonically increasing transformation. To see this, note that for any positive constant a and any constant b, if y_n = a x_n + b, then y₍n₎ = a x₍n₎ + b.

Median Operator

The most common order statistic operator used to do noise cleaning is the median. For K x K neighborhoods where K is odd, N = K x K will also be odd, and the median is given by

z_median = x₍(N+1)/2₎.    (7.11)

This operator was introduced by Tukey (1971).
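A direct (unoptimized) order-statistic filter, assuming NumPy; setting the weight vector to a one-hot at the middle position gives the median, and the trimmed weights shown give an order-statistic operator discussed later in this section.

```python
import numpy as np

def order_statistic_filter(f, weights, K=3):
    """z = sum_n w_n x_(n): a linear combination of the sorted K x K values.
    weights must have length K * K."""
    half = K // 2
    R, C = f.shape
    g = np.zeros((R, C))
    for r in range(half, R - half):
        for c in range(half, C - half):
            x = np.sort(f[r - half : r + half + 1,
                          c - half : c + half + 1], axis=None)
            g[r, c] = weights @ x
    return g

median_w = np.zeros(9); median_w[4] = 1.0      # the median of 9 values
trimmed_w = np.r_[0, np.full(7, 1 / 7), 0]     # drop the smallest and largest
```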

The running median does a good job of estimating the true pixel values in situations where the underlying neighborhood trend is flat or monotonic and the noise distribution has fat tails, such as double-sided exponential or Laplacian noise. Thus it does well when the neighborhood is homogeneous or when the central feature in the neighborhood is a step edge, and it is effective for removing impulsive noise. However, when the neighborhood contains fine detail such as thin lines, they are distorted or lost, and corners can be clipped. Also it sometimes produces regions of constant or nearly constant values that are perceived as patches, streaks, or amorphous blotches. This can give the image a mottled appearance, and such artifacts may suggest boundaries that really do not exist. Bovik (1987) gives an analysis of this effect.

Gallagher and Wise (1981) showed that repeatedly applying a median filtering operation to a one-dimensional signal must produce a result that is unchanged by further median filtering operations. The fixed-point result of a median filter is called the median root. Median roots consist only of constant-valued neighborhoods and sloped edges. Fitch, Coyle, and Gallagher (1985) suggest that the neighborhood size and shape of a median filter be chosen so that the underlying noise-free image will be a median root for this filtering operation with the given neighborhood size and shape.

Narendra (1981) discusses an implementation of a separable median filter: for an N x N separable median filter, the image is first filtered with an N x 1 median filter and then with a 1 x N median filter. This implementation reduces the computational complexity of the median filter. Narendra demonstrates that although the output of the separable median filter is not identical to the full two-dimensional median filter, the performance is sufficiently close. Nodes and Gallagher (1983) show that it is almost always the case that repeated application of the separable two-dimensional median filter results in a two-dimensional median root.

Brownrigg (1984) defines a weighted-median filter. The weights are given by a weight matrix w defined on the domain W = {(i,j) | -N ≤ i ≤ N, -N ≤ j ≤ N} for an odd-sized (2N+1) x (2N+1) neighborhood. The sum L of the weights is required to be odd:

L = Σ_{i=-N}^{N} Σ_{j=-N}^{N} w(i,j).

Let x(i,j), (i,j) ∈ W, denote the value of the (i,j)th pixel in the neighborhood. The weighted median first forms a list y₁,...,y_L of values: for each (i,j) ∈ W, the value x(i,j) is put into the list w(i,j) times. The weighted median is then given by the median of the values in the list:

z_weighted median = y₍(L+1)/2₎.    (7.12)
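A minimal sketch of the weighted median on a single neighborhood, assuming NumPy; the center-weight-3 matrix used in the example is the one arrived at by the analysis that follows.

```python
import numpy as np

def weighted_median(x, w):
    """Median of the list in which each sample x[i] appears w[i] times.
    The integer weights should sum to an odd number L."""
    y = np.repeat(np.asarray(x).ravel(), np.asarray(w).ravel())
    return np.sort(y)[len(y) // 2]

w = np.array([[1, 1, 1],
              [1, 3, 1],
              [1, 1, 1]])              # smallest admissible 3 x 3 weights, L = 11
patch = np.array([[2, 2, 2],
                  [0, 2, 0],
                  [0, 0, 0]])
print(weighted_median(patch, w))       # the center value is favored
```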

Brownrigg analyzes the 3 x 3 neighborhood case to determine a weight matrix w that satisfies two requirements: (1) one-pixel-wide, high-valued streaks, straight or bent, should be eliminated; and (2) rectangular blocks of constant value should be preserved. He finds that the weights with the smallest sum that satisfy the requirements form the weight matrix whose center weight has value 3 and whose other weights have value 1. Figure 7.8 shows how the weighted-median operator works: (a) the weight matrix; (b) and (c) two configurations for which the one-pixel-wide, high-valued streak is to be eliminated; and (d) a configuration for which the block of 2s is to be preserved. Using the weight matrix (a), configurations (b), (c), and (d) produce the same list of sorted values (e), whose median value is 2.

The running-median operator can be put into a context-dependent threshold form (Scollar, Weidner, and Huang, 1984). Define the interquartile distance Q by

Q = x₍3(N+1)/4₎ - x₍(N+1)/4₎,

where truncation is used for those cases in which the indices are not evenly divisible. The output value z of the median with contrast-dependent threshold equals the input value y when |y - z_median| is below a threshold proportional to Q, and equals z_median otherwise (7.13). Arce and McLaughlin (1987), Nieminen and Neuvo (1988), and Nieminen, Heinonen, and Neuvo (1987) discuss other variations of the running-median operator.

Trimmed-Mean Operator

Another common order statistic operator that works well when the noise distribution has a fat tail is the trimmed-mean operator (Bednar and Watt, 1984). Here the first k and last k order statistics are not used, and an equally weighted average of the central N - 2k order statistics is taken:

z_trimmed mean = (1/(N - 2k)) Σ_{n=k+1}^{N-k} x₍n₎.

When k = αN, the result is called the α-trimmed mean.

Peterson, Lee, and Kassam (1988) show that the trimmed-mean filter can perform better than the linear box filter as well as the median filter in white-noise suppression, while at the same time having edge-preserving characteristics comparable to those of the median. Lee and Kassam (1985) indicate that when the noise has a double-exponential distribution, a value for α around 0.4 gives the minimum variance among all α-trimmed-mean estimators.

Midrange

When the noise distribution has tails that are light and smooth, the midrange estimator can be more efficient than the mean (Rider, 1957; Arce and Fontana, 1988). It is defined by

z_midrange = (x₍₁₎ + x₍N₎)/2.    (7.15)

7.2.7 A Decision Theoretic Approach to Estimating Mean

Let x₁,...,x_N be observed noisy values from a neighborhood of pixels whose mean μ needs to be estimated by the neighborhood operator. Let μ̂ be the estimator for μ, and let L(μ̂, μ) be the loss incurred if μ̂ is the estimate of the mean when the true value for the mean is μ. The decision-theoretic framework suggests defining μ̂ to be the estimator that minimizes the expected loss

∫ L(μ̂, μ) P(μ | x₁,...,x_N) dμ.

By definition of conditional probability,

P(μ | x₁,...,x_N) = P(x₁,...,x_N | μ) P(μ) / P(x₁,...,x_N).

Hence minimizing the expected loss is equivalent to minimizing

∫ L(μ̂, μ) P(x₁,...,x_N | μ) P(μ) dμ.

As is often done, we assume the observations are independent, so that

P(x₁,...,x_N | μ) = Π_{n=1}^{N} P(x_n | μ).

Typically the density P(x_n | μ) has a functional form that depends only on (μ - x_n)². In this case we seek the estimator μ̂ that minimizes

∫ L(μ̂, μ) Π_{n=1}^{N} P[(μ - x_n)²] P(μ) dμ.

If we take the prior P(μ) to be uniform and the loss function to be the win-and-lose-all loss function, an infinite gain (a negative infinite loss) when μ̂ = μ and zero loss when μ̂ ≠ μ, then the estimator μ̂ must maximize

Π_{n=1}^{N} P[(μ̂ - x_n)²].

In this case μ̂ is the standard maximum-likelihood estimator. The μ̂ that maximizes the product is the same as the one that maximizes

Σ_{n=1}^{N} log P[(μ̂ - x_n)²].

Differentiating with respect to μ̂ gives the necessary condition

Σ_{n=1}^{N} P'[(μ̂ - x_n)²] (μ̂ - x_n) = 0,

which can be rewritten as

μ̂ = Σ_{n=1}^{N} w_n x_n / Σ_{n=1}^{N} w_n, where w_n = P'[(μ̂ - x_n)²] / P[(μ̂ - x_n)²].

In the Gaussian case

P(y) = (1/(√(2π) σ)) exp(-y/(2σ²)), where y = (μ̂ - x_n)².

Hence P'(y)/P(y) = -1/(2σ²), every weight w_n is the same constant, and μ̂ becomes the equally weighted neighborhood mean. In the case of a mixture of a Gaussian with a uniform density, the weight is no longer constant: for small y the weight is just smaller than the Gaussian value, and for large y it approaches 0, so values far from the estimate are weighted less and less. A two-term Taylor series expansion of the weight function

around zero gives an approximation that, if we take the range of the contaminating uniform to be 6σ and the contaminating fraction appropriately, becomes related to Tukey's biweight

w(y) = [1 - (y/(cS))²]² for (y/(cS))² < 1, and 0 otherwise,

which is used in robust estimation. Tukey usually takes S to be the sample median absolute deviation and the constant c to be 6 or 9. Thus, with c = 9, values x_i that are more than about four standard deviations away from the estimate μ̂ will get weight zero.

There are other kinds of loss functions than the win-and-lose-all kind. Another common one is the squared loss function L(μ̂, μ) = (μ̂ - μ)². Assuming again a uniform prior, we seek the μ̂ that minimizes

∫ (μ̂ - μ)² Π_{n=1}^{N} P[(μ - x_n)²] dμ.

This corresponds to a minimum variance estimate. Carrying out the minimization yields

μ̂ = ∫ μ Π_{n=1}^{N} P[(μ - x_n)²] dμ / ∫ Π_{n=1}^{N} P[(μ - x_n)²] dμ.    (7.16)

The product form makes this integration difficult to do analytically. A numerical approximation, however, is easily obtained. We use the neighborhood order statistics x₍₁₎ ≤ x₍₂₎ ≤ ... ≤ x₍N₎, assume that the integrand is effectively zero outside the interval (x₍₁₎, x₍N₎), and approximate the numerator and denominator integrals by sums over the subintervals defined by successive order statistics.
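The weighted-mean condition above suggests a fixed-point reweighting iteration. Here is a sketch of that iteration with Tukey's biweight, assuming NumPy; the biweight form and the choice S = median absolute deviation follow the text, while the iteration count and the tiny floor on S are arbitrary safeguards of this sketch.

```python
import numpy as np

def biweight(y, S, c=6.0):
    """Tukey's biweight: w = (1 - (y/(c S))^2)^2 inside the support, else 0."""
    u = (y / (c * S)) ** 2
    return np.where(u < 1.0, (1.0 - u) ** 2, 0.0)

def robust_mean(x, iterations=10):
    """Fixed point of mu = sum(w x) / sum(w) with w = biweight(x - mu)."""
    x = np.asarray(x, dtype=float)
    mu = np.median(x)
    S = np.median(np.abs(x - mu)) or 1e-12     # MAD scale estimate
    for _ in range(iterations):
        w = biweight(x - mu, S)
        if w.sum() == 0:
            break
        mu = (w * x).sum() / w.sum()
    return mu

print(robust_mean([10.1, 9.9, 10.0, 10.2, 55.0]))   # the outlier is ignored
```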

Reflecting on the meaning of the weighted-mean equation, we see that as x_n gets farther away from the center of the distribution, its weight w_n becomes smaller. Carrying the numerical approximation through, the estimate takes the form of a weighted combination of the order statistics in which the weights attached to the extreme order statistics are zero.

7.2.8 Hysteresis Smoothing

Hysteresis smoothing can remove minor fluctuations while preserving the structure of all major transients. It is most directly described in terms of a state machine. Ehrich (1978) introduced a symmetric hysteresis noise-cleaning technique; the hysteresis algorithm is applied in a row-by-row fashion and then in a column-by-column fashion.

A minor fluctuation is defined as a segment either having a relative maximum on a monotonically decreasing slope, where the relative height of the maximum is smaller than h, or having a relative minimum on a monotonically increasing slope, where the relative depth of the minimum is smaller than h. Figure 7.9 illustrates these definitions (the height of a relative maximum on a descending slope and the depth of a relative minimum on an ascending slope).

The machine has two states, UP and DOWN. The output value g(n+1) is equal to the input value f(n+1), and the state remains the same, so long as the state is UP and f(n+1) > f(n) or the state is DOWN and f(n+1) ≤ f(n).

If the state is UP and f(n+1) ≤ f(n), let k₀ be the smallest positive integer for which f(n+k₀) < f(n+k₀+1) and f(n+k₀) < f(n+k₀-1); this makes n+k₀ the location of the next relative minimum. Now if f(n) - f(n+k₀) < h, indicating that the relative minimum is a minor fluctuation, then the output value remains the same, g(n+1) = g(n), and the state remains UP. But if f(n) - f(n+k₀) ≥ h, indicating that the relative minimum is significant, then the output value follows the input, g(n+1) = f(n+1), and the state changes to DOWN.

Similarly, if the state is DOWN and f(n+1) > f(n), let j₀ be the smallest positive integer for which f(n+j₀) > f(n+j₀+1) and f(n+j₀) > f(n+j₀-1); this makes n+j₀ the location of the next relative maximum. If f(n+j₀) - f(n) < h, indicating that the relative maximum is a minor fluctuation, then the output value remains the same, g(n+1) = g(n), and the state remains DOWN. But if f(n+j₀) - f(n) ≥ h, indicating that the relative maximum is significant, then the output value follows the input, g(n+1) = f(n+1), and the state changes to UP.

Tables 7.3 and 7.4 summarize the next-state/output behavior of the state machine. Ehrich suggests that the hysteresis smoothing be applied twice to the input image data: once in a left-to-right, or forward, order and once in a right-to-left, or reversed, order. The output of the symmetric hysteresis smoother is the average of the two results.

7.2.9 Sigma Filter

The sigma filter (Lee, 1983b) is another way in which a pixel can be averaged with those of its neighbors that are close in value to it. Lee suggests looking at all the values in the neighborhood of the given pixel value y and averaging y only with those values that are within a two-sigma interval of y. If x₁,...,x_N are the neighboring values, then the estimate μ̂ is determined by

μ̂ = (1/#M) Σ_{n∈M} x_n, where M = {n | y - 2σ ≤ x_n ≤ y + 2σ}.

When the number of neighboring pixels in the two-sigma range is too small, μ̂ is unreliable; Lee suggests that too small is less than four, in which case he recommends taking μ̂ as the average of all eight immediate neighbors. For a 7 x 7 neighborhood, experiments done by Lee (1983b) indicate that the performance of the sigma filter is better than that of the gradient inverse weighted filter, the median, and the selected-neighborhood average filter.
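A minimal sketch of the sigma filter, assuming NumPy; the K x K window, the fallback rule, and the border handling follow the description above, with everything else an illustrative choice.

```python
import numpy as np

def sigma_filter(f, sigma, K=3, min_count=4):
    """Average each pixel with the K x K neighbors lying within 2*sigma of it;
    fall back to the 8-neighbor mean when too few neighbors qualify."""
    half = K // 2
    R, C = f.shape
    g = f.astype(float)
    for r in range(half, R - half):
        for c in range(half, C - half):
            nbhd = f[r - half : r + half + 1, c - half : c + half + 1]
            close = nbhd[np.abs(nbhd - f[r, c]) <= 2 * sigma]
            if close.size >= min_count:
                g[r, c] = close.mean()
            else:   # unreliable: average the eight immediate neighbors
                g[r, c] = (f[r-1:r+2, c-1:c+2].sum() - f[r, c]) / 8.0
    return g
```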

Table 7.3 Input to the state machine as a function of the input f and the previously computed output g. Table 7.4 Specification of the next state/output for the state machine.

7.2.10 Selected-Neighborhood Averaging

Selected-neighborhood averaging (Graham, 1962; Newman and Dirilten, 1973; Tomita and Tsuji, 1977; Nagao and Matsuyama, 1979; Haralick and Watson, 1981) takes the point of view that the pixels that are averaged should be ones that form a tight spatial configuration as well as ones that are homogeneous and related to the given pixel. To understand selected-neighborhood averaging, we must make a change in point of view from the central neighborhood around a given pixel to the collection of all neighborhoods that contain the given pixel. For example, there are nine 3 x 3 neighborhoods that contain a given pixel. Each neighborhood that contains the given pixel has a mean and a variance, and one of them will have the lowest variance. The noise-filtered value is computed as the mean value from the lowest-variance neighborhood containing the given pixel, as in the sketch below.

Neighborhoods in the collection, however, are not required to be square. Graham (1962) uses a three-pixel vertical or horizontal neighborhood. Newman and Dirilten (1973) suggest a five-pixel strip neighborhood oriented orthogonally to the gradient direction. Nagao and Matsuyama (1979) use pentagonal and hexagonal neighborhoods. Some may be square, some rectangular, and some diagonal.
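A sketch of the square-neighborhood case for a single pixel, assuming NumPy; restricting the collection to the nine 3 x 3 neighborhoods is a simplification relative to the strip, pentagonal, and hexagonal variants mentioned above.

```python
import numpy as np

def selected_neighborhood_average(f, r, c):
    """Mean of the lowest-variance 3 x 3 neighborhood among the (up to) nine
    square neighborhoods that contain pixel (r, c)."""
    R, C = f.shape
    best_mean, best_var = float(f[r, c]), np.inf
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            rr, cc = r + dr, c + dc            # candidate neighborhood center
            if 1 <= rr < R - 1 and 1 <= cc < C - 1:
                block = f[rr - 1 : rr + 2, cc - 1 : cc + 2]
                if block.var() < best_var:
                    best_var, best_mean = block.var(), block.mean()
    return best_mean
```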

The model under which selected-neighborhood averaging appears to be a good noise-smoothing technique is one that assumes each pixel is a part of a homogeneous region, and in this region the pixel can be covered by one neighborhood that is entirely contained within the region. If this is the case, the neighborhood found to be the lowest-variance neighborhood is likely to be the one that is entirely contained in the region. Certainly it will not be a neighborhood that contains a step edge, for such a neighborhood would undoubtedly have a high variance. In this way the selected-neighborhood averaging operator never averages pixels across an edge boundary.

7.2.11 Minimum Mean Square Noise Smoothing

Lee (1980) introduced a minimum mean square noise-smoothing technique suitable for additive or multiplicative noise. Here the true image is regarded as constituting a set of random variables, one at each pixel. The observed image is modeled as the true image perturbed by noise. In our description we fix on a particular pixel and let Y denote its value. The mean of the random variable Y is denoted by μ_Y and its variance by σ_Y²; both are considered known and allowed to vary over the image. The random value that Y takes as it differs from its mean is not regarded as noise; it is simply regarded as the way in which an image pixel naturally varies from its mean.

We consider first the case of additive noise. Let η denote the noise random variable at the given pixel position. Its mean is assumed to be 0, and its variance σ² is assumed known. The noise random variable η and the image random variable Y are assumed uncorrelated. Letting Z denote the observed pixel value, we have Z = Y + η.

Having observed Z, the problem, as Lee poses it, is to determine an estimate Ŷ of Y on the basis of Z. Constraining this estimate to be of the form Ŷ = αZ + β, we can use a least-squares criterion to determine α and β. Let e² = E[(Y - (αZ + β))²]. To determine the minimizing α and β, we take partial derivatives of e² with respect to α and β, set them to zero, and solve the resulting system of equations.

This immediately results in

α = σ_Y² / (σ_Y² + σ²) and β = (1 - α) μ_Y,

making the estimate Ŷ a convex combination of the observed pixel value Z and the mean μ_Y:

Ŷ = αZ + (1 - α) μ_Y.

To use this estimate, μ_Y, σ_Y², and σ² all need to be known. In the case of stationary non-signal-dependent noise, the noise variance σ² can be regarded as constant over the whole image, so it is reasonable to take it to be known. However, μ_Y and σ_Y² change over the image, and it is not reasonable to regard them as known. They may be estimated as the mean and variance of the central neighborhood around the given pixel: if x₁,...,x_N denote the values in the neighborhood, then

μ̂ = (1/N) Σ_{n=1}^{N} x_n and σ̂² = (1/N) Σ_{n=1}^{N} (x_n - μ̂)²

are the estimated neighborhood mean and variance, and we take these to be the true mean and variance, μ_Y = μ̂ and σ_Y² = σ̂².

Not all noise is additive. Under low light conditions, the noise has more of a Poisson character, which is better modeled by multiplicative noise. Here the observed Z can be represented as Z = Yξ, where the expected value of ξ is 1 and its variance is σ². We seek a least mean square estimate Ŷ of Y from the observed Z, again constrained to the form Ŷ = αZ + β. Let e² = E[(Y - (αZ + β))²].

Taking the partial derivatives of e² with respect to α and β and setting them to zero results in a pair of linear equations in α and β. Solving this system of equations results in

α = σ_Y² / (σ_Y² + σ²(μ_Y² + σ_Y²)) and β = (1 - α) μ_Y,

so that the estimated Ŷ is again given by

Ŷ = αZ + (1 - α) μ_Y.

As in the additive-noise case, the required values for μ_Y and σ_Y² are obtained as the neighborhood mean and variance. Lee (1980) makes a linearizing assumption and gets a result close to the one given here; Kuan et al. (1985) give the formulation expressed here. Lee also shows its application to radar imagery (Lee, 1981).

7.2.12 Noise-Removal Techniques: Experiments

In this section we describe the performance of the noise-removal techniques outlined previously. For our experiments we used an image of blocks, to which we added the following types of noise: uniform, Gaussian, salt and pepper, and varying noise (the noise energy varies across the image). Figure 7.10 gives the image with no noise. Figure 7.11 shows the image generated by appending the images obtained after addition of each of the noise types listed above: the top left portion is corrupted with additive uniform noise, the top right portion with additive Gaussian noise, the bottom left portion with salt-and-pepper noise, and the bottom right portion with varying noise.

The signal-to-noise ratio is defined as S = 10 log₁₀(V_S/V_N), where V_S is the image gray level variance and V_N is the noise variance. In order to study effectively the performance of each noise-removal technique relative to the noise type, the S/N ratio in each portion of the image was made the same; it was set at -1 dB. In our implementation the salt-and-pepper noise generation was done by specifying the minimum gray value (i_min) and the maximum gray value (i_max) for noise pixels and the fraction p of the image that is to be corrupted with noise, with each corrupted pixel assigned i_min or i_max according to the value of a uniform random variable u on the interval [0,1].
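Before turning to the figures, here is a sketch of the additive-noise estimator described above, assuming NumPy. Estimating σ_Y² by subtracting the known noise variance from the local variance (clipped at zero) is one common reading of the model, stated here as an assumption rather than as the text's prescription.

```python
import numpy as np

def lee_additive(f, noise_var, K=5):
    """Minimum mean square smoother for additive noise:
    Yhat = alpha Z + (1 - alpha) mu, alpha = var_Y / (var_Y + noise_var),
    with mu and var_Y estimated from the local K x K neighborhood."""
    half = K // 2
    R, C = f.shape
    g = f.astype(float)
    for r in range(half, R - half):
        for c in range(half, C - half):
            nbhd = f[r - half : r + half + 1, c - half : c + half + 1]
            mu = nbhd.mean()
            var_y = max(nbhd.var() - noise_var, 0.0)   # signal-variance estimate
            alpha = var_y / (var_y + noise_var)
            g[r, c] = alpha * f[r, c] + (1 - alpha) * mu
    return g
```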

Figure 7.18 Result obtained by using a 5 x 5 neighborhood running-mean filter. Figure 7.19 Result obtained by using a sigma filter with a 0.2 sigma parameter and a 9 x 9 neighborhood.

7.3 Sharpening

The simplest neighborhood operator method for sharpening or crispening an image is to subtract from each pixel some fraction of the neighborhood mean and then scale the result. If x₁,...,x_N are all the values in the neighborhood, including the value y of the center pixel, and μ̂ = (1/N) Σ_{n=1}^{N} x_n is the neighborhood mean, the output value is given by

z_sharpened = s(y - kμ̂),

where s is a scaling constant and k governs the fraction of the mean subtracted. Rosenfeld (1969) calls this technique unsharp masking. Schreiber (1970) suggests a range of reasonable values for k, the larger values providing the greater amount of sharpening.
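A minimal unsharp-masking sketch, assuming NumPy; the particular k, the unit-gain choice s = 1/(1-k) (valid for k < 1), and the 3 x 3 window are arbitrary examples rather than recommendations from the text.

```python
import numpy as np

def unsharp_mask(f, k=0.7, s=None, K=3):
    """z = s (y - k * neighborhood mean), with s defaulting to 1/(1 - k)."""
    half = K // 2
    s = s if s is not None else 1.0 / (1.0 - k)
    fp = np.pad(f.astype(float), half, mode="edge")
    # neighborhood mean as an average of the K*K shifted copies of the image
    mean = sum(fp[i : i + f.shape[0], j : j + f.shape[1]]
               for i in range(K) for j in range(K)) / (K * K)
    return s * (f - k * mean)
```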

A median-based variant of unsharp masking replaces the neighborhood mean μ̂ with the neighborhood median z_median: the pixel's value is replaced with the median-sharpened value, but if the value of the center pixel is close to the neighborhood median, the sharpened value is unreliable, and the operator outputs the neighborhood median instead. Thus we have a neighborhood operator that either sharpens the spatial detail a pixel is part of or falls back to the median.

Wallis (1976) suggested a form of unsharp masking that adjusts the neighborhood brightness and contrast toward given desired values. The Wallis neighborhood operator is defined by

z(r,c) = [f(r,c) - μ̂] (A σ_d)/(A σ̂ + σ_d) + [α μ_d + (1 - α) μ̂],

where μ_d is the desired neighborhood mean, σ_d² is the desired neighborhood variance, A is a gain or contrast-expansion constant governing the degree to which the neighborhood variance is adjusted toward the desired variance, and α is a constant governing the degree to which the neighborhood mean is adjusted to the desired mean. Reasonable values for A range from 3 to 25 and for α from 0 to 0.4. For this operator to be effective, the neighborhood length N must be an appreciable fraction of the image side length. The Wallis operator can be put in a median mode as well by replacing μ̂ with z_median and the desired and observed standard deviations with the desired interquartile range Q_d and the observed interquartile range Q.

7.3.1 Extremum Sharpening

The extremum-sharpening neighborhood operator (Kramer and Bruckner, 1975) is one that may be iteratively applied and is most appropriate for images that have large contrast regions. It assigns the neighborhood minimum or maximum to the value of the pixel at the neighborhood's center, whichever is closer. More formally, let W be the neighborhood and define

z_min = min{f(r+r', c+c') : (r',c') ∈ W} and z_max = max{f(r+r', c+c') : (r',c') ∈ W}.

Then

z_extremum sharpened = z_min if f(r,c) - z_min < z_max - f(r,c), and z_max otherwise.

Kramer and Bruckner used the extremum-sharpening operator in a character recognition application. Lester, Brenner, and Selles (1980) report good results with the extremum-sharpening operator in biomedical image analysis applications. They also indicate that after one or two applications of the extremum-sharpening operator, the application of the median neighborhood operator proved to be of considerable benefit prior to image segmentation.
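A direct sketch of one pass of the extremum-sharpening operator, assuming NumPy; borders are simply left unchanged in this illustration.

```python
import numpy as np

def extremum_sharpen(f, K=3):
    """Assign to each pixel the nearer of its K x K neighborhood min and max."""
    half = K // 2
    R, C = f.shape
    g = f.copy()
    for r in range(half, R - half):
        for c in range(half, C - half):
            nbhd = f[r - half : r + half + 1, c - half : c + half + 1]
            zmin, zmax = nbhd.min(), nbhd.max()
            g[r, c] = zmin if f[r, c] - zmin < zmax - f[r, c] else zmax
    return g
```

Iterating the operator drives each pixel toward one of the two extremes of its region, which is why a subsequent median pass is useful for cleaning up the resulting boundaries.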

7.4 Edge Detection

What is an edge in a digital image? The first intuitive notion is that a digital edge is the boundary between two pixels that appears when their brightness values are significantly different. Here "significantly different" may depend on the distribution of brightness values around each pixel. On an image we often point to a region and say it is brighter than its surrounding area. We might then say that an edge exists between each pair of neighboring pixels where one pixel is inside the brighter region and the other is outside.

Jumps in brightness value are associated with bright values of first derivative and are the kinds of edges that Roberts (1965) originally detected. One clear way to interpret jumps in value when referring to a discrete array of values is to assume that the array comes about by sampling a real-valued function f defined on the domain of the image, which is a bounded and connected subset of the real plane R². The jumps in value then really must refer to points of high first derivative of f. Hence we use the word edge to refer to a place in the image where brightness value appears to jump; such edges are referred to as step edges. Edge detection must then involve fitting a function to the sample values, and the finite differences typically used in the numerical approximation of first-order derivatives are usually based on the assumption that the function f is linear. Roberts (1965) and Sobel (1970) explain edge detectors in terms of fitting, and Prewitt (1970), Hueckel (1973), Brooks (1978), Haralick (1980), Haralick and Watson (1981), Morgenthaler and Rosenfeld (1981), Zucker and Hummel (1979), and Morgenthaler (1981b) all use the surface-fit concept in determining edges.

In this section we review the basic gradient edge operators, which measure a quantity related to edge contrast or gradient as well as edge or gradient direction; other approaches to edge detection; the performance characterization of edge operators; zero-crossing edge operators; and some line detectors. We begin with the basic gradient edge operators.

7.4.1 Gradient Edge Detectors

One of the early edge operators was employed by Roberts (1965). He used two 2 x 2 masks to calculate the gradient across the edge in two diagonal directions (Fig. 7.21). Letting r₁ be the value calculated from the first mask and r₂ the value calculated from the second mask, he obtained that the gradient magnitude is √(r₁² + r₂²).

Prewitt (1970) used two 3 x 3 masks oriented in the row and column directions (Fig. 7.22). Letting p₁ be the value calculated from the first mask and p₂ the value calculated from the second mask, she obtained that the gradient magnitude g is √(p₁² + p₂²) and that the gradient direction θ, taken in a clockwise angle with respect to the column axis, is arctan(p₁/p₂).

Sobel (1970) used two 3 x 3 masks oriented in the row and column directions (Fig. 7.23). Letting s₁ be the value calculated from the first mask and s₂ the value calculated from the second mask, he obtained that the gradient magnitude g is √(s₁² + s₂²) and that the gradient direction θ, taken in a counterclockwise angle with respect to the column axis, is arctan(s₁/s₂).

From this discussion it might appear that one can simply design any kind of vertical and horizontal difference pattern and make an edge operator. In actual practice, however, each such operator has properties that differ from the others. So first we explore four important properties that an edge operator might have: (1) its accuracy in estimating gradient magnitude; (2) its accuracy in estimating gradient direction; (3) its accuracy in estimating step edge contrast; and (4) its accuracy in estimating step edge direction.

Kirsch (1971) describes a set of eight compass template edge masks (Fig. 7.25). The gradient magnitude is the maximum response over the eight masks,

g = max_k g_k,

and the gradient direction is θ = 45° argmax_k g_k. To simplify computation, Robinson (1977) used a compass template mask set having values of only 0, ±1, and ±2 (Fig. 7.26); since the negation of each mask is also a mask, computation need only be done by using four masks, and gradient magnitude and direction are computed as for the Kirsch operator. The Robinson and Kirsch compass operators detect lineal edges (Fig. 7.28). The edge contour direction is defined as the direction along the edge whose right side has higher gray level values and whose left side has lower gray level values; the edge contour direction is 90° more than the gradient direction.

Nevatia and Babu (1980) use a set of six 5 x 5 compass template masks (Fig. 7.27). Frei and Chen (1977) used a complete set of nine orthogonal masks to detect edges and lines as well as to detect nonedgelike and nonlinelike neighborhoods; two of the nine are appropriate for edge detection (Fig. 7.24). As before, letting f₁ be the value calculated from the first mask and f₂ the value calculated from the second mask, the gradient magnitude g is √(f₁² + f₂²), and the gradient direction θ, taken in a counterclockwise angle with respect to the column axis, is arctan(f₁/f₂).

Figure 7.21 Masks used for the Roberts operators. Figure 7.22 Prewitt edge detector masks. Figure 7.23 Sobel edge detector masks. Figure 7.24 Frei and Chen gradient masks. Figure 7.25 Kirsch compass masks. Figure 7.26 Robinson compass masks. Figure 7.27 Nevatia-Babu 5 x 5 compass template masks.
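A minimal gradient-edge sketch, assuming NumPy, written in the (a, b) mask family analyzed in the passage that follows; b = 2a gives the Sobel masks and b = a the Prewitt masks, and the normalization enforces the constraint 2(2a + b) = 1 derived below.

```python
import numpy as np

def gradient_edges(f, a=1.0, b=2.0):
    """Row and column differences from the (a, b, a) mask family, normalized
    so that 2(2a + b) = 1. Returns gradient magnitude and direction (radians)."""
    f = f.astype(float)
    row_mask = np.array([[-a, -b, -a],
                         [0.0, 0.0, 0.0],
                         [ a,  b,  a]]) / (2 * (2 * a + b))
    col_mask = row_mask.T
    R, C = f.shape
    gr = np.zeros((R, C)); gc = np.zeros((R, C))
    for r in range(1, R - 1):
        for c in range(1, C - 1):
            block = f[r - 1 : r + 2, c - 1 : c + 2]
            gr[r, c] = (row_mask * block).sum()
            gc[r, c] = (col_mask * block).sum()
    return np.hypot(gr, gc), np.arctan2(gr, gc)
```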

For gradient direction and magnitude we suppose, without loss of generality, that the gray level values in a 3 x 3 neighborhood can be described by the simple linear relationship I(r,c) = αr + βc + γ (Fig. 7.29a) and that the masks used for the vertical and horizontal differences are those shown in Fig. 7.29(b), with corner weight a and edge-midpoint weight b. For such a linear gray level intensity surface, the row and column differences computed by this edge operator are

g_r = 2(2a + b)α and g_c = 2(2a + b)β.

The gradient magnitude computed by the operator is then

g = √(g_r² + g_c²) = 2(2a + b)√(α² + β²).

Hence for this operator to compute gradient magnitude correctly, a and b must satisfy 2(2a + b) = 1. The gradient direction θ computed by the operator satisfies tan θ = g_r/g_c = α/β, which is precisely the correct value.

The choice of the values for a and b under the constraint 2(2a + b) = 1 has consequences relative to the effect the noise has on the value produced by the edge operator. For example, if

I(r,c) = αr + βc + γ + ξ(r,c),

where ξ(r,c) is independent noise having mean 0 and variance σ²(r,c), then, since 2(2a + b) = 1, g_r will have expected value α. If the four corner positions have common variance σ_c² = σ²(-1,-1) = σ²(-1,1) = σ²(1,-1) = σ²(1,1) and the two edge-midpoint positions used by the row mask have common variance σ_m² = σ²(-1,0) = σ²(1,0), then g_r will have variance

V[g_r] = 4a²σ_c² + 2b²σ_m².

Figure 7.28 Lineal gray level spatial patterns detected by the compass edge operators, with the maximal-response edge contour direction and gradient direction indicated for each mask orientation (the contour directions run through the full circle in 45° steps, each 90° more than the corresponding gradient direction). White boxes indicate pixels with high values and black boxes pixels with low values.

Figure 7.29 (a) Gray level pattern for a linear gray level intensity surface; (b) 3 x 3 masks to compute differences in the row and column directions.

The value of a that minimizes the variance V[g_r] subject to 2(2a + b) = 1 satisfies aσ_c² = bσ_m². When σ_c² = σ_m², this gives a = b = 1/6, and the result is a multiple of the Prewitt operator. When σ_c² = 2σ_m², it gives b = 2a, and since 2(2a + b) = 1, a = 1/8 and b = 1/4: the result is a multiple of the Sobel operator. When σ_c² = √2 σ_m², the result is a multiple of the Frei and Chen gradient masks. Hence we see that the choice of the Prewitt, Sobel, or Frei and Chen masks for edge detection should not be made by a flip of the coin but should be based on a noise model. In other words, selection of a particular operator in effect commits us to a particular noise model.

To determine the properties of an edge operator for edge direction and contrast, we must assume an appropriate edge model. The model we choose is the step edge. We assume that a straight step edge passes directly through the center point of the center pixel of the 3 x 3 neighborhood, that all points on the bright side of the edge boundary have the same value H, and that all points on the dark side have the same value L. For pixels on or near the edge boundary, some of the pixel's area will have high values and some low values. We therefore assume a model in which each pixel value is the convex combination of the high and low values, where the coefficients of the convex combination are just the areas of high and low values within the pixel and where the areas of pixels are unit squares. Figure 7.30 shows an edge boundary passing through the center of the 3 x 3 neighborhood and the areas of each pixel on each side of the edge boundary.

Using this edge model, the areas V and W of Fig. 7.30 are given by simple trigonometry, one expression holding when the edge direction θ satisfies 0 ≤ tan θ ≤ 1/3 and another when 1/3 ≤ tan θ ≤ 1. In either case the gradient magnitude g = √(g_r² + g_c²) computed by the operator now clearly depends on the edge direction.

A straightforward computer calculation shows that when 2a + b = 1/2, the computed edge contrast lies between 0.958(H - L) and 1.042(H - L); the true edge contrast is H - L, so the slight dependency on edge direction causes the constant to be off by no more than 4.2%. For the Prewitt operator the maximum difference between the computed edge direction and the true edge direction is about 7.4°, occurring near θ = 20.7°. In contrast, for the Sobel operator the maximum difference between the computed and true edge directions is about 1.4°, occurring near θ = 34°. From computer simulations we find that when a = 0.2278 and b = 0.5025, the maximum difference between the computed and true edge directions is less than 0.85°, occurring near θ = 21°.

To detect an edge with a gradient edge operator, one must examine the gradient magnitude output of the operator at each pixel. If it is high enough, an edge is detected passing through the pixel, and the pixel is labeled an edge pixel; if the gradient magnitude is smaller than a threshold, the pixel is labeled as having no edge. Figure 7.31 illustrates a simple polyhedral object, its gradient magnitude image, and the binary edge image obtained by labeling a pixel an edge if the gradient magnitude is greater than 12.

Bowker (1974) describes an early use of edge orientation information. Robinson (1977) describes a constraint technique to eliminate detected edge pixels having directions that are not consistent with their neighboring edge pixel directions. Ikonomopoulos (1982) develops a local operator procedure that also uses orientation consistency to eliminate false edges from among the detected edges. Hancock and Kittler (1990) use a dictionary-based relaxation technique.

The basic idea of edge detection with enforced orientation consistency is to examine every pixel labeled an edge pixel on the input image and check each of its eight neighbors to see whether it has at least one edge-labeled neighbor whose directional orientation is consistent with its own. If so, the corresponding pixel on the output image is labeled an edge; if not, it is labeled a nonedge. (It is always consistent for an edge-labeled pixel to be adjacent to a non-edge-labeled pixel.) Table 7.6 lists an example set of legal consistent orientation pairs for 8-neighboring edge pixels detected by a compass edge operator. The technique can be iterated by using the output edge-labeled image as the input edge-labeled image for the next iteration, continuing until there are no further changes. Each edge-labeled pixel in the resulting image is then guaranteed to have some neighboring edge-labeled pixel whose directional orientation is consistent with its own. If only edge pixels with orientation directions of between 0° and 22° are selected from Fig. 7.31, the reduced edge image of Fig. 7.31(d) results.

The major problem with gradient edge operators is that they generally produce edge responses more than one pixel wide, so some form of nonmaxima suppression must follow.
Both a gradient magnitude and a gradient direction can be associated with each edge pixel; although the magnitude is used for detection purposes, the direction can be used for edge organization, selection, and linking once detection has taken place. Related analyses of edge operator performance can be found in Deutsch and Fram (1978) and Kitchen and Malin (1989).

7.4.2 Zero-Crossing Edge Detectors

The nonmaxima suppression can be incorporated into and made an integral part of the edge operator; such operators are called zero-crossing edge operators. The way they work can be easily illustrated by the one-dimensional step edge example shown in Fig. 7.32: the place where the first derivative of the step is maximum is exactly the place where the second derivative of the step has a zero crossing.

The isotropic generalization of the second derivative to two dimensions is the Laplacian. The Laplacian of a function I(r,c) is defined by

∇²I = ∂²I/∂r² + ∂²I/∂c².

Two of the common 3 x 3 masks employed to calculate the digital Laplacian are shown in Fig. 7.33. It is easy to verify that if

I(r,c) = k₁ + k₂r + k₃c + k₄r² + k₅rc + k₆c²,

then the 3 x 3 values of I are as given in Fig. 7.34 and each of the masks of Fig. 7.33 produces the correct value of the Laplacian of I, which is 2k₄ + 2k₆.

The general pattern for the computation of an isotropic digital Laplacian from a 3 x 3 neighborhood is shown in Fig. 7.35: corner weights a, edge-midpoint weights b, and center weight e. If we multiply the weights of Fig. 7.35 with the values of Fig. 7.34 and then add, requiring that the result be 2k₄ + 2k₆, it is easy to see from the equations relating the k-terms that e = -(4a + 4b). This implies that 2a + b = 1.

Figure 7. The constraints are that e = 440 + 46) and that 2a b = 1.4 Edge Detection 347 (c) Figure 7.34 The 3 x 3 neighborhood values of an image function Z(r. (c) Its second derivative.7.32 (a) One-dimensional step edge. (b) Its first derivative. + . The various 3 x 3 masks that correctly compute the digital Laplacian have different performance characteristics under noise. Suppose that the values in a local 3 x 3 neighborhood can be modeled by Figure 7.c) = k l + k2r + k3c + k4r2 + kjrc + k6c2. n Figure 7.33 Two common 3 x 3 masks employed to calculate the Laplacian.35 General pattern for a 3 x 3 mask computing a digital Laplacian.

Using the Lagrangian multiplier solution technique. Marr and Hildreth (1980) suggest using a Gaussian smoother. Since convolution is an associative and commutative operation. e = The resulting mask is shown in Fig. . Minimize u2[4a2+4b2 +(4a +4b)2] subject t o 2 a + b = 1. Of course different noise models will necessitate different weights for the digital Laplacian mask.1). Then + + + + Setting each of the partial derivatives to zero and solving for a and b yields a = $ and b = Since e = -(4a 4b).c). ~ . The central negative area of the kernel is a disk of radius f i u . If the noise variance is constant. The values of a and b that minimize the variance of the Laplacian can then be determined easily. 7.348 Conditioning and Labeling where E(r. The Laplacian of the Gaussian kernel is given by +.c) is independent noise having mean 0 and variance a2(r. the differencing entailed by taking first or second derivatives needs to be stabilized by some kind of smoothing or averaging. As we have seen with the previous edge operators. let e2 = u2[4a2+ 4b2 + (4a + 4b)2] h(2a b . d o m a i n of the Laplacian of the Gaussian kernel must be at least as large as a disk of Figure 7. The resulting operator is called the Laplacian of Gaussian zero-crossing edge detector. then the variance V of the digital Laplacian will be V = a2[4a2+ 4b2 (4a 4b)'I. + 9.36 The 3 x3 mask for computing a minimum-variance digital Laplacian when the noise is independent and has the same variance for every pixel position.36. smoothing an image by convolving it with a Gaussian kernel and then taking its Laplacian by convolving the result with a Laplacian kernel is exactly the same as taking the Laplacian of the Gaussian kernel (LOG) and convolving the image with it.

3 1processed with a Laplacian of Gaussian model a = 1.4 Edge Detection 349 radius 3 4 0 . 7. or if it is greater than t and one of its eight neighbors is less than -t.37 An 11 x 11 Laplaciaa of the Gaussian h l for u = 1. Once the image is convolved with the Laplacian of the Gaussian kernel. One way of accomplishing this is to define where k is defined to be the value that makes N N 0= r=-N c=-N LOG(r. identically distributed noise. It appears that this kind of minimum-variance optimization for smoothed images has not been utilized. To determine the optimal mask values. In actual practice. with some care being taken to do the quantization so that the sum of the positive entries equals the absolute values of the sum of the negative entries. c) where N = 13 h a ] and A is chosen to be just less than the largest value of A that would make LOG(N. . the zero crossings can be detected in the following way: A pixel is declared to have a zero crossing if it is less than -t and one of its eight neighbors is greater than t. a noise model will have to be assumed.7. 7. If this noise model is independent.4.4 and then with a zerocrossing detection having a threshold t = 1.37 shows a Laplacian of the Gaussian kernel suitable for an 11 x 11 mask ( a = 1. which will then have to be appropriately taken into account. it is clear that the masks of Fig. then the Gaussian smoothing will introduce spatial correlation. the Laplacian of the Gaussain kernel is multiplied by some constant.Figure 7. Figure 7. From our discussion of variance of the 3 x 3 digital Laplacian.N) = 0. although it is not difficult to do. since only a zero crossing is being looked for.38 shows the image of Fig. for some fixed threshold t .4). and the resulting values are quantized to integers. Figure 7.33 for computing the digital Laplacian are not necessarily optimal.

.

4. Delp and Chu (1985). and a noise variance a2. the detection probability will be to a first approximation a function of C/a and only is to a slight degree a function of 8. edge direction is 8. For an edge contrast of C in a direction 8 on an image where the noise has standard deviation a . which is a function of 8. To determine the misdetection rate. and Haralick and Lee (1990). the suppression of nonedgelike neighborhoods. We denote this probability by Prob(edge is detected in direction 8. Hence we see that PM PM(C/a. an edge orientation 8. This figure divided by the total number of edge pixels that would ideally be detected is the misdetection rate. = The false-alarm rate.7. and a. . The misdetection rate PM 1 minus the detection rate.3 Edge Operator Performance Associated with any edge detector is its performance characteristics. Peli and Malah (1982). Several comparisons have been made between edge detectors and evaluations of edge detector performance. Now add a noise image having variance a2. Haralick (1984). Generate an ideal image with a long step edge of contrast C and orientation 8. which is the most likely next step after edge detection.C. (6-81 < 61. Its various implementations differ in the details of the establishment of the gradient direction. which are defined by its misdetection rate and false-alarm rate. AU the evaluation metrics.4 ~ d g e Detection 351 even when the value of the directional derivative is high. Kitchen and Rosenfeld (1981a and b). where 6 is some fixed small-angle interval. smooth the ideal image with a 2 x 2 box filter. noise variance is a2). This edge operator has come to be known as the Canny operator. It will be a function of noise variance a2. leave something to be desired. The performance characteristics of an edge operator can easily be determined empirically by the following kind of experiment. generate images in which each pixel has a value from a pseudorandom variable having a normal distribution with fixed mean p and variance a2. For colored noise this random image can be smoothed with a small-sized box or Gaussian smoothing filter. The edge detector can be run on these images. For an edge detector that is properly designed. Bryant and Bouldin (1979). Then each edge detector will be associated with this detection probability. there is a probability that an edge detector will in fact detect the edge. Then to generate a single pixel-wide edge.8). and for each a the false-alarm rate can be estimated as the number of pixels declared to be an edge divided by the number of pixels processed. fix an edge contrast C. for they are mainly not appropriate for describing edge random perturbations for an edge-grouping processing step. edge contrast is C. To determine the false-alarm rate. Run the edge detector on the noisy edge image and count the number of edge pixels not detected. the edge detection should be suppressed. is the probability that the detector declares an edge pixel given that there is no edge. Pratt and Abdou (1979). 7. and the directional differencing. in general. however. P p . These include Deutsch and F h (1978).

A general line detector should be able to detect lines in a range of widths. Vanderburg suggests a semilinear line detector created by a step edge on 135" Flgun 7. For a dark line the different gray levels of the side regions have higher values than the center elongated region containing the dark line.39. 7. The width along the same line segment may also vary.39 Template masks for a compass line detector having four orientation directions.352 Conditioning and Labeling 0" 45" 90" 135" Figure 7. Different line segments may differ in width. For a bright line the different gray levels of the side level have lower values than the center elongated region containing the bright line. One-pixel-wide lines can be detected by compass line detectors. Line Detection A line segment on an image can be characterized as an elongated rectangular region having a homogeneous gray level bounded on both its longer sides by homogeneous regions of a different gray level.40 Template masks for a semilinear compass line detector having four orientations. . as shown in Fig.

5 Line Detection 353 Figun 7.41 Template m s for a line detector capabk of detecting lines of one ah to three pixels in width.7. .

For lines that have a width greater than one pixel. One possible way of handling a variety of line widths is to condition the image with a Gaussian smoothing. (Adding replacement noise means choosing a fraction p of pixels of the image at random and replacing their values with random values within the range of the image values. the line detector of Fig. which makes it possible for the simple line detectors of Figs. 7.12]. The template masks compare gray level values from the center of the line to values at a distance of two to three pixels away from the center line. using a standard deviation equal to the largest-width line that the detector is required to detect. Another way to handle wider lines is to use a greater sampling interval.40.39 and 7. 7.05. = WN+I-. 7.0. 7.39 and 7. The smoothing has the effect of changing the gray level intensity profile across a wide line from constant to convex downward.01. As long as the regions at the sides of the line are larger than two pixels in width. .02. 7. b.002. Generate and examine the appearance of the following noisy images obtained by independently distorting each pixel of an image of a real scene by the following methods: a.39 produces half the response of what it would for one-pixelwide lines. Template masks for eight directions are shown in Fig. the semilinear detector of Fig. Larger-width lines can be accommodated by even longer interval spacings.40 are not satisfactory. Show that if w.40 fails when t > 0. For lines three pixeIs or more in width.0. the technique will work for lines of one to three pixels in width.0. 7. then order-statistic filtering commutes with any linear transformation.39.354 Conditioning and Labeling either side of the line. Adding replacement noise with replacement fractions p = 0. His qualitative experiments indicated that the semilinear line detector performs slightly better than the linear line detector of Fig.8.40 to work. if Y n = axn b.05. Using the template masks shown in Fig.2. for any constants a and b.) c. Adding Gaussian noise with standard deviation from 1 to 21 by 4. Distorting the image with multiplicative noise by multiplying each pixel value with a uniform random variable in the range [0.. 0.1. For lines two pixels in width. then + 7. as suggested by Paton (1979). the template masks of Figs. 7.41. That is. 7. then it will produce many false detections. For lines two pixels in width. he calculated the line strength s at a pixel position as s = max{ai in a direction 8 defined by + bi lai > t and bi > t ) where ai + bi = s 8 = 45"i. And if t = 0.01.0. it produces no response.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

Given a noisy. In the sloped model. Graham (1962) and Prewitt (1970) were the first to adopt this point of view. does to the assumed form. We can then use these estimates in a variety of ways. processing proceeds by implicitly or explicitly estimating the free parameters of the general form for each neighborhood and then calculating the appropriate conditioning or labeling values on the basis of the definitions relative to the underlying gray level intensity surface. piecewise linear (sloped facet model). In the flat model. line detection. defocused image and assuming one of these models. including edge detection. respectively. The commonly used general forms for the facet model include piecewise constant (flat facet model). to accomplish labeling and conditioning. each ideal region in the image is constant in gray level. Processing of the digital image for conditioning or labeling must first be defined in terms of what the conditioning or labeling means with respect to the underlying gray level intensity surface. The observed digital image is a noisy. In Section 8. and piecewise cubic. Similarly. regions have gray level surfaces that are bivariate quadratic and cubic surfaces. in the quadratic and cubic models.CHAPTER THE FACET MODEL 81 Introduction The facet model principle states that the image can be thought of as an underlying continuum or piecewise continuous gray level intensity surface. On the basis of the general form. discretized sampling of a distorted version of this surface. such as &focusing or monotonic gray level transformation. we must first estimate both the parameters of the underlying surface for a given neighborhood and the variance of the noise. To actually carry out the processing with the observed digital image requires both a model that describes what the general f r of the surom face would be in the neighborhood of any pixel if there were no noise and a model of what any noise and distortion. each ideal region has a gray level surface that is a sloped plane.2 we illustrate the use . and noise filtering. piecewise quadratic. comer detection.

Section 8. Section 8. To find the relative maxima.1 1 discusses using the facet approach to compute higher order isotropic derivative magnitudes.3 we review the parameter estimation problem for the sloped facet model and in Section 8. which is to detect and locate all relative maxima. for the nth group of 2k + 1 can be expressed by Taking partial derivatives of e?. setting these partial derivatives to zero. which topographically are ridges and ravines. taking the origin of each fit to be the position of the middle observation in each group of 2k +1. The squared fitting error e?.. Section 8. and f results in the following: Assuming k = 1. to subpixel accuracy. .9 discusses the integrated directional derivative gradient operator.4 we use the sloped facet model for peak noise removal. f spaced points a unit distance apart. we can least-squares fit a quadratic function Emz+ bm + B. Section 8. and simplifying yields .10 discusses the facet approach to comer detection. we illustrate how a facet model can be used to partition an image into regions each of whose gray level intensity surface is planar. In Section 8. and Section 8. Section 8.6 we illustrate its use in the classic gradient edge detector application.7 discusses the Bayesian approach to deciding whether or not an observed gradient magnitude is statistically significantly different from zero.8 discusses the zero crossing of second derivational derivative edge detectors. Then we analytically determirie if the fitted quadratic has a relative maximum close enough to the origin.13 concludes with the labeling of every pixel into one of a variety of topographic categories. In Section 8. -k 5 m 5 k. 6. some of which are invariant under monotonic gray scale transformation. Relative maxima are defined to occur at points for which the first derivative is zero and the second derivative is negative.5.372 Tho h a Modal of the facet model principle in an application of determining relative maxima in a one-dimensional sense. . to each group of 2k + 1 successive observations. from a one-dimensional observation sequence f. In Section 8. . we consider a simple labeling application.12 discusses the determination of lines. with respect to the free parameters d . f 2. Section 8. Relative Maxima To illustrate the facet model principle. fakeri on successive equally .

. We take [.. and f . < To see how well this algorithm will perform under conditions of additive independent Gaussian noise. extremum is a relative maximum when 2 < 0 The algorithm then amounts to the following: 2. In this casc the computed variates d an9 P are normally distributed. e < 0. [. then there is no chance of maxima.. compute xo = -6/22. can be modeled by where g. [.. Test whether C < 0 If not. is the noisy observed value. is the noise. We can compute the expected value and variance for b and e. ..8.= c d bm + a for -k 5 m 5 k. to be a normally distributed random variable with mean 0 and variance u2.2 Relath Maxima 373 the matrix equation from which The quadratic y = Px2+ bx d has relative extrema at xo = -b/2C.. u2). suppose that the observed f . If + 1. then mark the point n +xo as a relative maximum. has N(0. kol i. is the true underlying value of the nth point and satisfies gn+. + . and the .. If 3.

by examining the covariance we can determine the statistical dependence or independence of b and i.) 89 Also. given the value of c? Prob ( E > O(c) = 1 1 ---&gi*dC . Hence 6 and t are uncorrelated. One such question has the form: What is the probability that the estimated t is greater than zero. Having determined that we can ask questions relating to the probability of missing a peak. and since each is a normally distributed variate.. they are statistically independent.374 The Rat Modd 1 3 = -(a2 +4u2 + a2)= -a2 4 2 (.

Sloped Facet Parameter and Error ~stimation In this discussion we employ a least-squares procedure both to estimate the parameters of the sloped facet model for a given two-dimensional rectangular neighborhood whose row index set is R and whose column index set is C and to estimate the noise variance. we can ask questions relating to the probability of declaring a false peak." the knowledge that has can be used to determine or bound the misdetection rate and the falsedetection rate performance characteristics.. One such question has the form: What is the probability that a e estimated t is less than zero.. $u < c < 0 ) .. for a given negative c.6 1 2 ~< f . the image function g is modeled by (8. The facet parameter estimates are obtained independently for the central neighborhood of each pixel on the image.c ) r R x C. To limit the probability of false maxima. for the relative maximum labeling procedure "label the a pixel as having a relative maximum if 2 < c .8. We also assume that for each (r.g. the chance that the estimated 2 will be greater than zero is significant.15) g(r. we can require that t < c. we can limit the probability that a false maximum can occur. this answer suggests that instead of only requiring 2 < 0 . and .3 Sloped Facet Parameter and Error Estimation 375 Our answer indicates that when c is negative but close to zero e. and therefore a maximum could be missed. We assume that the coordinates of the given pixel are (0. By doing so.c) = crr Bc y q(r. In a similar manner. given the value c? ( Prob ( t < Ole) = 4 ($1 Our answer indicates that when c is positive but close to zero the chance that the estimated t will be less than zero is significant.O) in its central neighborhood. + + + . We will assume that q is noise having mean 0 and variance u2 and that the noise for any two pixels is independent. and therefore a false maximum could occur. In general. which represents noise.c) where 7 is a random variable indexed on R x C .

there is no pixel in the center.376 The Facet Model The least-squares procedure determines an &. we obtain cu = - 6 = = Replacing g(r. C) r C C r2 r c C Cc'g(r.c! and simplifying the equations will allow us to see explicitly the dependence of &. the center pixel of the neighborhood has coordinates (0. C) r C C c2 r c Cr Ccg(r. We obtain + + C Ccrg(r. 8 . but the point where the comers of the four central pixels meet has coordinates (0. we choose our coordinate system R x C so that the center of the neighborhood R x C has coordinates (0. and 4 on the noise. When the number of rows and columns is odd. c) C C 1 r c (8. When the number of rows and columns is even.O). C) C C r2 r c d = ~ Cr Ccc ~ ( rc) + . 8. c) by cur +PC y q(r. C C c2 r c r .19) Ccrq(r.O). and +.20) . In this case pixel centers will have coordinates of an integer plus a half. 8. 4 = y + C Cc~ ( rc) C C 1 r c & = a +C r (8.O). and that minimize the sum of the squared differences between the fitted surface and the observed one: Taking the partial derivatives of c2 and setting them to zero results in Without loss of generality. The symmetry in the chosen coordinate system leads to c rtR r = 0 and c ctC c = 0 Hence Solving for &.

and ij are unbiased estimators for a . c ) .B)'c2 + (9 . = C C[(& .8. Examining the squared error residual c2. 6 .~ ) 2 2x r c r c x r C~-(~-~)~CX 1 e r c Now notice that r c is the sum of the squares of r c independently distributed normal random variables with mean 0 and variance o'.yl2 + q2(r.a ) 2r x .3 Sloped Facet Parameter and Enor Estimation 377 From this it is apparent that &. 8 . and y. we find that c2 = C C {(hr + bc + 4) . and have variances Normally distributed noise implies that &. r C)]}~ C) c r e Using the fact that we may substitute into the last three terms for c2 and obtain after simplification c2 = ~ ~ q 2 ( r .[ar + BC + y + ( ) ( I . @.( h .( B . 6 . are normally distributed.a)'r2 + (6 . are independent because they are normal and that as a straightforward calculation shows. respectively. Hence . and j. The independence of the noise implies that &. and j.

is distributed as a chi-squared variate with degrees of Freedom. the peakedness of gray level intensity is different between them. We should note here that peak noise is judged not from the univariate rnargmal distribution of the gray level intensities in the neighborhood but from their spatial distribution. Let n be the number of pixels in the neighborhood N. This indicates that the gray level spatial statistics are important. Facet-Based Peak Noise Removal A peak noise pixel is defined as a pixel whose gray level intensity significantly differs from those of the neighborhood pixels.3. 1983). Note that the use of the deleted neighborhood makes this facet approach different from that used earlier. However.241 u2 is distributed as a chi-squared variate with three degrees of freedom. c.1 illustrates an example of the spatial disttibution dependency of peak noise. In order to measure the difference between a pixel and its neighbors.l(a) and (b) the central pixel has the same gray level "5. 8.. Let N be a set of neighborhood pixels that does not contain the center pixel. The next section discusses the use of the estimated facet parameters for peak noise removal.3 r c degrees of freedom. and? are independent normals.7)' CrC 1 (8. b." and the neighborhood is composed of four values {1.8)' Cr xc 1 C 1 . It is difficult to judge that the center pixel in part (b) is peak noise. This means that can be used as an unbiased estimator for u2.5 discusses the use of the estimated facet parameters in the labeling operation called gradient edge detection. Figure 8. In Fig. we need to estimate local gray level statistics in the neighborhood in tenns of the sloped facet model and afterward compare those statistics with the current pixel gray level.2. Because &. whereas it is easier in part (a). By choosing this neighborhood.) gray level intensity and the value estimated from the neighboring pixels according to the sloped facet model (Yasuka and Haralick. we may estimate the difference between the observed value of the center pixel's (r. Section 8. ' . c2 + (4 . Therefore e2/u2is distributed as a chi-squared variate with (4 - x r C r2 + tb .4).

O)) is given by g(r.c) = ar +Bc +r +v(r. The minimizing h. and which minimize the sum of the squared differences between the fitted surface and the observed one: +. and + are given by .c) E N where q(r.c) for (r.b.c) is assumed to be independent additive Gaussian noise having mean 0 and variance 02. The least-squares procedure determints h .((0.-3 -2 -1 0 1 2 3 (b) Figure 8. we determine that the doped facet model with the centerdeleted neighborhood N = R x C . 8.1 Spatial distribution dependency of peak noise. Proceeding as before.

O) is not peak noise.0l.O).p. Hence has a t-distribution with #N . Under the hypothesis that g(0. and the output value for the center pixel is given by 9. Thus the iterated sloped facet model would be an appropriate description of this specialized facet model.O).05 to .the hypothesis of the equality of g(0. the fitted value is 9 and the observed value is g(0. each of which satisfies certain gray level and shape constraints. The value of K associated with an image means that the narrowest part of each of its facets is at least as large as a K x K block of pixels. We have already obtained that e2/a2has a chi-squared distribution with #N .Hence has mean 0 and variance 1. We assume that each region in the image can be exactly represented as the union of K x K blocks of pixels.380 The Facet Model At the center pixel of the neighborhood. A reasonable value for p is .the hypothesis of the equality of g(0. g(0. then for the ideal image having a degree-one polynomial function. Let TN-3.p. The shape constraint is also simple: Each facet must be sufficiently smooth in shape. The center pixel is judged to be a peak noise pixel if a test of the hypothesis g(0.o) and 9 is rejected.O) = 9 rejects the hypothesis.pbe a number satisfying Hence if t > TnN-3.9 has a Gaussian distribution with mean 0 and variance aZ(l+ &). and the output value for the center pixel is given by g(0.O) . Iterated Facet Model The iterated model for ideal image data assumes that the spatial domain of the image can be partitioned into connected regions calledfacets. Hence images that can have large values of K have very smoothly shaped .is not rejected. the surface is a sloped plane.3 degrees of freedom. If t 5 T#N-3. The gray levels in each facet must be a polynomial function of the rowcolumn coordinates of the pixels in the facet.O) and 7 .3 degrees of freedom. Hence if we consider the gray levels as composing a surface above the resolution cells of the facet.

the procedure amounts to fitting a sloped plane to each of the blocks in which a given resolution cell participates and outputting the fitted gray value of the given resolution cell from the block having the lowest fitting error. we would say that the facet domains are open under a x K structuring element. 1981) and has the following important properties: In a coordinated and parallel manner. An observed image J differs from its corresponding ideal image I by the addition of random stationary noise having zero mean and covariance matrix proportional to a specified one.r + p. The gray leveI distribution in each of these blocks can be fit by a polynomial model. for every resolution cell (r. and /.c + 7.r ~ be a partition of the spatial domain of R x C into its facets. The iterations generate images satisfying the facet form. j ) E R x C such that 1.c')] = ko(r . The sloped facet model relaxation procedure examines each of the K2. For the sloped facet model. The facet model suggests the following simple nonlinear iterative procedure to operate on the image until the image of ideal form is produced. Morphologically. For any (r. there exists a resolution cell (i. The output gray value is then the mean value of the block having the smallest variance (Tomita and Tsuji. thereby causing the weak to become consistent with the strong. 3 are assumed to be zero and Nagao and Matsuyama use a more generalized shape constraint. j ) E a. the strong influence the weak in their neighborhoods.c) and let B(r.. .c) E T. let I ( r . For each block. To make these ideas precise. For the flat facet model.c) = a. where E[v(r. 1977) and Nagao and Matsuyama ( 1979). c) be the gray value of resolution cell (r. c)l = 0 E[v(r. a block error can be computed . which is also suitable here.c') The flat facet model of Tomita and Tsuji (1977) and Nagao and Matsuyama (1979) differs from the sloped facet model only in that the coefficients a. c . ) centered around resolution cell (r.r'. Let r = {a. c).K x K blocks to which a pixel (r.8.. . 2. let R and C be the row and column index set for the spatial domain of an image. Each resolution cell is contained in K 2different K x K blocks.. In the iterated sloped facet model.5 Iterated Facet Model 381 K regions. this amounts to computing the variance for each K x K block in which a pixel participates. The procedure has been proved to converge (Haralick and Watson.. Set the output gray value to be that gray value fitted by the block having the smallest error of fit.c) E R x C.. c)v(rl. Region gray level constraint: I ( r . One of the K 2blocks has the smallest error of fit. c) belongs. Shape region constraint: (r. . c) E B(i.c) be the K x K block of resolution cells .

c *) be the coordinates of the given pixel (r. Let (r * .c*). use the sampled brightness values of the digital picture function to estimate the parameters. by using where j ( r . where some kind of random noise has been added to the true function values. c ) in terms of the coordinate system of the block havicg the smallest residual error. To do this. One of the K x K blocks will have the lowest error. + + Gradient-Based Facet Edge Detection The facet edge finder regards the digital picture function as a sampling of the underlying function f . Figure 8.Each mask must be normalized by dividing by 18.j ) in the neighborhood. c ) participates have fitting function J. j). and finally make decisions regarding the . The output gray value at pixel (r. c) is then given by J(r*.2 The 3 x 3 linear estimators of a pixel's gray level for the nine different 3 x 3 neighborhoods in which the pixel participates.382 The Facet Model Figure 8. the estimate is j ( i . If the pixel's position is ( i . c ) = &r 6 c 9 is the least-squares fit to the block. the edge finder must assume some parametric form for the underlying function f .2 illustrates how replacing the pixel's value with the fitted value where the fitted value depends on the block having the smallest residual error is equivalent to replacing the pixel's value with a value that is a linear combination of pixel values coming from the best-fitting block. Let the fit of the block having the smal!est residual error in which the given pixel I(r.

The resulting values for & and are the same for the quadratic or linear fit. There are other kinds of edge detectors. it is impossible to determine the true locations of discontinuities in value directly from a sampling of the function values. The idea of fitting linear surfaces for edge detection is not new. O'Gorrnan and Clowes (1976) discussed the general fitting idea. A small neighborhood on the image that can be divided into two parts by a line passing through the middle of the neighborhood and in which all the pixels on one side of the line have one gray level is a neighborhood in which the dividing line is indeed an edge line. In this case the boundary between object parts will manifest itself as jumps in gray level between successive pixels on the image. Of course. Prewitt (1970) used a quadratic fitting surface to estimate the parameters a and fl in a 3 x 3 window for automatic leukocyte cell scan analysis. The fact that the Roberts gradient arises from a linear fit over the 2 x 2 neighborhood is easy to see.8. Sharp discontinuities can reveal themselves in high values for estimates of first partial derivatives. Hueckel(1973) used the fitting idea with low-frequency polar-form Fourier basis functions on a circular disk in order to detect step edges. Let a b c d . Such edge detectors are called gradient-based edge detectors. This is the basis for gradient-based facet edge detection. & r b c q. The gradient magnitude will be proportional to the gray level jump. When such a neighborhood is fitted with the sloped facet model. The locations are estimated analytically after doing function approximation. then the true surface a r /3 c 7 will have a = /3 = 0. Suppose that our model of the ideal image is one in which each object part is imaged as a region that is homogeneous in gray level.6 Gradlent-Based Facet Edge Detection 383 locations of discontinuitiesand the locations of relative extrema of partial derivatives based on the estimated values of the parameters. Merb and Vamos (1976) used the fitting idea to find lines on a binary image. a gradient magnitude of + + will result. Hence it is reasonable for edge detectors to use the estimated gradient magnitude as the basis for edge detection.8. A discussion of how the facet model can be used to determine zero crossings of second directional derivatives as edges can be found in Section 8. On the other hand. if the region is entirely contained within a homogeneous area. such as zero-crossing ones. and the fitted sloped facet model & r 6 c 9 will produce a value of + + + + which is near zero. Roberts (1965) employed an operator commonly called the Roberts gradient to determine edge strength in a 2 x 2 window in a blocks-world scene analysis problem.

181) =1 . p -dl + lb . How large must the gradient be in order to be considered .c ( ) = max (126 I.27) There results ={lei.d l + ( b -cl =max {la .Ix -ulh + Ib . (b .b + c l . Finally.25) 2 2 The gradient. The quick Roberts gradient value is given by la . which is the slope in the steepest direction.28) Hence the quick Roberts gradient is related to the parameters of the fitted sloped surface. Th most interesting question in the use of the estimated gradient 462 + B2 as an edge detector is.C I I (8.a + d + b .ci). has magnitude a2 JG. Since max{lul.IvI) = +I?(. The max Roberts gadent value is defined by rnax{la .a + d . The Roberts gradient is defined by d ( a cu and fl is 1 6 = . the max Roberts gradient is also related to the parameters of the f t e itd sloped surface. 1 8 1) 2 (8.d 1 I x l + lul =={k +ul.d + b .c ) ~The least-squares fit for .dl.( a + b ) ] and b = .c l .384 The Facet Madel be the four gray level levels in the 2 x 2 window whose local spatial coordinates for pixel centers are + ( b .c 1.[ ( b + d ) .d .( a + c ) l [ (8. Now since (a . I . which is exactly 1 1 4 times the Roberts gradient. l a .1( c + 6 ) .b + c ( ) =max { I .

then so that the test statistic is a multiple of the estimated squared gradient magnitude B2. Hence is distributed as a chi-square variate with two degrees of freedom.21. Hence small gradient magnitudes estimated in small neighborhoods may not be statistically significant. other things being equal.01. However. the greater the gradient magnitude must be in order to be statistically significant. The false-darm rate is the conditional probability that the edge detector classifies a pixel as an edge given that the pixel is not an edge. other things being equal. Notice that. To use this technique.8. Suppose the falsealarm rate is to be held to 1%. but small gradient magnitudes may be statistically significant in large neighborhoods. Fortunately we can obtain a good estimate of a*. as the neighborhood size gets bigger. the threshold we must use must be at least 9. by knowing the conditional distribution given no edge. such an edge operator is a generalization of the Prewitt gradient edge operator.21) = . we begin by noting that 6 is a normally distributed variate with mean a and variance u2 I C 1r 2 r c that 6 is a normally distributed variate with mean 6 and variance and that 6 and 6 are independent. suppose we want the edge detector' to work with a controlled falsedark rate. From this it follows that to test the hypothesis of no edge under the assumption that a = 6 = 0. a fixed value of squared gradient becomes more statistically significant. Also notice that. If the statistic G has a high enough value.3) . For neighborhood sizes greater than 3 x 3. we must know the noise variance u 2 .6 Gradient-Based Facet Edge Detection 385 significantly different from zero? To answer this question. For example. it becomes easier to choose a threshold. then we reject the hypothesis that there is no edge. the greater the noise variance. we use the statistic G: which is distributed as a chi-squared variate with two degrees of freedom. Each neighborhood's normalized squared residual error G2 + e2 (Cr C c 1 . If the neighborhood used to estimate the facet is square. Then since P(d > 9.

3) taken over all the neighborhoods of the image is a very good and stable estimator of o Z if it can be assumed that the noise variance is the same in each neighborhood. This estimator is available for each neighborhood ' of the image. Thus if wc wanted to detect edges and he assured that the false-alarm rate (the conditional probability of assipning a pixel as an edge given that it is not an edge) is less than p. But because the effective number of degrees of freedom of G2 is so high. and f 2 are computed. . 8. then we may use EL. A sloped facet model is fitted to cach 5 x 5 neighborhood of each image and its r i .) = p. in place of a?. 8. G has essentially a chi-squared distributiun with two degrees of frecdom. figure 8. 50. the average tics.noise standard dcv~ationof 40: (lower left).3 (upper icfi).Hence our test statistic G becomes Under the hypothesis of no edgc. represents the squared residual fitting error from the nth neighborhood and the image has N such neighborhoods. and 75 is added tcl the controFled image.(upper r~pht).d ~ s k I havIng valuc ZW)and nolsy d ~ c k s .. Figure 8.3. Because there are usually thousands of pixels in the Image.. For the idcal image of Fig. G. noise standard deviatron 06 75 . If EI. being the ratio of two chi-squared statiswould have an F distribution. where P(xf 1 8. C(>ntmlledd ~ c h background hovrnp value 0. The noisy images are shown in the other parts of Fig. Independent Gaussian noise having mean 0 and standard hah deviation 40. we would use a threshold of 8. despite the dependencies among the E .. .3 (upper left) shows a controlled 100 x 100 image having a disk of diameter 63. The interior of this disk has a gray Ievel of 200.3 (Upper I d t ) . The background of the d ~ s k a gray level of 0.386 The Facet Model can constitute an estimator for 0. nolfe standard dcvialion of 50: (lower rlght). the average of 6' :( C r 1 .

6 Gradient-Based Facet Edge Detection 387 (c) Figure 8. The other parts of Fig.400.33.8. Since b2 = 302.7% of the dynamic range of the image. The sloped fit there has an average squared residual error of 3746.3 = 22. Since b2 = 5975. The standard deviation of fitting error is then 61. the standard deviation of the fitting error is a = 77. yields an average squared error of 3746.4 (a) Neighborhood for which slope fit is a relatively bad approx= 16. 8. + squared residual fitting error e2 .3 is thresholded at the value 120. assuming that the imperfectedness of the model and the noise are independent. The square root of 3746 is about 61. which is close to the 77. 8.4.2. This divided by the degrees of freedom.33 and for a 5 x 5 neighborhood C C r2 = C C c2 = 50. One neighborhood having the worst fit is shown in Fig. (b) Slope-fitted imation. which represents the standard deviation of the residual fitting errors. 8.3 (upper right). The total squared error from (c) is 82.2. this corresponds to + selecting all neighborhoods having slopes greater than 26. (c) Residual fitting errors. Obviously in the noiseless image the fit will be perfect for all 5 x 5 neighborhoods that are entirely contained in the background or in the disk.4~ 752 = 77.3 (lower right) is thresholded at 4. which is 8. In fact.3 for the r c r c .6% of the dynamic range. The fit produces an h = 36. 8. 8.5 show the edges obtained when a 5 x 5 sloped facet model is employed and when the statistic G computed from each neighborhood of the noisy image of Fig. and 11. 8. 25 . The error must come from neighborhoods that contain some pixels from the background and some from the disk. neighborhood.94.5 (upper left) shows edges obtained when the statistic G computed on the ideal image of Fig. and = 40. Figure 8. This is just a little higher than the standard deviation of the noise because it takes into account the extent to which the data do not fit the model. For the noisy image of Fig. the average being taken over all neighborhoods. which represents 30.4.3.3 measured. we would expect to find a standard deviation of d17. is 302. In these neighborhoods the sloped fit is only an approximation. This corresponds to a standard deviation of about 17.

(upper nght) edges ohtalned u h e n the statistic G computed using 5 x 5 nelghhurhoods on the nois! rmagc o f F1p R. If we cannot assume that the noise variance is the samc in cach neighborhood.5 [Upper left). 8 . Such a plot is called an operating curve. Hrjwever. then the estimator using thc average of the normalized squared residual errors for u-' i s nut proper. Notlce rhat as rhc false-alarm rate decreases.1598. The higher one corresponds to a noisy disk with noise standard deviation 7 5 . respectively. a threshold of 8 corresponds to selecting all neighborhoods having slopes grcater than 30. The lower one corresponds to a noisy disk with noise standard deviation 50. noisy image.92. . as the false-alarm rate increases. corresponding tu each possible threshold is a false-alarm rate and a misdetection rate. the misdetcction rate decreases. As just mentioned. and 1 1 guaranree (under the conditions of independent G a u w a n noise) that the false-alarm rates must be less than -1353. Edge\ obtained when the statist~cG co~nputed using by 5 -.004 . and .0I832. .O82O. "'($-$ > 8 implies Thew thresholds of 4. Here. It does have a higher variance than the estimate based on the average of the local variances.0205. The observed false-alarm rates for these th terholds are .1231. to test the hypothesis (xr . the misdetection rate increases.3 ) as an estimate of the variance in each neighborhood. and . and . and ir has a much tower number o f degrees of freedom.r 388 The Facet Model Flgura 8.3 (lower nghtl i\ ~hresholded the at value 3: Ilnwer left) and flower r~pht)thresholds or 8 and 1 1.6 shows two operating curves for the sloped facet edge detector. this estimate is not as stable as the average squared residual error. That is.3a 1s thresholded at the value 120. Figure 8. In a noisy world the false-alarm and misdetection rates cannot both be made arbitrarily small. 5 neighborhoods on the ldeal Image of Fig. In this case we can use the local neighborhood residual squared error c' / Cc 1 . The misdetection rate is the conditional probability that a pixeI ih assigned "no edge" given rhat it is actually an "edge'' pixel. 1 Q l h4. One way to characterize the performance of an edge detector is to plot its false-alarm rate as a Function of the rnisdetection rate in a controlled experiment. respectively. The corresponding rnisdetection rates are .0042. 8.

~cni~aEly proportional ta the squared gradient of the region normalized by ) r r h ~ t . and reject the hypothesis for large values of F .6 Two operating curves for the 5 x 5 sloped facet gradient edge detector The higher one corresponds 10a noisq drsk w t h noise standard deviar~onof 35 and the upper one correspond% a nols!.1 Consider the folZowing 3 x 3 region: ..~.8. 101 nrl edge for the flat-world assumption.hir a random variable whose expected value is a*. I EXAMPLE 0.vrccs of freedom. -?~ain notice that F may be regarded as a significance or reliability measure ~ + n c ~ a t ewith the existence of a nonzero-sloped region in the domain R x C.It is d . disk with noise standard dcviatlan to of 50. the variance of the noise. ry = 0 = 0.6 Gradient-Based Facet Edge Oetection 389 Flgure 8.we use the ratio u!l!ih has an F distribution with i!.

67.6 + (2.67 is greater than 10. since the probability of a region with true zero slope giving an Fzb statistic of value less than 10.17.19.6. we would probably call it a nonzerosloped region. Because 13.we are assured that the probability of calling the region nonzero sloped when it is in fact .Sampling it at the center of each pixel produces The difference between the estimated and the observed surfaces is the error.00. The F statistic is then [(-I . and 9 = 5.19/6 If we w r compelled to make a hard decision about the significance of the ee observed slope in the given 3 x 3 region. The estimated gray level surface is given by 6r + br + i .b = 2.67 11.99.17)2.6 is 0.390 The hca Model 7 Then cii = -1.67)2 6]/2 = 13. and it is From this we can compute the squared error e2 = 11.

19811a) to the decision of whether or . (upper right) T.3 (lower right) is thresholded at 2. Ail this is to bc expcctcd bccause the noise meets 15c . 8. 8. should guarantee (under conditions of lndependenz Gaussian noise) that the l. 5.06.3224.5. Since these observed false-alarm ratcs are almost identical to the observed t. 5 5 <loped tacet model using the F statistic. rcr? slnped 11 f. and .~csurnptionof the chi-square test. M and 7. These threshI !? produce observed false alarm ratcs of .3 (lower right) thresholded at 2. and 7. . It is obvious a comparison of these images that the edge noise is worse in the F-tests camr.0792. F i y r e 8.32. the better the results c~ughtto bc whcn thc appropriate wictical test is used.1236.*i>e-alarmsates are less than .32: [lower Teft) and (lower right) Ihresholds nf 5 .:l\e-alarmrates from the chi-square tests of Fig. These threshc l l ~ f . .0042.11m Bayesian Approach to Gradient Edge Detection TLlr Bayesian nlbr approach (Zuniga and Haralick.0 158.7 shows the edges obtaincd when a 5 x 5 slaped facet model is c.!. 8.7 Edgcs obtained under .surnptinns about reality.7 Bayesian Approach to Gradlenl Edge Detection 391 Flgure 8.!rcd with the chi-square tests. all of which are considerably higher than the ob-:rved misidentification rates of the corresponding chi-square tests.8. - is much less than 3 % . respectively. indicating t i i t i t . statistic image rlf the nrliky disk of Fip. were small but negligible departures from the independent Gaussian assump*Jlln\. wc may ccrrmpare the corre5rt):ding misidentification rates.0042. .04.in observed gradient magnitude G is statistically significant and therefore par' ~ i ~ a t in ssome edge is to dccide there is an edge Istatrst~callysignificant gradient) e tz hcn. and . PIedge IG) > Plnanedge IGj . and .0 165.5137. l !L:pper left) Thresholded F statistic from rhe noiseless disk. and the more one is able to make correct . The statistically oriented reader will recognize ~ ' I Ctest as a I X significance level test.M. I 2 18.lrpln!led and when the F statistic con~putcdfrom each neighborhood of the noisy II7i:l:e of Fig. The observed m~sidcntificationrates for the F-test l<lm~rc .

P ( G ) . the density function of the histogram of observed gradient magnitude. together with the relation for P(GI edge ).P(G I nonedge )P( nonedge ) P( edge ) .37) 1 .392 The Facet Model By definition of conditional probability. implies that the threshold t must satisfy P ( t ) = 2P(t 1 nonedge )P( nonedge) (8. the appropriate threshold t for G is determined as that value t for which P ( G )edge ) = (8. Once P(G I edge ) is known. P(G I nonedge) is known to be the density function of a & variate. For many images values of .95 are reasonable. The zerocrossing edge detector looks for relative maxima in the value of the first derivative taken across a possible edge.9 to . P(G I nonedge )P( nonedge ) P(G) Hence a decision for edge is made whenever P( nonedge IG) = P(G ( edge )P( edge ) > P(G I nonedge )P( nonedge) (8. Now (8. *i 2 ZereCmssing Edge Detector The gradient edge detector looks for high values of estimated first derivatives. In what follows we assume that in each neighborhood of the image. But P(G 1 edge ) is not known. the underlying gray level intensity function f takes the form .35) From the previous section.P(G I nonedge )P( nonedge ) (8. It is possible to infer P(G I edge ) from the observed image &ta since P ( G ) .38) This relation.P( nonedge ) This means that once the prior probability for nonedge is specified.39) P(t I edge )P( edge ) = P(t ( nonedge )P( nonedge ) 4 Hence Here P ( t ) is the observed density for the test statistic G evaluated at t . This permits the resulting edge to be thin and even localized to subpixel accuracy. the density function for P(G 1 edge ) can be computed. and P( nonedge) is a user-specified prior probability of nonedge. P ( t 1 nonedge) is the value of the density function of a variate evaluated at t . is easily calculated.36) P(G) = P(G ( edge )P( edge ) + P(G 1 nonedge )P( nonedge ) P ( G ) .

(r) have been defined fn general.1 Discrete Orthogonal Polynomials The discrete orthogonal polynomial basis set of size N has p o l y n o h m degree ~ zero through degree N . .. the zero-crossing edge detector places edges not a Zocations of p a pixel high gradient but at locations of spatial gradient maxima.-.1 .8.-l ( r ) . Here we show h..8. r + a O ) = O .zo construct them for one or two variables.. Such a basis is the d i m orthogonal polynomials. .(r) = r"+a. A polynomial basis set that permits the independent estka&n of each coefficient would be the easiest to use. 1957) by the relation ..-. . Thus this kind of detector wl respond to weak but spatially peaked gradients.a. These unique polynomials are sometimes d & the disl crete Chebyshev polynomials (Beckmann.+air +a. . Let the discrete integer index set R be symmetric in the sense rhat r E R implies -r E R.8 ZwCrossing Edge 393 As just mentioned.Suppose Po(r). . . + a . P. . 8.-..P.. 1973). Define Po(r) = 1. . The first five polynomial function fomlulas are where The discrete orthogonal polynomials defined on symmetric sets can be recursively generated (Forsythe. We define thr amstruction technique for discrete orthogonal polynomials iteratively. . P.rn-I +..is marked as an edge pixel if in its immediate area there is a zero m&ng of the second directional derivative taken in the direction of the gradient ( W k .. . r ER k = o . n -1 (8.41) These equations are linear equations in the unknown a o ... Pn(r)must be orthogonal to each plynomid Po(r). More . Hence we must have the n equations C Pk(r)(fn +an-. a x are easily ni solved by standard techniques. Let Pn(r)be the nth order polynomial. 1982) and if the slope of the zero crossing is negative. il The underlying functions from which the directional derivatives zr= computed are easy to represent as linear combinations of the polynomials in a q plynomial basis set. .rn-' +.

.. . Using the construction technique just described. ( r ) )of discrete orthogonal polynomials . .PN(r)QM(c)) a set of discrete polynomials . ..Q M ( c )be a set of discrete polynomials on C. Let {Po(r). The proof of this fact is easy.(c).(c) when n # i or m # j. Consider whether Pi(r)Qj(c) orthogonal to is P. ( r ) = r 8. over R. . . the first or second sum must be zero.8.394 where The Facet Model P o ( r ) = l and P . Let {Qo(c). . Let the number of elements in R be N.. . . we can construct discrete orthogonal polynomials over a two-dimensional neighborhood. Some one.and two-dimensional discrete orthogonal polynomials are as follows: Index Set Discrete Orthogonal Polvnomial Set 8. . Then the set . . .(r)Q.2 Two-Dimensional Discrete Orthogonal Polynomials Two-dimensional discrete orthogonal polynomials can be created from two sets of one-dimensional discrete orthogonal polynomials by taking tensor products. . we may construct the set {Po(r). is onR xC.. Let R and C be index sets satisfying the symmetry condition r E R implies -r E R and c E C implies -c E C. Then Since by assumption n # i or m # j. thereby proving the orthogonality. P N ( r ) ) a set of discrete polynomials .8. . . ) {Po(r)Qo(c).P N .(r)Q. .P. Using the tensor product technique also described.3 Equal-Weighted Least-Squares Fitting Problem Let an index set R with the symmetry property r E R implies -r E R be given. be on R.

a ~ . The exact fitting problem is to determine coefficients ao. . . let a data value d(r) be observed. Readers who would like more technical details on the relationships between discrete least-square function fitting and orthogonal projection may consult Appendix B. . k = 0. . Similar equations hold for the two-dimensional case. . The only difference is that the norm associated with the weighted least squares is a positive definite diagonal matrix instead of an identity matrix.a. Figure 8. Similarly an estimate for any definite integral can be obtained. . the data value d(r) is multiplied by the weight which is just an appropriate normalization of an evaluation of the polynomial P.8 Ze~CrostingEdge Detector 395 For each r E R. . For each index r in the index set.45) The approximate fitting problem is to determine coefficients ao. For example. have been computed. . The mathematics for weighted least squares is similar to the mathematics for equal-weighted least squares. .. the estimated polynomial Q(r) is given by This equation permits us to interpret Q(r) as a well-behaved real-valued function defined on the real line.K. ..K. at the index r . . . Once the fitting coefficients a k . In either case the result is The exact fitting coefficients and the least-squares coefficients are identical for m = 0. 5 N . .8. The equation for a.-.8 shows these weights for the 5 x 5 neighborhood.1 K such that is minimized. such that d(r) = z N-1 n=o anpn(r) (8. . to determine we need only evaluate In this manner the estimate for any derivative at any point may be obtained.. means that each fitting coefficient can be computed as a linear combination of the data values.

8.8.4 Directional Derivative Edge Finder

We define the directional derivative edge finder as the operator that places an edge in all pixels having a negatively sloped zero crossing of the second directional derivative taken in the direction of the gradient. Here we discuss the relationship between the directional derivatives and the coefficients from the polynomial fit.

We denote the directional derivative of f at the point (r,c) in the direction α by f'_α(r,c). It is defined as

f'_α(r,c) = lim_{h→0} [f(r + h sin α, c + h cos α) − f(r,c)] / h    (8.49)

The direction angle α is the clockwise angle from the column axis. It follows directly from this definition that

f'_α(r,c) = (∂f/∂r) sin α + (∂f/∂c) cos α    (8.50)

We denote the second directional derivative of f at the point (r,c) in the direction α by f''_α(r,c). It quickly follows by substituting f'_α for f that

f''_α = (∂²f/∂r²) sin²α + 2(∂²f/∂r∂c) sin α cos α + (∂²f/∂c²) cos²α    (8.51)

Similarly,

f'''_α = (∂³f/∂r³) sin³α + 3(∂³f/∂r²∂c) sin²α cos α + 3(∂³f/∂r∂c²) sin α cos²α + (∂³f/∂c³) cos³α    (8.52)

Taking f to be a cubic polynomial in r and c that can be estimated by the discrete orthogonal polynomial fitting procedure, we can compute the gradient of f and the gradient direction angle α at the center of the neighborhood used to estimate f. We obtain the gradient angle α by

sin α = k2 / √(k2² + k3²),  cos α = k3 / √(k2² + k3²)

It is well defined whenever k2² + k3² > 0. In order for our notation to be invariant to the different discrete orthogonal polynomials that result from different neighborhood sizes, we rewrite the fitted cubic in canonical form as

f(r,c) = k1 + k2r + k3c + k4r² + k5rc + k6c² + k7r³ + k8r²c + k9rc² + k10c³    (8.54)

The kernels for directly estimating the coefficients k1, ..., k10 in this expression for a 5 x 5 window are shown in Fig. 8.9. Notice that because linear combinations of linear combinations are linear combinations, the kernel associated with any monomial of Fig. 8.9 can be determined from the kernels associated with the orthogonal basis of Fig. 8.8 by taking appropriate linear combinations. For example, for the monomial r: the kernel of Fig. 8.8 associated with r involves r with a weight of 1, the kernel for k7 involves r with a weight of −3.4, and the kernel for k9 involves r with a weight of −2, so the kernel equation for r of Fig. 8.9 is the corresponding linear combination of these kernels.

The simplest way to think about directional derivatives is to cut the surface f(r,c) with a plane that is oriented in the desired direction and is orthogonal to the row-column plane. The intersection results in a curve f_α(p), where p is the independent variable; the derivative of this curve is the directional derivative. To cut the surface f(r,c) with a plane in direction α, we simply require that r = p sin α and c = p cos α. This produces the curve

f_α(p) = k1 + (k2 sin α + k3 cos α)p + (k4 sin²α + k5 sin α cos α + k6 cos²α)p² + (k7 sin³α + k8 sin²α cos α + k9 sin α cos²α + k10 cos³α)p³    (8.55)

The first directional derivative of f in the direction α can then be visualized as the first derivative of f_α(p) taken with respect to p. At any point (r,c) = (p sin α, p cos α) on the line in direction α, the second directional derivative in the direction α is given by

f''_α(r,c) = (6k7 sin²α + 4k8 sin α cos α + 2k9 cos²α)r + (2k8 sin²α + 4k9 sin α cos α + 6k10 cos²α)c + (2k4 sin²α + 2k5 sin α cos α + 2k6 cos²α)    (8.56)

Hence

f''_α(p) = 6[k7 sin³α + k8 sin²α cos α + k9 sin α cos²α + k10 cos³α]p + 2[k4 sin²α + k5 sin α cos α + k6 cos²α]    (8.57)
         = Ap + B    (8.58)

where

A = 6[k7 sin³α + k8 sin²α cos α + k9 sin α cos²α + k10 cos³α]  and  B = 2[k4 sin²α + k5 sin α cos α + k6 cos²α]    (8.59)

If for some p, |p| < p0, where p0 is slightly smaller than the length of the side of a pixel, f''_α(p) = 0, f'''_α(p) < 0, and f'_α(p) ≠ 0, we have discovered a negatively sloped zero crossing of the estimated second directional derivative taken in the estimated direction of the gradient. We then mark the center pixel of the neighborhood as an edge pixel, and if required we make a note of the subpixel location of the zero crossing.

Figure 8.8 Kernels for estimating the coefficients k1, ..., k10 of the bivariate cubic k1 + k2r + k3c + k4(r² − 2) + k5rc + k6(c² − 2) + k7(r³ − 3.4r) + k8(r² − 2)c + k9r(c² − 2) + k10(c³ − 3.4c) for a 5 x 5 window.

Figure 8.9 Kernels for directly estimating the coefficients k1, ..., k10 of the bivariate cubic k1 + k2r + k3c + k4r² + k5rc + k6c² + k7r³ + k8r²c + k9rc² + k10c³ for a 5 x 5 neighborhood.
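As a concrete illustration, the sketch below implements the test just described for a single pixel, assuming the ten canonical coefficients k1, ..., k10 have already been estimated (for example, by convolving the kernels of Fig. 8.9 with the image). The function name, the default threshold p0 = 0.9, and the zero-gradient handling are illustrative assumptions, not part of the original procedure.

    import numpy as np

    def zero_crossing_edge_test(k, p0=0.9):
        # k = (k1, ..., k10): canonical cubic coefficients for one pixel,
        # assumed already estimated; p0 is slightly smaller than the
        # length of a pixel side.
        k1, k2, k3, k4, k5, k6, k7, k8, k9, k10 = k
        g = np.hypot(k2, k3)
        if g == 0.0:
            return False, None              # gradient direction undefined
        s, c = k2 / g, k3 / g               # sin(alpha), cos(alpha)
        A3 = k7*s**3 + k8*s*s*c + k9*s*c*c + k10*c**3
        B2 = k4*s*s + k5*s*c + k6*c*c
        C1 = k2*s + k3*c
        A, B = 6.0 * A3, 2.0 * B2           # f''_alpha(p) = A*p + B
        if A == 0.0:
            return False, None              # f'' never crosses zero
        p = -B / A                          # subpixel crossing location
        fprime = 3*A3*p*p + 2*B2*p + C1     # f'_alpha at the crossing
        # negatively sloped crossing requires f'''_alpha = A < 0
        return (abs(p) < p0 and A < 0.0 and fprime != 0.0), p

Applying this test at every pixel of the coefficient images yields the edge map; the returned p gives the subpixel edge location along the gradient direction.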

If our ideal edge is the step edge, then we can refine these detection criteria by insisting that the cubic polynomial f_α(p) have coefficients that make f_α a suitable polynomial approximation of the step edge (Haralick, 1986). Now a step edge does not change in its essence if it is translated to the left or right or if it has a constant added to its height. Since the cubic polynomial is representing the step edge, we must determine what it is about the cubic polynomial that is its fundamental essence after an ordinate and abscissa translation.

To do this, we translate the cubic polynomial so that its inflection point is at the origin. Calling the new polynomial g_α, we have

g_α(p) = c1 p + c3 p³

In our case c1, the first directional derivative at the inflection point, satisfies c1 > 0, since the derivative is taken in the direction of the gradient. If the pixel is to be an edge, the second directional derivative zero-crossing slope must be negative; hence for edge pixel candidates, c3 < 0. This makes −3c1c3 > 0, which means that g_α(p) has relative extrema. The parameters of the cubic that are invariant under translation relate to these relative extrema: they are the distance between the relative extrema in the abscissa direction and the distance between them in the ordinate direction. We develop these invariants directly from the polynomial equation g_α(p) by factoring out an appropriate term and introducing a contrast C and a scale S, where S = √(−c3/c1) and C = c1/S.

Finally, we have

g_α(p) = C(Sp − S³p³)

In this form it is relatively easy to determine the character of the cubic. Differentiating, we have

g'_α(p) = CS(1 − 3S²p²)

The locations of the relative extrema depend only on S: they are located at p = ±1/(√3 S). The height difference between the relative extrema depends only on the contrast: their heights are ±2C/(3√3). Other characteristics of the cubic depend on both C and S. For example, the magnitude of the curvature at an extremum is 2√3 CS², and the derivative at the inflection point is CS.

Of interest to us is the relationship between an ideal perfect step edge and the representation it has in the least-squares approximating cubic, whose essential parameters are the contrast C and the scale S. We take an ideal step edge centered in an odd neighborhood of size N to have (N − 1)/2 pixels with value −1, a center pixel with value 0, and (N − 1)/2 pixels with value +1. Using neighborhood sizes of 5 to 23, we find the values listed in Table 8.1 for the contrast C and scale S of the least-squares approximating cubic. The average contrast of the approximating cubic is 3.16257. The scale S(N) appears to be inversely related to N: S(N) ≈ S0/N. The value of S0 minimizing the relative error between S(N) and S0/N is 1.793157.

Table 8.1 Contrast C and scale S of the fitted cubic for an ideal step edge as a function of neighborhood size N.

Neighborhood Size N | Contrast C | Scale S

These two relationships, C ≈ 3.16257 and S(N) ≈ 1.793157/N, for ideal step edges having a contrast of 2, can help provide additional criteria for edge selection. For example, the contrast across an arbitrary step edge can be estimated by

Edge contrast = 2C / 3.16257    (8.67)

If the edge contrast is too small, then the pixel is rejected as an edge pixel. In many kinds of images, too small means smaller than 5% of the image's true dynamic range. Interestingly enough, edge contrast C depends on all three coefficients c1, c2, and c3 of the representing cubic. First-derivative magnitude at the origin, a value used by many edge gradient magnitude detection techniques, depends only on the coefficient c1. First-derivative magnitude at the inflection point is precisely CS, a value that mixes together both scale and edge contrast. The scale for the edge can be defined by

Edge scale = SN / 1.793157

Ideal step edges, regardless of their contrast, will produce least-squares approximating cubic polynomials whose edge scale is very close to unity. Values of edge scale larger than 1 have the relative extrema of the representing cubic closer together than expected for an ideal step edge. Values of edge scale smaller than 1 have the relative extrema of the representing cubic farther away from each other than expected for an ideal step edge. Values of edge scale significantly different from unity may indicate a cubic representing a data value pattern very much different from a step edge. Candidate edge pixels with an edge scale very different from unity can therefore be rejected as edge pixels.

The determination of how far from unity is different enough requires an understanding of what sorts of nonedge situations yield cubics with a high enough contrast and with an inflection point close enough to the neighborhood center. We have found that such nonedge situations occur when a steplike jump occurs at the last point in the neighborhood. For example, suppose all the observed values are the same except the value at an endpoint. If N is the neighborhood size, the inflection point of the approximating cubic then occurs at a fixed signed distance from the center, the plus sign corresponding to a different left endpoint and the minus sign corresponding to a different right endpoint. Hence for neighborhood sizes of N = 5, 7, 9, or 11, the inflection point occurs within a distance of 1 from the center point of the neighborhood. So providing the contrast is high enough, the situation would be classified as an edge if scale were ignored. For neighborhood sizes of N = 5, 7, 9, 11, and 13, however, the scale of the approximating cubic is 1.98, 1.81, 1.74, 1.71, and 1.68, respectively. This suggests that scales larger than 1 are significantly more different from unity scale than corresponding scales smaller than 1. In many images restricting edge scale to between .4 and 1.1 works well.
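A minimal sketch of these two acceptance criteria follows. It assumes the translated directional cubic coefficients c1 and c3 are available (c1 > 0 and c3 < 0 for candidates); the function name and the dynamic_range argument are illustrative, and the thresholds are the ones quoted above (5% of dynamic range, edge scale between .4 and 1.1).

    import numpy as np

    def step_edge_criteria(c1, c3, N, dynamic_range):
        # g(p) = C (S p - S^3 p^3), so c1 = C*S and c3 = -C*S**3.
        if c1 <= 0.0 or c3 >= 0.0:
            return False
        S = np.sqrt(-c3 / c1)
        C = c1 / S
        edge_contrast = 2.0 * C / 3.16257   # estimated step contrast
        edge_scale = S * N / 1.793157       # close to 1 for ideal steps
        return (edge_contrast >= 0.05 * dynamic_range
                and 0.4 <= edge_scale <= 1.1)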

If one is interested in only straight-line edges, a further refinement is possible: we can insist that the curvature of the contour at the zero-crossing point be sufficiently small. Let (r0, c0) = (p0 sin θ, p0 cos θ) be the zero-crossing point, where θ is the gradient direction. Let

f_r  = k2 + (2k4 sin θ + k5 cos θ)p0 + (3k7 sin²θ + 2k8 sin θ cos θ + k9 cos²θ)p0²
f_c  = k3 + (k5 sin θ + 2k6 cos θ)p0 + (k8 sin²θ + 2k9 sin θ cos θ + 3k10 cos²θ)p0²
f_rr = 2k4 + (6k7 sin θ + 2k8 cos θ)p0
f_cc = 2k6 + (2k9 sin θ + 6k10 cos θ)p0
f_rc = k5 + (2k8 sin θ + 2k9 cos θ)p0

Then the curvature K of the contour f(r,c) = f(r0, c0) is given by

K = |f_cc f_r² − 2 f_rc f_r f_c + f_rr f_c²| / (f_r² + f_c²)^{3/2}

For straight lines, requiring K to be less than .05 is often reasonable.
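The curvature test can be sketched directly from the listed partial derivatives, as below. The function name is illustrative, and the formula used is the standard curvature of an implicit contour, which is the expression the listed partials feed.

    import numpy as np

    def contour_curvature(k, theta, p0):
        # k = (k1, ..., k10); theta = gradient angle; p0 = subpixel
        # zero-crossing location along the gradient direction.
        k1, k2, k3, k4, k5, k6, k7, k8, k9, k10 = k
        s, c = np.sin(theta), np.cos(theta)
        fr  = k2 + (2*k4*s + k5*c)*p0 + (3*k7*s*s + 2*k8*s*c + k9*c*c)*p0**2
        fc  = k3 + (k5*s + 2*k6*c)*p0 + (k8*s*s + 2*k9*s*c + 3*k10*c*c)*p0**2
        frr = 2*k4 + (6*k7*s + 2*k8*c)*p0
        fcc = 2*k6 + (2*k9*s + 6*k10*c)*p0
        frc = k5 + (2*k8*s + 2*k9*c)*p0
        num = fcc*fr*fr - 2.0*frc*fr*fc + frr*fc*fc
        return abs(num) / (fr*fr + fc*fc)**1.5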

8.9 Integrated Directional Derivative Gradient Operator

Accurate edge direction is important in techniques for grouping labeled pixels into arcs and segmenting arcs into line segments, as well as in Hough transformation techniques (Duda and Hart, 1972), which have been used extensively to detect lines (O'Gorman and Clowes, 1976), circles (Kimme, Ballard, and Sklansky, 1975), and arbitrary shapes (Ballard, 1981). Martelli (1972) and Ramer (1975) each use edge direction information to perform edge linking. Kitchen and Rosenfeld (1980) and Zuniga and Haralick (1983) use edge direction information in schemes to detect corners.

The integrated directional derivative gradient operator (Zuniga and Haralick, 1987, 1988a), which we discuss here, permits a more accurate calculation of step edge direction than do techniques that use values of directional derivatives estimated at a point. Local edge direction is defined as the direction that is orthogonal to the estimated gradient direction and in which, if one walked along the edge, the higher-valued area would be to the right.

Knowledge of the directional derivatives D1 and D2 in any two orthogonal directions is sufficient to compute the directional derivative in any arbitrary direction. The gradient magnitude, which is defined as the maximum such directional derivative, is computed as √(D1² + D2²), and its direction as tan⁻¹(D2/D1). From this perspective, estimating gradient direction requires estimates of the directional derivatives D1 and D2. In previous sections we discussed how different operators have been utilized to estimate these directional derivatives at a point. Examples are the Roberts operator (Roberts, 1965), the Prewitt operator (Prewitt, 1970), the Sobel operator (Duda and Hart, 1972), and the Hueckel operator (Hueckel, 1973). These gradient operators all face a problem: their estimate of edge direction for a step edge is inherently biased as a function of true edge direction and of displacement of the true edge from the pixel's center. The bias occurs because the step edge does not match the polynomial model.

Instead of computing directional derivatives at a point directly from the fitted surface, as in the case of the standard cubic facet gradient operator mentioned earlier, in this section we describe an operator that measures the integrated directional derivative strength as the integral of the first directional derivative taken over a square area. The direction for the integrated derivative estimate that maximizes the integral defines the estimate of gradient direction. Edge direction estimate bias of the integrated directional derivative gradient operator is sharply reduced as compared with the bias of the standard cubic facet, Sobel, and Prewitt gradient operators. Noise sensitivity is comparable to that of the Sobel and Prewitt operators and is much better than that of the standard cubic facet operator. Also, unlike the standard cubic facet, Sobel, and Prewitt operators, increasing the neighborhood size decreases both estimate bias and noise sensitivity. For ramp edges the integrated operator is very nearly unbiased: the worst bias for the 7 x 7 operator is less than 0.09°, and for the 5 x 5 operator, less than 0.26°.

Section 8.9.1 describes the mathematical analysis necessary to derive the new gradient estimate. Section 8.9.2 provides a comparison of the integrated directional derivative gradient operator against the standard cubic facet gradient, Prewitt, and Sobel operators for step and ramp edges contaminated by zero-mean Gaussian noise.

8.9.1 Integrated Directional Derivative
Let F_θ represent the integrated first directional derivative along lines orthogonal to the direction θ, situated in a rectangle of length 2L and width 2W centered at the origin of the coordinate system and rotated by an angle of θ in the clockwise direction. Then

F_θ = (1/4LW) ∫_{−W}^{W} ∫_{−L}^{L} f'_θ(p cos θ + w sin θ, −p sin θ + w cos θ) dp dw    (8.69)

for a given N x N neighborhood. The integrated gradient estimate is

G = F_θmax u_θmax    (8.70)

where F_θmax = max_θ F_θ and u_θmax is a unit vector in the direction that maximizes F_θ.

Using the bivariate cubic fit, we obtain that f'_θ(p cos θ + w sin θ, −p sin θ + w cos θ) reduces to

f'_θ(p cos θ + w sin θ, −p sin θ + w cos θ)
  = [k9 sin³θ + k8 cos³θ + (3k7 − 2k9) sin θ cos²θ + (3k10 − 2k8) sin²θ cos θ] p²
  + 2[−k8 sin³θ + k9 cos³θ + (3k7 − 2k9) sin²θ cos θ + (2k8 − 3k10) sin θ cos²θ] pw
  + [3k7 sin³θ + 3k10 cos³θ + 3k8 sin²θ cos θ + 3k9 sin θ cos²θ] w²
  + [−k5 sin²θ + 2(k4 − k6) sin θ cos θ + k5 cos²θ] p
  + 2[k4 sin²θ + k5 sin θ cos θ + k6 cos²θ] w
  + k2 sin θ + k3 cos θ    (8.71)

Substituting Eq. (8.71) in Eq. (8.69) results in

F_θ = (1/4LW) ∫_{−W}^{W} ∫_{−L}^{L} (Ap² + Bpw + Cw² + Dp + Ew + F) dp dw

where A, B, C, D, E, and F are the coefficients of the quadratic relation (8.71). Evaluating this integral results in

F_θ = (1/3)AL² + (1/3)CW² + F    (8.72)

Expanding the coefficients, F_θ reduces to a trigonometric expression in sin θ and cos θ:

F_θ = (1/3)[k9 L² + 3k7 W²] sin³θ + (1/3)[k8 L² + 3k10 W²] cos³θ
  + (1/3)[(3k7 − 2k9)L² + 3k9 W²] cos²θ sin θ
  + (1/3)[(3k10 − 2k8)L² + 3k8 W²] cos θ sin²θ
  + k2 sin θ + k3 cos θ    (8.73)

Notice that if L = W, then

F_θ = (L²/3)[(3k7 + k9)(sin³θ + cos²θ sin θ) + (3k10 + k8)(cos³θ + cos θ sin²θ)] + k2 sin θ + k3 cos θ
    = [(3k7 + k9)L²/3 + k2] sin θ + [(3k10 + k8)L²/3 + k3] cos θ

Hence F_θ is maximized when

θ_max = tan⁻¹{[(3k7 + k9)L²/3 + k2] / [(3k10 + k8)L²/3 + k3]}    (8.74)

Then

F_θmax = √(D1² + D2²)    (8.75)

where D1 and D2 are the numerator and denominator of the argument of the tangent function in Eq. (8.74). In the remainder of our discussion we take the area of integration to be square: L = W.
8.9.2 Experimental Results
Experiments that illustrate how the integrated directional derivative gradient operator performs use step and ramp edges contaminated by zero-mean Gaussian noise. The step edges are generated in a rectangular grid with orientations θ from 0° to 90° and with random displacement from the grid's center uniformly distributed within the range (−D, D), assuming a unit distance between two 4-neighbor pixels in the grid. A step edge passing through a pixel divides it into two parts having areas A1 and A2, with A1 + A2 = 1. If the corresponding gray level intensities on each side of the edge are I1 and I2, then the pixel is assigned a gray level intensity I according to the rule

I = A1 I1 + A2 I2

The experiments use values for I1 and I2 equal to 100 and 200, respectively, which implies that the edge contrast is 100. Ramp edges are generated by defocusing step edges with a 3 x 3 equally weighted averaging filter. Finally, both step and ramp edges are contaminated by adding zero-mean Gaussian noise with a given standard deviation.

For comparison purposes we use two performance measurements: edge direction estimate bias and edge direction estimate standard deviation. The latter measures noise sensitivity. The estimate bias is defined as the difference between the estimate mean direction and the true edge direction. Combining the previous two measurements by the root-mean-square error formula produces a single performance measurement. The experiments show that the integrated directional derivative gradient operator achieves best performance in the root-mean-square error sense when L = W = 1.8 for a 5 x 5 neighborhood size and L = W = 2.5 for a 7 x 7 neighborhood size, for both step and ramp edges and for a variety of noise levels.

We compare the following gradient operators: 5 x 5 extended Sobel (Iannino and Shapiro, 1979), 5 x 5 and 7 x 7 Prewitt, 5 x 5 and 7 x 7 standard cubic facet, and 5 x 5 and 7 x 7 integrated directional derivative. Figure 8.10 shows the 5 x 5 row derivative masks associated with each of the operators, and Fig. 8.11 shows the 7 x 7 row derivative mask for the integrated directional derivative gradient operator.

Figure 8.10 Row derivative masks for gradient operators in 5 x 5 neighborhood size: (a) integrated directional derivative; (b) standard cubic facet; (c) Prewitt; and (d) extended Sobel.

Figure 8.11 Row derivative mask for integrated directional derivative gradient operator for 7 x 7 neighborhood size.

The column derivative masks can be obtained from the row masks by rotation of 90°. For a step or ramp edge of a given orientation and noise standard deviation, each operator is applied to the grid's center 10,000 times, each time with a different noisy sample and a different edge displacement from the grid's center. Edge orientations vary from 0° to 90° and noise standard deviations from 0 to 100. Edge contrast is 100. The results can be seen in Figs. 8.12 and 8.13.

Figures 8.12 and 8.13 show estimate bias against true edge direction for step and ramp edges under zero-noise conditions for the standard cubic facet gradient operator and the integrated directional derivative gradient operator. Three observations can be made: (1) The integrated operator is clearly superior to the standard cubic facet gradient operator. (2) Under zero-noise conditions the 7 x 7 integrated directional derivative gradient operator has a worst bias of less than 0.09°. And (3) the 5 x 5 integrated directional derivative gradient operator has a worst bias of less than 0.26° on ramp edges. For comparison purposes, the 7 x 7 standard cubic facet gradient operator has a worst bias of about 1.2°, and the 5 x 5 standard cubic facet gradient operator has a worst bias of 0.5°. This improvement in worst bias remains when the edges are contaminated by additive independent zero-mean Gaussian noise. The estimate bias decreases for the integrated operator as the neighborhood size increases, whereas the opposite happens with the standard cubic facet gradient operator. Both operators perform better with ramp edges than with step edges.

Figure 8.12 Bias as a function of true edge direction for step edges under zero-noise conditions for four different edge operators (5 x 5 and 7 x 7 integrated directional derivative, 5 x 5 and 7 x 7 standard cubic facet).

Figure 8.13 Bias as a function of true edge direction for ramp edges under zero-noise conditions for four different edge operators (5 x 5 and 7 x 7 integrated directional derivative, 5 x 5 and 7 x 7 standard cubic facet).

Figure 8.14 shows estimate standard deviation against noise standard deviation for a fixed step edge with orientation of 22.5° and additive independent Gaussian noise. Again, the integrated operator is uniformly superior to the standard cubic facet gradient operator for both step and ramp edges.

Figure 8.14 Estimate standard deviation as a function of noise standard deviation for a step edge. Edge orientation is 22.5°. Edge contrast is 100.

Under zero-noise conditions, Fig. 8.15 shows estimate bias of the Sobel and Prewitt operators as a function of true edge direction for step edges. The 7 x 7 integrated operator has the smallest bias, followed by the 5 x 5 extended Sobel and the 5 x 5 and 7 x 7 Prewitt operators. Notice that for ramp edges the response of the integrated operator is nearly flat about zero; that is, the operator is nearly unbiased. For the 7 x 7 integrated operator the worst bias is less than 0.09°, and for the 5 x 5 integrated operator, less than 0.26°. For comparison purposes, the worst bias in the 7 x 7 Prewitt operator is about 5° and in the 5 x 5 Prewitt operator about 4°.

Figure 8.15 Estimate bias of the Sobel and Prewitt operators as a function of true edge direction for step edges under zero-noise conditions.

Figures 8.16 and 8.17 show estimate bias as a function of true edge direction for step and ramp edges when the noise standard deviation is equal to 25. The bias for all the operators shown is nearly identical to the bias under zero-noise conditions. It can be seen from the plots of estimate standard deviation that, as expected, the 7 x 7 operators are less sensitive to noise than the 5 x 5 operators.

Figure 8.16 Estimate bias of the Sobel and Prewitt operators as a function of true edge direction for a step edge. Noise standard deviation is 25. Edge contrast is 100.


Figure 8.17 Estimate bias of the Sobel, Prewitt, and integrated directional derivative operators as a function of true edge direction for a ramp edge. Noise standard deviation is 25. Edge contrast is 100.

8.10 Corner Detection

The detection of corners in images is extremely useful for computer vision tasks. For example, Huertas (1981) uses corners to detect buildings in aerial images. Nagel and Enkelmann (1982) use corner points to determine displacement vectors from a pair of consecutive images taken in time sequence. Some approaches to corner detection rely on prior segmentation of the image and subsequent analysis of region boundaries. Rutkowski and Rosenfeld (1978) provide a comparison of several corner detection techniques along those lines. In this section we develop gray scale corner detectors, which detect corners by operating directly on the gray scale image. As Kitchen and Rosenfeld (1980) point out, the main advantage of such corner detectors is that their performance is not dependent on the success or failure of a prior segmentation step.

Among the earliest gray scale corner detectors is Beaudet's (1978) DET operator, which responds significantly near corner and saddle points. Kitchen and Rosenfeld report results using several operators that measure cornerness by the product of gradient magnitude and rate of change of gradient direction. Dreschler and Nagel (1981) investigate points lying between extrema of Gaussian curvature as suitable candidates for corner points.

Corner detection is simply described by using the facet model. In general, what we are usually inclined to call a corner occurs where two edge boundaries meet at a certain angle or where the direction of an edge boundary is changing very rapidly. We associate corners therefore with two conditions: the occurrence of an edge and significant changes in edge direction. We discussed edge detection in Sections 8.6 to 8.9. Here we will concentrate on change in edge direction, which is equivalent to change in gradient direction, since edge and gradient directions are orthogonal.

Corners occur at edge points where a significant change in gradient direction takes place. Now this change in gradient direction should ideally be measured as an incremental change along the edge boundary. We do not desire, however, to perform boundary following, since that would require a prior segmentation step. There are several ways to handle this situation, based on the realization that, according to our model, the direction of an edge point (that is, the tangent to the edge boundary at that point) is orthogonal to the gradient vector at that same point. The simplest approach is to compute the incremental change in gradient direction along the tangent line to the edge at the point that is a corner candidate. The second approach is to evaluate the incremental change along the contour line that passes through the corner candidate. Finally, we can compute the instantaneous rate of change in gradient direction in the direction of the tangent line. We will discuss each of these approaches.

The properties of those neighborhood points away from the neighborhood center, and possibly outside the pixel itself, can be computed by two different methods: (a) using the surface fit from the central neighborhood and (b) using the surface fit from the neighborhood centered around the pixel closest to the given point. Although the first method is computationally less expensive than the second one, the latter may be more accurate.

8.10.1 Incremental Change along the Tangent Line

Consider a row-column coordinate system centered at the corner candidate point. Let θ(r,c) be the gradient direction at coordinates (r,c), and let θ0 = θ(0,0). Then (sin θ0, cos θ0) is a unit vector in the direction of the gradient at the origin. If the origin is an edge point, the direction of the line tangent to the edge boundary that passes through it is given by (−cos θ0, sin θ0), and an arbitrary point lying on that line is p(−cos θ0, sin θ0) for some p. Consider two points P1 = (r1,c1) and P2 = (r2,c2) equidistant to the origin and lying on the tangent line (Fig. 8.18). P1 and P2 are given by −R(−cos θ0, sin θ0) and R(−cos θ0, sin θ0), respectively, where R is the distance from each point to the origin. If R is not too large, we can expect the true boundary to lie not too far away from either P1 or P2. In this case a suitable test to decide whether the origin (0,0) is a corner point involves meeting the following two conditions (a sketch of the test follows Fig. 8.18):

1. (0,0), (r1,c1), and (r2,c2) are edge points.
2. For a given threshold Ω, |θ(r1,c1) − θ(r2,c2)| > Ω.

Figure 8.18 Two points equidistant to the origin and lying on the line tangent to the edge boundary passing through it.
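The tangent-line test can be sketched as follows. The two callbacks stand in for whichever of methods (a) and (b) above is used to evaluate points away from the center; their names, the default R, and the default threshold are illustrative assumptions.

    import numpy as np

    def tangent_line_corner_test(theta_at, is_edge_at, R=2.0,
                                 omega=np.deg2rad(30.0)):
        # theta_at(r, c): gradient direction at (r, c)   (assumed callback)
        # is_edge_at(r, c): edge-point test at (r, c)    (assumed callback)
        # The corner candidate sits at the local origin (0, 0).
        theta0 = theta_at(0.0, 0.0)
        t = np.array([-np.cos(theta0), np.sin(theta0)])  # tangent direction
        p1, p2 = -R * t, R * t
        if not (is_edge_at(0.0, 0.0) and is_edge_at(*p1) and is_edge_at(*p2)):
            return False
        dtheta = abs(theta_at(*p1) - theta_at(*p2))
        dtheta = min(dtheta, 2.0*np.pi - dtheta)         # wrap the angle
        return dtheta > omega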

Figure 8.19 Two points equidistant to the origin and lying on the contour line passing through it.

8.10.2 Incremental Change along the Contour Line

It is reasonable to assume that points on the edge boundary to each side of the corner point and close to it are likely to have similar, but not necessarily the same, gray levels. This motivates us to approximate the edge boundary by the contour line {(r,c) | f(r,c) = f(0,0)} that passes through the corner candidate point at the origin of the coordinate system. We consider two points P1 = (r1,c1) and P2 = (r2,c2) equidistant to the origin and lying on the contour line (cf. Fig. 8.19). Let θ(r,c) be the gradient direction at coordinates (r,c). The test to decide whether the origin (0,0) is a corner point is similar to the one used in the previous approach. That is, (0,0) is declared to be a corner point if the following two conditions are satisfied:

1. (0,0), (r1,c1), and (r2,c2) are edge points.
2. For a given threshold Ω, |θ(r1,c1) − θ(r2,c2)| > Ω.

This approach is computationally more expensive than the previous one because of the need to intersect the cubic curve f(r,c) = f(0,0) (the contour line) with the quadratic curve r² + c² = R² in order to determine the points P1 and P2 at a distance R from the origin.
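One simple way to perform this intersection numerically is to scan the circle for sign changes of f minus its central value, as sketched below; the sampling density and function name are assumptions, and a root-refinement step could follow each detected sign change.

    import numpy as np

    def contour_circle_points(f, R, n=360):
        # f: fitted surface function (assumed callable). Finds points where
        # the contour f(r,c) = f(0,0) crosses the circle of radius R by
        # scanning g(phi) = f(R sin phi, R cos phi) - f(0,0) for sign changes.
        f0 = f(0.0, 0.0)
        phis = np.linspace(0.0, 2.0*np.pi, n, endpoint=False)
        g = np.array([f(R*np.sin(p), R*np.cos(p)) - f0 for p in phis])
        points = []
        for i in range(n):
            j = (i + 1) % n
            if g[i] == 0.0 or g[i] * g[j] < 0.0:
                p = phis[i]
                points.append((R*np.sin(p), R*np.cos(p)))
        return points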

8.10.3 Instantaneous Rate of Change
Let θ(r,c) be the gradient direction at coordinates (r,c), and let θ'_α(r,c) be the first directional derivative of θ(r,c) in the direction α. We can compute θ'_α(r,c) as follows. Let f(r,c) be the surface function underlying the neighborhood of pixel values centered at the corner candidate pixel, and let f_r(r,c) and f_c(r,c) denote the row and column partial derivatives of f. Consider the line passing through the origin in the direction α. An arbitrary point on this line is given by p(sin α, cos α), and the gradient direction at that point is given by

θ(p sin α, p cos α) = tan⁻¹ [f_r(p sin α, p cos α) / f_c(p sin α, p cos α)]    (8.79)

Since α can be considered fixed, we can write this as θ_α(p). Differentiating with respect to the parameter p results in

θ'_α(p) = [f'_r(p) f_c(p) − f_r(p) f'_c(p)] / [f_r(p)² + f_c(p)²]

Using the bivariate cubic polynomial approximation for f, we have

f_r(p)  = k2 + (2k4 sin α + k5 cos α)p + (3k7 sin²α + 2k8 sin α cos α + k9 cos²α)p²
f_c(p)  = k3 + (k5 sin α + 2k6 cos α)p + (k8 sin²α + 2k9 sin α cos α + 3k10 cos²α)p²
f'_r(p) = (2k4 sin α + k5 cos α) + 2(3k7 sin²α + 2k8 sin α cos α + k9 cos²α)p
f'_c(p) = (k5 sin α + 2k6 cos α) + 2(k8 sin²α + 2k9 sin α cos α + 3k10 cos²α)p

The rate of change of gradient direction in the direction α, evaluated at the origin (p = 0), is then

θ'_α(0) = [k3(2k4 sin α + k5 cos α) − k2(k5 sin α + 2k6 cos α)] / (k2² + k3²)    (8.80)

We are interested in the value of θ'_α(0) when the direction α is orthogonal to the gradient direction at the origin (the edge direction). Since (k2, k3) is the gradient vector at the origin, (−k3, k2) is a vector orthogonal to it, and

sin α = −k3/√(k2² + k3²),  cos α = k2/√(k2² + k3²)

Finally, using these values in Eq. (8.80), we obtain

θ'(0) = 2(k2k3k5 − k4k3² − k6k2²) / (k2² + k3²)^{3/2}

The test to decide whether the origin (0,0) is a corner point follows. We declare (0,0) to be a corner point if these two conditions are satisfied:

1. (0,0) is an edge point.
2. For a given threshold Ω, |θ'(0)| > Ω.

If a zero-crossing edge detector is used to decide whether to label a pixel as an edge, then instead of employing the test at the origin, one can use it at p = p0, where p0 is the zero-crossing point.
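Substituting these values of sin α and cos α into Eq. (8.80) gives the closed form used in the sketch below; the function name is illustrative, and the formula is the one derived above.

    import numpy as np

    def gradient_direction_rate(k2, k3, k4, k5, k6):
        # Rate of change of gradient direction at the origin, taken in
        # the direction orthogonal to the gradient (the edge direction).
        q2 = k2*k2 + k3*k3
        if q2 == 0.0:
            return 0.0            # gradient undefined; no corner evidence
        return 2.0 * (k2*k3*k5 - k4*k3*k3 - k6*k2*k2) / q2**1.5

A pixel is then declared a corner when it passes the edge test and |θ'(0)| exceeds the threshold Ω.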

8.10.4 Experimental Results

We illustrate the performance of the various facet model-based gray level corner detectors by applying them to two digital images. The first one represents a set of artificially generated rectangular shapes at different orientations; the second one is a real aerial image of an urban scene. The first image is 90 x 90 pixels and contains rectangular shapes of 20 x 20 pixels with orientations ranging from 0° to 90° in 10° increments. The rectangles have gray level intensity 175, and the background has gray level intensity 75. Independent Gaussian noise with mean 0 and standard deviation 10 has been added to this image. If we define the signal-to-noise ratio as 10 times the logarithm of the range of the signal divided by the standard deviation of the noise, the artificially generated image has a 10 dB signal-to-noise ratio. The perfect and noisy versions are shown in Fig. 8.20. A true corner is defined as the interior pixel in the rectangular shape where two adjacent sides meet.

Figure 8.20 Perfect and noisy artificially generated images and the aerial scene: (a) original; (b) noisy; (c) aerial scene.

Facet Model-Based Corner Detectors

Each of the corner detection techniques discussed previously was applied to the artificially generated noisy image by using a neighborhood size of 7 x 7 pixels and a gradient strength threshold for edge detection equal to 20. If the gradient exceeds the threshold value and a negatively sloped zero crossing of the second directional derivative occurs in a direction of ±14.9° of the gradient direction within a circle of radius one pixel length centered at the point of test, then the point is declared to be an edge point. The thresholds for gradient direction change were selected to equalize as well as possible the conditional probability of assigning a corner within a given distance d from the true corner, given that there is a true corner, and the conditional probability of there being a true corner within a given distance d of an assigned corner, when a corner is assigned.

Table 8.2 shows the probability of correct corner assignment for each case for distances d = 0 and d = 1. This table shows that a very high percentage of the assigned corner points are guaranteed to lie within one pixel distance from the true corner point. The method that performs the best is the one that measures change in gradient direction as incremental change along a contour line and that computes properties of tested points away from the neighborhood center by using the surface fit from the neighborhood centered around the pixel closest to the tested point. Surprisingly, the next best is the simplest one, which uses incremental change along the tangent line and properties from the same corner candidate central neighborhood for all the tested points on the tangent line.

Table 8.2 Performance of the facet model-based corner detectors for d = 0 and d = 1: incremental change along the tangent line and along the contour line (each with central-neighborhood and nearest-neighborhood property computation) and the instantaneous rate of change. P(AC/TC) is the conditional probability of assigning a corner given that there is a corner; P(TC/AC) is the conditional probability of there being a true corner when a corner is assigned; d is the maximum distance between assigned and true corners.

Comparison with Other Gray Tone Corner Detectors

Table 8.3 compares the performance of the best facet model-based corner detector according to Table 8.2 with the performance of two other gray level corner detectors: Kitchen-Rosenfeld (1980) and Dreschler-Nagel (1982).

Kitchen and Rosenfeld investigated several techniques for gray level corner detection. Each one computed for every pixel in the image a measure of cornerness, and then corners were obtained by thresholding. Their best results were achieved by measuring cornerness by the product of gradient magnitude and instantaneous rate of change in gradient direction evaluated from a quadratic polynomial gray level surface fit.

Dreschler and Nagel detected corners by the following procedure. First, they computed for each pixel in the image the Gaussian curvature. This entailed doing a local quadratic polynomial fit for each pixel and computing the Hessian matrix; the Gaussian curvature is the product of the main curvatures (the eigenvalues of the Hessian matrix). Next, they found the locations of maximum and minimum Gaussian curvature. The search for Gaussian curvature extrema was done in a 5 x 5 neighborhood. A pixel was declared to be a corner if the following conditions were satisfied:

1. The pixel's steepest slope was along the line that connected the location of maximum Gaussian curvature with the location of minimum Gaussian curvature.
2. The orientation of the main curvature that changed sign between the two extrema pointed into the direction of the associated extremum.
3. The gray level intensity at the location of maximum Gaussian curvature was larger than the gray level intensity at the location of minimum Gaussian curvature. (This was done only for extrema lying within a given radius from the corner candidate pixel.)

We slightly modified the Kitchen-Rosenfeld corner detector by considering only points whose gradient exceeded a given threshold. This resulted in substantial improvement of the original Kitchen-Rosenfeld method. The Dreschler-Nagel corner detector proved to be the most sensitive to noise, and a gradient threshold had to be used to improve its performance. In all cases we used a cubic polynomial fitting on a 7 x 7 pixel neighborhood, and the same gradient threshold of 20 was used in each detector to minimize the effects of the noise.

Table 8.3 Performance of the best facet model-based, Kitchen-Rosenfeld, and Dreschler-Nagel corner detectors, listing P(AC/TC) and P(TC/AC) for d = 0 and d = 1 for the Kitchen-Rosenfeld detector with and without gradient threshold, the Dreschler-Nagel detector with gradient threshold 20, and the best facet model-based detector.

Table 8.3 shows the probability of correct corner assignment for each case. The best results according to this table are obtained by using the facet model-based corner detector, followed by the Kitchen-Rosenfeld corner detector; the Dreschler-Nagel corner detector performs the worst. Figure 8.21 illustrates the results of applying the facet model-based, Kitchen-Rosenfeld, and Dreschler-Nagel gray level corner detectors to the artificially generated noisy image. Finally, Fig. 8.22 illustrates the results obtained by applying each of these corner detectors to the aerial image.

Figure 8.21 Comparison of the corner assignments for the facet model-based, the Kitchen-Rosenfeld (with and without gradient threshold), and the Dreschler-Nagel corner detectors (clockwise from top left): (a) facet, G = 20; (b) Kitchen-Rosenfeld, G = 20; (c) Dreschler-Nagel, G = 20; (d) Kitchen-Rosenfeld, G = 0. Parameters are shown in Table 8.3 for d = 1.

Figure 8.22 Comparison of the corner assignments in the aerial scene for the facet model-based, the Kitchen-Rosenfeld (with and without threshold), and the Dreschler-Nagel corner detectors (clockwise from top left).

8.11 Isotropic Derivative Magnitudes

A gradient edge can be understood as arising from a first-order isotropic derivative magnitude. It should then come as no surprise that higher-order features can arise from higher-order derivative magnitudes (Prewitt, 1970; Beaudet, 1978; Haralick, 1981). In this section we determine those linear combinations of squared partial derivatives of two-dimensional functions that are invariant under rotation of the domain of the two-dimensional function.

Let us first consider the simple linear function f(r,c) = k1 + k2r + k3c. If we rotate the coordinate system by θ and call the resulting function g, we have in the new (r', c') coordinates

g(r', c') = f(r, c)

where r = r' cos θ − c' sin θ and c = r' sin θ + c' cos θ, so that ∂g/∂r' = k2 cos θ + k3 sin θ and ∂g/∂c' = −k2 sin θ + k3 cos θ. Hence the sum of the squares of the first partials is the same constant, k2² + k3², the squared gradient magnitude, for the original function and for the rotated function.

Now let f be arbitrary, and let the rotated function g again be defined by g(r', c') = f(r, c). Expressing the partials of f with respect to r' and c' in terms of the partials with respect to r and c, we have

∂f/∂r' = (∂f/∂r)(∂r/∂r') + (∂f/∂c)(∂c/∂r') = (∂f/∂r) cos θ + (∂f/∂c) sin θ
∂f/∂c' = −(∂f/∂r) sin θ + (∂f/∂c) cos θ

Squaring and adding,

(∂f/∂r')² + (∂f/∂c')² = (∂f/∂r)² + (∂f/∂c)²

Thus for each point (r,c), the squared gradient magnitude in the unrotated coordinate system produces the same value as in the rotated coordinate system.

Proceeding in a similar manner for the second-order partials, we have

∂²f/∂r'²   = (∂²f/∂r²) cos²θ + 2(∂²f/∂r∂c) cos θ sin θ + (∂²f/∂c²) sin²θ
∂²f/∂r'∂c' = −(∂²f/∂r²) cos θ sin θ + (∂²f/∂r∂c)(cos²θ − sin²θ) + (∂²f/∂c²) cos θ sin θ
∂²f/∂c'²   = (∂²f/∂r²) sin²θ − 2(∂²f/∂r∂c) cos θ sin θ + (∂²f/∂c²) cos²θ

Looking for some constant λ that makes

(∂²f/∂r'²)² + λ(∂²f/∂r'∂c')² + (∂²f/∂c'²)² = (∂²f/∂r²)² + λ(∂²f/∂r∂c)² + (∂²f/∂c²)²

we discover that exactly one does exist, and its value is λ = 2. Thus for each point in the unrotated coordinate system,

(∂²f/∂r²)² + 2(∂²f/∂r∂c)² + (∂²f/∂c²)²

produces the same value as the corresponding expression in the rotated coordinate system. The direction isotropic second-derivative magnitude is therefore

[(∂²f/∂r²)² + 2(∂²f/∂r∂c)² + (∂²f/∂c²)²]^{1/2}

Higher-order direction isotropic derivative magnitudes can be constructed in a similar manner. The coefficients of the squared partials continue in a binomial coefficient pattern for the two-dimensional case we have been describing. Specializing to two dimensions and third-order partials, we obtain that the following is isotropic:

(∂³f/∂r³)² + 3(∂³f/∂r²∂c)² + 3(∂³f/∂r∂c²)² + (∂³f/∂c³)²

It should be clear from this example that the binomial coefficients arise because of the commutativity of the partial differential operators: the sum runs over all orderings of the mixed partials, and equal mixed partials are counted with their multiplicity. To see this, consider the following theorem, which states that the sum of the squares of all the partials of a given total order is equal regardless of which orthogonal coordinate system it is taken in.

Theorem 8.1: The sum of the squares of all partial derivatives of the same order is isotropic. Let f: E^N → E be C^M, let M be any positive integer, and let P, whose (n,k)th entry is p_nk, be an N x N orthonormal matrix relating the coordinates by y = Px. Then

Σ_{n1,...,nM} [∂^M f / ∂y_{n1} ⋯ ∂y_{nM}]² = Σ_{k1,...,kM} [∂^M f / ∂x_{k1} ⋯ ∂x_{kM}]²

Proof: Since y = Px and P is orthonormal, x = Pᵀy, so that

∂f/∂y_n = Σ_k (∂x_k/∂y_n)(∂f/∂x_k) = Σ_k p_nk (∂f/∂x_k)

Upon recursive application of this equation,

∂^M f / ∂y_{n1} ⋯ ∂y_{nM} = Σ_{k1,...,kM} p_{n1 k1} ⋯ p_{nM kM} (∂^M f / ∂x_{k1} ⋯ ∂x_{kM})

Squaring and summing over (n1,...,nM), the cross terms contract by the orthonormality relations Σ_n p_nk p_nl = δ_kl, applied once for each of the M indices. Therefore the two sums of squares are equal, which is the claimed isotropy.

Figure 8.23 shows an image and its first, second, and third isotropic derivative magnitude features. The derivative estimates were obtained by an equal-weight bivariate cubic fit over a 5 x 5 neighborhood.

Figure 8.23 (a) Original image; (b) first isotropic derivative; (c) second isotropic derivative; (d) third isotropic derivative.

8.12 Ridges and Ravines on Digital Images

What is a ridge or a ravine in a digital image? The first intuitive notion is that a digital ridge (ravine) occurs on a digital image when there is a simply connected sequence of pixels with gray level intensity values that are significantly higher (lower) in the sequence than those neighboring the sequence.
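From the canonical cubic fit of Eq. (8.54), the partials at the neighborhood center are immediate, so the first three isotropic magnitudes can be computed as sketched below (the function name is illustrative).

    import numpy as np

    def isotropic_derivative_magnitudes(k):
        # k = (k1, ..., k10): canonical cubic coefficients. The binomial
        # weights 1,2,1 and 1,3,3,1 follow Theorem 8.1.
        k1, k2, k3, k4, k5, k6, k7, k8, k9, k10 = k
        fr, fc = k2, k3                        # first partials at origin
        frr, frc, fcc = 2*k4, k5, 2*k6         # second partials
        frrr, frrc, frcc, fccc = 6*k7, 2*k8, 2*k9, 6*k10
        m1 = np.sqrt(fr**2 + fc**2)
        m2 = np.sqrt(frr**2 + 2*frc**2 + fcc**2)
        m3 = np.sqrt(frrr**2 + 3*frrc**2 + 3*frcc**2 + fccc**2)
        return m1, m2, m3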

Significantly higher or lower may depend on the distribution of brightness values surrounding the sequence, as well as on the length of the sequence. Ridges and ravines may arise from dark or bright lines or from variations related to reflections or to a three-dimensional surface structure. For elongated objects that have curved surfaces with a specular reflectance function, the locus of points on their surfaces having surface normals pointing in the direction of the camera generates pixels on a digital image that are ridges. Linearly narrow concavities on an object surface (such as cracks) are typically in shadow and generate pixels on a digital image that are ravines. Line and curve finding plays a universal role in object analysis; therefore one important part of a computer vision algorithm lies in the detection of ridge and ravine pixels.

The facet model can be used to help accomplish ridge and ravine identification. To do this, we must first translate our notion of ridge and ravine to the continuous-surface perspective. Here the concept of line translates in terms of directional derivatives. If we picture ourselves walking by the shortest distance across a ridge or ravine, we would walk in the direction having the greatest magnitude of second directional derivative. The ridge peak or the ravine bottom would occur where the first directional derivative has a zero crossing. Thus to label pixels as ridges and ravines, we need to use the neighborhood of a pixel to estimate a continuous surface whose directional derivatives we can compute analytically. Just as we did for the facet edge operator, we can use a functional form consisting of a cubic polynomial in the two variables row and column.

8.12.1 Directional Derivatives

Recall that the first directional derivative of a function f in direction α at row-column position (r,c) is denoted by f'_α(r,c) and can be expressed as

f'_α(r,c) = (∂f/∂r) sin α + (∂f/∂c) cos α

From this it follows that the second directional derivative in direction α can be expressed by

f''_α(r,c) = (∂²f/∂r²) sin²α + 2(∂²f/∂r∂c) sin α cos α + (∂²f/∂c²) cos²α

Rearranging this expression, we find that the second directional derivative can be expressed as a linear combination of two terms, the first being the Laplacian of f and not depending on α and the second depending on α:

f''_α = (1/2)(∂²f/∂r² + ∂²f/∂c²) + (1/2)[(∂²f/∂c² − ∂²f/∂r²) cos 2α + 2(∂²f/∂r∂c) sin 2α]    (8.84)

We can determine the direction α that extremizes f''_α by differentiating f''_α with respect to α, setting the derivative to zero, and solving for α:

sin 2α = ±2(∂²f/∂r∂c)/D,  cos 2α = ±(∂²f/∂c² − ∂²f/∂r²)/D    (8.86)

where

D = √[(∂²f/∂c² − ∂²f/∂r²)² + 4(∂²f/∂r∂c)²]

It is easy to see that when the plus signs are taken, f''_α = (1/2)(∂²f/∂r² + ∂²f/∂c²) + D/2, the maximal second directional derivative, so that an extremum of f in that direction is a relative minimum; and when the minus signs are taken, f''_α = (1/2)(∂²f/∂r² + ∂²f/∂c²) − D/2, the minimal second directional derivative, so that an extremum of f in that direction is a relative maximum. The direction α that makes f''_α a maximum differs from the α that makes f''_α a minimum by π/2 radians.

8.12.2 Ridge-Ravine Labeling

To label a pixel as a ridge or a ravine, we set up a coordinate system whose origin runs through the center of the pixel and select a neighborhood size to estimate the fitting coefficients of the polynomials. Using the fitted polynomials, we can compute all second partial derivatives at the origin, from which the two directions of the extremizing α can be computed by Eq. (8.86). Having a direction α, we next need to see whether, by traveling along a line passing through the origin in the direction α, the first directional derivative has a zero crossing sufficiently near the center of the pixel. If so, we declare the pixel to be a ridge or a ravine, depending on the sign of the second directional derivative. Of course, if in one direction we find a ridge and in the other a ravine, then the pixel is a saddle point.

To express this procedure precisely and without reference to a particular basis set of polynomials tied to a neighborhood size, we rewrite the fitted bicubic surface in canonical form:

f(r,c) = k1 + k2r + k3c + k4r² + k5rc + k6c² + k7r³ + k8r²c + k9rc² + k10c³

Then the two directions for α are given by

α = (1/2) tan⁻¹[k5/(k6 − k4)]  and  α + π/2

To walk in the direction α, we constrain r and c by r = p sin α and c = p cos α. Therefore in the direction α we have

f_α(p) = k1 + Cp + Bp² + Ap³

where

A = k7 sin³α + k8 sin²α cos α + k9 sin α cos²α + k10 cos³α
B = k4 sin²α + k5 sin α cos α + k6 cos²α
C = k2 sin α + k3 cos α

The first directional derivative in direction α is given by

f'_α(p) = 3Ap² + 2Bp + C

and the second directional derivative in the direction α at distance p from the center of the pixel is given by

f''_α(p) = 6Ap + 2B    (8.90)

At those positions p that make f'_α(p) = 0, the value of f''_α(p) is proportional to the curvature of the surface. If B² − 3AC > 0, the two roots of f'_α are real; the largest-magnitude root p_L can be found by

p_L = [−B − sign(B)√(B² − 3AC)] / 3A    (8.91)

and the smallest-magnitude root p_S can then be found by

p_S = C / (3Ap_L)    (8.92)

If the smallest-magnitude root is sufficiently close to zero, we label the pixel as a ridge or a ravine depending on the sign of the second directional derivative there. Pixels that are both ridges and ravines can be labeled as saddles.
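The basic labeling procedure, without the significance tests developed next, can be sketched as follows; the zero-crossing radius default and the function name are illustrative assumptions.

    import numpy as np

    def ridge_ravine_label(k, p_thresh=0.85):
        # k = (k1, ..., k10): canonical cubic coefficients for one pixel.
        k1, k2, k3, k4, k5, k6, k7, k8, k9, k10 = k
        alpha0 = 0.5 * np.arctan2(k5, k6 - k4)   # extremizing direction
        labels = set()
        for alpha in (alpha0, alpha0 + np.pi / 2.0):
            s, c = np.sin(alpha), np.cos(alpha)
            A = k7*s**3 + k8*s*s*c + k9*s*c*c + k10*c**3
            B = k4*s*s + k5*s*c + k6*c*c
            C = k2*s + k3*c
            disc = B*B - 3.0*A*C
            if A == 0.0 or disc <= 0.0:
                continue                          # no relative extrema
            p_l = (-B - np.sign(B) * np.sqrt(disc)) / (3.0*A)
            p_s = C / (3.0*A*p_l)                 # smaller-magnitude root
            if abs(p_s) < p_thresh:               # zero crossing near center
                labels.add('ridge' if 6.0*A*p_s + 2.0*B < 0.0 else 'ravine')
        if {'ridge', 'ravine'} <= labels:
            return 'saddle'
        return labels.pop() if labels else 'none'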

There are some practical complications to this basic procedure. The first complication is due to the fact that the fitted f_α is, in general, a cubic. Such a fitted cubic, arising from data as simple as a piecewise constant with one jump, can have relative extrema even though the data do not. The relative extrema of the fit may or may not correspond to extrema in the data; an extremum that does not is an artifact resulting from the cubic fit. It may or may not be significant. We can analyze the significance in terms of the dynamic range of the fitted cubic, the relative depths of its extrema, and its inflection point. The cubics of interest have extrema; that is, B² − 3AC > 0.

The dynamic range of the cubic segment over the fitting interval [a, b] is defined by

Range = max_{a ≤ p ≤ b} f_α(p) − min_{a ≤ p ≤ b} f_α(p)

The relative depth is defined as follows. Suppose the extremum is a relative minimum. Let a represent the left endpoint of the interval, b the right endpoint, i the location of the inflection point, and v the location of the relative minimum. If the relative minimum occurs to the left of the inflection point, then the depth is defined by

Depth = min{f_α(a), f_α(i)} − f_α(v)

If the relative minimum occurs to the right of the inflection point, the depth of the minimum is defined by

Depth = min{f_α(b), f_α(i)} − f_α(v)

The relative depth of the minimum is then defined by

Relative depth = Depth / Range

To illustrate, consider a one-dimensional data pattern that is a constant with a jump change at one end. In each such case the cubic fitted to the data "rings" around the large constant region where the data are zero. One extremum of this ringing is closer to the origin; this extremum is an artifact. Notice that in each of these cases the inflection point is not too far from the origin, indicating a relatively high frequency ringing, and the relative depth, which measures the significance of the ringing, is close to zero. Compare this situation with the case in which the data clearly have an extremum near the origin, as in the one-dimensional patterns of Table 8.4. In these cases the inflection points tend to be much farther away from the extremum, indicating that the ringing behavior has a lower frequency, and the relative depth tends to be much larger in the case of a true extremum.

Table 8.4 Relative depth data. (Columns: neighborhood size, data pattern, position of the extremum closest to the origin, position of the inflection point, relative depth.)

What all this means is that in using f_α to evaluate derivatives, we must understand and interpret the data through the "eyes" of a cubic polynomial. Not all extrema of the cubic are significant. A cubic extremum is significant, in the sense that it reflects an extremum in the fitted data, only if it passes the following tests:

1. |Position of extremum from origin| < radius threshold
2. |Position of inflection from origin| > distance threshold
3. |Distance between roots| > 1.756 x size of interval
4. Relative depth > .2
5. |Curvature at extremum| > curvature threshold

Test 1 guarantees that the extrema are close enough to the origin. Tests 2 and 3 guarantee that the "ringing" behavior has a long enough period, test 3 taking into account that for true extrema the period increases with the size of the fitting interval. Test 4 guarantees that the relative extrema have a significant enough height compared with the dynamic range of the fitted cubic segment. Test 5 guarantees that the curvature at the extrema is sufficiently high.

8.13 Topographic Primal Sketch

8.13.1 Introduction

The basis of the topographic primal sketch consists of the labeling and grouping of the underlying image-intensity surface patches according to categories defined by functions of directional derivatives that are invariant under monotonic gray level transformations. Examples of such categories are peak, pit, ridge, ravine, saddle, flat, and hillside. From this initial classification we can group categories to obtain a rich, hierarchical, and structurally complete representation of the fundamental image structure. We call this representation the topographic primal sketch (Haralick, Watson, and Laffey, 1983).

Invariance Requirement

A digital image can be obtained with a variety of sensing-camera gain settings. It can be visually enhanced by an appropriate adjustment of the camera's dynamic range. The gain-setting, or enhancing, point operator changes the image by some monotonically increasing function that is not necessarily linear; nonlinear enhancing point operators of this type include, for example, histogram normalization and equal probability quantization. The only difference between an image and its enhanced version is that the enhanced image has more contrast, is nicer to look at, and can be understood more quickly by the eye. In visual perception, exactly the same visual interpretation and understanding of a pictured scene occurs whether the camera's gain setting is low or high and whether the image is enhanced or unenhanced.

This fact is important because it suggests that many of the current low-level computer vision techniques, which are based on edges, cannot ever hope to have the robustness associated with human visual perception. They cannot have the robustness because they are inherently incapable of invariance under monotonic transformations. For example, edges based on zero crossings of second derivatives will change in position as the monotonic gray level transformation changes, because convexity of a gray level intensity surface is not preserved under such transformations.

13. valley. saddle. as Ehrich and Rith do. however. Gray level changes are usually associated with edges. flat. Instead of concentrating on gray level changes as edges. if we could describe the shape of the gray level intensity surface for each pixel. for each area of gray level change. pit. upon computing and binding to each topographic label numerical descriptors such as gradient magnitude and direction. Background Marr (1976) argues that the first level of visual processing is the computation of a rich description of gray level changes present in an image. Knowing that a pixel's surface has the shape of a peak does not tell us precisely where in the pixel the peak occurs. we can obtain a reasonable absolute description of each surface shape. The topographic primal sketch has richness and invariance and is very much in the spirit of Marr's primal sketch and the thinking behind Ehrich's relational Erc trees ( h i h and Fbith 1978). . However. in a relative way. a description that includes type. and that all subsequent computations are done in terms of this description. convex hillside. the topographic categories of peak. Furthermore. or on one-dimensional extrema. flat. ridge. 8. and hillside do have the required invariance. as well as directions of the extrema of the second directional derivative along with their values. saddle. the entire surface of the image's gray level intensity values.2 Mathematical Classification of Topographic Structures In this section we formulate the notion of topographic structures on continuous surfaces and show their invariance under monotonically increasing gray level transformations. We will use the following notation to describe the mathematical properties of our topographic categories for continuous surfaces. which he calls the primal sketch. position. We consider each area on an image to be a spatial distribution of gray levels that constitutes a surface or facet of gray level intensities having a specific surface shape. The topographic labeling. does satisfy Marr's (1976) primal sketch requirement in that it contains a symbolic description of the gray level intensity changes. saddle hillside. The shapes that have the invariance property are peak. ridge.1 3 Topographic Primal Sketch 431 convexity of a gray level intensity surface is not preserved under such transformations. Marr (1980) illustrates that from this information it is sometimes possible to reconstruct the image to a reasonable degree. then by assembling all the shape fragments we could reconstruct. and concave hillside. It is likely that. ravine. as Marr does. with hillside having noninvariant subcategories of slope. nor does it tell us the height of the peak or the magnitude of the slope around the peak. inflection. and hillside. orientation. and fuzziness of edge.8. we concentrate on all types of two-dimensional gray level variations. pit. and Marr's primal sketch has.

In order to calculate these values. The second directional derivaf tives may be calculated by forming the Hessian. These five partials are as follows: The gradient vector is simply (a /ar.and second-order partials with respect to r and c need to be approximated.1 2 JX21.@(I) = value of the first directional derivative in the direction of a") Vf om = value of the fiisf difectiomi derivative in the direction of d2) Without loss of generality we assume J X . = (sin /3 cos / w 3 Thus (:ti) 'cs Rrthennore. We m y obtain the values of the first directional derivative by . Each type of topographic structure in the classification scheme is defined in terms of the quantities listed above. -.Vf = gradient vector of a function f 11 Vf 1 1 = gradient magni~de a ' = unit vector in the direction in which the second directional derivative () has the greatest magnitude = unit vector orthogonal to a") X I = value of the second directional derivative in the direction of o") Xz = value of the second directional derivative in the direction of vf . calculation of the eigenvalues and eigenvectors can be done efficiently and accurately by using the method of a Rutishauser (1971). Since H is a 2 x 2 symmetric matrix. and their associated eigenvectors are the directions in which the second directional derivative is extremized.since the order of differarx entiation of the cross partials may be interchanged. the first. the two directions represented by the eigenvectors are orthogonal to each other. This can easily be seen by rewriting f as the quadratic form : ( j . f. O l three pany rameters are required to determine the Hessiai m t i H. where the Hessian is a 2 x 2 matrix defined as = a2 /Br2 B2f /ar OC) f a2 lac dr a2 lac2 f f Hessian matrices are used extensively in nonlinear programming. The eigenvalues of the Hessian are the values of the extrema of the second directional derivative. af lac).

Having the gradient magnitude and direction and the eigenvalues and eigenvectors of the Hessian. no matter what direction we look in. di)= 0. i = 1 or 2. At a peak the gradient is zero. V f )]'/2 is the curvature in the direction w").24 Right circular cone. and the second directional derivative is negative in all directions. + Peak A peak (knob) occurs where there is a local maximum in all directions. then X. we can describe the topographic classification scheme. bowl) is identical to a peak except that it is a local minimum in all directions rather than a local maximum. A point is therefore classified as a peak if it satisfies the following conditions: Pit A pit (sink./[1 (Vf .8. The curvature is downward in all directions. and the Figure 8. .24). To test whether the second directional derivative is negative in all directions. we see no point that is as high as the one we are on (Fig.1 3 Topographic Primal Sketch 433 simply taking the dot product of the gradient with the appropriate eigenvector: A direct relationship exists between the eigenvalues X1 and X2 and curvature in the directions w(') and o(*) : When the first directional derivative Vf . we simply examine the value of the second directional derivative in the directions that make it smallest and largest. In other words. At a pit the gradient is zero. we are on a peak if. 8.

Ridge

A ridge occurs on a ridge line, a curve consisting of a series of ridge points. As we walk along the ridge line, the points to the right and left of us are lower than the ones we are on (Fig. 8.25). Furthermore, the ridge line may be flat, sloped upward, sloped downward, curved upward, or curved downward. A ridge occurs where there is a local maximum in one direction. Therefore it must have a negative second directional derivative in the direction across the ridge and also a zero first directional derivative in the same direction. The direction in which the local maximum occurs may correspond to either of the directions in which the curvature is "extremized," since the ridge itself may be curved. For nonflat ridges, this leads to the first two cases listed below for ridge characterization. If the ridge is flat, then the ridge line is horizontal, and the gradient is zero along it. This corresponds to the third case. The defining characteristic is that the second directional derivative in the direction of the ridge line is zero and the second directional derivative across the ridge line is negative. A point is therefore classified as a ridge if it satisfies any one of the following three sets of conditions:

‖∇f‖ ≠ 0,  λ1 < 0,  ∇f·ω(1) = 0
‖∇f‖ ≠ 0,  λ2 < 0,  ∇f·ω(2) = 0
‖∇f‖ = 0,  λ1 < 0,  λ2 = 0

Figure 8.25 Saddle surface with ridge and ravine lines and saddle hillsides.

A geometric way of thinking about the ridge definition is to realize that the condition ∇f·ω(1) = 0 means that the gradient direction (which is defined for nonzero gradients) is orthogonal to the direction ω(1) of extremized curvature.

Ravine

A ravine (valley) is identical to a ridge except that it is a local minimum (rather than a maximum) in one direction. As we walk along the ravine line, the points to the right and left of us are higher than the one we are on (see Fig. 8.25). A point is classified as a ravine if it satisfies any one of the following three sets of conditions:

‖∇f‖ ≠ 0,  λ1 > 0,  ∇f·ω(1) = 0
‖∇f‖ ≠ 0,  λ2 > 0,  ∇f·ω(2) = 0
‖∇f‖ = 0,  λ1 > 0,  λ2 = 0

Saddle

A saddle occurs where there is a local maximum in one direction and a local minimum in a perpendicular direction (see Fig. 8.25). A saddle must therefore have a positive curvature in one direction and a negative curvature in a perpendicular direction. At a saddle the gradient magnitude must be zero, and the extrema of the second directional derivative must have opposite signs. A point is classified as a saddle if it satisfies the following conditions:

‖∇f‖ = 0,  λ1 · λ2 < 0

Flat

A flat (plain) is a simple, horizontal surface (Fig. 8.26). It therefore must have a zero gradient and no curvature. A point is classified as a flat if it satisfies the following conditions:

‖∇f‖ = 0,  λ1 = 0,  λ2 = 0

Given that these conditions are true, we may further classify a flat as a foot or a shoulder. A foot occurs at the point where the flat just begins to turn up into a hill. At this point the third directional derivative in the direction toward the hill is nonzero, and the surface increases in this direction. The shoulder is an analogous case and occurs where the flat is ending and turning down into a hill. At this point the maximum magnitude of the third directional derivative is nonzero, and the surface decreases in the direction toward the hill.

If the third directional derivative is zero in all directions, then we are in a flat, not near a hill. Thus a flat may be further qualified as being a foot or a shoulder or not qualified at all.

Hillside

A hillside point is anything not covered by the previous categories. It has a nonzero gradient and no strict extrema in the directions of maximum and minimum second directional derivative. If the hill is simply a tilted flat (i.e., has a constant gradient), we call it a slope. If its curvature is positive (upward), we call it a convex hill. If its curvature is negative (downward), we call it a concave hill. If the curvature is up in one direction and down in a perpendicular direction, we call it a saddle hill. A saddle hill is illustrated in Fig. 8.25; the flat, the convex hill, and the concave hill are illustrated in Fig. 8.26.

Figure 8.26 Hillside.

A point on a hillside is an inflection point if it has a zero crossing of the second directional derivative taken in the direction of the gradient. When the slope of the second directional derivative is negative, the inflection point class is the same as the step edge defined earlier in this chapter.

To determine whether a point is a hillside, we simply take the complement of the disjunction of the conditions given for all the previous classes. Thus if there is no curvature, then the gradient must be nonzero. If there is curvature, then the point must not be a relative extremum. Therefore a point is classified as a hillside if all three of the following conditions are true (→ represents the operation of

logical implication):

λ1 = λ2 = 0  →  ‖∇f‖ ≠ 0
λ1 ≠ 0  →  ∇f·ω(1) ≠ 0
λ2 ≠ 0  →  ∇f·ω(2) ≠ 0

Rewritten as a disjunction rather than a conjunction of clauses, a point is classified as a hillside if any one of the following four sets of conditions is true:

∇f·ω(1) ≠ 0,  ∇f·ω(2) ≠ 0
∇f·ω(1) ≠ 0,  λ2 = 0
∇f·ω(2) ≠ 0,  λ1 = 0
‖∇f‖ ≠ 0,  λ1 = λ2 = 0

We can differentiate between classes of hillsides by the values of the second directional derivative. The distinction can be made as follows:

Slope if λ1 = λ2 = 0
Convex if λ1 ≥ λ2 ≥ 0, λ1 ≠ 0
Concave if λ1 ≤ λ2 ≤ 0, λ1 ≠ 0
Saddle if λ1 · λ2 < 0

A slope, convex hill, concave hill, or saddle hill is classified as an inflection point if there is a zero crossing of the second directional derivative in the direction of the maximum first directional derivative (i.e., the gradient).

Summary of the Topographic Categories

Table 8.5 summarizes the mathematical properties of our topographic structures on continuous surfaces. The table exhaustively defines the topographic classes by their gradient magnitude, second directional derivative extrema values, and the first directional derivatives taken in the directions that extremize second directional derivatives. Each entry in the table is either 0, +, -, or *. The 0 means not significantly different from zero; + means significantly different from zero on the positive side; - means significantly different from zero on the negative side; and * means it does not matter. The label "Cannot occur" means that it is impossible for the gradient to be nonzero and for the first directional derivative to be zero in two orthogonal directions.

Table 8.5 Mathematical properties of topographic structures.

‖∇f‖   λ1   λ2   ∇f·ω(1)   ∇f·ω(2)   Label
 0      -    -      0          0     Peak
 0      -    0      0          0     Ridge
 0      -    +      0          0     Saddle
 0      0    0      0          0     Flat
 0      +    -      0          0     Saddle
 0      +    0      0          0     Ravine
 0      +    +      0          0     Pit
 +      -    -     -,+        -,+    Hillside (concave)
 +      -    -      0          *     Ridge
 +      -    -      *          0     Ridge
 +      -    0     -,+         *     Hillside (concave)
 +      -    +     -,+        -,+    Hillside (saddle)
 +      0    0      *          *     Hillside (slope)
 +      +    -     -,+        -,+    Hillside (saddle)
 +      +    0     -,+         *     Hillside (convex)
 +      +    +      0          *     Ravine
 +      +    +      *          0     Ravine
 +      +    +     -,+        -,+    Hillside (convex)
 +      *    *      0          0     Cannot occur

From the table one can see that our classification scheme is complete: all possible combinations of first and second directional derivatives have a corresponding entry in the table. Each topographic category has a set of mathematical properties that uniquely determines it. (Note: Special attention is required for the degenerate case λ1 = λ2 ≠ 0, which implies that ω(1) and ω(2) can be any two orthogonal directions. In this case there always exists an extreme direction ω that is orthogonal to ∇f, and thus the first directional derivative ∇f·ω is always zero in an extreme direction. To avoid spurious zero directional derivatives, we choose ω(1) and ω(2) such that ∇f·ω(1) ≠ 0 and ∇f·ω(2) ≠ 0, unless the gradient is zero.)

Invariance of the Topographic Categories

In this section we show that the topographic labels (peak, pit, ridge, ravine, saddle, flat, and hillside), the gradient direction, and the directions of second directional derivative extrema for peak, pit, ridge, ravine, and saddle are all invariant under monotonically increasing gray level transformations. We take monotonically increasing to mean positive derivative everywhere.

Let the original underlying gray level surface be f(r,c). Let w be a monotonically increasing gray level transformation, and let g(r,c) denote the transformed image: g(r,c) = w[f(r,c)]. It is directly derivable that

g_β(r,c) = w'[f(r,c)] · f_β(r,c)

from which we obtain that

g_ββ(r,c) = w''[f(r,c)] · f_β(r,c)² + w'[f(r,c)] · f_ββ(r,c)

Let us fix a position (r,c). Since w is a monotonically increasing function, w' is positive; in particular, w' is not zero. Hence the direction β that maximizes g_β also maximizes f_β, thereby showing that the gradient directions are the same.

The categories peak, pit, ridge, ravine, saddle, and flat all have in common the essential property that the first directional derivative is zero when taken in a direction that extremizes the second directional derivative. For points (r,c) having such a label, f_β(r,c) = 0 in the extremizing directions, and hence for these points

g_ββ(r,c) = w'[f(r,c)] · f_ββ(r,c)

so that g_ββ will always have the same sign as f_ββ. To see the invariance, let β be an extremizing direction of f_ββ satisfying f_β = 0. Then β satisfies g_β = w' · f_β = 0, and β must also extremize g_ββ, thereby showing that at these points the directions that extremize f_ββ are precisely the directions that extremize g_ββ. A similar argument shows that if β extremizes g_ββ and satisfies g_β = 0, then β must also extremize f_ββ and satisfy f_β = 0.

Therefore any points in the original image with the labels peak, pit, ridge, ravine, saddle, or flat retain the same label in the transformed image and, conversely, any points in the transformed image will have the same label in the original image. Any pixel with a label not in the set {peak, pit, ridge, ravine, saddle, flat} must have a hillside label. Thus a point labeled hillside must be transformed to a hillside-labeled point. However, the subcategories (inflection point, slope, convex hill, concave hill, and saddle hill) may change under the gray level transformation.

Ridge and Ravine Continua

Although the definitions given for ridge and ravine are intuitively pleasing, they may lead to the unexpected consequence of entire areas of a surface being classified as all ridge or all ravine.

To see how this can occur, note that for a ridge or a ravine to exist at a point (r,c), the eigenvector ω(r,c) corresponding to the relevant eigenvalue λ = λ(r,c) must be perpendicular to the gradient direction. Therefore ∇f·ω = 0. If this equation holds for a point (r,c) and not for all points in a small neighborhood about (r,c), then a ridge or a ravine exists in the commonly understood sense. However, if this equation holds for all points in a neighborhood about (r,c), then a ridge or ravine continuum exists by our criteria. Simple, radially symmetric examples include the inverted right circular cone defined by f(r,c) = (r² + c²)^(1/2), the hemisphere defined by f(r,c) = (K² − r² − c²)^(1/2), or in fact any function of the form h(r² + c²). In the case of the cone, the gradient is proportional to (r,c), and the unnormalized eigenvectors corresponding to the eigenvalues λ(r,c) = (r² + c²)^(−1/2) and 0 are (−c,r) and (r,c), respectively. The eigenvector corresponding to the nonzero eigenvalue is orthogonal to the gradient direction. The entire surface of the inverted cone, except for the apex, is a ravine. Other, nonradially symmetric examples exist as well.

Points that are labeled ridge or ravine and have neighboring points in a direction orthogonal to the gradient that are also labeled ridge or ravine are ridge or ravine continua. The identification of points that are really ridge or ravine continua can be made as a postprocessing step. These continua can be reclassified as hillsides if the label of ridge or ravine does not make sense in the application.

8.13.3 Topographic Classification Algorithm

The definitions of Section 8.13.2 cannot be used directly, because there is a problem of where in a pixel's area to apply the classification. If the classification were only applied to the point at the center of each pixel, then a pixel having a peak near one of its corners, for example, would be classified as a concave hill rather than as a peak. The problem is that the topographic classification we are interested in must be a sampling of the actual topographic surface classes. Most likely, the interesting categories of peak, pit, ridge, ravine, and saddle will never occur precisely at a pixel's center, and if they do occur in a pixel's area, then the pixel must carry that label rather than the class label of the pixel's center point. Thus one problem we must solve is how to determine the dominant label for a pixel, given the topographic class label of every point in the pixel. The next problem we must solve is how to determine, in effect, the set of all topographic classes occurring within a pixel's area without having to do the impossible brute-force computation.

To solve these problems, we divide the set of topographic labels into two subsets: (1) those that indicate that a strict, local, one-dimensional extremum has occurred (peak, pit, ridge, ravine, and saddle) and (2) those that do not (flat and hillside).

By one-dimensional, we mean along a line (in a particular direction). A strict, local, one-dimensional extremum can be located by finding those points within a pixel's area where a zero crossing of the first directional derivative occurs. So that we do not search the pixel's entire area for the zero crossing, we search only in the directions of extreme second directional derivative, ω(1) and ω(2). Since these directions are well aligned with curvature properties, the chance of overlooking an important topographic structure is minimized, and more important, the computational cost is small.

When λ1 = λ2 ≠ 0, the directions ω(1) and ω(2) are not uniquely defined. We handle this case by searching for a zero crossing in the direction given by H⁻¹∇f. This is the Newton direction, and it points directly toward the extremum of a quadratic surface.

If the pixel is a hillside, we classify it further into inflection point, slope, convex hill, concave hill, or saddle hill. For inflection-point location (a first-derivative extremum), we search along the gradient direction for a zero crossing of the second directional derivative. If there is such a zero crossing of the second directional derivative within the pixel's area, the pixel is classified as an inflection point.

For one-dimensional extrema there are four cases to consider: (1) no zero crossing, (2) one zero crossing, (3) two zero crossings, and (4) more than two zero crossings of the first directional derivative. The next four sections discuss these cases.

Case One: No Zero Crossing

If no zero crossing is found along either of the two extreme directions within the pixel's area, then the pixel cannot be a local extremum and therefore must be assigned a label from the set {flat, hillside}. The label assigned to the pixel is based on the gradient magnitude and Hessian eigenvalues calculated at the center of the pixel, as in Table 8.6. If the gradient is zero, we have a flat; if it is nonzero, we have a hillside, and the eigenvalues determine the hillside subclass.

Case Two: One Zero Crossing

If a zero crossing of the first directional derivative is found within the pixel's area, then the pixel is a strict, local, one-dimensional extremum and must be assigned a label from the set {peak, pit, ridge, ravine, saddle}. At the location of the zero crossing, the Hessian and gradient are recomputed. If the gradient magnitude at the zero crossing is zero, Table 8.7 is used. If the gradient magnitude is nonzero, then the choice is either ridge or ravine. If the second directional derivative in the direction of the zero crossing is negative, we have a ridge; if it is positive, we have a ravine. If it is zero, we compare the function value at the pixel center, f(0,0) in the pixel's local coordinates, with the function value at the zero crossing, f(r,c): if f(r,c) is greater than f(0,0), we call it a ridge; otherwise we call it a ravine.
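The one-dimensional zero-crossing search described above can be sketched numerically. In this illustration (ours; the sampling step and the half-width taken for the pixel's area are arbitrary choices), the fitted surface is restricted to a line through the pixel center in an extreme direction w, and sign changes of the first directional derivative are located by sampling:

    import numpy as np

    def zero_crossings_on_line(f, w, half_width=0.5, samples=101):
        """f: callable giving the fitted surface value f(r, c) in the
        pixel's local coordinates.  Restrict f to the line (r, c) = t * w
        and return the locations t of zero crossings of the first
        directional derivative df/dt with |t| <= half_width."""
        ts = np.linspace(-half_width, half_width, samples)
        fvals = np.array([f(t * w[0], t * w[1]) for t in ts])
        dfdt = np.gradient(fvals, ts)      # sampled directional derivative
        crossings = []
        for k in range(len(ts) - 1):
            if dfdt[k] == 0.0 or dfdt[k] * dfdt[k + 1] < 0.0:
                crossings.append(0.5 * (ts[k] + ts[k + 1]))
        return crossings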

Table 8.6 Pixel label calculation for case 1: no zero crossing.

‖∇f‖   λ1   λ2   Label
 0      0    0   Flat
 +      -    -   Concave hill
 +      -    0   Concave hill
 +      -    +   Saddle hill
 +      0    0   Slope
 +      +    -   Saddle hill
 +      +    0   Convex hill
 +      +    +   Convex hill

Table 8.7 Pixel label calculation for case 2: one zero crossing.

‖∇f‖   λ1   λ2   Label
 0      -    -   Peak
 0      -    0   Ridge
 0      -    +   Saddle
 0      +    -   Saddle
 0      +    0   Ravine
 0      +    +   Pit

Case Three: Two Zero Crossings

If we have two zero crossings of the first directional derivative, one in each direction of extreme curvature, then the Hessian and gradient must be recomputed at each zero crossing. Using the procedure just described for case two, we assign a label to each zero crossing. We call these labels LABEL1 and LABEL2. The final classification given the pixel is based on these two labels and is shown in Table 8.8.

If both labels are identical, the pixel is given that label. In the case of both labels being ridge, the pixel may actually be a peak, but experiments have shown that this case is rare. An analogous argument can be made for both labels being ravine. If the labels are different, we choose the category giving us the "most information." If one label is peak and the other ridge, that category is peak: the peak is a local maximum in all directions, whereas the ridge

is a local maximum in only one direction, and thus peak conveys more information about the image surface. An analogous argument can be made if the labels are pit and ravine. Similarly, a saddle gives us more information than a ridge or a ravine; thus a pixel is assigned "saddle" if its zero crossings have been labeled ridge and saddle or ravine and saddle. If one label is ridge and the other ravine, this indicates we are at or very close to a saddle point, and thus the pixel is classified as a saddle.

Table 8.8 Final pixel classification, case 3: two zero crossings.

LABEL1   LABEL2   Resulting Label
Peak     Peak     Peak
Peak     Ridge    Peak
Pit      Pit      Pit
Pit      Ravine   Pit
Saddle   Saddle   Saddle
Ridge    Ridge    Ridge
Ridge    Ravine   Saddle
Ridge    Saddle   Saddle
Ravine   Ravine   Ravine
Ravine   Saddle   Saddle

It is apparent from Table 8.8 that not all possible label combinations are accounted for. Some combinations, such as peak and pit, are omitted because of the assumption that the underlying surface is smooth and sampled frequently enough that a peak and a pit will not both occur within the same pixel's area. If such a case does occur, our convention is to choose arbitrarily either LABEL1 or LABEL2 as the resulting label for the pixel.

Case Four: More Than Two Zero Crossings

If more than two zero crossings occur within a pixel's area, then in at least one of the extrema directions there are two zero crossings. If this happens, we choose the zero crossing closest to the pixel's center and ignore the other. If we ignore the farther zero crossings, this case is identical to case 3. This situation has yet to occur in our experiments.

8.13.4 Summary of Topographic Classification Scheme

The scheme is a parallel process for topographic classification of every pixel, which can be done in one pass through the image. At each pixel of the image, the following four steps need to be performed (a sketch of the resulting decision logic follows the list):

1. Calculate the fitting coefficients, k1 through k10, of a two-dimensional cubic polynomial in an n × n neighborhood around the pixel. These coefficients are easily computed by convolving the appropriate masks over the image.
2. Use the coefficients calculated in step 1 to find the gradient, the gradient magnitude, and the eigenvalues and eigenvectors of the Hessian at the center of the pixel's neighborhood.
3. Search in the direction of the eigenvectors calculated in step 2 for a zero crossing of the first directional derivative within the pixel's area. (If the eigenvalues of the Hessian are equal and nonzero, then search in the Newton direction.)
4. Recompute the gradient, the gradient magnitude, and the values of the second directional derivative extrema at each zero crossing of the first directional derivative. Then apply the labeling scheme as described in Section 8.13.2.
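The following sketch (ours; the helper names are hypothetical, and the simple epsilon test stands in for the "significantly different from zero" comparisons of Table 8.5, which in practice require thresholds tuned to the noise level) shows how the case-one and case-three decision logic might be coded:

    def sign(x, eps=1e-6):
        # Epsilon test standing in for "significantly different from zero".
        return 0 if abs(x) < eps else (1 if x > 0 else -1)

    def case_one_label(grad_mag, lam1, lam2, eps=1e-6):
        """Table 8.6: no zero crossing found -> flat or a hillside subclass."""
        if sign(grad_mag, eps) == 0:
            return "flat"
        s1, s2 = sign(lam1, eps), sign(lam2, eps)
        if s1 == 0 and s2 == 0:
            return "slope"
        if s1 < 0 and s2 <= 0:
            return "concave hill"
        if s1 > 0 and s2 >= 0:
            return "convex hill"
        return "saddle hill"               # curvatures of opposite sign

    def combine_labels(label1, label2):
        """Table 8.8: resolve the two zero-crossing labels of case three."""
        if label1 == label2:
            return label1
        pair = {label1, label2}
        if pair == {"peak", "ridge"}:
            return "peak"
        if pair == {"pit", "ravine"}:
            return "pit"
        if pair == {"ridge", "ravine"}:
            return "saddle"
        if pair == {"ridge", "saddle"} or pair == {"ravine", "saddle"}:
            return "saddle"
        return label1                      # combination not in Table 8.8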

Previous Work

Detection of topographic structures in a digital image is not a new idea, and a wide variety of techniques for detecting pits, peaks, ridges, ravines, and the like have been described. Peuker and Johnston (1972) characterize the surface shape by the sequence of positive and negative differences as successive surrounding points are compared with the central point. Peuker and Douglas (1975) describe several variations of this method for detecting one of the shapes from the set {pit, peak, pass, ridge, ravine, break, slope, flat}. They start with the most frequent feature (slope) and proceed to the less frequent, thus making it an order-dependent algorithm.

Johnston and Rosenfeld (1975) attempt to find peaks by finding all points P such that no points in an n × n neighborhood surrounding P have greater elevation than P. Pits are found in an analogous manner. To find ridges, they identify points that are either east-west or north-south elevation maxima. This is done by using a "smoothed" array in which each point is given the highest elevation in a 2 × 2 square containing it. East-west and north-south maxima are also found on this array. Ravines are found in a similar manner.

Paton (1975) uses a six-term quadratic expansion in Legendre polynomials fitted to a small disk around each pixel. The most significant coefficients of the second-order polynomial yield a descriptive label chosen from the set {constant, slope, ridge, valley, knob, bowl, saddle, ambiguous}. He uses the continuous-fit formulation in setting up the surface fit equations, as opposed to the discrete least-squares fit used in the facet model. The continuous fit is a more expensive computation than the discrete fit and results in a steplike approximation.

Grender's (1976) algorithm compares the gray level elevation of a central point with surrounding elevations at a given distance around the perimeter of a circular window; the radius of the window may be increased in successive passes through the image. His topographic-labeling set consists of slope, ridge, valley, knob, sink, and saddle.

Toriwaki and Fukumura (1978) take a totally different approach from all the others. They use two local features of gray level pictures, connectivity number and a coefficient of curvature, for classification of the pixel into one of the categories peak, pit, ridge, ravine, hillside, and pass. Global structures, such as ridge lines,

are obtained by state transition rules.

Hsu, Mundy, and Beaudet (1978) use a quadratic surface approximation at every point on the image surface. The principal axes of the quadratic approximation are used as directions in which to segment the image. Lines emanating from the center pixel in these directions provide natural boundaries of patches approximating the surface. The authors then selectively generate the principal axes from some critical points distributed over an image and interconnect them into a network to get an approximation of the image data, which they call the web representation. In this network the axes divide the image into regions and show important features, such as edges and peaks. They are then able to extract a set of primitive features from the nodes of the network by mask matching.

Lee and Fu (1981) define a set of 3 × 3 templates that they convolve over the image to give each pixel a figure of merit for each class except the class they call plain. Thresholds are used to determine into which class the pixel will fall. In their scheme a pixel may satisfy the definition of zero, one, or more than one class; ambiguity is resolved by choosing the class with the highest figure of merit. Their set of labels includes none, plain, slope, ridge, valley, foot, and shoulder. They then describe how to extract structural information, such as ridge lines and ravine lines, from the image once the labelings have been made.

Exercises

8.1. Suppose f is a function of two arguments (r,c). Let f_r denote ∂f/∂r and f_c denote ∂f/∂c, and let r = ρ sin θ and c = ρ cos θ. Show that

∂f/∂ρ = f_r sin θ + f_c cos θ

8.2. Using the previous result, show that the change in a unit vector in the direction of the gradient, taken in a direction of angle θ (θ being clockwise with respect to the column axis), is given by

[(f_c  −f_r) H (sin θ  cos θ)' / (f_r² + f_c²)^(3/2)] (f_c  −f_r)'

where H is the Hessian of f.

8.3. Show that the change in gradient direction taken in the direction of the gradient is given by

(f_c  −f_r) H (f_r  f_c)' / (f_r² + f_c²)^(3/2)


In this chapter we discuss some of the extraction techniques and models that have been used to measure textural properties and to compute three-dimensional shape from texture. determine a description or model for it. which the remote-sensing community analyzes. which the biomedical community analyzes. from multispectral scanner images obtained from aircraft or satellite platforms. The texture discrimination techniques are for the most part ad hoc. which the machine vision community analyzes. a formal approach or precise definition of texture does not exist. Mori. Given an image having many textured areas. Despite its importance and ubiquity in image data. Issue 2 has to do with generative models of texture. and regularity of image texture. Tamura. Issue 1 has to do with the pattern recognition task of textural feature extraction. Early work in image texture analysis sought to discover useful features that had some relationship to the fineness and coarseness. determine the boundaries between the differently textured regions. Given a textured region. to the outdoor scenes and object surfaces. Issue 3 has to do with using what we know about issues 1 and 2 to perform a texture segmentation of an image. roughness.CHAPTER TEXTURE m Introduction Texture is an important characteristic for the analysis of many types of images. contrast. to microscopic images of cell cultures or tissue samples. Given a textured region. directionality. determine to which of a finite number of classes the region belongs. In the remainder of this section we provide a brief historical elaboration of issues 1 and 2. and Yamawaki (1978) discuss the . 2. Ehrich and kith (1978) summarize the following issues in texture analysis: 1. 3.

the spectral power density function. an image known to be texturally homogeneous was analyzed. and mosaic models are examples of model-based techniques. randomness. This association provides a theoretical and visual means of understanding the texture. granulation. An image texture is described by the number and types of its primitives and their spatial organization or layout. coarseness. Typically. Autoregressive. or functional (like a linear dependence). 1973). Gray level primitives are regions with gray level properties. Each of these qualities translates into some property of the gray level primitives and the spatial interaction between them. buildings. For example. probabilistic. Later approaches to image texture analysis sought a deeper understanding of what image texture is by the use of a generative image model. irregular. ' . with microscopic imagery. and the problem was to measure textural features by which the image could be classified. may have a pairwise dependence of one primitive on a neighboring primitive. W think of this kind of texture as an organized-area phenomenon. 1972). moving-average. analysts discriminated between eosinophils and large lymphocytes by using a textural feature for cytoplasm and a shape feature for cell nucleus (Bacus and Gose. The gray level primitive includes both its gray level and gray level region properties. gray level run-length distributions. smoothness. When it is decomposable. With aerial imagery. A region is a maximally ~ 0 ~ e ~set e d a given gray level property. These statistical textural-feature approaches included use of the autoconelation function. time-series models (extended to two dimensions). lineation or as being mottled. The first dimension is concemed with the gray level primitives or local properties constituting the image texture. Image texture analysis then amounts to verification and estimation. e The image texture we consider is nonfigurative and cellular. and roads by using textural features (Haralick and Shanmugam. Unfortunately. for texture element. Then one must estimate the values of the model parameters on the basis of the observed sample.454 Texture relationship of such descriptive measures to human visual perception. few investigators have attempted experiments to map 4! -9 *- 3 . edgeness per unit area. The spatial organization may be random. and mathematical morphology. it has two basic dimensions on which it may be described. The gray level primitive can be described in terms such as the average level or the maximum and t of pixels having minimum levels of its region. First one must verify that a given image texture sample is consistent with or fits the model. or hummocky. Given a generative model and the values of its parameters. The dependence may be structural. spatial gray level co-occurrence probabilities. and the second dimension is concerned with the spatial organization of the gray level primitives. or may have a dependence of n primitives at a time. Marlaw random fields. The gray level region can be evaluated in terms of its area and shape. Image texture can be qualitatively evaluated as having one or more of the properties of fineness. they discriminated between areas having natural vegetation and trees and areas having manmade objects. relative extrema distributions. one can synthesize homogeneous image texture samples associated with the model and the given value of its parameters. The basic textural unit of some textural primitives in their defining spatial relationships is sometimes called a texel.

so that there is only one discrete feature. Parts (e) and (f) show textures in which the gray level primitives themselves have their own identifiable shape properties. and coarse may repeat. when the gray level primitives begin to have their own distinct shape and regular organization. fine.1 Introduction 455 semantic meaning into precise properties of gray level primitives and their spatial distribution. either particle or wave properties may predominate. Whatever exists has both particle and wave properties. a fine texture results. These constitute macrotextures. and for multiple-scale textural surfaces the cycle of smooth. This illustrates an important point. and we tend to speak of only gray level or only texture. The simplest example of a microtexture occurs when independent Gaussian noise is added to each pixel's value in a smooth gray level area (Fig. the relative sizes and types of gray IeveI primitives. a coarser texture results. when the small-area patch is only the size of a pixel. As the Gaussian noise becomes more correlated.9. when we explicitly define gray level and texture. the dominant property of that area is gray level. the gray level properties will predominate. Similarly. the texture becomes more of a microtexture. in the image context both gray level and texture are always there. Doing so. the surface appears as a fine texture and then a coarse one. For any textural surface there exists a scale at which. As the number of distinguishable gray level primitives decreases. When the gray level primitives are small in size and the spatial interaction between gray level primitives is constrained to be very local.1 a). Texture cannot be analyzed without a frame of reference in which a gray level primitive is stated or implied. and the number and placement or arrangement of the distinguishable primitives. we must explicitly define the concepts of gray level and texture. the dominant property is texture. . the only property present is simple gray level. when the surface is examined. and depending on the situation. the texture becomes a macrotexture. When the spatial pattern in the gray level primitives is random and the gray level variation between primitives is wide. we discover that gray level and texture are not independent concepts. The basic interrelationships in the gray level-texture concept are the following: When a small-area patch of an image has little variation of gray level primitives. Crucial in this distinction are the size of the small-area patch. Then as resolution increases. In fact.l(b) is a box-filtered defocusing of the image in Part (a). Finally. To use the gray level and textural pattern elements objectively. Figure 9. Hence. Figure 9. it appears smooth and textureless.l(a) shows a white-noise microtexture. 9. Parts (c) and (d) are examples of textures in which the gray level primitives have become regions. we are defining not two concepts but one gray level-texture concept. although at times one property can dominate the other. As the spatial pattern becomes more definite and the gray level regions involve more and more pixels. the resulting texture is a microtexture. They bear an inextricable relationship to each other very much as do a particle and a wave. the texture property will predominate. It illustrates the beginning organization of some gray level primitives larger in size than the individual pixel. When a small-area patch has wide variation of gray level primitives. 
As the number of distinguishable gray level primitives increases within the small-area patch.

Figure 9.1 Four different kinds of texture: (a) white-noise texture; (b) colored-noise texture determined by box filtering the white-noise texture; (c) and (d) more macrotextures; (e) and (f) textures in which the primitives begin to have their own identifiable shape properties.

In summary, to characterize texture we must characterize the gray level primitive properties as well as the spatial relationships between them. This implies that texture-gray level is really a two-layered structure, the first layer having to do with specifying the local properties that manifest themselves in gray level primitives and the second layer with specifying the organization among the gray level primitives. In images having texture gradients caused by nonfrontal views of a homogeneous texture, both primitives and their spatial relationships may change as a function of position within the image. We begin our discussion of texture with the gray level co-occurrence and the generalized co-occurrence approaches. From this perspective we describe a variety of other approaches. We conclude with a discussion on inferring three-dimensional shape from texture on a perspective projection image of a textured planar surface.

9.2 Gray Level Co-Occurrence

The gray level spatial dependence approach characterizes texture by the co-occurrence of its gray levels. Coarse textures are those for which the distribution changes only slightly with distance, and fine textures are those for which the distribution changes rapidly with distance.

The gray level co-occurrence can be specified in a matrix of relative frequencies P_ij with which two neighboring pixels separated by distance d occur on the image, one with gray level i and the other with gray level j. Such matrices of spatial gray level dependence frequencies are symmetric and a function of the angular relationship between the neighboring pixels as well as a function of the distance between them. For a 0° angular relationship, they explicitly average the probability of a left-right transition of gray level i to gray level j.

Figure 9.2 illustrates the set of all horizontal neighboring resolution cells separated by distance 1. This set, along with the image gray levels, would be used to calculate a distance-1 horizontal spatial gray level dependence matrix. Formally, for an image with row index set L_r and column index set L_c, the set of all distance-1 horizontal neighboring resolution cells is

R_H = {((k,l),(m,n)) ∈ (L_r × L_c) × (L_r × L_c) : k − m = 0, |l − n| = 1}

For a 4 × 4 image, R_H contains pairs such as ((1,1),(1,2)), ((1,2),(1,1)), ((1,2),(1,3)), and so on, through ((4,3),(4,4)) and ((4,4),(4,3)).

Figure 9.2 The set of all distance-1 horizontal neighboring resolution cells on a 4 × 4 image.

Consider Fig. 9.3(a), which represents a 4 × 4 image with four gray levels, ranging from 0 to 3. In Fig. 9.3(c) through (f) we calculate all four distance-1 gray level spatial dependence matrices. Note that these matrices are symmetric. For example, the element in the (2,1)th position of the distance-1 horizontal P_H matrix is the total number of times two gray levels of value 2 and 1 occurred horizontally adjacent to each other. To determine this number, we count the number of pairs of resolution cells in R_H such that the first resolution cell of the pair has gray level 2 and the second resolution cell of the pair has gray level 1.

Formally, for angles quantized to 45° intervals, the unnormalized frequencies are defined by

P(i, j, d, 0°)   = #{((k,l),(m,n)) : k − m = 0, |l − n| = d, I(k,l) = i, I(m,n) = j}
P(i, j, d, 45°)  = #{((k,l),(m,n)) : (k − m = d, l − n = −d) or (k − m = −d, l − n = d), I(k,l) = i, I(m,n) = j}
P(i, j, d, 90°)  = #{((k,l),(m,n)) : |k − m| = d, l − n = 0, I(k,l) = i, I(m,n) = j}
P(i, j, d, 135°) = #{((k,l),(m,n)) : (k − m = d, l − n = d) or (k − m = −d, l − n = −d), I(k,l) = i, I(m,n) = j}

where # denotes the number of elements in the set. Note that P(i,j; d,a) = P(j,i; d,a). The distance metric ρ implicit in these equations can be explicitly defined by ρ[(k,l),(m,n)] = max{|k − m|, |l − n|}. Other distances, such as the Euclidean distance, could be used as well.

Using features calculated from the co-occurrence matrix (see Fig. 9.4), Haralick and Bosley (1973) performed a number of identification experiments. On a set of aerial imagery and eight terrain classes (old residential, new residential, lake, swamp, marsh, urban, railroad yard, scrub or wooded), an 82% correct identification was obtained. On a LANDSAT Monterey Bay, California, image, an 84% correct identification was obtained with the use of 64 × 64 subimages and both spectral and textural features on seven terrain classes: coastal forest, woodlands, annual grasslands, urban areas, large irrigated fields, small irrigated fields, and water. On a set of sandstone photomicrographs, an 89% correct identification was obtained on five sandstone classes: Dexter-L, Dexter-H, St. Peter, Upper Muddy, and Gaskel.
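A minimal sketch of the unnormalized 0° co-occurrence computation (ours; a complete implementation would repeat the count for the 45°, 90°, and 135° neighbor offsets):

    import numpy as np

    def cooccurrence_0deg(image, levels, d=1):
        """Unnormalized symmetric matrix P(i, j, d, 0 deg): counts of
        horizontal pairs separated by distance d, with both orderings
        counted, matching the ordered-pair definition of R_H."""
        P = np.zeros((levels, levels), dtype=np.int64)
        rows, cols = image.shape
        for r in range(rows):
            for c in range(cols - d):
                i, j = image[r, c], image[r, c + d]
                P[i, j] += 1
                P[j, i] += 1
        return P

    # A small 4 x 4 test image with gray levels 0..3.
    img = np.array([[0, 0, 1, 1],
                    [0, 0, 1, 1],
                    [0, 2, 2, 2],
                    [2, 2, 3, 3]])
    print(cooccurrence_0deg(img, levels=4))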

Hence it is not likely to work well for textures composed of large-area primitives. The power of the gray level cosccumnce approach is that it characterizes the spatial intemlationships of the gray levels in a textural pattern and can do so in a way that is invariant under monotonic gray level transformations. Julesz (1961) is fhe first to use co-occurrence statistics in visual human texture discrimination experiments.3 Spatial cooccumnce calculations (Haralick. Its weakness is that it does not capture the shape aspects of the gray level primitives. 'l3ese statistics are computable from co-occurrence if one assumes that the image is generated by a . and Wied (1969) use onedimensional co-occunence statistics for the analysis of cervical cells. Darling and Joseph (1968) use statistics obtained from nearest neighbor gray level transition probability matrices to measure textures using spatial illtensity dependence in satellite images taken of clouds. it cannot capture the spatial relationships between primitives that are regions larger than a pixel. 1973). The many possible co-occurrence features times the number of distance angle relationships for which the co-occurrence matrices can be computed lead to a potentially large number of dependent features. Shanmugam. The wide class of images in which they found that spatial gray level dependence carries much of the texture information is probably indicative of the power and generality of this approach. Zucker (1980) suggests using only the distance that maximizes a chi-square statistic of P. Rosenfeld and T a (1970). Also. Bartles and Wied (1975). and Dinstein (1973) ry suggest the use of spatial co-occurrence for arbitrary distances and directions. and Haralick. Tou and Chang (1977) discuss an eigenvector-based feature extraction approach to help reduce the dimension of fature space. and Bartles. Galloway (1975) usas gray level run-length statistics to measure texture. Bahr. Shanmugam. Zobrist and Thompson (1985) use co-occurrence statistics in a Gestalt grouping experiment. and Dinstein.Gray Level Figure 9. Deutsch and Belknap (1972) use a variant of co-occurrence matrices to describe image texture. Haralick (1971).

Statistics that Haralick, Shanmugam, and Dinstein (1973) compute from such co-occurrence matrices of equal-probability quantized images (see also Conners and Harlow, 1978) are shown in Fig. 9.4. These statistics are computable from co-occurrence if one assumes that the image is generated by a Markov process.

Figure 9.4 Eight of the common features computed from the co-occurrence probabilities P_ij, where μ and σ² denote the mean and variance of the marginal distribution P_i = Σ_j P_ij:

Energy (uniformity):        Σ_i Σ_j P_ij²
Entropy:                    −Σ_i Σ_j P_ij log P_ij
Maximum probability:        max_{i,j} P_ij
Contrast:                   Σ_i Σ_j (i − j)² P_ij
Inverse difference moment:  Σ_i Σ_j P_ij / [1 + (i − j)²]
Correlation:                Σ_i Σ_j (i − μ)(j − μ) P_ij / σ²
Homogeneity:                Σ_i Σ_j P_ij / (1 + |i − j|)
Cluster tendency:           Σ_i Σ_j (i + j − 2μ)² P_ij

Additional applications of this technique include the analysis of microscopic images (Haralick and Shanmugam, 1974), pulmonary radiographs (Chien and Fu, 1974), and cervical cell, leukocyte, and lymph node tissue section images (Pressman, 1976).

Chen and Pavlidis (1979) use the co-occurrence matrix in conjunction with a split-and-merge algorithm to segment an image at textural boundaries. Tou and Chang (1977) use statistics from the co-occurrence matrix, followed by a principal-components eigenvector dimensionality-reduction scheme, to reduce the dimensionality of the classification problem; an 89% classification accuracy is obtained. Vickers and Modestino (1982) argue that using features of the co-occurrence matrix in a classification situation is surely suboptimal and that better results are obtained by using the co-occurrence matrix directly in a maximum-likelihood classifier.
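Several of the features of Fig. 9.4 reduce to a few lines once the matrix is normalized to probabilities (a sketch, ours, using the standard forms of these statistics):

    import numpy as np

    def cooccurrence_features(P):
        """P: co-occurrence matrix normalized so that P.sum() == 1."""
        i, j = np.indices(P.shape)
        nz = P > 0                          # avoid log(0) in the entropy
        return {
            "energy":      float(np.sum(P ** 2)),
            "entropy":     float(-np.sum(P[nz] * np.log(P[nz]))),
            "max_prob":    float(P.max()),
            "contrast":    float(np.sum((i - j) ** 2 * P)),
            "inv_diff":    float(np.sum(P / (1.0 + (i - j) ** 2))),
            "homogeneity": float(np.sum(P / (1.0 + np.abs(i - j)))),
        }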

between tree bark, calf leather, wool, beach sand, pigskin, plastic bubbles, herringbone weave, raffia, and wood grain textures.

Bacus and Gose (1972) use a gray level difference variant of the co-occurrence matrix to help distinguish between eosinophils and lymphocytes. They use the probability of a given contrast occurring in a given spatial relationship as a textural feature. This gray level difference probability can be defined in terms of the co-occurrence probabilities by

P(d) = Σ_{|i−j|=d} P(i,j)

The probability of a small contrast d for a coarse texture will be much higher than for a fine texture. Bacus and Gose also use statistics of the differences between a pixel on a red image and a displaced pixel on a blue image. Haralick and Shanmugam (1973) use multispectral co-occurrence probabilities. Wang, Hong, and Wu (1982) also suggest using multispectral difference probabilities.

Weszka, Dyer, and Rosenfeld (1976) use contrast, energy, entropy, and mean of P(d) as texture measures and report that they did about as well as the co-occurrence probabilities. Sun and Wee (1983) suggest a variant of the gray level difference distribution: they fix a distance d and a contrast c and determine the number of pixels each having gray level g and each having n neighbors that are within distance d and within contrast c. From P(g,n) they compute a variety of features, such as entropy and energy. They report an 85% classification accuracy in distinguishing between textures of three different geological terrain types on LANDSAT imagery.

Wechsler and Kidode (1979) and Wechsler (1980) use the gray level difference probabilities to define a random-walk model for texture. See de Souza (1983) and Percus (1983) for some comments on the random-walk model.

Haralick (1975) illustrates a way to use co-occurrence matrices to generate an image in which the value at each resolution cell is a measure of the texture in the resolution cell's neighborhood. Dyer, Hong, and Rosenfeld (1980) and Davis, Clearman, and Aggarwal (1981) compute co-occurrence features for local properties, such as edge strength maxima and edge direction relationships. They suggest computing gray level co-occurrence involving only those pixels near edges. Zucker and Kant (1981) also suggest using generalized co-occurrence statistics. Terzopoulos and Zucker (1982) report a 13% increase in accuracy when combining gray level co-occurrence features with edge co-occurrence features in the diagnosis of osteogenesis imperfecta from images of fibroblast cultures. All these studies produce reasonable results on different textures.

Conners and Harlow (1976, 1980) conclude that this spatial gray level dependence technique is more powerful than spatial frequency (power spectra), gray level difference (gradient), and gray level run-length methods (Galloway, 1975) of texture quantification. Davis (1981) computes co-occurrence probabilities for spatial relationships parametrized by angular

orientation. He defines the polarogram to be a statistic of these co-occurrence probabilities as a function of the angular orientation. See also Chetverikov (1981). Chetverikov (1984) uses co-occurrence statistics as a function of displacement to determine textural regularity.

9.2.1 Generalized Gray Level Spatial Dependence Models for Texture

A simple generalization of the primitive gray level co-occurrence approach is to consider more than two pixels at a time. Given a specific kind of spatial neighborhood (such as a 3 × 3 or 5 × 5 neighborhood) and a subimage, one can parametrically estimate the joint probability distribution of the gray levels over the neighborhoods in the subimage. Here the neighborhood is the primitive, the arrangement of its gray levels is the property, and the texture is characterized by the joint distribution of the gray levels in the neighborhood. The generalized gray level spatial dependence model for texture is based on this joint distribution. In the case of a 5 × 5 neighborhood, the joint distribution would be 25-dimensional.

The prime candidate distribution for parametric estimation is the multivariate normal. If x1, ..., xN represent the N K-dimensional vectors coming from the neighborhoods in a subimage, then the mean vector μ and covariance matrix C can be estimated by

μ = (1/N) Σ_{n=1}^{N} xn    and    C = (1/(N−1)) Σ_{n=1}^{N} (xn − μ)(xn − μ)'

For a texturally homogeneous subimage, μ = μ0·1, where 1 is a column vector all of whose components have the value 1.

9.3 Strong Texture Measures and Generalized Co-Occurrence

Strong texture measures take into account the co-occurrence between texture primitives; we call this generalized co-occurrence. On the basis of Julesz (1975), the most important interaction between texture primitives probably occurs as a two-way interaction. Textures with identical second- and lower-order interactions but with different higher-order interactions tend to be visually similar, although this is not universally true.

To describe generalized co-occurrence, we next focus on the primitive and then on the spatial relationships between primitives. A primitive is a connected set of pixels characterized by a list of attributes. The simplest primitive is the pixel with its gray level attribute. The next more complicated primitive is a connected set of pixels homogeneous in level (Tsuji and Tomita, 1973). Such a primitive can be characterized by size, elongation, orientation, and average gray level. Useful strong texture measures include co-occurrence of primitives based on relationships of distance

or adjacency. Gray levels and local properties are not the only attributes that primitives may have; other attributes include measures of shape and homogeneity of local property. For example, a connected set of pixels can be associated with its length, with the elongation of its shape, or with the variance of its local property.

Sometimes it is useful to work with primitives that are maximally connected sets of pixels having a particular property, such as the same gray level or the same edge direction. Many kinds of primitives can be generated or constructed from image data by one or more applications of neighborhood operators. Included in this class of primitives are (1) connected components, (2) ascending or descending components, (3) saddle components, (4) relative maxima or minima components, and (5) central axis components. Neighborhood operators that compute these kinds of primitives can be found in various papers and will not be discussed here; see, for example, Arcelli and Sanniti di Baja (1978), Rosenfeld and Pfaltz (1966, 1968), Rosenfeld and Thurston (1971), and Yokoi, Toriwaki, and Fukumura (1975).

Because of their invariance under any monotonic gray scale transformation, relative extrema primitives are likely to be very important. Relative extrema primitives were proposed by Rosenfeld (1970), Rosenfeld and Troy (1970), Rosenfeld and Davis (1976), Mitchell, Myers, and Boyne (1977), Mitchell and Carlton (1977), Ehrich and Foith (1976), Haralick (1978), and Ehrich and Foith (1978a, 1978b). Co-occurrence between two relative extrema was suggested by Davis, Johns, and Aggarwal (1979). Maleson, Brown, and Feldman (1977) suggest using region-growing techniques and ellipsoidal approximations to define the homogeneous regions and degree of collinearity as one basis of co-occurrence.

It is possible to segment an image on the basis of relative extrema (for example, relative maxima) in the following way: Label all pixels in each maximally connected relative maximum plateau with a unique label. Then label each pixel with the label of the relative maximum that can reach it by a monotonically decreasing path. If more than one relative maximum can reach it by a monotonically decreasing path, then label the pixel with a special label c. We call the regions so formed the descending components of the image.

9.3.1 Spatial Relationships

Once the primitives have been constructed, we have available a list of primitives, their center coordinates, and their attributes. We might also have available some topological information, such as which primitives are adjacent to which. From these data we can select a simple spatial relationship, such as adjacency of primitives or nearness of primitives, and count how many primitives of each kind occur in the specified spatial relationship. For example, for all primitives of elongation greater than a specified threshold, we can use the angular orientation of each primitive with respect to its closest neighboring primitive as a strong measure of texture. More complex spatial relationships include closest distance or closest distance within an angular window.

In this case, for each kind of primitive situated in the texture, we could lay expanding circles around it and locate the shortest distance between it and every other kind of primitive. Our co-occurrence frequency would then be three-dimensional, two dimensions for primitive kind and one dimension for shortest distance. This can be dimensionally reduced to two dimensions by considering only the shortest distance between each pair of like primitives.

Zucker (1974) suggests that some textures may be characterized by the frequency distribution p(k) of the number k of primitives related to any given primitive. Although this distribution is simpler than co-occurrence, no investigator appears to have used it in texture discrimination experiments.

To define the concept of generalized co-occurrence, it is necessary first to decompose an image into its primitives. Let Q be the set of all primitives on the image. Then we measure primitive properties, such as mean gray level, variance of gray levels, region size, and region shape. Let T be the set of primitive properties and f be a function assigning to each primitive in Q a property of T. Finally, we need to specify a spatial relation between primitives, such as distance or adjacency. Let S ⊆ Q × Q be the binary relation pairing all primitives that satisfy the spatial relation. The generalized co-occurrence matrix P is defined by

P(t1, t2) = #{(q1,q2) ∈ S : f(q1) = t1 and f(q2) = t2} / #S

P(t1,t2) is just the relative frequency with which two primitives occur with specified spatial relationships in the image, one primitive having property t1 and the other having property t2.
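The definition translates directly into code once the primitives have been extracted and the relation S enumerated. In this sketch (ours, with hypothetical example data), primitives are represented only by their assigned property, and S by explicit index pairs:

    from collections import Counter

    def generalized_cooccurrence(properties, S):
        """properties: list mapping primitive index -> its property t in T.
        S: iterable of index pairs (q1, q2) satisfying the spatial relation.
        Returns P(t1, t2) as relative frequencies."""
        S = list(S)
        counts = Counter((properties[q1], properties[q2]) for q1, q2 in S)
        return {pair: n / len(S) for pair, n in counts.items()}

    # Example: four primitives with coarse property labels and an
    # adjacency relation (hypothetical data).
    props = ["dark", "dark", "light", "light"]
    adjacent = [(0, 1), (1, 0), (1, 2), (2, 1), (2, 3), (3, 2)]
    print(generalized_cooccurrence(props, adjacent))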

9.4 Autocorrelation Function and Texture

From one point of view, texture relates to the spatial size of the gray level primitives on an image. Gray level primitives of larger size are indicative of coarser textures; gray level primitives of smaller size are indicative of finer textures. The autocorrelation function is a feature that describes the size of the gray level primitives.

We explore the autocorrelation function with the help of a thought experiment. Consider two image transparencies that are exact copies of each other. Overlay one transparency on top of the other, and with a uniform source of light measure the average light transmitted through the double transparency. Now translate one transparency relative to the other, and measure only the average light transmitted through the portion of the image where one transparency overlaps the other. The two-dimensional autocorrelation function of the image transparency is this average as a function of translation, normalized with respect to the (0,0) translation.

Let I(u,v) denote the transmission of an image transparency at position (u,v). We assume that outside some bounded rectangular region 0 ≤ u ≤ L and 0 ≤ v ≤ L the image transmission is zero. Let (x,y) denote the x-translation and y-translation, respectively. The autocorrelation function ρ for the image transparency is then defined by

ρ(x,y) = ∫∫ I(u,v) I(u+x, v+y) du dv / ∫∫ I²(u,v) du dv

If the gray level primitives on the image are relatively large, the autocorrelation will drop off slowly with distance. If the gray level primitives are small, the autocorrelation will drop off quickly with distance. To the extent that the gray level primitives are spatially periodic, the autocorrelation function will drop off and rise again in a periodic manner. The relationship between the autocorrelation function and the power spectral density function is well known: they are Fourier transforms of each other.

The gray level primitive in the autocorrelation model is the gray level. The spatial organization is characterized by the correlation coefficient, which is a measure of the linear dependence one pixel has on another.
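Because the autocorrelation function and the power spectrum are a Fourier transform pair, a digital approximation of ρ can be computed by the Wiener-Khinchin route. A sketch (ours; zero padding avoids the wraparound of circular correlation, and the result is normalized so that the zero-lag value is 1):

    import numpy as np

    def autocorrelation(image):
        """Normalized two-dimensional autocorrelation of an image."""
        img = image.astype(float)
        rows, cols = img.shape
        F = np.fft.fft2(img, s=(2 * rows, 2 * cols))   # zero-padded FFT
        acf = np.fft.ifft2(F * np.conj(F)).real
        acf = np.fft.fftshift(acf)          # put the (0, 0) lag at the center
        return acf / acf.max()              # acf.max() is the zero-lag energy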

9.5 Digital Transform Methods and Texture

In the digital transform method of texture analysis, the digital image is typically divided into a set of nonoverlapping small square subimages. Suppose the size of the subimage is n × n resolution cells; then the n² gray levels in the subimage can be thought of as the n² components of an n²-dimensional vector. The set of subimages then constitutes a set of n²-dimensional vectors. In the transform technique each of these vectors is reexpressed in a new coordinate system. The Fourier transform uses the complex sinusoid basis set; the Hadamard transform uses the Walsh function basis set. The point of the transformation is that the basis vectors of the new coordinate system have an interpretation that relates to spatial frequency or sequency, and since frequency is a close relative of texture, such transformations can be useful. The gray level primitive in spatial frequency (sequency) models is the gray level, and the spatial organization is characterized by the kind of linear dependence that measures projection lengths.

Gramenopoulos (1973) uses a transform technique employing the sine-cosine basis vectors (implemented with the FFT algorithm) on LANDSAT imagery. He is interested in the power of texture and spatial patterns to do terrain type recognition. He uses subimages of 32 × 32 resolution cells and finds that on a Phoenix, AZ, LANDSAT image, spatial frequencies larger than 3.5 cycles/km and smaller than 5.9 cycles/km contain most of the information needed to discriminate between terrain types. His terrain classes are clouds, water, desert, farms, mountains, urban, riverbed, and cloud shadows. He achieves an overall identification accuracy of 87%.

Horning and Smith (1973) do work similar to that of Gramenopoulos, but with aerial multispectral scanner imagery instead of LANDSAT imagery. Kirvida and Johnson (1973) compare the fast Fourier, Hadamard, and Slant transforms for textural features on LANDSAT imagery over Minnesota. They use 8 × 8 subimages and five categories: hardwoods, conifers, open, city, and water. Using only spectral information, they obtain 74% accuracy. When they add textural information, they increase their identification accuracy to 99%. Maurer (1974a, 1974b) obtains encouraging results by classifying crops from low-altitude color photography on the basis of a one-dimensional Fourier series taken in a direction orthogonal to the rows.

Bajcsy (1973a, 1973b) and Bajcsy and Lieberman (1974, 1976) divide the image into square windows and use the two-dimensional power spectrum of each window. They express the power spectrum in a polar coordinate system of radius r versus angle φ, treating the power spectrum as two independent one-dimensional functions of r and φ. Directional textures tend to have peaks in the power spectrum as a function of φ. Bloblike textures tend to have peaks in the power spectrum as a function of r. The investigators show that texture gradients can be measured by locating the trends of relative maxima of r or φ as a function of the position of the window whose power spectrum is being taken. As the relative maxima along the radial direction tend to shift toward larger values, the image surface becomes more finely textured.

In general, features based on Fourier power spectra have been shown to perform more poorly than features based on second-order gray level co-occurrence statistics (Haralick, Shanmugam, and Dinstein, 1973) or those based on first-order statistics of spatial gray level differences (Weszka, Dyer, and Rosenfeld, 1976; Conners and Harlow, 1980). The presence of aperture effects has been hypothesized to account for part of the unfavorable performance by Fourier features compared with spatial-domain gray level statistics (Dyer and Rosenfeld, 1976), although experimental results indicate that this effect, if present, is minimal. D'Astous and Jernigan (1984) claim that the reason for the poorer performance is that earlier studies employing Fourier transform features used summed spectral energies within band- or wedge-shaped regions in the power spectrum. They argue that additional discriminating information can be obtained from the power spectrum in terms of characteristics such as regularity, directionality, linearity, and coarseness. The degree of regularity can be measured by the relative strength of the highest non-DC peak in the power spectrum. Other peak features include the Laplacian at the peak, the distance between the peak and the origin, the polar angle of the peak, and
the number of adjacent neighbors of the peak containing at least 50% of the energy in the peak.
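The summed spectral energies within band- and wedge-shaped regions that these studies used are straightforward to compute. A minimal Python sketch follows; the numbers of rings and wedges are our own illustrative parameter choices:

import numpy as np

def ring_wedge_features(image, n_rings=4, n_wedges=4):
    # Summed spectral energy in annular (radius) bands and in wedge
    # (angle) sectors of the power spectrum, normalized by total energy.
    f = np.fft.fftshift(np.fft.fft2(image - image.mean()))
    power = np.abs(f) ** 2
    rows, cols = image.shape
    y, x = np.indices(power.shape)
    y = y - rows // 2
    x = x - cols // 2
    radius = np.hypot(x, y)
    angle = np.mod(np.arctan2(y, x), np.pi)      # fold directions to [0, pi)
    rmax = radius.max() + 1e-9
    rings = [power[(radius >= rmax * i / n_rings) &
                   (radius < rmax * (i + 1) / n_rings)].sum()
             for i in range(n_rings)]
    wedges = [power[(angle >= np.pi * i / n_wedges) &
                    (angle < np.pi * (i + 1) / n_wedges)].sum()
              for i in range(n_wedges)]
    total = power.sum()
    return np.array(rings) / total, np.array(wedges) / total

Bloblike textures concentrate energy in one of the ring features; directional textures concentrate energy in one of the wedge features.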

Pentland (1984) computes the discrete Fourier transform for each block of 8 x 8 pixels of an image and determines the power spectrum. He then uses a linear regression technique on the log of the power spectrum as a function of frequency to estimate the fractal dimension D. For gray level intensity surfaces of textured scenes that satisfy the fractal model (Mandelbrot, 1983), the power spectrum satisfies

$$P(f) \propto f^{-2H-2},$$

where the fractal dimension is $D = 3 - H$. Pentland reports a classification accuracy of 84.4% on a texture mosaic using fractal dimensions computed in two orthogonal directions.

Weszka, Dyer, and Rosenfeld (1976) compare the effectiveness of some of these techniques for terrain classification. They conclude that spatial frequency approaches perform significantly poorer than the other approaches. Kirvida (1976) compared the fast Fourier, Hadamard, and Slant transforms for textural features on aerial images of Minnesota. Five classes (hardwood trees, conifers, open space, city, and water) were studied with the use of 8 x 8 subimages. A 74% correct classification rate was obtained by using only spectral information. This rate increased to 98.5% when textural information was also included in the analysis. These researchers reported no significant difference in the classification accuracy as a function of which transform was employed.

Transforms other than the Fourier can be used for texture analysis. The simplest orthogonal transform that can be locally applied is the identity transformation. Lowitz (1983, 1984) and Carlotto (1984) suggest using the local histogram for textural feature extraction. Lowitz uses window sizes as large as 16 x 16; Carlotto, as large as 33 x 33. Such procedures are not invariant under even a monotonic transformation of gray level, and one of the inherent problems, therefore, is in regard to gray level calibration of the image. To compensate for this, probability quantizing can be employed. But the price paid for the invariance of the quantized images under monotonic gray level transformations is the resulting loss of gray level precision in the quantized image.

Textural Energy

In the textural energy approach (Laws, 1980, 1985), the image is first convolved with a variety of kernels. If $I$ is the input image and $g_1, \ldots, g_N$ are the kernels, the images $J_n = I * g_n$, $n = 1, \ldots, N$, are computed. Then each convolved image is processed with a nonlinear operator to determine the total textural energy in each pixel's 7 x 7 neighborhood. The energy image corresponding to the nth kernel is defined by

$$S_n(r,c) = \sum_{i=-3}^{3} \sum_{j=-3}^{3} |J_n(r+i, c+j)|.$$

Associated with each pixel position (r, c), therefore, is a textural feature vector $[S_1(r,c), \ldots, S_N(r,c)]$. The kernels Laws uses have supports for 3 x 3, 5 x 5, and 7 x 7 neighborhoods. The one-dimensional forms are illustrated in Fig. 9.5. The two-dimensional forms are generated from the one-dimensional forms by outer products. That is, if $k_1$ and $k_2$ are two one-dimensional forms, each a row vector of $K$ columns, then $k_1' k_2$ constitutes a $K \times K$ kernel.

Figure 9.5: One-dimensional textural energy kernels. L stands for level, E for edge, S for shape, W for wave, R for ripple, and O for oscillation.

Laws shows that on a set of experiments with some sample textures, the textural energy approach is able to distinguish among eight textures with an identification accuracy of 94%, whereas the spatial gray level co-occurrence approach has an accuracy of only 72%.

Unser and Eden (1989) discuss a multiscale modification of the textural energy approach. They take the textural energy image and iteratively smooth it with Gaussian kernels having a half-octave scale progression. They reduce feature dimensionality by simultaneously diagonalizing the scatter matrices at two successive levels of spatial resolution.

The biggest difficulty with the textural energy approach, as with the textural transform approach, is the possibility of introducing significant errors along boundaries between different textures, because it is exactly for positions by the boundary that the neighborhood support includes a mixture of textures. It is possible for the textural energy vector for a neighborhood of such mixed textures to be close to a vector prototype for a third texture having nothing to do with the textures of the mixture. To handle this problem, Hsiao and Sawchuk (1989) do one more level of processing on each textural energy image. For each pixel position (r, c) of textural energy image $J_n$, they compute the mean and variance of the four 15 x 15 neighborhoods for which (r, c) is the southeast, the southwest, the northeast, and the northwest pixel, respectively. Then they create a smoothed textural energy image $\tilde J_n$ in which $\tilde J_n(r,c)$ is given the value of the mean of the 15 x 15 neighborhood that has the smallest variance.
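A minimal Python sketch of the Laws pipeline just described, using the length-5 one-dimensional kernels and the absolute-value form of the energy operator; the border handling and the particular subset of kernels are our own assumptions:

import numpy as np
from scipy.ndimage import convolve, uniform_filter

# Length-5 Laws vectors: L = level, E = edge, S = shape, R = ripple.
L5 = np.array([1., 4., 6., 4., 1.])
E5 = np.array([-1., -2., 0., 2., 1.])
S5 = np.array([-1., 0., 2., 0., -1.])
R5 = np.array([1., -4., 6., -4., 1.])

def laws_energy_images(image, window=7):
    # Convolve with each outer-product kernel k1' k2, then sum the
    # absolute filter outputs over a window x window neighborhood to
    # get the textural energy images S_n(r, c).
    image = image.astype(float)
    energies = []
    for k1 in (L5, E5, S5, R5):
        for k2 in (L5, E5, S5, R5):
            kern = np.outer(k1, k2)
            j = convolve(image, kern, mode='reflect')
            # uniform_filter gives the window mean; scale to a sum.
            energies.append(uniform_filter(np.abs(j), size=window) * window**2)
    return energies   # per-pixel feature vector [S_1(r,c), ..., S_N(r,c)]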

Unser (1984) notes that one could use a discrete orthogonal transform, such as the discrete sine or discrete cosine transform, applied locally to each pixel's neighborhood instead of using the ad hoc linear operators of Laws. He indicates a classification accuracy above 96% with the discrete sine transform in distinguishing between textures of paper, grass, sand, and raffia. Ikonomopoulos and Unser (1984) suggest local directional filters. Jernigan and D'Astous (1984) compute an FFT on windows and then use the entropy in different-sized regions of the normalized power spectrum for textural features.

Textural Edgeness

The autocorrelation function and digital transforms basically both reference texture to spatial frequency. Rosenfeld and Troy (1970) and Rosenfeld and Thurston (1971) conceive of texture not in terms of spatial frequency but in terms of edgeness per unit area. An edge passing through a pixel can be detected by comparing the values for local properties obtained in pairs of nonoverlapping neighborhoods bordering the pixel. For example, the difference between the two average values in the left and right bordering neighborhoods of pixel x can be used to determine the vertical edge contrast at x. To detect microedges, small neighborhoods can be used; to detect macroedges, large neighborhoods can be used (Fig. 9.6).

Figure 9.6: Using different-sized neighborhoods to determine micro- and macroedges. (a) Microedge. (b) Macroedge.

The local property that Rosenfeld and Thurston suggest is the quick Roberts gradient (the sum of the absolute values of the differences between diagonally opposite neighboring pixels). The Roberts gradient is an estimate of the image gradient. Thus a measure of texture for any subimage can be obtained by computing the Roberts gradient image for the subimage and from it determining the average value of the gradient in the subimage.

Sutton and Hall (1972) extend Rosenfeld and Thurston's idea by making the gradient a function of the distance between the pixels. Thus for every distance d and subimage I defined over neighborhood N, they compute a measure of the form

$$g(d) = \sum_{(i,j) \in N} \big\{ |I(i,j) - I(i+d,j)| + |I(i,j) - I(i-d,j)| + |I(i,j) - I(i,j+d)| + |I(i,j) - I(i,j-d)| \big\}.$$
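Both edgeness measures are easy to state in code. Below is a minimal Python sketch of the average Roberts gradient and of the Sutton-Hall distance-dependent measure g(d); the function names are our own:

import numpy as np

def roberts_edgeness(subimage):
    # Quick Roberts gradient: sum of absolute differences between
    # diagonally opposite neighbors, averaged over the subimage.
    i = subimage.astype(float)
    g = (np.abs(i[:-1, :-1] - i[1:, 1:]) +
         np.abs(i[:-1, 1:] - i[1:, :-1]))
    return g.mean()

def sutton_hall_g(subimage, d):
    # Distance-dependent edgeness g(d): absolute gray level differences
    # between pixels d apart.  Each unordered pair is counted once here;
    # the formula in the text counts both directions, which simply
    # scales g(d) by a factor of two.
    i = subimage.astype(float)
    return (np.abs(i[:, :-d] - i[:, d:]).sum() +
            np.abs(i[:-d, :] - i[d:, :]).sum())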

Sutton and Hall apply this textural measure in a pulmonary disease identification experiment and obtain an identification accuracy in the 80 percentile range for discriminating between normal and abnormal lungs when using a 128 x 128 subimage. The curve of g(d) is like the graph of the negated autocorrelation function translated vertically.

Triendl (1972) measures degree of edgeness by filtering the image with a 3 x 3 averaging filter and a 3 x 3 Laplacian filter. The two resulting filtered images are then smoothed with an 11 x 11 smoothing filter. The two values of average level and roughness obtained from the low- and high-frequency filtered images can be used as textural features.

Hsu (1977) determines textural edgeness by computing gradientlike measures for the gray levels in a neighborhood. If N denotes the set of resolution cells in a neighborhood about a pixel, $g_c$ is the gray level of the center pixel, $\mu$ is the mean gray level in the neighborhood, and $\rho$ is a metric, then Hsu suggests that

$$\sum_{(i,j) \in N} \rho\big(g(i,j),\, g_c\big) \quad \text{and} \quad \sum_{(i,j) \in N} \rho\big(g(i,j),\, \mu\big)$$

are both appropriate measures for the textural edgeness of a pixel.

Vector Dispersion

The vector dispersion estimate for texture was first applied by Harris and Barrett (1978) in a cloud texture assessment study and is based on a theory developed by Fisher (1953). In the vector dispersion technique, the image texture is divided into mutually exclusive neighborhoods, and a sloped plane fit (see Chapter 8) to the gray levels is performed for each neighborhood. That is, using a relative coordinate system whose origin is the center of the neighborhood, one can model the gray levels in the neighborhood with

$$g(i,j) = \alpha i + \beta j + \gamma + \epsilon(i,j).$$

A graphical representation of these fits is shown in Fig. 9.7. The unit normal vector to the plane fit for the ith neighborhood is given by

$$\begin{pmatrix} l_i \\ m_i \\ n_i \end{pmatrix} = \frac{1}{\sqrt{\alpha_i^2 + \beta_i^2 + 1}} \begin{pmatrix} \alpha_i \\ \beta_i \\ -1 \end{pmatrix}.$$

Figure 9.7: Graphical representation of the dispersion of a group of unit surface normal vectors for a patch of gray level intensity surface.

Then the maximum likelihood estimator of the unit vector $(l, m, n)$ around which the unit normal vectors are distributed is given by

$$(l, m, n) = \frac{1}{R}\left( \sum_i l_i,\ \sum_i m_i,\ \sum_i n_i \right), \quad \text{where} \quad R^2 = \left(\sum_i l_i\right)^2 + \left(\sum_i m_i\right)^2 + \left(\sum_i n_i\right)^2.$$

According to Fisher (1953), the distribution of errors over the unit sphere is proportional to $e^{\kappa \cos\theta_i}$, where $\cos\theta_i = l\, l_i + m\, m_i + n\, n_i$. The maximum likelihood estimate for $\kappa$ satisfies

$$\coth \kappa - \frac{1}{\kappa} = \frac{R}{N}$$

and has the approximate solution

$$\hat\kappa = \frac{N-1}{N-R}.$$

For smooth textures the fitted unit normals are nearly aligned, R/N takes a value near 1, and $\kappa$ is large; for uneven or rough textures the normals are dispersed and $\kappa$ takes a value near 0.
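A minimal Python sketch of the vector dispersion computation, fitting a plane to each block by least squares and estimating kappa by Fisher's approximation; the block size is our own choice:

import numpy as np

def vector_dispersion(image, block=8):
    # Fit g(i, j) ~ alpha*i + beta*j + gamma to each block, form the
    # unit normals, and estimate kappa ~ (N - 1)/(N - R).
    rows, cols = image.shape
    i, j = np.indices((block, block)).astype(float)
    i -= i.mean()
    j -= j.mean()                      # block-centered coordinates
    A = np.column_stack([i.ravel(), j.ravel(), np.ones(block * block)])
    normals = []
    for r in range(0, rows - block + 1, block):
        for c in range(0, cols - block + 1, block):
            patch = image[r:r + block, c:c + block].astype(float)
            coef, *_ = np.linalg.lstsq(A, patch.ravel(), rcond=None)
            n = np.array([coef[0], coef[1], -1.0])
            normals.append(n / np.linalg.norm(n))
    normals = np.array(normals)
    N = len(normals)
    R = np.linalg.norm(normals.sum(axis=0))   # resultant length
    return (N - 1.0) / (N - R)                # large kappa: smooth texture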

Relative Extrema Density

Rosenfeld and Troy (1970) suggest the number of extrema per unit area for a texture measure. They suggest defining extrema in one dimension only, along a horizontal scan, in the following way: a pixel i is a relative minimum if its gray level g(i) satisfies

g(i) <= g(i + 1) and g(i) <= g(i - 1)   (9.1)

A pixel i is a relative maximum if

g(i) >= g(i + 1) and g(i) >= g(i - 1)   (9.2)

Note that with this definition each pixel in the interior of any constant gray level run of pixels is considered simultaneously a relative minimum and a relative maximum. This is so even if the constant run is just a plateau on the way down to or on the way up from a relative extremum.

The algorithm employed by Rosenfeld and Troy marks every pixel in each row that satisfies Eq. (9.1) or (9.2). Then it centers a square window around each pixel and counts the number of marked pixels. One problem with simply counting all extrema in the same extrema plateau as extrema is that extrema per unit area as a measure is not sensitive to the difference between a region having a few large plateaus of extrema and a region having many single-pixel extrema. The solution to this problem is to count an extrema plateau only once. This can be achieved by locating some central pixel in the extrema plateau and marking it as the extremum associated with the plateau. Another way of achieving this is to associate a value of 1/N with every extremum in an N-pixel extrema plateau. Then, for any specified window centered on a given pixel, we can add up the values of all the pixels in the window, and the sum is the number of relative extrema per unit area to be associated with the given pixel.

Going beyond the simple counting of relative extrema, we can associate properties with each relative extremum. For example, two properties can be associated with every extremum: its height and its width. The height of a maximum can be defined as the difference between the value of the maximum and the highest adjacent minimum. The height (depth) of a minimum can be defined as the difference between the value of the minimum and the lowest adjacent maximum. The width of a maximum is the distance between its two adjacent minima; the width of a minimum is the distance between its two adjacent maxima. We can mark each pixel in a relative extrema region of size N with the value h, indicating that it is part of a relative extremum having height h, or mark it with the value h/N. Alternatively, we can mark the most centrally located pixel in the relative extrema region with the value h, indicating that it is part of a relative extrema region with the value h; pixels not marked can be given the value 0. The texture image created this way corresponds to a defocused marked image. The sum of the values in a window divided by the window size is then the average height of extrema in the area. Alternatively we could set h to 1, and the sum would be the number of relative extrema per unit area to be associated with the given pixel.

Two-dimensional extrema are more complicated than one-dimensional extrema. One way of finding extrema in the full two-dimensional sense is by the iterated use of some recursive neighborhood operators propagating extrema values in an appropriate way. Maximally connected areas of relative extrema may be areas of single pixels or may be plateaus of many pixels. Given a relative maximum, we can determine the set of all pixels reachable only by the given relative maximum, and not by any other, by monotonically decreasing paths. This set of reachable pixels is a connected region and forms a mountain. Its border pixels may be relative minima or saddle pixels.

Mitchell, Myers, and Boyne (1977) suggest the extrema idea of Rosenfeld and Troy, except that they use true extrema and operate on a smoothed image to eliminate extrema due to noise (Carlton and Mitchell, 1977; Ehrich and Foith, 1976, 1978a, 1978b).
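The one-dimensional marking scheme of Rosenfeld and Troy can be sketched in a few lines of Python; here each pixel satisfying Eq. (9.1) or (9.2) is marked once, and the windowed count is normalized by the window area (the window size is our own choice):

import numpy as np
from scipy.ndimage import uniform_filter

def extrema_density(image, window=15):
    # Mark pixels that are one-dimensional relative minima or maxima
    # along their row, then average the marks over a square window to
    # obtain marked pixels per unit area.
    i = image.astype(float)
    left = np.empty_like(i)
    right = np.empty_like(i)
    left[:, 1:] = i[:, :-1]
    left[:, 0] = i[:, 0]          # replicate the border pixel
    right[:, :-1] = i[:, 1:]
    right[:, -1] = i[:, -1]
    is_min = (i <= left) & (i <= right)
    is_max = (i >= left) & (i >= right)
    marked = (is_min | is_max).astype(float)
    return uniform_filter(marked, size=window)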

The relative height of a mountain is the difference between its relative maximum and the highest of its exterior border pixels. Its size is the number of pixels that constitute it. Tsuji and Tomita (1973) use size. Osman and Saukar (1975) use the mean and variance of the height of a mountain or the depth of a valley as properties of primitives.

The shape of a primitive, a subset of pixels, can be characterized by features such as elongation, circularity, and symmetric axis. Elongation can be defined as the ratio of the larger to the smaller eigenvalue of the 2 x 2 second-moment matrix obtained from the (r, c) coordinates of the border pixels (Bachi, 1973; Frolov, 1975). Circularity can be defined as the ratio of the standard deviation to the mean of the radii from the region's center to its border (Haralick, 1974). The symmetric axis feature can be determined by thinning the region down to its skeleton and counting the number of pixels in the skeleton. For regions that are elongated, it may be important to measure the direction of the elongation or the direction of the symmetric axis. Histograms and statistics of histograms of these primitive properties are all suitable measures for textural primitive properties.

Mathematical Morphology

The morphological approach to the texture analysis of binary images was proposed by Matheron (1975) and Serra and Verchery (1973). The approach requires the definition of a structuring element (i.e., a set of pixels constituting a specific shape, such as a line, a disk, or a square) and the generation of binary images that result from the translation of the structuring element through the image and the erosion of the image by the structuring element. The textural features can be obtained from the new binary images by counting the number of pixels having the value 1. This mathematical morphology approach of Serra and Matheron is the basis of the Leitz texture analyser (TAS) (Mueller and Herman, 1974; Mueller, 1974; Serra, 1974) and the Cyto computer (Sternberg, 1979). A broad spectrum of applications has been found for this quantitative analysis of microstructures method in materials science and biology.

Mathematical morphology was discussed in Chapter 5. Here we review the basic definitions to make this discussion readable by itself. We begin by examining the texture of binary images. We define the translate of the structuring element H to row-column coordinates (r, c) as

$$H(r,c) = \{(i,j) \mid \text{for some } (r_1, c_1) \in H,\ i = r + r_1,\ j = c + c_1\}.$$

The erosion of F by the structuring element H, written $F \ominus H$, is defined as

$$F \ominus H = \{(r,c) \mid H(r,c) \subseteq F\}.$$

The eroded image J obtained by eroding F with structuring element H is a binary image whose pixels take the value 1 for all pixels in $F \ominus H$. Textural properties can be obtained from the erosion process by appropriately parameterizing the structuring element H and determining the number of elements of the erosion as a function of the parameter's value.

For example, a two-pixel structuring element can be parameterized by fixing a row distance and a column distance between its two pixels. The normalized area of the erosion as a function of row and column distance is the autocorrelation function of the binary image. A disk and a one-pixel-wide annulus are two more examples of one-parameter structuring elements; the parameter in both cases is the radius. The area of the eroded image as a function of the parameter provides a statistical descriptor of the image's shape distribution.

The dual operation of erosion is dilation. The dilation of F by structuring element H, written $F \oplus H$, is defined by

$$F \oplus H = \{(m,n) \mid \text{for some } (i,j) \in F \text{ and } (r,s) \in H,\ m = i + r \text{ and } n = j + s\}.$$

Compositions of erosions and dilations determine two other important morphological operations that are idempotent and are duals of one another: openings and closings. The opening of F by H is defined by $F \circ H = (F \ominus H) \oplus H$. The closing of F by H is defined by $F \bullet H = (F \oplus H) \ominus H$.

The number of binary-1 pixels of the opening as a function of the size parameter of the structuring element can determine the size distribution of the grains in an image. We simply take $H_d$ to be a line structuring element of length d or a disk structuring element of diameter d. Then we can define the granularity of the image F by

$$G(d) = 1 - \frac{\#(F \circ H_d)}{\#F},$$

where #F means the number of elements in F. G(d) measures the proportion of pixels participating in grains of a size smaller than d, that is, the proportion of grain pixels that cannot be contained in some translated structuring element of size d that is entirely contained in the grain and contains the given pixel.

Sternberg (1983) has extended the morphological definition of erosion to gray level images. The erosion of gray level image I by gray level structuring element H produces a gray level image J defined by

$$J(r,c) = \min_{(i,j)} \{ I(r+i, c+j) - H(i,j) \} = (I \ominus H)(r,c).$$

The dilation of gray level image I by gray level structuring element H produces a gray level image J defined by

$$J(r,c) = \max_{(i,j)} \{ I(r-i, c-j) + H(i,j) \} = (I \oplus H)(r,c).$$

The gray level opening is defined as a gray level erosion followed by a gray level dilation; the gray level closing is defined as a gray level dilation followed by a gray level erosion. Commonly used gray level structuring elements include rods, disks, cones, paraboloids, and hemispheres.
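A minimal Python sketch of the granularity G(d) using disk openings; the discrete disk construction is our own approximation:

import numpy as np
from scipy.ndimage import binary_opening

def disk(diameter):
    # Discrete disk structuring element of the given diameter.
    r = diameter / 2.0
    y, x = np.indices((diameter, diameter)) - (diameter - 1) / 2.0
    return x**2 + y**2 <= r**2

def granularity(F, diameters):
    # G(d) = 1 - #(F o H_d)/#F : the fraction of binary-1 pixels that
    # participate only in grains of size smaller than d.
    nF = float(np.count_nonzero(F))
    return [1.0 - np.count_nonzero(binary_opening(F, disk(d))) / nF
            for d in diameters]

Plotting G(d) against d gives the size distribution of the grains in the binary image.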

Peleg et al. (1984) use gray level erosion and dilation to determine the fractal surface of the gray level intensity surface of a textural scene. They define the scale-k volume of the blanket around a gray level intensity surface I by

$$V(k) = \sum_{(r,c)} \big[ (I \oplus_k H)(r,c) - (I \ominus_k H)(r,c) \big],$$

where $\oplus_k$ means a k-fold dilation with the structuring element H and $\ominus_k$ means a k-fold erosion with the structuring element H. The fractal surface area A at scale k is then defined by

$$A(k) = \frac{V(k)}{2k},$$

and the fractal signature S at scale k is defined by

$$S(k) = \frac{dA(k)}{d \log k}.$$

Peleg et al. compare the similarity between textures by a weighted distance D between their fractal signatures. Werman and Peleg (1984) give a fuzzy set generalization to the morphological operators.

The importance of this approach to texture analysis is that properties obtained by the application of operators in mathematical morphology can be related to physical three-dimensional shape properties of the materials imaged. Meyer (1979) and Lipkin and Lipkin (1974) demonstrate the capability of morphological textural parameters in biomedical image analysis. Theoretical properties of the erosion operator, as well as those of other operators, are presented by Matheron (1963), Serra (1978), and Lantuejoul (1978).
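A minimal Python sketch of the blanket computation, following a Peleg-style recurrence for the upper and lower surfaces; the cross-shaped neighborhood and the discrete derivative used for S(k) are our own implementation choices:

import numpy as np
from scipy.ndimage import grey_dilation, grey_erosion

def fractal_signature(image, kmax=10):
    # Upper blanket u_k = max(u_{k-1} + 1, neighborhood max of u_{k-1});
    # lower blanket l_k = min(l_{k-1} - 1, neighborhood min of l_{k-1}).
    neigh = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], bool)  # 4-neighbors
    upper = image.astype(float)
    lower = image.astype(float)
    areas = []
    for k in range(1, kmax + 1):
        upper = np.maximum(upper + 1, grey_dilation(upper, footprint=neigh))
        lower = np.minimum(lower - 1, grey_erosion(lower, footprint=neigh))
        volume = (upper - lower).sum()      # scale-k blanket volume V(k)
        areas.append(volume / (2.0 * k))    # fractal surface area A(k)
    k = np.arange(1, kmax + 1)
    signature = np.gradient(np.array(areas), np.log(k))   # S(k) = dA/dlog k
    return np.array(areas), signature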

figure 9. from a randomly generated noise image and a given starting sequence for the first-order D neighborhood in the image. . j ) consisting of pixels above or to the left of it as opposed to the simple sequence of the previous pixels a raster scan could define. 9.l) must be previous to pixel (i. . For each pixel (k. all values in a texture image can be synthesized by a two-dimensional autoregressivemodel.D I f < j ) ) D Pixels Figure 9. their performance would be poorer on diagonal wiggly streaky textures. synthesized values plus a linear combination of previous L random-noise values. from a randomly generated noise image and a given starting sequence 01. Although the one-dirhensjonal model employed by Read and Jayaramarnurthy (1972) worked reasonably well for the two vertical streaky textures on which they illustrated their method. . (k.D L f l j + D ) or ( k = i a n d j .l) in an order-D neighborhood for pixel (i. as illustrated in Fig. Here a pixel (i. j ) in a standard raster sequence.j)={(k.aK representing the initial boundary conditions. . all values in a texture image can be synthesized by a one-dimensional autoregressive model. Formally the order-D neighborhood is defined by N(i. j ) depends on a two-dimensional neighborhood N(i.D < k < i and j .l) ) (i .8 How.9 How.j).. j).9. and (k.l) dust not have any coordinates more than D units away from ( i . Better performance on general textures would be achieved by a full two-dimensional model. The coefficients of these linear combinations are the parameters of the model.

Define the estimated value of the gray level at resolution cell (i, j) by

$$\hat a(i,j) = \sum_{(k,l) \in N(i,j)} h(k,l)\, a(k,l) + \sum_{(k,l) \in N(i,j)} g(k,l)\, \big[ a(k,l) - \hat a(k,l) \big],$$

where the first summation is called the autoregressive term and the second is called the moving-average term. See Fig. 9.10.

Figure 9.10: How a gray level value for pixel (i, j) can be estimated by using the gray level values in the neighborhood N(i, j) and the differences between the actual values and the estimated values in the neighborhood.

The power of the autoregression linear estimator approach is that it is easy to use the estimator in a mode that synthesizes textures from any initially given linear estimator. In this sense the autoregressive approach is sufficient to capture everything about a texture. Its weakness is that the textures it can characterize are likely to consist mostly of microtextures.

The autoregressive model can be employed in texture segmentation applications as well as texture synthesis applications. Let $\hat a_c(i,j)$ denote the estimate produced by the coefficients of texture category c, and let $\theta$ be a threshold value. We can decide that pixel (i, j) has texture category k if

$$|a(i,j) - \hat a_k(i,j)| \le |a(i,j) - \hat a_l(i,j)| \text{ for every } l \quad \text{and} \quad |a(i,j) - \hat a_k(i,j)| \le \theta.$$

If $|a(i,j) - \hat a_k(i,j)| > \theta$ for every category k, we can decide that (i, j) is a boundary pixel.
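A minimal Python sketch of least-squares estimation of the autoregressive coefficients over an order-D neighborhood; only the autoregressive term is fit here, with the moving-average term omitted for brevity:

import numpy as np

def ar_neighborhood(D):
    # Offsets (di, dj) of the order-D neighborhood: pixels in the D rows
    # above, plus pixels to the left in the same row.
    return [(di, dj) for di in range(-D, 1) for dj in range(-D, D + 1)
            if di < 0 or (di == 0 and dj < 0)]

def fit_ar(image, D=2):
    # Least-squares estimate of the coefficients h(di, dj) that predict
    # each pixel from its order-D neighborhood.
    i = image.astype(float)
    offs = ar_neighborhood(D)
    rows, cols = i.shape
    X = np.column_stack(
        [i[D + di: rows - D + di, D + dj: cols - D + dj].ravel()
         for (di, dj) in offs])
    y = i[D: rows - D, D: cols - D].ravel()
    h, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ h
    return offs, h, resid.std()

Run in the synthesis direction, the fitted coefficients, with the residual standard deviation driving a noise source, generate texture in raster order as in Figs. 9.8 and 9.9, and the category decision rule above uses the same coefficients for segmentation.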

Discrete Markov Random Fields

The Markov random field model for texture assumes that the texture field is stochastic and stationary and satisfies a conditional independence assumption. Let R x C be the spatial domain of an image, and for any (r, c) in R x C let N(r, c) denote the neighbors of (r, c); N(r, c) does not include (r, c). The conditional independence assumption is that the conditional probability of the pixel given all the remaining pixels in the image is equal to the conditional probability of the pixel given just the pixels in its neighborhood. Because the field is stationary, N(r, c) is a translation of N(0, 0): (a, b) is in N(r, c) if and only if (a + i, b + j) is in N(r + i, c + j) for any (i, j). Hence the spatial neighborhood configuration is the same all over the image. There is an obvious difficulty with this condition holding at pixels near the image boundary. The usual way of handling the problem theoretically is to assume the image is wrapped around a torus.

Markov mesh models were first introduced into the pattern recognition community by Chow (1962) and then by Abend, Harley, and Kanal (1965). In this case the canonical spatial neighborhood N(0, 0) represents a domain that contains only pixels occurring before (0, 0) in the usual top-down raster scan order of an image, so that N(i, j) contains only pixels occurring before pixel (i, j) in the raster scan order.

Woods (1972) shows that when the distributions are Gaussian, the discrete Gauss-Markov field can be written as an equation in which each pixel's value is a linear combination of the values in its neighborhood plus a correlated noise term. There the relationship would be expressed by

$$I(r,c) = \sum_{(i,j) \in K(0,0)} h(i,j)\, I(r - i, c - j) + u(r,c),$$

where the set $\{u(r,c) \mid (r,c) \in R \times C\}$ represents a joint set of possibly correlated Gaussian random variables. When K(0, 0) contains pixels occurring both before and after (0, 0) in the raster scan order, the model is called a simultaneous autoregressive model.

One important issue is how to compute the joint probability function $P[I(r,c) : (r,c) \in R \times C]$. Hassner and Sklansky (1980) note that this can be done by identifying the conditional probability assumption with Gibbs ensembles, which are studied in statistical mechanics.

To determine the coefficients h(i, j), we can proceed by least squares.

Define

$$E^2 = \sum_{(r,c)} \Big[ I(r,c) - \sum_{(i,j) \in K} h(i,j)\, I(r-i, c-j) \Big]^2.$$

Now take the partial derivatives of $E^2$ with respect to h(m, n) and set these partial derivatives to zero. There results the system of equations

$$\sum_{(i,j) \in K} h(i,j) \sum_{(r,c)} I(r-i, c-j)\, I(r-m, c-n) = \sum_{(r,c)} I(r,c)\, I(r-m, c-n), \quad (m,n) \in K. \tag{9.4}$$

The linear system of Eq. (9.4) can be solved for the coefficients h(i, j). Texture classification can then be done by comparing a computed set of coefficients from an observed image with the prototypical sets of coefficients from the different texture classes. Issues concerning the estimation of h from texture samples can be found in Kashyap and Chellappa (1981). Related papers include Delp, Kashyap, and Mitchell (1979), Tou (1980), Therrien (1980), Chen (1980), Faugeras (1980), and Jau, Chin, and Weinman (1984). De Souza (1982) develops a chi-square test to discriminate microtextures described by autoregressive models.

Besag (1974) and Cross and Jain (1983) discuss the autobinomial form for a discrete Markov random field. Here the probability that a pixel takes a particular gray level has a binomial form, with the binomial parameter depending on the values of the neighboring gray levels:

$$P[I(r,c) = k \mid I(i,j),\ (i,j) \in N(r,c)] = \binom{K}{k} \theta^k (1-\theta)^{K-k},$$

where $\theta$ depends on the linear combination $a + \sum_{(i,j) \in N(r,c)} b(i,j)\, I(i,j)$ of the neighboring gray levels. The texture parameters here are a and b(i, j), for (i, j) in N(r, c).

It is apparent that the discrete Markov random field model is a generalization of the time-series autoregressive moving-average models that were initially explored for image texture analysis by McCormick and Jayaramamurthy (1974), Tou and Chang (1976), Tou, Kao, and Chang (1976), and Deguchi and Morishita (1978). Pratt, Faugeras, and Gagalowicz (1978) and Faugeras and Pratt (1980) consider only the autoregressive term with independent noise and rewrite the autoregressive equation as

$$I(r,c) - \sum_{(i,j)} h(i,j)\, I(r-i, c-j) = u(r,c),$$

where $\{u(r,c) \mid (r,c) \in R \times C\}$ represents independent random variables, not necessarily Gaussian. The left-hand side represents a convolution that decorrelates the image. Faugeras and Pratt characterize the texture by the mean, variance, skewness, and kurtosis of the decorrelated image, which is obtained either by estimating h or by using a given gradient or Laplacianlike operator to perform the decorrelation.
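A minimal Python sketch of Faugeras-Pratt style decorrelation features, using a fixed Laplacianlike operator as a stand-in for an estimated h; the particular 3 x 3 kernel is our own choice:

import numpy as np
from scipy.ndimage import convolve

def decorrelation_moments(image):
    # Decorrelate with a Laplacianlike operator, then take the mean,
    # variance, skewness, and kurtosis of the decorrelated image.
    lap = np.array([[0., -1., 0.], [-1., 4., -1.], [0., -1., 0.]])
    d = convolve(image.astype(float), lap, mode='reflect')
    mu = d.mean()
    sigma2 = d.var()
    skew = ((d - mu) ** 3).mean() / sigma2 ** 1.5
    kurt = ((d - mu) ** 4).mean() / sigma2 ** 2
    return mu, sigma2, skew, kurt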

Random Mosaic Models

The random mosaic models are constructed in two steps. The first step provides a means of tessellating a plane into cells, and the second step assigns a property value to each cell. In the Poisson line model (Miles, 1969), a Poisson process of intensity lambda/pi determines ordered pairs (theta, rho) in [0, pi] x (-inf, inf). Each ordered pair (theta, rho) is associated with a line x cos(theta) + y sin(theta) = rho. The lines generated tessellate a finite plane region into convex cells whose boundaries consist of line segments from the lines in the random set. In the occupancy model (Miles, 1970), a tessellation is produced by a Poisson process of intensity lambda, which plants points in the plane. Each point determines a cell that consists of all points in the plane closest to the given planted point. In the Delaunay model, a line segment is drawn between each pair of planted points whose corresponding cells in the occupancy model share a common border segment.

Table 9.1 shows the expected value of the area, the perimeter length, and the number of sides to a convex cell for each of the processes.

Table 9.1: Expected value of the area A, perimeter length S, and number of sides N for the line model, the occupancy model, and the Delaunay model.

    Expected Value    Line Model    Occupancy Model    Delaunay Model
    E[A]              pi/lambda^2   1/lambda           1/(2 lambda)
    E[S]              2 pi/lambda   4/sqrt(lambda)     32/(3 pi sqrt(lambda))
    E[N]              4             6                  3

Other models include the Johnson-Mehl model (Gilbert, 1962) and the bombing model (Switzer, 1967). Schacter, Rosenfeld, and Davis (1978) and Schacter and Ahuja (1979) derive the statistical properties of these random mosaic models. Ahuja, Dubitzki, and Rosenfeld (1980) compare properties of synthetically generated textures with their theoretical values. Modestino, Fries, and Vickers (1980, 1981) compute the power spectral density function for a plane tessellated by a random line process and in which the gray levels of one cell have a Markov dependence on the gray levels of the cells around them. They give a maximum likelihood texture discriminant for this mosaic model and illustrate its use on some sample images. Therrien (1983) uses an autoregressive model for each cell and, like Modestino, Fries, and Vickers, superimposes a Markov random field to describe transitions between cells. Schacter (1980b) summarizes how texture characteristics are related to the texture's variogram and correlation function.

Structural Approaches to Texture Models

Pure structural models of texture are based on the view that textures are made up of primitives that appear in nearly regular repetitive spatial arrangements. To describe the texture, we must describe the primitives and the placement rules (Rosenfeld and Lipkin, 1970). Carlucci (1972) suggests a texture model using primitives of line segments, open polygons, and closed polygons in which the placement rules are given syntactically in a graphlike language. Zucker (1976a, 1976b) conceives of real texture as being a distortion of an ideal texture. The underlying ideal texture has a nice representation as a regular graph in which each node is connected to its neighbors in an identical fashion. Each node corresponds to a cell in a tessellation of the plane. The underlying ideal texture is transformed by distorting the primitive at each node to make a realistic texture. Zucker's model is more a competence-based model than a performance-based model.

Lu and Fu (1978) give a tree-grammar syntactic approach for texture. They divide a texture up into small square windows (9 x 9). The spatial structure of the resolution cells in the window is expressed as a tree. The assignment of gray levels to the resolution cells is given by the rules of a stochastic tree grammar. Finally, special care is given to the placement of windows with respect to one another in order to preserve the coherence among windows. Lu and Fu illustrate the power of their technique with both texture synthesis and texture classification experiments. Other work with structural approaches to texture includes Leu and Wee (1985).

Texture Segmentation

Most work in image texture analysis has been devoted to textural feature analysis of an entire image. It is apparent, however, that an image is not necessarily homogeneously textured. An important image-processing operation, therefore, is the segmentation of an image into regions, each of which is differently textured. The constraint is that each region has a homogeneous texture, such as that arising from a frontal view, and that each pair of adjacent regions is differently textured.

Bajcsy (1973a, 1973b) is one of the first researchers to do texture segmentations for outdoor scenes. Her algorithm merges small, nearly connected regions having similar local texture or color descriptors. For texture descriptors she uses Fourier transform features. The descriptors for each region include an indication of whether the texture is isotropic or directional. If the texture is considered directional, then the description includes the orientation, the size of the texture element, and the separation between texture elements.

Chen and Pavlidis (1979) use the split-and-merge algorithm on the co-occurrence matrices of the regions as the basis for merging. Let the four $2^{N-1} \times 2^{N-1}$ windows in a $2^N \times 2^N$ window have $C_{NE}$, $C_{NW}$, $C_{SE}$, and $C_{SW}$ as their respective co-occurrence matrices.

Then, with only little error, the co-occurrence matrix C of the $2^N \times 2^N$ window can be computed by

$$C(i,j) = \tfrac{1}{4}\big[ C_{NE}(i,j) + C_{NW}(i,j) + C_{SE}(i,j) + C_{SW}(i,j) \big].$$

Experiments by Hong, Wu, and Rosenfeld (1980) indicate that the error of this computation is minimal. The $2^N \times 2^N$ window is declared to be uniformly textured if, for the user-specified threshold T,

$$E\big[ \max\{C_{NE}(i,j), C_{NW}(i,j), C_{SE}(i,j), C_{SW}(i,j)\} - \min\{C_{NE}(i,j), C_{NW}(i,j), C_{SE}(i,j), C_{SW}(i,j)\} \big] < T.$$

Using this criterion, Chen and Pavlidis begin the merging process using 16 x 16 windows. Any 16 x 16 window not merged is split into four 8 x 8 windows, and the splitting continues until the window size is 4 x 4. The gray levels of the images are quantized to eight levels. Chen and Pavlidis (1983) use a similar split-and-merge algorithm, with the correlation coefficients between vertically adjacent and horizontally adjacent pixels as the feature vectors.

Conners, Trivedi, and Harlow (1984) use six features from the co-occurrence matrix to segment an aerial urban scene into nine classes: residential, commercial/industrial, mobile home, multilane highway, water, dry land, runway/taxiway, aircraft parking, and vehicle parking. They initially segmented the image into regions. Any region whose likelihood ratio for its highest-likelihood class against each other class was high enough was considered to be uniformly textured and assigned to the highest-likelihood class. Any region whose likelihood ratio for its highest-likelihood class against any other class was too low was considered a boundary region and split. Their work is important because it integrates the splitting idea of Chen and Pavlidis into a classification setting.

Kashyap and Khotanzad (1984) use a simultaneous autoregressive and circular autoregressive model for each 3 x 3 neighborhood of an image. Here each neighborhood produces a feature vector associated with the model. The set of feature vectors generated from the image is clustered, and each pixel is labeled with the cluster label of the feature vector associated with its 3 x 3 neighborhood. Pixels associated with outlier feature vectors are given the cluster label of the majority of their labeled neighbors.

Therrien (1983) uses an autoregressive model for each textured region and superimposes a Markov random field to describe the transitions of one region to another. He uses maximum likelihood and maximum a posteriori estimation techniques to achieve a high-quality segmentation of aerial imagery. Modestino, Fries, and Vickers (1981) use a Poisson line process to partition the plane and assign gray levels to each region by a Gauss-Markov model using adjacent regions. They develop a maximum-likelihood estimator for the parameters of the process and show segmentation results on artificially generated images having three different texture types.
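A minimal Python sketch of the Chen-Pavlidis uniformity test on a window quantized to eight gray levels; the single displacement (0, 1) and the interpretation of the expectation as a mean over matrix entries are our own simplifications:

import numpy as np

def cooccurrence(window, levels=8):
    # Normalized co-occurrence matrix for the displacement (0, 1) on a
    # window whose values are integers in [0, levels).
    C = np.zeros((levels, levels))
    a = window[:, :-1].ravel()
    b = window[:, 1:].ravel()
    np.add.at(C, (a, b), 1)
    return C / C.sum()

def is_uniform(window, T):
    # A 2^N x 2^N window is declared uniformly textured if the four
    # quadrant co-occurrence matrices agree: the mean over (i, j) of
    # [max quadrant value - min quadrant value] is below T.
    h, w = window.shape[0] // 2, window.shape[1] // 2
    quads = np.stack([cooccurrence(q) for q in
                      (window[:h, :w], window[:h, w:],
                       window[h:, :w], window[h:, w:])])
    return (quads.max(axis=0) - quads.min(axis=0)).mean() < T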

Synthetic Texture Image Generation

A variety of approaches have been developed for the generation of synthetic texture images. Rather than giving a detailed description of each, we will provide a brief guide to some of the representative papers in the literature. McCormick and Jayaramamurthy (1974) use a time-series model for texture synthesis, as do Tou, Kao, and Chang (1976). Yokoyama and Haralick (1978) use a structured-growth model to synthesize a more complex image texture. Yokoyama and Haralick (1979) describe a technique that uses a Markov chain method. Pratt, Faugeras, and Gagalowicz (1978) develop a set of techniques for generating textures with identical means, variances, and autocorrelation functions but different higher-order moments. Gagalowicz (1981) gives a technique for generating binary texture fields with prescribed second-order statistics. Chellappa and Kashyap (1981) describe a technique for the generation of images having a given Gauss-Markov random field. Schacter (1980a) uses a long-crested wave model. Monne, Schmitt, and Massaloux (1981) use an interlaced vertical and horizontal Markov chain method to generate a texture image. Garber and Sawchuk (1981) use a best-fit model instead of the Nth-order transition probabilities to make a good simulation of texture without exceeding computer memory limits on storing Nth-order probability functions. Schmitt et al. (1984) add vector quantization to the bidimensional Markov technique of Monne, Schmitt, and Massaloux (1981) to improve the appearance of the texture image. Gagalowicz (1984) describes a texture synthesis technique that produces textures as they would appear on perspective projection images of three-dimensional surfaces. Ma and Gagalowicz (1984) describe a technique to synthesize artificial textures in parallel from a compressed-data set and retain good visual similarity to natural textures.

Shape from Texture

Image texture gradients on oblique photography can be used to estimate the surface orientation of the observed three-dimensional object. The first work of this kind was done by Carel, Purdy, and Lulow (1961) and Charton and Ferris (1965). They produced a conceptual design of a system called VISILOG, which could direct a freely moving vehicle through an undetermined environment. One important kind of guidance information needed by such a vehicle is the surface orientation of the surface over which the vehicle is moving. The basis of the design was an analysis that related surface slant to the texture gradient in the perspective projection image. They measured the number of texture elements in a line by measuring the number of changes in brightness along the line; the number of changes in brightness was the number of relative extrema. The number of texture elements could be measured along two parallel line segments perpendicular to the view direction and two parallel line segments parallel to the view direction. Bajcsy and Lieberman (1976) used a Fourier transform to extract texture gradients, which were then related to three-dimensional distance. The techniques developed so far assume that the observed texture area has no depth changes and no texture changes within the observed area. They also do not take into account the possibility of subtextures.

. r ] into n equal intervals.t).lkl S T o The point ( x .sin2(s) They also gave a modified version of the two-dimensional Newton method for determining the (s. i = 1. i = 1. The edge direction is the gradient direction. assuming isotropic distribution of edge directions for frontally viewed textures.Ty)in the transformed space where All lines that go through the point P map to a circle whose center is P/2 and whose radius is JI P (1 12. assuming parallel edges in frontally viewed textures. t) as proportional to r=. sin2(V)t ) ] [I . in the The way Kender makes use of this transform for the estimation of the normal to a plane in three-dimensional space is as follows: Consider that the plane is densely populated with texels that are just edge segments (edgels). From the perspective y) geometry (see Chapter 13). it is perpendicular to the direction along the line segment. . P. given the observed k(i). . Kender gives a second transform defined by sin(s) cos"(s) Here a l l lines that go through the point P are mapped to a line that goes through the point fP/JIPJJ2 orthogonal direction to P.@. . and measured the number k(i) of tangent directions that fell in the ith interval.) has coordinates T' = (Tx. the perspective projection (u. and (a.. and Dunn (1983) indicated some mistakes in the Witkin paper and gave the joint a posteriori probability of (s. A unit length edge direction E' = (Ex.n. z) satisfies . Each has a position and orientation. .Ey) at position P' = (P. who described an aggregation Hough-related transform that groups together edge directions associated with the same vanishing point.484 Texture equations for the slant and tilt angles of a planar surface under orthographic projection by measuring the distribution of tangent directions of zero-crossing contours. . The slant angle s and the tilt angle t of the observed surface were estimated to be that pair of values maximizing the a posteriori probability of (s. z) can be thought of as the center position of the edgel on the threedimensional plane.y. It is perpendicular to the direction along an edge boundary.n. Davis.. Janos. In the case of a line segment.l)r/n. the ith interval being [(i . Witkin divided the tangent angle interval [0. Each such edgel on the plane in three-dimensional space can be represented as a set of points satisfying (i) (!) +k for some small range of k.y.ir/n]. t) achieving the maximization. . v) of any point ( x . Other work that relates to surface orientation recovery from texture includes that of Kender (1979). are its direction cosines.

$$u = \frac{fx}{z}, \qquad v = \frac{fy}{z},$$

where f is the distance the image projection plane is in front of the center of perspectivity. The perspective projection of the set of points on the edgel in the plane in three-dimensional space therefore satisfies

$$(u, v) = \left( \frac{f(x_0 + ka)}{z_0 + kc},\ \frac{f(y_0 + kb)}{z_0 + kc} \right), \quad |k| \le T_0.$$

This set of perspective projection points constitutes a short line segment, an edgel, on the perspective projection image.

The transform of each such image edgel is a point (p, q) that lies along an unknown line whose coefficients are directly related to the direction cosines of the three-dimensional line. That is, if the texel edgels are densely spaced on the plane in three-dimensional space, then the transforms of all the edgels having the ith direction cosines $(a_i, b_i, c_i)$ will densely populate the line

$$L_i : a_i p + b_i q - c_i = 0,$$

regardless of their three-dimensional position. And if the edgels on the three-dimensional plane have a variety of different directions, the transformed points associated with each direction will densely populate the corresponding line $L_i$. Notice now that these lines $L_i$ must intersect if there is more than one variety of directions to the three-dimensional edgels. Hence around the intersection point there must be a higher concentration or density of transformed points, and by this higher concentration the intersection point $(p_0, q_0)$ can be identified.

If there is more than one variety of directions to the three-dimensional edgels, then because the normal to the plane must be perpendicular to the direction cosines of all the lines the plane contains, $(p_0, q_0, -1)$ must be in the direction of the normal to the three-dimensional plane, since $(p_0, q_0, -1)$ satisfies $a_i p_0 + b_i q_0 - c_i = 0$ for all i, as can be seen from the definition of $L_i$.

Kanatani and Chou (1986) assume that textural density can be measured and discuss a differential geometry approach to relate the measured density of texture primitives on a perspective projection image to the surface normal of the planar or curved surface. Their argument leads to a nonlinear set of equations. Motivated by their idea, we give a simple geometric derivation of a procedure to recover a planar surface normal from the density of texture primitives. We assume that the density of texture primitives, that is, the number of texture primitives per unit area, on the planar surface is constant and does not vary with position on the surface. We select a neighborhood on the perspective projection image of the textural surface and count the number of texture primitives in the neighborhood. The neighborhood size might be, for example, 25 x 25. A relationship exists between the measured density of texture primitives in a neighborhood at a given position and the surface normal. To develop this relationship, we use the concept of solid angle, a complete discussion of which can be found in Chapter 12.

Let the unknown plane on which the textural surface is observed satisfy

$$Ax + By + Cz + D = 0, \tag{9.5}$$

where $A^2 + B^2 + C^2 = 1$. Let $\Omega$ be the solid angle formed by the neighborhood and the center of perspectivity, and let the position of the center of the neighborhood be given by (u, v). From the perspective geometry (see Chapter 13), for any three-dimensional point (x, y, z), its perspective projection (u, v) satisfies

$$u = \frac{fx}{z}, \qquad v = \frac{fy}{z},$$

where f is the distance from the center of perspectivity to the image projection plane. Hence

$$x = \frac{uz}{f} \quad \text{and} \quad y = \frac{vz}{f}. \tag{9.6}$$

Substituting the expressions of Eq. (9.6) into Eq. (9.5) and solving for z results in

$$z = \frac{-Df}{Au + Bv + Cf}. \tag{9.7}$$

Let d be the distance between the center of perspectivity and the point determined by the intersection of the plane $Ax + By + Cz + D = 0$ with the ray $\lambda(u, v, f)$. This point is the three-dimensional point whose perspective projection is (u, v). Then $d^2 = x^2 + y^2 + z^2$.

After substituting Eq. (9.7), there results

$$d = \frac{|D|\sqrt{u^2 + v^2 + f^2}}{|Au + Bv + Cf|} = \frac{|D|}{n'\xi},$$

where

$$n = \begin{pmatrix} A \\ B \\ C \end{pmatrix} \quad \text{and} \quad \xi = \frac{1}{\sqrt{u^2 + v^2 + f^2}} \begin{pmatrix} u \\ v \\ f \end{pmatrix}$$

are both unit length vectors. We have, without loss of generality, the freedom to choose the sign of n so that $n'\xi$ is positive.

Now the area orthogonal to the ray $\lambda(u, v, f)$ formed by the solid angle $\Omega$ at a distance d is $\Omega d^2$. The area on the plane $Ax + By + Cz + D = 0$ that the solid angle makes is then

$$\frac{\Omega d^2}{n'\xi} = \frac{\Omega D^2}{(n'\xi)^3}.$$

Let the density of texture primitives on the plane $Ax + By + Cz + D = 0$ be k primitives per unit area. The total number of primitives in the area $\Omega D^2/(n'\xi)^3$ is then $k\Omega D^2/(n'\xi)^3$. Arguing in a similar manner, we obtain that the area on the image formed by the solid angle $\Omega$ is

$$\frac{\Omega (u^2 + v^2 + f^2)^{3/2}}{f}.$$

The density $\gamma$ of texture primitives in this area is then

$$\gamma = \frac{k\Omega D^2/(n'\xi)^3}{\Omega (u^2 + v^2 + f^2)^{3/2}/f} = \frac{kD^2 f}{(n'\xi)^3 (u^2 + v^2 + f^2)^{3/2}}. \tag{9.8}$$

Hence, after taking cube roots, we can rewrite Eq. (9.8) in the form

$$\gamma^{1/3}\,(u, v, f)\, n = \lambda_0, \tag{9.9}$$

where

$$\lambda_0 = (kD^2 f)^{1/3}.$$

This is the desired relationship between the primitive density $\gamma$ at position (u, v) and the surface normal n of the unknown plane.

Suppose that the measurement of the density of texture primitives is made at N different neighborhoods, where $\gamma_i$ is the measured density of the ith neighborhood and $(u_i, v_i)$ is the center position of the ith neighborhood. Then there result the N equations

$$\phi_i' n = \lambda_0 b_i, \qquad i = 1, \ldots, N, \tag{9.10}$$

where $\phi_i' = \gamma_i^{1/3}(u_i, v_i, f)$ and $b_i = 1$. Putting Eq. (9.10) in matrix form results in

$$\Phi n = \lambda_0 b, \tag{9.11}$$

where

$$\Phi = \begin{pmatrix} \phi_1' \\ \vdots \\ \phi_N' \end{pmatrix} \quad \text{and} \quad b = \begin{pmatrix} 1 \\ \vdots \\ 1 \end{pmatrix}.$$

The system of Eq. (9.11) is overconstrained. The normal n is not known and must satisfy $n'n = 1$, and the parameter $\lambda_0$ is not known. We will find the n that maximizes the cosine of the angle between $\Phi n$ and $\lambda_0 b$ subject to the constraint $n'n = 1$. An n maximizes the cosine of the angle between $\Phi n$ and $\lambda_0 b$ if and only if n maximizes $(\Phi n)'b$. Determining the n that maximizes $(\Phi n)'b$ is easy. Let

$$r = (\Phi n)'b + \eta (n'n - 1).$$

Taking partial derivatives of r with respect to n and $\eta$ results in

$$\frac{\partial r}{\partial n} = \Phi' b + 2\eta n, \qquad \frac{\partial r}{\partial \eta} = n'n - 1.$$

Setting these partial derivatives to zero and solving yields

$$n = \frac{\Phi' b}{\|\Phi' b\|}.$$
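A minimal Python sketch of this constant-density estimator; the function and argument names are our own:

import numpy as np

def normal_from_density(positions, densities, f):
    # positions: (N, 2) array of neighborhood centers (u_i, v_i).
    # densities: (N,) array of measured densities gamma_i.
    # Each row of Phi is gamma_i^(1/3) * (u_i, v_i, f); the closed-form
    # maximizer derived above is Phi' b normalized, with b all ones.
    u, v = positions[:, 0], positions[:, 1]
    phi = (densities ** (1.0 / 3.0))[:, None] * \
          np.column_stack([u, v, np.full(len(u), f)])
    m = phi.T @ np.ones(len(u))
    return m / np.linalg.norm(m)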

Ohta, Maenobu, and Sakai (1981) assume that there are effective procedures for finding texture primitives, that the texture primitives can be divided into types distinguishable from one another on the perspective projection image, and that the area of each instance of a textural primitive on the observed surface is the same. They do not require that the density of texture primitives be constant on the observed surface. They proceed by approximating the perspective projection with an affine transformation and developing the required relationships from the affine transformation. We will develop the analogous relationships in a more exact form using the reasoning we have already given with solid angles.

Let $A_P$ be the area of each of the primitives on the textural plane. Let $A_I$ be the area of the perspective projection of a primitive whose image is centered at (u, v), and let $\Omega$ be the solid angle formed by this area with respect to the center of perspectivity. Then, as we have already seen,

$$A_I = \frac{\Omega (u^2 + v^2 + f^2)^{3/2}}{f} \tag{9.12}$$

and

$$A_P = \frac{\Omega D^2}{(n'\xi)^3}. \tag{9.13}$$

Of course the area $A_I$ depends on (u, v); both $\Omega$ and $\xi$ are functions of the position of the primitive on the perspective projection image. From Eqs. (9.12) and (9.13) there follows

$$\frac{A_I}{A_P} = \frac{\big[(u, v, f)\, n\big]^3}{f D^2}. \tag{9.14}$$

Next consider the geometry illustrated in Fig. 9.11. Consider the situation for two texture primitives having identical areas on the textured plane $Ax + By + Cz + D = 0$, whose perspective projections are centered at $(u_1, v_1)$ and $(u_2, v_2)$ and have areas $A_{I1}$ and $A_{I2}$, respectively. Draw the line between $(u_1, v_1)$ and $(u_2, v_2)$ and extend it until it intersects the line $Au + Bv + Cf = 0$ on the image plane. Let $f_1$ be the distance from $(u_1, v_1)$ along the extended line to the line $Au + Bv + Cf = 0$, and let $f_2$ be the distance from $(u_2, v_2)$ along the extended line to the line $Au + Bv + Cf = 0$. Let g be the distance between $(u_1, v_1)$ and $(u_2, v_2)$. Then $f_1 = g + f_2$.

From similar triangles (Fig. 9.11), the perpendicular distances of $(u_1, v_1)$ and $(u_2, v_2)$ to the line $Au + Bv + Cf = 0$ are in the ratio $f_1/f_2$, so that by Eq. (9.14)

$$\left( \frac{A_{I1}}{A_{I2}} \right)^{1/3} = \frac{f_1}{f_2}. \tag{9.17}$$

Substituting $f_1 = g + f_2$ into Eq. (9.17) results in

$$\left( \frac{A_{I1}}{A_{I2}} \right)^{1/3} = \frac{g + f_2}{f_2}, \tag{9.18}$$

and solving for $f_2$ results in

$$f_2 = \frac{g}{(A_{I1}/A_{I2})^{1/3} - 1}. \tag{9.19}$$

Notice that the right-hand side of Eq. (9.19) contains all measurable quantities, so that the distance $f_2$ can be determined. With $f_2$ determined, the intersection point $(u_3, v_3)$ between the extended line and the line $Au + Bv + Cf = 0$ is determined. Thus from the measured position and area of each pair of texture primitives on the perspective projection image, it is possible to determine one point on the line $Au + Bv + Cf = 0$. From N pairs of such texture primitives, N points on the line $Au + Bv + Cf = 0$ can be determined. Since f is given, a fit of the N points to a line will produce the unknown coefficients $n = (A, B, C)'$, which is the unit normal to the textured plane. The only assumption is that the areas of the texture primitives on the textured plane $Ax + By + Cz + D = 0$ are identical.

Figure 9.11: The similar-triangle geometry generated between two perspective projection points $(u_1, v_1)$ and $(u_2, v_2)$ and their distances to the line $Au + Bv + Cf = 0$.

Aloimonos and Swain (1988) use the same affine transformation approximation to the perspective projection that Ohta et al. (1981) do. They develop a relationship between the area of a primitive on the perspective projection image and the tilt and slant of the textured planar surface. From our perspective, however, it is easy to derive such a relationship by using the surface normal of the textured planar surface rather than its slant and tilt angles. Suppose the textural primitives are observed as all having identical area $A_P$ and lying on the textured plane $Ax + By + Cz + D = 0$. In terms of the geometry we have been doing with solid angles, the relationship between the area $A_I$ on the image of a texture primitive located at (u, v) and the area $A_P$ on the textured plane is

$$A_I = \frac{A_P (u^2 + v^2 + f^2)^{3/2}}{f d^2}\, n'\xi, \tag{9.20}$$

where d is the distance from the center of perspectivity to the texture primitive on the plane $Ax + By + Cz + D = 0$, and where both $\xi$ and $A_I$ must be considered functions of (u, v).

Aloimonos and Swain make the approximation that $(u^2 + v^2 + f^2)^{3/2}/f$ is a constant, approximately $f^2$, which is reasonable when the depth of field of the observed textured plane is narrow and $u^2 + v^2 \ll f^2$. In this case Eq. (9.20) becomes

$$A_I = \lambda\, n'\xi, \tag{9.21}$$

where $\lambda = A_P f^2/d^2$ and where they call $\lambda$ the "textural albedo," noticing the similarity between Eq. (9.21) and the Lambertian reflectance equation (see Chapter 12).

If the areas of texture primitives are measured at locations $(u_i, v_i)$, $i = 1, \ldots, N$, then we have from Eq. (9.21) the overconstrained equation

$$\Phi (\lambda n) = b, \quad \text{where} \quad \Phi = \begin{pmatrix} \xi_1' \\ \vdots \\ \xi_N' \end{pmatrix}, \quad b_i = A_{Ii}, \quad \xi_i = \frac{1}{\sqrt{u_i^2 + v_i^2 + f^2}} \begin{pmatrix} u_i \\ v_i \\ f \end{pmatrix}.$$

From the work we have already done, the least-squares solution for $m = \lambda n$ is

$$m = (\Phi'\Phi)^{-1}\Phi' b,$$

and hence $n = m/\lambda$, where $\lambda = \|m\|$ is chosen so that $n'n = 1$. Aloimonos and Swain suggest solving for $\lambda$ and n for each triplet of neighboring textural primitives and determining a more accurate estimate of $\lambda$ by taking the average of the individual $\lambda$'s. Then they consider the individual unit normals as the initial approximations to the unit normal in an algorithm that constructs a set of unit surface normals satisfying a smoothness constraint.

Blostein and Ahuja (1987, 1989a, 1989b) also make the assumption that the area of the textural primitives is constant, and they do not require that the density of the texture primitives be constant. Blostein and Ahuja do not analytically solve for the (slant, tilt) angle. They do a brute-force search over a set of fixed slant-tilt angles to find the (slant, tilt) angle that best fits the observed areas. The motivation for the brute-force search is that they accept the reality that it is difficult to determine which regions have the texture of interest and which ones do not. By doing a global fit, they are able to identify simultaneously the true texels from among the candidate regions and the slant and tilt.

Blostein and Ahuja found the texel identification problem to be difficult and gave the detailed procedure they found most satisfactory among the ones they tried. They extracted disklike texture primitives by convolving the image with a Laplacian of Gaussian kernel $\nabla^2 G$ and with a kernel that is the partial derivative of $\nabla^2 G$ with respect to $\sigma$:

$$\frac{\partial}{\partial\sigma}\nabla^2 G(u,v) = \left[ \frac{(u^2+v^2)^2}{\sigma^7} - \frac{6(u^2+v^2)}{\sigma^5} + \frac{4}{\sigma^3} \right] e^{-(u^2+v^2)/2\sigma^2}.$$

For an image having a disk of diameter D and contrast C centered at (0, 0),

$$I(x,y) = \begin{cases} C & \text{if } x^2 + y^2 \le D^2/4, \\ 0 & \text{elsewhere,} \end{cases}$$

the responses of the filters $\nabla^2 G$ and $\frac{\partial}{\partial\sigma}\nabla^2 G$ at the center of the disk are

$$\frac{\pi C D^2}{2\sigma^2}\, e^{-D^2/8\sigma^2} \quad \text{and} \quad \frac{\pi C D^2}{2}\left[ \frac{D^2}{4\sigma^5} - \frac{2}{\sigma^3} \right] e^{-D^2/8\sigma^2},$$

respectively, up to the sign conventions of the kernels. From these expressions, it is apparent that by dividing one expression by the other, it is possible to solve for both the diameter D and the contrast C of the disk.
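The inversion of the two filter responses for D and C can be sketched as follows. The formulas below follow from dividing the two response expressions given above; the overall signs depend on the normalization chosen for G, so magnitudes are used (a sketch, not Blostein and Ahuja's exact code):

import numpy as np

def disk_from_responses(r1, r2, sigma):
    # r1: response of the Laplacian-of-Gaussian at the disk center.
    # r2: response of d/d(sigma) of the Laplacian-of-Gaussian.
    # Dividing the two response expressions eliminates the contrast C:
    #   r2 / r1 = D**2 / (4 * sigma**3) - 2 / sigma
    ratio = r2 / r1
    D2 = 4.0 * sigma**3 * ratio + 8.0 * sigma**2
    if D2 <= 0:
        return None                    # responses inconsistent with a disk
    D = np.sqrt(D2)
    # Back-substitute into |r1| = (pi*C*D**2 / (2*sigma**2)) * exp(-D**2/(8*sigma**2)).
    C = abs(r1) * 2.0 * sigma**2 * np.exp(D2 / (8.0 * sigma**2)) / (np.pi * D2)
    return D, C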

What Blostein and Ahuja did was to compute the convolutions $\nabla^2 G * I$ and $\frac{\partial}{\partial\sigma}\nabla^2 G * I$ at six values of $\sigma$ spaced roughly a half-octave apart. They identified the positions of the local extrema in the $\nabla^2 G * I$ image. At each local extremum, they used the values of $\frac{\partial}{\partial\sigma}\nabla^2 G * I$ and $\nabla^2 G * I$ to compute the disk diameter D and contrast C. If the computed D was consistent with the scale $\sigma$ of the filter, a disk of diameter D was accepted to be centered at the position of the extremum. Then they identified texture primitives as either single disks that did not touch any other disk or the union of overlapping disks, providing the union did not have concavities that were too sharp.

Qualitatively, shape from texture can work. However, quantitatively, the techniques are generally not dependable. These approaches assume that either the density or the size of the texture primitives is constant, and they permit the calculation of a local surface normal on the basis of the texture density or the size of texture primitives at multiple locations on the perspective projection image. The degree of dependability could actually be determined by an analytic calculation of the variance of the computed quantities as a function of the Poisson density parameter for the primitive generating process. However, this has not yet been done.

We have discussed the conceptual basis of texture in terms of its primitives and their spatial relationships and have described several kinds of primitives and spatial relationships by which a texture can be characterized. We reviewed a number of different textural feature extraction techniques, including co-occurrence, autocorrelation, digital transform, textural edgeness, relative extrema density, mathematical morphology, autoregressive Markov random fields, random mosaic models, and structural approaches. Then we discussed texture segmentation and synthetic texture generation. Finally, we reviewed a variety of approaches to determine shape from texture.

Exercises

9.1. Generate some synthetic 512 x 512 texture images by low-pass filtering a white-noise image with the Gaussian kernel

$$k(r,c) = \exp\left[ -\frac{r^2 + c^2}{2\sigma^2} \right]$$

for $\sigma$ = 1.5, 2, 2.5, 3, and 4.

9.2. For each of the five images generated by Exercise 9.1, determine the entropy, energy, and homogeneity features of the co-occurrence matrix for distances d = 1, 2, 4, and 8. What can you say about the relationship between the feature values as a function of $\sigma$ and d?

Bibliography

Abend, K., T. J. Harley, and L. N. Kanal, "Classification of Binary Random Patterns," IEEE Transactions on Information Theory, Vol. IT-11, 1965, pp. 538-544.

Ahuja, N., T. Dubitzki, and A. Rosenfeld, "Some Experiments with Mosaic Models for Images," IEEE Transactions on Systems, Man, and Cybernetics, Vol. SMC-10, 1980, pp. 744-749.

Ahuja, N., and A. Rosenfeld, "Mosaic Models for Textures," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. PAMI-3, 1981, pp. 1-11.

CHAPTER 10

IMAGE SEGMENTATION

10.1 Introduction
An image segmentation is the partition of an image into a set of nonoverlapping regions whose union is the entire image. The purpose of image segmentation is to decompose the image into parts that are meaningful with respect to a particular application. For example, in two-dimensional part recognition, a segmentation might be performed to separate the two-dimensional object from the background. Figure 10.1(a) shows a gray level image of an industrial part, and Figure 10.1(b) shows its segmentation into object and background. In this figure the object is shown in white and the background in black. In simple segmentations we will use gray levels to illustrate the separate regions. In more complex segmentation examples where there are many regions, we will use white lines on a black background to show the separation of the image into its parts. It is very difficult to tell a computer program what constitutes a "meaningful" segmentation. Instead, general segmentation procedures tend to obey the following rules.
1. Regions of an image segmentation should be uniform and homogeneous with respect to some characteristic, such as gray level or texture.
2. Region interiors should be simple and without many small holes.
3. Adjacent regions of a segmentation should have significantly different values with respect to the characteristic on which they are uniform.
4. Boundaries of each segment should be simple, not ragged, and must be spatially accurate.

Achieving all these desired properties is difficult because strictly uniform and homogeneous regions are typically full of small holes and have ragged boundaries. Insisting that adjacent regions have large differences in values can cause regions to merge and boundaries to be lost.

Figure 10.1 (a) Gray level image of an industrial part and (b) the segmentation of the image into object (white) and background (black).

Clustering in pattern recognition is the process of partitioning a set of pattern vectors into subsets called clusters (Young and Calvert, 1974). For example, if the pattern vectors are pairs of real numbers illustrated by the point plot of Figure 10.2, clustering consists of finding subsets of points that are "close" to one another in Euclidean 2-space. As there is no full theory of clustering, there is no full theory of image segmentation. Image segmentation techniques are basically ad hoc and differ precisely in the way they emphasize one or more of the desired properties and in the way they balance and compromise one desired property against another. The difference between image segmentation and clustering is that in clustering, the grouping is done in measurement space; in image segmentation, the grouping is done on the spatial domain of the image, and there is an interplay in the clustering between the (possibly overlapping) groups in measurement space and the mutually exclusive groups of the image segmentation.

Figure 10.2 Set of points in a Euclidean measurement space that can be separated into three clusters of points. Each cluster consists of points that in some sense are close to one another.


This chapter describes the main ideas behind the major image segmentation techniques and gives example results for a number of them. Additional image segmentation surveys can be found in Zucker (1976), Riseman and Arbib (1977), Kanade (1980), and Fu and Mui (1981). Our point of view will be segmentation with respect to the gray level characteristic. Segmentation on the basis of some other characteristic, such as texture, can be achieved by first applying an operator that transforms local texture to a texture feature value. Texture segmentation can then be accomplished by applying segmentation with respect to the texture pattern value characteristic exactly as if it were a gray level characteristic.

10.2 Measurement-Space-Guided Spatial Clustering
The technique of measurement-space-guided spatial clustering for image segmentation uses the measurement-space-clustering process to define a partition in measurement space. Then each pixel is assigned the label of the cell in the measurement-space partition to which it belongs. The image segments are defined as the connected components of the pixels having the same label. The segmentation process is, in general, an unsupervised clustering, since no a priori knowledge about the number and type of regions present in the image is available.

The accuracy of the measurement-space-clustering image segmentation process depends directly on how well the objects of interest on the image separate into distinct measurement-space clusters. Typically the process works well in situations where there are a few kinds of distinct objects having widely different gray level intensities (or gray level intensity vectors, for multiband images) and where these objects appear on a nearly uniform background.

Clustering procedures that use the pixel as a unit and compare each pixel value with every other pixel value can require excessively long computation times because of the large number of pixels in an image. Iterative partition rearrangement schemes have to go through the image data set many times and, if done without sampling, can also take excessive computation time. Histogram mode seeking, because it requires only one pass through the data, probably involves the least computation time of the measurement-space-clustering techniques, and it is the one we discuss here.

Histogram mode seeking is a measurement-space-clustering process in which it is assumed that homogeneous objects on the image manifest themselves as clusters in measurement space. Image segmentation is accomplished by mapping the clusters back to the image domain, where the maximal connected components of the mapped-back clusters constitute the image segments. For images that are single-band images, calculation of this histogram in an array is direct. The measurement-space clustering can be accomplished by determining the valleys in this histogram and declaring the clusters to be the intervals of values between valleys. A pixel whose value is in the ith interval is labeled with index i, and the segment it belongs to is one of the connected components of all pixels whose label is i. The thresholding techniques discussed in Chapter 2 are examples of histogram mode seeking with bimodal histograms.
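A minimal sketch of this procedure for a single-band 8-bit image follows. It is not from the text; it assumes NumPy and SciPy, takes the valleys to be local minima of a box-smoothed histogram (a plateau-tolerant valley detector would be needed in practice), and uses 4-connected components:

    import numpy as np
    from scipy import ndimage

    def histogram_mode_segmentation(image, smooth=5):
        hist = np.bincount(image.ravel(), minlength=256).astype(float)
        hist = np.convolve(hist, np.ones(smooth) / smooth, mode='same')
        # Declare a valley at every local minimum of the smoothed histogram.
        valleys = [g for g in range(1, 255)
                   if hist[g] <= hist[g - 1] and hist[g] <= hist[g + 1]]
        # Label each pixel with the index of the between-valley interval it is in.
        labels = np.digitize(image, np.array(valleys))
        # The segments are the connected components of equally labeled pixels.
        segments = np.zeros(image.shape, dtype=int)
        next_id = 0
        for k in np.unique(labels):
            comp, n = ndimage.label(labels == k)
            segments[labels == k] = comp[labels == k] + next_id
            next_id += n
        return segments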

Figure 10.3 Enlarged polished mineral ore section. The bright areas are grains of pyrite; the gray areas constitute a matrix of pyrrhotite; the black areas are holes.

Figure 10.3 illustrates an example image that is the right kind of image for the measurement-space-clustering image segmentation process. It is an enlarged image of a polished mineral ore section. The width of the field is about 1 mm. The ore is from Ducktown, Tennessee, and shows subhedral-to-euhedral pyrite porphyroblasts (white) in a matrix of pyrrhotite (gray). The black areas are holes.

Figure 10.4 Histogram of the image in Fig. 10.3. The three nonoverlapping modes correspond to the black holes, the pyrrhotite, and the pyrite.


Figure 10.5 Segmentation of the image of Fig. 10.3 produced by clustering the histogram of Fig. 10.4.

Figure 10.4 shows the histogram of this image. The valleys are no trouble to find. The first cluster is from the left end to the first valley. The second cluster is from the first valley to the second valley. The third cluster is from the second valley to the right end. Assigning to each pixel the cluster index of the cluster to which it belongs and then assigning a unique gray level to each cluster label yields the segmentation shown in Fig. 10.5. This is a virtually perfect (meaningful) segmentation.

Figure 10.6 shows an example image that is not ideal for measurement-space-clustering image segmentation. Figure 10.7 shows its histogram, which has three modes and two valleys, and Fig. 10.8 shows the corresponding segmentation. Notice the multiple-boundary area. It is apparent that the boundary between the grain and the background is in fact shaded dark, and there are many such border regions that show up as dark segments. In this case we do not desire the edge borders to be separate regions, and although the segmentation procedure did exactly as it should have done, the results are not what we desired. This illustrates that segmentation into homogeneous regions is not necessarily a good solution to a segmentation problem.

Figure 10.6 Image similar in some respects to the image of Fig. 10.3. Because some of the boundaries between regions are shadowed, homogeneous region segmentation may not produce the desired segmentation.


Figure 10.7 Histogram of the image of Fig. 10.6.

Figure 10.8 Segmentation of the image of Fig. 10.6 produced by clustering the histogram of Fig. 10.7.

The next example further illustrates the fallacies of measurement-space clustering. Figure 10.9 is a diagram of an F-15 bulkhead. Images of portions of the bulkhead, which were used as test data for an experimental robot guidance/inspection system, will be used as examples throughout the rest of this chapter. Figure 10.10 illustrates an image of a section of the F-15 bulkhead.

Figure 10.9 An F-15 bulkhead. Images of portions of the bulkhead are used as examples throughout this chapter.

Figure 10.10 Image of a section of the F-15 bulkhead.

Figure 10.11 shows the histogram of this image. It has two well-separated modes. The narrow one on the right with a long left tail corresponds to specular reflection points. The main mode has three valleys on its left side and two valleys on its right side. Separating the main mode into its two most dominant submodes produces the segmentation of Fig. 10.12. The problem in the segmentation is apparent. It is clear that the image has distinct parts, such as webs and ribs, but the detail provided is much less. Since the clustering was done in measurement space, there was no requirement for good spatial continuation, and the resulting boundaries are very noisy and busy. Defining the depth of a valley to be the probability difference between the valley bottom and the lowest valley side and eliminating the two shallowest valleys produces the segmentation shown in Fig. 10.13. Here the boundary noise is less and the resulting regions are more satisfactory.

Ohlander, Price, and Reddy (1978) refine the clustering idea in a recursive way. They begin by defining a mask selecting all pixels on the image. Given any mask, a histogram of the masked image is computed. Measurement-space clustering enables the separation of one mode of the histogram set from another mode. If there is only one measurement-space cluster, then the mask is terminated. If there is more than one cluster, then each connected component of all pixels with the same cluster is, in turn, used to generate a mask that is placed on a mask stack. Pixels on the image are then identified with the cluster to which they belong. During successive iterations the next mask in the stack selects pixels in the histogram computation process.

Figure 10.11 Histogram of the bulkhead image of Fig. 10.10.

Clustering is repeated for each new mask until the stack is empty. Figure 10.14 illustrates this process, which we call a recursive histogram-directed spatial clustering.

Figure 10.15 illustrates a recursive histogram-directed spatial-clustering technique applied to the bulkhead image of Fig. 10.10. It produces a result with boundaries that are somewhat busy and with many small regions in areas of specular reflectance. Figure 10.16 illustrates the results of performing a morphological opening with a 3 x 3 square structuring element on each region in the segmentation of Fig. 10.15. The tiny regions are removed in this manner, but several important long, thin regions, which are also removed by the opening, are lost. The pixels removed by the opening are then given values by a single region-growing process.

Figure 10.12 Segmentation of the bulkhead induced by a measurement-space clustering into five clusters.

Figure 10.13 Segmentation of the bulkhead induced by a measurement-space clustering into three clusters.

Figure 10.14 Recursive histogram-directed spatial-clustering scheme of Ohlander (the original mask, covering the entire image, is pushed onto the mask stack first).

Figure 10.15 Results of the recursive histogram-directed spatial clustering when applied to the bulkhead image.

For ordinary color images, Ohta, Kanade, and Sakai (1980) suggest that histograms not be computed individually on the red, green, and blue (RGB) color variables but on a set of variables closer to what the Karhunen-Loeve (principal components) transform would suggest. They suggest (R + G + B)/3, (R - B)/2, and (2G - R - B)/4. Figure 10.17 illustrates a color image. Figure 10.18 shows two segmentations of the color image: one by recursive histogram-directed spatial clustering using the R, G, and B bands, and the second by the same method but using the transformed bands suggested by Ohta, Kanade, and Sakai.

Figure 10.16 Result of performing a morphological opening with a 3 x 3 square structuring element on the segmentation of Fig. 10.15 and then filling in the removed pixels by a single region-growing process.

Figure 10.17 A color image.

Figure 10.18 Two segmentations of the color image of Fig. 10.17. The left segmentation was obtained by recursive histogram-directed spatial clustering using the R, G, and B bands. The right segmentation was achieved by the same method but using the transformed bands (R + G + B)/3, (R - B)/2, and (2G - R - B)/4 suggested by Ohta, Kanade, and Sakai (1980).

10.2.1 Thresholding

If the image contains a bright object against a dark background and the measurement space is one-dimensional, measurement-space clustering amounts to determining a threshold such that all pixels whose values are less than or equal to the threshold are assigned to one cluster and the remaining pixels are assigned to the second cluster. In the easiest cases a procedure to determine the threshold need only examine the histogram and place the threshold in the valley between the two modes. Procedures of this kind were discussed in Chapter 2. Unfortunately, it is not always the case that the two modes are nicely separated by a valley. To handle this kind of situation, a variety of techniques can be used to combine the spatial information on the image with the gray level intensity information to help in threshold determination.

Chow and Kaneko (1972) suggest using a threshold that depends on the histogram determined from the spatially local area around the pixel to which the threshold applies. For example, a neighborhood size of 33 x 33 or 65 x 65 can be used to compute the local histogram. Chow and Kaneko avoided the local histogram computation for each pixel's neighborhood by dividing the image into mutually exclusive blocks, computing the histogram for each block, and determining an appropriate threshold for each histogram. The threshold value so obtained can be considered to apply to the center pixel of each block. To obtain thresholds for the remaining pixels, they spatially interpolated the block center-pixel thresholds to obtain a spatially adaptive threshold for each pixel.
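A sketch of the block-threshold-plus-interpolation idea follows. It is not Chow and Kaneko's exact procedure: it assumes scikit-image and SciPy are available and substitutes Otsu's criterion for their per-block histogram threshold; the block size is illustrative:

    import numpy as np
    from scipy.ndimage import zoom
    from skimage.filters import threshold_otsu

    def chow_kaneko_style(image, block=64):
        h, w = image.shape
        bh, bw = h // block, w // block
        centers = np.empty((bh, bw))
        for i in range(bh):
            for j in range(bw):
                tile = image[i * block:(i + 1) * block, j * block:(j + 1) * block]
                # Per-block threshold; a constant tile just gets its own value.
                centers[i, j] = (threshold_otsu(tile)
                                 if tile.min() < tile.max() else tile.min())
        # Interpolate the block-center thresholds into a full-resolution,
        # spatially adaptive threshold surface.
        surface = zoom(centers, (h / bh, w / bw), order=1)
        return image > surface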

Weszka, Nagel, and Rosenfeld (1974) suggest determining a histogram for only those pixels having a high Laplacian magnitude (see Chapter 6). They reason that there will be a shoulder of the gray level intensity function at each side of the boundary. The shoulder has high Laplacian magnitude. A histogram of all shoulder pixels will be a histogram of all interior pixels just next to the interior border of the region. It will not involve those pixels between regions that help make the histogram valley shallow. It will also have a tendency to involve equal numbers of pixels from the object and from the background. This makes the two histogram modes about the same size. Thus the valley-seeking method for threshold selection has a chance of working on the new histogram.

Milgram and Herman (1979) reason that pixels that are between regions probably have in-between gray level intensities. If it is these pixels that are the cause of the shallow valleys, then it should be possible to eliminate their effect by considering only pixels having small gradients. They take this idea further and suggest that by examining clusters in the two-dimensional measurement space consisting of gray level intensity and gradient magnitude, it is even possible to determine multiple thresholds when more than one kind of object is present.

Watanabe (1974) suggests choosing a threshold value that maximizes the sum of gradients taken over all pixels whose gray level intensity equals the threshold value. Kohler (1981) suggests a modification of the Watanabe idea. Instead of choosing a threshold that maximizes the sum of gradient magnitudes taken over all pixels whose gray level intensity equals the threshold value, Kohler suggests choosing the threshold that detects more high-contrast edges and fewer low-contrast edges than any other threshold. Kohler defines the set E(T) of edges detected by a threshold T to be the set of all pairs of neighboring pixels one of whose gray level intensities is less than or equal to T and one of whose gray level intensities is greater than T:

$$E(T) = \{ [(i,j),(k,l)] : \text{(1) pixels } (i,j) \text{ and } (k,l) \text{ are neighbors, and (2) } \min\{I(i,j), I(k,l)\} \le T < \max\{I(i,j), I(k,l)\} \}$$

The total contrast C(T) of edges detected by threshold T is given by summing the contrast of each edge pair in E(T) (Eq. 10.1). The average contrast of all edges detected by threshold T is then given by C(T)/#E(T). The best threshold T_b is determined by the value that maximizes C(T_b)/#E(T_b).

Weszka and Rosenfeld (1978) describe one method for segmenting white blobs against a dark background by a threshold selection based on busyness. For any threshold, busyness is the percentage of pixels having a neighbor whose thresholded value is different from their own thresholded value. A good threshold is the point near the histogram valley between the two peaks that minimizes the busyness.
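The busyness criterion is simple to state in code. A sketch (assuming NumPy; it counts threshold-crossing 4-neighbor pairs, a close proxy for the pixel-based percentage, and scans candidate thresholds near the valley by brute force):

    import numpy as np

    def busyness(image, T):
        b = image > T
        # Fraction of sites where a 4-neighbor pair crosses the threshold.
        horiz = b[:, 1:] != b[:, :-1]
        vert = b[1:, :] != b[:-1, :]
        return (horiz.sum() + vert.sum()) / float(image.size)

    def least_busy_threshold(image, lo, hi):
        # Search the candidate range (e.g., around the histogram valley).
        return min(range(lo, hi + 1), key=lambda T: busyness(image, T))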

Panda and Rosenfeld (1978) suggest a related approach for segmenting a white blob against a dark background. Consider the histogram of gray levels for all pixels that have small gradients. If a pixel has a small gradient, then it is not likely to be an edge. If it is not an edge, then it is either a dark background pixel or a bright blob pixel. Hence the histogram of all pixels having small gradients will be bimodal, and for pixels with small gradients, the valley between the two modes of the histogram is an appropriate threshold point. Next, consider the histogram of gray levels for all pixels that have high gradients. If a pixel has a high gradient, then it is likely to be an edge. If it is an edge separating a bright blob against a dark background, and if the separating boundary is not sharp but somewhat diffuse, then the histogram will be unimodal, the mean being a good threshold separating the dark background pixels from the bright blob pixels. Thus Panda and Rosenfeld suggest determining two thresholds: one for low-gradient pixels and one for high-gradient pixels. By this means they perform the clustering in the two-dimensional measurement space consisting of gray level intensity and gradient. The form of the decision boundary in the two-dimensional measurement space is shown in Fig. 10.19.

Figure 10.20 illustrates a FLIR image from the NATO data base, which one might think has the right characteristics for this type of segmentation algorithm. Figure 10.21 illustrates the FLIR image thresholded at 159 and 190. Figure 10.22 shows the pixels having a large gradient magnitude, where the gradient is computed as the square root of the sum of the squares of the linear coefficients arising from a gray level intensity cubic fit (see Chapter 8) in a 7 x 7 window. Figure 10.23 shows the horseshoe-shaped cluster in the two-dimensional gray level intensity-gradient space, where the gray level intensities and the gradient values have been equal-interval quantized. Figure 10.24 illustrates the resulting segmentation. Notice that because there is a bright object with a slightly darker appendage on top, the assumption of a homogeneous object on a dark background is not met. The result is that only the boundary of the appendage is picked up. A survey of threshold techniques can be found in Weszka (1978).

Figure 10.19 Diagram showing how the threshold of the Panda and Rosenfeld technique depends on the gradient magnitude (axes: gray tone and gradient).

Figure 10.20 FLIR image from the NATO data base. To reduce noise, it was filtered with a Gaussian filter (see Chapter 6) with a sigma of 1.5 and a neighborhood size of 15.

Figure 10.21 FLIR image of Fig. 10.20 thresholded at gray level intensity 159 (left) and 190 (right).

Figure 10.22 Pixels of the FLIR image having large gradient magnitude.

Figure 10.23 Scattergram of the gray level intensity-gradient measurement space for the image of Fig. 10.20. The gray level intensity is along the y-axis and the gradient is along the x-axis. Notice the nicely bimodal gray level intensity distribution for small gradient magnitudes.

Figure 10.24 Segmentation of the image in Fig. 10.20 using the Panda and Rosenfeld scheme.

10.2.2 Multidimensional Measurement-Space Clustering

A LANDSAT image comes from a satellite and consists of seven separate images called bands. Each band represents a particular range of wavelengths. The bands are registered so that pixel (i,j) in one band corresponds to pixel (i,j) in each of the other bands. For multiband images such as LANDSAT or Thematic Mapper, determining the histogram in a multidimensional array is not feasible. For example, in a six-band image where each band has intensities between 0 and 99, the array would have to have 100^6 = 10^12 locations. A large image might be 10,000 pixels per row by 10,000 rows. This constitutes only 10^8 pixels, a sample too small to estimate probabilities in a space of 10^12 values were it not for some constraints of reality: (1) there is typically a high correlation between the band-to-band pixel values, and (2) there is a large amount of spatial redundancy in image data. Both these factors create a situation in which the 10^8 pixels can be expected to contain only between 10^4 and 10^5 distinct 6-tuples. Based on this fact, the counting required for the histogram is easily done by mapping the 6-tuples into array indexes. The programming technique known as hashing, which is described in most data structures texts, can be used for this purpose.

Goldberg and Shlien (1977, 1978) threshold the multidimensional histogram to select all N-tuples situated on the most prominent modes. Then they perform a measurement-space connected components on these N-tuples to collect all the N-tuples in the tops of the most prominent modes. These measurement-space connected sets form the cluster cores. The clusters are defined as the sets of all N-tuples closest to each cluster core.

An alternative possibility (Narendra and Goldberg, 1977) is to locate peaks in the multidimensional measurement space and region-grow around them, constantly descending from each peak. The region growing includes all successive neighboring N-tuples whose probability is no higher than that of the N-tuple from which it is growing. Adjacent mountains meet in their common valleys.

Rather than accomplish the clustering in the full measurement space, it is possible to work in multiple lower-order projection spaces and then reflect these clusters back to the full measurement space. Suppose, for example, that the clustering is done on a four-band image. If the clustering done in bands 1 and 2 yields clusters c1, c2, c3 and the clustering done in bands 3 and 4 yields clusters c4 and c5, then each possible 4-tuple from a pixel can be given a cluster label from the set {(c1,c4), (c1,c5), (c2,c4), (c2,c5), (c3,c4), (c3,c5)}. A 4-tuple (x1,x2,x3,x4) gets the cluster label (c2,c4) if (x1,x2) is in cluster c2 and (x3,x4) is in cluster c4.

10.3 Region Growing

10.3.1 Single-Linkage Region Growing

Single-linkage region-growing schemes regard each pixel as a node in a graph. Neighboring pixels whose properties are similar enough are joined by an arc. The image segments are maximal sets of pixels all belonging to the same connected component. Figure 10.25 illustrates this idea with a simple image and the corresponding graph, with the connected components circled. In this example two pixels are connected by an edge if their values differ by less than 5 and they are 4-neighbors. Single-linkage image segmentation schemes are attractive for their simplicity.
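A sketch of single-linkage segmentation with a union-find structure follows (not from the text; it assumes NumPy, 4-neighbors, and the gray-level-difference predicate of Fig. 10.25):

    import numpy as np

    def single_linkage(image, tol=5):
        h, w = image.shape
        vals = image.astype(int).ravel()
        parent = np.arange(h * w)

        def find(a):
            while parent[a] != a:
                parent[a] = parent[parent[a]]      # path halving
                a = parent[a]
            return a

        idx = np.arange(h * w).reshape(h, w)
        pairs = [(idx[:, :-1].ravel(), idx[:, 1:].ravel()),   # horizontal arcs
                 (idx[:-1, :].ravel(), idx[1:, :].ravel())]   # vertical arcs
        for a, b in pairs:
            ok = np.abs(vals[a] - vals[b]) < tol    # "similar enough" predicate
            for p, q in zip(a[ok], b[ok]):
                parent[find(q)] = find(p)           # union the two components
        return np.array([find(p) for p in range(h * w)]).reshape(h, w)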

Figure 10.25 Simple gray level image and graph resulting from defining "similar enough" to be differing in gray level by less than 5 and using the 4-neighborhood to determine connected components (see Chapter 2).

As illustrated in Fig. 10.25, the simplest single-linkage scheme defines "similar enough" by pixel difference: two neighboring pixels are similar enough if the absolute value of the difference between their gray level intensity values is small enough. For pixels having vector values, the obvious generalization is to use a vector norm of the pixel difference vector. Bryant (1979) defines "similar enough" by normalizing the difference by a constant times the root-mean-square value of neighboring pixel differences taken over the entire image. For the image of Fig. 10.22, for example, the difference of two neighboring pixels has an approximately normal distribution with mean 0 and standard deviation 99, so the normalization factor is 99. A threshold can now be chosen in terms of the standard deviation instead of as an absolute value, and this makes the technique better behaved on noisy data. Single-linkage schemes do, however, have a problem with chaining, because it takes only one arc leaking from one region to a neighboring one to cause the regions to merge. Figure 10.26 shows an example of this phenomenon.

10.3.2 Hybrid-Linkage Region Growing

Hybrid single-linkage techniques are more powerful than the simple single-linkage technique. The hybrid techniques seek to assign a property vector to each pixel, where the property vector depends on the K x K neighborhood of the pixel. Pixels that are similar are so because their neighborhoods in some special sense are similar. Similarity is thus established as a function of neighboring pixel values.

One hybrid single-linkage scheme relies on an edge operator to establish whether two pixels are joined with an arc. Here an edge operator is applied to the image, labeling each pixel as edge or nonedge. Neighboring pixels, neither of which is an edge, are joined by an arc. The initial segments are the connected components of the nonedge-labeled pixels. The edge pixels can either be left assigned as edges and considered as background, or they can be assigned to the spatially nearest region having a label. The quality of this technique is highly dependent on the edge operator used. Simple operators, such as the Roberts and Sobel operators, may provide too much region linkage, for a region cannot be declared a segment unless it is completely surrounded by edge pixels.
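A sketch of the edge-operator variant (assuming SciPy; a Sobel gradient magnitude stands in here for the edge operator, and the threshold is illustrative):

    import numpy as np
    from scipy import ndimage

    def hybrid_linkage(image, edge_thresh=50.0):
        f = image.astype(float)
        grad = np.hypot(ndimage.sobel(f, axis=0), ndimage.sobel(f, axis=1))
        nonedge = grad <= edge_thresh
        # Initial segments: connected components of the nonedge pixels.
        labels, n = ndimage.label(nonedge)
        return labels, n        # edge pixels keep label 0 (background)

Edge pixels could afterward be assigned to the spatially nearest labeled region, for example with a distance transform.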

Haralick and Dinstein (1975) report some success using this technique on LANDSAT data. They perform a dilation of the edge pixels in order to close gaps before performing the connected components operator. Perkins (1980) uses a technique that addresses the gaps in the edges, which can cause problems in a segmentation performed by taking connected components of nonedge pixels.

Haralick (1982, 1984) discusses a very sensitive zero crossing of the second directional derivative edge operator. In this technique each neighborhood is least-squares fitted with a cubic polynomial in two variables. The first and second partial derivatives are easily determined from the polynomial. The first partial derivatives at the center pixel determine the gradient direction. With the direction fixed to be the gradient direction, the second partial derivatives determine the second directional derivative. If the gradient is high enough and if, in the gradient direction, the second directional derivative has a negatively sloped zero crossing inside the pixel's area, then an edge is declared in the neighborhood's center pixel. (This edge operator is described in more detail in Chapter 8.)

Figure 10.27 shows the edges resulting from this second-directional-derivative zero-crossing operator using a gradient threshold of 4, a 9 x 9 neighborhood, and a zero-crossing radius of 0.85. The edges are well placed, and a careful examination of pixels on perceived boundaries that are not classified as edge pixels will indicate the step edge pattern to be either nonexistent or weak. A connected components on the nonedge pixels accomplishes the initial segmentation. After the connected components operation, the edge pixels are assigned to their spatially closest component by a region-filling operation. Figure 10.28 shows the boundaries from the region-filled image obtained from the edge image of Fig. 10.27. Obviously some regions have been merged. However, those boundaries that are present are placed correctly, and they are reasonably smooth. Lowering the gradient threshold of the edge operator could produce an image with more edges and thereby reduce the edge gap problem, but this solution does not really solve the gap problem in general.

Yakimovsky (1976) assumes regions are normally distributed and uses a maximum-likelihood test to determine edges. Edges are declared to exist between pairs of contiguous and exclusive neighborhoods if the hypothesis that their means are equal and their variances are equal has to be rejected.

Figure 10.26 Edge image with gaps, which permit regions to link together when a segmentation is performed by taking connected components of nonedge pixels.

Figure 10.28 Hybrid-linkage region-growing scheme in which any pair of neighboring pixels, neither of which is an edge pixel, can link together. The resulting segmentation consists of the connected components of the nonedge pixels, augmented by assigning edge pixels to their nearest connected component.

For any pair of adjacent pixels with mutually exclusive neighborhoods R1 and R2 having N1 and N2 pixels, respectively, the maximum-likelihood technique computes the mean and the scatter of each neighborhood, as well as the grand mean (10.5) and the grand scatter (10.6). The likelihood ratio test statistic t is given in terms of these quantities by (10.7). Edges are declared between any pair of adjacent pixels when the t-statistic from their neighborhoods is high enough. As N1 and N2 get large, 2 log t is asymptotically distributed as a chi-squared variate with two degrees of freedom.

If it can be assumed that the variances of the two regions are identical, then the statistic

$$F = \frac{(N_1 + N_2 - 2)\, N_1 N_2\, (\bar{X}_1 - \bar{X}_2)^2}{(N_1 + N_2)(S_1^2 + S_2^2)} \tag{10.8}$$

has an F-distribution with 1 and N1 + N2 - 2 degrees of freedom under the hypothesis that the means of the regions are equal. For an F-value that is sufficiently large, the hypothesis can be rejected and an edge declared to exist between the regions.

Haralick (1980) suggests fitting a plane to the neighborhood around the pixel and testing the hypothesis that the slope of the plane is zero. Edge pixels correspond to pixels between neighborhoods in which the zero-slope hypothesis has to be rejected. To determine a roof or V-shaped edge, Haralick suggests fitting a plane to the neighborhoods on either side of the pixel and testing the hypothesis that the coefficients of fit, referenced to a common framework, are identical.
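A direct transcription of Eq. (10.8) as reconstructed above (assuming NumPy; r1 and r2 hold the gray levels of the two adjacent, mutually exclusive neighborhoods):

    import numpy as np

    def edge_f_statistic(r1, r2):
        n1, n2 = r1.size, r2.size
        s1 = np.sum((r1 - r1.mean())**2)        # scatters (sums of squares)
        s2 = np.sum((r2 - r2.mean())**2)
        return ((n1 + n2 - 2) * n1 * n2 * (r1.mean() - r2.mean())**2
                / ((n1 + n2) * (s1 + s2)))

    # Compare against the F(1, n1 + n2 - 2) critical value to declare an edge.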

Another hybrid technique, first used by Levine and Leemet (1976), is based on the Jarvis and Patrick (1973) shared-nearest-neighbor idea. Using any kind of reasonable notion for similarity, each pixel examines its K x K neighborhood and makes a list of the N pixels in the neighborhood most similar to it. Call this list the similar-neighbor list, where we understand neighbor to be any pixel in the K x K neighborhood. An arc joins any pair of immediately neighboring pixels if each pixel is in the other's similar-neighbor list and if there are enough pixels common to their lists, that is, if the number of shared neighbors is high enough.

To make the shared-neighbor technique work well, each pixel can be associated with a property vector consisting of its own gray level intensity and a suitable average of the gray level intensities of pixels in its K x K neighborhood. Let (x,a) and (y,b) denote the property vectors for two pixels, where x is the gray level intensity value and a is the average gray level intensity value in the neighborhood of the first pixel, and y is the gray level intensity value and b is the average gray level intensity value in the neighborhood of the second pixel. Similarity can be established by computing

$$S = w_1(x - y)^2 + w_2(x - b)^2 + w_3(y - a)^2 \tag{10.9}$$

where w1, w2, and w3 are nonnegative weights. Thus the quantity S takes into account the difference between the gray levels of the two pixels in question and the difference between the gray level of each pixel and the average gray level of the neighborhood of the other pixel. The pixels are called similar enough for small enough values of S. The weights w1, w2, and w3 can be learned from training data for a particular class of images.

Pong et al. (1984) suggest an approach to segmentation based on the facet model of images. The procedure starts with an initial segmentation of the image into many small regions. The initial segmentations used by Pong group together pixels that have similar facet-fitting parameters (see Chapter 8), but any initial segmentation can be used. For each region of the initial segmentation, a property vector, which is a list of values of a set of predefined attributes, is computed. The attributes consist of such properties of a region as its area, its mean gray level, and its elongation. Each region with an associated property vector is considered a unit. In a series of iterations the property vector of a region is replaced by a property vector that is a function of its neighboring regions. (The function that worked best in Pong's experiments replaced the property vector of a region with the property vector of the best-fitting neighborhood of that region.) Then adjacent regions having similar final property vectors are merged. This gives a new segmentation, which can then be used as input to the algorithm. Thus a sequence of coarser and coarser segmentations is produced. Useful variations are to prohibit merging across strong edge boundaries or when the variance of the combined region becomes too large.

Figures 10.29, 10.30, and 10.31 illustrate the results of the Pong approach on the image of Fig. 10.10 for one, two, and three iterations, respectively. Figure 10.32 illustrates the result of removing regions of size 25 or fewer pixels from the segmentation of Fig. 10.31.

Figure 10.29 One iteration of the Pong algorithm on the bulkhead image of Fig. 10.10.

Figure 10.32 Result of removing regions of size 25 or fewer pixels from the segmentation of Fig. 10.31.

10.3.3 Centroid-Linkage Region Growing

In centroid-linkage region growing, in contrast to single-linkage region growing, pairs of neighboring pixels are not compared for similarity. Rather, the image is scanned in some predetermined manner, such as left-right, top-bottom. A pixel's value is compared with the mean of an already existing but not necessarily completed neighboring segment. If its value and the segment's mean value are close enough, then the pixel is added to the segment and the segment's mean is updated. If more than one region is close enough, then the pixel is added to the closest region. However, if the means of the two competing regions are close enough, the two regions are merged and the pixel is added to the merged region. If no neighboring region has a close-enough mean, then a new segment is established having the given pixel's value as its first member. Figure 10.33 illustrates the geometry of this scheme.

Keeping track of the means and scatters for all regions as they are being determined does not require large amounts of memory space. There cannot be more regions active at one time than the number of pixels in a row of the image. Hence a hash table mechanism with the space of a small multiple of the number of pixels in a row can work well.

Another possibility is a single-band region-growing technique using the T-test. Let R be a segment of N pixels neighboring a pixel with gray level intensity y. Define the mean $\bar{X}$ and scatter $S^2$ by

$$\bar{X} = \frac{1}{N}\sum_{(r,c)\in R} I(r,c) \tag{10.10}$$

and

$$S^2 = \sum_{(r,c)\in R} \left[ I(r,c) - \bar{X} \right]^2 \tag{10.11}$$

Under the assumption that all the pixels in R and the test pixel y are independent and have identically distributed normals, the statistic

$$t = \left[ \frac{N(N-1)}{N+1} \right]^{1/2} \frac{|y - \bar{X}|}{\sqrt{S^2}} \tag{10.12}$$

has a T_{N-1} distribution. If t is small enough, y is added to region R and the mean and scatter are updated by using y. The new mean and scatter are given by

$$\bar{X}_{\text{new}} = (N\bar{X} + y)/(N+1) \tag{10.13}$$

and

$$S^2_{\text{new}} = S^2 + (y - \bar{X}_{\text{new}})^2 + N(\bar{X}_{\text{new}} - \bar{X})^2 \tag{10.14}$$

If t is too high, the value y is not likely to have arisen from the population of pixels in R. If y is different from all its neighboring regions, then it begins its own region. A slightly stricter linking criterion can require that not only must y be close enough to the mean of the neighboring regions, but a neighboring pixel in that region must have a close-enough value to y. This combines a centroid-linkage and a single-linkage criterion. The next section discusses a more powerful combination technique, but first we want to develop the concept of "significantly high."

To give a precise meaning to the notion of too high a difference, we use an alpha-level statistical significance test. The fraction alpha represents the probability that a T-statistic with N - 1 degrees of freedom will exceed the value T_{N-1}(alpha). If the observed T is larger than T_{N-1}(alpha), then we declare the difference to be significant. If the pixel and the segment really come from the same population, the probability that the test provides an incorrect answer is alpha.

The significance level alpha is a user-provided parameter. The value of T_{N-1}(alpha) is higher for small degrees of freedom and lower for larger degrees of freedom. Thus, for a fixed significance level, the larger a region is, the closer a pixel's value has to be to the region's mean in order to merge into the region. This behavior tends to prevent an already large region from attracting to it many other additional pixels and tends to prevent the drift of the region mean as the region gets larger.

Note that all regions initially begin as one pixel in size. To avoid the problem of division by 0 (for $S^2$ is necessarily 0 for one-pixel regions as well as for regions having identically valued pixels), a small positive constant can be added to $S^2$. One convenient way of determining the constant is to decide on a prior variance V > 0 and an initial segment size N0. The initial scatter for a new one-pixel region is then given by N0V, and the new initial region size is given by N0. This mechanism keeps the degrees of freedom of the T-statistic high enough so that a significant difference is not the huge difference required for a T-statistic with a small number of degrees of freedom.

To illustrate this method, consider the second image of the F-15 bulkhead, shown in Fig. 10.34. Figure 10.35 illustrates the resulting segmentation of this bulkhead image for a 0.2% significance level test after all regions smaller than 25 pixels have been removed.

Figure 10.33 Region-growing geometry for one-pass scan, left-right, top-bottom region growing. Pixel i belongs to region R_i, whose mean is X-bar_i. Pixel y is added to a region R_i if by a T-test the difference between y and X-bar_i is small enough. If the difference between X-bar_i and X-bar_j is small enough, regions R_i and R_j are merged together, and y is added to the merged region. If the difference between X-bar_i and X-bar_j is significantly different, then y is added to the closest region.
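A sketch of the test and the update formulas (10.12)-(10.14) follows; it assumes NumPy and SciPy, and the prior-variance device described above enters through the initial scatter and region size:

    import numpy as np
    from scipy.stats import t as t_dist

    def try_merge(y, mean, scatter, n, alpha=0.002):
        # T-statistic of Eq. (10.12); scatter is the sum of squared deviations.
        T = np.sqrt(n * (n - 1) / (n + 1.0)) * abs(y - mean) / np.sqrt(scatter)
        if T > t_dist.ppf(1 - alpha, df=n - 1):
            return None                             # significantly different
        new_mean = (n * mean + y) / (n + 1.0)       # Eq. (10.13)
        new_scatter = (scatter + (y - new_mean)**2
                       + n * (new_mean - mean)**2)  # Eq. (10.14)
        return new_mean, new_scatter, n + 1

    # A new region would start with mean=y, scatter=N0*V, n=N0 (prior variance V).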

Pavlidis (1972) suggests a more general version of this idea. Given an initial segmentation where the regions are approximated by some functional fit guaranteed to have a small enough error, pairs of neighboring regions can be merged if for each region the sum of the squares of the differences between the fitted coefficients for this region and the corresponding averaged coefficients, averaged over both regions, is small enough. Pavlidis gets his initial segmentation by finding the best way to divide each row of the image into segments with a sufficiently good fit. He also describes a combinatorial tree search algorithm to accomplish the merging that guarantees the best result.

Kettig and Landgrebe (1975) successively merge small image blocks using a statistical test. They avoid much of the problem of zero scatter by considering only cells containing a 2 x 2 block of pixels. Gupta et al. (1973) suggest using a T-test based on the absolute value of the difference between the pixel and the nearest region as the measure of dissimilarity.

Figure 10.34 Second image of the F-15 bulkhead.

Figure 10.35 One-pass centroid-linkage segmentation of the bulkhead image of Fig. 10.34. A significance level of 0.2% was used.

Kettig and Landgrebe (1975) discuss the multiband situation leading to the F-test and report good success with LANDSAT data. Nagy and Tolaba (1972) simply examine the absolute value of the difference between the pixel's value and the mean of a neighboring region formed already. If this distance is small enough, the pixel is added to the region. If there is more than one region, then the pixel is added to the region with the smallest distance.

The Levine and Shaheen (1981) scheme is similar. The difference is that Levine and Shaheen attempt to keep regions more homogeneous and try to keep the region scatter from becoming too high. They do this by requiring the differences to be more significant before a merge takes place if the region scatter is high. For a user-specified value theta, they define a test statistic t. If t < 0 for the neighboring region R in which |y - X-bar| is the smallest, then y is added to R. If t > 0 for the neighboring region in which |y - X-bar| is the smallest, then y begins a new region. (Readers of the Levine and Shaheen paper should note that there are misprints in the formulas given there for region scatter and region scatter updating.)

Brice and Fennema (1970) accomplish the region growing by partitioning the image into initial segments of pixels having identical intensity. They then sequentially merge all pairs of adjacent regions if a significant fraction of their common border has a small enough intensity difference across it.

One potential problem with region-growing schemes is their inherent dependence on the order in which pixels and regions are examined. A left-right, top-bottom scan does not yield the same initial regions as a right-left, bottom-up scan or, for that matter, a column-major scan. Usually, however, differences caused by scan order are minor. Simple single-pass approaches that scan the image in a left-right, top-bottom manner are, of course, unable to make the left and right sides of a V-shaped region belong to the same segment. To be more effective, the single pass must be followed by some kind of connected components merging algorithm in which pairs of neighboring regions having means that are close enough are combined into the same segment. This is easily accomplished by using the two-pass label propagation logic of the Lumia, Shapiro, and Zuniga (1983) connected components algorithm. After the top-bottom, left-right scan, each pixel has already been assigned a region label, and the means and scatters of each region can be recomputed and kept in a hash table. In the bottom-up, right-left scan, whenever a pair of pixels from different regions neighbor each other, a T-test can check for the significance of the difference between the region means. If the means are not significantly different, then the regions can be merged. A slightly stricter criterion would insist not only that the region means be similar, but also that the neighboring pixels from the different regions be similar enough. Figure 10.36 shows the resulting segmentation of the bulkhead image for a 0.2% significance level after one bottom-up, right-left merging pass and after all regions smaller than 25 pixels have been removed.

Figure 10.36 Two-pass centroid-linkage segmentation of the bulkhead image of Fig. 10.34. A significance level of 0.2% was used on both passes, and regions having fewer than 25 pixels were eliminated.

10.4 Hybrid-Linkage Combinations

The previous section mentioned the simple combination of centroid-linkage and single-linkage region growing. In this section we discuss the more powerful hybrid-linkage combination technique. The centroid linkage and the hybrid linkage can be combined in a way that takes advantage of their relative strengths. The strength of the single linkage is that boundaries are placed in a spatially accurate way. Its weakness is that edge gaps result in excessive merging. The strength of centroid linkage is its ability to place boundaries in weak gradient areas. It can do this because it does not depend on a large difference between the pixel and its neighbor to declare a boundary; it depends on a large difference between the pixel and the mean of the neighboring region to declare a boundary.

The combined centroid-hybrid linkage technique does the obvious thing. Centroid linkage is done only for nonedge pixels; that is, region growing is not permitted across edge pixels. Edge pixels are assigned to their closest labeled neighbor. Thus if the parameters of centroid linkage were set so that any difference between pixel value and region mean, however large, was considered small enough to permit merging, the two-pass hybrid combination technique would produce the connected components of the nonedge pixels. As the difference criterion is made more strict, the centroid linkage will produce boundaries in addition to those produced by the edges.

Figure 10.37 illustrates a one-pass scan combined centroid- and hybrid-linkage segmentation scheme using a significance-level test of 0.2%. Notice that the resulting segmentation is much finer than that shown in Figs. 10.35 and 10.36. Also, the dominant boundaries are nicely curved and smooth. Figure 10.38 illustrates the two-pass scan combined centroid- and hybrid-linkage segmentation scheme using a significance-level test of 0.2%. The regions are somewhat simpler because of the merging done in the second pass.

Figure 10.37 One-pass combined centroid- and hybrid-linkage segmentation of the bulkhead image of Fig. 10.34. A significance level of 0.2% was used.

Figure 10.38 Two-pass combined centroid- and hybrid-linkage segmentation of the bulkhead image of Fig. 10.34. A significance level of 0.2% was used on both passes.

10.5 Spatial Clustering

It is possible to determine the image segments by simultaneously combining clustering in measurement space with a spatial region growing. We call such a technique spatial clustering. In essence, spatial-clustering schemes combine the histogram-mode-seeking technique with a region-growing or a spatial-linkage technique.

Haralick and Kelly (1969) suggest that segmentation be done by first locating, in turn, all the peaks in the measurement-space histogram and then determining all pixel locations having a measurement on the peak.

Next, beginning with a pixel corresponding to the highest peak not yet processed, both spatial and measurement-space region growing are simultaneously performed in the following manner. Initially each segment is the pixel whose value is on the current peak. Consider for possible inclusion into this segment a neighbor of this pixel (in general, the neighbors of the pixel we are growing from) if the neighbor's value (an N-tuple for an N-band image) is close enough in measurement space to the pixel's value and if its probability is not larger than the probability of the value of the pixel we are growing from.

Milgram (1979) defines a segment for a single-band image to be any connected component of pixels all of whose values lie in some interval I and whose border has a higher coincidence with the border created by an edge operator than for any other interval I. Milgram does report good results from segmenting white blobs against a black background. However, the technique does have to try many different intervals for each segment, and extending it to efficient computation in multiband images appears difficult. Milgram and Kahl (1979) discuss embedding this technique into the Ohlander (1978) recursive control structure. The technique has the advantage over the Haralick and Kelly technique in that it does not require the difficult measurement-space exploring done in climbing down a mountain.

Minor and Sklansky (1981) make more active use of the gradient edge image than Milgram but restrict themselves to the more constrained situation of small convexlike segments. They begin with an edge image in which each edge pixel contains the direction of the edge. The orientation is such that the higher-valued gray level is to the right of the edge. Then each edge sends out for a limited distance a message to nearby pixels in a direction orthogonal to the edge direction. The message indicates the sender's edge direction. The spoke filter of Minor and Sklansky counts the number of distinct directions appearing in each 3 x 3 neighborhood. If the count is high enough, Minor and Sklansky mark the center pixel as belonging to the interior of a region: pixels that pick up these messages from enough different directions must be interior to a segment. Then the connected component of all marked pixels is obtained. The gradient-guided segmentation is completed by performing a region growing of the components. The region growing must stop at the high-gradient pixels, thereby assuring that no undesired boundary placements are made.

Burt, Hong, and Rosenfeld (1981) describe a spatial-clustering scheme that is a spatial pyramid constrained ISODATA kind of clustering. The bottom layer of the pyramid is the original image. Each successive higher layer of the pyramid is an image having half the number of pixels per row and half the number of rows of the image below it. Figure 10.39 illustrates the pyramid structure. Initial links between layers are established by linking each parent pixel to the spatially corresponding 4 x 4 block of child pixels. Each child pixel is linked to a 2 x 2 block of parent pixels, so each pair of adjacent parent pixels has eight child pixels in common. The iterations proceed by assigning to each parent pixel the average of its child pixels. Then each child pixel compares its value with each of its parent's values and links itself to its closest parent. Each parent's new value is the average of the children to which it is linked. The iterations converge reasonably quickly for the same reason the ISODATA iterations converge. If the top layer of the pyramid is a 2 x 2 block of great-grandparents, then there are at most four segments that are the respective great-grandchildren of these four great-grandparents. Matsumoto, Naka, and Yanamoto (1981) discuss a variation of this idea.
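One linking iteration can be sketched as follows. This is an illustrative simplification, not Burt, Hong, and Rosenfeld's exact bookkeeping: the index arithmetic below approximates the overlapped 4 x 4 / 2 x 2 link structure, and the parent layer is assumed to be at least 2 x 2:

    import numpy as np

    def link_iteration(child, parent):
        H, W = child.shape
        h, w = parent.shape                 # h = H // 2, w = W // 2
        sums = np.zeros((h, w))
        counts = np.zeros((h, w))
        for i in range(H):
            for j in range(W):
                pi = min(max((i - 1) // 2, 0), h - 2)
                pj = min(max((j - 1) // 2, 0), w - 2)
                # Link the child to the closest of its 2 x 2 candidate parents.
                cands = [(a, b) for a in (pi, pi + 1) for b in (pj, pj + 1)]
                a, b = min(cands, key=lambda ab: abs(parent[ab] - child[i, j]))
                sums[a, b] += child[i, j]
                counts[a, b] += 1
        # Each parent's new value is the average of the children linked to it.
        return np.where(counts > 0, sums / np.maximum(counts, 1), parent)

Iterating this between every pair of adjacent layers until the parent values stop changing gives the converged linkage from which segments are read off at the top layer.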

10.6 Split and Merge

A splitting method for segmentation begins with the entire image as the initial segment. Then the method successively splits each of its current segments into quarters if the segment is not homogeneous enough, that is, if the difference between the largest and the smallest gray level is large. A merging method starts with an initial segmentation and successively merges regions that are similar enough. Splitting algorithms were first suggested by Robertson (1973) and Klinger (1973). Kettig and Landgrebe (1975) try to split all nonuniform 2 x 2 neighborhoods before beginning the region merging. Fukada (1980) suggests successively splitting a region into quarters until the sample variance is small enough. Because segments are successively divided into quarters, the boundaries produced by the split technique tend to be squarish and slightly artificial. Sometimes adjacent quarters coming from adjacent split segments need to be joined rather than remain separate. Horowitz and Pavlidis (1976) suggest the split-and-merge strategy to take care of this problem.

The image is represented by a segmentation tree, which is a quadtree data structure (a tree whose nonleaf nodes each have four children). The entire image is represented by the root node. The children of the root are the regions obtained by splitting the root into four equal pieces, and so on. A segmentation is represented by a cutset, a minimal set of nodes separating the root from all of the leaves. In the tree structure the merging process consists of removing four nodes from the cutset and replacing them with their parent. Splitting consists of removing a node from the cutset and replacing it with its four children. The two processes are mutually exclusive: all the merging operations are followed by all the splitting operations. Efficiency of the split-and-merge method can be increased by arbitrarily partitioning the image into square regions of a user-selected size and then splitting these further if they are not homogeneous. The splitting and merging in the tree structure is followed by a final grouping procedure that can merge adjacent unrelated blocks found in the final cutset.

Muerle and Allen (1968) suggest merging a pair of adjacent regions if a statistical test determines that their gray level intensity distributions are similar enough. They recommend the Kolmogorov-Smirnov test. Pietikäinen and Rosenfeld (1981) extend this technique to segment an image using textural features. They begin with an initial segmentation achieved by splitting into rectangular blocks of a prespecified size. Figure 10.40 illustrates the result of a Horowitz and Pavlidis type split-and-merge segmentation of the bulkhead image.
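A sketch of the splitting phase on a square image whose side is a power of two (assuming NumPy; the max-min homogeneity test and the size floor are illustrative parameters):

    import numpy as np

    def split(image, r, c, size, max_range=20, min_size=8):
        # Yield the leaf blocks of the quadtree: a block is split into
        # quarters while the gray level range inside it is too large.
        block = image[r:r + size, c:c + size]
        if size <= min_size or int(block.max()) - int(block.min()) <= max_range:
            yield (r, c, size)
            return
        half = size // 2
        for dr in (0, half):
            for dc in (0, half):
                yield from split(image, r + dr, c + dc, half, max_range, min_size)

    # leaves = list(split(image, 0, 0, 512))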

Chen and Pavlidis (1980) suggest using statistical tests for uniformity rather than a simple examination of the difference between the largest and the smallest gray level intensities in the region under consideration for splitting. The uniformity test requires that there be no significant difference between the mean of the region and the mean of each of its quarters. The Chen and Pavlidis tests assume that the variances of the quarters are equal and known. We give here the F-test for testing the hypothesis that the means and variances of the quarters are identical; the value of the variance is not assumed known. This is the optimal test when the randomness can be modeled as arising from additive Gaussian-distributed variates. Let each quarter have K pixels, with X_ij being the jth pixel in the ith region; let X̄_i be the mean of the ith quarter; and let X̄ be the grand mean of all the pixels in the four quarters. Under the assumption that the regions are independent and identically distributed normals, the optimal test is given by the statistic F, which is defined by

    F = [ Σ_{i=1}^{4} K (X̄_i - X̄)^2 / 3 ] / [ Σ_{i=1}^{4} Σ_{j=1}^{K} (X_ij - X̄_i)^2 / (4(K - 1)) ]

If the four quarters are identical, F has an F_{3,4(K-1)} distribution. If F is too high, the region is declared not uniform.

The data structures required to do a split and merge on images larger than 512 x 512 are extremely large. Execution of the algorithm on virtual-memory computers results in so much paging that the dominant activity may be paging rather than segmentation. Browning and Tanimoto (1982) give a description of a space-efficient version of the split-and-merge scheme that can handle large images, using only a small amount of main memory.

Figure 10.40 Split-and-merge segmentation of the bulkhead image of Fig. 10.10.
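The F-statistic is an ordinary one-way analysis of variance across the four quarters, so it can be computed directly. A sketch, assuming the quarters are passed as four equal-sized arrays; scipy's standard F distribution supplies the critical value:

import numpy as np
from scipy.stats import f as f_dist

def quarter_uniformity_test(quarters, alpha=0.05):
    # quarters: list of four arrays of K pixels each.
    K = quarters[0].size
    means = np.array([q.mean() for q in quarters])
    grand = np.concatenate([q.ravel() for q in quarters]).mean()
    between = K * np.sum((means - grand) ** 2) / 3.0
    within = sum(((q - m) ** 2).sum()
                 for q, m in zip(quarters, means)) / (4.0 * (K - 1))
    F = between / within
    # The region is declared not uniform when F is too high.
    return F > f_dist.ppf(1.0 - alpha, 3, 4 * (K - 1))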

10.7 Rule-Based Segmentation

The rules behind each of the methods discussed so far are encoded in the procedures of the method. Thus it is not easy to try different concepts without complete reprogramming. Nazif and Levine (1984) solve this problem with a rule-based expert system for segmentation. The knowledge in the system is not application domain specific; instead, it includes general-purpose, scene-independent knowledge about images and grouping criteria.

The Nazif and Levine system contains a set of processes (the initializer, the line analyzer, the region analyzer, the area analyzer, the focus of attention, and the scheduler) plus two associative memories, the short-term memory (STM) and the long-term memory (LTM). The short-term memory holds the input image, the segmentation data, and the output. The long-term memory contains the model representing the system knowledge about low-level segmentation and control strategies. As described in detail in Chapter 19, a system process matches rules in the LTM against the data stored in the STM. When a match occurs, the rule fires, and an action, usually involving data modification, is performed.

The model stored in the LTM has three levels of rules. At level 1 are knowledge rules that encode information about the properties of regions, lines, and areas in the form of situation-action pairs. The conditions of the rules in the rule base are made up of (1) a symbolic qualifier depicting a logical operation to be performed on the data, (2) a symbol denoting the data entry on which the condition is to be matched, (3) a feature of this data entry, (4) an optional NOT qualifier, and (5) an optional DIFFERENCE qualifier that applies the operation to differences in feature values. Table 10.1 shows the different types of data entries allowed, and Tables 10.2 to 10.4 show the different kinds of features. Knowledge rules are classified by their actions. The specific actions include splitting a region, merging two regions, adding, deleting, or extending a line, merging two lines, and creating or modifying a focus-of-attention area. Tables 10.5 and 10.6 show the possible actions that can be associated with a rule.

At level 2 are the control rules, which are divided into two categories: focus-of-attention rules and inference rules. Focus-of-attention rules find the next data entry to be considered: a region, a line, or an entire area. These rules control the focus-of-attention strategy. The inference rules are metarules in that their actions do not modify the data in the STM. Instead, they alter the matching order of different knowledge rule sets. Thus they control which process will be activated next. At level 3, the highest rule level, are strategy rules that select the set of control rules that executes the most appropriate control strategy for a given set of data. Table 10.7 illustrates several rules from the system.

The Nazif and Levine approach to segmentation is useful because it is general but allows more specific strategies to be incorporated without changing the code. Other rule-based segmentation systems tend to use high-level-knowledge models of the expected scene instead of general rules. The paper by McKeown discussed in Chapter 19 takes this approach for aerial images of airport scenes. Rule-based systems including higher-level knowledge are discussed in Chapter 19.
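The situation-action format is straightforward to mirror in code. The following toy Python sketch shows one plausible encoding of the condition structure just described (the DIFFERENCE qualifier is omitted for brevity); the class layout and the dictionary standing in for the short-term memory are illustrative assumptions, not Nazif and Levine's implementation.

from dataclasses import dataclass

@dataclass
class Condition:
    entry: str              # data entry symbol from Table 10.1, e.g. 'REG'
    feature: str            # feature name, e.g. 'SIZE'
    value: str              # symbolic value, e.g. 'HIGH', 'VERY LOW'
    negated: bool = False   # the optional NOT qualifier

@dataclass
class Rule:
    conditions: list        # all conditions must hold for the rule to fire
    action: str             # e.g. 'MERGE the two REGIONS'

def fire(rule, stm):
    # stm: dict mapping (entry, feature) -> symbolic value; it stands in
    # for the short-term memory.  Returns the action or None.
    for cond in rule.conditions:
        holds = stm.get((cond.entry, cond.feature)) == cond.value
        if holds == cond.negated:       # NOT flips the sense of the test
            return None
    return rule.action

merge_rule = Rule([Condition('REG', 'SIZE', 'VERY LOW'),
                   Condition('REGA', 'ADJACENCY', 'HIGH')],
                  'MERGE the two REGIONS')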

Table 10.1 Allowable data entry types in the Nazif and Levine rule-based segmentation system.

    Data Entry                                 Symbol
    Current region                             REG
    Current line                               LINE
    Current area                               AREA
    Region ADJACENT to current region          REGA
    Region to the LEFT of current line         REGL
    Region to the RIGHT of current line        REGR
    Line NEAR current line                     LINEN
    Line in FRONT of current line              LINEF
    Line BEHIND current line                   LINEB
    Line PARALLEL TO current line              LINEP
    Line INTERSECTING current region           LINEI

Table 10.2 Numerical descriptive features that can be associated with the condition part of a rule.

    Feature 1              Feature 2              Feature 3
    Variance 1             Variance 2             Variance 3
    Intensity              Intensity variance     Gradient
    Gradient variance      X-centroid             Y-centroid
    Minimum X              Minimum Y              Maximum X
    Maximum Y              Starting X             Starting Y
    Ending X               Ending Y               Starting direction
    Ending direction       Average direction      Length
    Start-end distance     Size                   Perimeter
    Histogram bimodality   Circularity            Aspect ratio
    Uniformity 1           Uniformity 2           Uniformity 3
    Region contrast 1      Region contrast 2      Region contrast 3
    Line contrast 1        Line contrast 2        Line contrast 3
    Line connectivity      Number of regions      Number of lines
    Number of areas

Table 10.3 Numerical spatial features that can be associated with the condition part of a rule.

    Number of ADJACENT regions         Adjacency values
    Number of INTERSECTING regions     Line content between regions
    Distance to line in FRONT          Nearest point on line in FRONT
    Distance to line BEHIND            Nearest point of line BEHIND
    Distance to PARALLEL line          Number of PARALLEL points
    Adjacency of LEFT region           Adjacency of RIGHT region
    Number of lines in FRONT           Number of lines BEHIND
    Number of PARALLEL lines           Number of regions to the LEFT
    Number of regions to the RIGHT

Table 10.4 Logical features that can be associated with the condition part of a rule.

    Histogram is bimodal               Line is closed
    Line is open                       Line end is open
    Line is loop                       Line is clockwise
    Line start is open                 Area is textured
    Area is smooth                     Area is new
    Area is bounded                    One region to the RIGHT
    One region to the LEFT             Region is bisected by line
    Same region LEFT and RIGHT of line
    Same region LEFT of line 1 and line 2
    Same region RIGHT of line 1 and line 2
    Same region LEFT of line 1 and RIGHT of line 2
    Same region RIGHT of line 1 and LEFT of line 2
    Two lines are touching (8-connected)
    Areas are absent                   Regions are absent
    Lines are absent                   System is starting
    Process was lines                  Process was regions
    Process was areas                  Process was focus
    Process was generate areas         Process was active

Table 10.5 Area, region, and line analyzer actions that can be associated with a rule.

    Area Analyzer Actions
    Create smooth area          Create texture area        Create bounded area
    Relabel area to smooth      Relabel area to texture    Relabel area to bounded
    Add to smooth area          Add to texture area        Add to bounded area
    Delete area

    Region Analyzer Actions
    Save smooth area            Save texture area          Save bounded area
    Split a region by histogram Split region at lines      Merge two regions

    Line Analyzer Actions
    Extend line forward         Extend line backward
    Join lines forward          Join lines backward
    Insert line forward         Insert line backward
    Merge lines forward         Merge lines backward
    Delete line

10.8 Motion-Based Segmentation

In time-varying image analysis (Chapter 15) the data are a sequence of images instead of a single image. One paradigm under which such a sequence can arise is with a stationary camera viewing a scene containing moving objects. In each frame of the sequence after the first frame, the moving objects appear in different positions of the image from those in the previous frame. Thus the motion of the objects creates a change in the images that can be used to help locate the moving objects.

Jain, Martin, and Aggarwal (1979) used differencing operations to identify areas containing moving objects. The images of the moving objects were obtained by focusing the segmentation processes on these restricted areas. Thompson (1980) developed a method for partitioning a scene into regions corresponding to surfaces with distinct velocities. He first computed velocity estimates for each point of the scene (see Chapter 15) and then performed the segmentation by a region-merging procedure that combined regions based on similarities in both intensity and motion. In this way motion was used as a cue to the segmentation process.
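The differencing cue itself is simple to sketch: threshold the absolute difference of consecutive frames to obtain a mask of candidate moving-object areas on which segmentation can then be focused. The threshold is an assumed parameter; the published procedure involves more than this single step.

import numpy as np

def change_mask(frame0, frame1, threshold):
    # Candidate moving-object pixels: gray level changed by more than
    # threshold between consecutive frames.
    return np.abs(frame1.astype(int) - frame0.astype(int)) > threshold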

Table 10.6 Focus-of-attention and supervisor actions that can be associated with a rule.

    Focus-of-Attention Actions
    Region with highest adjacency      Largest ADJACENT region
    Region with lowest adjacency       Smallest ADJACENT region
    Region with higher label           Next scanned region
    Region to the LEFT of line         Region to the RIGHT of line
    Closest line in FRONT              Closest line BEHIND
    Closest PARALLEL line              Shortest line that is near
    Longest line that is near          Strongest line that is near
    Weakest line that is near          Line with higher label
    Next scanned line                  Line INTERSECTING region
    Defocus (focus on whole image)     Focus on areas
    Clear region list                  Clear line list
    Freeze area                        Next area (any)
    Next smooth area                   Next texture area
    Next bounded area

    Supervisor Actions
    Initialize regions      Match region rules      Match focus rules
    Initialize lines        Match line rules        Start
    Generate areas          Match area rules        Stop

Table 10.7 Examples of rules from the Nazif and Levine system.

Region-Merging Rule:
IF:   1. The REGION SIZE is VERY LOW
      2. The ADJACENCY with another REGION is HIGH
      3. The DIFFERENCE in REGION FEATURE 1 is NOT HIGH
      4. The DIFFERENCE in REGION FEATURE 2 is NOT HIGH
      5. The DIFFERENCE in REGION FEATURE 3 is NOT HIGH
THEN: MERGE the two REGIONS

Region-Splitting Rule:
IF:   1. The REGION SIZE is NOT LOW
      2. The REGION AVERAGE GRADIENT is HIGH
      3. The REGION HISTOGRAM is BIMODAL
THEN: SPLIT the REGION according to the HISTOGRAM

Line-Merging Rule:
IF:   1. The LINE END point is OPEN
      2. The LINE GRADIENT is NOT VERY LOW
      3. The DISTANCE to the LINE IN FRONT is NOT VERY HIGH
      4. The two LINES have the SAME REGION to the LEFT
      5. The two LINES have the SAME REGION to the RIGHT
THEN: JOIN the LINES by FORWARD expansion

A Control Rule:
IF:   1. The LINE GRADIENT is HIGH
      2. The LINE LENGTH is HIGH
      3. SAME REGION LEFT and RIGHT of the LINE
THEN: GET the REGION to the LEFT of the LINE

Jain (1984) handled the more complex problem of segmenting dynamic scenes using a moving camera. He used the known location of the focus of expansion (see Chapter 15) to transform the original frame sequence into another camera-centered sequence. The ego-motion polar transform (EMP) works as follows: Suppose that A is a point in 3-space having coordinates (x, y, z), and the camera at time 0 is located at (x0, y0, z0). During the time interval between frames, the camera undergoes displacement (dx0, dy0, dz0), and the point A undergoes displacement (dx, dy, dz). When the projection plane is at z = 1, the focus of expansion is at (dx0/dz0, dy0/dz0). The projection A' of point A after the displacements is at (X, Y) in the image plane, where

    X = (x + dx - x0 - dx0) / (z + dz - z0 - dz0)

and

    Y = (y + dy - y0 - dy0) / (z + dz - z0 - dz0)

The point A' is converted into its polar coordinates (r, θ), with the focus of expansion being the origin in the image plane. The polar coordinates are given by

    θ = tan^{-1} [ (dz0(y + dy - y0) - dy0(z + dz - z0)) / (dz0(x + dx - x0) - dx0(z + dz - z0)) ]    (10.20)

and

    r = [ (X - dx0/dz0)^2 + (Y - dy0/dz0)^2 ]^{1/2}

In (r, θ) space the segmentation is simplified. Assume that the transformed picture is represented as a two-dimensional image having θ along the vertical axis and r along the horizontal axis. If the camera continues its motion in the same direction, then the focus of expansion remains the same, and θ remains constant for the image of a stationary point. Thus the radial motion of a stationary point A' in the image plane due to the motion of the camera is converted to horizontal motion in (r, θ) space. If the camera has only a translational component to its motion, then all the regions that show only horizontal velocity in the (r, θ) space can be classified as due to stationary surfaces; the regions having a vertical velocity component are due to nonstationary surfaces. The segmentation algorithm first separates the stationary and nonstationary components on the basis of their velocity components in (r, θ) space. The stationary components are then further segmented into distinct surfaces by using the motion to assign relative depths to the surfaces.
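The image-plane step of the transform is easy to state in code. A sketch, assuming the focus of expansion (dx0/dz0, dy0/dz0) is known, as the text requires; the function name is ours:

import math

def emp(X, Y, dx0, dy0, dz0):
    # Polar coordinates (r, theta) of image point (X, Y) about the
    # focus of expansion (dx0/dz0, dy0/dz0) of a translating camera.
    fx, fy = dx0 / dz0, dy0 / dz0
    return math.hypot(X - fx, Y - fy), math.atan2(Y - fy, X - fx)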

Summary

We have briefly surveyed the place of segmentation in vision algorithms as well as common techniques of measurement-space clustering, single linkage, hybrid linkage, region growing, spatial clustering, and split and merge used in image segmentation. The single-linkage region-growing schemes are the simplest and most prone to the unwanted region merge errors. The hybrid and centroid region-growing schemes are better in this regard. The split-and-merge technique is not as subject to the unwanted region merge error, but it suffers from large memory usage and excessively blocky region boundaries. The measurement-space-guided spatial clustering tends to avoid both the region merge errors and the blocky boundary problems because of its primary reliance on measurement space. But the regions produced are not smoothly bounded, and they often have holes, giving the effect of salt and pepper noise. The spatial-clustering schemes may be better in this regard, but they have not been well enough tested. The hybrid-linkage schemes appear to offer the best compromise between having smooth boundaries and few unwanted region merges. All the techniques can be made more powerful if they are based on some kind of statistical test for equality of means, assuming that each region may have some small fraction of outliers, and more flexible if made part of a rule-based system. When the data form a time sequence of images instead of a single image, motion-based segmentation techniques can be used.

Not discussed as part of image segmentation is the fact that it might be appropriate for some segments to remain apart or to be merged not on the basis of the gray level distributions but on the basis of the object sections they represent. The use of this kind of semantic information in the image segmentation process is essential for higher-level image understanding work. The work of McKeown, which is discussed under knowledge-based systems in Chapter 19, describes a system that uses domain-specific knowledge in this manner.

Exercises

10.1. Write a program to generate controlled images for the purpose of segmentation. One model for the generation of a controlled image is to establish a background gray-level value and then place nonconnecting and noninterfering shapes, such as disks or polygons, each having a given gray level, on the image. Next additive Gaussian noise can be included with a given standard deviation. This noise can be correlated by preaveraging it with a Gaussian filter with a given standard deviation. Finally, outlier noise can be added by choosing a number of pixels to be affected by the outlier noise from a Poisson distribution, then choosing the location of the pixels to be affected by a uniform distribution over the spatial domain of the image, and then choosing the value of the affected pixels from a uniform distribution over the range of gray levels.

10.2. Think about how a figure of merit for a segmentation process can be defined. For example, a 100% correct segmentation can be defined as an image Ic in which each background pixel is labeled 0 and each different disk or polygon created on the synthetic image has all of its pixels labeled with the same label. Any algorithm-produced segmentation Is can be represented as an image in which each pixel is given a label designating the segment to which it belongs. A figure of merit for the segmentation Is with respect to the correct segmentation Ic can be created from the contingency table T, where T(i, j) counts the number of pixels having label i in Ic and label j in Is. Possible figures of merit include the degree to which Is is a refinement of Ic and the degree to which Is is a coarsening of Ic, each of which can be defined from the entries of T (see the sketch following these exercises). What aspects of segmentation errors are not included in these figures of merit? What other definitions of figures of merit can you think of?

10.3. Write a program to perform a segmentation by histogram-mode seeking.

10.4. Design and carry out an empirical experiment that characterizes the performance of any histogram-mode-seeking segmentation procedure in terms of the control parameters of the synthetic-image-generation process and in terms of the parameters of the histogram-mode-seeking algorithm. Control parameters include contrast between shape and background, standard deviation of noise (after any presmoothing), autocorrelation function of noise due to presmoothing, area of shape, kind of shape, and Poisson density parameter. Use a measure of performance from Exercise 10.2.

10.5. Write a program to perform a segmentation by a recursive histogram-directed spatial clustering.

10.6. Design and carry out empirical experiments that characterize the performance of a recursive histogram-directed spatial clustering in terms of the control parameters of the synthetic-image-generation process and in terms of the parameters of the clustering algorithm. Use a measure of performance from Exercise 10.2.
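For Exercise 10.2, the contingency table and two merit measures derived from it can be sketched as follows. The refinement and coarsening measures shown here, built from the column and row maxima of T, are one plausible reading of the exercise and should be treated as assumptions.

import numpy as np

def contingency(Ic, Is):
    # T[i, j] = number of pixels labeled i in the correct segmentation Ic
    # and j in the machine segmentation Is (small nonnegative int labels).
    T = np.zeros((Ic.max() + 1, Is.max() + 1), dtype=int)
    np.add.at(T, (Ic.ravel(), Is.ravel()), 1)
    return T

def refinement(T):
    # High when each machine segment lies mostly within one true segment.
    return T.max(axis=0).sum() / T.sum()

def coarsening(T):
    # High when each true segment lies mostly within one machine segment.
    return T.max(axis=1).sum() / T.sum()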

CHAPTER 11

ARC EXTRACTION AND SEGMENTATION

Introduction

After edge labeling or image segmentation, sets or sequences of labeled or border pixel positions can be extracted by a grouping operation. Each set or sequence contains pixel positions that are considered to belong to the same curve. To set things up for matching, an analytic description of the curve must be determined by a suitable fitting operation. Ledley (1964) was one of the first researchers to develop such a technique.

A labeling operation such as edge detection labels each pixel as edge or no edge. If it is an edge, additional properties, such as edge direction, gradient magnitude, and edge contrast, may be associated with the pixel position. The next processing step is typically a grouping operation, in which edge pixels participating in the same region boundary are grouped together into a sequence. Then the boundary sequence can be segmented into simple pieces, and some analytic description of each boundary piece can be determined.

An image-segmentation operation groups together connected pixels with similar properties and labels each pixel with an index of the region of pixels to which it belongs. The next processing step here can be one of determining all the boundary pixels participating in the same region boundary, segmenting the boundary sequence into simple pieces, and determining some analytic description of each boundary piece suitable for some higher-level shape-matching operation.

In this chapter we discuss techniques for extracting, from a segmented or labeled image, sequences of pixels that belong to the same curve. Given any such sequence of pixels, we show how to segment it into simple pieces and analytically fit a curve to the points in any piece.

11.2 Extracting Boundary Pixels from a Segmented Image

Once a set of regions has been determined by a procedure such as segmentation or connected components, the boundary of each region may be extracted. Boundary extraction can be done simply for small-sized images: Scan through the image and make a list of the first border pixel for each connected component. Then for each region, begin at its first border pixel and follow the border of the connected component around in a clockwise direction until the tracking reaches the first border pixel again. For large-sized images, which may not be able to reside in memory, the simple border-tracking algorithm just outlined results in excessive I/O to the mass storage device on which the image resides. In this section we describe an algorithm called border, which can extract the boundaries for all regions in one left-right, top-bottom scan through the image. Border inputs a symbolic image and outputs, for each region, a clockwise-ordered list of the coordinates of its border pixels. The algorithm is flexible in that it can be easily modified to select the borders of specified regions.

11.2.1 Concepts and Data Structures

The input image is a symbolic image whose pixel values denote region labels. It is assumed that there is one background label that designates those pixels in part of a possibly disconnected background region whose borders do not have to be found. Rather than tracing all around the border of a single region and then moving on to the next region, the border algorithm moves in a left-right, top-bottom scan down the image, collecting chains of border pixels that form connected sections of the borders of regions. At any given time during execution of the algorithm, there is a set of current regions whose borders have been partially scanned but not yet output, a set of past regions that have been completely scanned and their borders output, and a set of future regions that have not yet been reached by the scan.

The data structures contain the chains of border pixels of the current regions. Since there may be a huge number of region labels in the symbolic image, but at most 2 * number_of_columns may be active at once, a hash table can be used as the device to allow rapid access to the chains of a region, given the label of the region. When a new region is encountered during the scan, it is added to the hash table; when a region is completed and output, it is removed from the hash table. The hash table entry for a region points to a linked list of chains that have been formed so far for that region. Each chain is a linked list of pixel positions that can be grown from the beginning or the end.

11.2.2 Border-Tracking Algorithm

The border-tracking algorithm examines three rows of the symbolic image at a time: the current row being processed, the row above it, and the row below it. Two dummy rows of background pixels are appended to the image, one on top and one

on the bottom, so that all rows can be treated alike. The algorithm is expressed in high-level pseudocode for an NLINES by NPIXELS symbolic image S as follows:

procedure border;
for R := 1 to NLINES do
    begin
    for C := 1 to NPIXELS do
        begin
        LABEL := S(R,C);
        if new_region(LABEL) then add(CURRENT,LABEL);
        NEIGHB := neighbors(R,C,LABEL);
        T := pixeltype(R,C,NEIGHB);
        if T == 'border'
        then for each pixel N in NEIGHB do
            begin
            CHAINSET := chainlist(LABEL);
            NEWCHAIN := true;
            for each chain X in CHAINSET while NEWCHAIN do
                if N == rear(X)
                then begin add(X,(R,C)); NEWCHAIN := false end
            end for;
            if NEWCHAIN
            then make_new_chain(CHAINSET,(R,C),LABEL)
            end
        end for
        end
    end for;
    for each region REG in CURRENT do
        if complete(REG)
        then begin
            connect_chains(REG);
            output(REG);
            free(REG)
            end
    end for
    end
end for
end border;

In this procedure, S is the name of the symbolic image, so S(R,C) is the value (LABEL) of the current pixel being scanned. If this is a new label, it is added to the set CURRENT of current region labels. NEIGHB is the list of neighbors of pixel (R,C) that have the label LABEL. The function pixeltype looks at the values of (R,C) and its neighbors to decide whether (R,C) is a nonbackground border pixel. If so, the procedure searches for a chain of the region with label LABEL that has a neighbor of (R,C) at its rear, and if it finds one, it appends (R,C) to the end of the chain by the procedure add, whose first argument is a chain and whose second argument is (R,C). If no neighbor of (R,C) is at the rear of a chain of this region, then a new chain is created containing (R,C) as its only element by the procedure make_new_chain, whose first argument is the set of chains to which a new chain

is being added. Its second argument is the new chain's sole element, the location (R,C), and its third argument is the label LABEL to be associated with the new chain. After each row R is scanned, the chains of those current regions whose borders are now complete are merged into a single border chain, which is output. The hash table entries and list elements associated with those regions are then freed. Figure 11.1 shows a symbolic image and its output from the border procedure.

Figure 11.1 Action of the border procedure on a symbolic image: (a) a symbolic image with two regions; (b) the output of the border procedure (for each region, its label, border length, and clockwise-ordered list of border pixel positions).

11.3 Linking One-Pixel-Wide Edges or Lines

The border-tracking algorithm in the previous section required as input a symbolic image denoting a set of regions. It tracked along the border of each region in parallel as it scanned the image line by line. Because of the assumption that each border bounded a closed region, there was never any point at which a border could be split into two or more segments. When the input is instead a symbolic edge (line) image with a value of 1 for edge (line) pixels and 0 for nonedge (nonline) pixels, the problem of tracking edge (line) segments is more complex. Here it is not necessary for edge pixels to bound closed regions, and the segments consist of connected edge (line) pixels that go from endpoint, corner, or junction to endpoint, corner, or junction with no intermediate junctions or corners. Figure 11.2 illustrates such a symbolic edge (line) image. Pixel (3,3) of the image is a junction pixel, where three different edge (line) segments meet. Pixel (5,3) is a corner pixel and may be considered a segment endpoint as well if the application requires ending segments at corners. An algorithm that tracks segments like these has to be concerned with

the following tasks:

1. Starting a new segment
2. Adding an interior pixel to a segment
3. Ending a segment
4. Finding a junction
5. Finding a corner

As in border tracking, efficient data structure manipulation is needed to manage the information at each step of the procedure. The data structures used are very similar to those used in the border algorithm. Instead of past, current, and future regions, there are past, current, and future segments. Segments are lists of edge points that represent straight or curved lines on the image. Current segments are kept in internal memory and accessed by a hash table. Finished segments are written out to a disk file, and their space in the hash table is freed. The main difference is the detection of junction points and of the segments entering them from above or the left and the segments leaving them from below or the right.

We will assume an extended neighborhood operator called pixeltype that determines whether a pixel is an isolated point, the starting point of a new segment, an interior pixel of an old segment, an ending point of an old segment, a junction, or a corner. If the pixel is an interior or ending point of an old segment, the segment id of the old segment is also returned. If the pixel is a junction or a corner point, then a list of segment ids of incoming segments and a list of pixels representing outgoing segments are returned. A procedure for tracking edges in a symbolic image is given below.

Figure 11.2 Symbolic edge image containing a junction of three line segments at pixel (3,3) and a potential corner at pixel (5,3).

Figure 11.3 Output of the edge-track procedure on the image of Fig. 11.2, assuming the point (5,3) is judged to be a corner point. If corner points are not used to terminate segments, then segment 3 would have length 5 and pixel list ((3,3),(4,3),(5,3),(5,4),(5,5)).

procedure edge_track;
IDNEW := 0;
for R := 1 to NLINES do
    for C := 1 to NPIXELS do
        begin
        NAME := address(R,C);
        NEIGHB := neighbors(R,C);
        T := pixeltype(R,C,NEIGHB,ID,INLIST,OUTLIST);
        case
            T = isolated point:
                next;
            T = start point of new segment:
                begin
                IDNEW := IDNEW + 1;
                make_new_segment(IDNEW,NAME)
                end;
            T = interior point of old segment:
                add(ID,NAME);
            T = end point of old segment:
                begin
                add(ID,NAME);
                output(ID);
                free(ID)
                end;
            T = junction or corner point:
                begin
                for each ID in INLIST do
                    begin
                    add(ID,NAME);
                    output(ID);
                    free(ID)
                    end;
                for each pixel in OUTLIST do
                    begin
                    IDNEW := IDNEW + 1;
                    make_new_segment(IDNEW,NAME)
                    end
                end
        end case
        end
    end for
end for
end edge_track;

The exact details of keeping track of the segment ids entering and leaving a junction have been suppressed. This part of the procedure can be very simple and assume that every pixel adjacent to a junction pixel is part of a different segment. In this case, if the segments are more than one pixel wide, the algorithm will detect

a large number of small segments that are really not new line segments at all. This can be avoided by applying the connected shrink operator discussed in Chapter 6 to the edge image. Another alternative would be to make the pixeltype operator even smarter. It can look at a larger neighborhood and use heuristics to decide whether this is just a thick part of the current segment or a new segment is starting. Often the application will dictate what these heuristics should be.

11.4 Edge and Line Linking Using Directional Information

The procedure edge_track described in the previous section linked one-pixel-wide edges that had no directional information associated with them. In this section we assume each pixel is marked to indicate whether it is an edge (line) or not, and if so, the angular direction of the edge (line) is associated with it. Edge (line) linking is a process by which labeled pixels that have similar enough directions can form connected chains and be identified as an arc segment that will have a good fit to a simple curvelike line.

The linking proceeds by scanning the labeled edge image in a top-down, left-right scan. If a labeled pixel is encountered that has no previously encountered labeled neighbors, then it begins a new group with a group mean initialized as the direction of the edge or line, a number of pixels in the group initialized at N0, and the scatter of the group initialized at N0*σ0², where N0 and σ0² are specified initial values. Here σ0² can be considered the a priori variance, before any data have been examined, and N0 can be considered the weight of this a priori variance. If the labeled pixel has a previously encountered labeled neighbor, then the statistically closest group to which these neighbors belong is identified. Because angle is a quantity that is modulo 360°, some care must be used in this determination. If γ is the mean angle for some group and θ is the angular direction for the given pixel, then the angular direction θmin, which is the closest of θ, θ + 360°, and θ - 360° to γ, is defined in the following way:

    θmin = θ    if |θ - γ| < |θ* - γ|
           θ*   otherwise

where

    θ* = θ + 360°   if θ - γ < 0
         θ - 360°   otherwise

If the group has N pixels and a scatter of S², then the statistical closeness of θ to γ can be measured by a t-statistic having N - 1 degrees of freedom. The given pixel is then added to the group having the smallest t-value, provided that t < T_α, a percentage point on the cumulative t-distribution with N - 1 degrees of freedom.
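The modulo-360 bookkeeping and the closeness test can be sketched directly. The t-statistic below uses the usual single-new-observation form with scatter S² and N - 1 degrees of freedom; since the text's own formula is not reproduced above, that exact form is an assumption.

import math

def theta_min(theta, gamma):
    # The representative of theta (mod 360) closest to the group mean gamma.
    return min((theta, theta + 360.0, theta - 360.0),
               key=lambda t: abs(t - gamma))

def closeness_t(theta, gamma, S2, N):
    # t-value (N - 1 degrees of freedom) of a new direction against a
    # group with mean gamma, scatter S2, and N members.
    tm = theta_min(theta, gamma)
    return (tm - gamma) / math.sqrt(S2 * (N + 1) / (N * (N - 1.0)))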

After the pixel is linked into its closest group, the mean and scatter of the group are updated: the new mean is γ' = (Nγ + θmin)/(N + 1), and the new scatter is S'² = S² + (θmin - γ')(θmin - γ), the scatter of the N + 1 directions about the new mean.

If there were two or more previously encountered labeled neighbors, then after the pixel is linked into its closest group, a test can be done to determine whether the two groups closest to the given pixel should be merged. This merging test is accomplished by another t-test. Suppose the group means are γ1 and γ2, the group scatters are S1² and S2², and the numbers of pixels in each group are N1 and N2, respectively. Define γ2* by

    γ2* = γ2    if |γ1 - γ2| < |γ1 - γ2'|
          γ2'   otherwise

where

    γ2' = γ2 + 360°   if γ2 - γ1 < 0
          γ2 - 360°   otherwise

The t-statistic having N1 + N2 - 2 degrees of freedom is defined by

    t = (γ1 - γ2*) / sqrt[ ((S1² + S2²)/(N1 + N2 - 2)) (1/N1 + 1/N2) ]

If t < T_{α,N1+N2-2}, the two groups are merged, creating a new group having N1 + N2 pixels, mean γ = (N1 γ1 + N2 γ2*)/(N1 + N2), and scatter S² = S1² + S2² + N1(γ1 - γ)² + N2(γ2* - γ)².

11.5 Segmentation of Arcs into Simple Segments

The border-tracking and edge-linking algorithms we have discussed produce extracted digital arcs. An extracted digital arc is a sequence of row-column pairs in which the spatial coordinates of successive row-column pairs are close together. In fact, most often we would expect successive row-column pairs of an extracted digital arc to be digital 4-neighbors or digital 8-neighbors. However, in the development that follows, we need only the assumption of spatial closeness and not the assumption of 4-neighboring or 8-neighboring. Arc segmentation is a process that partitions an extracted digital arc sequence into digital arc subsequences having the property that each digital arc subsequence is a maximal sequence that can fit a straight or curved line of a given type. The endpoints of the subsequences are called corner points or dominant points. The basis for the partitioning process is the identification of all locations (a) that have a
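The merging test and the combined group statistics follow the formulas above; a sketch:

import math

def merge_t(gamma1, S1, N1, gamma2, S2, N2):
    # Pooled two-sample t (N1 + N2 - 2 degrees of freedom) for deciding
    # whether two direction groups should be merged.
    g2 = min((gamma2, gamma2 + 360.0, gamma2 - 360.0),
             key=lambda g: abs(gamma1 - g))
    pooled = (S1 + S2) / (N1 + N2 - 2.0)
    return (gamma1 - g2) / math.sqrt(pooled * (1.0 / N1 + 1.0 / N2))

def merge_groups(gamma1, S1, N1, gamma2, S2, N2):
    # Mean and scatter of the merged group.
    g2 = min((gamma2, gamma2 + 360.0, gamma2 - 360.0),
             key=lambda g: abs(gamma1 - g))
    N = N1 + N2
    gamma = (N1 * gamma1 + N2 * g2) / N
    S = S1 + S2 + N1 * (gamma1 - gamma) ** 2 + N2 * (g2 - gamma) ** 2
    return gamma, S, N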

sufficiently high curvature (high change in tangent angle to change in arc length) and (b) that are enclosed by subsequences that can fit different straight lines or curves of a given type. Simple here means either an arc segment that is a straight-line segment or one that is a curved-arc segment containing no straight-line segments. Maximal subsequences that can fit a straight line are subsequences for which some measure of curvature is uniformly small. Maximal subsequences that can fit a curved-line segment are sequences for which some measure of curvature is uniformly high. The principal problem that must be handled by any arc segmentation technique is the determination of the appropriate region of support for any curvature calculation, as well as the handling of spatial quantization and image noise, which perturbs point location, sometimes systematically. Next we discuss a variety of techniques for segmenting digital arcs into simple segments. The techniques range from iterative endpoint fitting and splitting to using tangent angle deflection, prominence, or high curvature as the basis of the segmentation.

11.5.1 Iterative Endpoint Fit and Split

Ramer (1972) and Duda and Hart (1972) give the following iterative endpoint fit-and-split procedure to segment a digital arc sequence S = <(r_1,c_1),...,(r_N,c_N)> into subsequences that are sufficiently straight. It requires only one distance threshold d*. Let L = {(r,c) | αr + βc + γ = 0, where α² + β² = 1} be the line defined by the endpoints (r_1,c_1) and (r_N,c_N). For any point (r_n,c_n), let d_n = |αr_n + βc_n + γ| be the distance between L and (r_n,c_n). Let m be any index for which d_m = max_n d_n. If d_m > d*, the sequence is split into two subsequences S1 = <(r_1,c_1),...,(r_m,c_m)> and S2 = <(r_{m+1},c_{m+1}),...,(r_N,c_N)>, and the procedure is recursively applied to S1 and S2. The splitting is shown in Fig. 11.4: the pixel having the farthest distance to the line AC is the pixel B, and the iterative endpoint fit and split segments the arc sequence at pixel B, creating two arc subsequences, each of which better fits a straight-line segment.

Figure 11.4 Geometry of the iterative endpoint fit and split.

The technique is detailed in the procedure endpointfitandsplit, whose arguments are S = <(r_1,c_1),...,(r_N,c_N)>, the digital arc sequence; open, an input list of the beginning and final indices for each segment of an initial partition (the procedure will refine the segmentation determined by open); emax, the maximum

allowable error; segmentlist, the list of beginning and final indices for each of the resulting segments; and sflag, which has a value of 0 if endpointfitandsplit does not refine the input segmentation and a value of 1 if it does. The procedure endpointlinefit inputs the arc sequence S; the beginning and final indices b and f defining the subsequence of S being fit; a variable e in which to return the error of fit; and a variable k that marks the index of the point having the maximum distance to the line constructed between the points indexed by b and f. The function remove removes the first element from the list specified by its argument and returns that element as its value. The procedure add adds the item specified by its second argument to the end of the list specified by its first argument.

procedure endpointfitandsplit(S,open,emax,segmentlist,sflag);
sflag := 0;
segmentlist := nil;
while open # nil do
    begin
    (b,f) := remove(open);
    endpointlinefit(S,b,f,e,k);
    if e > emax
    then begin
        sflag := 1;
        add(open,(b,k));
        add(open,(k+1,f))
        end
    else add(segmentlist,(b,f))
    end
end endpointfitandsplit;

procedure endpointlinefit(S,b,f,e,k);
d := sqrt((r_f - r_b)² + (c_f - c_b)²);
α := (c_f - c_b)/d;
β := (r_b - r_f)/d;
γ := (r_f c_b - r_b c_f)/d;
e := 0;
for j := b to f do
    begin
    e_j := |α r_j + β c_j + γ|;
    if e_j > e
    then begin
        e := e_j;
        k := j
        end
    end
end for
end endpointlinefit;
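For comparison, the same recursion in compact runnable form (a standard Ramer-style Python formulation rather than a transcription of the pseudocode above):

import math

def endpoint_fit_and_split(points, dstar):
    # points: list of (r, c) pairs; returns a list of sub-sequences, each
    # of which fits the chord through its endpoints within distance dstar.
    # Assumes the sequence is not circular (split closed boundaries first).
    if len(points) <= 2:
        return [points]
    (rb, cb), (rf, cf) = points[0], points[-1]
    d = math.hypot(rf - rb, cf - cb)
    if d == 0.0:
        return [points]
    alpha, beta, gamma = (cf - cb) / d, (rb - rf) / d, (rf * cb - rb * cf) / d
    dists = [abs(alpha * r + beta * c + gamma) for r, c in points]
    m = max(range(len(points)), key=dists.__getitem__)
    if dists[m] <= dstar:
        return [points]
    # Split between m and m + 1, following the text's convention.
    return (endpoint_fit_and_split(points[:m + 1], dstar) +
            endpoint_fit_and_split(points[m + 1:], dstar))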

If the given digital arc sequence is obtained by tracing around the boundary of a region, the sequence is circular, and it must be initially split before the iterative endpoint fit technique can be applied to each of the resulting noncircular digital arc sequences. Good candidate points for the split might be the two points farthest apart in any direction or the two points farthest apart in a vertical or horizontal direction. An exhaustive search among all pairs of points to determine the largest distance is not necessary; a golden section search (Mangasarian, 1978) can compute the largest distance in a smaller number of operations.

11.5.2 Tangential Angle Deflection

Another approach to the segmentation of an arc sequence is to identify the locations where two line segments meet and form an angle. The exterior angle (see Fig. 11.5) between two line segments meeting at a common vertex is given by the change in angular orientation from the first line segment to the second.

Figure 11.5 Geometry of the exterior angle formed by two line segments.

To measure the exterior angle at a place (r_n,c_n) where two line segments meet in a digital arc sequence S = <(r_1,c_1),...,(r_N,c_N)>, it is not reasonable to use the line segments defined by [(r_{n-1},c_{n-1}),(r_n,c_n)] and [(r_n,c_n),(r_{n+1},c_{n+1})], because if successive points are really digital 4-neighbors or 8-neighbors, these line segments must have orientations at angles that are multiples of 45°. Here we see that the spatial quantization effects at small distances can completely mask the correct line segment direction. This makes it imperative to use line segments defined by points that may not be immediate predecessors or successors in the arc sequence. The simplest way to obtain a larger arc neighborhood is to define the exterior angle at (r_n,c_n) by means of the line segments [(r_{n-k},c_{n-k}),(r_n,c_n)] and [(r_n,c_n),(r_{n+k},c_{n+k})] determined by the predecessor and successor k positions behind and k positions ahead of (r_n,c_n), for some k > 1. The cosine of the exterior angle is given by

    cos θ_n(k) = [a_n(k) · b_n(k)] / [ ||a_n(k)|| ||b_n(k)|| ]

where a_n(k) = (r_n - r_{n-k}, c_n - c_{n-k})' and b_n(k) = (r_{n+k} - r_n, c_{n+k} - c_n)'.

Rosenfeld and Johnston (1973) suggest associating with (r_n,c_n) the largest k, denoted k_n, the size of its extended neighborhood, for which the sequence of cosines increases monotonically as k decreases:

    cos θ_n(m) < cos θ_n(m - 1) < ... < cos θ_n(k_n)

Care must be used in selecting m, the largest k to be considered, since it must be no larger than the smallest number of points in a line segment of the arc sequence if the computed value cos θ_n(k_n) is to have a proper geometric meaning. Rosenfeld and Johnston use m = N/10. To judge whether (r_n,c_n) is indeed a place where two line segments meet, we can use the fact that at such a place cos θ_n(k_n) will be smaller (the angle will be larger) than the corresponding cosine values of the successor and predecessor positions. This motivates Rosenfeld and Johnston's criterion of deciding that (r_n,c_n) is a point at which two line segments meet if and only if

    cos θ_n(k_n) ≥ cos θ_i(k_i)   for all i satisfying |n - i| ≤ k_n/2

Davis (1977) discounts any local maximum in an extended neighborhood of size k for which there is some nearby point that has a significant exterior angle in some slightly smaller extended neighborhood. His criterion is that (r_n,c_n) is a place where two line segments meet if and only if the local-maximum condition above holds and, in addition,

    cos θ_n(k_n) ≤ t

where t is a threshold chosen so that points with exterior angles whose cosine is larger than t are considered to be part of straight-line segments.
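Computing cos θ_n(k) from its definition is direct; a minimal sketch:

import math

def cos_exterior(points, n, k):
    # Cosine of the exterior angle at point n using arms of length k.
    (ra, ca), (rn, cn), (rb, cb) = points[n - k], points[n], points[n + k]
    a = (rn - ra, cn - ca)
    b = (rb - rn, cb - cn)
    return ((a[0] * b[0] + a[1] * b[1]) /
            (math.hypot(*a) * math.hypot(*b)))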

Freeman and Davis (1977) and Freeman (1978) measure the prominence of a corner for each point (r_n,c_n) in an arc sequence. A point is a prominent corner point to the extent that:

1. There is a large extended neighborhood preceding (r_n,c_n) that has small curvature.
2. There is a large extended neighborhood succeeding (r_n,c_n) that has small curvature.
3. (r_n,c_n) has large curvature.

They define the "curvature" δ_{n,k} as twice the mean of the two adjacent angular differences of the line segments defined by an extended neighborhood of size k. The sizes t_1 and t_2 of the low-curvature neighborhoods preceding and succeeding (r_n,c_n) are defined by

    t_1 = max{ t | |δ_{n-v,k}| < Δ, k/2 ≤ v ≤ t }
    t_2 = max{ t | |δ_{n+v,k}| < Δ, k/2 ≤ v ≤ t }

where Δ = tan^{-1}[1/(k - 1)] represents the maximum expected uncertainty in the position of an angle as a function of k. The measure K_n of the prominence of the corner at (r_n,c_n) combines the lengths t_1 and t_2 of these low-curvature neighborhoods with the curvature δ_{n,k} at the point itself, and a point (r_n,c_n) is marked a corner point if its prominence is sufficiently high.

Shirai (1973) uses the following idea to associate an angle change with each interior point of an arc sequence. Let the digital arc sequence be S = <(r_1,c_1),...,(r_N,c_N)>, and let a positive integer m > 1 be given to specify the arc neighborhood size. For each n, m + 1 ≤ n ≤ N - m, the angle change δ_n at (r_n,c_n) is defined as the exterior angle between the line segment defined by the points (r_{n-m},c_{n-m}) and (r_n,c_n) and the line segment defined by the points (r_n,c_n) and (r_{n+m},c_{n+m}). Each digital arc subsequence is then a maximally long subsequence of S in which successive points of the subsequence are successive points of S and in which either δ_n < δ* for each point (r_n,c_n) of the subsequence or δ_n > δ* for each point (r_n,c_n) of the subsequence, where δ* is a given threshold separating straight from curved behavior.

A sequence with N points has N/m mutually exclusive subsequences of length m each. Each pair of successive subsequences has associated with it an angle change, so there are N/m - 1 such angle changes. For a circular arc these exterior angle changes are all nearly equal, and their sum over the sequence therefore measures the central angle spanned by the arc. This motivates the quantity Θ, the sum of the N/m - 1 successive angle changes, as a measure of the central angle spanned by the arc sequence.

A central angle of Θ radians with an associated radius R produces a circular arc segment of length L = ΘR. The length (in units of pixel width) of an arc sequence S can be measured by

    L = Σ_{n=2}^{N} [ (r_n - r_{n-1})² + (c_n - c_{n-1})² ]^{1/2}

This motivates the quantity R = L/Θ as an estimate of the radius associated with the arc. The farthest distance d between a point on a circular arc having central angle Θ and radius R and the chord connecting the endpoints of the arc is given by

    d = R [ 1 - cos(Θ/2) ]

Using the estimates of Θ and L in this relation permits a chord-to-arc distance to be measured for any arc sequence.

To determine whether a digital arc subsequence is one that fits a straight or a curved line, the following classification scheme can be used. Let chord-to-arc distance thresholds d* and d** be given, and let a central angle threshold Θ* be given. Then an arc subsequence can be classified as a straight line if (1) d < d*, or (2) d < d** and Θ < Θ*. Otherwise it is classified as a curved line (Shirai, 1975).

Instead of using endpoints to define line segments, Pavlidis (1973), Davis (1977), and Anderson and Bezdek (1984) use a least-squares fit to determine a line. Anderson and Bezdek incorporate a least-squares fit of a fixed number m of points to segment an arc in the following way. They look for the first point in the arc sequence for which a least-squares straight-line fit is sufficiently good. This fit constitutes the baseline. Then they locate the following point p that can begin an m-point least-squares fit whose angular orientation is sufficiently different from the orientation of the baseline fit. The next breakpoint is the point q, no more than m points following p, for which the local tangential deflection is a relative extremum. The local tangential deflection of a point v is measured by the angular orientation difference of the m-point least-squares fits preceding and following v. If there is no local extremum, q is m/2 points following p.

Anderson and Bezdek (1984) derive the following computationally efficient means to determine the cosine of twice the tangential deflection angle arising from two least-squares line fits. For any digital arc sequence S = <(r_1,c_1),...,(r_N,c_N)>, define the normalized scatter matrix A by

    A = (1/μ) [ S11  S12 ]
              [ S12  S22 ]

where S11 = Σ(r_n - r̄)², S22 = Σ(c_n - c̄)², S12 = Σ(r_n - r̄)(c_n - c̄), with r̄ and c̄ the means of the row and column coordinates and μ = [ (S11 - S22)² + 4 S12² ]^{1/2}. Let A(1) and A(2) be the normalized scatter matrices for two arc subsequences, and let Δθ be the tangential deflection angle between the angular orientations of the least-squares fits to the two arc subsequences. Then Anderson and Bezdek (1984) derive that

    cos 2Δθ = [A11(1) - A22(1)][A11(2) - A22(2)] + 4 A12(1) A12(2)
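The scatter-matrix route to cos 2Δθ can be sketched as follows, using the normalization given above (the computation degenerates when a point set is isotropic, i.e., μ = 0):

import numpy as np

def cos_2_deflection(pts1, pts2):
    # cos(2 * deflection angle) between the principal-axis line fits of
    # two point sets, computed from normalized scatter-matrix entries.
    def normalized_terms(pts):
        p = np.asarray(pts, dtype=float)
        p = p - p.mean(axis=0)
        s11 = np.sum(p[:, 0] ** 2)
        s22 = np.sum(p[:, 1] ** 2)
        s12 = np.sum(p[:, 0] * p[:, 1])
        mu = np.hypot(s11 - s22, 2.0 * s12)   # assumes mu > 0
        return (s11 - s22) / mu, 2.0 * s12 / mu
    c1, s1 = normalized_terms(pts1)
    c2, s2 = normalized_terms(pts2)
    return c1 * c2 + s1 * s2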

11.5.3 Uniform Bounded-Error Approximation

One way of viewing the arc segmentation problem is to segment the arc sequence into maximal pieces whose points deviate from a line-segment fit by no more than a given amount. It may be set up as an optimal uniform bounded-error approximation problem, as in the approach of Ichida and Kiyono (1975) or Montanari (1970). Other approaches include those of Williams (1978), Johnson and Vogt (1980), Sklansky and Gonzalez (1979), and Kurozumi and Davis (1982). However, the optimal algorithms have excessive computational complexity. Here we describe the approximation algorithm of Tomek (1974) and the split-and-merge algorithm of Pavlidis (1977a, 1977b). Both algorithms guarantee the bounded-error approximation, but they are not optimal in the sense that the segments they determine may not be maximal-length segments.

Tomek's technique, generalized to arcs, is as follows: At the beginning fitted point of any segment, determine the average tangent direction. Erect a line segment X of length 2h centered through the fitted point in a direction perpendicular to the average tangent direction, where h is the specified bound. From the endpoints P and Q of this line segment, construct two maximal-length lines, which will eventually be close to parallel and between which the given arc is situated. The construction takes place by picking up the next point x of the arc sequence, computing the directions

from P and Q to x, and keeping track, independently on each side, of those directions that point farthest away from the direction of the line on the opposite side. We call these directions the boundary directions. Initially the boundary directions actually point toward each other. As each successive point of the sequence is processed, one or both of these boundary directions change in a way that brings the directions closer to being parallel. Eventually a point is reached where the boundary directions no longer point toward each other but away from each other. The first point at which this happens becomes the final point of the segment. This is illustrated in Fig. 11.6: at the time point x is being processed, the bounding directions emanating from P and Q are still pointing toward each other; at the time point y is processed, they have become parallel or point away from each other, and the point y terminates the segment. The final fitted point of the segment is then computed as the intersection of two lines, the first being the line passing through the beginning fitted point in the direction of the average of the boundary directions and the second being the line passing through the final breakpoint of the segment in a direction perpendicular to that average direction.

Figure 11.6 Geometry of Tomek's segmentation technique.

The procedure segmentgrow details the algorithm. Its input arguments are S = <(r_1,c_1),...,(r_N,c_N)>, the digital arc sequence, and h, the specified bound. It produces segmentlist, a list of beginning and final indices for the points in each segment. It calls a function avg_tangent_direction, which determines the average forward-looking tangent direction at the point having index b of arc sequence S, and a function linedir, which, given two points, determines the direction cosines of the line passing through the two points.

procedure segmentgrow(S,h,segmentlist);
segmentlist := nil;
b := 1;
(A0,B0) := (r_1,c_1);
while b < N do
    begin

    (u,v) := avg_tangent_direction(S,b);
    (α1,β1) := (A0,B0) + h(-v,u);    /* endpoint P of the erected segment X */
    (α2,β2) := (A0,B0) - h(-v,u);    /* endpoint Q of the erected segment X */
    f := b + 1;
    (u1,v1) := linedir((α1,β1),(r_f,c_f));
    (u2,v2) := linedir((α2,β2),(r_f,c_f));
    repeat
        begin
        f := f + 1;
        (g1,h1) := linedir((α1,β1),(r_f,c_f));
        if dist((α1,β1)+(g1,h1),(α2,β2)+(u2,v2)) > dist((α1,β1)+(u1,v1),(α2,β2)+(u2,v2))
        then (u1,v1) := (g1,h1);
        (g2,h2) := linedir((α2,β2),(r_f,c_f));
        if dist((α1,β1)+(u1,v1),(α2,β2)+(g2,h2)) > dist((α1,β1)+(u1,v1),(α2,β2)+(u2,v2))
        then (u2,v2) := (g2,h2)
        end
    until f = N or dist((α1,β1)+(u1,v1),(α2,β2)+(u2,v2)) ≥ 2h;
    push(segmentlist,(b,f));
    e := ((r_f,c_f) - (A0,B0)) · (u,v);   /* length of the fitted segment */
    (A0,B0) := (A0,B0) + e(u,v);          /* final fitted point; beginning
                                             fitted point of the next segment */
    b := f
    end
end segmentgrow;

11.5.4 Breakpoint Optimization

Once an initial segmentation of a digital arc sequence is obtained, there is an easy way to shift some of the breakpoints to produce a better arc segmentation (Pavlidis, 1973). The technique is simple. First shift the final point of any odd-numbered segment and the corresponding beginning point of the following even-numbered segment, and test whether the maximum error of the two segments is reduced by the shift. If it is reduced, then keep the shifted breakpoint. Do the same for the final point of any even-numbered segment and the corresponding beginning point of the following odd-numbered segment. Details are given in the procedure move_breakpoints, whose arguments are S, the digital arc sequence; segmentlist, a list of pairs (b,f) of the indices of the beginning and final points of each input segment; and sflag, which takes the value 0 if no breakpoints were moved and the value 1 if some breakpoints were moved. The function get_element returns the jth element of the list specified by its first argument; j is its second argument. The function length returns the number of items in a list. The procedure put_element puts its


segmentsplitlist.sflag . either some breakpoint has been moved. the sequence whose terms are the maximum resulting error constitute a nonincreasing sequence bounded below by 0. e l ) is reduced and the while iterates. segmentlist. The procedure for successive segment merging is detailed below after the arcsplit-andmerge procedure. or no breakpoints have been moved.4 can be used for breakpoint adjustment. . The termination.ei+. until sflagl=O and sflag2=0 and sflag34 end arcsplit-andmerge. So as the iteration proceeds.5 Sogmonllt@n of Arcs into Simple Segments 573 put-element(segmentlist.) as well as all other errors alone..5 Split and Merge The split-andmerge paradigm for curves was introduced by Pavlidis and Horowitz (1974). segmentmergelist := nil. and the movebraakpoints procedure of Section 11. move-breakpoints( S. may be at a local minimum rather than at the global minimum. repeat besin endpoins_fitandsplit(S e.ei+.c.segme~~tlist. 11.segmentlist). providing any resulting merged segment has sufficiently small error.(bj+l.segmentlist.segmentlist. . end adjust. . Therefore it must terminate.j. however. Do this repeatedly until all three steps produce no further change. in which case the while terminates. procedure arcsplitandmerge(S. Each call to adjust either moves a breakpoint that reduces the error max{ei. The idea is simple.sflag).f := remove(segmentlist). while segmentlist # nil do besin . (bl.5.N)) .1 can be used to accomh plish the splitting.5.f j)).(bi. Therefore after each iteration through the while loop.-ist.) and leaves all the other errors the same or does not move a breakpoint and therefore leaves max{ei.fj+l)). procedure endpointAtandmerge(S.~.emu . . . 1).11. sflag := 0. put-element(segmentlist. Then try to adjust the breakpoints to obtain a better segmentation. endpoint-fitandmerge(S. segmentlist := nil.segmentsplitlist.sflag2).sflag3) end. Then try to merge successive segments. First split the arc into segments for which the error in each segment is sufficiently small. The procedure adjust actually performs the trial shifting of breakpoints.( 1.. in which -case max{e. .5.j. add(segment1ist. T e endpointfit-undsplit procedure of Section 11.

) := remove(segment1ist). 0. q: 4 . E . . . PQ= nil.f 2)). The triple (a. AS before. The procedure calls the hnction indexmindist. (d-I . (a. the second argument is an initial partition.& y) designates three parallel arrays. which finds the index to that cluster whose line is closest to the given point that is its first argument. The new clusters then constitute the partition for the next iteration. then add(segmentmerge1ist.f 1) := (b19f2).5. The internal variable P designates the partition produced on the 9th iteration. the first argument to the procedure isodatalinefir is the given digital arc sequence. y)).P(k).cN) >. 4 procedure isodatalinefit(S. ~ ..b f 2 .) := (k+l.f2) end end.(aq-I. .ci).(b if k# f then sflag := 1.y(k). which is a partition of the points of S and the line-fitting parameters for each cluster in the partition. 11.79-I) := linefit(134-I).6 lsodata Segmentation A variant of the isodataclustering algorithm can be used to segment an arc sequence into subsequences that fit a line. . ). The outputs of the procedure are the final partition P. repeat w n q:=q+l. if E < E. yQ-I)).f . The line-fit parameters for the kth cluster are cr(k)..@4-'.k)) .. ..Po P . (blrf2. The basic idea of the iterative isodata line-fit clustering procedure is to determine the line-fit parameter for each cluster in a given partition. then begin add(segmentmergelist.(b. if E < E . sflag := 1 end else begin add(segmentmerge1ist .f )) end endpointfitandmerge. Bq-I. S =< ( r . Then each point is assigned to the cluster whose line fit is closest to the point..(b .574 A Extraction and Segmentation n (bz. endpointlinefit-error(S. where a(k)r +B(k)c + y(k) = 0 is the line equation. : for i=l to N do begin k: =indexrnindist((ri.(rN.k). (b1.

1954. for k=l to K do begin dist:=Ia(k)r 8(k)c y 1. + + 11. the curvature at a point on the curve is the limit. yq-I) end isodatalinefit function indexmindist((r. Places where curvature passes through zero are places where the local shape changes from convex to concave. as arc length change goes to zero.5. then it is assigned to the same subsequence. So a final pass through the ordered points in S must be made to determine the proper subsequences. it is assigned to a new subsequence. Pq-I. if dist < d then begin d: =&st.y)). indexmindist := k end end end for end indexmindist The isodata segmentation as given in the procedure isodatalinefit puts together in the same cluster points from S that may be in two collinear segments.c). c(t)].11. For this discussion we represent the curve parametrically as [r(t).c(t)] is given by . But. q p =p4-1. Curvature has been long established to play an important role in shape perception (Attneave. 1957). a < t < b. For any t.(a. Places of natural curve breaks are places of curvature maxima and minima.7 Curvature The geometric idea of curvature is simple. If a point belongs to the same cluster as the previous point.I = P . if it belongs to a different cluster. of the change in tangent angle divided by the change in arc length. (a. d: =verylargenumber. end end for end until p q .@.y) = (aq-'.5 Segmentation of Arcs into Simple Segments 575 Pi := add(Pi. For a planar curve.c(a)] to [r(t). (ri. the arc length s(t) going from [r(a).8.ci)).

    s(t) = integral from a to t of [r'(u)^2 + c'(u)^2]^(1/2) du

Hence

    ds/dt = [r'(t)^2 + c'(t)^2]^(1/2)

The unit length tangent vector T at [r(t), c(t)] is computed by

    T(t) = (r'(t), c'(t)) / [r'(t)^2 + c'(t)^2]^(1/2)

The tangent angle theta at [r(t), c(t)], measured clockwise from the column axis, is given by

    theta(t) = arctan[r'(t)/c'(t)]

and the unit normal vector N at [r(t), c(t)] is given by

    N(t) = (-c'(t), r'(t)) / [r'(t)^2 + c'(t)^2]^(1/2)

The curvature kappa is defined at a point of arc length s along the curve by

    kappa = lim as delta-s -> 0 of (delta-theta / delta-s)

where delta-theta is the change in tangent angle produced by a change delta-s of arc length. From this definition it follows that the curvature kappa at [r(t), c(t)] is given by

    kappa(t) = [r''(t)c'(t) - c''(t)r'(t)] / [r'(t)^2 + c'(t)^2]^(3/2)

Given an arc sequence S = <(r1, c1), ..., (rN, cN)>, we can set up a normalized arc-length parametric representation by defining S_r = <(s1, r1), ..., (sN, rN)> and S_c = <(s1, c1), ..., (sN, cN)>, where s1 = 0 and s_{n+1} = s_n + [(r_{n+1} - r_n)^2 + (c_{n+1} - c_n)^2]^(1/2). Then a polynomial or spline least-squares fit for r as a function of s and for c as a function of s can be computed, from which the curvature at s can be computed by the formula above. The observed behavior of the curvature computed this way is often unsatisfactory because the required second derivatives can have excessive noise. A more stable approach is to use the largest arc length possible to estimate the tangent angle of a line fitted to the points on either side of the point whose curvature is being computed. The difference between the tangent angles of the two line segments is the tangential deflection (as discussed in Section 11.5). The tangential deflection divided by the change in arc length across the given point is an estimate of curvature.

Curvature at (r_n, c_n) can also be computed by fitting a circular arc segment through (r_n, c_n) and (r_{n+1}, c_{n+1}) with their tangent directions phi_n and phi_{n+1}. The geometry of this configuration is illustrated in Fig. 11.7. The parametric representation of the circle is

    r = a + R sin t,  c = b + R cos t

Hence, since the tangent angle (measured clockwise from the column axis) at (r_n, c_n) is phi_n = t_n + 90 degrees, and similarly at (r_{n+1}, c_{n+1}),

    r_n     = a + R sin(phi_n - 90)     = a - R cos phi_n
    c_n     = b + R cos(phi_n - 90)     = b + R sin phi_n
    r_{n+1} = a + R sin(phi_{n+1} - 90) = a - R cos phi_{n+1}
    c_{n+1} = b + R cos(phi_{n+1} - 90) = b + R sin phi_{n+1}

From these four relations it is apparent that

    r_{n+1} - r_n = R (cos phi_n - cos phi_{n+1})
    c_{n+1} - c_n = R (sin phi_{n+1} - sin phi_n)

This is an overconstrained linear system that can be solved in the least-squares sense for R. Curvature at (r_n, c_n) is then estimated by 1/R. Anderson and Bezdek (1984) estimate delta-s at (r_n, c_n) by [(r_{n+1} - r_n)^2 + (c_{n+1} - c_n)^2]^(1/2), just as we did in Eq. (11.7).

Figure 11.7: Geometry of the circular arc segment defined by two points (r_n, c_n) and (r_{n+1}, c_{n+1}) and their tangent directions phi_n and phi_{n+1}.
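The least-squares solve for R reduces to one scalar division, as the following Python sketch shows (an illustration of the construction above; the degenerate-tangent guard is our own addition).

    import numpy as np

    def circle_curvature(p0, p1, phi0, phi1):
        # curvature at p0 from the circular arc through p0 = (r0, c0) and
        # p1 = (r1, c1) with tangent angles phi0, phi1 (radians, clockwise
        # from the column axis); R solves the overconstrained pair of
        # relations in the least-squares sense
        t0, t1 = phi0 - np.pi / 2, phi1 - np.pi / 2
        a = np.array([np.sin(t1) - np.sin(t0), np.cos(t1) - np.cos(t0)])
        b = np.array([p1[0] - p0[0], p1[1] - p0[1]])
        denom = a @ a
        if denom < 1e-12:          # tangents (nearly) equal: straight line
            return 0.0
        R = (a @ b) / denom
        return 1.0 / R if R != 0 else 0.0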

Hough Transform

The Hough transform (Hough, 1962; Duda and Hart, 1972) is a method for detecting straight lines and curves in gray level images. The method is given the family of curves being sought and produces the set of curves from that family that appears in the image. Stockman and Agrawala (1977) were the first to realize that the Hough transform is template matching. Rosenfeld (1969) describes an implementation that is almost always more efficient than the original Hough formulation. Here we describe the Hough transform technique, show how to apply it to finding straight-line segments and circular arcs in images, and then present a Bayesian approach to the Hough transform.

11.6.1 Hough Transform Technique

The Hough transform algorithm requires an accumulator array whose dimension corresponds to the number of unknown parameters in the equation of the family of curves being sought. For example, finding line segments using the equation y = mx + b requires finding two parameters for each segment: m and b. The two dimensions of the accumulator array for this family would correspond to quantized values for m and for b.

Using an accumulator array A, the Hough procedure examines each pixel and its neighborhood in the image. It determines whether there is enough evidence of an edge at that pixel and, if so, calculates the parameters of the specified curve that passes through this pixel. In the straight-line example with equation y = mx + b, it would estimate the m and the b of the line passing through the pixel being considered if the measure of edge strength (such as the gradient) at that pixel were high enough. Once the parameters at a given pixel are estimated, they are quantized to corresponding values M and B, and the accumulator A(M, B) is incremented. Some schemes increment by one and some by the strength of the gradient at the pixel being processed. After all pixels have been processed, the accumulator array is searched for peaks. The peaks indicate the parameters of the most likely lines in the image.

Although the accumulator array tells us the parameters of the infinite lines (or curves), it does not tell us where the actual segments begin and end. To obtain this information, we can add a parallel structure called PTLIST. PTLIST(M, B) contains a list of all the pixel positions that contributed to the sum in the accumulator A(M, B). From these lists the actual segments can be determined.

This description of the Hough method is general; it omits the details needed for an implementation. We will now discuss in detail algorithms for straight-line and circle finding. We will emphasize the O'Gorman and Clowes (1976) formulation.

Finding Straight-Line Segments

The equation y = mx + b for straight lines does not work for vertical lines. The equation d = x cos theta + y sin theta, where d is the perpendicular distance from the line to the origin and theta is the angle the perpendicular makes with the x-axis, was suggested in Duda and Hart (1972) and used by O'Gorman and Clowes (1976).

Figure 11.8: Parameters d and theta used in the equation d = r sin theta + c cos theta of a straight line.

Figure 11.9: Accumulator array for finding straight-line segments in images of size 256 x 256.

We will use this form of the equation but convert to row (r) and column (c) coordinates. Thus our equation becomes

    d = r sin theta + c cos theta

where d is the perpendicular distance from the line to the origin of the image (assumed to be at the upper left) and theta is the angle this perpendicular makes with the c (column) axis. Figure 11.8 illustrates the parameters of the line segment. The accumulator A has subscripts that represent quantized values of d and theta. O'Gorman and Clowes quantized the values of d by 3s and theta by 10-degree increments in their experiments on gray level images of puppet objects. An accumulator array quantized in this fashion is illustrated in Fig. 11.9. The O'Gorman and Clowes algorithm for filling the accumulator A and the parallel list array PTLIST can be stated as follows:

    procedure accumulate;
    A := 0;
    PTLIST := NIL;
    for R := 1 to NLINES do
      for C := 1 to NPIXELS do begin
        DR := row_gradient(R, C);
        DC := col_gradient(R, C);
        GMAG := gradient(DR, DC);
        if GMAG > gradient_threshold
        then begin
          THETA := atan2(DR, DC);
          THETAQ := quantize_angle(THETA);
          D := C*cos(THETAQ) + R*sin(THETAQ);
          DQ := quantize_distance(D);
          A(DQ, THETAQ) := A(DQ, THETAQ) + GMAG;
          PTLIST(DQ, THETAQ) := append(PTLIST(DQ, THETAQ), (R, C))
        end
      end for
    end for
    end accumulate;

The functions row_gradient and col_gradient are neighborhood functions that estimate the row and column components of the gradient, and the function gradient combines the two to get the magnitude. The function atan2 is the standard scientific library function that returns the angle in the correct quadrant given the row and column components of the gradient. We assume here that atan2 returns a value between 0 and 359 degrees; many implementations return the angle in radians, which would have to be converted to degrees. Procedure accumulate is O'Gorman and Clowes's version of the Hough method and pretty much follows the Hough theory. The algorithm is expressed in row-column space to be consistent with the other algorithms in the book. The actions of the procedure are illustrated in Fig. 11.10. Notice that with a 3 x 3 gradient operator, the lines are two pixels wide. Notice also that counts appear in accumulators other than the two correct ones.
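For concreteness, the following Python sketch mirrors procedure accumulate. It is an illustration: simple central differences stand in for the 3 x 3 gradient operator, and the distance quantization shown is a crude choice of our own.

    import numpy as np

    def hough_accumulate(image, n_theta=180, grad_thresh=20.0):
        # accumulate d = r*sin(theta) + c*cos(theta) evidence, incrementing
        # by gradient magnitude and keeping a point list per bin
        rows, cols = image.shape
        dr = np.gradient(image.astype(float), axis=0)   # row gradient
        dc = np.gradient(image.astype(float), axis=1)   # column gradient
        gmag = np.hypot(dr, dc)
        d_max = int(np.ceil(np.hypot(rows, cols)))
        A = np.zeros((d_max + 1, n_theta))
        ptlist = {}
        for r in range(rows):
            for c in range(cols):
                if gmag[r, c] <= grad_thresh:
                    continue
                theta = np.arctan2(dr[r, c], dc[r, c]) % (2 * np.pi)
                tq = int(theta / (2 * np.pi) * n_theta) % n_theta
                d = r * np.sin(theta) + c * np.cos(theta)
                dq = int(round(abs(d)))      # crude distance quantization
                A[dq, tq] += gmag[r, c]
                ptlist.setdefault((dq, tq), []).append((r, c))
        return A, ptlist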

Figure 11.10: Results of the operation of procedure accumulate on a simple gray level image, showing the accumulator array A and the point lists PTLIST for the quantized values of d and theta.

Once the accumulator and list arrays are filled, there is no standard method for extracting the line segments. O'Gorman and Clowes presented an ad hoc procedure that illustrates some of the problems that come up in this phase of the line-segment extraction process. This procedure can be expressed as follows:

    procedure find_lines;
    V := pick_greatest_bin(A, DQ, THETAQ);
    while V > value_threshold do begin
      list_of_points := reorder(PTLIST(DQ, THETAQ));
      for each point (R, C) in list_of_points do
        for each neighbor (R', C') of (R, C) not in list_of_points do begin
          DPRIME := D(R', C');
          THETAPRIME := THETA(R', C');
          GRADPRIME := GRADIENT(R', C');
          if GRADPRIME > gradient_threshold and abs(THETAPRIME - THETA) <= 10
          then begin
            merge(PTLIST(DQ, THETAQ), PTLIST(DPRIME, THETAPRIME));
            set_to_zero(A, DPRIME, THETAPRIME)
          end
        end
      end for;
      final_list_of_points := PTLIST(DQ, THETAQ);
      create_segments(final_list_of_points);
      set_to_zero(A, DQ, THETAQ);
      V := pick_greatest_bin(A, DQ, THETAQ)
    end while
    end find_lines;

The function pick_greatest_bin returns the value in the largest accumulator while setting its last two parameters, DQ and THETAQ, to the quantized d and theta values for that bin. The reorder function orders the list of points in a bin by column coordinate for theta < 45 or theta > 135 and by row coordinate for 45 <= theta <= 135. The arrays D and THETA are expected to hold the quantized D and THETA values for a pixel that were computed during the accumulation; similarly, the array GRADIENT is expected to contain the computed gradient magnitude. These can be saved as intermediate images. The merge procedure merges the list of points from a neighbor of a pixel with the list of points for that pixel, keeping the spatial ordering. The set_to_zero procedure zeroes out an accumulator so that it will not be reused. Finally, the procedure create_segments goes through the final ordered set of points searching for gaps longer than one pixel. It creates and saves a set of line segments terminating at gaps. O'Gorman and Clowes use a least-squares procedure to fit lists of points to line segments.

Finding Circles

The Hough transform technique can be extended to circles and other parametrized curves. The standard equation of a circle has three parameters. Kimme et al. (1975) developed a program for finding circles in chest x-rays. Like O'Gorman and Clowes, they use a gradient technique to reduce the dimension of the parameter space to be searched. If a point (r, c) lies on a circle, then the gradient at (r, c) points to the center of that circle, as shown in Fig. 11.11. So if a point (r, c) is given, a radius d is selected, and the direction of the vector from (r, c) to the center is computed, then the coordinates of the center can be found.

Figure 11.11: Direction of the gradient at the boundary points of a circle. The inward-pointing gradients are the ones that will accumulate evidence for the center of the circle.

The radius d, the row coordinate of the center r0, and the column coordinate of the center c0 are the three parameters used to vote for circles in the Hough algorithm. Circles are represented by the equations

    r = r0 + d sin theta
    c = c0 + d cos theta

With these equations the accumulate algorithm for circles becomes

    procedure accumulate_circles;
    A := 0;
    PTLIST := 0;
    for R := 1 to NLINES do
      for C := 1 to NPIXELS do
        for each possible value D of d do begin
          THETA := compute_theta(R, C, D);
          R0 := R - D*sin(THETA);
          C0 := C - D*cos(THETA);
          A(R0, C0, D) := A(R0, C0, D) + 1;
          PTLIST(R0, C0, D) := append(PTLIST(R0, C0, D), (R, C))
        end
      end for
    end for
    end accumulate_circles;

This procedure can easily be modified to take into account the gradient magnitude, as was done in the procedure for line segments.
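A Python sketch of the circle accumulation follows; it assumes the edge pixels and their gradient directions have already been extracted, and keeps one accumulator plane per candidate radius.

    import numpy as np

    def hough_circles(edge_r, edge_c, grad_theta, radii, shape):
        # vote for circle centers: the gradient direction at each edge
        # pixel points from the pixel toward the center
        A = np.zeros((len(radii), shape[0], shape[1]), dtype=np.int32)
        for r, c, th in zip(edge_r, edge_c, grad_theta):
            for k, d in enumerate(radii):
                r0 = int(round(r - d * np.sin(th)))
                c0 = int(round(c - d * np.cos(th)))
                if 0 <= r0 < shape[0] and 0 <= c0 < shape[1]:
                    A[k, r0, c0] += 1
        return A

Peaks in A over all radius planes give the circle hypotheses (r0, c0, d).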

Extensions

The Hough transform method can be extended to any curve with an analytic equation of the form f(x, a) = 0, where x denotes an image point and a is a vector of parameters. The procedure as expressed by Ballard and Brown (1982) is:

1. Initialize the accumulator array A(a) to zero.
2. For each edge pixel x, compute all a such that f(x, a) = 0 and set A(a) := A(a) + 1.
3. Local maxima in A correspond to curves of f in the image.

If there are m parameters in a, each having M discrete values, then the time complexity grows as a power of M that increases with m, so the method quickly becomes expensive for curves with many parameters. The Hough transform method has been further generalized to arbitrary shapes specified by a sequence of boundary points (Ballard, 1981a). This is known as the generalized Hough transform.

Variations

A number of hybrid techniques exist that use some of the principles of the Hough transform. The Burns line finder (Burns, Hanson, and Riseman, 1986) was developed to find straight lines in complex images of outdoor scenes. It takes advantage of two powerful algorithms: the Hough transform and the connected components algorithm. The Burns method can be summarized as follows:

1. Compute the gradient magnitude and direction at each pixel. For points with high enough gradient magnitude, assign two labels representing two different quantizations of the gradient direction. (For example, if the first quantization is 0 to 44, 45 to 90, 91 to 134, etc., then the second can be -22 to 22, 23 to 67, 68 to 112, etc.) The result is two symbolic images.
2. Find the connected components of each symbolic image and compute the line length of each component.
3. Each pixel is a member of two components, one from each symbolic image. Each pixel votes for its longer component. Each component receives a count of the pixels that voted for it. The components (line segments) that receive the majority support are selected.

The use of two separate quantizations attempts to get rid of the

quantization problems that forced O'Gorman and Clowes to search neighboring bins. In practice, the method suffers from a problem that will affect any line finder that estimates angle on the basis of a small neighborhood around a pixel: Diagonal digital lines are not straight. Diagonal lines are really a sequence of horizontal and vertical steps. If the angle-detection technique uses too small a neighborhood, it will end up finding a lot of tiny horizontal and vertical segments instead of a long diagonal line. Thus in practice the Burns line finder, and any other angle-based line finder, can break up lines that a human would like to detect as a connected whole.

Boldt, Weiss, and Riseman (1989) attempt to solve this problem with a computational approach to the extraction of straight lines based on principles of perceptual organization. They define a straight line as a sequence of line segments satisfying the following conditions: (1) consecutive pairs satisfy the relations of collinearity, proximity, and similarity of contrast; (2) the entire sequence has the least error locally among candidate groupings; and (3) the error for the best sequence is acceptably low. Their grouping process consists of three steps: linking, optimization, and replacement. Linking means finding pairs of line segments that satisfy the relational measures of orientation difference, contrast difference, relative overlap, lateral distance, and distance between endpoints. Linking is performed on a local area of the image and results in a graph structure. Optimization consists of searching the graph to find acceptable sequences of line segments. Replacement means manipulating the data structure to form new segments from several smaller ones. The algorithm has been applied to complex outdoor images and results in fewer, more meaningful line segments.

11.6.2 A Bayesian Approach to the Hough Transform

The Bayesian approach to the Hough transform described here applies to line detection using the parameterization r sin theta + c cos theta = d. We refer to such a line segment as one with parameters (theta, d). We assume that some local operator has been applied to the image and that at each pixel position (r, c) a vector I(r, c) has been determined. I(r, c) may contain only information relative to the existence of a line or edge pixel, or it may contain additional information, such as line strength and the orientation of a line passing through (r, c).

For any orientation angle theta and distance d, we define the set of pixels that can potentially participate in the line r sin theta + c cos theta = d by

    E(theta, d) = {(r, c) : |r sin theta + c cos theta - d| < delta}

where delta is some fixed number related to the resolution cell size. The Bayesian approach to the Hough transform computes, for each quantized value of theta and d, the conditional probability

    P[a line segment exists with parameters (theta, d) | I(r, c) : (r, c) in E(theta, d)]

Now from the definition of conditional probability,

    P[theta, d | I(r, c) : (r, c) in E(theta, d)]
      = P[I(r, c) : (r, c) in E(theta, d) | theta, d] P(theta, d)
        / P[I(r, c) : (r, c) in E(theta, d)]                              (11.8)

where we henceforth denote the conditional probability that a line segment exists with parameters (theta, d) by P[theta, d | . ]. But, within a fairly good approximation, the unconditional distribution of the observations is the conditional distribution of the observations given that there is no line, because Prob(line) << Prob(no line); that is, the ratio of the two denominators is approximately one. Hence

    P[theta, d | I(r, c) : (r, c) in E(theta, d)]
      ~ P[I(r, c) : (r, c) in E(theta, d) | theta, d] P(theta, d)
        / P[I(r, c) : (r, c) in E(theta, d) | no line]                    (11.11)

We assume that the observations {I(r, c) : (r, c) in E(theta, d)}, conditioned on the line parameters, are independent; conditioned on the state of no line existing, the observations are independent too. We therefore obtain

    P[theta, d | I(r, c) : (r, c) in E(theta, d)]
      = P(theta, d) * product over (r, c) in E(theta, d) of
        P[I(r, c) | theta, d] / P[I(r, c) | no line]                      (11.12)

Upon taking logarithms of Eq. (11.12), there results

    log P[theta, d | I(r, c) : (r, c) in E(theta, d)]
      = log P(theta, d) + sum over (r, c) in E(theta, d) of
        log ( P[I(r, c) | theta, d] / P[I(r, c) | no line] )              (11.13)

Define the Hough transform H(theta, d) by

    H(theta, d) = log P(theta, d) + sum over (r, c) in E(theta, d) of
                  log ( P[I(r, c) | theta, d] / P[I(r, c) | no line] )    (11.14)

In the original Hough methodology, I(r, c) is just a binary number: I(r, c) takes the value 1 if a local line detector determines that most likely some line is passing through pixel (r, c), and I(r, c) takes the value 0 if the local line detector determines that most likely no line is passing through pixel (r, c). Specializing our result to this case, suppose the local detector characteristics are

    P[I(r, c) | theta, d] = q(theta, d)      if I(r, c) = 1 and (r, c) in E(theta, d)
                          = 1 - q(theta, d)  if I(r, c) = 0 and (r, c) in E(theta, d)

    P[I(r, c) | no line]  = w                if I(r, c) = 1
                          = 1 - w            if I(r, c) = 0

where q(theta, d) is a specified parameter function related to the edge operator employed. Then the Hough transform H(theta, d) takes the form

    H(theta, d) = log P(theta, d)
                  + #{(r, c) in E(theta, d) : I(r, c) = 1} log [q(theta, d)/w]
                  + #{(r, c) in E(theta, d) : I(r, c) = 0} log [(1 - q(theta, d))/(1 - w)]

This is closely related to the quantity

    H0(theta, d) = #{(r, c) in E(theta, d) : I(r, c) = 1}                 (11.15)

which is what Hough described in his patent. Since all real images are finite in size, we can see an immediate problem inherent in the original Hough transform given by Eq. (11.15). The finiteness of the image causes different numbers of pixels to be in the set E(theta, d) according to the size of E(theta, d). Thus, all other things being equal, those parameter values (theta, d) having larger sets E(theta, d) are more likely to have higher counts H0(theta, d) than other parameter values when no line is present with parameter value (theta, d). This was noticed by Cohen and Toussaint (1977), who recommended adjusting the counts in H0(theta, d) according to the size of E(theta, d). The addition of the log term of Eq. (11.14) naturally handles this problem.
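The binary-detector score is cheap to evaluate; the Python sketch below computes it for one (theta, d) cell, with the band half-width delta and the prior term as explicit inputs (an illustration of the formula above, not library code).

    import numpy as np

    def bayes_hough_score(I, theta, d, q, w, log_prior=0.0, delta=0.5):
        # log-posterior Hough score for a binary detector image I (0/1);
        # E(theta, d) is the band |r sin(theta) + c cos(theta) - d| < delta;
        # q and w are the detector hit probabilities with and without a line
        rr, cc = np.indices(I.shape)
        band = np.abs(rr * np.sin(theta) + cc * np.cos(theta) - d) < delta
        n1 = int(I[band].sum())
        n0 = int(band.sum()) - n1
        return (log_prior
                + n1 * np.log(q / w)
                + n0 * np.log((1 - q) / (1 - w)))

Because the n0 term is negative whenever q < w would be violated elsewhere, cells with large bands are automatically penalized, which is exactly the finite-image correction discussed above.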

In the modified Hough transform of O'Gorman and Clowes described in the previous section, I(r, c) can be represented by a three-dimensional vector: the first component is a binary number indicating whether or not a line passes through the pixel at (r, c); the second component is the gradient G(r, c), a measure of edge strength of a line boundary; and the third component is the angle T(r, c) of the edge, which is also the angle of the line boundary. O'Gorman and Clowes use the angle T(r, c) as an estimate for theta. The parameter d is then directly determined by d = r sin theta + c cos theta. Instead of incrementing by one, O'Gorman and Clowes increment by the gradient G(r, c). Specifically, they define the modified Hough transform H1(theta, d) by

    H1(theta, d) = sum of G(r, c) over
                   {(r, c) in E(theta, d) : I(r, c) = 1 and T(r, c) = theta}   (11.16)

To understand the relationship between Eq. (11.16) and the Hough transform as we have defined it, we rewrite Eq. (11.13) as

    H(theta, d) = sum over (r, c) in E(theta, d) of
                  log ( P[I(r, c) | theta, d] / P[I(r, c) | no line] )         (11.17)

In Eq. (11.17) all pixels in E(theta, d) must be in the summation, since if there is no line, the terms will be negative rather than positive. In the O'Gorman and Clowes Eq. (11.16), the pixels in the summation are restricted to only those in E(theta, d) for which T(r, c) = theta and for which I(r, c) = 1. If a pixel (r, c) in E(theta, d) has an angle T(r, c) very much different from theta, then the log likelihood will be small; the closer T(r, c) is to theta, the larger the log likelihood will be. Comparing Eq. (11.16) with Eq. (11.17), we see that what should be summed is log likelihoods, and not gradient strength, even though there are potentially more terms.

11.7 Line Fitting

Here we give a procedure for the least-squares fitting of a line to observed noisy values. We derive expressions for the estimated parameters of the fitted line and their variances, as well as a variance expression for the orientation of the line. Because we need to differentiate between the observed noisy values and the values before random-noise perturbation, we adopt a notational change in this and the following sections: we denote the noisy observed values by (rhat_n, chat_n) and the unknown values before noise perturbation by (r_n, c_n).

Suppose that noisy observations (rhat_n, chat_n) of points (r_n, c_n) that lie on a line alpha*r + beta*c + gamma = 0 are given. Our model for (rhat_n, chat_n) is

    rhat_n = r_n + xi_n
    chat_n = c_n + eta_n

where we assume that the random variables xi_n and eta_n are independent and identically distributed, having mean 0 and variance sigma^2, and that they come from a distribution that is an even function. Hence

    E[xi_n] = E[eta_n] = 0
    E[xi_n eta_m] = 0
    E[xi_n xi_m] = E[eta_n eta_m] = sigma^2 if n = m, and 0 otherwise

Dorff and Gurland (1961) use a similar model for the noise, but they use the representation c = mr + b instead of the more general representation alpha*r + beta*c + gamma = 0.

By way of notation we define the following moments:

    mu_r  = (1/N) sum r_n,   mu_c  = (1/N) sum c_n
    mu_rr = (1/N) sum (r_n - mu_r)^2
    mu_rc = (1/N) sum (r_n - mu_r)(c_n - mu_c)
    mu_cc = (1/N) sum (c_n - mu_c)^2

which directly relate to the unknown parameters alpha, beta, and gamma. Since each (r_n, c_n) lies on the line, alpha(r_n - mu_r) + beta(c_n - mu_c) = 0 for every n, and it is easy to determine that

    alpha mu_r + beta mu_c + gamma = 0,
    mu_rr alpha + mu_rc beta = 0,  mu_rc alpha + mu_cc beta = 0

Now from the noisy observations (rhat_n, chat_n) we must estimate the parameters alphahat, betahat, and gammahat of the line on which the points (r_n, c_n) lie. To do this we employ the principle of minimizing the sum of squared residuals under the constraint that alphahat^2 + betahat^2 = 1.

Using the Lagrange multiplier form, we define

    eps^2 = sum from n=1 to N of (alphahat rhat_n + betahat chat_n + gammahat)^2
            - lambda (alphahat^2 + betahat^2 - 1)

Upon taking the partial derivative of eps^2 with respect to gammahat and setting the partial derivative to zero, we obtain

    0 = 2 sum (alphahat rhat_n + betahat chat_n + gammahat)

Letting

    muhat_r = (1/N) sum rhat_n  and  muhat_c = (1/N) sum chat_n

we can obtain

    gammahat = -(alphahat muhat_r + betahat muhat_c)

Continuing to take partial derivatives of eps^2 with respect to alphahat and betahat and setting them to zero, we have

    0 = 2 sum rhat_n (alphahat rhat_n + betahat chat_n + gammahat) - 2 lambda alphahat
    0 = 2 sum chat_n (alphahat rhat_n + betahat chat_n + gammahat) - 2 lambda betahat

Letting

    muhat_rr = (1/N) sum (rhat_n - muhat_r)^2
    muhat_rc = (1/N) sum (rhat_n - muhat_r)(chat_n - muhat_c)
    muhat_cc = (1/N) sum (chat_n - muhat_c)^2

we obtain upon substitution

    ( muhat_rr  muhat_rc ) (alphahat)          (alphahat)
    ( muhat_rc  muhat_cc ) (betahat )  = lambda (betahat )

So the sought-after (alphahat, betahat)' must be an eigenvector of the sample covariance matrix. But which eigenvector? The one we want must minimize the residual error, and since

    eps^2 = N (muhat_rr alphahat^2 + 2 muhat_rc alphahat betahat + muhat_cc betahat^2) = N lambda

the eigenvector (alphahat, betahat)' must correspond to that eigenvalue of the sample covariance matrix having the smallest value. Any eigenvalue lambda must satisfy the determinant condition

    | muhat_rr - lambda   muhat_rc          |
    | muhat_rc            muhat_cc - lambda | = 0

Therefore

    lambda = [ muhat_rr + muhat_cc
               +- sqrt( (muhat_rr - muhat_cc)^2 + 4 muhat_rc^2 ) ] / 2

The smaller eigenvalue corresponds to the minus sign. With lambda determined, the corresponding unit-length eigenvector can be determined.

11.7.1 Variance of the Fitted Parameters

The randomness of the observed data points in the noisy case leads to a randomness in the estimated parameters alphahat and betahat. The question we now address is: How can the expected values of alphahat and betahat and the variances of alphahat and betahat be determined? We consider the case for alphahat.
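The closed-form eigen-solution above translates directly into code. The following Python sketch computes the smaller eigenvalue and its eigenvector explicitly, mirroring the determinant derivation rather than calling a library eigensolver; the degenerate isotropic-scatter guard is our own addition.

    import numpy as np

    def line_fit_closed_form(r_hat, c_hat):
        # principal-axis line fit: (alpha, beta) is the eigenvector of the
        # 2x2 sample covariance matrix for its smaller eigenvalue
        mr, mc = r_hat.mean(), c_hat.mean()
        mrr = ((r_hat - mr) ** 2).mean()
        mcc = ((c_hat - mc) ** 2).mean()
        mrc = ((r_hat - mr) * (c_hat - mc)).mean()
        lam = 0.5 * (mrr + mcc - np.sqrt((mrr - mcc) ** 2 + 4 * mrc ** 2))
        # (alpha, beta) solves (mrr - lam)*alpha + mrc*beta = 0
        alpha, beta = mrc, lam - mrr
        norm = np.hypot(alpha, beta)
        if norm < 1e-12:              # isotropic scatter: direction undefined
            alpha, beta = 1.0, 0.0
        else:
            alpha, beta = alpha / norm, beta / norm
        gamma = -(alpha * mr + beta * mc)
        return alpha, beta, gamma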

To find the expected value and variance of alphahat, we need a way by which the expected value and variance of muhat_rr, muhat_rc, and muhat_cc can be related to alphahat. We expand alphahat around the point (mu_rc, mu_rr) in a first-order expansion. Using the relation beta = sqrt(1 - alpha^2), which under our model is true up to sign, we obtain

    alphahat = alpha + [ (muhat_rc - mu_rc) beta
               - (muhat_rr - mu_rr) alpha ] / (mu_rr + mu_cc)             (11.26)

Then to determine E[alphahat], we simply take expectations on both sides of Eq. (11.26):

    E[alphahat] = alpha + [ E(muhat_rc - mu_rc) beta
                  - E(muhat_rr - mu_rr) alpha ] / (mu_rr + mu_cc)

To determine V[alphahat] = E[(alphahat - alpha)^2], we need expressions for E[(muhat_rc - mu_rc)^2], E[(muhat_rc - mu_rc)(muhat_rr - mu_rr)], and E[(muhat_rr - mu_rr)^2], since

    E[(alphahat - alpha)^2] = (mu_rr + mu_cc)^(-2)
        E{ [ (muhat_rc - mu_rc) beta - (muhat_rr - mu_rr) alpha ]^2 }

Some of these expectations are tedious to calculate, so we leave the calculation for an exercise.

11. . 2 6 ) and ( 1 1 . 3 0 ) we can determine the covariance between & and P. ( 1 1 .f i r e + p r C ) 8 ] from which 1 ( 1 1.1 r + 4 u 2( ~Nc c-Bl2) (( ~~r+r+ p e+) u 2 ) r r Therefore From Eqs. .~ r e ) ( .a ) + O. obtain we Using the relation a symmetric calculation for b yields 8 = 8 + P r r + Pcc [ ( i r e .4 a 2 ~ 2u 2 ( P r r + Pcc + 0 ' ) N .7 Line Fitting 593 Hence Using the fact that 8' = &.29) .

To determine the variance of gammahat, we recall from Eq. (11.22) that gamma = -(alpha mu_r + beta mu_c) and gammahat = -(alphahat muhat_r + betahat muhat_c). Writing muhat_r = mu_r + xibar and muhat_c = mu_c + etabar, where xibar and etabar are the sample means of the noise terms of Eq. (11.18), we have

    V[gammahat] = E[(gammahat - gamma)^2]
                = E{ [ -alphahat(mu_r + xibar) - betahat(mu_c + etabar)
                       + alpha mu_r + beta mu_c ]^2 }

Expanding the square and using the independence of the noise means from the estimation errors,

    V[gammahat] = mu_r^2 V[alphahat] + mu_c^2 V[betahat]
                  + 2 mu_r mu_c Cov(alphahat, betahat)
                  + E[(alphahat xibar + betahat etabar)^2]

As indicated in Exercise 11.5,

    E[(alphahat xibar + betahat etabar)^2]
      = (sigma^2 / N)( V[alphahat] + V[betahat] + 1 )

After some simplification and rearrangement, this yields an explicit expression for V[gammahat] in terms of sigma^2, N, mu_r, mu_c, and the second-order moments. The covariance of alphahat and gammahat can be obtained in the same manner.

The expressions for E[(alphahat - alpha)^2] and E[(betahat - beta)^2] of Eqs. (11.28) and (11.30) can be used to generate an expression for the variance of the angle thetahat defined by cos thetahat = alphahat and sin thetahat = betahat, where cos theta = alpha and sin theta = beta. A first-order expansion of cos thetahat around theta and a first-order expansion of sin thetahat around theta give

    alphahat - alpha ~ -(thetahat - theta) sin theta
    betahat - beta   ~  (thetahat - theta) cos theta

Using this approximation, we obtain

    E[(thetahat - theta)^2] ~ E[(cos thetahat - cos theta)^2]
                              + E[(sin thetahat - sin theta)^2]
                            = E[(alphahat - alpha)^2] + E[(betahat - beta)^2]

Upon substituting, there results, when sigma^2 << mu_rr + mu_cc,

    E[(thetahat - theta)^2] ~ sigma^2 (mu_rr + mu_cc + sigma^2)
                              / [ N (mu_rr + mu_cc)^2 ]                   (11.35)

11.7.2 Principal-Axis Curve Fit

Suppose that the curve to be fitted can be represented in the form

    sum from k=1 to K of a_k f_k(r, c) = 0

where the functions f_k(r, c), k = 1, ..., K are given and the unknown parameters

Simple curves such as conics may be fit with the principal-axis technique by taking f (r. note that Hence a must be that eigenvector of FF' having the smallest eigenvalue. t l >f ~ ( i 2 ~ 2 2 ) . ..c) = c2..c) = r2. f~(in. The principal-axis curve fit is obviously a generalization of the line-fitting idea discussed at the beginning of Section 11. . . However.f 2(r. Bookstein (1979) gives a modification of this conic fitting that makes the fit invariant under rotation. a fit that minimizes f (r.tn) and a = f ~ ( i l . as .C) = r .t.. c. translation. c) = c. .)~ can be determined by the principal-axis technique.f5(r. and f6(r. f .)..f4(r. Instead of minimizing 6 subject to &=I a = 1.596 Arc Extraction and Segmentation al.t2) I) . c) = 1.N.ak satisfy the constraint If the observed noisy points are (i. ~ f1(?2. and change of scale. To see which one.6.tn) This means that cr must be an eigenvector of FF'. The objective function to be minimized is c:=. n = 1. Taking partial derivatives of e2 with respect to each ak results in the system where ~ I ( F I . fl(?n.(r. he minimizes e2 subject to a : : + + a: = 1.c) = rc. . .

for example. the variance of the angle of the fitted line. Another way to determine the region of support can be obtained from Eq.)* Teh and Chin (1988. 1989) give some experimental results demonstrating the efficacy of this way to determine the region of support. . they add a second. experience with the principal-axis technique has shown that the fit it provides to conics is often too eccentric.11. Techniques that use a fixed or nonadaptive region of support easily fail on curves that have segments of widely varying lengths.k in - + (C.Cn)has variance . At point (i..(Pn.C. gives instances where this happens. the angle of the fitted line using points (in_&. .) is given by where and j=-k.en. and they compare their results against some of the other techniques. Indeed.( 1 1. . If the region of support is too large. for an observed point (in. Davis (1977).8 Region-of-Support Determination 597 we shall see in the next section. Teh and Chin suggest that the appropriate region of support for a comer point is the largest symmetrically placed subsequence for which the chord joining the subsequence endpoints has maximal length.i?n-&).2..35). Region-of-Support Determination Teh and Chin (1989) argue that the determination of the region of support for the calculation of the straight-line fits that are associated with tangential angle deflection or curvature are of primary importance. many comer points or dominant points can be produced. fine features will be smoothed out. gives too. If the region of support is too small. alternative condition requiring that the ratio of the perpendicular distance between the point and the chord between the arc subsequences to the chord length be maximal. much influence to outlier points. . in effect. Formally..) of the arc sequence.. the objective function that the original technique or Bookstein's modified technique minimizes is not the most appropriate one. the region of support D.. To prevent every point of a digital circle from being assigned a comer point.. because the technique.

. . n + 1 + k ) will be decreasing as k increases.n) V(n.N . jic prr.) where P means V with the measured irr. However. bee.n) V(n.k. ) 2.tn+. . n)..598 Arc Extraction and Segmentation V(n .). En+1M) are on the same line and the observed points (in+. is the given minimum number of points for a line fit. pC.k ) .) . for each point n ..k. are noisy observations of points lying 2. instead of the true values ire. e n ) by + + + + {n-k . pee. e n . Likewise the angle of the fitted line using the noisy observed points (in+l.+I) the observed points ( i n . .n k ) . n + k . is then V ( n-k.pre. n) + V(n + 1.k . as soon as a point that is not a noisy observation of a point lying on the predecessor or successor line is included in either the predecessor sequence or the successor sequence. . and k . where for j > i . V ( n-k. . variance V(n+ 1 .) . It is possible for the right endpoint of the left line segment and the left endpoint of the right line segment to be identical.n ( P ( n + l . .n)+~(n+l. has .(in+k+l. In this case. noisy observations of points lying on the same line. and (in+.. I f the leftmost endpoint of the left segment and the rightmost endpoint of the right segment are approximately known. . This motivates defining the region of support around the right endpoint ( i n . . n) V ( n 1.n+k+l)) and around the left endpoint (in+. n 1 +k) increases.n + k ) < ~(n+k. n)+V ( n+1 . ..?. So long as 2. 2 P(n + 1 +k. the quantity to be minimized is V ( n .. where n minimizes v(n) = V(k. ir. .. V ( n .+.n) + P(n + 1.(in+l+k.:.k . where k _> k. . 1 is the index of the leftmost + + + + . The total variance around the endpoints (in.(in. n + 1 +k). I ) .k . n ) + P ( n + l . p r .. .k . by . then the comer point can be located at index n. n 1+k).

endpoint of the left segment and N is the index of the rightmost endpoint of the right segment. However, if the minimizing n is not near the middle of the sequence, there will be a bias to the result. The bias can be removed by limiting the segment with the greater number of points to no more than three times the number of points in the smaller segment.

EXAMPLE 11.1

Table 11.1 lists a sequence of 24 row-column pairs of points on two noisy line segments meeting at a corner point of about 105 degrees. The digital curve defined by this sequence is shown in Fig. 11.12(a). Shown in Fig. 11.12(b) is the log of the variance v(n) as a function of point index n, where v(n) = V(k0, n) + V(n, N - k0 + 1), k0 = 4, and N = 24. Notice the sharp minimum that occurs at n = 14, the corner point.

Table 11.1: The points in an example digital arc sequence; r_n is the row number and c_n is the column number.

11.9 Robust Line Fitting

We now consider a reformulation of the line-fitting problem that provides a fit insensitive to a few outlier points. First we give a least-squares formulation and then modify it to make it robust. Let the equation of the line be alpha*r + beta*c + gamma = 0.

Figure 11.12: (a) Twenty-four-point digital curve. (b) Log of the variance v(n) of the digital curve as a function of n, where v(n) = V(k0, n) + V(n, N - k0 + 1), k0 = 4, and N = 24.

Let eps be a fitting-error vector, eps = Ap, where

    A = ( rhat_1 chat_1 1 ; rhat_2 chat_2 1 ; ... ; rhat_N chat_N 1 )
    p = (alpha, beta, gamma)'

If we know the uncertainty associated with each point (rhat_i, chat_i), expressed as a variance sigma_i^2, one way to formulate the problem is to find a vector p that minimizes the total weighted fitting error eps'P eps subject to the condition that ||p|| = 1, where the weight matrix P is a diagonal matrix whose diagonal elements are the terms 1/sigma_i^2. An iterative robust solution of the line-fitting problem can be obtained by computing the weight matrix iteratively from the fit of the previous iteration so as to best estimate the vector p. Let P_k = W_k' W_k be the weight matrix in the kth iteration, and let the

singular-value decomposition of W_k A be

    W_k A = U S V'

where S has the singular values s_1, s_2, s_3 on its diagonal. In the singular-value decomposition, both U and V are orthonormal matrices. Without loss of generality, we may assume that s_1 >= s_2 >= s_3. Then the total weighted fitting error becomes

    eps' P_k eps = p' A' P_k A p = p' (USV')'(USV') p = p' V S^2 V' p

This error has minimum value s_3^2, achieved by taking p = v_3, the third column of V. For p = v_3, the weighted fitting error W_k eps can be expressed as

    W_k eps = W_k A p = U S V' v_3 = s_3 u_3

and the total error is eps' P_k eps = s_3^2. Now let U_2 be the N x (N - 2) matrix that consists of the columns 3 through

N of the matrix U; that is, U_2 = (u_3, ..., u_N). Define the redundancy matrix

    R = {r_ij} = U_2 U_2'

Then the fitting errors can be normalized by the redundancy diagonal: let

    e_i = eps_i / sqrt(r_ii)

and define the weight matrix in the (k + 1)th iteration W_{k+1} by

    (W_{k+1})_ii = 1                    if |e_i| <= c etilde
                 = c etilde / |e_i|     otherwise

where c is a constant and etilde is the median of |e_i|. Iterating the singular-value-decomposition solution with these reweightings downweights the outlier points and yields a line fit that is insensitive to a few of them.
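The following Python sketch implements this iteratively reweighted fit in simplified form: it omits the redundancy normalization and applies the median-based weight rule directly to the perpendicular residuals, which is an assumption of ours rather than the exact scheme above.

    import numpy as np

    def robust_line_fit(r, c, n_iter=10, const=1.5):
        # each pass solves the weighted problem via SVD (right singular
        # vector for the smallest singular value), then recomputes weights
        # that taper off for residuals much larger than the median residual
        N = len(r)
        A = np.column_stack([r, c, np.ones(N)])
        wts = np.ones(N)
        p = np.array([1.0, 0.0, 0.0])
        for _ in range(n_iter):
            W = np.sqrt(wts)[:, None] * A      # rows scaled by sqrt(weight)
            _, _, vt = np.linalg.svd(W, full_matrices=False)
            p = vt[-1]
            scale = np.hypot(p[0], p[1]) + 1e-12
            res = np.abs(A @ p) / scale        # perpendicular residuals
            med = np.median(res) + 1e-12
            wts = np.where(res <= const * med, 1.0, const * med / res)
        return p / np.hypot(p[0], p[1])        # normalize alpha^2 + beta^2 = 1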

11.10 Least-Squares Curve Fitting

Suppose that a set of row-column positions {(rhat_n, chat_n)}, n = 1, ..., N, has been determined by any of the arc extraction techniques. The positions (rhat_n, chat_n) are assumed to be noisy observations of points coming from some curve f(r, c, w) = 0 whose parameter vector w must be estimated. The problem we must solve, therefore, is to determine the parameter vector w and points (r_n, c_n), n = 1, ..., N, that lie on the curve f(r, c, w) = 0 and are closest to the (rhat_n, chat_n) in the least-squares sense. That is, we seek w and (r_n, c_n), n = 1, ..., N, to minimize

    sum from n=1 to N of [ (rhat_n - r_n)^2 + (chat_n - c_n)^2 ]

subject to the constraint that f(r_n, c_n, w) = 0, n = 1, ..., N.

Note that this problem is not the same as the problem of determining the parameters w of the curve f that minimize

    sum from n=1 to N of f(rhat_n, chat_n, w)^2

which can loosely be thought of as the fitting error. Obviously one can be arbitrarily larger than the other. The difference between the two formulations is illustrated geometrically in Fig. 11.13 for the simple case of the curve y = x^2 - a. The figure makes clear that it is necessary to determine how to calculate the closest point (r, c) on a fixed curve f to a given point (r0, c0).

Figure 11.13: The difference between the distance f(x0, y0) = y0 - (x0^2 - a), which is the vertical distance between the point (x0, y0) and the curve y = x^2 - a, and the distance sqrt((x0 - x*)^2 + (y0 - y*)^2), which is the shortest distance, taken over all directions, between the point (x0, y0) and the curve. Here (x*, y*) is the closest point on the curve to (x0, y0).

To solve this subproblem, we seek the point (r, c) satisfying f(r, c) = 0 that minimizes (r - r0)^2 + (c - c0)^2. Using a Lagrange multiplier, we define

    eps^2 = (r - r0)^2 + (c - c0)^2 - 2 lambda f(r, c)

Here, for the sake of brevity, we have suppressed the dependency on the parameter vector w. We take partial derivatives of eps^2 with respect to r, c, and lambda and set these

c) = 0 ( 1 1. (r..)] using Eq.co).37) ( 1 1. (11.41) by [f$ (I. In this case Eq.c.. solving for A.40) An approximate solution from Eqs.37) we can write the matrix equation Expanding Eq. This relation (11.604 Arc Extraction and Segmentation partial derivatives to zero. ~ = 0 = 2(c .c0)+(r-r0)-(r0. (11.and right-hand sides of Eq.).44) is exactly correct for any f that is a first-order polynomial.. This results in CI Bff bf = 2(r .39) becomes . ( 1 1.a). c.c)=f(r. ..c) and (ro. we obtain and .2 ~ ) bc dc ah = -2f(r.39) and solving for ( r . Substituting Eq.)+(c-~.) dr bf af bc (11.)-(~.c ) is close enough to (r.40) is easily obtained.38) From Eqs.c ) = $$(r co) and g ( r .36) ( 11.c ) then results in The distance d between (r. ( 1 1. co).co) yields 0=f(r. ( 1 1 .43).~.38) in a Taylor series around (ro. c ) = g ( r o . (11. ( 1 1.36) and ( 1 1.~.).ro) ....2A-(r. we obtain that %(r. c. c. Multiplying the left.39) and (11. Assuming that the unknown ( r .) so that the partial derivatives of f do not change much around (r. (11.co)can then easily be computed from Eq. ( 1 1.c) = 0 dr br d-f ( r .42) into Eq.

    lambda = f(r0, c0) / ||grad-f(r0, c0)||^2                              (11.42)

where ||grad-f(r0, c0)||^2 = [df/dr (r0, c0)]^2 + [df/dc (r0, c0)]^2. Substituting Eq. (11.42) back and solving for (r, c) then results in

    ( r ; c ) = ( r0 ; c0 ) - [ f(r0, c0) / ||grad-f(r0, c0)||^2 ]
                grad-f(r0, c0)                                             (11.43)

The distance d between (r, c) and (r0, c0) can then easily be computed from Eq. (11.43):

    d = |f(r0, c0)| / ||grad-f(r0, c0)||                                   (11.44)

This relation (11.44) is exactly correct for any f that is a first-order polynomial. From this approximate solution to the distance between a point and a curve, we can solve the original minimization problem: with the use of Eq. (11.44), this problem then translates to finding the parameter vector w to minimize

    eps^2(w) = sum from n=1 to N of
               f(rhat_n, chat_n, w)^2 / ||grad-f(rhat_n, chat_n, w)||^2    (11.45)

a result obtained by Sampson (1982). Analytic solutions do not necessarily exist for the determination of the minimizing w for functions f other than first-order polynomials in r and c. Therefore we treat this problem from the point of view of an iterative gradient descent solution. For now, we just assume that some initial value w0 is available; later we will discuss how reasonable initial values w0 can be determined if the fit is to a circle or to a conic.

11.10.1 Gradient Descent

To minimize a nonnegative function eps^2(w), suppose that iterate w_t for w is available, and we represent the next iterate w_{t+1} = w_t + delta-w as a small perturbation on w_t. Taking a first-order Taylor series expansion of eps^2 around w_t then produces

    eps^2(w_t + delta-w) ~ eps^2(w_t) + delta-w' grad eps^2(w_t)

where grad is the gradient operator. To determine the next iterate, we want delta-w' grad eps^2(w_t) negative and smaller than eps^2(w_t) in magnitude. This suggests that delta-w should be in the negative gradient direction and should produce a change smaller than eps^2(w_t). Hence we take

    delta-w = -beta eps^2(w_t) grad eps^2(w_t)
              / ( ||grad eps^2(w_t)||^2 + alpha^2 )                        (11.48)

The alpha^2 in the denominator assures us that as ||grad eps^2(w_t)||^2 becomes smaller than alpha^2, ||delta-w|| will also become small. The parameter beta is some fraction, 0 < beta <= 1, that can be a constant, a function of the iteration index t, or a function of w_t + delta-w; one choice produces beta = 1.0 for t = 1 and beta near 0.5 for large t.
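As a small illustration, the Python sketch below evaluates the objective of Eq. (11.45) and performs one step of the descent rule of Eq. (11.48), with the gradient of the objective taken numerically; the function and parameter names are our own, and the circle example at the end shows the intended calling convention.

    import numpy as np

    def sampson_error(f, grad_f, pts, w):
        # Eq. (11.45): sum of first-order squared distances f^2 / ||grad f||^2
        total = 0.0
        for r, c in pts:
            fr, fc = grad_f(r, c, w)
            total += f(r, c, w) ** 2 / (fr ** 2 + fc ** 2)
        return total

    def descent_step(f, grad_f, pts, w, beta=0.5, alpha2=1e-8, h=1e-6):
        # one update of w by the rule of Eq. (11.48); numerical gradient
        e0 = sampson_error(f, grad_f, pts, w)
        g = np.zeros_like(w)
        for i in range(len(w)):
            wp = w.copy()
            wp[i] += h
            g[i] = (sampson_error(f, grad_f, pts, wp) - e0) / h
        return w - beta * e0 * g / (g @ g + alpha2)

    # example: a circle f = (r-a)^2 + (c-b)^2 - R^2 with w = (a, b, R)
    f = lambda r, c, w: (r - w[0]) ** 2 + (c - w[1]) ** 2 - w[2] ** 2
    grad_f = lambda r, c, w: (2 * (r - w[0]), 2 * (c - w[1]))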

The procedure curvefit details the iterative gradient descent solution. Its input arguments are w0, the initial guess; eps^2, the function to be minimized; N, the number of iterations to perform; alpha and beta; and wf, the final answer, which is returned. Equation (11.48) is used to select a trial step delta-w. If eps^2(w_t + delta-w) < eps^2(w_t), a successful steepest-descent step has been determined; in that case a small search of increasing step sizes may be done to determine whether there is a larger step size that produces an even smaller eps^2(w_t + delta-w). If eps^2(w_t + delta-w) > eps^2(w_t), the trial step size is too large; a smaller step size can surely produce an eps^2(w_t + delta-w) < eps^2(w_t), so in this case we do a small search on reduced step sizes to find a sufficiently small step. The procedure calls a function step to determine the multiplicative constant for delta-w.

    procedure curvefit(w0, eps^2, N, alpha, beta, wf);
    wf := w0;
    for t := 1 to N do begin
      delta-w := -beta eps^2(wf) grad-eps^2(wf) / (||grad-eps^2(wf)||^2 + alpha^2);
      wf := wf + delta-w * step(wf, delta-w, eps^2)
    end for
    end curvefit;

The following function illustrates the step-size search; the number of iterations can be reduced somewhat by using a more sophisticated scheme for selecting the magnitude of delta-w.

    function step(w, delta-w, eps^2);
    c := 1;
    e_p := eps^2(w + delta-w);
    if e_p < eps^2(w) then k := 3 else k := .333;
    step := c;
    for m := 1 to 5 do begin
      c := c * k;
      e := eps^2(w + c delta-w);
      if e < e_p
      then begin
        e_p := e;
        step := c
      end
      else break
    end for
    end step;

11.10.2 Newton Method

It is also possible to solve the minimization problem by a second-order iterative technique that typically requires fewer iterations to reach a solution than the steepest-descent method. Taking a second-order expansion of eps^2(w_t + delta-w) around eps^2(w_t) results in

    eps^2(w_t + delta-w) ~ eps^2(w_t) + delta-w' grad eps^2(w_t)
                           + (1/2) delta-w' H delta-w                      (11.49)

where H = H(w_t), the Hessian of eps^2, is the matrix of second-order partial derivatives of eps^2 evaluated at w_t. To find the delta-w that minimizes Eq. (11.49), take partial derivatives of Eq. (11.49) with respect to delta-w, set them to zero, and solve for delta-w. This produces

    H delta-w = -grad eps^2(w_t)                                           (11.50)

When H is positive definite, delta-w exactly minimizes Eq. (11.49). Here delta-w is called the Newton direction. The second-order method must be used with care, since in situations where eps^2 is relatively flat, delta-w could be very large, and there is no prior assurance that the second-order representation is accurate for long distances away from w_t. To guard against using a potentially large delta-w, one can solve the system

    (H + lambda I) delta-w = -grad eps^2(w_t)                              (11.51)

instead of Eq. (11.50). When lambda gets large, the direction that Eq. (11.51) produces is the negative gradient direction, and when lambda gets close to zero, the direction that Eq. (11.51) produces is the Newton direction. Since eps^2 is the sum of N terms, it would not be unreasonable to consider using lambda = N. Alternatively, at each iteration a better delta-w can be chosen from the delta-w produced by steepest descent and the delta-w produced by the Newton technique.

11.10.3 Second-Order Approximation to Curve Fitting

It is possible to obtain a more exact solution to the distance between a point and a curve than Eq. (11.44). Instead of approximating df/dr and df/dc by a zero-order expansion around (r0, c0), we approximate them by a first-order expansion around (r0, c0). Then Eq. (11.39) becomes

    ( r - r0 ; c - c0 ) = lambda [ grad-f(r0, c0)
                          + F(r0, c0) ( r - r0 ; c - c0 ) ]                (11.52)

where

    grad-f = ( df/dr ; df/dc )  and
    F = ( d2f/dr2    d2f/drdc ; d2f/drdc   d2f/dc2 )

This results in

    ( r - r0 ; c - c0 ) = lambda (I - lambda F)^(-1) grad-f(r0, c0)        (11.53)

Substituting Eq. (11.53) into the Taylor expansion of f, Eq. (11.40), we obtain

    0 = f(r0, c0) + lambda grad-f(r0, c0)' (I - lambda F)^(-1)
        grad-f(r0, c0)                                                     (11.54)

Writing Eq. (11.54) out, we have a quadratic polynomial in lambda,

    A lambda^2 + B lambda + C = 0                                          (11.55)

where C = f(r0, c0) and the coefficients A and B are formed from the components of grad-f(r0, c0) and F(r0, c0). Once Eq. (11.55) is solved for lambda, each of the two possible values can be substituted back into Eq. (11.53), and the squared distance to the curve is then determined by

    d^2 = (r - r0)^2 + (c - c0)^2

The value of lambda that produces the smaller value of d^2 is the root chosen, and the smaller value of d^2 is the desired squared distance.

11.10.4 Fitting to a Circle

In this section we apply the discussion of Section 11.10.3 to a circle. Since the distance between a point and a circle can be represented explicitly, we derive a specialized fitting technique for the circle; then, for comparison, we discuss some other circle-fitting techniques that have appeared in the literature. In the case of a circle, the parameter vector is w' = (a, b, R), where (a, b) is the center of the circle and R is its radius. The circle is represented by

    f(r, c, w) = (r - a)^2 + (c - b)^2 - R^2 = 0

and

the objective of Eq. (11.45) specializes to

    eps^2 = sum from n=1 to N of f(rhat_n, chat_n, a, b, R)^2
            / ( 4 [ (rhat_n - a)^2 + (chat_n - b)^2 ] )

Approximate initial values for a, b, and R can be obtained from

    a0 = (1/N) sum rhat_n,  b0 = (1/N) sum chat_n,
    R0 = (1/N) sum sqrt( (rhat_n - a0)^2 + (chat_n - b0)^2 )

The gradient of eps^2 to be used in calculating delta-w is obtained by differentiating each term of eps^2 with respect to a, b, and R. The simple iteration algorithm is then given as shown below. The parameters r and c are vector arrays of the row and column coordinates of the N points to be fit. Parameters a, b, and R are the fitting parameters of the circle, which are computed by the procedure. The internal variable number_of_iterations will have to be around 200 or 300 for the iterative procedure to get close to the correct solution. This makes the procedure too slow to use in practice.

    procedure circlefit1(r, c, N, a, b, R);
    for t := 1 to number_of_iterations do begin
      eps2 := 0;
      grad := 0;
      for n := 1 to N do begin
        d := (r(n) - a)^2 + (c(n) - b)^2;
        k := 1/d;
        f := d - R^2;
        grad(1) := grad(1) + kf(r(n) - a)(-1 + 0.5(kf)^2);
        grad(2) := grad(2) + kf(c(n) - b)(-1 + 0.5(kf)^2);
        grad(3) := grad(3) - kfR;
        eps2 := eps2 + 0.25 f f k
      end
      end for;
      delta-w := -beta eps2 grad / (||grad||^2 + alpha^2);
      (a, b, R) := (a, b, R) + delta-w * step((a, b, R), delta-w, eps2)
    end
    end for
    end circlefit1;

A faster and more tolerant iterative solution can be obtained as follows. The squared distance between a point (rhat, chat) and a circle centered at (a, b) with radius R is [sqrt((rhat - a)^2 + (chat - b)^2) - R]^2, so the error to be minimized over the N points is

    eps^2 = sum from n=1 to N of (d_n - R)^2,
    where d_n = sqrt( (rhat_n - a)^2 + (chat_n - b)^2 )                    (11.59)

from which

    d eps^2 / dR = -2 sum from n=1 to N of (d_n - R)                       (11.60)

From Eq. (11.60) an analytically computed value for R can be determined when d eps^2 / dR = 0:

    R = (1/N) sum from n=1 to N of d_n                                     (11.61)

If Eq. (11.61) is used, R becomes a function of the unknown center (a, b), and the iterations need proceed only over (a, b): substitute Eq. (11.61) into Eq. (11.59) and use Eq. (11.48) to produce the next value of w = (a, b). In this case the partial derivatives of R with respect to the center coordinates are

    dR/da = -(1/N) sum (rhat_n - a)/d_n  and
    dR/db = -(1/N) sum (chat_n - b)/d_n

The procedure circlefit gives the pseudocode for these iterations. Its input is the row-column arrays r and c, each N long; it outputs the circle center (a, b) and radius. Convergence is typically achieved after about 20 iterations. The procedure calls the function epserr, which, given the observed row-column points and a center estimate (a, b), determines the radius by Eq. (11.61) and returns the error of Eq. (11.59). It also calls the procedure step, which functions like the function step of Section 11.10.1, with the natural modification that it also updates the radius and returns the values of the updated center.

    procedure circlefit(r, c, N, a, b, radius);
    a := 0; b := 0;
    for n := 1 to N do begin
      a := a + r(n);
      b := b + c(n)
    end
    end for;
    a := a/N; b := b/N;
    e := epserr(r, c, N, a, b, radius);
    for t := 1 to 20 do begin
      for n := 1 to N do
        d(n) := sqrt( (r(n) - a)^2 + (c(n) - b)^2 );
      drda := 0; drdb := 0;
      for n := 1 to N do begin
        drda := drda + (r(n) - a)/d(n);
        drdb := drdb + (c(n) - b)/d(n)
      end
      end for;
      drda := -drda/N;
      drdb := -drdb/N;
      at := 0; bt := 0;
      for n := 1 to N do begin
        f := 1 - radius/d(n);
        at := at - 2( f(r(n) - a) + (d(n) - radius) drda );
        bt := bt - 2( f(c(n) - b) + (d(n) - radius) drdb )
      end
      end for;
      g := 1/(alpha^2 + at^2 + bt^2);
      delta-a := beta e g at;
      delta-b := beta e g bt;
      e := step(a, b, delta-a, delta-b, radius)
    end
    end for
    end circlefit;

EXAMPLE 11.2

The data given in Table 11.2 are noisy observed points from a 90-degree circular arc. The estimated center for the circle is (ahat, bhat) = (.1913, .2856), and the estimated radius is Rhat = 10.2201; the fit gives an estimated noise standard deviation of about .499.

Table 11.2: The points in a noisy 90-degree circular arc; rhat_n is the row number and chat_n is the column number.

Robinson (1961) and Landau (1987) note that the desired values of a, b, and R are the values that make the gradient of eps^2 of Eq. (11.59) zero. Therefore the simultaneous equations to be solved for a, b, and R are Eq. (11.61) together with

    a = (1/N) sum rhat_n - R (1/N) sum (rhat_n - a)/d_n                    (11.63)
    b = (1/N) sum chat_n - R (1/N) sum (chat_n - b)/d_n                    (11.64)

Landau rearranges Eqs. (11.61), (11.63), and (11.64) to get the iteration

    a_{t+1} = (1/N) sum rhat_n - R_t (1/N) sum (rhat_n - a_t)/d_n(t)
    b_{t+1} = (1/N) sum chat_n - R_t (1/N) sum (chat_n - b_t)/d_n(t)       (11.66)

where d_n(t) = sqrt((rhat_n - a_t)^2 + (chat_n - b_t)^2) is calculated at iteration t and R_t is obtained from Eq. (11.61). The initial values a_0 and b_0 are obtained from Eqs. (11.63) and (11.64) with the correction terms omitted, that is, from the centroid of the points. Iterating the system of equations (11.66) to convergence is generally slow, requiring about 450 iterations.

Chernov and Ososkov (1984) claim a less computationally intensive approach that works when the relative noise perturbation is small compared with R. Define the centered quantities

    r'_n = rhat_n - (1/N) sum rhat_m,  c'_n = chat_n - (1/N) sum chat_m,
    a' = a - (1/N) sum rhat_m,  b' = b - (1/N) sum chat_m

and gamma = R^2 - (a')^2 - (b')^2. Taking partial derivatives of the resulting squared-error expression with respect to a', b', and gamma, setting these partial derivatives to zero, and using the fact that sum r'_n = sum c'_n = 0, results in

    F a' + H b' - gamma a' = P                                             (11.69)
    H a' + G b' - gamma b' = Q                                             (11.70)

where F, G, H, P, Q, and T are moments of the centered data, together with the constraint

    2P a' + 2Q b' + gamma^2 = T                                            (11.72)

Eqs. (11.69) and (11.70) can be used to determine expressions for a' and b' in terms of gamma, and these expressions can be substituted back into Eq. (11.72). After rearranging, there results the fourth-order equation in gamma

    gamma^4 + A gamma^3 + B gamma^2 + C gamma + D = 0                      (11.73)

where

    A = -F - G
    B = FG - T - H^2
    C = T(F + G) - 2(P^2 + Q^2)
    D = T(H^2 - FG) + 2(P^2 G + Q^2 F) - 4PQH

Chernov and Ososkov solve Eq. (11.73) for the desired root by the iterative Newton method, beginning with an initial estimate of gamma obtained from an approximate center and radius.
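For reference, the Landau-style alternating iteration of Eqs. (11.61) and (11.66) is only a few lines of Python; the sketch below is an illustration (the small-denominator guard is ours) rather than an exact transcription of any of the cited procedures.

    import numpy as np

    def circle_fit_iterative(r, c, n_iter=50):
        # hold the center fixed, set R to the mean point-to-center
        # distance (Eq. 11.61), then re-solve the center equations with
        # that R (Eq. 11.66); initialized at the centroid
        a, b = r.mean(), c.mean()
        for _ in range(n_iter):
            d = np.hypot(r - a, c - b)
            d = np.where(d < 1e-12, 1e-12, d)  # guard a point at the center
            R = d.mean()
            a = r.mean() - R * ((r - a) / d).mean()
            b = c.mean() - R * ((c - b) / d).mean()
        return a, b, np.hypot(r - a, c - b).mean()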

Bookstein (1979) gives the following regression procedure for fitting a circle. He rewrites the circle equation as follows:

    2ra + 2cb - a^2 - b^2 + R^2 = r^2 + c^2

Letting q = R^2 - a^2 - b^2, he sets up the overconstrained linear system

    ( 2rhat_1  2chat_1  1 )           ( rhat_1^2 + chat_1^2 )
    ( 2rhat_2  2chat_2  1 ) (a)       ( rhat_2^2 + chat_2^2 )
    (   ...      ...    . ) (b)   =   (         ...         )              (11.74)
    ( 2rhat_N  2chat_N  1 ) (q)       ( rhat_N^2 + chat_N^2 )

which can be solved in the least-squares sense for (a, b, q). Then R^2 = a^2 + b^2 + q. The least-squares solution for Eq. (11.74) minimizes the sum of the squared errors between the squared radius and the squared distance between the observed data points and the circle center:

    eps^2 = sum from n=1 to N of
            [ (rhat_n - a)^2 + (chat_n - b)^2 - R^2 ]^2

The technique should be used with care, since it is easy for numerical errors to influence the results unduly.
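This algebraic fit is one call to a linear least-squares solver; the Python sketch below uses an SVD-based solver (numpy's lstsq) rather than the normal equations, for the numerical reasons discussed next.

    import numpy as np

    def circle_fit_algebraic(r, c):
        # solve 2r*a + 2c*b + q = r^2 + c^2 for (a, b, q) in the
        # least-squares sense, then recover R from R^2 = q + a^2 + b^2
        M = np.column_stack([2 * r, 2 * c, np.ones_like(r)])
        rhs = r * r + c * c
        (a, b, q), *_ = np.linalg.lstsq(M, rhs, rcond=None)
        return a, b, np.sqrt(q + a * a + b * b)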

For example, if Eq. (11.74) is solved by way of the normal equations,

    (a; b; q) = (A'A)^(-1) A' y

the roundoff error in the computation of A'A can cause excessive inaccuracy. It is apparent that when the center (a, b) of a circle of radius 10 is in the bottom right corner of a large image, a^2 + b^2 can easily be over 5 x 10^5, and q must be a large negative number just 100 larger than -5 x 10^5. Hence the computation of R^2 = a^2 + b^2 + q will involve the subtraction of q, a large negative number, from a^2 + b^2, a large positive number, with the inherent loss of precision in the result. Therefore a singular-value decomposition technique should be used instead of the normal equations.

Thomas and Chan (1989) also minimize

    eps^2 = sum from n=1 to N of
            [ (rhat_n - a)^2 + (chat_n - b)^2 - R^2 ]^2

Taking partial derivatives of eps^2 with respect to a, b, and R, setting them to zero, and rearranging results in three simultaneous equations, Eqs. (11.75)-(11.77), in a, b, and R^2. The a^2 and b^2 terms can be eliminated: multiply Eq. (11.75) by N and subtract from it Eq. (11.77) multiplied by sum rhat_n; multiply Eq. (11.76) by N and subtract from it Eq. (11.77) multiplied by sum chat_n. The resulting linear system, Eq. (11.78), is a 2 x 2 system in a and b whose coefficients are first- through third-order moments of the observed points.

Solving Eq. (11.78) for a and b and then substituting into Eq. (11.77) yields R^2.

11.10.5 Variance of the Fitted Parameters

We suppose that the noisy observations (rhat_n, chat_n) are of points (r_n, c_n) that lie on the circle (r_n - a)^2 + (c_n - b)^2 = R^2. As in the case of the line fitting, our model for (rhat_n, chat_n) is

    rhat_n = r_n + xi_n,  chat_n = c_n + eta_n

where we assume that the random variables xi_n and eta_n are independent and identically distributed, having mean 0 and variance sigma^2. The least-squares estimates ahat, bhat, Rhat minimize

    eps^2 = sum from n=1 to N of
            [ sqrt( (rhat_n - ahat)^2 + (chat_n - bhat)^2 ) - Rhat ]^2

Hence they must satisfy

    g1(ahat, bhat, Rhat) = g2(ahat, bhat, Rhat) = g3(ahat, bhat, Rhat) = 0

where g1, g2, and g3 denote the partial derivatives of eps^2 with respect to the three parameters, evaluated at the observed points (rhat_1, chat_1), ..., (rhat_N, chat_N).

b . ( ~ N .R ) = gz(a. b . Let Aa . ~ l. ( r l .R . and AR satisfy = g.. R ) = Then But g i [ i . .b . also satisfy g ( a .g. b . R . . k .618 Arc Extraction and Segmentation Of course the unknown true values a . ) ) that SO Solving for (Aa . ) .(a. ( ~ N . S N=]g i [ a . e . Assuming that the noise is sufficiently small so that a firstorder expansion of g l . b . we obtain .b . . we may compute the variances of the estimates 6 .A b .. c N=]0.b .A R). ( i l . .)..R) is accurate.A b . b .and g2 around ( a .R) R 0. .

b. ~ .83) In the case when gl. .84) for o f and J . J2Ji must then satisfy J2Ji = J I .11.. .. we use estimates e2and in Eq.A b . .).b. I =u2J.10 Least-Squarer c m Fitting u 619 By the system of equations (11. . ~ . .81). N . In this case the covariance matrix simplifies to n [ ( a A A!) To estimate the covariance matrix for ( A a .g2. c ~. ~ J ~ J .el). By the equal-variance and no-correlation assumption of Eq. . .(PN.l (11. R is identical to the covariance matrix of h a . This expectation is directly related to u2. . A b . ( 11.SO).tN)and the inferred values i.. . ( ~ N . J .g3 are defined by Eq. we obtain that the covariance matrix for A a . A R . respectively. A b.b)2 = R 2 for n = 1. we work out the expected value of e2 assuming the error is computed by using the true values for a .a)' + ( c .82) the covariance matrix for ti. (11. in place of the unknown but R true values [ ( r ~ . R ] . Finally. (11. J I is J I with the observed values ( P I . . b .84) .~ ( 11. Since ( r . and A R is given by E I(:!) ( A a Ab A R ) ] = u ~ J . and R . c N ) . . A R ) .

Assuming the variance of the noise is small compared with R^2, write

    d_n = sqrt( (r_n + xi_n - a)^2 + (c_n + eta_n - b)^2 )
        = R sqrt(1 + Delta_n)                                              (11.86)

where

    Delta_n = [ 2 xi_n (r_n - a) + 2 eta_n (c_n - b)
                + xi_n^2 + eta_n^2 ] / R^2

Since |Delta_n| << 1, we may expand the square root in a Taylor series to obtain

    d_n ~ R ( 1 + Delta_n/2 - Delta_n^2/8 )

Hence

    E[eps^2] = E[ sum (d_n - R)^2 ]
             = R^2 sum E[ (Delta_n/2 - Delta_n^2/8)^2 ]

Squaring the argument of the expectation, using Eq. (11.86), assuming E[xi_n^3] = E[eta_n^3] = 0, and simplifying, we can express E[eps^2] in terms of sigma^2, R, and mu_4 = E[xi_n^4] = E[eta_n^4]. If the noise is Gaussian, then mu_4 = 3 sigma^4. Substituting, we obtain that when sigma^2 << R^2,

    E[eps^2] ~ N sigma^2

which suggests sigmahat^2 = eps^2/N as an estimator for sigma^2. Since in actual practice a, b, and R are estimated as those values that minimize eps^2, three degrees of freedom are absorbed by the fit, and sigmahat^2 = eps^2/(N - 3) is the preferable estimator for sigma^2.

The analysis we have just done to determine the expected value of $e^2$ is also useful for establishing the expected value of the estimator $\hat R$, which is actually slightly biased. To make our analysis easier, we assume $N$ is large enough so that the differences between $\hat a$ and $a$ and between $\hat b$ and $b$ are negligible. In this case

$$\hat R = \frac{1}{N}\sum_{n=1}^{N}\sqrt{(\hat r_n - a)^2 + (\hat c_n - b)^2} = \frac{R}{N}\sum_{n=1}^{N}\sqrt{1 + \Delta_n}.$$

Hence $E[\hat R] \approx R + \frac{R}{2}E[\Delta_n]$. But from Eq. (11.86), $E[\Delta_n] = 2\sigma^2/R^2$, so that

$$E[\hat R] \approx R + \frac{\sigma^2}{R}.$$

The biases of $\hat a$ and $\hat b$ tend to be negligible in most cases. Berman (1989) observes that the least-squares solutions $\hat a$, $\hat b$, and $\hat R$ minimizing the sum of squared error of Eq. (11.59) are biased; he shows that the asymptotic bias of $\hat R$ is about $\sigma^2/R$, where $\sigma^2$ is the Gaussian variance of the additive noise that perturbs the true $(r_n, c_n)$ to produce the observed $(\hat r_n, \hat c_n)$. Other statistical analyses of the circle-fitting model can be found in Berman and Griffiths (1985), Berman (1983), Berman and Culpin (1986), and Anderson (1981).
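To give a feel for the size of this bias (my numbers, not an example from the book): for a circle of radius $R = 50$ pixels observed with noise $\sigma = 1$ pixel, the bias $\sigma^2/R = 0.02$ pixel is entirely negligible, while for a small arc with $R = 5$ and $\sigma = 1$ it grows to $0.2$ pixel, or 4% of the radius.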

11.10.6 Fitting to a Conic

In the case of the conic, the parameter vector is $w' = (a, b, A, B, C)$, and

$$f(r, c, w) = f(r, c, a, b, A, B, C) = A(r - a)^2 + 2B(r - a)(c - b) + C(c - b)^2 - 1.$$

Then

$$\frac{\partial f}{\partial r} = 2A(r - a) + 2B(c - b), \qquad \frac{\partial f}{\partial c} = 2B(r - a) + 2C(c - b).$$

The gradient of $e^2$ to be used to determine $\Delta w$ by Eq. (11.48) is then given by

$$\nabla e^2 = \frac{1}{N}\sum_{n=1}^{N} d_n f_n\begin{pmatrix} -2g_n \\ -2h_n \\ (\hat r_n - a)^2 \\ 2(\hat r_n - a)(\hat c_n - b) \\ (\hat c_n - b)^2 \end{pmatrix}, \qquad\text{where}\quad
\begin{aligned}
g_n &= A(\hat r_n - a) + B(\hat c_n - b),\\
h_n &= B(\hat r_n - a) + C(\hat c_n - b),
\end{aligned}$$

and $d_n$ is the weight associated with the $n$th point. To determine initial values for the parameters, we can proceed by using the principal-axis curve-fit procedure of Section 11.10.2. Sampson (1982) gives an iterative refinement method to find the conic parameters that minimize Eq. (11.92). Pavlidis (1983) has a good discussion on fitting curves with conic splines.

11.10.7 Fitting to an Ellipse

Probably the most common conic fitting done is the fit to an ellipse. If the conic is known to be an ellipse, there is an additional constraint on the relationship among $A$, $B$, and $C$ that must be imposed: $B^2 - AC < 0$. The coefficient matrix $\begin{pmatrix} A & B \\ B & C \end{pmatrix}$ produced by an unconstrained conic fit is not so constrained and therefore could be negative definite or indefinite. We describe here a way of implicitly incorporating this constraint in the fitting procedure by working with a different set of parameters. Instead of representing the ellipse with a functional form in $A$, $B$, $C$ directly, we represent the coefficient matrix by a matrix product guaranteed to be positive semidefinite:

$$\begin{pmatrix} A & B \\ B & C \end{pmatrix} = \begin{pmatrix} d & 0 \\ e & f \end{pmatrix}\begin{pmatrix} d & e \\ 0 & f \end{pmatrix} = \begin{pmatrix} d^2 & de \\ de & e^2 + f^2 \end{pmatrix}.$$
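A minimal sketch of this reparameterization (function name mine): the free parameters $(d, e, f)$ range over all of $E^3$, yet the induced coefficients always satisfy the ellipse constraint.

```python
def abc_from_def(d, e, f):
    """Map unconstrained (d, e, f) to conic coefficients (A, B, C).

    (A B; B C) = L L' with L = (d 0; e f), so the coefficient matrix is
    positive semidefinite by construction: B^2 - A*C = -(d*f)**2 <= 0.
    """
    return d * d, d * e, e * e + f * f
```

An optimizer may therefore search freely over $(d, e, f)$; every iterate corresponds to a valid (possibly degenerate) ellipse, since $B^2 - AC = d^2e^2 - d^2(e^2 + f^2) = -d^2f^2 \le 0$.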

It is clear from this relation that for any values $d$, $e$, $f$, the matrix is positive semidefinite, and whenever $d \ne 0$ and $f \ne 0$, the matrix is positive definite. Conversely, for any positive definite matrix there exist values of $d$, $e$, $f$ that satisfy the relation. One set of relations from $A$, $B$, $C$ to $d$, $e$, $f$ is given by

$$d = \sqrt{A}, \qquad e = \frac{B}{\sqrt{A}}, \qquad f = \sqrt{\frac{AC - B^2}{A}}.$$

This means that we can set up the fitting problem with the free parameters $d$, $e$, and $f$ in place of $A$, $B$, $C$. With this perspective we define the functions to be minimized by substituting $A = d^2$, $B = de$, and $C = e^2 + f^2$, and proceed as before.

To determine an initial estimate for the parameters $a$, $b$, $A$, $B$, $C$, we can use the relationships between the first- and second-order moments of the observed points and the parameters $a$, $b$, $A$, $B$, $C$. Hence, using the relation developed in Appendix A on ellipses, we have

$$a = \frac{1}{N}\sum_{n=1}^{N}\hat r_n, \qquad b = \frac{1}{N}\sum_{n=1}^{N}\hat c_n,$$

with $A$, $B$, and $C$ then obtained from the sample second-order central moments of the observed points.
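A sketch of this moment-based initialization (names mine). The proportionality constant relating the second-moment matrix to the inverse coefficient matrix is garbled in the source; the factor $1/2$ below assumes points spread uniformly in the elliptic angle and should be checked against the book's Appendix A relation:

```python
import numpy as np

def initial_ellipse_estimate(r, c):
    """Initial (a, b, A, B, C) from first- and second-order moments."""
    pts = np.column_stack([r, c]).astype(float)
    mu = pts.mean(axis=0)                 # (a, b): take the centroid
    dev = pts - mu
    M = dev.T @ dev / len(pts)            # sample second central moments
    # Assumption (mine): for points uniform in elliptic angle on the
    # ellipse (x - mu)' Q (x - mu) = 1, the moments satisfy M = Q^{-1}/2.
    Q = np.linalg.inv(M) / 2.0
    a, b = mu
    return a, b, Q[0, 0], Q[0, 1], Q[1, 1]
```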

Proffitt (1982) uses a discrete Fourier transform technique for fitting ellipses. Wang, Hanson, and Riseman (1988) give a short discussion on extracting ellipses from images.

11.10.8 Bayesian Fitting

The equal-weight least-squares fit discussed in the previous section is suitable when all the observed data points are noisy observations of points that lie on a curve of interest; that is, they satisfy the curve-fitting model. However, if there are some observed data points that are not noisy observations of points lying on the curve of interest, then the simple least-squares fitting model is incorrect. Indeed, a data point that should not be included in a fit can have an arbitrarily large effect in throwing off an estimated parameter value; for this reason a more robust approach should be considered. In this section we give a Bayesian approach for fitting under the condition that some of the observed data points may be points having nothing to do with the curve being fit.

To set up this framework, we need the probability $q$ that an observed data point comes from the curve of interest, and for each observation $(\hat r_n, \hat c_n)$ we define a random variable $\gamma_n$, where $\gamma_n = 1$ if $(\hat r_n, \hat c_n)$ comes from the curve and $\gamma_n = 0$ if it does not. Hence $P(\gamma_n = 1) = q$ and $P(\gamma_n = 0) = 1 - q$.

Next we need to write an expression for the probability density of observing a noisy data point $(\hat r, \hat c)$ that comes from a curve $f(r, c, w) = 0$. Using a model in which the random perturbations are normal, together with the simple expression of Eq. (11.39) derived for the distance between a point and a curve, we can write

$$P(\hat r, \hat c \mid \text{from curve}, w) \propto \exp\!\left(-\frac{f(\hat r, \hat c, w)^2}{2\sigma^2\,\|\nabla f\|^2}\right), \qquad\text{where } (\nabla f)' = \left(\frac{\partial f}{\partial r}, \frac{\partial f}{\partial c}\right).$$

Also, we need the probability density of observing a data point $(\hat r, \hat c)$ given that it does not come from the curve being fit. In this case we assume that $(\hat r, \hat c)$ comes from a uniform distribution over a region centered at $(0, 0)$, so that $P(\hat r, \hat c \mid \text{not from curve})$ is constant over that region.
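Combining the two cases, the marginal density of a single observation is the mixture below (a sketch; $\mathcal{A}$ is my symbol for the area of the uniform region, and the normalization of the normal term is reconstructed rather than taken from the garbled display):

$$P(\hat r, \hat c \mid w) = q\,\frac{1}{\sqrt{2\pi}\,\sigma}\exp\!\left(-\frac{f(\hat r, \hat c, w)^2}{2\sigma^2\,\|\nabla f\|^2}\right) + (1 - q)\,\frac{1}{\mathcal{A}}.$$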

Assuming the data points are independent conditioned on $w$ and that $P(\gamma_n \mid w) = P(\gamma_n)$, we obtain an expression for the posterior $P[w \mid (\hat r_1, \hat c_1), \ldots, (\hat r_N, \hat c_N)]$, and hence a lower bound on $P[w \mid (\hat r_1, \hat c_1), \ldots, (\hat r_N, \hat c_N)]$. This lower bound motivates the following fitting procedure:

1. For $L$ iterations, iterate by selecting at each iteration $K$ observations, with a reasonable spread, from the set of $N$ observations, and do a least-squares fit to estimate an initial $w_0$.
2. At iteration $t$, determine the probability $P_t$, defined as the lower bound evaluated at the current fit $w_t$.
3. Determine which points to take as coming from the curve: define $\gamma_n$ by
$$\gamma_n = \begin{cases} 1 & \text{if } q\,P[(\hat r_n, \hat c_n) \mid w_t, \text{curve}] > (1 - q)\,P[(\hat r_n, \hat c_n) \mid \text{not from curve}] \\ 0 & \text{otherwise.} \end{cases}$$
4. Having hypothesized which points come from the curve and which points do not come from the curve, determine $w_{t+1}$ by maximizing the bound over the hypothesized curve points.
5. Iterate so long as $P_{t+1} > P_t$ and $t < $ number_of_iterations.
6. From the $L$ iterations, select the resulting $w$ yielding the largest $P$.
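A compact sketch of steps 1 through 6 for the circle model (all names and default parameter values are mine; `fit_circle` is the least-squares routine sketched earlier, and the two densities follow the mixture above with a one-dimensional normal in the point-to-circle distance):

```python
import numpy as np

def bayes_circle_fit(r, c, q=0.8, sigma=1.0, area=512.0 * 512.0,
                     K=10, L=20, max_iters=10, seed=0):
    """Steps 1-6 above for a circle: L random starts; within each start,
    alternately hypothesize curve points (step 3) and refit (step 4)."""
    rng = np.random.default_rng(seed)
    r, c = np.asarray(r, float), np.asarray(c, float)
    p_noise = (1.0 - q) / area                      # "not from curve" density
    best_w, best_P = None, -np.inf
    for _ in range(L):                              # step 1: L random starts
        idx = rng.choice(len(r), size=K, replace=False)
        w = fit_circle(r[idx], c[idx])
        P_prev = -np.inf
        for _ in range(max_iters):                  # step 5: iteration limit
            a, b, R = w
            d = np.hypot(r - a, c - b) - R          # distance to current circle
            p_curve = q * np.exp(-d ** 2 / (2 * sigma ** 2)) \
                / np.sqrt(2 * np.pi * sigma ** 2)
            gamma = p_curve > p_noise               # step 3: hypothesize inliers
            P = np.log(np.where(gamma, p_curve, p_noise)).sum()  # step 2 bound
            if P <= P_prev or gamma.sum() < 3:      # step 5: stop when no gain
                break
            P_prev = P
            w = fit_circle(r[gamma], c[gamma])      # step 4: refit on inliers
        if P_prev > best_P:                         # step 6: keep the best run
            best_P, best_w = P_prev, w
    return best_w
```

The structure is a random-sampling consensus search wrapped around an inlier-hypothesis/refit alternation.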

11.10.9 Uniform Error Estimation

If all the points in the observed arc are reliable (deviations from their true values are small), uniform error estimation may be appropriate. We set the problem up by using a parametric representation of the curve to be estimated. Suppose the unknown curve can be parametrically represented as

$$r = \sum_{m=1}^{M}\alpha_m\phi_m(s), \qquad c = \sum_{m=1}^{M}\beta_m\phi_m(s),$$

where $s$ is arc length, $\phi_1, \ldots, \phi_M$ are given basis functions, and $\alpha_1, \ldots, \alpha_M$ and $\beta_1, \ldots, \beta_M$ are the unknown coefficients. The observed arc sequence $S = \langle(\hat r_1, \hat c_1), \ldots, (\hat r_N, \hat c_N)\rangle$ can be put into parametric form by defining $S_r = \langle(s_1, \hat r_1), \ldots, (s_N, \hat r_N)\rangle$ and $S_c = \langle(s_1, \hat c_1), \ldots, (s_N, \hat c_N)\rangle$, where $s_1 = 0$ and

$$s_n = s_{n-1} + \sqrt{(\hat r_n - \hat r_{n-1})^2 + (\hat c_n - \hat c_{n-1})^2}, \qquad n = 2, \ldots, N.$$

The uniform error estimation for $\alpha_1, \ldots, \alpha_M$ and $\beta_1, \ldots, \beta_M$ seeks to minimize the maximum absolute residual. This is equivalent to finding $\alpha_1, \ldots, \alpha_M$ to minimize $\epsilon_r$ subject to

$$\left|\hat r_n - \sum_{m=1}^{M}\alpha_m\phi_m(s_n)\right| \le \epsilon_r, \qquad n = 1, \ldots, N, \tag{11.96}$$

and finding $\beta_1, \ldots, \beta_M$ to minimize $\epsilon_c$ subject to the analogous constraints (11.95). Each of these problems can be solved as a linear programming problem. To see how to represent the problem, consider Eq. (11.96). The inequalities of Eq. (11.96) imply

$$-\epsilon_r \le \hat r_n - \sum_{m=1}^{M}\alpha_m\phi_m(s_n) \le \epsilon_r, \qquad n = 1, \ldots, N.$$
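These $2N$ inequalities, together with the objective of minimizing $\epsilon_r$, form the linear program stated next. A sketch of the whole pipeline (all names are mine; the monomial basis $\phi_m(s) = s^{m-1}$ is an arbitrary choice, and `scipy.optimize.linprog` is used as the LP solver):

```python
import numpy as np
from scipy.optimize import linprog

def chord_length_params(r, c):
    """s_1 = 0; s_n accumulates distances between consecutive observations."""
    steps = np.hypot(np.diff(r), np.diff(c))
    return np.concatenate([[0.0], np.cumsum(steps)])

def minimax_fit(Phi, y):
    """Minimize eps = max_n |y_n - (Phi @ alpha)_n| as a linear program.

    Variables x = (eps, alpha_1, ..., alpha_M); the 2N constraints are
    +(Phi @ alpha) - eps <= y  and  -(Phi @ alpha) - eps <= -y.
    """
    N, M = Phi.shape
    cost = np.r_[1.0, np.zeros(M)]                  # minimize eps only
    ones = np.ones((N, 1))
    A_ub = np.vstack([np.hstack([-ones, Phi]),
                      np.hstack([-ones, -Phi])])
    b_ub = np.concatenate([y, -y])
    bounds = [(0, None)] + [(None, None)] * M       # eps >= 0, alpha free
    res = linprog(cost, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
    return res.x[0], res.x[1:]                      # (eps_r, alpha)

# Fitting the r coordinate; the c-coordinate problem is identical with y = c:
#   s = chord_length_params(r, c)
#   Phi = np.vander(s, M, increasing=True)          # phi_m(s) = s**(m-1)
#   eps_r, alpha = minimax_fit(Phi, r)
```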

The linear programming problem is to find $(\epsilon_r, \alpha_1, \ldots, \alpha_M)$ to minimize $\epsilon_r$ subject to the constraint system

$$\sum_{m=1}^{M}\alpha_m\phi_m(s_n) - \epsilon_r \le \hat r_n, \qquad -\sum_{m=1}^{M}\alpha_m\phi_m(s_n) - \epsilon_r \le -\hat r_n, \qquad n = 1, \ldots, N.$$

The solution to Eq. (11.95) can likewise be obtained as the solution to a linear programming problem.

Exercises

11.1. Show that if $A$ is a $2\times 2$ matrix, then for any $\lambda$,
$$|I + \lambda A| = 1 + \lambda\,\operatorname{trace} A + \lambda^2 |A|.$$

11.2. Refer to Eq. (11.27). Show that
$$E\bigl[(\hat\mu_{rr} + \hat\mu_{cc}) - (\mu_{rr} + \mu_{cc})\bigr] = 0 \quad\text{and}\quad E\Bigl\{\bigl[(\hat\mu_{rr} + \hat\mu_{cc}) - (\mu_{rr} + \mu_{cc})\bigr]^2\Bigr\} = \frac{4\sigma^2\,\mu_{rr}(\mu_{rr} + \mu_{cc})}{N - 1}.$$

11.3. Suppose $x$ and $y$ are angles expressed in degrees in the range $0°$ to $360°$. Prove that the following procedure determines a $y^*$ equal to $y$ modulo $360°$ that is as close as possible to $x$:

    d := x - y
    if d > 0 then u := y + 360° else u := y - 360°
    e := x - u
    if |d| < |e| then y* := y else y* := u
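The procedure of Exercise 11.3, transcribed to runnable form (a sketch; the function name is mine):

```python
def nearest_congruent_angle(x, y):
    """Return y* congruent to y modulo 360 degrees that is nearest to x."""
    d = x - y
    u = y + 360.0 if d > 0 else y - 360.0
    e = x - u
    return y if abs(d) < abs(e) else u
```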

Using Eqs. (A.40) and (A.72), we obtain the quantities needed to characterize the ellipse. If $\mu_{rr} \le \mu_{cc}$, we use Eq. (A.73); if $\mu_{cc} > \mu_{rr}$, we use Eq. (A.76). By Eq. (A.80) the major axis length satisfies

$$\text{Major axis length} = 2\sqrt{2}\,\sqrt{\mu_{rr} + \mu_{cc} + \sqrt{(\mu_{rr} - \mu_{cc})^2 + 4\mu_{rc}^2}},$$

and by Eq. (A.82) the minor axis length satisfies

$$\text{Minor axis length} = 2\sqrt{2}\,\sqrt{\mu_{rr} + \mu_{cc} - \sqrt{(\mu_{rr} - \mu_{cc})^2 + 4\mu_{rc}^2}}.$$

Since $\mu_{cc} > \mu_{rr}$, we use Eq. (A.84) to obtain the clockwise orientation of the major axis of the ellipse with respect to the column axis.

EXAMPLE A.4 We continue with an example ellipse, applying Eqs. (A.72), (A.76), (A.80), (A.82), and (A.84) to its moment values.
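A sketch of these computations (names mine; the axis-length expressions follow the Eq. (A.80)/(A.82) forms reconstructed above, and the orientation sign convention in the last line is my assumption, to be checked against Eq. (A.84)):

```python
import numpy as np

def ellipse_axes(mu_rr, mu_rc, mu_cc):
    """Axis lengths and major-axis orientation from second moments."""
    s = np.sqrt((mu_rr - mu_cc) ** 2 + 4.0 * mu_rc ** 2)
    major = 2.0 * np.sqrt(2.0) * np.sqrt(mu_rr + mu_cc + s)   # cf. Eq. (A.80)
    minor = 2.0 * np.sqrt(2.0) * np.sqrt(mu_rr + mu_cc - s)   # cf. Eq. (A.82)
    # Orientation of the major axis in degrees; convention assumed (A.84).
    theta = 0.5 * np.degrees(np.arctan2(2.0 * mu_rc, mu_cc - mu_rr))
    return major, minor, theta
```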

APPENDIX B

Linear Algebra Background

In this appendix we review some basic facts about linear algebra: inner products, norms, orthogonal projections, and least-squares approximations. We demonstrate why the solution of a least-squares problem in the discrete setting has an interpretation as the determination of an orthogonal projection in a function space. All of this is standard in numerical analysis and approximation theory; detailed proofs can be found in the textbooks in the bibliography (Shampine and Gordon, 1975, is the most advanced). We assume the concepts of linear independence, basis, and spanning are already understood.

Let $E$ denote the real numbers, $E^N$ be $N$-dimensional Euclidean space (all $N$-tuples of real numbers), $L$ be a finite-dimensional real vector space, and $M$ a subspace of $L$. For concreteness, we think of a vector as being an $N$-tuple of real numbers, although a vector in the general context may be a function or even a matrix, depending on the vector space under consideration.

An inner product (also known as a positive definite hermitian form) on $L$ is a function that assigns to every pair of vectors $x, y \in L$ a real number $\langle x, y\rangle$ satisfying the usual bilinearity, symmetry, and positivity conditions. For $L = E^N$, the standard inner product is $\langle x, y\rangle = x'y$; the inner product with respect to a positive definite symmetric matrix $A$ is $\langle x, y\rangle = x'Ay$. An inner product $\langle x, y\rangle$ always leads to a norm on $L$ defined by

$$\|x\| = \sqrt{\langle x, x\rangle}. \tag{B.2}$$

For $L = E^N$, where the norm is induced by the symmetric positive definite matrix $A$ of the inner product, we write $\|x\|_A = \sqrt{x'Ax}$. A norm has the properties

1. $\|x\| \ge 0$, with equality if and only if $x = 0$;
2. $\|\alpha x\| = |\alpha|\,\|x\|$ for all $x \in L$, $\alpha \in E$;
3. $\|x + y\| \le \|x\| + \|y\|$ for all $x, y \in L$.

Not every norm arises from some inner product, but those that do have nicer properties.

Vectors $u, v \in L$ are orthogonal if $\langle u, v\rangle = 0$. A set of vectors $\{u_1, \ldots, u_k\}$ is orthonormal if $\langle u_i, u_j\rangle = 1$ when $i = j$ and $0$ otherwise. A vector $v \in L$ is orthogonal to the subspace $M$ if $\langle v, x\rangle = 0$ for all $x \in M$. The orthogonal complement of $M$, denoted $M^\perp$, is the set of all vectors orthogonal to $M$:

$$M^\perp = \{x \in L \mid \langle x, w\rangle = 0 \text{ for all } w \in M\}. \tag{B.4}$$

It follows from these definitions that every vector $x \in L$ has a unique representation of the form

$$x = u + v, \qquad u \in M, \quad v \in M^\perp.$$

This is sometimes expressed by saying $L$ is the direct sum of $M$ and $M^\perp$, denoted $L = M \oplus M^\perp$. The vector $u$ is called the orthogonal projection of $x$ onto $M$. Note that $u$ and $v$ are unique and $\langle u, v\rangle = 0$.

To see that $x$ can be represented in the form $x = u + v$ and that the representation is unique, let $b_1, \ldots, b_I$ be an orthonormal basis for $M$ and let $b_{I+1}, \ldots, b_N$ extend this orthonormal basis to the entirety of $L$. Since $b_{I+1}, \ldots, b_N$ are orthonormal to $b_1, \ldots, b_I$, the vectors $b_{I+1}, \ldots, b_N$ constitute an orthonormal basis for $M^\perp$. Now since $b_1, \ldots, b_N$ is a basis for $L$, we can uniquely represent any $x \in L$ by $x = \sum_{n=1}^{N} a_n b_n$. The scalar coefficients of this representation are easily determined by taking inner products: since $b_1, \ldots, b_N$ are orthonormal, $\langle x, b_m\rangle = \sum_n a_n\langle b_n, b_m\rangle = a_m$. Finally, take $u = \sum_{n=1}^{I} a_n b_n$ and $v = \sum_{n=I+1}^{N} a_n b_n$; then $u \in M$, $v \in M^\perp$, and $x = u + v$.

The uniqueness of $u$ and $v$ follows from the uniqueness of the coefficients $a_n$ in the basis expansion.

Next we want to show that if $x = u + v$, where $u \in M$ and $v \in M^\perp$, the map $x \to u$ defines a linear operator; furthermore, this linear operator is symmetric and idempotent. Let the name of this linear operator be $P$. This means that if $x = u + v$, where $u \in M$ and $v \in M^\perp$, then $Px = u$.

To see that $P$ is linear, let $\alpha$ and $\beta$ be scalars, and let $y, z \in L$ have representations $y = u_y + v_y$ and $z = u_z + v_z$, where $u_y, u_z \in M$ and $v_y, v_z \in M^\perp$. Consider

$$\alpha y + \beta z = (\alpha u_y + \beta u_z) + (\alpha v_y + \beta v_z).$$

Since $\alpha u_y + \beta u_z \in M$ and $\alpha v_y + \beta v_z \in M^\perp$, and the representation of $\alpha y + \beta z$ in terms of its projections in $M$ and $M^\perp$ is unique, we have

$$P(\alpha y + \beta z) = \alpha u_y + \beta u_z = \alpha P y + \beta P z.$$

Hence $P$ is linear. The definition for $P$ to be idempotent is that $P(Py) = Py$ for all $y \in L$; since $Py = u_y \in M$, its decomposition is $u_y + 0$, so $P(Py) = u_y = Py$. The definition for $P$ to be symmetric is that $\langle Py, z\rangle = \langle y, Pz\rangle$ for all $y, z \in L$; this holds because

$$\langle Py, z\rangle = \langle u_y, u_z + v_z\rangle = \langle u_y, u_z\rangle = \langle u_y + v_y, u_z\rangle = \langle y, Pz\rangle.$$

It also follows that any linear, symmetric, idempotent operator $P$ defined on $L$ is an orthogonal projection onto the subspace $M$ defined as the range of $P$.

Orthogonal projection operators are intimately related to least-squares problems. Let $P$ be the projection operator onto the subspace $M$, and let $f \in L$. Write $f = u + v$, where $u = Pf \in M$ and $v \in M^\perp$. Then $u$ is the unique closest point in $M$ to $f$; that is, the approximation problem

$$\min_{g \in M}\|g - f\| \tag{B.12}$$

has the unique solution $g = u = Pf$, the minimum is $\|(I - P)f\|$, and the distance from $f$ to $M$ is $\|v\| = \|(I - P)f\|$.

When $L = E^N$, projection operators have an explicit representation. Let $b_1, \ldots, b_I$ be a basis for a subspace $M$ of $L$, and let $B = (b_1, \ldots, b_I)$. Then the projection operator $P$ onto $M$ with respect to the norm induced by the symmetric positive definite matrix $A$ is given by

$$P = B(B'AB)^{-1}B'A.$$

This is easily seen by direct calculation. $P$ is idempotent:

$$PP = [B(B'AB)^{-1}B'A][B(B'AB)^{-1}B'A] = B(B'AB)^{-1}(B'AB)(B'AB)^{-1}B'A = B(B'AB)^{-1}B'A = P.$$

Also, for $L = E^N$ and $\langle x, y\rangle = x'Ay$, $P$ is symmetric:

$$\langle Px, y\rangle = (Px)'Ay = x'[B(B'AB)^{-1}B'A]'Ay = x'AB(B'AB)^{-1}B'Ay = \langle x, Py\rangle, \qquad x, y \in L. \tag{B.14}$$

In other words, a matrix $P$ is an orthogonal projection operator with respect to the norm induced by the symmetric positive definite matrix $A$ of the inner product if and only if $AP = (AP)'$ (symmetric) and $PP = P$ (idempotent).

The explicit representation is convenient for theoretical purposes, but serious roundoff error due to possible ill conditioning of $B'AB$ sometimes makes it computationally impractical. However, if the columns of $B$ are orthonormal with respect to the norm induced by the positive definite symmetric matrix $A$, then $B'AB = I$ is perfectly conditioned and there are no numerical difficulties. In this case the orthogonal projection operator is simply given by $P = BB'A$, and the orthogonal projection of $x$ onto $M$ has a particularly simple representation in terms of the orthonormal basis $b_1, \ldots, b_I$:

$$Px = BB'Ax = \sum_{i=1}^{I}\langle x, b_i\rangle\, b_i.$$
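A short numerical check of these facts (my construction, not the book's): build $P = B(B'AB)^{-1}B'A$ for a random basis and verify idempotency, $A$-symmetry, and the orthogonality of the least-squares residual to $M$.

```python
import numpy as np

rng = np.random.default_rng(0)
N, I = 6, 2
B = rng.standard_normal((N, I))               # columns: a basis b_1..b_I of M
Q = rng.standard_normal((N, N))
A = Q @ Q.T + N * np.eye(N)                   # symmetric positive definite

P = B @ np.linalg.inv(B.T @ A @ B) @ B.T @ A  # P = B (B'AB)^{-1} B'A

f = rng.standard_normal(N)
assert np.allclose(P @ P, P)                  # idempotent: PP = P
assert np.allclose(A @ P, (A @ P).T)          # A-symmetric: AP = (AP)'
assert np.allclose(B.T @ A @ (f - P @ f), 0)  # residual is A-orthogonal to M
```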
