

**Automatic description of complex buildings from multiple images**

ZuWhan Kim*, Ramakant Nevatia

Institute for Robotics and Intelligent Systems, University of Southern California, USA

Received 22 April 2003; accepted 19 May 2004. Available online 10 August 2004.

Abstract

We present an approach for detecting and describing complex buildings with flat or complex rooftops by using multiple, overlapping images of the scene. We find 3-D rooftop boundary hypotheses from the line and junction features of the images by applying consecutive grouping procedures. First, 3-D features are generated by grouping image features over multiple images, and rooftop hypotheses are generated by neighborhood searches on those features. Probabilistic reasoning, level-of-detail processing, and cues from image-derived unedited elevation data are used at various stages to manage the huge search space for rooftop boundary hypotheses. Three-dimensional rooftop hypotheses generated by the above procedures are verified with evidence collected from the images and the elevation data. Expandable Bayesian networks are used to combine evidence from multiple images. Finally, overlap and rooftop analyses are performed to find the final building models. Experimental results are shown on complex buildings.
© 2004 Elsevier Inc. All rights reserved.

Keywords: Three-dimensional object description; Building detection and description; Aerial image analysis; Multi-view; Feature grouping

* Correspondence to: University of California, Berkeley, USA. E-mail address: zuwhan@cs.berkeley.edu (Z. Kim).

1077-3142/$ - see front matter © 2004 Elsevier Inc. All rights reserved. doi:10.1016/j.cviu.2004.05.004

Z. Kim, R. Nevatia / Computer Vision and Image Understanding 96 (2004) 60–95


1. Introduction

Three-dimensional object description is a key task of computer vision. One practical application of the 3-D object description problem is building detection and description from aerial images. It can greatly improve the automation of 2-D or 3-D map generation, which can be used in various applications including radiowave reachability tests for wireless communications, computer graphics, virtual reality, and mission planning. Building detection and description has been an active research area [6]. Early systems used a single intensity image and were effective for simple buildings [9,11,14,15]. In general, multiple aerial images can be obtained at a small extra cost. Thus, most of the recent work in building detection has focused on stereo or multi-view analysis [1–3,7,17,18].

There are several challenges in the building detection and description problem.

Figure–ground separation: we deal with outdoor images, and it is hard to separate building boundaries from other distracting lines such as road boundaries. Moreover, lines and corners of buildings are often broken or missing due to occlusion or other accidental alignments.

Representation: as in other 3-D object description problems, the model representation plays an important role in building detection and description. When we use a simpler representation, such as extrusions of rectangular rooftops [14,17], the description result will be more robust, but the detection rate will be lower¹ because of its limited representational power. On the contrary, when we use a model of extremely high representational power, such as refined polygonal meshes [2,1], we can describe many more buildings, but the result will be less robust and the level of geometric information we obtain will be so poor (we just get polygonal meshes) that the usability of the result will be very limited.
It is very important to find a good representation which has high enough representational power and rich geometric information and for which, at the same time, a robust and computationally affordable detection and description algorithm is available.

Information fusion: the types of available information vary depending on the application. In most cases, stereo or multiple images are available. Range data can be generated from stereo analysis, but its quality is not good enough to generate building hypotheses directly because many building roofs lack sufficient texture for stereo processing. In addition, nearby trees of similar height make the use of such range data difficult. Sometimes accurate range data, such as LIDAR, are available (at high cost). There have been efforts to maximize the use of such high-quality data [1,4] or to increase the quality of the image-based range data

¹ In fact, the detection rates reported in such work are very high because most of the experiments were done with a limited number of building examples, most of which are rectangular.


with more than 10 images [20]. In this paper, we focus on the use of image-derived unedited range data. In this case, how to combine information from the images and the low-quality range data is very important.

We present an approach for detecting and describing complex buildings by using multiple, overlapping images of the scene. We use low-quality range data, such as from fully automated stereo processing, as additional information. We apply a hypothesize-and-verify paradigm, where lower-level features are grouped (hypothesized) into higher-level ones and then filtered (verified) to keep the computation manageable (otherwise, the computation would be exponential). We present a unique feature grouping approach for lines and junctions where we keep the low-level properties (and uncertainties) up to the highest level. For example, a 3-D line feature is not just two end-points synthesized from 2-D line features; it also includes the actual set of 2-D line features ("member" features), and we intensively use the properties of the member features in the higher-level grouping. To reduce the computation, we apply a level-of-detail technique with probabilistic relaxation and introduce various techniques for efficient filtering, such as information fusion (with range data), probabilistic height estimation, and the use of expandable Bayesian networks [12]. Our approach shows good description results on complex buildings. Our system detects and describes buildings with polygonal boundaries and complex roofs (including superstructures). Such a level of complexity is unprecedented in previous work (among those which do not use high-quality range data).

2. Representation and approach

We apply a model-based approach to obtain high-level geometric information and robust detection results. Usually, in model-based approaches, when the complexity of a building model (for example, the number of rooftop corners allowed) increases, the search space for the grouping increases exponentially. Hence, many practical building description systems have used simple models such as collections of rectangular rooftops [17] or simple blocks with gable roofs [16]. While collections of rectangular rooftops can represent many rectilinear buildings (Fig. 1A), this representation limits the detection of even simple rectilinear buildings. An example is shown in Fig. 1B. Components a and b of the building are likely to have low evidence support in the image because major parts of their rooftop boundaries do not exist as intensity boundaries in the image. Component c is unlikely to be detected because it has even lower evidence support; half of its rooftop boundary is missing. On the other hand, modeling general rectilinear rooftops imposes large computational demands and/or results in poor detection rates [3]. In [4], unrestricted polygonal rooftops are described by using high-quality range data. However, the results strongly depend on the quality of the range data. Thus, obtaining high-resolution range data of good quality becomes another issue. Baillard and Zisserman [2] use six or more images to find 3-D matched lines, and find the orientations of half-planes

for the 3-D lines by using intensity matching. Good results are shown on a small number of examples. Refined polygonal meshes are obtained by this method; however, it does not give explicit building models, but just a collection of planar surfaces which are likely to consist of roofs. High-level geometric representations for a limited set of complex gables are given in [5].

In this paper, we introduce a system called the Automatic Building Extraction and Reconstruction System (ABERS), where complex buildings are described from multiple images and range data (DEM, digital elevation models). ABERS can detect and describe buildings of polygonal boundaries with flat or complex roofs. A building can have a flat or sloping roof or a flat polygonal superstructure. Corner angles of a building between 90° ± 30° are allowed. This angle tolerance is adjustable (see 2-D junction extraction, Section 3.2), but applying a larger tolerance will require more computation.

A building with a flat polygonal rooftop is simply represented as an extrusion of the rooftop points. The representation of a sloping roof is illustrated in Fig. 2. A list of outer rooftop boundary points (eaves corner points) (a b c d e f), a list of ridge corner points (g i j k h), and their hip relationships (shown in the figure) are stored. Ridges are defined (and restricted) to be interior lines parallel to the outer boundary lines. The current representation and implementation does not support hip-roofs (gables).

Fig. 1. (A) Rectilinear buildings that can be described as collections of rectangular buildings; (B) a building where a collection of rectangles shows a limitation.

Fig. 2. Representation of a building with a complex rooftop.

Constructive Solid Geometry (CSG) representation can provide a more general description: CSG represents a 3-D object by combining primitives with boolean operators (such as union and intersection). Many interactive modeling systems use CSG due to its intuitiveness. However, CSG is too unrestricted and redundant for automated modeling. Our proposed representation has a high level of geometric description, and one of its strengths is that the resulting description can easily be converted into other representations, including CSG. For example, we can easily accommodate a gable detection and description algorithm, such as in [16], into the proposed feature grouping framework.

It is difficult to describe such buildings of high-level complexity with a high-level geometric description from real images when significant accidental alignments are present. To address this difficulty we use multiple images (more than 3) and a DEM. Fig. 3 shows four different views of a building, and Fig. 4 shows an example DEM derived from these images automatically (without any manual editing) with a commercial stereo system. Usually, the DEM data are generated by stereo matching of the images. Thus, we assume that the DEM is not accurate enough to retrieve a building model directly from it, as we see in Fig. 4B. The DEM is used as auxiliary information to provide cues that help reduce the search spaces and validate feature matches. In contrast with [1,4], the proposed method can also operate without DEM information (but it requires much more time and space, and the false alarm rate may increase).

In most of the previous building description systems, building hypotheses generation is based on a single image or an image pair. Our system uses multiple images and a rough DEM. Fig. 5 shows a flow diagram of the proposed method.

Fig. 3. Example aerial images of a complex building (0.25 m/pixel resolution).

The first step is to extract line features from the images. The line features are filtered by using rough building cues generated from the DEM. Next, the lines are grouped into junctions and parallel relationships.

The next steps are those of rooftop hypotheses generation, verification, and overlap/rooftop analysis. The system of [17] generates building hypotheses from all the possible pairs of images, which results in redundant computation when multiple images are used. ABERS instead uses 3-D line and junction features for hypotheses generation to maximize the use of multiple images and to reduce the redundancy in computation. To generate 3-D features, pair-wise feature matching across all views is performed first, followed by grouping of the matched pairs. All 2-D features may not be present in all views; therefore, we derive 3-D features from groups of matched image features ("2-D features") over multiple views to generate rooftop hypotheses. DEM information is used to reduce the ambiguity of the matching process. For flat buildings, rough cues (layers) for building components generated from the DEM may be used to reduce the search space for the hypotheses generation.

Fig. 4. (A) An image-derived unedited DEM (0.5 m/pixel resolution) and (B) a reference building model superimposed on it.

Fig. 5. A flow diagram of ABERS.

In the hypotheses generation step, rooftop boundary hypotheses are generated by neighborhood searches on 3-D features. A level-of-detail technique is applied to reduce the search space. First, coarse-level 3-D features are generated by grouping nearby 3-D features, and a neighborhood search is performed on them to generate coarse-level hypotheses. Then, the coarse hypotheses are refined by a relaxation technique. Once hypotheses are generated, evidence support for the hypotheses is gathered and used to verify them. Expandable Bayesian networks [12] are used to combine the diverse evidence. Finally, overlap analysis and complex rooftop analysis are performed on the verified hypotheses to give the final building descriptions.

For buildings with flat rooftops, layers, defined as flat (parallel to the ground) planar connected surface patches, are also generated from a DEM image to reduce the search space in hypotheses generation. Thus, hypotheses generation, verification, and overlap analysis are repeatedly performed for each layer. Note that these layers are only used when we want to detect flat roof buildings.

In Section 3, the processing of DEM data and other initialization procedures, such as determining the image combinations to use and the ground height, are described. Section 3 also describes 2-D feature extraction and filtering, focusing on the use of DEM information. Section 4 deals with the use of multiple images to generate 3-D features. We suggest a probabilistic approach to multi-view height estimation of lines, and we also describe the use of DEM information to reduce the matching ambiguity. In Section 5, we describe the rooftop hypotheses generation and the application of a level-of-detail technique to it. Hypotheses verification using an EBN [12] is described in Section 6. Section 7 deals with overlap analysis, and Section 8 describes the generation of sloping roofs. The time complexity of ABERS is analyzed in Section 9, and the experimental results on complex buildings are shown in Section 10.

3. Preprocessing and 2-D feature extraction

3.1. Preprocessing

We use an image-derived unedited DEM (of about 1/2 of the image resolution) generated by the commercial "SocetSet" product from BAE Systems. To generate rough cues from a DEM image for further processing, we follow the approach of Huertas et al. [10]. The DEM image is first convolved with a Laplacian-of-Gaussian filter to smooth the image and locate the object boundaries. Then the positive-valued regions bounded by the zero-crossings in the convolution output are extracted. Fig. 6 shows cues generated from the DEM image of Fig. 4A. Although DEM data do not give an explicit building model, as shown in Fig. 4A, they can still give a rough idea of where the buildings are located. The detailed usage of these cues is described in Section 3.2.

To get layers, adaptive smoothing [19] is applied to the image, followed by histogram-based segmentation. For each layer, the mean and standard deviation of height, the area, and the boundary points are calculated and used for later processing (Section 5). Fig. 7 is an image of an example building complex. The respective DEM image is shown in Fig. 8A, and the resulting layers are shown in Fig. 8B.
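The cue-generation step described above (Laplacian-of-Gaussian filtering followed by extracting the positive regions bounded by zero-crossings) can be sketched as follows. This is an illustrative NumPy implementation, not the authors' code: the filter scale `sigma`, the FFT-based realization of the LoG, and the toy DEM are all assumptions.

```python
import numpy as np

def dem_building_cues(dem, sigma=4.0):
    """Rough building cues from a DEM, following the procedure in the
    text: convolve with a Laplacian-of-Gaussian filter, then keep the
    positive-valued regions bounded by the zero-crossings.  Here the LoG
    is realized as FFT-based Gaussian smoothing followed by a discrete
    Laplacian; sigma is an assumed parameter."""
    h, w = dem.shape
    y = np.arange(h) - h // 2
    x = np.arange(w) - w // 2
    g = np.exp(-(y[:, None] ** 2 + x[None, :] ** 2) / (2 * sigma ** 2))
    g /= g.sum()
    smooth = np.real(np.fft.ifft2(np.fft.fft2(dem) *
                                  np.fft.fft2(np.fft.ifftshift(g))))
    lap = (np.roll(smooth, 1, 0) + np.roll(smooth, -1, 0) +
           np.roll(smooth, 1, 1) + np.roll(smooth, -1, 1) - 4.0 * smooth)
    # Elevated blobs give a negative LoG response, so the building cue
    # mask is where the negated response is positive.
    return -lap > 0.0

# Toy DEM: flat ground with one 10 m high rooftop patch.
dem = np.zeros((64, 64))
dem[20:40, 15:45] = 10.0
cues = dem_building_cues(dem)
```

The connected components of this mask would then form the individual cue regions (one per candidate building component).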

We choose a good combination of five or six images from up to 12 available images to reduce computation. ABERS automatically selects an image window combination where the epipolar lines of the image pairs are less aligned with each other. First, the epipolar lines at the centers of the images are calculated for all image pairs. Then, epipolar diversities of the images are calculated for all the possible image combinations. The epipolar diversity for the ith image mi, given an image combination m = {m1, ..., mn} of n images, is determined from the angle differences of its n − 1 epipolar lines, ai. The angle differences between the epipolar lines are obtained by sorting the epipolar lines by their angles and finding the angle differences between the nearby ones. When the n − 1 angle differences are obtained, the epipolar diversity of an image is determined by the inverse of the standard deviation among the angle differences. Finally, an image combination with the biggest average epipolar diversity is selected. Fig. 9 shows the epipolar lines and their angle differences for the image combinations of Fig. 3 (only the central parts of the images are shown).

The height of the ground is needed to extrude rooftops for building extraction. In ABERS, the ground height is automatically set during the DEM layer generation: the lowest layer with a large area is considered to be the ground, and its height is set to be the ground height.

3.2. Two-dimensional feature extraction and filtering

We follow the approach of [17] for 2-D feature extraction. Line segments are extracted from the images and grouped into "linear" features by merging lines with small gaps. The linear features extracted from Fig. 3A are shown in Fig. 10. We see large numbers of distracting linears from trees, roads, and other structures nearby. ABERS uses DEM cues (Fig. 6) to eliminate many of these distracting linears. First, the linears far from the cues are eliminated.

Fig. 6. Rough building cues generated from Fig. 4A.

Fig. 7. An example building complex.

Fig. 8. (A) Correlation DEM of Fig. 7 and (B) layers generated by segmentation.

Fig. 9. Angles between epipolar lines.

Fig. 10. "Linear" features extracted from Fig. 3A.
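The image-combination selection described above can be sketched as follows. This is an illustrative implementation, not the authors' code: the `pair_angles` data layout and the synthetic angle values are assumptions, and degenerate cases (fewer than two angle differences, or perfectly uniform spacing) are handled with a large finite diversity cap.

```python
import itertools
import statistics

def epipolar_diversity(angles_deg):
    """Diversity of one image given the epipolar-line angles (degrees)
    of its pairings: sort the angles, take differences between
    neighbors, and return the inverse of their standard deviation
    (evenly spread angles -> small std -> large diversity)."""
    s = sorted(angles_deg)
    diffs = [b - a for a, b in zip(s, s[1:])]
    if len(diffs) < 2:
        return 0.0
    sd = statistics.pstdev(diffs)
    return 1.0 / sd if sd > 1e-9 else 1e9   # cap perfectly even spreads

def best_combination(pair_angles, n_images, k):
    """Choose the k-image combination with the biggest average
    diversity; pair_angles[(i, j)] is the epipolar-line angle (deg) at
    the image centers for the pair i < j (hypothetical data layout)."""
    best, best_score = None, -1.0
    for combo in itertools.combinations(range(n_images), k):
        score = statistics.mean(
            epipolar_diversity(
                [pair_angles[tuple(sorted((i, j)))] for j in combo if j != i])
            for i in combo)
        if score > best_score:
            best, best_score = combo, score
    return best

# Synthetic epipolar-line angles for 5 images (degrees, illustrative).
pair_angles = {(0, 1): 10, (0, 2): 50, (0, 3): 90, (0, 4): 130,
               (1, 2): 40, (1, 3): 80, (1, 4): 120,
               (2, 3): 40, (2, 4): 80, (3, 4): 40}
combo = best_combination(pair_angles, 5, 4)
```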

Note that the boundary lines of (polygonal) building rooftops are mostly of two or three dominant angles. A length-weighted histogram of line angles is generated from the linears of each cue, and only the linears near the dominant angles are used for the rooftop boundary hypotheses generation. The dominant angles are determined by choosing the two highest peaks in the histogram; to prevent two nearby peaks from being chosen, peaks near the perpendicular angles of the two highest peaks are also chosen. The resulting filtered linears are shown in Fig. 11; only 128 of the 1171 linears remained.

Building corners make a very important cue, and ABERS extracts junction features to detect them. L-junctions are extracted by grouping nearby linears which make angles of 90° ± 30°.² T-junctions are also extracted by grouping two linears forming a T-shape; a T-junction is considered to be two L-junctions for further processing. The linears which form a junction are called branches of that junction. Fig. 12 shows the 85 junctions grouped from the filtered linears.

Fig. 11. Linears filtered by location and angles.

Fig. 12. Junctions grouped from the filtered linears.

² This angle restriction is adjustable. See Section 2 for more explanation.
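The length-weighted dominant-angle filtering can be sketched as follows. This is an illustrative version, not the authors' code: the bin size, the angle tolerance, and the `(length, angle)` linear representation are assumptions.

```python
import numpy as np

def filter_by_dominant_angles(linears, n_bins=36, tol_deg=10.0):
    """Keep only linears near the dominant rooftop angles.  Each linear
    is (length, angle_deg); a length-weighted histogram over [0, 180)
    is built, its two highest peaks are taken as dominant angles (plus
    their perpendiculars, as in the text), and linears far from all of
    them are dropped.  Bin size and tolerance are assumed parameters."""
    lengths = np.array([l for l, _ in linears], dtype=float)
    angles = np.array([a % 180.0 for _, a in linears])
    hist, edges = np.histogram(angles, bins=n_bins, range=(0.0, 180.0),
                               weights=lengths)
    top2 = np.argsort(hist)[-2:]                      # two highest peaks
    centers = (edges[top2] + edges[top2 + 1]) / 2.0
    dominant = set()
    for c in centers:
        dominant.add(c % 180.0)
        dominant.add((c + 90.0) % 180.0)              # perpendicular peaks

    def near(a):
        return any(min(abs(a - d), 180.0 - abs(a - d)) <= tol_deg
                   for d in dominant)

    return [lin for lin in linears if near(lin[1] % 180.0)]

# Toy input: long linears near 0 deg and 90 deg, one short one at 45 deg.
linears = [(10, 0), (12, 2), (8, 91), (9, 88), (1, 45)]
kept = filter_by_dominant_angles(linears)
```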

4. Three-dimensional feature grouping

In ABERS, two types of 3-D features are used: 3-D linears and 3-D junctions. A 3-D feature is a group of 2-D features from different views. To obtain 3-D features, we find pair-wise matches of 2-D features using epipolar geometry and group the matched features among the different views. ABERS allows multiple 3-D features to be associated with a single 2-D feature: restricting a single 3-D feature to be associated with a single 2-D feature can prevent the generation of important features, but allowing multiple associations can make the number of features grow large. ABERS keeps the computation manageable by applying a level-of-detail technique (see Section 5). In addition, ABERS has a unique grouping strategy where the properties of low-level features are utilized in the grouping procedures of higher-level features; the properties of 2-D features are extensively used in the grouping of 3-D features.

4.1. Pair-wise epipolar matching

Pair-wise matches of 2-D features (linears and junctions) are found by using epipolar geometry derived from the known camera parameters. We follow the approach of Noronha and Nevatia [17] for the epipolar matching. Possible line matches in other images are searched for by applying simple epipolar geometry. Once a candidate is found, the height of the matched pair is determined by assuming that the actual 3-D line for the pair is parallel to the ground (note that we only consider flat rooftop boundary hypotheses in ABERS). ABERS estimates the height of a matched pair by matching the mid-point, p, of one line segment and the intersection point, q, of the other line segment and p's epipolar line. A match for a junction must be near the epipolar lines (within a certain error tolerance, which is set to the epipolar error tolerance), and the corresponding branches must be matched.

The error tolerance for the epipolar matching is automatically determined in the preprocessing stage with epipolar geometry. For example, the junction extraction can cause up to a 2.5 pixel displacement error in one view (see Appendix for estimating the displacement error). By applying epipolar geometry to the image combination of Fig. 3, we calculate that the 2.5 pixel error in one view can possibly cause up to a 3.2 pixel error when it is projected to another view.

4.2. Combining height estimates of pair-wise matches

When the linears of a match are nearly aligned to the epipolar lines, the estimated height may contain a large error, while the error will be smaller when they are perpendicular to the epipolar lines. In many cases, buildings are aligned with epipolar lines because aerial photos are usually taken from a flight along a road. In fact, such "accidental epipolar alignments" of important features are not rare in aerial image analysis. Although matching junctions does not suffer from the epipolar alignment problem, we also face similar problems with junction matches.

ABERS estimates the height of a 3-D linear from the pair-wise height estimates. The height estimation error will depend on the angles of the linears and the camera parameters, and how to combine these pair-wise estimates of various degrees of error is not clear. In the following subsection, we introduce a height estimation approach based on maximum likelihood estimation. A compatibility test of two height estimates is also introduced. Another height estimation approach addressing epipolar alignment is found in [8]. It uses projective geometry and iterative optimization to estimate the height of general 3-D features.

4.2.1. Height estimation

Suppose we estimate the height X of an object based on m measurements X1, X2, ..., Xm. We assume that the prior probability distribution of X, P(X), is a uniform distribution within a possible height range, and that P(Xi | X) follows a Gaussian distribution with mean xi and variance σi². When we assume that the measurements are independent of each other, we can apply maximum likelihood estimation:

P(X | X1, X2, ..., Xm) = a ∏i P(Xi | X),   (1)

where a is a normalizing constant. Then, the resulting height estimate is X̂ ~ G(μ, σ²), where

μ = (Σi xi/σi²) / (Σi 1/σi²),   σ² = 1 / (Σi 1/σi²).   (2)

In ABERS, the height estimates are represented as Gaussian random variables. Given a line/junction pair, the height (the mean of the Gaussian random variable) is estimated from stereo analysis, and a confidence interval is given based on possible displacement errors (1.1 pixels for linears and 1.3 pixels for junctions; see Appendix for details) in each image.³

When a 3-D feature is generated from a set of n 2-D features, we can get at most n(n − 1)/2 pair-wise matches. Among them, only n − 1 measurements are not directly dependent on each other (for example, the height estimate from the ith and jth features is directly dependent on the height estimates from the ith and kth features and from the jth and kth features). In ABERS, the longest line segment is picked and the measurements are estimated from its n − 1 line segment pairs. Strictly, even these n − 1 measurements are not independent, because the original n 2-D features are not randomly chosen but obtained through a systematic grouping procedure. However, we empirically verified that the independence assumption among the measurements did not cause a problem.

³ Strictly, the accuracy of height estimation depends on the resolutions of the images and the baseline length (the physical distance between the cameras from which the images were taken).
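Eq. (2) is the standard inverse-variance weighted fusion of Gaussian estimates; a minimal sketch, with each pair-wise estimate given as a (mean, standard deviation) pair:

```python
def fuse_height_estimates(estimates):
    """Maximum-likelihood fusion of pair-wise height estimates as in
    Eq. (2): with estimates (x_i, sigma_i), the fused mean is the
    inverse-variance weighted average and the fused variance is
    1 / sum(1 / sigma_i^2)."""
    w = [1.0 / (s * s) for _, s in estimates]
    mu = sum(x * wi for (x, _), wi in zip(estimates, w)) / sum(w)
    var = 1.0 / sum(w)
    return mu, var

# Two pair-wise estimates: 10 m (std 1 m) and 12 m (std 2 m).
mu, var = fuse_height_estimates([(10.0, 1.0), (12.0, 2.0)])
# mu = (10/1 + 12/4) / (1 + 1/4) = 10.4, var = 1 / 1.25 = 0.8
```

Note that the fused estimate is pulled toward the more certain measurement, and its variance is smaller than either input variance.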

4.2.2. Height compatibility test

Two height measurements, X̂1 and X̂2, are considered to come from the same height distribution X when the difference, ΔX̂ = X̂1 − X̂2, is close to 0. If X̂1 and X̂2 are normal distributions with means μ1 and μ2 and variances σ1² and σ2², respectively, and are independent of each other, then the difference ΔX̂ has a normal distribution with mean μ = μ1 − μ2 and variance σ² = σ1² + σ2². The height estimates X̂1 and X̂2 are regarded to be compatible if

μ − σ ≤ 0 ≤ μ + σ.   (3)

4.3. Feature grouping

A 3-D linear is a group of 2-D linears from different views that match each other. These 2-D linears are called member linears of the 3-D linear. Three-dimensional properties, such as a height estimate and a polarity, as well as 2-D properties, such as the end-points of a 2-D linear and linear–junction relationships, are assigned to a 3-D feature and used in the grouping procedures. ABERS intensively uses the properties of the member features in higher-level feature grouping procedures.

Grouping junctions. A 3-D junction is a group of matched junctions (member junctions). For a building corner, 2-D junctions may not be extracted in all of the views because of occlusion, accidental alignments, and accidental illumination. We use 3-D junctions which have "members" in at least two-thirds of the images.

To get a 3-D junction, a matched junction pair with a small height-error bound is used as a seed. Given a seed, possible matches from other images are collected recursively: junctions are collected from other views which match with at least one of the junctions in the seed and have compatible heights (Eq. (3)). For example, given a seed [J1, J2] of height h12, when the height, h14, of a junction pair J1 and J4 is compatible with h12, J4 is collected. Then a new group [J1, J2, J4] of height h124 is generated, and junctions from the rest of the images are collected recursively given the group and its height. A 3-D junction is made when all the possible matches are collected. An example is shown in Fig. 13.

More than one 3-D junction can be generated given a seed because there can be more than one 2-D junction of a compatible height in one image. For example, two 3-D junctions, such as [J1, J2, J4, J6] and [J1, J2, J5, J6], can be generated from [J1, J2]. A junction, such as J1, may be part of another seed, say [J1, J3], which may generate another 3-D junction group. However, if a seed, such as [J1, J4], is a subset of an existing 3-D junction, such as [J1, J2, J4, J6], the search for a new junction group is not initiated.

Fig. 13. Grouping junctions.
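The compatibility test of Eq. (3) is straightforward to implement; a minimal sketch, with each estimate given as a (mean, standard deviation) pair:

```python
import math

def heights_compatible(est1, est2):
    """Eq. (3) compatibility test: two Gaussian height estimates
    (mu_i, sigma_i) are considered to come from the same height when
    0 lies within one standard deviation of their difference, i.e.
    mu - sigma <= 0 <= mu + sigma with mu = mu1 - mu2 and
    sigma = sqrt(sigma1^2 + sigma2^2)."""
    mu = est1[0] - est2[0]
    sigma = math.sqrt(est1[1] ** 2 + est2[1] ** 2)
    return mu - sigma <= 0.0 <= mu + sigma
```

For example, estimates of 10 m and 10.5 m with 1 m uncertainty each pass the test, while confident estimates of 10 m and 15 m do not.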

Once a 3-D junction is generated, its height is estimated from the pair-wise height estimates by Eq. (2).

Grouping linears. A similar method is applied to get 3-D linears. To generate 3-D linears, the possible presence of collinear lines is also considered; thus, a 3-D linear can have more than one member linear for a view. For example, in Fig. 14, linear L1 matches with both L2 and L3 of the second view, as well as with L4 and L5 of the third view, with compatible heights. Since L2 and L3 are collinear, we group them into a single 3-D linear, but L4 and L5 are not. Thus, we generate two 3-D linears, [(L1), (L2, L3), (L4)] and [(L1), (L2, L3), (L5)]. Once a 3-D linear is generated, its height is estimated from the pair-wise height estimates by Eq. (2), and the 3-D positions are determined by using epipolar geometry. A parallel relationship is assigned between two 3-D linears when any of their member linears have a parallel relationship (Section 3.2) and the heights are compatible (Section 4.2.2).

We define branch relationships between 3-D junctions and 3-D linears. A 3-D linear is a branch of a 3-D junction when all the branches (of a certain side) of the 2-D member junctions are members of this 3-D linear. For example, in Fig. 15, consider a 3-D junction J, [J1, J2, J3]. Let A1 and C1 be the branches of J1, A2 and C2 be the branches of J2, and A3 and C3 be the branches of J3. Then, the 3-D linears [(A1), (A2), (A3)] and [(C1), (C2), (C3)] will be set to be branches of the 3-D junction J.

The separate grouping processes of 3-D junctions and 3-D linears may cause some difficulties in determining the branch relationships. For example, in Fig. 15, let B1, B2, and B3 be nearby 2-D linears of A1, A2, and A3, of different heights. A small line position error can result in the seed [(A1), (A2)] being grouped with B3, but not A3.

Fig. 14. Grouping linears.

Fig. 15. Ambiguities in determining the branch relationships of 3-D features.

(B3)] may be formed but [(A1). 3 are shown in Fig. 4) are used to eliminate some of the false matches as illustrated in Fig. In such a case. 16. Similarly. The arrow shows the actual height of the building. It is necessary to increase the detection rate. another set of 3D linears are generated from the 2-D linears which do not belong to any 3-D linears. which may not have a compatible height to that of the 3-D junction J. We assume that 3-D linears have members in at least two-thirds of the images (for example. To reduce these problems. Kim. (A2).4. For example. However. (For interpretation of the references to colours in this ﬁgure legend. 17. 3. this does not completely solve the problem. (A3)] or [(B1). (A2). (B3)] may not be compatible with the height of the 3-D junction [J1. (B2). (B3)].) . the heights of the 3-D linears [(A1). Nevatia / Computer Vision and Image Understanding 96 (2004) 60–95 [(A1). are ﬁrst generated by using the branches of 2-D member junctions (given a 3-D junction) as seeds. DEM data (Fig. Note that an important line may be missing in one or two view due to accidental illumination condition. The 291 3-D linears generated from ﬁve images of the building in Fig. (B2). J2. Then. (A3)] may not be. we may deﬁne a 3-D linear as a branch of a 3-D junction when the branches of the 2-D member junctions intersect with the members of the 3-D linear. For example. but results in a large number of ambiguities due to distracting parallel lines and the epipolar alignment problem. We may deﬁne branch relationships diﬀerently to handle this problem. A 3-D linear is veriﬁed when its height is compatible with that of a Fig. R.74 Z. A 3-D linear is projected onto the DEM image and statistics of the DEM values in regions (sampling windows) adjacent to and on both sides of the projected line are computed. Moreover. and the heights are compatible. J3]. Purple colored ones are higher than greens. (A3)] and [(B1). 
this will increase the numbers of branches for 3-D junctions and make hypotheses search (Section 5) computationally more expensive. the reader is referred to the web version of this paper. 4. Large numbers of false matches are found due to distracting parallel lines. 16. 3-D linears [(A1). ABERS ﬁrst generates 3-D linears which are the possible branches of 3-D junctions. Three-dimensional feature selection with DEM The 291 3-D linears generated from ﬁve images of a building in Fig. 4 when 5 or 6 images are used). the 3-D junction J may not have branches although its members do. (A2). (A2).
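The DEM-based verification above can be sketched as follows. This is an illustrative Python sketch, not the ABERS implementation; the helper names, the sampling-window size, and the height tolerance are assumed values.

```python
# Sketch of the DEM-based 3-D linear verification (Section 3.4 / Fig. 17).
# The window size, side offset, and tolerance are assumed values.

def window_height(dem, cx, cy, half=1):
    """Mean DEM value in a small square sampling window centered at (cx, cy)."""
    vals = []
    for y in range(int(cy) - half, int(cy) + half + 1):
        for x in range(int(cx) - half, int(cx) + half + 1):
            if 0 <= y < len(dem) and 0 <= x < len(dem[0]):
                vals.append(dem[y][x])
    return sum(vals) / len(vals)

def verify_linear(dem, p0, p1, height, offset=2.0, tol=0.5):
    """A 3-D linear, projected onto the DEM as the segment p0-p1, is verified
    when its height lies between the heights of the sampling windows on the
    two sides of the projected line (cf. Fig. 17)."""
    (x0, y0), (x1, y1) = p0, p1
    dx, dy = x1 - x0, y1 - y0
    length = (dx * dx + dy * dy) ** 0.5
    nx, ny = -dy / length, dx / length          # unit normal of the line
    mx, my = (x0 + x1) / 2.0, (y0 + y1) / 2.0   # mid-point of the segment
    left = window_height(dem, mx + offset * nx, my + offset * ny)
    right = window_height(dem, mx - offset * nx, my - offset * ny)
    lo, hi = min(left, right), max(left, right)
    return lo - tol <= height <= hi + tol
```

A rooftop boundary line separates the roof (high DEM values) from the ground (low values), so its height falls between the two side windows, while a false match floating above the building does not.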

Fig. 18 shows the verified 3-D linears (151 remain); many false matches were eliminated. Note that wall base lines remain as well as rooftop boundary lines, since both come from real 3-D lines. Also note that the accuracy of the DEM is poor near the building sides, and that the insides of sloping roofs are higher than their boundary lines (eaves). Three-dimensional junctions are verified when both of their branches are verified; from the 74 3-D junctions, 55 were verified. Fig. 19 shows the 2-D features which are the members of the verified 3-D features.

Finally, a polarity is assigned to each 3-D feature. The polarity of a 3-D linear is defined as positive when its height is the same as that of its left-hand side, and the polarity of a 3-D junction is positive when its height is the same as that of its inner side. By knowing which side of a 3-D feature a building lies on, we can reduce the search complexity of the hypotheses generation significantly. When a DEM is not available, two 3-D features with the same 2-D member features but opposite polarities are generated and used for the higher-level grouping. Once a polarity is assigned to all the 3-D features, the forwardness and the backwardness of the branch relationships are determined among the 3-D features: the forward (backward) branches of a positive (negative) 3-D junction are the branches on the counter-clockwise side of the junction point, and the others are the backward (forward) branches.

Fig. 18. 3-D linears verified with DEM. Many false matches were eliminated. The arrow shows the actual height of the building.

Fig. 19. Two-dimensional features which belong to the verified 3-D features. Polarities are shown with arrows.

5. Rooftop boundary hypotheses generation

ABERS operates in two modes: one detects flat rooftops only (and takes less time); the other also handles sloping roofs. We cannot use DEM layers for sloping roofs, since the layers are defined to be flat (parallel to the ground) patches. However, we assume that the outer boundaries (eaves) of a sloping roof are parallel to the ground (Section 2); therefore, to generate the rooftop boundary hypothesis of a sloping roof, we apply the same approach as for flat rooftops, and then apply the sloping roof analysis (Section 8). When we use DEM layers (Section 3), hypotheses generation, verification (Section 6), and overlap analysis (Section 7) are applied repeatedly for each DEM layer; when a DEM is not available, they are performed only once.

Rooftop boundary hypotheses are generated by a neighborhood search on the 3-D features. The search is performed on a graph consisting of the branch relationships to find closed loops or chains (open paths). Even after the DEM verification, a number of nearby 3-D linears may be present, as in the example of Fig. 18, and the search complexity can be exponential in the number of such linears. We apply a level-of-detail technique to reduce the search complexity: first, coarse-level 3-D features are generated by grouping nearby 3-D features (member 3-D features); a neighborhood search is then performed on the coarse-level 3-D features, and a relaxation procedure is applied to refine the generated coarse hypotheses. Finally, for a chain (or a pair of chains), closures are determined to make a rooftop boundary hypothesis.

Three-dimensional coarse junctions are generated first by grouping nearby 3-D junctions. For this grouping, the distance between two 3-D junctions is defined to be the sum of the distances between their branches, and the distance between two 3-D linears is defined to be the maximum of the pixel distances between their 2-D member linears in all images. Note that we use image errors instead of the distances between the 3-D junction positions (which may be more intuitive) because most of the errors come from image quantization, line fitting, and wrong feature grouping, not from the 3-D physical properties of the building. We allowed a maximum distance of 8 pixels for the coarse junction generation; in our example, 16 3-D coarse junctions were generated from the 55 3-D junctions. The remaining 3-D linears are grouped with nearby 3-D linears to make 3-D coarse linears. Fig. 20 shows the 62 3-D coarse linears generated from the 151 3-D linears of Fig. 18 (the member 3-D linears were merged only for the purpose of display). We see that many of the ambiguities are removed.

The forward and backward branch relationships are consistently maintained by grouping the branches of the member 3-D junctions. The forward neighbors of a 3-D coarse junction are its forward branches; the forward neighbors of a 3-D coarse linear are the 3-D coarse junctions which have the 3-D coarse linear as one of their backward branches, and vice versa. The neighborhood features of a 3-D feature thus consist of its forward neighbors and its backward neighbors.

After generating the 3-D coarse features, a search is performed to group them into coarse hypotheses. A depth-first search starting from a 3-D coarse junction is performed on the neighborhood features, and closed loops and chains (open paths which are not subsets of any other open paths or closed loops) are collected in the grouping process. When DEM layers are used, only the 3-D coarse features near the boundary of a given layer and with heights compatible with the layer are used in the search; this reduces the search space significantly.

The next step is to refine the hypotheses (the closed loops and the chains) generated from the 3-D coarse features. Given a hypothesis, its ith 3-D coarse feature has m member 3-D features, x_i1, ..., x_im, and our goal is to choose one that has good local properties and is compatible with the neighboring features.

Fig. 20. The 62 3-D coarse linears generated from the 151 3-D linears of Fig. 18.
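The depth-first search that collects closed loops and chains can be sketched as follows. The adjacency encoding (a dict from a feature id to its forward neighbors) and the depth bound k are illustrative assumptions, not the ABERS data structures.

```python
# Sketch of the depth-first search on the neighborhood graph of 3-D coarse
# features, collecting closed loops and maximal open chains.

def collect_loops_and_chains(forward, start, k):
    """forward maps a feature id to its forward neighbors. Returns
    (loops, chains): a loop is a path returning to `start`; a chain is a
    maximal open path, with at most k steps explored from `start`."""
    loops, chains = [], []

    def dfs(path):
        node = path[-1]
        extended = False
        if len(path) <= k:
            for nxt in forward.get(node, []):
                if nxt == start and len(path) > 2:
                    loops.append(path[:])      # closed loop found
                    extended = True
                elif nxt not in path:
                    extended = True
                    dfs(path + [nxt])
        if not extended:
            chains.append(path[:])             # maximal open path
    dfs([start])
    return loops, chains
```

On an intact rooftop graph (junctions and linears alternating around the boundary) the search returns one closed loop; where linears are broken or junctions are missing, it returns open chains, which are closed later by the closure step.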

We need to assign a label L(x_ij) ∈ {0, 1} to every x_ij such that Σ_j L(x_ij) = 1 (only one 3-D feature is chosen for each 3-D coarse feature) and a gain function G({x_ij}) is maximized. We apply continuous relaxation labeling [13]: the label is relaxed to L(x) ∈ [0, 1], and we use the gain function

G({x_ij}) = Σ_ij r1(x_ij) L(x_ij) + Σ_{(x_ij, x_kl) ∈ N} r2(x_ij, x_kl) L(x_ij) L(x_kl),   (4)

where N is the (symmetric) neighborhood relationship, and r1(x) and r2(x, x') are the unary and the binary compatibility functions; the second sum runs over ordered neighboring pairs, so each unordered pair is counted twice. Note that a 3-D junction's neighbor is always a 3-D linear, and vice versa. We use

r1(x_ij) = log P(E_L | x_ij),   r2(x_ij, x_kl) = log P(E_N | x_ij, x_kl),   (5)

where E_L and E_N stand for the evidence for the local and the neighborhood scores. For E_L, we used the wall vertical line support and the compatibility of the junction positions across the views (the maximum distance between a member junction's 2-D position and the projected 3-D junction point); for E_N, we use the maximum pixel distance between a member linear and the corresponding branch of a member junction. The probability distributions are estimated from a ground-truth building model: positive 3-D features, representing the ridges and the corners, are collected by examining their pixel distances to the projected ridge or corner positions, and negative 3-D features (the other member features of the positive features' corresponding 3-D coarse features) are collected as well. The statistics of the scores of the positive and the negative features are gathered to estimate the probability distributions, and the distribution of each evidence variable is approximated by a Gaussian.

Given the gain function, we apply the iterative maximization algorithm proposed by Rosenfeld et al. For each iteration, the labels are updated as follows:

L^(t+1)(x_ij) = L^(t)(x_ij)[1 + M^(t)(x_ij)] / Σ_j L^(t)(x_ij)[1 + M^(t)(x_ij)],   (6)

where

M(x) = ∂G/∂L(x) = r1(x) + 2 Σ_{(x, y) ∈ N} r2(x, y) L(y);   (7)

the factor 2 arises because each unordered pair is counted twice in Eq. (4). The problem is relatively simple, and the correct solution is normally found in fewer than five iterations.

The generated and refined hypotheses are either closed loops or chains of 3-D features. Grouping only the neighboring features may not be sufficient to compensate for the missing features; to generate all of the desired hypotheses, we need to consider missing features for the building sides and corners. Therefore, we use one more grouping criterion, a parallel relationship: we group, once again, two chains which have 3-D linears that are parallel to each other.
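One step of the update of Eqs. (6) and (7) can be sketched as follows. The data layout (labels stored per coarse feature as a list of probabilities over its member features) and the compatibility values are illustrative assumptions; the r values are assumed to be scaled so that 1 + M stays positive.

```python
# Sketch of one continuous relaxation labeling update, Eqs. (6)-(7).
# labels[i][j]  : current label L(x_ij) for member j of coarse feature i
# r1[i][j]      : unary compatibility r1(x_ij)
# r2[(i,j,k,l)] : binary compatibility r2(x_ij, x_kl) (stored symmetrically)
# neighbors[i]  : coarse features neighboring coarse feature i

def relax_step(labels, r1, r2, neighbors):
    new = {}
    for i, li in labels.items():
        scores = []
        for j, lij in enumerate(li):
            # M(x_ij) = r1(x_ij) + 2 * sum_N r2(x_ij, x_kl) L(x_kl)  (Eq. 7)
            m = r1[i][j]
            for k in neighbors[i]:
                for l, lkl in enumerate(labels[k]):
                    m += 2.0 * r2.get((i, j, k, l), 0.0) * lkl
            scores.append(lij * (1.0 + m))
        total = sum(scores)
        new[i] = [s / total for s in scores]   # normalization of Eq. (6)
    return new
```

Members that are mutually compatible across neighboring coarse features reinforce each other, and after a few iterations one member per coarse feature dominates.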

The next step is to obtain 3-D rooftop boundary hypotheses from the closed loops, the chains, and the pairs of chains by hypothesizing closed polygons. Since chains and pairs of chains are not closed, we need to make closures for them. Note that 3-D junctions always have branches (3-D linears) as neighbors, so the chains always end with 3-D linears. Fig. 21 shows the possible alignments of such 3-D linears (arrows represent polarities) and the suggested closures; we only consider closures parallel or perpendicular to those 3-D linears. The end-points of a closure are generated from the 2-D member linears of the respective 3-D linears (note that there can be more than one 2-D member linear for a given 3-D linear). Closures are generated for all the possible end-points, and the one with the best line support (the coverage of the supporting line segments and the lack of crossing lines; see Section 6 for details) is chosen. The 3-D positions of the 3-D junctions (Section 4) are used to determine the corner points of the rooftop boundaries.

Fig. 21. Suggested closures (dashed lines) for (A) parallels with the same polarity, (B) parallels with opposite polarities, (C) collinears, (D) closed L, (E) open L, and (F) general L. Arrows represent 3-D linears with polarities; buildings are regarded to lie on the shaded sides. Blank circles represent the previous corners and filled circles represent the next corners.

6. Hypotheses verification

Once rooftop hypotheses are obtained, supporting evidence is collected for them. Given a 3-D rooftop boundary hypothesis, its projection (a 2-D polygon) onto each image is calculated. The evidence consists of line support, wall vertical line support, the darkness of the cast shadow region, and the closeness of the hypothesis to the boundary of a DEM layer.

Line support consists of the supporting (R_P) and the distracting (R_N) line evidence; we use the line scoring function given in [17]. For each side of the polygon, supporting and distracting line segments are collected, and spatial indices are used to efficiently retrieve the line segments near the sides. A line segment is considered as supporting when it is aligned with and close to the side (the angle and the distance between them should be smaller than the thresholds suggested in [17]). Once the supporting line segments are collected for each side, their coverage of the side is calculated by projecting the end-points of the line segments onto the side. The supporting roof line score (R_P) for a projected polygon is the length-weighted summation (with respect to the lengths of the sides) of the coverages of all the sides of the polygon. A line segment is considered to be distracting when it intersects the side and the angle between the segment and the side is bigger than a certain threshold. The distracting roof line score of a projected rooftop side is given by

R_N = Σ_s θ_s d_s l_s,   (8)

where θ_s is the angle between a distracting line segment s and the side, d_s is the distance of the intersection from the closest end-point of the side, and l_s is the length of the line segment.

Wall vertical line support (W_V) is the coverage of the supporting line segments for the possible wall verticals (building corners). ABERS considers only the potentially visible lines for wall verticals: self-occluded lines and lines on the shadow side of a building are not considered (this is estimated from the 3-D model hypothesis and the angle of the Sun). In aerial image analysis, the camera orientation in world coordinates is given, so we can calculate the latitude and the longitude for an image window (with a known ground height); knowing the date and the time when the image was acquired, we can then calculate the angle of the Sun. Fig. 22 shows the wall vertical line support for an example hypothesis. Only the two wall verticals in the upper-left and lower-right corners of the building, which are neither self-occluded nor on the shadow side of the building, are investigated, and only one of the two (the one in the lower-right corner) has actual line support in the image.

We generate the search window for the shadow darkness evidence (S_D) with the known illumination geometry (shown in Fig. 23).

Fig. 22. Wall vertical line evidence for a rooftop hypothesis. Only the visible wall verticals, in the upper-left and lower-right corners of the building, are investigated; only one of them has actual line support in the image.
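The coverage computation behind R_P can be sketched as follows. This is an interval-based simplification in which the supporting segments have already been projected onto the side; the alignment and distance tests of [17] are omitted.

```python
# Sketch of the supporting roof line score R_P: supporting segments are
# projected onto each side, the covered intervals are merged, and the
# per-side coverages are combined by a length-weighted sum.

def coverage(side_len, segments):
    """segments: projected (start, end) intervals along the side, in the
    side's own coordinate (0 .. side_len). Returns the covered fraction."""
    ivals = sorted((max(0.0, a), min(side_len, b)) for a, b in segments)
    covered, end = 0.0, 0.0
    for a, b in ivals:
        if b <= end:
            continue                       # interval already covered
        covered += b - max(a, end)         # count only the new part
        end = b
    return covered / side_len

def roof_line_score(sides):
    """sides: list of (side_len, projected_segments) for the polygon.
    Length-weighted summation of the per-side coverages."""
    total_len = sum(length for length, _ in sides)
    return sum(length * coverage(length, segs)
               for length, segs in sides) / total_len
```

Merging the intervals prevents overlapping supporting segments from being counted twice, so the score stays in [0, 1].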

The evidence score is set to the percentage of dark pixels in the search windows. For each rooftop boundary line, its extrusion to the ground and the projection of a shadow line are computed; the resulting estimate of the shadow area includes the outer parallelograms (dotted lines in Fig. 23). Note that the shadow may fall onto sloping ground or onto extruded objects such as smaller buildings nearby, so the estimated shadow geometry may contain errors. Therefore, for robustness, only the first half of the estimated area is used as a search window.

Two evidence variables are generated to measure the closeness of a hypothesis to the corresponding DEM layer. The first one (DEM1) measures how much of the boundary of a rooftop hypothesis is close to that of the DEM layer. ABERS uses a simple estimation, which is the maximum of the distances of the rooftop corner points, or of the mid-points of the boundary lines, to the layer boundary (Fig. 24). In Fig. 24, an example hypothesis is superimposed on an image of a DEM layer. The second one (DEM2) is the area coverage of the hypothesis on the DEM layer.

Fig. 23. Search windows for shadow darkness evidence for a rooftop hypothesis.

Fig. 24. An example rooftop hypothesis given a DEM layer.
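The DEM1 measure can be sketched as follows. The layer boundary is approximated here by a dense list of boundary points, which is an illustrative simplification of the actual boundary representation.

```python
# Sketch of DEM_1: the maximum, over the rooftop corner points and the
# boundary-line mid-points, of the distance to the DEM-layer boundary.
# A small value means the hypothesis hugs the layer boundary.

def dist(p, q):
    return ((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2) ** 0.5

def dem1(corners, layer_boundary_pts):
    """corners: rooftop polygon corners (closed polygon assumed);
    layer_boundary_pts: densely sampled points on the layer boundary."""
    samples = list(corners)
    n = len(corners)
    for i in range(n):                    # mid-points of the boundary lines
        a, b = corners[i], corners[(i + 1) % n]
        samples.append(((a[0] + b[0]) / 2.0, (a[1] + b[1]) / 2.0))
    return max(min(dist(s, q) for q in layer_boundary_pts)
               for s in samples)
```

Using the maximum (rather than the mean) makes the score sensitive to any part of the hypothesis straying from the layer, which matches its use as a closeness check.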

Note that a single DEM layer oftentimes represents more than one building component, and DEM2 may work as negative evidence in such a case. Therefore, DEM2 is used only for the overlap analysis (Section 7).

We apply an expandable Bayesian network (EBN) [12] to compute P(Building | Evidence). One challenge we face is that the number of evidence features used for the reasoning may vary at runtime, because the number of images used is variable. Therefore, a reasoning tool that handles varying numbers of evidence variables and missing data is needed. An EBN provides repeatable nodes, which are instantiated (copied) at runtime, where the numbers of instances vary according to the number of images.

Note that the computation of some of the evidence, such as the darkness of the shadow region (S_D), is relatively more expensive than others. In ABERS, we therefore apply two stages of verification to reduce the computation. In the first stage (hypotheses selection), hypotheses with bad roof line support (calculated from R_P and R_N) are filtered out; the rest of the evidence is then collected and applied to the remaining hypotheses (hypotheses verification with full evidence). A simple EBN used for the hypotheses selection is shown in Fig. 25A, and the one using full evidence is shown in Fig. 25B. Additional information is also used, such as the size (area) of the projected rooftop hypothesis on an image (SIZE) and the projected length of the wall verticals (W_L). All the continuous evidence variables were discretized into 5 levels; a binary node was used for SIZE, and a ternary node was used for W_L. For the image-derived evidence variables, 190 examples were collected for the hypotheses selection and 96 were collected for the hypotheses verification.

Fig. 25. (A) A simple EBN used for hypotheses selection and (B) an EBN used for hypotheses verification with full evidence.

An example verification result is shown in Fig. 26: Fig. 26A shows all the building hypotheses generated for a building in Fig. 7, and the verified hypotheses are shown in Fig. 26B. We see that many of the false positives have been removed.

7. Overlap analysis

It is common that more than one hypothesis is verified for a single building component, where these hypotheses represent parts of an actual building, as in Fig. 26B. We aim to choose the best possible building component. A common overlap analysis procedure is to select all the hypotheses which do not have any overlapping hypotheses of better scores. However, comparing two verified hypotheses according to their verification scores, P(Building | Evidence) of the EBN in Fig. 25B, is not accurate, because that binary classifier is not designed and trained to compare two good building hypotheses, but to determine whether a certain hypothesis corresponds to a building or not. In ABERS, we instead use a comparative classifier, which takes two sets of evidence variables as input and determines the probability that one hypothesis is better than the other. For the comparative classification, we use the vector difference between the two sets of evidence variables, since all the evidence variables are continuous. Another EBN, which takes the vector differences as input, was designed and trained for the overlap analysis. The structure of the comparative EBN (shown in Fig. 27) is similar to the one for the hypothesis verification (Fig. 25B), except for the use of DEM2 (Section 6). Note that the classification result should be commutative, C(m) = C(−m), where m is the vector difference between the two sets of evidence being compared and C(m) is the classification result for m.

The training dataset for the comparative classifier is obtained by labeling accurate and inaccurate hypotheses by hand. Then, for each of the overlapping pairs of an accurate and an inaccurate hypothesis, two training samples, m and −m, are generated by taking the vector differences of their evidence variables. We used 583 pairs of training data generated from 211 building hypotheses. However, following this approach of simply keeping the locally best hypotheses may still eliminate some of the important hypotheses; Fig. 28 shows an example.

Fig. 26. (A) All the building hypotheses generated for a building in an example building complex and (B) the verified hypotheses.

Fig. 27. An EBN used for the overlap analysis.
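The construction of the comparative training set can be sketched as follows. The evidence layout (equal-length tuples of continuous variables) is an assumption.

```python
# Sketch of building the comparative-classifier training set: for each
# overlapping (accurate, inaccurate) pair, the vector difference m of their
# evidence variables yields two samples, (m, 1) and (-m, 0), so the learned
# decision stays consistent under swapping the two hypotheses.

def training_pairs(overlapping_pairs):
    """overlapping_pairs: list of (evidence_accurate, evidence_inaccurate)
    tuples of continuous evidence variables."""
    data = []
    for good, bad in overlapping_pairs:
        m = tuple(g - b for g, b in zip(good, bad))
        data.append((m, 1))                         # first hypothesis better
        data.append((tuple(-x for x in m), 0))      # swapped order
    return data
```

Including both m and −m makes the training data symmetric, which is what enforces the commutativity requirement on the classifier.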

In Fig. 28, a, b, and c are three overlapping hypotheses: hypothesis c overlaps both a and b, but a and b do not overlap each other. When a is superior to c, but c is superior to b, the common procedure selects only a as a final hypothesis, while the desirable results are either c, or a and b together, because all three of them are already verified. To handle this situation, the overlap analysis procedure of ABERS follows these iterative steps. First, one of the rooftop hypotheses which is superior to all of its overlapping hypotheses is selected, and all of its overlapping hypotheses are eliminated. Then, the same procedure is repeated for the remaining hypotheses until no more hypotheses remain.

The above overlap analysis is applied for each layer. However, overlapping hypotheses can also be generated from two different DEM layers; therefore, another overlap analysis is applied, once again, to the final hypotheses of all the DEM layers. To allow multi-layered buildings (superstructures), when one hypothesis is completely inside another and significantly higher, the two are not considered to be overlapping. Also, non-flat rooftop hypotheses, which are generated after the overlap analysis, may conflict with superstructure hypotheses; when a sloping roof hypothesis conflicts with a superstructure hypothesis, the sloping roof hypothesis is selected.

Fig. 28. A difficult situation for the overlap analysis.

8. Rooftop analysis

8.1. Superstructure analysis

For a multi-layered building complex, we need to consider the interaction among the building components. Consider the building complex shown in Fig. 29A. A rooftop boundary hypothesis for the superstructure can be found with the suggested approach, but it will have weak wall and shadow evidence support when the estimation of the shadow and the wall does not consider the interaction with the base building. Therefore, it is desirable to first find the base building for an accurate verification of the superstructure.

ABERS performs a separate superstructure search after the overlap analysis. Given a base building, a search is performed on the selected hypotheses (Section 6) to find the superstructure candidates by examining their heights and their percentages of overlap. The superstructure candidates are then verified with the modified evidence considering the base building: Fig. 29B shows the corrected estimation of the wall and shadow given a base building component. Finally, the overlap analysis is applied to the verified superstructure candidates. When DEM layers are used (Section 5), ABERS proceeds from the layer of the lowest height, verifies the hypotheses of a given layer with the corrected shadow and wall evidence given a base building, and uses the final results of the lower layers (after the overlap analysis of each layer) to find the appropriate base buildings. When DEM layers are not used, the final hypotheses after the overlap analysis are considered as the base buildings.

Fig. 29. (A) Estimation of shadow and wall without considering the base building and (B) corrected estimation with a base building component.

8.2. Sloping roof analysis

The next step is that of sloping roof analysis. For each corner of a rooftop boundary (eaves) hypothesis, possible hip lines (diagonal lines from the corners) are gathered. Given a corner point, a nearby line segment is considered to be a hip candidate when it is inside the angle determined by the eaves and is directed toward the corner point. Fig. 30 shows the search windows and the hip lines found for the example hypothesis. Once hip lines are collected from all the images, 3-D linears (3-D hip linears) are generated by matching them; note that these 3-D linears are not parallel to the ground. Line matching without any restriction is very susceptible to noise. Fortunately, we know that one of the 3-D end-points of a 3-D hip linear is the corresponding corner point of the rooftop hypothesis. Fig. 31 illustrates the situation.

Fig. 30. Hips (diagonal lines) found for an example hypothesis. Square boxes are the search windows.
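The iterative overlap selection described above can be sketched as follows. The `overlaps` predicate and the `better` comparison (in ABERS, the comparative EBN) are supplied by the caller; their concrete forms here are illustrative.

```python
# Sketch of the iterative overlap analysis: repeatedly select a hypothesis
# that is superior to every hypothesis it overlaps, eliminate the
# hypotheses it overlaps, and repeat on the rest.

def overlap_analysis(hyps, overlaps, better):
    remaining, selected = list(hyps), []
    while remaining:
        winner = None
        for h in remaining:
            rivals = [o for o in remaining if o != h and overlaps(h, o)]
            if all(better(h, o) for o in rivals):
                winner = h
                break
        if winner is None:          # no hypothesis dominates its overlaps
            break
        selected.append(winner)
        remaining = [h for h in remaining
                     if h != winner and not overlaps(winner, h)]
    return selected
```

On the a/b/c example of Fig. 28 (a superior to c, c superior to b, a and b disjoint), the procedure first selects a, eliminates c, and then selects b, yielding the desirable result {a, b} instead of a alone.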

Let l1 and l2 be the projections of a hip line l (for a corner point p) in view1 and view2. The goal is to reconstruct the 3-D line l from the line segments (q, q') and (r, r') on l1 and l2, and the point p. Since we know that l passes through p, we only need to determine its 3-D orientation. Since q is the projection of a point on l, the 3-D line l intersects lq, the line of sight for q; knowing that l also passes through p, we obtain one constraint: l lies on the plane Pq which contains both lq and p. The normal of Pq (or Pr) is obtained by the cross-product of lq (lr) and (p, x), where x is any point on lq (lr). In the same manner, l lies on the plane Pr determined by lr, the line of sight for r, and p. Therefore, l is the intersection of Pq and Pr (unless Pq and Pr are identical, which is rare for aerial images), and its direction is obtained by the cross-product of the normals of Pq and Pr.

Fig. 31. Determining the 3-D orientation of a 3-D hip linear generated from a building corner p.

When the 3-D hip linears are available, a search is performed to find the ridges. The three-dimensional end-points of a hip linear are generated from its 2-D members, and we use relatively small search windows around the 3-D end-points of the linears, as shown in Fig. 32; the sizes of the search windows are determined by the end-points of the 3-D hip linears. First, 2-D linears parallel to the neighboring building sides are gathered. The ridge line candidates are then the 3-D linears which have those 2-D linears as their members and have heights higher than the eaves (within a certain range). Fig. 32 illustrates the searches for the ridges; the resulting ridge line candidates (2-D linears) are shown in the search windows.

Fig. 32. Example search results for the ridges. The resulting ridge line candidates (2-D linears) are shown in the search windows.

In most cases, the ridge corner point corresponding to a boundary corner point is the intersection of a pair of the candidate 3-D linears; for example, the corner point g of Fig. 2 is determined by the intersection of the two 3-D hip linears ag and fg. Due to the photogrammetric errors, the position of the ridge corners cannot be determined exactly. For all the candidate pairs, the local line support (Section 5) and the compatibilities with the candidate pairs of the neighboring corners are calculated; after some iterations of a relaxation procedure similar to that of Section 5, the best pair for each corner is determined, and the ridge corner points are set from these pairs.

A ridge corner point can also be determined by a single ridge, not a pair. When one of the hip linears is not found, the corner point is determined by the intersection of a hip linear and the ridge linear; when neither of them is found, it is chosen from the end-points of the ridge linear (generated from its 2-D member linears). Not all the ridge corner points can be found by the above methods; for example, the estimate of the ridge corner point h of Fig. 2 may not be on the ridge line gi. In such a case, a corner hypothesis is generated from the neighboring corner points by forcing the parallel relationship, and such a corner hypothesis is verified with the line support. In the final refinement of the rooftop analysis, the corner points near a ridge line are projected onto the ridge line. Finally, a rooftop is generated by collecting all the ridge corners. The result for the eaves boundary of Fig. 22 is shown in Fig. 33.

Fig. 33. A detection result for the example building of Fig. 22.

9. Time complexity

To estimate the time complexity of ABERS, two factors are considered: l, the average number of 2-D linears per image, and n, the number of images. The actual numbers vary according to the image configuration (for example, the alignment of the epipolar lines and the building sides) and the complexity of the buildings. The number of linear matches in one image pair is O(l) when the possible height ranges are fixed; in most cases, it is between 2l and 10l. The number of junctions is usually much smaller than that of the linears (Section 3.2).
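The Fig. 31 construction can be sketched as follows. The camera model (a center point plus a ray direction through the image point) is an illustrative simplification.

```python
# Sketch of the Fig. 31 construction: the 3-D direction of a hip line l
# through the known corner p is the cross-product of the normals of the
# planes P_q and P_r, each plane being spanned by a line of sight and p.

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def hip_direction(p, cam1, ray1, cam2, ray2):
    """p: known 3-D corner point on l; cam_i: camera center of view i;
    ray_i: direction of the line of sight through the projection of a
    point of l in view i. Returns l's direction (up to scale)."""
    n_q = cross(ray1, sub(p, cam1))   # normal of P_q = span(ray1, p - cam1)
    n_r = cross(ray2, sub(p, cam2))   # normal of P_r = span(ray2, p - cam2)
    return cross(n_q, n_r)            # intersection direction of P_q, P_r
```

For instance, with p = (0, 0, 0) and lines of sight toward the point (1, 1, 1) on l from cameras at (0, 0, 10) and (10, 0, 0), the result is parallel to (1, 1, 1), the true direction of l.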

in most cases. The number of 3-D coarse junctions is also bound by the same complexity. In the worst case. The number of 3-D coarse linears is bound by O(l) because most of the 2-D linears correspond to only one 3-D coarse linear or none (note that the false matches were eliminated by the DEM veriﬁcation. The worst case of the hypotheses formation. R. the time complexity is still O(ln2) because we need to look at all the linear matches to generate the 3-D linears. The time complexity for reﬁning coarse hypotheses is linear in the number of hypotheses. However. b. The most time-consuming part of the hypotheses generation is determin- Fig. the actual numbers of 3-D linears are much smaller than those of the linear matches. Kim. The number of 3-D linears is also bound to a similar complexity. and no more ambiguities exist by collapsing nearby 3-D linears). the number of T-junctions is usually small. . the number of possible chains (subgraphs) is O(lbk). in practice. The most time-consuming procedure is that of hypotheses generation. The branching factor. Nevatia / Computer Vision and Image Understanding 96 (2004) 60–95 the pair-wise matches is O(ln2). In fact. for example. and even for a single DEM cue. Note that DEM cues are used to ﬁlter out 2-D linears by their locations. not exponential. and the number of chains is O(lk). The white dots are T-junctions. as shown in Fig. which means that the number of T-junctions can be bound by a constant. and the actual number is even smaller. However. because there are O(n2) number of image pairs.88 Z. Hence. 34. where b is the branching factor of the graphs. their height is determined and the number of candidate in other images will be much smaller. b = O(l). this is rare. the numbers of coarse hypotheses (before parallel relationships are applied) are smaller than three times the numbers of 3-D coarse junctions. Fig. many graphs exist because of the broken linears and missing junctions. in the worst case. 
Consider graphs of 3-D coarse features with the neighborhood relationship. Fig. 34 shows an example of the worst case, where there is only one graph with all the 3-D coarse features. The sizes of the graphs generated by 3-D coarse features are bound by the maximum area covered by a DEM cue. When we limit the maximum number of rooftop corners (the maximum depth of the depth-first search) to k, the total number of the hypotheses generated is O(l²b²ᵏ), where l is the number of 3-D linears and b is the branching factor of the search, which is bound by the maximum number of T-junctions in one graph. In most cases, bᵏ is usually small, because many false matches are eliminated. Tests for parallel relationships are applied to combine two chains; the number of the generated hypotheses is much smaller than the square of the number of the chains, because only a small number of chain pairs have common parallel 3-D linears. The actual number is also much smaller than l, because many false matches are eliminated once two linears are matched.

The number of total closure candidates is determined by the number of the end-points, and the number of end-points is bound by a constant (the number of 2-D linears divided by the number of 3-D linears). The worst case is finding L-closures (Figs. 21E and F), because the number of possible closures is the square of the number of the end-points. The time complexities of the remaining procedures, including finding proper closures for each hypothesis, are linear in the number of hypotheses. Therefore, the total time complexity of ABERS is O(ln²) + O(l²b²ᵏ) in the worst case, where n is the number of end-points. When we limit the maximum size of the target building, we can run the algorithm on small windows and combine the results later; in this case, the number of linears, l, on each window is constant, and therefore we get linear complexity.

10. Experimental results

We show results on several examples in this section. Unfortunately, it is difficult to acquire large data sets with multiple image coverage for a valid statistical evaluation, and statistical evaluation on a small number of examples is less meaningful when the results strongly depend on how the test dataset is chosen. In addition, most of the building detection and description systems have different representational powers.

We first show the results on flat buildings. Fig. 35A is the detection result on the example building site at Fort Hood, Texas; four images were used to get Fig. 35A, and two images were used to get Fig. 35B. Although all the rooftops are rectangular, it is hard to detect them correctly without the aid of multiple views, because some corners are occluded by trees and some of the important lines are broken by small superstructures or missing because of accidental illumination (Fig. 37). Fig. 35B is the detection result of Noronha's system [16] on the same site. We find that the suggested approach nicely describes the U-shaped buildings on the right side, while the previous approach, based on a rectilinear representation (as a collection of rectangular rooftops), cannot.

Fig. 36 is the detection result for the example building of Fig. 7. Five images of 394.6 × 452 average resolution were used. 128.2 2-D linears per image (total 641) were extracted and filtered, and 85.2 2-D junctions per image (total 426) were generated from the 2-D linears; the time spent for the 2-D feature extraction was 28.5 s on a SunBlade 1000 workstation (UltraSPARC III processor, 750 MHz). Two hundred and ninety-one 3-D linears and 74 3-D junctions were generated. One hundred and fifty-one 3-D linears and 55 3-D junctions were filtered by the verification with DEM; the time spent for the 3-D feature generation was 4.1 s. Sixty-two 3-D coarse linears and 16 3-D coarse junctions were used to generate 79 hypotheses in 38.1 s, and 10.1 and 12 s were spent for hypotheses verification and overlap/superstructure analysis, respectively. The total time spent was 1 min and 37 s; the pre-processing time, such as for loading images and generating DEM, is not included.
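The bounded-depth search analyzed above can be illustrated with a short sketch. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy graph, the function name `enumerate_chains`, and the feature ids are all hypothetical.

```python
def enumerate_chains(graph, start, k):
    """Depth-first enumeration of feature chains with at most k corners.

    `graph` maps a feature id to its neighbor ids (standing in for the
    neighborhood relationship on 3-D coarse features). The branching
    factor b is the maximum number of neighbors per feature.
    """
    chains = []

    def dfs(chain):
        chains.append(tuple(chain))
        if len(chain) == k:
            return  # depth (number of rooftop corners) is capped at k
        for nxt in graph[chain[-1]]:
            if nxt not in chain:  # never revisit a feature in one chain
                dfs(chain + [nxt])

    dfs([start])
    return chains

# Toy neighborhood graph: l = 4 linears on a 4-cycle, branching factor b = 2.
graph = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}
chains = [c for s in graph for c in enumerate_chains(graph, s, k=3)]
print(len(chains))  # prints 20
```

Each start feature yields O(bᵏ) chains, so the total is O(l·bᵏ); testing all chain pairs for common parallel 3-D linears is then O(l²b²ᵏ), matching the worst-case bound in the text.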

Fig. 35. (A) Automatic detection and description results with 4 images from Fort Hood, Texas and (B) the results from [16] (2 images were used).

Fig. 36. An automatic building detection and description result.

Fig. 37. Line segments of a building in Fig. 35. Important lines and junctions are broken or not detected.

Another flat polygonal building complex is shown in Fig. 38A. Fig. 38B is the DEM segmentation result, and the automatic description result is shown in Fig. 38C. Fig. 39 shows a result on the building complex shown in Fig. 29. Five images of 419.8 × 443.6 average resolution were used. A total of 4 min and 33 s were spent on a Sun Ultra 2 workstation, of which 2 min and 59 s were for the hypotheses generation. The same result was obtained without using DEM layers; however, in this case, 4 min and 44 s were spent for the hypotheses generation, verification, and overlap analysis.

Another result on a complex building is shown in Fig. 40A. We see that the base building has a much more complex rooftop boundary, with many T-junctions which make the hypotheses search harder. Fig. 40B shows the extracted line segments from an image, which shows the complexity of the problem: we see broken boundary lines and large numbers of distracting lines both on the rooftop and the ground. Fig. 41 shows more results on complex buildings.

Fig. 38. (A) A flat polygonal building complex, (B) DEM layers, and (C) an automatic description result.

Fig. 39. An automatic description result on a multi-layered polygonal building complex.

Fig. 40. (A) A detection result for another complex building and (B) the line segments, which show the difficulty of the problem.

Fig. 41. More results on complex buildings.

Fig. 42 shows an example of a failure on a building complex containing a sloping rooftop. Although the boundary hypotheses for the sloping roof were generated (Fig. 42A, shown in yellow), they were not verified due to the interaction with the nearby objects: the shadow was not verified due to the bright rooftop of the building next to it, and the wall lines were not verified due to nearby trees. The superstructure is verified due to accidental alignments with eaves and a hip line. The description result is shown in Fig. 42B.

The detection and description result was robust. We applied the same parameters to most of the sites (except Fig. 35, where only four images were available). For most sites, the best results were obtained with six images, but the results (as well as the computation time) were not much different than when we used five or seven images.

Fig. 42. (A) A building complex containing a sloping rooftop and (B) an automatic description result. (For interpretation of the references to colours in this figure legend, the reader is referred to the web version of this paper.)

11. Conclusion

We have presented an approach to the detection and description of buildings with complex shape rooftops and shown results on some challenging examples. Our method uses multiple images and multiple cues, such as results obtained by region matching stereo analysis and feature-based matching. We have described perceptual grouping techniques to group low-level features, such as edges and corners, into higher-level features by using a hierarchy of grouping processes with multiple levels of detail, and the use of probabilistic reasoning methods to select among the multiple hypotheses. The problem of modeling complex buildings retains many complexities requiring substantial future research, but we believe that this work points to a promising approach. We expect that the use of such methods will be helpful in many other object detection and description problem domains as well.

Acknowledgment

This research was supported by a MURI subgrant from Purdue University under Army Research Office Grant No. DAAH04-96-1-0444. Part of the low-level processing (Sections 3 and 4.2) is a result of joint research with Andres Huertas.

Appendix. Learning parameters

Various parameters and threshold values are used in the various grouping stages. We apply statistical learning to estimate such parameters. The difficulty comes in gathering supervised learning examples, because it is very tedious to manually assign or provide ground truth for the intermediate features. Therefore, we gather the statistics from the automatic description results (building models): we first applied manually determined parameters to detect buildings automatically; from the manually collected (good) description results, the parameters can then be tuned by statistical learning. In this Appendix, we show an example of determining the parameters regarding the 2-D linear/junction position error.

The position errors of 2-D linears and junctions mostly come from image quantization, calibration error, and line fitting error. Given a good building description result (picked manually), we gather the 2-D member linears and junctions from the final hypothesis. Then, we project the resulting building model onto each and every image and measure the position errors of the linears and the junctions. The error distribution (accumulated) for the position errors of linears is shown in Fig. 43. We see that about 95% of the linears have position errors of less than 2.5 pixels, which is considered as the maximum linear position error. Similarly, the maximum junction position error was determined to be 2.5 pixels. These maximum position errors were used in various feature grouping stages, for example, for determining the colinearity between two linears. The mean errors for linears (1.1 pixels) and junctions (1.3 pixels) were also calculated and used to determine the confidence interval in the height estimation (see Section 4.2).

Fig. 43. Linear position error.
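The Appendix procedure — accumulating the measured position errors and reading the threshold off the point where the distribution crosses 95% — can be sketched as follows. This is a minimal sketch: the function `position_error_stats` and the error values are made-up assumptions, not the paper's data or code.

```python
import math

def position_error_stats(errors, coverage=0.95):
    """Return (max_error, mean_error), where max_error is the smallest
    value covering `coverage` of the samples, i.e., where the accumulated
    error distribution crosses 95%."""
    s = sorted(errors)
    idx = max(0, math.ceil(coverage * len(s)) - 1)
    return s[idx], sum(s) / len(s)

# Hypothetical per-linear position errors in pixels (illustrative only),
# as would be measured by projecting a vetted building model onto each image.
errors = [0.4, 0.6, 0.8, 0.9, 1.0, 1.1, 1.2, 1.4, 1.6, 1.8,
          2.0, 2.1, 2.2, 2.3, 2.4, 2.4, 2.5, 2.6, 3.1, 4.0]
max_err, mean_err = position_error_stats(errors)
print(max_err, mean_err)  # 95% threshold and mean error
```

The resulting threshold plays the role of the maximum linear position error used in the grouping stages, and the mean error that of the height-estimation confidence interval.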

References

[1] B. Ameri, Feature based model verification (FBMV): a new concept for validation in building reconstruction, in: Proc. ISPRS Congress, 2000, pp. 24–35.
[2] C. Baillard, A. Zisserman, Automatic reconstruction of piecewise planar models from multiple views, in: Proc. IEEE Computer Vision and Pattern Recognition, 1999, pp. 559–565.
[3] R. Collins, C. Jaynes, Y.-Q. Cheng, X. Wang, F. Stolle, E. Riseman, A. Hanson, The ascender system: automated site modeling from multiple aerial images, Comput. Vision Image Understand. 72 (2) (1998) 143–162.
[4] M. Cord, M. Jordan, J.-P. Cocquerez, N. Paparoditis, Automatic extraction and modelling of urban buildings from high resolution aerial images, in: ISPRS Automatic Extraction of GIS Objects from Digital Imagery, Munich, Germany, 1999, pp. 187–192.
[5] A. Fischer, T.H. Kolbe, F. Lang, A.B. Cremers, W. Förstner, L. Plümer, V. Steinhage, Extracting buildings from aerial images using hierarchical aggregation in 2D and 3D, Comput. Vision Image Understand. 72 (2) (1998) 185–203.
[6] A. Gruen, R. Nevatia (Eds.), Computer Vision and Image Understanding: Special Issue on Automatic Building Extraction from Aerial Images 72 (2) (1998).
[7] M. Herman, T. Kanade, Incremental reconstruction of 3-D scenes from multiple, complex images, Artif. Intell. 30 (3) (1986) 289–341.
[8] S. Heuel, W. Förstner, Matching, reconstructing and grouping 3D lines from multiple views using uncertain projective geometry, in: Proc. IEEE Computer Vision and Pattern Recognition, vol. 2, 2001, pp. 517–524.
[9] A. Huertas, R. Nevatia, Detecting buildings in aerial images, Comput. Vision Graph. Image Process. 41 (2) (1988) 131–152.
[10] A. Huertas, Z. Kim, R. Nevatia, Use of cues from range data for building modeling, in: Proc. DARPA Image Understanding Workshop, 1998, pp. 577–582.
[11] R. Irving, D. McKeown, Methods for exploiting the relationship between buildings and their shadows in aerial imagery, IEEE Trans. Syst. Man Cybernetics 19 (6) (1989) 1564–1575.
[12] Z. Kim, R. Nevatia, Expandable Bayesian networks for 3-D object descriptions from multiple views and multiple mode inputs, IEEE Trans. Pattern Anal. Mach. Intell. 25 (6) (2003).
[13] S. Li, H. Wang, K. Chan, M. Petrou, Energy minimization and relaxation labeling, J. Math. Imaging Vision 7 (1997).
[14] C. Lin, R. Nevatia, Building detection and description from a single intensity image, Comput. Vision Image Understand. 72 (2) (1998) 101–121.
[15] J. McGlone, J. Shufelt, Projective and object space geometry for monocular building extraction, in: Proc. IEEE Computer Vision and Pattern Recognition, 1994, pp. 54–61.
[16] S. Noronha, R. Nevatia, Detection and description of buildings from multiple aerial images, IEEE Trans. Pattern Anal. Mach. Intell. 23 (5) (2001) 501–518.
[17] S. Noronha, 3-D building detection and description from multiple intensity images using hierarchical grouping and matching of features, Ph.D. Thesis, Computer Science, University of Southern California, 1998.
[18] M. Roux, D. McKeown, Feature matching for building extraction from multiple views, in: Proc. IEEE Computer Vision and Pattern Recognition, 1994, pp. 46–53.
[19] P. Saint-Marc, G. Medioni, Adaptive smoothing for feature extraction, in: Proc. DARPA Image Understanding Workshop, 1988, pp. 1100–1113.
[20] C. Vestri, F. Devernay, Improving correlation-based DEMs by image warping and facade correlation, in: Proc. IEEE Computer Vision and Pattern Recognition, vol. 1, 2000, pp. 438–443.
