
HEIGHT ESTIMATION FOR BUILDINGS IN MONOCULAR SATELLITE/AIRBORNE

IMAGES BASED ON FUZZY REASONING AND GENETIC ALGORITHM


Mohammad Izadi and Parvaneh Saeedi
School of Engineering Science, Simon Fraser University, Burnaby, BC, Canada
ABSTRACT
This paper presents a height estimation method for buildings with polygonal footprints in monocular satellite/airborne images by combining fuzzy reasoning and a genetic algorithm. A fitness function is employed that uses a set of fuzzy rules to assess various height hypothesis candidates for each building. A genetic algorithm optimizes this fitness function to recover the most accurate height quickly, with an accuracy that is independent of the acquisition method. Experimental results verify the effectiveness of the proposed method, with an overall mean height error of 35 cm.

1. INTRODUCTION
Automatic 3D map reconstruction has been an active research subject with a wide range of applications such as urban planning, military assessment simulations, and disaster-preparedness control. For many years, 3D building reconstruction, the most prominent component of 3D map reconstruction, has been done through semi-automatic approaches in which an operator identifies building boundaries in a stereo set of aerial images. Using the acquisition geometry, image displacement, and perspective projection, the dimensions of the buildings are then determined. This is a time-consuming and tiresome process. The abundance of inexpensive, frequently updated satellite imagery has initiated much work toward automatic methods for generating 3D building models. While numerous semi-automatic systems have been developed, only a limited number of automated systems are reported in the literature, and these systems are still far from capable of coping with the existing complexities of urban structures.

1.1. Previous Work
Lin and Nevatia [1] presented a method for estimating the height of rectilinear flat buildings in monocular aerial images using lines. Collins et al. [2] described a system for 3D representation of rectangular buildings from multiple views. Noronha and Nevatia [3] proposed a method that detected and reconstructed 3D models for rectilinear buildings with flat or gabled rooftops from multiple-view (non-stereo) aerial images. Kim and Nevatia [4] utilized multiple overlapping images of a scene to model and describe complex 3D buildings. Fujii and Arikawa [5] proposed a method that utilized airborne laser elevation maps with aerial images for the 3D reconstruction of urban structures.
One common problem with the previous approaches is the complexity that arises from incorporating multiple views and shapes more complicated than rectangular buildings. Also, relying only on lines, as some of the previous works do, limits the scope of such height estimation approaches to buildings with very simple profiles. Height estimation for complicated buildings is still an open research problem.

The authors would like to gratefully acknowledge NSERC Canada for support through the NSERC Strategic Grant Program.

978-1-4244-3610-1/09/$25.00 © 2009 IEEE.


2. SUGGESTED APPROACH
The main objective of this paper is to present a methodology for height estimation of buildings with flat polygonal rooftops using single-view satellite/airborne imagery. The assumption made here is that the polygonal definition of each rooftop is provided as an input.
2.1. Acquisition Geometry
The acquisition geometry determines the Jacobian of the ground-to-image and ground-to-shadow transformations (oblique/normal viewings). It also transforms the geometry into a data structure that is independent of the sensor and platform (satellite/airborne). For satellite imagery, the inputs to this process are the rational function and metadata files. For airborne photos, a number of feature points are selected manually in a file that is automatically processed to extract the acquisition geometry [6, 7]. The acquisition calibration consists of the following steps: determining the direction of gravity, obtaining the horizon, calibrating the camera, estimating the size of objects in meters, calculating the vertical scale, and computing the direction of the sun.
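As a worked example of what the calibrated sun geometry provides, the ground displacement of a shadow point can be computed from a candidate height and the sun's direction. This is a minimal sketch assuming flat ground; the function name and angle conventions are illustrative and not from the paper:

```python
import math

def shadow_offset(height_m, sun_azimuth_deg, sun_elevation_deg):
    """Ground-plane displacement of the shadow of a point raised height_m
    meters above flat ground (hypothetical helper; the paper derives the
    sun direction from its acquisition-geometry calibration)."""
    length = height_m / math.tan(math.radians(sun_elevation_deg))
    # The shadow falls away from the sun; azimuth is clockwise from north.
    az = math.radians(sun_azimuth_deg + 180.0)
    return (length * math.sin(az), length * math.cos(az))  # (east, north) meters

dx, dy = shadow_offset(10.0, 180.0, 45.0)  # sun due south, 45 degrees up
```

For a 10 m point with the sun 45 degrees above the horizon, the shadow is displaced 10 m along the anti-sun direction.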
2.2. Shadow Segmentation
WIAMIS 2009

Many shadow segmentation approaches rely on employing threshold values [8, 9]. Such approaches could fail to identify true shadows or suffer inaccuracies under varying illumination. In the proposed approach, a threshold-independent local segmentation method is employed to segment only the areas around each building.

Table 1. Linguistic variables and labels for the fuzzy rule-based fitness function

Input:   Spectral Ratio: Small, Large
         Shape Fitness:  Small, Medium, Large
Output:  Score:          Negative Large, Negative Small, Moderate, Positive Small, Positive Large

Fig. 1. Various steps in predicting the expected shadow (panels a-f).
Tsai [10] utilized a segmentation method in an automatic de-shadowing approach for shadow detection and compensation in color aerial images, using a spectral ratio with an automatic thresholding technique. The proposed approach in this work employs the spectral ratio (H + 1)/(I + 1), where H and I are the normalized hue and intensity values in [0, 1], to construct a ratio image. The ratio image is then segmented into regions Ri using the Mean Shift segmentation algorithm.
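A minimal sketch of the ratio-image construction, assuming the standard HSI hue formula (which the paper does not spell out); the Mean Shift segmentation would then run on this image:

```python
import numpy as np

def spectral_ratio_image(rgb):
    """Build the (H+1)/(I+1) ratio image used for shadow segmentation.
    rgb is an HxWx3 float array in [0, 1]; H and I are the normalized
    hue and intensity of the HSI model (a common choice; the paper does
    not give its exact hue formula)."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    intensity = (r + g + b) / 3.0
    num = 0.5 * ((r - g) + (r - b))
    den = np.sqrt((r - g) ** 2 + (r - b) * (g - b)) + 1e-12
    theta = np.arccos(np.clip(num / den, -1.0, 1.0))
    hue = np.where(b <= g, theta, 2 * np.pi - theta) / (2 * np.pi)  # in [0, 1]
    return (hue + 1.0) / (intensity + 1.0)

# Dark, hue-shifted shadow pixels yield larger ratios than bright ones.
```

The key property is that shadow pixels, being dark and relatively hue-stable, score higher in this ratio image than their sunlit surroundings, which is what makes the downstream clustering threshold-free.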
2.3. Expected Shadow Prediction
In this section, the expected shadow at each candidate height is estimated. Figure 1-a shows a polygonal rooftop with its vertices (the hatched region). When viewed from the top, the building's rooftop covers parts of the walls' footprint (the gray region) and shadow areas (the black region), Figure 1-a. The walls' footprint covers some parts of the shadow areas. The walls' footprint and building shadows are obtained by projecting the rooftop vertices using the image acquisition and sun geometries for a given height, Figure 1-b. The building's walls and rooftop vertices are combined together as displayed in Figure 1-c. The rooftop shadows are added to the walls' shadow by connecting lines between the walls' ground projections and the expected shadow points, Figure 1-d. The shadow of the walls (Figure 1-d) and the rooftop area are then combined to generate the region shown in Figure 1-e. By subtracting the merged rooftop and walls (Figure 1-c) from the resulting shadow region (Figure 1-e), the expected visible shadow regions of the building are predicted, as displayed in Figure 1-f.
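The vertex-projection step above can be sketched as follows, assuming a nadir view and flat ground (the paper handles oblique viewing through its acquisition-geometry Jacobians); the polygon union/subtraction of Figures 1-c through 1-f would be done with a geometry library:

```python
import math

def predict_shadow_vertices(roof_vertices, height, sun_azimuth_deg, sun_elevation_deg):
    """Project each rooftop vertex (x, y), given in ground units, to the
    point where its shadow falls for a candidate building height.
    A nadir view and flat ground are assumed; this sketches only the
    vertex-projection step of Section 2.3."""
    length = height / math.tan(math.radians(sun_elevation_deg))
    az = math.radians(sun_azimuth_deg + 180.0)  # shadow direction
    dx, dy = length * math.sin(az), length * math.cos(az)
    return [(x + dx, y + dy) for (x, y) in roof_vertices]

# The visible shadow of Figure 1-f would then be
# union(wall shadows, roof shadow) minus union(roof, wall footprint),
# e.g. via a polygon library such as shapely.
```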
2.4. Fuzzy Rule-Based Fitness Function
In this section, a fuzzy rule-based function is defined that plays the role of the evaluation function in the Genetic Algorithm (GA) [11] optimization process. This function evaluates a building's shadow areas according to the spectral ratio segmentation results and the fitness of the projected shape of the region. The significance of the GA here is that the height candidate values are chosen randomly, with the general direction of the search moving toward the best fit. This improves both accuracy and speed: the height precision is not limited to a fixed step, and the search converges to the best solution faster than examining a large number of possible height candidates exhaustively.

Fig. 2. Membership functions of the Shape Fitness variable: three functions f1(x), f2(x), and f3(x) (Small, Medium, and Large) with breakpoints at x = 1/3, 1/2, and 2/3.
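The randomized height search can be sketched as a minimal real-coded GA. The operators below (tournament selection, blend crossover, Gaussian mutation, elitism) are illustrative choices, not the paper's exact configuration, and `fitness` stands in for the fuzzy rule-based Height Score of Section 2.4:

```python
import random

def ga_optimize(fitness, h_min, h_max, pop=20, gens=40, seed=0):
    """Minimal real-coded GA sketch (tournament selection, blend
    crossover, Gaussian mutation, elitism). fitness maps a candidate
    height to the score being maximized; in the paper this is the
    fuzzy rule-based Height Score."""
    rng = random.Random(seed)
    P = [rng.uniform(h_min, h_max) for _ in range(pop)]
    for _ in range(gens):
        nxt = sorted(P, key=fitness, reverse=True)[:2]  # keep the two best
        while len(nxt) < pop:
            a = max(rng.sample(P, 3), key=fitness)      # tournament of 3
            b = max(rng.sample(P, 3), key=fitness)
            child = 0.5 * (a + b) + rng.gauss(0.0, 0.05 * (h_max - h_min))
            nxt.append(min(h_max, max(h_min, child)))   # clamp to range
        P = nxt
    return max(P, key=fitness)

best = ga_optimize(lambda h: -(h - 7.3) ** 2, 0.0, 30.0)  # toy fitness, peak at 7.3
```

Because candidates are real-valued rather than drawn from a fixed grid, the recovered height is not quantized to a preset step size, which is the accuracy/speed advantage described above.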
For each height candidate, the predicted shadow projected on the ground is assessed using the fuzzy rules (Table 1). Two membership functions μ1(x) and μ2(x) are calculated for the Spectral Ratio variable. Gaussian models are used to define μ1(x) and μ2(x) as follows:

μ1(x) = exp(−(x − c1)² / (2σ1²)) for x ≥ c1, and μ1(x) = 1 for x < c1   (1)

μ2(x) = exp(−(x − c2)² / (2σ2²)) for x ≤ c2, and μ2(x) = 1 for x > c2   (2)

Here x represents the mean value of a region in the ratio image. To calculate the parameters c1, c2, σ1, and σ2, the ratio image is first clustered into two clusters by the fuzzy c-means clustering method [12]. The mean value and standard deviation of the pixels in each cluster are calculated; the smaller mean value is set to c1 and the larger one to c2. Three membership functions f1(x), f2(x), and f3(x) are defined for the Shape Fitness variable as shown in Figure 2. Also, five membership functions g1(x), g2(x), g3(x), g4(x), and g5(x) are assumed for the Score variable as depicted in Figure 3.
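The two Spectral Ratio memberships of Eqs. (1)-(2) can be built directly from the cluster statistics. The cluster parameters below are illustrative values standing in for the fuzzy c-means output:

```python
import math

def spectral_ratio_memberships(c1, s1, c2, s2):
    """Return mu1 ('Small') and mu2 ('Large') of Eqs. (1)-(2); c1 < c2
    are the cluster means and s1, s2 the per-cluster standard
    deviations from the fuzzy c-means step."""
    def mu1(x):  # fully 'Small' below c1, Gaussian decay above it
        return 1.0 if x < c1 else math.exp(-(x - c1) ** 2 / (2.0 * s1 ** 2))
    def mu2(x):  # fully 'Large' above c2, Gaussian decay below it
        return 1.0 if x > c2 else math.exp(-(x - c2) ** 2 / (2.0 * s2 ** 2))
    return mu1, mu2

# Illustrative cluster statistics (in practice from fuzzy c-means [12]):
mu1, mu2 = spectral_ratio_memberships(0.6, 0.1, 1.2, 0.15)
```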
Fig. 3. Membership functions of the Score variable: five functions g1(x) through g5(x) (Negative Large, Negative Small, Moderate, Positive Small, and Positive Large) over the score domain [−1, 1].

With the assumption that the projected shadow related to a height candidate is RShadow, all regions Ri (extracted in Section 2.2) that partially or fully overlap with the predicted shadow regions are extracted and put in the set P. For each region Ri in P, two parameters mRi and vRi are estimated. mRi represents the mean value of the pixels in region Ri, calculated using their spectral ratio values. vRi denotes the percentage of shadow coverage for region Ri, computed by:

vRi = Area(RShadow ∩ Ri) / Area(Ri)   (3)

Fig. 4. Final results for scene 5. The dotted line highlights the rooftop definition. The solid line displays the shadow projection for the optimum height found by the proposed algorithm.

At this stage, a score SRi is calculated for each region Ri using the following fuzzy rules:

Small SR & Small SF  → Moderate Sc
Small SR & Medium SF → Negative Small Sc
Small SR & Large SF  → Negative Large Sc
Large SR & Small SF  → Moderate Sc
Large SR & Medium SF → Positive Small Sc
Large SR & Large SF  → Positive Large Sc

Here SR, SF, and Sc represent the Spectral Ratio, Shape Fitness, and Score variables. The membership values in the premise part are combined through the minimum function to acquire the strength of each rule [13]. The strengths of the above rules are computed using:

h1 = min(μ1(mRi), f1(vRi)),   h2 = min(μ1(mRi), f2(vRi))
h3 = min(μ1(mRi), f3(vRi)),   h4 = min(μ2(mRi), f1(vRi))
h5 = min(μ2(mRi), f2(vRi)),   h6 = min(μ2(mRi), f3(vRi))   (4)
The implication of each rule is then computed for all z in the domain of the gi:

D1(z) = min(h1, g3(z)),   D2(z) = min(h2, g2(z))
D3(z) = min(h3, g1(z)),   D4(z) = min(h4, g3(z))
D5(z) = min(h5, g4(z)),   D6(z) = min(h6, g5(z))   (5)

At this point, the total contribution of all rules is calculated for all z in the union of the supports of the gi:

C(z) = max(D1(z), ..., D6(z))   (6)

The crisp output is then computed as the score of region Ri:

SRi = ( Σi zi C(zi) ) / ( Σi C(zi) )   (7)

Finally, the Height Score HS is computed by:

HS = [ Σ_{Ri ∈ P} Area(RShadow ∩ Ri) · SRi ] / Area(RShadow)   (8)
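The inference chain of Eqs. (4)-(8) can be sketched as follows. The triangular membership shapes and breakpoints below are assumptions read off Figures 2 and 3, the score domain is discretized, and `tri`, `region_score`, and `height_score` are hypothetical helper names:

```python
def tri(x, a, b, c):
    """Triangular membership with peak b; a degenerate edge (a == b or
    b == c) acts as a shoulder that stays at 1 past the peak."""
    left = 1.0 if b == a else (x - a) / (b - a)
    right = 1.0 if c == b else (c - x) / (c - b)
    return max(0.0, min(1.0, left, right))

# Shapes read off Figures 2 and 3 (the breakpoints are assumptions):
f = [lambda x: tri(x, 1/3, 1/3, 1/2),     # f1: Small shape fitness
     lambda x: tri(x, 1/3, 1/2, 2/3),     # f2: Medium
     lambda x: tri(x, 1/2, 2/3, 2/3)]     # f3: Large
g = [lambda z: tri(z, -1.0, -1.0, -0.5),  # g1: Negative Large score
     lambda z: tri(z, -1.0, -0.5, 0.0),   # g2: Negative Small
     lambda z: tri(z, -0.5, 0.0, 0.5),    # g3: Moderate
     lambda z: tri(z, 0.0, 0.5, 1.0),     # g4: Positive Small
     lambda z: tri(z, 0.5, 1.0, 1.0)]     # g5: Positive Large

def region_score(mu1, mu2, m, v):
    """Eqs. (4)-(7): rule strengths by min, clipped consequents, max
    aggregation, and centroid defuzzification on a discrete z grid."""
    rules = [(mu1(m), f[0](v), g[2]), (mu1(m), f[1](v), g[1]),
             (mu1(m), f[2](v), g[0]), (mu2(m), f[0](v), g[2]),
             (mu2(m), f[1](v), g[3]), (mu2(m), f[2](v), g[4])]
    zs = [i / 100.0 for i in range(-100, 101)]
    C = [max(min(sr, sf, gz(z)) for sr, sf, gz in rules) for z in zs]
    return sum(z * c for z, c in zip(zs, C)) / (sum(C) + 1e-12)

def height_score(overlaps, shadow_area):
    """Eq. (8): overlaps is a list of (Area(RShadow ∩ Ri), SRi) pairs."""
    return sum(a * s for a, s in overlaps) / shadow_area
```

A region with a large spectral-ratio mean and full shadow coverage fires the "Positive Large" rule and defuzzifies near +1, while the same coverage with a small mean defuzzifies near −1, so the Height Score rewards candidate heights whose predicted shadows land on shadow-like regions.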


Fig. 5. Test case: the output for scene 7.


3. EXPERIMENTAL RESULTS
The performance of the system is assessed using eight QuickBird images (0.6 m/pixel) and one aerial image (0.14 m/pixel). Figure 4 shows the shadows for the estimated heights (solid lines) along with the manually found ground truth (dash-dot lines). As displayed, the detected shadow regions match the ground truth regions. Figure 5 displays another output image. Although building A has a complex rooftop and its shadow is connected to other buildings' shadows, its shadow was detected correctly. The results for buildings B and C are the worst cases in the whole test set, because the adjacent forest with heavy foliage overcasts the true shadow regions.
The test images were labeled from 1 to 9, with the buildings of each image labeled from A to E. The ground truth for each building was prepared manually. The evaluation results are shown in Table 2. All experiments were conducted on a PC with an Intel Core 2 2.4 GHz CPU and 2 GB of RAM, and all programs were implemented in MATLAB 7. The system's average time for processing a building in an image of 400 × 500 pixels is about 53.76 seconds.
Table 2 compares the accuracy of the proposed approach with that of the manually found ground truth.

Table 2. Evaluation results.

Img. No.   Bldg. Id.   Est. Height [m]   Actual Height [m]   Absolute Diff. [m]
    -          -            6.5423            6.6797              0.1374
    -          -            4.6885            5.1404              0.4519
    -          -            4.9946            5.2500              0.2554
    -          -            5.6876            5.7388              0.0512
    -          -            3.55878           3.4999              0.0589
    -          -            5.5942            5.3341              0.2601
    -          -            5.6665            5.3891              0.2774
    -          -            5.2086            5.1404              0.0682
    -          -            4.6326            4.5211              0.1115
    -          -            4.2147            4.2112              0.0035
    -          -            4.3837            4.3293              0.0544
    -          -            4.3698            4.3682              0.0016
    -          -            8.4281            8.4302              0.0021
    -          -            8.8967            8.9072              0.0105
    -          -            7.1563            6.9359              0.2204
    -          -            9.4826            8.5375              0.9451
    -          -           12.3066            7.9688              4.3378
    -          -            7.7647            7.7266              0.0381
    -          -            5.2414            5.3047              0.0633
    -          -            6.6073            6.7524              0.1451
    -          -            4.1036            4.1719              0.0683
    -          -            3.8795            4.1450              0.2655

Mean Error: 0.3558
RMS Error: 0.9610

4. CONCLUSIONS
In this paper, a height estimation method was presented for buildings with polygonal rooftops in monocular images. Building shadows and shape constraints were used to estimate the buildings' heights. A fitness function was introduced that employed fuzzy rules to evaluate height candidates. The true height was retrieved by a genetic algorithm searching the space of candidate heights.

5. REFERENCES
[1] C. Lin and R. Nevatia, "Building detection and description from a single intensity image," Comput. Vis. Image Underst., vol. 72, no. 2, pp. 101–121, 1998.
[2] R.T. Collins, C.O. Jaynes, Y.Q. Cheng, X.G. Wang, F.R. Stolle, E.M. Riseman, and A.R. Hanson, "The Ascender system: Automated site modeling from multiple aerial images," CVIU, vol. 72, no. 2, pp. 143–162, Nov. 1998.
[3] S. Noronha and R. Nevatia, "Detection and modeling of buildings from multiple aerial images," IEEE Trans. PAMI, vol. 23, no. 5, pp. 501–518, 2001.
[4] Z. Kim and R. Nevatia, "Automatic description of complex buildings from multiple images," CVIU, vol. 96, no. 1, pp. 60–95, Oct. 2004.
[5] K. Fujii and T. Arikawa, "Urban object reconstruction using airborne laser elevation image and aerial image," IEEE Trans. GRSS, vol. 40, no. 10, pp. 2234–2240, 2002.
[6] R.I. Hartley and R. Kaucic, "Sensitivity of calibration to principal point position," 7th European Conf. Computer Vision, p. II: 433 ff., 2002.
[7] B. Johansson and R. Cipolla, "A system for automatic pose estimation from a single image in a city scene," Proc. IASTED, 2002.
[8] T. Kim, T. Javzandulam, and T. Y. Lee, "Semiautomatic reconstruction of building height and footprints from single satellite images," IEEE Int. GRSS, pp. 4737–4740, 2007.
[9] X. Huang and L. K. Kwoh, "3D building reconstruction and visualization for single high resolution satellite image," IEEE Int. GRSS, pp. 5009–5012, 2007.
[10] V.J.D. Tsai, "A comparative study on shadow compensation of color aerial images in invariant color models," IEEE Trans. GRSS, vol. 44, no. 6, pp. 1661–1671, 2006.
[11] D. E. Goldberg, Genetic Algorithms in Search, Optimization and Machine Learning, Addison-Wesley, 1989.
[12] R.L. Cannon, J.V. Dave, and J.C. Bezdek, "Efficient implementation of the fuzzy c-means clustering algorithm," IEEE Trans. PAMI, vol. 8, no. 2, pp. 248–255, March 1986.
[13] E.H. Mamdani and S. Assilian, "An experiment in linguistic synthesis with a fuzzy logic controller," Int. Journal of Man-Machine Studies, vol. 7, no. 1, pp. 1–13, 1975.