
Systematic information fusion methodology for static and dynamic obstacle detection in ITS


Yajun Fang, Berthold K.P. Horn, Ichiro Masaki
Intelligent Transportation Research Center
Microsystems Technology Laboratories (MTL)
Massachusetts Institute of Technology, Cambridge, MA 02139, USA
yajufang@csail.mit.edu, bkph@csail.mit.edu, masaki@mit.edu

Abstract
Environment understanding technology is vital for intelligent vehicles, which are expected to
respond automatically to fast-changing environments and dangerous situations. To obtain
perception abilities, a vehicle must automatically detect static and dynamic obstacles and
estimate their related information. Conventional methods detect each piece of this information
independently. Each process is computationally heavy and often produces noisy results of low
reliability. Here we propose a fusion-based and layer-based methodology to systematically
detect dynamic and static obstacles and obtain their location/timing information for visible
and infrared sequences. The proposed obstacle detection methodologies take advantage of the
connections between different kinds of information and increase the accuracy of obstacle
information estimation while reducing computing time, thus improving environment
understanding abilities and driving safety.
Keywords: intelligent transportation system, intelligent vehicle, sensor fusion, obstacle detection, time-to-contact, distance detection, image segmentation, motion stereo

1 Introduction

Research on intelligent transportation systems has attracted increasing attention because of
the following concerns: safety, security, efficiency, mobile access, and the environment.
Around 75 percent of vehicular crashes are caused by inattentive drivers [1]. According to the
National Highway Traffic Safety Administration (NHTSA) in the US (Traffic Safety Facts 2006)
[3], more than 6.1 million police-reported motor vehicle crashes occurred in the United States
in 2006, leading to 42,642 deaths and 2.575 million people injured. The lifetime economic cost
of these crashes is estimated at over $150 billion annually [2]. Furthermore, age-related
declines in vision and cognitive functions, as well as physical impairments, affect the driving
ability of senior drivers (Owsley 1999). A 50-year-old driver needs twice as much light to see
as does a 30-year-old driver [5]. Seniors are overrepresented in traffic fatalities: nine
percent of the population is involved in 13 percent of the fatal crashes. Their fatality rate
is 17 times as high as that of the 25-65 age group (NHTSA) [2]. The number of senior drivers
who have difficulties in driving is increasing. According to the U.S. Census Bureau, in 1994
one out of every eight Americans was age 65 or older. In 2050, those aged 65 and over will be
one out of every five Americans [6]. Between 1995 and 2005, the number of licensed senior
drivers increased by 17 percent, in contrast to 14 percent for total drivers. Thus, while human
driving errors lead to dangerous outcomes, driving safety becomes more and more challenging for
aging drivers and for society as a whole.

1.1 The status and challenges of current safety enhancement research

Since 1980, researchers have been working on technologies to enhance vehicle safety and driver
comfort, as summarized in [1]. Earlier research focused on improving the capabilities of
vehicles, including electronic fuel injection, antilock braking systems, and cruise control.
The current interest is in developing driver assistance systems and automation technologies [17],
including passive safety systems, such as antilock braking systems and stability control, and
active driver assistance systems, such as adaptive headlamps, blind spot monitoring, autonomous
cruise control, lane departure warning, and driver drowsiness alert.
About 60% of crashes at intersections and about 30% of head-on collisions could be avoided if
drivers had an extra half-second to respond [10]. It is estimated that implementing
collision-avoidance systems in vehicles could prevent 1.1 million accidents in the US each year,
17 percent of all traffic accidents, which could save 17,500 lives (compared to the 10,500 lives
saved by seatbelts and airbags) and $26 billion in accident-related costs [1]. The demand for
in-car electronic products, especially for safety applications, is increasing. Around 35
percent of the cost of car assembly comes from electronics [7]. Revenues in the U.S. will be $29
billion by 2013 [7]. Additional vehicle networking might result in a market worth $1 billion
per year in 2014. However, the systems above still fall far short of drivers' needs.

1.2 Challenges of Conventional Methods to Obtain Obstacle Information

Current algorithms involve heavy computational load, while their reliability and accuracy are
limited. To understand the driving environment, we need both static and dynamic information.
The static information includes location and obstacle/foreground at different distance ranges.
The dynamic information includes speed, time-to-contact, possible crossover or occlusion,
obstacle/foreground at different time-to-contact ranges, collision/occlusion possibility, and
other current/historical information.
To obtain distance information for automatic cruise control, researchers have used radar,
ladar, or binocular stereo vision. A radar detects potential danger during bad weather and at
night, but a plain radar does not provide horizontal information. Binocular stereo is subject
to correspondence error, which is hard to avoid and leads to inaccurate depth estimation. The
performance of binocular stereo is also subject to other disturbances; for example, the bumping
of vehicles makes it hard to keep stereo cameras calibrated. To obtain obstacle location
information, people apply image segmentation/recognition algorithms to visible images for
daytime information and to infrared images for nighttime. Traditional segmentation algorithms
are challenged by noisy backgrounds [16], specifically backgrounds with non-rigid objects, such
as trees, or constantly changing, unpredictable backgrounds. Static segmentation usually
depends on special assumptions, and its performance is usually limited.
For intelligent vehicle applications, people also take advantage of features such as the
symmetry of obstacle regions [13], the asymmetry of the background, and obstacle motion [14].
Segmentation/tracking algorithms are also challenged by the significant variation of the image
position and size of objects between successive video frames. Instead, we need a general
algorithm with fewer assumptions.

1.2.1 Expectation for Cooperated Sensors and Automatic Environment Understanding

So far, conventional methods independently detect individual pieces of information for dynamic
environment interpretation and lead to many in-vehicle sensor devices. While new techniques
enhance drivers' capabilities and comfort, the infusion of uncoordinated in-vehicle devices may
just overwhelm drivers and disturb their attention and alertness, and thus may degrade driving
safety and performance [1]. In order to help drivers respond better to fast-changing
environments and dangerous situations, all safety-enhancing devices should cooperate within a
framework that manages and displays information. All dynamic and static obstacle information
should be integrated to provide an accurate description of the current driving environment: Are
there any obstacles (pedestrians, vehicles, etc.)? Where are they in the camera image? How far
away are they? Will the current vehicle run into these obstacles, and if so, when?
In summary, in order to aid drivers, we need a highly coordinated and integrated system that
works cooperatively with drivers [1] to provide obstacle information. In the next section we
introduce fusion-based and layer-based schemes to systematically detect and combine obstacle
information. The additional information from other sensors helps to simplify the original
complicated task. Thus the scheme is simpler and more reliable, and it provides better
performance.

2 Fusion-based and Layered-based Scheme to Obtain Obstacle Information

Sensor fusion is a common technology for improving detection performance by combining
information from different sources. The technology can be divided into three categories:
feature-level fusion, data-level fusion, and decision-level fusion [8][9]. Feature-level fusion
takes specific information from one sensor and uses it in another sensor's process, which needs
special algorithms to incorporate the additional information. Data-level fusion combines raw
data from different sources statistically, for example through voting techniques, Bayesian
methods, or Dempster-Shafer theory, which requires all data to be in a similar format.
Decision-level fusion takes all sensors' decisions (after each sensor's individual processing)
into account when making final decisions at a higher level.
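
As a rough illustration of the statistical combination mentioned above, the following sketch
fuses evidence from several detectors in two simple ways: a Bayesian update that assumes the
sensors are conditionally independent, and a majority vote over binary decisions. The function
names, the prior, and the likelihood ratios are hypothetical and are not taken from the cited
work.

```python
def bayes_fuse(prior, likelihood_ratios):
    """Posterior probability of "obstacle present" from independent sensors.

    Each likelihood ratio is P(reading | obstacle) / P(reading | no obstacle).
    Conditional independence of the sensors is an assumption of this sketch.
    """
    odds = prior / (1.0 - prior)
    for ratio in likelihood_ratios:
        odds *= ratio
    return odds / (1.0 + odds)


def majority_vote(decisions):
    """Final decision is True when more than half of the sensors say True."""
    return 2 * sum(bool(d) for d in decisions) > len(decisions)


if __name__ == "__main__":
    # Two sensors, each making an obstacle four times more likely.
    print(bayes_fuse(0.5, [4.0, 4.0]))
    print(majority_vote([True, True, False]))
```

Both functions require the inputs to be in a common format (probabilities or binary
decisions), which is the restriction noted in the text.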
We propose a general framework, shown in Figure (1)(a), which belongs to the feature-level
fusion category. Our framework incorporates information obtained from other
detectors/algorithms into the segmentation process to match corresponding obstacle information,
which improves detection performance. Instead of separately estimating static/dynamic
information, including distance ranges, segmentation, motion, and classification features, our
scheme makes use of the physical connections among these features. Additional information
such as distance, motion, and dynamic history can all be used to enhance the segmentation
accuracy. The additional information can be of low quality, yet it can still improve overall
performance [18] [19] [23] [24]. Furthermore, the accurate segmentation information can be used
to improve the detection accuracy of the fused information or of other timing information, for
example, time to contact.

Figure 1: (a) General framework of the fusion-and-layer-based methodology. (b) Framework for
3D segmentation. (c) Framework for fusing historical information and dynamic estimation. (d)
Framework for night vision.
Our scheme is not only fusion-based but also layer-based. With extra information, one
complex task can be split into several simple tasks in different layers, which are easier to
solve than the complex one. Different signal combinations in Figure (1)(a) lead to different
applications, as highlighted by the differently shaded blocks. The first application is 3D
segmentation, shown in Figure (1)(b), which detects obstacles at different distance ranges
and provides complete 3D information for the current road situation [18] [19] [21]. The
segmentation scheme incorporates distance range information and motion information, which
significantly improves its obstacle segmentation performance. The second application, shown
in Figure (1)(c), incorporates historical information or predicted information from a
dynamic model into the segmentation process in order to understand complicated scenarios [12]
and to obtain time-to-contact information for obstacles [24]. The third application, shown
in Figure (1)(d), is a pedestrian segmentation scheme for infrared images. The scheme is
fusion-based and layer-based and takes advantage of a classification feature to enhance
the segmentation accuracy [23]. In the following sections, we discuss how the fusion-based
and layer-based principles are used in our different applications.

3 The principle of layered techniques: Divide and conquer

The major principle of our framework is to convert a complex segmentation task into several
simple segmentation tasks so that the time-consuming full-image search can be avoided. In
section 3.1, we discuss how to separate an image into several distance-based image-layers, and
each layer only contains potential obstacles within a particular distance range. In section 3.2,
we discuss how to separate an image into several vertical stripes, and each stripe contains
potential pedestrian obstacles.

3.1 Segmentation based on different distance-based edge layers

Correspondence error of traditional binocular stereo leads to depth estimation noise and
segmentation error in detecting boundary pixels. Instead of directly utilizing noisy depth,
we split an edge map into several edge layers at different distance ranges (that also includes
background range), based on binocular disparity histogram as in [18][19].

3.1.1 Distance-based object layer separation

For each edge pixel in the left edge image, we search the right edge image for candidate
corresponding pixels within a given disparity range. We keep the left edge pixels for which a
correspondence can be found within the given disparity range in the right edge image. For each
given distance range, those corresponding edge pixels make up the distance-based edge layer,
which includes obstacle information within the range. Thus we separate the original stereo
edge map into different distance layers for further segmentation. The distance range
information can come from a radar, other types of sensors, or distance detection algorithms,
for example the disparity histogram acquired through edge-based trinocular methods [11] or our
motion-based binocular methods [21]. More details can be found in [18] [21].
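
The layer separation described above can be sketched as follows. This is a minimal
illustration, not the implementation of [18] [21]: it assumes rectified binary edge maps (so
corresponding pixels lie on the same row), the convention that a left pixel at column x
matches a right pixel at column x - d, and non-negative disparities. The function name
`edge_layer` and its parameters are hypothetical.

```python
import numpy as np


def edge_layer(left_edges, right_edges, d_min, d_max):
    """Keep left edge pixels that have a right edge pixel on the same row
    at some disparity d with d_min <= d <= d_max (0 <= d_min assumed)."""
    left = np.asarray(left_edges, dtype=bool)
    right = np.asarray(right_edges, dtype=bool)
    width = left.shape[1]
    layer = np.zeros_like(left)
    for d in range(d_min, min(d_max, width - 1) + 1):
        # Shift the right edge map by d columns so that column x of
        # `shifted` holds the right-image pixel at column x - d.
        shifted = np.zeros_like(right)
        shifted[:, d:] = right[:, :width - d]
        layer |= left & shifted
    return layer


if __name__ == "__main__":
    left = np.zeros((4, 10), dtype=bool)
    right = np.zeros((4, 10), dtype=bool)
    left[1, 7], right[1, 4] = True, True   # disparity 3
    left[2, 5], right[2, 4] = True, True   # disparity 1
    print(edge_layer(left, right, 2, 4).astype(int))
```

A pixel with an ambiguous correspondence can end up in more than one layer when the function
is called for several disparity ranges, which is the behavior the later background-removal
step relies on.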

Figure 2: Similarity of motion vectors for binocular stereo images and motion-based disparity
histogram. (a) Binocular stereo images. (b) Motion vectors for (a). (c) Disparity histogram for
(a); left: motion-based, right: trinocular-based. x coordinate: disparity value. y coordinate:
the number of edge pixels within the disparity range.

For stereo image pairs shown in Figure (2)(a), the disparity histogram has three peaks
representing three nearest vehicles whose disparity ranges are 22-24, 17-19, 7-9 as shown in
Figure (2)(c). Figure (3) shows the original edge map, the separated distance edge layer for
disparity range 22-24, and the final segmentation results. The results capture target objects
within the disparity range.
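
To make the role of the histogram peaks concrete, here is a minimal sketch that counts, for
each disparity, the left edge pixels that have a right edge pixel on the same row at that
disparity, and then reports local maxima. This simple counting rule is an assumption made for
illustration; the paper obtains its histograms with trinocular [11] or motion-based [21]
methods. The names `disparity_histogram`, `histogram_peaks`, and `min_count` are hypothetical.

```python
import numpy as np


def disparity_histogram(left_edges, right_edges, max_disparity):
    """hist[d] = number of left edge pixels matched by a right edge pixel
    on the same row at column x - d (rectified images assumed)."""
    left = np.asarray(left_edges, dtype=bool)
    right = np.asarray(right_edges, dtype=bool)
    width = left.shape[1]
    hist = np.zeros(max_disparity + 1, dtype=int)
    for d in range(min(max_disparity, width - 1) + 1):
        hist[d] = np.count_nonzero(left[:, d:] & right[:, :width - d])
    return hist


def histogram_peaks(hist, min_count):
    """Disparities that are local maxima with at least min_count pixels."""
    peaks = []
    for d in range(len(hist)):
        rises = d == 0 or hist[d] > hist[d - 1]
        falls = d == len(hist) - 1 or hist[d] >= hist[d + 1]
        if hist[d] >= min_count and rises and falls:
            peaks.append(d)
    return peaks


if __name__ == "__main__":
    left = np.zeros((15, 40), dtype=bool)
    right = np.zeros((15, 40), dtype=bool)
    left[0:10, 20], right[0:10, 12] = True, True     # near object
    left[10:15, 30], right[10:15, 27] = True, True   # farther object
    hist = disparity_histogram(left, right, 20)
    print(histogram_peaks(hist, min_count=3))
```

Each reported peak, widened to a small range around it, can serve as the disparity range of
one distance layer.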

Figure 3: Procedure to locate objects within a distance range (corresponding to disparity range
22-24). (a) Original edge map. (b) Corresponding pixels within the disparity range. (c)(d)
Segmentation result.

3.1.2 Segmentation based on background removal

Distance layer separation can be used to detect background layers, which consist of objects
beyond some distance range. When distance layers are separated, edge pixels may be assigned
to several distance ranges in order not to lose obstacle pixels when there is correspondence
ambiguity. Typically, extra pixels in different layers can be removed through segmentation,
but extra pixels from the background, for example from non-rigid objects, lead to the
segmentation noise shown in Figure (5)(b1)(c1). With background-layer separation [19], we can
remove pixels in background layers from each object layer, as shown in Figure (5)(b2)(c2). The
operation reduces the impact of the background on segmentation error, but it might also remove
target pixels and produce a segmented region smaller than it should be, which can be
compensated by the motion-based region expansion discussed in Section 4.2.1.
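
In terms of binary edge layers, the removal step amounts to a mask difference. The sketch
below assumes that the object layer and the background layer are boolean arrays of the same
shape; the function name `remove_background` is hypothetical.

```python
import numpy as np


def remove_background(object_layer, background_layer):
    """Drop from an object layer every pixel also assigned to the background."""
    obj = np.asarray(object_layer, dtype=bool)
    bg = np.asarray(background_layer, dtype=bool)
    return obj & ~bg


if __name__ == "__main__":
    obj = np.array([[1, 1, 0],
                    [0, 1, 1]], dtype=bool)
    bg = np.array([[0, 1, 0],
                   [0, 0, 1]], dtype=bool)
    print(remove_background(obj, bg).astype(int))
```

Pixels that belong to the target but were also matched at a background disparity are removed
as well, which is why the text pairs this step with a later region expansion.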

Figure 4: Binocular stereo frames and edge images. (a)(b) Left/Right images at time n. (d)(e)
Left/Right images at time n 1. (c)(f) Left/Right edge images at n. Images (a)(b)-(d)(e) are
courtesy of Daimler-Chrysler Research and Technology.

For the stereo frames shown in Figure (4), Figure (5) shows the segmentation results for one
obstacle distance layer without and with the background pixel removal operation. The result
indicates that the background distance layer helps to remove most tree pixels in the
background while preserving all pixels of target objects, including the ball.

Figure 5: Distance-range-based segmentation results without and with background removal.
(a1)(b1)(c1): without background removal. (a2)(b2)(c2): with background removal. (a) Edge layer
at disparity range 17-19 for Figure (4)(a)(b). (b)(c) Segmented regions.

3.2 Vertical stripe-based object separation

In the previous section, we discussed how to separate an image into several image layers.
In this section, we split an image of size nrow × ncol into several narrower vertical
stripes of size nrow × ni, ni < ncol. The original full-image segmentation is then solved
using the strategy of horizontal segmentation first, vertical segmentation second.

3.2.1 Vertical stripes separation for pedestrian candidates - brightness-based for infrared frames

One of our methods to split an image horizontally is based on bright-pixel-vertical-projection
curves, i.e., the number of bright pixels in each image column versus the corresponding
horizontal position [23]. As shown in Figure (6)(a)(b), the horizontal locations and widths of
the projection waves correspond to pedestrians and are robust to parameter choices. Then we can
apply two vertical segmentation algorithms, based on either brightness or bodylines, and
automatically estimate human sizes in the scheme shown in Figure (1)(d).
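
A minimal sketch of the projection curve and the resulting horizontal split is given below.
It assumes a grayscale infrared frame stored as a 2D array; the brightness threshold, the
minimum column count, and the function names are illustrative choices rather than the
parameters used in [23].

```python
import numpy as np


def bright_pixel_projection(image, brightness_threshold):
    """Number of pixels at or above the threshold in each image column."""
    return np.count_nonzero(np.asarray(image) >= brightness_threshold, axis=0)


def split_into_stripes(projection, min_count=1):
    """Return (start, stop) column ranges, stop exclusive, for every run of
    consecutive columns whose projection value is at least min_count."""
    stripes = []
    start = None
    for col, count in enumerate(projection):
        if count >= min_count and start is None:
            start = col
        elif count < min_count and start is not None:
            stripes.append((start, col))
            start = None
    if start is not None:
        stripes.append((start, len(projection)))
    return stripes


if __name__ == "__main__":
    frame = np.zeros((6, 12), dtype=np.uint8)
    frame[1:5, 2:4] = 200     # first warm region
    frame[2:6, 8:11] = 220    # second warm region
    curve = bright_pixel_projection(frame, 128)
    print(curve.tolist())
    print(split_into_stripes(curve))
```

Each stripe is then handed to a vertical segmentation step, so the search inside a stripe is
one-dimensional.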

Figure 6: The feature of bright-pixel-vertical-projection curves for infrared images and
segmentation results. For (a)(c): winter. For (b)(d): summer. (a) Top row: original infrared
image in winter. Center/bottom row: bright-pixel-vertical-projection curves when using two
different thresholds. (b) Top row: original infrared image in summer. Center row:
bright-pixel-vertical-projection curve. Bottom row: horizontally segmented image stripes based
on the projection curve. Note that several separated stripes shown in the center row seem to be
connected. (c) Brightness-based vertical segmentation results. (d) Bodyline-based vertical
segmentation results. For all projection curves: X axis: image column position. Y axis: number
of bright pixels in each column.

3.2.2 Vertical stripes separation for pedestrian candidates - difference-image-based for visible frames

Our other method of vertical image-stripe separation is based on the vertical projection
of difference images between consecutive frames. The vertical image stripes corresponding to
sharp triangular spikes in the vertical projections may contain pedestrian candidates, as
discussed for the framework in Figure (1)(c). Within these vertical stripes, human beings can
be further segmented. This method can be used for difference-image-based moving-object detection.
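
The difference-image projection can be sketched in the same spirit. The example assumes two
consecutive grayscale frames of equal size and treats a "spike" as a local maximum of the
projection above a minimum height; the thresholds and function names are hypothetical, and
the spike test is a simplification of the triangular-spike criterion in the text.

```python
import numpy as np


def difference_projection(prev_frame, curr_frame, diff_threshold):
    """Per-column count of pixels whose brightness changed by more than
    diff_threshold between two consecutive frames."""
    prev = np.asarray(prev_frame, dtype=np.int32)   # avoid uint8 wrap-around
    curr = np.asarray(curr_frame, dtype=np.int32)
    return np.count_nonzero(np.abs(curr - prev) > diff_threshold, axis=0)


def spike_columns(projection, min_height):
    """Interior columns that are local maxima of at least min_height."""
    spikes = []
    for col in range(1, len(projection) - 1):
        here = projection[col]
        if (here >= min_height and here > projection[col - 1]
                and here >= projection[col + 1]):
            spikes.append(col)
    return spikes


if __name__ == "__main__":
    prev = np.zeros((8, 10), dtype=np.uint8)
    curr = prev.copy()
    curr[3:5, 3] = 100
    curr[1:7, 4] = 100
    curr[3:5, 5] = 100
    curve = difference_projection(prev, curr, 30)
    print(curve.tolist())
    print(spike_columns(curve, min_height=4))
```

The columns around each spike define a candidate stripe in which a moving pedestrian may be
segmented further.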

4 The principle of fusion techniques

Our framework shown in Figure (1)(a) is based on information fusion. In Section 3, additional
information is used to separate the original image segmentation into segmentation in different
layers/stripes. In this section, we discuss the fusion techniques used in our general scheme.

4.1 Information fusion for obstacle detection: Segmentation based on fusion of segmentation and classification features

The scheme in Figure (1)(d) takes advantage of a classification feature to enhance the
segmentation accuracy for infrared images [23]. Within each separated image stripe, in order
to choose among multiple candidate regions during the vertical segmentation, we use one
histogram-based classification feature to search for the best candidate within the stripe, and
thus we avoid brute-force searching.
Our method balances the complexity and performance of two subsystems: segmentation
and classification. The method focuses on improving the performance of the combined
segmentation/classification system instead of maximizing one process while sacrificing the other.
High-quality segmentation can ease the classification task, while robust classification can
tolerate segmentation errors. The performance comparison is shown in Figure (7), and more
details are discussed in [23].

Figure 7: Detection Performance Comparison.

4.2 Fusion with historical information

4.2.1 Segmentation based on fusion with motion information

The segmentation performance can be improved by introducing motion information into the
distance-based segmentation discussed in Section 3.1.1. Figure (8)(a) and (b) show that
motion vectors of edge pixels in the obstacle distance layer stand out much better than those
among all original edge pixels, which are affected by background noise. Thus the segmentation
result from Figure (5)(c2) can be enlarged to its real size, as shown in Figure (8)(c) and
(d). The process compares the motion/depth similarity of segmentation seed boxes and
surrounding edge pixels and recovers target pixels lost in the process of removing background
pixels.

Figure 8: (a) and (b): Motion vectors for all edge pixels and for the obstacle distance layer
for Figure (4)(a). (c) and (d): Motion-based segment expansion results based on Figure (5)(c2).

Furthermore, when we are only interested in moving objects, we can eliminate false segmentation
blocks in the background region by removing static blocks based on motion information. The two
false boxes in Figure (5)(c1) are also erased when static blocks are eliminated, leading to the
same motion-based expansion results shown in Figure (8)(c)(d).
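
The two motion-based operations of this subsection can be sketched as follows. The sketch
uses motion similarity only (the text also compares depth), performs a single expansion pass,
and represents edge pixels as a list of (x, y) positions with one motion vector each. The
function names, the `margin` search band, and the thresholds are assumptions made for
illustration.

```python
import numpy as np


def _in_box(pts, box, pad=0.0):
    x0, y0, x1, y1 = box
    return ((pts[:, 0] >= x0 - pad) & (pts[:, 0] <= x1 + pad) &
            (pts[:, 1] >= y0 - pad) & (pts[:, 1] <= y1 + pad))


def expand_box_by_motion(box, points, motions, margin, max_motion_diff):
    """Grow a seed box (x0, y0, x1, y1) so that it also covers edge pixels
    within `margin` of the box whose motion resembles the seed's mean motion."""
    pts = np.asarray(points, dtype=float)
    mv = np.asarray(motions, dtype=float)
    inside = _in_box(pts, box)
    if not inside.any():
        return tuple(float(v) for v in box)
    seed_motion = mv[inside].mean(axis=0)
    similar = np.linalg.norm(mv - seed_motion, axis=1) <= max_motion_diff
    added = pts[_in_box(pts, box, margin) & ~inside & similar]
    x0, y0, x1, y1 = (float(v) for v in box)
    if len(added) == 0:
        return (x0, y0, x1, y1)
    return (min(x0, float(added[:, 0].min())), min(y0, float(added[:, 1].min())),
            max(x1, float(added[:, 0].max())), max(y1, float(added[:, 1].max())))


def is_static_block(box, points, motions, min_speed):
    """True when the edge pixels inside the box move slower than min_speed
    on average (or when the box contains no edge pixels)."""
    pts = np.asarray(points, dtype=float)
    mv = np.asarray(motions, dtype=float)
    inside = _in_box(pts, box)
    if not inside.any():
        return True
    return float(np.linalg.norm(mv[inside], axis=1).mean()) < min_speed


if __name__ == "__main__":
    points = [(12, 12), (18, 18), (23, 15), (8, 15), (40, 15), (32, 32)]
    motions = [(3, 0), (3, 0), (3.1, 0), (0, 0), (3, 0), (0, 0)]
    print(expand_box_by_motion((10, 10, 20, 20), points, motions, 5, 0.5))
    print(is_static_block((30, 30, 35, 35), points, motions, 1.0))
```

In the example, the pixel at (23, 15) is absorbed because it is close to the seed box and
moves with it, while the nearby static pixel and the distant moving pixel are left out.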

4.2.2 Complicated scenario understanding: segmentation based on fusion of initial segmentation and dynamic tracking

In order to enhance segmentation accuracy, the fusion-based scheme in Figure (1)(c) takes
advantage of combined information, including the initial horizontal segmentation described
in Section 3.2.2, estimated human locations from a dynamic tracking model, and information
from previous detections.
The combination of segmentation and dynamic tracking makes the whole human detection algorithm
simple but effective in reducing segmentation/tracking ambiguity when people cross paths and
occlude one another. We detect potential intersections by observing whether there are
independent sharp triangular spikes at the estimated regions. When two people start to merge,
the original two independent triangular spikes also merge. Then we search for potential
head locations at all the peaks of the merged waves by comparing the corresponding image
regions with human body trunk templates from previous frames. After heads are detected, we
can reuse the size information from previous frames to finish human segmentation and avoid
wrong segmentation during serious occlusion. The current detection results show that our
algorithm can track humans with changing walking poses and handle segmentation with partial
occlusions and target scale variations, and that it significantly improves the detection
accuracy, from 50% to 85%, as well as the reliability, for our 26 video sequences. Details
are discussed in [12].

4.2.3 Motion-based correspondence matching criteria

In order to remove matching ambiguity in stereo vision, we introduce motion information
into the correspondence matching criteria, as in [21]. Because of object rigidity, the motion
information calculated for the same target feature points from the left and right camera video
frames is similar. As shown in Figure (2)(b), motion vectors for both stereo video frames show
similar patterns even if motion vector detection involves noise. Though motion vectors
themselves are very noisy, fusing the traditional epipolar line constraint with the motion
constraint helps to improve correspondence reliability.
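
One way to picture the combined criterion is the sketch below: candidates must lie on the same
row (the epipolar constraint for rectified cameras, an assumption here) and within the
disparity range, and among those the one whose motion vector is closest to the left feature's
motion is chosen. The function name, the point representation as (row, column), and the
rejection threshold are hypothetical and are not taken from [21].

```python
import math


def match_with_motion(left_point, left_motion, right_points, right_motions,
                      d_min, d_max, max_motion_diff):
    """Return the index of the best right-image candidate, or None.

    A candidate is admissible when it is on the same row as the left point
    and its disparity (left column minus right column) is in [d_min, d_max].
    Among admissible candidates, the smallest motion difference wins, and
    candidates differing by more than max_motion_diff are rejected.
    """
    row, x_left = left_point
    best_index, best_diff = None, max_motion_diff
    for index, ((r, x_right), motion) in enumerate(zip(right_points,
                                                       right_motions)):
        disparity = x_left - x_right
        if r != row or not (d_min <= disparity <= d_max):
            continue
        diff = math.hypot(motion[0] - left_motion[0],
                          motion[1] - left_motion[1])
        if diff <= best_diff:
            best_index, best_diff = index, diff
    return best_index


if __name__ == "__main__":
    right_points = [(5, 25), (5, 22), (6, 24), (5, 5)]
    right_motions = [(-1.0, 0.0), (2.1, 0.4), (2.0, 0.5), (2.0, 0.5)]
    print(match_with_motion((5, 30), (2.0, 0.5), right_points, right_motions,
                            d_min=3, d_max=12, max_motion_diff=1.0))
```

In the example, two candidates satisfy the geometric constraint, and the motion term is what
separates them.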

4.3 Information fusion between boundary and interior pixels

4.3.1 Time to contact estimation based on the fusion of boundary feature points and interior image pixels

Traditional time-to-contact methods are based on the location and motion information of
boundary feature points of the obstacle of interest. We proposed a method [24] that uses both
boundary pixels and interior image pixels by accumulating sums of suitable products of image
brightness derivatives. The method works directly with the derivatives of image brightness and
does not require object detection, feature tracking, estimation of the optical flow, or any
other higher-level processing, which greatly enhances the calculation speed and accuracy.
Specifically, the direct method can deal with situations in which the obstacle size changes by
less than one pixel.

4.3.2 Other cases

There are several other cases in which we take advantage of both boundary and interior pixels.
Traditional segmentation algorithms are mainly based on the features of boundary pixels. The
interior pixels within an object are not effectively used except for template matching.
In the motion-based expansion of Section 4.2.1, the motion/depth information of interior
pixels is used to expand the initial segmentation region based on the similarity between
interior and boundary pixels. In infrared-based pedestrian detection, the interior pixels
within the boundary are used to detect the vertical stripes containing targets, which helps to
simplify traditionally difficult tasks and thus improves segmentation accuracy. The binocular
disparity histogram is also acquired using both boundary and interior pixels.

5 Summary

We have proposed a sensor fusion scheme that addresses how to set up sensors, how to associate
data, and how to fuse data among various sensors according to different task requirements and
specific task environments. The central segmentation is fusion-based and layer-based. By
communicating information among sensors ahead of the final processing period and making
use of their close relationship, better environment interpretation is achieved. The additional
information helps to divide the original complicated segmentation task into several simpler
tasks for better performance. For example, we can split the original image map either into
several distance-based layers, including background layers, or into projection-curve-based
vertical candidate stripes, and differentiate static and moving candidates. The idea behind it
is that the combination of several simple segmentation operators is better than one complicated
segmentation operator. The scheme is task oriented and object oriented. Instead of paying
attention to all frame pixels and their complete 3D reconstruction, we focus on target objects
and their related information. The central segmentation block is fusion-based and can
incorporate distance and motion information, classification features, historical information,
and information from a dynamic tracking model at the feature level.

Our proposed scheme has the following advantages. First, the scheme can be applied
in very general situations and fast-changing environments with satisfactory performance. It
does not make assumptions about the driving environment. It assumes neither the symmetry of
detected objects nor that the same objects appear with similar shape/size in consecutive
frames. For pedestrian detection, it assumes only local contrast between the image of a
pedestrian and its surroundings and avoids using features like faces, skin, etc. The
scheme can automatically estimate the size of pedestrian regions, track them, and detect
multiple objects of different sizes within the same distance range. The algorithm can
also detect non-rigid objects and moving objects, and differentiate objects from their
occlusion in very complicated situations. Second, the computational load of our proposed
scheme is low. The horizontal-first, vertical-second segmentation scheme involves only 1D
searching in the vertical direction with computational load O(n), while conventional
template-shape-based segmentation involves searching with computational load O(n^2) [23]. For
dynamic tracking, we only compare the human template from the previous frame with several
candidates based on the projection curve of the difference image [12]. We acquire a simple
extra vertical segmentation for each frame and avoid both a complicated tracking model and a
long initialization time. Thus we can significantly decrease potential ambiguity and
computational load. Overall, our proposed scheme lowers the quality requirements on individual
sensors and the computational load while increasing the estimation accuracy and robustness of
obstacle information. Therefore our scheme improves environment understanding abilities for
driving safety.
Special thanks to Dr. Franke Uwe and Dr. Stefan Heinrich for providing the video frames in
Figure 4(a)(b)(d)(e), courtesy of DaimlerChrysler Research and Technology.

References
[1] The Intelligent Vehicle Initiative: Advancing Human-Centered Smart Vehicles. Available:
http://www.tfhrc.gov/pubrds/pr97-10/p18.htm.
[2] NHTSA 2020 Report. People Saving People On the Road to a Healthier Future. Available:
http://www.nhtsa.gov/nhtsa/whatis/planning/2020Report/2020report.html
[3] 2006 FARS/GES Traffic Safety Facts Annual Report (Early Edition). Available:
http://www.nhtsa.gov/portal/nhtsa_static_file_downloader.jsp?file=staticfiles/DOT/NHTSA/NCSA/Content/TSF/2006/810809.pdf
[4] 2006 Traffic Safety Fact Sheets older population. Available:
http://www.nhtsa.gov/portal/nhtsa_static_file_downloader.jsp?file=/staticfiles/DOT/NHTSA/NCSA/Content/TSF/2006/810808.pdf
[5] Nebraska highway safety program and the lincoln-lancaster county health department (2001,
July). Available: http://nncf.unl.edu/eldercare/info/seniordriving/nightdrive.html.
[6] Senior drivers. Available: http://www.nysgtsc.state.ny.us/senr-ndx.htm
[7] Asia New Hotbed for Consumer Automotive Electronics. Available:
http://www.technewsworld.com/story/52539.html
[8] Frank Cremer, Wim de Jong, Klamer Schutte. Fusion of Polarimetric Infrared Features and
GPR Features for Landmine Detection, 2nd International Workshop on Advanced GPR, 14-16
May, 2003, Delft, The Netherlands.
[9] Silvia Ferrari, Alberto Vaghi. Demining Sensor Modeling and Feature-Level Fusion by
Bayesian Networks, IEEE Sensors Journal, Vol.6, No.2, Apr. 2006.
[10] Peter Haapaniemi. Smart Vehicles Have Minds of Their Own, Safety + Health, November
1996, p. 64.
[11] Bergendahl Jason. A computationally efficient stereo vision algorithm for adaptive cruise
control, MIT Master thesis, 1997.
[12] Yajun Fang. Information fusion for static and dynamic obstacle detection in ITS, MIT Ph.D.
thesis, 2008.
[13] M. Bertozzi, A. Broggi, A. Fascioli, S. Nichele. Stereo Vision-based Vehicle Detection, Proc.
of IEEE Conf. on Intelligent Transportation System, 1997, pp.717-722.
[14] Thomas Meier, King N. Ngan. Automatic Segmentation of Moving Objects for Video Object
Plane Generation, IEEE Transactions on Circuits and Systems For Video Technology, Vol.8,
No.5, September, 1998, pp.525-538.
[15] D. Willersinn, W. Enkelmann. Robust Obstacle Detection and Tracking by Motion Analysis,
Proc. of IEEE Conf. on Intelligent Transportation System, 1997, pp.717-722.
[16] Chris Stauffer, W.E.L. Grimson. Adaptive background mixture methods for real-time tracking,
IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Vol. 2, 1999,
pp.246-252.
[17] IVI: 8 Major Problems. Available: http://www.its.dot.gov/ivi/8MPA.html#SI
[18] Yajun Fang, Berthold Horn, Ichiro Masaki. Distance Range Based Segmentation in Intelligent
Transportation Systems: Fusion of Radar and Binocular Stereo, Proceedings of IEEE
Intelligent Vehicles Symposium 2001, 2001, pp.171-176.
[19] Yajun Fang, Berthold Horn, Ichiro Masaki. Distance/Motion Based Segmentation under
Heavy Background Noise, Proceedings of IEEE Intelligent Vehicles Symposium, 2002, pp.483-488.
[20] Yajun Fang, Yoshiki Ninomiya, Ichiro Masaki. Intelligent Transportation Systems - Challenges
and Opportunities, the 2nd International Symposium on Multimedia Mediation Systems, 2002,
pp.72-77.
[21] Yajun Fang, Berthold Horn, Ichiro Masaki. Depth-Based Target Segmentation for Intelligent
Vehicles: Fusion of Radar and Binocular Stereo, IEEE Transactions on Intelligent
Transportation Systems, Vol.3, No.3, Sept. 2002, pp.196-202.
[22] Yajun Fang, Keiichi Yamada, Yoshiki Ninomiya, Berthold Horn, Ichiro Masaki. Comparison
between Infrared-image-based and Visible-image-based Approaches for Pedestrian Detection,
Proceedings of IEEE Intelligent Vehicles Symposium, 2003, pp.505-510.
[23] Yajun Fang, Keiichi Yamada, Yoshiki Ninomiya, Berthold Horn, Ichiro Masaki. A
Shape-Independent-Method for Pedestrian Detection with Far Infrared-images, Special Issue on
In-Vehicle Computer Vision Systems of the IEEE Transactions on Vehicular Technology,
Vol.53, No.6, Nov. 2004, pp.1679-1697.
[24] Berthold Horn, Yajun Fang, Ichiro Masaki. Time to Contact Relative to a Planar Surface,
Proceedings of IEEE Intelligent Vehicles Symposium, 2007.