I. INTRODUCTION
Fig. 1. Three-stage process for estimating mean vehicle speed. 1) Estimation of camera position. 2) Image feature measurement for camera calibration.
3) Estimation of mean vehicle speed for each lane.
Fig. 2. Camera and roadway geometry.

The perspective projection of a point in the road scene of Fig. 2 onto the image plane gives the image coordinates in (1)-(3). These expressions are further simplified because objects are assumed to lie on the ground plane. Next, a displacement is applied to obtain the camera-centered coordinates, which yields the projection equations (4) and (5).
Fig. 3 contains an overhead view of the road scene contained in Fig. 2. We can measure several points on the road boundary lines in the image that correspond to those in Fig. 3. We select 2-D points at the intersections of each boundary line with the image axes; the corresponding 3-D points lie at the intersections of that boundary line with the X and Y axes. Equations (4) and (5) show the convenience of this selection, i.e., each axis intersection in the image directly determines the corresponding axis intersection in the road plane.
Fig. 4. Road geometry in the image showing the vanishing points for lines parallel and perpendicular to the road.

Using the fact that the selected points lie on the road boundary (see Fig. 3), their intersections with the X and Y axes yield (6). Similarly, we note that the selected points lie on a line whose Y-axis intercept is fixed by the road geometry; using (4), the u coordinate of that intercept in the image plane is given by (7). These two expressions contain all of the camera parameters and relate them to two measurable quantities in the image.

Vanishing point coordinates can also be estimated from the image. The v coordinate of the vanishing line of the ground plane is given by (8) and is shown in Fig. 4. Any pair of parallel lines on the ground plane intersect at a point that lies on this line in the image; this is obtained analytically by evaluating (5) in the limit of infinite distance from the camera. Thus, points infinitely far from the camera map onto this line in the image. Similarly, we can evaluate (4) in the same limit for lines parallel to the road, obtaining the u coordinate of their vanishing point in (9). The 3-D coordinates of points on lines perpendicular to the roadway are related by a fixed ratio, as shown in Fig. 3; the vanishing point of these lines (labeled in Fig. 4) has the u coordinate given in (10).

Estimation of these five fundamental quantities [(6)-(10)] in the image allows us to estimate the position of the camera, its focal length, and its orientation.
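To make the geometry concrete, the following sketch recovers a focal length, tilt, and pan from the two vanishing points under standard pinhole assumptions (square pixels, zero roll, principal point at the image center). Because (6)-(10) are not reproduced above, the closed forms below are the textbook orthogonality relations rather than the paper's exact expressions, and all symbol names are ours.

```python
import math

def calibrate_from_vanishing_points(u0, v0, u1):
    """Recover focal length, tilt, and pan from two vanishing points.

    Assumes a pinhole camera with square pixels, zero roll, and the
    principal point at the image center. (u0, v0) is the vanishing point
    of lines parallel to the road and (u1, v0) that of lines
    perpendicular to it; both lie on the horizon v = v0. Coordinates are
    measured from the image center with v positive downward.
    """
    # Orthogonal horizontal directions satisfy u0*u1 + v0**2 + f**2 = 0.
    f_sq = -(u0 * u1 + v0 * v0)
    if f_sq <= 0:
        raise ValueError("vanishing points are inconsistent with an orthogonal pair")
    f = math.sqrt(f_sq)
    phi = math.atan2(-v0, f)                    # downward tilt of the optical axis
    theta = math.atan2(u0 * math.cos(phi), f)   # pan relative to the road direction
    return f, phi, theta

# Example: road vanishing point at (100, -120), perpendicular one at (-6544, -120).
f, phi, theta = calibrate_from_vanishing_points(100.0, -120.0, -6544.0)
print(f"f = {f:.1f} px, tilt = {math.degrees(phi):.1f} deg, pan = {math.degrees(theta):.1f} deg")
```

The overall metric scale is not fixed by the vanishing points alone, which is why the lane boundaries (and a known lane width) also enter the camera position estimate in Section III.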
III. CAMERA POSITION ESTIMATION
Having expressed the camera parameters in terms of measurable quantities, we now describe the first stage of our algorithm, which estimates the camera position by analyzing the traffic scene. We create an activity map for vehicle motion in the images. From this activity map, we can obtain estimates of the vanishing point of the roadway and the lane boundaries. In turn, we use the lane boundaries to extract sets of lines that are perpendicular to the roadway in 3-D coordinates, obtaining the second vanishing point shown in Fig. 4. The two vanishing points, together with the lane boundaries, enable us to estimate the camera position.
We use JPEG images sampled five times per second as the basic set of images to support this effort. The speed and format are chosen so that the algorithms created will be useful with the present generation of inexpensive digital cameras. In our implementation, we only process the lower third of the images due to the quantization error in measuring vehicle positions far from the camera. We apply this mask to all images used by the algorithms; however, in our presentation we include the complete images for clarity. Using even less of the image (e.g., the bottom one-fourth) would reduce the quantization errors in the position at the expense of reduced accuracy in measuring the lines used to estimate the vanishing point and lane boundaries.
A. Generating the Activity Map
An activity map is used to identify both the location and intensity of vehicular motion [12]. Inactive lanes are thus excluded from the speed estimation. We assume that pixel values change over time as the vehicles pass through the image and that the only significant motion in the image is due to the vehicles. We generate the activity map by calculating the expected value of the absolute intensity difference between consecutive frames. Each image frame is smoothed with a 3 × 3 boxcar kernel to remove JPEG artifacts prior to the calculation

$$A(i,j) = \frac{1}{N-1}\sum_{n=2}^{N}\left|\bar{I}_n(i,j) - \bar{I}_{n-1}(i,j)\right| \qquad (11)$$

where $\bar{I}_n$ denotes smoothed frame $n$ of an $N$-frame sequence. Such a summation of an image sequence resembles a concept presented in [12], where the authors describe how to generate an activity map with binary images obtained by thresholding. They use the map to obtain very rough lane masks using morphological operators. Our method does not require us to choose a threshold to binarize the image and is more appropriate for the task of extracting the traffic lanes. For example, the sample activity map for a normal traffic scene in Fig. 5 shows horizontal gray-scale variations, indicating the middle of the lanes and their boundaries. The small blotches on the right and top of the image are due to raindrops on the camera lens.
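As a concrete illustration of the construction just described, here is a minimal numpy/scipy sketch (frame smoothing with a 3 × 3 boxcar, then averaging absolute inter-frame differences); the function and variable names are ours.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def activity_map(frames):
    """Mean absolute inter-frame difference, in the spirit of (11).

    frames: iterable of 2-D grayscale arrays sampled at about 5 frames/s.
    Each frame is smoothed with a 3x3 boxcar kernel to suppress JPEG
    artifacts before differencing.
    """
    acc, count, prev = None, 0, None
    for frame in frames:
        smoothed = uniform_filter(frame.astype(np.float64), size=3)
        if prev is not None:
            diff = np.abs(smoothed - prev)
            acc = diff if acc is None else acc + diff
            count += 1
        prev = smoothed
    if count == 0:
        raise ValueError("need at least two frames")
    return acc / count  # sample mean of |I_n - I_(n-1)|
```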
B. Estimating the Vanishing Point of the Roadway

We use the line structure present in the activity map to estimate the vanishing point for lines parallel to the roadway using line templates. We obtain the line structure from the activity map using Canny's edge detector [13] with a wide smoothing kernel (for 320 × 240 images) and an edge sensitivity threshold of 0.01. The wide kernel and small edge sensitivity ensure that all edges are detected.

Once a binary edge image is available, we employ the Hough transform [14] to detect the lines that intersect at the vanishing point. Because we require the road to exit the bottom of the image, we parameterize the lines by their angle to horizontal and their intersection with the bottom line of the image, as given in (12). We quantize the intersection in single-pixel intervals, and we choose the angle quantization, given in (13), such that the transform can distinguish lines at the extremes of the bottom half of the image. After generating the Hough transform, we accept lines that occupy at least 40% of the bottom window, i.e., the Hough accumulators that exceed this fraction of the window size. Fig. 6 contains a sample Hough transform image where the dark spots indicate the parameter pairs corresponding to lines found in the activity map.

Once we have detected the lines, we fit the parameters for each pair to a second model to estimate the vanishing point. The line connecting the vanishing point and a point on the horizontal bottom line has the orientation angle given by (14). We apply the Levenberg-Marquardt method [15], [16] of nonlinear least-squares optimization to the coordinate data, using (14) as the model for estimating the vanishing point coordinates. The fit yields a curve that best matches the data, as shown in Fig. 6.
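The second-stage fit might be sketched as follows: given (u, alpha) pairs from the accepted Hough lines, a Levenberg-Marquardt solver recovers the vanishing point. The model used, the angle of the ray from a bottom-row point to the vanishing point, is our reading of (14), which is not reproduced in this excerpt.

```python
import numpy as np
from scipy.optimize import least_squares

def fit_vanishing_point(u, alpha, v_b, guess=(0.0, -100.0)):
    """Fit the road vanishing point (u0, v0) to (u, alpha) pairs.

    u: line intersections with the bottom row v = v_b; alpha: line
    angles to horizontal, in radians. Solved with Levenberg-Marquardt
    (method="lm"), as in the text.
    """
    def residuals(p):
        u0, v0 = p
        return alpha - np.arctan2(v_b - v0, u - u0)

    return least_squares(residuals, guess, method="lm").x

# Synthetic check: lines radiating from (160, -80), bottom row v_b = 240.
u_meas = np.array([40.0, 120.0, 200.0, 280.0])
alpha_meas = np.arctan2(240.0 - (-80.0), u_meas - 160.0)
print(fit_vanishing_point(u_meas, alpha_meas, v_b=240.0))  # ~ [160, -80]
```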
C. Estimating the Lane Boundaries
To estimate the lane boundaries and road width, we sample the activity map along the lines connecting the vanishing point and each pixel in the bottom row of the image. Averaging the activity value along each line creates a one-dimensional signal that is a function of the bottom-row position, as shown in Fig. 7. The peaks indicate strong traffic activity (middle of the lane) and the valleys indicate the absence of vehicle activity (lane boundaries). After filtering with a low-pass FIR filter, we count any peaks that have at least 50% of the activity of the most active lane. If we locate N peaks, then there must be N - 1 valleys between them. We estimate the horizontal lane width using the distances between the peak positions and valley positions, as in (15).
Fig. 6. Hough transform in (u, angle) space; the dark spots indicate the parameter pairs corresponding to lines found in the activity map.

Fig. 7. Average relative activity across the lower third of the activity map in Fig. 5 at the orientations specified by the vanishing point.

Fig. 8. Sample image with superimposed lane boundaries found using the new algorithm.
The lane boundary positions follow from the peak and valley positions as in (16), where the cases at the two edges of the road represent extrapolations of the boundaries of the road. The lane masks are generated by connecting the lane boundary coordinates in the bottom row and the vanishing point.
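A minimal sketch of the peak/valley search on the one-dimensional activity signal; the FIR filter length and cutoff below are illustrative choices, while the 50% acceptance rule comes from the text.

```python
import numpy as np
from scipy.signal import filtfilt, find_peaks, firwin

def lane_peaks_and_valleys(profile):
    """Locate lane centers (peaks) and boundaries (valleys) in the 1-D
    activity signal of Fig. 7."""
    # Low-pass FIR filter to suppress pixel-level noise (illustrative taps).
    smooth = filtfilt(firwin(15, 0.1), [1.0], profile)
    peaks, _ = find_peaks(smooth)
    if peaks.size == 0:
        return peaks, peaks
    # Keep peaks with at least 50% of the activity of the strongest lane.
    peaks = peaks[smooth[peaks] >= 0.5 * smooth[peaks].max()]
    # Interior valleys between the accepted peaks mark the lane boundaries.
    valleys, _ = find_peaks(-smooth)
    valleys = valleys[(valleys > peaks.min()) & (valleys < peaks.max())]
    return peaks, valleys
```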
Fig. 9. Car blob image: binary mask of the vehicles obtained by subtracting the background from an image frame and thresholding.
4) For each image, we calculate its edge gradient magnitude and subtract from it the edge gradient magnitude of the background. 5) We threshold the resulting difference to obtain a binary mask containing the vehicle edges. The threshold is chosen as a fraction of the maximum image intensity: an ideal step edge has a maximum filter response of approximately 10% of the step magnitude, so we set the threshold so that step edge magnitudes are ideally at least half of the maximum image intensity. 6) We logically AND the vehicle-edge mask with each lane mask generated in the previous section. 7) We calculate the Hough transform [14] of the binary edge image for each lane using the standard line parameterization. Since we require the vehicles to be passing through the bottom of the image, we restrict the allowable line angles to a narrow range about the horizontal, with the sign of the range determined by the lane geometry. In the Hough transform we quantize the offset into 2-pixel increments and the angle into 0.15-degree increments for images of size 640 × 480. 8) We select the bins from the Hough transform that contain enough pixels to traverse at least one-third of the lane width, since vehicles will be at least this wide. For images that are suitable for use with this algorithm, the resulting distribution of line angles is approximately normal. For each bin whose line is sufficiently long, we obtain an estimate of the perpendicular vanishing point coordinate using (18). 9) We select the median of these estimates because the nonlinear transformation in (18) creates a peaked distribution with a long tail.
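The per-lane line extraction in steps 7)-9) might be sketched as follows. Since (18) is not reproduced in this excerpt, intersecting each detected edge line with the horizon serves as a stand-in estimate of the perpendicular vanishing point coordinate, and the Hough thresholds are illustrative.

```python
import numpy as np
import cv2

def estimate_perp_vp(edge_mask, lane_mask, v0, min_len):
    """Median-based estimate of the perpendicular vanishing point
    coordinate from the vehicle edges in one lane (steps 7-9, sketched).

    edge_mask, lane_mask: uint8 binary images; v0: horizon row in pixel
    coordinates; min_len: about one-third of the lane width in pixels.
    """
    masked = cv2.bitwise_and(edge_mask, lane_mask)
    segments = cv2.HoughLinesP(masked, rho=2, theta=np.deg2rad(0.15),
                               threshold=20, minLineLength=min_len, maxLineGap=4)
    if segments is None:
        return None
    estimates = []
    for x1, y1, x2, y2 in segments[:, 0, :]:
        if x1 == x2:
            continue
        slope = (y2 - y1) / float(x2 - x1)
        if abs(slope) < 1.0 / 15.0:
            continue  # slopes below 1/15 are unreliable (see below)
        # Extend the line to the horizon v = v0; the intersection is one
        # estimate of the vanishing point coordinate.
        estimates.append(x1 + (v0 - y1) / slope)
    return float(np.median(estimates)) if estimates else None
```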
Unlike the exaggerated perpendicular lines shown in Fig. 4, the slope of the vehicle edges in typical traffic images (e.g., Fig. 8) is quite small and decreases with distance from the camera. Due to the image quantization, only lines with a slope of at least 1/15 (0.067) can be accurately detected by our algorithm; smaller slopes are unreliable. The slope of each perpendicular line depends on the camera parameters, and it typically decreases as its vertical image position increases. The simulation in Fig. 10 illustrates the variation in the vertical length of an image line that is 45 pixels long. Using values for a typical road scene, the camera height was fixed at 40 feet, its focal length at 800 pixels, and the lane width at 12 feet. The central flat section indicates where the vertical length falls below the detectable minimum. Thus, for a focal length near 800 pixels, the operator must orient the camera in one of two regimes. When too many of the lines (e.g., 25%) have a slope less than 1/15, we conclude that the camera must be repositioned in order to estimate the perpendicular vanishing point.
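The repositioning rule reduces to a simple check; a minimal sketch, with the 1/15 slope floor and 25% fraction taken from the text:

```python
def camera_needs_repositioning(slopes, min_slope=1.0 / 15.0, max_flat_fraction=0.25):
    """True if too many perpendicular edge lines are too flat to support
    estimation of the perpendicular vanishing point."""
    if not slopes:
        return True  # no usable lines at all
    flat = sum(1 for s in slopes if abs(s) < min_slope)
    return flat > max_flat_fraction * len(slopes)
```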
E. Correcting the Activity Map
The estimation of the camera position is very sensitive to errors in the lane boundaries, which are only as accurate as the activity map. When generating the activity map we assumed that the vehicles do not have any height, which is clearly false. Correcting (4) and (5) for a nonzero vehicle height yields (19) and (20). One can verify using these equations and (8) and (9) that a nonzero height does not change the location of the vanishing point. However, the apparent position of a vehicle is visibly altered when its height is significant, often causing pixels from the vehicle to contribute to the activity measurement of the adjacent lane (e.g., Fig. 8).
TABLE I
COMPARISON OF CALIBRATION RESULTS FOR TWO CASES
Fig. 11. Linear fit of apparent vehicle height versus vertical image position.

Fig. 12. Postprocessed version of the activity map in Fig. 5.
We now propose a method of postprocessing the activity map to correct for the finite height of the vehicles. We observe that the true extent of an average vehicle is roughly one-half of its apparent height in the image. Based on this assumption, we estimate the vehicle height by subtracting a given image frame from the background and obtaining a mask of the vehicles via thresholding (see Fig. 9). The vehicles are isolated by scanning the lane masks found with the method described in Section III-C. For each vehicle, its apparent height and lowest vertical position are recorded.

After obtaining data for several hundred vehicles, we apply a linear fit to the height data, as shown in Fig. 11. We use the fitted height model to postprocess the activity map by convolving it in the vertical direction with an adaptive averaging kernel, given in (21) and (22). Fig. 12 contains the postprocessed version of Fig. 5. The resulting image is smoother and shifted downward slightly. The lane boundaries can now be re-estimated using the method of Section III-C.
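A sketch of the correction under the stated assumptions: the linear height fit (a, b) comes from Fig. 11, and averaging each row of the map with the rows above it, over half the local apparent height, is our reading of the adaptive kernel in (21) and (22), which are not reproduced here.

```python
import numpy as np

def correct_activity_map(act, a, b):
    """Row-adaptive vertical averaging of the activity map.

    act: 2-D activity map; (a, b): linear fit of apparent vehicle height
    h(v) = a*v + b versus vertical position v (Fig. 11).
    """
    rows, _ = act.shape
    out = np.empty_like(act, dtype=np.float64)
    for v in range(rows):
        # Kernel length: half the apparent vehicle height at this row.
        half = max(1, int(round(0.5 * (a * v + b))))
        top = max(0, v - half + 1)
        out[v] = act[top:v + 1].mean(axis=0)
    return out
```

Averaging each row with the rows above it pulls vehicle energy toward the road surface, which matches the smoother, slightly downward-shifted map in Fig. 12.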
Fig. 14. (a) Distribution of 20-s average speeds measured by inductance loops.
(b) Distribution of vehicle tracking results (camera calibrated by algorithm).
(c) Distribution of vehicle tracking results (camera calibrated by hand).
We obtain speed estimates from a video sequence as follows. After extracting the background and lane masks, we obtain the position of the cars in each lane by scanning individual lanes of the car blob image of Fig. 9 until we encounter the bottom edge of a vehicle blob. We record the center of the lane as the u coordinate and the bottom edge as the v coordinate for the vehicle's position, and the current position is added to the time series unique to each vehicle. Using only the bottom third of the image typically allows about four or five position samples of a vehicle at 60 MPH and five frames/s. The position data are used to provide two-frame estimates of the vehicle speeds on the road.
We determine the direction of traffic flow by horizontally averaging the lane mask for each car blob image to create a one-dimensional signal. The positive or negative lag in the average
cross-correlation between time samples indicates the direction
of traffic flow.
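The direction test might be sketched as follows; the signal construction and names are ours.

```python
import numpy as np

def traffic_direction(sig_prev, sig_curr):
    """Direction of flow from the lag of the cross-correlation between
    successive 1-D lane signals: +1 if the blobs move toward larger
    indices, -1 otherwise."""
    xcorr = np.correlate(sig_curr - sig_curr.mean(),
                         sig_prev - sig_prev.mean(), mode="full")
    lag = int(np.argmax(xcorr)) - (len(sig_prev) - 1)
    return 1 if lag > 0 else -1
```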
Because we assumed the vehicles lie on a flat plane when developing our camera model, we can transform the image coordinates into their 3-D coordinates (X, Y, 0) using the camera calibration, similar to the result of Lai and Yung [11]. The image coordinates come from the bottom edges of the vehicle blobs,
which lie on the ground plane. Referring to (4) and (5), we solve for X and Y, yielding (25) and (26), where a scale factor in m/pixel extends Lai and Yung's result for real-world coordinates.
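A sketch of the back-projection and the two-frame speed estimate follows. Because (25) and (26) are not reproduced in this excerpt, the code uses a standard zero-pan pinhole ground-plane model as a stand-in; the m/pixel scale factor plays the role described in the text.

```python
import math

def image_to_ground(u, v, f, phi, h, scale):
    """Back-project an image point on the ground plane to (X, Y) meters.

    Standard zero-pan pinhole ground-plane model (an assumption, not the
    paper's exact (25)-(26)): f is the focal length in pixels, phi the
    downward tilt, h the camera height in pixel units, and scale the
    m/pixel factor. (u, v) are measured from the image center, v down.
    """
    Y = h * (f * math.cos(phi) - v * math.sin(phi)) / (f * math.sin(phi) + v * math.cos(phi))
    X = u * (Y * math.cos(phi) + h * math.sin(phi)) / f
    return scale * X, scale * Y

def two_frame_speed(p1, p2, fps=5.0):
    """Two-frame speed estimate (m/s) from consecutive ground positions."""
    return math.hypot(p2[0] - p1[0], p2[1] - p1[1]) * fps
```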
We tested our camera calibration methods on two different
sequences spanning 200 and 400 s, providing 563 and 452 vehicle speed estimates, respectively. Traffic density was light in
both cases; an example of case 1 is shown in Fig. 8. The results
for the two cases are summarized in Table I, Fig. 13, and Fig. 14,
where the speed limit is 60 MPH. In case 1, inductance loops
are not available on the northbound side of the freeway. Fig. 13
shows that the camera can serve as a complementary sensor to
the inductance loops. In case 2, the inductance loops and camera
are sensing the same vehicles. The relative error of the mean vehicle speed is slightly less than or equal to that of the calibration
parameters. The sensor bias for the two cases is due to at least
two factors: 1) nonzero road grade and 2) the use of low-resolution (320 × 240) images that cause errors in the estimation of
the vanishing point and lane positions.
VI. CONCLUSION
In this work, we provide a detailed model of the roadway
scene suitable for analysis via computer vision techniques.
Using reasonable assumptions, our algorithm estimates the
position of the camera using information extracted from the
motion and edges of the vehicles. Given the camera position, we also show how to calibrate the camera in general based on
straightforward measurements of the scene using the image
sequence.
The results presented for estimating mean vehicle speeds indicate that uncalibrated surveillance video cameras can be used
to augment inductance loops as traffic condition sensors and
even replace loop sensors where it is not cost effective to install inductance loops.
REFERENCES
[1] F. Chausse, R. Aufrere, and R. Chapuis, "Recovering the 3-D shape of a road by on-board monocular vision," in Proc. 15th Int. Conf. Pattern Recognition, vol. 1, 2000, pp. 325-328.
[2] D. Beymer, P. McLauchlan, B. Coifman, and J. Malik, "A real-time computer vision system for measuring traffic parameters," in Proc. IEEE Comp. Soc. Conf. Computer Vision and Pattern Recognition, 1997, pp. 495-501.
[3] Z. Zhu, B. Yang, G. Xu, and D. Shi, "A real-time vision system for automatic traffic monitoring based on 2-D spatio-temporal images," in Proc. 3rd IEEE Workshop Appl. Comp. Vision, 1996, pp. 162-167.
[4] Y. K. Jung and Y. S. Ho, "Traffic parameter extraction using video-based vehicle tracking," in Proc. IEEE/IEEJ/JSAI Int. Conf. Intelligent Transport., 1999, pp. 764-769.
[5] S. Bouzar, F. Lenoir, J. M. Blosseville, and R. Glachet, "Traffic measurement: image processing using road markings," in Proc. 8th Int. Conf. Road Traffic Monitoring and Control, 1996, pp. 105-109.
[6] J. Yang and S. Ozawa, "Recover of 3-D road plane based on 2-D perspective image analysis and processing," IEICE Trans. Fund. Elect., Comm., and Comp. Sci., vol. E79-A, pp. 1188-1193, 1996.
[7] A. Fusiello, "Uncalibrated Euclidean reconstruction: a review," Image and Vision Computing, vol. 18, pp. 555-563, 2000.
[8] L. de Agapito, R. I. Hartley, and E. Hayman, "Linear self-calibration of a rotating and zooming camera," in Proc. IEEE Computer Soc. Conf. Computer Vision and Pattern Recognition, vol. 1, Dec. 1999, pp. 15-21.
[9] H. Kim and K. S. Hong, "Practical self-calibration of pan-tilt cameras," in Proc. Inst. Elect. Eng. Vis., Image, and Signal Processing, vol. 148, 2001, pp. 349-355.
[10] D. J. Dailey, F. W. Cathey, and S. Pumrin, "An algorithm to estimate mean traffic speed using uncalibrated cameras," IEEE Trans. Intell. Transport. Syst., vol. 1, pp. 98-107, Mar. 2000.
[11] A. Lai and N. Yung, "Lane detection by orientation and length discrimination," IEEE Trans. Syst., Man, Cybern. B, vol. 30, pp. 539-548, 2000.
[12] B. D. Stewart, I. Reading, M. S. Thomson, et al., "Adaptive lane finding in road traffic image analysis," in Proc. 7th Int. Conf. Road Traffic Monitoring and Control, 1994, pp. 133-136.
[13] J. Canny, "A computational approach to edge detection," IEEE Trans. Pattern Anal. Mach. Intell., vol. 8, pp. 679-698, 1986.
[14] R. O. Duda and P. E. Hart, "Use of the Hough transformation to detect lines and curves in pictures," Comm. ACM, vol. 15, pp. 11-15, 1972.
[15] K. Levenberg, "A method for the solution of certain problems in least squares," Q. Appl. Math., vol. 2, pp. 164-168, 1944.
[16] D. Marquardt, "An algorithm for least-squares estimation of nonlinear parameters," SIAM J. Appl. Math., vol. 11, pp. 431-441, 1963.
[17] T. F. Coleman and Y. Li, "An interior, trust region approach for nonlinear minimization subject to bounds," SIAM J. Optim., vol. 6, pp. 418-445, 1996.
[18] T. F. Coleman and Y. Li, "On the convergence of reflective Newton methods for large-scale nonlinear minimization subject to bounds," Math. Programming, vol. 67, pp. 189-224, 1994.
[19] R. M. Haralick, "Propagating covariance in computer vision," Int. J. Pattern Recognition Artif. Intel., vol. 10, pp. 561-572, 1996.