All content following this page was uploaded by Gabriel Ambrósio Archanjo on 21 March 2019.
human society. Since motor vehicles are the primary mode of transportation for most, this system is
mainly designed for them. Traffic solutions are used to optimize the use of the road network, support driver decisions, and detect and react to abnormal situations, among other applications. Sensors are required to collect the information processed by these applications. This
work describes the use of video cameras as the main sensor in traffic applications and image
processing as the approach to interpret the information collected by these kinds of sensors. Concepts
of image processing techniques such as background removal, environment modeling, and pedestrian
and vehicle detection are discussed. The usage of these concepts is presented in traffic applications
such as street intersection monitoring, congestion and accident detection, driver assistance and self-
driving systems. Results and challenges are discussed, and future trends are outlined.
Introduction
Motor vehicles are the primary mode of transportation for most people. Consequently, they have the greatest impact on the transportation system, affecting the lives of almost every citizen in cities around the
world. The city size, population density, road network characteristics, traffic laws, traffic monitoring
and many other elements influence the traffic flow and consequently are objects of attention.
Moreover, being the most widely used means of transportation implies some consequences like the
significant number of deaths caused by motor vehicle accidents. The two most populated countries,
China and India, together account for 500,000 deaths per year. Considering figures from all countries
combined, the number of deaths exceeds one million per year. This data suggests that any effort to improve traffic safety is worthwhile.
With regard to more basic traffic applications like street intersection monitoring and congestion
estimation, the use of physical sensors in the road is the predominant approach to detecting
vehicles. Usually, inductive loop detectors (ILD) are installed in the pavement for detecting objects
with significant metal mass crossing them. The use of image processing for these purposes has gained attention since (i) ILDs demand installation and maintenance that interrupt the use of the lanes for periods of time; and (ii) an ILD must be installed in each lane of each road intersection, whereas video cameras are capable of monitoring multiple lanes at the same time. Moreover, cameras enable more complex analysis and can be used for other purposes such as surveillance.
Improvements in real-time image and video analysis, the emergence of smart cities and self-guided
vehicle applications are amongst other factors contributing to the increased research and applications
in the field. Vehicle traffic analysis through image processing is a complex task which involves (i)
understanding the environment by detecting its features such as roads, lane regions, transit signs and
traffic lights, (ii) detection and tracking of objects of interest such as vehicles and pedestrians and (iii)
the analysis of these elements in order to extract meaningful information and support decisions. Since
the scene is usually outdoor, the applications have to face a variety of challenges including weather
conditions which affect color and texture of the objects in the scene, shadows caused by buildings,
trees and other elements in the scene, object occlusion due to camera angle, and changes in the illumination throughout the day.
Before discussing the applications in traffic analysis, some image processing concepts are discussed.
The next section addresses background modeling and subtraction, an important technique used for
segmenting the foreground objects. Section 3 describes environment modelling, that is, the
detection of lanes and other elements necessary for traffic analysis. Sections 4 and 5 describe the
detection, tracking and analysis of vehicles and pedestrians, respectively. Applications of traffic analysis employing image processing are discussed in Section 6. Finally, in Section 7, concluding remarks are presented.
Background Removal
An important image processing technique for many applications that need to detect, segment and
classify objects is background removal. In most cases, there are objects in the scene which are of
interest and need to be identified and analyzed and others which are irrelevant and should not be
considered in the analysis. Irrespective of their relative distances from the camera, the objects of interest are labeled as foreground and the irrelevant ones as background. In most cases, the ability to separate the foreground elements from the background is fundamental for effectively segmenting and classifying scene objects.
Since the approaches are quite different for stationary and moving cameras, they are addressed separately.
Stationary Camera
The first step to analyze foreground elements like vehicles and persons is to separate them from the
rest of the scene. A common approach to detect such elements is the premise that these scene
elements are usually in motion and their motion relative to the background objects allows them to be
separated from the scene background. In basic terms, any stationary element in the scene should be
considered part of the background and any moving element considered part of the foreground. A
simple approach to segment the foreground elements is to store an image of the scene without foreground elements as background, denoted as B. Then, for each pixel of the video frame V(t) at time t, subtract the intensity of the background pixel at the same position and determine whether it is a foreground pixel:

|V(x,y,t) − B(x,y)| > T → Foreground

where B(x,y) is the background pixel intensity at (x,y); V(x,y,t) is the pixel intensity at (x,y) of the video frame at time t; and T is a tolerance threshold for comparing background and video frame pixel intensities. It is important to note that the subtraction operation depends on the image color model.
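The thresholded subtraction above can be sketched in a few lines of NumPy. This is a minimal illustration on grayscale images; the array sizes, intensities and threshold value are arbitrary choices for the example.

```python
import numpy as np

def foreground_mask(background, frame, T=25):
    """Per-pixel background subtraction on grayscale images.

    A pixel is foreground when its intensity differs from the stored
    background B by more than the tolerance threshold T.
    """
    diff = np.abs(frame.astype(np.int16) - background.astype(np.int16))
    return diff > T

# Toy example: a flat gray background and a frame with a bright "vehicle".
B = np.full((4, 4), 100, dtype=np.uint8)
V = B.copy()
V[1:3, 1:3] = 200           # a moving object occupies a 2x2 region
mask = foreground_mask(B, V, T=25)
print(mask.sum())           # 4 foreground pixels
```

The cast to a signed type before subtracting avoids the wrap-around that unsigned 8-bit arithmetic would otherwise produce.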
However, the previous approach does not handle scene changes due to the passing of time such as
brightness variance or changes in the background configuration for any other reason. A more
sophisticated approach, proposed by Wren et al. [1], analyzes each pixel (x,y) independently, fitting a
Gaussian probability density function (pdf) based on the last n pixel values for each pixel coordinate.
In other words, given the last N video frames, what are the most probable colors for each pixel
position? The pdf is defined by the parameters mean μ and variance σ2. The initial condition may
assume the pixel's intensity of the first frame as the mean and some default value for the variance. For
each new frame at time t, the mean and variance are updated as follows:
μt = ρIt + (1 − ρ)μt−1

σt² = ρd² + (1 − ρ)σt−1²

d = |It − μt|
where d is the distance between the pixel value and the mean, and ρ is a learning rate which determines the impact of each new video frame update on the pdf. For ρ = 1, the pdf mean and variance are determined only by the current frame, and therefore every new frame is the background. The smaller the value of ρ, the larger the number of frames effectively used to compute the pdf. A threshold k is used to determine whether a pixel value lies within the confidence interval of the background model, and the pixel is classified accordingly:
|It − μt| / σt > k → Foreground

|It − μt| / σt ≤ k → Background
This approach has the advantage of updating the background model in real-time, adapting to changes
in the scene such as illumination and the presence of non-static background objects. Figure 1
demonstrates the outcome of employing this approach for modeling the background of a street scene.
Figure 1: Background modelling using a stationary camera. a) a scene frame with many pedestrians; b) the modelled
background after analyzing frames for a few seconds.
The simple approach presented previously is ideal to explain the concept. Nevertheless, there are
more sophisticated approaches for background modeling which can achieve a better performance, as
discussed by Piccardi [2]. Having created the background model it is possible to subtract it from the
current scene frame in order to perform the first step in foreground object segmentation. In the case of the pixel-based approach explained previously, each pixel is disregarded if its value is considered to be background at that position; otherwise it is kept as a foreground element.
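The per-pixel running Gaussian model can be sketched as follows. This is a simplified illustration of the update and classification rules given earlier; the learning rate, initial variance and image sizes are arbitrary values chosen for the example, not parameters from [1].

```python
import numpy as np

RHO = 0.05   # learning rate
K = 2.5      # confidence threshold

def update_model(mu, var, frame, rho=RHO):
    """Update the per-pixel Gaussian background model with a new frame."""
    d2 = (frame - mu) ** 2          # squared distance to the current mean
    mu = rho * frame + (1.0 - rho) * mu
    var = rho * d2 + (1.0 - rho) * var
    return mu, var

def classify(mu, var, frame, k=K):
    """Foreground where |I - mu| / sigma exceeds the threshold k."""
    return np.abs(frame - mu) / np.sqrt(var) > k

# Initial condition: mean from the first frame, a default variance.
frames = [np.full((3, 3), 100.0) for _ in range(20)]
mu, var = frames[0].copy(), np.full((3, 3), 20.0)
for f in frames[1:]:
    mu, var = update_model(mu, var, f)

scene = frames[0].copy()
scene[0, 0] = 220.0         # a pixel covered by a foreground object
fg = classify(mu, var, scene)
print(fg[0, 0], fg[1, 1])   # True False
```

Because the model is updated every frame, slow scene changes such as illumination drift are gradually absorbed into the background.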
Moving Camera
When isolating foreground objects with a moving camera, the movement of these objects must be
identified and distinguished from the apparent movement of the background due to the change in
position of the camera. Separation of these movements can be performed by estimating the motion of
the camera and building up a model of the background using probability distributions similar to those
The approach taken by Szolgay et al. [3] to estimating camera motion is similar to that performed
during the MPEG-2 encoding process. The current frame is divided into a series of blocks and these
blocks are searched for in a reference frame. The displacement of each block between the current and
reference frames is known as a displacement vector or a motion vector. The motion vectors calculated
can be used to create an estimate of the reference frame in the current frame. Assume that F(t-1) and
F(t) are the reference and current frames, and D contains the motion vectors for the center pixels in
each block, then an estimate of the current frame can be made using the reference frame and the
displacements: F’(t) = F(t−1) + D. The difference between the two frames, E, is calculated as E = |F’(t) − F(t)|. Szolgay et al. create what they call a Modified Error Image (MEI) using E and a threshold: if the value of E at a pixel is greater than the threshold, then MEI(x, y) = I(x, y), else MEI(x, y) = 0.
The background model is created using the previous n frames (n = 15 in Szolgay et al. [3]). When a new
frame is acquired the oldest is removed. The displacements for each of the stored frames are updated
so that they are relative to the newest frame. A probability density function is created which is used to
identify when pixels in the current MEI belong to the background or to an independently moving object.
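The block-matching step used to estimate motion vectors can be sketched as an exhaustive search, similar in spirit to MPEG-style motion estimation. The block size, search radius and test images below are arbitrary choices for the illustration, not values from [3].

```python
import numpy as np

def best_displacement(ref, cur, top, left, block=4, search=2):
    """Exhaustive block matching: find where a block of the current frame
    came from in the reference frame, within a small search window.

    Returns the (dy, dx) displacement with the lowest absolute error.
    """
    patch = cur[top:top + block, left:left + block]
    best, best_err = (0, 0), np.inf
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + block > ref.shape[0] or x + block > ref.shape[1]:
                continue        # candidate block falls outside the frame
            err = np.abs(ref[y:y + block, x:x + block].astype(int)
                         - patch.astype(int)).sum()
            if err < best_err:
                best_err, best = err, (dy, dx)
    return best

# Reference frame with a bright square; in the current frame it moved 1 px right.
ref = np.zeros((8, 8), dtype=np.uint8)
ref[2:6, 1:5] = 255
cur = np.roll(ref, 1, axis=1)
print(best_displacement(ref, cur, top=2, left=2))  # (0, -1): block came from one pixel to the left
```

Repeating this search for every block of the frame yields the field of motion vectors D used to predict F’(t).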
Environment Modeling
Traffic standardization predates the automobile industry. In the 19th century horse-drawn vehicles
were the most used mode of transport. The first traffic light, for instance, was installed in 1868 in
London, as pointed out by Taale et al. [4]. Nevertheless, it was only at the beginning of the 20th century that the long transition from horse-drawn vehicles to automobiles began. From that period to the present, many traffic rules have been created to standardize the use of such vehicles and improve
safety. Among these standards are the signs in the street that specify where the vehicles must travel,
plates indicating speed and rules for that road, and traffic light systems controlling the flow.
Therefore, the detection and analysis of these elements is fundamental to understanding the traffic scene. The existence of traffic standards facilitates the understanding of traffic conditions by humans and therefore by computer automated systems. Just like humans, computer systems may be able to recognize these signs and the information they carry. This section describes the concepts and some representative approaches.
Street lanes are a traffic standard that specifies where the vehicle must travel and indicates other possible lanes in the same or the opposite direction. There are two major issues related to detecting lane markings: they are not always clearly visible, due to paint quality and natural wear, and the geometry of the markings cannot be used as a discriminating factor, as there is no governing standard in this aspect.
Lai and Yung [5] proposed an approach to detect the lanes in the road using a stationary camera. First,
the background is estimated using a sequence of frames. The background is expected to have all
visible lanes since there are no vehicles in the image. Then, Sobel edge detection is used to detect
lines. In the next step a transformation is performed, based on camera parameters such as height, tilt
angle and focal length, in order to compensate for distortions caused by perspective, resulting in a flat
image. Finally, the lines detected in the image are clustered and discriminated by orientation and
For on-road applications such as driver assistance systems and autonomous vehicles, Wang et al. [6]
modeled lanes using cubic b-splines in order to support curved road models. Images of the road
collected by a camera placed in the car are divided horizontally into sections. The Hough transform is
used to detect the vertical straight lines of the lane markings in each section. By connecting the
straight lines of each section it is possible to determine the curve of the road. Assuming the road boundaries are parallel, by projecting the intersection of the boundaries on each side it is possible to determine the vanishing points of the scene, a parameter necessary to handle perspective. Combining the curve and perspective parameters, splines are defined and used to model the boundaries and the
discusses the detection, analysis and recognition of another group of important elements in the scene:
transit signs. Driver assistance systems can detect and recognize signs and notify the driver in the case of a prohibited maneuver. Moreover, in autonomous vehicle applications, the detection, recognition and interpretation of traffic signs are mandatory in the absence of a priori
Solutions to recognize traffic signs face challenges similar to those in recognizing other traffic elements in the environment. Weather and time of day affect the brightness of the scene and consequently the color and texture of scene objects. Like road lanes, traffic sign appearance is degraded by
weathering. Shadow and occlusions may be caused by other objects in the scene such as buildings,
Fortunately, traffic sign design respects restrictive rules. Usually, the signs use simple geometric
shapes like triangles, circles, diamonds and octagons, as well as different colors, to distinguish them from each other. The sign colors, mostly red and yellow, were selected in such a way that they are easily
noticeable in a natural environment by humans. These definitions also facilitate detection and
recognition by computers.
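The color-based filtering that such approaches use as a first step can be sketched as a simple per-channel range test. The RGB bounds below are hypothetical stand-ins for a color distribution learned from labeled sign samples under several lighting conditions; a real system would estimate them from training data.

```python
import numpy as np

def sign_color_mask(rgb, lo, hi):
    """Keep pixels whose RGB values fall inside a color range learned from
    labeled sign samples; everything else is removed as non-sign."""
    return np.all((rgb >= lo) & (rgb <= hi), axis=-1)

# Hypothetical range for the red of regulatory signs (illustrative values).
LO = np.array([150, 0, 0])
HI = np.array([255, 90, 90])

img = np.zeros((2, 2, 3), dtype=np.uint8)
img[0, 0] = (200, 30, 40)   # sign-like red
img[1, 1] = (60, 160, 60)   # vegetation-like green
mask = sign_color_mask(img, LO, HI)
print(mask[0, 0], mask[1, 1])  # True False
```

Only the pixels surviving this filter need to be passed to the more expensive edge and shape analysis stages.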
Zadeh et al. [7] proposed an approach that employs a supervised learning method, in which a dataset of manually labeled samples is used to train an algorithm, to detect and recognize traffic signs. The first step filters image pixels, removing non-sign pixels based on their
colors. The algorithm was trained using samples of the traffic signs in different light conditions. Using
the color distribution in the signs in these different conditions it is possible to remove the pixels with
values outside the distribution. In the next step, edge detection is performed and straight lines and
curves are detected. Then, these connected lines are approximated to known classes of polygons and the shape is identified. Finally, by analyzing the shape and color ratios in subregions of the image, a traffic
attention. Independent of the application, analyzing vehicles through image processing can be
separated into two steps: vehicle detection and vehicle tracking. The first step is responsible for the
first assignment of identification to a vehicle in the scene. Usually, an identification number (ID) and
the initial coordinates are assigned to that vehicle. Detected vehicles are tracked until they leave the
scene in the second step. When determining the current coordinate of a given vehicle, the past
coordinates and the motion pattern might be used to improve the accuracy and performance.
Traffic is usually analyzed by the city infrastructure management, extracting useful information about
the traffic condition in order to support decisions, detect incidents, congestions and other undesired
situations that demand rapid reaction, or by onboard computer systems in vehicles in order to support
the driver or make autonomous decisions. In the former, fixed cameras are placed in strategic points
of the city to monitor roads of interest. In the latter, the cameras are placed in the vehicle for
analyzing the current road condition. The traffic scene is analyzed under considerably different
perspectives in these two situations; therefore, the approaches are considerably different too and are discussed separately.
Stationary Camera
Considering scenes taken by stationary cameras, from an image processing point of view, vehicles
might be assigned as foreground elements since they are usually in motion in relation to the scene’s
background elements. Thus, background modeling and subtraction techniques can be used for
segmenting foreground elements, including the vehicles. Having separated the foreground elements
from the scene background, the next step is to classify them based on visual or motion
features. After classification, these elements can be analyzed in order to provide useful information
Color and texture are important perceptual descriptors to describe objects, so they are commonly used
as one of the first discriminative elements in order to determine segments of interest in images. Since
roads are in shades of gray, in some scenarios just the color is sufficient to detect and segment
vehicles with distinctive colors. Nevertheless, the majority of vehicles are in shades of gray too.
Therefore other features such as edges and shapes must be considered. However, in outdoor
environments like streets and roads, the color and texture of a given object may appear quite different
under different weather and brightness conditions, and these have to be taken into account. Rojas and Crisman [8] proposed an approach that classifies pixels as road or non-road based on
the color distribution of roads estimated using a set of images. Then, non-road pixels are grouped into
regions and filter rules are applied to filter out non-vehicles. Back-projection is used to find the region
coordinates in the ground plane and finally, in the last step, regions that collectively form vehicles are
joined.
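The road/non-road pixel classification at the heart of such approaches can be sketched as a distance test against the estimated road color distribution. The mean, standard deviation and threshold below are hypothetical values standing in for statistics estimated from a set of training images, not figures from [8].

```python
import numpy as np

def road_mask(rgb, road_mean, road_std, k=2.0):
    """Label a pixel as road when every channel lies within k standard
    deviations of the road color distribution estimated from sample images."""
    z = np.abs(rgb.astype(float) - road_mean) / road_std
    return np.all(z <= k, axis=-1)

# Hypothetical road color statistics (grayish asphalt) from a training set.
ROAD_MEAN = np.array([110.0, 110.0, 115.0])
ROAD_STD = np.array([15.0, 15.0, 15.0])

img = np.zeros((1, 2, 3), dtype=np.uint8)
img[0, 0] = (105, 112, 118)   # asphalt-like pixel
img[0, 1] = (30, 60, 200)     # blue vehicle pixel
mask = road_mask(img, ROAD_MEAN, ROAD_STD)
print(mask[0, 0], mask[0, 1])  # True False
```

Grouping the non-road pixels into connected regions then yields the vehicle candidates that the filtering rules operate on.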
Moving Camera
The problem of detecting objects using a moving camera is more challenging than in a fixed camera
situation, since there is no clear separation between foreground and background objects. Vehicles ahead, moving at the same speed as the vehicle hosting the camera, are seen as stationary objects. Therefore,
their detection must be based on visual features that discriminate them from objects of other classes in
the scene. Continuously changing landscapes along the road and vehicles that suddenly enter the
scene with very different speeds, sizes and appearances make the task challenging. The approaches
presented in this section are fundamental for driver assistance and autonomous vehicle applications.
As discussed by Betke et al. [9], the detection of passing and distant cars is addressed employing
different methods. When other cars pass the camera-assisted car, they usually cover large portions of
the image frame, changing the brightness in those regions. This brightness change in a given region is
used to hypothesize the presence of a car on it. On the other hand, distant vehicles are detected as
rectangular objects. A feature based approach creates edge maps for vertical and horizontal edges,
then aspect ratio and correlation are used to determine whether such an object is a car or not. Bucher
et al. [10] focused on the detection of approaching vehicles for collision avoidance. It scans each road
lane, starting from the bottom of the video frames, seeking strong horizontal segments. Using a priori
knowledge of the range of car length in road lane positions, the approach can infer whether that
When it is hard to separate foreground elements from the background with the precision necessary for
the target application, stereo-vision approaches might be employed. Using a stereo camera or two
cameras placed in slightly different positions it is possible to use the same principle of human
binocular vision to estimate the distance of the objects in the field of view. The disparity-map is
computed using the difference in the coordinates of pixels in the left and right images which correspond to the same scene point. This coordinate difference is known as disparity and when
combined with information about the cameras obtained through calibration (distance between centers
of the optical systems, focal lengths of lenses, etc.) can be used to map disparity values to depth. The
inverse mapping can be used in traffic analysis systems, as discussed by Sun et al. [11]. The set of disparity values corresponding to the range of crucial depths can be calculated. Once the disparity map for the current scene has been computed, a histogram of disparity values is calculated. The number of peaks in the histogram within the critical range indicates the number of objects which must
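The disparity-histogram idea can be sketched as follows. For simplicity this sketch treats every sufficiently populated disparity bin inside the critical range as one peak; a real system would group adjacent bins and apply proper peak detection. The disparity map, range and pixel threshold are synthetic values for the illustration.

```python
import numpy as np

def count_obstacles(disparity, critical_range, min_pixels=50):
    """Count populated disparity bins inside the critical range.

    Each sufficiently populated bin within the range of crucial depths is
    treated as one object that demands attention (a simplification of
    genuine peak detection).
    """
    lo, hi = critical_range
    hist = np.bincount(disparity.ravel(), minlength=hi + 1)
    peaks = 0
    for d in range(lo, hi + 1):
        if hist[d] >= min_pixels:
            peaks += 1
    return peaks

# Synthetic disparity map: background at disparity 2, two near objects
# at disparities 20 and 30 (larger disparity = closer to the camera).
disp = np.full((100, 100), 2, dtype=np.int64)
disp[10:30, 10:30] = 20
disp[50:80, 50:80] = 30
print(count_obstacles(disp, critical_range=(15, 40)))  # 2
```

Because depth is inversely related to disparity, restricting the histogram to large disparities focuses the analysis on the objects closest to the camera.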
Pedestrian Detection
The previous section described the detection of the most common object on roads: vehicles. However,
pedestrians are as important as vehicles since any real-world application must respect safety
constraints. In the case of infrastructure cameras, detecting pedestrians crossing roads or streets is
useful in order to take action in the case they are in a prohibited area or even to notify drivers in some
way. The latter depends on communication infrastructure between smart city systems and vehicles that does not yet exist, but it is already a subject of study. On the other hand, for on-road
applications, like driver assistance systems, in which the camera is placed in the car, pedestrian
detection is fundamental. In the case of autonomous vehicle applications the ability to detect
pedestrians is mandatory in order to make reliable decisions, ensuring the safety of passengers and
pedestrians.
The detection, recognition and tracking of objects in real-time in outdoor scenes is a challenging task
in image processing applications. In the case of pedestrian detection, as pointed out by Geronimo et al. [12], the major challenges are: (i) the appearance of pedestrians varies, since they can change pose, wear
different clothes, carry different objects; (ii) they must be detected in outdoor urban scenarios with
cluttered background, a wide range of illumination and weather conditions; (iii) pedestrians can be
partially occluded by other pedestrians and urban elements; (iv) they must be identified in motion and
from different view angles. Moreover, for most applications, the solution must be able to react in real-time
Stationary Camera
background modeling and foreground object segmentation - similar to that presented in the previous section - might be used. The fundamental difference between detecting vehicles and pedestrians is in the classification step. In the work presented by Viola and Snow [13], for instance, a cascading classifier is employed. In this approach multiple classifiers are combined in order to estimate the final classification, as shown in Figure 2. Each classifier uses a small set of features and has a low false positive rate; combined, they form a more powerful classifier. In this case the classifiers use motion pattern and appearance features. An object is classified as a pedestrian only if it passes all stages of the cascade.
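The cascade structure can be sketched in a few lines. The two toy stage functions below (a walking-speed test and an aspect-ratio test) are hypothetical stand-ins for the motion-pattern and appearance classifiers; they are not the features used in [13].

```python
def cascade_classify(candidate, stages):
    """A candidate is accepted only if every stage of the cascade accepts it;
    each stage is a cheap classifier, so most non-pedestrian windows are
    rejected early without running the later, more expensive stages."""
    for stage in stages:
        if not stage(candidate):
            return False
    return True

# Toy stages standing in for motion-pattern and appearance classifiers.
moves_like_person = lambda c: 0.5 < c["speed"] < 3.0    # walking speed, m/s
shaped_like_person = lambda c: c["aspect_ratio"] > 1.5  # taller than wide

pedestrian = {"speed": 1.4, "aspect_ratio": 2.5}
car = {"speed": 10.0, "aspect_ratio": 0.6}
stages = [moves_like_person, shaped_like_person]
print(cascade_classify(pedestrian, stages))  # True
print(cascade_classify(car, stages))         # False
```

The early-rejection property is what makes cascades attractive for real-time detection: the vast majority of candidate windows only ever pay the cost of the first stage.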
Moreover, pedestrians detected at intersections or crossing the street are useful information for drivers that could be sent to vehicles in order to avoid accidents. Schaack et al. [14] presented a solution for detecting and estimating pedestrians’ positions and sending the information to a screen in the vehicle through
manually defined in the scene as a polygon. Comparing sequential frames, objects in motion, or
foreground objects, are separated from the background. Since vehicles and pedestrians have a distinct
pattern of movement - regarding path, direction and velocity - they can be discriminated by analyzing
their motion patterns. In the latter work, the classification is performed employing a Support Vector
Machine (SVM) using motion features as parameters. In this specific case, the advantage of not using
visual features for classification is the ability to detect not only pedestrians in the scene, but any
Moving Camera
The detection of pedestrians using a moving camera faces the same challenges as the detection of vehicles, due to the dynamic background and the lack of a clear separation between foreground and background objects. Vehicles have a rigid body, considerable size and must travel in specific regions. On the other
hand, pedestrians appear in different poses, increasing the challenge for shape-based algorithms. They are also smaller than vehicles, in many cases appearing in low resolution given their distance to the camera. In addition, unlike vehicles, people can walk in most places of the scene, such as sidewalks, town squares, shoulders and roads. Consequently, pedestrians can be occluded by many different objects in the scene and may appear from any direction. In combination, these factors make the
Alonso et al. [15] proposed an approach for person detection that uses stereo-vision for segmentation, with feature detection algorithms to create an input vector for an SVM classifier. First,
points of interest are selected in the left and right images and their distance to the camera is computed.
A clustering technique is employed to group these points into candidate regions of interest (ROI) that are possibly related to a person. For each ROI, a series of features are extracted
such as Canny-based features, Haar wavelets, gradient magnitude and orientation, co-occurrence
matrix, histogram of intensity differences and number of texture units. For each ROI, the features are
combined to obtain a feature vector. A database is manually created, selecting positive and negative
Applications
The previous sections described approaches for environment modeling and vehicle and pedestrian
detection and tracking. The simple detection and tracking of these elements does not solve any
problem. They are the means, not the end. Nevertheless, the analysis of those elements can result in
useful information that can be used in a wide range of traffic applications. The solutions described in this section were chosen based on their popularity or their potential for solving problems in future
applications.
Street Intersection Monitoring
Street intersections are worthy of attention in traffic analysis since streets crossing each other have
vehicles travelling in different directions. At these intersections vehicles may stay in the same street
or change their route, going to another one. When there are traffic lights controlling the flow, the use of lanes is optimized, allowing high speeds at intersections, but the severity of accidents increases when vehicles violate the red light. Therefore, monitoring the intersection is fundamental for punishing drivers violating traffic laws and for detecting accidents in order to rapidly dispatch emergency
services to minimize the consequences. Rapid medical assistance is crucial for people involved in the
accident. The removal of vehicles involved in the accident is fundamental to avoid a traffic jam that
Regarding vehicle detection at intersections, inductive loop detectors (ILD) are currently the most widely used approach. Basically, an ILD is a physical sensor buried in the traffic lane. This sensor has a wire loop that resonates at a constant frequency when there is no vehicle over the loop. This frequency is monitored by a detection device. When a large metal object, such as a vehicle, moves over the loop, the resonant frequency increases and the vehicle is detected. Since objects are detected by their metal mass, the sensor works in the same way under different weather and lighting conditions, which are among the factors that most affect image processing approaches. On the other hand, it is necessary to install a sensor in each lane of the intersection. A single vehicle might activate two sensors when passing between
them, trucks may be detected as multiple vehicles in some situations and maintenance issues are
among some drawbacks of this approach. The possibility of using a single camera to monitor multiple lanes, and the ability to perform more complex analysis considering visual aspects of the scene, promote research into camera-based solutions for this issue.
Kamijo et al. [16] presented an approach to tackle intersection monitoring with the use of
image processing. In the first step, the background model of the intersection is constructed by
accumulating and averaging the pixel values of each position using an initial 20 minutes of sequential
image frames. Then, virtual sensors are set perpendicular to the incoming roads. Analyzing the
intensity variance in these virtual sensors (a rectangle area), it is possible to detect incoming vehicles.
When a vehicle is detected by a virtual sensor, the background is subtracted and the vehicle outlined.
The intersection at each time is represented by a data structure that stores the state for each 8x8 pixel
block of the scene. After the vehicle detection and segmentation, each block that is occupied by a part
of the vehicle is assigned the vehicle’s ID. By analyzing the state of each block over a sequence of frames, its motion vector can be determined. By analyzing the motion of each block in the region of a given vehicle, the vehicle's motion is estimated and the vehicle is tracked. This approach is employed because
simply segmenting the vehicle at each time frame cannot handle occlusions due to the camera angle or
vehicles very close to each other. The solution was able to detect vehicles at intersections with
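The virtual-sensor step of such systems can be sketched as a change test over a rectangular region of the frame. The region coordinates, threshold and minimum changed-pixel ratio below are arbitrary values for the illustration, not parameters from [16].

```python
import numpy as np

def vehicle_present(frame, sensor_region, background, T=30, min_ratio=0.3):
    """Virtual sensor: a vehicle is hypothesized in a rectangular region when
    a sufficient fraction of its pixels deviates from the background model."""
    top, bottom, left, right = sensor_region
    roi = frame[top:bottom, left:right].astype(int)
    bg = background[top:bottom, left:right].astype(int)
    changed = np.abs(roi - bg) > T
    return bool(changed.mean() > min_ratio)

background = np.full((20, 20), 90, dtype=np.uint8)
frame = background.copy()
frame[5:9, 2:8] = 180                    # incoming vehicle entering the region
region = (4, 10, 0, 10)                  # virtual sensor placed across the lane
print(vehicle_present(frame, region, background))       # True
print(vehicle_present(background, region, background))  # False
```

Requiring a minimum fraction of changed pixels, rather than any single changed pixel, makes the virtual sensor robust to isolated noise.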
Congestion Detection
Traffic congestion is a problem faced by many cities with high population density or limited
infrastructure. Among the various approaches, traffic control system optimization might contribute to
reducing traffic congestion in some way, saving fuel, reducing CO2 emissions and consequently improving the lives of millions of citizens. Congestion can be defined as a situation in which the throughput of the road is significantly lower than expected and, consequently, so are vehicle speeds. Detecting or predicting congestion can be used to take action in order to mitigate it or reduce
its consequences. Dynamic lane reversal, for instance, is an approach that increases the capacity of the
road in one direction by inverting the direction of some lanes in the other direction, as illustrated in
Figure 3. In experiments conducted by Hausknecht et al. [17], the network efficiency was increased by up to 72% in some situations. The sooner an action is taken, the better the result may be. Thus, rapid reaction through automatic detection systems monitoring multiple roads can become a tool to
In order to estimate the traffic condition in a given road, each lane can be monitored, analyzing the
speed of the cars in some manner. For instance, Li et al. [18] presented a solution in which virtual line
sensors are placed perpendicular to each lane. Using the pixel values along that virtual line in a
sequence of camera frames, a time-spatial image S is constructed where each row in the image is the
pixel values in that line in a different time. When there is no vehicle, S is just a gray shaded rectangle.
When a vehicle crosses the line, a slice of the vehicle from each frame is copied to S and the result is
a shape. The lower the height of the shape in S, the higher the speed of the vehicle; conversely, the higher the shape, the lower the speed. Mapping the relationship
between shape sizes and traffic conditions it is possible to estimate traffic and detect congestions in
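The construction of the time-spatial image S and the shape-height measurement can be sketched as follows. The frame sizes, gray levels and threshold are synthetic values for the illustration, not parameters from [18].

```python
import numpy as np

def time_spatial_image(frames, row):
    """Stack one pixel row (the virtual line) from each frame into image S."""
    return np.stack([f[row] for f in frames])

def shape_height(S, road_gray=120, T=30):
    """Number of rows of S touched by a vehicle: the taller the shape, the
    more frames the vehicle needed to cross the line, i.e. the slower it is."""
    changed = np.abs(S.astype(int) - road_gray) > T
    return int(changed.any(axis=1).sum())

# A fast vehicle crosses the virtual line in 2 frames, a slow one in 5.
def make_frames(n_covered, total=8, width=10):
    frames = [np.full((4, width), 120, dtype=np.uint8) for _ in range(total)]
    for f in frames[:n_covered]:
        f[2, 3:7] = 200         # vehicle slice lying on the virtual line (row 2)
    return frames

fast = time_spatial_image(make_frames(2), row=2)
slow = time_spatial_image(make_frames(5), row=2)
print(shape_height(fast), shape_height(slow))  # 2 5
```

Calibrating the mapping from shape height to speed for a given camera then allows congestion to be flagged whenever shapes grow beyond a chosen height.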
Instead of using a virtual sensor, Yang et al. [19] analyzed the corners of the vehicles. In each camera
frame a corner detector is applied and the corners are grouped based on the distance between them.
Analyzing the motion of the corners it is possible to discriminate the corners of vehicles from the
corners in the background. Each corner group is assigned to a vehicle and, by analyzing the coordinates
of the corners in a sequence of frames, it is possible to estimate the speed, expressed in that work as
corner pixels moved per frame. A manual setup can map low-speed situations, in pixels per frame, in
order to flag probable congestion. Another approach grouping corners to detect and track
vehicles for congestion detection is presented by Chintalacheruvu and Muthukumar [20]. But, in this
case, in order to assess the speed in miles per hour, a more intuitive unit for understanding the
situation, two virtual detection zones are manually placed in the scene. The distance between the two
zones is also a parameter of the system. By analyzing the time elapsed between the detection of the
same vehicle in each zone, it is possible to assess the speed in distance per unit of time.
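The distance-over-time calculation behind the two-zone approach reduces to a few lines. The unit conversion below assumes the zone distance is given in feet, which is an illustrative choice rather than a detail of [20]:

```python
def speed_mph(zone_distance_ft, frame_gap, fps):
    """Speed of a vehicle detected in two virtual zones a known
    distance apart: distance / elapsed time, converted ft/s -> mph."""
    seconds = frame_gap / fps            # frames between detections -> seconds
    feet_per_second = zone_distance_ft / seconds
    return feet_per_second * 3600 / 5280  # 5280 ft per mile
```

For example, a vehicle crossing two zones 44 ft apart within 30 frames of 30 fps video is travelling 44 ft/s, i.e. 30 mph.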
In these works, the results analyze the capacity of the proposed solutions to detect congestion
scenarios effectively. Manual setup requirements and the inability to work at night or in heavy rain
are some barriers to broader adoption of these systems. The existence of other solutions based on
physical sensors, RFID or GPS data, which work independently of illumination and weather
conditions, also contributes to this. Nevertheless, the possibility of using a single camera
sensor for surveillance and traffic analysis for multiple purposes promotes research and application
in the field.
Accident Detection
With much technological advancement in driver safety systems and road quality, the number of traffic
fatalities per distance travelled has been decreasing in recent years. Figure 4 illustrates this aspect for
the case of the United States. The decreasing line is the number of annual deaths per billion miles
traveled. The increasing line is the number of vehicle miles traveled. As can be seen, people are
statistically less involved in traffic accidents per distance travelled. However, the population and
number of cars increased and, consequently, the absolute number of deaths per year still increases in
many countries. In the case of the USA, the number of deaths rose to 33,561 in 2012, which is 1,082
more fatalities than in 2011. In the case of India and China, the two most populated countries, the
scenario is more dramatic: in 2011, the two countries together accounted for more than 500,000
deaths.
Figure 4. Annual US traffic fatalities per billion vehicle miles travelled (decreasing line) and miles travelled (increasing
line) from 1922 to 2012.
Considering the data presented above, any effort to reduce these rates is worthwhile. An aspect that
affects the number of fatal accidents is the lag time between an accident and the arrival of medical
assistance, as mentioned by Evanco [21]. Automatic accident detection systems might be employed to
shorten this lag. In the approach of Sadeky et al. [22], motion is estimated by computing optical flow
over a sequence of video frames, the center of gravity of each motion pattern is located using a
Histogram of Flow Gradients (HFG), and the distances between the centers are calculated. By
analyzing these distances over a time period, a probabilistic classifier is obtained for predicting the
probability of an accident in the scene. The approach was tested on a small dataset, depicting a total of
250 real scenes of traffic accidents or abnormal vehicle events captured by traffic surveillance
cameras, and presented promising detection results.
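The core feature feeding such a classifier, the distance between the centers of gravity of two motion patterns over time, can be sketched as below. The input format and the NumPy implementation are assumptions for illustration, not the actual pipeline of Sadeky et al. [22]:

```python
import numpy as np

def centroid_distance_series(centroids_a, centroids_b):
    """Euclidean distance between the centers of gravity of two motion
    patterns, one value per frame. A sharp drop that then stays near
    zero over several frames is the kind of pattern an accident
    classifier looks for."""
    a = np.asarray(centroids_a, dtype=float)
    b = np.asarray(centroids_b, dtype=float)
    return np.linalg.norm(a - b, axis=1)
```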
Driver Assistance
The first motor vehicles date from the end of the 19th century. Automatic transmission systems date
from the end of the 1940s. Cruise control systems became popular in some countries in the 1970s and
on-board computer systems in the 1980s. As can be seen, motor vehicle development is heading towards
reducing manual operations of driving. The last advancements in this field are mainly in intelligent
driver assistance systems. Basically, a computer system, monitoring the internal and external
environment using sensors, is able to take simple actions and notify the driver of any important
information. In order to get information about the vehicle operation, driver situation and road
condition as reliably as possible, multiple types of sensors such as radar, laser and cameras are
employed. It is important to note that sensors are only responsible for getting raw information. This
information must be processed and interpreted by a computer to be useful. In the case of the data
provided by the cameras, image processing techniques are responsible for processing and extracting
information.
In order to illustrate the use of image processing in driver assistance systems, two applications were
selected: fatigue detection and collision avoidance.
Fatigue Detection
As can be seen in several statistics, driver fatigue accounts for a significant number of road accidents.
Truck drivers and others who drive many hours a day are the most common victims of fatigue, risking
their own lives and the lives of others. Therefore, the development and adoption of systems capable
of detecting fatigue and notifying the driver in some manner could improve safety and save lives.
The most serious consequence of fatigue is the driver falling asleep. A firm closure of the eyes usually
precedes the onset of sleep and it can be detected in order to send some signal to the driver. Devi and
Bajaj [23] proposed an approach to tackle this issue by placing a camera in front of the driver. Applying
an approach to detect skin-colored pixels, the face region is defined. Analyzing the average intensity
of each image row, the eyes are detected since they have a distinctive pattern compared to the other
face elements. Having identified the eye region, analyzing the color distribution and focusing on
white, it is possible to determine whether the eye is open or closed. If the eye remains closed for a
number of consecutive video frames exceeding a given threshold, fatigue is detected. The approach was
tested in a controlled scenario with good image quality regarding brightness and sharpness. For real-
world applications, improvements are necessary to deal with low-light conditions and other challenging
capture scenarios.
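The closed-eye thresholding step can be sketched as a simple run counter over per-frame eye states. The boolean input and the threshold value are illustrative assumptions rather than parameters from Devi and Bajaj [23]:

```python
def fatigue_detected(eye_open_per_frame, closed_threshold):
    """True once the eyes stay closed for `closed_threshold` or more
    consecutive frames -- i.e. longer than a normal blink."""
    run = 0
    for is_open in eye_open_per_frame:
        run = 0 if is_open else run + 1  # reset the run on any open-eye frame
        if run >= closed_threshold:
            return True
    return False
```

At 30 fps, for instance, a threshold of 45 frames would flag closures longer than 1.5 s while ignoring ordinary blinks.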
Collision Avoidance
When driving accompanied by family members or friends, it is a difficult task to remain focused on
the road all the time. It is usual to get into conversations while driving. Consequently, many rear-end
collisions are the result of driver distraction. Abrupt braking maneuvers by a vehicle ahead are another
cause of this type of collision. In such situations, any delay in making a decision is decisive.
Therefore, collision avoidance systems that can notify the driver when an object is approaching the
vehicle too fast are valuable safety features.
Laser- and radar-based solutions are the most common ones because of their reliability. Nevertheless,
the potential of using a video camera, a sensor that provides richer information, for multiple
applications and for dealing with more complex scenarios promotes research in image processing
approaches.
Aiming to detect braking maneuvers of vehicles ahead, Chachuli et al. [24] proposed an approach to
detect the brake lights being turned on by the use of a hybrid color model and morphological
operations. The hybrid color model combines the red channel from the RGB color model with the
saturation and intensity values from the HSI color model. With knowledge of the brake light position,
it is possible to estimate the vehicle's position. Even though this approach depends on the presence of
visible brake lights, it can be combined with other approaches for scenarios in which no brake light is
activated or visible.
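A minimal sketch of such a hybrid color test is shown below. The specific thresholds and the exact channel combination are assumptions for illustration; [24] does not fix them here:

```python
import numpy as np

def brake_light_mask(rgb, r_min=150.0, s_min=0.3, i_min=60.0):
    """Boolean mask of pixels that are strongly red (RGB red channel)
    AND saturated and bright (HSI saturation and intensity)."""
    rgb = rgb.astype(np.float64)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    intensity = (r + g + b) / 3.0
    # HSI saturation: 1 - min(R,G,B)/I -- 0 for gray, 1 for pure hues
    saturation = 1.0 - np.minimum(np.minimum(r, g), b) / np.maximum(intensity, 1e-6)
    return (r >= r_min) & (saturation >= s_min) & (intensity >= i_min)
```

Morphological operations (e.g. opening to remove isolated pixels) would then clean the mask before locating the pair of lights.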
Also using a camera placed in the vehicle, Bucher et al. [10] proposed a method that scans each road
lane, starting from the bottom of the image, applying image processing techniques in order to detect
approaching vehicles. After a vehicle is detected, its vertical position in the video frame is used to
estimate its distance. An advantage of this approach is that it requires only a single monocular camera.
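Under a flat-road (ground-plane) assumption, the vertical position of a vehicle's contact point with the road maps to a distance via simple pinhole geometry. The sketch below is a generic monocular formulation, not the specific model of Bucher et al. [10]:

```python
def ground_plane_distance(contact_row, horizon_row, focal_px, camera_height_m):
    """Distance to a point on a flat road whose image appears at
    `contact_row` (image rows grow downward): d = f * H / (y - y0),
    where y0 is the horizon row, f the focal length in pixels and
    H the camera height above the road."""
    dy = contact_row - horizon_row
    if dy <= 0:
        raise ValueError("contact point must lie below the horizon")
    return focal_px * camera_height_m / dy
```

With a camera height of 1.2 m and a focal length of 800 px, a contact point 40 rows below the horizon corresponds to roughly 24 m; points nearer the horizon map to larger distances.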
Self-driving Automobile
The previous section described the applications of driver assistance systems to support drivers,
increasing safety and comfort. Since the 1980s, approaches to create autonomous motor vehicles,
completely removing the driver in some situations, have been developed. In 2004, the first DARPA
Grand Challenge - a prize competition for American autonomous vehicles - took place in order to
promote development in the field. The entry of a large company like Google brought the topic to the
mainstream media, creating a lot of expectation. Consequently, the research
efforts in the field have been increasing in the last few years.
The potential of future commercial applications will probably motivate even more investments in the
next years. More reliable decisions in less controlled environments, for instance in difficult weather
conditions, in situations in which other drivers make unexpected or even illegal maneuvers, or in cases
where objects such as distracted pedestrians suddenly appear in the scene, are essential to get people
to trust the technology.
Unlike driver assistance systems, which just support the driver with information and simple
actions, self-driving capability implies total control of the vehicle's functions. The system must be able
to not only drive the car, but also use the proper signals to inform other drivers about the next
maneuver and most importantly, it must be prepared for unexpected behavior from other drivers and
people in the street. In order to accomplish that, the self-driving automobile combines a series of
sensors and systems. As expected, given the rich information they provide, video cameras are among
the most important sensors, and image processing techniques are at the heart of the solutions.
Image processing solutions have an important role in the development of autonomous driving
solutions. Levinson et al. [25] and Teichman et al. [26] described an autonomous driving system
consisting of two spherical cameras and two stereo vision cameras that demands image processing to
interpret their data. Moreover, the system also has a rotating LIDAR, which uses a rotating laser to
produce an image of the scene. Other supporting systems such as radars and GPS are also used.
However, the most significant data to the system are the images; therefore image processing is at its
core. For instance, in order to recognize objects, object shape and motion description classifiers are
combined. The system is able to recognize pedestrians, cyclists and cars in real time. Many techniques
described in this work, such as pedestrian, vehicle and lane recognition, brake light detection and
traffic sign recognition, might be combined to compose self-driving solutions. Many improvements
and new approaches are expected to be released in this field of application, and image processing is
likely to remain at its core.
Conclusion
This work presented an overview of image processing approaches and applications to analyze the
traffic scene images in order to extract meaningful information to support decisions in a wide range of
applications. The basic concepts of image processing techniques such as background removal,
environment modeling, and pedestrian and vehicle detection are explained, and references are given for
details of each specific approach. Applications using these basic concepts and others for traffic
analysis (e.g., street intersection monitoring, congestion and accident detection, collision avoidance
systems, driver fatigue detection, self-driving automobiles) are discussed in order to give a panorama
of the field. It is important to note that this work does not cover all existing traffic applications that
employ image processing techniques, but it rather considers representative systems that demonstrate
the most common concepts and highlight the major trends in the area.
Although this work addresses several sophisticated solutions, their adoption seems likely to happen
gradually. In the case of driver assistance systems, a new solution must be well developed and reliable
in many different conditions to go to the market. There is also an economic barrier for many of these
solutions to reach the mainstream market. The automobile industry is a very competitive one; therefore
the benefits of a new solution must justify the increase in vehicle prices. This phenomenon can be
observed in the case of driver assistance systems and other more sophisticated solutions only available
in high-end and more expensive vehicles. Economic indices of the markets also affect the
configuration of the vehicles and the adoption of such solutions. For instance, the seatbelt is mandatory
in every country, whereas the airbag is not mandatory in many developing countries. In the case of
applications that take autonomous actions, which for safety reasons must have high reliability, such
as autonomous vehicles, adoption will depend on the regulation and political efforts in each
market.
For the next decades much of the attention will be focused on smart cities applications and
autonomous vehicles. Real-time automatic traffic analysis integrated with other city systems has
applications in public safety and in traffic and emergency management, so it is expected to become
more popular in the coming years. In the case of autonomous vehicles, a lot of expectation was created
due to significant technological improvements and efforts from companies like Google that brought
the subject into the mainstream media. Traditional automakers like Volvo have announced that they
will release autonomous vehicles for testing with real drivers before 2020. Nevertheless, even after
regulatory approval and the release of the first commercial solutions, adoption is expected to happen
gradually for many reasons. Most notably, the sensors alone used in current autonomous vehicle
solutions cost several hundred thousand US dollars. The entrance of new players targeting this new
and very valuable market may contribute to the development of solutions and equipment and to a
reduction in costs.
References
[1] Wren, C. R., Azarbayejani, A., Darrell, T., Pentland, A. P. Pfinder: Real-time tracking of the
human body. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1997, Vol. 19,
780-785.
[3] Szolgay, D., Benois-Pineau, J., Mégret, R., Gaëstel, Y., Dartigues, J. F. Detection of moving
foreground objects in videos with strong camera motion. Pattern Analysis and Applications,
2011, Vol. 14, 311-328.
[4] Taale, H., Hoogendoorn, S., van den Berg, M.,De Schutter, B. Anticiperende
netwerkregelingen, NM Magazine, 2006, Vol. 1, 22-27.
[5] Lai, A. H., & Yung, N. H. Lane detection by orientation and length discrimination. IEEE
Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 2000, Vol. 30, 539-548.
[6] Wang, Y., Teoh, E. K., & Shen, D. Lane detection and tracking using B-Snake. Image and
Vision computing, 2004, Vol. 22, 269-280.
[7] Zadeh, M. M., Kasvand, T., Suen, C. Y. Localization and recognition of traffic signs for
automated vehicle control systems. In Intelligent Systems & Advanced Manufacturing, 1998,
272-282.
[8] Rojas, J. C., Crisman, J. D. Vehicle detection in color images. IEEE Conference on
Intelligent Transportation System, 1997.
[9] Betke, M., Haritaoglu, E., Davis, L. S. Real-time multiple vehicle detection and tracking from
a moving vehicle. Machine vision and applications, 2000, Vol 12, 69-83.
[10] Bucher, T., Curio, C., Edelbrunner, J., Igel, C., Kastrup, D., Leefken, I., von Seelen, W.
Image processing and behavior planning for intelligent vehicles. IEEE Transactions on
Industrial Electronics, 2003, Vol. 50, 62-75.
[11] Sun, Z., Bebis, G., Miller, R. On-road vehicle detection: A review. IEEE Transactions on
Pattern Analysis and Machine Intelligence, 2006, Vol. 28, 694-711.
[12] Geronimo, D., Lopez, A. M., Sappa, A. D., Graf, T. Survey of pedestrian detection for
advanced driver assistance systems. IEEE Transactions on Pattern Analysis and Machine
Intelligence, 2010, Vol. 32, 1239-1258.
[13] Viola, P., Jones, M. J., Snow, D. Detecting pedestrians using patterns of motion and
appearance. Ninth IEEE International Conference on Computer Vision, 2003, 734-741.
[14] Schaack, S., Mauthofer, A., Brunsmann, U. Stationary video-based pedestrian recognition for
driver assistance systems. In Proceedings of 21st International Technical Conference on the
Enhanced Safety of Vehicles, 2009.
[15] Alonso, I. P., Llorca, D. F., Sotelo, M. Á., Bergasa, L. M., Revenga de Toro, P., Nuevo, J.,
Garrido, M. G. Combination of feature extraction methods for SVM pedestrian detection.
IEEE Transactions on Intelligent Transportation Systems, 2007, Vol. 8, 292-307.
[16] Kamijo, S., Matsushita, Y., Ikeuchi, K., Sakauchi, M. Traffic monitoring and accident
detection at intersections. IEEE Transactions on Intelligent Transportation Systems, 2000,
Vol. 1, 108-118.
[17] Hausknecht, M., Au, T. C., Stone, P., Fajardo, D., Waller, T. Dynamic lane reversal in traffic
management. IEEE Conference on Intelligent Transportation Systems (ITSC), 2011, 1929-
1934.
[18] Li, L., Chen, L., Huang, X., Huang, J. A traffic congestion estimation approach from video
using time-spatial imagery. In Intelligent Networks and Intelligent Systems, 2008, 465-469.
[19] Yang, Z., Meng, H., Wei, Y., Zhang, H., Wang, X. Tracking ground vehicles in heavy-traffic
video by grouping tracks of vehicle corners. In Proc. of the IEEE ITSC, 2007, 396-399.
[20] Chintalacheruvu, N., Muthukumar, V. Video based vehicle detection and its application in
intelligent transportation systems. Journal of transportation technologies, 2012, Vol. 2.
[21] Evanco, W. M. The impact of rapid incident detection on freeway accident fatalities.
Mitretek, 1996.
[22] Sadeky, S., Al-Hamadiy, A., Michaelisy, B., Sayed, U. Real-time automatic traffic accident
recognition using HFG. In Pattern Recognition (ICPR), 2010 20th International Conference
on, 2010, 3348-3351.
[23] Devi, M. S., Bajaj, P. R. Driver fatigue detection based on eye tracking. First International
Conference on Emerging Trends in Engineering and Technology, 2008, 649-652.
[24] Chachuli, S. A. M., Ishak, K. A., Yusop, N. Vehicle Brake Light Detection Using Hybrid
Color Model. Applied Mechanics & Materials, 2014.
[25] Levinson, J., Askeland, J., Becker, J., Dolson, J., Held, D., Kammel, S., Thrun, S. Towards
fully autonomous driving: Systems and algorithms. In Intelligent Vehicles Symposium (IV),
2011, 163-168.
[26] Teichman, A., Levinson, J., Thrun, S. Towards 3d object recognition via classification of
arbitrary object tracks. In Robotics and Automation (ICRA), 2011 IEEE International
Conference on, 2011, 4034-4041.
Figure Captions
Figure 1: Background modelling using a stationary camera. a) a scene frame with many pedestrians;
b) the modelled background after analysing frames for a few seconds.