Traffic Analysis: Basic Concepts and Applications

Chapter · November 2018

DOI: 10.1201/9781351032742


Abstract
The transportation system is a vital element of any city and consequently one of the center points of

human society. Since motor vehicles are the primary mode of transportation for most, this system is

mainly designed for them. Traffic solutions are used to optimize the use of the road network, support

driver decisions, and detect and react to abnormal situations, among other applications. In order to

collect the information to be processed in these applications, sensors are required. This

work describes the use of video cameras as the main sensor in traffic applications and image

processing as the approach to interpret the information collected by these kinds of sensors. Concepts

of image processing techniques such as background removal, environment modeling, and pedestrian

and vehicle detection are discussed. The usage of these concepts is presented in traffic applications

such as street intersection monitoring, congestion and accident detection, driver assistance and self-

driving systems. Results and challenges are discussed, and future trends are outlined.

Introduction
Motor vehicles are the primary mode of transportation for most. Consequently, they have the most

impact in the transportation system, affecting the lives of almost every citizen in cities around the

world. The city size, population density, road network characteristics, traffic laws, traffic monitoring

and many other elements influence the traffic flow and consequently are objects of attention.

Moreover, being the most widely used means of transportation implies some consequences like the

significant number of deaths caused by motor vehicle accidents. The two most populated countries,

China and India, together account for 500,000 deaths per year. Considering figures from all countries

combined, the number of deaths exceeds one million per year. This data suggests that any effort to

improve traffic safety is worthwhile.

With regard to more basic traffic applications like street intersection monitoring and congestion

estimation, the use of physical sensors in the road is the predominant approach to detect

vehicles. Usually, inductive loop detectors (ILD) are installed in the pavement for detecting objects

with significant metal mass crossing them. The use of image processing for these purposes has gained
attention since (i) ILDs demand installation and maintenance, interrupting the use of the lanes for

some periods; and (ii) it is necessary to install an ILD in each lane of each road intersection whereas

video cameras are capable of monitoring multiple lanes at the same time. Moreover, cameras enable

more complex analysis and can be used for other purposes such as surveillance.

Improvements in real-time image and video analysis, the emergence of smart cities and self-guided

vehicle applications are amongst other factors contributing to the increased research and applications

in the field. Vehicle traffic analysis through image processing is a complex task which involves (i)

understanding the environment by detecting its features such as roads, lane regions, transit signs and

traffic lights, (ii) detection and tracking of objects of interest such as vehicles and pedestrians and (iii)

the analysis of these elements in order to extract meaningful information and support decisions. Since

the scene is usually outdoor, the applications have to face a variety of challenges including weather

conditions which affect color and texture of the objects in the scene, shadows caused by buildings,

trees and other elements in the scene, object occlusion due to camera angle, and changes in the

brightness throughout the day.

Before discussing the applications in traffic analysis, some image processing concepts are presented.

The next section addresses background modeling and subtraction, an important technique used for

segmenting the foreground objects. Section 3 describes environment modelling, for instance, the

detection of lanes and other elements necessary for traffic analysis. Sections 4 and 5 describe the

detection, tracking and analysis of vehicles and pedestrians, respectively. Applications of traffic

analysis employing image processing are discussed in Section 6. Finally, in Section 7, concluding

remarks and future prospects are outlined.

Background Removal
An important image processing technique for many applications that need to detect, segment and

classify objects is background removal. In most cases, there are objects in the scene which are of

interest and need to be identified and analyzed and others which are irrelevant and should not be
considered in the analysis. Irrespective of their relative distances to the camera, the relevant objects are

labeled as foreground and the irrelevant ones as background. The ability to separate the foreground elements from

the background is fundamental for effectively segmenting and classifying scene objects.

Since the approaches are quite different for stationary and moving cameras, they are addressed

separately in two subsections.

Stationary Camera

The first step to analyze foreground elements like vehicles and persons is to separate them from the

rest of the scene. A common approach to detect such elements relies on the premise that these scene

elements are usually in motion and their motion relative to the background objects allows them to be

separated from the scene background. In basic terms, any stationary element in the scene should be

considered part of the background and any moving element considered part of the foreground. A

simple approach to segment the foreground elements is to store an image of the scene without

foreground elements as background, denoted as B. Then, for each pixel in video frame V(t) at time t,

subtract the intensity of the background pixel at the same position and determine whether it is a pixel

from a foreground element or not, as follows:

$$F(x, y) = \begin{cases} \text{foreground}, & \text{if } |V(x, y, t) - B(x, y)| > T \\ \text{background}, & \text{otherwise} \end{cases}$$

Where B(x,y) is the background pixel intensity at (x,y); V(x,y,t) is the pixel intensity at (x,y) of the

video frame at time t and T is a tolerance threshold for comparing background and video frame pixel

intensities. It is important to note that how the subtraction operation is performed depends on the image's color model.
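As an illustration, the following is a minimal sketch of this thresholded subtraction for a grayscale video, assuming a stored background image and hypothetical file names (background.png, traffic.mp4):

```python
import cv2
import numpy as np

T = 30  # tolerance threshold for |V - B|; illustrative value

# B: a frame of the scene captured without foreground elements (grayscale)
background = cv2.imread("background.png", cv2.IMREAD_GRAYSCALE)

cap = cv2.VideoCapture("traffic.mp4")
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # |V(x, y, t) - B(x, y)| > T  ->  foreground
    diff = cv2.absdiff(gray, background)
    foreground_mask = (diff > T).astype(np.uint8) * 255
    cv2.imshow("foreground", foreground_mask)
    if cv2.waitKey(1) == 27:  # stop on Esc
        break
cap.release()
```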

However, the previous approach does not handle scene changes due to the passing of time such as

brightness variations or changes in the background configuration for any other reason. A more

sophisticated approach, proposed by Wren et al. [1], analyzes each pixel (x,y) independently, fitting a

Gaussian probability density function (pdf) based on the last n pixel values for each pixel coordinate.

In other words, given the last n video frames, what are the most probable colors for each pixel

position? The pdf is defined by the parameters mean μ and variance σ². The initial condition may
assume the pixel's intensity of the first frame as the mean and some default value for the variance. For

each new frame at time t, the mean and variance are updated as follows:

$$\mu_t = \rho I_t + (1 - \rho)\mu_{t-1}$$

$$\sigma_t^2 = \rho d^2 + (1 - \rho)\sigma_{t-1}^2$$

$$d = |I_t - \mu_t|$$

Where d is the Euclidean distance between the pixel value and the mean, and ρ is the temporal

window which determines the impact of each new video frame update on the pdf. For ρ = 1, the pdf

mean and variance are determined only by the current frame, and therefore every new frame becomes the

background. The smaller the value of ρ, the larger the number of frames used to compute the pdf. A

threshold k is used to determine whether a pixel value lies within the confidence interval of the

distribution of background pixel intensities. Pixels are classified as background or foreground

accordingly:

$$\frac{|I_t - \mu_t|}{\sigma_t} > k \rightarrow \text{foreground}$$

$$\frac{|I_t - \mu_t|}{\sigma_t} < k \rightarrow \text{background}$$
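A per-pixel sketch of this running Gaussian model is shown below; the initial variance, ρ and k values are illustrative, and the input video name is hypothetical:

```python
import cv2
import numpy as np

rho, k = 0.01, 2.5                     # temporal window and confidence threshold
cap = cv2.VideoCapture("traffic.mp4")  # hypothetical input

ok, frame = cap.read()
mu = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY).astype(np.float32)  # initial mean
var = np.full(mu.shape, 50.0, dtype=np.float32)                  # default variance

while True:
    ok, frame = cap.read()
    if not ok:
        break
    I = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY).astype(np.float32)
    d = np.abs(I - mu)
    # classify each pixel against the current model ...
    foreground = d / np.sqrt(var) > k
    # ... then update the per-pixel mean and variance
    mu = rho * I + (1 - rho) * mu
    var = rho * d ** 2 + (1 - rho) * var
    cv2.imshow("foreground", foreground.astype(np.uint8) * 255)
    if cv2.waitKey(1) == 27:
        break
cap.release()
```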

This approach has the advantage of updating the background model in real-time, adapting to changes

in the scene such as illumination and the presence of non-static background objects. Figure 1

demonstrates the outcome of employing this approach for modeling the background of a street scene.
Figure 1: Background modelling using a stationary camera. a) a scene frame with many pedestrians; b) the modelled
background after analyzing frames for a few seconds.

The simple approach presented previously is ideal to explain the concept. Nevertheless, there are

more sophisticated approaches for background modeling which can achieve a better performance, as

discussed by Piccardi [2]. Having created the background model it is possible to subtract it from the

current scene frame in order to perform the first step in foreground object segmentation. In the case of

the pixel-based approach explained previously, for each pixel in the image, the pixel is disregarded if

its value is considered to be background in that position, otherwise it is kept as a foreground element

for further analysis.

Moving Camera

When isolating foreground objects with a moving camera, the movement of these objects must be

identified and distinguished from the apparent movement of the background due to the change in

position of the camera. Separation of these movements can be performed by estimating the motion of

the camera and building up a model of the background using probability distributions similar to those

employed in the static camera scenario.

The approach taken by Szolgay et al. [3] to estimating camera motion is similar to that performed

during the MPEG-2 encoding process. The current frame is divided into a series of blocks and these

blocks are searched for in a reference frame. The displacement of each block between the current and

reference frames is known as a displacement vector or a motion vector. The motion vectors calculated

can be used to create an estimate of the reference frame in the current frame. Assume that F(t-1) and

F(t) are the reference and current frames, and D contains the motion vectors for the center pixels in
each block, then an estimate of the current frame can be made using the reference frame and the

displacements: F’(t) = (F(t-1) + D). The difference between the two frames, E, is calculated using E =

|F’(t) - F(t)|. Szolgay et al. [3] create what they call a Modified Error Image, MEI, using E and a threshold.

If the value at a pixel of E is greater than a threshold then MEI(x, y) = I(x,y), else MEI(x, y) = 0. An

MEI is calculated for each frame.
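The block-matching step that produces these motion vectors can be sketched as an exhaustive search, shown below; Szolgay et al. use the MPEG-style estimator, so this minimal version with a sum-of-absolute-differences criterion is only illustrative:

```python
import numpy as np

def motion_vector(ref, cur, y, x, block=16, search=8):
    """Find the displacement (dy, dx) of the block of `cur` anchored at (y, x)
    by exhaustively searching a window of `ref`, minimizing the sum of
    absolute differences (SAD)."""
    patch = cur[y:y + block, x:x + block].astype(np.int32)
    best_sad, best_dv = None, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            yy, xx = y + dy, x + dx
            if yy < 0 or xx < 0 or yy + block > ref.shape[0] or xx + block > ref.shape[1]:
                continue  # candidate block falls outside the reference frame
            cand = ref[yy:yy + block, xx:xx + block].astype(np.int32)
            sad = int(np.abs(patch - cand).sum())
            if best_sad is None or sad < best_sad:
                best_sad, best_dv = sad, (dy, dx)
    return best_dv

# e.g. dv = motion_vector(prev_gray, cur_gray, 64, 64)
```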

The background model is created using the previous n frames (n = 15 in Szolgay et al. [3]). When a new

frame is acquired the oldest is removed. The displacements for each of the stored frames are updated

so that they are relative to the newest frame. A probability density function is created which is used to

identify when pixels in the current MEI belong to the background or an independently moving object

which can be passed to an identification stage.

Environment Modeling
Traffic standardization predates the automobile industry. In the 19th century horse-drawn vehicles

were the most used mode of transport. The first traffic light, for instance, was installed in 1868 in

London, as pointed out by Taale et al. [4]. Nevertheless, it was only at the beginning of the 20th century

that the long transition from horse-drawn vehicles to automobiles started. From that period to the

present, many traffic rules have been created to standardize the use of such vehicles and improve

safety. Among these standards are the signs in the street that specify where the vehicles must travel,

signs indicating speed limits and other rules for that road, and traffic light systems controlling the flow.

Therefore, the detection and analysis of these elements is fundamental to understanding the traffic

situation and for making decisions in the many possible applications.

The existence of traffic standards facilitates the understanding of traffic conditions by humans and

therefore by automated computer systems. Just like humans, computer systems may be able to

recognize these signs and the information they convey. This section describes the concepts and some

approaches for such a challenging task.


Lane Detection

Street lanes are a traffic standard that specifies where a vehicle must travel and which other lanes run

in the same or the opposite direction. There are two major issues related to detecting lane markings:

they are not always clearly visible due to paint quality and natural wear, and the geometry of the

markings cannot be used as a discriminating factor, as there is no governing standard in this aspect, as

pointed out by Lai and Yung [5].

Lai and Yung [5] proposed an approach to detect the lanes in the road using a stationary camera. First

the background is estimated using a sequence of frames. The background is expected to have all

visible lanes since there are no vehicles in the image. Then, Sobel edge detection is used to detect

lines. In the next step a transformation is performed, based on camera parameters such as height, tilt

angle and focal length, in order to compensate for distortions caused by perspective, resulting in a flat

image. Finally, the lines detected in the image are clustered and discriminated by orientation and

length in order to determine which ones are related to road lines.

For on-road applications such as driver assistance systems and autonomous vehicles, Wang et al. [6]

modeled lanes using cubic b-splines in order to support curved road models. Images of the road

collected by a camera placed in the car are divided horizontally into sections. The Hough transform is

used to detect the vertical straight lines of the lane markings in each section. By connecting the

straight lines of each section it is possible to determine the curve of the road. Assuming the road

boundaries are parallel, by projecting the intersection of the boundaries on each side it is possible to

determine the vanishing points of the scene, a necessary parameter to handle perspective. Combining

the curve and perspective parameters, splines are defined and used to model the boundaries and the

mid-line of the lane.
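The per-section straight-line detection can be sketched with OpenCV's probabilistic Hough transform, as below; the frame name, the number of sections and all parameters are illustrative, and the b-spline fitting itself is omitted:

```python
import cv2
import numpy as np

frame = cv2.imread("road.png")          # hypothetical on-road camera frame
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
edges = cv2.Canny(gray, 50, 150)

h = frame.shape[0]
n_sections = 4                          # horizontal slices of the image
for i in range(n_sections):
    band = edges[i * h // n_sections:(i + 1) * h // n_sections, :]
    # segments found here approximate the lane marking inside this section;
    # connecting the segments across sections traces the curve of the road
    lines = cv2.HoughLinesP(band, 1, np.pi / 180, threshold=40,
                            minLineLength=20, maxLineGap=10)
    if lines is not None:
        print(f"section {i}: {len(lines)} line segments")
```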

Traffic Sign Detection and Recognition


The previous section discussed the importance of lane detection in traffic analysis. This section

discusses the detection, analysis and recognition of another group of important elements in the scene:

transit signs. Driver assistance systems can detect and recognize signs and notify the driver in the case

of a prohibited maneuver. Moreover, in the case of autonomous vehicle applications, the

detection, recognition and interpretation of traffic signs are mandatory in the absence of a priori

knowledge of the environment.

Solutions to recognize traffic signs face similar challenges compared to the recognition of other traffic

elements in the environment. Weather and time of day affect the brightness of the scene and

consequently scene objects’ color and texture. Like road lanes, traffic sign appearance is degraded by

weathering. Shadow and occlusions may be caused by other objects in the scene such as buildings,

trees, posts and wires.

Fortunately, traffic sign design respects restrictive rules. Usually, the signs use simple geometric

shapes like triangles, circles, diamonds and octagons, as well as different colors to distinguish them from

each other. The sign colors, mostly red and yellow, were selected in such a way that they are easily

noticeable in a natural environment by humans. These definitions also facilitate detection and

recognition by computers.

Zadeh et al. [7] proposed an approach that employs a supervised learning method, in

which a dataset composed of manually labeled samples is used to train an algorithm, to detect and

recognize traffic signs. The first step filters image pixels, removing non-sign pixels based on their

colors. The algorithm was trained using samples of the traffic signs in different light conditions. Using

the color distribution in the signs in these different conditions it is possible to remove the pixels with

values outside the distribution. In the next step, edge detection is performed and straight lines and

curves are detected. Then, these connected lines are approximated to known classes of polygons and

the shape is identified. Finally, by analyzing the shape and the color ratios in subregions of the image, a traffic

sign is assigned to the given candidate.
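The color-filtering step can be sketched with fixed HSV thresholds, as below; Zadeh et al. learn the color distribution from labeled samples instead, so the ranges here are only illustrative and the scene file name is hypothetical:

```python
import cv2

img = cv2.imread("scene.png")               # hypothetical street scene
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)

# red hue wraps around 0 in OpenCV's 0-179 hue range, so two bands are needed
lower = cv2.inRange(hsv, (0, 80, 60), (10, 255, 255))
upper = cv2.inRange(hsv, (170, 80, 60), (179, 255, 255))
red_mask = cv2.bitwise_or(lower, upper)

# keep only pixels with sign-like colors; everything else is discarded before
# the edge detection and shape analysis steps
candidates = cv2.bitwise_and(img, img, mask=red_mask)
```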


Vehicle Detection
As expected, when analyzing traffic for any reason, vehicles are the components that deserve the most

attention. Independent of the application, analyzing vehicles through image processing can be

separated into two steps: vehicle detection and vehicle tracking. The first step is responsible for the

first assignment of identification to a vehicle in the scene. Usually, an identification number (ID) and

the initial coordinates are assigned to that vehicle. Detected vehicles are tracked until they leave the

scene in the second step. When determining the current coordinate of a given vehicle, the past

coordinates and the motion pattern might be used to improve the accuracy and performance.

Traffic is usually analyzed by the city infrastructure management, extracting useful information about

the traffic condition in order to support decisions, detect incidents, congestions and other undesired

situations that demand rapid reaction, or by onboard computer systems in vehicles in order to support

the driver or make autonomous decisions. In the former, fixed cameras are placed in strategic points

of the city to monitor roads of interest. In the latter, the cameras are placed in the vehicle for

analyzing the current road condition. The traffic scene is analyzed under considerably different

perspectives in these two situations, therefore the approaches are considerably different too and are

described separately in this work.

Stationary Camera

Considering scenes taken by stationary cameras, from an image processing point of view, vehicles

might be assigned as foreground elements since they are usually in motion in relation to the scene’s

background elements. Thus, background modeling and subtraction techniques can be used for

segmenting foreground elements, including the vehicles. Having separated the foreground elements

from the scene background, the next step is to classify them based on visual or motion

features. After classification, these elements can be analyzed in order to provide useful information

for the target applications.

Color and texture are important perceptual features for describing objects, so they are commonly used
as one of the first discriminative elements in order to determine segments of interest in images. Since

roads are in shades of gray, in some scenarios just the color is sufficient to detect and segment

vehicles with distinctive colors. Nevertheless, the majority of vehicles are in shades of gray too.

Therefore other features such as edges and shapes must be considered. However, in outdoor

environments like streets and roads, the color and texture of a given object may appear quite different

under different weather and brightness conditions and these have to be taken into account. Rojas and

Crisman [8] proposed an approach that classifies pixels as road or non-road based on

the color distribution of roads estimated using a set of images. Then, non-road pixels are grouped into

regions and rules are applied to filter out non-vehicle regions. Back-projection is used to find the region

coordinates in the ground plane and finally, in the last step, regions that collectively form vehicles are

joined.

Moving Camera

The problem of detecting objects using a moving camera is more challenging than in a fixed camera

situation since there is no clear separation between foreground and background objects. Vehicles

ahead traveling at the same speed as the vehicle hosting the camera are seen as stationary objects. Therefore,

their detection must be based on visual features that discriminate them from objects of other classes in

the scene. Continuously changing landscapes along the road and vehicles that suddenly enter the

scene with very different speeds, sizes and appearances make the task challenging. The approaches

presented in this section are fundamental for driver assistance and autonomous vehicle applications.

As discussed by Betke et al. [9], the detection of passing and distant cars is addressed employing

different methods. When other cars pass the camera-assisted car, they usually cover large portions of

the image frame, changing the brightness in those regions. This brightness change in a given region is

used to hypothesize the presence of a car in it. On the other hand, distant vehicles are detected as

rectangular objects. A feature based approach creates edge maps for vertical and horizontal edges,

then aspect ratio and correlation are used to determine whether such an object is a car or not. Bucher
et al. [10] focused on the detection of approaching vehicles for collision avoidance. Their method scans each road

lane, starting from the bottom of the video frames, seeking strong horizontal segments. Using a priori

knowledge of the range of car length in road lane positions, the approach can infer whether that

horizontal line belongs to a vehicle or not.

When it is hard to separate foreground elements from the background with the precision necessary for

the target application, stereo-vision approaches might be employed. Using a stereo camera or two

cameras placed in slightly different positions it is possible to use the same principle of human

binocular vision to estimate the distance of the objects in the field of view. The disparity-map is

computed using the difference in the coordinates of pixels in the left and right images which

correspond to the same scene point. This coordinate difference is known as disparity and when

combined with information about the cameras obtained through calibration (distance between centers

of the optical systems, focal lengths of lenses, etc.) can be used to map disparity values to depth. The

inverse mapping can be used in traffic analysis systems as discussed by Sun et al. [11]. The set of

disparity values corresponding to the range of crucial depths can be calculated. When the disparity

map for the current scene has been computed, a histogram of disparity values is calculated. The

number of peaks in the histogram within the critical range indicates the number of objects which must

be identified and processed.
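The histogram analysis can be sketched as below, assuming a rectified stereo pair (left.png, right.png are hypothetical) and an illustrative critical disparity range and peak threshold:

```python
import cv2
import numpy as np

left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)
right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

stereo = cv2.StereoBM_create(numDisparities=64, blockSize=15)
disparity = stereo.compute(left, right).astype(np.float32) / 16.0  # fixed point

# histogram of disparities inside an assumed critical range of 10-50 pixels
critical = disparity[(disparity >= 10) & (disparity <= 50)]
hist, _ = np.histogram(critical, bins=40)

# each pronounced local peak suggests one object at the corresponding depth
peaks = [i for i in range(1, len(hist) - 1)
         if hist[i] > hist[i - 1] and hist[i] > hist[i + 1] and hist[i] > 500]
print(f"objects in the critical range: {len(peaks)}")
```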

Pedestrian Detection
The previous section described the detection of the most common object on roads: vehicles. However,

pedestrians are as important as vehicles since any real-world application must respect safety

constraints. In the case of infrastructure cameras, detecting pedestrians crossing roads or streets is

useful in order to take action in the case they are in a prohibited area or even to notify drivers in some

way. The latter depends on a nonexistent infrastructure to create a communication channel between

smart city systems and vehicles, but it is already a subject of study. On the other hand, for on-road

applications, like driver assistance systems, in which the camera is placed in the car, pedestrian

detection is fundamental. In the case of autonomous vehicle applications the ability to detect
pedestrians is mandatory in order to make reliable decisions, ensuring the safety of passengers and

pedestrians.

The detection, recognition and tracking of objects in real-time in outdoor scenes is a challenging task

in image processing applications. In the case of pedestrian detection, as pointed out by Geronimo et al.
[12], the major challenges are: (i) the appearance of pedestrians varies since they can change pose, wear

different clothes, carry different objects; (ii) they must be detected in outdoor urban scenarios with

cluttered background, a wide range of illumination and weather conditions; (iii) pedestrians can be

partially occluded by other pedestrians and urban elements; (iv) they must be identified in motion and

from different viewing angles. Moreover, for most applications, the solution must be able to react in real time

to be practical in real-world applications.

Stationary Camera

Regarding segmenting pedestrians in scenes captured by stationary cameras, approaches for

background modeling and foreground object segmentation - similar to those presented in the previous

section - might be used. The fundamental difference between detecting vehicles and detecting pedestrians lies

in the classification step. In the work presented by Viola et al. [13], for instance, a cascading

classifier is employed. In this approach multiple classifiers are combined in order to estimate the final

classification, as shown in Figure 2. Usually, each classifier uses a small set of features, and several such

classifiers are combined in order to obtain a more powerful one with a lower false positive rate. In this case,

classifiers use motion pattern and appearance features. An object is classified as a pedestrian only if

all classifiers classify it positively.


Figure 2. An example of a cascading classifier for classifying an image element as a pedestrian.
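The cascade logic in Figure 2 can be sketched generically as below; the stage names are hypothetical, and the actual classifiers of Viola et al. are boosted detectors over motion and appearance features:

```python
from typing import Callable, Sequence

Classifier = Callable[[object], bool]  # returns True when the stage accepts

def cascade(stages: Sequence[Classifier], candidate: object) -> bool:
    """A candidate is a pedestrian only if every stage accepts it; most
    non-pedestrians are rejected cheaply by the early stages."""
    return all(stage(candidate) for stage in stages)

# usage sketch with hypothetical stages:
# is_pedestrian = cascade([motion_stage, appearance_stage], region)
```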

Moreover, pedestrians detected at intersections or crossing the street are useful information for drivers

that could be sent to vehicles in order to avoid accidents. Schaack et al. [14] presented a solution for

detecting and estimating pedestrians’ positions and sending the information to a screen in the vehicle through

a car-to-infrastructure (C2I) communication. For detecting pedestrians, a region of interest is

manually defined in the scene as a polygon. Comparing sequential frames, objects in motion, or

foreground objects, are separated from the background. Since vehicles and pedestrians have a distinct

pattern of movement - regarding path, direction and velocity - they can be discriminated by analyzing

their motion patterns. In the latter work, the classification is performed employing a Support Vector

Machine (SVM) using motion features as parameters. In this specific case, the advantage of not using

visual features for classification is the ability to detect not only pedestrians in the scene, but any

moving object that can cause an accident such as cyclists or animals.

Moving Camera

The detection of pedestrians using a moving camera faces the same challenges as detecting vehicles

due to the dynamic background and the absence of a clear separation between foreground and background

objects. Vehicles have a rigid body, considerable size and must travel in specific regions. On the other

hand, pedestrians appear in different poses, increasing the challenge for shape-based algorithms. They are

also smaller than vehicles, in many cases appearing at low resolution given their distance to the

camera. In addition, unlike vehicles, people can walk in most places of the scene, such as sidewalks, town
squares, shoulders and roads. Consequently, pedestrians can be occluded by many different objects in

the scene and may appear in the scene from any direction. In combination, these factors make the

pedestrian detection problem usually more complex than vehicle detection.

Alonso et al. [15] proposed an approach for person detection that uses stereo vision for

segmentation, with feature detection algorithms to create an input vector for an SVM classifier. First,

points of interest are selected in the left and right images and their distance to the camera is computed.

A clustering technique is employed to group these points in order to obtain candidate regions of

interest (ROI) that are possibly related to a person. For each ROI, a series of features are extracted

such as Canny-based features, Haar wavelets, gradient magnitude and orientation, co-occurrence

matrix, histogram of intensity differences and number of texture units. For each ROI, the features are

combined to obtain a feature vector. A database is manually created, selecting positive and negative

samples, and used to train a Support Vector Machine (SVM) classifier.
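The final classification step can be sketched with a standard SVM implementation, as below; the feature dimension and the random stand-in data are illustrative, since the real feature vectors and labels come from the manually created database:

```python
import numpy as np
from sklearn import svm

# X: one feature vector per ROI (concatenated Canny-based, Haar wavelet,
# gradient and texture features); y: 1 for pedestrian, 0 otherwise.
# Random data stands in for the manually labeled database here.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 64))
y = rng.integers(0, 2, size=200)

clf = svm.SVC(kernel="rbf")
clf.fit(X, y)

roi_features = rng.normal(size=(1, 64))  # features of a new candidate ROI
print("pedestrian" if clf.predict(roi_features)[0] == 1 else "not a pedestrian")
```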

Applications
The previous sections described approaches for environment modeling and vehicle and pedestrian

detection and tracking. The simple detection and tracking of these elements does not solve any

problem. They are the means, not the end. Nevertheless, the analysis of those elements can result in

useful information that can be used in a wide range of traffic applications. The solutions described

in this section were chosen based on their popularity or their potential for solving problems in future

applications.
Street Intersection Monitoring

Street intersections are worthy of attention in traffic analysis since streets crossing each other have

vehicles travelling in different directions. At these intersections vehicles may stay in the same street

or change their route going to another one. When there are traffic lights controlling the flow, the use

of lanes is optimized, allowing high speeds at intersections, but this increases the severity of accidents when

vehicles violate the red light. Therefore, monitoring the intersection is fundamental for punishing

drivers violating traffic laws and for detecting accidents in order to rapidly dispatch emergency

services to minimize the consequences. Rapid medical assistance is crucial for people involved in the

accident. The removal of vehicles involved in the accident is fundamental to avoid a traffic jam that

may create a bottleneck in the traffic network.

Regarding vehicle detection at intersections, inductive loop detectors (ILDs) are currently the most widely

used approach. Basically, an ILD is a physical sensor buried in the traffic lane. This sensor has a wire loop

that resonates at a constant frequency when there is no vehicle over the loop. This frequency is

monitored by a detection device. When a large metal object, such as a vehicle, moves over the loop,

the resonant frequency increases and the vehicle is detected. Since objects are detected by their metal

mass, the sensor works in the same way in different weather and lighting conditions, which are among

the factors that most affect image processing approaches. On the other hand, it is necessary to install a

sensor in each lane of the intersection. A single vehicle might activate two sensors when passing between

them, trucks may be detected as multiple vehicles in some situations, and maintenance issues are

among the drawbacks of this approach. The possibility of using a single camera to monitor multiple

lanes and the ability to perform more complex analysis, considering visual aspects of the scene,

promotes research into camera-based solutions for this issue.

Kamijo et al. [16] presented an approach to tackle intersection monitoring with the use of

image processing. In the first step, the background model of the intersection is constructed by

accumulating and averaging the pixel values of each position using an initial 20 minutes of sequential

image frames. Then, virtual sensors are set perpendicular to the incoming roads. Analyzing the
intensity variance in these virtual sensors (a rectangle area), it is possible to detect incoming vehicles.

When a vehicle is detected by a virtual sensor, the background is subtracted and the vehicle outlined.

The intersection at each time is represented by a data structure that stores the state for each 8x8 pixel

block of the scene. After the vehicle detection and segmentation, each block that is occupied by a part

of the vehicle is assigned the vehicle’s ID. By analyzing the state of each block in a sequence

of time, its motion vector can be determined. By analyzing the motion of each block in the region of a

given vehicle the motion is estimated and the vehicle tracked. This approach is employed because

simply segmenting the vehicle at each time frame cannot handle occlusions due to the camera angle or

vehicles very close to each other. The solution was able to detect vehicles at intersections with

occlusion and clutter effects at a success rate of 93%–96%.

Congestion Detection

Traffic congestion is a problem faced by many cities with high population density or limited

infrastructure. Among the various approaches, traffic control system optimization might contribute to

reducing traffic congestion in some way, saving fuel, reducing CO₂ emissions and consequently

improving the lives of millions of citizens. Congestion can be defined as a situation in which the

throughput of the road is significantly lower than expected and, consequently, so are vehicle

speeds. Detecting or predicting congestion makes it possible to take action in order to mitigate it or reduce

its consequences. Dynamic lane reversal, for instance, is an approach that increases the capacity of the

road in one direction by inverting the direction of some lanes in the other direction, as illustrated in

Figure 3. In experiments conducted by Hausknecht et al. [17], the network efficiency was increased by

up to 72% in some situations. The sooner an action is taken, the better the result may be. Thus rapid

reaction by the use of automatic detection systems monitoring multiple roads can become a tool to

attack this problem more efficiently.


Figure 3. Dynamic lane reversal. On the right, three lanes are used in one direction and just one
in the other in order to increase road capacity in the direction of interest.

In order to estimate the traffic condition on a given road, each lane can be monitored, analyzing the

speed of the cars in some manner. For instance, Li et al. [18] presented a solution in which virtual line

sensors are placed perpendicular to each lane. Using the pixel values on that virtual line in a

sequence of camera frames, a time-spatial image S is constructed where each row in the image holds the

pixel values on that line at a different time. When there is no vehicle, S is just a gray shaded rectangle.

When a vehicle crosses the line, a slice of the vehicle from each frame is copied to S and the result is

a shape. The lower the height of the shape in S, the higher the speed of the vehicle. On the other hand, the

higher the height of the shape in S, the lower the speed of the vehicle. Mapping the relationship

between shape sizes and traffic conditions it is possible to estimate traffic and detect congestions in

order to take some action.
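Building the time-spatial image S can be sketched as below, assuming a hypothetical per-lane video and an illustrative row position for the virtual line:

```python
import cv2
import numpy as np

cap = cv2.VideoCapture("lane.mp4")  # hypothetical camera view of one lane
row_y = 240                         # vertical position of the virtual line
rows = []

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    rows.append(gray[row_y, :])     # one slice of the virtual line per frame
cap.release()

S = np.stack(rows)  # time-spatial image: one row per frame
# vehicle shapes appear in S; a taller shape means the vehicle occupied the
# line for more frames, i.e. it was moving more slowly
```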

Instead of using a virtual sensor, Yang et al. [19] analyzed the corners of the vehicles. In each camera

frame a corner detector is applied and the corners are grouped based on the distance between them.

Analyzing the motion of the corners it is possible to discriminate the corners of vehicles from the

corners in the background. Each corner group is assigned to a vehicle and analyzing the coordinates of

the corners in a sequence of frames, it is possible to estimate the speed in some sense; in the case of

that work, in corner pixels moved per frame. A manual setup can map low-speed situations in pixels per

frame in order to flag probable congestion. Another approach grouping corners to detect and track

vehicles for congestion detection is presented by Chintalacheruvu and Muthukumar [20]. But, in this

case, in order to assess the speed in miles per hour, a more intuitive unit for understanding the

situation, two virtual detection zones are manually placed in the scene. The distance between the two zones
in the scene is also a parameter of the system. Analyzing the difference in time between the detection of

the same vehicle in each zone it is possible to assess the speed in distance per time unit.
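The speed computation itself is simple arithmetic, sketched below with illustrative frame rate and zone spacing:

```python
FPS = 30.0         # camera frame rate (assumed)
ZONE_GAP_M = 20.0  # measured distance between the two detection zones

def speed_mph(frame_a: int, frame_b: int) -> float:
    """Speed of a vehicle seen in zone A at frame_a and in zone B at frame_b."""
    seconds = (frame_b - frame_a) / FPS
    return (ZONE_GAP_M / seconds) * 2.23694  # m/s converted to miles per hour

print(speed_mph(100, 130))  # 20 m covered in 1 s -> roughly 44.7 mph
```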

In these works, the results assess the capacity of the proposed solutions to detect congestion

scenarios effectively. Manual setups and the inability to work during the night or in heavy rain

are some barriers to broader adoption of these systems. The existence of other solutions using

physical sensors, RFID or GPS data, which work independently of illumination and weather

conditions, also contributes to this phenomenon. Nevertheless, the possibility of using a single camera

sensor for surveillance and traffic analysis for multiple purposes promotes research and application

in the field.
Accident Detection

With much technological advancement in driver safety systems and road quality, the number of traffic

fatalities per distance travelled has been decreasing in recent years. Figure 4 illustrates this aspect for

the case of the United States. The decreasing line is the number of annual deaths per billion miles

traveled. The increasing line is the number of vehicle miles traveled. As can be seen, people are

statistically less involved in traffic accidents per distance travelled. However, the population and

number of cars increased and, consequently, the absolute number of deaths per year still increases in

many countries. In the case of the USA, the number of deaths increased to 33,561 in 2012, which is 1,082

more fatalities than in 2011. In the case of India and China, the two most populated countries, the

scenario is more dramatic. In 2011, China and India together accounted for more than

500,000 deaths.

Figure 4. Annual US traffic fatalities per billion vehicle miles travelled (decreasing line) and miles travelled (increasing
line) from 1922 to 2012.

Considering the data presented above, any effort to reduce these rates is worthwhile. An aspect that

affects the number of fatal accidents is the lag time between an accident and the arrival of medical

assistance, as mentioned by Evanco [21]. Automatic accident detection systems might be employed to

decrease this lag time.


In order to detect traffic accidents in real time, Sadeky et al. [22] demonstrated an approach that

analyzes a camera stream, estimating motion by computing optical flow for a sequence of video

frames. The center of gravity is located for each motion pattern using a Histogram of Flow Gradients

(HFG) and the distances between them are calculated. Analyzing the distances between patterns during

a time period, a probabilistic classifier is obtained for predicting the probability of an accident in the

scene. The approach was tested in a small dataset, depicting a total of 250 real scenes of traffic

accidents or abnormal vehicle events captured by traffic surveillance cameras, and achieved an

accuracy of 99.6%.
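The motion-estimation step can be sketched with dense optical flow, as below; the HFG computation and the probabilistic classifier of Sadeky et al. are omitted, and the input name is hypothetical:

```python
import cv2

cap = cv2.VideoCapture("intersection.mp4")  # hypothetical surveillance feed
ok, prev = cap.read()
prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # dense optical flow between consecutive frames (Farneback's method)
    flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    magnitude, angle = cv2.cartToPolar(flow[..., 0], flow[..., 1])
    # abrupt, spatially concentrated changes in these motion patterns are the
    # raw signal that an accident classifier would operate on
    prev_gray = gray
cap.release()
```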

Driver Assistance Systems

The first motor vehicles date from the end of the 19th century. Automatic transmission systems date

from the end of 1940s. Cruise control systems became popular in some countries in the 1970s and on-

board computer systems in the 1980s. As can be seen, motor vehicle development is heading towards

reducing the manual operations of driving. The latest advancements in this field are mainly in intelligent

driver assistance systems. Basically, a computer system, monitoring the internal and external

environment using sensors, is able to take simple actions and notify the driver of any important

information. In order to get information about the vehicle operation, driver situation and road

condition as reliably as possible, multiple types of sensors such as radar, laser and cameras are

employed. It is important to note that sensors are only responsible for getting raw information. This

information must be processed and interpreted by a computer to be useful. In the case of the data

provided by the cameras, image processing techniques are responsible for processing and extracting

information.

In order to illustrate the use of image processing in driver assistance systems, two applications were

chosen and discussed in the next subsections.


Driver Fatigue Detection

As can be seen in several statistics, driver fatigue accounts for a significant number of road accidents.

Truck drivers and others who drive a certain number of hours a day are the most common victims of

fatigue, risking their lives and those of others. Therefore, the development and adoption of systems capable

of detecting fatigue and notifying the driver in some manner could improve safety and save lives.

The most serious consequence of fatigue is the driver falling asleep. A firm closure of the eyes usually

precedes the onset of sleep and it can be detected in order to send some signal to the driver. Devi and

Bajaj [23] proposed an approach to tackle this issue by placing a camera in front of the driver. Applying

an approach to detect skin-colored pixels, the face region is defined. Analyzing the average intensity

of each image row, the eyes are detected since they have a distinctive pattern compared to the other

face elements. Having identified the eye region, analyzing the color distribution and focusing on

white, it is possible to determine whether the eye is open or closed. In the case of the eye being closed

for a number of video frames greater than a given threshold, fatigue is detected. The approach was

tested in a controlled scenario with good image quality regarding brightness and sharpness. For real-

world applications improvements are necessary to deal with low light conditions and a method to

notify the driver has to be defined.
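The frame-counting rule can be sketched as below; the threshold value is illustrative and the per-frame eye state is assumed to come from the color analysis described above:

```python
CLOSED_FRAMES_THRESHOLD = 45  # e.g. 1.5 s at 30 fps; illustrative value

closed_streak = 0

def update(eye_is_closed: bool) -> bool:
    """Feed one per-frame eye state; returns True when fatigue is detected."""
    global closed_streak
    closed_streak = closed_streak + 1 if eye_is_closed else 0
    return closed_streak >= CLOSED_FRAMES_THRESHOLD
```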

Collision Avoidance

When driving accompanied by family members or friends, it is a difficult task to remain focused on

the road all the time. It is usual to get into conversations when driving. Consequently, many rear-end

collisions are the consequence of driver distraction. Abrupt braking maneuvers by a vehicle ahead

are another cause of this type of collision. In this situation, any delay in making a decision is decisive.

Therefore, collision avoidance systems which can notify the driver when an object is approaching the

vehicle can improve safety and reduce accidents.

Laser and radar-based solutions are the most common ones because of reliability. Nevertheless, the

potential of using a video camera, a sensor that provides richer information, for multiple
applications and for dealing with more complex scenarios promotes research in image processing and

computer vision targeting future applications.

Aiming to detect braking maneuvers of vehicles ahead, Chachuli et al. [24] proposed an approach to

detect the brake lights being turned on by the use of a hybrid color model and morphological

operations. The hybrid color model combines the red color from the RGB color model and saturation

and intensity values from the HSI color model. With knowledge of the brake light position it is

possible to estimate the vehicle’s position. Even though this approach is dependent on the presence of

the brake light, it can be combined with other approaches, since in scenarios where a brake light is

visible it could provide a better response.
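A sketch of such a hybrid color filter is shown below; HSV is used here as a stand-in for the HSI model of Chachuli et al., and the thresholds and file name are illustrative:

```python
import cv2
import numpy as np

img = cv2.imread("rear_view.png")           # hypothetical frame of the car ahead
_, _, r = cv2.split(img)                    # red channel from the RGB model
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)  # HSV approximates the HSI model
s, v = hsv[:, :, 1], hsv[:, :, 2]

# bright, saturated, strongly red pixels are brake-light candidates
mask = ((r > 180) & (s > 100) & (v > 150)).astype(np.uint8) * 255

# morphological opening removes isolated noise pixels
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
```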

Also using a camera placed in the vehicle, Bucher et al. [10] proposed a method that scans each road

lane, starting from the bottom, applying image processing techniques in order to detect possible

approaching vehicles. After a vehicle is detected, its vertical position in the video frame is used to

estimate the distance in some manner. The advantage of this approach is that it is able to handle

approaching vehicles in different situations, including those stopped on the road.

Self-driving Automobile

The previous section described the applications of driver assistance systems to support drivers,

increasing safety and comfort. Since the 1980s, approaches to create autonomous motor vehicles,

completely removing the driver in some situations, have been developed. In 2004, the first DARPA

Grand Challenge - a prize competition for American autonomous vehicles - took place in order to

promote the development in the field. The entry of a large company like Google into the field brought

the topic to the mainstream media, creating high expectations. Consequently, the research

efforts in the field have been increasing in the last few years.

The potential of future commercial applications will probably motivate even more investments in the

next years. More reliable decisions in less controlled environments, for instance, in difficult weather
conditions, in situations in which other drivers make unexpected or even illegal maneuvers, and in cases

where objects suddenly appear in the scene, such as distracted pedestrians, are essential to gain people's

approval and political support.

Unlike driver assistance systems, which just support the driver with information and simple

actions, self-driving capability implies total control of the vehicle functions. The system must be able

to not only drive the car, but also use the proper signals to inform other drivers about the next

maneuver and most importantly, it must be prepared for unexpected behavior from other drivers and

people in the street. In order to accomplish that, the self-driving automobile combines a series of

sensors and systems. As expected given their rich information, video cameras are among the most

important sensors and image processing techniques are at the heart of the solutions.

Image processing solutions have an important role in the development of autonomous driving

solutions. Levinson et al. [25] and Teichman et al. [26] described an autonomous driving system

consisting of two spherical cameras and two stereo vision cameras that demands image processing to

interpret their data. Moreover, the system also has a rotating LIDAR, which uses a rotating laser system

to produce an image of the scene. Other supporting systems such as radars and GPS are also used.

However, the most significant data to the system are the images; therefore image processing is at its

core. For instance, in order to recognize objects, object shape and motion descriptor classifiers are

combined. The system is able to recognize pedestrians, cyclists and cars in real time. Many techniques

described in this work such as pedestrian, vehicle and lane recognition, brake light detection, traffic

sign recognition might be combined to compose self-driving solutions. Many improvements and new

approaches are expected to be released in this field of application, and image processing is likely to

be involved in many of them.

Conclusion
This work presented an overview of image processing approaches and applications to analyze

traffic scene images in order to extract meaningful information to support decisions in a wide range of

applications. The basic concepts of image processing techniques such as background removal,

environment modeling, and pedestrian and vehicle detection are explained and references are given for

details of each specific approach. Applications using these basic concepts and others for traffic

analysis (e.g., street intersection monitoring, congestion and accident detection, collision avoidance

systems, driver fatigue detection, self-driving automobiles) are discussed in order to give a panorama

of the field. It is important to note that this work does not cover all existing traffic applications that

employ image processing techniques, but it rather considers representative systems that demonstrate

the most common concepts and highlight the major trends in the area.

Although this work addresses several sophisticated solutions, their adoption seems likely to happen

gradually. In the case of driver assistance systems, a new solution must be well developed and reliable

in many different conditions to go to the market. There is also an economic barrier for many of these

solutions to get into the mainstream market. The automobile industry is a very competitive one; therefore the

benefits of a new solution must justify the increase in vehicle prices. This phenomenon can be

observed in the case of driver assistance systems and other more sophisticated solutions only available

in high-end and more expensive vehicles. Economic indices of the markets also affect the

configuration of the vehicles and the adoption of such solutions. For instance, seatbelts are mandatory in

every country, whereas airbags are not mandatory in many developing countries. In the case of

applications that take autonomous actions, which for safety reasons must have high reliability, such

as autonomous vehicles, adoption will depend on the regulation and political efforts in each

market.

For the next decades, much of the attention will be focused on smart city applications and

autonomous vehicles. Real-time automatic traffic analysis integrated with other city systems has

applications in public safety and traffic and emergency management, so it is expected to become more

popular in the coming years. In the case of autonomous vehicles, high expectations were created due to
significant technological improvements and efforts from companies like Google that brought the

subject into mainstream media. Traditional automakers like Volvo has announced that will release

autonomous vehicles for testing with real drivers before 2020. Nevertheless, even after regulatory

approval and the release of the first commercial solutions, adoption is expected to happen gradually

due to many reasons. Most notably, the sensors alone used in current autonomous vehicle solutions

cost several hundred thousand US dollars. The entrance of new players in the competition targeting

this new and very valuable market may contribute to the development of solutions and equipment and a

continuous drop in prices in the next decades.

References

[1] Wren, C. R., Azarbayejani, A., Darrell, T., Pentland, A. P. Pfinder: Real-time tracking of the
human body. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1997, Vol. 19,
780-785.

[2] Piccardi, M. Background subtraction techniques: a review. IEEE international conference on


Systems, Man and Cybernetics, 2004, Vol. 4, 3099-3104.

[3] Szolgay, D., Benois-Pineau, J., Mégret, R., Gaëstel, Y., Dartigues, J. F. Detection of moving
foreground objects in videos with strong camera motion. Pattern Analysis and Applications,
2011, Vol. 14, 311-328.

[4] Taale, H., Hoogendoorn, S., van den Berg, M.,De Schutter, B. Anticiperende
netwerkregelingen, NM Magazine, 2006, Vol. 1, 22-27.

[5] Lai, A. H., & Yung, N. H. Lane detection by orientation and length discrimination. IEEE
Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 2000, Vol. 30, 539-548.

[6] Wang, Y., Teoh, E. K., & Shen, D. Lane detection and tracking using B-Snake. Image and
Vision computing, 2004, Vol. 22, 269-280.
[7] Zadeh, M. M., Kasvand, T., Suen, C. Y. Localization and recognition of traffic signs for
automated vehicle control systems. In Intelligent Systems & Advanced Manufacturing, 1998,
272-282.

[8] Rojas, J. C., Crisman, J. D. Vehicle detection in color images. IEEE Conference on
Intelligent Transportation System, 1997.

[9] Betke, M., Haritaoglu, E., Davis, L. S. Real-time multiple vehicle detection and tracking from
a moving vehicle. Machine vision and applications, 2000, Vol 12, 69-83.

[10] Bucher, T., Curio, C., Edelbrunner, J., Igel, C., Kastrup, D., Leefken, I., von Seelen, W.
Image processing and behavior planning for intelligent vehicles. IEEE Transactions on
Industrial Electronics, 2003, Vol. 50, 62-75.

[11] Sun, Z., Bebis, G., Miller, R. On-road vehicle detection: A review. IEEE Transactions on
Pattern Analysis and Machine Intelligence, 2006, Vol. 28, 694-711.

[12] Geronimo, D., Lopez, A. M., Sappa, A. D., Graf, T. Survey of pedestrian detection for
advanced driver assistance systems. IEEE Transactions on Pattern Analysis and Machine
Intelligence, 2010, Vol. 32, 1239-1258.

[13] Viola, P., Jones, M. J., Snow, D. Detecting pedestrians using patterns of motion and
appearance. Ninth IEEE International Conference on Computer Vision, 2003, 734-741.

[14] Schaack, S., Mauthofer, A., Brunsmann, U. Stationary video-based pedestrian recognition for
driver assistance systems. In Proceedings of 21st International Technical Conference on the
Enhanced Safety of Vehicles, 2009.

[15] Alonso, I. P., Llorca, D. F., Sotelo, M. Á., Bergasa, L. M., Revenga de Toro, P., Nuevo, J.,
Garrido, M. G. Combination of feature extraction methods for SVM pedestrian detection.
IEEE Transactions on Intelligent Transportation Systems, 2007, Vol. 8, 292-307.
[16] Kamijo, S., Matsushita, Y., Ikeuchi, K., Sakauchi, M. Traffic monitoring and accident
detection at intersections. IEEE Transactions on Intelligent Transportation Systems, 2000,
Vol. 1, 108-118.

[17] Hausknecht, M., Au, T. C., Stone, P., Fajardo, D., Waller, T. Dynamic lane reversal in traffic
management. IEEE Conference on Intelligent Transportation Systems (ITSC), 2011, 1929-
1934.

[18] Li, L., Chen, L., Huang, X., Huang, J. A traffic congestion estimation approach from video
using time-spatial imagery. In Intelligent Networks and Intelligent Systems, 2008, 465-469.

[19] Yang, Z., Meng, H., Wei, Y., Zhang, H., Wang, X. Tracking ground vehicles in heavy-traffic
video by grouping tracks of vehicle corners. In Proc. of the IEEE ITSC, 2007, 396-399.

[20] Chintalacheruvu, N., Muthukumar, V. Video based vehicle detection and its application in
intelligent transportation systems. Journal of transportation technologies, 2012, Vol. 2.

[21] Evanco, W. M. The impact of rapid incident detection on freeway accident fatalities.
Mitretek, 1996.

[22] Sadeky, S., Al-Hamadiy, A., Michaelisy, B., Sayed, U. Real-time automatic traffic accident
recognition using HFG. 20th International Conference on Pattern Recognition (ICPR),
2010, 3348-3351.

[23] Devi, M. S., Bajaj, P. R. Driver fatigue detection based on eye tracking. First International
Conference on Emerging Trends in Engineering and Technology, 2008, 649-652.

[24] Chachuli, S. A. M., Ishak, K. A., Yusop, N. Vehicle Brake Light Detection Using Hybrid
Color Model. Applied Mechanics & Materials, 2014.

[25] Levinson, J., Askeland, J., Becker, J., Dolson, J., Held, D., Kammel, S., Thrun, S. Towards
fully autonomous driving: Systems and algorithms. In Intelligent Vehicles Symposium (IV),
2011, 163-168.
[26] Teichman, A., Levinson, J., Thrun, S. Towards 3d object recognition via classification of
arbitrary object tracks. IEEE International Conference on Robotics and Automation (ICRA),
2011, 4034-4041.

