Automatic Vehicle Counting based on Surveillance Video Streams

Abel Ricardo Marcão Ribeiro and Prof. Paulo Lobato Correia

Abstract—Automatic vehicle counts are an important component of Intelligent Transportation Systems (ITS) applications, which aim to provide services related to distinct modes of transportation and traffic management, allowing users to be better informed and to make better use of the transport network. The goal of ITS is to contribute to a smarter use of transportation networks, allowing a better management of time and resources.
This paper develops a Video-based Traffic Management System to count vehicles in Traffic Surveillance Video (TSV) streams available on the Internet. The system obtains Temporal-Spatial Images (TSIs) from the TSV streams and computes a combination of features to detect the vehicles in the videos. The vehicle counts can be provided to traffic guidance applications, allowing better route selection and thus improved decision-making.
The developed system is flexible, allowing the selection of the road lanes to be counted. It is robust, adapting itself to different operational conditions in the captured scene, such as day and night time or inclement weather, without any human operator interference. It has a fast processing speed, which allows real-time vehicle counts, and includes an occlusion detection method to provide improved counts.

Index Terms—Traffic Management System, Temporal-Spatial Image, Traffic Surveillance Video, Virtual Detection Line, Occlusion Detection

I. INTRODUCTION

THE increasing number of circulating vehicles has called for the development of more efficient systems to classify and monitor road traffic. The information acquired by these methods is used to plan and manage the road network, to allow monitoring of the vehicles on the network, and to provide real-time data to guide the network's users. These systems enhance the capacity of the road network infrastructure, improving drivers' safety and traffic management efficiency.

Traffic management with real-time data gains special importance in densely populated areas, due to the high number of vehicles circulating in them, as it provides the means to better assess traffic conditions and thus be able to divert traffic in case of unpredictable events, such as car accidents, which may cause long lines of vehicles and traffic jams, eventually leading to more accidents due to the confusion originating from these events. Hence the importance of developing Intelligent Transport Systems (ITSs), which should be fast enough to reach and divert the traffic in such events.

The European Commission [1] defines ITS as advanced applications which aim to provide services related to distinct modes of transportation and traffic management, allowing users to be better informed and to make better use of the transport network.

The use of video cameras for monitoring purposes is becoming more popular over time. Hence, a video-based road traffic monitoring system is proposed in this paper. This system provides a traffic assessment through vehicle counting, using video and image processing methods. Any device, such as a GPS navigator or a cell phone, could receive the processed traffic information and use it to decide the best route to take, thus avoiding congested paths.

The system has to be robust, so that it can automatically adapt to different operational conditions in the same scene, such as weather or illumination changes along the day, without human intervention, although a human operator should also be able to adjust its parameters manually when deemed appropriate. The interface available to the human operator should allow easily choosing which lanes of the road should be analyzed, receiving the information from those locations in a simple manner, and accessing the parameters being used by the system at any time.

The parameters adapted to the distinct conditions are mainly: detection parameters used by the employed filters and the edge detector; parameters used by the false vehicle detection module to validate the objects detected as vehicles; and parameters to find possibly occluded vehicles.

The system was developed to work with low-resolution videos, which are the type typically provided online by the majority of traffic surveillance cameras. The proposal includes an occlusion detection method, ensuring the system's robustness and allowing it to achieve high accuracies in all tested operational conditions.

The remainder of the paper is organized as follows: Section II reviews the state of the art on traffic management systems, with a special focus on video processing techniques. Section III presents the proposed system, detailing the methods used. Experimental results are presented in Section IV. Section V identifies the strengths and weaknesses of the proposed technique, providing some suggestions for future work.

Abel Ricardo Marcão Ribeiro and Paulo Lobato Correia are with Instituto de Telecomunicações – Instituto Superior Técnico, Av. Rovisco Pais, 1, 1049-001 Lisbon, Portugal (e-mail: hds@lx.it.pt, plc@lx.it.pt). The authors acknowledge the support from Instituto de Telecomunicações.
II. STATE OF THE ART

Traffic management is a field primarily dedicated to improving the flow of vehicles and road safety. It involves traffic monitoring and classification by local or national roadway authorities to manage traffic flows and provide advice concerning traffic congestion. Traffic flow classification can be divided into manual, semi-automatic and automatic.

Manual classification requires the intervention of human operators, such as the observation of roads by traffic reporters or traffic patrols. These methods employ a minimal amount of technology and rely mainly on the operator. The automatic and semi-automatic classifications try to replicate the human decisions taken by manual classification.

Semi-automatic classification requires the intervention of the end-user for its efficient operation. This classification uses information such as the speed limit and the average speed of the users on the roads to determine the traffic congestion along the day. However, information such as average speed has to be voluntarily provided by a considerable number of users for an accurate estimate of the traffic on distinct weekdays.

Automatic classification does not require any assistance from a human operator for the computation of traffic information. The input information needed for this classification can be obtained through the use of devices such as inductive loops or sensors. Image and video analysis methods can also be used for automatic traffic flow classification, and they are the main focus of this paper.

Video-based classification methods usually start with a foreground estimation of the video or image, removing the background scene and thus highlighting the moving objects that are considered candidate vehicles. In this phase a background model is typically used. The next step receives the mask of the foreground estimation to detect the moving objects. A detection algorithm is used to identify the objects, generating an object list, and to extract the relevant features from them. In a third phase, the objects in this list are classified as vehicles or not, based on a set of defined characteristics that should distinguish the vehicles from any other possible moving object, thus avoiding false positives. As an output of this phase, we should have a list of vehicles allowing their counting and, depending on the algorithms used, their tracking. This generic architecture is illustrated in Fig. 1.

Fig. 1: Block diagram of a traffic management system

A. Challenges

The detection of vehicles in the foreground is crucial for video-based traffic management systems. The use of public, uncontrolled TSV streams is challenging. Loureiro et al. [2] present a comprehensive inventory of the problems arising from the use of such TSV streams for traffic monitoring. These range from undesired camera tilts to overall low-quality videos, shadows and vehicle occlusions caused by low-angle image acquisition. Possible solutions to address these challenges [2] are discussed below.

Shadows – The shadows of moving objects need careful consideration in traffic management systems. The detection of moving shadows affects the efficiency of the system, as the shadow of a moving vehicle is also moving and could therefore be detected as a different vehicle or, if detected together with the vehicle, make it seem bigger than it really is. When the shadow of a vehicle connects to other vehicles, the detection algorithm may merge them into a single object. Problems caused by shadows are detailed in [3]-[7].

The methods used in the literature to deal with shadows are divided into two types: property-based and model-based. On one hand, property-based methods use features extracted from the objects, such as geometry, color or brightness, to identify the shadow regions of the objects. On the other hand, model-based methods use a priori knowledge of the scene geometry, the foreground objects or the light sources to detect the presence of moving shadows on the objects [8].

Inclement Weather – Distinct types of weather pose different challenges to a traffic monitoring system. Rainy weather may cause puddles on the roads, which cause reflections in the video image and may lead to the detection of false vehicles. The visibility of the traffic from the cameras tends to be reduced, which is especially noticeable in foggy weather. This reduced visibility is of special importance because it hinders vehicle detection: the reduction of the pixel intensities causes a lower variation between them and reduces the efficiency of the object detection. Once more, the related literature does not provide significant specialized research for these situations [9]. To mitigate the lower efficiency caused by lower variations between pixels, the detection methods tend to be adjusted with a higher sensitivity to pixel changes. These adjustments are usually made by lowering thresholds or reducing the size of the smoothing filters used to reduce noise, such as the Gaussian filter.

The illumination changes throughout the day also cause detection problems, especially in places where there are no street lights at night. In places with no street illumination, vehicles can be detected at night by their headlights and taillights [2].

Vehicle Occlusion – Occlusion occurs when the view of an object is partially or completely obstructed by another. In the cameras used in traffic management systems, this is usually caused by the high traffic density on the road or by a poor angle of camera placement. As a result, vehicles might be missed, reducing the overall system performance. To avoid these situations in a traffic management system, the cameras should be placed on high poles over the road, which provide a better view angle of the road and the vehicles [9].
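The generic three-stage architecture of Fig. 1 can be illustrated with a minimal sketch. The toy implementation below is an assumption-laden stand-in, not the system proposed in this paper: it uses a per-pixel median as the background model, simple frame differencing for foreground estimation, a naive flood-fill labeling for object detection, and an arbitrary area threshold for the classification stage.

```python
import numpy as np

def label_blobs(mask):
    """Detection stage: 4-connected component labeling via flood fill."""
    labels = np.zeros(mask.shape, dtype=int)
    current = 0
    for i in range(mask.shape[0]):
        for j in range(mask.shape[1]):
            if mask[i, j] and labels[i, j] == 0:
                current += 1
                stack = [(i, j)]
                while stack:
                    y, x = stack.pop()
                    if (0 <= y < mask.shape[0] and 0 <= x < mask.shape[1]
                            and mask[y, x] and labels[y, x] == 0):
                        labels[y, x] = current
                        stack += [(y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)]
    return labels, current

def count_vehicle_candidates(frames, diff_thresh=30, min_area=4):
    """Run the three stages of Fig. 1 over a stack of grayscale frames."""
    # Stage 1: foreground estimation against a median background model.
    background = np.median(frames, axis=0)
    counts = []
    for frame in frames:
        mask = np.abs(frame.astype(float) - background) > diff_thresh
        # Stage 2: detect moving objects as connected components.
        labels, n = label_blobs(mask)
        # Stage 3: keep only objects large enough to be vehicle candidates.
        areas = [np.sum(labels == k) for k in range(1, n + 1)]
        counts.append(sum(a >= min_area for a in areas))
    return counts
```

A real system would replace each stage with the more robust components discussed in this section (an adaptive background model, shadow handling, and richer per-object features), but the data flow between the stages is the same.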
B. Motion-estimation-based vehicle detection methods

Vehicles are usually detected in TSV streams by finding objects with significant motion. There are several researched methods, which include the Gaussian scale mixture model method [10], the interframe difference method [11], object-based segmentation [12] and background subtraction methods [13]-[15].

The detection method used in this paper is based on Virtual Detection Lines (VDLs) and a modified background estimation method developed in this paper. In this method, Temporal-Spatial Images (TSIs) are generated using the luminance values of the pixels of the moving objects which cross the VDLs [16]-[19]. Afterwards, the moving objects are detected when they cross these VDLs by applying background subtraction to the TSIs. This background subtraction uses the background estimation proposed in this paper. VDLs and a TSI with the detected moving objects are illustrated in Fig. 2, on the left and right, respectively.

Fig. 2: Examples of VDLs and TSI

III. PROPOSED SYSTEM

The system proposed in this paper is a Video-based Traffic Management System, which uses an image processing algorithm to estimate the number of vehicles from uncontrolled traffic monitoring video cameras. The system has to be robust enough to handle distinct operational conditions, such as night time or inclement weather, and also be able to minimize the number of counting errors due to occlusions.

TSV streams tend to have low resolutions, usually around 200x200 pixels or less, and the acquiring cameras are installed far from the road lanes; thus, the visualized vehicles mostly appear as small objects with a minimal amount of detail. Therefore, the methods used in this work had to rely on object features such as area, position and moving direction to decide whether an object should be considered a possible vehicle. These features have to be extracted from the TSI generated from the video sequence provided by the camera, according to the chosen VDLs of the road lane.

A. System implementation

The system follows a sequence of steps to acquire the features that allow it to obtain a list of vehicles.

Initialization – System initialization consists of using the application's GUI to draw the VDLs on the TSV's initial frame, from which the TSIs will be automatically created. The user can draw an unlimited number of VDLs, in order to analyze each road lane individually or all the lanes together. There are no limitations of direction, angle or length. In order to obtain the best results, the VDLs should be drawn perpendicularly to the road lanes.

TSI Generation – The TSIs of the VDLs drawn in the previous step are generated individually. The length of the TSI depends on the number of frames present in the video under analysis, and the width of the TSI depends on the length of the corresponding VDL, as illustrated in Fig. 3.

Fig. 3: VDL and corresponding TSI

Background Estimation & Subtraction – A new method was developed to find the best estimate of the background of the TSI. Traditionally, the background estimation of a TSI is performed by searching for the column of the TSI with the lowest variance and using this column as the background subtracted from the whole TSI. The rationale is that the background of the TSI is usually the road pavement, whose pixels should have nearly constant intensity, and thus this lower variance approach can find it. This approach works correctly when the road present in the TSI is just asphalt. However, when there are road surface markings, this rationale does not hold true most of the time. If, for instance, one of these road surface markings becomes hidden by a dark vehicle, the lower variance approach will choose one of the columns of the TSI where the marking is hidden as the background, instead of a column where the pavement markings are present. This provides the wrong background, as illustrated in Fig. 4 on the right.

The approach proposed in this paper uses a histogram of the TSI to give a score to each pixel's intensity. The score is based on the frequency of that intensity in the TSI: more frequent intensities have higher scores. Hence, all the TSI pixels are given a score following equation (1):

$V_I(x, y) = h(I(x, y))$   (1)

Where $I(x, y)$ is the matrix of the TSI, $V_I(x, y)$ is the score matrix of all the pixels, and $h$ is a function based on the histogram that gives higher scores to the intensity values with higher frequency.

To acquire the total score of each column, the sum of each column of $V_I(x, y)$ is computed, as defined by equation (2):

$v_x = \sum_{i=1}^{M} V_I(x, i)$   (2)
Where $v_x$ is a vector with the scores of all the columns.

Finally, the column of $V_I$ with the highest score is obtained through equation (3):

$v_{s}^{*} = \max_{x} v_x$   (3)

The column with index $s$ has the highest score and is thus the column of the TSI whose pixels are the most frequent in the whole TSI. Hence, it has the highest probability of being a good representation of the background of the VDL. The difference between both approaches on a road with surface markings is illustrated in Fig. 4.

Fig. 4: Left: approach developed in this paper; Right: lower variance approach.

Fig. 5: Graph of the day mode (Canny low threshold versus average TSI pixel intensity).
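The histogram-based background estimation of equations (1)-(3) can be sketched in a few lines of NumPy. The layout assumed below, with each TSI column holding the VDL's pixel line at one frame, is an assumption for illustration; the paper only states that one TSI axis follows time and the other the VDL length.

```python
import numpy as np

def estimate_background_column(tsi):
    # tsi: 2-D uint8 array; assumed layout: one column per frame,
    # each column holding the pixel line sampled along the VDL.
    hist = np.bincount(tsi.ravel(), minlength=256)  # intensity frequencies h
    scores = hist[tsi]                    # eq. (1): V_I(x, y) = h(I(x, y))
    column_scores = scores.sum(axis=0)    # eq. (2): v_x = sum_i V_I(x, i)
    return int(np.argmax(column_scores))  # eq. (3): column s with highest score

def subtract_background(tsi):
    # Subtract the estimated background column from every column of the TSI.
    s = estimate_background_column(tsi)
    bg = tsi[:, s:s + 1].astype(np.int16)
    return np.abs(tsi.astype(np.int16) - bg).astype(np.uint8)
```

Because the score favors the most frequent intensities over the whole TSI, a column containing a visible road marking still scores high as long as the marking appears in most frames, which is precisely the case the lower variance approach gets wrong.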
Edge detection – The edge detection step tries to find the moving objects in the TSI. It works on the modified TSI, which is the result of the background subtraction of the previous step. This modified TSI contains the moving foreground objects, which include the moving vehicles on the road.

The Canny edge detector was chosen to extract the edges of the foreground-only TSI due to its resilience to image blur, arising from fog and other adverse weather conditions, and due to the high correlation between the contrast and the edges in an image.

Fig. 6: Graph of the NBSM and NFSM (Canny low threshold versus average TSI pixel intensity, for the back side and front side).
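The two-threshold behavior at the heart of the Canny detector can be sketched in pure NumPy. The function below is a simplified stand-in for Canny's hysteresis stage only, shown to make the low/high threshold interplay concrete; it omits Gaussian smoothing and non-maximum suppression, uses a wrap-around dilation as a shortcut, and its threshold values are illustrative. The 1:3 low-to-high ratio follows the one reported in this paper.

```python
import numpy as np

def double_threshold_edges(modified_tsi, low):
    """Simplified hysteresis thresholding on gradient magnitude (not full Canny)."""
    high = 3 * low                       # 1:3 low-to-high threshold ratio
    img = modified_tsi.astype(float)
    gy, gx = np.gradient(img)
    mag = np.hypot(gx, gy)               # gradient magnitude
    strong = mag >= high                 # definite edge pixels
    weak = mag >= low                    # candidate edge pixels
    # Grow strong edges into 8-connected weak pixels (edge tracking).
    edges = strong.copy()
    changed = True
    while changed:
        grown = edges.copy()
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                grown |= np.roll(np.roll(edges, dy, axis=0), dx, axis=1)
        grown &= weak
        changed = bool((grown != edges).any())
        edges = grown
    return edges
```

Lowering `low` (as the system does for darker TSIs) admits more weak candidates and thus more edges; raising it suppresses noise at the risk of losing faint vehicle contours.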
The Canny edge detector has two threshold parameters, which have to be adapted according to the illumination present in the video. A ratio of 1:3 between the low threshold and the high threshold presented the best results for the vehicle detection.

The value of the low threshold varies not only according to the illumination present on the road pavement, measured by the average pixel intensity of the TSI, but also according to the overall illumination of the video, measured by the average pixel intensity of a sample image taken from the video.

This led to the development of three modes:
• the Day Mode (DM);
• the Night Front Side Mode (NFSM);
• the Night Back Side Mode (NBSM).

Each of these modes has its own settings for the Canny low threshold. The Day Mode is used on bright videos, usually during the day. NFSM and NBSM are used on dark videos, usually at night. NFSM is used when the vehicles are moving towards the position of the camera; otherwise, NBSM is used. The variation of the Canny low threshold in the three modes is illustrated in Fig. 5 and Fig. 6.

The NFSM has a constant value for this threshold because vehicle detection is done slightly differently in this mode. In this case, the TSI undergoes an additional modification before applying the Canny edge detector: Otsu's method is applied to the TSI in order to highlight the headlights of the vehicles and ignore everything else. The rationale is that, due to the headlights' higher intensity, they are easier to detect than the body of the vehicle, and thus they are used for the vehicle detection. Therefore, the Canny edge detector can have a constant and high threshold, independently of the average pixel intensity of the TSI.

False Vehicle Elimination – There are situations where undesired objects are identified, causing false vehicle detections. These spurious objects usually appear due to noise or double detections of the same vehicle. In order to eliminate these undesired objects, a verification of the corresponding features, namely their area, width and height, allows deciding whether to keep them. The method proposed and developed in this paper for False Vehicle Elimination (FVE) is composed of three processes for vehicle validation, as described in the following paragraphs.

The objective of the first proposed process is to eliminate the objects which have small and unusual dimensions, namely a small width and an extremely long height, or a small height and an extremely long width. These objects are noise that reduces the vehicle counting accuracy. To do so, the width and
height of the objects are analyzed. When equation (4) is satisfied, the object is eliminated; otherwise, the object is kept.

$\left( height_i < \delta_h \,\wedge\, r < \frac{width_i}{height_i} \right) \,\vee\, \left( width_i < \delta_w \,\wedge\, r < \frac{height_i}{width_i} \right)$   (4)

Where $width_i$ is the width of object $i$ in pixels; $height_i$ is the height of object $i$ in pixels; $\delta_h$ and $\delta_w$ are the minimum height and width values for an object; and $r$ is a ratio between the width and the height of the object.

The second developed process verifies the objects' area using inequation (5). Its purpose is to reject small objects, usually noise, as vehicles. When the inequation is satisfied, the object is rejected; otherwise, the object is not rejected.

$area_{object_i} < \alpha \cdot area_{average}, \quad \forall i \in X$   (5)

Where $X$ is the set of detected objects; $area_{object_i}$ is the area of object $i$; $area_{average}$ is the computed average area of the detected objects; and $\alpha$ is an arbitrary value chosen by the user.

The third verification introduced in the FVE developed in this paper searches for duplicated objects of the same vehicle, due to multiple detections of the same vehicle. The multiple detections can usually be removed by eliminating the detected objects which are at least partially overlapped. The strategy used to tackle this is based on the comparison of the overlapping area with the total area of each of the individual bounding rectangles. When this overlapping area is larger than a given threshold, the rectangle is rejected. However, both rectangles cannot be simultaneously rejected; hence, when a rectangle is rejected, the one with the bigger area is always kept.

The overlapping area is computed using equations (6), (7) and (8):

$width = \min(x_a + width_a, \, x_b + width_b) - \max(x_a, x_b)$   (6)

$height = \min(y_a + height_a, \, y_b + height_b) - \max(y_a, y_b)$   (7)

$area_s = \max(0, width) \cdot \max(0, height)$   (8)

Where rectangles $a$ and $b$ are the bounding rectangles of the two objects being analyzed, sorted by increasing size; $(x_a, y_a)$ and $(x_b, y_b)$ are the top-left corner locations of $a$ and $b$; and $area_s$ is the overlapping area.

Considering that we are verifying whether the rectangles should be rejected, letting $area_a$ be the area of rectangle $a$, which is the rectangle with the smaller area, and $\beta$ an arbitrary value chosen by the user in the interval $[0, 1]$, we have:

$area_s > area_a \cdot \beta$   (9)

The overlapping area, $area_s$, is thus compared with $area_a$. When (9) is satisfied, the common area is larger than a percentage of the area of rectangle $a$, and thus object $a$ is rejected; otherwise, there is no rejection. Rectangle $b$ is never rejected, as it is assumed that the bigger rectangle contains the main body of the vehicle, while the smaller rectangle may be another vehicle near vehicle $b$ or a detail of vehicle $b$, which in this case must be ignored in the counting.

Occlusion detection – The detection of occluded vehicles is done using multiple VDLs. The proposed approach is partially based on the method presented in [19]. The method relies on drawing multiple VDLs at different positions along the same lane and tries to find a correspondence between the vehicles identified in the different resulting TSIs, checking for vehicles that may not be occluded in some of the TSIs in order to detect them.

The main difference between the method proposed here and the one presented by Mithun et al. [19] is the way the vehicles are associated between different TSIs.

The method proposed here is based on the position of the vehicles observed in one TSI's VDL and the expected position in the other TSI, which is estimated using the known positions of the VDLs relative to each other. With this knowledge, it is possible to correctly associate the vehicles in both TSIs and signal possible undesired objects found throughout the TSIs, for instance due to noise. This avoids early erroneous occlusion detection of vehicles, assuring that the remaining objects, which do not have a correspondence in one of the TSIs, are either occluded vehicles or other moving objects detected in the TSI. The final result of the vehicle association is illustrated in Fig. 7, where the white rectangle represents a detected occlusion.

Fig. 7: Occlusion detection using 2 TSIs

Vehicle Counting – The final step of the proposed system consists of computing the final number of vehicles which crossed the selected lane for the duration of the video sequence, based on the information provided by the TSIs analyzed thus far and their occlusion analysis. Therefore, the final number of vehicles is simply the sum of the vehicles associated between the TSIs and the occlusions which were found.

Cumulative data treatment – The data acquired from the feature extraction of the detected objects, namely their area, height and width, in multiple video sequences can be stored and used to improve the object validation, if the videos are provided by the same static video camera. Hence, this cumulative data of
the features acquired throughout distinct video sequences, when available, is used in the average values utilized across the whole system.

For each new video sequence analyzed, a selected set of values of each feature is added to the cumulative data of that feature, followed by a computation of the cumulative moving average of each feature thus far. In most video sequences, only a portion of the values of each feature is added to the cumulative moving average. The values must lie within a specified interval to be accepted. This interval is defined by the parameter $\alpha$, which is used to reject a percentage of the feature values found in the current TSI. Hence, from all the feature values collected in the current TSI, the lower and higher $\alpha\%$ of these values are rejected and thus not included in the cumulative moving average.

The reason for this rejection is that small values should be avoided because they may be extracted from objects which are just noise, while in the case of big values the objects may contain occluded vehicles.

The cumulative moving average (CMA) of each feature is afterwards computed using the following expression:

$CMA_n = \frac{x_1 + \ldots + x_n}{n}$   (10)

Where $n$ is the number of values added to the CMA and $x$ is the value of a certain feature in a certain object.

This cumulative data is used, instead of only the data provided by the current video sequence, when $n$ is above a specific value. By default, the cumulative data is used when $n$ is over 25. The assumption made is that, above such a value of $n$, the feature averages provided by the cumulative data can provide better results, when used in the FVE, than the feature averages provided by the feature values found only in the current video being analyzed.

IV. EXPERIMENTAL RESULTS

The lack of a common benchmark database to compare different vehicle counting algorithms makes the comparison of the obtained results harder [9]. Hence, the parameter used here to evaluate the performance of the implemented system is its vehicle counting accuracy. In addition, an efficiency analysis of the occlusion detection approach used in the developed system is included.

The principal causes that reduce the accuracy of the vehicle counts are:
• noise present in the video sequence due to the poor image quality provided by the LTSV cameras, causing false positive detections;
• operational conditions such as rain, bad lighting or darkness, which impose lower contrasts between the roads and the moving vehicles and thus may lower the number of vehicles detected;
• LTSV imperfections due to video acquisition or transmission problems, which cause jitter or blur in the frames of the video.

A. Day Mode Results

Table 1 summarizes the vehicle count accuracies obtained by the developed system when operating in the day mode. The video sequences used consider variable traffic conditions, from low traffic/empty roads to high/congested traffic. The table provides information on the number of video sequences used and the number of vehicles present on each of the analyzed VDLs. All the system parameters were set automatically. Each vehicle count is made separately, with no data accumulated from previous training to learn the expected feature values for the objects crossing a specific VDL.

Condition           Accuracy   Ground truth   Video sequences
Good illumination   98.7%      155            20
Jitter              94.6%      146            20
Shadowed area       95.6%      91             10
Raining             91.8%      98             11
Blurred             94.2%      206            20
Shaking camera      90.0%      129            17
Dusk                92.2%      207            20
Averages/Totals     93.8%      1032           118

Table 1: Day mode results

As expected, the accuracy is higher in video sequences with good illumination, as the contrast between the vehicles and the background tends to be higher, allowing precise object detection. The accuracy decreases in videos with jitter, mainly due to double detections of the same vehicle caused by the jitter effect on the frames and, subsequently, on the TSIs. In videos with shadowed areas, the main accuracy limitation is the selection of the best Canny threshold values, which should allow the detection of vehicles simultaneously in the shadowed and the illuminated areas. This value has to be a tradeoff between the best value for detection in each of the areas, and the desired performance cannot always be achieved.

The factor which lowered the accuracy in rainy videos was the presence of water drops on the lenses of the cameras, which may move during the video sequences, causing dynamic noise and distortions in the generated TSIs. The blurred videos usually occur due to haze on the camera, which is a situation similar to the rainy videos. However, this visual obstruction of the lenses does not move significantly during the video sequence and thus has a lower impact on the accuracy. Therefore, the main issue in both rainy and blurred videos is the loss of contrast due to the lens obstruction, which may result in more undetected vehicles and undesired vehicle occlusions.

The videos captured with a moving camera present the worst accuracy, mainly due to the difficulty of estimating the background of the TSI. When the camera shakes significantly, if the VDL overlaps road markings, these markings tend to be detected as one or more vehicles, as the background subtraction cannot completely remove them as background.
Therefore, it is preferable to draw the VDLs in areas with no road markings when cameras have a tendency to shake.

The videos captured during dusk have a lower light intensity in the image, and the vehicles tend to present shadows due to the lower position of the sun relative to the vehicles. The issues affecting the accuracy are:
• occlusions of vehicles due to the connection of the moving objects by their shadows;
• double detections of the same vehicle, due to one object being its shadow and the other its body;
• vehicles which have their headlights already turned on, leading to the detection of the reflection of the light on the pavement as another object.

The obtained accuracy is thus mainly affected by the capacity to detect occlusions in the TSIs and the ability to reject the double detections.

The occlusions found when operating in day mode are caused mainly by shadows of vehicles moving close to each other and by blur in the video, which reduces the capacity of the system to individualize the vehicles as single objects. The occlusion detection, performed by positioning two parallel VDLs 5 pixels away from each other, was able to find and correct 81.1% of the occlusions in the generated TSIs. This proves that using two VDLs for occlusion detection can serve as a tool to find the majority of the occluded vehicles.

B. Night Modes Results

Table 2 summarizes the vehicle counting results obtained by the proposed system when operating in one of the considered night modes: Night Front Side Mode (NFSM) or Night Back Side Mode (NBSM). The presentation of results follows the same structure as the previous section; therefore, each count is made separately, with no previous information about the expected object features in each VDL.

Condition                  Accuracy   Ground truth   Video sequences
NBSM – Good illumination   93.6%      111            20
NBSM – Blurred             90.6%      128            20

where the Blurred videos have actually a slightly better accuracy than the videos captured with good lighting. This shows that the use of an Otsu binarization in the NFSM, in order to just detect the headlights' reflection on the pavement, made this operation mode more robust to noise. However, this is also the operation mode with lower accuracies, mainly due to a higher number of false positive detections of vehicles. The false positive detections are typically caused by multiple detections of the same vehicles.

The same approach, with VDLs 5 pixels apart, was used in the night operation modes to detect occlusions. In the night modes, the percentage of detected occluded vehicles decreased slightly, to 71.9%, due to the lower capacity to differentiate moving objects at night, caused by the lower luminance. The number of occlusions in the NFSM is almost zero, due to the different approach using the Otsu binarization to just find the reflection of the headlights of the vehicles on the road pavement.

C. Cumulative data results

This section reports the performance evaluation computed when considering a set of 16 video sequences, all captured by the same video camera, pointing at the same road lane without any position or angle variation. These video sequences include 12 videos with good illumination and 4 videos at dusk. They are used to compare the results of vehicle counting using the cumulatively learned feature information versus the results of analyzing each video independently, i.e., without learning any previous information about vehicle features.

Two sets of performance tests were conducted to evaluate the effect of the cumulative data treatment. In both cases, two VDLs were placed at the exact same coordinates.

For the first set of experiments, 12 videos with good illumination were used. This set of experiments used 11 of the 12 videos for training and the remaining video to evaluate the results. The process was repeated 12 times in order to test all the possible combinations.

For the second set of experiments, 16 videos were used: the 12 videos with good illumination from the previous set and the 4 dusk videos. This set of experiments used 15 of the 16 videos for training and the remaining video to evaluate the
results. The process was repeated 16 times in order to test all
NBSM – Jitter 90,2% 143 20 the possible combinations. The goal is to verify if using videos
NFSM - Good 89,4% 114 20 with different operational conditions in the cumulative data
can improve the overall accuracy of the system.
lighting All the system parameters were set automatically in both
NFSM – Blurred 90,6% 180 20 methods. The results of both experiments are presented on
Table 3.
NFSM – Jitter 89,0% 135 19 Sets Accuracy Accuracy Total Average
Averages/Totals 90,6% 811 119 with with number of number of

Table 2: Night modes results untrained trained vehicles vehicles on


data data counted on cumulative
The obtained accuracies on these two modes are slightly
inferior to those obtained when operating in day mode. This is test videos data
expected due to the lower luminance levels and resulting 1 93.5% 96.9% 91 36.6
lower contrast found in the generated TSIs.
It is interesting to note the low variance in the observed 2 94.7% 97.2% 127 57.5
accuracies in most of cases, especially in the NFSM mode,
Table 3: Cumulative data results
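The idea of cumulative feature learning can be illustrated with a short sketch. The class below is a hypothetical, simplified illustration (not the paper's actual implementation, and all names and thresholds are assumptions): it accumulates the mean and standard deviation of object widths observed in previously processed TSIs, and flags an object that is much wider than the learned average as a likely occlusion of two vehicles.

```python
# Hypothetical sketch of cumulative feature learning for TSI objects.
# Class name, the width feature, and the k-sigma threshold are
# illustrative assumptions, not taken from the paper's implementation.
import math

class CumulativeWidthModel:
    """Accumulates object-width statistics across processed videos."""

    def __init__(self):
        self.n = 0           # number of objects seen so far
        self.total = 0.0     # running sum of widths
        self.total_sq = 0.0  # running sum of squared widths

    def update(self, width):
        """Add one detected object's width (in TSI pixels) to the model."""
        self.n += 1
        self.total += width
        self.total_sq += width * width

    def mean(self):
        return self.total / self.n

    def std(self):
        # Population standard deviation from the running sums.
        return math.sqrt(max(self.total_sq / self.n - self.mean() ** 2, 0.0))

    def is_likely_occlusion(self, width, k=2.0):
        """Flag objects much wider than the learned average as occlusions."""
        if self.n < 2:
            return False  # nothing learned yet: behaves like an untrained run
        return width > self.mean() + k * self.std()

# Train on widths observed in previously analyzed videos...
model = CumulativeWidthModel()
for w in [30, 32, 28, 31, 29, 33, 30]:
    model.update(w)

# ...then an object roughly twice as wide as typical is flagged.
print(model.is_likely_occlusion(60))  # True
print(model.is_likely_occlusion(31))  # False
```

With no accumulated data the check is disabled, which mirrors the "untrained" runs in Table 3; as statistics from more videos are folded in, the decision threshold stabilizes, which is one plausible reason the trained runs count more accurately.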
The results of the first set of experiments show that the use of trained data tends to slightly improve the vehicle counting accuracy. The results of the second set of experiments also show accuracy improvements compared to the accuracy with untrained data. As a result, we can conclude that the use of cumulative data treatment tends to improve the quality of the results that can later be achieved when analyzing videos, even if captured in different conditions.

D. Computational Performance
The principal elements which affect the running time of the various steps of the developed system are the spatial and temporal resolutions, as well as the duration of the video sequence. This is due to the increase of computational resources necessary as a result of the bigger TSIs which need to be analyzed.
The tests reported in this paper were run on a personal desktop computer equipped with an Intel Core i7 CPU and 4 GB of RAM, on a 64-bit Ubuntu 12.04 operating system, which can be considered moderate computational resources. The running times presented in Table 4 were obtained considering 2 VDLs, with the second VDL being generated automatically based on the position of the first one. They include the whole processing time, from the moment the first VDL is selected by the user until the final vehicle count is displayed. The tests were made on low resolution video sequences of 15 seconds duration at 15 FPS, and on a high resolution video sequence of 19 seconds duration at 30 FPS. The set of good illumination videos was used as the test set for the low resolution videos, with one run on each video, while a single high resolution video, showing several roads, was used for all the runs, considering different positionings of the VDL.

Video resolution   Average running time   Average time per frame   Number of runs
200x200            0.357 s                1.47 ms                  20
1920x1080          10.39 s                21.64 ms                 20

Table 4: Time results

As expected, the running time increases with the resolution of the video. The main reason is the bigger size of the images in the high resolution video sequence. The higher resolution images need more resources from the system to process and to generate the TSIs. The TSIs also tend to have larger dimensions, because there are more pixels detailing each road lane crossed by the VDLs. The high resolution video used also has a higher FPS rate than the low resolution videos, which further increases the running time. However, the running times are still significantly lower than the duration of the videos being analyzed: around 42 times lower for the low resolution videos and around 2 times lower for the high resolution one. This still allows the system to be used in real-time situations in both cases.

E. Results Comparison
This section provides a comparison of the performance results of the developed system with other published works. The literature related with vehicle counting tends to focus its performance tests on videos with good lighting [9]. Hence, the good lighting accuracy of the developed system is used for the comparison of results in Table 5.
The comparisons were made with four other systems. In [14], a background-subtraction-based (BSB) method with a heuristically chosen threshold is used to individualize and count the vehicles. The systems developed by Hue et al. [20] and Rashid et al. [21] use a single VDL (SVDL) approach for the detection of vehicles, while Mithun et al. [19] and the system developed in this paper use a multiple VDL (MVDL) approach.

Method            Accuracy   Vehicles counted   Ground truth
BSB [5]           86.2%      306                355
SVDL [20]         92.7%      329                355
SVDL [21]         92.7%      329                355
MVDL [19]         98.3%      349                355
Proposed system   98.6%      292                296

Table 5: Results comparison

The obtained accuracy of the developed system is clearly superior to that of the method using a BSB approach, and shows a significant improvement compared to the SVDL approaches. The system also has a slightly superior accuracy compared to the MVDL system developed by Mithun et al. [19]. It would be interesting to compare the performance of both systems on night videos; however, no tests were performed in such operational conditions by the authors of that paper.

V. CONCLUSIONS
The developed Video-based Traffic Management system achieves its goals of being an efficient and flexible method to count vehicles. It can work without any adjustment by its operator, independently of the operational conditions it encounters. The system can be used as an efficient and low cost replacement for other vehicle counting methods.
The following paragraphs describe the strengths and weaknesses of the developed system and suggest future improvements to increase its performance.

A. Strengths
The use of more than one VDL to analyze the features extracted from the TSIs allows the development of a more flexible, accurate and robust Video-based Traffic Management System. The main strengths are summarized in the following points:
• there are no limitations on the number of VDLs used on each video, nor angle and length limits on them;
• there are no limits on the resolution of the videos or their duration;
• all the parameters used in the developed system can be adapted by the operator;
• the system has good performance and robustness in all the operational conditions, especially on day mode video sequences, where it can achieve an accuracy of almost 99%; even on dark videos the lowest accuracies are around 90%;
• the vehicle count is acquired with a limited use of computational resources for videos of any size and duration. The processing time rarely surpasses 0.5 seconds for low resolution video sequences of 15 seconds, which allows this system to also be used in real-time situations.

B. Weaknesses
The performance and efficiency of the developed Video-based Traffic Management System is dependent on the quality of the TSIs it can generate. Hence, the factors which affect the quality of the TSIs are the main weaknesses found in this vehicle counting approach, which are summarized in the following points:
• cameras with continuous shaking movements cause distortions on the TSI which reduce the efficiency of the background estimation and subtraction. This situation might even make the vehicle count impossible if the shaking is extremely accentuated;
• the cameras should be positioned as high and as perpendicular to the road as possible to obtain the best performance. If the camera is too close to the ground or at an angle almost parallel to the road pavement, the system parameters may have to be adjusted manually, due to the bad angle or position of the camera;
• the light reflections at night represent a problem in the vehicle detection, due to possible double detections caused by the headlights of the vehicles;
• the shadows created by the sun, street lamps or the vehicles' lights might also cause multiple detections of the same vehicle or occlusion of closely moving vehicles.

C. Future Work
There are some aspects and features which could improve the system, with further investigation on the use of TSIs for vehicle detection and counting. These are discussed in the following paragraphs.
The TSI approach presented here for detection does not take any advantage of the color channels which may be available on the video. In this paper, the TSIs are always converted to a grayscale format for analysis. This color information could probably be used, for instance, to detect the red taillights of the vehicles at night, as an alternative way to count vehicles in scenarios with a high tendency for the occurrence of occlusions. This could be especially helpful for occlusions caused by shadows, because the presence of shadows tends not to interfere significantly with the lights emitted by the vehicles.
The width of the objects on the TSI varies according to the speed at which the object crosses the VDL and also according to the FPS rate of the camera. This width could be used to obtain an estimate of the velocity of the vehicle. To work correctly, the software needs enough long term information about the objects' features, with the camera at a constant angle and position. This could be used to know if the traffic velocity is changing and could help predict possible traffic congestion before it occurs.
This Video-based system could also be integrated with another sensing system, such as inductive loops, allowing the verification of possible malfunctions in the inductive loop count. This would provide an alternative to the tedious job that an operator may have of verifying if the inductive loops are correctly counting the vehicles crossing them. The use of a VDL on a camera pointing at the location of the inductive loop could do this instead of the human operator.

REFERENCES
[1] EU Directive 2010/40/EU of July 7th, 2010: http://eur-lex.europa.eu/legal-content/EN/TXT/?uri=CELEX:32010L0040
[2] P. Loureiro, R. Rossetti, R. Braga, "Video Processing Techniques for Traffic Information Acquisition Using Uncontrolled Video Streams" in Proceedings of the 12th International IEEE Conference on Intelligent Transportation Systems, October 2009, pp. 127-133.
[3] B. Tian, Q. Yao, Y. Gu, K. Wang and Y. Li, "Video Processing Techniques for Traffic Flow Monitoring: A Survey" in 14th International IEEE Conference on Intelligent Transportation Systems, October 2011, pp. 1103-1108.
[4] J. Wang, Y. Chung, C. Chang and S. Chen, "Shadow detection and removal for traffic images" in Proceedings of the AICI, Vol. 1, March 2004, pp. 649-654.
[5] M. Kilger, "A shadow handler in a video-based real-time traffic monitoring system" in Proceedings of the IEEE WACV, 1992, pp. 1060-1066.
[6] Z. Zhu and X. B. Lu, "An Accurate Shadow Removal Method for Vehicle Tracking" in Proceedings of the AICI, Vol. 2, October 2010, pp. 59-62.
[7] M. Vargas, S. L. Toral, J. M. Milla and F. Barrero, "A shadow removal algorithm for vehicle detection based on reflectance ratio and edge density" in Proceedings of the IEEE International Conference on Intelligent Transportation Systems, September 2010, pp. 1123-1128.
[8] Bian Jianyong, Yang Runfeng and Yang Yang, "A Novel Vehicle's Shadow Detection and Removal Algorithm" in Consumer Electronics, Communications and Networks (CECNet), 2nd International Conference, April 2012, pp. 822-826.
[9] Norbert Buch, Sergio A. Velastin and James Orwell, "A Review of Computer Vision Techniques for Analysis of Urban Traffic," IEEE Trans. on Intelligent Transportation Systems, Vol. 12, No. 3, September 2011, pp. 920-939.
[10] Z. Zhang, Y. Cai, K. Huang, and T. Tan, "Real-time moving object classification with automatic scene division," in Proc. IEEE Int. Conf. Image Process., San Antonio, TX, 2007, vol. 5, pp. 149–152.
[11] K. Park, D. Lee, and Y. Park, "Video-based detection of street-parking violation," in Proc. Int. Conf. Image Process., Comput. Vis., Pattern Recognit., Las Vegas, NV, 2007, vol. 1, pp. 152–156.
[12] S. Gupte, O. Masoud, R. F. K. Martin, and N. P. Papanikolopoulos, "Detection and classification of vehicles," IEEE Trans. Intell. Transp. Syst., vol. 3, no. 1, Mar. 2002, pp. 37–47.
[13] P. G. Michalopoulos, "Vehicle detection video through image processing: The autoscope system," IEEE Trans. Veh. Technol., vol. 40, no. 1, Feb. 1991, pp. 21–29.
[14] E. Rivlin, M. Rudzsky, M. Goldenberg, U. Bogomolov, and S. Lapchev, "A real-time system for classification of moving objects," in Proc. Int. Conf. Pattern Recognit., Quebec City, QC, Canada, 2002, vol. 3, pp. 688–691.
[15] L. Xie, G. Zhu, Y. Wang, H. Xu, and Z. Zhang, "Robust vehicles extraction in a video-based intelligent transportation system," in Proc. IEEE Int. Conf. Commun., Circuits Syst., Hong Kong, China, 2005, vol. 2, pp. 887–890.
[16] L. Anan, Y. Zhaoxuan, and L. Jintao, "Video vehicle detection algorithm based on virtual-line group," in Proc. IEEE Asia Pacific Conf. Circuits Syst., Singapore, 2006, pp. 1148–1151.
[17] L. Li, L. Chen, X. Huang, and J. Huang, "A traffic congestion estimation approach from video using time-spatial imagery," in Proc. Int. Conf. Intell. Netw. Intell. Syst., Wuhan, China, 2008, pp. 465–469.
[18] J. Wu, Z. Yang, J. Wu, and A. Liu, "Virtual line group based video vehicle detection algorithm utilizing both luminance and chrominance," in Proc. IEEE Conf. Ind. Electron. Appl., Harbin, China, 2007, pp. 2854–2858.
[19] N. Mithun, N. Rashid, and S.M. Rahman, “Detection and Classification
of Vehicles from Video Using Multiple Time-Spatial Images” in IEEE
Transactions on Intelligent Transportation Systems, Vol. 13, No. 3,
September 2012, pp. 1215-1225.
[20] Y. Hue, “A traffic-flow parameters evaluation approach based on urban
road video,” Int. J. Intell. Eng. Syst., vol. 2, no. 1, 2009, pp. 33–39.
[21] N. U. Rashid, N. C. Mithun, B. R. Joy, and S. M. M. Rahman,
“Detection and classification of vehicles from a video using time-spatial
image,” in Proc. 6th Int. Conf. Elect. Comput. Eng., Dhaka, Bangladesh,
2010, pp. 502–505.
