
IET Image Processing

Review Article

Vehicle detection in intelligent transport system under a hazy environment: a survey

ISSN 1751-9659
Received on 21st April 2018
Revised 24th April 2019
Accepted on 19th September 2019
E-First on 28th November 2019
doi: 10.1049/iet-ipr.2018.5351
www.ietdl.org

Agha Asim Husain1, Tanmoy Maity1, Ravindra Kumar Yadav2


1Department of Mining Machinery Engineering, Indian Institute of Technology (ISM), Dhanbad, India
2Department of Electronics and Communication Engineering, Skyline Institute of Engineering and Technology, Greater Noida, India
E-mail: aghahusain@gmail.com

Abstract: Developing an intelligent transportation system has attracted a lot of attention in the recent past. With the growing
number of vehicles on the road, most nations are adopting an intelligent transport system (ITS) for handling issues like traffic
flow density, queue length, the average speed of the traffic, the total number of vehicles passing through a point in a specific
time interval, and so on. By capturing traffic images and videos through cameras, an ITS helps the traffic control centres in
monitoring and managing the traffic. Efficient and reliable vehicle detection is a crucial step for an ITS. This study reviews
different techniques and applications used around the world for vehicle detection under various environmental conditions based
on video processing systems. It also discusses the types of cameras used for vehicle detection and the classification of
vehicles for traffic monitoring and control, and finally highlights the problems encountered during surveillance under extreme
weather conditions.

1 Introduction

Nowadays, traffic detection is attracting a lot of researchers' attention in computer vision and intelligent transportation systems [1, 2]. The rapid upsurge in the number of vehicles over the last few years has posed serious problems for the management of the transportation system. Conditions such as accidents, blockages on roads, vehicle robberies, and traffic congestion are common problems faced in the management of traffic. The intelligent transport system (ITS) has become a dynamic area of research over the last decade. It plays an important role in handling many real-time situations, such as avoiding traffic congestion and accidents, surveillance of traffic, collection of toll tax, terrorist monitoring, and managing traffic flow under hazy environments [3], among others. That is why ITS has become a significant field of study.

Vehicle detection is the key task of the transport monitoring system, as once a vehicle has been detected, other applications can be used more effectively. Various technologies [4, 5] are being employed for detecting and categorising vehicles automatically. Different algorithms (based on shape, size, texture, colour, etc.) are used to categorise vehicles from videos using image processing [6]; this information is then transferred to the centralised controlling and monitoring system [7].

Fig. 1 shows the process followed in a video-based traffic monitoring system, which has a camera mounted on the roadside, watching over the traffic scene. The camera works as a sensor device used to capture traffic videos. The captured video images are converted into a digital form using a processor. The images in their digital form are processed and examined using image processing techniques. The extracted information can then be used by any external user, such as a traffic control centre, for traffic monitoring and controlling.

Image processing techniques are applied in a video-based traffic monitoring system in five stages: (i) image acquisition (CCTV) and digitisation, (ii) vehicle detection through background subtraction/foreground extraction, (iii) image segmentation, (iv) vehicle identification under different environmental conditions and classification, and (v) tracking.

Fig. 2 shows that, in the process of vehicle detection, the video image is processed and the region where the vehicles are present is determined. This region of interest can be a single pixel, a line of pixels, or a group of pixels [9].

Traffic parameters are obtained by comparing the vehicle detection status in the region of interest at different time instants. In a traffic surveillance system, background and illumination changes make vehicle detection a difficult task. The key problems are changing backgrounds (camera jitter, water rippling, waving trees, etc.), illumination changes (gradual or sudden light changes), and climatic changes [10]. For these issues, the foreground mask produced by a Gaussian mixture model (GMM) is used, but there is a significant chance of faulty detection. To overcome this problem, various projects are being taken up to find better vehicle surveillance methods through which the chances of false detections can be reduced.

Fig. 1  Block diagram of vehicle detection and categorisation

This paper reviews the vehicle detection process used for surveillance: Section 2 describes traffic surveillance cameras, Section 3 explains moving vehicle detection techniques and the problems encountered during surveillance, Section 4 describes the classification of vehicles for effective traffic monitoring and controlling, and Section 5 discusses the approach used to remove haziness from image sequences.

2 Traffic surveillance cameras

Traffic cameras are modern equipment that make functional use of video surveillance technology. Their footage can be seen on TV news during traffic reports. These cameras are installed on top of traffic signals, or are mounted along busy streets or at busy highway crossings. The cameras also help in recording traffic patterns for future study and in monitoring and controlling the traffic [11]. Therefore, traffic cameras are an important device of video surveillance [12], which aims to monitor a specified area for safety and security reasons. Smart traffic cameras are commonly used in many road transportation systems, like surveillance, traffic management, driver assistance and access control systems, security and law enforcement, etc. [13].

Research on vehicle detection methods through videos taken by camera started in the late 1970s [14]. In 1984, the University of Minnesota initiated a wide-area multiple video imaging detection system known as 'Autoscope' [15, 16]. It helped in reducing installation and maintenance costs and led to the development of ITS [17, 18]. Earlier, the main emphasis was given to the detection of license plates only [19], but because of rising demand, other attributes like colour, size, environment, etc. are being taken as criteria for vehicle classification [20, 21].

Traffic cameras are generally placed at common congestion points to monitor vehicle flow on roads, highways, and major routes. The monitored information then passes to the traffic control centre, where the necessary action can be taken in case of any mishap. Traffic cameras are also important in decisions concerning surveillance and future road construction and development, apart from monitoring accidents, major closures, jams on roads, etc. Generally, day-to-day traffic flows do not differ much, but in the case of a road mishap or a closure, a traffic alert can be very helpful for a time-crunched commuter. Real-time videos from specified locations are collected through surveillance cameras using Pan Tilt Zoom, Gatso, Vector camera systems, etc., and the images are then transmitted to the enforcement traffic control centre through the Internet [22–24]. Figs. 3a–d show some of the widely used cameras in traffic surveillance.

Fig. 2  Filtering acquisition, detection, and identification of vehicles [8]

Fig. 3  Traffic surveillance camera
(a) Gatso, (b) Highway agency CCTV, (c) Specs, (d) Trafficmaster [25]

One of the cameras, named Gatso, is based on a radar signal to record a vehicle's speed. The camera sends a radio signal towards the vehicle, and the signal is then reflected back by the vehicle. The period between the signal being sent and received back from the vehicle is noted. Similarly, another signal is sent and received back, and its timing is also noted. The difference in the timings enables the 'Gatso' camera to determine how quickly the vehicle is travelling between the two points. This camera takes several pictures if the vehicle crosses the specified speed limit.

There is another camera, named 'Specs', which can monitor four lanes simultaneously and is generally mounted on gantries. This type of camera is equipped with an automatic number plate recognition system. The camera takes a photograph of each vehicle that passes it, and the vehicle's data is then sent to another set of cameras mounted around 200–250 m further along the road. The average time taken by the vehicle in travelling between these two set points can be calculated and helps in estimating the speed of the vehicle (for example, a vehicle covering a 200 m camera spacing in 9 s averages about 22 m/s, i.e. roughly 80 km/h). These cameras can work night and day, and in all weather conditions (heat, wind, haze, ice, and snow), as they are fitted with infra-red sensors that capture the electromagnetic radiation [26, 27] of objects.

The precautions that are taken while installing these traffic cameras are:

• Cameras are mounted in such a way that the whole intersection area gets covered.
• A flashlight is installed for recordings at night.
• Cameras are positioned and calibrated to easily record the license plate data of violating vehicles.
• Local law enforcement agencies are consulted to know the most bothersome areas/intersections.
• Cameras are placed in environment-controlled housings to protect them from bad weather.
• Cameras are placed such that they have enough visibility and a decent view of all streets/lanes.
Fig. 4  Generalised framework of vehicle detection and recognition

In a broad sense, vehicle detection can be categorised based on [31]:

(i) prior knowledge,
(ii) motion,
(iii) wavelets,
(iv) machine learning (ML),
(v) deep learning (DL).

3.1 Prior knowledge-based methods

Various approaches to detecting a vehicle have been made based on prior knowledge. They use symmetry, colour, vertical or horizontal edges, shadow, wheels, 3D models, etc. [31, 32].

3.1.1 Symmetry: For both horizontal and vertical directions, there must be symmetry when observing from a stationary camera. The method has not been considered the finest method of detection, owing to noise.

3.1.2 Colour: Colour segmentation, along with motion, has been introduced to detect moving vehicles. Colour segmentation generally uses a split-and-merge algorithm, whereas vehicle segmentation uses frame differencing.

3.1.3 Vertical and horizontal edges: Conventional gradient-based and morphological edge detectors are the two kinds of edge detection methods. The conventional gradient-based approach uses a Sobel operator and the generalised Hough transform. Morphological edge detection uses dilation and erosion operations to spot edges, and it is better than conventional edge detectors in terms of computational cost. Apart from the above-mentioned methods, a separable morphological edge detector was proposed by Siyal, which depends less on edge directionality and increases the robustness of vehicle detection [33].

3.1.4 Shadow: As the region under any vehicle is darker compared to other regions on the road, this fact becomes the basis of vehicle detection [34]. A threshold method centred on the characteristics of the hue-saturation-value colour space was suggested by Cucchiara et al. [35]. Xua et al. [36] suggested another algorithm, which suppresses the moving cast shadow for vehicle detection.

3.1.5 Wheel: The methods proposed for vehicle detection in the daytime are not valid for nighttime. Yoneyama et al. [37] stated that methods applied in the daytime do not work properly if applied for detection at nighttime. Considering this problem, Iwasaki and Kurogi [38] defined a new method specifically for nighttime. The method calculates the distance between the front and rear wheels, which helps in finding the size of the vehicle. This method also helps in finding the lane in which the vehicle is. However, for this method, the camera needs to be installed on the roadside rather than on top of any traffic signal.

3.1.6 3D model: The vehicle's 3D pose is mapped to a corresponding model description. Dealing with occlusion is another benefit of the 3D model [39]. Various approaches to 3D modelling have been proposed, like graph matching [40], viewpoint consistency check [41], alignment-based 2D methods, indexing and invariants [42], gradient-based methods [42, 43], neural networks, self-similarity [44], etc. In all the above-mentioned methods, 2D image features like line segments, points, and conic sections are matched with corresponding 3D features of the position [39]. Müller et al. suggested a method that works on the 3D construction of a dynamic environment with a fully calibrated background for traffic scenes [45]. Lou et al. [39] proposed another method that decomposed the motion into translation and rotation. Besides that, Ghosh and Bhanu [46] suggested a 3D model mapping arrangement in which the vehicles are classified into various types like hatchback, sedan, and wagon.

3.2 Motion-based methods

3.2.1 Optical flow: This method of vehicle detection is based on optical flow calculations, whereas the above-mentioned methods use spatial features. Motion detection is one of the vital parameters in an ITS. There is relative motion between the traffic scene and the sensor, because of which the pixels in the image appear to be moving. This motion, described by a vector field, is called optical flow. Using optical flow, any vehicle can be detected autonomously from the camera. However, its computation is very complex and sensitive to noise. Meyer et al. [47] estimated a displacement vector field, which initialises a contour-based tracking process. Real-time analysis of video streams without any specialised hardware is not easy. Barron et al. [48] provide a detailed discussion of optical flow techniques in their work.

3.2.2 Background subtraction: Background subtraction, also named foreground detection, is one of the vital tasks by which many other applications, like vehicle tracking, recognition, and irregularity detection, can be realised. It is used for motion segmentation in a static scene [49]. For the detection of motion, the image is subtracted pixel by pixel from a background image, which is considered the reference image. A certain threshold level is defined; if the difference is above the threshold level, the pixel is considered foreground. The formation of the background image is known as background modelling. Some morphological post-processing, like erosion, dilation, and closing, is done after the creation of foreground pixel maps to reduce noise and improve the detected region. The reference background image is updated from time to time, and an image in which the section of the road is shown with no vehicle serves as the reference [49, 50]. There are various techniques by which background subtraction can be applied in terms of foreground detection.

Heikkila and Silven [51] gave a very simple concept of a background technique in which a pixel at a location (x, y) in the present image $I_t$ is marked as foreground if $|I_t(x, y) - B_t(x, y)| > T_h$ is fulfilled, where the pre-defined threshold is denoted as $T_h$. An infinite impulse response filter updates the background image, as shown in the equation below:

$$B_{t+1} = \alpha I_t + (1 - \alpha) B_t \quad (1)$$

where α is a constant that ranges between 0 and 1.

The construction of the foreground pixel map is followed by morphological closing and the elimination of small-sized regions [51, 52]. With the help of this simple background model, pixels corresponding to foreground moving vehicles can be detected by thresholding any of the distance functions [53]:

$$d_0 = |I_t - B_t| \quad (2)$$

$$d_2 = \sqrt{(I_t^R - B_t^R)^2 + (I_t^G - B_t^G)^2 + (I_t^B - B_t^B)^2} \quad (3)$$

$$d_\infty = \max\left(|I_t^R - B_t^R|,\ |I_t^G - B_t^G|,\ |I_t^B - B_t^B|\right) \quad (4)$$

where R stands for the red, G for the green, and B for the blue channel, and $d_0$ is a measure operating on greyscale images.
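The running-average background model of (1)–(4) is straightforward to prototype. The following is a minimal NumPy/OpenCV sketch of this scheme (not code from the surveyed papers); the input video name, the threshold TH, and the learning rate ALPHA are illustrative assumptions:

```python
import cv2
import numpy as np

ALPHA = 0.05   # learning rate alpha in (1), assumed value
TH = 30.0      # foreground threshold Th, assumed value

cap = cv2.VideoCapture("traffic.mp4")      # hypothetical input video
ok, frame = cap.read()
background = frame.astype(np.float64)      # initialise B_0 with the first frame

while True:
    ok, frame = cap.read()
    if not ok:
        break
    f = frame.astype(np.float64)
    # Eq. (3): per-pixel Euclidean distance over the R, G, B channels
    d2 = np.sqrt(((f - background) ** 2).sum(axis=2))
    fg_mask = (d2 > TH).astype(np.uint8) * 255
    # morphological opening/closing to suppress small noisy regions
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
    fg_mask = cv2.morphologyEx(fg_mask, cv2.MORPH_OPEN, kernel)
    fg_mask = cv2.morphologyEx(fg_mask, cv2.MORPH_CLOSE, kernel)
    # Eq. (1): infinite impulse response update of the background model
    background = ALPHA * f + (1 - ALPHA) * background
```

In this simplest variant the background is updated on every frame, including foreground pixels; selective updating is one common refinement.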

Another method is the GMM, which was given by Stauffer and Grimson [54] for real-time tracking. The algorithm depends upon the assumption that the background is made of animated textures like waves or wind-shaken trees. It represents every pixel with a mixture of K Gaussians. Here, the probability of occurrence of a colour at a given pixel is

$$P(I_t) = \sum_{i=1}^{K} \omega_{i,t}\, N\!\left(I_t,\ \mu_{i,t},\ \Sigma_{i,t}\right) \quad (5)$$

where $N(\mu_{i,t}, \Sigma_{i,t})$ is the $i$th Gaussian model and $\omega_{i,t}$ its weight [53].

Zivkovic [55] suggested an improved GMM using a recursive computation to continuously update the parameters of the model. This model is widely adopted by many researchers for the analysis of traffic [56–58] using a fixed number of Gaussians. Rapid changes in illumination pose difficulty in applying background subtraction in a real-world setting. Amali and co-authors [59] suggested a sample-based background subtraction method called rapid background subtraction. The whole algorithm basically uses three different techniques. In the first technique, the first two consecutive frames are taken as a background model, and this model can be updated after the threshold period. The next technique includes the classification of pixels with respect to the background pixel model and the shadow detection method. The last technique involves updating the background pixel model at random pixel locations. By using this technique, accuracy and efficiency can be greatly increased.

For online data, the subspace learning technique has been used for the background model. Oliver et al. [60] suggested a method using the concept of principal component analysis (PCA) depending upon a probability distribution function. In recent times, Bouwmans et al. [61] introduced a detailed review of robust PCA-PCP based methods for testing and ranking the existing algorithms for foreground detection. The study here is driven by the need for a robust vehicle detection and classification algorithm that can be used in a traffic monitoring system to deal with such changes. Because of critical situations in video surveillance, some researchers have introduced the concept of fuzzy logic for background subtraction.
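Zivkovic's adaptive GMM [55] is available in OpenCV as BackgroundSubtractorMOG2, which makes the mixture model of (5) easy to try on traffic footage. A minimal sketch follows; the parameter values, the input video name, and the minimum blob size are illustrative assumptions, not values from the surveyed papers:

```python
import cv2

# history, varThreshold, and detectShadows are assumed settings
mog2 = cv2.createBackgroundSubtractorMOG2(history=500,
                                          varThreshold=16,
                                          detectShadows=True)

cap = cv2.VideoCapture("traffic.mp4")  # hypothetical input video
while True:
    ok, frame = cap.read()
    if not ok:
        break
    mask = mog2.apply(frame)           # 255 = foreground, 127 = shadow
    mask[mask == 127] = 0              # drop shadow pixels before blob analysis
    n, labels, stats, _ = cv2.connectedComponentsWithStats(mask)
    for i in range(1, n):              # label 0 is the background
        x, y, w, h, area = stats[i]
        if area > 400:                 # assumed minimum blob size for a vehicle
            cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
```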
3.3 Wavelet-based methods

Since the last decade, the wavelet transform has become one of the vital tools for signal processing applications. The vehicle motion is captured as 3D spatial–temporal data from the image scene [62–64], and this is made possible by the wavelet transform. The benefits are very low computational complexity and easy application in both the spatial and temporal directions. However, these methods depend on noise and on variations in the timing of movements. Wang and Chen [63] proposed a new algorithm using the spatial–temporal wavelet transform on videos to obtain motion information of vehicles. A fusion technique of wavelet transformation and colour information features for automatic vehicle identification was proposed by Yin et al. [65].
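To make the spatio-temporal idea concrete, the sketch below applies a single-level Haar wavelet along the time axis of a stack of grey-level frames, so that the detail band responds to temporal intensity changes, i.e. to moving vehicles. This is only an illustration of the principle, not the algorithm of [63] or [65]; the frame stack and the threshold are assumptions:

```python
import numpy as np
import pywt

# frames: T greyscale frames stacked along axis 0 (placeholder data here;
# in practice these would be consecutive frames loaded with OpenCV)
frames = np.random.rand(16, 240, 320)

# Single-level Haar DWT along the temporal axis: the detail coefficients
# capture intensity changes over time at each pixel.
approx, detail = pywt.dwt(frames, 'haar', axis=0)

# Energy of the temporal detail band per pixel; large values indicate motion.
motion_energy = (detail ** 2).sum(axis=0)
motion_mask = motion_energy > 0.1 * motion_energy.max()  # assumed threshold
```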
3.4 Machine learning

ML [66] is an interesting topic that helps in understanding how a machine updates its code on its own. ML is a subset of artificial intelligence (AI), as shown in Fig. 5, which provides computers with the ability to learn without being explicitly programmed. Ordinarily, once a machine is programmed, it will not change its own code even when a new problem is encountered. The objective of ML is to focus on the development of computer programs that change when exposed to new data; the behaviour adapts to the situation even though the machine does not change its code. This is a self-learning concept: the machine learns from past scenarios, situations, experiences, etc., and so comes up with new solutions by using the concept of AI.

Fig. 5  DL is a subset of ML, which is again a subset of AI

There are different approaches to solving any problem, and so there are different ways a machine can learn. Depending on these approaches, there are three ways a machine can learn [67, 68]. The types of ML algorithms are:

(a) supervised learning (task-driven: regression/classification),
(b) unsupervised learning (data-driven: clustering/association),
(c) reinforcement learning (learns to react to an environment).

3.4.1 Supervised learning: In this algorithm, learning is continuously supervised, and it uses a known dataset, called the training dataset, to make estimates. The training dataset contains input data and the corresponding response values. From it, the supervised learning algorithm seeks to build a model that can estimate the response values for a new dataset.

3.4.2 Unsupervised learning: In the case of unsupervised learning, the algorithm draws inferences from datasets consisting of input data without labelled responses. This is simply a self-learning algorithm where a machine takes its own decisions.

3.4.3 Reinforcement learning: In this algorithm, the machine can take its own decisions depending upon its past experiences. It relies on behaviourist psychology, where the outcome to be maximised is a numerical reward. The learner is not advised which move to make; rather, it should discover which actions yield the greatest reward.

A major limitation of ML is that it cannot handle high-dimensional data well. Also, complex AI problems cannot be handled by classical ML. So, the DL concept came into existence, which handles high-dimensional data and finds the right features by itself, requiring only slight supervision from the system analyst. Fig. 6 shows a graph of the relation between accuracy and the size of the training data; DL shows better performance with an increase in the size of the training data.

Fig. 6  Graph plotted between accuracy (performance) and size of training data

3.5 Deep learning

DL [69] is processed through neural networks; it is a multi-layered neural network learning algorithm. Every deep neural network (DNN) consists of three kinds of layers: the input layer, the hidden layers, and the output layer. Between the input and output layers,
there can be n multiple hidden layers, and the processing of 'n' hidden layers is possible because of the high computational resources available these days. Fig. 7 represents a generalised model of a DNN, where two hidden layers are taken in between the input and output layers.

Fig. 7  Deep network with two hidden layers in between input and output layers [68]
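As an illustration of the Fig. 7 topology, the following is a minimal PyTorch sketch of a fully connected network with two hidden layers; the layer widths and the ten output classes are illustrative assumptions, not values from the surveyed papers:

```python
import torch
import torch.nn as nn

# Input layer -> two hidden layers -> output layer, as in Fig. 7.
model = nn.Sequential(
    nn.Linear(1024, 256), nn.ReLU(),   # hidden layer 1 (widths assumed)
    nn.Linear(256, 64),   nn.ReLU(),   # hidden layer 2
    nn.Linear(64, 10),                 # output layer: 10 assumed classes
)

x = torch.randn(8, 1024)   # a batch of 8 flattened feature vectors
logits = model(x)          # shape (8, 10); apply softmax for probabilities
```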

Over the last decade, DNNs have driven a series of leaps forward on a variety of tasks, e.g. computer vision, machine translation, voice recognition, and so forth. One of the basic components behind these achievements is the convolutional neural network (CNN) [70]. Krizhevsky et al. [71] proposed AlexNet, after which CNNs have exhibited superior performance for image classification compared with conventional 'shallow learning' strategies, and have likewise been successfully applied to object detection [72], video classification [73], segmentation [24], etc. These triumphs motivated a new line of research focused on building higher-performance CNNs, and the performance of such models has been improved by utilising deeper and broader structures. Simonyan and Zisserman [74] proposed VGGNet, encouraging the investigation of deep architectures in computer vision. Szegedy et al. [75] presented GoogLeNet, which contains Inception modules, setting a new state of the art for the ImageNet Challenge. To deal with the degradation problem of very deep networks, He et al. [76] presented a residual learning framework named ResNet that can train substantially deeper networks than those used before.

DL models are very interesting and have proved to be an outstanding solution for the detection and classification of vehicles, so we briefly overview some of the models.

3.5.1 Convolutional neural networks: A CNN is a deep discriminative network that usually looks like an ordinary artificial neural network, as mentioned in Fig. 7, where transformations are done through hidden layers. The deep architecture of a CNN is explained in [66] (refer to Fig. 3c therein), which discusses arrangements with a multi-channel image as the input [77]. To manage this complicated input, the CNN comprises a few layers of convolutions with non-linear activation functions to compute the output. As a consequence, the CNN contains local connections, whereby each region of the input is connected to a neuron in the output. Each layer applies a distinct filter and combines the results. Moreover, the CNN comprises pooling layers for subsampling. During the training stage, a CNN automatically learns the values of its filters for the given task. In its first layer, a CNN may learn to detect edges from the raw pixels. Then, in the second layer, the CNN uses the edges to detect simple shapes. In this fashion, in the subsequent higher layers, by utilising these shapes, the CNN may be able to learn higher-level features, like the facial shape of an animal present in the image. In the last layer, a classifier is used that exploits these high-level features. Deep CNNs have turned out to be very fruitful in learning task-specific features, which has given much-improved outcomes, especially on computer vision tasks, compared with contemporary ML procedures. Normally, CNNs are trained using supervised learning strategies, whereby a large number of input–output pairs is needed. However, acquiring a sufficiently large dataset has been a key challenge in applying CNNs to a new task. Likewise, CNNs have been reported to be widely used as a part of unsupervised techniques [71, 78, 79].
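A minimal PyTorch sketch of the convolution, pooling, and classifier pattern described above, in the same style as the earlier example; the filter counts, the 64 x 64 RGB input, and the six vehicle classes of Section 4 are illustrative assumptions:

```python
import torch
import torch.nn as nn

cnn = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),  # low-level edges
    nn.MaxPool2d(2),                                        # subsampling
    nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), # simple shapes
    nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(32 * 16 * 16, 6),   # classifier over 6 assumed vehicle classes
)

x = torch.randn(4, 3, 64, 64)     # batch of 4 RGB images, 64 x 64 (assumed)
logits = cnn(x)                   # shape (4, 6)
```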

3.5.2 Deep Boltzmann machine (DBM): To understand DBMs, a basic knowledge of Boltzmann machines and restricted Boltzmann machines [80] is necessary. A Boltzmann machine is a network of binary stochastic units with a defined 'energy' for the network. A representation of the BM architecture is shown in Fig. 8a. Although learning is ineffective in a shallow BM, it can be made relatively efficient in a restricted Boltzmann machine (RBM), as it does not allow connections among units in the same layer, as mentioned in Fig. 8b.

Fig. 8  Machine neural architecture
(a) Graphical representation of each undirected edge showing dependency for Boltzmann machine architecture [66], (b) Restricted Boltzmann machine architecture with stochastic neurons [66]

After training an RBM, the activities of its hidden units can be taken as the data for training an even higher-level RBM. This stacking of RBMs allows numerous layers of hidden units to be trained in an efficient manner and is considered one of the most common DL strategies. With each added layer, the overall model improves significantly.

3.5.3 Recurrent neural networks (RNNs): RNNs are often realised as long short-term memory networks (LSTMs). They are specifically useful in the modelling of text and speech-related sequence data. Fig. 9 shows the deep generative architecture [81] of RNNs, whose depth may be as much as the length of the input data sequence. Until recently, their use was limited because of the problem of the 'vanishing gradient'; although they have high potential strength, the recent literature points towards new methods of optimising generative RNNs that modify the stochastic gradient suitably.
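For completeness, a small PyTorch sketch of an LSTM consuming a frame-feature sequence, e.g. per-frame CNN features of a traffic clip; the feature size, sequence length, and binary output are illustrative assumptions:

```python
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=64, hidden_size=32, batch_first=True)
head = nn.Linear(32, 2)        # assumed binary output, e.g. vehicle/no vehicle

seq = torch.randn(4, 20, 64)   # batch of 4 sequences, 20 frames, 64 features
out, (h_n, c_n) = lstm(seq)    # out: (4, 20, 32); h_n: final hidden state
logits = head(h_n[-1])         # classify each sequence from its last state
```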

Fig. 9  Recurrent neural network [66]

3.5.4 Stacked auto-encoder (SAE): An auto-encoder is an artificial neural network that learns an efficient coding [82] of a set of data, as shown in Fig. 10. The encoded data is a compressed form of the entire dataset with reduced dimensionality. A typical auto-encoder has an input layer, a number of smaller hidden layers following it, and then an output layer. The input dataset is encoded by the small hidden layers, and the output layer rebuilds the input.

Fig. 10  Stacked auto-encoder [66]

In their study, Bourlard and Kamp [83] pointed out that if the auto-encoder architecture has just a single hidden layer of linear neurons, the solution obtained from the auto-encoder is closely related to PCA. The features found by one auto-encoder are then used to train another layer of auto-encoder, and so on; the learned weights obtained by this procedure are eventually used to initialise a DNN [84].
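A minimal PyTorch sketch of one such auto-encoder stage: a small hidden bottleneck encodes the input and a decoder reconstructs it; the layer sizes and the reconstruction loss are illustrative assumptions:

```python
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Linear(256, 32), nn.ReLU())  # compress (sizes assumed)
decoder = nn.Sequential(nn.Linear(32, 256))             # rebuild the input

x = torch.randn(16, 256)                  # batch of 16 input vectors
code = encoder(x)                         # low-dimensional representation
recon = decoder(code)
loss = nn.functional.mse_loss(recon, x)   # reconstruction error to minimise
loss.backward()

# In a stacked auto-encoder, `code` would become the training data
# for the next encoder/decoder pair, trained layer by layer.
```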

• Discussion on the implementation of deep neural algorithms for object-based instances: one of the DL algorithms for object (vehicle, human, animal, etc.) based instances is the DNN [85, 86]. Compared with various video-based vehicle detection [87, 88], classification, and low-illumination algorithms, a DNN using the 'You Only Look Once' (YOLO) [89] framework has shown significantly better results. YOLO is very fast in the identification of vehicles, processing around 45 frames per second, compared to other state-of-the-art detection systems. It uses a single convolutional network that simultaneously predicts multiple bounding boxes and the class probabilities for those boxes. YOLO trains on full images and much improves the detection performance. YOLO lags in terms of accuracy, but the model has several advantages over previously used methods of vehicle detection. Some of the practical algorithms, their methods, and purposes are mentioned in Table 1, and a worked detection example is sketched after this list.

Table 1  DL algorithms for vehicles
Algorithms | Method used | Purpose
fast region-CNN [88, 89] | selective search | narrowing down of bounding boxes
overfeat [90] | sliding window | scanning the image at multiple scales
faster region-CNN [88, 89] | region proposal network | identifying bounding boxes that must be tested
SSD (single shot detector) multibox [88] | bounding box regression | forwards the image only once for discretisation
• Apart from the above-mentioned algorithms, a large-scale dataset is of prime importance. There are various datasets available for manual and self-driving vehicles, which include ImageNet, COCO, KITTI, Cityscapes, CamVid, etc. ImageNet [90–92] is an image dataset of 14,197,122 images. This dataset is mainly used for image classification and object detection based on neural network training. The Common Objects in Context (COCO) [92–94] dataset by Microsoft is a large-scale dataset featuring over 200,000 labelled images with 80 object categories. This dataset is used for scene analysis in 2D. KITTI [95, 96] contains a 3D driving dataset of images and contains ground truth for 400 dynamic scenes. It poses challenges for stereo vision, optical flow, and visual odometry. The Cityscapes [92, 93] dataset contains 5000 fully segmented images and 20,000 partially segmented images that focus on a semantic understanding of urban street scenes. It works on pixel-level and instance-level semantic labelling. The Cambridge-driving Labelled Video Database (CamVid) [92, 93, 97] provides 700 semantically segmented images. These datasets are used to train a DNN, which is then tuned as per the application.
• Problems encountered in vehicle detection: various challenges exist in detecting a vehicle, such as continuous changes in the background, shadowing, headlight reflection, camouflage, clutter, video noise, environment, etc. [98]. Proper vehicle detection is still a challenge in the present scenario. The details of the various problems encountered are mentioned in the Appendix (Table 2).
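As a concrete example of running one of the detectors of Table 1, the sketch below uses the pre-trained Faster R-CNN model shipped with torchvision; the image path and the 0.5 score threshold are illustrative assumptions, and in the COCO label space used by this model, cars, buses, trucks, and motorcycles appear as separate categories:

```python
import torch
import torchvision
from torchvision.transforms.functional import to_tensor
from PIL import Image

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

img = Image.open("traffic_frame.jpg").convert("RGB")  # hypothetical frame
with torch.no_grad():
    pred = model([to_tensor(img)])[0]   # dict of boxes, labels, scores

for box, label, score in zip(pred["boxes"], pred["labels"], pred["scores"]):
    if score > 0.5:                     # assumed confidence threshold
        print(int(label), float(score), box.tolist())
```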
4 Vehicle classification

The vehicles in a surveillance [99] system can be classified as two-wheeler, three-wheeler, light motor vehicle, sports utility vehicle, light commercial vehicle, and heavy motor vehicle, as shown in Fig. 11. To differentiate between different types of vehicles, a feature-based optimisation using width, length, height, etc., is used [100]. These vehicles belong to different classes but may have the same colour. Apart from the dimension-based features, texture-based features are also used to improve the classification accuracy of vehicles. With the fast progression of the graphics processing unit, the computation capacity for handling an image has been extraordinarily improved, which brought the quick advance of DL. Compared with customary feature extraction algorithms, DL has better flexibility and adaptability [100]. Until the last decade, traditional methods were used in image processing for the classification of vehicles, namely the histogram of gradients along with a support vector machine, but these required a lot of calculation. Recently, more and more researchers are working on DL models for vehicle type classification, and they have become very successful in the identification and recognition of vehicles.

Regarding the classification of vehicle types, LeCun et al. [101] gave the concept of the modern structure of the CNN (LeNet-5), which was used for the classification of handwriting. After that, many developments were made on CNNs [102]; Zhang et al. used the concept of a deep CNN [103, 104], which helped to identify vehicles in regular images.

Apart from examining the procedures utilised for the classification of vehicles, feature-based classification (like size or structure) is discussed here. Over the last decade, there has been a significant change in the size or structure of two-wheelers. Now the companies are designing structures that are more stylish, robust, and masculine. Bajaj Pulsar, Suzuki, Ducati, Honda, and Harley-Davidson are a few of the companies where the sizes of two-wheelers have been increased significantly.
Table 2  Various problems encountered
S. no. | Various problems | Brief description
1. | continuous change in background | There are some cases when changes exist in the background but, due to their relevance, are considered part of the background. These movements can be regular or irregular depending upon the conditions, like traffic lights, day-to-night lighting changes or vice versa, waving trees, etc. Managing these continuous changes in the background is a challenging task.
2. | shadowing | A moving shadow is another problem on a bright sunny day, as one vehicle may cast its shadow on another vehicle. This shadow may cause false detection of the vehicle size, so this also becomes a perplexing task.
3. | headlight reflection | Headlight reflection at night is another aspect causing faulty detection when using the background differencing approach for the detection of vehicles.
4. | camouflage | Sometimes a vehicle may differ only poorly from its background, making accurate classification difficult. This is most important in surveillance applications, where camouflage is difficult for temporal differencing methods.
5. | clutter | The task of segmentation is difficult in the case of background clutter. Various models have been proposed that continuously model the cluttered background and split the moving vehicle in the foreground from the background.
6. | motion of camera | Unstable video is captured when the camera is moving. The magnitude of the slight irregular movement differs from one video to the other.
7. | environment | Moving vehicle detection becomes a very tough job when videos are taken in challenging weather conditions (snow, storm, fog, cloud, air turbulence, haze, rain, etc.).
8. | video noise | Noise is a completely unwanted signal that is commonly superimposed on video. Background subtraction for video surveillance must deal with such weak signals affected by various kinds of noise, such as sensor noise or compression artefacts.

Fig. 11  Moving vehicles classification based on attributes derived from feature space optimisation

Another segment of the classification of vehicle types is the three-wheeler. Different types of auto-rickshaws come under this category, which is used to carry passengers or goods for shorter distances. 'Vikram' from 'Scooter India' was a very famous three-wheeler at one time, but now many companies are manufacturing three-wheelers that are small structure-wise and efficient for work. The light motor vehicle is a small vehicle used to carry passengers, which includes jeeps, motor cars, taxis, etc., whereas the sports utility vehicle is a rugged vehicle that can carry both passengers and cargo; the Hummer, Ford EcoSport, and Range Rover are some examples. Another segment is light commercial vehicles, designed for light goods carriage; the Piaggio Ape is one good example of this class. The heavy motor vehicle segment, which includes trucks, buses, etc., is bigger in all dimensions.

Table 3  Classification of vehicles based on their sizes
S. no. | Type of vehicle | Width, m | Length, m | Height, m
1. | two-wheeler | 0.60–0.90 | 2.10–2.40 | 1.2–1.5
2. | three-wheeler | 1.2–1.4 | 2.5–2.8 | 1.6–1.9
3. | light motor vehicle | 1.3–1.6 | 3.2–3.8 | 1.4–1.7
4. | sports utility vehicle | 1.6–1.8 | 3.8–4.1 | 1.6–1.8
5. | light commercial vehicle | 1.5–1.7 | 4.0–4.2 | 1.8–2.1
6. | heavy motor vehicle | 2.5–3.0 | 8.0–12.5 | 3.6–4.0

The classification of vehicles based on their sizes, including the width, length, and height of the vehicle, is given in Table 3. The approximate dimensions of all types of possible vehicles have been included in the table. The features of these vehicles can be recognised and detected using any detection method [105]. Figs. 12a–f show the classification of vehicles in terms of size or structure.

Fig. 12  Vehicle classification in terms of size or structure
(a) Two-wheeler, (b) Three-wheeler, (c) Light motor vehicle, (d) Sports utility vehicle, (e) Light commercial vehicle, (f) Heavy motor vehicle

Vehicle type is notable in view of the characterised range of vehicle sizes. Whatever methodology is applied for the classification of vehicles, the vehicle's position in the video frame sequence does not only depend on the vehicle size; it also depends upon what type of weather environment the vehicle is in.
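The dimension ranges of Table 3 translate directly into a rule-based classifier. A minimal sketch follows; the ranges are taken from Table 3, while resolving any range overlap by first match is an assumed design choice:

```python
# (width, length, height) ranges in metres, taken from Table 3
CLASSES = [
    ("two-wheeler",              (0.60, 0.90), (2.10, 2.40), (1.2, 1.5)),
    ("three-wheeler",            (1.2, 1.4),   (2.5, 2.8),   (1.6, 1.9)),
    ("light motor vehicle",      (1.3, 1.6),   (3.2, 3.8),   (1.4, 1.7)),
    ("sports utility vehicle",   (1.6, 1.8),   (3.8, 4.1),   (1.6, 1.8)),
    ("light commercial vehicle", (1.5, 1.7),   (4.0, 4.2),   (1.8, 2.1)),
    ("heavy motor vehicle",      (2.5, 3.0),   (8.0, 12.5),  (3.6, 4.0)),
]

def classify(width: float, length: float, height: float) -> str:
    """Return the first Table 3 class whose ranges contain the dimensions."""
    for name, (w0, w1), (l0, l1), (h0, h1) in CLASSES:
        if w0 <= width <= w1 and l0 <= length <= l1 and h0 <= height <= h1:
            return name
    return "unknown"

print(classify(1.5, 3.5, 1.6))  # -> light motor vehicle
```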
5 Hazy environment

Detecting a vehicle under a challenging weather environment, such as sunny, cloudy, snowy, rainy, or hazy conditions, is a challenging job. Many researchers have tried to eliminate the effect of bad light from the video to improve the quality of the image sequences. However, these methods do not detect the velocity and classification of the vehicle. Hazy weather conditions pose the most common problem in surveillance systems, reducing the clarity of traffic images and resulting in low accuracy of vehicle recognition and detection. There are various computational methods available for noisy images captured in hazy weather conditions; to overcome such challenges, some of them are discussed below.

5.1 Dark channel prior

This is a technique of image restoration that removes haze from a single image [106, 107]. The dark channel, used to estimate the haze, can be calculated as

$$K^{\mathrm{dark}}(m, n) = \min_{(i, j) \in \Omega(m, n)} \left( \min_{f \in \{r, g, b\}} K^f(i, j) \right) \quad (6)$$

Here, f belongs to red (r), green (g), and blue (b); $K^f$ denotes a colour channel of image K, and Ω is a local patch centred at (m, n). The inner operation $\min_{f \in \{r, g, b\}} K^f(i, j)$ is executed on each pixel, and $\min_{(i, j) \in \Omega(m, n)}$ is executed as a minimum filter on the local patch. When an outdoor image lacks haze, the dark channel has low intensity [108]. For a haze-free image, the value of the dark channel is approximately zero; for a hazy region, the dark channel value is always greater than zero.

5.2 Transmission map estimation

The amount of haze and the transmission map are estimated under normalised atmospheric light $L^f$ by the dark channel prior as

$$\tilde{t}(m, n) = 1 - \min_{(i, j) \in \Omega(m, n)} \left( \min_{f \in \{r, g, b\}} \frac{I^f(i, j)}{L^f} \right) \quad (7)$$

On the complete removal of haze, the image becomes unnatural [108]. Thus, a constant parameter ω ≈ 0.95 is added to retain a portion of the haze for distant objects:

$$\tilde{t}(m, n) = 1 - \omega \min_{(i, j) \in \Omega(m, n)} \left( \min_{f \in \{r, g, b\}} \frac{I^f(i, j)}{L^f} \right) \quad (8)$$

The dark channel prior with a patch size of 15 × 15 was proposed by He et al. [108].
5.3 Soft matting

After the estimation of the transmission map, the recovered images contain certain block effects. The transmission map can be improved by using soft matting [109], which was used by Levin et al. to reduce the block effects. The soft matting Laplacian matrix M is given by

$$M_{i, j} = \sum_{m \,\mid\, (i, j) \in \omega_m} \left[ \delta_{i, j} - \frac{1}{|\omega_m|} \left( 1 + (I_i - \mu_m)^{\mathrm{T}} \left( \Sigma_m + \frac{\varepsilon}{|\omega_m|} U_3 \right)^{-1} (I_j - \mu_m) \right) \right] \quad (9)$$

where $\Sigma_m$ is the covariance matrix of the colours in the frame or window $\omega_m$, $\delta_{i, j}$ is the Kronecker delta function, $\mu_m$ is the mean vector, $U_3$ is an identity matrix of size 3 × 3, ε is a normalising parameter, and $|\omega_m|$ is the overall number of pixels in the frame or window $\omega_m$.

The refined transmission map t can be obtained by solving the following linear system:

$$(M + \lambda U)\, t = \lambda \tilde{t} \quad (10)$$

where M is the soft matting Laplacian matrix, U is an identity matrix of the same size as M, and λ is taken as $10^{-4}$.

5.4 Recovering the scene radiance

At last, a single hazy image can be recovered as the scene radiance K:

$$K^f(m, n) = \frac{I^f(m, n) - L^f}{\max(t(m, n), t_0)} + L^f \quad (11)$$

where f ∈ {r, g, b} and the value of $t_0$ is set to 0.1. In the real input image, the atmospheric light L is taken as the highest-intensity pixel among those corresponding to the brightest 0.1% of pixels in the dark channel.

In addition to the above-discussed methods, another traditional method for the detection of moving vehicles is the GMM, but in a hazy environment [110] caused by rain, dust, mist, snow, fog, etc., this model underperforms. To handle these kinds of challenges, a DL-based framework outperforms the GMM because of its lower computational cost and better accuracy.
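A minimal NumPy/OpenCV sketch of the pipeline of (6)–(8) and (11); it skips the soft matting refinement of (9) and (10), estimates L from the brightest dark-channel pixels as a simple approximation of the rule quoted above, and uses the patch size, ω, and t0 values quoted from [108], while the image path is an illustrative assumption:

```python
import cv2
import numpy as np

PATCH, OMEGA, T0 = 15, 0.95, 0.1   # values quoted from He et al. [108]

img = cv2.imread("hazy_frame.jpg").astype(np.float64) / 255.0  # hypothetical

# Eq. (6): dark channel = minimum over colour channels, then a minimum
# filter (erosion) over the local patch.
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (PATCH, PATCH))
dark = cv2.erode(img.min(axis=2), kernel)

# Atmospheric light L: per-channel maximum of the input pixels that fall
# in the top 0.1% of the dark channel (simple approximation).
flat = dark.ravel()
idx = np.argsort(flat)[-max(1, flat.size // 1000):]
L = img.reshape(-1, 3)[idx].max(axis=0)

# Eq. (8): transmission estimate, with omega retaining some haze.
norm_dark = cv2.erode((img / L).min(axis=2), kernel)
t = 1.0 - OMEGA * norm_dark

# Eq. (11): recover the scene radiance; soft matting of (9)-(10) omitted.
t = np.maximum(t, T0)[..., None]
K = (img - L) / t + L
cv2.imwrite("dehazed.png", np.clip(K * 255, 0, 255).astype(np.uint8))
```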
the updating and enhancement of existing state-of-the-art.
6 Conclusion

Vehicle detection in a complex environment is a key task in an ITS. The modern and futuristic practices of vehicle detection that are used across the world have been reviewed in this paper. The paper is categorised into four main parts, covering the types of traffic surveillance cameras, vehicle detection techniques, the bases of vehicle classification, and the detection of a vehicle in a hazy environment. Among other cameras, Gatso and Specs cameras are widely used in traffic surveillance because of their high resolution and accuracy. Moreover, the five major techniques used for vehicle detection, namely (i) knowledge-based, (ii) motion-based, (iii) wavelet-based, (iv) ML, and (v) DL, have been discussed briefly in this paper. A brief discussion on vehicle detection algorithms shows the importance of DNNs over existing traditional methods. Various challenges, such as weather conditions, varying illumination, dynamic background scenes, etc., may affect the performance of a vehicle detection algorithm. A brief overview is given of vehicle classification based on a CNN. Finally, the computational methods that can be used for rectifying faulty image detection under a hazy environment have been discussed. Although many state-of-the-art algorithms are available for vehicle classification, detection, and tracking, challenges still arise due to various issues related to moving-vehicle datasets, leading researchers to work further on updating and enhancing the existing state of the art.
7 References Conf. on Intelligent Transportation Systems, Oakland, USA, 2001, pp. 334–
339
[1] Buch, N., Velastin, S.A., Orwell, J.: ‘A review of computer vision techniques [36] Xua, H., Xia, X., Guo, L., et al.: ‘A novel algorithm of moving cast shadow
for the analysis of urban traffic’, IEEE Trans. Intell. Transp. Syst., 2011, 12, suppression’. Proc. of the 18 Int. Conf. on Signal Processing, Beijing, China,
(3), pp. 920–939 2006, pp. 1–4
[2] Zhang, J., Wang, F.Y., Wang, K., et al.: ‘Data-driven intelligent transportation [37] Yoneyama, A., Yeh, C.H., Kuo, C.C.J.: ‘Moving cast shadow elimination for
systems: a survey’, IEEE Trans. Intell. Transp. Syst., 2011, 12, (4), pp. 1624– robust vehicle extraction based on 2D joint vehicle/shadow models’. Proc.
1639 IEEE Conf. Advanced Video and Signal Based Surveillance, Miami, USA,
[3] Sun, Z., Bebis, G., Miller, R.: ‘On-road vehicle detection: a review’, IEEE 2003, pp. 229–236
Trans Pattern Anal. Mach. Intell., 2006, 28, (5), pp. 694–711 [38] Iwasaki, Y., Kurogi, Y.: ‘Real-time robust vehicle detection through the same
[4] Saran, K.B., Sreelekha, G.: ‘Traffic video surveillance: vehicle detection and algorithm both day and night’. Proc. of the Int. Conf. on Wavelet Analysis and
classification’, Int. Conf. on Control Communication and Computing, India, Pattern Recognition, Beijing, China, 2007, pp. 1008–1014
November 2015, pp. 516–521 [39] Lou, J., Tan, T., Hu, W., et al.: ‘3-D model-based vehicle tracking’, IEEE
[5] Wang, G., Xiao, D., Gu, J.: ‘Review on vehicle detection based on video for Trans. Image Process., 2005, 14, (10), pp. 1561–1569
traffic surveillance’, IEEE Int. Conf. Automation and Logistics, Qingdao, [40] Kogut, G., Trivedi, M.: ‘A wide area tracking system for vision sensor
China, September 2008, pp. 2961–2966 networks’. The 9th World Congress Intelligent Transport Systems, Chicago,
[6] Chen, Z., Ellis, T., Velastin, S.A.: ‘Vehicle type categorization: a comparison USA, 2002
of classification schemes’. IEEE Conf. Intelligent Transportation Systems [41] Lee, J.W., Kim, M.S., Kweon, I.S.: ‘A Kalman filter based visual tracking
Proc., ITSC, Washington, DC, USA, 2011, pp. 74–79 algorithm for an object moving in 3-D’. Proc. Int. Conf. Intelligent Robots
[7] Lai, A.S.H., Yung, N.H.C.: ‘Vehicle-type identification through automated and Systems, Pittsburgh, USA, 1995, pp. 355–358
virtual loop assignment and block-based direction-biased motion estimation’, [42] Costa, M.S., Shapiro, L.G.: ‘3-D object recognition and pose with relational
IEEE Trans. Intell. Transp. Syst., 2000, 1, (2), pp. 86–97 indexing’, Comput. Vis. Image Underst., 2000, 79, (3), pp. 64–407
[8] https://i.ytimg.com/vi/xVwsr9p3irA/maxresdefault.jpg [43] Tan, T.N., Baker, K.D.: ‘Efficient image gradient based vehicle localization’,
[9] Kim, J.B., Kim, H.J.: ‘Efficient region-based motion segmentation for a video IEEE Trans. Image Process., 2000, 9, (11), pp. 1343–1356
monitoring system’, Pattern Recognit. Lett., 2003, 24, (1–3), pp. 113–128 [44] Muller, K., Smolic, A., Drose, M., et al.: ‘3-D construction of a dynamic
[10] Wu, B.F., Juang, J.H.: ‘Adaptive vehicle detector approach for complex environment with a fully calibrated background for traffic scenes’, IEEE
environments’, IEEE Trans. Intell. Transp. Syst., 2012, 13, (2), pp. 817–827 Trans. Circuits Syst. Video Technol., 2005, 15, (4), pp. 538–549
[11] Bishop, R.: ‘Intelligent vehicle applications worldwide’, IEEE Trans. Intell. [45] Cutler, R., Davis, L.S.: ‘Model-based object tracking in monocular image
Transp. Syst., 2000, 15, (1), pp. 78–81 sequences of road traffic scenes’, IEEE Trans. Pattern Anal. Mach. Intell.,
[12] Rai, M., Yadav, R.K.: ‘A novel method for detection and extraction of human 2000, 22, (8), pp. 781–796
face for video surveillance applications’, Int. J. Signal Imaging Syst. Eng., [46] Ghosh, N., Bhanu, B.: ‘Incremental vehicle 3-D modeling from video’. Proc.
2016, 9, (3), pp. 165–173 of the 18th Int. Conf. on Pattern Recognition, Hong Kong, China, 2006, pp.
[13] Baran, R., Rusc, T., Fornalski, P.: ‘A smart camera for the surveillance of 272–275
vehicles in intelligent transportation systems’, Multimedia Tools Appl., 2016, [47] Meyer, D., Denzler, J., Niemann, H.: ‘Model based extraction of articulated
75, (17), pp. 10471–10493 objects in image sequences for gait analysis’. Proc. IEEE Int. Conf. Image
[14] Deb, S.K., Nathr, R.K.: ‘Vehicle detection based on video for traffic Processing, Santa Barbara, USA, 1998, pp. 78–81
surveillance on road’, Int. J. Comput. Sci. Emerg. Technol., 2012, 3, (4), pp. [48] Barron, J., Fleet, D., Beauchemin, S.: ‘Performance of optical flow
121–137 techniques’, Int. J. Comput. Vis., 1994, 12, (1), pp. 42–77
[15] Michalopoulos, P.G.: ‘Vehicle detection video through image processing: the [49] Shaikh, S.H., Saeed, K., Chaki, N.: ‘Moving object detection using
autoscope system’, IEEE Trans. Veh. Technol., 1991, 40, (1), pp. 21–29 background subtraction’ (Springer Briefs in Computer Science, New York,
[16] Dickmanns, E.D.: ‘The development of machine vision for road vehicles in 2014)
the last decade’, IEEE Intell. Veh. Symp. Proc., 2003, 1, pp. 268–281 [50] Chalidabhongse, T.H., Kim, K., Harwood, D.: ‘A perturbation method for
[17] Sussman, J.M.: ‘Perspectives on intelligent transportation systems’, 2005 evaluating background subtraction algorithms’. Int. Workshop on Visual
[18] Gupte, S., Masoud, O., Martin, R.F.K., et al.: ‘Detection and classification of Surveillance and Performance Evaluation of Tracking and Surveillance, Nice,
vehicles’, IEEE Trans. Intell. Transp. Syst., 2002, 3, (1), pp. 37–47 France, 2003
[19] Janowski, L., Kozłowski, P., Baran, R., et al.: ‘Quality assessment for a visual [51] Heikkila, J., Silven, O.: ‘A real-time system for monitoring of cyclists and
and automatic license plate recognition’, Multimedia Tools Appl., 2014, 68, pedestrians’. Proc. of 2nd IEEE Workshop on Visual Surveillance, Fort
(1), pp. 23–40 Collins, USA, 1999, pp. 74–81
[20] Dule, E., Gokmen, M., Beratoglu, M.S.: ‘A convenient feature vector [52] Cucchiara, R., Grana, C., Prati, A., et al.: ‘Probabilistic classification for
construction for vehicle color recognition’. Proc. Int. Conf. Neural Network, human behaviour analysis in transactions on systems’, Man Cybern., 2005,
WSEAS, Lasi, Romania, 2010, pp. 250–255 35, pp. 42–54
[21] Dziech, W., Baran, R., Wiraszka, D.: ‘Signal compression based on zonal [53] Benezeth, Y., Jodoin, P., Emile, B., et al.: ‘Comparative study of background
selection methods’. Proc. of the Int. Conf. of Mathematical Methods in subtraction algorithms’, J. Electron. Imaging, Soc. Photo-Opt. Instrum. Eng.,
Electromagnetic Theory, Kharkov, Ukraine, 2000, pp. 224–226 2010, 19, (3), pp. 1–12
[22] Cao, M., Vu, A., Barth, M.J.: ‘A novel omni-directional vision sensing [54] Stauffer, C., Grimson, W.E.L.: ‘Adaptive background mixture models for real-
technique for traffic surveillance’. IEEE Intelligent Transportation Systems time tracking’. Int. Conf. on Computer Vision and Pattern Recognition, Fort
Conf., Seattle, USA, 2007, pp. 678–683 Collins, USA, 1999
[23] Bertozzi, M., Broggi, A., Cellario, M.: ‘Artificial vision in road vehicles’, [55] Zivkovic, Z.: ‘Improved adaptive Gaussian mixture model for background
Proc. IEEE, 2002, 90, (7), pp. 1258–1271 subtraction’. Int. Conf. on Pattern Recognition, Cambridge, UK, 2004
[24] Long, J., Shelhamer, E., Darrell, T.: ‘Fully convolutional networks for [56] Unzueta, L., Nieto, M., Cortes, A.: ‘Adaptive multi cue background
semantic segmentation'. Proc. IEEE Conf. on Computer Vision and Pattern Recognition, Boston, USA, 2015, pp. 3431–3440
[25] http://www.dailymail.co.uk/news/article-2930654/Know-enemy-Incredibly-20-differentkindscameras-spying-motorists-spot-spotyou.html
[26] Rai, M., Maity, T., Yadav, R.K.: 'Thermal imaging system and its real-time applications: a survey', J. Eng. Technol., 2017, 6, (2), pp. 290–303
[27] Rai, M., Husain, A.A., Maity, T., et al.: 'Advanced intelligent video surveillance system (AIVSS): a future aspect', in 'Video Surveillance' (IntechOpen, London, UK, 2018)
[28] Pang, C.C.C., Lam, W.W.L., Yung, N.H.C.: 'A novel method for resolving vehicle occlusion in a monocular traffic-image sequence', IEEE Trans. Intell. Transp. Syst., 2004, 5, (3), pp. 129–141
[29] Wang, Y.: 'Joint random field model for all-weather moving vehicle detection', IEEE Trans. Image Process., 2010, 19, (9), pp. 2491–2501
[30] Liu, Y., Tian, B., Chen, S., et al.: 'A survey of vision-based vehicle detection and tracking techniques in ITS'. Proc. 2013 IEEE Int. Conf. on Vehicular Electronics and Safety, Dongguan, China, 2013, pp. 72–77
[31] Zhang, J., Marszalek, M., Lazebnik, S., et al.: 'Local features and kernels for classification of texture and object categories: a comprehensive study', Int. J. Comput. Vis., 2007, 73, (2), pp. 213–238
[32] Bouwmans, T., Gonzalez, J., Shan, C., et al.: 'Special issue on background modelling for foreground detection in real-world dynamic scenes', Mach. Vis. Appl., 2014, 25, (5), pp. 1101–1103
[33] Siyal, M.Y.: 'A neural vision-based approach for intelligent transportation system'. IEEE ICIT'02, Bangkok, Thailand, 2002, pp. 456–460
[34] Liu, Z., Huang, K., Tan, T.: 'Cast shadow removal in a hierarchical manner using MRF', IEEE Trans. Circuits Syst. Video Technol., 2012, 22, (1), pp. 56–66
[35] Cucchiara, R., Grana, C., Piccardi, M., et al.: 'Improving shadow suppression in moving object detection with HSV color information'. IEEE Proc. Int. Conf. on Intelligent Transportation Systems, Oakland, USA, 2001, pp. 334–339
[56] Unzueta, L., Nieto, M., Cortés, A., et al.: 'Adaptive multicue background subtraction for robust vehicle counting and classification', IEEE Trans. Intell. Transp. Syst., 2012, 13, (2), pp. 527–540
[57] Yuan, W., Wang, J.: 'Gaussian mixture model based on the number of moving vehicle detection algorithm'. Proc. of 2012 IEEE Int. Conf. on Intelligent Control Automatic Detection and High-End Equipment (ICADE), Beijing, China, 2012, pp. 94–97
[58] Bouwmans, T., El-Baf, F., Vachon, B.: 'Background modelling using mixture of Gaussians for foreground detection: a survey', Recent Patents Comput. Sci., 2008, 1, (3), pp. 219–237
[59] Jenifa, R.A.T., Akila, C., et al.: 'Rapid background subtraction from video sequence'. IEEE Int. Conf. on Computing, Electronics and Electrical Technologies (ICCEET), Kumaracoil, India, 2012, pp. 1077–1086
[60] Oliver, N.M., Rosario, B., Pentland, A.P.: 'A Bayesian computer vision system for modelling human interactions', IEEE Trans. Pattern Anal. Mach. Intell., 2000, 22, (8), pp. 831–843
[61] Bouwmans, T., Zahzah, E.: 'Robust PCA via principal component pursuit: a review for a comparative evaluation in video surveillance', Comput. Vis. Image Underst., 2014, 122, pp. 22–34
[62] Cutler, R., Davis, L.: 'Robust real-time periodic motion detection, analysis and applications', IEEE Trans. Pattern Anal. Mach. Intell., 2000, 22, (8), pp. 781–796
[63] Wang, Y.K., Chen, S.: 'A robust vehicle detection approach'. IEEE Conf. on Advanced Video and Signal Based Surveillance, Como, Italy, 2005, pp. 117–122
[64] Wang, X., Zhang, J.: 'A traffic incident detection method based on wavelet algorithm'. IEEE Workshop on Soft Computing in Industrial Applications, Espoo, Finland, 2005, pp. 166–172
[65] Yin, M., Zhang, H., Meng, H., et al.: 'An HMM-based algorithm for vehicle detection in congested traffic situation'. IEEE Intelligent Transportation Systems Conf., Seattle, USA, 2007, pp. 736–741
[66] Fadlullah, Z.M., Tang, F., Mao, B., et al.: 'State-of-the-art deep learning: evolving machine intelligence toward tomorrow's intelligent network traffic control systems', IEEE Commun. Surv. Tutor., 2017, 19, (4), pp. 2432–2455
[67] Goodfellow, I., Bengio, Y., Courville, A.: 'Deep learning' (MIT Press, Cambridge, MA, USA, 2016)
[68] Carrio, A., Sampedro, C., Rodriguez-Ramos, A., et al.: 'A review of deep learning methods and applications for unmanned aerial vehicles', J. Sens., 2017, 1, pp. 1–13
[69] Wang, Z.: 'The applications of deep learning on traffic identification', 2016. Available at https://www.blackhat.com/docs/us-15/materials/us-15-Wang-The-Applications-Of-Deep-Learning-On-Traffic-Identification-wp.pdf
[70] Liu, W., Zhang, M., Cai, Y.: 'An ensemble deep learning method for vehicle type classification on visual traffic surveillance sensors', IEEE Access, 2017, 5, pp. 24417–24425
[71] Krizhevsky, A., Sutskever, I., Hinton, G.E.: 'Imagenet classification with deep convolutional neural networks'. Proc. of Advances in Neural Information Processing Systems, Lake Tahoe, USA, 2012, pp. 1097–1105
[72] Girshick, R., Donahue, J., Darrell, T., et al.: 'Rich feature hierarchies for accurate object detection and semantic segmentation'. Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition, Columbus, USA, 2014, pp. 580–587
[73] Karpathy, A., Toderici, G., Shetty, S., et al.: 'Large-scale video classification with convolutional neural networks'. Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition, Columbus, USA, 2014, pp. 1725–1732
[74] Simonyan, K., Zisserman, A.: 'Very deep convolutional networks for large-scale image recognition', 2014. Available at https://arxiv.org/abs/1409.1556
[75] Szegedy, C., Liu, W., Jia, Y., et al.: 'Going deeper with convolutions'. Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition, Boston, USA, 2015, pp. 1–9
[76] He, K., Zhang, X., Ren, S., et al.: 'Deep residual learning for image recognition'. Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition, Las Vegas, USA, 2016, pp. 770–778
[77] Dosovitskiy, A., Fischer, P., Springenberg, J.T., et al.: 'Discriminative unsupervised feature learning with convolutional neural networks', CoRR, 2014, abs/1406.6909
[78] Kavukcuoglu, K., Sermanet, P., Boureau, Y., et al.: 'Learning convolutional feature hierarchies for visual recognition', Adv. Neural Inf. Process. Syst., 2010, 23, pp. 1090–1098
[79] LeCun, Y., Boser, B., Denker, J.S., et al.: 'Backpropagation applied to handwritten zip code recognition', Neural Comput., 1989, 1, (4), pp. 541–551
[80] Freund, Y., Haussler, D.: 'Unsupervised learning of distributions on binary vectors using two layer networks'. Proc. 4th Int. Conf. on Neural Information Processing Systems (NIPS'91), Denver, USA, 1991, pp. 912–919
[81] Sutskever, I., Martens, J., Hinton, G.: 'Generating text with recurrent neural networks'. Proc. of the 28th Int. Conf. on Machine Learning (ICML'11), Bellevue, USA, 2011, pp. 1017–1024
[82] Liou, C.Y., Huang, J.C., Yang, W.C.: 'Modeling word perception using the Elman network', Neurocomputing, 2008, 71, (16–18), pp. 3150–3157
[83] Bourlard, H., Kamp, Y.: 'Auto-association by multilayer perceptrons and singular value decomposition', Biol. Cybern., 1988, 59, (4), pp. 291–294
[84] Socher, R., Bengio, Y., Manning, C.: 'Deep learning for NLP (without magic)'. Tutorial Abstracts of ACL, Atlanta, USA, 2012, p. 5
[85] Zhou, Y., Nejati, H., Do, T.T., et al.: 'Image-based vehicle analysis using deep neural network: a systematic study'. IEEE Int. Conf. on Digital Signal Processing, Beijing, China, 2016
[86] Schmidhuber, J.: 'Deep learning in neural networks: an overview', Neural Netw., 2015, 61, pp. 85–117
[87] Luckow, A., Cook, M., Ashcraft, N., et al.: 'Deep learning in the automotive industry: applications and tools', CoRR, 2017, abs/1705.00346
[88] Suhao, L., Jinzhao, L., Guoquan, L., et al.: 'Vehicle type detection based on deep learning in traffic scene', Procedia Comput. Sci., 2018, 131, pp. 564–572
[89] Redmon, J., Divvala, S., Girshick, R., et al.: 'You only look once: unified, real-time object detection'. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), Las Vegas, USA, 2016, pp. 779–788
[90] Sermanet, P., Eigen, D., Zhang, X., et al.: 'Overfeat: integrated recognition, localization and detection using convolutional networks'. Int. Conf. on Learning Representations (ICLR), Banff, Canada, 2014
[91] Russakovsky, O., Deng, J., Su, H., et al.: 'Imagenet large scale visual recognition challenge', Int. J. Comput. Vis., 2015, 115, (3), pp. 211–252
[92] Cordts, M., Omran, M., Ramos, S., et al.: 'The cityscapes dataset'. CVPR Workshop on the Future of Datasets in Vision, Boston, USA, 2015
[93] Cordts, M., Omran, M., Ramos, S., et al.: 'The cityscapes dataset for semantic urban scene understanding'. Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), Las Vegas, USA, 2016
[94] Lin, T.Y., Maire, M., Belongie, S., et al.: 'Microsoft COCO: common objects in context'. European Conf. on Computer Vision (ECCV), Zurich, Switzerland, 2014
[95] Geiger, A., Lenz, P., Urtasun, R.: 'Are we ready for autonomous driving? The KITTI vision benchmark suite'. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), Providence, USA, 2012
[96] Geiger, A., Lenz, P., Stiller, C., et al.: 'Vision meets robotics: the KITTI dataset', Int. J. Robot. Res., 2013, 32, (11), pp. 1231–1237
[97] Brostow, G.J., Fauqueur, J., Cipolla, R.: 'Semantic object classes in video: a high-definition ground truth database', Pattern Recognit. Lett., 2009, 30, (2), pp. 88–97
[98] Maddalena, L., Petrosino, A.: 'A self-organizing approach to background subtraction for visual surveillance applications', IEEE Trans. Image Process., 2008, 17, (7), pp. 1168–1177
[99] Lai, A.H.S., Fung, G.S.K., Yung, N.H.C.: 'Vehicle type classification from visual-based dimension estimation'. Proc. of the IEEE Intelligent Transportation Systems Conf., Oakland, USA, 2001, pp. 201–206
[100] Mithun, N.C., Rashid, N.U., Rahman, S.M.: 'Detection and classification of vehicles from video using multiple time-spatial images', IEEE Trans. Intell. Transp. Syst., 2012, 13, (3), pp. 1215–1225
[101] LeCun, Y., Bottou, L., Orr, G.B., et al.: 'Efficient backprop', in 'Neural networks: tricks of the trade' (Springer, Berlin, Germany, 1998), pp. 9–50
[102] Dong, Z., Wu, Y., Pei, M., et al.: 'Vehicle type classification using a semisupervised convolutional neural network', IEEE Trans. Intell. Transp. Syst., 2015, 16, (4), pp. 2247–2256
[103] Zhang, F.: 'Car detection and vehicle type classification based on deep learning' (Jiangsu University, China, 2016)
[104] Zhang, F., Xu, X., Qiao, Y.: 'Deep classification of vehicle makers and models: the effectiveness of pre-training and data enhancement'. IEEE Int. Conf. on Robotics and Biomimetics (ROBIO), Zhuhai, China, 2015, pp. 231–236
[105] Ambardekar, A., Nicolescu, M.: 'Vehicle classification framework: a comparative study', EURASIP J. Image Video Process., 2014, 29, pp. 1–13
[106] Huang, S.C., Chen, B.H., Cheng, Y.J.: 'An efficient visibility enhancement algorithm for road scenes captured by intelligent transportation systems', IEEE Trans. Intell. Transp. Syst., 2014, 15, (5), pp. 2321–2332
[107] Huang, S.C., Chen, B.H., Wang, W.J.: 'Visibility restoration of single hazy images captured in real-world weather conditions', IEEE Trans. Circuits Syst. Video Technol., 2014, 24, (10), pp. 1814–1824
[108] He, K., Sun, J., Tang, X.: 'Single image haze removal using dark channel prior', IEEE Trans. Pattern Anal. Mach. Intell., 2011, 33, (12), pp. 2341–2353
[109] Levin, A., Lischinski, D., Weiss, Y.: 'A closed form solution to natural image matting'. Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition, New York, USA, 2006, pp. 61–68
[110] Li, S., Ren, W., Zhang, J., et al.: 'Fast single image rain removal via a deep decomposition-composition network', Comput. Vis. Image Underst., 2018, 186, pp. 48–57