Research Article
E-mail: hashemizadeh@kiau.ac.ir
Abstract: This article proposes an integrated system for segmenting and classifying two types of moving objects, cars and pedestrians, from their side view in a video sequence. Using the grey-level co-occurrence matrix (GLCM) in the Haar wavelet transformed space, the authors calculate texture features from the different sub-bands separately. The Haar wavelet transform is chosen because the resulting wavelet sub-bands strongly affect the orientation elements in the GLCM computation. To evaluate the proposed method, the results of the different sub-bands are compared with each other. The extracted object features are classified with a support vector machine (SVM). The experimental results show that using three wavelet sub-bands instead of two is more effective and yields good precision.
1 Introduction

Automatic image analysis systems for the detection, tracking, and classification of moving objects have been developed within different fields of computer vision such as human–computer interaction, robotics and medical imaging, smart video compression, and automatic video analysis [1, 2]. This is one of the most important problems in image processing and machine vision. Locating an object, tracking it through a sequence of video images, and classifying objects with high accuracy each serve a specific purpose and can help in the precise analysis and monitoring of traffic conditions or in retrieving a specific object from a video sequence [3–5].

In recent years, many studies have addressed the detection and classification of cars and pedestrians. They often use features such as Haar wavelets, histograms of oriented gradients (HOG), local binary patterns (LBP), and the grey-level co-occurrence matrix (GLCM). There are different methods for video surveillance. In [6], Juan et al. classified four types of moving objects (pedestrians, cars, motorcycles, and bicycles) using local shape features and the histogram of oriented gradients in the Haar wavelet transform. These features improve classification performance and reduce data dimensionality. In [7], Moreno et al. present a histogram-based method to extract features of a specific range of the moving objects and use a support vector machine (SVM) for classification in crowded traffic scenes. Kumar et al. classified flying birds in the sky and airplanes on the runway using three feature extraction methods, namely the Haar wavelet, the co-occurrence matrix based on the Haar wavelet, and the co-occurrence matrix, in [8]. They concluded that the co-occurrence matrix based on the Haar wavelet gives better results in a shorter time and increases the classification accuracy. Papageorgiou et al. extracted modified Haar wavelet features from the Haar wavelet transformed space and provided results for vehicle and face detection in [9]. Triggs et al. in [10] introduced histogram features in the Haar wavelet transformed space. In recent years, the histogram feature has been used for detection and classification tasks such as vehicle detection and facial recognition. Different features have different power to describe an object. The histogram stores local shape features of the considered objects based on the image direction and derivative magnitudes. LBP is a texture descriptor that is computationally efficient, invariant to grey-level variation, and can filter out noise in multi-scale mode by means of the same pattern [11].

Wen et al. in [12] presented a comprehensive system for extracting parked-vehicle information, which includes vehicle recognition, localisation, classification, and change detection for parking monitoring. This system presents a variable vehicle model to fit the vehicle hypothesis for accurate localisation, and each parameter is used for target visualisation. In [13], Chandra Mouli et al. presented a pedestrian detection method for complex backgrounds based on SVM and the human visual mechanism. Following human visual attention, mean and Laplacian-of-Gaussian filters are first applied; this reduces the background noise and increases the contrast between foreground and background. In addition, a morphological process on the filtered image removes the remaining noise. Then, pedestrian and non-pedestrian regions are obtained by local threshold segmentation on the morphologically processed image. To detect pedestrians correctly and reduce the false detection rate, an SVM is used.

This paper provides a two-class object detection and classification system for use in a video sequence from a fixed camera. The aim is to improve the classification rate for these two types of moving objects by proposing a new set of features in this field. The proposed method classifies objects using GLCM features in the Haar wavelet transformed space with an SVM classifier.

The remainder of this paper is organised as follows: Section 2 introduces the proposed method, including the segmentation used to detect moving objects and the feature extraction process. Section 3 presents the experimental results of the proposed classification approach, and Section 4 concludes the paper.

2 Proposed method

2.1 Background subtraction

Video data recorded by a surveillance camera are converted into frames. The conventional technique for extracting moving objects is background subtraction: the moving object is obtained by subtracting the input frame from the background image. To guarantee motion detection and achieve better results, we can overcome and reduce the problems of brightness changes by updating the background image. In this
IET Intell. Transp. Syst., 2019, Vol. 13 Iss. 7, pp. 1148-1153 1148
© The Institution of Engineering and Technology 2019
Fig. 1 Block diagram of sub-band feature extraction by applying the wavelet
case, the exact background image will be created. Several background estimation algorithms have been proposed. The background model of our proposed method is implemented with the Gaussian mixture model [12]. In practice, we used the ForegroundDetector function in our MATLAB code, which compares a colour video frame to a background model to determine whether individual pixels are part of the background or the foreground, and computes a foreground mask. Then, using background subtraction, we detect the foreground objects in each frame. ForegroundDetector computes and returns the foreground mask using the Gaussian mixture model. We set NumGaussians to 3 and NumTrainingFrames to 50, where NumGaussians is the number of modes in the Gaussian mixture model and NumTrainingFrames is the number of initial video frames used for training the background model.

2.2 Feature extraction

Feature extraction is one of the most important and challenging issues in image processing. In this process, we extract features that describe the image and the objects in it well. Fig. 1 shows the flowchart of the grey-level co-occurrence matrix–wavelet (GLCM-W) feature extraction process. Pre-processing usually prepares the image for segmentation and handles changes in the size and brightness of objects in an image. The Haar wavelet transformation is proposed to improve the distinguishability of the extracted features for the two classes of objects. So first, the first-level Haar wavelet transformation is carried out. Then, the GLCM features are obtained from the area of each object in each sub-band separately (see Fig. 2) [13, 14].

2.2.1 Pre-processing: To remove undesirable background noise and fill the empty spaces in the foreground mask, morphological opening and closing operations are used (Fig. 3).

Fig. 3 Effect of morphological operation in removing background noise
(a), (b) Images which involve noise, (c), (d) Images with noise reduction

Since the main objective of this paper is moving-object classification using the extracted features, we extract the features from the information obtained from the object. Although the two-dimensional image after the morphological operations carries no appropriate appearance information, we use it to find the coordinates of the moving object in the colour frames. Therefore, the background and foreground labels refer to the pixels with values of zero and one, respectively. After the labelling step, the maximum and minimum row and column coordinates of the pixels with value one are applied to the colour frames. In this way, we find the position of the moving objects in the colour frame using the coordinates obtained from the two-dimensional frame. A rectangular window around each detected object is created as shown in Fig. 4.
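The GLCM-W pipeline of Section 2.2 (bounding box from the foreground mask, first-level Haar decomposition, then per-sub-band GLCM statistics) can be sketched in Python with NumPy. The paper's implementation is in MATLAB; the quantisation depth, pixel offset, and unnormalised Haar filters below are illustrative assumptions, not the authors' settings.

```python
import numpy as np

def haar_level1(x):
    """Unnormalised first-level 2D Haar decomposition (illustrative)."""
    x = x[: x.shape[0] // 2 * 2, : x.shape[1] // 2 * 2].astype(float)
    A, B = x[0::2, 0::2], x[0::2, 1::2]
    C, D = x[1::2, 0::2], x[1::2, 1::2]
    ll = (A + B + C + D) / 4.0  # approximation (discarded by the paper)
    lh = (A + B - C - D) / 4.0  # horizontal detail
    hl = (A - B + C - D) / 4.0  # vertical detail
    hh = (A - B - C + D) / 4.0  # diagonal detail
    return ll, lh, hl, hh

def glcm_features(band, levels=8, dx=1, dy=0):
    """Energy, contrast, homogeneity and correlation of one GLCM."""
    b = band - band.min()
    top = b.max() or 1.0                      # avoid /0 on flat patches
    q = np.minimum((b / top * levels).astype(int), levels - 1)
    glcm = np.zeros((levels, levels))
    src = q[: q.shape[0] - dy, : q.shape[1] - dx]
    dst = q[dy:, dx:]
    np.add.at(glcm, (src.ravel(), dst.ravel()), 1)
    p = glcm / glcm.sum()                     # normalised co-occurrences
    i, j = np.indices(p.shape)
    energy = (p ** 2).sum()
    contrast = ((i - j) ** 2 * p).sum()
    homogeneity = (p / (1.0 + np.abs(i - j))).sum()
    mi, mj = (i * p).sum(), (j * p).sum()
    si = np.sqrt((((i - mi) ** 2) * p).sum())
    sj = np.sqrt((((j - mj) ** 2) * p).sum())
    correlation = ((i - mi) * (j - mj) * p).sum() / (si * sj) if si * sj else 1.0
    return [energy, contrast, homogeneity, correlation]

def glcm_w_features(mask, gray, n_bands=3):
    """Bounding box from the foreground mask, then GLCM features
    from the detail sub-bands of a first-level Haar transform."""
    rows, cols = np.nonzero(mask)
    patch = gray[rows.min(): rows.max() + 1, cols.min(): cols.max() + 1]
    _, lh, hl, hh = haar_level1(patch)        # LL is discarded, as in the paper
    feats = []
    for band in (lh, hl, hh)[:n_bands]:       # 3 bands x 4 stats -> 12 features
        feats.extend(glcm_features(band))
    return np.array(feats)
```

With `n_bands=3` this yields the 12-feature vector (and `n_bands=2` the 8-feature vector) compared in the results section.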
variables, W is the coefficients vector, and b is the constant, which determine the hyper-plane:

W^T X + b = 0 (2)

Here, the features are energy, homogeneity, contrast, and correlation, and the classes are pedestrian and car. First, we create the label matrix by assigning a label to each row of the feature matrix. Then, using the cross-validation method, 70% of the feature matrix is used for training and the rest as test data. In the test phase, on the arrival of each new object, the SVM predicts using the features of the input object and their similarity. If the value of wx + b satisfies y > 0, the object is classified as a pedestrian and otherwise as a car (Fig. 8). In other words, if the object is similar to the pedestrian feature class, it is classified as a pedestrian and otherwise as a car. There are a total of N training samples for each class, pedestrians and cars. Suppose that all the relevant data points belong to one class; we want to assign the input vector X to one of the classes −1 and +1. Therefore, if the sign of the function in (1) is positive, we assign +1 to X, and otherwise −1. For training the SVM, the desired output for a training sample belonging to the pedestrian class is '1', and the desired output for a car is '−1'. It should be mentioned that the main subject here is to select features that can help improve our problem solving; we use an RBF kernel function of second poly-order. To conclude, the reason for choosing this classifier is that we had two classes and SVM was well suited to our problem.
Fig. 8 Flowchart of pedestrian and car classification using Support Vector Machine
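The ±1 labelling, the sign-based decision rule of (2), and the 70%/30% split described above can be sketched as follows. This is a Python illustration with a hypothetical linear weight vector `w` and bias `b`; the paper itself trains the SVM in MATLAB with an RBF kernel, so these values stand in for a trained model.

```python
import numpy as np

# Hypothetical trained parameters for a linear illustration of (2);
# the actual classifier is an SVM trained in MATLAB with an RBF kernel.
w = np.array([0.8, -0.5, 0.3, 0.1])  # one weight per GLCM feature
b = -0.2

def classify(x, w, b):
    """Sign of w.x + b: positive -> pedestrian (+1), else car (-1)."""
    return 'pedestrian' if np.dot(w, x) + b > 0 else 'car'

def split_70_30(features, labels, seed=0):
    """70% of the feature matrix for training, 30% for testing."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(features))
    cut = int(0.7 * len(features))
    return (features[idx[:cut]], labels[idx[:cut]],
            features[idx[cut:]], labels[idx[cut:]])
```

The same sign test applies unchanged to a kernel SVM, where w.x + b is replaced by the kernel decision function.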
Fig. 9 Collected training samples
Fig. 10 Comparison of two and three sub-bands in red, green and blue spectrums. Features number = 12 in three sub-bands and features number = 8 in two
sub-bands
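The accuracy, precision, sensitivity (recall), and specificity reported in Tables 2 and 3 are standard confusion-matrix statistics. For reference, a generic Python sketch (not the authors' code) of how they are derived:

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, sensitivity (recall) and specificity
    computed from true/predicted binary labels (generic sketch)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p != positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    total = tp + tn + fp + fn
    return {
        'accuracy': (tp + tn) / total,
        'precision': tp / (tp + fp) if tp + fp else 0.0,
        'sensitivity': tp / (tp + fn) if tp + fn else 0.0,  # = recall
        'specificity': tn / (tn + fp) if tn + fp else 0.0,
    }
```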
3 Results and discussion

To evaluate the proposed method, data were collected in the field of urban transport with a fixed camera viewing the objects from the side (Fig. 9). A human then labels the data so that ground-truth data are available. The images of cars and pedestrians are captured by a Canon A1400 camera at a frame size of 640 × 480 and a rate of 30 frames per second. The MATLAB programming environment is used for implementing and testing the proposed system. This programming language has the ability and flexibility to work with video and images in various fields such as image processing and computer vision. The program is run on a personal computer with a Core-i7 processor at 2.2 GHz and 8 GB of memory. This hardware is not much stronger than what is used in real applications in intelligent systems, so solving the problem on this hardware promises good performance in real settings.

3.1 Performance of the proposed method in different sub-bands

Table 2 shows the accuracy, sensitivity, and specificity for each of the three- and two-sub-band configurations. Fig. 10 compares the number of features and the accuracy for each set of extracted features. It illustrates that increasing the number of features extracted from the different sub-bands increases the classification accuracy.

Since the sensitivity of surveillance cameras in different colour spectrums differs, we have examined the results of the features extracted from each frame by applying the wavelet in the red, green, and
Table 3 Evaluation of two and three sub-bands in the red, green and blue spectrums separately
Proposed method   Features  Spectrum  Accuracy  Precision  Sensitivity  Specificity  Recall
three sub-bands   12        red       0.9858    0.9955     1            0.9444       1
                  12        green     0.9958    0.9955     1            0.9444       1
                  12        blue      0.9875    0.9867     1            0.8333       1
two sub-bands     8         red       0.9942    0.9942     1            0.9333       1
                  8         green     0.9917    1          0.9910       1            0.9910
                  8         blue      0.9917    0.9955     0.9955       0.9944       0.9955
blue separately, as shown in Table 3 [22, 23]. Fig. 10 shows the results of the extracted features in the three colour spectrums as a bar graph; the blue colour range leads to a decrease in the classification accuracy [24].

4 Conclusion

This paper classifies two types of moving objects using GLCM features in the Haar wavelet transformed space with an SVM, for intelligent monitoring systems with fixed cameras. First, according to the characteristics of the wavelet such as de-noising and precise image retrieval, the wavelet has been applied to the area of each object. As the LL sub-band contains noise and other useless information, the results have been examined by removing this sub-band and using two or three sub-bands of the wavelet. The texture information of the image has been computed from the GLCM. The GLCM-W feature, which extracts the GLCM features from the Haar wavelet transformed space, shows the desired classification performance. The location information of the area of each object is investigated in the red, green, and blue colour spectrums separately. It can be concluded that for three sub-bands of wavelet decomposition in the red and green spectrums, the features were more discriminative than when determined in two sub-bands of the blue one. In addition, the introduced algorithm is optimised for applications in which the colour-histogram variation of each object is low. It is also recommended to compare the results with other similar methods, as performed in the pioneering studies of Varma et al. [24] and Fu et al. [25]; we did not do so in this research because of our focus on SVM and wavelets. This could be an interesting direction for future work.

5 Acknowledgments

The authors would like to thank the anonymous reviewers of this paper for their careful reading and constructive comments and suggestions, which have improved the paper very much.

6 References

[1] Hu, W., Tan, T., Wang, L., et al.: 'A survey on visual surveillance of object motion and behaviors', IEEE Trans. Syst. Man Cybern. C, Appl. Rev., 2004, 34, (3), pp. 334–352
[2] Zhang, R., Liu, X., Hu, J., et al.: 'A fast method for moving object detection in video surveillance image', Signal Image Video Process., 2017, 11, (5), pp. 841–848
[3] Kavitha, C., Ashok, S.D.: 'A new approach to spindle radial error evaluation using a machine vision system', Metrol. Meas. Syst., 2017, 24, (1), pp. 201–219
[4] Wen, X., Shao, L., Fang, W., et al.: 'Efficient feature selection and classification for vehicle detection', IEEE Trans. Circuits Syst. Video Technol., 2015, 25, (3), pp. 508–517
[5] Shah, M., Deng, J.D., Woodford, B.J.: 'Video background modeling: recent approaches, issues and our proposed techniques', Mach. Vis. Appl., 2014, 25, (5), pp. 1105–1119
[6] Liang, C.-W., Juang, C.-F.: 'Moving object classification using local shape and HOG features in wavelet-transformed space with hierarchical SVM classifiers', Appl. Soft Comput., 2015, 28, pp. 483–497
[7] Zangenehpour, S., Miranda-Moreno, L.F., Saunier, N.: 'Automated classification based on video data at intersections with heavy pedestrian and bicycle traffic: methodology and application', Transp. Res. C, Emerg. Technol., 2015, 56, pp. 161–176
[8] Mohanaiah, P., Sathyanarayana, P., GuruKumar, L.: 'Image texture feature extraction using GLCM approach', Int. J. Sci. Res. Publ., 2013, 3, (5), p. 1
[9] Papageorgiou, C.P., Oren, M., Poggio, T.: 'A general framework for object detection'. Sixth Int. Conf. on Computer Vision, Bombay, India, 1998
[10] Dalal, N., Triggs, B.: 'Histograms of oriented gradients for human detection'. IEEE Computer Society Conf. on Computer Vision and Pattern Recognition (CVPR 2005), 2005
[11] Nanni, L., Lumini, A., Brahnam, S.: 'Survey on LBP based texture descriptors for image classification', Expert Syst. Appl., 2012, 39, (3), pp. 3634–3641
[12] Zivkovic, Z.: 'Improved adaptive Gaussian mixture model for background subtraction'. Proc. of the 17th Int. Conf. on Pattern Recognition (ICPR 2004), Cambridge, UK, 2004
[13] Mokji, M., Bakar, S.A.: 'Gray level co-occurrence matrix computation based on Haar wavelet' (IEEE, London, UK, 2007)
[14] Guo, L., Cui, H., Li, P., et al.: 'Urban road congestion recognition using multi-feature fusion of traffic images', J. Artif. Intell., 2016, 1, pp. 20–24
[15] Dheekonda, R.S.R., Panda, S.K., Khan, N., et al.: 'Object detection from a vehicle using deep learning network and future integration with multi-sensor fusion algorithm'. SAE Technical Paper, 2017
[16] Kaur, H., Mittal, R., Mishra, V.: 'Haar wavelet approximate solutions for the generalized Lane–Emden equations arising in astrophysics', Comput. Phys. Commun., 2013, 184, (9), pp. 2169–2177
[17] Fard, M.R., Mohaymany, A.S., Shahri, M.: 'A new methodology for vehicle trajectory reconstruction based on wavelet analysis', Transp. Res. C, Emerg. Technol., 2017, 74, pp. 150–167
[18] Agarwal, S., Verma, A., Singh, P.: 'Content based image retrieval using discrete wavelet transform and edge histogram descriptor'. 2013 Int. Conf. on Information Systems and Computer Networks (ISCON), 2013
[19] Tang, Y., Zhang, C., Gu, R., et al.: 'Vehicle detection and recognition for intelligent traffic surveillance system', Multimedia Tools Appl., 2017, 76, (4), pp. 5817–5832
[20] Hashemizadeh, E., Rahbar, S.: 'The application of Legendre multiwavelet functions in image compression', J. Mod. Appl. Stat. Methods, 2016, 15, (2), pp. 510–525
[21] Kamkar, S., Safabakhsh, R.: 'Vehicle detection, counting and classification in various conditions', IET Intell. Transp. Syst., 2016, 10, (6), pp. 406–413
[22] Huang, X., Zhang, L.: 'An SVM ensemble approach combining spectral, structural, and semantic features for the classification of high-resolution remotely sensed imagery', IEEE Trans. Geosci. Remote Sens., 2013, 51, (1), pp. 257–272
[23] Kiaee, N., Hashemizadeh, E., Zarrinpanjeh, N.: 'Evaluation of moving object detection based on various input noise using fixed camera', ISPRS Int. Arch. Photogramm. Remote Sens. Spatial Inf. Sci., 2017, pp. 151–154
[24] Varma, S., Sreeraj, M.: 'Object detection and classification in surveillance system'. 2013 IEEE Recent Advances in Intelligent Computational Systems (RAICS), Trivandrum, 2013, pp. 299–303
[25] Fu, Y., Ma, D., Zhang, H., et al.: 'Moving object recognition based on SVM and binary decision tree'. 2017 6th Data Driven Control and Learning Systems (DDCLS), Chongqing, 2017, pp. 495–500