
2013 IEEE Intelligent Vehicles Symposium (IV) June 23-26, 2013, Gold Coast, Australia

Design and implementation of a high performance pedestrian detection
Antonio Prioletti1 , Paolo Grisleri2 , Mohan M. Trivedi3 and Alberto Broggi4

Abstract— Research on pedestrian detection systems still leaves considerable room for improvement, in both speed and detection accuracy. This paper presents a full implementation of a pedestrian detection system, using part-based classification for candidate identification and feature-based tracking to increase the robustness of the results. The novelty of this approach lies in the use of a part-based approach with a combination of Haar-cascade and HOG-SVM. Tests have been conducted on standard datasets, showing results aligned with those of other state-of-the-art systems available in the literature. Real-world tests also show high speed performance.

I. INTRODUCTION
Pedestrian detection is a research field where significant improvements are still possible, despite its importance in the safety systems arena. Due to the extreme variability of the targets in terms of poses, trajectories, clothing, and carried objects, recognition cannot succeed in every weather and illumination condition. Vehicle ego-motion also complicates things, especially at high speeds, where the benefits of tracking are reduced. Cluttered backgrounds also degrade recognition, due to the higher probability of getting a false positive. High-level functions such as warnings or emergency braking systems need to be triggered as soon as dangerous situations occur; therefore, pedestrian detection shall occur in the shortest possible time. The pedestrian trajectory shall also be taken into account; this is typically done by the high-level and tracking functions, since whether a situation is dangerous depends on where the pedestrian is heading. Pedestrian detection systems certainly deserve a special place among obstacle detectors: being able to detect pedestrians makes it possible to activate additional protection systems for the vulnerable users outside the vehicle, such as additional airbags or warnings.
This paper is laid out as follows: an overview of related works is provided in section II. The implemented system is described in section III, with a detailed description of the algorithm's stages in sections III-A, III-C, III-D and III-E. A thorough set of experiments to build the final system follows in section IV, investigating the impact of the parameters involved, such as the braking time. Section V presents a final evaluation of the performance, compared to the state of the art of vision-based detectors.
II. RELATED WORKS
Pedestrian safety embraces various techniques as described in [1]: from passive solutions such as car design, to active
1 A. Prioletti is a post-graduate student at Vislab, University of Parma, Italy. E-mail: antonio.prioletti@studenti.unipr.it 2 P. Grisleri is a researcher at Vislab at University of Parma, Italy. 3 M. M. Trivedi is head of the CVRR lab at University of California, San Diego 4 A. Broggi is head of Vislab at University of Parma, Italy.

solutions based on pedestrian detection. Furthermore, several kinds of sensors can be used, ranging from ordinary and infrared cameras to RADAR and LIDAR [2], [3]. A combination of a T.O.F. sensor with a camera is used in [4], where a scenario-driven search approach looks for pedestrians only in relevant areas. A detailed description of protection systems can be found in [5]–[9]. A fast pedestrian detector based on a Haar-cascade [10] makes it possible to discard most false positives in the first stage, but its strong dependence on pedestrian appearance makes the algorithm not very robust. Other approaches [11], [12] use HOG-SVM, which provides a very robust system in terms of false positives but with a performance degradation in terms of detection rate. By combining them, it is possible to obtain a fast system without giving up robustness. Many other features and learning algorithms can also be used, such as edgelets [13], variations of gradient maps, or simple intensity images. A new approach, gaining interest in the research community, is to decompose the pedestrian's shape into multiple parts to increase the robustness of the system (mainly for tolerance to occlusions). Various works explain the body-part approach, using various types of features and different kinds of environments: from the interfering object detection used in [14], to the Bayesian combination of edgelet part detectors by Nevatia et al. [13], or the deformable part model by Felzenszwalb et al. [15]. Several datasets for pedestrian detection systems are publicly available. In addition to the well-known MIT [10] and INRIA [11] datasets, much larger and more comprehensive datasets such as the Daimler Detection Benchmark (DaimlerDB) [6] and Caltech [16] have been put forth.
III. SYSTEM OVERVIEW
A system including a novel high-performance part-based pedestrian detection approach, combining Haar-cascade and HOG-SVM together with a feature-based tracking, is described in the remainder of this paper.
Considering a simple on-board camera already present in many new high-end cars, a monocular vision system has been implemented. The system can be configured to vary the image size and, consequently, the processing time; the parameters related to image size are automatically adjusted to ensure the correct behavior of the algorithm. The system is built on a part-based, staged detection approach, first put forth in [17], which decomposes the system into a detection stage, a verification stage (where a part-based approach was introduced) and a final combined verification stage.

978-1-4673-2754-1/13/$31.00 ©2013 Crown


A flowchart of the system is shown in Fig. 1. Since missing detections are more likely than false positives due to the high robustness of the part-based system, a feature-based tracking was introduced with the aim of counteracting a possible missed detection in two subsequent frames, increasing the number of true positives and eliminating false positives appearing in a single frame.

Fig. 1. The flowchart of the part based pedestrian detection high performance system described in this paper. (Blocks shown: detection stage, Haar-cascade for detection, part verification stage, full-body verification, upper-body verification, lower-body verification, combined verification, tracking stage, features calculation, features matcher, past features, output pedestrians.)

The system has been developed on the latest version of the GOLD [18] software, a prototyping software platform, allowing its optimization in the laboratory and the subsequent porting to a real hardware platform: BRAiVE [19]. The platform is equipped with 10 cameras, and one of these, looking forward, has been physically connected to the application. The camera has a FireWire interface, which is connected to an adapter in an industrial PC located in the trunk. The PC is equipped with an Intel(R) Core(TM) i7-2670QM CPU @ 2.20GHz and 8GB of DDR2 RAM.

A. Detection stage
The weak classifiers are boosted with AdaBoost. The structure of a cascade classifier considerably speeds up the system by filtering out most of the pedestrian candidates in the first stages. The cut-off performance criterion used for each cascade stage is correct detections versus false positives. The choice of the number of stages, here denoted as k, is driven by the trade-off between the number of detection windows and the quality of these windows. The relationship used between k and the weak classifier is reported in [20]. The variance of false positives is closely linked to the classifier's stages. A detailed analysis of the impact of this factor is illustrated in section IV. A noteworthy effect is attributable to the type of dataset used: using the old INRIA dataset, which contains humans in general and not specifically pedestrians, the system is less robust than when the more accurate DaimlerDB dataset is used. The output of the detection stage and of the following stages are bounding boxes.

Fig. 2. Detection stage output. Several false positives are contained but these will be removed in the verification stage.

B. Candidate filtering
The bounding boxes of candidate pedestrians obtained in the detection stage (an example can be seen in Fig. 2) are filtered and then passed to the verification stage. Filtering is applied to eliminate pedestrian candidates whose dimensions do not respect the human aspect ratio. Using the Inverse Perspective Mapping (IPM) technique [21], [22], it is possible to know the pedestrian height in the world knowing its height in image coordinates, using the flat road assumption, and thus to remove people with a size outside of a selected range of [1.45 − 2.20] m.

C. Part verification stage
As previously described, a part-based approach has been introduced in this phase, evaluating two different compositions of body parts (shown in Fig. 3):
• full body, upper body and lower body;
• full body, head, torso and legs.

Fig. 3. Example of part boundaries for the 2 and 3-part verification.
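The candidate filtering of Section III-B can be illustrated with a small sketch. It is not the paper's IPM implementation: it assumes an ideal pinhole camera with zero pitch and roll over a flat road, so that a pedestrian whose feet touch the ground has world height H = h_c (v_bottom − v_top) / (v_bottom − v_0), where h_c is the camera height and v_0 the horizon row. The names `Box`, `world_height_m` and `keep_candidate`, the camera height used in the example and the aspect-ratio bounds are hypothetical; only the [1.45, 2.20] m range comes from the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Box:
    """Bounding box in pixels; image rows (v) grow downward."""
    u_left: float
    v_top: float
    u_right: float
    v_bottom: float


def world_height_m(box: Box, camera_height_m: float, horizon_row: float) -> Optional[float]:
    """Estimate the world height of a box standing on a flat road.

    Assumption (not from the paper): pinhole camera, zero pitch and roll.
    The feet are at distance Z = f * h_c / (v_bottom - v0), so the height is
    H = (v_bottom - v_top) * Z / f = h_c * (v_bottom - v_top) / (v_bottom - v0).
    """
    rows_below_horizon = box.v_bottom - horizon_row
    if rows_below_horizon <= 0:
        return None  # feet at or above the horizon cannot lie on the road plane
    return camera_height_m * (box.v_bottom - box.v_top) / rows_below_horizon


def keep_candidate(box: Box, camera_height_m: float, horizon_row: float,
                   min_height_m: float = 1.45, max_height_m: float = 2.20,
                   min_aspect: float = 0.2, max_aspect: float = 0.7) -> bool:
    """Keep a candidate only if its aspect ratio and world height are plausible.

    The height range is the one quoted in the text; the aspect-ratio bounds
    (width / height) are illustrative values chosen for this sketch.
    """
    width = box.u_right - box.u_left
    height = box.v_bottom - box.v_top
    if width <= 0 or height <= 0:
        return False
    if not (min_aspect <= width / height <= max_aspect):
        return False
    estimated = world_height_m(box, camera_height_m, horizon_row)
    return estimated is not None and min_height_m <= estimated <= max_height_m
```

With a camera mounted 1.2 m above the road and the horizon at row 240, a box spanning rows 200 to 300 maps to a height of about 2.0 m and is kept, while a box spanning rows 280 to 300 maps to about 0.4 m and is discarded. The vehicle pitch, which the concluding remarks name as a possible improvement, is ignored in this sketch.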

Dense HOG descriptors, computed through integral images to speed up the system, are calculated for each part of the body and passed to the corresponding SVM. Two different types of SVM were tested, a linear SVM and a non-linear SVM. Each one has two variants: binary, providing only the classification (pedestrian or non-pedestrian) of the element, and regression [23], which provides the estimated function value.

D. Combined verification stage
A combination of the three values returned by the verification stage is needed to classify the detection window; therefore, two different approaches have been implemented:
• a majority vote approach, where at least two out of three classifiers must label the window as pedestrian;
• a regression output classification approach, where the three float values coming from the part verification stage are used to train a new classifier. Several types of classifiers were tested: a linear SVM, a non-linear SVM, and a Bayesian classifier [24]; the different performances of each one are shown in the results section.
The majority vote approach, unlike the regression output classification approach, makes it possible to detect partially occluded pedestrians, but also comes with a higher false positive rate. Training the last SVM including occluded pedestrians in the training dataset could provide similar results while decreasing the number of false positives.

E. Tracking stage
To counteract the high selectivity of the part-based approach, a feature-based tracking system (Fig. 1) has been introduced, exploiting the translation and light invariance of the features. A set of features described in [25], key point and descriptor, based on multiple local convolutions, is extracted from two different images: horizontal and vertical gradients. The matching is performed using the functions offered by the feature library. The tracker considers a pedestrian candidate as a true pedestrian and, therefore, inserts it in the tracking system, if it is detected for at least 250 milliseconds by the SVM. This decision has been taken in order to keep a low false detection rate. The pedestrian is matched with new pedestrian candidates in the following frames using the features extracted from the images, and its position is then updated. When the position is not matched for 0.5 s, the target is removed from the tracking system. In the scenario of a detection missed by the SVM, a feature matching with the entire image is done, introducing a ghost pedestrian when a match is found. It will be updated for up to 0.5 s. Occluded pedestrians are only detected using tracking.
One of the downsides of the feature-based tracking system is the strict dependence on the vehicle ego-motion. High vehicle speed, in conjunction with a low frame rate, means large differences between the same pedestrian framed in consecutive frames and, therefore, higher complexity for the matching phase. To cope with this problem a higher frame rate can be used, reducing the variability among consecutive frames. When running on hardware the system runs at 15 Hz. A further drawback could be the impact of the background pixels in the matching: if their incidence is greater than that of the foreground pixels, the update of ghost pedestrians would be wrong. However, we can omit this scenario since in the real-world experiments the incidence was very low and decreased with the movement of pedestrians.

IV. EXPERIMENTAL ANALYSIS
A thorough evaluation of the different setup parameters is done in this section, searching for the best combination for the final system. The training set has been chosen by selecting 24000 images from the Daimler database Training Set, which contains 26000 images in total; the remaining 2000 images of this database have been used for the parameter optimization. The final test has been performed using the Daimler database Test Set, which is not included in the Training Set. To compare each result with the ground truth, the PASCAL VOC criterion is used: a detection is considered correct if the overlap between the area of the ground truth bounding box and the area of the bounding box obtained is more than 50%. The results are shown using ROC curves, created by plotting the fraction of true positives out of the positives (TPR = true positive rate) vs. the fraction of false positives out of the negatives (FPR = false positive rate), at various threshold settings. The experiments are laid out as follows:
1) the best training dataset is determined and, also, the optimal value of k in the detection stage is searched;
2) the significance of each body part and of their combination is evaluated, comparing it also with the full body system of Geismann et al. [12], which corresponds to our system based on one part; the comparison between that paper and our system is shown in Fig. 5;
3) the various approaches described in III-C are analyzed;
4) finally, the system speed is tested and the time is broken down into individual stages.

A. Detection stage
The best training dataset and the best choice of k are determined. The use of different datasets has also been analyzed at this stage. Several datasets for the Haar-cascade training have been tested: 2400 DaimlerDB images, 2400 INRIA images, 10000 DaimlerDB images and, finally, 6000 images composed of 2400 INRIA and 3600 DaimlerDB images. Fig. 4 shows the large difference between the system trained with the INRIA dataset and the DaimlerDB dataset, attributable to the excessive generality of INRIA as opposed to the specificity of DaimlerDB. Fig. 4 also shows how a smaller value of k means a higher detection rate but also a higher false positive rate. The choice of k is driven by the trade-off between false positives and true positives: most false positives will be removed in the part verification stage, whereas here it is fundamental not to remove any true positive. However, at some point, the quality of the bounding boxes provided by the detection stage degrades to a level where the verification stage only verifies a few candidates. This defines a lower limit on the number of stages in the Haar-cascade.
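The matching rule used in the experiments can be sketched in a few lines. The PASCAL VOC criterion is commonly computed as intersection over union (IoU) between the detected box and the ground truth box, with a match declared above 0.5; the sketch below assumes that reading. Boxes are `(x1, y1, x2, y2)` tuples, and the function names `iou` and `is_correct_detection` are illustrative, not taken from the paper.

```python
from typing import Tuple

BoxXYXY = Tuple[float, float, float, float]  # (x1, y1, x2, y2), with x2 > x1 and y2 > y1


def iou(a: BoxXYXY, b: BoxXYXY) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = min(a[2], b[2]) - max(a[0], b[0])
    inter_h = min(a[3], b[3]) - max(a[1], b[1])
    if inter_w <= 0 or inter_h <= 0:
        return 0.0  # boxes do not overlap
    intersection = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return intersection / (area_a + area_b - intersection)


def is_correct_detection(detection: BoxXYXY, ground_truth: BoxXYXY,
                         min_overlap: float = 0.5) -> bool:
    """A detection counts as correct when its overlap exceeds the threshold."""
    return iou(detection, ground_truth) > min_overlap
```

For example, two 10x10 boxes shifted by 2 pixels overlap with IoU 80/120 and count as a match, while a shift of 5 pixels gives 50/150 and is rejected.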

As k increases, performance improves up to k = 13, the best number of stages, after which the performance degrades.

Fig. 4. Comparison of different training sets for the detection stage. Data have been generated using the validation dataset.

B. Part verification stage
An SVM has been used in this stage. One of the most widely discussed topics about the part-based approach regards how many parts the shape of the pedestrian should be divided into. A performance analysis with 1, 2, and 3-part verification is shown in Fig. 5, where it is possible to see how 2 parts perform better than 1 and 3 parts. Note that two parts outperform the others, while three parts are just as bad as one, because of the low quality of the images and the high difficulty in identifying small areas such as the head. In order to justify these results, the relevance of each part is shown in Fig. 6: due to the low image resolution, it is very difficult to detect the head, and this causes the consequent bad result of the three-part approach. The graph confirms the assumptions regarding the difficulty of detecting the head, showing the reliability of each part type. Furthermore, observing the low detectability of the legs and the high detectability of the upper body, the combination of these two parts is the best possible combination to manage the trade-off between true and false positives.

Fig. 5. Detection performance with varying numbers of parts.

Fig. 6. Detection performance with single parts.

C. Combined verification stage
In order to obtain the classification of the detection window, the three results from the verification stage need to be merged. Four options have been investigated, as shown in Fig. 7: linear SVM, radial SVM, Bayesian classification and a majority vote approach. The Bayesian approach [24] seems to provide the best results, but it also provides a high false positive rate. With a low false positive rate, however, the radial approach performs better. These results are justifiable by the linear separation applied to a set of non-linear data and the resulting misclassification of many pedestrians. Assuming an appropriate trade-off between true positives and false positives, the radial approach can be considered the best, which is also justified by the non-linear data returned from the verification stage.

Fig. 7. Comparison of different methods for combined part-verification.

D. Speed evaluation
An evaluation of the system speed performance on the given hardware is shown in Fig. 8. The time has been measured for each stage. The largest contribution to the processing time is the Haar-features calculation; the variance in computational time is related to the classification of the body parts. Furthermore, as described in IV-A, with a smaller k the detection stage will provide more candidate pedestrians, having a large impact on the system speed.
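Of the combination options compared in Fig. 7, the majority vote rule of Section III-D is the simplest to state in code. The sketch below assumes that each part classifier returns a signed decision value, with positive values meaning "pedestrian"; that convention, the function name `majority_vote` and the `threshold` parameter are assumptions made for illustration, while the two-out-of-three rule is the one described in the text.

```python
from typing import Sequence


def majority_vote(part_scores: Sequence[float], threshold: float = 0.0,
                  min_votes: int = 2) -> bool:
    """Label a window as pedestrian when enough part classifiers agree.

    part_scores holds one decision value per part classifier (for instance
    full body, upper body, lower body); a score above `threshold` is
    counted as one vote for "pedestrian".
    """
    votes = sum(1 for score in part_scores if score > threshold)
    return votes >= min_votes
```

A window whose lower body is occluded can still pass with scores such as `[0.8, 1.1, -0.6]`, which illustrates why this rule tolerates partial occlusion; the same tolerance also lets through windows that only two classifiers accept, consistent with the higher false positive rate reported for this approach.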

A good trade-off between true positives, false positives and speed is given by 13 stages. In this case, a processing time of 64.363 ms is obtained, corresponding to about 15.5 frames per second at a resolution of 640x480.

Fig. 8. Performance evaluation on the last part of Daimler-DB images. Speed versus detection rate and false positive rate.

V. FINAL PERFORMANCE EVALUATION
Following the results explained in section IV, the final system was built using the Daimler-DB dataset with 10000 images to train the classifier, choosing 13 as the number of cascade stages, denoted as k, a 2-part approach and a radial SVM for the final stage.

A. Final test without tracking
Performance on the Daimler-DB dataset is shown in Fig. 9: an ROC curve is plotted reporting several results obtained by varying the number of cascade stages (k). We see the reduction of false positives and the increase of true positives as the number of stages rises. Considering 13 stages as the best choice, a detection rate of about 0.673 with a false positive rate of 0.046 is obtained. To provide more consistency to the results obtained, a comparison of our algorithm with the main state-of-the-art pedestrian detectors described in [5] is shown in Table II. As can be seen, our system performs better than every other algorithm and, furthermore, with a huge speedup. Moreover, the reported comparison does not include some more recent works such as [26]–[28]; where it was possible to make some comparison, our performance in terms of speed and detection rate turned out to be similar or better, however they run on a different architecture and on a different test set.

Fig. 9. Final test detection.

TABLE II. COMPARISON OF OUR PART BASED PEDESTRIAN DETECTION HIGH PERFORMANCE SYSTEM WITH STATE-OF-THE-ART PEDESTRIAN DETECTION SYSTEMS AT 0.1 FPPF ON THE DAIMLER DB DATASET. COURTESY OF DOLLÁR ET AL. [5]. THE PAPER CONTAINS AN EXPLANATION OF EACH OF THE SYSTEMS. ABBREVIATIONS ARE THE SAME DESCRIBED IN [5]. THE FASTEST SYSTEM IS ALSO LISTED, ALTHOUGH DETECTION RATES FOR THE DATASET ARE UNKNOWN.
[Table II columns: Algorithm, Part-based, Speed, Average detection rate. Algorithms (part-based): Our Algorithm (yes), MultiFtr+Motion (no), LatSvm-V2 (yes), MultiFtr+CSS (no), HogLbp (no), HikSvm (no), MultiFtr (no), LatSvm-V1 (yes), HOG (no), Shapelet (no), VJ (no), FPDW (no). Our algorithm: 15.5 FPS, average detection rate 0.72. FPDW: 2.670 FPS, average detection rate N/A.]

B. Tracking improvements
As described in III-E, sequences with a high frame rate are required to appreciate the benefits of the tracking approach. DaimlerDB has a low frame rate (10 Hz), too small to ensure a good matching between features. In order to test the improvements achievable with the tracking system, a new test set has been recorded (two sequences of 5182 and 11490 frames respectively), captured from the real prototype described in section III. Performance on the DaimlerDB test set and on our dataset is shown in Table I.

TABLE I. FINAL SYSTEM PERFORMANCE.
[Table I reports detection rate, false positive rate, frame rate and image size for the basic system without tracking, the system with resized image (320x240), our system with tracking and the fastest system in [5]. Listed frame rates: 15.5 FPS, 30 FPS and 2.6 FPS; listed image sizes: 640x480, 320x240 and 640x480.]
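The track life cycle described in Section III-E, which the tracking results above rely on, can be summarized with a small state sketch. It only models the two timing rules quoted in the text (confirmation after at least 250 ms of detection, removal after 0.5 s without a match) and reads "detected for at least 250 milliseconds" as the time elapsed between the first detection and the latest match, which is an interpretation. The class `Track` and its method names are hypothetical; feature extraction, matching and ghost pedestrians are left out.

```python
from dataclasses import dataclass

CONFIRM_AFTER_S = 0.250  # a candidate becomes a tracked pedestrian after 250 ms
DROP_AFTER_S = 0.5       # a target unmatched for 0.5 s is removed


@dataclass
class Track:
    """Timing state of one pedestrian candidate (timestamps in seconds)."""
    first_seen_s: float
    last_matched_s: float
    confirmed: bool = False

    def on_match(self, now_s: float) -> None:
        """Record a successful match and confirm the track once it is old enough."""
        self.last_matched_s = now_s
        if now_s - self.first_seen_s >= CONFIRM_AFTER_S:
            self.confirmed = True

    def is_expired(self, now_s: float) -> bool:
        """True when the track has gone unmatched for the removal interval."""
        return now_s - self.last_matched_s >= DROP_AFTER_S
```

With detections arriving every 0.1 s, a track created at t = 0 is still unconfirmed after the match at t = 0.2 s, becomes confirmed at t = 0.3 s, and is dropped if no further match arrives by t = 0.9 s. At the 10 Hz frame rate of DaimlerDB this means three frames are needed before a pedestrian is reported.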

An increase of 27% and 22% of true positives on the two datasets was obtained, with a reduction of 5% and 10% of false positives, with respect to the same system without tracking.

VI. CONCLUDING REMARKS AND FUTURE WORKS
The implementation of a novel high-performance pedestrian detector on a real prototype has been described in this work. A combination of Haar-cascade and HOG-SVM has been used, introducing a novel combination of HOG features with a part-based approach; a feature-based tracking was also developed to counteract the strict constraints of a part-based approach. The tracking system provides more robust results, making it possible to handle missed detections of pedestrians in intermediate frames and to avoid possible single-frame misclassifications. Exploiting the CPU multicore features, a high-speed system working at 15.5 Hz was obtained that outperforms all the other state-of-the-art classifiers, with a detection rate of 0.72 considering a false positive rate of 0.1. Despite these achievements, significant improvements are still possible in order to enhance the pedestrian detector. A partial use of the processors (about 70%) suggests the presence of parts of OpenCV code not yet parallelized, which points to a low-level reimplementation of the system as a possible improvement. Further improvements are achievable considering the vehicle ego-motion in the tracking phase and the pitch of the vehicle in the candidate filtering stage. A high-level processing could also be included, using the Kalman filter to compute the future trajectory of the pedestrian and avoid dangerous situations.

VII. ACKNOWLEDGMENTS
The authors would like to thank their colleagues in the LISA-CVRR lab and to acknowledge the important contributions of Andreas Møgelmose in developing the initial version of the part-based algorithm. The work described in this paper has been developed in the framework of the Open intelligent systems for Future Autonomous Vehicles (OFAV) Project funded by the European Research Council (ERC) within an Advanced Investigators Grant. The authors would also like to acknowledge Cassa di Risparmio di Parma e Piacenza for funding the test platform used for this work.

REFERENCES
[1] T. Gandhi and M. M. Trivedi, “Pedestrian protection systems: Issues, survey, and challenges,” IEEE Transactions on Intelligent Transportation Systems, vol. 8, no. 3, pp. 413–430, 2007.
[2] K. Fuerstenberg, “Pedestrian protection using laserscanners,” in Intelligent Transportation Systems, 2005. Proceedings. 2005 IEEE, sept. 2005, pp. 437–442.
[3] A. Yoshizawa, M. Yamamoto, and J. Ogata, “Pedestrian detection with convolutional neural networks,” Intelligent Vehicles Symposium, 2005. Proceedings. IEEE, 2005, pp. 224–229.
[4] A. Broggi, P. Cerri, S. Ghidoni, P. Grisleri, and H. Jung, “A new approach to urban pedestrian detection for automatic braking,” IEEE Transactions on Intelligent Transportation Systems, vol. 10, no. 4, pp. 594–605, 2009.
[5] P. Dollár, C. Wojek, B. Schiele, and P. Perona, “Pedestrian Detection: An Evaluation of the State of the Art,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 34, no. 4, pp. 743–761, april 2012.
[6] M. Enzweiler and D. Gavrila, “Monocular pedestrian detection: Survey and experiments,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 12, pp. 2179–2195, 2009.
[7] T. Gandhi and M. M. Trivedi, “Image based estimation of pedestrian orientation for improving path prediction,” Intelligent Vehicles Symposium, 2008 IEEE, june 2008, pp. 506–511.
[8] S. Krotosky and M. M. Trivedi, “On color-, infrared-, and multimodal-stereo approaches to pedestrian detection,” IEEE Transactions on Intelligent Transportation Systems, vol. 8, no. 4, pp. 619–629, 2007.
[9] T. Gandhi and M. M. Trivedi, “Parametric ego-motion estimation for vehicle surround analysis using an omnidirectional camera,” Machine Vision and Applications, vol. 16, no. 2, pp. 85–95, 2005.
[10] C. Papageorgiou and T. Poggio, “A trainable system for object detection,” International Journal of Computer Vision, vol. 38, no. 1, pp. 15–33, 2000.
[11] N. Dalal and B. Triggs, “Histograms of oriented gradients for human detection,” in Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on, vol. 1, 2005, pp. 886–893.
[12] P. Geismann and G. Schneider, “A two-staged approach to vision-based pedestrian recognition using Haar and HOG features,” in Intelligent Vehicles Symposium, 2008 IEEE, june 2008, pp. 554–559.
[13] B. Wu and R. Nevatia, “Detection and tracking of multiple, partially occluded humans by bayesian combination of edgelet based part detectors,” International Journal of Computer Vision, vol. 75, no. 2, pp. 247–266, 2007.
[14] X. Mao, F. Qi, and W. Zhu, “Multiple-part based Pedestrian Detection using Interfering Object Detection,” in Natural Computation, 2007. ICNC 2007. Third International Conference on, 2007, pp. 165–169.
[15] P. Felzenszwalb, D. McAllester, and D. Ramanan, “A discriminatively trained, multiscale, deformable part model,” in Computer Vision and Pattern Recognition, 2008. CVPR 2008. IEEE Conference on, june 2008, pp. 1–8.
[16] P. Dollár, C. Wojek, B. Schiele, and P. Perona, “Pedestrian detection: A benchmark,” in Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on, june 2009, pp. 304–311.
[17] A. Møgelmose, A. Prioletti, M. M. Trivedi, A. Broggi, and T. B. Moeslund, “A Two-stage Part-Based Pedestrian Detection System Using Monocular Vision,” 15th IEEE International Conference on Intelligent Transportation Systems, sept 2012, pp. 73–77.
[18] M. Bertozzi, L. Bombini, A. Broggi, P. Cerri, P. Grisleri, and P. Zani, “GOLD: A framework for developing intelligent-vehicle vision applications,” IEEE Intelligent Systems, vol. 23, no. 1, pp. 69–71, Jan.–Feb. 2008.
[19] P. Grisleri and I. Fedriga, “The BRAiVE platform,” in Procs. 7th IFAC Symposium on Intelligent Autonomous Vehicles, Lecce, Italy, Sept. 2010.
[20] P. Viola and M. Jones, “Robust real-time object detection,” International Journal of Computer Vision, vol. 57, no. 2, pp. 137–154, 2001.
[21] H. Mallot, H. Bülthoff, J. Little, and S. Bohrer, “Inverse perspective mapping simplifies optical flow computation and obstacle detection,” Biol Cybern, vol. 64, no. 3, pp. 177–85, 1991. [Online]. Available: http://www.biomedsearch.com/nih/Inverse-perspective-mapping-simplifies-optical/2004128.html
[22] M. Bertozzi, A. Broggi, and A. Fascioli, “Stereo inverse perspective mapping: Theory and applications,” Image and Vision Computing Journal, vol. 16, no. 8, pp. 585–590, 1998.
[23] C. Chang and C. Lin, “LIBSVM: a library for support vector machines,” ACM Transactions on Intelligent Systems and Technology, vol. 2, pp. 1–27, 2011.
[24] K. Fukunaga, Introduction to Statistical Pattern Recognition. New York: Academic Press, 1990.
[25] A. Geiger, J. Ziegler, and C. Stiller, “Stereoscan: Dense 3d reconstruction in real-time,” IEEE Intelligent Vehicles Symposium, 2011, pp. 963–968.
[26] D. Mitzel, P. Sudowe, and B. Leibe, “Real-Time Multi-Person Tracking with Time-Constrained Detection,” British Machine Vision Conference, 2011.
[27] R. Benenson, M. Mathias, R. Timofte, and L. Van Gool, “Pedestrian detection at 100 frames per second,” Computer Vision and Pattern Recognition, IEEE Conference on, jun 2012, pp. 2903–2910.
[28] P. Dollár, R. Appel, and W. Kienzle, “Crosstalk Cascades for Frame-Rate Pedestrian Detection,” European Conference on Computer Vision, 2012.