Image Detection in The Real World

Project Manager
Mark Cracknell
Mark Cracknell, Dr John McCarthy, Derek Renaud
Transport for London
Windsor House, 42-50 Victoria Street, London, SW1H 0TL, UK
Phone: +44 (0)20 7126 2008
Email: mark.cracknell@tfl.gov.uk

Abstract

Intelligent image detection systems are part of a centralised approach to modern-day traffic management. This has arisen from the need for more cost-effective and efficient monitoring of traffic. Traffic monitoring CCTV systems tend to be unique in that they involve high camera numbers, operate in the public domain and have long transmission paths (up to 40 km). With 2000 cameras and over 100 monitors, it is not feasible to continuously monitor every CCTV camera installed within TfL's network. In fact, it has been shown that the accuracy of manual monitoring declines significantly over time. Therefore, the development of a technology that provides automatic and relevant real-time alerts to Traffic Management Operators can have an immediate and long-term impact on traffic management through the implementation of responsive traffic strategies.

Background

In early 2006 Transport for London (TfL) launched the Image Recognition and Incident Detection (IRID) project. The project was tasked with reviewing the current image processing market and assessing how well it met TfL's detection requirements. Testing was carried out against the following criteria: congestion, stopped vehicles, banned turns, vehicle counting, subway monitoring and bus detection. The first phase of IRID showed that there are benefits to be gained by using this technology. In light of this, TfL commissioned a second phase of work on image detection.

Discussion

Project IRID - Phase 2

Project Approach

Phase 2 of TfL's work with image recognition systems is formed of several work streams. This paper shows in turn how each of these contributes to Keep London Moving.

Evaluation Toolset

In order to test image detection systems effectively and efficiently, an appropriate toolset is required.
The Home Office Scientific Development Branch (HOSDB) has released the iLIDS datasets. These datasets focus primarily on security scenarios. As TfL's primary purpose is transport rather than security, these datasets are not ideally suited. Only one suitable dataset is available, the one featuring parked vehicles. To maximise the relevance of any testing, the dataset is best built from the ground up using the most relevant infrastructure. TfL therefore took the decision to create a set of data, similar to that from the HOSDB, focussing on urban traffic management issues. The criteria for the TfL dataset are as follows:

- Reflect the real-world environment TfL works in
- Include both good and adverse weather conditions
- Contain the following scenarios: congestion; parking; banned turns; and vehicle counting
- Show dense urban street scenes where free-flowing traffic is not prevalent
- Show a number of different road types

Transport for London DTO - T&S D&R
This dataset was captured from the existing TfL CCTV system, which comprises over 1200 cameras. Each camera transmits full-colour video back to a central CCTV matrix, from where it is distributed to over 500 users.
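The dataset criteria listed above lend themselves to a simple mechanical check once each clip carries some descriptive metadata. The following is only a minimal sketch under stated assumptions: the `Clip` fields, the scenario labels and the two weather classes are hypothetical names chosen for illustration and are not taken from the TfL dataset itself.

```python
from dataclasses import dataclass
from itertools import product

# Hypothetical labels, based on the scenarios and conditions named in the text.
SCENARIOS = ("congestion", "parking", "banned_turn", "vehicle_count")
WEATHER_CLASSES = ("good", "adverse")


@dataclass(frozen=True)
class Clip:
    """Metadata for one video clip (illustrative fields, not TfL's schema)."""
    clip_id: str
    scenario: str
    weather: str    # "good" or "adverse"
    road_type: str  # e.g. "single" or "dual"


def missing_combinations(clips):
    """Return the (scenario, weather) pairs that no clip covers yet."""
    covered = {(c.scenario, c.weather) for c in clips}
    return [pair for pair in product(SCENARIOS, WEATHER_CLASSES)
            if pair not in covered]


# A small, made-up catalogue of clips.
clips = [
    Clip("c001", "congestion", "good", "single"),
    Clip("c002", "congestion", "adverse", "dual"),
    Clip("c003", "banned_turn", "adverse", "single"),
    Clip("c004", "vehicle_count", "good", "dual"),
]
gaps = missing_combinations(clips)
```

With this made-up catalogue, `gaps` lists the scenario and weather pairs that still need footage, for example parking in either weather class. A fuller check might also cover road type and light level.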
Figure 1 - Typical shots from the TfL dataset

To ensure that all required information was captured in the datasets, a confusion matrix was drawn up, similar to the one below:

Scenario     Congestion   Congestion   Turns      Counts
Weather      Sun          Rain         Fog        Wet
Road Type    Single       Dual         Single     Dual
Light Level  Bright Sun   Dark         Overcast   Dark

Figure 2 - Confusion Matrix showing video types

The process of compiling a video dataset is twofold: first, capturing the required video segments and editing them to a suitable length; second, ground-truthing the video. Ground-truthing describes the process of manually reviewing a video clip to annotate what is happening within the clip. This annotation forms the baseline data against which any tests are referenced. There are a number of different levels of ground truth, dependent on the detail of testing to be carried out. It is typical in academia to annotate the video on a per-frame basis; that is to say, every frame within the video is annotated, describing all vehicles in the scene and their positions. This serves a useful purpose when testing a newly formed algorithm. However, TfL is only interested in testing the application layer of image detection systems. The ground truth associated with the TfL dataset is therefore event-level ground truth. That is, each video clip is annotated with a specific scenario in mind, for example "No right turn". In this case the ground truth consists of a number of timestamped events, detailing each vehicle turning right at a particular junction. Whilst this does not describe the video clip in any great detail, it is of a sufficient level for TfL's purposes. For an end user, the primary concerns with an image detection system are false events and missed events. The number of times the system correctly or incorrectly alarms is more important than whether or not a car was seen one frame after it appears. Once the video is collected and the ground truth collated, the dataset is ready for use.

Smart Camera evaluation

There are a number of different ways to make use of image detection technology. TfL primarily uses existing infrastructure. The advantage is that, as the infrastructure is already in place, costs are reduced. The design of TfL's CCTV network means that all the CCTV feeds are brought directly back to a central location, so a large number of analogue video feeds are available for processing in a single place. In this case a centralised processing approach to image detection makes sense. However, there are circumstances where this is not ideal or even possible. London is a very large city and, unlike many North American cities, its street layout means that there are often many junctions with no existing CCTV coverage.

In some cases there are key strategic sites which do not have CCTV coverage. In these cases it would be beneficial to have a monitoring capability. However, the Traffic controllers already have too many cameras to watch, so increasing this burden may not prove the best solution. To address this, TfL is currently investigating smart camera technology. In contrast to the centralised processing approach, this approach moves the intelligence out to the street. Where there is a requirement to install new cameras at unmonitored junctions, there is a real benefit in including image detection hardware as well. Smart camera technology can take a number of forms, including: all-in-one camera solutions, with processing built into the camera housing; edge devices, which sit in a roadside cabinet; or codec-based architectures, which utilise unused processing power in the digital codecs to perform simple image detection. A trial of three different technology approaches (as outlined above) was carried out to determine what capabilities are available and whether there is a loss or increase of functionality and performance when compared to server-based processing.

Congestion Monitoring system deployment

The key work stream in this project was the deployment of a 20-camera congestion detection system. TfL successfully proved in 2007 that there is value in using image-based detection for congestion detection. Following on from this, TfL has deployed a 20-camera system for this purpose. Initial roll-out concentrated on 20 key sites as selected by relevant stakeholders. The stakeholders are primarily the Traffic controllers who use this system day in, day out. The 20 sites highlighted by the Traffic controllers are strategic sites which, if congested, will cause severe problems elsewhere on the road network. These sites act as early warning signs for congestion problems. Each monitored camera is configured individually, as every site requires its own definition of congestion. In some cases stationary traffic for 20 s may be normal, but in others any stationary traffic is unusual. These cameras are not used exclusively for image detection but are available for use by a large number of operators. This presents a problem, as all of TfL's cameras are Pan-Tilt-Zoom (PTZ). The deployed system had to be designed to ensure that, if a camera was moved from its configured or home position, it would suspend processing until it was returned home. Once the system has detected congestion it must alert the traffic controllers so that they can take appropriate action. Multi-sensory alarms, including audio and visual, are delivered to ensure that alerts are received and recognised by the users.
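The two configuration requirements described above, a per-site definition of congestion and suspension of processing whenever a PTZ camera is away from its home position, can be illustrated with a short sketch. The paper does not say how the deployed system reads camera position or measures stationary traffic, so everything here is an assumption: `PtzPosition`, `CameraConfig`, the tolerance value and the idea of a single "seconds stationary" measurement are hypothetical names introduced for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PtzPosition:
    pan: float   # degrees
    tilt: float  # degrees
    zoom: float  # zoom factor


@dataclass(frozen=True)
class CameraConfig:
    """Per-camera settings (hypothetical; each site has its own values)."""
    camera_id: str
    home: PtzPosition
    stationary_threshold_s: float  # stationary time beyond which the site counts as congested
    tolerance: float = 0.5         # assumed allowed deviation from the home position


def at_home(config, current):
    """True if the camera is within tolerance of its configured home position."""
    h = config.home
    return (abs(current.pan - h.pan) <= config.tolerance
            and abs(current.tilt - h.tilt) <= config.tolerance
            and abs(current.zoom - h.zoom) <= config.tolerance)


def evaluate(config, current, stationary_s):
    """Return 'suspended', 'congested' or 'clear' for one observation."""
    if not at_home(config, current):
        # An operator has moved the camera: do not analyse this view.
        return "suspended"
    if stationary_s > config.stationary_threshold_s:
        return "congested"
    return "clear"


# Two made-up sites: at one, 20 s of stationary traffic is normal;
# at the other, any stationary traffic is unusual.
busy_junction = CameraConfig("cam-A", PtzPosition(90.0, -10.0, 1.0), stationary_threshold_s=20.0)
quiet_link = CameraConfig("cam-B", PtzPosition(180.0, -5.0, 2.0), stationary_threshold_s=0.0)

moved = PtzPosition(120.0, -10.0, 1.0)  # camera panned away by an operator
states = [
    evaluate(busy_junction, busy_junction.home, 15.0),
    evaluate(busy_junction, busy_junction.home, 45.0),
    evaluate(quiet_link, quiet_link.home, 5.0),
    evaluate(busy_junction, moved, 45.0),
]
```

Checking the home position before applying the threshold reflects the requirement that a moved camera must not raise alarms, because its detection zones no longer match the scene.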
Figure 3 - System overview showing delivery of alarms to LTCC users
A three-tier architecture was implemented in order to preserve current network integrity.

Tier 1: Video servers, linking directly into the CCTV matrix. These perform the video analysis.
Tier 2: Alert server, a remote machine placed elsewhere on the network. This machine aggregates data from each Video server and delivers it to the user. The Alert server uses a web service as the delivery method for user alerts.
Tier 3: User desktop. Users connect to the Alert server via a standard web browser.

The Video servers are 19 rack-mounted servers optimised for minimal power consumption and heat dissipation. The Alert server is a virtual machine on the TfL network running a web service. Background data traffic flows from the Video servers to the Alert server, delivering alert data. Alerts are delivered to standard PC clients running any web browser.
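The division of labour between the three tiers can be sketched in a few lines. This is not the deployed software: the `Alert` fields, the `AlertServer` class and the JSON payload are assumptions made for illustration, and the real web service interface is not described in the paper. The sketch shows only the aggregation role of Tier 2, collecting alerts pushed by several video servers and presenting them as one list for a browser client.

```python
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class Alert:
    video_server: str  # Tier 1 machine that raised the alert
    camera_id: str
    kind: str          # e.g. "congestion"
    timestamp: str     # ISO 8601 string, so text order equals time order


class AlertServer:
    """Tier 2 (hypothetical): collects alerts from every video server."""

    def __init__(self):
        self._alerts = []

    def receive(self, alert):
        """Called for each alert arriving from a Tier 1 video server."""
        self._alerts.append(alert)

    def payload(self):
        """JSON body a web service could return to a Tier 3 browser, newest first."""
        ordered = sorted(self._alerts, key=lambda a: a.timestamp, reverse=True)
        return json.dumps([asdict(a) for a in ordered])


server = AlertServer()
server.receive(Alert("vs-1", "cam-07", "congestion", "2008-03-01T08:15:00"))
server.receive(Alert("vs-2", "cam-12", "congestion", "2008-03-01T08:20:00"))
latest = json.loads(server.payload())
```

Keeping the video analysis on Tier 1 and sending only small alert records to Tier 2 is consistent with the stated aim of preserving network integrity, since no video needs to leave the video servers.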
The implementation of this system is in line with TfL's greater goal of providing 24/7 real-time traffic operations to give journey time reliability. Detecting congestion build-up more quickly and implementing relief strategies sooner will reduce the effect of congestion on road users. The reduction of congestion benefits not only journey times but also journey time reliability, vehicle emissions and fuel consumption. Any automated system runs the risk of becoming redundant if it erroneously alerts the user too often. Steps were taken to minimise this risk, including maximising accuracy and delivering alerts to the appropriate user.

Conclusion

As this project is nearing completion, the full results are not yet available. The project is due to have tabulated all the data by May 2008, when the following conclusions will be drawn and comprehensively discussed in the presentation:

- Comments on the use of smart camera technology, including accuracy, reliability, functionality and scalability.
- Comments on the successes and issues of installing a 20-camera congestion monitoring system, including user feedback and impact on the road network.
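Accuracy in the sense used in this paper rests on the event-level ground truth described earlier: what matters to the end user is the number of false events and missed events. As a rough illustration of how such counts might be obtained, the sketch below matches system alarms to ground-truth events by timestamp. The function name, the 5-second tolerance and the greedy nearest-event matching are all assumptions made for this example; the paper does not specify how TfL scored its tests.

```python
def score_events(ground_truth, detections, tolerance_s=5.0):
    """Match detections to ground-truth events one-to-one by timestamp.

    Both inputs are lists of event times in seconds from the start of a clip.
    Returns (correct, missed, false_events). Matching is greedy, so it is a
    simple approximation rather than an optimal assignment.
    """
    unmatched = sorted(ground_truth)
    correct = 0
    false_events = 0
    for t in sorted(detections):
        # Find the nearest ground-truth event that has not been matched yet.
        best = min(unmatched, key=lambda g: abs(g - t), default=None)
        if best is not None and abs(best - t) <= tolerance_s:
            unmatched.remove(best)
            correct += 1
        else:
            false_events += 1
    missed = len(unmatched)
    return correct, missed, false_events


# Made-up example: three annotated right turns, four alarms from the system.
ground_truth = [12.0, 47.5, 90.0]
detections = [13.1, 60.0, 91.5, 92.0]
correct, missed, false_events = score_events(ground_truth, detections)
```

In this made-up clip two alarms match annotated events, one annotated event is missed, and two alarms are false. Because each ground-truth event can be matched only once, a system that alarms twice for the same vehicle is penalised with a false event.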