
HUMAN-CENTERED AUDIO/VIDEO CONTENT ANALYSIS FOR IMPROVED SURVEILLANCE IN METRO STATIONS
EXPO Ferroviaria, March 27, 2012

VANAHEIM project, February 2010 - July 2013

VANAHEIM CONSORTIUM
Collaboration of
Computer vision & audio processing researchers
Multitel asbl (MULT), Belgium (Coordinator)
Institut Dalle Molle d'Intelligence Artificielle Perceptive (IDIAP), Switzerland
Institut National de Recherche en Informatique et en Automatique (INRIA), France
Thales Communications France (TCF), France

Human ethologists (sociologists)


University of Vienna (UNIVIE), Austria

Surveillance system designer


Thales Italia (THALIT), Italy

Public transport operators (metros)


Gruppo Torinese Trasporti (GTT), Italy
Régie Autonome des Transports Parisiens (RATP), France

Large-scale integrating project (IP)


Duration: 42 months (February 2010 - July 2013)
Budget: €5,471,851 (EU contribution: €3,717,998)

OBJECTIVES
Integrate innovative audio/video analysis tools into a CCTV surveillance system for assessment in a real-scale metro environment (Turin & Paris metros)

Scientific objectives:
- Audio/video data stream modeling
- Human behavior analysis
- Human activity recognition (individual, group and crowd/flow of people)
- Collective behavior modeling

Technological objectives:
- Development and deployment of the system
- Technological & scientific assessments

AUTONOMOUS STREAM SELECTION


CURRENT SITUATION

Most CCTV video streams are never watched (e.g. in Torino, 28 monitors for 1,100 cameras). Common situation: monitors in control rooms show empty scenes/spaces, while many other cameras view scenes in which something (even normal) is happening. The probability of watching the right streams at the right time is therefore very limited.
VANAHEIM PROPOSAL: AUTOMATIC SENSOR SELECTION

Mechanisms for selecting relevant/salient audio/video streams in control rooms:
- Models to characterise video stream content
- A trivial scenario when dealing with empty vs. occupied scenes, but a challenging problem when almost all scenes are occupied
- The need for unsupervised modelling is even more explicit for audio streams: mosaicing of data is impossible due to the transparent nature of sound
Autonomous stream selection

Algorithms to model the statistical normality of audio/video streams and to detect abnormal audio/video stream content

Automatic discovery of normal/usual activities (learning stage)
Extraction of object trajectories from videos
Identification of activity patterns from trajectories
Discovery of temporal relations between activity patterns

[Figure: discovered activity patterns: people arriving from the platform, people taking the escalator, people on the escalator, people leaving the escalator, people going to the platform, people arriving from the platform (by taking the stairs)]
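The trajectory-to-pattern step can be sketched with a toy example: resample each extracted trajectory into a fixed-length feature vector, then cluster the vectors so that recurring motion patterns emerge. This is a minimal illustrative stand-in (plain k-means with deterministic farthest-point seeding), not the project's actual models; all function names and parameters here are our own assumptions.

```python
def dist2(u, v):
    """Squared Euclidean distance between two flat feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def resample(traj, n=8):
    """Turn a trajectory (list of (x, y) points) into a fixed-length
    feature vector by linear interpolation at n evenly spaced positions."""
    feats = []
    for k in range(n):
        t = k * (len(traj) - 1) / (n - 1)
        i = min(int(t), len(traj) - 2)
        a = t - i
        (x0, y0), (x1, y1) = traj[i], traj[i + 1]
        feats += [x0 + a * (x1 - x0), y0 + a * (y1 - y0)]
    return feats

def cluster_trajectories(trajs, k=2, iters=20):
    """Group trajectories into k activity patterns with plain k-means,
    seeded deterministically by farthest-point initialization."""
    vecs = [resample(t) for t in trajs]
    centers = [vecs[0]]
    while len(centers) < k:
        centers.append(max(vecs, key=lambda v: min(dist2(v, c) for c in centers)))
    labels = [0] * len(vecs)
    for _ in range(iters):
        for i, v in enumerate(vecs):
            labels[i] = min(range(k), key=lambda c: dist2(v, centers[c]))
        for c in range(k):
            members = [v for v, l in zip(vecs, labels) if l == c]
            if members:  # keep old center if a cluster empties
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels
```

With two obvious motion patterns (left-to-right vs. right-to-left walks), the clustering separates them cleanly; real metro scenes need richer models, which is exactly why the project learns temporal relations between patterns on top of this kind of grouping.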

Automatic learning of normal activities from several hours of multi-camera videos
Activity representation: time represented with a color gradient, beginning in violet/blue, ending in red

[Figure: learned activities: leaving station, entering station (from the right), vending machine (leaving), taking escalator up, left to right, right to left (slow)]

Online recognition of current activities (most probable)

Cycle of activities recognized on-the-fly

Unusual/abnormal activity detection


[Figure: scene activity, likelihood of activities, likelihood of trajectories; abnormality discovered]
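One way to read "likelihood of trajectories" is: fit a statistical model of trajectory features observed during normal operation, then flag samples whose likelihood under that model is low. The diagonal-Gaussian sketch below is our own minimal illustration of this idea, with made-up feature vectors (e.g. mean speed, direction); the project's learned models are richer.

```python
import math
from statistics import mean, pstdev

def fit_gaussians(normal_feats):
    """Fit an independent Gaussian per feature dimension
    from feature vectors observed during normal activity."""
    dims = list(zip(*normal_feats))
    return [(mean(d), pstdev(d) or 1e-6) for d in dims]

def log_likelihood(feat, model):
    """Log-likelihood of one feature vector under the diagonal Gaussian."""
    ll = 0.0
    for x, (mu, sd) in zip(feat, model):
        ll += -0.5 * math.log(2 * math.pi * sd * sd) - (x - mu) ** 2 / (2 * sd * sd)
    return ll

def is_abnormal(feat, model, threshold):
    """An observation is abnormal when its log-likelihood drops below a threshold."""
    return log_likelihood(feat, model) < threshold
```

The threshold trades off missed anomalies against false alarms; in practice it would be tuned on held-out normal data rather than fixed by hand.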

Extension to multi-camera: unusual/abnormal activity detection
Examples: drunk person falling down, loitering groups in the back, unusual trajectory, unusual crossing trajectories, unusual group trajectory
[Figure: abnormality index over time]

Anomalies detected on 8 cameras (210 h): counter-flow, falling people (people gathering), heckling, lost person, person distributing leaflets, cleaning staff emptying a garbage bin, persons making phone calls

Semantic analysis of audio surveillance signals
Recognised audio activity: train (arrival, departure), doors (opening, closing), door alarms, station ambiance

[Figure: raw audio signal and its time-varying spectral representation]

Unsupervised abnormal audio event detection
Positions of known abnormal events (children group synthetically added to raw audio data)

[Figure: time-varying spectral representation of the audio signal and abnormality measure, on raw audio data mixed with the synthetic event]

Known abnormal events detected

Unknown abnormal event detected (beep)
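The audio abnormality measure can be illustrated in the same unsupervised spirit: learn per-band energy statistics of the normal station ambiance, then score how far each incoming spectral frame deviates from that background. Band energies, the threshold, and all names below are illustrative assumptions, not the project's algorithm.

```python
from statistics import mean, pstdev

def background_model(frames):
    """Per-band energy statistics of 'normal' ambiance frames.

    Each frame is a list of band energies (e.g. from a spectrogram column)."""
    bands = list(zip(*frames))
    return [(mean(b), pstdev(b) or 1e-6) for b in bands]

def abnormality(frame, model):
    """Average normalized deviation of a frame's band energies
    from the learned background statistics."""
    return sum(abs(x - mu) / sd for x, (mu, sd) in zip(frame, model)) / len(frame)

def detect(frames, model, threshold=3.0):
    """Indices of frames whose abnormality measure exceeds the threshold."""
    return [i for i, f in enumerate(frames) if abnormality(f, model) > threshold]
```

Because the model describes only the background, any sufficiently loud unfamiliar sound (a children's group, a beep) scores high without ever having been seen in training, which matches the "unknown abnormal event" behavior reported on the slide.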

HUMAN-CENTRED MONITORING
CURRENT SITUATION

Human behaviour modelling is rarely exploited in Video Content Analysis; there is a need for robust and reliable human-centred features
VANAHEIM PROPOSAL: HUMAN-CENTRED MONITORING

Move one step beyond scene understanding based on location features. Investigate 3 levels of human behaviour characterization in surveillance data:
- Individual level: characterize an individual person through his/her activities
- Group level: detect small groups of people and identify interactions within them
- Crowd level: monitor crowds/flows of people (dynamics of collective people flow)
Two applications:
- Event detection for safety/security
- Environmental reporting for situational awareness
[Diagram labels: situational awareness, real-time applications]

HUMAN-CENTRED MONITORING (BEHAVIOR ANNOTATION)


Human behaviour modelling: development of a behaviour catalogue including not only behaviours regarded as interesting by the user, but covering the behaviour repertoire as completely as possible, i.e. a catalogue of all behaviours of all people visible in the video material

HUMAN-CENTRED MONITORING (INDIVIDUAL)


People tracking (tracking by detection)

Tracking by detection: associate detections over time, then fill the gaps between associations
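The "associate detections over time" step can be sketched as greedy intersection-over-union (IoU) matching between each track's last box and the current frame's detections. This is a deliberate simplification of real tracking-by-detection pipelines; the function names and the 0.3 threshold are our own assumptions.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

def associate(tracks, detections, min_iou=0.3):
    """Greedily extend each track (list of boxes) with the detection that
    best overlaps its last box; unmatched detections start new tracks."""
    unused = list(range(len(detections)))
    for track in tracks:
        best, best_iou = None, min_iou
        for j in unused:
            o = iou(track[-1], detections[j])
            if o > best_iou:
                best, best_iou = j, o
        if best is not None:
            track.append(detections[best])
            unused.remove(best)
    for j in unused:
        tracks.append([detections[j]])
    return tracks
```

The subsequent "fill the gaps" step would interpolate boxes across frames where a person was missed by the detector; we omit it here for brevity.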

Body orientation estimation

Body + head pose orientation estimation

[Figure: head pose and body pose estimates, visualized on a 3D circle (50 cm)]

HUMAN-CENTRED MONITORING (GROUP)


People & head detection

Group detection & tracking


Event detection related to:
- Position: group stays in a zone (access zone, waiting zone); group close to/far from equipment/walls
- Trajectory: group stands still, group walks, group runs

- Size: constant size → calm group; medium variation → normal activity level; high variation → lively group
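The calm / normal / lively distinction by size variation can be sketched as thresholds on the coefficient of variation of the group's spatial extent over time. The thresholds below are illustrative assumptions, not the project's calibrated values.

```python
from statistics import mean, pstdev

def activity_level(sizes, calm=0.05, lively=0.15):
    """Classify a group from the relative variation of its size over time.

    sizes: per-frame group extent (e.g. bounding-box area).
    The calm/lively thresholds are illustrative, not calibrated values."""
    cv = pstdev(sizes) / mean(sizes)  # coefficient of variation
    if cv < calm:
        return "calm group"
    if cv < lively:
        return "normal activity level"
    return "lively group"
```

Normalizing by the mean makes the measure independent of how large the group appears in the image, so the same thresholds can serve near and far views.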

HUMAN-CENTRED MONITORING (CROWD/FLOW)


People counting / flow monitoring in escalators

Objectives: cumulative people counting; people flow measurement (pers./min)
Qualitative exploitation: identification of trends (e.g. weekdays vs. weekend days)

Performance evaluation on one station (8 escalators), depending on view type (close/medium/far); correlation ≈ 0.85 for close/medium views:

Name                               Duration   Num. pers.   Flow correlation
DOD Accesso Cernaia (left)         2 h        501          0.43
DOD Accesso Cernaia (left)         9 h        4085         0.63
DOD Accesso Cernaia (right)        2 h        2201         0.64
DOD Accesso Cernaia (left)         30 min.    647          0.67
DOD S M1 Accesso Cernaia (right)   2 h        2426         0.77
DOD S M1 Accesso Cernaia (left)    2 h        497          0.82
DOD Atrio Mezzanino 2              30 min.    178          0.83
BER Atrio Mezzanino 1              30 min.    91           0.83
DOD Atrio Mezzanino 1              30 min.    413          0.85
DOD Via 2 A                        30 min.    386          0.85
DOD Accesso Stazione (right)       30 min.    373          0.88
BER Atrio Mezzanino 2              30 min.    30           0.89
DOD Via 1 C                        30 min.    295          0.95
DOD Via 2 A                        9 h        1305         0.95
DOD Atrio Mezzanino 1              9 h        4127         0.96
DOD Via 1 C                        2 h        810          0.97
DOD Via 1 C                        9 h        4376         0.97
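A "flow correlation" score compares the estimated flow series to a ground-truth count series; a plain Pearson correlation, as sketched below, is the standard way to compute such a number (the exact evaluation protocol used in the project may differ).

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equally sampled flow series
    (e.g. estimated vs. ground-truth persons/minute per time bin)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)  # assumes both series have nonzero variance
```

A value near 1.0 means the estimator tracks the true flow shape well even if its absolute counts are biased, which is why correlation suits the qualitative trend-identification objective above.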

HUMAN-CENTRED MONITORING (CROWD/FLOW)


Occupancy rate at platform

Performance evaluation on different platforms

~15% error in counting/occupancy for mid-crowded scenes

Change point detection


Detect significant changes in crowd density, i.e. fast modifications of platform occupancy, mostly at metro arrivals.

Under-estimation in dense crowds; a density-based approach helps in dense crowd situations. First tests with a simple feature show promising results.
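Change point detection on an occupancy series can be sketched as comparing the means of adjacent sliding windows and flagging large jumps, the simplest way to catch the fast occupancy changes at metro arrivals. Window size and threshold are illustrative; dedicated change-point methods would be used in practice.

```python
def change_points(series, window=5, threshold=5.0):
    """Indices i where the mean occupancy of the `window` samples after i
    differs from the mean of the `window` samples before i by more than
    `threshold` (both parameters are illustrative)."""
    points = []
    for i in range(window, len(series) - window + 1):
        before = sum(series[i - window:i]) / window
        after = sum(series[i:i + window]) / window
        if abs(after - before) > threshold:
            points.append(i)
    return points
```

A sharp step in the series yields a run of flagged indices around the true change; a refinement would keep only the index with the largest before/after gap.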

HUMAN-CENTRED MONITORING (SITUATIONAL REPORTING)


Report positions of people on an infrastructure map, using different algorithms as input:
- Escalator flow monitoring
- Occupancy rate at platform
- Human detector
- Multi-object tracking

LONG-TERM COLLECTIVE BEHAVIOUR BUILDING


CURRENT SITUATION

Transportation terminals are increasingly subject to capacity problems. Managers have expressed a need for analysis of passenger dynamics/behaviours. The bottleneck consists in the high variety/complexity of passenger behaviours.

VANAHEIM PROPOSAL: LONG-TERM COLLECTIVE BEHAVIOUR BUILDING

A system able to identify & characterize structures inherent in collective behaviour: models that can learn, analyze and cluster individual behavioural information. Continuous monitoring of user information: locations, routes, spatio-temporal activities (walking, waiting...), interactions with other passengers and/or equipment, and contextual data (time of day, density of people...)
Planning applications

Goal: estimate trends of large-scale human behaviour at an infrastructure level, e.g. to localize common loitering areas and/or highly frequented aisles, identify traffic patterns in the infrastructure, etc.

[Diagram labels: real-time monitoring applications, collective behaviours building]
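Localizing loitering areas and frequented aisles from long-term position logs can be sketched as a simple spatial histogram: bin every observed position into a grid and rank the cells by visit count. The cell size and function name are illustrative assumptions, not the project's method.

```python
from collections import Counter

def frequented_cells(positions, cell=2.0, top=3):
    """Bin observed (x, y) positions (e.g. in metres on the station map)
    into square grid cells and return the `top` most frequented cells,
    a crude map of loitering areas and busy aisles."""
    counts = Counter((int(x // cell), int(y // cell)) for x, y in positions)
    return counts.most_common(top)
```

Accumulated over weeks and sliced by contextual data (time of day, crowd density), such maps give the kind of infrastructure-level trend the planning applications aim at.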


USER-BOARD
Representatives of CCTV end-users (security/safety operators, public infrastructure managers...), surveillance system designers, manufacturers and suppliers, and Video Content Analysis (VCA) solution providers

Register at www.vanaheim-project.eu

SURVEY ON AUDIO & VIDEO CONTENT ANALYSIS FOR TRANSPORT APPLICATIONS

TECHNICAL VISIT - PROTOTYPE FOR VANAHEIM PROJECT AT METROPOLITANA AUTOMATICA DI TORINO

QUESTIONS?

The research leading to these results has received funding from the European Community's Seventh Framework Programme FP7/2007-2013, Challenge 2 (Cognitive Systems, Interaction, Robotics), under grant agreement n° 248907-VANAHEIM.

www.vanaheim-project.eu carincotte@multitel.be forchino.a@gtt.to.it