Virtual Observers in a Mobile Surveillance System

Stewart Greenhill and Svetha Venkatesh
Department of Computing, Curtin University of Technology

stewartg@cs.curtin.edu.au, svetha@cs.curtin.edu.au

ABSTRACT
Conventional wide-area video surveillance systems use a network of fixed cameras positioned close to locations of interest. We describe an alternative and flexible approach to wide-area surveillance based on observation streams collected from mobile cameras mounted on buses. We allow a "virtual observer" to be placed anywhere within the space covered by the sensor network, and reconstruct the scene at these arbitrary points. Use of such imagery is challenging because mobile cameras have variable position and orientation, and sample a large spatial area but at low temporal resolution. Additionally, the views of any particular place are distributed across many different video streams. Addressing this problem, we present a system in which views from an arbitrary perspective can be constructed by indexing, organising, and transforming images collected from multiple streams acquired from a network of mobile cameras. Our system supports retrieval of raw images based on constraints of space, time, and geometry (e.g. visibility of landmarks). It also allows the synthesis of wide-angle panoramic views in situations where the camera motion produces suitable sampling of the scene, and provides metaphors for query and presentation that overcome the complexity of the data.

Categories and Subject Descriptors
H.3.1 [Information Storage and Retrieval]: Content Analysis and Indexing; I.4.5 [Image Processing and Computer Vision]: Reconstruction

General Terms
Algorithms, Measurement

Keywords
Observation systems, mobile surveillance, video indexing, scene reconstruction, panorama, virtual observer, spatial query, visibility query

1. INTRODUCTION


Conventional wide-area video surveillance systems use a network of fixed cameras positioned close to locations of interest. With the inclusion of forward-facing cameras on buses (currently installed systems include the London and Los Angeles transport systems), government agencies may seek to integrate this roaming video surveillance network into the high-level strategic surveillance and security needs of the city. Mobile surveillance on buses assists in the battle against street crime and antisocial behaviour, in the investigation of traffic accidents, in investigating the activities of people, and in numerous other events that impact government operations. For example, in the London bombings of July 7, 2005, law enforcement authorities wanted to quickly access the footage of an area around the bombing, and the only footage available was from the bus fleet. There are no systems currently able to effectively address such issues in wide-area investigations. A surveillance initiative involving a fleet of buses in a city such as London delivers a powerful network of over 8,300 mobile security cameras traversing the city's busiest regions, up to 24 hours a day, 7 days a week, significantly complementing and enhancing the city's static surveillance infrastructure.

This paper explores an alternative and more flexible approach to exploit this new and exciting infrastructure by creating the ability to have unlimited numbers of "virtual observers" placed at chosen positions to monitor the scene, spatially and over time. Surveillance along transport routes is significant because they generally link areas where people congregate, and thus provide coverage of important public areas such as central business districts. Our work describes a new approach to such wide-area surveillance around transport routes using GPS and frontal camera data acquired from a transport network, and is an example of an observation system [14].

Although mobile cameras are used in specialised applications such as aerial survey (e.g. UAVs), or observation of harsh or dangerous environments (e.g. undersea robots), there appear to be no current attempts to use networks of mobile cameras to observe the places where people typically live and interact. The use of mobile cameras presents many opportunities, but also many challenges: (a) mobile cameras have both variable location and orientation; (b) mobile cameras sample a large spatial area at low temporal resolution; (c) the views of a particular place are distributed across many different video streams acquired by different sensors on different mobile units; and (d) the cameras are continually moving, so there is no stable background image, which makes it difficult to do motion-based segmentation.


The scene is sampled infrequently compared with static-camera surveillance, resulting in a sparse sampling of the environment. Along a bus route a place is only imaged when a bus is in the vicinity, so sampling times depend on the frequency of buses on that route. Images of a place are taken by cameras mounted on different vehicles, so there is high variability between the images that are available for a particular place and time. There may be significant differences due to sensor response, lighting, and perspective. In addition to the challenges of interpreting the data, there is the important issue of how to design systems that deal with large numbers (e.g. 1000 or 10000) of real-time streams. In the data-management field new systems are being developed to deal with the demands of continuous stream-based processing, but these do not generally deal with complex data such as images.

To exploit surveillance across this wide-area network, we develop the VIRTOBS system, and focus on the following design issues:

• Data Access. We provide effective data access by making video frames accessible via spatial, temporal, or joint spatio-temporal constraints. Each frame of video is associated with a time, a spatial position, and an orientation. This is done by interpolating a position and heading from the GPS track based on the frame sampling time, and, for each camera, by incorporating its orientation relative to the vehicle. This enables us to determine what a mobile camera sees at any given time. Data is collected in an ad-hoc way, including both video and trajectory information, so it is important that data be organised for both temporal and spatial access.

• View Synthesis. View synthesis involves the retrieval and fusion of images for a given query. Where the desired field of view is wider than the camera view, we combine multiple images taken at different times to give a wide-angle perspective. Where possible, the system may build composite images by combining observations taken at different times. In a static-camera application this is a trivial problem, but in a mobile environment this requires indexing, retrieval and registration of images from multiple streams.

• Visual Query Design. Most queries are based on constraints. Queries are based on a multidimensional region of interest, which includes spatial, temporal, and orientation constraints. These must be specified in a natural way, seeking to provide natural metaphors that the user can use to specify constraints.

Our approach to data management depends partly on features of the existing bus infrastructure. We use a network of buses from which GPS data and the front-camera video can be extracted. Each bus has 7 cameras which at typical sampling rates generate approximately 225Kb of data per second. A bus has enough storage for 8 to 9 days of data. Wireless networks are used to retrieve data when buses return to their depot, but there is only sufficient bandwidth to retrieve about 5 percent of the collected data. Therefore it is important that the system collect data according to demand. Virtual observers can act as standing queries that regulate the collection of data from the network.

Queries supported by the system include:

• Construction of the view from a virtual observer. A virtual observer represents an imaginary camera that can be placed anywhere in the space covered by the sensor network. Each virtual observer is associated with a position, an orientation, and a field of view.
• Selection of images based on multiple spatial, temporal and geometric constraints. For example, images may be selected by choosing a position from a map, or by a temporal constraint based on absolute time or time-of-day.
• Selection of video sequences based on spatial location, orientation and field of view.
• Retrieval of views of a particular object or landmark.
• Synthesis of panoramic images.
• Synthesis of "time-lapse" video, showing how a view of a place changes over time.

This poses several challenges in the context of mobile surveillance. Many query operations need to determine views with respect to a particular place. To generate wide-angle (panorama) and time-lapse views we need to make images or video sequences that combine images taken from different cameras at different times, and assemble these sequences into a time-ordered composite sequence. To address these issues we propose the following approach. For image selection, we use constraints on position, radius, heading, and rate-of-change of heading to identify candidate image sequences. Given a source or destination position and a range of view angles, the system can retrieve matching images based on simple visibility constraints. For image registration and blending, we use the techniques of Brown and Lowe [5]. The Scale Invariant Feature Transform (SIFT) [17] identifies feature points in images that can be used to compute the relative registration of a sequence of images ("bundle adjustment") and then register images with respect to a common reference frame. Orientation derived from GPS data is not precise enough for image registration. For simple image retrieval tasks, differences between images may not be a problem. An advantage of using SIFT features in this application is that they are invariant to many of the differences that arise due to the mobile cameras (i.e. changes in perspective and lighting). Blending is done at multiple spatial scales to reduce visual artifacts at the joins between images.

Lastly, VIRTOBS includes a visual query system where a user can place virtual observers on a map and determine what is visible for that observer, which allows an operator to see a view of a particular place over time. The system then constructs the view for a virtual observer by indexing, organising, and transforming images collected from the mobile camera network. Observation parameters can be controlled by manipulating visual markers, and the resulting observer view is automatically updated from the available observations in the database. This allows the user to deal with the complexity of the underlying data, in which views of interest may be distributed across time, space and orientation.

The novelty of our system lies in being the first work to address the issue of wide-area surveillance using a transport network and the multi-modal data streams acquired from it. The significance of the approach lies in this untapped application: wide-area surveillance along transport routes. This is useful since more than 80 percent of crime is committed within 5 km of a transport route. Given that most transport fleets have frontal cameras and GPS, these streams can be used effectively for law enforcement. The underlying design allows effective retrieval of frames in a spatio-temporal context from arbitrary perspectives. We also introduce novel query and presentation metaphors to make this complex data useful and usable. Our work demonstrates the power of the paradigm, whilst identifying open problems for the community using this new infrastructure. We demonstrate the efficacy of the system using two data sets, one acquired from a car, and the other using data extracted from a real bus network. The design and solutions presented in this paper form the foundation of this open and challenging area for multimedia.

The structure of this paper is as follows. Section 2 describes related work. Section 3 describes the data model, including details of the bus network, and some query operators related to visibility. Section 4 describes the implementation of VIRTOBS. Section 5 describes our experiments with two data sets: one collected from a car, the other from a bus network. This includes sample output for queries: we show the results of querying and observing wide areas using virtual observers. Section 6 discusses some open problems identified from our work.

2. RELATED WORK
The problem of mobile surveillance is related to several areas of active research, which are described in this section. Video surveillance systems are increasing in their ability to automate tasks like object tracking and event detection (see 2.1). In the field of data management, stream processing has become an important issue, and new systems support continuous queries over unbounded data streams (see 2.2). In the spatial-database area, there is considerable interest in data models and efficient queries for moving objects (see 2.3). Finally, view synthesis requires automatic alignment and stitching of images; this has become possible using modern feature-based image matching techniques based on SIFT (see 2.4).

2.1 Observation Systems
Video surveillance is an area of active research. While many of today's video surveillance systems simply act as large-scale video recorders, the next generation of systems is being developed to automate many of the tasks that currently require the attention of a human operator. The main areas of research are video-based detection and tracking, video-based person identification, and the design of large-scale systems. Key components of these new systems are algorithms for object detection, object classification, two- and three-dimensional object tracking, object structure analysis and movement pattern analysis [12]. A key challenge in this area is to observe the scene at the correct scales. For situation awareness, systems must have a wide area of observation, but for identifying and tracking people we must observe fine details like faces. Active vision techniques allow systems to focus attention on significant objects using pan-tilt-zoom (PTZ) cameras. Most wide-area surveillance systems use static fixed cameras or PTZ cameras. There have been no attempts to do wide-area surveillance with mobile camera networks. Many of these problems are the same in mobile surveillance, but camera motion imposes an extra level of difficulty. Indeed, many of the algorithms on which these systems rely (e.g. background subtraction for object segmentation) only work when there is no camera motion.

An emerging paradigm in this area is that of observation systems [14]. Observation systems observe people or objects in the environment, collect data from multiple sensors, analyse and correlate data from multiple sources to derive a record of meaningful events, and provide tools to query and present information about activities in the environment. Observation systems exist in many application areas (e.g. surveillance, traffic monitoring, population research, situation awareness, marketing) but share common functionality and design principles. Observation systems differ from traditional data-base applications in several ways: (a) data comes from external sources (e.g. sensors) and the data-base must actively detect events rather than simply respond to queries, requiring trigger processing beyond the capability of traditional DBMS systems; (b) data management exists over a history of events, not just its current state; (c) applications are event-oriented; (d) stream data may be lost, stale, or intentionally omitted, so queries may have only approximate answers; and (e) real-time response may be important [3]. The design of observation systems includes issues of real-time processing, continuous queries over stream data, and the design of large-scale systems.

2.2 Stream Query Systems
In the field of data management, stream processing has become an important issue [10, 4]. These requirements have led to the development of generalised stream-management systems such as Aurora [3], Borealis [2], and TelegraphCQ [7], which support continuous queries over unbounded data streams. These systems are designed to handle a continuous inflow of data, and include the ability to handle continuous queries which continuously provide new results as they become available. The emphasis of these systems tends to be on real-time scalar data rather than media data such as video. Models have recently been developed for stream queries over media data [16, 19]. Some queries in VIRTOBS depend on efficient, continuous queries over stream data; this issue is also examined in many of the new stream-processing data models.

2.3 Moving Object Databases
Moving object data-bases are an emerging area of interest in spatial data-base systems. The aim is to develop data models and query languages that allow the modelling of objects with time-dependent position [15]. Important abstractions are the moving point (e.g. vehicles, people, or animals) and the moving region (e.g. forest fires, hurricanes, oil spills at sea). Mobile surveillance systems need to be able to deal with large numbers of moving objects. An important operation when dealing with moving objects is to efficiently index object trajectories. For unconstrained motion in two dimensions (e.g. for ships moving at sea), various approaches exist based on grid decomposition, hashing, or hierarchical spatial decomposition (e.g. R-Trees) [9, 11]. Where vehicles move on a road network, specialised data structures can be used to reduce the dimensionality of the search problem (e.g. the FNR-Tree and MON-Tree) [8].

2.4 Image Stitching
Image alignment and stitching algorithms have long been used to create high-resolution images out of mosaics of smaller images. The earliest applications include the production of maps from aerial photographs and satellite images. Recently, these algorithms have been used in hand-held imaging devices such as camcorders and digital cameras. In the context of wide-area surveillance, image stitching (or "mosaicing") is important because it can be used to improve the effective resolution of a camera. Pan-tilt-zoom cameras can be used to scan a scene at different scale factors. By stitching many images collected at a high "zoom" factor, a high-resolution virtual field of view can be created.

Image stitching requires several steps [21]. Firstly, a motion model must be determined, which relates pixel coordinates between images. Alignment of pairs of images is computed, using direct pixel to pixel comparison, or using feature-based techniques. Direct alignment methods rely on cross-correlation of images, and tend not to work well in the presence of rotation or foreshortening. Modern feature detectors can be quite robust in the presence of certain amounts of affine transformation. Of particular note is David Lowe's SIFT (Scale-Invariant Feature Transform) [17]. In a recent survey of a number of feature descriptors [18], SIFT was found to be the most robust under image rotations, scale-changes, affine transformation, and illumination changes. Next, a globally consistent alignment (or "bundle adjustment") is computed for the overlapping images, and a compositing surface is chosen onto which each of the images is mapped according to its computed alignment. The mapped images are then blended to produce the final image. The blending algorithm needs to minimise visual artifacts at the joins between images, and needs to care for differences in exposure between the source images.

Brown and Lowe [5] describe an automatic panorama stitcher based on SIFT feature matching. This is one of the first implementations that can automatically recognise multiple panoramas from an input set of images. A commercial version of this algorithm, Autostitch [6], is used under license in several photographic applications. Heikkila and Pietikainen [13] describe a system that builds image mosaics from sequences of video taken by a camera that scans a scene. The implementation is similar to [5], but with a few modifications to deal with large numbers of images; Gaussian blending is used for compositing images.

Panoramic stitching is typically applied to static scenes. With a dynamic scene, different parts of the scene are seen at different times, so it is not possible to reconstruct a true panoramic video by scanning the scene in this way. However, by stacking the image frames in a three-dimensional space-time volume, it is possible to generate movies by sweeping this volume with a time-front surface. Different interpretations of the original scene can be derived by manipulating the time-front surface. This process is described by [20] as a "dynamic mosaic" or "Dynamosaic".

3. DATA AND QUERY MODEL
VIRTOBS manages data collected from mobile cameras. As vehicles move around the environment their trajectory is recorded via GPS. GPS positions for vehicles are recorded every second. Video normally is sampled at a higher frame rate (e.g. 5 frames per second), so it is necessary to interpolate between GPS position fixes in order to obtain accurate image positions; currently, linear interpolation is used. Each camera attached to a vehicle has a known orientation relative to that vehicle, and in many cases there may be more than one external camera per vehicle.

At the base level the system records raw GPS streams and video streams. Within each stream, samples are ordered by time. Video data is stored as JPEG or MJPEG (motion JPEG) files. In this application it is important not to use motion-based coding: motion coding tends to reduce the spatial resolution but, more importantly, interpolated video is inadmissible as evidence in many legal jurisdictions.

A track is an association between a video stream and a GPS trajectory. Within a track, data is indexed by time, although the time-bases may be different; therefore, the track association includes calibration between the video and GPS timebases. Given a track and a time, we can easily determine the associated spatial location and the associated frame of video. This allows the system to determine a position and heading for each image in the video stream.

Formally, let V be a track. We define trajectory(V) to be the trajectory associated with track V. The trajectory is a function that maps any point in time to a point in space using linear interpolation on the recorded GPS track. The location of the vehicle at any point t in time is therefore trajectory(V)(t), or simply trajectory(V, t) for short. Also associated with V is a video sequence. The video sequence vid(V) is a sequence of N images Ii taken at discrete times ti: [⟨I0, t0⟩, ..., ⟨IN-1, tN-1⟩]. In many cases, vid(V) will be sampled periodically, but we do not require it to be so. We define an observation to be a tuple ⟨I, t⟩, where I is an observed image, and t is the time at which the observation occurred. We can treat vid(V) as a function that maps a time to an observation: we define vid(V, t) to return the observation ⟨I, t'⟩ such that t' is closest to t over all observations.

A track observation is a tuple ⟨V, t⟩, where V is a track, and t is a time. Associated with each track observation is a unique observation (a time-stamped image) vid(V, t). A track segment is a tuple ⟨V, t1, t2⟩ where t1 and t2 are times, and t1 ≤ t2. Associated with each track segment is an observation sequence (a time-stamped video segment) [vid(V, t1), ..., vid(V, t2)].

3.1 Geometric Queries
At the lowest level, spatial queries are implemented using geometric operators on tracks. Track observations and track segments are returned by geometric queries; results from these queries are returned as times, or intervals of times. This section describes three query operators implemented by VIRTOBS: proxobs, viewIn, and viewOut. Users define points or regions of interest through the visualisation system, and the system interactively produces views of those regions based on these queries.

The simplest queries map a point in space to a track observation using recorded vehicle trajectories:

Definition 1 (proxobs: Closest observation). Let P be a point in space, and let T be a set of tracks. We define the function proxobs(P, T) to return the track observation ⟨V, t⟩, V ∈ T, such that the distance from trajectory(V, t) to P is minimised over all times and tracks.
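To make these definitions concrete, here is a minimal sketch of trajectory(V, t) and proxobs in Python. It is an illustration only, not the VIRTOBS implementation (which is written in Java); the GpsFix and Track structures, the planar coordinates, and the brute-force search over frame times are simplifying assumptions.

```python
import bisect
from dataclasses import dataclass
from math import hypot

@dataclass
class GpsFix:            # one GPS sample: time (s), planar position (m), heading (deg)
    t: float
    x: float
    y: float
    heading: float

@dataclass
class Track:             # a track: GPS fixes ordered by time, plus video frame times
    fixes: list          # list[GpsFix]
    frame_times: list    # list[float]

def trajectory(track, t):
    """trajectory(V, t): position at time t by linear interpolation between GPS fixes."""
    fixes = track.fixes
    times = [f.t for f in fixes]
    i = bisect.bisect_left(times, t)
    if i <= 0:
        return (fixes[0].x, fixes[0].y)
    if i >= len(fixes):
        return (fixes[-1].x, fixes[-1].y)
    a, b = fixes[i - 1], fixes[i]
    w = (t - a.t) / (b.t - a.t)
    return (a.x + w * (b.x - a.x), a.y + w * (b.y - a.y))

def proxobs(P, tracks):
    """Definition 1: the track observation (V, t) whose interpolated position
    is closest to P.  Frame times stand in for 'all times' for simplicity."""
    best = None
    for track in tracks:
        for t in track.frame_times:
            x, y = trajectory(track, t)
            d = hypot(x - P[0], y - P[1])
            if best is None or d < best[0]:
                best = (d, track, t)
    return (best[1], best[2]) if best else None
```

A production system would index positions spatially rather than scanning every observation, but the shape of the query is the same.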

The proxobs operator is computed by finding the closest point on each trajectory to P, and choosing the trajectory that minimises this distance.

Visibility queries are more complex, being based on a set of spatial constraints. We use visibility constraints to reconstruct the view at a particular point in space. A simple visibility constraint is a tuple ⟨P, R, D, F⟩, where P is a point in space, R is a visibility radius, D is a view direction, and F is a field of view. A simple visibility constraint defines an acceptance area and view range: the area is a circle of radius R centered at P, and the view range is the range of directions between D − F and D + F. This is depicted in Figure 1.

Figure 1: A simple visibility constraint defines a circular region of space with radius R centered at P, and a range of view directions D − F to D + F.

Visibility constraints are used by view operators to select observations based on visibility. The two fundamental visibility operators are viewOut and viewIn. Both operators use simple visibility constraints, but interpret the constraints differently, as shown in Figure 2. For the viewOut operator, the view target is generally outside the defined area, although its location is unknown to the system; the angular constraint is on the direction from the observer toward the target. For the viewIn operator, the view target is the center of the defined area, and the constraint is on the direction from the target to the observer. In both cases, the observer is located inside the view area. The additional parameter f is related to the camera field of view and constrains the angle between the target and the trajectory heading.

Figure 2: Interpretation of observer-target constraints for view operators.

Definition 2 (viewOut: View from a place). Formally, let C = ⟨P, R, D, F⟩ be a visibility constraint, and T be a set of tracks. We define the function viewOut(T, C) to be the set of track segments ⟨V, t1, t2⟩ where trajectory(V, t) is entirely contained within the circle of radius R centered at P, and the heading at trajectory(V, t) is between D − F and D + F, for t1 ≤ t ≤ t2.

Definition 3 (viewIn: View toward a place). We define the function viewIn(T, C, f) to be the set of track segments ⟨V, t1, t2⟩ where trajectory(V, t) is entirely contained within the circle of radius R centered at P, and the heading of the line between P and trajectory(V, t) is between D − F and D + F and is within the camera field of view f of the trajectory heading at t, for t1 ≤ t ≤ t2.

These view operators can be rapidly computed from available trajectory information without reference to the associated video data. The operators produce a set of track segments that can be used in various ways by the system, as described in the following sections. Sets of track segments can be used to construct "pseudo time-lines" for navigation of video data (see 3.3). Track segments can also be used as observations for panorama generation (see 3.4).
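Continuing the hypothetical structures from the previous sketch, the two view operators reduce to a pair of geometric tests per frame time, followed by grouping consecutive accepted times into track segments. The compass-bearing convention and the assumption that the camera axis coincides with the vehicle heading are illustrative choices, not part of the paper's definitions.

```python
from math import atan2, degrees, hypot

def heading_at(track, t):
    """Trajectory heading at time t, degrees clockwise from north, taken from the
    displacement between the bracketing GPS fixes (recorded headings could be used)."""
    for a, b in zip(track.fixes, track.fixes[1:]):
        if a.t <= t <= b.t:
            return degrees(atan2(b.x - a.x, b.y - a.y)) % 360.0
    return track.fixes[-1].heading

def ang_diff(a, b):
    """Smallest absolute difference between two headings, in degrees."""
    return abs((a - b + 180.0) % 360.0 - 180.0)

def _segments(track, flagged_times):
    """Group consecutive accepted frame times into (track, t1, t2) segments."""
    segs, start, prev = [], None, None
    for t, ok in flagged_times:
        if ok and start is None:
            start = t
        elif not ok and start is not None:
            segs.append((track, start, prev))
            start = None
        prev = t
    if start is not None:
        segs.append((track, start, prev))
    return segs

def view_out(tracks, C):
    """Definition 2 (viewOut): inside circle (P, R), trajectory heading within D +/- F."""
    P, R, D, F = C
    result = []
    for track in tracks:
        flags = []
        for t in track.frame_times:
            x, y = trajectory(track, t)
            inside = hypot(x - P[0], y - P[1]) <= R
            flags.append((t, inside and ang_diff(heading_at(track, t), D) <= F))
        result += _segments(track, flags)
    return result

def view_in(tracks, C, f):
    """Definition 3 (viewIn): inside circle (P, R), bearing from P to the vehicle
    within D +/- F, and P within the camera field of view f of the heading."""
    P, R, D, F = C
    result = []
    for track in tracks:
        flags = []
        for t in track.frame_times:
            x, y = trajectory(track, t)
            inside = hypot(x - P[0], y - P[1]) <= R
            bearing_from_target = degrees(atan2(x - P[0], y - P[1])) % 360.0
            bearing_to_target = (bearing_from_target + 180.0) % 360.0
            flags.append((t, inside
                             and ang_diff(bearing_from_target, D) <= F
                             and ang_diff(bearing_to_target, heading_at(track, t)) <= f))
        result += _segments(track, flags)
    return result
```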

3.2 Virtual Observers
VIRTOBS includes navigation of available data based on map displays. These are layered spatial displays that show trajectories for one or more tracks, marker objects (including virtual observers) placed on the map by the operator, and geographic meta-data. The trajectories on the map indicate the paths of vehicles. The system supports the use of ECW (Enhanced Compressed Wavelet) imagery as display layers; this can be used to show street maps, or aerial images associated with a spatial region. Spatial meta-data can be imported from geographic information systems. Markers may be named and used for navigation in the map display.

Virtual observers use view operators to create views of places. A virtual observer combines a view operator with a simple visibility constraint ⟨P, R, D, F⟩, and is created interactively by placing a marker on a map display. Figure 3 shows the schematic representation of a virtual observer on the map display.

Figure 3: Schematic view of a virtual observer placed on a map. The background is an ECW image of the street directory. Trajectories on the map indicate the paths of vehicles.

The acceptance area is shown as a coloured circle of radius R centered at P. The view direction is shown by an arrow with direction D. The field of view is indicated using a shaded arc from heading D − F to D + F. These parameters may be varied by dragging points in the image (e.g. the arrow-head to rotate the view direction D, the circle interior to move the center P, the circle boundary to change R, and the arc boundaries to change the field of view F).

A virtual observer is selected using the mouse, or by choosing its name from a list. When selected, the system executes the associated view query, searching for trajectories matching the associated constraints. T is either the set of all tracks, or a particular selected track. The result is a set of track segments where each image has a position within the defined area and a heading within the constraint of the query operator. The associated track segments are bounded by the times that the camera is resident within the area. Virtual observers thus act as filters that select sets of track segments.

In addition to virtual observers, map displays implement a "tracking" mode in which the user can move a cursor in the display to select the closest matching observation. Given a point P, the system computes ⟨V, t⟩ = proxobs(T, P) and displays the associated image vid(V, t). Tracking can be used to generate a kind of "virtual drive" effect, where a video sequence can be generated for an arbitrary trajectory through a map.

3.3 Pseudo time-lines
It is important to be able to display and navigate the associated video data. Figure 4 shows a scenario where three tracks, recorded on different days, traverse an observation area.

Figure 4: Pseudo time-line generated from multiple track segments. Three traversals of the observation region (A–A', B–B', C–C') produce the corresponding time-line segments AA', BB', CC'.

Showing the segment times on a linear time-line would not be very useful, since the durations of the track segments are short compared to the times between the segments. Instead, the system displays a pseudo-timeline with just the durations of the segments. Blocks in the time-line indicate the order and relative duration of the track segments, ordered according to their start time, but not their absolute time. This clearly shows that the segments of video are discontinuous, but allows them to be navigated as a continuous sequence: the resulting time-line can be used to continuously navigate the set of non-continuous video segments by moving the cursor across the pseudo-time line.

Figure 5 shows example output from the system. In this instance, there are five video segments in the data-base that match the view constraints. The relative durations are indicated by the different segment lengths. In the longer segment (for which the image is shown) the vehicle had to stop to wait for an oncoming car.

Figure 5: Observation view showing pseudo-timeline and associated image.

The system implements a space-time cursor which allows the user to see the correspondence between points in the spatial map display and the time-line display. A unique point in space trajectory(V, t) is associated with any time t1 ≤ t ≤ t2 selected from a track segment ⟨V, t1, t2⟩. When selecting points in the time-line, the system highlights the corresponding location in the map. Alternatively, depending on cursor modifiers, the user can select points on tracks in the spatial display and see the corresponding images.

3.4 Panorama Generation
When a vehicle turns, the forward-facing camera pans across the scene, producing a sequence of images which can be combined to form a composite, wide-angle image. When a virtual observer is placed at an intersection or turning in the road, the system can identify candidate track segments by looking for regions where the rate-of-change of heading is high (10 degrees per second seems to give good results). The matching track segments define a sequence of images suitable for stitching. Example panoramas are shown in Figures 7, 8 and 10 in Section 5.

VIRTOBS uses the method of Brown and Lowe [5] to build panoramas from a set of images. This involves several steps. Feature points are identified using the SIFT [17] key-point detector: SIFT features are calculated for each input image, and each key-point is associated with a position, a scale, and an orientation. The k nearest-neighbours are found for each feature, and for each image the algorithm considers the m images that have the greatest number of feature matches to the current image. RANSAC is used to select a set of inliers that are compatible with a homography between the images, and a probabilistic model is then used to verify image matches. Bundle adjustment is then used to solve for all of the camera parameters jointly. Once the camera parameters have been estimated for each image, the images can be rendered into a common reference frame, and multi-band blending is used to combine them.
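The sketch below illustrates the turn-detection heuristic and the stitching step together, reusing the helpers from the earlier sketches. VIRTOBS drives Autostitch for the stitching step; here OpenCV's generic stitcher, which follows the same feature-match, bundle-adjust and blend pattern, stands in for it. The load_frame helper and the default threshold are assumptions made for illustration.

```python
import cv2  # OpenCV's stitcher: feature matching, bundle adjustment, blending

def turn_segments(track, rate_threshold=10.0):
    """Candidate panorama segments: runs of frame times where the rate of change
    of heading exceeds the threshold (10 degrees per second, as in the text)."""
    times = track.frame_times
    flags = []
    for t0, t1 in zip(times, times[1:]):
        rate = ang_diff(heading_at(track, t1), heading_at(track, t0)) / max(t1 - t0, 1e-6)
        flags.append((t1, rate >= rate_threshold))
    return _segments(track, flags)

def stitch_segment(track, t1, t2, load_frame):
    """Stitch the frames of one track segment into a wide-angle composite.
    load_frame(track, t) is assumed to return a decoded BGR image."""
    images = [load_frame(track, t) for t in track.frame_times if t1 <= t <= t2]
    stitcher = cv2.Stitcher_create(cv2.Stitcher_PANORAMA)
    status, panorama = stitcher.stitch(images)
    return panorama if status == cv2.Stitcher_OK else None

# Example: build a panorama for every sharp turn on a track.
# panoramas = [stitch_segment(track, a, b, load_frame)
#              for (_, a, b) in turn_segments(track)]
```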

4. IMPLEMENTATION
VIRTOBS is implemented in Java. There are several main parts to the implementation. A low-level stream-based storage management system handles video and GPS data, which are stored on disk and indexed by time. At a higher level, a track management system relates video streams, camera parameters and trajectories; this permits retrieval based on spatial constraints such as proximity and visibility. We use Java Swing components for the user interface and visualisation. Third-party components are used to render ECW images, and media I/O is done using either core Java classes or QuickTime APIs. VIRTOBS uses Autostitch [6] to implement panorama construction; the demonstration version of Autostitch is called automatically as part of our visualisation process. Although designed for photographic work, it also works well for images taken from mobile video cameras, since SIFT features are robust to small amounts of affine transformation.

4.1 Data Sets
We used two data sets to evaluate the system. The first data set ("car") was collected in a regular passenger vehicle. A DV camera was mounted on a tripod in the passenger seat. Video was recorded in colour at 5 frames per second, 320x200 resolution, to motion JPEG video files. Raw NMEA GPS log files were recorded for the same time period, and an offset (video time relative to GPS time) was determined by interactively aligning the video and trajectory in the application. The second data set ("bus") was collected from a commercial MDR (Mobile Digital Recorder) bus surveillance system developed by DTI [1]. This data was recorded at 1 frame per second, 384x288 resolution, as JPEG images. The data is stored in a proprietary format which includes time-stamped images for multiple camera channels as well as GPS positions; this was then extracted to standard formats which are used by our system.

There are some significant differences between the two sets. The car data set has a camera with high quality optics, a high sampling rate, and highly linear images (i.e. no distortion). The bus data set has a lower sampling rate and significant spherical distortion due to a wide-angle lens. In addition, the bus camera is mounted near the roof of the bus, and its view out the windscreen is partially obstructed by a strip of tinting that produces colour distortion.

4.2 Bus Network
Our trials used data collected from the existing bus network. Some features of this network are outlined below. Each bus has 7 cameras that record 24-bit colour images at 384x288 resolution. The global sampling rate is around 15 frames per second; this is shared between cameras as required, giving around 2 images per second for each camera. For external cameras, the sampling rate can be increased by reducing the rate for others. Using JPEG compression, a typical image is around 15Kb, giving an overall data rate of approximately 225Kb per second. The average operational time is 12 to 15 hours per day, around 85 hours per week, resulting in about 67Gb of data per week. Each bus is fitted with 80Gb of storage, so images can be retained for 8 to 9 days.

When buses return to their depot, data is downloaded via wireless LAN. The wireless link is 802.11g but despite the 54Mbps bandwidth, the effective throughput is about 15–20Mbps. Each depot has about 100 buses, and typically they all converge around the same time, outside of "rush hours", which leaves about 8 to 10 hours per day for downloads. This leaves in the worst case around 540Mb of data per bus per day, which is sufficient to retrieve about 5 percent of the video data. Thus, it is critical that the system is selective about what data is retrieved and what data is discarded. Rules must be used to determine what data needs to be systematically recorded; given the constraints of the sensor network, it is important that the system collect data based on demand, and these constraints could be based on desired spatio-temporal resolution at different places and times. Virtual observers provide another mechanism for regulating data collection. Each observer indicates an area of interest that may be stable over long periods of time, and data around these points should always be collected at high resolution in time and space.
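A back-of-envelope check of the data budget quoted above, treating KB and MB as decimal units and taking the worst-case figures (15 Mbit/s effective throughput over an 8-hour window shared by 100 buses):

```python
# Rough data-budget check for one bus, using the figures quoted above.
RATE_KB_S = 225              # combined data rate of the 7 cameras
HOURS_PER_WEEK = 85          # quoted average operational time
STORAGE_GB = 80              # on-board disk

per_week_gb = RATE_KB_S * 3600 * HOURS_PER_WEEK / 1e6   # ~69 GB/week ("about 67Gb")
per_day_gb = per_week_gb / 7                             # ~9.8 GB/day
retention_days = STORAGE_GB / per_day_gb                 # ~8 days ("8 to 9 days")

# Worst-case depot download: 15 Mbit/s effective, 8 h window, shared by 100 buses.
download_mb_per_bus = 15 / 8 * 3600 * 8 / 100            # ~540 MB per bus per day
retrievable_fraction = download_mb_per_bus / (per_day_gb * 1000)   # ~0.05 (about 5%)

print(per_week_gb, retention_days, download_mb_per_bus, retrievable_fraction)
```

These figures reproduce the roughly 67Gb per week, 8 to 9 day retention, 540Mb per-bus download budget, and approximately 5 percent retrievable fraction quoted in the text, which is what motivates demand-driven collection.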

5. EXPERIMENTS AND DISCUSSION
We evaluated VIRTOBS on the above two data sets to determine how well the view operators work, and under what circumstances panoramas can successfully be generated. Initial tests show results for the car images. Figure 6 shows the placement of two observers on a map.

Figure 6: Placement of observers for car panorama experiments.

The top observer (labelled "CnrBrandTownsing") is placed at a corner close to a road-works site. The bottom observer (labelled "SouthEntrance") covers an intersection with views in many directions. We explored the generation of panoramas at these sites.

5.1 Time-Lapse Panorama
A virtual observer can be used to select a sequence of images at a particular place, showing how the scene changes over time. Figure 7 shows several panoramic views generated automatically from the "SouthEntrance" virtual observer using the viewOut operator.

Figure 7: Three panoramas generated for virtual observer "CnrBrandTownsing" shown in Figure 6.

Each panorama corresponds to a separate traversal of an intersection, and is based on 20 to 30 frames taken over roughly 5 seconds. The virtual observer is located close to the site of road works. The first image shows the scene before the road work starts. Subsequent images show the changes as works progress: the arrival of trucks in the second, and the erection of portable barriers in the next. The panoramas are not completely linear in size, since the turn involves some forward motion as well as a rotation. There are also small variations in the shape of the resulting image, due to differences in the original trajectories.

5.2 Temporally Non-Continuous Panorama
The previous panoramas are generated from temporally contiguous samples, but this is not necessary for the stitching to work. An important feature of the panoramic stitching process is that it simply relies on common features to compute image registration. Providing there is sufficient overlap, sub-sequences can be taken at different times. Figure 8 shows an example of the kind of scene that can be generated by stitching together images taken at different times. The output is from the "SouthEntrance" observer. The left-hand portion of this image is derived from a right-turn from the west-bound lane. The right-hand portion is derived from a left-turn from the east-bound lane. When interpreting such an image, it is important to recognise that the image is a composite constructed from observations at different times. While the large-scale structure will probably be correct, it may be misleading to make assumptions about objects moving in the scene.

5.3 Simple time-lapse
A virtual observer can be used to select a sequence of images of a particular place, which when viewed in sequence gives a time-lapse picture. For a fixed camera, this is a trivial operation. For mobile cameras the system must retrieve and sequence images from different video streams based on spatial location and orientation. An advantage of simple time-lapse over panoramic time-lapse is that it does not require any special camera motion; a disadvantage is that a relatively narrow view of the scene is obtained. Figure 9 shows a short time-lapse sequence captured from the bus data set. It consists of three sequences retrieved at 08:58:39, 16:01:28, and 16:29:15 from the bus network footage. The first frame shows the scene in the morning at 08:58:39. The second frame at 16:01:28 shows a crowd of people in front of a building; the next frame at 16:29:15 shows that the crowd has dispersed, although a parked vehicle can be seen in the same place.

5.4 Low sampling-rate Panorama
The bus data set presented a few difficulties in the generation of panoramas. Firstly, spherical distortion affects both feature matching and registration of images (see Figure 9 for sample images). SIFT features have an orientation which is used for feature matching. The distortion leads to a relative scaling and rotation of features between frames, which the SIFT is designed to handle, but which may reduce the number of matches. More significant is the effect on bundle adjustment, since the spherical view produces point correspondences that do not fit the camera model. Secondly, the sampling rate (1 frame per second) is low, so there is significant difference between the frames that make up a track segment. Higher sampling rates produce more overlap between frames, which gives more feature-point matches and better registration. In the car data set, it was found that taking every second frame did not significantly reduce the ability to match frames, or the quality of the panorama. However, taking every third frame did reduce the effectiveness of image matching. Therefore, roughly 2 frames per second is probably a minimum sampling rate for this application.

Despite these problems, some panoramas could be generated. Figure 10 shows an example. Note that this figure demonstrates some problems with accuracy in the GPS measurements that can occur when the signal paths are obscured by buildings. This suggests that better approaches to positioning (e.g. involving inertial navigation) may be required in some environments.

6. DISCUSSION AND OPEN PROBLEMS
While this paper lays the foundation for exploiting this new and exciting infrastructure, we outline several open challenges from our study.

Algorithms to deal with distortion and low sampling rates: To overcome problems of lens distortion on buses we could use a linear (non-distorting) lens, de-warp the images, or change the camera model. Future trials will use a linear lens positioned to avoid the window tinting. Combined with a higher sampling rate, this is expected to significantly improve the panorama quality.

Figure 8: 180 degree synthetic panorama generated by fusing two sequences of video observations taken at different times. This scene is generated by the "SouthEntrance" virtual observer shown in Figure 6. The left-hand portion of this image is derived from a right-turn from the west-bound lane. The right-hand portion is derived from a left-turn from the east-bound lane.

Figure 9: Time-lapse view of a place generated by the virtual observer shown in the inset (frames at 08:58:39, 16:01:28, and 16:29:15).

Figure 10: Panorama of bus images generated by the virtual observer shown in inset.

Alternative configurations for data acquisition: The current implementation assumes that the camera sweeps across the scene by rotating around a common optical center. It also works well where some forward motion occurs during the rotation (i.e. a forward-facing view from a turning vehicle). Another model for sweeping a scene would be to have a camera facing perpendicular to the direction of motion (i.e. a side-facing view from a vehicle). This latter model has the advantage that almost any motion of the vehicle would scan the scene, whereas the former model requires a turning motion. Indeed, this approach (moving perpendicular to the view axis) is used in most aerial photography applications. It is expected that the approach of Brown and Lowe would also work for side-facing cameras, although some variation to the formulation of the camera homographies would improve the camera modelling. A related technique is to use a scanning vertical "slit" to build "route panoramas" [22]; these give a strip-like panoramic representation of the view to the side of a road.

Improved Blending Algorithms: In our experiments, most of the processing time is required during the blending phase of the algorithm, taking maybe 20 or 30 seconds to complete with high quality settings. Using a simpler blending algorithm (e.g. linear blending instead of multi-band blending) improves processing time dramatically. In an interactive setting where response time is significant, it may make sense to progressively improve the blending quality as images are viewed for longer periods. For example, the initial image may be presented using linear blending, while a multi-band blend is started as a background process.

7. CONCLUSION AND FUTURE WORK
The development of VIRTOBS is an attempt to understand how we can deal with the complex data that arises from mobile cameras. We have designed and implemented a system that allows flexible querying of multiple observation streams. We present the design of the system, the query operators, and the visualisation environment, and demonstrate its efficacy using two data sets. We have received positive interest from both public transport and law enforcement authorities, which suggests that it is worth extending this prototype to a fully-fledged system.

8. ACKNOWLEDGEMENTS
StreetSmart Street Directory images were provided courtesy of the Department of Land Information, Western Australia. Bus data was provided by Digital Technology International [1].

9. REFERENCES
[1] Digital Technology International. http://www.dti.com.au/. Web site visited April 2006.
[2] D J Abadi, Y Ahmad, M Balazinska, U Cetintemel, M Cherniack, J-H Hwang, W Lindner, A S Maskey, A Rasin, E Ryvkina, N Tatbul, Y Xing, and S Zdonik. The design of the Borealis stream processing engine. In Second Biennial Conference on Innovative Data Systems Research (CIDR 2005), Asilomar, CA, January 2005.
[3] D J Abadi, D Carney, U Cetintemel, M Cherniack, C Convey, S Lee, M Stonebraker, N Tatbul, and S Zdonik. Aurora: a new model and architecture for data stream management. The VLDB Journal, 12(2):120–139, 2003.
[4] B Babcock, S Babu, M Datar, R Motwani, and J Widom. Models and issues in data stream systems. In PODS '02: Proceedings of the twenty-first ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, pages 1–16, New York, NY, USA, 2002. ACM Press.
[5] M Brown and D G Lowe. Recognising panoramas. In 9th IEEE International Conference on Computer Vision (ICCV), pages 1218–1227, 2003.
[6] M Brown. Autostitch. http://www.autostitch.net. Web site visited April 2006.
[7] S Chandrasekaran, O Cooper, A Deshpande, M J Franklin, J M Hellerstein, W Hong, S Krishnamurthy, S R Madden, F Reiss, and M A Shah. TelegraphCQ: Continuous dataflow processing for an uncertain world. In SIGMOD '03: Proceedings of the 2003 ACM SIGMOD International Conference on Management of Data, pages 668–668, New York, NY, USA, 2003. ACM Press.
[8] V T de Almeida and R H Güting. Indexing the trajectories of moving objects on networks. Geoinformatica, 9(1):33–60, 2005.
[9] V Gaede and O Günther. Multidimensional access methods. ACM Comput. Surv., 30(2):170–231, 1998.
[10] L Golab and M T Özsu. Issues in data stream management. SIGMOD Rec., 32(2):5–14, 2003.
[11] M F Mokbel, T M Ghanem, and W G Aref. Spatio-temporal access methods. IEEE Data Eng. Bull., 26(2):40–49, 2003.
[12] A Hampapur, L Brown, J Connell, A Ekin, N Haas, M Lu, H Merkl, and S Pankanti. Smart video surveillance: exploring the concept of multiscale spatiotemporal tracking. IEEE Signal Processing Magazine, 22(2):38–51, 2005.
[13] M Heikkilä and M Pietikäinen. An image mosaicing module for wide-area surveillance. In VSSN '05: Proceedings of the third ACM international workshop on Video Surveillance & Sensor Networks, pages 11–18, New York, NY, USA, 2005. ACM Press.
[14] Ramesh Jain. White paper on observation systems. Personal communication, January 2005.
[15] J A C Lema, L Forlizzi, R H Güting, E Nardelli, and M Schneider. Algorithms for moving object databases. The Computer Journal, 46(6):680–712, 2003.
[16] B Liu, A Gupta, and R Jain. MedSMan: A streaming data management system over live multimedia. In MULTIMEDIA '05: Proceedings of the 13th annual ACM International Conference on Multimedia, pages 171–180, New York, NY, USA, 2005. ACM Press.
[17] D G Lowe. Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vision, 60(2):91–110, 2004.
[18] K Mikolajczyk and C Schmid. A performance evaluation of local descriptors. IEEE Trans. Pattern Anal. Mach. Intell., 27(10):1615–1630, 2005.
[19] B Liu, A Gupta, and R Jain. Using stream semantics for continuous queries in media stream processors. In ICDE '04: Proceedings of the 20th International Conference on Data Engineering, page 854, Washington, DC, USA, 2004. IEEE Computer Society.
[20] A Rav-Acha, Y Pritch, D Lischinski, and S Peleg. Dynamosaics: Video mosaics with non-chronological time. In Proceedings IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 58–65, 2005.
[21] R Szeliski. Image stitching and alignment. In N Paragios, editor, Handbook of Mathematical Models in Computer Vision, pages 273–292. Springer, 2005.
[22] Jiang Yu Zheng. Digital route panoramas. IEEE MultiMedia, 10(3):57–67, 2003.
