
Master of Engineering Thesis Proposal: "Argus: Instrumentation for Rapidly Acquiring Dense Pose Imagery"

Douglas S. J. De Couto December 9, 1997

This paper motivates and describes the thesis work that will be done as part of the author's Master of Engineering degree requirements. We present the primary thesis of the work: it is desirable, feasible, and practical to build a device for rapidly acquiring dense pose image datasets. Pose images are digital images tagged with estimates of the camera's 6 degree-of-freedom (DOF) position and orientation (pose). We describe how such a device is useful for building an end-to-end system which can automatically reconstruct three-dimensional (3D) models of the real world. We list the tasks that will be performed for the author's thesis, including a central deliverable of the thesis: a pose image dataset acquired using Argus, a device designed to rapidly acquire large pose image datasets.


1 Introduction
The goal of the City Project (footnote 1) at the MIT Computer Graphics Group is to demonstrate the feasibility of an end-to-end system for automatic reconstruction of urban models from image data. The City Project's approach is to use camera pose estimates to expedite the reconstruction process. As part of the project, we need to acquire datasets of images labeled with camera pose data (pose images), which cover significant urban areas. Existing image datasets are either not labeled with pose data, or do not cover the kinds of urban regions that our project is concerned with. Therefore, we are constructing our own acquisition system to collect appropriate datasets.

Our approach is to build a custom-designed cart, or platform, and equip it with a high resolution still digital camera, as well as a multi-sensor integrated navigation system. The navigation system integrates Global Positioning System (GPS), inertial sensor, and wheel counter data via a Kalman filter to produce an estimate of the platform's position and orientation. Also, we deploy a small, portable GPS base station which allows us to post-process the GPS data for higher pose accuracy and precision. We interface the digital camera to a commodity PC, which controls the camera and the navigation system. The control software captures an image with the camera, tags it with the navigation system's current pose estimate, and writes the resulting pose image to a high capacity digital tape drive. The camera is mounted on a motorized pan-tilt head which allows us to automatically capture almost a full hemisphere of imagery at each acquisition location. The pan-tilt head can be raised and lowered to allow us to acquire imagery at different heights. Finally, the cart has a leveling system which stabilizes it on uneven ground, for more reliable pose estimates. We call our device Argus, after the all-seeing giant of Greek mythology with 100 eyes.

This author's thesis work has been the design and implementation of various parts of Argus. The work that remains to be done is a characterization and evaluation of Argus. Also, we have yet to actually acquire a pose image dataset using Argus.

In section 2, we present and motivate the author's proposed thesis statement. Section 3 provides a survey of some work related to the completed and proposed thesis work, and section 4 provides a detailed description of the tasks that will be completed for the author's thesis. Finally, section 5 outlines an approximate schedule for the thesis work, and section 6 concludes.

Footnote 1: http: city city.html

2 Thesis Statement
The proposed thesis statement is that it is desirable, feasible, and practical to build a device for rapidly acquiring dense pose image datasets. That device is Argus. We ought to build Argus, because it will be highly useful to us, and we can build Argus. These claims will be demonstrated by completing Argus and using it to acquire a pose image dataset in less time and with less effort than acquiring a comparable dataset manually.

2.1 Motivation

The above thesis statement is made in the context of the City Project, which aims to demonstrate an end-to-end system for reconstructing 3D models of the urban world using digital images. This system starts with a pose image dataset, which is currently collected manually. Then, the pose estimate for each image in the dataset is refined using correlation and correspondences between multiple images [1]. After refinement, one of three different reconstruction approaches developed by the City Project can be used to automatically produce a 3D model.

The first approach, described in [2], is a geometric approach that uses incidence counting in an absolute 3D coordinate system to match features extruded from each image. This approach also performs pose refinement. The second approach constructs dense depth maps using epipolar images in a local algorithm [4]. Finally, the third approach directly attacks the correspondence problem in 3D using multi-image triangulation, treating each potential correspondence as a hypothesis whose certainty can evolve [3]. This approach can incrementally process images.

Once the 3D model is reconstructed using one of the above approaches, it can be explored in a virtual walk-through. However, since the models resulting from the automatic reconstruction process are likely to be large, the City Project has also developed stratified rendering techniques which allow these large models to be explored interactively [12].

Argus is being developed to address the very first stage of the City Project: acquiring pose image datasets. We have manually acquired a dataset of pose images that covers our office park, Technology Square, using a digital camera mounted on an indexed pan-tilt head, attached to a tripod base. It took two days to acquire the pictures, but the pose data took much longer to acquire. Acquiring the pose data first involved professionally surveying Technology Square. Then, individual camera positions had to be laboriously tied into the survey using tape measures. Also, camera orientation had to be manually recorded from a specially built pan-tilt camera mount, and the absolute rotation around the camera's pan axis had to be determined by post-processing the image data using custom software with manual assistance [1]. All told, the acquisition of our pose image dataset took months of full-time effort by researchers, and resulted in over 15 gigabytes of image data with associated pose data (footnote 2).

We are building Argus so that we can acquire in a day or two a dataset like the one described above, which would take months to acquire manually.
Argus allows us to capture this data speedily and semi-automatically, and was designed to meet the following requirements:

All the imagery must be collected in a digital form, without intermediate processing steps. If we did not collect digital images, we would have to scan pictures into our system. It is infeasible to scan thousands of images into a database and expect to be able to repeatedly and accurately acquire data this way: scanning is too time consuming and error prone.

The navigation system must be completely automatic. That is, acquiring the pose data should not require any manual measuring or calculations during the acquisition process. As we learned from our manual dataset collection effort, this manual effort can consume the largest fraction of time spent acquiring the dataset. Reducing the pose acquisition time will drastically reduce the total dataset acquisition time.

The image acquisition and storage, including camera motion and configuration, must be fully automated. This eliminates many chances for operator error, and ensures that datasets are acquired under repeatable conditions.

Our aim, with the above requirements, has been to "remove the operator from the loop." We feel that if there is less work for the operator of the Argus platform to perform, then the data acquisition process will be faster and smoother. Also, if the operator can concentrate less on the mechanics of acquiring the data, then the operator can concentrate more on the quality of the data being acquired. Because Argus will speed up and improve the quality of the data acquisition in our end-to-end reconstruction system, it will improve the overall quality and performance of the end-to-end system.

Footnote 2: This work was done by Neel Master, Satyan Coorg, and Adam Holt. Additional postprocessing was also done by Barb Cutler.

3 Related Work
There are other ways to acquire massive amounts of labeled image data, and there are many pre-existing systems which use integrated navigation systems to obtain position data. However, as the comparisons below show, Argus is unique among them.

Aerial photography and satellite imagery are routinely used to obtain georegistered image data. The pose data used in aerial photography is obtained using a combination of instrumentation, ground markers, and manual effort. Sometimes aerial photography uses GPS or integrated navigation technologies to obtain the pose data, just as we do with Argus [10]. Satellite data is registered by performing calculations about the satellite's orbit and sensor orientation; further calculations are performed to map the satellite data into the appropriate coordinate system. With aerial photography or images from satellite sensors, each image typically covers a unique piece of the world; there is not normally much overlap between images. Argus will allow us to acquire data which covers each area of interest with multiple images. Also, aerial photography and satellite imagery provide long range pictures, while the pictures obtained by Argus are relatively close range. Finally, aerial photography and satellite imagery provide data from directly overhead, while the Argus data will cover scenes of interest from many different directions.

The Aspen project [9] filmed the streets of Aspen using a special camera dolly that was moved along predefined paths. The resulting films were stored on videodiscs with some control data, allowing a "virtual tour" of Aspen. The Aspen project did not capture many images covering the same area, it did not capture high accuracy global pose data for every image, and it did not obtain high resolution images. Also, there was no 3D data derived from the films acquired in the Aspen project. However, it did pursue at one level the same idea as the City Project: making a real scene available in a simulated environment.

The GPSVan [6] is used to map features on highways and railroad tracks. It uses a stereo camera system mounted onto a vehicle equipped with an integrated navigation system that employs GPS and inertial navigation technologies. The navigation system uses differential GPS, so a GPS base station must be deployed to use the system. Also, a Kalman filter [8, 11] is used to integrate the GPS and inertial sensors and derive an optimal 3D position estimate.

Argus is different from the GPSVan in several ways, even though both use the same sort of navigation system. First of all, Argus is small and human propelled: it can be used in areas like pedestrian plazas, sidewalks, and courtyards, unlike the GPSVan, which is limited to roadways. Also, Argus has a high resolution still camera which can move on a pan-tilt head, while the GPSVan has a pair of relatively low resolution video cameras fixed to the vehicle.

There are other systems that use integrated GPS and inertial navigation systems to acquire positioning data. For example, RAHCO International developed an unmanned tracked vehicle designed for "hazardous environmental remediation" (moving nuclear waste) [5]. That system also incorporated an electronic compass, gyro, and track velocity sensors, and could stay within 1 foot of a commanded path when moving. Another system was developed at Draper Laboratories to guide "ram-air parafoils" (parachutes) using a combined GPS and inertial navigation system. The navigation system used by Argus is different from these systems because it also provides pointing data that describes where the camera is pointing in the global coordinate system. Also, we hope to obtain more precise pose estimates than any of these systems, by using post-processing and kinematic GPS techniques.
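To make the idea of integrating GPS with inertial or wheel counter data concrete, the following is a minimal sketch of a scalar Kalman filter along a single axis. It is not the filter used by Argus or the GPSVan: the class name `Kalman1D`, the noise variances, and the sample readings are all assumptions chosen for illustration, and a real navigation filter tracks a full 6 DOF state.

```python
from dataclasses import dataclass


@dataclass
class Kalman1D:
    """Scalar Kalman filter tracking one position coordinate (illustrative only)."""

    x: float  # position estimate, metres
    p: float  # variance of the estimate, metres^2

    def predict(self, displacement: float, process_var: float) -> None:
        # Dead-reckoning step: shift the estimate by the displacement reported
        # by the inertial sensor or wheel counter; uncertainty grows.
        self.x += displacement
        self.p += process_var

    def update(self, gps_position: float, gps_var: float) -> None:
        # Measurement step: blend in a GPS fix, weighted by relative confidence.
        gain = self.p / (self.p + gps_var)
        self.x += gain * (gps_position - self.x)
        self.p *= 1.0 - gain


# Hypothetical readings: start at 0 m, roll forward 1 m, then receive a GPS fix.
kf = Kalman1D(x=0.0, p=1.0)
kf.predict(displacement=1.0, process_var=0.5)  # estimate 1.0 m, variance 1.5
kf.update(gps_position=1.3, gps_var=1.5)       # gain is 0.5 for these values
print(kf.x, kf.p)
```

With equal variances for the prediction and the GPS fix, the update lands halfway between the two, and the variance of the estimate is halved. When GPS is unavailable, only `predict` runs and the variance grows until fixes resume.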

4 Thesis Tasks
This section outlines the specific tasks that will be completed for the thesis, and also lists major topics that will be addressed in the thesis.

4.1 Achieve Basic Functionality for Argus

We define basic functionality for Argus as the functionality necessary to acquire a pose image dataset similar in scope to the manual dataset described earlier, possibly using post-processing techniques to improve the pose data quality. We have been designing and building Argus since the summer of 1996; Argus is currently almost complete, except for two important software components: the navigation system software needs to be completed and integrated with the rest of the software system, and software needs to be written to control the digital tape drive which stores data on Argus. The navigation system interface software has been written, but does not work. This software needs to be completed and tested. The digital tape drive interface software needs to be rewritten and tested.

Also, we need to write a tool which will run on our UNIX server systems to read digital tapes written by Argus. The computer on Argus runs the Windows NT operating system, and writes data to the digital tapes using a custom format. Currently, the only way to load this data onto our UNIX server is by reading the tape on a machine running Windows NT, and sending the data over the network to the server; this is unacceptable given that there may be up to 40 gigabytes of data on each digital tape.

Basic functionality also includes status displays which provide the Argus operator with information about the state of the Argus subsystems. These status displays will allow the Argus operator to monitor the performance of the Argus navigation system, and they will provide feedback which will assure the operator that Argus is operating correctly. Finally, there are some problems related to cable and power management which must be resolved before we achieve basic functionality for Argus.

There is some functionality that is not basic functionality, but is highly desirable and may be achieved if there is sufficient time. This functionality includes: modifying the system to handle differential GPS in real time, including a base station; performing dead reckoning using wheel encoders and a level sensor; and providing telemetry data describing the state of Argus, including location, to a base station such as the Graphics Group laboratory.

We will achieve basic functionality for Argus, as described above.
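The tape-reading tool described above depends on a record format that both the Windows NT writer and the UNIX reader interpret identically. The sketch below shows one way such a format could look; it is not the actual Argus tape format, which this proposal does not specify. The header layout, the `ARGS` magic value, and the function names are hypothetical. Fixing the byte order explicitly means the same bytes decode the same way regardless of the reading machine's native byte order.

```python
import io
import struct

# Hypothetical header layout (an assumption, not the Argus format):
# 4-byte magic, six little-endian doubles (x, y, z in metres; roll, pitch,
# yaw in degrees), then a 4-byte unsigned image length in bytes.
HEADER = struct.Struct("<4s6dI")
MAGIC = b"ARGS"


def write_record(stream, pose, image_bytes):
    """Append one pose image record to a binary stream."""
    stream.write(HEADER.pack(MAGIC, *pose, len(image_bytes)))
    stream.write(image_bytes)


def read_records(stream):
    """Yield (pose, image_bytes) pairs until the stream is exhausted."""
    while True:
        header = stream.read(HEADER.size)
        if not header:
            return  # clean end of data
        if len(header) < HEADER.size:
            raise ValueError("truncated record header")
        magic, *pose, length = HEADER.unpack(header)
        if magic != MAGIC:
            raise ValueError("bad record magic")
        image = stream.read(length)
        if len(image) < length:
            raise ValueError("truncated image data")
        yield tuple(pose), image


tape = io.BytesIO()  # an in-memory buffer stands in for the tape device
write_record(tape, (10.0, 20.0, 1.5, 0.0, 0.0, 90.0), b"\x00\x01\x02")
write_record(tape, (11.0, 20.0, 1.5, 0.0, 0.0, 90.0), b"\x03\x04")
tape.seek(0)
records = list(read_records(tape))
print(len(records), records[0][0])
```

Because each record carries its own length, a reader can stream through a tape sequentially without loading it into memory, which matters at tens of gigabytes per tape.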

4.2 Evaluate Argus

We will objectively evaluate the performance of Argus as a pose image acquisition device in several areas, described below. We can also evaluate Argus in several subjective areas, such as usability, based upon our experiences with Argus.

4.2.1 Navigation System Performance

There are several aspects to navigation system performance that are useful to investigate:

Accuracy: This is perhaps the most important and obvious area in which to evaluate the navigation system. The accuracy of the resulting pose image dataset is essentially the accuracy of the navigation system.

Acquisition and warmup times: These times are very important to the real world performance of Argus in urban environments, as GPS behaviour is very erratic in urban environments. As satellite signals are reflected and blocked by buildings, the GPS receiver in the navigation system can lose track of the signals and is then unable to provide position data. The faster that the GPS receiver can find satellite signals, the faster it can provide position data, and the less the Argus operator needs to wait for the navigation system to start providing a good pose estimate again.

Filter convergence and error recovery: The inertial sensor in our navigation system is subject to saturation when rotation rates exceed the maximum rates of the sensor. When this happens, the Kalman filter loses track of the current orientation. The Argus operator can then perform maneuvers with Argus which will allow the filter to reconverge to a good pose solution. We want to measure how long these maneuvers take and how well the filter reconverges, in order to evaluate the effect of sensor saturation on the entire acquisition process.

Precision: It will be interesting to note the precision of our navigation system, if only to place an upper bound on the accuracy of the system.
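The distinction between accuracy and precision drawn above can be made concrete with a short calculation. The sketch below assumes one possible way to measure them: park the platform at a surveyed point, log repeated position fixes, and compare. The function name and the sample coordinates are illustrative assumptions, not measurements from Argus.

```python
import math
import statistics


def accuracy_and_precision(fixes, surveyed):
    """Return (accuracy, precision) in the units of the inputs.

    accuracy:  distance from the mean of the fixes to the surveyed position.
    precision: RMS distance of the fixes from their own mean (their scatter).
    """
    mean = tuple(statistics.fmean(axis) for axis in zip(*fixes))
    accuracy = math.dist(mean, surveyed)
    precision = math.sqrt(
        statistics.fmean(math.dist(fix, mean) ** 2 for fix in fixes)
    )
    return accuracy, precision


# Hypothetical fixes (east, north) in metres, scattered around (1, 1),
# while the surveyed position is (4, 5).
fixes = [(1.0, 0.0), (1.0, 2.0), (0.0, 1.0), (2.0, 1.0)]
accuracy, precision = accuracy_and_precision(fixes, surveyed=(4.0, 5.0))
print(accuracy, precision)
```

In this made-up example the fixes are tightly grouped (1 m scatter) but offset from the truth by 5 m, so a system can be precise without being accurate.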

4.2.2 Acquisition System Performance

There are two main components to the performance of the image acquisition system:

Acquisition time: How long does it take to acquire a sequence of images? This time can be broken down into several components, including: image processing time (including the time to read data from the camera), pan-tilt head movement time, and storage system time (the time to write data to the tape). Obviously, the faster that we can acquire data, the more useful Argus is. Faster acquisition times mean less hassle for the operator and pose image data that is more consistent, for example with respect to lighting and weather conditions.

Repeatability: If we want our pose data to be meaningful, it should be repeatable, both throughout a single acquisition session and between sessions. Since the system uses data about the commanded pan-tilt head position to help calculate the camera orientation in an absolute coordinate system, this position needs to be repeatable. Also, Argus' leveling subsystem should consistently level to the same position.
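To illustrate why pan-tilt repeatability matters, the sketch below combines a platform heading with commanded pan and tilt angles to get a camera pointing direction in an absolute frame. It is a simplified illustration under stated assumptions, not the Argus pose computation: it assumes the platform is perfectly level, that heading and pan are measured clockwise from north, and that tilt is elevation above the horizon. The function name and angle conventions are hypothetical.

```python
import math


def camera_direction(heading_deg, pan_deg, tilt_deg):
    """Unit pointing vector (east, north, up) for a level platform."""
    azimuth = math.radians((heading_deg + pan_deg) % 360.0)
    elevation = math.radians(tilt_deg)
    return (
        math.cos(elevation) * math.sin(azimuth),  # east component
        math.cos(elevation) * math.cos(azimuth),  # north component
        math.sin(elevation),                      # up component
    )


# Platform heading 350 degrees, head panned 100 degrees, no tilt:
# the camera looks due east (azimuth 90 degrees).
east, north, up = camera_direction(350.0, 100.0, 0.0)
print(east, north, up)
```

Under these assumptions, any error between the commanded and actual pan angle adds directly to the azimuth, and any leveling error violates the level-platform assumption, so both feed straight into the orientation recorded with each image.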

4.2.3 Other Areas of Evaluation

In addition to navigation and acquisition system performance, we would like to evaluate Argus in several other areas:

Power budget: Does Argus consume an excessive amount of power? The amount of power consumed by Argus limits its maximum running time. Also, since high capacity batteries are large and heavy, if Argus is not efficient it will be less mobile and harder to handle.

Subjective evaluation: We would also like to answer the following subjective questions. How reliable is Argus when used as designed? What sort of experiences did we have with Argus? Finally, given our experiences, do we find that Argus is feasible for use in an end-to-end system?

Cost: Is it prohibitively expensive to construct a device like Argus?

4.3 Deliver a Pose Image Dataset

The final and perhaps most important part of this thesis work will be to acquire a pose image dataset with Argus. This is the most appropriate way to judge how well Argus performs as a pose image data acquisition device. Also, acquiring a dataset will require that the entire Argus system works as designed; thus it will be a total system test. This dataset will be valuable for the ongoing work of the City Project. Also, we would like to compare the dataset acquired using Argus to the dataset acquired manually. This will let us evaluate the advantages of the two different approaches in areas such as time, convenience, and accuracy.

5 Thesis Schedule
This is a gross schedule of the work to be done:


January 1998: Complete basic functionality for Argus.
February-March: "Shakedown tests": acquire trial datasets, debug system, and refine system as necessary.
Early April: Acquire dataset.
April-May: Carry out other performance evaluations.


6 Conclusion
We have presented the proposed thesis statement: it is desirable, feasible, and practical to build a device for rapidly acquiring dense pose image datasets. We have also discussed how we will support this statement by constructing and using Argus, a device designed to rapidly acquire dense pose image datasets. Finally, we have presented a variety of criteria by which to judge the suitability of Argus as a device for acquiring pose image data.

References

[1] Coorg, Satyan, Neel Master, and Seth Teller. "Acquisition of a Large Pose-Mosaic Dataset." Submitted for publication.

[2] Coorg, Satyan, and Seth Teller. "Matching and Pose Refinement with Camera Pose Estimates." In Proceedings of the 1997 Image Understanding Workshop, New Orleans.

[3] Chou, George T., and Seth Teller. "Multi-Image Correspondence Using Geometric and Structural Constraints." In Proceedings of the 1997 Image Understanding Workshop, New Orleans.

[4] Mellor, J. P., Tomás Lozano-Pérez, and Seth Teller. Dense Depth Maps for Epipolar Images. MIT AI Lab Technical Memo 1593, 1997.

[5] Daigh, Raymond C. "High Reliability Navigation for Autonomous Vehicles." The Institute of Navigation 52nd Annual Meeting. Cambridge, June 1996.

[6] Grejner-Brzezinska, Dorota A. "Positioning Accuracy of the GPSVan(TM)." The Institute of Navigation 52nd Annual Meeting. Cambridge, June 1996.

[7] Hattis, Philip D., and Richard Benney. "Demonstration of Precision Guided Ram-Air Parafoil Airdrop Using GPS/INS Navigation." The Institute of Navigation 52nd Annual Meeting. Cambridge, June 1996.

[8] Kalman, Rudolf E. "A New Approach to Linear Filtering and Prediction Problems." Transactions of the ASME, Journal of Basic Engineering, pp. 34-45, March 1960.

[9] Lippman, Andrew B. "Movie-Maps: An Application of the Optical Video Disc to Computer Graphics." In SIGGRAPH '80 Conference Proceedings, pp. 32-42, 1980.

[10] Digital Photogrammetry: An Addendum to the Manual of Photogrammetry. American Society for Photogrammetry and Remote Sensing, 1996.

[11] Welch, Greg, and Gary Bishop. "An Introduction to the Kalman Filter." At http: welch kalman kalman.html, 1997.

[12] Xiong, Rebecca. "CityScape: A Virtual Navigation System Applying Stratified Rendering." M.S. Thesis. MIT, May 1996.