"We not only see but we look, we not only touch but we feel." (J. J. Gibson)
Active Perception vs. Active Sensing
WHAT IS ACTIVE SENSING?
In the robotics and computer vision literature, the term "active sensor" generally refers to a sensor that transmits a signal (generally electromagnetic radiation, e.g., radar, sonar, ultrasound, microwaves, and collimated light) into the environment and receives and measures the reflected signals. We believe that the use of active sensors is not a necessary condition for active sensing, and that sensing can be performed with passive sensors (that only receive, and do not emit, information), employed actively.
Hence the problem of Active Sensing can be stated as a problem of control strategies applied to the data acquisition process, which will depend on the current state of the data interpretation and on the goal or task of the process. The question may be asked, "Is Active Sensing only an application of Control Theory?" Our answer is: "No, at least not in its simple version." Here is why:
Active Perception
1) The feedback is performed not only on sensory data but on complex processed sensory data, i.e., various extracted features, including relational features.
2) The feedback is dependent on a priori knowledge and models that are a mixture of numeric/parametric and symbolic information.
Active Perception turned into an engineering agenda
The implications of the active sensing/perception approach are the following:
1) The necessity of models of sensors. This is to say, first, the model of the physics of sensors as well as the noise of the sensors. Second, the model of the signal processing and data reduction mechanisms that are applied to the measured data. These processes produce parameters with a definite range of expected values plus some measure of uncertainty. These models shall be called Local Models.
Engineering agenda, cont.
2) The system (which mirrors the theory) is modular, as dictated by good computer science practices, and interactive, that is, it acquires data as needed. In order to be able to make predictions on the whole outcome, we need, in addition to models of each module (as described in 1) above), models for the whole process, including feedback. We shall refer to these as Global Models.
3) Explicit specification of the initial and final state/goal.
If the Active Vision theory is a theory, what is its predictive power? There are two components to our theory, each with certain predictions:
Active Vision theory
1) Local models. At each processing level, local models are characterized by certain internal parameters. One example of a local model is a region growing algorithm whose internal parameters are the local similarity and the size of the local neighborhood. Another example is an edge detection algorithm whose parameter is the width of the band-pass filter in which one is detecting the edge effect. These parameters predict a) the definite range of plausible values, and b) the noise and uncertainty, which will determine the expected resolution, sensitivity, and robustness of the output results from each module.
Active Vision, cont.
2) Global models characterize the overall performance and make predictions on how the individual modules will interact, which in turn will determine how intermediate results are combined. The global models also embody the global external parameters and the initial and final global state of the system. The basic assumption of the Active Vision approach is the inclusion of feedback into the system and gathering data as needed. The global model represents all the explicit feedback connections, parameters, and the optimization criteria which guide the process.
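The relation between a local model and a global model can be sketched in a few lines. This is a minimal illustration, not the authors' system: the moving-average smoother stands in for the band-pass filter, and the function names, the threshold, the synthetic signal, and the list of candidate widths are all assumptions made for the example. The local model is an edge counter with one internal parameter (filter width); the global model is the feedback loop that widens the filter until a stopping condition on the number of edges is met.

```python
def make_step_signal(length=40, noise=0.15):
    """Synthetic 1-D scan line: one true step edge plus alternating noise (assumed data)."""
    half = length // 2
    return [(0.0 if i < half else 1.0) + (noise if i % 2 == 0 else -noise)
            for i in range(length)]


def smooth(signal, width):
    """Moving-average filter; `width` is the local model's internal parameter."""
    half = width // 2
    out = []
    for i in range(len(signal)):
        window = signal[max(0, i - half):min(len(signal), i + half + 1)]
        out.append(sum(window) / len(window))
    return out


def count_edges(signal, width, threshold=0.2):
    """Local model: smooth, differentiate, and count monotonic strong transitions."""
    smoothed = smooth(signal, width)
    edges = 0
    previous_sign = 0  # 0 means "no strong transition in progress"
    for a, b in zip(smoothed, smoothed[1:]):
        diff = b - a
        sign = 0 if abs(diff) <= threshold else (1 if diff > 0 else -1)
        if sign != 0 and sign != previous_sign:
            edges += 1
        previous_sign = sign
    return edges


def control_loop(signal, max_edges, widths=(1, 3, 5, 7, 9)):
    """Global model: feed the edge count back and widen the filter until
    the stopping condition (at most `max_edges` edges) is satisfied."""
    result = None
    for width in widths:
        result = (width, count_edges(signal, width))
        if result[1] <= max_edges:
            break
    return result


if __name__ == "__main__":
    width, edges = control_loop(make_step_signal(), max_edges=1)
    print(f"chosen filter width: {width}, edges found: {edges}")
```

With the unfiltered signal every noise wiggle counts as an edge, so the loop rejects width 1 and stops at the first width whose output satisfies the goal. The plausible range of the parameter (the `widths` tuple) and the stopping criterion are exactly the kind of information the local and global models are meant to make explicit.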
Control Strategies
There are three distinct control stages proceeding in sequence: initialization, processing in midterm, and completion of the task. Strategies are divided with respect to the tradeoff between how much data measurement the system acquires (data driven, bottom-up) and how much a priori or acquired knowledge the system uses at a given stage (knowledge driven, top-down). Of course, there is also the strategy that combines the two.
Bottom-up and Top-down process
To eliminate possible ambiguities with the terms bottom-up and top-down, we define them here. Bottom-up (data driven), in this discussion, is defined as a control strategy where no concrete semantic, context-dependent model is available, as opposed to the top-down strategy, where such knowledge is available.
GOALS/TASKS
Different tasks will determine the design of the system, i.e., the architecture. Consider the following tasks:
Manipulation
Mobility
Communication and Interaction of machine to machine, people to people via digital media, or people to machine.
Goal/Task
Geographically distributed communication and interaction using multimedia (vision primarily) over the Internet. We are concerned primarily with unspoken communication: gestures and body motion. Examples are: coordinated movement such as dance, physical exercises, training of manual skills, and remote guidance of physical activities.
Note
Recognition and learning will play a role in all the tasks.
Environments/context
The environment serves as a constraint in the design. We shall consider only the constraints relevant to the visual task that serves to accomplish the physical activity. For example: in the manipulation task, the size of the object will determine not only the data acquisition strategy but also the design of the vision system (choice of field of view, focal length, illumination, and spatial resolution). Think of moving furniture vs. picking up a coin.
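The furniture-versus-coin contrast can be made concrete with a rough calculation. The sketch below uses the pinhole approximation (focal length = sensor width x distance / scene width), which is only reasonable when the working distance is much larger than the focal length. The sensor width, pixel count, working distance, and scene widths are assumed values chosen for illustration; none of them come from the slides.

```python
# All constants below are illustrative assumptions, not figures from the source.
SENSOR_WIDTH_MM = 6.4    # assumed width of the image sensor
PIXELS_ACROSS = 640      # assumed horizontal pixel count
DISTANCE_MM = 1000.0     # assumed camera-to-object working distance


def focal_length_mm(scene_width_mm, distance_mm=DISTANCE_MM,
                    sensor_width_mm=SENSOR_WIDTH_MM):
    """Pinhole approximation: focal length needed so the scene width fills the sensor."""
    return sensor_width_mm * distance_mm / scene_width_mm


def resolution_mm_per_pixel(scene_width_mm, pixels_across=PIXELS_ACROSS):
    """Spatial resolution on the object when the scene width fills the image."""
    return scene_width_mm / pixels_across


if __name__ == "__main__":
    # Assumed scene widths: a 2 m piece of furniture, a coin with margin (50 mm).
    for name, width_mm in [("furniture", 2000.0), ("coin", 50.0)]:
        print(f"{name}: f = {focal_length_mm(width_mm):.1f} mm, "
              f"{resolution_mm_per_pixel(width_mm):.3f} mm/pixel")
```

Under these assumptions the two tasks call for focal lengths that differ by a factor of 40, which is one way to see why object size shapes the choice of optics and not only the acquisition strategy.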
Environment/context
Another example: Mobility. There is a difference if the mobility is on the ground or in the air, looking down or up. Furthermore, there is a difference between outdoor and indoor environments. The position and orientation of the observer will determine the interpretation of the signal. Varied visibility conditions will influence the design and the architecture.
Environment/context
For distributed communication and interaction, the environment will depend on the application. It could be a digitized environment of the place where the participants are, or it could be a virtual environment; for example, one can put people into a historical environment (Rome, Pompeii, etc.).
Active Vision System for 3D object recognition
Table 1 below outlines the multilayered system of an Active Vision system, with the final goal of 3-D object/shape recognition. The layers are enumerated 0, 1, 2, ... with respect to the goal (intermediate results) and feedback parameters. Note that the first three levels correspond to monocular processing only. Naturally, the menu of features extracted from monocular images is far from exhaustive. Levels 3-5 are based on binocular images. It is only the last level that is concerned with semantic interpretation.
Table 1
Level                            Feedback parameters                   Goal / stopping conditions
_________________________________________________________________________________________________
0. Control of the physical       directly measured: current            grossly focused scene;
   device                        lighting system,                      camera aperture adjusted
                                 open/close aperture
_________________________________________________________________________________________________
1. Control of the physical       directly measured: focus, zoom        focused on one object
   device                        computed: contrast,
                                 distance from focus
_________________________________________________________________________________________________
2. Control of low level          computed only: threshold of the       2D segmentation;
   vision modules                width of filters                      max # of edges/regions
_________________________________________________________________________________________________
3. Control of binocular          directly measured: vergence angle     depth map
   system (hardware and          computed: range of admissible
   software)                     depth values
_________________________________________________________________________________________________
4. Control of intermediate       computed only: threshold of           segmentation
   geometric vision module       similarity between surfaces
_________________________________________________________________________________________________
5. Control of several views      computed: the position and            3D object description
   integration process           rotation of different views
_________________________________________________________________________________________________
6. Control of semantic                                                 recognition of 3D
   interpretation                                                      objects/scene
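As an illustration of how one row of the table might operate, the sketch below closes the loop for the focusing level: the feedback parameter is the focus setting, the computed quantity is contrast, and the stopping condition is that no smaller step improves contrast. The simulated contrast curve, the hill-climbing search, and all names and numbers are assumptions made for the example; the slides do not say which search method was used.

```python
def simulated_contrast(focus, best_focus=37.0):
    """Stand-in for a contrast measure computed from an image; it peaks at an
    (assumed, hypothetical) best focus setting unknown to the controller."""
    return 1.0 / (1.0 + (focus - best_focus) ** 2)


def focus_by_hill_climbing(measure, start=0.0, step=8.0, min_step=0.25):
    """Move the focus setting while contrast improves; halve the step when
    neither direction helps; stop once the step falls below `min_step`."""
    focus = start
    best = measure(focus)
    while step >= min_step:
        moved = False
        for candidate in (focus + step, focus - step):
            value = measure(candidate)
            if value > best:
                focus, best, moved = candidate, value, True
                break
        if not moved:
            step /= 2.0
    return focus, best


if __name__ == "__main__":
    focus, contrast = focus_by_hill_climbing(simulated_contrast)
    print(f"focus setting: {focus}, contrast: {contrast:.3f}")
```

The same pattern (a measured or computed feedback parameter, a goal, and an explicit stopping condition) applies to every level in the table; only the quantity being adjusted changes.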
Comments
Several comments are in order:
1) Although we have presented the levels in a sequential order, we do not believe that this is the only way information flows through the system. The only significance in the order of levels is that the lower levels are somewhat more basic and necessary for the higher levels to function.
2) In fact, the choice of the level at which one accesses the system very much depends on the given task and/or the goal.
Active Visual Observer
Several groups around the world have built binocular active vision systems that can attend to and fixate a moving target. We will review two such systems, one built at the GRASP Laboratory at UPENN and the other at KTH (Royal Institute of Technology) in Stockholm, Sweden.
The UPENN System
PennEyes: A Binocular Active Vision System
PennEyes
PennEyes is a head-in-hand system with a binocular camera platform mounted on a 6 DOF robotic arm. Although physically limited to the reach of the arm, the functionality of the head is extended through the use of motorized optics (10x zoom). The architecture is configured to rely minimally on external systems and ...
Design considerations
Mechanical: The precision positioning was afforded by the PUMA arm. However, the binocular camera platform needed to weigh in the range of 2.5 kg.
Optics: The use of motorized lenses (zoom, focus, and aperture) offered increased functionality.
Electronics: This was the most critical element in the design. A MIMD DSP organization was chosen as the best tradeoff between performance, extensibility, and ease of integration.
Puma Polka
Tracking Performance
The two robots afforded objective measures of tracking performance with a precision target. A three-dimensional path with known precision can be repeatedly generated, allowing the comparison of different visual servoing algorithms.
BiSight Head
BiSight head
The head has independent pan axes with a peak tracking performance of 1000 deg/s and 12,000 deg/s^2. The binocular camera platform has 4 optical (zoom and focus) and 2 mechanical (pan) degrees of freedom. The concern here is how well the calibration can be maintained after repeated exposure to acceleration and vibration. Another problem was that, with zoom adjustment, the focal length also changed.
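A short calculation gives a feel for these figures, reading them as a peak velocity of 1000 deg/s and a peak acceleration of 12,000 deg/s^2. The sketch assumes constant acceleration from rest, which is an idealization, and the 60 Hz frame rate is an assumed value used only to express the velocity per video frame.

```python
PEAK_VELOCITY_DEG_S = 1000.0        # from the text
PEAK_ACCELERATION_DEG_S2 = 12000.0  # from the text
FRAME_RATE_HZ = 60.0                # assumption: not stated in the source

# Time to reach peak velocity from rest under constant peak acceleration: t = v / a
time_to_peak_s = PEAK_VELOCITY_DEG_S / PEAK_ACCELERATION_DEG_S2

# Angle swept while accelerating: theta = v^2 / (2 a)
angle_to_peak_deg = PEAK_VELOCITY_DEG_S ** 2 / (2.0 * PEAK_ACCELERATION_DEG_S2)

# Angle swept during one video frame at peak velocity
sweep_per_frame_deg = PEAK_VELOCITY_DEG_S / FRAME_RATE_HZ

if __name__ == "__main__":
    print(f"time to peak velocity: {time_to_peak_s * 1000:.1f} ms")
    print(f"angle swept while accelerating: {angle_to_peak_deg:.1f} deg")
    print(f"angle per frame at peak velocity: {sweep_per_frame_deg:.1f} deg")
```

Under these assumptions the axis reaches full speed in under a tenth of a second, which helps explain why holding calibration through repeated acceleration is listed as a concern.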
C40 Architecture
Beyond the basic computing power of the individual C40s, the performance of the network is enhanced by the ability to interconnect the modules with a fair degree of flexibility, as well as the ability to store an appreciable amount of information. The former is made possible by up to six comports on each module and the latter by several Mbytes of local storage.
C40 Architecture
Critical Issues
The performance of any modularly structured active vision system depends critically on a few recurring issues. They involve the coordination of processes running on different subsystems, the management of large data streams, processing and transmission delays, and the control of systems operating at different rates.
Synchronization
The three major components of this modular active vision system are independent entities that work at their own pace. The lack of a common time base makes synchronizing the components a difficult task. In some cases, an external signal can be used to synchronize independent hardware components. In this system, the C40 network, the digitizers, and the graphics module are slaved to the vertical sync of the genlocked cameras.
Other considerations
Bandwidth: large data streams.
System integration: If data throughput becomes the bottleneck, then some new data compression algorithms must be invoked.
Latency: Delays between the acquisition of a frame and the motor response to it are an inevitable problem of active vision systems. Delays make the control more difficult because they can cause instabilities.
Multi-rate control: Active vision systems suggest, by their very nature, a hierarchical approach to control.
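The effect of latency on stability can be seen in a toy simulation. This is a minimal sketch under assumed conditions: a proportional controller that corrects the tracking error using a measurement that is a fixed number of frames old, with the error evolving as e[k+1] = e[k] - gain * e[k - delay]. The gain, the delays, and the function name are hypothetical choices and are not taken from the PennEyes system.

```python
def tracking_error(gain, delay_frames, steps=60):
    """Simulate e[k+1] = e[k] - gain * e[k - delay_frames], starting from a
    unit error that has been constant for long enough to fill the delay line."""
    history = [1.0] * (delay_frames + 1)
    for _ in range(steps):
        delayed_measurement = history[-1 - delay_frames]
        history.append(history[-1] - gain * delayed_measurement)
    return history


def tail_amplitude(history, samples=10):
    """Largest error magnitude over the last few steps."""
    return max(abs(e) for e in history[-samples:])


if __name__ == "__main__":
    for delay in (0, 1, 3):
        amplitude = tail_amplitude(tracking_error(gain=0.8, delay_frames=delay))
        print(f"delay = {delay} frames: final error amplitude = {amplitude:.3g}")
```

With the same gain, the error dies out when the measurement is fresh or one frame old, but grows into an ever larger oscillation when it is three frames old. In this toy model, the usual remedies are to lower the gain or to compensate for the known delay.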
Control
If the visual and mechanical control rates are one or more orders of magnitude apart, the mechanical control loops are essentially independent of the visual control loop.
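One way to see why the rate separation matters is to ask how much of a new setpoint the fast loop has left uncorrected by the time the slow loop issues its next command. The sketch below assumes a simple first-order mechanical loop that removes a fixed fraction of the remaining error on every tick; the gain and the rate ratios are illustrative assumptions only.

```python
def residual_after_visual_period(rate_ratio, mech_gain=0.5):
    """Fraction of a unit setpoint step still uncorrected by the mechanical loop
    when the next visual update arrives.

    rate_ratio: mechanical ticks per visual update (assumed integer).
    mech_gain:  fraction of the remaining error removed per mechanical tick (assumed).
    """
    error = 1.0
    for _ in range(rate_ratio):
        error -= mech_gain * error
    return error


if __name__ == "__main__":
    for ratio in (1, 10, 100):
        print(f"rate ratio {ratio:>3}: residual = {residual_after_visual_period(ratio):.3g}")
```

When the two loops run at the same rate, half of each step is still outstanding at the next visual update, so the loops interact. At a ratio of ten or more the mechanical loop has effectively settled, and the visual loop can treat it as an ideal positioner.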