
Velagapudi Ramakrishna Siddhartha Engineering College

DEPARTMENT OF COMPUTER SCIENCE


Vijayawada

A paper on

Presented by
Ch. Phani Krishna, III Year CSE, Phanikrishna.chinteti@gmail.com
P. Veerendra, III Year CSE, pveerendra086@gmail.com

1. Abstract:
Multimedia data is rapidly gaining importance along with recent developments such as the increasing deployment of surveillance cameras in public locations. AIBO is a commercially available quadruped robot dog equipped with a color CMOS camera; its main aim here is to detect a mark on the floor using its camera and to track people. In a few years' time, analyzing the content of multimedia data will be a problem of phenomenal proportions, as digital video may produce data at rates beyond 100 Mb/s, and multimedia archives steadily run into petabytes of storage space. Consequently, for urgent problems in multimedia content analysis, Grid computing is rapidly becoming indispensable. This demonstration shows the viability of wide-area Grid systems in adhering to the heavy demands of a real-time task in multimedia content analysis. Specifically, we show the application of a Sony AIBO robot dog, capable of recognizing objects from a set of learned objects, while connected to a large-scale Grid system comprising cluster computers located in Europe, the United States, and Australia. As such, we demonstrate the effective integration of state-of-the-art results from two largely distinct research fields: multimedia content analysis and Grid computing. This paper briefly introduces the Parallel-Horus architecture and describes a services-based approach to wide-area multimedia computing.

Due to the increasing storage and connectivity of multimedia data, automatic multimedia content analysis is becoming an ever more important area of research. Fundamental research questions in this area include: 1. Can we automatically find (sub-)genres in images from a statistical evaluation of large image sets? 2. Can we automatically learn to find objects in images and video streams from partially annotated image sets? Scientifically, these questions deal with the philosophical foundations of cognition and giving names to things. From a societal perspective, solutions are urgently needed, given the rapidly increasing volume of multimedia data.

Figure (a): Robotic dog

2. Introduction:
In order to create a social robot, able to interact with people in a natural way, one of the things it should be able to do is to follow a person around an area. This master's research is aimed at enabling a Sony AIBO robot dog to watch people closely and follow them around the house. This is done using the visual data provided by the robot's nose-mounted camera, analyzing the data with respect to colour distributions and salient features. This is a time in which information, be it scientific, industrial, or otherwise, is generally composed of multimedia items, i.e. a combination of pictorial, linguistic, and auditory data.

The existing user-transparent programming model (i.e. Parallel-Horus) is matched with an execution model based on Wide-Area Multimedia Services, i.e. high-performance multimedia functionality that can be invoked from sequential applications running on a desktop machine. The approach is evaluated by applying it to a state-of-the-art visual recognition task. Specifically, a Sony AIBO robot dog, capable of recognizing objects from a set of 1,000 objects, is connected to a large-scale Grid system comprising cluster systems in Europe and Australia.

3. Mark Detection:

Two solutions are possible to detect the mark in the picture:
1. Color recognition: By knowing the color of the mark in advance, its position in the picture can be computed using a virtual mask moving over all pixels: the mean color inside the mask is calculated to find the mark position in the picture.
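A minimal sketch of this moving-mask idea follows; the mask size, mark color, and acceptance threshold are illustrative assumptions, not values from the paper:

```python
import numpy as np

def find_mark(image, mark_color, mask_size=5, threshold=40.0):
    """Slide a square mask over the image; at each position compute the
    mean color inside the mask and keep the position whose mean is
    closest to the known mark color."""
    h, w, _ = image.shape
    best_pos, best_dist = None, float("inf")
    for y in range(0, h - mask_size + 1):
        for x in range(0, w - mask_size + 1):
            patch = image[y:y + mask_size, x:x + mask_size]
            mean = patch.reshape(-1, 3).mean(axis=0)
            dist = np.linalg.norm(mean - mark_color)
            if dist < best_dist:
                best_pos, best_dist = (y, x), dist
    # Reject matches whose mean color is still far from the mark color.
    return best_pos if best_dist < threshold else None

# Synthetic 20x20 grey floor with a red 5x5 mark at row 8, column 12.
img = np.full((20, 20, 3), 128.0)
img[8:13, 12:17] = [200.0, 30.0, 30.0]
print(find_mark(img, np.array([200.0, 30.0, 30.0])))  # (8, 12)
```

A real implementation would subsample mask positions or use integral images to avoid the exhaustive per-pixel scan.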


4. Parallel-Horus:
Parallel-Horus is a cluster programming framework that allows programmers to implement parallel multimedia applications as fully sequential programs. The Parallel-Horus framework consists of commonly used multimedia data types and associated operations, implemented in C++ and MPI. The library's API is made identical to that of an existing sequential library: Horus.
4.1. Unary pixel operation: Operations in which a unary function is applied to each pixel in the image. The function's sole argument is the pixel value itself. E.g.: negation, absolute value, square root.
4.2. Binary pixel operation: Operations in which a binary function is applied to each pixel in the image. The function's first argument is the pixel value itself; the second argument is either a single value or a pixel value obtained from another image. E.g.: addition, multiplication, threshold.
4.3. Global reduction: Operations in which all pixels in the image are combined to obtain a single result value. E.g.: sum, product, maximum.
4.4. Neighborhood operation: Operations in which several pixel values in the neighborhood of each pixel in the image are combined. E.g.: percentile, median.
4.5. Generalized convolution: A special case of neighborhood operation, in which the combination of pixels in the neighborhood of each pixel is expressed in terms of two binary operations. E.g.: convolution, Gauss, dilation.
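The operation classes above can be sketched sequentially; the function names below are illustrative stand-ins, not the actual Horus/Parallel-Horus API:

```python
import numpy as np

def unary_pixel_op(img, f):            # 4.1: f applied to each pixel
    return f(img)

def binary_pixel_op(img, other, f):    # 4.2: f(pixel, value-or-pixel)
    return f(img, other)

def global_reduction(img, f):          # 4.3: combine all pixels into one value
    return f(img)

def neighborhood_op(img, size, f):     # 4.4: combine a size x size window
    pad = size // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.empty_like(img, dtype=float)
    for y in range(img.shape[0]):
        for x in range(img.shape[1]):
            out[y, x] = f(padded[y:y + size, x:x + size])
    return out

img = np.array([[1.0, 2.0], [3.0, 4.0]])
print(unary_pixel_op(img, np.sqrt))            # element-wise square root
print(binary_pixel_op(img, 2.0, np.multiply))  # multiply by a constant
print(global_reduction(img, np.max))           # 4.0
print(neighborhood_op(img, 3, np.median))      # 3x3 median filter
```

In Parallel-Horus the same calls keep this sequential signature while the framework partitions the image across cluster nodes behind the scenes.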

Figure (b): Robotic dog recognizing the ball

2. Shape recognition: Image processing algorithms allow finding circles or lines in the picture. In the image processing course of Mr. Jourlin, circle recognition first required different filters applied to the picture to detect the edges. After extracting edges, circles can be detected using shape parameters, and lines are detected using the Hough transform. For the four following reasons, the color recognition method is adopted:
1. The camera's perspective distortion modifies the shape of the mark and could disturb the shape recognition.
2. The shape recognition algorithms need more computational power, due to the image filtering and edge extraction.
3. The floor's color will be very different from the mark color, so the color recognition technique can be efficient.
4. Color recognition algorithms are easier to implement than shape recognition algorithms.
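To make the cost comparison concrete, here is a minimal sketch of the edge-extraction step that shape recognition would require before any Hough-style voting (the Sobel kernels are standard; the threshold value is an illustrative assumption):

```python
import numpy as np

def sobel_edges(img, thresh=1.0):
    """Extract an edge map via Sobel gradient filtering, the first step
    of a shape-recognition pipeline."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T
    pad = np.pad(img, 1, mode="edge")
    gx = np.zeros_like(img, dtype=float)
    gy = np.zeros_like(img, dtype=float)
    for y in range(img.shape[0]):
        for x in range(img.shape[1]):
            win = pad[y:y + 3, x:x + 3]
            gx[y, x] = (win * kx).sum()
            gy[y, x] = (win * ky).sum()
    # Edge pixels are those with large gradient magnitude.
    return np.hypot(gx, gy) > thresh

# A vertical step edge: edge pixels appear along the intensity jump.
img = np.zeros((5, 6)); img[:, 3:] = 10.0
print(sobel_edges(img).astype(int))
```

Even this single filtering pass touches every pixel nine times per kernel, which illustrates point 2 above: shape recognition is considerably more expensive than a mean-color scan.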

4.6. Geometric transformations: Operations in which the image's domain is transformed. E.g.: rotation, scaling, reflection. Current developments include patterns for operations on large datasets, as well as patterns on increasingly important data structures, such as feature vectors obtained from earlier calculations on image and video data. For reasons of efficiency, all Parallel-Horus operations are capable of adapting to the performance characteristics of the parallel machine at hand, i.e. by being flexible in the partitioning of data structures.

Object recognition must also be invariant, e.g., to different lighting conditions. Due to the rapid increase in the size of multimedia repositories of known objects, state-of-the-art sequential computers can no longer live up to the computational demands, making high-performance distributed computing indispensable.

5. Services-Based Multimedia Grid Computing:


From the user's perspective, Grid computing is still far from being more than an academic concept. Essentially, this is because Grids do not yet have the full basic functionality needed for extensive use. Consequently, as long as programming and usage remain hard, most researchers in multimedia computing will not regard Grids as a viable alternative to more traditional computer systems.
Figure (c): Object recognition by our robot dog: (1) an object is shown to the dog's camera; (2) video frames are processed on a per-cluster basis; (3) given the resulting feature vectors describing the scene, a database of known objects is searched; (4) in case of recognition, the dog reacts accordingly.

6. Performance Evaluation:
In this section the architecture is assessed for its effectiveness in providing significant performance gains. At the end, the wide-area execution of a state-of-the-art vision task is described. Specifically, the application of a Sony AIBO robot dog, capable of recognizing objects, is presented, while connected to a large-scale Grid system comprising clusters in Europe and Australia.
6.1. Object Recognition by a Sony AIBO Robot Dog: The example application demonstrates object recognition performed by a Sony AIBO robot dog (see Figure (c)). Irrespective of the application on a robot, the general problem of object recognition is to determine which, if any, of a given repository of objects appears in an image or video stream. It is a computationally demanding problem that involves a non-trivial tradeoff between specificity of recognition (e.g., discriminating between different faces) and invariance (e.g., to different lighting conditions).

6.2. Local Histogram Invariants: In the robot application, local histograms of invariant features are computed for each aspect of an object. The color invariant features are highly invariant to illumination color, shadow effects, and shading of the object. The features are derived from a kernel-based histogram of feature responses. By exploiting natural image statistics, these histograms are modeled by parameterized density functions. The parameters act as a new class of photometric and geometric invariants, yielding a very condensed representation of local image content.

Channel histograms are constructed of the invariant gradients {Ww, Cλw, Cλλw} and of the edge detectors {Wx, Wy, Cλx, Cλy, Cλλx, Cλλy}, separately.
6.3. Histogram Parameterization: From natural image statistics research, it is known that histograms of derivative filters can be well modeled by simple distributions. In particular, it has been shown that histograms of Gaussian derivative filters in a large collection of images follow a Weibull-type distribution. Furthermore, the gradient magnitude for the invariants W and C given above follows a Weibull distribution,

p(r) = (γ/β) (r/β)^(γ−1) exp(−(r/β)^γ),   (1)

Figure (d): Illumination direction variation

In the learning phase of this system, a single condensed representation of each observed object is stored in a database. Subsequently, object recognition is achieved by matching local histograms, extracted from the video stream generated by the camera in the dog's nose, against the learned database. In this approach, each pixel's RGB value is first transformed to an opponent color representation.

where r represents the response for one of the invariants {Ww, Cλw, Cλλw}. The local histograms of invariants of derivative filters can be well modelled by an integrated Weibull-type distribution,

p(r) = γ / (2 γ^(1/γ) β Γ(1/γ)) · exp(−(1/γ) |(r − μ)/β|^γ).   (2)

The rationale behind this is that the RGB sensitivity curves of the camera are transformed to Gaussian basis functions: the Gaussian and its first- and second-order derivatives. Hence, the transformed values represent an opponent color system, measuring intensity, yellow versus blue, and red versus green. Spatial scale is incorporated by convolving the opponent color images with a Gaussian filter. Photometric invariance is now obtained by considering two non-linear transformations. The first color invariant, W, isolates intensity variation from chromatic variation, i.e. edges due to shading, cast shadow, and albedo changes of the object surface. The second invariant feature, C, measures all chromatic variation in the image, disregarding intensity variation, i.e. all variation where the color of the pixels changes. These invariants measure point properties of the scene, and are referred to as point-based invariants.
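A hedged sketch of this opponent transform: the 3×3 matrix below approximates coefficients reported in the color-invariance literature, but the exact values depend on the camera's sensitivity curves, so treat them as assumptions.

```python
import numpy as np

# Approximate RGB -> Gaussian opponent color transform (camera-dependent).
OPPONENT = np.array([
    [0.06,  0.63,  0.27],   # E: intensity
    [0.30,  0.04, -0.35],   # E_lambda: yellow versus blue
    [0.34, -0.60,  0.17],   # E_lambda_lambda: red versus green
])

def to_opponent(rgb_image):
    """Transform each pixel's RGB value to (intensity, yellow-blue,
    red-green) opponent coordinates."""
    return rgb_image @ OPPONENT.T

# A white pixel: the two chromatic channels should be near zero.
pixel = np.array([[[1.0, 1.0, 1.0]]])
print(to_opponent(pixel))
```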

In this case, r represents the response for one of the invariants {Wx, Wy, Cλx, Cλy, Cλλx, Cλλy}. Furthermore, Γ represents the complete Gamma function,

Γ(x) = ∫₀^∞ t^(x−1) e^(−t) dt.   (3)

In our implementation we convert the histogram density of all local invariant histograms to Weibull form by first centering the data at the mode of the distribution, then multiplying each bin frequency p(ri) by its absolute response value |ri|, and normalizing the distribution. These transformations allow the estimation of the Weibull density parameters β and γ, indicating the (local) edge contrast and the (local) roughness or textureness, respectively.
6.4. Color-Based Object Recognition:
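The parameter estimation can be sketched as a maximum-likelihood fit of the plain Weibull density (1); the grid search below is an illustrative stand-in for the paper's estimator, not its actual implementation:

```python
import numpy as np

def fit_weibull(r, gammas=np.linspace(0.2, 5.0, 97)):
    """Grid-search maximum-likelihood fit of a Weibull density to
    non-negative filter responses r; beta reflects local contrast,
    gamma local roughness."""
    r = np.asarray(r, dtype=float)
    r = r[r > 0]
    best = (None, None, -np.inf)
    for g in gammas:
        # Closed-form MLE of beta for a fixed shape gamma.
        b = np.mean(r ** g) ** (1.0 / g)
        ll = np.sum(np.log(g / b) + (g - 1) * np.log(r / b) - (r / b) ** g)
        if ll > best[2]:
            best = (b, g, ll)
    return best[0], best[1]    # (beta, gamma)

rng = np.random.default_rng(0)
samples = rng.weibull(2.0, 20000) * 1.5    # true gamma = 2.0, beta = 1.5
beta, gamma = fit_weibull(samples)
print(round(beta, 2), round(gamma, 2))
```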

In the robot dog system a simple algorithm is applied for object recognition and localization, based on the described invariant features. In the first, learning phase of the experiment, an object is characterized by learning the invariant Weibull parameters at fixed locations in the image, representing a sort of fixed retina of receptive fields positioned at a hexagonal grid, 2 apart, on three rings from the center of the image. Hence, a total of 1 + 6 + 12 + 18 = 37 histograms are constructed. For each histogram, the invariant Weibull parameters are estimated. In the learning phase the dog is presented with a set of 1,000 objects under a single visual setting. For each of these objects, the learned set of Weibull parameters is stored in a database.
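The retinal layout can be sketched as follows; the `spacing` value is illustrative, since the grid step above is given without units:

```python
import numpy as np

def hex_retina(center, spacing):
    """Receptive-field centers: the image center plus three rings of
    6, 12 and 18 points (1 + 6 + 12 + 18 = 37 histogram locations)."""
    points = [np.asarray(center, dtype=float)]
    for ring in (1, 2, 3):
        n = 6 * ring                      # points on this ring
        for k in range(n):
            angle = 2 * np.pi * k / n
            points.append(points[0] + ring * spacing *
                          np.array([np.cos(angle), np.sin(angle)]))
    return np.array(points)

retina = hex_retina(center=(80, 104), spacing=20)
print(len(retina))   # 37
```

At each of the 37 positions a local histogram is taken and reduced to its two Weibull parameters, so an object is summarized by only 37 (β, γ) pairs per feature channel.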

7. Benefits:
There are many benefits in using color-based object recognition in robot dogs:
1. Robotic pets can replace real ones.
2. Robotic pets can provide services and health benefits for senior citizens.

Figure (g): Robot dog in a household environment

3. Robotic dogs can move flexibly and can fetch objects according to the user's requirements.

8. Conclusion:
The results obtained by matching the Parallel-Horus programming model with an execution model based on so-called Wide-Area Multimedia Services are described above. Considering the above features, we conclude that the system has succeeded in designing and building a viable framework. It has served well in showing the potential of applying Grid resources to state-of-the-art multimedia processing in a coordinated manner. However, there is still room for improvement in functionality, reliability, fault tolerance, and performance. With such improvements, Parallel-Horus will continue to have an immediate and stimulating effect on the study of the many computationally demanding problems in multimedia content analysis. The robot dog application is merely one of these.

Figure (f): Learning phase

In the second, recognition phase, the learning step is validated by showing the same objects again under many different appearances, with varying lighting direction, lighting color, and viewing position, using the same retinal structure. In this manner, the robot dog has learned each of the 1,000 objects from only one example, while being capable of recognizing more than 300 of these under a diversity of imaging conditions that may occur in everyday life. Interestingly, this recognition rate is higher than the recognition rate of around 200 objects reported for a real dog. Importantly, the algorithm runs at around 1 frame per 4 seconds, and voting is done over the results obtained from 8 consecutive frames. As a result, object recognition takes approximately 30 seconds, which is not even close to real-time performance. This problem is overcome by applying high-performance distributed computing at a very large scale.
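The 8-frame voting step can be sketched as a simple majority vote; the acceptance threshold `min_votes` below is an assumption, not a value stated in the paper:

```python
from collections import Counter

def vote(frame_results, window=8, min_votes=5):
    """Combine per-frame recognition results (None = no recognition) by
    majority voting over the last `window` frames."""
    recent = [r for r in frame_results[-window:] if r is not None]
    if not recent:
        return None
    label, count = Counter(recent).most_common(1)[0]
    return label if count >= min_votes else None

# Six of eight frames agree on "cup", so the vote accepts it.
frames = ["cup", "cup", None, "cup", "ball", "cup", "cup", "cup"]
print(vote(frames))  # cup
```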

