Obstacle avoidance using the Kinect
Van-Duc Nguyen
Dept. of Electronics & Telecommunications Engineering, Ho Chi Minh City University of Technology, Vietnam
firstname.lastname@example.org
Abstract—In this paper, we introduce a new method for a mobile robot to detect obstacles using the Kinect. The method uses the Kinect with the robust support of the Point Cloud Library for 3D image processing, so the robot can obtain precise information about its environment for motion planning.
Keywords—mobile robot; kinect; obstacle avoidance; point cloud
I. INTRODUCTION
Given a priori knowledge of the environment and the goal position, mobile robot navigation refers to the robot's ability to safely move towards the goal using its knowledge and the sensorial information of the surrounding environment. In practice, for a mobile robot operating in an unstructured environment, knowledge of the environment is usually absent or partial. Therefore, obstacle detection and avoidance are essential for mobile robot missions. Obstacle avoidance refers to methodologies for shaping the robot's path to overcome unexpected obstacles; the resulting motion depends on the robot's actual location and the sensor readings. Although there are many different ways to avoid obstacles, most of them use ultrasonic sensors; camera sensors, however, have recently attracted interest. The Kinect is not only a normal camera sensor but also a special device that can provide a depth map. The depth map is acquired through the OpenNI library and then processed by the Point Cloud Library to extract accurate information about the environment.
II. KINECT AND DEPTH MAP
The main purpose of using the Kinect device in this project is to reconstruct the 3D scene from depth data, which is necessary information for robot motion planning. The following sections give more details on the Kinect's components and the way it works.
A. The main components of Kinect
The Kinect includes an RGB camera, 3D depth sensors, a multi-array microphone, and a motorized tilt.
RGB camera: the RGB video stream uses 8-bit VGA resolution (640 × 480 pixels) with a Bayer color filter at a frame rate of 30 Hz.
3D depth sensors: depth data is acquired by the combination of an IR projector and an IR camera.
Multi-array microphone: the microphone array features four microphone capsules, with each channel processing 16-bit audio at a sampling rate of 16 kHz.
Motorized tilt: the motorized pivot is capable of tilting the sensor up to 27° either up or down.
Fig. 1: The main components of Kinect
One of the Kinect's most distinctive features is that it provides raw depth data. The next section presents the way the Kinect calculates the distance to objects in front of it.
B. Depth calculation
The Kinect device has two cameras and one laser-based IR projector. The figure below shows their placement on the device. Each lens is associated with a camera or a projector.
Fig. 2: Inside Kinect: RGB camera, IR camera and IR projector
The IR camera and the IR projector form a stereo pair with a baseline of approximately 7.5 cm. The IR projector sends out a fixed pattern of light and dark areas, shown in figure 3. The pattern is generated from a set of diffraction gratings, with special care taken to eliminate the effect of zero-order propagation of the center bright dot. Depth is then calculated by triangulating the pattern received through the IR camera against the known pattern emitted by the IR projector. This technology, called Light Coding, was developed by PrimeSense and is suitable for any indoor environment. Many current range-finding technologies use time-of-flight to determine the distance to an object by measuring the time it takes a flash of light to travel to and reflect back from its surface. Light Coding uses an entirely different approach, in which the light source is constantly turned on, greatly reducing the need for precision timing of the measurements.
Fig. 5: A Kinect-like setup with a single point-projector
Fig. 3: Fixed pattern emitted by IR projector
From knowledge of the emitted light pattern, the lens distortion, and the distance between emitter and receiver, the distance to each dot can be estimated by calculation. This is done internally in the Kinect by the PS1080 SoC chip, and the final depth image is directly available. Figure 4 shows the basic process of producing the scene depth image.
Fig. 4: Acquiring depth data process
To illustrate how surface measurement using a pattern projected by the projector works, see figure 5.
Figure 5 shows a Kinect-like setup in which a projector projects just a single point into space along the green line. The point is captured by the IR camera once it hits a surface. There are four planes in the illustration: a reference plane, a plane closer to the camera than the reference plane, a more distant plane, and the projected image plane of the IR camera. When the point hits a surface in the close plane, it appears further to the right in the image plane than if it were hitting a surface in the reference plane. Likewise, when the point is projected onto an object on a plane more distant than the reference plane, the point appears further to the left. When the origin and direction of the light are known in advance, and the horizontal position of the dot is known for a reference depth, it is possible to find the depth of the surface that the point hits based on its horizontal position in the camera's image plane.
Now, the Kinect does not project just a single point, but a large number of points in an intricate pattern. Internally, it has a reference image of what the pattern looks like from the IR camera's viewpoint when all points fall on a surface at a certain, known distance. It finds correspondences between the pattern captured by its camera and this internal reference pattern. By comparing the horizontal position of a point in the captured image to its corresponding horizontal position in the reference image, a horizontal delta value can be extracted, which in turn can be used to calculate the depth of the pixel, just as described above for the single-point projector. The Kinect itself does not actually calculate the depth, but returns a more abstract value for the host system to handle. While OpenNI abstracts this away from the developer, libfreenect, another driver and library platform, makes these 11-bit values available.
Because there are fewer points in the IR pattern than there are pixels in the depth map, some parts of the depth map are interpolated, meaning that one cannot expect the Kinect to be pixel-precise. It works best for smooth, continuous surfaces. According to Nicolas Burrus, who pioneered publishing information about the Kinect from his own experiments, the depth z of a point can be calculated in meters from its raw disparity d (as provided by the Kinect hardware) using the following equation:
z = 1 / (-0.0030711016 · d + 3.3309495161)
d is an 11-bit integer ranging from 0 to 2047. We carried out tests with the Kinect pointed straight at a wall, using OpenNI, and found that reliable depth values range from 0.5 to 5.0 meters. These facts mean that only d values of about 434 to 1030 represent actual measurable depths. A consequence is that for any object closer than about 0.5 meters to the device, no depth data is returned.
III. OBSTACLE DETECTION
Once the depth map is acquired by OpenNI, it is processed by the Point Cloud Library to extract obstacles with their full information. Figure 6 shows the flow of the process.
B. Point Cloud
Fig. 8: RGB point cloud
A point cloud P is a set of points p_i which contain values representing n-dimensional information about the world (usually n = 3):
P = {p_1, p_2, ..., p_i, ..., p_n}, p_i = (x_i, y_i, z_i)
Besides XYZ data, each point p_i can hold additional information, such as RGB color, intensity values, distances, segmentation results, etc. To increase processing speed, point clouds are downsampled and filtered by the Voxel Grid and Pass Through filters.
C. Voxel Grid and Pass Through
Voxel grid generation is a technique that divides three-dimensional space into small 3D cubes (or 3D rectangular boxes, if different dimensions are used for the x, y, and z axes). The main idea is to count the number of points inside each cube of a given size; if the count is greater than a certain number, all of those points are reduced to a single point, a huge reduction in size. The single point is the average of the contained points along each axis.
Fig. 6: Flow of image data processing
A. Depth Map
Fig. 7: RGB image and depth map
The depth map is provided by OpenNI at different resolutions and frame rates.
Fig. 9: Point cloud after Voxel Grid
TABLE I: RESOLUTION AND FRAME RATE SUPPORTED BY OPENNI LIBRARY

Resolution (pixel × pixel)   Frame rate (fps)
1280 × 1024                  15
640 × 480                    30

Pass Through limits point clouds along the X, Y, or Z dimension. In our project, we pass through point clouds along the Z dimension in a range from 0.5 to 1.4 meters.
The floor is easily detected by the RANSAC algorithm, shown as the blue point cloud in figure 12.
E. Euclidean cluster extraction
Fig. 10: Pass Through
D. Planar segmentation
Plane detection is a prerequisite for a wide variety of vision tasks. The RANdom SAmple Consensus (RANSAC) algorithm (Fischler and Bolles) is widely used for plane detection in point cloud data.
Fig. 12: Point clouds after filtering and segmentation
Euclidean cluster extraction separates the remaining clouds above the floor into clusters, each representing one obstacle. Figure 12 shows two clusters: the red cluster is the obstacle closest to the robot, and the farther one is green.
F. Object clusters
In this step, the clusters are analyzed to determine the obstacles' sizes. This work is supported by effective PCL functions.
Fig. 13: Object cluster
Figure 13 shows a cluster after analysis, with its full dimensions. This information is very important for the robot's motion planning.
Fig. 11: Flow chart of RANSAC algorithm
Figure 11 shows the flow chart of the RANSAC algorithm for finding a plane model from a set of input points. The input values include the number of iterations and a threshold used to decide which points belong to the model. The algorithm randomly chooses three points and models the plane equation as ax + by + cz + d = 0. Next, RANSAC calculates the distance from every other point to the plane; if this distance is smaller than the threshold, the point belongs to the plane model. The algorithm ends when it completes the given number of iterations and returns the plane model containing the most points.
IV. CONCLUSION
Obstacle avoidance using the Kinect overcomes the limitations of other single-camera methods, such as dependence on color and texture, obstacles in the air, etc. Moreover, the processing speed is acceptable for real-time applications, which remains a problem when using two cameras.
     
REFERENCES
[1] http://en.wikipedia.org/wiki/Kinect
[2] http://www.ros.org/wiki/kinect_calibration/technical
[3] Autor Práce, "Head pose estimation and tracking," p. 23, 2011.
[4] Mikkel Viager, "Analysis of Kinect for Mobile Robots," Technical University of Denmark, p. 11.
[5] http://www.primesense.com/en/company-profile
[6] Jacob Kjær, "A Qualitative Analysis of Two Automated Registration Algorithms in a Real World Scenario Using Point Clouds from the Kinect," June 27, 2011.
[7] http://nicolas.burrus.name/index.php/Research/KinectCalibration
[8] http://openni.org/Documentation/
[9] Radu Bogdan Rusu, "Point Cloud (2) processing in ROS," May 2, 2010.
[10] http://pointclouds.org/documentation/tutorials/
[11] http://en.wikipedia.org/wiki/RANSAC
[12] Sunglok Choi, Taemin Kim, Wonpil Yu, "Performance Evaluation of RANSAC Family," 2009.