HGR Progress Presentation Apr 8
Topic
1. Introduction
2. Human Hand Modeling
3. Feature Selection and Extraction
4. Model-Based Hand Posture Recognition
5. Hand Motion Tracking
6. Conclusion
Refs.
1. Introduction
Hand gestures:
Purpose of human gestures: conversational, controlling, manipulative, and communicative.
More natural and intuitive for computer-vision interfaces, especially in 3-D applications.
An assistive/supporting means of analyzing human intent and identifying potential threats in a multi-modality surveillance system (Project MUSES_SECRET).
The human hand:
Highly articulated, with many joints and a high number of DOFs.
Highly constrained: static and dynamic constraints make it hard to model.
My work concentrates mainly on 3-D model-based hand gesture recognition, which makes use of the kinematic structure of the hand, i.e. the pose of the palm, the angles of the finger joints, etc.
PRO:
View independent; more appropriate for multi-camera vision systems.
Provides more detailed information for interpreting hand gestures.
CON:
Sophisticated modeling.
Requires more intensive processing power.
Representations of a hand and 3-D model
Human hand motion has 26 DOF:
Global configuration: six DOF, representing the pose of the hand (position and orientation).
Local configuration: 20 angular DOF of the fingers. The DIP and PIP joints each have one rotational degree of freedom; the MCP joint has two.
Finger motion constraints define the range within which each finger may move.
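The DOF count above can be checked with a short tally. This is only a sketch: the names `FINGERS`, `JOINT_DOF`, and `GLOBAL_DOF` are hypothetical, and it assumes, as the slide does, that all five fingers share the same DIP/PIP/MCP joint layout.

```python
# Hypothetical tally of the hand DOF described on the slide.
# Assumption: every finger has one DIP, one PIP and one MCP joint.
FINGERS = ["thumb", "index", "middle", "ring", "little"]
JOINT_DOF = {"DIP": 1, "PIP": 1, "MCP": 2}  # rotational DOF per joint

GLOBAL_DOF = 6  # 3 for palm position + 3 for palm orientation


def local_dof():
    """Angular DOF summed over all finger joints."""
    return len(FINGERS) * sum(JOINT_DOF.values())


def total_dof():
    """Global (palm pose) plus local (finger) DOF."""
    return GLOBAL_DOF + local_dof()


print(local_dof(), total_dof())  # 20 26
```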
The kinematic model is augmented with shape information to generate the appearance of a hand as seen in 2-D images.
A 3-D model has been built in the OpenGL graphics programming environment.
The palm is represented by a flat, chamfered rectangle.
Each finger segment is approximated by a sphere-ended cylinder with its own dimensions.
Each joint is modeled using a rotation matrix with a pre-defined range (constraint).
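A joint modeled as a range-limited rotation matrix might look like the sketch below. The choice of the x axis as the flexion axis and the 0 to 90 degree range are illustrative assumptions, not values from the presentation, and the function names are hypothetical.

```python
import math


def clamp(angle_deg, lo_deg, hi_deg):
    """Restrict a requested joint angle to its allowed range (static constraint)."""
    return max(lo_deg, min(hi_deg, angle_deg))


def rot_x(angle_deg):
    """3x3 rotation matrix about the x axis (assumed flexion axis)."""
    t = math.radians(angle_deg)
    c, s = math.cos(t), math.sin(t)
    return [[1.0, 0.0, 0.0],
            [0.0, c, -s],
            [0.0, s, c]]


def joint_rotation(angle_deg, lo_deg, hi_deg):
    """Rotation matrix for a joint whose angle is clamped to [lo_deg, hi_deg]."""
    return rot_x(clamp(angle_deg, lo_deg, hi_deg))


# A request of 120 degrees exceeds the assumed 0..90 range, so 90 is used.
R = joint_rotation(120.0, 0.0, 90.0)
```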
pose2: fist
pose4: victory
All 20 local DOFs are modeled with static and dynamic constraints.
Fingers are color-coded only for easy identification; the actual models will use skin color.
2-D projections of any posture at any angle can be obtained by manipulating the model in 3-D space and performing a perspective projection.
For the global configuration, only one DOF is implemented so far: rotation about the vertical axis.
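Rotating the model about the vertical axis and then projecting it can be sketched as follows. This assumes a simple pinhole camera placed `camera_distance` units in front of the model and looking along +z; the function names and default parameters are hypothetical and do not come from the OpenGL model itself.

```python
import math


def rotate_about_vertical(point, angle_deg):
    """Rotate a 3-D point about the vertical (y) axis."""
    x, y, z = point
    t = math.radians(angle_deg)
    c, s = math.cos(t), math.sin(t)
    return (c * x + s * z, y, -s * x + c * z)


def project(point, focal_length=1.0, camera_distance=5.0):
    """Perspective projection onto the image plane of an assumed pinhole camera."""
    x, y, z = point
    depth = z + camera_distance  # distance from the camera along the optical axis
    if depth <= 0:
        raise ValueError("point is behind the camera")
    return (focal_length * x / depth, focal_length * y / depth)


# Project one model point before and after a 90-degree turn of the hand.
p = (1.0, 2.0, 0.0)
print(project(p))
print(project(rotate_about_vertical(p, 90.0)))
```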
The selection of image features and the method of extraction have a significant impact on overall system performance.
High-level features
Fingertips, fingers, joint locations, etc.
Intuitive representation, efficient processing.
Hard to extract.
Low-level features
Colors, contours, edges, silhouette, etc.
Skin color segmentation.
Distance metric: Chamfer matching.
Easier to obtain; sensitive to finger/palm angles.
Skin color segmentation
Canny edge detector (implemented)
Hand shape normalization (dimension)
3-D features:
Stereo cameras obtain 3-D images.
Depth information helps with cluttered backgrounds.
The acquired surface is matched to the model surface.
A hand appears very different at different orientations or viewpoints.
Database approach: efficient searching and accurate indexing of an image database.
Template matching: Chamfer distance, where ||x - y|| denotes the Euclidean distance between two pixel locations x and y.
Distance-transform (DT)
Approximation of Euclidean distance in 2-D/3-D.
Distance mask (3x3):
b a b
a 0 a    // int a = 3;
b a b    // int b = 4;
The DT generates a new image in which each pixel value gives the distance to the nearest edge.
Efficient algorithms exist to compute it.
Calculated only once for each frame.
The edge model of the target image is superimposed onto the distance image.
The average or maximum of the distance values that the edge model hits gives the Chamfer distance.
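The distance transform and the Chamfer score can be sketched in pure Python as below. This is a minimal two-pass implementation using the mask weights a = 3 and b = 4 from the slide; the function names, the list-of-lists image format, and the division by `A` to express distances in approximate pixel units are assumptions made for illustration.

```python
A, B = 3, 4      # mask weights from the slide: a for axial, b for diagonal steps
INF = 10 ** 9    # stands in for "no edge found yet"


def distance_transform(edges):
    """Two-pass 3-4 chamfer DT; `edges` is a list of rows with 1 at edge pixels."""
    h, w = len(edges), len(edges[0])
    d = [[0 if edges[r][c] else INF for c in range(w)] for r in range(h)]
    # Forward pass: propagate distances from the top-left neighbours.
    for r in range(h):
        for c in range(w):
            if r > 0:
                d[r][c] = min(d[r][c], d[r - 1][c] + A)
                if c > 0:
                    d[r][c] = min(d[r][c], d[r - 1][c - 1] + B)
                if c < w - 1:
                    d[r][c] = min(d[r][c], d[r - 1][c + 1] + B)
            if c > 0:
                d[r][c] = min(d[r][c], d[r][c - 1] + A)
    # Backward pass: propagate distances from the bottom-right neighbours.
    for r in range(h - 1, -1, -1):
        for c in range(w - 1, -1, -1):
            if r < h - 1:
                d[r][c] = min(d[r][c], d[r + 1][c] + A)
                if c > 0:
                    d[r][c] = min(d[r][c], d[r + 1][c - 1] + B)
                if c < w - 1:
                    d[r][c] = min(d[r][c], d[r + 1][c + 1] + B)
            if c < w - 1:
                d[r][c] = min(d[r][c], d[r][c + 1] + A)
    return d


def chamfer_distance(model_points, dt, use_max=False):
    """Average (or maximum) DT value under the model's edge pixels, in pixel units."""
    values = [dt[r][c] / A for r, c in model_points]
    return max(values) if use_max else sum(values) / len(values)


# One edge pixel in the centre of a 3x3 frame; the model hits a corner and the centre.
dt = distance_transform([[0, 0, 0], [0, 1, 0], [0, 0, 0]])
print(dt)  # [[4, 3, 4], [3, 0, 3], [4, 3, 4]]
print(chamfer_distance([(0, 0), (1, 1)], dt))
```

Because the distance image depends only on the frame, it is computed once and reused for every candidate model, which is what makes template matching against many projections affordable.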
Hand gesture: a sequence of hand/finger motions that bears a certain meaning.
Two types of human hand tracking:
1. Single hypothesis tracking
2. Multiple hypotheses tracking (MHT)
The configuration space can be represented as a tree. Tree structures improve processing by employing fast hierarchical searches.
Updated State
Bayesian tracking
Multi-resolution partitioning of the state space.
Particle filtering
Approximates arbitrary distributions with a set of random samples.
Deals with clutter and ambiguous situations more effectively by maintaining multiple hypotheses.
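One predict/weight/resample cycle of a particle filter can be sketched as follows. The 1-D state, the random-walk motion model, the Gaussian measurement likelihood, and all parameter values are assumptions chosen to keep the example small; a hand tracker would use the model's pose parameters as the state and an image-based likelihood instead.

```python
import math
import random


def particle_filter_step(particles, measurement, rng,
                         motion_std=0.5, meas_std=1.0):
    """One cycle of a 1-D particle filter (illustrative models only)."""
    # 1. Predict: move every sample with an assumed random-walk motion model.
    predicted = [x + rng.gauss(0.0, motion_std) for x in particles]
    # 2. Weight: score each sample with an assumed Gaussian likelihood.
    weights = [math.exp(-0.5 * ((measurement - x) / meas_std) ** 2)
               for x in predicted]
    if sum(weights) == 0.0:
        weights = [1.0] * len(predicted)  # fall back to uniform weights
    # 3. Resample: draw a new sample set in proportion to the weights.
    return rng.choices(predicted, weights=weights, k=len(predicted))


rng = random.Random(0)
particles = [rng.uniform(-10.0, 10.0) for _ in range(500)]
for _ in range(10):
    particles = particle_filter_step(particles, 3.0, rng)
estimate = sum(particles) / len(particles)  # sample mean as the state estimate
```

Each surviving sample is one hypothesis about the state, so several separated clusters can coexist when the measurement is ambiguous.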
Tracking as a Bayesian inference problem:
$x_t \in \mathbb{R}^n$: internal parameters of an object at time $t$.
$z_t \in \mathbb{R}^m$: measurement obtained at time $t$.
State estimation:
$$p(x_t \mid z_{1:t-1}) = \int p(x_t \mid x_{t-1})\, p(x_{t-1} \mid z_{1:t-1})\, dx_{t-1}$$
$$p(x_t \mid z_{1:t}) = c_t^{-1}\, p(z_t \mid x_t)\, p(x_t \mid z_{1:t-1})$$
where
$$c_t = \int p(z_t \mid x_t)\, p(x_t \mid z_{1:t-1})\, dx_t$$
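The prediction and update steps can be worked through on a small discrete state space, where the integrals become sums. This is a sketch under assumptions: the three-state example, the transition table, and the likelihood values are made up for illustration, and `transition[j][i]` is taken to mean p(x_t = i | x_{t-1} = j).

```python
def predict(prior, transition):
    """Prediction: sum over x_{t-1} of p(x_t | x_{t-1}) p(x_{t-1} | z_{1:t-1})."""
    n = len(prior)
    return [sum(transition[j][i] * prior[j] for j in range(n))
            for i in range(n)]


def update(predicted, likelihood):
    """Update: p(z_t | x_t) p(x_t | z_{1:t-1}) divided by the normaliser c_t."""
    unnormalised = [l * p for l, p in zip(likelihood, predicted)]
    c_t = sum(unnormalised)  # discrete counterpart of the integral defining c_t
    return [u / c_t for u in unnormalised]


# Made-up example with three discrete states.
prior = [0.5, 0.5, 0.0]
transition = [[0.8, 0.2, 0.0],
              [0.0, 0.8, 0.2],
              [0.2, 0.0, 0.8]]
likelihood = [0.1, 0.7, 0.2]  # p(z_t | x_t = i) for the observed z_t

predicted = predict(prior, transition)
posterior = update(predicted, likelihood)
print(predicted)
print(posterior)
```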
Challenges:
How to adapt the hand model to a specific target?
How to establish correspondences and combine (fuse) image data from multiple cameras in a 3-D framework?
How well does an algorithm handle occlusions and perform in highly cluttered environments?
How to interpret the semantic meaning of a hand gesture?
6. Conclusion
1. Hand gesture recognition is challenging because of the hand's complex articulation and constraints, high DOF, and heavy self-occlusion.
2. 3-D model-based recognition is suitable for multi-camera vision-based systems.
3. The global configuration of the hand should be determined first to reduce the search space.
4. Particle filtering and tree-based searching help improve tracking robustness and overcome the computational hurdles.
References:
[1] Ying Wu and Thomas S. Huang, Hand Modeling, Analysis and Recognition for Vision-Based Human Computer Interaction. IEEE Signal Processing Mag., May 2001, p. 51-60.
[2] A. Erol, et al., Vision-based hand pose estimation: A review. Computer Vision and Image Understanding 108 (2007) 52-73.
[3] M. Potamias and V. Athitsos, Nearest Neighbor Search Methods for Handshape Recognition. PETRA'08, July 15-19, 2008, Athens, Greece.
[4] D. P. Huttenlocher, et al., Comparing Images Using the Hausdorff Distance. IEEE Trans. PAMI 15 (9) (Sept 1993) 850-863.
[5] H.G. Barrow, et al., Parametric Correspondence and Chamfer Matching: Two New Techniques for Image Matching, NASA Technical Report, Vision-7, p. 659-670.
Paper Survey:
A Prototype for 3-D Hand Tracking and Posture Estimation
PAMI Lab, U. of Waterloo Manglai Zhou Apr. 8, 2010
Overview
Presents a prototype for 3-D hand tracking and dynamic gesture recognition.
Objective: track the hand against a general background and recognize dynamic gestures in real time.
Three phases: simulation, real-world video stream test, and multiple-camera data fusion.
Suggests a road map for future development to reach the final goal.
Introduction:
Camera-based posture-estimation system.
A data glove (CyberGlove) is used to calibrate and validate the system.
Color markers are employed to identify the gesturing hand and the fingertips.
Three phases:
1. Graphical simulation of the hand tracking problem
2. Tracking with a real video camera and validating the accuracy of the tracking system using the CyberGlove as a reference
3. Extension to multiple cameras
Phase 1: Simulation
Study the feasibility of single-camera vision-based hand tracking.
26-DOF 3-D hand model
CyberGlove
Square marker: palm position and orientation (global configuration)
Fingertips: finger posture and joint angles (local configurations)
2-D projections are used to estimate the 3-D hand posture, based on geometric computations and inverse kinematics.
3-D/2-D Feature-to-Posture Transformation
How 3-D model data are projected onto the image plane.
Forward kinematics: 4x4 matrix transformation
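Forward kinematics with 4x4 homogeneous matrices can be sketched for a single planar finger. This is an illustration under assumptions: each joint rotates about its local z axis and each segment extends along the local x axis; the function names and segment lengths are hypothetical and not taken from the surveyed paper.

```python
import math


def matmul(a, b):
    """Product of two 4x4 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def rot_z(angle_deg):
    """Homogeneous rotation about the local z axis (assumed flexion axis)."""
    t = math.radians(angle_deg)
    c, s = math.cos(t), math.sin(t)
    return [[c, -s, 0.0, 0.0],
            [s, c, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]


def trans_x(length):
    """Homogeneous translation along the local x axis (one finger segment)."""
    return [[1.0, 0.0, 0.0, length],
            [0.0, 1.0, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]


def fingertip(joint_angles_deg, segment_lengths):
    """Chain rotate-then-translate transforms from the finger base to the tip."""
    T = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
    for angle, length in zip(joint_angles_deg, segment_lengths):
        T = matmul(T, matmul(rot_z(angle), trans_x(length)))
    return (T[0][3], T[1][3], T[2][3])  # translation column = tip position


print(fingertip([0.0, 0.0, 0.0], [4.0, 3.0, 2.0]))  # (9.0, 0.0, 0.0)
```

The resulting 3-D tip position could then be projected onto the image plane; inverse kinematics runs the other way, recovering joint angles from observed tip positions.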
Thumb: binary search of a lookup table of all feasible end-effector positions.
Other fingers: solved by an error model analysis technique.
Many practical parameters differ from the simulation.
2-D features are detected from the acquired video frames using segmented color and silhouette.
Palm: two colored markers (one each on the front and back)
Fingertips: five colored ring markers (one for each finger)
Type 2
A geometrical transformation between camera coordinate frames is used.
The best orientation is used by both models.
Conclusions
The framework provides reasonable results compared to the CyberGlove.
Multiple cameras help cover more area and improve tracking accuracy.
Handles intermittent occlusion for a short time.
Future work: 3-D marker-less hand tracking.
Comments:
A prototype of 3-D model-based hand tracking in a general environment with an unconstrained background.
Recognizes dynamic gestures in real time.
A data glove is used to validate the proposed framework.
Colored markers are used to assist palm and fingertip recognition.
Comments (Cont.):
Lacks palm identification for bare hands.
Hand silhouette and skin color could be used for hand orientation estimation.
Marker-less edge/contour detection could be used for fingertips.
Elbow, arm, and shoulder information may be used to reduce the dimension of the 3-D hand model matching.