
3D Model-Based Hand Gesture Recognition and Tracking

PAMI Lab, U. of Waterloo Manglai Zhou Apr. 8, 2010

Topic

1. Introduction
2. Human Hand Modeling
3. Feature Selection and Extraction
4. Model-Based Hand Posture Recognition
5. Hand Motion Tracking
6. Conclusion

1. Introduction


Hand gestures:
- Purpose of human gestures: conversational, controlling, manipulative, and communicative.
- Gestures are a more natural and intuitive means of interaction in computer vision (CV), especially in 3-D applications.
- They can serve as an assistive means of analyzing human intent and identifying potential threats in a multi-modality surveillance system (Project MUSES_SECRET).



Vision-based hand gesture recognition


Challenges:
- Highly articulated, with many joints and a high number of DOFs.
- Highly constrained by static and dynamic constraints, which are hard to model.

Two representations: appearance-based and 3-D model-based.

Two steps:
- Static posture recognition
- Gesture understanding (semantics)




My work concentrates mainly on 3-D model-based hand gesture recognition, which makes use of the kinematic structure of the hand, i.e. the pose of the palm, the angles of the finger joints, etc.

PRO:
- View independent, hence more appropriate for multi-camera vision systems.
- Provides more detailed information for interpreting hand gestures.

CON:
- Sophisticated modeling.
- Requires more intensive processing power.

2. Human Hand Modeling


 

Representations of a hand and the 3-D model
- Human hand motion has 26 DOF.
- Global configuration: six DOF, representing the pose of the hand (position and orientation).
- Local configuration: 20 angular DOF of the fingers. The DIP and PIP joints each have one rotational degree of freedom; the MCP joint has two degrees of freedom.
- Finger motion constraints define the range within which each finger may move.
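The 26-DOF split above (6 global + 5 fingers x 4 local angles) can be captured in a small state structure. This is a minimal sketch only: the names `HandState` and `JOINT_LIMITS`, the joint keys, treating the thumb with the same four-angle layout, and all numeric limits are illustrative assumptions, not values from this work.

```python
from dataclasses import dataclass, field

# Hypothetical joint limits in degrees (illustrative values only).
JOINT_LIMITS = {
    "mcp_flex": (0.0, 90.0),
    "mcp_abd": (-15.0, 15.0),
    "pip_flex": (0.0, 110.0),
    "dip_flex": (0.0, 90.0),
}
FINGERS = ("thumb", "index", "middle", "ring", "little")


def _zero_angles():
    # One entry per (finger, joint): 5 fingers x 4 angles = 20 local DOF.
    return {(f, j): 0.0 for f in FINGERS for j in JOINT_LIMITS}


@dataclass
class HandState:
    # Global configuration: 3 translational + 3 rotational DOF.
    position: tuple = (0.0, 0.0, 0.0)
    orientation: tuple = (0.0, 0.0, 0.0)  # e.g. roll, pitch, yaw in degrees
    # Local configuration: 20 finger joint angles.
    angles: dict = field(default_factory=_zero_angles)

    def dof(self) -> int:
        return len(self.position) + len(self.orientation) + len(self.angles)

    def satisfies_static_limits(self) -> bool:
        # Static constraint check: every angle must lie inside its range.
        for (_, joint), a in self.angles.items():
            lo, hi = JOINT_LIMITS[joint]
            if not lo <= a <= hi:
                return False
        return True
```

`HandState().dof()` returns 26. Dynamic constraints (couplings between joints of the same finger) would be additional predicates over `angles` and are not shown here.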




The kinematic model is augmented with shape information to generate the appearance of a hand as seen in 2-D images. A 3-D model has been built in the OpenGL graphics programming environment:
- The palm is represented by a flat, chamfered rectangular shape.
- Each finger segment is approximated by a sphere-ended cylinder with its own dimensions.
- Each joint is modeled using a rotation matrix with a pre-defined range (constraint).
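A joint "modeled using a rotation matrix with a pre-defined range" can be sketched as a clamp followed by a rotation. This assumes each flexion joint rotates about its local z axis; the axis choice and the helper names `joint_rotation` and `rotate` are hypothetical, and the original OpenGL model is not reproduced here.

```python
import math


def joint_rotation(angle_deg, lo_deg, hi_deg):
    """3x3 rotation about the joint's local z axis (assumed flexion axis).

    The requested angle is first clamped to the joint's allowed range
    [lo_deg, hi_deg], so the model can never leave its constraint."""
    a = math.radians(min(max(angle_deg, lo_deg), hi_deg))
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0.0],
            [s, c, 0.0],
            [0.0, 0.0, 1.0]]


def rotate(m, v):
    """Apply a 3x3 matrix (list of rows) to a 3-vector."""
    return tuple(sum(m[i][k] * v[k] for k in range(3)) for i in range(3))
```

For example, `rotate(joint_rotation(120, 0, 90), (1.0, 0.0, 0.0))` gives approximately (0, 1, 0), because the requested 120 degrees is clamped to the 90-degree limit.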


 

3-D hand model: pose 1: open palm; pose 2: fist



 

3-D hand model: pose 3: pointing; pose 4: victory





All 20 local DOFs are modeled with static and dynamic constraints. The fingers are color-coded only for easy identification; actual models will use skin color. 2-D projections of any posture from any angle can easily be obtained by manipulating the model in 3-D space and performing a perspective projection. For the global configuration, only one DOF is implemented: rotation about the vertical axis.
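Producing a 2-D view "at any angle" amounts to a rotation about the vertical axis followed by a perspective projection. The sketch below assumes the y axis is vertical, a pinhole camera with focal length `f` in pixels, and a model placed `cam_dist` units in front of the camera; the function names and default values are illustrative assumptions.

```python
import math


def rotate_about_vertical(p, theta_deg):
    """Rotate a 3-D point about the y axis (taken here as 'vertical')."""
    t = math.radians(theta_deg)
    x, y, z = p
    return (x * math.cos(t) + z * math.sin(t),
            y,
            -x * math.sin(t) + z * math.cos(t))


def project(p, f, cam_dist):
    """Pinhole perspective projection with focal length f (pixels).

    The model is placed cam_dist units in front of the camera, so every
    point must satisfy z + cam_dist > 0."""
    x, y, z = p
    zc = z + cam_dist
    if zc <= 0:
        raise ValueError("point is behind the camera")
    return (f * x / zc, f * y / zc)


def project_model(points, theta_deg, f=500.0, cam_dist=400.0):
    """2-D image of the model points after a rotation about the vertical axis."""
    return [project(rotate_about_vertical(p, theta_deg), f, cam_dist)
            for p in points]
```

Sweeping `theta_deg` over a range of angles yields the family of views used to build a template set.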


3. Feature Selection and Extraction




The choice of image features and of the extraction method has a significant impact on overall system performance.





High-level features
- Fingertips, fingers, joint locations, etc.
- Intuitive representation and efficient processing, but hard to extract.

Low-level features
- Colors, contours, edges, silhouettes, etc.
- Skin color segmentation; distance metric: Chamfer matching.
- Easier to obtain, but sensitive to finger/palm angles.


 

Hand feature: silhouette images; pose 1: open palm, pose 2: fist



 

Hand feature: silhouette images; pose 3: pointing, pose 4: victory



  

- Skin color segmentation
- Canny edge detector (implemented)
- Hand shape normalization (dimension)
- 3-D features:
  - Stereo cameras obtain 3-D images.
  - Depth information helps with cluttered backgrounds.
  - The acquired surface is matched to the model surface.
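Skin color segmentation is often done by thresholding the chroma channels. A minimal sketch, assuming full-range BT.601 (JPEG/JFIF) RGB-to-YCbCr conversion and a fixed Cb/Cr box (77 to 127, 133 to 173) that is a commonly used heuristic from the skin-detection literature, not the thresholds used in this work; such boxes are sensitive to lighting and usually need tuning.

```python
def rgb_to_cbcr(r, g, b):
    """Full-range BT.601 (JPEG/JFIF) chroma components from 8-bit RGB."""
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return cb, cr


def segment_skin(image, cb_range=(77, 127), cr_range=(133, 173)):
    """Return a boolean mask: True where the pixel's (Cb, Cr) falls inside
    the skin box. image is a list of rows of (r, g, b) tuples."""
    mask = []
    for row in image:
        mask_row = []
        for r, g, b in row:
            cb, cr = rgb_to_cbcr(r, g, b)
            mask_row.append(cb_range[0] <= cb <= cb_range[1]
                            and cr_range[0] <= cr <= cr_range[1])
        mask.append(mask_row)
    return mask
```

The resulting mask is the silhouette from which edges (e.g. via the Canny detector) can then be extracted.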


4. Model-Based Hand Posture Recognition




A hand appears very different from different orientations or viewpoints.
- Database approach: efficient searching and accurate indexing of an image database.
- Template matching: Chamfer distance, where ||x - y|| denotes the Euclidean distance between two pixel locations x and y.
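A commonly used form of the directed chamfer distance from a template edge set X to an image edge set Y is $d(X, Y) = \frac{1}{|X|} \sum_{x \in X} \min_{y \in Y} \|x - y\|$; it is shown here as an assumed standard form, not necessarily the exact variant used in this talk. The brute-force sketch below evaluates it directly; the function names and the symmetric sum are illustrative choices.

```python
import math


def directed_chamfer(X, Y):
    """Mean distance from each point in X to its nearest point in Y."""
    return sum(min(math.dist(x, y) for y in Y) for x in X) / len(X)


def chamfer(X, Y):
    # Symmetric variant: sum of both directed distances.
    return directed_chamfer(X, Y) + directed_chamfer(Y, X)
```

This costs O(|X| |Y|) per comparison, which is why a distance transform of the image is normally computed once and then reused for every template.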




Distance transform (DT)
- Approximates the Euclidean distance in 2-D/3-D.
- Distance mask (3x3), with a = 3 and b = 4:

    b a b
    a 0 a
    b a b

- The DT generates a new image in which each pixel value gives the distance to the nearest edge.
- Efficient algorithms exist to compute it, and it is calculated only once for each frame.
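The 3-4 mask is typically applied in two raster passes (the scheme usually attributed to Borgefors): a forward pass using the upper-left half of the mask and a backward pass using the lower-right half. A minimal sketch, assuming the edge image is a 2-D list of booleans; the function name is hypothetical, and distances come out in units where one horizontal or vertical step costs a = 3, so dividing by 3 gives approximate pixel distances.

```python
INF = float("inf")


def chamfer_dt(edges, a=3, b=4):
    """Two-pass 3-4 chamfer distance transform of a boolean edge map."""
    h, w = len(edges), len(edges[0])
    d = [[0 if edges[i][j] else INF for j in range(w)] for i in range(h)]
    # Forward pass: upper-left half of the mask.
    for i in range(h):
        for j in range(w):
            for di, dj, c in ((-1, -1, b), (-1, 0, a), (-1, 1, b), (0, -1, a)):
                ii, jj = i + di, j + dj
                if 0 <= ii < h and 0 <= jj < w:
                    d[i][j] = min(d[i][j], d[ii][jj] + c)
    # Backward pass: lower-right half of the mask.
    for i in range(h - 1, -1, -1):
        for j in range(w - 1, -1, -1):
            for di, dj, c in ((1, 1, b), (1, 0, a), (1, -1, b), (0, 1, a)):
                ii, jj = i + di, j + dj
                if 0 <= ii < h and 0 <= jj < w:
                    d[i][j] = min(d[i][j], d[ii][jj] + c)
    return d
```

For a single edge pixel in the middle of a 3x3 image, the result is the mask itself: [[4, 3, 4], [3, 0, 3], [4, 3, 4]].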




The edge model of the target is superimposed onto the distance image. The average (or maximum) of the distance values that the edge model hits gives the Chamfer distance.
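Once the DT image exists, scoring a template placement is just a lookup per edge point. A minimal sketch, assuming edge points are (row, col) pairs, a hypothetical `offset` that places the template in the image, DT values in 3-4 chamfer units (divided by `a` to get approximate pixels), and template points falling outside the image simply skipped.

```python
def chamfer_from_dt(dt, edge_pts, offset=(0, 0), a=3, use_max=False):
    """Superimpose template edge points, shifted by offset, onto a DT image
    and return the average (or max) distance value they hit, divided by a."""
    oi, oj = offset
    vals = [dt[i + oi][j + oj] for i, j in edge_pts
            if 0 <= i + oi < len(dt) and 0 <= j + oj < len(dt[0])]
    if not vals:
        return float("inf")  # template entirely outside the image
    score = max(vals) if use_max else sum(vals) / len(vals)
    return score / a
```

Searching over `offset` (and over templates) and keeping the lowest score gives the best match; the maximum is stricter than the average but more sensitive to a single outlying edge point.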





An example of a DT image (for the V pose)





Single-frame pose estimation:
- Estimation from one image or from multiple images taken from different views.
- The hand orientation is determined first.
- A search is then performed over all possible configurations, given the hand orientation and the motion constraints.
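The "orientation first, then configurations" strategy can be sketched as a two-stage exhaustive search. This assumes a caller-supplied matching error `cost(orientation, angles)` (for example a chamfer score between rendered model edges and image edges), discrete angle grids that already respect the motion constraints, and that the first entry of each grid is a neutral posture; all of these, and the name `estimate_pose`, are illustrative assumptions.

```python
import itertools


def estimate_pose(cost, orientations, angle_grids):
    """Two-stage exhaustive search.

    Stage 1 picks the orientation with the lowest cost at a neutral posture
    (first value of every grid); stage 2 searches joint-angle combinations
    only for that orientation."""
    neutral = tuple(grid[0] for grid in angle_grids)
    best_orient = min(orientations, key=lambda o: cost(o, neutral))
    best_angles = min(itertools.product(*angle_grids),
                      key=lambda a: cost(best_orient, a))
    return best_orient, best_angles
```

Fixing the orientation first means the second stage enumerates only the joint-angle product for one orientation instead of for every orientation; that product still grows exponentially with the number of joints, which motivates the indexing and tree-based search discussed next.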





Hand pose classification:
- The classifier is trained on a large number of labeled poses, which can be generated with artificial 3-D hand models.

Image database indexing:
- Indexing speeds up searching large databases of templates.
- It quickly finds the nearest neighbor(s) of a given input.
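The baseline that an index is meant to accelerate is a linear scan for the nearest neighbors of an input feature vector. A minimal sketch, assuming each database entry is a (feature_vector, pose_label) pair compared with Euclidean distance and a majority vote for classification; the names `k_nearest` and `classify` are hypothetical.

```python
import heapq
import math
from collections import Counter


def k_nearest(query, database, k=1):
    """Brute-force k-nearest-neighbour lookup over (features, label) pairs."""
    best = heapq.nsmallest(k, database,
                           key=lambda item: math.dist(query, item[0]))
    return [label for _, label in best]


def classify(query, database, k=3):
    """Label the query with the most common label among its k neighbours."""
    return Counter(k_nearest(query, database, k)).most_common(1)[0][0]
```

The scan costs one distance evaluation per template, so for databases of synthetic views rendered from the 3-D model an index that prunes most candidates becomes essential.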

5. Hand Motion Tracking




Hand gesture: a sequence of hand/finger motions that carries a certain meaning. Two types of hand tracking:
1. Single-hypothesis tracking
2. Multiple-hypothesis tracking (MHT)

The configuration space can be represented as a tree; tree structures speed up processing by enabling fast hierarchical searches.


Figure: model-based tracking. Blocks: Frame 0, Initialization, Prediction, Predicted Pose, Calculation of Model Features, Model Features, Frame k, Feature Extraction, Observed Features, Error Calculation, Search for Match, Best State, Pose Estimation, Updated State.





Bayesian tracking
- Multi-resolution partitioning of the state space.

Particle filtering
- Approximates arbitrary distributions with a set of random samples.
- Deals with clutter and ambiguous situations more effectively by maintaining multiple hypotheses.

Tree-based filtering and searching
- Cluster prototype: a representative of a group of similar shape templates.
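Tree-based searching over cluster prototypes can be sketched as a coarse-to-fine descent: compare the observation with each child's prototype and follow the best one until a leaf template is reached. This greedy single-branch version is a simplification; keeping every branch whose score is below a threshold would give a multiple-hypothesis variant. The `Node` structure and `score` callback are illustrative assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    prototype: object                 # representative template of this cluster
    children: list = field(default_factory=list)
    label: object = None              # pose label stored at leaf templates


def tree_search(root, score):
    """Greedy descent: at each level follow the child whose prototype has the
    lowest score (e.g. chamfer distance to the observed edges)."""
    node = root
    while node.children:
        node = min(node.children, key=lambda c: score(c.prototype))
    return node
```

With a balanced tree of branching factor k over N templates, the descent needs about k log_k N comparisons instead of N.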


   

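Tracking as a Bayesian inference problem: $x_t \in \mathbb{R}^n$ denotes the internal parameters (state) of the object at time $t$, and $z_t \in \mathbb{R}^m$ the measurement obtained at time $t$. State estimation:

Prediction:
$$p(x_t \mid z_{1:t-1}) = \int p(x_t \mid x_{t-1})\, p(x_{t-1} \mid z_{1:t-1})\, dx_{t-1}$$

Update:
$$p(x_t \mid z_{1:t}) = c_t\, p(z_t \mid x_t)\, p(x_t \mid z_{1:t-1})$$

where
$$c_t = \left( \int p(z_t \mid x_t)\, p(x_t \mid z_{1:t-1})\, dx_t \right)^{-1}$$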
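A particle filter approximates the prediction/update recursion above with a set of samples. A minimal sketch for a scalar state, assuming a random-walk motion model $p(x_t \mid x_{t-1}) = N(x_{t-1}, \sigma_m^2)$ and a Gaussian likelihood $p(z_t \mid x_t) = N(x_t, \sigma_z^2)$; the model choices, standard deviations and function names are illustrative assumptions, and a real hand tracker would use the 26-DOF state and an image-based likelihood such as a chamfer score.

```python
import math
import random


def particle_filter_step(particles, z, rng, motion_std=1.0, meas_std=2.0):
    """One predict / update / resample cycle for a scalar state."""
    # Prediction: sample from the motion model (approximates the integral).
    predicted = [x + rng.gauss(0.0, motion_std) for x in particles]
    # Update: weight each particle by the likelihood p(z_t | x_t);
    # normalising the weights plays the role of c_t.
    w = [math.exp(-0.5 * ((z - x) / meas_std) ** 2) for x in predicted]
    total = sum(w)
    if total == 0.0:  # every particle implausible: fall back to uniform
        w, total = [1.0] * len(predicted), float(len(predicted))
    weights = [wi / total for wi in w]
    # Resampling: draw an equally weighted set in proportion to the weights.
    return rng.choices(predicted, weights=weights, k=len(predicted))


def estimate(particles):
    """Posterior mean estimate from an equally weighted particle set."""
    return sum(particles) / len(particles)
```

Starting from particles spread widely and feeding repeated measurements, the set collapses around the measured value; multiple clusters of particles can coexist when the likelihood is ambiguous, which is how the filter carries multiple hypotheses.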





Hierarchical partitioning of the state space





Challenges:
- How can the hand model be adapted to a specific target?
- How can correspondences be established and image data from multiple cameras be combined (fused) in a 3-D framework?
- How well does an algorithm handle occlusions and perform in highly cluttered environments?
- How can the semantic meaning of a hand gesture be interpreted?

6. Conclusion


1. Hand gesture recognition is challenging due to the hand's complex articulation and constraints, high DOF, and heavy self-occlusion.
2. 3-D model-based recognition is suitable for multi-camera vision-based systems.
3. The global configuration of the hand should be determined first to reduce the search space. Particle filtering and tree-based searching help improve tracking robustness and overcome the computational hurdles.

Paper Survey:
A Prototype for 3-D Hand Tracking and Posture Estimation

Overview


- Presents a prototype for 3-D hand tracking and dynamic gesture recognition.
- Objective: track the hand against a general background and recognize dynamic gestures in real time.
- Three phases: simulation, real-world video stream testing, and multiple-camera data fusion.
- Suggests a road map for future development toward the final goal.

Introduction:
 

- Camera-based posture-estimation system.
- A data glove (CyberGlove) is used to calibrate and validate the system.
- Color markers are employed to identify the gesturing hand and the fingertips.


Block Diagram of the Prototype


The Proposed Approach




Three phases:
1. Graphical simulation of the hand tracking problem
2. Tracking with a real video camera, validating the accuracy of the tracking system with the CyberGlove as a reference
3. Extension to multiple cameras


Phase 1: Simulation


  

- Study the feasibility of single-camera vision-based hand tracking.
- 26-DOF 3-D hand model; CyberGlove.
- Square marker: palm position and orientation (global configuration).
- Fingertips: finger posture and joint angles (local configuration).




2-D projections are used to estimate the 3-D hand posture, based on geometric computations and inverse kinematics.

3-D/2-D feature-to-posture transformation:
- How 3-D model data are projected onto the image plane.
- Forward kinematics: 4x4 matrix transformations.
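Forward kinematics with 4x4 matrices chains one rotate-then-translate transform per phalanx. A minimal sketch, assuming planar flexion about each joint's local z axis with the phalanx lying along its local x axis; the helper names, the use of plain nested lists, and the link lengths in the usage example are illustrative assumptions.

```python
import math

IDENTITY = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]


def rot_z(theta):
    """4x4 homogeneous rotation about z by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0, 0.0], [s, c, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]


def trans_x(d):
    """4x4 homogeneous translation by d along x (along the phalanx)."""
    return [[1.0, 0.0, 0.0, d], [0.0, 1.0, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def fingertip(base, angles, lengths):
    """Start from the 4x4 frame `base` at the finger root, apply
    rotate-then-translate for each phalanx, return the fingertip (x, y, z)."""
    T = base
    for theta, length in zip(angles, lengths):
        T = matmul(T, matmul(rot_z(theta), trans_x(length)))
    return (T[0][3], T[1][3], T[2][3])
```

With `IDENTITY` as the base, zero angles and lengths (4, 3, 2) the fingertip lies at (9, 0, 0); bending only the first joint by 90 degrees moves it to about (0, 9, 0). Projecting the resulting 3-D points with the camera model then gives their image positions.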





2-D/3-D feature-to-posture transformation:
- 2-D marker features => hand posture hypothesis.
- A pinhole camera model is used, together with perspective geometry and its relevant constraints.
- Finger posture: the detected finger markers are used to determine the range reachable by the finger along the camera's viewing direction.
- The reachable line segment is then sampled at constant lengths, and a finger posture hypothesis is calculated for each sample by inverse kinematics (IK).
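The "reachable segment along the viewing direction" can be computed by intersecting the fingertip marker's viewing ray with a sphere of radius equal to the finger's reach around the finger base. A minimal sketch, assuming a pinhole camera with focal length `f` and principal point (`cx`, `cy`) in pixels, and a known base point and reach in camera coordinates; the function name and parameters are hypothetical, and the IK step applied to each sample is not shown.

```python
import math


def reachable_samples(u, v, f, cx, cy, base, reach, step):
    """Candidate 3-D fingertip positions (camera frame) on the viewing ray of
    marker pixel (u, v) that lie within distance `reach` of the finger base
    point `base`, sampled every `step` units along the ray."""
    d = ((u - cx) / f, (v - cy) / f, 1.0)  # ray direction; points are z * d
    dd = sum(c * c for c in d)
    db = sum(a * b for a, b in zip(d, base))
    bb = sum(c * c for c in base)
    # |z*d - base|^2 = reach^2  ->  dd*z^2 - 2*db*z + (bb - reach^2) = 0
    disc = db * db - dd * (bb - reach * reach)
    if disc < 0:
        return []  # the ray never comes within reach of the base
    root = math.sqrt(disc)
    z_near = max((db - root) / dd, 0.0)
    z_far = (db + root) / dd
    if z_far <= 0:
        return []  # the reachable part lies behind the camera
    norm = math.sqrt(dd)
    n = int((z_far - z_near) * norm // step) + 1
    return [tuple(c * (z_near + k * step / norm) for c in d) for k in range(n)]
```

For a marker at the principal point, a base 10 units straight ahead and a reach of 2, the samples run from depth 8 to depth 12 in unit steps; each sample would then be handed to the IK solver as a candidate fingertip position.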




- Thumb: binary search of a lookup table of all feasible end-effector positions.
- Other fingers: solved by an error-model analysis technique.
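A binary search over a precomputed table of feasible thumb-tip positions can be sketched with Python's `bisect` module. The slide does not say how the table is keyed, so this sketch assumes each entry is stored with a single scalar key (for example the tip depth along the viewing direction) together with the joint angles that produce it; the names `build_table` and `lookup` are hypothetical.

```python
import bisect


def build_table(samples):
    """samples: iterable of (key, joint_angles), e.g. key = tip depth.
    Returns parallel lists sorted by key, ready for bisect."""
    rows = sorted(samples, key=lambda s: s[0])
    return [k for k, _ in rows], [a for _, a in rows]


def lookup(keys, angles, query):
    """Return the joint angles whose key is closest to query, in O(log n)."""
    i = bisect.bisect_left(keys, query)
    if i == 0:
        return angles[0]
    if i == len(keys):
        return angles[-1]
    return angles[i] if keys[i] - query < query - keys[i - 1] else angles[i - 1]
```

The table is built once offline by sampling the thumb's joint angles within their limits and running forward kinematics, so each online query costs only a logarithmic search.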


Prototype Phase 2: Facing Reality




- Many practical parameters differ from those in the simulation.
- 2-D features are detected in the acquired video frames using color segmentation and the silhouette.
- Palm: two colored markers (one on the front and one on the back).
- Fingertips: five colored ring markers (one for each finger).

Prototype Phase 3: Multiple Cameras


 

Camera sensor fusion

Type 1:
- A posture hypothesis is generated separately and then validated using the observation models.
- Useful when the cameras are mobile.

Type 2:
- The geometrical transformation between the camera coordinate frames is used.
- The best orientation is used by both models.
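Type 2 fusion relies on the rigid transformation between camera frames. A minimal sketch, assuming the extrinsics are known from calibration as a rotation `R` and translation `t` with p2 = R p1 + t; the function names are hypothetical. A posture hypothesis expressed in one camera's frame can then be mapped into the other camera's frame (and projected there) for validation.

```python
def to_other_camera(p, R, t):
    """Map a 3-D point from camera-1 to camera-2 coordinates: R p + t."""
    return tuple(sum(R[i][k] * p[k] for k in range(3)) + t[i] for i in range(3))


def to_first_camera(p, R, t):
    """Inverse mapping: R^T (p - t), valid because R is a rotation."""
    q = [p[i] - t[i] for i in range(3)]
    return tuple(sum(R[k][i] * q[k] for k in range(3)) for i in range(3))
```

Using the transpose instead of a general matrix inverse relies on R being orthonormal, which holds for any proper rotation from calibration.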

Conclusions


The framework presented comprises two steps: posture hypothesis and validation.
- The framework provides reasonable results compared with the CyberGlove.
- Multiple cameras help cover a larger area and improve tracking accuracy.
- It handles intermittent occlusion for a short time.
- Future work: 3-D marker-less hand tracking.

Comments:


 

- A prototype of 3-D model-based hand tracking in a general environment with an unconstrained background.
- Recognizes dynamic gestures in real time.
- A data glove is used to validate the proposed framework.
- Colored markers are used to assist palm and fingertip recognition.

 

- Palm identification for bare hands is lacking.
- Hand silhouette and skin color could be used for hand orientation estimation.
- Marker-less edge/contour detection could be used for the fingertips.
- Elbow, arm, and shoulder information may be used to reduce the dimensionality of 3-D hand model matching.





Questions? Comments? Suggestions?
