HGR Progress Presentation Apr 8

3D Model-Based Hand Gesture Recognition and Tracking
PAMI Lab, U. of Waterloo Manglai Zhou Apr. 8, 2010
Topic

1. Introduction
2. Human Hand Modeling
3. Feature Selection and Extraction
4. Model-Based Hand Posture Recognition
5. Hand Motion Tracking
6. Conclusion
Refs.
1. Introduction

Hand gestures:
Purpose of human gestures: conversational, controlling, manipulative, and communicative.
More natural and intuitive in CV, esp. in 3-D apps.
As an assistive/supporting means for analyzing human intent and identifying potential threats in a multi-modality surveillance system (Project MUSES_SECRET).

Vision-based hand gesture recognition

Challenges:

Highly articulated, with many joints and high DOFs
Highly constrained: static and dynamic constraints, hard to model

Two representations: appearance-based and 3-D model-based
Two steps:

Static posture recognition
Gesture understanding (semantics)

My work mainly concentrates on 3-D model-based hand gesture recognition.
It makes use of the kinematic structure of the hand, i.e. the pose of the palm, the angles of the finger joints, etc.
PRO:
View-independent, more appropriate for multi-camera vision systems.
Provides more detailed info for interpretation of hand gestures.
CON:
Sophisticated modeling.
Requires more intensive processing power.
2. Human Hand Modeling

Representations of a hand and 3-D model
Human hand motion has 26 DOF:
Global configuration: six DOF, representing the pose of the hand (position and orientation).
Local configuration: 20 angular DOF of the fingers.
The DIP and PIP joints each have one degree of freedom for rotation.
The MCP joint has two degrees of freedom.
Finger motion constraints are applied to define the ranges each finger may move within.
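The 26-DOF split and the range constraints can be sketched as a small data structure. This is only an illustration: the joint names, the per-joint limits in `LIMITS`, and the `HandPose` class are hypothetical and are not taken from the presented model.

```python
from dataclasses import dataclass, field

FINGERS = ("thumb", "index", "middle", "ring", "little")
# Per finger: the MCP joint has two DOF (flexion, abduction); PIP and DIP have one each.
JOINTS = ("mcp_flex", "mcp_abd", "pip_flex", "dip_flex")

# Hypothetical static limits in degrees; real ranges depend on the hand being modeled.
LIMITS = {
    "mcp_flex": (0.0, 90.0),
    "mcp_abd": (-15.0, 15.0),
    "pip_flex": (0.0, 110.0),
    "dip_flex": (0.0, 80.0),
}


@dataclass
class HandPose:
    # Global configuration: 3 position + 3 orientation values (6 DOF).
    position: list = field(default_factory=lambda: [0.0, 0.0, 0.0])
    orientation: list = field(default_factory=lambda: [0.0, 0.0, 0.0])
    # Local configuration: 5 fingers x 4 joint angles (20 DOF).
    angles: dict = field(
        default_factory=lambda: {f: {j: 0.0 for j in JOINTS} for f in FINGERS}
    )

    def dof(self):
        """Count global plus local degrees of freedom."""
        local = sum(len(joints) for joints in self.angles.values())
        return len(self.position) + len(self.orientation) + local

    def set_angle(self, finger, joint, degrees):
        """Store a joint angle, clamped to its allowed range."""
        lo, hi = LIMITS[joint]
        self.angles[finger][joint] = min(max(degrees, lo), hi)
        return self.angles[finger][joint]
```

Clamping on assignment keeps every stored pose inside the allowed ranges, so a later search over configurations never has to re-check the static constraints.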

The kinematic model is augmented with shape information to generate appearances of a hand as seen in 2-D images.
A 3-D model has been built in the OpenGL graphics programming environment.
The palm is represented by a flat, chamfered rectangular box.
Each finger segment is approximated by a sphere-ended cylinder with its own dimensions.
Each joint is modeled using a rotation matrix, with a pre-defined range (constraint).


Figure: 3-D hand model. Pose 1: open palm; pose 2: fist.

Figure: 3-D hand model. Pose 3: pointing; pose 4: victory.

All 20 local DOFs are modeled with static and dynamic constraints.
Different fingers are color-coded just for easy identification. Actual models will use skin color.
2-D projections of any posture at any angle can be easily obtained by manipulating the model in 3-D space and performing a perspective projection.
For the global configuration, only one DOF is implemented: rotation about the vertical axis.
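As a rough illustration of obtaining a 2-D projection after a rotation about the vertical axis, the sketch below rotates a model point and applies a perspective projection. The camera placement (on the z axis, looking at the origin), the default focal length, and both function names are assumptions made for this example; the OpenGL model itself is not reproduced here.

```python
import math


def rotate_about_vertical(point, angle_deg):
    """Rotate a 3-D point about the vertical (y) axis."""
    x, y, z = point
    a = math.radians(angle_deg)
    return (x * math.cos(a) + z * math.sin(a),
            y,
            -x * math.sin(a) + z * math.cos(a))


def project(point, focal_length=1.0, camera_distance=5.0):
    """Perspective-project a point onto the image plane.

    Assumed setup: the camera sits at z = camera_distance and looks
    toward the origin, so the depth of a point is camera_distance - z.
    """
    x, y, z = point
    depth = camera_distance - z
    if depth <= 0:
        raise ValueError("point is behind the camera")
    return (focal_length * x / depth, focal_length * y / depth)


# Example: a fingertip at (1, 2, 0) viewed after a 30 degree turn of the hand.
u, v = project(rotate_about_vertical((1.0, 2.0, 0.0), 30.0))
```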
3. Feature Selection and Extraction

The selection of image features and the method of extraction have a significant impact on overall system performance.

High-level features
Fingertips, fingers, joint locations, etc.
Intuitive representation, efficient processing.
Hard to extract.
Low-level features
Colors, contours, edges, silhouette, etc.
Skin color segmentation
Distance metric: Chamfer matching
Easier to obtain; sensitive to finger/palm angles

Figure: hand feature (silhouette images). Pose 1: open palm; pose 2: fist.

Figure: hand feature (silhouette images). Pose 3: pointing; pose 4: victory.

Skin color segmentation
Canny edge detector (implemented)
Hand shape normalization (dimension)
3-D features:
Stereo cameras obtain 3-D images.
Depth info helps with cluttered backgrounds.
The acquired surface is matched to the model surface.
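A minimal sketch of skin color segmentation is given below. It uses a widely quoted explicit RGB rule for skin under uniform daylight illumination; that rule and its thresholds are an assumption of this example, not necessarily the segmentation method used in this work, and they would need tuning for other lighting.

```python
def is_skin(r, g, b):
    """Explicit RGB skin rule (thresholds are an assumed heuristic)."""
    return (r > 95 and g > 40 and b > 20
            and max(r, g, b) - min(r, g, b) > 15
            and abs(r - g) > 15
            and r > g and r > b)


def skin_mask(image):
    """Turn rows of (r, g, b) tuples into rows of 0/1 skin labels."""
    return [[1 if is_skin(*pixel) else 0 for pixel in row] for row in image]


# Example: one skin-like pixel next to a pure blue pixel.
mask = skin_mask([[(200, 140, 120), (0, 0, 255)]])
```

The resulting binary mask is the kind of input from which a silhouette or an edge map can then be computed.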
4. Model-Based Hand Posture Recognition

A hand appears very different at different orientations or viewpoints.
Database approach: efficient searching and accurate indexing of an image database.
Template matching: Chamfer distance,
where ||x - y|| denotes the Euclidean distance between two pixel locations x and y.
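The chamfer distance named above can be written in several ways. One common form, assumed here rather than taken from the slide, is the directed average over template edge pixels, with T the template edge set and E the image edge set (both symbols are introduced only for this sketch):

```latex
d_{\mathrm{cham}}(T, E) = \frac{1}{|T|} \sum_{x \in T} \min_{y \in E} \lVert x - y \rVert
```

Replacing the average by a maximum over x gives the directed Hausdorff distance instead.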

Distance transform (DT)
Approximation of Euclidean distance in 2-D/3-D.
Distance mask (3x3), with a = 3 and b = 4:
    b a b
    a 0 a
    b a b
The DT generates a new image, in which each pixel value gives the distance to the nearest edge.
Efficient algorithms exist to compute it.
Calculated only once for each frame.
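The 3-4 mask can be applied with the classic two-pass chamfer algorithm. The following is a minimal pure-Python sketch under that assumption; the function name `chamfer_dt` and the list-of-lists image format are choices made for this example.

```python
INF = 10**9  # stands in for "no edge found yet"


def chamfer_dt(edges, a=3, b=4):
    """Two-pass 3-4 chamfer distance transform.

    edges: rows of 0/1 values, where 1 marks an edge pixel.
    Returns integer distances; dividing by `a` approximates the
    Euclidean distance in pixels.
    """
    h, w = len(edges), len(edges[0])
    d = [[0 if edges[y][x] else INF for x in range(w)] for y in range(h)]

    # Forward pass: propagate from neighbours above and to the left.
    forward = ((-1, -1, b), (-1, 0, a), (-1, 1, b), (0, -1, a))
    for y in range(h):
        for x in range(w):
            for dy, dx, cost in forward:
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w:
                    d[y][x] = min(d[y][x], d[ny][nx] + cost)

    # Backward pass: propagate from neighbours below and to the right.
    backward = ((1, 1, b), (1, 0, a), (1, -1, b), (0, 1, a))
    for y in range(h - 1, -1, -1):
        for x in range(w - 1, -1, -1):
            for dy, dx, cost in backward:
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w:
                    d[y][x] = min(d[y][x], d[ny][nx] + cost)
    return d


# A single edge pixel in the centre reproduces the mask itself.
dt = chamfer_dt([[0, 0, 0], [0, 1, 0], [0, 0, 0]])
```

Each pixel is visited twice regardless of how many edge pixels there are, which is why the transform is cheap enough to compute once per frame.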

The edge model of the target image is superimposed onto the distance image.
The average (or maximum) of the distance values that the edge model hits gives the Chamfer distance.
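Superimposing an edge model on a distance image can be sketched as a lookup and a reduction. In this illustration the distance image is given as a small literal, and `chamfer_score`, its `offset` parameter, and the `mode` switch are hypothetical names introduced for the example.

```python
def chamfer_score(dist_image, template_points, offset=(0, 0), mode="avg"):
    """Score a template against a distance image.

    dist_image: rows of distance values (for example from a distance transform).
    template_points: (row, col) edge pixels of the model.
    offset: (row, col) shift applied to the template before the lookup.
    mode: "avg" for the average hit value, "max" for the largest one.
    """
    oy, ox = offset
    h, w = len(dist_image), len(dist_image[0])
    hits = []
    for y, x in template_points:
        yy, xx = y + oy, x + ox
        if 0 <= yy < h and 0 <= xx < w:
            hits.append(dist_image[yy][xx])
    if not hits:
        return float("inf")  # the template fell entirely outside the image
    return max(hits) if mode == "max" else sum(hits) / len(hits)


dist_image = [[4, 3, 4],
              [3, 0, 3],
              [4, 3, 4]]
template = [(1, 1), (1, 2)]
average = chamfer_score(dist_image, template)            # 1.5
largest = chamfer_score(dist_image, template, mode="max")  # 3
```

A lower score means the model edges lie closer to observed edges; evaluating several offsets or model poses and keeping the minimum gives the best match.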

Figure: an example of a DT image (for the V pose)

Single frame pose estimation:

The estimation is made from one image or from multiple images of different views.
The hand orientation is determined first.
Search over all possible configurations, given the hand orientation and motion constraints.

Hand Pose Classification:

The classifier is trained on a large number of labeled poses, which can be generated from artificial 3-D hand models.
Image database indexing:

Indexing speeds up searching in large databases of templates.
Quickly search for the nearest neighbor(s) of a given input.
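As a baseline for what indexing is meant to speed up, the sketch below classifies a query by a brute-force nearest-neighbor scan over labeled templates. The two-number feature vectors and the function name `classify_pose` are invented for illustration; a real system would use image-derived features and an index structure instead of a linear scan.

```python
import math


def classify_pose(query, labeled_templates):
    """Return (label, distance) of the template nearest to the query."""
    best_label, best_dist = None, math.inf
    for label, features in labeled_templates:
        dist = math.dist(query, features)  # Euclidean distance
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label, best_dist


# Hypothetical features: (number of extended fingers, silhouette fill ratio).
templates = [
    ("open palm", (5.0, 1.0)),
    ("fist", (0.0, 0.4)),
    ("pointing", (1.0, 0.7)),
    ("victory", (2.0, 0.8)),
]
label, dist = classify_pose((2.0, 0.75), templates)
```

The cost of this scan grows linearly with the number of templates, which is the motivation for indexing a large template database.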
5. Hand Motion Tracking

Hand gesture: a sequence of hand/finger motions that bears a certain meaning.
Two types of human hand tracking:
1. Single hypothesis tracking
2. Multiple hypotheses tracking (MHT)
The configuration space can be represented as a tree.
Tree structures improve processing by enabling fast hierarchical searches.

Figure: model-based tracking flowchart. Blocks: Frame 0, Initialization, Prediction, Predicted Pose, Calculation of Model Features, Model Features, Frame k, Feature Extraction, Observed Features, Error Calculation, Search for Match, Best State, Pose Estimation, Updated State.

Bayesian tracking
Multi-resolution partitioning of the state space.
Particle filtering
Approximates arbitrary distributions with a set of random samples.
Deals with clutter and ambiguous situations more effectively, by using multiple hypotheses.
Tree-based filtering and searching

Cluster prototype: a group of similar shape templates.
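The idea of searching through cluster prototypes can be sketched as a two-level lookup: compare the query with the prototypes first, then only with the templates of the winning cluster. The feature vectors, the prototypes, and both function names below are hypothetical, and the two-level search is approximate (it can miss a closer template that was assigned to another cluster).

```python
import math


def build_clusters(templates, prototypes):
    """Assign each (label, features) template to its nearest prototype."""
    clusters = {i: [] for i in range(len(prototypes))}
    for label, features in templates:
        nearest = min(range(len(prototypes)),
                      key=lambda k: math.dist(features, prototypes[k]))
        clusters[nearest].append((label, features))
    return clusters


def hierarchical_search(query, prototypes, clusters):
    """Pick the nearest prototype, then the best template in its cluster."""
    nearest = min(range(len(prototypes)),
                  key=lambda k: math.dist(query, prototypes[k]))
    members = clusters[nearest]
    if not members:
        return None
    return min(members, key=lambda t: math.dist(query, t[1]))[0]


prototypes = [(0.0, 0.0), (10.0, 10.0)]
templates = [
    ("fist", (0.0, 1.0)),
    ("open palm", (1.0, 0.0)),
    ("pointing", (9.0, 10.0)),
    ("victory", (10.0, 11.0)),
]
clusters = build_clusters(templates, prototypes)
match = hierarchical_search((9.5, 10.2), prototypes, clusters)
```

With more levels, the same scheme becomes a tree in which whole subtrees are skipped once their prototype is a poor match.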

Tracking as a Bayesian inference problem:
$x_t \in \mathbb{R}^n$: internal parameters of an object at time $t$
$z_t \in \mathbb{R}^m$: measurement obtained
State estimation:
$$p(x_t \mid z_{1:t-1}) = \int p(x_t \mid x_{t-1})\, p(x_{t-1} \mid z_{1:t-1})\, dx_{t-1}$$
$$p(x_t \mid z_{1:t}) = c_t^{-1}\, p(z_t \mid x_t)\, p(x_t \mid z_{1:t-1})$$
where
$$c_t = \int p(z_t \mid x_t)\, p(x_t \mid z_{1:t-1})\, dx_t$$
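A particle filter approximates the prediction and update steps of this recursion with random samples. The sketch below is a toy one-dimensional version under loudly stated assumptions: a Gaussian random-walk motion model, a Gaussian measurement likelihood, and multinomial resampling. None of these choices, nor the function names, come from the presented system.

```python
import math
import random


def particle_filter_step(particles, measurement, rng,
                         motion_std=0.5, meas_std=1.0):
    """One predict/update/resample cycle for a scalar state."""
    # Prediction: sample from p(x_t | x_{t-1}), assumed to be a random walk.
    predicted = [p + rng.gauss(0.0, motion_std) for p in particles]

    # Update: weight each sample by p(z_t | x_t), assumed Gaussian.
    weights = [math.exp(-0.5 * ((measurement - p) / meas_std) ** 2)
               for p in predicted]
    total = sum(weights)  # sample-based counterpart of the normalizer
    if total == 0.0:
        # All weights underflowed: fall back to uniform weights.
        weights = [1.0] * len(predicted)
        total = float(len(predicted))
    weights = [w / total for w in weights]

    # Resample: draw a new, equally weighted particle set.
    return rng.choices(predicted, weights=weights, k=len(predicted))


def estimate(particles):
    """Posterior mean approximated by the particle average."""
    return sum(particles) / len(particles)


rng = random.Random(0)
particles = [rng.uniform(-10.0, 10.0) for _ in range(500)]
for _ in range(20):
    particles = particle_filter_step(particles, 2.0, rng)
state = estimate(particles)  # settles near the repeated measurement 2.0
```

Because the sample set can hold several separated groups of particles at once, it can represent multiple competing hypotheses, which a single-estimate tracker cannot.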

Figure: hierarchical partitioning of the state space

Challenges:
How to adapt the hand model to a specific target?
How to establish correspondences and combine (fuse) image data from multiple cameras in a 3-D framework?
How well does an algorithm handle occlusions and perform in highly cluttered environments?
How to interpret the semantic meaning of a hand gesture?
6. Conclusion

1. Hand gesture recognition is challenging, due to the hand's complex articulation and constraints, high DOF, and heavy self-occlusion.
2. 3-D model-based recognition is suitable for multi-camera vision-based systems.
3. The global configuration of the hand should be determined first to reduce the search space.
4. Particle filtering and tree-based searching help improve tracking robustness and overcome the computational hurdles.
References:

[1] Ying Wu and Thomas S. Huang, Hand Modeling, Analysis and Recognition for Vision-Based Human Computer Interaction. IEEE Signal Processing Magazine, May 2001, pp. 51-60.
[2] A. Erol, et al., Vision-Based Hand Pose Estimation: A Review. Computer Vision and Image Understanding 108 (2007) 52-73.
[3] M. Potamias and V. Athitsos, Nearest Neighbor Search Methods for Handshape Recognition. PETRA'08, July 15-19, 2008, Athens, Greece.
[4] D. P. Huttenlocher, et al., Comparing Images Using the Hausdorff Distance. IEEE Trans. PAMI 15 (9) (Sept. 1993) 850-863.
[5] H. G. Barrow, et al., Parametric Correspondence and Chamfer Matching: Two New Techniques for Image Matching. NASA Technical Report, Vision-7, pp. 659-670.
Paper Survey:
A Prototype for 3-D Hand Tracking and Posture Estimation
PAMI Lab, U. of Waterloo Manglai Zhou Apr. 8, 2010
Overview

Presents a prototype for 3-D hand tracking and dynamic gesture recognition.
Objective: track the hand against a general background and recognize dynamic gestures in real time.
Three phases: simulation, real-world video stream test, and multiple-camera data fusion.
Suggests a road map for future development to reach the final goal.
Introduction:

Camera-based posture-estimation system.
A data glove (CyberGlove) is used to calibrate and validate the system.
Color markers are employed to identify the gesturing hand and the fingertips.
Block Diagram of the Prototype
The Proposed Approach

Three phases:
1. Graphical simulation of the hand tracking problem
2. Tracking with a real video camera and validating the accuracy of the tracking system using the CyberGlove as a reference
3. Extension to multiple cameras
Phase 1: Simulation

Study the feasibility of single-camera vision-based hand tracking
26-DOF 3-D hand model
CyberGlove
Square marker: palm position and orientation (global configuration)
Fingertips: finger posture and joint angles (local configurations)
Phase 1: Simulation (Cont.)

2-D projections are used to estimate the 3-D hand posture.
Based on geometric computations and inverse kinematics.
3-D/2-D Feature-to-Posture Transformation
How 3-D model data are projected onto the image plane.
Forward kinematics: 4x4 matrix transformation
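Forward kinematics with 4x4 homogeneous matrices, followed by projection onto the image plane, can be sketched as below. The planar finger (every joint flexing about the local z axis), the link lengths, and all function names are simplifying assumptions for illustration and do not reproduce the surveyed paper's hand model.

```python
import math


def rot_z(deg):
    """4x4 homogeneous rotation about the z axis."""
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, -s, 0.0, 0.0], [s, c, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]


def trans(x, y, z):
    """4x4 homogeneous translation."""
    return [[1.0, 0.0, 0.0, x], [0.0, 1.0, 0.0, y],
            [0.0, 0.0, 1.0, z], [0.0, 0.0, 0.0, 1.0]]


def matmul(a, b):
    """Product of two 4x4 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def fingertip(base, angles_deg, lengths):
    """Chain one rotation and one link translation per joint."""
    t = trans(*base)
    for angle, length in zip(angles_deg, lengths):
        t = matmul(matmul(t, rot_z(angle)), trans(length, 0.0, 0.0))
    return (t[0][3], t[1][3], t[2][3])


def pinhole(point, focal_length=1.0):
    """Project a camera-frame point (z > 0) onto the image plane."""
    x, y, z = point
    return (focal_length * x / z, focal_length * y / z)


# Straight finger of total length 6, rooted 5 units in front of the camera.
tip = fingertip((0.0, 0.0, 5.0), [0.0, 0.0, 0.0], [3.0, 2.0, 1.0])
image_point = pinhole(tip)
```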

2-D/3-D Feature-to-Posture Transformation

2-D marker features => hand posture hypothesis
Pinhole camera model utilized
Perspective geometry and its relevant constraints
Finger posture: the detected finger markers are used to determine the range reachable by the finger along the camera view direction.
The reachable linear segment is then sampled at constant lengths to calculate a finger posture hypothesis by IK.
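One way to picture the sampling step is to back-project the detected marker pixel into a viewing ray and keep the part of the ray the finger could reach. In the sketch below the reachable region is approximated by a sphere around the finger base, which is an assumption of this example and not the paper's reachability model; the IK solve for each sample is not shown, and the function name is hypothetical.

```python
import math


def reachable_samples(pixel, focal_length, finger_base, reach, step):
    """Sample 3-D candidate fingertip positions along a viewing ray.

    pixel: (u, v) image coordinates of the marker, pinhole camera at the origin.
    finger_base: 3-D position of the finger root in the camera frame.
    reach: assumed maximum distance from the base to the fingertip.
    step: constant spacing between samples along the ray.
    """
    u, v = pixel
    norm = math.sqrt(u * u + v * v + focal_length ** 2)
    d = (u / norm, v / norm, focal_length / norm)  # unit ray direction

    # Ray/sphere intersection: solve |t*d - c|^2 = reach^2 for t.
    dc = sum(di * ci for di, ci in zip(d, finger_base))
    cc = sum(ci * ci for ci in finger_base)
    disc = dc * dc - (cc - reach * reach)
    if disc < 0:
        return []  # the ray misses the reachable region
    t_near = max(dc - math.sqrt(disc), 0.0)
    t_far = dc + math.sqrt(disc)
    if t_far < t_near:
        return []  # the reachable region lies behind the camera

    count = int((t_far - t_near) / step)
    return [tuple((t_near + i * step) * di for di in d)
            for i in range(count + 1)]


# Marker at the image centre, finger base 10 units away, reach 2, step 1.
candidates = reachable_samples((0.0, 0.0), 1.0, (0.0, 0.0, 10.0), 2.0, 1.0)
```

Each sampled point would then be handed to an inverse-kinematics routine to obtain one finger posture hypothesis.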

Thumb: binary search of a lookup table of all feasible end-effector positions
Other fingers: solved by an error model analysis technique
Prototype Phase 2: Facing the Reality

Many practical parameters differ from the simulation.
Detection of 2-D features from acquired video frames, by utilizing segmented color and silhouette.
Palm: two colored markers (one each on the front and back)
Fingertips: five colored ring markers (one for each finger)
Prototype Phase 3: Multiple Cameras

Camera sensor fusion
Type 1:
A posture hypothesis is generated separately, and then validated using the observation models.
Useful when cameras are mobile.
Type 2:
The geometrical transformation between camera coordinate frames is used.
The best orientation is used by both models.
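The geometrical transformation between camera coordinate frames can be illustrated with rigid transforms through a shared world frame. In this sketch each camera pose is a (rotation, translation) pair mapping camera coordinates to world coordinates, and rotations are restricted to the vertical axis for brevity; these conventions and all function names are assumptions made for the example.

```python
import math


def rot_y(deg):
    """3x3 rotation about the vertical (y) axis."""
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


def apply(r, p):
    """Multiply a 3x3 matrix by a 3-vector."""
    return tuple(sum(r[i][k] * p[k] for k in range(3)) for i in range(3))


def transpose(r):
    return [[r[j][i] for j in range(3)] for i in range(3)]


def camera_to_world(rotation, translation, p_cam):
    """p_world = R * p_cam + t."""
    rotated = apply(rotation, p_cam)
    return tuple(a + b for a, b in zip(rotated, translation))


def world_to_camera(rotation, translation, p_world):
    """Inverse rigid transform: p_cam = R^T * (p_world - t)."""
    shifted = tuple(a - b for a, b in zip(p_world, translation))
    return apply(transpose(rotation), shifted)


def camera_a_to_b(pose_a, pose_b, p_a):
    """Express a point seen in camera A in the frame of camera B."""
    return world_to_camera(*pose_b, camera_to_world(*pose_a, p_a))


pose_a = (rot_y(0.0), (0.0, 0.0, 0.0))
pose_b = (rot_y(90.0), (0.0, 0.0, 0.0))
point_in_b = camera_a_to_b(pose_a, pose_b, (2.0, 0.0, 5.0))
```

Once both cameras express their observations in a common frame, an orientation estimated from one view can be reused when validating the other.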
Conclusions

The presented framework includes two steps:

Posture hypothesis and validation

The framework provides reasonable results compared to the CyberGlove.
Multiple cameras help cover more area and improve tracking accuracy.
Handles intermittent occlusion for a short time.
Future work: 3-D marker-less hand tracking
Comments:

A prototype of 3-D model-based hand tracking in a general environment with unconstrained background.
Recognizes dynamic gestures in real time.
A data glove is used to validate the proposed framework.
Colored markers are used to assist palm and fingertip recognition.
Comments (Cont.):

Lack of palm identification for bare hands
Hand silhouette and skin color for hand orientation estimation
Marker-less edge/contour detection for fingertips
Elbow, arm and shoulder info may be used to reduce the dimension of matching of the 3-D hand model
3D Model-Based Hand Gesture Recognition and Tracking

Questions...... Comments...... Suggestions......