
3D Model-Based Hand Gesture

Recognition and Tracking

PAMI Lab, U. of Waterloo


Manglai Zhou
Apr. 8, 2010
Topic

 1. Introduction
 2. Human Hand Modeling
 3. Feature Selection and Extraction
 4. Model-Based Hand Posture Recognition
 5. Hand Motion Tracking
 6. Conclusion
 Refs.

2
1. Introduction

 Hand gestures:
 Purpose of human gestures: conversational,
controlling, manipulative, and communicative.

 More natural and intuitive in CV, esp. in 3-D apps.

 As an assistive/supporting means for analyzing
human intent and identifying potential threats in a
multi-modality surveillance system (Project
MUSES_SECRET).

3
1. Introduction

 Vision-based hand gesture recognition


 Challenges:
 Highly articulated, with many joints and high DOFs
 Highly constrained: static and dynamic constraints, hard
to model
 Two representations: Appearance-based and 3-D
model-based
 Two steps:
 Static posture recognition
 Gesture understanding (semantics)

4
1. Introduction

 My work mainly concentrates on 3D model-based
hand gesture recognition
 Makes use of the kinematic structure of the hand, i.e.
the pose of the palm, the angles of the finger joints, etc.
 PRO:
 View independent, more appropriate for multi-camera vision
systems.
 Provide more detailed info for interpretation of hand gestures.
 CON:
 Sophisticated modeling
 Requires more intensive processing power.

5
2. Human Hand Modeling

 Representations of a hand and 3-D model


 Human hand motion has 26 DOF
 Global configuration: six DOF, representing the pose of the
hand (position and orientation).
 Local configuration: 20 angular DOF of fingers
 DIP and PIP joints each have one degree of freedom for
rotation
 MCP joint has two degrees of freedom

 Finger motion constraints are applied to
define the range within which each finger may
move.
6
2. Human Hand Modeling

7
2. Human Hand Modeling

 Kinematic model is augmented with shape
information to generate appearances of a hand seen
in 2D images
 A 3-D model has been built in the OpenGL graphics
programming environment.
 Palm is represented by a flat, chamfered rectangle.
 Each finger segment is approximated by a
sphere-ended cylinder with its own dimensions.
 Each joint is modeled using a rotation matrix, with a
pre-defined range (constraint).
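
As a rough illustration of the joint model described above, the following Python sketch clamps a requested flexion angle to a pre-defined range and builds the corresponding rotation matrix. The function names, the choice of the x axis as the flexion axis, and the 0 to 90 degree range are assumptions made for illustration, not values taken from the slides.

```python
import math

def clamp_angle(angle_deg, lo_deg, hi_deg):
    """Enforce a static joint constraint by clamping to [lo_deg, hi_deg]."""
    return max(lo_deg, min(hi_deg, angle_deg))

def rotation_x(angle_deg):
    """3x3 rotation matrix about the x axis (assumed flexion axis)."""
    t = math.radians(angle_deg)
    c, s = math.cos(t), math.sin(t)
    return [[1.0, 0.0, 0.0],
            [0.0, c, -s],
            [0.0, s, c]]

def joint_rotation(angle_deg, lo_deg=0.0, hi_deg=90.0):
    """Rotation matrix for a one-DOF joint with a hypothetical range."""
    return rotation_x(clamp_angle(angle_deg, lo_deg, hi_deg))

# A request beyond the range is clamped to the upper limit (90 degrees).
R = joint_rotation(120.0)
```

A two-DOF joint such as the MCP could be sketched the same way by multiplying two such matrices about different axes.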

8
2. Human Hand Modeling

 3-D hand model:


 pose1: open palm, pose2: fist

9
2. Human Hand Modeling

 3-D hand model:


 pose3: pointing, pose4: victory

10
2. Human Hand Modeling

 All 20 local DOFs are modeled with static and
dynamic constraints.
 Different fingers are color-coded just for easy
identification. Actual models will use skin color.
 2-D projections of any posture at any angle can be
easily obtained by manipulating the model in 3-D
space and performing a perspective projection.
 For global configuration, only one DOF is
implemented: rotation about the vertical axis.
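
The projection step mentioned above can be sketched as follows. This is a minimal Python example, assuming a camera placed on the z axis at a hypothetical distance from the model and a unit focal length; the function names and parameters are illustrative and do not come from the OpenGL implementation.

```python
import math

def rotate_about_vertical(p, angle_deg):
    """Rotate 3-D point p = (x, y, z) about the vertical (y) axis."""
    t = math.radians(angle_deg)
    x, y, z = p
    return (x * math.cos(t) + z * math.sin(t),
            y,
            -x * math.sin(t) + z * math.cos(t))

def project(p, focal=1.0, camera_distance=5.0):
    """Perspective projection for a camera at z = -camera_distance looking along +z."""
    x, y, z = p
    depth = z + camera_distance  # distance from the camera along its view axis
    if depth <= 0:
        raise ValueError("point is behind the camera")
    return (focal * x / depth, focal * y / depth)

# Rotate a model point by the single global DOF, then project it to 2-D.
image_point = project(rotate_about_vertical((1.0, 2.0, 0.0), 30.0))
```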

11
3. Feature Selection and Extraction

 Selection of image features and method of
extraction have significant impact on the
overall system performance.

12
3. Feature Selection and Extraction

 High-level features
 Fingertips, fingers, joint locations, etc.
 Intuitive representation, efficient processing.
 Hard to extract
 Low-level features
 Colors, contours, edges, silhouette, etc.
 Skin color segmentation
 Distance metric: Chamfer matching
 Easier to obtain; sensitive to finger/palm angles

13
3. Feature Selection and Extraction

 Hand feature: silhouette images


 pose1: open palm, pose2: fist

14
3. Feature Selection and Extraction

 Hand feature: silhouette images


 pose3: pointing, pose4: victory

15
3. Feature Selection and Extraction

 Skin color segmentation


 Canny edge detector (Implemented)
 Hand shape normalization (dimension)

 3D features:
 Stereo cameras obtain 3D images
 Depth info helps with cluttered backgrounds
 Acquired surface is matched to the model surface

16
4. Model-Based Hand Posture
Recognition
 A hand appears very different at different
orientations or viewpoints
 Database approach: Efficient searching and
accurate indexing of image database
 Template matching: Chamfer distance

 Where ||x – y|| denotes the Euclidean distance
between two pixel locations x and y
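
A commonly used form of the directed chamfer distance is given below. The set names are assumptions: X stands for the template edge pixels and Y for the image edge pixels.

```latex
d_{\mathrm{cham}}(X, Y) = \frac{1}{|X|} \sum_{x \in X} \min_{y \in Y} \lVert x - y \rVert
```

Each template edge pixel contributes its distance to the nearest image edge pixel, and the contributions are averaged over the template.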

17
4. Model-Based Hand Posture
Recognition
 Distance-transform (DT)
 Approximation of Euclidean distance in 2-D/3-D
 Distance mask (3x3), with a = 3 and b = 4:
    b a b
    a 0 a
    b a b
 DT generates a new image, in which pixel value
gives the distance to the nearest edge.
 Efficient algorithms exist to compute it. It is
calculated only once for each frame.
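
One well-known way to compute such a distance image is the two-pass chamfer algorithm, sketched below in Python with the mask weights a = 3 and b = 4 from the slide. The function name and the list-of-lists image format are assumptions for illustration; this is not necessarily the implementation used in the project.

```python
INF = 10**9  # stands in for "no edge found yet"

def chamfer_dt(edges, a=3, b=4):
    """Two-pass 3-4 chamfer distance transform.

    edges: 2-D list of 0/1 values, where 1 marks an edge pixel.
    Returns a 2-D list of integer distances (about 3 units per pixel).
    """
    h, w = len(edges), len(edges[0])
    d = [[0 if edges[r][c] else INF for c in range(w)] for r in range(h)]
    # Forward pass: propagate distances from the left and upper neighbours.
    for r in range(h):
        for c in range(w):
            if c > 0:
                d[r][c] = min(d[r][c], d[r][c - 1] + a)
            if r > 0:
                d[r][c] = min(d[r][c], d[r - 1][c] + a)
                if c > 0:
                    d[r][c] = min(d[r][c], d[r - 1][c - 1] + b)
                if c < w - 1:
                    d[r][c] = min(d[r][c], d[r - 1][c + 1] + b)
    # Backward pass: propagate distances from the right and lower neighbours.
    for r in range(h - 1, -1, -1):
        for c in range(w - 1, -1, -1):
            if c < w - 1:
                d[r][c] = min(d[r][c], d[r][c + 1] + a)
            if r < h - 1:
                d[r][c] = min(d[r][c], d[r + 1][c] + a)
                if c < w - 1:
                    d[r][c] = min(d[r][c], d[r + 1][c + 1] + b)
                if c > 0:
                    d[r][c] = min(d[r][c], d[r + 1][c - 1] + b)
    return d
```

For a 3x3 image with a single edge pixel in the centre, the result reproduces the mask itself: 4 at the corners, 3 at the side neighbours, 0 at the centre.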

18
4. Model-Based Hand Posture
Recognition
 Edge model of the target
image is superimposed
onto the distance image.

 The average (or maximum)
of the distance values that
the edge model hits gives
the Chamfer distance.
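
The superimposition step described above can be sketched in Python as follows. For brevity the distance image is computed by brute force with exact Euclidean distances rather than a chamfer mask; the function names, the (row, column) point format, and the offset parameter are assumptions for illustration.

```python
import math

def distance_image(edge_points, height, width):
    """Brute-force distance image: each pixel holds the distance to the nearest edge."""
    return [[min(math.hypot(r - er, c - ec) for er, ec in edge_points)
             for c in range(width)]
            for r in range(height)]

def chamfer_score(dist_img, template_points, offset=(0, 0), use_max=False):
    """Superimpose template edge points (shifted by offset) on the distance image
    and return the average (or maximum) of the values they hit.
    No bounds checking: the shifted template must stay inside the image."""
    dr, dc = offset
    hits = [dist_img[r + dr][c + dc] for r, c in template_points]
    return max(hits) if use_max else sum(hits) / len(hits)

# A horizontal edge in the image and a matching three-pixel template.
image_edges = [(2, 1), (2, 2), (2, 3)]
template = [(0, 0), (0, 1), (0, 2)]
dist = distance_image(image_edges, height=5, width=5)
```

A score of zero means the template lies exactly on image edges; the score grows as the template is shifted away from them.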

19
4. Model-Based Hand Posture
Recognition
 An example of DT image (for the V pose)

20
4. Model-Based Hand Posture
Recognition
 Single frame pose estimation:
 Pose is estimated from one image or from multiple
images of different views.
 Hand orientation determined first.
 Search over all possible configurations, given the
hand orientation and motion constraints.

21
4. Model-Based Hand Posture
Recognition
 Hand Pose Classification:
 The classifier is trained on a large number of
labeled poses, which can be generated by
artificial 3D hand models.

 Image database indexing:
 Indexing speeds up searching large databases of
templates
 Quickly search for the nearest neighbor(s) of a
given input

22
5. Hand Motion Tracking

 Hand gesture: a sequence of hand/finger
motions that bears a certain meaning.
 Two types of human hand tracking:
 1. Single hypothesis tracking
 2. Multiple hypotheses tracking (MHT)
 The configuration space can be represented
as a tree.
 Tree structures improve processing by
employing fast hierarchical searches.

23
5. Hand Motion Tracking
[Figure: model-based tracking. Frame 0: pose estimation (initialization). Frame k: pose prediction, calculation of model features, and feature extraction; error calculation between model features and observed features; search for match; best state; updated state.]

24
5. Hand Motion Tracking

 Bayesian tracking
 Multi-resolution partitioning of the state space.
 Particle filtering
 Approximate arbitrary distributions with a set of
random samples.
 Deal with clutter and ambiguous situations more
effectively, by multiple hypotheses.
 Tree-based filtering and searching
 Cluster prototype: a group of similar shape
templates.
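
To make the particle filtering idea concrete, here is a minimal Python sketch of one predict/update/resample cycle for a one-dimensional state. The random-walk motion model, the Gaussian likelihood, and all parameter values are assumptions chosen for illustration; a hand tracker would use a high-dimensional pose state and an image-based likelihood.

```python
import math
import random

def particle_filter_step(particles, z, motion_std=0.5, meas_std=1.0, rng=random):
    """One predict/update/resample cycle for a 1-D state.

    particles: list of state hypotheses; z: the new measurement.
    """
    # Predict: diffuse each hypothesis with a random-walk motion model.
    predicted = [x + rng.gauss(0.0, motion_std) for x in particles]
    # Update: weight each hypothesis by a Gaussian likelihood p(z | x).
    weights = [math.exp(-0.5 * ((z - x) / meas_std) ** 2) for x in predicted]
    if sum(weights) == 0.0:
        weights = [1.0] * len(predicted)  # every hypothesis is unlikely: fall back to uniform
    # Resample: draw a new set in proportion to the weights.
    return rng.choices(predicted, weights=weights, k=len(predicted))

# Start from a broad set of hypotheses and feed in repeated measurements near 3.0.
rng = random.Random(0)
particles = [rng.uniform(-10.0, 10.0) for _ in range(500)]
for _ in range(10):
    particles = particle_filter_step(particles, 3.0, rng=rng)
estimate = sum(particles) / len(particles)
```

Because many hypotheses are kept alive at once, a few poor measurements do not immediately destroy the estimate, which is the property that helps with clutter and ambiguity.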
25
5. Hand Motion Tracking

 Tracking: Bayesian inference problem:


 $x_t \in \mathbb{R}^n$ - internal parameters of an object at time $t$
 $z_t \in \mathbb{R}^m$ - measurement obtained.
 state estimation

$$p(x_t \mid z_{1:t-1}) = \int p(x_t \mid x_{t-1})\, p(x_{t-1} \mid z_{1:t-1})\, dx_{t-1}$$

$$p(x_t \mid z_{1:t}) = c_t^{-1}\, p(z_t \mid x_t)\, p(x_t \mid z_{1:t-1})$$

where $c_t = \int p(z_t \mid x_t)\, p(x_t \mid z_{1:t-1})\, dx_t$
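
The prediction and update steps of this recursion can be worked through on a small discrete state space, where the integrals become sums. The Python sketch below is illustrative only; the two-state example, the transition table, and the likelihood values are made-up numbers.

```python
def predict(prior, transition):
    """Prediction: sum over x_{t-1} of p(x_t | x_{t-1}) p(x_{t-1} | z_{1:t-1}).

    transition[j][i] is p(x_t = i | x_{t-1} = j).
    """
    n = len(prior)
    return [sum(transition[j][i] * prior[j] for j in range(n)) for i in range(n)]

def update(predicted, likelihood):
    """Update: multiply by p(z_t | x_t) and divide by the normalizer c_t."""
    unnormalized = [l * p for l, p in zip(likelihood, predicted)]
    c_t = sum(unnormalized)
    return [u / c_t for u in unnormalized]

prior = [0.5, 0.5]
transition = [[0.9, 0.1],
              [0.2, 0.8]]
likelihood = [0.2, 0.6]  # p(z_t | x_t) for the observed z_t

predicted = predict(prior, transition)     # [0.55, 0.45]
posterior = update(predicted, likelihood)  # about [0.289, 0.711]
```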

26
5. Hand Motion Tracking
 Hierarchical partitioning of the state space

27
5. Hand Motion Tracking

 Challenges:
 How to adapt the hand model to a specific target?
 How to establish correspondences and combine
(fuse) image data from multiple cameras in a 3-D
framework?
 How well does an algorithm handle occlusions and
perform in highly cluttered environments?
 How to interpret the semantic meaning of a hand
gesture?

28
6. Conclusion

 1. Hand gesture recognition is challenging,
due to the hand's complex articulation and constraints,
high DOF, and heavy self-occlusion.
 2. 3-D model-based recognition is suitable for
multi-camera vision-based systems.
 3. Global config of hand should be
determined first to reduce the search space.
 4. Particle filtering and tree-based searching
help improve tracking robustness and
overcome the computational hurdles.
29
References:

 [1] Ying Wu and Thomas S. Huang, Hand Modeling, Analysis and
Recognition for Vision-Based Human Computer Interaction. IEEE Signal
Processing Mag., May 2001, p. 51-60
 [2] A. Erol, et al., Vision-Based Hand Pose Estimation: A Review.
Computer Vision and Image Understanding 108 (2007) 52–73
 [3] M. Potamias and V. Athitsos, Nearest Neighbor Search Methods for
Handshape Recognition. PETRA’08, July 15-19, 2008, Athens, Greece
 [4] D. P. Huttenlocher, et al., Comparing Images Using the Hausdorff
Distance. IEEE Trans. PAMI 15 (9) (Sept 1993) 850–863
 [5] H.G. Barrow, et al., Parametric Correspondence and Chamfer
Matching: Two New Techniques for Image Matching, NASA Technical
Report, Vision-7, p. 659-670.
 [6]

30
Paper Survey:

A Prototype for 3-D Hand Tracking and


Posture Estimation

PAMI Lab, U. of Waterloo


Manglai Zhou
Apr. 8, 2010
31
Overview
 Present a prototype for 3-D hand tracking and
dynamic gesture recognition.
 Objective: track the hand against a general
background and recognize dynamic gestures
in real time.
 Three phases: simulation, real-world video
stream test, and multiple-camera data fusion
 Suggest a road map for future development to
reach the final goal.
32
Introduction:
 Camera-based posture-estimation system.
 Data glove is used to calibrate and validate
the system. (CyberGlove)
 Color Markers are employed to identify the
gesturing hand and the fingertips

33
Block Diagram of the Prototype

34
The Proposed Approach
 Three phases:
 1. Graphical simulation of the hand tracking
problem
 2. Tracking with a real video camera and validating
the accuracy of the tracking system using the
CyberGlove as a reference
 3. Extension to multiple cameras

35
Phase 1: Simulation
 Study the feasibility of single-camera
vision-based hand tracking
 26-DOF 3-D hand model
 CyberGlove
 Square marker: palm position and orientation
(global configuration)
 Fingertips: finger posture and joint angles
(local configurations)

36
Phase 1: Simulation (Cont.)
 2-D projections are used to estimate the 3-D
hand posture.
 Based on geometric computations and inverse
kinematics
 3-D/2-D Posture-to-Feature Transformation
 How 3-D model data are projected onto the image
plane.
 Forward kinematics: 4x4 matrix transformation
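
As a sketch of forward kinematics with 4x4 matrices, the Python example below chains a rotation and a translation per finger segment to obtain the fingertip position. The planar three-joint finger, the choice of the z axis for rotation, and the segment lengths are assumptions for illustration and are not taken from the surveyed paper.

```python
import math

def matmul(a, b):
    """Product of two 4x4 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def rot_z(angle_deg):
    """Homogeneous rotation about the z axis."""
    t = math.radians(angle_deg)
    c, s = math.cos(t), math.sin(t)
    return [[c, -s, 0.0, 0.0], [s, c, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]

def trans_x(length):
    """Homogeneous translation along the local x axis (one finger segment)."""
    return [[1.0, 0.0, 0.0, length], [0.0, 1.0, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]

def fingertip(angles_deg, lengths):
    """Chain rotate-then-translate transforms from the finger base to the tip."""
    t = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
    for angle, length in zip(angles_deg, lengths):
        t = matmul(matmul(t, rot_z(angle)), trans_x(length))
    return (t[0][3], t[1][3], t[2][3])

tip = fingertip([90.0, -90.0, 0.0], [1.0, 1.0, 1.0])
```

The translation column of the final matrix is the fingertip position in the base frame; projecting that point gives the 2-D feature.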

37
Phase 1: Simulation (Cont.)
 2-D/3-D Feature-to-Posture Transformation
 2-D marker features => hand posture hypothesis
 Pinhole camera model utilized
 Perspective geometry and its relevant constraints
 Finger posture: use detected finger markers to
determine the range reachable by the finger along the
camera view direction
 The reachable line segment is then sampled at
constant intervals to calculate finger posture
hypotheses by IK.
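
The sampling step can be sketched as follows, assuming a pinhole camera at the origin looking along +z. The Python example back-projects a detected 2-D marker to a viewing ray and samples candidate 3-D fingertip positions at constant spacing; the function names and the near/far limits are hypothetical, and the IK solve that would follow for each candidate is omitted.

```python
def backproject_ray(u, v, focal):
    """Unit direction of the viewing ray through image point (u, v)."""
    norm = (u * u + v * v + focal * focal) ** 0.5
    return (u / norm, v / norm, focal / norm)

def sample_ray(direction, near, far, step):
    """Candidate 3-D positions at constant spacing along the ray between near and far."""
    count = int((far - near) / step + 1e-9) + 1
    return [tuple((near + i * step) * d for d in direction) for i in range(count)]

# Marker detected at the image centre; finger assumed reachable 10 to 12 units away.
ray = backproject_ray(0.0, 0.0, 2.0)
candidates = sample_ray(ray, near=10.0, far=12.0, step=0.5)
```

Each candidate position would then be passed to an inverse-kinematics routine, and the resulting posture hypotheses compared against the observation.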

38
Phase 1: Simulation (Cont.)

39
Phase 1: Simulation (Cont.)
 Thumb: binary search of a lookup table of all
feasible end-effector positions
 Other fingers: solved by an error-model analysis
technique

40
Prototype Phase 2 – Facing the
Reality
 Many practical parameters differ
from the simulation
 Detection of 2-D features from acquired video
frames, by utilizing segmented color and
silhouette.
 Palm: Two colored markers (one each on the front and
back)
 Fingertips: Five colored ring markers (one for
each finger)
41
Prototype Phase 3 – Multiple cameras
 Camera sensor fusion
 Type 1:
 posture hypothesis is generated separately, and
then validated using the observation models
 Useful when cameras are mobile
 Type 2:
 Geometrical transformation between camera
coordinate frames is used
 Best orientation is used by both models
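
The geometrical transformation between camera coordinate frames is typically a rigid transform. A minimal Python sketch, assuming a known rotation matrix R and translation vector t between two cameras (the example values are made up):

```python
def to_camera_b(p_a, rotation, translation):
    """Map point p_a from camera A's frame to camera B's frame: p_b = R p_a + t."""
    return tuple(sum(rotation[i][k] * p_a[k] for k in range(3)) + translation[i]
                 for i in range(3))

# Hypothetical extrinsics: camera B is rotated 90 degrees about the vertical (y)
# axis relative to camera A and shifted by one unit along x.
R = [[0.0, 0.0, 1.0],
     [0.0, 1.0, 0.0],
     [-1.0, 0.0, 0.0]]
t = (1.0, 0.0, 0.0)

p_b = to_camera_b((1.0, 2.0, 3.0), R, t)
```

With such a transform, a posture hypothesis estimated in one camera's frame can be expressed in the other's frame and checked against that camera's observations.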

42
Conclusions
 The presented framework includes two steps:
 Posture hypothesis and validation
 The framework provides reasonable results
compared with the CyberGlove
 Multiple cameras help cover more area and
improve tracking accuracy
 Handles intermittent occlusion for a short time
 Future work: 3-D marker-less hand tracking

43
Comments:
 A prototype of 3-D model-based hand tracking
in a general environment with unconstrained
background.
 Recognize dynamic gestures in real-time.
 Dataglove is used to validate the proposed
framework.
 Colored markers are used to assist palm and
fingertip recognition.

44
Comments (Cont.):
 Lack of palm identification of bare hands
 Hand silhouette and skin color for hand
orientation estimation
 Marker-less edge/contour detection for
fingertips
 Elbow, arm and shoulder info may be used to
reduce the dimension of matching of 3-D hand
model

45
3D Model-Based Hand Gesture
Recognition and Tracking
 Questions......

 Comments......

 Suggestions......

46
