HGR Progress Presentation Apr 8

3D Model-Based Hand Gesture
Recognition and Tracking
PAMI Lab, U. of Waterloo

Manglai Zhou
Apr. 8, 2010
Topic
 1. Introduction
 2. Human Hand Modeling
 3. Feature Selection and Extraction
 4. Model-Based Hand Posture Recognition
 5. Hand Motion Tracking
 6. Conclusion
 Refs.
2
1. Introduction
 Hand gestures:
 Purposes of human gestures: conversational,
controlling, manipulative, and communicative.
 More natural and intuitive as a computer-vision
interface, especially in 3-D applications.
 An assistive means for analyzing human intent and
identifying potential threats in a multi-modality
surveillance system (Project MUSES_SECRET).
3
1. Introduction
 Vision-based hand gesture recognition
 Challenges:
 The hand is highly articulated, with many joints and many
degrees of freedom (DOF).
 Its motion is highly constrained: static and dynamic
constraints are hard to model.
 Two representations: appearance-based and 3-D
model-based
 Two steps:
 Static posture recognition
 Gesture understanding (semantics)
4
1. Introduction
 My work concentrates on 3-D model-based hand
gesture recognition.
 Makes use of the kinematic structure of the hand, i.e.
the pose of the palm, the finger joint angles, etc.
 PRO:
 View-independent, so better suited to multi-camera vision
systems.
 Provides more detailed information for interpreting hand gestures.
 CON:
 Requires sophisticated modeling.
 Requires more processing power.
5
2. Human Hand Modeling
 Representations of a hand and 3-D model
 Human hand motion has 26 DOF:
 Global configuration: six DOF, representing the pose of the
hand (position and orientation).
 Local configuration: 20 angular DOF of the fingers.
 The DIP and PIP joints each have one DOF for
rotation.
 The MCP joint has two DOF.
 Finger motion constraints define the range
within which each finger may move.
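A minimal sketch of how the 26-DOF configuration above could be stored and constrained. The names (`make_state`, `clamp_local`, `JOINT_LIMITS_DEG`) and all numeric joint ranges are illustrative assumptions, not values from the presentation. The thumb is assumed to contribute four angular DOF like the other fingers, so that the local total is 20.

```python
import numpy as np

FINGERS = ("thumb", "index", "middle", "ring", "little")
# Per-finger angular DOF: MCP flexion, MCP abduction, PIP flexion, DIP flexion.
JOINTS = ("mcp_flex", "mcp_abd", "pip_flex", "dip_flex")

# Assumed static limits in degrees (placeholders, not measured values).
JOINT_LIMITS_DEG = {
    "mcp_flex": (0.0, 90.0),
    "mcp_abd": (-15.0, 15.0),
    "pip_flex": (0.0, 110.0),
    "dip_flex": (0.0, 90.0),
}

def make_state():
    """Return a zeroed 26-vector: 6 global DOF followed by 20 local DOF."""
    return np.zeros(6 + len(FINGERS) * len(JOINTS))

def clamp_local(state):
    """Clip the 20 finger angles (degrees) to their static ranges."""
    out = state.copy()
    lo = np.array([JOINT_LIMITS_DEG[j][0] for j in JOINTS] * len(FINGERS))
    hi = np.array([JOINT_LIMITS_DEG[j][1] for j in JOINTS] * len(FINGERS))
    out[6:] = np.clip(out[6:], lo, hi)
    return out

state = make_state()
state[6:] = 200.0          # deliberately out of range
state = clamp_local(state)
```

Only static range limits are shown; dynamic constraints between joints would need an additional rule that this sketch does not model.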
6
7
 The kinematic model is augmented with shape
information to generate the appearance of a hand as seen
in 2-D images.
 A 3-D model has been built in the OpenGL graphics
programming environment.
 The palm is represented by a flat, chamfered rectangular block.
 Each finger segment is approximated by a
sphere-ended cylinder with its own dimensions.
 Each joint is modeled using a rotation matrix with a
pre-defined range (constraint).
8
 3-D hand model:

 pose1: open palm, pose2: fist
9
 3-D hand model:

 pose3: pointing, pose4: victory
10
 All 20 local DOF are modeled with static and
dynamic constraints.
 Fingers are color-coded only for easy
identification. Actual models will use skin color.
 2-D projections of any posture at any angle can be
obtained by manipulating the model in 3-D
space and performing a perspective projection.
 For the global configuration, only one DOF is
implemented: rotation about the vertical axis.
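The projection step described above can be sketched as follows. This assumes a simple pinhole camera with the y axis as the vertical axis; the function names and the focal length are hypothetical, and the actual model is rendered through OpenGL rather than with this code.

```python
import numpy as np

def rotate_y(points, angle_rad):
    """Rotate Nx3 points about the vertical (y) axis through the origin."""
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    R = np.array([[c, 0.0, s],
                  [0.0, 1.0, 0.0],
                  [-s, 0.0, c]])
    return points @ R.T

def project(points, focal_length=1.0):
    """Pinhole perspective projection of Nx3 camera-frame points (z > 0)."""
    z = points[:, 2:3]
    return focal_length * points[:, :2] / z

# Two model points 5 units in front of the camera.
pts = np.array([[1.0, 2.0, 5.0],
                [0.0, 0.0, 5.0]])
uv = project(pts, focal_length=2.0)
unrotated = rotate_y(pts, 0.0)
```

In practice the rotation would be applied in the hand's own frame before the model is translated in front of the camera.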
11
3. Feature Selection and Extraction
 The choice of image features and extraction
method has a significant impact on
overall system performance.
12
 High-level features
 Fingertips, fingers, joint locations, etc.
 Intuitive representation, efficient processing.
 Hard to extract.
 Low-level features
 Colors, contours, edges, silhouette, etc.
 Skin color segmentation
 Distance metric: chamfer matching
 Easier to obtain, but sensitive to finger/palm angles.
13
 Hand feature: silhouette images

 pose1: open palm, pose2: fist
14
 Hand feature: silhouette images

 pose3: pointing, pose4: victory
15
 Skin color segmentation
 Canny edge detector (implemented)
 Hand shape normalization (dimension)
 3-D features:
 Stereo cameras obtain 3-D images.
 Depth information helps with cluttered backgrounds.
 The acquired surface is matched to the model surface.
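As an illustration of skin color segmentation, here is a minimal sketch that thresholds chrominance in YCbCr space. The color space and the threshold box are assumptions chosen for the example; the presentation does not say which segmentation rule is used, and real thresholds need tuning per camera and lighting.

```python
import numpy as np

def skin_mask(rgb):
    """Return a boolean mask of likely skin pixels for an HxWx3 uint8 RGB image."""
    rgb = rgb.astype(np.float64)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    # ITU-R BT.601 full-range RGB -> CbCr conversion.
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    # Assumed chrominance box; not taken from the presentation.
    return (cb >= 77) & (cb <= 127) & (cr >= 133) & (cr <= 173)

img = np.zeros((1, 2, 3), dtype=np.uint8)
img[0, 0] = (200, 140, 120)   # a skin-like tone
img[0, 1] = (0, 0, 255)       # saturated blue
mask = skin_mask(img)
```

The resulting mask gives the hand silhouette, from which edges can then be extracted.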
16
4. Model-Based Hand Posture Recognition
 A hand appears very different at different
orientations or viewpoints.
 Database approach: efficient searching and
accurate indexing of an image database
 Template matching: chamfer distance
 Here ||x – y|| denotes the Euclidean distance
between two pixel locations x and y.
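For reference, one common form of the (directed) chamfer distance between a template edge set $X$ and an image edge set $Y$, consistent with the $\lVert x - y \rVert$ notation above, is given below. This is an assumed form; the slide's exact expression is not reproduced in the text and may differ, for example by using a maximum instead of an average.

```latex
d_{\mathrm{cham}}(X, Y) = \frac{1}{|X|} \sum_{x \in X} \min_{y \in Y} \lVert x - y \rVert
```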
17
Recognition
 Distance transform (DT)
 Approximation of Euclidean distance in 2-D/3-D
 Distance mask (3x3), with a = 3 and b = 4:
b a b
a 0 a
b a b
 The DT generates a new image in which each pixel value
gives the distance to the nearest edge.
 Efficient algorithms exist to compute it. It is calculated only
once for each frame.
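A minimal sketch of a two-pass chamfer distance transform using the 3-4 mask above. It is written with plain loops for clarity rather than speed, and the function name `chamfer_dt` is hypothetical; the presentation does not show its own implementation.

```python
import numpy as np

def chamfer_dt(edges, a=3, b=4):
    """Two-pass 3-4 chamfer distance transform.

    edges: 2-D boolean array, True at edge pixels.
    Returns integer distances in units where one axial step costs `a`.
    """
    h, w = edges.shape
    inf = a * (h + w) + b   # larger than any distance inside the image
    dt = np.where(edges, 0, inf).astype(np.int64)
    # Forward pass: top-left to bottom-right, upper half of the mask.
    for y in range(h):
        for x in range(w):
            if x > 0:
                dt[y, x] = min(dt[y, x], dt[y, x - 1] + a)
            if y > 0:
                dt[y, x] = min(dt[y, x], dt[y - 1, x] + a)
                if x > 0:
                    dt[y, x] = min(dt[y, x], dt[y - 1, x - 1] + b)
                if x < w - 1:
                    dt[y, x] = min(dt[y, x], dt[y - 1, x + 1] + b)
    # Backward pass: bottom-right to top-left, lower half of the mask.
    for y in range(h - 1, -1, -1):
        for x in range(w - 1, -1, -1):
            if x < w - 1:
                dt[y, x] = min(dt[y, x], dt[y, x + 1] + a)
            if y < h - 1:
                dt[y, x] = min(dt[y, x], dt[y + 1, x] + a)
                if x < w - 1:
                    dt[y, x] = min(dt[y, x], dt[y + 1, x + 1] + b)
                if x > 0:
                    dt[y, x] = min(dt[y, x], dt[y + 1, x - 1] + b)
    return dt

edges = np.zeros((3, 3), dtype=bool)
edges[1, 1] = True
dt = chamfer_dt(edges)
```

For a single edge pixel in the center of a 3x3 image, the result reproduces the mask itself: 4 at the corners, 3 at the axial neighbors, and 0 at the edge pixel. Dividing by `a` gives an approximate distance in pixels.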
18
Recognition
 The edge model of the target image is superimposed
onto the distance image.
 The average (or maximum) of the distance values under
the edge-model pixels gives the chamfer distance.
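The superimposition step can be sketched as a lookup into a precomputed distance image. The function name `chamfer_score`, the `offset` parameter, and the toy arrays are assumptions for illustration; the sketch does no bounds checking on the template placement.

```python
import numpy as np

def chamfer_score(dt, template_edges, offset=(0, 0), reduce="mean"):
    """Score an edge template against a precomputed distance image.

    dt: 2-D distance transform of the input frame's edge map.
    template_edges: 2-D boolean array, True at the model's edge pixels.
    offset: (row, col) placement of the template's top-left corner.
    reduce: "mean" for the average distance, "max" for the maximum.
    """
    rows, cols = np.nonzero(template_edges)
    values = dt[rows + offset[0], cols + offset[1]]
    return float(values.max() if reduce == "max" else values.mean())

dt = np.array([[4, 3, 4],
               [3, 0, 3],
               [4, 3, 4]])
template = np.array([[True, False],
                     [False, True]])
avg_at_origin = chamfer_score(dt, template)                # pixels (0,0) and (1,1)
max_shifted = chamfer_score(dt, template, (1, 1), "max")   # pixels (1,1) and (2,2)
```

Because the distance image is computed once per frame, each template is scored with a handful of array lookups.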
19
Recognition
 An example of DT image (for the V pose)
20
Recognition
 Single-frame pose estimation:
 Pose is estimated from one image, or from multiple images
of different views.
 Hand orientation is determined first.
 All possible configurations are then searched, given the
hand orientation and motion constraints.
21
Recognition
 Hand pose classification:
 The classifier is trained on a large number of
labeled poses, which can be generated from
artificial 3-D hand models.
 Image database indexing:
 Indexing speeds up searches over large databases of
templates.
 Quickly finds the nearest neighbor(s) of a
given input.
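As a baseline for the nearest-neighbor search mentioned above, here is a minimal brute-force sketch over a toy template database. The 2-D feature vectors and the function name `nearest_templates` are made up for illustration; an indexing structure would replace the linear scan for a large database.

```python
import numpy as np

def nearest_templates(query, templates, labels, k=1):
    """Return the labels of the k templates closest to `query` (Euclidean)."""
    dists = np.linalg.norm(templates - query, axis=1)
    order = np.argsort(dists)[:k]
    return [labels[i] for i in order]

# Toy database: one 2-D feature vector per synthetic pose.
templates = np.array([[0.0, 0.0],
                      [1.0, 0.0],
                      [0.0, 1.0],
                      [1.0, 1.0]])
labels = ["open palm", "fist", "pointing", "victory"]
best = nearest_templates(np.array([0.9, 0.2]), templates, labels, k=2)
```

In a real system the Euclidean norm would be replaced by the chamfer distance between the input edge image and each template.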
22
5. Hand Motion Tracking
 Hand gesture: a sequence of hand/finger
motions that bears a certain meaning.
 Two types of human hand tracking:
 1. Single-hypothesis tracking
 2. Multiple-hypothesis tracking (MHT)
 The configuration space can be represented
as a tree.
 Tree structures speed up processing by
enabling fast hierarchical searches.
23
[Figure: Model-based tracking (block diagram). Blocks: Frame 0, Pose Estimation,
Initialization, Prediction, Predicted Pose, Frame k, Calculation of Model Features,
Feature Extraction, Model Features, Observed Features, Error Calculation,
Search for Match, Best State, Updated State]
24
 Bayesian tracking
 Multi-resolution partitioning of the state space.
 Particle filtering
 Approximates arbitrary distributions with a set of
random samples.
 Deals with clutter and ambiguous situations more
effectively by maintaining multiple hypotheses.
 Tree-based filtering and searching
 Cluster prototype: a group of similar shape
templates.
25
 Tracking as a Bayesian inference problem:
 $x_t \in \mathbb{R}^n$: internal parameters of an object at time $t$
 $z_t \in \mathbb{R}^m$: measurement obtained at time $t$
 State estimation:
$$p(x_t \mid z_{1:t-1}) = \int p(x_t \mid x_{t-1})\, p(x_{t-1} \mid z_{1:t-1})\, dx_{t-1}$$
$$p(x_t \mid z_{1:t}) = c_t^{-1}\, p(z_t \mid x_t)\, p(x_t \mid z_{1:t-1})$$
where
$$c_t = \int p(z_t \mid x_t)\, p(x_t \mid z_{1:t-1})\, dx_t$$
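The prediction and update recursion above can be approximated with random samples, as in particle filtering. The sketch below is a minimal bootstrap particle filter for a 1-D state. The random-walk motion model, the Gaussian likelihood, and all noise parameters are assumptions for illustration, not the models used in this work.

```python
import numpy as np

def particle_filter_step(particles, z, rng, motion_std=0.1, meas_std=0.5):
    """One predict/update/resample cycle of a bootstrap particle filter (1-D state)."""
    # Predict: propagate each hypothesis through an assumed random-walk motion model.
    particles = particles + rng.normal(0.0, motion_std, size=particles.shape)
    # Update: weight each hypothesis by an assumed Gaussian likelihood p(z | x).
    weights = np.exp(-0.5 * ((z - particles) / meas_std) ** 2)
    weights /= weights.sum()   # normalization plays the role of 1 / c_t
    estimate = float(np.sum(weights * particles))
    # Resample: draw a new, equally weighted set in proportion to the weights.
    idx = rng.choice(len(particles), size=len(particles), p=weights)
    return particles[idx], estimate

rng = np.random.default_rng(0)
particles = rng.uniform(-5.0, 5.0, size=500)
for _ in range(10):
    particles, estimate = particle_filter_step(particles, z=2.0, rng=rng)
```

After a few steps with a constant measurement, the particle set concentrates around the measured value. For hand tracking the state would be the pose vector and the likelihood would come from an image-matching score.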
26
 Hierarchical partitioning of the state space
27
 Challenges:
 How can the hand model be adapted to a specific target?
 How can correspondences be established and image data
from multiple cameras combined (fused) in a 3-D
framework?
 How well does an algorithm handle occlusions and
perform in highly cluttered environments?
 How can the semantic meaning of a hand
gesture be interpreted?
28
6. Conclusion
 1. Hand gesture recognition is challenging
because of the hand's complex articulation and constraints,
high DOF, and heavy self-occlusion.
 2. 3-D model-based recognition is well suited to
multi-camera vision-based systems.
 3. The global configuration of the hand should be
determined first to reduce the search space.
 4. Particle filtering and tree-based searching
help improve tracking robustness and
overcome the computational hurdles.
29
References:
 [1] Ying Wu and Thomas S. Huang, Hand Modeling, Analysis and
Recognition for Vision-Based Human Computer Interaction. IEEE Signal
Processing Magazine, May 2001, pp. 51–60
 [2] A. Erol et al., Vision-Based Hand Pose Estimation: A Review.
Computer Vision and Image Understanding 108 (2007) 52–73
 [3] M. Potamias and V. Athitsos, Nearest Neighbor Search Methods for
Handshape Recognition. PETRA'08, July 15–19, 2008, Athens, Greece
 [4] D. P. Huttenlocher et al., Comparing Images Using the Hausdorff
Distance. IEEE Trans. PAMI 15 (9) (Sept. 1993) 850–863
 [5] H. G. Barrow et al., Parametric Correspondence and Chamfer
Matching: Two New Techniques for Image Matching. NASA Technical
Report, Vision-7, pp. 659–670
30
Paper Survey:
A Prototype for 3-D Hand Tracking and
Posture Estimation
PAMI Lab, U. of Waterloo

Manglai Zhou
Apr. 8, 2010
31
Overview
 Presents a prototype for 3-D hand tracking and
dynamic gesture recognition.
 Objective: track the hand against a general
background and recognize
dynamic gestures in real time.
 Three phases: simulation, real-world video
stream test, and multiple-camera data fusion.
 Suggests a road map for future development toward
the final goal.
32
Introduction:
 Camera-based posture-estimation system.
 A data glove (CyberGlove) is used to calibrate and validate
the system.
 Color markers are used to identify the
gesturing hand and the fingertips.
33
Block Diagram of the Prototype
34
The Proposed Approach
 Three phases:
 1. Graphical simulation of the hand tracking
problem
 2. Tracking with a real video camera and validating
the accuracy of the tracking system using the
CyberGlove as a reference
 3. Extension to multiple cameras
35
Phase 1: Simulation
 Study the feasibility of single-camera vision-based
hand tracking
 26-DOF 3-D hand model
 CyberGlove
 Square marker: palm position and orientation
(global configuration)
 Fingertips: finger posture and joint angles
(local configurations)
36
Phase 1: Simulation (Cont.)
 2-D projections are used to estimate the 3-D
hand posture.
 Based on geometric computations and inverse
kinematics
 3-D/2-D Feature-to-Posture Transformation
 Describes how 3-D model data are projected onto the image
plane.
 Forward kinematics: 4x4 matrix transformations
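Forward kinematics with 4x4 homogeneous transforms can be sketched for a single finger as below. The planar joint layout (flexion about z, segments along x), the function names, and the segment lengths are simplifying assumptions for illustration and do not reproduce the surveyed prototype's hand model.

```python
import numpy as np

def rot_z(theta):
    """4x4 homogeneous rotation about the joint's z axis (flexion)."""
    c, s = np.cos(theta), np.sin(theta)
    T = np.eye(4)
    T[:2, :2] = [[c, -s], [s, c]]
    return T

def trans_x(length):
    """4x4 homogeneous translation along the segment's x axis."""
    T = np.eye(4)
    T[0, 3] = length
    return T

def fingertip(joint_angles, segment_lengths):
    """Chain one rotation and one translation per joint; return the fingertip position."""
    T = np.eye(4)
    for theta, length in zip(joint_angles, segment_lengths):
        T = T @ rot_z(theta) @ trans_x(length)
    return (T @ np.array([0.0, 0.0, 0.0, 1.0]))[:3]

# Assumed segment lengths (arbitrary units) for a three-segment finger.
lengths = [4.0, 3.0, 2.0]
straight = fingertip([0.0, 0.0, 0.0], lengths)
bent = fingertip([np.pi / 2, 0.0, 0.0], lengths)
```

With all joints at zero the fingertip lies 9 units along x; bending only the first joint by 90 degrees swings the whole finger onto the y axis. The fingertip position can then be projected onto the image plane with the camera model.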
37
 2-D/3-D Feature-to-Posture Transformation
 2-D marker features => hand posture hypothesis
 Pinhole camera model utilized
 Perspective geometry and its relevant constraints
 Finger posture: the detected finger markers
determine the range reachable by the finger along the
camera view direction.
 The reachable line segment is then sampled at
constant intervals to calculate finger posture
hypotheses by inverse kinematics (IK).
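The sampling step above can be sketched as back-projecting a marker pixel through a pinhole camera and stepping along the viewing ray. The function name `sample_ray`, the depth bounds, and the step size are hypothetical; in the surveyed prototype the reachable range comes from the hand geometry, which this sketch does not model.

```python
import numpy as np

def sample_ray(pixel, focal_length, z_near, z_far, step):
    """Back-project a pixel and sample 3-D candidates along its viewing ray.

    pixel: (u, v) image coordinates relative to the principal point.
    z_near, z_far: assumed depth bounds of the finger's reachable range.
    step: constant spacing between samples, measured along the ray.
    """
    direction = np.array([pixel[0], pixel[1], focal_length], dtype=float)
    direction /= np.linalg.norm(direction)
    # Convert the depth bounds to distances along the ray.
    t_near = z_near / direction[2]
    t_far = z_far / direction[2]
    ts = np.arange(t_near, t_far + 1e-9, step)
    return ts[:, None] * direction

candidates = sample_ray(pixel=(0.0, 0.0), focal_length=1.0,
                        z_near=10.0, z_far=12.0, step=0.5)
```

Each candidate 3-D fingertip position would then be passed to an IK solver to obtain one finger posture hypothesis.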
38
39
 Thumb: binary search of a lookup table of all
feasible end-effector positions
 Other fingers: solved by error model analysis
technique
40
Prototype Phase 2 – Facing the
Reality
 Many practical parameters differ
from the simulation.
 2-D features are detected in the acquired video
frames using segmented color and
silhouette.
 Palm: two colored markers (one on the front and one on the
back)
 Fingertips: five colored ring markers (one for
each finger)
Prototype Phase 3 – Multiple cameras
 Camera sensor fusion
 Type 1:
 Posture hypotheses are generated separately and
then validated using the observation models.
 Useful when cameras are mobile.
 Type 2:
 The geometric transformation between camera
coordinate frames is used.
 The best orientation is used by both models.
42
Conclusions
 The framework presented has two steps:
 Posture hypothesis and validation
 The framework gives reasonable results
compared with the CyberGlove.
 Multiple cameras help cover more area and
improve tracking accuracy.
 Handles intermittent occlusion for a short time.
 Future work: 3-D marker-less hand tracking
43
Comments:
 A prototype of 3-D model-based hand tracking
in a general environment with an unconstrained
background.
 Recognizes dynamic gestures in real time.
 A data glove is used to validate the proposed
framework.
 Colored markers are used to assist palm and
fingertip recognition.
44
Comments (Cont.):
 Lack of palm identification for bare hands
 Hand silhouette and skin color for hand
orientation estimation
 Marker-less edge/contour detection for
fingertips
 Elbow, arm, and shoulder information may be used to
reduce the dimensionality of 3-D hand model
matching
45
3D Model-Based Hand Gesture
Recognition and Tracking
 Questions......
 Comments......
 Suggestions......
46
