Action Recognition

IEA-AIE 2012

Binu M Nair, Vijayan K Asari

06/10/2012

Contents of Presentation

Introduction

Proposed Methodology

Shape Representation Using HOG

Computation of Reduced Posture Space using PCA

Weizmann Action Dataset

Cambridge Hand Gesture Dataset

Introduction

Action Recognition

Widely Researched Area

Potential applications in Security and Surveillance

Literature Survey

Some of the research work was on space-time shapes with different kinds of representation.

Gorelick et al.: representing a space-time shape using Poisson's equation and extracting stick/plate-like structures.

Nair et al.: representing a space-time shape using the 3D distance transform along with the R-Transform at multiple levels.

Niebles et al.: representing a space-time shape as a collection of spatio-temporal words in a bag-of-words model, using a probabilistic Latent Semantic Analysis model.

Batra et al. characterize a space-time shape as a histogram over a dictionary of space-time shapelets, which are local motion patterns.

Scovanner et al. represent a spatio-temporal word by a 3D SIFT region descriptor in a bag-of-words model.

Some of the research work treated action recognition as a tracking problem, i.e. tracking suitable points on the human body.

Ali et al. used concepts from chaos theory to reconstruct the phase space from each trajectory and compute the dynamic and metric invariants.

Some of the latest work characterizes human action sequences as multidimensional arrays called tensors and uses these as the basis for feature extraction.

Kim et al. present a framework called Tensor Canonical Correlation Analysis, where descriptive similarity features between two video volumes (tensors) are used.

Lui et al. studied the underlying geometry of the tensor space and performed factorization on this space to obtain product manifolds, which are then compared using the geodesic measure.

Proposed Work

Models the variation over time of the features extracted from each frame.

In other words, finds an underlying manifold in the feature space which captures the temporal variance needed for discriminating between action sequences.

Classifies a set of contiguous frames irrespective of the speed of the action or the time instant of the body posture.

Proposed Methodology

1. Feature extraction: a shape descriptor is computed for the region of interest in a frame.

2. Computation of an appropriate reduced-dimensional space which spans the change of shape occurring across time.

3. Suitable modeling of the mapping from the feature space to the reduced space.

Block diagram: Feature Extraction, Reduced Space Computation, Modeling.

The HOG shape descriptor is partially invariant to illumination.

The inter-frame variation of the shape descriptors is maximized in the Eigen space.

These variations indirectly correspond to the variation occurring in the silhouette (body posture changes), which differs between action sequences.

To model the mapping from the shape descriptor (feature) space to the Eigen space for each action class, we use a regression-based network, the generalized regression neural network (GRNN).

Back-propagation neural networks take a long time to train and may fail to converge, while the GRNN is based on radial basis functions, is trained in one pass, and converges to a stable state.

Training Phase

The input is a set of N frames (a complete sequence where N = 60, or a partial sequence where N = 15) consisting of segmented body regions (silhouettes).

The histogram of gradients (HOG) is computed for each frame and accumulated over all frames of the sequences from all the action classes.

The Eigen space is obtained by performing PCA on the accumulated features, giving suitable reduced representations of the features.

Each model from 1 to M, where M is the number of action classes, is represented by one GRNN.

Each GRNN is trained using the HOG descriptor of a frame as the input and the corresponding representation in the Eigen space as the output.

In short, each GRNN models the mapping from the HOG space to the Eigen space.

Testing Phase

From an input set of frames from a test sequence, the corresponding HOG descriptors are computed.

The HOG descriptors are projected onto the Eigen space obtained in training, and these projections are taken as the reference.

The reduced representation estimated by each GRNN is then compared with the reference representation, and the model which gives the closest match determines the class of the test action sequence.

The image is divided into K overlapping blocks.

The orientation of the gradient is divided into n bins.

For each block, the histogram of the orientation weighted by the gradient magnitude is computed.

The histograms from the various blocks are combined and normalized (Fig. 2 and Fig. 3 illustrate the HOG for a particular frame).
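The four steps above can be sketched in NumPy as follows. This is a minimal illustration, not the authors' implementation: the function name `hog_descriptor`, the half-overlapping block layout, the use of unsigned orientations, `np.gradient` as the gradient operator, and the default values `grid=7` and `n_bins=9` are all assumptions made for the example.

```python
import numpy as np

def hog_descriptor(image, grid=7, n_bins=9, eps=1e-8):
    """Block-wise histogram of gradient orientations, L2-normalized (illustrative sketch)."""
    image = np.asarray(image, dtype=float)
    gy, gx = np.gradient(image)                 # derivatives along rows and columns
    magnitude = np.hypot(gx, gy)
    # Unsigned orientation in [0, pi), quantized into n_bins (assumption).
    orientation = np.mod(np.arctan2(gy, gx), np.pi)
    bin_index = np.minimum((orientation / np.pi * n_bins).astype(int), n_bins - 1)

    rows, cols = image.shape
    # Block size chosen so that neighbouring blocks overlap by about half (assumption).
    bh, bw = 2 * rows // (grid + 1), 2 * cols // (grid + 1)
    row_starts = np.linspace(0, rows - bh, grid).astype(int)
    col_starts = np.linspace(0, cols - bw, grid).astype(int)

    histograms = []
    for r in row_starts:
        for c in col_starts:
            bins = bin_index[r:r + bh, c:c + bw].ravel()
            weights = magnitude[r:r + bh, c:c + bw].ravel()
            # Orientation histogram weighted by gradient magnitude.
            histograms.append(np.bincount(bins, weights=weights, minlength=n_bins))

    feature = np.concatenate(histograms)
    return feature / (np.linalg.norm(feature) + eps)

# Usage on a synthetic binary silhouette (a filled rectangle).
silhouette = np.zeros((64, 64))
silhouette[16:48, 24:40] = 1.0
feature = hog_descriptor(silhouette)
```

With a 7 × 7 grid and 9 bins the descriptor has 441 entries, one 9-bin histogram per block.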

Binary Silhouette

Produces a HOG which is discriminative and more localized due to the block operation. Thus, the body posture is represented in a discriminative and localized manner.

Noise is present in the gradient image due to illumination variations in the image.

Some noise is reflected in the HOG descriptor, but since HOG is partially illumination invariant due to the normalization, the feature descriptors do not vary much.

Each action class is color-coded to illustrate how close the action manifolds are; only a small separation exists between them.

The manifolds overlap considerably, and the aim is to use a functional mapping for each manifold to distinguish them.

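We denote the action space as $S = \{h_{k,m} : 1 \le k \le K(m),\ 1 \le m \le M\}$, where $M$ is the number of action classes, $K(m)$ is the number of frames of class $m$, and $h_{k,m}$ is the corresponding HOG descriptor of dimension $D \times 1$.

The Eigen space is obtained by performing PCA on the matrix $H = [h_{1,1}\ h_{1,2}\ h_{1,3}\ \cdots\ h_{K(M),M}]$ to get the Eigenvectors $v_1, v_2, \ldots, v_d$ corresponding to the largest variances between the HOG descriptors.

These Eigenvectors with the highest Eigenvalues correspond to the directions along which the temporal variance between the HOG descriptors is maximum.

The action manifold of class $m$ is represented as $S_m$, where $S_m = \{p_{k,m} : k = 1, \ldots, K(m)\}$ and $p_{k,m}$ is the vector representing a point in the Eigen space.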
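The PCA step can be illustrated with a short NumPy sketch. It assumes the HOG descriptors are stacked as columns of a matrix and uses an SVD of the mean-centered data; the function names `fit_eigenspace` and `project`, the reduced dimension `d=3`, and the random stand-in data are hypothetical choices for the example, not values taken from the slides.

```python
import numpy as np

def fit_eigenspace(H, d):
    """H: (D, K) matrix, one HOG descriptor per column; d: reduced dimension."""
    mean = H.mean(axis=1, keepdims=True)
    centered = H - mean
    # Left singular vectors of the centered data are the eigenvectors of its
    # covariance matrix, ordered by decreasing variance.
    U, s, _ = np.linalg.svd(centered, full_matrices=False)
    eigvals = s ** 2 / (H.shape[1] - 1)
    return mean, U[:, :d], eigvals[:d]

def project(h, mean, V):
    """Project descriptor(s) h, given as column(s), onto the Eigen space spanned by V."""
    return V.T @ (h - mean)

# Usage with random stand-in descriptors (D = 20, K = 50 frames).
rng = np.random.default_rng(0)
H = rng.random((20, 50))
mean, V, eigvals = fit_eigenspace(H, d=3)
P = project(H, mean, V)   # (3, 50): one Eigen-space point per frame
```

Each column of `P` plays the role of a point on an action manifold in the reduced space.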

GRNN is a one-pass learning algorithm which provides fast convergence to the optimal regression surface.

It is memory intensive, so we train each GRNN with the cluster centers obtained from K-means clustering.

For an input HOG descriptor $h$, each GRNN estimates the Eigen-space point $\hat{p}$ by the equation

$$\hat{p} = \frac{\sum_i p_{i,m} \exp\left(-D_{i,m}^2 / 2\sigma_m^2\right)}{\sum_i \exp\left(-D_{i,m}^2 / 2\sigma_m^2\right)}, \qquad D_{i,m}^2 = (h - h_{i,m})^T (h - h_{i,m})$$

where $(h_{i,m}, p_{i,m})$ are the cluster centers in the HOG descriptor space and the Eigen space.

The standard deviation $\sigma_m$ of the radial basis function for each action class is taken as the median Euclidean distance between the corresponding action's cluster centers.
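A minimal sketch of one per-class GRNN built from cluster centers is given below. Several details are assumptions rather than the authors' procedure: K-means is run on the HOG descriptors only, each Eigen-space center is taken as the mean of the projections falling in the same cluster, the median distance is measured in the HOG space, and the names `kmeans_labels`, `build_grnn` and `grnn_predict` are hypothetical.

```python
import numpy as np

def kmeans_labels(X, k, n_iter=25, seed=0):
    """Plain Lloyd's k-means; returns a cluster label for every row of X."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):      # keep the old center if a cluster empties
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def build_grnn(hog, proj, k):
    """hog: (K, D) descriptors of one class; proj: (K, d) Eigen-space points. Needs k >= 2."""
    labels = kmeans_labels(hog, k)
    used = np.unique(labels)
    centers_h = np.stack([hog[labels == j].mean(axis=0) for j in used])
    centers_p = np.stack([proj[labels == j].mean(axis=0) for j in used])
    # sigma: median pairwise Euclidean distance between the HOG-space centers.
    diffs = centers_h[:, None, :] - centers_h[None, :, :]
    dist = np.sqrt((diffs ** 2).sum(axis=2))
    sigma = np.median(dist[np.triu_indices(len(used), k=1)])
    return centers_h, centers_p, sigma

def grnn_predict(h, centers_h, centers_p, sigma):
    """Radial-basis weighted average of the Eigen-space centers for descriptor h."""
    d2 = ((centers_h - h) ** 2).sum(axis=1)
    # Subtracting d2.min() leaves the ratio unchanged and avoids underflow.
    w = np.exp(-(d2 - d2.min()) / (2.0 * sigma ** 2))
    return (w[:, None] * centers_p).sum(axis=0) / w.sum()

# Usage with random stand-in data for one action class.
rng = np.random.default_rng(1)
hog = rng.random((40, 6))
proj = rng.random((40, 2))
centers_h, centers_p, sigma = build_grnn(hog, proj, k=4)
estimate = grnn_predict(hog[0], centers_h, centers_p, sigma)
```

Because the output is a convex combination of the stored centers, no iterative training is needed, which is the one-pass property the slide refers to.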

Classification

The set of HOG descriptors from consecutive frames of a test sequence is projected onto the Eigen space to get the corresponding reference projections $\{p_r : 1 \le r \le N\}$.

These are compared with the projections of the corresponding frames estimated by each GRNN action model, using the Mahalanobis distance measure.

The action model which gives the closest estimate to the reference projections determines the class.
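The decision rule can be sketched as below. The slides do not say which covariance the Mahalanobis distance uses or how per-frame distances are combined, so this example assumes a single inverse covariance matrix passed in by the caller and sums the distances over the frames of the window; `mahalanobis` and `classify` are hypothetical names.

```python
import numpy as np

def mahalanobis(u, v, cov_inv):
    """Mahalanobis distance between two Eigen-space points."""
    diff = u - v
    return float(np.sqrt(diff @ cov_inv @ diff))

def classify(reference, estimates, cov_inv):
    """reference: (N, d) projections; estimates: dict of class name -> (N, d) GRNN outputs."""
    scores = {
        name: sum(mahalanobis(r, e, cov_inv) for r, e in zip(reference, est))
        for name, est in estimates.items()
    }
    # The class whose GRNN estimates lie closest to the reference wins.
    return min(scores, key=scores.get), scores

# Usage with made-up numbers: the "walk" model tracks the reference closely.
reference = np.array([[0.0, 0.0], [1.0, 1.0]])
estimates = {"walk": reference + 0.1, "run": reference + 2.0}
label, scores = classify(reference, estimates, cov_inv=np.eye(2))
```

With an identity inverse covariance the measure reduces to the Euclidean distance, which makes the toy numbers easy to check by hand.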

7 × 7 overlapping cells

9 orientation bins

Normalized by taking the L2-norm

The histograms from each block are combined to form a feature vector of size 441 × 1
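As a quick consistency check on these parameters, assuming one 9-bin histogram per cell and plain concatenation (with $K$ the number of blocks, $n$ the number of bins, and $D$ introduced here as shorthand for the descriptor length):

```latex
D = K \cdot n = (7 \times 7) \times 9 = 441
```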

Weizmann Action Dataset

The dataset consists of 10 action classes and each action class has 9-10 video sequences.

Each video sequence of an action is performed by a different individual.

There is variation in the size of the person and the speed of motion.

a1 - bend; a2 - jplace; a3 - jack; a4 - jforward; a5 - run; a6 - side; a7 - wave1; a8 - skip; a9 - wave2; a10 - walk

Cambridge Hand Gesture Dataset

It has 3 main action classes corresponding to different shapes of the hand. Each of these classes is further divided by the motion of the hand. In short, there are 9 different action classes.

The test sequence is divided into overlapping windows of size N with an overlap of N − 1 frames.
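Assuming the overlap is one frame less than the window size, so that consecutive windows are shifted by a single frame, the partial sequences can be generated as in this small sketch (the name `sliding_windows` is hypothetical):

```python
def sliding_windows(frames, size):
    """All windows of `size` consecutive frames; adjacent windows share size - 1 frames."""
    return [frames[i:i + size] for i in range(len(frames) - size + 1)]

# Usage: a 5-frame sequence split into windows of 3 frames.
windows = sliding_windows(list(range(5)), 3)
```

A sequence shorter than the window size yields no windows at all.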

Testing is done using the leave-10-sequences-out strategy. In short, all the partial sequences corresponding to the test sequences are left out of the training.

The confusion matrix is shown on the left, and on the right is the average accuracy obtained with the framework for window sizes of N = 10, 12, 15, 18, 20, 23, 25, 28 and 30.

The Cambridge hand gesture database has 5 different sets, each corresponding to a different kind of illumination.

Each action class has 20 sequences.

For each sequence, skin segmentation was done to get the region of interest and to center the region in the image.

The extracted HOG descriptor contains noise variation due to the different illumination conditions.

The testing strategy used was leave-9-sequences-out, where each test sequence corresponds to an action class.

The confusion matrix shown on the left is obtained by considering 4 clusters during training.

If all the illumination conditions are trained into the system, the overall accuracy is higher.

Testing was done on each set individually and the overall accuracy was computed for each set, as shown on the right.

For set 1, the overall accuracy is high as the lighting is fairly uniform, but sets 2, 3, 4 and 5 give moderate overall accuracy due to extreme non-linear lighting conditions.

Conclusions and Future Work

The framework is invariant to the speed of the action being performed.

Results show good accuracy on the Weizmann database, but on the Cambridge hand gesture database the illumination conditions affect the accuracy.

Severe illumination conditions, as seen in the hand gesture database, affect the HOG space, and thus the Eigen space becomes tuned to the noise.

Our future work is to develop and use a descriptor which represents a shape from a set of corner points, where the relationships between them are determined at the spatial and temporal scales.

Other regression techniques and classification methodologies will also be investigated.

Thank You

Questions?

Please contact

Binu M Nair : nairb1@udayton.edu
