Thesis Proposal
Feature Selection in Video Classification
• Yan Liu
• Computer Science
• Columbia University
• Advisor: John R. Kender
Outline
• Introduction
• Research progress
• Proposed work
• Conclusion and schedule
Outline
• Introduction
  – Motivation of feature selection in video classification
  – Definition of feature selection
  – Feature selection algorithm design and evaluation
  – Applications of feature selection
• Research progress
• Proposed work
• Conclusion and schedule
Motivation of feature selection in video classification
• The problem of efficient video data management is an important issue
  – "Semantic gap": machine learning methods, such as classification, can close it
  – Efficiency: reducing the dimensionality of the data prior to processing is necessary
• Feature selection in video classification is not well explored
  – So far, mostly based on researchers' intuition [A. Vailaya 2001]
  – Goal: select representative features automatically
Definition of feature selection
• Feature selection focuses on
  – Finding a feature subset that has the most discriminative information from the original feature space
  – Objectives [Guyon 2003]:
    • Improve the prediction performance
    • Provide a more cost-effective predictor
    • Provide a better understanding of the data
Feature selection algorithm design
[figure: wrapper vs. filter architectures for feature selection]
Feature subset Si = {f1, f2, …, fk}, 1 ≤ k ≤ N
Three components of feature selection algorithms
• Search algorithm
  – Forward selection [Singh 1995]
  – Backward elimination [Koller 1996]
  – Genetic algorithm [Oliveira 2001]
• Induction algorithm
  – SVM [Bi 2003], BN [Singh 1995], kNN [Abe 2002], NN [Oliveira 2001], Boosting [Das 2001]
  – Classifier-specific [Weston 2000] and classifier-independent feature selection [Abe 2002]
• Evaluation metric
  – Distance measure [H. Liu 2002], dependence measure, consistency measure [Dash 2000], information measure [Koller 1996]
  – Predictive accuracy measure (for most wrapper methods)
Applying feature selection to video classification
• Current applications with large data sets
  – Text categorization [Forman 2003]
  – Genetic microarray [Xing 2001]
  – Handwritten digit recognition [Oliveira 2001]
  – Web classification [Coetzee 2001]
• Applying existing feature selection algorithms to video data
  – Similar need: massive data, high dimensionality, complex hypotheses
  – Difficulty: higher requirement of time cost
  – Some existing work in video classification [Jaimes 2000]
Outline
• Introduction
• Research progress
  – BSMT: Basic Sort-Merge Tree [Liu 2002]
  – CSMT: Complement Sort-Merge Tree [Liu 2003]
  – FSMT: Fast-converging Sort-Merge Tree [Liu 2004]
  – MLFS: Multi-Level Feature Selection [Liu 2003]
  – Fast video retrieval system [Liu 2003]
• Proposed work
• Conclusion and schedule
Basic Sort-Merge Tree
Search algorithm
[tree diagram for N = 256: singleton feature subsets A1 (size 1) through A256 at the leaves, sorted from low to high performance; at each level the subsets are induced on, sorted, and pairwise combined into B1 (size 2) through B128, then C1 (size 4) through C64, and so on up to the root I1 (size 256)]
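The level-by-level Induce / Sort / Combine pass in the diagram can be sketched in a few lines of Python. This is a hypothetical illustration, not the thesis implementation: the `induce` argument is a placeholder scoring function standing in for BSMT's Fastmap + Mahalanobis induction step.

```python
def sort_merge_search(features, induce):
    """One bottom-up pass of a Sort-Merge Tree (sketch).

    features: list of singleton feature subsets, e.g. [[0], [1], ..., [N-1]]
    induce:   scores a feature subset (higher is better); a stand-in for
              the Fastmap + Mahalanobis induction step of BSMT.
    Returns the list of subsets at each level, leaves first.
    """
    levels = [features]
    subsets = features
    while len(subsets) > 1:
        # Induce on every subset, then sort best-first.
        ranked = sorted(subsets, key=induce, reverse=True)
        # Combine adjacent pairs into subsets of twice the size.
        subsets = [ranked[i] + ranked[i + 1] for i in range(0, len(ranked) - 1, 2)]
        levels.append(subsets)
    return levels

# Toy run with N = 8 singleton leaves and a made-up scoring function.
leaves = [[i] for i in range(8)]
levels = sort_merge_search(leaves, induce=lambda s: -min(s))
print([len(level) for level in levels])  # [8, 4, 2, 1]
```

Each tree level halves the number of subsets while doubling their size, which is why the search cost stays linear in the number of features.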
Advantages
• To achieve better performance
  – Avoids the local optima of forward selection and backward elimination
  – Avoids the heuristic randomness of genetic algorithms
• To achieve lower time cost
  – The search algorithm is linear in the number of features
  – Enables the straightforward creation of near-optimal feature subsets with little additional work [Liu 2003]
Induction algorithm
• Novel combination of Fastmap and Mahalanobis likelihood
• Fastmap for dimensionality reduction [Faloutsos 1995]
  – A feature extraction algorithm that approximates PCA with linear time cost
  – Reduces the dimensionality of feature subsets to a pre-specified small number
• Mahalanobis maximum likelihood for classification [Duda 2000]
  – Computes the likelihood that a point belongs to a distribution modeled as a multidimensional Gaussian with arbitrary covariance
  – Works well for the video domain
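A minimal sketch of the Mahalanobis maximum-likelihood step, assuming NumPy: each class is modeled as a multidimensional Gaussian with arbitrary (full) covariance, and a point is assigned to the class with the highest likelihood. The Fastmap reduction is omitted, and the class names and data here are invented for illustration.

```python
import numpy as np

def fit_gaussian(X):
    """Per-class model: mean and full (arbitrary) covariance."""
    mu = X.mean(axis=0)
    cov = np.cov(X, rowvar=False)
    return mu, np.linalg.inv(cov), np.log(np.linalg.det(cov))

def log_likelihood(x, model):
    """Gaussian log-likelihood of x (constant terms dropped)."""
    mu, inv_cov, log_det = model
    d = x - mu
    return -0.5 * (d @ inv_cov @ d + log_det)

def classify(x, models):
    """Assign x to the class whose Gaussian gives the highest likelihood."""
    return max(models, key=lambda c: log_likelihood(x, models[c]))

# Toy example: two well-separated 2-D classes with hypothetical labels.
rng = np.random.default_rng(0)
models = {
    "handwriting": fit_gaussian(rng.normal([0, 0], 1.0, (100, 2))),
    "discussion":  fit_gaussian(rng.normal([8, 8], 1.0, (100, 2))),
}
print(classify(np.array([7.5, 8.2]), models))  # discussion
```

Because the covariance is full rather than diagonal, the decision boundary accounts for correlations between feature dimensions, which is the point of using the Mahalanobis form.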
Applications to instructional video frame categorization
• Pre-processing
  – Temporally subsample: every other I frame (one frame/sec)
  – Spatially subsample: six DC terms of each macroblock
• Feature selection
  – From 300 six-dimensional features to r features
• Video segmentation and retrieval
  – Classify frames or segments in the usual way using the resulting feature subset
Test bed of instructional video frame categorization
• Classify an instructional video of a 75-minute lecture in MPEG-1 format
  – 4700 video frames with 300 six-dimensional features
  – 400 training data for feature selection and classification training
  – Classify into four categories: Handwriting, Announcement, Demo, Discussion
• Benchmark: random feature selection repeated 100 times
  – Experiments differ only in the selected features
  – Any other benchmark is intractable on a video dataset
Accuracy improvement
[chart: frame categorization error rate (0 to 0.06) vs. Fastmap dimension c from 1 to 10, comparing the mean of random selection against Sort-Merge]
Comparison of frame categorization error rate using 30 (of 300) features selected by BSMT: nearly perfect!
Test bed of sports video retrieval
• Retrieve "pitching" frames from an entire video
  [images: pitching part vs. competing image types]
  – Sampled more finely: every I frame
  – 3600 frames for half an hour
• First task: binary classification of the 3600 video frames
• Second task: retrieve 45 "pitching" segments from 182 pre-segmented video segments
Accuracy improvement
• Precision: percentage of items classified as positive that actually are positive (left bars in graphs)
• Recall: percentage of positives that are classified as positive (right bars in graphs)
[charts: precision and recall for feature number r = 2, 8, 16, 32, comparing the mean of random selection against Sort-Merge]
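The two definitions above amount to the following small self-contained sketch; the frame indices in the toy example are made up.

```python
def precision_recall(predicted, actual):
    """Precision and recall for binary retrieval, per the slide's definitions.

    predicted: set of items classified as positive (e.g. frame indices)
    actual:    set of items that truly are positive
    """
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted)  # fraction of predictions that are right
    recall = true_positives / len(actual)        # fraction of positives that are found
    return precision, recall

# Toy example: 4 of 5 predicted "pitching" frames are correct, out of 8 real ones.
p, r = precision_recall(predicted={1, 2, 3, 4, 5}, actual={2, 3, 4, 5, 6, 7, 8, 9})
print(p, r)  # 0.8 0.5
```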
Search algorithm
• Leaves: singleton feature subsets
• White nodes: unsorted feature subsets
• Gray nodes: white nodes rank-ordered by performance
• Black nodes: pairwise mergers of gray nodes, with pairs formed under the complement requirement
[tree diagram, an illustration of CSMT for N = 256: each level is complemented, sorted, and induced on, from A1…A256 through B1…B128 and C1…C64 up to the root I1]
Complement test
• Suppose the sorted singletons A1' and A3' are paired to form pair B1
• To find a pairing for A2', examine A4', A5', A6', which have the same error rate on the m training samples
• The bitwise OR of the performance vectors of A2' and A5' maximizes the performance coverage
[table: binary performance vectors of the candidate singletons over the m training samples, for both clean and noisy training data]
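A toy sketch of the complement test, with performance vectors stored as integer bit patterns (one bit per training sample, 1 = classified correctly). The specific vectors are invented for illustration: among candidates with the same error rate, the partner that maximizes the popcount of the bitwise OR covers the most training samples.

```python
def best_complement(subset_vec, candidate_vecs):
    """Pick the candidate whose performance vector best complements subset_vec.

    Among candidates, the pairing that maximizes the number of set bits in
    the bitwise OR covers the most training samples (the CSMT complement
    requirement, sketched).
    """
    def coverage(cand):
        return bin(subset_vec | cand).count("1")
    return max(candidate_vecs, key=coverage)

a2 = 0b00110            # A2' classifies samples 2 and 3 correctly
candidates = {          # same error rate: two correct samples each
    0b00011: "A4'",
    0b11000: "A5'",
    0b00101: "A6'",
}
best = best_complement(a2, candidates)
print(candidates[best])  # A5'
```

A5' wins because its correct samples are disjoint from A2's, so the merged pair performs well on the union of both coverage sets.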
Accuracy improvement
[charts: error rate of CSMT vs. the mean of random selection; left, over Fastmap dimension c from 1 to 10 (error rate 0 to 0.03); right, over number of features 2, 4, 8, 16 (error rate 0 to 0.012)]
Test bed of shot classification
• Retrieve "emphasis" frames in an instructional video of MPEG-1 format
  – A more subtle, semantically-defined class
  – Segmented into 69 shots
  [images: emphasis part vs. non-emphasis part]
[chart: error rate of CSMT vs. random for retrieval of "emphasis" with the number of features r fixed at 16]
Fast-converging Sort-Merge Tree
Search algorithm
• Initialize level = 0
  – N singleton feature subsets
  – Calculate R: the number of features retained at each level, based on the desired convergence rate and the goal of r features at the conclusion
• While level < log2 r + 1
  – Induce on every feature subset
  – Sort the subsets based on information gain
  – Prune the level based on R
  – Combine, pairwise, feature subsets from those remaining
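The loop above can be sketched as follows. This is a hedged illustration: the retention schedule `retain` and the `score` function are placeholders, whereas the thesis derives R from the desired convergence rate and uses a real induction step with information gain.

```python
import math

def fsmt_search(n_features, r, retain, score):
    """Fast-converging Sort-Merge Tree search loop (sketch of the slide's steps).

    n_features: size N of the original feature space
    r:          target number of features at the conclusion
    retain:     R schedule -- retain(level) gives how many subsets survive
                pruning at that level (a placeholder here)
    score:      stand-in for inducing on a subset and measuring information gain
    """
    subsets = [[i] for i in range(n_features)]  # N singleton feature subsets
    level = 0
    while level < math.log2(r) + 1:
        ranked = sorted(subsets, key=score, reverse=True)  # induce, then sort
        pruned = ranked[: retain(level)]                   # prune based on R
        # Combine, pairwise, the feature subsets from those remaining.
        subsets = [pruned[i] + pruned[i + 1] for i in range(0, len(pruned) - 1, 2)]
        level += 1
    return subsets

# Toy run: N = 64, r = 8, a halving R schedule, and a made-up score.
out = fsmt_search(64, 8, retain=lambda lvl: 64 >> lvl, score=lambda s: -min(s))
print(len(out), len(out[0]))  # 4 16
```

The final subset sizes depend entirely on the R schedule; the slides choose R so that the tree converges to r features at the conclusion, while the halving schedule here is only a stand-in.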
FSMT of constant convergence rate
[tree diagram: 1800 original features F1…F1800 at the bottom; levels v5 = 512 (A1…A512), v4 = 128 (B1…B128), v3 = 32 (C1…C32), v2 = 8 (D1…D8), v1 = 2 (E1, E2), and the root v0 = 1, retaining r5 = 1, r4 = 2, r3 = 4, r2 = 8, r1 = 16 features per level]
Accuracy improvement
Stable performance
• Fixed sample rate r = 8, varying Fastmap dimension c from 1 to 10
• Fixed dimension c = 4, varying number of features r from 2 to 16
[charts: error rate of FSMT vs. the mean of random selection under both settings]
Although FSMT is an application-driven algorithm, it does retain some of the advantages of BSMT.
Coarse-fine scene segmentation using Multi-Level Feature Selection
[figure: multi-level feature subset hierarchy applied across a video segment]
The feature subset hierarchy enables less work to be done on the segment interiors, and more costly feature subsets at segment edges.
How to define the parameters
Efficiency improvement
• Uses the same test bed as BSMT
• Only 3.6 features are used per frame, on average
• With performance similar to that of the 30 features selected by BSMT
Summary of current research progress
Outline
• Introduction
• Research progress
• Proposed work
  – Improve current feature selection algorithms
  – Algorithm evaluation
  – New applications
  – Size of training data
  – Theoretical analysis
• Conclusion and schedule
Improvements to current algorithms
• Search algorithm
  – Set up the bottom of the Sort-Merge Tree more efficiently
• Induction algorithm of feature selection
  – Explore SVM: powerful for binary classification and sparse training data
  – Explore HMM: good performance in temporal analysis
• Evaluation metric
  – Explore filter evaluation metrics in the wrapper method
Algorithm evaluation
• Accuracy
  – Classification evaluation: error rate, Balanced Error Rate (BER), Receiver Operating Characteristic (ROC) curve, Area Under Curve (AUC)
  – Video analysis evaluation: precision, recall, F-measure
• Efficiency
  – Selected feature subset size: Fraction of Features (FF), best size of feature subset
  – Time cost: of the search algorithm, of the induction algorithm; stopping point
• Dependence
  – How to choose a proper classifier
  – How to compare feature selection algorithms in certain applications
Dimensions to compare different algorithms
New applications
• Different original feature spaces
  – Feature fusion: put different kinds of features in one feature space
  – High-level semantic features
  – Temporal-spatial information
  – Different operators for different kinds of videos, based on subjective quality measurement [Y. Wang 2004]
  – Content-based video compression
  – On-the-fly search
• Feature selection in video clustering
  – A forward wrapper method to select features and a filter method to remove redundant ones [Xie 2003]
Extension to training data set extremes
• Sparse training data
  – Better feature selection algorithms
  – Use training data efficiently in cross-validation
• Massive training data
  – Random selection based on two assumptions: feature subset performance stability, and training set independence
  – Methods to extract a representative training data subset for feature selection
• Non-balanced training data
  – Positive examples are sparse; negative examples are massive and must be sampled
  – Video retrieval in a large database is non-balanced
• One-class training data
  – Feature selection for one-class training in masquerade (computer security violation) detection [K. Wang 2003]
  – Select features using one-class training in video retrieval
Outline
• Introduction
• Research progress
• Proposed work
• Conclusion and schedule
  – Finished work and further work
  – Schedule
Finished work and further work
[diagram: research status, from mostly done to mostly undone —
mostly done: search algorithms (BSMT, CSMT, FSMT, MLFS) and video classification applications (categorization, segmentation, retrieval);
half done: induction algorithm, evaluation metric, new feature set;
partly done: algorithm evaluation, on-the-fly search;
mostly undone: challenges (training dataset, stopping-point discussion, over-fitting, theoretical analysis) and other applications (clustering, compression, gene microarray, audio)]
Task Schedule
Ph.D. Thesis Proposal
Thank you!