Overview of Machine Learning and Pattern Recognition
Building an ML system
Bioinformatics – Overview of Machine Learning and Pattern Recognition
3. Feature Reduction
Rationale
• summarization of data with many (n) variables (features) by a
smaller set of (m) derived variables.
X (s objects × n features) → Y (s objects × m derived features)
• A balance between ease of representation and computational speed on one side, and oversimplification (loss of important or relevant information) on the other.
Two main approaches
• Dimensionality reduction (DR) is based on mathematical
recombinations of the original features.
• Feature selection (FS) is based on choosing a subset of the original features.
FS vs DR
• feature selection: equivalent to projecting the feature space onto a lower-dimensional subspace orthogonal to the removed features; the new space is a subset of the original one
FS vs DR
• dimensionality reduction: allows other kinds of projection (e.g. PCA re-represents data using
linear combinations of original features)
• The new space is not a subset, but a recombination of the original feature space
Principal Component Analysis (PCA)
• Takes a data matrix of s objects by n variables and remaps it to a new set of axes (principal components, or principal axes) that are linear combinations of the original n variables, chosen so as to maximize the variance of the data in the new representation
• Based on the eigen-decomposition of the data's covariance matrix (a minimal sketch follows below)
• The first m < n components are the ones that retain most of the variance of the objects; hence, they are assumed to be the most representative ones
• Rationale: reducing the size of the feature space while retaining as much of the information as possible.
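A minimal NumPy sketch of the eigendecomposition-based PCA described above (the function and variable names are illustrative, not part of the slides):

```python
import numpy as np

def pca(X, m):
    """Project the s x n data matrix X onto its first m principal components."""
    Xc = X - X.mean(axis=0)                  # center the data
    cov = np.cov(Xc, rowvar=False)           # n x n covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigen-decomposition (symmetric matrix)
    order = np.argsort(eigvals)[::-1]        # sort directions by decreasing variance
    W = eigvecs[:, order[:m]]                # keep the m highest-variance directions
    return Xc @ W                            # s x m re-representation Y

Y = pca(np.random.rand(100, 5), m=2)         # toy usage: 100 objects, 5 -> 2 features
```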
Principal Component Analysis (PCA)
Example in a 2-dimensional space:
• PCA performs a linear mapping of the data to a lower-dimensional space, in such a way that the variance of the data in the low-dimensional representation is maximized
Based on E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)
PCA
• Choose directions such that the total variance of the data will be maximum
– Maximize total variance
– Higher variance → better separability of the data
• Choose directions that are orthogonal
– Minimize correlation between different features
– Lower correlation → lower redundancy
• Choose m < n orthogonal directions which maximize total variance
Based on E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)
PCA example
PCA applied on UCI ML Breast Cancer Wisconsin (Diagnostic) dataset (32 features)
[Plot: explained variance per component, i.e. the variance of the data along the i-th principal component divided by the whole variance of the data; see the sketch below.]
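The quantity plotted on this slide can be reproduced along these lines with scikit-learn; note that scikit-learn's bundled copy of the Breast Cancer Wisconsin (Diagnostic) data exposes 30 numeric features (the 32 columns mentioned above presumably include the ID and the diagnosis). A hedged sketch:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, _ = load_breast_cancer(return_X_y=True)       # 569 objects x 30 numeric features
X = StandardScaler().fit_transform(X)            # PCA is scale-sensitive: standardize first
pca = PCA().fit(X)
ratios = pca.explained_variance_ratio_           # variance along PC i / total variance
print(ratios[:5])                                # fraction of total variance per component
print(ratios[:5].sum())                          # variance retained by the first 5 PCs
```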
Linear Discriminant Analysis (LDA)
• PCA is unsupervised (it does not take class information into account)
• Supervised extension: LDA
• LDA: find a low-dimensional space such that, when x is projected onto it, the classes are well separated (see the sketch below)
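A minimal scikit-learn sketch of this supervised projection, assuming the same breast cancer data as above (with two classes, LDA can produce at most one discriminant direction):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_breast_cancer(return_X_y=True)
lda = LinearDiscriminantAnalysis(n_components=1)  # at most (n_classes - 1) components
X_lda = lda.fit_transform(X, y)                   # supervised: the class labels y are used
print(X_lda.shape)                                # (n_samples, 1): one discriminant axis
```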
Feature selection vs Dimensionality reduction
Dimensionality Reduction
– When classifying novel patterns, all features need to be computed.
– The measurement units of the original features are lost.
– PCA and LDA work on linear combinations of the features (they do not provide good results for all data distributions).
Feature Selection
– When classifying novel patterns, only a small number of features need to be computed (i.e., faster classification).
– The measurement units of the original features are preserved.
Feature Selection: steps
• Feature selection is an optimization problem
– Step 1: Search the space of possible feature
subsets
– Step 2: Pick the subset that is optimal or near-optimal with respect to some objective function.
Feature Selection: steps
Evaluation strategies
– Filter methods
– Wrapper methods
Search strategies
– Exhaustive
– Heuristic
• Sequential (SFS, etc.)
• Randomized (GA)
Feature Selection: evaluation
• Wrapper Methods
– Evaluation uses criteria related to the classification algorithm.
– The objective function is a pattern classifier, which evaluates feature subsets by their predictive accuracy (see the sketch below).
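A sketch of the wrapper idea, assuming scikit-learn, a k-NN classifier and 5-fold cross-validation as arbitrary illustrative choices: the objective function is simply the cross-validated accuracy of the classifier on a candidate feature subset.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_breast_cancer(return_X_y=True)

def wrapper_score(subset):
    """Objective function of a wrapper: cross-validated accuracy of the classifier
    trained only on the columns listed in `subset`."""
    clf = KNeighborsClassifier()
    return cross_val_score(clf, X[:, list(subset)], y, cv=5, scoring="accuracy").mean()

print(wrapper_score((0, 3, 7)))   # evaluate one candidate feature subset
```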
Feature Selection: evaluation
• Filter Methods
– Evaluation is independent of the classification algorithm.
– The objective function evaluates feature subsets by their information content, typically interclass distance, statistical dependence or information-theoretic measures (see the sketch below).
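A sketch of a filter, again assuming scikit-learn: features are ranked by an information-theoretic score (here, mutual information with the class) and the top k are kept, without ever training a classifier.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, mutual_info_classif

X, y = load_breast_cancer(return_X_y=True)

# Rank features by mutual information with the class labels and keep the top 10;
# no classifier is trained at any point.
selector = SelectKBest(score_func=mutual_info_classif, k=10).fit(X, y)
print(selector.get_support(indices=True))   # indices of the selected features
```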
Feature Selection: Wrapper pro/cons
Pros:
• Accuracy: wrappers generally achieve better recognition rates than filters, as they are tuned to the specific classifier.
• Generalization capability: wrappers have an inherent mechanism to avoid overfitting, as they are typically based on cross-validation.
Cons:
• Execution time: the wrapper must train a classifier for each feature subset, which is unfeasible in the case of computationally intensive ML methods.
• Generality: the solution is tied to a specific classifier.
Feature Selection: Filter pro/cons
Pros:
• Execution time: filters generally involve a non-iterative computation, which is usually faster than a classifier training session.
• Generality: filters evaluate intrinsic properties of the data; hence, their results may apply to a large set of ML methods.
Cons:
• Tendency to select large subsets: since the objective function of a filter is often monotonic, the filter may tend to select the full feature set as the optimal solution. This forces the user to set an arbitrary cutoff on the number of features to be selected.
Search strategies
• Assuming n features, an exhaustive search would require:
– Examining all C(n, d) = n! / (d! (n − d)!) possible subsets of size d.
– Selecting the subset that performs the best according to the criterion function.
• The number of subsets grows combinatorially with n, making exhaustive search impractical (see the sketch below).
• In practice, heuristics are used to speed up the search, but they cannot guarantee full optimality.
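A quick illustration of the combinatorics with Python's standard library (n = 100 is an arbitrary example):

```python
from math import comb

n = 100
print(comb(n, 10))                             # subsets of size d=10: C(100, 10), ~1.7e13
print(sum(comb(n, d) for d in range(n + 1)))   # all possible subsets: 2**100
```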
Example
Say we have features A, B, C and classifier M. We want to predict T given the smallest
possible subset of {A,B,C}, while achieving maximal classification performance (accuracy)
FEATURE SET CLASSIFIER PERFORMANCE
{A,B,C} M 98%
{A,B} M 98%
{A,C} M 77%
{B,C} M 56%
{A} M 89%
{B} M 90%
{C} M 91%
Example
For large sets of features it is unfeasible to search for all possible subsets exhaustively;
instead, we rely on heuristic search of the space of all possible feature subsets.
[Diagram: search lattice over the feature subsets, from the empty set (start) through the singletons {A} 89, {B} 90, {C} 91 and the pairs {A,B} 98, {A,C} 77, {B,C} 56, up to the full set {A,B,C} 98 (end).]
Example
A common example of heuristic search is hill climbing: keep adding features one at a
time, until no further improvement can be achieved.
Disadvantage: the subset space is not fully explored → sub-optimal result
Forward selection: Start from the empty set and keep adding one feature at a time
[Diagram: the same subset lattice, with the path followed by forward selection highlighted. The best single feature is {C} (91%); adding a second feature gives {A,C} 77% or {B,C} 56%, neither of which improves on 91%, so the search stops and misses the optimal {A,B} (98%).]
Example
A common example of heuristic search is hill climbing: keep adding (or removing) features one at a time, until no further improvement can be achieved.
Disadvantage: the subset space is not fully explored → sub-optimal result
Backward elimination: Start from the full set and keep removing one feature at a time
[Diagram: the subset lattice explored by backward elimination. Starting from {A,B,C} (98%), removing one feature gives {A,B} 98%, {A,C} 77% or {B,C} 56%; the search keeps {A,B} (98%) and stops, since dropping a further feature ({A} 89% or {B} 90%) only hurts.]
Naïve Search
Train the ML model with each individual feature, taken alone. Measure
classification accuracy.
Sort the given n features in order of the classification accuracy obtained with
each of them.
Select the top d features from this sorted list.
Disadvantages:
Correlation among features is not considered.
The best pair of features may not even contain the best individual feature.
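A sketch of the naïve search, assuming scikit-learn, a decision tree and cross-validated accuracy as arbitrary scoring choices:

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
clf = DecisionTreeClassifier(random_state=0)

# Score each feature on its own, then keep the d individually best ones.
scores = [cross_val_score(clf, X[:, [j]], y, cv=5).mean() for j in range(X.shape[1])]
d = 5
best_d = np.argsort(scores)[::-1][:d]
print(best_d)                       # note: correlations between features are ignored
```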
Sequential forward selection (SFS) (heuristic search)
• First, the best single feature is selected (i.e., using classification
accuracy as criterion function).
• Then, pairs of features are formed using one of the remaining features
and this best feature, and the best pair is selected.
• Next, triplets of features are formed using one of the remaining
features and these two best features, and the best triplet is selected.
• This procedure continues until a predefined number of features is selected, or until all of them have been added.
SFS performs best when
the optimal subset is small.
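scikit-learn ships a sequential selector that implements this greedy forward search; note that, unlike the "until no improvement" variant, it stops at a fixed number of features. A sketch with arbitrary choices (k-NN, 5 features, 5-fold CV):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.neighbors import KNeighborsClassifier

X, y = load_breast_cancer(return_X_y=True)

# Greedy forward search: start from the empty set and repeatedly add the single
# feature that most improves cross-validated accuracy.
sfs = SequentialFeatureSelector(KNeighborsClassifier(), n_features_to_select=5,
                                direction="forward", cv=5)
print(sfs.fit(X, y).get_support(indices=True))
```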
Example
Results of sequential forward feature selection for classification of a satellite image using 28 features. The x-axis shows the classification accuracy (%) and the y-axis shows the features added at each iteration (the first iteration is at the bottom). The highest accuracy value is shown with a star.
Sequential backward selection (SBS) (heuristic search)
• First, the criterion function is computed for all n features.
• Then, each feature is deleted one at a time, the criterion function
is computed for all subsets with n-1 features, and the worst
feature is discarded.
• Next, each feature among the remaining n-1 is deleted one at a
time, and the worst feature is discarded to form a subset with
n-2 features.
• This procedure continues until a predefined number of features (possibly none) is left.
SBS performs best when the
optimal subset is large.
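The same scikit-learn selector covers backward elimination by switching the search direction (again with arbitrary choices of classifier and subset size):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.neighbors import KNeighborsClassifier

X, y = load_breast_cancer(return_X_y=True)

# Greedy backward search: start from the full set and repeatedly drop the feature
# whose removal hurts cross-validated accuracy the least.
sbs = SequentialFeatureSelector(KNeighborsClassifier(), n_features_to_select=5,
                                direction="backward", cv=5)
print(sbs.fit(X, y).get_support(indices=True))
```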
Example
Results of sequential backward feature selection for classification of a satellite image using 28 features. The x-axis shows the classification accuracy (%) and the y-axis shows the features removed at each iteration (the first iteration is at the top). The highest accuracy value is shown with a star.
SFS vs SBS
• If there is no way to guess the approximate size of the
optimal feature set, the methods are equivalent
• As the search is heuristic, there is no guarantee that the
two approaches, even on the same dataset, will
converge to the same result!
Limitations of SFS and SBS
• The main limitation of SFS is that it is unable to remove features that become no longer useful after the addition of other features.
• The main limitation of SBS is its inability to re-evaluate the usefulness of a feature after it has been discarded.
• There are some generalizations of SFS and SBS:
– Bidirectional search (BDS)
– "Plus-L, minus-R" selection (LRS)
– …and many more
Bidirectional search (BDS)
• BDS applies SFS and SBS
simultaneously:
– SFS is performed from the
empty set
– SBS is performed from the full set
• To guarantee that SFS and SBS
converge to the same solution:
– Features already selected by SFS are
not removed by SBS.
– Features already removed by
SBS are not added by SFS.
“Plus-L, minus-R” selection (LRS)
• A generalization of SFS and SBS
– If L > R, LRS starts from the empty set and:
• repeatedly adds L features
• repeatedly removes R features
– If L < R, LRS starts from the full set and:
• repeatedly removes R features
• repeatedly adds L features
Main limitation: how to choose the optimal values of L and R?
Genetic Algorithms (GAs)
• A way to perform feature selection via a randomized search is to use GAs
• What are GAs?
– A global optimization technique for searching very large solution spaces.
– Inspired by the biological mechanisms of natural selection and reproduction.
• Main characteristics of GAs:
– They search probabilistically using a population of possible solutions (encoded, e.g., as bit strings: 10010110, 01100011, 10100100, …).
– Each solution is properly encoded as a string of symbols.
– They use an objective (or fitness) function to evaluate the "goodness" of each solution.
Genetic Algorithms - Encoding
• Each solution in the search space is represented as a finite
length string (chromosome) over some finite set of
symbols.
e.g., using binary encoding
(11,6,9) →(1011_0110_1001)→(101101101001)
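A tiny sketch of this binary encoding (the helper name and the 4-bit width are illustrative):

```python
def encode(values, bits=4):
    """Binary-encode a tuple of integers into a single chromosome string."""
    return "".join(format(v, f"0{bits}b") for v in values)

print(encode((11, 6, 9)))   # -> '101101101001', as in the example above
```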
Genetic Algorithms - Fitness evaluation
A fitness function is used to evaluate the goodness of each solution.
The fitness function is “problem specific”.
Fitness = f (decode(chromosome))
Genetic Algorithms - Searching
[Diagram: a population of encoded solutions (the current generation) is transformed by the GA operators (selection, crossover, mutation) into the next generation of encoded solutions.]
Genetic Algorithms - Selection
Probabilistically filters out solutions that perform poorly,
choosing high performance solutions to exploit:
– Rank chromosomes based on fitness
– Copy the "best" chromosome as it is to the next generation (elitism)
– Set a random fitness threshold
– Select chromosomes with fitness higher than the threshold as parents of the next generation of chromosomes
– Generate the new chromosomes by crossover and mutation
– Replace the old generation with the new one
Genetic Algorithms - Crossover
Explore new solutions
Crossover: information exchange between strings.
Generate new chromosomes that, hopefully, will retain good features from
the previous generation
It is applied to randomly selected pairs of chromosomes with a probability
equal to a given crossover rate.
(10011110, 10110010) → (10010010, 10111110)
Genetic Algorithms - Mutation
Explore new solutions
Mutation: restore lost genetic material.
It protects GAs against irrecoverable loss of good solution features
It changes a character of some chromosome with a probability equal to a very
low given mutation rate.
10011110 → 10011010
↑
mutated bit
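A sketch of the two operators just described, single-point crossover and bit-flip mutation, with illustrative rate values:

```python
import random

def crossover(p1, p2, crossover_rate=0.8):
    """Single-point crossover between two chromosome strings."""
    if random.random() > crossover_rate:
        return p1, p2                          # no exchange: return copies of the parents
    cut = random.randint(1, len(p1) - 1)       # random cut point
    return p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]

def mutate(chrom, mutation_rate=0.01):
    """Flip each bit independently with a very low mutation rate."""
    return "".join(("1" if b == "0" else "0") if random.random() < mutation_rate else b
                   for b in chrom)

print(crossover("10011110", "10110010"))
print(mutate("10011110"))
```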
Genetic Algorithms - Steps
1. Initialize a population with randomly generated individuals
2. Evaluate the fitness of each individual
3. Selection - reproduce high-fitness chromosomes in the new population, removing poor-fitness chromosomes
4. Crossover - construct new chromosomes
5. Mutation - recover lost features
6. Repeat steps 2-6 until some stopping criterion is met (e.g. population fitness, number of iterations)
Feature Selection using GAs
GAs provide a simple, general, and powerful framework for feature selection
via a randomized search of the solutions.
[Diagram: pipeline Pre-Processing → Feature Extraction → Feature Selection (GA) → Feature Subset → Classifier.]
Feature Selection using GAs
Binary encoding: 1 means "choose the feature" and 0 means "do not choose the feature".
• Fitness evaluation (to be maximized)
Fitness = w1 × accuracy + w2 × #zeros
where accuracy is the classification accuracy measured on a validation set, #zeros is the number of discarded features, and w1 >> w2 (a sketch follows below).
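A sketch of this fitness function, assuming scikit-learn and substituting cross-validated accuracy for the validation-set accuracy mentioned above; the weights and the classifier are arbitrary illustrative choices.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_breast_cancer(return_X_y=True)
w1, w2 = 1.0, 0.01                  # w1 >> w2: accuracy dominates, sparsity breaks ties

def fitness(chromosome):
    """Fitness of a binary chromosome: w1 * accuracy + w2 * (# discarded features)."""
    mask = np.array(list(chromosome)) == "1"
    if not mask.any():
        return 0.0                               # an empty subset is worthless
    acc = cross_val_score(KNeighborsClassifier(), X[:, mask], y, cv=5).mean()
    return w1 * acc + w2 * (len(chromosome) - mask.sum())

print(fitness("1" * X.shape[1]))     # fitness of the chromosome selecting all features
```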
Summary
• Different types of approaches
– Feature selection
• Filters, Wrappers
• Heuristics, GAs
– Dimensionality reduction
• PCA, LDA, etc.
• Unclear how to tell in advance whether reducing the number of features will work
• The only known way is to check, but for very high-dimensional data it helps most of the time
• How many features?
• Just try! Perform cross-validation
• There is no "best" method… just methods that work well for your data