Module 4:
Feature Engineering
Faculty: Santosh Chapaneri
Features
• A feature is an attribute of a data set that is used in a machine
learning process.
• Each instance in the data set is represented as a D-dimensional feature vector (one dimension per feature).
Feature Engineering
• Feature engineering refers to the process of translating a data set into features such that these features represent the data set more effectively and result in better learning performance.
• Feature selection - derive a subset of features from the full feature set
Feature Transformation
• Engineering a good feature space is a crucial prerequisite for the success of any ML model. However, it is often unclear which features are more important, and in some data sets the number of features can run into millions. To deal with this problem, feature transformation comes into play.
Feature Construction
• Feature construction involves transforming a given set of input features to generate a new set of more powerful features.
• Example: from apartment length and apartment breadth, construct a new feature apartment area = length × breadth, and exclude the apartment length and apartment breadth features when building the model.
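A minimal sketch of this construction with pandas; the data frame and column names (length, breadth, price) are illustrative assumptions, not part of the slides:

```python
import pandas as pd

# Illustrative apartment data
df = pd.DataFrame({
    "length": [10, 12, 9],     # metres
    "breadth": [8, 7, 11],     # metres
    "price": [500, 540, 610],  # target
})

# Construct the new feature, then exclude the features it was built from
df["area"] = df["length"] * df["breadth"]
df = df.drop(columns=["length", "breadth"])

print(df)
```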
• Bin the numerical data into multiple categories based on the data range
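A minimal sketch of such binning with pandas.cut; the values, bin edges, and labels are illustrative assumptions:

```python
import pandas as pd

prices = pd.Series([120, 450, 980, 310, 760])

# Bin the numeric values into three categories based on the value range
price_band = pd.cut(prices, bins=[0, 300, 700, 1000], labels=["low", "medium", "high"])
print(price_band)
```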
• The Bag-of-Words (BoW) model
• converts raw text into its constituent words, and
• counts the frequency of each word in the text
• Example (a code sketch follows the sentences):
1. Jim and Pam travelled by bus.
2. The train was late.
3. The flight was full. Traveling by flight is expensive.
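A minimal sketch of the BoW count matrix for these three sentences, assuming scikit-learn's CountVectorizer (one common way to build it):

```python
from sklearn.feature_extraction.text import CountVectorizer

docs = [
    "Jim and Pam travelled by bus.",
    "The train was late.",
    "The flight was full. Traveling by flight is expensive.",
]

vectorizer = CountVectorizer()
bow = vectorizer.fit_transform(docs)        # document-term count matrix

print(vectorizer.get_feature_names_out())   # the vocabulary (one feature per word)
print(bow.toarray())                        # word counts for each sentence
```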
Feature Extraction
• New features are created from a combination of original features.
1) The new features are distinct, i.e. the covariance between the
new features is 0.
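This zero-covariance property can be checked numerically. A minimal sketch, assuming the extraction technique being described is principal component analysis (PCA); the synthetic data is purely illustrative:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
# Original features are deliberately correlated
X = np.column_stack([x1, 2 * x1 + rng.normal(scale=0.5, size=200), rng.normal(size=200)])

Z = PCA(n_components=3).fit_transform(X)    # new features (principal components)

# Off-diagonal entries are ~0: the new features are uncorrelated (distinct)
print(np.round(np.cov(Z, rowvar=False), 6))
```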
Fisher’s idea:
Maximize a function that gives the
largest separation between the
projected class means, but also
gives a small variance within each
class, minimizing class overlap.
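A minimal sketch of Fisher's idea using scikit-learn's LinearDiscriminantAnalysis; the Iris data set is an illustrative choice, not prescribed by the slides:

```python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)

# Fisher criterion: choose projection directions that maximize between-class
# separation relative to within-class variance
lda = LinearDiscriminantAnalysis(n_components=2)
Z = lda.fit_transform(X, y)                 # project 4-D data onto 2 discriminant directions

print(Z.shape)                              # (150, 2)
print(lda.explained_variance_ratio_)        # between-class variance captured per direction
```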
Feature Selection
• The selected subset of features is expected to give better results than the full feature set.
• E.g. Age and Weight – Age is redundant since, with an increase in Age, Weight also tends to increase; the two features convey largely the same information.
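Such redundancy shows up as a high correlation between the two features. A minimal sketch with pandas; the numbers are illustrative, not from the slides:

```python
import pandas as pd

# Illustrative data in which Weight grows with Age
df = pd.DataFrame({
    "Age":    [5, 8, 10, 12, 15, 18],
    "Weight": [18, 26, 32, 41, 55, 64],
})

# Pearson correlation close to 1 signals a redundant feature pair
print(df["Age"].corr(df["Weight"]))
```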
• A = {0,1,2,5,6}, B = {0,2,3,4,5,7,9}
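Assuming these two sets illustrate the Jaccard similarity coefficient (a common set-overlap measure of feature redundancy), a minimal sketch of the computation:

```python
A = {0, 1, 2, 5, 6}
B = {0, 2, 3, 4, 5, 7, 9}

intersection = A & B       # {0, 2, 5}
union = A | B              # {0, 1, 2, 3, 4, 5, 6, 7, 9}

# Jaccard similarity = |A intersect B| / |A union B|
print(len(intersection) / len(union))   # 3 / 9 = 0.33...
```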
• The typical feature selection process has four steps: 1. subset generation, 2. subset evaluation, 3. stopping criterion, 4. result validation
• Sequential forward selection (start with an empty set and keep adding features); a code sketch follows the list below
• Four approaches:
1) Filter approach
2) Wrapper approach
3) Hybrid approach
4) Embedded approach
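A minimal sketch of sequential forward selection using scikit-learn's SequentialFeatureSelector; the estimator and data set are illustrative choices. Because each candidate feature is scored by training and cross-validating a model, this particular implementation behaves like a wrapper-style selector:

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Start from an empty set and greedily add the feature that most improves CV accuracy
sfs = SequentialFeatureSelector(
    KNeighborsClassifier(n_neighbors=3),
    n_features_to_select=2,
    direction="forward",
    cv=5,
)
sfs.fit(X, y)

print(sfs.get_support())   # boolean mask of the selected features
```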
• In the filter approach, features are scored with statistical measures, independently of any learning algorithm, and the score (or a threshold on it) decides which feature is selected.
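A minimal sketch of a filter-style selection using scikit-learn's SelectKBest; the scoring function and data set are illustrative assumptions:

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif

X, y = load_iris(return_X_y=True)

# Score each feature with an ANOVA F-test and keep the two best;
# no learning model is trained as part of the selection itself
selector = SelectKBest(score_func=f_classif, k=2)
X_new = selector.fit_transform(X, y)

print(selector.scores_)        # per-feature statistical scores
print(selector.get_support())  # mask of the selected features
```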
• In the wrapper approach, the learning model is used as a black box. The FS algorithm searches for a good feature subset using the induction algorithm itself as a part of the evaluation function.
• Since for every candidate subset the learning model is trained and the result is evaluated, the wrapper approach is computationally expensive.
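A minimal sketch of this evaluation step: one candidate feature subset is scored by training and cross-validating the model on just those columns (the model, data set, and subset are illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

candidate_subset = [2, 3]   # one candidate subset of feature indices

# The induction algorithm itself serves as the evaluation function:
# the model is trained/validated on the candidate columns and the CV score ranks the subset
score = cross_val_score(
    LogisticRegression(max_iter=1000), X[:, candidate_subset], y, cv=5
).mean()
print(score)
```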
• The hybrid approach combines the strengths of the filter and wrapper approaches.
• The embedded approach performs feature selection and classification (model training) simultaneously.
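A minimal sketch of embedded selection using an L1-regularized logistic regression wrapped in SelectFromModel; these are illustrative choices, since the slides do not name a specific model. The L1 penalty drives some coefficients to exactly zero while the model is fit, so selection happens inside the training step:

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

# Feature selection is embedded in the model fit: the L1 penalty zeroes out
# the coefficients of the features it discards
l1_model = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
selector = SelectFromModel(l1_model).fit(X, y)

print(selector.get_support())   # features kept by the embedded selection
```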