
Lecture 7

Feature Engineering

Machine Learning
Ivan Smetannikov

15.06.2016
Lecture plan
• Feature engineering
• Data preprocessing
• How to obtain new features
• Visualization
• Analysis
• Special cases





Feature engineering

• The most important step for raw data


• Synthetic feature engineering can be applied to the data: create features such as x², √x, ln x, sin x, xᵢ·xⱼ, …
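A minimal sketch of such synthetic features in Python (pandas/NumPy); the column name x and the particular transforms are illustrative only:

import numpy as np
import pandas as pd

# hypothetical frame with one raw numeric feature "x" (all values > 0 so the log is defined)
df = pd.DataFrame({"x": [0.5, 1.2, 3.4, 7.8]})

df["x_squared"] = df["x"] ** 2
df["x_sqrt"] = np.sqrt(df["x"])
df["x_log"] = np.log(df["x"])
df["x_sin"] = np.sin(df["x"])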



Feature engineering

• Binarize categorical features


• Mine patterns from time data (e.g. hour, day of week) and binarize them
• Try to binarize numeric features
• Mine text features
• Merge features that look connected
• Create problem-specific features



Feature engineering

So what is feature engineering?

Feature engineering is the process of transforming raw data into features that better represent the underlying problem to the predictive models, resulting in improved model accuracy on unseen data.



Feature engineering

Feature examples
• In computer vision – a line in the image
• In NLP – a phrase or a word
• In speech recognition – a single word or a
phoneme



Feature engineering

An estimate of the usefulness of a feature


• Features can be ranked by their scores
• The highest-scoring features can be selected for inclusion in the training dataset
• A feature may be important if it is highly
correlated with the target variable
• Feature importance scores can help you construct new features, similar to but different from those that have been estimated to be useful



Feature engineering

The process of feature engineering:


• Brainstorm features: Really get into the problem, look at
a lot of data, study feature engineering on other
problems and see what you can steal.
• Devise features: Depends on your problem, but you
may use automatic feature extraction, manual feature
construction and mixtures of the two.
• Select features: Use different feature importance
scorings and feature selection methods to prepare one or
more “views” for your models to operate upon.
• Evaluate models: Estimate model accuracy on unseen
data using the chosen features.
Feature engineering

Example:
• Suppose you have a categorical attribute Item_Color that can be Red, Blue or Unknown
• You can create a new binary feature Has_Color, and
assign it a value of “1” when an item has a color and “0”
when the color is unknown
• You could create a binary feature for each value
that Item_Color has. This would be three binary
attributes: Is_Red, Is_Blue and Is_Unknown
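A minimal sketch of both encodings with pandas; the data and column names follow the example above:

import pandas as pd

items = pd.DataFrame({"Item_Color": ["Red", "Blue", "Unknown", "Red"]})

# Has_Color: 1 when the color is known, 0 when it is unknown
items["Has_Color"] = (items["Item_Color"] != "Unknown").astype(int)

# one binary attribute per value: Is_Blue, Is_Red, Is_Unknown
items = pd.concat([items, pd.get_dummies(items["Item_Color"], prefix="Is")], axis=1)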



Lecture plan
• Feature engineering
• Data preprocessing
• How to obtain new features
• Visualization
• Analysis
• Special cases



Missing values

• Drop the objects with missing values


• Ignore the missing values (some algorithms can handle them)
• Try to fill them randomly
• Try to fill them in a clever way (MCMC
sampling)
• Use special algorithms processing this type of
uncertainty
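A minimal sketch of the simplest options with pandas; the column age is illustrative:

import pandas as pd

df = pd.DataFrame({"age": [25, None, 41, None, 37]})

df_dropped = df.dropna()                             # drop the objects with missing values
df_filled = df.fillna({"age": df["age"].median()})   # or fill with a simple statistic instead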



Noise filtering

• Important
• Many approaches
• Can be done several times during the analysis



Feature normalization and kernel selection

• Important for most of the algorithms


• Feature normalization: scale to [0;1]
• Feature standardization: mean is 0, SD is 1

• Kernel selection = distance selection.


• A good choice of kernel can solve the problem
• No universal heuristics
• Can be automated
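A minimal scikit-learn sketch of both scalings; the array X is illustrative:

import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

X = np.array([[1.0, 200.0],
              [2.0, 300.0],
              [3.0, 400.0]])

X_minmax = MinMaxScaler().fit_transform(X)       # each column scaled to [0, 1]
X_standard = StandardScaler().fit_transform(X)   # each column: mean 0, SD 1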



Object clustering

• When there are many objects, it may be useful to merge them into (small) clusters
• It may also be useful to just add the cluster label as a feature
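A minimal sketch of the second idea with scikit-learn; the data and the number of clusters are illustrative:

import numpy as np
from sklearn.cluster import KMeans

X = np.random.rand(100, 5)                       # 100 objects, 5 original features
labels = KMeans(n_clusters=3, random_state=0).fit_predict(X)

X_with_cluster = np.column_stack([X, labels])    # cluster id appended as one more feature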





Label synthesis

• Throw away rare classes


• Merge classes or build a class hierarchy
• Number → Ranking → Category
• Split complex labels into simple ones
• One-versus-rest classification
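A minimal sketch of one-versus-rest classification with scikit-learn; the synthetic dataset and base classifier are illustrative:

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

X, y = make_classification(n_samples=200, n_features=10, n_informative=5,
                           n_classes=3, random_state=0)

# one binary "this class vs. all other classes" model per class
clf = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, y)
print(clf.predict(X[:5]))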



Data Preprocessing (Python examples)

The sklearn.preprocessing package provides several common utility functions and transformer classes to change raw feature vectors into a representation that is more suitable for the downstream estimators:
• Standardization (mean removal, variance scaling)
• Normalization
• Binarization
• Encoding categorical features
• Generating polynomial features
• Custom transformers



Data Preprocessing (Standardization)
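A minimal standardization sketch with scikit-learn; the training matrix is illustrative:

import numpy as np
from sklearn import preprocessing

X_train = np.array([[1.0, -1.0,  2.0],
                    [2.0,  0.0,  0.0],
                    [0.0,  1.0, -1.0]])

X_scaled = preprocessing.scale(X_train)              # per column: mean 0, unit variance
print(X_scaled.mean(axis=0), X_scaled.std(axis=0))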



Data Preprocessing (Binarization)
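A minimal sketch with scikit-learn's Binarizer; the data and threshold are illustrative:

from sklearn.preprocessing import Binarizer

X = [[1.0, -1.0, 2.0],
     [2.0,  0.0, 0.0],
     [0.0,  1.0, -1.0]]

# values above the threshold become 1, the rest become 0
X_binary = Binarizer(threshold=0.0).fit_transform(X)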



Data Preprocessing (Encoding)

Often features are not given as continuous values but as categorical ones.

For example, a person could have features ["male", "female"], ["from Europe", "from US", "from Asia"], ["uses Firefox", "uses Chrome", "uses Safari", "uses Internet Explorer"]. Such features can be efficiently coded as integers, for instance ["male", "from US", "uses Internet Explorer"] could be expressed as [0, 1, 3] while ["female", "from Asia", "uses Chrome"] would be [1, 2, 1].
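A minimal one-hot encoding sketch with scikit-learn, reusing the integer coding above:

from sklearn.preprocessing import OneHotEncoder

# columns: gender, continent, browser (integer-coded as in the example above)
X = [[0, 0, 3], [1, 1, 0], [0, 2, 1], [1, 0, 2]]

enc = OneHotEncoder()
enc.fit(X)
print(enc.transform([[0, 1, 3]]).toarray())
# -> [[1. 0.  0. 1. 0.  0. 0. 0. 1.]]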



Data Preprocessing (Generating)

Often it’s useful to add complexity to the model by considering nonlinear features of the input data. A simple and common method is polynomial features, which can get features’ high-order and interaction terms.

The features of X have been transformed from (X1, X2) to (1, X1, X2, X1², X1·X2, X2²).
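A minimal sketch of that transformation with scikit-learn's PolynomialFeatures; the input array is illustrative:

import numpy as np
from sklearn.preprocessing import PolynomialFeatures

X = np.arange(6).reshape(3, 2)        # columns X1, X2

poly = PolynomialFeatures(degree=2)
print(poly.fit_transform(X))          # each row becomes [1, X1, X2, X1^2, X1*X2, X2^2]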
Data Preprocessing (Custom Transform)

If you want to build a transformer that applies a log transformation in a pipeline:
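A minimal sketch with scikit-learn's FunctionTransformer; log1p is used here so that zero values do not break the transform:

import numpy as np
from sklearn.preprocessing import FunctionTransformer

log_transformer = FunctionTransformer(np.log1p)

X = np.array([[0, 1], [2, 3]])
print(log_transformer.transform(X))

The transformer can then be used as a step in a sklearn.pipeline.Pipeline like any other preprocessing object.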



Data Preprocessing (Weka)

weka.filters.unsupervised.attribute:

MathExpression – applies a function to features (for example, (A-MIN)/(MAX-MIN) normalizes a feature to [0; 1], A*10 multiplies it by 10)
Standardize – standardizes features (mean = 0, stddev = 1)
InterquartileRange – noise filtering
ReplaceMissingValues – fills in missing values
NominalToBinary – same as one-hot encoding in Python (Weka automatically converts false = 0, true = 1)
NumericToBinary – same as Binarization



Lecture plan
• Feature engineering
• Data preprocessing
• How to obtain new features
• Visualization
• Analysis
• Special cases



Feature selection

• Try it if you are a feature engineer
• Try it if there are a lot of features
• Try it anyway
• Fast and not very fast feature selection algorithms can be applied
• Some algorithms (RF) can perform feature selection by themselves! But use it wisely.
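A minimal sketch of both routes with scikit-learn (univariate scoring and random-forest importances); the synthetic dataset is illustrative:

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif

X, y = make_classification(n_samples=300, n_features=20, n_informative=5, random_state=0)

X_top10 = SelectKBest(f_classif, k=10).fit_transform(X, y)    # keep the 10 best-scoring features

rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
print(rf.feature_importances_)                                 # built-in importance scores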



Feature extraction

• Try for a small subset of features


• Try for visualization
• Try for signals
• Try for sparse data
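A minimal sketch of feature extraction with PCA in scikit-learn, e.g. for 2-D visualization; the data is illustrative:

import numpy as np
from sklearn.decomposition import PCA

X = np.random.rand(100, 30)                    # 100 objects, 30 original features

X_2d = PCA(n_components=2).fit_transform(X)    # 2 extracted features, convenient for plotting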



How to obtain new features?

Replacing missing or invalid data with more meaningful values (e.g., if you know that a missing value for a product type variable actually means it is a book, you can then replace all missing values in the product type with the value for book). A common strategy used to impute missing values is to replace missing values with the mean or median value. It is important to understand your data before choosing a strategy for replacing missing values.
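A minimal pandas sketch of the book example above; the column names and values are illustrative:

import pandas as pd

df = pd.DataFrame({"product_type": ["book", None, "dvd", None]})

# domain knowledge: a missing product type actually means it is a book
df["product_type"] = df["product_type"].fillna("book")

# numeric columns are often filled with the mean or median instead, e.g.
# df["price"] = df["price"].fillna(df["price"].median())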



How to obtain new features?

Forming Cartesian products of one variable with another.

If you have two variables, such as population density [urban, suburban, rural] and state [Washington, Oregon, California], there might be useful information in the features formed by a Cartesian product of these two variables, resulting in features [urban_Washington, suburban_Washington, rural_Washington, urban_Oregon, suburban_Oregon, rural_Oregon, urban_California, suburban_California, rural_California].
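A minimal pandas sketch of such a Cartesian-product feature; the column names follow the example above:

import pandas as pd

df = pd.DataFrame({"density": ["urban", "rural", "suburban"],
                   "state": ["Washington", "Oregon", "California"]})

# combined categorical feature, e.g. "urban_Washington"
df["density_state"] = df["density"] + "_" + df["state"]

# it can then be one-hot encoded like any other categorical feature
dummies = pd.get_dummies(df["density_state"])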



How to obtain new features?

Binning numeric features to categories.


In many cases, the relationship between a numeric feature
and the target is not linear (the feature value does not
increase or decrease monotonically with the target). In such
cases, it might be useful to bin the numeric feature into
categorical features representing different ranges of the
numeric feature. Each categorical feature (bin) can then be
modeled as having its own linear relationship with the
target.
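A minimal pandas sketch of binning a numeric feature into categories; the column, bin edges, and labels are illustrative:

import pandas as pd

df = pd.DataFrame({"age": [3, 17, 25, 40, 67, 82]})

# each bin can then get its own relationship with the target
df["age_group"] = pd.cut(df["age"],
                         bins=[0, 18, 35, 60, 120],
                         labels=["child", "young", "middle", "senior"])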



How to obtain new features?

• Domain-specific features:
If you have length, breadth, and height as separate variables, you can create a new volume feature as the product of these three variables.
• Variable-specific features:
Some variable types such as text features, features that
capture the structure of a web page, or the structure of a
sentence have generic ways of processing that help extract
structure and context. For example, forming n-grams from
text “the fox jumped over the fence” can be represented
with unigrams: the, fox, jumped, over, fence or bigrams: the
fox, fox jumped, jumped over, over the, the fence.
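A minimal sketch of unigram and bigram extraction with scikit-learn's CountVectorizer, using the sentence from the example above:

from sklearn.feature_extraction.text import CountVectorizer

text = ["the fox jumped over the fence"]

# ngram_range=(1, 2) produces both unigrams and bigrams
vectorizer = CountVectorizer(ngram_range=(1, 2))
vectorizer.fit(text)
print(sorted(vectorizer.vocabulary_))   # unigrams and bigrams found in the text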
Lecture plan
• Feature engineering
• Data preprocessing
• How to obtain new features
• Visualization
• Analysis
• Special cases



Visualization

• Important for data understanding


• Plot every step
• Plot results to analyze them
• Use dimensionality reduction for data
visualization
• Visualize data for a single feature



Lecture plan
• Feature engineering
• Data preprocessing
• How to obtain new features
• Visualization
• Analysis
• Special cases



Analysis

Do I have enough data?
How are the samples distributed?
What do I know about the features?
What is the performance measure?
What does my model require?
Is the model overfitted?
Is the model robust?
Does the model make sense?



Lecture plan
• Feature engineering
• Data preprocessing
• How to obtain new features
• Visualization
• Analysis
• Special cases



Time series prediction

• May be understood as a multiple regression problem
• Usually the loss function decreases with time
• Special methods to solve it
• Deep learning networks

• It is essential to find a proper period for scaling (it may not exist).



Signal decoding

• Video, image, speech, sound


• Specific signal preprocessing and transformation
• Usually few interpretable features
• Many filters and approaches for feature
engineering
• Deep learning networks are the state of the art

• If used as additional information, convert to text



Text processing

• Usually language-dependent
• Many supervised and unsupervised methods
• “Bag of words” paradigm, POS-tagging and rule mining
• Many problems from sentiment analysis to
language understanding
• Deep learning networks



Social media analysis

• Rich, multi-structured data


• Graphs, text, images, categories, actions, etc.
• Framework-dependent
• Extremely many approaches and problems
(including all the previous)
• Usually everything you can mine will improve
algorithm performance
