
Lecture 7

Feature Engineering

Machine Learning
Ivan Smetannikov

15.06.2016
Lecture plan
• Feature engineering
• Data preprocessing
• How to obtain new features
• Visualization
• Analysis
• Special cases





Feature engineering

• The most important step for raw data


• Synthetic feature engineering can be applied to the data: create features such as x², √x, ln x, sin x, xᵢ·xⱼ, …
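A minimal sketch of such synthetic features in Python (pandas/NumPy); the column name x and the particular transforms are illustrative only:

import numpy as np
import pandas as pd

# hypothetical frame with one raw numeric feature "x" (all values > 0 so the log is defined)
df = pd.DataFrame({"x": [0.5, 1.2, 3.4, 7.8]})

df["x_squared"] = df["x"] ** 2
df["x_sqrt"] = np.sqrt(df["x"])
df["x_log"] = np.log(df["x"])
df["x_sin"] = np.sin(df["x"])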



Feature engineering

• Binarize categorical features


• Mine patterns from time data (e.g. hour, day of week) and binarize them
• Try to binarize numeric features
• Mine text features
• Merge features that look connected
• Create problem-specific features



Feature engineering

So what is feature engineering?

Feature engineering is the process of transforming raw data into features that better represent the underlying problem to the predictive models, resulting in improved model accuracy on unseen data.



Feature engineering

Feature examples
• In computer vision – a line in the image
• In NLP – a phrase or a word
• In speech recognition – a single word or a
phoneme



Feature engineering

An estimate of the usefulness of a feature


• Features can be ranked by their scores
• The highest-scoring features can be selected for inclusion in the training dataset
• A feature may be important if it is highly
correlated with the target variable
• Feature importance scores can help you construct new features, similar to but different from those that have been estimated to be useful



Feature engineering

The process of feature engineering:


• Brainstorm features: Really get into the problem, look at
a lot of data, study feature engineering on other
problems and see what you can steal.
• Devise features: Depends on your problem, but you
may use automatic feature extraction, manual feature
construction and mixtures of the two.
• Select features: Use different feature importance
scorings and feature selection methods to prepare one or
more “views” for your models to operate upon.
• Evaluate models: Estimate model accuracy on unseen
data using the chosen features.
Feature engineering

Example:
• Suppose you have a categorical attribute Item_Color that can be Red, Blue or Unknown
• You can create a new binary feature Has_Color, and
assign it a value of “1” when an item has a color and “0”
when the color is unknown
• You could create a binary feature for each value
that Item_Color has. This would be three binary
attributes: Is_Red, Is_Blue and Is_Unknown
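A minimal sketch of both encodings with pandas; the data and column names follow the example above:

import pandas as pd

items = pd.DataFrame({"Item_Color": ["Red", "Blue", "Unknown", "Red"]})

# Has_Color: 1 when the color is known, 0 when it is unknown
items["Has_Color"] = (items["Item_Color"] != "Unknown").astype(int)

# one binary attribute per value: Is_Blue, Is_Red, Is_Unknown
items = pd.concat([items, pd.get_dummies(items["Item_Color"], prefix="Is")], axis=1)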



Lecture plan
• Feature engineering
• Data preprocessing
• How to obtain new features
• Visualization
• Analysis
• Special cases



Missing values

• Drop the objects with missing values


• Ignore the missing values (some algorithms can handle them)
• Try to fill them randomly
• Try to fill them in a clever way (MCMC
sampling)
• Use special algorithms processing this type of
uncertainty
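A minimal sketch of the simplest options with pandas; the column age is illustrative:

import pandas as pd

df = pd.DataFrame({"age": [25, None, 41, None, 37]})

df_dropped = df.dropna()                             # drop the objects with missing values
df_filled = df.fillna({"age": df["age"].median()})   # or fill with a simple statistic instead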



Noise filtering

• Important
• Many approaches
• Can be done several times during the analysis



Feature normalization and kernel selection

• Important for most of the algorithms


• Feature normalization: scale to [0;1]
• Feature standardization: mean is 0, SD is 1

• Kernel selection = distance selection.


• A good choice of kernel can solve the problem
• No universal heuristics
• Can be automated
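A minimal scikit-learn sketch of both scalings; the array X is illustrative:

import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

X = np.array([[1.0, 200.0],
              [2.0, 300.0],
              [3.0, 400.0]])

X_minmax = MinMaxScaler().fit_transform(X)       # each column scaled to [0, 1]
X_standard = StandardScaler().fit_transform(X)   # each column: mean 0, SD 1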



Object clustering

• When there are many objects, it may be useful to merge them into (small) clusters
• It may also be useful to just add the cluster label as a feature
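A minimal sketch of the second idea with scikit-learn; the data and the number of clusters are illustrative:

import numpy as np
from sklearn.cluster import KMeans

X = np.random.rand(100, 5)                       # 100 objects, 5 original features
labels = KMeans(n_clusters=3, random_state=0).fit_predict(X)

X_with_cluster = np.column_stack([X, labels])    # cluster id appended as one more feature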





Label synthesis

• Throw away rare classes


• Merge classes or build a class hierarchy
• Number → Ranking → Category
• Split complex labels into simple ones
• One-versus-rest classification
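A minimal sketch of one-versus-rest classification with scikit-learn; the synthetic dataset and base classifier are illustrative:

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

X, y = make_classification(n_samples=200, n_features=10, n_informative=5,
                           n_classes=3, random_state=0)

# one binary "this class vs. all other classes" model per class
clf = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, y)
print(clf.predict(X[:5]))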



Data Preprocessing (Python examples)

The sklearn.preprocessing package provides several common utility functions and transformer classes to change raw feature vectors into a representation that is more suitable for the downstream estimators:
• Standardization (mean removal, variance scaling)
• Normalization
• Binarization
• Encoding categorical features
• Generating polynomial features
• Custom transformers



Data Preprocessing (Standardization)
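A minimal standardization sketch with scikit-learn; the training matrix is illustrative:

import numpy as np
from sklearn import preprocessing

X_train = np.array([[1.0, -1.0,  2.0],
                    [2.0,  0.0,  0.0],
                    [0.0,  1.0, -1.0]])

X_scaled = preprocessing.scale(X_train)              # per column: mean 0, unit variance
print(X_scaled.mean(axis=0), X_scaled.std(axis=0))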



Data Preprocessing (Binarization)
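A minimal sketch with scikit-learn's Binarizer; the data and threshold are illustrative:

from sklearn.preprocessing import Binarizer

X = [[1.0, -1.0, 2.0],
     [2.0,  0.0, 0.0],
     [0.0,  1.0, -1.0]]

# values above the threshold become 1, the rest become 0
X_binary = Binarizer(threshold=0.0).fit_transform(X)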



Data Preprocessing (Encoding)

Often features are not given as continuous values but as categorical ones.

For example, a person could have features ["male", "female"], ["from Europe", "from US", "from Asia"], ["uses Firefox", "uses Chrome", "uses Safari", "uses Internet Explorer"]. Such features can be efficiently coded as integers, for instance ["male", "from US", "uses Internet Explorer"] could be expressed as [0, 1, 3] while ["female", "from Asia", "uses Chrome"] would be [1, 2, 1].
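A minimal one-hot encoding sketch with scikit-learn, reusing the integer coding above:

from sklearn.preprocessing import OneHotEncoder

# columns: gender, continent, browser (integer-coded as in the example above)
X = [[0, 0, 3], [1, 1, 0], [0, 2, 1], [1, 0, 2]]

enc = OneHotEncoder()
enc.fit(X)
print(enc.transform([[0, 1, 3]]).toarray())
# -> [[1. 0.  0. 1. 0.  0. 0. 0. 1.]]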



Data Preprocessing (Generating)

Often it’s useful to add complexity to the model by considering nonlinear features of the input data. A simple and common method is polynomial features, which can get features’ high-order and interaction terms.

The features of X have been transformed from (X1, X2) to (1, X1, X2, X1², X1·X2, X2²).
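A minimal sketch of that transformation with scikit-learn's PolynomialFeatures; the input array is illustrative:

import numpy as np
from sklearn.preprocessing import PolynomialFeatures

X = np.arange(6).reshape(3, 2)        # columns X1, X2

poly = PolynomialFeatures(degree=2)
print(poly.fit_transform(X))          # each row becomes [1, X1, X2, X1^2, X1*X2, X2^2]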
Data Preprocessing (Custom Transform)

If you want to build a transformer that applies a log transformation in a pipeline:
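A minimal sketch with scikit-learn's FunctionTransformer; log1p is used here so that zero values do not break the transform:

import numpy as np
from sklearn.preprocessing import FunctionTransformer

log_transformer = FunctionTransformer(np.log1p)

X = np.array([[0, 1], [2, 3]])
print(log_transformer.transform(X))

The transformer can then be used as a step in a sklearn.pipeline.Pipeline like any other preprocessing object.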



Data Preprocessing (Weka)

weka.filters.unsupervised.attribute:

MathExpression – applies a function to features (for example, (A-MIN)/(MAX-MIN) normalizes a feature to [0; 1], A*10 multiplies it by 10)
Standardize – standardizes features (mean = 0, stddev = 1)
InterquartileRange – noise filtering
ReplaceMissingValues – fills in missing values
NominalToBinary – same as one-hot encoding in Python (Weka automatically converts false = 0, true = 1)
NumericToBinary – same as Binarization



Lecture plan
• Feature engineering
• Data preprocessing
• How to obtain new features
• Visualization
• Analysis
• Special cases



Feature selection

• Try it if you are a feature engineer
• Try it if there are a lot of features
• Try it anyway
• Fast and not very fast feature selection algorithms can be applied
• Some algorithms (RF) can perform feature selection by themselves! But use it wisely.
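A minimal sketch of both routes with scikit-learn (univariate scoring and random-forest importances); the synthetic dataset is illustrative:

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif

X, y = make_classification(n_samples=300, n_features=20, n_informative=5, random_state=0)

X_top10 = SelectKBest(f_classif, k=10).fit_transform(X, y)    # keep the 10 best-scoring features

rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
print(rf.feature_importances_)                                 # built-in importance scores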



Feature extraction

• Try for a small subset of features


• Try for visualization
• Try for signals
• Try for sparse data
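A minimal sketch of feature extraction with PCA in scikit-learn, e.g. for 2-D visualization; the data is illustrative:

import numpy as np
from sklearn.decomposition import PCA

X = np.random.rand(100, 30)                    # 100 objects, 30 original features

X_2d = PCA(n_components=2).fit_transform(X)    # 2 extracted features, convenient for plotting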



How to obtain new features?

Replacing missing or invalid data with more meaningful values (e.g., if you know that a missing value for a product type variable actually means it is a book, you can then replace all missing values in the product type with the value for book). A common strategy used to impute missing values is to replace missing values with the mean or median value. It is important to understand your data before choosing a strategy for replacing missing values.
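A minimal pandas sketch of the book example above; the column names and values are illustrative:

import pandas as pd

df = pd.DataFrame({"product_type": ["book", None, "dvd", None]})

# domain knowledge: a missing product type actually means it is a book
df["product_type"] = df["product_type"].fillna("book")

# numeric columns are often filled with the mean or median instead, e.g.
# df["price"] = df["price"].fillna(df["price"].median())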



How to obtain new features?

Forming Cartesian products of one variable with another.

If you have two variables, such as population density [urban, suburban, rural] and state [Washington, Oregon, California], there might be useful information in the features formed by a Cartesian product of these two variables, resulting in features [urban_Washington, suburban_Washington, rural_Washington, urban_Oregon, suburban_Oregon, rural_Oregon, urban_California, suburban_California, rural_California].
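A minimal pandas sketch of such a Cartesian-product feature; the column names follow the example above:

import pandas as pd

df = pd.DataFrame({"density": ["urban", "rural", "suburban"],
                   "state": ["Washington", "Oregon", "California"]})

# combined categorical feature, e.g. "urban_Washington"
df["density_state"] = df["density"] + "_" + df["state"]

# it can then be one-hot encoded like any other categorical feature
dummies = pd.get_dummies(df["density_state"])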



How to obtain new features?

Binning numeric features to categories.


In many cases, the relationship between a numeric feature
and the target is not linear (the feature value does not
increase or decrease monotonically with the target). In such
cases, it might be useful to bin the numeric feature into
categorical features representing different ranges of the
numeric feature. Each categorical feature (bin) can then be
modeled as having its own linear relationship with the
target.
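A minimal pandas sketch of binning a numeric feature into categories; the column, bin edges, and labels are illustrative:

import pandas as pd

df = pd.DataFrame({"age": [3, 17, 25, 40, 67, 82]})

# each bin can then get its own relationship with the target
df["age_group"] = pd.cut(df["age"],
                         bins=[0, 18, 35, 60, 120],
                         labels=["child", "young", "middle", "senior"])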



How to obtain new features?

• Domain-specific features:
If you have length, breadth, and height as separate variables, you can create a new volume feature as the product of these three variables.
• Variable-specific features:
Some variable types such as text features, features that
capture the structure of a web page, or the structure of a
sentence have generic ways of processing that help extract
structure and context. For example, forming n-grams from
text “the fox jumped over the fence” can be represented
with unigrams: the, fox, jumped, over, fence or bigrams: the
fox, fox jumped, jumped over, over the, the fence.
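A minimal sketch of unigram and bigram extraction with scikit-learn's CountVectorizer, using the sentence from the example above:

from sklearn.feature_extraction.text import CountVectorizer

text = ["the fox jumped over the fence"]

# ngram_range=(1, 2) produces both unigrams and bigrams
vectorizer = CountVectorizer(ngram_range=(1, 2))
vectorizer.fit(text)
print(sorted(vectorizer.vocabulary_))   # unigrams and bigrams found in the text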
Lecture plan
• Feature engineering
• Data preprocessing
• How to obtain new features
• Visualization
• Analysis
• Special cases



Visualization

• Important for data understanding


• Plot every step
• Plot results to analyze them
• Use dimensionality reduction for data
visualization
• Visualize data for a single feature



Lecture plan
• Feature engineering
• Data preprocessing
• How to obtain new features
• Visualization
• Analysis
• Special cases



Analysis

Do I have enough data?
How are the samples distributed?
What do I know about the features?
What is the performance measure?
What does my model require?
Is the model overfitted?
Is the model robust?
Does the model make sense?



Lecture plan
• Feature engineering
• Data preprocessing
• How to obtain new features
• Visualization
• Analysis
• Special cases



Time series prediction

• May be understood as a multiple regression problem
• Usually the loss function decreases with time
• Special methods to solve it
• Deep learning networks

• It is essential to find a proper period for scaling (it may not exist).



Signal decoding

• Video, image, speech, sound


• Specific signal preprocessing and transformation
• Usually few interpretable features
• Many filters and approaches for feature
engineering
• Deep learning networks are the state of the art

• If used as additional information, convert to text



Text processing

• Usually language-dependent
• Many supervised and unsupervised methods
• “Bag of words” paradigm, POS-tagging and rule mining
• Many problems from sentiment analysis to
language understanding
• Deep learning networks



Social media analysis

• Rich, multi-structured data


• Graphs, text, images, categories, actions, etc.
• Framework-dependent
• Extremely many approaches and problems
(including all the previous)
• Usually everything you can mine will improve
algorithm performance
