Feature Engineering
Machine Learning
Ivan Smetannikov
15.06.2016
Lecture plan
• Feature engineering
• Data preprocessing
• How to obtain new features
• Visualization
• Analysis
• Special cases
Feature examples
• In computer vision – a line in the image
• In NLP – a phrase or a word
• In speech recognition – a single word or a phoneme
Example:
• Suppose an item has a categorical attribute Item_Color that
can be Red, Blue or Unknown
• You can create a new binary feature Has_Color and
assign it a value of “1” when the item has a color and “0”
when the color is unknown
• Alternatively, you could create a binary feature for each value
that Item_Color can take. This would give three binary
attributes: Is_Red, Is_Blue and Is_Unknown
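The two encodings above can be sketched in a few lines of Python. This is a minimal illustration, not a library API: the function name encode_item_color and the sample list items are assumptions made for the example; only the feature names come from the slide.

```python
def encode_item_color(color):
    """Derive the binary features described above from one Item_Color value."""
    return {
        # 1 when the item has a known color, 0 when the color is unknown
        "Has_Color": 0 if color == "Unknown" else 1,
        # One binary attribute per possible value of Item_Color
        "Is_Red": int(color == "Red"),
        "Is_Blue": int(color == "Blue"),
        "Is_Unknown": int(color == "Unknown"),
    }

# Hypothetical sample values, one per category
items = ["Red", "Blue", "Unknown"]
encoded = [encode_item_color(color) for color in items]

for color, row in zip(items, encoded):
    print(color, row)
```

Note that Has_Color carries the same information as Is_Unknown (one is the inverse of the other), so in practice only one of the two encodings would normally be kept.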
• Important
• Many approaches
• Can be done several times during analysis
The features of X have been transformed from $(X_1, X_2)$ to $(X_1, X_2, X_1^2, X_1 X_2, X_2^2)$
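As a rough sketch of this transformation, the degree-2 expansion can be written directly in plain Python. The function name degree2_features and the sample rows are assumptions for illustration only; the output order follows the tuple given above.

```python
def degree2_features(x1, x2):
    """Map (x1, x2) to (x1, x2, x1^2, x1*x2, x2^2)."""
    return (x1, x2, x1 ** 2, x1 * x2, x2 ** 2)

# Hypothetical sample rows of the original two-feature matrix X
rows = [(1.0, 2.0), (3.0, -1.0)]
expanded = [degree2_features(x1, x2) for x1, x2 in rows]

for original, new in zip(rows, expanded):
    print(original, "->", new)
```

A linear model fitted on the expanded features can represent quadratic decision boundaries in the original (X1, X2) space, which is the usual motivation for this kind of expansion.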
Machine learning. Lecture 7. Feature Engineering. 15.06.2016. 22
Data Preprocessing (Custom Transform)
weka.filters.unsupervised.attribute:
• Domain-specific features:
If you have length, breadth, and height as separate
variables, you can create a new volume feature as the
product of these three variables.
• Variable-specific features:
Some variable types, such as text features or features that
capture the structure of a web page or of a sentence, have
generic ways of processing that help extract structure and
context. For example, the text “the fox jumped over the
fence” can be represented with unigrams (the, fox, jumped,
over, fence) or with bigrams (the fox, fox jumped, jumped
over, over the, the fence).
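Both kinds of derived features can be sketched in plain Python. This is a minimal sketch under stated assumptions: the helper names volume and ngrams are hypothetical, and tokens are obtained by lowercasing and splitting on whitespace, which is a simplification rather than a general tokenizer.

```python
def volume(length, breadth, height):
    """Domain-specific feature: product of the three dimensions."""
    return length * breadth * height

def ngrams(text, n):
    """Variable-specific feature: all runs of n consecutive tokens."""
    tokens = text.lower().split()  # assumption: whitespace tokenization
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

sentence = "the fox jumped over the fence"
unigrams = ngrams(sentence, 1)
bigrams = ngrams(sentence, 2)

# "the" occurs twice in the sentence; dropping repeats while keeping
# first-seen order gives the distinct unigrams listed above.
unique_unigrams = list(dict.fromkeys(unigrams))

print(volume(2, 3, 4))
print(unique_unigrams)
print(bigrams)
```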
• Usually language-dependent
• Many supervised and unsupervised methods
• “Bag of words” paradigm, POS tagging and rule
mining
• Many problems from sentiment analysis to
language understanding
• Deep learning networks
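To make the “bag of words” paradigm concrete, here is a minimal sketch using only the Python standard library. The function name bag_of_words and the two sample documents are assumptions for illustration, and whitespace tokenization is a deliberate simplification: as noted above, real tokenization is usually language-dependent.

```python
from collections import Counter

def bag_of_words(documents):
    """Return a sorted vocabulary and one token-count vector per document."""
    # Assumption: lowercase + whitespace split stands in for a real tokenizer
    tokenized = [doc.lower().split() for doc in documents]
    vocabulary = sorted({token for doc in tokenized for token in doc})
    vectors = []
    for doc in tokenized:
        counts = Counter(doc)  # missing tokens count as 0
        vectors.append([counts[token] for token in vocabulary])
    return vocabulary, vectors

# Hypothetical sample documents
docs = ["the fox jumped over the fence", "the fence is red"]
vocabulary, vectors = bag_of_words(docs)

print(vocabulary)
for vector in vectors:
    print(vector)
```

Each document becomes a fixed-length vector of counts, so word order is discarded; n-grams are one common way to recover some of the local order.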