
Machine Learning

Aigerim Bogyrbayeva
Lecture 2
Agenda

● Feature engineering
○ Feature selection
○ Feature extraction
■ In Computer Vision
■ In Natural Language Processing
But first, what did we cover in the previous class?

1) What makes a supervised model different from an unsupervised one?

a) Differences are only in the name
b) Different libraries are used
c) The depth of the ML models
d) The existence of ground truth labels
But first, what did we cover in the previous class?

2) Which data subset does not exist?


a) Training
b) Testing
c) Justification
d) Validation
But first, what did we cover in the previous class?

3) What is a true positive?


a) Ground truth: True, Predicted: True
b) Ground truth: False, Predicted: False
c) Ground truth: True, Predicted: False
d) Ground truth: False, Predicted: True
But first, what did we cover in the previous class?

4) How do you calculate accuracy?


a) (tp + tn) / (tp + tn + fp + fn)
b) tp / (tp + fp)
c) tp / (tp + fn)
d) (tp + tn) / (tp + tn + fp)
But first, what did we cover in the previous class?

5) How do you calculate the F1 score?


a) Weighted mean of precision and recall
b) Geometric mean of precision and recall
c) Arithmetic mean of precision and recall
d) Harmonic mean of precision and recall
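A minimal Python sketch of the formulas behind questions 4 and 5 (the confusion-matrix counts are made-up numbers for illustration):

# Confusion-matrix counts (made-up numbers for illustration)
tp, tn, fp, fn = 40, 45, 5, 10

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
# F1 is the harmonic mean of precision and recall
f1 = 2 * precision * recall / (precision + recall)

print(f"accuracy={accuracy:.2f}, precision={precision:.2f}, "
      f"recall={recall:.2f}, f1={f1:.2f}")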
What is a Feature in Machine Learning?

● The quantity to be modelled is called the label, outcome, dependent variable, or response, and is denoted by y
● The variables used for modelling are called predictors, features, or independent variables, and are denoted by X
Example: Predicting the Price of a House

● Outcome, Y: the price of the house
● Features, X: the location, the number of bedrooms, the square meters, etc.
Feature Engineering

● There is no single way of representing data
● The location of a house can be represented by latitude and longitude, by the neighborhood name, or by the street
● Feature Engineering is the process of creating representations of data that increase the effectiveness of a model
Why does Feature Engineering Matter?

● Example: a binary classification problem with 2 features A and B


Why does Feature Engineering Matter?

● Take the inverse of each feature
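
A minimal NumPy sketch of this trick; the data is randomly generated for illustration:

import numpy as np

# Illustrative feature matrix: two positive-valued features A and B
rng = np.random.default_rng(0)
X = rng.uniform(0.5, 5.0, size=(100, 2))

# The engineered representation: the inverse of each feature.
# A class boundary that is curved in (A, B) space may become
# linear (and thus easy for simple models) in (1/A, 1/B) space.
X_inv = 1.0 / X

print(X[:3])
print(X_inv[:3])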


How does Feature Engineering affect Models?

● Some models cannot tolerate predictors that measure the same underlying quantity (i.e., multicollinearity, or correlation between predictors)
● Many models cannot use samples with any missing values (see the sketch below)
● Some models are severely compromised when irrelevant predictors are in the data

● Most of the time, feature engineering will have more effect than model selection!
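A hedged pandas sketch of the first two points, imputing missing values and dropping one predictor from a highly correlated pair; the toy DataFrame and the 0.98 correlation threshold are illustrative assumptions, not a prescribed recipe:

import numpy as np
import pandas as pd

# Toy data: sqm and sqft measure the same underlying quantity,
# and bedrooms has a missing value
df = pd.DataFrame({
    "sqm":  [50, 75, 100, 120],
    "sqft": [538, 807, 1076, 1291],
    "bedrooms": [1, 4, np.nan, 2],
})

# Impute missing values with the column mean
# (many models cannot handle NaNs at all)
df = df.fillna(df.mean())

# Drop one predictor from every pair whose absolute correlation
# exceeds an (arbitrary) 0.98 threshold
corr = df.corr().abs()
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
to_drop = [c for c in upper.columns if (upper[c] > 0.98).any()]
df = df.drop(columns=to_drop)  # sqft is dropped, sqm is kept
print(df)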
How do we see the world?
How do computers see the world?
Computer Vision

Computer vision tasks include methods for acquiring, processing, analyzing, and understanding digital images, and for extracting high-dimensional data from the real world in order to produce numerical or symbolic information, e.g., in the form of decisions.
How do we see images?
How do computers see images?

[Figure: a color image is stored as three matrices of pixel values, one for each of the Blue (B), Green (G), and Red (R) channels]
How do computers see images?

[Figure: a grayscale image is a single matrix of pixel intensities]
How do computers see images?

[Figure: the B, G, and R pixel matrices of a 1600x1200 image]

● 1600x1200 = 1 920 000 pixels per channel
● But what if we have 1 million images?
● 1 920 000 x 1 000 000 = 1 920 000 000 000 pixel values per channel
What could be a solution? (2-3 minutes to think)
Get the important features

[Figure: an image, 1000 px]
Get the important features

1 920 000 000 000 vs 1000
Get the important features

[Figure: a 1x1000 feature vector: 1 5 6 7 ... 5 6 4 6]


Image features
1) Hand-crafted features
2) Generic features
3) Deep learning features
Hand-crafted features
1) Contrast
2) Edges
3) …
4) Saturation

[Figure: contrast, edge, and saturation measurements concatenated into a feature vector]
Image features
1) Hand-crafted features
2) Generic features
3) Deep learning features
Histogram of Oriented Gradients (HOG)

N. Dalal and B. Triggs, researchers at the French National Institute for Research in Computer Science and Control (INRIA), first described their work in a June 2005 CVPR paper.
Image Gradients

Simply a measure of the change in pixel values along the x-direction and the y-direction around each pixel.
Image Gradients

We just take the right value minus the left value and say that the rate of change in the x-direction is 38 (94 - 56 = 38).
Image Gradients

93 - 55 = 38 in the y-direction
Image Gradients

[Figure: for each pixel, the gradients dx and dy give a magnitude sqrt(dx^2 + dy^2) and a direction arctan(dy / dx)]
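A minimal NumPy sketch of these quantities on a made-up 3x3 grayscale patch; the pixel values are chosen to reproduce the slide's 94 - 56 and 93 - 55 examples, and the sign convention for dy varies between implementations:

import numpy as np

# Illustrative 3x3 grayscale patch centered on the pixel of interest
patch = np.array([[10, 93, 20],
                  [56, 80, 94],
                  [30, 55, 40]], dtype=float)

dx = patch[1, 2] - patch[1, 0]  # right minus left: 94 - 56 = 38
dy = patch[0, 1] - patch[2, 1]  # top minus bottom: 93 - 55 = 38

magnitude = np.sqrt(dx**2 + dy**2)          # ~53.7
direction = np.degrees(np.arctan2(dy, dx))  # 45 degrees
print(dx, dy, magnitude, direction)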
Gradient

The gradient is a measure of change.
Histogram of Oriented Gradients (HOG)

Suppose we have a 128x64 image. The whole algorithm works as follows:
1. Calculate the gradient magnitude and direction of the image (the derivatives with respect to x and y)
2. Divide the image into 16x16 blocks with 50% overlap (how many blocks do we have in total?)
Histogram of Oriented Gradients (HOG)

Suppose we have a 128x64 image. The whole algorithm works as follows:
1. Calculate the gradient magnitude and direction of the image (the derivatives with respect to x and y)
2. Divide the image into 16x16 blocks with 50% overlap: ((128/8) - 1) * ((64/8) - 1) = 15 * 7 = 105 blocks
3. Each block consists of 2x2 cells of 8x8 pixels
Histogram of Oriented Gradients (HOG)

4. For each cell, quantize the gradient orientation into 9 bins (0 - 180 degrees) and interpolate the votes between neighboring bins

[Figure: a 9-bin orientation histogram]
Histogram of Oriented Gradients (HOG)

5. Concatenate all histograms into one, which will be the feature vector. In our case the length of the feature vector will be 15 * 7 blocks * (2 * 2) cells per block * 9 bins = 3780
HOG features

[Figure: the HOG feature vector, 1 5 6 7 ... 5 6 4 6, of size 1x3780]
HOG features

The script on the slides, walked through line by line (the line numbers refer to the code shown as an image):
● Line 8: OpenCV will be required
● Line 9: Read the image (using OpenCV; the image name is angelhack.jpg) and assign it to the image variable
● Lines 11-15: Parameters whose meaning is fairly clear
● Lines 19-20: Parameters whose meaning is less obvious
● Line 22: Create the HOG descriptor object using the OpenCV library and pass it all the previous parameters
● Line 27: Compute the HOG features
● Lines 29-30: Show the feature vector length and the feature vector itself
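The script itself appears only as an image on the original slides; below is a minimal sketch following the linked Stack Overflow answer, assuming the input image angelhack.jpg and the standard 64x128 detection window (values such as winSigma and nlevels are common defaults, not confirmed from the slides):

import cv2

# Read the image and assign it to the image variable
image = cv2.imread("angelhack.jpg")

# Parameters whose meaning is fairly clear
winSize = (64, 128)      # width x height of the detection window
blockSize = (16, 16)
blockStride = (8, 8)     # 50% overlap between blocks
cellSize = (8, 8)
nbins = 9

# Parameters whose meaning is less obvious (common default values)
derivAperture = 1
winSigma = 4.0
histogramNormType = 0
L2HysThreshold = 0.2
gammaCorrection = 0
nlevels = 64

# Create the HOG descriptor object and pass it all the parameters
hog = cv2.HOGDescriptor(winSize, blockSize, blockStride, cellSize,
                        nbins, derivAperture, winSigma,
                        histogramNormType, L2HysThreshold,
                        gammaCorrection, nlevels)

# Compute the HOG features for one window location
winStride = (8, 8)
padding = (8, 8)
locations = ((10, 20),)
features = hog.compute(image, winStride, padding, locations)

# Show the feature vector length and the feature vector itself:
# 15 * 7 blocks * (2 * 2) cells * 9 bins = 3780
print(len(features))
print(features)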
HOG features (20 minutes to do it)

https://stackoverflow.com/questions/6090399/get-hog-image-features-from-opencv-python
ImageNet (14 million images)
Low-level visual features: edges, contrast, saturation, …
High-level visual features: style, object orientation

[Figure: these features feed into a classifier]
Image features
1) Hand-crafted features
2) Generic features
3) Deep learning features
Natural Language Processing
Word Embeddings

Word embedding is a feature learning technique in natural language processing (NLP) where words or phrases from the vocabulary are mapped to vectors of real numbers.
Word Embeddings
Hello
Bye
Friend

Say
Word Embeddings
Hello → 1, 5, 3, 23, 5, 2
Bye → 4, 7, 23, 354, 23, 2
Friend → 4, 45, 13, 567, 24
… 
Say → 3434, 546, 235, 7
One-Hot encoding
When we search for phrases such as “hotels in New Jersey” in Google, we
want results pertaining to “motel”, “lodging”, “accommodation” in New
Jersey returned as well. And if we are using one-hot encoding, these words
have no natural notion of similarity.
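A minimal sketch of one-hot encoding over a toy vocabulary (the word list is illustrative); the dot product between any two distinct one-hot vectors is 0, so "hotel" and "motel" look no more similar than any other pair:

import numpy as np

vocab = ["hotel", "motel", "lodging", "banana"]
index = {word: i for i, word in enumerate(vocab)}

def one_hot(word):
    # A vector of zeros with a single 1 at the word's index
    vec = np.zeros(len(vocab))
    vec[index[word]] = 1.0
    return vec

print(one_hot("hotel") @ one_hot("motel"))  # 0.0 - no notion of similarity
print(one_hot("hotel") @ one_hot("hotel"))  # 1.0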
Word embeddings
Word2Vec
Doc2Vec
GloVe
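A hedged sketch of training Word2Vec with the gensim library; the library choice, the toy corpus, and the parameter values are assumptions for illustration, since the slides name only the embedding methods:

from gensim.models import Word2Vec

# Toy corpus: a list of tokenized sentences
sentences = [
    ["hello", "friend"],
    ["bye", "friend"],
    ["say", "hello"],
]

# vector_size is the embedding dimension (gensim >= 4.0 API)
model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, seed=0)

print(model.wv["hello"])                    # a 50-dimensional real vector
print(model.wv.similarity("hello", "bye"))  # cosine similarity of two words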
Word embeddings
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod
tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim
veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea
commodo consequat. Duis aute irure dolor in reprehenderit in voluptate
velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat
cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est
laborum.
Word embeddings
lorem ipsum dolor sit amet consectetur adipiscing elit sed do eiusmod tempor incididunt ut labore et dolore magna aliqua ut enim ad minim veniam quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur excepteur sint occaecat cupidatat non proident sunt in culpa qui officia deserunt mollit anim id est laborum

1) Delete punctuation and convert to lowercase
Word embeddings
lorem ipsum dolor sit amet consectetur adipiscing elit sed do eiusmod tempor incididunt ut labore et dolore magna aliqua ut

1) Delete punctuation and convert to lowercase
2) Take the first 20 words only (seems enough)
3) Convert to vectors using word embeddings
4) Concatenate
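A minimal sketch of these four steps; the embedding lookup is a hypothetical table of random vectors standing in for a real Word2Vec/GloVe model:

import re
import numpy as np

text = ("Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do "
        "eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut "
        "enim ad minim veniam, quis nostrud exercitation ullamco laboris.")

# 1) Delete punctuation and convert to lowercase
words = re.sub(r"[^\w\s]", "", text.lower()).split()

# 2) Take the first 20 words only
words = words[:20]

# 3) Convert to vectors using word embeddings
#    (hypothetical lookup: random 6-dim vectors stand in for real embeddings)
rng = np.random.default_rng(0)
embedding = {w: rng.normal(size=6) for w in set(words)}
vectors = [embedding[w] for w in words]

# 4) Concatenate into one feature vector: 20 words * 6 dims = 120 values
feature_vector = np.concatenate(vectors)
print(feature_vector.shape)  # (120,)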
Practical exercise

1) Get the deep features from any kind of Convolutional Neural Network (see the sketch below)
2) Work with any kind of word embeddings
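For exercise 1, a hedged PyTorch/torchvision sketch that uses the output of a pretrained ResNet-18 (with its classification layer removed) as a deep feature vector; the model choice and the image file name are assumptions:

import torch
from PIL import Image
from torchvision import models, transforms

# Pretrained CNN with the final classification layer stripped off
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
feature_extractor = torch.nn.Sequential(*list(model.children())[:-1])
feature_extractor.eval()

# Standard ImageNet preprocessing
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

img = Image.open("angelhack.jpg").convert("RGB")  # file name is an assumption
batch = preprocess(img).unsqueeze(0)

with torch.no_grad():
    features = feature_extractor(batch).flatten(1)  # deep features
print(features.shape)  # torch.Size([1, 512])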
Next week we will have Quiz 1 (10%)

1. Lecture 1
2. Lecture 2
