
Machine Learning

Aigerim Bogyrbayeva
Lecture 2
Agenda

● Feature engineering
○ Feature selection
○ Feature extraction
■ In Computer Vision
■ In Natural Language Processing
But first, what did we cover in the previous class?

1) What makes a supervised model different from an unsupervised one?

a) Differences are only in the name
b) Different libraries are used
c) The depth of the ML models
d) The existence of ground truth labels
But first, what did we cover in the previous class?

2) Which data subset does not exist?


a) Training
b) Testing
c) Justification
d) Validation
But first, what did we cover in the previous class?

3) What is a true positive?


a) Ground truth: True, Predicted: True
b) Ground truth: False, Predicted: False
c) Ground truth: True, Predicted: False
d) Ground truth: False, Predicted: True
But first, what did we cover in the previous class?

4) How do you calculate accuracy?


a) (tp + tn) / (tp + tn + fp + fn)
b) tp / (tp + fp)
c) tp / (tp + fn)
d) (tp + tn) / (tp + tn + fp)
But first, what did we cover in the previous class?

5) How do you calculate the F1 score?


a) Weighted mean of precision and recall
b) Geometric mean of precision and recall
c) Arithmetic mean of precision and recall
d) Harmonic mean of precision and recall
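A minimal Python sketch of the formulas behind questions 4 and 5 (the confusion-matrix counts are made-up numbers for illustration):

# Confusion-matrix counts (made-up numbers for illustration)
tp, tn, fp, fn = 40, 45, 5, 10

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
# F1 is the harmonic mean of precision and recall
f1 = 2 * precision * recall / (precision + recall)

print(f"accuracy={accuracy:.2f}, precision={precision:.2f}, "
      f"recall={recall:.2f}, f1={f1:.2f}")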
What is a Feature in Machine Learning?

● The quantity to be modelled is called the label, outcome, dependent variable, or response, and is denoted by y
● The variables used for modelling are called predictors, features, or independent variables, and are denoted by X
Example: Predicting the Price of a House

● Outcome, Y: the price of the house
● Features, X: the location, the number of bedrooms, the square meters, etc.
Feature Engineering

● There is no single way of representing data
● The location of a house can be represented by latitude and longitude, by the neighborhood name, or by the street
● Feature Engineering is the process of creating representations of data that increase the effectiveness of a model
Why does Feature Engineering Matter?

● Example: a binary classification problem with 2 features A and B


Why does Feature Engineering Matter?

● Take the inverse of each feature
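
A minimal NumPy sketch of this trick; the data is randomly generated for illustration:

import numpy as np

# Illustrative feature matrix: two positive-valued features A and B
rng = np.random.default_rng(0)
X = rng.uniform(0.5, 5.0, size=(100, 2))

# The engineered representation: the inverse of each feature.
# A class boundary that is curved in (A, B) space may become
# linear (and thus easy for simple models) in (1/A, 1/B) space.
X_inv = 1.0 / X

print(X[:3])
print(X_inv[:3])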


How does Feature Engineering affect Models?

● Some models cannot tolerate predictors that measure the same underlying quantity (i.e., multicollinearity, or correlation between predictors)
● Many models cannot use samples with any missing values (see the sketch below)
● Some models are severely compromised when irrelevant predictors are in the data

● Most of the time, feature engineering will have more effect than model selection!
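A hedged pandas sketch of the first two points, imputing missing values and dropping one predictor from a highly correlated pair; the toy DataFrame and the 0.98 correlation threshold are illustrative assumptions, not a prescribed recipe:

import numpy as np
import pandas as pd

# Toy data: sqm and sqft measure the same underlying quantity,
# and bedrooms has a missing value
df = pd.DataFrame({
    "sqm":  [50, 75, 100, 120],
    "sqft": [538, 807, 1076, 1291],
    "bedrooms": [1, 4, np.nan, 2],
})

# Impute missing values with the column mean
# (many models cannot handle NaNs at all)
df = df.fillna(df.mean())

# Drop one predictor from every pair whose absolute correlation
# exceeds an (arbitrary) 0.98 threshold
corr = df.corr().abs()
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
to_drop = [c for c in upper.columns if (upper[c] > 0.98).any()]
df = df.drop(columns=to_drop)  # sqft is dropped, sqm is kept
print(df)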
How do we see the world?
How do computers see the world?
Computer Vision

Computer vision tasks include methods for acquiring, processing, analyzing, and understanding digital images, and for extracting high-dimensional data from the real world in order to produce numerical or symbolic information, e.g., in the form of decisions.
How do we see images?
How do computers see images?

[Figure: a color image is stored as three matrices of pixel values, one for each of the Blue (B), Green (G), and Red (R) channels]
How do computers see images?

[Figure: a grayscale image is a single matrix of pixel intensities]
How do computers see images?

[Figure: the B, G, and R pixel matrices of a 1600x1200 image]

● 1600x1200 = 1 920 000 pixels per channel
● But what if we have 1 million images?
● 1 920 000 x 1 000 000 = 1 920 000 000 000 pixel values per channel
What could be a solution? (2-3 minutes to think)
Get the important features

[Figure: an image, 1000 px]
Get the important features

1 920 000 000 000 vs 1000
Get the important features

[Figure: a 1x1000 feature vector: 1 5 6 7 ... 5 6 4 6]


Image features
1) Hand-crafted features
2) Generic features
3) Deep learning features
Hand-crafted features
1) Contrast
2) Edges
3) …
4) Saturation

[Figure: contrast, edge, and saturation measurements concatenated into a feature vector]
Image features
1) Hand-crafted features
2) Generic features
3) Deep learning features
Histogram of Oriented Gradients (HOG)

N. Dalal and B. Triggs, researchers at the French National Institute for Research in Computer Science and Control (INRIA), first described their work in a June 2005 CVPR paper.
Image Gradients

Simply a measure of the change in pixel values along the x-direction and the y-direction around each pixel.
Image Gradients

We just take the right value minus the left value and say that the rate of change in the x-direction is 38 (94 - 56 = 38).
Image Gradients

93 - 55 = 38 in the y-direction
Image Gradients

[Figure: for each pixel, the gradients dx and dy give a magnitude sqrt(dx^2 + dy^2) and a direction arctan(dy / dx)]
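A minimal NumPy sketch of these quantities on a made-up 3x3 grayscale patch; the pixel values are chosen to reproduce the slide's 94 - 56 and 93 - 55 examples, and the sign convention for dy varies between implementations:

import numpy as np

# Illustrative 3x3 grayscale patch centered on the pixel of interest
patch = np.array([[10, 93, 20],
                  [56, 80, 94],
                  [30, 55, 40]], dtype=float)

dx = patch[1, 2] - patch[1, 0]  # right minus left: 94 - 56 = 38
dy = patch[0, 1] - patch[2, 1]  # top minus bottom: 93 - 55 = 38

magnitude = np.sqrt(dx**2 + dy**2)          # ~53.7
direction = np.degrees(np.arctan2(dy, dx))  # 45 degrees
print(dx, dy, magnitude, direction)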
Gradient

The gradient is a measure of change.
Histogram of Oriented Gradients (HOG)

Suppose we have a 128x64 image. The whole algorithm works as follows:
1. Calculate the gradient magnitude and direction of the image (the derivatives with respect to x and y)
2. Divide the image into 16x16 blocks with 50% overlap (how many blocks do we have in total?)
Histogram of Oriented Gradients (HOG)

Suppose we have a 128x64 image. The whole algorithm works as follows:
1. Calculate the gradient magnitude and direction of the image (the derivatives with respect to x and y)
2. Divide the image into 16x16 blocks with 50% overlap: ((128/8) - 1) * ((64/8) - 1) = 15 * 7 = 105 blocks
3. Each block consists of 2x2 cells of 8x8 pixels
Histogram of Oriented Gradients (HOG)

4. For each cell, quantize the gradient orientation into 9 bins (0 - 180 degrees) and interpolate the votes between neighboring bins

[Figure: a 9-bin orientation histogram]
Histogram of Oriented Gradients (HOG)

5. Concatenate all histograms into one, which will be the feature vector. In our case the length of the feature vector will be 15 * 7 blocks * (2 * 2) cells per block * 9 bins = 3780
HOG features

[Figure: the HOG feature vector, 1 5 6 7 ... 5 6 4 6, of size 1x3780]
HOG features

The script on the slides, walked through line by line (the line numbers refer to the code shown as an image):
● Line 8: OpenCV will be required
● Line 9: Read the image (using OpenCV; the image name is angelhack.jpg) and assign it to the image variable
● Lines 11-15: Parameters whose meaning is fairly clear
● Lines 19-20: Parameters whose meaning is less obvious
● Line 22: Create the HOG descriptor object using the OpenCV library and pass it all the previous parameters
● Line 27: Compute the HOG features
● Lines 29-30: Show the feature vector length and the feature vector itself
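The script itself appears only as an image on the original slides; below is a minimal sketch following the linked Stack Overflow answer, assuming the input image angelhack.jpg and the standard 64x128 detection window (values such as winSigma and nlevels are common defaults, not confirmed from the slides):

import cv2

# Read the image and assign it to the image variable
image = cv2.imread("angelhack.jpg")

# Parameters whose meaning is fairly clear
winSize = (64, 128)      # width x height of the detection window
blockSize = (16, 16)
blockStride = (8, 8)     # 50% overlap between blocks
cellSize = (8, 8)
nbins = 9

# Parameters whose meaning is less obvious (common default values)
derivAperture = 1
winSigma = 4.0
histogramNormType = 0
L2HysThreshold = 0.2
gammaCorrection = 0
nlevels = 64

# Create the HOG descriptor object and pass it all the parameters
hog = cv2.HOGDescriptor(winSize, blockSize, blockStride, cellSize,
                        nbins, derivAperture, winSigma,
                        histogramNormType, L2HysThreshold,
                        gammaCorrection, nlevels)

# Compute the HOG features for one window location
winStride = (8, 8)
padding = (8, 8)
locations = ((10, 20),)
features = hog.compute(image, winStride, padding, locations)

# Show the feature vector length and the feature vector itself:
# 15 * 7 blocks * (2 * 2) cells * 9 bins = 3780
print(len(features))
print(features)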
HOG features (20 minutes to do it)

https://stackoverflow.com/questions/6090399/get-hog-image-features-from-opencv-python
ImageNet (14 million images)
Low-level visual features: edges, contrast, saturation, …
High-level visual features: style, object orientation

[Figure: these features feed into a classifier]
Image features
1) Hand-crafted features
2) Generic features
3) Deep learning features
Natural Language Processing
Word Embeddings

Word embedding is a feature learning technique in natural language processing (NLP) where words or phrases from the vocabulary are mapped to vectors of real numbers.
Word Embeddings
Hello
Bye
Friend

Say
Word Embeddings
Hello → 1, 5, 3, 23, 5, 2
Bye → 4, 7, 23, 354, 23, 2
Friend → 4, 45, 13, 567, 24
… 
Say → 3434, 546, 235, 7
One-Hot encoding
When we search for phrases such as “hotels in New Jersey” in Google, we
want results pertaining to “motel”, “lodging”, “accommodation” in New
Jersey returned as well. And if we are using one-hot encoding, these words
have no natural notion of similarity.
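A minimal sketch of one-hot encoding over a toy vocabulary (the word list is illustrative); the dot product between any two distinct one-hot vectors is 0, so "hotel" and "motel" look no more similar than any other pair:

import numpy as np

vocab = ["hotel", "motel", "lodging", "banana"]
index = {word: i for i, word in enumerate(vocab)}

def one_hot(word):
    # A vector of zeros with a single 1 at the word's index
    vec = np.zeros(len(vocab))
    vec[index[word]] = 1.0
    return vec

print(one_hot("hotel") @ one_hot("motel"))  # 0.0 - no notion of similarity
print(one_hot("hotel") @ one_hot("hotel"))  # 1.0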
Word embeddings
Word2Vec
Doc2Vec
GloVe
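A hedged sketch of training Word2Vec with the gensim library; the library choice, the toy corpus, and the parameter values are assumptions for illustration, since the slides name only the embedding methods:

from gensim.models import Word2Vec

# Toy corpus: a list of tokenized sentences
sentences = [
    ["hello", "friend"],
    ["bye", "friend"],
    ["say", "hello"],
]

# vector_size is the embedding dimension (gensim >= 4.0 API)
model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, seed=0)

print(model.wv["hello"])                    # a 50-dimensional real vector
print(model.wv.similarity("hello", "bye"))  # cosine similarity of two words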
Word embeddings
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod
tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim
veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea
commodo consequat. Duis aute irure dolor in reprehenderit in voluptate
velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat
cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est
laborum.
Word embeddings
lorem ipsum dolor sit amet consectetur adipiscing elit sed do eiusmod tempor incididunt ut labore et dolore magna aliqua ut enim ad minim veniam quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur excepteur sint occaecat cupidatat non proident sunt in culpa qui officia deserunt mollit anim id est laborum

1) Delete punctuation and convert to lowercase
Word embeddings
lorem ipsum dolor sit amet consectetur adipiscing elit sed do eiusmod tempor incididunt ut labore et dolore magna aliqua ut

1) Delete punctuation and convert to lowercase
2) Take the first 20 words only (seems enough)
3) Convert to vectors using word embeddings
4) Concatenate
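A minimal sketch of these four steps; the embedding lookup is a hypothetical table of random vectors standing in for a real Word2Vec/GloVe model:

import re
import numpy as np

text = ("Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do "
        "eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut "
        "enim ad minim veniam, quis nostrud exercitation ullamco laboris.")

# 1) Delete punctuation and convert to lowercase
words = re.sub(r"[^\w\s]", "", text.lower()).split()

# 2) Take the first 20 words only
words = words[:20]

# 3) Convert to vectors using word embeddings
#    (hypothetical lookup: random 6-dim vectors stand in for real embeddings)
rng = np.random.default_rng(0)
embedding = {w: rng.normal(size=6) for w in set(words)}
vectors = [embedding[w] for w in words]

# 4) Concatenate into one feature vector: 20 words * 6 dims = 120 values
feature_vector = np.concatenate(vectors)
print(feature_vector.shape)  # (120,)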
Practical exercise

1) Get the deep features from any kind of Convolutional Neural Network (see the sketch below)
2) Work with any kind of word embeddings
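For exercise 1, a hedged PyTorch/torchvision sketch that uses the output of a pretrained ResNet-18 (with its classification layer removed) as a deep feature vector; the model choice and the image file name are assumptions:

import torch
from PIL import Image
from torchvision import models, transforms

# Pretrained CNN with the final classification layer stripped off
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
feature_extractor = torch.nn.Sequential(*list(model.children())[:-1])
feature_extractor.eval()

# Standard ImageNet preprocessing
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

img = Image.open("angelhack.jpg").convert("RGB")  # file name is an assumption
batch = preprocess(img).unsqueeze(0)

with torch.no_grad():
    features = feature_extractor(batch).flatten(1)  # deep features
print(features.shape)  # torch.Size([1, 512])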
Next week we will have Quiz 1 (10%)

1. Lecture 1
2. Lecture 2
