Gilles Gasso
September 5, 2019
Data
Big Data : continuous increase in the amount of data generated
Twitter : 50M tweets/day (= 7 terabytes)
Facebook : 10 terabytes/day
Youtube : 50 h of video uploaded/minute
2.9 million e-mails/second
[Figure: binary classification — regions where Score(x) > 0 and where Score(x) < 0]
[Figure: matrix factorization — the purchase history of customers is approximated by the product of user features and product features]
https://katbailey.github.io/images/matrix_factorization.png
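The factorization idea in the figure can be sketched in a few lines of numpy. The rating matrix below is made up for illustration, and a truncated SVD stands in for the (unspecified) factorization algorithm:

```python
import numpy as np

# Hypothetical purchase-history matrix: rows = users, columns = products
# (the values are invented for this illustration).
R = np.array([[5, 3, 0, 1],
              [4, 0, 0, 1],
              [1, 1, 0, 5],
              [0, 1, 5, 4]], dtype=float)

# Rank-2 factorization via truncated SVD: R ~ U @ V,
# where U holds user features and V holds product features.
k = 2
u, s, vt = np.linalg.svd(R, full_matrices=False)
U = u[:, :k] * s[:k]   # user features (4 x 2)
V = vt[:k, :]          # product features (2 x 4)

R_hat = U @ V          # low-rank reconstruction of the history matrix
print(np.round(R_hat, 1))
```

The reconstructed entries in the previously empty cells can then be read as predicted affinities, which is the basis of factorization-style recommenders.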
https://www.researchgate.net/profile/Y_Nikitin/publication/270163511/figure/download/fig5/AS:295194831409153@1447391340221/MPCNN-architecture-using-alternating-convolutional-and-max-pooling-layers-13.png
https://adriancolyer.files.wordpress.com/2016/04/imagenet-fig4l.png?w=656
Gilles Gasso Introduction to Machine Learning 7 / 32
Medical diagnosis
https://arxiv.org/pdf/1506.02640.pdf
https://miro.medium.com/max/2058/1*LF5T9fsr4w2EqyFJkb-gng.png
Data
From the real world to business value:
Real World (Sensors, Industries, Transactions, Insurance, Web-clicks/logs, Internet, Mobility, Social Networks, Open data, Medical Docs, ...)
-> Data Gathering (Cleaning, Organizing, Aggregation, Management of errors, ...)
-> Data Strengthening (Exploration, Representation, Display, Machine Learning, Data Mining, ...)
-> Business (Interpretation, Patterns of interest, Strategic decisions, Valuation, ...)
Attributes
An attribute is a property or characteristic of the phenomenon being
observed. It is also called a feature or a variable.
Sample
A sample is an entity characterising an object; it is made up of attributes.
Synonyms : instance, point, vector (usually in R^d)
[Figure: 3D scatter plot of samples — axes: Variable 2: Residual Sugar, Variable 3: Chlorides, Variable 4: Sulfur; the points and their mean are displayed]
Examples
Image classification, object detection, stock price prediction . . .
Examples
Applications: customer segmentation, image segmentation, data
visualization, categorization of similar documents . . .
[Figure: 2D visualization of handwritten-digit samples — points labelled by their digit (0-9) form distinct clusters]
Objective
Identify the prediction function which minimizes the true risk, i.e.
R(f) = E[ L(Y, f(X)) ], where L is a loss function.
Common loss functions:
l1 loss (absolute deviation): L(y, f(x)) = |y - f(x)|
0-1 loss: L(y, f(x)) = 0 if y = f(x), 1 otherwise
Hinge loss: L(y, f(x)) = max(0, 1 - y f(x))
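As a quick sketch (plain numpy, function names of my own choosing), the three losses can be written as:

```python
import numpy as np

def l1_loss(y, f_x):
    """Absolute deviation |y - f(x)| (regression)."""
    return np.abs(y - f_x)

def zero_one_loss(y, f_x):
    """0 if the prediction equals the label, else 1 (classification)."""
    return np.where(y == f_x, 0.0, 1.0)

def hinge_loss(y, f_x):
    """max(0, 1 - y*f(x)) for labels y in {-1, +1} and a real-valued score f(x)."""
    return np.maximum(0.0, 1.0 - y * f_x)

print(l1_loss(2.0, 1.5))     # 0.5
print(zero_one_loss(1, -1))  # 1.0
print(hinge_loss(1, 0.3))    # 0.7
```

Note that the hinge loss penalizes not only misclassified points (y f(x) < 0) but also correctly classified points inside the margin (0 < y f(x) < 1).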
Regression (e.g. Support Vector Machine Regression)
Y is a subset of R^d.
[Figure: regression function fitted to sample data, y plotted against x]
Minimizing the true risk is not doable (in almost all practical applications).
Empirical risk
R_emp(f) = (1/N) * sum_{i=1}^{N} L(y_i, f(x_i))
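A minimal sketch of the empirical risk for a candidate predictor, using the l1 loss and toy data of my own choosing:

```python
import numpy as np

def empirical_risk(f, X, y, loss):
    """R_emp(f) = (1/N) * sum_i loss(y_i, f(x_i))."""
    predictions = f(X)
    return np.mean(loss(y, predictions))

# Toy data and a deliberately crude linear predictor f(x) = 2x.
X = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([0.0, 2.0, 4.0, 7.0])
f = lambda x: 2.0 * x

l1 = lambda y_true, y_pred: np.abs(y_true - y_pred)
print(empirical_risk(f, X, y, l1))  # mean of [0, 0, 0, 1] = 0.25
```

The empirical risk is the quantity actually minimized in practice, as an estimate of the true risk above.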
[Figure: test-set error and training-set error versus model complexity (from low to high)]
Illustration of overfitting
[Figure: empirical risk on the training (App), validation (Val), and test sets for decreasing values of k]
https://www.cs.princeton.edu/courses/archive/spring16/cos495/slides/ML_basics_lecture6_overfitting.pdf
Find in H the best function f that, learned on the training set,
will generalize well (low true risk).
Example : we look for a polynomial function of degree alpha minimizing
the empirical risk : R_emp(f_alpha) = (1/N) * sum_{i=1}^{N} (y_i - f_alpha(x_i))^2.
Goal :
1 Propose a model estimation method in order to choose (approximately)
the best model belonging to H.
2 Once the model is selected, estimate its generalization error.
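The two steps above can be sketched for the polynomial example. The synthetic data, split sizes, and candidate degrees below are assumptions of mine:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic 1-D regression data: noisy sine, randomly drawn, then split
# into training / validation / test sets.
x = rng.uniform(0, 5, 90)
y = np.sin(x) + 0.1 * rng.normal(size=x.size)
x_tr, x_val, x_te = x[:50], x[50:70], x[70:]
y_tr, y_val, y_te = y[:50], y[50:70], y[70:]

def emp_risk(coeffs, xs, ys):
    """Empirical risk with squared loss for a fitted polynomial."""
    return np.mean((ys - np.polyval(coeffs, xs)) ** 2)

# Step 1: model selection — fit one polynomial per candidate degree
# on the training set, then pick the degree with the lowest
# empirical risk on the validation set.
candidates = {d: np.polyfit(x_tr, y_tr, d) for d in range(1, 10)}
best_d = min(candidates, key=lambda d: emp_risk(candidates[d], x_val, y_val))

# Step 2: estimate the generalization error of the selected model,
# once, on the held-out test set.
print("chosen degree:", best_d)
print("estimated generalization error:", emp_risk(candidates[best_d], x_te, y_te))
```

Selecting the degree on the validation set and reporting the error on a separate test set keeps the final error estimate unbiased by the selection step.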
Note
D_test is used only once!