ML3 Linear Classifiers

Welcome to
the course!
LINEAR C L AS S I F I E R S I N P Y T
HON
Michael ( Mike ) Gelbart

Instructor, The University of
British Columbia
Assumed knowledge
In this course we'll assume y o u have some prior exposure to:
Python, at the level of Intermediate Python for Data Science
scikit-learn, at the level of Supervised Learning w ith

scikit - learn
supervised learning, at the level of Supervised Learning
w ith scikit -learn
LINEAR CLASSIFIERS IN PYTHON

Fitting and predicting
import sklearn.datasets
newsgroups = sklearn.datasets.fetch_20newsgroups_vectorized()
X, y = newsgroups.data, newsgroups.target
X.shape
(11314, 130107)
y.shape
(11314,)

Fitting and predicting (cont.)
from sklearn.neighbors import
KNeighborsClassifier
knn =
KNeighborsClassifier(n_neighbors=1)
knn.fit(X,y)
y_pred = knn.predict(X)

Model evaluation
knn.score(X,y)
0.99991
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y)
knn.fit(X_train, y_train)
knn.score(X_test, y_test)
0.66242

Let's practice !
LINEAR C L AS S I F I E R S I N P Y
THON
Appl y ing logistic
regression and
SVM
LINEAR C L AS S I F I E R S I N P Y T H O N

British Columbia
Using LogisticRegression
from sklearn.linear_model import
LogisticRegression
lr = LogisticRegression()
lr.fit(X_train, y_train)
lr.predict(X_test)
lr.score(X_test, y_test)

LogisticRegression example
wine = sklearn.datasets.load_wine()
from sklearn.linear_model import LogisticRegression
lr.fit(wine.data, wine.target)
lr.score(wine.data, wine.target)
0.972
lr.predict_proba(wine.data[:1])
array([[ 9.951e-01, 4.357e-03, 5.339e-04]])

Using LinearSVC
LinearSVC works the same
way:
from sklearn.svm import LinearSVC
svm = LinearSVC()
svm.fit(wine.data, wine.target)
svm.score(wine.data, wine.target)
0.893

Using
SVC
from sklearn.svm import SVC
svm = SVC() # default hyperparameters
svm.fit(wine.data, wine.target);
svm.score(wine.data, wine.target)
1.
Model complexity review:
Under fi t ing: model is too simple, low training
acc u rac y O v er fi t ing: model is too complex, low test

acc u rac y

Let's practice !
THON
Linear decision
boundaries
THON

British Columbia
Linear decision boundaries

Definitions
Vocabulary:
classi fi cation: learning to predict categories
decision boundar y: the surface separating dif erent predicted

classes
linear classifier: a classifier that learns linear

decision boundaries
e.g., logistic regression, linear SVM
linearl y separable: a data set can be perfectly explained b y

a linear classifier

Linearly separable data

Let's practice !
THON
Linear classifiers:
prediction
equations
L I N EA R C L A S S I F I ER S I NP Y T H O N
Michael (Mike) Gelbart

Instructor, The University of British
Col u mbia
Dot
Prod ucts
x = np.arange(3) np.sum(x*y)
x
14
array([0, 1, 2])
x@y
y = np.arange(3,6)
y
14
array([3, 4, 5]) x@y is called the dot

prod uct of x and y , and
x*y is
w r i t e n x ⋅ y.
array([0, 4, 10])
LINEAR CLASSIFIERS IN
Linear classifier
prediction
raw model output = coefficients ⋅ features + intercept
Linear classifier prediction : comp u te raw model outp ut , check
the sign
if positive, predict one class
if negati v e , predict the other class
This is the same for logistic regression and linear SVM

fit is di ff erent b u t predict is the same
How LogisticRegression makes
predictions
raw model output = coefficients ⋅ features + intercept
lr.fit(X,y)
lr.predict(X)[10]
lr.predict(X)[20]
How LogisticRegression makes predictions
cont.)@ X[10] + lr.intercept_ # raw model
(lr.coef_
output
array([-33.78572166])
lr.coef_ @ X[20] + lr.intercept_ # raw model

output
array([ 0.08050621])
The ra w model
output
The ra w model
output
The ra w model
output
Let's practice!
L I N EA R C L A S S I F I ER S I NP Y
THON
What is a
loss
function?
L I N EA R C L A S S I F I ER S I N
Michael Gelbart
PYTHON
Col u mbia
Least squares: the squared
loss
scikit - learn ' s LinearRegression minimi zes a loss:
n
∑(
tru
e
ith
targ
et
val
ue
−
pre
dict
ed
ith
targ LINEAR CLASSIFIERS IN
Classification errors: the 0-1
loss
Squared loss not appropriate for classi fi cation problems
( more on this later).
A nat u ral loss for classi fi cation problem is the n u mber of
errors.
This is the 0-1 loss: it's 0 for a correct prediction and 1 for
an incorrect prediction .
But this loss is hard to minimi ze !
Minimizing a
loss
from scipy.optimize import minimize
minimize(np.square, 0).x
array([0.])
minimize(np.square, 2).x
array([-1.88846401e-08])
Let's practice!
THON
Loss
function
diagrams
L I N EA R C L A S S I F I ER S I

N PYTHON
Col u mbia
The ra w model
output
0-1 loss
diagram
Linear regression loss
diagram
Logistic loss
diagram
Hinge loss
diagram
Hinge loss
diagram
Let's practice!
THON
Logistic
regression and
regularization

Col u mbia
Regularized logistic
regression
Regularized logistic
regression
How does regularization affect training
accuracy?
lr_weak_reg = LogisticRegression(C=100)
lr_strong_reg = LogisticRegression(C=0.01)
lr_weak_reg.fit(X_train, y_train)
lr_strong_reg.fit(X_train, y_train)
lr_weak_reg.score(X_train, y_train)
lr_strong_reg.score(X_train, y_train)
1.0
0.92
regularized loss = original loss + large coefficient penalty
more reg u lari z ation : lower training acc u rac y
How does regularization affect test
acc urac y?
lr_weak_reg.score(X_test, y_test)
0.86
lr_strong_reg.score(X_test, y_test)
0.88
regularized loss = original loss + large coefficient penalty
more reg u lari z ation : lower training acc u rac y
more reg u lari z ation : ( almost always) higher test acc u rac y
L1 vs. L2
reg ulari
Lasso zation
= linear regression wit h L1
reg ulari zat ion Ridge = linear regression w ith
L2 reg u lari z ation
For other models like logistic regression we

just say L1, L2, etc.
lr_L1 = LogisticRegression(penalty='l1')
lr_L2 = LogisticRegression() # penalty='l2'
by default
lr_L1.fit(X_train, y_train)
lr_L2.fit(X_train,
y_train)
plt.plot(lr_L1.coef_.flatten())
L2 vs. L1
regularization
Let's practice!
THON
Logistic
regression and
probabilities

Col u mbia
Logistic regression
probabilities
Witho u t reg u lari z ation model coe fficients :
( C = 108): [[1.55 1.57]]
model intercept : [-
0.64]
Logistic regression
probabilities
Witho u t reg u lari z ation
( C = 108):
model coe ffi cients:

[[1.55 1.57]]
model intercept : [-
0.64]
Logistic regression
probabilities
Witho u t reg u lari z ation With reg u lari z ation ( C =
( C = 108): 1):
model coe ffi cients : model coe fficients :

[[1.55 1.57]] [[0.45 0.64]]
model intercept : [- model intercept : [-

0.64] 0.26]
How are these probabilities
computed?
logistic regression predictions : sign of ra w model o u tp u t logistic
regression probabilities : "squashed" raw model o u tp u t
Let's practice!
THON
M ulti-class
logistic
regression

Col u mbia
Combining binary classifiers with one-vs-
rest
lr0.fit(X, y==0) lr1.decision_function(X)[0]
lr1.fit(X, y==1) -5.429
lr2.fit(X, y==2)
lr2.decision_function(X)[0]
# get raw model output

-7.532
lr0.decision_function(X)[0]
lr.fit(X, y)
6.124
lr.predict(X)[0]
One-vs- "Multinomial" or "softmax":
rest:
fi t a binar y classifier for fi t a single classifier for all
each class classes
predict w ith all, take largest prediction directl y o u tp u ts

o u tp u t best class
pro: simple, mod u lar con: more complicated , new
con : not directl y optimi z ing code

acc u rac y pro: tackle the problem
directl y
common for SVMs as well
possible for SVMs, b u t less
can prod u ce probabilities
common
can prod u ce probabilities
Model coefficients for multi-
class
# one-vs-rest by default lr_mn =
lr_ovr = LogisticRegression() LogisticRegression( multi
_class="multinomial",
lr_ovr.fit(X,y) solver="lbfgs")
lr_mn.fit(X,y)
lr_ovr.coef_.shape lr_mn.coef_.shape
(3,13) (3,13)
lr_ovr.intercept_.shape lr_mn.intercept_.shape
(3,) (3,)
Let's practice!
THON
Support Vectors
LINEAR C L AS S I F I E R S I N P Y T
HON

Col u mbia
What is an
Linear?classiifiers (so far )
SVM
Trained using the hinge loss and L2 reg u lari z ation

S upport v ector : a training e x ample not in the ft at part of the
loss diagram
S upport vector: an e x ample that is incorrectl y classiified or

close to the bo u ndar y
If an e x ample is not a s u pport vector, remo v ing it has no

e ff ect on the model
Ha v ing a small n u mber of s u pport vectors makes kernel SVMs

reall y fast

Ma x- margin viewpoint
The SVM maximizes the " margin " for linearl y separable
datasets
Margin : distance from the bo u ndar y to the closest

points

Ma x- margin viewpoint
The SVM maximizes the " margin " for linearl y separable
datasets
Margin : distance from the bo u ndar y to the closest

points

Let's practice !
THON
Kernel SVMs
THON

Col u mbia
Transforming your features


transformed feature =
(original feature)2

(original feature)2

(original feature)2

Kernel
SVMs
svm = SVC(gamma=1) # default is kernel="rbf"

Kernel
SVMs
svm = SVC(gamma=0.01) # default is

kernel="rbf"
smaller leads to smoother bo u ndaries

gamma

Kernel
SVMs
svm = SVC(gamma=2) # default is kernel="rbf"
larger leads to more comple x bo u ndaries

gamma

Let's practice !
THON
Comparing logistic
regression and
SVM

Col u mbia
Logistic regression: S u pport v ector machine
(SVM):
Is a linear classiifier
Is a linear classiifier
Can use w ith kernels, b u t
slow Can use w ith kernels, and
fast
O u tp u ts meaningf u l
probabilities Does not nat u rall y o u tp u t
probabilities
Can be e x tended to m u lti -
class Can be e x tended to m u lti -
All data points a ff ect ifit class

Onl y " s u pport v ectors "
L2 or L1
reg ulari zat ion a ff ect ifit
Con v entionall y just L2

reg u lari z ation

Use in scikit - learn
Logistic regression in sklearn:
linear_model.LogisticRegression
Key h y perparameters in sklearn:
C (inverse reg u lari z ation strength )

penalty ( t y pe of reg ulari z ation )
multi_class ( t y pe of multi-class )
SVM in sklearn:
svm.LinearSVC and
svm.SVC

Use in scikit - learn
(cont
Ke .)
y h y perparameters in sklearn:
C (inverse reg u lari z ation strength )

kernel ( t y pe of kernel)
gamma (inverse RBF smoothness)

SGDClassifier
SGDClassifier : scales well to large dataset s
from sklearn.linear_model import

SGDClassifier
logreg = SGDClassifier(loss='log')
linsvm = SGDClassifier(loss='hinge')
SGDClassifier h yperparameter is like 1/C

alpha

Let's practice !
THON
Conclusion
THON

Col u mbia
How does this course fit into data
science ?
Data science
→ Machine learning
→ → Supervised learning
→ → → Classiification
→ → → → Linear classiifiers (this course)

Congratulations &
thanks!

ML3 Linear Classifiers

Uploaded by

Document Information

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

ML3 Linear Classifiers

Uploaded by

Copyright:

Available Formats

Welcome to

Michael ( Mike ) Gelbart

Python, at the level of Intermediate Python for Data Science

scikit-learn, at the level of Supervised Learning w ith

LINEAR CLASSIFIERS IN PYTHON

LINEAR CLASSIFIERS IN PYTHON

LINEAR CLASSIFIERS IN PYTHON

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y)

LINEAR CLASSIFIERS IN PYTHON

Michael ( Mike ) Gelbart

LINEAR CLASSIFIERS IN PYTHON

array([[ 9.951e-01, 4.357e-03, 5.339e-04]])

LINEAR CLASSIFIERS IN PYTHON

LINEAR CLASSIFIERS IN PYTHON

Model complexity review:

Under fi t ing: model is too simple, low training

acc u rac y O v er fi t ing: model is too complex, low test

LINEAR CLASSIFIERS IN PYTHON

Michael ( Mike ) Gelbart

LINEAR CLASSIFIERS IN PYTHON

classi fi cation: learning to predict categories

decision boundar y: the surface separating dif erent predicted

linear classifier: a classifier that learns linear

linearl y separable: a data set can be perfectly explained b y

LINEAR CLASSIFIERS IN PYTHON

LINEAR CLASSIFIERS IN PYTHON

Michael (Mike) Gelbart

array([3, 4, 5]) x@y is called the dot

if negati v e , predict the other class

This is the same for logistic regression and linear SVM

lr.coef_ @ X[20] + lr.intercept_ # raw model

But this loss is hard to minimi ze !

Michael (Mike) Gelbart

Michael (Mike) Gelbart

regularized loss = original loss + large coefficient penalty

more reg u lari z ation : lower training acc u rac y

regularized loss = original loss + large coefficient penalty

more reg u lari z ation : lower training acc u rac y

reg ulari zat ion Ridge = linear regression w ith

L2 reg u lari z ation

For other models like logistic regression we

Michael (Mike) Gelbart

model coe ffi cients:

model coe ffi cients : model coe fficients :

model intercept : [- model intercept : [-

Michael (Mike) Gelbart

lr1.fit(X, y==1) -5.429

# get raw model output

predict w ith all, take largest prediction directl y o u tp u ts

pro: simple, mod u lar con: more complicated , new

con : not directl y optimi z ing code

can prod u ce probabilities

Michael ( Mike ) Gelbart

LINEAR CLASSIFIERS IN PYTHON

S upport vector: an e x ample that is incorrectl y classiified or

If an e x ample is not a s u pport vector, remo v ing it has no

Ha v ing a small n u mber of s u pport vectors makes kernel SVMs

LINEAR CLASSIFIERS IN PYTHON

Margin : distance from the bo u ndar y to the closest

LINEAR CLASSIFIERS IN PYTHON

Margin : distance from the bo u ndar y to the closest

LINEAR CLASSIFIERS IN PYTHON

Michael ( Mike ) Gelbart