
Welcome to the course!
LINEAR CLASSIFIERS IN PYTHON

Michael (Mike) Gelbart
Instructor, The University of British Columbia
Assumed knowledge
In this course we'll assume you have some prior exposure to:

Python, at the level of Intermediate Python for Data Science

scikit-learn, at the level of Supervised Learning with scikit-learn

supervised learning, at the level of Supervised Learning with scikit-learn



Fitting and predicting
import sklearn.datasets

newsgroups = sklearn.datasets.fetch_20newsgroups_vectorized()

X, y = newsgroups.data, newsgroups.target

X.shape

(11314, 130107)

y.shape

(11314,)



Fitting and predicting (cont.)
from sklearn.neighbors import KNeighborsClassifier

knn = KNeighborsClassifier(n_neighbors=1)

knn.fit(X,y)

y_pred = knn.predict(X)



Model evaluation
knn.score(X,y)

0.99991

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y)

knn.fit(X_train, y_train)

knn.score(X_test, y_test)

0.66242
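The gap between training and test accuracy can be reproduced on a smaller data set. The following is a minimal sketch, assuming scikit-learn's bundled digits data in place of the newsgroups data and a fixed random_state chosen only for repeatability; the scores it prints will differ from the ones shown in the slides.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Assumption: the small bundled digits data set stands in for the newsgroups data.
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

knn = KNeighborsClassifier(n_neighbors=1)
knn.fit(X_train, y_train)

# With n_neighbors=1 each training point is its own nearest neighbor,
# so training accuracy is optimistic; the held-out score is the honest one.
train_acc = knn.score(X_train, y_train)
test_acc = knn.score(X_test, y_test)
print(f"train accuracy: {train_acc:.3f}, test accuracy: {test_acc:.3f}")
```

Scoring on data the model has already seen mostly measures memorization, which is why the held-out score is the one to report.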

Let's practice!

Applying logistic regression and SVM
Using LogisticRegression
from sklearn.linear_model import LogisticRegression

lr = LogisticRegression()
lr.fit(X_train, y_train)
lr.predict(X_test)
lr.score(X_test, y_test)



LogisticRegression example
import sklearn.datasets
wine = sklearn.datasets.load_wine()
from sklearn.linear_model import LogisticRegression
lr = LogisticRegression()
lr.fit(wine.data, wine.target)
lr.score(wine.data, wine.target)

0.972

lr.predict_proba(wine.data[:1])

array([[ 9.951e-01, 4.357e-03, 5.339e-04]])
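To make the meaning of predict_proba concrete, here is a small sketch under stated assumptions: the features are standardized in a pipeline (the slides fit on raw features), and the variable names are illustrative. It checks that each row of probabilities sums to one and that predict returns the most probable class.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

wine = load_wine()

# Assumption: features are standardized first so the solver converges quickly;
# the slides fit on the raw features instead.
model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(wine.data, wine.target)

proba = model.predict_proba(wine.data)  # shape (n_samples, n_classes)
pred = model.predict(wine.data)

# Each row is a probability distribution over the classes ...
rows_sum_to_one = np.allclose(proba.sum(axis=1), 1.0)
# ... and predict returns the class with the largest probability.
pred_is_argmax = np.array_equal(pred, model.classes_[proba.argmax(axis=1)])
print(rows_sum_to_one, pred_is_argmax)
```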



Using LinearSVC
LinearSVC works the same way:
import sklearn.datasets

wine = sklearn.datasets.load_wine()
from sklearn.svm import LinearSVC

svm = LinearSVC()

svm.fit(wine.data, wine.target)
svm.score(wine.data, wine.target)

0.893

Using SVC
import sklearn.datasets
wine = sklearn.datasets.load_wine()
from sklearn.svm import SVC
svm = SVC() # default hyperparameters
svm.fit(wine.data, wine.target)
svm.score(wine.data, wine.target)

1.0

Model complexity review:

Underfitting: model is too simple, low training accuracy

Overfitting: model is too complex, low test accuracy
Let's practice!

Linear decision boundaries


Definitions
Vocabulary:

classification: learning to predict categories

decision boundary: the surface separating different predicted classes

linear classifier: a classifier that learns linear decision boundaries
e.g., logistic regression, linear SVM

linearly separable: a data set can be perfectly explained by a linear classifier

Linearly separable data

Let's practice!

Linear classifiers: prediction equations
Dot products
import numpy as np

x = np.arange(3)
x

array([0, 1, 2])

y = np.arange(3,6)
y

array([3, 4, 5])

x*y

array([ 0,  4, 10])

np.sum(x*y)

14

x@y

14

x@y is called the dot product of x and y, and is written x ⋅ y.

Linear classifier prediction
raw model output = coefficients ⋅ features + intercept
Linear classifier prediction: compute raw model output, check the sign

if positive, predict one class

if negative, predict the other class

This is the same for logistic regression and linear SVM

fit is different but predict is the same
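The prediction rule can be checked by hand. This is a minimal sketch, assuming a binary data set bundled with scikit-learn (breast cancer, standardized) rather than the data used in the slides; it computes the raw model output from coef_ and intercept_ and compares the sign-based prediction with what predict returns.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Assumption: a binary data set bundled with scikit-learn, standardized for convergence.
X, y = load_breast_cancer(return_X_y=True)
X = StandardScaler().fit_transform(X)

lr = LogisticRegression().fit(X, y)

# raw model output = coefficients . features + intercept, for every example at once
raw = X @ lr.coef_.ravel() + lr.intercept_[0]

# Positive raw output -> second class in lr.classes_, otherwise the first class.
manual_pred = np.where(raw > 0, lr.classes_[1], lr.classes_[0])

print(np.allclose(raw, lr.decision_function(X)))
print(np.array_equal(manual_pred, lr.predict(X)))
```

The same computation applies to a fitted LinearSVC, since only fitting differs between the two models.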

How LogisticRegression makes predictions
raw model output = coefficients ⋅ features + intercept
lr = LogisticRegression()

lr.fit(X,y)

lr.predict(X)[10]

lr.predict(X)[20]

How LogisticRegression makes predictions (cont.)
lr.coef_ @ X[10] + lr.intercept_ # raw model output

array([-33.78572166])

lr.coef_ @ X[20] + lr.intercept_ # raw model output

array([ 0.08050621])

The raw model output

Let's practice!

What is a loss function?
Least squares: the squared loss
scikit-learn's LinearRegression minimizes a loss:

$$\sum_{i=1}^{n} \left(\text{true } i\text{th target value} - \text{predicted } i\text{th target value}\right)^2$$
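To illustrate what "minimizes a loss" means, the sketch below writes the squared loss as a Python function and minimizes it numerically with scipy.optimize.minimize, then compares the result with LinearRegression. The synthetic data, the function name squared_loss, and the parameter layout (coefficients followed by the intercept) are assumptions made for this example.

```python
import numpy as np
from scipy.optimize import minimize
from sklearn.linear_model import LinearRegression

# Assumption: a small synthetic regression problem, used only for illustration.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = X @ np.array([2.0, -1.0]) + 0.5 + 0.1 * rng.normal(size=50)

def squared_loss(params):
    """Sum of squared differences between true and predicted targets."""
    coef, intercept = params[:-1], params[-1]
    y_pred = X @ coef + intercept
    return np.sum((y - y_pred) ** 2)

# Minimize the loss numerically, starting from all zeros.
result = minimize(squared_loss, np.zeros(X.shape[1] + 1))

lin = LinearRegression().fit(X, y)
print(result.x[:-1], result.x[-1])
print(lin.coef_, lin.intercept_)
```

Both approaches should land on nearly the same coefficients, because they minimize the same quantity.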
Classification errors: the 0-1 loss
Squared loss is not appropriate for classification problems (more on this later).
A natural loss for a classification problem is the number of errors.

This is the 0-1 loss: it's 0 for a correct prediction and 1 for an incorrect prediction.

But this loss is hard to minimize!

Minimizing a loss
import numpy as np
from scipy.optimize import minimize

minimize(np.square, 0).x

array([0.])

minimize(np.square, 2).x

array([-1.88846401e-08])

Let's practice!

Loss function diagrams

The raw model output

0-1 loss diagram

Linear regression loss diagram

Logistic loss diagram

Hinge loss diagram
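The curves in these diagrams can be written as short functions of the raw model output. This is a sketch under one stated convention: the raw model output is taken relative to the true class, so positive values mean a correct prediction. The function names are illustrative, not part of scikit-learn.

```python
import numpy as np

# Assumed convention: positive raw model output means the example is classified correctly.

def zero_one_loss(raw_model_output):
    # 1 when the example is misclassified (negative output), 0 otherwise
    return np.where(raw_model_output < 0, 1.0, 0.0)

def log_loss(raw_model_output):
    # logistic loss: smooth, never exactly zero
    return np.log(1 + np.exp(-raw_model_output))

def hinge_loss(raw_model_output):
    # hinge loss: exactly zero once the output reaches 1 (the flat part)
    return np.maximum(0, 1 - raw_model_output)

grid = np.array([-2.0, 0.0, 1.0, 3.0])
print(zero_one_loss(grid))
print(log_loss(grid))
print(hinge_loss(grid))
```

Unlike the 0-1 loss, the logistic and hinge losses change gradually with the raw model output, which gives a numerical optimizer something to follow.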

Let's practice!

Logistic regression and regularization

Regularized logistic regression

How does regularization affect training accuracy?
lr_weak_reg = LogisticRegression(C=100)
lr_strong_reg = LogisticRegression(C=0.01)

lr_weak_reg.fit(X_train, y_train)
lr_strong_reg.fit(X_train, y_train)

lr_weak_reg.score(X_train, y_train)
lr_strong_reg.score(X_train, y_train)

1.0
0.92

regularized loss = original loss + large coefficient penalty

more regularization: lower training accuracy

How does regularization affect test accuracy?
lr_weak_reg.score(X_test, y_test)

0.86

lr_strong_reg.score(X_test, y_test)

0.88

regularized loss = original loss + large coefficient penalty

more regularization: lower training accuracy

more regularization: (almost always) higher test accuracy
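The effect of C can be explored with a short loop. The sketch below assumes the breast cancer data bundled with scikit-learn and a fixed train/test split, neither of which comes from the slides; it records the size of the coefficients alongside training and test accuracy for weak, moderate, and strong regularization.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Assumption: bundled data set and fixed split, chosen only for illustration.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

results = {}
for C in [100, 1, 0.01]:  # smaller C means stronger regularization
    lr = LogisticRegression(C=C, max_iter=10000).fit(X_train, y_train)
    results[C] = {
        "coef_norm": np.linalg.norm(lr.coef_),
        "train_acc": lr.score(X_train, y_train),
        "test_acc": lr.score(X_test, y_test),
    }
    print(C, results[C])
```

The coefficient norm shrinks as C decreases, which is the "large coefficient penalty" at work; how the accuracies respond depends on the data set.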

L1 vs. L2 regularization
Lasso = linear regression with L1 regularization

Ridge = linear regression with L2 regularization

For other models like logistic regression we just say L1, L2, etc.

import matplotlib.pyplot as plt

# the default lbfgs solver does not support penalty='l1'; liblinear does
lr_L1 = LogisticRegression(penalty='l1', solver='liblinear')
lr_L2 = LogisticRegression() # penalty='l2' by default

lr_L1.fit(X_train, y_train)
lr_L2.fit(X_train, y_train)

plt.plot(lr_L1.coef_.flatten())
L2 vs. L1 regularization
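One practical difference between the two penalties is sparsity. As a hedged illustration, the sketch below fits both on the bundled breast cancer data (an assumption, as is the value C=0.1) and counts how many coefficients are exactly zero under each penalty.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Assumption: bundled data set and C=0.1, chosen only for illustration.
X, y = load_breast_cancer(return_X_y=True)
X = StandardScaler().fit_transform(X)

# liblinear supports both penalties, so the two fits differ only in the penalty.
lr_L1 = LogisticRegression(penalty="l1", solver="liblinear", C=0.1).fit(X, y)
lr_L2 = LogisticRegression(penalty="l2", solver="liblinear", C=0.1).fit(X, y)

n_zero_L1 = int(np.sum(lr_L1.coef_ == 0))
n_zero_L2 = int(np.sum(lr_L2.coef_ == 0))
print("zero coefficients with L1:", n_zero_L1)
print("zero coefficients with L2:", n_zero_L2)
```

L1 tends to set some coefficients exactly to zero, acting as a form of feature selection, while L2 only shrinks them.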

Let's practice!

Logistic regression and probabilities
Logistic regression probabilities
Without regularization (C = 10^8):

model coefficients: [[1.55 1.57]]

model intercept: [-0.64]

With regularization (C = 1):

model coefficients: [[0.45 0.64]]

model intercept: [-0.26]

How are these probabilities computed?
logistic regression predictions: sign of raw model output

logistic regression probabilities: "squashed" raw model output
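The "squashing" is done by the sigmoid function, which maps any real number into the interval (0, 1). A minimal sketch, assuming a binary problem (the bundled breast cancer data, standardized) and a hand-written sigmoid helper:

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Assumption: a binary data set bundled with scikit-learn, standardized for convergence.
X, y = load_breast_cancer(return_X_y=True)
X = StandardScaler().fit_transform(X)
lr = LogisticRegression().fit(X, y)

def sigmoid(z):
    # squashes a raw model output into a probability between 0 and 1
    return 1 / (1 + np.exp(-z))

raw = lr.decision_function(X)   # raw model output
manual_proba = sigmoid(raw)     # probability of the positive class

print(np.allclose(manual_proba, lr.predict_proba(X)[:, 1]))
```

A raw model output of zero maps to a probability of 0.5, which is why checking the sign and thresholding the probability at 0.5 give the same prediction.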

Let's practice!

Multi-class logistic regression
Combining binary classifiers with one-vs-rest
lr0.fit(X, y==0)

lr1.fit(X, y==1)

lr2.fit(X, y==2)

# get raw model output
lr0.decision_function(X)[0]

6.124

lr1.decision_function(X)[0]

-5.429

lr2.decision_function(X)[0]

-7.532

lr.fit(X, y)
lr.predict(X)[0]
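The one-vs-rest procedure can be written out end to end. The following sketch assumes the wine data with standardized features and builds the list of binary classifiers itself; names such as binary_models and raw_outputs are illustrative.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Assumption: wine data with standardized features, chosen for illustration.
X, y = load_wine(return_X_y=True)
X = StandardScaler().fit_transform(X)
classes = np.unique(y)

# One binary classifier per class: "is this example class k, or not?"
binary_models = [LogisticRegression().fit(X, y == k) for k in classes]

# Column k holds the raw model output of the classifier for class k.
raw_outputs = np.column_stack([m.decision_function(X) for m in binary_models])

# Predict the class whose classifier is most confident.
ovr_pred = classes[raw_outputs.argmax(axis=1)]
ovr_acc = np.mean(ovr_pred == y)
print(raw_outputs[0], ovr_pred[0], ovr_acc)
```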

One-vs-rest:

fit a binary classifier for each class

predict with all, take largest output

pro: simple, modular

con: not directly optimizing accuracy

common for SVMs as well

can produce probabilities

"Multinomial" or "softmax":

fit a single classifier for all classes

prediction directly outputs best class

con: more complicated, new code

pro: tackle the problem directly

possible for SVMs, but less common

can produce probabilities

Model coefficients for multi-class
# one-vs-rest by default in older scikit-learn releases;
# newer releases default to multinomial
lr_ovr = LogisticRegression()
lr_ovr.fit(X,y)

lr_ovr.coef_.shape

(3,13)

lr_ovr.intercept_.shape

(3,)

lr_mn = LogisticRegression(multi_class="multinomial", solver="lbfgs")
lr_mn.fit(X,y)

lr_mn.coef_.shape

(3,13)

lr_mn.intercept_.shape

(3,)

Let's practice!

Support Vectors
What is an SVM?
Linear classifiers (so far)

Trained using the hinge loss and L2 regularization

Support vector: a training example not in the flat part of the loss diagram

Support vector: an example that is incorrectly classified or close to the boundary

If an example is not a support vector, removing it has no effect on the model

Having a small number of support vectors makes kernel SVMs really fast
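The claim that non-support vectors do not affect the model can be tested directly. This is a sketch assuming two well-separated synthetic clusters from make_blobs (the centers and random_state are arbitrary choices); it refits a linear SVC on the support vectors alone and compares predictions.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Assumption: two well-separated synthetic clusters, chosen for illustration.
X, y = make_blobs(n_samples=100, centers=[[-3, -3], [3, 3]],
                  cluster_std=1.0, random_state=0)

svm = SVC(kernel="linear").fit(X, y)
print("number of examples:", len(X))
print("number of support vectors:", len(svm.support_))

# Refit using only the support vectors; the other examples should not matter.
X_sv, y_sv = X[svm.support_], y[svm.support_]
svm_sv = SVC(kernel="linear").fit(X_sv, y_sv)

agreement = np.mean(svm.predict(X) == svm_sv.predict(X))
print("fraction of identical predictions:", agreement)
```

Small numerical differences between the two fits are possible because the solver stops at a tolerance, but the decision boundary should be essentially the same.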

Max-margin viewpoint
The SVM maximizes the "margin" for linearly separable datasets

Margin: distance from the boundary to the closest points

Let's practice!

Kernel SVMs
Transforming your features

transformed feature = (original feature)^2
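Squaring the features can turn a problem that is not linearly separable into one that is. The sketch below assumes synthetic 2-D data whose class depends on distance from the origin, and a linear-kernel SVC with C=10 (both are choices made for this example); it compares a linear classifier on the original features with one on the squared features.

```python
import numpy as np
from sklearn.svm import SVC

# Assumption: synthetic 2-D data where the class depends on distance from the origin,
# so no straight line separates the classes in the original features.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(300, 2))
y = (X[:, 0] ** 2 + X[:, 1] ** 2 < 0.5).astype(int)

linear_raw = SVC(kernel="linear", C=10).fit(X, y)
acc_raw = linear_raw.score(X, y)

# transformed feature = (original feature)^2
X_sq = X ** 2
linear_sq = SVC(kernel="linear", C=10).fit(X_sq, y)
acc_sq = linear_sq.score(X_sq, y)

print("accuracy, original features:", acc_raw)
print("accuracy, squared features: ", acc_sq)
```

In the squared feature space the circular boundary becomes a straight line, so a linear classifier there corresponds to a nonlinear boundary in the original space.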

Kernel SVMs
from sklearn.svm import SVC

svm = SVC(gamma=1) # default is kernel="rbf"

svm = SVC(gamma=0.01) # default is kernel="rbf"

smaller gamma leads to smoother boundaries

svm = SVC(gamma=2) # default is kernel="rbf"

larger gamma leads to more complex boundaries
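The effect of gamma on model complexity shows up in training and test accuracy. A minimal sketch, assuming the bundled breast cancer data, standardized features, and three arbitrary gamma values:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Assumption: bundled data set, fixed split, and gamma values chosen for illustration.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

scores = {}
for gamma in [0.001, 0.01, 10]:
    svm = SVC(gamma=gamma).fit(X_train, y_train)  # kernel="rbf" by default
    scores[gamma] = (svm.score(X_train, y_train), svm.score(X_test, y_test))
    print(f"gamma={gamma}: train={scores[gamma][0]:.3f}, test={scores[gamma][1]:.3f}")
```

A very large gamma lets the model fit the training set almost perfectly, which is a sign of overfitting when the test score does not follow.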

Let's practice!

Comparing logistic regression and SVM
Logistic regression:

Is a linear classifier

Can use with kernels, but slow

Outputs meaningful probabilities

Can be extended to multi-class

All data points affect fit

L2 or L1 regularization

Support vector machine (SVM):

Is a linear classifier

Can use with kernels, and fast

Does not naturally output probabilities

Can be extended to multi-class

Only "support vectors" affect fit

Conventionally just L2 regularization

Use in scikit-learn
Logistic regression in sklearn:

linear_model.LogisticRegression

Key hyperparameters in sklearn:

C (inverse regularization strength)

penalty (type of regularization)

multi_class (type of multi-class)

SVM in sklearn:

svm.LinearSVC and svm.SVC

Key hyperparameters in sklearn:

C (inverse regularization strength)

kernel (type of kernel)

gamma (inverse RBF smoothness)

SGDClassifier
SGDClassifier: scales well to large datasets

from sklearn.linear_model import SGDClassifier

logreg = SGDClassifier(loss='log_loss') # named 'log' in older scikit-learn releases

linsvm = SGDClassifier(loss='hinge')

SGDClassifier hyperparameter alpha is like 1/C

Let's practice!

Conclusion

How does this course fit into data science?
Data science

→ Machine learning
→ → Supervised learning
→ → → Classification
→ → → → Linear classifiers (this course)

Congratulations & thanks!
