
Machine Learning

Practice 2

Quan Minh Phan & Ngoc Hoang Luong

University of Information Technology (UIT)

April 7, 2021

Table of contents

1 Perceptron

2 Linear Regression

3 Adaptive Linear Neuron (Adaline)

4 Logistic Regression

Figure: The general concept of the perceptron

Recall

Problem
Use the perceptron model to classify the species of a flower ('setosa' or
'versicolor') based on its sepal width and petal width.

2 features: the sepal width x1; the petal width x2
x = [x1, x2]
2 labels: 'setosa' ← -1; 'versicolor' ← 1
w = [w1, w2]
Net input function (z)
z = w1 * x1 + w2 * x2
Unit step function (φ(·))
φ(z) = 1 if z ≥ θ, −1 otherwise.

Recall (next)

Fold the threshold θ into the weight vector as a bias weight w0:
w = [w0, w1, w2]
Net input function (z)
z = w0 + w1 * x1 + w2 * x2 = w^T x
Unit step function (φ(·))
φ(z) = 1 if z ≥ 0, −1 otherwise.
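A minimal NumPy sketch of these two functions (the weight and sample values below are made up for illustration):

import numpy as np

w = np.array([-0.5, 0.8, -1.1])    # hypothetical weights [w0, w1, w2]

def net_input(x):
    # z = w0 + w1*x1 + w2*x2
    return w[0] + np.dot(x, w[1:])

def unit_step(z):
    # phi(z) = 1 if z >= 0, otherwise -1
    return np.where(z >= 0.0, 1, -1)

x = np.array([3.5, 0.2])           # one sample: [x1, x2]
print(unit_step(net_input(x)))     # -> 1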

Unit Step Function

Figure: The general concept of the perceptron

Training process

Algorithm 1 Pseudocode for the training process


1: Initialize the weights, w
2: while the stopping criteria are not satisfied do
3: for x ∈ X do
4: Compute the output value, ŷ
5: Update the weights
6: end for
7: end while

Updating the weights

w = w + ∆w
∆wi = η * (y − ŷ) * xi
where:
  η: the learning rate
  y: the true class label
  ŷ: the predicted class label

Examples
∆w0 = η * (y − ŷ)
∆w1 = η * (y − ŷ) * x1
∆w2 = η * (y − ŷ) * x2
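As a worked example, the sketch below applies one update step to hypothetical values (η = 0.1, a misclassified sample of class 1):

import numpy as np

eta = 0.1                          # learning rate (hypothetical)
w = np.array([0.0, 0.0, 0.0])      # [w0, w1, w2]
x = np.array([3.0, 1.3])           # one sample: [x1, x2]
y, y_hat = 1, -1                   # true label vs. predicted label

update = eta * (y - y_hat)         # = 0.1 * 2 = 0.2
w[0] += update                     # delta_w0
w[1:] += update * x                # delta_w1, delta_w2
print(w)                           # -> [0.2  0.6  0.26]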

Components

Hyperparameters
eta → the learning rate
max_iter → the maximum number of epochs
random_state → to make the results reproducible

Parameters
w → the weights of the model
errors → to store the number of misclassifications in each epoch

Methods
fit(X, y) → to train the model
predict(X) → to predict the output value
net_input(X) → to combine the features with the weights

Implement (code from scratch)

import numpy as np

class Perceptron:
    def __init__(self, eta=0.01, max_iter=20, random_state=1):
        self.eta = eta
        self.max_iter = max_iter
        self.random_state = random_state
        self.w = None
        self.errors = []

    def net_input(self, X):
        return np.dot(X, self.w[1:]) + self.w[0]

    def predict(self, X):
        return np.where(self.net_input(X) >= 0.0, 1, -1)

    def fit(self, X, y):
        rgen = np.random.RandomState(self.random_state)
        self.w = rgen.normal(loc=0.0, scale=0.01, size=1 + X.shape[1])
        self.errors = []
        for n_iter in range(self.max_iter):
            n_wronglabels = 0
            idx = rgen.permutation(len(y))
            X, y = X[idx], y[idx]
            for xi, yi in zip(X, y):
                error = yi - self.predict(xi)
                self.w[1:] += self.eta * error * xi
                self.w[0] += self.eta * error
                n_wronglabels += int(error != 0.0)
            self.errors.append(n_wronglabels)
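A quick usage sketch on made-up toy data, just to show the call pattern:

import numpy as np

X_toy = np.array([[3.5, 0.2], [3.0, 0.2], [3.2, 1.4], [2.8, 1.3]])
y_toy = np.array([-1, -1, 1, 1])       # 'setosa' -> -1, 'versicolor' -> 1

ppn = Perceptron(eta=0.01, max_iter=20, random_state=1)
ppn.fit(X_toy, y_toy)
print(ppn.errors)                      # misclassifications per epoch
print(ppn.predict(X_toy))              # predicted labels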

Implement (library)

from sklearn.linear_model import Perceptron

Hyperparameters
eta0
max_iter
random_state

Parameters
coef_
intercept_

Methods
fit(X, y)
predict(X)
Practice

Using the ’iris.csv’ dataset

How can we use the ’sepal length’ and ’sepal width’ to classify the
species of a flower?
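One possible way to prepare the data (a sketch only: the column names, the label mapping, and the 80/20 split are assumptions, since the slides do not show this step):

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

df = pd.read_csv('iris.csv')
df = df[df['species'].isin(['setosa', 'versicolor'])]      # keep two classes

X = df[['sepal_length', 'sepal_width']].values              # assumed column names
y = np.where(df['species'] == 'setosa', -1, 1)              # 'setosa' -> -1

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1, stratify=y)

# standardization, used in the later part of the practice
sc = StandardScaler().fit(X_train)
X_train_std, X_test_std = sc.transform(X_train), sc.transform(X_test)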

Data visualization

>> import matplotlib.pyplot as plt

>> idx_setosa = y_train == -1
   idx_versicolor = y_train != -1

>> plt.scatter(X_train[idx_setosa, 0], X_train[idx_setosa, 1], color='red',
   marker='s', label='setosa')
   plt.scatter(X_train[idx_versicolor, 0], X_train[idx_versicolor, 1],
   color='blue', marker='x', label='versicolor')
   plt.xlabel('sepal length [cm]')
   plt.ylabel('sepal width [cm]')
   plt.legend(loc='upper left')
   plt.show()

Data visualization

Practice

>> ppn = Perceptron(eta=0.001, max_iter=30, random_state=1)
   ppn.fit(X_train, y_train)

>> from sklearn.linear_model import Perceptron

>> ppn = Perceptron(eta0=0.001, max_iter=30, random_state=1)
   ppn.fit(X_train, y_train)

Plotting the errors

>> plt.plot(range(1, len(ppn.errors) + 1), ppn.errors, marker='o')
   plt.xlabel('Epochs')
   plt.ylabel('# Misclassifications')
   plt.show()

Plotting the errors

Practice

>> w_ppn = ppn.w
   w_ppn
>> [-0.01575655  0.06508244 -0.11108172]

>> w_ppn = np.append(ppn.intercept_, ppn.coef_)
   w_ppn
>> [-0.006  0.0196 -0.0325]

Visualization

>> plot_decision_regions(X_train, y_train, classifier=ppn)
   plt.xlabel('sepal length [cm]')
   plt.ylabel('sepal width [cm]')
   plt.legend(loc='upper left')
   plt.show()

Visualization

Practice (next)

Create a new model and train it on the data after standardization.

Data visualization

Plotting the costs

Plotting the results

Table of contents

1 Perceptron

2 Linear Regression

3 Adaptive Linear Neuron (Adaline)

4 Logistic Regression

Figure: The general concept of Linear Regression

Minimizing cost functions with gradient descent

Cost function:
J(w) = (1/2) * Σ_i (y^(i) − φ(z^(i)))^2

Update the weights:
w := w + ∆w
∆w = −η * ∇J(w)

∂J/∂w_j = −Σ_i (y^(i) − φ(z^(i))) * x_j^(i)

∆w_j = −η * ∂J/∂w_j = η * Σ_i (y^(i) − φ(z^(i))) * x_j^(i)

Minimizing cost functions with gradient descent

w_j := w_j + η * X.T.dot(y − φ(z))     if j ∈ [1, . . . , n]
w_j := w_j + η * sum(y − φ(z))         if j = 0

Pseudocode of Training process

Algorithm 2 Gradient Descent


1: Initialize the weights, w
2: while the stopping criteria are not satisfied do
3: Compute the output value, ŷ
4: Update the weights
5: end while

Components

Hyperparameters: eta, max_iter, random_state
Parameters: w, costs
Methods: fit(X, y), predict(X), net_input(X)

Implement (code from scratch)

import numpy as np

class LinearRegression_GD:
    def __init__(self, eta=0.001, max_iter=20, random_state=1):
        self.eta = eta
        self.max_iter = max_iter
        self.random_state = random_state
        self.w = None
        self.costs = []

    def net_input(self, X):
        return np.dot(X, self.w[1:]) + self.w[0]

    def predict(self, X):
        return self.net_input(X)

    def fit(self, X, y):
        rgen = np.random.RandomState(self.random_state)
        self.w = rgen.normal(loc=0.0, scale=0.01, size=1 + X.shape[1])
        self.costs = []
        for n_iter in range(self.max_iter):
            errors = y - self.predict(X)
            self.w[1:] += self.eta * X.T.dot(errors)
            self.w[0] += self.eta * errors.sum()
            cost = (errors**2).sum() / 2
            self.costs.append(cost)

Implement (library)

Stochastic Gradient Descent

from sklearn.linear_model import SGDRegressor

Hyperparameters: eta0, max_iter, random_state
Parameters: intercept_, coef_
Methods: fit(X, y), predict(X)

Implement (library)

Normal Equation
from sklearn.linear_model import LinearRegression

Parameters: intercept_, coef_
Methods: fit(X, y), predict(X)

Differences

Gradient Descent
w := w + ∆w
∆w = η * Σ_i (y^(i) − φ(z^(i))) * x^(i)

Stochastic Gradient Descent
w := w + ∆w
∆w = η * (y^(i) − φ(z^(i))) * x^(i)

Normal Equation
w = (X^T X)^(−1) X^T y
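A minimal NumPy sketch of the normal equation, assuming X already carries a leading column of ones for the bias term (the numbers are made up):

import numpy as np

X = np.array([[1.0, 0.5],
              [1.0, 1.5],
              [1.0, 2.5]])
y = np.array([1.0, 2.0, 3.0])

# w = (X^T X)^(-1) X^T y
w = np.linalg.inv(X.T @ X) @ X.T @ y
print(w)                        # -> [0.5 1. ]

In practice, np.linalg.pinv(X) @ y (or np.linalg.lstsq) is numerically safer than forming the explicit inverse.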

Practice

Using ’housing.csv’ dataset


How can we use the ’average number of rooms’ (RM) to estimate the
’price’ of houses (MEDV)?
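One possible preparation step (a sketch only: reading with pandas, the split, and standardizing both RM and MEDV are assumptions based on the '(standardized)' axis labels used below):

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

df = pd.read_csv('housing.csv')
X = df[['RM']].values              # average number of rooms
y = df[['MEDV']].values            # house price

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1)

# standardize the feature and the target
sc_x, sc_y = StandardScaler(), StandardScaler()
X_train, X_test = sc_x.fit_transform(X_train), sc_x.transform(X_test)
y_train = sc_y.fit_transform(y_train).ravel()
y_test = sc_y.transform(y_test).ravel()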

Plotting data

>> plt.scatter(X_train, y_train, c='steelblue', edgecolor='white', s=70)
   plt.xlabel('Average number of rooms [RM] (standardized)')
   plt.ylabel('Price in $1000s [MEDV] (standardized)')
   plt.show()

Practice

Practice

Gradient Descent
>> reg_GD = LinearRegression_GD(eta=0.001, max_iter=20, random_state=1)
   reg_GD.fit(X_train, y_train)

Stochastic Gradient Descent
>> reg_SGD = SGDRegressor(eta0=0.001, max_iter=20, random_state=1,
   l1_ratio=0, tol=None, learning_rate='constant')
   reg_SGD.fit(X_train, y_train)

Normal Equation
>> reg_NE = LinearRegression()
   reg_NE.fit(X_train, y_train)

Plotting the cost

>> plt.plot(range(1, len(reg_GD.costs) + 1), reg_GD.costs)
   plt.xlabel('Epochs')
   plt.ylabel('Cost')
   plt.title('Gradient Descent')
   plt.show()

Practice

>> w_GD = reg_GD.w
   w_GD
>> [0.00767139 0.64623542]

>> w_SGD = np.append(reg_SGD.intercept_, reg_SGD.coef_)
   w_SGD
>> [0.00783841 0.64551218]

>> w_NE = np.append(reg_NE.intercept_, reg_NE.coef_)
   w_NE
>> [0.00773059 0.64638912]

Plotting the results

>> plt.scatter(X_train, y_train, c='steelblue', edgecolor='white', s=70)
   plt.plot(X_train, reg_GD.predict(X_train), color='red', lw=10,
   label='Gradient Descent')
   plt.plot(X_train, reg_SGD.predict(X_train), color='blue', lw=6,
   label='Stochastic Gradient Descent')
   plt.plot(X_train, reg_NE.predict(X_train), color='black', lw=2,
   label='Normal Equation')
   plt.xlabel('Average number of rooms [RM] (standardized)')
   plt.ylabel('Price in $1000s [MEDV] (standardized)')
   plt.legend()
   plt.show()

Plotting the results

Practice

>> y_pred_1 = reg_GD.predict(X_test)

>> y_pred_2 = reg_SGD.predict(X_test)

>> y_pred_3 = reg_NE.predict(X_test)

Performance Evaluation

>> from sklearn.metrics import mean_absolute_error as MAE
   from sklearn.metrics import mean_squared_error as MSE
   from sklearn.metrics import r2_score as R2

Mean Absolute Error
>> print('MAE of GD:', round(MAE(y_test, y_pred_1), 6))
   print('MAE of SGD:', round(MAE(y_test, y_pred_2), 6))
   print('MAE of NE:', round(MAE(y_test, y_pred_3), 6))

Mean Squared Error
>> print('MSE of GD:', round(MSE(y_test, y_pred_1), 6))
   print('MSE of SGD:', round(MSE(y_test, y_pred_2), 6))
   print('MSE of NE:', round(MSE(y_test, y_pred_3), 6))

R² score
>> print('R2 of GD:', round(R2(y_test, y_pred_1), 6))
   print('R2 of SGD:', round(R2(y_test, y_pred_2), 6))
   print('R2 of NE:', round(R2(y_test, y_pred_3), 6))
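For reference, the three metrics computed above are defined as:

MAE = (1/n) * Σ_i |y^(i) − ŷ^(i)|
MSE = (1/n) * Σ_i (y^(i) − ŷ^(i))^2
R²  = 1 − Σ_i (y^(i) − ŷ^(i))^2 / Σ_i (y^(i) − ȳ)^2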

Learning rate too large
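What the figure illustrates can be reproduced with a quick sketch (η = 0.1 is a hypothetical, deliberately too-large choice; with such a step size the squared-error cost typically grows from epoch to epoch instead of shrinking):

>> reg_bad = LinearRegression_GD(eta=0.1, max_iter=20, random_state=1)
   reg_bad.fit(X_train, y_train)
   reg_bad.costs[:5]       # costs usually blow up instead of decreasing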

Polynomial Regression

Example
X = [258.0, 270.0, 294.0, 320.0, 342.0, 368.0, 396.0, 446.0, 480.0, 586.0]
y = [236.4, 234.4, 252.8, 298.6, 314.2, 342.2, 360.8, 368.0, 391.2, 390.8]

>> X = np.array([258.0, 270.0, 294.0, 320.0, 342.0, 368.0, 396.0, 446.0,
   480.0, 586.0])[:, np.newaxis]
   y = np.array([236.4, 234.4, 252.8, 298.6, 314.2, 342.2, 360.8, 368.0,
   391.2, 390.8])

>> plt.scatter(X, y, label='Training points')
   plt.xlabel('X')
   plt.ylabel('y')
   plt.legend()
   plt.show()

Plotting data

Polynomial Regression

>> from sklearn.linear_model import LinearRegression
   lr = LinearRegression()
   lr.fit(X, y)

Polynomial Regression

Syntax
from sklearn.preprocessing import PolynomialFeatures

>> from sklearn.preprocessing import PolynomialFeatures
   pr = LinearRegression()
   quadratic = PolynomialFeatures(degree=2)
   X_quad = quadratic.fit_transform(X)
   pr.fit(X_quad, y)

>> X_fit = np.arange(250, 600, 10)[:, np.newaxis]

>> y_fit_linear = lr.predict(X_fit)
   y_fit_quad = pr.predict(quadratic.fit_transform(X_fit))

>> plt.scatter(X, y, label='Training points')
   plt.xlabel('X')
   plt.ylabel('y')
   plt.plot(X_fit, y_fit_linear, label='Linear fit', linestyle='--')
   plt.plot(X_fit, y_fit_quad, label='Quadratic fit')
   plt.legend()
   plt.tight_layout()
   plt.show()

Practice

Linear regression (here LR is assumed to be an alias for sklearn's LinearRegression)
>> lr = LR()
   lr.fit(X_train, y_train)

Polynomial regression (quadratic)
>> quadratic = PolynomialFeatures(degree=2)
   X_quad = quadratic.fit_transform(X_train)
   pr_quad = LR()
   pr_quad = pr_quad.fit(X_quad, y_train)

Polynomial regression (cubic)
>> cubic = PolynomialFeatures(degree=3)
   X_cubic = cubic.fit_transform(X_train)
   pr_cubic = LR()
   pr_cubic = pr_cubic.fit(X_cubic, y_train)

>> X_fit = np.arange(X_train.min(), X_train.max(), 0.1)[:, np.newaxis]

>> y_linear_fit = lr.predict(X_fit)
   y_quad_fit = pr_quad.predict(quadratic.fit_transform(X_fit))
   y_cubic_fit = pr_cubic.predict(cubic.fit_transform(X_fit))

Plotting the results
>> plt.scatter(X_train, y_train, c='steelblue', edgecolor='white', s=70)
   plt.plot(X_fit, y_linear_fit, label='Linear (d=1)', color='blue', lw=2,
   linestyle=':')
   plt.plot(X_fit, y_quad_fit, label='Quadratic (d=2)', color='red', lw=2,
   linestyle='-')
   plt.plot(X_fit, y_cubic_fit, label='Cubic (d=3)', color='green', lw=2,
   linestyle='--')
   plt.xlabel('Average number of rooms [RM] (standardized)')
   plt.ylabel('Price in $1000s [MEDV] (standardized)')
   plt.legend()
   plt.show()

Table of contents

1 Perceptron

2 Linear Regression

3 Adaptive Linear Neuron (Adaline)

4 Logistic Regression

Figure: Differences between Perceptron and Adaline

Training process

Algorithm 3 Pseudocode for the training process


1: Initialize the weights, w
2: while the stopping criteria are not satisfied do
3: for x ∈ X do
4: Compute the output value, ŷ
5: Update the weights
6: end for
7: end while

Updating the weights

w = w + ∆w
∆wi = η * (y − ŷ) * xi
where:
  η: the learning rate
  y: the true class label
  ŷ: the continuous output value φ(z) (not the thresholded class label, as in the perceptron)

Examples
∆w0 = η * (y − ŷ)
∆w1 = η * (y − ŷ) * x1
∆w2 = η * (y − ŷ) * x2

Components

Hyperparameters
eta
max_iter
random_state

Parameters
w
costs

Methods
fit(X, y)
predict(X)
net_input(X)
activation(X)
Implement (code from scratch)

import numpy as np

class Adaline:
    def __init__(self, eta=0.01, max_iter=50, random_state=1):
        self.eta = eta
        self.max_iter = max_iter
        self.random_state = random_state
        self.w = None
        self.costs = []

    def net_input(self, X):
        return np.dot(X, self.w[1:]) + self.w[0]

    def activation(self, X):
        # linear (identity) activation
        return self.net_input(X)

    def predict(self, X):
        return np.where(self.activation(X) >= 0.0, 1, -1)

    def fit(self, X, y):
        rgen = np.random.RandomState(self.random_state)
        self.w = rgen.normal(loc=0.0, scale=0.01, size=1 + X.shape[1])
        self.costs = []
        for n_iter in range(self.max_iter):
            idx = rgen.permutation(len(y))
            X, y = X[idx], y[idx]
            cost = 0
            for xi, yi in zip(X, y):
                # Adaline updates use the continuous activation output,
                # not the thresholded prediction
                error = yi - self.activation(xi)
                self.w[1:] += self.eta * error * xi
                self.w[0] += self.eta * error
                cost += error**2
            cost /= 2
            self.costs.append(cost)
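A quick usage sketch (the toy data below is made up; in the practice, the standardized iris features from earlier would be used instead):

import numpy as np

X_toy = np.array([[0.5, -1.0], [0.3, -0.8], [-0.4, 1.1], [-0.6, 0.9]])
y_toy = np.array([-1, -1, 1, 1])

ada = Adaline(eta=0.01, max_iter=50, random_state=1)
ada.fit(X_toy, y_toy)
print(ada.costs[-1])           # cost of the last epoch
print(ada.predict(X_toy))      # predicted labels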

Table of contents

1 Perceptron

2 Linear Regression

3 Adaptive Linear Neuron (Adaline)

4 Logistic Regression

Figure: Differences between Adaline and Logistic regression

Components

Hyperparameters
eta
max_iter
random_state

Parameters
w
costs

Methods
fit(X, y)
predict(X)
net_input(X)
activation(X)
Implement (code from scratch)

import numpy as np

class LogisticRegression:
    def __init__(self, eta=0.01, max_iter=50, random_state=1):
        self.eta = eta
        self.max_iter = max_iter
        self.random_state = random_state
        self.w = None
        self.costs = []

    def net_input(self, X):
        return np.dot(X, self.w[1:]) + self.w[0]

    def activation(self, X):
        # sigmoid activation; the net input is clipped to avoid overflow in exp
        return 1. / (1. + np.exp(-np.clip(self.net_input(X), -250, 250)))

    def predict(self, X):
        return np.where(self.activation(X) >= 0.5, 1, 0)

    def fit(self, X, y):
        rgen = np.random.RandomState(self.random_state)
        self.w = rgen.normal(loc=0.0, scale=0.01, size=1 + X.shape[1])
        self.costs = []
        for n_iter in range(self.max_iter):
            output = self.activation(X)
            errors = y - output
            self.w[1:] += self.eta * X.T.dot(errors)
            self.w[0] += self.eta * errors.sum()
            # logistic (cross-entropy) cost
            cost = -y.dot(np.log(output)) - (1 - y).dot(np.log(1 - output))
            self.costs.append(cost)

Practice

>> clf_LR = LogisticRegression(eta=0.01, max_iter=20, random_state=1)
   clf_LR.fit_SGD(X_train, y_train)
>> y_pred = clf_LR.predict(X_test)

Implement (library)

Syntax (import)
from sklearn.linear_model import LogisticRegression

Examples
>> from sklearn.linear_model import LogisticRegression
>> clf_LR_lib = LogisticRegression(random_state=1)
   clf_LR_lib.fit(X_train, y_train)
>> y_pred_lib1 = clf_LR_lib.predict(X_test)

