
Machine Learning

Practice 2

Quan Minh Phan & Ngoc Hoang Luong

University of Information Technology (UIT)

April 7, 2021

Table of contents

1 Perceptron

2 Linear Regression

3 Adaptive Linear Neuron (Adaline)

4 Logistic Regression

Figure: The general concept of the perceptron

Recall

Problem
Use the perceptron model to classify the species of a flower ('setosa' or
'versicolor') based on its sepal width and petal width.

2 features: the sepal width x1; the petal width x2
x = [x1, x2]
2 labels: 'setosa' ← -1; 'versicolor' ← 1
w = [w1, w2]
Net input function (z)
z = w1 * x1 + w2 * x2
Unit step function (φ(·))
φ(z) = 1 if z ≥ θ, −1 otherwise.

Recall (next)

Fold the threshold θ into the weight vector as a bias weight w0:
w = [w0, w1, w2]
Net input function (z)
z = w0 + w1 * x1 + w2 * x2 = w^T x
Unit step function (φ(·))
φ(z) = 1 if z ≥ 0, −1 otherwise.
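A minimal NumPy sketch of these two functions (the weight and sample values below are made up for illustration):

import numpy as np

w = np.array([-0.5, 0.8, -1.1])    # hypothetical weights [w0, w1, w2]

def net_input(x):
    # z = w0 + w1*x1 + w2*x2
    return w[0] + np.dot(x, w[1:])

def unit_step(z):
    # phi(z) = 1 if z >= 0, otherwise -1
    return np.where(z >= 0.0, 1, -1)

x = np.array([3.5, 0.2])           # one sample: [x1, x2]
print(unit_step(net_input(x)))     # -> 1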

Unit Step Function

Figure: The general concept of the perceptron

Training process

Algorithm 1 Pseudocode for the training process


1: Initialize the weights, w
2: while the stopping criteria are not satisfied do
3: for x ∈ X do
4: Compute the output value, ŷ
5: Update the weights
6: end for
7: end while

Updating the weights

w = w + ∆w
∆wi = η * (y − ŷ) * xi
where:
  η: the learning rate
  y: the true class label
  ŷ: the predicted class label

Examples
∆w0 = η * (y − ŷ)
∆w1 = η * (y − ŷ) * x1
∆w2 = η * (y − ŷ) * x2
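As a worked example, the sketch below applies one update step to hypothetical values (η = 0.1, a misclassified sample of class 1):

import numpy as np

eta = 0.1                          # learning rate (hypothetical)
w = np.array([0.0, 0.0, 0.0])      # [w0, w1, w2]
x = np.array([3.0, 1.3])           # one sample: [x1, x2]
y, y_hat = 1, -1                   # true label vs. predicted label

update = eta * (y - y_hat)         # = 0.1 * 2 = 0.2
w[0] += update                     # delta_w0
w[1:] += update * x                # delta_w1, delta_w2
print(w)                           # -> [0.2  0.6  0.26]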

Components

Hyperparameters
eta → the learning rate
max_iter → the maximum number of epochs
random_state → to make the results reproducible

Parameters
w → the weights of the model
errors → to store the number of misclassifications in each epoch

Methods
fit(X, y) → to train the model
predict(X) → to predict the output value
net_input(X) → to combine the features with the weights

Implement (code from scratch)

import numpy as np

class Perceptron:
    def __init__(self, eta=0.01, max_iter=20, random_state=1):
        self.eta = eta
        self.max_iter = max_iter
        self.random_state = random_state
        self.w = None
        self.errors = []

    def net_input(self, X):
        return np.dot(X, self.w[1:]) + self.w[0]

    def predict(self, X):
        return np.where(self.net_input(X) >= 0.0, 1, -1)

    def fit(self, X, y):
        rgen = np.random.RandomState(self.random_state)
        self.w = rgen.normal(loc=0.0, scale=0.01, size=1 + X.shape[1])
        self.errors = []
        for n_iter in range(self.max_iter):
            n_wronglabels = 0
            idx = rgen.permutation(len(y))
            X, y = X[idx], y[idx]
            for xi, yi in zip(X, y):
                error = yi - self.predict(xi)
                self.w[1:] += self.eta * error * xi
                self.w[0] += self.eta * error
                n_wronglabels += int(error != 0.0)
            self.errors.append(n_wronglabels)
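A quick usage sketch on made-up toy data, just to show the call pattern:

import numpy as np

X_toy = np.array([[3.5, 0.2], [3.0, 0.2], [3.2, 1.4], [2.8, 1.3]])
y_toy = np.array([-1, -1, 1, 1])       # 'setosa' -> -1, 'versicolor' -> 1

ppn = Perceptron(eta=0.01, max_iter=20, random_state=1)
ppn.fit(X_toy, y_toy)
print(ppn.errors)                      # misclassifications per epoch
print(ppn.predict(X_toy))              # predicted labels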

Implement (library)

from sklearn.linear_model import Perceptron

Hyperparameters
eta0
max_iter
random_state

Parameters
coef_
intercept_

Methods
fit(X, y)
predict(X)
Practice

Using the ’iris.csv’ dataset

How can we use the ’sepal length’ and ’sepal width’ to classify the
species of a flower?
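One possible way to prepare the data (a sketch only: the column names, the label mapping, and the 80/20 split are assumptions, since the slides do not show this step):

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

df = pd.read_csv('iris.csv')
df = df[df['species'].isin(['setosa', 'versicolor'])]      # keep two classes

X = df[['sepal_length', 'sepal_width']].values              # assumed column names
y = np.where(df['species'] == 'setosa', -1, 1)              # 'setosa' -> -1

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1, stratify=y)

# standardization, used in the later part of the practice
sc = StandardScaler().fit(X_train)
X_train_std, X_test_std = sc.transform(X_train), sc.transform(X_test)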

Data visualization

>> import matplotlib.pyplot as plt

>> idx_setosa = y_train == -1
   idx_versicolor = y_train != -1

>> plt.scatter(X_train[idx_setosa, 0], X_train[idx_setosa, 1], color='red',
   marker='s', label='setosa')
   plt.scatter(X_train[idx_versicolor, 0], X_train[idx_versicolor, 1],
   color='blue', marker='x', label='versicolor')
   plt.xlabel('sepal length [cm]')
   plt.ylabel('sepal width [cm]')
   plt.legend(loc='upper left')
   plt.show()

Data visualization

Practice

>> ppn = Perceptron(eta=0.001, max_iter=30, random_state=1)
   ppn.fit(X_train, y_train)

>> from sklearn.linear_model import Perceptron

>> ppn = Perceptron(eta0=0.001, max_iter=30, random_state=1)
   ppn.fit(X_train, y_train)

Plotting the errors

>> plt.plot(range(1, len(ppn.errors) + 1), ppn.errors, marker='o')
   plt.xlabel('Epochs')
   plt.ylabel('# Misclassifications')
   plt.show()

Plotting the errors

Practice

>> w_ppn = ppn.w
   w_ppn
>> [-0.01575655  0.06508244 -0.11108172]

>> w_ppn = np.append(ppn.intercept_, ppn.coef_)
   w_ppn
>> [-0.006  0.0196 -0.0325]

Visualization

>> plot_decision_regions(X_train, y_train, classifier=ppn)
   plt.xlabel('sepal length [cm]')
   plt.ylabel('sepal width [cm]')
   plt.legend(loc='upper left')
   plt.show()

Visualization

Practice (next)

Create a new model and train it on the data after standardization.

Data visualization

Plotting the costs

Plotting the results

Table of contents

1 Perceptron

2 Linear Regression

3 Adaptive Linear Neuron (Adaline)

4 Logistic Regression

Figure: The general concept of Linear Regression

Minimizing cost functions with gradient descent

Cost function:
J(w) = (1/2) * Σ_i (y^(i) − φ(z^(i)))^2

Update the weights:
w := w + ∆w
∆w = −η * ∇J(w)

∂J/∂w_j = −Σ_i (y^(i) − φ(z^(i))) * x_j^(i)

∆w_j = −η * ∂J/∂w_j = η * Σ_i (y^(i) − φ(z^(i))) * x_j^(i)

Minimizing cost functions with gradient descent

w_j := w_j + η * X.T.dot(y − φ(z))     if j ∈ [1, . . . , n]
w_j := w_j + η * sum(y − φ(z))         if j = 0

Pseudocode of Training process

Algorithm 2 Gradient Descent


1: Initialize the weights, w
2: while the stopping criteria are not satisfied do
3: Compute the output value, ŷ
4: Update the weights
5: end while

Components

Hyperparameters: eta, max_iter, random_state
Parameters: w, costs
Methods: fit(X, y), predict(X), net_input(X)

Implement (code from scratch)

import numpy as np

class LinearRegression_GD:
    def __init__(self, eta=0.001, max_iter=20, random_state=1):
        self.eta = eta
        self.max_iter = max_iter
        self.random_state = random_state
        self.w = None
        self.costs = []

    def net_input(self, X):
        return np.dot(X, self.w[1:]) + self.w[0]

    def predict(self, X):
        return self.net_input(X)

    def fit(self, X, y):
        rgen = np.random.RandomState(self.random_state)
        self.w = rgen.normal(loc=0.0, scale=0.01, size=1 + X.shape[1])
        self.costs = []
        for n_iter in range(self.max_iter):
            errors = y - self.predict(X)
            self.w[1:] += self.eta * X.T.dot(errors)
            self.w[0] += self.eta * errors.sum()
            cost = (errors**2).sum() / 2
            self.costs.append(cost)

Implement (library)

Stochastic Gradient Descent

from sklearn.linear_model import SGDRegressor

Hyperparameters: eta0, max_iter, random_state
Parameters: intercept_, coef_
Methods: fit(X, y), predict(X)

Implement (library)

Normal Equation
from sklearn.linear_model import LinearRegression

Parameters: intercept_, coef_
Methods: fit(X, y), predict(X)

Differences

Gradient Descent
w := w + ∆w
∆w = η * Σ_i (y^(i) − φ(z^(i))) * x^(i)

Stochastic Gradient Descent
w := w + ∆w
∆w = η * (y^(i) − φ(z^(i))) * x^(i)

Normal Equation
w = (X^T X)^(−1) X^T y
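A minimal NumPy sketch of the normal equation, assuming X already carries a leading column of ones for the bias term (the numbers are made up):

import numpy as np

X = np.array([[1.0, 0.5],
              [1.0, 1.5],
              [1.0, 2.5]])
y = np.array([1.0, 2.0, 3.0])

# w = (X^T X)^(-1) X^T y
w = np.linalg.inv(X.T @ X) @ X.T @ y
print(w)                        # -> [0.5 1. ]

In practice, np.linalg.pinv(X) @ y (or np.linalg.lstsq) is numerically safer than forming the explicit inverse.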

Practice

Using ’housing.csv’ dataset


How can we use the ’average number of rooms’ (RM) to estimate the
’price’ of houses (MEDV)?
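One possible preparation step (a sketch only: reading with pandas, the split, and standardizing both RM and MEDV are assumptions based on the '(standardized)' axis labels used below):

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

df = pd.read_csv('housing.csv')
X = df[['RM']].values              # average number of rooms
y = df[['MEDV']].values            # house price

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1)

# standardize the feature and the target
sc_x, sc_y = StandardScaler(), StandardScaler()
X_train, X_test = sc_x.fit_transform(X_train), sc_x.transform(X_test)
y_train = sc_y.fit_transform(y_train).ravel()
y_test = sc_y.transform(y_test).ravel()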

Plotting data

>> plt.scatter(X_train, y_train, c='steelblue', edgecolor='white', s=70)
   plt.xlabel('Average number of rooms [RM] (standardized)')
   plt.ylabel('Price in $1000s [MEDV] (standardized)')
   plt.show()

Practice

Practice

Gradient Descent
>> reg_GD = LinearRegression_GD(eta=0.001, max_iter=20, random_state=1)
   reg_GD.fit(X_train, y_train)

Stochastic Gradient Descent
>> reg_SGD = SGDRegressor(eta0=0.001, max_iter=20, random_state=1,
   l1_ratio=0, tol=None, learning_rate='constant')
   reg_SGD.fit(X_train, y_train)

Normal Equation
>> reg_NE = LinearRegression()
   reg_NE.fit(X_train, y_train)

Plotting the cost

>> plt.plot(range(1, len(reg_GD.costs) + 1), reg_GD.costs)
   plt.xlabel('Epochs')
   plt.ylabel('Cost')
   plt.title('Gradient Descent')
   plt.show()

Practice

>> w_GD = reg_GD.w
   w_GD
>> [0.00767139 0.64623542]

>> w_SGD = np.append(reg_SGD.intercept_, reg_SGD.coef_)
   w_SGD
>> [0.00783841 0.64551218]

>> w_NE = np.append(reg_NE.intercept_, reg_NE.coef_)
   w_NE
>> [0.00773059 0.64638912]

Plotting the results

>> plt.scatter(X_train, y_train, c='steelblue', edgecolor='white', s=70)
   plt.plot(X_train, reg_GD.predict(X_train), color='red', lw=10,
   label='Gradient Descent')
   plt.plot(X_train, reg_SGD.predict(X_train), color='blue', lw=6,
   label='Stochastic Gradient Descent')
   plt.plot(X_train, reg_NE.predict(X_train), color='black', lw=2,
   label='Normal Equation')
   plt.xlabel('Average number of rooms [RM] (standardized)')
   plt.ylabel('Price in $1000s [MEDV] (standardized)')
   plt.legend()
   plt.show()

Plotting the results

Practice

>> y_pred_1 = reg_GD.predict(X_test)

>> y_pred_2 = reg_SGD.predict(X_test)

>> y_pred_3 = reg_NE.predict(X_test)

Performance Evaluation

>> from sklearn.metrics import mean_absolute_error as MAE
   from sklearn.metrics import mean_squared_error as MSE
   from sklearn.metrics import r2_score as R2

Mean Absolute Error
>> print('MAE of GD:', round(MAE(y_test, y_pred_1), 6))
   print('MAE of SGD:', round(MAE(y_test, y_pred_2), 6))
   print('MAE of NE:', round(MAE(y_test, y_pred_3), 6))

Mean Squared Error
>> print('MSE of GD:', round(MSE(y_test, y_pred_1), 6))
   print('MSE of SGD:', round(MSE(y_test, y_pred_2), 6))
   print('MSE of NE:', round(MSE(y_test, y_pred_3), 6))

R² score
>> print('R2 of GD:', round(R2(y_test, y_pred_1), 6))
   print('R2 of SGD:', round(R2(y_test, y_pred_2), 6))
   print('R2 of NE:', round(R2(y_test, y_pred_3), 6))
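For reference, the three metrics computed above are defined as:

MAE = (1/n) * Σ_i |y^(i) − ŷ^(i)|
MSE = (1/n) * Σ_i (y^(i) − ŷ^(i))^2
R²  = 1 − Σ_i (y^(i) − ŷ^(i))^2 / Σ_i (y^(i) − ȳ)^2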

Learning rate too large
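What the figure illustrates can be reproduced with a quick sketch (η = 0.1 is a hypothetical, deliberately too-large choice; with such a step size the squared-error cost typically grows from epoch to epoch instead of shrinking):

>> reg_bad = LinearRegression_GD(eta=0.1, max_iter=20, random_state=1)
   reg_bad.fit(X_train, y_train)
   reg_bad.costs[:5]       # costs usually blow up instead of decreasing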

Polynomial Regression

Example
X = [258.0, 270.0, 294.0, 320.0, 342.0, 368.0, 396.0, 446.0, 480.0, 586.0]
y = [236.4, 234.4, 252.8, 298.6, 314.2, 342.2, 360.8, 368.0, 391.2, 390.8]

>> X = np.array([258.0, 270.0, 294.0, 320.0, 342.0, 368.0, 396.0, 446.0,
   480.0, 586.0])[:, np.newaxis]
   y = np.array([236.4, 234.4, 252.8, 298.6, 314.2, 342.2, 360.8, 368.0,
   391.2, 390.8])

>> plt.scatter(X, y, label='Training points')
   plt.xlabel('X')
   plt.ylabel('y')
   plt.legend()
   plt.show()

Plotting data

Polynomial Regression

>> from sklearn.linear_model import LinearRegression
   lr = LinearRegression()
   lr.fit(X, y)

Polynomial Regression

Syntax
from sklearn.preprocessing import PolynomialFeatures

>> from sklearn.preprocessing import PolynomialFeatures
   pr = LinearRegression()
   quadratic = PolynomialFeatures(degree=2)
   X_quad = quadratic.fit_transform(X)
   pr.fit(X_quad, y)

>> X_fit = np.arange(250, 600, 10)[:, np.newaxis]

>> y_fit_linear = lr.predict(X_fit)
   y_fit_quad = pr.predict(quadratic.fit_transform(X_fit))

>> plt.scatter(X, y, label='Training points')
   plt.xlabel('X')
   plt.ylabel('y')
   plt.plot(X_fit, y_fit_linear, label='Linear fit', linestyle='--')
   plt.plot(X_fit, y_fit_quad, label='Quadratic fit')
   plt.legend()
   plt.tight_layout()
   plt.show()

Practice

Linear regression (here LR is assumed to be an alias for sklearn's LinearRegression)
>> lr = LR()
   lr.fit(X_train, y_train)

Polynomial regression (quadratic)
>> quadratic = PolynomialFeatures(degree=2)
   X_quad = quadratic.fit_transform(X_train)
   pr_quad = LR()
   pr_quad = pr_quad.fit(X_quad, y_train)

Polynomial regression (cubic)
>> cubic = PolynomialFeatures(degree=3)
   X_cubic = cubic.fit_transform(X_train)
   pr_cubic = LR()
   pr_cubic = pr_cubic.fit(X_cubic, y_train)

>> X_fit = np.arange(X_train.min(), X_train.max(), 0.1)[:, np.newaxis]

>> y_linear_fit = lr.predict(X_fit)
   y_quad_fit = pr_quad.predict(quadratic.fit_transform(X_fit))
   y_cubic_fit = pr_cubic.predict(cubic.fit_transform(X_fit))

Plotting the results
>> plt.scatter(X_train, y_train, c='steelblue', edgecolor='white', s=70)
   plt.plot(X_fit, y_linear_fit, label='Linear (d=1)', color='blue', lw=2,
   linestyle=':')
   plt.plot(X_fit, y_quad_fit, label='Quadratic (d=2)', color='red', lw=2,
   linestyle='-')
   plt.plot(X_fit, y_cubic_fit, label='Cubic (d=3)', color='green', lw=2,
   linestyle='--')
   plt.xlabel('Average number of rooms [RM] (standardized)')
   plt.ylabel('Price in $1000s [MEDV] (standardized)')
   plt.legend()
   plt.show()

Table of contents

1 Perceptron

2 Linear Regression

3 Adaptive Linear Neuron (Adaline)

4 Logistic Regression

Figure: Differences between Perceptron and Adaline

Training process

Algorithm 3 Pseudocode for the training process


1: Initialize the weights, w
2: while the stopping criteria are not satisfied do
3: for x ∈ X do
4: Compute the output value, ŷ
5: Update the weights
6: end for
7: end while

Updating the weights

w = w + ∆w
∆wi = η * (y − ŷ) * xi
where:
  η: the learning rate
  y: the true class label
  ŷ: the continuous output value φ(z) (not the thresholded class label, as in the perceptron)

Examples
∆w0 = η * (y − ŷ)
∆w1 = η * (y − ŷ) * x1
∆w2 = η * (y − ŷ) * x2

Components

Hyperparameters
eta
max_iter
random_state

Parameters
w
costs

Methods
fit(X, y)
predict(X)
net_input(X)
activation(X)
Implement (code from scratch)

import numpy as np

class Adaline:
    def __init__(self, eta=0.01, max_iter=50, random_state=1):
        self.eta = eta
        self.max_iter = max_iter
        self.random_state = random_state
        self.w = None
        self.costs = []

    def net_input(self, X):
        return np.dot(X, self.w[1:]) + self.w[0]

    def activation(self, X):
        # linear (identity) activation
        return self.net_input(X)

    def predict(self, X):
        return np.where(self.activation(X) >= 0.0, 1, -1)

    def fit(self, X, y):
        rgen = np.random.RandomState(self.random_state)
        self.w = rgen.normal(loc=0.0, scale=0.01, size=1 + X.shape[1])
        self.costs = []
        for n_iter in range(self.max_iter):
            idx = rgen.permutation(len(y))
            X, y = X[idx], y[idx]
            cost = 0
            for xi, yi in zip(X, y):
                # Adaline updates use the continuous activation output,
                # not the thresholded prediction
                error = yi - self.activation(xi)
                self.w[1:] += self.eta * error * xi
                self.w[0] += self.eta * error
                cost += error**2
            cost /= 2
            self.costs.append(cost)
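A quick usage sketch (the toy data below is made up; in the practice, the standardized iris features from earlier would be used instead):

import numpy as np

X_toy = np.array([[0.5, -1.0], [0.3, -0.8], [-0.4, 1.1], [-0.6, 0.9]])
y_toy = np.array([-1, -1, 1, 1])

ada = Adaline(eta=0.01, max_iter=50, random_state=1)
ada.fit(X_toy, y_toy)
print(ada.costs[-1])           # cost of the last epoch
print(ada.predict(X_toy))      # predicted labels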

Table of contents

1 Perceptron

2 Linear Regression

3 Adaptive Linear Neuron (Adaline)

4 Logistic Regression

Figure: Differences between Adaline and Logistic regression

Components

Hyperparameters
eta
max_iter
random_state

Parameters
w
costs

Methods
fit(X, y)
predict(X)
net_input(X)
activation(X)
Implement (code from scratch)

import numpy as np

class LogisticRegression:
    def __init__(self, eta=0.01, max_iter=50, random_state=1):
        self.eta = eta
        self.max_iter = max_iter
        self.random_state = random_state
        self.w = None
        self.costs = []

    def net_input(self, X):
        return np.dot(X, self.w[1:]) + self.w[0]

    def activation(self, X):
        # sigmoid activation; the net input is clipped to avoid overflow in exp
        return 1. / (1. + np.exp(-np.clip(self.net_input(X), -250, 250)))

    def predict(self, X):
        return np.where(self.activation(X) >= 0.5, 1, 0)

    def fit(self, X, y):
        rgen = np.random.RandomState(self.random_state)
        self.w = rgen.normal(loc=0.0, scale=0.01, size=1 + X.shape[1])
        self.costs = []
        for n_iter in range(self.max_iter):
            output = self.activation(X)
            errors = y - output
            self.w[1:] += self.eta * X.T.dot(errors)
            self.w[0] += self.eta * errors.sum()
            # logistic (cross-entropy) cost
            cost = -y.dot(np.log(output)) - (1 - y).dot(np.log(1 - output))
            self.costs.append(cost)

Practice

>> clf_LR = LogisticRegression(eta=0.01, max_iter=20, random_state=1)
   clf_LR.fit_SGD(X_train, y_train)
>> y_pred = clf_LR.predict(X_test)

Implement (library)

Syntax (import)
from sklearn.linear_model import LogisticRegression

Examples
>> from sklearn.linear_model import LogisticRegression
>> clf_LR_lib = LogisticRegression(random_state=1)
   clf_LR_lib.fit(X_train, y_train)
>> y_pred_lib1 = clf_LR_lib.predict(X_test)

