
alaa.taima@qu.edu.iq
symbolic reasoning

Data

Continuous Numeric
Nominal Categorical
Dataset

Classification

Regression

Clustering

Bias-Variance Trade-Off

Jupyter Notebook
Colab
PyTorch

TensorFlow

Keras

Artificial Neural Networks


perceptron

Sigmoid
tanh
ReLU
Softmax
Loss Function

Momentum
NAG
keras

Adagrad
keras Adagrad
AdaDelta
keras Adadelta
RMSprop
keras RMSprop
Adam
keras Adam
Loss Function

Mean Square Error


Mean Absolute Error

Cross Entropy
Binary Cross Entropy
Backpropagation

Early stopping
Dropout
Batch Normalization
keras
CNN

keras

keras

RNN
RNN
LSTM
LSTM
LSTM
LSTM

GAN

GAN
GAN MNIST
12
13

predictive analytics

GPUs
14
15
16

Siri Alexa

.
17

biases

Atari Google DeepMind

catastrophic forgetting)
18

Out-of-distribution

overconfident
19

if and else
20

4
21

symbolic reasoning

Transformer networks
attention

thought
vector
22

Google Assistant Alexa Siri

.
23

Robbie.AI
.

1
2
3
4
25

Data
dare

PDF ASCII
26

JSON HTML CSV

PDF PDF

PDF

Big data

Big data analytics

Data integrity

Metadata
Raw data
Structured data

Unstructured data
27

1
2
3

Continuous Numeric

Nominal Categorical
28

Dataset

(Clustering)
29

Classification
30

Single-Label Classification
Multi-Label Classification
Binary Classification
Multi-Class Classification
31

iPhone Apple

Regression

(Fitting Line)

GPS
32
33

Clustering
34

.
35

Narrow AI

representation learning
36

.
37

Hyper Parameters

Model Selection

model evaluation
38
39

overfitting

Bias-Variance Trade-Off
40

fit

(Overfitting)

.
41

overfitting
overfitting

K-fold Cross-Validation
In k-fold cross-validation the data is split into k folds of roughly equal size. In each of the k rounds, k-1 folds are used for training and the remaining fold is used for validation, and the final score is the average over the k rounds.
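A minimal sketch of k-fold cross-validation with scikit-learn (the Iris data and the logistic regression model are placeholders chosen only for illustration):

from sklearn.model_selection import KFold
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_iris
import numpy as np

X, y = load_iris(return_X_y=True)
kf = KFold(n_splits=5, shuffle=True, random_state=42)

scores = []
for train_idx, val_idx in kf.split(X):
    # train on k-1 folds, validate on the remaining fold
    model = LogisticRegression(max_iter=1000)
    model.fit(X[train_idx], y[train_idx])
    scores.append(model.score(X[val_idx], y[val_idx]))

print('mean accuracy over 5 folds:', np.mean(scores))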
42

Confusion Matrix
For a binary classifier, the confusion matrix tabulates the predictions against the true labels: true positives (TP) and false negatives (FN) for the positive ("Yes") class, and false positives (FP) and true negatives (TN) for the negative class. From these four counts we compute the accuracy, (TP + TN) / (TP + TN + FP + FN), and the prediction error, which is 1 minus the accuracy.
43

Other common evaluation measures are the precision, the recall, and the F1-score. The F1-score is the harmonic mean of precision and recall, so it is high only when both are high.
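These measures can be computed with scikit-learn; the labels below are made-up values used only to illustrate the calls:

from sklearn.metrics import (confusion_matrix, accuracy_score,
                             precision_score, recall_score, f1_score)

y_true = [1, 0, 1, 1, 0, 1, 0, 0]   # hypothetical ground-truth labels
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]   # hypothetical model predictions

print(confusion_matrix(y_true, y_pred))   # rows = actual, columns = predicted
print('accuracy :', accuracy_score(y_true, y_pred))
print('precision:', precision_score(y_true, y_pred))
print('recall   :', recall_score(y_true, y_pred))
print('F1       :', f1_score(y_true, y_pred))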
44

Windows Python
Windows Python
www.python.org Python
download
Python
3.x

Windows Python
Windows

Add Python 3.9.6 to PATH


Windows Python
Add Python 3.9.6 to PATH
C C
Python.exe
Python

Python IDLE 1
IDLE Windows
read-evaluate-print-loop
45

Integrated Development Environment IDLE


Python Python
REPL IDLE

Python Windows Prompt 2


Win + R Windows Python
cmd
Python Python PATH Python 3.x
Python python Windows

Python Python
46

pip. pip pip

.
Windows pip
NumPy
cmd Win + R

> pip install numpy

Python
>>> import numpy

Traceback (most recent call last):


File "<stdin>", line 1, in <module>
ImportError: No module named numpy

Jupyter Notebook
Jupyter Notebook

Notebook
Notebook
47

Python Jupyter Notebook


Jupyter
Windows Jupyter Jupyter

> pip install jupyter

Jupyter
> jupyter notebook

Jupyter
Jupyter Notebook
Jupyter Notebook
Jupyter
Notebook

Notebook
Jupyter Jupyter
Notebook
Python Jupyter Jupyter
.
Notebook Notebook
Python 3
48

Untitled.ipynb Jupyter

print ('Hello World!')


Ctrl + Enter

Colab
Google Colab Colaboratory
Python
Google Colab

OpenCv Pytorch Keras


Jupyter Colab Tensorflow
GPU Google
TPU

Colab
Python Colab
OpenCV TensorFlow Keras PyTorch
Google Drive Colab
GitHub
Kaggle

Google Colab
Google Colab
49

Colab Google Docs

Colab

TPU GPU
: Jupyter Notebook
Google Colab 1

TPU GPU 2

Google Colab
: Colab

Google Colab 1
http://colab.research.google.com
Google Colaboratory
2

Gmail 3
Gmail
50

Google Colab 4
5
Colab

PyTorch
Torch PyTorch
PyTorch
Facebook
51

PyTorch

tensorboard

TensorFlow
TensorFlow

TensorFlow
Google Brain

GPU
Windows 64 macOS Linux TensorFlow TPU
iOS Android
TensorFlow
TensorFlow
TensorFlow
52

TensorFlow

API
TensorFlow

Keras
Keras

TensorFlow
API
TensorFlow Python
backend Keras
API Keras TensorFlow

Keras
Android
PyTorch TensorFlow Keras iOS
Keras
53

API

GPU

Keras

PyTorch
Keras Tensorflow
54

1
2
3
4
5

7
Overfitting 8
9
keras
57

Artificial Neural Networks


ANN

.
58

ANN

backpropagation

perceptron

Perceptron
Perceptron

decision boundary

Bias

Perceptron
weights
59

bias
Activation Function

Perceptron

Perceptron

c
60

Perceptron
61

perceptron

step function
perceptron
62

epoch
Perceptron

learning
rate
63

1
2

Python Perceptron
perceptron

# import the necessary packages


import numpy as np

class Perceptron:
    def __init__(self, N, alpha=0.1):
        # initialize the weight matrix and store the learning rate
        self.W = np.random.randn(N + 1) / np.sqrt(N)
        self.alpha = alpha

Perceptron

N
Perceptron alpha
64

N +1
W N

def step(self, x):
    return 1 if x > 0 else 0

fit Perceptron
scikit-Learn
def fit(self, X, y, epochs=10):
    X = np.c_[X, np.ones((X.shape[0]))]

X fit
y
Perceptron

# loop over the desired number of epochs


for epoch in np.arange(0, epochs):
    # loop over each individual data point
    for (x, target) in zip(X, y):
        # take the dot product between the input features
        # and the weight matrix, then pass this value
        # through the step function to obtain the prediction
        p = self.step(np.dot(x, self.W))
        # only perform a weight update if our prediction
        # does not match the target
        if p != target:
            # determine the error
            error = p - target
            # update the weight matrix
            self.W += -self.alpha * error * x
65

Perceptron

predict

def predict(self, X, addBias=True):
    X = np.atleast_2d(X)
    if addBias:
        X = np.c_[X, np.ones((X.shape[0]))]
    return self.step(np.dot(X, self.W))

X predict

Perceptron
XOR OR AND

OR
# construct the OR dataset
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([[0], [1], [1], [1]])
# define our perceptron and train it
print("[INFO] training perceptron...")
p = Perceptron(X.shape[1], alpha=0.1)
p.fit(X, y, epochs=20)

OR

Perceptron
OR
# now that our perceptron is trained we can evaluate it
print("[INFO] testing perceptron...")
# loop over the data points
for (x, target) in zip(X, y):
    # make a prediction on the data point and display the result
    pred = p.predict(x)
    print("[INFO] data={}, ground-truth={}, pred={}".format(
        x, target[0], pred))

import numpy as np

class Perceptron:
    def __init__(self, N, alpha=0.1):
        # initialize the weight matrix and store the learning rate
        self.W = np.random.randn(N + 1) / np.sqrt(N)
        self.alpha = alpha

    def step(self, x):
        return 1 if x > 0 else 0

    def fit(self, X, y, epochs=10):
        X = np.c_[X, np.ones((X.shape[0]))]
        for epoch in np.arange(0, epochs):
            for (x, target) in zip(X, y):
                p = self.step(np.dot(x, self.W))
                if p != target:
                    error = p - target
                    self.W += -self.alpha * error * x

    def predict(self, X, addBias=True):
        X = np.atleast_2d(X)
        if addBias:
            X = np.c_[X, np.ones((X.shape[0]))]
        return self.step(np.dot(X, self.W))

# construct the OR dataset
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([[0], [1], [1], [1]])

# define our perceptron and train it
print("[INFO] training perceptron...")
p = Perceptron(X.shape[1], alpha=0.1)
p.fit(X, y, epochs=20)

# now that our network is trained, loop over the data points
print("[INFO] testing perceptron...")
for (x, target) in zip(X, y):
    pred = p.predict(x)
    print("[INFO] data={}, ground-truth={}, pred={}".format(
        x, target[0], pred))

[INFO] training perceptron...


[INFO] testing perceptron...
[INFO] data=[0 0], ground-truth=0, pred=0
[INFO] data=[0 1], ground-truth=1, pred=1
[INFO] data=[1 0], ground-truth=1, pred=1
[INFO] data=[1 1], ground-truth=1, pred=1

OR OR Perceptron

AND
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([[0], [0], [0], [1]])
# define our perceptron and train it
print("[INFO] training perceptron...")
p = Perceptron(X.shape[1], alpha=0.1)
p.fit(X, y, epochs=20)
# now that our perceptron is trained we can evaluate it
67

print("[INFO] testing perceptron...")


# now that our network is trained, loop over the data points
for (x, target) in zip(X, y):
    # make a prediction on the data point and display the result to our console
    pred = p.predict(x)
    print("[INFO] data={}, ground-truth={}, pred={}".format(
        x, target[0], pred))

OR AND

[INFO] training perceptron...


[INFO] testing perceptron...
[INFO] data=[0 0], ground-truth=0, pred=0
[INFO] data=[0 1], ground-truth=0, pred=0
[INFO] data=[1 0], ground-truth=0, pred=0
[INFO] data=[1 1], ground-truth=1, pred=1

AND Perceptron
AND
AND
perceptron XOR

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])


y = np.array([[0], [1], [1], [0]])
# define our perceptron and train it
print("[INFO] training perceptron...")
p = Perceptron(X.shape[1], alpha=0.1)
p.fit(X, y, epochs=20)
# now that our perceptron is trained we can evaluate it
print("[INFO] testing perceptron...")
# now that our network is trained, loop over the data points
for (x, target) in zip(X, y):
    # make a prediction on the data point and display the result
    pred = p.predict(x)
    print("[INFO] data={}, ground-truth={}, pred={}".format(
        x, target[0], pred))

[INFO] training perceptron...


[INFO] testing perceptron...
[INFO] data=[0 0], ground-truth=0, pred=1
[INFO] data=[0 1], ground-truth=1, pred=1
[INFO] data=[1 0], ground-truth=1, pred=0
[INFO] data=[1 1], ground-truth=0, pred=0
68

[INFO] training perceptron...


[INFO] testing perceptron...
[INFO] data=[0 0], ground-truth=0, pred=0
[INFO] data=[0 1], ground-truth=1, pred=0
[INFO] data=[1 0], ground-truth=1, pred=0
[INFO] data=[1 1], ground-truth=0, pred=1

. Perceptron XOR

Perceptron

MLP (Multilayer Perceptron)


Feedforward Neural Network
69

MLP

Feed forward

Fully Connected
70

MLP
universal approximators MLPs

: Perceptron

.
.

(Overfitting)
71

Vanishing
gradient

propagating backwards

forward propagation
72

x
y=
y=

Nonlinear
Perceptron

Zero-Centered

Computational Expense
73

Differentiable

Continuous
Bounded

explode

Vanishing Gradient problem


dead neuron

(Vanishing Gradient)

Exploding Gradients
74

dead neuron

Sigmoid
Sigmoid

def sigmoid(x):
return 1/(1+np.exp(-x))

Sigmoid

Sigmoid Sigmoid

Sigmoid
def der_sigmoid(x):
return sigmoid(x) * (1- sigmoid(x))
75

Sigmoid
import numpy as np
import matplotlib.pyplot as plt

# Sigmoid Activation Function


def sigmoid(x):
return 1/(1+np.exp(-x))

# Derivative of Sigmoid
def der_sigmoid(x):
return sigmoid(x) * (1- sigmoid(x))

# Generating data to plot


x_data = np.linspace(-10,10,100)
y_data = sigmoid(x_data)
dy_data = der_sigmoid(x_data)

# Plotting
plt.plot(x_data, y_data, x_data, dy_data)
plt.title('Sigmoid Activation Function & Derivative')
plt.legend(['sigmoid','der_sigmoid'])
plt.grid()
plt.show()

Sigmoid Sigmoid

Sigmoid

Sigmoid
76

Sigmoid 1

Sigmoid 2

Sigmoid

Sigmoid

Sigmoid

saturation problem

tanh
Sigmoid tanh

def htan(x):
return (np.exp(x) - np.exp(-x))/(np.exp(x) + np.exp(-x))
77

Sigmoid tanh
Sigmoid Activator

tanh
Sigmoid tanh
tanh

def der_htan(x):
return 1 - htan(x) * htan(x)

tanh
import numpy as np
import matplotlib.pyplot as plt

# Hyperbolic Tangent (htan) Activation Function


def htan(x):
return (np.exp(x) - np.exp(-x))/(np.exp(x) + np.exp(-x))

# htan derivative
def der_htan(x):
return 1 - htan(x) * htan(x)

# Generating data for Graph


x_data = np.linspace(-6,6,100)
y_data = htan(x_data)
dy_data = der_htan(x_data)

# Graph
plt.plot(x_data, y_data, x_data, dy_data)
plt.title('htan Activation Function & Derivative')
plt.legend(['htan','der_htan'])
plt.grid()
78

Sigmoid

ReLU
tanh Sigmoid
ReLU

ReLU

def ReLU(x):
data = [max(0,value) for value in x]
return np.array(data, dtype=float)
79

ReLU

def der_ReLU(x):
data = [1 if value>0 else 0 for value in x]
return np.array(data, dtype=float)

ReLU
import numpy as np
import matplotlib.pyplot as plt

# Rectified Linear Unit (ReLU)


def ReLU(x):
data = [max(0,value) for value in x]
return np.array(data, dtype=float)

# Derivative for ReLU


def der_ReLU(x):
data = [1 if value>0 else 0 for value in x]
return np.array(data, dtype=float)

# Generating data for Graph


x_data = np.linspace(-10,10,100)
y_data = ReLU(x_data)
dy_data = der_ReLU(x_data)

# Graph
plt.plot(x_data, y_data, x_data, dy_data)
plt.title('ReLU Activation Function & Derivative')
plt.legend(['ReLU','der_ReLU'])
plt.grid()
plt.show()
80

Sigmoid

ReLU

Softmax
softmax
Softmax Sigmoid

Softmax

def softmax(x):
return np.exp(x) / np.sum(np.exp(x), axis=0)

Softmax
81

import numpy as np
import matplotlib.pyplot as plt

# Softmax Activation Function


def softmax(x):
return np.exp(x) / np.sum(np.exp(x), axis=0)

# Generating data to plot


x_data = np.linspace(-10,10,100)
y_data = softmax(x_data)

# Plotting
plt.plot(x_data, y_data)
plt.title('Softmax Activation Function')
plt.legend(['Softmax'])
plt.grid()
plt.show()

If the network has to choose between four classes A, B, C, and D, Softmax converts its four raw output scores into a probability distribution over those classes.
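For example, with four made-up scores for the classes A, B, C, and D, Softmax turns them into probabilities that sum to 1:

import numpy as np

def softmax(x):
    return np.exp(x) / np.sum(np.exp(x), axis=0)

scores = np.array([2.0, 1.0, 0.5, 0.1])   # raw network outputs for A, B, C, D
print(softmax(scores))                    # approximately [0.57, 0.21, 0.13, 0.09]
print(softmax(scores).sum())              # 1.0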

Loss Function
82

maximizing
minimizing

Gradient Descent

J( )

1
Gradient Descent
83

Batch gradient descent
Mini-batch gradient descent
Stochastic gradient descent (SGD)

mini-batch gradient descent

(Noisy data)

rate decay

.
84

min

import numpy as np
import matplotlib.pyplot as plt
from matplotlib.ticker import MaxNLocator
from itertools import product

def func(x):
    return x**2 - 2*x - 3

def fprime(x):
    return 2*x - 2

def plotFunc(x0):
    x = np.linspace(-5, 7, 100)
    plt.plot(x, func(x))
    plt.plot(x0, func(x0), 'ko')
    plt.xlabel('$x$')
    plt.ylabel('$f(x)$')
    plt.title('Objective Function')

def plotPath(xs, ys, x0):
    plotFunc(x0)
    plt.plot(xs, ys, linestyle='--', marker='o', color='#F12F79')
    plt.plot(xs[-1], ys[-1], 'ko')
85

x0 = -4
plotFunc(x0)

def GradientDescentSimple(func, fprime, x0, alpha, tol=1e-5, max_iter=1000):
    # initialize x, f(x), and -f'(x)
    xk = x0
    fk = func(xk)
    pk = -fprime(xk)
    # initialize number of steps, save x and f(x)
    num_iter = 0
    curve_x = [xk]
    curve_y = [fk]
    # take steps
    while abs(pk) > tol and num_iter < max_iter:
        # calculate new x, f(x), and -f'(x)
        xk = xk + alpha * pk
        fk = func(xk)
        pk = -fprime(xk)
        # increase number of steps by 1, save new x and f(x)
        num_iter += 1
        curve_x.append(xk)
        curve_y.append(fk)
    # print results
    if num_iter == max_iter:
        print('Gradient descent does not converge.')
    else:
        print('Solution found:\n y = {:.4f}\n x = {:.4f}'.format(fk, xk))
    return curve_x, curve_y

xs, ys = GradientDescentSimple(func, fprime, x0, alpha=0.1)


plotPath(xs, ys, x0)

Solution found:
y = -4.0000
x = 1.0000

xs, ys = GradientDescentSimple(func, fprime, x0, alpha=0.9)


plotPath(xs, ys, x0)

Solution found:
y = -4.0000
x = 1.0000
87

xs, ys = GradientDescentSimple(func, fprime, x0, alpha=1e-4)


plotPath(xs, ys, x0)

Gradient descent does not converge.

xs, ys = GradientDescentSimple(func, fprime, x0, alpha=1.01, max_iter=8)


plotPath(xs, ys, x0)

Gradient descent does not converge.


88

.
max_iter = 8
.
x=1

.
.

Momentum

Momentum
89

Griewank

NAG
90

NAG (Nesterov Accelerated Gradient Descent)

Sutskever

keras
Keras
SGD SGD
from tensorflow.keras.optimizers import SGD
...
sgd = SGD(learning_rate=0.0001, momentum=0.8, nesterov=True)
model.compile(optimizer=sgd, loss = ..., metrics= ...)

The SGD optimizer accepts the learning rate (learning_rate), the momentum, an optional decay, and the nesterov flag.
91

Adagrad

Adagrad
Adagrad

AdaGrad

keras Adagrad
Keras Adagrad
from tensorflow.keras.optimizers import Adagrad

...
92

adagrad = Adagrad(learning_rate=0.0001, epsilon=1e-6)

model.compile(optimizer=adagrad, loss = ..., metrics= ...)

AdaDelta
Adagrad

AdaDelta

Adadelta
93

keras Adadelta
Keras Adadelta
from tensorflow.keras.optimizers import Adadelta
...
adadelta= Adadelta(learning_rate=0.0001, epsilon=1e-6)
model.compile(optimizer= adadelta, loss = ..., metrics= ...)

RMSprop
Like Adadelta, RMSprop scales the learning rate by a running root mean squared (RMS) average of recent gradients.

keras RMSprop
Keras RMSprop
from tensorflow.keras.optimizers import RMSprop
...
rms_prop = RMSprop(learning_rate=0.0001, epsilon=1e-6)
model.compile(optimizer= rms_prop, loss = ..., metrics= ...)

Adam

Adagrad
Adadelta
RMSprop

Adam (adaptive moment estimation)


94

keras Adam
Keras Adam
from tensorflow.keras.optimizers import Adam
...
adam= Adam(learning_rate=0.0001, beta_1=0.9, beta_2=0.999, epsilon=1e-6)
model.compile(optimizer= adam, loss = ..., metrics= ...)

Loss Function
score
95

Mean Square Error

Keras MSE
from tensorflow import keras
...
loss_fn = keras.losses.MeanSquaredError()
model.compile(loss=loss_fn, optimizer=..., metrics= ...)

Mean Absolute Error

Keras MAE
from tensorflow import keras
...
loss_fn = keras.losses.MeanAbsoluteError()
model.compile(loss=loss_fn, optimizer=..., metrics= ...)
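As a quick illustration of what these two losses measure (the numbers are only an example):

import numpy as np

y_true = np.array([3.0, -0.5, 2.0, 7.0])   # hypothetical target values
y_pred = np.array([2.5,  0.0, 2.0, 8.0])   # hypothetical predictions

mse = np.mean((y_true - y_pred) ** 2)      # mean of the squared errors
mae = np.mean(np.abs(y_true - y_pred))     # mean of the absolute errors
print(mse, mae)                            # 0.375 and 0.5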

Cross Entropy

Keras Cross Entropy


from tensorflow import keras

...
96

loss_fn = keras.losses.CategoricalCrossentropy()

model.compile(loss=loss_fn, optimizer=..., metrics= ...)

Binary Cross Entropy


Binary Cross Entropy

Binary Cross Entropy


Keras
from tensorflow import keras
...
loss_fn = keras.losses.binary_crossentropy
model.compile(loss=loss_fn, optimizer=..., metrics= ...)
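A small numeric sketch of the binary cross-entropy itself (the labels and probabilities are made up for illustration):

import numpy as np

y_true = np.array([1, 0, 1, 1])             # hypothetical binary labels
y_pred = np.array([0.9, 0.1, 0.8, 0.4])     # hypothetical predicted probabilities

# average of -[y*log(p) + (1-y)*log(1-p)] over the samples
bce = -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))
print(bce)   # about 0.34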

Backpropagation
Backpropagation is the algorithm used to train an MLP. It solves the credit assignment problem: by propagating the error backwards through the layers with the chain rule, it computes how much each weight contributed to the loss, and gradient descent then updates every weight accordingly.
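A minimal numpy sketch of backpropagation for a network with one hidden layer of sigmoid units, trained on the XOR problem that defeated the single Perceptron (the architecture and learning rate are illustrative choices, not the book's example):

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

rng = np.random.default_rng(1)
# XOR data; a bias column of ones is appended, as in the Perceptron earlier
X = np.c_[np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float), np.ones(4)]
y = np.array([[0.], [1.], [1.], [0.]])

W1 = rng.normal(size=(3, 4))   # (2 inputs + bias) -> 4 hidden units
W2 = rng.normal(size=(4, 1))   # 4 hidden units -> 1 output unit
alpha = 0.5

for epoch in range(10000):
    # forward pass
    h = sigmoid(X @ W1)
    out = sigmoid(h @ W2)
    # backward pass: push the error back through the chain rule
    d_out = (out - y) * out * (1 - out)   # gradient at the output layer
    d_h = (d_out @ W2.T) * h * (1 - h)    # gradient at the hidden layer
    # gradient descent updates for both weight matrices
    W2 -= alpha * h.T @ d_out
    W1 -= alpha * X.T @ d_h

print(out.round(2))   # on a successful run the predictions approach [0, 1, 1, 0]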
98

variance scaling
Fan-Out Fan-In

Keras variance scaling

from keras import initializers


initializer = initializers.VarianceScaling(scale=1.0, mode='fan_in',
distribution='normal')

Glorot and Bengio proposed Xavier initialization.

Xavier

Keras Xavier
from keras import initializers
initializer = initializers.GlorotNormal()
99

Xavier's

hyper parameter

task
100

overfitting
underfitting

overfitting

underfitting overfitting
101

overfitting overfitting
overfitting regularization

Early stopping
Early stopping monitors the model's performance on a validation set during training and stops training once that performance stops improving, which prevents the network from continuing into overfitting.
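In Keras, early stopping is available as a callback; a minimal sketch (the monitored metric and the patience of 10 epochs are only example choices):

from keras.callbacks import EarlyStopping

# stop when the validation loss has not improved for 10 consecutive epochs
early_stop = EarlyStopping(monitor='val_loss', patience=10,
                           restore_best_weights=True)

# hypothetical usage: pass the callback to model.fit
# model.fit(X_train, Y_train, epochs=100,
#           validation_data=(X_val, Y_val),
#           callbacks=[early_stop])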
102

Dropout

( )

ensemble
103

Keras
from keras.layers import Dropout
from keras.models import Sequential

model = Sequential()

model.add(Dropout(0.5))

Batch Normalization

distribution
internal
covariate shift

Batch Normalization

Keras
104

from keras.layers import BatchNormalization


from keras.models import Sequential

model = Sequential()

model.add(BatchNormalization())

keras
Keras
.
Kaggle 1
Zillow
2
.

CSV

In the notebook, import pandas and run the cell with Alt + Enter:
import pandas as pd

pandas
CSV pd
df = pd.read_csv('housepricedata.csv')

'housepricedata.csv'

notebook df df df
Alt + Enter
df

1
https://www.kaggle.com/c/zillow-prize-1/data
2
https://drive.google.com/file/d/1h6LPHNs4F_FnxwfdE_fCIsGeEh30tDBf/view?usp=sharing
105

dataset = df.values

df.values df
dataset dataset dataset
Alt + Enter notebook

dataset

array([[ 8450, 7, 5, ..., 0, 548, 1],


[ 9600, 6, 8, ..., 1, 460, 1],
[11250, 7, 5, ..., 1, 608, 1],
...,
[ 9042, 7, 9, ..., 2, 252, 1],
[ 9717, 5, 6, ..., 0, 240, 0],
[ 9937, 5, 6, ..., 0, 276, 0]])

Y X

Y X
X = dataset[:,0:10]

0:10 X
X
.
106

Y
Y = dataset[:,10]

X
Y

scikit-
Learn
from sklearn import preprocessing

sklearn
min-max

min_max_scaler = preprocessing.MinMaxScaler()
X_scale = min_max_scaler.fit_transform(X)

X_scale X_scale

X_scale

array([[0.0334198 , 0.66666667, 0.5 , ..., 0.5 , 0. ,


0.3864598 ],
[0.03879502, 0.55555556, 0.875 , ..., 0.33333333, 0.33333333,
0.32440056],
[0.04650728, 0.66666667, 0.5 , ..., 0.33333333, 0.33333333,
0.42877292],
...,
[0.03618687, 0.66666667, 1. , ..., 0.58333333, 0.66666667,
0.17771509],
[0.03934189, 0.44444444, 0.625 , ..., 0.25 , 0. ,
0.16925247],
[0.04037019, 0.44444444, 0.625 , ..., 0.33333333, 0. ,
0.19464034]])

scikit-Learn
train_test_split

from sklearn.model_selection import train_test_split


107

X_train, X_val_and_test, Y_train, Y_val_and_test = train_test_split(X_scale, Y, test_size=0.3)

val_and_test scikit-Learn

val_and_test

X_val, X_test, Y_val, Y_test = train_test_split(X_val_and_test, Y_val_and_test, test_size=0.5)

val_and_test

X_train
X_val
X_test
Y_train
Y_val
Y_test

print(X_train.shape, X_val.shape, X_test.shape, Y_train.shape, Y_val.shape, Y_test.shape)
(1022, 10) (219, 10) (219, 10) (1022,) (219,) (219,)

X
Y
108

ReLU
ReLU
Sigmoid

Sequential

Keras

from keras.models import Sequential


from keras.layers import Dense

model = Sequential([
Dense(32, activation='relu', input_shape=(10,)),
Dense(32, activation='relu'),
Dense(1, activation='sigmoid'),
])

model = Sequential([ ])

Dense(32, activation='relu', input_shape=(10,))


109

ReLU
Dense

Dense(32, activation='relu'),
ReLU
Keras

Dense(1, activation='sigmoid'),

Sigmoid

model.compile

model.compile(optimizer='sgd',
loss='binary_crossentropy',
metrics=['accuracy'])

model.compile

optimizer='sgd',

sgd
loss='binary_crossentropy',

'binary_crossentropy'

metrics=['accuracy']

keras
110

hist = model.fit(X_train, Y_train,
                 batch_size=32, epochs=100,
                 validation_data=(X_val, Y_val))

fit
Y_train X_train
epochs batch_size

hist

Epoch 1/100
32/32 [==============================] - 0s 5ms/step - loss: 0.6990 - accuracy: 0.3542 -
val_loss: 0.6974 - val_accuracy: 0.3699
Epoch 2/100
32/32 [==============================] - 0s 2ms/step - loss: 0.6955 - accuracy: 0.4022 -
val_loss: 0.6943 - val_accuracy: 0.4110
Epoch 3/100
32/32 [==============================] - 0s 2ms/step - loss: 0.6926 - accuracy: 0.4706 -
val_loss: 0.6915 - val_accuracy: 0.4703
Epoch 4/100
32/32 [==============================] - 0s 2ms/step - loss: 0.6899 - accuracy: 0.5499 -
val_loss: 0.6889 - val_accuracy: 0.5616
Epoch 5/100
32/32 [==============================] - 0s 2ms/step - loss: 0.6874 - accuracy: 0.6468 -
val_loss: 0.6864 - val_accuracy: 0.6758
Epoch 6/100
32/32 [==============================] - 0s 2ms/step - loss: 0.6849 - accuracy: 0.7133 -
val_loss: 0.6842 - val_accuracy: 0.7123
Epoch 7/100
32/32 [==============================] - 0s 2ms/step - loss: 0.6828 - accuracy: 0.7524 -
val_loss: 0.6821 - val_accuracy: 0.7489
Epoch 8/100
32/32 [==============================] - 0s 2ms/step - loss: 0.6807 - accuracy: 0.7564 -
val_loss: 0.6801 - val_accuracy: 0.7717
Epoch 9/100
32/32 [==============================] - 0s 2ms/step - loss: 0.6787 - accuracy: 0.7779 -
val_loss: 0.6781 - val_accuracy: 0.8037
Epoch 10/100
32/32 [==============================] - 0s 2ms/step - loss: 0.6767 - accuracy: 0.8072 -
val_loss: 0.6761 - val_accuracy: 0.8128
Epoch 11/100
32/32 [==============================] - 0s 2ms/step - loss: 0.6746 - accuracy: 0.8317 -
val_loss: 0.6740 - val_accuracy: 0.8219
Epoch 12/100
32/32 [==============================] - 0s 2ms/step - loss: 0.6725 - accuracy: 0.8239 -
val_loss: 0.6717 - val_accuracy: 0.8265

Epoch 95/100
32/32 [==============================] - 0s 2ms/step - loss: 0.2931 - accuracy: 0.8865 -
val_loss: 0.3051 - val_accuracy: 0.9041
Epoch 96/100
32/32 [==============================] - 0s 2ms/step - loss: 0.2920 - accuracy: 0.8816 -
val_loss: 0.3043 - val_accuracy: 0.8995
Epoch 97/100
32/32 [==============================] - 0s 2ms/step - loss: 0.2911 - accuracy: 0.8855 -
val_loss: 0.3044 - val_accuracy: 0.9041
Epoch 98/100
32/32 [==============================] - 0s 2ms/step - loss: 0.2901 - accuracy: 0.8865 -
val_loss: 0.3030 - val_accuracy: 0.8995
Epoch 99/100
32/32 [==============================] - 0s 2ms/step - loss: 0.2896 - accuracy: 0.8806 -
val_loss: 0.3025 - val_accuracy: 0.8995
Epoch 100/100
32/32 [==============================] - 0s 2ms/step - loss: 0.2884 - accuracy: 0.8816 -
val_loss: 0.3017 - val_accuracy: 0.8995
111

model.evaluate(X_test, Y_test)[1]

notebook

7/7 [==============================] - 0s 2ms/step - loss: 0.3281 - accuracy: 0.8584


0.8584474921226501

overfitting
overfitting
matplotlib

import matplotlib.pyplot as plt


'val_loss' 'loss'

plt.plot(hist.history['loss'])
plt.plot(hist.history['val_loss'])
plt.title('Model loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.legend(['Train', 'Val'], loc='upper right')
plt.show()

Model loss 'val_loss' 'loss'


x y

jupyter notebook
112

plt.plot(hist.history['accuracy'])
plt.plot(hist.history['val_accuracy'])
plt.title('Model accuracy')
plt.ylabel('Accuracy')
plt.xlabel('Epoch')
plt.legend(['Train', 'Val'], loc='lower right')
plt.show()

overfitting
113

model_2

model_2 = Sequential([
Dense(1000, activation='relu', input_shape=(10,)),
Dense(1000, activation='relu'),
Dense(1000, activation='relu'),
Dense(1000, activation='relu'),
Dense(1, activation='sigmoid'),
])

model_2.compile(optimizer='adam',
loss='binary_crossentropy',
metrics=['accuracy'])

hist_2 = model_2.fit(X_train, Y_train,
                     batch_size=32, epochs=100,
                     validation_data=(X_val, Y_val))

Adam Adam

hist_2
hist hist_2
plt.plot(hist_2.history['loss'])
plt.plot(hist_2.history['val_loss'])
plt.title('Model loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.legend(['Train', 'Val'], loc='upper right')
plt.show()
114

overfitting

plt.plot(hist_2.history['accuracy'])
plt.plot(hist_2.history['val_accuracy'])
plt.title('Model accuracy')
plt.ylabel('Accuracy')
plt.xlabel('Epoch')
plt.legend(['Train', 'Val'], loc='lower right')
plt.show()

'val_accuracy' 'accuracy'

overfitting
dropout overfitting

from keras.layers import Dropout

model_3 = Sequential([
Dense(1000, activation='relu', input_shape=(10,)),
Dropout(0.5),
Dense(1000, activation='relu'),
Dropout(0.5),
Dense(1000, activation='relu'),
Dropout(0.5),
Dense(1000, activation='relu'),
Dropout(0.5),
Dense(1, activation='sigmoid'),
])

Dropout
115

Dropout(0.5),
0.5

model_3.compile(optimizer='adam',
loss='binary_crossentropy',
metrics=['accuracy'])

hist_3 = model_3.fit(X_train, Y_train,
                     batch_size=32, epochs=100,
                     validation_data=(X_val, Y_val))

plt.plot(hist_3.history['loss'])
plt.plot(hist_3.history['val_loss'])
plt.title('Model loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.legend(['Train', 'Val'], loc='upper right')
plt.show()

plt.plot(hist_3.history['accuracy'])
plt.plot(hist_3.history['val_accuracy'])
plt.title('Model accuracy')
plt.ylabel('Accuracy')
plt.xlabel('Epoch')
plt.legend(['Train', 'Val'], loc='lower right')
plt.show()
116

Perceptron

Perceptron
117

10
11
12
13
optimizer 14

Iris

from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler

iris = load_iris()
X = iris['data']
y = iris['target']

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
119

Convolutional neural
CNN network

keras

CNN

(CNN)

.
120

CNN

CNN

Features Extraction
pooling convolution

CNN architectures are built on three key ideas:
1. Local receptive fields: each unit is connected only to a small local region of the input.
2. Parameter sharing: the same filter weights are reused across the whole input, which gives local connectivity with far fewer parameters.
3. Pooling (sub-sampling): feature maps are downsampled, which makes the CNN more robust and helps reduce overfitting.

CNN
feature detectors kernel

D
D D

S (Stride)
124

(Padding)

P=1 S=3 K=3
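The output size of a convolution follows from the input width W, the kernel size K, the stride S, and the padding P: O = (W - K + 2P) / S + 1. A small worked example with illustrative numbers:

def conv_output_size(W, K, S, P):
    # O = (W - K + 2P) / S + 1
    return (W - K + 2 * P) // S + 1

print(conv_output_size(W=32, K=3, S=1, P=1))   # 32: a 3x3 kernel with padding 1 keeps the size
print(conv_output_size(W=32, K=3, S=2, P=1))   # 16: a stride of 2 halves the size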

Dilation

atrous convolutions
125

Sparse interactions

(Equivariance)
126

CNN

CNN

keras
Keras
from keras.layers import Conv2D

Conv2D(filters, kernel_size, strides, padding, activation='relu', input_shape)

filters: the number of filters in the layer
kernel_size: the size of each filter (e.g. 3x3)
strides: the step with which the filter slides over the input
padding: either 'same' or 'valid'
activation: the activation function, e.g. 'relu'
input_shape: the shape of the input; it only needs to be given for the first layer of the network

Pooling
A pooling layer downsamples each feature map by sliding a small window over it and keeping either the maximum value (max pooling) or the average value (average pooling) of each window. This reduces the spatial size of the representation and the amount of computation in later layers.
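A small numeric sketch of what 2x2 max pooling does to a 4x4 feature map (the values are made up; in Keras the corresponding layers are MaxPooling2D and AveragePooling2D):

import numpy as np

fmap = np.array([[1, 3, 2, 0],
                 [4, 6, 1, 1],
                 [0, 2, 9, 8],
                 [3, 1, 4, 7]])

# 2x2 max pooling with stride 2: keep the maximum of each 2x2 block
pooled = fmap.reshape(2, 2, 2, 2).max(axis=(1, 3))
print(pooled)   # [[6 2]
                #  [3 9]]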

keras

scrape
128

(Dataset)
: CIFAR-10

Keras
jupyter notebook

from keras.datasets import cifar10


(x_train, y_train), (x_test, y_test) = cifar10.load_data()
Downloading data from https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz
170500096/170498071 [==============================] - 2s 0us/step
170508288/170498071 [==============================] - 2s 0us/step

(x_train, y_train
(x_test, y_test

print('x_train shape:', x_train.shape)


x_train shape: (50000, 32, 32, 3)

: x_train

print('y_train shape:', y_train.shape)


y_train shape: (50000, 1)

print(x_train[0])
[[[ 59 62 63]
[ 43 46 45]
[ 50 48 43]
...
129

[158 132 108]


[152 125 102]
[148 124 103]]
[[ 16 20 20]
[ 0 0 0]
[ 18 8 0]
...
[123 88 55]
[119 83 50]
[122 87 57]]
[[ 25 24 21]
[ 16 7 0]
[ 49 27 8]
...
[118 84 50]
[120 84 50]
[109 73 42]]
...
[[208 170 96]
[201 153 34]
[198 161 26]
...
[216 184 140]
[151 118 84]
[123 92 72]]]

matplotlib x_train[0]

import matplotlib.pyplot as plt


img = plt.imshow(x_train[0])

x_train[0] plt.imshow

print('The label is:', y_train[0])


The label is: [6]
130

img = plt.imshow(x_train[1])

print('The label is:', y_train[1])


The label is: [9]

.
131

Keras

(one-hot)

one-hot
[1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
[0, 1, 0, 0, 0, 0, 0, 0, 0, 0]
[0, 0, 1, 0, 0, 0, 0, 0, 0, 0]
[0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
[0, 0, 0, 0, 1, 0, 0, 0, 0, 0]
[0, 0, 0, 0, 0, 1, 0, 0, 0, 0]
[0, 0, 0, 0, 0, 0, 1, 0, 0, 0]
[0, 0, 0, 0, 0, 0, 0, 1, 0, 0]
[0, 0, 0, 0, 0, 0, 0, 0, 1, 0]
[0, 0, 0, 0, 0, 0, 0, 0, 0, 1]

Keras
from keras.utils import np_utils
y_train_one_hot = np_utils.to_categorical(y_train, 10)
y_test_one_hot = np_utils.to_categorical(y_test, 10)

y_train_one_hot = np_utils.to_categorical(y_train, 10)


y_train_one_hot y_train
one_hot
132

print('The one hot label is:', y_train_one_hot[1])


The one hot label is: [0. 0. 0. 0. 0. 0. 0. 0. 0. 1.]

x y

x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train = x_train / 255
x_test = x_test / 255

float32

x_train[0]
array([[[0.23137255, 0.24313726, 0.24705882],
[0.16862746, 0.18039216, 0.1764706 ],
[0.19607843, 0.1882353 , 0.16862746],
...,
[0.61960787, 0.5176471 , 0.42352942],
[0.59607846, 0.49019608, 0.4 ],
[0.5803922 , 0.4862745 , 0.40392157]],

[[0.0627451 , 0.07843138, 0.07843138],


[0. , 0. , 0. ],
[0.07058824, 0.03137255, 0. ],
...,
[0.48235294, 0.34509805, 0.21568628],
[0.46666667, 0.3254902 , 0.19607843],
[0.47843137, 0.34117648, 0.22352941]],

[[0.09803922, 0.09411765, 0.08235294],


[0.0627451 , 0.02745098, 0. ],
[0.19215687, 0.10588235, 0.03137255],
...,
[0.4627451 , 0.32941177, 0.19607843],
[0.47058824, 0.32941177, 0.19607843],
[0.42745098, 0.28627452, 0.16470589]],

...,

[[0.8156863 , 0.6666667 , 0.3764706 ],


[0.7882353 , 0.6 , 0.13333334],
[0.7764706 , 0.6313726 , 0.10196079],
...,
[0.627451 , 0.52156866, 0.27450982],
[0.21960784, 0.12156863, 0.02745098],
[0.20784314, 0.13333334, 0.07843138]],

[[0.7058824 , 0.54509807, 0.3764706 ],


133

[0.6784314 , 0.48235294, 0.16470589],


[0.7294118 , 0.5647059 , 0.11764706],
...,
[0.72156864, 0.5803922 , 0.36862746],
[0.38039216, 0.24313726, 0.13333334],
[0.3254902 , 0.20784314, 0.13333334]],

[[0.69411767, 0.5647059 , 0.45490196],


[0.65882355, 0.5058824 , 0.36862746],
[0.7019608 , 0.5568628 , 0.34117648],
...,
[0.84705883, 0.72156864, 0.54901963],
[0.5921569 , 0.4627451 , 0.32941177],
[0.48235294, 0.36078432, 0.28235295]]], dtype=float32)

CNN

Conv Layer (32 filters, size 3x3)
Conv Layer (32 filters, size 3x3)
Max Pool Layer (filter size 2x2)
Dropout Layer (dropout prob 0.25)
Conv Layer (64 filters, size 3x3)
Conv Layer (64 filters, size 3x3)
Max Pool Layer (filter size 2x2)
Dropout Layer (dropout prob 0.25)
FC Layer (512 neurons)
Dropout Layer (dropout prob 0.5)
FC Layer, Softmax (10 neurons)
134

softmax

Keras

from keras.models import Sequential


from keras.layers import Dense, Dropout, Flatten, Conv2D, MaxPooling2D

model = Sequential()

CNN relu

model.add(Conv2D(32, (3, 3), activation='relu', padding='same', input_shape=(32,32,3)))

model.add()
'same' 'relu
stride=1

model.add(Conv2D(32, (3, 3), activation='relu', padding='same'))

model.add(MaxPooling2D(pool_size=(2, 2)))

overfitting
model.add(Dropout(0.25))

model.add(Conv2D(64, (3, 3), activation='relu', padding='same'))


model.add(Conv2D(64, (3, 3), activation='relu', padding='same'))
model.add(MaxPooling2D(pool_size=(2, 2)))
135

model.add(Dropout(0.25))

Flatten
model.add(Flatten())

relu
model.add(Dense(512, activation='relu'))

model.add(Dropout(0.5))

softmax
model.add(Dense(10, activation='softmax'))

model.summary()
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d (Conv2D) (None, 32, 32, 32) 896

conv2d_1 (Conv2D) (None, 32, 32, 32) 9248

max_pooling2d (MaxPooling2D) (None, 16, 16, 32) 0

max_pooling2d_1 (MaxPooling2D) (None, 8, 8, 32) 0

dropout (Dropout) (None, 8, 8, 32) 0

flatten (Flatten) (None, 2048) 0

dense (Dense) (None, 512) 1049088

dropout_1 (Dropout) (None, 512) 0

dense_1 (Dense) (None, 10) 5130

=================================================================
Total params: 1,064,362
Trainable params: 1,064,362
Non-trainable params: 0
_________________________________________________________________

model.compile(loss='categorical_crossentropy',
optimizer='adam',
metrics=['accuracy'])
136

'categorical_crossentropy'

. adam

hist = model.fit(x_train, y_train_one_hot,
                 batch_size=32, epochs=20,
                 validation_split=0.2)

validation_data validation_split = 0.2

Epoch 1/20
1250/1250 [==============================] - 27s 14ms/step - loss: 1.4824 - accuracy: 0.4623 - val_loss: 1.1698 -
val_accuracy: 0.5885
Epoch 2/20
1250/1250 [==============================] - 17s 14ms/step - loss: 1.1323 - accuracy: 0.6000 - val_loss: 1.0689 -
val_accuracy: 0.6276
Epoch 3/20
1250/1250 [==============================] - 17s 13ms/step - loss: 0.9810 - accuracy: 0.6534 - val_loss: 0.9494 -
val_accuracy: 0.6725
Epoch 4/20
1250/1250 [==============================] - 17s 14ms/step - loss: 0.8859 - accuracy: 0.6900 - val_loss: 0.9151 -
val_accuracy: 0.6828
Epoch 5/20
1250/1250 [==============================] - 18s 15ms/step - loss: 0.8015 - accuracy: 0.7171 - val_loss: 0.9117 -
val_accuracy: 0.6802
Epoch 6/20
1250/1250 [==============================] - 18s 14ms/step - loss: 0.7203 - accuracy: 0.7469 - val_loss: 0.8959 -
val_accuracy: 0.6874
Epoch 7/20
1250/1250 [==============================] - 17s 14ms/step - loss: 0.6486 - accuracy: 0.7709 - val_loss: 0.8811 -
val_accuracy: 0.7066
Epoch 8/20
1250/1250 [==============================] - 17s 14ms/step - loss: 0.5960 - accuracy: 0.7902 - val_loss: 0.9053 -
val_accuracy: 0.7055
Epoch 9/20
1250/1250 [==============================] - 17s 13ms/step - loss: 0.5312 - accuracy: 0.8129 - val_loss: 0.9100 -
val_accuracy: 0.6982
Epoch 10/20
1250/1250 [==============================] - 18s 14ms/step - loss: 0.4887 - accuracy: 0.8262 - val_loss: 0.9183 -
val_accuracy: 0.7077
Epoch 11/20
1250/1250 [==============================] - 18s 14ms/step - loss: 0.4483 - accuracy: 0.8415 - val_loss: 0.9454 -
val_accuracy: 0.7083
Epoch 12/20
1250/1250 [==============================] - 17s 13ms/step - loss: 0.4195 - accuracy: 0.8514 - val_loss: 1.0072 -
val_accuracy: 0.7062
Epoch 13/20
1250/1250 [==============================] - 18s 14ms/step - loss: 0.3889 - accuracy: 0.8625 - val_loss: 0.9655 -
val_accuracy: 0.7089
Epoch 14/20
1250/1250 [==============================] - 18s 14ms/step - loss: 0.3591 - accuracy: 0.8764 - val_loss: 1.0041 -
val_accuracy: 0.7042
Epoch 15/20
1250/1250 [==============================] - 17s 13ms/step - loss: 0.3372 - accuracy: 0.8814 - val_loss: 1.0220 -
val_accuracy: 0.7133
Epoch 16/20
1250/1250 [==============================] - 18s 14ms/step - loss: 0.3242 - accuracy: 0.8868 - val_loss: 1.0927 -
val_accuracy: 0.7041
Epoch 17/20
1250/1250 [==============================] - 23s 18ms/step - loss: 0.3053 - accuracy: 0.8925 - val_loss: 1.0579 -
val_accuracy: 0.7046
Epoch 18/20
1250/1250 [==============================] - 18s 14ms/step - loss: 0.2917 - accuracy: 0.8990 - val_loss: 1.0833 -
val_accuracy: 0.7061
Epoch 19/20
1250/1250 [==============================] - 17s 14ms/step - loss: 0.2825 - accuracy: 0.9014 - val_loss: 1.0957 -
val_accuracy: 0.7165
Epoch 20/20
137

1250/1250 [==============================] - 17s 14ms/step - loss: 0.2700 - accuracy: 0.9064 - val_loss: 1.0855 -
val_accuracy: 0.7131

plt.plot(hist.history['loss'])
plt.plot(hist.history['val_loss'])
plt.title('Model loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.legend(['Train', 'Val'], loc='upper right')
plt.show()

plt.plot(hist.history['accuracy'])
plt.plot(hist.history['val_accuracy'])
plt.title('Model accuracy')
plt.ylabel('Accuracy')
plt.xlabel('Epoch')
plt.legend(['Train', 'Val'], loc='lower right')
plt.show()
138

overfitting

val

model.evaluate(x_test, y_test_one_hot)[1]
313/313 [==============================] - 2s 5ms/step - loss: 1.1410 - accuracy: 0.6924
0.6923999786376953

HDF5
h5
model.save('my_cifar10_model.h5')

from keras.models import load_model


model = load_model('my_cifar10_model.h5')

CNN
Keras Sequential

notebook

JPEG cat.jpg
my_image = plt.imread("cat.jpg")
139

scikit-image
from skimage.transform import resize
my_image_resized = resize(my_image, (32,32,3))

img = plt.imshow(my_image_resized)

model.predict

import numpy as np
probabilities = model.predict(np.array( [my_image_resized,] ))

probabilities
array([[2.6720341e-02, 3.1344647e-05, 1.5095539e-01, 3.8518414e-01,
3.3354717e-03, 3.2324010e-01, 5.1648129e-02, 5.7933435e-02,
9.2716294e-04, 2.4454062e-05]], dtype=float32)
140

number_to_class = ['airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog',
                   'horse', 'ship', 'truck']
index = np.argsort(probabilities[0,:])
print("Most likely class:", number_to_class[index[9]], "-- Probability:",
probabilities[0,index[9]])
print("Second most likely class:", number_to_class[index[8]], "-- Probability:",
probabilities[0,index[8]])
print("Third most likely class:", number_to_class[index[7]], "-- Probability:",
probabilities[0,index[7]])
print("Fourth most likely class:", number_to_class[index[6]], "-- Probability:",
probabilities[0,index[6]])
print("Fifth most likely class:", number_to_class[index[5]], "-- Probability:",
probabilities[0,index[5]])
Most likely class: cat -- Probability: 0.31140402
Second most likely class: horse -- Probability: 0.296455
Third most likely class: dog -- Probability: 0.1401798
Fourth most likely class: truck -- Probability: 0.12088975
Fifth most likely class: frog -- Probability: 0.078746535

CNN

1
2
3
4
5
LSTM
142

RNN RNN Recurrent Neural Networks

sequences
RNN
1
Turing-Complete RNNs

Google DeepMind

1
Alex Graves, Greg Wayne, and Ivo Danihelka (2014). "Neural Turing Machines".
143

MLP
144

RNNs

RNN

RNN

RNNs

RNN

U
W
V

V W U
145

RNN
RNNs
RNNs

RNN

RNN
RNNs

RNN

RNN

RNN

(Sentiment analysis)
146

RNN

RNN
language models RNNs
natural language processing
RNN
RNN

www.gutenberg.org

warpeace_input.txt
https://cs.stanford.edu/people/karpathy/char-rnn/warpeace_input.txt

https://www.gutenberg.org/ebooks/2600

training_file = 'warpeace_input.txt'
raw_text = open(training_file, 'r').read()
raw_text = raw_text.lower()
raw_text[:100]
'ufeff"well, prince, so genoa and lucca are now just family estates
of thenbuonapartes. but i warn you, i'

n_chars = len(raw_text)
147

print('Total characters: {}'.format(n_chars))


Total characters: 3196213

chars = sorted(list(set(raw_text)))
n_vocab = len(chars)
print('Total vocabulary (unique characters): {}'.format(n_vocab))
print(chars)
Total vocabulary (unique characters): 57
['\n', ' ', '!', '"', "'", '(', ')', '*', ',', '-', '.', '/', '0',
'1', '2', '3', '4', '5', '6', '7', '8', '9', ':', ';', '=', '?',
'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm',
'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z',
'à', 'ä', 'é', 'ê', '\ufeff']

RNN

learning
earni learn
148

deep learning architectures

deep_ eep_l
learn earni
ing_a ng_ar
rchit chite
ectur cture

one-hot

index_to_char = dict((i, c) for i, c in enumerate(chars))


char_to_index = dict((c, i) for i, c in enumerate(chars))
print(char_to_index)

import numpy as np
seq_length = 100
n_seq = int(n_chars / seq_length)

n_seq

X = np.zeros((n_seq, seq_length, n_vocab))


Y = np.zeros((n_seq, seq_length, n_vocab))

RNN Keras
n_seq
for i in range(n_seq):
    x_sequence = raw_text[i * seq_length : (i + 1) * seq_length]
    x_sequence_ohe = np.zeros((seq_length, n_vocab))
    for j in range(seq_length):
        char = x_sequence[j]
        index = char_to_index[char]
        x_sequence_ohe[j][index] = 1.
    X[i] = x_sequence_ohe

    y_sequence = raw_text[i * seq_length + 1 : (i + 1) * seq_length + 1]
    y_sequence_ohe = np.zeros((seq_length, n_vocab))
    for j in range(seq_length):
        char = y_sequence[j]
        index = char_to_index[char]
        y_sequence_ohe[j][index] = 1.
    Y[i] = y_sequence_ohe

X.shape
(31962, 100, 57)
Y.shape
(31962, 100, 57)

RNN

from keras.models import Sequential


from keras.layers.core import Dense, Activation, Dropout
from keras.layers.recurrent import SimpleRNN
from keras.layers.wrappers import TimeDistributed
from keras import optimizers
from tensorflow import keras

batch_size = 100
n_layer = 2
hidden_units = 800
n_epoch = 300
dropout = 0.3

model = Sequential()
model.add(SimpleRNN(hidden_units, activation='relu',
                    input_shape=(None, n_vocab), return_sequences=True))
model.add(Dropout(dropout))
for i in range(n_layer - 1):
    model.add(SimpleRNN(hidden_units, activation='relu',
                        return_sequences=True))
    model.add(Dropout(dropout))
model.add(TimeDistributed(Dense(n_vocab)))
model.add(Activation('softmax'))

return_sequences=True
150

:TimeDistributed

TimeDistributed

RMSprop
optimizer = keras.optimizers.RMSprop(learning_rate=0.001, rho=0.9,
epsilon=1e-08, decay=0.0)

categorical_crossentropy

model.compile(loss= "categorical_crossentropy", optimizer=optimizer)

model.summary()
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
simple_rnn_4 (SimpleRNN) (None, None, 800) 686400
dropout_4 (Dropout) (None, None, 800) 0
simple_rnn_5 (SimpleRNN) (None, None, 800) 1280800
dropout_5 (Dropout) (None, None, 800) 0
time_distributed_2 (TimeDis (None, None, 57) 45657
tributed)
activation_2 (Activation) (None, None, 57) 0
=================================================================
Total params: 2,012,857
Trainable params: 2,012,857
Non-trainable params: 0

callback
: callback

checkpoint

.
.

from keras.callbacks import Callback, ModelCheckpoint, EarlyStopping

filepath="weights/weights_layer_%d_hidden_%d_epoch_{epoch:03d}_loss_{loss
:.4f}.hdf5"
151

checkpoint = ModelCheckpoint(filepath, monitor='loss', verbose=1,
                             save_best_only=True, mode='min')

early_stop = EarlyStopping(monitor='loss', min_delta=0, patience=50,
                           verbose=1, mode='min')

callback
RNN
def generate_text(model, gen_length, n_vocab, index_to_char):
    """
    Generating text using the RNN model
    @param model: current RNN model
    @param gen_length: number of characters we want to generate
    @param n_vocab: number of unique characters
    @param index_to_char: index to character mapping
    @return:
    """
    # Start with a randomly picked character
    index = np.random.randint(n_vocab)
    y_char = [index_to_char[index]]
    X = np.zeros((1, gen_length, n_vocab))
    for i in range(gen_length):
        X[0, i, index] = 1.
        indices = np.argmax(model.predict(X[:, max(0, i - 99):i + 1, :])[0], 1)
        index = indices[-1]
        y_char.append(index_to_char[index])
    return ('').join(y_char)

gen_length-1
N callback
class ResultChecker(Callback):
    def __init__(self, model, N, gen_length):
        self.model = model
        self.N = N
        self.gen_length = gen_length

    def on_epoch_end(self, epoch, logs={}):
        if epoch % self.N == 0:
            result = generate_text(self.model, self.gen_length, n_vocab,
                                   index_to_char)
            print('\nMy War and Peace:\n' + result)

model.fit(X, Y, batch_size=batch_size, verbose=1, epochs=n_epoch,
          callbacks=[ResultChecker(model, 10, 200), checkpoint, early_stop])
152

Epoch 1:
Epoch 1/300
8000/31962 [======>.......................] - ETA: 51s - loss: 2.8891
31962/31962 [==============================] - 67s 2ms/step - loss: 2.1955
My War and Peace:
5 the count of the stord and the stord and the stord and the stord and the
stord and the stord and the stord and the stord and the stord and the stord
and the and the stord and the stord and the stord
Epoch 00001: loss improved from inf to 2.19552, saving model to
weights/weights_epoch_001_loss_2.19552.hdf5

Epoch 11:
Epoch 11/300
100/31962 [..............................] - ETA: 1:26 - loss: 1.2321
31962/31962 [==============================] - 66s 2ms/step - loss: 1.2493
My War and Peace:
?" said the countess was a strange the same time the countess was already
been and said that he was so strange to the countess was already been and
the same time the countess was already been and said
Epoch 00011: loss improved from 1.26144 to 1.24933, saving model to
weights/weights_epoch_011_loss_1.2493.hdf5

Epoch 51:
Epoch 51/300
31962/31962 [==============================] - 66s 2ms/step - loss: 1.1562
My War and Peace:
!!CDP!E.agrave!! to see him and the same thing is the same thing to him and
the same thing the same thing is the same thing to him and the sam thing
the same thing is the same thing to him and the same thing the sam
Epoch 00051: loss did not improve from 1.14279

Epoch 101:
Epoch 101/300
31962/31962 [==============================] - 67s 2ms/step - loss: 1.1736
My War and Peace:
= the same thing is to be a soldier in the same way to the soldiers and the
same thing is the same to me to see him and the same thing is the same to
me to see him and the same thing is the same to me
Epoch 00101: loss did not improve from 1.11891

Epoch 00203: loss did not improve from 1.10864


Epoch 00203: early stopping

which was a strange and serious expression of his face and shouting and said
that the countess was standing beside him. "what a battle is a strange and
serious and strange and so that the countess was
153

RNN
RNN
GRU LSTM

LSTM
RNNs

Long short term memory


LSTM LSTM

cell state LSTM


.

LSTM
154

An LSTM cell regulates its cell state through three gates:
1. The forget gate applies a sigmoid to decide how much of the previous cell state to keep.
2. The input gate applies a sigmoid and, together with a tanh candidate vector, decides what new information to write into the cell state.
3. The output gate applies a sigmoid and, combined with a tanh of the updated cell state, produces the hidden output of the LSTM.
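A minimal numpy sketch of a single LSTM time step built from these gates (the weight shapes and inputs are illustrative, not the book's code):

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

n_in, n_hidden = 3, 4
rng = np.random.default_rng(0)
# one weight matrix per gate, acting on the concatenation [h_prev, x]
Wf, Wi, Wc, Wo = (rng.normal(size=(n_hidden, n_hidden + n_in)) for _ in range(4))
bf = bi = bc = bo = np.zeros(n_hidden)

def lstm_step(x, h_prev, c_prev):
    z = np.concatenate([h_prev, x])
    f = sigmoid(Wf @ z + bf)          # forget gate
    i = sigmoid(Wi @ z + bi)          # input gate
    c_tilde = np.tanh(Wc @ z + bc)    # candidate cell state
    c = f * c_prev + i * c_tilde      # updated cell state
    o = sigmoid(Wo @ z + bo)          # output gate
    h = o * np.tanh(c)                # new hidden state
    return h, c

h, c = lstm_step(rng.normal(size=n_in), np.zeros(n_hidden), np.zeros(n_hidden))
print(h.shape, c.shape)   # (4,) (4,)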
155

LSTM

LSTM

LSTM
LSTM
We rebuild X and Y in the same way as before, but with a longer sequence length of 160:
seq_length = 160
n_seq = int(n_chars / seq_length)

X = np.zeros((n_seq, seq_length, n_vocab))


Y = np.zeros((n_seq, seq_length, n_vocab))

for i in range(n_seq):
    x_sequence = raw_text[i * seq_length : (i + 1) * seq_length]
    x_sequence_ohe = np.zeros((seq_length, n_vocab))
    for j in range(seq_length):
        char = x_sequence[j]
        index = char_to_index[char]
        x_sequence_ohe[j][index] = 1.
    X[i] = x_sequence_ohe

    y_sequence = raw_text[i * seq_length + 1 : (i + 1) * seq_length + 1]
    y_sequence_ohe = np.zeros((seq_length, n_vocab))
    for j in range(seq_length):
        char = y_sequence[j]
        index = char_to_index[char]
        y_sequence_ohe[j][index] = 1.
    Y[i] = y_sequence_ohe

RNN

from keras.layers.recurrent import LSTM


batch_size = 100
n_layer = 2
hidden_units = 800
n_epoch= 300
dropout = 0.4

model = Sequential()
model.add(LSTM(hidden_units, input_shape=(None, n_vocab),
               return_sequences=True))
model.add(Dropout(dropout))
for i in range(n_layer - 1):
    model.add(LSTM(hidden_units, return_sequences=True))
    model.add(Dropout(dropout))
model.add(TimeDistributed(Dense(n_vocab)))
model.add(Activation('softmax'))

RMSprop

optimizer = keras.optimizers.RMSprop(learning_rate=0.001, rho=0.9,
                                     epsilon=1e-08, decay=0.0)

model.compile(loss= "categorical_crossentropy", optimizer=optimizer)

LSTM
model.summary()
________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
lstm_1 (LSTM) (None, None, 800) 2745600
_________________________________________________________________
dropout_1 (Dropout) (None, None, 800) 0
_________________________________________________________________
lstm_2 (LSTM) (None, None, 800) 5123200
_________________________________________________________________
dropout_2 (Dropout) (None, None, 800) 0
_________________________________________________________________
time_distributed_1 (TimeDist (None, None, 57) 45657
_________________________________________________________________
activation_1 (Activation) (None, None, 57) 0
=================================================================
Total params: 7,914,457
Trainable params: 7,914,457
Non-trainable params: 0

RNN

model.fit(X, Y, batch_size=batch_size, verbose=1, epochs=n_epoch,
          callbacks=[ResultChecker(model, 10, 200), checkpoint, early_stop])
157

Epoch 151:
Epoch 151/300
19976/19976 [==============================] - 250s 12ms/step - loss:
0.7300
My War and Peace:
ing to the countess. "i have nothing to do with him and i have nothing to
do with the general," said prince andrew.
"i am so sorry for the princess, i am so since he will not be able to say
anything. i saw him long ago. i am so sincerely that i am not to blame for
it. i am sure that something is so much talk about the emperor alexander's
personal attention."
"why do you say that?" and she recognized in his son's presence.
"well, and how is she?" asked pierre.
"the prince is very good to make

Epoch 00151: loss improved from 0.73175 to 0.73003, saving model to


weights/weights_epoch_151_loss_0.7300.hdf5

Epoch 201:
Epoch 201/300
19976/19976 [==============================] - 248s 12ms/step - loss:
0.6794
My War and Peace:
was all the same to him. he received a story proved that the count had not
yet seen the countess and the other and asked to be able to start a tender
man than the world. she was not a family affair and was at the same time as
in the same way. a few minutes later the count had been at home with his
smile and said:
"i am so glad! well, what does that mean? you will see that you are always
the same."
"you know i have not come to the conclusion that i should like to
send my private result. the prin

Epoch 00201: loss improved from 0.68000 to 0.67937, saving model to


weights/weights_epoch_151_loss_0.6793.hdf5

Epoch 251:
Epoch 251/300
19976/19976 [==============================] - 249s 12ms/step - loss:
0.6369
My War and Peace:
nd the countess was sitting in a single look on
her face.
"why should you be ashamed?"
"why do you say that?" said princess mary. "why didn't you say a word of
this?" said prince andrew with a smile.
"you would not like that for my sake, prince vasili's son, have you seen
the rest of the two?"
"well, i am suffering," replied the princess with a sigh. "well, what a
delightful norse?" he shouted.
the convoy and driving away the flames of the battalions of the first
day of the orthodox russian
158

Epoch 00251: loss improved from 0.63715 to 0.63689, saving model to


weights/weights_epoch_251_loss_0.6368.hdf5

LSTM
RNN LSTM
LaTex HTML

LSTM

Each comment can carry any combination of six labels: toxic, severe_toxic, obscene, threat, insult, and identity_hate.

Kaggle1
train.csv

from numpy import array


from keras.preprocessing.text import one_hot
from keras.preprocessing.sequence import pad_sequences
from keras.models import Sequential
from keras.layers.core import Activation, Dropout, Dense
from keras.layers import Flatten, LSTM
from keras.layers import GlobalMaxPooling1D
from keras.models import Model
from keras.layers.embeddings import Embedding
from sklearn.model_selection import train_test_split
from keras.preprocessing.text import Tokenizer
from keras.layers import Input
from keras.layers.merge import Concatenate
import pandas as pd
import numpy as np
import re
import matplotlib.pyplot as plt

toxic_comments = pd.read_csv("train.csv")

print(toxic_comments.shape)

(159571, 8)

1
https://www.kaggle.com/c/jigsaw-toxic-comment-classification-challenge/overview
159

toxic_comments.head()
                 id                                       comment_text  toxic  severe_toxic  obscene  threat  insult  identity_hate
0  0000997932d777bf  Explanation\nWhy the edits made under my usern...      0             0        0       0       0              0
1  000103f0d9cfb60f  D'aww! He matches this background colour I'm s...      0             0        0       0       0              0
2  000113f07ec002fd  Hey man, I'm really not trying to edit war. It...      0             0        0       0       0              0
3  0001b41b1c6bb37e  "\nMore\nI can't make any real suggestions on ...      0             0        0       0       0              0
4  0001d958c54c6e35  You, sir, are my hero. Any chance you remember...      0             0        0       0       0              0

filter = toxic_comments["comment_text"] != ""


toxic_comments = toxic_comments[filter]
toxic_comments = toxic_comments.dropna()

comment_text
print(toxic_comments["comment_text"][168])

You should be fired, you're a moronic wimp who is too lazy to do research.
It makes me sick that people like you exist in this world.

print("Toxic:" + str(toxic_comments["toxic"][168]))
print("Severe_toxic:" + str(toxic_comments["severe_toxic"][168]))
print("Obscene:" + str(toxic_comments["obscene"][168]))
print("Threat:" + str(toxic_comments["threat"][168]))
print("Insult:" + str(toxic_comments["insult"][168]))
print("Identity_hate:" + str(toxic_comments["identity_hate"][168]))
Toxic:1
Severe_toxic:0
Obscene:0
Threat:0
Insult:1
Identity_hate:0

toxic_comments_labels = toxic_comments[["toxic", "severe_toxic", "obscene",
                                        "threat", "insult", "identity_hate"]]
fig_size = plt.rcParams["figure.figsize"]
fig_size[0] = 10
fig_size[1] = 8
plt.rcParams["figure.figsize"] = fig_size
toxic_comments_labels.sum(axis=0).plot.bar()
160

toxic

sigmoid

sigmoid

sigmoid

def preprocess_text(sen):
    # Remove punctuations and numbers
    sentence = re.sub('[^a-zA-Z]', ' ', sen)
    # Single character removal
    sentence = re.sub(r"\s+[a-zA-Z]\s+", ' ', sentence)
    # Removing multiple spaces
    sentence = re.sub(r'\s+', ' ', sentence)
    return sentence

X comment_text
toxic_comments_labels
y
X = []
sentences = list(toxic_comments["comment_text"])
for sen in sentences:
    X.append(preprocess_text(sen))

y = toxic_comments_labels.values

one-hot
one-hot

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20,
                                                    random_state=42)

word embedding

fasttext
GloVe

tokenizer = Tokenizer(num_words=5000)
tokenizer.fit_on_texts(X_train)
X_train = tokenizer.texts_to_sequences(X_train)
X_test = tokenizer.texts_to_sequences(X_test)
vocab_size = len(tokenizer.word_index) + 1
maxlen = 200
X_train = pad_sequences(X_train, padding='post', maxlen=maxlen)
X_test = pad_sequences(X_test, padding='post', maxlen=maxlen)

GloVe

!wget http://nlp.stanford.edu/data/glove.6B.zip
!unzip glove*.zip

GloVe
162

from numpy import array


from numpy import asarray
from numpy import zeros

embeddings_dictionary = dict()

glove_file = open('glove.6B.100d.txt', encoding="utf8")

for line in glove_file:
    records = line.split()
    word = records[0]
    vector_dimensions = asarray(records[1:], dtype='float32')
    embeddings_dictionary[word] = vector_dimensions
glove_file.close()

embedding_matrix = zeros((vocab_size, 100))
for word, index in tokenizer.word_index.items():
    embedding_vector = embeddings_dictionary.get(word)
    if embedding_vector is not None:
        embedding_matrix[index] = embedding_vector

LSTM

deep_inputs = Input(shape=(maxlen,))
embedding_layer = Embedding(vocab_size, 100, weights=[embedding_matrix],
trainable=False)(deep_inputs)
LSTM_Layer_1 = LSTM(128)(embedding_layer)
dense_layer_1 = Dense(6, activation='sigmoid')(LSTM_Layer_1)
model = Model(inputs=deep_inputs, outputs=dense_layer_1)

model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['acc'])

print(model.summary())
Model: "model"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_1 (InputLayer) [(None, 200)] 0

embedding (Embedding) (None, 200, 100) 14824300

lstm (LSTM) (None, 128) 117248

dense (Dense) (None, 6) 774

=================================================================
Total params: 14,942,322
Trainable params: 118,022
Non-trainable params: 14,824,300

from keras.utils.vis_utils import plot_model


plot_model(model, to_file='model_plot4a.png', show_shapes=True,
show_layer_names=True)
163

history = model.fit(X_train, y_train, batch_size=128, epochs=5, verbose=1,
                    validation_split=0.2)
Epoch 1/5
798/798 [==============================] - 20s 17ms/step - loss: 0.1193 - acc: 0.9684 - val_loss:
0.0739 - val_acc: 0.9941
Epoch 2/5
798/798 [==============================] - 13s 17ms/step - loss: 0.0643 - acc: 0.9927 - val_loss:
0.0599 - val_acc: 0.9943
Epoch 3/5
798/798 [==============================] - 13s 17ms/step - loss: 0.0572 - acc: 0.9938 - val_loss:
0.0573 - val_acc: 0.9935
Epoch 4/5
798/798 [==============================] - 13s 17ms/step - loss: 0.0548 - acc: 0.9939 - val_loss:
0.0566 - val_acc: 0.9943
Epoch 5/5
798/798 [==============================] - 14s 17ms/step - loss: 0.0523 - acc: 0.9940 - val_loss:
0.0542 - val_acc: 0.9942

score = model.evaluate(X_test, y_test, verbose=1)


print("Test Score:", score[0])
print("Test Accuracy:", score[1])
998/998 [==============================] - 6s 6ms/step - loss: 0.0529 - acc: 0.9938
Test Score: 0.05285229906439781
Test Accuracy: 0.9937959909439087

overfitting
import matplotlib.pyplot as plt

plt.plot(history.history['acc'])
plt.plot(history.history['val_acc'])

plt.title('model accuracy')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['train','test'], loc='upper left')
plt.show()
164

plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])

plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train','test'], loc='upper left')
plt.show()

overfitting

LSTM
165

Sentiment analysis

IMDB LSTM
Keras

from keras.datasets import imdb


top_words = 5000
(X_train, y_train), (X_test, y_test) = imdb.load_data(num_words=top_words)

X_train
166

array([list([1, 14, 22, 16, 43, 530, 973, 1622, 1385, 65, 458, 4468, 66, 3941, 4, 173, 36, 256, 5,
25, 100, 43, 838, 112, 50, 670, 22665, 9, 35, 480, 284, 5, 150, 4, 172, 112, 167, 21631, 336, 385,
39, 4, 172, 4536, 1111, 17, 546, 38, 13, 447, 4, 192, 50, 16, 6, 147, 2025, 19, 14, 22, 4, 1920,
4613, 469, 4, 22, 71, 87, 12, 16, 43, 530, 38, 76, 15, 13, 1247, 4, 22, 17, 515, 17, 12, 16, 626,
18, 19193, 5, 62, 386, 12, 8, 316, 8, 106, 5, 4, 2223, 5244, 16, 480, 66, 3785, 33, 4, 130, 12,
16, 38, 619, 5, 25, 124, 51, 36, 135, 48, 25, 1415, 33, 6, 22, 12, 215, 28, 77, 52, 5, 14, 407,
16, 82, 10311, 8, 4, 107, 117, 5952, 15, 256, 4, 31050, 7, 3766, 5, 723, 36, 71, 43, 530, 476, 26,
400, 317, 46, 7, 4, 12118, 1029, 13, 104, 88, 4, 381, 15, 297, 98, 32, 2071, 56, 26, 141, 6, 194,
7486, 18, 4, 226, 22, 21, 134, 476, 26, 480, 5, 144, 30, 5535, 18, 51, 36, 28, 224, 92, 25, 104,
4, 226, 65, 16, 38, 1334, 88, 12, 16, 283, 5, 16, 4472, 113, 103, 32, 15, 16, 5345, 19, 178, 32]),
list([1, 194, 1153, 194, 8255, 78, 228, 5, 6, 1463, 4369, 5012, 134, 26, 4, 715, 8, 118,
...,
list([1, 1446, 7079, 69, 72, 3305, 13, 610, 930, 8, 12, 582, 23, 5, 16, 484, 685, 54, 349,
11, 4120, 2959, 45, 58, 1466, 13, 197, 12, 16, 43, 23, 21469, 5, 62, 30, 145, 402, 11, 4131, 51,
575, 32, 61, 369, 71, 66, 770, 12, 1054, 75, 100, 2198, 8, 4, 105, 37, 69, 147, 712, 75, 3543, 44,
257, 390, 5, 69, 263, 514, 105, 50, 286, 1814, 23, 4, 123, 13, 161, 40, 5, 421, 4, 116, 16, 897,
13, 40691, 40, 319, 5872, 112, 6700, 11, 4803, 121, 25, 70, 3468, 4, 719, 3798, 13, 18, 31, 62,
40, 8, 7200, 4, 29455, 7, 14, 123, 5, 942, 25, 8, 721, 12, 145, 5, 202, 12, 160, 580, 202, 12, 6,
52, 58, 11418, 92, 401, 728, 12, 39, 14, 251, 8, 15, 251, 5, 21213, 12, 38, 84, 80, 124, 12, 9,
23]),
list([1, 17, 6, 194, 337, 7, 4, 204, 22, 45, 254, 8, 106, 14, 123, 4, 12815, 270, 14437, 5,
16923, 12255, 732, 2098, 101, 405, 39, 14, 1034, 4, 1310, 9, 115, 50, 305, 12, 47, 4, 168, 5, 235,
7, 38, 111, 699, 102, 7, 4, 4039, 9245, 9, 24, 6, 78, 1099, 17, 2345, 16553, 21, 27, 9685, 6139,
5, 29043, 1603, 92, 1183, 4, 1310, 7, 4, 204, 42, 97, 90, 35, 221, 109, 29, 127, 27, 118, 8, 97,
12, 157, 21, 6789, 85010, 9, 6, 66, 78, 1099, 4, 631, 1191, 5, 2642, 272, 191, 1070, 6, 7585, 8,
2197, 70907, 10755, 544, 5, 383, 1271, 848, 1468, 12183, 497, 16876, 8, 1597, 8778, 19280, 21, 60,
27, 239, 9, 43, 8368, 209, 405, 10, 10, 12, 764, 40, 4, 248, 20, 12, 16, 5, 174, 1791, 72, 7, 51,
6, 1739, 22, 4, 204, 131, 9])],
dtype=object)

word_index = imdb.get_word_index()                  # get {word : index}
index_word = {v: k for k, v in word_index.items()}  # get {index : word}

# Note: imdb.load_data() offsets the stored indices by 3 (0 = padding, 1 = start,
# 2 = unknown), so this direct lookup only gives a rough, slightly shifted decoding.
index = 1
print(" ".join([index_word[idx] for idx in X_train[index]]))
print("positive" if y_train[index] == 1 else "negative")
the thought solid thought senator do making to is spot nomination assumed while he of
jack in where picked as getting on was did hands fact characters to always life
thrillers not as me can't in at are br of sure your way of little it strongly random
to view of love it so principles of guy it used producer of where it of here icon film
of outside to don't all unique some like of direction it if out her imagination below
keep of queen he diverse to makes this stretch stefan of solid it thought begins br
senator machinations budget worthwhile though ok brokedown awaiting for ever better
were lugia diverse for budget look kicked any to of making it out bosworth's follows
for effects show to show cast this family us scenes more it severe making senator to
levant's finds tv tend to of emerged these thing wants but fuher an beckinsale cult as
it is video do you david see scenery it in few those are of ship for with of wild to
one is very work dark they don't do dvd with those them
negative

Keras encodes every review as a sequence of word indices, but the reviews all have different lengths, while the network expects input of one fixed size. Each sequence therefore has to be padded (or truncated) to the same length before it is fed to the model.

X_train[125]
array([ 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 1, 11,
6, 58, 54, 4, 14537, 5, 6495, 4, 2351,
1630, 71, 13202, 23, 26, 20094, 40865, 34, 35,
9454, 1680, 8, 6681, 692, 39, 94, 205, 6177,
712, 121, 4, 18147, 7037, 406, 2657, 5, 2189,
61778, 26, 23906, 11420, 6, 708, 44, 4, 1110,
656, 4667, 206, 15, 230, 13781, 15, 7, 4,
4847, 36, 26, 54759, 238, 306, 2316, 190, 48,
25, 181, 8, 67, 6, 22, 44, 15, 353,
1659, 84675, 3048, 4, 9818, 305, 88, 11493, 9,
31, 7, 4, 91, 12789, 53410, 3106, 126, 93,
40, 670, 8759, 41931, 6, 6951, 4, 167, 47,
623, 23, 6, 875, 29, 186, 340, 4447, 7,
5, 44141, 27, 5485, 23, 220, 175, 2122, 10,
10, 27, 4847, 26, 6, 5261, 2631, 604, 7,
2118, 23310, 36011, 5350, 17, 48, 29, 71, 12129,
18, 1168, 38886, 33829, 1918, 31235, 3255, 9977, 31537,
9248, 40, 35, 1755, 362, 11, 4, 2370, 2222,
56, 7, 4, 23052, 2489, 39, 609, 82401, 48583,
6, 3440, 655, 707, 4198, 3801, 37, 4486, 33,
175, 2631, 114, 471, 17, 48, 36, 181, 8,
30, 1059, 4, 3408, 5963, 2396, 6, 117, 128,
21, 26, 131, 4218, 11, 20663, 3826, 14524, 10,
10, 12, 9, 614, 8, 97, 6, 1393, 22,
44, 995, 84, 21800, 5801, 21, 14, 9, 6,
4953, 22, 44, 995, 84, 93, 34, 84, 37,
104, 507, 11076, 37, 26, 662, 180, 8, 4,
5075, 11, 882, 71, 31, 8, 39022, 36011, 31537,
5, 48583, 19, 1240, 31800, 1806, 11521, 5, 7863,
28281, 4, 959, 62, 165, 30, 8, 2988, 4,
2772, 1500, 7, 4, 22, 24, 2358, 12, 10,
10, 22993, 238, 43, 28, 188, 245, 19, 27,
105, 5, 687, 48, 29, 562, 98, 615, 21,
27, 8500, 9, 38, 2797, 4, 548, 139, 62,
9343, 6, 14053, 707, 137, 4, 1205, 7, 4,
6556, 9, 53, 2797, 74, 4, 6556, 410, 5,
27, 5150, 8, 79, 27, 177, 8, 3126, 19,
33, 222, 49, 22895, 7, 14090, 406, 5424, 38,
4144, 15, 12, 9, 165, 2268, 8, 106, 318,
760, 215, 30, 93, 133, 7, 31537, 38, 55,
52, 11, 26149, 17310, 1080, 24192, 68007, 29, 9,
6248, 78, 133, 11, 6, 239, 15, 9, 38,
230, 120, 4, 350, 45, 145, 174, 10, 10,
22993, 47, 93, 49, 478, 108, 21, 25, 62,
115, 482, 12, 39, 14, 2342, 947, 6, 6950,
8, 27, 157, 62, 115, 181, 8, 67, 160,
7, 27, 108, 103, 14, 63, 62, 30, 6,
87, 902, 2152, 3572, 5, 6, 619, 437, 7,
6, 4616, 221, 819, 31, 323, 46, 7, 747,
5, 198, 112, 55, 3591], dtype=int32)

from keras.models import Sequential
from keras.layers import Embedding
from keras.layers import LSTM, Dense

embedding_vector_length = 32
model = Sequential()
# each of the top_words indices is mapped to a 32-dimensional embedding vector
model.add(Embedding(top_words, embedding_vector_length,
                    input_length=max_review_length))
# a single LSTM layer with 100 units, then a sigmoid output for the binary label
model.add(LSTM(100))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam',
              metrics=['accuracy'])
print(model.summary())
Model: "sequential_2"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
embedding (Embedding) (None, 500, 32) 160000

lstm (LSTM) (None, 100) 53200

dense (Dense) (None, 1) 101

=================================================================
Total params: 213,301
Trainable params: 213,301
Non-trainable params: 0
_________________________________________________________________
None

The first layer is an Embedding layer: it learns an embedding, that is a dense 32-dimensional vector, for each of the 5,000 words in the vocabulary, so its output has shape (batch, 500, 32).
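As a small illustrative sketch (not part of the original example), the Embedding layer can be inspected on its own to see this mapping: it turns a batch of integer word indices into a batch of 32-dimensional vectors.

import numpy as np
from keras.models import Sequential
from keras.layers import Embedding

demo = Sequential()
demo.add(Embedding(input_dim=5000, output_dim=32, input_length=500))

batch = np.zeros((1, 500), dtype='int32')  # one padded review: 500 word indices
print(demo.predict(batch).shape)           # (1, 500, 32): one 32-dim vector per word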

model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=5,
          batch_size=32)
Epoch 1/5
782/782 [==============================] - 82s 103ms/step - loss: 0.4681 - accuracy:
0.7815 - val_loss: 0.3702 - val_accuracy: 0.8430
Epoch 2/5
782/782 [==============================] - 79s 101ms/step - loss: 0.3340 - accuracy:
0.8618 - val_loss: 0.3984 - val_accuracy: 0.8299
Epoch 3/5
782/782 [==============================] - 80s 102ms/step - loss: 0.2669 - accuracy:
0.8954 - val_loss: 0.3272 - val_accuracy: 0.8657
Epoch 4/5
782/782 [==============================] - 80s 102ms/step - loss: 0.2259 - accuracy:
0.9117 - val_loss: 0.3122 - val_accuracy: 0.8713
Epoch 5/5
782/782 [==============================] - 79s 101ms/step - loss: 0.1912 - accuracy:
0.9270 - val_loss: 0.3399 - val_accuracy: 0.8604

scores = model.evaluate(X_test, y_test, verbose=0)


print("Accuracy: %.2f%%" % (scores[1]*100))
Accuracy: 86.04%

Notice that the training accuracy keeps rising (about 93%) while the validation accuracy stays around 86% and the validation loss increases again in the last epoch; this growing gap is an early sign of overfitting, and training for many more epochs without any countermeasure would make the overfitting worse.
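One common way to guard against this, shown here only as a sketch and not used in the run above, is Keras's EarlyStopping callback, which stops training once the validation loss stops improving:

from keras.callbacks import EarlyStopping

# stop when val_loss has not improved for 2 epochs and keep the best weights seen so far
early_stop = EarlyStopping(monitor='val_loss', patience=2, restore_best_weights=True)

model.fit(X_train, y_train, validation_data=(X_test, y_test),
          epochs=20, batch_size=32, callbacks=[early_stop])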

Unlike a plain RNN, the LSTM can carry information across the whole review, which is what makes it suitable for classifying long sequences like these; the same LSTM architecture can be reused for other text-classification problems.

Generative versus discriminative models. A discriminative model learns the boundary between classes: given an input x it models p(y | x) and predicts a label. A generative model instead learns the distribution of the data itself, so it can produce new samples that look like the training data. Learning that distribution is a form of density estimation: the model estimates how probable any given sample is under the data distribution.
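In probabilistic notation (added here for clarity, using the standard formulation):

\text{discriminative model: } p(y \mid x)
\qquad
\text{generative model: } p(x) \ \text{or} \ p(x, y) = p(x \mid y)\, p(y)

A discriminative classifier only needs the conditional on the left, while a generative model captures the full distribution on the right and can therefore sample new x.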

Generative models also play a role in reinforcement learning, where a learned model of the environment can generate simulated experience for the agent to train on.

GAN (Generative Adversarial Networks). A GAN consists of two neural networks that are trained against each other: a generator, which produces fake samples, and a discriminator, which tries to tell the fake samples from real ones. A popular analogy is an art forger trying to produce convincing copies of Da Vinci's paintings while an art expert tries to spot the forgeries; each side forces the other to improve, which is why the training is called adversarial. When training succeeds, the GAN's generated samples become hard to distinguish from real data.

The two networks of a GAN are trained together from scratch: at first the generator produces little more than noise and the discriminator rejects it easily, but as training progresses the generated samples become more realistic and the discriminator has to work harder to tell them apart.

The competition between the generator and the discriminator is a zero-sum game: whatever one network gains, the other loses, so the discriminator's success is exactly the generator's failure and vice versa.

Each GAN training iteration follows the same sequence of steps (they map one-to-one onto the training loop implemented later in this chapter):
1. Sample a batch of random noise vectors.
2. Run the noise through the generator to obtain a batch of fake samples.
3. Take a random batch of real samples from the training data.
4. Combine the real and fake samples and label them as real or fake.
5. Train the discriminator on this combined, labelled batch.
6. Sample a fresh batch of noise vectors.
7. Train the generator through the combined model (with the discriminator frozen) so that its samples get classified as real.
8. Repeat until the generated samples are good enough.

The generator's input is a random latent vector. This vector acts as the seed from which a sample is generated, and it is usually drawn from a simple distribution such as random uniform or random normal; different seeds produce different generated samples (a short sketch follows).
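A minimal sketch of how such a seed is drawn with NumPy (the normal variant is the one used in the training loop later in this chapter):

import numpy as np

randomDim = 10                                              # size of the latent vector
z_normal  = np.random.normal(0, 1, size=(1, randomDim))     # random normal seed
z_uniform = np.random.uniform(-1, 1, size=(1, randomDim))   # random uniform seed in [-1, 1]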

Formally, GAN training is set up as a game between the two networks, and this game has three defining characteristics:
1. It is a sequential game: the generator and the discriminator take turns updating their weights.
2. It is a zero-sum game: one player's gain is exactly the other player's loss.
3. It is solved as a minimax problem: the generator tries to minimize the very quantity that the discriminator tries to maximize (the value function is written out below).
The solution of this minimax game, the point at which neither network can improve by changing its strategy while the other keeps its strategy fixed, is a Nash equilibrium.
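For reference, this minimax objective is the standard GAN value function introduced by Goodfellow et al. (2014):

\min_G \max_D V(D, G) =
\mathbb{E}_{x \sim p_{\text{data}}(x)}\big[\log D(x)\big] +
\mathbb{E}_{z \sim p_z(z)}\big[\log\big(1 - D(G(z))\big)\big]

Here D(x) is the discriminator's estimate that x is real, and G(z) is the sample the generator produces from the latent vector z.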

Reaching this equilibrium in practice is difficult, and GAN training suffers from several well-known problems:
1. Non-convergence: the two networks can oscillate against each other instead of settling into a stable solution.
2. Mode collapse: the generator produces only a narrow set of very similar samples (discussed at the end of this chapter).
3. Diminished gradient: the generator's learning signal fades when the discriminator becomes too strong.

The diminished gradient problem deserves particular attention: if the discriminator becomes too accurate too early, the generator's gradient becomes vanishingly small and the generator effectively stops learning.
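A standard remedy, and effectively what the Keras implementation below does by training the generator against target labels of 1, is the non-saturating generator loss: instead of minimizing \log(1 - D(G(z))), the generator maximizes

\mathbb{E}_{z \sim p_z(z)}\big[\log D(G(z))\big]

which keeps the gradient useful even when the discriminator is confident.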

GAN MNIST example. To see all of this in practice, we now build a simple GAN with TensorFlow and Keras that learns to generate handwritten digits similar to those in the MNIST dataset.

from tensorflow.keras.datasets import mnist
from tensorflow.keras.layers import Input, Dense, Reshape, Flatten, Dropout
from tensorflow.keras.layers import BatchNormalization, Activation, ZeroPadding2D
from tensorflow.keras.layers import LeakyReLU
from tensorflow.keras.layers import UpSampling2D, Conv2D
from tensorflow.keras.models import Sequential, Model
from tensorflow.keras.optimizers import Adam
from tensorflow.keras import initializers
import matplotlib.pyplot as plt
import sys
import numpy as np
import tqdm

First we load the MNIST images, scale the pixel values to the range [-1, 1] (matching the tanh output of the generator defined below) and flatten every 28x28 image into a 784-dimensional vector:

(X_train, _), (_, _) = mnist.load_data()


X_train = (X_train.astype(np.float32) - 127.5)/127.5

X_train = X_train.reshape(60000, 784)

The generator takes noisy data as input: a random latent vector of dimension randomDim. It is a stack of Dense layers with LeakyReLU activations, ending in 784 units with a tanh activation so that the generated pixels fall in the same [-1, 1] range as the real images:

randomDim = 10
generator = Sequential()
generator.add(Dense(256, input_dim=randomDim))
generator.add(LeakyReLU(0.2))
generator.add(Dense(512))
generator.add(LeakyReLU(0.2))
generator.add(Dense(1024))
generator.add(LeakyReLU(0.2))
generator.add(Dense(784, activation='tanh'))

# The discriminator maps a 784-dimensional image to a single real/fake probability
discriminator = Sequential()
discriminator.add(Dense(1024, input_dim=784,
                  kernel_initializer=initializers.RandomNormal(stddev=0.02)))
discriminator.add(LeakyReLU(0.2))
discriminator.add(Dropout(0.3))
discriminator.add(Dense(512))
discriminator.add(LeakyReLU(0.2))
discriminator.add(Dropout(0.3))
discriminator.add(Dense(256))
discriminator.add(LeakyReLU(0.2))
discriminator.add(Dropout(0.3))
discriminator.add(Dense(1, activation='sigmoid'))

To train the generator we chain the two models into a combined GAN model, setting the discriminator's trainable flag to False so that only the generator's weights are updated when the combined model is trained:

# Combined network
discriminator.trainable = False
ganInput = Input(shape=(randomDim,))
x = generator(ganInput)
ganOutput = discriminator(x)
gan = Model(inputs=ganInput, outputs=ganOutput)

Freezing the discriminator only affects the combined GAN model; the discriminator itself is still compiled and trained on its own. Both models use binary cross-entropy with the Adam optimizer (the adam instance is created here exactly as in the complete listing later in this chapter):

adam = Adam(learning_rate=0.0002, beta_1=0.5)
discriminator.compile(loss='binary_crossentropy', optimizer=adam)
gan.compile(loss='binary_crossentropy', optimizer=adam)

The training function below draws a batch of noisy data (random latent vectors) and a random batch of real images at every step:

dLosses = []
gLosses = []

def train(epochs=1, batchSize=128):
    batchCount = int(X_train.shape[0] / batchSize)
    print('Epochs:', epochs)
    print('Batch size:', batchSize)
    print('Batches per epoch:', batchCount)

    for e in range(1, epochs+1):
        print('-'*15, 'Epoch %d' % e, '-'*15)
        for _ in range(batchCount):
            # Get a random set of input noise and images
            noise = np.random.normal(0, 1, size=[batchSize, randomDim])
            imageBatch = X_train[np.random.randint(0, X_train.shape[0],
                                                   size=batchSize)]

            # Generate fake MNIST images
            generatedImages = generator.predict(noise)
            # print np.shape(imageBatch), np.shape(generatedImages)
            X = np.concatenate([imageBatch, generatedImages])

            # Labels for generated and real data
            yDis = np.zeros(2*batchSize)
            # One-sided label smoothing
            yDis[:batchSize] = 0.9

            # Train discriminator
            discriminator.trainable = True
            dloss = discriminator.train_on_batch(X, yDis)

Still inside the same inner for loop, the generator is trained next, through the combined GAN model: we feed it fresh noise and label the resulting fakes as real (1), so the update pushes the generator toward samples that the discriminator accepts:

            # Train generator
            noise = np.random.normal(0, 1, size=[batchSize, randomDim])
            yGen = np.ones(batchSize)
            discriminator.trainable = False
            gloss = gan.train_on_batch(noise, yGen)

            # Store loss of most recent batch from this epoch
            dLosses.append(dloss)
            gLosses.append(gloss)

        if e == 1 or e % 20 == 0:
            saveGeneratedImages(e)

The helper functions plotLoss and saveGeneratedImages plot the two losses and save a grid of generated digits to disk (note that they write into an images/ directory, which has to exist beforehand):

# Plot the loss from each batch
def plotLoss(epoch):
    plt.figure(figsize=(10, 8))
    plt.plot(dLosses, label='Discriminative loss')
    plt.plot(gLosses, label='Generative loss')
    plt.xlabel('Epoch')
    plt.ylabel('Loss')
    plt.legend()
    plt.savefig('images/gan_loss_epoch_%d.png' % epoch)

# Create a wall of generated MNIST images
def saveGeneratedImages(epoch, examples=100, dim=(10, 10), figsize=(10, 10)):
    noise = np.random.normal(0, 1, size=[examples, randomDim])
    generatedImages = generator.predict(noise)
    generatedImages = generatedImages.reshape(examples, 28, 28)

    plt.figure(figsize=figsize)
    for i in range(generatedImages.shape[0]):
        plt.subplot(dim[0], dim[1], i+1)
        plt.imshow(generatedImages[i], interpolation='nearest', cmap='gray_r')
        plt.axis('off')
    plt.tight_layout()
    plt.savefig('images/gan_generated_image_epoch_%d.png' % epoch)

For reference, here is the complete GAN program in one piece:

from tensorflow.keras.datasets import mnist
from tensorflow.keras.layers import Input, Dense, Reshape, Flatten, Dropout
from tensorflow.keras.layers import BatchNormalization, Activation, ZeroPadding2D
from tensorflow.keras.layers import LeakyReLU
from tensorflow.keras.layers import UpSampling2D, Conv2D
from tensorflow.keras.models import Sequential, Model
from tensorflow.keras.optimizers import Adam
from tensorflow.keras import initializers
import matplotlib.pyplot as plt
import sys
import numpy as np
import tqdm

randomDim = 10

# Load MNIST data
(X_train, _), (_, _) = mnist.load_data()
X_train = (X_train.astype(np.float32) - 127.5)/127.5
X_train = X_train.reshape(60000, 784)

generator = Sequential()
generator.add(Dense(256, input_dim=randomDim))
# optionally: kernel_initializer=initializers.RandomNormal(stddev=0.02)
generator.add(LeakyReLU(0.2))
generator.add(Dense(512))
generator.add(LeakyReLU(0.2))
generator.add(Dense(1024))
generator.add(LeakyReLU(0.2))
generator.add(Dense(784, activation='tanh'))
adam = Adam(learning_rate=0.0002, beta_1=0.5)

discriminator = Sequential()
discriminator.add(Dense(1024, input_dim=784,
                  kernel_initializer=initializers.RandomNormal(stddev=0.02)))
discriminator.add(LeakyReLU(0.2))
discriminator.add(Dropout(0.3))
discriminator.add(Dense(512))
discriminator.add(LeakyReLU(0.2))
discriminator.add(Dropout(0.3))
discriminator.add(Dense(256))
discriminator.add(LeakyReLU(0.2))
discriminator.add(Dropout(0.3))
discriminator.add(Dense(1, activation='sigmoid'))
discriminator.compile(loss='binary_crossentropy', optimizer=adam)

# Combined network
discriminator.trainable = False
ganInput = Input(shape=(randomDim,))
x = generator(ganInput)
ganOutput = discriminator(x)
gan = Model(inputs=ganInput, outputs=ganOutput)
gan.compile(loss='binary_crossentropy', optimizer=adam)
dLosses = []
gLosses = []

# Plot the loss from each batch
def plotLoss(epoch):
    plt.figure(figsize=(10, 8))
    plt.plot(dLosses, label='Discriminative loss')
    plt.plot(gLosses, label='Generative loss')
    plt.xlabel('Epoch')
    plt.ylabel('Loss')
    plt.legend()
    plt.savefig('images/gan_loss_epoch_%d.png' % epoch)

# Create a wall of generated MNIST images
def saveGeneratedImages(epoch, examples=100, dim=(10, 10), figsize=(10, 10)):
    noise = np.random.normal(0, 1, size=[examples, randomDim])
    generatedImages = generator.predict(noise)
    generatedImages = generatedImages.reshape(examples, 28, 28)
    plt.figure(figsize=figsize)
    for i in range(generatedImages.shape[0]):
        plt.subplot(dim[0], dim[1], i+1)
        plt.imshow(generatedImages[i], interpolation='nearest', cmap='gray_r')
        plt.axis('off')
    plt.tight_layout()
    plt.savefig('images/gan_generated_image_epoch_%d.png' % epoch)
def train(epochs=1, batchSize=128):
    batchCount = int(X_train.shape[0] / batchSize)
    print('Epochs:', epochs)
    print('Batch size:', batchSize)
    print('Batches per epoch:', batchCount)

    for e in range(1, epochs+1):
        print('-'*15, 'Epoch %d' % e, '-'*15)
        for _ in range(batchCount):
            # Get a random set of input noise and images
            noise = np.random.normal(0, 1, size=[batchSize, randomDim])
            imageBatch = X_train[np.random.randint(0, X_train.shape[0],
                                                   size=batchSize)]

            # Generate fake MNIST images
            generatedImages = generator.predict(noise)
            # print np.shape(imageBatch), np.shape(generatedImages)
            X = np.concatenate([imageBatch, generatedImages])

            # Labels for generated and real data
            yDis = np.zeros(2*batchSize)
            # One-sided label smoothing
            yDis[:batchSize] = 0.9

            # Train discriminator
            discriminator.trainable = True
            dloss = discriminator.train_on_batch(X, yDis)

            # Train generator
            noise = np.random.normal(0, 1, size=[batchSize, randomDim])
            yGen = np.ones(batchSize)
            discriminator.trainable = False
            gloss = gan.train_on_batch(noise, yGen)

            # Store loss of most recent batch from this epoch
            dLosses.append(dloss)
            gLosses.append(gloss)

        if e == 1 or e % 20 == 0:
            saveGeneratedImages(e)

    # Plot losses from every epoch
    plotLoss(e)

Finally we train the GAN for 200 epochs with a batch size of 128:

train(200, 128)
Epochs: 200
Batch size: 128
Batches per epoch: 468
--------------- Epoch 1 ---------------
--------------- Epoch 2 ---------------
--------------- Epoch 3 ---------------
--------------- Epoch 4 ---------------
--------------- Epoch 5 ---------------

--------------- Epoch 198 ---------------


--------------- Epoch 199 ---------------
--------------- Epoch 200 ---------------

Samples saved by saveGeneratedImages at Epoch 1, Epoch 20, Epoch 100 and Epoch 200 show how the generated digits improve over the course of training.

Even when training runs smoothly, GANs remain delicate to train. A common failure is mode collapse: the generator discovers a few outputs that reliably fool the discriminator and keeps producing only those, instead of covering the full variety of the training data.

References
Goodfellow, I., Bengio, Y., and Courville, A. (2016). Deep Learning. MIT Press.

Sutskever, I., Martens, J., Dahl, G., & Hinton, G. (2013). On the importance of initialization and momentum in deep learning. In Proceedings of the 30th International Conference on Machine Learning (ICML), 1139–1147.
Siegelmann, H. T. (1995). Computation Beyond the Turing Limit. Science, 238(28), 632–637.
Graves, A., Wayne, G., & Danihelka, I. (2014). Neural Turing Machines. arXiv preprint arXiv:1410.5401.
Zocca, V., Spacagna, G., Slater, D., & Roelants, P. (2017). Python Deep Learning: Next Generation Techniques to Revolutionize Computer Vision, AI, Speech and Data Analysis. Birmingham, UK: Packt Publishing Ltd.
Aljundi, R. (2019). Continual learning in neural networks. arXiv preprint
arXiv:1910.02718.
Gupta, P., & Sehgal, N. K. (2021). Introduction to Machine Learning in the Cloud
with Python: Concepts and Practices. Springer Nature.
Alpaydin, E. (2004). Introduction To Machine Learning. S.L.: Mit Press.
Bishop, Christopher M (2006). Pattern recognition and machine learning.
springer.
Brudfors, M. (2020). Generative Models for Preprocessing of Hospital Brain
Scans (Doctoral dissertation, UCL (University College London)).

Charte, F., Rivera, A. J., & Del Jesus, M. J. (2016). Multilabel classification:
problem analysis, metrics and techniques. Springer International Publishing.
Chen, Z., & Liu, B. (2018). Lifelong machine learning. Synthesis Lectures on
Artificial Intelligence and Machine Learning, 12(3), 1-207.
Dulhare, U. N., Ahmad, K., & Ahmad, K. A. B. (Eds.). (2020). Machine Learning
and Big Data: Concepts, Algorithms, Tools and Applications. John Wiley &
Sons.
Friedemann Zenke, Ben Poole, and Surya Ganguli. (2017). Continual learning
through synaptic intelligence. In International Conference on Machine
Learning (ICML).
Gan, Z. (2018). Deep Generative Models for Vision and Language Intelligence
(Doctoral dissertation, Duke University).
Jebara, T. (2012). Machine learning: discriminative and generative (Vol. 755).
Springer Science & Business Media.
Lesort, T. (2020). Continual Learning: Tackling Catastrophic Forgetting in Deep
Neural Networks with Replay Processes. arXiv preprint arXiv:2007.00487
Marsland, S. (2015). Machine Learning : an algorithmic perspective. Boca Raton,
Fl: Crc Press.
Mitchell, T.M. (1997). Machine learning. New York: Mcgraw Hill.
Rhys, H. (2020). Machine Learning with R, the tidyverse, and mlr. Simon and
Schuster.
Zhou Z. (2021). Machine Learning. Springer Nature Singapore Pte Ltd.
Weidman, S. (2019). Deep Learning from Scratch: Building with Python from
First Principles. O'Reilly Media.
Gulli, A., Kapoor, A., & Pal, S. (2019). Deep learning with TensorFlow 2 and
Keras: regression, ConvNets, GANs, RNNs, NLP, and more with
TensorFlow 2 and the Keras API. Packt Publishing Ltd.
Vasudevan, S. K., Pulari, S. R., & Vasudevan, S. (2022). Deep Learning: A
Comprehensive Guide. Chapman and Hall/CRC.
Zhang, A., Lipton, Z. C., Li, M., & Smola, A. J. (2021). Dive into deep learning.
arXiv preprint arXiv:2106.11342.
Ekman, M. (2021). Learning Deep Learning: Theory and Practice of Neural
Networks, Computer Vision, NLP, and Transformers Using TensorFlow.
Addison-Wesley Professional.
Huang, S. C., & Le, T. H. (2021). Principles and labs for deep learning. Elsevier.
Wilmott, P. (2019). Machine learning: an applied mathematics introduction.
Panda Ohana Publishing.
Kelleher, J. D., Mac Namee, B., & D'arcy, A. (2020). Fundamentals of machine
learning for predictive data analytics: algorithms, worked examples, and case
studies. MIT press.

Du, K. L., & Swamy, M. N. (2013). Neural networks and statistical learning.
Springer Science & Business Media.
Ye, J. C. (2022). Geometry of Deep Learning: A Signal Processing Perspective
(Vol. 37). Springer Nature.
Howard, J., & Gugger, S. (2020). Deep Learning for Coders with fastai and
PyTorch. O'Reilly Media.
Stevens, E., Antiga, L., & Viehmann, T. (2020). Deep learning with PyTorch.
Manning Publications.
Saitoh, K. (2021). Deep Learning from the Basics: Python and Deep Learning:
Theory and Implementation. Packt Publishing Ltd.
Ghatak, A. (2019). Deep learning with R (Vol. 245). Singapore: Springer.
Calin, O. (2020). Deep learning architectures. Springer International Publishing.
Sewak, M., Karim, M. R., & Pujari, P. (2018). Practical convolutional neural
networks: implement advanced deep learning models using Python. Packt
Publishing Ltd.
Liu, Y. H., & Mehta, S. (2019). Hands-On Deep Learning Architectures with
Python: Create deep neural networks to solve computational problems using
TensorFlow and Keras. Packt Publishing Ltd.
Ma, Y., & Tang, J. (2021). Deep learning on graphs. Cambridge University Press.
Buduma, N., & Locascio, N. (2017). Fundamentals of deep learning: Designing
next-generation machine intelligence algorithms. " O'Reilly Media, Inc.".
Ramsundar, B., & Zadeh, R. B. (2018). TensorFlow for deep learning: from linear
regression to reinforcement learning. " O'Reilly Media, Inc.".
Hope, T., Resheff, Y. S., & Lieder, I. (2017). Learning tensorflow: A guide to
building deep learning systems. " O'Reilly Media, Inc.".
Osinga, D. (2018). Deep learning cookbook: practical recipes to get started
quickly. " O'Reilly Media, Inc.".
Trask, A. W. (2019). Grokking deep learning. Simon and Schuster.
Wani, M. A., Bhat, F. A., Afzal, S., & Khan, A. I. (2020). Advances in deep
learning. Springer.
Bhardwaj, A., Di, W., & Wei, J. (2018). Deep Learning Essentials: Your hands-
on guide to the fundamentals of deep learning and neural network modeling.
Packt Publishing Ltd.
Gulli, A., & Pal, S. (2017). Deep learning with Keras. Packt Publishing Ltd.
Aggarwal, C. C. (2018). Neural networks and deep learning. Springer.
Chollet, F. (2021). Deep learning with Python. Simon and Schuster.
Foster, D. (2019). Generative deep learning: teaching machines to paint, write,
compose, and play. O'Reilly Media.

Patterson, J., & Gibson, A. (2017). Deep learning: A practitioner's approach. "
O'Reilly Media, Inc.".
Glassner, A. (2018). Deep learning: From basics to practice. Seattle, WA: The
Imaginary Institute.
Pointer, I. (2019). Programming PyTorch for Deep Learning: Creating and
Deploying Deep Learning Applications. O'Reilly Media.