
An Introduction to Keras

CS 831
Dr. Sanjay Chatterji
1. Introduction to Keras
• Keras is a deep learning framework, written in Python, that runs on top of
Theano or TensorFlow.

• Keras is a high-level neural network API that supports fast experimentation
and can quickly translate your ideas into results.
• Powerful
• Easy to use
• Free
• Open source
Other Deep Learning Tools
TensorFlow is not the only game in town. These are some of the best supported alternatives. Most of these are
written in C++.
• TensorFlow - Google's deep learning API.
• MXNet - the Apache Foundation's deep learning API. Can be used through Keras.
• Theano - Python-based, from the MILA lab (Université de Montréal), one of the
academic groups behind early deep learning research.
• Keras - Also by Google, a higher-level framework that allows the use of
TensorFlow, MXNet, and Theano interchangeably.
• Torch - Lua based. It has been used for some of the most advanced deep learning projects in the world.
• PaddlePaddle - Baidu's deep learning API.
• Deeplearning4J - Java based. GPU support in Java!
• Computational Network Toolkit (CNTK) - Microsoft. Support for Windows/Linux, command line only.
GPU support.
• H2O - Java based. Supports all major platforms. Limited support for computer vision. No GPU support.
2. Keras design principles

(1) User-friendly

Keras provides a consistent and concise API that greatly reduces the
user's workload in common applications, while providing clear and
actionable error feedback.

(2) Native to Python

Keras has no separate model-configuration file format; models are
described in Python code, which makes them compact, easier to debug,
and convenient to extend.

(3) Extensibility

It is easy to add new modules by simply writing new classes or functions that mimic
existing modules.

(4) Modularity

A model can be understood as a sequence of layers or a graph of data, and fully
configurable modules can be freely combined at minimal cost.

Specifically, network layers, loss functions, optimizers, initialization policies,
activation functions, and regularization methods are all independent modules that you can
combine to build your own model.
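
As a minimal sketch of this modularity (the layer sizes, coefficients, and
learning rate below are illustrative placeholders, not values from the slides),
each building block is an independent object that can be swapped without
touching the others:

from keras.models import Sequential
from keras.layers import Dense
from keras import optimizers, regularizers

model = Sequential()
model.add(Dense(16,                                   # illustrative layer size
                input_dim=8,                          # illustrative input size
                activation='relu',                    # activation module
                kernel_initializer='he_normal',       # initialization policy
                kernel_regularizer=regularizers.l2(0.01)))  # regularization module
model.add(Dense(1, activation='sigmoid'))

# The optimizer and loss are chosen independently at compile time.
model.compile(optimizer=optimizers.SGD(lr=0.01),
              loss='binary_crossentropy',
              metrics=['accuracy'])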
4. Keras module structure

The library is organized into a few groups of modules:

• Models: the Sequential model and the Functional model.
• Backend: Theano or TensorFlow.
• Layers: commonly used (core) layers, convolution layers, recurrent layers,
pooling layers, locally connected layers, embedding layers, merge (fusion)
layers, advanced activation layers, normalization layers, noise layers,
wrappers, and custom layers.
• Network configuration: loss functions, optimizers, activation functions,
performance evaluation, initialization methods, regularizers, constraints,
and callback functions.
• Data preprocessing: sequence preprocessing, text preprocessing, and image
preprocessing.
Use Keras to build a neural network

Step 1: Select a model (Sequential model or Functional model).
Step 2: Build the network: input layer, hidden layers, and output layer, drawn
from the available layer types (commonly used layers, convolution layers,
pooling layers, locally connected layers, recurrent layers, ...).
Step 3: Compile: choose the optimization function, the loss function, and the
performance evaluation metrics.
Step 4: Train: fit the model on preprocessed data (sequence, text, or image
preprocessing), optionally with callback functions.
Step 5: Predict.
Hyperparameters: (1) Activation
• Activation functions (for neurons) are applied on a per-layer basis.

• Available options in Keras:


o ‘softmax’
o ‘elu’ – The exponential linear activation: x if x > 0 and alpha * (exp(x)-1) if x < 0.
o ‘selu’ -- The scaled exponential unit activation: scale * elu(x, alpha).
o ‘softplus’ -- The softplus activation: log(exp(x) + 1).
o ‘softsign’ -- The softsign activation: x / (abs(x) + 1).
o ‘relu’ -- The (leaky) rectified linear unit activation: x if x > 0, alpha * x if x < 0. If max_value is
defined, the result is truncated to this value.
o ‘tanh’ -- Hyperbolic tangent activation function.
o ‘sigmoid’ – Sigmoid activation function.
o ‘hard_sigmoid’ -- A piecewise-linear approximation of the sigmoid.
o ‘linear’
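
For example, a minimal sketch of setting activations per layer (the layer
sizes are illustrative placeholders):

from keras.models import Sequential
from keras.layers import Dense

model = Sequential()
model.add(Dense(64, input_dim=20, activation='relu'))  # hidden layer: ReLU
model.add(Dense(10, activation='softmax'))             # output layer: softmax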
Hyperparameters: (2) Loss function
• A loss function is one of the two arguments required for compiling a
Keras model (the other is the optimizer).

• Available options for cost/loss functions in Keras:

o mean_squared_error
o mean_absolute_error
o mean_absolute_percentage_error
o mean_squared_logarithmic_error
o squared_hinge
o hinge
o categorical_hinge
o cosine_proximity
o logcosh
o categorical_crossentropy
o sparse_categorical_crossentropy
o binary_crossentropy
o kullback_leibler_divergence
o poisson
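
A minimal sketch of passing a loss function to compile(); here `model` is
assumed to be an already-defined Sequential model:

# Multi-class problem with one-hot encoded labels:
model.compile(loss='categorical_crossentropy',
              optimizer='sgd', metrics=['accuracy'])

# With integer-encoded labels, 'sparse_categorical_crossentropy' avoids
# having to one-hot encode the targets.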
Hyperparameters: (3) Optimizer
• An optimizer is one of the two arguments required for compiling a Keras model
(the other is the loss function).

• Several optimizers are available, including SGD and Adam (a common default choice).


• See the documentation for the various option parameters of each function.
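
For instance, a minimal sketch of configuring an optimizer explicitly (the
learning rate and momentum values are illustrative; `model` is assumed to be
already defined):

from keras.optimizers import SGD

# Pass an optimizer by name for its defaults, or as a configured object:
model.compile(loss='categorical_crossentropy',
              optimizer=SGD(lr=0.01, momentum=0.9),  # illustrative settings
              metrics=['accuracy'])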
Hyperparameters: (4) Regularizer
• Regularizers allow you to apply penalties on layer parameters or layer
activity during optimization.
• The penalties are applied on a per-layer basis.
• There are 3 types of regularizers in Keras:
• kernel_regularizer: applied to the kernel weights matrix.
• bias_regularizer: applied to the bias vector.
• activity_regularizer: applied to the output of the layer (its "activation").
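
A minimal sketch showing all three regularizer types on a single Dense layer
(the penalty coefficients are illustrative placeholders):

from keras.layers import Dense
from keras import regularizers

layer = Dense(64,
              kernel_regularizer=regularizers.l2(0.01),    # penalize weights
              bias_regularizer=regularizers.l2(0.01),      # penalize biases
              activity_regularizer=regularizers.l1(0.01))  # penalize outputs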
Hyperparameters: (5) Early Stopping
Early stopping halts training when a monitored quantity stops improving. Its main parameters:
• monitor – quantity to be monitored
• min_delta -- minimum change in the monitored quantity to qualify as an improvement
• patience -- number of epochs with no improvement after which training will be stopped.
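
A minimal sketch of wiring EarlyStopping into training (the monitored
quantity, thresholds, and data variables are illustrative assumptions):

from keras.callbacks import EarlyStopping

early_stop = EarlyStopping(monitor='val_loss',  # quantity to be monitored
                           min_delta=1e-4,      # smallest change counted as improvement
                           patience=5)          # epochs to wait before stopping

# Assumes model, X, Y, X_val, y_val are already defined.
model.fit(X, Y, epochs=100,
          validation_data=(X_val, y_val),
          callbacks=[early_stop])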
5. TensorFlow for Classification: MNIST

The network in the original figure: the input 2D image is flattened to a 1D
vector, and dropout (with rate 0.2) is applied to the first hidden layer.
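
A minimal sketch of such a network; the flattening and 0.2 dropout follow the
figure, while the hidden-layer width of 128 is an illustrative assumption:

from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten

model = Sequential()
model.add(Flatten(input_shape=(28, 28)))    # flatten 2D image to 1D vector
model.add(Dense(128, activation='relu'))    # first hidden layer (width assumed)
model.add(Dropout(0.2))                     # dropout with rate 0.2
model.add(Dense(10, activation='softmax'))  # one output per digit class

model.compile(loss='sparse_categorical_crossentropy',
              optimizer='adam', metrics=['accuracy'])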
6. Keras main concepts
(1) Symbolic computing
Keras runs on top of Theano or TensorFlow, which are therefore known as the
back ends of Keras. Both Theano and TensorFlow are "symbolic" computation libraries.
(2) Tensor
Tensors can be thought of as natural extensions of vectors and matrices that
represent a wide range of data types. The order (rank) of a tensor is also
called its dimension.
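
A quick NumPy illustration of tensor orders:

import numpy as np

scalar = np.array(3.0)         # order 0: a single number
vector = np.array([1.0, 2.0])  # order 1
matrix = np.ones((2, 3))       # order 2
tensor3 = np.ones((2, 3, 4))   # order 3, e.g. a batch of matrices
print(scalar.ndim, vector.ndim, matrix.ndim, tensor3.ndim)  # 0 1 2 3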
(3) Model

Keras has two model types: the Sequential model, which is more widely used,
and the Functional model.

• Sequential model: single input, single output, a single path from top to
bottom with layer-to-layer adjacency and no cross-layer connections.

• Functional model: multiple inputs and multiple outputs, with arbitrary
connections between layers.
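
For contrast, a minimal sketch of a simple chain written with the Functional
model API (layer sizes are illustrative placeholders):

from keras.models import Model
from keras.layers import Input, Dense

inputs = Input(shape=(8,))                   # explicit input tensor
x = Dense(16, activation='relu')(inputs)    # layers are called on tensors
outputs = Dense(1, activation='sigmoid')(x)

# Layers form a graph, so branches and multiple inputs/outputs are
# possible; here the graph happens to be a simple chain.
model = Model(inputs=inputs, outputs=outputs)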
7. Coding Steps

1. Load Data.
2. Define Model.
3. Compile Model.
4. Fit Model.
5. Evaluate Model.
Load data
• Peptides as amino acid sequences
• Encode amino acids using BLOSUM: independent variable (X)
• Binding affinity: dependent variable (y)
• MNIST dataset
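
For the MNIST case, a minimal loading sketch using Keras's built-in dataset
helper:

from keras.datasets import mnist

# Downloads MNIST on first use; X arrays are 28x28 grayscale images,
# y arrays are integer digit labels 0-9.
(X_train, y_train), (X_test, y_test) = mnist.load_data()
X_train = X_train / 255.0  # scale pixel values to [0, 1]
X_test = X_test / 255.0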
Define model
from keras.models import Sequential
from keras.layers import Dense
import numpy as np
# fix random seed for reproducibility
np.random.seed(7)

# create model
model = Sequential()
model.add(Dense(12, input_dim=INPUT_DIMENSIONS, activation='relu'))
model.add(Dense(8, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
Compile model
# Compile model
model.compile(loss='binary_crossentropy',
              optimizer='adam', metrics=['accuracy'])
Fit Model
# Fit the model
model.fit(X, Y,
          epochs=EPOCHS,
          batch_size=BATCH_SIZE,
          validation_data=(X_val, y_val))
Evaluate model
# evaluate the model
scores = model.evaluate(X, Y)
print("\n%s: %.2f%%" % (model.metrics_names[1],
scores[1]*100))
Avoid overfitting
• Regularization
from keras.regularizers import l2
model.add(Dense(number_of_neurons, activation='relu',
                kernel_regularizer=l2(0.001)))

• Dropout
from keras.layers import Dropout
model.add(Dropout(0.2))

• Batch normalization
from keras.layers.normalization import BatchNormalization
model.add(BatchNormalization())
8. Keras coding example (regression model)
import numpy as np
np.random.seed(1337)
from keras.models import Sequential
from keras.layers import Dense
import matplotlib.pyplot as plt
Create data set
# Create data set
X = np.linspace(-1, 1, 200)
np.random.shuffle(X) # Randomize the data set
Y = 0.5 * X + 2 + np.random.normal(0, 0.05, (200, )) # Suppose the real model is : Y=0.5X+2
plt.scatter(X, Y)  # Draw the data set
plt.show()

X_train, Y_train = X[:160], Y[:160]  # Put the first 160 points into the training set
X_test, Y_test = X[160:], Y[160:]    # Put the last 40 points into the test set
Define a model
# Define a model
'''
Keras has two types of models, Sequential (Sequential) and functional,
and the more common is Sequential,
which is single-input-single-output
'''
model = Sequential()
'''
The model is added layer by layer through the add() method.
Dense is a full connection layer.
The first layer needs to define the input,
while the second layer does not need to specify the input
'''
model.add(Dense(units=1, input_dim=1))

'''
1. Training is required after defining the model,
but we need to specify some training parameters before training
2. Select the loss function and optimizer through the compile() method
3. Here, mean square error is used as the loss function,
and stochastic gradient descent is used as the optimization method
'''
model.compile(loss='mse', optimizer='sgd')
Train the model
# Train the model
print('Training -----------')
for step in range(301):
    # Keras has a number of functions to start training; here train_on_batch()
    cost = model.train_on_batch(X_train, Y_train)
    if step % 100 == 0:
        print('train cost: ', cost)
Test the model
# Test the model
print('\nTesting ------------')
cost = model.evaluate(X_test, Y_test, batch_size=40)
print('test cost:', cost)
'''
Check the trained network parameters.
Since our network has only one layer,
with a single input and a single output,
the first layer learns the model Y = W*X + b,
where W and b are the trained parameters.
'''
W, b = model.layers[0].get_weights()
print('Weights=', W, '\nbiases=', b)
Plot the prediction
# plotting the prediction
Y_pred = model.predict(X_test)
plt.scatter(X_test, Y_test)
plt.plot(X_test, Y_pred)
plt.show()
Full code
import numpy as np
np.random.seed(1337)
from keras.models import Sequential
from keras.layers import Dense
import matplotlib.pyplot as plt

# Create data set
X = np.linspace(-1, 1, 200)
np.random.shuffle(X)  # Randomize the data set
Y = 0.5 * X + 2 + np.random.normal(0, 0.05, (200, ))  # Suppose the real model is: Y = 0.5X + 2
plt.scatter(X, Y)  # Draw the data set
plt.show()

X_train, Y_train = X[:160], Y[:160]  # Put the first 160 points into the training set
X_test, Y_test = X[160:], Y[160:]    # Put the last 40 points into the test set

# Define a model
model = Sequential()
model.add(Dense(units=1, input_dim=1))
model.compile(loss='mse', optimizer='sgd')

# Train the model
print('Training -----------')
for step in range(301):
    # Keras has a number of functions to start training; here train_on_batch()
    cost = model.train_on_batch(X_train, Y_train)
    if step % 100 == 0:
        print('train cost: ', cost)

# Test the model
print('\nTesting ------------')
cost = model.evaluate(X_test, Y_test, batch_size=40)
print('test cost:', cost)
W, b = model.layers[0].get_weights()
print('Weights=', W, '\nbiases=', b)

# Plot the prediction
Y_pred = model.predict(X_test)
plt.scatter(X_test, Y_test)
plt.plot(X_test, Y_pred)
plt.show()
Regression model training results
Embeddings in Word Prediction
Code Example (1): The Embedding Layer in Keras
The Embedding layer turns positive integers (indexes) into dense vectors of fixed
size, e.g. [[4], [20]] -> [[0.25, 0.1], [0.6, -0.2]]. So, it is used to learn an
embedding from scratch.

import numpy as np
from keras.models import Sequential
from keras.layers import Embedding

model = Sequential()
model.add(Embedding(1000, 64, input_length=10))


# the model will take as input an integer matrix of size (batch, input_length).
# the largest integer (i.e. word index) in the input should be
# no larger than 999 (vocabulary size).
# now model.output_shape == (None, 10, 64), where None is the batch dimension.

input_array = np.random.randint(1000, size=(32, 10))

model.compile('rmsprop', 'mse')
output_array = model.predict(input_array)
assert output_array.shape == (32, 10, 64)

Arguments
•input_dim: int > 0. Size of the vocabulary, i.e. maximum integer index + 1.
•output_dim: int >= 0. Dimension of the dense embedding.
Code Example (2): Embedding in a FeedForward
Network for Text Classification
from tensorflow import keras  # assumed import; `encoder` comes from the dataset pipeline

model = keras.Sequential([
    keras.layers.Embedding(encoder.vocab_size, 16),
    keras.layers.GlobalAveragePooling1D(),
    keras.layers.Dense(16, activation='relu'),
    keras.layers.Dense(1, activation='sigmoid')])

1. The first layer is an Embedding layer. This layer takes the integer-encoded
vocabulary and looks up the embedding vector for each word-index. These vectors
are learned as the model trains. The vectors add a dimension to the output array.
The resulting dimensions are: (batch, sequence, embedding).

2. Next, a GlobalAveragePooling1D layer returns a fixed-length output vector for each
example by averaging over the sequence dimension. This allows the model to
handle input of variable length, in the simplest way possible.

3. This fixed-length output vector is piped through a fully-connected (Dense) layer with
16 hidden units.

4. The last layer is densely connected with a single output node. Using
the sigmoid activation function, this value is a float between 0 and 1, representing a
probability, or confidence level.
Code Example (3): Embedding in a RNN Network
for Text Classification
• With one Bidirectional layer
import tensorflow as tf  # assumed import; `encoder` comes from the dataset pipeline

model = tf.keras.Sequential([
tf.keras.layers.Embedding(encoder.vocab_size, 64),
tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64)),
tf.keras.layers.Dense(64, activation='relu'),
tf.keras.layers.Dense(1, activation='sigmoid')
])

• With stacked Bidirectional layers


model = tf.keras.Sequential([
tf.keras.layers.Embedding(encoder.vocab_size, 64),
tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64, return_sequences=True)),
tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(32)),
tf.keras.layers.Dense(64, activation='relu'),
tf.keras.layers.Dropout(0.5),
tf.keras.layers.Dense(1, activation='sigmoid')
])
Thank You
