
Deep Learning on Google Cloud

Colaboratory: Tesla GPU Based Free Cloud

https://colab.research.google.com
Cloud Configurations

# GPU count and name (SMI: System Management Interface)
!nvidia-smi -L
!nvidia-smi

# CPU model name
!lscpu | grep 'Model name'

# No. of sockets, i.e. available slots for physical processors
!lscpu | grep 'Socket(s):'

# No. of cores per processor
!lscpu | grep 'Core(s) per socket:'

# No. of threads per core
!lscpu | grep 'Thread(s) per core'

# L3 cache size
!lscpu | grep 'L3 cache'

# Processor clock speed
!lscpu | grep 'MHz'

# Usable memory
!cat /proc/meminfo | grep 'MemAvailable'

# Usable hard disk
!df -hT /
Configuration: GPU-Based Remote System

GPU: 1 x Tesla K80 (compute capability 3.7) with 2496 CUDA cores and 12 GB GDDR5 VRAM

CPU: 1 x single-core, hyper-threaded (1 core, 2 threads) Xeon processor @ 2.3 GHz (no Turbo Boost), 45 MB cache

RAM: ~12.6 GB available

Disk: ~320 GB available (OverlayFS, similar to a Live CD). Each programmer works in their own 320 GB space in isolation.

Disk space can be kept in sync with Google Drive: we can upload the dataset to Google Drive, and Google Colab will link Google Drive to run the implementations. All source code is automatically saved in Google Drive, with full security and no privacy issues.

Idle Time: 90 minutes


https://colab.research.google.com/drive/151805XTDg--dgHb3-AXJCpnWaqRhop_2#scrollTo=vEWe-FHNDY3E
Fetching System Details

# Fetch CPU, disk, and memory details from inside the notebook
from psutil import cpu_count, cpu_stats, virtual_memory

cpu_count()        # number of logical CPUs
cpu_stats()        # context switches, interrupts, etc.
virtual_memory()   # total / available RAM

!cat /proc/cpuinfo  # full CPU details
!df -h              # disk usage
Extraction of Data Files
# Install 7-Zip, decompress the .7z archive, then unpack the tar
!apt-get install p7zip-full
!p7zip -d file_name.tar.7z
!tar -xvf file_name.tar

# Upload files interactively from the local machine
from google.colab import files
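As a minimal sketch of the full upload-and-extract flow (the archive name dataset.tar.7z below is only a placeholder), the steps can be chained in a single Colab cell:

from google.colab import files

# Opens a file picker in the browser; returns a dict mapping each
# uploaded filename to its contents.
uploaded = files.upload()

# Decompress and unpack the placeholder archive uploaded above
!p7zip -d dataset.tar.7z
!tar -xvf dataset.tar
!ls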
Sync Google Drive
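One common way to sync is to mount Google Drive into the Colab file system; a minimal sketch (the listed folder is just the default mount point):

from google.colab import drive

# Mount Google Drive; Colab asks for an authorization code on first run
drive.mount('/content/drive')

# Datasets and saved notebooks/models can then be read and written here
!ls '/content/drive/My Drive'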
Deep Learning and Transfer Functions in Keras
1. An Activation Function (or Transfer Function) is used to determine the output of a node.

2. It determines the output of the neural network, e.g. Yes or No.

3. It maps the resulting values into a range such as 0 to 1 or -1 to 1 (depending upon the function).
Categories of Activation / Transfer Functions
• Linear Activation Function
• Non-linear Activation Functions

If no activation function is applied, the output signal is simply a linear function (a polynomial of degree one), while deep networks need to model complex relationships.

Without an activation function, the output remains the same kind of linear mapping in every iteration; with a non-linear activation function, the output can be progressively optimized in every next iteration.
Linear Activation Function
Equation: f(x) = x
Range: (-infinity, +infinity)

Not fit for the complexity or the many parameters of the real-world data that is fed to neural networks.

Images, for example, are encoded in the spatial domain rather than the frequency domain.
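To see why a purely linear activation is limiting, here is a small numpy sketch (with made-up weight matrices) showing that two stacked linear layers collapse into a single linear layer, so depth adds nothing without a non-linearity:

import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))        # a batch of 4 inputs with 8 features
W1 = rng.normal(size=(8, 12))      # first "layer" weights, no activation
W2 = rng.normal(size=(12, 3))      # second "layer" weights, no activation

two_layers = x @ W1 @ W2           # output of two stacked linear layers
one_layer = x @ (W1 @ W2)          # a single linear layer with combined weights

print(np.allclose(two_layers, one_layer))  # True: depth collapses to one linear map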
Non-linear Activation Functions
• The most widely used activation functions
• Make it easy for the model to adapt to and generalize over a variety of data and to differentiate between outputs

Key terminologies for understanding non-linear functions:

• Derivative or differential: the change along the y-axis with respect to the change along the x-axis (the slope).
• Monotonic function: a function which is either entirely non-increasing or entirely non-decreasing.

The non-linear activation functions are mainly distinguished by their range and the shape of their curves.
Activation Function: Sigmoid / Logistic

Used for finding probabilities: the output always lies between 0 and 1. For example, outputs x, y, z = 0.4, 0.7887, 0.3423 all fall in this range.

Along with Sigmoid, ReLU is used.

Vanishing gradient: after many iterations, the error hardly reduces any further, e.g.:

100 iterations:    0.000047453
1,000 iterations:  0.00004745199999
10,000 iterations: 0.0000474503822  => with ReLU => 0

(A small negative value such as -0.0232232332 is mapped by ReLU to 0.)

The optimizer is no longer able to improve by much; only a very small amount of error is reduced per iteration.

Why ReLU?
ReLU gives a big jump out of the vanishing-gradient regime.
Activation Function: Tanh (Hyperbolic Tangent)
• Mathematical formula: f(x) = (1 - exp(-2x)) / (1 + exp(-2x)).

• Its output is zero-centered because its range lies between -1 and 1, i.e. -1 < output < 1.

• Optimization is easier with zero-centered outputs, hence in practice tanh is generally preferred over the Sigmoid function.

The sigmoid and hyperbolic tangent activation functions cannot be used in networks with many layers due to the Vanishing Gradient Problem.
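A short numpy sketch of the two functions and their ranges (the sample inputs are arbitrary):

import numpy as np

def sigmoid(x):
    # Logistic function: squashes any real input into (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    # Hyperbolic tangent: squashes any real input into (-1, 1), zero-centered
    return (1.0 - np.exp(-2 * x)) / (1.0 + np.exp(-2 * x))

x = np.array([-3.0, -0.5, 0.0, 0.5, 3.0])
print(sigmoid(x))                         # values in (0, 1)
print(tanh(x))                            # values in (-1, 1)
print(np.allclose(tanh(x), np.tanh(x)))   # matches numpy's built-in tanh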
Need of Rectified Linear Unit (ReLU)
• Overcomes the vanishing gradient problem, allowing models to learn faster and perform better. (Suppose we are stuck at a tiny value such as 0.000000004223333; ReLU gives a big jump by cutting values below its threshold to 0, and the Keras ReLU layer lets you customize that cut-off via its threshold argument. A negative value such as -0.2374274272474 becomes 0.)

• ReLU is the default activation when developing MLPs and CNNs. The model takes less time to train and run, since at every iteration negative activations are simply cut to zero.

• Because ReLU outputs 0 for all negative inputs, a given unit may not activate at all, which yields sparsity (useful for missing data or data sparsity).

• The downside of being zero for all negative values is a problem called dying ReLU: a neuron dies when it remains inactive no matter what input is supplied, so no gradient flows through it.

• Leaky ReLU fixes this by giving negative inputs a small slope a, i.e. f(x) = ax for x < 0. The leak increases the range of the ReLU function; usually a is 0.01 or so. When a is not fixed at 0.01 but chosen randomly, it is called Randomized ReLU. The range of Leaky ReLU is therefore (-infinity, +infinity).
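A minimal numpy sketch of ReLU and Leaky ReLU (using the conventional slope a = 0.01 mentioned above):

import numpy as np

def relu(x):
    # max(0, x): negative inputs are cut to zero
    return np.maximum(0.0, x)

def leaky_relu(x, a=0.01):
    # keeps a small slope a for negative inputs instead of zeroing them
    return np.where(x >= 0, x, a * x)

x = np.array([-0.2374274272474, -0.0232232332, 0.0, 0.000000004223333, 1.5])
print(relu(x))        # [0.  0.  0.  4.223333e-09  1.5]
print(leaky_relu(x))  # negative entries become a * x instead of 0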
Avoidance of Vanishing Gradient with ReLU

In back-propagation, while calculating the gradients of the loss (error) with respect to the weights, the gradients tend to get smaller and smaller as we keep moving backward through the network. This means that the neurons in the earlier layers learn very slowly compared to the neurons in the later layers of the hierarchy; the earlier layers in the network are the slowest to train.
Using TensorFlow APIs in Keras
Dense implements the operation:

output = activation(dot(input, kernel) + bias)

• A dense layer is just a regular layer of neurons in a neural network.
• Each neuron receives input from all the neurons in the previous layer, and is thus densely connected.
• activation: element-wise activation function passed as the activation argument
• kernel: weights matrix created by the layer
• bias: bias vector created by the layer (only applicable if use_bias is True)

# Uploading Dynamic Files
from google.colab import files
uploaded = files.upload()

# Create MLP in Keras
from keras.models import Sequential
from keras.layers import Dense
import numpy

# fix random seed for reproducibility
numpy.random.seed(9)

# avoid overfitting
# load dataset of scores
dataset = numpy.loadtxt("scores.csv", delimiter=",")

# split into input (X) and output (Y) variables
X = dataset[:,0:8]
Y = dataset[:,8]

# create model
model = Sequential()
model.add(Dense(12, input_dim=8, activation='sigmoid'))
model.add(Dense(6, activation='relu'))
model.add(Dense(4, activation='sigmoid'))
model.add(Dense(2, activation='relu'))
model.add(Dense(1, activation='sigmoid'))

# Compile model
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
https://keras.io/activations/
# Fit the model
model.fit(X, Y, epochs=100, batch_size=10)

# Evaluate the model
scores = model.evaluate(X, Y)
print("\n%s: %.2f%%" % (model.metrics_names[1], scores[1]*100))

# Test Data
testdata = files.upload()
testdataset = numpy.loadtxt("testdata.csv", delimiter=",")
X2 = testdataset[:,0:8]
predictions = model.predict(X2)

# Round predictions
rounded = [round(x[0]) for x in predictions]
print(rounded)
Loss Functions
A loss function (also called an objective function or optimization score function) is one of the two parameters required to compile a model:

from keras import losses

model.compile(loss='mean_squared_error',
              optimizer='sgd')

• for binary_crossentropy: sigmoid activation, scalar target
• for categorical_crossentropy: softmax activation, one-hot encoded target
• If it is a multiclass problem, use categorical_crossentropy
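A minimal sketch of these pairings (the layer sizes and the 3-class setup are placeholders, not part of the original example):

from keras.models import Sequential
from keras.layers import Dense
from keras.utils import to_categorical
import numpy

# Binary case: sigmoid output, scalar 0/1 target, binary_crossentropy
binary_model = Sequential([Dense(8, input_dim=4, activation='relu'),
                           Dense(1, activation='sigmoid')])
binary_model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

# Multiclass case: softmax output, one-hot targets, categorical_crossentropy
multi_model = Sequential([Dense(8, input_dim=4, activation='relu'),
                          Dense(3, activation='softmax')])
multi_model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

# One-hot encode integer class labels 0..2 for the multiclass target
y_int = numpy.array([0, 2, 1, 1])
y_onehot = to_categorical(y_int, num_classes=3)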


Interpretation of Output

• Loss: a scalar value that we attempt to minimize during training of the model. The lower the loss, the closer our predictions are to the true labels.
• Both loss and val_loss should be decreasing, and accuracy (acc and val_acc) should be increasing.
• acc is the accuracy on the training set; val_acc measures how good the model's predictions are on the held-out validation data.
• Training loss is the average of the losses over each batch of training data.

A function that transforms the values, or states the conditions for the decision of the output neuron, is known as an activation function:
• Sigmoid (an in-between answer such as MAYBE, i.e. an intermediate prediction)
• Tanh
• Softmax
• and many others
Classification and Regression in Prediction

Classification -> the task of predicting a discrete class label (forecasting a target class)

Regression -> the task of predicting a continuous quantity (forecasting a value)
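A short Keras sketch of how the two tasks differ at the output layer (the layer sizes are arbitrary placeholders):

from keras.models import Sequential
from keras.layers import Dense

# Classification: one unit per class, softmax probabilities, crossentropy loss
clf = Sequential([Dense(16, input_dim=10, activation='relu'),
                  Dense(3, activation='softmax')])
clf.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

# Regression: a single linear output unit, mean squared error loss
reg = Sequential([Dense(16, input_dim=10, activation='relu'),
                  Dense(1)])
reg.compile(loss='mean_squared_error', optimizer='adam', metrics=['mae'])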
Metrics
A metric is a function that is used to judge the performance of your
model. Metric functions are to be supplied in the metrics parameter
when a model is compiled.

model.compile(loss='mean_squared_error',
optimizer='sgd',
metrics=['mae', 'acc'])

from keras import metrics

model.compile(loss='mean_squared_error',
optimizer='sgd',
metrics=[metrics.mae, metrics.categorical_accuracy])

A metric function is similar to a loss function, except that the results from evaluating a metric are not used when training the model.
Convolution Neural Networks vs Fully Connected
Neural Networks
• In a fully connected layer each neuron is connected to every neuron in the previous layer, and each connection has its own weight. This is a totally general-purpose connection pattern and makes no assumptions about the features in the data. It is also very expensive in terms of memory (weights) and computation (connections).
• In contrast, in a convolutional layer each neuron is only connected to a few nearby (aka
local) neurons in the previous layer, and the same set of weights (and local connection
layout) is used for every neuron. This connection pattern only makes sense for cases
where the data can be interpreted as spatial with the features to be extracted being
spatially local (hence local connections only OK) and equally likely to occur at any input
position (hence same weights at all positions OK). The typical use case for convolutional
layers is for image data where, as required, the features are local (e.g. a "nose" consists
of a set of nearby pixels, not spread all across the image), and equally likely to occur
anywhere (in general case, that nose might be anywhere in the image).
• The fewer number of connections and weights make convolutional layers relatively
cheap (vs full connect) in terms of memory and compute power needed.
• The name "convolutional" layer/network comes from the fact that the local connection
pattern and shared weight scheme can be interpreted as a filter (or set of filters) being
"convolved" with the input/image... But in plain English it's just a "locally connected
shared weight layer".
• To classify images, say of size 64x64x3, a fully connected layer needs 64 x 64 x 3 = 12288 weights per neuron in the first hidden layer.

• The number of weights per neuron is even bigger for images of size 225x225x3: 151875.

• Networks with a large number of parameters face several problems, e.g. slower training time and a higher chance of overfitting.

• The main functional difference of a convolutional neural network is that the main image matrix is reduced to a matrix of lower dimension in the first layer itself through an operation called Convolution.

• For example, an image of 64x64x3 can be reduced to 1x1x10, following which the subsequent operations are performed. A parameter-count comparison is sketched below.
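A minimal Keras sketch (layer sizes chosen only for illustration) comparing the parameter count of a fully connected first layer with a convolutional first layer on a 64x64x3 image:

from keras.models import Sequential
from keras.layers import Dense, Flatten, Conv2D

# Fully connected: every neuron sees all 64*64*3 = 12288 inputs
fc = Sequential([Flatten(input_shape=(64, 64, 3)),
                 Dense(10, activation='relu')])
fc.summary()   # Dense layer: 12288 * 10 + 10 = 122890 parameters

# Convolutional: a 3x3 filter shares its weights across all positions
cnn = Sequential([Conv2D(10, (3, 3), activation='relu', input_shape=(64, 64, 3))])
cnn.summary()  # Conv2D layer: 3 * 3 * 3 * 10 + 10 = 280 parameters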
Sequential Model
The Sequential model is a linear stack of layers. You can create a Sequential model by
passing a list of layer instances to the constructor:

from keras.models import Sequential
from keras.layers import Dense, Activation

model = Sequential([
Dense(32, input_shape=(784,)),
Activation('relu'),
Dense(10),
Activation('softmax'),
])

You can also simply add layers via the .add() method:
model = Sequential()
model.add(Dense(32, input_dim=784))
model.add(Activation('relu'))

https://keras.io/getting-started/sequential-model-guide/
Pre-Trained Models in Keras
• Classical models are available in Keras as Applications.

• They are trained on ImageNet (1.2 million images) covering 1000 classes.

• Each is available in two parts: the model architecture and the model weights.

• Model architectures are downloaded during Keras installation.

• Model weights (large files) are downloaded when a model is instantiated.

Xception, VGG16, VGG19, ResNet, ResNetV2, ResNeXt, InceptionV3, InceptionResNetV2, MobileNet, MobileNetV2, DenseNet, NASNet
Network            Depth (Layers)   Size      Parameters (Millions)   Image Input Size
alexnet            8                227 MB    61.0                    227-by-227
vgg16              16               515 MB    138                     224-by-224
vgg19              19               535 MB    144                     224-by-224
squeezenet         18               4.6 MB    1.24                    227-by-227
googlenet          22               27 MB     7.0                     224-by-224
inceptionv3        48               89 MB     23.9                    299-by-299
densenet201        201              77 MB     20.0                    224-by-224
mobilenetv2        53               13 MB     3.5                     224-by-224
resnet18           18               44 MB     11.7                    224-by-224
resnet50           50               96 MB     25.6                    224-by-224
resnet101          101              167 MB    44.6                    224-by-224
xception           71               85 MB     22.9                    299-by-299
inceptionresnetv2  164              209 MB    55.9                    299-by-299
shufflenet         50               6.3 MB    1.4                     224-by-224
nasnetmobile       *                20 MB     5.3                     224-by-224
nasnetlarge        *                360 MB    88.9                    331-by-331
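A minimal sketch of instantiating one of these pre-trained models in Keras (the image file elephant.jpg is just a placeholder):

from keras.applications.vgg16 import VGG16, preprocess_input, decode_predictions
from keras.preprocessing import image
import numpy as np

# Download the architecture plus ImageNet weights (a large file on first use)
model = VGG16(weights='imagenet')

# Load and preprocess a 224x224 image for VGG16
img = image.load_img('elephant.jpg', target_size=(224, 224))
x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))

# Predict and decode the top-3 ImageNet classes
preds = model.predict(x)
print(decode_predictions(preds, top=3)[0])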
