
Deep Learning on Google Cloud

Colaboratory: Tesla GPU Based Free Cloud

https://colab.research.google.com
Cloud Configurations

# GPU count and name (SMI: System Management Interface)
!nvidia-smi -L
!nvidia-smi

# CPU model name
!lscpu | grep 'Model name'

# No. of sockets, i.e. available slots for physical processors
!lscpu | grep 'Socket(s):'

# No. of cores per processor
!lscpu | grep 'Core(s) per socket:'

# No. of threads per core
!lscpu | grep 'Thread(s) per core'

# L3 cache size
!lscpu | grep 'L3 cache'

# Processor clock speed
!lscpu | grep 'MHz'

# Usable memory
!cat /proc/meminfo | grep 'MemAvailable'

# Usable hard disk
!df -hT /
Configuration: GPU-Based Remote System

GPU: 1 x Tesla K80 (compute capability 3.7) with 2496 CUDA cores and 12 GB GDDR5 VRAM

CPU: 1 x single-core, hyper-threaded (1 core, 2 threads) Xeon processor @ 2.3 GHz (no Turbo Boost), 45 MB cache

RAM: ~12.6 GB available

Disk: ~320 GB available (OverlayFS, similar to a Live CD). Each programmer works in their own 320 GB space in isolation.

Disk space can be kept in sync with Google Drive: we can upload the dataset to Google Drive, and Google Colab will link Google Drive to run the implementations. All source code is automatically saved in Google Drive, with full security and no privacy issues.

Idle Time: 90 minutes


https://colab.research.google.com/drive/151805XTDg--dgHb3-AXJCpnWaqRhop_2#scrollTo=vEWe-FHNDY3E
Fetching System Details

# Fetch CPU, disk, and memory details from inside the notebook
from psutil import cpu_count, cpu_stats, virtual_memory

cpu_count()        # number of logical CPUs
cpu_stats()        # context switches, interrupts, etc.
virtual_memory()   # total / available RAM

!cat /proc/cpuinfo  # full CPU details
!df -h              # disk usage
Extraction of Data Files
# Install 7-Zip, decompress the .7z archive, then unpack the tar
!apt-get install p7zip-full
!p7zip -d file_name.tar.7z
!tar -xvf file_name.tar

# Upload files interactively from the local machine
from google.colab import files
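As a minimal sketch of the full upload-and-extract flow (the archive name dataset.tar.7z below is only a placeholder), the steps can be chained in a single Colab cell:

from google.colab import files

# Opens a file picker in the browser; returns a dict mapping each
# uploaded filename to its contents.
uploaded = files.upload()

# Decompress and unpack the placeholder archive uploaded above
!p7zip -d dataset.tar.7z
!tar -xvf dataset.tar
!ls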
Sync Google Drive
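One common way to sync is to mount Google Drive into the Colab file system; a minimal sketch (the listed folder is just the default mount point):

from google.colab import drive

# Mount Google Drive; Colab asks for an authorization code on first run
drive.mount('/content/drive')

# Datasets and saved notebooks/models can then be read and written here
!ls '/content/drive/My Drive'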
Deep Learning and Transfer Functions in Keras
1. An Activation Function (or Transfer Function) is used to determine the output of a node.

2. It determines the output of the neural network, e.g. Yes or No.

3. It maps the resulting values into a range such as 0 to 1 or -1 to 1 (depending upon the function).
Categories of Activation / Transfer Functions
• Linear Activation Function
• Non-linear Activation Functions

If no activation function is applied, the output signal is simply a linear function (a polynomial of degree one), while deep networks need to model complex relationships.

Without an activation function, the output remains the same kind of linear mapping in every iteration; with a non-linear activation function, the output can be progressively optimized in every next iteration.
Linear Activation Function
Equation: f(x) = x
Range: (-infinity, +infinity)

Not fit for the complexity or the many parameters of the real-world data that is fed to neural networks.

Images, for example, are encoded in the spatial domain rather than the frequency domain.
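To see why a purely linear activation is limiting, here is a small numpy sketch (with made-up weight matrices) showing that two stacked linear layers collapse into a single linear layer, so depth adds nothing without a non-linearity:

import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))        # a batch of 4 inputs with 8 features
W1 = rng.normal(size=(8, 12))      # first "layer" weights, no activation
W2 = rng.normal(size=(12, 3))      # second "layer" weights, no activation

two_layers = x @ W1 @ W2           # output of two stacked linear layers
one_layer = x @ (W1 @ W2)          # a single linear layer with combined weights

print(np.allclose(two_layers, one_layer))  # True: depth collapses to one linear map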
Non-linear Activation Functions
• The most widely used activation functions
• Make it easy for the model to adapt to and generalize over a variety of data and to differentiate between outputs

Key terminologies for understanding non-linear functions:

• Derivative or differential: the change along the y-axis with respect to the change along the x-axis (the slope).
• Monotonic function: a function which is either entirely non-increasing or entirely non-decreasing.

The non-linear activation functions are mainly distinguished by their range and the shape of their curves.
Activation Function: Sigmoid / Logistic

Used for finding probabilities: the output always lies between 0 and 1. For example, outputs x, y, z = 0.4, 0.7887, 0.3423 all fall in this range.

Along with Sigmoid, ReLU is used.

Vanishing gradient: after many iterations, the error hardly reduces any further, e.g.:

100 iterations:    0.000047453
1,000 iterations:  0.00004745199999
10,000 iterations: 0.0000474503822  => with ReLU => 0

(A small negative value such as -0.0232232332 is mapped by ReLU to 0.)

The optimizer is no longer able to improve by much; only a very small amount of error is reduced per iteration.

Why ReLU?
ReLU gives a big jump out of the vanishing-gradient regime.
Activation Function: Tanh (Hyperbolic Tangent)
• Mathematical formula: f(x) = (1 - exp(-2x)) / (1 + exp(-2x)).

• Its output is zero-centered because its range lies between -1 and 1, i.e. -1 < output < 1.

• Optimization is easier with zero-centered outputs, hence in practice tanh is generally preferred over the Sigmoid function.

The sigmoid and hyperbolic tangent activation functions cannot be used in networks with many layers due to the Vanishing Gradient Problem.
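A short numpy sketch of the two functions and their ranges (the sample inputs are arbitrary):

import numpy as np

def sigmoid(x):
    # Logistic function: squashes any real input into (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    # Hyperbolic tangent: squashes any real input into (-1, 1), zero-centered
    return (1.0 - np.exp(-2 * x)) / (1.0 + np.exp(-2 * x))

x = np.array([-3.0, -0.5, 0.0, 0.5, 3.0])
print(sigmoid(x))                         # values in (0, 1)
print(tanh(x))                            # values in (-1, 1)
print(np.allclose(tanh(x), np.tanh(x)))   # matches numpy's built-in tanh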
Need of Rectified Linear Unit (ReLU)
• Overcomes the vanishing gradient problem, allowing models to learn faster and perform better. (Suppose we are stuck at a tiny value such as 0.000000004223333; ReLU gives a big jump by cutting values below its threshold to 0, and the Keras ReLU layer lets you customize that cut-off via its threshold argument. A negative value such as -0.2374274272474 becomes 0.)

• ReLU is the default activation when developing MLPs and CNNs. The model takes less time to train and run, since at every iteration negative activations are simply cut to zero.

• Because ReLU outputs 0 for all negative inputs, a given unit may not activate at all, which yields sparsity (useful for missing data or data sparsity).

• The downside of being zero for all negative values is a problem called dying ReLU: a neuron dies when it remains inactive no matter what input is supplied, so no gradient flows through it.

• Leaky ReLU fixes this by giving negative inputs a small slope a, i.e. f(x) = ax for x < 0. The leak increases the range of the ReLU function; usually a is 0.01 or so. When a is not fixed at 0.01 but chosen randomly, it is called Randomized ReLU. The range of Leaky ReLU is therefore (-infinity, +infinity).
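A minimal numpy sketch of ReLU and Leaky ReLU (using the conventional slope a = 0.01 mentioned above):

import numpy as np

def relu(x):
    # max(0, x): negative inputs are cut to zero
    return np.maximum(0.0, x)

def leaky_relu(x, a=0.01):
    # keeps a small slope a for negative inputs instead of zeroing them
    return np.where(x >= 0, x, a * x)

x = np.array([-0.2374274272474, -0.0232232332, 0.0, 0.000000004223333, 1.5])
print(relu(x))        # [0.  0.  0.  4.223333e-09  1.5]
print(leaky_relu(x))  # negative entries become a * x instead of 0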
Avoidance of Vanishing Gradient with ReLU

In back-propagation, while calculating the gradients of the loss (error) with respect to the weights, the gradients tend to get smaller and smaller as we keep moving backward through the network. This means that the neurons in the earlier layers learn very slowly compared to the neurons in the later layers of the hierarchy; the earlier layers in the network are the slowest to train.
Using TensorFlow APIs in Keras
Dense implements the operation:

output = activation(dot(input, kernel) + bias)

• A dense layer is just a regular layer of neurons in a neural network.
• Each neuron receives input from all the neurons in the previous layer, and is thus densely connected.
• activation: element-wise activation function passed as the activation argument
• kernel: weights matrix created by the layer
• bias: bias vector created by the layer (only applicable if use_bias is True)

# Uploading Dynamic Files
from google.colab import files
uploaded = files.upload()

# Create MLP in Keras
from keras.models import Sequential
from keras.layers import Dense
import numpy

# fix random seed for reproducibility
numpy.random.seed(9)

# avoid overfitting
# load dataset of scores
dataset = numpy.loadtxt("scores.csv", delimiter=",")

# split into input (X) and output (Y) variables
X = dataset[:,0:8]
Y = dataset[:,8]

# create model
model = Sequential()
model.add(Dense(12, input_dim=8, activation='sigmoid'))
model.add(Dense(6, activation='relu'))
model.add(Dense(4, activation='sigmoid'))
model.add(Dense(2, activation='relu'))
model.add(Dense(1, activation='sigmoid'))

# Compile model
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
https://keras.io/activations/
# Fit the model
model.fit(X, Y, epochs=100, batch_size=10)

# Evaluate the model
scores = model.evaluate(X, Y)
print("\n%s: %.2f%%" % (model.metrics_names[1], scores[1]*100))

# Test Data
testdata = files.upload()
testdataset = numpy.loadtxt("testdata.csv", delimiter=",")
X2 = testdataset[:,0:8]
predictions = model.predict(X2)

# Round predictions
rounded = [round(x[0]) for x in predictions]
print(rounded)
Loss Functions
A loss function (also called an objective function or optimization score function) is one of the two parameters required to compile a model:

from keras import losses

model.compile(loss='mean_squared_error',
              optimizer='sgd')

• for binary_crossentropy: sigmoid activation, scalar target
• for categorical_crossentropy: softmax activation, one-hot encoded target
• If it is a multiclass problem, use categorical_crossentropy
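A minimal sketch of these pairings (the layer sizes and the 3-class setup are placeholders, not part of the original example):

from keras.models import Sequential
from keras.layers import Dense
from keras.utils import to_categorical
import numpy

# Binary case: sigmoid output, scalar 0/1 target, binary_crossentropy
binary_model = Sequential([Dense(8, input_dim=4, activation='relu'),
                           Dense(1, activation='sigmoid')])
binary_model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

# Multiclass case: softmax output, one-hot targets, categorical_crossentropy
multi_model = Sequential([Dense(8, input_dim=4, activation='relu'),
                          Dense(3, activation='softmax')])
multi_model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

# One-hot encode integer class labels 0..2 for the multiclass target
y_int = numpy.array([0, 2, 1, 1])
y_onehot = to_categorical(y_int, num_classes=3)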


Interpretation of Output

• Loss: a scalar value that we attempt to minimize during training of the model. The lower the loss, the closer our predictions are to the true labels.
• Both loss and val_loss should be decreasing, and accuracy (acc and val_acc) should be increasing.
• acc is the accuracy on the training set; val_acc measures how good the model's predictions are on the held-out validation data.
• Training loss is the average of the losses over each batch of training data.

A function that transforms the values, or states the conditions for the decision of the output neuron, is known as an activation function:
• Sigmoid (an in-between answer such as MAYBE, i.e. an intermediate prediction)
• Tanh
• Softmax
• and many others
Classification and Regression in Prediction

Classification -> the task of predicting a discrete class label (forecasting a target class)

Regression -> the task of predicting a continuous quantity (forecasting a value)
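A short Keras sketch of how the two tasks differ at the output layer (the layer sizes are arbitrary placeholders):

from keras.models import Sequential
from keras.layers import Dense

# Classification: one unit per class, softmax probabilities, crossentropy loss
clf = Sequential([Dense(16, input_dim=10, activation='relu'),
                  Dense(3, activation='softmax')])
clf.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

# Regression: a single linear output unit, mean squared error loss
reg = Sequential([Dense(16, input_dim=10, activation='relu'),
                  Dense(1)])
reg.compile(loss='mean_squared_error', optimizer='adam', metrics=['mae'])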
Metrics
A metric is a function that is used to judge the performance of your
model. Metric functions are to be supplied in the metrics parameter
when a model is compiled.

model.compile(loss='mean_squared_error',
optimizer='sgd',
metrics=['mae', 'acc'])

from keras import metrics

model.compile(loss='mean_squared_error',
optimizer='sgd',
metrics=[metrics.mae, metrics.categorical_accuracy])

A metric function is similar to a loss function, except that the results from evaluating a metric are not used when training the model.
Convolution Neural Networks vs Fully Connected
Neural Networks
• In a fully connected layer each neuron is connected to every neuron in the previous layer, and each connection has its own weight. This is a totally general-purpose connection pattern and makes no assumptions about the features in the data. It is also very expensive in terms of memory (weights) and computation (connections).
• In contrast, in a convolutional layer each neuron is only connected to a few nearby (aka
local) neurons in the previous layer, and the same set of weights (and local connection
layout) is used for every neuron. This connection pattern only makes sense for cases
where the data can be interpreted as spatial with the features to be extracted being
spatially local (hence local connections only OK) and equally likely to occur at any input
position (hence same weights at all positions OK). The typical use case for convolutional
layers is for image data where, as required, the features are local (e.g. a "nose" consists
of a set of nearby pixels, not spread all across the image), and equally likely to occur
anywhere (in general case, that nose might be anywhere in the image).
• The fewer number of connections and weights make convolutional layers relatively
cheap (vs full connect) in terms of memory and compute power needed.
• The name "convolutional" layer/network comes from the fact that the local connection
pattern and shared weight scheme can be interpreted as a filter (or set of filters) being
"convolved" with the input/image... But in plain English it's just a "locally connected
shared weight layer".
• To classify images, say of size 64x64x3, a fully connected layer needs 64 x 64 x 3 = 12288 weights per neuron in the first hidden layer.

• The number of weights per neuron is even bigger for images of size 225x225x3: 151875.

• Networks with a large number of parameters face several problems, e.g. slower training time and a higher chance of overfitting.

• The main functional difference of a convolutional neural network is that the main image matrix is reduced to a matrix of lower dimension in the first layer itself through an operation called Convolution.

• For example, an image of 64x64x3 can be reduced to 1x1x10, following which the subsequent operations are performed. A parameter-count comparison is sketched below.
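A minimal Keras sketch (layer sizes chosen only for illustration) comparing the parameter count of a fully connected first layer with a convolutional first layer on a 64x64x3 image:

from keras.models import Sequential
from keras.layers import Dense, Flatten, Conv2D

# Fully connected: every neuron sees all 64*64*3 = 12288 inputs
fc = Sequential([Flatten(input_shape=(64, 64, 3)),
                 Dense(10, activation='relu')])
fc.summary()   # Dense layer: 12288 * 10 + 10 = 122890 parameters

# Convolutional: a 3x3 filter shares its weights across all positions
cnn = Sequential([Conv2D(10, (3, 3), activation='relu', input_shape=(64, 64, 3))])
cnn.summary()  # Conv2D layer: 3 * 3 * 3 * 10 + 10 = 280 parameters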
Sequential Model
The Sequential model is a linear stack of layers. You can create a Sequential model by
passing a list of layer instances to the constructor:

from keras.models import Sequential
from keras.layers import Dense, Activation

model = Sequential([
Dense(32, input_shape=(784,)),
Activation('relu'),
Dense(10),
Activation('softmax'),
])

You can also simply add layers via the .add() method:
model = Sequential()
model.add(Dense(32, input_dim=784))
model.add(Activation('relu'))

https://keras.io/getting-started/sequential-model-guide/
Pre-Trained Models in Keras
• Classical models are available in Keras as Applications.

• They are trained on ImageNet (1.2 million images) covering 1000 classes.

• Each is available in two parts: the model architecture and the model weights.

• Model architectures are downloaded during Keras installation.

• Model weights (large files) are downloaded when a model is instantiated.

Xception, VGG16, VGG19, ResNet, ResNetV2, ResNeXt, InceptionV3, InceptionResNetV2, MobileNet, MobileNetV2, DenseNet, NASNet
Network            Depth (Layers)   Size      Parameters (Millions)   Image Input Size
alexnet            8                227 MB    61.0                    227-by-227
vgg16              16               515 MB    138                     224-by-224
vgg19              19               535 MB    144                     224-by-224
squeezenet         18               4.6 MB    1.24                    227-by-227
googlenet          22               27 MB     7.0                     224-by-224
inceptionv3        48               89 MB     23.9                    299-by-299
densenet201        201              77 MB     20.0                    224-by-224
mobilenetv2        53               13 MB     3.5                     224-by-224
resnet18           18               44 MB     11.7                    224-by-224
resnet50           50               96 MB     25.6                    224-by-224
resnet101          101              167 MB    44.6                    224-by-224
xception           71               85 MB     22.9                    299-by-299
inceptionresnetv2  164              209 MB    55.9                    299-by-299
shufflenet         50               6.3 MB    1.4                     224-by-224
nasnetmobile       *                20 MB     5.3                     224-by-224
nasnetlarge        *                360 MB    88.9                    331-by-331
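A minimal sketch of instantiating one of these pre-trained models in Keras (the image file elephant.jpg is just a placeholder):

from keras.applications.vgg16 import VGG16, preprocess_input, decode_predictions
from keras.preprocessing import image
import numpy as np

# Download the architecture plus ImageNet weights (a large file on first use)
model = VGG16(weights='imagenet')

# Load and preprocess a 224x224 image for VGG16
img = image.load_img('elephant.jpg', target_size=(224, 224))
x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))

# Predict and decode the top-3 ImageNet classes
preds = model.predict(x)
print(decode_predictions(preds, top=3)[0])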
