
Apache Singa: A General Distributed Deep Learning Platform
Md Johirul Islam
Department of Computer Science
Iowa State University
mislam@iastate.edu

March 3, 2016

Overview
System Architecture
Distributed Training Framework
NeuralNet
Training
Summary

Overview

SINGA is a general distributed deep learning platform for training big deep learning models over large datasets.
It is designed with an intuitive programming model based on the layer abstraction.
SINGA is integrated with Mesos, so that distributed training can be started as a Mesos framework.
SINGA can run on top of a distributed storage system to achieve scalability; the current version supports HDFS.


Two Goals:

Scalability: reduce the total training time needed to reach a given accuracy by using more computing resources.
Easy-to-use programming model: users can implement their deep learning models/algorithms without much awareness of the underlying distributed system.


Work Flow

The training goal is to find the optimal parameters of the transformation functions, so that they generate good features for specific tasks.
The parameters are randomly initialized and then updated iteratively using the SGD algorithm.
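As a rough, framework-independent sketch (not SINGA's actual API), one vanilla SGD step simply moves each parameter against its gradient:

#include <cstddef>
#include <vector>

// Minimal sketch of one vanilla SGD step: w = w - lr * grad.
// The parameter and gradient containers are illustrative, not SINGA types.
void SgdStep(std::vector<float>& params,
             const std::vector<float>& grads,
             float lr) {
  for (std::size_t i = 0; i < params.size(); ++i) {
    params[i] -= lr * grads[i];
  }
}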


The training workload is distributed over the workers and servers.
In each iteration, every worker calls the TrainOneBatch function to compute parameter gradients.
TrainOneBatch takes a NeuralNet object representing the neural network and visits its layers in a certain order.
The resulting gradients are aggregated by the local stub, which forwards them to the corresponding servers for updating.
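The following toy C++ sketch (illustrative only, not SINGA's real Worker, Server, or stub classes) shows the shape of one such iteration: workers produce gradients, the stub sums them, and the server applies the update:

#include <map>
#include <string>
#include <vector>

// Toy sketch of one training iteration, independent of SINGA's real classes.
// Gradients are keyed by parameter name; the "stub" simply sums local
// gradients before handing them to the "server" for the update.
using Grads = std::map<std::string, float>;

struct ToyWorker {
  // Stand-in for TrainOneBatch: returns gradients for the local layers.
  Grads TrainOneBatch() { return {{"w", 0.1f}, {"b", -0.05f}}; }
};

struct ToyServer {
  std::map<std::string, float> params{{"w", 1.0f}, {"b", 0.0f}};
  void Update(const Grads& g, float lr) {
    for (const auto& kv : g) params[kv.first] -= lr * kv.second;
  }
};

void RunIteration(std::vector<ToyWorker>& workers, ToyServer& server) {
  Grads aggregated;                       // the stub's aggregation buffer
  for (auto& w : workers) {
    for (const auto& kv : w.TrainOneBatch()) aggregated[kv.first] += kv.second;
  }
  server.Update(aggregated, 0.01f);       // server applies the update
}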

System Architecture

Logical Architecture

Worker Group

A worker group is made up of one or more workers.
Each worker group trains a complete model replica against a partition of the dataset.
The workers compute the parameter gradients.
A worker group communicates with only one server group.
Worker groups communicate with their server groups asynchronously.
Workers inside a worker group communicate synchronously.

Server Group

A server group is made up of a number of servers; each server manages a partition of the model parameters.
Servers handle get/update requests from workers.
Neighboring server groups synchronize with each other from time to time.

Parallelism

Model parallelism: each worker computes a subset of the parameters against all data partitioned to its group. Configured by setting partition_dim to 0 in the layer/NeuralNet configuration.
Data parallelism: each worker computes all parameters against a subset of the data. Configured by setting partition_dim to 1 in the layer/NeuralNet configuration.
Hybrid parallelism: a combination of both.
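As a toy illustration of what partition_dim means (independent of SINGA's actual partitioning code), splitting a 2-D feature blob along dimension 0 or dimension 1 yields the per-worker partitions:

#include <cstddef>
#include <vector>

// Toy 2-D blob partitioning to illustrate partition_dim.
// partition_dim == 0 splits along rows, partition_dim == 1 along columns;
// each partition would then be handled by a different worker.
struct Blob2D {
  std::size_t rows, cols;
};

std::vector<Blob2D> Partition(const Blob2D& b, int partition_dim, std::size_t n) {
  std::vector<Blob2D> parts;
  for (std::size_t i = 0; i < n; ++i) {
    if (partition_dim == 0) parts.push_back({b.rows / n, b.cols});
    else                    parts.push_back({b.rows, b.cols / n});
  }
  return parts;
}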


Figure: Hybrid Parallelism.

Communication

In SINGA, workers and servers run in separate threads.
Several workers and servers can reside in the same process.
Each process has a main thread that acts as the stub.
The communication between threads happens through messages.
The stub aggregates the local messages and forwards them to the appropriate threads.


The SINGA communication library consists of two components:
Message
Socket


The message header contains the sender and receiver IDs.
Each ID comprises a group ID and a worker/server ID.
The stub forwards messages by looking up these IDs in its address table.
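A toy sketch of this routing (the message and stub types below are illustrative, not SINGA's actual classes):

#include <functional>
#include <map>
#include <string>
#include <utility>

// Toy message routing: an ID is (group id, worker/server id) and the stub
// forwards a message by looking the destination ID up in its address table.
using Id = std::pair<int, int>;  // {group id, worker or server id}

struct ToyMsg {
  Id src, dst;
  std::string payload;
};

struct ToyStub {
  // Address table: destination ID -> handler (e.g. a local thread's queue).
  std::map<Id, std::function<void(const ToyMsg&)>> address_table;

  void Forward(const ToyMsg& m) {
    auto it = address_table.find(m.dst);
    if (it != address_table.end()) it->second(m);  // deliver locally
    // otherwise the real stub would forward the message to a remote process
  }
};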


Sockets

There are two types of sockets: the Dealer socket and the Router socket.
The communication between dealers and routers is asynchronous.
The basic functions of a socket are to send and receive messages.


Poller


The Poller class provides asynchronous communication between the dealers and the routers.
One can register a set of SocketInterface objects with a Poller instance by calling its add method, and then call the wait method to wait for a registered SocketInterface to be ready for sending or receiving messages.
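A toy sketch of that add/wait pattern (the SocketLike interface and busy-wait loop are illustrative simplifications, not SINGA's SocketInterface or Poller implementation):

#include <vector>

// Toy sketch of the poller pattern: register sockets with add(), then call
// wait() to get a socket that is ready to send or receive.
struct SocketLike {
  virtual bool ready() const = 0;
  virtual ~SocketLike() = default;
};

struct ToyPoller {
  std::vector<SocketLike*> sockets;
  void add(SocketLike* s) { sockets.push_back(s); }

  // Busy-wait until some registered socket is ready; a real poller would
  // block on the underlying file descriptors instead.
  SocketLike* wait() {
    for (;;) {
      for (auto* s : sockets) {
        if (s->ready()) return s;
      }
    }
  }
};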


In SINGA, a Dealer socket can connect to only one Router socket.
The connection is set up by connecting the dealer socket to the endpoint of the router socket.
A Router socket can connect to one or more Dealer sockets.
Upon receiving a message, the router forwards it to the appropriate dealer according to the receiver ID of the message.

Distributed Training Framework

The SINGA cluster topology supports different distributed training frameworks.
The cluster topology is configured in the cluster field of JobProto.
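A toy mirror of such a cluster configuration is sketched below; the field names are assumptions for illustration and may not match the actual ClusterProto fields:

// Toy mirror of a cluster configuration; the field names are illustrative
// assumptions, not necessarily the actual ClusterProto fields.
struct ToyClusterConf {
  int nworker_groups     = 1;  // number of worker groups
  int nworkers_per_group = 1;  // workers in each group
  int nserver_groups     = 1;  // number of server groups
  int nservers_per_group = 1;  // servers in each group
};

// Example: two worker groups of two workers each, sharing one server group.
ToyClusterConf ExampleCluster() {
  ToyClusterConf c;
  c.nworker_groups = 2;
  c.nworkers_per_group = 2;
  c.nserver_groups = 1;
  c.nservers_per_group = 2;
  return c;
}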

Types of Topology

SandBlaster

This is a synchronous framework used by Google Brain.
A single server group is launched to handle all requests from the workers.
A worker computes on its partition of the model and only communicates with the servers handling the related parameters.


Figure: SandBlaster topology


AllReduce

This is a synchronous framework used by Baidu's DeepImage.
Each worker is bound to a server on the same node, so that each node is responsible for maintaining a partition of the parameters and collecting updates from all other nodes.


Figure: AllReduce topology


Downpour

This is an asynchronous framework used by Google Brain.

Figure: Downpour topology


Distributed Hogwild

This is an asynchronous framework used by Caffe (the deep learning framework from BVLC).
Each node contains a complete server group and a complete worker group. Parameter updates are done locally, so that the communication cost during each training step is minimized. However, the server groups must periodically synchronize with neighboring groups to improve training convergence.
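The sketch below suggests how group counts could roughly express the four topologies above; the struct, field names, and numbers are illustrative assumptions, not actual SINGA configuration values:

// Toy illustration of how group counts map to the four topologies above.
struct ToyTopologyConf {
  int nworker_groups, nworkers_per_group;
  int nserver_groups, nservers_per_group;
  bool servers_share_worker_nodes;  // true if each node hosts a worker and a server
};

enum class Topology { kSandBlaster, kAllReduce, kDownpour, kHogwild };

ToyTopologyConf Configure(Topology t, int nodes) {
  switch (t) {
    case Topology::kSandBlaster:  // synchronous: one worker group, dedicated server group
      return {1, nodes, 1, nodes, false};
    case Topology::kAllReduce:    // synchronous: worker and server bound on each node
      return {1, nodes, 1, nodes, true};
    case Topology::kDownpour:     // asynchronous: many worker groups, shared servers
      return {nodes, 1, 1, nodes, false};
    case Topology::kHogwild:      // asynchronous: a full worker + server group per node
      return {nodes, 1, nodes, 1, true};
  }
  return {1, 1, 1, 1, false};
}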


Figure: Distributed Hogwild

NeuralNet

NeuralNet represents a user's neural network model.
The neural network has to be converted into a NeuralNet configuration.
Users configure a NeuralNet by listing all layers of the neural net and specifying each layer's source layer names.
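A toy mirror of such a configuration (the layer type strings are illustrative, not SINGA's registered layer names):

#include <string>
#include <vector>

// Toy mirror of a NeuralNet configuration: every layer is declared with a
// name, a type, and the names of its source layers.
struct ToyLayerConf {
  std::string name;
  std::string type;
  std::vector<std::string> srclayers;
};

// A three-layer MLP: data -> hidden (inner product) -> softmax loss.
std::vector<ToyLayerConf> SimpleMlp() {
  return {
      {"data",   "input",        {}},
      {"hidden", "innerproduct", {"data"}},
      {"loss",   "softmaxloss",  {"hidden"}},
  };
}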

Types of Neural Network

Feed Forward

Feed-forward nets do not have any cycles.
Examples: MLP, CNN.


Figure: A Simple MLP


Energy Models

In energy models the connections are undirected.
To convert these models into a NeuralNet, each undirected connection is replaced with two directed connections.


RNN Models

For recurrent neural networks, the first step is to unroll the recurrent layer.

Layer

The Layer is the core abstraction in SINGA.
A layer performs a variety of feature transformations to obtain high-level features.


Built-in Layers

Input layers: load data from HDFS, disk, or the network into memory.
Neuron layers: perform feature transformations, e.g. convolution, pooling, dropout.
Loss layers: measure the training objective loss, e.g. cross-entropy loss, Euclidean loss.
Output layers: write prediction outputs to disk, HDFS, etc.
Connection layers: connect partitions when the NeuralNet is partitioned.


Input Layers

An input layer is a base layer for loading data from a data store.
It has different subclasses such as SingleLabelRecordLayer, RecordInputLayer, CSVInputLayer, ImagePreprocessLayer, and many others.


Output Layers

An output layer gets data from its source layer and converts it into records of type RecordProto. Records are written as (key, value) tuples into a Store.


Neuron Layers

Neuron layers perform feature transformations.
Example: ConvolutionLayer conducts the convolution transformation.


Loss Layers

Loss layers measure the training objective loss.


Connection Layers

ConcateLayer: connects more than one source layer to concatenate their feature blobs along a given dimension.
SliceLayer: connects to more than one destination layer to slice its feature blob along a given dimension.
SplitLayer: connects to more than one destination layer to replicate its feature blob.
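A toy sketch of what concatenating and slicing a feature blob along dimension 0 means (plain C++ matrices, not SINGA blobs):

#include <cstddef>
#include <utility>
#include <vector>

// Toy 2-D "blobs" (row-major matrices) to illustrate what ConcateLayer and
// SliceLayer do along dimension 0 (rows).
using Blob = std::vector<std::vector<float>>;

// Concatenate two blobs along dimension 0 by stacking their rows.
Blob ConcatDim0(const Blob& a, const Blob& b) {
  Blob out = a;
  out.insert(out.end(), b.begin(), b.end());
  return out;
}

// Slice a blob along dimension 0 into two halves (assumes an even row count).
std::pair<Blob, Blob> SliceDim0(const Blob& in) {
  std::size_t half = in.size() / 2;
  return {Blob(in.begin(), in.begin() + half), Blob(in.begin() + half, in.end())};
}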


Base Layer Class

The base Layer class defines the fields and methods common to all layers.
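A minimal sketch of what such a base class typically exposes; the member names below are illustrative and not the exact fields and methods of SINGA's Layer class:

#include <string>
#include <vector>

// Minimal sketch of a layer base class: configuration, a feature blob, and
// forward/backward hooks over the source layers.
class ToyLayer {
 public:
  virtual ~ToyLayer() = default;

  // Allocate the feature blob and read layer-specific configuration.
  virtual void Setup(const std::vector<ToyLayer*>& srclayers) = 0;
  // Compute this layer's features from the source layers' features.
  virtual void ComputeFeature(const std::vector<ToyLayer*>& srclayers) = 0;
  // Compute gradients w.r.t. features and parameters.
  virtual void ComputeGradient(const std::vector<ToyLayer*>& srclayers) = 0;

  const std::vector<float>& data() const { return data_; }

 protected:
  std::string name_;         // layer name from the configuration
  std::vector<float> data_;  // the layer's feature blob
};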


Creating Custom Layer
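A hedged sketch of a custom layer as a subclass of a toy base class (trimmed and repeated here so the example is self-contained); the real SINGA layer registration step is omitted:

#include <algorithm>
#include <cstddef>
#include <vector>

// Trimmed toy base class for this sketch only.
class ToyLayer {
 public:
  virtual ~ToyLayer() = default;
  virtual void Setup(const std::vector<ToyLayer*>& src) = 0;
  virtual void ComputeFeature(const std::vector<ToyLayer*>& src) = 0;
  virtual void ComputeGradient(const std::vector<ToyLayer*>& src) = 0;
  std::vector<float> data_;  // feature blob
  std::vector<float> grad_;  // gradient blob
};

// A ReLU layer: the forward pass clamps negatives to zero; the backward pass
// passes the gradient through only where the input was positive.
class ReluLayer : public ToyLayer {
 public:
  void Setup(const std::vector<ToyLayer*>& src) override {
    data_.resize(src[0]->data_.size());
    grad_.resize(src[0]->data_.size());
  }
  void ComputeFeature(const std::vector<ToyLayer*>& src) override {
    for (std::size_t i = 0; i < data_.size(); ++i)
      data_[i] = std::max(0.0f, src[0]->data_[i]);
  }
  void ComputeGradient(const std::vector<ToyLayer*>& src) override {
    for (std::size_t i = 0; i < grad_.size(); ++i)
      src[0]->grad_[i] = src[0]->data_[i] > 0.0f ? grad_[i] : 0.0f;
  }
};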

Param

A Param object in SINGA represents a set of parameters, e.g. a weight matrix or a bias vector, configured inside a layer configuration.


Different Parameter Types


Creating Custom Parameter Type

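A toy sketch of a parameter object with a custom initialization strategy; the names are illustrative, not SINGA's Param API:

#include <cstddef>
#include <random>
#include <vector>

// Toy parameter object: holds values and gradients and supports a custom
// initialization strategy (here, Gaussian) as an example of a parameter type.
class ToyParam {
 public:
  explicit ToyParam(std::size_t n) : value_(n, 0.0f), grad_(n, 0.0f) {}

  void InitGaussian(float mean, float stddev) {
    std::mt19937 rng(42);
    std::normal_distribution<float> dist(mean, stddev);
    for (auto& v : value_) v = dist(rng);
  }

  std::vector<float> value_;  // the parameter values (e.g. a weight matrix)
  std::vector<float> grad_;   // the accumulated gradients
};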

Training

TrainOneBatch

For each SGD iteration, every worker calls the TrainOneBatch function to compute the gradients of the parameters associated with its local layers.
SINGA implements two algorithms for TrainOneBatch:
BP (back-propagation): used by feed-forward and RNN models.
CD (contrastive divergence): used by energy models.
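A sketch of what a BP-style TrainOneBatch does, using a stand-in layer interface rather than SINGA's Layer class:

#include <vector>

// Sketch of BP: forward over the layers in topological order, then backward
// in reverse order. Gradient collection and sending are omitted.
struct LayerLike {
  virtual void ComputeFeature() = 0;   // forward pass for this layer
  virtual void ComputeGradient() = 0;  // backward pass for this layer
  virtual ~LayerLike() = default;
};

void TrainOneBatchBP(std::vector<LayerLike*>& net) {
  for (auto* layer : net) layer->ComputeFeature();       // forward
  for (auto it = net.rbegin(); it != net.rend(); ++it)   // backward
    (*it)->ComputeGradient();
  // The resulting parameter gradients would then be handed to the stub.
}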


Implementing New Algorithms

To implement a new algorithm for TrainOneBatch, we have to create a subclass of Worker.

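A skeleton of that idea, with a stand-in base class rather than SINGA's actual Worker interface or registration mechanism:

// Skeleton of a new TrainOneBatch algorithm as a Worker subclass.
class ToyWorkerBase {
 public:
  virtual ~ToyWorkerBase() = default;
  virtual void TrainOneBatch(int step) = 0;  // called once per SGD iteration
};

class MyAlgorithmWorker : public ToyWorkerBase {
 public:
  void TrainOneBatch(int step) override {
    // 1. run the forward/backward (or sampling) passes of the new algorithm
    // 2. compute gradients for the parameters of the local layers
    // 3. hand the gradients to the stub so they reach the servers
  }
};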

Updater

Every server in SINGA has an Updater instance.
There are many updaters, all of which are subclasses of the Updater class.
The base Updater implements the vanilla SGD algorithm.
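A toy base updater mirroring that idea; the names and signature are illustrative, not SINGA's Updater API:

#include <cstddef>
#include <vector>

// Toy base updater implementing vanilla SGD (w -= lr * grad); subclasses
// would override Update to refine the rule.
class ToyUpdater {
 public:
  explicit ToyUpdater(float lr) : lr_(lr) {}
  virtual ~ToyUpdater() = default;

  // Apply one update to a parameter given its gradient.
  virtual void Update(int step, std::vector<float>& value,
                      const std::vector<float>& grad) {
    for (std::size_t i = 0; i < value.size(); ++i) value[i] -= lr_ * grad[i];
  }

 protected:
  float lr_;  // base learning rate
};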


Learning Rate

The learning rate can follow different change methods, such as kFixed, kLinear, kExponential, kInverseT, kStep, and kFixedStep.


Each change method uses a different configuration.
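Toy implementations of a few change methods, using common conventions that may differ from SINGA's exact formulas and configuration fields:

#include <cmath>

// Toy learning-rate schedules for a few change methods.
float LrFixed(float base_lr) { return base_lr; }

// kStep-style: multiply by gamma every change_freq steps.
float LrStep(float base_lr, float gamma, int change_freq, int step) {
  return base_lr * std::pow(gamma, step / change_freq);
}

// kExponential-style: halve the rate every change_freq steps.
float LrExponential(float base_lr, int change_freq, int step) {
  return base_lr * std::pow(0.5f, static_cast<float>(step) / change_freq);
}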


Implement Custom Updater

Figure: Base Updater Class

Summary

We can use SINGA without much programming experience.
To add custom layers, parameters, or algorithms, we need to write code.
Apache SINGA is still in the development phase; many new features are being added.
It currently has a Python binding that follows Keras.
It currently supports training on GPUs.
