
Apache Singa: A General Distributed Deep Learning Platform
Md Johirul Islam
Department of Computer Science
Iowa State University
mislam@iastate.edu

March 3, 2016

Overview
System Architecture
Distributed Training Framework
NeuralNet
Training
Summary

Overview

SINGA is a general distributed deep learning platform for training big deep learning models over large datasets.
It is designed with an intuitive programming model based on the layer abstraction.
SINGA is integrated with Mesos, so that distributed training can be started as a Mesos framework.
SINGA can run on top of a distributed storage system to achieve scalability; the current version supports HDFS.


Two Goals:

Scalability: reduce the total training time needed to reach a given accuracy by using more computing resources.
Easy-to-use programming model: users can implement their deep learning models/algorithms without much awareness of the underlying distributed system.


Work Flow

The training goal is to find the optimal parameters of the transformation functions, so that they generate good features for specific tasks.
The parameters are randomly initialized and then updated iteratively using the SGD algorithm.
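As a rough, framework-independent sketch (not SINGA's actual API), one vanilla SGD step simply moves each parameter against its gradient:

#include <cstddef>
#include <vector>

// Minimal sketch of one vanilla SGD step: w = w - lr * grad.
// The parameter and gradient containers are illustrative, not SINGA types.
void SgdStep(std::vector<float>& params,
             const std::vector<float>& grads,
             float lr) {
  for (std::size_t i = 0; i < params.size(); ++i) {
    params[i] -= lr * grads[i];
  }
}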


The training workload is distributed over the workers and servers.
In each iteration, every worker calls the TrainOneBatch function to compute parameter gradients.
TrainOneBatch takes a NeuralNet object representing the neural network and visits its layers in a certain order.
The resulting gradients are aggregated by the local stub, which forwards them to the corresponding servers for updating.
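The following toy C++ sketch (illustrative only, not SINGA's real Worker, Server, or stub classes) shows the shape of one such iteration: workers produce gradients, the stub sums them, and the server applies the update:

#include <map>
#include <string>
#include <vector>

// Toy sketch of one training iteration, independent of SINGA's real classes.
// Gradients are keyed by parameter name; the "stub" simply sums local
// gradients before handing them to the "server" for the update.
using Grads = std::map<std::string, float>;

struct ToyWorker {
  // Stand-in for TrainOneBatch: returns gradients for the local layers.
  Grads TrainOneBatch() { return {{"w", 0.1f}, {"b", -0.05f}}; }
};

struct ToyServer {
  std::map<std::string, float> params{{"w", 1.0f}, {"b", 0.0f}};
  void Update(const Grads& g, float lr) {
    for (const auto& kv : g) params[kv.first] -= lr * kv.second;
  }
};

void RunIteration(std::vector<ToyWorker>& workers, ToyServer& server) {
  Grads aggregated;                       // the stub's aggregation buffer
  for (auto& w : workers) {
    for (const auto& kv : w.TrainOneBatch()) aggregated[kv.first] += kv.second;
  }
  server.Update(aggregated, 0.01f);       // server applies the update
}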

System Architecture

Logical Architecture

Worker Group

A worker group is made up of one or more workers.
Each worker group trains a complete model replica against a partition of the dataset.
The workers compute the parameter gradients.
A worker group communicates with only one server group.
Worker groups communicate with their server groups asynchronously.
Workers inside a worker group communicate synchronously.

Server Group

A server group is made up of a number of servers; each server manages a partition of the model parameters.
Servers handle get/update requests from workers.
Neighboring server groups synchronize with each other from time to time.

Parallelism

Model parallelism: each worker computes a subset of the parameters against all data partitioned to its group. Configured by setting partition_dim to 0 in the layer/NeuralNet configuration.
Data parallelism: each worker computes all parameters against a subset of the data. Configured by setting partition_dim to 1 in the layer/NeuralNet configuration.
Hybrid parallelism: a combination of both.
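As a toy illustration of what partition_dim means (independent of SINGA's actual partitioning code), splitting a 2-D feature blob along dimension 0 or dimension 1 yields the per-worker partitions:

#include <cstddef>
#include <vector>

// Toy 2-D blob partitioning to illustrate partition_dim.
// partition_dim == 0 splits along rows, partition_dim == 1 along columns;
// each partition would then be handled by a different worker.
struct Blob2D {
  std::size_t rows, cols;
};

std::vector<Blob2D> Partition(const Blob2D& b, int partition_dim, std::size_t n) {
  std::vector<Blob2D> parts;
  for (std::size_t i = 0; i < n; ++i) {
    if (partition_dim == 0) parts.push_back({b.rows / n, b.cols});
    else                    parts.push_back({b.rows, b.cols / n});
  }
  return parts;
}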


Figure: Hybrid Parallelism.

Communication

In SINGA, workers and servers run in separate threads.
Several workers and servers can reside in the same process.
Each process has a main thread that acts as the stub.
The communication between threads happens through messages.
The stub aggregates the local messages and forwards them to the appropriate threads.


The SINGA communication library consists of two components:
Message
Socket


The message header contains the sender and receiver IDs.
Each ID comprises a group ID and a worker/server ID.
The stub forwards messages by looking up these IDs in its address table.
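A toy sketch of this routing (the message and stub types below are illustrative, not SINGA's actual classes):

#include <functional>
#include <map>
#include <string>
#include <utility>

// Toy message routing: an ID is (group id, worker/server id) and the stub
// forwards a message by looking the destination ID up in its address table.
using Id = std::pair<int, int>;  // {group id, worker or server id}

struct ToyMsg {
  Id src, dst;
  std::string payload;
};

struct ToyStub {
  // Address table: destination ID -> handler (e.g. a local thread's queue).
  std::map<Id, std::function<void(const ToyMsg&)>> address_table;

  void Forward(const ToyMsg& m) {
    auto it = address_table.find(m.dst);
    if (it != address_table.end()) it->second(m);  // deliver locally
    // otherwise the real stub would forward the message to a remote process
  }
};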


Sockets

There are two types of sockets: the Dealer socket and the Router socket.
The communication between dealers and routers is asynchronous.
The basic functions of a socket are to send and receive messages.


Poller


The Poller class provides asynchronous communication between the dealers and the routers.
One can register a set of SocketInterface objects with a Poller instance by calling its add method, and then call the wait method to wait for a registered SocketInterface to be ready for sending or receiving messages.
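A toy sketch of that add/wait pattern (the SocketLike interface and busy-wait loop are illustrative simplifications, not SINGA's SocketInterface or Poller implementation):

#include <vector>

// Toy sketch of the poller pattern: register sockets with add(), then call
// wait() to get a socket that is ready to send or receive.
struct SocketLike {
  virtual bool ready() const = 0;
  virtual ~SocketLike() = default;
};

struct ToyPoller {
  std::vector<SocketLike*> sockets;
  void add(SocketLike* s) { sockets.push_back(s); }

  // Busy-wait until some registered socket is ready; a real poller would
  // block on the underlying file descriptors instead.
  SocketLike* wait() {
    for (;;) {
      for (auto* s : sockets) {
        if (s->ready()) return s;
      }
    }
  }
};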


In SINGA, a Dealer socket can connect to only one Router socket.
The connection is set up by connecting the dealer socket to the endpoint of the router socket.
A Router socket can connect to one or more Dealer sockets.
Upon receiving a message, the router forwards it to the appropriate dealer according to the receiver ID of the message.

Distributed Training Framework

The SINGA cluster topology supports different distributed training frameworks.
The cluster topology is configured in the cluster field of JobProto.
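A toy mirror of such a cluster configuration is sketched below; the field names are assumptions for illustration and may not match the actual ClusterProto fields:

// Toy mirror of a cluster configuration; the field names are illustrative
// assumptions, not necessarily the actual ClusterProto fields.
struct ToyClusterConf {
  int nworker_groups     = 1;  // number of worker groups
  int nworkers_per_group = 1;  // workers in each group
  int nserver_groups     = 1;  // number of server groups
  int nservers_per_group = 1;  // servers in each group
};

// Example: two worker groups of two workers each, sharing one server group.
ToyClusterConf ExampleCluster() {
  ToyClusterConf c;
  c.nworker_groups = 2;
  c.nworkers_per_group = 2;
  c.nserver_groups = 1;
  c.nservers_per_group = 2;
  return c;
}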

Types of Topology

SandBlaster

This is a synchronous framework used by Google Brain.
A single server group is launched to handle all requests from the workers.
A worker computes on its partition of the model and only communicates with the servers handling the related parameters.


Figure: SandBlaster topology


AllReduce

This is a synchronous framework used by Baidu's DeepImage.
Each worker is bound to a server on the same node, so that each node is responsible for maintaining a partition of the parameters and collecting updates from all other nodes.


Figure: AllReduce topology


Downpour

This is an asynchronous framework used by Google Brain.

Figure: Downpour topology


Distributed Hogwild

This is an asynchronous framework used by Caffe (the deep learning framework from BVLC).
Each node contains a complete server group and a complete worker group. Parameter updates are done locally, so that the communication cost during each training step is minimized. However, the server groups must periodically synchronize with neighboring groups to improve training convergence.
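The sketch below suggests how group counts could roughly express the four topologies above; the struct, field names, and numbers are illustrative assumptions, not actual SINGA configuration values:

// Toy illustration of how group counts map to the four topologies above.
struct ToyTopologyConf {
  int nworker_groups, nworkers_per_group;
  int nserver_groups, nservers_per_group;
  bool servers_share_worker_nodes;  // true if each node hosts a worker and a server
};

enum class Topology { kSandBlaster, kAllReduce, kDownpour, kHogwild };

ToyTopologyConf Configure(Topology t, int nodes) {
  switch (t) {
    case Topology::kSandBlaster:  // synchronous: one worker group, dedicated server group
      return {1, nodes, 1, nodes, false};
    case Topology::kAllReduce:    // synchronous: worker and server bound on each node
      return {1, nodes, 1, nodes, true};
    case Topology::kDownpour:     // asynchronous: many worker groups, shared servers
      return {nodes, 1, 1, nodes, false};
    case Topology::kHogwild:      // asynchronous: a full worker + server group per node
      return {nodes, 1, nodes, 1, true};
  }
  return {1, 1, 1, 1, false};
}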


Figure: Distributed Hogwild

NeuralNet

NeuralNet represents a user's neural network model.
The neural network has to be converted into a NeuralNet configuration.
Users configure a NeuralNet by listing all layers of the neural net and specifying each layer's source layer names.
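A toy mirror of such a configuration (the layer type strings are illustrative, not SINGA's registered layer names):

#include <string>
#include <vector>

// Toy mirror of a NeuralNet configuration: every layer is declared with a
// name, a type, and the names of its source layers.
struct ToyLayerConf {
  std::string name;
  std::string type;
  std::vector<std::string> srclayers;
};

// A three-layer MLP: data -> hidden (inner product) -> softmax loss.
std::vector<ToyLayerConf> SimpleMlp() {
  return {
      {"data",   "input",        {}},
      {"hidden", "innerproduct", {"data"}},
      {"loss",   "softmaxloss",  {"hidden"}},
  };
}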

Types of Neural Network

Feed Forward

Feed-forward nets do not have any cycles.
Examples: MLP, CNN.


Figure: A Simple MLP


Energy Models

In energy models the connections are undirected.
To convert these models into a NeuralNet, each undirected connection is replaced with two directed connections.


RNN Models

For recurrent neural networks, the first step is to unroll the recurrent layer.

Layer

The Layer is the core abstraction in SINGA.
A layer performs a variety of feature transformations to obtain high-level features.


Built-in Layers

Input layers: load data from HDFS, disk, or the network into memory.
Neuron layers: perform feature transformations, e.g. convolution, pooling, dropout.
Loss layers: measure the training objective loss, e.g. cross-entropy loss, Euclidean loss.
Output layers: write prediction outputs to disk, HDFS, etc.
Connection layers: connect partitions when the NeuralNet is partitioned.


Input Layers

An input layer is a base layer for loading data from a data store.
It has different subclasses such as SingleLabelRecordLayer, RecordInputLayer, CSVInputLayer, ImagePreprocessLayer, and many others.


Output Layers

An output layer gets data from its source layer and converts it into records of type RecordProto. Records are written as (key, value) tuples into a Store.


Neuron Layers

Neuron layers perform feature transformations.
Example: ConvolutionLayer conducts the convolution transformation.


Loss Layers

Loss layers measure the training objective loss.


Connection Layers

ConcateLayer: connects more than one source layer to concatenate their feature blobs along a given dimension.
SliceLayer: connects to more than one destination layer to slice its feature blob along a given dimension.
SplitLayer: connects to more than one destination layer to replicate its feature blob.
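A toy sketch of what concatenating and slicing a feature blob along dimension 0 means (plain C++ matrices, not SINGA blobs):

#include <cstddef>
#include <utility>
#include <vector>

// Toy 2-D "blobs" (row-major matrices) to illustrate what ConcateLayer and
// SliceLayer do along dimension 0 (rows).
using Blob = std::vector<std::vector<float>>;

// Concatenate two blobs along dimension 0 by stacking their rows.
Blob ConcatDim0(const Blob& a, const Blob& b) {
  Blob out = a;
  out.insert(out.end(), b.begin(), b.end());
  return out;
}

// Slice a blob along dimension 0 into two halves (assumes an even row count).
std::pair<Blob, Blob> SliceDim0(const Blob& in) {
  std::size_t half = in.size() / 2;
  return {Blob(in.begin(), in.begin() + half), Blob(in.begin() + half, in.end())};
}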


Base Layer Class

The base Layer class defines the fields and methods common to all layers.
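A minimal sketch of what such a base class typically exposes; the member names below are illustrative and not the exact fields and methods of SINGA's Layer class:

#include <string>
#include <vector>

// Minimal sketch of a layer base class: configuration, a feature blob, and
// forward/backward hooks over the source layers.
class ToyLayer {
 public:
  virtual ~ToyLayer() = default;

  // Allocate the feature blob and read layer-specific configuration.
  virtual void Setup(const std::vector<ToyLayer*>& srclayers) = 0;
  // Compute this layer's features from the source layers' features.
  virtual void ComputeFeature(const std::vector<ToyLayer*>& srclayers) = 0;
  // Compute gradients w.r.t. features and parameters.
  virtual void ComputeGradient(const std::vector<ToyLayer*>& srclayers) = 0;

  const std::vector<float>& data() const { return data_; }

 protected:
  std::string name_;         // layer name from the configuration
  std::vector<float> data_;  // the layer's feature blob
};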


Creating Custom Layer
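A hedged sketch of a custom layer as a subclass of a toy base class (trimmed and repeated here so the example is self-contained); the real SINGA layer registration step is omitted:

#include <algorithm>
#include <cstddef>
#include <vector>

// Trimmed toy base class for this sketch only.
class ToyLayer {
 public:
  virtual ~ToyLayer() = default;
  virtual void Setup(const std::vector<ToyLayer*>& src) = 0;
  virtual void ComputeFeature(const std::vector<ToyLayer*>& src) = 0;
  virtual void ComputeGradient(const std::vector<ToyLayer*>& src) = 0;
  std::vector<float> data_;  // feature blob
  std::vector<float> grad_;  // gradient blob
};

// A ReLU layer: the forward pass clamps negatives to zero; the backward pass
// passes the gradient through only where the input was positive.
class ReluLayer : public ToyLayer {
 public:
  void Setup(const std::vector<ToyLayer*>& src) override {
    data_.resize(src[0]->data_.size());
    grad_.resize(src[0]->data_.size());
  }
  void ComputeFeature(const std::vector<ToyLayer*>& src) override {
    for (std::size_t i = 0; i < data_.size(); ++i)
      data_[i] = std::max(0.0f, src[0]->data_[i]);
  }
  void ComputeGradient(const std::vector<ToyLayer*>& src) override {
    for (std::size_t i = 0; i < grad_.size(); ++i)
      src[0]->grad_[i] = src[0]->data_[i] > 0.0f ? grad_[i] : 0.0f;
  }
};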

Param

A Param object in SINGA represents a set of parameters, e.g. a weight matrix or a bias vector, configured inside a layer configuration.


Different Parameter Types


Creating Custom Parameter Type

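A toy sketch of a parameter object with a custom initialization strategy; the names are illustrative, not SINGA's Param API:

#include <cstddef>
#include <random>
#include <vector>

// Toy parameter object: holds values and gradients and supports a custom
// initialization strategy (here, Gaussian) as an example of a parameter type.
class ToyParam {
 public:
  explicit ToyParam(std::size_t n) : value_(n, 0.0f), grad_(n, 0.0f) {}

  void InitGaussian(float mean, float stddev) {
    std::mt19937 rng(42);
    std::normal_distribution<float> dist(mean, stddev);
    for (auto& v : value_) v = dist(rng);
  }

  std::vector<float> value_;  // the parameter values (e.g. a weight matrix)
  std::vector<float> grad_;   // the accumulated gradients
};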

Training

TrainOneBatch

For each SGD iteration, every worker calls the TrainOneBatch function to compute the gradients of the parameters associated with its local layers.
SINGA implements two algorithms for TrainOneBatch:
BP (back-propagation): used by feed-forward and RNN models.
CD (contrastive divergence): used by energy models.
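A sketch of what a BP-style TrainOneBatch does, using a stand-in layer interface rather than SINGA's Layer class:

#include <vector>

// Sketch of BP: forward over the layers in topological order, then backward
// in reverse order. Gradient collection and sending are omitted.
struct LayerLike {
  virtual void ComputeFeature() = 0;   // forward pass for this layer
  virtual void ComputeGradient() = 0;  // backward pass for this layer
  virtual ~LayerLike() = default;
};

void TrainOneBatchBP(std::vector<LayerLike*>& net) {
  for (auto* layer : net) layer->ComputeFeature();       // forward
  for (auto it = net.rbegin(); it != net.rend(); ++it)   // backward
    (*it)->ComputeGradient();
  // The resulting parameter gradients would then be handed to the stub.
}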


Implementing New Algorithms

To implement a new algorithm for TrainOneBatch, we have to create a subclass of Worker.

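A skeleton of that idea, with a stand-in base class rather than SINGA's actual Worker interface or registration mechanism:

// Skeleton of a new TrainOneBatch algorithm as a Worker subclass.
class ToyWorkerBase {
 public:
  virtual ~ToyWorkerBase() = default;
  virtual void TrainOneBatch(int step) = 0;  // called once per SGD iteration
};

class MyAlgorithmWorker : public ToyWorkerBase {
 public:
  void TrainOneBatch(int step) override {
    // 1. run the forward/backward (or sampling) passes of the new algorithm
    // 2. compute gradients for the parameters of the local layers
    // 3. hand the gradients to the stub so they reach the servers
  }
};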

Updater

Every server in SINGA has an Updater instance.
There are many updaters, all of which are subclasses of the Updater class.
The base Updater implements the vanilla SGD algorithm.
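A toy base updater mirroring that idea; the names and signature are illustrative, not SINGA's Updater API:

#include <cstddef>
#include <vector>

// Toy base updater implementing vanilla SGD (w -= lr * grad); subclasses
// would override Update to refine the rule.
class ToyUpdater {
 public:
  explicit ToyUpdater(float lr) : lr_(lr) {}
  virtual ~ToyUpdater() = default;

  // Apply one update to a parameter given its gradient.
  virtual void Update(int step, std::vector<float>& value,
                      const std::vector<float>& grad) {
    for (std::size_t i = 0; i < value.size(); ++i) value[i] -= lr_ * grad[i];
  }

 protected:
  float lr_;  // base learning rate
};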


Learning Rate

The learning rate can follow different change methods, such as kFixed, kLinear, kExponential, kInverseT, kStep, and kFixedStep.


Each change method uses a different configuration.
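Toy implementations of a few change methods, using common conventions that may differ from SINGA's exact formulas and configuration fields:

#include <cmath>

// Toy learning-rate schedules for a few change methods.
float LrFixed(float base_lr) { return base_lr; }

// kStep-style: multiply by gamma every change_freq steps.
float LrStep(float base_lr, float gamma, int change_freq, int step) {
  return base_lr * std::pow(gamma, step / change_freq);
}

// kExponential-style: halve the rate every change_freq steps.
float LrExponential(float base_lr, int change_freq, int step) {
  return base_lr * std::pow(0.5f, static_cast<float>(step) / change_freq);
}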


Implement Custom Updater

Figure: Base Updater Class

Summary

We can use SINGA without much programming experience.
To add custom layers, parameters, or algorithms, we need to write code.
Apache SINGA is still in the development phase; many new features are being added.
It currently has a Python binding that follows Keras.
It currently supports training on GPUs.
