
Implementation of Infrastructure for Fog Computing
A Report Submitted
in Partial Fulfillment of the Requirements
for the Degree of
Bachelor of Technology
in
Information Technology

by
Vaibhav Gaiha (20158083)
Naguboyina Sravya (20158088)
Subham Kumar (20158082)
Rajat Agrawal (20158037)
Akshita Singh (20158002)
Sonam Kumari (20128067)

to the
COMPUTER SCIENCE AND ENGINEERING DEPARTMENT
MOTILAL NEHRU NATIONAL INSTITUTE OF TECHNOLOGY
ALLAHABAD, PRAYAGRAJ
April, 2019
UNDERTAKING
We declare that the work presented in this report titled “Implementation of Infrastructure for Fog Computing”, submitted to the Computer Science and Engineering Department, Motilal Nehru National Institute of Technology, Allahabad, for the award of the Bachelor of Technology degree in Information Technology, is our original work. We have not plagiarized or submitted the same work for the award of any other degree. In case this undertaking is found incorrect, we accept that our degrees may be unconditionally withdrawn.

April, 2019
Allahabad
Vaibhav Gaiha (20158083)

Naguboyina Sravya (20158088)

Subham Kumar (20158082)

Rajat Agrawal (20158037)

Akshita Singh (20158002)

Sonam Kumari (20128067)

CERTIFICATE

Certified that the work contained in the report titled “Implementation of Infrastructure for Fog Computing”, by
Vaibhav Gaiha (20158083)
Naguboyina Sravya (20158088)
Subham Kumar (20158082)
Rajat Agrawal (20158037)
Akshita Singh (20158002)
Sonam Kumari (20128067), has been carried out under my supervision and that this work has not been submitted elsewhere for a degree.

(Dr. Shashwati Banerjea)


Computer Science and Engineering Dept.
M.N.N.I.T., Allahabad, Prayagraj

April, 2019

Preface

Internet usage is growing rapidly in today's world. Earlier, few devices used the internet, but with IoT growing by leaps and bounds, the network is becoming more and more congested. Sensors collect data and send it over the network to the cloud for processing; the cloud hosts processing engines that receive the data, process it, and send the results back. As the volume of data received grows, it becomes difficult to transport the data to the cloud, and maintaining the cloud processing engine incurs a huge cost. To overcome these drawbacks of cloud computing, Cisco introduced fog computing, which uses edge devices to carry out computation and storage. The computational power of edge nodes has increased to a huge extent, allowing complex computation on sensor data to be performed on the edge devices themselves. This is the motivation for shifting from the cloud computing paradigm to the fog computing paradigm.

Acknowledgements

We wish to express our deep sense of gratitude to our guide, Dr. Shashwati Banerjea, whose valuable guidance and kind supervision throughout the project shaped the present work. Her advice and criticism have been a source of innovative ideas and inspiration, and a key reason behind the success of this dissertation. The confidence she showed in us was the biggest source of inspiration. We would also like to thank Mr. Shabir Ali for encouraging us; it was because of his help and support that this project has been duly completed.
We see this opportunity as a big milestone in our career development. We will strive to use the skills and knowledge we have gained in the best way possible, and we will continue to improve them. We hope to continue cooperating with all of you in the future.

Contents

Preface

Acknowledgements

1 Introduction
  1.1 Motivation
    1.1.1 Problems with cloud data processing
    1.1.2 Increased computational power of edge nodes
    1.1.3 Good data privacy
    1.1.4 Lower cost
  1.2 Problem Statement
  1.3 Use Cases
  1.4 Overview
    1.4.1 Edge computing and Fog computing
    1.4.2 Docker
  1.5 Software Defined Network (SDN)
  1.6 FloodLight Controller

2 Related Work
  2.1 ParaDrop
  2.2 EdgeML by Microsoft
  2.3 FogFlow
  2.4 Fogernetes

3 Proposed Work

4 Implementation Details
  4.1 Ideating the Publisher, Subscriber and Broker model
  4.2 Improving the setup speed of docker containers
  4.3 Local Docker Repository
  4.4 Collecting nodes performance data
  4.5 Profiling of containers and Nodes
  4.6 Algorithm for profiling
  4.7 Peer-to-Peer module
  4.8 SDN module
    4.8.1 Collection of network statistics
    4.8.2 To perform packet redirection
    4.8.3 To get list of all devices currently present in the network
  4.9 Selection of nodes and tasks
    4.9.1 For tasks
    4.9.2 Fog nodes
  4.10 Creation of Environment
  4.11 Invoking the containers
    4.11.1 RPC Servers
    4.11.2 RPC Clients
  4.12 Web User Interface
  4.13 Kademlia DHT
    4.13.1 Why Kademlia?

5 Experimental Setup and Results Analysis
  5.1 Setup
    5.1.1 Distributed network approach
    5.1.2 SDN approach
  5.2 Execution
    5.2.1 Execution in the overlay network
    5.2.2 Execution in the centralized system

6 Challenges faced and proposed solutions
  6.1 Docker
  6.2 Kademlia DHT

7 Conclusion and Future Work
  7.1 Conclusion
  7.2 Future Work
    7.2.1 Live container migration
    7.2.2 Docker image download in a better way
    7.2.3 Using Docker Lite
    7.2.4 Using docker swarm
    7.2.5 Using a better algorithm for node selection
    7.2.6 Creating a centralized version of the above architecture

References
Abbreviations

P2P Peer-to-Peer
IoT Internet of Things
I/O Input/Output
POSIX Portable Operating System Interface
DHT Distributed Hash Table
SDN Software Defined Networking
ASIC Application-Specific Integrated Circuit

Chapter 1

Introduction

Sensors surround us. We find them in almost all devices: phones, wearables, robots, fridges, televisions and so on. Sensors are what make these devices smart, but most of these devices have little processing capability, which restricts them to light work such as data collection and networking-related I/O. The collected data is further processed via cloud computing or fog computing.

As the cloud continues to expand, it has taken many forms. In cloud computing, sensors collect data and send it over the network to the cloud, which contains processing engines. The cloud engines receive the data, process it, and send the results back for analysis. This architecture requires us to maintain a cloud processing engine, which incurs a huge cost.

Fog computing came into existence due to the increase in data to be processed and the need to reduce both the cost and the amount of data transported to the cloud for processing, storage and analysis. Fog computing is essentially an extension of cloud computing; compared with cloud computing, its architecture is more distributed and closer to the network edge. It uses edge devices to carry out computation and storage locally, closer to where the data is produced. In a fog environment, the processing takes place in a data hub on a smart device, or in a smart router or gateway, thus reducing the amount of data sent to the cloud.

1.1 Motivation
The present world is heavily dependent on the internet for daily tasks: applications with the internet as their backbone now shape people's daily lives.

Earlier, networking devices were few and the internet was used only for important applications. Today, with IoT growing by leaps and bounds, the network is becoming more and more congested as the data flowing through it grows drastically. IoT mainly uses cloud networking for transferring data from sensors to a central server, but with the ever-increasing load it is hard to keep the network free of congestion. A better solution is not to send the data to the central server, but to compute on nearby nodes, utilizing their computational power, and send only the results back. This greatly reduces the data sent over the network. Hence, our aim is to shift from the cloud networking paradigm to the fog computing paradigm.

1.1.1 Problems with cloud data processing


In the present scenario, sensors collect data, which is then sent over the network to a cloud server containing the processing engines. The cloud engines receive the data from the sensors, perform some processing, and send the results back for analysis.

This requires us to maintain a cloud computing engine, which incurs a huge cost. However, as the computing power of devices has increased considerably, we now have enough processing power in the edge nodes (routers, PCs, etc.) for data processing to happen on the edge nodes themselves instead of sending the data to the cloud. This leads to a good increase in performance, decreases congestion in the network, and improves response time.

1.1.2 Increased computational power of edge nodes
The computational power of edge nodes has increased to a huge extent. This increase allows complex computations on sensor data to be performed on edge devices, which was not possible earlier, and has created a strong motivation for moving towards the fog computing paradigm.

1.1.3 Good data privacy


In the cloud network, data transferred from the sensors moves all the way to the cloud server, where the computations are performed. Data flowing from the edge to the central network is thus prone to attack. In the newer approach, data remains confined locally, so the risk of exposure or misuse of data is reduced.

1.1.4 Lower cost


There is a huge reduction in the cost of implementing this architecture, as we no longer need to perform data processing in the cloud or maintain costly data links to data centers.

1.2 Problem Statement


Our aim is to bring computation towards the edge of the network rather than doing it in the cloud (the current architecture). This will decrease the pressure on the network and thus handle the increasing network congestion. The response time will also improve due to multiple, nearer points of computation.

• We aim to implement the infrastructure for fog network using Kademlia which
is a peer-to-peer network module.

• We also try to form a central node which makes the decisions for the selection
of tasks and nodes for performing the tasks. This decision is based on the
current computing capabilities of the nodes whose data is collected.

• We also aim to improve the performance of docker by implementing a local docker repository.

• Later, we intend to bring up this fog network infrastructure over SDN for
controlling the data packet flows in the network in an optimized way.

Hence, we aim at completing these modules and then creating docker images for
them so that they can be executed irrespective of the environment of the node.

1.3 Use Cases


It is well known that today the computing power of mobile phones exceeds that of some computers, and that most of the computing power of edge systems goes unused. Hence, the current way of handling network data is not in accordance with the resources we now have. The best use of this change in how the network works will be seen in IoT networks.

Moreover, routers are fairly powerful and have unused computational capacity, so we can use them for computation; they can thus support the edge network for IoT applications.

1.4 Overview

1.4.1 Edge computing and Fog computing


Edge computing is a networking paradigm that is a strong alternative to the cloud computing paradigm for present-day IoT applications. In edge computing, the computing power of the end nodes is exploited: they perform the computations they are capable of executing, as opposed to cloud computing, where all data is sent to the central server. This reduces the overhead on the network and also decreases the response time for a task.

Fog computing is similar to edge computing in that it also tries to bring intelligence and computation nearer to the points where data is generated, but the difference is that fog computing operates at the level of the local area network. Fog computing uses edge devices and gateways within the LAN to provide the processing capability, whereas edge computing focuses on the end nodes of the edge network.

1.4.2 Docker
Docker is an open source tool designed to build, deploy and run applications using Linux containers. Docker allows applications to share the OS kernel of the system they are running on, so an application only needs to ship with the things not already present on the host. This gives better performance and also reduces the size of the application. Docker meets developers' requirements by separating application dependencies from the infrastructure.

Images

Figure 1: Docker Image Layers

A docker image is a file which is used to run code in a docker container. The docker engine runs an image to create a container, which performs a task. Docker images consist of multiple layers, which increases reusability; this reuse saves time, as the user does not have to build everything in an image from scratch.

Containers
Containers allow developers to package an application with all the parts it needs, such as libraries and dependencies, turning applications into small, lightweight execution environments. A container is essentially a running instance of an application image. Containers are standardized and portable anywhere.

Containers vs Virtual Machines


Virtual machines provide isolation by packaging an entire operating system instance with each application that needs to be separated. This approach provides almost total isolation, but at the cost of significant overhead: each guest operating system instance consumes memory and processing power that could be better used by the applications themselves. Containers take a different approach. Each application and its dependencies use a partitioned segment of the operating system's resources, which the container runtime sets up by drawing on the low-level container services provided by the host operating system. Containers work at the OS level, whereas VMs work at the hardware level.

Docker and Security


Docker adds a degree of safety to applications running in a shared environment, but containers by themselves are not a substitute for proper security measures.

1.5 Software Defined Network (SDN)
Routing consists of three planes:

• Control plane : Control plane makes decisions about the path data takes. It
handles the signaling traffic.

• Data plane : Data plane looks at the routing information provided by the
control plane and looks for the port on which the data has to be forwarded.

• Management plane : Management plane handles the administrative traffic.

SDN is an approach that decouples the data plane from the control plane. To do so, SDN-enabled switches and an SDN controller are used. SDN enables programmable networks with centralized intelligence and control. As the control plane takes the form of the SDN controller, changing this software implementation alone is enough to change the routing behaviour of the complete network. We have used the Floodlight controller as our SDN controller.

1.6 FloodLight Controller


The Floodlight controller is an SDN controller based on the OpenFlow protocol. It is developed in Java and allows us to modify the software and manage the flows in the network as per our requirements.

Chapter 2

Related Work

2.1 ParaDrop
In our project, we aim to bring IoT computation towards the edge of the network rather than doing it in a centralized fashion. A similar idea was pitched in the paper [1], which discusses ParaDrop, an edge computing platform that provides computing and storage resources to third parties at the extreme edge of the network, making it easier for them to run various services. Since the WiFi AP has unique contextual knowledge of its end-devices (e.g., proximity, channel characteristics), ParaDrop has a clear focus on APs. It uses Linux containers instead of VMs and is hence able to provide better service with similar hardware. It has the following components:

• Virtualization substrates in WiFi APs, used as hosts for performing the computations of third parties in isolated containers.

• Cloud Backend : Through this component, the third party containers are
installed and started.

• Developer API : Using this, the developers can manage and view the status of
AP’s in the network.

2.2 EdgeML by Microsoft
In this project, Microsoft aims to allow tiny, resource-constrained IoT devices to run machine learning algorithms without being connected to the cloud. This development thus pursues a similar idea: shifting from cloud computing to edge computing for IoT and other applications.

2.3 FogFlow
FogFlow is a data processing framework that enables service providers to easily program and manage IoT services across the cloud and the edges of the network. Where the number of devices is large, edge computing provides a much better solution in terms of response time. FogFlow also provides optimized task deployment and dynamic service provisioning.

2.4 Fogernetes
The internet has become very vast, and the nodes present in the network are highly heterogeneous, from smart TVs to PCs and servers. In the paper [2], the authors present Fogernetes, a fog computing platform that enables the deployment of fog applications with a given set of requirements onto nodes with differing capabilities.

Chapter 3

Proposed Work

As already stated, most IoT deployments presently follow the cloud architecture, where the central server is solely responsible for computation. This has certain flaws. The number of devices is increasing by leaps and bounds, so the network is flooded with data, and this causes a lot of overhead because the cloud centre is far from the end devices. This overhead is unnecessary, as we have many nodes that are capable of performing the computations and are otherwise idle. We intend to change this approach towards edge computing, where the computations are performed on the edge devices themselves rather than being sent to the cloud.

We have developed a fog network where data collected by sensors is processed by utilizing the computational power of edge devices instead of sending the data to a cloud server. A group of nodes works collectively in a peer-to-peer network to process data efficiently. This reduces the overhead on the network and also decreases the response time. Our model is inspired by the publisher, broker and subscriber model, which is explained later. We have currently implemented this as a distributed network using distributed hash tables. We have also used a Software Defined Network (SDN, using an OpenFlow controller), which gives us better control over the data flows in the network.

Chapter 4

Implementation Details

4.1 Ideating the Publisher, Subscriber and Broker model

Figure 2: Relation between publisher and subscriber

We are trying to create the publisher, subscriber and broker model as explained in [3].

4.1.1 Publisher
These are the nodes that generate the information (sensors, for example).

4.1.2 Subscriber
These are the nodes that receive the data produced by the publishers, depending on their capabilities.

4.1.3 Broker
This comprises the central server, which runs an algorithm to select the appropriate node for a required task. It thus acts as the moderator between the publisher and subscriber nodes.

4.1.4 Working of the model


This is a communication model in which the data generators (publishers) have no need to know who uses the information they provide. Similarly, the data users (subscribers) do not have to worry about the source of the data they receive. The central party is the broker, which handles the interaction between the publishers and the subscribers.

4.2 Improving the setup speed of docker containers
By default, docker downloads the layers of an image in parallel. The authors of the paper [4] have discussed this problem and given a solution: as described in the paper, sequential downloading is enabled by setting the max-concurrent-downloads parameter to 1 in the ”/etc/docker/daemon.json” configuration file.
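The change can also be applied programmatically. The sketch below (a minimal Python helper, assuming the default daemon.json location on Linux) merges the setting into the existing configuration rather than overwriting it; docker must be restarted afterwards (e.g. sudo systemctl restart docker) for it to take effect.

```python
import json
from pathlib import Path

# Path used by the docker daemon on most Linux distributions;
# adjust if your daemon reads a different configuration file.
DAEMON_JSON = Path("/etc/docker/daemon.json")

def enable_sequential_downloads(path: Path = DAEMON_JSON) -> dict:
    """Merge max-concurrent-downloads=1 into the daemon configuration."""
    config = {}
    if path.exists():
        config = json.loads(path.read_text() or "{}")
    config["max-concurrent-downloads"] = 1  # download image layers one at a time
    path.write_text(json.dumps(config, indent=2))
    return config
```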

4.3 Local Docker Repository


Currently, whenever docker needs an image which is not present locally, the image is pulled from the Docker Hub cloud repository. The idea behind creating a local central repository is that the same image may be required by other nodes as well for similar types of tasks, and pulling over the local network is faster than pulling over the external network.

The working of this model is as follows:
The working of this model is as follows:

• A central node runs the repository.sh shell script, which allows it to act as a node that stores docker images locally.

• Any node can fetch an image from this local repository, if it is present there, by calling the localpuller.sh shell script.

• In case an image does not exist in the local repository, the localpuller.sh script will first pull the image from the Docker Hub cloud repository and then push a copy of it to our local repository node.

• This will result in quick pulling of that image in the future whenever it is
required by any other node.

• This will considerably increase the speed of execution of our code.
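The pull-or-fetch logic behind localpuller.sh can be sketched as follows. This is a hypothetical Python equivalent, not the project's actual script: it assumes the local registry is reachable at localhost:5000 and shells out to the docker CLI.

```python
import subprocess

def plan_local_pull(image: str, registry: str = "localhost:5000") -> list:
    """The docker commands behind the pull-or-fetch logic."""
    local_tag = f"{registry}/{image}"
    return [
        ["docker", "pull", local_tag],        # 1. try the local repository first
        ["docker", "pull", image],            # 2. fall back to Docker Hub
        ["docker", "tag", image, local_tag],  # 3. re-tag for the local registry
        ["docker", "push", local_tag],        # 4. seed the local repository
    ]

def local_pull(image: str, registry: str = "localhost:5000") -> None:
    """Fetch `image`, preferring the local registry; on a miss, pull from
    Docker Hub and push a copy locally so future pulls are fast."""
    local_try, *fallback = plan_local_pull(image, registry)
    if subprocess.run(local_try).returncode == 0:
        return
    for cmd in fallback:
        subprocess.run(cmd, check=True)
```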

4.4 Collecting nodes performance data


The nodes in the network differ in processing power, and their available capacity changes over time as assigned tasks execute. Hence, in order to keep the node data updated, we make use of the netdata API, which gives us the basic details of a node. This information is sent at regular intervals by every node to the central server, which keeps itself updated. The data is used for profiling the nodes so that future tasks can be assigned accordingly.

• The first requirement is to install netdata, a monitoring agent required to collect the performance attributes of the nodes.

• In order to measure the performance of the node the following attributes have
been kept in mind:

Figure 3: Screenshot from Netdata Utility

– Average free memory (RAM)


– Average free CPU percentage
– Average CPU frequency
– Average Disk Input
– Average Disk Output
– Average number of packets sent
– Average number of packets received

• These attributes are collected by a python script, client.py, and the comma-separated values are stored in the node_data.txt file. This file is maintained by each node and stores its attribute values for the previous instance.

• This data is then sent to a server which keeps a record of the attribute values and stores them in an all_data.txt file.

• The clustering algorithm then works on the data in the all_data.txt file and breaks the nodes into classes.
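As a sketch of the collection step, the helper below builds a netdata REST query and averages each dimension of the JSON reply, which is what a script like client.py would write out as comma-separated values. The URL format follows netdata's /api/v1/data endpoint; the chart name and attribute set used here are illustrative assumptions.

```python
from urllib.parse import urlencode

NETDATA_PORT = 19999  # netdata's default dashboard/API port

def netdata_url(host: str, chart: str, seconds: int = 60) -> str:
    """URL for the last `seconds` of one chart from the netdata REST API."""
    query = urlencode({"chart": chart, "after": -seconds, "format": "json"})
    return f"http://{host}:{NETDATA_PORT}/api/v1/data?{query}"

def column_averages(payload: dict) -> dict:
    """Average each dimension of a netdata /api/v1/data response.

    `payload["labels"]` names the columns (the first is "time") and
    `payload["data"]` holds one row per sample.
    """
    labels = payload["labels"][1:]           # drop the time column
    rows = [row[1:] for row in payload["data"]]
    n = len(rows)
    return {label: sum(col) / n for label, col in zip(labels, zip(*rows))}
```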

4.5 Profiling of containers and Nodes
Not all the nodes in a network are the same; they differ largely in their capabilities, from small-capacity devices such as routers to large ones such as servers. The state of a node also changes over time as tasks run on it. In order to balance the load on the nodes and perform the tasks in the most optimized way, we create profiles for the nodes as well as for the containers.

• Nodes are divided into profiles based on their computing capabilities like RAM,
CPU-usage, processor power and battery power remaining.

• The node with best computing capability has been ranked first.

• Similarly, the containers are divided into profiles by image size. Here, we assume that the container's other requirements, such as disk space, are in proportion to its size.

4.6 Algorithm for profiling


In order to profile the nodes, we use the k-means clustering algorithm. The node properties we take into account (RAM usage, CPU usage, network usage, network errors, etc.) are collected by the server at regular intervals. The attribute data is normalized to the same scale, and an initial default weight of 0.1 is assigned to each attribute; these weights can be changed according to the priority of the profiling. The algorithm runs at constant intervals so that the profiles stay up to date with the tasks assigned to the nodes. Future tasks are assigned to a node based on its current profile.
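A minimal version of this profiling step can be sketched in pure Python (a toy k-means, not the project's actual implementation): attributes are min-max normalized, scaled by their weights (0.1 by default), and then clustered.

```python
import random

def normalize(rows):
    """Min-max normalize each attribute column to [0, 1]."""
    cols = list(zip(*rows))
    spans = [(min(c), (max(c) - min(c)) or 1.0) for c in cols]
    return [[(v - lo) / span for v, (lo, span) in zip(row, spans)] for row in rows]

def kmeans(rows, k, weights=None, iters=50, seed=0):
    """Cluster nodes into k profiles; `weights` scales attribute importance."""
    weights = weights or [0.1] * len(rows[0])  # default weight per attribute
    pts = [[v * w for v, w in zip(row, weights)] for row in normalize(rows)]
    rng = random.Random(seed)
    centers = rng.sample(pts, k)
    for _ in range(iters):
        # assign each node to its nearest cluster centre
        labels = [min(range(k),
                      key=lambda c: sum((p - q) ** 2
                                        for p, q in zip(pt, centers[c])))
                  for pt in pts]
        # move each centre to the mean of its members
        for c in range(k):
            members = [pt for pt, lbl in zip(pts, labels) if lbl == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels
```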

4.7 Peer-to-Peer module


This module in our project includes the idea of creating a peer-to-peer network for
implementing the fog computing architecture using distributed network.

This includes the following parts:

• Bootstrap node: The first node in this peer-to-peer network becomes the bootstrap node for the distributed hash table. It runs a docker container, pulled from the local repository, containing the Bootstrap.py code, which then stores the following:

– CPU information for each node with key as IP address and value as Node
CPU information obtained from NodeInfo class.
– Profile information with profile priority as key and IP address as value.
– List of IP address currently occupied in the P2P network.

• Peer nodes: Any node running the peer container connects to the bootstrap node to become part of the existing peer-to-peer network instance, thus forming a virtual storage ring. Any of these nodes can update its processor information or access the data stored in the DHT, and each member keeps its CPU usage statistics in the distributed hash table up to date.

• Using the data stored in the hash table, the most idle of the nearby nodes is selected for further processing.

• Currently, we are using Kademlia DHT for implementing this module.

4.8 SDN module


We have implemented the SDN architecture using the floodlight controller which
works on the openflow protocol. We have used this controller to perform the follow-
ing tasks:

4.8.1 Collection of network statistics


We collect network-related information such as the bandwidth of a path in the network, the bandwidth of a node and the transmission rate of a node, which is then used to select the optimal node to send a task to.

4.8.2 To perform packet redirection
Once the optimal node is selected, the packets are then redirected to the desired
node as per the recommendation of the central system. This has been done by
addition, deletion and modification of the flows in the switches.
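Flow modification through Floodlight's Static Flow Pusher REST interface can be sketched as below. The field names follow the Floodlight v1.x static flow pusher; the controller address, flow name, ports and IPs here are illustrative assumptions.

```python
import json
from urllib import request

def redirect_flow(dpid: str, name: str, dst_ip: str, new_ip: str, out_port: int) -> dict:
    """Build a static flow entry that rewrites the destination IP of
    matching packets and forwards them out of `out_port`."""
    return {
        "switch": dpid,          # datapath ID of the target switch
        "name": name,            # unique flow name (also used for deletion)
        "priority": "32768",
        "eth_type": "0x0800",    # match IPv4 traffic
        "ipv4_dst": dst_ip,      # original destination (e.g. the cloud server)
        "active": "true",
        "actions": f"set_field=ipv4_dst->{new_ip},output={out_port}",
    }

def push_flow(controller: str, flow: dict):
    """POST the flow to the controller's static flow pusher endpoint."""
    req = request.Request(
        f"http://{controller}:8080/wm/staticflowpusher/json",
        data=json.dumps(flow).encode(),
        headers={"Content-Type": "application/json"},
    )
    return request.urlopen(req)
```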

4.8.3 To get list of all devices currently present in the network
With the Floodlight REST API, we can get the list of all the nodes, switches and mesh routers currently present in the mesh network. This allows us to overcome the problem of ghost nodes: we can now check whether a node is present in the network, which saves processing time for the task.
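A sketch of the device query against Floodlight's /wm/device/ endpoint: depending on the controller version, the response is either a bare list of devices or a list wrapped in a "devices" object, so the helper accepts both. The response shape assumed here should be checked against the running controller.

```python
import json
from urllib import request

def fetch_devices(controller: str) -> list:
    """Query Floodlight's device manager for all known hosts."""
    with request.urlopen(f"http://{controller}:8080/wm/device/") as resp:
        payload = json.load(resp)
    # Some Floodlight versions wrap the list in {"devices": [...]}.
    return payload.get("devices", payload) if isinstance(payload, dict) else payload

def known_ips(devices: list) -> set:
    """Collect every IPv4 address the controller has seen; a node whose
    IP is absent from this set is a ghost node and can be skipped."""
    return {ip for dev in devices for ip in dev.get("ipv4", [])}
```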

Figure 4: Screenshot of active Openflow Controller

4.9 Selection of nodes and tasks


We are currently working on this portion. In order to schedule the tasks and select nodes for them, we can use multiple algorithms.

4.9.1 For tasks
For the selection of the next task, we can use the First Come First Serve or the Shortest Task First algorithm. The only constraint is that the algorithm must be non-preemptive; otherwise, interrupting container execution would introduce overhead.
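The non-preemptive Shortest Task First policy can be sketched with a heap, using arrival order to break ties (so equal estimates fall back to First Come First Serve). The cost estimates in the example are hypothetical.

```python
import heapq

class ShortestTaskFirst:
    """Non-preemptive shortest-task-first queue: once a task is handed
    out it runs to completion, so container execution is never interrupted."""

    def __init__(self):
        self._heap = []
        self._counter = 0  # FCFS tie-break for equal cost estimates

    def submit(self, task_id: str, estimated_cost: float) -> None:
        self._counter += 1
        heapq.heappush(self._heap, (estimated_cost, self._counter, task_id))

    def next_task(self) -> str:
        """Pop the task with the smallest estimated cost."""
        return heapq.heappop(self._heap)[2]
```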

4.9.2 Fog nodes


We have already divided the nodes and containers into profiles. Based on these inputs, we can either select the best available node or the node that just fits the requirements of the task. We have to implement and test both options and then decide which works best.
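The "just fits" option can be sketched as a best-fit selection over the node profiles. The rank convention (1 = most capable) follows the report's profiling scheme; the IP-to-rank mapping itself is a hypothetical example.

```python
def best_fit_node(nodes, required_profile):
    """Pick the node whose profile just meets the task's requirement.

    `nodes` maps node IP -> profile rank (1 = most capable).  A task
    needing profile p can run on any node ranked p or better; best-fit
    picks the *least* capable qualifying node, keeping the strongest
    nodes free for heavier tasks.
    """
    eligible = [(rank, ip) for ip, rank in nodes.items() if rank <= required_profile]
    if not eligible:
        return None  # no node satisfies the requirement
    return max(eligible)[1]  # largest qualifying rank = tightest fit
```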

4.10 Creation of Environment


We first have to create the environment so that container deployment is possible.

• Packages like docker and ssh are installed remotely on the given node by the central server.

• netdata is also installed, so that the node's performance attributes are available.

4.11 Invoking the containers


This module deals with invoking the containers on the target nodes according to the task to be accomplished. Once the target node is decided, this module invokes a container equipped with all the requirements to complete the task.
This module comprises the following parts:

4.11.1 RPC Servers


These are the nodes on which the computations take place. They provide the
functionality of pulling images and running containers with the requirements
supplied by the invoking nodes.
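The "run with certain requirements" step can be sketched as assembling a `docker run` command from caller-supplied parameters. The flag names (`--cpus`, `--memory`, `-e`) are standard Docker CLI options; the image and limit values below are illustrative:

```python
def build_run_cmd(image, name, cpus=None, memory=None, env=None):
    """Assemble the `docker run` command an RPC server would execute
    for a requested task, applying the caller's resource limits."""
    cmd = ["docker", "run", "-d", "--name", name]
    if cpus is not None:
        cmd += ["--cpus", str(cpus)]       # CPU quota for the container
    if memory is not None:
        cmd += ["--memory", memory]        # memory cap, e.g. "512m"
    for key, val in (env or {}).items():
        cmd += ["-e", f"{key}={val}"]      # task parameters as env vars
    cmd.append(image)
    return cmd
```

An RPC handler would then execute the result, e.g. `subprocess.run(build_run_cmd(...), check=True)`.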

4.11.2 RPC Clients


These are the nodes that invoke containers on the RPC servers. For the
implementation, we have used the msgpackrpc library, available on GitHub. With
it, we have pre-defined functions that accept parameters from the calling nodes;
these functions create and invoke the containers, thereby performing the task.

4.12 Web User Interface


We have created a basic web user interface that is of great use to network
administrators. An administrator can see the status of the network and of each
node, including its CPU, RAM and disk utilization, which helps in analyzing load
balancing in the network.

4.13 Kademlia DHT


Kademlia is a third-generation P2P distributed hash table which stores the resource
table throughout the network. A node is a participating computer in the Kademlia
DHT network. Kademlia nodes communicate with each other through UDP. The
participating nodes form a virtual overlay network. Each node is identified by a
number, its node ID, which serves both as identification and as a means to locate
values stored in the DHT. The node ID provides a direct map to file hashes, and
the node stores information on where to obtain the file or resource. When
searching for a value, the algorithm needs the associated key and explores the
network in several steps. Each step finds nodes that are closer to the key, until
the contacted node returns the value or no closer nodes are found. This is very
efficient: like many other DHTs, Kademlia contacts only O(log n) nodes during the
search, out of a total of n nodes in the system.
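"Closer" here means closer under Kademlia's XOR metric. A minimal sketch of ranking candidates by that metric (the node IDs are small illustrative integers, not real 160-bit IDs):

```python
def xor_distance(a, b):
    """Kademlia's distance metric: the bitwise XOR of two IDs,
    interpreted as an integer."""
    return a ^ b


def closest_nodes(key, node_ids, k=2):
    """Return the k node IDs closest to `key` under the XOR metric.
    Each lookup step queries such nodes, halving the distance to the
    target and converging in O(log n) hops."""
    return sorted(node_ids, key=lambda nid: xor_distance(nid, key))[:k]
```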

Figure 5: Distributed Hash Table

4.13.1 Why Kademlia?


Kademlia is preferred over other DHTs for the reasons mentioned below:

• Kademlia is a decentralized distributed hash table; unlike a centralized hash
table, there is no single point of failure.

• Kademlia minimizes the number of inter-node introduction messages.

• Configuration information, such as which nodes are on the network and which
nodes neighbor each other, spreads automatically as a side effect of key lookups.

• Nodes in Kademlia are knowledgeable about other nodes. This allows routing
queries through low-latency paths.

• Kademlia is resistant to some DDoS attacks.

Chapter 5

Experimental Setup and Results Analysis

The project has the following modules:

• Kademlia DHT module

• Containers (for processing data on a node)

• Algorithms (for profiling of containers and choosing the best available node)

• RPC module

5.1 Setup

5.1.1 Distributed network approach


• A central node acts as a local Docker repository, serving as a cache for Docker
images. Every other node in the network pulls Docker images from this local
repository. This node is also the bootstrap node for the Distributed Hash Table.

• Each node in the network first installs Docker, available from the Linux
software center.

• The nodes then pull the Docker image "PEERS"; upon running this image in a
Docker container, the node becomes part of the Kademlia DHT network instance.

• The "PEERS" Docker image keeps updating the node's information in the DHT,
such as CPU-related information and the profile category the node currently
belongs to.

• A GUI of the current network runs on the central node and shows all the
information of the whole scenario graphically: the number of nodes currently in
the network and the CPU information of those nodes. Users can also interact with
these nodes and execute RPC commands on them through the GUI.

5.1.2 SDN approach
• A central node acts as a local Docker repository, serving as a cache for Docker
images. Every other node in the network pulls Docker images from this local
repository. This node is also the bootstrap node for the Distributed Hash Table.

• The environment for working in the network is then created on all the
participating nodes.

• All the network statistics can be seen on the SDN controller.

• We create a centralized server on the same system that runs the SDN controller.

• Every node in the network sends its statistics, such as CPU usage and RAM
usage, to the central node at regular intervals.

• The central server collects the performance attributes of the other nodes and
then runs the profiling algorithm, i.e. the k-means clustering algorithm.

• When a task is registered, the appropriate node is selected by the central
server, and the data is redirected towards the desired node by changing the flows
in the network accordingly.

5.2 Execution

5.2.1 Execution in the overlay network


The system will work in the following manner:

• Whenever a sensor, say node A, has data for processing, it first chooses the
right Docker image, with the specification and modules required for processing
that data. This Docker image is classified according to our classifier standard.

• It then looks up in the hash table all the nodes whose profile is capable of
running this Docker image.

• Out of all the nodes found, the nearest available node, say node B, is selected
for data processing.

• Data from node A is passed to the selected node B for processing via FTP/SFTP.

• An RPC call is made to the processing node B to pull the required Docker image
from the local central repository of Docker images. If the image is not present in
the local repository, it is first pulled from the live Docker Hub registry and
then pushed to the local repository.

• Once the image is available on node B, data processing begins there.

• While processing is happening on node B, its profile may be downgraded due to
changes in the network or in CPU usage.

• In the distributed approach, the Kademlia Distributed Hash Table data has a
maximum TTL of 10 seconds; after that, the data is removed and must be re-added.
The PEERS container running on each node therefore keeps updating its CPU
statistics and node profile information in the Distributed Hash Table.

• In the SDN-based approach, on the other hand, the nodes continuously send their
data at regular intervals, which the central server uses for clustering; hence the
data is updated only at the central server.

• When a task is registered, the optimal node is selected and the task is
performed on it.

5.2.2 Execution in the centralized system


Execution in the centralized system occurs in the following way:

• All the nodes first register with the central server, which runs the SDN
controller, the node-profiling algorithm and a local Docker repository. Upon
registration, the required software (Docker, Python, RPC, etc.) is installed on
the node.

• The nodes send their statistics, such as CPU utilization and memory
utilization, to the central system, averaged over a time interval currently set
to 10 seconds.

• On receiving the data, the central node groups the nodes into clusters using
the k-means clustering algorithm. The clustering is done by giving appropriate
weights to each attribute sent by the nodes, depending on the type of task being
run.

• All the nodes belonging to the required profile are then selected, and among
them the nearest node with the best bandwidth is picked for executing the task.
The bandwidth and routing data are collected using the Floodlight SDN controller.

• An appropriate environment is created on the selected node: the required Docker
images are pulled from the local Docker repository and installed, and the data
required for executing the task is sent to that node using the FTP protocol.

• After the above process, the task is executed on the selected node.
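The weighted clustering step above can be sketched in a few lines. This is a deliberately tiny k-means with explicitly supplied initial centroids (to keep it deterministic); the attribute vectors, weights and cluster count are illustrative assumptions, and a real deployment would more likely use a library implementation such as scikit-learn's KMeans:

```python
def weighted_kmeans(points, weights, centroids, iters=10):
    """Tiny k-means sketch for node profiling: each point is a node's
    attribute vector (e.g. [cpu%, ram%]), scaled by task-dependent
    weights before clustering. Returns (labels, centroids)."""
    scale = lambda p: [w * x for w, x in zip(weights, p)]
    pts = [scale(p) for p in points]
    cents = [scale(c) for c in centroids]

    def nearest(p):
        # Index of the centroid with the smallest squared distance.
        return min(range(len(cents)),
                   key=lambda i: sum((a - b) ** 2
                                     for a, b in zip(p, cents[i])))

    for _ in range(iters):
        labels = [nearest(p) for p in pts]
        for i in range(len(cents)):
            members = [p for p, l in zip(pts, labels) if l == i]
            if members:  # recompute centroid as the mean of its members
                cents[i] = [sum(col) / len(members)
                            for col in zip(*members)]
    return labels, cents
```

The resulting labels partition the nodes into profiles, from which the scheduler selects the profile required by the task.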

Chapter 6

Challenges faced and proposed solutions

6.1 Docker
The current Docker version, in its default form, works in the following manner:

• All the layers of an image are fetched in parallel from the Docker repository,
where they are stored compressed as tar files.

• After the layers are downloaded, they are extracted using tar. Due to its
underlying architecture, tar works serially, so little concurrency is achieved in
this step.

This parallel download of the image layers results in heavy network usage at
first, during which the CPU remains idle, followed by heavy CPU usage during the
extraction process.

Proposed Solution
Instead of letting Docker pull all the layers of an image simultaneously, we
propose pulling only one layer at a time; once a layer has been pulled, its
extraction can begin while the next layer is downloaded simultaneously. This
pipelining utilizes both the network and the CPU in an adequate manner.
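The proposed schedule can be sketched as a two-stage pipeline, with `download` and `extract` as stand-ins for the real registry fetch and tar extraction:

```python
import queue
import threading


def pipelined_pull(layers, download, extract):
    """Sketch of the proposed schedule: layers are downloaded one at a
    time, and each downloaded layer is handed to an extractor thread,
    so extraction of layer N overlaps the download of layer N+1."""
    q = queue.Queue()
    done = []

    def extractor():
        while True:
            layer = q.get()
            if layer is None:   # sentinel: no more layers
                break
            extract(layer)
            done.append(layer)

    t = threading.Thread(target=extractor)
    t.start()
    for layer in layers:        # serial downloads keep the network busy...
        blob = download(layer)
        q.put(blob)             # ...while extraction proceeds concurrently
    q.put(None)
    t.join()
    return done
```

With this shape, the network and the CPU are each kept busy for most of the pull instead of alternating between saturation and idleness.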

6.2 Kademlia DHT
Kademlia DHT currently does not support deleting data from the distributed hash
table, due to its underlying architecture, yet the data stored in the table may
become outdated and need updating. In addition, the current distributed hash
table requires keys to be unique, so two values with the same key cannot be
stored.

Proposed Solution
• Reduce the time-to-live of the stored key-value pairs, i.e. after a certain
interval the data is removed from all the nodes and must be re-inserted. This
solves our problem because the profile and CPU-usage statistics are highly
dynamic, and each node keeps updating its data at frequent intervals anyway.

• To handle multiple entries that collide on the same key, linear probing or
quadratic probing can be used.
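The TTL idea can be illustrated with a small local store; this is only a single-node sketch of the expiry behavior (the DHT applies the same rule on every node holding the key), with key names chosen for illustration:

```python
import time


class TTLStore:
    """Sketch of the TTL workaround: every key-value pair expires after
    `ttl` seconds, so stale node profiles vanish on their own and each
    node simply republishes its statistics at a shorter interval.
    `now` can be injected for testing; it defaults to a monotonic clock."""

    def __init__(self, ttl=10.0):
        self.ttl = ttl
        self._data = {}

    def set(self, key, value, now=None):
        now = time.monotonic() if now is None else now
        self._data[key] = (value, now + self.ttl)

    def get(self, key, now=None):
        now = time.monotonic() if now is None else now
        entry = self._data.get(key)
        if entry is None or now > entry[1]:
            self._data.pop(key, None)  # lazily drop expired entries
            return None
        return entry[0]
```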

Chapter 7

Conclusion and Future Work

7.1 Conclusion
Looking at the present network, we find a large number of distinct nodes with
different amounts of computing power. With advances in processor technology,
processing power has increased significantly; we now have Linux-based operating
systems such as Android running on devices like refrigerators, televisions,
phones, cars, watches and many more. These devices generate huge amounts of data
daily, all of which is sent over the network to some central server, where it is
stored, processed or discarded depending on how the data is used. The arrival of
IoT has changed everything in the networking area and has led to a paradigm shift
from cloud to edge and fog. Our work has focused on edge networking. We are
currently using distributed networking, but we intend to develop the system using
SDN and improve it further. Edge networking is the future, as it removes a
significant load from the core network and thereby reduces congestion. Combined
with the power of SDN, the routers also become more idle, making them more
suitable to support fog computing. The shift from distributed computing to SDN
will also remove load from the end nodes, making the computation faster.

7.2 Future Work

7.2.1 Live container migration


Currently, if a better node becomes available while a Docker image is executing,
it is not possible to stop execution on the current node and migrate the
container there. We could implement live migration in Docker so that a container
can be moved to a better node, just like a virtual machine.

7.2.2 Docker image download in a better way


Docker images consist of layers stacked together, and different images can share
identical layers. Currently, much of the network and processing usage is wasted
on downloading images that contain multiple layers. By reusing the layers of a
previously downloaded image and downloading only the missing layers, the network
load can be reduced significantly.

7.2.3 Using Docker Lite


Docker Lite is a lightweight version of Docker that provides Docker's basic
features. It is written in POSIX shell script and uses the BTRFS file system,
though it is still under development and highly unstable. A lightweight version
of Docker would greatly increase the performance of our system.

7.2.4 Using docker swarm


Currently, we treat the nodes as single devices, i.e. we do not use their
combined computing power. A task is given to a single system and never
distributed, so a task whose requirements exceed those of every node in the
network cannot be performed. A solution to this problem is Docker Swarm, which
lets us combine the computing power of multiple nodes. We can split the container
into multiple containers and break up the services provided; the swarm then
schedules replicas of the containers on different nodes, improving the computing
strategy.

7.2.5 Using a better algorithm for node selection
The current algorithm selects nodes for execution greedily: the best available
node is always selected for processing. A more advanced machine-learning
algorithm could optimize the node-selection process, resulting in a system that
can handle more simultaneous executions and utilize the processing power better.

7.2.6 Creating a centralized version of the above architecture
Using SDN, we can bring centralization to our network: the process of selecting
the nearest node and routing data to that node can be centralized. With a central
controller, a distributed hash table is no longer required, and the nodes for
data processing can be selected by the controller itself.

