Implementation of Infrastructure for Fog Computing
A Report Submitted
in Partial Fulfillment of the Requirements
for the Degree of
Bachelor of Technology
in
Information Technology
by
Vaibhav Gaiha (20158083)
Naguboyina Sravya (20158088)
Subham Kumar (20158082)
Rajat Agrawal (20158037)
Akshita Singh (20158002)
Sonam Kumari (20128067)
to the
COMPUTER SCIENCE AND ENGINEERING DEPARTMENT
MOTILAL NEHRU NATIONAL INSTITUTE OF TECHNOLOGY
ALLAHABAD, PRAYAGRAJ
April, 2019
UNDERTAKING
We declare that the work presented in this report, titled “Implementation of Infrastructure for Fog Computing”, submitted to the Computer Science and Engineering Department, Motilal Nehru National Institute of Technology, Allahabad, for the award of the Bachelor of Technology degree in Information Technology, is our original work. We have not plagiarized or submitted the same work for the award of any other degree. In case this undertaking is found incorrect, we accept that our degrees may be unconditionally withdrawn.
April, 2019
Allahabad
Vaibhav Gaiha (20158083)
Naguboyina Sravya (20158088)
Sonam Kumari (20128067)
CERTIFICATE
April, 2019
Preface
Internet usage grows by the day. Earlier, few devices used the Internet, but with IoT growing by leaps and bounds the network is becoming ever more congested. Sensors collect data and send it over the network to the cloud, where processing engines receive it, perform some computation, and return the results. As the volume of data grows, transporting it to the cloud becomes difficult, and maintaining the cloud processing engine incurs a huge cost. To overcome these drawbacks of cloud computing, Cisco introduced fog computing, which uses edge devices for computation and storage. The computational power of edge nodes has increased enormously, allowing complex computation to be performed on edge devices close to the sensors. These developments are the motivation for shifting from the cloud computing paradigm to the fog computing paradigm.
Acknowledgements
We wish to express our deep gratitude to our guide, Dr. Shashwati Banerjea, whose valuable guidance and kind supervision throughout the project shaped the present work. Her advice and critiques were a source of innovative ideas and inspiration, and a major reason for the success of this dissertation. The confidence she showed in us was our biggest source of motivation. We would also like to thank Mr. Shabir Ali for encouraging us; it was because of his help and support that this project was duly completed.
We see this opportunity as a big milestone in our career development. We will strive to use the skills and knowledge we have gained in the best way possible and will continue to improve them. We hope to continue this cooperation with all of you in the future.
Contents

Preface
Acknowledgements

1 Introduction
  1.1 Motivation
    1.1.1 Problems with cloud data processing
    1.1.2 Increased computational power of edge nodes
    1.1.3 Good data privacy
    1.1.4 Lower cost
  1.2 Problem Statement
  1.3 Use Cases
  1.4 Overview
    1.4.1 Edge computing and Fog computing
    1.4.2 Docker
  1.5 Software Defined Network (SDN)
  1.6 FloodLight Controller

2 Related Work
  2.1 ParaDrop
  2.2 EdgeML by Microsoft
  2.3 FogFlow
  2.4 Fogernetes

3 Proposed Work

4 Implementation Details
  4.1 Ideating the Publisher, Subscriber and Broker model
    4.1.1 Publisher
    4.1.2 Subscriber
    4.1.3 Broker
    4.1.4 Working of the model
  4.2 Improving the setup speed of docker containers
  4.3 Local Docker Repository
  4.4 Collecting node performance data
  4.5 Profiling of containers and Nodes
  4.6 Algorithm for profiling
  4.7 Peer-to-Peer module
  4.8 SDN module
    4.8.1 Collection of network statistics
    4.8.2 To perform packet redirection
    4.8.3 To get list of all devices currently present in the network
  4.9 Selection of nodes and tasks
    4.9.1 For tasks
    4.9.2 Fog nodes
  4.10 Creation of Environment
  4.11 Invoking the containers
    4.11.1 RPC Servers
    4.11.2 RPC Clients
  4.12 Web User Interface
  4.13 Kademlia DHT
    4.13.1 Why Kademlia?

  5.1.2 SDN approach
  5.2 Execution
    5.2.1 Execution in the overlay network
    5.2.2 Execution in the centralized system

References
Abbreviations
P2P peer-to-peer network
IoT Internet of Things
I/O Input/Output
POSIX Portable Operating System Interface
DHT Distributed Hash Table
SDN Software Defined Networking
ASIC Application-Specific Integrated Circuit
Chapter 1
Introduction
Sensors surround us. We find them in almost all devices: phones, wearables, robots, refrigerators, televisions, and so on. Sensors are what make these devices smart, but most of these devices have only modest processing capability, enough for small tasks such as data collection and network-related I/O. The collected data is processed further via cloud computing or fog computing.
As the cloud continues to expand, it has taken many forms. In cloud computing, sensors collect data and send it over the network to the cloud, which contains processing engines. The cloud engine receives the data, performs some processing, and sends the results back for analysis. This architecture requires us to maintain a cloud processing engine, which incurs a huge cost.
Fog computing came into existence because of the growth in data to be processed and the need to reduce both cost and the amount of data transported to the cloud for processing, storage, and analysis. Fog computing is essentially an extension of cloud computing: compared with cloud computing, its architecture is more distributed and closer to the network edge. It uses edge devices to carry out computation and storage locally, routed over the Internet. In a fog environment, processing takes place in a data hub on a smart device, or in a smart router or gateway, thus reducing the amount of data sent to the cloud.
1.1 Motivation
The present world depends on the Internet for everyday tasks, and applications backed by the Internet visibly change people's daily lives.
Earlier, networking devices were few and the Internet was used only for important applications. Today, with IoT growing by leaps and bounds, the network is becoming more and more congested as the data it carries grows drastically. IoT mainly uses cloud networking to transfer data from sensors to a central server, but with the ever-increasing load it is hard to keep the network free of congestion. A better solution is not to send the data to the central server but to compute on nearby nodes, using their spare computational power, and send back only the results. This greatly reduces the data sent over the network. Hence, our aim is to shift from the cloud networking paradigm to the fog computing paradigm.
1.1.2 Increased computational power of edge nodes
The computational power of edge nodes has increased to a huge extent. This increase allows sensors to offload complex computation to edge devices, which was not possible earlier, and is a strong motivation for moving towards the fog computing paradigm.
1.2 Problem Statement
• We aim to implement the infrastructure for a fog network using Kademlia, a peer-to-peer network module.
• We also set up a central node that makes the decisions for selecting tasks and the nodes that perform them. This decision is based on the current computing capabilities of the nodes, whose data is collected.
• Later, we intend to bring this fog network infrastructure up over SDN, to control the data packet flows in the network in an optimized way.
We aim to complete these modules and then create docker images for them, so that they can be executed irrespective of a node's environment.
1.4 Overview
network.
1.4.2 Docker
Docker is an open source tool designed to build, deploy, and run applications using Linux containers. Docker lets applications share the OS kernel of the system they run on, and only requires an application to be shipped with the components not already present on the host. This gives better performance and also reduces the size of the application. Docker meets developers' needs by separating application dependencies from infrastructure.
Images
A docker image is a file used to run code in a docker container: the docker engine runs an image to create a container, which performs a task. Docker images consist of multiple layers, which increases reusability; this reuse saves time, as the user does not have to rebuild everything in an image.
Containers
Containers allow developers to package an application with all the parts it needs, such as libraries and dependencies, turning applications into small, lightweight execution environments. A container is an instance of an application image, run to carry out the work. Containers are standardized and portable anywhere.
1.5 Software Defined Network (SDN)
Routing involves two planes:
• Control plane : the control plane makes decisions about the path data takes. It handles the signaling traffic.
• Data plane : the data plane looks at the routing information provided by the control plane and determines the port on which the data has to be forwarded.
SDN is an approach that decouples the data and control planes. To do so, SDN-enabled switches and an SDN controller are created. SDN enables a programmable network with centralized intelligence and control. Because the control plane takes the form of the SDN controller, changing that single software implementation changes the routing behaviour of the complete network. We have used the Floodlight controller as our SDN controller.
Chapter 2
Related Work
2.1 ParaDrop
In our project, we aim to bring IoT computation towards the edge of the network rather than doing it in a centralized fashion. A similar idea was pitched in the paper [1], which discusses ParaDrop, an edge computing platform that provides computing and storage resources to third parties at the extreme edge of the network, making it easier for them to run various services. Since the WiFi AP has unique contextual knowledge of its end-devices (e.g., proximity, channel characteristics), ParaDrop has a clear focus on APs. It uses Linux containers instead of VMs and is hence able to provide better services on similar hardware. It has the following components:
• Cloud Backend : through this component, the third-party containers are installed and started.
• Developer API : using this, developers can manage and view the status of the APs in the network.
2.2 EdgeML by Microsoft
In this project, Microsoft aims to let tiny, resource-constrained IoT devices run machine learning algorithms without being connected to the cloud. This development thus pursues a similar idea: shifting from cloud computing to edge computing for IoT and other applications.
2.3 FogFlow
FogFlow is a data processing framework that lets service providers easily program and manage IoT services across the cloud and the edges of the network. Where the number of devices is large, edge computing provides a much better solution in terms of response time. FogFlow also provides optimized task deployment and dynamic service provisioning.
2.4 Fogernetes
The Internet has become vast, and the nodes in the network are highly heterogeneous, from smart TVs to PCs and servers. In the paper [2], the authors present Fogernetes, a fog computing platform that enables the deployment of fog applications with specific requirements across nodes with differing capabilities.
Chapter 3
Proposed Work
As already stated, most IoT deployments today follow the cloud architecture, in which a central server is solely responsible for computation. This has certain flaws. With the number of devices increasing by leaps and bounds, the network is flooded with data, causing a lot of overhead because the cloud centre is far from the end devices. This is unnecessary, since many nearby nodes are capable of performing the computations and are otherwise idle. We intend to move to the edge computing approach, where computations are performed on the edge devices themselves rather than being sent to the cloud.
We have developed a fog network in which data collected by sensors is processed using the edge devices' computational power instead of being sent to a cloud server. A group of nodes works collectively in a peer-to-peer network to process data efficiently. This reduces the overhead on the network and also decreases the response time. Our model is inspired by the publisher, broker, and subscriber model, which is explained later. We have implemented it as a distributed network using distributed hash tables. We have also used a Software Defined Network (SDN, using an OpenFlow controller), which gives us better control over the data flows in the network.
Chapter 4
Implementation Details
We are trying to create the publisher, subscriber, and broker model as explained in [3].
4.1.1 Publisher
These are the nodes that generate information (sensors are a good example).
4.1.2 Subscriber
These are the nodes that receive the data produced by the publishers, depending on their own capabilities.
4.1.3 Broker
This comprises the central server, which runs an algorithm to select the appropriate node to perform a required task. It thus acts as the moderator between the publisher and subscriber nodes.
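As a rough in-process sketch, the interaction of these three roles can be expressed in a few lines of Python (the `Broker` class and its method names are illustrative, not the actual implementation):

```python
class Broker:
    """Moderator between publishers and subscribers, keyed by topic."""

    def __init__(self):
        self.subscribers = {}  # topic -> list of subscriber callbacks

    def subscribe(self, topic, callback):
        self.subscribers.setdefault(topic, []).append(callback)

    def publish(self, topic, data):
        # Deliver the published data to every subscriber of the topic.
        for callback in self.subscribers.get(topic, []):
            callback(data)


broker = Broker()
received = []
broker.subscribe("temperature", received.append)  # a subscriber node registers
broker.publish("temperature", 23.5)               # a publisher (e.g. a sensor) emits data
```

In the real system the broker additionally runs the node-selection algorithm before delivering work, rather than broadcasting to every subscriber.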
The motivation behind creating a local central repository is that the same image may be required by other nodes as well, for similar tasks assigned to them, and pulling from the local network is faster than fetching from an external one.
The working of this model is as follows:
• A central node will run a repository.sh shell script which will allow it to work
as a node to store docker images locally.
• Any node can fetch the images from this local repository by calling the shell
script localpuller.sh if the image is present in the local repository.
• In case an image does not exist in the local repository, localpuller.sh will first pull the image from the Docker Hub cloud repository and then push a copy into our local repository node.
• This will result in quick pulling of that image in the future whenever it is
required by any other node.
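The caching logic of localpuller.sh can be sketched in Python as follows. The registry address `10.0.0.1:5000` and the helper names are assumptions; the real script drives the same `docker pull`/`tag`/`push` commands from the shell:

```python
import subprocess

LOCAL_REGISTRY = "10.0.0.1:5000"  # assumed address of the local repository node


def local_tag(image):
    """Map a Docker Hub image name to its local-registry equivalent."""
    return f"{LOCAL_REGISTRY}/{image}"


def pull_with_cache(image, run=subprocess.run):
    """Try the local repository first; on a miss, pull from Docker Hub,
    then push a copy into the local repository for future pulls."""
    if run(["docker", "pull", local_tag(image)]).returncode == 0:
        return local_tag(image)                       # cache hit
    run(["docker", "pull", image], check=True)        # fall back to Docker Hub
    run(["docker", "tag", image, local_tag(image)], check=True)
    run(["docker", "push", local_tag(image)])         # seed the cache (best effort)
    return local_tag(image)
```

The `run` parameter is injectable so the control flow can be exercised without a docker daemon.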
• In order to measure the performance of a node, the following attributes have been kept in mind:
Figure 3: Screenshot from Netdata Utility
• These attributes are collected by a Python script, client.py, and the comma-separated values are stored in the node_data.txt file. Each node maintains this file, which stores its attribute values for the previous instance.
• This data is then sent to a server, which keeps a record of the attribute values and stores them in an all_data.txt file.
• The clustering algorithm then works on the data in the all_data.txt file and breaks it into classes.
4.5 Profiling of containers and Nodes
Not all nodes in a network are the same; they differ greatly in their capabilities, from small-capacity devices such as routers to large ones such as servers. The state of a node also changes over time: as tasks run on a node, its state continuously changes. To balance the load on the nodes and to perform tasks in the most optimized way, we create profiles for the nodes as well as for the containers.
• Nodes are divided into profiles based on their computing capabilities, such as RAM, CPU usage, processor power, and remaining battery power.
• The node with the best computing capability is ranked first.
• Similarly, containers are also divided into profiles according to their image sizes. Here we assume that a container's other requirements, such as disk space, scale with its size.
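The profiling step can be illustrated with a tiny, self-contained k-means (a sketch only; the real implementation may use a library, and the attribute vectors here are made-up (GB RAM free, % CPU idle) pairs):

```python
def kmeans(points, k, iters=20):
    """Minimal k-means for grouping nodes by capability vectors.
    Deterministic: initial centroids are the first k points."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        for i, p in enumerate(points):
            labels[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids


# Two powerful nodes and two weak ones separate into two profiles.
nodes = [(8, 90), (7, 85), (1, 10), (2, 15)]  # (GB RAM free, % CPU idle)
labels, centroids = kmeans(nodes, k=2)
```

Ranking the resulting clusters by their centroid values then gives the "best capability first" ordering described above.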
This includes the following parts:
• Bootstrap node: the first node in the peer-to-peer network becomes the bootstrap node for the distributed hash table. It runs a docker container, pulled from the local repository, containing the Bootstrap.py code, which then stores the following:
– CPU information for each node, with the node's IP address as key and the CPU information obtained from the NodeInfo class as value.
– Profile information, with profile priority as key and IP address as value.
– The list of IP addresses currently occupied in the P2P network.
• Peer nodes: any node running the peers container connects to the bootstrap node to become part of the existing peer-to-peer network instance, thus creating a virtual storage ring. Any of these nodes can update its processor information or access the data stored in the DHT. Each member of the DHT keeps updating its CPU usage statistics in the distributed hash table.
• Using the data stored in the hash table, the most idle nearby node is selected for further processing.
4.8.2 To perform packet redirection
Once the optimal node is selected, the packets are redirected to the desired node as recommended by the central system. This is done by adding, deleting, and modifying flows in the switches.
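Such flow modifications can be pushed through Floodlight's Static Flow Pusher REST endpoint (`/wm/staticflowpusher/json`). The sketch below only builds one redirection entry; the switch DPID, ports, and controller address are placeholders:

```python
import json
import urllib.request


def redirect_flow(switch_dpid, name, in_port, out_port, priority=32768):
    """Build a Static Flow Pusher entry that sends traffic arriving on
    in_port out of out_port on the given switch."""
    return {
        "switch": switch_dpid,
        "name": name,
        "priority": str(priority),
        "in_port": str(in_port),
        "active": "true",
        "actions": f"output={out_port}",
    }


def push_flow(entry, controller="127.0.0.1:8080"):
    """POST the entry to the Floodlight controller (needs a live controller)."""
    req = urllib.request.Request(
        f"http://{controller}/wm/staticflowpusher/json",
        data=json.dumps(entry).encode(),
        headers={"Content-Type": "application/json"},
    )
    return urllib.request.urlopen(req).read()
```

Deleting or modifying a flow uses the same endpoint with a DELETE request or a re-POST of the same `name`.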
4.9.1 For tasks
For selecting the next task, we can use the First Come First Serve or the Shortest Task First algorithm. The only limitation is that the algorithm should be non-preemptive; otherwise the container execution may incur extra overhead.
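A non-preemptive Shortest Task First selector is a few lines with a heap (the task names and runtime estimates below are hypothetical):

```python
import heapq


def schedule_sjf(tasks):
    """Non-preemptive Shortest Task First: pop tasks in order of estimated
    runtime. Each task is an (estimated_seconds, task_id) tuple."""
    heap = list(tasks)
    heapq.heapify(heap)
    order = []
    while heap:
        _, task_id = heapq.heappop(heap)
        order.append(task_id)   # each task runs to completion before the next
    return order
```

First Come First Serve would simply be a FIFO queue over the same tuples.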
• Packages like docker and ssh are installed on the given IP remotely by the
central server.
requirements which can be provided by the nodes invoking it.
Figure 5: Distributed Hash Table
Chapter 5
• Algorithms (for profiling of containers and choosing the best available node)
• RPC module
5.1 Setup
• Each node in the network first installs Docker, available from the Linux software center.
• The nodes then pull the docker image “PEERS”; upon running this image in a Docker container, the node becomes part of the Kademlia DHT network instance.
• The “PEERS” docker image keeps updating the node's information in the DHT, such as CPU-related information and the profile category the current node falls under.
• A GUI for the current network runs on a central node and shows all the information for the whole scenario graphically, such as the number of nodes currently in the network and their CPU information. Users can also interact with these nodes and execute RPC commands on them through the GUI.
5.1.2 SDN approach
• A central node becomes a local docker repository, acting as a cache for docker images; any other node in the network pulls docker images from this local repository. This node is also the bootstrap node for the distributed hash table.
• The environment for working in the network is then created on all the partic-
ipating nodes.
• We create a centralized server which is the same system on which we have run
the SDN controller.
• Every node that is part of the network sends its statistics, such as CPU usage and RAM usage, to the central node at particular intervals.
• The central server node collects the performance attributes of the other nodes
and then runs the profiling algorithm i.e. the k-means clustering algorithm.
• When a task is registered, the appropriate node is selected by the central server
and then the data is redirected towards the desired node by changing the flows
in the network accordingly.
5.2 Execution
• Whenever a sensor, say node A, has data to process, it first chooses the right docker image, with the appropriate specification and modules required for that processing. This docker image is classified as per our classifier standard.
• It then looks up in the hash table all the nodes whose profile is capable of running this docker image.
• Of all the nodes found, the nearest available node, say node B, is selected for data processing.
• Data from node A is passed to the selected node B for processing via FTP/SFTP.
• An RPC call is made to the processing node B to pull the required docker image from the local central repository of docker images. If an image is not present in the local repository, it is first pulled from the live Docker Hub registry and then pushed to the local repository.
• Once the image is available on node B, data processing begins there.
• In the distributed approach, data in the Kademlia distributed hash table has a maximum TTL of 10 seconds, after which it is removed and has to be re-added. The peers container running on all these nodes therefore keeps updating its CPU statistics and node profile information in the distributed hash table.
• In the SDN-based approach, on the other hand, the nodes continuously send their data at particular intervals, which the central server uses for clustering; hence the data is updated on the central server only.
• When a task is registered, the optimal node is selected and the task is performed on it.
• All nodes first register with the central server, which runs the SDN controller, the node profiling algorithm, and a local docker repository. Upon registration, the required software, such as docker, python, and RPC, is installed on the node.
• The nodes send their statistics, such as CPU and memory utilization, to the central system, averaged over a time interval currently set to 10 seconds.
• On receiving the data, the central node groups the nodes into clusters using the k-means clustering algorithm. The clustering is done by giving appropriate weights to each attribute sent by the nodes, depending on the type of task being run.
• All the nodes belonging to the required profile are then selected, and the nearest one with the best bandwidth is picked for executing the task. The bandwidth and routing data are collected using the Floodlight SDN controller.
• An appropriate environment is created on the selected node: the required docker images are pulled from the local docker repository and installed, and the data required for the task is sent to that node using the FTP protocol.
• After the above steps, the task is executed on the selected node.
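The RPC step in the flow above can be sketched with the standard-library xmlrpc modules; the `pull_image` function and the port are illustrative stand-ins for the real remote action:

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer


def pull_image(image):
    """Stub for the remote action: on node B this would invoke the local
    image-pulling script and report the result."""
    return f"pulled {image}"


def serve(host="127.0.0.1", port=8000):
    """Start the RPC server on the worker node, in a background thread."""
    server = SimpleXMLRPCServer((host, port), logRequests=False, allow_none=True)
    server.register_function(pull_image)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


# Central-server side: a proxy to the worker would be used roughly as
#   proxy = ServerProxy("http://127.0.0.1:8000")
#   proxy.pull_image("opencv-worker:latest")
```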
Chapter 6
6.1 Docker
The current docker version, in its default form, works as follows:
• All the layers of an image are fetched in parallel from the docker repository, each compressed in a tar file.
• After the layers are downloaded, they are extracted using tar. Due to its underlying architecture, tar works serially, so little concurrency is achieved here.
This parallel download of the image layers results first in heavy network usage while the CPU remains idle, and then in heavy CPU usage during the extraction process.
Proposed Solution
Instead of letting docker pull all the layers of an image simultaneously, we propose to pull only one layer at a time: as soon as a layer is pulled, its extraction can begin while the next layer downloads simultaneously. This utilizes both the network and the CPU adequately.
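The proposed pipeline can be simulated with a producer-consumer pair: a downloader thread feeds layers through a queue to an extractor, so the network and CPU stay busy at the same time (the sleeps stand in for the real transfer and `tar` extraction):

```python
import queue
import threading
import time


def pipelined_pull(layers, download_time=0.01, extract_time=0.01):
    """Overlap download (network) with extraction (CPU): while layer N
    is being extracted, layer N+1 is already downloading."""
    q = queue.Queue()
    extracted = []

    def downloader():
        for layer in layers:
            time.sleep(download_time)   # stands in for the network transfer
            q.put(layer)
        q.put(None)                     # sentinel: no more layers

    def extractor():
        while (layer := q.get()) is not None:
            time.sleep(extract_time)    # stands in for `tar -x`
            extracted.append(layer)

    t = threading.Thread(target=downloader)
    t.start()
    extractor()
    t.join()
    return extracted
```

With n layers the total time approaches n·max(download, extract) rather than n·(download + extract), which is the utilization gain argued for above.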
6.2 Kademlia DHT
The Kademlia DHT currently does not support deleting data from the distributed hash table, due to its underlying architecture, yet the data stored in the table can go out of date and needs to be updated. The current distributed hash table also requires keys to be unique, so two values cannot be stored under the same key.
Proposed Solution
• Reduce the time-to-live of the stored key-value pairs, so that after a certain interval the data is removed from all nodes and must be re-added. This solves our problem, as the profile and CPU usage statistics are highly dynamic and each node keeps updating its data at frequent intervals.
• To store multiple values under one key, techniques akin to linear or quadratic probing can be used to derive additional keys.
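The TTL-based workaround can be sketched as a small expiring key-value store (an in-process model of the behaviour, not the kademlia library itself; the clock is injectable for testing):

```python
import time


class TTLStore:
    """Key-value store whose entries expire after ttl seconds, mimicking
    the short time-to-live proposed for the DHT records."""

    def __init__(self, ttl=10, clock=time.monotonic):
        self.ttl = ttl
        self.clock = clock
        self.data = {}  # key -> (value, expiry time)

    def set(self, key, value):
        # A fresh write (re-add) always resets the expiry.
        self.data[key] = (value, self.clock() + self.ttl)

    def get(self, key):
        value, expiry = self.data.get(key, (None, 0))
        if self.clock() >= expiry:      # stale: behave as if deleted
            self.data.pop(key, None)
            return None
        return value
```

Because every peer republishes its statistics well inside the 10-second window, live records never expire while dead peers simply fade out of the table.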
Chapter 7
7.1 Conclusion
Looking at the present network, we find a large number of distinct nodes with different computing powers. With advances in processor technology, processing power has increased significantly, and Linux-based operating systems such as Android now run on refrigerators, televisions, phones, cars, watches, and many more devices. These devices generate huge amounts of data daily, all of which is sent over the network to some central server and either stored, processed, or discarded depending on its use. The arrival of IoT has changed everything in the networking area and led to the paradigm shift from cloud to edge and fog. Our work has focused on edge networking. We currently use distributed networking, but we intend to develop the system further using SDN. Edge networking is the future, as it removes a great deal of load from the core network and eases the congestion problem. Combined with the power of SDN, the routers also become more idle, making them better suited to support fog computing. The shift from distributed computing to SDN will also remove load from the end nodes, making computation faster.
7.2 Future Work
7.2.5 Using a better algorithm for node selection
The current algorithm selects nodes for execution greedily: the best available node is chosen for processing. An advanced machine learning algorithm could optimize the node selection process, resulting in a system that handles a greater number of simultaneous executions and utilizes the processing power better.
References
[1] Peng Liu, Dale Willis, Suman Banerjee. ParaDrop: Enabling Lightweight Multi-tenancy at the Network's Extreme Edge.
[2] Cecil Wöbker, Andreas Seitz, Harald Mueller, Bernd Bruegge. Fogernetes: Deployment and Management of Fog Computing Applications.
[3] https://www.ibm.com/support/knowledgecenter/en/SSMKHH 10.0.0/com.ibm.etools.mft.doc/
[4] IEEE EDGE 2018 - IEEE International Conference on Edge Computing, July 2018, San Francisco, CA, United States. IEEE, pp. 1-8, 2018. <hal-01775105>.
[5] Petar Maymounkov and David Mazières. 2002. Kademlia: A Peer-to-Peer Information System Based on the XOR Metric. In Revised Papers from the First International Workshop on Peer-to-Peer Systems (IPTPS '01), Peter Druschel, M. Frans Kaashoek, and Antony I. T. Rowstron (Eds.). Springer-Verlag, London, UK, 53-65.
[6] https://www.ibm.com/support/knowledgecenter/en/SSFKSJ 9.0.0/com.ibm.mq.pro.doc/q0051
[7] https://en.wikipedia.org/wiki/Distributed_hash_table