
SCHOOL OF COMPUTER SCIENCE AND ENGINEERING

Parallel and Distributed Computing

A Task Distribution Framework for Parallel Computing over TCP/IP Master-Slave Sockets

Members:

Darsh Kumar (20BCE0301)
Mannan Goyal (20BCI0231)
Mukund Yadav (20BCE2538)
Deepanshu Sharma (20BCE0724)

Slot: G1&G2
Faculty: Prof. L. Mohana Sundari

DECLARATION
We hereby declare that the thesis entitled “A Task Distribution Framework for Parallel
Computing over TCP/IP Master-Slave Sockets” submitted by us, for the award of the degree
of Bachelor of Technology in Computer Science and Engineering to VIT is a record of
bonafide work carried out by us under the supervision of Dr. L. Mohana Sundari.

We further declare that the work reported in this thesis has not been submitted and will not be
submitted, either in part or in full, for the award of any other degree or diploma in this
institute or any other institute or university.

Place: Vellore
Date: 11/04/2023

Table of Contents
1. ABSTRACT
2. INTRODUCTION
3. LITERATURE REVIEW
4. PROBLEM FORMULATION
   4.1 Objectives
5. METHODS/ALGORITHMS
   5.1 Design and Implementation
   5.2 Implementation
6. RESULTS AND DISCUSSIONS
7. CONCLUSION AND FUTURE WORK
   7.1 Conclusion
   7.2 Future Work
8. REFERENCES

1. ABSTRACT
Parallelization and distributed computing are the two major techniques for optimizing highly CPU-intensive programming tasks. Unfortunately, most solutions built at present focus on applying these optimizations within a single machine, by utilizing all the threads/CPU/GPU cores available on that machine. Our goal is to build a library on top of TCP sockets that provides an easy-to-use JavaScript API to distribute tasks to multiple slave machines. We also plan to implement an algorithm to serialize and deserialize JavaScript functions, which will let us easily transport them over TCP to other machines.

We also plan to explore multiple strategies/algorithms to allocate tasks to multiple machines and to provide APIs to implement message passing between processes.

2. INTRODUCTION

Parallel computing is a critical aspect of modern computing applications, enabling users to execute multiple tasks simultaneously across multiple computing nodes, thereby improving overall system performance. However, designing and developing parallel applications that are scalable and efficient can be a challenging task.

One of the most significant challenges in developing parallel applications is effective task
distribution among multiple computing nodes. In this context, a task distribution framework
that uses TCP/IP master-slave sockets can be an effective solution. Such a framework can
significantly simplify the development of parallel applications while optimising the
distribution of tasks among computing nodes. This article will provide an in-depth overview
of a task distribution framework for parallel computing over TCP/IP master-slave sockets,
discussing its architecture, benefits, and potential use cases. We will also delve into the
challenges associated with designing and implementing a task distribution framework and
explore how this framework can be used to enhance the performance of various computing
applications, such as scientific simulations, big data processing, and machine learning.
Additionally, we will examine how this framework can be integrated with various
programming languages and tools to simplify the development of parallel applications.

Overall, this article aims to provide a comprehensive understanding of task distribution frameworks and how they can be utilised to enhance the performance of parallel applications.

3. LITERATURE REVIEW
3.1 Barak, A., Ben-Nun, T., Levy, E., & Shiloh, A. (2010). A package for
OpenCL-based heterogeneous computing on clusters with many GPU devices.
Problem Statement
Heterogeneous systems provide new opportunities to increase the performance of parallel
applications on clusters with CPU and GPU architectures. Currently, applications that utilize
GPU devices run their device-executable code on local devices in their respective hosting
nodes. This paper presents a package for running OpenMP, C++, and unmodified OpenCL
applications on clusters with many GPU devices.

Proposed Methodology

The Many GPUs Package (MGP) includes an implementation of the OpenCL specifications
and extensions of the OpenMP API that allow applications on one hosting node to
transparently utilize cluster-wide devices (CPUs and/or GPUs). MGP provides means for
reducing the complexity of programming and running parallel applications on clusters,
including scheduling based on task dependencies and buffer management.
Limitations:
This solution is not general: it is limited to OpenCL and cannot be used with other frameworks. One cannot use frameworks like CUDA, even though CUDA operates on a unified architecture. Our implementation is platform-independent and provides a general solution to distributed computing across different clients in a language-independent way, including languages like Python or JavaScript that require no OpenCL bindings or runtimes.

3.2 Gabriel, E., Resch, M., Beisel, T., & Keller, R. (1998). Distributed
computing in a heterogeneous computing environment.

Problem Statement

Distributed computing is a means to overcome the limitations of single computing systems. In this paper, we describe how clusters of heterogeneous supercomputers can be used to run a single application or a set of applications. We concentrate on the communication problem in such a configuration and present a software library called PACX-MPI that was developed to allow a single system image from the point of view of an MPI programmer.

Proposed System

The authors introduce 'PACX-MPI', a library that enables the clustering of two or more MPPs into one single resource, allowing a metacomputer to be used just like an ordinary MPP.

Limitations

The system does not support asynchronous message passing and thus, all of the calls are
blocking. This introduces a time penalty when the caller is waiting for the function to return.

3.3 Duda, J., & Dłubacz, W. (2013). Distributed Evolutionary Computing System Based on Web Browsers with JavaScript

Problem Statement

The paper presents a distributed computing system that is based on evolutionary algorithms
and utilizes a web browser on a client’s side. This approach is particularly useful in a
multicore or a multiprocessor architecture, where communication time for large problem
instances is almost negligible.

Proposed System

An evolutionary algorithm is coded in JavaScript embedded in a web page sent to the client. The code is optimized with regard to memory usage and communication efficiency between the server and the clients. The server side is also based on JavaScript, as the Node.js server was applied. The proposed system has been tested on the permutation flow shop scheduling problem, one of the most popular optimization benchmarks for heuristics studied in the literature. The results have shown that the system scales quite smoothly, taking additional advantage of local search algorithms executed by some clients.

Limitations

The proposed system must run in a JavaScript engine behind multiple layers of abstraction. While the JavaScript engine (such as Chrome's V8 or WebKit's JavaScriptCore) may optimize for memory, the code always lives in the HTML DOM with no direct access to the hardware. It must talk to the hardware through the browser's interfaces, and this introduces latency that hinders performance.

3.4 Pandey, S., & Tokekar, V. (2014). The prominence of MapReduce in Big Data Processing. 2014 Fourth International Conference on Communication Systems and Network Technologies.

Problem Statement

Large volumes of data are growing because organizations continuously capture data for better decision making. The volume of data is increased by online content such as blogs, posts, social networking site interactions, and photos created by users, while servers continuously record messages about what online users are doing. Today's business is profoundly affected by this growth of data. According to an estimate by IBM, 2.5 quintillion bytes of data are created every day, so much that 90% of the data in the world has been created in the last two years. It is a mind-boggling figure, but the sad part is that instead of having more information, people feel less conversant.

Proposed Methodology

MapReduce has emerged as a highly effective and efficient tool for Big Data analysis. The reasons behind its popularity are its unique features, which include the simplicity and expressiveness of its programming model: MapReduce has mainly two functions, map() and reduce(), and yet a large number of data analysis tasks can be expressed as a set of MapReduce jobs, with a high degree of elastic scalability and fault tolerance. However, the performance of MapReduce is still far from ideal in the database context. Recent research shows that Hadoop, the open-source implementation of MapReduce, is slower than two parallel database systems by a factor of 3.1 to 6.5. Organizations require data processing systems that are not only elastically scalable but also efficient. MapReduce can also become more efficient through proper tuning of various performance-affecting parameters; this paper focuses on those parameters and their proper tuning.

Limitations

Different business models have been suggested by database providers, but these are basically application-specific; for example, Google seems to be more interested in small applications. Big Data storage is another big issue in Big Data management: available algorithms are sufficient to store homogeneous data but cannot smartly store data that arrives in real time because of its heterogeneous nature, so how to rearrange data is another big problem in Big Data management. Virtual server technology can exacerbate the problem, because it raises the issue of overcommitted resources, especially when there is a lack of communication between the application server and the storage administrator. The problems of concurrent I/O and a single-node master/slave architecture also need to be solved.

3.5 Thurgood, B., & Lennon, R. G. (2019). Cloud Computing With Kubernetes Cluster Elastic Scaling. Proceedings of the 3rd International Conference on Future Networks and Distributed Systems.

Problem Statement

Cloud computing and artificial intelligence (AI) technologies are becoming increasingly
prevalent in the industry, necessitating the requirement for advanced platforms to support
their workloads through parallel and distributed architectures. Kubernetes provides an ideal
platform for hosting various workloads, including dynamic workloads based on AI
applications that support ubiquitous computing devices leveraging parallel and distributed
architectures. The rationale is that Kubernetes can be used to support backend services
running on parallel and distributed architectures, hosting ubiquitous cloud computing
workloads. These applications support smart homes and concerts, providing an environment that automatically scales based on demand. While Kubernetes does offer support for auto-scaling of Pods to support these workloads, automated scaling of the cluster itself is not currently offered. In this paper, we introduce a Free and Open Source Software (FOSS) solution for autoscaling Kubernetes (K8s) worker nodes within a cluster to support dynamic workloads. We go on to discuss scalability issues and security concerns both on the platform and within the hosted AI applications.

Proposed Methodology

Cloud and ubiquitous computing in the context of this paper may take the form of a smart home with interconnected devices throughout, where a user wears a smartwatch that interacts with distributed sensors, all communicating via Wi-Fi with a containerized AI application hosted on Kubernetes. The sensor network could not only turn lights on when a room is entered; a variety of other functions could also be performed based on the sensor data constantly uploaded to the Kubernetes cloud platform, which processes the data and can provide constant feedback to the sensor network. The AI applications could trigger actions such as setting ambient lighting or playing relaxing music based on mood, posture, or a variety of other factors. Having the ubiquitous computing sensor network respond to physical motions, the number of inhabitants, and various other inputs would allow for a fully interactive experience in an unobtrusive manner. As the number of users interacting with the platform is likely to fluctuate, as family and guests come and go or new sensor networks are onboarded, the platform is able to scale automatically, both through containers spinning up as required within the cluster and through the cluster itself adding new worker nodes as the containers consume its capacity.

Limitations

While this research was based on a proprietary IaaS solution, additional research could produce an entirely FOSS solution. Foreman has built-in support for oVirt and libvirt which can be leveraged; however, the alerting solution will also need to be adapted, as it is currently based on vCenter performance alarms. The use of Prometheus and Alertmanager would likely yield positive results in triggering VM builds through the Foreman API. This solution provides a dynamically scaling support infrastructure for ubiquitous computing which can be used in a variety of different use cases. Running AI, although not a requirement, is likely to yield advances in the field. Prometheus is a free software application used for event monitoring and alerting; it records real-time metrics in a time-series database built using an HTTP pull model, with flexible queries and real-time alerting.

4. PROBLEM FORMULATION
To build a JavaScript library that provides an async API and uses TCP sockets to distribute tasks to multiple machines in a cluster, and then aggregate the results.

4.1 Objectives

1. To explore multiple task scheduling algorithms (MapReduce, round-robin, etc.) to distribute and aggregate tasks in a cluster of machines
2. To explore algorithms to serialize and deserialize JavaScript functions
3. To build a master-slave architecture over TCP sockets and build a cluster of multiple machines
4. To test the performance and efficiency of the system using cloud computing machines hosted on AWS

5. METHODS/ALGORITHMS

5.1 Design and Implementation

This project follows a master-slave approach. Here, the master is responsible for assigning work, i.e., a function with its arguments, to a set of slaves. The slaves respond to the master on completing their tasks. The algorithm our project follows for this assignment is Round Robin.

Round Robin is a CPU scheduling algorithm where each process is assigned a fixed time slot in a cyclic way. It is preemptive, as processes are assigned the CPU only for a fixed slice of time at most. Applied to task assignment, this cyclic ordering ensures that no two slaves are working on the same job, as the sketch below illustrates.
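
The following TypeScript sketch shows the selection logic in isolation. The Slave interface and RoundRobinScheduler class are hypothetical names used for illustration, not the library's actual API:

// A minimal sketch of round-robin slave selection; names are illustrative.
interface Slave {
  id: string;
  send(payload: string): void;
}

class RoundRobinScheduler {
  private next = 0;

  constructor(private slaves: Slave[]) {}

  // Slaves are picked in a fixed cyclic order, so consecutive jobs
  // always land on different slaves.
  pick(): Slave {
    const slave = this.slaves[this.next];
    this.next = (this.next + 1) % this.slaves.length;
    return slave;
  }

  // A serialized job goes to whichever slave is next in the cycle.
  dispatch(job: string): void {
    this.pick().send(job);
  }
}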

5.2 Implementation

The full implementation is available at https://github.com/Mannan-Goyal/dicer.

6. RESULTS AND DISCUSSIONS
When testing the library with one master and two slaves, we observe the following results.

Each slave connects to the master with its own unique ID. The master also handles slave disconnects.
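
As a rough illustration of this connection handling, consider the following sketch using Node's built-in net module; the ID scheme, port, and log format are assumptions for illustration, not necessarily what the library does:

import * as net from 'net';
import { randomBytes } from 'crypto';

// Hypothetical sketch of the master's connection bookkeeping; the real
// library's ID scheme and wire protocol may differ.
const slaves = new Map<string, net.Socket>();

const master = net.createServer((socket) => {
  // Assign each connecting slave a unique ID and remember its socket.
  const id = `Slave-${randomBytes(4).readUInt32BE(0)}`;
  slaves.set(id, socket);
  console.log(`${id} connected`);

  // Remove the slave from the pool when it disconnects.
  socket.on('close', () => {
    slaves.delete(id);
    console.log(`${id} disconnected`);
  });
});

master.listen(7000); // the port is arbitrary in this sketch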

Our testing scenario consists of three functions that are decorated as follows:

class Job {
  @lib.exec
  static square(x: number): any {
    return x ** 2;
  }

  @lib.exec
  static cube(x: number): any {
    return x ** 3;
  }

  @lib.exec
  static sqrt(x: number): any {
    return Math.sqrt(x);
  }
}

describe('Simple Math Test', () => {
  it('Square of 2', async () => {
    const res: SlaveResponse = await Job.square(2);
    assert.strictEqual(res.result, 4);
  });

  it('Cube of 3', async () => {
    const res: SlaveResponse = await Job.cube(3);
    assert.strictEqual(res.result, 27);
  });

  it('Sqrt of 9', async () => {
    const res: SlaveResponse = await Job.sqrt(9);
    assert.strictEqual(res.result, 3);
  });
});

The results of testing are as follows:

The library takes the user's function and arguments as input, serializes them, and passes them to the master. The master then selects a slave using Round Robin and forwards the serialized function and arguments to it.

The slave deserializes the function and the arguments, executes the function, and returns the result, along with the time taken to execute, to the master, which then returns it to the client.
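
One plausible way to implement this serialization step is to ship the function's source text in a JSON envelope. This is only a sketch, under the assumption that the function is an ordinary function expression or arrow function with no closed-over variables; the library's actual encoding may differ:

// Master side: pack a function's source and its arguments into a
// JSON string suitable for writing to the TCP stream.
function serializeTask(fn: Function, args: unknown[]): string {
  return JSON.stringify({ src: fn.toString(), args });
}

// Slave side: rebuild the function from its source, run it, and
// measure the execution time to report back to the master.
function runTask(payload: string): { result: unknown; timeMs: number } {
  const { src, args } = JSON.parse(payload);
  const fn = new Function(`return (${src});`)();
  const start = Date.now();
  const result = fn(...args);
  return { result, timeMs: Date.now() - start };
}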

As seen above, the master selects the nodes via Round Robin and distributes the workload among them. Slave-30595971 gets a single function, i.e., the cube function, while Slave-27368516 gets two functions, i.e., square and sqrt. This ensures that work is evenly distributed among all the nodes.

This is another example showcasing matrix multiplication:

As we can see, the sequential and parallel execution times differ. On graphing the values, we get the following bar graph.

As we can see from the diagram above, the time taken for parallel execution of the tests is
almost half of the sequential time in every case. This happens due to the distribution of tasks
to different slaves.

For the case of array multiplication, we multiplied a one-dimensional matrix of size 300 by a number. The sequential execution time was 32 ms and the parallel execution time came out to be 12 ms.

Moreover, we tested our framework on a two-dimensional 480x480 matrix multiplication problem. The sequential execution time was 53 ms and the parallel execution time came out to be 26 ms.

In each case, the parallel execution time was better.
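
For intuition about how such a problem can be split across slaves, one common decomposition is row-wise: each slave multiplies a band of A's rows by the full matrix B, and the master concatenates the bands. This is a hypothetical decomposition for illustration, not necessarily the exact splitting used in our tests:

// Split A's rows into `parts` bands, one band per slave.
function splitRows(a: number[][], parts: number): number[][][] {
  const chunk = Math.ceil(a.length / parts);
  const bands: number[][][] = [];
  for (let i = 0; i < a.length; i += chunk) {
    bands.push(a.slice(i, i + chunk));
  }
  return bands;
}

// Each slave computes its band of C = A x B using the full matrix B.
function multiplyBand(band: number[][], b: number[][]): number[][] {
  return band.map((row) =>
    b[0].map((_, j) => row.reduce((sum, v, k) => sum + v * b[k][j], 0))
  );
}

// Master side: distribute the bands, then reassemble C in order, e.g.
// const c = splitRows(a, 2).map((band) => multiplyBand(band, b)).flat();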

7. CONCLUSION AND FUTURE WORK
7.1 Conclusion
With the advancement of technology, a common motive is always to decrease time and increase efficiency. Keeping this goal in mind, our project successfully demonstrates a faster method of execution using parallel computing. Having chosen a fitting algorithm, Round Robin, we see how it increases efficiency by not letting any two slaves work on the same job.

7.2 Future Work

1. As of now, the project only supports the Round Robin algorithm for distribution of tasks to different slaves. However, in the future, we plan to implement more scheduling algorithms, such as:
a. First Come First Serve
b. Shortest-Job-Next (SJN) Scheduling
c. Priority Scheduling
d. Shortest Remaining Time
e. Multiple-Level Queues Scheduling
2. The master server developed is stateful, which means it cannot be horizontally scaled. This places an upper bound on how many tasks the server can take in and how many workers can connect to the master node. In the future, we plan to make the master server stateless, which will allow us to run multiple master nodes and load-balance requests across them.
3. The library currently transfers all function arguments to all workers. For example, if we were doing matrix multiplication, the entire matrix is distributed to all workers, instead of just the rows/columns each worker needs. This is inefficient, as it increases memory usage on each worker and requires more network bandwidth to transfer data to each node.

8. REFERENCES
1. Barak, A., Ben-Nun, T., Levy, E., & Shiloh, A. (2010). A package for OpenCL-based heterogeneous computing on clusters with many GPU devices. 2010 IEEE International Conference on Cluster Computing Workshops and Posters (CLUSTER WORKSHOPS). doi:10.1109/clusterwksp.2010.5613086

2. Baldo, L., Brenner, L., Fernandes, L., Fernandes, P., & Sales, A. (2005). Performance Models for Master/Slave Parallel Programs. Electronic Notes in Theoretical Computer Science, 128, 101–121. doi:10.1016/j.entcs.2005.01.015

3. Duda, J., & Dłubacz, W. (2013). Distributed Evolutionary Computing System Based on Web Browsers with JavaScript. Lecture Notes in Computer Science, 183–191. doi:10.1007/978-3-642-36803-5_13

4. Gabriel, E., Resch, M., Beisel, T., & Keller, R. (1998). Distributed computing in a heterogeneous computing environment. Lecture Notes in Computer Science, 180–187. doi:10.1007/bfb0056574

5. Dean, J., & Ghemawat, S. (2004). MapReduce: Simplified data processing on large clusters. In OSDI, 137–150.

6. Pavlo, A., Paulson, E., Rasin, A., Abadi, D. J., DeWitt, D. J., Madden, S., & Stonebraker, M. (2009). A comparison of approaches to large-scale data analysis. In SIGMOD.

7. Dean, J., & Ghemawat, S. (2010). MapReduce: A flexible data processing tool. Communications of the ACM, 53(1), 72–77.

8. Oudjida, A. K., Ouchabane, A., & Liem, M. (2005). Master-Slave Wrapper Communication Protocol: A Case Study.

9. Wajiansyah, A., Purwadi, H., & Astagani, A. (2018). Implementation of master-slave method on the multiprocessor-based embedded system: A case study on a mobile robot. International Journal of Engineering & Technology, 7(2.2), 53. doi:10.14419/ijet.v7i2.2.12732

10. [Online] https://medium.com/csivit/oversimplified-code-executor-ea6b26def4d9

11. [Online] https://core.ac.uk/download/pdf/82095915.pdf
