Academic Journal of Management Sciences, ISSN 2305-2864, Vol. 6, No. 1, Jan. 2018

Design and Implementation of High Performance Computing (HPC) Cluster

Dileep Kumar*, Liaquat Ali Thebo and Sheeraz Memon
Mehran University of Engineering & Technology, Jamshoro, Pakistan

Abstract
High Performance Computing (HPC) evolved in response to the growing demand for computational power and the need for fast computations, and it is increasingly used in all kinds of fields. In engineering and science, solving scientific problems requires huge execution times, while viable hardware-based supercomputers are very costly to build and maintain. A newer technique creates parallel systems that provide the functionality of supercomputing through the use of inexpensive old PCs, fast local area networking, and the free and open source Linux operating system and software. These parallel systems are called High Performance Computing clusters or Beowulf clusters. Multiple PCs are interconnected to combine the computational power of several machines, or nodes, and so reduce execution time. The management of such clusters is a difficult task: a variety of tools are available, but proper selection and integration is always challenging. This paper presents a High Performance Computing cluster approach on a Linux platform (Ubuntu), describes the steps necessary to create a cluster, and provides an implementation of an MPI (Message Passing Interface) based HPC cluster. Finally, the performance of the cluster environment is tested by comparing execution times and speedup on the cluster with different numbers of processes.

Keywords: High Performance Computing; Cluster Computing; Beowulf Clusters; Parallel Computing; MPI

I. Introduction

High Performance Computing is a field of computer science in which supercomputers are used to solve challenging scientific problems. These problems are highly compute-intensive: they are first identified and then modeled as mathematical expressions, normally differential and integral equations. Such expressions cannot be executed directly on computer machines; they must first be converted into parallel programs and then executed on a large number of high performance computers to reduce the computational time [A]. The solution can be a huge amount of numerical data, or a representation of that data as an image or animation produced with visualization techniques, which helps in understanding these types of scientific problems.

Figure 1: A scientific problem


Applications of High Performance Computing are increasingly used in educational institutes and scientific laboratories for research, and in industry for analysing business problems and their solutions [B]. With the evolution of low-cost, powerful desktop computers a new concept emerged, the "computer cluster": a set of connected computers that function together closely, like a single computer [C]. A distributed computing system called a "Beowulf cluster" is a set of normal commodity computer machines, networked together and controlled by open source software [D]. Dedicated High Performance Computing (HPC) components are very expensive, but with commodity-off-the-shelf hardware the costs can be lowered [E]. In practice, Linux-based High Performance Computing clusters are the preferred way to build parallel systems that act as the core for next generation supercomputers [F]. In parallel computing, a problem is broken down into a number of parts, and each part is further broken into a series of instructions that can be executed simultaneously on different processors (CPUs). Parallel computing has been described as "the high end of computing" and is used to solve very complex problems in many fields of engineering and science. Parallel computers can be built from low-cost, commodity hardware. The aim of parallel computing is to increase performance by executing a task on multiple processors [G].

II. Problem Statement

The problem is that a great amount of money and effort is spent to build very fast, specialized high performance supercomputers. These supercomputers are often optimized for specific problems in a particular field, are expensive, and have unique operating systems and application software that require trained people.

Figure 2: Highly expensive supercomputer

III. Proposed Solution

A solution to the problem of obtaining faster computers is to create a low-cost High Performance Computing (HPC) cluster using old commodity hardware combined with open source, free operating systems and software, providing functionality similar to a commercial high performance supercomputer and applicable to several fields of advanced computational problems. Open source and free operating systems and software, such as Linux and MPI, have driven the marginal cost down to nearly zero. HPC clusters provide an inexpensive parallel computing system that aggregates the computing power of multiple computers and dramatically reduces the time needed to solve problems that require the analysis of large amounts of data.


Figure 3: A low cost supercomputer

IV. Objectives

According to what was expected from this research, the objectives were:

- To become familiar with parallel computing.
- To understand the clustering environment.
- To show that the Linux operating system can be deployed on old commodity hardware and that clustering on it is viable.
- To test the implementation by combining the computational power of old computers in order to reduce execution time, making it feasible to solve larger problems.

V. Related Work

In our review we found several projects under the terms High Performance Computing and Cluster Computing that are related to our work.

In [H], the authors used commodity hardware to implement a low-cost Beowulf cluster that can be used for teaching at undergraduate and postgraduate levels.

In [I], the authors showed how to install and configure an HPC cluster on a Linux distribution and described the structure of the cluster.

In [J], the authors showed how to assemble a diskless cluster but did not include any experiments with parallel programming.

In [K], the authors proposed an HPC approach on Ubuntu Linux using a parallel programming environment with multiple nodes for larger computations. They described a method for installing the cluster environment using PXE (Preboot Execution Environment), which requires installation and configuration of the DHCP and TFTP protocols.

In [L], the authors implemented a clustering environment to solve large mathematical operations, such as matrix multiplication and pi calculation, more quickly. Users can access any node of the cluster and use it separately as a local personal computer.


VI. HPC Cluster Design and Implementation

A. Proposed Network Topology

To design the Linux HPC cluster we need a set of cluster nodes networked together to share computing resources. The proposed HPC cluster uses a star network topology consisting of:

- A master node
- Two slave nodes
- A Fast Ethernet 100 Mbps switch
- A PTCL DSL modem as the gateway device
- Cables and other networking hardware

Figure 4: Proposed network topology

B. Cluster Working

The user interacts with the master node and submits a task (job) to it. The master node is the only controlling node; it keeps track of the number of nodes within the cluster, including itself. After the user submits a task, the master node divides it among the nodes, including itself, meaning that the master node also takes part in the computation. After each node completes its part of the computation, the results are unified and returned to the user; a minimal code sketch of this flow is given after Figure 5.

Figure 5: Cluster working
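As a concrete sketch of this flow (assumed for illustration, not the paper's actual program), the master process (rank 0) can scatter equal slices of an input array to all processes, let each process compute a partial result, and then reduce the partial results back onto the master. The array contents, slice size, and the "sum of squares" task are illustrative assumptions.

```c
/* Illustrative master/worker flow: scatter work, compute, reduce results.
 * The data, chunk size, and task are assumptions for this sketch. */
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

#define CHUNK 4                               /* elements per process (assumed) */

int main(int argc, char *argv[])
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    double *data = NULL;
    if (rank == 0) {                          /* master prepares the whole task */
        data = malloc((size_t)size * CHUNK * sizeof(double));
        for (int i = 0; i < size * CHUNK; i++)
            data[i] = i + 1;
    }

    double local[CHUNK];
    /* The master divides the task among all nodes, including itself. */
    MPI_Scatter(data, CHUNK, MPI_DOUBLE, local, CHUNK, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    double partial = 0.0, total = 0.0;
    for (int i = 0; i < CHUNK; i++)           /* each node computes its share */
        partial += local[i] * local[i];

    /* The partial results are unified on the master and returned to the user. */
    MPI_Reduce(&partial, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0) {
        printf("Sum of squares = %f\n", total);
        free(data);
    }

    MPI_Finalize();
    return 0;
}
```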


C. Proposed Cluster Message Passing Model

Message Passing Interface Chameleon (MPICH) is a software implementation of the Message Passing Interface (MPI) standard for high performance computing, used to pass messages between processes running on the master and slave computers. MPICH2 is a practical implementation of the Message Passing Interface, a message passing standard for parallel computing applications [M]. MPI is the most popular communication protocol used in cluster computing environments; many scientific and commercial applications running on cluster computers have been developed with it [N]. The standard defines the syntax and semantics of library routines that help programmers and developers write parallel programs in languages such as Fortran, C, C++ and Java. The goals of MPI are high performance, portability, and scalability, and MPI remains the leading model used in High Performance Computing.

Figure 6: Proposed cluster message passing model

The assigned task is divided into a number of processes, and each process is associated with a unique process ID within the communicator. A set of communicating processes is called a communicator; MPI_COMM_WORLD is the default communicator. The number of processes is called the "size" of the communicator and the process ID is called the "rank". Processes run on the nodes and communicate by exchanging messages. The proposed HPC cluster uses 3 nodes, each with a Core 2 Duo processor providing 2 CPUs. Since a single process runs on a single processor, 2 processes are assigned to each node and the cluster executes 6 processes at a time. If more than 6 processes are created, the remaining processes are assigned to the CPUs that complete their previous tasks first, on a First Come First Serve (FCFS) basis.
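To illustrate these terms, the following minimal C sketch (not taken from the paper) has every process report its rank, the size of MPI_COMM_WORLD, and the node it runs on; the program name and output format are illustrative assumptions.

```c
/* Minimal rank/size sketch (illustrative, not the paper's code). */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, size, name_len;
    char name[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);                   /* start the MPI runtime          */
    MPI_Comm_size(MPI_COMM_WORLD, &size);     /* total processes (the "size")   */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);     /* this process's ID (the "rank") */
    MPI_Get_processor_name(name, &name_len);  /* host the process is placed on  */

    printf("Process %d of %d running on %s\n", rank, size, name);

    MPI_Finalize();
    return 0;
}
```

Compiled with mpicc and launched with a command such as `mpiexec -f machinefile -n 6 ./hello`, MPICH2 would typically place two of the six processes on each of the three dual-core nodes listed in the machine file.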

D. Cluster Implementation

The main steps of implementing the HPC cluster are shown in Figure 7.


Figure 7: Cluster implementation steps

E. Hardware Configuration

The nodes used are of the same architecture, with slightly different CPU and memory specifications.
Table 1: Hardware configuration

Configuration       | Master Node                                                | Slave1 & Slave2 Nodes
Processor Type      | Intel (R) Core (TM) 2 CPU                                  | Intel (R) Core (TM) 2 CPU
Processor Speed     | 2.4 GHz                                                    | 2.33 GHz
RAM Memory          | 4 GB                                                       | 2 GB
L2 Cache            | 4 MB                                                       | 4 MB
System Model        | Dell OptiPlex 745                                          | Dell OptiPlex 755
Operating System    | Ubuntu 14.04 Desktop i386                                  | Ubuntu 14.04 Desktop i386
Host Name           | master                                                     | slave1 & slave2
IP Address          | 192.168.1.100                                              | 192.168.1.101-192.168.1.102
Subnet Mask         | 255.255.255.0                                              | 255.255.255.0
Default Gateway     | 192.168.1.1                                                | 192.168.1.1
DNS Server          | 192.168.1.1                                                | 192.168.1.1
Software & Packages | GST, NFS-server, OpenSSH-server, NTP, GCC Compiler, MPICH2 | GST, NFS-client, OpenSSH-server, NTP

VII. Results & Comparison

We evaluated the performance of the cluster by implementing a parallel program that calculates the value of "pi". Performance is measured in terms of execution time and speedup.


The execution time of a code is the time the system spends executing that code. The speedup of a parallelized code is defined as:

$$\text{Speedup} = \frac{\text{serial execution time } (T_s)}{\text{parallel execution time } (T_p)}$$
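For example, using the measurements reported in Table 2, the speedup obtained with two processes is:

$$\text{Speedup}(2) = \frac{T_s}{T_p(2)} = \frac{3.331381}{1.710955} \approx 1.95$$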

A. Pi Calculation

The problem is to calculate the value of "pi". Solving it requires a mathematical model, which is then converted into computer code. The mathematical constant "pi" is the ratio of a circle's circumference to its diameter, with an approximate value of 3.14159. Its value can be calculated from:

$$\int_{0}^{1} \frac{4}{1 + x^{2}} \, dx = \pi$$

We compare the calculated value of "pi" with the reference value to determine the accuracy of the output, and the time taken by the program is also displayed.
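The paper does not reproduce its source code, so the following C sketch only illustrates one common way to parallelize this integral with MPI: the interval [0, 1] is split into n sub-intervals, each process sums its share using the midpoint rule, and MPI_Reduce gathers the partial sums on the master. The value of n, the cyclic work distribution, and the use of MPI_Wtime for timing are assumptions, not details taken from the paper.

```c
/* Illustrative MPI "pi" calculation using the midpoint rule; a sketch of a
 * common approach, not the paper's actual program. */
#include <stdio.h>
#include <math.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    const double PI_REF = 3.141592653589793;  /* reference value for the error */
    const long n = 100000000;                 /* number of intervals (assumed) */
    int rank, size;
    long i;
    double h, x, local_sum = 0.0, pi = 0.0, t_start, t_end;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    t_start = MPI_Wtime();
    h = 1.0 / (double)n;                      /* width of one sub-interval */

    /* Each process integrates every size-th interval (cyclic distribution). */
    for (i = rank; i < n; i += size) {
        x = h * ((double)i + 0.5);            /* midpoint of interval i */
        local_sum += 4.0 / (1.0 + x * x);
    }
    local_sum *= h;

    /* Unify the partial sums on the master process (rank 0). */
    MPI_Reduce(&local_sum, &pi, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    t_end = MPI_Wtime();

    if (rank == 0)
        printf("pi ~= %.12f  error = %.2e  time = %f s\n",
               pi, fabs(pi - PI_REF), t_end - t_start);

    MPI_Finalize();
    return 0;
}
```

MPI_Reduce is used so that only the master prints the final value and the elapsed time, mirroring the cluster-working model in which the master unifies and returns the results.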

Table 2: Pi calculation results

No. of Processes | Execution Time (s) | Speedup (Ts/Tp)
1                | 3.331381           | 1
2                | 1.710955           | 1.95
3                | 1.117531           | 2.98
4                | 0.858116           | 3.88
5                | 0.690760           | 4.82
6                | 0.557304           | 5.98
7                | 0.764321           | 4.36
8                | 0.646207           | 5.16
9                | 0.598842           | 5.56
10               | 0.678636           | 4.91

Figure 8: Results comparison


VIII. Conclusion

The results show that the HPC cluster is functioning as expected and that we are able to combine the computational power of the different nodes. When the task is executed serially, with only one process, it takes much longer than in the other cases. When the task is divided into 6 processes, as shown in Figure 8 and Table 2, the cluster shows its maximum performance in terms of execution time and speedup. This is because the cluster has 3 nodes, each with a Core 2 Duo processor providing 2 cores (CPUs), for a total of 6 cores. The proposed cluster assigns 6 processes at a time to the 6 cores for execution; if 10 processes are created, the remaining 4 are assigned to cores as they finish their previous tasks, on a First Come First Serve (FCFS) basis. We observed variations in the results when more than 6 processes were generated. For the best results, the task should be divided into a number of processes equal to the number of cores (CPUs) available within the cluster.

IX. Future Research

We observed that when any machine stops working for any reason, the whole cluster fails and cannot perform the jobs submitted to it. Administering a large HPC cluster is a challenging job; the failure of one or a few machines should not cause the whole cluster to stop doing its work. We intend to address this issue so that the cluster can continue its job even if a machine within the cluster is not working, which would significantly improve the cluster's reliability. We have used Fast Ethernet (100 Mbps) for the backend communication; it can be upgraded to Gigabit Ethernet.


X. References

[A] Vecchiola, C., Pandey, S. and Buyya, R. (2009), "High-Performance Cloud Computing: A View of Scientific Applications", Proceedings of the 2009 10th International Symposium on Pervasive Systems, Algorithms and Networks (ISPAN '09), held on December 14, 2009, pp. 4-16.

[B] Gupta, A. and Milojicic, D. (2011), "Evaluation of HPC Applications on Cloud", OCS '11 Proceedings of the 2011 Sixth Open Cirrus Summit, held on October 12, 2011, pp. 22-26.

[C] Kaur, E.R. (2015), "A Review of Computing Technologies: Distributed, Utility, Cluster, Grid and Cloud Computing", International Journal of Advanced Research in Computer Science and Software Engineering (IJARCSSE), ISSN: 2277-128X, Vol. 5, Issue 2, pp. 144-148.

[D] Kahanwal, B. and Singh, T.P. (2012), "The Distributed Computing Paradigms: P2P, Grid, Cluster, Cloud, and Jungle", International Journal of Latest Research in Science and Technology, ISSN: 2278-5299, Vol. 1, Issue 2, pp. 183-187.

[E] Ngxande, M. and Moorosi, N. (2014), “Development of Beowulf Cluster to Perform Large Datasets
Simulations in Educational Institutions”, International Journal of Computer Applications, Vol. 99 (15),
pp. 29-35.

[F] Adams, J. and Vos, D. (2002), “Small College Supercomputing: Building a Beowulf Cluster at a
Comprehensive College”, 33rd SIGCSE Technical Symposium on Computer Science Education,
Covington, KY, held on February 27 - March 03, 2002, pp. 411-415.

[G] Rajput, V. and Katiyar, A. (2013), "Proactive Bottleneck Performance Analysis in Parallel Computing Using OpenMP", International Journal of Advanced Studies in Computer Science and Engineering (IJASCSE), Vol. 2, Issue 5, pp. 46-53.

[H] Datti, A.A., Umar, H.A. and Galadanci, J. (2015), "A Beowulf Cluster for Teaching and Learning", 4th International Conference on Eco-friendly Computing and Communication Systems, Vol. 70, pp. 62-68.

[I] HaiTao, W. and ChunQin, C. (2009) “A High Performance Computing Method of Linux Cluster’s”,
Proceedings of the 2009 International Symposium on Information Processing (ISIP’09), Huangshan,
P. R. China, held on August 21-23, 2009, pp. 083-086.

[J] Brightwell, R., Riesen, R. and Underwood, K. (2003), “A Performance Comparison of Linux and a
Light weight Kernel”, Proceedings of the IEEE International Conference on Cluster Computing, Hong
Kong, China, held on December 1-4, 2003, pp. 251-258.


[K] Chowdhury, S.S., Jannat, M.-E. and Shoeb, A.A.Md. (2012), "Performance Analysis of MPI (mpi4py) on Diskless Cluster Environment in Ubuntu", International Journal of Computer Applications (IJCA), ISSN: 0975-8887, Vol. 60(14), pp. 40-46.

[L] Al-Khazraji, S.H.A.A., Al-Sa'ati, M.A.Y. and Abdullah, N.M. (2014), “Building High Performance
Computing Using Beowulf Linux Cluster”, International Journal of Computer Science and Information
Security (IJCSIS), ISSN: 1947-5500, Vol. 12(4), pp. 1 - 7.

[M] Rahman, A. (2015), “High Performance Computing Clusters Design and Analysis Using Red Hat
Enterprise Linux”, TELKOMNIKA Indonesian Journal of Electrical Engineering, ISSN: 2302-4046, Vol.
14(3), pp. 534-542.

[N] Ruan, X., Yang, Q., Alghamdi, M.I., Yin, S., Ding, Z., Xie, J., Lewis, J. and Qin, X. (2010), "ES-MPICH2: A Message Passing Interface with Enhanced Security", 2010 IEEE 29th International Conference on Performance Computing and Communications, Vol. 9(3), pp. 161-168.
