Tutorial about Cluster

By: Showayb A A Zahda
This tutorial will teach you the basics of clusters, how to build one, and how to monitor your jobs using the Sun Grid Engine.

What is a cluster?

A cluster can be defined as a group of computers running as one computer. There are several types of clusters. However, in this tutorial we will be talking about the High Performance Computing (HPC) cluster. A cluster is a replacement for a supercomputer, which has very high computing performance but also a very high cost. To make a cluster you need normal Personal Computers (PCs), a network card in each machine, and a switch to connect the machines. This greatly reduces the cost of high-performance computing.

The notion of a cluster is having two or more machines connected together by a network. A master machine, or front-end machine, is in charge of the cluster: the master node controls everything in the cluster. It has at least two network cards; one of them is connected to the outside world and the other is connected to the network of the cluster. The other nodes, which are called compute nodes, are connected to the cluster network as well. Compute nodes receive commands to execute from the master node. After the nodes finish executing the tasks, they return the output to the master node. The master node is responsible for distributing the tasks evenly among the nodes.

The interesting point about running a cluster is that it runs best under the Linux operating system and Linux tools, so the software is free.

How to build a cluster?

In this section, we will build a cluster from scratch. Before we build it, we need to know the architecture of our cluster. We will build it using three 64-bit computers. We also need a switch and network cables. What else? Of course we need an operating system to run the cluster, i.e. a Linux-based one. The following figure depicts the architecture of our cluster.

[Figure: cluster architecture, showing the master node connected to the outside world, compute node 1, and compute node 2]

The master node needs a keyboard, a mouse and a monitor, and it is strongly recommended that it has a CD-ROM drive. The compute nodes, however, do not need mice, keyboards or monitors, because we can control them from the master node (we will talk about that soon).

The Rocks operating system is made especially for clusters. Rocks is a Red Hat based operating system with a lot of interesting features that make your life easy, and it is very easy to install. To install Rocks, visit the official website of Rocks Cluster, http://www.rocksclusters.org/wordpress/, where you can download Rocks and find the installation handout.


Some handy information about Rocks

Rocks configures everything for you. From the master node (as the root user) you can control everything on the cluster. The following are some important and handy commands.

Note: Think of all the machines in your cluster as one machine, i.e. the master node. This assumption makes the cluster easier to handle.

Because the cluster exists to help people perform tasks quickly, each of those people needs a user account, which also keeps the system secure. To add a user to the system, use

# useradd name

This command creates a user that has some privileges on the system (to know more, read about managing users in Linux). Because we treat all the machines as one, we need to notify the rest of the machines about any change to the system, in this case the new user. To do so, just type on the shell

# rocks-user-sync

This command synchronizes the settings of all the machines in the cluster so that they behave as one machine.

Now assume you want to execute one command on all the machines simultaneously. It is impractical to go to each machine and execute the command there. Moreover, do not forget that the compute nodes have no mice, keyboards or monitors. So, how is that possible? There are two ways:

1- To execute one command on all the machines at the same time, for instance to reboot all the machines, use cluster-fork followed by the command you want to run. This sends the command to every machine in the cluster except the master node and asks them to execute it. The output is displayed on your shell or can be redirected to a file (if you do not know what output redirection is, google it).

# cluster-fork reboot : this command will reboot all the compute nodes.

# cluster-fork hostname : this command will ask all the compute nodes for their hostname. You can use it to check whether all the machines are on.

2- If you want to do some task on one particular machine only, you can use the ssh command in this format: ssh machine-name command

# ssh c0-0 reboot
# ssh c0-1 mkdir showayb
# ssh c0-0 ls -l

What are ssh, c0-0, c0-1, reboot, etc.?

ssh: the secure shell, which allows you to connect to other machines remotely. Read about ssh if you want to know more.
c0-0 or c0-1: the names of the compute nodes. Rocks assigns these names by default. So if you have 10 computers, computer number 7 is c0-7.
command: any Linux command like reboot, ls, mkdir, kill, etc.

So, ssh c0-0 acts like your shell on that node, and the command is an ordinary Linux command. You do not need to go to the machine and execute the commands there. Do it from the master node.

What comes after Rocks?

In fact, Rocks comes with a lot of software. One of them is the Sun Grid Engine (SGE) (read more at http://gridengine.sunsource.net), which is one of the main components of the cluster. SGE does a lot of things, like:
• Policy based allocation of distributed resources (CPU time, software licenses, etc.)
• Batch queuing & scheduling
• Support diverse server hardware, OS and architectures
• Load balancing & remote job execution
• Detailed job accounting statistics
• Fine-grained user specifiable resources
• Suspend/resume/migrate jobs
• Tools for reporting Job/Host/Cluster status
• Job Arrays
• Integration & control of parallel jobs

Source: http://wiki.gridengine.info/wiki/index.php/Main_Page

Okay, now it is time to run jobs on our cluster. Make sure that you have an account so that you can access the operating system, i.e. Rocks. Log in to your account, and from there you can execute the following commands based on your needs.

Before you submit a job!

Do you know what a job is? A job is basically a shell script (google shell script to learn how to write one). A shell script is a file that executes Linux commands. It can range from a very basic script to a very sophisticated one that needs a lot of care and effort to write. So, let’s see an example of a shell script that can be executed on our cluster. The script only executes the command date and shows us the output.

    #!/bin/bash
    #capture the date
    date
    #sleep for 20 seconds
    sleep 20
    #capture the date again
    date

Save the script in a file called date.sh. The extension of a shell script is sh. You can run this script on your machine using the command sh date.sh or ./date.sh. In the second case make sure that you have permission to execute the file (use ls -l to check and chmod to change it). Either way, the script runs only on the master machine. But if the script has 1,000 lines, runs some programmes and does a lot of computation, it will definitely take a lot of time. So, we need to send the job to the other machines (compute nodes) on our cluster to get the output quickly. How to do that?
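Before turning to the answer, here is a minimal sketch of the local run described above. It writes the example script into a temporary directory, makes it executable with chmod, and runs it as ./date.sh. The temporary directory and the sleep shortened to 1 second are assumptions made only so that the sketch is self-contained and finishes quickly.

```shell
#!/bin/bash
# Sketch only: the temporary directory and the shortened sleep are
# assumptions made for illustration, not part of the tutorial's setup.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Write the example script to date.sh
cat > date.sh <<'EOF'
#!/bin/bash
#capture the date
date
#sleep (shortened from 20 seconds to 1 second for this sketch)
sleep 1
#capture the date again
date
EOF

# Show the file's permissions before changing them
ls -l date.sh

# Give the owner execute permission, then run the script directly
chmod u+x date.sh
output=$(./date.sh)
echo "$output"
```

The script prints two dates roughly one second apart. Running sh date.sh instead would work even without the chmod step, because the file is then read by sh rather than executed directly.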


Sun Grid Engine is the solution. You first need to log in to the system, either remotely or locally. Only then can you submit a job (do not forget, a job is a shell script). To submit the file that contains the shell script, do the following.

# qsub filename.sh

qsub stands for queue submit, which means submit the job to the queue. filename.sh is your shell script file. After you submit the job, a message appears to indicate that it was submitted successfully. The message looks like “Your job 66 (filename.sh) has been submitted”. The number 66 is the id of your job and filename.sh is the name of the script you submitted.

After the job completes, two files are created, usually in your home directory: filename.sh.eX and filename.sh.oX. The names look weird, but they are easy once you know what they mean.

filename.sh.eX : filename.sh is the name of the shell script you submitted, e stands for error, X is the job id. Example: date.sh.e66
filename.sh.oX : filename.sh is the name of the shell script you submitted, o stands for output, X is the job id. Example: date.sh.o66
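The naming rule above can be sketched in a few lines of shell. The two functions below are hypothetical helpers, not SGE commands: one picks the job id out of the submission message (assuming the id is the third word, as in the message quoted above) and the other builds the names of the error and output files.

```shell
#!/bin/bash
# Sketch only: job_id_from_message and job_files are hypothetical helpers.

# Print the job id found in a submission message such as
# "Your job 66 (date.sh) has been submitted" (the id is the third word).
job_id_from_message() {
    printf '%s\n' "$1" | awk '{ print $3 }'
}

# Print the error file name and the output file name
# for a given script name and job id.
job_files() {
    script_name=$1
    job_id=$2
    echo "${script_name}.e${job_id}"
    echo "${script_name}.o${job_id}"
}

# Use the example message from the text instead of calling qsub,
# so the sketch also runs on a machine without SGE.
message='Your job 66 (date.sh) has been submitted'
id=$(job_id_from_message "$message")
job_files date.sh "$id"
```

With the example message this prints date.sh.e66 and date.sh.o66. On a real cluster the message could be captured from the submission itself, for example with message=$(qsub date.sh).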

After you submit the job you will want to see what happens to it.

# qstat -f

qstat stands for queue status. -f is an argument that requests a full-format display of information. The output of this command looks like a table. It has the id of the job, the name of the job, the user who submitted the job, the state of the job, the queue and some more.


The states of the jobs are represented by letters, mostly their initials.
• r: running
• qw: queued waiting
• s: suspended
• E: Error
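When reading the state column it can help to translate the letters back into words. The function below is a hypothetical helper that covers only the four states listed above; real output can contain other codes or combinations (such as Eqw), which fall into the "unknown" branch here.

```shell
#!/bin/bash
# Sketch only: describe_state is a hypothetical helper, not an SGE command.

# Translate one of the state codes listed above into words.
describe_state() {
    case "$1" in
        r)  echo "running" ;;
        qw) echo "queued waiting" ;;
        s)  echo "suspended" ;;
        E)  echo "Error" ;;
        *)  echo "unknown state: $1" ;;
    esac
}

# Print the meaning of each state from the list
for state in r qw s E; do
    echo "$state: $(describe_state "$state")"
done
```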

To delete a job from the queue

# qdel jobid

qdel stands for queue delete, and the jobid is numeric. Make sure you are the owner of the job, otherwise you will not be able to delete it, simply because it is not yours. Only the root user can delete other people’s jobs and do whatever he or she wants with them.
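Since the job id is numeric, a script that deletes jobs can check its argument before calling qdel. The sketch below is one possible way to do that; is_numeric_job_id and delete_job are hypothetical helper names, and delete_job only calls qdel when the command is actually installed.

```shell
#!/bin/bash
# Sketch only: both functions are hypothetical helpers.

# Succeed only if the argument is a non-empty string of digits.
is_numeric_job_id() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;
        *)           return 0 ;;
    esac
}

# Refuse anything that is not a numeric job id; otherwise call qdel
# if it is available, or just report what would be run.
delete_job() {
    if ! is_numeric_job_id "$1"; then
        echo "not a numeric job id: $1" >&2
        return 1
    fi
    if command -v qdel >/dev/null 2>&1; then
        qdel "$1"
    else
        echo "qdel not found; would run: qdel $1"
    fi
}

# Demonstrate the check only; nothing is deleted here.
is_numeric_job_id 66 && echo "66 looks like a job id"
is_numeric_job_id date.sh || echo "date.sh is not a job id"
```

To delete one of your own jobs with this sketch you would call delete_job with the id reported at submission time, for example delete_job 66.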