High Performance Computing

MPI and C-Language Seminars 2009

Photo Credit: NOAA (IBM Hardware)


High Performance Computing - Seminar Plan

• Seminar Plan for Weeks 1-5:

• Week 1 - Introduction, Data Types, Control Flow, Pointers

• Week 2 - Arrays, Structures, Enumerations, I/O, File I/O

• Week 3 - Dynamic Memory, Preprocessing, Compile-Time Options

• Week 4 - MPI in C, Using the Cluster and Compiling MPI Applications

• Week 5 - “How to Build a Performance Model”

• Weeks 6-9 - Coursework Troubleshooting (a number of seminar tutors will be available during these sessions - go to each tutor’s office)


High Performance Computing - Seminar Plan

• The Message Passing Interface (MPI) is a large and complex system.

• The core (and most useful) functions are pretty easy to grasp with a bit of experience.

• MPI will be covered in greater depth in the main module lectures.

• The purpose of this seminar is to show you how to compile MPI programs and how to submit/manage jobs running on the HPSG cluster.
Compiling MPI Applications

• MPI applications have their own set of compiler wrappers - these handle the include locations and libraries without any additional user interaction.

• These will also compile your code for running on Myrinet and under OpenPBS/Torque.

• Compilers for C, C++ and Fortran are installed (only C is needed for the coursework unless you choose to completely rewrite the application!).


Compiling MPI Applications written in C

• To compile an application written in C, replace gcc with mpicc.

• Example:

mpicc -o hw-mpi -O2 helloworld-mpi.c

• All of the include paths and libraries get compiled in by default.
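• The source of helloworld-mpi.c is not shown on the slide; a minimal sketch of what such an MPI “hello world” program might contain (the message text is illustrative):

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, size;

    MPI_Init(&argc, &argv);                 /* start the MPI runtime */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* this process's rank (id) */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* total number of processes */

    printf("Hello world from process %d of %d\n", rank, size);

    MPI_Finalize();                         /* shut down MPI cleanly */
    return 0;
}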

Compiling MPI Applications written in C

• Applications compiled with the MPI compilers cannot be run directly from the command line, because the OpenMPI runtime environment is not available there.

• You might get errors, or the code may only partially run.

• Either way, it is best to run your codes using the OpenPBS/Torque system described in the next section.


OpenMPI Installation Information

• If you want to know any OpenMPI installation information for your coursework then use the following command:

ompi_info

• The compiled-in modules and installation information will be shown.
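• ompi_info prints a lot of output; you can pipe it through grep to find specific entries - for example, to check which C compiler OpenMPI was built with (the exact label may vary between OpenMPI versions):

ompi_info | grep "C compiler"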
How to use the HPSG Cluster


Logging In

• Access to the server is by SSH only.

• Your account is of the form hpcXXXXXXX, where XXXXXXX is your student University number (e.g. hpc0234567).

• You can collect your password by coming to the HPSG lab (CS2.04) in week 5. We will make an announcement via the Teaching Announcements and in lectures.

• The server is located at: deepthought.dcs.warwick.ac.uk

• Use of the cluster is subject to University, departmental and ITS usage policies. Be careful what you execute; if you have questions, ask. You WILL lose your ITS and DCS accounts if you break the rules.
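• Putting these together, a typical login (using the example account number from above) would be:

ssh hpc0234567@deepthought.dcs.warwick.ac.uk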

High Performance Systems Group Cluster

• During the coursework you will have access to the HPSG IBM Cluster; this is not a high performance system, but it will be good for building performance models.

• 42 x dual-processor nodes (Pentium III, 1.4GHz), 2GB system RAM per node, Myrinet fibre-optic interconnect.

• Various head nodes and system management machines.

• The system uses an OpenPBS/Torque queue to manage and batch jobs.

• Note: Best performance is achieved when you densely pack your jobs (i.e. use both processors on the same node).


Checking the Current Runtime Queue

• To check the current execution queue:

qstat

• The current queue will be shown:

Job id           Name    User  Time Use  S  Queue
---------------  ------  ----  --------  -  -----
5277.frankie     BOINC   sdh   612:08:2  R  boinc
5278.frankie     BOINC   sdh   600:48:5  R  boinc
5279.frankie     BOINC   sdh   553:19:1  R  boinc
5280.frankie     BOINC   sdh   496:34:2  R  boinc
5281.frankie     BOINC   sdh   584:17:1  R  boinc
5282.frankie     BOINC   sdh   505:19:2  R  boinc
5283.frankie     BOINC   sdh   501:20:0  R  boinc
5284.frankie     BOINC   sdh   523:02:3  R  boinc
5285.frankie     BOINC   sdh   507:42:1  R  boinc
5286.frankie     BOINC   sdh   489:33:5  R  boinc
Job Status Information

• Jobs can be in one of many states:

• C - Job has completed after having run (in tidy-up and completed state)
• E - Job is exiting after having run (“ending”)
• H - Job is held by user or system
• Q - Job is queued, waiting for resources
• R - Job is running
• T - Job is being moved to a new queue/server
• W - Job is waiting for its execution time to be reached
• S - Job is suspended (not supported on our cluster)


Getting Something Run on the Cluster

• The cluster queuing system batches up jobs and runs them so that resource use is shared fairly between users.

• You can run jobs on the cluster in two ways:

• Via a submit script (a ‘batch’ job)
• Interactively

• Do not run jobs outside of the queue - this will give you incorrect results and cause unfair resource use. We will monitor this and disable accounts for users who do not use the queue correctly.

• Jobs are submitted using the qsub command.

Checking the Current Machine Status

• To check the current status of the machines in the cluster:

pbsnodes -a

• The status of each node will be shown:

vogon41.deepthought.hpsg.dcs.warwick.ac.uk
    state = free
    np = 2
    ntype = cluster
    status = opsys=linux,uname=Linux vogon41 2.6.24-vogon-stripped0 #5 SMP Thu Sep 18 10:19:48 BST 2008 i686,sessions=881 1061,nsessions=2,nusers=2,idletime=928290,totmem=4174136kb,availmem=4096888kb,physmem=2076496kb,ncpus=2,loadave=0.00,netload=696181053,state=free,jobs=,varattr=,rectime=1232878570


Interactive Jobs

• Interactive jobs allocate one (or more) nodes for you to use interactively - i.e. you can run commands on the node in a similar way to using a shell.

• Useful if you need to see the application executing (for debugging etc.).

• Can be used for X-windows jobs (if you really need to).

• Your job submission will block until a node becomes free for you to use.

• To request an interactive session on one node:

qsub -V -I
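• For example, to request an interactive session with both processors of one node (this uses the resource request string covered later in this seminar; the node/processor counts are illustrative):

qsub -V -I -l nodes=1:ppn=2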
Batch Jobs

• Batch jobs allow you to submit an application into the queue for execution and then leave it to be run (i.e. you don’t need to sit there typing commands in and watching the output).

• Very efficient if you want to submit lots of jobs.

• The scheduler will run jobs so that good resource usage is achieved.

• Requires you to write a submit script (to say what you want to be executed).

• Example submit command (single processor job):

qsub -V submit.pbs


Sample OpenPBS Submit File:

#!/bin/bash
#PBS -V

cd $PBS_O_WORKDIR

mpirun ./IMB-MPI1 -msglen imblengths

Some more options...

• -N <name> (sets the job name)

• -X (forwards X11 information for interactive jobs)

• -a <date> (marks the job as available to be run after <date>)

• -h (submits the job into a held state)
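• As an illustration, several of these options can be combined on one qsub command line (the job name and date here are made up):

qsub -N mybenchmark -a 200903011200 -V submit.pbs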

Output and Error Streams

• The output and error streams of your job will be written to a file after your job has completed.

• This may take a minute or two after completion - be patient!

• Default file names are <jobname>.o<jobid> and <jobname>.e<jobid>
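• For example, a job submitted as submit.pbs that was given job id 5290 (an invented id for illustration) would typically produce submit.pbs.o5290 and submit.pbs.e5290 in the directory you submitted from.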
Submitting Parallel Jobs

• Write a script as before.

• You need additional parameters to tell the scheduler how many processors to allocate to the job.

• The MPI runtime environment automatically knows how many processors are allocated by looking this up in the PBS shell variables (and TM interface).

• You cannot request more than 64 processors in the system (the queues will reject your jobs).
Submitting Parallel Jobs

qsub -l nodes=X:ppn=Y -V submit.pbs

• -l nodes=X:ppn=Y is the “resource request string”.

• nodes=X requests a number of machines.

• ppn=Y requests a number of processors per node (either 1 or 2).

• You can add the resource string into the submit file with the #PBS commands.

• You can add the string for interactive jobs (the commands only run on one node unless you use MPI or PBSDSH).
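• Putting this together, a concrete submission for 2 nodes x 2 processors per node = 4 processors (the counts are illustrative; the script is the sample from earlier):

qsub -l nodes=2:ppn=2 -V submit.pbs

• Or, equivalently, with the resource string inside the submit file itself:

#!/bin/bash
#PBS -V
#PBS -l nodes=2:ppn=2

cd $PBS_O_WORKDIR

mpirun ./IMB-MPI1 -msglen imblengths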
Removing Jobs

• To remove a job from the current execution queue:

qdel <job number>

• Get the job number from the queue (qstat):

Job id           Name    User  Time Use  S  Queue
---------------  ------  ----  --------  -  -----
5277.frankie     BOINC   sdh   616:39:0  R  boinc
5278.frankie     BOINC   sdh   601:53:2  R  boinc
5279.frankie     BOINC   sdh   553:51:2  R  boinc
5280.frankie     BOINC   sdh   497:06:4  R  boinc
5281.frankie     BOINC   sdh   585:21:4  R  boinc
5282.frankie     BOINC   sdh   506:23:5  R  boinc
5283.frankie     BOINC   sdh   501:52:1  R  boinc
5284.frankie     BOINC   sdh   523:34:4  R  boinc
5285.frankie     BOINC   sdh   507:42:1  R  boinc
5286.frankie     BOINC   sdh   489:33:5  R  boinc
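• For example, to remove the first job shown in the queue above:

qdel 5277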

Resource Constraints

• You can limit the resource requests for your job on time, memory, processors and user-specified attributes. For example:

-l mem=512Mb
-l walltime=10:00:00

• The queues running on the cluster automatically apply defaults to your jobs during submission.
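• These can be given on the qsub command line or as #PBS lines in your submit file; for example (illustrative values, combining both constraints):

qsub -l walltime=10:00:00 -l mem=512Mb -V submit.pbs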
Getting Extended Job Information

• To get extended information on jobs in the system:

qstat -a

• Extended details for each job will be shown:

Job ID         Username  Queue  Jobname  SessID  NDS  TSK  Memory  Time   S  Time
-------------  --------  -----  -------  ------  ---  ---  ------  -----  -  -----
5277.frankie   sdh       boinc  BOINC    12776   1    --   --      720:0  R  352:3
5278.frankie   sdh       boinc  BOINC    16561   1    --   --      720:0  R  352:4
5279.frankie   sdh       boinc  BOINC    15150   1    --   --      720:0  R  352:4
5280.frankie   sdh       boinc  BOINC    13542   1    --   --      720:0  R  352:3
5281.frankie   sdh       boinc  BOINC    10436   1    --   --      720:0  R  352:4
5282.frankie   sdh       boinc  BOINC    12805   1    --   --      720:0  R  352:4
Resources

• Condor/HPSG Resources:

http://www2.warwick.ac.uk/fac/sci/dcs/people/research/csrcbc/hpcsystems/dthought/

• High Performance Computing coursework + MPI resources (FAQs, Common Errors etc):

http://www2.warwick.ac.uk/fac/sci/dcs/people/research/csrcbc/teaching/hpcseminars/

Final Notes...

• This is the first year students are running jobs on a cluster using OpenPBS - there probably will be some bugs and faults in our queue definitions.

• Email us as soon as it goes wrong, we will try to fix it (crosses fingers)!

• Do not try to run jobs directly; if we find them we will lock the account. Always use the queue - this ensures fair execution and more reliable results (which are crucial to a good performance model).

• The job queue will get busy, but your jobs will get run (in the end). Please be responsible: only submit what you need, and delete jobs which you know will go wrong.


End of Seminar

Thanks for coming, next week - How to Build a Performance Model
