
High Performance Computing Cluster

Quick Reference User Guide
Base Operating System:
RedHat(TM) / Scientific Linux 5.5 with
Alces HPC Software Stack
TABLE OF CONTENTS

0.1.  LICENSING INFORMATION
1.    INTRODUCTION
1.1.  SYSTEM INFORMATION
1.2.  PROTECTING YOUR DATA AND INFORMATION
1.3.  USING THIS GUIDE
2.    CLUSTER ARCHITECTURE
2.1.  LINUX OPERATING SYSTEM
2.2.  BEOWULF CLUSTER ARCHITECTURE
2.3.  CLUSTER SERVERS
2.4.  SHARED DATA STORAGE CONFIGURATION
3.    ACCESSING THE COMPUTE CLUSTER
3.1.  LOGGING IN TO THE CLUSTER VIA THE COMMAND LINE
3.2.  GRAPHICAL LOGIN ENVIRONMENT
      a) USING REMOTE DESKTOP CONNECTIONS
      b) NX CLIENT CONNECTIONS
3.3.  CHANGING YOUR PASSWORD
      a) CHANGING PASSWORDS WITH NIS
      b) CHANGING PASSWORDS WITH LDAP
4.    CONFIGURING YOUR USER ENVIRONMENT
4.1.  LINUX SHELL CUSTOMISATION
4.2.  MODULES ENVIRONMENT SWITCHER
      a) DYNAMIC MODULE LOADING
      b) SETTING UP YOUR DEFAULT ENVIRONMENT
      c) LOADING MODULES FROM SCHEDULER JOBSCRIPTS
      d) MANUALLY CONFIGURING YOUR USER ENVIRONMENT
5.    USING COMPILERS AND LIBRARIES
5.1.  AVAILABLE COMPILERS FOR SERIAL/BATCH JOBS
5.2.  AVAILABLE COMPILERS FOR PARALLEL JOBS
5.3.  ACCELERATED HPC LIBRARIES
5.4.  COMPILER ENVIRONMENT VARIABLES FOR LIBRARIES
6.    MESSAGE PASSING ENVIRONMENTS
6.1.  MESSAGE PASSING INTRODUCTION
6.2.  MPI INTERCONNECTS
      a) ETHERNET NETWORKS
      b) INFINIBAND FABRICS
6.3.  SELECTING AN MPI IMPLEMENTATION
6.4.  EXECUTING A PARALLEL APPLICATION
7.    CLUSTER SCHEDULERS
7.1.  CLUSTER SCHEDULER INTRODUCTION
7.2.  TYPES OF JOB
8.    GRID-ENGINE CLUSTER SCHEDULER
8.1.  USING GRID-ENGINE 6 CLUSTER SCHEDULER
8.2.  JOB SCRIPTS AND QSUB DIRECTIVES
8.3.  GRID-ENGINE PSEUDO ENVIRONMENT VARIABLES
8.4.  JOB OUTPUT FILES
8.5.  SUBMITTING NON-INTERACTIVE JOBS VIA QSUB
8.6.  VIEWING THE STATUS OF A SUBMITTED JOB
8.7.  SUBMITTING A PARALLEL JOB
8.8.  SUBMITTING INTERACTIVE JOBS
8.9.  SUBMITTING AN ARRAY OF JOBS
8.10. JOB DEPENDENCIES
8.11. DELETING A SUBMITTED JOB
9.    TORQUE / OPENPBS CLUSTER SCHEDULER
9.1.  OVERVIEW OF TORQUE / OPENPBS
9.2.  JOB SCRIPTS AND TORQUE DIRECTIVES
9.3.  TORQUE PSEUDO ENVIRONMENT VARIABLES
9.4.  JOB OUTPUT FILES
9.5.  SUBMITTING NON-INTERACTIVE JOBS VIA QSUB
9.6.  VIEWING THE STATUS OF A SUBMITTED JOB
9.7.  SUBMITTING A PARALLEL JOB
9.8.  EXECUTING AN INTERACTIVE JOB
9.9.  SUBMITTING AN ARRAY OF JOBS
9.10. REQUESTING EXCLUSIVE USE OF A COMPUTE NODE
9.11. DELETING A SUBMITTED JOB
10.   LUSTRE PARALLEL FILESYSTEM
10.1. LUSTRE BACKGROUND
10.2. QUERYING FILESYSTEM SPACE
APPENDIX A: EXAMPLE PARALLEL JOB EXECUTION
APPENDIX B: EXAMPLE OPENMP JOB COMPILATION
0.1. LICENSING INFORMATION
This work is © Copyright 2007-2011 Alces Software Ltd, All Rights Reserved.
Unauthorized re-distribution is prohibited.
The Alces Software HPC Cluster Toolkit is free software: you can redistribute it and/or modify it under the terms of the GNU Affero General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version. These packages are distributed in the hope that they will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Affero General Public License for more details (http://www.gnu.org/licenses/).
A copy of the GNU Affero General Public License is distributed along with this product.
For more information on Alces Software, please visit: http://www.alces-software.com/.
"# $!TRO%UCT$O!
2his Fuick reference %uide is intended to #rovide a >ase reference #oint for users of an )lces Soft*are
confi%ured RedHat, Centos or Scientific Linux 5 cluster installation! -t sho*s >asic exaD#les of coD#ilin%
and runnin% coD#ute Go>s across the cluster and #rovides inforDation a>out *here $ou can %o for Dore
detail!
2he #riDar$ audience for this %uide is an HPC cluster user! )ll soft*are is confi%ured to >e as close to the
reFuireDents of the ori%inal coDDunit$ su##orted distri>ution as is #ossi>leE Dore detailed docuDentation
on s#ecific #acka%es are siD#le to find >$ follo*in% the links #rovided in this docuDent!
1.1. SYSTEM INFORMATION
In order to access the cluster, your system administrator will provide you with the following:
• The IP address and network information for the cluster
• A username and password to allow you to log in to the cluster
The software installed on your server nodes is designed to accept and run jobs from particular systems – these systems are called the login nodes. In a cluster configuration, this may be the headnode system(s), or dedicated login nodes may be provided. It is not normally necessary to log in directly to compute or worker nodes, and your system administrator may prohibit direct access to non-interactive systems as part of your site information security policy.
1.2. PROTECTING YOUR DATA AND INFORMATION
The software installed on your HPC cluster is designed to assist multiple users to efficiently share a large pool of compute servers, ensuring that resources are fairly available to everyone within certain configured parameters. Security between users and groups is strictly maintained, providing you with mechanisms to control which data is shared with collaborators or kept private.
When using the HPC cluster, you have a responsibility to adhere to the information security policy for your site, which outlines acceptable behaviour and provides guidelines designed to permit maximum flexibility for users while maintaining a good level of service for all. Your system administrator can advise you of any specific rules covering the HPC system, but users are encouraged to:
• Change your system password as regularly as possible
• Only access machines that you have a valid requirement to use
• Adhere to your site information security policy
• Make backup copies of important or valuable data
• Remove temporary data after use
1.3. USING THIS GUIDE
This quick-start user manual is intended to provide users with basic information about their HPC cluster. Your local cluster may be configured differently by your system administrator to suit the available hardware – if in doubt, contact your system administrator for further assistance.
2. CLUSTER ARCHITECTURE
2.1. LINUX OPERATING SYSTEM
Your HPC cluster is installed with a 64-bit Linux operating system. CentOS and Scientific Linux (SL) are freely available Linux distributions based on RedHat Enterprise Linux. Both projects attempt to maintain 100% compatibility with the corresponding Enterprise distribution by recompiling packages and updates from source as they are released by RedHat. These open-source distributions are supported by a thriving community of users and developers, with many support sites, wikis and forums to assist adopters with their configuration.
Redhat distribution version numbers have two parts – a major number corresponding to the Enterprise Linux version and a minor number which corresponds to the Enterprise update set. Redhat 4.6 therefore corresponds to RedHat Enterprise Linux 4 update 6. Since 64-bit extensions were first integrated into x86-compatible processors in 2003, 64-bit capable operating systems have allowed users to take advantage of both 32-bit and 64-bit computing. Each system in your Linux cluster is installed with either Redhat, CentOS or Scientific Linux 5 64-bit edition.
2.2. BEOWULF CLUSTER ARCHITECTURE
The purpose of a Beowulf or loosely-coupled Linux cluster is to provide a large pool of compute resource which can be easily managed from a single administration point. Typically, worker nodes are identically configured, with other special-purpose servers providing cluster services including storage, management and remote user access. Each node in a Linux cluster is installed with an operating system – often, personality-less system configurations are adopted on compute nodes to help reduce administration overhead. Typically, compute nodes have a minimal operating system installation to minimize unwanted services and help reduce jitter across the cluster. The configuration of the node operating system is controlled by the cluster provisioning software.
The operating systems on special-purpose nodes are configured specifically for their purpose in the cluster. Machines are categorized by the roles they perform, for example:
– Head node system: provides cluster management and scheduler services
– Interactive system: provides a software development environment
– Compute node system: used solely for executing compute jobs
– Storage system: provides data services to other nodes in the cluster
– Gateway system: provides a secure pathway into the cluster
Depending on your cluster size and configuration, some physical servers may perform several different roles simultaneously; for example, the headnode of a small cluster may provide cluster management, scheduler services, an interactive development environment and storage services to the compute nodes in your cluster. All system administration is usually performed from the cluster headnode or administration machine; the goal of the installed cluster management utilities is to reduce the requirement to log in to each compute node in turn when configuring and managing the system.
Non-administrative cluster users typically access the cluster by submitting jobs to the scheduler system; this may be performed remotely (by using a scheduler submission client installed on their local workstation) or by submitting jobs from the cluster login or interactive nodes. It is not normally necessary for users to log in directly to the compute nodes – many cluster administrators disallow this to encourage the fair sharing of resources via the cluster scheduler system.
Your cluster may be configured with multiple different networks dedicated to different purposes. These may include:
• Interactive user data and job scheduler information
• Operating system provisioning
• Lights-out-management, IPMI or SNMP traffic
• Storage networking
• Message-passing interconnect traffic
• Graphics rendering / visualization interconnect
Depending on the size of the cluster, some of these functions may be shared on a single network, or multiple networks may be provided for the same function. When using your cluster for the first time, you should consider which network to use for message-passing functions, as these operations are sensitive to message latency and network bandwidth.
2.3. CLUSTER SERVERS
Your cluster is configured with a number of different server systems that perform different functions in the cluster. Typical systems may include:
• Cluster login nodes; e.g.
  ◦ login1.university.ac.uk, login2.university.ac.uk
• Compute node servers; e.g.
  ◦ comp00-comp23 – standard compute nodes
  ◦ bigmem00-bigmem03 – high memory compute nodes
• Cluster service nodes; e.g.
  ◦ headnode1 – primary headnode system
  ◦ headnode2 – secondary headnode system
  ◦ mds1 – Lustre filesystem primary metadata server
  ◦ nfs1 – Network file system primary server
  ◦ oss1, oss2 – Lustre object storage servers
As a cluster user, you will mostly be running jobs on compute node servers via the cluster job scheduler (often also called cluster load-balancing software or the batch system). Each compute node is equipped with high performance multi-core processors and tuned memory systems, and often has a small amount of local temporary scratch space assigned on its hard disks. In contrast, cluster service nodes often feature specialised hardware and slower, low-power processors to enable them to perform their intended purpose efficiently. Your compute jobs will execute fastest on compute nodes and should not be started on cluster service nodes unless directed by your local administrator.
Each compute node is installed with a minimal operating system that allows it to boot and function as a Linux host to run compute jobs. All application software, compilers, libraries and user data are held centrally on your cluster service nodes, with very few software packages installed on individual compute nodes. This helps to ensure that all nodes are identically configured, allowing jobs submitted to the cluster scheduler to run equally well whichever node they are executed on. The operating system and system software installed on compute nodes is controlled by your system administrator from a central location; individual nodes may be automatically re-installed between successive job executions. Users should only store temporary data on compute node hard disks as scratch areas may be cleaned between job executions.
2.4. SHARED DATA STORAGE CONFIGURATION
Your cluster system may be configured with a number of different file storage areas mounted on different servers for administrative and data sharing purposes. These areas are designed to provide user data, system libraries and configuration information to the nodes from a single, central location. By using shared storage, the management overhead of a large cluster can be significantly reduced as there is no need to copy data to every node in the cluster for running jobs. By default, the following shared storage areas are typically configured on the cluster:

Storage mount point on nodes   File server                        Purpose
/users                         headnode or storage servers        Shared storage area for users
/opt/gridware                  headnode or storage servers        Optional software including the
                                                                  cluster scheduler
/opt/alces                     headnode or configuration server   Cluster configuration information
/scratch or /tmp               local compute node disk or         High-speed transient data storage
                               storage servers

Where data storage areas are shared via NFS from the central fileserver, compute nodes are configured with both a unique scratch storage area (mounted under /tmp or /scratch) and a shared storage area (mounted under /users). Any applications or services installed on compute nodes that use the /tmp area to store temporary data can make use of individual compute node disks without affecting the performance or capacity of the centrally stored data. Users should be aware that data stored on node hard disks may be removed after their job has finished executing – always copy data to be retained to your home directory as part of your jobscript, as sketched below.
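For example, a jobscript that works in node-local scratch space might end by copying its results back to shared storage. The snippet below is a minimal sketch only – the directory layout follows the defaults above, but mount points and retention policies vary between sites:

#!/bin/bash
# Sketch: run in local scratch, then save results to shared storage.
# Paths are illustrative - check the mount points used on your cluster.
SCRATCH=/tmp/$USER/myjob        # fast, node-local scratch (may be cleaned)
RESULTS=/users/$USER/results    # shared storage, retained after the job

mkdir -p "$SCRATCH" "$RESULTS"
cd "$SCRATCH"

# ... run your application here, writing output into $SCRATCH ...

# Copy anything to be retained back to shared storage before the job ends
cp -r "$SCRATCH"/* "$RESULTS"/
rm -rf "$SCRATCH"               # tidy the node-local disk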
3. ACCESSING THE COMPUTE CLUSTER
Cluster users typically work from the login nodes of an HPC cluster, using them to submit jobs, compile code and monitor the status of running jobs. Larger clusters may have one or more dedicated login nodes, whereas smaller machines may have just one login node hosted by the headnode system. Your system administrator will provide you with the host address of the cluster login node(s) as well as your username and password to access the cluster.
3.1. LOGGING IN TO THE CLUSTER VIA THE COMMAND LINE
Users can log in to a cluster login node using secure-shell (SSH) from their desktop system or another machine on the network. To log in from a UNIX, Linux or MacOS X system, use the ssh command:

[username@workstation ~]$ ssh username@login1.mycluster.local
Password: ********
[username@login1 ~]$

To log in from a Microsoft Windows workstation, users must download and run an SSH client application; the open-source PuTTY package is a popular choice.
3.2. GRAPHICAL LOGIN ENVIRONMENT
If your cluster login node(s) have been configured with a graphical login service, authorized users can obtain an interactive desktop session from a network-connected workstation or laptop. Your system administrator will let you know which services are configured and if your user account is enabled for graphical access.
a) USING REMOTE DESKTOP CONNECTIONS
When configured, the RDP service provides a quick and easy method to access your cluster system and run graphical applications using a Remote Desktop client. You can connect to the cluster headnode from any Linux, Solaris, Mac OSX or Microsoft Windows client with a suitable client application installed. By default, the RDP application runs on port 3389 – the default for Windows Terminal Services or Remote Desktop clients.
For Linux or MacOS X clients, use the rdesktop client to connect to the headnode machine:

[username@workstation ~]# rdesktop headnode:3389

For Microsoft Windows clients, start the Remote Desktop Connection application, enter the hostname or IP address of your cluster login node system and press the connect button.
b) NX CLIENT CONNECTIONS
The NX series of display libraries provides a convenient, high-performance method of running a graphical desktop from the cluster master or login node on your local client system. Open-source client binaries are provided by the NoMachine project for the majority of client desktops including Microsoft Windows, Mac OSX, Solaris and Linux. Visit the following URL to download a client package suitable for your client system:

http://www.nomachine.com/download.php

After installation, start the NX Connection Wizard to configure the settings for your HPC cluster; enter a recognisable name to describe the connection and the hostname or IP address details of your cluster login or master node machine. Set the slider to match the network connection speed between your client machine and the HPC cluster.
Click the Next button and select a "UNIX GNOME" desktop – the standard for Linux RedHat, CentOS and Scientific Linux systems. Configure the size of your desktop by selecting one of the pre-defined options, or select custom and enter the desired width and height in pixels.
Click the Next button and click on the "Show Advanced Settings" check-box – click Next again.
The advanced settings dialogue box allows users to enter the connection key used to connect to the NX server running on your HPC cluster. This key is specific to your cluster and ensures that only clients that identify themselves with the correct key are allowed to connect. Your system administrator will provide you with a copy of the keyfile, stored on your HPC cluster login or master node(s).
Click on the Key button on the configuration dialogue and enter the key provided by your system administrator. Press the Save button on the key entry box, then press Save again on the main configuration box to save your changes.
Enter your cluster username and password in the dialogue boxes provided and press the Login button. Connection messages will be displayed while your client negotiates with the NX server before your remote desktop session is displayed.
Contact your system administrator for assistance if your remote desktop session is not started as expected; they may ask for the diagnostic messages displayed during negotiation to assist in troubleshooting the connection problem.
3.3. CHANGING YOUR PASSWORD
Your system administrator will have provided you with a login password or SSH-key access to the compute cluster, allowing you to log in and use the attached resources. We recommend that you change your system password as often as possible and adhere to the recommendations explained in your site information security policy.
a) CHANGING PASSWORDS WITH NIS
If your cluster uses Network Information Services (NIS), you can change your user password using the yppasswd command on the command line when logged in to the cluster.

[user@login01 ~]$ yppasswd
Changing NIS account information for user on headnode1.cluster.local.
Please enter old password: ********
Changing NIS password for user on headnode1.cluster.local.
Please enter new password: ********
Please retype new password: ********
The NIS password has been changed on headnode1.cluster.local.

Your user password can be reset by your system administrator if you have forgotten what it is – a temporary password may be issued to you to allow you to log in; use the yppasswd command to change the temporary password to one that you can remember.
b) CHANGING PASSWORDS WITH LDAP
For clusters that are configured to use site-wide LDAP services, access to login machines will be authorized using the standard LDAP password used on other site machines. Contact your system administrator for assistance changing your LDAP password for your site.
4. CONFIGURING YOUR USER ENVIRONMENT
4.1. LINUX SHELL CUSTOMISATION
As a cluster user, you may need access to specific applications, compilers, libraries and utilities in order to run your compute jobs on the machine. When logging in, the default command-line access method is via the bash shell interpreter. Bash can be configured to provide convenient access to the different software you need and to help simplify usage of the cluster.
In your home directory, a file called .bashrc contains a list of customisations which are executed by the bash shell every time you log in. As your home directory is shared across all machines in the compute cluster, the .bashrc file is also executed automatically when your compute jobs are executed by cluster compute nodes.
The .bashrc file may be customised by users by simply editing it in a text editor (e.g. nano, emacs or vim). After changing the file, users must log out and log in again, causing the bash shell to re-read the file and apply any changes made; a short example is sketched below.
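As a minimal sketch, a .bashrc might add a personal bin directory to the search path and define a convenience alias – the entries shown here are illustrative, not defaults:

# ~/.bashrc - read by bash at every login, and by jobs run on compute nodes
# Example customisations only; adjust to suit your own workflow.
export PATH=$HOME/bin:$PATH       # prefer tools installed in ~/bin
export EDITOR=vim                 # default editor for interactive tools
alias jobdir='cd $HOME/jobs'      # convenience alias for a working directory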
When running applications or developing and compiling code to run on an HPC cluster, there are a number of different compiler and library options which users may need to build or link against. Many different packages have similar names for commands and it can quickly become difficult to instruct the system exactly which tools and utilities should be used. Linux shell interpreters (e.g. bash, csh, tcsh) use a series of environment variables to provide convenient short-cuts for running binaries and locating system libraries. Common variables include:
Environment variable   Usage
PATH                   Stores the search path for executable tools and utilities
LD_LIBRARY_PATH        Stores the search path for library files that may be used by
                       executed applications
MANPATH                Stores the search path for user manual pages describing
                       different utilities and tools
USER                   Stores your current user name
PWD                    Stores your current working directory
4.2. MODULES ENVIRONMENT SWITCHER
The module environment switcher provides a convenient method of setting your environment variables for the different libraries installed on your cluster. When invoked, the utility modifies the running session, setting the PATH, LD_LIBRARY_PATH and MANPATH variables as requested.
a) DYNAMIC MODULE LOADING
Modules are installed in a shared area (e.g. /opt/gridware/modules) to make them available across the cluster on all login and compute nodes. All users logged into any cluster node can use module commands with identical syntax.
To view the different module environments available on your cluster, use the "module avail" command:
[user@login01 ~]# module avail
----------------------- /usr/share/Modules/modulefiles -----------------------
dot           mpi/openmpi-1.2.6_gcc     null
module-cvs    mpi/openmpi-1.2.6_intel   switcher/1.0.1(default)
module-info   mpi/openmpi-1.2.6_pgi
modules       mpi/openmpi-1.3.2_gcc
------------------------- /opt/gridware/modules ------------------------------
apps/gcc/beast    apps/gcc/emboss     apps/gcc/pftdr
apps/gcc/blast    apps/gcc/hmmer      apps/gcc/spider
apps/gcc/bsoft    apps/gcc/imod       apps/gcc/velvet
apps/gcc/bsoft2   apps/gcc/mafft      libs/perl-cpan
apps/gcc/eman     apps/gcc/mrbayes    mpi/gcc/openmpi/1.4.1
apps/gcc/eman2    apps/gcc/paml44
[user@login01 ~]#
• Use the "module load" command to enable a new environment from the list displayed. The load command may be used multiple times to include settings from multiple different profiles.
• Use the "module list" command to list currently loaded modules.
• Use the "module unload" command to unload an already loaded module.
• Use the "module purge" command to clear all loaded modules.
The "module display" and "module help" commands can help users to determine which environment variables are managed by different modules:
[user@login01 ~]# module help libs/gcc/gsl/1.14
----------- Module Specific Help for 'libs/gcc/gsl/1.14' ----------
Adds `gsl-1.14' to your environment variables
##############
ENV after load
##############
GSLDIR     -> base path of library
GSLLIB     -> lib path
GSLBIN     -> bin path
GSLINCLUDE -> include path
Adds GSLLIB to LD_LIBRARY_PATH
Adds GSLBIN to PATH
Adds gsl ManPages to MANPATH
Adds GSLINCLUDE to C_INCLUDE_PATH and CPLUS_INCLUDE_PATH
Adds necessary flags to build/link against the library to CFLAGS and LDFLAGS
[user@login01 ~]#
After loading an application module, the base location of the application is set in an environment variable to provide quick access to any application help or example files bundled with the application. For example:

[user@login01 ~]# module load apps/mrbayes
[user@login01 ~]# echo $mrbayes
/opt/gridware/apps/mrbayes/3.1.2/
[user@login01 ~]# cd $mrbayes
[user@login01 3.1.2]# ls
AUTHORS   COPYING  doc  examples  MANUAL   scripts
TUTORIAL  bin      lib  lib64     VERSION
[user@login01 3.1.2]#
b) SETTING UP YOUR DEFAULT ENVIRONMENT
Users can request that certain modules are loaded every time they log into the cluster by using the module initadd and module initrm commands. These commands manipulate a file held in the user's home directory called '.modules', causing the system to pre-load the desired modules as the user logs in. This is particularly useful for users who run a fixed set of applications and require the same environment to be configured each time they log into the cluster.
• Use the "module initadd" command to cause a module to be loaded on login.
• Use the "module initprepend" command to cause a module to be loaded on login, before any other modules that are already being loaded on login.
• Use the "module initlist" command to list the modules that will be loaded on login.
• Use the "module initrm" command to remove a module listed to load on login.
• Use the "module initclear" command to clear all modules from being loaded on login.
Your system administrator can also configure the system to automatically load one or more default modules for users logging in to the cluster. Use the module list command to determine which modules are loaded when you log in. A short example session is sketched below.
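For example, a user who always works with OpenMPI might register it once and verify the setting; the module name is taken from the listing above and may differ on your cluster:

# load OpenMPI automatically at every future login
module initadd mpi/gcc/openmpi/1.4.1
# list the modules currently registered to load at login
module initlist
# stop loading the module at login again
module initrm mpi/gcc/openmpi/1.4.1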
c) LOADING MODULES FROM SCHEDULER JOBSCRIPTS
When submitting jobs through the scheduler system, users may wish to use module commands to configure their environment variable settings before running jobs. Three methods can be used to set up environment variables in job submission scripts:
• Permanently add the required modules to be loaded for your login environment
• Load modules before submitting jobs and use the -V scheduler directive to export these variables to the running job
• Use a "module load" command in your jobscript to load additional modules
Users incorporating a "module load" command in their job script should remember to source the modules profile environment (e.g. /etc/profile.d/modules.sh) in their jobscripts before loading modules to allow the environment to be properly configured; e.g.
#!/bin/bash
# This is an example job script
# Scheduler directives are shown below
#$ -j y -o ~/myoutputfile
# Configure modules
. /etc/profile.d/modules.sh
# Load modules
module load mpi/gcc/openmpi/1.4.2
echo "Job starting at `date`"
~/test/myapplication -i ~/data/inputfile4
d) MANUALLY CONFIGURING YOUR USER ENVIRONMENT
The modules environment switcher is installed for the convenience of users to assist in managing multiple different compilers, libraries and software development environments. If you prefer, your user environment can be set up manually, bypassing the modules system for ultimate flexibility when configuring how you use the cluster system. This can be achieved by manually setting your environment variables, specifying the individual libraries to build against when compiling code, or storing your desired libraries in your home directories and addressing them directly. Although these methods are quite valid for advanced users, please be aware that bypassing the global modules configuration may reduce the ability of your system administrator to assist you if your applications do not later execute as expected.
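As an illustrative sketch only, the manual equivalent of loading a library module is to export the same variables yourself, for example from your .bashrc; the installation path below follows the gsl example earlier and should be substituted with your own library's location:

# Manual environment setup - bypasses the modules system
LIBDIR=/opt/gridware/libs/gcc/gsl/1.14     # example library location
export PATH=$LIBDIR/bin:$PATH
export LD_LIBRARY_PATH=$LIBDIR/lib:$LD_LIBRARY_PATH
export MANPATH=$LIBDIR/man:$MANPATH
export CFLAGS="-I$LIBDIR/include $CFLAGS"
export LDFLAGS="-L$LIBDIR/lib $LDFLAGS"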
5. USING COMPILERS AND LIBRARIES
5.1. AVAILABLE COMPILERS FOR SERIAL/BATCH JOBS
As well as pre-compiled binary applications, many HPC users also write their own software to solve complex mathematical and scientific problems. Users are encouraged to compile new applications on the login node systems, testing them for correct behaviour before submitting them as jobs to run across the compute cluster. Your compute cluster may be configured with a dedicated pool of resources for development and testing which is distinct from the main cluster.
A number of different open-source and commercial Linux compilers are available, providing different optimisations, tuning options and feature sets. The default open-source GNU compiler suite is provided as part of your Linux distribution and is available by default on all compute clusters. The following table summarises the compiler commands available on HPC clusters:
Package         License type   C        C++      Fortran77   Fortran90   Fortran95
GNU             Open-source    gcc      g++      gfortran    gfortran    gfortran
Open64          Open-source    opencc   openCC   n/a         openf90     openf95
Intel           Commercial     icc      icc      ifort       ifort       ifort
Portland Group  Commercial     pgcc     pgCC     pgf77       pgf90       pgf95
Pathscale       Commercial     pathcc   pathCC   n/a         pathf90     pathf95
Commercial compiler software requires a valid license for your site before users can compile and run applications. If your cluster is properly licensed, environment modules are provided to enable you to use the compiler commands listed above. If your cluster is not licensed for commercial compilers, or you have not loaded a compiler environment module, the open-source GNU compilers appropriate to your Linux distribution will be available for you to use.
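For example, a small serial program can be built and test-run on a login node with the default GNU compiler before being submitted to the cluster; the file names here are illustrative:

# compile a serial C program with optimisation and run a quick local test
gcc -O2 -o myapp myapp.c
./myapp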
5.2. AVAILABLE COMPILERS FOR PARALLEL JOBS
When compiling source code for parallel jobs, users may prefer to use a compiler which is capable of creating binaries compatible with a message-passing interface (MPI). The following compiler commands are automatically available after loading an MPI environment module (e.g. OpenMPI, MVAPICH):

Language   C       C++     Fortran77   Fortran90   Fortran95
Command    mpicc   mpiCC   mpif77      mpif90      mpif90
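The wrapper commands automatically add the include paths and link flags for the currently loaded MPI, so a parallel build is usually a one-line change from its serial equivalent; a sketch with illustrative file names:

# load an MPI module, then build with the wrapper compilers
module load mpi/gcc/openmpi/1.4.1
mpicc -O2 -o my_mpi_app my_mpi_app.c
mpif90 -O2 -o my_mpi_app_f90 my_mpi_app.f90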
5.3. ACCELERATED HPC LIBRARIES
To assist developers of high performance applications and utilities, a number of optimised libraries are available to provide commonly used functions including Fourier transforms, linear algebra, sparse solvers, vector mathematics and other routines. Often these libraries are tuned to take advantage of the latest CPU instruction sets to accelerate their performance on modern computer systems – a well tuned library set can improve the performance of applications run across the cluster by up to 30%.
Your HPC cluster has been installed with a set of high performance libraries that are appropriate for the hardware and software components that make up the system. Each library is installed in a shared location available to all compute nodes – users may add libraries into their environment by loading the module for the library set they wish to use. See section 4.2 for more information about loading environment modules. The following library packages are commonly available on HPC compute clusters:
Library name   License type  Library type                                             Library location variable
ATLAS          Open-source   Auto-tuned linear algebra software                       $ATLASLIB
BLAS           Open-source   Basic linear algebra sub-programs (F77 implementation)   $BLASLIB
CBLAS          Open-source   Basic linear algebra sub-programs (C implementation)     $CBLASLIB
FFTW2-double   Open-source   Discrete Fourier transform (double-precision version 2)  $FFTWDIR
FFTW2-float    Open-source   Discrete Fourier transform (single-precision version 2)  $FFTWDIR
FFTW3          Open-source   Discrete Fourier transform (latest stable version 3 for  $FFTWDIR
                             single/float, double and long-double precision)
LAPACK         Open-source   Linear algebra package                                   $LAPACKDIR
Intel MKL      Commercial    Intel Math Kernel Library suite                          $MKL
ACML           Commercial    AMD Core Math Library suite                              $ACML
Commercial libraries require a valid license for your site before users can compile and run applications using them. If your cluster is properly licensed, environment modules are provided to enable you to use the libraries listed above. If your cluster is not licensed for commercial libraries, or you have not loaded a compiler environment module, the standard system libraries appropriate to your Linux distribution will be available for you to use.
Some high-performance libraries require pre-compilation or tuning before they can be linked into your applications by the compiler you have chosen. When several different compilers are available on your cluster, libraries may have been prepared using multiple different compilers to provide greater flexibility for users who include them from their source code. The modules environment may list some high-performance libraries multiple times, once for each compiler they have been prepared with. Users should load the library environment module that matches the compiler they intend to use to compile their applications.
5.4. COMPILER ENVIRONMENT VARIABLES FOR LIBRARIES
When a module containing accelerated libraries is loaded, the system automatically updates the relevant environment variables to assist users in writing portable code and makefiles which can be used with multiple different libraries. The CPPFLAGS and LDFLAGS variables are dynamically updated to include the relevant include and linker directives for compilers as library modules are loaded; e.g.

[user@login01 ~]# module load libs/gcc/fftw/3.2.2
[user@login01 ~]# echo $CPPFLAGS
-I/opt/gridware/libs/gcc/fftw/3.2.2/include
[user@login01 ~]# echo $LDFLAGS
-L/opt/gridware/libs/gcc/fftw/3.2.2/lib/
[user@login01 ~]#
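These variables can be passed straight to the compiler so that builds do not hard-code library paths; a minimal sketch using the FFTW module loaded above (the source file name is illustrative):

# compile and link against the library selected by the loaded module
gcc $CPPFLAGS $LDFLAGS -o transform transform.c -lfftw3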
6. MESSAGE PASSING ENVIRONMENTS
6.1. MESSAGE PASSING INTRODUCTION
A message-passing interface (MPI) is an API specification that allows processes to communicate with one another by sending and receiving messages. Typically used for parallel programs, an MPI can allow processes on one node to communicate with processes running on an entirely separate node, providing greater flexibility and allowing large-scale applications to be executed across an HPC cluster. Communication messages between nodes can be transported via a number of common interconnect networks including Ethernet, Myrinet or Infiniband.
A number of different commercial and open-source MPI implementations are available for C, C++ and Fortran code, and there may be multiple different options installed on your compute cluster. Applications written to use an MPI are often compiled to include the MPI libraries they require, using the compilers and (optionally) high-performance libraries described in section 5.
6.2. MPI INTERCONNECTS
The network transport used for MPI messages will depend on the hardware provided with your HPC cluster. Most basic clusters have a gigabit Ethernet interconnect which is shared between a number of different tasks. More complex clusters may have one or more dedicated gigabit Ethernet networks for MPI traffic. Clusters that process large amounts of parallel workload may be installed with a high-bandwidth, low-latency interconnect such as Infiniband, Myrinet or 10-gigabit Ethernet. These specialist networks are especially designed to provide the fastest message passing systems available, at bandwidths of multiple gigabytes per second. Your system administrator will provide you with information on the different MPI networks installed on your cluster, and can offer advice on how to get the best performance for your applications.
a" ET(ER!ET !ET>OR=S
2he DaGorit$ of HPC clusters incor#orate one or Dore %i%a>it .thernet net*orks into their desi%n! Po#ular
for their lo* cost of o*nershi#, relativel$ hi%h #erforDance of (G>#s #er link and flexi>le C)25. ca>lin%,
%i%a>it .thernet net*orks are no* coDDonl$ >ein% de#lo$ed across lar%e caD#uses and to usersL
deskto#s! 4our cluster Da$ have a dedicated 1P- %i%a>it .thernet net*ork, or Di%ht share a net*ork *ith
other tasks! ('A%i%a>it .thernet links are also >ecoDin% increasin%l$ #o#ular and are often used siD#l$ as a
hi%her >and*idth version of %i%a>it .thernet! Suita>le 1P- iD#leDentations for .thernet net*orks includeB
• O#en1P-E develo#ed >$ Der%in% the #o#ular 02A1P-, L)A1P- and L)1@1P- iD#leDentations
• 1P-CHE an iD#leDentation of the 1P-A(!( standard
• 1P-CHE an iD#leDentation of the 1P-A!' standard
• -ntel 1P-E a coDDercial 1P- iD#leDentation availa>le for >oth 6;: and -):+ architectures
>" $!$!$BA!% ABR$CS
)vaila>le as addAon interfaces to Dost coD#ute node s$steD, -nfini>and hostAchannel ada#ters allo*
coD#ute nodes to coDDunicate *ith others at s#eeds of u# to 3G>#s and latencies of around
Dicroseconds! -nfini>and also #rovides R51) (reDote direct DeDor$ access" ca#a>ilities to hel# reduce
the CPU overhead on coD#ute nodes durin% coDDunication! -nfini>and can transDit and receive Dessa%es
in a nuD>er of different forDats includin% those Dade u# of -nfini>and ver&s' or TCP Dessa%es siDilar to
.thernet net*orks! Suita>le 1P- iD#leDentations for -nfini>and includeB
• O#en1P-E develo#ed >$ Der%in% the #o#ular 02A1P-, L)A1P- and L)1@1P- iD#leDentations
• 19)P-CHE an iD#leDentation of the 1P-A(!( standard
• 19)P-CHE an iD#leDentation of the 1P-A!' standard
• -ntel 1P-E a coDDercial 1P- iD#leDentation availa>le for >oth 6;: and -):+ architectures
6.3. SELECTING AN MPI IMPLEMENTATION
Each MPI available on your cluster has been pre-built for the available hardware and the different compiler options available on your site. Each option is configured by default to automatically run over the best performing interconnect network available. Contact your system administrator if you need assistance running a parallel job over a different interconnect fabric.
Use the "module load" command to load the appropriate environment module for the MPI required:
[user@login01 ~]$ module avail
--------------------- /opt/gridware/modulefiles/ -------------------------
compilers/intel/11.1.059/32        libs/intel/fftw/3.2.2/fftw
compilers/intel/11.1.059/64        mpi/gcc/mpich2/1.1/mpich2
libs/gcc/atlas/3.8.2/atlas         mpi/gcc/mvapich/1.2rc1/mvapich
libs/gcc/blas/1                    mpi/gcc/mvapich2/1.4.1p1/mvapich2
libs/gcc/cblas/1/cblas             mpi/gcc/openmpi/1.4/openmpi
libs/gcc/fftw2/2.1.5/fftw-double   mpi/intel/mpich2/1.1/mpich2
libs/gcc/fftw2/2.1.5/fftw-float    mpi/intel/mvapich/1.2rc1/mvapich
libs/gcc/fftw/3.2.2/fftw           mpi/intel/mvapich2/1.4.1p1/mvapich2
libs/gcc/lapack/3.2.0/lapack       mpi/intel/openmpi/1.4/openmpi
libs/intel/fftw2/2.1.5/fftw-double torque/2.4.3(default)
libs/intel/fftw2/2.1.5/fftw-float
[user@login01 ~]$ module load mpi/gcc/mpich2
[user@login01 ~]$
If users are developing source code that integrates multiple compiler, MPI and HPC libraries, they must remember to load all the modules required for the compilation, e.g.

[user@login01 ~]$ module load compilers/intel libs/intel/fftw mpi/intel/mpich2
Users may prefer to include commonly used environment modules in their default environment to ensure that they are loaded every time they log in. See section 4.2 for more information about loading environment modules. Appendix A shows an example compilation of a parallel application and its execution using OpenMPI and the grid-engine cluster scheduler.
6.4. EXECUTING A PARALLEL APPLICATION
After compiling your parallel application, it can be submitted for execution via the cluster job scheduler system. Most job schedulers use jobscripts to instruct the scheduler how to run your application – see section 7 for more information on how to create jobscripts.
Your cluster scheduler automatically determines which compute nodes are available to run your parallel application, taking into account the topology of the requested interconnect network and any existing jobs running on the cluster. The cluster scheduler communicates this information to the selected MPI implementation via a machinefile which lists the hostnames of the compute nodes to be used for the parallel execution. In order to control how many compute cores to use on each multi-CPU compute node, users must instruct the scheduler to allocate the correct number of cores to the job. Section 7 provides examples of the different options which can be used to pass this information to the cluster scheduler.
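The same mechanism can be exercised by hand outside the scheduler, which is sometimes useful for small-scale testing; the sketch below uses an illustrative machinefile and OpenMPI's mpirun – when jobs run under the scheduler, the machinefile and core counts are generated for you:

# machines.txt lists one compute node hostname per line, e.g.:
#   comp00
#   comp01
# run 8 processes spread across the nodes listed in the machinefile
mpirun -np 8 -machinefile machines.txt ./my_mpi_app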
7. CLUSTER SCHEDULERS
7.1. CLUSTER SCHEDULER INTRODUCTION
Your HPC cluster is managed by a cluster scheduler – also known as the batch scheduler, workload manager, queuing system or load-balancer. This application allows multiple users to fairly share the managed compute nodes, allowing system administrators to control how resources are made available to different groups of users. A wide variety of different commercial and open-source schedulers are available for compute clusters, each providing different features for particular types of workload. All schedulers are designed to perform the following functions:
• Allow users to submit new jobs to the cluster
• Allow users to monitor the state of their queued and running jobs
• Allow users and system administrators to control running jobs
• Monitor the status of managed resources including system load, memory available, etc.
More advanced schedulers can be configured to implement policies that control how jobs are executed on the cluster, ensuring fair-sharing and optimal loading of the available resources. Most schedulers are extensible with a variety of plug-in options for monitoring different metrics, reporting system usage and allowing job submission via different interfaces. The scheduler system available on your compute cluster will depend on how your system administrator has configured the system – they will be able to advise you on how your HPC cluster is set up.
When a new job is submitted by a user, the cluster scheduler software assigns compute cores and memory to satisfy the job requirements. If suitable resources are not available to run the job, the scheduler adds the job to a queue until enough resources are available for the job to run. Your system administrator can configure the scheduler to control how jobs are selected from the queue and executed on cluster nodes. Once a job has finished running, the scheduler returns the resources used by the job to the pool of free resources, ready to run another user job.
7.2. TYPES OF JOB
Users can run a number of different types of job via the cluster scheduler, including:
• Batch jobs: non-interactive, single-threaded applications that run on only one compute core
• Array jobs: two or more similar batch jobs which are submitted together for convenience
• SMP jobs: non-interactive, multi-threaded applications that run on two or more compute cores on the same compute node
• Parallel jobs: non-interactive, multi-threaded applications making use of an MPI library to run on multiple cores spread over one or more compute nodes
• Interactive jobs: applications that users interact with via a command-line or graphical interface
Non-interactive jobs are submitted by users to the batch scheduler to be queued for execution when suitable resources are next available. Input and output data for non-interactive jobs are usually in the form of files read from and written to shared storage systems – the user does not need to remain logged into the cluster for their jobs to run. Scheduler systems provide a mechanism for collecting the information output by non-interactive jobs, making it available as files for users to query after the job has completed. Non-interactive jobs are usually submitted using a jobscript which is used to direct the scheduler how to run the application. The command syntax used in the jobscript will depend on the type and version of scheduler installed on your HPC cluster.
Interactive jobs are commonly submitted by users who need to control their applications via a graphical or command-line interface. When submitted, the scheduler attempts to execute an interactive job immediately if suitable resources are available – if all nodes are busy, users may choose to wait for resources to become free or to cancel their request and try again later. As the input and output data for interactive jobs are dynamically controlled by users via the application interface, the scheduler system does not store output information on the shared filesystem unless specifically instructed by the application. Interactive jobs only continue to run while the user is logged into the cluster – they are terminated when a user ends the login session they were started from.
8. GRID-ENGINE CLUSTER SCHEDULER
8.1. USING GRID-ENGINE 6 CLUSTER SCHEDULER
Grid-Engine (GE) is a cluster scheduler designed to manage the resources available on your cluster machines and allow multiple users to securely and efficiently submit work to a shared resource. Grid-engine provides a simple mechanism for users to submit batch and parallel jobs from interactive nodes into centralized queues and have job results delivered to a designated location.
A typical grid-engine installation requires a qmaster server (normally the headnode or a cluster service node), one or more submit hosts from where users can submit jobs (typically a login or headnode server) and a number of execution hosts where jobs are run. The process for running a job through grid-engine is:
• Prepare the application or binary file to run on a cluster node
• Create a jobscript to run the application with the required parameters
• Select the grid-engine directives required to control how your job is run
• Submit the jobscript to GE from the cluster login or master node
The sections below indicate how to perform these steps to submit different types of job to the grid-engine scheduler; a minimal end-to-end sketch follows.
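Putting the four steps together, a first submission might look like the following sketch; myjob.sh stands for your own jobscript (such as the example in the next section):

# submit the jobscript to grid-engine from a login node
qsub myjob.sh
# check the state of your queued and running jobs
qstat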
8.2. JOB SCRIPTS AND QSUB DIRECTIVES
A jobscript usually takes the form of a simple shell script containing a list of tasks to perform in order to run your job. These may include simple commands (e.g. printing a status message, copying or moving files, etc.) or calling a binary application to execute with parameters. An example jobscript is shown below:

#!/bin/bash
# This is an example job script
# Grid-engine directives are shown below
# Export my current environment variables
#$ -V
# Merge stdout and stderr
#$ -j y
echo "Job starting at `date`"
~/test/myapplication -i ~/data/inputfile4

Lines preceded by a "#" character are interpreted as comments and are ignored by both the shell and grid-engine as the script is submitted and executed. Lines preceded by the "#$" characters are interpreted as grid-engine directives – these options are parsed when the jobscript is submitted, before the script itself is run on any system. Directives can be used to control how grid-engine schedules the job to be run on one or more compute nodes and how you can be notified of the status of your job.
The following common directives are supported by grid-engine:
-a [[CC]YY]MMDDhhmm[.SS]
    Defines the date and time when a job can be executed.
    Example: -a 201110211830
-ar ar_id
    Assigns the job to the advanced reservation with ID ar_id.
    Example: -ar 412
-b y[es]|n[o]
    Allows a user to indicate that the command provided to qsub is a binary
    instead of a script.
    Example: -b y mybinary.bin
-cwd
    Instructs grid-engine to execute the job from the current working directory.
    Example: -cwd
-display
    Allows grid-engine to configure the remote X display for a graphical
    application.
    Example: -display imac:3
-dl [[CC]YY]MMDDhhmm[.SS]
    Available for users allowed to submit deadline jobs, to indicate the final
    date and time their jobs can be run.
    Example: -dl 201110211830
-e path
    Indicates the path to be used for standard error output streams.
    Example: -e /users/bob/output
-hard
    Specifies that jobs submitted with resource requirements must fully satisfy
    these requirements before they can run.
    Example: -hard
-hold_jid <jobid>
    Defines that the submitted job must wait before executing until all jobs in
    the comma-separated <jobid> list have completed.
    Example: -hold_jid 22,24
-i <file>
    Specifies that the input stream to the job should be taken from the file
    <file>.
    Example: -i /users/bob/input.22
-j y[es]|n[o]
    Instructs the scheduler to merge the stdout and stderr streams from the job
    into a single file. By default, separate files will be created for each
    stream.
    Example: -j y
-l resource=value
    Specifies that the job requires a particular resource to be able to run.
    The qconf -sc command shows the configured resources available for
    selection.
    Example: -l exclusive=true
-m b|e|a|s|n
    Instructs the scheduler to send email to notify the user when the job:
    b – begins; e – ends; a – is aborted; s – is suspended; n – never email for
    this job.
    Example: -m beas
-M user[@host]
    Specifies the email address to use to notify users of job status.
    Example: -M myuser
-notify
    Requests that the scheduler sends warning signals (SIGUSR1 and SIGUSR2) to
    running jobs prior to sending the actual SIGSTOP or SIGKILL messages at
    termination time.
    Example: -notify
-now y[es]|n[o]
    Signifies that interactive jobs run using qsub, qsh, qlogin or qrsh should
    be scheduled and run immediately or not at all.
    Example: -now y
-N name
    Allows users to set the name used to identify the job. If this parameter is
    not specified, the job is given the same name as the jobscript.
    Example: -N BobVMDjob4
-o path
    Indicates the path to be used for standard output streams.
    Example: -o /users/bob/output
-p priority
    Allows a user to request that their jobs are run with lower than normal
    priority. Valid priorities are 0 (default) to -1023 (lowest priority).
    Example: -p -102
-pe name nodes
    Requests the named parallel environment and the number of nodes for the job
    to be run over.
    Example: -pe mpi-verbose 8
-q queue
    The scheduler queue to which the job should be submitted. If omitted, the
    scheduler automatically determines the correct queue to use based on the
    job type.
    Example: -q serial.q
-r y[es]|n[o]
    If set to yes, this parameter causes the scheduler to automatically re-run
    the job if it fails during execution.
    Example: -r y
-soft
    Specifies that jobs submitted with resource requirements can still be run
    even if the requested resources are not available. If this parameter is not
    specified, the scheduler defaults to hard requirements.
    Example: -soft
-sync y[es]|n[o]
    Causes the qsub command to wait for the job to complete before exiting.
    Example: -sync y
-t first-last
    Submits a job task array starting at task first and ending with task last.
    Example: -t 1-1000
-verify
    Displays the results of a dry-run submission without actually submitting
    the job for execution.
    Example: -verify
-V
    Exports all current environment variables to the job.
    Example: -V
-wd path
    Sets the working directory of the job.
    Example: -wd /users/bob/app2
Grid-engine directives may also be specified at job submission time as parameters to the qsub command; for example:

[user@login01 ~]$ qsub -j y -V -cwd ./my-serial-job.sh

Please note that grid-engine automatically implements a walltime setting per job queue which controls the maximum amount of time a user job may be allowed to execute. See your system administrator for more information on the maximum time limits enforced for the queues configured on your cluster.
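Several of these directives are often combined in a single jobscript. The sketch below is illustrative only – the job name, email address and output path are assumed values rather than site defaults:

#!/bin/bash
# Name the job and merge stdout and stderr into a single output file
#$ -N myjob
#$ -j y
# Write job output to an assumed path under the user's home directory
#$ -o /users/bob/output
# Email an assumed address when the job begins, ends or is aborted
#$ -m bea
#$ -M myuser
# Export the current environment and run from the current working directory
#$ -V -cwd
echo "Starting on `hostname`"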
8.3. GRID-ENGINE PSEUDO ENVIRONMENT VARIABLES
To assist users when writing job scripts, grid-engine automatically creates a number of environment variables that may be referenced in the script. These are:

Variable name   Description
HOME            The user's home directory on the execution machine
USER            The user ID of the job owner
JOB_ID          The current job ID
JOB_NAME        The current job name (may be set via the -N directive)
HOSTNAME        The name of the execution host
SGE_TASK_ID     The array job task index number

Other environment variables may be set manually by the user before submitting the job, or by loading environment modules containing the variables required. Users must remember to use the -V directive to export their environment variables to submitted jobs if they have manually configured environment variables.
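As a short sketch of how these variables can be used in practice (the echo lines and their wording are illustrative, not scheduler output):

#!/bin/bash
#$ -V -cwd -j y
# Record the scheduler-provided job details in the job output file
echo "Job $JOB_ID ($JOB_NAME) owned by $USER"
echo "Running on execution host: $HOSTNAME"
echo "Home directory on this host: $HOME"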
8.4. JOB OUTPUT FILES
By default, grid-engine will collect any information output to the stdout or stderr channels by your jobscript and write it to files named jobscript.o<job_ID> and jobscript.e<job_ID> (where jobscript is the filename of the submitted jobscript and <job_ID> is the numeric ID assigned to the job). The default location of these files is your home directory, but this can be modified using the -cwd, -o or -e directives. If the -j y directive is also specified, stdout and stderr streams are both written to a single file named jobscript.o<job_ID> (or to the location specified by the -o directive).
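For example, submitting a jobscript named my-serial-job.sh might leave files like the following in your home directory once the job has completed (the job ID 101 and the /users/bob home directory are assumed values):

[user@login01 ~]$ qsub my-serial-job.sh
Your job 101 ("my-serial-job.sh") has been submitted
[user@login01 ~]$ ls ~/my-serial-job.sh.*
/users/bob/my-serial-job.sh.e101  /users/bob/my-serial-job.sh.o101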
8.5. SUBMITTING NON-INTERACTIVE JOBS VIA QSUB
To submit a batch or serial job, use the "qsub" command:

[user@login01 ~]$ cat my-serial-job.sh
#!/bin/bash
echo "I am a serial job!"
sleep 10
[user@login01 ~]$ qsub my-serial-job.sh

Your job will be held in the serial queue until a compute node is available to run it. Some clusters have specific queues configured to run different types of jobs; use the "qstat -g c" command to view the queues configured on your cluster:
[user@login01 ~]$ qstat -g c
CLUSTER QUEUE     CQLOAD  USED  RES  AVAIL  TOTAL  aoACDS  cdsuE
--------------------------------------------------------------------------------
dev-multiway.q      0.06     0    0      2     48      16      0
dev-parallel.q      0.06     1    0      4      6       1      0
dev-serial.q        0.06     1    0      9     48       8      0
multiway.q          0.79     2    0    160   1000     760     48
parallel.q          0.79    86    0     20    125      16      6
serial.q            0.79     0    0     17   1000     752     48
smp.q               0.06     4    0     60     64       0      0
Use the -q <queuename> directive to submit your job to a particular queue as directed by your local system administrator:

[user@login01 ~]$ qsub -q serial.q my-serial-job.sh
8.6. VIEWING THE STATUS OF SUBMITTED JOBS
The grid-engine scheduler allows users to view the status of the jobs they have submitted. The qstat command displays the status of all the jobs submitted by the user:

job-ID  prior    name         user        state  submit/start at      queue                  slots
---------------------------------------------------------------------------------------
    21  0.55500  openmpi.sh   alces-user  r      01/06/2011 10:04:43  parallel.q@node01.lcl     16
    31  0.22000  sleepjob.sh  alces-user  r      01/06/2011 11:44:20  serial.q@node00.lcl        1
    32  0.12000  sleepjob.sh  alces-user  qw     01/06/2011 11:45:10                             1
    34  0.12000  sleepjob.sh  alces-user  qw     01/06/2011 11:48:44                             1
    35  0.12000  sleepjob.sh  alces-user  qw     01/06/2011 11:48:52                             1
The job state can be marked as one or more of the following:

Status code  Job state     Description
d            deleted       A user or administrator has requested that the job should be deleted from the queuing system
E            Error         The job is in error status. Use the "-explain E" option to qstat for more information.
h            hold          The job has been set to hold by a user or administrator
r            running       The job is running
R            Restarted     The job has been restarted
s            suspended     The job has been suspended and is not currently running
S            Suspended     The job is currently being suspended
t            transferring  The job is being transferred to an execution host to be run
q            queued        The job is queued for execution
w            waiting       The job is waiting for resources to be available
By default, the qstat command only shows jobs belonging to the user executing the command. Use the "qstat -u '*'" command to see the status of jobs submitted by all users.
The qstat -f command provides more detail about the scheduler system, also listing the status of each queue instance on every execution host available in your cluster. Queues are listed with the following status:
Status code  Queue state          Description
a            alarm (load)         A queue instance has exceeded its pre-configured maximum load threshold
c            configuration error  A queue instance has a configuration error – contact your system administrator for assistance
d            disabled             A queue instance has been temporarily disabled by a system administrator
o            orphaned             The indicated queue instance has been de-configured, but jobs are still running using queue resources.
s            suspended            The queue instance has been suspended.
u            unknown              The scheduler has lost contact with the machine hosting the queue instance.
A            alarm (suspend)      The queue instance has exceeded its suspension threshold
C            Calendar suspended   The queue has been automatically suspended via the built-in calendar facility. Contact your system administrator for information on the configured calendar policies for your site.
D            Calendar disabled    The queue has been automatically disabled via the built-in calendar facility. Contact your system administrator for information on the configured calendar policies for your site.
E            Error                The scheduler was unable to contact the shepherd process on the machine hosting this queue instance. Contact your system administrator for assistance.
S            Subordinated         This queue instance has been suspended via subordination to another queue.
8.7. SUBMITTING A PARALLEL JOB
Your cluster has also been configured with a parallel queue suitable for running MPI jobs across multiple nodes. By default, this queue has been configured with a parallel environment for the MPI environments available on your cluster. To submit a parallel job via the cluster scheduler, users can create a job script and submit it with the qsub command, using the -pe name <slots> grid-engine directive:
[user@login01 ~]$ cat jobscript.sh
#!/bin/bash
#$ -V -cwd
#$ -pe mpi 4
mpirun -np 8 ./benchmark18.bin
[user@login01 ~]$ module load mpi/intel/openmpi/1.4
[user@login01 ~]$ qsub ./jobscript.sh

The mpirun command included in the jobscript above submits an OpenMPI job with 8 processes – the MPI machinefile is automatically generated by grid-engine and passed to the MPI without needing further parameters. The -pe mpi 4 directive instructs the scheduler to submit the jobscript to the mpi parallel environment using 4 node slots (using 2 processes per node).
There are two parallel environments set up on your cluster by default:
• mpi – default MPI parallel environment
• mpi-verbose – submits the job wrapped in job submission information such as date submitted and queue information.
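The parallel environment and slot count may also be given at submission time rather than in the jobscript; a minimal sketch, reusing the jobscript above with the mpi-verbose environment:

[user@login01 ~]$ qsub -pe mpi-verbose 4 ./jobscript.sh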
8.8. SUBMITTING INTERACTIVE JOBS
Grid-engine also allows interactive jobs to be scheduled to run on the compute nodes of your cluster. This method can be used to allow multiple users to run command-line and graphical applications across cluster compute nodes, fairly sharing resources between other interactive applications and batch jobs.
The "qrsh <binary>" command can be used to schedule and launch an application, or when invoked without an application name, to launch an interactive shell session on an available compute node. The qrsh session is scheduled on the next available compute node on the default interactive queue, or a specific queue specified with the "-q <queuename>" parameter.

[user@login01 ~]$ uptime
 03:59:32 up 2 days, 19 users, load average: 4.32, 5.34, 5.51
[user@login01 ~]$ qrsh
[user@node12 ~]$ uptime
 03:59:35 up 58 mins, 1 user, load average: 0.00, 0.00, 0.00
[user@node12 ~]$
The "qrsh xterm" command can be used to schedule and launch an interactive xterm session on a compute node. The qrsh session is scheduled on the next available compute node on the default interactive queue, or a specific queue specified with the "-q <queuename>" parameter. The graphical display for your application or xterm session will be displayed on the system identified by the DISPLAY environment variable setting when qrsh was invoked (usually your local graphical terminal). Although graphical applications are typically executed from an interactive graphical desktop session (via Remote Desktop or NX), advanced users can direct the graphical display to a workstation outside the cluster by setting the DISPLAY variable before running qrsh.
The example below shows how the qsh command launches an xterm session on an available compute node. When the graphical glxgears application is started, the output window is automatically displayed in the originating Remote Desktop graphical desktop window.
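A sketch of a typical session follows; the job number, node name and exact scheduler messages are illustrative:

[user@login01 ~]$ qsh
Your job 54 ("INTERACTIVE") has been submitted
waiting for interactive job to be scheduled ...
Your interactive job 54 has been successfully scheduled.
[user@node03 ~]$ glxgears
# the glxgears output window opens in the originating
# Remote Desktop graphical desktop session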
8.9. SUBMITTING AN ARRAY OF JOBS
A common problem is that you have a large number of jobs to run, and they are largely identical in terms of the command to run. For example, you may have 1000 data sets, and you want to run a single program on them using the cluster. A quick solution is to generate 1000 separate jobscripts, and submit them all to the queue. This is not efficient, neither for you nor for the scheduler master node.
Grid-engine allows users to submit a single job with a number of separate tasks; these are all scheduled under a single job ID, making it simple to track, prioritize or cancel all the jobs at once. When the example jobscript below is submitted, an array of 10,000 tasks will be generated and executed on the queue selected.
#!/bin/bash
# Export our current environment (-V) and current working directory (-cwd)
#$ -V -cwd
# Tell SGE that this is an array job, with "tasks" to be numbered 1 to 10000
#$ -t 1-10000
# When a single command in the array job is sent to a compute node, its task number is
# stored in the variable SGE_TASK_ID, so we can use the value of that variable to get
# the results we want:
~/programs/program -i ~/data/input.$SGE_TASK_ID -o ~/results/output.$SGE_TASK_ID
The script can be submitted as normal with the qsub command and is displayed by grid-engine as a single job with multiple parts:

[user@login01 ~]$ qsub -q serial.q array_job.sh
Your job-array 32.1-10000:1 ("array_job.sh") has been submitted
[user@login01 ~]$ qstat
job-ID  prior    name        user  state  submit/start at      queue  slots  ja-task-ID
----------------------------------------------------------------------------------------
    32  0.55500  array_job.  user  qw     03/27/2010 12:08:02             1  1-10000:1
[user@login01 ~]$
8.10. JOB DEPENDENCIES
Users submitting jobs can indicate that grid-engine should wait before starting their job until a previous job or list of jobs has been successfully completed. If the preceding job is an array job, grid-engine will wait until all array tasks have been completed. The following example may be used to submit a new job called my_job.sh that will run when the job with ID number 448 has been completed:

[user@login01 ~]$ qsub -q serial.q -hold_jid 448 my_job.sh
Your job 512 ("my_job.sh") has been submitted
[user@login01 ~]$ qstat
job-ID  prior    name     user  state  submit/start at      queue  slots  ja-task-ID
----------------------------------------------------------------------------------------
   512  0.55500  my_job.  user  hqw    03/29/2010 10:18:44             1
[user@login01 ~]$
8.11. DELETING SUBMITTED JOBS
Users can delete their own jobs from the scheduler system using the qdel command. Jobs which have not yet been scheduled to run will be removed from the queuing system without ever running. Jobs which are already running, or have partially run and been suspended, will be sent SIGSTOP and SIGTERM signals to stop them executing. Users may specify one or more jobs on the command line, separated by spaces, e.g.:

[user@login01 ~]$ qdel 24 25
user has deleted job 24
user has deleted job 25
[user@login01 ~]$

The job ID was printed to your screen when the job was submitted; alternatively, you can use the qstat command to retrieve the number of the job you want to delete.
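Individual tasks of an array job can also be removed by passing a task range to qdel with the -t option; a sketch with illustrative job and task numbers:

[user@login01 ~]$ qdel 32 -t 100-200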
9. TORQUE / OPENPBS CLUSTER SCHEDULER
9.1. OVERVIEW OF TORQUE / OPENPBS
Originally known as the Open Portable Batch System (OpenPBS) and later re-released as the Terascale Open-source Resource and QUEue manager (TORQUE), this scheduler is capable of coordinating and executing large numbers of serial, parallel and interactive jobs across a large HPC cluster. Torque is particularly well suited for integration with workload managers, allowing multiple clusters to be loosely connected together for improved multi-site job scheduling. Even recent releases of the scheduler are often still referred to in the HPC community as both "torque" and "PBS" as well as "OpenPBS".
A typical torque installation requires a master server (normally the headnode or a cluster service node), one or more submit hosts from where users can submit jobs (typically a login or headnode server) and a number of execution hosts where the jobs are run via the pbs_mom daemon. The process for running a job through torque is:
• Prepare the application or binary file to run on a cluster node
• Create a jobscript to run the application with the required parameters
• Select the torque directives required to control how your job is run
• Submit the jobscript to torque from the cluster login or master node
The steps below indicate how to perform these steps to submit different types of job to the scheduler.
9.2. JOB SCRIPTS AND TORQUE DIRECTIVES
A jobscript usually takes the form of a simple shell script containing a list of tasks to perform in order to run your job. These may include simple commands (e.g. printing a status message, copying or moving files, etc.) or calling a binary application to execute with parameters. An example jobscript is shown below:

#!/bin/bash
# This is an example job script - these lines are comments
# PBS directives are shown below
# Configure the resources needed to run my job
#PBS -l mem=700mb,walltime=1:30:00
# Merge stdout and stderr
#PBS -j oe
echo "Job starting at `date`"
~/test/myapplication -i ~/data/inputfile4

Lines preceded by a "#" character are interpreted as comments and are ignored by both the shell and torque as the script is submitted and executed. Lines preceded by "#PBS" are interpreted as torque directives – these options are parsed when the jobscript is submitted, before the script itself is run on any system. Directives can be used to control how torque schedules the job to be run on one or more compute nodes and how you can be notified of the status of your job.
The following common directives are supported by torque:
-d <dir>
    The working directory to start the job in.
    Example: -d /users/myuser/jobs
-j oe
    Merge the standard output and standard error streams for the submitted
    job. Parameter "oe" merges both streams into the output file; "eo" merges
    both into the error file.
    Example: -j oe
-o <file>
    Instructs the scheduler to redirect the standard output stream to the
    given filename.
    Example: -o ~/outputs/job.out
-e <file>
    Instructs the scheduler to redirect the standard error stream to the given
    filename.
    Example: -e ~/outputs/job.err
-N <name>
    Allows users to set the name used to identify the job. If this parameter
    is not specified, the job is given the same name as the jobscript.
    Example: -N MyJobName4
-m b|e|a|n
    Instructs the scheduler to send email to notify the user when the job:
    b – begins; e – ends; a – is aborted; n – never email for this job.
    Example: -m abe
-M user[@host]
    Specifies the email address to use to notify users of job status.
    Example: -M myuser@sdu.ac.uk
-q queue|host
    The scheduler queue to which the job should be submitted. If omitted, the
    scheduler automatically determines the correct queue to use based on the
    job type.
    Example: -q serial.q  or  -q node4
-V
    Exports all current environment variables to the job.
    Example: -V
-t first-last  or  -t range
    Submits a job task array starting at task first and ending with task last.
    Alternatively, if a single number is specified, tasks numbered from zero
    up to the specified number are started.
    Example: -t 0-999 (is equivalent to -t 1000)
-I
    Specifies that the job should be run interactively.
    Example: -I
-l nodes=<count>:ppn=<X>[:gpus=<Y>]
    Specifies the number of nodes to be reserved for exclusive use by the job.
    The ppn parameter specifies how many processes should be started on each
    node. The gpus parameter allows users to reserve GPGPU devices as well as
    host CPU cores.
    Examples: -l nodes=1:ppn=1:gpus=1 (1 core in total)
              -l nodes=2:ppn=4 (8 cores in total)
              -l nodes=10:ppn=12 (120 cores in total)
-l mem=<size>
    Indicates the maximum amount of physical memory used by the job – this
    value is ignored if the job runs on more than one node. The size parameter
    may be specified in the following units:
    b or w – 1 byte or word
    kb or kw – 1024 bytes (kilobytes)
    mb or mw – 1024 KB (megabytes)
    gb or gw – 1024 MB (gigabytes)
    tb or tw – 1024 GB (terabytes)
    Example: -l mem=500mb
-l procs
    The number of processors to be allocated to a job. The processors can come
    from one or more qualified node(s).
    Example: -l procs=2
-l walltime=hh:mm:ss
    The maximum amount of real time during which the job can be in the running
    state.
    Example: -l walltime=20:00:00
-l <other>
    Requests a generic resource configured for your site. Contact your system
    administrator for more information on the resource types available to you.
    Example: -l other=matlab
Torque directives may also be specified at job submission time as parameters to the qsub command; for example:

[user@login01 ~]$ qsub -j oe -l nodes=1 ./my-serial-job.sh

Please note that torque automatically implements a maximum walltime setting per job queue which controls the maximum amount of time a user job may be allowed to execute. See your system administrator for more information on the maximum time limits enforced for the queues configured on your cluster.
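For instance, a serial job can be limited to a two-hour runtime and directed to a named queue entirely from the command line (the values shown are illustrative):

[user@login01 ~]$ qsub -l walltime=2:00:00 -q serial.q ./my-serial-job.sh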
9.3. TORQUE PSEUDO ENVIRONMENT VARIABLES
To assist users when writing job scripts, torque automatically creates a number of environment variables that may be referenced in the script. These are:

Variable name   Description
PBS_JOBNAME     The current job name (may be set via the -N directive)
PBS_O_WORKDIR   The job's submission directory
PBS_O_HOME      The home directory of the submitting user
PBS_O_LOGNAME   The username of the submitting user
PBS_JOBID       The torque job ID number
PBS_O_HOST      The host from which the job was submitted
PBS_QUEUE       The torque queue name the jobscript was submitted to
PBS_NODEFILE    A file containing a line-delimited list of nodes allocated to the job
PBS_O_PATH      The PATH variable used within a jobscript
PBS_ARRAYID     The array index number for a task job
Other environment variables may be set manually by the user before submitting the job, or by loading environment modules containing the variables required in the jobscript.
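A minimal sketch of a torque jobscript using several of these variables (the echo lines and their wording are illustrative):

#!/bin/bash
#PBS -l walltime=1:00:00
# Torque jobs start in the user's home directory; change to the
# directory the job was submitted from
cd $PBS_O_WORKDIR
echo "Job $PBS_JOBID ($PBS_JOBNAME) submitted from $PBS_O_HOST"
echo "Nodes allocated to this job:"
cat $PBS_NODEFILE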
9.4. JOB OUTPUT FILES
By default, torque will collect any information that is output to the stdout or stderr channels by your jobscript and write it to files named jobscript.o<job_ID> and jobscript.e<job_ID> (where jobscript is the filename of the jobscript submitted). Note that the scheduler only copies output information to this location when the job ends – while the job is still running, output is buffered on the node(s) where the job is running.
The default location of these files is the job working directory, but this can be modified using the -o or -e directives. If the -j oe directive is also specified, stdout and stderr streams are both written to a single file named jobscript.o<job_ID> (or to the location specified by the -o directive).
9.5. SUBMITTING NON-INTERACTIVE JOBS VIA QSUB
To submit a batch or serial job, use the "qsub" command; torque will reply with your job ID:

[user@login01 ~]$ cat my-serial-job.sh
#!/bin/bash
#PBS -l walltime=1:00:00,nodes=1
echo "I am a serial job!"
sleep 10
[user@login01 ~]$ qsub my-serial-job.sh
19.headnode.cluster.local
[user@login01 ~]$

Your job will be held in the serial queue until a compute node is available to run it. Some clusters have specific queues configured to run different types of jobs; use the "qstat -q" command to view the queues configured on your cluster:
[user@login01 ~]$ qstat -q
server: master.cm.cluster
Queue            Memory CPU Time Walltime Node  Run Que Lm  State
---------------- ------ -------- -------- ---- --- --- --  -----
serial.q           --      --    48:00:00  --    4   8 --   E R
parallel.q         --      --    96:00:00  --    1   0 --   E R
                                                --- ---
                                                  5   8
[user@login01 ~]$
Queues are listed with the following state:

Status code  Queue state  Description
E            Enabled      The queue is enabled to run jobs
D            Disabled     The queue is disabled and will not schedule new jobs to start.
R            Running      The queue is currently running jobs
S            Stopped      The queue is stopped and is not running jobs.
Use the -q <queuename> directive to submit your job to a particular queue as directed by your local system administrator:

[user@login01 ~]$ qsub -q serial.q my-serial-job.sh
9.6. VIEWING THE STATUS OF SUBMITTED JOBS
The torque scheduler allows users to view the status of the jobs they have submitted. The "qstat -an" command displays the status of jobs submitted by a user, including which nodes jobs are running on:

headnode.cluster:                                               Req'd  Req'd   Elap
Job ID           Username Queue    Jobname      SessID NDS TSK Memory  Time  S Time
---------------- -------- -------- ------------ ------ --- --- ------ ----- - -----
199.headnode.wiz alces    serial.q testjob2.sh    5549   1   1  700mb 00:24 R   --
   node00/0
200.headnode.wiz alces    serial.q testjob2.sh     562   1   1  700mb 00:24 R   --
   node00/1
The job state ("S") can be marked as one or more of the following:

Status code  Job state     Description
W            Waiting       The job is waiting for resources to be available
T            Transferring  The job is being transferred to a new location
S            Suspended     The job has been suspended and is not currently running
R            Running       The job is running
Q            Queued        The job is queued for execution
H            Hold          The job has been held and is not running
E            Exiting       The job is exiting after having been run
C            Completed     The job has completed execution

By default, the qstat command only shows jobs belonging to the user executing the command. Use the "qstat -u '*'" command to see the status of jobs submitted by all users.
9.7. SUBMITTING A PARALLEL JOB
Your cluster has also been configured to run MPI jobs across multiple nodes. When submitting a parallel job, the chosen MPI implementation must be provided with a list of nodes to prepare for the job execution. Torque generates this list when a job is scheduled to run and will automatically pass it to OpenMPI when mpirun is called. For other MPI implementations that cannot obtain the list of hosts directly from torque, users must use the variable PBS_NODEFILE to send the list to mpirun via the -machinefile parameter. The example below demonstrates how to call mpirun in a jobscript when using OpenMPI:
[user@login01 ~]$ cat jobscript-OpenMPI.sh
#!/bin/bash
# Request 4 nodes each with 2 CPUs each
#PBS -l nodes=4:ppn=2
# Request maximum runtime of 12 hours
#PBS -l walltime=12:00:00
# -d /users/myuser/benchmark18
# enable modules environment
. /etc/profile.d/modules.sh
module load mpi/intel/openmpi/1.4
mpirun benchmark18.bin --rasteronly --verbose
[user@login01 ~]$ qsub ./jobscript-OpenMPI.sh

The mpirun command included in the jobscript above submits an OpenMPI job with 8 processes – the MPI machine-file is automatically generated by torque and passed to the MPI. The "-l nodes=4:ppn=2" torque directive requests that the job is scheduled on 4 nodes, using 2 processes per node.
When using another MPI that does not fully integrate with PBS, your jobscript must provide the "-machinefile" and "-np <processes>" parameters to mpirun to allow the MPI to prepare the necessary nodes for the job; for example:
[user@login01 ~]$ cat jobscript-IntelMPI.sh
#!/bin/bash
# Request 4 nodes each with 2 CPUs each
#PBS -l nodes=4:ppn=2
# Request maximum runtime of 12 hours
#PBS -l walltime=12:00:00
# -d /users/myuser/benchmark18
# enable modules environment
. /etc/profile.d/modules.sh
module load mpi/intel/intelmpi/2.3
mpirun -np 8 -machinefile $PBS_NODEFILE benchmark18.bin --rasteronly --verbose
[user@login01 ~]$ qsub ./jobscript-IntelMPI.sh
9.8. EXECUTING AN INTERACTIVE JOB
Torque also allows interactive jobs to be scheduled to run on the compute nodes of your cluster. This method can be used to allow multiple users to run command-line and graphical applications across cluster compute nodes, fairly sharing resources between other interactive applications and batch jobs.
The "-I" directive can be used to schedule and launch an application interactively, or when invoked without an application name, to launch an interactive shell session on an available compute node. Use the "-X" directive to forward your X-display to your current desktop and the "-x" directive if the command you wish to run is a binary rather than a jobscript. For example:

[user@login01 ~]$ uptime
 03:59:32 up 2 days, 19 users, load average: 4.32, 5.34, 5.51
[user@login01 ~]$ qsub -I -X
487.login01.cluster.local
[user@node12 ~]$ uptime
 03:59:35 up 58 mins, 1 user, load average: 0.00, 0.00, 0.00
[user@node12 ~]$
The "qsub -I -X -x xterm" command can be used to schedule and launch an interactive xterm session on a compute node; the session is scheduled on the next available compute node on the default interactive queue. The graphical display for your application or xterm session will be displayed on the system identified by the DISPLAY environment variable setting when the job was invoked (usually your local graphical terminal). Although graphical applications are typically executed from an interactive graphical desktop session (via Remote Desktop or NX), advanced users can direct the graphical display to a workstation outside the cluster by setting the DISPLAY variable before submitting the job request.
9.9. SUBMITTING AN ARRAY OF JOBS
A common problem is that you have a large number of jobs to run, and they are largely identical in terms of the command to run. For example, you may have 1000 data sets, and you want to run a single program on them using the cluster. A quick solution is to generate 1000 separate jobscripts, and submit them all to the queue. This is not efficient, neither for you nor for the scheduler master node.
Torque allows users to submit a single job with a number of separate tasks; these are all scheduled under a single job ID, making it simple to track, prioritize or cancel all the jobs at once. When the example jobscript below is submitted, an array of 10,000 tasks will be generated and executed on the queue selected.
#!/bin/bash
# Request maximum runtime (per task job) of 1 hour
#PBS -l walltime=1:00:00
# Merge stdout and stderr, and set location of output files
#PBS -j oe -o /users/user/jobs/output/
# Tell PBS that this is an array job, with "tasks" to be numbered 1 to 10000
#PBS -t 1-10000
# When a single command in the array job is sent to a compute node, its task number is
# stored in the variable PBS_ARRAYID, so we can use the value of that variable to get
# the results we want:
~/programs/program -i ~/data/input.$PBS_ARRAYID -o ~/results/output.$PBS_ARRAYID
The script can be submitted as normal with the qsub command and is displayed by torque as a single job with multiple parts. Each separate task will be scheduled and executed separately, each with its own output and error stream (depending on the directives used). Use the "qstat -ant" command to view the status of individual tasks:

[user@headnode jobs]$ qstat -ant
headnode.cluster.local:                                          Req'd  Req'd   Elap
Job ID           Username Queue    Jobname        SessID NDS TSK Memory  Time  S Time
---------------- -------- -------- -------------- ------ --- --- ------ ----- - ----
22[1].headnode   user     serial.q taskjob.sh-25    9598  --  --  700mb 00:24 R 00:00
   node00/1
22[2].headnode   user     serial.q taskjob.sh-26    9608  --  --  700mb 00:24 R 00:00
   node00/0
22[3].headnode   user     serial.q taskjob.sh-9     1048  --  --  700mb 00:24 R   --
   node00/10
22[4].headnode   user     serial.q taskjob.sh-40     142  --  --  700mb 00:24 Q   --
9.10. REQUESTING EXCLUSIVE USE OF COMPUTE NODES
Your local system administrators may allow certain power users to request exclusive use of one or more compute nodes for their jobs. When selected, this quality-of-service setting instructs the scheduler not to allow any other jobs to share the same physical compute nodes as the exclusive job. Typical uses include when users know in advance that their jobs require exclusive use of a local compute node device such as a hard disk, network interface or GPGPU device. System administrators often restrict the number of jobs which can request exclusive status as it can lead to inefficient use of resources and longer queue times before jobs can be serviced.
To request exclusive use of compute resources, use the "-x QOS:exclusive" torque directive when submitting your job:

[user@login01 ~]$ qsub -l nodes=8:ppn=8 -x QOS:exclusive ./my-parallel-job.sh
9.11. DELETING SUBMITTED JOBS
Users can delete their own jobs from the scheduler system using the qdel command. Jobs which have not yet been scheduled to run will be removed from the queuing system without ever running. Jobs which are already running, or have partially run and been suspended, will be sent SIGSTOP and SIGTERM signals to stop them executing. Users may specify one or more jobs on the command line, separated by spaces, e.g.:

[user@headnode jobs]$ qstat
Job id          Name         User   Time Use S Queue
--------------- ------------ ------ -------- - -----
21.headnode     testjob.sh   user          0 R serial.q
[user@headnode jobs]$ qdel 21
[user@headnode jobs]$ qstat
[user@headnode jobs]$

The job ID was printed to your screen when the job was submitted; alternatively, you can use the qstat command to retrieve the number of the job you want to delete, as shown above.
"2# LUSTRE ,ARALLEL $LES.STE+
%$.%. L.STR' 2C;)R+.(/
2he Lustre #arallel files$steD #rovides scala>le, hi%hA#erforDance stora%e services to Dulti#le client nodes
via a ran%e of interconnect technolo%ies! ) file stored in a Lustre files$steD is availa>le to all clients
Dountin% the files$steD, siDilar to files$steDs shared *ith the /0S or C-0S file sharin% #rotocols! /o
s#ecial file handlin% is reFuired to use the shared files$steD H all POS-6 coD#liant a##lications and utilities
are coD#ati>le!
0iles stored in a Lustre files$steD are se#arated into their Detadata (file naDes, siCes, locations, #erDissions,
etc!" and a nuD>er of data >locks! 0ile Detadata is stored >$ the Lustre Detadata servers (15S" and >lock
data is stored >$ o>Gect stora%e servers (OSS"! ) sin%le shared files$steD reFuires one 15S (t$#icall$
de#lo$ed as a #air of hi%hAavaila>ilit$ servers" and t*o or Dore OSS! 2he usa>le siCe and #erforDance of
the resultin% files$steD de#ends on the confi%uration of OSS Dachines su##ortin% it!
2he Lustre files$steD Da$ >e accessed via a ran%e of net*ork technolo%ies includin% %i%a>it .thernet, ('A
%i%a>it .thernet and -nfini>and H the onl$ reFuireDent is that all servers accessin% the files$steD have
net*ork access to all the 15S and OSS Dachines! Lustre also su##orts a %ate*a$ service *hich can >e
used to extend the files$steD to reDote clients across a L)/ or 8)/ link! 2he %ate*a$ can also >e used
to allo* files$steD access for reDote .thernet connected clients, irres#ective of the local Lustre
interconnect t$#e!
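Client nodes mount the filesystem in the usual way; one quick check that a Lustre filesystem is available on a node is to list mounted filesystems of type lustre (the server and mount point names below are examples only):

[user@login01 ~]$ mount -t lustre
mds01@tcp0:/lustre on /mnt/lustre type lustre (rw)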
10.2. QUERYING FILESYSTEM SPACE
The "lfs df" command is used to determine available disk space on a Lustre filesystem. It displays the amount of available disk space on the mounted Lustre filesystem and shows space consumption per OST. If multiple Lustre filesystems are mounted, a path may be specified, but is not required. Supported options include:

-h             Human-readable; print sizes in an easily readable format
-i, --inodes   List inodes instead of block usage
The "df -i" and "lfs df -i" commands show the minimum number of inodes that can be created in the filesystem. Depending on the configuration, it may be possible to create more inodes than initially reported by "df -i". Later "df -i" operations will show the current, estimated free inode count. If the underlying filesystem has fewer free blocks than inodes, then the total inode count for the filesystem reports only as many inodes as there are free blocks. This is done because Lustre may need to store an external attribute for each new inode, and it is better to report a free inode count that is the guaranteed, minimum number of inodes that can be created.
# lfs df
UUID                 1K-blocks        Used   Available  Use%  Mounted on
mds-lustre-0_UUID      9174328     1020024     8154304   11%  /mnt/lustre[MDT:0]
ost-lustre-0_UUID     94181368    56330708    37850660   59%  /mnt/lustre[OST:0]
ost-lustre-1_UUID     94181368    56385748    37795620   59%  /mnt/lustre[OST:1]
ost-lustre-2_UUID     94181368    54352012    39829356   57%  /mnt/lustre[OST:2]
filesystem summary:  282544104   167068468   115475636   59%  /mnt/lustre

# lfs df -i
UUID                  Inodes       IUsed       IFree  IUse%  Mounted on
mds-lustre-0_UUID    2211572       41924     2169648     1%  /mnt/lustre[MDT:0]
ost-lustre-0_UUID     737280       12183      725097     1%  /mnt/lustre[OST:0]
ost-lustre-1_UUID     737280       12232      725048     1%  /mnt/lustre[OST:1]
ost-lustre-2_UUID     737280       12214      725066     1%  /mnt/lustre[OST:2]
filesystem summary:  2211572       41924     2169648     1%  /mnt/lustre
APPENDIX A: EXAMPLE PARALLEL JOB EXECUTION
The following example demonstrates how to write a basic MPI program and submit it to run on the cluster via the grid-engine scheduler. Log on to your cluster as a user (not the root account), and make a new directory for your MPI test program:

[user@login01 ~]$ mkdir ~/mpi_test
[user@login01 ~]$ cd ~/mpi_test

Create a new file called "~/mpi_test/test.c" and enter the following C code to form your MPI program:
#include <stdio.h>
#include <mpi.h>
#include <time.h>
#include <string.h>

int main(int argc, char **argv) {
  char name[MPI_MAX_PROCESSOR_NAME];
  int nprocs, procno, len;

  /* Initialise MPI and discover the size and rank of this process */
  MPI_Init( &argc, &argv );
  MPI_Comm_size( MPI_COMM_WORLD, &nprocs );
  MPI_Comm_rank( MPI_COMM_WORLD, &procno );
  MPI_Get_processor_name( name, &len );
  name[len] = '\0';

  /* Print a greeting identifying this rank, its host and the current time */
  time_t lt;
  lt = time(NULL);
  printf( "Hello !! from %s@%d/%d on %s\n", name, procno, nprocs, ctime(&lt) );
  MPI_Barrier( MPI_COMM_WORLD );
  MPI_Finalize();
  return( 0 );
}
The source code for this short program is stored in the /opt/alces/examples directory on your HPC cluster for convenience. Use the modules command to load the OpenMPI environment and compile your C code to make an executable binary called mpi_test:

[user@login01 ~]$ module load mpi/openmpi-1.2.6-gcc
[user@login01 ~]$ mpicc -o mpi_test -O3 test.c

Next, create a new file called "~/mpi_test/sub.sh" and edit it to contain the following lines to form a grid-engine jobscript:
# SGE SUBMISSION SCRIPT
# Specify the parallel environment and number of nodes to use
# Submit to the mpi-verbose parallel environment by default so we can see
# output messages
#$ -pe mpi-verbose 2
# Specify specific SGE options
#$ -cwd -V -j y
# Specify the SGE job name
#$ -N mpi_test
# Specify the output file name
#$ -o OUTPUT
# A machine file for the job is available here if you need to modify
# from the SGE default and pass to mpirun. The machine hostnames in use are
# listed in the file separated by new lines (\n) e.g.:
# node00
# node01
MACHINEFILE="/tmp/sge.machines.$JOB_ID"
# The mpirun command - specify the number of processes per node and the binary
mpirun -machinefile /tmp/sge.machines.$JOB_ID -np 2 ~/mpi_test/mpi_test
Submit a new job to the cluster scheduler that will execute your binary using the grid-engine jobscript. Confirm the job is running with the qstat command, and read the output file to view the results of the job execution.

[user@login01 mpi_test]$ qsub sub.sh
Your job 11 ("mpi_test") has been submitted
[user@login01 mpi_test]$ qstat -f
queuename                      qtype resv/used/tot. load_avg arch       states
-------------------------------------------------------------------------------
parallel.q@node00.alces-softwa IP    0/1/1          0.06     lx24-amd64
     11 0.55500 mpi_test   user       r     02/02/2009 12:18:49     1
-------------------------------------------------------------------------------
parallel.q@node01.alces-softwa IP    0/1/1          0.04     lx24-amd64
     11 0.55500 mpi_test   user       r     02/02/2009 12:18:49     1
-------------------------------------------------------------------------------
serial.q@node00.alces-software BI    0/0/8          0.06     lx24-amd64   S
-------------------------------------------------------------------------------
serial.q@node01.alces-software BI    0/0/8          0.04     lx24-amd64   S
[user@login01 mpi_test]$ cat OUTPUT
=======================================================
SGE job submitted on Wed May 27 04:14:15 BST 2009
2 hosts used
JOB ID: 11
JOB NAME: mpi_test
PE: mpi-verbose
QUEUE: parallel.q
MASTER node00.alces-software.org
Nodes used:
node00 node01
=======================================================
Job Output Follows:
=======================================================
Hello !! from node00@2/8 on Wed May 27 04:14:30 2009
Hello !! from node00@0/8 on Wed May 27 04:14:30 2009
Hello !! from node00@4/8 on Wed May 27 04:14:30 2009
Hello !! from node00@6/8 on Wed May 27 04:14:30 2009
Hello !! from node01@1/8 on Wed May 27 04:14:37 2009
Hello !! from node01@3/8 on Wed May 27 04:14:37 2009
Hello !! from node01@5/8 on Wed May 27 04:14:37 2009
Hello !! from node01@7/8 on Wed May 27 04:14:37 2009
=======================================================
SGE job completed on Wed May 27 04:14:41 BST 2009
=======================================================
APPENDIX B: EXAMPLE OPENMP JOB COMPILATION
The OpenMP API supports multi-platform shared-memory parallel programming in C/C++ and Fortran. OpenMP is a portable, scalable model with a simple and flexible interface for developing parallel applications on platforms from the desktop to the supercomputer. OpenMP jobs may be executed on standard compute nodes, or used on dedicated SMP nodes, often equipped with larger numbers of processing cores and more system memory.
The following example demonstrates how to write a basic OpenMP program and submit it to run on the cluster via the SGE scheduler. Log in to your cluster as a user (not the root account), and make a new directory for your OpenMP test program:

[user@login01 ~]$ mkdir ~/openmp_test
[user@login01 ~]$ cd ~/openmp_test

Create a new file called "~/openmp_test/hello.c" and enter the following C code to form your OpenMP program:
#include <omp.h>
#include <stdio.h>

int main(int argc, char* argv[]) {
  int id;
  /* Each OpenMP thread receives a private copy of id and prints its number */
  #pragma omp parallel private(id)
  {
    id = omp_get_thread_num();
    printf("%d: Hello World!\n", id);
  }
  return 0;
}
Compile your C code to make an executable binary called hello and use the ldd command to confirm the shared libraries it will use on execution:

[user@login01 smp]$ gcc -fopenmp -o hello hello.c
[user@login01 smp]$ ldd hello
        libgomp.so.1 => /usr/lib64/libgomp.so.1 (0x00002b3c9fff9000)
        libpthread.so.0 => /lib64/libpthread.so.0 (0x000000328e000000)
        libc.so.6 => /lib64/libc.so.6 (0x0000003282000000)
        librt.so.1 => /lib64/librt.so.1 (0x0000003284200000)
        /lib64/ld-linux-x86-64.so.2 (0x0000003282e00000)
Next, create a new file called "~/openmp_test/run-job.sh" and edit it to contain the following lines to form a grid-engine jobscript:

# SGE SUBMISSION SCRIPT
# Submit to the smp-verbose parallel environment by default so we can see
# output messages. Use 8 slots in the parallel environment, corresponding
# to the number of threads to use (one slot = one thread = one CPU core)
#
#$ -V -cwd -pe smp-verbose 8 -j y -o out.$JOB_ID
./hello
Submit a new job to the cluster scheduler that will execute your binary using the grid-engine jobscript. Confirm the job is running with the qstat command, and read the output file to view the results of the job execution.

[user@login01 smp]$ qsub run-job.sh
Your job 11 ("hello") has been submitted
[user@login01 smp]$ qstat -f -q smp.q
queuename                      qtype resv/used/tot. load_avg arch
-------------------------------------------------------------------------------
smp.q@smp1.alces-software.com  IP    0/8/32         0.00     lx24-amd64
     11 0.55500 run-job.sh alces      r     12/01/2009 20:10:27     8
-------------------------------------------------------------------------------
smp.q@smp2.alces-software.com  IP    0/0/32         0.00     lx24-amd64
[user@login01 smp]$ cat out.11
=======================================================
SGE job submitted on Tue Sep 8 10:10:19 GMT 2009
JOB ID: 11
JOB NAME: run-job.sh
PE: smp-verbose
QUEUE: smp.q
MASTER smp1.alces-software.com
=======================================================
** A machine file has been written to /tmp/sge.machines.11 on
smp1.alces-software.com **
=======================================================
If an output file was specified on job submission
Job Output Follows:
=======================================================
0: Hello World!
1: Hello World!
7: Hello World!
4: Hello World!
2: Hello World!
3: Hello World!
5: Hello World!
6: Hello World!
=======================================================
SGE job completed on Tue Sep 8 10:10:28 GMT 2009
=======================================================