
SGE Basic Commands

1 © 2010 Wipro Ltd - Confidential


SGE

▪ What is SGE
– SGE stands for Sun Grid Engine

– Distributed resource management software

– Provides users with the means to submit computationally demanding tasks to the
SGE system, which distributes the associated workload transparently.

How SGE works

▪ Users submit jobs to the Grid Engine.

▪ Unless resources are immediately available, jobs are kept in queues until
resources to execute them become available.

▪ Jobs are passed on to the available execution hosts.

▪ Records of each job's progress through the system are kept and reported
when requested.

SGE Components

▪ Hosts
– Master
– Execution
– Administration
– Submit
▪ Queues (defined by the administrator)
▪ Daemons:
– sge_qmaster (Master Daemon)
– sge_schedd (Scheduler Daemon)
– sge_execd (Execution Daemon)
– sge_commd (Communication Daemon)

Host Roles
▪ Master Host
– Controls overall cluster activity
– Also known as the frontend or head node
– Runs the master daemon, sge_qmaster, which controls
• queues, jobs, status and user access permissions
– Also runs the scheduler daemon, sge_schedd
▪ Execution Host
– Executes SGE jobs
– Runs the execution daemon, sge_execd, which
• runs jobs on its host
• forwards system status information to sge_qmaster
▪ Submit Host
– Allowed only to submit and control batch jobs
– No daemon needs to run on this type of host
▪ Administration Host
– Used by the SGE administrator to control the whole setup

Summary table of useful SGE commands

Command(s)                        Description                             User/System

qsub, qresub, qmon                Submit batch jobs                       USER
qsh, qrsh                         Submit interactive jobs                 USER
qstat, qhost, qdel, qmon          Status of queues and jobs in queues,    USER
                                  list of execute nodes, remove jobs
                                  from queues
qacct, qmon, qalter, qdel, qmod   Monitor/manage accounts, queues,        SYSTEM ADMIN
                                  jobs, etc.

Working with SGE as a user:

– qsub : Submit a job
– qstat : Determine the status of a job
– qhost : Display node information
– qdel : Cancel a job

Submitting a Job:

▪ Create a script file (named script.sh) using a text editor such as gedit, vi or emacs
and enter the following lines:

#!/bin/sh
#
echo "This code is running on"
/bin/hostname
/bin/date

▪ Now submit this script to SGE using the qsub command:

qsub script.sh

SGE job script

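#$ -N <job name>            Name of the job or program
#$ -cwd                     Run the job from the current working directory
#$ -e <error file>          File that receives standard error
#$ -o <output file>         File that receives standard output
#$ -q <queue name>          Queue to submit the job to
#$ -V                       Carry all environment variables over to the job
#$ -pe <pe name> <slots>    MPI parallel environment and number of slots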

SGE – Sample script

#!/bin/bash
#$ -N SLEEP_JOB
#$ -cwd
#$ -e Error.$JOB_NAME.$JOB_ID
#$ -o Output.$JOB_NAME.$JOB_ID
#$ -V
date
sleep 100
date

Script for Serial job

#!/bin/bash

#$ -N SERIAL_JOB

#$ -cwd

#$ -e Error.$JOB_NAME.$JOB_ID

#$ -o Output.$JOB_NAME.$JOB_ID

#$ -V

<full path to the serial executable> <options & input parameters>

Script for Parallel Job

#!/bin/bash
#$ -N PARALLEL_JOB
#$ -cwd
#$ -e Error.$JOB_NAME.$JOB_ID
#$ -o Output.$JOB_NAME.$JOB_ID
#$ -V
#$ -pe mvapich2 32
/data/mvapich2_intel/bin/mpirun -np $NSLOTS <full path to the executable> \
    <options & input parameters>

SGE Commands

▪ Create a job script (myjob.sh)


▪ Submit for execution

$ qsub myjob.sh
Your job 742 ("myjob.sh") has been submitted.
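
When a job is submitted from another script, it is often useful to capture the job number for later qstat or qdel calls. The following is a minimal sketch that extracts the number from a confirmation message of the form shown above. The function name job_id_from_message is made up for illustration, and the pattern assumes the message wording matches the sample exactly.

```shell
#!/bin/sh
# Sketch: pull the job number out of a qsub confirmation message.
# Assumes the message looks like: Your job 742 ("myjob.sh") has been submitted.
# job_id_from_message is a hypothetical helper name, not an SGE command.
job_id_from_message() {
    printf '%s\n' "$1" | sed -n 's/^Your job \([0-9][0-9]*\) .*/\1/p'
}

# Example using the sample message (no cluster needed); prints 742.
job_id_from_message 'Your job 742 ("myjob.sh") has been submitted.'
```

On a real cluster this could be used as job_id=$(job_id_from_message "$(qsub myjob.sh)"). A message in any other format yields an empty result.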

Monitoring Jobs:

A submitted job is in one of three states:

1. still waiting in the queue,

2. executing,

3. finished and no longer in the SGE scheduling system.

To monitor the progress of your job while it is in state (1) or (2), use the
qstat or Qstat command, which tells you whether the job is still waiting or has started
executing. qstat gives information about all jobs, whereas Qstat gives information about
your jobs alone.

Monitoring Jobs ( Contd... )
While executing (state 2):

Use qstat -j job_number to monitor the job's status, including time and memory
consumption.

Better still, use qstat -j job_number | grep mem, which shows just the time and memory
consumed.

Also use tail -f job_output_filename to see the latest output from the job.

Finished executing (state 3):

qacct is the only command that may be able to tell you about past jobs, by referring
to a database of past usage. Output file names contain the job number, so

qacct -j job_number

should give some information.
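
The two phases above (qstat while the job is in the system, qacct afterwards) can be combined into a small polling loop. This is only a sketch: the function name wait_for_job and the 30-second default interval are made up for illustration, and it assumes that qstat -j exits with a non-zero status once the job is no longer known to the scheduler, which should be checked on your installation.

```shell
#!/bin/sh
# Sketch: poll until a job has left the SGE system, then print its accounting record.
# wait_for_job is a hypothetical helper name, not an SGE command.
# Assumption: `qstat -j <id>` exits non-zero once the job is no longer in the system.
wait_for_job() {
    job_id=$1
    interval=${2:-30}    # seconds between polls (illustrative default)
    while qstat -j "$job_id" >/dev/null 2>&1; do
        sleep "$interval"
    done
    qacct -j "$job_id"
}

# Usage on a cluster (not run here): wait_for_job 742 60
```

The accounting record may not be available at the instant the job finishes, so an empty or failing qacct result right after the loop is not necessarily an error.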

SGE Commands - qstat
Check the status of your job:
qstat : lists all the jobs in the system that are either waiting to
be run or running
– qstat -f -u "*" : Detailed information on nodes
– qstat -u username : Displays jobs submitted by the given user
– qstat -j job : Displays job-related information

▪ Status of the job is indicated by letters as:

– w : waiting          – E : error
– t : transferring     – T : threshold
– r : running          – h : hold
– s, S : suspended     – d : deleted
– R : restarted
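
The state column in qstat output usually combines several of these letters (for example hr for a held, running job). As a minimal sketch, the helper below expands such a string into words using the letter table above. The name describe_state is made up for illustration. The q entry (queued, as in the common qw state) is an addition that is not in the list above.

```shell
#!/bin/sh
# Sketch: expand an SGE state string such as "qw" or "hr" into words.
# describe_state is a hypothetical helper name, not an SGE command.
describe_state() {
    state=$1
    out=""
    while [ -n "$state" ]; do
        rest=${state#?}            # state string without its first letter
        letter=${state%"$rest"}    # the first letter on its own
        case $letter in
            q) word="queued" ;;    # assumption: not in the letter table above
            w) word="waiting" ;;
            t) word="transferring" ;;
            r) word="running" ;;
            s|S) word="suspended" ;;
            R) word="restarted" ;;
            E) word="error" ;;
            T) word="threshold" ;;
            h) word="hold" ;;
            d) word="deleted" ;;
            *) word="unknown($letter)" ;;
        esac
        out="${out:+$out, }$word"  # join words with ", "
        state=$rest
    done
    printf '%s\n' "$out"
}

describe_state hr    # prints: hold, running
```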

Some useful options:
qstat:
-explain a|A|c|E
    c : displays the reason for the configuration ambiguous state of a queue instance.
    a : shows the reason for the alarm state (the load threshold is currently exceeded).
    A : shows suspend alarm state reasons (the suspend threshold is currently exceeded).
    E : displays the reason for a queue instance error state.
-ext : Displays additional information for each job related to the job ticket policy scheme.
-f : Specifies a "full" format display of information.
-pri : Displays additional information for each job related to the job priorities in general.
-r : Prints extended information about the resource requirements of the displayed jobs.

Deleting Jobs :

The qdel command removes the specified jobs from the queue if they are waiting
to be run, or aborts them if they are already running.

▪ Individual job
qdel job_number

▪ List of jobs
qdel job_number1 job_number2 ....

▪ All jobs running or queued under a given username

qdel -u <username>

Reasons for Job Failure:

– SGE cannot find the binary file specified in the job script

– Required input files are missing from the start up directory

– You have exceeded your disk quota and the job fails when trying to write to a file (use
the quota command to check usage)

– A required environment variable is not set

– Hardware failure
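
Two of the causes above (a missing binary and missing input files) can be caught before the job is submitted. The following is a minimal sketch of such a check, not part of SGE. The function name preflight and its argument order are made up for illustration.

```shell
#!/bin/sh
# Sketch: check for a missing binary and missing input files before submitting.
# preflight is a hypothetical helper; usage: preflight <executable> [input files...]
preflight() {
    exe=$1
    shift
    status=0
    if [ ! -x "$exe" ]; then
        echo "binary missing or not executable: $exe" >&2
        status=1
    fi
    for f in "$@"; do
        if [ ! -r "$f" ]; then
            echo "input file missing or unreadable: $f" >&2
            status=1
        fi
    done
    return "$status"
}

# Example: /bin/sh stands in for the real executable, with no input files.
preflight /bin/sh && echo "preflight passed"
```

A job script could call such a check as its first command and exit early, so that the reason for the failure ends up in the error file.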

Monitor your cluster

▪ Open a browser and go to the following URL:

http://brahma.glabs.in/ganglia

▪ You will see screens like the following

Ganglia Monitoring Tool : Home Page

Ganglia – Home Page Contd…

Ganglia - Node View :
