
Title: Condor Practical
Subtitle: Submitting your first Condor job
Tutor: Alain Roy and Todd Tannenbaum
Authors: Alain Roy and Ben Burnett

3.0 Submitting your first Condor job


3.1 First you need a job
Before you can submit a job to Condor, you need a job. We will quickly
write a small batch script. If you aren't an expert script writer, fear not.
We will hold your hand throughout this process (but we'll let go, we
promise, when you want to get back to typing).

First, create a new folder on the C:\ drive to hold your files (this will make them easier to find when you are back at the command line). Then use your favorite editor to create a file called simple.bat in that folder. Notepad will work well for our purposes:

C:\> mkdir condor-test
C:\> cd condor-test
C:\condor-test> notepad simple.bat

This will run Notepad, which will ask whether you want to create the new file.
Select Yes.

We will use the following script in our examples, so enter it into the newly
created file. Copy and paste is a good choice:

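@echo off
rem simple.bat [THINKING_TIME] [COUNT]
rem Pretends to think for a while, then prints the numbers 1 to COUNT.

setlocal

rem Default values, used when no arguments are given
set THINKING_TIME=2
set COUNT=10

rem The first and second arguments, if present, override the defaults
if not "%~1"=="" set "THINKING_TIME=%~1"
if not "%~2"=="" set "COUNT=%~2"

echo Thinking really hard for %THINKING_TIME% seconds...

rem We use ping here as a hack because "sleep" is non-standard.
rem ping -n N sends N echo requests to this computer about one second apart.
ping -n %THINKING_TIME% 127.0.0.1 >NUL 2>&1

echo Our result:

rem Print 1, 2, ..., COUNT, one number per line
if %COUNT% GEQ 1 (
    for /L %%x in (1,1,%COUNT%) do (
        echo %%x
    )
)

endlocal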
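If the batch syntax is unfamiliar, the logic is small: two defaults, an optional override from each of the first two arguments, one line of "thinking", and a count from 1 to COUNT. The Python sketch below (Python is our choice here, and simple_output is a hypothetical name) mirrors only the text simple.bat prints, not its delay, so you can predict what the job's output should look like.

```python
def simple_output(args):
    """Return the lines simple.bat would print for the given arguments.

    Models only the printed text; the ping delay is not reproduced.
    """
    thinking_time = 2  # mirrors "set THINKING_TIME=2"
    count = 10         # mirrors "set COUNT=10"
    # The first and second arguments, when present, override the defaults.
    if len(args) >= 1 and args[0] != "":
        thinking_time = int(args[0])
    if len(args) >= 2 and args[1] != "":
        count = int(args[1])
    lines = [f"Thinking really hard for {thinking_time} seconds...", "Our result:"]
    # for /L %%x in (1,1,%COUNT%) counts from 1 to COUNT inclusive;
    # range() is empty when count < 1, just like the "GEQ 1" guard.
    lines.extend(str(x) for x in range(1, count + 1))
    return lines


print("\n".join(simple_output(["4", "12"])))
```

With no arguments the sketch returns the defaults' output: the thinking line, "Our result:", and the numbers 1 to 10.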

Now we can run the program and tell it to think (sleep) for four seconds
and then print all the numbers up to 12:

C:\condor-test> simple.bat 4 12
Thinking really hard for 4 seconds...
Our result:
1
2
3
4
5
6
7
8
9
10
11
12

Great! You have a job you can tell Condor to run! Although it clearly
isn't an interesting job, it models some of the aspects of a real scientific
program: it takes a while to run and it produces some output.


3.2 Submitting your job


Now that you have a job, you just have to tell Condor to run it. Put the
following text into a file called simple.sub:

Universe = vanilla
Executable = simple.bat
Arguments = 4 12
Log = simple.log.txt
Output = simple.out.txt
Error = simple.err.txt
Queue

Let's examine each of these lines:

• Universe: The vanilla universe means a plain old job. Later on,
we'll encounter some special universes.
• Executable: The name of your program.
• Arguments: The arguments to pass to your program. They are the
same arguments we typed above.
• Log: The name of a file where Condor will record information
about your job's execution. While it's not required, it is a really
good idea to have a log.
• Output: Where Condor should put the standard output from
your job.
• Error: Where Condor should put the standard error from your
job. Our job isn't likely to produce any, but we'll capture it to be
safe.

Next, tell Condor to run your job:

C:\condor-test> condor_submit simple.sub
Submitting job(s).
Logging submit event(s).
1 job(s) submitted to cluster 3.

Now, watch your job run:

C:\condor-test> condor_q

-- Submitter: lab-21 : <129.215.30.181:2207> : lab-21
 ID      OWNER          SUBMITTED       RUN_TIME ST PRI SIZE CMD
   3.0   Administrator  11/27 10:01   0+00:00:00 I  0   0.0  simple.bat 4 12

1 jobs; 1 idle, 0 running, 0 held

C:\condor-test> condor_q

-- Submitter: lab-21 : <129.215.30.181:2207> : lab-21
 ID      OWNER          SUBMITTED       RUN_TIME ST PRI SIZE CMD
   3.0   Administrator  11/27 10:01   0+00:00:00 R  0   0.0  simple.bat 4 12

1 jobs; 0 idle, 1 running, 0 held

C:\condor-test> condor_q

-- Submitter: lab-21 : <129.215.30.181:2207> : lab-21
 ID      OWNER          SUBMITTED       RUN_TIME ST PRI SIZE CMD

0 jobs; 0 idle, 0 running, 0 held

Notice that the job was first idle (I in the ST column) and then running (R).
In a real pool, condor_q might give you a long list of everyone's jobs. You
can tell condor_q to list only your jobs with the -sub option, which is
short for -submitter, as in:

C:\condor-test> condor_q -sub roy

For this tutorial, there is probably only one person per computer, so it
probably isn't necessary.
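Watching condor_q by eye works for one job, but a script that waits for jobs needs to read the same output. The Python sketch below (our choice of language; parse_job_row and parse_summary are hypothetical helpers) assumes the exact column layout shown in this tutorial: ID as cluster.process, then OWNER, the two SUBMITTED fields, RUN_TIME, ST, PRI, SIZE and CMD, plus the "N jobs; ..." summary line. Other Condor versions print different columns, so for real scripting check the condor_q manual for its own output-formatting options.

```python
import re

# ST codes seen in this tutorial (only a subset of Condor's job states).
STATUS = {"I": "idle", "R": "running", "H": "held"}

# Summary line, e.g. "1 jobs; 0 idle, 1 running, 0 held"
SUMMARY = re.compile(r"(\d+) jobs; (\d+) idle, (\d+) running, (\d+) held")


def parse_job_row(row):
    """Split one condor_q job row in the layout shown in this tutorial."""
    fields = row.split()
    # "3.0" -> cluster 3, process 0
    cluster, process = (int(part) for part in fields[0].split("."))
    return {
        "cluster": cluster,
        "process": process,
        "owner": fields[1],
        "status": STATUS.get(fields[5], fields[5]),  # unknown codes kept as-is
        "cmd": " ".join(fields[8:]),
    }


def parse_summary(line):
    """Turn the last line of condor_q output into job counts."""
    match = SUMMARY.search(line)
    if match is None:
        raise ValueError(f"unrecognized summary line: {line!r}")
    total, idle, running, held = (int(g) for g in match.groups())
    return {"total": total, "idle": idle, "running": running, "held": held}


print(parse_job_row("3.0 Administrator 11/27 10:01 0+00:00:00 I 0 0.0 simple.bat 4 12"))
print(parse_summary("1 jobs; 0 idle, 1 running, 0 held"))
```

A waiting script could, for example, call condor_q repeatedly until parse_summary reports a total of 0.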

When my job was done, it was no longer listed. Because I told Condor
to log information about my job, I can see what happened:

C:\condor-test> more simple.log.txt
000 (003.000.000) 11/27 10:01:54 Job submitted from host: <129.215.30.181:2207>
...
001 (003.000.000) 11/27 10:02:00 Job executing on host: <129.215.30.173:2217>
...
005 (003.000.000) 11/27 10:02:01 Job terminated.
    (1) Normal termination (return value 0)
        Usr 0 00:00:00, Sys 0 00:00:00  -  Run Remote Usage
        Usr 0 00:00:00, Sys 0 00:00:00  -  Run Local Usage
        Usr 0 00:00:00, Sys 0 00:00:00  -  Total Remote Usage
        Usr 0 00:00:00, Sys 0 00:00:00  -  Total Local Usage
    175  -  Run Bytes Sent By Job
    322  -  Run Bytes Received By Job
    175  -  Total Bytes Sent By Job
    322  -  Total Bytes Received By Job
...
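Each event line in the log begins with an event code (here 000 for submitted, 001 for executing and 005 for terminated), the job ID as cluster.process.subprocess, and a timestamp. The Python sketch below (our choice of language; event_times is a hypothetical helper) reads those lines and computes how long the job waited to start and how long after starting it terminated. It assumes the event-line layout shown above; since the log omits the year, the sketch assumes 2000 only so the dates parse.

```python
from datetime import datetime

# Event codes that appear in the log above (a subset of all Condor log events).
EVENT_NAMES = {"000": "submitted", "001": "executing", "005": "terminated"}


def event_times(log_text):
    """Map each recognized event in a Condor user log to its timestamp."""
    times = {}
    for line in log_text.splitlines():
        parts = line.split()
        # Event lines look like: 001 (003.000.000) 11/27 10:02:00 Job executing ...
        if len(parts) >= 4 and parts[0] in EVENT_NAMES and parts[1].startswith("("):
            # The log has no year; 2000 (a leap year) is assumed only so the
            # date parses. Only differences between timestamps are used.
            stamp = datetime.strptime(f"2000/{parts[2]} {parts[3]}", "%Y/%m/%d %H:%M:%S")
            times[EVENT_NAMES[parts[0]]] = stamp
    return times


log_text = """\
000 (003.000.000) 11/27 10:01:54 Job submitted from host: <129.215.30.181:2207>
...
001 (003.000.000) 11/27 10:02:00 Job executing on host: <129.215.30.173:2217>
...
005 (003.000.000) 11/27 10:02:01 Job terminated.
    (1) Normal termination (return value 0)
...
"""

times = event_times(log_text)
startup = (times["executing"] - times["submitted"]).total_seconds()
runtime = (times["terminated"] - times["executing"]).total_seconds()
print(f"started {startup:.0f} s after submission, terminated {runtime:.0f} s after starting")
```

For the log above this reports 6 seconds from submission to execution and 1 second from execution to termination, at the log's one-second resolution.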

That looks good. It took about six seconds for the job to start running,
and you will often see slower startups: Condor doesn't optimize for fast
job startup, but for high throughput. Notice also that the job executed
on a different computer (129.215.30.173) from the one it was submitted
from (129.215.30.181). But did our job execute correctly? Let's look at
its output:

C:\condor-test> more simple.out.txt
Thinking really hard for 4 seconds...
Our result:
1
2
3
4
5
6
7
8
9
10
11
12

Excellent! We ran our sophisticated scientific job on a Condor pool!


3.3 Doing a parameter sweep


If you only ever had to run a single job, you probably wouldn't need
Condor. But we would like to have our program calculate a whole set of
values for different inputs. How can we do that? Let's change our
submit file to look like this:

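Universe = vanilla
Executable = simple.bat
Log = simple.log.txt
Output = simple.$(Process).out.txt
Error = simple.$(Process).err.txt
Arguments = 4 10
Queue
Arguments = 4 11
Queue
Arguments = 4 12
Queue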

There are two important differences to notice here. First, the Output
and Error lines have the $(Process) macro in them. This means that
the output and error files will be named according to the process
number of the job. You'll see what this looks like in a moment. Second,
we told Condor to run the same job an extra two times by adding extra
Arguments and Queue statements. We are doing a parameter sweep
on the values 10, 11, and 12. Let's see what happens:

C:\condor-test> condor_submit simple.sub
Submitting job(s)...
Logging submit event(s)...
3 job(s) submitted to cluster 6.

C:\condor-test> condor_q

-- Submitter: lab-21 : <129.215.30.181:2207> : lab-21
 ID      OWNER          SUBMITTED       RUN_TIME ST PRI SIZE CMD
   6.0   Administrator  11/27 10:39   0+00:00:00 I  0   0.0  simple.bat 4 10
   6.1   Administrator  11/27 10:39   0+00:00:00 I  0   0.0  simple.bat 4 11
   6.2   Administrator  11/27 10:39   0+00:00:00 I  0   0.0  simple.bat 4 12

3 jobs; 3 idle, 0 running, 0 held

C:\condor-test> condor_q

-- Submitter: lab-21 : <129.215.30.181:2207> : lab-21
 ID      OWNER          SUBMITTED       RUN_TIME ST PRI SIZE CMD
   6.0   Administrator  11/27 10:39   0+00:00:04 R  0   0.0  simple.bat 4 10
   6.1   Administrator  11/27 10:39   0+00:00:04 R  0   0.0  simple.bat 4 11
   6.2   Administrator  11/27 10:39   0+00:00:04 R  0   0.0  simple.bat 4 12

3 jobs; 0 idle, 3 running, 0 held

C:\condor-test> condor_q -run

-- Submitter: lab-21 : <129.215.30.181:2207> : lab-21
 ID      OWNER          SUBMITTED       RUN_TIME HOST(S)
   6.0   Administrator  11/27 10:39   0+00:00:01 lab-13
   6.1   Administrator  11/27 10:39   0+00:00:01 lab-04
   6.2   Administrator  11/27 10:39   0+00:00:01 lab-22

C:\condor-test> condor_q

-- Submitter: lab-21 : <129.215.30.181:2207> : lab-21
 ID      OWNER          SUBMITTED       RUN_TIME ST PRI SIZE CMD

0 jobs; 0 idle, 0 running, 0 held

C:\condor-test> dir *out.txt
 Volume in drive C has no label.
 Volume Serial Number is 14E3-4F7E

 Directory of C:\condor-test

11/15/2007  02:10 PM                70 simple.0.out.txt
11/15/2007  02:10 PM                74 simple.1.out.txt
11/15/2007  02:10 PM                78 simple.2.out.txt
               3 File(s)            222 bytes
               0 Dir(s)  31,199,772,672 bytes free

C:\condor-test> more simple.0.out.txt
Thinking really hard for 4 seconds...
1
2
3
4
5
6
7
8
9
10

C:\condor-test> more simple.1.out.txt
Thinking really hard for 4 seconds...
1
2
3
4
5
6
7
8
9
10
11

C:\condor-test> more simple.2.out.txt
Thinking really hard for 4 seconds...
1
2
3
4
5
6
7
8
9
10
11
12

Notice that we had three jobs with the same cluster number, but
different process numbers. They have the same cluster number
because they were all submitted from the same submit file. When the
jobs ran, they created three different output files, each with the desired
output.
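Writing an Arguments/Queue pair by hand for every value gets tedious once a sweep grows beyond a handful of values. A small program can generate the submit file instead. The Python sketch below (our choice of language; sweep_submit and expected_output_files are hypothetical names, and the Log file name simple.log.txt is an assumption) emits a submit file in the same style as the one in this section and predicts the output file names that $(Process) will produce, since Condor numbers the queued jobs 0, 1, 2, and so on.

```python
def sweep_submit(values, thinking_time=4):
    """Build a submit description that queues one simple.bat job per value."""
    lines = [
        "Universe = vanilla",
        "Executable = simple.bat",
        "Log = simple.log.txt",
        # Condor replaces $(Process) with each job's process number: 0, 1, 2, ...
        "Output = simple.$(Process).out.txt",
        "Error = simple.$(Process).err.txt",
    ]
    for value in values:
        # Each Arguments line applies to the Queue statement that follows it.
        lines.append(f"Arguments = {thinking_time} {value}")
        lines.append("Queue")
    return "\n".join(lines) + "\n"


def expected_output_files(values):
    """Output file names the sweep should create, one per queued job."""
    return [f"simple.{process}.out.txt" for process in range(len(values))]


print(sweep_submit([10, 11, 12]), end="")
print(expected_output_files([10, 11, 12]))
```

For example, writing sweep_submit(range(10, 21)) to simple.sub and running condor_submit on it would queue eleven jobs in one cluster, with process numbers 0 through 10.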

You are now ready to submit lots of jobs! Although this example was
simple, Condor has many, many options so you can get a wide variety
of behaviors. You can find many of these if you look at the
documentation for condor_submit.

Extra credit
• What if you want the cluster number to be part of the output
filename?
• Condor sends you email when a job finishes. How can you
control this?
• Make another scientific program that takes its input from a file.
Now submit 3 copies of this program where each input file is in a
separate directory. Use the initialdir option described in the
lecture, or in the manual.
• Bonus points: You know that your job should never run for
more than four hours. If it does, then the job should be killed
because there is a problem. How can you tell Condor to do this
for you?

Next: Submitting with file transfer
