Parallel processing SAS Batch jobs in Windows, check return codes and then aggregate

information
This post will discuss a method and present macros to achieve Parallel processing SAS Batch jobs in
Windows OS.
Scenario:
You need to extract data from say a large database or multiple sources and then aggregate that
information into one dataset in the least possible time without sacrificing much on the processing time
and programming time.
Let’s discuss this hypothetical scenario with an example…Imagine that there were 3 independent extract
processes…and suppose each of those processes took the below min to process…See Appendix #1
below…The example is simple to understand but the macro might be a little bit overkill…but I am sure it
is worth 100% to try…Thanks in advance for your patience….
Process 1 took 30 min
Process 2 took 30 min
Process 3 took take 30 min
Aggregate process took 15 min
Total process : 1 hr 45 min.

Now your task is to somehow decreasing the processing time..
The simple solution is to decrease the processing time by processing these independent extract jobs in
parallel and then aggregate the info in a separate process. But how do we do that?

Assuming that we have a directory structure like this for the Project to store permanent datasets(data),
source information (sourcedata), logs (log), programs(progs) and output (out)

Here I present the steps to do that:
1. Break down the processes into separate .sas programs (Look at Programs in Appendix # 2 #3)
Put each of the extract processes into separate programs…For this example prog1, prog2 , prog 3
and aggregate(aggregation part)
SASStart.bat – It is a batch file (save the code in Appendix # 4 into C:\Project\bat\SASStart.bat)

2. Kick off those separate programs using systask command with the SASStart.bat batch file
Use systask SAS statement to execute the Batch file SASStart.bat asynchronously i.e.
concurrently.
More info http://support.sas.com/onlinedoc/913/getDoc/en/hostwin.hlp/win-stmt-systask.htm
Syntax briefly
systask command "SASStart.bat progsname" taskname="task1" status=rc_task1;
systask command – tells SAS to run the bat program asynchronously
taskname – identify the task
status= stores the return code value in the macro variable specified (rc_task1 in the example)
3. Check the return codes before aggregating

We need to check the return codes of each of those process by looking at the values of the macro
variables mentioned in the status=macvar.
4. Aggregate the code
Prepare the aggregation code and the process code to do all the above tasks.
Please look at the Appendices for the code and explanation of the code.
Download all the code and data from http://www.sastechies.com/blogs/parallel.zip
All the macros are generic and a little bit of customization in the highlighted areas is required to
implement this technique to your programs.
1.Replace these macro variable values to your project specific paths
%let progPath=C:\Project\progs; /* Location of the Programs*/
%let BatchPath=C:\Project\bat; /* Location of the SAS Start Batch file*/
%let prognames= prog1 prog2 prog3; /*name of the program names to execute*/
2. replace the contents of the Prog1-3.sas and aggregate.sas to your project specific programs.

APPENDIX # 1
Code that processes sequentially
Libname datalib "C:\Project\data";
Libname source "C:\Project\sourcedata";

/* Process 1: Extracting info starting with J */
proc sql;
create table datalib.data1 as
select *
from source.class1
where name like 'J%';
quit;
/* Process 2: Extracting info starting with A */
proc sql;
create table datalib.data2 as
select *
from source.class2
where name like 'A%';
quit;
/* Process 3: Extracting info starting with R */
proc sql;
create table datalib.data3 as
select *
from source.class3

where name like 'R%';
quit;
/* aggregate Process: Put the information together */
data datalib.aggregatedata;
set datalib.data1 datalib.data2 datalib.data3;
run;
/* sorting by name */
proc sort data=datalib.aggregatedata;
by name;
quit;

Appendix # 2:
Save the code into each of these .sas files

Program prog1.sas
Libname datalib "C:\Project\data";
/* Process 1: Extracting info starting with J */
proc sql;
create table datalib.data1 as
select *
from source.class1
where name like 'J%';
quit;

Program prog2.sas
Libname datalib "C:\Project\data";
/* Process 2: Extracting info starting with A */
proc sql;
create table datalib.data2 as
select *
from source.class2
where name like 'A%';
quit;
Program prog3.sas
Libname datalib "C:\Project\data";
/* Process 3: Extracting info starting with R */
proc sql;
create table datalib.data3 as
select *
from source.class3
where name like 'R%';
quit;

Appendix # 3:
Program aggregate.sas
Description of each of these macros used in the program
%initiateParallelProcessing – This is the main macro which kicks off all the parallel programs with
different SAS sessions. The SYSTASK statement within the Do loop here kicks off the batch program
c:\Project\bat\SASStart.bat with program as the parameter for every task. It also has the name of the
taskname and stores the return code of the batch process to the macro variable mentioned in the
status= statement.
systask command "SASStart.bat &&progs&i." taskname="task&i." status=rc_task&i.;
the WAITFOR _ALL_ task1 task2 task3 – tells SAS to wait for all the processes to complete before
executing all the statements after that i.e. %checkreturncodes and %aggregate etc..
waitfor _all_ %tasklist(&start.,&end.);;

/* Wait for All of the tasks to complete */

%checkreturncodes(&start.,&end.,task,"flag_error=N");; - Print All the Error Codes to Log
%aggregatecode – this macro checks all the return codes of the parallel tasks..if 0 then
completes the aggregating code after that.
%sleepsecs – sometimes it is better to sleep sas for couple of secs just to let the Windows OS settle
with the new task…so I’ve used sleepsecs(2) so that Windows can recognize the batch file execution
properly…this is optional

%tasklist(start,end) – I’ve used this macro just to give me the list of the tasks that were spawned..
%Conditions(start,end,rctype)- I’ve used this macro piece to check the values for rc_task1 eq 0 or
rc_task2 eq 0 etc for all the task id’s spawned…
%Checkreturncodes(start,end,rctype,cond)- This macro checks for the conditions rc_task1 eq 0 or
rc_task2 eq 0 and then sets the condition that a user wants to set…in this case I wanted to set the
condition flag_error=N…
%printerrorcodes(start,end,rctype)- This macro prints the error codes to the Log..just a listing of the
values of the macro variables of all the tasks..
splitmacroval(macvar,prefix,splitcnt,dlm=' ')- Read more about this macro athttp://sastechies.blogspot.com/2009/11/sas-macro-to-split-macro-variables.html

Full Code
/* Macro to send SAS to a sleep mode for few seconds */
%macro sleepsecs(sec);
%put;%put =============Macro: sleepsecs(&sec);
data _null_;
slept= sleep(&sec);/* argument to sleep must be expressed in seconds */
run;
%mend sleepsecs;
/* Macro to generate a tasklist task1 task2 etc... */
%macro tasklist(start,end);

%do i=&start %to &end.;
task&i.
%end;
%mend tasklist;
/* Macro to check if all the returncodes for the Child sessions is 0 or not*/
%macro Conditions(start,end,rctype);
%put;%put =============Macro: Conditions(&start,&end,&rctype);
%do i=&start %to %eval(&end - 1);
&&&rc_&rctype.&i. eq 0 or
%end;
%mend Conditions;
/* Macro to Check the return codes of all tasks */
%macro Checkreturncodes(start,end,rctype,cond);
%put;%put =============Macro: Checkreturncodes(&start,&end,&rctype,&cond);
%put;%put
=============================================================================
;
%put The Error codes for the Above tasks are:
%printerrorcodes(&start,&end,&rctype);
%put
=============================================================================
;%put;
%if %Conditions(&start,&end,&rctype) &&&rc_&rctype.&end. eq 0 %then
%do;

%let leftof=%scan(%sysfunc(compress(&cond,'"')),1,'=');
%let rightof=%scan(%sysfunc(compress(&cond,'"')),2,'=');
%global &leftof;
%let &leftof=&rightof;
%put Flag Set: &leftof=&rightof;
%put;
%end;
%mend Checkreturncodes;
/* Macro to print the error codes to the Log */
%macro printerrorcodes(start,end,rctype);
%put;%put =============Macro: printerrorcodes(&start,&end,&rctype);
%do i=&start %to %eval(&end);
rc_&rctype.&i.=&&&rc_&rctype.&i.
%end;
%mend printerrorcodes;
/* Create a SAS session for each of the Filename Inputs */
%macro initiateParallelProcessing;
%let start=1;
%let end=&progcnt;
options noxwait noxsync;
x cd "&BatchPath";
%do i=&start. %to &end.;

/* */

/* Call the SASStart.bat with All the parameters in the Back ground mode
with Systask command */
%put;%put *************************Task executing
task&i.********************************************;
%put systask command "SASStart.bat &&progs&i." taskname="task&i."
status=rc_task&i.;
systask command "SASStart.bat &&progs&i." taskname="task&i."
status=rc_task&i.;
%sleepsecs(3);
/* wait for 3 secs atleast for the system to stabilize
with the new App*/
%end;
waitfor _all_ %tasklist(&start.,&end.);;
/* Wait for All of the tasks
to complete */
%checkreturncodes(&start.,&end.,task,"flag_error=N");;
/* Print All
the Error Codes to Log */
%mend initiateParallelProcessing;
%macro splitmacroval(macvar,prefix,splitcnt,dlm=' ');
%global &splitcnt; /*create the global variable for storing the splitcount*/
%let num=1; /*initialize a counter variable*/
%global &prefix# /*create the global variable for storing the first
split*/
%let &prefix&num=%scan(&&&macvar,&num,&dlm); /*Scan the first value*/
%do %while(&&&prefix&num ne );
%let num=%eval(&num + 1);/*increment the counter*/

%global &prefix# /*create the global variable*/
%let &prefix&num=%scan(&&&macvar,&num,&dlm); /*scan the next value*/
%end;
%let &splitcnt=%eval(&num - 1); /*Store the split count to the macro variable
reqd*/
%mend splitmacroval;
%let flag_error=Y; /* Initialize the error flag to Y */
%let progPath=C:\Project\progs; /* Location of the Programs*/
%let BatchPath=C:\Project\bat; /* Location of the SAS Start Batch file*/
%let prognames= prog1 prog2 prog3; /*name of the program names to execute*/
%splitmacroval(prognames,progs,progcnt,dlm=' ');/* Read more about this macro
at-http://sastechies.blogspot.com/2009/11/sas-macro-to-split-macrovariables.html*/

%initiateParallelProcessing;/* Call the SAS Process to process those files */
%put The value of flag_error=&flag_error; %put;
%macro aggregatecode;
%if &flag_error eq N %then
%do;
Libname datalib "C:\Project\data";
data datalib.aggregatedata;
set datalib.data1 datalib.data2 datalib.data3;
run;
proc sort data=datalib.aggregatedata;

by name;
quit;
%end;
%else %do; %put Errors in Processing...Please check the Logs...; %end;
%mend aggregatecode;
%aggregatecode;

Appendix # 4:
Save the code to SASstart.bat and put it in C:\projects\bat\
This windows batch file takes one parameter i.e. name of the sas program (extension .sas is not
required). This batch file kicks a batch SAS session with the command
For eg. The macro initiateProcessing uses SASstart.bat to kick off prog2.sas with this command.
"C:\Program Files\SAS\SAS 9.1\sas.exe " -sysin "C:\Project\progs\prog2.sas.sas" -print
"C:\Project\out\prog2 08DEC2009.lst" –log "C:\Project\Log\prog2 08DEC2009.log" -unbuflog -nodmsexp
-nosplash
Read more about Running SAS in batch mode in Windows OS at
http://sastechies.blogspot.com/2009/12/running-sas-in-batch-mode-in-windows-os.html
tips to understand this program is to compare with SAS syntax…
The Set statement is like our SAS %let statement
%XXX% is like out SAS &XXXX macro resolution
%ERRORLEVEL% is like out SAS &sysrc macvariable
At SET ProgName=%1 – is the parameter that we send to the SASstart.bat program
SASStart.bat
@echo off

if x==%1x goto CHKPARM
goto PROCESS
:PROCESS
echo Start Processing
SET ProgName=%1
SET SASEXEpath=C:\Program Files\SAS\SAS 9.1\sas.exe
SET ProgPath=C:\Project\progs
SET Prog2Start=%ProgPath%\%ProgName%.sas
for /f "tokens=1-2,* delims= " %%p in ('echo %DATE%') do SET Rundated=%%q
echo Rundated=%Rundated%

for /f "tokens=1-3,* delims=/" %%f in ('echo %Rundated%') do SET Rundatet=%%f%%g%%h
echo Rundatet=%Rundatet%
SET day=%Rundatet:~2,2%
SET year=%Rundatet:~4,4%
SET Monthtemp=%Rundatet:~0,2%
echo Monthtemp=%Monthtemp%
if %Monthtemp%==01 set month=JAN
if %Monthtemp%==02 set month=FEB

if %Monthtemp%==03 set month=MAR
if %Monthtemp%==04 set month=APR
if %Monthtemp%==05 set month=MAY
if %Monthtemp%==06 set month=JUN
if %Monthtemp%==07 set month=JUL
if %Monthtemp%==08 set month=AUG
if %Monthtemp%==09 set month=SEP
if %Monthtemp%==10 set month=OCT
if %Monthtemp%==11 set month=NOV
if %Monthtemp%==12 set month=DEC
SET Rundate=%day%%Month%%year%
SET ProjPath=C:\Project
SET Logpath=%ProjPath%\Log
SET OutputPath=%ProjPath%\out
SET Logfile=%Logpath%\%ProgName% %Rundate%.log
SET Outputfile=%OutputPath%\%ProgName% %Rundate%.lst
echo Command Executing is
echo --------------------------------------------------echo "%SASEXEpath%" -sysin "%Prog2Start%.sas" -print "%Outputfile%" -log "%Logfile%" -unbuflog
-nodmsexp -nosplash
echo --------------------------------------------------"%SASEXEpath%" -icon -sysin "%Prog2Start%" -print "%Outputfile%" -log "%Logfile%" -unbuflog nodmsexp -nosplash
SET Capterr=%ERRORLEVEL%
echo SAS Execution Complete with Exit code %Capterr%

goto TERM

:CHKPARM
echo Please check your parameter list
echo Correct Usage: DQS AppName Environment FileDate Filename
echo Eg: DQS CAHSEE Production 05/10/2009 CAHSEE.DAT
goto TERM
:TERM
exit /b %Capterr%

Appendix # 5
Log of the Aggregate process
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569

/* Macro to send SAS to a sleep mode for few seconds
%macro sleepsecs(sec);
%put;%put =============Macro: sleepsecs(&sec);

*/

data _null_;
slept= sleep(&sec);/* argument to sleep must be expressed in seconds */
run;
%mend sleepsecs;
/* Macro to generate a tasklist task1 task2 etc... */
%macro tasklist(start,end);
%do i=&start %to &end.;
task&i.
%end;
%mend tasklist;
/* Macro to check if all the returncodes for the Child sessions is 0 or not*/
%macro Conditions(start,end,rctype);
%put;%put =============Macro: Conditions(&start,&end,&rctype);
%do i=&start %to %eval(&end - 1);
&&&rc_&rctype.&i. eq 0 or
%end;
%mend Conditions;
/* Macro to Check the return codes of all tasks */
%macro Checkreturncodes(start,end,rctype,cond);
%put;%put =============Macro: Checkreturncodes(&start,&end,&rctype,&cond);
%put;%put =============================================================================;
%put The Error codes for the Above tasks are: %printerrorcodes(&start,&end,&rctype);
%put =============================================================================;%put;
%if %Conditions(&start,&end,&rctype) &&&rc_&rctype.&end. eq 0 %then
%do;

570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
597!
598
598!
599
600
601
601!
602
603
604
604!
605
605!

%let leftof=%scan(%sysfunc(compress(&cond,'"')),1,'=');
%let rightof=%scan(%sysfunc(compress(&cond,'"')),2,'=');
%global &leftof;
%let &leftof=&rightof;
%put Flag Set: &leftof=&rightof;
%put;
%end;
%mend Checkreturncodes;
/* Macro to print the error codes to the Log */
%macro printerrorcodes(start,end,rctype);
%put;%put =============Macro: printerrorcodes(&start,&end,&rctype);
%do i=&start %to %eval(&end);
rc_&rctype.&i.=&&&rc_&rctype.&i.
%end;
%mend printerrorcodes;
/* Create a SAS session for each of the Filename Inputs */
%macro initiateParallelProcessing;
%let start=1;
%let end=&progcnt;
options noxwait noxsync;
x cd "&BatchPath";
%do i=&start. %to &end.;
/* */
/* Call the SASStart.bat with All the parameters in the Back ground mode with Systask
command */
%put;%put *************************Task executing
task&i.********************************************;
%put systask command "SASStart.bat &&progs&i." taskname="task&i." status=rc_task&i.;
systask command "SASStart.bat &&progs&i." taskname="task&i." status=rc_task&i.;
%sleepsecs(3); /* wait for 3 secs atleast for the system to stabilize with the new
App*/
%end;

*/

waitfor _all_ %tasklist(&start.,&end.);;

/* Wait for All of the tasks to complete

%checkreturncodes(&start.,&end.,task,"flag_error=N");;
Codes to Log */

/* Print All the Error

606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633

%mend initiateParallelProcessing;
%macro splitmacroval(macvar,prefix,splitcnt,dlm=' ');
%global &splitcnt; /*create the global variable for storing the splitcount*/
%let num=1; /*initialize a counter variable*/
%global &prefix# /*create the global variable for storing the first split*/
%let &prefix&num=%scan(&&&macvar,&num,&dlm); /*Scan the first value*/
%do %while(&&&prefix&num ne );
%let num=%eval(&num + 1);/*increment the counter*/
%global &prefix# /*create the global variable*/
%let &prefix&num=%scan(&&&macvar,&num,&dlm); /*scan the next value*/
%end;
%let &splitcnt=%eval(&num - 1); /*Store the split count to the macro variable reqd*/
%mend splitmacroval;

%let progPath=C:\Project\progs; /* Location of the Programs*/
%let BatchPath=C:\Project\bat; /* Location of the Batch file*/
%let prognames= prog1 prog2 prog3; /*name of the program names to execute*/
%splitmacroval(prognames,progs,progcnt,dlm=' ');/* Read more about this macro athttp://sastechies.blogspot.com/2009/11/sas-macro-to-split-macro-variables.html*/
%let flag_error=Y;
%initiateParallelProcessing;

/* Call the SAS Process to process those files */

*************************Task executing task1********************************************
systask command "SASStart.bat prog1" taskname="task1" status=rc_task1
=============Macro: sleepsecs(3)
NOTE: Task "task1" produced no LOG/Output.
NOTE: DATA statement used (Total process time):
real time
3.06 seconds
cpu time
0.04 seconds

*************************Task executing task2********************************************
systask command "SASStart.bat prog2" taskname="task2" status=rc_task2
=============Macro: sleepsecs(3)
NOTE: Task "task2" produced no LOG/Output.
NOTE: DATA statement used (Total process time):
real time
3.06 seconds
cpu time
0.04 seconds

*************************Task executing task3********************************************
systask command "SASStart.bat prog3" taskname="task3" status=rc_task3
=============Macro: sleepsecs(3)
NOTE: Task "task3" produced no LOG/Output.
NOTE: DATA statement used (Total process time):
real time
3.06 seconds
cpu time
0.04 seconds

=============Macro: Checkreturncodes(1,3,task,"flag_error=N")
=============================================================================
=============Macro: printerrorcodes(1,3,task)
The Error codes for the Above tasks are: rc_task1=0
rc_task2=0
rc_task3=0
=============================================================================

=============Macro: Conditions(1,3,task)
Flag Set: flag_error=N
634 %put The value of flag_error=&flag_error; %put;
The value of flag_error=N
635
636

%macro aggregatecode;

637 %if &flag_error eq N %then
638 %do;
639
Libname datalib "C:\Project\data";
640
data datalib.aggregatedata;
641
set datalib.data1 datalib.data2 datalib.data3;
642
run;
643
644
proc sort data=datalib.aggregatedata;
645
by name;
646
quit;
647
648 %end;
649 %else %do; %put Errors in Processing...Please check the Logs...; %end;
650 %mend aggregatecode;
651
652 %aggregatecode;
NOTE: Libref DATALIB was successfully assigned as follows:
Engine:
V9
Physical Name: C:\Project\data
NOTE:
NOTE:
NOTE:
NOTE:
NOTE:

There were 7 observations read from the data set DATALIB.DATA1.
There were 2 observations read from the data set DATALIB.DATA2.
There were 2 observations read from the data set DATALIB.DATA3.
The data set DATALIB.AGGREGATEDATA has 11 observations and 5 variables.
DATA statement used (Total process time):
real time
0.04 seconds
cpu time
0.00 seconds

NOTE: There were 11 observations read from the data set DATALIB.AGGREGATEDATA.
NOTE: The data set DATALIB.AGGREGATEDATA has 11 observations and 5 variables.
NOTE: PROCEDURE SORT used (Total process time):
real time
0.07 seconds
cpu time
0.01 seconds

Appendix # 6
Final Datasets

Download all the code and data from
https://drive.google.com/open?id=0ByjOA-zaHec0Y2FpbzN5eFpfM0U&authuser=0

Read more @ http://sastechies.blogspot.com/