Scheduling BODS Jobs Sequentially and Conditionally
Created by Anoop Kumar on Dec 25, 2012 3:26 PM, last modified on Dec 27, 2012 12:09 PM
Introduction:
This article provides various solutions for scheduling multiple BODS Batch Jobs (Jobs) sequentially and conditionally. BODS does not contain an inbuilt mechanism to chain multiple Jobs within one parent Job; the default way of working is to chain multiple Workflows within a Job. However, a Workflow cannot be executed on its own and needs a Job to execute it, and in various scenarios there is a need to sequence Jobs and run them conditionally. The approaches provided below can be used where chaining multiple Workflows within a Job is not enough and the Jobs themselves have to be chained/sequenced.
The advantages of using the approaches below are:
1. There is no need for a third-party scheduling tool. Various features within BODS can be combined to create a Job that acts as a Parent Job and can be scheduled to trigger Jobs one after the other; the Parent Job acts as a sequencer of Jobs.
2. We can avoid scheduling each and every Job and only schedule Parent Jobs.
3. Using the WebServices approach, Global Variables can be passed to the Jobs via an XML file in a simplified manner.
4. Using the WebServices approach, the developer only needs access to a folder that the JobServer can access, in order to place the XML files, and does not require access to the JobServer itself.
5. It avoids loading a Job with too many Workflows.
6. Time-based scheduling (for example, scheduling Jobs at 10-minute intervals) can be avoided; hence there will not be any overlap if the preceding Job takes more than 10 minutes.
7. As the Child Jobs and the Parent Job each have their own Trace Logs, it is easier to troubleshoot in case of any issues.
8. At any point, Child Jobs can also be run independently in the Production environment; this would not be possible if the entire Job logic were put into a Workflow.
Sequencing using Script:
When the Schedule_Jobs Parent Job is run, it triggers Job1 and then, after completion (successful completion or termination) of Job1, it triggers Job2. The Parent Job can be scheduled in the Management Console to run at a scheduled time, and it will trigger both Job1 and Job2 in sequence as required. Note that if Job1 hangs for some reason, Schedule_Jobs will wait until Job1 comes out of the hung state and returns control to it. In this way any number of Jobs can be sequenced.
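As a concrete illustration, the Parent Job's Script can launch each child Job through its exported execution command and wait for it to finish. The sketch below is not the article's original code: it assumes Job1.bat and Job2.bat were created with the Management Console's Export Execution Command option in a path the Job Server can read, and that $Job1Result and $Job2Result are declared as varchar variables of the Parent Job.

# Launch Job1 and wait for it to finish. Flag 8 makes exec() wait and
# return the command's return code together with its output (check the
# exec() documentation for the exact format of the returned string).
$Job1Result = exec('cmd.exe', '/C "D:\BODS\Launch\Job1.bat"', 8);
print('Job1 returned: ' || $Job1Result);

# Job1 has returned control, so launch Job2 the same way.
$Job2Result = exec('cmd.exe', '/C "D:\BODS\Launch\Job2.bat"', 8);
print('Job2 returned: ' || $Job2Result);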
Sequencing using Webservices:
If the same two jobs above (Job1 and Job2) have to be executed in sequence using Webservices, the approach below can be used.
1. Publish both Job1 and Job2 as Webservices from the Management Console.
2. Pick up the Webservice URL using the View WSDL option; the link will be of the form
http://<hostname>:28080/DataServices/servlet/webservices?ver=2.1&wsdlxml
3. In Designer, create a new Datastore with Datastore type WebService and provide the WebService URL fetched from the View WSDL option.
4. Create a simple Parent Job (called Simple_Schedule) to trigger Job1 and Job2.
5. In the Call_Job1 Query object, call Job1; as no inputs are required for Job1, the DI_ROW_ID from the Row_Generation transform (or NULL) can be passed to Job1.
Conditional Execution using Script:
Using a Script in the Parent Job, when the Parent Job is run it triggers Job1, and only if Job1 has completed successfully does it trigger Job2. If Job1 fails, the Parent Job is terminated using the raise_exception function. This approach can be used to conditionally schedule any number of Jobs.
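A minimal sketch of such a Script, under the same assumptions as the sequencing sketch above (exported execution commands, declared varchar variables, illustrative paths; verify the exact format of the exec() return value against the exec() documentation):

# Run Job1 and wait; flag 8 returns the return code together with the output.
$Job1Result = exec('cmd.exe', '/C "D:\BODS\Launch\Job1.bat"', 8);

# Treat a return value starting with '0' as success (an assumption; adjust
# to the actual return format in your environment).
if (substr($Job1Result, 1, 1) <> '0')
begin
    raise_exception('Job1 failed, terminating Parent Job: ' || $Job1Result);
end

# Job1 completed successfully, so trigger Job2.
$Job2Result = exec('cmd.exe', '/C "D:\BODS\Launch\Job2.bat"', 8);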
Conditional Execution using Webservices:
To conditionally execute a Job (published as a WebService) based on the status of the preceding Job (also published as a WebService), the same concept used in Conditional Execution using Script can be applied: call Job1, check the status of Job1, and if Job1 is successful, trigger Job2.
1. Create a Parent Job with two DataFlows and a Script in between the DataFlows.
2. Use the first DataFlow to call the first Job (refer to the section above for details on calling a Job as a webservice within another Job).
3. Use the second DataFlow to call the second Job.
4. Use the Script to check the status of the first Job.
Using this status-check Script (sketched below), when the Parent Job is run it will trigger Job1, and only if Job1 has completed successfully will it trigger Job2. This approach can be used to conditionally schedule any number of Jobs that are published as WebServices.
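A minimal sketch of the status-check Script in step 4. It assumes the first DataFlow writes the status returned by the webservice call to a table JOB1_STATUS in a datastore DS_JOBSTATUS, and that 'SUCCESS' is the value indicating success; all three are illustrative assumptions, not names from the original article.

# Read the status that the first DataFlow captured from the webservice reply.
$Job1Status = sql('DS_JOBSTATUS', 'SELECT STATUS FROM JOB1_STATUS');

# Terminate the Parent Job unless Job1 reported success.
if ($Job1Status <> 'SUCCESS')
begin
    raise_exception('Job1 did not complete successfully, status: ' || $Job1Status);
end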
Conditional Execution using Webservices: Jobs with Global Variables
When Jobs have Global Variables whose values need to be passed while triggering them, things need to be handled differently, because when a Job is called as a webservice it expects the Global Variables to be mapped. So the idea is to pass either NULL values (for a scheduled run) or actual values (for a manual trigger) using an XML file as input.
Let's assume that the first Job has two Global Variables, $GV1Path and $GV2Filename, that the second Job does not have any Global Variables, and that the requirement is to trigger Job2 immediately after successful completion of Job1.
1. Similar to the Parent Job above, create a Parent Job with two DataFlows and a Script in between the DataFlows.
2. Use the first DataFlow to call the first Job (refer to the sections above for details on calling a Job as a webservice within another Job), but instead of using a Row_Generation object, use an XML input file as the source.
The XSD for the input XML file is given below; if there are more Global Variables in the Job, elements GV3, GV4, and so on should be added to the schema.
<xs:schema attributeFormDefault="unqualified" elementFormDefault="qualified"
           xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="FIRSTJOB">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="GLOBALVARIABLES">
          <xs:complexType>
            <xs:sequence>
              <xs:element type="xs:string" name="GV1"/>
              <xs:element type="xs:string" name="GV2"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
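For reference, an input XML file conforming to this XSD might look like the following, with GV1 carrying the value for $GV1Path and GV2 the value for $GV2Filename (the sample values are purely illustrative):

<FIRSTJOB>
  <GLOBALVARIABLES>
    <GV1>D:\BODS\Input\</GV1>
    <GV2>customers.csv</GV2>
  </GLOBALVARIABLES>
</FIRSTJOB>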
3. In the "WebService function Call" in "call_FirstJob" Query object, map the Global Variables as shown
below
4. Use the second DataFlow to call the second Job. As this Job does not contain Global Variables, a Row_Generation object is enough (as in the previous section).
5. Use the Script object to check the status of the first Job.
Using the above approach, when the Parent Job is run it will trigger the first Job and pass the Global Variables present in the input XML file, and only if the first Job has completed successfully will it trigger the second Job. This approach can be used to conditionally schedule any number of Jobs that are published as WebServices. For every Job that has Global Variables, an XSD and an XML file should be created. The Global Variables passed from the XML file to the WebService seem to work only when the parameters are passed in the right order; hence it is good practice to name the Global Variables with a convention like $GV1<name>, $GV2<name>, and so on.
I really appreciate the quality of Anoop Kumar's recent article "Scheduling BODS Jobs Sequentially and
Conditionally". And the technical accuracy is high -- yes, you can accomplish what you are trying to do
with the techniques discussed in the article. Love the visuals, too.
However.
I cannot really recommend this kind of solution. Data Services is not an enterprise scheduling or
orchestration tool. This approach suffers a bit from Maslow's law of the instrument: "if the only tool
you have is a hammer...treat everything as if it were a nail." Yes, I love Data Services and Data
Services is capable of doing all of these things. Is it the best tool for this job?
Not exactly. And this question is answered in the first paragraph that mentions chaining workflows.
Data Services already gives you the capability to encapsulate, chain together, and provide conditional
execution of workflows. If jobs only contain one dataflow each, why are you calling them jobs and why
do you want to execute these jobs together as a unit? Data Services is a programming language like other programming languages, and some discretion needs to be exercised around encapsulation and reusability.
I do really like the use of web services for batch job launching. It is a fantastic feature that is
underutilized by DS customers. Instead, I see so many folks struggling to maintain tens and
sometimes hundreds of batch scripts. This is great for providing plenty of billable work for the
administration team, but it isn't very good for simplifying the DS landscape. The web services
approach here will work and seems elegant, but the section about "sequencing using web services"
does not sequence the jobs at all. It just sequences the launching. Batch jobs launched as web
services are asynchronous... you call the SOAP function to launch the job, and the web service provider
replies back with whether the job was launched successfully. This does not provide any indication of
whether the job has completed yet. You must keep a copy of the job's runID (provided to you as a
reply when you launch the job successfully) and use the runID to check back with the DS web service
function Get_BatchJob_Status (see section 3.3.3.3 in the DS 4.1 Integrator's Guide). [Note: scheduling
and orchestration tools are great for programming this kind of logic.]
Notice how it would be very hard to get true dependent web services scheduling in DS, since you would have to implement this kind of design inside of a batch job:
- Have a dataflow that launches Job1 and returns the runID to the parent object as a variable
- In the looping workflow, pass the runID to a dataflow that checks to see if Job1 has completed successfully
- Have a dataflow that launches Job2 and returns the runID to the parent object as a variable
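To make the shape of that design concrete, here is a conceptual sketch of the script pieces wrapped around the looping workflow. It assumes $RunID is populated by the dataflow that launched Job1 and $Job1Status by a status-check dataflow calling Get_BatchJob_Status; the variable names, loop condition, and status values are illustrative assumptions, not documented values.

# Script before the While Loop workflow: initialize the status flag.
$Job1Status = 'RUNNING';

# The While Loop's condition would be: $Job1Status = 'RUNNING'.
# Inside the loop, a dataflow passes $RunID to Get_BatchJob_Status and
# writes the returned status back into $Job1Status, followed by a
# script step that pauses between polls:
# sleep(30000);   # sleep() takes milliseconds, so this waits 30 seconds

# Script after the loop: continue to Job2 only if Job1 succeeded.
if ($Job1Status <> 'SUCCEEDED')
begin
    raise_exception('Job1 did not succeed, status: ' || $Job1Status);
end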
This convoluted design is functionally IDENTICAL to simply chaining the two workflows inside a single job, which does not rely on web services at all.
I'm also hesitant to recommend a highly customized DS job launching solution because of
supportability. When you encapsulate your ETL job launching and orchestration in an ETL job, it's not
very supportable by the consultants and administrators who will inherit this highly custom solution.
This is why you invest in a tool like Tidal, Control-M, Maestro, Tivoli, Redwood, etc., so that the
scheduling tool encapsulates your scheduling and monitoring and notification logic. Put the job
execution logic into your batch jobs, and keep the two domains separate (and separately
documentable). If you come to me with a scheduling/launching problem in your highly customized DS-based job launching solution, I'm going to tell you to reproduce the problem without the customized job launching solution. If you can't reproduce the problem in a normal fashion with out-of-the-box DS scheduling and launching, you own the responsibility for investigating the problem yourself. And this increases the cost to you of owning and operating DS.
If you really want to get fancy with conditional execution of workflows inside of a job, that is pretty
easy to do.
- Set up substitution parameters to control whether you want to run Workflow1, Workflow2, Workflow3, etc. [Don't use Global Variables. You really need to stop using Global Variables so much... your doctor called me and we had a nice chat. Please read this twice and call me in the morning.]
- Ok, so you have multiple substitution parameters. Now, set up multiple substitution parameter configurations with $$Workflow1=TRUE, $$Workflow2=TRUE, $$Workflow3=TRUE, or $$Workflow1=TRUE, $$Workflow2=FALSE, $$Workflow3=FALSE, etc. Put these substitution parameters into multiple system configurations, e.g. RunAllWorkflows or RunWorkflows12.
- In your job, use Conditional blocks to evaluate whether $$Workflow1=TRUE -- if so, run Workflow1. Else continue with the rest of the job. Then on to another Conditional that evaluates $$Workflow2... etc. (see the sketch after this list).
- Depending on which workflows you want to execute, just call the job with a different system configuration.
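A sketch of the if-expression on the first Conditional block, using the substitution parameter names proposed above (testing the flag as a string is one plausible convention, not the only one):

# If-expression of the first Conditional:
$$Workflow1 = 'TRUE'
# Then-branch: Workflow1. Else-branch: left empty, so the job falls
# through to the next Conditional, which tests $$Workflow2 the same way.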
- Yes, you can include the System Configuration name when you call a batch job via the command line or via a web service call.
For web services, you just need to enable Job Attributes in the Management Console > Administrator > Web Services (see section 3.1.1.1 step 9 in the DS 4.1 Integrator's Guide) and specify the System Configuration name inside the element:
<job_system_profile>RunAllWorkflows</job_system_profile>
For command line launching, use the al_engine flag:
-KspRunAllWorkflows
- Yes, you can override your own substitution parameters at runtime.
For web services, enable Job Attributes and specify the overrides inside the tags:
<substitutionParameters>
  <parameter name="$$Workflow1">TRUE</parameter>
  <parameter name="$$Workflow2">FALSE</parameter>
</substitutionParameters>
For command line launching, use the al_engine flag:
-CSV"$$Workflow1=TRUE;$$Workflow2=FALSE" (put the list of substitution parameters in quotes, and separate them with semicolons)
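Putting the two web service options together, the Job Attributes portion of a launch request might look like the sketch below. Both elements come straight from the notes above, but the surrounding SOAP envelope and exact nesting depend on the WSDL your system publishes, and in practice you would use either the system profile or the explicit overrides rather than both:

<!-- Select a predefined system configuration... -->
<job_system_profile>RunWorkflows12</job_system_profile>
<!-- ...or override individual substitution parameters directly. -->
<substitutionParameters>
  <parameter name="$$Workflow1">TRUE</parameter>
  <parameter name="$$Workflow2">TRUE</parameter>
  <parameter name="$$Workflow3">FALSE</parameter>
</substitutionParameters>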