AUTOMATION
All about knowledge of datawarehouse and automation of routine tasks
ETL : JOB CONTROL TABLE AND ITS IMPLEMENTATION FOR INCREMENTAL LOAD.
data since the last run date of ETL jobs. The diagram below assumes Informatica as the ETL tool. The same
approach can be implemented in other ETL tools with some modifications.
Initial Values in ETL Control table : The High and Low Watermark dates are initially set to
1/1/1900 0:00 (midnight), and a row with process name = <name of the job> is inserted into the Job Control
table for each dataflow job. This can be done in the deployment script as a one-time activity.
ETL_Control_Table
-1 | wf_Patient | 1/1/1900 0:00 | 1/1/1900 0:00 | NULL | NULL
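The one-time seeding step could look roughly like the sketch below. It uses Python with an in-memory SQLite database purely for illustration. The table name etl_control_table, its column names, and the helper create_and_seed_control_table are assumptions, not taken from the original design, and the dates are stored as ISO strings so that they sort chronologically.

```python
import sqlite3

# Assumed ISO representation of the 1/1/1900 0:00 initial watermark.
INITIAL_WATERMARK = "1900-01-01 00:00:00"


def create_and_seed_control_table(conn, process_names):
    """Create the control table (hypothetical schema) and seed one row per job."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS etl_control_table (
            batch_id        INTEGER NOT NULL,
            process_name    TEXT    NOT NULL,
            low_watermark   TEXT    NOT NULL,
            high_watermark  TEXT    NOT NULL,
            status          TEXT,
            error_message   TEXT,
            PRIMARY KEY (batch_id, process_name)
        )
        """
    )
    for name in process_names:
        # Batch id -1 marks the seed row; OR IGNORE keeps the script re-runnable.
        conn.execute(
            "INSERT OR IGNORE INTO etl_control_table "
            "VALUES (-1, ?, ?, ?, NULL, NULL)",
            (name, INITIAL_WATERMARK, INITIAL_WATERMARK),
        )
    conn.commit()


conn = sqlite3.connect(":memory:")
create_and_seed_control_table(conn, ["wf_Patient"])
seed_rows = conn.execute("SELECT * FROM etl_control_table").fetchall()
print(seed_rows)
```

Because the seed insert ignores duplicates, the deployment script can be run again without creating a second seed row for the same job.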
Explanation of the flow:
1. The Batch Identifier is a sequentially generated number that is unique for each run of the jobs. A batch id is
generated when the jobs start, and the batch start date is inserted into the batch table. The Batch End Date is
updated when the batch completes. The batch table is used to monitor the performance of the jobs over time.
2. The dataflow jobs, which run after the Batch Identifier job, get the previous successful run of
the respective dataflow from the Job Control table. The High Watermark date of that run is used
as the Low Watermark date of the current run.
High Watermark date of the current run = max date in the source system.
Low Watermark date = High Watermark date of the most recent successful run.
3. Once the dataflow completes, its execution status is recorded in the Job Control table along with the
Low Watermark and High Watermark. This record is used to get the Low Watermark of the next run.
In case of failure, the error message is also recorded in the control table.
used and on completion Batch end date will be updated in the batch table.
ETL_Control_Table
-1 | wf_Patient | 1/1/1900 0:00 | 1/1/1900 0:00 | NULL | NULL
2 | wf_Patient | 8/6/2014 16:35 | 8/6/2014 16:37 | 8/4/2014 0:00 | 8/5/2014 0:00 | E | Error | Data too large for column
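The failed run in the table above illustrates why the lookup uses the previous successful run rather than simply the latest row. The sketch below, again in Python with SQLite, reproduces that situation. The column names, the status code 'S', the helper functions, and the successful batch 1 row are assumptions added for illustration. Only the seed row and the failed batch 2 row mirror the table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE etl_control_table (
           batch_id INTEGER, process_name TEXT,
           low_watermark TEXT, high_watermark TEXT,
           status TEXT, error_message TEXT)"""
)
conn.executemany(
    "INSERT INTO etl_control_table VALUES (?, ?, ?, ?, ?, ?)",
    [
        # Seed row inserted at deployment time.
        (-1, "wf_Patient", "1900-01-01 00:00:00", "1900-01-01 00:00:00",
         None, None),
        # Hypothetical earlier successful run (not shown in the original table).
        (1, "wf_Patient", "1900-01-01 00:00:00", "2014-08-04 00:00:00",
         "S", None),
        # Failed run, as in the table: status E with its error message.
        (2, "wf_Patient", "2014-08-04 00:00:00", "2014-08-05 00:00:00",
         "E", "Data too large for column"),
    ],
)


def next_low_watermark(conn, process_name):
    """High watermark of the latest successful run; failed runs are skipped."""
    row = conn.execute(
        """SELECT high_watermark FROM etl_control_table
           WHERE process_name = ? AND (status = 'S' OR batch_id = -1)
           ORDER BY batch_id DESC LIMIT 1""",
        (process_name,)).fetchone()
    return row[0]


def last_error(conn, process_name):
    """Batch id and message of the most recent failed run, if any."""
    return conn.execute(
        """SELECT batch_id, error_message FROM etl_control_table
           WHERE process_name = ? AND status = 'E'
           ORDER BY batch_id DESC LIMIT 1""",
        (process_name,)).fetchone()


low = next_low_watermark(conn, "wf_Patient")
err = last_error(conn, "wf_Patient")
print(low, err)
```

Because the failed batch is ignored when picking the low watermark, the next run covers the same window again, so the rows that batch 2 failed to load are not lost.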