
Documents needed for review process

1) Design documents (SRS, SRDS)

2) Retirement plan for existing objects

3) Source system related details

4) Impact analysis document for existing objects, if any

5) Extraction frequency (Daily, Weekly or Monthly)

6) Target system related details, if target is outside database

7) Create new Informatica and UNIX folders with the standard naming convention

8) Install guide (MD120)

9) ETL Transformation specifications

10) Checking the naming conventions

11) Review of mappings & sessions (Technical Issues)

12) FMEA document

13) Cronacle documents and Chains

14) Checking the performance

15) Co-ordinate with the development team to set up automated FTP for the source files

16) Unit test plan

17) Migrating ETL objects into QA

Design documents

The design documents such as the SRS and the SRDS should be updated, with all details present; all hyperlinks should work, and the parts that are not applicable should be clearly highlighted. Since the SRS and the SRDS are detailed documents that need considerable time and effort to review, this helps save time and effort when reviewing the entire document.

The SRDS document should furnish details such as:

• Transformation Specifications
• Capacity Plan
• Test Plan
• Impact Analysis Document
• High Level Logical Data Model

GE Confidential
Transformation Specifications

The transformation specifications should contain details about all the transformations used in the mappings; the transformation logic should be explained at a high level. They should also have information about the source and target types, i.e. whether the source or target is a flat file or an RDBMS.

Capacity Plan

The database capacity plan should be highlighted.

Test Plan

The Unit Test Plan and the System Test Plan should be prepared, highlighting all the test cases, and should state whether the entire module passed all the tests in the development environment. Any unusual errors encountered should be mentioned.

Informatica Project Review (M1, M2-M3, M4)

MILESTONES CHECK POINTS

M1 1. SRS and SRDS documents should be correct from a documentation
perspective, e.g. the hyperlinks should work. Not applicable points should be
clearly indicated.
2. SRDS should have details about the Transformation specs, Capacity Plan,
Test Plan, Impact Analysis document, High level Logical model etc.
3. Standard templates should be used for Transformation specification,
Capacity Plan.
4. If a new project is coming (not an enhancement or remediation), check
whether Oracle or Teradata is being used as the target database. If they are
going to load into Oracle, we need to ask the reason they are not going for
Teradata.
M2-M3 1. Detailed transformation specifications with all details, such as the source or
target system database/schema name, transformation logic for each individual
target column, etc.
2. For Dos and Don’ts, Informatica Developers Coding standard should be
referred. Available in BI Ops Manual.
3. One mapping, one session, one workflow. Reason: workflows are scheduled
by Cronacle, and debugging of failures becomes easier.
4. In one mapping, ONLY one independent flow from source to target is
ALLOWED. More than one independent flow from source to target is NOT
ALLOWED; in such a case, create separate mappings, sessions and
workflows.
5. In the workflow, the options "Fail parent if this task fails" and "Fail parent if this
task does not run" should be checked for each task/session.
6. Mention clearly in MD120 if you are changing the default values of session
properties such as commit interval, DTM buffer size, Enable high precision, etc.
7. Informatica mappings should be tuned properly. Unwanted transformations
(especially Expression) should be avoided. Refer to the Developers Coding
Guidelines.
8. In lookups, the "O" (output) option should be unchecked for unwanted ports
(ports not used in lookup conditions or as output ports).
9. Hard coding in SQL overrides (in the Source Qualifier or Lookup SQL override)
should be avoided, or made generic where possible. E.g. fiscal month, year,
and date related logic should be implemented dynamically instead of being
hard coded, if possible.

10. In the loading strategy, Teradata MLOAD or FASTLOAD should preferably be used
rather than the Teradata relational ODBC connections.
11. All log file and bad file paths should be those given by the production support
team. The reason is the difference in root directory structure between the
gemsdw2/gemsdw1 and gemsdw8p servers for the Informatica setup, and the
default values set for the Informatica server variables.
12. E.g. for /ftp/ on gemsdw1 or gemsdw2, the relevant directory on gemsdw8p is
dwftp/. GEMSDW8P is a secured server; being secured, no files can be FTPed
to it from external servers using normal FTP. So currently the data files come
to the GEMSDW2 server, from which the Informatica 6.2 server (set up on
GEMSDW8) can read the files.
Caution: If you are manipulating the output flat files of Informatica mappings or
data files, the scripts operating on these files should be on GEMSDW8P, and the
data files should be FTPed to the relevant directory on GEMSDW8P to make them
available to the scripts.
13. MD120 is the baseline document for the BI Operations team from the migration
perspective (across the various environments dev/QA/Preprod/Prod). It should
be 100% accurate and provide all the information needed for migration, e.g.
replace/reuse instructions for source/target definitions, mapplets, and
especially reusable components; Sequence Generator current values after
moving into production; Lookup transformation names and their location
details (database user name@database name); source and target details if a
database (db user name@database name for Oracle/Teradata), along with
Teradata MLOAD or FASTLOAD information and whether insert/update/upsert
or delete mode should be used for external loaders. If the source is flat files,
clearly mention the paths and names of the flat files. Mention clear
instructions for changing the paths of input files mentioned in indirect files
used by the Informatica session, if required. In case of a change control meant
for bug fixing or enhancement, mention the changes done in mappings, with
the mapping name and changes.
14. MD120 should have the database details of the environment to which the code
is going to be migrated. E.g. after M3, the Informatica code will be migrated to
test, so the database details should be those of the test environment,
especially for Teradata.
15. Avoid calling large, complicated stored procedures from Informatica mappings
if the logic can be coded in the mappings themselves.
16. In the case of lookups, if your lookup resides in the source or target database,
use $Source or $Target instead of the source or target database ODBC
connection value, and mention the connection details in the Properties tab of
the session (if reusable; otherwise in the tasks).
17. Loader connections or ODBC connections should not be created by the
program team in the development repository; the BI Operations team should
be informed by entering an SPR.
18. From a performance tuning point of view: if the Source Qualifier is followed by
an Aggregator, the sorted input option should be used if feasible. Lookups on
the source database can, where possible, be moved into the Source Qualifier
SQL override, depending on the performance of the query. Filters should be
closer to the source. If you are only updating or only inserting, use the "treat
rows as" update/insert option as required rather than an Update Strategy with
rows treated as data driven. If you are using variables in expressions, the
sequence of ports should be input, variable, then output ports. If possible, use
a SQL override in lookups with filter conditions to bring only the required data
from the database into the cache.
19. If the sources are files, the details of the source files should be provided in
MD120, such as from which system they come, what the frequency is, who the
contact person is, etc.
20. As decided, migration from development to test and from test to preprod is
done by changing the existing connection details to the required environment's
connection details. E.g. INV_DEV_INS will be changed to INV_TEST_INS, and
internally it will point to the test environment connections.
21. No pre-session or post-session commands are allowed. Any such activity
should be a step in the Cronacle job chain; this makes it easier to debug
failures.

22. In the installation instructions, the repository names and folders should be
clearly mentioned. If a particular object (mapping/workflow/session etc.)
already exists, then MD120 should clearly state that a backup of the existing
object needs to be taken prior to migration.
23. Paths for the session log, bad files, and workflow log file for a particular
project should be used in development as below. This needs to be changed for
all workflows.

Reject File Directory: $PMBadFileDir/<Application_folder>/
Workflow Log Directory: $PMWorkflowLogDir/<Application_folder>/
Session Log Directory: $PMSessionLogDir/<Application_folder>/

The following relevant directory structure should be used if required:

Source File Directory: $PMSourceFileDir/<Application_folder>/
Output and Merge File Directory: $PMTargetFileDir/<Application_folder>/
Parameter File Directory: $PMSourceFileDir/<Application_folder>/scripts/
User Created Scripts Directory: /ftp/scripts/<Application_folder>/

24. DB links are not allowed in Source Qualifier SQL overrides.
25. The tracing level for all transformations should be NORMAL.
26. If the source data contains Chinese/Japanese characters, the code page for
the Oracle database connections should be UTF-8.
27. If the source data contains Unicode characters, it is mandatory to use the
UTF-8 code page when creating Teradata relational connections. If MLOAD or
FASTLOAD is being used, the output file properties should be set to the UTF-8
code page.
28. A step to generate the reject records log and send it by email should be
added at the end of the Cronacle job chain.
29. Source and target definition changes should be done by importing the
definitions from the database instead of changing them MANUALLY in
Informatica Designer.

M4 1. MD120 should have the database connection details of the Teradata preprod
environment.
2. The folder to be migrated from the Americas Test Repository to the Americas QA
Repository should be clean, i.e. it should contain only the mappings, sessions and
workflows that need to be migrated.
References:
1. http://gemsbidev.med.ge.com/opsmanual/
2. http://uswaubus02medge.med.ge.com/webhouse/reports/BI_Life_Cycle.htm

Informatica Standards

1. Naming Convention for Informatica Objects

Object Type: Naming Convention

Folder: XXX_<Data Mart Name>
Mapping: m_XY_<Target Table Name>_<OPR>_V
  where XY = DL for 'Daily Load', ML for 'Monthly Load'
  <Target Table Name> =
    For Teradata: the ETL view name (for Dim & Fact) or stage table name
    For Oracle: the target table name
  OPR = INS (Insert), DEL (Delete), UPD (Update), UPS (Upsert)
  V = version number, e.g. 1, 2, 3 etc. It should not contain a dot "."

Session: s_<Mapping Name>[_optional session version]
Workflow: wkf_<Session Name>
  Note: If any temporary workflow is created for testing purposes, prefix the
  workflow name with the word "TEST"
Source Definition: <Source Table Name>
  For Teradata: the ETL view name (for Dim & Fact) or stage table name
  For Oracle: the source table name
Target Definition: <Target Table Name>
  For Teradata: the ETL view name (for Dim & Fact) or stage table name
  For Oracle: the target table name
Aggregator: AGG_<Purpose>
Expression: EXP_<Purpose>
Filter: FLT_<Purpose>
Joiner: JNR_<Names of Joined Tables>
Lookup: LKP_<Lookup Table Name>
  <Lookup Table Name>:
    For Teradata: the ETL view name (for Dim & Fact) or stage table name
    For Oracle: the target table name
Normalizer: NRM_<Source Name>
Rank: RNK_<Purpose>
Router: RTR_<Purpose>
Sequence Generator: SEQ_<Target Column Name>
Source Qualifier: SQ_<Source Table Name>
Stored Procedure: STP_<Database Name>_<Procedure Name>
Update Strategy: UPD_<Target Table Name>_xxx
Mapplet: MPP_<Purpose>
Input Transformation: INP_<Description of Data being funneled in>
Output Transformation: OUT_<Description of Data being funneled out>
Database Connections: XXX_<Database Name>_<Schema Name>
ODBC Connection Name: <Schema_Name / Database_Name>
  For Oracle use the schema name, e.g. INDLOAD
  For Teradata use the database name, e.g. SRC_ETL_TARGET
  NOTE: Do not use any suffix or prefix to the ODBC connection name, e.g.
  SRC_ETL_TARGET_1 or SRC_ETL_TARGET_test. There should be only one
  ODBC connection per database for the same project.

2. Naming Convention for Ports

Port Type: Naming Convention

Input Only: I_<Field Name> (port name prefixed by "I_")
Output Only: O_<Field Name> (port name prefixed by "O_")
Input-Output: <Field Name>
Variable Port: V_<Field Name> (port name prefixed by "V_")
Mapping Parameter: P_<Parameter Name> (parameter name prefixed by "P_")
All ports: port names should be in capitals

3. Naming Convention for Parameters/ Paths

Path/Parameter: Naming Convention

Session Log: <Session Name>.log
Workflow Log: <Workflow Name>.log
Output File Name: <Target Table Name>.out
  <Target Table Name>:
    For Teradata: the ETL view name (for Dim & Fact) or stage table name
    For Oracle: the target table name
  NOTE: The file name should not be more than 30 characters in length
Parameter File Name: src_did_<purpose>_param.txt
  Note: A single parameter file for multiple sessions is preferred unless
  there is some specific requirement.
Source File Directory: $PMSourceFileDir/xxx/
  where xxx = <3-letter project code in lower case>
Target File/Merge Directory: $PMTargetFileDir/xxx/
  where xxx = <3-letter project code in lower case>
Session Log File Directory: $PMSessionLogDir/xxx/
  where xxx = <3-letter project code in lower case>
Workflow Log File Directory: $PMWorkflowLogDir/xxx/
  where xxx = <3-letter project code in lower case>
Parameter File Directory: /ftp/scripts/xxx/
  where xxx = <3-letter project code in lower case>
Directory for User-Created Shell Scripts: /ftp/scripts/xxx/
  where xxx = <3-letter project code in lower case>
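The path and file-name conventions above can be sketched as small helpers, including the 30-character limit on file names. The function names and the example project code "abc" are assumptions for illustration only:

```python
def session_log_name(session_name: str) -> str:
    # Session logs are named <Session Name>.log per the standard.
    return f"{session_name}.log"

def output_file_name(target_table: str) -> str:
    # Output files are named <Target Table Name>.out; the standard caps
    # file names at 30 characters.
    name = f"{target_table}.out"
    if len(name) > 30:
        raise ValueError(f"file name longer than 30 chars: {name}")
    return name

def log_dir(base_var: str, project_code: str) -> str:
    # base_var is an Informatica server variable such as $PMSessionLogDir;
    # project_code is the 3-letter project code in lower case.
    return f"{base_var}/{project_code}/"

print(output_file_name("SALES_FACT"))      # SALES_FACT.out
print(log_dir("$PMSessionLogDir", "abc"))  # $PMSessionLogDir/abc/
```

Centralizing path construction like this is one way to keep sessions consistent with the server-variable based directories the production support team expects.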

4. Review & Migration Checklist:

4.1 Naming convention for Informatica objects (refer to the Naming
Convention for Informatica Objects section in this document).

4.2 Naming convention for Informatica parameters/paths (refer to the
Naming Convention for Parameters/Paths section in this document).

4.3 Mapping Level Check List –

1) SQL Override: No SQL override should be used in the Source Qualifier unless
it is used to join multiple tables. It should not be overridden just to apply a
filter; in such a scenario, mention the filter in the filter section instead.
2) Hard-Coded DB Name: There should not be any hard-coded database name
in any SQL override. This applies to SQL overrides used in the SQ or Lookup.
3) Filter Usage: Use the filter in the SQ if possible, or as close as possible to
the SQ.
4) Update Strategy Transformation: While loading to a Teradata target, use
UPSERT/DELETE/UPDATE logic at the session level instead of an UPDATE
STRATEGY transformation.
5) Unconnected Ports: No ports should be connected forward if they are not
used by any transformation.
6) Connected vs. Unconnected Lookup: If the same result can be achieved by
both a connected and an unconnected Lookup, the connected Lookup should
be used.
7) Usage of Lookup: If a Lookup can be avoided by moving its logic into the SQ,
it is suggested to do so.
8) Caching: Caching should be enabled in Lookups.
9) Joiner: A Joiner should not be used to join two tables from the same database.
10) Target Load Order: Multiple "Target Load Order" settings should not be used
in a single mapping.
11) Filters Used for Testing: All filters or customizations done in the mapping for
testing purposes have to be removed.
12) Sequence Generator: While migrating to QA/PREPROD (M3/M4 review), the
initial value should be set to 1.
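The "no hard-coded DB name" rule from the checklist above can be checked mechanically by scanning SQL overrides for environment-specific schema prefixes. This is a hypothetical sketch; the schema names in HARDCODED_SCHEMAS are made-up examples, and a real check would load the list from project configuration:

```python
# Made-up schema/database prefixes that would indicate hard coding.
HARDCODED_SCHEMAS = ("DEV_DB.", "TEST_DB.", "PROD_DB.")

def find_hardcoded_schemas(sql_override: str) -> list:
    """Return the environment-specific schema prefixes found in a
    SQL override (Source Qualifier or Lookup)."""
    sql_upper = sql_override.upper()
    return [s for s in HARDCODED_SCHEMAS if s in sql_upper]

sql = "SELECT * FROM PROD_DB.SALES_FACT WHERE LOAD_DT = CURRENT_DATE"
print(find_hardcoded_schemas(sql))  # ['PROD_DB.']
```

Any non-empty result would flag the mapping for rework before the M3 review, since such references break when the code is migrated between environments.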

4.4 Session Level Checklist –

4.4.1 Session Properties > General Tab:


1) Check the "Fail parent if this task fails" check box.
2) Check the "Fail parent if this task does not run" check box.
3) Put a one- or two-line description for the session in the Description field.

4.4.2 Session Properties > Session Properties Tab:

4.4.2.1 General Options Sub Tab:


1) The session log file name should be as per the standard.
2) The session log directory name should be as per the standard.
3) The parameter file name should be empty if no parameter file is used.
4) The "Enable Test Load" check box should be unchecked.
5) $Source & $Target connection values should be provided. Exception: when
the source/target type is flat file.
(Note: the ODBC connection name should be mentioned in case an external
loader connection is used.)
6) The rest of the options should be left as default unless otherwise required.

4.4.2.2 Performance Sub Tab:


1) All the options should be left as default unless there is a specific
requirement. Any changes to default options should be justified.

4.4.3 Session Properties > Config Object Tab:

4.4.3.1 "Advanced" Sub Tab:


1) All the options should be left as default unless there is a specific
requirement. Any changes to default options should be justified.

4.4.3.2 "Log Options" Sub Tab:


1) Set the "Save session logs for this session" option to 1.

4.4.3.3 "Error Handling" Sub Tab:


1) All the options should be left as default unless there is a specific
requirement. Any changes to default options should be justified.

4.4.4 Session Properties > Sources Tab:

4.4.4.1 "Connections" Sub Tab:


1) If it is required to transfer the source file from a different server, do not use
the Informatica FTP connection. It should be done through a separate FTP
script called by Cronacle.
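A stand-alone FTP script of the kind a Cronacle step would call might look like the sketch below. The host name, credentials, and paths are placeholders; the helper names are hypothetical, and this only illustrates the "separate script instead of Informatica FTP connection" rule:

```python
from ftplib import FTP

def retr_command(remote_path: str) -> str:
    # Build the FTP RETR command string for a remote file.
    return f"RETR {remote_path}"

def fetch_source_file(host, user, password, remote_path, local_path):
    """Download one source file in binary mode via plain FTP."""
    ftp = FTP(host)
    try:
        ftp.login(user, password)
        with open(local_path, "wb") as f:
            ftp.retrbinary(retr_command(remote_path), f.write)
    finally:
        ftp.quit()

# Example invocation (placeholder values only):
# fetch_source_file("source.example.com", "etl_user", "secret",
#                   "/outbound/sales.dat", "/ftp/abc/sales.dat")
```

Keeping the transfer in its own script lets Cronacle retry or alert on the FTP step independently of the Informatica session, which matches the debugging rationale stated elsewhere in this document.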

4.4.5 Session Properties > Target Tab:

4.4.5.1 "Connections" Sub Tab:


1) If it is required to transfer the target file to a different server, do not use the
Informatica FTP connection. It should be done through a separate FTP script
called by Cronacle.
2) The merge file name & directory should be empty if partitioning is not used.
3) The output file path should be set as mentioned in the "Naming Convention
for Parameters/Paths" section.

4.4.6 Session Properties > Components Tab:


1) Pre-Session Command, Post-Session Success Command, and Post-Session
Failure Command should be set to "None". If any such activity is required, it
should be done through a Cronacle step.
2) "On Success E-mail" and "On Failure E-mail" should be set to "None". The
notification method should be set up through the Cronacle chain.

4.4.7 Session Properties > Transformation Tab:


1) The Lookup connection should be specified as an indirect connection, i.e.
$Source or $Target. A direct reference to a relational DB connection should
not be specified.
2) The SQL or filter condition should not be overwritten; it should be the same
as in the mapping. Exception: multiple sessions are used for a single mapping,
and different filter conditions or SQL overrides are used in different sessions.
In such a scenario, the DB name should not be hardcoded.

4.5 Checklist for MD120:
1) All mapping, session, and workflow names that need to be migrated should
be mentioned correctly.
2) All source & target connections should be mentioned correctly. In case the
source is a flat file, the path and file name should be mentioned. During M3
migration, mention the DB connection or path with respect to the TEST/QA
server; during the M4 review it should be done for the PREPROD server.
3) In case of loading to Teradata, the loader type (MLOAD/FLOAD) and type of
load (Insert, Update, Delete, Upsert, or Trunc-Insert) should be clearly
mentioned for each session.
4) Provide all the Lookup database connection information for each mapping.
5) If any of the CTL files have been created as READ-ONLY, please provide clear
instructions to migrate them into the respective environment and keep them
READ-ONLY. Also provide the CTL file name, current location, and instructions
to change the logon information, database name & source file path for each
CTL file. Keep a backup of these READ-ONLY CTL files.
6) Provide a list of the source/target tables expected to be present before
loading starts in the new environment.
7) Provide a list of the Lookup tables expected to be present before loading
starts in the new environment.
8) Provide the list of UNIX scripts to be migrated, their locations, and the
changes to be made in the scripts during migration.
9) Provide instructions to set the Initial Value and Current Value to 1 for all
Sequence Generators.

Cronacle Standards and Check lists

Source Files

All source files need to be defined with their server locations, file names, times of
arrival, and estimated sizes.

Time Windows
All scripts to be defined with their scheduled start and scheduled finish times.

Project
Each script should be part of one and only one project. The project name should be
defined using the following nomenclature:
<Value Chain>_<Module>_<Program>_<Project>. All the sub-parts should be in sync
with the End-to-End Inventory.
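The project nomenclature above can be sketched as a small helper that assembles and validates the four parts. The example part values ("OTR", "Finance", etc.) are made up for illustration and are not taken from the End-to-End Inventory:

```python
def project_name(value_chain: str, module: str, program: str, project: str) -> str:
    """Build a Cronacle project name following
    <Value Chain>_<Module>_<Program>_<Project>."""
    parts = [value_chain, module, program, project]
    if not all(parts):
        # Every sub-part is required by the nomenclature.
        raise ValueError("all four name parts are required")
    return "_".join(parts)

print(project_name("OTR", "Finance", "Billing", "Q1Load"))
# OTR_Finance_Billing_Q1Load
```

Generating the name from its parts, rather than typing it by hand, makes it easier to keep the sub-parts in sync with the End-to-End Inventory.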

Notification

For each script, email IDs need to be specified for the people who will be notified in
case of failure, success, and overdue runs. Contact details will be used only in case
of emergencies and for error mails. The BI mailing list and Func mailing list will be
used only for sending delay mails.

Dependencies

In case any job is dependent on flag files or job chains, this needs to be specified in
the dependencies, and the name of the job that creates the flag file should also be
mentioned.

Checklists

Job chain

1. The job chain name should follow the standards.
2. There should be one worksheet for each job chain, with the job chain name
as the title of the worksheet.
3. All contact details are mandatory: email/distribution list as well as
telephone number.
4. The login ID is mentioned.
5. The job chain has been uploaded at the path mentioned in CVS.
6. The job chain has been completely tested. This includes all scripts and
dependencies.
7. All input files are validated by Screen door.
8. Email lists have been provided.
9. All fields under scheduling information (marked in RED) are mandatory.
10. All fields are mandatory. Please mention 'NA' wherever not applicable.

Dependencies
1. All Events are mentioned and are part of the job chain script provided in
CVS
2. Are all source tables part of the dependent job chains
3. In case of File Events the location of the source as well as the contact
details are mandatory
4. What will be the corrective action in case the input data file doesn't arrive?

Scripts
1. If project-specific scripts are used, the description should explain the
purpose of creating such a script as opposed to using the generic script.

Time Window
1. Check if any of the source tables are being loaded in the time frame when
the job chain is expected to run.

Source / Lookup Tables / Database Objects / Flat Files :

Are all source tables, lookup tables, database objects, and dependent flag files
mentioned?
