Specifications
<Month Year>
for <Client name> <Project Name>
Design Specifications
Notice
Client:
Project:
Document Name:
Table of Contents
1 Introduction
1.1 Objective and Scope
1.1.1 In Scope
1.1.2 Out of Scope
1.2 Reference
1.3 Structure of the document
1.4 Acronyms
2 ETL Architecture
2.1 Logical Architecture
2.1.1 Source System
2.1.2 Staging Area
2.1.3 ODS
2.1.4 EDW
2.1.5 DM
2.2 Physical Architecture
2.3 Sources and Targets System Details
2.3.1 Source System 1
2.3.2 Target System 1
2.3.3 Volumetric
2.4 Schedule and Frequency
2.4.1 Execution Order
2.4.2 Frequency
2.4.3 Scheduler Information
2.5 Historical Data Loading
2.6 ETL Control Schema Design
2.6.1 Data Validation Rules
2.6.2 Control Schema Tables
2.6.3 Reconciliation
2.6.4 Exception Handling
3 Release Management
3.1 Configuration Management
3.2 Security
3.3 Deployment Options
4 Reusable Components
4.1 Transformations
4.2 Mapplets
4.3 User Defined Functions
4.4 Parameters and Variables
4.4.1 Parameter File Format
5 Testing Strategy
5.1 Testing Environments
5.2 Unit Testing
5.2.1 Approach
5.2.2 Deliverables
5.3 System Integration Testing
5.3.1 Approach
5.3.2 Deliverables
Appendix A: Traceability to SRS
Appendix B: File Formats
Appendix C: Standards and Best Practices
Appendix D: Mapping Inventory
1 Introduction
1.1 Objective and Scope
<<Client name>> is deploying xxx product as <<an ETL Solution that’s part of a
BI initiative/a Data Migration Platform/a Data Consolidation Platform/a Data
Quality Platform/a Tool Migration Platform>>.
This document covers the detailed design of the ETL layer that extracts data from
<sources> and loads the transformed data to <target>.
1.1.1 In Scope
Source to target mapping flow
Data Cleansing Rules
Data Validation and Reconciliation Rules
Execution Order and Scheduling
Security Model
Release Management
Testing Strategy
1.2 Reference
Product Best Practices
Product Lessons Learnt
Project SRS document
Product Manuals
<Any other reference material>
1.4 Acronyms
2 ETL Architecture
This chapter details the components that make up the ETL environment.
Logical and physical deployment architectures, execution order (scheduling),
connectivity and details of the source and target systems are elaborated in
the following sub-sections.
Source System -> Staging Area -> ODS -> EDW -> DM
2.1.3 ODS
Holds clean and standardized transactional data
2.1.4 EDW
Holds facts and dimensions
2.1.5 DM
Holds subject area specific summaries/snapshots
The table below summarizes ETL details for the various transformation phases. The
values filled in are sample information.
ETL Stage | Source Type | Extraction Strategy | Extract Mechanism | Target Type | Load Mechanism | Data Retention Period | Frequency of Loads
Source to Staging | 1. SAP; 2. Oracle | 1. Incremental; 2. Incremental | 1. Push; 2. Pull | Oracle | Direct | 1 load cycle | Daily
Staging to ODS | Oracle | Full extracts | Pull | Oracle | Direct | 1 fiscal month | Daily
ODS to EDW | Oracle | Changed Data | Pull | Teradata | Indirect | History (no deletes) | Daily
Extraction Strategy:
Full Extracts - All the data from the source is extracted for transformation and
loading.
Incremental Extracts - Date stamps or flags available on the source tables are
used to select the data inserted, updated or deleted in a load cycle, which is
then fetched and passed on for further processing. A control schema can also be
used to identify the source system records to be fetched for a run cycle.
Changed Data - All the data that has changed in the source system is fetched.
Products like PowerExchange can be used to capture changed data. Otherwise, a
complete comparison of the source tables with the target tables can identify
changed data. Date stamps or flags, as described under Incremental Extracts,
can also be used.
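The date-stamp approach described under Incremental Extracts can be sketched as follows. This is only an illustration under stated assumptions: the table src_orders, its columns and the cutoff values are hypothetical, and an in-memory SQLite database stands in for the real source system.

```python
import sqlite3


def extract_incremental(conn, last_cutoff, new_cutoff):
    """Fetch rows whose date stamp falls in the window (last_cutoff, new_cutoff]."""
    cursor = conn.execute(
        "SELECT order_id, amount, updated_on FROM src_orders "
        "WHERE updated_on > ? AND updated_on <= ? ORDER BY order_id",
        (last_cutoff, new_cutoff),
    )
    return cursor.fetchall()


# Hypothetical source table with a date-stamp column.
# ISO-formatted dates compare correctly as text.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE src_orders (order_id INTEGER, amount REAL, updated_on TEXT)"
)
conn.executemany(
    "INSERT INTO src_orders VALUES (?, ?, ?)",
    [(1, 10.0, "2006-09-08"), (2, 20.0, "2006-09-09"), (3, 30.0, "2006-09-10")],
)

# The previous run stopped at 2006-09-08, so only later rows are fetched.
rows = extract_incremental(conn, "2006-09-08", "2006-09-10")
```

In practice the previous cutoff would typically be read from, and written back to, a control table at the end of each run cycle.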
Extract Mechanism:
Push Mechanism - The source system provides data to the ETL engine in file
format. These could be flat files or normalized files that are transferred via
FTP to a server machine, e.g. COBOL files pushed via JCL.
Pull Mechanism - The ETL engine extracts data from the source system via ODBC or
native connectivity. Extraction via adapters for mainframes and
applications (such as SAP, PeopleSoft, etc.) is also a Pull mechanism.
Load Mechanism:
Direct - Data is directly loaded to target tables after transformations via native
or ODBC connectivity.
Indirect - Data is written to flat files or intermediate structures after
transformations. These are passed to loader utilities to be committed on the
target system.
Environment | Development | System Integration Test | Pre-production | Production
Domain Name | | | |
Node Name | | | |
Operating System | | | |
No. of CPUs | | | |
32-bits/64-bits environment | | | |
RAM (in GB) | | | |
Services Running on the node | | | |
Backup Node | | | |
Repository Database | | | |
2.3 Sources and Targets System Details
This section provides information on the target tables that are loaded from one or
more source tables.
Any custom tables generated for ETL purposes, such as those for Error Handling,
Reconciliation and Reference data, are also to be included in this section. All
custom tables should ideally map only to the target system. These tables are
termed Control tables and provide metadata on data errors and the ETL load
process.
The following table can be used as a reference for detailing Source System type
and Connectivity columns
The following table lists the source tables from which data will be extracted.
The following table lists the target tables to which data will be loaded as part of
the ETL program:
2.3.3 Volumetric
Sources
S No | Load Type | Source table/File | Row Size (in KB) | Number of Rows | Estimated Size in MB/GB
 | Initial Load | | | |
 | Regular/Scheduled Load | | | |
Dimensions
S No | Load Type | Target table | Number of columns | Row size in KB | SCD Type | Number of history rows | Number of New Rows | Number of Updated Rows | Estimated Volume in MB/GB
 | Initial | | | | | | | |
 | Regular/Scheduled | | | | | | | |
Facts
S No | Load Type | Target table | Number of columns | Row size in KB | Number of history rows | Number of New Rows | Number of Updated Rows | Estimated Volume in MB/GB
 | Initial | | | | | | |
 | Regular/Scheduled | | | | | | |
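The estimated size columns in the volumetric tables can be derived from the row size and row count columns. A minimal sketch, assuming 1 MB = 1024 KB; the row sizes and row counts used here are invented sample figures, not values from any project.

```python
def estimated_size_mb(row_size_kb: float, number_of_rows: int) -> float:
    """Estimated volume in MB, assuming 1 MB = 1024 KB."""
    return row_size_kb * number_of_rows / 1024


# Hypothetical initial load: 2,000,000 rows of 0.5 KB each.
initial_mb = estimated_size_mb(row_size_kb=0.5, number_of_rows=2_000_000)

# Hypothetical regular load: new and updated rows both add to the volume moved.
new_rows, updated_rows = 8_000, 2_000
regular_mb = estimated_size_mb(0.5, new_rows + updated_rows)

print(f"Initial load: {initial_mb:.1f} MB, regular load: {regular_mb:.1f} MB")
```

Whether updated rows add to stored volume depends on the SCD type: an update that creates a new history row grows the table, while an overwrite in place does not.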
2.4 Schedule and Frequency
This section details the data flow from source to target system based on:
Execution order of various ETL routines
Dependencies between ETL routines
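One way to derive an execution order from the dependencies between ETL routines is a topological sort. The sketch below uses Python's standard graphlib module (Python 3.9 or later); the routine names are hypothetical and simply mirror the Staging, ODS, EDW and DM layers of the logical architecture.

```python
from graphlib import TopologicalSorter

# Hypothetical routine names. Each key maps to the set of routines
# that must finish before it can start.
dependencies = {
    "load_staging": set(),
    "load_ods": {"load_staging"},
    "load_edw": {"load_ods"},
    "load_dm": {"load_edw"},
}

# static_order() yields every routine after all of its predecessors.
execution_order = list(TopologicalSorter(dependencies).static_order())
print(execution_order)
```

A circular dependency makes static_order() raise graphlib.CycleError, which is a useful design-time check on the documented execution order.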
The following sample figure depicts the sub-processes executed as part of ETL_Process1.
The table below lists the individual ETL processes and their sub-process hierarchy.
Historical data loads are usually one-time loads. A separate set of PowerCenter mappings can be
created for one-time loading, or the mappings created for the ongoing load process can be reused.
The strategy to be followed for loading historical data is documented in this sub-section.
The validation checks could result in dropping the entire dataset or in partial removal of source
data. In either case, information about the errors is reported either through the PowerCenter
metadata repository or through a custom schema.
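The two outcomes described above (partial removal of source data or dropping the entire dataset) can be sketched as below. The rules, the field names customer_id and amount, and the 50% rejection threshold are all assumptions made for illustration; a real implementation would write the error records to the custom schema rather than return them.

```python
def validate_batch(rows, max_reject_ratio=0.5):
    """Split rows into accepted rows and error records.

    If more than max_reject_ratio of the rows fail, the whole dataset is dropped.
    """
    accepted, errors = [], []
    for row in rows:
        amount = row.get("amount")
        if row.get("customer_id") is None:
            errors.append({"row": row, "reason": "customer_id is missing"})
        elif amount is None or amount < 0:
            errors.append({"row": row, "reason": "amount is missing or negative"})
        else:
            accepted.append(row)
    if rows and len(errors) / len(rows) > max_reject_ratio:
        # Too many failures: reject the rows that passed as well.
        errors.extend({"row": r, "reason": "dataset rejected"} for r in accepted)
        accepted = []
    return accepted, errors


batch = [
    {"customer_id": 1, "amount": 10.0},
    {"customer_id": None, "amount": 5.0},
    {"customer_id": 2, "amount": -1.0},
    {"customer_id": 3, "amount": 7.5},
]
accepted, errors = validate_batch(batch)  # two of four rows fail: partial removal
```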
2.6.3 Reconciliation
Target data should always trace back to extracted + rejected data. A strategy for handling this
traceability is documented as part of this sub-section.
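A minimal reconciliation check, under one reading of the rule above: every extracted row should end up either loaded to the target or rejected. The function name and the counts are hypothetical; real counts would come from the control schema or the session statistics.

```python
def reconcile(extracted_count: int, loaded_count: int, rejected_count: int) -> dict:
    """Compare extracted rows against loaded plus rejected rows."""
    unaccounted = extracted_count - (loaded_count + rejected_count)
    return {"balanced": unaccounted == 0, "unaccounted_rows": unaccounted}


# Hypothetical counts for one load cycle.
result_ok = reconcile(extracted_count=1000, loaded_count=990, rejected_count=10)
result_gap = reconcile(extracted_count=1000, loaded_count=985, rejected_count=10)
```

A non-zero unaccounted_rows value indicates rows that were neither loaded nor logged as rejected and would need to be investigated.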
3 Release Management
3.1 Configuration Management
Versioning of PowerCenter components can be managed from within the product using Team Based
Development. The objects can also be exported as XML and held in version management tools
like Visual SourceSafe, PVCS, etc.
3.2 Security
This sub-section mentions:
1. Groups and users created in the PowerCenter environment for performing Development,
Deployment and housekeeping activities
2. User privileges on source and target systems
3. Constraints - Typically, for a target system, DELETE or TRUNCATE options from
within ETL routines are not enabled
4 Reusable Components
All common business rules should be outlined in this sub-section. The details could be as
granular as those in the Detail Design document or could be at a very high level.
Considerable effort should be invested in identifying reusable components before the Build phase
to avoid duplication of effort and components, thus enabling a faster build-test cycle with better
performance. This also reduces maintenance overheads.
4.1 Transformations
These could be tabulated as in the following table.
4.2 Mapplets
[Figure: mapplet process flow with components EXP_FIELD_CONCATENATION and MPO_ERRORS_PROCESS_LOG]
Sample Visio templates can be built for any reusable mapplet components as depicted in the
process flow above.
S.No | Name | Type | Data Type | Prec | Scale | Aggregation | Example
1. | $$P_CUTOFF_DATE | Parameter | String | 8 | | | 20060910
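As an illustration of how the sample parameter above might appear in a parameter file, the sketch below renders a section heading followed by name=value lines. The folder, workflow and session names are hypothetical, and the heading layout [folder.WF:workflow.ST:session] is an assumption that should be verified against the Product Manuals for the PowerCenter version in use.

```python
def render_parameter_file(section: str, parameters: dict) -> str:
    """Render one parameter file section: a [heading] line, then name=value lines."""
    lines = [f"[{section}]"]
    lines += [f"{name}={value}" for name, value in parameters.items()]
    return "\n".join(lines) + "\n"


# Hypothetical folder, workflow and session names.
parameter_text = render_parameter_file(
    "MyFolder.WF:wf_daily_load.ST:s_m_load_orders",
    {"$$P_CUTOFF_DATE": "20060910"},
)
print(parameter_text, end="")
```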
5 Testing Strategy
The primary purpose of testing is to verify that the system is developed according to the
design and specifications provided. In addition, it will ensure that the solution operates as
intended.
The development PowerCenter environment will be reused for Unit testing as well as System
testing. A backup of the production source database will be used as the source for both Unit and
System testing. Two separate target instances will be created, one for Unit testing and one for
System testing.
5.2.1 Approach
1. Prepare
a. Unit Test Plans
b. Ensure data available in source is sufficient to execute all test cases
2. Review of Test Plans and test results internally
3. Perform Unit Testing
a. Perform Code Walkthrough to ensure coding standards and practices are
applied
b. Document Unit test results corresponding to Unit test cases
4. Close any defects raised during Unit Testing
5.2.2 Deliverables
1. Unit Test Plans
2. Unit Test Results
3. Issue tracker
4. Unit tested components
5.3.1 Approach
1. Prepare
a. Test Plans
b. Prepare Test Data – from existing production data
c. Operations Guide - for conducting verification testing
d. Test scripts
2. Review of Test plans by internal and client team
3. Setup System Test environment
a. Movement of PowerCenter components from local folders to Project specific
folders in Development environment
b. Setting up a new instance of Target schema
c. Replication of source data from other environments to test environment
4. Perform System Testing
a. Documentation of system test results corresponding to test cases
5. Incremental Regression Testing
a. System test cases will be reused
b. Regression test results corresponding to test cases will be documented
6. Close any defects raised during System Testing
5.3.2 Deliverables