
HCL OCBC DWBI Engagement

Data Modeling & Data Mapping Training

Oct 2010

Author name: Abhishek Shrivastava


Section 1 – Introduction and basic overview

What is a Data Model?


•  A graphical representation of the data elements of an organization or business.
•  It identifies the things about which it is important to track information (entities), the details about those things (attributes), and the relationships between those things (relationships).
•  Entity-Relationship Diagram plus entity and attribute data definitions.
•  Subject-oriented, designed in Third Normal Form.
•  Facilitates communication between the business users and the IT analysts.

Data Modeling Terminologies:


•  Entity: A thing which is recognized as being capable of an independent existence and which can be uniquely identified. Examples: a Customer, an Employee, an Account, a Campaign, a Branch. EDW examples: Party, Agreement, Organization, Individual.
•  Weak Entity: An entity that cannot be uniquely identified by its attributes alone. A weak entity can be an associative entity or a subtype entity (see the illustrative DDL sketch after this list).
   Examples of associative entities: Account Party, Party Address.
   Examples of subtype entities: Organization, Individual.
•  Relationship: A relationship captures how two or more entities are related to one another. Example: 'employee of' is a relationship between an Organization and an Employee.
•  Attribute: An attribute describes an entity or a relationship; both entities and relationships can have attributes. Examples: Party Start Date, Party Name, Account Number.
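To make these terms concrete, below is a minimal Teradata-style DDL sketch using hypothetical table and column names (not the actual OCBC FSLDM definitions): Party is a strong entity, Individual is a subtype entity sharing the Party key, and Account Party is an associative (weak) entity identified only through the keys it borrows from its parents.

    CREATE TABLE Party                                  -- strong entity
    ( Party_Id        INTEGER        NOT NULL,          -- unique identifier
      Party_Name      VARCHAR(100),
      Party_Start_Dt  DATE
    ) UNIQUE PRIMARY INDEX (Party_Id);

    CREATE TABLE Individual                             -- subtype entity of Party
    ( Party_Id        INTEGER        NOT NULL,          -- same key as the supertype
      Birth_Dt        DATE
    ) UNIQUE PRIMARY INDEX (Party_Id);

    CREATE TABLE Account_Party                          -- associative (weak) entity
    ( Account_Num     INTEGER        NOT NULL,          -- key borrowed from Account
      Party_Id        INTEGER        NOT NULL,          -- key borrowed from Party
      Party_Role_Cd   CHAR(2)
    ) PRIMARY INDEX (Account_Num, Party_Id);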
Section 2a – Data Modeling & Data Mapping guidelines Upstream

EDW Upstream Data Model – From the EDW perspective, Upstream is the combination of layers (processes, databases, jobs, and de-coupling views) through which extracted source system files are loaded from the Extraction Layer into the Staging (or Loading) Layer and then transformed via the Transform Layer into the Core EDW Layer.
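As an illustration, a de-coupling view is typically a simple 1:1 view over a physical staging table, so that downstream jobs read the view rather than the table; the database and table names below are hypothetical and not actual OCBC objects.

    REPLACE VIEW STG_VIEWS.V_SRC_CUSTOMER AS            -- de-coupling view (1:1)
    SELECT  *
    FROM    STG_TABLES.SRC_CUSTOMER;                    -- physical staging table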

Upstream Model standards:


•  To be modeled and extended keeping Teradata FSLDM as the base.
•  Tables to be in Third Normal Form (3NF) as per Teradata FSLDM.
•  Jobs to follow the Primary Key defined by the OCBC Data Modeler.

Prerequisite:
•  User Requirement or Functional Spec document, to determine the data elements to be brought into EDW.
•  Early Data Inventory (EDI), to understand the source system that contains the required data elements. It has to explain:
   •  the business attributes,
   •  the primary keys of each table,
   •  the relationships between the tables,
   •  whether the source system has files that relate to other systems.
•  Data Profiling documents, to understand the data demographics and integrity of the new data elements (see the profiling sketch after this list).
•  Understanding of the existing EDW model or Teradata FSLDM.
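Data profiling is usually done with simple aggregate queries over the staged source data. The sketch below is illustrative only (hypothetical staging database, table and columns), showing row counts, distinct values, null counts and value ranges for a candidate key and a date attribute.

    -- Basic demographics for a candidate key / business attribute
    SELECT  COUNT(*)                                              AS Row_Cnt,
            COUNT(DISTINCT Account_Num)                           AS Distinct_Accnt_Cnt,
            SUM(CASE WHEN Account_Num IS NULL THEN 1 ELSE 0 END)  AS Null_Accnt_Cnt,
            MIN(Account_Open_Dt)                                  AS Min_Open_Dt,
            MAX(Account_Open_Dt)                                  AS Max_Open_Dt
    FROM    STG_SRC.SRC_ACCOUNT;                                  -- hypothetical staging table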
Section 2a – Data Modeling & Data Mapping guidelines Upstream Contd..

Upstream Logical Data Modeling Process: Once the data elements required to be brought into EDW are finalized and the EDI for the corresponding source system is available, the process below can be followed to prepare the data model.
1. Identify the data entities: Based on the EDI and data profiling, identify the data entity corresponding to each data element that has to be brought into EDW, e.g. Customer, Account, Branch, Employee, Transaction, Campaign, Product, Collateral, etc.
2. Group the attributes related to each entity in which EDW is interested.
   •  Attributes describe data entities, e.g. Account (entity): Status, Account Number, Account Open Date, etc.
3. Refer to the FSLDM guidelines to map the identified entities to the OCBC FSLDM (see the illustrative sketch after this list).
   •  As per FSLDM, Customer, Branch and Employee are Party
   •  Transaction is an Event
   •  Collateral is a Party Asset
4. Extend the OCBC FSLDM model if the required entity does not fit the existing model, subject to the approval of the OCBC Data Modeler.
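As a purely hypothetical illustration of step 3 (the column name and code values are invented for this example and are not taken from the actual FSLDM), entities that are separate in the source systems typically land in a single Party structure and are distinguished by a type code:

    -- Customer, Branch and Employee records all land as Party rows
    SELECT  Party_Id,
            Party_Type_Cd                    -- e.g. 'CUST', 'BRCH', 'EMPL' (illustrative codes)
    FROM    EDW_CORE.PARTY
    WHERE   Party_Type_Cd IN ('CUST', 'BRCH', 'EMPL');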
Section 2a – Data Modeling & Data Mapping guidelines Upstream Contd..

Upstream Physical Data Model creation: Physical data models are derived from logical model entity definitions and
relationships and are constructed to ensure that data is stored and managed in physical structures that operate effectively
on the Teradata database platform. Below are the PDM creation guidelines:
•  Primary Index: Target tables to use the best candidate eligible for the Primary Index (Unique or Non-Unique), considering the access path, data distribution, join criteria, etc. (see the DDL sketch after this list).
   Note: Primary Index and Primary Key are different concepts.
•  Partitioned Primary Index: To enhance the performance of large tables, a PPI can be implemented. Refer to the Teradata technical documentation for more details on PPI guidelines.
•  Default Values: Can be used to supply a default value when no value is passed for a table attribute.
•  ETL Control Framework Attributes: All EDW target tables (except b-key, b-map and reference tables) will have seven control framework attributes:
   1. Start_Dt
   2. End_Dt
   3. Data_Source_Cd
   4. Record_Deleted_Flag
   5. BusinessDate
   6. Ins_Txf_BatchID
   7. Upd_Txf_BatchID
•  Data Type Assignment: Data types should be assigned as part of physicalisation.
•  NULL / NOT NULL Handling: NULL or NOT NULL should be assigned for each attribute.
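The guidelines above can be pictured in one hedged DDL sketch; the table, columns and partition range are hypothetical, and the real structures are decided by the OCBC Data Modeler. It shows a Non-Unique Primary Index, a Partitioned Primary Index on the business date, a default value, explicit data types, NULL / NOT NULL handling and the seven control framework attributes.

    CREATE MULTISET TABLE EDW_CORE.ACCT_BAL              -- hypothetical Core EDW table
    ( Account_Num          INTEGER        NOT NULL,
      Bal_Type_Cd          CHAR(3)        NOT NULL,
      Bal_Amt              DECIMAL(18,2)  DEFAULT 0,     -- default value if none is passed
      Currency_Cd          CHAR(3),                      -- nullable attribute
      -- the seven ETL control framework attributes
      Start_Dt             DATE           NOT NULL,
      End_Dt               DATE,
      Data_Source_Cd       VARCHAR(10)    NOT NULL,
      Record_Deleted_Flag  CHAR(1)        NOT NULL,
      BusinessDate         DATE           NOT NULL,
      Ins_Txf_BatchID      INTEGER,
      Upd_Txf_BatchID      INTEGER
    )
    PRIMARY INDEX (Account_Num)                          -- Non-Unique PI chosen for distribution and joins
    PARTITION BY RANGE_N(BusinessDate BETWEEN DATE '2010-01-01'
                         AND DATE '2020-12-31' EACH INTERVAL '1' MONTH);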
Section 2a – Data Modeling & Data Mapping guidelines Upstream Contd..

There are two Data Model deliverables in Upstream:


•  T - EDW Platform Upstream Data Architecture: SDLC-standard document that documents the data model; it describes all the business data entities and attributes and how the data elements are modeled in OCBC EDW. It should also include all model design decisions and deviations, if any. The data architecture document should contain the logical and physical model diagrams with relationships, primary keys, primary indexes and other physical design considerations.

•  T - EDW Platform Upstream Source Target Mapping: The data architecture document is used as the input for the source-to-target mapping document. The Source to Target Mapping document captures the detailed business rules describing how the source business attributes are transformed into EDW data elements. It is used as the reference document for further ETL development.
Section 2b – Data Modeling & Data Mapping guidelines Downstream

EDW Downstream Data Model: From the EDW perspective, Downstream is the combination of layers (processes, databases, tables and views) through which user requirements are modeled and met after applying the business and technical transformation rules on data read via the customized OCBC FSLDM.
Downstream Model standards:
•  To be modelled to support user requirements and must be scalable.
•  To be modelled to ensure that it is easy to maintain and tuned to meet the SLA with the best performance.
•  To be modelled keeping in mind enterprise-level reporting (including geographies).
•  To be modelled with the various security attributes at the most granular level.
•  Tables must not be skewed (see the skew-check sketch after this list).
•  Jobs to follow the Primary Key defined by the OCBC Data Modeler.
•  Target tables to use the best candidate eligible for the Primary Index (Unique or Non-Unique), considering the access path, data distribution, join criteria, etc.
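One common way to verify that a table is not skewed (a sketch only; the actual monitoring queries used at OCBC may differ, and the database and table names are hypothetical) is to check how evenly rows and permanent space are spread across the AMPs, using the standard DBC dictionary views and the hash functions:

    -- Approximate skew per table, using the Teradata data dictionary
    SELECT  DatabaseName,
            TableName,
            100 * (1 - AVG(CurrentPerm) / MAX(CurrentPerm)) AS Skew_Factor_Pct
    FROM    DBC.TableSize
    WHERE   DatabaseName = 'DP_TCMDM'                    -- hypothetical data mart database
    GROUP BY 1, 2
    ORDER BY 3 DESC;

    -- Row distribution across AMPs for a candidate Primary Index column
    SELECT  HASHAMP(HASHBUCKET(HASHROW(Account_Num))) AS Amp_No,
            COUNT(*)                                   AS Row_Cnt
    FROM    DP_TCMDM.T_CM_ACCNT_BAL                    -- hypothetical data mart table
    GROUP BY 1
    ORDER BY 2 DESC;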

Prerequisite: User Requirement or Functional Spec document, to determine the data elements to be reported to users, which:
•  should explain the business rules to derive the reporting data elements
•  should specify the reporting level of each data element
•  should have details of the reporting hierarchies
Section 2b – Data Modeling & Data Mapping guidelines Downstream Contd..

There are two Data Model deliverables in Downstream:


•  T - EDW Platform Downstream Data Architecture: SDLC-standard document that documents the data model; it describes all the business data entities and attributes and how the data elements are modeled in the OCBC EDW downstream data mart to support the user reporting requirements. It should also include all model design decisions and deviations, if any. The data architecture document should contain the logical and physical model diagrams with relationships, primary keys, primary indexes and other physical design considerations.

•  T - EDW Platform Downstream Source Target Mapping: The data architecture document is used as the input for the source-to-target mapping document. The Source to Target Mapping document captures the detailed business rules describing how the source business attributes are transformed into EDW data elements. It is used as the reference document for further ETL development.
Section 3a – Naming standards and conventions

•  Upstream Data Model: In case of FSLDM customizations, the naming conventions can be proposed by the HCL Data Modeler to the OCBC Data Modeler. The approval will be provided by the OCBC Data Modeler.

•  Downstream Data Model: The naming conventions shall be followed by the HCL Data Modeler based on the rules below:

•  Creation of Parent Database:

   Standard: D<Env>_<Abbreviated Application Name>DM
   Definition: This will hold all filtering and configuration for the associated Data Mart.

•  Examples (for the Production environment):

   AMP – Customer Management Data Mart: DP_CMDM
   Basel II – Risk Scoring: DP_RSDM
   Retail Payment System: DP_RPSDM


Section 3a – Naming standards and conventions Contd…

•  Child Database: There will be several child databases of the parent database DP_CMDM, based on the different layers (see the illustrative DDL sketch after this list). All names are in capital letters.

   1. D<Env>_F<Abbreviated Application Name>DM, e.g. DP_FCMDM
      This will hold all filtering and configuration for the associated Data Mart.
   2. D<Env>_B<Abbreviated Application Name>DM, e.g. DP_BCMDM
      This database is the 'base view' layer from EDW into the Data Mart. It contains (1:1) views on top of VEDW, which make use of the filter mechanism provided in the associated filter Data Mart.
   3. D<Env>_W<Abbreviated Application Name>DM, e.g. DP_WCMDM
      This will hold all interim / temporary tables, macros and stored procedures required to provide the functionality in DP_V<App Name>.
   4. D<Env>_T<Abbreviated Application Name>DM, e.g. DP_TCMDM
      This will hold all the 'extra' tables (e.g. materialized views) that are required to be created in the Data Mart.
   5. D<Env>_V<Abbreviated Application Name>DM, e.g. DP_VCMDM
      This will hold all the 'final' views required by the downstream application / user.
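A hedged example of how the parent and child databases might be created on Teradata; the owning database and the PERM allocations are assumptions for illustration, and the actual DDL is decided by the DBA / OCBC Data Modeler.

    CREATE DATABASE DP_CMDM  FROM DP      AS PERM = 0;            -- parent: Customer Management Data Mart
    CREATE DATABASE DP_FCMDM FROM DP_CMDM AS PERM = 1000000000;   -- filter / configuration layer
    CREATE DATABASE DP_BCMDM FROM DP_CMDM AS PERM = 0;            -- base (1:1) view layer on VEDW
    CREATE DATABASE DP_WCMDM FROM DP_CMDM AS PERM = 5000000000;   -- work: interim tables, macros, procedures
    CREATE DATABASE DP_TCMDM FROM DP_CMDM AS PERM = 10000000000;  -- 'extra' tables (e.g. materialized views)
    CREATE DATABASE DP_VCMDM FROM DP_CMDM AS PERM = 0;            -- 'final' views for the application / user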
Section 3a – Naming standards and conventions Contd…

•  Table Name Convention: The table naming convention follows the rules below.

   1. <Object Type> (all in capital letters), e.g. T
      One or two characters indicating the object type, for objects in the working database only. T: Table, V: View, SP: Procedures, XV: Transformation Views, M: Macros, etc.
   2. <Abbreviated Application Name> (all in capital letters)
      Short name for the application, e.g. CM (Customer Marketing), RPS (Retail Payment System).
   3. <Abbreviated Product or Business or Table Name> (all in capital letters), e.g. ACCOUNT_TYPE becomes ACCNT_TYP
      Rule 1: Short name for the product, business or table name, e.g. FITAS, OD, BAL_TYPE.
      Rule 2: Give preference to the kind of information it is going to keep; the terminology goes as per Core EDW.
      Rule 3: Take out the vowels, i.e. a, e, i, o, u.
      Rule 4: The name length should not be greater than 30 characters.

   <Object Type>_<Abbreviated Application Name>_<Abbreviated Product or Business or Table Name>

   Example: T_RPS_ACCNT_TYP
Section 3a – Naming standards and conventions Contd…

Column Name Convention: The column naming convention follows the rules below.


•  Keep the column name the same as the parent column in Core EDW.
•  Take out the vowels, i.e. a, e, i, o and u. Do not remove a vowel if it is the first letter of a word, for example the 'A' in Amount. If the column name is still longer than 30 characters, please contact the architecture or the standards and guidelines team.
•  Keep the column name in mixed case (first letter capitalized). For example, Data_Source_Cd instead of DATA_SOURCE_CD or Data_SOURCE_Cd, etc.
•  The control columns should be brought over as-is from Core EDW; the above rules do not apply to them.

•  Example: Suppose the measure Count of Transactions is not part of Core EDW.
   •  Initial column name: Count_Of_Transactions / Transaction_Count
   •  After taking out the vowels (a, e, i, o, u): Cnt_Of_Trnsctn / Trnsctn_Cnt
   •  After keeping the column name in mixed case (first letter capitalized): Cnt_Of_Trnsctn / Trnsctn_Cnt
   •  Control column rule: the above column is not a control column; a control column such as Data_Source_Cd should be brought over as-is from Core EDW.
   A DDL sketch applying the table and column naming rules together follows below.
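Putting the table and column naming rules together, a hypothetical data mart table might be physicalised as below; the object, application and column names are illustrative only, and real names must be approved by the OCBC Data Modeler.

    -- <Object Type>_<Abbreviated Application Name>_<Abbreviated Table Name>
    CREATE MULTISET TABLE DP_TCMDM.T_CM_TRNSCTN_SMMRY    -- hypothetical data mart table
    ( Accnt_Id        INTEGER        NOT NULL,           -- vowels removed, first letters capitalized
      Trnsctn_Cnt     INTEGER,                           -- derived measure not present in Core EDW
      Amnt            DECIMAL(18,2),                     -- leading vowel 'A' of Amount is kept
      Data_Source_Cd  VARCHAR(10)    NOT NULL            -- control column, carried as-is from Core EDW
    )
    PRIMARY INDEX (Accnt_Id);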
Section 3b – Basic S/W Setup and Login

•  Currently OCBC is using Erwin Version 4.1. Every HCL Data Modeler needs to get the Erwin CD from the OCBC Data Modeler to install the software on his/her machine.
•  For installation, one needs to raise a Temp Admin ticket and get it approved by the EIS Head. The Erwin data modeling software can then be installed.
•  Follow the on-screen instructions and, lastly, key in the Erwin key details mentioned on the CD.
•  For further clarifications, please contact the HCL Data Modeler.
Section 4a – Tips for Developers

•  Developers are requested not to create any tables in the DD environment for development. Ask the HCL Data Modeler to create them through Erwin; he/she will generate the DDLs from Erwin and get them deployed in the DD environment through the DBA / OCBC Data Modeler.
Section 4b – Best Practices

Upstream Mapping Naming Structure: The document name must reflect the following:
•  Project Name or ITSR# or ITSC#, such as ALMS or ITSR 7338 or ITSC 38755
•  Source Application Name, such as SG_CIF
•  Version Number or CR Release, such as Ver 1.0 CR 14 or Jul10
An example mapping document name would be:
ALMS SG_CIF Upstream Mapping Ver 1.0

Version Number Control:


• During the mapping phase, we start the draft copy from version number 0.1
• Incremented by 0.1 for each change to the mapping document
• The version number of the sign-off copy will be Ver 1.0
• During the SIT/UAT phase, changes to the mapping document start at Ver 1.1 and are incremented by 0.1
• After UAT, the sign-off copy will be Ver 2.0
Ver 1.0 Sign Off – signatures of all stakeholders
Ver 2.0 UAT Sign Off – signatures of IT PM & Data Mapper
Hence, in the BADM Source to Target Mapping folder or Dimension, we should keep only the following two copies of the mapping document for each application delivered by each project:
Ver 1.0 – Sign-off during the Mapping phase
Ver 2.0 – Sign-off copy after UAT
Section 4b – Best Practices Contd…

Downstream Mapping Naming Structure: The document name must reflect the following:
•  Project Name or ITSR# or ITSC#, such as ALMS or ITSR 7338 or ITSC 38755
•  Downstream Data Mart Name, such as CSDM
•  Version Number or CR Release, such as Ver 1.0 CR 14
An example mapping document name would be:
ALMS CSDM Downstream Mapping Ver 1.0

Version Number Control:


• During the mapping phase, we start the draft copy from version number 0.1
• Incremented by 0.1 for each change to the mapping document
• The version number of the sign-off copy will be Ver 1.0
• During the SIT/UAT phase, changes to the mapping document start at Ver 1.1 and are incremented by 0.1
• After UAT, the sign-off copy will be Ver 2.0
Ver 1.0 Sign Off – signatures of all stakeholders
Ver 2.0 UAT Sign Off – signatures of IT PM & Data Mapper
Hence, in the BADM Source to Target Mapping folder or Dimension, we should keep only the following two copies of the mapping document for each application delivered by each project:
Ver 1.0 – Sign-off during the Mapping phase
Ver 2.0 – Sign-off copy after UAT
Section 4b – Best Practices Contd…

Data Model Creation:


Below are the key steps in data modeling (illustrated in the DDL sketch after this list):

•  Identify entities
•  Identify attributes of entities
•  Identify relationships
•  Apply naming conventions
•  Assign keys and indexes
•  Normalize to reduce data redundancy
•  Denormalize to improve performance
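A minimal sketch of how these steps surface in DDL, with hypothetical entities (the real model must follow the OCBC FSLDM and the OCBC Data Modeler's decisions): entities become tables, attributes become typed columns, relationships become foreign-key columns, normalization keeps repeating data in separate tables, and selective denormalization is added downstream only for reporting performance.

    -- Normalized (3NF): account and its balances kept in separate tables
    CREATE TABLE Account
    ( Account_Num  INTEGER        NOT NULL,
      Party_Id     INTEGER        NOT NULL,              -- relationship to Party
      Open_Dt      DATE
    ) UNIQUE PRIMARY INDEX (Account_Num);

    CREATE TABLE Account_Bal
    ( Account_Num  INTEGER        NOT NULL,
      Bal_Dt       DATE           NOT NULL,
      Bal_Amt      DECIMAL(18,2)
    ) PRIMARY INDEX (Account_Num);

    -- Denormalized downstream summary, added only to improve reporting performance
    CREATE TABLE Acct_Mnthly_Smmry
    ( Account_Num   INTEGER       NOT NULL,
      Party_Id      INTEGER       NOT NULL,              -- carried down to avoid a join
      Mnth_End_Bal  DECIMAL(18,2)
    ) PRIMARY INDEX (Account_Num);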
Section 5 – Check list / Self Review Process

Data Mapping:
•  Please start the self-review after completing the data mapping document. Each and every column in the document should provide useful and meaningful information. Provide the information in the transformation rules as expected by the OCBC reviewer.
•  Once the self-review is done, it is suggested to have an internal review by the HCL Data Mapper before proceeding to the OCBC BADM team review, to avoid rework / escalations.

Data Modeling:
•  Check closely that all names follow the correct naming conventions.
•  Create the domains to provide the flexibility of changing a data type.
•  Create the indexes as per the OCBC standards.
•  Assign data types to all the columns.
•  Provide the table and column definitions for all the tables.
•  Assign NULL or NOT NULL to all the columns.
•  Create the default values, if any.
•  Attach the abbreviation file with all the meaningful table and column abbreviations.
•  Generate the DDLs from Erwin, add the database name for the data mart, and send them to the OCBC Data Modeler for review. Once he/she approves, the DBA will be asked to deploy them in the respective environment.
•  Make the data model presentable to the client / business users by coloring the entities and the relationships / columns.
Section 6 – DOs and DON’Ts

Please do a self-review after the completion of the Data Model & Data Mapping. All the standards should be met as per the client's expectations.
•  For Data Modeling, check all the data types, NULL / NOT NULL fields and indexes; generate the DDLs and pass them to the OCBC DM for approval, and he/she will pass them to the DBA to create the table structures.
•  For Data Mapping, all the columns should be filled in with proper formatting: coloring (if any), font, size, uppercase/lowercase, etc.
•  Every page of the mapping document should be updated with the mapping name.
•  Extraction criteria should be clearly mentioned.
THANK YOU
