
Data Modeling

Communication Pack

12th December 2020

© 2020 Petroliam Nasional Berhad (PETRONAS)


All rights reserved. No part of this document may be reproduced in any form possible, stored in a retrieval system, transmitted and/or disseminated in any form or by any means (digital, mechanical,
hard copy, recording or otherwise) without the permission of the copyright owner.

Internal
Content

1. What is Data Modeling?

2. Data Modeling Work Process

3. Types of Data Model

4. Data Modeler Requirements & Deliverables

What is Data Modeling?

Data modeling is the process of creating a visual representation of either a whole information
system or parts of it to communicate connections between data points and structures.

A data model illustrates:

• The types of data used and stored within the system

• The relationships among these data types

• The ways the data can be grouped and organized, and its formats and attributes

A data model is used to provide answers to business questions; hence, data model design should prioritize
the business stakeholders’ perspective.

Business Perspective
To express the business process or event, allowing business stakeholders to easily measure it. The model
design should enable the business to see and measure itself from a different perspective.

Technical Perspective
To ensure all data objects required by the database are accurately represented. A data model helps to
provide a clear picture of database design at the conceptual, logical and physical levels.

Values of doing Data Modelling
Quality Assurance
Just as architects consider blueprints before constructing a building, you should consider data
before building a system.

Reduce redundancies
Get an overview of relationships and redundancies, resolve discrepancies, and integrate disparate
systems so they can work together.

Better performance
A sound model simplifies database tuning. A well-constructed database typically runs fast,
often quicker than expected.

Understanding the Business


The data and relationships represented in a data model provide a foundation on which to build
an understanding of business processes.

Data Modeling Work Process

A data modeler provides definitions and meaning to sets of data, like an interior designer who brings “life” and
function to an empty room.

Data Modeling Work Process: an analogy with Building Construction

Additional inputs to a set of data by the data modeler:

• Data normalisation

• Schema selection

• Data Model

The data modeling process is designed for clarity and portability to ensure the outcome can be effectively
communicated across all levels of the organization.

Understanding Business Requirement → Understanding Data → Build Data Model (Conceptual, Logical, Physical)

The process itself inherently encourages discussion, collaboration, and careful consideration of the transformation
of business requirements into system solutions.

Data Modeler Requirements & Deliverables

A data modeler requires comprehensive inputs in order to produce an accurate data model.

Mandatory
• Business requirement and scope.

• Data source dictionary – describes the content, format and structure of the database.

• Business logic – translate into technical logic for physical implementation.

• Data flow diagram – visualization of business data flow.


Note: all of the above key information is documented in the DG Toolkit: Data Template.

Optional (to help in understanding the requirement & data)


• Sample of data

• View access to source system(s)

• Business KPIs (dashboards, reports, and input forms)


In addition to the data model diagram, the data modeler also prepares other documents for governance
purposes and to assist the Data Engineer in building the ETL and API.

1. Data model diagram

2. Target data dictionary

3. Source-to-Target mapping

4. Detailed design document

The data modeler produces 3 types of data models that serve different purposes prior to physical data
model finalisation.

Conceptual
Defines WHAT the system contains. The purpose is to organize, scope and define business concepts and rules.
It should be focused on things related to the business and its requirements.

Logical
Defines HOW the system should be implemented regardless of the DBMS. The purpose is to define the structure
of data elements and to set relationships between them. It adds further information to the conceptual
data model elements.

Physical
Describes HOW the system will be implemented using a specific DBMS. The purpose is actual implementation
of the database. It should be focused on how logical data should be represented and stored in a particular
physical database.

Data normalization is a systematic approach of decomposing tables to eliminate data
redundancy (repetition) and undesirable characteristics such as insertion, update and deletion anomalies.

It is a multi-step process that puts data into tabular form, removing duplicated data from the relation tables.

Normalization is used for mainly two purposes:

1. Eliminating redundant (useless) data.

2. Ensuring data dependencies make sense, i.e. data is logically stored.

Removal of Data Redundancy by Normalising (1/5)

Normalisation Form: 0 NF, 1 NF, 2 NF, 3 NF, BCNF, 4 NF

0 NF
Not normalised; contains repeating attributes, groups, etc.

1 NF
• It should only have single (atomic) valued attributes.
• All the columns in a table should have unique names.

2 NF
• It should be in 1 NF.
• It should not have Partial Dependency.

3 NF
• It should be in 2 NF.
• It should not have Transitive Dependency.

Removal of Data Redundancy by Normalising (2/5)

0 NF (a database is NOT Excel)

CustID   CustName   AccountManager   ManagerRoom   ContactName1   ContactName2
171      Siti       Aminah           12            Lisa           Ying Meng
190      Jamal      Ahmad            15            Mike           Johan

1 NF (remove repeating groups)

CustID   CustName   AccountManager   ManagerRoom   ContactID
171      Siti       Aminah           12            1
171      Siti       Aminah           12            2
190      Jamal      Ahmad            15            3
190      Jamal      Ahmad            15            4

ContactID   ContactName
1           Lisa
2           Ying Meng
3           Mike
4           Johan
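
The 0 NF to 1 NF step can be sketched in Python. This is a minimal illustration, assuming the 0 NF rows are held as plain dictionaries; the function name `to_first_normal_form` and the sequential assignment of ContactID values are assumptions made for the example, not part of the original deck.

```python
# 0 NF rows from the slide: contact names sit in repeating columns.
ZERO_NF_ROWS = [
    {"CustID": 171, "CustName": "Siti", "AccountManager": "Aminah",
     "ManagerRoom": 12, "ContactName1": "Lisa", "ContactName2": "Ying Meng"},
    {"CustID": 190, "CustName": "Jamal", "AccountManager": "Ahmad",
     "ManagerRoom": 15, "ContactName1": "Mike", "ContactName2": "Johan"},
]

REPEATING_COLUMNS = ("ContactName1", "ContactName2")


def to_first_normal_form(rows):
    """Split the repeating contact columns into one row per contact.

    Returns (customer_rows, contact_rows). ContactID values are assigned
    sequentially, which is an assumption made for illustration.
    """
    customer_rows, contact_rows = [], []
    next_contact_id = 1
    for row in rows:
        # Keep every column that is not part of the repeating group.
        base = {k: v for k, v in row.items() if k not in REPEATING_COLUMNS}
        for column in REPEATING_COLUMNS:
            name = row.get(column)
            if not name:  # skip empty contact slots
                continue
            customer_rows.append({**base, "ContactID": next_contact_id})
            contact_rows.append(
                {"ContactID": next_contact_id, "ContactName": name}
            )
            next_contact_id += 1
    return customer_rows, contact_rows


customers_1nf, contacts_1nf = to_first_normal_form(ZERO_NF_ROWS)
```

With the two sample customers, this yields four customer rows and four contact rows, matching the 1 NF tables above.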

Removal of Data Redundancy by Normalising (3/5)

2 NF (remove redundancy)

CustID   CustName   AccountManager   ManagerRoom
171      Siti       Aminah           12
190      Jamal      Ahmad            15

CustID   ContactID
171      1
171      2
190      3
190      4

ContactID   ContactName
1           Lisa
2           Ying Meng
3           Mike
4           Johan

Removal of Data Redundancy by Normalising (example) (4/5)

3 NF (remove transitive dependency)

CustID   CustName   AccountManagerID
171      Siti       1
190      Jamal      2

CustID   ContactID
171      1
171      2
190      3
190      4

ContactID   ContactName
1           Lisa
2           Ying Meng
3           Mike
4           Johan

AccountManagerID   AccountManager   ManagerRoom
1                  Aminah           12
2                  Ahmad            15
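
A minimal Python sketch of the 3 NF step, assuming the 2 NF customer rows are plain dictionaries. The helper names (`holds_functional_dependency`, `split_account_manager`) and the way surrogate AccountManagerID values are generated are hypothetical choices for illustration. ManagerRoom depends on AccountManager, which in turn depends on CustID; that transitive dependency is what gets moved into its own table.

```python
CUSTOMERS_2NF = [
    {"CustID": 171, "CustName": "Siti", "AccountManager": "Aminah", "ManagerRoom": 12},
    {"CustID": 190, "CustName": "Jamal", "AccountManager": "Ahmad", "ManagerRoom": 15},
]


def holds_functional_dependency(rows, determinant, dependent):
    """Return True if each determinant value maps to exactly one dependent value."""
    seen = {}
    for row in rows:
        key, value = row[determinant], row[dependent]
        if seen.setdefault(key, value) != value:
            return False
    return True


def split_account_manager(customers):
    """Move AccountManager and ManagerRoom into their own table.

    Surrogate AccountManagerID values are numbered in order of first
    appearance, which is an assumption made for illustration.
    """
    manager_ids = {}  # manager name -> surrogate AccountManagerID
    managers, slim_customers = [], []
    for row in customers:
        name = row["AccountManager"]
        if name not in manager_ids:
            manager_ids[name] = len(manager_ids) + 1
            managers.append({
                "AccountManagerID": manager_ids[name],
                "AccountManager": name,
                "ManagerRoom": row["ManagerRoom"],
            })
        slim_customers.append({
            "CustID": row["CustID"],
            "CustName": row["CustName"],
            "AccountManagerID": manager_ids[name],
        })
    return slim_customers, managers


# Only split when the data supports the dependency AccountManager -> ManagerRoom.
if holds_functional_dependency(CUSTOMERS_2NF, "AccountManager", "ManagerRoom"):
    customers_3nf, managers_3nf = split_account_manager(CUSTOMERS_2NF)
```

A dependency check on sample data can only show that the data does not contradict the rule; whether the dependency really holds is a business decision.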

Removal of Data Redundancy by Normalising (example) (5/5)

Final Data Model

Customer
• CustID – Primary Key
• CustName
• AccountManagerID – Foreign Key

CustomerContact
• CustID – Foreign Key
• ContactID – Foreign Key

Contact
• ContactID – Primary Key
• ContactName

AccountManager
• AccountManagerID – Primary Key
• AccountManager
• ManagerRoom
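
The final data model could be implemented as follows. This is a sketch using Python's built-in `sqlite3` module with an in-memory database; the column data types, the NOT NULL constraints and the composite primary key on CustomerContact are assumptions, since the slide only names the keys.

```python
import sqlite3

SCHEMA = """
CREATE TABLE AccountManager (
    AccountManagerID INTEGER PRIMARY KEY,
    AccountManager   TEXT NOT NULL,
    ManagerRoom      INTEGER
);
CREATE TABLE Customer (
    CustID           INTEGER PRIMARY KEY,
    CustName         TEXT NOT NULL,
    AccountManagerID INTEGER NOT NULL
        REFERENCES AccountManager (AccountManagerID)
);
CREATE TABLE Contact (
    ContactID   INTEGER PRIMARY KEY,
    ContactName TEXT NOT NULL
);
CREATE TABLE CustomerContact (
    CustID    INTEGER NOT NULL REFERENCES Customer (CustID),
    ContactID INTEGER NOT NULL REFERENCES Contact (ContactID),
    PRIMARY KEY (CustID, ContactID)  -- composite key is an assumption
);
"""


def build_database():
    """Create the four tables and load the sample rows from the slides."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO AccountManager VALUES (?, ?, ?)",
                     [(1, "Aminah", 12), (2, "Ahmad", 15)])
    conn.executemany("INSERT INTO Customer VALUES (?, ?, ?)",
                     [(171, "Siti", 1), (190, "Jamal", 2)])
    conn.executemany("INSERT INTO Contact VALUES (?, ?)",
                     [(1, "Lisa"), (2, "Ying Meng"), (3, "Mike"), (4, "Johan")])
    conn.executemany("INSERT INTO CustomerContact VALUES (?, ?)",
                     [(171, 1), (171, 2), (190, 3), (190, 4)])
    conn.commit()
    return conn


def contacts_for_customer(conn, cust_id):
    """Return the contact names of one customer, ordered by ContactID."""
    rows = conn.execute(
        """
        SELECT ct.ContactName
        FROM CustomerContact AS cc
        JOIN Contact AS ct ON ct.ContactID = cc.ContactID
        WHERE cc.CustID = ?
        ORDER BY ct.ContactID
        """,
        (cust_id,),
    ).fetchall()
    return [name for (name,) in rows]
```

The original 0 NF view can be recovered by joining the tables, while each fact (a manager's room, a contact's name) is now stored in exactly one place.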

The success of data modeling is impacted by the quality of the ecosystem.

Business requirement
• Without business context, data modeling is meaningless.
• Data modeling reflects business rules and processes; both the business stakeholder and the technical
expert must have the same understanding of business requirements prior to data modeling.
• Data modeling shouldn’t occur in isolation.

Data
• Obtain the high-level concept of the underlying data from a business stakeholder or domain expert before
designing a data model for an application.
• Deep dive into data definitions such as data type, data relationships and the business rules governing
individual columns in a database.
• Data quality and availability may impact the data modeling outcome.

Technology
• Tools or techniques used for data modeling: the technical expert must be familiar with the data modeling
technique, tools and all associated technologies that are currently available, to know what is possible
for the data modeling.

Thank you for your passion!

Best Practice
• Ensure that you understand the level of detail (the grain) in each fact table; the dimensions indicate the level of
detail.

• Avoid including non-additive facts, which cannot be used in metric computations. For example, instead of storing
a ratio or percentage, store the facts that can be used to calculate the percentage or ratio, or instead of storing unit
prices, store the extended price (units * unit price).

• Avoid placing attributes in fact tables. Fact tables should contain facts and foreign keys to attributes stored in other
dimensions.

• Facts placed in the same fact table must be at the same level of detail (grain) and from the same business process.

• Avoid snowflake structures, which are normalized dimension tables. Dimension tables should not have other
dimensions as parents or children. Consider denormalizing the dimension tables, or separating the dimensions
and connecting each of them independently to a fact table.
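
The point about non-additive facts can be illustrated with a small Python sketch. The fact rows and column names (`extended_price`, `extended_cost`) are hypothetical. Storing the additive components lets a margin percentage be computed correctly at any level of aggregation, whereas averaging stored percentages ignores how much each row contributes.

```python
# Hypothetical fact rows that store only additive facts.
FACT_SALES = [
    {"product": "A", "units": 10, "extended_price": 100.0, "extended_cost": 80.0},
    {"product": "B", "units": 1, "extended_price": 1000.0, "extended_cost": 500.0},
]


def margin_pct(rows):
    """Margin percentage computed from summed additive facts."""
    revenue = sum(r["extended_price"] for r in rows)
    cost = sum(r["extended_cost"] for r in rows)
    return (revenue - cost) / revenue * 100


def average_of_row_margins(rows):
    """What you get if a per-row percentage had been stored and averaged."""
    per_row = [
        (r["extended_price"] - r["extended_cost"]) / r["extended_price"] * 100
        for r in rows
    ]
    return sum(per_row) / len(per_row)


overall = margin_pct(FACT_SALES)               # roughly 47.3
misleading = average_of_row_margins(FACT_SALES)  # roughly 35
```

The two results differ because product B carries most of the revenue; only the first calculation reflects that.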

• Do not connect fact tables directly to other fact tables. You should connect them through a common dimension.

• Create common dimensions that can be reused when you create additional fact tables. For example, you should
have only one dimension table for customer, one for product, one for an employee, and so on.
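
A sketch of connecting two fact tables through a common dimension, using Python's built-in `sqlite3` module. The table names (`dim_customer`, `fact_sales`, `fact_returns`) and the sample amounts are hypothetical. Each fact table is aggregated on its own and the results are then joined through the shared customer dimension, rather than joining the fact tables to each other.

```python
import sqlite3


def build_star():
    """Create one shared dimension and two fact tables with sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE dim_customer (
            customer_key  INTEGER PRIMARY KEY,
            customer_name TEXT NOT NULL
        );
        CREATE TABLE fact_sales (
            customer_key INTEGER REFERENCES dim_customer (customer_key),
            sales_amount REAL
        );
        CREATE TABLE fact_returns (
            customer_key  INTEGER REFERENCES dim_customer (customer_key),
            return_amount REAL
        );
        INSERT INTO dim_customer VALUES (1, 'Siti'), (2, 'Jamal');
        INSERT INTO fact_sales VALUES (1, 100.0), (1, 50.0), (2, 70.0);
        INSERT INTO fact_returns VALUES (1, 20.0);
    """)
    return conn


# Aggregate each fact table separately, then meet at the common dimension.
DRILL_ACROSS = """
SELECT d.customer_name,
       COALESCE(s.total_sales, 0)   AS total_sales,
       COALESCE(r.total_returns, 0) AS total_returns
FROM dim_customer AS d
LEFT JOIN (SELECT customer_key, SUM(sales_amount) AS total_sales
           FROM fact_sales GROUP BY customer_key) AS s
       ON s.customer_key = d.customer_key
LEFT JOIN (SELECT customer_key, SUM(return_amount) AS total_returns
           FROM fact_returns GROUP BY customer_key) AS r
       ON r.customer_key = d.customer_key
ORDER BY d.customer_key
"""


def sales_and_returns(conn):
    """Return (customer_name, total_sales, total_returns) per customer."""
    return conn.execute(DRILL_ACROSS).fetchall()
```

Joining `fact_sales` directly to `fact_returns` on `customer_key` would instead repeat each return once per sales row and inflate the totals.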

Design Guidance Principle – Conceptual Model
• Aim to provide business context that reflects the business understanding of the data. Ensure business
stakeholders can understand the conceptual model.

• Start with a very high-level data model showing major entities and primary relationships only. No attributes
are shown in the data model.

• Every major entity shown in the data model should have at least one relationship to another entity, and
relationships appearing in the data model must be clearly displayed.

• Every relationship has a cardinality in both directions.

• Every parent and child relationship has a cardinality of “1” on the parent end, and “M” (many) on the child end.

• Every supertype and subtype relationship has a cardinality of “1” on the supertype, and “1” on the subtype.

• A non-specific relationship line is typically employed to model relationships between entities; however, it is
allowed to show a more refined level of detail by using identifying or non-identifying relationships.

Design Guidance Principle – Logical Model
• Describe entities and attributes, and the relationships that bind them, providing a clear representation of the
business purpose of the data.

• Entities and attributes are normalized to Third Normal Form (3NF). Hence, each entity instance is stored as exactly
one unique record, all non-key attributes fully depend on the primary key attributes, and no non-key attribute
depends on any other non-key attribute.

• Non-specific relationship lines between entities are replaced with identifying or non-identifying relationships.

• An associative entity inherits its primary key from the two entities that have a Many-to-Many relationship.

• Indicate the cardinality of relationships between two entities. This cardinality notation must convey one of the
following meanings:
• Many-to-Many cardinality (m:n)
• Many-to-One cardinality (m:1) 0..m:1
• One-to-Many cardinality (1:n) 1:0..m
• One-to-One cardinality (1:1)
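
As a rough aid when reviewing sample data, the cardinality between two entities can be estimated from pairs of key values. The Python sketch below uses a hypothetical helper, `classify_cardinality`, and is an assumption-laden illustration only: observed data can suggest a cardinality, but the business rules decide what the model must allow.

```python
from collections import defaultdict


def classify_cardinality(pairs):
    """Classify observed (left_key, right_key) pairs as 1:1, 1:n, m:1 or m:n."""
    rights_per_left = defaultdict(set)
    lefts_per_right = defaultdict(set)
    for left, right in pairs:
        rights_per_left[left].add(right)
        lefts_per_right[right].add(left)
    # "Many" on the left side: some right key is linked to several left keys.
    left_is_many = any(len(v) > 1 for v in lefts_per_right.values())
    # "Many" on the right side: some left key is linked to several right keys.
    right_is_many = any(len(v) > 1 for v in rights_per_left.values())
    if left_is_many and right_is_many:
        return "m:n"
    if left_is_many:
        return "m:1"
    if right_is_many:
        return "1:n"
    return "1:1"


# (CustID, ContactID) pairs from the normalisation example.
customer_contact = [(171, 1), (171, 2), (190, 3), (190, 4)]
observed = classify_cardinality(customer_contact)
```

For the sample pairs, each customer has several contacts and each contact belongs to one customer, so the observed cardinality is one-to-many. The model still uses an associative entity (CustomerContact), which is what allows a contact to be shared between customers if the business requires it.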

Design Guidance Principle – Physical Model
Physical database design is the process of converting the detailed logical data design into a design that can be
interpreted by the database system.

• Permitted name lengths depend on the database; hence, table and column names must be sized to fit the
requirements of the target database tool.

• The naming convention used must be meaningful and consistent throughout the design.

• Designate a unique primary key column for every table.

• Column name should contain all of the elements of the logical attribute from which it was derived, but should be
abbreviated to fit within the maximum length.

• Implement table and column names in a way that is supported by the database system.

• The physical model comes with a data dictionary covering, e.g., data type, length, description and keys.

• A certain amount of denormalization is usually necessary when implementing the physical data model.
Denormalize only if you can demonstrate a performance gain. Losses in maintaining data integrity must be justified
by the performance gain.
• Define alternate keys that will enhance performance by supporting common search paths.

• Define security requirements for every attribute and plan for implementation of security policies.

• Document in detail the actual mechanics of converting the logical data model to the physical data model, including
a detailed explanation of the migration between the two models. Assumptions made while designing the data model
should also be included.

• Consider improving performance by using database features such as indexing, caching and clustered indices.

• Referential integrity rules must be defined as constraints on a named foreign key for updates or deletions. These
rules must reflect the business rules for the associated data.
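
A sketch of a named foreign key with update and delete rules, using Python's built-in `sqlite3` module. The constraint name `fk_customer_account_manager` and the chosen rules (cascade on update, restrict on delete) are assumptions standing in for whatever the business rules actually require.

```python
import sqlite3

# Table layout follows the normalisation example; the rules are illustrative.
SCHEMA = """
CREATE TABLE AccountManager (
    AccountManagerID INTEGER PRIMARY KEY,
    AccountManager   TEXT NOT NULL
);
CREATE TABLE Customer (
    CustID           INTEGER PRIMARY KEY,
    CustName         TEXT NOT NULL,
    AccountManagerID INTEGER NOT NULL,
    CONSTRAINT fk_customer_account_manager
        FOREIGN KEY (AccountManagerID)
        REFERENCES AccountManager (AccountManagerID)
        ON UPDATE CASCADE   -- key changes propagate to customers
        ON DELETE RESTRICT  -- a manager with customers cannot be deleted
);
"""


def open_database():
    """Create the schema with foreign key enforcement and one sample pair."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO AccountManager VALUES (1, 'Aminah')")
    conn.execute("INSERT INTO Customer VALUES (171, 'Siti', 1)")
    conn.commit()
    return conn


def try_delete_manager(conn, manager_id):
    """Return True if the delete succeeded, False if the constraint blocked it."""
    try:
        conn.execute(
            "DELETE FROM AccountManager WHERE AccountManagerID = ?",
            (manager_id,),
        )
        conn.commit()
        return True
    except sqlite3.IntegrityError:
        conn.rollback()
        return False
```

Here the delete of a manager who still has customers is rejected, while a change to the manager's key is carried over to the Customer table. A different business rule (for example, reassigning customers on deletion) would call for a different constraint.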
