Communication Pack
Internal
Content
What is Data Modeling?
Data modeling is the process of creating a visual representation of either a whole information
system or parts of it to communicate connections between data points and structures.
It illustrates the ways data can be grouped and organized, as well as its formats and attributes.
A data model is used to answer business questions; hence, data model design should prioritize
the business stakeholders' perspective.
• To express the business process or event, allowing business stakeholders to measure it easily. The model
design should enable the business to see and measure itself from different perspectives.
• To ensure all data objects required by the database are accurately represented. A data model provides
a clear picture of the database design at the conceptual, logical and physical levels.
The Value of Data Modelling
Quality Assurance
Just as architects consider blueprints before constructing a building, you should consider data
before building a system.
Reduce redundancies
Review relationships and redundancies, resolve discrepancies, and integrate disparate
systems so they can work together.
Better performance
A sound model simplifies database tuning. A well-constructed database typically runs fast,
often quicker than expected.
Data Modeling Work Process
A data modeler provides definitions and meaning for sets of data, much like an interior designer who brings “life”
and function to an empty room.
Data normalisation
Schema selection
Data Model
The data modeling process is designed for clarity and portability to ensure the outcome can be effectively
communicated across all levels of the organization.
Understanding Business Requirement → Understanding Data → Build Data Model (Conceptual → Logical → Physical)
The process itself inherently encourages discussion, collaboration, and careful consideration of the transformation
of business requirements into system solutions.
Data Modeler Requirements & Deliverables
A data modeler requires comprehensive inputs to produce an accurate data model.
Mandatory
• Business requirement and scope.
• Data source dictionary – describes the content, format and structure of the database.
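A data source dictionary can be partly bootstrapped from the source database's own catalogue, leaving the modeler to add the business descriptions. The sketch below assumes, for illustration only, that the source is a SQLite database; `build_data_dictionary` is a hypothetical helper, and the fields it reads are those returned by SQLite's `PRAGMA table_info`.

```python
import sqlite3


def build_data_dictionary(conn: sqlite3.Connection) -> list[dict]:
    """Return one skeleton dictionary entry per column of every user table."""
    entries = []
    tables = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column.
        for _cid, column, col_type, notnull, _default, pk in conn.execute(
            f'PRAGMA table_info("{table}")'
        ):
            entries.append({
                "table": table,
                "column": column,
                "data_type": col_type,
                "not_null_declared": bool(notnull),
                "primary_key": pk > 0,
                "description": "",  # business meaning still has to be written by the modeler
            })
    return entries


# Example with a tiny hypothetical source table
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customer (CustID INTEGER PRIMARY KEY, CustName TEXT NOT NULL)")
dictionary = build_data_dictionary(conn)
for entry in dictionary:
    print(entry)
```

The catalogue only supplies format and structure; the content and meaning parts of the dictionary remain a manual deliverable.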
In addition to the data model diagram, the data modeler also prepares other documents for governance
purposes and to assist the Data Engineer in building the ETL and API.
The data modeler produces three types of data models, each serving a different purpose, before the physical data
model is finalised.
• Conceptual data model: Defines WHAT the system contains. Its purpose is to organize, scope and define
business concepts and rules, and it should focus on the business and its requirements.
• Logical data model: Defines HOW the system should be implemented, regardless of the DBMS. Its purpose is
to define the structure of the data elements and set the relationships between them; it adds further
information to the conceptual data model elements.
• Physical data model: Describes HOW the system will be implemented using a specific DBMS. Its purpose is
the actual implementation of the database, and it should focus on how the logical data should be represented
and stored in a particular physical database.
Data normalization is a systematic approach to decomposing tables to eliminate data
redundancy (repetition) and undesirable characteristics such as insertion, update and deletion anomalies.
It is a multi-step process that puts data into tabular form and removes duplicated data from the relation tables.
Normalization has two goals:
1. Eliminating redundant (useless) data.
2. Ensuring data dependencies make sense, i.e. data is logically stored.
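To make the first goal concrete, here is a minimal sketch of an update anomaly using hypothetical in-memory rows (the customer names, manager and room codes are invented for illustration). When a fact is repeated, an update that touches only one copy leaves the data contradicting itself; once the fact is stored once and referenced by key, a single update is enough.

```python
# Hypothetical unnormalised rows: the account manager's room is repeated for every customer.
customers_flat = [
    {"CustID": 1, "CustName": "Ahmad", "AccountManager": "Mike", "ManagerRoom": "3A"},
    {"CustID": 2, "CustName": "Siti", "AccountManager": "Mike", "ManagerRoom": "3A"},
]

# Update anomaly: Mike moves room, but only one copy of that fact is updated.
customers_flat[0]["ManagerRoom"] = "4B"
rooms_flat = {row["ManagerRoom"] for row in customers_flat if row["AccountManager"] == "Mike"}
print(rooms_flat)  # two different rooms for one manager: the data now contradicts itself

# Normalised alternative: the room is stored once and referenced through AccountManagerID.
managers = {10: {"AccountManager": "Mike", "ManagerRoom": "3A"}}
customers = [
    {"CustID": 1, "CustName": "Ahmad", "AccountManagerID": 10},
    {"CustID": 2, "CustName": "Siti", "AccountManagerID": 10},
]
managers[10]["ManagerRoom"] = "4B"  # a single update
rooms_normalised = {managers[c["AccountManagerID"]]["ManagerRoom"] for c in customers}
print(rooms_normalised)  # {'4B'}
```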
Removal of Data Redundancy by Normalising (1/5)
Normal forms: 0NF → 1NF → 2NF → 3NF → BCNF → 4NF
• 0NF: Not normalised; contains repeating attributes, groups, etc.
• 2NF: It should be in 1NF and should not have any partial dependency.
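A partial dependency exists when a non-key attribute depends on only part of a composite key. The sketch below, with hypothetical helpers `fd_holds` and `partial_dependencies` and invented sample rows, tests every proper subset of the key against every non-key attribute. Sample data can only disprove a dependency; whether a dependency really holds must come from the business rules.

```python
from itertools import combinations


def fd_holds(rows, lhs, rhs):
    """True if no two rows agree on the lhs attributes but differ on the rhs attributes."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        if seen.setdefault(key, value) != value:
            return False
    return True


def partial_dependencies(rows, key, non_key_attrs):
    """Return (key_subset, attribute) pairs where a non-key attribute depends on part of the key."""
    found = []
    for size in range(1, len(key)):  # proper subsets only
        for subset in combinations(key, size):
            for attr in non_key_attrs:
                if fd_holds(rows, subset, [attr]):
                    found.append((subset, attr))
    return found


# Hypothetical 1NF sample with composite key (CustID, ContactID)
rows = [
    {"CustID": 1, "ContactID": 100, "CustName": "Ahmad", "ContactName": "Mike"},
    {"CustID": 1, "ContactID": 101, "CustName": "Ahmad", "ContactName": "Johan"},
    {"CustID": 2, "ContactID": 100, "CustName": "Siti", "ContactName": "Mike"},
]
found = partial_dependencies(rows, ("CustID", "ContactID"), ["CustName", "ContactName"])
print(found)  # [(('CustID',), 'CustName'), (('ContactID',), 'ContactName')]
```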
Removal of Data Redundancy by Normalising (2/5)
[Figure: example table in 0NF and the same data in 1NF]
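Moving from 0NF to 1NF removes repeating groups so that every column holds a single atomic value. A minimal sketch, assuming a hypothetical customer record whose contact names are stored as a list inside one row; `to_first_normal_form` is an illustrative name, not part of any library.

```python
# Hypothetical 0NF data: ContactNames is a repeating group held inside a single row.
unnormalised = [
    {"CustID": 1, "CustName": "Ahmad", "ContactNames": ["Mike", "Johan"]},
    {"CustID": 2, "CustName": "Siti", "ContactNames": ["Mike"]},
]


def to_first_normal_form(rows):
    """Emit one row per repeating-group value so every column holds one atomic value."""
    flat = []
    for row in rows:
        for contact in row["ContactNames"]:
            flat.append({"CustID": row["CustID"], "CustName": row["CustName"], "ContactName": contact})
    return flat


rows_1nf = to_first_normal_form(unnormalised)
for r in rows_1nf:
    print(r)
```

The 1NF rows are now identified by the combination of CustID and ContactName, which is why the next normal forms examine dependencies on that composite key.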
Removal of Data Redundancy by Normalising (3/5)
[Figure: example tables in 2NF]
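Reaching 2NF means moving each partially dependent attribute into a table keyed by the part of the key it depends on. This sketch uses invented rows and a hypothetical `project` helper (relational projection with duplicate removal), then checks that joining the resulting tables back on their keys reproduces the original rows, i.e. that the decomposition loses no information.

```python
def project(rows, attrs):
    """Relational projection: keep only attrs and drop duplicate rows (first-seen order)."""
    seen, result = set(), []
    for row in rows:
        values = tuple(row[a] for a in attrs)
        if values not in seen:
            seen.add(values)
            result.append(dict(zip(attrs, values)))
    return result


# Hypothetical 1NF rows keyed by (CustID, ContactID).
# CustName depends only on CustID, ContactName only on ContactID: both are partial dependencies.
rows_1nf = [
    {"CustID": 1, "ContactID": 100, "CustName": "Ahmad", "ContactName": "Mike"},
    {"CustID": 1, "ContactID": 101, "CustName": "Ahmad", "ContactName": "Johan"},
    {"CustID": 2, "ContactID": 100, "CustName": "Siti", "ContactName": "Mike"},
]

# 2NF: each attribute moves to a table keyed by the part of the key it depends on.
customer = project(rows_1nf, ["CustID", "CustName"])
contact = project(rows_1nf, ["ContactID", "ContactName"])
customer_contact = project(rows_1nf, ["CustID", "ContactID"])

# Lossless check: joining the tables back on their keys rebuilds the original rows.
cust_name = {r["CustID"]: r["CustName"] for r in customer}
contact_name = {r["ContactID"]: r["ContactName"] for r in contact}
rebuilt = [
    {"CustID": cc["CustID"], "ContactID": cc["ContactID"],
     "CustName": cust_name[cc["CustID"]], "ContactName": contact_name[cc["ContactID"]]}
    for cc in customer_contact
]
print(rebuilt == rows_1nf)  # True
```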
Removal of Data Redundancy by Normalising (example) (4/5)
3NF: remove transitive dependency.
[Figure: example tables in 3NF]
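A transitive dependency exists when a non-key attribute depends on the key only through another non-key attribute, for example CustID → AccountManagerID → ManagerRoom. The sketch below, with invented rows and a hypothetical `remove_transitive_dependency` function, moves the manager details into their own table keyed by AccountManagerID so each manager's room is stored once. It assumes the input is already consistent, i.e. every row for a given manager carries the same details.

```python
# Hypothetical 2NF Customer rows: manager details depend on CustID only via AccountManagerID.
customer_2nf = [
    {"CustID": 1, "CustName": "Ahmad", "AccountManagerID": 10, "AccountManager": "Mike", "ManagerRoom": "3A"},
    {"CustID": 2, "CustName": "Siti", "AccountManagerID": 10, "AccountManager": "Mike", "ManagerRoom": "3A"},
    {"CustID": 3, "CustName": "Aina", "AccountManagerID": 11, "AccountManager": "Johan", "ManagerRoom": "4B"},
]


def remove_transitive_dependency(rows):
    """Split manager details into a table keyed by AccountManagerID (assumes consistent input)."""
    customers, managers = [], {}
    for row in rows:
        # Customer keeps only attributes that depend directly on CustID, plus the foreign key.
        customers.append({k: row[k] for k in ("CustID", "CustName", "AccountManagerID")})
        # Each manager's details are stored once, keyed by AccountManagerID.
        managers[row["AccountManagerID"]] = {
            "AccountManager": row["AccountManager"],
            "ManagerRoom": row["ManagerRoom"],
        }
    return customers, managers


customer_3nf, account_manager = remove_transitive_dependency(customer_2nf)
print(customer_3nf)
print(account_manager)
```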
Removal of Data Redundancy by Normalising (example) (5/5)
Customer
• CustID – Primary Key
• CustName
• AccountManagerID – Foreign Key
CustomerContact
• CustID – Foreign Key
• ContactID – Foreign Key
Contact
• ContactID – Primary Key
• ContactName
AccountManager
• AccountManagerID – Primary Key
• AccountManager
• ManagerRoom
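The normalised schema can be expressed as a physical model. A minimal sketch using Python's built-in `sqlite3` module with an in-memory database; the column data types and the sample rows are assumptions, since the tables above show only key roles. It shows that a change to a manager's room is now a single update, and that the foreign keys reject an orphan link.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when this is enabled

conn.executescript("""
CREATE TABLE AccountManager (
    AccountManagerID INTEGER PRIMARY KEY,
    AccountManager   TEXT NOT NULL,
    ManagerRoom      TEXT
);
CREATE TABLE Customer (
    CustID           INTEGER PRIMARY KEY,
    CustName         TEXT NOT NULL,
    AccountManagerID INTEGER REFERENCES AccountManager (AccountManagerID)
);
CREATE TABLE Contact (
    ContactID   INTEGER PRIMARY KEY,
    ContactName TEXT NOT NULL
);
-- Associative table: its primary key combines the keys of Customer and Contact
CREATE TABLE CustomerContact (
    CustID    INTEGER NOT NULL REFERENCES Customer (CustID),
    ContactID INTEGER NOT NULL REFERENCES Contact (ContactID),
    PRIMARY KEY (CustID, ContactID)
);
""")

# Hypothetical sample data
conn.execute("INSERT INTO AccountManager VALUES (10, 'Mike', '3A')")
conn.executemany("INSERT INTO Customer VALUES (?, ?, ?)", [(1, "Ahmad", 10), (2, "Siti", 10)])
conn.execute("INSERT INTO Contact VALUES (100, 'Johan')")
conn.execute("INSERT INTO CustomerContact VALUES (1, 100)")

# The manager's room is stored once, so moving rooms is a single-row update.
conn.execute("UPDATE AccountManager SET ManagerRoom = '4B' WHERE AccountManagerID = 10")
rooms = conn.execute(
    "SELECT c.CustName, m.ManagerRoom FROM Customer AS c "
    "JOIN AccountManager AS m ON m.AccountManagerID = c.AccountManagerID "
    "ORDER BY c.CustID"
).fetchall()
print(rooms)  # [('Ahmad', '4B'), ('Siti', '4B')]

# Referential integrity: linking a contact to a non-existent customer is rejected.
try:
    conn.execute("INSERT INTO CustomerContact VALUES (99, 100)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
```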
The success of data modeling depends on the quality of the surrounding ecosystem.
Thank you for your passion!
Best Practice
• Ensure that you understand the level of detail (the grain) of each fact table; the dimensions indicate the level of
detail.
• Avoid including non-additive facts, which cannot be meaningfully summed in metric computations. For example,
instead of storing a ratio or percentage, store the facts from which the ratio or percentage can be calculated; instead
of storing unit prices, store the extended price (units * unit price).
• Avoid placing attributes in fact tables. Fact tables should contain facts and foreign keys to the dimension tables
where the attributes are stored.
• Facts placed in the same fact table must be at the same level of detail (grain) and from the same business process.
• Avoid snowflake structures, which are normalized dimension tables. Dimension tables should not have other
dimensions as parents or children. Consider denormalizing the dimension tables, or separate the dimensions and
connect each one directly to a fact table.
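The advice on non-additive facts can be illustrated with the unit-price example above. In this sketch with invented quantities, the fact rows keep only additive measures (units and extended price), and the average unit price is derived from their totals at query time; averaging the stored unit prices instead gives a different, misleading figure.

```python
# Hypothetical source lines; unit_price is non-additive, so the fact rows store the extended price.
lines = [
    {"units": 2, "unit_price": 50.0},
    {"units": 10, "unit_price": 5.0},
]

# Fact rows keep only additive measures: units and extended price (units * unit price).
facts = [{"units": line["units"], "extended_price": line["units"] * line["unit_price"]} for line in lines]

total_units = sum(f["units"] for f in facts)             # 12
total_revenue = sum(f["extended_price"] for f in facts)  # 150.0

# The ratio is derived from the additive totals at query time.
avg_unit_price = total_revenue / total_units             # 12.5

# Averaging the stored non-additive values gives a misleading answer.
naive_avg = sum(line["unit_price"] for line in lines) / len(lines)  # 27.5
print(avg_unit_price, naive_avg)
```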
Best Practice (continued)
• Do not connect fact tables directly to other fact tables. You should connect them through a common dimension.
• Create common dimensions that can be reused when you create additional fact tables. For example, you should
have only one dimension table for customer, one for product, one for employee, and so on.
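Combining two fact tables through a common dimension is often called drilling across. In this sketch the dimension, the two fact tables and their measures are hypothetical: each fact table is first aggregated to the grain of the shared customer key, and the totals are then combined per dimension member. The last lines show why a direct fact-to-fact join is avoided: a customer with several rows in both tables has their sales counted more than once.

```python
from collections import defaultdict

# Hypothetical common Customer dimension, shared by every fact table.
dim_customer = {1: "Ahmad", 2: "Siti"}

# Two hypothetical fact tables from different business processes, each keyed to dim_customer.
fact_sales = [
    {"CustID": 1, "amount": 100.0},
    {"CustID": 1, "amount": 50.0},
    {"CustID": 2, "amount": 80.0},
]
fact_support = [
    {"CustID": 1, "tickets": 3},
    {"CustID": 2, "tickets": 1},
    {"CustID": 2, "tickets": 2},
]


def total_by_customer(facts, measure):
    """Aggregate one fact table to the grain of the shared dimension key."""
    totals = defaultdict(float)
    for row in facts:
        totals[row["CustID"]] += row[measure]
    return totals


sales = total_by_customer(fact_sales, "amount")
tickets = total_by_customer(fact_support, "tickets")

# Drill across: combine the per-customer totals through the common dimension.
report = {name: (sales[cust_id], tickets[cust_id]) for cust_id, name in dim_customer.items()}
print(report)  # {'Ahmad': (150.0, 3.0), 'Siti': (80.0, 3.0)}

# For contrast, a direct fact-to-fact join repeats each sales row once per matching support row.
siti_sales_via_direct_join = sum(
    s["amount"] for s in fact_sales for t in fact_support if s["CustID"] == t["CustID"] == 2
)
print(siti_sales_via_direct_join)  # 160.0, double the true 80.0
```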
Design Guidance Principle – Conceptual Model
• Aim to provide business context for how the business understands its data. Ensure business stakeholders can
understand the conceptual model.
• Start with a very high-level data model showing only major entities and primary relationships. Do not show
attributes in the data model.
• Every major entity shown in the data model should have at least one relationship to another entity, and every
relationship in the data model must be clearly displayed.
• Every parent-child relationship has a cardinality of “1” on the parent end and many (“M”) on the child end.
• Every supertype-subtype relationship has a cardinality of “1” on the supertype end and “1” on the subtype end.
• A non-specific relationship line is typically used to model relationships between entities; however, a more refined
level of detail is permitted by using identifying or non-identifying relationships.
Design Guidance Principle – Logical Model
• Describe the entities, their attributes and the relationships that bind them, providing a clear representation of the
business purpose of the data.
• Entities and attributes are normalized to Third Normal Form (3NF). Hence, each record in an entity is uniquely
identified by its primary key, all non-key attributes fully depend on the primary key attributes, and no non-key
attribute depends on any other non-key attribute.
• Non-specific relationship lines between entities are replaced with identifying or non-identifying relationships.
• An associative entity inherits its primary key from the two entities that share a many-to-many relationship.
• Indicate the cardinality of the relationship between two entities. The cardinality notation must convey one of the
following meanings:
  • Many-to-Many cardinality (m:n)
  • Many-to-One cardinality (m:1, i.e. 0..m:1)
  • One-to-Many cardinality (1:n, i.e. 1:0..m)
  • One-to-One cardinality (1:1)
Design Guidance Principle – Physical Model
Physical database design is the process of converting the detailed logical data design into a design that can be
interpreted by the database system.
• Permitted name lengths depend on the database; hence, table and column names must be sized to fit the
requirements of the target database.
• The naming convention used must be meaningful and consistent throughout the design.
• Column names should contain all the elements of the logical attribute from which they were derived, but should be
abbreviated to fit within the maximum length.
• Implement table and column names in a way that is supported by the database system.
• The physical model comes with a data dictionary covering, e.g., data type, length, description and keys.
• A certain amount of denormalization is usually necessary when implementing the physical data model.
Denormalize only if you can demonstrate a performance gain; any loss in data integrity must be justified by that gain.
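The naming guidance (keep every element of the logical attribute, abbreviate only to fit the maximum length) can be automated. A minimal sketch: the abbreviation list, the `physical_column_name` function and the length limits in the example are all hypothetical; real limits depend on the target database, and a real project would use its own approved abbreviation standard.

```python
# Hypothetical abbreviation standard; a real project would use its own approved list.
ABBREVIATIONS = {
    "CUSTOMER": "CUST",
    "ACCOUNT": "ACCT",
    "MANAGER": "MGR",
    "IDENTIFIER": "ID",
    "NUMBER": "NUM",
}


def physical_column_name(logical_name: str, max_length: int) -> str:
    """Keep every word of the logical attribute, abbreviating words left to right until the name fits."""
    words = logical_name.upper().split()
    name = "_".join(words)
    for i, word in enumerate(words):
        if len(name) <= max_length:
            return name
        words[i] = ABBREVIATIONS.get(word, word)  # words without a standard abbreviation stay as-is
        name = "_".join(words)
    if len(name) <= max_length:
        return name
    raise ValueError(f"{logical_name!r} does not fit in {max_length} characters using the standard abbreviations")


print(physical_column_name("Customer Name", 30))                        # CUSTOMER_NAME
print(physical_column_name("Customer Account Manager Identifier", 30))  # CUST_ACCT_MANAGER_IDENTIFIER
print(physical_column_name("Customer Account Manager Identifier", 20))  # CUST_ACCT_MGR_ID
```

Abbreviating only as far as needed keeps names as readable as the limit allows, and raising an error rather than truncating avoids silently losing an element of the logical attribute.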
© 2020 Petroliam Nasional Berhad (PETRONAS)
Design Guidance Principle – Physical Model (continued)
• Define alternate keys that will enhance performance by supporting common search paths.
• Define security requirements for every attribute and plan for implementation of security policies.
• Document in detail the actual mechanics of converting the logical data model to the physical data model, including a
detailed explanation of the migration between the two models. Assumptions made while designing the data model
should also be included.
• Consider improving performance with database features such as indexing, caching and clustered indexes.
• Referential integrity rules must be defined as constraints on a named foreign key for updates or deletions. These
rules must reflect the business rules for the associated data.
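Named foreign keys with explicit update and delete rules, an alternate key and a supporting index can all be declared in the physical model. A minimal sketch in SQLite via Python's `sqlite3` module; the constraint names, the StaffNo alternate key and the chosen rules (RESTRICT on delete, CASCADE on update) are assumptions standing in for whatever the actual business rules require.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when this is enabled

conn.executescript("""
CREATE TABLE AccountManager (
    AccountManagerID INTEGER PRIMARY KEY,
    StaffNo          TEXT NOT NULL,
    AccountManager   TEXT NOT NULL,
    CONSTRAINT AK_AccountManager_StaffNo UNIQUE (StaffNo)
);
CREATE TABLE Customer (
    CustID           INTEGER PRIMARY KEY,
    CustName         TEXT NOT NULL,
    AccountManagerID INTEGER NOT NULL,
    CONSTRAINT FK_Customer_AccountManager FOREIGN KEY (AccountManagerID)
        REFERENCES AccountManager (AccountManagerID)
        ON DELETE RESTRICT
        ON UPDATE CASCADE
);
-- Index on the foreign key supports the common "customers of a manager" search path
CREATE INDEX IX_Customer_AccountManagerID ON Customer (AccountManagerID);
""")

conn.execute("INSERT INTO AccountManager VALUES (10, 'S001', 'Mike')")
conn.execute("INSERT INTO Customer VALUES (1, 'Ahmad', 10)")

# Delete rule (RESTRICT): a manager who still has customers cannot be removed.
try:
    conn.execute("DELETE FROM AccountManager WHERE AccountManagerID = 10")
    delete_blocked = False
except sqlite3.IntegrityError:
    delete_blocked = True

# Update rule (CASCADE): changing the manager's key is propagated to the customer rows.
conn.execute("UPDATE AccountManager SET AccountManagerID = 20 WHERE AccountManagerID = 10")
cascaded = conn.execute("SELECT AccountManagerID FROM Customer WHERE CustID = 1").fetchone()
print(cascaded)  # (20,)

# Alternate key: a second manager with the same StaffNo is rejected.
try:
    conn.execute("INSERT INTO AccountManager VALUES (11, 'S001', 'Johan')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```

Naming each constraint makes error messages, documentation and later migrations refer to the same rule unambiguously.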