Data Modeling and Design

[Figure: DAMA-DMBOK2 Data Management Framework, showing the knowledge areas Data Governance, Data Architecture, Data Modeling & Design, Data Storage & Operations, Data Security, Data Integration & Interoperability, Document & Content Management, Reference & Master Data, Data Warehousing & Business Intelligence, Metadata, and Data Quality]


Copyright © 2017 by DAMA International

DATA MODELLING AND DESIGN


1. Introduction

Data modeling is the process of discovering, analyzing, and scoping data requirements, and then
representing and communicating these data requirements in a precise form called the data model.
Data modeling is a critical component of data management. The modeling process requires that
organizations discover and document how their data fits together. The modeling process itself designs how data
fits together (Simsion, 2013). Data models depict and enable an organization to understand its data assets.
Objectives

Data modelling and design in DMBOK
Main stages of the database system development lifecycle.
Main phases of database design: conceptual, logical, and physical design.

© Pearson Education Limited 2013
DATA DEVELOPMENT [2016]
Introduction
§ Data models depict and enable an organization to
understand its data assets.
§ Schemes: Relational, Dimensional, Object-Oriented,
Fact-Based, Time-Based, and NoSQL.
§ Each model contains a set of components: entities,
relationships, facts, keys, and attributes.
§ Data models are critical to effective management of
data:
§ Provide a common vocabulary around data
§ Capture and document explicit knowledge about an
organization’s data and systems
§ Serve as a primary communications tool during projects
§ Provide the starting point for customization, integration, or
even replacement of an application
Data Model: Goals
Confirming and documenting understanding of
different perspectives facilitates:
q Formalization: A data model documents a concise
definition of data structures and relationships.
q Scope definition: A data model can help explain the
boundaries for data context and implementation of
purchased application packages, projects, initiatives,
or existing systems.
q Knowledge retention/documentation: A data model
can preserve corporate memory regarding a system
or project by capturing knowledge in an explicit
form.

1.3.4 Data Modeling Schemes

The six most common schemes used to represent data are: Relational, Dimensional, Object-Oriented, Fact-Based,
Time-Based, and NoSQL. Each scheme uses specific diagramming notations (see Table 9).
Data Modelling Schemes
Table 9 Modeling Schemes and Notations

Scheme            Sample Notations
Relational        Information Engineering (IE)
                  Integration Definition for Information Modeling (IDEF1X)
                  Barker Notation
                  Chen
Dimensional       Dimensional
Object-Oriented   Unified Modeling Language (UML)
Fact-Based        Object Role Modeling (ORM or ORM2)
                  Fully Communication Oriented Modeling (FCO-IM)
Time-Based        Data Vault
                  Anchor Modeling
NoSQL             Document
                  Column
                  Graph
                  Key-Value

This section will briefly explain each of these schemes and notations. The use of schemes depends in part on the
database being built, as some are suited to particular technologies, as shown in Table 10.
1.3.4.1 Relational

Relational Modeling
First articulated by Dr. Edward Codd in 1970, relational theory provides a systematic way to organize data so
that they reflected their meaning (Codd, 1970). This approach had the additional effect of reducing redundancy
in data storage. Codd’s insight was that data could most effectively be managed in terms of two-dimensional
relations. The term relation was derived from the mathematics (set theory) upon which his approach was based.
(See Chapter 6.)

The design objectives for the relational model are to have an exact expression of business data and to have one
fact in one place (the removal of redundancy). Relational modeling is ideal for the design of operational
systems, which require entering information quickly and having it stored accurately (Hay, 2011).

There are several different kinds of notation to express the association between entities in relational modeling:
v Information Engineering (IE)
v Integration Definition for Information Modeling (IDEF1X)
v Barker Notation
v Chen Notation
The most common form is IE syntax, with its familiar tridents or ‘crow’s feet’ to depict cardinality. (See Figure 39.)

[Figure 39 IE Notation: Student and Course connected by the relationship Attend]
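To make the relational ideas concrete, here is a minimal sketch using SQLite through Python’s standard `sqlite3` module. The tables `student`, `course`, and `attend` and their columns are hypothetical names chosen to mirror Figure 39. Each fact (a student’s name, a course’s title) is stored in one place, and the many-to-many Attend relationship becomes its own relation with foreign keys.

```python
import sqlite3

# In-memory database; table and column names are illustrative, not from the source.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled

conn.executescript("""
CREATE TABLE student (
    student_id   INTEGER PRIMARY KEY,
    student_name TEXT NOT NULL            -- the name is stored once, here
);
CREATE TABLE course (
    course_code  TEXT PRIMARY KEY,
    course_title TEXT NOT NULL
);
-- Attend resolves the many-to-many relationship drawn with crow's feet in IE notation.
CREATE TABLE attend (
    student_id  INTEGER NOT NULL REFERENCES student(student_id),
    course_code TEXT    NOT NULL REFERENCES course(course_code),
    PRIMARY KEY (student_id, course_code)
);
""")

conn.execute("INSERT INTO student VALUES (1, 'Ana')")
conn.execute("INSERT INTO course VALUES ('DB101', 'Databases')")
conn.execute("INSERT INTO attend VALUES (1, 'DB101')")

# One fact in one place: renaming the student touches exactly one row.
conn.execute("UPDATE student SET student_name = 'Ana Lima' WHERE student_id = 1")

rows = conn.execute("""
    SELECT s.student_name, c.course_title
    FROM attend a
    JOIN student s ON s.student_id = a.student_id
    JOIN course  c ON c.course_code = a.course_code
""").fetchall()
print(rows)  # [('Ana Lima', 'Databases')]

# An Attend row that points at an unknown course violates the foreign key.
try:
    conn.execute("INSERT INTO attend VALUES (1, 'XX999')")
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True
```

SQLite only enforces declared foreign keys after `PRAGMA foreign_keys = ON`, which is why the sketch enables it before creating any tables.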



1.3.4.2 Dimensional

Dimensional Modeling
The concept of dimensional modeling started from a joint research project conducted by General Mills and
Dartmouth College in the 1960s. In dimensional models, data is structured to optimize the query and analysis
of large amounts of data. In contrast, operational systems that support transaction processing are optimized
for fast processing of individual transactions.

Dimensional data models capture business questions focused on a particular business process. The process
being measured on the dimensional model in Figure 40 is Admissions. Admissions can be viewed by the Zone
the student is from, School Name, Semester, and whether the student is receiving financial aid. Navigation can
be made from Zone up to Region and Country, from Semester up to Year, and from School Name up to School
Level.

v Fact table
◦ The rows of a fact table correspond to particular measurements and are numeric.
◦ Fact tables take up the most space in the database (approx. 90%).
v Dimension
◦ Dimension tables represent the important objects of the business and contain mostly textual descriptions.

[Figure 40 Axis Notation for Dimensional Models: Admissions at the center, with the axes Geography (Zone, Region, Country), Calendar (Semester, Year), School (Name, Level), and Financial Aid (Yes/No)]

The diagramming notation used to build this model, the ‘axis notation’, can be a very effective
communication tool with those who prefer not to read traditional data modeling syntax.

Both the relational and dimensional conceptual data models can be based on the same business process (as in
this example with Admissions). The difference is in the meaning of the relationships: on the relational model,
the relationship lines capture business rules, and on the dimensional model, they capture the navigation paths
needed to answer business questions.
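A minimal star-schema sketch of the Admissions example, again assuming SQLite via Python’s `sqlite3`. Only the Geography and Calendar axes are modeled; table names, column names, and the admission counts are invented for illustration. Rolling admissions up from Zone to Region corresponds to navigating up the Geography axis in Figure 40.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: mostly textual descriptions, including the navigation hierarchy.
CREATE TABLE dim_geography (geo_key INTEGER PRIMARY KEY, zone TEXT, region TEXT, country TEXT);
CREATE TABLE dim_calendar  (cal_key INTEGER PRIMARY KEY, semester TEXT, year INTEGER);
-- Fact table: one row per measurement, a numeric measure plus dimension keys.
CREATE TABLE fact_admissions (
    geo_key INTEGER REFERENCES dim_geography(geo_key),
    cal_key INTEGER REFERENCES dim_calendar(cal_key),
    financial_aid TEXT CHECK (financial_aid IN ('Yes', 'No')),
    admission_count INTEGER NOT NULL
);
""")
conn.executemany("INSERT INTO dim_geography VALUES (?, ?, ?, ?)", [
    (1, "Zone A", "North", "Country X"),
    (2, "Zone B", "North", "Country X"),
    (3, "Zone C", "South", "Country X"),
])
conn.executemany("INSERT INTO dim_calendar VALUES (?, ?, ?)", [
    (1, "Fall", 2016), (2, "Spring", 2017),
])
conn.executemany("INSERT INTO fact_admissions VALUES (?, ?, ?, ?)", [
    (1, 1, "Yes", 40), (2, 1, "No", 25), (3, 2, "Yes", 10), (1, 2, "No", 5),
])

# Navigate from Zone up to Region: aggregate the measure at the coarser level.
by_region = conn.execute("""
    SELECT g.region, SUM(f.admission_count)
    FROM fact_admissions f JOIN dim_geography g ON g.geo_key = f.geo_key
    GROUP BY g.region ORDER BY g.region
""").fetchall()
print(by_region)  # [('North', 70), ('South', 10)]
```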
1.3.4.3 Object-Oriented (UML)
Object Oriented Modeling (UML)
The Unified Modeling Language (UML) is a graphical language for modeling software. The UML has a variety
of notations of which one (the class model) concerns databases. The UML class model specifies classes (entity
types) and their relationship types (Blaha, 2013).

[Figure 41 UML Class Model: a Student class drawn with the compartments Class Name, Attributes, and Operations, listing Stdntno: integer, Strtdt: date, Prgm: text, ExpctGraddt: date, and ActlGraddt: date]

Figure 41 illustrates the characteristics of a UML Class Model:

A Class diagram resembles an ER diagram except that the Operations or Methods section is not present in ER.
In ER, the closest equivalent to Operations would be Stored Procedures.
Attribute types (e.g., Date, Minutes) are expressed in the implementable application code language and
not in the physical database implementable terminology.
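The UML class in Figure 41 maps naturally onto a class in an object-oriented language. This Python sketch is illustrative only: the attribute names are lowercased from the figure, while the two methods (`has_graduated`, `graduated_on_time`) are hypothetical operations added to show what the Operations compartment holds and what an ER diagram lacks. The attribute types are application-language types (`int`, `date`, `str`), not database column types.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Student:
    # Attributes compartment (application-language types, not DB types)
    stdntno: int
    strtdt: date
    prgm: str
    expctgraddt: date
    actlgraddt: Optional[date] = None

    # Operations compartment: behavior has no direct ER counterpart
    # (the closest ER equivalent would be a stored procedure).
    def has_graduated(self) -> bool:
        return self.actlgraddt is not None

    def graduated_on_time(self) -> bool:
        return self.has_graduated() and self.actlgraddt <= self.expctgraddt


s = Student(1234, date(2014, 9, 1), "Computer Science", date(2018, 6, 30))
first_check = s.has_graduated()   # False: no actual graduation date yet
s.actlgraddt = date(2018, 6, 15)
print(s.graduated_on_time())      # True
```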
1.3.4.4 Fact-Based Modeling (FBM)

Fact Based Modeling
Fact-Based Modeling, a family of conceptual modeling languages, originated in the late 1970s. These languages
are based on the analysis of natural verbalization (plausible sentences) that might occur in the business domain.
Fact-based languages view the world in terms of objects, the facts that relate or characterize those objects, and
each role that each object plays in each fact. An extensive and powerful constraint system relies on fluent
automatic verbalization and automatic checking against the concrete examples. Fact-based models do not use
attributes, reducing the need for intuitive or expert judgment by expressing the exact relationships between
objects (both entities and values). The most widely used of the FBM variants is Object Role Modeling (ORM),
which was formalized as a first-order logic by Terry Halpin in 1989.
1.3.4.4.1 Object Role Modeling (ORM or ORM2)
ORM starts from examples of required information or queries presented in any external formulation familiar to
users, and then verbalizes these examples at the conceptual level, in terms of simple facts expressed in a
controlled natural language. This language is a restricted version of natural language that is unambiguous, so
the semantics are readily grasped by humans; it is also formal, so it can be used to automatically map the
structures to lower levels for implementation (Halpin, 2015).

Figure 42 illustrates an ORM model.

[Figure 42 ORM Model: object types Student, Course, and Semester in an enrollment fact type]

1.3.4.4.2 Fully Communication Oriented Modeling (FCO-IM)

FCO-IM is similar in notation and approach to ORM. The numbers in Figure 43 are references to verbalizations
of facts. For example, 2 might refer to several verbalizations including “Student 1234 has first name Bill.”

[Figure 43 FCO-IM Model: Student, Course, Semester, and Attendance, annotated with fact-verbalization references 1 to 6]
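Fact-based modeling works with elementary facts, their verbalizations, and constraints checked against sample populations. The toy Python sketch below is neither ORM nor FCO-IM notation, nor the API of any modeling tool. It assumes a hypothetical ternary fact type (a student enrolled in a course in a semester), verbalizes each example fact, and tests two candidate uniqueness constraints against the population, the way a modeler would validate a constraint against concrete examples.

```python
# A fact type as a sentence template with one placeholder per role (hypothetical).
ENROLLMENT = "Student {} enrolled in Course {} in Semester {}"

# Population: concrete example facts, each a tuple of role players.
facts = [
    (1234, "DB101", "Fall 2016"),
    (1234, "DB101", "Spring 2017"),
    (5678, "DB101", "Fall 2016"),
]


def verbalize(template, fact):
    """Render an elementary fact as a controlled-natural-language sentence."""
    return template.format(*fact)


def violates_uniqueness(population, role_positions):
    """Return facts that repeat a combination of players in the spanned roles.

    A uniqueness constraint over those roles says each combination occurs at most once.
    """
    seen = set()
    duplicates = []
    for fact in population:
        key = tuple(fact[i] for i in role_positions)
        if key in seen:
            duplicates.append(fact)
        seen.add(key)
    return duplicates


for f in facts:
    print(verbalize(ENROLLMENT, f))

# Candidate rule 1: at most one enrollment per student, course, and semester.
print(violates_uniqueness(facts, (0, 1, 2)))  # []
# Candidate rule 2: at most one enrollment per student and course.
# The sample population is a counterexample, so this rule would be rejected.
print(violates_uniqueness(facts, (0, 1)))     # [(1234, 'DB101', 'Spring 2017')]
```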



Time-Based
v Time-based schemes are used when data values must be associated in chronological order and with specific time values.
v The Data Vault is a detail-oriented, time-based, and uniquely linked set of normalized tables that support one or more functional areas of business.
v Hub, link, satellite
v Anchor Modeling
v Anchors: entities and events; attributes: properties of anchors; ties: the relationships between anchors; knots: shared properties, such as states

[Figure 44 Data Vault Model: Student, Attendance, and Course, with Student Contact, Student Characteristics, and Course Description]

1.3.4.5.2 Anchor Modeling

On the anchor model in Figure 45, Student, Course, and Attendance are anchors, the gray diamonds represent
ties, and the circles represent attributes. Anchors model entities and events, attributes model properties of
anchors, ties model the relationships between anchors, and knots are used to model shared properties, such as states.

[Figure 45 Anchor Model: anchors Student, Attendance, and Course]
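A minimal sketch of the hub, link, and satellite split for the Student/Attendance/Course example, assuming SQLite via Python’s `sqlite3`. It follows the common Data Vault practice of stamping every row with a load date and record source, but simplifies by using integer keys where many implementations use hash keys; all table and column names are hypothetical. Because satellites are insert-only, history is kept, and the current state is the latest row per hub key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hubs hold business keys only.
CREATE TABLE hub_student (student_hk INTEGER PRIMARY KEY, student_no TEXT UNIQUE NOT NULL,
                          load_dts TEXT NOT NULL, record_source TEXT NOT NULL);
CREATE TABLE hub_course  (course_hk INTEGER PRIMARY KEY, course_code TEXT UNIQUE NOT NULL,
                          load_dts TEXT NOT NULL, record_source TEXT NOT NULL);
-- Links record relationships between hubs (here: Attendance).
CREATE TABLE link_attendance (attendance_hk INTEGER PRIMARY KEY,
    student_hk INTEGER NOT NULL REFERENCES hub_student(student_hk),
    course_hk  INTEGER NOT NULL REFERENCES hub_course(course_hk),
    load_dts TEXT NOT NULL, record_source TEXT NOT NULL);
-- Satellites hold descriptive attributes; each change adds a row, keeping history.
CREATE TABLE sat_student_contact (
    student_hk INTEGER NOT NULL REFERENCES hub_student(student_hk),
    load_dts TEXT NOT NULL, email TEXT, record_source TEXT NOT NULL,
    PRIMARY KEY (student_hk, load_dts));
""")
conn.execute("INSERT INTO hub_student VALUES (1, 'S1234', '2016-09-01', 'SIS')")
conn.executemany("INSERT INTO sat_student_contact VALUES (?, ?, ?, ?)", [
    (1, "2016-09-01", "old@example.edu", "SIS"),
    (1, "2017-02-15", "new@example.edu", "SIS"),
])

# Current view: the latest satellite row per hub key (ISO dates sort as text).
current = conn.execute("""
    SELECT h.student_no, s.email
    FROM hub_student h
    JOIN sat_student_contact s ON s.student_hk = h.student_hk
    WHERE s.load_dts = (SELECT MAX(load_dts) FROM sat_student_contact
                        WHERE student_hk = h.student_hk)
""").fetchall()
history = conn.execute(
    "SELECT load_dts, email FROM sat_student_contact ORDER BY load_dts").fetchall()
print(current)  # [('S1234', 'new@example.edu')]
```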
NoSQL
v Document
◦ For example, instead of storing Student, Course, and Registration information in three distinct relational structures, properties from all three will exist in a single document called Registration.
v Key-value
◦ Stores its data in only two columns (‘key’ and ‘value’).
v Column-oriented
◦ Closest to the RDBMS, but works with more complex data types, including unformatted text and imagery.
v Graph
◦ Designed for data whose relations are well represented as a set of nodes with an undetermined number of connections between these nodes.
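The shape of three of these stores can be sketched with plain Python structures standing in for a real database; no NoSQL product API is implied, and all keys and values are hypothetical. The document nests Student and Course properties inside one Registration document, the key-value store holds opaque values under composite string keys, and the graph keeps each node’s open-ended set of neighbours.

```python
import json

# Document: Student, Course, and Registration properties in one Registration document.
registration = {
    "registration_id": "R-001",
    "semester": "Fall 2016",
    "student": {"student_no": 1234, "name": "Ana Lima"},
    "course": {"code": "DB101", "title": "Databases"},
}
doc_store = {registration["registration_id"]: json.dumps(registration)}

# Key-value: only two "columns"; the value is opaque to the store.
kv_store = {"student:1234:name": "Ana Lima", "course:DB101:title": "Databases"}

# Graph: nodes plus an undetermined number of connections between them.
edges = {
    "student:1234": {"course:DB101", "student:5678"},  # enrolled in; knows
    "student:5678": {"course:DB101"},
}


def neighbours(node):
    """Return the nodes directly connected to `node` (empty set if none)."""
    return edges.get(node, set())


loaded = json.loads(doc_store["R-001"])
print(loaded["course"]["title"])  # Databases
```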



Schema
v Conceptual: This embodies the ‘real world’ view of
the enterprise being modeled in the database. It
represents the current ‘best model’ or ‘way of doing
business’ for the enterprise.
v External: The various users of the database
management system operate on sub-sets of the total
enterprise model that are relevant to their particular
needs.
v Internal: The ‘machine view’ of the data is described
by the internal schema. This schema describes the
stored representation of the enterprise’s information.

Database Development
Life Cycle



Stages of the Database System
Development Lifecycle



Database Planning
v Management activities that allow stages of database
system development lifecycle to be realized as
efficiently and effectively as possible.
v Must be integrated with the overall IS strategy of the organization:
◦ Identification of enterprise plans and goals
◦ Evaluation of current IS to determine existing strengths and weaknesses
◦ Appraisal of IT opportunities
v Database planning should also include the development of standards
that govern:
◦ how data will be collected,
◦ how the format should be specified,
◦ what documentation will be needed,
◦ how design and implementation should proceed.



System Definition
§ Describes scope and boundaries of database system and the
major user views.
§ A user view defines what is required of a database system from
the perspective of:
§ a particular job role (such as Manager or Supervisor) or
§ an enterprise application area (such as marketing, personnel, or stock
control).
§ Identifying user views helps ensure that no major users of the
database are forgotten when developing requirements for the new
system.
§ User views also help in the development of a complex database
system by allowing requirements to be broken down into
manageable pieces.



Representation of a Database System with Multiple
User Views



Example: System Boundaries for Dream Home
Database System

This will be discussed further in the Information System Design course.



Major User Views for Dream Home Database System



Requirements Collection and Analysis

v Process of collecting and analyzing information about the part
of the organization to be supported by the database system, and
using this information to identify users’ requirements of the new
system.
v Information is gathered for each major user view, including:
v a description of data used or generated;
v details of how data is to be used or generated;
v any additional requirements for the new database system.
v Information is analyzed to identify requirements to be included
in the new database system. These requirements are described in
the requirements specification.
v The collected requirements need to be structured using techniques
such as Data Flow Diagrams (DFDs) and UML.



Requirements Collection and Analysis
v Another important activity is deciding
how to manage the requirements for a
database system with multiple user
views.
v Express requirement specifications in
both words and diagrams.
v Three main approaches:
v centralized approach;
v view integration approach;
v combination of both approaches.
Requirements Collection and Analysis

Centralized approach
v Collect requirements from each user view.
v Requirements for each user view are merged
into a single set of requirements.
v A data model is created representing all user
views during the database design stage.



Centralized Approach to Managing Multiple User
Views



Requirements Collection and Analysis
View integration approach
v Requirements for each user view remain as separate
lists.
v Data models representing each user view are created.
v A data model representing a single user view (or a subset
of all user views) is called a local data model.
v Each model includes diagrams and documentation
describing requirements for one or more, but not all,
user views of the database.
v Local data models are then merged at a later stage
during database design to produce a global data model,
which represents all user views for the database.
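The merge step of the view integration approach can be sketched in Python, assuming each local data model is reduced to a mapping from entity name to attribute names. The entity and attribute names are hypothetical. The sketch also assumes naming conflicts (synonyms such as Client and Customer) have already been resolved; real view integration must also reconcile relationships, keys, and conflicting definitions.

```python
# Local data models: entity name -> set of attribute names, one per user view.
manager_view = {
    "Branch": {"branch_no", "address"},
    "Staff": {"staff_no", "name", "salary"},
}
supervisor_view = {
    "Staff": {"staff_no", "name", "position"},
    "PropertyForRent": {"property_no", "rent"},
}


def merge_local_models(*local_models):
    """Merge local models into one global model by uniting attributes per entity."""
    global_model = {}
    for model in local_models:
        for entity, attributes in model.items():
            global_model.setdefault(entity, set()).update(attributes)
    return global_model


global_model = merge_local_models(manager_view, supervisor_view)
print(sorted(global_model))           # ['Branch', 'PropertyForRent', 'Staff']
print(sorted(global_model["Staff"]))  # ['name', 'position', 'salary', 'staff_no']
```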



View Integration Approach to Managing Multiple User Views

© Pearson Education Limited 2010
Database Design
v Process of creating a design for a database that
will support the enterprise’s mission statement
and mission objectives for the required database
system.
v A model is a representation of something in the
environment.
v A data model is the integrated collection of
specifications and related diagrams that represent
data requirements and designs.
v Building a data model requires answering
questions about entities, relationships, and
attributes.
Database Design

Three phases of database design:


◦Conceptual database design
◦Logical database design
◦Physical database design.



Conceptual Database Design

v Process of constructing a model of the data used
in an enterprise, independent of all physical
considerations.
v The data model is built using the information in the
users’ requirements specification.
v The conceptual data model is the source of information
for the logical design phase.



Conceptual data model
q High-level perspective on a subject area of
importance to the business
q Contains only the basic and critical business
entities within a given realm and function,
with a description of each entity and the
relationships between entities
q To create:
q Start with one subject area from the subject area
model
q Determine what objects are included within that
subject area, and how they relate to each other



Conceptual Data Modeling Activities
Ø Select Scheme
Ø Select Notation:
Ø Choosing a notation depends on standards within an
organization and the familiarity of users of a particular model
with a particular notation.
Ø Complete Initial CDM: captures the viewpoint of a
user group
Ø Collect the highest-level concepts (nouns) that exist for the
organization, for example, Time, Geography,
Customer/Member/Client, Product/Service, and Transaction.
Ø Collect the activities (verbs) that connect these concepts. For
example: Customers have multiple Geographic Locations (home,
work, etc.), and Geographic Locations have many Customers.
Transactions occur at a Time, at a Facility, for a Customer,
selling a Product.
Conceptual Data Modeling Activities
Ø Incorporate Enterprise Terminology:
Ø Captures the enterprise perspective by ensuring consistency
with enterprise terminology and rules. For example, there would
be some reconciliation work involved if the audience conceptual
data model had an entity called Client, and the enterprise
perspective called this same concept Customer.
Ø Obtain Sign-off:
Ø The model is reviewed for data modeling best practices as well
as its ability to meet the requirements.



Logical Database Design
Ø Process of constructing a model of the data used
in an enterprise based on a specific data model
(e.g. relational), but independent of a particular
DBMS and other physical considerations.
Ø Conceptual data model is refined and mapped
on to a logical data model.
Ø Logical data model is a detailed representation
of data requirements and the business rules that
govern data quality.
Ø Transform conceptual data model structures by
applying two techniques: normalization and
abstraction



Logical Data Modeling Activities
v Analyze information requirements
v Identify business information needs
v Analyze existing documentation
v Use pre-existing data artifacts, industry data
models
v Add associative entities
v Add attributes
v Assign domains
v Assign keys
v Primary key, alternate keys
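Assigning domains and keys can be sketched in SQL DDL. In this minimal example, assuming SQLite via Python’s `sqlite3` and hypothetical names, `student_no` is the primary key, `email` is an alternate key enforced with `UNIQUE`, and the domain of `study_level` is enforced with a `CHECK` constraint; the helper returns the DBMS error text when a row breaks a rule.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (
    student_no  INTEGER PRIMARY KEY,     -- primary key
    email       TEXT NOT NULL UNIQUE,    -- alternate key
    study_level TEXT NOT NULL
                CHECK (study_level IN ('Undergraduate', 'Graduate'))  -- domain
);
""")
conn.execute("INSERT INTO student VALUES (1, 'ana@example.edu', 'Graduate')")


def try_insert(row):
    """Return None on success, or the IntegrityError message on a rule violation."""
    try:
        conn.execute("INSERT INTO student VALUES (?, ?, ?)", row)
        return None
    except sqlite3.IntegrityError as exc:
        return str(exc)


dup_email = try_insert((2, "ana@example.edu", "Graduate"))  # alternate key violated
bad_level = try_insert((3, "bo@example.edu", "Postdoc"))    # value outside the domain
ok = try_insert((4, "cy@example.edu", "Undergraduate"))
print(ok)  # None
```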



Normalization
v The process of applying rules in order to organize
business complexity into stable data structures.
v The basic goal: to keep each attribute in only one
place to eliminate redundancy and the
inconsistencies that can result from redundancy.
v Steps:
v First normal form (1NF)
v Second normal form (2NF)
v Third normal form (3NF)
v Boyce-Codd normal form (BCNF)
v Fourth normal form (4NF)
v Fifth normal form (5NF)
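The core idea of normalization, removing a dependency that causes the same fact to be stored repeatedly, can be sketched in Python on a hypothetical enrolment relation whose key is (`student_no`, `course_code`). `course_title` depends on only part of that key, a partial dependency that 2NF removes by projecting it into its own relation. `fd_holds` checks a functional dependency against one instance of data, which can refute a dependency but never prove it; in practice dependencies come from business rules.

```python
# Unnormalized enrolment relation: course_title is repeated for every student in a course.
enrolment = [
    {"student_no": 1, "course_code": "DB101", "course_title": "Databases", "grade": "A"},
    {"student_no": 2, "course_code": "DB101", "course_title": "Databases", "grade": "B"},
    {"student_no": 1, "course_code": "OS201", "course_title": "Operating Systems", "grade": "A"},
]


def fd_holds(rows, lhs, rhs):
    """True if lhs -> rhs is not contradicted by this instance of the relation."""
    seen = {}
    for r in rows:
        key = tuple(r[a] for a in lhs)
        val = tuple(r[a] for a in rhs)
        if seen.setdefault(key, val) != val:
            return False
    return True


def project(rows, attrs):
    """Relational projection: keep only attrs and remove duplicate tuples."""
    return sorted({tuple(r[a] for a in attrs) for r in rows})


print(fd_holds(enrolment, ["course_code"], ["course_title"]))  # True
print(fd_holds(enrolment, ["course_code"], ["grade"]))         # False

# Decompose: the partially dependent attribute moves to its own relation.
course = project(enrolment, ["course_code", "course_title"])
enrolment_2nf = project(enrolment, ["student_no", "course_code", "grade"])
print(course)  # [('DB101', 'Databases'), ('OS201', 'Operating Systems')]
```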



Physical Database Design

Process of producing a description of the
database implementation on secondary
storage.
Describes base relations, file
organizations, and indexes used to
achieve efficient access to data. Also
describes any associated integrity
constraints and security measures.
Tailored to a specific DBMS.



Physical Data Modeling Activities
v Resolve logical abstractions
v Add attribute detail
v Add reference data objects
v Create a matching separate code table
v Create a master shared code table.
v Embed rules or valid codes into the appropriate object’s
definition.
v Assign surrogate keys
v Denormalize for performance
v Index for performance
v Partition for performance
v Create views
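Several of these activities can be sketched in one SQLite schema, assuming Python’s `sqlite3` and hypothetical names: a separate code table for reference data, a surrogate key alongside the natural key, an index on a frequently filtered column, and a view for consumers. Denormalization and partitioning are DBMS- and workload-specific and are not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Reference data object: a separate code table for valid status codes.
CREATE TABLE enrolment_status (status_code TEXT PRIMARY KEY, description TEXT NOT NULL);
INSERT INTO enrolment_status VALUES ('A', 'Active'), ('W', 'Withdrawn');

-- Surrogate key (student_id) alongside the natural key (student_no).
CREATE TABLE student (
    student_id  INTEGER PRIMARY KEY AUTOINCREMENT,
    student_no  TEXT NOT NULL UNIQUE,
    name        TEXT NOT NULL,
    status_code TEXT NOT NULL REFERENCES enrolment_status(status_code)
);
-- Index for performance on a frequently filtered column.
CREATE INDEX idx_student_status ON student(status_code);

-- View that presents each code with its description to consumers.
CREATE VIEW student_readable AS
    SELECT s.student_no, s.name, st.description AS status
    FROM student s JOIN enrolment_status st ON st.status_code = s.status_code;
""")
conn.execute("INSERT INTO student (student_no, name, status_code) VALUES ('S1234', 'Ana Lima', 'A')")
conn.execute("INSERT INTO student (student_no, name, status_code) VALUES ('S5678', 'Bo Chen', 'W')")

rows = conn.execute("SELECT * FROM student_readable ORDER BY student_no").fetchall()
ids = [r[0] for r in conn.execute("SELECT student_id FROM student ORDER BY student_id")]
print(rows)  # [('S1234', 'Ana Lima', 'Active'), ('S5678', 'Bo Chen', 'Withdrawn')]
```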



DBMS Selection

— Selection of an appropriate DBMS to support
the database system.
— Undertaken at any time prior to logical design
provided sufficient information is available
regarding system requirements.
— Main steps to selecting a DBMS:
¡define Terms of Reference of study;
¡shortlist two or three products;
¡evaluate products;
¡recommend selection and produce report.



DBMS Evaluation Features



Application Design
Design of user interface and application
programs that use and process the
database.
Database design and application design
are parallel activities.
Includes two important activities:
◦transaction design;
◦user interface design.



Application Design - Transactions

An action, or series of actions, carried
out by a single user or application
program, which accesses or changes the
content of the database.
Should define and document the high-
level characteristics of the transactions
required.



Application Design - Transactions

The purpose of transaction design is to
define and document the high-level
characteristics of transactions:
◦data to be used by the transaction;
◦functional characteristics of the transaction;
◦output of the transaction;
◦importance to the users;
◦expected rate of usage.
Three main types of transactions:
retrieval, update, and mixed.
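A mixed transaction (a retrieval followed by updates) can be sketched with Python’s `sqlite3`, whose connection object, used as a context manager, commits on success and rolls back when an exception escapes. Names and data are hypothetical. The second enrolment fails the `CHECK` on `seats_left`, and the rollback also undoes its earlier `INSERT`, so the database never records an attendee without a seat.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE course (course_code TEXT PRIMARY KEY,
                     seats_left INTEGER NOT NULL CHECK (seats_left >= 0));
CREATE TABLE attend (student_no INTEGER, course_code TEXT,
                     PRIMARY KEY (student_no, course_code));
INSERT INTO course VALUES ('DB101', 1);
""")


def enrol(student_no, course_code):
    """Mixed transaction: read the seat count, then update two tables as one unit."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            # Retrieval part.
            seats = conn.execute("SELECT seats_left FROM course WHERE course_code = ?",
                                 (course_code,)).fetchone()[0]
            # Update part.
            conn.execute("INSERT INTO attend VALUES (?, ?)", (student_no, course_code))
            conn.execute("UPDATE course SET seats_left = ? WHERE course_code = ?",
                         (seats - 1, course_code))
        return True
    except sqlite3.IntegrityError:
        return False


first = enrol(1, "DB101")   # succeeds: the last seat is taken
second = enrol(2, "DB101")  # CHECK fails on the UPDATE; the INSERT is rolled back too
attendees = conn.execute("SELECT COUNT(*) FROM attend").fetchone()[0]
print(first, second, attendees)  # True False 1
```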
Prototyping

Building a working model of a database
system.
Purpose
◦to identify features of a system that work well, or
are inadequate;
◦to suggest improvements or even new features;
◦to clarify the users’ requirements;
◦to evaluate feasibility of a particular system
design.



Implementation

Physical realization of the database and
application designs.
◦Use DDL to create database schemas and empty
database files.
◦Use DDL to create any specified user views.
◦Use 3GL or 4GL to create the application
programs. This will include the database
transactions implemented using the DML, possibly
embedded in a host programming language.



Data Conversion and Loading

— Transferring any existing data into the new database
and converting any existing applications to run
on the new database.
— Only required when a new database system is
replacing an old system.
¡ A DBMS normally has a utility that loads existing files into
the new database.
— It may be possible to convert and use application
programs from the old system for use by the new system.
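Conversion and loading can be sketched with Python’s `csv` and `sqlite3` modules. The legacy export format (semicolon-separated, upper-case names, day/month/year dates) and all names are hypothetical; the `convert` function maps each old record to the new schema before a bulk insert inside one transaction.

```python
import csv
import io
import sqlite3

# Stand-in for an export file from the old system (hypothetical layout: id;name;start).
legacy_export = io.StringIO(
    "id;name;start\n1234;ANA LIMA;01/09/2016\n5678;BO CHEN;15/02/2017\n")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (student_no INTEGER PRIMARY KEY, name TEXT, start_date TEXT)")


def convert(row):
    """Map one legacy record to the new schema: title-case names, ISO 8601 dates."""
    day, month, year = row["start"].split("/")
    return int(row["id"]), row["name"].title(), f"{year}-{month}-{day}"


reader = csv.DictReader(legacy_export, delimiter=";")
with conn:  # load everything in one transaction
    conn.executemany("INSERT INTO student VALUES (?, ?, ?)", (convert(r) for r in reader))

loaded = conn.execute("SELECT * FROM student ORDER BY student_no").fetchall()
print(loaded)  # [(1234, 'Ana Lima', '2016-09-01'), (5678, 'Bo Chen', '2017-02-15')]
```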



Testing

Process of running the database system
with the intent of finding errors.
Use carefully planned test strategies and
realistic data.
Testing cannot show absence of faults; it can
show only that software faults are present.
Demonstrates that database and application
programs appear to be working according
to requirements.



Testing

— Should also test usability of system.


— Evaluation conducted against a usability
specification.
— Examples of criteria include:
¡Learnability;
¡Performance;
¡Robustness;
¡Recoverability;
¡Adaptability.

Operational Maintenance

Process of monitoring and maintaining the
database system following installation.
Monitoring performance of system.
◦if performance falls, may require tuning or
reorganization of the database.
Maintaining and upgrading database
application (when required).
Incorporating new requirements into
database application.



Tools
v Data modeling tools
v CASE tools: A software tool that helps software designers and
developers specify, generate and maintain some or all of the software
components
v Support provided by CASE tools includes:
v a data dictionary to store information about the database system’s
data;
v design tools to support data analysis;
v tools to permit development of the corporate data model, and
conceptual and logical data models;
v tools to enable prototyping of applications.
v Lineage tools
v allow the capture and maintenance of the source structures for
each attribute on the data model
v For example, Gross Sales Amount might be sourced from several
applications and require a calculation to populate.



Tools
v Data profiling tools
v help explore the data content, validate it against existing
Metadata, and identify Data Quality gaps/deficiencies
v Metadata repositories
v store descriptive information about the data model
v Data model patterns
v reusable modeling structures that can be applied to
a wide class of situations
v Elementary, assembly, integration
v Industry data models
v provided by vendors or industry groups, such as ARTS (for retail),
SID (for communications), or ACORD (for insurance)
v need to be customized
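A data profiling pass can be sketched in a few lines of Python: compute simple statistics per column and compare the data with what the Metadata says it should contain. The rows and the metadata rules here are hypothetical, and real profiling tools compute many more statistics.

```python
# Sample rows as they arrive from a source system (hypothetical).
rows = [
    {"student_no": "1234", "level": "Graduate"},
    {"student_no": "5678", "level": "Undergrad"},
    {"student_no": None, "level": "Graduate"},
]
# Documented Metadata (hypothetical): mandatory columns and allowed values.
metadata = {
    "student_no": {"nullable": False},
    "level": {"nullable": False, "allowed": {"Graduate", "Undergraduate"}},
}


def profile(rows, metadata):
    """Per column: null count, distinct count, and gaps against the Metadata."""
    report = {}
    for col, rules in metadata.items():
        values = [r.get(col) for r in rows]
        non_null = [v for v in values if v is not None]
        nulls = len(values) - len(non_null)
        gaps = []
        if nulls and not rules.get("nullable", True):
            gaps.append(f"{nulls} null(s) in mandatory column")
        allowed = rules.get("allowed")
        if allowed is not None:
            unexpected = sorted(set(non_null) - allowed)
            if unexpected:
                gaps.append(f"values outside documented domain: {unexpected}")
        report[col] = {"nulls": nulls, "distinct": len(set(non_null)), "gaps": gaps}
    return report


report = profile(rows, metadata)
for column, stats in report.items():
    print(column, stats)
```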
Best Practice
v Naming convention
v ISO 11179 Metadata Registry
v Data modeling and database design
standards
v Database design
v PRISM:
v Performance and ease of use
v Reusability
v Integrity
v Security
v Maintainability
Data Model Governance
v Data model and design quality
management
q Develop data modelling and design
standards
q Review data model and database design
quality
q Manage data model versioning and integration
v Data modelling metrics



Data modeling and database design
standards
§ A list and description of standard data modeling and database design
deliverables
§ A list of standard names, acceptable abbreviations, and abbreviation
rules for uncommon words, that apply to all data model objects
§ A list of standard naming formats for all data model objects
§ A list and description of standard methods for creating and maintaining
these deliverables
§ A list and description of data modeling and database design roles and
responsibilities
§ A list and description of all Metadata properties captured in data
modeling and database design
§ Metadata quality expectations and requirements
§ Guidelines for how to use data modeling tools
§ Guidelines for preparing for and leading design reviews
§ Guidelines for versioning of data models
§ Practices that are discouraged



Manage data model versioning and
integration
Each change should note:
v Why the project or situation required the change
v What and How the object(s) changed, including
which tables had columns added, modified, or
removed, etc.
v When the change was approved and when the
change was made to the model (not necessarily
when the change was implemented in a system)
v Who made the change
v Where the change was made (in which models)
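One way to make the why/what/when/who/where record explicit is a small immutable record type, sketched here in Python (3.9 or later for the built-in generic annotations). The field names and the rule that a model change cannot precede its approval are hypothetical choices for illustration, not a prescribed format.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ModelChange:
    why: str                # why the project or situation required the change
    what_how: str           # what changed and how (e.g. columns added, modified, removed)
    approved_on: date       # when the change was approved
    changed_on: date        # when the model was changed (not when deployed)
    who: str                # who made the change
    where: tuple[str, ...]  # which models were changed

    def __post_init__(self):
        # Hypothetical sanity rule: a model change should not precede its approval.
        if self.changed_on < self.approved_on:
            raise ValueError("model changed before the change was approved")


log: list[ModelChange] = []
log.append(ModelChange(
    why="Registrar needs to record financial-aid status",
    what_how="student: added column financial_aid_flag",
    approved_on=date(2016, 3, 1),
    changed_on=date(2016, 3, 4),
    who="j.doe",
    where=("Student LDM", "Student PDM"),
))
```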



5.2 Data Modeling Metrics

There are several ways of measuring a data model’s quality, and all require a standard for comparison. One
method that will be used to provide an example of data model validation is The Data Model Scorecard®, which
provides 11 data model quality metrics: one for each of ten categories that make up the Scorecard and an overall
score across all ten categories (Hoberman, 2015). Table 11 contains the Scorecard template.

Data model scorecard template

Table 11 Data Model Scorecard® Template

#   Category                                                 Total score   Model score   %   Comments
1   How well does the model capture the requirements?        15
2   How complete is the model?                               15
3   How well does the model match its scheme?                10
4   How structurally sound is the model?                     15
5   How well does the model leverage generic structures?     10
6   How well does the model follow naming standards?         5
7   How well has the model been arranged for readability?    5
8   How good are the definitions?                            10
9   How consistent is the model with the enterprise?         5
10  How well does the metadata match the data?               10
    TOTAL SCORE                                              100

The model score column contains the reviewer’s assessment of how well a particular model met the scoring
criteria, with the maximum score being the value that appears in the total score column. For example, a reviewer
might give a model a score of 10 on “How well does the model capture the requirements?” The % column
presents the model score for the category divided by the total score for the category.
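The arithmetic behind the % column can be sketched in Python using the total scores from Table 11. Apart from the 10 in the first category, which follows the example above, the model scores are hypothetical. Because the total scores add up to 100, the overall score is simply the sum of the model scores.

```python
# (category, total score) pairs from Table 11.
TEMPLATE = [
    ("How well does the model capture the requirements?", 15),
    ("How complete is the model?", 15),
    ("How well does the model match its scheme?", 10),
    ("How structurally sound is the model?", 15),
    ("How well does the model leverage generic structures?", 10),
    ("How well does the model follow naming standards?", 5),
    ("How well has the model been arranged for readability?", 5),
    ("How good are the definitions?", 10),
    ("How consistent is the model with the enterprise?", 5),
    ("How well does the metadata match the data?", 10),
]
assert sum(total for _, total in TEMPLATE) == 100


def score_model(model_scores):
    """Return (category, total, model score, %) rows and the overall score."""
    if len(model_scores) != len(TEMPLATE):
        raise ValueError("expected one model score per category")
    rows = []
    for (category, total), score in zip(TEMPLATE, model_scores):
        if not 0 <= score <= total:
            raise ValueError(f"score {score} out of range for {category!r}")
        rows.append((category, total, score, round(100 * score / total, 1)))
    overall = sum(row[2] for row in rows)  # totals sum to 100, so this is already a %
    return rows, overall


rows, overall = score_model([10, 12, 8, 15, 7, 5, 4, 9, 5, 8])
print(rows[0][3])  # 66.7
print(overall)     # 83
```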
