Data modeling is the process of discovering, analyzing, and scoping data requirements, and then
representing and communicating these data requirements in a precise form called the data model.
Data modeling is a critical component of data management. The modeling process requires that
organizations discover and document how their data fits together. The modeling process itself designs how data
fits together (Simsion, 2013). Data models depict and enable an organization to understand its data assets.
Objectives
© Pearson Education Limited 2013
DATA DEVELOPMENT [2016]
Introduction
§ Data models depict and enable an organization to
understand its data assets.
§ Schemes: Relational, Dimensional, Object-Oriented,
Fact-Based, Time-Based, and NoSQL.
§ Each model contains a set of components: entities,
relationships, facts, keys, and attributes.
§ Data models are critical to effective management of
data:
§ Provide a common vocabulary around data
§ Capture and document explicit knowledge about an
organization’s data and systems
§ Serve as a primary communications tool during projects
§ Provide the starting point for customization, integration, or
even replacement of an application
Data Model: Goals
Confirming and documenting understanding of
different perspectives facilitates:
q Formalization: A data model documents a concise
definition of data structures and relationships.
q Scope definition: A data model can help explain the
boundaries for data context and implementation of
purchased application packages, projects, initiatives,
or existing systems.
q Knowledge retention/documentation: A data model
can preserve corporate memory regarding a system
or project by capturing knowledge in an explicit
form.
1.3.4 Data Modeling Schemes
The six most common schemes used to represent data are: Relational, Dimensional, Object-Oriented, Fact-Based, Time-Based, and NoSQL. Each scheme uses specific diagramming notations (see Table 9).
Data Modelling Schemes
Table 9 Modeling Schemes and Notations
This section will briefly explain each of these schemes and notations. The use of schemes depends in part on the database being built, as some are suited to particular technologies, as shown in Table 10.
1.3.4.1 Relational
Relational Modeling
v Dr. Edward Codd, 1970: relational theory provides a systematic way to organize data so that they reflect their meaning
v Information Engineering (IE)
v Integration Definition for Information Modeling (IDEF1X)
v Barker Notation
v Chen Notation
First articulated by Dr. Edward Codd in 1970, relational theory provides a systematic way to organize data so that they reflect their meaning (Codd, 1970). This approach had the additional effect of reducing redundancy in data storage. Codd’s insight was that data could most effectively be managed in terms of two-dimensional relations. The term relation was derived from the mathematics (set theory) upon which his approach was based. (See Chapter 6.)
The design objectives for the relational model are to have an exact expression of business data and to have one fact in one place (the removal of redundancy). Relational modeling is ideal for the design of operational systems, which require entering information quickly and having it stored accurately (Hay, 2011).
There are several different kinds of notation to express the association between entities in relational modeling, including Information Engineering (IE), Integration Definition for Information Modeling (IDEF1X), Barker Notation, and Chen Notation. The most common form is IE syntax, with its familiar tridents or ‘crow’s feet’ to depict cardinality. (See Figure 39.)
Figure 39 IE Notation (entities Student and Course joined by the relationship Attend)
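A minimal sketch of the Student, Attend, Course example as relational tables follows, using Python's built-in sqlite3 module. The table names, column names and sample rows are hypothetical and were chosen only for illustration. In this sketch, the many-to-many "attends" relationship becomes an associative table that holds nothing but keys. That design keeps one fact in one place: a course title is stored once, so renaming the course changes a single row.

```python
import sqlite3

# Hypothetical schema (names are illustrative, not from the source).
# Each fact lives in exactly one table; attendance only links keys.
SCHEMA = """
CREATE TABLE student (
    student_id   INTEGER PRIMARY KEY,
    first_name   TEXT NOT NULL
);
CREATE TABLE course (
    course_code  TEXT PRIMARY KEY,
    course_title TEXT NOT NULL
);
-- Associative table resolving the many-to-many attends relationship
CREATE TABLE attendance (
    student_id   INTEGER NOT NULL REFERENCES student(student_id),
    course_code  TEXT    NOT NULL REFERENCES course(course_code),
    PRIMARY KEY (student_id, course_code)
);
"""

def build_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # enforce referential integrity
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO student VALUES (?, ?)",
                     [(1234, "Bill"), (5678, "Ann")])
    conn.executemany("INSERT INTO course VALUES (?, ?)",
                     [("DM101", "Data Modeling"), ("DB201", "Databases")])
    conn.executemany("INSERT INTO attendance VALUES (?, ?)",
                     [(1234, "DM101"), (1234, "DB201"), (5678, "DM101")])
    return conn

def courses_for(conn: sqlite3.Connection, student_id: int) -> list:
    """Titles of the courses a student attends, sorted alphabetically."""
    rows = conn.execute(
        """SELECT c.course_title
           FROM attendance AS a JOIN course AS c ON c.course_code = a.course_code
           WHERE a.student_id = ?
           ORDER BY c.course_title""",
        (student_id,),
    )
    return [title for (title,) in rows]

def rename_course(conn: sqlite3.Connection, code: str, new_title: str) -> None:
    # One fact in one place: a single UPDATE is visible to every attendance.
    conn.execute("UPDATE course SET course_title = ? WHERE course_code = ?",
                 (new_title, code))

if __name__ == "__main__":
    conn = build_db()
    print(courses_for(conn, 1234))  # ['Data Modeling', 'Databases']
```

A query for the courses a student attends joins through the associative table instead of reading copied titles. An IE diagram would draw crow's feet on the attendance side of both relationships.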
1.3.4.2 Dimensional
v Data is structured to optimize the query and analysis of large amounts of data
v Fact table
v Dimension tables contain mostly textual descriptions
The concept of dimensional modeling started from a joint research project conducted by General Mills and Dartmouth College in the 1960s. In dimensional models, data is structured to optimize the query and analysis of large amounts of data. In contrast, operational systems that support transaction processing are optimized for fast processing of individual transactions.
Dimensional data models capture business questions focused on a particular business process. The process being measured on the dimensional model in Figure 40 is Admissions. Admissions can be viewed by the Zone the student is from, School Name, Semester, and whether the student is receiving financial aid. Navigation can be made from Zone up to Region and Country, from Semester up to Year, and from School Name up to School Level.
Figure 40 (axis notation): Admissions fact, with Geography (Country), Semester and Year, and School (Name, Level) axes
The diagramming notation used to build this model, the ‘axis notation’, can be a very effective communication tool with those who prefer not to read traditional data modeling syntax.
Both the relational and dimensional conceptual data models can be based on the same business process (as in this example with Admissions). The difference is in the meaning of the relationships: on the relational model the relationship lines capture business rules, while on the dimensional model they capture the navigation paths needed to answer business questions.
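The Admissions model can be sketched as a star schema in SQLite. The sketch below is written under stated assumptions: every table name, column name and sample figure is hypothetical. The fact table holds the measure, and each dimension table carries its navigation levels as columns. Navigating from Semester up to Year, or from School Name up to School Level, then becomes a GROUP BY on a higher-level column of the same dimension.

```python
import sqlite3

# Hypothetical star schema for Admissions: one fact table with the measure,
# dimension tables holding descriptive text and the navigation levels
# (Semester -> Year, School Name -> School Level, Zone -> Region -> Country).
SCHEMA = """
CREATE TABLE dim_calendar  (calendar_key  INTEGER PRIMARY KEY, semester TEXT, year INTEGER);
CREATE TABLE dim_school    (school_key    INTEGER PRIMARY KEY, school_name TEXT, school_level TEXT);
CREATE TABLE dim_geography (geography_key INTEGER PRIMARY KEY, zone TEXT, region TEXT, country TEXT);
CREATE TABLE fact_admissions (
    calendar_key    INTEGER REFERENCES dim_calendar(calendar_key),
    school_key      INTEGER REFERENCES dim_school(school_key),
    geography_key   INTEGER REFERENCES dim_geography(geography_key),
    financial_aid   INTEGER,  -- 1 if the admitted students receive financial aid
    admission_count INTEGER
);
"""

# Navigation level -> (dimension table, join key). Doubles as a whitelist so
# only known column names are ever placed into the SQL text.
LEVELS = {
    "semester": ("dim_calendar", "calendar_key"),
    "year": ("dim_calendar", "calendar_key"),
    "school_name": ("dim_school", "school_key"),
    "school_level": ("dim_school", "school_key"),
    "zone": ("dim_geography", "geography_key"),
    "region": ("dim_geography", "geography_key"),
    "country": ("dim_geography", "geography_key"),
}

def build_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO dim_calendar VALUES (?, ?, ?)",
                     [(1, "Fall 2015", 2015), (2, "Spring 2016", 2016), (3, "Fall 2016", 2016)])
    conn.executemany("INSERT INTO dim_school VALUES (?, ?, ?)",
                     [(1, "Engineering", "Graduate"), (2, "Arts", "Undergraduate")])
    conn.executemany("INSERT INTO dim_geography VALUES (?, ?, ?, ?)",
                     [(1, "Zone 1", "East", "USA"), (2, "Zone 7", "West", "USA")])
    conn.executemany("INSERT INTO fact_admissions VALUES (?, ?, ?, ?, ?)",
                     [(1, 1, 1, 0, 10), (2, 1, 1, 1, 5), (3, 2, 2, 0, 7), (3, 1, 2, 1, 3)])
    return conn

def admissions_by(conn: sqlite3.Connection, level: str) -> dict:
    """Total admissions rolled up to one navigation level."""
    table, key = LEVELS[level]  # raises KeyError for an unknown level
    sql = (f"SELECT d.{level}, SUM(f.admission_count) "
           f"FROM fact_admissions AS f JOIN {table} AS d ON d.{key} = f.{key} "
           f"GROUP BY d.{level} ORDER BY d.{level}")
    return dict(conn.execute(sql).fetchall())

if __name__ == "__main__":
    conn = build_db()
    print(admissions_by(conn, "semester"))  # {'Fall 2015': 10, 'Fall 2016': 10, 'Spring 2016': 5}
    print(admissions_by(conn, "year"))      # {2015: 10, 2016: 15}
```

The join lines in this sketch encode navigation paths rather than business rules. That difference is the one the text draws between the dimensional model and the relational model.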
1.3.4.3 Object-Oriented (UML)
Object Oriented Modeling (UML)
The Unified Modeling Language (UML) is a graphical language for modeling software. The UML has a variety of notations, of which one (the class model) concerns databases. The UML class model specifies classes (entity types) and their relationship types (Blaha, 2013).
A Class diagram resembles an ER diagram except that the Operations or Methods section is not present in ER.
In ER, the closest equivalent to Operations would be Stored Procedures.
Attribute types (e.g., Date, Minutes) are expressed in the implementable application code language and not in the physical database implementable terminology.
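The two class-model traits listed above can be sketched with Python dataclasses. The class names, attribute names and sample values are hypothetical. First, attribute types use application-language types such as `date` and an `int` count of minutes rather than database column types. Second, classes carry operations (methods), which an ER model would at best push into stored procedures.

```python
from dataclasses import dataclass, field
from datetime import date

# Hypothetical classes mirroring a UML class model for Student/Course/Attendance.
# Attribute types are application-language types (date, int minutes),
# not physical database types.

@dataclass
class Course:
    code: str
    title: str

@dataclass
class Student:
    student_id: int
    first_name: str
    # Excluded from repr/eq to avoid cycles through Attendance.student
    attendances: list = field(default_factory=list, repr=False, compare=False)

    # Operations: the part of a class diagram with no ER counterpart
    # (in ER, the closest equivalent would be a stored procedure).
    def attend(self, course: Course, on: date, minutes: int) -> "Attendance":
        record = Attendance(self, course, on, minutes)
        self.attendances.append(record)
        return record

    def total_minutes(self) -> int:
        return sum(a.minutes for a in self.attendances)

@dataclass
class Attendance:
    student: Student
    course: Course
    attendance_date: date
    minutes: int

if __name__ == "__main__":
    bill = Student(1234, "Bill")
    dm = Course("DM101", "Data Modeling")
    bill.attend(dm, date(2016, 9, 1), 50)
    bill.attend(dm, date(2016, 9, 8), 75)
    print(bill.total_minutes())  # 125
```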
1.3.4.4 Fact-Based Modeling (FBM)
Fact-Based Modeling, a family of conceptual modeling languages, originated in the late 1970s. These languages view the world in terms of objects, the facts that relate or characterize those objects, and each role that each object plays in each fact. Fact-based models do not use attributes, reducing the need for intuitive or expert judgment by expressing the exact relationships between objects (both entities and values). The most widely used of the FBM variants is Object Role Modeling (ORM), which was formalized as a first-order logic by Terry Halpin in 1989.
v ORM (Object Role Modeling)
v FCO-IM (Fully Communication Oriented Modeling)
1.3.4.4.1 Object Role Modeling (ORM or ORM2)
ORM is a model-driven engineering approach that starts with typical examples of required information or queries presented in any external formulation familiar to users, and then verbalizes these examples at the conceptual level, in terms of simple facts expressed in a controlled natural language. This language is a restricted version of natural language that is unambiguous, so the semantics are readily grasped by humans; it is also formal, so it can be used to automatically map the structures to lower levels for implementation (Halpin, 2015).
Figure 42 ORM Model (Student … enrolled in … Course … in … Semester)
1.3.4.4.2 Fully Communication Oriented Modeling (FCO-IM)
FCO-IM is similar in notation and approach to ORM. The numbers in Figure 43 are references to verbalizations of facts. For example, 2 might refer to several verbalizations including “Student 1234 has first name Bill.”
Figure 43 (FCO-IM): Student, Course, and Attendance, with numbered fact references
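The verbalization idea behind ORM and FCO-IM can be illustrated with a minimal sketch. The `ObjectType` and `FactType` classes below are hypothetical and do not come from any ORM or FCO-IM tool. The sketch stores no attributes. Each piece of information is an elementary fact in which objects play roles, and each fact type has a controlled-natural-language reading with "…" marking the role positions, as in Figure 42.

```python
from dataclasses import dataclass

# Hypothetical classes (not part of any ORM/FCO-IM tool). Entity types are
# referenced by type name plus identifier; value types appear as plain values.

@dataclass(frozen=True)
class ObjectType:
    name: str
    is_entity: bool  # False for value types such as a first name

    def show(self, ref) -> str:
        return f"{self.name} {ref}" if self.is_entity else str(ref)

@dataclass(frozen=True)
class FactType:
    roles: tuple   # ObjectTypes in reading order, one per "…" placeholder
    reading: str   # controlled natural language, e.g. "… has first name …"

    def verbalize(self, *players) -> str:
        """Fill each role placeholder to produce one fact sentence."""
        pieces = self.reading.split("…")
        if len(pieces) != len(self.roles) + 1 or len(players) != len(self.roles):
            raise ValueError("reading, roles, and players do not line up")
        sentence = pieces[0]
        for role, player, tail in zip(self.roles, players, pieces[1:]):
            sentence += role.show(player) + tail
        return sentence.strip() + "."

STUDENT = ObjectType("Student", True)
COURSE = ObjectType("Course", True)
SEMESTER = ObjectType("Semester", True)
FIRST_NAME = ObjectType("FirstName", False)

# A binary fact type (entity plays one role, value plays the other)
HAS_FIRST_NAME = FactType((STUDENT, FIRST_NAME), "… has first name …")
# A ternary fact type, as in the ORM example
ENROLLED = FactType((STUDENT, COURSE, SEMESTER), "… enrolled in … in …")

if __name__ == "__main__":
    print(HAS_FIRST_NAME.verbalize(1234, "Bill"))
    # Student 1234 has first name Bill.
    print(ENROLLED.verbalize(1234, "DM101", 1))
    # Student 1234 enrolled in Course DM101 in Semester 1.
```

In this sketch the first name is not an attribute of Student. It is a value that plays a role in a fact. That is the sense in which fact-based models express the exact relationship between an entity and a value.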
Time-Based
Anchor modeling has four basic modeling concepts: anchors, attributes, ties, and knots. Anchors model entities and events, attributes model properties of anchors, ties model the relationships between anchors, and knots are used to model shared properties, such as states. On the anchor model in Figure 45, Student, Course, and Attendance are anchors, the gray diamonds represent ties, and the circles represent attributes.
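The four anchor-model concepts can be mapped to SQLite tables. This is a minimal sketch under stated assumptions: the table names are hypothetical and do not follow the official Anchor Modeling naming convention, and the dates are invented. Each anchor is a narrow identity table. Each attribute has its own table with a `changed_at` column, so a change adds a row instead of overwriting one. A tie links anchors, and a knot holds a small shared set of states. A time-based query then asks for the value that was valid on a given day.

```python
import sqlite3

# Hypothetical tables illustrating anchors, attributes, ties, and knots.
SCHEMA = """
CREATE TABLE student    (student_id    INTEGER PRIMARY KEY);  -- anchor
CREATE TABLE course     (course_id     INTEGER PRIMARY KEY);  -- anchor
CREATE TABLE attendance (attendance_id INTEGER PRIMARY KEY);  -- anchor
CREATE TABLE student_name (                                   -- attribute
    student_id INTEGER NOT NULL REFERENCES student(student_id),
    name       TEXT    NOT NULL,
    changed_at TEXT    NOT NULL,  -- ISO date the value became valid
    PRIMARY KEY (student_id, changed_at)
);
CREATE TABLE status_knot (                                    -- knot
    status_id INTEGER PRIMARY KEY,
    status    TEXT UNIQUE NOT NULL
);
CREATE TABLE attendance_status (                              -- knotted attribute
    attendance_id INTEGER NOT NULL REFERENCES attendance(attendance_id),
    status_id     INTEGER NOT NULL REFERENCES status_knot(status_id),
    changed_at    TEXT    NOT NULL,
    PRIMARY KEY (attendance_id, changed_at)
);
CREATE TABLE attendance_student_course (                      -- tie
    attendance_id INTEGER PRIMARY KEY REFERENCES attendance(attendance_id),
    student_id    INTEGER NOT NULL REFERENCES student(student_id),
    course_id     INTEGER NOT NULL REFERENCES course(course_id)
);
"""

def build_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO student VALUES (1234)")
    conn.execute("INSERT INTO course VALUES (1)")
    conn.execute("INSERT INTO attendance VALUES (1)")
    conn.execute("INSERT INTO attendance_student_course VALUES (1, 1234, 1)")
    conn.executemany("INSERT INTO status_knot VALUES (?, ?)",
                     [(1, "registered"), (2, "completed")])
    # A change is a new row, not an UPDATE, so history is preserved.
    conn.executemany("INSERT INTO student_name VALUES (?, ?, ?)",
                     [(1234, "Bill", "2015-09-01"), (1234, "William", "2016-02-01")])
    conn.executemany("INSERT INTO attendance_status VALUES (?, ?, ?)",
                     [(1, 1, "2016-01-10"), (1, 2, "2016-05-20")])
    return conn

def name_as_of(conn, student_id, day):
    """Student name valid on `day` (ISO date string), or None."""
    row = conn.execute(
        "SELECT name FROM student_name WHERE student_id = ? AND changed_at <= ? "
        "ORDER BY changed_at DESC LIMIT 1", (student_id, day)).fetchone()
    return row[0] if row else None

def status_as_of(conn, attendance_id, day):
    """Attendance state (from the knot) valid on `day`, or None."""
    row = conn.execute(
        "SELECT k.status FROM attendance_status AS s "
        "JOIN status_knot AS k ON k.status_id = s.status_id "
        "WHERE s.attendance_id = ? AND s.changed_at <= ? "
        "ORDER BY s.changed_at DESC LIMIT 1", (attendance_id, day)).fetchone()
    return row[0] if row else None

if __name__ == "__main__":
    conn = build_db()
    print(name_as_of(conn, 1234, "2015-12-31"))  # Bill
    print(status_as_of(conn, 1, "2016-06-01"))   # completed
```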
Centralized approach
v Collect requirements from each user view.
v Requirements for each user view are merged into a single set of requirements.
v A data model is created representing all user views during the database design stage.
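The merge step of the centralized approach can be sketched in a few lines. The sketch assumes that each user view's requirements are recorded as entity, attribute and type. The view names, entities and types below are hypothetical. The sketch combines all views into one requirement set. It also reports any attribute that two views define differently, because such conflicts have to be resolved before a single data model can represent every view.

```python
# Hypothetical requirements gathered from three user views:
# view -> entity -> {attribute: type}
VIEWS = {
    "registrar": {"Student": {"student_id": "int", "name": "text"},
                  "Course": {"code": "text"}},
    "finance": {"Student": {"student_id": "int", "aid_amount": "decimal"}},
    "admissions": {"Student": {"student_id": "text"}},
}

def merge_user_views(views):
    """Merge per-view requirements into one set and collect type conflicts.

    Returns (merged, conflicts), where merged maps entity -> {attribute: type}
    and each conflict is (view, entity, attribute, kept_type, other_type).
    """
    merged = {}
    conflicts = []
    for view, entities in views.items():
        for entity, attributes in entities.items():
            target = merged.setdefault(entity, {})
            for attr, attr_type in attributes.items():
                if attr not in target:
                    target[attr] = attr_type  # first view to mention it is kept
                elif target[attr] != attr_type:
                    conflicts.append((view, entity, attr, target[attr], attr_type))
    return merged, conflicts

if __name__ == "__main__":
    merged, conflicts = merge_user_views(VIEWS)
    print(merged)
    print(conflicts)  # [('admissions', 'Student', 'student_id', 'int', 'text')]
```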
Operational Maintenance
There are several ways of measuring a data model’s quality, and all require a standard for comparison. One method that will be used to provide an example of data model validation is The Data Model Scorecard®, which provides 11 data model quality metrics: one for each of ten categories that make up the Scorecard and an overall score across all ten categories (Hoberman, 2015). Table 11 contains the Scorecard template.
Data model scorecard template
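The tally behind such a scorecard can be sketched briefly. This sketch rests on stated assumptions. Only two categories are shown instead of ten, and the second one is a hypothetical placeholder. The maximum (total) scores are invented, since the real values are in the Scorecard template. The per-category percentage is assumed to be the model score divided by the total score. The sketch computes one metric per category plus an overall metric across all categories.

```python
from dataclasses import dataclass

@dataclass
class CategoryScore:
    question: str
    total_score: int  # maximum points for the category (hypothetical values here)
    model_score: int  # reviewer's assessment, between 0 and total_score

    def __post_init__(self):
        if not 0 <= self.model_score <= self.total_score:
            raise ValueError(f"model score out of range for {self.question!r}")

    @property
    def percent(self) -> float:
        # Assumption: % = model score / total score for the category.
        return 100.0 * self.model_score / self.total_score

def overall_score(categories):
    """Overall metric across all categories: (points, maximum, percent)."""
    points = sum(c.model_score for c in categories)
    maximum = sum(c.total_score for c in categories)
    return points, maximum, 100.0 * points / maximum

if __name__ == "__main__":
    card = [
        CategoryScore("How well does the model capture the requirements?", 15, 10),
        CategoryScore("Hypothetical category B", 10, 8),
    ]
    for c in card:
        print(f"{c.question:55} {c.model_score:>3}/{c.total_score:<3} {c.percent:5.1f}%")
    print(overall_score(card))  # (18, 25, 72.0)
```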
The model score column contains the reviewer’s assessment of how well a particular model met the scoring
criteria, with a maximum score being the value that appears in the total score column. For example, a reviewer
might give a model a score of 10 on “How well does the model capture the requirements?” The % column