
Data Modeling 2 Top-down Design i

Table of Contents

Table of Contents
List of Figures
List of Tables
2 Top-down design
2.1 Top-down design by creating successive data models
2.2 The conceptual data model
2.2.1 The maximum cardinality of a relationship
2.2.2 The minimum cardinality of a relationship
2.2.3 Types of binary relationships
2.3 The logical data model
2.3.1 Converting entities into tables
2.3.2 Representing relationships by primary and foreign keys with matching values
2.3.3 Domains
2.4 The physical data model
2.4.1 Implementing domains in Access
2.4.2 Implementing relationships in Access
2.5 Example: Top-down design of a database for a university
2.5.1 The conceptual data model for a university
2.5.2 The logical relational data model of a university
2.5.3 The physical data model for a university
2.6 The data model of a data warehouse
2.7 Conclusion

List of Figures
Figure 2-1: The class diagram of a database (fig. 8.12 in the textbook 'Business Information Management')
Figure 2-2: The three data models
Figure 2-3: A 'one to many' relationship and its cardinalities: sets and class diagram
Figure 2-4: The 'one to one' relationship: sets and class diagram
Figure 2-5: The 'one to many' relationship: sets and class diagram
Figure 2-6: The 'many to many' relationship: sets and class diagram
Figure 2-7: Entity occurrences of an entity with a composite and a multivalued attribute, and the corresponding tables
Figure 2-8: A 'one to many' relationship: sets, class diagram and tables
Figure 2-9: A 'one to one' relationship: sets, class diagram and tables
Figure 2-10: A 'many to many' relationship split up into two 'one to many' relationships: sets, class diagram and tables
Figure 2-11: The 'one to many' relationship 'Prof Teaches Class' in the physical data model of Access
Figure 2-12: The 'one to one' relationship 'Prof has Office' in the physical data model of Access
Figure 2-13: The conceptual data model of a university
Figure 2-14: The logical data model of a relational database for a university
Figure 2-15: The physical data model in Access for a database for a university
Figure 2-16: The conceptual data model for a 'Sales' data warehouse
Figure 2-17: The logical data model for a 'Sales' data warehouse

List of Tables
Table 2-1: The three possible combinations for the maximum cardinality
Table 2-2: The three possible combinations for the minimum cardinality
Table 2-3: The logical data model of the relational database 'University' as a list of tables

2 Top-down design
When designing a data model, you can proceed in two ways: top-down or bottom-up. The
bottom-up approach is discussed in Chapter 3; the top-down approach is covered here.
 With the top-down approach you design a database by creating three models that evolve
from a more abstract to a more concrete level. Starting from the conceptual data model [1],
you pass through the logical data model [2] to the physical data model (or implementation
model of the data) [3]. These different models are usually represented by graphical
schemes, such as a UML class diagram or the Relationships tab in Access.
The top-down approach to database design is mostly used for a new design, where you have no
DBMS at all to start from. The bottom-up approach is mostly used when you already have an
existing RDBMS that has structural problems. Both methodologies may be combined: a first
design is made via the top-down approach, and at the end the logical or physical model is
checked for defects through the normalization process of the bottom-up approach.
In this chapter two cases are worked out:
 The design of a simplified database for a university, which illustrates the most important
issues. The top-down approach is used, as it is the most useful approach when one starts
from scratch.
 A generic database for a data warehouse [4], which usually has a star structure: one
single facts table in the centre of the star, linked to multiple dimension tables that each
contain information on one aspect (dimension) described in the facts table. Models for
Business Intelligence use data warehouses.
2.1 Top-down design by creating successive data models
Using a top-down approach you start by defining the scope of your system. Next you collect
information about the business processes that you want to automate with your system. To achieve
this you collect documents (forms, reports), interview people, or observe the current
system.
You can bring together all the information you obtained in a conceptual data model. It is
independent of both the choice of a specific database management system type and the
software package that you will actually use to implement the DBMS.
You can use all kinds of methodologies to get a graphical conceptual data model. Nowadays the
Entity Relationship Diagram (ERD) is very popular. You can represent it in a UML class diagram.
An example can be found in the textbook Business Information Management, 2nd ed. by
C. Doom [5], where a conceptual data model for (part of) a university is shown (Figure 2-1).

[1] Textbook Business Information Management – 2nd ed., ch. 7.4.1.
[2] Textbook Business Information Management – 2nd ed., ch. 7.4.2.
[3] Textbook Business Information Management – 2nd ed., ch. 8.3.
[4] You can find more information about data warehouses in the textbook Business Information
Management – 2nd ed., ch. 5.2.
[5] Textbook Business Information Management – 2nd ed., ch. 8, figure 8.12.

Figure 2-1: The class diagram of a database (fig. 8.12 in the textbook 'Business Information Management')

An Entity Relationship Diagram consists of:

1. Entities: the sets of objects about which you want to store data. One often makes a
distinction between the entity type, which is an abstract description of the entity with all
its properties, and an entity occurrence, which is one specific instance of this entity. In a
UML class diagram the entity types are called classes and their occurrences are called
objects. A class has properties, called attributes, and actions you can perform with the
objects, called methods. In the class diagram the class is drawn as a rectangle with three
sections separated by horizontal lines: the upper section contains the class name (entity),
the middle section lists the attributes and the lower section lists the methods.
2. Relationships: the links between the entities. Every relationship has a degree and a
minimum and maximum cardinality.
o The degree of the relationship indicates how many entity types participate in the
relationship. The degree may be one, two or more. In this syllabus only
relationships with degree 2 are considered, that is, relationships between two
entity types.
o The cardinality of the relationship indicates to how many occurrences of the other
entity one occurrence of an entity can be connected at least (minimum cardinality)
and at most (maximum cardinality), and vice versa.
In a UML class diagram the relationships are called associations and are drawn as a
relationship line between the classes. The relationship name stands in the middle of the
relationship line. At each end of the relationship line, next to the entities, are the
minimum and maximum cardinalities of the relationship, separated by two dots. The
minimum cardinality is located to the left of the two dots, the maximum cardinality to the
right of the two dots.
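The 'minimum..maximum' notation described above can be read mechanically. The following is a minimal Python sketch, not part of the syllabus: the function name `parse_multiplicity` is a hypothetical helper, a maximum of `None` is an assumed way to represent '*', and the shorthands follow common UML usage ('1' for '1..1', '*' for '0..*').

```python
from typing import Optional, Tuple


def parse_multiplicity(text: str) -> Tuple[int, Optional[int]]:
    """Return (minimum, maximum) for a multiplicity such as '0..1' or '1..*'.

    A maximum of None stands for '*' (many).
    """
    text = text.replace(" ", "")
    if ".." in text:
        low, high = text.split("..")
    elif text == "*":
        low, high = "0", "*"       # UML shorthand: '*' means '0..*'
    else:
        low, high = text, text     # UML shorthand: '1' means '1..1'
    minimum = int(low)
    maximum = None if high == "*" else int(high)
    if maximum is not None and maximum < minimum:
        raise ValueError(f"maximum below minimum in {text!r}")
    return minimum, maximum


print(parse_multiplicity("0..1"))    # (0, 1)
print(parse_multiplicity("1 .. *"))  # (1, None)
```

The left element of the returned pair is the minimum cardinality and the right element the maximum cardinality, mirroring their positions around the two dots.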
There is a link between entities and relationships in an ERD on the one hand, and sets and
relationships in mathematics on the other hand:
 An entity corresponds to a set, an entity occurrence to an element of the set.
 A relationship with degree 2 corresponds to a relationship between two sets.
There are three kinds of data models, which evolve from more abstract to more concrete and
detailed (Figure 2-2):

Conceptual data model
    ↓ define volumes; select DBMS type (e.g. RDBMS)
Logical data model
    ↓ select package to implement (e.g. MS Access)
Physical data model

Figure 2-2: The three data models

 The conceptual data model:
o Business data, independent of the DBMS type
o Entities with their attributes
o Relationships
o Entity Relationship Diagram (ERD)
o Primary keys are identifiers; there are no foreign keys.
o Tables do not exist at this level.
 The logical data model (design model of the data):
o This is designed taking into account volumes and after selecting a particular
database management system type, e.g. an RDBMS. The further development
depends on the selected DBMS type.
o When an RDBMS is selected, entities are converted to tables and relationships are
represented by corresponding values of the primary key in a parent table and the
foreign key in a child table.
 The physical data model (implementation model of the data):
o This is designed after selecting a specific software package to implement the DBMS,
e.g. MS Access.
o In MS Access all fields, data types, field properties, other properties and
relationships are defined.

2.2 The conceptual data model


The conceptual data model consists of entities and relationships.
In a UML class diagram of a relationship with degree 2, two symbols separated by two dots, such
as 1..*, at either end of the relationship line designate the minimum and maximum cardinality
of the two entities or classes in the relationship. An important and potentially confusing point
is that these symbols describe the class next to which they are not written, that is, the class at
the other end of the relationship line: the symbols next to entity B state to how many
occurrences of B each occurrence of entity A may be linked.

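[Diagram, left: set A (A1 to A3) with arrows to set B (B1 to B7). Right: class diagram with
Entity A and Entity B (each with attributes and methods), joined by a relationship line marked
0..1 next to Entity A and 1..* next to Entity B.]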

Figure 2-3: A 'one to many' relationship and its cardinalities: sets and class diagram

 The maximum cardinality is shown to the right of the two dots and usually has the value
1 or *.
The maximum cardinality on the right hand side of the relationship line indicates whether
each object of the entity on the left hand side may be related to at most one (1) or to
more than one (*) object on the right hand side. In other words: for each object on the
left hand side of the relationship line, may there be more than 1 object on the right hand
side? Yes (*) or no (1). [6]
The maximum cardinality on the left hand side of the relationship line indicates whether
each object of the entity on the right hand side may be related to at most one (1) or to
more than one (*) object on the left hand side. In other words: for each object on the
right hand side of the relationship line, may there be more than 1 object on the left hand
side? Yes (*) or no (1).
 The minimum cardinality is shown to the left of the two dots and usually has the value 0
or 1.
The minimum cardinality on the right hand side of the relationship line indicates whether
each object of the entity on the left hand side may be related to none (0) or must be
related to at least one (1) object on the right hand side. In other words: for each object
on the left hand side of the relationship line, must there be an object on the right hand
side? Yes (1) or no (0). [7]
The minimum cardinality on the left hand side of the relationship line indicates whether
each object of the entity on the right hand side may be related to none (0) or must be
related to at least one (1) object on the left hand side. In other words: for each object
on the right hand side of the relationship line, must there be an object on the left hand
side? Yes (1) or no (0). [8]
[6] You may enter a numeric value greater than 1 instead of * for the maximum cardinality. There is no
difference as far as the theory is concerned.
[7] You may enter a numeric value greater than 1 instead of 1 for the minimum cardinality. There is no
difference as far as the theory is concerned.
[8] You may enter a numeric value greater than 1 instead of 1 for the minimum cardinality. There is no
difference as far as the theory is concerned.

By combining the possible values for the maximum cardinality at both sides (the two symbols to
the right of the two dots), there are three cases (Table 2-1):
Table 2-1: The three possible combinations for the maximum cardinality

maximum cardinality            left        right
one to one (Figure 2-4)        ..1         ..1
one to many (Figure 2-5)       ..1         ..*
(or many to one)               (or ..*)    (or ..1)
many to many (Figure 2-6)      ..*         ..*

By combining the possible values for the minimum cardinality at both sides (the two symbols to
the left of the two dots), there are three cases (Table 2-2):
Table 2-2: The three possible combinations for the minimum cardinality

minimum cardinality            left        right
zero to zero (Figure 2-6)      0..         0..
zero to one (Figure 2-4)       0..         1..
(or one to zero)               (or 1..)    (or 0..)
one to one (Figure 2-5)        1..         1..

2.2.1 The maximum cardinality of a relationship


The relationship type is usually defined by its maximum cardinality.
 In a one to one relationship between entities A and B each occurrence of entity A may be
linked to at most 1 occurrence of entity B, and each occurrence of entity B may be linked
to at most 1 occurrence of entity A.
This means that at most 1 arrow may depart from each element of set A and at most 1
arrow may arrive at each element of set B (Figure 2-4, left side).
In the class diagram the maximum cardinality of the one to one relationship is designated
by the number 1 next to entities A and B, to the right of the two dots (Figure 2-4, right
side: 1..1 and 0..1).
[Diagram, left: set A (A1 to A6) with arrows to set B (B1 to B4). Right: class diagram with
Entity A and Entity B (each with attributes and methods), joined by a relationship line marked
1..1 (or 1) next to Entity A and 0..1 next to Entity B.]

Figure 2-4: The 'one to one' relationship: sets and class diagram

 In a one to many relationship between entities A and B each occurrence of entity A may
be linked to more than 1 occurrence of entity B, and each occurrence of entity B may be
linked to at most 1 occurrence of entity A. [9]
This means that more than 1 arrow may depart from each element of set A and at most 1
arrow may arrive at each element of set B (Figure 2-5, left side).
In the class diagram the maximum cardinality of the one to many relationship is
designated by the number 1 next to entity A and the asterisk * next to entity B, to the
right of the two dots (Figure 2-5, right side: 1..1 and 1..*).

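[Diagram, left: set A (A1 to A3) with arrows to set B (B1 to B6). Right: class diagram with
Entity A and Entity B (each with attributes and methods), joined by a relationship line marked
1..1 (or 1) next to Entity A and 1..* next to Entity B.]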

Figure 2-5: The 'one to many' relationship: sets and class diagram

 In a many to many relationship between entities A and B each occurrence of entity A may
be linked to more than 1 occurrence of entity B, and each occurrence of entity B may be
linked to more than 1 occurrence of entity A.
This means that more than 1 arrow may depart from each element of set A and more than
1 arrow may arrive at each element of set B (Figure 2-6, left side).
In the class diagram the maximum cardinality of the many to many relationship is
designated by the asterisk * next to entities A and B, to the right of the two dots
(Figure 2-6, right side: 0..* and 0..*).

[Diagram, left: set A (A1 to A7) with arrows to set B (B1 to B7). Right: class diagram with
Entity A and Entity B (each with attributes and methods), joined by a relationship line marked
0..* next to Entity A and 0..* next to Entity B.]

Figure 2-6: The 'many to many' relationship: sets and class diagram

[9] The same applies to a relationship with maximum cardinality many to one between the entities A and B,
where the roles of A and B are switched.
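The arrow counting used in this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `max_cardinality_type` and the sample pairs are hypothetical, and the function classifies the links that are actually present, whereas the maximum cardinality in a data model is a design rule about which links are allowed.

```python
from collections import Counter


def max_cardinality_type(pairs):
    """Classify a binary relationship given as a collection of (a, b) pairs."""
    pairs = set(pairs)
    arrows_from_a = Counter(a for a, _ in pairs)  # arrows departing from each a
    arrows_into_b = Counter(b for _, b in pairs)  # arrows arriving at each b
    a_many = any(n > 1 for n in arrows_from_a.values())  # some a has several b's
    b_many = any(n > 1 for n in arrows_into_b.values())  # some b has several a's
    if a_many and b_many:
        return "many to many"
    if a_many:
        return "one to many"
    if b_many:
        return "many to one"
    return "one to one"


# Illustrative links: A1 and A3 are each linked to several elements of B,
# while every element of B is linked to exactly one element of A.
sample = [("A1", "B1"), ("A1", "B2"), ("A1", "B3"),
          ("A2", "B4"), ("A3", "B5"), ("A3", "B6")]
print(max_cardinality_type(sample))  # one to many
```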

2.2.2 The minimum cardinality of a relationship


The minimum cardinality is less important for naming a relationship type; it determines whether
participation in the relationship is optional or mandatory.
 In a relationship between entities A and B with minimum cardinality zero to zero each
occurrence of entity A may or may not be linked to an occurrence of entity B, and each
occurrence of entity B may or may not be linked to an occurrence of entity A.
This means that there may be elements of set A that do not participate in the relationship
(no arrow departs from them) and that there may be elements of set B that do not
participate in the relationship (no arrow arrives at them) (Figure 2-6, left side).
Both entities A and B have minimum cardinality 0 and have optional participation in the
relationship.
In the class diagram of the relationship the minimum cardinality zero to zero is
designated by the number 0 next to entities A and B, to the left of the two dots (Figure
2-6, right side: 0..* and 0..*).
 In a relationship between entities A and B with minimum cardinality one to zero each
occurrence of entity A may or may not be linked to an occurrence of entity B, and each
occurrence of entity B must be linked to an occurrence of entity A.
This means that there may be elements of set A that do not participate in the relationship
(no arrow departs from them) and that all elements of set B must participate in the
relationship (an arrow arrives at each element) (Figure 2-4, left side).
Entity A with minimum cardinality 0 has optional participation in the relationship. Entity
B with minimum cardinality 1 has mandatory participation in the relationship.
In the class diagram of the relationship the minimum cardinality one to zero is
designated by the number 1 next to entity A and the number 0 next to entity B, to the
left of the two dots (Figure 2-4, right side: 1..1 and 0..1).
 In a relationship between entities A and B with minimum cardinality one to one each
occurrence of entity A must be linked to an occurrence of entity B, and each occurrence of
entity B must be linked to an occurrence of entity A.
This means that every element of set A must participate in the relationship (an arrow
departs from each element) and that every element of set B must participate in the
relationship (an arrow arrives at each element) (Figure 2-5, left side).
Both entities A and B have minimum cardinality 1 and have mandatory participation in
the relationship.
In the class diagram of the relationship the minimum cardinality one to one is designated
by the number 1 next to entities A and B, to the left of the two dots (Figure 2-5, right
side: 1..1 and 1..*).
The word participation is thus equivalent to minimum cardinality, where optional participation
stands for a minimum cardinality with value 0 and mandatory participation stands for a
minimum cardinality with value 1.
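The link between participation and minimum cardinality can be illustrated with a short Python sketch. The function name `participation` and the sample data are hypothetical, and the function only inspects a given set of links; in a real data model, optional or mandatory participation is a rule that the designer decides.

```python
def participation(elements_a, elements_b, pairs):
    """Return the participation ('optional' or 'mandatory') of A and of B."""
    linked_a = {a for a, _ in pairs}  # elements of A from which an arrow departs
    linked_b = {b for _, b in pairs}  # elements of B at which an arrow arrives
    min_a = 1 if set(elements_a) <= linked_a else 0  # 1 only if every a is linked
    min_b = 1 if set(elements_b) <= linked_b else 0  # 1 only if every b is linked
    label = {0: "optional", 1: "mandatory"}
    return label[min_a], label[min_b]


# Illustrative data: A5 and A6 have no partner in B, every element of B has one in A.
set_a = ["A1", "A2", "A3", "A4", "A5", "A6"]
set_b = ["B1", "B2", "B3", "B4"]
links = [("A1", "B1"), ("A2", "B2"), ("A3", "B3"), ("A4", "B4")]
print(participation(set_a, set_b, links))  # ('optional', 'mandatory')
```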
2.2.3 Types of binary relationships
In the name of a relationship usually only the maximum cardinality is mentioned: one to one, one
to many (or many to one), and many to many.
Combining minimum and maximum cardinalities, you can create up to 10 different types of
relationships.
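The text does not derive the number 10. One way to arrive at it, offered here as an assumption rather than as the syllabus' own reasoning, is to allow four multiplicities at each end of the relationship line (0..1, 1..1, 0..* and 1..*) and to count a relationship and its mirror image (for example one to many and many to one) as the same type. A minimal Python sketch of that count:

```python
from itertools import combinations_with_replacement

# The four usual multiplicities for one end of a binary relationship.
ENDS = ["0..1", "1..1", "0..*", "1..*"]

# A relationship type is an unordered pair of ends: swapping the left and
# right side of the relationship line does not produce a new type.
relationship_types = list(combinations_with_replacement(ENDS, 2))

for left, right in relationship_types:
    print(f"{left}  --  {right}")

print(len(relationship_types))  # 10
```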
Figure 2-4 shows a one to one relationship, sometimes more extensively called one to one
(maximum cardinality), one to zero (minimum cardinality), or also one to one, mandatory to
optional:
 Entity A has minimum cardinality zero and maximum cardinality one. The symbols 0..1
next to entity B in the class diagram show this. Each entity occurrence in A has 0 or 1
related entity occurrences in B.
 Entity B has minimum cardinality one and maximum cardinality one. The symbols 1..1
next to entity A in the class diagram show this. Each entity occurrence in B has exactly 1
related entity occurrence in A.
Figure 2-5 shows a one to many relationship, sometimes more extensively called one to many
(maximum cardinality), one to one (minimum cardinality), or also one to many, mandatory to
mandatory:
 Entity A has minimum cardinality one and maximum cardinality many. The symbols 1..*
next to entity B in the class diagram show this. Each entity occurrence in A thus has 1 or
more related entity occurrences in B.
 Entity B has minimum cardinality one and maximum cardinality one. The symbols 1..1
next to entity A in the class diagram show this. Each entity occurrence in B has exactly 1
related entity occurrence in A.
Figure 2-6 shows a many to many relationship, sometimes more extensively called many to
many (maximum cardinality), zero to zero (minimum cardinality), or also many to many,
optional to optional:
 Entity A has minimum cardinality zero and maximum cardinality many. The symbols 0..*
next to entity B in the class diagram show this. Each entity occurrence in A thus has 0, 1
or more related entity occurrences in B.
 Entity B has minimum cardinality zero and maximum cardinality many. The symbols 0..*
next to entity A in the class diagram show this. Each entity occurrence in B thus has 0, 1
or more related entity occurrences in A.
2.3 The logical data model
Before you transform the conceptual data model into a logical data model you need to estimate
volumes (especially numbers of entity occurrences) and to select a DBMS type. In this syllabus, a
Relational Database Management System (RDBMS) is selected.
An RDBMS uses tables with the properties defined in Chapter 1. You have to convert entities
from the conceptual data model into tables and represent the relationships between two tables
by fields with common values in both tables.
2.3.1 Converting entities into tables
In most cases it is possible to convert an entity into a table and an attribute into a column. The
entity occurrences then become rows.
However, you must check whether the attributes are composite (composed of different parts) or
multivalued (they may contain more than one value):
 Composite attributes such as name and address should be split up into single-column
attributes such as last name and first name, or street, postal code and municipality.
 Multivalued attributes should be replaced by single-valued attributes. Usually you need an
additional table that contains a single value in each cell, in addition to a foreign key that
links each row of this new table to a row of the original table, based upon matches
between the foreign key value in the new table and the primary key value in the
original table.
So you get two tables. The original table (without the deleted columns with the
multivalued attributes) is the parent table (parent). The new table with one row for
each value of the formerly multivalued attribute is the child table (child).
Example: The multivalued attribute email in the conceptual data model may have
multiple values (email addresses) for each entity occurrence. Remove the multivalued
column email from the parent table representing the entity in the conceptual data model,
and create a new child table with a column email holding only one email address in every
row, and a second column, the foreign key, holding the primary key value of the parent
table (Figure 2-7).
Politicians (entity)
politnr  name             email
1        Donald Trump     donald@trump.com; donald.trump@president.us
2        Angela Merkel    angela@cdu.de; angela.merkel@bundeskanzler.de
3        Emmanuel Macron  emmanuel@republque-en-marche.fr; emmanuel.macron@president.fr

Politicians (table)
politnr  lastname  firstname
P1       Trump     Donald
P2       Merkel    Angela
P3       Macron    Emmanuel

Emailaddresses (table)
emailaddressnr  politnr  email
E1              P1       donald@trump.com
E2              P1       donald.trump@president.us
E3              P2       angela@cdu.de
E4              P2       angela.merkel@bundeskanzler.de
E5              P3       emmanuel@republque-en-marche.fr
E6              P3       emmanuel.macron@president.fr

Figure 2-7: Entity occurrences of an entity with a composite and a multivalued attribute, and the corresponding tables
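The conversion shown in Figure 2-7 can be tried out with a small script. The sketch below is an illustration, not the syllabus' method: it uses Python's built-in sqlite3 module as a stand-in for an RDBMS (the syllabus itself works with MS Access), and splitting the name on its first space is a naive assumption that only suits this sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Politicians (
    politnr   TEXT PRIMARY KEY,
    lastname  TEXT NOT NULL,
    firstname TEXT NOT NULL
);
CREATE TABLE Emailaddresses (
    emailaddressnr TEXT PRIMARY KEY,
    politnr        TEXT NOT NULL REFERENCES Politicians(politnr),
    email          TEXT NOT NULL
);
""")

# Entity occurrences with a composite name and a multivalued email attribute.
entity_rows = [
    (1, "Donald Trump", "donald@trump.com; donald.trump@president.us"),
    (2, "Angela Merkel", "angela@cdu.de; angela.merkel@bundeskanzler.de"),
]

email_nr = 0
for nr, name, emails in entity_rows:
    firstname, lastname = name.split(" ", 1)  # split the composite attribute
    conn.execute("INSERT INTO Politicians VALUES (?, ?, ?)",
                 (f"P{nr}", lastname, firstname))
    for email in emails.split(";"):           # one child row per single value
        email_nr += 1
        conn.execute("INSERT INTO Emailaddresses VALUES (?, ?, ?)",
                     (f"E{email_nr}", f"P{nr}", email.strip()))

# Join parent and child on matching primary and foreign key values.
rows = conn.execute(
    "SELECT p.lastname, e.email FROM Politicians p "
    "JOIN Emailaddresses e ON e.politnr = p.politnr "
    "ORDER BY e.emailaddressnr").fetchall()
for row in rows:
    print(row)
```

Each row of the child table holds exactly one email address, and the join recovers all addresses of a politician through the matching politnr values.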

In the new tables there are no composite or multivalued columns. Multivalued columns are not
allowed in a table. Composite columns are allowed, but you should avoid them if you want to use
the different parts separately. Only date and time columns, each composed of three parts, need
not be split, because every physical database system has functions to split them.
2.3.2 Representing relationships by primary and foreign keys with matching values
You can convert a relationship between two entities into a relationship between two tables,
where both tables share a column with the same values. The same method was already applied
when eliminating multivalued attributes.
You usually must add a foreign key column (or columns) to one of the two tables. The foreign key
values must exist in the primary key of the related table. The table containing the foreign key is
called the child table. The other table, containing the primary key with matching values, is called
the parent table. In Figure 2-7 Politicians is the parent table and Emailaddresses is the child
table.
However, this is only possible if each row in the child table is related to one single row in the
parent table, or to no row at all. If a row in the child table were related to multiple rows in
the parent table, the foreign key would have to match more than one row in the parent table. This
can only be achieved by providing multiple values for the foreign key, and in a relational table
multivalued cells are not allowed! This has important consequences!
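The rule that foreign key values must exist in the primary key of the parent table can be demonstrated with Python's built-in sqlite3 module. This is a minimal sketch under assumptions: SQLite stands in for the RDBMS, the table names parent and child are hypothetical, and a NULL foreign key is used to represent a child row that is related to no parent row at all.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.execute("CREATE TABLE parent (pk TEXT PRIMARY KEY)")
conn.execute("""
    CREATE TABLE child (
        pk TEXT PRIMARY KEY,
        fk TEXT REFERENCES parent(pk)  -- single-valued; NULL means 'no parent row'
    )""")

conn.executemany("INSERT INTO parent VALUES (?)", [("A1",), ("A2",)])
conn.execute("INSERT INTO child VALUES ('B1', 'A1')")  # matches parent row A1
conn.execute("INSERT INTO child VALUES ('B2', NULL)")  # related to no row at all

try:
    conn.execute("INSERT INTO child VALUES ('B3', 'A9')")  # no such parent row
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(rejected)  # True
```

Because the fk column holds one value per row, each child row can point to at most one parent row, which is exactly the restriction described above.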
Depending on the relationship type (maximum cardinality) there are three cases:
 In a one to many relationship the foreign key must always reside in the table at the
many side of the relationship line, for each entity occurrence (row) in this child table has
at most one entity occurrence (row) in the parent table at the one side of the relationship
line with a matching primary key value (Figure 2-8), which guarantees a single-valued
foreign key.
Placing the foreign key in the table at the one side of the relationship line would cause a
multivalued foreign key, and that is not allowed! So, the foreign key must always reside in
the table at the many side of the relationship line.

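[Diagram, top left: set A (A1 to A3) and set B, whose elements are shown as the pairs (B1,A1),
(B2,A1), (B3,A1), (B4,A2), (B5,A3), (B6,A3). Top right (conceptual): class diagram with Entity A
and Entity B, marked 1..1 (or 1) next to Entity A and 1..* next to Entity B. Bottom right
(logical): Table A (parent, with pk) and Table B (child, with pk and fk), marked 1..1 (or 1)
next to Table A and 1..* next to Table B.]

Table A (parent)
pk   …
A1   …
A2   …
A3   …

Table B (child)
pk   fk   …
B1   A1   …
B2   A1   …
B3   A1   …
B4   A2   …
B5   A3   …
B6   A3   …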

Figure 2-8: A 'one to many' relationship: sets, class diagrm and tables

So, a row in the parent table may be related to many rows in the child table, all having the
same foreign key value matching the primary key value of the parent table. But each row
in the child table may be related to at most one row in the parent table.
 In a one to one relationship you are free to choose where the foreign key resides, for
each entity occurrence (row) in both tables has at most one entity occurrence (row) in the
other table with a matching primary key value, which guarantees a single-valued foreign
key.
So, a row in the parent table may be related to at most one row in the child table, having a
foreign key value matching the primary key value of the parent table. And each row in the
child table may also be related to at most one row in the parent table.
o In practice the minimum cardinality will often decide in which table the foreign
key resides. In a one to one, one to zero relationship, the foreign key resides in
the table at the zero side (minimum cardinality) of the relationship line
(Figure 2-9).
o More generally, the foreign key usually resides in the table with the smallest
number of rows.
o In a one to one, one to one relationship both tables always have an equal
number of rows, for each row in one table is always related to exactly one row in
the other table. Then it does not matter which table is the parent table and which
table is the child table with the foreign key. In most cases both entities are
connected so strongly that they usually are joined together in one single table.
[Diagram, top left: set A (A1 to A6) and set B, whose elements are shown as the pairs (B1,A1),
(B2,A2), (B3,A3), (B4,A4). Top right (conceptual): class diagram with Entity A and Entity B,
marked 1..1 (or 1) next to Entity A and 0..1 next to Entity B. Bottom right (logical): Table A
(parent, with pk) and Table B (child, with pk and fk), marked 1..1 (or 1) next to Table A and
0..1 next to Table B.]

Table A (parent)
pk   …
A1   …
A2   …
A3   …
A4   …
A5   …
A6   …

Table B (child)
pk   fk   …
B1   A1   …
B2   A2   …
B3   A3   …
B4   A4   …

Figure 2-9: A 'one to one' relationship: sets, class diagram and tables

 In a many to many relationship the foreign key cannot reside in either one of the
tables that represent the original entities, for this would cause a multivalued foreign key,
and that is not allowed! So, the foreign key cannot reside in a table at either side of the
relationship line!

[Diagram, top (conceptual): set A (A1 to A7) with arrows to set B (B1 to B7); class diagram with
Entity A and Entity B, marked 0..* at both ends of the relationship line. Middle (logical): a
third set AB between A and B, containing the pairs (A1,B1), (A2,B1), (A2,B2), (A2,B3), (A3,B4),
(A4,B4), (A5,B4), (A6,B5), (A6,B6); class diagram with Table A (parent, with pk), Table AB
(2*child, with (pk), fk1 and fk2) and Table B (parent, with pk), where the line from Table A to
Table AB is marked 1 and 0..*, and the line from Table AB to Table B is marked 0..* and 1.]

Table A (parent)
pk
A1
A2
A3
A4
A5
A6
A7

Table AB (child)
fk   fk
A1   B1
A2   B1
A2   B2
A2   B3
A3   B4
A4   B4
A5   B4
A6   B5
A6   B6

Table B (parent)
pk
B1
B2
B3
B4
B5
B6
B7

Figure 2-10: A 'many to many' relationship split up into two 'one to many' relationships: sets, class diagram and tables

Hence, a many to many relationship can never appear in a logical data model. However,
you can represent a many to many relationship indirectly by inserting an extra so-called
intersection table (Figure 2-10).
In the scheme with the sets A and B, insert a third set AB between the two original sets A
and B. Each original arrow is now split up into two arrows, with a 'halfway stop' in the new
set AB (Figure 2-10, middle part, left side).
In this way the original many to many relationship between A and B is split up into two
new relationships:
o A one to many relationship between A and AB: from each element of A many
arrows may depart, and in each element of AB exactly one arrow arrives.
o A many to one relationship between AB and B: from each element of AB exactly one
arrow departs, and in each element of B many arrows may arrive.
So, the new table AB is the child table in two one to many relationships, one with table A
and one with table B. It has two columns, each of which is a foreign key with values matching
the primary key values in A or B, and in this way defines the relationship.
This implies that in a logical data model using an RDBMS only one to one and one to many
relationships remain, while all many to many relationships are split up into two one to many
relationships by adding an intersection table.
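As an illustration, the pairs of Figure 2-10 can be stored in an intersection table and queried from both sides. This is a minimal sketch that assumes SQLite (via Python's sqlite3) instead of Access; the column names fk_a and fk_b are hypothetical, and the composite primary key is one of the two options mentioned for intersection tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE a (pk TEXT PRIMARY KEY);
    CREATE TABLE b (pk TEXT PRIMARY KEY);
    CREATE TABLE ab (                        -- intersection table, child of a and b
        fk_a TEXT NOT NULL REFERENCES a(pk),
        fk_b TEXT NOT NULL REFERENCES b(pk),
        PRIMARY KEY (fk_a, fk_b)             -- each pair may occur only once
    );
""")

# The pairs listed in Figure 2-10.
pairs = [("A1", "B1"), ("A2", "B1"), ("A2", "B2"), ("A2", "B3"), ("A3", "B4"),
         ("A4", "B4"), ("A5", "B4"), ("A6", "B5"), ("A6", "B6")]
conn.executemany("INSERT INTO a VALUES (?)", [(f"A{i}",) for i in range(1, 8)])
conn.executemany("INSERT INTO b VALUES (?)", [(f"B{i}",) for i in range(1, 8)])
conn.executemany("INSERT INTO ab VALUES (?, ?)", pairs)

# One element of A is related to many elements of B, and the other way round.
b_for_a2 = [row[0] for row in conn.execute(
    "SELECT fk_b FROM ab WHERE fk_a = 'A2' ORDER BY fk_b")]
a_for_b4 = [row[0] for row in conn.execute(
    "SELECT fk_a FROM ab WHERE fk_b = 'B4' ORDER BY fk_a")]
```

Each column of ab holds a single value per row, so neither foreign key is multivalued.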
The foreign key defines the relationship. It must obey the referential integrity rule:
• The foreign key in the child table may only have values matching the primary key values
in the relationship's parent table.
• The foreign key in the child table may also be empty (have no value at all), but only if the
minimum cardinality of the child table is zero.
This last property enforces the minimum cardinality of the child table:
• If the minimum cardinality of the child table equals 1, then the foreign key must have a
value.
• If the minimum cardinality of the child table equals 0, then the foreign key may or may
not have a value.
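The referential integrity rule and the two minimum cardinalities can be checked with a small experiment. This is a sketch under assumptions: SQLite via Python's sqlite3 stands in for the RDBMS, and the table names child_min0 and child_min1 and the helper accepted are hypothetical. A nullable foreign key models minimum cardinality 0, a NOT NULL foreign key models minimum cardinality 1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE parent (pk INTEGER PRIMARY KEY);
    CREATE TABLE child_min0 (                      -- foreign key may be empty
        pk INTEGER PRIMARY KEY,
        fk INTEGER REFERENCES parent(pk)
    );
    CREATE TABLE child_min1 (                      -- foreign key must have a value
        pk INTEGER PRIMARY KEY,
        fk INTEGER NOT NULL REFERENCES parent(pk)
    );
    INSERT INTO parent VALUES (1);
""")

def accepted(table, fk):
    """Try to insert a child row with the given foreign key value."""
    try:
        conn.execute(f"INSERT INTO {table} (fk) VALUES (?)", (fk,))
        return True
    except sqlite3.IntegrityError:
        return False

results = {
    "min0, matching value": accepted("child_min0", 1),
    "min0, unknown value": accepted("child_min0", 99),
    "min0, empty": accepted("child_min0", None),
    "min1, matching value": accepted("child_min1", 1),
    "min1, empty": accepted("child_min1", None),
}
```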

Note: There is no easy way to enforce the minimum cardinality of the parent table. Usually some
program code has to be written. That is far beyond the scope of this course.

2.3.3 Domains
In the logical data model the domain, which is the set of all possible values, must be defined
for each column. A domain is somewhat similar to a data type, such as text, integer, real
number, date, time, currency, etc., but a domain is more general and more detailed at the same
time.
Here are three examples of domains:
• A country code consists of three capital letters and belongs to an official list of
abbreviations such as BEL, NLD, FRA…
• A price is expressed in euro and must be a non-negative number with two decimals.
• The marks on an exam are integers between 0 and 20. But codes like NP (not present)
or E (exemption) are also allowed.
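To make the difference between a domain and a data type concrete, the three example domains can be written as membership tests. This is only a sketch in Python: the function names are hypothetical, the set of country codes is a three-item excerpt rather than the official list, and "two decimals" is read as "no more than two decimals".

```python
import re
from decimal import Decimal

# Excerpt only; the official list of country codes is much longer.
VALID_COUNTRY_CODES = {"BEL", "NLD", "FRA"}

def is_country_code(value):
    """Three capital letters that also occur in the list of valid codes."""
    return bool(re.fullmatch(r"[A-Z]{3}", value)) and value in VALID_COUNTRY_CODES

def is_price(value):
    """A non-negative Decimal amount with no more than two decimals."""
    return value >= 0 and value == value.quantize(Decimal("0.01"))

def is_mark(value):
    """An integer between 0 and 20, or one of the codes NP and E."""
    if value in ("NP", "E"):
        return True
    return isinstance(value, int) and not isinstance(value, bool) and 0 <= value <= 20
```

The marks domain mixes numbers and text, which a single data type such as integer or text cannot express on its own.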
2.4 The physical data model
After designing a logical data model for an RDBMS you must select the package to implement this
data model. In this text MS Access is selected.
Then you have to convert the logical data model (relational design model of the data) into a
physical data model (implementation model of the data).
Each package has its own way of doing so, but it usually doesn't cause too many
problems.
The physical data model strongly depends on the selected package for the implementation, in
this case Access.
In Access you design the physical data model in the Table Design View (field names, data types, all
field properties, primary key, indexes, table properties, table name; see text about Access) and you
define the relationships, the foreign keys and referential integrity in the Relationships tab
and the Edit Relationships dialog box (see text about Access).
When you have a composite primary key in the logical data model, in most cases it is better to
replace it with a surrogate key, for which Access automatically assigns primary key values that
cannot be changed. To achieve this Access uses the AutoNumber data type. Only in intersection
tables that implicitly define a many to many relationship could you keep the composite primary
key that is the combination of the two foreign keys.
2.4.1 Implementing domains in Access
Converting domains from the logical data model to the physical data model may be complex. Many
systems, such as Access, don't support domains.
In Access a domain is implemented by a data type (e.g. text, dates, times), often
combined with the field properties field size, format and validation rule.
But sometimes the implementation is more complicated or even not possible. Three examples
show this.
The domain country code can be implemented by the data type Short
Text and the field size 3. When you want to limit the valid values further, you may choose one of
the following solutions:
• Create a lookup table containing all valid values and look up the country code in this list
using the Lookup Wizard.
• Design a validation rule like In ("BEL"; "NLD"; "FRA"; …). In practice there are too many
country codes to do so. When you want to limit the country codes to those of the
European Union, you could try this solution, but even then the list is long.
• Do not apply extra restrictions, but set the format to > to convert all country codes
to uppercase. Doing so you do not really implement a domain.
The domain price can be implemented by the data type Currency, the format
Euro and the decimal places 2. You may still enter more than two decimals; these won't be
displayed, but they will be stored!
The domain marks cannot be implemented in Access: Access cannot put numbers and
text in one data type. There are two cumbersome approximations:
• Design two columns: marks with data type Number, field size Byte (or Integer) and
validation rule Between 0 And 20, and code with data type Short Text, field size 2 and
validation rule "NP" Or "E" Or Is Null. In queries for statistical purposes a criterion code:
Is Not Null must be added.
• Use one single column marks with data type Number, field size Byte (or Integer) and
validation rule Between -2 And 20, where the code -2 means "NP" and -1 means "E".
Now the user must know these codes -1 and -2 and their meaning, and in queries for
statistical purposes a criterion marks: >=0 must be added.
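The effect of forgetting the criterion marks: >=0 in the single-column approach is easy to show. This is a sketch only: SQLite (via Python's sqlite3) replaces Access, a CHECK constraint plays the role of the validation rule, and the table name result and the four sample rows are made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE result (
        student TEXT PRIMARY KEY,
        -- counterpart of the validation rule Between -2 And 20
        marks INTEGER NOT NULL CHECK (marks BETWEEN -2 AND 20)
    );
    -- -2 stands for NP (not present), -1 for E (exemption)
    INSERT INTO result VALUES ('s1', 12), ('s2', 16), ('s3', -2), ('s4', -1);
""")

# Without the criterion the codes are averaged as if they were marks.
naive_avg = conn.execute("SELECT AVG(marks) FROM result").fetchone()[0]

# With the criterion marks >= 0 only real marks take part in the statistic.
real_avg = conn.execute(
    "SELECT AVG(marks) FROM result WHERE marks >= 0").fetchone()[0]
```

Here naive_avg is 6.25 while real_avg is 14.0, which is why every statistical query on such a column needs the extra criterion.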
2.4.2 Implementing relationships in Access
In the logical data model, there are two relationship types: one to many and one to one.
Connect the primary key of the parent table (the values of the index properties primary and
unique are both Yes; if the primary key consists of a single column, the value of the field property
indexed is Yes (No Duplicates)) with the foreign key in the child table.
• In the most common one to many relationship the foreign key may have the same value
many times. Enter No for the index property unique; if the foreign key consists of a single
column, enter Yes (Duplicates OK) or No for the field property indexed. Now Access will
automatically recognize the one to many relationship in the Edit Relationships dialog
box, and when you check Enforce referential integrity, the number 1 will appear next
to the primary key and the symbol ∞ next to the foreign key (Figure 2-11).
• In the rare one to one relationship the foreign key must have unique values. Enter Yes
for the index property unique; if the foreign key consists of a single column, enter Yes
(No Duplicates) for the field property indexed. Now Access will automatically recognize
the one to one relationship in the Edit Relationships dialog box, and when you check
Enforce referential integrity, the number 1 will appear next to the primary key and
next to the foreign key (Figure 2-12).

Note on the price domain in 2.4.1: With a macro you could actually round the entered value to two
decimals.

Figure 2-11: The 'one to many' relationship 'Prof Teaches Class' in the physical data model of Access

Figure 2-12: The 'one to one' relationship 'Prof has Office' in the physical data model of Access

To create the one to one relationship in the Relationships tab you must drag from the
parent table to the child table (from Prof to Office in Figure 2-12, top). Doing so is the
only way to tell Access in which table the foreign key resides.
Foreign keys must share their domain with the primary key in the parent table. In Access you
implement this as follows:
• The foreign key has the same data type and field size as the primary key of its parent table.
The validation rules should also be the same, but referential integrity automatically
assures this, so you may leave the foreign key's validation rule empty.
There is one exception to this:
• You must always be able to change the value of a foreign key. Hence in Access the data
type of a foreign key may never be AutoNumber. Replace it with the data type Number.
The field size must be the same as the field size of the primary key of its parent table,
usually Long Integer.
You can define the minimum cardinality of the child table in a relationship in the foreign key's
field property required:
• If the minimum cardinality is one, the value of the field property required is Yes (Figure
2-11).
• If the minimum cardinality is zero, the value of the field property required is No (Figure
2-12).
Like most RDBMS implementations, Access is not able to define the minimum cardinality for the
parent table in a relationship.
2.5 Example: Top-down design of a database for a university
As an example, create a top-down design of a database for a university by creating successive
conceptual, logical and physical data models.

Note on the dragging direction in 2.4.2: In a one to many relationship Access recognizes the
foreign key by looking at the value No in the index property unique, but in a one to one
relationship both related fields have the same value Yes in the index property unique, so Access
recognizes the parent and child table by the direction of the dragging operation (from parent to
child).

Note in advance: The model is deliberately limited, and therefore incomplete. It is a simplified
variation of Figure 2-1 with an extension to illustrate a one to one relationship. The number of
attributes is limited. The methods are omitted.
2.5.1 The conceptual data model for a university
The conceptual data model consists of 5 entities and 5 relationships (Figure 2-13).
The 5 entities are:
• The entity Student contains data about students: studentnr, name, address, date of birth
and phone number.
• The entity Prof contains data about professors: name and email address.
• The entity Course contains data about courses: course number, course name and credits.
• The entity Class contains data about classes: term, weekday, begin hour, end hour, dates
(list of dates: multivalued attribute), classroom. An entity occurrence is a series of classes
that in a term take place every week on the same weekday and at the same time.
• The entity Office contains data about office rooms assigned to professors.
The 5 relationships are:
1. A Student studies a Course.
2. A Prof teaches a Course.
3. A Course consists of Classes.
4. A Prof teaches Classes.
5. A Prof has an Office.

Figure 2-13: The conceptual data model of a university

The cardinalities of the relationships are:


1. A student may study many courses, but may also study no course at all (0..* at the
right of the relationship line Student – Course). A course may be studied by many
students, but a course may also be studied by no student at all (0..* at the left of the
relationship line Student – Course). The cardinality of the relationship Student –
Course is many to many, zero to zero (or many to many with optional participation for
both entities).
2. A prof may teach many courses, but may also teach no course at all (0..* at the top of
the relationship line Prof – Course). A course may be taught by many profs, and must be
taught by at least one prof (1..* at the bottom of the relationship line Prof – Course). The
cardinality of the relationship Prof – Course is many to many, one to zero (or many to
many with optional participation for Prof and mandatory participation for Course).
3. A course may consist of many classes, and must consist of at least one class (1..* at the
right of the relationship line Course – Class). A class belongs to exactly one course (1 at
the left of the relationship line Course – Class). The cardinality of the relationship
Course – Class is one to many, one to one (or one to many with mandatory participation
for both entities).
4. A prof may teach many classes, but may also teach no class at all (0..* at the top of the
relationship line Prof – Class). A class must be taught by exactly one prof (1 at the left
of the relationship line Prof – Class). The cardinality of the relationship Prof – Class is
one to many, one to zero (or one to many with optional participation for Prof and
mandatory participation for Class).
5. A prof may have at most one office room, but may also have no office room at all (0..1
at the left of the relationship line Office – Prof). An office room may be used by at most
one prof, but also by no prof at all (0..1 at the right of the relationship line Office –
Prof). The cardinality of the relationship Office – Prof is one to one, zero to zero (or
one to one with optional participation for both entities).
Notes:
• Multivalued attributes are marked by [ ] behind the attribute name. In this data model
there is one multivalued attribute, date[ ], belonging to the entity Class.
• The class diagram displays most of the information described in the text above, but
sometimes the text may explain or expand the class diagram.
• The relationships and their cardinalities are mostly derived from so-called business rules.
They can (and will) differ from university to university. Examples:
o Because each course must consist of at least one class (with a fixed time in the
week), video classes offered exclusively at a distance are not included in this
model. Because room is an attribute of the entity Class, all classes on the same
weekday and at the same time must always take place in the same room.
o A class can belong to only one course and be taught by only one prof. In real life
courses may share classes, but this model does not allow that. Also, in real life
many profs may teach the same class together, but again this model does not
allow that.
o A prof may or may not have an office room and an office room may belong to only
one prof. A different business rule might state that each prof must have an office
room and that rooms may or must be shared by many profs. That would make
the relationship Office – Prof have a many to one, zero to one cardinality. Again
this model does not allow this.
2.5.2 The logical relational data model of a university
The DBMS type is Relational DBMS.
To convert the conceptual data model into a logical relational data model the following actions
must be executed:
• Create a table for each entity. Define a primary key by using an attribute with unique
values, by combining attributes with unique values, or by adding a surrogate key.
• Single attributes are converted into columns. Split composite attributes into multiple
columns.
• Create an extra child table for each multivalued attribute, with a column for the
multivalued attribute from the conceptual data model, which has become single-valued and
has a row for every value. This new child table has a (probably surrogate) primary key
and a foreign key with the primary key value of the original parent entity. Now there is a
one to many relationship with the original entity less the multivalued attribute as parent
table and the new table as child table.
• Define one to many relationships by adding a foreign key to the child table (at the many
side).
• Define one to one relationships by adding a foreign key to one of the tables, which in this
way becomes the child table (you are free to choose the child table containing the foreign
key, but the best choice usually is the table with the fewest rows).
• Split many to many relationships: Add an intersection table with the primary key columns
of both entities participating in the relationship. These columns become foreign keys in a
one to many relationship with the two original tables. These original tables are the
parent tables, and the intersection table is the child table. You can assign a surrogate key as
primary key for the intersection table, but sometimes the combination of both foreign
keys is suitable as a composite primary key.
• Write down for each relationship that all foreign keys must obey the referential integrity
rule: They may only assume values that exist in the primary key of the relationship's
parent table.
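The step for multivalued attributes in the list above is the least intuitive one, so here is a small sketch of it in Python. The function split_multivalued, the generated id column and the sample values (class number 101 and the three dates) are all hypothetical; the point is only that one entity occurrence with a list becomes one parent row plus one child row per list value.

```python
def split_multivalued(entity, key, attribute):
    """Move a multivalued attribute of one entity occurrence to child rows.

    Returns the parent row without the attribute, and a list of child rows
    that each carry a surrogate id, the parent's key value as foreign key,
    and one single value of the former multivalued attribute.
    """
    parent_row = {name: value for name, value in entity.items() if name != attribute}
    child_rows = [
        {"id": number, key: entity[key], attribute: value}
        for number, value in enumerate(entity[attribute], start=1)
    ]
    return parent_row, child_rows

# Hypothetical occurrence of the entity Class with its multivalued attribute.
class_entity = {
    "classnr": 101,
    "weekday": "MO",
    "date": ["2024-02-05", "2024-02-12", "2024-02-19"],
}
class_row, lecture_rows = split_multivalued(class_entity, "classnr", "date")
```

After the split, class_row no longer contains date, and each of the three rows in lecture_rows refers back to classnr 101.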
The results of the logical design may be represented in two ways. The first way is a class
diagram, where classes are represented by tables and the attributes by columns, which must be
single-valued:
• Display a primary key by underlining it or by adding (PK) to its name.
• Display a foreign key by using italics or by adding (FK) to its name.
Figure 2-14 displays the logical data model of the relational database University (the
numbers next to the relationships refer to those in Figure 2-13, where for the split
relationships an a or b is added to the number):
• The primary keys are underlined and the foreign keys are displayed in italics in the
parent table's text color.
• The composite attributes name and address in the Student table and name in the Prof
table are split into multiple columns.
• The multivalued attribute date is removed from the Class table, and moved to a new
Lecture table, which is the child table in a one to many relationship with the Class
table.
• The many to many relationship Student Studies Course is replaced by two one to
many relationships where the original Student and Course tables have become parent
tables and the new StudentStudiesCourse intersection table is the child table. The
intersection table has two foreign keys studentnr and coursenr, each containing primary
key values from their parent table. The combination of both foreign keys is the primary key.
(An alternative is adding a surrogate key.)
• The many to many relationship Prof Teaches Course is replaced by two one to many
relationships where the original Prof and Course tables have become parent tables and
the new ProfTeachesCourse intersection table is the child table. The intersection table
has two foreign keys profnr and coursenr, each containing primary key values from their
parent table. The combination of both foreign keys is the primary key. (An alternative is
adding a surrogate key.)
This yields 8 tables and 8 relationships.
• Two new tables are intersection tables, each splitting up a many to many relationship into
two one to many relationships.
• The third new table originates from the multivalued attribute date in the Class entity of
the conceptual data model. The list of dates of the Class entity is moved to the child table
Lecture.
• For the one to one relationship Prof has Office you may choose between adding a
foreign key profnr to the Office table and adding a foreign key room to the Prof table.
Assuming there are more profs without an office than offices without a prof, the better choice
is adding a foreign key profnr to the Office table, which is the smallest table (with
the fewest rows). The opposite choice is also acceptable.

Figure 2-14: The logical data model of a relational database for a university

A second way to display the logical data model is to make a list with table names, field names and
data types, where for each table you can indicate the primary and foreign keys and their
referential integrity rule (Table 2-3).
Table 2-3: The logical data model of the relational database 'University' as a list of tables

Student
    studentnr (N), lastname (T30), firstname (T30), street (T30), postalcode (T6),
    municipality (T30), date of birth (D), phonenr (T15)
    PK: studentnr
Prof
    profnr (N), lastname (T30), firstname (T30), email (T50)
    PK: profnr
Course
    coursenr (N), coursename (T50), credits (N)
    PK: coursenr
Class
    classnr (N), term (N), weekday (T2), begin hour (T), end hour (T), room (T6),
    coursenr (N), profnr (N)
    PK: classnr
    FK: coursenr; referential integrity: values exist in Course (coursenr)
    FK: profnr; referential integrity: values exist in Prof (profnr)
Office
    room (T6), phonenr (T15), profnr (N)
    PK: room
    FK: profnr; referential integrity: values exist in Prof (profnr)
Lecture
    lectureID (S), classnr (N), date (D)
    PK: lectureID
    FK: classnr; referential integrity: values exist in Class (classnr)
StudentStudiesCourse
    studentnr (N), coursenr (N)
    PK: (studentnr, coursenr)
    FK: studentnr; referential integrity: values exist in Student (studentnr)
    FK: coursenr; referential integrity: values exist in Course (coursenr)
ProfTeachesCourse
    profnr (N), coursenr (N)
    PK: (profnr, coursenr)
    FK: profnr; referential integrity: values exist in Prof (profnr)
    FK: coursenr; referential integrity: values exist in Course (coursenr)
Note:
• If you want to include a business rule that allows a class to take place in several rooms,
you could move room from the Class table to the Lecture table.
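The list of tables in Table 2-3 translates almost line by line into table definitions. The sketch below does this for SQLite through Python's sqlite3, which is an assumption made for the illustration (the text implements the model in Access). Other assumptions: N is mapped to INTEGER, T and D to TEXT, and S to an automatically numbered integer; spaces in field names are replaced by underscores; Class is given the profnr column that its FK line refers to; and a foreign key is declared NOT NULL where section 2.5.1 gives a minimum cardinality of one.

```python
import sqlite3

SCHEMA = """
CREATE TABLE Student (
    studentnr INTEGER PRIMARY KEY,
    lastname TEXT, firstname TEXT, street TEXT, postalcode TEXT,
    municipality TEXT, date_of_birth TEXT, phonenr TEXT
);
CREATE TABLE Prof (
    profnr INTEGER PRIMARY KEY,
    lastname TEXT, firstname TEXT, email TEXT
);
CREATE TABLE Course (
    coursenr INTEGER PRIMARY KEY,
    coursename TEXT, credits INTEGER
);
CREATE TABLE Class (
    classnr INTEGER PRIMARY KEY,
    term INTEGER, weekday TEXT, begin_hour TEXT, end_hour TEXT, room TEXT,
    coursenr INTEGER NOT NULL REFERENCES Course(coursenr),
    profnr INTEGER NOT NULL REFERENCES Prof(profnr)
);
CREATE TABLE Office (
    room TEXT PRIMARY KEY,
    phonenr TEXT,
    profnr INTEGER UNIQUE REFERENCES Prof(profnr)   -- one to one, optional
);
CREATE TABLE Lecture (
    lectureID INTEGER PRIMARY KEY AUTOINCREMENT,    -- surrogate key
    classnr INTEGER NOT NULL REFERENCES Class(classnr),
    date TEXT
);
CREATE TABLE StudentStudiesCourse (
    studentnr INTEGER NOT NULL REFERENCES Student(studentnr),
    coursenr INTEGER NOT NULL REFERENCES Course(coursenr),
    PRIMARY KEY (studentnr, coursenr)
);
CREATE TABLE ProfTeachesCourse (
    profnr INTEGER NOT NULL REFERENCES Prof(profnr),
    coursenr INTEGER NOT NULL REFERENCES Course(coursenr),
    PRIMARY KEY (profnr, coursenr)
);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(SCHEMA)

# Count the user tables and the foreign keys (one per relationship).
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master "
    "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name")]
fk_count = sum(
    len(conn.execute(f"PRAGMA foreign_key_list({name})").fetchall())
    for name in tables)
```

The two counts come out at 8 tables and 8 foreign keys, matching the 8 tables and 8 relationships of the logical data model.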
2.5.3 The physical data model for a university
The implementation is done in Access.
• Design each table in the Table Design View. Fill in the field names, data types, field size and
all field properties. The main field properties are format, decimal places, validation rule,
required and indexed.
• Define primary keys.

Note on the abbreviations in Table 2-3: The characters between ( ) abbreviate the data types:
T (text, with the maximum text size), N (number, with the number of decimals), C (currency,
with the number of decimals), D (date), T (time), L (logical value) and S (surrogate key).
Note on moving room to the Lecture table: If you had already done this in the conceptual data
model, Lecture would already be present at the entity level with the attributes date and room,
and there would have been an additional one to many relationship Class – Lecture. In that case
there would have been no multivalued attributes at the conceptual level.

• Verify that primary keys and foreign keys participating in the same relationship have the
same data type and field size. Only if the primary key's data type is AutoNumber must the
foreign key's data type be Number.
• Also define indexes. In general, there must be an index for the primary key (created
automatically when defining the primary key). As a rule, define an index for every
primary and foreign key and also for other important fields you will often use to perform
a search or sort operation.
• A foreign key in a one to one relationship must have a unique index: Hence assign the
value Yes (No Duplicates) to the field property indexed for the foreign key profnr in the
Office table, or enter Yes for the index property unique.
• A foreign key in a one to many relationship cannot have a unique index: Hence assign the
value Yes (Duplicates OK) to the field property indexed for these foreign keys, or enter
No for the index property unique.
• You can look up a foreign key value in the parent table by using the Lookup Wizard. The
corresponding relationship will then be created automatically.

Figure 2-15: The physical data model in Access for a database for a university

• Create relationships in the Relationships tab. This is a graphical representation of the
physical data model of an Access database. Arrange the tables to make the layout
easy to understand (Figure 2-15).
o In a one to many relationship drag a relationship line between the parent table's
primary key and the child table's foreign key. It is not important in which table
you start dragging. Access will recognize the foreign key by the unique index
property.
o In the one to one relationship Prof - Office you must drag a relationship line
from the parent table's primary key to the child table's foreign key, so from the
Prof table to the Office table. Otherwise, Access will not recognize the relationship type,
or will misinterpret it!
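The index rules in the list above (a unique index on a one to one foreign key, a non-unique index on a one to many foreign key) can be mirrored outside Access. This is a sketch that assumes SQLite via Python's sqlite3; the index names idx_office_profnr and idx_class_profnr and the sample rows are hypothetical, and the tables are reduced to the columns that matter here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Office (room TEXT PRIMARY KEY, profnr INTEGER);
    CREATE TABLE Class (classnr INTEGER PRIMARY KEY, profnr INTEGER);
    -- counterpart of indexed: Yes (No Duplicates)
    CREATE UNIQUE INDEX idx_office_profnr ON Office (profnr);
    -- counterpart of indexed: Yes (Duplicates OK)
    CREATE INDEX idx_class_profnr ON Class (profnr);
    INSERT INTO Office VALUES ('R1', 7);
    INSERT INTO Class VALUES (1, 7), (2, 7);   -- prof 7 teaches two classes
""")

# Read back which of the named indexes are unique.
unique_flags = {
    row[1]: bool(row[2])
    for table in ("Office", "Class")
    for row in conn.execute(f"PRAGMA index_list({table})")
}

# A second office for prof 7 violates the unique index.
try:
    conn.execute("INSERT INTO Office VALUES ('R2', 7)")
    second_office_accepted = True
except sqlite3.IntegrityError:
    second_office_accepted = False
```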

Note: If you have already created the relationships with the Lookup Wizard, you only have to
verify them.

2.6 The data model of a data warehouse


A data warehouse is a very large database with historical data, which is regularly replenished,
and otherwise not altered or removed, but only consulted.
A data warehouse has a star structure. There is a central table with facts and a number of
supporting dimension tables. There is a relationship between the central facts table and each
dimension table. This relationship is always a one to many relationship, where the dimension
table is the parent table and the facts table is the child table.
As an example, consider a classic data warehouse with historical sales data, with the facts table
Sales, and four dimension tables Customer (who?), Product (what?), Site (where?) and Time
(when?). Each sale involves exactly one customer, one product, one site and one time.
The conceptual data model for this data warehouse is shown in Figure 2-16 (with only two
attributes for the facts entity). The facts entity Sales is colored red and the dimension entities
Customer, Product, Site and Time are colored green.

Figure 2-16: The conceptual data model for a 'Sales' data warehouse

The logical data model based on an RDBMS for this data warehouse is shown in Figure 2-17 (only
the key columns and two additional attributes for the facts entity are displayed). Because all the
relationships are of the one to many type, this logical data model is easily derived from the
conceptual data model. The primary keys are underlined; the foreign keys are displayed in italics.
All foreign keys reside in the facts table Sales.
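A star structure of this kind can be sketched as five tables, with every foreign key in the facts table. The example below assumes SQLite via Python's sqlite3, and all column names (customer_id, quantity, amount and so on) and all sample rows are hypothetical, since the figures show only the key columns and two unnamed attributes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    -- dimension tables (parents)
    CREATE TABLE Customer (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Product (product_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Site (site_id INTEGER PRIMARY KEY, city TEXT);
    CREATE TABLE Time (time_id INTEGER PRIMARY KEY, date TEXT);
    -- facts table (child in four one to many relationships)
    CREATE TABLE Sales (
        sale_id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES Customer(customer_id),
        product_id INTEGER NOT NULL REFERENCES Product(product_id),
        site_id INTEGER NOT NULL REFERENCES Site(site_id),
        time_id INTEGER NOT NULL REFERENCES Time(time_id),
        quantity INTEGER,
        amount REAL
    );
    INSERT INTO Customer VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO Product VALUES (1, 'Book'), (2, 'Pen');
    INSERT INTO Site VALUES (1, 'Brussels');
    INSERT INTO Time VALUES (1, '2024-03-15'), (2, '2024-03-16');
    INSERT INTO Sales VALUES
        (1, 1, 1, 1, 1, 2, 20.0),
        (2, 2, 1, 1, 1, 3, 30.0),
        (3, 1, 2, 1, 2, 10, 15.0);
""")

# A typical consultation: join the facts with one dimension and aggregate.
quantity_per_product = conn.execute("""
    SELECT Product.name, SUM(Sales.quantity)
    FROM Sales JOIN Product ON Product.product_id = Sales.product_id
    GROUP BY Product.name
    ORDER BY Product.name
""").fetchall()
```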

Source for the star structure: Textbook Business Information Management – 2nd ed., p. 60-61.

Figure 2-17: The logical data model for a 'Sales' data warehouse

In regular OLTP databases one tries to avoid redundancy as much as possible. The reasons for
this are:
• redundant data require unnecessary additional storage space.
• changes to the data may cause inconsistencies.
Avoiding redundancy includes the removal of calculated columns.
Changes do not occur in data warehouses (except for additions). On the other hand, computing
the calculated columns requires a lot of computational power from the system. By
allowing redundancy in the dimension tables by means of calculated columns, one can improve
the performance and thus the efficiency. The only drawback is that these calculated columns
require additional storage space.
For example, to avoid redundancy the dimension table Time would contain only the date and time.
From these two columns, you can use date and time functions to calculate the year, the number of
the month and day, the number of the week in the year, the number of the day in the week, and
much more. However, this is at the expense of performance. It is better that these data are
present as extra (redundant) columns. Their values are calculated when the rows are added and
stored in those additional columns. Because this data is never changed, there is no risk of
inconsistency.
In a regular database, these calculated columns are recalculated by a query each time they are
needed.
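The calculated columns of the Time dimension can be produced once, at the moment a row is added. The sketch below uses Python's datetime module; the function name time_dimension_row and the column names are hypothetical, and week and weekday numbers follow the ISO 8601 convention (Monday is day 1), which is an assumption because the text does not say which numbering is used.

```python
from datetime import datetime

def time_dimension_row(moment):
    """Build one row for the Time dimension, including the redundant columns."""
    iso_year, iso_week, iso_weekday = moment.isocalendar()
    return {
        "date": moment.date().isoformat(),
        "time": moment.time().isoformat(),
        # calculated once on insertion, then stored
        "year": moment.year,
        "month": moment.month,
        "day": moment.day,
        "week_of_year": iso_week,
        "day_of_week": iso_weekday,
    }

row = time_dimension_row(datetime(2024, 3, 15, 14, 30))
```

For 15 March 2024 this stores week 11 and weekday 5 (a Friday), so later queries can group by week or weekday without calling any date function.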
2.7 Conclusion
There are two methodologies to design databases:
• A top-down design, where one starts from a more abstract model (entities and
relationships) and more details are added in the further models.
• A bottom-up design, where one starts from existing tables and redundant data is eliminated
by means of the normalization process.
Both methods may be combined.
• The conceptual data model consists of entities and relationships. It is independent of the
DBMS type and the DBMS package which will be used to implement it. There are three
relationship types between entities: the one to one relationship, the one to many
relationship and the many to many relationship.
• The logical data model is specific for each DBMS type. In an RDBMS tables are used and
relationships are defined by matching values of primary keys and foreign keys. This
logical data model is dependent on the DBMS type but independent of the DBMS package
which will be used to implement it. In the logical data model of an RDBMS many to many
relationships are converted into a pair of one to many relationships, where an extra
intersection table is inserted.
• The physical data model is specific for each DBMS package that is used to implement it.
Most DBMS software packages are not able to implement all aspects of the logical data
model. For each package the physical model must be adapted to achieve the best possible
implementation of the logical model that meets performance standards.
Data warehouses use a star scheme with one facts entity and several dimension entities.
• There is a one to many relationship between each dimension entity and the facts entity.
• In an RDBMS the dimension tables are the parent tables and the facts table is the child
table in each one to many relationship. All foreign keys reside in the facts table.
• To improve performance data redundancy is accepted by storing results in calculated
columns, while in a regular OLTP RDBMS calculated columns usually are recalculated each
time a query is executed.
