
INFO204

Distributed Computing Technologies
Revision
Database Design

Topics
ER Diagrams
Relational Model
Normalisation


INFO204 Distributed Computing Technologies. Martin Sutton


The Entity-Relationship Model


Entities
some objects are so similar that they form a class of objects or an entity
types of objects include
  people such as customer, employee
  real things such as car, raw material
  business objects such as budget, invoice
  events such as appointment, examination
  places such as state, branch
  imaginary things such as access rights

Entities
Entities have properties or characteristics called attributes
An entity occurrence is a particular instance of an entity
An entity occurrence will have a set of attributes which distinguish it from all other instances of the same entity

Attributes
An entity is a conceptual unit
An entity is described by its attributes
Attributes are properties of an entity
Attributes may be
  measures or descriptions of properties possessed by instances of the entity
  simple or atomic values
  composite or complex values
  null
  multi-valued
  derived

[Figure: entity Person with attributes Address (Composite: Street, City, State), DOB (Simple), Passport (Null), Phone No (Multi-valued), Age (Derived)]
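A derived attribute such as Age is calculated from a stored attribute (here DOB) instead of being stored itself. The following is a minimal sketch of that idea; the function name `age_on` and the sample Person values are assumptions made for illustration and do not come from the slides.

```python
from datetime import date

def age_on(dob: date, today: date) -> int:
    """Derive Age from the stored DOB attribute as of a given day."""
    years = today.year - dob.year
    # Subtract one year if the birthday has not yet occurred this year.
    if (today.month, today.day) < (dob.month, dob.day):
        years -= 1
    return years

# Hypothetical Person occurrence; the values are made up for illustration.
person = {"Name": "Mary", "DOB": date(1990, 6, 15)}
print(age_on(person["DOB"], date(2024, 6, 14)))  # the day before the birthday
print(age_on(person["DOB"], date(2024, 6, 15)))  # on the birthday
```

Because Age changes over time while DOB does not, storing only DOB avoids having to update every Person occurrence each year.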

Identifiers
Entities are classes of objects and represent sets of instances of the entity
A requirement of all systems is to be able to uniquely identify each individual instance of the set
  we cannot track the activities of the instances unless each one can be identified
An identifier is an attribute or set of attributes which uniquely specifies the particular entity instance

Relationships
model the interactions between entities

[Figure: people Mary, Fred and Sue linked to cars ABC-123, AGF-45H and MS-587, giving the pairs Fred AGF-45H, Mary MS-587, Sue ABC-123]

Entity Relationship Diagrams
Symbols used
[Figure: two Entity symbols joined by a Relationship symbol, for example Person drives Car]

Cardinality of Relationships
Person drives Car
  Each person drives 1 car
  Each car is driven by 1 person
Person drives Car
  Each person drives many cars
  Each car is driven by 1 person
Person drives Car
  Each person drives 1 car
  Each car is driven by many people
Person drives Car
  Each person drives many cars
  Each car is driven by many people

Participation Constraints
[Figure: Course 1:4 Enrol 0:N Student, with the bounds labelled Lower Bound and Upper Bound]

The Relational Model


introduced by Codd in 1970
three major advantages
  a good communication tool between designers and users, easily understood by both users and computer professionals
  easy to meet essential database criteria through normalisation of relations
  readily implemented directly on computers using a relational DBMS such as Access

Relations
a relation consists of tuples and attributes
we think of a table made up of horizontal rows and vertical columns
equivalent terms
  relation    table
  tuple       row
  attribute   column

Relations
tuples or rows hold data about different characteristics of a single entity
attributes or columns hold data about the same characteristic of many different entities
each attribute has a domain which defines the type and the values which an attribute is allowed to take
a domain is a set of values of the same type

[Figure: the Parts relation, labelled with its Relation Name, Relation Key, Attributes, Tuples and Domains]

Parts
PartNo Name Colour Weight Length
P1 Snorg black 15
P7 Flimpet brown 36 124
P3 Gleftor grey 11

Domains
  PartNo   alpha
  Name     string
  Colour   ( black, brown, grey )
  Weight   numeric
  Length   numeric

Properties of Relations
there are no duplicate tuples
  every row is unique
tuples are unordered
  the order of row storage is not significant
each attribute has a unique name in the relation
  column names are unique within a relation; different relations may have the same column names
attributes are unordered
  the order of column storage is not significant
attribute values are atomic
  the values stored in a column cannot be divided up into smaller values

Relation Keys
a relation key is a set of attributes whose values can be used to select an individual tuple of a relation
a relation key may consist of a single attribute if that is sufficient to distinguish each tuple from every other tuple
since each tuple is unique, we can always find a relation key, even if it means combining all the attributes as a composite relation key

How Many Relations?
it is possible to store all the information in a single relation called the universal relation
in practice this would lead to redundancy and the introduction of many Null values, both of which are considered undesirable
most relational databases consist of a number of relations
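The definition of a relation key can be tried out on sample tuples: a set of attributes qualifies only if no two tuples share the same values for it. The sketch below is one possible way to test this, using the PartNo, Name and Colour values of the Parts relation; the helper names `is_key` and `smallest_keys` are hypothetical.

```python
from itertools import combinations

def is_key(rows, attrs):
    """True if the values of `attrs` pick out at most one tuple in `rows`."""
    seen = set()
    for row in rows:
        value = tuple(row[a] for a in attrs)
        if value in seen:
            return False
        seen.add(value)
    return True

def smallest_keys(rows, attributes):
    """Return the attribute sets of smallest size that are unique in this sample."""
    for size in range(1, len(attributes) + 1):
        found = [c for c in combinations(attributes, size) if is_key(rows, c)]
        if found:
            return found
    return []

parts = [
    {"PartNo": "P1", "Name": "Snorg", "Colour": "black"},
    {"PartNo": "P7", "Name": "Flimpet", "Colour": "brown"},
    {"PartNo": "P3", "Name": "Gleftor", "Colour": "grey"},
]
print(smallest_keys(parts, ["PartNo", "Name", "Colour"]))
```

In such a small sample every single attribute happens to be unique, so the check can only rule candidates out. Choosing the relation key remains a design decision about which attributes must stay unique for all future tuples.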

Facts
a fact exists when the value of one attribute determines one and only one value of another attribute
  PersonID → DOB so DOB is a fact derived from PersonID
  but
  DOB ↛ PersonID so PersonID is not a fact derived from DOB
  PersonID →→ ParentID is a multi-valued fact

Facts
facts may be basic or derived
  if a relation has attributes StartDate and FinishDate as well as DaysTaken then
  StartDate and FinishDate are basic facts
  DaysTaken is a derived fact
a relation should only store basic, single-valued facts
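Whether one attribute determines another can be checked against sample tuples: the dependency fails as soon as one value on the left appears with two different values on the right. This is a minimal sketch under that reading; the function name `determines` and the rows are invented for illustration.

```python
def determines(rows, lhs, rhs):
    """True if each value of `lhs` appears with only one value of `rhs`."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first value seen and returns it on later visits.
        if seen.setdefault(key, value) != value:
            return False
    return True

# Hypothetical rows: two people share a date of birth, and P1 has two parents.
people = [
    {"PersonID": "P1", "DOB": "1990-06-15", "ParentID": "P7"},
    {"PersonID": "P1", "DOB": "1990-06-15", "ParentID": "P8"},
    {"PersonID": "P2", "DOB": "1990-06-15", "ParentID": "P9"},
]
print(determines(people, ["PersonID"], ["DOB"]))       # PersonID determines DOB
print(determines(people, ["DOB"], ["PersonID"]))       # DOB does not determine PersonID
print(determines(people, ["PersonID"], ["ParentID"]))  # ParentID is multi-valued
```

A sample can only disprove a dependency. That a dependency holds for every possible tuple is a statement about the real world that the designer has to supply.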

Full Functional Dependency
we know that
  PersonID → DOB
but can also see that
  PersonID, Name → DOB
the first example is a Full Functional Dependency since the DOB is fully dependent on the PersonID
the second example is not a full functional dependency since we do not need to know Name in order to determine the DOB

Dependency Diagrams
the single-valued fact that a person has a single date of birth can be represented as the functional dependency statement
  PersonID → DOB
  (Determinant → Attribute)

Describing Relations (Tables)
previously we have described relations using the following template or plan
  RelationshipName = ({list of attributes}, {list of FDs})
but now we have another, more visual description
  RelationshipName = (list of attributes) + (dependency diagram)
[Dependency diagram: PersonID, DOB]
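The difference between a functional dependency and a full functional dependency can be sketched in code: the dependency is full only if no proper subset of the determinant would do the same job. The helper names and the sample people below are assumptions for illustration only.

```python
from itertools import combinations

def determines(rows, lhs, rhs):
    """True if each value of `lhs` appears with only one value of `rhs`."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        if seen.setdefault(key, value) != value:
            return False
    return True

def fully_determines(rows, lhs, rhs):
    """True if lhs -> rhs holds and no proper subset of lhs determines rhs."""
    if not determines(rows, lhs, rhs):
        return False
    for size in range(1, len(lhs)):
        for subset in combinations(lhs, size):
            if determines(rows, subset, rhs):
                return False
    return True

# Hypothetical rows: two different people are both called Mary.
people = [
    {"PersonID": "P1", "Name": "Mary", "DOB": "1990-06-15"},
    {"PersonID": "P2", "Name": "Fred", "DOB": "1985-01-02"},
    {"PersonID": "P3", "Name": "Mary", "DOB": "1979-11-30"},
]
print(fully_determines(people, ["PersonID"], ["DOB"]))          # full dependency
print(fully_determines(people, ["PersonID", "Name"], ["DOB"]))  # Name is not needed
```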

Normal Forms
a relation should only store facts about relation keys
all non-prime attributes (attributes which are not a part of the relation key) should be fully functionally dependent upon the entire relation key

Redundancies and Anomalies
relations are converted to 1NF in order to
  conform to the relational model
  reduce redundancies (storage of the same fact more than once)
  reduce anomalies (loss of data when deleting, the need to update the same fact more than once, inability to add data in some circumstances)
do redundancies and anomalies still exist in 1NF?

First Normal Form
identify any redundancies and anomalies in this 1NF relation

OrderID  Company  Item      Qty
O21      XYZ      Groozle   5
O21      XYZ      Chimera   3
O21      XYZ      Fludger   7
O3       Acme     Fludger   78
O3       Acme     Groozle   4
O3       Acme     Scrimpot  15
O7       Alpha    Groozle   8
O7       Alpha    Scrimpot  1
O7       Alpha    Chimera   1

Redundancies and Anomalies
company name is an example of redundancy
anomalies occur when
  deleting last OrderID for a Company also deletes the Company name
  updating the Company name requires multiple updates
  difficult to add a Company name until the company has placed an order
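The redundancy and the deletion anomaly in the 1NF order relation can be made visible with a short sketch. It uses the nine tuples from the slide; the variable names are assumptions for illustration.

```python
from collections import Counter

# (OrderID, Company, Item, Qty) tuples from the 1NF relation.
orders = [
    ("O21", "XYZ", "Groozle", 5),
    ("O21", "XYZ", "Chimera", 3),
    ("O21", "XYZ", "Fludger", 7),
    ("O3", "Acme", "Fludger", 78),
    ("O3", "Acme", "Groozle", 4),
    ("O3", "Acme", "Scrimpot", 15),
    ("O7", "Alpha", "Groozle", 8),
    ("O7", "Alpha", "Scrimpot", 1),
    ("O7", "Alpha", "Chimera", 1),
]

# Redundancy: the same (OrderID, Company) fact is stored once per order line.
fact_counts = Counter((order_id, company) for order_id, company, _, _ in orders)
print(fact_counts)

# Deletion anomaly: removing every line of order O7 also loses the name Alpha.
remaining = [row for row in orders if row[0] != "O7"]
companies_left = {company for _, company, _, _ in remaining}
print("Alpha" in companies_left)
```

Each company fact is counted three times, which is also the number of rows an update of that company name would have to touch.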

Second Normal Form
a relation is in second normal form (2NF) when
  it is in 1NF
  all of the non-prime attributes (those attributes not part of the relation key) are FFD (fully functionally dependent) on the entire relation key
  or, put another way, the relation contains no partial dependencies (no non-prime attribute is dependent on only a part of the relation key (primary key))

convert to second normal form (2NF)

OrderID  Company  Item      Qty  Disc
O21      XYZ      Groozle   5    0
O21      XYZ      Chimera   3    0
O21      XYZ      Fludger   7    0
O3       Acme     Fludger   78   20
O3       Acme     Groozle   4    0
O3       Acme     Scrimpot  15   10
O7       Alpha    Groozle   8    0
O7       Alpha    Scrimpot  1    0
O7       Alpha    Chimera   1    0

First Normal Form
[Dependency diagram for the relation above: OrderID, Company, Item, Qty, Disc]

Second Normal Form

OrderID  Company
O21      XYZ
O3       Acme
O7       Alpha

[Dependency diagram: OrderID, Company]

OrderID  Item      Qty  Disc
O21      Groozle   5    0
O21      Chimera   3    0
O21      Fludger   7    0
O3       Fludger   78   20
O3       Groozle   4    0
O3       Scrimpot  15   10
O7       Groozle   8    0
O7       Scrimpot  1    0
O7       Chimera   1    0

[Dependency diagram: OrderID, Item, Qty, Disc]
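The conversion of the order relation to 2NF can be sketched as two projections followed by a join that checks nothing was lost. The data is taken from the slides; the variable names, and the assumption that Company depends on OrderID alone, are illustrative.

```python
# (OrderID, Company, Item, Qty, Disc) tuples from the 1NF relation.
orders_1nf = [
    ("O21", "XYZ", "Groozle", 5, 0),
    ("O21", "XYZ", "Chimera", 3, 0),
    ("O21", "XYZ", "Fludger", 7, 0),
    ("O3", "Acme", "Fludger", 78, 20),
    ("O3", "Acme", "Groozle", 4, 0),
    ("O3", "Acme", "Scrimpot", 15, 10),
    ("O7", "Alpha", "Groozle", 8, 0),
    ("O7", "Alpha", "Scrimpot", 1, 0),
    ("O7", "Alpha", "Chimera", 1, 0),
]

# Facts about OrderID alone move to their own relation; the set removes repeats.
order_company = sorted({(oid, company) for oid, company, _, _, _ in orders_1nf})
# The remaining attributes stay with the whole key (OrderID, Item).
order_item = [(oid, item, qty, disc) for oid, _, item, qty, disc in orders_1nf]

# Joining the two relations on OrderID should give back the original tuples.
company_of = dict(order_company)
rejoined = [(oid, company_of[oid], item, qty, disc)
            for oid, item, qty, disc in order_item]

print(len(order_company), len(order_item))
print(rejoined == orders_1nf)
```

Each company name is now stored once per order, and the join confirms that the decomposition keeps all of the original information.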

Second Normal Form
convert to second normal form (2NF)

ProjID  PName    EmpID  EName  JobDesc     Rate$  Hours
P1      Roller   E101   Mary   Comp Eng    65     5
P1      Roller   E102   John   Programmer  60     13
P1      Roller   E104   Sue    Sec Leader  75     7
P2      Whistle  E103   Bill   Technician  55     78
P2      Whistle  E101   Mary   Comp Eng    65     4
P3      BigBoy   E103   Bill   Technician  55     15
P3      BigBoy   E104   Sue    Sec Leader  75     8
P3      BigBoy   E102   John   Programmer  60     10
P4      Wolf     E101   Mary   Comp Eng    65     14

First Normal Form
[Dependency diagram for the relation above: ProjID, PName, EmpID, EName, JobDesc, Rate$, Hours]

Second Normal Form
[Dependency diagrams: ProjID, PName; EmpID, EName, JobDesc, Rate$ (with a Transitive Dependency marked); ProjID, EmpID, Hours]

Redundancies and Anomalies
relations are converted to 2NF in order to
  reduce redundancies (storage of the same fact more than once)
  reduce anomalies (loss of data when deleting, the need to update the same fact more than once, inability to add data in some circumstances)
do redundancies and anomalies still exist in 2NF?

Second Normal Form
identify any redundancies and anomalies in this 2NF relation
[Dependency diagram: EmpID, EName, JobDesc, Rate$ (with a Transitive Dependency marked)]

Redundancies and Anomalies
Rate$ is an example of redundancy
anomalies occur when
  deleting last instance of a JobDesc also deletes the Rate$ for that job
  updating the Rate$ of a JobDesc requires multiple updates
  difficult to add a JobDesc and Rate$ until an employee has that job

Third Normal Form
a relation is in third normal form (3NF) when
  it is in 2NF
  it contains no transitive dependencies
a transitive dependency is a FFD on an attribute (or attribute set) when the attribute (or attribute set) is not a part of the relation key (primary key)
a transitive dependency is a FFD on a non-prime attribute (or non-prime attribute set)

Second Normal Form
[Dependency diagrams for the two 2NF order relations: OrderID, Company; OrderID, Item, Qty, Disc]

Third Normal Form
[Dependency diagrams for the order relations: OrderID, Company; OrderID, Item, Qty; Qty, Disc]

Second Normal Form
[Dependency diagrams for the project relations: ProjID, PName; EmpID, EName, JobDesc, Rate$ (with a Transitive Dependency marked); ProjID, EmpID, Hours]

Third Normal Form
[Dependency diagrams for the project relations: ProjID, PName; EmpID, EName, JobDesc; JobDesc, Rate$; ProjID, EmpID, Hours]
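The 3NF result for the project example can be sketched by splitting the original tuples into four relations, one per dependency diagram, and then rebuilding the original relation from them. The data comes from the slides; the dictionary names are assumptions, and using dictionaries presumes that the dependencies shown in the diagrams hold.

```python
# (ProjID, PName, EmpID, EName, JobDesc, Rate$, Hours) tuples from the 1NF relation.
rows = [
    ("P1", "Roller", "E101", "Mary", "Comp Eng", 65, 5),
    ("P1", "Roller", "E102", "John", "Programmer", 60, 13),
    ("P1", "Roller", "E104", "Sue", "Sec Leader", 75, 7),
    ("P2", "Whistle", "E103", "Bill", "Technician", 55, 78),
    ("P2", "Whistle", "E101", "Mary", "Comp Eng", 65, 4),
    ("P3", "BigBoy", "E103", "Bill", "Technician", 55, 15),
    ("P3", "BigBoy", "E104", "Sue", "Sec Leader", 75, 8),
    ("P3", "BigBoy", "E102", "John", "Programmer", 60, 10),
    ("P4", "Wolf", "E101", "Mary", "Comp Eng", 65, 14),
]

# One dictionary per 3NF relation, keyed by that relation's key.
project, employee, job, assignment = {}, {}, {}, {}
for proj_id, pname, emp_id, ename, job_desc, rate, hours in rows:
    project[proj_id] = pname                 # ProjID determines PName
    employee[emp_id] = (ename, job_desc)     # EmpID determines EName, JobDesc
    job[job_desc] = rate                     # JobDesc determines Rate$
    assignment[(proj_id, emp_id)] = hours    # ProjID, EmpID determine Hours

# Rebuild the original tuples by following the keys between the relations.
rebuilt = [
    (p, project[p], e, *employee[e], job[employee[e][1]], h)
    for (p, e), h in assignment.items()
]

print(len(project), len(employee), len(job), len(assignment))
print(rebuilt == rows)
```

Each Rate$ is now stored once per JobDesc, so changing a rate is a single update and a job can exist before any employee holds it.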
