
Fundamentals of Database Systems Semester 1, 2017

Fundamentals of
Database Systems
COMPSCI/SOFTENG 351
COMPSCI 751

Instructors: Gill Dobbie, Miika Hannula, Sebastian Link, Gerald Weber

Department of Computer Science, The University of Auckland

1

Conceptual Database Design:


The Entity-Relationship Model

2
Database Design

I Abundance of data required to meet organization’s information needs


I Information systems collect, manipulate and disseminate data
I This data is stored in databases

,→ Which data should be kept in the database?


,→ How shall the data be accessed?
I Answers indicate the target database design which organizes data
,→ For users to store, update and query effectively and efficiently
I Database design aims at achieving this goal
,→ complex for databases underlying huge information systems
I In practice, the design process is typically divided into four phases:
,→ Requirements analysis
,→ Conceptual design
,→ Logical design
,→ Physical design

3
Iterative Database Design

Requirements analysis

Conceptual Design

Logical Design

Physical Design

4
Conceptual Database Design

I The target of the database refers to all real-world objects


,→ essential to meet organization’s information needs, and
,→ whose data should thus be stored in the database
I Aim of conceptual design:
,→ provide an abstract description of the target of the database
,→ in terms of high-level concepts which tells us how
,→ the data should be structured in the database
I The input consists of the information requirements of the users
I Output is a database schema
,→ consolidation of all user requirements, but
,→ does not yet contain any layout considerations
,→ in terms of relational tables, nor implementation details
,→ in terms of physical storage structures
I Use of some conceptual data model which provides
,→ the language for describing the database schema
I The most widely used of such conceptual data models is the
Entity-Relationship Model (ER model, ERM)

5
View of the World

Example: Welcome Harry. Harry is just about to open his own DVD rental
shop. He got everything: potential clients, DVDs to rent, ... well, it might be wise
to keep track of clients renting DVDs. Doing this by hand is a burdensome job, in
particular as Harry cannot afford a secretary.
So Harry needs a database which stores all the data necessary to provide the desired
information.
What about the target of the database to be designed? It contains Harry’s DVDs,
his clients and, of course, all the rentals.
I ERM views target of the database to consist of entities and relationships:
I Entities are basic objects in the target of the database

I Relationships are, e.g., connections between or objects derived from entities

I Example: Clients and DVDs are basic objects, that is, entities. Rentals are
connections between DVDs and clients, that is, relationships.
I This simple view of the world is the major reason for the success of the data model

6
Modeling Entities

I Once we identify Client as an abstract concept to model certain objects (clients)


in the target of the database, we have to decide how clients should be represented
in the database, that is, which data on clients should be stored
I This decision depends on the particular information needs
Example: Contact details like Name, Address, Phone, or other properties like Age
I Properties of entities like these ones are generally called attributes
Example: Let us choose the attributes Name, Birthday, Address and Phone to
be stored for clients. Let Johnny be Harry’s first client. We are interested in his
name (say John Fox), his birthday (08/08/1980), his address (say 88 Main Street)
and his Phone (say 3508888)
I In the same manner we may handle every other client
I Name, Birthday, Address and Phone are common attributes of all clients that may
be used to describe them, that is, we consider the abstract concept Client to
comprise the attributes Name, Birthday, Address and Phone

7
Entities, Attributes, and Keys

I As in the RDM, each attribute has a domain (its set of possible values)
I Throughout, let $D = \{D_i\}_{i \in I}$ be a fixed family of sets, and call each $D_i$ a domain

I Attributes must allow us to distinguish between different clients


Example: Of course there might be a second client with name John Fox, that is, a
particular attribute may assume the same value for two clients. Hence, Name itself
does not provide sufficient information to distinguish between clients. But Name
and Birthday should be fine: We suppose that no two of Harry’s clients with the
same name were born on the same day.
I The attributes Name and Birthday form a key that allows us to uniquely identify
a particular client
I We hasten to point out that the previous assumption might be too strong in practice.
Then we have to add further attributes to the key. As entities are objects it is
reasonable to assume that they may be identified by their attributes, and it is part
of the conceptual design task to find these attributes

8
Entity Types

I Our observations so far motivate the definition:


An entity type E consists of
I a finite, non-empty set attr(E) = {A1, . . . , Am} of attributes,
I a domain assignment dom : attr(E) → D which assigns each attribute
A ∈ attr(E) its domain dom(A),
I and a non-empty subset id(E) ⊆ attr(E) of attributes, called the key.

For short, we write E = (attr(E), id(E)). The attributes in the key id(E) are
called the key attributes of E.

I How does this definition apply to our example?

Example: We are going to specify an entity type Client.


I Its attribute set is attr(Client) = {Name, Birthday, Address, Phone},
I its domain assignment is given by dom(Name) = string, dom(Birthday) =
date, dom(Address) = string, dom(Phone) = number,
I and its key is id(Client) = {Name, Birthday}.
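
The definition can be mirrored in a few lines of Python. This is only a sketch for illustration: the class name `EntityType`, its field names, and the use of Python types as stand-ins for the domains string, date and number are assumptions, not part of the ER model itself.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class EntityType:
    """E = (attr(E), id(E)) together with its domain assignment dom."""
    name: str
    dom: dict        # attribute name -> domain (here: a Python type)
    key: frozenset   # id(E)

    def __post_init__(self):
        # attr(E) must be non-empty; id(E) must be a non-empty subset of attr(E).
        if not self.dom:
            raise ValueError("attr(E) must be non-empty")
        if not self.key or not self.key <= self.attr:
            raise ValueError("id(E) must be a non-empty subset of attr(E)")

    @property
    def attr(self):
        """attr(E): the attribute names, read off the domain assignment."""
        return frozenset(self.dom)


# The entity type Client from the example above.
client = EntityType(
    name="Client",
    dom={"Name": str, "Birthday": date, "Address": str, "Phone": int},
    key=frozenset({"Name", "Birthday"}),
)
print(sorted(client.attr))
print(sorted(client.key))
```

Storing only `dom` and deriving `attr` from it keeps the attribute set and the domain assignment from drifting apart.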

9
Entities

I Client is an abstract concept that stands for all the real-world clients like Johnny

I The data stored on a real-world object is often written as a tuple like


,→ (Name: John Fox,Birthday: 08/08/1980,Address: 88 Main Street,Phone: 3508888)
,→ or, even shorter, (John Fox, 08/08/1980, 88 Main Street, 3508888)

I Remark: The first tuple notation above does not need a fixed order of the
attributes, since attributes have unique names. The second tuple notation, however,
requires a fixed order of the attributes. Thus before we can drop the attribute names
we must fix such an order of the attributes.
Example: Some more clients, namely JohnnyTwo, Lizzie and Debbie. Using the
order of attributes fixed before, these entities give rise to the following tuples:
,→ (John Fox, 06/06/1966, 66 Victoria Street, 3506060)
,→ (Lisa Hunter, 12/12/1982, 22 Te Awe Awe Street, 3502222)
,→ (Debra Gunner, 08/08/1988, 21 Park Road, 3501111)

10
Entities and Entity Sets

I Again, every client may be seen as a mapping that assigns each attribute a particular
value from the attribute domain
I This approach leads to our next definition

I For the sake of simplicity, we introduce the universal domain $D = \bigcup_{i \in I} D_i$
which is just the union of all domains under consideration

1) An entity of type E is a mapping e : attr(E) → D with e(A) ∈ dom(A) for
all attributes A ∈ attr(E).
2) An entity set of type E is a finite set E^t of entities of type E with unique
key values, that is, for all e1, e2 ∈ E^t with e1(A) = e2(A) for all key attributes
A ∈ id(E) we must have e1 = e2.

I A finite set E^t of entities is an entity set if any two entities e1, e2 that share the same
values on all key attributes (that is, e1(A) = e2(A) for all A ∈ id(E)) are indeed
equal
I This property is called the unique-key-value property
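
The unique-key-value property can be checked mechanically. The following is a minimal sketch, assuming entities are represented as Python dictionaries from attribute names to values; the function name `has_unique_key_values` is hypothetical.

```python
def has_unique_key_values(entities, key):
    """Return True if no two distinct entities agree on all key attributes."""
    seen = {}
    for e in entities:
        # Project the entity onto the key attributes (in a fixed order).
        key_value = tuple(e[a] for a in sorted(key))
        if key_value in seen and seen[key_value] != e:
            return False
        seen[key_value] = e
    return True


clients = [
    {"Name": "John Fox", "Birthday": "08/08/1980",
     "Address": "88 Main Street", "Phone": 3508888},
    {"Name": "John Fox", "Birthday": "06/06/1966",
     "Address": "66 Victoria Street", "Phone": 3506060},
    {"Name": "Lisa Hunter", "Birthday": "12/12/1982",
     "Address": "22 Te Awe Awe Street", "Phone": 3502222},
]

print(has_unique_key_values(clients, {"Name", "Birthday"}))  # True
print(has_unique_key_values(clients, {"Name"}))              # False
```

With the key {Name, Birthday} the three clients form an entity set; with Name alone they do not, because two different clients are both called John Fox.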

11
The Unique-Key-Value Property

I Clearly, the clients Johnny, JohnnyTwo, Lizzie and Debbie are entities of type
Client. Together they form an entity set of type Client:
Client
Name         | Birthday   | Address              | Phone
John Fox     | 08/08/1980 | 88 Main Street       | 3508888
John Fox     | 06/06/1966 | 66 Victoria Street   | 3506060
Lisa Hunter  | 12/12/1982 | 22 Te Awe Awe Street | 3502222
Debra Gunner | 08/08/1988 | 21 Park Road         | 3501111

I The unique-key-value property of an entity set allows us to distinguish different


entities. No two entities in an entity set coincide in all the key attributes:
I while Johnny and JohnnyTwo have the same name, their birthdays are different

I This property allows us to identify a particular client.


I Suppose we know a client’s name and birthday (say Lisa Hunter, 12/12/1982),
then we may infer her address and phone (22 Te Awe Awe Street, 3502222) from
the entity set
I This property allows us to see an entity set as a relation.

12
Another Example

In addition to clients, the target of the database for Harry’s DVD rental shop
contains DVDs. So DVD is a further essential concept.
We are going to model this as an entity type, too. Again we have to decide which attributes are
useful and required, we have to specify their domains, and we have to select a key. Say we decide to
use
I the attributes attr(DVD) = {Title, Director, Year},
I the domain assignment is given by dom(Title) = string, dom(Director) = string,
dom(Year) = number,
I and we choose the key id(DVD) = {Title}

For simplicity, we use the following slightly condensed attribute specification:


attr(DVD) = {Title: string, Director: string, Year: number}

or, if the attribute domains are known, even attr(DVD) = {Title, Director, Year}
This gives us a surprisingly short specification of the entity type DVD in form of a
pair E = (attr(E), id(E)):
DVD = ({Title, Director, Year}, {Title})

13
Visualization of Entity Types

I Practice shows that the graphical representation has significant advantages over
textual specification for the purposes of communication between systems analysts,
database designers and potential users of the database
I Entity types are often visualized by rectangles.
I If desired, the attributes can be attached to the rectangle, and those attributes
forming the key are underlined.
[Diagram: rectangle DVD with the attributes Title (underlined as key attribute), Director and Year attached]
I Example: To visualize the entity type
Client = ({Name, Birthday, Address, Phone}, {Name, Birthday})
we use
[Diagram: rectangle Client with the attributes Name and Birthday (both underlined as key attributes), Address and Phone attached]
I Remark: Please recall that this convention implicitly supposes that we know about
the domains of the attributes

14
Modeling Relationships

I So far we found the abstract concepts Client and DVD. In both cases we decided
to specify them as entity types. What about rentals?

I Clearly, they are also in the target of the database and, thus, Rental is a further
abstract concept that deserves to be modeled.
I However, rentals can hardly be seen as basic objects. Roughly speaking, their
existence depends on some client hiring a DVD.
I Hence, rentals are rather relationships between entities than entities themselves:
each rental connects a client and a DVD.

I The abstract concept Rental has two components Client and DVD.
I In addition to the components, we might want to store further data on rentals in
the database.
Example: We choose additional attributes RentalDay and DueDay, both with
domain date.

15
Relationships, Attributes, and Keys

I These attributes are common attributes of all rentals that may be used to describe
them, that is, we consider the abstract concept Rental to comprise the components
Client and DVD and the attributes RentalDay and DueDay.
Example: Suppose Johnny rents the BlueDVD. Then we are interested in the
rental day (say 03/10/2011) and the due day (say 03/11/2011).
I As before, a key will be used to distinguish different rentals. However, a key of a
relationship type may also contain components.
Example: Johnny might rent more than one DVD, and he might even rent the
BlueDVD again later on. Hence, neither Client and RentalDay, nor Client and
DVD provide sufficient information to distinguish between rentals. But DVD and
RentalDay should be fine.
I The component DVD together with the attribute RentalDay form a key that allows
us to uniquely identify a particular rental.
I Of course, for relationship types, a key may also contain components.

16
Relationship Types

A relationship type R consists of


I a finite, non-empty set comp(R) of components,
I a finite set attr(R) = {A1 , . . . , Am } of attributes,
I a domain assignment dom : attr(R) → D which assigns each attribute
A ∈ attr(R) its domain dom(A),
I and a non-empty subset id(R) ⊆ comp(R) ∪ attr(R), called the key.

For short, we write R = (comp(R), attr(R), id(R)).


I For now, components are entity types as defined before. Later on, we shall discuss
more options for components.
I A relationship type with n components is said to be n-ary. In particular, 1-ary, 2-ary
and 3-ary relationship types are called unary, binary and ternary.
Example: Rental is a binary relationship type.
I The components in id(R) are said to be key components, while the attributes in
id(R) are called key attributes of R.
I Throughout, entity types and relationship types are jointly called object types

17
An Example

We specify a relationship type Rental.


I Its components are comp(Rental) = {Client, DVD},
I its attributes are attr(Rental) = {RentalDay, DueDay},
I its domain assignment is dom(RentalDay) = date, dom(DueDay) = date,
I and its key is id(Rental) = {DVD, RentalDay}

Again we prefer to couple the domain assignment with the attributes by writing:
attr(Rental) = {RentalDay: date, DueDay: date}
or even to omit the domain assignment whenever we know about the domains:
attr(Rental) = {RentalDay, DueDay}
This again gives rise to a shorter specification of the relationship type Rental in
form of a triple R = (comp(R), attr(R), id(R)):
Rental = ({Client, DVD}, {RentalDay, DueDay}, {DVD, RentalDay})
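
As with entity types, the triple notation translates directly into a small data structure. This is a sketch under assumptions: the class name `RelationshipType` and its fields are hypothetical, and components are referred to by name only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RelationshipType:
    """R = (comp(R), attr(R), id(R)); components are given by name."""
    name: str
    comp: frozenset   # comp(R), non-empty
    attr: frozenset   # attr(R), may be empty
    key: frozenset    # id(R), non-empty subset of comp(R) ∪ attr(R)

    def __post_init__(self):
        if not self.comp:
            raise ValueError("comp(R) must be non-empty")
        if not self.key or not self.key <= (self.comp | self.attr):
            raise ValueError("id(R) must be a non-empty subset of comp(R) ∪ attr(R)")

    @property
    def arity(self):
        """Number of components: 1 = unary, 2 = binary, 3 = ternary."""
        return len(self.comp)


rental = RelationshipType(
    name="Rental",
    comp=frozenset({"Client", "DVD"}),
    attr=frozenset({"RentalDay", "DueDay"}),
    key=frozenset({"DVD", "RentalDay"}),
)
print(rental.arity)  # 2
```

The key here mixes a component (DVD) with an attribute (RentalDay), which is exactly what distinguishes keys of relationship types from keys of entity types.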

18
Visualization of Relationship Types

I A relationship type is visualized by a diamond.

I It is linked by edges to the rectangles representing its components. For key
components, the corresponding edge is marked by a dot.

I If desired, the attributes are simply attached to the diamond. Key attributes are
underlined similar to key attributes of entity types.
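[Diagram: diamond Rental with the attributes RentalDay (underlined as key attribute) and DueDay, linked by edges to the rectangles Client (Name, Birthday, Address, Phone) and DVD (Title, Director, Year); the edge to the key component DVD is marked by a dot]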

19
Relationships

I Rental is an abstract concept that stands for all the real-world rentals.
I As before, each real-world rental is written in form of a tuple like
,→ (Client: Johnny, DVD: BlueDVD, RentalDay: 03/10/2011, DueDay: 03/11/2011)
,→ or, even shorter, (Johnny, BlueDVD, 03/10/2011, 03/11/2011)
I Each rental may be seen as a mapping that assigns each component a particular
entity from the entity set and each attribute a particular value from the attribute
domain.
I Let ent(E) denote the set of all entities of an entity type E.
I ent(Client) consists of the clients Johnny, JohnnyTwo, Lizzie, and Debbie.
I ent(DVD) consists of the DVDs BlueDVD, WhiteDVD, and RedDVD.

I Then each rental is a mapping


r : {Client, DVD, RentalDay, DueDay} → ent(Client)∪ent(DVD)∪date,
with a client r(Client) from ent(Client), a DVD r(DVD) from ent(DVD), a
rental date r(RentalDay) from date, and a due date r(DueDay) from date.

20
Relationships, and Relationship Sets

I This approach leads to the following definition:


1) A relationship of type R is a mapping

$$r : \mathrm{comp}(R) \cup \mathrm{attr}(R) \to \bigcup_{E \in \mathrm{comp}(R)} \mathrm{ent}(E) \cup D$$

with r(E) ∈ ent(E) for all components E ∈ comp(R) and r(A) ∈ dom(A) for
all attributes A ∈ attr(R).
2) A relationship set is a finite set R^t of relationships of type R with unique key
values, that is, for all r1, r2 ∈ R^t with r1(X) = r2(X) for all key components
and key attributes X ∈ id(R) we must have r1 = r2.
I Entities and relationships are jointly called objects
I A finite set R^t of relationships of type R is a relationship set if any two relationships r1, r2
in R^t that share the same values on all key components and key attributes (that is,
r1(X) = r2(X) for all X ∈ id(R)) are actually equal
I This is again the unique-key-value property

21
The Unique-Key-Value Property

Example: A relationship set of type Rental:

Rental
Client    | DVD      | RentalDay  | DueDay
Johnny    | BlueDVD  | 03/10/2011 | 03/11/2011
Johnny    | BlueDVD  | 05/01/2012 | 05/02/2012
Johnny    | WhiteDVD | 03/10/2011 | 03/11/2011
JohnnyTwo | RedDVD   | 07/10/2011 | 07/11/2011
Debbie    | WhiteDVD | 15/11/2011 | 15/12/2011
Lizzie    | BlueDVD  | 11/11/2011 | 11/12/2011

I The unique-key-value property allows us to distinguish them:


I While Johnny rents the BlueDVD twice, the two rentals may be distinguished
by their rental days

I Further, it allows us to identify a particular rental:


I Suppose we know the DVD and the day of its rental (say WhiteDVD,
15/11/2011), then we may infer the client (Debbie) from the relationship set
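
The following sketch replays this example in Python. It assumes rentals are stored as plain tuples in the column order of the table above; the names `FIELDS` and `index_by_key` are hypothetical. The function builds a lookup table for a candidate key and reports failure when the candidate does not identify rentals uniquely.

```python
FIELDS = ("Client", "DVD", "RentalDay", "DueDay")

rentals = [
    ("Johnny", "BlueDVD", "03/10/2011", "03/11/2011"),
    ("Johnny", "BlueDVD", "05/01/2012", "05/02/2012"),
    ("Johnny", "WhiteDVD", "03/10/2011", "03/11/2011"),
    ("JohnnyTwo", "RedDVD", "07/10/2011", "07/11/2011"),
    ("Debbie", "WhiteDVD", "15/11/2011", "15/12/2011"),
    ("Lizzie", "BlueDVD", "11/11/2011", "11/12/2011"),
]


def index_by_key(rows, key):
    """Map each key value to its row; return None if a key value repeats."""
    positions = [FIELDS.index(name) for name in key]
    index = {}
    for row in rows:
        key_value = tuple(row[p] for p in positions)
        if key_value in index and index[key_value] != row:
            return None  # two different rentals share this key value
        index[key_value] = row
    return index


by_dvd_and_day = index_by_key(rentals, ("DVD", "RentalDay"))
print(by_dvd_and_day[("WhiteDVD", "15/11/2011")][0])   # Debbie

# Neither of these candidates distinguishes all rentals:
print(index_by_key(rentals, ("Client", "RentalDay")))  # None
print(index_by_key(rentals, ("Client", "DVD")))        # None
```

Johnny rents two DVDs on 03/10/2011 and rents the BlueDVD twice, so the last two candidates fail, while DVD together with RentalDay works for this relationship set.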

22
Components with Roles

I Example: Harry also rents his DVDs to teenagers like Debbie and her friends. Now
Harry is wondering whether their parents are clients, too. This information might
be helpful in running the shop.
I This can be modeled by a relationship type Descendent whose components are
both of type Client.
I To avoid confusion, roles are associated with the different components, such as
Child and Parent. This gives rise to the relationship type Descendent
I with components
comp(Descendent) = {Child:Client, Parent:Client},
I without additional attributes (and thus without additional domain assignment)
attr(Descendent) = ∅,
I and with key id(Descendent) = {Child:Client, Parent:Client}

I Again we may apply the usual convention and write Descendent =


({Child:Client, Parent:Client}, ∅, {Child:Client, Parent:Client})

23
Visualizing Roles

I When we visualize a relationship type Descendent we simply draw two edges


to the entity type Client, but label them by the corresponding roles Child and
Parent

[Diagram: diamond Descendent with two edges, labeled Child and Parent, to the rectangle Client (attributes Name, Birthday, Address, Phone)]

I Apart from the usage of roles, we may observe two further details in the example above:
I First, there is no need for relationship types to possess attributes. Attributes are
intended to capture properties which are useful in meeting the information needs
or to ensure that each relationship in a relationship set can be uniquely identified.
But this does not imply that attributes are compulsory for relationship types.

24
Observations

Example: A relationship set of type Descendent:

Descendent
Child:Client | Parent:Client
Debbie       | Mary
Debbie       | Bob
Julie        | Mary
Julie        | Bob

I Second, the key may include all components or attributes of a relationship type.
While this is an extreme situation, it may well occur in practice.

I In most cases, however, we do not need all components and attributes
in order to identify objects (entities or relationships) of some type uniquely. This
is because we are often interested in several properties that are useful to know but
irrelevant for the identification of the object itself.

25
Entity-Relationship Schemata

I Entities and relationships are objects in the target of the database. Consequently,
they should be stored in the database. By classifying these objects we found abstract
concepts like Client, DVD or Rental.
I These abstract concepts were modeled by entity types or relationship types. Together
they form the database schema for the database to be designed.
I Naturally, whenever we use a relationship type then all the entity types forming its
components should be in the database schema, too.
An Entity-Relationship schema (or ER schema, for short) is a finite set S
of entity types and relationship types such that for each relationship type R in S
and each of its components E or p : E in comp(R), we have that the entity type
E belongs to S as well.
I Every object type should have a unique name in the ER schema. Attributes, on the
other hand, need not be unique.
I In fact, an attribute like Title or Name is likely to be an interesting property for
various object types.
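
The closure condition in the definition of an ER schema is easy to test. A minimal sketch, assuming entity types are given by name and each relationship type by the list of its components, written either as `E` or as `role:E`; the function name `is_er_schema` is hypothetical.

```python
def is_er_schema(entity_types, relationship_types):
    """Check that object type names are unique and that every component
    of every relationship type refers to an entity type in the schema."""
    entity_names = set(entity_types)
    # Every object type needs a unique name in the schema.
    if entity_names & set(relationship_types):
        return False
    for components in relationship_types.values():
        for component in components:
            # "Child:Client" -> "Client"; "DVD" -> "DVD"
            entity = component.split(":")[-1]
            if entity not in entity_names:
                return False
    return True


relationship_types = {
    "Rental": ["Client", "DVD"],
    "Descendent": ["Child:Client", "Parent:Client"],
}

print(is_er_schema({"Client", "DVD"}, relationship_types))  # True
print(is_er_schema({"Client"}, relationship_types))         # False
```

The second call fails because Rental has the component DVD, but DVD is not part of the schema.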

26
Entity-Relationship Diagrams

I We may also visualize the overall ER schema.


I This graphical representation is called the diagram of the database schema, and
is known to provide a more intuitive form to illustrate the abstract concepts under
discussion.
I Basically, we draw a graph that uses rectangles to represent entity types and
diamonds to represent relationship types. Moreover, we draw an edge from a
diamond to a rectangle whenever the corresponding relationship type involves the
corresponding entity type as a component.

The Entity-Relationship diagram (or ER diagram, for short) of an ER


schema S is a directed graph with the elements of S as nodes, and with edges from
a node R to a node E for all components E ∈ comp(R), and edges from node R
to node E labeled with p for all components p : E ∈ comp(R).

I ER schema and ER diagram are just two different ways of presenting essentially the
same information.

27
An Example

The ER schema for Harry’s shop consists of the entity types Client and DVD,
and of the relationship type Rental.
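[Diagram: diamond Rental with the attributes RentalDay (underlined as key attribute) and DueDay, linked by edges to the rectangles Client (Name, Birthday, Address, Phone) and DVD (Title, Director, Year); the edge to the key component DVD is marked by a dot]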

I So far we always attached the attributes to the entity or relationship types. This
increases the level of detail, but might decrease the readability of the diagram. For
that reason, we sometimes omit the attributes in the diagram.
I Finally, key components of a relationship type are marked by dots on the edges to
the corresponding entity types. Key attributes of object types are underlined, as
long as we decided to attach attributes to the rectangles and diamonds.

28
Database Instances

I Each of the object types in an ER schema stands for a set of objects in the target of
the database. Hence, the database will contain an object set for each of the object
types.

I Clearly, these sets may change over time: Hopefully, Harry’s DVD rental shop will
acquire new clients over time. Similarly, Harry might buy new DVDs or replace
some of the old ones.

I Each state of the database is called a database instance.

An instance I of an ER schema S assigns each entity type E in S an entity


set I(E), and each relationship type R in S a relationship set I(R) such that for
each relationship type R in S, for each of its components E or p : E, and for each
relationship r ∈ I(R) we have that the entity r(E) or r(p : E) belongs to I(E).

I The latter condition ensures consistency of the database instance: an entity may
only occur in a relationship if it belongs to the relevant entity set.
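
A consistency check for an instance might look as follows. This is a sketch under assumptions: entities are tuples, relationships are dictionaries that map each component name to a whole entity (as in the definition above), and the names `is_consistent`, `instance` and `components` are hypothetical.

```python
def is_consistent(instance, components):
    """instance: object type name -> list of objects.
    components: relationship type name -> list of component names.
    True if every r(E) belongs to the entity set I(E)."""
    for rel_type, comps in components.items():
        for r in instance[rel_type]:
            for entity_type in comps:
                if r[entity_type] not in instance[entity_type]:
                    return False
    return True


johnny = ("John Fox", "08/08/1980", "88 Main Street", 3508888)
blue = ("Blue Velvet", "David Lynch", 1986)
red = ("The Hunt for Red October", "John McTiernan", 1990)

instance = {
    "Client": [johnny],
    "DVD": [blue],
    "Rental": [
        {"Client": johnny, "DVD": blue,
         "RentalDay": "03/10/2011", "DueDay": "03/11/2011"},
    ],
}
components = {"Rental": ["Client", "DVD"]}

print(is_consistent(instance, components))  # True

# A rental of a DVD that is not in I(DVD) makes the instance inconsistent.
instance["Rental"].append(
    {"Client": johnny, "DVD": red,
     "RentalDay": "07/10/2011", "DueDay": "07/11/2011"}
)
print(is_consistent(instance, components))  # False
```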

29
An Example

Client
Name         | Birthday   | Address              | Phone
John Fox     | 08/08/1980 | 88 Main Street       | 3508888
John Fox     | 06/06/1966 | 66 Victoria Street   | 3506060
Lisa Hunter  | 12/12/1982 | 22 Te Awe Awe Street | 3502222
Debra Gunner | 08/08/1988 | 21 Park Road         | 3501111

DVD
Title                    | Director        | Year
Blue Velvet              | David Lynch     | 1986
White Oleander           | Peter Kosminsky | 2003
The Hunt for Red October | John McTiernan  | 1990

Rental
Client    | DVD      | RentalDay  | DueDay
Johnny    | BlueDVD  | 03/10/2011 | 03/11/2011
Johnny    | BlueDVD  | 05/01/2012 | 05/02/2012
Johnny    | WhiteDVD | 03/10/2011 | 03/11/2011
JohnnyTwo | RedDVD   | 07/10/2011 | 07/11/2011
Debbie    | WhiteDVD | 15/11/2011 | 15/12/2011
Lizzie    | BlueDVD  | 11/11/2011 | 11/12/2011

30
Semantics of Relationships

I In the examples so far we were somewhat cheating. We used shortcuts like Johnny
or BlueDVD. These shortcuts exist only in our imagination . . .
I . . . but in the database they are not available. To be precise the rental
(Johnny, BlueDVD, 03/10/2011, 03/11/2011)
should read as
((John Fox, 08/08/1980, 88 Main Street, 3508888), (Blue Velvet, David Lynch, 1986), 03/10/2011, 03/11/2011)

I In fact, it is not necessary to include all the attributes of the client into the rental:
I It suffices to include the key attributes
I By the unique-key-value property, we can identify the corresponding object

I Hence, we may condense the tuple above to


((John Fox, 08/08/1980), (Blue Velvet), 03/10/2011, 03/11/2011)
I For a relationship type, the keys of its components are called foreign keys
I Both approaches provide a way to define relationships:
I common to both options is the idea to take entities as components of relationships

31
Set Semantics vs. Foreign Key Semantics

I The two options differ in the number of attributes that relationships ‘borrow’ from their components:


I the first option is set semantics: it uses the entire attribute set
I the second option is foreign key semantics: it uses only the key attributes

I Note: In the definitions above we used set semantics. Foreign key semantics leads to
slightly modified definitions of relationships, relationship sets and database instances

I The major difference to the original definition is that we no longer use the entire
entity e, but only its restriction $e|_{\mathrm{id}(E)}$ to the key attributes of the type E

I Due to the unique-key-value property, a database instance in foreign key semantics


comprises the same information as a database instance in set semantics

I In this sense, foreign key semantics reflects the fact that we need only the values on
key attributes to uniquely identify an entity in an entity set
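
The step from set semantics to foreign key semantics is a restriction of each component entity to its key attributes. The sketch below assumes entities are dictionaries from attribute names to values; `restrict` and `to_foreign_key_semantics` are hypothetical helper names.

```python
def restrict(entity, key):
    """Restriction of an entity to the key attributes of its type."""
    return {a: v for a, v in entity.items() if a in key}


def to_foreign_key_semantics(relationship, keys):
    """Replace every component entity by its restriction to the key.
    keys: component name -> set of key attributes of that entity type.
    Attribute values of the relationship itself are left untouched."""
    return {
        name: restrict(value, keys[name]) if name in keys else value
        for name, value in relationship.items()
    }


rental = {
    "Client": {"Name": "John Fox", "Birthday": "08/08/1980",
               "Address": "88 Main Street", "Phone": 3508888},
    "DVD": {"Title": "Blue Velvet", "Director": "David Lynch", "Year": 1986},
    "RentalDay": "03/10/2011",
    "DueDay": "03/11/2011",
}
keys = {"Client": {"Name", "Birthday"}, "DVD": {"Title"}}

print(to_foreign_key_semantics(rental, keys))
```

The result keeps only (John Fox, 08/08/1980) for the client and (Blue Velvet) for the DVD, which matches the condensed tuple shown earlier.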

32
An Example
The set of rentals in set semantics and in foreign key semantics:
Rental (set semantics)
Client                                                      | DVD                                                | RentalDay  | DueDay
(John Fox, 08/08/1980, 88 Main Street, 3508888)             | (Blue Velvet, David Lynch, 1986)                   | 03/10/2011 | 03/11/2011
(John Fox, 08/08/1980, 88 Main Street, 3508888)             | (Blue Velvet, David Lynch, 1986)                   | 05/01/2012 | 05/02/2012
(John Fox, 08/08/1980, 88 Main Street, 3508888)             | (White Oleander, Peter Kosminsky, 2003)            | 03/10/2011 | 03/11/2011
(John Fox, 06/06/1966, 66 Victoria Street, 3506060)         | (The Hunt for Red October, John McTiernan, 1990)   | 07/10/2011 | 07/11/2011
(Debra Gunner, 08/08/1988, 21 Park Road, 3501111)           | (White Oleander, Peter Kosminsky, 2003)            | 15/11/2011 | 15/12/2011
(Lisa Hunter, 12/12/1982, 22 Te Awe Awe Street, 3502222)    | (Blue Velvet, David Lynch, 1986)                   | 11/11/2011 | 11/12/2011

Rental (foreign key semantics)
Client                     | DVD                        | RentalDay  | DueDay
(John Fox, 08/08/1980)     | (Blue Velvet)              | 03/10/2011 | 03/11/2011
(John Fox, 08/08/1980)     | (Blue Velvet)              | 05/01/2012 | 05/02/2012
(John Fox, 08/08/1980)     | (White Oleander)           | 03/10/2011 | 03/11/2011
(John Fox, 06/06/1966)     | (The Hunt for Red October) | 07/10/2011 | 07/11/2011
(Debra Gunner, 08/08/1988) | (White Oleander)           | 15/11/2011 | 15/12/2011
(Lisa Hunter, 12/12/1982)  | (Blue Velvet)              | 11/11/2011 | 11/12/2011

33
Identifier Semantics

I Still, the idea to use shortcuts like Johnny or BlueDVD seems to be promising

I A suitable approach is to use identifiers:


I We fix a universal set ID of identifiers which can then be used as shortcuts, and
assign identifiers to entities, such as
the shortcut Johnny to the client (John Fox, 08/08/1980, 88 Main Street, 3508888)

I Thus, an entity in identifier semantics is just a pair (i, e) associating an identifier


i and an entity e:
(Johnny, (John Fox, 08/08/1980, 88 Main Street, 3508888))

1) An entity of type E (in identifier semantics) is a pair (i, e) with an identifier


i ∈ ID, and e being an entity of type E as defined before.
2) An entity set of type E (in identifier semantics) is a finite set E^t of entities (in
identifier semantics) s.t. for any two pairs (i1, e1) and (i2, e2) in E^t we have:
i1 = i2 if and only if e1 = e2.
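
The condition "i1 = i2 if and only if e1 = e2" says that identifiers and entities are in one-to-one correspondence. A minimal sketch of this check, assuming entities are hashable tuples; the function name `has_unique_identifiers` is hypothetical.

```python
def has_unique_identifiers(pairs):
    """pairs: iterable of (identifier, entity).
    True if i1 == i2 holds exactly when e1 == e2, for all pairs."""
    by_id, by_entity = {}, {}
    for i, e in pairs:
        # The same identifier must never be attached to two entities ...
        if by_id.setdefault(i, e) != e:
            return False
        # ... and the same entity must never receive two identifiers.
        if by_entity.setdefault(e, i) != i:
            return False
    return True


johnny = ("John Fox", "08/08/1980", "88 Main Street", 3508888)
johnny_two = ("John Fox", "06/06/1966", "66 Victoria Street", 3506060)

print(has_unique_identifiers([("Johnny", johnny), ("JohnnyTwo", johnny_two)]))  # True
print(has_unique_identifiers([("Johnny", johnny), ("Johnny", johnny_two)]))     # False
print(has_unique_identifiers([("Johnny", johnny), ("J", johnny)]))              # False
```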

34
An Example

I The unique-identifier property in part 2) of the definition ensures that each entity
in an entity set receives a unique identifier. Clearly, this shortcut may then be used
to identify the corresponding entity in the entity set.
Client
ID        | Name         | Birthday   | Address              | Phone
Johnny    | John Fox     | 08/08/1980 | 88 Main Street       | 3508888
JohnnyTwo | John Fox     | 06/06/1966 | 66 Victoria Street   | 3506060
Lizzie    | Lisa Hunter  | 12/12/1982 | 22 Te Awe Awe Street | 3502222
Debbie    | Debra Gunner | 08/08/1988 | 21 Park Road         | 3501111

DVD
ID       | Title                    | Director        | Year
BlueDVD  | Blue Velvet              | David Lynch     | 1986
WhiteDVD | White Oleander           | Peter Kosminsky | 2003
RedDVD   | The Hunt for Red October | John McTiernan  | 1990

I Once we assigned identifiers to entities, we may use them within relationships. A


relationship r in identifier semantics is a pair (i, r) where i is an identifier assigned
to r, and r itself assigns every component E of R some identifier r(E).

35
Unique-Key-Value and Unique-Identifier Property

Rental
ID | Client    | DVD      | RentalDay  | DueDay
i1 | Johnny    | BlueDVD  | 03/10/2011 | 03/11/2011
i2 | Johnny    | BlueDVD  | 05/01/2012 | 05/02/2012
i3 | Johnny    | WhiteDVD | 03/10/2011 | 03/11/2011
i4 | JohnnyTwo | RedDVD   | 07/10/2011 | 07/11/2011
i5 | Debbie    | WhiteDVD | 15/11/2011 | 15/12/2011
i6 | Lizzie    | BlueDVD  | 11/11/2011 | 11/12/2011

I A relationship set has the unique-key-value property:


I if two relationships coincide on all key components and on all key attributes, then
they must be the same
I . . . and the unique-identifier property:
I identifiers that are used to identify relationships must be unique

I Of course, the rentals in our database also receive identifiers. Here we used artificial
identifiers i1, i2, . . .. This is usual in practice, and refers to the fact that an identifier
does not reflect an additional attribute, but is simply a technical means to allow a
more compact representation of data.

36
Adjusting to Identifier Semantics

I Adding the unique-identifier property to our definitions, we obtain:


1) A relationship (in identifier semantics) of type R is a pair (i, r) with i ∈ ID
and a mapping
r : comp(R) ∪ attr(R) → ID ∪ D
with r(E) ∈ ID for all components E ∈ comp(R), and r(A) ∈ dom(A) for all
attributes A ∈ attr(R).
2) A relationship set (in identifier semantics) is a finite set R^t of relationships
(in identifier semantics) with unique key values as defined before, and such that
for any two pairs (i1, r1) and (i2, r2) in R^t we have:
i1 = i2 if and only if r1 = r2.
I As before, the relationship set of type Rental in our example together with the
entity sets of types Client and DVD form a database instance.

I It only remains to slightly modify the definition of a database instance to adapt it to
identifier semantics.

37
More Entity-Relationship Modeling

I Shortly after its introduction, the ERM became the most popular data model used
in conceptual database design
I Over time, a number of extensions were proposed to overcome some minor
drawbacks and to increase the expressive power of the ERM
I After all, the ERM with its extensions provides effective and convenient means of
describing the target of the database
I Some extensions we shall discuss:
I Further ways to form relationship types
I Higher-order relationship types

I So far we used aggregation to form relationship types. To provide more freedom when
modelling objects in the target of the database, other ways of forming relationship
types were proposed including
I Specialization,
I Generalization, and
I Clusters.

38
Specialization

I Sometimes, an object in the target of the database can be represented by more than
just a single abstract concept.
I Example: Students are also persons. Graduate students are also students and thus
also persons.
I For this it would be good to derive abstract concepts from other abstract concepts,
such as the more specific concept Student from the more general concept Person.
I This idea is known as specialization. The derived object type is a subtype of the
more general supertype. A subtype inherits all features of its supertype, but often
adds some new properties.
I The subtype U may be modeled as a unary relationship type whose single component
is just its supertype C. Clearly, U may have some additional attributes, and we may
use C as the key for U :
U = ({C}, attr(U ), {C})
I Note: Every object of type C gives rise to at most one object of type U .

39
An Example of a Specialization Hierarchy

[Figure: specialization hierarchy. Person (Name, Address) is the supertype of Student (StudentId, Major), General Staff (Position) and Lecturer (Department, Subject); Graduate Student (Degree, Topic) is a subtype of Student.]

40
Adding Some Further Relationship Types

[Figure: the specialization hierarchy from the previous slide, extended by the entity type Paper (No, Title) and the relationship types Supervises and Teaches (Semester, Textbook).]

41
Generalization and Clusters

I Sometimes it is necessary to model alternatives, e.g., having a relationship to various
kinds of objects.

I Example: The department hires employees, who might be lecturers, tutors or
general staff.

I For this it would be good to have an abstract concept that models all of them, that
is, we would like to combine several object types into a single new type.

I This idea is known as generalization, as the new abstract concept is more general
than the individual ones.

A cluster type U consists of a finite, non-empty set comp(U) of components
C1, . . . , Cn. We denote this cluster type by U = C1 ⊕ · · · ⊕ Cn.

I To use roles, we also allow components of the form pi : Ci rather than simply Ci.

An Example of a Cluster Type

[Figure: the specialization hierarchy over Person, Student, Graduate Student, General Staff and Lecturer, extended with the object type Tutor and the cluster type Employee, which combines General Staff, Lecturer and Tutor.]

Disjoint Unions of Object Sets

I Note: Clusters model disjoint unions.
I This means we have to check whether the object sets we put together are mutually
disjoint.
I If so, the object set I(U) associated with a cluster U is just the union of the object
sets of the components:

$$I(U) = I(C_1) \cup \cdots \cup I(C_n) = \bigcup_{i=1}^{n} \{\, o : o \in I(C_i) \,\}$$

I If not, we make these object sets mutually disjoint:
I Attach the index i to each object o of type Ci, that is, we replace o by the pair (i, o)
I Thus, if the same object o occurs in another object set I(Cj) with i ≠ j, the pairs
(j, o) and (i, o) are different
I Afterwards, the object set I(U) associated with a cluster U is the disjoint union

$$I(U) = \bigcup_{i=1}^{n} \{\, (i, o) : o \in I(C_i) \,\}$$
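
The tagging construction can be tried out in a few lines of Python. This is a sketch under assumptions: the function names and the sample object sets are hypothetical and not taken from the slides. It shows that a plain union loses an object that occurs in two component sets, while the tagged union keeps both copies apart.

```python
def mutually_disjoint(component_sets):
    """Check whether no object occurs in more than one component set."""
    seen = set()
    for objs in component_sets:
        if seen & set(objs):
            return False
        seen |= set(objs)
    return True

def cluster_instance(component_sets):
    """Disjoint union: tag each object o of the i-th component with its index i."""
    return {(i, o)
            for i, objs in enumerate(component_sets, start=1)
            for o in objs}

# Hypothetical data: "bob" is both a lecturer and a tutor.
lecturers = {"ann", "bob"}
tutors = {"bob", "cy"}

plain_union = lecturers | tutors
tagged = cluster_instance([lecturers, tutors])
print(len(plain_union), len(tagged))
```

The pairs (1, "bob") and (2, "bob") are distinct, so the tagged set has one element more than the plain union.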

Adding a Further Relationship Type

[Figure: the previous diagram extended with the entity type Department (Name) and the relationship type Hires (Salary, Since), which relates the cluster type Employee to Department.]

Higher-order Relationship Types

I So far, we allowed only entity types to occur as components of relationship types.
However, the examples above suggest also allowing
I relationship types, and
I cluster types
to occur as components of a relationship type.



I For convenience, we call these four kinds of types jointly object types.
I Entity types are just object types without components, while all other object
types have one or more components.
I We have to ensure that the components of an object type are well-defined. For
that we assign an order to each object type.

Let U be an object type with component set comp(U). The order of U is
1) 0 if U is an entity type,
2) k if all components of U have order less than k and at least one of its components
has order k − 1.

Extended Entity-Relationship Schemata

I This allows us to extend the definition of an ER schema as follows:


An Entity-Relationship schema (or ER schema, for short) is a finite set S
of object types such that for each object type U in S and each of its components
C or p : C in comp(U ), we have that the object type C belongs to S as well.
I Similarly we may extend the definitions of a relationship, a relationship set and a
database instance.
I Recall: The ER diagram of an ER schema S is a directed graph with the elements
of S as nodes, and with edges from a node U to a node C for all components
C ∈ comp(U ), and edges from node U to node C labeled with p for all components
p : C ∈ comp(U ).
I The graph-theoretical point of view:
I An ER diagram is just a digraph without any directed cycles (also called a directed
acyclic graph or dag, for short).
I The order of an object type U is the maximum length of a directed path of the
ER diagram that starts with U .
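
Both the closure condition on S and the order of an object type are easy to compute once the schema is viewed as a graph. The following Python sketch assumes a hypothetical encoding of a schema as a dictionary from object type names to component lists; the object types are those of the Supplier/Article/Department example used later in these slides.

```python
from functools import lru_cache

# Hypothetical encoding: each object type maps to the list of its components.
schema = {
    "Department": [],
    "Supplier": [],
    "Article": [],
    "DayOffer": ["Supplier", "Article"],
    "Purchase": ["Department", "DayOffer"],
}

def is_closed(schema):
    """Every component of every object type must itself belong to the schema."""
    return all(c in schema for comps in schema.values() for c in comps)

def order(schema, u):
    """Length of the longest directed path starting at u.

    Assumes the schema is closed and acyclic; a cycle would recurse forever.
    """
    @lru_cache(maxsize=None)
    def longest(t):
        comps = schema[t]
        return 0 if not comps else 1 + max(longest(c) for c in comps)
    return longest(u)

print({name: order(schema, name) for name in schema})
```

The recursion "1 + maximum order of the components" is the definition of order read as an equation, and it coincides with the longest-path view because the diagram has no directed cycles.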

An Example

[Figure: ER diagram of a higher-order example schema with the object types LendingPeriod, Billing, Rental, Employee, Customer, Copy, Branch, Company, Person, DVD, Manager, Supplies, Vendor and Buys. Attributes shown include Month, Year, Advance, Date, RentalNo, IRD_No, Birthday, CopyNo, Salary, Phones, Status, Position, Address, Name, DVD_Title, Tel, Director, RentalFee, Conditions, VendorName, Representative and Amount.]

Transforming ER Schemata into RDM Schemata

Requirements analysis

Conceptual Design

Logical Design

Physical Design

Our Running Example

I Consider the following ER diagram:

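[Figure: ER diagram with the entity types Supplier (No, Name, Address), Article (No, Shortname, QuantityOnStock) and Department (No, Budget), the relationship type DayOffer (Date, Price) with components Supplier and Article, and the relationship type Purchase (Quantity) with components Department and DayOffer.]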

I Goal: derive the relational database schema
,→ corresponding to the ER schema of this diagram
,→ automatically, i.e., by means of an algorithm

Transformation of Entity Types

I We describe the transformation of ER schemata into relational database schemata
I Start with the object types of order 0 and then work your way up, order by order
I Entity types:
I E = (attr(E), id(E)) leads to a relation schema E′ with attr(E′) = attr(E)
I The domain assignment for the attributes of E and E′ is the same
I E = (attr(E), id(E)) leads to a key id(E) on E′

Example: Transformation of the Entity Types

I Three entity types:
I Department = ({No,Budget}, {No}),
I Supplier = ({No,Name,Address}, {No}),
I Article = ({No,Shortname,QuantityOnStock}, {No})

I result in three relation schemata:
I Department′ = {No,Budget} with key {No},
I Supplier′ = {No,Name,Address} with key {No},
I Article′ = {No,Shortname,QuantityOnStock} with key {No}.
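
The entity-type rule is simple enough to state as a function. A minimal Python sketch, assuming a hypothetical dictionary representation of relation schemata (the function name and the dictionary layout are not from the slides):

```python
def transform_entity(name, attr, key):
    """E = (attr(E), id(E)) becomes E' with the same attributes and key id(E)."""
    if not set(key) <= set(attr):
        raise ValueError("id(E) must be a subset of attr(E)")
    return {"name": name + "'", "attr": list(attr), "key": list(key)}

# The three entity types of the running example: (attr(E), id(E)).
entity_types = {
    "Department": (["No", "Budget"], ["No"]),
    "Supplier": (["No", "Name", "Address"], ["No"]),
    "Article": (["No", "Shortname", "QuantityOnStock"], ["No"]),
}

relations = {
    name: transform_entity(name, attr, key)
    for name, (attr, key) in entity_types.items()
}
for schema in relations.values():
    print(schema["name"], schema["attr"], "key:", schema["key"])
```

Nothing is renamed at this stage; renaming only becomes necessary when keys are copied into relationship types.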

Transformation of Relationship Types

I For relationship type R = (comp(R), attr(R), id(R)) and each component C ∈ comp(R),
choose pairwise disjoint sets

k_attr(C) = {C.A | A is a key attribute of C′ originating from the previous transformation of C}

of new attribute names not occurring in attr(R)

I R = (comp(R), attr(R), id(R)) leads to relation schema R′ with

$$\mathrm{attr}(R') = \bigcup_{C \in \mathrm{comp}(R)} \mathrm{k\_attr}(C) \;\cup\; \mathrm{attr}(R)$$

I Domains: dom(C.A) = dom(A), and dom(A) unchanged for A ∈ attr(R)

I R = (comp(R), attr(R), id(R)) leads to the key

$$\bigcup_{C \in \mathrm{id}(R) \cap \mathrm{comp}(R)} \mathrm{k\_attr}(C) \;\cup\; \bigl(\mathrm{id}(R) \cap \mathrm{attr}(R)\bigr) \quad \text{on } R'$$

I Each component C ∈ comp(R) defines a foreign key

[C.A1, . . . , C.An] ⊆ C′[A1, . . . , An]

on R′ for id(C) = {A1, . . . , An}

Example: Transformation of Order-1 Relationship Types

I Derive new sets of attribute names for the entity types:
I k_attr(Department) = {Department.No}
I k_attr(Supplier) = {Supplier.No}
I k_attr(Article) = {Article.No}

I DayOffer = ({Supplier, Article}, {Date,Price}, {Supplier, Article, Date})
becomes:

I DayOffer′ = {Supplier.No,Article.No,Date,Price} with

I key: {Supplier.No,Article.No,Date}

I foreign keys:
,→ [Supplier.No] ⊆ Supplier′[No]
,→ [Article.No] ⊆ Article′[No]

Example: Transformation of Order-2 Relationship Types

I Derive a new set of attribute names for the components of Purchase:

I k_attr(DayOffer) = {DayOffer.Supplier.No,DayOffer.Article.No,DayOffer.Date}

I Purchase = ({Department, DayOffer}, {Quantity}, {Department, DayOffer})
becomes:

I Purchase′ = {Department.No,DayOffer.Supplier.No,DayOffer.Article.No,DayOffer.Date,Quantity}
with

I key: {Department.No,DayOffer.Supplier.No,DayOffer.Article.No,DayOffer.Date}

I foreign keys:
,→ [Department.No] ⊆ Department′[No]
,→ [DayOffer.Supplier.No,DayOffer.Article.No,DayOffer.Date] ⊆
DayOffer′[Supplier.No,Article.No,Date]
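
To see the resulting keys and foreign keys at work, the five relation schemata can be declared in SQLite from Python. This is an illustrative sketch only: the column types and the sample rows are assumptions (the slides fix no domains), and attribute names containing dots are written as quoted identifiers.

```python
import sqlite3

ddl = """
CREATE TABLE Department ("No" INTEGER PRIMARY KEY, Budget REAL);
CREATE TABLE Supplier ("No" INTEGER PRIMARY KEY, Name TEXT, Address TEXT);
CREATE TABLE Article ("No" INTEGER PRIMARY KEY, Shortname TEXT, QuantityOnStock INTEGER);
CREATE TABLE DayOffer (
  "Supplier.No" INTEGER, "Article.No" INTEGER, "Date" TEXT, Price REAL,
  PRIMARY KEY ("Supplier.No", "Article.No", "Date"),
  FOREIGN KEY ("Supplier.No") REFERENCES Supplier ("No"),
  FOREIGN KEY ("Article.No") REFERENCES Article ("No")
);
CREATE TABLE Purchase (
  "Department.No" INTEGER,
  "DayOffer.Supplier.No" INTEGER, "DayOffer.Article.No" INTEGER, "DayOffer.Date" TEXT,
  Quantity INTEGER,
  PRIMARY KEY ("Department.No", "DayOffer.Supplier.No", "DayOffer.Article.No", "DayOffer.Date"),
  FOREIGN KEY ("Department.No") REFERENCES Department ("No"),
  FOREIGN KEY ("DayOffer.Supplier.No", "DayOffer.Article.No", "DayOffer.Date")
    REFERENCES DayOffer ("Supplier.No", "Article.No", "Date")
);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.executescript(ddl)

# Hypothetical sample rows.
conn.execute("INSERT INTO Department VALUES (?, ?)", (10, 5000.0))
conn.execute("INSERT INTO Supplier VALUES (?, ?, ?)", (1, "Acme", "1 Main St"))
conn.execute("INSERT INTO Article VALUES (?, ?, ?)", (7, "bolt", 100))
conn.execute("INSERT INTO DayOffer VALUES (?, ?, ?, ?)", (1, 7, "2017-03-01", 0.25))
conn.execute("INSERT INTO Purchase VALUES (?, ?, ?, ?, ?)", (10, 1, 7, "2017-03-01", 40))

def violates_constraint(sql, params):
    """Return True if executing the statement raises an integrity error."""
    try:
        conn.execute(sql, params)
        return False
    except sqlite3.IntegrityError:
        return True

# There is no day offer for (1, 7, "2017-03-02"), so this purchase is rejected.
rejected = violates_constraint(
    "INSERT INTO Purchase VALUES (?, ?, ?, ?, ?)", (10, 1, 7, "2017-03-02", 5))
print(rejected)
```

The composite foreign key on Purchase references the full key of DayOffer, which is why the key attributes of DayOffer′ had to be copied into Purchase′ under new names.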

How to Handle Cluster Types

I Cluster types used in conceptual design to model alternatives

I RDM does not provide similar concept

I Transform ER schema with clusters into equivalent ER schema without clusters

I Only necessary as pre-processing before actual transformation to RDM

I In general: clusters provide convenient way to model objects in target of database

I Avoiding clusters is not recommended, as ER schemata without them grow dramatically
in size and become harder to comprehend

Transformation of Cluster Types

I Making an ER Schema cluster-free is a simple process

I Cluster types in ER schema S that are not a component of any relationship type can
be removed from S

I Consider relationship type R with cluster component C = C1 ⊕ · · · ⊕ Cn

I Replace R by n new relationship types R1, . . . , Rn:

I For each i = 1, . . . , n:
,→ Ri obtained from R by replacing every occurrence of C by Ci

I If some Ri still contains clusters, then repeat process of replacing these clusters
by their components

I Final ER schema is cluster-free and previous transformation to RDM can be applied
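
The replacement process can be sketched as a worklist loop in Python. The encoding below is an assumption made for illustration: it tracks only the components of each relationship type (attributes carry over unchanged, and keys are rewritten by the same substitution), and the naming scheme for the new relationship types is hypothetical. The example uses the Hires relationship type and the Employee cluster from the earlier slides.

```python
def eliminate_clusters(relationships, clusters):
    """Replace each relationship type that has a cluster component.

    relationships: name -> list of component names
    clusters:      name -> list of alternatives C1, ..., Cn
    """
    work = list(relationships.items())
    result = {}
    while work:
        name, comps = work.pop()
        cluster = next((c for c in comps if c in clusters), None)
        if cluster is None:
            result[name] = comps  # already cluster-free
            continue
        for alt in clusters[cluster]:
            # R_i: every occurrence of the cluster is replaced by C_i.
            new_comps = [alt if c == cluster else c for c in comps]
            # Re-queue R_i in case it still contains clusters.
            work.append((f"{name}_{alt}", new_comps))
    return result

clusters = {"Employee": ["GeneralStaff", "Lecturer", "Tutor"]}
relationships = {"Hires": ["Employee", "Department"]}

cluster_free = eliminate_clusters(relationships, clusters)
for name in sorted(cluster_free):
    print(name, cluster_free[name])
```

Because each replaced relationship type goes back onto the worklist, clusters nested inside other clusters are resolved by the same loop.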

Example: Cluster-free ER Schema

I Consider the following ER diagram:

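[Figure: ER diagram with the entity types Project, Professor, AssociateProf, PostGrad and UnderGraduate, the cluster types Supervisor and Student, and the relationship type Supervise with attributes since and until. Entity attributes shown include Name, Title, Description, Dept, Subject, Degree, Level and ID.]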

I Supervise = ({Supervisor, Student, Project}, {since, until}, {Supervisor, Project})
with
I Cluster Supervisor = Professor ⊕ AssociateProf
I Cluster Student = PostGrad ⊕ UnderGraduate

Example: Cluster-free ER Schema

I Using Supervisor we obtain

I Prof Supervision:
({Professor, Student, Project},{since,until},{Professor, Project})
I AProf Supervision:
({AssociateProf, Student, Project},{since,until},{AssociateProf, Project})

I Using Student subsequently we obtain

I Prof PostGrad Supervision:
({Professor, PostGrad, Project},{since,until},{Professor, Project})
I Prof UnderGrad Supervision:
({Professor, UnderGraduate, Project},{since,until},{Professor, Project})
I AProf PostGrad Supervision:
({AssociateProf, PostGrad, Project},{since,until},{AssociateProf, Project})
I AProf UnderGrad Supervision:
({AssociateProf, UnderGraduate, Project},{since,until},{AssociateProf, Project})
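
When no cluster contains another cluster, as here, the final result can also be obtained in one step as a Cartesian product over the alternatives of each cluster. The Python sketch below uses this product formulation rather than the stepwise replacement shown on the slides; the function name and the list encoding are assumptions.

```python
from itertools import product

clusters = {
    "Supervisor": ["Professor", "AssociateProf"],
    "Student": ["PostGrad", "UnderGraduate"],
}
comps = ["Supervisor", "Student", "Project"]
key = ["Supervisor", "Project"]

def expand(comps, key, clusters):
    """Yield (components, key) for every combination of cluster alternatives.

    Assumes flat clusters, i.e. no alternative is itself a cluster.
    """
    used = [c for c in comps if c in clusters]
    for choice in product(*(clusters[c] for c in used)):
        substitution = dict(zip(used, choice))
        yield ([substitution.get(c, c) for c in comps],
               [substitution.get(k, k) for k in key])

expanded = list(expand(comps, key, clusters))
for new_comps, new_key in expanded:
    print(new_comps, ["since", "until"], new_key)
```

Two clusters with two alternatives each yield 2 × 2 = 4 relationship types, which illustrates why schemata grow quickly when clusters are avoided.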

Main Contributors to the Entity-Relationship Model

I Peter PC Chen
I originator of the basic ER model
I The Entity-Relationship Model -
Toward a Unified View of Data, ACM ToDS, 1976.
I over 8,000 citations

I Bernhard Thalheim
I several extensions of the basic ER model,
including semantics and higher-order object types
I Entity-Relationship modeling -
Foundations of Database Technology, Springer, 2000.

Summary

I Conceptual models are formal high-level descriptions of the target of a database


,→ input is a natural language description of the target
,→ output is a conceptual schema of the target

I The Entity-Relationship model provides a foundation for conceptual modeling


,→ views the world in terms of entities, and their relationships
,→ can model specialization, generalization, hierarchies
,→ can model several expressive constraints important for database design

I Success of ER model based on several features


,→ constructs closely resemble those used in natural language
,→ has a formal semantics
,→ ER diagrams provide a visual representation of a conceptual schema
,→ can be used to consolidate the conceptual schema with other stakeholders
,→ ER schemata can be transformed faithfully into relational database schemata
,→ resulting relational database schemata form a good basis for logical design
