Session 4

Entity-Relationship (ER) Modeling

Session 4 objectives

1. Introduction to entity relationship diagrams (ERD)
2. Entities, attributes, and keys
3. Understanding relationships
4. Classification in the entity relationship model
5. Notations
6. Comparison to other notations

Review Questions

1. What role does the ER diagram play in the design process?
2. What two conditions must be met before an entity can be classified as a weak
entity? Give an example of a weak entity.
3. What is a strong (or identifying) relationship, and how is it depicted in a Crow’s
Foot ERD?
4. Given the business rule “an employee may have many degrees,” discuss its
effect on attributes, entities, and relationships. (Hint: Remember what a
multi-valued attribute is and how it might be implemented.)
5. What is a composite entity, and when is it used?
6. What is a recursive relationship? Give an example.
7. Discuss the difference between a composite key and a composite attribute. How
would each be indicated in an ERD?
8. What two courses of action are available to a designer when encountering a
multi-valued attribute?
9. What is a derived attribute? Give an example.
10. How is a relationship between entities indicated in an ERD? Give an example,
using the Crow’s Foot notation.
11. What three (often conflicting) database requirements must be addressed in
database design?
12. Briefly, but precisely, explain the difference between single-valued attributes
and simple attributes. Give an example of each.

(These questions are intended ONLY as a self-test of your comprehension of this
session's material and are for your guidance; answers do not need to be turned in.)

Lecture Notes

Entity-Relationship (ER) Modeling

Introduction to ERD
P. P. Chen first introduced the entity-relationship model in 1976. It facilitated database
design and complemented the relational data model concepts. The E-R model is a
semantic model at the conceptual level and it captures meanings as well as structure.

E-R models are represented in an entity relationship diagram or ERD, which uses
graphical representations to model database components in terms of entities, attributes,
and relationships.

Entities, Attributes, and Keys


An entity can be defined as anything about which we need to collect and store data. An
entity can be a person, such as a student; a place, such as a building; an object,
such as a computer; a concept, such as a course; or an event, such as a course enrollment.

A particular occurrence of an entity is an entity instance. For example, each of the
students who takes the Database Systems class is an instance of the Student entity.

An entity type is a category of entities. The entity type forms the intension of the
entity, or its permanent definition, while all the entity instances that fulfill the
definition at the moment form the extension of the entity.

An entity set is a collection of entities of the same type. An entity set is represented
in E-R diagrams by a rectangle with the name of the entity inside.

Each entity has its properties and characteristics. A set of attributes can be used to
describe the defining properties or qualities of the entity. For example, the Student entity
will have attributes such as name, major, address, and GPA.

A domain is the set of permitted values for a given attribute. For example, the domain for
the student gender attribute consists of only two possible values, male and female.

An attribute may have null values if the value is unknown at the present time or is
undefined for a particular entity instance. For example, some students have not declared
their major yet, so the values for their major attribute are null.

On the other hand, a student may have more than one email address, so the email
address attribute of the Student entity is a multi-valued attribute.

An attribute may consist of other attributes. For example, the address of a student
includes street, city, state and zip code. We call attributes like address composite
attributes.

An attribute may be classified as a derived attribute, that is, an attribute whose value
is calculated or derived from other attributes. For example, a student’s age is an
attribute that can be computed using the difference between the current date and the
date of birth of the student.
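The attribute kinds described above can be sketched in code. The following is a minimal illustration, not part of the original notes: the class layout, the `date_of_birth` field, and the sample values are assumptions chosen for the example. It shows a composite attribute (address), a multi-valued attribute (emails), a nullable attribute (major), and a derived attribute (age).

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class Address:
    """Composite attribute: made up of simpler component attributes."""
    street: str
    city: str
    state: str
    zip_code: str


@dataclass
class Student:
    student_id: str
    first_name: str
    last_name: str
    date_of_birth: date                    # assumed stored attribute
    address: Address                       # composite attribute
    major: Optional[str] = None            # null until the student declares one
    emails: List[str] = field(default_factory=list)  # multi-valued attribute

    def age(self, today: date) -> int:
        """Derived attribute: computed from date_of_birth, never stored."""
        years = today.year - self.date_of_birth.year
        # Subtract one year if the birthday has not occurred yet this year.
        if (today.month, today.day) < (self.date_of_birth.month, self.date_of_birth.day):
            years -= 1
        return years


# Hypothetical sample instance.
s = Student("S001", "Ada", "Lee", date(2000, 6, 15),
            Address("1 Main St", "Springfield", "IL", "62701"),
            emails=["ada@example.edu", "ada@example.com"])
print(s.major, len(s.emails), s.age(date(2024, 6, 14)))  # None 2 23
```

Passing the current date as a parameter keeps the derived value reproducible; a real application would more likely call `date.today()`.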

The diagram below shows two entities, Student and Course. Student has attributes of
Student_ID, First_Name, Last_Name, Major, GPA, SSN, and Email. Course has
attributes of Course_No, Description, and Cost.

Figure 4.1 Sample Entities

Often it is important to distinguish entity instances from one another. A key is an
attribute, or group of attributes, whose values allow us to tell records apart. There
are different kinds of keys.

A super key is an attribute or a set of attributes that uniquely identifies an entity
instance. For example, for the Student entity set, Student_ID is a super key because it
can be used to uniquely identify each student. On the other hand, First_Name plus
Last_Name is not a super key because two or more students may have the same name. If
you have a super key, then any set of attributes containing that super key is also a
super key. Therefore, the combination of Student_ID, First_Name, and Last_Name is also
a super key.

A candidate key is a minimal super key, one that does not contain extra attributes. For
example, Student_ID plus First_Name and Last_Name is a super key, but not a candidate
key, because neither the first name nor the last name is necessary: Student_ID by itself
can identify a student. A key with more than one attribute is called a composite key.
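The difference between a super key and a candidate key can be checked mechanically against sample data. The sketch below is illustrative only: the rows are invented, and the helper names `is_super_key` and `candidate_keys` are hypothetical. Note that a key is a rule about all possible instances, so a test on one sample can only show that an attribute set is not a key; it cannot prove that it is one.

```python
from itertools import combinations

# Invented sample rows; two students deliberately share a name.
students = [
    {"Student_ID": "S1", "First_Name": "Ann", "Last_Name": "Kim", "SSN": "111"},
    {"Student_ID": "S2", "First_Name": "Ann", "Last_Name": "Kim", "SSN": "222"},
    {"Student_ID": "S3", "First_Name": "Bob", "Last_Name": "Ray", "SSN": "333"},
]


def is_super_key(rows, attrs):
    """True if no two rows share the same combination of values for attrs."""
    seen = {tuple(row[a] for a in attrs) for row in rows}
    return len(seen) == len(rows)


def candidate_keys(rows, attributes):
    """Minimal super keys: super keys that contain no smaller super key."""
    keys = []
    # Smaller attribute sets are examined first, so any key already found
    # that is a subset of attrs proves attrs is not minimal.
    for size in range(1, len(attributes) + 1):
        for attrs in combinations(attributes, size):
            if is_super_key(rows, attrs) and not any(set(k) <= set(attrs) for k in keys):
                keys.append(attrs)
    return keys


print(is_super_key(students, ("First_Name", "Last_Name")))                # False
print(is_super_key(students, ("Student_ID", "First_Name", "Last_Name")))  # True
print(candidate_keys(students, ["Student_ID", "First_Name", "Last_Name", "SSN"]))
# [('Student_ID',), ('SSN',)]
```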

An entity may have several candidate keys. For example, student ID, social security
number, and email are all candidate keys for the Student entity. However, when
implementing the database, we usually choose one of the candidate keys as the normal
way of identifying entities and accessing records. This becomes the primary key. For
example, we use student ID as the primary key for Student. Often, the other candidate
keys become alternate keys, whose unique values provide another method of accessing
records; in our example, social security number is an alternate key. For the Course
entity, the primary key is Course_No.

The term secondary key usually means an attribute or set of attributes whose values, not
necessarily unique, are used as a means of accessing records. For the Student example,
the attribute set First_Name and Last_Name can be used as a secondary key. We can use
the name to help us find a student record if we do not know the student ID or social
security number.
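One common way to realize primary, alternate, and secondary keys in a relational database is sketched below using Python's built-in `sqlite3` module. This is an assumption about implementation, not something the notes prescribe: the table layout, the index name `idx_student_name`, and the sample rows are invented for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (
    student_id TEXT PRIMARY KEY,        -- primary key
    ssn        TEXT NOT NULL UNIQUE,    -- alternate key: unique, but not the PK
    first_name TEXT NOT NULL,
    last_name  TEXT NOT NULL
);
-- Secondary key: a non-unique index that only speeds up lookups by name.
CREATE INDEX idx_student_name ON student (last_name, first_name);
""")

conn.executemany("INSERT INTO student VALUES (?, ?, ?, ?)", [
    ("S1", "111", "Ann", "Kim"),
    ("S2", "222", "Ann", "Kim"),   # a repeated name is allowed
])

# A repeated SSN violates the alternate key and is rejected.
try:
    conn.execute("INSERT INTO student VALUES ('S3', '111', 'Bob', 'Ray')")
    duplicate_ssn_rejected = False
except sqlite3.IntegrityError:
    duplicate_ssn_rejected = True

# Lookup by secondary key may return more than one record.
same_name = conn.execute(
    "SELECT student_id FROM student "
    "WHERE last_name = ? AND first_name = ? ORDER BY student_id",
    ("Kim", "Ann"),
).fetchall()

print(duplicate_ssn_rejected, same_name)  # True [('S1',), ('S2',)]
```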

Understanding relationships
Usually, a natural business association, or relationship, exists between two or
more entities. For example, students take courses while faculty members teach courses.
Common properties of certain relationships can be defined as a relationship type, and the
collection of relationships of that type forms the corresponding well-defined relationship
set. The relationship set consists of relationship instances, or relationships that exist
at a given moment.

A relationship set that links two entity sets is called a binary relationship set. For
example, the following diagram shows an “Enrolls” relationship between Student and
Course. The number of entity sets involved is the degree of the relationship. The degree
can also be ternary, which links three entity sets, or n-ary, which links n entity sets.

Figure 4.2 Sample Binary Relationship

A relationship set may have descriptive attributes that belong to the relationship rather
than to any of the entities involved. For example, the attribute Grade is a descriptive
attribute of the Enrolls relationship set. Its value shows the final grade after a student
takes a course.

Another concept related to relationships is cardinality, which is the number of entity
instances to which another entity instance can map under the relationship.

A relationship R from X to Y is one-to-one (1:1) if each entity instance in X is
associated with at most one entity instance in Y and each entity instance in Y with at
most one entity instance in X. The relationship is one-to-many (1:M) if each entity
instance in X can be associated with many instances of entity Y, but each entity instance
in Y with at most one entity instance in X. A relationship between X and Y is
many-to-many (M:N) if each entity instance in X can be associated with many instances of
entity Y, and each entity instance in Y with many instances of entity X.
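The three definitions above can be expressed as a small check over relationship instances. The sketch below is a hypothetical helper (`classify` and the sample pairs are invented for illustration). Keep in mind that cardinality is a business rule about what is allowed, so observed data can only show the largest fan-out seen so far.

```python
from collections import defaultdict


def classify(pairs):
    """Classify a binary relationship from its observed (x, y) instance pairs."""
    ys_per_x = defaultdict(set)
    xs_per_y = defaultdict(set)
    for x, y in pairs:
        ys_per_x[x].add(y)
        xs_per_y[y].add(x)
    x_has_many = any(len(ys) > 1 for ys in ys_per_x.values())  # some x maps to many y
    y_has_many = any(len(xs) > 1 for xs in xs_per_y.values())  # some y maps to many x
    if x_has_many and y_has_many:
        return "M:N"
    if x_has_many:
        return "1:M"
    if y_has_many:
        return "M:1"
    return "1:1"


# Invented sample relationship instances.
chairs = [("Math", "Dr. A"), ("Physics", "Dr. B")]
advises = [("Dr. A", "S1"), ("Dr. A", "S2"), ("Dr. B", "S3")]
enrolls = [("S1", "C1"), ("S1", "C2"), ("S2", "C1")]

print(classify(chairs), classify(advises), classify(enrolls))  # 1:1 1:M M:N
```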

The following three diagrams illustrate 1:1, 1:M, and M:N relationships.

Figure 4.3 Sample 1:1, 1:M, and M:N relationships

Not all members of an entity set necessarily participate in a relationship. If every
member of an entity set must participate in a relationship, we call that total
participation. Partial participation refers to cases in which some members of the entity
set may not participate in the relationship.
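As a rough illustration, participation can be tested against a set of relationship instances. The helper name `participation` and the sample data are assumptions made for this sketch; as with cardinality, the real constraint is a business rule, and data can only show whether it currently holds.

```python
def participation(entity_set, pairs, position):
    """Return 'total' if every member appears in some pair, else 'partial'.

    position selects which side of each (x, y) pair the entity set occupies.
    """
    participating = {pair[position] for pair in pairs}
    return "total" if set(entity_set) <= participating else "partial"


# Invented sample data: student S3 is not enrolled in any course.
students = {"S1", "S2", "S3"}
courses = {"C1", "C2"}
enrolls = [("S1", "C1"), ("S2", "C1"), ("S2", "C2")]

print(participation(students, enrolls, 0))  # partial
print(participation(courses, enrolls, 1))   # total
```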

In a relationship, each entity has a function called its role in the relationship. It is
optional to name the role of an entity, though doing so can be helpful in a recursive
relationship or when multiple relationships exist between the same entity sets. A
recursive relationship occurs when an entity set is related to itself.
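A recursive relationship is often implemented as a foreign key that points back to the same table, with the column name carrying the role. The sketch below uses Python's built-in `sqlite3` module; the EMPLOYEE "manages" example, the column names, and the rows are assumptions for illustration and do not come from the notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE employee (
    emp_id     INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    -- Role name "manager": refers to another row of the same table.
    manager_id INTEGER REFERENCES employee (emp_id)
);
INSERT INTO employee VALUES (1, 'Dana', NULL);  -- top of the hierarchy
INSERT INTO employee VALUES (2, 'Eli', 1);
INSERT INTO employee VALUES (3, 'Fay', 1);
""")

# Join the table to itself: e plays the "subordinate" role, m the "manager" role.
reports = conn.execute("""
    SELECT e.name, m.name
    FROM employee AS e
    JOIN employee AS m ON e.manager_id = m.emp_id
    ORDER BY e.emp_id
""").fetchall()

print(reports)  # [('Eli', 'Dana'), ('Fay', 'Dana')]
```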

Classification in the Entity Relationship model


An existence constraint, or existence dependency, can occur between two entities. Entity
Y is existence dependent on entity X if each instance of Y must have a corresponding
instance of X. Y must have total participation in its relationship with X.

If entity Y does not have its own candidate key, Y is called a weak entity, and entity X is
a strong entity. A weak entity may have a partial key, or a discriminator, that
distinguishes instances of the weak entity that are related to the same strong entity. In
the following example, Building is a strong entity because its existence does not depend
on other entities and it has its own primary key. On the other hand, a room would not
exist without a building, so Room is a weak entity. Part of its primary key has to be
derived from the parent entity. Room_Number alone would not be unique, because two
different buildings can both have a room numbered 101. So in this case, the combination
of Building_Number and Room_Number is the primary key of the Room entity, and
Building_Number is derived from the Building entity.

Figure 4.4 Weak Entity and identifying Relationship

An identifying relationship, or strong relationship, is a relationship between a strong
and a weak entity, where the key of the strong entity is required to uniquely identify
instances of the weak entity. The “Has” relationship between Building and Room is a
strong or identifying relationship. A non-identifying relationship, or weak relationship,
is a relationship between two strong entities.
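The Building and Room example can be sketched as tables using Python's built-in `sqlite3` module. This is one possible implementation under stated assumptions: the `name` and `capacity` columns and all sample values are invented; only Building_Number and Room_Number come from the text. Room's primary key combines the parent's key with its own partial key, which is what makes the relationship identifying.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE building (
    building_number TEXT PRIMARY KEY,
    name            TEXT                 -- assumed extra attribute
);
CREATE TABLE room (
    building_number TEXT NOT NULL REFERENCES building (building_number),
    room_number     TEXT NOT NULL,       -- partial key (discriminator)
    capacity        INTEGER,             -- assumed extra attribute
    PRIMARY KEY (building_number, room_number)
);
INSERT INTO building VALUES ('B1', 'Science Hall'), ('B2', 'Library');
-- Room 101 exists in both buildings; the composite key keeps them distinct.
INSERT INTO room VALUES ('B1', '101', 30), ('B2', '101', 12);
""")

# A room cannot exist without its building (existence dependency).
try:
    conn.execute("INSERT INTO room VALUES ('B9', '101', 20)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

rooms_101 = conn.execute(
    "SELECT COUNT(*) FROM room WHERE room_number = '101'"
).fetchone()[0]

print(orphan_rejected, rooms_101)  # True 2
```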

Notations
In an ERD using Chen’s notation, a rectangle is used to represent an entity, an oval to
represent an attribute, and a diamond to represent a relationship. These elements are
connected by lines. A complete list of Chen’s notation symbols can be found below.

Figure 4.5 Chen’s Notation Symbols


(https://www.conceptdraw.com/solution-park/software-erd)

There are other conventions, and one of the most widely used is Crow’s Foot notation,
which is the notation used in most of the examples in this and earlier lectures. As in
Chen’s notation, an entity is represented by a rectangle containing the entity’s name.
Unlike Chen’s notation, Crow’s Foot notation lists attributes in an attribute box below
the entity rectangle. Relationships are drawn as straight lines. A relationship can have
a name, usually a strong verb phrase, written on the relationship line. Below is a list
of Crow’s Foot notation symbols.

Figure 4.6 Crow’s Foot Notation Symbols


(https://www.conceptdraw.com/solution-park/software-erd)

Comparison to other notations


This session presents a fairly balanced view of entity-relationship modeling. On the one
hand, a pragmatic focus on the aspects of the ERD that directly impact the design and
implementation of the database must be considered. On the other hand, properly
documenting the business needs that the database must support is equally important.
Therefore, we present both Crow's Foot and Chen notation ERDs and discuss the
differences between them. Crow's Foot tends to be more pragmatic, while Chen notation
has greater semantic content.

Among the points of comparison that we focus on in class are:


• In both notations, entities are modeled the same way.
• Attributes are modeled differently: in ovals with Chen notation and in an
attribute box with Crow's Foot.
• Identifiers are modeled the same way.
• Single-valued attributes are modeled the same way.
• Chen notation distinguishes multi-valued attributes with a double line; however,
Crow's Foot notation does not distinguish multi-valued attributes. This can be
attributed to the pragmatism of Crow's Foot notation: if you wouldn't implement
the design with a multi-valued attribute, then don't draw the design with a
multi-valued attribute.
• Chen notation distinguishes derived attributes with a dotted attribute line;
however, Crow's Foot notation does not distinguish derived attributes.
• Chen notation commonly uses specific minimum and maximum cardinalities.
Specific minimum and maximum cardinalities are almost never used with Crow's
Foot notation, which is why, in practice, most people who use Crow's Foot
notation refer to connectivity as maximum cardinality and participation as
minimum cardinality.
• Both notations typically distinguish weak entities; however, as a peculiarity of the
MS Visio tool, weak entities are implied through relationship strength.

Further Reading: M:N Relationship


One very important issue that often confuses students is the resolution of M:N
relationships into 1:M relationships.

Although M:N relationships may properly be viewed in a relational database model at the
conceptual level, such relationships should not be implemented, because their existence
creates undesirable redundancies. Therefore, M:N relationships must be decomposed into
1:M relationships to fit into the ER framework. For example, if you were to develop an
ER model for a video rental store, you would note that tapes can be rented more than
once and that customers can rent more than one tape.

We also have to keep in mind that newly arrived tapes that have just been entered into
inventory have not yet been rented and that some tapes may never be rented at all if
there is no demand for them. Therefore, CUSTOMER is optional to TAPE. Assuming that the
video store only rents videos and that a CUSTOMER entry will not exist unless a person
coming into the video store actually rents that first tape, TAPE is mandatory to
CUSTOMER. On the other hand, if the store has other services, such as selling movies or
games, then a CUSTOMER entry could exist without that customer having rented a video. In
that case, TAPE is optional to CUSTOMER. Note that this discussion includes a very brief
description of the video store's operations and some business rules. The relationship
between customers and tapes would thus be M:N, as shown in Figure 4.7.

A customer can rent many tapes.
A tape can be rented by many customers.
Some customers do not rent tapes. (Such customers might buy tapes or other items.)
Therefore, TAPE is optional to CUSTOMER in the rental relationship.
Some tapes are never rented. Therefore, CUSTOMER is optional to TAPE in the rental
relationship.

Chen Model: CUSTOMER (M) rents (N) TAPE

Crow’s Foot Model: CUSTOMER rents TAPE

Figure 4.7 The M:N Relationship

Note that the ERD reflects two business rules:


1. A CUSTOMER may rent many TAPEs.
2. A TAPE can be rented by many CUSTOMERs.

The M:N relationship depicted in Figure 4.7 must be broken up into two 1:M
relationships through the use of a bridge entity, also known as a composite entity. The
composite entity, named RENTAL in the example shown in Figure 4.8, must include at
least the primary key components (CUS_NUM and TAPE_CODE) of the two entities it
bridges, because the RENTAL entity’s foreign keys must point to the primary keys of the
two entities CUSTOMER and TAPE.

Figure 4.8 Decomposition of the M:N Relationship

Several points about Figure 4.8 are worth emphasizing:


• The RENTAL entity’s PK could have been the combination TAPE_CODE +
CUS_NUM. This composite PK would have created strong relationships between
RENTAL and CUSTOMER and between RENTAL and TAPE. Because this
composite PK was not used, it is a candidate key.
• In this case, the designer made the decision to use a single-attribute PK rather
than a composite PK. Note that the RENTAL entity uses the PK RENT_NUM.
Single-attribute PKs are usually more desirable than composite PKs, especially
when relationships must be established between RENTAL and some as yet
unidentified entity. (A composite PK can be referenced by a foreign key, but every
related entity must then carry all of its component attributes.) In addition, a
composite PK can make queries less efficient.
• Note the placement of the optional symbols. Because a tape that is never rented
will never show up in the RENTAL entity, RENTAL has become optional to TAPE.
That's why the optional symbol has migrated from CUSTOMER to the opposite
side of RENTAL. Also, note the addition of a few attributes in each of the three
entities to make it easier to see what is being tracked.
• Because the M:N relationship has now been decomposed into two 1:M
relationships, the ER model shown in Figure 4.8 can be implemented. However,
“implementable” is not necessarily synonymous with “practical” or “useful.”
(We’ll modify the ERD in Figure 4.8 after some additional discussion.)
• The relationships between CUSTOMER and RENTAL and between
TAPE and RENTAL are read as:
CUSTOMER generates RENTAL
TAPE enters RENTAL
• The dashed relationship lines indicate weak relationships. In this case, the
RENTAL entity’s primary key is RENT_NUM, and this PK did not use any
attribute from the CUSTOMER and TAPE entities.

The (implied) cardinalities in Figure 4.8 reflect the rental transactions. Each rental
transaction, i.e., each record in the RENTAL table, will reference one and only one
customer and one and only one tape. The (simplified!) implementation of this model may
thus yield the sample data shown in the database in Figure 4.9. The database's relational
diagram is shown in Figure 4.10.
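The decomposition into CUSTOMER, TAPE, and the bridge entity RENTAL can be sketched with Python's built-in `sqlite3` module. This is a simplified illustration, not the tables of Figure 4.9: the key names CUS_NUM, TAPE_CODE, and RENT_NUM come from the text, while `cus_name`, `tape_title`, and all sample rows are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE customer (cus_num INTEGER PRIMARY KEY, cus_name TEXT);
CREATE TABLE tape     (tape_code TEXT PRIMARY KEY, tape_title TEXT);
-- Bridge (composite) entity: single-attribute PK plus one FK to each side.
CREATE TABLE rental (
    rent_num  INTEGER PRIMARY KEY,
    cus_num   INTEGER NOT NULL REFERENCES customer (cus_num),
    tape_code TEXT    NOT NULL REFERENCES tape (tape_code)
);
INSERT INTO customer VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
INSERT INTO tape VALUES ('T1', 'Title One'), ('T2', 'Title Two'), ('T3', 'Title Three');
-- Each rental row references exactly one customer and exactly one tape.
INSERT INTO rental VALUES (1, 1, 'T1'), (2, 1, 'T2'), (3, 2, 'T1');
""")

# Rentals per customer; the LEFT JOIN keeps customers who never rented (optional side).
rentals_per_customer = conn.execute("""
    SELECT c.cus_num, COUNT(r.rent_num)
    FROM customer AS c
    LEFT JOIN rental AS r ON r.cus_num = c.cus_num
    GROUP BY c.cus_num
    ORDER BY c.cus_num
""").fetchall()

# Tapes that never appear in RENTAL (RENTAL is optional to TAPE).
never_rented = conn.execute(
    "SELECT tape_code FROM tape WHERE tape_code NOT IN (SELECT tape_code FROM rental)"
).fetchall()

print(rentals_per_customer)  # [(1, 2), (2, 1), (3, 0)]
print(never_rented)          # [('T3',)]
```

Customer 1 rents two tapes and tape T1 is rented by two customers, so the original M:N association is preserved even though each table-to-table relationship is 1:M.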

Figure 4.9 The Rental Database Tables

The relational diagram that corresponds to the design in Figure 4.8 is shown in Figure
4.10.

Figure 4.10 The Rental Database Relational Diagram



The database’s TAPE and RENTAL tables contain some attributes that merit additional
discussion.
• The TAPE_CODE attribute values include a “trailer” after the dash. For example,
note that the third record in the TAPE table has a PK value of R2345-2. The
“trailer” indicates the tape copy. For example, the “-2” trailer in the PK value
R2345-2 indicates the second copy of the “Once Upon a Midnight Breezy” tape.
So why include a separate TAPE_COPY attribute? This decision was made to
make it easier to generate queries that make use of the tape copy value. (It’s much
more difficult to use a right-string function to “strip” the tape copy value than
to simply use the TAPE_COPY value. And “simple” usually translates into “fast”
in a query environment; “fast” is a good thing!)
• The RENTAL table uses two dates: RENT_DATE_OUT and
RENT_DATE_RETURN. This decision leaves the RENT_DATE_RETURN
value null until the tape is returned. Naturally, such nulls can be avoided by
creating an additional table in which the return date is not a named attribute. Note
the following few check-in and check-out transactions:

RENT_NUM   TRANS_TYPE    TRANS_DATE
10050      Checked-out   10-Jan-2010
10050      Returned      11-Jan-2010
10051      Checked-out   10-Jan-2010
10051      Returned      11-Jan-2010
10052      Checked-out   11-Jan-2010
10053      Returned      10-Jan-2010
…..        ……………         ……………

The decision to keep the RENT_DATE_RETURN attribute in the RENTAL table, leaving its
value null until the tape is returned, is, again, up to the designer, who evaluates the
design according to often competing goals: simplicity, elegance, reporting capability,
query speed, index management, and so on.
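To make the trade-off concrete, here is a minimal sketch of how the transaction-style table could answer the question "which rentals are still out?" without any null return date. The rows are the six sample transactions listed above; the function name `open_rentals` is a hypothetical helper, not part of the original design.

```python
# (RENT_NUM, TRANS_TYPE, TRANS_DATE) rows copied from the sample transactions.
transactions = [
    (10050, "Checked-out", "10-Jan-2010"),
    (10050, "Returned", "11-Jan-2010"),
    (10051, "Checked-out", "10-Jan-2010"),
    (10051, "Returned", "11-Jan-2010"),
    (10052, "Checked-out", "11-Jan-2010"),
    (10053, "Returned", "10-Jan-2010"),
]


def open_rentals(rows):
    """Rentals that have a check-out transaction but no matching return."""
    checked_out = {num for num, kind, _ in rows if kind == "Checked-out"}
    returned = {num for num, kind, _ in rows if kind == "Returned"}
    return sorted(checked_out - returned)


print(open_rentals(transactions))  # [10052]
```

With the two-date design, the same question is a simple test for a null RENT_DATE_RETURN; with the transaction design, it requires comparing two sets of rows. That difference is one example of the simplicity versus reporting trade-off mentioned above.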

