
Chapter 3: Relational Database Design

Technology, Policy and Management
Overview of Relational Data Model (cont.)

• Relational Model
– The data and the relations between them are
organized in tables.
– A table is a collection of records and each record in
a table contains the same fields organized in
columns.
– The records in the table form the rows of the table.

Properties of Relational Model
• Column values are of the same kind.
• The sequence of columns is insignificant.
• The sequence of rows is insignificant.
• Each column has a unique name.
• Rows (tuples) in a single relation are unique.
• A relation is a set of tuples.
• Attributes are atomic.
• The values of a column are drawn from the domain
associated with that column.
• The relation names in a relational database are distinct.

Structure of Relational Database
• Relation
– A two-dimensional table
• The columns represent the attributes of the relation.
• The rows (other than the heading row) represent the tuples (records)
of the relation.

– Example: EMPLOYEES

EmpId  Name          BDate     HAddress        Phone
E001   Alemu Girma   01/10/70  Bole, 02, 1435  011-663-0712
E004   Kelem Belete  12/04/68  Arada, 01, 035  011-227-2525

Relational Schema
• A relation in the relational model consists of:
– The relation schema, which describes the column heads of
the table, and
– The relation instance, which is the table with its set of
tuples.

• The relation schema specifies:
– The relation's name
– The name of each attribute (field or column), and
– The domain of each attribute

Database Schema
• Database Schema
– The set of relation schemas forms the schema of the relational
database, known as the database schema (relational database
schema).
• Example
– Employees (EmpId:string, Name:string, BDate:date, SubCity:string,
Kebel:integer, Phone:string)
– Projects (PrjId:integer, Name:string, SDate:date, DDate:date,
CDate:date)
– Teams (Name:string, Descr:string)
• Database Instance
– A snapshot of the data in the database at a given instant in
time.

Relational Model Constraints
• Constraints define the permitted database states.
• They represent the semantics of the domain being modeled.

1. Domain constraints
• A domain constraint requires that the value of an attribute
A be an atomic value from the domain of A, dom(A).
• The domain of an attribute can be defined using the data
type associated with the domain.

Relational Model Constraints (cont.)
2. Key constraints
– A key constraint states that a certain minimal subset of the
attributes of a relation is a unique identifier for the tuples of
the relation.
– One such key is chosen as the primary key, which uniquely identifies
each tuple in the relation.
– The primary key attributes are indicated by underlining the
attributes in the relation schema.
• Example
– Employees (EmpId, Name, BDate, SubCity, Kebel, Phone)
– Projects (PrjId, Name, SDate, DDate, CDate)
– Teams (Name, Descr)

Relational Constraints (cont.)
3. Entity Integrity constraint
• A primary key must not contain a NULL value
– Otherwise it may not be possible to identify some tuples.
– For example, if more than one tuple has a null value
in its primary key, we may not be able to distinguish
them.
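
The key and entity integrity constraints can be tried out with a small sketch. It uses Python's built-in sqlite3 module and an in-memory database; the column types and the helper name try_insert are assumptions made for illustration, not part of the slides. NOT NULL is written out explicitly because SQLite, unlike most DBMSs, otherwise tolerates NULL in a non-integer primary key column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# EmpId is the primary key; NOT NULL makes entity integrity explicit in SQLite.
conn.execute(
    "CREATE TABLE Employees ("
    " EmpId TEXT NOT NULL PRIMARY KEY,"
    " Name TEXT, BDate TEXT)"
)
conn.execute("INSERT INTO Employees VALUES ('E001', 'Alemu Girma', '01/10/70')")

def try_insert(row):
    """Hypothetical helper: return 'ok' or the reason the DBMS rejected the row."""
    try:
        conn.execute("INSERT INTO Employees VALUES (?, ?, ?)", row)
        return "ok"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"

duplicate_key = try_insert(("E001", "Kelem Belete", "12/04/68"))  # key constraint
null_key = try_insert((None, "Kelem Belete", "12/04/68"))         # entity integrity
fresh_key = try_insert(("E004", "Kelem Belete", "12/04/68"))      # valid tuple
print(duplicate_key, null_key, fresh_key, sep="\n")
```

Only the third insert should succeed: the first repeats an existing key value and the second leaves the key NULL.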

Relational Constraints (cont.)
4. Foreign key and Referential Integrity constraint
• Consider the following two relation schemas:
– R1(A1, A2, ..., An) and R2(B1, B2, ..., Bm)
• Let PK, a subset of {A1, ..., An}, be the primary key of R1.
• A set of attributes FK is a foreign key of R2 if:
– The attributes in FK have the same domains as the attributes in PK of R1, and
– For every tuple t2 in R2, there exists a tuple t1 in R1 such that
t2[FK] = t1[PK].
• A referential integrity constraint from attributes FK of R2 to R1
means that FK is a foreign key that refers to the primary key of R1.

Relational Constraints (cont.)
4. Foreign key and Referential Integrity constraint (cont.)
– It helps keep the data consistent.
– The foreign key in the referencing relation requires a
match to a primary key in the referenced relation.
• A relational database consists of relations that are related
through foreign keys.

• Example
– Employees (EmpId, Name, BDate, SubCity, Kebel, Phone)
– WorkSchedule (SDate, EDate, HoursPerDay, Emp-ID)
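
A minimal sketch of referential integrity for the two relations above, again assuming Python's sqlite3 module. The foreign key column is spelled EmpId here because a hyphen is not valid in an unquoted SQL identifier; the date strings, the employee id E999 and the helper name schedule are hypothetical. SQLite checks foreign keys only after PRAGMA foreign_keys = ON has been issued.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enable foreign key enforcement in SQLite
conn.execute("CREATE TABLE Employees (EmpId TEXT NOT NULL PRIMARY KEY, Name TEXT)")
conn.execute(
    "CREATE TABLE WorkSchedule ("
    " SDate TEXT, EDate TEXT, HoursPerDay INTEGER,"
    " EmpId TEXT NOT NULL REFERENCES Employees(EmpId))"
)
conn.execute("INSERT INTO Employees VALUES ('E001', 'Alemu Girma')")

def schedule(emp_id):
    """Hypothetical helper: True if the referencing tuple was accepted."""
    try:
        conn.execute(
            "INSERT INTO WorkSchedule VALUES ('2024-01-01', '2024-01-31', 8, ?)",
            (emp_id,),
        )
        return True
    except sqlite3.IntegrityError:
        return False

known = schedule("E001")    # matches a primary key value in Employees
unknown = schedule("E999")  # no such employee, so the insert is rejected
print(known, unknown)
```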

Relational Constraints (cont.)
5. General Integrity constraint
• These constraints specify requirements that
attribute values must satisfy.

Example:
– We may require that the maximum working hours of an employee
be 48 per week.
– Given such an integrity constraint specification, the DBMS will
reject inserts and updates that violate the constraint. This is very
useful in preventing data entry errors.
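
One way to state the 48-hour rule is a CHECK constraint. The sketch below assumes Python's sqlite3 module; the table WeeklyHours, its columns and the helper record_hours are hypothetical names chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The CHECK clause encodes the rule "at most 48 working hours per week".
conn.execute(
    "CREATE TABLE WeeklyHours ("
    " EmpId TEXT NOT NULL,"
    " HoursPerWeek INTEGER NOT NULL CHECK (HoursPerWeek BETWEEN 0 AND 48))"
)

def record_hours(emp_id, hours):
    """Hypothetical helper: True if the DBMS accepted the row."""
    try:
        conn.execute("INSERT INTO WeeklyHours VALUES (?, ?)", (emp_id, hours))
        return True
    except sqlite3.IntegrityError:
        return False

accepted = record_hours("E001", 40)  # within the limit
rejected = record_hours("E001", 60)  # violates the constraint
print(accepted, rejected)
```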

Mapping E/R Data Model to Relational Model

Mapping E/R Model to Relational Model
• Entity Sets to Relations
– Strong entity sets
– Weak entity sets
• Attributes
• Relationship Sets to Relations
• Generalization and Specialization

E/R Diagram
[Figure: Partial E/R Model from Conceptual Data Model. Entity sets: TEAMS, EMPLOYEES, PROJECTS, CUSTOMER. Relationship sets: Assigned, WorksOn, Owns. Attribute labels: Name, Descr, EmpId, BDate, Age, Address, City, H Addrs, SubCity, Kebele, H No, Phone, ProjId, SDate, DDate, CustId.]


Entity Sets to Relations
• Strong entity set
– Mapped to a relation in the relational model with
• the same name, and
• the same attributes.
– The primary key of the entity set is also the key
of the relation.
• Weak entity set
– For
• Weak entity set: W(a1, a2, …, ai, ai+1, …, an)
• Identifying strong entity set: E(b1, b2, …, bk, bk+1, bk+2, …, bm)
– the relation for the weak entity set will be
• W(a1, a2, …, ai, ai+1, …, an, b1, b2, …, bk)
– The primary key of the weak entity set relation thus includes:
• The discriminator of the weak entity set, and
• The primary key of the identifying strong entity set.

From E/R model to Relational Model
– Employees (EmpId, Name, BDate, Age, …)
– Projects(ProjId, Name, SDate, DDate)
– Customers(CustId, Name, Address)
– Teams(ProjId, Name, Descr)

Attributes
• Composite attributes
– Use a separate attribute for each component of the composite attribute.
Example:
– Employees (EmpId, Name, BDate, Age, City, Subcity, Kebele, HNo)
• Multivalued attributes
– Create a relation named after the attribute, with attributes that correspond
to:
• The components of the multivalued attribute, and
• The primary key of the entity set or relationship set to which the attribute
belongs.
– The primary key of the newly created relation consists of:
• The primary key of the entity set or relationship set, and
• The attribute or set of attributes from the multivalued attribute.
• Example
– Phone addresses (Phone1, Phone2, EmpId) or
– Phone addresses (phone, EmpId)

Relationship Sets to Relations: 1:1 Relationships

• For each binary 1:1 relationship set R in the ER schema:
• Identify the relations S and T that correspond to the entity
sets participating in R.
• Choose one of the relations, say S, and include as a foreign
key in S the primary key of T.
– It is better to choose an entity set with total participation in R
in the role of S.
– Include all the atomic attributes of the 1:1 relationship set R
as attributes of S.

Relationship Sets to Relations: 1:M Relationships
• For each binary 1:M relationship set R:
a. Identify the relation S that represents the participating entity
set at the M-side of the relationship set.
b. Include as a foreign key in S the primary key of the relation
T that represents the other entity set participating in R.
• Each entity instance on the M-side is related to at most
one entity instance on the 1-side of the relationship set.
• Any atomic attributes of the 1:M relationship set become
attributes of S.

Relationship Sets to Relations: M:N Relationships
• For each binary M:N relationship set R:
a. Create a new relation S to represent R.
b. Include as foreign key attributes in S the primary keys of the
relations that represent the participating entity sets. This
combination forms the primary key of S.
c. Include any atomic attributes of the M:N relationship set as
attributes of S.
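
Steps a to c can be sketched for the WorksOn relationship set between EMPLOYEES and PROJECTS from the E/R diagram, assuming Python's sqlite3 module. The column types, the relationship attribute HoursPerWeek, and the project names and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Employees (EmpId TEXT NOT NULL PRIMARY KEY, Name TEXT);
CREATE TABLE Projects  (ProjId INTEGER NOT NULL PRIMARY KEY, Name TEXT);
-- Step a: a new relation for the relationship set.
-- Step b: both primary keys as foreign keys; together they form the key.
-- Step c: the relationship's own attribute (HoursPerWeek, hypothetical).
CREATE TABLE WorksOn (
    EmpId  TEXT    NOT NULL REFERENCES Employees(EmpId),
    ProjId INTEGER NOT NULL REFERENCES Projects(ProjId),
    HoursPerWeek INTEGER,
    PRIMARY KEY (EmpId, ProjId)
);
INSERT INTO Employees VALUES ('E001', 'Alemu Girma'), ('E004', 'Kelem Belete');
INSERT INTO Projects VALUES (1, 'P1'), (2, 'P2');
INSERT INTO WorksOn VALUES ('E001', 1, 10), ('E001', 2, 5), ('E004', 1, 20);
""")

# Number of projects per employee, recovered through the new relation.
rows = conn.execute(
    "SELECT e.Name, COUNT(*) FROM WorksOn w"
    " JOIN Employees e ON e.EmpId = w.EmpId"
    " GROUP BY e.EmpId ORDER BY e.EmpId"
).fetchall()

# The composite primary key rejects a repeated (EmpId, ProjId) pair.
try:
    conn.execute("INSERT INTO WorksOn VALUES ('E001', 1, 3)")
    duplicate_allowed = True
except sqlite3.IntegrityError:
    duplicate_allowed = False
print(rows, duplicate_allowed)
```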

Generalization and Specialization (cont.)
[Figure: ISA hierarchy. Superclass EMPLOYEES (Empno, Name, Salary) with subclasses FULL-TIME EMPLOYEES (rank) and PART-TIME EMPLOYEES (hperweek, hRate).]

Generalization and Specialization (cont.)
• E/R Style
– Employees (empId, name, …)
– FullTimeEmp (empId, salary, …)
– PartTimeEmp (empId, hRate, hperWeek…)
• Use of Nulls
– Employees (empId, name, …, salary, …, hRate, hperWeek…)
• OO Approach
– Employees (empId, name, …)
– FullTimeEmp (empId, name, …, salary, …)
– PartTimeEmp (empId, name, …, hRate, hperWeek…)

Generalization and Specialization (cont.)
• Disjoint Generalization
– Total Participation
• Separate relation for each subclass
» Fulltime (EmpId, Name, Salary, Rank)
» Parttime (EmpId, Name, Salary, hperweek, hrate)
– Partial Participation
• One relation with all attributes of the superclass and the subclasses
• Include a type attribute to indicate the subclass to which an entity
belongs
» Employee (EmpId, Name, Salary, Rank, hperweek, hrate, jobtype)
• Overlapping Generalization
– Separate relation for each subclass and the superclass
» Employee (EmpId, Name, Salary)
» Fulltime (EmpId, Rank)
» Parttime (EmpId, hperweek, hrate)
– One big relation that consists of the attributes of the subclasses and the superclass,
plus Boolean attributes that store true/false to indicate the subclasses to which an
entity belongs:
» Employee (EmpId, Name, Salary, rank, hperweek, hrate, Isparttime, Isfulltime)
Exercise
• Identify possible keys for each entity set and Map
the E/R data model to Relational data model.
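[Figure: E/R diagram for the exercise. Labels recoverable from the diagram: entity sets Publication, Department, Report, Topic_Area and Contractor; relationship sets has, includes, publishes, contains and written_for; attributes code, title, name, address and research area; cardinalities marked 1 and ∞.]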

Dependencies

• The two most common pitfalls in a database are:
– Redundancy
• Repetition of information
– Loss of information
• Inability to present certain information

• Dependencies are a kind of constraint that helps
to remove redundancy in relational database
design


Functional Dependencies
Let X and Y be non-empty sets of attributes of a relation R.
• An instance r of R satisfies the FD X → Y if the following holds for
every pair of tuples t1 and t2 in r:
– If t1[X] = t2[X], then t1[Y] = t2[Y]
• X → Y in R specifies a constraint on all relation instances r(R).
• FDs are derived from the real-world constraints on the attributes.
– The key of a relation functionally determines all attributes of the
relation.
Examples:
– For the Employees relation
» empId → name
» empId → bDate
» empId → gender
– For the Teams relation
» projId, teamName → description
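
The pairwise condition in the definition can be checked mechanically on a given instance. The sketch below is plain Python; the function name satisfies_fd and the third row (E009) are hypothetical. Note that an instance can only refute an FD: a dependency that happens to hold in one instance is not thereby a constraint on the schema.

```python
def satisfies_fd(rows, lhs, rhs):
    """Return True if every pair of rows that agrees on lhs also agrees on rhs."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # Remember the first Y-value seen for each X-value; any later
        # disagreement is a pair of tuples that violates X -> Y.
        if seen.setdefault(x, y) != y:
            return False
    return True

employees = [
    {"empId": "E001", "name": "Alemu Girma"},
    {"empId": "E004", "name": "Kelem Belete"},
    {"empId": "E009", "name": "Alemu Girma"},  # hypothetical namesake
]

id_determines_name = satisfies_fd(employees, ["empId"], ["name"])
name_determines_id = satisfies_fd(employees, ["name"], ["empId"])
print(id_determines_name, name_determines_id)
```

Here empId → name holds in the instance, while name → empId is violated by the two employees who share a name.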

Functional Dependencies (cont.)
EMPLOYEES

EmpId  Name          BDate     HAddress        Phone
E001   Alemu Girma   01/10/70  Bole, 02, 1435  011-663-0712
E004   Kelem Belete  12/04/68  Arada, 01, 035  011-227-2525

Dependency Diagram
• It is a pictorial representation of all the functional dependencies in a
database.
• An attribute is represented by a rectangle.
• An arrow is drawn from the attribute set A to the attribute set B whenever
the dependency A → B holds.
Example FD: EmpId → Name, BDate, HAddress

[Figure: dependency diagram showing the attributes EmpId, Name, BDate and HAddress as rectangles.]

Types of functional dependencies
• Partial dependency
– A functional dependency in which one or more non-key
attributes functionally depend on only part of a composite primary
key
• E.g. EmpId → Name, BDate
o Here the non-key attributes Name and BDate are
determined by EmpId, which is only part of the key of the
relation
• Transitive dependency
– A functional dependency in which a non-key attribute is determined by
another non-key attribute, so it depends on the primary key only indirectly

Normal Forms & Normalization

Data Anomalies
• Well structured relations contain minimal redundancy of data
– They allow modification, insertion and deletion of data in the relation
without error
– Data Anomalies are errors/inconsistencies that arise due to redundantly
stored data in a relation
– The three most common anomalies in relational database design are:
• Insertion anomalies
• Deletion anomalies
• Modification anomalies (update anomalies)

Let’s consider the Employees relation.

Data Anomalies: Insertion anomalies
Insertion Anomalies
• This type of anomaly occurs when we try to insert new records into a
relation.
• E.g. we cannot insert a project unless an employee is assigned to a team to
work on the project.
– We cannot insert an employee unless he/she is assigned to a project.
– We can insert the name of the employee and leave the rest of the attributes
NULL, but this violates the integrity constraint.

Deletion Anomalies
• This type of anomaly occurs when critical data is unintentionally
removed from the database.
• E.g. if we delete all the records for a project, then all employees who work
on that project will be deleted.

Data Anomalies
Update anomalies
• These problems arise when multiple records must be
changed to reflect a single attribute change.
• E.g. changing the team name “Programmer” to
“Designer” requires the update to be made for every employee
working in the Programmer team.

Use of Attribute closure: Super keys & Candidate keys
• A set of attributes that determines the entire tuple is a super
key
• Example: Students (stud-id, name, dept, advisor)
– {stud-id, name} is a super key for the student table.
– Also {stud-id, name, advisor} etc.
• A minimal set of attributes that determines the entire tuple is
a candidate key
– {stud-id, name} is not a candidate key because we can remove the
name and have a super key
– stud-id is a candidate key
• If there are multiple candidate keys, the DB designer chooses
and designates one as the primary key
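
The attribute closure named in the slide title gives a mechanical test for both notions: a set of attributes is a super key when its closure is the whole schema, and a candidate key when no proper subset is also a super key. The sketch below is plain Python and assumes the single FD stud-id → name, dept, advisor, which is implied by stud-id being a key; the function names are hypothetical.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply an FD whenever its left side is already covered.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def candidate_keys(schema, fds):
    """All minimal attribute sets whose closure is the whole schema."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue  # contains a smaller key: a super key, but not minimal
            if closure(candidate, fds) == set(schema):
                keys.append(candidate)
    return keys

schema = {"stud-id", "name", "dept", "advisor"}
fds = [(frozenset({"stud-id"}), frozenset({"name", "dept", "advisor"}))]

is_super_key = closure({"stud-id", "name"}, fds) == schema
keys = candidate_keys(schema, fds)
print(is_super_key, keys)
```

The exhaustive search over subsets is exponential in the number of attributes, which is acceptable for classroom-sized schemas only.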

Normalization and Normal Forms
• Normalization is the process of decomposing an unsatisfactory ("bad")
relation R into fragments (i.e., smaller tables) R1, R2, ..., Rn. Our goals
are:
– Lossless decomposition: The fragments should contain the same
information as the original table. Otherwise the decomposition
results in information loss.
– Dependency preservation: Each dependency should be checkable
within a single Ri. Otherwise, checking updates for violations of
functional dependencies may require computing joins, which is
expensive.
• To retrieve complete information from a normalized database,
the JOIN operation must be used.
– Good form: The fragments Ri should not involve redundancy.

Computing a Decomposition

• Decomposition
– Let R be a relation schema. A collection {R1,
…, Rn} of relation schemas is a decomposition of R if
R = R1 ∪ R2 ∪ … ∪ Rn.
– A good decomposition does not:
• Lose information
• Complicate the checking of constraints
• Contain anomalies (or at least it contains fewer anomalies)

Lossless-Join Decompositions
• We should be able to construct the instance of the original table from the
instances of the tables in the decomposition. Consider replacing the Marks
relation below by decomposing it into two relations: SGM and AM
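
The Marks table itself is not reproduced here, so the sketch below uses a guessed schema: Marks(Student, Assignment, Group, Mark), with SGM = (Student, Group, Mark) and AM = (Assignment, Mark). Those attribute names, the rows and the function names are all assumptions. The code projects an instance onto two attribute sets, joins the projections back, and compares the result with the original.

```python
def project(rows, attrs):
    """Projection with duplicate elimination; each row becomes a tuple of pairs."""
    return {tuple((a, r[a]) for a in attrs) for r in rows}

def natural_join(left, right):
    """Natural join of two sets of projected rows."""
    joined = set()
    for l in left:
        for r in right:
            ld, rd = dict(l), dict(r)
            # Rows combine only if they agree on every shared attribute.
            if all(ld[a] == rd[a] for a in ld.keys() & rd.keys()):
                joined.add(tuple(sorted({**ld, **rd}.items())))
    return joined

def is_lossless_for(rows, attrs1, attrs2):
    """True if joining the two projections reproduces exactly this instance."""
    original = {tuple(sorted(r.items())) for r in rows}
    return natural_join(project(rows, attrs1), project(rows, attrs2)) == original

marks = [  # hypothetical instance
    {"Student": "Alemu", "Assignment": "A1", "Group": "G1", "Mark": 80},
    {"Student": "Alemu", "Assignment": "A2", "Group": "G3", "Mark": 60},
    {"Student": "Tarik", "Assignment": "A1", "Group": "G1", "Mark": 60},
]

sgm_am = is_lossless_for(marks, ["Student", "Group", "Mark"], ["Assignment", "Mark"])
by_key = is_lossless_for(
    marks, ["Student", "Assignment", "Mark"], ["Student", "Assignment", "Group"]
)
print(sgm_am, by_key)
```

With this data the SGM/AM split is lossy: joining on Mark alone pairs every 60-mark student with every 60-mark assignment and produces spurious tuples. Splitting on the shared attributes (Student, Assignment) recovers the instance exactly.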

Normal Forms
• Edgar F. Codd originally established three normal forms:
• 1NF
• 2NF and
• 3NF
– Later, others such as BCNF, 4NF and 5NF were introduced and
generally accepted.
– 3NF is widely considered to be sufficient for most applications.
– Most tables that reach 3NF are also in BCNF (Boyce-Codd
Normal Form).

Normal Forms: First Normal Form (1NF)
• A relation (table) R is in 1NF if and only if all underlying
domains of attributes contain only atomic values
(simple/non divisible)
• Conversion to 1NF removes multivalued attributes,
composite attributes and their combinations from the
relational schema
• Normalization (Decomposition)
– Form a new relation for each non-atomic attribute or nested
relation

A table is in 1NF when all the key attributes are defined (no repeating groups in
the table) and when all remaining attributes are dependent on the primary key.
However, a table in 1NF may still contain partial and/or transitive dependencies.
Normal Forms: First Normal Form (1NF)
• Consider the following relation Student:

Stud-Id  Name   Course-Id  Credits
2003     Alemu  IT 3106    4
2003     Alemu  IT 2301    4
1874     Tarik  IT 2145    4

• It has a repeating group of attributes. To bring the relation into
1NF, we can do either of the following two options:
1. Make a determinant of the repeating group (or the multivalued
attribute) a part of the primary key
2. Create a separate new relation for the repeating group

Normal Forms: First Normal Form (1NF) (contd.)

Option 1: Make a determinant of the repeating group (or the
multivalued attribute) a part of the primary key

Composite primary key: (Stud-Id, Course-Id)

Stud-Id  Name   Course-Id  Credits
2003     Alemu  IT 3106    4
2003     Alemu  IT 2301    4
1874     Tarik  IT 2145    4

Normal Forms: First Normal Form (1NF) (contd.)
• Option 2: Remove the entire repeating group from the relation and create
another relation that contains all the attributes of the repeating group,
plus the primary key of the first relation.

Student
Stud-Id  Name
2003     Alemu
1874     Tarik

• In the new relation, the primary key of the original relation and the
determinant of the repeating group together form the primary key.

Student-Course
Stud-Id  Course-Id  Credits
2003     IT 3106    4
2003     IT 2301    4
1874     IT 2145    4

Normal Forms: Second Normal Form (2NF)
• A relation schema R is in 2NF if
– it is in 1NF, and
– every non-prime attribute A in R (an attribute that is not a member of the
primary key) is fully functionally dependent on the
primary key

• Normalization (Decomposition)
– Decompose and set up a new relation for each partial key
with its dependent attribute (s)
– Keep a relation with the original primary key and any
attributes that are fully functionally dependent on it.

Normal Forms: Second Normal Form (2NF)
• Assume we have the following Student relation, which has a composite
primary key and partial dependencies:

Stud-Id  Name   Course-Id  Credits
2003     Alemu  IT 3106    4
2003     Alemu  IT 2301    4
1874     Tarik  IT 2145    4

• Our goal is to bring the relation into 2NF by
removing the partial dependencies. The result is three relations:

Stud-Id  Name
2003     Alemu
1874     Tarik

Stud-Id  Course-Id
2003     IT 3106
2003     IT 2301
1874     IT 2145

Course-Id  Credits
IT 3106    4
IT 2301    4
IT 2145    4
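
The 2NF decomposition of this Student relation amounts to three projections with duplicate elimination. A minimal sketch in plain Python; the function name project and the variable names are hypothetical.

```python
def project(rows, attrs):
    """Project rows (dicts) onto attrs, dropping duplicate tuples, keeping order."""
    seen, out = set(), []
    for row in rows:
        t = tuple(row[a] for a in attrs)
        if t not in seen:
            seen.add(t)
            out.append(t)
    return out

student_course = [
    {"Stud-Id": 2003, "Name": "Alemu", "Course-Id": "IT 3106", "Credits": 4},
    {"Stud-Id": 2003, "Name": "Alemu", "Course-Id": "IT 2301", "Credits": 4},
    {"Stud-Id": 1874, "Name": "Tarik", "Course-Id": "IT 2145", "Credits": 4},
]

# One relation per partial key with its dependents, plus the original key.
student = project(student_course, ["Stud-Id", "Name"])
enrolment = project(student_course, ["Stud-Id", "Course-Id"])
course = project(student_course, ["Course-Id", "Credits"])
print(student, enrolment, course, sep="\n")
```

The name Alemu is now stored once instead of once per course taken, which is the redundancy the partial dependency caused.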

Normal Forms: Third Normal Form (3NF)
• A relation schema R is in 3NF if
– It is in 2NF, and
– No non-prime attribute of R is
transitively dependent on the primary key
• Normalization (Decomposition)
– Remove from the original relation the attributes that depend on a
non-key attribute.
– For each transitive dependency, create a new relation
with the determining non-key attribute as its primary key
and the dependent non-key attributes as its other attributes.
The determinant stays in the original relation as a foreign key.

Normal Forms: Third Normal Form (3NF)
Example: Let’s consider the Employees relation below, which has a
transitive dependency (Emp_Id → Dept_Id → Dept).
Goal: Remove the transitive dependency

Emp_Id  F_Name  L_Name  Dept_Id  Dept
E-001   Alemu   Chane   01       Acct.
E-002   Kelem   Negash  02       Mgmt.

After decomposition:

Emp_Id  F_Name  L_Name  Dept_Id
E-001   Alemu   Chane   01
E-002   Kelem   Negash  02

Dept_Id  Dept
01       Acct.
02       Mgmt.

Denormalization
• Normalization is performed to reduce or eliminate insertion, deletion and
update anomalies.
• However, a completely normalized database may not be the most efficient or
effective implementation.
• Denormalization is sometimes used to improve efficiency.

Denormalization
– Is the process of selectively taking normalized tables and recombining the
data in them.
– Is usually driven by the need to improve query speed.
• Query speed is improved at the expense of more complex or problematic DML
(data manipulation language) for updates, deletions and insertions.

Example
Given the sample report layout below, what are its 1NF, 2NF and 3NF forms?
[Figure: Table Structure Matches the Report Format]
Database Tables and Normalization
• Problems with Figure 5.1
– The project number is intended to be a primary key, but it
contains nulls.
– The table displays data redundancies.
– The table entries invite data inconsistencies.
– The data redundancies yield the following anomalies:
• Update anomalies.
• Addition anomalies.
• Deletion anomalies.
Database Tables and Normalization

• Conversion to First Normal Form
– A relational table must not contain repeating groups.
– Repeating groups can be eliminated by adding the
appropriate entry in at least the primary key
column(s).
[Figure: Data Organization: First Normal Form, showing the table before and after conversion]
First Normal Form (1 NF)

• 1NF Definition
– The term first normal form (1NF) describes the
tabular format in which:
• All the key attributes are defined.
• There are no repeating groups in the table.
• All attributes are dependent on the primary key.

Dependency Diagram
– The primary key components are bold, underlined, and shaded in a different
color.
– The arrows above the attributes indicate all desirable dependencies, i.e., dependencies
that are based on the PK.
– The arrows below the dependency diagram indicate less desirable dependencies:
partial dependencies and transitive dependencies.
Second Normal Form (2 NF)
• Conversion to Second Normal Form
– Starting with the 1NF format, the database can be converted
into the 2NF format by
• Writing each key component on a separate line, and then
writing the original key on the last line and
• Writing the dependent attributes after each new key.

PROJECT (PROJ_NUM, PROJ_NAME)
EMPLOYEE (EMP_NUM, EMP_NAME, JOB_CLASS, CHG_HOUR)
ASSIGN (PROJ_NUM, EMP_NUM, HOURS)
[Figure: Dependency Diagram]
Second Normal Form (2 NF)

A table is in 2NF if:

• It is in 1NF and
• It includes no partial dependencies; that is,
no attribute is dependent on only a portion
of the primary key.
(It is still possible for a table in 2NF to
exhibit transitive dependency; that is, one
or more attributes may be functionally
dependent on nonkey attributes.)
Third Normal Form (3 NF)

• Conversion to Third Normal Form
– Create a separate table with attributes in a transitive
functional dependence relationship.

PROJECT (PROJ_NUM, PROJ_NAME)
ASSIGN (PROJ_NUM, EMP_NUM, HOURS)
EMPLOYEE (EMP_NUM, EMP_NAME, JOB_CLASS)
JOB (JOB_CLASS, CHG_HOUR)
Third Normal Form (3 NF)

• 3NF Definition
– A table is in 3NF if:
• It is in 2NF and
• It contains no transitive dependencies.
Normalization

– Normalization helps us identify correct and
appropriate tables.
– So far we have four tables:

PROJECT (PROJ_NUM, PROJ_NAME)
ASSIGN (PROJ_NUM, EMP_NUM, HOURS)
EMPLOYEE (EMP_NUM, EMP_NAME, JOB_CLASS)
JOB (JOB_CLASS, CHG_HOUR)
Next
– We are going to identify the relationships between the entities
(tables), including their cardinality and connectivity.
– We have to list the business rules.
Business Rules
• The company manages many projects.
• Each project requires the services of many employees.
• An employee may be assigned to several different
projects.
• Some employees are not assigned to a project and
perform duties not specifically related to a project. Some
employees are part of a labor pool, to be shared by all
project teams.
• Each employee has a (single) primary job classification.
This job classification determines the hourly billing rate.
• Many employees can have the same job classification.
Normalization and Database Design

• Two Initial Entities:

PROJECT (PROJ_NUM, PROJ_NAME)
EMPLOYEE (EMP_NUM, EMP_LNAME, EMP_FNAME,
EMP_INITIAL, JOB_DESCRIPTION, JOB_CHG_HOUR)
Normalization and Database Design

• Three Entities After Transitive Dependency Removed

PROJECT (PROJ_NUM, PROJ_NAME)
EMPLOYEE (EMP_NUM, EMP_LNAME, EMP_FNAME, EMP_INITIAL,
JOB_CODE)
JOB (JOB_CODE, JOB_DESCRIPTION, JOB_CHG_HOUR)
[Figure: The Modified ERD]
[Figure: Creation of Composite Entity ASSIGN]
Normalization and Database Design

• Attribute ASSIGN_HOURS is assigned to the composite entity
ASSIGN.
• “Manages” relationship is created between EMPLOYEE and
PROJECT.

PROJECT (PROJ_NUM, PROJ_NAME, EMP_NUM)
EMPLOYEE (EMP_NUM, EMP_LNAME, EMP_FNAME, EMP_INITIAL,
EMP_HIREDATE, JOB_CODE)
JOB (JOB_CODE, JOB_DESCRIPTION, JOB_CHG_HOUR)
ASSIGN (ASSIGN_NUM, ASSIGN_DATE, PROJ_NUM, EMP_NUM,
ASSIGN_HOURS)
Relational Schema
SUMMARY
The Initial 1NF Structure
Identifying the Possible PK Attributes
Table Structures Based On The Selected PKs
Exercise
1. Given the Gradereport relation below and its functional dependencies,
normalize the relation.

Gradereport (StudNo, StudName, Major, Advisor, CourseNo, Ctitle, InstrucName,
InstrucLocn, Grade)

Functional Dependencies:
• StudNo -> StudName
• CourseNo -> Ctitle, InstrucName
• InstrucName -> InstrucLocn
• StudNo, CourseNo, Major -> Grade
• StudNo, Major -> Advisor
• Advisor -> Major

Examples
1NF: Remove repeating groups
– Student (StudNo, StudName)
– StudMajor (StudNo, Major, Advisor)
– StudCourse (StudNo, Major, CourseNo, Ctitle, InstrucName, InstrucLocn,
Grade)

2NF: Remove partial key dependencies
– Student (StudNo, StudName)
– StudMajor (StudNo, Major, Advisor)
– StudCourse (StudNo, Major, CourseNo, Grade)
– Course (CourseNo, Ctitle, InstrucName, InstrucLocn)

Examples

3NF: Remove transitive dependencies
• Student (StudNo, StudName)
• StudMajor (StudNo, Major, Advisor)
• StudCourse (StudNo, Major, CourseNo, Grade)
• Course (CourseNo, Ctitle, InstrucName)
• Instructor (InstrucName, InstrucLocn)
