• Relational Model
– The data and the relations between them are
organized in tables.
– A table is a collection of records and each record in
a table contains the same fields organized in
columns.
– The records in the table form the rows of the table.
Properties of Relational Model
– Column values are of the same kind
– The Sequence of columns is insignificant
– The Sequence of rows is insignificant
– Each Column has a unique name
• Rows (tuples) in a single relation are unique.
• A relation is a set of tuples.
• Attributes are atomic.
• The values of a column are drawn from the domain
associated with that column.
• The relation names in a relational database are distinct.
Structure of Relational Database
• Relation
– A two-dimensional table
• The columns represent the attributes of the relation
• The rows (other than the heading row) represent tuples (records)
of the relation.
– Example
EMPLOYEES
EmpId  Name          BDate     HAddress        Phone
E001   Alemu Girma   01/10/70  Bole, 02, 1435  011-663-0712
E004   Kelem Belete  12/04/68  Arada, 01, 035  011-227-2525
Relational Schema
• A relation in a relational model consists of:
– The relation schema, which describes the column headings of the table, and
– The relation instance, which is the table with its set of tuples.
Database Schema
• Database Schema
– The set of relation schemas forms the schema of the relational
database, known as the database schema (relational database
schema).
• Example
– Employees (EmpId:string, Name:string, BDate:date, SubCity:string,
Kebel:integer, Phone:string)
– Projects (PrjId:integer, Name:string, SDate:date, DDate:date,
CDate:date)
– Teams (Name:string, Descr:string)
• Database Instance
– A snapshot of the data in the database at a given instant in time.
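The example schema above could be declared roughly as follows. This is only a sketch using Python's built-in sqlite3 module with an in-memory database. The mapping of string, integer and date to TEXT, INTEGER and TEXT is an assumption, and the inserted team row is hypothetical data.

```python
import sqlite3

# In-memory database; nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Database schema: one CREATE TABLE per relation schema.
# Assumption: string -> TEXT, integer -> INTEGER, date -> TEXT (ISO format).
conn.executescript("""
CREATE TABLE Employees (EmpId TEXT, Name TEXT, BDate TEXT,
                        SubCity TEXT, Kebel INTEGER, Phone TEXT);
CREATE TABLE Projects  (PrjId INTEGER, Name TEXT, SDate TEXT,
                        DDate TEXT, CDate TEXT);
CREATE TABLE Teams     (Name TEXT, Descr TEXT);
""")

# Database instance: the rows present at this moment (hypothetical row).
conn.execute("INSERT INTO Teams VALUES (?, ?)", ("Design", "UI design team"))
instance = conn.execute("SELECT Name, Descr FROM Teams").fetchall()

# The schema itself can be read back from SQLite's catalog table.
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
```

The schema (the table definitions) stays fixed while the instance changes with every insert, update or delete.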
Relational Model Constraints
• Constraints define the permissible database states
• They represent the semantics of the domain being modeled
1. Domain constraints
• Domain constraints require that each value of an attribute
A be an atomic value from the domain of A, dom(A).
Relational Model Constraints (cont.)
2. Key constraints
– A key constraint states that a certain minimal subset of the
attributes of a relation is a unique identifier for each tuple in
the relation.
– One such key is chosen as the primary key, which uniquely
identifies each tuple in the relation
– The primary key attributes are indicated by underlining them
in the relation schema.
• Example
– Employees (EmpId, Name, BDate, SubCity, Kebel, Phone)
– Projects (PrjId, Name, SDate, DDate, CDate)
– Teams (Name, Descr)
Relational Constraints (cont.)
3. Entity Integrity constraint
• A primary key must not contain a NULL value
– Otherwise it may not be possible to identify some tuples
– For example, if more than one tuple has a null value
in its primary key, we may not be able to distinguish
them
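The key constraint and the entity integrity constraint can be tried out with a small sketch. It assumes Python's built-in sqlite3 module and a cut-down Employees table with only EmpId and Name; the helper try_insert is a hypothetical name introduced here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# NOT NULL is spelled out because SQLite, for historical reasons, accepts
# NULL in a non-integer PRIMARY KEY column unless told otherwise.
conn.execute(
    "CREATE TABLE Employees (EmpId TEXT NOT NULL PRIMARY KEY, Name TEXT)")
conn.execute("INSERT INTO Employees VALUES ('E001', 'Alemu Girma')")

def try_insert(emp_id, name):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO Employees VALUES (?, ?)", (emp_id, name))
        return True
    except sqlite3.IntegrityError:
        return False

duplicate_ok = try_insert("E001", "Kelem Belete")  # key constraint violated
null_ok = try_insert(None, "Kelem Belete")         # entity integrity violated
fresh_ok = try_insert("E004", "Kelem Belete")      # accepted
```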
Relational Constraints (cont.)
4. Foreign key and Referential Integrity constraint
• Consider the following two relation schemas:
– R1(A1, A2, ..., An) and R2(B1, B2, ..., Bm)
• Let PK, a subset of {A1, ..., An}, be the primary key of R1
• A set of attributes FK is a foreign key of R2 if:
– The attributes in FK have the same domains as the attributes in PK of R1
– For every tuple t2 in R2, there exists a tuple t1 in R1 such that
t2[FK] = t1[PK]
• A referential integrity constraint from the attributes FK of R2 to R1
means that FK is a foreign key that refers to the primary key of R1.
Relational Constraints (cont.)
4. Foreign key and Referential Integrity constraint (cont.)
– It helps keep the data consistent
– The foreign key in the referencing relation requires a
match to a primary key in the referenced relation
• A relational database consists of relations that are related
through foreign keys
• Example
– Employees (EmpId, Name, BDate, SubCity, Kebel, Phone)
– WorkSchedule (SDate, EDate, HoursPerDay, Emp-ID)
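A referential integrity constraint between WorkSchedule and Employees might be sketched as below, again assuming Python's built-in sqlite3 module. The foreign key column is written EmpId here (the slide writes Emp-ID), the dates are hypothetical, and the helper schedule is a name introduced for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Employees (EmpId TEXT NOT NULL PRIMARY KEY, Name TEXT);
CREATE TABLE WorkSchedule (
    SDate TEXT, EDate TEXT, HoursPerDay INTEGER,
    EmpId TEXT REFERENCES Employees(EmpId)   -- foreign key
);
INSERT INTO Employees VALUES ('E001', 'Alemu Girma');
""")

def schedule(emp_id):
    """Try to add a schedule row; return False if the FK check rejects it."""
    try:
        conn.execute(
            "INSERT INTO WorkSchedule VALUES ('2024-01-01', '2024-01-31', 8, ?)",
            (emp_id,))
        return True
    except sqlite3.IntegrityError:
        return False

known = schedule("E001")    # matches a primary key value in Employees
unknown = schedule("E999")  # no such employee, so the row is rejected
```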
Relational Constraints (cont.)
5. General Integrity constraint
• These constraints specify further requirements that
attribute values must satisfy.
Example:
– We may require that the maximum working hours of an employee
be 48 per week.
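One possible way to express the 48-hour rule is a CHECK constraint. The sketch below assumes Python's built-in sqlite3 module; the DaysPerWeek column and the helper try_schedule are hypothetical and were introduced only so that weekly hours can be computed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE WorkSchedule (
        EmpId       TEXT,
        HoursPerDay INTEGER,
        DaysPerWeek INTEGER,   -- hypothetical column
        CHECK (HoursPerDay * DaysPerWeek <= 48)
    )
""")

def try_schedule(emp_id, hours_per_day, days_per_week):
    """Return True if the row satisfies the CHECK constraint."""
    try:
        conn.execute("INSERT INTO WorkSchedule VALUES (?, ?, ?)",
                     (emp_id, hours_per_day, days_per_week))
        return True
    except sqlite3.IntegrityError:
        return False

within_limit = try_schedule("E001", 8, 6)  # 48 hours per week: accepted
over_limit = try_schedule("E004", 9, 6)    # 54 hours per week: rejected
```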
Mapping E/R Data Model to Relational Model
E/R Diagram
[Figure: E/R diagram. Surviving labels: entity sets TEAMS and EMPLOYEES, relationship set Assigned, and attributes Name, Descr, EmpId, BDate, Age, Address, ProjId, CustId, SDate, DDate]
Entity Sets to Relations
• Strong entity set
– Mapped to relations in the relational model with
• the same name, and
• the same attributes.
– The primary keys assigned for the entity sets are also represented as keys
in the relations.
• Weak entity set
– For
• Weak entity set: W(a1, a2, … ai, ai+1… an)
• Identifying strong entity set: E(b1, b2, … bk, bk+1, bk+2, … bm)
– Then the relation for the weak entity set will be
• W(a1, a2, … ai, ai+1… an, b1, b2, … bk)
– The primary key for the weak entity set relation thus includes:
• The discriminator of the weak entity set, and
• The primary key of the identifying strong entity set.
From E/R model to Relational Model
– Employees (EmpId, Name, BDate, age, …)
– Projects(ProjId, Name, SDate, DDate)
– Customers(CustId, Name, Address)
– Teams(ProjId, Name, Descr)
Attributes
• Composite attributes
– Use a separate attribute for each component of the composite attribute
Example:
– Employees (EmpId, Name, BDate, Age, City, Subcity, Kebele, HNo)
• Multivalued attributes
– Create a relation named after the attribute, with attributes that correspond
to:
• The components of the multivalued attribute, and
• The primary key of the entity set or relationship set to which the attribute
belongs.
– The primary key for the newly created relation consists of:
• The primary key of the entity set or relationship set, and
• The attribute or set of attributes from the multivalued attribute.
• Example
– Phone addresses (Phone1, Phone2, EmpId), or
– Phone addresses (phone, EmpId)
Relationship Sets to Relations: 1:1 Relationships
Relationship Sets to Relations: 1:M Relationships
• For each binary 1:M relationship set R:
a. Identify the relation S that represents the participating entity
set at the M-side of the relationship set
b. Include as foreign key in S the primary key of the relation
T that represents the other entity set participating in R
• Each entity instance on the M-side is related to at most
one entity instance on the 1-side of the relationship set
• Any atomic attributes of the 1:M relationship set are
attributes of S.
Relationship Sets to Relations: M:N Relationships
• For each binary M:N relationship set R:
a. Create a new relation S to represent R; S includes as foreign keys
the primary keys of the relations for the participating entity sets
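The M:N mapping can be sketched with a relation whose primary key combines the two foreign keys. This assumes Python's built-in sqlite3 module; the relationship name WorksOn, the project names and the assignments are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Employees (EmpId TEXT NOT NULL PRIMARY KEY, Name TEXT);
CREATE TABLE Projects  (ProjId INTEGER NOT NULL PRIMARY KEY, Name TEXT);
-- S: one row per (employee, project) pair in the relationship set.
CREATE TABLE WorksOn (
    EmpId  TEXT    NOT NULL REFERENCES Employees(EmpId),
    ProjId INTEGER NOT NULL REFERENCES Projects(ProjId),
    PRIMARY KEY (EmpId, ProjId)
);
INSERT INTO Employees VALUES ('E001', 'Alemu Girma'), ('E004', 'Kelem Belete');
INSERT INTO Projects  VALUES (1, 'Payroll'), (2, 'Inventory');
INSERT INTO WorksOn   VALUES ('E001', 1), ('E001', 2), ('E004', 1);
""")

# One employee can appear with many projects ...
projects_of_e001 = [r[0] for r in conn.execute(
    "SELECT p.Name FROM WorksOn w JOIN Projects p ON p.ProjId = w.ProjId "
    "WHERE w.EmpId = 'E001' ORDER BY p.Name")]
# ... and one project with many employees.
staff_on_1 = [r[0] for r in conn.execute(
    "SELECT EmpId FROM WorksOn WHERE ProjId = 1 ORDER BY EmpId")]
```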
Generalization and Specialization (cont.)
[Figure: EMPLOYEES (Empno, Name, Salary) connected through ISA to the subclasses FULL-TIME EMPLOYEES and PART-TIME EMPLOYEES]
Generalization and Specialization (cont.)
• E/R Style
– Employees (empId, name, …)
– FullTimeEmp (empId, salary, …)
– PartTimeEmp (empId, hRate, hperWeek…)
• Use of Nulls
– Employees (empId, name, …, salary, …, hRate, hperWeek…)
• OO Approach
– Employees (empId, name, …)
– FullTimeEmp (empId, name, …, salary, …)
– PartTimeEmp (empId, name, …, hRate, hperWeek…)
Generalization and Specialization (cont.)
• Disjoint Generalization
– Total Participation
• Separate relation for each subclass
Fulltime (EmpId, Name, Salary, Rank)
Parttime (EmpId, Name, Salary, hperweek, hrate)
– Partial Participation
• One relation with all attributes of the superclass and the subclasses
• Include a type attribute to indicate the subclass to which an entity belongs
Employee (EmpId, Name, Salary, Rank, hperweek, hrate, jobtype)
• Overlapping Generalization
– Separate relation for each subclass and for the superclass
Employee (EmpId, Name, Salary)
Fulltime (EmpId, Rank)
Parttime (EmpId, hperweek, hrate)
– One big relation that contains the attributes of the subclasses and the superclass, plus
one boolean attribute per subclass that stores true/false to indicate whether an
entity belongs to that subclass:
Employee (empId, Name, Salary, rank, hperweek, hrate, Isparttime, Isfulltime)
Exercise
• Identify possible keys for each entity set and Map
the E/R data model to Relational data model.
[Figure: E/R diagram for the exercise. Surviving labels: entity sets Publication, Department, Report, Topic_Area and Contractor; relationship sets has, includes, publishes, contains and written_for; attributes code, title, name, address and research area; cardinalities marked 1 and ∞]
Dependencies
• The two most common pitfalls in a database are:
– Redundancy
• Repetition of information
– Loss of information
• Inability to present certain information
Functional Dependencies
• Let X and Y be non-empty sets of attributes of a relation R
• An instance r of R satisfies the FD X -> Y if the following holds for
every pair of tuples t1 and t2 in r
– If t1[X] = t2[X], then t1[Y] = t2[Y]
• X -> Y in R specifies a constraint on all relation instances r(R)
• FDs are derived from the real-world constraints on the attributes
– The key of a relation functionally determines all attributes of the
relation
Examples:
– For Employees relation
» empId -> name
» empId -> bDate
» empId -> gender
– For Teams relation
» projId, teamName -> description
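The pairwise definition above translates almost directly into code. The following is a minimal sketch; the function name satisfies_fd and the sample rows (including the projId values) are assumptions made for illustration. Note that an instance can only refute an FD: passing the check on one instance does not prove that the FD holds for the schema.

```python
from itertools import combinations

def satisfies_fd(rows, X, Y):
    """Return True if the instance `rows` (a list of dicts) satisfies X -> Y."""
    for t1, t2 in combinations(rows, 2):
        same_x = all(t1[a] == t2[a] for a in X)
        same_y = all(t1[a] == t2[a] for a in Y)
        if same_x and not same_y:
            # Two tuples agree on X but differ on Y: the FD is violated.
            return False
    return True

# Hypothetical instance: employee E001 works on two projects.
rows = [
    {"empId": "E001", "name": "Alemu", "projId": 1},
    {"empId": "E001", "name": "Alemu", "projId": 2},
    {"empId": "E004", "name": "Kelem", "projId": 1},
]

name_fd = satisfies_fd(rows, ["empId"], ["name"])    # holds on this instance
proj_fd = satisfies_fd(rows, ["empId"], ["projId"])  # violated by E001
```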
Functional Dependencies (cont.)
EMPLOYEES
EmpId  Name          BDate     HAddress        Phone
E001   Alemu Girma   01/10/70  Bole, 02, 1435  011-663-0712
E004   Kelem Belete  12/04/68  Arada, 01, 035  011-227-2525
Dependency Diagram
• It is a pictorial representation of all the functional dependencies in a
database
• An attribute is represented by a rectangle
• An arrow is drawn from the attribute set A to attribute set B whenever
the dependency A -> B holds true
Example FD: EmpId -> Name, BDate, HAddress
Types of functional dependencies
• Partial dependency
– It is a functional dependency in which a non-primary key
attribute/attributes functionally depend on part of a primary
key
• E.g. EmpId -> Name, Bdate
o Here the non-key attributes Name and Bdate are
determined by the partial key attribute EmpId in the
relation
• Transitive dependency
– It refers to a functional dependency in which a non-key
attribute is determined by another non-key attribute, so that
it depends on the primary key only indirectly
Normal Forms & Normalization
Data Anomalies
• Well structured relations contain minimal redundancy of data
– They allow modification, insertion and deletion of data in the relation
without error
– Data Anomalies are errors/inconsistencies that arise due to redundantly
stored data in a relation
– The three most common anomalies in relational database design are:
• Insertion anomalies
• Deletion anomalies
• Modification anomalies (update anomalies)
Data Anomalies: Insertion anomalies
Insertion Anomalies
– This type of data anomaly occurs when we try to insert new records into a
relation
– E.g. We cannot insert a project unless an employee is assigned to a team that
works on the project
• We cannot insert an employee unless he/she is assigned to a project
• We can insert the name of the employee and leave the rest of the attributes
NULL, but this violates the integrity constraints
Deletion Anomalies
• This type of anomaly occurs when critical data is unintentionally
removed from the database
• E.g. If we delete all the records for a project, then the data about the
employees who work on that project is deleted as well.
Data Anomalies
Update anomalies
• These problems arise when the database must make multiple
changes on records to reflect a single attribute change
Use of Attribute closure: Super keys & Candidate keys
• A set of attributes that determine the entire tuple is a super
key
• Example: Students (stud-id, name, dept, advisor)
– {stud-id, name} is a super key for the student table.
– Also {stud-id, name, advisor} etc.
• A minimal set of attributes that determines the entire tuple is
a candidate key
– {stud-id, name} is not a candidate key because we can remove the
name and have a super key
– stud-id is a candidate key
• If there are multiple candidate keys, the DB designer chooses
and designates one as the primary key
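Super keys and candidate keys can be derived from the attribute closure. The sketch below uses the textbook fixed-point computation of a closure; the single FD stud-id -> name, dept, advisor is an assumption, since the slide does not list the dependencies of Students, and the function names are introduced here for illustration.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds`, a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def is_superkey(attrs, schema, fds):
    """A super key is a set of attributes whose closure covers the schema."""
    return closure(attrs, fds) >= set(schema)

def candidate_keys(schema, fds):
    """Minimal super keys, found by trying attribute sets of growing size."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            s = set(combo)
            # Skip supersets of an already found key: they are not minimal.
            if is_superkey(s, schema, fds) and not any(k <= s for k in keys):
                keys.append(s)
    return keys

schema = {"stud-id", "name", "dept", "advisor"}
fds = [({"stud-id"}, {"name", "dept", "advisor"})]  # assumed dependency

superkey = is_superkey({"stud-id", "name"}, schema, fds)
keys = candidate_keys(schema, fds)
```

The exhaustive search is exponential in the number of attributes, which is acceptable only for small classroom schemas like this one.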
Normalization and Normal Forms
• Normalization is the process of decomposing an unsatisfactory ("bad")
relation R into fragments (i.e., smaller tables) R1, R2, ..., Rn. Our goals
are:
– Lossless decomposition: The fragments should contain the same
information as the original table. Otherwise decomposition
results in information loss
– Dependency preservation: Each dependency should be preserved
within a single Ri. Otherwise, checking updates for violations of
functional dependencies may require computing joins, which is
expensive
– Good form: The fragments Ri should not involve redundancy.
• To retrieve complete information from a normalized database,
the JOIN operation must be used
Computing a Decomposition
• Decomposition
– Let R be a relation schema. The collection {R1, ..., Rn}
of relation schemas is a decomposition of R if
– R = R1 ∪ R2 ∪ ... ∪ Rn
Lossless-Join Decompositions
• We should be able to construct the instance of the original table from the
instances of the tables in the decomposition. Consider replacing the Marks
relation below by decomposing it into two relations: SGM and AM
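The Marks relation and its fragments SGM and AM are not reproduced in this text, so the sketch below illustrates the idea on a hypothetical relation with attributes student, course and instructor. It projects an instance onto two attribute sets, joins the projections back, and compares the result with the original. It tests a single instance only, and all function names are introduced here for illustration.

```python
def project(rows, attrs):
    """Projection onto `attrs` with duplicate elimination; rows are dicts."""
    seen = {tuple(r[a] for a in attrs) for r in rows}
    return [dict(zip(attrs, t)) for t in sorted(seen)]

def natural_join(r, s):
    """Natural join of two lists of dicts on their shared attribute names."""
    if not r or not s:
        return []
    common = [a for a in r[0] if a in s[0]]
    return [{**t, **u} for t in r for u in s
            if all(t[a] == u[a] for a in common)]

def is_lossless(rows, attrs1, attrs2):
    """True if joining the two projections gives back exactly `rows`."""
    joined = natural_join(project(rows, attrs1), project(rows, attrs2))
    canon = lambda rs: {tuple(sorted(t.items())) for t in rs}
    return canon(joined) == canon(rows)

# Hypothetical instance: one instructor teaches both courses.
rows = [
    {"student": "S1", "course": "DB", "instructor": "Abebe"},
    {"student": "S2", "course": "OS", "instructor": "Abebe"},
]

# Joining on course reconstructs the original instance.
lossless = is_lossless(rows, ["student", "course"], ["course", "instructor"])
# Joining on instructor produces spurious tuples such as (S1, OS, Abebe).
lossy = is_lossless(rows, ["student", "instructor"], ["course", "instructor"])
```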
Normal Forms
• Edgar F. Codd originally established three normal forms:
• 1NF
• 2NF and
• 3NF
– Later, others like BCNF, 4NF and 5NF were introduced and were
generally accepted
– 3NF is widely considered to be sufficient for most applications
Normal Forms: First Normal Form (1NF)
• A relation (table) R is in 1NF if and only if all underlying
domains of attributes contain only atomic values
(simple/non divisible)
• Conversion to 1NF removes multivalued attributes,
composite attributes and their combinations from the
relational schema
• Normalization (Decomposition)
– Form a new relation for each non-atomic attribute or nested
relation
A table is in 1NF when all the key attributes are defined (no repeating groups in
the table) and when all remaining attributes are dependent on the primary key.
However, a table in 1NF still may contain partial or/and transitive dependencies
Normal Forms: First Normal Form (1NF)
• Consider the following relation Students:
Student
Stud-Id  Name   Course-Id  Credits
2003     Alemu  IT 3106    4
2003     Alemu  IT 2301    4
1874     Tarik  IT 2145    4
Normal Forms: First Normal Form (1NF) (contd.)
[Figure: relation with its composite primary key marked]
Normal Forms: First Normal Form (1NF) (contd.)
• Option 2: Remove the entire repeating group from the relation and create
another relation which contains all the attributes of the repeating group,
plus the primary key from the first relation.
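Option 2 can be sketched as a small function that splits nested records into a parent relation and a child relation. The values come from the Student table shown earlier; holding the courses as a nested list per student, and the function name remove_repeating_group, are assumptions made for illustration.

```python
def remove_repeating_group(records, key, group):
    """Split records into a parent relation and a relation for `group`.

    The child rows carry the parent's primary key `key` plus the
    attributes of one element of the repeating group.
    """
    parent, child = [], []
    for rec in records:
        parent.append({k: v for k, v in rec.items() if k != group})
        for item in rec[group]:
            child.append({key: rec[key], **item})
    return parent, child

# Unnormalized input: each student holds a repeating group of courses.
students = [
    {"Stud-Id": 2003, "Name": "Alemu",
     "Courses": [{"Course-Id": "IT 3106", "Credits": 4},
                 {"Course-Id": "IT 2301", "Credits": 4}]},
    {"Stud-Id": 1874, "Name": "Tarik",
     "Courses": [{"Course-Id": "IT 2145", "Credits": 4}]},
]

student, student_course = remove_repeating_group(students, "Stud-Id", "Courses")
```

The primary key of the new relation is then the combination of Stud-Id and Course-Id.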
Normal Forms: Second Normal Form (2NF)
• A relation schema R is in 2NF if
– it is in 1NF, and
– every non-prime attribute (an attribute that is not a member of the
primary key) A in R is fully functionally dependent on the
primary key
• Normalization (Decomposition)
– Decompose and set up a new relation for each partial key
with its dependent attribute (s)
– Keep a relation with the original primary key and any
attributes that are fully functionally dependent on it.
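The decomposition steps above can be sketched as follows. The function to_2nf is a hypothetical helper that looks only at the FDs given explicitly (it does not infer implied dependencies), and the two FDs for the Student relation, Stud-Id -> Name and Course-Id -> Credits, are assumptions.

```python
def to_2nf(schema, primary_key, fds):
    """Decompose on partial dependencies; returns {key: attribute set}."""
    key = frozenset(primary_key)
    relations = {}
    moved = set()
    for lhs, rhs in fds:
        lhs = frozenset(lhs)
        dependents = set(rhs) - key
        if lhs < key and dependents:
            # Partial dependency: a proper subset of the key determines
            # non-key attributes, so they move to a relation keyed by lhs.
            relations.setdefault(lhs, set(lhs)).update(dependents)
            moved |= dependents
    # Keep the original key with the attributes fully dependent on it.
    relations[key] = set(schema) - moved
    return relations

schema = {"Stud-Id", "Name", "Course-Id", "Credits"}
primary_key = {"Stud-Id", "Course-Id"}
fds = [({"Stud-Id"}, {"Name"}), ({"Course-Id"}, {"Credits"})]  # assumed

result = to_2nf(schema, primary_key, fds)
```

Here the relation with the original key keeps only the key attributes, because every non-key attribute depends on just one part of the key.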
Normal Forms: Second Normal Form (2NF)
• Assume we have the following student relation
[Figure: student relation with the composite primary key and the partial dependencies marked]
Normal Forms: Third Normal Form (3NF)
• A relation schema R is in 3NF if
– It is in 2NF, and
– No non-prime attribute of R is transitively dependent
on the primary key
• Normalization (Decomposition)
– Remove from the original relation the attributes that are
dependent on a non-key attribute.
– For each transitive dependency, create a new relation
in which the non-key attribute that is the determinant of the
transitive dependency is the primary key, and the
dependent non-key attributes are the remaining attributes
Normal Forms: Third Normal Form (3NF)
Example: Let's consider the Employees relation below:
[Figure: Employees relation with the transitive dependency marked]
Goal: Remove transitive dependency
Denormalization
• Normalization is performed to reduce or eliminate Insertion, Deletion or
Update anomalies
• However, a completely normalized database may not be the most efficient or
effective implementation
• “Denormalization” is sometimes used to improve efficiency.
• Denormalization
– Is the process of selectively taking normalized tables and re-combining the
data in them
– Is usually driven by the need to improve query speed
• Query speed is improved at the expense of more complex or problematic DML
(data manipulation language) for updates, deletions and insertions.
Example
Given the sample report layout below, what are its 1NF, 2NF and 3NF forms?
Table Structure Matches the Report Format
Database Tables and Normalization
• Problems with Figure 5.1
– The project number is intended to be a primary key, but it
contains nulls.
– The table displays data redundancies.
– The table entries invite data inconsistencies.
– The data redundancies yield the following anomalies:
• Update anomalies.
• Addition anomalies.
• Deletion anomalies.
Database Tables and Normalization
[Figure: Before and After versions of the table]
First Normal Form (1 NF)
• 1NF Definition
– The term first normal form (1NF) describes the
tabular format in which:
• All the key attributes are defined.
• There are no repeating groups in the table.
• All attributes are dependent on the primary key.
• Dependency Diagram
– The primary key components are bold, underlined, and shaded in a different
color.
– The arrows above the attributes indicate all desirable dependencies, i.e., dependencies
that are based on the PK.
– The arrows below the dependency diagram indicate less desirable dependencies:
partial dependencies and transitive dependencies.
Second Normal Form (2 NF)
• Conversion to Second Normal Form
– Starting with the 1NF format, the database can be converted
into the 2NF format by
• Writing each key component on a separate line, and then
writing the original key on the last line and
• Writing the dependent attributes after each new key.
• 2NF Definition
– A table is in 2NF if:
• It is in 1NF and
• It includes no partial dependencies; that is,
no attribute is dependent on only a portion
of the primary key.
– (It is still possible for a table in 2NF to
exhibit transitive dependency; that is, one
or more attributes may be functionally
dependent on nonkey attributes.)
Third Normal Form (3 NF)
• 3NF Definition
– A table is in 3NF if:
• It is in 2NF and
• It contains no transitive dependencies.
Normalization
Functional Dependencies:
• StudNo -> StudName
• CourseNo -> Ctitle, InstrucName
• InstrucName -> InstrucLocn
• StudNo, CourseNo, Major -> Grade
• StudNo, Major -> Advisor
• Advisor -> Major
Examples
1NF: Remove repeating groups
– Student (StudNo, StudName)
– StudMajor (StudNo, Major, Advisor)
– StudCourse (StudNo, Major, CourseNo, Ctitle, InstrucName, InstrucLocn,
Grade)