
RDBMS-Day2

Continuation of ER modeling concepts


Logical Database design
Normalization

Recap of Day1 session

• RDBMS handles data in the form of relations, tuples and fields

• Keys identify tuples uniquely

• ER modeling is a diagrammatic representation of the conceptual design of a database

• ER diagrams consist of Entity types, relationship types and attributes

ER/CORP/CRS/DB07/003
Copyright © 2004, Infosys Technologies Ltd
Version No: 2.0

Since Day 2 continues the Day 1 content, this recap is included to maintain continuity.

Relationship participation

[Diagram: Employee (partial participation) 1 - head of - 1 Department (total participation)]

•Not every instance of the entity type Employee participates in the relationship Head-of, because not every employee heads a department. So the Employee entity type is said to participate partially in the relationship.

•Every department, however, is headed by some employee, so all instances of the entity type Department participate in the relationship. This is total participation from the Department side.

Attributes of a Relationship

[Diagram: relationship Prescription between Doctor and Patient, shown with Medicine, Number of days and Dosage attached to the relationship]

These attributes best describe the relationship rather than any individual entity

Weak entity

[Diagram: Employee (E#) 1 - has - N Dependant (id, name)]

The dependant entity is represented by a double-lined rectangle and the identifying relationship by a double-lined diamond

The identifying relationship is the one which relates the weak entity with the strong entity on which it
depends

Extended ER features (EER diagrams)

• Supertypes & Subtypes

• Sometimes different entity types are actually specializations of a more general entity type
– Example: Rose, jasmine, lotus, lily are all flowers

• Some attributes are common to all, others are specific to one entity type

• Represented by a generalization hierarchy

• Subtypes may be disjoint or overlapping


The attributes that are common belong to the supertype and those that are specific
belong to the particular subtype

Supertypes & Subtypes
[Diagram: supertype Flower (attribute: Colour) connected through subset symbols (U) to the subtypes Jasmine, Rose and Lotus, each with its own attributes]

This depicts disjoint subtypes, e.g. a particular flower that is a jasmine can belong only to the entity set Jasmine; it cannot also belong to the entity set Rose.

Overlapping Subtypes
[Diagram: supertype Artist (attributes: id, Gender, DOB, Address) connected through a circle marked "o" and subset symbols (U) to the subtypes Singer, Dancer and Actor, each with its own attributes]

The diagram depicts an overlapping subtype relationship, e.g. a person who is a singer could also be a dancer.

Case study

A banking scenario
• Banks have customers.
• Customers are identified by name, custid, phone number and address.
• Customers can have one or more accounts
• Accounts are identified by an account number, account type (savings, current)
and a balance.
• Customers can avail loans.
• Loans are identified by loan id, loan type (car, home, personal) and an amount.
• Banks are identified by a name, code and the address of the main office.
• Banks have branches.
• Branches are identified by a branch number, branch name and an address.
• Accounts and loans are related to the banks’ branches.
• Create an ER diagram for a database to represent this application


Note: The solution that follows is just a sample and not the only solution

Solution Step 1: Identify the entities

• Bank
• Branch
• Customer
• Account
• Loan

Solution Step 2: Identify attributes of entities


Bank        Branch         Customer
Name        Branch#        Name
Code        Branch name    Custid
Address     Address        Phone
                           Address

Account           Loan
Account number    Loan id
Account type      Loan type
Balance           Amount

Solution Step 3: Identify relationships between entities
• Bank has Branch
• Branch maintains accounts
• Branch offers loans
• Account is held by customer
• Loan is availed by customer

Solution Step 4: Analyze cardinality of relationships


• Bank has Branch: a bank has many branches -> 1:N
• Branch maintains accounts: one branch maintains many accounts -> 1:N
• Branch offers loans: one branch offers many loans -> 1:N
• Account is held by customer -> M:N
• Loan is availed by customer -> M:N

One account may be held by many customers (joint accounts) and one customer may hold many accounts, so the relationship is M:N.
One loan may be availed by many customers (joint holders) and one customer may avail many loans (car, housing etc.), so this relationship is also M:N.

Solution Step 5: Identify weak entities if any

Branch: Depends on strong entity Bank

Solution Step 6: Identify participation types


• Bank has Branch -> both total
• Branch maintains accounts -> Branch: partial, Account: total
• Branch offers loans -> Branch: partial, Loan: total
• Account is held by customer -> both total
• Loan is availed by customer -> Loan: total, Customer: partial

Represented diagrammatically


An ER diagram is typically read top to bottom and left to right

The cardinality, the participation type, the attribute type etc need to be expressed to the extent to
which it is required for the given problem.

Logical database design
Converting ER diagrams to relational schema

Logical database design


The process of converting the conceptual model into an equivalent representation in the implementation model (relational, hierarchical, network etc.).
We will focus on the relational model.

Relational database design


Convert ER model into relational schema (a specification of the table definitions and their foreign key
links)
There are well defined rules for this conversion

Converting Strong entity types
• Each entity type becomes a table

• Each single-valued attribute becomes a column

• Derived attributes are ignored

• Composite attributes are represented by components

• Multi-valued attributes are represented by a separate table

• The key attribute of the entity type becomes the primary key of the table

Entity example
• Here address is a composite attribute

• Years of service is a derived attribute (can be calculated from date of joining and current date)

• Skill set is a multi-valued attribute

• The relational Schema

Employee (E#, Name, Door_No, Street, City, Pincode, Date_Of_Joining)

Emp_Skillset( E#, Skillset)
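
The schema above could be realised roughly as follows. This is a minimal sketch using Python's built-in sqlite3 module, not part of the original material: the column types, the name E_No (standing in for E#, which is not a plain SQL identifier) and the helper names build_employee_schema and years_of_service are assumptions made for illustration.

```python
import sqlite3

def build_employee_schema():
    """Create Employee and Emp_Skillset in an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
    conn.executescript("""
        CREATE TABLE Employee (
            E_No            INTEGER PRIMARY KEY,  -- key attribute E#
            Name            TEXT NOT NULL,
            Door_No         TEXT,                 -- Door_No .. Pincode are the
            Street          TEXT,                 -- components of the composite
            City            TEXT,                 -- attribute Address
            Pincode         TEXT,
            Date_Of_Joining TEXT NOT NULL         -- ISO date, e.g. 2000-01-01
        );
        -- The multi-valued attribute Skill set gets its own table.
        CREATE TABLE Emp_Skillset (
            E_No     INTEGER NOT NULL REFERENCES Employee(E_No),
            Skillset TEXT NOT NULL,
            PRIMARY KEY (E_No, Skillset)
        );
    """)
    return conn

def years_of_service(conn, e_no, as_of):
    """Derived attribute: computed on demand instead of being stored."""
    row = conn.execute(
        "SELECT CAST((julianday(?) - julianday(Date_Of_Joining)) / 365.25 AS INTEGER) "
        "FROM Employee WHERE E_No = ?",
        (as_of, e_no),
    ).fetchone()
    return row[0]
```

Years of service never appears as a column; it is calculated from Date_Of_Joining whenever it is needed, which is what "derived attributes are ignored" amounts to in practice.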


As per the rules:

Derived attributes are ignored


Composite attributes are represented by components
Multi-valued attributes are represented by a separate table

Converting weak entity types
• Weak entity types are converted into a table of their own, with the primary key of the strong entity acting as a foreign key in the table

• This foreign key along with the key of the weak entity form the composite primary key of this table

The Relational Schema

Employee (E# ,…….)

Dependant (Employee, Dependant_ID, Name, Address)
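
A possible rendering of the weak-entity rule, sketched with Python's sqlite3 module. The column types, the name E_No for E#, and the ON DELETE CASCADE clause are assumptions added for illustration; the slides only specify the composite key.

```python
import sqlite3

def build_dependant_schema():
    """Employee (strong entity) and Dependant (weak entity) in memory."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE Employee (
            E_No INTEGER PRIMARY KEY,
            Name TEXT NOT NULL
        );
        CREATE TABLE Dependant (
            -- Foreign key to the owning (strong) entity.
            Employee     INTEGER NOT NULL
                         REFERENCES Employee(E_No) ON DELETE CASCADE,
            -- Key of the weak entity: unique only within one employee.
            Dependant_ID INTEGER NOT NULL,
            Name         TEXT,
            Address      TEXT,
            PRIMARY KEY (Employee, Dependant_ID)
        );
    """)
    return conn
```

Because the primary key is the pair (Employee, Dependant_ID), two employees can each have a dependant numbered 1, but the same employee cannot have two. The cascade reflects the idea that a dependant has no meaning once the owning employee is removed.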

Here Dependant is a weak entity: a dependant means nothing to the problem unless we know which employee the person depends on.

Converting relationships
• The way relationships are represented depends on the cardinality and the
degree of the relationship

• The possible cardinalities are 1:1, 1:N and M:N

• The degrees are unary, binary, ternary, …

Unary 1:1
• Consider employees who are married to each other

• The primary key field itself becomes a foreign key in the same table

• Employee( E#, Name,... Married_to)

Unary 1:N
• The primary key field itself will become a foreign key in the same table

• Same as unary 1:1

• Employee( E#, Name,…,Manager)
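
A self-referencing foreign key of this kind might look as follows in SQLite. This is a sketch under assumptions: E_No stands in for E#, the column types are chosen for illustration, and direct_reports is a hypothetical helper.

```python
import sqlite3

def build_reporting_schema():
    """Employee table whose Manager column points back at Employee."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute("""
        CREATE TABLE Employee (
            E_No    INTEGER PRIMARY KEY,
            Name    TEXT NOT NULL,
            -- NULL for an employee who has no manager.
            Manager INTEGER REFERENCES Employee(E_No)
        )
    """)
    return conn

def direct_reports(conn, manager_no):
    """Names of the employees managed by manager_no, ordered by E_No."""
    rows = conn.execute(
        "SELECT Name FROM Employee WHERE Manager = ? ORDER BY E_No",
        (manager_no,),
    )
    return [name for (name,) in rows]
```

For the unary 1:1 case (Married_to) the same pattern applies; adding a UNIQUE constraint on the foreign key column would be one way to keep it one-to-one.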

Unary M:N
[Diagram: unary relationship Guarantor_of on Employee, cardinality M:N]

• There will be two resulting tables. One to represent the entity and another to
represent the M:N relationship as follows

• Employee( E#, Name,…)

• Guaranty( Guarantor, beneficiary)

Binary 1:1

[Diagram: Employee (partial participation) 1 - head of - 1 Department (total participation)]

Case 1: Combination of participation types

• The primary key of the partial participant will become the foreign key of the
total participant

• Employee( E#, Name,…)

• Department (Dept#, Name…,Head)
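
One way to express this rule in SQLite is sketched below. The names E_No and Dept_No and the column types are assumptions; NOT NULL and UNIQUE are how this sketch encodes total participation and the 1:1 cardinality, which the slides state only in words.

```python
import sqlite3

def build_head_of_schema():
    """Department carries the foreign key because it participates totally."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE Employee (
            E_No INTEGER PRIMARY KEY,
            Name TEXT NOT NULL
        );
        CREATE TABLE Department (
            Dept_No INTEGER PRIMARY KEY,
            Name    TEXT NOT NULL,
            -- NOT NULL: every department has a head (total participation).
            -- UNIQUE:   an employee heads at most one department (1:1).
            Head    INTEGER NOT NULL UNIQUE REFERENCES Employee(E_No)
        );
    """)
    return conn
```

Placing the foreign key on the total side avoids NULLs: putting a Heads_Dept column in Employee instead would leave it empty for every employee who does not head a department.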

Binary 1:1

[Diagram: Employee - Sits_on - Chair]

Case 2: Uniform participation types

• The primary key of either of the participants can become a foreign key in the
other

• Employee (E#, Name, …)

• Chair (item#, model, location, used_by)

(or)

• Employee (E#, Name, …, Sits_on)

• Chair (item#, …)

Binary 1:N

[Diagram: Teacher 1 - teaches - N Subject]

• The primary key of the relation on the “1” side of the relationship becomes a foreign
key in the relation on the “N” side

• Teacher (ID, Name, Telephone, ...)

• Subject (Code, Name, ..., Teacher)

Binary M:N
[Diagram: Book M - Borrowed by - N Employee]

• A new table is created to represent the relationship

• Contains two foreign keys - one from each of the participants in the relationship

• The primary key of the new table is the combination of the two foreign keys

• Book (Acc#, Title, …)

• Employee (E#, Name, …)

• Issue (Book#, Borrower#)
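
The three tables could be declared as in the following sketch, which uses Python's sqlite3 module. Acc_No, E_No, Book_No and Borrower_No stand in for the # names on the slide, the column types are assumed, and titles_borrowed_by is a hypothetical helper showing how the relationship table is queried.

```python
import sqlite3

def build_library_schema():
    """Book, Employee and the relationship table Issue."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE Book     (Acc_No INTEGER PRIMARY KEY, Title TEXT NOT NULL);
        CREATE TABLE Employee (E_No   INTEGER PRIMARY KEY, Name  TEXT NOT NULL);
        -- One row per (book, borrower) pair in the M:N relationship.
        CREATE TABLE Issue (
            Book_No     INTEGER NOT NULL REFERENCES Book(Acc_No),
            Borrower_No INTEGER NOT NULL REFERENCES Employee(E_No),
            PRIMARY KEY (Book_No, Borrower_No)
        );
    """)
    return conn

def titles_borrowed_by(conn, e_no):
    """Titles borrowed by one employee, in alphabetical order."""
    rows = conn.execute(
        "SELECT b.Title FROM Issue i JOIN Book b ON b.Acc_No = i.Book_No "
        "WHERE i.Borrower_No = ? ORDER BY b.Title",
        (e_no,),
    )
    return [title for (title,) in rows]
```

With the key defined as on the slide, a given employee can appear only once per book; a design that must record repeated borrowings would need an extra key attribute such as an issue date, which is outside what the slide specifies.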

Ternary relationship
• Represented by a new table

• The new table contains three foreign keys, one from each of the participating entities

• The primary key of the new table is the combination of all three foreign keys

• Prescription (Doctor#, Patient#, Medicine_Name)
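
A sketch of the ternary case using Python's sqlite3 module. The three parent tables, their column names and types are assumptions, and the Dosage and Number_Of_Days columns are borrowed from the earlier "Attributes of a Relationship" slide to show where attributes of the relationship itself would live.

```python
import sqlite3

def build_prescription_schema():
    """Three entity tables plus the ternary relationship table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE Doctor   (Doctor_No  INTEGER PRIMARY KEY, Name TEXT);
        CREATE TABLE Patient  (Patient_No INTEGER PRIMARY KEY, Name TEXT);
        CREATE TABLE Medicine (Medicine_Name TEXT PRIMARY KEY);
        CREATE TABLE Prescription (
            Doctor_No      INTEGER NOT NULL REFERENCES Doctor(Doctor_No),
            Patient_No     INTEGER NOT NULL REFERENCES Patient(Patient_No),
            Medicine_Name  TEXT    NOT NULL REFERENCES Medicine(Medicine_Name),
            Dosage         TEXT,     -- attributes of the relationship,
            Number_Of_Days INTEGER,  -- not of any single entity
            PRIMARY KEY (Doctor_No, Patient_No, Medicine_Name)
        );
    """)
    return conn
```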

Normalization

A well structured table
• Well-structured table - contains minimal redundancy and allows users to insert, modify,
and delete the rows without errors or inconsistencies.

• The possible anomalies could be

• Insertion Anomalies
• Deletion Anomalies
• Modification Anomalies

• An example table:

Project#   Project_name   Member#   Member Name   Member Address


Insertion Anomalies
Experienced when we attempt to store a value for one field but cannot do so because the value of another field is unknown
e.g., a new member's details cannot be added until he/she is assigned to a project

Deletion Anomalies
Experienced when the value of one field is unexpectedly lost because the value of another field is deleted
e.g., if a project is dropped, its details cannot be deleted without also deleting the only copy of the information about a member who works only on that project

Modification Anomalies
Experienced when several records of a table have to be changed in order to update a single value of a field
e.g., to change a member's address, it has to be changed in every row for each project he/she is a member of

Normalization
• The formal process that can be followed to achieve a good database design

• Also used to check that an existing design is of good quality

• The different stages of normalization are known as “normal forms”


To understand this, we need to understand the concept of functional dependency

Functional dependency
• An attribute Y of a relation schema R is functionally dependent on another attribute X of R if the value of X uniquely determines the value of Y. This is written X → Y

• We say “X determines Y” or “Y is functionally dependent on X”

• X → Y does not imply that Y → X

• If the value of the attribute “Marks” is known, then the value of the attribute “Grade” is determined, since Marks → Grade

• Types of functional dependencies:

– Full dependency
– Partial dependency
– Transitive dependency
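
The definition of X → Y can be tested against sample rows: the dependency fails as soon as one X value is paired with two different Y values. The sketch below is illustrative only; the function name fd_holds and the sample data are invented for this example. Note that a sample can only refute a dependency; whether it holds for all possible data is a design decision.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if the dependency lhs -> rhs holds in `rows`.

    rows: list of dicts, one per tuple; lhs, rhs: lists of attribute names.
    """
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # setdefault stores y for a new x, or returns the y stored earlier.
        if seen.setdefault(x, y) != y:
            return False  # same X value paired with two different Y values
    return True

# Hypothetical sample data for the Marks -> Grade example.
results = [
    {"Student": "S1", "Marks": 92, "Grade": "A"},
    {"Student": "S2", "Marks": 92, "Grade": "A"},
    {"Student": "S3", "Marks": 75, "Grade": "B"},
    {"Student": "S4", "Marks": 81, "Grade": "A"},
]
```

On this data fd_holds(results, ["Marks"], ["Grade"]) is True, while fd_holds(results, ["Grade"], ["Marks"]) is False because grade A occurs with both 92 and 81, which illustrates that X → Y does not imply Y → X.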

Full dependencies

An attribute B of a relation R is fully functionally dependent on a set of attributes A of R if it is functionally dependent on A and not functionally dependent on any proper subset of A.

Report (S#, C#, Title, LName, Room#, Marks)

{S#, C#} → Marks

This implies that for a given pair of (S#, C#) values occurring in the relation Report there is exactly one value of Marks, i.e. Marks depends on S# and C# as a composite pair, but not on either individually.

Partial dependencies
An attribute B of a relation R is partially dependent on a set of attributes A of R if it is functionally dependent on some proper subset of A.

Report (S#, C#, Title, LName, Room#, Marks)

C# → Title

C# → LName


The attributes Title, LName are said to be partially dependent on the key (S#, C#) since they are
dependent only on C# and not on S#.

Transitive dependencies

An attribute B of a relation R is transitively dependent on attribute A of R if it is functionally dependent on an attribute C, which in turn is functionally dependent on A or on a proper subset of A.

Report (S#, C#, Title, LName, Room#, Marks)

C# → LName, LName → Room#

The attribute Room# is said to be transitively dependent on C# (which is part of the key) since it depends on LName, which in turn depends on C#.

First normal form: 1NF
• A relation schema is in 1NF if all of its attributes are:
• single-valued
• restricted to assuming atomic values,
• functionally dependent on the primary key

• 1NF implies:
• Composite attributes are represented only by their component attributes

• Attributes cannot have multiple values

• Attributes cannot have complete tuples as values


In relational database design it is not practically possible to have a table which is not in 1NF.

Prime Vs Non-Prime Attributes

• An attribute of a relation R that belongs to any key of R is said to be a prime attribute, and one that doesn’t is a non-prime attribute

– E.g. Report (S#, C#, Title, LName, Room#, Marks)

• S# is a prime attribute
• C# is a prime attribute
• Title is a non-prime attribute

Second normal form: 2NF
• A relation schema R is in 2NF if it is in 1NF and every non-prime attribute is
fully functionally dependent on every key of R

Consider the relational schema:


Empdetails( E#, Project#, Role, Number_Of_shares, Share_worth)

In this,
E#, Project# -> Role

E# -> Number_Of_shares

Number_Of_shares depends only on E#, irrespective of the project the employee is currently working on, i.e. it is a partial dependency on the key (E#, Project#).

A typical snapshot may look like…

E#    Project#   Role   Num_of_shares   Share_worth
102   Abc        DV     100             5,00,000
119   Ppq        ML     150             7,50,000
198   Abc        DV     100             5,00,000
102   Hjk        ML     100             5,00,000
127   Edf        DV     200             10,00,000
102   Bnm        ML     100             5,00,000

The share worth is fixed for a given number of shares, yet it is repeated in every row.
The stock details are repeated for every project an employee works on, so if more shares are allotted to the employee, multiple rows have to be updated.
If an employee has not yet been allotted to any project, the remaining information cannot be captured.

After decomposing…
Empdetails( E#, Project#, Role, Number_Of_shares, Share_worth)

Becomes

Emp_Project (E#, P#, Role)

Emp_Stock (E#, Num_of_Shares, Share_worth)

• This avoids the anomalies that were present in the original relation

• Still redundancy remains due to the transitive dependency

E#->Num_of_shares

Num_of_shares->Share_worth
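
The effect of this decomposition can be checked on the snapshot shown earlier. The sketch below uses plain Python sets; the helper name project and the use of tuples for rows are assumptions made for illustration, and share worth is written without digit grouping.

```python
snapshot = [
    # (E#, Project#, Role, Num_of_shares, Share_worth)
    (102, "Abc", "DV", 100, 500000),
    (119, "Ppq", "ML", 150, 750000),
    (198, "Abc", "DV", 100, 500000),
    (102, "Hjk", "ML", 100, 500000),
    (127, "Edf", "DV", 200, 1000000),
    (102, "Bnm", "ML", 100, 500000),
]

def project(rows, indexes):
    """Relational projection: keep the given columns, drop duplicate rows."""
    return {tuple(row[i] for i in indexes) for row in rows}

emp_project = project(snapshot, [0, 1, 2])  # Emp_Project (E#, P#, Role)
emp_stock = project(snapshot, [0, 3, 4])    # Emp_Stock (E#, Num_of_Shares, Share_worth)

# A natural join on E# rebuilds the original rows, so no information is lost.
rejoined = {
    (e, p, role, shares, worth)
    for (e, p, role) in emp_project
    for (e2, shares, worth) in emp_stock
    if e == e2
}
```

Emp_Stock shrinks from six rows to four because the stock details of employee 102 are now stored once instead of once per project, and rejoined equals the original set of rows.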

Third normal form: 3NF

A relation schema R is in 3NF if it is in 2NF and every non-prime attribute is non-transitively dependent on every key of R

• Applying this, the relation


Emp_Stock (E#, Num_of_Shares,Share_worth)

Will be decomposed into

Emp_Stock( E#, Num_of_Shares)


StockWorth (Num_of_Shares,Share_worth)
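
The benefit of this split can be seen by storing the two relations in SQLite and changing one share worth. This is a sketch under assumptions: E_No stands in for E#, the column types are chosen for illustration, the rows come from the earlier snapshot, and worth_of is a hypothetical helper.

```python
import sqlite3

def build_stock_schema():
    """Emp_Stock and StockWorth, loaded with the rows from the snapshot."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE StockWorth (
            Num_of_Shares INTEGER PRIMARY KEY,
            Share_worth   INTEGER NOT NULL
        );
        CREATE TABLE Emp_Stock (
            E_No          INTEGER PRIMARY KEY,
            Num_of_Shares INTEGER NOT NULL REFERENCES StockWorth(Num_of_Shares)
        );
        INSERT INTO StockWorth VALUES (100, 500000), (150, 750000), (200, 1000000);
        INSERT INTO Emp_Stock  VALUES (102, 100), (119, 150), (198, 100), (127, 200);
    """)
    return conn

def worth_of(conn, e_no):
    """Share worth of one employee, looked up through a join."""
    row = conn.execute(
        "SELECT w.Share_worth FROM Emp_Stock e "
        "JOIN StockWorth w ON w.Num_of_Shares = e.Num_of_Shares "
        "WHERE e.E_No = ?",
        (e_no,),
    ).fetchone()
    return row[0]
```

Updating the single StockWorth row for 100 shares changes the worth seen for both employee 102 and employee 198, so the modification anomaly caused by the transitive dependency no longer arises.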

Boyce-Codd normal form: BCNF

A relation R is in BCNF if, for every non-trivial functional dependency A->B in it, A is a superkey of R. In other words, every determinant is a candidate key.

• BCNF is a stronger form of 3NF

• 3NF states that every non-prime attribute must be non-transitively dependent on every key

• BCNF states that every attribute (prime or non-prime) must be non-transitively dependent on every key

An example

• Consider the relation:


– Courses (Dept#, Course#, Lecturer#, Num_Students)
• Assumptions:
– Each Department offers many courses
– Course# is unique within a Department only*
– Each Lecturer belongs to one Dept only
– Each Lecturer may handle several courses within the dept.
– A particular course offered by a department is handled by a single lecturer.


* The same course id may refer to a different course offered by a different department

The functional dependencies
• {Dept#,Course#} -> Lecturer#

• {Dept#,Course#} -> Num_Students

• {Lecturer#,Course#} -> Num_Students

• Lecturer# -> Dept#

The candidate keys are:

• {Dept#,Course#}

• {Lecturer#,Course#}
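
The candidate keys listed above can be derived mechanically by computing attribute closures. The following is a minimal sketch, not part of the original material: the attribute names drop the # suffix, the function names are invented, and the dependency Lecturer -> Dept is taken from the stated assumption that each lecturer belongs to one department only.

```python
from itertools import combinations

ATTRS = frozenset({"Dept", "Course", "Lecturer", "Num_Students"})
FDS = [
    (frozenset({"Dept", "Course"}), frozenset({"Lecturer"})),
    (frozenset({"Dept", "Course"}), frozenset({"Num_Students"})),
    (frozenset({"Lecturer", "Course"}), frozenset({"Num_Students"})),
    (frozenset({"Lecturer"}), frozenset({"Dept"})),
]

def closure(attrs, fds):
    """All attributes determined by `attrs` under the dependencies `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)

def candidate_keys(attrs, fds):
    """Minimal attribute sets whose closure is the whole relation."""
    keys = []
    # Try smaller sets first so that supersets of a key can be skipped.
    for size in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            candidate = frozenset(combo)
            if any(key < candidate for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(candidate, fds) == attrs:
                keys.append(candidate)
    return keys
```

Calling candidate_keys(ATTRS, FDS) yields exactly the two keys on the slide: {Dept, Course} and {Lecturer, Course}. The exhaustive search is fine for a four-attribute relation but grows exponentially with the number of attributes.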

A sample table
Dept# Course# Lecturer# Num_students

D1 C1 L1 20
D1 C2 L1 15
D1 C3 L2 42
… … … …
D2 C5 L3 12
D2 C6 L4 19

Observations
• In the table, the only non-prime attribute is Num_Students.

• It depends on every key of the table non-transitively

• So, the table is in 3NF

• But the fact that Lecturer L1 belongs to department D1 is repeated: redundancy

• The cause is Lecturer# -> Dept#: the determinant Lecturer# is only part of the key {Lecturer#,Course#}, so Dept# depends on that key only partially, and the table is not in BCNF

The solution
• Course_Offering(Lecturer#, Course#, Num-of-Students)

• Lecturer(Lecturer#, Dept#)
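
The BCNF condition (every determinant must be a superkey) can be checked with an attribute-closure test. The sketch below is illustrative: the function and constant names are invented, attribute names drop the # suffix, and the dependencies given for the two decomposed relations were worked out by hand from the original list rather than computed.

```python
def closure(attrs, fds):
    """All attributes determined by `attrs` under the dependencies `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def bcnf_violations(attrs, fds):
    """Non-trivial dependencies whose left side is not a superkey of attrs."""
    attrs = set(attrs)
    violations = []
    for lhs, rhs in fds:
        if set(rhs) <= set(lhs):
            continue  # trivial dependency, always allowed
        if not attrs <= closure(lhs, fds):
            violations.append((lhs, rhs))
    return violations

COURSES = ("Dept", "Course", "Lecturer", "Num_Students")
COURSES_FDS = [
    (("Dept", "Course"), ("Lecturer",)),
    (("Dept", "Course"), ("Num_Students",)),
    (("Lecturer", "Course"), ("Num_Students",)),
    (("Lecturer",), ("Dept",)),
]
COURSE_OFFERING = ("Lecturer", "Course", "Num_Students")
COURSE_OFFERING_FDS = [(("Lecturer", "Course"), ("Num_Students",))]
LECTURER = ("Lecturer", "Dept")
LECTURER_FDS = [(("Lecturer",), ("Dept",))]
```

For the original relation, bcnf_violations(COURSES, COURSES_FDS) reports the single dependency Lecturer -> Dept; for Course_Offering and Lecturer it returns an empty list.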

In both relations every determinant is a candidate key, so the design is in BCNF.

Denormalization

• Denormalization can be described as a process for reducing the degree of normalization with the aim of improving query processing performance

• However, reducing the degree of normalization of a table may lead to inconsistencies, so this option should be taken only after careful thought

• The usefulness of denormalization is therefore debatable

[Figure: the normalised tables, and the same data after denormalization]


Denormalization may not always give an optimal solution

Exercise- Recruitment
• The HR dept of an organization is planning for a big recruitment drive.
• They wish to organize the data required for the process, in a database. The data that
needs to be captured is as follows:

Applicant        Interviewer        Qualifications
Enrollment id    Employee id        Qualification (hsc, sslc, etc.)
Name             Name               Year of passing
Address          Extension number   University/board
Date of birth                       Class
Gender
Telephone no

Previous employment
Employer name
Address
Telephone
Designation
Reason for leaving
Date joined
Date left
Last drawn salary

Functional dependencies
• Enroll# -> Name
• Enroll# -> Address
• Enroll# -> DOB
• Enroll# -> Gender
• Enroll# -> Phone
• Enroll# -> Interviewer
• Interviewer -> Int_Name (transitive dependency)
• Interviewer -> Extension (transitive dependency)

Qualifications
{Enroll#, qualification, year_of_passing} -> awarded_by
{Enroll#, qualification, year_of_passing} -> class
Assumptions:
A person may acquire the same qualification several times from the same university (e.g. M.A. in English, M.A. in History)
Only one degree can be obtained in a year

Functional dependencies
Employment
{Enroll#, Employername, date_joined} -> designation
{Enroll#, Employername, date_joined} -> reason_for_Leaving
{Enroll#, Employername, date_joined} -> date_left
{Enroll#, Employername, date_joined} -> last_salary
Employername -> address (partial dependency)
Employername -> telephone (partial dependency)

1NF
• Applicant( Enroll#, Name, Address, DOB, Gender, Phone, interviewer,
Int_Name, Extension)

• Qualifications( Enroll#, qualification, year_of_passing, awarded_by, class)

• Employment( Enroll#, Employername, date_joined, address, telephone, designation, reason_for_Leaving, date_left, last_salary)

2NF
• Applicant( Enroll#, Name, Address, DOB, Gender, Phone, interviewer,
Int_Name, Extension)

• Qualifications( Enroll#, qualification, year_of_passing, awarded_by ,class)

• Employment( Enroll#, Employername, date_joined, designation, reason_for_Leaving, date_left, last_salary)

• Employer( Employername, Address, Phone)

Removal of partial dependencies


3NF
• Applicant( Enroll#, Name, Address, DOB, Gender, Phone, interviewer)

• Panel( interviewer, Name, Extn)

Removal of transitive dependencies
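
Putting the exercise together, the relations that remain after the 2NF and 3NF steps could be declared as in the following sketch. It uses Python's sqlite3 module; the column types, the name Enroll_No for Enroll#, the name Class_Awarded for class, and the NOT NULL choices are assumptions made for illustration.

```python
import sqlite3

RECRUITMENT_DDL = """
CREATE TABLE Panel (
    Interviewer INTEGER PRIMARY KEY,   -- employee id of the interviewer
    Name        TEXT NOT NULL,
    Extn        TEXT
);
CREATE TABLE Applicant (
    Enroll_No   INTEGER PRIMARY KEY,
    Name        TEXT NOT NULL,
    Address     TEXT,
    DOB         TEXT,
    Gender      TEXT,
    Phone       TEXT,
    Interviewer INTEGER REFERENCES Panel(Interviewer)
);
CREATE TABLE Qualifications (
    Enroll_No       INTEGER NOT NULL REFERENCES Applicant(Enroll_No),
    Qualification   TEXT    NOT NULL,
    Year_Of_Passing INTEGER NOT NULL,
    Awarded_By      TEXT,
    Class_Awarded   TEXT,
    PRIMARY KEY (Enroll_No, Qualification, Year_Of_Passing)
);
CREATE TABLE Employer (
    Employername TEXT PRIMARY KEY,
    Address      TEXT,
    Phone        TEXT
);
CREATE TABLE Employment (
    Enroll_No          INTEGER NOT NULL REFERENCES Applicant(Enroll_No),
    Employername       TEXT    NOT NULL REFERENCES Employer(Employername),
    Date_Joined        TEXT    NOT NULL,
    Designation        TEXT,
    Reason_For_Leaving TEXT,
    Date_Left          TEXT,
    Last_Salary        INTEGER,
    PRIMARY KEY (Enroll_No, Employername, Date_Joined)
);
"""

def build_recruitment_schema():
    """Create the five normalised relations in an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(RECRUITMENT_DDL)
    return conn
```

Each determinant from the dependency lists (Enroll#, Interviewer, Employername and the two composite keys) ends up as the primary key of exactly one table, which is the practical outcome of removing the partial and transitive dependencies.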

Class work

Caterers’ association
• A caterers’ association in a city wants to build a database of all the caterers who are members of the association. Every type of item has an item code, e.g. idly: bf1, dosa: bf2, aloo paratha: bf3, south Indian meals: ln1, etc. Each caterer may have outlets spread over the city. Not every caterer, and not every branch of a single caterer, need provide all item types. Design the relational schema for the above requirement. Normalize the relations.

The data
• Unnormalized Data Items for Caterer’s Association
Caterer name
Membership id
main location
Branches 1..n
item ID 1...n
item Name 1...n
location where available 1...n
Caterer grade

Summary
• Entity types can participate in relationships totally or partially
• Extended ER features represent generalization-specialization hierarchies etc.
• There are guidelines for converting entities, relationships and attributes into their equivalents in the relational model
• Functional dependencies can be full, partial, or transitive
• If a table contains atomic values then it is in 1NF
• If a table is in 1NF and also free of partial dependencies, it is in 2NF
• If a table is in 2NF and is free of transitive dependencies then it is in 3NF
• If a table is in 3NF and every determinant is a candidate key, then it is in BCNF

Thank You!
