
Please read this disclaimer before proceeding:

This document is confidential and intended solely for the educational purpose of
RMK Group of Educational Institutions. If you have received this document
through email in error, please notify the system manager. This document
contains proprietary information and is intended only to the respective group /
learning community as intended. If you are not the addressee you should not
disseminate, distribute or copy through e-mail. Please notify the sender
immediately by e-mail if you have received this document by mistake and delete
this document from your system. If you are not the intended recipient you are
notified that disclosing, copying, distributing or taking any action in reliance on
the contents of this information is strictly prohibited.
20IT403
DATABASE MANAGEMENT SYSTEMS

Department: IT
Batch/Year:2020 – 2024/II
Created by:
Ms.R.ASHA
Assistant Professor
Date:09.03.2022
1.TABLE OF CONTENTS

1. Contents
2. Course Objectives

3. Pre Requisites

4. Syllabus

5. Course outcomes

6. CO- PO/PSO Mapping

7. Lecture Plan

8. Activity based learning

9. Lecture Notes

10. Assignments

11. Part A Question & Answer

12. Part B Question & Answer

13. Supportive online Certification courses

14. Real time Applications in day to day life and to Industry

15. Contents beyond the Syllabus

16. Assessment Schedule

17. Prescribed Text Books & Reference Books

18. Mini Project suggestions


2. COURSE OBJECTIVES

To understand the basic concepts of Data modeling and Database Systems.

To understand SQL and effective relational database design concepts.

To know the fundamental concepts of transaction processing, concurrency control
techniques and recovery procedure.

To understand efficient data querying and updates, with needed configuration.

To learn how to efficiently design and implement various database objects and
entities.

3. PRE REQUISITES

• 20GE101 Problem Solving and C Programming

• 20CS201 Data Structures

4. SYLLABUS
DATABASE MANAGEMENT SYSTEMS

UNIT I DATABASE CONCEPTS 9

Concept of Database and Overview of DBMS - Characteristics of databases, Database


Language, Types of DBMS architecture – Three-Schema Architecture - Introductions
to data models types- ER Model- ER Diagrams Extended ER Diagram reducing ER to
table Applications: ER model of University Database Application.
SQL fundamentals Views - Integrity Procedures, Functions, Cursor and Triggers
Embedded SQL Dynamic SQL.

UNIT II DATABASE DESIGN 9


Design a DB for Car Insurance Company - Draw ER diagram and convert ER model to
relational schema. Evaluating data model quality - The relational Model Schema Keys-
Relational Algebra Domain Relational Calculus- Tuple Relational Calculus -
Fundamental operations. Relational Database Design and Querying Undesirable
Properties of Relations Functional Dependency: Closures- Single Valued Dependency
Single valued Normalization (1NF, 2NF 3NF and BCNF) - Desirable properties of
Decompositions 4NF - 5NF De-normalization

UNIT III TRANSACTIONS 9

Transaction Concepts – ACID Properties – Schedules – Serializability – Concurrency


Control – Need for Concurrency – Locking Protocols – Two Phase Locking – Deadlock
– Transaction Recovery - Save Points – Isolation Levels – SQL Facilities for
Concurrency and Recovery

UNIT IV DATA STORAGE AND QUERYING 9


RAID – File Organization – Organization of Records in Files – Indexing and Hashing
–Ordered Indices – B+ tree Index Files – B tree Index Files – Static Hashing –
Dynamic Hashing – Overview of physical storage structure- stable storage, failure
classification -log based recovery, deferred database modification, check-pointing-
File Structures:-Index structures-Primary, Secondary and clustering indices. Single
and multilevel indexing.

Query Processing Overview – Algorithms for SELECT and JOIN operations – Query
optimization using Heuristics and Cost Estimation

UNIT V ADVANCED TOPICS 9

Distributed database Implementation Concurrent transactions - Concurrency control


Lock based Time stamping-Validation based. NoSQL, NoSQL Categories - Designing
an enterprise database system - Client Server database
5. COURSE OUTCOMES

CO1: Implement SQL and effective relational database design concepts.

CO2: Map ER model to Relational model to perform database design effectively.

CO3: Compare and contrast various indexing strategies in different database
systems.

CO4: Implement queries using normalization criteria and optimization techniques.

CO5: Analyse how advanced databases differ from traditional databases.

CO6: Design and deploy an efficient and scalable data storage node for varied kinds of
application requirements.
6. CO- PO/PSO MAPPING

PO1 PO2 PO3 PO4 PO5 PO6 PO7 PO8 PO9 PO10 PO11 PO12
CO1 2 1 1 1 1 1 1 2 2 2 2 2
CO2 3 2 2 1 1 1 1 2 2 2 2 2
CO3 2 1 1 1 1 1 1 2 2 2 2 2
CO4 2 1 1 1 1 1 1 2 2 2 2 2
CO5 2 1 1 1 1 1 1 2 2 2 2 2
CO6 2 1 1 1 1 1 1 2 2 2 2 2

PSO1 PSO2 PSO3


CO1 2 2 2
CO2 2 3 2
CO3 2 2 2
CO4 2 2 2
CO5 2 2 2
CO6 2 2 2
8. ACTIVITY BASED LEARNING

Peer review on database design using relational model.


9. LECTURE NOTES
UNIT - II

Design a DB for Car Insurance Company


Construct an E-R diagram for a car insurance company whose customers own one or
more cars each. Each car has associated with it zero to any number of recorded
accidents. Each insurance policy covers one or more cars and has one or more
premium payments associated with it. Each payment is for a particular period of time,
and has an associated due date, and the date when the payment was received.

Entity Types

Customer

Car

Insurance policy

Accident

Payment

Entity Type: Customer

Attributes: cust_id, cust_name, cust_address and cust_phone

Primary key: cust_id

Entity Type: Car

Attributes: car_num, make, color and year

Primary key: car_num

Entity Type: Accident

Attributes: report_id, location, date and damage_costs

Primary key: report_id


Entity Type: Insurance policy

Attributes: policy_num, policy_name and coverage

Primary key: policy_num

Entity Type: Payment

Attributes: payment_id, payment_date, due_date and amount

Primary key: payment_id

Relationships between entity types

Relationship set: Owns

Participating entity types: customer and car. Customer owns a car.

Mapping cardinality: one to many

Relationship set: covers

Participating entity types: car and insurance_policy

Mapping cardinality: one to many

Relationship set: meets

Participating entity types: car and accident

Mapping cardinality: one to many

Relationship set: Payments

Participating entity types: Insurance_policy and premium payments

Mapping cardinality: one to many


convert ER model to relational schema
Mapping strong entity types

Relation: Customer
Cust_Id cust_name cust_address cust_phone

Relation: Car

car_num Make Color year Policy_num Cust_id

Relation: Accident

Report_id Location Date Car_num Damage_cost

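Relation: Insurance policy

Policy_num Policy_name coverage

Relation: Payment

Payment_id Payment_date Due_date amount Policy_num

Because one insurance policy has many payments, the foreign key Policy_num is
placed in the Payment relation (the "many" side of the relationship).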
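The relational schema above can be tried out with Python's built-in sqlite3 module. This is only a sketch: the column types are assumptions (the notes do not give them), and the foreign key for the Payments relationship is assumed to sit in the payment table, on the "many" side.

```python
import sqlite3

# Hypothetical SQLite rendering of the car-insurance relations.
# Column types are assumed; only the attribute names come from the notes.
SCHEMA = """
CREATE TABLE customer (
    cust_id      INTEGER PRIMARY KEY,
    cust_name    TEXT NOT NULL,
    cust_address TEXT,
    cust_phone   TEXT
);
CREATE TABLE insurance_policy (
    policy_num  INTEGER PRIMARY KEY,
    policy_name TEXT,
    coverage    TEXT
);
CREATE TABLE car (
    car_num    TEXT PRIMARY KEY,
    make       TEXT,
    color      TEXT,
    year       INTEGER,
    policy_num INTEGER REFERENCES insurance_policy(policy_num),
    cust_id    INTEGER REFERENCES customer(cust_id)
);
CREATE TABLE accident (
    report_id   INTEGER PRIMARY KEY,
    location    TEXT,
    date        TEXT,
    car_num     TEXT REFERENCES car(car_num),
    damage_cost REAL
);
CREATE TABLE payment (
    payment_id   INTEGER PRIMARY KEY,
    payment_date TEXT,
    due_date     TEXT,
    amount       REAL,
    policy_num   INTEGER REFERENCES insurance_policy(policy_num)
);
"""


def build_database():
    """Create an in-memory database with foreign-key checking switched on."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


conn = build_database()
tables = sorted(
    row[0] for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
)
print(tables)
```

Each one-to-many relationship (owns, covers, meets, payments) appears as a foreign key column in the table on the "many" side.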


Evaluating data model quality

The following guidelines are often critical to the success of database design:

• Work interactively with the users as much as possible.

• Follow a structured methodology throughout the data modeling process.

• Employ a data-driven approach.

• Incorporate structural and integrity considerations into the data models.

• Combine conceptualization, normalization, and transaction validation techniques
into the data modeling methodology.

• Use diagrams to represent as much of the data models as possible.

• Use a Database Design Language (DBDL) to represent additional data semantics
that cannot easily be represented in a diagram.

• Build a data dictionary to supplement the data model diagrams and the DBDL.

• Be willing to repeat steps.


The relational Model Schema Keys
There are generally many restrictions or constraints on the actual values in a
database state. These constraints are derived from the rules in the miniworld that
the database represents.

Constraints on databases can generally be divided into three main categories:


1. Constraints that are inherent in the data model. We call these inherent model-based
constraints or implicit constraints.

2. Constraints that can be directly expressed in the schemas of the data model,
typically by specifying them in the DDL. We call these schema-based constraints
or explicit constraints.

3. Constraints that cannot be directly expressed in the schemas of the data model,
and hence must be expressed and enforced by the application programs or in
some other way. We call these application-based or semantic constraints or
business rules.

Another important category of constraints is data dependencies, which include
functional dependencies and multivalued dependencies. They are used mainly for
testing the "goodness" of the design of a relational database and are utilized in a
process called normalization.

The schema-based constraints include domain constraints, key constraints,
constraints on NULLs, entity integrity constraints, and referential integrity
constraints.
Domain Constraints

Domain constraints specify that within each tuple, the value of each attribute A
must be an atomic value from the domain dom(A).
The data types associated with domains typically include standard numeric data
types for integers (such as short integer, integer, and long integer) and real
numbers (float and double-precision float). Characters, Booleans, fixed-length
strings, and variable-length strings are also available, as are date, time,
timestamp, and other special data types.

Domains can also be described by a subrange of values from a data type or as an
enumerated data type in which all possible values are explicitly listed.

Key Constraints and Constraints on NULL Values

A relation is defined as a set of tuples. By definition, all elements of a set are
distinct; hence, all tuples in a relation must also be distinct.

This means that no two tuples can have the same combination of values for all
their attributes.

Usually, there are other subsets of attributes of a relation schema R with the
property that no two tuples in any relation state r of R should have the same
combination of values for these attributes. Suppose that we denote one such
subset of attributes by SK; then for any two distinct tuples t1 and t2 in a relation
state r of R, we have the constraint that:

t1[SK] ≠ t2[SK]
Any such set of attributes SK is called a superkey of the relation schema R. A
superkey SK specifies a uniqueness constraint that no two distinct tuples in any
state r of R can have the same value for SK.

Every relation has at least one default superkey— the set of all its attributes. A
superkey can have redundant attributes, however, so a more useful concept is that
of a key, which has no redundancy.

A key K of a relation schema R is a superkey of R with the additional property that
removing any attribute A from K leaves a set of attributes K′ that is not a superkey of
R any more.

Hence, a key satisfies two properties:

1. Two distinct tuples in any state of the relation cannot have identical values for (all)
the attributes in the key. This uniqueness property also applies to a superkey.

2. It is a minimal superkey, that is, a superkey from which we cannot remove any
attributes and still have the uniqueness constraint hold. This minimality property is
required for a key but is optional for a superkey.

Hence, a key is a superkey but not vice versa. A superkey may be a key (if it is
minimal) or may not be a key (if it is not minimal).

Consider the STUDENT relation. The attribute set {Ssn} is a key of STUDENT
because no two student tuples can have the same value for Ssn. Any set of
attributes that includes Ssn, for example {Ssn, Name, Age}, is a superkey.

However, the superkey {Ssn, Name, Age} is not a key of STUDENT because
removing Name or Age or both from the set still leaves us with a superkey. In
general, any superkey formed from a single attribute is also a key.

A key with multiple attributes must require all its attributes together to have the
uniqueness property.
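The difference between a superkey and a key can be checked mechanically on a sample relation state. The sketch below is illustrative only: the STUDENT tuples are hypothetical, and passing the check on one state shows only that the constraint is not violated there, since a key is a property of the schema (every valid state), not of a single state.

```python
from itertools import combinations


def is_superkey(rows, attrs):
    """True if no two tuples in rows agree on all attributes in attrs."""
    seen = set()
    for row in rows:
        value = tuple(row[a] for a in attrs)
        if value in seen:
            return False
        seen.add(value)
    return True


def is_key(rows, attrs):
    """A key is a superkey none of whose proper subsets is a superkey."""
    if not is_superkey(rows, attrs):
        return False
    return not any(
        is_superkey(rows, subset)
        for size in range(len(attrs))
        for subset in combinations(attrs, size)
    )


# Hypothetical STUDENT state (values invented for illustration).
students = [
    {"Ssn": "111", "Name": "Ann", "Age": 19},
    {"Ssn": "222", "Name": "Bob", "Age": 19},
    {"Ssn": "333", "Name": "Ann", "Age": 20},
]

print(is_superkey(students, ("Ssn", "Name", "Age")))  # True
print(is_key(students, ("Ssn", "Name", "Age")))       # False: Ssn alone suffices
print(is_key(students, ("Ssn",)))                     # True
```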
In general, a relation schema may have more than one key. In this case, each of
the keys is called a candidate key.

It is common to designate one of the candidate keys as the primary key of the
relation.

When a relation schema has several candidate keys, the choice of one to become
the primary key is somewhat arbitrary; however, it is usually better to choose a
primary key with a single attribute or a small number of attributes.

The other candidate keys are designated as unique keys and are not underlined.
Another constraint on attributes specifies whether NULL values are or are not
permitted.

For example, if every STUDENT tuple must have a valid, non-NULL value for the
Name attribute, then Name of STUDENT is constrained to be NOT NULL.

Entity Integrity, Referential Integrity, and Foreign Keys


The entity integrity constraint states that no primary key value can be NULL.
This is because the primary key value is used to identify individual tuples in a
relation.

Key constraints and entity integrity constraints are specified on individual relations.

The referential integrity constraint is specified between two relations and is
used to maintain the consistency among tuples in the two relations.

Informally, the referential integrity constraint states that a tuple in one relation
that refers to another relation must refer to an existing tuple in that relation.

The attribute Dno of EMPLOYEE gives the department number for which each
employee works; hence, its value in every EMPLOYEE tuple must match the
Dnumber value of some tuple in the DEPARTMENT relation.
A set of attributes FK in relation schema R1 is a foreign key of R1 that references
relation R2 if it satisfies the following rules:

1. The attributes in FK have the same domain(s) as the primary key attributes PK of
R2; the attributes FK are said to reference or refer to the relation R2.

2. A value of FK in a tuple t1 of the current state r1(R1) either occurs as a value of PK
for some tuple t2 in the current state r2(R2) or is NULL.

In the former case, we have t1[FK] = t2[PK], and we say that the tuple t1 references
or refers to the tuple t2.

In this definition, R1 is called the referencing relation and R2 is the referenced
relation. If these two conditions hold, a referential integrity constraint from R1 to R2
is said to hold.

Another class of general constraints, sometimes called semantic integrity
constraints, are not part of the DDL and have to be specified and enforced in a
different way.
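Rule 2 of the foreign key definition can be expressed as a small check. The following is a sketch under assumed data: the EMPLOYEE and DEPARTMENT tuples are hypothetical, and NULL is represented by Python's None.

```python
def dangling_references(referencing, fk, referenced, pk):
    """Return the tuples whose FK value is neither NULL nor an existing PK value."""
    pk_values = {row[pk] for row in referenced}
    return [
        row for row in referencing
        if row[fk] is not None and row[fk] not in pk_values
    ]


# Hypothetical relation states (values invented for illustration).
department = [{"Dnumber": 1}, {"Dnumber": 4}, {"Dnumber": 5}]
employee = [
    {"Ssn": "123", "Dno": 5},     # refers to an existing department
    {"Ssn": "456", "Dno": None},  # NULL is allowed by rule 2
    {"Ssn": "789", "Dno": 7},     # no department 7: violates the constraint
]

bad = dangling_references(employee, "Dno", department, "Dnumber")
print(bad)  # [{'Ssn': '789', 'Dno': 7}]
```

The referential integrity constraint from EMPLOYEE to DEPARTMENT holds in a state exactly when this list is empty.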

Examples of such constraints are "the salary of an employee should not exceed the
salary of the employee's supervisor" and "the maximum number of hours an
employee can work on all projects per week is 56". Such constraints can be
specified and enforced within the application programs that update the database,
or by using a general-purpose constraint specification language. Mechanisms
called triggers and assertions can also be used to specify the constraints.

State constraints define the conditions that a valid state of the database must
satisfy.

Transition constraints can be defined to deal with state changes in the
database.

An example of a transition constraint is: "the salary of an employee can only
increase." Such constraints are typically enforced by the application programs or
specified using active rules and triggers.
RELATIONAL ALGEBRA
Relational algebra is a procedural query language. It consists of a set of
operations that take one or two relations as input and produce a new relation as
their result.

The fundamental Operations in Relational Algebra are:

select

project

union

set difference

cartesian product

rename

Here the select, project and rename operations are called unary operations,
because they operate on one relation. The other three operations operate on pairs
of relations and are, therefore called binary operations.
1) The select operation:

The select operation selects tuples that satisfy a given predicate.


The lowercase Greek letter sigma (σ) is used to denote selection. The predicate
appears as the subscript to σ.

Comparisons are allowed in the predicate using the relational operators =, ≠, <, ≤, >
and ≥.

Several predicates can be combined into a larger predicate by using the connectives
∧ (and), ∨ (or) and ¬ (not).

Question 1: - Select those tuples of the loan relation where the branch-name
is Perryridge.

Relational algebra query:

The equivalent relational algebra query for the given question is,

σ_{branch-name = "Perryridge"}(loan)

The result of the query is given below:

Question 2:- Find all the tuples in which the amount lent is more than 1200
in loan relation.
Relational algebra query:
The equivalent relational algebra query for the given question is,
σ_{amount > 1200}(loan)
The result of the query is given below.
loan-number branch-name Amount
L-14 Downtown 1500
L-15 Perryridge 1500
L-16 Perryridge 1300
L-23 RedWood 3000

Question 3: - Find those tuples pertaining to loans of more than 1200 made by the
Perryridge branch.

Relational Algebra query:

The equivalent relational algebra query for the given question is,

σ_{branch-name = "Perryridge" ∧ amount > 1200}(loan)

The result of the query is given below.

loan-number branch-name Amount

L-15 Perryridge 1500

L-16 Perryridge 1300
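The select operation can be sketched in Python as a filter over a list of tuples. The loan tuples below are assumptions: four are taken from the result of Question 2, and the L-11 tuple is a hypothetical extra row added so that the predicate actually filters something out.

```python
# Each dict is one tuple of the loan relation.
loan = [
    {"loan-number": "L-11", "branch-name": "Round Hill", "amount": 900},  # hypothetical
    {"loan-number": "L-14", "branch-name": "Downtown", "amount": 1500},
    {"loan-number": "L-15", "branch-name": "Perryridge", "amount": 1500},
    {"loan-number": "L-16", "branch-name": "Perryridge", "amount": 1300},
    {"loan-number": "L-23", "branch-name": "Redwood", "amount": 3000},
]


def select(relation, predicate):
    """sigma_predicate(relation): keep the tuples for which predicate is true."""
    return [t for t in relation if predicate(t)]


# Question 2: amount > 1200
over_1200 = select(loan, lambda t: t["amount"] > 1200)

# Question 3: branch-name = "Perryridge" AND amount > 1200
perryridge_over_1200 = select(
    loan, lambda t: t["branch-name"] == "Perryridge" and t["amount"] > 1200
)

print([t["loan-number"] for t in over_1200])             # ['L-14', 'L-15', 'L-16', 'L-23']
print([t["loan-number"] for t in perryridge_over_1200])  # ['L-15', 'L-16']
```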


2) The Project Operation:
The project operation is a unary operation that returns its argument relation, with
certain attributes left out. Since a relation is a set, duplicate rows are eliminated
from the result. Projection is denoted by the Greek letter pi (Π). The
attributes that should appear in the result are listed as a subscript to Π. The
argument relation follows in parentheses.

Composition of relational operations:

The relational operations can be composed together into a relational algebra


expression.

Question 1:- List all the loan-numbers and amount of the loan.

Relational algebra query:

The equivalent relational algebra query for the given question is,

Π_{loan-number, amount}(loan)

The result of the query is given


Question 2:- Find those customers who live in Harrison.
Relational algebra query:
The equivalent relational algebra query for the given question is,
Π_{customer-name}(σ_{customer-city = "Harrison"}(customer))
Here the expression is given as argument to the projection operation instead of the
name of a relation.
Result:
Customer- name
Hayes

Jones
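Projection and its composition with selection can be sketched as follows. The customer tuples are assumed to be those of the customer relation listed later in these notes (under the natural join examples); the function names are illustrative.

```python
customer = [
    {"customer-name": "Adams", "customer-street": "North", "customer-city": "Brooklyn"},
    {"customer-name": "Hayes", "customer-street": "Main", "customer-city": "Harrison"},
    {"customer-name": "Brooks", "customer-street": "Alma", "customer-city": "Stanford"},
    {"customer-name": "John", "customer-street": "Main", "customer-city": "Woodside"},
    {"customer-name": "Jones", "customer-street": "North", "customer-city": "Harrison"},
]


def select(relation, predicate):
    """sigma: keep the tuples that satisfy the predicate."""
    return [t for t in relation if predicate(t)]


def project(relation, attributes):
    """pi: keep only the listed attributes; the set drops duplicate tuples."""
    return {tuple(t[a] for a in attributes) for t in relation}


# The selection result is passed as the argument of the projection.
harrison_names = project(
    select(customer, lambda t: t["customer-city"] == "Harrison"),
    ["customer-name"],
)
streets = project(customer, ["customer-street"])

print(sorted(harrison_names))  # [('Hayes',), ('Jones',)]
print(sorted(streets))         # [('Alma',), ('Main',), ('North',)]
```

Five customer tuples yield only three streets, because duplicate values collapse in the projected relation.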

3) The Union Operation:


The union operation is a binary operation which combines two relations. The union
operation is denoted by the symbol ∪. For a union operation r ∪ s to be valid, two
conditions must hold.

The relations r and s must be of the same arity. That is they must have the
same number of attributes.

The domains of the ith attribute of r and ith attribute of s must be the same
for all i.

Question 1:- Find the names of all customers who have either an
account or a loan or both?

Relational algebra query:

The equivalent relational algebra query for the given question is,

Π_{customer-name}(borrower) ∪ Π_{customer-name}(depositor)


Result:

4) The set difference operation:


The set-difference operation, denoted by (–), finds the tuples that are in one
relation but are not in another. It is a binary operation. For a set difference
operation r – s to be valid, the relations r and s should be of the same arity and the
domains of the ith attribute of r and ith attribute of s must be the same.
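Union and set difference map directly onto Python set operations. The sketch below assumes the customer names from the borrower and depositor relations shown elsewhere in these notes, and it checks only the arity condition; the domain condition is not checked.

```python
def check_compatible(r, s):
    """Raise an error unless all tuples of r and s have the same arity."""
    arities = {len(t) for t in r} | {len(t) for t in s}
    if len(arities) > 1:
        raise ValueError("relations must have the same arity")


def union(r, s):
    check_compatible(r, s)
    return r | s


def difference(r, s):
    check_compatible(r, s)
    return r - s


# Projections on customer-name, written as sets of 1-tuples.
borrower_names = {("Adams",), ("Hayes",), ("Jackson",)}
depositor_names = {("Hayes",), ("John",), ("Jones",)}

account_or_loan = union(borrower_names, depositor_names)
account_but_no_loan = difference(depositor_names, borrower_names)

print(sorted(account_or_loan))      # Adams, Hayes, Jackson, John, Jones
print(sorted(account_but_no_loan))  # John, Jones
```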

Question 1:- Find all the customers of the bank who have an account but not a
loan?

Relational algebra query:

The equivalent relational algebra query for the given question is,

Π_{customer-name}(depositor) – Π_{customer-name}(borrower)

Result:
5) The Cartesian product operation:
The Cartesian product operation, denoted by a cross (×), allows combining
information from any two relations.

The Cartesian product of r1 and r2 is written as r1 × r2.

If r1 contains n1 tuples and r2 contains n2 tuples, then there are n1 * n2 ways of
choosing a pair of tuples, one tuple from each relation.

So, there are n1 * n2 tuples in r = r1 × r2.

Example:

Consider the borrower and loan relations given below.

borrower relation:

customer-name   loan-no
Adams           L-16
Hayes           L-93
Jackson         L-15

loan relation:

loan-no   branch-name   amt
L-93      Round Hill    900
L-16      Downtown      1500
L-15      Perryridge    1300

Question 1:- Find the names of all customers who have a loan at the Perryridge branch.

Computation of Relational algebra query:

This question needs the information in both the loan relation and the borrower relation.

If the query is written as

σ_{branch-name = "Perryridge"}(borrower × loan)

the Cartesian product borrower × loan is computed first:

customer-name   borrower.loan-number   loan.loan-number   branch-name   amount
Adams           L-16                   L-93               Round Hill    900
Adams           L-16                   L-16               Downtown      1500
Adams           L-16                   L-15               Perryridge    1300
Hayes           L-93                   L-93               Round Hill    900
Hayes           L-93                   L-16               Downtown      1500
Hayes           L-93                   L-15               Perryridge    1300
Jackson         L-15                   L-93               Round Hill    900
Jackson         L-15                   L-16               Downtown      1500
Jackson         L-15                   L-15               Perryridge    1300

Result of σ_{branch-name = "Perryridge"}(borrower × loan):

customer-name   borrower.loan-number   loan.loan-number   branch-name   amount
Adams           L-16                   L-15               Perryridge    1300
Hayes           L-93                   L-15               Perryridge    1300
Jackson         L-15                   L-15               Perryridge    1300

The above relation contains only tuples for the Perryridge branch. However, the
customer-name column may contain customers who do not have a loan at
the Perryridge branch, because the Cartesian product pairs every borrower tuple
with every loan tuple. Therefore, to obtain the correct result, the query has
to be written as below.

σ_{borrower.loan-number = loan.loan-number}(σ_{branch-name = "Perryridge"}(borrower × loan))

The result of the above query is given below:


Finally, since only the customer-name is needed, the projection operation is used as
below.

Π_{customer-name}(σ_{borrower.loan-number = loan.loan-number}(σ_{branch-name = "Perryridge"}(borrower × loan)))

Final result:

Customer-name

Jackson
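The three steps (Cartesian product, the two selections, projection) can be traced in Python using the borrower and loan tuples from the example. This is a sketch; attributes are addressed by position, as noted in the comments.

```python
from itertools import product

borrower = [("Adams", "L-16"), ("Hayes", "L-93"), ("Jackson", "L-15")]
loan = [("L-93", "Round Hill", 900), ("L-16", "Downtown", 1500), ("L-15", "Perryridge", 1300)]

# borrower x loan: every borrower tuple concatenated with every loan tuple.
# Positions: 0 customer-name, 1 borrower.loan-number,
#            2 loan.loan-number, 3 branch-name, 4 amount
pairs = [b + l for b, l in product(borrower, loan)]

# Selection on the branch name alone keeps unrelated customers.
perryridge = [t for t in pairs if t[3] == "Perryridge"]

# The second selection keeps only pairs that describe the same loan.
matched = [t for t in perryridge if t[1] == t[2]]

# Projection on customer-name.
names = {t[0] for t in matched}

print(len(pairs), len(perryridge), names)  # 9 3 {'Jackson'}
```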

6) The Rename Operation:


The rename operator is used to give a name to the result of a relational algebra
expression and, optionally, to rename its attributes. It is denoted by the lowercase
Greek letter rho (ρ). Given a relational algebra expression E, the expression
ρ_x(E) returns the result of expression E under the name x.

A second form of the rename operation is as follows.

ρ_{x(A1, A2, …, An)}(E)

It returns the result of expression E under the name x, with the attributes
renamed to A1, A2, …, An.

Examples:

Consider the account relation.


acc-no branch-name balance

A-101 Downtown 500

A-102 Perryridge 400

A-201 Brighton 900


Question 1:- Find the largest account balance in the bank?

Computation:
This query requires us to (1) first compute a temporary relation consisting of those
balances that are not the largest and (2) take the set difference between the
relation Π_{balance}(account) and the temporary relation just computed, to obtain the
result.

Step 1:
To compute the temporary relation, the values of all account balances must be
compared. This comparison is done by computing the Cartesian product
account × account and forming a selection that compares the two
balances appearing in each tuple. The rename operation is used to rename one
reference to the account relation.

The expression for the temporary relation that consists of the balances that are not
the largest is

Π_{account.balance}(σ_{account.balance < d.balance}(account × ρ_d(account)))

This expression gives those balances in the account relation for which a larger
balance appears somewhere in the account relation renamed as d. The result
contains all balances except the largest one as shown next.

The result of account X d (account ) is


account.ac account.br account.ba d.accno d.branch- d.balance
cno anch- lance name
name
A-101 Downtown 500 A-101 Downtown 500

A-101 Downtown 500 A-102 Perryridge 400

A-101 Downtown 500 A-201 Brighton 900

A-102 Perryridge 400 A-101 Downtown 500

A-102 Perryridge 400 A-102 Perryridge 400

A-102 Perryridge 400 A-201 Brighton 900

A-201 Brighton 900 A-101 Downtown 500

A-201 Brighton 900 A-102 Perryridge 400

A-201 Brighton 900 A-201 Brighton 900

The result of σ_{account.balance < d.balance}(account × ρ_d(account)) is

account.acc-no   account.branch-name   account.balance   d.acc-no   d.branch-name   d.balance
A-101            Downtown              500               A-201      Brighton        900
A-102            Perryridge            400               A-101      Downtown        500
A-102            Perryridge            400               A-201      Brighton        900

The result of Π_{account.balance}(σ_{account.balance < d.balance}(account × ρ_d(account))) is

account.balance

500

400

The balance 400 appears only once, because projection eliminates duplicate tuples.
Step 2:

The query to find the largest account balance in the bank can be written as:

Π_{balance}(account) – Π_{account.balance}(σ_{account.balance < d.balance}(account × ρ_d(account)))

The result of this query is given below

balance

900
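The two steps can be reproduced in Python with the account tuples from the example. This is a sketch: the second reference to account, held in the variable d, plays the role of ρ_d(account).

```python
from itertools import product

# (acc-no, branch-name, balance)
account = [("A-101", "Downtown", 500), ("A-102", "Perryridge", 400), ("A-201", "Brighton", 900)]

# rho_d(account): a second reference to the same relation under the name d.
d = account

# Step 1: balances for which a larger balance exists somewhere in d.
# Using a set removes the duplicate 400, as projection does.
not_largest = {a[2] for a, other in product(account, d) if a[2] < other[2]}

# Step 2: all balances minus the balances that are not the largest.
all_balances = {a[2] for a in account}
largest = all_balances - not_largest

print(sorted(not_largest), largest)  # [400, 500] {900}
```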

Question 2:- Find the names of all customers who live on the same street
and in the same city as Smith.

Consider customer relation.

customer-name   customer-street   customer-city
Adams           Spring            Harrison
Curry           North             Rye
Brooks          Main              Harrison
Smith           North             Rye

Computation:
Smith's street and city can be obtained by,

Π_{customer-street, customer-city}(σ_{customer-name = "Smith"}(customer))

In order to find other customers with this street and city, the customer relation must be
referred to a second time. The rename operation is used for this purpose. The resulting
expression is given below:

Π_{customer.customer-name}(σ_{customer.customer-street = smith-addr.street ∧ customer.customer-city = smith-addr.city}(customer × ρ_{smith-addr(street, city)}(Π_{customer-street, customer-city}(σ_{customer-name = "Smith"}(customer)))))
In the previous expression customer-street and customer-city is renamed to street
and city. The result of the preceding expression is given below.
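The same computation can be sketched in Python over the customer tuples listed above. The variable smith_addr stands for the renamed relation smith-addr(street, city); this is an illustration, not the only way to evaluate the expression.

```python
# (customer-name, customer-street, customer-city)
customer = [
    ("Adams", "Spring", "Harrison"),
    ("Curry", "North", "Rye"),
    ("Brooks", "Main", "Harrison"),
    ("Smith", "North", "Rye"),
]

# Inner expression: Smith's street and city, renamed to smith-addr(street, city).
smith_addr = {(street, city) for name, street, city in customer if name == "Smith"}

# customer x smith_addr, then the selection on equal street and city,
# then the projection on customer-name.
same_address = {
    name
    for name, street, city in customer
    for addr_street, addr_city in smith_addr
    if street == addr_street and city == addr_city
}

print(sorted(same_address))  # ['Curry', 'Smith']
```

Smith appears in the answer as well, since Smith trivially lives at Smith's own address.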

Additional Operations:

The additional operations in relational algebra are:

Set intersection operation

Natural join operation

Division operation

Assignment operation

Set intersection operation:


The set intersection operation is denoted by (∩). It returns the tuples that are
common to both relations. It is a binary operation. The set intersection can be
expressed with a pair of set difference operations.

r ∩ s = r – (r – s)

Example: - Consider the borrower and depositor relation.

Question 1: Find all customers who have both loan and an account?

Relational algebra expression:

The equivalent relational algebra query for the given question is,

Π_{customer-name}(borrower) ∩ Π_{customer-name}(depositor)
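The identity r ∩ s = r – (r – s) can be checked on the customer names of the borrower and depositor relations used in these notes. A brief sketch with Python sets:

```python
borrower_names = {"Adams", "Hayes", "Jackson"}
depositor_names = {"Hayes", "John", "Jones"}

# Intersection computed directly and through two set differences.
direct = borrower_names & depositor_names
via_difference = borrower_names - (borrower_names - depositor_names)

print(direct, via_difference)  # {'Hayes'} {'Hayes'}
```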


Natural join operation:
The natural join operation is a binary operation that allows combining
certain selections and a Cartesian product into one operation. It is denoted by the
"join" symbol ⋈.

The natural join operation forms a cartesian product of its two arguments,
performs a selection forcing equality on those attributes that appear in both
relation schemas and finally removes duplicate attributes.

Example 1:Consider the loan and borrower relation. Refer Cartesian product
operation for the relations.

Question 1:- Find the names of all customers who have a loan at the bank,
and find the amount of the loan?

Relational Algebra Expression:

Π_{customer-name, loan-number, amount}(borrower ⋈ loan)


Since the schemas for borrower and loan have the attribute loan-number in
common, the natural join operation considers only pair of tuples that have the
same value on loan number. It combines each such pair of tuples into a single tuple
on the union of two schemas. After performing the projection the following result
is obtained.

Result:
Customer-name Loan-number Amount

Adams L-16 1500

Hayes L-93 900

Jackson L-15 1300
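A natural join can be sketched in Python by representing each tuple as a dict, so that the common attributes are found by name. The function below is a simple nested-loop illustration using the borrower and loan tuples of the example.

```python
def natural_join(r, s):
    """Combine tuples of r and s that agree on all attribute names they share."""
    result = []
    for t in r:
        for u in s:
            common = t.keys() & u.keys()
            if all(t[a] == u[a] for a in common):
                result.append({**t, **u})  # union of the two schemas
    return result


borrower = [
    {"customer-name": "Adams", "loan-number": "L-16"},
    {"customer-name": "Hayes", "loan-number": "L-93"},
    {"customer-name": "Jackson", "loan-number": "L-15"},
]
loan = [
    {"loan-number": "L-93", "branch-name": "Round Hill", "amount": 900},
    {"loan-number": "L-16", "branch-name": "Downtown", "amount": 1500},
    {"loan-number": "L-15", "branch-name": "Perryridge", "amount": 1300},
]

joined = natural_join(borrower, loan)
answer = sorted((t["customer-name"], t["loan-number"], t["amount"]) for t in joined)
print(answer)
# [('Adams', 'L-16', 1500), ('Hayes', 'L-93', 900), ('Jackson', 'L-15', 1300)]
```

When the two relations share no attribute names, the condition is always true and the function returns the Cartesian product, which matches the definition of the natural join.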


Example 2: Consider the customer, account and depositor relations
given below:

customer relation:

customer-name   customer-street   customer-city
Adams           North             Brooklyn
Hayes           Main              Harrison
Brooks          Alma              Stanford
John            Main              Woodside
Jones           North             Harrison

account relation:

acc-no   branch-name   balance
A-101    Downtown      500
A-102    Perryridge    400
A-201    Brighton      900
A-217    Brighton      750

depositor relation:

customer-name   acc-no
Hayes           A-102
John            A-101
John            A-201
Jones           A-217
Question 2:- Find the names of all branches with customers who have an
account in the bank and who live in Harrison?
Relational algebra Expression:
Π_{branch-name}(σ_{customer-city = "Harrison"}(customer ⋈ account ⋈ depositor))

Result:
Branch-name

Brighton

Perryridge

3. The division operation:


The division operation, denoted by (÷), is suited to queries that include the phrase
"for all".
Example:
Consider the branch, account and depositor relations. Refer to the natural join
operation for the account and depositor relations.
branch relation:

branch-name   branch-city   assets
Brighton      Brooklyn      7100000
Downtown      Brooklyn      900000
Mianus        Horseneck     400000
North         Rye           170000


Question 1:- Find all customers who have an account at all the branches located in
Brooklyn?

Relational algebra expression:

The expression for obtaining all branches in Brooklyn is,

r1 = Π_{branch-name}(σ_{branch-city = "Brooklyn"}(branch))

The result of this expression is,

The (customer-name, branch-name) pairs of all customers who have an account at a branch
can be found by the expression,

r2 = Π_{customer-name, branch-name}(depositor ⋈ account)

The result of the above expression is,

customer-name branch-name
Hayes Perryridge
John Downtown
John Brighton
Jones Brighton

To find customers who appear in r2 with every branch name in r1, the divide operation is
used as given below.

Π_{customer-name, branch-name}(depositor ⋈ account) ÷ Π_{branch-name}(σ_{branch-city = "Brooklyn"}(branch))

Result:

Customer-name
John
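Division can be sketched in Python for the common case where r2 holds (customer-name, branch-name) pairs and r1 holds single branch names. The r2 tuples are those shown above; r1 is assumed to contain the two Brooklyn branches from the branch relation.

```python
def divide(r, s):
    """Return every x such that (x, y) is in r for all y in s."""
    candidates = {x for x, _ in r}
    return {x for x in candidates if all((x, y) in r for y in s)}


# r2: (customer-name, branch-name) pairs from depositor joined with account.
r2 = {
    ("Hayes", "Perryridge"),
    ("John", "Downtown"),
    ("John", "Brighton"),
    ("Jones", "Brighton"),
}
# r1: branches located in Brooklyn.
r1 = {"Brighton", "Downtown"}

print(divide(r2, r1))  # {'John'}
```

Jones is excluded because Jones has an account at Brighton but not at Downtown.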
The Assignment Operation:

The assignment operation, denoted by ←, works like assignment in a programming
language. For example, r – s can be written as

temp1 ← r

temp2 ← s

Result ← temp1 – temp2


5. Extended Relational Algebra Operations:

Generalized Projection:
The generalized projection operation extends the projection operation by
allowing arithmetic functions to be used in the projection list. The generalized
projection operation has the form

Π_{F1, F2, …, Fn}(E)

where E is any relational-algebra expression, and each of F1, F2, …, Fn is an
arithmetic expression involving constants and attributes in the schema of E.

Example:Consider the relation credit-info

Question:
The credit-info relation lists the credit limit and the expenses incurred so far. To find how
much more each person can spend, the following expression is written:

Π_{customer-name, limit – credit-balance}(credit-info)

The attribute resulting from the expression limit – credit-balance does not have a
name. The rename operation can be applied for this purpose as below.

Π_{customer-name, (limit – credit-balance) as credit-available}(credit-info)

Result:
6. Aggregate Functions:

Aggregate functions take a collection of values and return a single value as a result

avg: average value


min: minimum value
max: maximum value
sum: sum of values
count: number of values

Aggregate functions in relational algebra: The symbol 𝒢 is used for aggregate
operations. It is known as calligraphic G.

Question:

To find out the total sum of salaries of all part-time employees in the bank, the
following relational algebra expression is used.

𝒢_{sum(salary)}(pt-works)
7. Groups:
The result can be grouped based on some attribute. For example to partition the
relation pt-works into groups based on the branch, and to apply aggregation on
each group, the query is written as below.

_{branch-name}𝒢_{sum(salary)}(pt-works)

The result of the expression is given below:


First the relation is grouped based on branch-name without performing aggregation
as shown below.

Final result after aggregation:
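Aggregation with and without grouping can be sketched in Python as follows. The pt-works tuples below are hypothetical sample data chosen for illustration, not the table from these notes, and the function names are assumptions.

```python
from collections import defaultdict

# Hypothetical pt-works tuples: (employee-name, branch-name, salary).
pt_works = [
    ("Adams", "Perryridge", 1500),
    ("Brown", "Perryridge", 1300),
    ("Johnson", "Downtown", 1500),
    ("Rao", "Austin", 1500),
    ("Sato", "Austin", 1600),
]


def aggregate_sum(relation, value_index):
    """G_sum over the whole relation: a single value."""
    return sum(t[value_index] for t in relation)


def group_sum(relation, group_index, value_index):
    """Group on one attribute, then apply sum within each group."""
    totals = defaultdict(int)
    for t in relation:
        totals[t[group_index]] += t[value_index]
    return dict(totals)


total = aggregate_sum(pt_works, 2)
by_branch = group_sum(pt_works, 1, 2)

print(total)      # 7400
print(by_branch)  # {'Perryridge': 2800, 'Downtown': 1500, 'Austin': 3100}
```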


8. Outer join:
The outer join operation is an extension of the join
operation to deal with missing information.
Example: Consider the relations employee and ft-works as
below.

To generate a single relation from the above two relations, a
possible approach is to use the natural join operation.
The expression is given below.

employee ⋈ ft-works
The result of this expression is given below.
In the above relation the street and city information about Smith is lost, since the
tuple describing Smith is absent from the ft-works relation. Similarly, the branch
name and salary information about Gates is lost, since the tuple describing Gates
is absent from the employee relation. The outer join operation can be used to avoid
this loss of information. There are three forms of outer join operation. They are:

(i) Left outer join (ii) Right outer join (iii) Full outer join

(i) The left outer join: This takes all tuples in the left relation that did not match
with any tuple in the right relation, pads the tuples with null values for all other
attributes from the right relation, and adds them to the result of the natural join.
The result of employee ⟕ ft-works is given below.

(ii) The right outer join: it is symmetric with the left outer join. It pads tuples
from the right relation that did not match any from the left relation with nulls and
adds them to the result of the natural join. The result of employee ft-works is
given below.
(iii) The full outer join: it does both of the above operations, padding tuples from
the left relation that did not match any from the right relation, as well as tuples
from the right relation that did not match any from the left relation, and adding
them to the result of the join. The below relation shows the result of employee
ft-works.
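The three outer joins can be sketched in Python as follows. This is a minimal illustration, not a DBMS implementation: the rows are hypothetical (only the names Smith and Gates come from the text), the helper outer_join is an assumed name, and None stands in for the null value.

```python
# Hypothetical sample data; only the names Smith and Gates come from the text.
employee = [
    {"employee-name": "Jones", "street": "Main", "city": "Harrison"},
    {"employee-name": "Smith", "street": "North", "city": "Rye"},
]
ft_works = [
    {"employee-name": "Jones", "branch-name": "Mesa", "salary": 1500},
    {"employee-name": "Gates", "branch-name": "Redmond", "salary": 5300},
]

def outer_join(left, right, key, keep_left=False, keep_right=False):
    """Natural join on `key`; optionally pad unmatched tuples with None."""
    left_attrs = [a for a in left[0] if a != key]
    right_attrs = [a for a in right[0] if a != key]
    result = []
    for l in left:
        matches = [r for r in right if r[key] == l[key]]
        for r in matches:
            result.append({**l, **r})
        if not matches and keep_left:
            # Unmatched left tuple: pad the right-hand attributes with nulls.
            result.append({**l, **{a: None for a in right_attrs}})
    if keep_right:
        left_keys = {l[key] for l in left}
        for r in right:
            if r[key] not in left_keys:
                # Unmatched right tuple: pad the left-hand attributes with nulls.
                result.append({**{a: None for a in left_attrs}, **r})
    return result

natural = outer_join(employee, ft_works, "employee-name")
left_outer = outer_join(employee, ft_works, "employee-name", keep_left=True)
right_outer = outer_join(employee, ft_works, "employee-name", keep_right=True)
full_outer = outer_join(employee, ft_works, "employee-name",
                        keep_left=True, keep_right=True)
```

With this data the natural join keeps only Jones, the left outer join adds Smith with a null branch-name and salary, the right outer join adds Gates with a null street and city, and the full outer join contains all three.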
Relational Calculus
A relational calculus expression creates a new relation, which is specified in
terms of variables that range over rows of the stored database relations (in tuple
calculus) or over columns of the stored relations (in domain calculus).

In a calculus expression, there is no order of operations to specify how to retrieve
the query result; a calculus expression specifies only what information the result
should contain.

This is the main distinguishing feature between relational algebra and
relational calculus.

Relational calculus is considered to be a nonprocedural language.

This differs from relational algebra, where we must write a sequence of
operations to specify a retrieval request; hence relational algebra can be
considered as a procedural way of stating a query.

Tuple Relational Calculus

The tuple relational calculus is based on specifying a number of tuple variables.

Each tuple variable usually ranges over a particular database relation, meaning
that the variable may take as its value any individual tuple from that relation.

A simple tuple relational calculus query is of the form

{t | COND(t)}
where t is a tuple variable and COND (t) is a conditional expression involving
t.

The result of such a query is the set of all tuples t that satisfy COND (t).
For example, to find all employees whose salary is above $50,000, we can write
the following tuple calculus expression:

{t | EMPLOYEE(t) AND t.Salary>50000}

The condition EMPLOYEE(t) specifies that the range relation of tuple variable t
is EMPLOYEE.
Each EMPLOYEE tuple t that satisfies the condition t.Salary>50000 will be
retrieved. Notice that t.Salary references attribute Salary of tuple variable t.

To retrieve only some of the attributes, say the first and last names, we write

{t.Fname, t.Lname | EMPLOYEE(t) AND t.Salary>50000}


Informally, we need to specify the following information in a tuple
relational calculus expression:

• For each tuple variable t, the range relation R of t. This value is specified by a
condition of the form R(t).

• A condition to select particular combinations of tuples. As tuple variables range
over their respective range relations, the condition is evaluated for every possible
combination of tuples to identify the selected combinations for which the condition
evaluates to TRUE.

• A set of attributes to be retrieved, the requested attributes. The values of these
attributes are retrieved for each selected combination of tuples.
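The three parts listed above map closely onto a Python comprehension, which gives a rough way to read a tuple calculus expression. The EMPLOYEE rows below are hypothetical sample data; only the attribute names come from the text.

```python
# Hypothetical EMPLOYEE tuples; attribute names follow the text.
EMPLOYEE = [
    {"Fname": "John", "Lname": "Smith", "Salary": 30000},
    {"Fname": "Jennifer", "Lname": "Wallace", "Salary": 43000},
    {"Fname": "James", "Lname": "Borg", "Salary": 55000},
]

# {t | EMPLOYEE(t) AND t.Salary > 50000}
# range relation -> "for t in EMPLOYEE"; condition -> the "if" clause
high_paid = [t for t in EMPLOYEE if t["Salary"] > 50000]

# {t.Fname, t.Lname | EMPLOYEE(t) AND t.Salary > 50000}
# requested attributes -> the expression before "for"
high_paid_names = {(t["Fname"], t["Lname"])
                   for t in EMPLOYEE if t["Salary"] > 50000}
```

The analogy is informal: the comprehension fixes an evaluation order, whereas the calculus expression only states what the result must contain.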
Query 0. Retrieve the birth date and address of the employee (or employees)
whose name is John B. Smith.

Q0: {t.Bdate, t.Address | EMPLOYEE(t) AND t.Fname='John' AND t.Minit='B'
AND t.Lname='Smith'}

In tuple relational calculus, we first specify the requested attributes t.Bdate and
t.Address for each selected tuple t. Then we specify the condition for selecting a
tuple following the bar (|), namely, that t be a tuple of the EMPLOYEE relation
whose Fname, Minit, and Lname attribute values are 'John', 'B', and 'Smith',
respectively.

Expressions and Formulas in Tuple Relational Calculus

A general expression of the tuple relational calculus is of the form

{t1.Aj, t2.Ak, ... , tn.Am | COND(t1, t2, ..., tn, tn+1, tn+2, ..., tn+m)}
where t1, t2, … , tn, tn+1, … , tn+m are tuple variables, each Ai is an attribute of the
relation on which ti ranges, and COND is a condition or formula of the tuple
relational calculus.

A formula is made up of predicate calculus atoms, which can be one of the following:

1. An atom of the form R(ti), where R is a relation name and ti is a tuple variable.
This atom identifies the range of the tuple variable ti as the relation whose name is
R. It evaluates to TRUE if ti is a tuple in the relation R, and evaluates to FALSE
otherwise.

2. An atom of the form ti.A op tj.B, where op is one of the comparison operators in
the set {=, <, ≤, >, ≥, ≠}, ti and tj are tuple variables, A is an attribute of the
relation on which ti ranges, and B is an attribute of the relation on which tj ranges.
3. An atom of the form ti.A op c or c op tj.B, where op is one of the comparison
operators in the set {=, <, ≤, >, ≥, ≠}, ti and tj are tuple variables, A is an attribute
of the relation on which ti ranges, B is an attribute of the relation on which tj ranges,
and c is a constant value.

Each of the preceding atoms evaluates to either TRUE or FALSE for a specific
combination of tuples; this is called the truth value of an atom.

In general, a tuple variable t ranges over all possible tuples in the universe. For
atoms of the form R(t), if t is assigned to a tuple that is a member of the specified
relation R, the atom is TRUE; otherwise, it is FALSE.

In atoms of types 2 and 3, if the tuple variables are assigned to tuples such that
the values of the specified attributes of the tuples satisfy the condition, then the
atom is TRUE.

A formula (Boolean condition) is made up of one or more atoms connected via
the logical operators AND, OR, and NOT and is defined recursively by Rules 1 and
2 as follows:
Rule 1: Every atom is a formula.
Rule 2: If F1 and F2 are formulas, then so are (F1 AND F2), (F1 OR F2), NOT (F1),
and NOT (F2). The truth values of these formulas are derived from their
component formulas F1 and F2 as follows:
a. (F1 AND F2) is TRUE if both F1 and F2 are TRUE; otherwise, it is FALSE.
b. (F1 OR F2) is FALSE if both F1 and F2 are FALSE; otherwise, it is TRUE.
c. NOT (F1) is TRUE if F1 is FALSE; it is FALSE if F1 is TRUE.
d. NOT (F2) is TRUE if F2 is FALSE; it is FALSE if F2 is TRUE.
The Existential and Universal Quantifiers
In addition, two special symbols called quantifiers can appear in formulas; these
are the universal quantifier (∀) and the existential quantifier (∃).

We define a tuple variable in a formula as free or bound according to the following
rules:

• An occurrence of a tuple variable in a formula F that is an atom is free in F.

• An occurrence of a tuple variable t is free or bound in a formula made up of
logical connectives, (F1 AND F2), (F1 OR F2), NOT(F1), and NOT(F2), depending
on whether it is free or bound in F1 or F2 (if it occurs in either).

Notice that in a formula of the form F = (F1 AND F2) or F = (F1 OR F2), a tuple
variable may be free in F1 and bound in F2, or vice versa; in this case, one
occurrence of the tuple variable is bound and the other is free in F.

• All free occurrences of a tuple variable t in F are bound in a formula F′ of the
form F′=(∃t)(F) or F′=(∀t)(F). The tuple variable is bound to the quantifier
specified in F′. For example, consider the following formulas:

F1: d.Dname = 'Research'

F2: (∃t)(d.Dnumber = t.Dno)

F3: (∀d)(d.Mgr_ssn = '333445555')


The tuple variable d is free in both F1 and F2, whereas it is bound to the (∀)
quantifier in F3. Variable t is bound to the (∃) quantifier in F2.
Rule 3: If F is a formula, then so is (∃t)(F), where t is a tuple variable. The formula
(∃t)(F) is TRUE if the formula F evaluates to TRUE for some (at least one) tuple
assigned to free occurrences of t in F; otherwise, (∃t)(F) is FALSE.

Rule 4: If F is a formula, then so is (∀t)(F), where t is a tuple variable. The formula
(∀t)(F) is TRUE if the formula F evaluates to TRUE for every tuple (in the universe)
assigned to free occurrences of t in F; otherwise, (∀t)(F) is FALSE.

The (∃) quantifier is called an existential quantifier because a formula (∃t)(F) is
TRUE if there exists some tuple that makes F TRUE. For the universal quantifier,
(∀t)(F) is TRUE if every possible tuple that can be assigned to free occurrences of t
in F is substituted for t, and F is TRUE for every such substitution. It is called the
universal or for all quantifier because every tuple in the universe of tuples must
make F TRUE to make the quantified formula TRUE.

Sample Queries in Tuple Relational Calculus

Query 1. List the name and address of all employees who work for the 'Research'
department.

Q1: {t.Fname, t.Lname, t.Address | EMPLOYEE(t) AND (∃d)(DEPARTMENT(d)
AND d.Dname='Research' AND d.Dnumber=t.Dno)}
Query 2. For every project located in 'Stafford', list the project number, the
controlling department number, and the department manager's last name, birth date,
and address.

Q2: {p.Pnumber, p.Dnum, m.Lname, m.Bdate, m.Address | PROJECT(p) AND
EMPLOYEE(m) AND p.Plocation='Stafford' AND ((∃d)(DEPARTMENT(d) AND
p.Dnum=d.Dnumber AND d.Mgr_ssn=m.Ssn))}
Using the Universal Quantifier in Queries
Whenever we use a universal quantifier, it is quite judicious to follow a few rules to
ensure that our expression makes sense. We discuss these rules with respect to
the query Q3.

Query 3. List the names of employees who work on all the projects controlled by
department number 5. One way to specify this query is to use the universal
quantifier as shown:

Q3: {e.Lname, e.Fname | EMPLOYEE(e) AND ((∀x)(NOT(PROJECT(x)) OR NOT
(x.Dnum=5) OR ((∃w)(WORKS_ON(w) AND w.Essn=e.Ssn AND
x.Pnumber=w.Pno))))}
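Over finite relations, the universal and existential quantifiers in Q3 behave like Python's all() and any(). The sketch below uses hypothetical sample rows (only the attribute names come from the text) and an assumed helper name, works_on_all_dept5.

```python
# Hypothetical sample data; attribute names follow the text.
EMPLOYEE = [
    {"Ssn": "1", "Lname": "Smith", "Fname": "John"},
    {"Ssn": "2", "Lname": "Wong", "Fname": "Franklin"},
]
PROJECT = [
    {"Pnumber": 1, "Dnum": 5},
    {"Pnumber": 2, "Dnum": 5},
    {"Pnumber": 10, "Dnum": 4},
]
WORKS_ON = [
    {"Essn": "1", "Pno": 1},
    {"Essn": "2", "Pno": 1},
    {"Essn": "2", "Pno": 2},
    {"Essn": "2", "Pno": 10},
]

def works_on_all_dept5(e):
    # (∀x)(NOT PROJECT(x) OR NOT (x.Dnum=5) OR (∃w)(...)).
    # Tuples outside PROJECT make the formula TRUE trivially, so it is
    # enough to let x range over PROJECT.
    return all(
        x["Dnum"] != 5
        or any(w["Essn"] == e["Ssn"] and w["Pno"] == x["Pnumber"]
               for w in WORKS_ON)
        for x in PROJECT
    )

q3 = {(e["Lname"], e["Fname"]) for e in EMPLOYEE if works_on_all_dept5(e)}
```

Here Smith works only on project 1, so the condition fails for project 2; Wong works on both department 5 projects and is the only employee selected.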

Query 4. List the names of employees who have no dependents.

Q4: {e.Fname, e.Lname | EMPLOYEE(e) AND (NOT (∃d)(DEPENDENT(d) AND
e.Ssn=d.Essn))}

Query 5. List the names of managers who have at least one dependent.

Q5: {e.Fname, e.Lname | EMPLOYEE(e) AND ((∃d)(∃ρ)(DEPARTMENT(d) AND
DEPENDENT(ρ) AND e.Ssn=d.Mgr_ssn AND ρ.Essn=e.Ssn))}

This query is handled by interpreting managers who have at least one dependent
as managers for whom there exists some dependent.

The Domain Relational Calculus


Domain calculus differs from tuple calculus in the type of variables used in
formulas: rather than having variables range over tuples, the variables range over
single values from domains of attributes.
To form a relation of degree n for a query result, we must have n of these domain
variables, one for each attribute. An expression of the domain calculus is of the
form

{x1, x2, ..., xn | COND(x1, x2, ..., xn, xn+1, xn+2, ..., xn+m)}

where x1, x2, … , xn, xn+1, xn+2, … , xn+m are domain variables that range over
domains (of attributes), and COND is a condition or formula of the domain
relational calculus.

A formula is made up of atoms. The atoms of a formula are slightly different from
those for the tuple calculus and can be one of the following:

1. An atom of the form R(x1, x2, … , xj), where R is the name of a relation of degree
j and each xi, 1 ≤ i ≤ j, is a domain variable. This atom states that a list of values
<x1, x2, … , xj> must be a tuple in the relation whose name is R, where xi is the
value of the ith attribute of the tuple. To make a domain calculus expression
more concise, we can drop the commas in a list of variables; thus, we can write:

{x1, x2, ..., xn | R(x1 x2 x3) AND ...}

instead of:

{x1, x2, ... , xn | R(x1, x2, x3) AND ...}

2. An atom of the form xi op xj, where op is one of the comparison operators in the
set {=, <, ≤, >, ≥, ≠}, and xi and xj are domain variables.

3. An atom of the form xi op c or c op xj, where op is one of the comparison
operators in the set {=, <, ≤, >, ≥, ≠}, xi and xj are domain variables, and c is a
constant value.

As in tuple calculus, atoms evaluate to either TRUE or FALSE for a
specific set of values, called the truth values of the atoms. In case 1, if the domain
variables are assigned values corresponding to a tuple of the specified relation R,
then the atom is TRUE. In cases 2 and 3, if the domain variables are assigned values
that satisfy the condition, then the atom is TRUE.
Query 0. List the birth date and address of the employee whose name is 'John
B. Smith'.

Q0: {u, v | (∃q) (∃r) (∃s) (∃t) (∃w) (∃x) (∃y) (∃z) (EMPLOYEE(qrstuvwxyz) AND
q='John' AND r='B' AND s='Smith')}
Query 1. Retrieve the name and address of all employees who work for the
'Research' department.

Q1: {q, s, v | (∃z) (∃l) (∃m) (EMPLOYEE(qrstuvwxyz) AND DEPARTMENT(lmno)
AND l='Research' AND m=z)}
Query 2. For every project located in 'Stafford', list the project number, the
controlling department number, and the department manager's last name, birth
date, and address.

Q2: {i, k, s, u, v | (∃j)(∃m)(∃n)(∃t)(PROJECT(hijk) AND EMPLOYEE(qrstuvwxyz)
AND DEPARTMENT(lmno) AND k=m AND n=t AND j='Stafford')}

Query 6. List the names of employees who have no dependents.

Q6: {q, s | (∃t)(EMPLOYEE(qrstuvwxyz) AND (NOT(∃l)(DEPENDENT(lmnop)
AND t=l)))}
RELATIONAL DATABASE DESIGN
Relation schema corresponds to the programming-language notion of type
definition.

The term relation instance refers to a specific instance of a relation, i.e.,
containing a specific set of rows.

A1, A2, …, An are attributes

R = (A1, A2, …, An) is a relation schema

• Formally, given sets D1, D2, …, Dn, a relation r is a subset of
D1 x D2 x … x Dn

• A relation is thus a set of n-tuples (a1, a2, …, an) where each ai ∈ Di

• In other words, a relation is a subset of the Cartesian product of its domains

• The current values (relation instance) of a relation are specified by a table

• An element t of r is a tuple, represented by a row in a table

Example:

instructor = (ID, name, dept_name, salary)



Relation Instance:

• The current values (relation instance) of a relation are specified by a table

• An element t of r is a tuple, represented by a row in a table

• Order of tuples is irrelevant (tuples may be stored in an arbitrary order)


Undesirable Properties of Relations
A bad design may have several properties, including:
• Repetition of information.
• Inability to represent certain information.
• Loss of information.
FUNCTIONAL DEPENDENCIES:
DEFINITION: Let r be a relation and let X and Y be arbitrary subsets of the set of
attributes of r. Then we say that Y is functionally dependent on X, i.e. X->Y (X
functionally determines Y), if and only if each X value in r has associated with it
precisely one Y value in r.

X->Y holds if, whenever two tuples have the same value for X, they must have
the same value for Y.

For any two tuples t1 and t2 in any relation instance r(R): If t1[X]=t2[X], then
t1[Y]=t2[Y]

X->Y in R specifies a constraint on all relation instances r(R)

Examples of FD constraints
Social security number determines employee name

SSN -> ENAME

Project number determines project name and location

PNUMBER -> {PNAME, PLOCATION}

Employee ssn and project number determine the hours per week that the
employee works on the project

{SSN, PNUMBER} -> HOURS
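The tuple-based definition (t1[X]=t2[X] implies t1[Y]=t2[Y]) can be checked against a single relation instance with a few lines of Python. The function name fd_holds and the sample rows are assumptions made for illustration.

```python
def fd_holds(rows, X, Y):
    """Return True if X -> Y holds in the relation instance `rows`."""
    seen = {}
    for t in rows:
        x_val = tuple(t[a] for a in X)
        y_val = tuple(t[a] for a in Y)
        # setdefault stores y_val the first time x_val is seen and
        # returns the stored value on later occurrences.
        if seen.setdefault(x_val, y_val) != y_val:
            return False
    return True

# Hypothetical instance using the attribute names from the examples above
works = [
    {"SSN": "111", "ENAME": "Smith", "PNUMBER": 1, "HOURS": 10},
    {"SSN": "111", "ENAME": "Smith", "PNUMBER": 2, "HOURS": 20},
    {"SSN": "222", "ENAME": "Wong", "PNUMBER": 1, "HOURS": 15},
]

ssn_determines_name = fd_holds(works, ["SSN"], ["ENAME"])
ssn_determines_hours = fd_holds(works, ["SSN"], ["HOURS"])
key_determines_hours = fd_holds(works, ["SSN", "PNUMBER"], ["HOURS"])
```

Such a check can only refute an FD. Because an FD is a constraint on all relation instances, passing the check on one instance does not prove that the FD holds for the schema.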

TRIVIAL AND NONTRIVIAL DEPENDENCIES


One way to reduce the size of the set of FDs is to eliminate the
trivial dependencies. An FD is trivial if and only if the right side is a subset of the
left side.

EX: {supplier-no, part-no}->supplier-no

Trivial dependencies hold in every relation and carry no information, so they are
generally omitted.


CLOSURE OF A SET OF DEPENDENCIES:

Closure of a set F of FDs is the set F+ of all FDs that can be inferred from F.

There are circumstances in which some FDs imply others.

For example, {supplier-no, part-no}->{city, qty} implies both of the following.

{supplier-no, part-no}->{city}

{supplier-no, part-no}->{qty}
As another example, consider the relation R with attributes A,B and C, such that
the FDs A->B and B->C both hold for R. Then it is easy to see that the FD A->C
also holds for R. The FD A->C is an example of a transitive FD i.e. C is said to
depend on A transitively via B. The set of all FDs that are implied by a given set S
of FDs is called the closure of S, written S+

The task of computing S+ from S can be done by the following rules:


Let A,B and C be arbitrary subsets of the set of attributes of given relation
R and let AB mean the union of A and B. Then we have:

Reflexivity: if B is a subset of A then A->B

Augmentation: If A->B, then AC->BC.

Transitivity: If A->B and B->C, then A->C.

Self-determination: A->A

Decomposition: If A->BC, then A->B and A->C.

Union: If A->B and A->C, then A->BC.

Composition: If A->B and C->D, then AC->BD.


Example:

Let R be the relation with attributes A,B,C,D,E,F and the FDs are:

A->BC B->E CD->EF

We now show that the FD AD->F holds for R and is thus a member of the closure of
the given set:

1. A->BC (given)

2. A->C (1, decomposition)

3. AD->CD (2, augmentation)

4. CD->EF (given)

5. AD->EF (3 & 4, transitivity)

6. AD->F (5, decomposition)

CLOSURE OF A SET OF ATTRIBUTES:

Closure of a set of attributes X with respect to F is the set X+ of all attributes that
are functionally determined by X.

X+ can be calculated by repeatedly applying the inference rules (reflexivity,
augmentation, transitivity) using the FDs in F.

The closure S+ of a given set S of FDs can be computed by means of an
algorithm that says "Repeatedly apply the rules from the previous section until
they stop producing new FDs".

Let R be a relation, Z be a set of attributes of R, and S be the set of
FDs that hold for R. From this we can determine the set of all attributes of R that
are functionally dependent on Z, i.e. the closure Z+ of Z under S.


A simple algorithm for computing this closure is given in the pseudocode below:
CLOSURE[Z,S] = Z;
do "forever";
    for each FD X->Y in S
    do;
        if X ⊆ CLOSURE[Z,S]
        then CLOSURE[Z,S] = CLOSURE[Z,S] ∪ Y;
    end;
    if CLOSURE[Z,S] did not change on this iteration
    then leave the loop;
end;

Example:
Suppose we are given a relation R with attributes A,B,C,D,E,F and FDs are:
A->BC
E->CF
B->E
CD->EF
We now compute the closure {A,B}+ of the set of attributes {A,B} under this set of
FDs.

We initialize the result CLOSURE[Z,S] to {A,B}.

We now go round the inner loop four times, once for each of the given FDs. On the
first iteration (for the FD A->BC), we find that the left side is a subset of
CLOSURE[Z,S], so we add the attributes B and C to the result. CLOSURE[Z,S] is now
the set {A,B,C}.

On the second iteration (for the FD E->CF), we find that the left side is not a subset
of the result, which thus remains unchanged.
On the third iteration (for the FD B->E), we add E to CLOSURE[Z,S], which now has
the value {A,B,C,E}.

On the fourth iteration (for the FD CD->EF), CLOSURE[Z,S] remains unchanged.


Now we go round the inner loop four times again. On the first iteration, the result
does not change; on the second, it expands to {A,B,C,E,F}, on the third and fourth
it does not change.

Now we go round the inner loop four times again. CLOSURE[Z,S] does not change,
and so the whole process terminates with {A,B}+ = {A,B,C,E,F}.
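The pseudocode and the worked example can be reproduced with a short Python sketch. The function name closure and the representation of an FD as a pair of attribute sets are choices made here for illustration.

```python
def closure(Z, S):
    """Closure Z+ of the attribute set Z under the FDs in S.

    S is a list of (X, Y) pairs of attribute sets, each meaning X -> Y.
    """
    result = set(Z)
    changed = True
    while changed:                      # the outer "forever" loop
        changed = False
        for X, Y in S:                  # one pass over all FDs
            if X <= result and not Y <= result:
                result |= Y             # left side is covered: add right side
                changed = True
    return result

# FDs from the example: A->BC, E->CF, B->E, CD->EF
S = [
    ({"A"}, {"B", "C"}),
    ({"E"}, {"C", "F"}),
    ({"B"}, {"E"}),
    ({"C", "D"}, {"E", "F"}),
]

ab_closure = closure({"A", "B"}, S)
```

The loop stops after the first pass in which no FD adds a new attribute, which corresponds to the "did not change on this iteration" test in the pseudocode. D is never added because no FD with a left side inside {A,B,C,E,F} has D on its right side.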

Thus if Z is a set of attributes of relation R and S is a set of FDs that hold for R, then
the set of FDs that hold for R with Z as the left side is the set consisting of all FDs of
the form Z->Z', where Z' is some subset of the closure Z+ of Z under S. The closure
S+ of the original set S of FDs is then the union of all such sets of FDs, taken over
all possible attribute sets Z.

Two sets of FDs F and G are equivalent if:

• Every FD in F can be inferred from G, and

• Every FD in G can be inferred from F

• Hence, F and G are equivalent if F+ = G+

Definition (Covers):

F covers G if every FD in G can be inferred from F (i.e., if G+ ⊆ F+)

F and G are equivalent if F covers G and G covers F
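Whether F covers G can be tested without computing F+ explicitly: an FD X->Y can be inferred from F exactly when Y is contained in the closure of X under F. The sketch below assumes that approach; the function names and the small FD sets F, G and H are made up for illustration.

```python
def closure(Z, S):
    """Closure of attribute set Z under FDs S, given as (X, Y) set pairs."""
    result = set(Z)
    changed = True
    while changed:
        changed = False
        for X, Y in S:
            if X <= result and not Y <= result:
                result |= Y
                changed = True
    return result

def covers(F, G):
    """F covers G if every FD X -> Y in G follows from F."""
    return all(Y <= closure(X, F) for X, Y in G)

def equivalent(F, G):
    return covers(F, G) and covers(G, F)

# Hypothetical FD sets over attributes A, B, C
F = [({"A"}, {"B"}), ({"B"}, {"C"})]
G = [({"A"}, {"B"}), ({"B"}, {"C"}), ({"A"}, {"C"})]
H = [({"A"}, {"B"})]
```

F and G are equivalent because A->C follows from A->B and B->C by transitivity. F covers H, but H does not cover F, since B->C cannot be inferred from A->B alone.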


A set of FDs F is minimal if it satisfies the following conditions:

• Every dependency in F has a single attribute for its RHS.

• We cannot remove any dependency from F and have a set of dependencies that is
equivalent to F.

• We cannot replace any dependency X -> A in F with a dependency Y -> A, where
Y is a proper subset of X (Y ⊂ X), and still have a set of dependencies that is
equivalent to F.

IRREDUCIBLE SETS OF DEPENDENCIES:


Let S1 and S2 be two sets of FDs. If every FD implied by S1 is
implied by S2, i.e. if S1+ is a subset of S2+, we say that S2 is a cover of S1.

This means that if the DBMS enforces the FDs in S2, then it will
automatically be enforcing the FDs in S1.
If S2 is a cover of S1 and S1 is a cover of S2, i.e. if S1+ = S2+,
we say that S1 and S2 are equivalent. In this case, if the DBMS enforces the FDs in
S2 it will automatically be enforcing the FDs in S1 and vice versa.

A set S of FDs is said to be irreducible if and only if it satisfies
the following three properties:

• The right side of every FD in S involves just one attribute.

• The left side of every FD in S is irreducible in turn, meaning that no attribute can
be discarded from the determinant without changing the closure S+. This type of
FD is called left irreducible.

• No FD in S can be discarded from S without changing the closure S+.

Example:

Consider the relation PARTS for which the following FDs hold:

PART-NO->PART-NAME
PART-NO->COLOUR

PART-NO->WEIGHT

PART-NO->CITY

This set of FDs is easily seen to be irreducible. The right side is a
single attribute in each case, the left side is also irreducible in turn, and none of
the FDs can be discarded without changing the closure.

The following sets of FDs are not irreducible:

1. The right side of the first FD is not a singleton set:

PART-NO->{PART-NAME,COLOUR}
PART-NO->WEIGHT
PART-NO->CITY

2. The first FD can be simplified by dropping PART-NAME from the left side without
changing the closure:

{PART-NO,PART-NAME}->COLOUR
PART-NO->PART-NAME
PART-NO->WEIGHT
PART-NO->CITY

3. The first FD can be discarded without changing the closure:

PART-NO->PART-NO
PART-NO->PART-NAME
PART-NO->COLOUR
PART-NO->WEIGHT
PART-NO->CITY

So, for every set of FDs there exists at least one equivalent set
that is irreducible.
Example:

Consider the relation R with attributes A,B,C,D and FDs:

A->BC

B->C

A->B

AB->C

AC->D

We now compute the irreducible set of FDs that is equivalent to the given set:

The first step is to rewrite the FDs such that each has a singleton right side:

A->B

A->C

B->C

A->B

AB->C

AC->D

In this, the FD A->B occurs twice, so one occurrence can be eliminated.

Next, attribute C can be eliminated from the left side of the FD AC->D, because we
have A->C, which gives

A->AC (by augmentation with A)

AC->D (given)

So, A->D by transitivity. Thus C on the left side of AC->D is redundant.


Next AB->C can be eliminated, because we have

A->C (given)

AB->CB (by augmentation)

So, AB->C (by decomposition)


Finally, the FD A->C is implied by the FDs A->B and B->C, so it can also be
eliminated.

The final irreducible set of FDs is:

A->B

B->C

A->D
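The three reduction steps used in this example can be sketched in Python. This is one possible procedure, written for illustration; the names closure and irreducible_cover are assumptions, and for other inputs the result may depend on the order in which FDs and attributes are examined, since a set of FDs can have more than one irreducible equivalent.

```python
def closure(Z, S):
    """Closure of attribute set Z under FDs S, given as (X, Y) set pairs."""
    result = set(Z)
    changed = True
    while changed:
        changed = False
        for X, Y in S:
            if X <= result and not Y <= result:
                result |= Y
                changed = True
    return result

def irreducible_cover(S):
    as_pairs = lambda fds: [(x, {y}) for x, y in fds]
    # Step 1: singleton right sides, duplicates dropped.
    fds = []
    for X, Y in S:
        for a in sorted(Y):
            fd = (frozenset(X), a)
            if fd not in fds:
                fds.append(fd)
    # Step 2: discard redundant attributes from each left side.
    for i, (X, a) in enumerate(fds):
        for b in sorted(X):
            smaller = X - {b}
            if smaller and a in closure(smaller, as_pairs(fds)):
                X = smaller
        fds[i] = (X, a)
    fds = list(dict.fromkeys(fds))      # step 2 can create duplicates
    # Step 3: discard FDs implied by the remaining ones.
    for fd in list(fds):
        rest = [f for f in fds if f != fd]
        if fd[1] in closure(fd[0], as_pairs(rest)):
            fds = rest
    return {("".join(sorted(X)), a) for X, a in fds}

# FDs from the example: A->BC, B->C, A->B, AB->C, AC->D
S = [
    ({"A"}, {"B", "C"}),
    ({"B"}, {"C"}),
    ({"A"}, {"B"}),
    ({"A", "B"}, {"C"}),
    ({"A", "C"}, {"D"}),
]
result = irreducible_cover(S)
```

For the example input the result is the set A->B, B->C, A->D, matching the derivation above. The sketch reaches it by a slightly different route: AB->C is first reduced to B->C in step 2 and then merged with the existing B->C.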

NOTE:
An irreducible set is also referred to as a minimal set, a minimal cover, or a
canonical cover.

Single Valued Functional Dependency:


A simple example of single valued functional dependency is when
Roll_Number is the primary key of an entity and Student_Name is some single
valued attribute of the entity.
Then, Roll_Number → Student_Name

Roll_Number   Student_Name        Student_Address
011           Jayesh Umre         Burhanpur
012           Kunal Batra         Burhanpur
013           Nilesh Nimbhorkar   Ichchapur
014           Aryan Jagdale       Ujjain
ANOMALIES
Database anomalies are problems that arise due to the limitations or flaws within a given
database. Anomalies can be classified into insertion anomalies, deletion anomalies, and
modification anomalies. These anomalies are discussed based on the EMP_DEPT relation
given below.

Insertion Anomalies:
Insertion anomalies can be differentiated into two types, illustrated by the following examples
based on the EMP_DEPT relation:

• To insert a new employee tuple into EMP_DEPT, we must include either the attribute values for
the department that the employee works for, or NULLs (if the employee does not work for
a department as yet). For example, to insert a new tuple for an employee who works in
department number 5, we must enter all the attribute values of department 5 correctly so
that they are consistent with the corresponding values for department 5 in other tuples
in EMP_DEPT.

• It is difficult to insert a new department that has no employees as yet in the EMP_DEPT
relation. The only way to do this is to place NULL values in the attributes for employee.
This violates the entity integrity for EMP_DEPT because SSN is its primary key.
Deletion Anomalies:

If we delete from EMP_DEPT an employee tuple that happens to represent the last employee
working for a particular department, the information concerning that department is lost
from the database.

For example, if we delete the details of Borg, James E., who works for Headquarters, then the
details of that department are lost.

Modification Anomalies:
In EMP_DEPT, if we change the value of one of the attributes of a particular department, say
the manager of department 5, we must update the tuples of all employees who work in
that department; otherwise, the database will become inconsistent. If we fail to update
some tuples, the same department will be shown to have two different values for manager
in different employee tuples, which would be wrong.

NORMALIZATION

Normalization of data can be considered a process of analyzing the given relation schemas
based on their FDs and primary keys to achieve the desirable properties of (1) minimizing
redundancy and (2) minimizing the insertion, deletion, and update anomalies.

Any attribute involved in a candidate key is a prime attribute.

All other attributes are called non-prime attributes.

A superkey of a relation schema R = {A1, A2, ...., An} is a set of attributes S ⊆ R
with the property that no two distinct tuples t1 and t2 in any legal relation state r of R
will have t1[S] = t2[S].

A key K is a superkey with the additional property that removal of any attribute from K will
cause K not to be a superkey any more.

There are two important properties of decompositions:

(a) Non-additive or losslessness of the corresponding join

(b) Preservation of the functional dependencies.

Note that:

Property (a) is extremely important and cannot be sacrificed.

Property (b) is less stringent and may be sacrificed.

Single valued Normalization

First normal form, second normal form, third normal form and BCNF are under the
category of single valued normalization

First Normal Form:

Definition:
First normal form (1NF) disallows multi-valued attributes, composite attributes,
and their combinations. It states that the domain of an attribute must include only
atomic (simple, indivisible) values.
Description:

Consider the relation DEPARTMENT in Figure 15.9(a). We assume that each
department can have a number of locations. This is not in 1NF because Dlocations
is not an atomic attribute, as illustrated by the first tuple in Figure 15.9(b).

There are three main techniques to achieve first normal form for such a relation:

1. Expand the key so that there will be a separate tuple in the original DEPARTMENT
relation for each location of a DEPARTMENT, as shown in Figure 15.9(c). In this
case, the primary key becomes the combination {Dnumber, Dlocation}. This
solution has the disadvantage of introducing redundancy in the relation.

2. If a maximum number of values is known for the attribute (for example, if it is
known that at most three locations can exist for a department), replace the
Dlocations attribute by three atomic attributes: Dlocation1, Dlocation2, and
Dlocation3. This solution has the disadvantage of introducing NULL values if most
departments have fewer than three locations.

3. Remove the attribute Dlocations that violates 1NF and place it in a separate
relation DEPT_LOCATIONS along with the primary key Dnumber of DEPARTMENT.
The primary key of this relation is the combination {Dnumber, Dlocation}, as shown
in Figure 15.10 below. A distinct tuple in DEPT_LOCATIONS exists for each
location of a department. This decomposes the non-1NF relation into two 1NF
relations.

Figure.15.10. Decomposition of 15.9 (a)


Among the three the third is considered to be the best solution because it does not suffer
from redundancy.
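The third technique can be sketched in Python. The DEPARTMENT rows below are hypothetical sample data (the figures are not reproduced in these notes); only the attribute and relation names come from the text.

```python
# Hypothetical non-1NF DEPARTMENT tuples: Dlocations holds a set of values.
DEPARTMENT = [
    {"Dname": "Research", "Dnumber": 5,
     "Dlocations": {"Bellaire", "Sugarland", "Houston"}},
    {"Dname": "Administration", "Dnumber": 4,
     "Dlocations": {"Stafford"}},
]

# DEPARTMENT without the multivalued attribute (now in 1NF)
department_1nf = [
    {attr: value for attr, value in d.items() if attr != "Dlocations"}
    for d in DEPARTMENT
]

# DEPT_LOCATIONS(Dnumber, Dlocation): one tuple per department location
dept_locations = sorted(
    (d["Dnumber"], location)
    for d in DEPARTMENT
    for location in d["Dlocations"]
)
```

Each department appears once in department_1nf, and every (Dnumber, Dlocation) pair appears once in dept_locations, so no department data is repeated for each location.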

Second Normal Form:

Definition.

A relation schema R is in 2NF if it is in 1NF and satisfies full functional dependency, i.e.,
every nonprime attribute A in R is fully dependent on the primary key of R and not on a
part of it.

Figure 15.10(a) Normalizing EMP_PROJ into 2NF relations

The EMP_PROJ relation in Figure 15.10(a) is in 1NF but is not in 2NF. The nonprime
attribute Ename violates 2NF because of FD2, as do the nonprime attributes Pname and
Plocation because of FD3. The functional dependencies FD2 and FD3 make Ename,
Pname, and Plocation partially dependent on the primary key {Ssn, Pnumber} of
EMP_PROJ, thus violating 2NF.

If a relation schema is not in 2NF, it can be second normalized or 2NF normalized into a
number of 2NF relations in which nonprime attributes are associated only with the part of
the primary key on which they are fully functionally dependent.

Therefore, the functional dependencies FD1, FD2, and FD3 in Figure 15.10(a) lead to the
decomposition of EMP_PROJ into the three relation schemas EP1, EP2, and EP3 shown in
Figure 15.10(a), each of which is in 2NF.
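Partial dependencies can be detected by computing the closure of each proper subset of the key. The sketch below assumes FD1 is {Ssn, Pnumber} -> Hours, FD2 is Ssn -> Ename, and FD3 is Pnumber -> {Pname, Plocation}, as the description suggests (the figure itself is not reproduced here); the function names are made up.

```python
from itertools import combinations

def closure(Z, S):
    """Closure of attribute set Z under FDs S, given as (X, Y) set pairs."""
    result = set(Z)
    changed = True
    while changed:
        changed = False
        for X, Y in S:
            if X <= result and not Y <= result:
                result |= Y
                changed = True
    return result

def partial_dependencies(key, nonprime, S):
    """Nonprime attributes determined by a proper subset of the key."""
    found = {}
    for size in range(1, len(key)):
        for part in combinations(sorted(key), size):
            determined = closure(set(part), S) & nonprime
            if determined:
                found[part] = determined
    return found

# Assumed FDs of EMP_PROJ (FD1, FD2, FD3)
S = [
    ({"Ssn", "Pnumber"}, {"Hours"}),
    ({"Ssn"}, {"Ename"}),
    ({"Pnumber"}, {"Pname", "Plocation"}),
]
violations = partial_dependencies(
    {"Ssn", "Pnumber"}, {"Hours", "Ename", "Pname", "Plocation"}, S
)
```

The result groups the nonprime attributes by the part of the key that determines them, which suggests the split into EP2 (Ssn, Ename) and EP3 (Pnumber, Pname, Plocation); Hours depends on the full key and stays with it in EP1.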
Third Normal Form:
Definition: A relation schema R is in 3NF if it satisfies 2NF and no nonprime
attribute of R is transitively dependent on the primary key.

Description: The dependency Ssn→Dmgrssn is transitive through Dnumber in
EMP_DEPT in Figure 15.10(b), because both the dependencies Ssn→Dnumber and
Dnumber→Dmgrssn hold and Dnumber is neither a key itself nor a subset of the
key of EMP_DEPT.

The relation schema EMP_DEPT in Figure 15.10(b) is in 2NF, since no partial
dependencies on a key exist. However, EMP_DEPT is not in 3NF because of the
transitive dependency of Dmgrssn (and also Dname) on Ssn via Dnumber.

We can normalize EMP_DEPT by decomposing it into the two 3NF relation schemas
ED1 and ED2 shown in Figure 15.10(b). ED1 and ED2 represent independent entity
facts about employees and departments. A NATURAL JOIN operation on ED1 and
ED2 will recover the original relation EMP_DEPT.

NOTE:
In X -> Y and Y -> Z, with X as the primary key, we consider this a problem only if Y
is not a candidate key.

When Y is a candidate key, there is no problem with the transitive dependency.

BOYCE/CODD NORMAL FORM (BCNF):
A relation schema R is in BCNF with respect to a set F of functional dependencies if,
for all functional dependencies in F+ of the form α -> β, where α ⊆ R and β ⊆ R,
at least one of the following holds:

• α -> β is trivial (i.e., β ⊆ α)

• α is a superkey for R

A relation is in BCNF if and only if every determinant is a candidate key.

Example:

Consider the relation SJT{student, subject, teacher}

The following constraints apply to the relation:

• For a single subject, multiple students can register.

• Each teacher teaches only one subject.

• For each subject, each student of that subject is taught by only one teacher.

• Each subject can be taught by multiple teachers.

The table below shows the sample values for SJT:

Student   Subject   Teacher
123       Physics   Faculty1
123       Music     Faculty2
456       Biology   Faculty3
789       Physics   Faculty4
999       Physics   Faculty1
The two functional dependencies that follow from the constraints are:

{student, subject}-> teacher

teacher->subject
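The BCNF condition can be checked mechanically for these two FDs. The sketch below tests only the given FDs rather than all of F+, which is sufficient for deciding whether the relation itself is in BCNF; the function names are assumptions.

```python
def closure(Z, S):
    """Closure of attribute set Z under FDs S, given as (X, Y) set pairs."""
    result = set(Z)
    changed = True
    while changed:
        changed = False
        for X, Y in S:
            if X <= result and not Y <= result:
                result |= Y
                changed = True
    return result

def bcnf_violations(R, S):
    """FDs X -> Y in S that are nontrivial and whose X is not a superkey of R."""
    return [(X, Y) for X, Y in S
            if not (Y <= X) and closure(X, S) != R]

R = {"student", "subject", "teacher"}
S = [
    ({"student", "subject"}, {"teacher"}),
    ({"teacher"}, {"subject"}),
]
violations = bcnf_violations(R, S)
```

Only teacher->subject is reported: the closure of {teacher} is {teacher, subject}, which does not include student, so teacher is not a superkey of SJT.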

The FD diagram for SJT uses the abbreviations S (Student), J (Subject), and
T (Teacher).

The relation suffers from the following anomalies:

Insertion anomaly: if a new faculty member (say Faculty5) joins and no subject is
assigned, the faculty member cannot be inserted, as a prime attribute cannot be null.

Deletion anomaly: if the tuple for the student with id 789 is deleted, then the
information about Faculty4 is deleted as well.

This difficulty is caused by the fact that the attribute teacher is a determinant but
not a candidate key. Whenever the determinant of a nontrivial FD is not a candidate
key, the relation violates BCNF. Here teacher determines subject, although teacher
alone is not a candidate key of SJT.
The solution to the problem is to split or decompose the original relation into two
BCNF projections as below:

ST{student, teacher}

TJ{teacher, subject}

ST:
Student   Teacher
123       Faculty1
123       Faculty2
456       Faculty3
789       Faculty4
999       Faculty1

TJ:
Teacher    Subject
Faculty1   Physics
Faculty2   Music
Faculty3   Biology
Faculty4   Physics

The anomalies are overcome by decomposing the relation SJT into the two
relations ST and TJ.
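
The BCNF test for SJT can be sketched in a few lines of Python. This is only an illustration, not part of the prescribed material: the helper names closure and bcnf_violations are assumptions made for this example, and the test checks only the FDs listed in F, which is sufficient for deciding whether the original schema itself is in BCNF.

```python
def closure(attrs, fds):
    """Return the closure of a set of attributes under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the left side is already covered, the right side follows.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


def bcnf_violations(schema, fds):
    """List the FDs in F that are nontrivial and whose left side is not a superkey."""
    return [(lhs, rhs) for lhs, rhs in fds
            if not rhs <= lhs and closure(lhs, fds) != schema]


# Schema and FDs taken from the SJT example in the notes.
SJT = frozenset({"student", "subject", "teacher"})
FDS = [
    (frozenset({"student", "subject"}), frozenset({"teacher"})),
    (frozenset({"teacher"}), frozenset({"subject"})),
]

violations = bcnf_violations(SJT, FDS)
for lhs, rhs in violations:
    print(sorted(lhs), "->", sorted(rhs), "violates BCNF")
```

Only teacher -> subject is reported, because the closure of {teacher} is {teacher, subject}, which is not the whole schema.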

MULTIVALUED DEPENDENCIES
A multivalued dependency (MVD) represents a dependency between attributes
(for example X, Y and Z) in a relation, such that for each value of X there is a set of
values for Y and a set of values for Z. However, the sets of values for Y and Z
are independent of each other.

An MVD is represented as

X-->>Y (X multidetermines Y)

By symmetry, whenever X-->>Y holds in R, so does X-->>Z, where Z = R − (X ∪ Y). Hence it can
be written as X-->>Y|Z.

Multivalued dependencies are a consequence of first normal form
(1NF), which disallows an attribute in a tuple from having a set of values. If two or
more independent multivalued attributes are present in the same relation, we
have to repeat every value of one of the attributes with every
value of the other attribute to keep the relation state consistent and to maintain
the independence among the attributes involved.

For example, consider the relation Emp shown below:

Emp
Ename   Pname   Dname
Smith   X       John
Smith   Y       Anna
Smith   X       Anna
Smith   Y       John

A tuple in this Emp relation represents the fact that an employee whose name is Ename
works on the project whose name is Pname and has a dependent whose name is Dname.

An employee may work on several projects and may have several dependents,
and the employee's projects and dependents are independent of one another. To keep
the relation state consistent we must have a separate tuple to represent every
combination of an employee's dependent and an employee's project.

This constraint is specified as a multivalued dependency on the Emp relation. In
the example, the MVDs are Ename-->>Pname and Ename-->>Dname, or
Ename-->>Pname|Dname.

The employee with Ename Smith works on projects with Pname X and Y and has two
dependents with Dname 'John' and 'Anna'. If we stored only the first two tuples in Emp
(<'Smith', 'X', 'John'> and <'Smith', 'Y', 'Anna'>), we would incorrectly show
associations between project X and John, and between project Y and Anna. These should not be
conveyed, because no such meaning is intended in this relation.

Hence we must also store the other two tuples (<'Smith', 'X', 'Anna'> and
<'Smith', 'Y', 'John'>) to show that {X, Y} and {John, Anna} are associated only
with Smith; i.e., there is no association between Pname and Dname, which means that the
two attributes are independent.
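
The argument above can be checked mechanically. The following Python sketch (the function name mvd_holds is an assumption made for this illustration) tests whether X-->>Y holds in a relation instance: for every pair of tuples that agree on X, the tuple that takes its Y values from the first and its remaining values from the second must also be present.

```python
def mvd_holds(rows, attrs, x, y):
    """Check whether the MVD X ->> Y holds in the relation instance `rows`."""
    idx = {a: i for i, a in enumerate(attrs)}
    # Z is everything that is neither in X nor in Y.
    z = [a for a in attrs if a not in x and a not in y]
    present = set(rows)
    for t1 in rows:
        for t2 in rows:
            if all(t1[idx[a]] == t2[idx[a]] for a in x):
                # Take X and Y values from t1 and Z values from t2.
                mixed = tuple(t2[idx[a]] if a in z else t1[idx[a]] for a in attrs)
                if mixed not in present:
                    return False
    return True


ATTRS = ("Ename", "Pname", "Dname")
emp_full = [
    ("Smith", "X", "John"),
    ("Smith", "Y", "Anna"),
    ("Smith", "X", "Anna"),
    ("Smith", "Y", "John"),
]
emp_partial = emp_full[:2]  # only the first two tuples

full_ok = mvd_holds(emp_full, ATTRS, ["Ename"], ["Pname"])
partial_ok = mvd_holds(emp_partial, ATTRS, ["Ename"], ["Pname"])
print(full_ok, partial_ok)
```

With all four tuples the MVD Ename-->>Pname holds; with only the first two it fails, because the combination <'Smith', 'X', 'Anna'> is missing.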

An MVD X-->>Y in R is called a trivial MVD if

Y is a subset of X, or

X ∪ Y = R

An MVD that satisfies neither condition is nontrivial.
FOURTH NORMAL FORM:
A relation schema R is in 4NF with respect to a set of dependencies F (functional
and multivalued) if, for every nontrivial MVD X-->>Y in F+, X is a
superkey for R.

The Emp relation in the example is not in 4NF because, in the nontrivial
MVDs Ename-->>Pname and Ename-->>Dname, Ename is not a superkey
of Emp.

Therefore, the Emp relation is decomposed into Emp_proj and Emp_dep as
given below:

Emp_proj            Emp_dep
Ename   Pname       Ename   Dname
Smith   X           Smith   John
Smith   Y           Smith   Anna

The above relations are in 4NF because the MVDs Ename-->>Pname in Emp_proj and
Ename-->>Dname in Emp_dep are trivial: in each case X ∪ Y covers the whole relation.
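
As a rough check of this decomposition, the Python sketch below projects Emp onto the two schemas and joins the projections back together. The helpers project and natural_join are names chosen for this illustration only; relations are modelled as sets of tuples.

```python
def project(rows, attrs, keep):
    """Project a relation onto the attributes in `keep` (duplicates vanish in the set)."""
    idx = [attrs.index(a) for a in keep]
    return {tuple(row[i] for i in idx) for row in rows}


def natural_join(left, left_attrs, right, right_attrs):
    """Natural join of two relations on their common attribute names."""
    common = [a for a in left_attrs if a in right_attrs]
    extra_attrs = [a for a in right_attrs if a not in common]
    result = set()
    for l in left:
        for r in right:
            if all(l[left_attrs.index(a)] == r[right_attrs.index(a)] for a in common):
                result.add(l + tuple(r[right_attrs.index(a)] for a in extra_attrs))
    return result


EMP_ATTRS = ["Ename", "Pname", "Dname"]
emp = {
    ("Smith", "X", "John"),
    ("Smith", "Y", "Anna"),
    ("Smith", "X", "Anna"),
    ("Smith", "Y", "John"),
}

emp_proj = project(emp, EMP_ATTRS, ["Ename", "Pname"])
emp_dep = project(emp, EMP_ATTRS, ["Ename", "Dname"])
rejoined = natural_join(emp_proj, ["Ename", "Pname"], emp_dep, ["Ename", "Dname"])

print(sorted(emp_proj))
print(sorted(emp_dep))
print(rejoined == emp)
```

Each projection holds two tuples instead of four, and their natural join gives back exactly the original Emp relation, so nothing is lost by storing the projects and the dependents separately.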

JOIN DEPENDENCIES AND 5NF:

Join dependency:

Join dependencies constrain the set of legal relations over a schema R to those
relations for which a given decomposition is a lossless-join decomposition.

Let R be a relation schema and R1, R2, ..., Rn be a decomposition of R. If R = R1
∪ R2 ∪ … ∪ Rn, we say that a relation r(R) satisfies the join dependency *(R1, R2,
..., Rn) if:

r = πR1(r) ⋈ πR2(r) ⋈ … ⋈ πRn(r)

A join dependency is trivial if one of the Ri is R itself.


FIFTH NORMAL FORM
A relation schema R is in fifth normal form (5NF) (or Project-Join Normal Form
(PJNF)) with respect to a set F of functional, multivalued, and join dependencies if,
for every nontrivial join dependency JD(R1, R2, ..., Rn) in F+ (that is, implied by F),
every Ri is a superkey of R.

Consider the Supply relation, which has a join dependency.

Supply:

Sname     Partname   Projname
Smith     Bolt       Proj x
Smith     Nut        Proj y
Adamsky   Bolt       Proj y
Walton    Nut        Proj z
Adamsky   Nail       Proj x
Adamsky   Bolt       Proj x
Smith     Bolt       Proj y

The Supply relation, having the join dependency JD(R1, R2, R3), is decomposed into
three relations R1, R2 and R3 that are each in 5NF.
The natural join of any two of these relations produces spurious tuples, but
applying natural join to all three together does not.

The natural join of all three produces the state of the original
relation.

R1                     R2                     R3
Sname     Partname     Sname     Projname     Partname   Projname
Smith     Bolt         Smith     Proj x       Bolt       Proj x
Smith     Nut          Smith     Proj y       Nut        Proj y
Adamsky   Bolt         Adamsky   Proj y       Bolt       Proj y
Walton    Nut          Walton    Proj z       Nut        Proj z
Adamsky   Nail         Adamsky   Proj x       Nail       Proj x


R1 ⋈ R2                            (R1 ⋈ R2) ⋈ R3

Sname     Partname   Projname      Sname     Partname   Projname
Smith     Bolt       Proj x        Smith     Bolt       Proj x
Smith     Nut        Proj y        Smith     Nut        Proj y
Smith     Bolt       Proj y        Smith     Bolt       Proj y
Smith     Nut        Proj x        Adamsky   Bolt       Proj y
Adamsky   Bolt       Proj y        Adamsky   Bolt       Proj x
Adamsky   Bolt       Proj x        Walton    Nut        Proj z
Walton    Nut        Proj z        Adamsky   Nail       Proj x
Adamsky   Nail       Proj y
Adamsky   Nail       Proj x
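
The join tables above can be reproduced with a short Python sketch. It is an illustration only; the variable names (r1_r2, all_three, spurious) are chosen for this example, and the relations are modelled as sets of tuples built from the Supply data in the notes.

```python
# Supply(Sname, Partname, Projname) as given in the notes.
supply = {
    ("Smith", "Bolt", "Proj x"),
    ("Smith", "Nut", "Proj y"),
    ("Adamsky", "Bolt", "Proj y"),
    ("Walton", "Nut", "Proj z"),
    ("Adamsky", "Nail", "Proj x"),
    ("Adamsky", "Bolt", "Proj x"),
    ("Smith", "Bolt", "Proj y"),
}

# The three binary projections.
r1 = {(s, p) for s, p, j in supply}  # (Sname, Partname)
r2 = {(s, j) for s, p, j in supply}  # (Sname, Projname)
r3 = {(p, j) for s, p, j in supply}  # (Partname, Projname)

# Natural join of R1 and R2 on Sname.
r1_r2 = {(s, p, j) for s, p in r1 for s2, j in r2 if s == s2}

# Joining with R3 keeps only tuples whose (Partname, Projname) pair occurs in R3.
all_three = {t for t in r1_r2 if (t[1], t[2]) in r3}

spurious = r1_r2 - supply
print(len(r1_r2), sorted(spurious))
print(all_three == supply)
```

Joining only R1 and R2 yields nine tuples, two of which, (Smith, Nut, Proj x) and (Adamsky, Nail, Proj y), are spurious. Joining with R3 removes them and recovers the seven original tuples.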
NON-LOSS DECOMPOSITION:
Consider the relation SUPPLIERS with attributes Supplier-no, Status and City.
The table below shows sample values for this relation.

SUPPLIERS:
Supplier-no   Status   City
S3            30       Chennai
S5            30       Delhi

Two possible decompositions of the above relation are given below:

a) SST:                     SC:
Supplier-no   Status        Supplier-no   City
S3            30            S3            Chennai
S5            30            S5            Delhi

b) SST:                     STC:
Supplier-no   Status        Status   City
S3            30            30       Chennai
S5            30            30       Delhi

Examining the two decompositions, we observe that:

In case (a), no information is lost; the SST and SC values still tell us that supplier S3 has
status 30 and city Chennai, and supplier S5 has status 30 and city Delhi. Thus, the first
decomposition is non-loss.

In case (b), by contrast, information is definitely lost. It is possible to tell that both
suppliers have status 30, but we cannot tell which supplier has which city. Thus, the
second decomposition is lossy.

Case (a) is lossless because if we join SST and SC, the original relation
SUPPLIERS is obtained. Case (b) is lossy because the join of SST and STC does not give
back the original relation SUPPLIERS.
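
The difference between the two cases can be seen by actually computing the joins. The Python sketch below is illustrative only; join_a and join_b are names chosen for this example.

```python
# SUPPLIERS(Supplier-no, Status, City) from the example.
suppliers = {("S3", 30, "Chennai"), ("S5", 30, "Delhi")}

sst = {(s, st) for s, st, c in suppliers}  # (Supplier-no, Status)
sc = {(s, c) for s, st, c in suppliers}    # (Supplier-no, City)
stc = {(st, c) for s, st, c in suppliers}  # (Status, City)

# Case (a): join SST and SC on Supplier-no.
join_a = {(s, st, c) for s, st in sst for s2, c in sc if s == s2}

# Case (b): join SST and STC on Status.
join_b = {(s, st, c) for s, st in sst for st2, c in stc if st == st2}

print(join_a == suppliers)  # case (a) gives back the original relation
print(sorted(join_b))       # case (b) contains extra, spurious tuples
```

In case (b) every supplier is paired with every city that has status 30, so the join contains four tuples instead of two. The "loss" is a loss of information, even though the join result has more tuples than the original.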

Properties of Relational Decompositions

Relation Decomposition and Insufficiency of Normal Forms:

Universal relation schema: A relation schema R = {A1, A2, …, An} that includes all
the attributes of the database.

Universal relation assumption: Every attribute name is unique.

Decomposition:
The process of decomposing the universal relation schema R into a set of relation
schemas D = {R1, R2, …, Rm} that will become the relational database schema, by using the
functional dependencies.

Attribute preservation condition:

Each attribute in R will appear in at least one relation schema Ri in the decomposition so
that no attributes are "lost".

Another goal of decomposition is to have each individual relation Ri in the decomposition
D be in BCNF or 3NF.

Additional properties of decomposition are needed to prevent the generation of spurious
tuples.
Dependency preservation property of decomposition
Definition: Given a set of dependencies F on R, the projection of F on Ri, denoted
by πRi(F) where Ri is a subset of R, is the set of dependencies X → Y in F+ such that
the attributes in X ∪ Y are all contained in Ri.

Hence, the projection of F on each relation schema Ri in the decomposition D is the
set of functional dependencies in F+, the closure of F, such that all their left- and
right-hand-side attributes are in Ri.

Dependency Preservation Property:

A decomposition D = {R1, R2, ..., Rm} of R is dependency-preserving with respect to
F if the union of the projections of F on each Ri in D is equivalent to F; that is,

(πR1(F) ∪ πR2(F) ∪ … ∪ πRm(F))+ = F+
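
A dependency-preservation check can be sketched in Python as follows. This is a minimal illustration, not the method prescribed by the notes: it uses the well-known polynomial-time test that, for each FD X → Y in F, grows X using only closures restricted to one Ri at a time, instead of materializing every projection. The names closure and preserves, and the small schema R(A, B, C), are assumptions made for this example.

```python
def closure(attrs, fds):
    """Closure of a set of attributes under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def preserves(fds, decomposition):
    """True if every FD in F is implied by the projections of F onto the Ri."""
    for lhs, rhs in fds:
        z = set(lhs)
        changed = True
        while changed:
            changed = False
            for ri in decomposition:
                # Attributes derivable from z using only dependencies inside Ri.
                gained = closure(z & ri, fds) & ri
                if not gained <= z:
                    z |= gained
                    changed = True
        if not rhs <= z:
            return False
    return True


# SJT example from the BCNF section of the notes.
sjt_fds = [
    (frozenset({"student", "subject"}), frozenset({"teacher"})),
    (frozenset({"teacher"}), frozenset({"subject"})),
]
sjt_parts = [frozenset({"student", "teacher"}), frozenset({"teacher", "subject"})]

# Hypothetical schema R(A, B, C) with A -> B and B -> C, split into AB and BC.
abc_fds = [(frozenset("A"), frozenset("B")), (frozenset("B"), frozenset("C"))]
abc_parts = [frozenset("AB"), frozenset("BC")]

sjt_preserved = preserves(sjt_fds, sjt_parts)
abc_preserved = preserves(abc_fds, abc_parts)
print(sjt_preserved, abc_preserved)
```

The decomposition of SJT into ST and TJ is not dependency-preserving: {student, subject} -> teacher can no longer be checked within a single relation. The decomposition of the hypothetical R(A, B, C) into AB and BC does preserve both dependencies.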

Lossless (Non-additive) Join Property of a Decomposition:

Definition: Lossless join property: a decomposition D = {R1, R2, ..., Rm} of R has
the lossless (nonadditive) join property with respect to the set of dependencies F
on R if, for every relation state r of R that satisfies F, the following holds,
where * is the natural join of all the relations in D:

*(πR1(r), πR2(r), ..., πRm(r)) = r

DENORMALIZATION:

• It is a database optimization technique in which we add redundant data to one or
more tables.
• It helps us avoid costly joins in a relational database.
PROS OF DENORMALIZATION:
• Retrieving data is faster since we do fewer joins.
• Queries to retrieve data can be simpler (and therefore less likely to have bugs), since
we need to look at fewer tables.
CONS OF DENORMALIZATION:
• Updates and inserts are more expensive.
• Denormalization can make update and insert code harder to write.
• Data may be inconsistent: which is the "correct" value for a piece of data?
• Data redundancy necessitates more storage.
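
The trade-off can be illustrated with a small sketch using Python's built-in sqlite3 module. The table and column names (department, employee, employee_denorm) and the sample rows are hypothetical, chosen only for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE department (dnumber INTEGER PRIMARY KEY, dname TEXT);
CREATE TABLE employee (ssn TEXT PRIMARY KEY, ename TEXT,
                       dnumber INTEGER REFERENCES department(dnumber));
-- Denormalized copy: the department name is repeated in every employee row.
CREATE TABLE employee_denorm (ssn TEXT PRIMARY KEY, ename TEXT,
                              dnumber INTEGER, dname TEXT);
INSERT INTO department VALUES (5, 'Research');
INSERT INTO employee VALUES ('111', 'Smith', 5), ('222', 'Wong', 5);
INSERT INTO employee_denorm VALUES ('111', 'Smith', 5, 'Research'),
                                   ('222', 'Wong', 5, 'Research');
""")

# Normalized design: a join is needed to list employees with department names.
normalized = cur.execute(
    "SELECT e.ename, d.dname FROM employee e "
    "JOIN department d ON e.dnumber = d.dnumber ORDER BY e.ssn"
).fetchall()

# Denormalized design: the same answer comes from a single table.
denormalized = cur.execute(
    "SELECT ename, dname FROM employee_denorm ORDER BY ssn"
).fetchall()

# Renaming the department touches one row in the normalized design ...
cur.execute("UPDATE department SET dname = 'R&D' WHERE dnumber = 5")
rows_normalized = cur.rowcount
# ... but one row per employee in the denormalized design.
cur.execute("UPDATE employee_denorm SET dname = 'R&D' WHERE dnumber = 5")
rows_denormalized = cur.rowcount

print(normalized == denormalized, rows_normalized, rows_denormalized)
conn.close()
```

The read query is simpler on the denormalized table, but the update has to change every copy of the department name; if one copy is missed, the data becomes inconsistent.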
10. ASSIGNMENT

1. Draw an ER-diagram for the banking enterprise with
almost all components and explain. CO2, PO1, PO2, PO3

2. A university registrar's office maintains data about the
following entities:

(1) Courses, including number, title, credits, syllabus, and
prerequisites;

(2) Course offerings, including course number, year,
semester, section number, instructor, timings, and
classroom;

(3) Students, including student-id, name, and program; and

(4) Instructors, including identification number, name,
department, and title. Further, the enrollment of students in
courses and grades awarded to students in each course
they are enrolled for must be appropriately modeled.

Construct an E-R diagram for the registrar's office.
Document all assumptions that you make about the
mapping constraints.

CO2, PO1, PO2, PO3
11. Part A Question & Answer

S.No  Question and Answers  CO  K

1  List the possible operations in Relational Algebra.  CO2 K1
Select operation
Project operation
Union operation
Set Difference operation
Cartesian Product operation
Rename operation
Set-Intersection operation
Natural-join operation
Division
Assignment operation

2  Define Super key.  CO2 K1
A super key is a set of one or more attributes that, taken
collectively, allow us to identify uniquely an entity in the
entity set. For example, the social-security attribute of the
entity set customer is sufficient to distinguish one customer
entity from another. Similarly, the combination of customer-name
and social security is a super key for the entity set
customer.

3  Define Primary key.  CO2 K1
A superkey for which no proper subset is a superkey is called a
candidate key. The primary key is the candidate key chosen by the
database designer as the principal means of identifying entities.
For example, the social-security attribute of the entity set
customer is sufficient to distinguish one customer entity from
another.

4  Define Assertions.  CO2 K1
An assertion is a predicate expressing a condition that we
wish the database always to satisfy.
E.g.) create assertion <assertion-name> check <predicate>

5  What is a SELECT operation?
The select operation selects tuples that satisfy a
given predicate.

6  What is a PROJECT operation?
The project operation is a unary operation that returns its
argument relation with certain attributes left out.
Projection is denoted by the Greek letter pi (π).

7  Define inherent model-based constraints or implicit constraints.  CO2 K1
Constraints that are inherent in the data model.

8  Define schema-based constraints or explicit constraints.  CO2 K1
Constraints that can be directly expressed in the schemas of
the data model, typically by specifying them in the DDL.

9  Define application-based or semantic constraints or business rules.  CO2 K1
Constraints that cannot be directly expressed in the schemas
of the data model, and hence must be expressed and
enforced by the application programs or in some other way.

10  Define domain constraints.  CO2 K1
Domain constraints specify that within each tuple, the value
of each attribute A must be an atomic value from the domain
dom(A).

11  Define entity integrity constraint.  CO2 K1
The entity integrity constraint states that no primary key
value can be NULL. This is because the primary key value is
used to identify individual tuples in a relation.
Key constraints and entity integrity constraints are specified
on individual relations.

12  Define referential integrity constraint.  CO2 K1
The referential integrity constraint is specified between two
relations and is used to maintain the consistency among
tuples in the two relations.
Informally, the referential integrity constraint states that a
tuple in one relation that refers to another relation must
refer to an existing tuple in that relation.

13  What are the uses of functional dependencies?  CO2 K1
To test relations to see whether they are legal under a given set of
functional dependencies.
To specify constraints on the set of legal relations.

14  Explain trivial dependency.  CO2 K1
A functional dependency of the form α → β is trivial if β ⊆ α.
Trivial functional dependencies are satisfied by all relations.

15  What is meant by computing the closure of a set of functional dependencies?  CO2 K1
The closure of F, denoted by F+, is the set of functional
dependencies logically implied by F.

16  What is meant by normalization of data?  CO2 K1
It is a process of analyzing the given relation schemas based on their
functional dependencies (FDs) and primary keys to achieve
the following properties:
Minimizing redundancy
Minimizing insertion, deletion and update anomalies

17  What is the attribute preservation condition?  CO2 K1
Each attribute in R will appear in at least one relation
schema Ri in the decomposition so that no attributes are
"lost".

18  Define Fourth Normal Form.  CO2 K1
A relation schema R is in 4NF with respect to a set F of functional
and multivalued dependencies if, for all multivalued dependencies
in F+ of the form A ->> B, where A is contained in R and B is
contained in R, at least one of the following holds:
A ->> B is a trivial MVD.
A is a superkey for schema R.

19  Define 5NF or Join Dependencies.  CO2 K1
Let R be a relation schema and R1, R2, …, Rn be a
decomposition of R. The join dependency *(R1, R2, …, Rn) is
used to restrict the set of legal relations to those for which
R1, R2, …, Rn is a lossless-join decomposition of R. Formally, if
R = R1 U R2 U … U Rn, we say that a relation r(R) satisfies the
join dependency *(R1, R2, …, Rn) if
r = πR1(r) ⋈ πR2(r) ⋈ … ⋈ πRn(r).
A join dependency is trivial if one of the Ri is R itself.


20  Define Irreducible Set of Dependencies.  CO2 K1
A set S of functional dependencies is irreducible if it has the
following three properties:
The right-hand side of each functional dependency in S contains
only one attribute.
The left-hand side of each functional dependency in S is
irreducible: removing any attribute from the left-hand side
changes the closure of S (S will lose some information).
No functional dependency can be removed from S without changing
the closure of S.

21  List the pitfalls in Relational Database Design.  CO2 K1
Repetition of information
Inability to represent certain information

22  List the properties of decomposition.  CO2 K1
Lossless join
Dependency preservation
No repetition of information

23  Define Third Normal Form.  CO2 K1
A relation schema R is in 3NF with respect to a set F of FDs
if for all FDs in F+ of the form A -> B, where A is contained in R
and B is contained in R, at least one of the following holds:
A -> B is a trivial FD.
A is a superkey for schema R.
Each attribute in B – A is contained in a candidate key for R.

24  Define BCNF.  CO2 K1
A relation schema R is in BCNF with respect to a set F of
FDs if for all FDs in F+ of the form A -> B, where A is contained
in R and B is contained in R, at least one of the following holds:
A -> B is a trivial FD.
A is a superkey for schema R.

25  Define First Normal Form.
If the relation R contains only atomic fields, then
R is in first normal form.
E.g.) R = (account no, balance) is in first normal form.

26  Define Second Normal Form.  CO2 K1
A relation schema R is in 2NF with respect to a set F of FDs
if it is in 1NF and every non-prime attribute of R is fully
functionally dependent on every candidate key of R; that is, no
non-prime attribute depends on a proper subset of a candidate key.

27  Define Functional Dependency.  CO2 K1
Functional dependencies are constraints on the set of legal
relations. They allow us to express facts about the enterprise
that we are modeling with our database.
Syntax: A -> B, e.g.) account no -> balance for the account table.

28  What is Generalization?  CO2 K1
Generalization is a bottom-up design process which combines
a number of entity sets that share the same features into a
higher-level entity set. For example, employee and customer
both share the attributes name, street, and city, which are
combined into the higher-level entity set person.

12. Part B Questions

S.No  Question and Answers  CO  K

1  Draw an ER-diagram for car insurance company.  CO2 K1

2  Explain the mapping of ER diagram to relational schema for car
insurance company.  CO2 K1

3  State the need for Normalization of a database and explain
the various normal forms (1st, 2nd, 3rd, BCNF, 4th, 5th and
Domain-key) with suitable examples.  CO2 K1

4  What are normal forms? Explain the types of Normal form with
an example.  CO2 K1

5  What are the pitfalls in the relational database design? With a
suitable example, explain the role of functional dependency in
the process of normalization.  CO2 K1

6  Consider the relation R(A,B,C,D,E) with functional
dependencies {A→BC, CD→E, B→D, E→A}. Identify super
keys. Find Fc, F+.
Explain Boyce-Codd normal form with an example. Also state
how it differs from that of 3NF.  CO2 K1

7  Consider the universal relation R = {A, B, C, D, E, F, G, H, I, J}
and the set of functional dependencies F = { {A,B} → {C}, {A}
→ {D,E}, {B} → {F}, {F} → {G,H}, {D} → {I,J} }. What is the
key for R? Decompose R into 2NF, then 3NF relations.  CO2 K1

8  Explain the principles of: (i) lossless join decomposition (ii)
join dependencies (iii) fifth normal form.  CO2 K1

9  With relevant examples discuss the various operations in
Relational Algebra.  CO2 K1

10  Explain tuple relational calculus.  CO2 K1

11  Explain domain relational calculus.  CO2 K1

13. SUPPORTIVE ONLINE CERTIFICATION COURSES

Sl.No.  Name of the Institute  Name of the Course  Website Link

1. Coursera  Database Management Essentials
   https://www.coursera.org/learn/database-management
2. Coursera  Database systems Specialization
   https://www.coursera.org/specializations/database-systems
3. Udemy  Introduction to Database Engineering
   https://www.udemy.com/course/database-engines-crash-course/
4. Udemy  Relational Database Design
   https://www.udemy.com/course/relational-database-design/
5. Udemy  Database Design
   https://www.udemy.com/course/database-design/
6. Udemy  Database Design Introduction
   https://www.udemy.com/course/cwdatabase-design-introduction/
7. Udemy  The Complete Database Design & Modeling Beginners Tutorial
   https://www.udemy.com/course/the-complete-database-modeling-and-design-beginners-tutorial/
8. Udemy  Database Design and MySQL
   https://www.udemy.com/course/calebthevideomaker2-database-and-mysql-classes/
9. NPTEL  Data Base Management System
   https://onlinecourses.nptel.ac.in/noc21_cs04/preview

14. REAL TIME APPLICATIONS IN DAY TO DAY LIFE AND TO INDUSTRY

Application and Uses of Database Management System (DBMS)

Railway Reservation System
Library Management System
Banking System
Universities and Colleges Management Systems
Credit Card Transactions
Social Media Sites
Telecommunications
Finance Applications

15. CONTENT BEYOND SYLLABUS

Introduction to Hierarchical Database Model

The hierarchical database model, as the name suggests, is a database model in which
the data is arranged in a hierarchical tree structure. Every record in the tree,
except the root, has exactly one parent, and each parent can have one or more child
records; the records at the last level have no children. The data is accessed by
following the hierarchy, always starting from the root (the first parent). Hence this
model is named the hierarchical database model.

What is Hierarchical Database Model

It is a data model in which data is represented in a tree-like structure. In this
model, data is stored in the form of records, which are collections of fields. The
records are connected through links, and the type of a record tells which fields it
contains. Each field can contain only one value.

Each child node must have exactly one parent, but a parent node can have more
than one child. Multiple parents are not allowed. This is the major difference
between the hierarchical and network database models. The first node of the tree is
called the root node. When data needs to be retrieved, the tree is
traversed starting from the root node. This model represents one-to-many
relationships.

Let us see one example: assume that we have a main directory which
contains other subdirectories. Each subdirectory contains more files and directories.
Each directory or file can be in one directory only, i.e. it has only one parent.

Here A is the main directory, i.e. the root node. B1 and B2 are its children
(subdirectories). B1 and B2 in turn have two children each: C1, C2 and C2, C3 respectively.
They may be directories or files. This depicts one-to-many relationships.
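
The single-parent rule and root-first access can be sketched in Python as below. This is only an illustration: the Node class and the find function are assumptions made for this example, and the child names under B1 and B2 are placeholders, not a reproduction of the figure.

```python
class Node:
    """A record in a hierarchy: exactly one parent (None for the root), many children."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)

    def path_from_root(self):
        """Follow the parent links upward and return the path starting at the root."""
        node, names = self, []
        while node is not None:
            names.append(node.name)
            node = node.parent
        return "/".join(reversed(names))


def find(root, name):
    """Depth-first search that always starts at the root node."""
    if root.name == name:
        return root
    for child in root.children:
        hit = find(child, name)
        if hit is not None:
            return hit
    return None


# Illustrative directory tree; the names below B1 and B2 are placeholders.
a = Node("A")
b1, b2 = Node("B1", a), Node("B2", a)
c1, c2 = Node("C1", b1), Node("C2", b1)
c3 = Node("C3", b2)

print(find(a, "C3").path_from_root())
```

Because every node stores a single parent reference, there is exactly one path from the root to any record, which is the access pattern the hierarchical model relies on.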

Uses of Hierarchical Database Model

The hierarchical database model was widely used during the mainframe computer
era. Today, it is used mainly for storing file systems and geographic information. It is
used in applications where high performance is required, such as telecommunications
and banking. A hierarchical database is also used for the Windows Registry in the
Microsoft Windows operating system. It is useful where the following two conditions
are met:

The data should follow a hierarchical pattern, i.e. a parent-child relationship must be
present.

The data in the hierarchical pattern must be accessed through a single path only.

Advantages

A few advantages are listed below.

Data can be retrieved easily due to the explicit links present between the table
structures.

Referential integrity is always maintained, i.e. any changes made in the parent table
are automatically updated in the child table.

Promotes data sharing.

It is conceptually simple due to the parent-child relationship.

Database security is enforced.

Efficient with 1:N relationships.

A clear chain of command or authority.

High performance.

Disadvantages

Some of the disadvantages are given below.

If the parent table and child table are unrelated, then adding a new entry in the child
table is difficult because an additional entry must be added in the parent table.

Complex relationships are not supported.

Redundancy, which can result in inaccurate information.

A change in structure leads to changes in all application programs.

M:N relationships are not supported.

No data manipulation or data definition language.


Features

Some features are pointed out below:

Many-to-many relationships: It only supports one-to-many relationships.
Many-to-many relationships are not supported.

Problem in deletion: If a parent is deleted, then its children are automatically
deleted.

Hierarchy of data: Data is represented in a hierarchical tree-like structure.

Parent-child relationship: Each child can have only one parent, but a parent can
have more than one child.

Pointer: Pointers are used for linking records; they tell which record is the parent
and which is the child.

Disk input and output is minimized: Parent and child records are stored
close to each other on the storage device, which minimizes the hard disk input and
output.

Fast navigation: As parent and child are stored close to each other, access time
is reduced and navigation becomes faster.
Examples
Let us take an example of college students who take different courses. A course can
be assigned to only a single student, but a student can take as many courses as
they want, which gives a one-to-many relationship.
Now we can represent the above hierarchical model as relational tables as shown
below:

Student

Course
16. ASSESSMENT SCHEDULE

Assessment Tools Proposed Date

Assessment 1 20.9.2021

Assessment 2 22.10.2021

Model Exam 18.11.2021


17. PRESCRIBED TEXT BOOKS & REFERENCE BOOKS

TEXT BOOKS:

1. Elmasri R. and S. Navathe, "Fundamentals of Database Systems",
Pearson Education, 7th Edition, 2016.

2. Abraham Silberschatz, Henry F. Korth, "Database System Concepts",
Tata McGraw Hill, 7th Edition, 2021.

3. Elmasri R. and S. Navathe, "Database Systems: Models, Languages,
Design and Application Programming", Pearson Education, 2013.

REFERENCES:

1. Raghu Ramakrishnan, Gehrke, "Database Management Systems",
McGraw Hill, 3rd Edition, 2014.

2. Plunkett T., B. Macdonald, "Oracle Big Data Hand Book", McGraw Hill,
First Edition, 2013.

3. Gupta G. K., "Database Management Systems", Tata McGraw Hill
Education Private Limited, New Delhi, 2011.

4. C. J. Date, A. Kannan, S. Swamynathan, "An Introduction to Database
Systems", Eighth Edition, Pearson Education, 2015.

5. Maqsood Alam, Aalok Muley, Chaitanya Kadaru, Ashok Joshi, "Oracle
NoSQL Database: Real-Time Big Data Management for the Enterprise",
McGraw Hill Professional, 2013.

6. Thomas Connolly, Carolyn Begg, "Database Systems: A Practical
Approach to Design, Implementation and Management", Pearson, 6th
Edition, 2015.

18. MINI PROJECT SUGGESTIONS

Design a relational schema for each of the following and apply normalization. The
topics are not limited to those listed below; students can choose topics of their interest.

1) Blood bank management system
Hospitals register to request the blood they need, and donors sign up with
the blood bank to donate blood. Donors are available to donate in particular
areas according to their registered data. When a hospital requests blood, the blood bank
provides the details of donors near the hospital. The blood bank also shows the availability of
blood groups to the hospitals. Data about blood donated to the hospitals can also be maintained.

2) School management system
Staff details are stored in the system against a staff id and can be
retrieved at any time using that id. Student information and student marks are also stored in the
system. Salary management for the staff members of the school can also be done in this
system, and student fees can be maintained. Another feature holds section information
and the class teacher of each section.

3) Payroll management system
Create a system in which the admin is the manager. The manager logs in with his id,
adds all the details about the employees, and can add any new employees who
join the organization. Add a feature to calculate the salaries of the employees based on
their designation and attendance. Add a feature to display the details of all the employees in the
organization, along with the salaries calculated for the current month.

4) Railway system
Users can book train tickets to reach their destination. The booking option includes the
present station, the destination station and the train they want to travel on. Allow the
user to check the details of a train by using the train id; the system must also show the
train arrival time, the platform at which the train arrives and the departure time of the train. Also
add an option that allows the user to book a meal while traveling on the train, and
an option that shows the price range of the different booking classes such as AC,
second class, sleeper, and others. Try to think of further options to add yourself.

5) Hospital Data Management
Assign unique IDs to the patients and store the relevant information under them. Add the
patient's name, personal details, contact number, disease name, and the treatment the patient is
going through. Mention the hospital department the patient is under (such as cardiac, gastro,
etc.). Add information about the hospital's doctors. A doctor can treat multiple patients, and
he/she has a unique ID as well. Doctors are also classified into different
departments. Add the information of ward boys and nurses working in the hospital and assigned
to different rooms. Patients are admitted into rooms, so add that information to your
database too.
Thank you
