This document is confidential and intended solely for the educational purpose of
RMK Group of Educational Institutions. If you have received this document
through email in error, please notify the system manager. This document
contains proprietary information and is intended only to the respective group /
learning community as intended. If you are not the addressee you should not
disseminate, distribute or copy through e-mail. Please notify the sender
immediately by e-mail if you have received this document by mistake and delete
this document from your system. If you are not the intended recipient you are
notified that disclosing, copying, distributing or taking any action in reliance on
the contents of this information is strictly prohibited.
20IT403
DATABASE MANAGEMENT SYSTEMS
Department: IT
Batch/Year:2020 – 2024/II
Created by:
Ms.R.ASHA
Assistant Professor
Date:09.03.2022
1.TABLE OF CONTENTS
1. Contents
2. Course Objectives
3. Pre Requisites
4. Syllabus
5. Course outcomes
7. Lecture Plan
9. Lecture Notes
10. Assignments
To learn how to efficiently design and implement various database objects and
entities
3. PRE REQUISITES
4. SYLLABUS
DATABASE MANAGEMENT SYSTEMS
Query Processing Overview – Algorithms for SELECT and JOIN operations – Query
optimization using Heuristics and Cost Estimation
CO2: Map ER model to Relational model to perform database design effectively. CO3:
CO6: Design and deploy an efficient and scalable data storage node for varied kind of
application requirements.
6. CO- PO/PSO MAPPING
PO1 PO2 PO3 PO4 PO5 PO6 PO7 PO8 PO9 PO10 PO11 PO12
CO1 2 1 1 1 1 1 1 2 2 2 2 2
CO2 3 2 2 1 1 1 1 2 2 2 2 2
CO3 2 1 1 1 1 1 1 2 2 2 2 2
CO4 2 1 1 1 1 1 1 2 2 2 2 2
CO5 2 1 1 1 1 1 1 2 2 2 2 2
CO6 2 1 1 1 1 1 1 2 2 2 2 2
Entity Types
Customer
Car
Insurance policy
Accident
Payment
Relation: Customer
Cust_Id cust_name cust_address cust_phone
Relation: Car
Relation: Accident
Relation: Payment
The following guidelines are often critical to the success of database design:
• Build a data dictionary to supplement the data model diagrams and the DBDL.
2. Constraints that can be directly expressed in the schemas of the data model,
typically by specifying them in the DDL. We call these schema-based constraints
or explicit constraints.
Constraints that cannot be directly expressed in the schemas of the data model,
and hence must be expressed and enforced by the application programs or in
some other way. We call these application-based or semantic constraints or
business rules.
Domain constraints specify that within each tuple, the value of each attribute A
must be an atomic value from the domain dom(A).
The data types associated with domains typically include standard numeric data
types for integers (such as short integer, integer, and long integer) and real
numbers (float and double-precision float). Characters, Booleans, fixed-length
strings, and variable-length strings are also available, as are date, time,
timestamp, and other special data types.
This means that no two tuples can have the same combination of values for all
their attributes.
Usually, there are other subsets of attributes of a relation schema R with the
property that no two tuples in any relation state r of R should have the same
combination of values for these attributes. Suppose that we denote one such
subset of attributes by SK; then for any two distinct tuples t1 and t2 in a relation
state r of R, we have the constraint that:
t1[SK] ≠ t2[SK]
Any such set of attributes SK is called a superkey of the relation schema R. A
superkey SK specifies a uniqueness constraint that no two distinct tuples in any
state r of R can have the same value for SK.
Every relation has at least one default superkey— the set of all its attributes. A
superkey can have redundant attributes, however, so a more useful concept is that
of a key, which has no redundancy.
Hence, a key is a superkey but not vice versa. A superkey may be a key (if it is
minimal) or may not be a key (if it is not minimal).
Consider the STUDENT relation. The attribute set {Ssn} is a key of STUDENT
because no two student tuples can have the same value for Ssn. Any set of
attributes that includes Ssn—for example, {Ssn, Name, Age}—is a superkey.
However, the superkey {Ssn, Name, Age} is not a key of STUDENT because
removing Name or Age or both from the set still leaves us with a superkey. In
general, any superkey formed from a single attribute is also a key.
A key with multiple attributes must require all its attributes together to have the
uniqueness property.
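The difference between a superkey and a key can be checked mechanically against one relation state. The following is a minimal sketch, assuming a relation is held as a list of dictionaries; the function names and the sample STUDENT tuples are hypothetical. A single state can only refute a key, because a key is a property of the schema and must hold in every legal state.

```python
def is_superkey(rows, attrs):
    """True if no two tuples in rows agree on every attribute in attrs."""
    seen = set()
    for row in rows:
        value = tuple(row[a] for a in attrs)
        if value in seen:
            return False
        seen.add(value)
    return True


def is_key(rows, attrs):
    """True if attrs is a superkey and dropping any one attribute breaks uniqueness."""
    attrs = list(attrs)
    if not is_superkey(rows, attrs):
        return False
    return all(
        not is_superkey(rows, [a for a in attrs if a != dropped])
        for dropped in attrs
    )


# Hypothetical STUDENT state (illustrative values only).
students = [
    {"Ssn": "111", "Name": "Ann", "Age": 19},
    {"Ssn": "222", "Name": "Bob", "Age": 19},
    {"Ssn": "333", "Name": "Ann", "Age": 19},
]

print(is_superkey(students, ["Ssn", "Name", "Age"]))  # True
print(is_key(students, ["Ssn", "Name", "Age"]))       # False: Name can be dropped
print(is_key(students, ["Ssn"]))                      # True
```

Checking only the subsets with one attribute removed is enough, because any superset of a superkey is again a superkey.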
In general, a relation schema may have more than one key. In this case, each of
the keys is called a candidate key.
It is common to designate one of the candidate keys as the primary key of the
relation.
When a relation schema has several candidate keys, the choice of one to become
the primary key is somewhat arbitrary; however, it is usually better to choose a
primary key with a single attribute or a small number of attributes.
The other candidate keys are designated as unique keys and are not underlined.
Another constraint on attributes specifies whether NULL values are or are not
permitted.
For example, if every STUDENT tuple must have a valid, non-NULL value for the
Name attribute, then Name of STUDENT is constrained to be NOT NULL.
Key constraints and entity integrity constraints are specified on individual relations.
Informally, the referential integrity constraint states that a tuple in one relation
that refers to another relation must refer to an existing tuple in that relation.
The attribute Dno of EMPLOYEE gives the department number for which each
employee works; hence, its value in every EMPLOYEE tuple must match the
Dnumber value of some tuple in the DEPARTMENT relation.
A set of attributes FK in relation schema R1 is a foreign key of R1 that references
relation R2 if it satisfies the following rules:
1. The attributes in FK have the same domain(s) as the primary key attributes PK of
R2; the attributes FK are said to reference or refer to the relation R2.
2. A value of FK in a tuple t1 of the current state r1(R1) either occurs as a value of PK
for some tuple t2 in the current state r2(R2) or is NULL.
In the former case, we have t1[FK] = t2[PK], and we say that the tuple t1 references
or refers to the tuple t2.
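The two foreign key rules can be illustrated with a small check over relation states. This is only a sketch: the function name and the sample EMPLOYEE and DEPARTMENT tuples are hypothetical, and a composite foreign key is treated as NULL if any of its attributes is None, which is a simplifying assumption.

```python
def dangling_references(referencing, fk, referenced, pk):
    """Return the tuples of `referencing` whose FK value matches no PK value."""
    pk_values = {tuple(row[a] for a in pk) for row in referenced}
    violations = []
    for row in referencing:
        value = tuple(row[a] for a in fk)
        if any(v is None for v in value):
            continue  # a NULL foreign key is permitted by rule 2
        if value not in pk_values:
            violations.append(row)
    return violations


# Hypothetical states (illustrative values only).
department = [{"Dnumber": 1}, {"Dnumber": 4}, {"Dnumber": 5}]
employee = [
    {"Ssn": "123", "Dno": 5},
    {"Ssn": "456", "Dno": 4},
    {"Ssn": "789", "Dno": None},  # not yet assigned to a department
    {"Ssn": "999", "Dno": 7},     # refers to a department that does not exist
]

bad = dangling_references(employee, ["Dno"], department, ["Dnumber"])
print([row["Ssn"] for row in bad])  # ['999']
```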
Examples of such constraints are the salary of an employee should not exceed the
salary of the employee‘s supervisor and the maximum number of hours an
employee can work on all projects per week is 56. Such constraints can be
specified and enforced within the application programs that update the database,
or by using a general-purpose constraint specification language. Mechanisms
called triggers and assertions can also be used to specify the constraints.
State constraints define the constraints that a valid state of the database must
satisfy.
select
project
union
set difference
cartesian product
rename
Here the select, project and rename operations are called unary operations,
because they operate on one relation. The other three operations operate on pairs
of relations and are, therefore called binary operations.
1) The select operation:
Comparisons are allowed in the predicate using the relational operators =, ≠, <, ≤, >, and ≥.
Several predicates can be combined into a larger predicate by using the connectives
and (∧), or (∨), and not (¬).
Question 1: - Select those tuples of the loan relation where the branch-name
is Perryridge.
The equivalent relational algebra query for the given question is,
σbranch-name = "Perryridge" (loan)
Question 2:- Find all the tuples in which the amount lent is more than 1200
in loan relation.
Relational algebra query:
The equivalent relational algebra query for the given question is,
σamount >1200 (loan)
The result of the query is given below.
loan-number branch-name Amount
L-14 Downtown 1500
L-15 Perryridge 1500
L-16 Perryridge 1300
L-23 RedWood 3000
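The select operation can be sketched as a filter over a list of tuples. In this sketch the `select` helper is hypothetical, and the loan relation combines the four result tuples shown above with two assumed tuples (L-11 and L-17) that fall below the threshold.

```python
def select(relation, predicate):
    """sigma_predicate(relation): keep the tuples that satisfy the predicate."""
    return [row for row in relation if predicate(row)]


# The tuples for L-11 and L-17 are assumptions added for illustration.
loan = [
    {"loan-number": "L-11", "branch-name": "Round Hill", "amount": 900},
    {"loan-number": "L-14", "branch-name": "Downtown", "amount": 1500},
    {"loan-number": "L-15", "branch-name": "Perryridge", "amount": 1500},
    {"loan-number": "L-16", "branch-name": "Perryridge", "amount": 1300},
    {"loan-number": "L-17", "branch-name": "Downtown", "amount": 1000},
    {"loan-number": "L-23", "branch-name": "RedWood", "amount": 3000},
]

# amount > 1200
large = select(loan, lambda t: t["amount"] > 1200)
print([t["loan-number"] for t in large])  # ['L-14', 'L-15', 'L-16', 'L-23']

# Predicates combined with "and"
both = select(loan, lambda t: t["branch-name"] == "Perryridge" and t["amount"] > 1200)
print([t["loan-number"] for t in both])   # ['L-15', 'L-16']
```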
Question 3: - Find those tuples pertaining to loans of more than 1200 made by the
Perryridge branch.
σbranch-name = "Perryridge" ∧ amount > 1200 (loan)
Question 1:- List all the loan-numbers and amount of the loan.
The equivalent relational algebra query for the given question is,
Пloan-number, amount(loan)
Jones
The relations r and s must be of the same arity. That is they must have the
same number of attributes.
The domains of the ith attribute of r and ith attribute of s must be the same
for all i.
Question 1:- Find the names of all customers who have either an
account or a loan or both?
The equivalent relational algebra query for the given question is,
Пcustomer-name(depositor) ∪ Пcustomer-name(borrower)
Question 1:- Find all the customers of the bank who have an account but not a
loan?
The equivalent relational algebra query for the given question is,
Пcustomer-name(depositor) – Пcustomer-name(borrower)
Result:
5) The Cartesian product operation:
The Cartesian product operation, denoted by (X), allows combining
information from any two relations.
If r1 contains n1 tuples and r2 contains n2 tuples, then there are n1*n2 ways of
choosing a pair of tuples-one tuple from each relation.
Example:
Question 1:- Find the names of all customers who have a loan at the perryridge branch.
This question needs the information in both the loan relation and the borrower relation.
The above relation contains only tuples for the Perryridge branch. However, the
customer-name column may still contain customers who do not have a loan at
the Perryridge branch, because the Cartesian product pairs every borrower tuple
with every loan tuple. Therefore, to obtain the correct result, the query has
to be written as below.
Пcustomer-name(σborrower.loan-number = loan.loan-number(σbranch-name = "Perryridge"
(borrower X loan)))
Final result:
Customer-name
Jackson
ρx(A1,A2,…,An)(E)
Returns the result of expression E under the name x, and with the attributes
renamed to A1, A2,….,An.
Examples:
Computation:
This query requires to (1) compute first a temporary relation consisting of those
balances that are not the largest and (2) take the set difference between the
relation П balance (account) and the temporary relation just computed, to obtain the
result.
Step 1:
To compute the temporary relation, we need to compare the values of all
account balances. This comparison is done by computing the Cartesian product
account X account and forming a selection to compare the value of any two
balances appearing in one tuple. The rename operation is used to rename one
reference to the account relation.
The expression for the temporary relation that consists of the balances that are not
the largest is
Пaccount.balance(σaccount.balance<d.balance(account X ρd(account)))
This expression gives those balances in the account relation for which a larger
balance appears somewhere in the account relation renamed as d. The result
contains all balances except the largest one as shown next.
500
400
400
Step 2:
The query to find the largest account balance in the bank can be written as:
Пbalance(account) –
Пaccount.balance(σaccount.balance<d.balance
(account X ρd(account)))
balance
900
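The two steps can be traced with Python sets. This is a sketch over a hypothetical account relation (the account numbers and most balances are assumptions); `itertools.product` plays the role of account X ρd(account).

```python
from itertools import product

# Hypothetical account relation (illustrative values only).
account = [
    {"account-number": "A-101", "balance": 500},
    {"account-number": "A-102", "balance": 400},
    {"account-number": "A-201", "balance": 900},
    {"account-number": "A-217", "balance": 700},
]

# Step 1: balances for which a larger balance exists in the renamed copy d.
not_largest = {
    a["balance"]
    for a, d in product(account, account)
    if a["balance"] < d["balance"]
}

# Step 2: set difference between all balances and the temporary relation.
all_balances = {a["balance"] for a in account}
largest = all_balances - not_largest

print(sorted(not_largest))  # [400, 500, 700]
print(largest)              # {900}
```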
Question 2:- Find the names of all customers who live on the same street
and in the same city as smith.
Computation:
The smith‘s street and city can be obtained by,
Пcustomer-street, customer-city (σcustomer-name = "Smith"(customer))
In order to find other customers with this street and city, the customer relation must be
referred second time. The rename operation is used for this purpose. The resulting
Additional Operations:
Division operation
Assignment operation
r ∩ s = r – (r – s)
Question 1: Find all customers who have both loan and an account?
The equivalent relational algebra query for the given question is,
Пcustomer-name(borrower) ∩ Пcustomer-name(depositor)
The natural join operation forms a cartesian product of its two arguments,
performs a selection forcing equality on those attributes that appear in both
relation schemas and finally removes duplicate attributes.
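The three steps of the natural join (product, equality selection on the common attributes, removal of duplicate attributes) can be sketched as follows. The helper name and the sample borrower and loan tuples are hypothetical; relations are assumed to be non-empty lists of dictionaries with a uniform schema.

```python
def natural_join(r, s):
    """Join r and s on every attribute name they share."""
    if not r or not s:
        return []
    common = [a for a in r[0] if a in s[0]]
    result = []
    for t1 in r:
        for t2 in s:  # Cartesian product
            if all(t1[a] == t2[a] for a in common):  # equality selection
                merged = {**t1, **t2}  # shared attributes appear only once
                if merged not in result:
                    result.append(merged)
    return result


# Hypothetical relation states (illustrative values only).
borrower = [
    {"customer-name": "Jones", "loan-number": "L-17"},
    {"customer-name": "Smith", "loan-number": "L-23"},
    {"customer-name": "Hayes", "loan-number": "L-15"},
]
loan = [
    {"loan-number": "L-15", "branch-name": "Perryridge", "amount": 1500},
    {"loan-number": "L-16", "branch-name": "Perryridge", "amount": 1300},
    {"loan-number": "L-17", "branch-name": "Downtown", "amount": 1000},
]

joined = natural_join(borrower, loan)
print(sorted(t["customer-name"] for t in joined))  # ['Hayes', 'Jones']
```

Smith does not appear in the result because no loan tuple carries loan-number L-23 in this sample state.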
Example 1:Consider the loan and borrower relation. Refer Cartesian product
operation for the relations.
Question 1:- Find the names of all customers who have a loan at the bank,
and find the amount of the loan?
Result:
Customer-name Loan-number Amount
customer-name acc-no
Hayes A-102
John A-101
John A-201
Jones A-217
Question 2:- Find the names of all branches with customers who have an
account in the bank and who live in Harrison?
Relational algebra Expression:
Пbranch-name(σcustomer-city = "Harrison"(customer ⋈ account ⋈
depositor))
Result:
Branch-name
Brighton
Perryridge
r1 = Пbranch-name(σbranch-city = "Brooklyn"(branch))
The (customer-name, branch-name) pairs of all customers who have an account at a branch
can be found by the expression,
r2 = Пcustomer-name, branch-name (depositor ⋈ account)
customer-name branch-name
Hayes Perryridge
John Downtown
John Brighton
Jones Brighton
To find customers who appear in r2 with every branch name in r1, the divide operation is
used as given below.
Пcustomer-name, branch-name (depositor ⋈ account) ÷ Пbranch-name(σbranch-city = "Brooklyn"(branch))
Result:
Customer-name
John
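The division can be sketched with Python sets. The r2 pairs below are the ones listed above; the contents of r1 (Downtown and Brighton as the Brooklyn branches) are an assumption chosen to be consistent with the result shown, and the `divide` helper is hypothetical.

```python
def divide(r, s):
    """r ÷ s, where r is a set of (x, y) pairs and s is a set of y values."""
    candidates = {x for x, _ in r}
    return {x for x in candidates if all((x, y) in r for y in s)}


r2 = {
    ("Hayes", "Perryridge"),
    ("John", "Downtown"),
    ("John", "Brighton"),
    ("Jones", "Brighton"),
}
r1 = {"Downtown", "Brighton"}  # assumed set of Brooklyn branches

print(divide(r2, r1))  # {'John'}
```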
The Assignment Operation:
temp1 r
temp2 s
Generalized Projection:
The generalized projection operation extends the projection operation by
allowing arithmetic functions to be used in the projection list. The generalized
projection operation has the form
ПF1,F2,….,Fn(E)
Question:
The credit-info relation lists the credit limit and expenses so far done. To find how
much more each person can spend, the following expression is written:
Result:
6. Aggregate Functions:
Aggregate functions take a collection of values and return a single value as a result
Question:
To find out the total sum of salaries of all part-time employees in the bank, the
following relational algebra expression is used.
𝒢sum(salary)(pt-works)
7. Groups:
The result can be grouped based on some attribute. For example to partition the
relation pt-works into groups based on the branch, and to apply aggregation on
each group, the query is written as below.
branch-name 𝒢 sum(salary)(pt-works)
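Both forms of the aggregate can be sketched in a few lines. The pt-works tuples and the `group_sum` helper are hypothetical and serve only to show the partitioning step.

```python
from collections import defaultdict


def group_sum(relation, group_attr, sum_attr):
    """Partition the relation on group_attr and sum sum_attr within each group."""
    totals = defaultdict(int)
    for row in relation:
        totals[row[group_attr]] += row[sum_attr]
    return dict(totals)


# Hypothetical pt-works state (illustrative values only).
pt_works = [
    {"employee-name": "Adams", "branch-name": "Perryridge", "salary": 1500},
    {"employee-name": "Brown", "branch-name": "Perryridge", "salary": 1300},
    {"employee-name": "Johnson", "branch-name": "Downtown", "salary": 1500},
    {"employee-name": "Rao", "branch-name": "Austin", "salary": 1600},
]

# Aggregate without grouping: one value for the whole relation.
print(sum(t["salary"] for t in pt_works))  # 5900

# Aggregate with grouping on branch-name: one value per branch.
print(group_sum(pt_works, "branch-name", "salary"))
# {'Perryridge': 2800, 'Downtown': 1500, 'Austin': 1600}
```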
(i) Left outer Join (ii) Right outer Join (iii) Full outer Join
(i) The left outer join (⟕): This takes all tuples in the left relation that did not match
with any tuple in the right relation, pads the tuples with null values for all other
attributes from the right relation, and adds them to the result of the natural join.
The result of employee ⟕ ft-works is given below.
(ii) The right outer join (⟖): It is symmetric with the left outer join. It pads tuples
from the right relation that did not match any from the left relation with nulls and
adds them to the result of the natural join. The result of employee ⟖ ft-works is
given below.
(iii) The full outer join (⟗): It does both of the above operations, padding tuples from
the left relation that did not match any from the right relation, as well as tuples
from the right relation that did not match any from the left relation, and adding
them to the result of the join. The relation below shows the result of employee ⟗
ft-works.
Relational Calculus
A relational calculus expression creates a new relation, which is specified in
terms of variables that range over rows of the stored database relations (in tuple
calculus) or over columns of the stored relations (in domain calculus).
Each tuple variable usually ranges over a particular database relation, meaning
that the variable may take as its value any individual tuple from that relation.
{t | COND(t)}
where t is a tuple variable and COND (t) is a conditional expression involving
t.
The result of such a query is the set of all tuples t that satisfy COND (t).
For example, to find all employees whose salary is above $50,000, we can write
the following tuple calculus expression:
{t | EMPLOYEE(t) AND t.Salary>50000}
The condition EMPLOYEE(t) specifies that the range relation of tuple variable t
is EMPLOYEE.
Each EMPLOYEE tuple t that satisfies the condition t.Salary>50000 will be
retrieved. Notice that t.Salary references attribute Salary of tuple variable t.
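A tuple calculus expression of the form {t | COND(t)} reads much like a comprehension in a programming language. The sketch below uses a hypothetical EMPLOYEE state; iterating over the list plays the role of the range condition EMPLOYEE(t), and the `if` clause plays the role of the remaining condition.

```python
# Hypothetical EMPLOYEE state (illustrative values only).
EMPLOYEE = [
    {"Fname": "John", "Lname": "Smith", "Salary": 30000},
    {"Fname": "Franklin", "Lname": "Wong", "Salary": 40000},
    {"Fname": "James", "Lname": "Borg", "Salary": 55000},
]

# Whole tuples t with EMPLOYEE(t) AND t.Salary > 50000
high_paid = [t for t in EMPLOYEE if t["Salary"] > 50000]

# Only some attributes of each qualifying tuple
names = [(t["Fname"], t["Lname"]) for t in EMPLOYEE if t["Salary"] > 50000]

print(names)  # [('James', 'Borg')]
```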
To retrieve only some of the attributes, say the first and last names, we write
{t.Fname, t.Lname | EMPLOYEE(t) AND t.Salary>50000}
For each tuple variable t, the range relation R of t. This value is specified by a
condition of the form R(t).
In tuple relational calculus, we first specify the requested attributes t.Bdate and
t.Address for each selected tuple t. Then we specify the condition for selecting a
tuple following the bar (|)—namely, that t be a tuple of the EMPLOYEE relation
whose Fname, Minit, and Lname attribute values are 'John', 'B', and 'Smith',
respectively.
{t1.Aj, t2.Ak, ... , tn.Am | COND(t1, t2, ..., tn, tn+1, tn+2, ..., tn+m)}
where t1, t2, … , tn, tn+1, … , tn+m are tuple variables, each Ai is an attribute of the
relation on which ti ranges, and COND is a condition or formula of the tuple
relational calculus.
A formula is made up of predicate calculus atoms, which can be one of the following:
1. An atom of the form R(ti), where R is a relation name and ti is a tuple variable.
This atom identifies the range of the tuple variable ti as the relation whose name is R.
2. An atom of the form ti.A op tj.B, where op is one of the comparison operators in
the set {=, <, ≤, >, ≥, ≠}, ti and tj are tuple variables, A is an attribute of the
relation on which ti ranges, and B is an attribute of the relation on which tj ranges.
3. An atom of the form ti.A op c or c op tj.B, where op is one of the comparison
operators in the set {=, <, ≤, >, ≥, ≠}, ti and tj are tuple variables, A is an attribute
of the relation on which ti ranges, B is an attribute of the relation on which tj ranges,
and c is a constant value.
Each of the preceding atoms evaluates to either TRUE or FALSE for a specific
combination of tuples; this is called the truth value of an atom.
In general, a tuple variable t ranges over all possible tuples in the universe. For
atoms of the form R(t), if t is assigned to a tuple that is a member of the specified
relation R, the atom is TRUE; otherwise, it is FALSE.
In atoms of types 2 and 3, if the tuple variables are assigned to tuples such that
the values of the specified attributes of the tuples satisfy the condition, then the
atom is TRUE.
Notice that in a formula of the form F = (F1 AND F2) or F = (F1 OR F2), a tuple
variable may be free in F1 and bound in F2, or vice versa; in this case, one
occurrence of the tuple variable is bound and the other is free in F.
Query 1. List the name and address of all employees who work for the 'Research'
department.
Query 3. List the names of employees who work on all the projects controlled by
department number 5. One way to specify this query is to use the universal
quantifier as shown:
x.Pnumber=w.Pno))))}
Query 5. List the names of managers who have at least one dependent.
This query is handled by interpreting managers who have at least one dependent
as managers for whom there exists some dependent.
{x1, x2, ..., xn | COND(x1, x2, ..., xn, xn+1, xn+2, ..., xn+m)} where x1, x2, … , xn,
xn+1, xn+2, … , xn+m are domain variables that range over domains (of
attributes), and COND is a condition or formula of the domain relational calculus.
A formula is made up of atoms. The atoms of a formula are slightly different from
those for the tuple calculus and can be one of the following:
1. An atom of the form R(x1, x2, … , xj), where R is the name of a relation of degree
j and each xi, 1 ≤ i ≤ j, is a domain variable. This atom states that a list of values of
<x1, x2, … , xj> must be a tuple in the relation whose name is R, where xi is the
value of the ith attribute value of the tuple. To make a domain calculus expression
more concise, we can drop the commas in a list of variables; thus, we can write:
{x1, x2, ..., xn | R(x1 x2 x3) AND ...} instead of: {x1, x2, ... , xn | R(x1, x2, x3) AND
...}
2. An atom of the form xi op xj, where op is one of the comparison operators in the
set {=, <, ≤, >, ≥, ≠}, and xi and xj are domain variables.
B. Smith'.
Q0: {u, v | (∃q) (∃r) (∃s) (∃t) (∃w) (∃x) (∃y) (∃z) (EMPLOYEE(qrstuvwxyz) AND
q='John' AND r='B' AND s='Smith')}
Query 1. Retrieve the name and address of all employees who work for the
'Research' department.
Example:
Relation Instance:
X -> Y holds if whenever two tuples have the same value for X, they must have
the same value for Y.
For any two tuples t1 and t2 in any relation instance r(R): If t1[X]=t2[X], then
t1[Y]=t2[Y]
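The tuple-based definition translates directly into a check over one relation instance. This is a sketch with a hypothetical helper and sample tuples; note that an instance can only refute a functional dependency, since the dependency is a constraint on all legal instances.

```python
def fd_holds(rows, X, Y):
    """True if every pair of tuples that agrees on X also agrees on Y."""
    seen = {}
    for row in rows:
        x_value = tuple(row[a] for a in X)
        y_value = tuple(row[a] for a in Y)
        # Remember the first Y value seen for each X value and compare later ones.
        if seen.setdefault(x_value, y_value) != y_value:
            return False
    return True


# Hypothetical instance (illustrative values only).
works_on = [
    {"SSN": 1, "ENAME": "Smith", "PNUMBER": 1},
    {"SSN": 1, "ENAME": "Smith", "PNUMBER": 2},
    {"SSN": 2, "ENAME": "Wong", "PNUMBER": 2},
]

print(fd_holds(works_on, ["SSN"], ["ENAME"]))      # True
print(fd_holds(works_on, ["PNUMBER"], ["ENAME"]))  # False
```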
Examples of FD constraints
Social security number determines employee name
SSN -> ENAME
Closure of a set F of FDs is the set F+ of all FDs that can be inferred from F.
{supplier-no, part-no}->{city}
{supplier-no, part-no}->{qty}
As another example, consider the relation R with attributes A,B and C, such that
the FDs A->B and B->C both hold for R. Then it is easy to see that the FD A->C
also holds for R. The FD A->C is an example of a transitive FD i.e. C is said to
depend on A transitively via B. The set of all FDs that are implied by a given set S
of FDs is called the closure of S, written S+
Self-determination: A->A
Let R be the relation with attributes A,B,C,D,E,F and the FDs are:
We now show that the FD AD->F holds for R and is thus a member of the closure of
A->BC (given)
CD->EF (given)
Closure of a set of attributes X with respect to F is the set X+ of all attributes that are functionally determined by X.
X+ can be calculated by repeatedly applying IR1, IR2, IR3 using the FDs in F
algorithm that says "Repeatedly apply the rules from the previous section until
Let R be the relation, Z be the set of all attributes of R and S be the set of
FDs that hold for R. From this we can determine the set of all attributes of R that
then CLOSURE[Z,S] = CLOSURE[Z,S] U Y;
end
if CLOSURE[Z,S] did not change
on this iteration then leave the loop;
end;
Example:
Suppose we are given a relation R with attributes A,B,C,D,E,F and FDs are:
A->BC
E->CF
B->E
CD->EF
We now compute the closure {A,B}+ of the set of attributes {A,B} under this set of
FDs.
On the second iteration (for the FD E->CF), we find that the left side is not a subset
of the result, which thus remains unchanged.
On the third iteration (for the FD B->E), we add E to CLOSURE[Z,S], which now has
the value {A,B,C,E}.
Now we go round the inner loop four times again. CLOSURE[Z,S] does not change,
and so the whole process terminates with {A,B}+ = {A,B,C,E,F}.
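The closure computation traced above can be written as a short function. This is a sketch, assuming single-letter attribute names so that a string such as "BC" can stand for the set {B, C}; the function name and the representation of FDs as (left side, right side) pairs are choices made for this illustration.

```python
def closure(attrs, fds):
    """Closure of the attribute set `attrs` under the FDs in `fds`."""
    result = set(attrs)
    changed = True
    while changed:  # outer loop: repeat until a full pass adds nothing
        changed = False
        for lhs, rhs in fds:  # inner loop: one pass over every FD
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


# The FDs of the example: A->BC, E->CF, B->E, CD->EF
fds = [("A", "BC"), ("E", "CF"), ("B", "E"), ("CD", "EF")]

print(sorted(closure("AB", fds)))  # ['A', 'B', 'C', 'E', 'F']
```

F is added on the second pass through the FDs, when E->CF becomes applicable; a third pass then changes nothing and the loop stops.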
Thus if Z is a set of attributes of relation R and S is a set of FDs that hold for R, then
the set of FDs that hold for R with Z as the left side is the set consisting of all FDs of
the form Z->Z‘, where Z‘ is some subset of the closure Z+ of Z under S. The closure
S+ of the original set S of FDs is then the union of all such sets of FDs, taken over
all possible attribute sets Z.
Definition (Covers):
This means that if the DBMS enforces the FDs in S2, then it will
automatically be enforcing the FDs in S1.
If S2 is a cover of S1 and S1 is a cover for S2, i.e. if S1+ = S2+,
we say that S1 and S2 are equivalent. In this case if the DBMS enforces the FDs in
S2 it will automatically be enforcing the FDs in S1 and vice versa.
Example:
Consider the relation PARTS for which the following FDs hold:
PART-NO->PART-NAME
PART-NO->COLOUR
PART-NO->WEIGHT
PART-NO->CITY
PART-NO->WEIGHT
PART-NO->CITY
PART-NO-> PART-NAME
PART-NO->WEIGHT
PART-NO->CITY
PART-NO-> PART-NO The first FD can be discarded without changing the closure.
PART-NO-> PART-NAME
PART-NO->COLOUR
PART-NO->WEIGHT
PART-NO->CITY
So, for every set of FDs there exists at least one equivalent set
that is irreducible.
Example:
A->BC
B->C
A->B
AB->C
AC->D
We now compute the irreducible set of FDs that is equivalent to the given set:
The first step is to rewrite the FDs such that each has a singleton right side:
A->B
A->C
B->C
A->B
AB->C
AC->D
Next, attribute C can be eliminated from the left side of the FD AC->D, because we
have A->C which can be written as
AC->D (given)
A->C (given)
A->B
B->C
A->D
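The reduction steps can be sketched as a small program: split right sides into singletons, drop left-side attributes that are not needed, then drop FDs that follow from the rest. This is one possible ordering of the steps, not the only one, and an irreducible set is in general not unique. Single-letter attribute names and the function names are assumptions of the sketch.

```python
def closure(attrs, fds):
    """Closure of `attrs` under `fds`, a list of (left side, right side) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def irreducible_cover(fds):
    # Step 1: rewrite every FD with a singleton right side, dropping duplicates.
    work = []
    for lhs, rhs in fds:
        for attr in rhs:
            fd = (frozenset(lhs), attr)
            if fd not in work:
                work.append(fd)
    # Step 2: remove left-side attributes that are not needed.
    reduced = []
    for lhs, attr in work:
        for candidate in sorted(lhs):
            smaller = lhs - {candidate}
            if smaller and attr in closure(smaller, work):
                lhs = smaller
        if (lhs, attr) not in reduced:
            reduced.append((lhs, attr))
    # Step 3: remove FDs that are implied by the remaining ones.
    result = list(reduced)
    for fd in reduced:
        rest = [other for other in result if other != fd]
        if fd[1] in closure(fd[0], rest):
            result = rest
    return result


given = [("A", "BC"), ("B", "C"), ("A", "B"), ("AB", "C"), ("AC", "D")]
cover = irreducible_cover(given)
print(["".join(sorted(lhs)) + "->" + rhs for lhs, rhs in cover])
# ['A->B', 'B->C', 'A->D']
```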
NOTE:
The irreducible sets can also be represented by the terms minimal sets,
minimal cover and canonical cover.
Insertion Anomalies:
Insertion anomalies can be differentiated into two types, illustrated by the following examples
based on the EMP_DEPT relation:
To insert a new employee tuple into EMP_DEPT, we must include either the attribute values for
the department that the employee works for, or NULLs (if the employee does not work for
a department as yet). For example, to insert a new tuple for an employee who works in
department number 5, we must enter all the attribute values
of department 5 correctly so that they are consistent with the corresponding values for
department 5 in other tuples in EMP_DEPT.
It is difficult to insert a new department that has no employees as yet in the EMP_DEPT
relation. The only way to do this is to place NULL values in the attributes for employee.
This violates the entity integrity for EMP_DEPT because SSN is its primary key.
Deletion Anomalies:
If we delete from EMP_DEPT an employee tuple that happens to represent the last employee
working for a particular department, the information concerning that department is lost
from the database.
For example, if we delete the details of James E. Borg, who works for Headquarters, then the
details of that department are lost.
Modification Anomalies:
In EMP_DEPT, if we change the value of one of the attributes of a particular department, say
the manager of department 5, we must update the tuples of all employees who work in
that department; otherwise, the database will become inconsistent. If we fail to update
some tuples, the same department will be shown to have two different values for manager
in different employee tuples, which would be wrong.
NORMALIZATION
A superkey of a relation schema R = {A1, A2, ...., An} is a set of attributes S ⊆ R
with the property that no two tuples t1 and t2 in any legal relation state r of R will have
t1[S] = t2[S]
A key K is a superkey with the additional property that removal of any attribute from K will
cause K not to be a superkey any more.
First normal form, second normal form, third normal form and BCNF are under the
category of single valued normalization
Definition:
First normal form (1NF) disallows multi-valued attributes, composite attributes,
and their combinations. It states that the domain of an attribute must include only
atomic (simple, indivisible) values.
Description:
Consider the relation department in figure 15.9 (a). We assume that each
department can have a number of locations. This is not in 1NF because Dlocations
is not an atomic attribute, as illustrated by the first tuple in Figure 15.9(b).
There are three main techniques to achieve first normal form for such a relation:
Expand the key so that there will be a separate tuple in the original DEPARTMENT
relation for each location of a DEPARTMENT, as shown in Figure 15.9(c). In this
case, the primary key becomes the combination {Dnumber, Dlocation}. This
solution has the disadvantage of introducing redundancy in the relation.
Remove the attribute Dlocations that violates 1NF and place it in a separate
relation DEPT_LOCATIONS along with the primary key Dnumber of DEPARTMENT.
The primary key of this relation is the combination {Dnumber, Dlocation}, as shown
in Figure 15.10 below. A distinct tuple in DEPT_LOCATIONS exists for each
location of a department. This decomposes the non-1NF relation into two 1NF
relations.
Definition.
A relation schema R is in 2NF if it is in 1NF and satisfies full functional dependency, i.e.,
every nonprime attribute A in R is fully functionally dependent on the primary key of R and not on a part of
it.
The EMP_PROJ relation in Figure 15.10 (a) is in 1NF but is not in 2NF. The nonprime
attribute Ename violates 2NF because of FD2, as do the nonprime attributes Pname and
Plocation because of FD3. The functional dependencies FD2 and FD3 make Ename,
Pname, and Plocation partially dependent on the primary key {Ssn, Pnumber} of
EMP_PROJ, thus violating the 2NF.
Therefore, the functional dependencies FD1, FD2, and FD3 in Figure 15.10(a) lead to the
decomposition of EMP_PROJ into the three relation schemas EP1, EP2, and EP3 shown in
Figure 15.10(a), each of which is in 2NF.
Third Normal Form:
Definition: A relation schema R is in 3NF if it satisfies 2NF and no nonprime
attribute of R is transitively dependent on the primary key.
We can normalize EMP_DEPT by decomposing it into the two 3NF relation schemas
ED1 and ED2 shown in Figure 15.10(b). ED1 and ED2 represent independent entity
facts about employees and departments. A NATURAL JOIN operation on ED1 and
ED2 will recover the original relation EMP_DEPT.
NOTE:
In X -> Y and Y -> Z, with X as the primary key, we consider this a problem only if Y
is not a candidate key.
Example:
For each subject, each student of that subject is taught only by one teacher.
teacher->subject
S Student
J Subject
T Teacher
Insertion anomaly: if a new faculty member (say Faculty5) joins and no subject is assigned,
the faculty member cannot be inserted because a prime attribute cannot be null.
Deletion anomaly: if the student with id 789 is deleted, then the information about Faculty4 is also deleted.
This difficulty is caused by the fact that the attribute teacher is a determinant but
not a candidate key. Whenever a non-prime attribute determines one or more
prime attributes, the relation violates BCNF. The teacher (non-prime attribute)
The solution to the problem is to split or decompose the original relation by two
BCNF projections as below:
ST{student, teacher}
TJ{Teacher, subject}
Student Teacher
123 Faculty1
123 Faculty2
456 Faculty3
789 Faculty4
999 Faculty1
Teacher subject
Faculty1 Physics
Faculty2 Music
Faculty3 Biology
Faculty4 Physics
The anomalies can be overcome by decomposing the relation SJT into the two
relations ST and TJ.
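The effect of the decomposition can be checked on the tuples listed above. In this sketch ST and TJ are held as Python sets of tuples; joining them on the teacher attribute rebuilds the (student, subject, teacher) combinations, and deleting a student from ST no longer removes the teacher's subject.

```python
ST = {
    (123, "Faculty1"),
    (123, "Faculty2"),
    (456, "Faculty3"),
    (789, "Faculty4"),
    (999, "Faculty1"),
}
TJ = {
    ("Faculty1", "Physics"),
    ("Faculty2", "Music"),
    ("Faculty3", "Biology"),
    ("Faculty4", "Physics"),
}

# Natural join of ST and TJ on teacher gives (student, subject, teacher) tuples.
SJT = {(s, j, t) for s, t in ST for t2, j in TJ if t == t2}
print(len(SJT))  # 5

# teacher -> subject is now enforced by teacher being the key of TJ.
teachers = [t for t, _ in TJ]
print(len(teachers) == len(set(teachers)))  # True

# Deleting student 789 from ST leaves the fact (Faculty4, Physics) in TJ.
ST_after = {(s, t) for s, t in ST if s != 789}
print(("Faculty4", "Physics") in TJ)  # True
```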
MULTIVALUED DEPENDENCIES
Multivalued Dependency (MVD) represents a dependency between attributes
(for ex X,Y and Z) in a relation, such that for each value of X there is a set of
values for Y and a set of values for Z. However, the set of values for Y and Z
are independent of each other.
MVD is represented as
A tuple in this Emp relation represents the fact that an employee whose name is Ename
works on the project whose name is Pname and has a dependent whose name is Dname.
An employee may work on several projects and may have several dependents
and the employee‘s projects and the dependents are independent of one another. To keep
the relation state consistent we must have a separate tuple to represent every
combination of an employee‘s dependent and an employee‘s project.
Ename-->>Pname|Dname.
The Emp with Ename Smith works on projects with Pname X and Y and has 2
dependents with Dname 'John' and 'Anna'. If we stored only the first two tuples in Emp (<
'Smith', 'X', 'John'> and <'Smith', 'Y', 'Anna'>), we would incorrectly show
associations between project X and John and between project Y and Anna. These should not be
conveyed, because no such meaning is intended in this relation.
Hence we must store the other 2 tuples (<'Smith', 'X', 'Anna'> and <
'Smith', 'Y', 'John'>) to show that {X, Y} and {John, Anna} are associated only
with Smith, i.e., there is no association between Pname and Dname, which means that the
two attributes are independent.
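The requirement that every combination be stored can be tested on a relation instance. The sketch below uses the usual tuple-based test for an MVD X -->> Y: for any two tuples that agree on X, the tuple that takes its Y values from the first and its remaining values from the second must also be present. The helper name is hypothetical.

```python
def mvd_holds(rows, X, Y):
    """Check X -->> Y on one instance; Z is every attribute not in X or Y."""
    attrs = list(rows[0].keys())
    Z = [a for a in attrs if a not in X and a not in Y]
    present = {tuple(r[a] for a in X + Y + Z) for r in rows}
    for t1 in rows:
        for t2 in rows:
            if all(t1[a] == t2[a] for a in X):
                # X and Y values from t1 combined with the Z values from t2
                needed = tuple(t1[a] for a in X + Y) + tuple(t2[a] for a in Z)
                if needed not in present:
                    return False
    return True


emp = [
    {"Ename": "Smith", "Pname": "X", "Dname": "John"},
    {"Ename": "Smith", "Pname": "Y", "Dname": "Anna"},
    {"Ename": "Smith", "Pname": "X", "Dname": "Anna"},
    {"Ename": "Smith", "Pname": "Y", "Dname": "John"},
]

print(mvd_holds(emp, ["Ename"], ["Pname"]))      # True: all four combinations stored
print(mvd_holds(emp[:2], ["Ename"], ["Pname"]))  # False: only the first two tuples
```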
Y is a subset of X or
XUY=R
An MVD that does not satisfy the above condition is non trivial.
FOURTH NORMAL FORM:
A relation schema R is in 4NF with respect to a set of dependencies F if,
for every nontrivial MVD X-->>Y in F+, X is a
superkey of R.
The Emp relation in the example is not in 4NF because in the nontrivial
MVDs Ename-->>Pname and Ename-->>Dname, Ename is not a superkey
of Emp.
Emp_proj Emp_dep
ename pname ename dname
Join dependency:
Join dependencies constrain the set of legal relations over a schema R to those
relations for which a given decomposition is a lossless-join decomposition.
for every nontrivial join dependency JD(R1, R2, ..., Rn) in F+ (that is, implied by F),
every Ri is a superkey of R.
Supply:
The natural join of all three produces the state of the original
relation.
R1                      R2                      R3
Sname     Partname      Sname     Projname      Partname  Projname
Smith     Bolt          Smith     Proj x        Bolt      Proj x
Smith     Nut           Smith     Proj y        Nut       Proj y
Adamsky   Bolt          Adamsky   Proj y        Bolt      Proj y
Walton    Nut           Walton    Proj z        Nut       Proj z
Adamsky   Nail          Adamsky   Proj x        Nail      Proj x
The natural join of any two of these relations produces spurious tuples, but
applying natural join to all three together does not.
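The behaviour of the joins can be reproduced from the R1, R2 and R3 tuples above. This sketch compares the two-way join R1 ⋈ R2 with the three-way join; the extra tuples in the two-way join are the spurious ones.

```python
R1 = {("Smith", "Bolt"), ("Smith", "Nut"), ("Adamsky", "Bolt"),
      ("Walton", "Nut"), ("Adamsky", "Nail")}
R2 = {("Smith", "Proj x"), ("Smith", "Proj y"), ("Adamsky", "Proj y"),
      ("Walton", "Proj z"), ("Adamsky", "Proj x")}
R3 = {("Bolt", "Proj x"), ("Nut", "Proj y"), ("Bolt", "Proj y"),
      ("Nut", "Proj z"), ("Nail", "Proj x")}

# R1 join R2 on Sname gives (Sname, Partname, Projname) tuples.
join12 = {(s, p, j) for s, p in R1 for s2, j in R2 if s == s2}

# Joining with R3 keeps only tuples whose (Partname, Projname) pair is in R3.
join123 = {(s, p, j) for s, p, j in join12 if (p, j) in R3}

print(len(join12))               # 9
print(len(join123))              # 7
print(sorted(join12 - join123))  # the two spurious tuples
# [('Adamsky', 'Nail', 'Proj y'), ('Smith', 'Nut', 'Proj x')]
```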
R1 R2 (R1 R2) R3
S5 30 S5 Delhi
b) SST STC
In case a, no information is lost; the SST and SC values still tell us that supplier S3 has
status 30 and city Chennai, and supplier S5 has status 30 and city Delhi. Thus, the first
decomposition is nonloss (lossless).
Case a is lossless because joining SST and SC gives back the original relation
Suppliers. Case b is lossy because the join of SST and STC does not give
back the original relation Suppliers: both suppliers have status 30, so the join
cannot tell which supplier is located in which city.
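The two cases can be compared with a short sketch, assuming Suppliers holds the two tuples described in the text, stored as (supplier, status, city); the variable names are illustrative.

```python
# Suppliers as described in the text: (supplier, status, city).
suppliers = {("S3", 30, "Chennai"), ("S5", 30, "Delhi")}

# Projections used in the two decompositions.
sst = {(s, st) for s, st, c in suppliers}   # (supplier, status)
sc = {(s, c) for s, st, c in suppliers}     # (supplier, city)
stc = {(st, c) for s, st, c in suppliers}   # (status, city)

# Case a: join SST and SC on supplier.
join_a = {(s, st, c) for (s, st) in sst for (s2, c) in sc if s == s2}

# Case b: join SST and STC on status.
join_b = {(s, st, c) for (s, st) in sst for (st2, c) in stc if st == st2}

lossless_a = join_a == suppliers
lossless_b = join_b == suppliers
```

In case b the join also produces (S3, 30, Delhi) and (S5, 30, Chennai), so the original relation can no longer be recovered.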
Universal Relation Schema: A relation schema R = {A1, A2, …, An} that includes all
the attributes of the database.
Decomposition:
The process of decomposing the universal relation schema R, by using the
functional dependencies, into a set of relation schemas D = {R1, R2, …, Rm} that
will become the relational database schema.
Each attribute in R must appear in at least one relation schema Ri in the decomposition so
that no attributes are "lost".
CO2, PO1,PO2,PO3
11. Part A Question & Answer
4 What are normal forms? Explain the types of normal form with CO2 K1
an example.
5 What are the pitfalls in relational database design? With a CO2 K1
suitable example, explain the role of functional dependency in
the process of normalization.
6 Consider the relation R(A,B,C,D,E) with functional CO2 K1
dependencies {A→BC, CD→E, B→D, E→A}. Identify the super
keys. Find Fc and F+.
Explain Boyce-Codd normal form with an example. Also state
how it differs from 3NF.
Finance Applications
15. CONTENT BEYOND SYLLABUS
In the hierarchical database model, each child node must have exactly one parent, but a
parent node can have more than one child. Multiple parents are not allowed. This is the major difference
between the hierarchical and network database models. The first node of the tree is
called the root node. When data needs to be retrieved, the whole tree is
traversed starting from the root node. This model represents one-to-many
relationships.
Let us see one example. Assume that we have a main directory which
contains other subdirectories. Each subdirectory contains more files and directories.
Each directory or file can be in one directory only, i.e., it has only one parent.
Here A is the main directory, i.e., the root node. B1 and B2 are its children, or
subdirectories. B1 and B2 each have two children: C1, C2 and C2, C3 respectively.
These may be directories or files. This depicts one-to-many relationships.
Data in a hierarchical structure must be accessed through a single path only.
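The directory example can be sketched as a tree in which every node stores a single parent. This is an illustrative sketch only: the Node class and the find function are hypothetical names, and the two C2 entries are assumed to be distinct nodes that happen to share a name, as can occur in a directory tree.

```python
class Node:
    """A node with exactly one parent (None for the root)."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)

    def path(self):
        # Follow the single parent link up to the root.
        parts = []
        node = self
        while node is not None:
            parts.append(node.name)
            node = node.parent
        return "/".join(reversed(parts))


def find(root, name):
    """Traverse the tree from the root and collect nodes with this name."""
    found = []
    stack = [root]
    while stack:
        node = stack.pop()
        if node.name == name:
            found.append(node)
        # Reverse so that children are visited left to right.
        stack.extend(reversed(node.children))
    return found


a = Node("A")
b1 = Node("B1", a)
b2 = Node("B2", a)
c1 = Node("C1", b1)
c2_under_b1 = Node("C2", b1)
c2_under_b2 = Node("C2", b2)
c3 = Node("C3", b2)

paths = [n.path() for n in find(a, "C2")]
```

Each node has only one path back to the root, which is the single access path described above; a search has to start at the root and walk the tree.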
Advantages
Referential integrity is always maintained, i.e., any changes made in the parent table
are automatically propagated to the child table.
High performance.
Disadvantages
If the parent table and child table are unrelated, adding a new entry in the child
table is difficult because an additional entry must be added in the parent table.
Pointers: Pointers are used to link records; they indicate which record is the parent and
which is the child.
Disk input and output is minimized: Parent and child records are stored
close to each other on the storage device, which minimizes hard disk input and
output.
Fast navigation: Because parent and child are stored close to each other, access time
is reduced and navigation becomes faster.
Examples
Let us take an example of college students who take different courses. A course can
be assigned to only a single student, but a student can take as many courses as
they want; this is therefore a one-to-many relationship.
Now we can represent the above hierarchical model as relational tables as shown
below:
Student
Course
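One way the Student and Course tables could look in a relational system is sketched below with Python's built-in sqlite3 module. The column names and sample rows are hypothetical, since the original tables are not reproduced here; the foreign key from course to student plays the role of the parent link.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE student (
    student_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL
);
CREATE TABLE course (
    course_id  INTEGER PRIMARY KEY,
    title      TEXT NOT NULL,
    -- each course row points to exactly one parent student
    student_id INTEGER NOT NULL REFERENCES student(student_id)
);
""")

# Hypothetical sample data: one student (parent) with two courses (children).
conn.execute("INSERT INTO student VALUES (1, 'Student A')")
conn.execute("INSERT INTO course VALUES (10, 'Course X', 1)")
conn.execute("INSERT INTO course VALUES (11, 'Course Y', 1)")

courses = [
    row[0]
    for row in conn.execute(
        "SELECT title FROM course WHERE student_id = 1 ORDER BY course_id"
    )
]

# A child row without an existing parent is rejected.
try:
    conn.execute("INSERT INTO course VALUES (12, 'Course Z', 99)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
```

The rejected insert mirrors the disadvantage noted earlier: a child record cannot be added unless its parent record already exists.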
16. ASSESSMENT SCHEDULE
Assessment 1 20.9.2021
Assessment 2 22.10.2021
Design a relational schema for each of the following and apply normalization. The
topics are not limited to those listed below; students can choose topics of their interest.
1) Blood bank management system
Hospitals register in order to request the blood they need, and donors sign up with
the blood bank to donate blood. Donors are available to donate in particular
areas according to their registered data. A hospital requests blood, and the blood bank
provides the details of donors near the hospital. The blood bank also shows the availability of
blood groups to the hospitals. Data on the blood donated to the hospitals can also be maintained.
4) Railway system
Users can book train tickets to reach their destination. The booking option includes details such as the
present station, the destination station and the train they want to travel in. Let the
user check the details of a train by using the train ID; this must show the
train's arrival time, the platform at which the train arrives and its departure time. Also
add an option that allows the user to book a meal while traveling on the train, and
an option that shows the price range of the different booking classes, such as AC,
second class, sleeper and others. Try to think of further options to add yourself.
5) Hospital Data Management
Assign unique IDs to the patients and store the relevant information under them. Add the
patient's name, personal details, contact number, disease name and the treatment the patient is
undergoing. Record which hospital department the patient is under (such as cardiac, gastro,
etc.). Add information about the hospital's doctors. A doctor can treat multiple patients and
has a unique ID as well. Doctors are also classified into different
departments. Add the information of ward boys and nurses working in the hospital and assigned
to different rooms. Patients are admitted into rooms, so add that information to your
database too.