
www.jntuworld.com
s.sanyasirao1@gmail.com www.jwjobs.net

Lecture Notes for DBMS and Data Mining and Data Warehousing

Lecture 1.
What do you mean by Data and Database?

Data can be divided into three categories.

Raw data: a value such as “85” has no meaning when it stands alone. It acquires meaning
only with context, for example once you know it is the weight of a man in kilograms.

Related raw data is a group (data set or data file) of organized raw data that can be tied
together. For example, it could be a group of names, weights, blood groups and identification
numbers, all tied to the identity cards issued to patients at a hospital.

Cleaned raw data is all of the above after it has been validated or otherwise processed.
Such a process might ensure, for example, that a blood group never has a value such as “red”
or “black”; the only allowed values would be of the kind A+, A-, B+, B- etc.

Data can be acquired from many different sources. It must always be evaluated to decide which
category it belongs to, and whether it needs additional validation before the analysis that
turns it into information.
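
The cleaning step described above can be sketched in a few lines of Python. This is only an illustration: the field names (`name`, `weight_kg`, `blood_group`), the sample patients and the exact list of allowed blood groups are assumptions made for the example, not part of these notes.

```python
# Hypothetical list of allowed values, assumed for this sketch.
ALLOWED_BLOOD_GROUPS = {"A+", "A-", "B+", "B-", "AB+", "AB-", "O+", "O-"}


def clean_records(raw_records):
    """Split related raw data into validated ("cleaned") and rejected records."""
    cleaned, rejected = [], []
    for record in raw_records:
        valid_group = record.get("blood_group") in ALLOWED_BLOOD_GROUPS
        valid_weight = record.get("weight_kg", 0) > 0
        if valid_group and valid_weight:
            cleaned.append(record)
        else:
            rejected.append(record)
    return cleaned, rejected


# Related raw data: values tied together per patient (made-up sample data).
raw = [
    {"name": "Patient 1", "weight_kg": 85, "blood_group": "A+"},
    {"name": "Patient 2", "weight_kg": 70, "blood_group": "red"},
]
cleaned, rejected = clean_records(raw)
```

Here the second record would be rejected because “red” is not an allowed blood group.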

Database:

A database consists of an organized collection of interrelated data for one or more uses,
typically in digital form.

Examples of databases include the database of an educational institute, a bank, a library
or a railway reservation system.

What Is a DBMS?

• It consists of two things: a database and a set of programs.
• The database is a very large, integrated collection of data.
• The set of programs is used to access and process the database.
• So a DBMS can be defined as the software package designed to store and manage or
  process the database.
• Management of data involves
  o Definition of structures for the storage of information
  o Methods to manipulate information
  o Safety of the stored information despite system crashes
• A database models a real-world enterprise through entities and relationships.
  o Entities (e.g., students, courses, class, subject)

Department of Electrical and Electronics By: Sulabh Bansal

  o Relationships (e.g., Arjun studies in Class EEE VII)


File System

[Figure: Set of programs -> File System]

• Data is stored in different files in the form of records.
• Programs are written from time to time, as requirements arise, to manipulate the data
  within the files, for example:
  o A program to debit and credit an account
  o A program to find the balance of an account
  o A program to generate monthly statements

Disadvantages of File System Compared to DBMS

The major disadvantages of a file system when compared to a database management
system are as follows:
• Data Redundancy: Files are created in the file system as and when required by an
  enterprise along its growth path, so repetition of information about an entity cannot
  be avoided.
  For example: The addresses of customers will be present in the file maintaining
  information about customers holding savings accounts, and also in the file maintaining
  the current accounts. When the same customer has both a savings account and a current
  account, his address will be present in two places.
• Data Inconsistency: Data redundancy leads to a greater problem than just wasted
  storage: it may lead to inconsistent data. The same data repeated in several
  places may no longer match after it has been updated in only some of them.
  For example: Suppose a customer requests a change of address for his account in
  the bank, and the program is executed to update the savings account file only, but his
  current account file is not updated. Afterwards the addresses of the same customer
  in the savings account file and the current account file will not match.
  Moreover, there will be no way to find out which of the two addresses is the latest.
• Difficulty in Accessing Data: For ad hoc reports no program will already exist,
  and the only options are to write a new program to generate the requested
  report or to work manually. Either option takes an impractical amount of time and is
  expensive.

  For example: Suppose that all of a sudden the administrator gets a request to generate a
  list of all customers holding a savings account who live in a particular locality of
  the city. The administrator will not have a program already written to generate that
  list, but say he has a program that can generate a list of all customers holding a
  savings account. He can then either go through that list manually to select the
  customers living in the particular locality, or write a new program to generate the new
  list. Both ways take a long time, which would generally be impractical.
• Data Isolation: Since the data files are created at different times and presumably by
  different people, the structures of different files generally will not match. The data
  for a particular entity will be scattered across different files, so it will be difficult
  to obtain the appropriate data.
  For example: Suppose the address in the savings account file has the fields Add line1,
  Add line2, City, State, Pin, while the address in the current account file has the fields
  House No., Street No., Locality, City, State, Pin. The administrator is asked to provide
  the list of customers living in a particular locality. Providing a consolidated list of
  all the customers requires looking in both files, but they store the address in
  different ways, so writing a program to generate such a list will be difficult.
• Integrity Problems: All consistency constraints have to be applied to the data through
  appropriate checks coded in the programs. This is very difficult when the number of such
  constraints is very large.
  For example: An account should not have a balance of less than Rs. 500. To enforce this
  constraint, an appropriate check must be added to the program that adds a record and to
  the program that withdraws from an account. If this limit is later increased, all those
  checks must be updated to avoid inconsistency. Such changes to the programs from time
  to time are a great headache for the administrator.
• Security and Access Control: The database should be protected from unauthorized users,
  and not every user should be allowed to access all the data. Since application programs
  are added to the system in an ad hoc manner, enforcing such security constraints is
  difficult.
  For example: The payroll personnel in a bank should not be allowed to access the
  account information of the customers.
• Concurrency Problems: When more than one user is allowed to process the database
  and two or more users try to update a shared data element at about the same time,
  the result may be inconsistent data.
  For example: Suppose the balance of an account is Rs. 500, and users A and B try to
  withdraw Rs. 100 and Rs. 50 respectively at almost the same time using the Update
  process.
  Update:
  1. Read the balance amount.
  2. Subtract the withdrawn amount from the balance.
  3. Write the updated balance value.
  Suppose A performs steps 1 and 2 on the balance, i.e. he reads 500 and subtracts 100
  from it. At the same time B, withdrawing Rs. 50, runs the Update process: he also
  reads the balance as 500, subtracts 50 and writes back 450. User A then writes his
  updated balance as 400. The two writes may happen in either order

  depending on the systems being used by the two users, so the final balance will be
  either 400 or 450. Both values are wrong, since the correct balance after both
  withdrawals is 500 - 100 - 50 = Rs. 350, and the balance is left permanently inconsistent.
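
The concurrency example above can be replayed step by step. The sketch below is a deterministic simulation (no real threads or files are involved, and the function names are made up for illustration): both users read the old balance before either one writes.

```python
def interleaved_withdrawals(balance, amount_a, amount_b, a_writes_last=True):
    """Replay the Update steps of users A and B with overlapping reads."""
    read_a = balance            # A, step 1: read the balance
    read_b = balance            # B, step 1: reads the same old balance
    new_a = read_a - amount_a   # A, step 2: subtract
    new_b = read_b - amount_b   # B, step 2: subtract
    # Step 3: both write; the later write silently overwrites the earlier one.
    writes = [new_b, new_a] if a_writes_last else [new_a, new_b]
    for value in writes:
        balance = value
    return balance


def serial_withdrawals(balance, amount_a, amount_b):
    """Run the two Update processes one after the other."""
    balance = balance - amount_a
    balance = balance - amount_b
    return balance


lost_update_a_last = interleaved_withdrawals(500, 100, 50, a_writes_last=True)
lost_update_b_last = interleaved_withdrawals(500, 100, 50, a_writes_last=False)
correct_balance = serial_withdrawals(500, 100, 50)
```

With the figures from the example, the interleaved runs end at 400 or 450, while the serial run ends at 350.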

Lecture 2
Why Use a DBMS?
• Data independence and efficient access.
• Reduced application development time.
• Data integrity and security.
• Uniform data administration.
• Concurrent access, recovery from crashes.
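
As an illustration of the data integrity point, the minimum-balance rule of Rs. 500 from Lecture 1 can be declared once in the database instead of being re-coded in every program. The sketch below uses Python's built-in `sqlite3` module as a stand-in DBMS; the table name `account` and the helper `withdraw` are assumptions made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The constraint is declared once, in the schema, not in each program.
conn.execute(
    "CREATE TABLE account ("
    " acc_no INTEGER PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 500))"
)
conn.execute("INSERT INTO account VALUES (1, 800)")
conn.commit()


def withdraw(conn, acc_no, amount):
    """Return True if the withdrawal succeeds, False if the DBMS rejects it."""
    try:
        with conn:  # commits on success, rolls back on an error
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE acc_no = ?",
                (amount, acc_no),
            )
        return True
    except sqlite3.IntegrityError:
        return False


first_ok = withdraw(conn, 1, 200)    # 800 -> 600 satisfies the constraint
second_ok = withdraw(conn, 1, 200)   # 600 -> 400 would violate it
final_balance = conn.execute(
    "SELECT balance FROM account WHERE acc_no = 1"
).fetchone()[0]
```

If the limit changes later, only the table definition has to change, not every program that touches the balance.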

Role of DBMS:

An earlier information system, built directly on the file system, works as follows:

[Figure: Users -> Set of programs -> File System -> Disk]

A DBMS is an additional layer of software placed between the file system and the
set of application programs. The role of the DBMS can be described at a very high
level by the following diagram.

[Figure: Role of DBMS. Layers from top to bottom: Application Programs, DBMS, File System, Disk]

Instances and Schema:

Database Schema: The overall design of the database. An analogy from programming languages
is the declaration of variables with their data types. In a relational database
management system, the definitions of the tables and of their fields with data types form
the database schema.

Database Instance: The collection of information stored in the database at a particular
moment is called the database instance. An analogy from programming languages is the values
stored in the variables during the execution of a program. In a relational database
management system, the data stored in the various tables at a particular time is the
instance of the database.
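
The distinction can be seen in a small sketch using Python's built-in `sqlite3` module. The table `student` and its columns are hypothetical; `sqlite_master` is the catalog table in which SQLite keeps the text of each table definition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Schema: the table definition (names and data types).
conn.execute("CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT)")


def current_schema(conn):
    """Return the stored definition of the student table."""
    return conn.execute(
        "SELECT sql FROM sqlite_master WHERE name = 'student'"
    ).fetchone()[0]


def current_instance(conn):
    """Return the rows stored in the student table right now."""
    return conn.execute(
        "SELECT roll_no, name FROM student ORDER BY roll_no"
    ).fetchall()


schema_before = current_schema(conn)
instance_before = current_instance(conn)   # no rows yet

conn.execute("INSERT INTO student VALUES (1, 'Arjun')")

schema_after = current_schema(conn)        # unchanged by the insert
instance_after = current_instance(conn)    # now holds one row
```

Inserting a row changes the instance but leaves the schema as it was.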

Data Abstraction: Three-Level Architecture of DBMS:


Since many database system users are not computer trained, developers hide the
complexity from users through several levels of abstraction, to simplify the users’
interaction with the system:
• Physical Level:
  o Lowest level of abstraction.
  o Describes how the data are actually stored.
  o Complex low-level data structures are defined by system programs and are
    generally hidden even from high-level computer programs.
  o In relational database management systems, the files and indexes used
    are described at the physical level of abstraction.
  o This is similar to the way a programming language hides the exact way of storing
    the values of variables, records or arrays. Describing exactly how a record or an
    array declared in, say, the C language is laid out in storage corresponds to the
    physical level of abstraction.
  o The physical schema is used at the physical level of abstraction.
• Logical Level:
  o Describes what data are stored in the database and what relationships exist among
    those data.
  o The entire database is represented by simple structures, which may be implemented by
    very complex structures at the physical level.
  o In relational database management systems, the tables and their fields are
    defined at the logical level of abstraction.
  o The programming-language analogy for the logical level is the
    definition of record structures or arrays in a programming language (say C).
  o The logical schema is used at the logical level of abstraction.
• View Level:
  o Describes only part of the entire database.
  o Many users will not be concerned with all the information stored in a database.
  o The system may provide several views for the several types of users of the database,
    each showing only the relevant part of the database.
  o The view schema is used at the view level of abstraction.
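
A minimal sketch of the view level, using Python's built-in `sqlite3` module: the base table belongs to the logical level, and a view exposes only the part of it that one group of users needs. The table `customer`, the view `customer_contact` and the sample rows are all made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Logical level: the full table definition.
conn.execute(
    "CREATE TABLE customer ("
    " cust_id INTEGER PRIMARY KEY, name TEXT, city TEXT, balance INTEGER)"
)
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?, ?)",
    [(1, "Abhishek", "Delhi", 1500), (2, "Lalit", "Agra", 900)],
)

# View level: a view for users who must not see the balance column.
conn.execute(
    "CREATE VIEW customer_contact AS SELECT cust_id, name, city FROM customer"
)

cursor = conn.execute("SELECT * FROM customer_contact ORDER BY cust_id")
view_columns = [column[0] for column in cursor.description]
view_rows = cursor.fetchall()
```

Users of `customer_contact` see the names and cities only; the `balance` column is not part of their view.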

[Figure: Three levels of data abstraction. View level (View 1, View 2, ..., View n) on top of the Logical Level, which is on top of the Physical Level]

Lecture 3

Data Independence:

Physical Data Independence:


• Allows changes to the physical schema without changes to the logical schema and without
  application programs having to be rewritten.
• Changes to the physical schema can include: using new storage devices, using different
  data structures, using different file organizations or storage structures, or changing
  file indexes. All these changes should be possible without changing the logical schema
  or rewriting application programs.
Logical Data Independence:
• Allows changes to the logical schema without application programs having to be rewritten.
• Changes such as the addition or deletion of entities, attributes or relationships are
  logical schema changes, and they should be possible without rewriting the
  existing application programs.
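
Both kinds of independence can be illustrated with a small sketch using Python's built-in `sqlite3` module. The function `titles` plays the role of an existing application program; the table, index name and added column are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE book (acc_no INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO book VALUES (?, ?)",
    [(312, "Database System Concepts"), (433, "Fundamental of Database Systems")],
)


def titles(conn):
    """The 'application program': it names only the columns it needs."""
    rows = conn.execute("SELECT title FROM book ORDER BY acc_no")
    return [row[0] for row in rows]


before = titles(conn)

# Physical schema change: add an index (a new storage structure).
conn.execute("CREATE INDEX idx_book_title ON book (title)")
after_physical_change = titles(conn)

# Logical schema change: add an attribute to the table.
conn.execute("ALTER TABLE book ADD COLUMN publisher TEXT")
after_logical_change = titles(conn)
```

The program returns the same result after both changes and does not need to be rewritten. A program that relied on `SELECT *` and on column positions would be more fragile, which is one reason this sketch names its columns.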

Major components of a DBMS:


1. Data Definition Language Interpreter/ Compiler
2. Data Manipulation Language Compiler
3. Query processor
4. Database Manager

Data Definition Language (DDL):

• This language provides a set of commands which can be used to define
  o what data is in the database
  o what the relationships between the various data elements are
  o what integrity constraints the various data items need to satisfy
  o etc.
• It is used to define the records or structures of the database.
• The DDL statements are compiled to form the Data Dictionary or Data Directory, which
  contains the metadata, i.e. data about data.
• The data dictionary is consulted by the DBMS before any operation on data.

Data Manipulation Language (DML):

• It is a language that enables users to access or manipulate data in the database.
• It consists of very high-level statements that are used to specify the operations to be
  performed on the database.

Query Language:

It is the portion of DML that is used to access or retrieve the information from the database.
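
The three roles (DDL, DML and the query portion of DML) can be seen side by side in a short sketch. It uses SQL through Python's built-in `sqlite3` module as one concrete example; the notes themselves do not prescribe a particular language, and the table name `library_user` is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: defines the structure and its integrity constraints.
conn.execute(
    "CREATE TABLE library_user ("
    " card_no INTEGER PRIMARY KEY, name TEXT NOT NULL, user_type TEXT)"
)

# DML: statements that manipulate the stored data.
conn.execute("INSERT INTO library_user VALUES (422, 'Abhishek', 'Student')")
conn.execute("INSERT INTO library_user VALUES (4322, 'Mr. Lalit', 'Faculty')")

# Query language: the part of DML that only retrieves information.
faculty = conn.execute(
    "SELECT name FROM library_user WHERE user_type = 'Faculty'"
).fetchall()

# The DDL result is recorded as metadata; in SQLite it is kept in the
# sqlite_master catalog table, which acts as a small data dictionary.
tables = [
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
]
```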

Database Manager:

• This is the software that takes care of the execution of all the statements specified in
  DDL or DML. It handles all the problems of a database and is responsible for
  providing all of the features described above, such as data consistency, non-redundant
  data, atomicity, concurrency control and easy access to data.
• It may be subdivided into two major components:
  o Transaction Manager
  o Storage Manager

A transaction is a collection of operations that performs a single logical function in a
database application. The transaction manager takes care of identifying transactions and
of their proper execution. It is responsible for providing features like atomicity and
concurrency control.

The storage manager is responsible for the interaction with the file system and provides
an appropriate physical level of data abstraction. It is responsible for giving users
easy access to the database.
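
Atomicity can be illustrated with a fund transfer, which is one logical function made of two updates. The sketch below uses Python's built-in `sqlite3` module; the table, the `transfer` helper and the amounts are assumptions made for the example, not part of these notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    " acc_no INTEGER PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 500), (2, 100)])
conn.commit()


def transfer(conn, source, target, amount):
    """Credit the target and debit the source as a single transaction."""
    try:
        with conn:  # commit if both updates succeed, roll back otherwise
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE acc_no = ?",
                (amount, target),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE acc_no = ?",
                (amount, source),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    return dict(conn.execute("SELECT acc_no, balance FROM account"))


ok = transfer(conn, 1, 2, 200)        # both updates are applied
after_ok = balances(conn)
failed = transfer(conn, 1, 2, 1000)   # the debit fails, so the credit is undone
after_failed = balances(conn)
```

In the failed transfer the credit has already been executed when the debit is rejected, yet the rollback leaves both balances as they were before the transaction began.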

The overall system structure of the database management system could be shown as below:

[Figure: Overall system structure of a DBMS. Components shown: Application Interface, Application Programs, Query, DB Scheme, DML Compiler, Query Processor, DDL Compiler, Object Code, Database Manager, File Manager, Data and Data Dictionary, grouped under the labels DBMS and DISK]

Types Of Users:

• DBA: The person who designs the database and writes the database schema in DDL based on
  the design.
• Sophisticated Users: People who know DML commands and operate on the database
  directly.
• Application Programmers: People who operate on the database through application
  programs, usually written in a high-level computer language like C, Java, VB etc.
• Naïve Users: People who run the application programs through APIs written
  specifically for their requirements. They are generally not familiar with computer
  technology, e.g. tellers, agents, registrars, librarians etc.

Lecture 4
Data Models
• A data model is a collection of concepts for describing data, data relationships, data
  semantics and consistency constraints.
• A schema is a description of a particular collection of data, using a given data model.
• The primary categories of data models are:
  o Object-based logical models
    ▪ Provide a very high-level design of the database
    ▪ Provide flexible structuring capabilities. The most popular ones are as
      follows:
      - The Entity-Relationship model
      - The object-oriented model
      - The semantic model
      - The functional model
  o Record-based logical models
    ▪ Provide a more implementation-based design
    ▪ Specify the overall logical structure of the database and provide a high-level
      description of the implementation
    ▪ The most popular ones are as follows:
      - Relational Model
      - Network Model
      - Hierarchical Model
  o Physical models
    ▪ Describe data at the lowest level
    ▪ Capture aspects of database-system implementation
    ▪ Widely known are the unifying model and the frame-memory model
Entity Relationship model (E-R Model):
• Identifies the basic elements, objects or entities which are core to the database.
• Consider for example a library database. The most basic entities of a library can be
  identified as books and users. There are other basic entities like suppliers, magazines,
  journals etc.
• When using the E-R model, we generally describe the database by diagrams called E-R
  diagrams.
• A sample E-R diagram for the simple library database mentioned above, having only the
  two entities Books and Users, can be built up as follows.
• All entities of type book are represented by an entity set (drawn as a rectangle):

  [Figure: rectangle labelled BOOK]

• Similarly, we represent all users by an entity set:

  [Figure: rectangle labelled USER]

• We identify the various attributes that describe the entities of an entity set. A book
  is described in a library by its Accession Number, Call Number, Title, Author, Publisher,
  Year of Publication etc. They are attached to the entity set as ellipses, as shown below:

  [Figure: entity set BOOK with attribute ellipses Acc. No., Call No., Title, Author,
  Publisher, Yr. of Pub.]

• Similarly, we attach the attributes which describe a user of the library to the
  respective entity set in the E-R diagram, as shown below:

  [Figure: entity set USER with attribute ellipses Card No., Name, User Type, User ID]

• Apart from entities, the E-R model describes the relationships between entities. They
  are in turn seen as relationship sets existing between entity sets. For example, a user
  can borrow a book from the library. All such relationships between any book of the
  library and any user are represented by a relationship set, which we can name the
  ‘Borrowed By’ relationship set. A ‘Borrowed By’ relationship can have its own attributes,
  which exist only when the relationship exists. For example, ‘Date of Issue’ exists only
  when a book has been borrowed by a particular user; it is an attribute of neither Book
  nor User. We represent relationships by diamonds in E-R diagrams, as shown below:

[Figure: E-R diagram. Entity set BOOK (Acc. No., Call No., Title, Author, Publisher, Yr. of Pub.) connected through the relationship diamond ‘Borr. By’ (Date of issue) to entity set USER (Card No., Name, User Type, User ID)]

Relational Model:

• Both the data and the relationships are represented by tables.
• So it is closer to an implementation of the E-R diagrams designed in the first phase of
  modeling.
• The entities BOOK and USER can be represented by respective tables in which the columns
  represent the attributes of the entity. Every column has a unique column name
  corresponding to its attribute name. Each row holds the values for one
  entity of the set.

Book Table:
Acc. No.  Call No.  Title                            Author                          Publisher       Yr. of Publication
312       245       Database System Concepts         Silberschatz, Korth, Sudarshan  McGrawHill      1997
433       23        Fundamental of Database Systems  Elmasri, Navathe                Addison Wesley  1999

User Table:
Card No.  Name       User Type  User ID
422       Abhishek   Student    0706412234
4322      Mr. Lalit  Faculty    23456789

• The relationships can also be represented by tables. They include only those attributes
  of the related entities which are sufficient to identify them uniquely, and possibly
  attributes which are specific to the relationship.

Borrowed By Table:
Book Acc. No. User Card No. Date of Issue
312 422 03/08/2010
433 4322 05/08/2010

• The above relationship table shows that user ‘Abhishek’ has borrowed ‘Database System
  Concepts’ by ‘Silberschatz, Korth, Sudarshan’ and user ‘Mr. Lalit’ has borrowed
  ‘Fundamental of Database Systems’ by ‘Elmasri, Navathe’ from the library on 03/08/2010
  and 05/08/2010 respectively.
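
The three tables above can be created and combined in a short sketch using Python's built-in `sqlite3` module. The SQL table and column names (`book`, `library_user`, `borrowed_by`, `acc_no` and so on) are naming assumptions for this example; the rows are the ones listed in the tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE book (
        acc_no INTEGER PRIMARY KEY, call_no INTEGER, title TEXT,
        author TEXT, publisher TEXT, pub_year INTEGER);
    CREATE TABLE library_user (
        card_no INTEGER PRIMARY KEY, name TEXT, user_type TEXT, user_id TEXT);
    -- The relationship table keeps only the identifying attributes of the
    -- related entities plus the attribute specific to the relationship.
    CREATE TABLE borrowed_by (
        book_acc_no INTEGER REFERENCES book(acc_no),
        user_card_no INTEGER REFERENCES library_user(card_no),
        date_of_issue TEXT);

    INSERT INTO book VALUES (312, 245, 'Database System Concepts',
        'Silberschatz, Korth, Sudarshan', 'McGrawHill', 1997);
    INSERT INTO book VALUES (433, 23, 'Fundamental of Database Systems',
        'Elmasri, Navathe', 'Addison Wesley', 1999);
    INSERT INTO library_user VALUES (422, 'Abhishek', 'Student', '0706412234');
    INSERT INTO library_user VALUES (4322, 'Mr. Lalit', 'Faculty', '23456789');
    INSERT INTO borrowed_by VALUES (312, 422, '03/08/2010');
    INSERT INTO borrowed_by VALUES (433, 4322, '05/08/2010');
    """
)

# Join the relationship table back to the two entity tables.
loans = conn.execute(
    """
    SELECT u.name, b.title, bb.date_of_issue
    FROM borrowed_by AS bb
    JOIN book AS b ON b.acc_no = bb.book_acc_no
    JOIN library_user AS u ON u.card_no = bb.user_card_no
    ORDER BY bb.book_acc_no
    """
).fetchall()
```

The join reproduces the reading of the relationship table given above: who borrowed which title, and on which date.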

Network Model:
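• Data in the Network Model are represented by collections of records.
• Relationships between data are represented by links or pointers.

[Figure: Network model. Book records Book1 (312, 245, Database System Concepts, …), Book2 (433, 23, Fundamental of Database Systems, …) and Book3 (434, 24, Fundamental of Database Systems, …), user record User1 (422, Abhishek, Student, …) and a second user record (4322, Mr. Lalit, Faculty, …). Each record ends in a link field marked ‘*’, and a separate list of ‘*’ pointers connects a user to the borrowed books]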
• The above diagram shows 5 data records, each having several data values for the
  corresponding attributes and an extra field, marked ‘*’, which is used for a link or
  pointer.
• Whenever a relationship exists between two data elements, it is shown explicitly by
  pointers. So the relationship of BOOK1 and BOOK2 with USER1 is shown by
  pointers in the BOOK1 and BOOK2 records. Similarly, USER1 is related to these books,
  which can be shown by a circular linked list containing only pointers. This list is
  pointed to by the link field of USER1 and contains pointers to all of
  the books borrowed by USER1; its last pointer points back to USER1.
• So the combination of data records and links can be used in any way to form a network
  of data, as per the convenience of designers and programmers.
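
A rough in-memory analogue of records and links, written in Python: object references stand in for the ‘*’ pointer fields. This is a simplification made for illustration. A plain Python list replaces the circular linked list of pointers, and the class and function names are made up.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class UserRecord:
    card_no: int
    name: str
    # Link field: pointers to all book records borrowed by this user.
    books: List["BookRecord"] = field(default_factory=list)


@dataclass(eq=False)
class BookRecord:
    acc_no: int
    title: str
    # Link field: pointer to the user record of the borrower, if any.
    borrower: Optional[UserRecord] = None


def lend(book, user):
    """Record the relationship by setting pointers in both records."""
    book.borrower = user
    user.books.append(book)


user1 = UserRecord(422, "Abhishek")
book1 = BookRecord(312, "Database System Concepts")
book2 = BookRecord(433, "Fundamental of Database Systems")
book3 = BookRecord(434, "Fundamental of Database Systems")

lend(book1, user1)
lend(book2, user1)

# Navigation follows pointers instead of matching values in tables.
borrowed_by_user1 = [book.acc_no for book in user1.books]
```

Note that both books point to the same `user1` object, so the user's data is stored only once.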

Hierarchical Model:
• This model is very similar to the Network Model in that it also uses records and links
  to represent data and relationships respectively.
• The difference is that the Network Model forms a network or graph of data and
  connections, while the Hierarchical Model forms only trees, which do not allow cycles.
• The data elements in the model have parent-child relationships: the nodes which
  point to other nodes are called parents, and the nodes pointed to by their
  parents are called children.
• A child node cannot be pointed to by two different parents.
• For example, if we put Books as parents and Users as children, then two books cannot
  point to the same user record. In that case we have to replicate the user record for
  each book:
[Figure: Hierarchical model. Parent records Book1 (312, 245, Database System Concepts, …), Book2 (433, 23, Fundamental of Database Systems, …) and a third book record (434, 24, Fundamental of Database Systems, …); child records User1 (422, Abhishek, Student, …), a replicated copy of User1, and User 2 (4322, Mr. Lalit, Faculty, …)]
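
The replication forced by the tree structure can be sketched in Python. Which book is borrowed by which user is an assumption made for this example, as are the dictionary layout and the function name.

```python
def build_hierarchy(loans):
    """Build a tree {book accession no.: [child user records]}.

    Every parent receives its own copy of the user record, because a
    child node may not be shared between two parents.
    """
    tree = {}
    for book_acc_no, user in loans:
        tree.setdefault(book_acc_no, []).append(dict(user))  # replicate
    return tree


user1 = {"card_no": 422, "name": "Abhishek", "user_type": "Student"}
user2 = {"card_no": 4322, "name": "Mr. Lalit", "user_type": "Faculty"}

# Assumed loans: user1 holds books 312 and 433, user2 holds book 434.
tree = build_hierarchy([(312, user1), (433, user1), (434, user2)])

copies_equal_at_first = tree[312][0] == tree[433][0]
copies_are_separate = tree[312][0] is not tree[433][0]

# Updating one copy does not touch the other copy of the same user.
tree[312][0]["user_type"] = "Alumnus"
copies_equal_after_update = tree[312][0] == tree[433][0]
```

The two copies of the same user start out equal but diverge after the update, which is the redundancy and inconsistency problem from Lecture 1 reappearing inside the data model.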

Lecture 5
Entity-Relationship Model

Various symbols used in the E-R diagrams are as follows:

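Symbol                                       Meaning
Rectangle                                    Entity Type
Double rectangle                             Weak Entity Type
Diamond                                      Relationship Type
Double diamond                               Identifying Relationship (for weak entity)
Ellipse                                      Attribute
Ellipse with underlined name                 Key Attribute
Double ellipse                               Multivalued Attribute
Ellipse with component ellipses attached     Composite Attribute
Dashed ellipse                               Derived Attribute
E1 - R = E2 (double line from R to E2)       Total Participation of E2 in R
E1 -1- R -N- E2                              One to Many relationship between entity sets E1 and E2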

Description of some symbols used in E-R diagrams:

• An Entity is a basic element of a system that is identified by a set of attributes and
  has independent existence, e.g. a student, a faculty member or a subject in a college.
  An Entity Type defines a set of entities that have the same attributes.
• A relationship type R among n entity types E1, E2, …, En defines a set of associations
  among entities of these types.
• Entity types which do not have key attributes of their own are called Weak Entity
  Types. For example, consider the entity type ‘Dependent’, related to an ‘Employee’, with
  the attributes ‘Dependent name’, ‘Birth Date’, ‘Sex’ and ‘Relationship’. Two dependents
  of distinct employees may have the same values for all these attributes, but they are
  still distinct entities because they are linked to different employees.
• Entities belonging to a weak entity type are identified by being related to specific
  entities from another entity type, in combination with some of their own attribute
  values. This other entity type is called the identifying owner, and the relationship type
  that relates the weak entity type to its owner is called the identifying relationship
  type. In the above example, ‘Dependent’ is a weak entity type whose owner is ‘Employee’,
  and the relationship between ‘Dependent’ and ‘Employee’ is an identifying relationship
  type.
• The attributes which uniquely identify an entity in an entity set are called key
  attributes. For example, the attribute ‘Roll no.’ of the entity type ‘Student’ is a key
  attribute.
• Some attributes may have many values at the same time for one entity. For
  example, the attribute ‘College Degrees’ lists the names of the degrees obtained by a
  person. It may have one value for one person but more than one value for others.
  Such attributes are called multi-valued attributes.
• Attributes which are formed by combining smaller subparts are called composite
  attributes. For example, an ‘Address’ attribute is formed from several subparts like
  ‘Street’, ‘City’, ‘State’ and ‘Pin code’.
• Attributes which need not be stored with the entities because they can be calculated
  from the values of other stored attributes are called derived attributes. For example,
  if we have an attribute ‘Birth Date’ storing the date of birth of a person, then the
  attribute ‘Age’ need not be stored, as it can be calculated whenever we access that
  entity. So ‘Age’ is a derived attribute.
• When all the entities of an entity set/entity type have to participate in a particular
  relationship set/type, it is called total participation. Weak entities have total
  participation in the identifying relationships with their identifying owners.
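
The attribute kinds described above map naturally onto a small Python record type. This is only an analogy in code: the class names, the sample person and the choice to pass the current date explicitly are assumptions made for the sketch.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List


@dataclass
class Address:
    """Composite attribute: built from smaller subparts."""
    street: str
    city: str
    state: str
    pin_code: str


@dataclass
class Person:
    roll_no: int                 # key attribute
    birth_date: date             # stored attribute
    address: Address             # composite attribute
    # Multi-valued attribute: zero, one or several values per entity.
    college_degrees: List[str] = field(default_factory=list)

    def age(self, today: date) -> int:
        """Derived attribute: computed from birth_date, never stored."""
        had_birthday = (today.month, today.day) >= (
            self.birth_date.month,
            self.birth_date.day,
        )
        return today.year - self.birth_date.year - (0 if had_birthday else 1)


person = Person(
    roll_no=1,
    birth_date=date(1990, 8, 15),
    address=Address("Street 1", "Agra", "UP", "282001"),
    college_degrees=["B.Tech", "M.Tech"],
)
age_before_birthday = person.age(date(2010, 8, 3))
age_on_birthday = person.age(date(2010, 8, 15))
```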

Degree of Relationship Set:


• The number of entity sets participating in a relationship is the degree of the relationship.
• Example of a unary relationship: only one entity set participates in the relationship.

  [Figure: entity set Employee connected twice to the relationship Manages, in the roles
  Manager and Subordinate]

• Example of a binary relationship: two entity sets participate in the relationship.

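  [Figure: entity sets Book and User connected through the relationship Borrowed By]

• Example of a ternary relationship: three entity types participate in the relationship.

  [Figure: entity sets Account, Branch and Customer connected through the relationship Owns]

• An n-ary relationship associates n entity sets.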

Mapping Constraints:
• The two most important types of mapping constraints are mapping cardinalities and
  existence dependencies.
• For a binary relationship set R between entity sets A and B, the mapping cardinality
  must be one of the following:
  o One to One: An entity in A is associated with at most one entity in B, and vice versa.
  o One to Many: An entity in A is associated with any number of entities in B, but an
    entity in B is associated with at most one entity in A.
  o Many to One: An entity in A is associated with at most one entity in B, while an entity
    in B is associated with any number of entities in A.
  o Many to Many: An entity in A is associated with any number of entities in B, and vice
    versa.

[Figure: mapping cardinality diagrams for One to One, One to Many, Many to One and Many to Many]

• If the existence of entity x depends on the existence of entity y, then x is said to be
  existence dependent on y. If y is deleted, so is x. y is the dominant entity and x is
  the subordinate entity. For example, a payment entity is dependent on the loan entity.
  Payments are identified by payment number, payment date and payment amount.
• Weak entities are existence dependent on their identifying owner, but not every
  existence-dependent entity is a weak entity. For instance, if in the above example of
  loan and payment entities the payment entity has a payment number that is a unique key,
  it is existence dependent on the loan but is not a weak entity.
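
The four mapping cardinalities defined earlier in this section can be checked against a set of associations with a short function. This is a sketch under a stated limitation: it reports the most restrictive cardinality that the given sample pairs are consistent with, whereas the real constraint is a design decision about all possible data. The function name and sample identifiers are made up.

```python
from collections import defaultdict


def mapping_cardinality(pairs):
    """Classify (a, b) associations between entity sets A and B."""
    b_per_a = defaultdict(set)
    a_per_b = defaultdict(set)
    for a, b in pairs:
        b_per_a[a].add(b)
        a_per_b[b].add(a)
    # Does some entity in A relate to several in B, and the other way round?
    a_has_many = any(len(bs) > 1 for bs in b_per_a.values())
    b_has_many = any(len(as_) > 1 for as_ in a_per_b.values())
    if a_has_many and b_has_many:
        return "many-to-many"
    if a_has_many:
        return "one-to-many"
    if b_has_many:
        return "many-to-one"
    return "one-to-one"


# A = books, B = users: several books may go to one user.
borrowed_by = [("book1", "user1"), ("book2", "user1"), ("book3", "user2")]
borrowed_kind = mapping_cardinality(borrowed_by)

# A = users, B = books: the same associations seen from the other side.
borrows = [(user, book) for book, user in borrowed_by]
borrows_kind = mapping_cardinality(borrows)
```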

Lecture 6
Keys:

• Super Key: A set of one or more attributes which, taken collectively, uniquely identifies
  an entity in the entity set. An entity set can have more than one super key.
• Candidate Key: A minimal super key is a candidate key, i.e. the super keys of the
  entity set which have no proper subset that is also a super key are called candidate
  keys. An entity set can have more than one candidate key.
• Primary Key: The candidate key which is used as the key by the database administrator
  while implementing the database is called the primary key of the entity set.
• Example: Consider an entity set named “Student” which has the following set of attributes:
  1. Student ID
  2. Roll Number
  3. Name
  4. Father’s Name
  5. DOB
  6. Address
• Each entity of the above entity set has values for all the fields, but no two
  entities, i.e. no two students, have the same values for all six attributes. So one
  super key is the set containing all of the attributes:
  Super Key1: (Student ID, Roll Number, Name, Father’s Name, DOB, Address)
  Also Super Key2: (Student ID, Roll Number, Name, Father’s Name, DOB) will not have the
  same values for any two students in the “Student” entity set. Similarly, other super keys
  of the above entity set are the following:
  Super Key3: (Student ID, Roll Number, Name, Father’s Name)
  Super Key4: (Student ID, Roll Number, Name)
  Super Key5: (Student ID, Roll Number)
  Super Key6: (Student ID)
  Roll Number is also unique for a student, so
  Super Key7: (Roll Number)
  Each of them is sufficient to identify a particular student entity in the entity set of
  all students.

  Super Key6 and Super Key7 are the candidate keys of this entity set, as no proper subset
  of either of them is a super key. They are minimal sets of attributes which are super keys.
  Candidate Key1: (Student ID)
  Candidate Key2: (Roll Number)

  Any one of the candidate keys can be used as the primary key. So
  Primary Key: Either (Student ID) or (Roll Number)
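
The definitions of super key and candidate key can be tried out on a few sample rows. The sketch below is illustrative only: it uses a reduced, made-up Student table with four attributes, and it tests uniqueness on sample data, whereas a real key is a constraint that must hold for every possible set of students.

```python
from itertools import combinations


def super_keys(rows, attributes):
    """Attribute sets whose combined values differ for every sample row."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for attrs in combinations(attributes, size):
            projected = {tuple(row[a] for a in attrs) for row in rows}
            if len(projected) == len(rows):
                keys.append(frozenset(attrs))
    return keys


def candidate_keys(keys):
    """Minimal super keys: no proper subset is itself a super key."""
    return [key for key in keys if not any(other < key for other in keys)]


ATTRIBUTES = ["student_id", "roll_no", "name", "dob"]
# Hypothetical rows; two students share the same name and date of birth.
students = [
    {"student_id": "S1", "roll_no": 101, "name": "Arjun", "dob": "2000-01-01"},
    {"student_id": "S2", "roll_no": 102, "name": "Arjun", "dob": "2000-01-01"},
    {"student_id": "S3", "roll_no": 103, "name": "Meera", "dob": "2001-05-05"},
]

all_super_keys = super_keys(students, ATTRIBUTES)
minimal_keys = candidate_keys(all_super_keys)
```

On this sample, every attribute set containing `student_id` or `roll_no` is a super key, and the two single-attribute sets are the candidate keys, matching the example above.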
• The primary key of a many-to-many relationship set is formed by including the
  primary keys of all participating entity sets.
• The primary key of a many-to-one or one-to-many relationship set is formed by
  including only the primary key of the entity set on the “many” side, i.e. the entity set
  from which many entities may be associated with one entity of the other
  participating entity set. E.g. in the case of the “Borrowed By”

relationship between entities “Book” and “User” which is a many-to-one relationship in


the sense many books may be borrowed by one user but one book cannot be borrowed by
many users. So for “Borrowed By” relationship the primary key will only contain the
Primary key of Entity Set “Book”.
? In case of One-to-One type the primary key of any one of the participating entity set is
used as primary key of the relationship.

Weak Entity Sets:


• Example: Consider an entity set “LOAN” having attributes (Loan Number, Loan
Amount, Customer ID) containing all the loans taken. After taking a loan, the customer
repays it in installments. Consider the entity set “PAYMENT” representing all the
payments made against all loans taken. The attributes of “PAYMENT” are (Payment
Number, Payment Date, Payment Amount). The payment number is the sequence number of a
payment made by the customer against a particular loan. The first payment made against
any loan has payment number 1, the second payment for any loan has
payment number 2, and so on.
We can see that there can be two entities in “PAYMENT” which have the same values for all
attributes and are still two different entities. E.g. suppose there are two loans
having the following values:
Loan 1: (Loan Number= 1, Loan Amount= Rs. 2000, Customer ID = A)
Loan 2: (Loan Number= 2, Loan Amount= Rs. 3000, Customer ID = B)
Now customer A makes his first payment of Rs 100 on 03/03/2010 and, by
chance, B also makes his first payment of the same amount on the same date. So the two payment
entities will have the following values:
Payment by A: (Payment Number=1, date= 03/03/2010, Payment amount=Rs 100)
Payment by B: (Payment Number=1, date= 03/03/2010, Payment amount=Rs 100)
This means that no set of attributes of “PAYMENT” is guaranteed to have unique values, so the
entity set “PAYMENT” does not have any key. The two payments above are actually
different payments because they are made against different loans, but within “PAYMENT”
nothing distinguishes them.
• Entity sets which do not have sufficient attributes to form a primary key are
called Weak Entity Sets.
A weak entity set must have a relationship which associates it with a
strong entity set so that its different entities can be identified. Each entity of
“PAYMENT” is linked to some loan, so the entity set “PAYMENT” is the dependent and
“LOAN” is the owner. The relationship “Loan-Payment” between “LOAN” and “PAYMENT”,
which associates loans with their payments, is required to identify each payment in
“PAYMENT”.
• A relationship which links the dependent entities of a weak entity set to their
owners in a strong entity set in order to identify the weak entities is called an identifying
relationship. It is shown as a doubly outlined diamond in ER diagrams.
Every entity of a weak entity set is also existence dependent on some entity of its owner set,
so every entity of a weak entity set must participate in the identifying relationship.
• Weak entity sets must have total participation in their identifying relationships.
This is shown by double lines.

• The following diagram represents the above described entity sets and relationships by an
E-R Diagram.
[E-R diagram: weak entity set Payment (Payment No., Date, Amount) connected through the
identifying relationship Loan-Payment to the entity set Loan (Loan No., Amount, Cust. Id.)]

• A weak entity set may also be modeled as a multivalued, composite attribute of the
owner entity set. Modeling it this way is appropriate when the
weak entity set participates only in the identifying relationship and has few attributes.
Otherwise modeling it as a weak entity set is more appropriate.
• A weak entity set may have several entities with the same values for all attributes,
provided they are related to different strong owner entities. But all the weak entities related
to a particular strong owner entity must be distinguishable. The set of attributes which
allows the weak entities related to a particular strong entity to be distinguished
is called the partial key or discriminator of the weak entity set.

Lecture 7
Extended E-R Features:
• Specialization: It is the process of identifying the subclasses of an entity set which
differ from the other entities of the set in their attributes or in the relationships they take part in.
Consider the design of a database for an academic institution. While designing we
identify an entity set “Employee” which represents all the employees of the institution. Its
attributes may be (Employee ID, Employee Name, Date of Joining, Address), but
then we see a subclass of this set of employees, the set of all “Teachers”, which is
different from other employees. We may have other employees in the subclasses called
“Admin Staff”, “Technical Lab Staff” or “Other Staff” such as peons and other workers.
All of these subclasses form a specialization of the class “Employee”. All the attributes and
associations of the class “Employee” are also present in all the subclasses. Every teacher
has an Employee ID, Name, DOJ and Address, and so do the admin staff, technical lab
staff and other staff members. Subclasses may have some attributes or associations of
their own which make them different from the others. Teachers have a subject they
teach, a department they belong to and expertise they have. Teachers also have
associations with different entities such as the “Classes” they teach and the “Projects” they guide.
These attributes and associations are not present in the other subclasses of employees.
• Generalization: This is a process similar to specialization but in the opposite
direction. It is the process of combining subclasses into a general class and moving the
common attributes and associations from the subclasses to the general class. It is just a
different practical approach. In specialization we start from the general classes and form
the special classes out of them, while in generalization we start from various low-level
classes and form the general classes by combining several of them after identifying their
common features. So in the above example of the academic institution we may start by
thinking of “Teachers” as an entity set, then “Technical Lab Staff” and then “Admin
Staff”, and then, observing that they have several fields in common such as “Employee Id”,
“Name” and “Address”, we combine them to define a general class “Employee” which
has only those attributes which are common to all three classes. These
classes then no longer hold the common fields themselves; the fields are kept for all employees
collectively in the general entity set “Employee”.
• The result of both the above processes is the same. We get a hierarchy of classes and
subclasses which can be represented by a tree structure. The result of generalization and
specialization will be like this.
[E-R diagram: entity set Employee (EMP ID, Name, Address, DOJ) connected through an ISA
triangle to the subclasses Teacher, Admin Staff, Lab Staff and Other Staff; the subclass
attributes shown are Subject, Deptt., Role and Lab Name]

• Aggregation: Consider a “Borrower” relationship set that associates customers from the
“Customer” entity set with the loans they borrowed in the entity set “Loan”. Suppose the bank
decides to attach an employee to some customer-loan relationships of “Borrower”, based
perhaps on the size of the loan or the status of the customer. This employee, called the loan officer,
is responsible for tracking and following up the status of the loan from time to time.
This suggests a relationship “Loan-officer” among “Customer”, “Loan” and
“Employee”. So we can draw the simple ERD as follows:

[E-R diagram: relationship Borrower between Customer (Cust-ID, Cust-name, Address) and
Loan (Loan Number, Amount); relationship Loan-officer connecting Customer, Loan and
Employee (Emp-ID, Emp-name, Address)]


o The diagram above may suggest that the relationships Borrower and Loan-officer
could be combined into one. But that would require a loan officer to be
assigned to every customer-loan pair, which is not the case.
o The diagram above also contains redundancy, as every customer-loan pair in
“Loan-officer” is also in “Borrower”.
• A more appropriate way of representing the above set of relationships is to treat
the entire relationship Borrower, with its associated entities Customer and Loan, as an
entity, i.e. an aggregate entity, and then to represent the relationship Loan-officer between
the entity set “Employee” and this aggregate entity, as follows:

[E-R diagram: the same entity sets as above, with Customer, Borrower and Loan enclosed
in a box as one aggregate entity; Loan-officer connects this box to Employee]

• Important terms related to generalization and specialization:
o Attribute inheritance: All the attributes of a higher-level entity set are inherited by
its lower-level entity sets. The lower-level entity sets also participate in all the
relationships in which the higher-level entity set participates.
o Disjoint: The lower-level entity sets of a higher-level entity set are
called disjoint if no entity belongs to more than one lower-level entity set.
o Overlapping: When the same entity may belong to more than one lower-level
entity set.
o Complete Generalization: When each higher-level entity belongs to at least one
lower-level entity set.
o Partial Generalization: When some higher-level entities may not belong to any
lower-level entity set.
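
The four terms above can be tested on concrete sets of entities. The following is a minimal sketch, assuming each entity is represented only by an identifier; the employee IDs and the function name classify are made up for illustration.

```python
def classify(higher, lowers):
    """Describe a specialization.

    higher: set of identifiers in the higher-level entity set.
    lowers: dict mapping each lower-level entity set name to its identifiers.
    Returns a pair such as ("disjoint", "partial").
    """
    members = list(lowers.values())
    # Disjoint: no identifier appears in two different lower-level sets.
    disjoint = all(
        a.isdisjoint(b)
        for i, a in enumerate(members)
        for b in members[i + 1:]
    )
    # Complete: every higher-level entity is in at least one lower-level set.
    covered = set().union(*members)
    complete = higher <= covered
    return (
        "disjoint" if disjoint else "overlapping",
        "complete" if complete else "partial",
    )

# Hypothetical employee identifiers, for illustration only.
employees = {"E1", "E2", "E3", "E4"}
print(classify(employees, {"Teacher": {"E1", "E2"}, "AdminStaff": {"E3"}}))
print(classify(employees, {"Teacher": {"E1", "E2"}, "AdminStaff": {"E2", "E3", "E4"}}))
```

In the first call E4 belongs to no subclass, so the generalization is partial; in the second call E2 is both a teacher and an admin staff member, so the subclasses overlap.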

Lecture 8
Reduction of ER Schema to Tables:
• Strong Entity Sets: These are the entity sets which have a set of attributes that
forms a primary key (or simply a key). To represent such an entity set in the form of a table
we have a column in the table for each attribute of the entity schema. Each entity of
the entity set is represented by a row in the table holding a value for each attribute.
For example, the entity set BOOK referred to earlier has the following attributes: Acc. No.,
Call No., Title, Author, Publisher and Yr. of Publication. Suppose that we have
only two books in the library: “Database System Concepts” by Silberschatz,
Korth and Sudarshan, published in 1997, with accession number 312 and call no.
245, and “Fundamentals of Database Systems” by Elmasri and Navathe,
published in 1999, with accession no. 433 and call no. 23. Then the entity set
BOOK will be represented in tabular form as follows:
Acc. No.  Call No.  Title                             Author                          Publisher       Yr. of Publication
312       245       Database System Concepts          Silberschatz, Korth, Sudarshan  McGrawHill      1997
433       23        Fundamentals of Database Systems  Elmasri, Navathe                Addison Wesley  1999

• Weak Entity Sets: These are the entity sets whose entities cannot be identified by
looking at their own attributes alone; we must be able to link each weak entity to
some entity of another, strong entity set, which is called the owner of the weak entity
set. When represented in tabular form, such an entity set has a column for the key
attributes of the owner in addition to the columns for its own attributes.
For Example: Consider the PAYMENT entity set, which is a weak entity set dependent on
its owner entity set LOAN. LOAN has Loan No. and Loan Amount as its attributes, with
Loan No. as the key attribute, and PAYMENT has Payment No., Payment Amount and
Payment Date as its attributes (it has no key as it is a weak entity set, but Payment No. is
a partial key). Suppose the following table exists for LOAN:

Loan No.   Loan Amount
L-1        Rs. 10000
L-2        Rs. 40000

Also suppose that a payment of Rs 100 is made for L-1 on 22/08/2010 as its first
payment and a payment of Rs 300 is made for L-2 on 25/08/2010 as its first payment. Then the
table corresponding to the entity set PAYMENT will look like this:

Loan No.   Payment No.   Payment Amount   Payment Date
L-1        1             Rs. 100          22/08/2010
L-2        1             Rs. 300          25/08/2010

Notice: We have included Loan No. as a column even though it is not an attribute of the
entity set PAYMENT, because it is the key attribute of the owner entity set LOAN. Loan
No. and Payment No. in combination form the primary key of this table.
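
The two tables above can be tried out with the sqlite3 module of the Python standard library. This is only a sketch under assumptions: the table and column names are chosen for this example, amounts are stored as plain integers, dates as text, and the helper try_insert is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when asked

conn.execute("CREATE TABLE loan (loan_no TEXT PRIMARY KEY, loan_amount INTEGER)")
conn.execute(
    """
    CREATE TABLE payment (
        loan_no        TEXT    NOT NULL REFERENCES loan(loan_no),  -- key of the owner
        payment_no     INTEGER NOT NULL,                           -- partial key
        payment_amount INTEGER,
        payment_date   TEXT,
        PRIMARY KEY (loan_no, payment_no)
    )
    """
)
conn.executemany("INSERT INTO loan VALUES (?, ?)", [("L-1", 10000), ("L-2", 40000)])
conn.executemany(
    "INSERT INTO payment VALUES (?, ?, ?, ?)",
    [("L-1", 1, 100, "22/08/2010"), ("L-2", 1, 300, "25/08/2010")],
)

def try_insert(row):
    """Return True if the payment row is accepted, False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO payment VALUES (?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

second_payment_ok = try_insert(("L-1", 2, 100, "22/09/2010"))       # new payment number
duplicate_rejected = not try_insert(("L-1", 1, 50, "01/09/2010"))   # same (loan, payment no.)
orphan_rejected = not try_insert(("L-9", 1, 50, "01/09/2010"))      # no such loan
print(second_payment_ok, duplicate_rejected, orphan_rejected)
```

Payment number 1 may appear once per loan, which is exactly the role of the partial key, while a payment without an existing loan is refused, which reflects the total participation of the weak entity set.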
• Relationships: To represent a relationship set of an ERD in tabular form we have a column
for the key attributes of each participating entity set, together with a column for each
attribute which is directly associated with the relationship set itself.
For example: We have defined earlier the relationship BORROWED BY which exists
between the entity sets BOOK and USER. It has an attribute Date of Issue directly
associated with it. The tabular representation of BORROWED BY will have a column for
Acc. No. (the key of BOOK), a column for Card No.
(the key of USER) and a column for Date of Issue (the attribute of the
relationship set). The table may look like this, where the rows represent all the current
borrowings in the library:
Book Acc. No.   User Card No.   Date of Issue
312             422             03/08/2010
433             4322            05/08/2010
• Existentially Dependent Entity Sets: The existence of every entity of a
dependent entity set depends on the existence of some entity of its owner (dominant) entity set.
We may therefore remove the table representing the relationship between an existentially
dependent entity set and its dominant entity set by adding to the table of the dependent
entity set a column for the key of the dominant entity set.
For Example: ACCOUNT, having attributes Account No. (key) and Balance, is
existentially dependent on BRANCH, having attributes Branch Id (key) and Address. So
the table for the relationship BRANCH-ACCOUNT, which associates accounts with
branches, may be removed by adding a column named Branch Id to the table
representing ACCOUNT. The table will have the following columns:

Account No.   Balance   Branch Id

• Identifying Relationship Sets: These are the relationship sets, represented as doubly
outlined diamonds in an ERD, which associate a weak entity set with its owner. Since
we have already included the primary key of the strong owner entity set in the table of the weak
entity set, we do not require a separate table to represent identifying relationships.
• Multivalued Attributes: An attribute of an entity which can have more than one value
is called a multivalued attribute. Such attributes are marked by doubly outlined ovals in an ERD.
For example: Consider “Dependents”, an attribute of EMPLOYEE. Since an employee may have
more than one dependent, we represent this as a multivalued attribute
of EMPLOYEE. If we represented it as a column in the table for the entity set, we
would not be able to put all of the values in a single row. A multivalued attribute is therefore
represented as a separate table, similar to that of a weak entity set, with a column
for the primary key of the entity set and a column for each sub-attribute of the
multivalued attribute (there are sub-attributes when the multivalued attribute is itself a
composite attribute). The ERD and the corresponding tables for this example are as follows:

[E-R diagram: entity set EMPLOYEE with attributes EMP ID and EMP Name and a multivalued
composite attribute Dependent with sub-attributes Dependent No., Name and Relation]

Employee Table
EMPID   EMPName
EMP-1   Rajan
EMP-2   Sartaj

Dependent Table
EMPID   Dependent No.   Name      Relation
EMP-1   1               Shashi    Mother
EMP-1   2               Rachna    Spouse
EMP-2   1               Shahina   Spouse
EMP-2   2               Rehman    Son

Here we see that Rajan has two dependents, his mother and his wife, and Sartaj also
has two dependents, his wife Shahina and his son Rehman.

• Generalization: Consider the following generalization example:

[E-R diagram: entity set ACCOUNT (AccNo, Balance) connected through an ISA triangle to
SAVING (InterestRate) and CURRENT (OverdraftAmount)]

It shows a general class of entities ACCOUNT which has two special classes, SAVING and
CURRENT, referring to savings bank accounts and current bank accounts.
It can be represented in the following ways:
1. Tables for the general case:
ACCOUNT:
AccNo   Balance

SAVING:
AccNo   InterestRate

CURRENT:
AccNo   OverdraftAmount

2. Tables when the generalization is disjoint and complete:
The above case has both properties, disjoint and complete. No account can be both a saving
and a current account, and every account has to be either a saving account or a current
account. In such a case we may merge the ACCOUNT table into its child tables. We then have
only two tables, as follows:
SAVING:
AccNo   Balance   InterestRate

CURRENT:
AccNo   Balance   OverdraftAmount

• Aggregation: For the ERD given earlier involving LOAN, CUSTOMER,
BORROWER, EMPLOYEE and LOAN-OFFICER, in which Loan-officer relates Employee to the
aggregate of Customer, Borrower and Loan, we can have the following tables:
o Loan: with attributes LoanNumber and Amount.
o Customer: with attributes Cust-Name, Cust-ID and Address.
o Borrower: with attributes Cust-ID and LoanNumber.
o Employee: with attributes Emp-ID, Emp-Name and Address.
o LoanOfficer: with attributes Cust-ID, LoanNumber and Emp-ID.

Lecture 9
Relational Model
 A relational database consists of a collection of tables, each of which is assigned a unique
name.
 A row in table represents a relationship among a set of values.
 Table is a collection of rows or relationships which is similar to a mathematical relation
i.e. a set of tuples.
Basic Structure
• A mathematical binary relation is an association of values from one set with values from
another set. For example, the less-than relation associates a set of integers with another set
of integers. Consider

A = {1, 2, 3, …} and B = {1, 2, 3, …}

The relation less-than from A to B associates each element of A with every element of B
that is larger than it. It can be represented as a set of tuples <x,y> where x is an element
from A and y is an element from B such that x<y, i.e.

Relation less-than from A to B = {<1,2>,<1,3>,<1,4>,…,<2,3>,<2,4>,…,<3,4>,…}.

Similarly,
Relation equal-to from A to B = {<1,1>,<2,2>,<3,3>,…}.
And
Relation greater-than from A to B = {<2,1>,<3,1>,<3,2>,<4,1>,<4,2>,<4,3>,…}.

The Cartesian product of A and B, on the other hand, contains every tuple <x,y> where x
belongs to A and y belongs to B, i.e.

A × B = { <1,1>, <1,2>, <1,3>, …
          <2,1>, <2,2>, <2,3>, …
          <3,1>, <3,2>, <3,3>, …
          … }

• So we can see that any relation from A to B above is a subset of A × B.

• Now suppose that in a relational model a table T has columns titled A, B and C. If T
represents an entity set then A, B and C are its attributes. Every attribute corresponds
to a limited set of values that can be assigned to it.
• The set of values that can be assigned to a particular column is called the domain of that
column. So A, B and C have their specified domains. Suppose those domain sets are denoted
DA, DB and DC.
• Any row of the table has a value from DA in the first column, from DB in the second column and
from DC in the third column. So a row of a relational table is similar to a tuple of a mathematical
relation over the sets DA, DB and DC.
• For Example: Consider a BOOK table having attributes acc-no, title and author. To
keep it simple we restrict the domain of acc-no to A={100, 200, 300}, of title to
B={“DBMS”, “COMPILER”, “OS”} and of author to C={“Ramanuj”, “Aristotle”,
“Silbershatz”}. That means the first column of BOOK can have values only from A,
the second only from B and the third only from C. The Cartesian product of these domain sets
has 3 × 3 × 3 = 27 rows and can be represented in tabular form as:
A×B×C=
100 “DBMS” “Ramanuj”
100 “DBMS” “Aristotle”
100 “DBMS” “Silbershatz”
100 “COMPILER” “Ramanuj”
100 “COMPILER” “Aristotle”
100 “COMPILER” “Silbershatz”
100 “OS” “Ramanuj”
100 “OS” “Aristotle”
100 “OS” “Silbershatz”
200 “DBMS” “Ramanuj”
200 “DBMS” “Aristotle”
200 “DBMS” “Silbershatz”
200 “COMPILER” “Ramanuj”
200 “COMPILER” “Aristotle”
200 “COMPILER” “Silbershatz”
200 “OS” “Ramanuj”
200 “OS” “Aristotle”
200 “OS” “Silbershatz”
300 “DBMS” “Ramanuj”
300 “DBMS” “Aristotle”
300 “DBMS” “Silbershatz”
300 “COMPILER” “Ramanuj”
300 “COMPILER” “Aristotle”
300 “COMPILER” “Silbershatz”
300 “OS” “Ramanuj”
300 “OS” “Aristotle”
300 “OS” “Silbershatz”

 Now we can observe any valid table representing the entity set BOOK for a library given
above restriction on domains will have only a subset of the rows from the above table
which represents A × B × C. For Example a valid entity set for all books in the library
can be
BOOK 100 “DBMS” “Silbershatz”
200 “DBMS” “Ramanuj”
300 “COMPILER” “Silbershatz”

 So we can say any table of relational model is actually similar to the mathematical
relation.
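
The same observation can be checked with a few lines of Python. The sketch below assumes the acc-no domain {100, 200, 300} used in the BOOK rows above; the variable names are chosen for this example only.

```python
from itertools import product

# Domains of the three columns of BOOK.
ACC_NO = (100, 200, 300)
TITLE = ("DBMS", "COMPILER", "OS")
AUTHOR = ("Ramanuj", "Aristotle", "Silbershatz")

# Cartesian product of the domains: every row that could possibly appear.
all_rows = set(product(ACC_NO, TITLE, AUTHOR))

# One valid BOOK table: a set of tuples drawn from the product.
book = {
    (100, "DBMS", "Silbershatz"),
    (200, "DBMS", "Ramanuj"),
    (300, "COMPILER", "Silbershatz"),
}

print(len(all_rows))     # number of possible rows
print(book <= all_rows)  # True when BOOK is a subset of the product
```

A row such as (400, "DBMS", "Ramanuj") lies outside the product because 400 is not in the acc-no domain, so a table containing it would not be a valid relation over these domains.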

 Every row of such relational table is similar to a tuple of a mathematical relation. Let the
tuple variable ‘t’ refer to the first tuple (first row) of the BOOK table mentioned above; then
we can refer to the various elements of the tuple as t[acc-no] = 100, t[title] = “DBMS” and
t[author] = “Silbershatz”.

Query Languages
 A language in which a user requests information from the database is called a query
language.
o Procedural- user instructs the system to perform a sequence of operations on the
database to compute the desired result. E.g. Relational algebra
o Nonprocedural- user describes the information desired without giving a specific
procedure for obtaining the desired information. E.g. tuple calculus, domain
calculus.

Lecture 10
Relational Algebra:
 Basic operations:
o Selection (σ) Selects a subset of rows from relation.
o Projection (π) Selects a subset of columns from relation.
o Cross-product (×) Allows us to combine two relations.
o Set-difference (−) Tuples in reln. 1, but not in reln. 2.
o Union (U) Tuples in reln. 1 or in reln. 2 (or in both).
o Rename (ρ) Use new name for the Tables or fields.
• Additional operations:
o Intersection (∩), join (⋈), division (÷): Not essential, but (very!) useful.
 Since each operation returns a relation, operations can be composed! (Algebra is
“closed”.)
Projection
 Deletes attributes that are not in projection list.
 Schema of result contains exactly the fields in the projection list, with the same names
that they had in the (only) input relation. ( Unary Operation)
 Projection operator has to eliminate duplicates! (as it returns a relation which is a set)
o Note: real systems typically don’t do duplicate elimination unless the user
explicitly asks for it. (Duplicate values may be representing different real world
entity or relationship)
Consider the BOOK table:
Acc-No Title Author
100 “DBMS” “Silbershatz”
200 “DBMS” “Ramanuj”
300 “COMPILER” “Silbershatz”
400 “COMPILER” “Ullman”
500 “OS” “Sudarshan”
600 “DBMS” “Silbershatz”

πTitle(BOOK) =
Title
“DBMS”
“COMPILER”
“OS”
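
Projection with duplicate elimination can be sketched in Python as follows. The representation of a relation as a list of dictionaries and the function name project are assumptions made for this example.

```python
BOOK = [
    {"acc_no": 100, "title": "DBMS", "author": "Silbershatz"},
    {"acc_no": 200, "title": "DBMS", "author": "Ramanuj"},
    {"acc_no": 300, "title": "COMPILER", "author": "Silbershatz"},
    {"acc_no": 400, "title": "COMPILER", "author": "Ullman"},
    {"acc_no": 500, "title": "OS", "author": "Sudarshan"},
    {"acc_no": 600, "title": "DBMS", "author": "Silbershatz"},
]

def project(rows, attrs):
    """Keep only the listed attributes and drop duplicate result rows."""
    result = []
    for row in rows:
        projected = {a: row[a] for a in attrs}
        if projected not in result:  # a relation is a set, so no repeats
            result.append(projected)
    return result

print(project(BOOK, ["title"]))
```

Six input rows give only three result rows, because “DBMS” and “COMPILER” each occur more than once in the title column.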

Selection
 Selects rows that satisfy selection condition.
 No duplicates in result! (Why?)
 Schema of result identical to schema of (only) input relation.
 Result relation can be the input for another relational algebra operation! (Operator
composition.)
σAcc-no>300(BOOK) =
Acc-No Title Author
400 “COMPILER” “Ullman”
500 “OS” “Sudarshan”
600 “DBMS” “Silbershatz”

σTitle=”DBMS”(BOOK)=
Acc-No Title Author
100 “DBMS” “Silbershatz”
200 “DBMS” “Ramanuj”
600 “DBMS” “Silbershatz”

πAcc-no (σTitle=”DBMS” (BOOK))=


Acc-No
100
200
600
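
Selection, and its composition with projection, can be sketched in the same way. As before, the list-of-dictionaries representation and the function names select and project are assumptions of this example.

```python
BOOK = [
    {"acc_no": 100, "title": "DBMS", "author": "Silbershatz"},
    {"acc_no": 200, "title": "DBMS", "author": "Ramanuj"},
    {"acc_no": 300, "title": "COMPILER", "author": "Silbershatz"},
    {"acc_no": 400, "title": "COMPILER", "author": "Ullman"},
    {"acc_no": 500, "title": "OS", "author": "Sudarshan"},
    {"acc_no": 600, "title": "DBMS", "author": "Silbershatz"},
]

def select(rows, predicate):
    """Keep the rows for which the selection condition holds."""
    return [row for row in rows if predicate(row)]

def project(rows, attrs):
    """Keep only the listed attributes and drop duplicate result rows."""
    result = []
    for row in rows:
        projected = {a: row[a] for a in attrs}
        if projected not in result:
            result.append(projected)
    return result

# sigma Acc-no > 300 (BOOK)
later_books = select(BOOK, lambda r: r["acc_no"] > 300)

# pi Acc-no ( sigma Title = "DBMS" (BOOK) ): the output of one operator feeds the next
dbms_acc_nos = project(select(BOOK, lambda r: r["title"] == "DBMS"), ["acc_no"])

print([r["acc_no"] for r in later_books])
print(dbms_acc_nos)
```

Because select returns a relation of the same shape as its input, it can be passed directly to project, which is what closure of the algebra means in practice.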

Union, Intersection, Set-Difference


 All of these operations take two input relations, which must be union-compatible:
o Same number of fields.
o `Corresponding’ fields have the same type.
 What is the schema of result?
Consider:
Borrower Depositor
Cust-name Loan-no Cust-name Acc-no
Ram L-13 Suleman A-100
Shyam L-30 Radheshyam A-300
Suleman L-42 Ram A-401

List of customers who are either borrower or depositor at bank= πCust-name (Borrower) U
πCust-name (Depositor)=
Cust-name
Ram
Shyam
Suleman
Radheshyam

Customers who are both borrowers and depositors = πCust-name (Borrower) ∩ π Cust-name
(Depositor)=
Cust-name
Ram
Suleman

Customers who are borrowers but not depositors = πCust-name (Borrower) − πCust-name
(Depositor)=
Cust-name
Shyam
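
Since relations are sets of tuples, the three operations map directly onto Python set operators. The sketch below uses the Borrower and Depositor rows shown above; the variable names are chosen for this example.

```python
borrower = {("Ram", "L-13"), ("Shyam", "L-30"), ("Suleman", "L-42")}
depositor = {("Suleman", "A-100"), ("Radheshyam", "A-300"), ("Ram", "A-401")}

# Project both relations on Cust-name first, so that they become union-compatible.
borrower_names = {name for name, _ in borrower}
depositor_names = {name for name, _ in depositor}

either = borrower_names | depositor_names          # union
both = borrower_names & depositor_names            # intersection
only_borrowers = borrower_names - depositor_names  # set-difference

print(sorted(either), sorted(both), sorted(only_borrowers))
```

The projection step matters: Borrower and Depositor themselves hold loan numbers and account numbers in their second columns, so they cannot be combined meaningfully before the name column is extracted.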

Lecture-11
Cartesian-Product or Cross-Product (S1 × R1)
 Each row of S1 is paired with each row of R1.
 Result schema has one field per field of S1 and R1, with field names `inherited’ if
possible.
 Consider the borrower and loan tables as follows:

Borrower: Loan:
Cust-name Loan-no Loan-no Amount
Ram L-13 L-13 1000
Shyam L-30 L-30 20000
Suleman L-42 L-42 40000

Cross product of Borrower and Loan, Borrower × Loan =

Borrower.Cust-name Borrower.Loan-no Loan.Loan-no Loan.Amount
Ram L-13 L-13 1000
Ram L-13 L-30 20000
Ram L-13 L-42 40000
Shyam L-30 L-13 1000
Shyam L-30 L-30 20000
Shyam L-30 L-42 40000
Suleman L-42 L-13 1000
Suleman L-42 L-30 20000
Suleman L-42 L-42 40000

The rename operation can be used to rename the fields to avoid confusion when the two
participating tables have fields with the same name.

For example, the statement ρLoan-borrower(Cust-name, Loan-No-1, Loan-No-2, Amount)(Borrower × Loan)
creates a new table named Loan-borrower whose four fields
are renamed to Cust-name, Loan-No-1, Loan-No-2 and Amount, and whose rows contain
the same data as the cross product of Borrower and Loan.

Loan-borrower:
Cust-name Loan-No-1 Loan-No-2 Amount
Ram L-13 L-13 1000
Ram L-13 L-30 20000
Ram L-13 L-42 40000
Shyam L-30 L-13 1000
Shyam L-30 L-30 20000
Shyam L-30 L-42 40000
Suleman L-42 L-13 1000
Suleman L-42 L-30 20000
Suleman L-42 L-42 40000
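
The cross product and the renaming of its fields can be sketched in Python as below. The functions cross and rename are hypothetical helpers written for this example; they qualify field names with the table name in the same way as the table headings above.

```python
borrower = [
    {"Cust-name": "Ram", "Loan-no": "L-13"},
    {"Cust-name": "Shyam", "Loan-no": "L-30"},
    {"Cust-name": "Suleman", "Loan-no": "L-42"},
]
loan = [
    {"Loan-no": "L-13", "Amount": 1000},
    {"Loan-no": "L-30", "Amount": 20000},
    {"Loan-no": "L-42", "Amount": 40000},
]

def cross(r, r_name, s, s_name):
    """Pair every row of r with every row of s, qualifying the field names."""
    return [
        {
            **{f"{r_name}.{k}": v for k, v in a.items()},
            **{f"{s_name}.{k}": v for k, v in b.items()},
        }
        for a in r
        for b in s
    ]

def rename(rows, new_names):
    """Give the fields of every row new names, position by position."""
    return [dict(zip(new_names, row.values())) for row in rows]

product_rows = cross(borrower, "Borrower", loan, "Loan")
loan_borrower = rename(product_rows, ["Cust-name", "Loan-No-1", "Loan-No-2", "Amount"])

print(len(loan_borrower))
print(loan_borrower[0])
```

Three borrower rows paired with three loan rows give nine rows, matching the Loan-borrower table above.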

Rename Operation:
 It can be used in two ways :
o ρx(E) returns the result of expression E in a table named x.
o ρx(A1, A2, …, An)(E) returns the result of expression E in a table named x with
the attributes renamed to A1, A2, …, An.
o Its benefit can be understood from the solution of the query “Find the largest
account balance in the bank”.
It can be solved by the following steps:
1. Find the relation containing those balances which are not the largest.
a. Consider the Cartesian product of Account with itself, i.e. Account
× Account.
b. Compare the balances of the first Account table with the balances of the
second Account table in the product.
c. For that we should rename one of the Account tables
to avoid confusion.
d. It can be done by the following operation:
ΠAccount.balance (σAccount.balance < d.balance (Account × ρd(Account)))
e. The above relation contains the balances which are not the
largest.
2. Subtract this relation from the relation containing all the balances, i.e.
Πbalance (Account).
3. So the final statement for solving the above query is
Πbalance (Account) − ΠAccount.balance (σAccount.balance < d.balance (Account × ρd(Account)))
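
The three steps can be followed in Python with ordinary sets. The account numbers and balances below are invented for illustration, since the notes give no Account table; the second loop variable plays the role of the renamed copy d.

```python
# Hypothetical Account relation: (account-number, balance).
account = [("A-100", 500), ("A-300", 900), ("A-401", 700)]

# Step 2's left operand: the projection of Account on balance.
all_balances = {balance for _, balance in account}

# Step 1: pair Account with a renamed copy d of itself and keep the balances
# that are smaller than some balance in d; these are not the largest.
not_largest = {
    a_balance
    for _, a_balance in account   # Account
    for _, d_balance in account   # rho d (Account)
    if a_balance < d_balance
}

# Steps 2 and 3: subtract to leave only the largest balance.
largest = all_balances - not_largest
print(largest)
```

The largest balance is the only one that is never smaller than another balance, so it is the only value that survives the set-difference.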

Additional Operations
Natural Join ( ⋈ )
• Forms the Cartesian product of its two arguments, performs a selection forcing equality
on those attributes that appear in both relations, and keeps only one copy of each such attribute.
• For example, consider the Borrower and Loan relations. The natural join between them,
Borrower ⋈ Loan, automatically performs a selection on the table returned
by Borrower × Loan which forces equality on the attribute that appears in both
Borrower and Loan, i.e. Loan-no, and the result has only one column named
Loan-no.
• That means Borrower ⋈ Loan is σBorrower.Loan-no = Loan.Loan-no (Borrower × Loan)
with one of the two Loan-no columns removed.
• The table returned from this will be as follows:
Eliminate from Borrower × Loan the rows that do not satisfy the selection condition
Borrower.Loan-no = Loan.Loan-no (marked “eliminated” below):
Borrower.Cust-name  Borrower.Loan-no  Loan.Loan-no  Loan.Amount
Ram                 L-13              L-13          1000
Ram                 L-13              L-30          20000        (eliminated)
Ram                 L-13              L-42          40000        (eliminated)
Shyam               L-30              L-13          1000         (eliminated)
Shyam               L-30              L-30          20000
Shyam               L-30              L-42          40000        (eliminated)
Suleman             L-42              L-13          1000         (eliminated)
Suleman             L-42              L-30          20000        (eliminated)
Suleman             L-42              L-42          40000

Then remove one of the two columns named Loan-no.

 i.e. Borrower ⋈ Loan =


Cust-name Loan-no Amount
Ram L-13 1000
Shyam L-30 20000
Suleman L-42 40000
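
A natural join can be sketched in Python by comparing the attributes that two rows have in common. The function natural_join and the dictionary representation are assumptions of this example; a nested loop like this is easy to read but is not how a real system would evaluate a join.

```python
borrower = [
    {"cust_name": "Ram", "loan_no": "L-13"},
    {"cust_name": "Shyam", "loan_no": "L-30"},
    {"cust_name": "Suleman", "loan_no": "L-42"},
]
loan = [
    {"loan_no": "L-13", "amount": 1000},
    {"loan_no": "L-30", "amount": 20000},
    {"loan_no": "L-42", "amount": 40000},
]

def natural_join(r, s):
    """Combine rows of r and s that agree on every attribute they share."""
    result = []
    for a in r:
        for b in s:
            common = a.keys() & b.keys()
            if all(a[k] == b[k] for k in common):
                # Merging the dictionaries keeps a single copy of shared attributes.
                result.append({**a, **b})
    return result

for row in natural_join(borrower, loan):
    print(row)
```

Only the three pairs with matching loan numbers survive, and each result row carries loan_no once.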

Division Operation:
 denoted by ÷ is used for queries that include the phrase “for all”.
 For example “Find customers who has an account in all branches in branch city
Agra”. This query can be solved by following statement.
ΠCustomer-name, branch-name (Depositor ⋈ Account) ÷ Πbranch-name (σBranch-city=”Agra”(Branch))

• The division operation can be expressed using only the basic operations as
follows: Let r(R) and s(S) be relations on schemas R and S with S ⊆ R. Then
r ÷ s = ΠR-S(r) − ΠR-S((ΠR-S(r) × s) − ΠR-S,S(r))
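
The formula can be followed term by term in Python for the simple case where r has two attributes and s has one. The customer and branch names below are invented for illustration, and the function name divide is an assumption of this sketch.

```python
def divide(r, s):
    """r is a set of (x, y) pairs, s is a set of y values.

    Returns every x that is paired in r with all the y values in s.
    """
    candidates = {x for x, _ in r}                           # projection of r on R-S
    all_pairs = {(x, y) for x in candidates for y in s}      # that projection × s
    missing = all_pairs - r                                  # pairs that r lacks
    return candidates - {x for x, _ in missing}              # drop x with a missing pair

# Hypothetical data: (customer-name, branch-name) and the branches in one city.
has_account = {("Ram", "Sadar"), ("Ram", "Kamla Nagar"), ("Shyam", "Sadar")}
city_branches = {"Sadar", "Kamla Nagar"}

print(divide(has_account, city_branches))
```

Shyam is removed because the pair (Shyam, Kamla Nagar) is missing from r, which is the “for all” condition that division expresses.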

Lecture-12
Tuple Relational Calculus
 Relational algebra is an example of procedural language while tuple relational
calculus is a nonprocedural query language.
 A query is specified as:
{t | P(t)}, i.e it is the set of all tuples t such that predicate P is true for t.

 The formula P(t) is formed using atoms which uses the relations, tuples of
relations and fields of tuples and following symbols
o ∈( belongs to),<,>,≤,≥,≠,=, (comparison operators)
 These atoms can then be used to form formulas with following symbols
o ∀ (universal quantifier, read "for all")
o ∃ (existential quantifier, read "there exists")
o ∧ (and), ∨ (or), ¬ (not)
 For example : here are some queries and a way to express them using tuple
calculus:
o Find the branch-name, loan-number and amount for loans over Rs 1200.
{t| t ∈ Loan ∧ t[amount] > 1200}.

o Find the loan number for each loan of an amount greater that Rs1200.
{t| ∃ s ∈ Loan(t[loan-number] = s[loan-number] ∧ s[amount] >1200}

o Find the names of all the customers who have a loan from the Sadar
branch.
{t | ∃ s ∈ Borrower ( t customer-name = s customer-name ∧
∃ u ∈ Loan ( u[loan-number] = s[loan-number
∧ u[branch-name] = "Sadar"))}

o Find all customers who have a loan , an account, or both at the bank
{t| ∃ s ∈ Borrower ( t[customer-name] = s[customer-name])
⋁ ∃ u ∈ Depositor (t[customer-name] = u[customer-name])}

o Find only those customers who have both an account and a loan.
{t| ∃ s ∈ Borrower ( t[customer-name] = s[customer-name])
∧ ∃ u ∈ Depositor (t[customer-name] = u[customer-name])}

o Find all customers who have an account but do not have loan.
{t| ∃u ∈ Depositor (t[customer-name] = u[customer-name]) ∧
¬ ∃ s ∈ Borrower ( t[customer-name] = s[customer-name])}


o Find all customers who have an account at all branches located in Agra
{t | ∀ w ∈ Branch( w[branch-city] = "Agra" ⇒
∃ s ∈ Depositor ( t[customer-name] = s[customer-name]
∧ ∃ u ∈ Account ( u[account-number] = s[account-number]
∧ u[branch-name] = w[branch-name])))}
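
Tuple calculus expressions map closely onto comprehensions: ∃ corresponds to any(...), ∀ to all(...), and an implication P ⇒ Q can be written as (not P) or Q. The following is a minimal Python sketch under the assumption that relations are lists of dicts; all sample rows and the helper has_account_at are hypothetical.

```python
# Hypothetical sample relations (not taken from the notes).
loan = [
    {"branch-name": "Sadar", "loan-number": "L-1", "amount": 1500},
    {"branch-name": "Dayalbagh", "loan-number": "L-2", "amount": 900},
]
borrower = [
    {"customer-name": "Ram", "loan-number": "L-1"},
    {"customer-name": "Shyam", "loan-number": "L-2"},
]
branch = [
    {"branch-name": "Sadar", "branch-city": "Agra"},
    {"branch-name": "Dayalbagh", "branch-city": "Agra"},
    {"branch-name": "Civil Lines", "branch-city": "Mathura"},
]
account = [
    {"account-number": "A-1", "branch-name": "Sadar"},
    {"account-number": "A-2", "branch-name": "Dayalbagh"},
    {"account-number": "A-3", "branch-name": "Sadar"},
]
depositor = [
    {"customer-name": "Ram", "account-number": "A-1"},
    {"customer-name": "Ram", "account-number": "A-2"},
    {"customer-name": "Shyam", "account-number": "A-3"},
]

# {t | t ∈ Loan ∧ t[amount] > 1200}
big_loans = [t for t in loan if t["amount"] > 1200]

# Customers with a loan from the Sadar branch (nested ∃).
sadar_borrowers = {
    s["customer-name"]
    for s in borrower
    if any(u["loan-number"] == s["loan-number"] and u["branch-name"] == "Sadar"
           for u in loan)
}


def has_account_at(name, branch_name):
    """∃ s ∈ Depositor, ∃ u ∈ Account linking the customer to the branch."""
    return any(
        s["customer-name"] == name
        and any(u["account-number"] == s["account-number"]
                and u["branch-name"] == branch_name for u in account)
        for s in depositor
    )


# ∀ w ∈ Branch (w[branch-city] = "Agra" ⇒ customer has an account at w).
# Candidate names are drawn from Depositor so that the result stays finite.
all_agra = {
    d["customer-name"]
    for d in depositor
    if all(w["branch-city"] != "Agra"
           or has_account_at(d["customer-name"], w["branch-name"])
           for w in branch)
}
print(sadar_borrowers, all_agra)
```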

Domain Relational Calculus


 Domain relational calculus is another nonprocedural language for expressing database
queries.
 A query is specified as:
{<x1, x2, …, xn> | P(x1, x2, …, xn)} where x1, x2, …, xn represent domain variables and P
represents a predicate formula, as in tuple calculus.
 Since domain variables are used in place of tuples, the formulas refer to
domain variables rather than to the fields of tuples.
 For example the queries in domain calculus are mentioned as follows:
o Find the branch-name, loan-number and amount for loans over Rs 1200.
{<b, l, a>| <b, l, a> ∈ Loan ∧ a > 1200}.

o Find the loan number for each loan of an amount greater than Rs 1200.
{< l > | ∃ b, a (<b, l, a> ∈ Loan ∧ a > 1200)}

o Find the names of all the customers who have a loan from the Sadar branch and
find the loan amount
{< c, a > | ∃ l (< c, l > ∈ Borrower ∧
∃ b (< b, l, a > ∈ Loan ∧ b = "Sadar"))}

o Find names of all customers who have a loan , an account, or both at the Sadar
Branch
{<c>| ∃ l(< c, l > ∈ Borrower ∧ ∃ b, a(<b, l, a> ∈ Loan ∧ b ="Sadar"))
⋁ ∃ a(<c, a> ∈ Depositor ∧ ∃ b, n(<b, a, n> ∈ Account ∧ b ="Sadar"))}

o Find only those customers who have both an account and a loan.
{<c>| ∃ l(<c, l> ∈ Borrower ) ∧ ∃ a(<c, a> ∈ Depositor )}

o Find all customers who have an account but do not have loan.
{<c> | ∃ a (<c, a> ∈ Depositor) ∧ ¬ ∃ l (<c, l> ∈ Borrower)}

o Find all customers who have an account at all branches located in Agra
{<c> | ∀ x, y, z ((<x, y, z> ∈ Branch ∧ y = "Agra") ⇒
∃ a, b (<x, a, b> ∈ Account ∧ <c, a> ∈ Depositor))}

Outer Join
 Outer join operation is an extension of join operation to deal with missing information
 Suppose that we have following relational schemas:


Employee(employee-name, street, city)
Fulltime-works(employee-name, branch-name, salary)
A snapshot of these relations is as follows:
Employee:
employee-name   street              city
Ram             M G Road            Agra
Shyam           New Mandi Road      Mathura
Suleman         Bhagat Singh Road   Aligarh

Fulltime-works:
employee-name   branch-name    salary
Ram             Sadar          30000
Shyam           Sanjay Place   20000
Rehman          Dayalbagh      40000

Suppose we want complete information of the full time employees.


 The natural join (Employee ⋈ Fulltime-works) loses the information for
Suleman and Rehman, because neither has a record in both tables (the left and right
relations). The outer join solves this problem.
 Three forms of outer join:
o Left outer join (⊐⋈): tuples of the left relation that match no tuple of the right
relation in the natural join are also added to the result, with null values in the
attributes that come from the right relation.
o Right outer join (⋈⊏): tuples of the right relation that match no tuple of the left
relation in the natural join are also added to the result, with null values in the
attributes that come from the left relation.
o Full outer join (⊐⋈⊏): combines the left and right outer joins, i.e. it adds the
unmatched tuples of both the left and the right relation, with null in
place of the missing values.
 The results of the three forms of outer join are as follows:
Left join: Employee ⊐⋈ Fulltime-works =
employee-name   street              city      branch-name    salary
Ram             M G Road            Agra      Sadar          30000
Shyam           New Mandi Road      Mathura   Sanjay Place   20000
Suleman         Bhagat Singh Road   Aligarh   null           null

Right join: Employee ⋈⊏ Fulltime-works =
employee-name   street              city      branch-name    salary
Ram             M G Road            Agra      Sadar          30000
Shyam           New Mandi Road      Mathura   Sanjay Place   20000
Rehman          null                null      Dayalbagh      40000

Full join: Employee ⊐⋈⊏ Fulltime-works =
employee-name   street              city      branch-name    salary
Ram             M G Road            Agra      Sadar          30000
Shyam           New Mandi Road      Mathura   Sanjay Place   20000
Suleman         Bhagat Singh Road   Aligarh   null           null
Rehman          null                null      Dayalbagh      40000
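
The three outer joins can be reproduced with a short Python sketch. It assumes that employee-name is the first column of both relations and is unique within each, and it uses None in place of null; the function outer_join and its keep_left / keep_right flags are illustrative names, not standard operators.

```python
employee = [
    ("Ram", "M G Road", "Agra"),
    ("Shyam", "New Mandi Road", "Mathura"),
    ("Suleman", "Bhagat Singh Road", "Aligarh"),
]
fulltime_works = [
    ("Ram", "Sadar", 30000),
    ("Shyam", "Sanjay Place", 20000),
    ("Rehman", "Dayalbagh", 40000),
]


def outer_join(left, right, keep_left=True, keep_right=True):
    """Join on the first column; unmatched rows are padded with None."""
    left_by_name = {row[0]: row[1:] for row in left}
    right_by_name = {row[0]: row[1:] for row in right}
    left_pad = (None,) * (len(left[0]) - 1)
    right_pad = (None,) * (len(right[0]) - 1)
    result = []
    for name, rest in left_by_name.items():
        if name in right_by_name:
            result.append((name, *rest, *right_by_name[name]))
        elif keep_left:
            result.append((name, *rest, *right_pad))
    if keep_right:
        for name, rest in right_by_name.items():
            if name not in left_by_name:
                result.append((name, *left_pad, *rest))
    return result


left_join = outer_join(employee, fulltime_works, keep_right=False)
right_join = outer_join(employee, fulltime_works, keep_left=False)
full_join = outer_join(employee, fulltime_works)
natural_join = outer_join(employee, fulltime_works,
                          keep_left=False, keep_right=False)
for row in full_join:
    print(row)
```

With both flags switched off the function degenerates to the natural join, which is why Suleman and Rehman disappear from natural_join.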


Aggregate Functions
 Aggregate functions are functions that take a collection of values and return a single
value as a result.
 Examples are sum, avg, count, max, min.
 Find the total balance of all the accounts:
sum_{balance}(Account)
 Find the number of borrowers:
count_{customer-name}(Borrower)
 Find the number of distinct customers who are either borrowers or depositors:
count-distinct_{customer-name}(Borrower ⋃ Depositor)
 Aggregate functions can also be applied to subgroups of the rows of a table rather than
to all of its rows, using the grouping operator, denoted by the symbol 𝒢.
 For example, to find the total salary of all the full-time employees branch-wise, we
can write:
branch-name 𝒢 sum_{salary}(Fulltime-works)
Fulltime-works (each row labelled with the group it falls into):
employee-name   branch-name    salary   group
Ram             Sadar          30000    Group 1: branch-name = Sadar
Shyam           Sanjay Place   20000    Group 2: branch-name = Sanjay Place
Rehman          Dayalbagh      40000    Group 3: branch-name = Dayalbagh
Suleman         Sadar          25000    Group 1: branch-name = Sadar

The result of the aggregate function with the grouping specified above will be:
branch-name    sum of salary
Sadar          55000
Sanjay Place   20000
Dayalbagh      40000
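
The grouping step can be sketched in a few lines of Python, using the Fulltime-works rows from the example above. The variable names are illustrative.

```python
from collections import defaultdict

fulltime_works = [
    ("Ram", "Sadar", 30000),
    ("Shyam", "Sanjay Place", 20000),
    ("Rehman", "Dayalbagh", 40000),
    ("Suleman", "Sadar", 25000),
]

# Group rows by branch-name, then apply sum to the salary of each group.
totals = defaultdict(int)
for _name, branch_name, salary in fulltime_works:
    totals[branch_name] += salary

print(dict(totals))
# {'Sadar': 55000, 'Sanjay Place': 20000, 'Dayalbagh': 40000}
```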


Lecture-13
Structured Query Language (SQL)

Introduction
 Commercial database systems use a more user-friendly language to specify queries.
 SQL is the most influential commercially marketed query language.
 Other commercially used languages are QBE, Quel, and Datalog.

Basic Structure
 The basic structure of an SQL query consists of three clauses: select, from and where.
 select: it corresponds to the projection operation of relational algebra. Used to list the
attributes desired in the result.
 from: corresponds to the Cartesian product operation of relational algebra. Used to list
the relations to be scanned in the evaluation of the expression
 where: corresponds to the selection predicate of the relational algebra. It consists of a
predicate involving attributes of the relations that appear in the from clause.
 A typical SQL query has the form:
select A1, A2,…, An
from r1, r2,…, rm
where P
o Ai represents an attribute
o rj represents a relation
o P is a predicate
o It is equivalent to following relational algebra expression:
o ΠA1 ,A2,…,An (σP (r1 × r2 ×…×rm ))

[Note: The words marked in dark in this text work as keywords in SQL language. For example
“select”, “from” and “where” in the above paragraph are shown in bold font to indicate that
they are keywords]
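
The equivalence between select-from-where and Π(σ(r1 × … × rm)) can be made concrete with a small Python sketch. It assumes relations are lists of dicts and qualifies every attribute as relation-name.attribute-name; the function select_from_where and the sample rows are hypothetical.

```python
from itertools import product


def select_from_where(attrs, relations, predicate):
    """Evaluate Π_attrs(σ_predicate(r1 × r2 × ... × rm)).

    relations maps a relation name to its rows. The result is a list,
    because SQL keeps duplicates unless distinct is requested.
    """
    names = list(relations)
    result = []
    for combo in product(*(relations[n] for n in names)):   # from: product
        row = {f"{n}.{a}": v
               for n, t in zip(names, combo) for a, v in t.items()}
        if predicate(row):                                    # where: selection
            result.append(tuple(row[a] for a in attrs))       # select: projection
    return result


# Hypothetical sample data.
borrower = [
    {"customer-name": "Ram", "loan-number": "L-1"},
    {"customer-name": "Shyam", "loan-number": "L-2"},
]
loan = [
    {"branch-name": "Sadar", "loan-number": "L-1", "amount": 1500},
    {"branch-name": "Dayalbagh", "loan-number": "L-2", "amount": 900},
]

rows = select_from_where(
    ["Borrower.customer-name", "Borrower.loan-number"],
    {"Borrower": borrower, "Loan": loan},
    lambda r: (r["Borrower.loan-number"] == r["Loan.loan-number"]
               and r["Loan.amount"] > 1200),
)
print(rows)  # [('Ram', 'L-1')]
```

A real DBMS does not materialise the full Cartesian product; the sketch only mirrors the algebraic definition.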

Select Clause
Let us see some simple queries and use of select clause to express them in SQL.

 Find the names of all branches in the Loan relation
select branch-name
from Loan
 By default the select clause retains duplicate values. To force the elimination
of duplicates, the distinct keyword is used as follows:
select distinct branch-name
from Loan
 The all keyword can be used to specify explicitly that duplicates are not removed. Since
this is the default behaviour, all is never required in the select clause.
select all branch-name
from Loan


 The asterisk “*” can be used to denote “all attributes”. The following SQL statement
selects all the attributes of Loan.
select *
from Loan
 The arithmetic expressions involving operators, +, -, *, and / are also allowed in select
clause. The following statement will return the amount multiplied by 100 for the rows in
Loan table.
select branch-name, loan-number, amount * 100
from Loan

Where Clause
 Find all loan numbers for loans made at “Sadar” branch with loan amounts greater than
Rs 1200.
select loan-number
from Loan
where branch-name= “Sadar” and amount > 1200
 The where clause uses the logical connectives and, or, and not.
 Operands of the logical connectives can be expressions involving the comparison
operators <, <=, >, >=, =, and <>.
 between can be used to simplify comparisons:
select loan-number
from Loan
where amount between 90000 and 100000

From Clause
 The from clause by itself defines a Cartesian product of the relations in the clause.
 When an attribute is present in more than one relation, it can be referred to as
relation-name.attribute-name to avoid ambiguity.
 For all customers who have a loan from the bank, find their names and loan numbers
select distinct customer-name, Borrower.loan-number
from Borrower, Loan
where Borrower.loan-number = Loan.loan-number

The Rename Operation


 Used for renaming both relations and attributes in SQL
 Use the as clause: old-name as new-name
 Find the names and loan numbers of the customers who have a loan at the “Sadar”
branch.
select distinct customer-name, Borrower.loan-number as loan-id
from Borrower, Loan
where Borrower.loan-number = Loan.loan-number and
branch-name = “Sadar”
The loan-number column now appears in the result under the name loan-id.


 For all customers who have a loan from the bank, find their names and loan-numbers
select distinct customer-name, T.loan-number
from Borrower as T, Loan as S
where T.loan-number = S.loan-number
 Find the names of all branches that have assets greater than at least one branch located in
“Mathura”.
select distinct T.branch-name
from branch as T, branch as S
where T.assets > S.assets and S.branch-city = “Mathura”

String Operation
 Two special characters are used for pattern matching in strings:
o Percent ( % ): The % character matches any substring
o Underscore ( _ ): The _ character matches any single character
 “%Mandi” matches any string ending with “Mandi”, e.g. “Raja Ki Mandi”,
“Peepal Mandi”
 “_ _ _” matches any string of exactly three characters.
 Find the names of all customers whose street address includes the substring “Main”
select customer-name
from Customer
where customer-street like “%Main%”


Lecture-14
Set Operations
 union, intersect and except operations are set operations available in SQL.
 Relations participating in any of the set operation must be compatible; i.e. they must have
the same set of attributes.
 Union Operation:
o Find all customers having a loan, an account, or both at the bank
(select customer-name
from Depositor )
union
(select customer-name
from Borrower )
It will automatically eliminate duplicates.
o If we want to retain duplicates union all can be used
(select customer-name
from Depositor )
union all
(select customer-name
from Borrower )
 Intersect Operation
o Find all customers who have both an account and a loan at the bank
(select customer-name
from Depositor )
intersect
(select customer-name
from Borrower )
o If we want to retain all the duplicates
(select customer-name
from Depositor )
intersect all
(select customer-name
from Borrower )
 Except Operation
o Find all customers who have an account but no loan at the bank
(select customer-name
from Depositor )
except
(select customer-name
from Borrower )
o If we want to retain the duplicates:
(select customer-name
from Depositor )
except all
(select customer-name
from Borrower )
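
The difference between the plain set operations and their all variants is easiest to see by counting duplicates. In standard SQL, if a tuple occurs m times in the first input and n times in the second, it occurs m + n times in union all, min(m, n) times in intersect all, and max(0, m - n) times in except all. The following Python sketch models this with collections.Counter; the customer-name values are hypothetical.

```python
from collections import Counter

# customer-name columns with duplicates kept (hypothetical data).
depositor = ["Ram", "Ram", "Shyam", "Suleman"]
borrower = ["Ram", "Shyam", "Shyam", "Rehman"]

d, b = Counter(depositor), Counter(borrower)

union = set(depositor) | set(borrower)       # union: duplicates eliminated
union_all = d + b                            # union all: counts add up
intersect = set(depositor) & set(borrower)   # intersect
intersect_all = d & b                        # intersect all: minimum count
except_ = set(depositor) - set(borrower)     # except
except_all = d - b                           # except all: counts subtract

print(sorted(union_all.elements()))
print(sorted(intersect_all.elements()))      # ['Ram', 'Shyam']
print(sorted(except_all.elements()))         # ['Ram', 'Suleman']
```

Note that Ram survives except all (two accounts, one loan) but not plain except.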
Aggregate Functions
 Aggregate functions are those functions which take a collection of values as input and
return a single value.
 SQL offers 5 built in aggregate functions-
o Average: avg
o Minimum:min
o Maximum:max
o Total: sum
o Count:count
 The input to sum and avg must be a collection of numbers but others may have
collections of non-numeric data types as input as well
 Find the average account balance at the Sadar branch
select avg(balance)
from Account
where branch-name= “Sadar”
The result will be a table containing a single cell (one row and one column) holding
the average balance of all accounts at the Sadar branch.
 The group by clause is used to form groups: tuples with the same value on all attributes in
the group by clause are placed in one group.
 Find the average account balance at each branch
select branch-name, avg(balance)
from Account
group by branch-name
 By default the aggregate functions include duplicates.
 The distinct keyword is used to eliminate duplicates in an aggregate function:
 Find the number of depositors for each branch
select branch-name, count(distinct customer-name)
from Depositor, Account
where Depositor.account-number = Account.account-number
group by branch-name
 having clause is used to state condition that applies to groups rather than tuples.
 Find the average account balance at each branch where average account balance is more
than Rs. 1200
select branch-name, avg(balance)
from Account
group by branch-name
having avg(balance) > 1200
 Count the number of tuples in Customer table
select count(*)
from Customer
 SQL doesn’t allow distinct with count(*)
 When where and having are both present in a statement where is applied before having.
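
The interplay of where, group by and having can be tried out with Python's built-in sqlite3 module. This is a minimal sketch: the column names use underscores because SQLite does not accept unquoted hyphens in identifiers, and the account rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table account (account_number text, branch_name text, balance integer)"
)
conn.executemany(
    "insert into account values (?, ?, ?)",
    [
        ("A-1", "Sadar", 1000),
        ("A-2", "Sadar", 2000),
        ("A-3", "Dayalbagh", 200),
        ("A-4", "Sanjay Place", 1500),
        ("A-5", "Dayalbagh", 2000),
    ],
)

# No where clause: every account takes part in its branch average.
without_where = conn.execute("""
    select branch_name, avg(balance)
    from account
    group by branch_name
    having avg(balance) > 1200
    order by branch_name
""").fetchall()

# where is applied first, so the Rs 200 account is dropped before grouping.
with_where = conn.execute("""
    select branch_name, avg(balance)
    from account
    where balance >= 1000
    group by branch_name
    having avg(balance) > 1200
    order by branch_name
""").fetchall()

print(without_where)
print(with_where)
```

Dayalbagh fails the having test in the first query (average 1100) but passes it in the second (average 2000), because where removes tuples before the groups are formed.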


Nested Subqueries
 A subquery is a select-from-where expression that is nested within another query.
 Set Membership
o The in and not in connectives are used for this type of subquery.
o “Find all customers who have both a loan and an account at the bank”, this query
can be written using nested subquery form as follows
select distinct customer-name
from Borrower
where customer-name in(select customer-name
from Depositor )
o Select the names of customers who have a loan at the bank, and whose names are
neither “Smith” nor “Jones”
select distinct customer-name
from Borrower
where customer-name not in(“Smith”, “Jones”)
 Set Comparison
o Find the names of all branches that have assets greater than those of at least one
branch located in Mathura
select branch-name
from Branch
where assets > some (select assets
from Branch
where branch-city = “Mathura” )
o Apart from > some, the other comparisons available are < some, <= some, >= some,
= some and <> some.
o Find the names of all branches that have assets greater than those of every branch
located in Mathura
select branch-name
from Branch
where assets > all (select assets
from Branch
where branch-city = “Mathura” )
o Apart from > all, the other comparisons available are < all, <= all, >= all, = all and
<> all.
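
The meaning of > some and > all corresponds to Python's any(...) and all(...) over the subquery result. A minimal sketch with hypothetical branch data:

```python
# (branch-name, branch-city, assets), hypothetical rows.
branch = [
    ("Sadar", "Agra", 9000),
    ("Civil Lines", "Mathura", 4000),
    ("Krishna Nagar", "Mathura", 7000),
    ("Dayalbagh", "Agra", 5000),
]

# The subquery: assets of every branch located in Mathura.
mathura_assets = [assets for _, city, assets in branch if city == "Mathura"]

# assets > some (...): greater than at least one value of the subquery.
greater_than_some = [name for name, _, assets in branch
                     if any(assets > m for m in mathura_assets)]

# assets > all (...): greater than every value of the subquery.
greater_than_all = [name for name, _, assets in branch
                    if all(assets > m for m in mathura_assets)]

print(greater_than_some)  # ['Sadar', 'Krishna Nagar', 'Dayalbagh']
print(greater_than_all)   # ['Sadar']
```

As in SQL, a > all comparison against an empty subquery result is true for every row, since all() of an empty sequence is True.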


Lecture-15
Views
 In SQL the create view command is used to define a view as follows:
create view v as <query expression>
where <query expression> is any legal query expression and v is the view name.
 Consider a view consisting of branch names and the names of customers who have either an
account or a loan at the branch. It can be defined as follows:
create view All-customer as
(select branch-name, customer-name
from Depositor, Account
where Depositor.account-number = Account.account-number)
union
(select branch-name, customer-name
from Borrower, Loan
where Borrower.loan-number = Loan.loan-number)
 The attribute names may be specified explicitly within a set of round brackets after the
name of the view.
 View names may be used as relations in subsequent queries. Using the view
All-customer, find all customers of the Sadar branch:
select customer-name
from All-customer
where branch-name = “Sadar”
 A create view command creates a view definition in the database, which stays until a
command - drop view view-name - is executed.

Modification of Database
 Deletion
o In SQL we can delete only whole tuples, not the values of particular
attributes. The command is as follows:
delete from r
where P
where P is a predicate and r is a relation.
o The delete command operates on only one relation at a time. Examples are as follows:
o Delete all tuples from the Loan relation
delete from Loan
o Delete all of Smith’s account records
delete from Depositor
where customer-name = “Smith”
o Delete all loans with loan amounts between Rs 1300 and Rs 1500.
delete from Loan
where amount between 1300 and 1500


o Delete the records of all accounts with balances below the average at the bank
delete from Account
where balance < ( select avg(balance)
from Account)

 Insertion
o In SQL we either specify a tuple to be inserted or write a query whose result is a
set of tuples to be inserted. Examples are as follows:
o Insert an account with account number A-9732 at the Sadar branch, having a balance
of Rs 1200
insert into Account
values(“Sadar”, “A-9732”, 1200)
The values are specified in the order in which the corresponding attributes are
listed in the relation schema.
o SQL allows the attributes to be specified as part of the insert statement. Each of the
following statements inserts the same tuple:
insert into Account(account-number, branch-name, balance)
values(“A-9732”, “Sadar”, 1200)
insert into Account(branch-name, account-number, balance)
values(“Sadar”, “A-9732”, 1200)
o Provide all loan customers of the Sadar branch with a new Rs 200 savings account
for each loan account they have, with the loan number serving as the account number
for these accounts.
insert into Account
select branch-name, loan-number, 200
from Loan
where branch-name = “Sadar”
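 Updates
o Used to change a value in a tuple without changing all values in the tuple.
o Suppose that annual interest payments are being made, and all balances are to be
increased by 5 percent.
update Account
set balance = balance * 1.05
o Suppose that accounts with balances over Rs 10000 receive 6 percent interest,
whereas all others receive 5 percent. Two update statements are needed, and their
order matters: if the 5 percent update ran first, an account just under Rs 10000
could cross the threshold and then receive the 6 percent interest as well.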
update Account
set balance = balance * 1.06
where balance > 10000

update Account
set balance = balance * 1.05
where balance <= 10000
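
Why the 6 percent update has to run before the 5 percent update can be seen by simulating both orders. The following Python sketch uses Decimal to avoid floating-point rounding; the function pay_interest and the two sample balances are hypothetical.

```python
from decimal import Decimal


def pay_interest(balances, six_percent_first):
    """Apply the two update statements in the chosen order."""
    result = [Decimal(b) for b in balances]

    def update(rate, condition):
        # Mirrors: update Account set balance = balance * rate where condition
        for i, balance in enumerate(result):
            if condition(balance):
                result[i] = balance * Decimal(rate)

    steps = [
        ("1.06", lambda bal: bal > 10000),
        ("1.05", lambda bal: bal <= 10000),
    ]
    if not six_percent_first:
        steps.reverse()
    for rate, condition in steps:
        update(rate, condition)
    return result


correct = pay_interest([9900, 20000], six_percent_first=True)
wrong = pay_interest([9900, 20000], six_percent_first=False)
print(correct)  # the Rs 9900 account receives 5 percent only
print(wrong)    # the Rs 9900 account receives 5 percent and then 6 percent
```

In the wrong order the Rs 9900 account grows to Rs 10395 after the first statement and is then picked up by the second one.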


Data Definition Language


 Data Types in SQL
o char(n): fixed length character string, length n.
o varchar(n): variable length character string, maximum length n.
o int: an integer.
o smallint: a small integer.
o numeric(p,d): fixed point number with p digits (plus a sign), of which d digits are
to the right of the decimal point.
o real, double precision: floating point and double precision floating point numbers.
o float(n): a floating point number, with precision of at least n digits.
o date: calendar date; four digits for year, two for month and two for day of month.
o time: time of day, in hours, minutes and seconds.
 Domains can be defined as
create domain person-name char(20)
The domain name person-name can then be used to define the type of an attribute just like
a built-in domain.
 Schema Definition in SQL
o The create table command is used to define relations.
create table r (A1 D1, A2 D2, …, An Dn,
<integrity constraint 1>,
…,
<integrity constraint k>)
where r is the relation name, each Ai is the name of an attribute, and Di is the domain type of
the values of Ai. Several types of integrity constraints are available in SQL.
o Integrity constraints allowed in SQL include
primary key(Aj1, Aj2, …, Ajm)
and
check(P), where P is a predicate.
o The drop table command is used to remove relations from the database.
o The alter table command is used to add attributes to, or remove attributes from, an
existing relation.
alter table r add A D
adds attribute A of domain type D to relation r.
alter table r drop A
removes attribute A from relation r.
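
The create table and alter table commands, together with primary key and check constraints, can be exercised with Python's built-in sqlite3 module. This is a sketch under some assumptions: SQLite is used only for convenience, identifiers use underscores instead of hyphens, the check on balance is an invented example constraint, and SQLite spells the second command alter table ... add column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    create table account (
        account_number char(10),
        branch_name    varchar(20),
        balance        numeric(12, 2),
        primary key (account_number),
        check (balance >= 0)
    )
""")
conn.execute("insert into account values ('A-9732', 'Sadar', 1200)")

# A row that violates the check constraint is rejected by the DBMS.
try:
    conn.execute("insert into account values ('A-9733', 'Sadar', -5)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# alter table adds a new attribute; existing rows get null for it.
conn.execute("alter table account add column account_type char(10)")
columns = [row[1] for row in conn.execute("pragma table_info(account)")]
print(rejected, columns)
```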


Lecture-16
Integrity Constraints
 Integrity Constraints guard against accidental damage to the database.
 Integrity constraints are predicates pertaining to the database.
 Domain Constraints:
o Predicates defined on the domains are Domain constraints.
o Simplest Domain constraints are defined by defining standard data types of the
attributes like Integer, Double, Float, etc.
o We can define domains with the create domain clause, and we can also define
constraints on such domains, as follows:
create domain hourly-wage numeric(5,2)
constraint wage-value-test check(value >= 4.00)
We can then use hourly-wage as the data type of any attribute, and the DBMS will
automatically allow only values greater than or equal to 4.00.
o Other examples of domain constraints are as follows:
create domain account-number char(10)
constraint account-number-null-test check(value is not null)
create domain account-type char(10)
constraint account-type-test
check(value in (“Checking”, “Saving”))
With the latter of the two domains above, the DBMS will allow only the values
Checking and Saving for any attribute of type account-type.
 Referential Integrity:
o Foreign Key: Suppose two tables R and S are related to each other, K1 and K2 are
the primary keys of R and S respectively, and K1 is also one of the attributes of S. If we
want every row in S to have a corresponding row in R, then we define
K1 in S as a foreign key. For example, in our original library database we had a table
for the relation BORROWEDBY, containing two fields, Card No. and Acc. No.
Every row of the BORROWEDBY relation must have a corresponding row in the USER
table with the same Card No. and a row in the BOOK table with the same Acc. No.
We therefore define Card No. and Acc. No. in the BORROWEDBY relation as
foreign keys.
o In other words, every row of the BORROWEDBY relation must refer to
some row in the BOOK table and some row in the USER table.
o Such a requirement, that rows of one table refer to rows of another table, is called
Referential Integrity.
o Referential integrity constraints are defined by declaring as a foreign key those
attributes of a table that form the primary key of some other table.
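
The library example can be tried out with Python's built-in sqlite3 module. This is a sketch with assumed names: the USER table is called lib_user here, column names are simplified to card_no and acc_no, and the sample rows are invented. SQLite enforces foreign keys only after pragma foreign_keys = on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("pragma foreign_keys = on")  # off by default in SQLite

conn.execute("create table lib_user (card_no integer primary key, name text)")
conn.execute("create table book (acc_no integer primary key, title text)")
conn.execute("""
    create table borrowedby (
        card_no integer references lib_user(card_no),
        acc_no  integer references book(acc_no),
        primary key (card_no, acc_no)
    )
""")

conn.execute("insert into lib_user values (1, 'Ram')")
conn.execute("insert into book values (100, 'Database Concepts')")

# Both referenced rows exist, so this row is accepted.
conn.execute("insert into borrowedby values (1, 100)")

# Card No. 2 has no row in lib_user, so the DBMS rejects the row.
try:
    conn.execute("insert into borrowedby values (2, 100)")
    violation_caught = False
except sqlite3.IntegrityError:
    violation_caught = True

print(violation_caught)  # True
```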
 Functional Dependencies
o Suppose that for a relation schema R, α ⊆ R and β ⊆ R. A functional
dependency α→β holds on R if, in any table having schema R, for every two rows
r1 and r2, whenever the values of attributes α are the same in r1 and r2, the values of
attributes β are also the same.


o Consider for example the following table

Seq   A    B    C    D
1     a1   b1   c1   d1
2     a1   b2   c1   d2
3     a2   b2   c2   d2
4     a2   b3   c2   d3
5     a3   b3   c2   d4

Check if A→C holds: find pairs of rows where the value of A is the same
 rows 1 and 2: the value of A is the same and C is also the same
 rows 3 and 4: the value of A is the same and C is also the same
 No other two rows have the same value of A, so A→C holds.
Check if C→A holds: find pairs of rows where the value of C is the same
 rows 1 and 2: the value of C is the same and A is also the same
 rows 3 and 4: the value of C is the same and A is also the same
 rows 4 and 5: the value of C is the same but A is not, so C→A doesn’t hold.

We can show that AB→D also holds: find pairs of rows where the values of A and B
are both the same
 No two rows agree on both A and B, so AB→D holds.
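
The row-by-row check above is mechanical and can be sketched in Python. The function name holds is illustrative; it tests whether a dependency is satisfied by this particular table, which is all a single table instance can show.

```python
from itertools import combinations

rows = [
    {"A": "a1", "B": "b1", "C": "c1", "D": "d1"},
    {"A": "a1", "B": "b2", "C": "c1", "D": "d2"},
    {"A": "a2", "B": "b2", "C": "c2", "D": "d2"},
    {"A": "a2", "B": "b3", "C": "c2", "D": "d3"},
    {"A": "a3", "B": "b3", "C": "c2", "D": "d4"},
]


def holds(rows, lhs, rhs):
    """True if no two rows agree on lhs while disagreeing on rhs."""
    for r1, r2 in combinations(rows, 2):
        same_lhs = all(r1[a] == r2[a] for a in lhs)
        same_rhs = all(r1[a] == r2[a] for a in rhs)
        if same_lhs and not same_rhs:
            return False
    return True


print(holds(rows, "A", "C"))   # True
print(holds(rows, "C", "A"))   # False
print(holds(rows, "AB", "D"))  # True
```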

o If K is a super key of a relation R, then the functional dependency K→R
holds, and vice versa.
o Armstrong’s Rules: Suppose there is a given relation R and a set of functional
dependencies F that holds on R. Then these rules can be used to derive all of the
other functional dependencies that are logically implied by F on R.
 Reflexivity rule: if α is a set of attributes and β ⊆ α, then α→β holds.
 Augmentation rule: if α→β holds and γ is a set of attributes, then
γα→γβ holds.
 Transitivity rule: if α→β holds and β→γ holds, then α→γ holds.
o Although all functional dependencies can be generated using only Armstrong’s rules,
applying them directly is lengthy and tiresome, so additional rules are used to
simplify deriving new functional dependencies.
 Union rule: if α→β holds and α→γ holds, then α→βγ holds.
 Decomposition rule: if α→βγ holds, then α→β holds and α→γ holds.
 Pseudotransitivity rule: if α→β holds and γβ→δ holds, then αγ→δ
holds.
o Closure of Functional Dependencies: Suppose F is the given set of functional
dependencies for a relation schema R. Applying the rules stated above repeatedly
generates all of the functional dependencies implied by F. The set containing all
these derived functional dependencies together with the given set F
is called the closure of F and is denoted as F+.
o Consider schema R = (A, B, C, G, H, I) and the set of functional dependencies F
containing the following functional dependencies:
1. A→B
2. A→C
3. CG→H
4. CG→I
5. B→H

 Find other functional dependencies that can be derived using the rules
given above.
 Examples are as follows:
 A→H can be derived from functional dependencies 1 and 5 using the
transitivity rule.
 CG→HI can be derived from functional dependencies 3 and 4 using the union
rule.
 AG→I can be derived from 2 and 4 using the pseudotransitivity rule.
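
Such derivations can be cross-checked by computing the closure of a set of attributes, a technique used here as a shortcut in place of applying Armstrong's rules one by one: α→β is in F+ exactly when β is contained in the closure of α under F. The following Python sketch uses the schema and dependencies of the example; the function names closure and implied are illustrative.

```python
def closure(attrs, fds):
    """Return all attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the left side is already determined, so is the right side.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


# F from the example: each pair is (left side, right side).
F = [("A", "B"), ("A", "C"), ("CG", "H"), ("CG", "I"), ("B", "H")]


def implied(lhs, rhs, fds=F):
    """True if lhs → rhs belongs to the closure F+."""
    return set(rhs) <= closure(lhs, fds)


print(sorted(closure("A", F)))   # ['A', 'B', 'C', 'H']
print(implied("A", "H"))         # True
print(implied("CG", "HI"))       # True
print(implied("AG", "I"))        # True
print(implied("A", "G"))         # False
```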


Lecture-17
Normal Forms
 Some of the undesirable properties that a bad database design may have
o Repetition of information
o Inability to represent certain information
o Incapability to maintain integrity of data
 The normal forms of relational database theory provide criteria for determining a table's
degree of vulnerability to logical inconsistencies and anomalies.
 The higher the normal form applicable to a table, the less vulnerable it is to
inconsistencies and anomalies.
 Each table has a "highest normal form" (HNF): by definition, a table always meets the
requirements of its HNF and of all normal forms lower than its HNF; also by definition, a
table fails to meet the requirements of any normal form higher than its HNF.
 The generally known hierarchy of normal forms is as follows: First Normal Form (1NF),
Second Normal Form (2NF), Third Normal Form (3NF), Fourth Normal Form (4NF), Fifth
Normal Form (5NF).
 In this course we will discuss this hierarchy only up to 3NF, along with one other
normal form, Boyce-Codd Normal Form (BCNF).

First Normal Form

 According to Date's definition of 1NF, a table is in 1NF if and only if it is "isomorphic to
some relation", which means, specifically, that it satisfies the following five conditions:

1. There's no top-to-bottom ordering to the rows.
2. There's no left-to-right ordering to the columns.
3. There are no duplicate rows.
4. Every row-and-column intersection contains exactly one value from the
applicable domain (and nothing else).
5. All columns are regular [i.e. rows have no hidden components such as row IDs,
object IDs, or hidden timestamps].

 Examples of tables (or views) that would not meet this definition of 1NF are:

o A table that lacks a unique key. Such a table would be able to accommodate
duplicate rows, in violation of condition 3.
o A view whose definition mandates that results be returned in a particular order, so
that the row-ordering is an intrinsic and meaningful aspect of the view. This
violates condition 1. The tuples in true relations are not ordered with respect to
each other.
o A table with at least one nullable attribute. A nullable attribute would
be in violation of condition 4, which requires every field to contain exactly one
value from its column's domain. It should be noted, however, that this aspect of
condition 4 is controversial. It marks an important departure from Codd's later
vision of the relational model, which made explicit provision for nulls.

 Codd states that the "values in the domains on which each relation is defined are required
to be atomic with respect to the DBMS." Codd defines an atomic value as one that
"cannot be decomposed into smaller pieces by the DBMS (excluding certain special
functions)." Meaning a field should not be divided into parts with more than one kind of
data in it such that what one part means to the DBMS depends on another part of the
same field.
 Suppose a novice designer wishes to record the names and telephone numbers of
customers. He defines a customer table which looks like this:

Customer

Customer ID   First Name   Surname     Telephone Number
123           Robert       Ingram      555-861-2025
456           Jane         Wright      555-403-1659
789           Maria        Fernandez   555-808-9633

 The designer then becomes aware of a requirement to record multiple telephone
numbers for some customers. He reasons that the simplest way of doing this is to
allow the "Telephone Number" field in any given record to contain more than one
value:

Customer ID   First Name   Surname     Telephone Number
123           Robert       Ingram      555-861-2025
456           Jane         Wright      555-403-1659
                                       555-776-4100
789           Maria        Fernandez   555-808-9633

Assuming, however, that the Telephone Number column is defined on some Telephone
Number-like domain (e.g. the domain of strings 12 characters in length), the
representation above is not in 1NF. 1NF (and, for that matter, the RDBMS) prevents a
single field from containing more than one value from its column's domain.

 Repeating groups across columns: The designer might attempt to get around this
restriction by defining multiple Telephone Number columns:

Customer ID   First Name   Surname     Tel. No. 1     Tel. No. 2     Tel. No. 3
123           Robert       Ingram      555-861-2025
456           Jane         Wright      555-403-1659   555-776-4100   555-403-1659
789           Maria        Fernandez   555-808-9633

This representation, however, makes use of nullable columns, and therefore does not
conform to Date's definition of 1NF. Even if the view is taken that nullable columns are
allowed, the design is not in keeping with the spirit of 1NF. Tel. No. 1, Tel. No. 2 and
Tel. No. 3 share exactly the same domain and exactly the same meaning; the splitting of
Telephone Number into three headings is artificial and causes logical problems. These
problems include:

o Difficulty in querying the table. Answering such questions as "Which
customers have telephone number X?" and "Which pairs of customers share a
telephone number?" is awkward.
o Inability to enforce uniqueness of Customer-to-Telephone Number links
through the RDBMS. Customer 789 might mistakenly be given a Tel. No. 2
value that is exactly the same as her Tel. No. 1 value.
o Restriction of the number of telephone numbers per customer to three. If a
customer with four telephone numbers comes along, we are constrained to
record only three and leave the fourth unrecorded. This means that the
database design is imposing constraints on the business process, rather than
(as should ideally be the case) vice-versa.

• Repeating groups within columns: The designer might, alternatively, retain the
single Telephone Number column but alter its domain, making it a string of sufficient
length to accommodate multiple telephone numbers:

Customer ID   First Name   Surname     Telephone Numbers
123           Robert       Ingram      555-861-2025
456           Jane         Wright      555-403-1659, 555-776-4100
789           Maria        Fernandez   555-808-9633

This design is consistent with 1NF according to Date’s definition but not according to
Codd’s definition. It presents several design issues. The Telephone Number heading
becomes semantically woolly, as it can now represent either a telephone number, a list of
telephone numbers, or indeed anything at all. A query such as "Which pairs of customers
share a telephone number?" is more difficult to formulate, given the necessity to cater for
lists of telephone numbers as well as individual telephone numbers. Meaningful
constraints on telephone numbers are also very difficult to define in the RDBMS with this
design.

• A design that complies with 1NF: A design that is unambiguously in 1NF makes
use of two tables: a Customer Name table and a Customer Telephone Number table.

Customer Name

Customer ID   First Name   Surname
123           Robert       Ingram
456           Jane         Wright
789           Maria        Fernandez

Customer Telephone Number

Customer ID   Telephone Number
123           555-861-2025
456           555-403-1659
456           555-776-4100
789           555-808-9633

Repeating groups of telephone numbers do not occur in this design. Instead, each
Customer-to-Telephone Number link appears on its own record.

It is worth noting that this design meets the additional requirements for second
and third normal form (3NF).
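
The two-table design can be tried out with a small sketch. The following uses Python's built-in sqlite3 module; the table names, column names and helper functions are assumptions made for illustration and are not part of the notes. It suggests how the two awkward questions from the repeating-group designs become plain queries, and how a composite primary key lets the RDBMS reject a duplicate Customer-to-Telephone Number link.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer_name (
        customer_id INTEGER PRIMARY KEY,
        first_name  TEXT NOT NULL,
        surname     TEXT NOT NULL
    );
    CREATE TABLE customer_telephone (
        customer_id      INTEGER NOT NULL REFERENCES customer_name(customer_id),
        telephone_number TEXT NOT NULL,
        -- one row per customer-to-number link; a repeated link is rejected
        PRIMARY KEY (customer_id, telephone_number)
    );
""")
conn.executemany(
    "INSERT INTO customer_name VALUES (?, ?, ?)",
    [(123, "Robert", "Ingram"), (456, "Jane", "Wright"), (789, "Maria", "Fernandez")],
)
conn.executemany(
    "INSERT INTO customer_telephone VALUES (?, ?)",
    [(123, "555-861-2025"), (456, "555-403-1659"),
     (456, "555-776-4100"), (789, "555-808-9633")],
)

def customers_with_number(number):
    """Which customers have telephone number X?"""
    rows = conn.execute(
        "SELECT customer_id FROM customer_telephone "
        "WHERE telephone_number = ? ORDER BY customer_id",
        (number,),
    )
    return [row[0] for row in rows]

def pairs_sharing_a_number():
    """Which pairs of customers share a telephone number?"""
    rows = conn.execute("""
        SELECT a.customer_id, b.customer_id, a.telephone_number
        FROM customer_telephone a
        JOIN customer_telephone b
          ON a.telephone_number = b.telephone_number
         AND a.customer_id < b.customer_id
        ORDER BY 1, 2
    """)
    return rows.fetchall()

def add_link(customer_id, number):
    """Return True if the link was stored, False if it already existed."""
    try:
        conn.execute("INSERT INTO customer_telephone VALUES (?, ?)",
                     (customer_id, number))
        return True
    except sqlite3.IntegrityError:
        return False
```

For example, customers_with_number("555-776-4100") looks in a single column, and add_link(789, "555-808-9633") is refused because that link is already recorded. There is also no built-in limit of three numbers per customer.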


Lecture-18

Second Normal Form

• 2NF was originally defined by E.F. Codd in 1971.
• A 1NF table is in 2NF if and only if, given any candidate key K and any attribute A
that is not a constituent of a candidate key, A depends upon the whole of K rather
than just a part of it.
• Equivalently, a 1NF table is in 2NF if and only if all its non-prime attributes are
functionally dependent on the whole of every candidate key. (A non-prime attribute is
one that does not belong to any candidate key.)
• Note that when a 1NF table has no composite candidate keys (candidate keys
consisting of more than one attribute), the table is automatically in 2NF.
• Consider a table describing employees' skills:

Employees' Skills

Employee   Skill            Current Work Location
Jones      Typing           114 Main Street
Jones      Shorthand        114 Main Street
Jones      Whittling        114 Main Street
Bravo      Light Cleaning   73 Industrial Way
Ellis      Alchemy          73 Industrial Way
Ellis      Flying           73 Industrial Way
Harrison   Light Cleaning   73 Industrial Way


Neither {Employee} nor {Skill} is a candidate key for the table. This is because a
given Employee might need to appear more than once (he might have multiple
Skills), and a given Skill might need to appear more than once (it might be
possessed by multiple Employees). Only the composite key {Employee, Skill}
qualifies as a candidate key for the table.

The remaining attribute, Current Work Location, is dependent on only part of the
candidate key, namely Employee. Therefore the table is not in 2NF. Note the
redundancy in the way Current Work Locations are represented: we are told three
times that Jones works at 114 Main Street, and twice that Ellis works at 73
Industrial Way. This redundancy makes the table vulnerable to update anomalies:
it is, for example, possible to update Jones' work location on his "Typing" and
"Shorthand" records and not update his "Whittling" record. The resulting data
would imply contradictory answers to the question "What is Jones' current work
location?"
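
The reasoning above can be checked mechanically on the sample rows. The sketch below is illustrative only: the helper names pick, is_unique and fd_holds are assumptions, and a check on one table instance can refute a functional dependency but cannot prove that it holds for every legal instance.

```python
ATTRS = ("Employee", "Skill", "Current Work Location")
ROWS = [
    ("Jones", "Typing", "114 Main Street"),
    ("Jones", "Shorthand", "114 Main Street"),
    ("Jones", "Whittling", "114 Main Street"),
    ("Bravo", "Light Cleaning", "73 Industrial Way"),
    ("Ellis", "Alchemy", "73 Industrial Way"),
    ("Ellis", "Flying", "73 Industrial Way"),
    ("Harrison", "Light Cleaning", "73 Industrial Way"),
]

def pick(row, names):
    """Values of the named attributes in one row."""
    return tuple(row[ATTRS.index(name)] for name in names)

def is_unique(rows, names):
    """True if no two rows agree on all the named attributes."""
    values = [pick(row, names) for row in rows]
    return len(values) == len(set(values))

def fd_holds(rows, lhs, rhs):
    """True if rows that agree on lhs always agree on rhs (in this instance)."""
    seen = {}
    for row in rows:
        if seen.setdefault(pick(row, lhs), pick(row, rhs)) != pick(row, rhs):
            return False
    return True

# Neither attribute alone identifies a row; the pair does.
employee_alone = is_unique(ROWS, ["Employee"])            # False
skill_alone = is_unique(ROWS, ["Skill"])                  # False
employee_skill = is_unique(ROWS, ["Employee", "Skill"])   # True

# The location depends on Employee alone, i.e. on part of the key.
partial = fd_holds(ROWS, ["Employee"], ["Current Work Location"])  # True

# Hypothetical partial update: two of Jones' three rows are changed.
STALE = [(e, s, "NEW LOCATION" if e == "Jones" and s != "Whittling" else loc)
         for e, s, loc in ROWS]
consistent = fd_holds(STALE, ["Employee"], ["Current Work Location"])  # False
```

After the partial update the table gives two answers for Jones' location, which is exactly the update anomaly described above.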

• A 2NF alternative to this design would represent the same information in two tables:
an "Employees" table with candidate key {Employee}, and an "Employees' Skills"
table with candidate key {Employee, Skill}:

Employees

Employee   Current Work Location
Jones      114 Main Street
Bravo      73 Industrial Way
Ellis      73 Industrial Way
Harrison   73 Industrial Way

Employees' Skills

Employee   Skill
Jones      Typing
Jones      Shorthand
Jones      Whittling
Bravo      Light Cleaning
Ellis      Alchemy
Ellis      Flying
Harrison   Light Cleaning

Neither of these tables can suffer from update anomalies.

• Not all 2NF tables are free from update anomalies, however. An example of a 2NF
table which suffers from update anomalies is:

Tournament Winners

Tournament             Year   Winner           Winner Date of Birth
Des Moines Masters     1998   Chip Masterson   14 March 1977
Indiana Invitational   1998   Al Fredrickson   21 July 1975
Cleveland Open         1999   Bob Albertson    28 September 1968
Des Moines Masters     1999   Al Fredrickson   21 July 1975
Indiana Invitational   1999   Chip Masterson   14 March 1977

Even though Winner and Winner Date of Birth are determined by the whole key
{Tournament / Year} and not part of it, particular Winner / Winner Date of Birth
combinations are shown redundantly on multiple records. This leads to an update
anomaly: if updates are not carried out consistently, a particular winner could be
shown as having two different dates of birth.

The underlying problem is the transitive dependency to which the Winner Date of
Birth attribute is subject. Winner Date of Birth actually depends on Winner,
which in turn depends on the key Tournament / Year.

• This problem is addressed by third normal form (3NF).
• Note: In addition to the primary key, the table may contain other candidate keys; to
confirm 2NF it is necessary to establish that no non-prime attributes have part-key
dependencies on any of these candidate keys.


Lecture-19
Third Normal Form:

• 3NF, as defined by E.F. Codd in 1971: a table is in 3NF if and only if both of the
following conditions hold:
o The relation R (table) is in second normal form (2NF).
o Every non-prime attribute of R is non-transitively dependent (i.e. directly
dependent) on every candidate key of R.
o Note:
▪ A non-prime attribute of R is an attribute that does not belong to any
candidate key of R.
▪ A transitive dependency is a functional dependency in which X → Z (X
determines Z) indirectly, because X → Y and Y → Z (where it is not the
case that Y → X).
• An equivalent definition of 3NF, given by Carlo Zaniolo in 1982, states that a table
is in 3NF if and only if, for each of its functional dependencies X → A, at least one of the
following conditions holds:
o X contains A (that is, X → A is a trivial functional dependency), or
o X is a superkey, or
o Each attribute in A − X is a prime attribute (i.e., it is contained within a candidate
key).
• Zaniolo's definition gives a clear sense of the difference between 3NF and the more
stringent Boyce-Codd normal form (BCNF). BCNF simply eliminates the third
alternative ("each attribute in A − X is a prime attribute").
• The difference between 2NF and 3NF can be stated as follows: requiring non-key
attributes to depend on "the whole key" ensures that a table is in 2NF, while further
requiring them to depend on "nothing but the key" ensures that the table is in 3NF.
• Example, using the Tournament Winners table given above:

Tournament Winners

Tournament Year Winner Winner Date of Birth


Des Moines Masters 1998 Chip Masterson 14 March 1977
Indiana Invitational 1998 Al Fredrickson 21 July 1975
Cleveland Open 1999 Bob Albertson 28 September 1968
Des Moines Masters 1999 Al Fredrickson 21 July 1975
Indiana Invitational 1999 Chip Masterson 14 March 1977

This table is in 2NF but not in 3NF. The breach of 3NF occurs because the non-prime
attribute Winner Date of Birth is transitively dependent on the candidate key
{Tournament, Year} via the non-prime attribute Winner. The fact that Winner Date of
Birth is functionally dependent on Winner makes the table vulnerable to logical
inconsistencies, as there is nothing to stop the same person from being shown with
different dates of birth on different records.
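
Zaniolo's three conditions translate fairly directly into a check over a schema and its functional dependencies. The sketch below is one possible way to write it and is not taken from the notes: the function names are assumptions, the candidate-key search is brute force (fine for a handful of attributes only), and only the listed dependencies are examined.

```python
from itertools import combinations

def closure(attrs, fds):
    """All attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def candidate_keys(schema, fds):
    """Minimal attribute sets whose closure is the whole schema."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            attrs = set(combo)
            if closure(attrs, fds) >= schema and not any(k <= attrs for k in keys):
                keys.append(attrs)
    return keys

def violations_3nf(schema, fds):
    """Dependencies X -> A that satisfy none of the three 3NF conditions."""
    prime = set().union(*candidate_keys(schema, fds))
    bad = []
    for lhs, rhs in fds:
        is_superkey = closure(lhs, fds) >= schema
        # (rhs - lhs) empty covers the trivial case; otherwise it must be all prime
        if not is_superkey and not (rhs - lhs) <= prime:
            bad.append((lhs, rhs))
    return bad

SCHEMA = {"Tournament", "Year", "Winner", "Winner Date of Birth"}
FDS = [
    ({"Tournament", "Year"}, {"Winner"}),
    ({"Winner"}, {"Winner Date of Birth"}),
]

keys = candidate_keys(SCHEMA, FDS)
bad = violations_3nf(SCHEMA, FDS)
```

Here keys contains only {Tournament, Year}, and bad reports Winner → Winner Date of Birth: Winner is not a superkey and Winner Date of Birth is not prime, which matches the breach described above.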


In order to express the same facts without violating 3NF, it is necessary to split the table
into two:

Tournament Winners

Tournament             Year   Winner
Des Moines Masters     1998   Chip Masterson
Indiana Invitational   1998   Al Fredrickson
Cleveland Open         1999   Bob Albertson
Des Moines Masters     1999   Al Fredrickson
Indiana Invitational   1999   Chip Masterson

Player Dates of Birth

Player           Date of Birth
Chip Masterson   14 March 1977
Al Fredrickson   21 July 1975
Bob Albertson    28 September 1968

Boyce-Codd Normal Form:

• It is a slightly stronger version of the third normal form (3NF). A table is in Boyce-Codd
normal form if and only if, for every one of its non-trivial functional dependencies X → Y,
X is a superkey, that is, X is either a candidate key or a superset thereof.
• Note that the tables "Tournament Winners" and "Player Dates of Birth", shown above as
being in 3NF, are also in BCNF.
• Only in rare cases does a 3NF table not meet the requirements of BCNF. A 3NF table
which does not have multiple overlapping candidate keys is guaranteed to be in BCNF.
• An example of a 3NF table that does not meet BCNF is:

Today's Court Bookings

Court Start Time End Time Rate Type


1 09:30 10:30 SAVER
1 11:00 12:00 SAVER
1 14:00 15:30 STANDARD
2 10:00 11:30 PREMIUM-B
2 11:30 13:30 PREMIUM-B
2 15:00 16:30 PREMIUM-A

There are two courts available and there are four distinct rate types:

▪ SAVER, for Court 1 bookings made by members
▪ STANDARD, for Court 1 bookings made by non-members
▪ PREMIUM-A, for Court 2 bookings made by members
▪ PREMIUM-B, for Court 2 bookings made by non-members

Each rate type applies to exactly one court, so the functional dependency
Rate Type → Court holds.

o We can observe that the table's candidate keys are:
▪ {Court, Start Time}
▪ {Court, End Time}
▪ {Rate Type, Start Time}
▪ {Rate Type, End Time}
o In the Today's Court Bookings table, there are no non-prime attributes: that is, all
attributes belong to candidate keys. Therefore the table adheres to both 2NF and
3NF.
o The table does not adhere to BCNF because in the dependency Rate Type →
Court, the determining attribute (Rate Type) is not a superkey.
• The design can be amended so that it meets BCNF as follows:

Rate Types

Rate Type   Court   Member Flag
SAVER       1       Yes
STANDARD    1       No
PREMIUM-A   2       Yes
PREMIUM-B   2       No

Today’s Bookings

Rate Type   Start Time   End Time
SAVER       09:30        10:30
SAVER       11:00        12:00
STANDARD    14:00        15:30
PREMIUM-B   10:00        11:30
PREMIUM-B   11:30        13:30
PREMIUM-A   15:00        16:30

The candidate keys for the Rate Types table are {Rate Type} and {Court, Member Flag};
the candidate keys for the Today's Bookings table are {Rate Type, Start Time} and {Rate
Type, End Time}. Both tables are in BCNF.


Lecture-20
Consider the following table:

Lending

branch-name    branch-city   assets   customer-name   loan-number   amount
Sadar          Agra          200000   Ram             L-12          12000
Sanjay-place   Agra          100000   Ram             L-13          13000

This table stores information about loans. It has the following problems:
• Every branch will have several loans, so the table has one row for each loan taken
from a branch, and all of those rows repeat the same values in the columns
branch-name, branch-city and assets. This is repetition of data.
• Updating the branch-city or assets of a particular branch requires updating every row
for that branch, so the operation is costly.
• If any such row is missed during an update, the table will hold more than one value
for the branch-city or assets of a single branch, which breaches data integrity.
• A branch that has no loans has no entry in this table, so the information about that
branch cannot be represented at all.
Decomposition
• The above problems can be solved by decomposing the table. The set of relation
schemas R1, R2, …, Rn is a decomposition of relation schema R if R = R1 ∪ R2 ∪ … ∪ Rn.
It should be noted that every pair Ri and Ri+1 of this set should have at least one
common attribute so that they can be combined back again using the join operation.
• Not every decomposition of this table is free from problems.
• Consider, for example, forming two new tables out of the Lending table as follows:
Branch-customer-schema = (branch-name, branch-city, assets, customer-name)
Customer-loan-schema = (customer-name, loan-number, amount)
The resulting tables with data are as follows:

Branch-customer

branch-name    branch-city   assets   customer-name
Sadar          Agra          200000   Ram
Sanjay-place   Agra          100000   Ram

Customer-loan

customer-name   loan-number   amount
Ram             L-12          12000
Ram             L-13          13000

Now suppose that, to find the branch for loan L-12, we form the join of these two
tables. We get the following table:

Branch-customer ⋈ Customer-loan =

branch-name    branch-city   assets   customer-name   loan-number   amount
Sadar          Agra          200000   Ram             L-12          12000
Sadar          Agra          200000   Ram             L-13          13000
Sanjay-place   Agra          100000   Ram             L-12          12000
Sanjay-place   Agra          100000   Ram             L-13          13000


According to this join, both of the loans were taken from both of the branches. This is
an example of information loss. It occurred because the wrong column was chosen as the
common column of the two tables in the decomposition.
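
The spurious rows can be reproduced with a few lines of code. The sketch below represents each row as a dictionary; the helper names project, natural_join and as_set are assumptions made for illustration. It also tries a second split that shares branch-name instead of customer-name, which for this data joins back to the original table.

```python
LENDING = [
    {"branch-name": "Sadar", "branch-city": "Agra", "assets": 200000,
     "customer-name": "Ram", "loan-number": "L-12", "amount": 12000},
    {"branch-name": "Sanjay-place", "branch-city": "Agra", "assets": 100000,
     "customer-name": "Ram", "loan-number": "L-13", "amount": 13000},
]

def project(rows, attrs):
    """Projection onto attrs, with duplicate rows removed."""
    out = []
    for row in rows:
        small = {a: row[a] for a in attrs}
        if small not in out:
            out.append(small)
    return out

def natural_join(left, right):
    """Combine every pair of rows that agree on all shared attributes."""
    out = []
    for l in left:
        for r in right:
            if all(l[a] == r[a] for a in l.keys() & r.keys()):
                out.append({**l, **r})
    return out

def as_set(rows):
    """Order-insensitive view of a table, for comparisons."""
    return {frozenset(row.items()) for row in rows}

# Decomposition from the text: the common column is customer-name.
branch_customer = project(LENDING, ["branch-name", "branch-city", "assets", "customer-name"])
customer_loan = project(LENDING, ["customer-name", "loan-number", "amount"])
joined = natural_join(branch_customer, customer_loan)
spurious = [row for row in joined if row not in LENDING]

# Alternative split sharing branch-name.
branch = project(LENDING, ["branch-name", "branch-city", "assets"])
loan_info = project(LENDING, ["branch-name", "customer-name", "loan-number", "amount"])
rejoined = natural_join(branch, loan_info)
```

Here joined has four rows, two of which never existed in Lending, while rejoined contains exactly the original two rows.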
• Lossless-Join Decomposition: A decomposition {R1, R2, …, Rn} of relation schema R is
a lossless-join decomposition if, for all legal relations r on schema R,
r = ΠR1(r) ⋈ ΠR2(r) ⋈ … ⋈ ΠRn(r)
In other words, when we join all of the decomposed tables with their data, the result
should be the original table with the same data as before decomposition.
• Otherwise it is called a lossy-join decomposition.
• Dependency preservation: This is another desirable property of a decomposition.
Suppose a set F of functional dependencies holds on any relation based on
schema R. Then the set of functional dependencies that holds on a subschema R1
is F1, which contains all the functional dependencies of F (more precisely, of its
closure F+) that mention only attributes of R1. So if the decomposition of R is
{R1, R2, …, Rn} and the corresponding sets of functional dependencies that hold on
them are {F1, F2, …, Fn}, then the following should be true:
(F1 ∪ F2 ∪ … ∪ Fn)+ = F+.
Such a decomposition is called a dependency preserving decomposition.
For example:
Consider the schema R = {A, B, C, D} on which the following functional dependencies
hold: F = {A→B, A→BC, C→D}.
Now suppose the decomposition of R is R1 = {A,B} and R2 = {B,C,D}. The
functional dependencies that hold on R1 are F1 = {A→B} (Note: F1 should contain all
the functional dependencies in F which have only attributes of R1) and those on R2 are
F2 = {C→D}. The union F1 ∪ F2 is {A→B, C→D}, which does not imply A→BC, so
this is not a dependency preserving decomposition.
If we instead decompose R into the relation schemas R1 = {A,B,C} and R2 = {C,D}, then
F1 = {A→B, A→BC} and F2 = {C→D}, so F1 ∪ F2 is {A→B, A→BC, C→D}, which equals F.
This decomposition is therefore dependency preserving.
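
The two decompositions of R = {A, B, C, D} can be compared in code. The sketch below uses a commonly taught test that avoids computing F+ explicitly: for each dependency α→β it repeatedly grows α using only what each Ri can see, then checks whether β was reached. The function names are assumptions, and this is an illustration rather than the method used in the notes.

```python
def closure(attrs, fds):
    """All attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def is_preserved(fd, decomposition, fds):
    """True if fd can be enforced using only the decomposed schemas."""
    lhs, rhs = fd
    result = set(lhs)
    changed = True
    while changed:
        changed = False
        for ri in decomposition:
            # what the attributes visible in ri determine, restricted to ri
            gained = closure(result & ri, fds) & ri
            if not gained <= result:
                result |= gained
                changed = True
    return rhs <= result

def lost_dependencies(decomposition, fds):
    """The dependencies of fds that the decomposition fails to preserve."""
    return [fd for fd in fds if not is_preserved(fd, decomposition, fds)]

F = [({"A"}, {"B"}), ({"A"}, {"B", "C"}), ({"C"}, {"D"})]

first = lost_dependencies([{"A", "B"}, {"B", "C", "D"}], F)
second = lost_dependencies([{"A", "B", "C"}, {"C", "D"}], F)
```

For the first decomposition, first contains A→BC, because no single schema sees both A and C. For the second, second is empty.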


Lecture-21
Normalization Using Functional Dependency
• Lossless-Join Decomposition using FD:
o Let R be a relation schema and F a set of functional dependencies on R. Let R1 and
R2 form a decomposition of R. This decomposition is a lossless-join decomposition
if at least one of the following functional dependencies is in F+:
▪ R1 ∩ R2 → R1
▪ R1 ∩ R2 → R2
o Example: Lending-schema = (branch-name, branch-city, assets, customer-name,
loan-number, amount). The FDs that hold on this schema are
branch-name → assets branch-city
loan-number → amount branch-name
The decomposition of it into the two schemas
Branch-schema = (branch-name, branch-city, assets)
Loan-info-schema = (branch-name, customer-name, loan-number, amount)
is a lossless-join decomposition because
Branch-schema ∩ Loan-info-schema = branch-name
and we have the FD branch-name → assets branch-city. Applying the augmentation
rule to it gives branch-name → branch-name assets branch-city,
i.e. branch-name → Branch-schema.
• Third Normal Form Using FD:
o Let R be a relation and F the minimal set of functional dependencies that
holds on R.
Then do the following:
1. Start with an empty set of relations and set i = 1.
2. For each FD α→β in F:
▪ Add a relation Ri = (α, β) if no existing relation already contains both α and β,
and increase i by one.
3. After adding all such relations, add another relation Ri = (any candidate
key of R) if no existing relation contains a candidate key of R.
• Boyce-Codd Normal Form using FD:
1. Let Ri be a relation that is not in BCNF.
2. Let α→β be a non-trivial FD that holds on Ri such that α→Ri does not hold
(i.e. α is not a superkey of Ri).
3. Replace relation Ri by two relations (α, β) and (Ri − β).
4. Check all the resulting relations again against the FDs that hold on them, and
go back to step 1 while any relation is not in BCNF.
o Example:
▪ Consider Lending-schema = (branch-name, branch-city, assets, customer-name,
loan-number, amount). The FDs that hold on this schema are
1. branch-name → assets branch-city
2. loan-number → amount branch-name
▪ Lending-schema is not in BCNF: in the FD
branch-name → assets branch-city, branch-name is not a superkey of
Lending-schema. Decomposing on this FD gives the following set of relations:

Branch-schema = (branch-name, branch-city, assets)
    branch-name → assets branch-city
Loan-info-schema = (branch-name, customer-name, loan-number, amount)
    loan-number → amount branch-name

▪ In the new set of relations, Loan-info-schema is not in BCNF
because loan-number is not a superkey of Loan-info-schema. Decomposing it
again gives the set of relations

Branch-schema = (branch-name, branch-city, assets)
    branch-name → assets branch-city
Loan-schema = (branch-name, loan-number, amount)
    loan-number → amount branch-name
Borrower-schema = (customer-name, loan-number)

Now all three relations are in BCNF, so no further decomposition is needed.
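
The BCNF steps can be sketched as a short loop. This is a simplified illustration, not a complete algorithm: it only looks for violations among the dependencies listed in F (a full version would test the dependencies of F+ projected onto each Ri), and the function names are assumptions.

```python
def closure(attrs, fds):
    """All attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def find_violation(ri, fds):
    """Return (alpha, beta) for a listed FD that breaks BCNF on ri, else None."""
    for lhs, rhs in fds:
        if not lhs <= ri:
            continue
        beta = (rhs & ri) - lhs            # non-trivial part that lies inside ri
        if beta and not closure(lhs, fds) >= ri:
            return lhs, beta               # lhs is not a superkey of ri
    return None

def bcnf_decompose(schema, fds):
    """Split schemas on violating FDs until none of the listed FDs violates BCNF."""
    todo, done = [frozenset(schema)], []
    while todo:
        ri = todo.pop()
        violation = find_violation(ri, fds)
        if violation is None:
            done.append(ri)
        else:
            alpha, beta = violation
            todo.append(alpha | beta)      # relation (alpha, beta)
            todo.append(ri - beta)         # relation (Ri - beta)
    return done

LENDING = frozenset({"branch-name", "branch-city", "assets",
                     "customer-name", "loan-number", "amount"})
FDS = [
    (frozenset({"branch-name"}), frozenset({"assets", "branch-city"})),
    (frozenset({"loan-number"}), frozenset({"amount", "branch-name"})),
]

result = set(bcnf_decompose(LENDING, FDS))
```

For the Lending-schema this yields the same three schemas as the worked example: Branch-schema, Loan-schema and Borrower-schema.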
• BCNF may not satisfy the dependency preservation criterion.
o In some cases, a non-BCNF table cannot be decomposed into tables that satisfy
BCNF and preserve the dependencies that held in the original table.
o For example, the set of functional dependencies {AB → C, C → B} cannot be
represented by a BCNF schema that preserves both dependencies.
o In this sense, unlike the first three normal forms, BCNF is not always achievable.
o Consider the following non-BCNF table whose functional dependencies follow
the {AB → C, C → B} pattern:

Nearest Shop

Person Shop Type Nearest Shop


Davidson Optician Eagle Eye
Davidson Hairdresser Snippets
Wright Bookshop Merlin Books
Fuller Bakery Doughy's
Fuller Hairdresser Sweeney Todd's
Fuller Optician Eagle Eye


For each Person / Shop Type combination, the table tells us which shop of this
type is geographically nearest to the person's home. We assume for simplicity that
a single shop cannot be of more than one type.
The candidate keys of the table are:
▪ {Person, Shop Type}
▪ {Person, Nearest Shop}
Because all three attributes are prime attributes (i.e. belong to candidate keys), the
table is in 3NF. The table is not in BCNF, however, as the Shop Type attribute is
functionally dependent on a non-superkey: Nearest Shop.

Decomposing the table on this dependency gives the following two tables:

Shop Near Person

Person     Shop
Davidson   Eagle Eye
Davidson   Snippets
Wright     Merlin Books
Fuller     Doughy's
Fuller     Sweeney Todd's
Fuller     Eagle Eye

Shop

Shop             Shop Type
Eagle Eye        Optician
Snippets         Hairdresser
Merlin Books     Bookshop
Doughy's         Bakery
Sweeney Todd's   Hairdresser

The "Shop Near Person" table has a candidate key of {Person, Shop}, and the
"Shop" table has a candidate key of {Shop}. Unfortunately, although this design
adheres to BCNF, it is unacceptable on different grounds: it allows us to record
multiple shops of the same type against the same person. In other words, its
candidate keys do not guarantee that the functional dependency {Person, Shop
Type} → {Shop} will be respected.


Lecture 22
Multivalued Dependencies
• Let R be a relation schema, let X and Y be disjoint subsets of R (i.e., X ⊆ R, Y ⊆ R,
X ∩ Y = ∅), and let Z = R − XY. A relation r(R) satisfies X ↠ Y if, for any two tuples
t1 and t2 in r with
o t1(X) = t2(X), there exists a tuple t3 in r such that
o t3(X) = t1(X), t3(Y) = t1(Y), t3(Z) = t2(Z).
o By symmetry, there also exists a tuple t4 in r such that
o t4(X) = t1(X), t4(Y) = t2(Y), t4(Z) = t1(Z).

     X    Y    Z
t1   x1   y1   z1
t2   x1   y2   z2
t3   x1   y1   z2
t4   x1   y2   z1

• The MVD X ↠ Y says that the relationship between X and Y is independent of the
relationship between X and R − Y.
• For example, consider the table Employee:

Employee-name   Project-name   Dependant-name
Smith           X              John
Smith           Y              Ann
Smith           X              Ann
Smith           Y              John

o The MVDs Employee-name ↠ Project-name and Employee-name ↠ Dependant-name
hold in the relation.
o The employee named Smith works on projects X and Y, and has two dependants,
John and Ann.
o If we stored only the first two tuples in the relation, it would incorrectly suggest
an association between projects and dependants (project X with John, project Y
with Ann).
o If we have MVDs in a relation, we may have to repeat values redundantly in the
tuples. In the Employee relation, values X and Y of Project-name are repeated
with each value of Dependant-name, which is clearly undesirable.
o Problem: the Employee schema is in BCNF because no non-trivial FDs hold on it,
so BCNF does not remove this redundancy.
o Trivial MVD: If the MVD X ↠ Y is satisfied by all relations whose schemas include X
and Y, it is called a trivial MVD.
▪ X ↠ Y is trivial whenever Y ⊆ X or X ∪ Y = R.
o If a relation r fails to satisfy a given MVD, a relation r’ that satisfies the MVD can
be constructed by adding tuples to r.
▪ For this reason an MVD is called a "tuple-generating dependency".
▪ Compare this with an FD: tuples must be deleted to make a relation satisfy a
given FD.


o MVDs can be used in two ways:
▪ to test relations to determine whether they are legal under a given set of FDs
and MVDs
▪ to specify constraints on a set of relations
• Let D be a set of FDs and MVDs. The closure D+ of D is the set of all FDs and MVDs
logically implied by D.
• D+ can be computed using the following set of sound and complete rules:
1. reflexivity: if Y ⊆ X then X → Y
2. augmentation: if X → Y then WX → WY
3. transitivity: if X → Y and Y → Z then X → Z
4. complementation: if X ↠ Y then X ↠ R − XY
5. MV augmentation: if X ↠ Y and W ⊆ R, V ⊆ W, then WX ↠ VY
6. MV transitivity: if X ↠ Y and Y ↠ Z then X ↠ Z − Y
7. replication: if X → Y then X ↠ Y
8. coalescence: if X ↠ Y and Z ⊆ Y, W ⊆ R, W ∩ Y = ∅, W → Z, then X → Z
• Note: The first three rules are Armstrong’s axioms.
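
The definition of an MVD can be tested directly on a relation instance. The sketch below builds, for every pair of tuples that agree on X, the tuple t3 required by the definition and reports the ones that are absent; the function name missing_for_mvd is an assumption. An empty result means the MVD holds in that instance, and a non-empty result lists the tuples that would have to be added, which illustrates why an MVD is called tuple-generating.

```python
def missing_for_mvd(rows, attrs, x, y):
    """Tuples that must be added to rows for X ->> Y to hold (empty set if it holds)."""
    z = [a for a in attrs if a not in x and a not in y]
    present = set(rows)
    missing = set()
    for t1 in rows:
        for t2 in rows:
            r1, r2 = dict(zip(attrs, t1)), dict(zip(attrs, t2))
            if all(r1[a] == r2[a] for a in x):
                # t3 takes X and Y from t1 and Z from t2
                t3 = {a: r1[a] for a in list(x) + list(y)}
                t3.update({a: r2[a] for a in z})
                candidate = tuple(t3[a] for a in attrs)
                if candidate not in present:
                    missing.add(candidate)
    return missing

ATTRS = ("Employee-name", "Project-name", "Dependant-name")
EMPLOYEE = [
    ("Smith", "X", "John"),
    ("Smith", "Y", "Ann"),
    ("Smith", "X", "Ann"),
    ("Smith", "Y", "John"),
]

full = missing_for_mvd(EMPLOYEE, ATTRS, ["Employee-name"], ["Project-name"])
first_two = missing_for_mvd(EMPLOYEE[:2], ATTRS, ["Employee-name"], ["Project-name"])
```

With all four tuples nothing is missing. With only the first two tuples, the check asks for (Smith, X, Ann) and (Smith, Y, John), which are exactly the other two rows of the Employee table.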

Fourth Normal Form (4NF):

• A relation schema R is in 4NF with respect to D if, for every non-trivial MVD X ↠ Y
in D+, X is a superkey for R.
• 4NF vs BCNF
o 4NF differs from BCNF only in the use of D (FDs and MVDs) instead of F (FDs only).
o Every 4NF schema is also in BCNF.
▪ By the replication rule, X → Y implies X ↠ Y.
o If R is not in BCNF, there exists a non-trivial FD X → Y where X is not a superkey,
so R cannot be in 4NF.
• For example, Employee (Employee-name, Project-name, Dependant-name) is not in 4NF,
since
o Employee-name ↠ Project-name holds but Employee-name is not a key.
o Decomposing into Emp-proj (E-n, P-n) and Emp-dep (E-n, D-n) brings the tables
into 4NF.
• For example, Borrow (Loan#, C-name, Street, C-city) is in BCNF but not in 4NF,
because C-name ↠ Loan# is a non-trivial MVD in which C-name is not a key of this
schema.
• The decomposition R1 = (C-name, Loan#), R2 = (C-name, Street, C-city) brings them
into 4NF.
• Benefits of Fourth Normal Form
o Reduced number of tuples
o No anomalies for insert/delete/update
• Comparing FD and MVD
o If we have (a1,b1,c1,d1) ∈ r and (a1,b2,c2,d2) ∈ r, then
▪ A → B implies b1 = b2
▪ A ↠ B implies (a1,b1,c2,d2) ∈ r and (a1,b2,c1,d1) ∈ r


Lecture 23

Join Dependency and Fifth Normal Form (Project-Join Normal Form):

• The normal forms discussed so far require that a relation R that is not in the given
normal form be decomposed into two relations to meet the requirements of the normal
form. In some rare cases, a relation can have problems such as redundant information
and the resulting update anomalies, yet cannot be decomposed into two relations to
remove the problems. In such cases it may be possible to decompose the relation into
three or more relations using 5NF.

• The fifth normal form deals with join dependencies, which are a generalisation of the
MVD. The aim of fifth normal form is to have relations that cannot be decomposed
further. A relation in 5NF cannot be constructed from several smaller relations.

• A relation R satisfies the join dependency *(R1, R2, ..., Rn) if and only if R is equal
to the join of its projections on R1, R2, ..., Rn, where each Ri is a subset of the set
of attributes of R.

• A relation R is in 5NF (or project-join normal form, PJNF) if, for all join dependencies
of the form *(R1, R2, ..., Rn), where each Ri is a subset of the set of attributes of R and
R = R1 ∪ R2 ∪ ... ∪ Rn, at least one of the following holds:

o *(R1, R2, ..., Rn) is a trivial join dependency (i.e., one of the Ri is R).
o Every Ri is a superkey for R.

• An example of 5NF is given below; it deals with departments,
subjects and students.

Department Subject Student


Comp. Sc. CP1000 John Smith
Mathematics MA1000 John Smith
Comp. Sc. CP2000 Arun Kumar
Comp. Sc. CP3000 Reena Rani
Physics PH1000 Raymond Chew
Chemistry CH2000 Albert Garcia

o The above relation says that Comp. Sc. offers subjects CP1000, CP2000 and
CP3000, which are taken by a variety of students. No student takes all the subjects
and no subject has all students enrolled in it, and therefore all three fields are
needed to represent the information.
o The above relation does not show MVDs, since the attributes subject and student
are not independent; they are related to each other and the pairings carry
significant information. The relation can therefore not be decomposed into the
two relations
▪ (dept, subject), and
▪ (dept, student)
without losing some important information.
o The relation can, however, be decomposed into the following three relations:
▪ (dept, subject),
▪ (dept, student), and
▪ (subject, student)
and it can be shown that this decomposition is lossless.
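
The claim can be checked for the sample data with set comprehensions. The variable names below are assumptions made for illustration. Note that this only verifies the given instance: a join dependency is a constraint on every legal instance of the relation, which a single table cannot establish.

```python
R = {
    ("Comp. Sc.", "CP1000", "John Smith"),
    ("Mathematics", "MA1000", "John Smith"),
    ("Comp. Sc.", "CP2000", "Arun Kumar"),
    ("Comp. Sc.", "CP3000", "Reena Rani"),
    ("Physics", "PH1000", "Raymond Chew"),
    ("Chemistry", "CH2000", "Albert Garcia"),
}

# The three binary projections.
dept_subject = {(dept, subject) for dept, subject, _ in R}
dept_student = {(dept, student) for dept, _, student in R}
subject_student = {(subject, student) for _, subject, student in R}

# Joining only (dept, subject) and (dept, student) pairs every Comp. Sc.
# subject with every Comp. Sc. student.
two_way = {
    (dept, subject, student)
    for dept, subject in dept_subject
    for dept2, student in dept_student
    if dept == dept2
}

# Joining the third projection as well removes the pairings that were never recorded.
three_way = {row for row in two_way if (row[1], row[2]) in subject_student}
```

Here two_way has 12 rows, six of them spurious, while three_way is equal to the original relation R.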
• Consider the Loan-info-schema discussed earlier. Suppose the following join
dependency holds on it:

*((loan-number, branch-name), (loan-number, customer-name), (loan-number, amount))

Then Loan-info-schema is not in fifth normal form, because the component schemas of
this join dependency are not all superkeys. We should therefore decompose it into the
three relations given by the join dependency, i.e. replace Loan-info-schema with the
following three relation schemas:
o (loan-number, branch-name),
o (loan-number, customer-name), and
o (loan-number, amount)
