Dbms Notes PDF
s.sanyasirao1@gmail.com www.jwjobs.net
Lecture 1.
What do you mean by Data and Database?
Raw data – this could be “85” – doesn’t have meaning when it stands alone. It might mean
something if you knew it was the weight of a man in kilograms.
Related raw data is a group (data set or data file) of organized raw data that can be tied
together. For example, it could be a group of names, weights, blood groups and identification
numbers, all tied to the identity cards issued to patients at hospitals.
Cleaned raw data is all of the above after being validated or processed in some way.
Such a process might ensure that a blood group never has a value such as “red” or “black”; for
example, the only allowed values could be of the kind A, A+, B, B+ etc.
Data can be acquired from many different sources. It must always be evaluated to decide which
category it belongs to, and whether it needs any additional validation before the analysis that
produces information.
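The cleaning step described above can be sketched in a few lines of Python. This is only an illustration: the field names, the sample patients and the exact list of allowed blood groups are assumptions made for the example, not part of the notes.

```python
# Allowed blood-group values, following the notes' "A, A+, B, B+ etc."
# (the exact list is an assumption made for this sketch).
ALLOWED_BLOOD_GROUPS = {"A", "A+", "B", "B+"}


def clean_records(raw_records):
    """Split related raw data into validated and rejected records."""
    valid, rejected = [], []
    for record in raw_records:
        group_ok = record["blood_group"] in ALLOWED_BLOOD_GROUPS
        weight_ok = record["weight_kg"] > 0
        if group_ok and weight_ok:
            valid.append(record)
        else:
            rejected.append(record)
    return valid, rejected


# Hypothetical related raw data tied to patient identity cards.
raw = [
    {"name": "Patient 1", "weight_kg": 85, "blood_group": "A+"},
    {"name": "Patient 2", "weight_kg": 70, "blood_group": "red"},
]
valid, rejected = clean_records(raw)
print(len(valid), "valid,", len(rejected), "rejected")
```

Only the first record survives the check; the second is rejected because “red” is not an allowed blood group.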
Database:
A database consists of an organized collection of interrelated data for one or more uses,
typically in digital form.
Examples of databases could be: Database for Educational Institute or a Bank, Library, Railway
Reservation system etc.
What Is a DBMS?
[Figure: a set of programs working on top of the file system]
The most significant disadvantages of a file system compared with a database management
system are as follows:
• Data Redundancy: Files are created in the file system as and when an enterprise needs
them along its growth path, so repetition of information about an entity cannot be
avoided.
E.g. the addresses of customers will be present in the file maintaining information
about customers holding savings accounts, and will also be present in the file
maintaining the current accounts. When the same customer has both a savings account
and a current account, his address will be stored in two places.
• Data Inconsistency: Data redundancy leads to a bigger problem than wasted storage:
it may lead to inconsistent data. The same data repeated in several places may no longer
match after it has been updated in only some of them.
For example: Suppose the customer requests a change of address for his account in
the bank, and the program is executed to update the savings account file only, but his
current account file is not updated. Afterwards the addresses of the same customer
in the savings account file and the current account file will not match.
Moreover, there will be no way to find out which of the two addresses is the latest.
• Difficulty in Accessing Data: For ad hoc reports no program will already exist, and
the only options will be to write a new program to generate the requested
report or to work manually. Either option takes an impractical amount of time and is
expensive.
For example: Suppose that all of a sudden the administrator gets a request to generate a list
of all the customers holding a savings account who live in a particular locality of
the city. The administrator will not have a program already written to generate that list, but
say he has a program which can generate a list of all the customers holding a savings
account. Then he can either go through that list manually to
select the customers living in the particular locality, or write a new program to
generate the new list. Both of these ways will take a long time, which would generally be
impractical.
• Data Isolation: Since the data files are created at different times, and presumably by
different people, the structures of different files generally will not match. The data for a
particular entity will be scattered across different files, so it will be difficult to obtain
the appropriate data.
For example: Suppose the address in the savings account file has the fields Add line1, Add
line2, City, State, Pin, while the address in the current account file has the fields House No.,
Street No., Locality, City, State, Pin. The administrator is asked to provide the list of
customers living in a particular locality. Providing a consolidated list of all the customers
will require looking in both files, but they store the address in different ways.
Writing a program to generate such a list will be difficult.
• Integrity Problems: All the consistency constraints have to be applied to the data through
appropriate checks in the coded programs. This is very difficult when the number of such
constraints is very large.
For example: An account should not have a balance of less than Rs. 500. To enforce this
constraint, an appropriate check should be added in the program which adds a record and in the
program which withdraws from an account. Suppose this limit is later
increased; then all those checks must be updated to avoid inconsistency. These
recurring changes in the programs will be a great headache for the administrator.
• Security and Access Control: The data should be protected from unauthorized users,
and not every user should be allowed to access every piece of data. Since application programs are
added to the system in an ad hoc manner, enforcing such security constraints is difficult.
For example: The payroll personnel in a bank should not be allowed to access
the account information of the customers.
• Concurrency Problems: These arise when more than one user is allowed to process the data.
If in that environment two or more users try to update a shared data element at about the
same time, the result may be inconsistent data.
For example: Suppose the balance of an account is Rs. 500, and users A and B try to
withdraw Rs. 100 and Rs. 50 respectively at almost the same time using the Update
process.
Update:
1. Read the balance amount.
2. Subtract the withdrawn amount from balance.
3. Write updated Balance value.
Suppose A performs Steps 1 and 2 on the balance amount, i.e. he reads 500 and subtracts 100
from it. But at the same time B withdraws Rs. 50: he performs the Update process,
also reads the balance as 500, subtracts 50 and writes back 450. User A will also write
his updated balance amount as 400. They may update the balance value in any order
depending on various factors in the systems being used by the two users. So
finally the balance will be either 400 or 450. Both of these values are wrong (the correct
balance after both withdrawals is Rs. 350), so the balance now holds an inconsistent value permanently.
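The lost update in the example above can be replayed step by step. The following Python sketch interleaves the three Update steps of users A and B by hand (no real threads are used, which is a simplification made for this illustration) and compares the result with a serial execution.

```python
# Shared data element: the account balance from the example.
balance = {"value": 500}


def read_balance():
    return balance["value"]


def write_balance(new_value):
    balance["value"] = new_value


# Interleaved schedule: both users read before either of them writes.
a_read = read_balance()    # A, step 1: reads 500
b_read = read_balance()    # B, step 1: also reads 500
a_new = a_read - 100       # A, step 2: computes 400
b_new = b_read - 50        # B, step 2: computes 450
write_balance(b_new)       # B, step 3: writes 450
write_balance(a_new)       # A, step 3: writes 400, so B's update is lost
interleaved_result = read_balance()

# Serial schedule: A completes all three steps before B starts.
write_balance(500)
write_balance(read_balance() - 100)
write_balance(read_balance() - 50)
serial_result = read_balance()

print(interleaved_result, serial_result)
```

The interleaved schedule ends at 400 (it would end at 450 if the two writes were swapped), while the serial schedule ends at the correct 350.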
Lecture 2
Why Use a DBMS?
• Data independence and efficient access.
• Reduced application development time.
• Data integrity and security.
• Uniform data administration.
• Concurrent access, recovery from crashes.
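Two of these benefits can be illustrated with Python's built-in sqlite3 module. The sketch below is an assumption-laden example: the table name, column names and sample customers are hypothetical. It declares the minimum-balance rule from Lecture 1 once, as a CHECK constraint, and answers an ad hoc "customers in a locality" request with a query instead of a new program.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE savings_account (
        account_no INTEGER PRIMARY KEY,
        customer   TEXT NOT NULL,
        locality   TEXT NOT NULL,
        balance    INTEGER NOT NULL CHECK (balance >= 500)
    )
    """
)
conn.executemany(
    "INSERT INTO savings_account VALUES (?, ?, ?, ?)",
    [(1, "Customer A", "Locality X", 1500), (2, "Customer B", "Locality Y", 900)],
)

# Data integrity: the DBMS rejects the update, whichever program issues it.
try:
    conn.execute("UPDATE savings_account SET balance = 100 WHERE account_no = 1")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# Efficient access: an ad hoc request needs only a new query.
in_locality_x = [
    row[0]
    for row in conn.execute(
        "SELECT customer FROM savings_account WHERE locality = ?", ("Locality X",)
    )
]
print(rejected, in_locality_x)
```

If the limit changes later, only the table definition has to change, not every program that touches the balance.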
Role of DBMS:
[Figure: without a DBMS, users work through a set of programs that access the file system, which stores the data on disk]
The DBMS is another layer of software placed between the file system and the
set of application programs. The role of the DBMS can be described by the following diagram at a
very high level.
Application Programs
DBMS
File System
Disk
Figure: Role of DBMS
Database Schema: The overall design of the database. An analogy in programming languages
is the declaration of variables with their data types. In a relational database
management system, the definitions of the tables and of their fields with data types form the
database schema.
Database Instance: The collection of information stored in the database at a particular moment is
called the database instance. An analogy in programming languages is the values stored
in the variables during the execution of a program. In a relational database management
system, the data stored in the various tables at a particular time is the instance of the database.
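The difference between schema and instance can be observed directly in a small relational database. The following sketch uses Python's sqlite3 module; the table `student` and its columns are hypothetical names chosen for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT)")


def schema(connection):
    """Column names and declared types: the schema of the table."""
    rows = connection.execute("PRAGMA table_info(student)")
    return [(row[1], row[2]) for row in rows]


def instance(connection):
    """The rows stored right now: the instance of the table."""
    query = "SELECT roll_no, name FROM student ORDER BY roll_no"
    return connection.execute(query).fetchall()


schema_before, instance_before = schema(conn), instance(conn)
conn.execute("INSERT INTO student VALUES (1, 'Student One')")
schema_after, instance_after = schema(conn), instance(conn)

print(schema_before == schema_after)    # the schema did not change
print(instance_before, instance_after)  # the instance did
```

Inserting a row changes the instance but leaves the schema untouched, just as assigning to a variable changes its value but not its declared type.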
[Figure: levels of data abstraction, from top to bottom: View Level, Logical Level, Physical Level]
Lecture 3
Data Independence:
• The DDL (Data Definition Language) statements are compiled to form the Data Dictionary or
Data Directory, which contains the metadata, i.e. data about data.
• The DML (Data Manipulation Language) is a language that enables users to access or
manipulate data in the database.
• It consists of very high-level statements that are used to specify the operations to be
performed on the database.
Query Language:
It is the portion of the DML that is used to access or retrieve information from the database.
Database Manager:
• This is the software that takes care of the execution of all the statements specified in DDL or
DML. It handles all the problems of a database and is responsible for
providing all of the features claimed above, such as data consistency, non-redundant data,
atomicity, concurrency control, easy access to data etc.
• It may be subdivided into two major components:
o Transaction Manager
o Storage Manager
The Storage Manager is responsible for the interaction with the file system and provides an
appropriate physical level of data abstraction. It is responsible for giving users easy access to
the database.
The overall system structure of the database management system could be shown as below:
[Figure: overall system structure, showing the DML Compiler, the Query Processor and the DDL Compiler]
Types Of Users:
• DBA: The person who designs the database and writes the database schema in DDL based on the
design.
• Sophisticated Users: People who know DML commands and operate on the database
directly.
• Application Programmers: People who operate on the database through application
programs, usually written in a high-level computer language such as C, Java, VB etc.
• Naïve Users: People who execute the application programs through APIs written
specifically for their requirements. They are generally not familiar with computer
technology, e.g. tellers, agents, registrars, librarians etc.
Lecture 4
Data Models
• A data model is a collection of concepts for describing data, data relationships, data
semantics and consistency constraints.
• A schema is a description of a particular collection of data, using a given data model.
• Primary categories for various data models are:
  o Object-based logical models
    - Provide very high level design of the database
    - Provide flexible structuring capabilities. The most popular ones are as
      follows:
      - The Entity-Relationship model
      - The object oriented model
      - The semantic model
      - The functional model
  o Record-based logical models
    - Provide more implementation based design
    - Specify overall logical structure of the database and provide high level
      description of the implementation
    - The most popular ones are as follows:
      - Relational Model
      - Network Model
      - Hierarchical Model
  o Physical models
    - Describe data at the lowest level
    - Capture aspects of database-system implementation
    - Widely known are the unifying model and the frame-memory model
Entity Relationship model (E-R Model):
• Identifies the basic elements, objects or entities which are core to the database.
• Consider for example the Library database. The most basic entities of a library can be
identified as books and users. There are other basic entities such as suppliers, magazines,
journals etc.
• When using the E-R model, we generally describe the database by diagrams called E-R
diagrams.
• The sample E-R diagram for the above-mentioned simple library database, having only
two entities Books and Users, can be formed as follows.
• All of the entities of type book will be represented by an entity set (represented by a
rectangle).
[Figure: entity set BOOK drawn as a rectangle]
• We identify the various attributes that describe the entities of an entity set. A book
is described in the library by its Accession Number, Call Number, Title, Author, Publisher,
Year of Publication etc. They are attached to entities as ellipses, as shown below:
[Figure: entity set BOOK with the attributes Call No., Title, Author and Publisher drawn as ellipses]
• Similarly, we attach the attributes which define a user in the library to the respective entity in
the E-R diagram, as shown below:
[Figure: entity set USER with the attributes Card No. and Name drawn as ellipses]
• Apart from entities, the E-R model describes the relationships between the entities. They are
again seen as relationship sets existing between entity sets. For example, a user can
borrow a book from the library. All of the relationships between any book of the library and any
user are represented by a relationship set, which we can name the ‘Borrowed By’ relationship
set. The Borrowed By relationship can have attributes of its own, which exist only when a
relationship exists. For example, ‘Date of Issue’ exists only when a book has been
borrowed by a particular user; it is an attribute neither of Book nor of User. We represent
relationships by diamonds in E-R diagrams, as shown below:
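[Figure: E-R diagram in which entity set BOOK (attributes Acc. No., Call No., Title, Author, Publisher, Yr. of Pub.) is connected through the relationship Borrowed By (attribute Date of Issue) to entity set USER (attributes Card No., User ID, User Type, Name)]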
Relational Model:
Book Table:
Acc. No.  Call No.  Title                            Author                          Publisher       Yr. of Publication
312       245       Database System Concepts         Silberschatz, Korth, Sudarshan  McGrawHill      1997
433       23        Fundamental of Database Systems  Elmasri, Navathe                Addison Wesley  1999
User Table:
Card No. Name User Type User ID
422 Abhishek Student 0706412234
4322 Mr. Lalit Faculty 23456789
• Relationships can also be represented by tables. Such a table includes only those attributes of
the related entities which are sufficient to identify them uniquely, and possibly attributes
which are specific to the relationship.
Borrowed By Table:
Book Acc. No. User Card No. Date of Issue
312 422 03/08/2010
433 4322 05/08/2010
• The above relationship table shows that user Abhishek has borrowed ‘Database System
Concepts’ by ‘Silberschatz, Korth, Sudarshan’ and user ‘Mr. Lalit’ has borrowed
‘Fundamental of Database Systems’ by ‘Elmasri, Navathe’ from the library on 03/08/2010
and 05/08/2010 respectively.
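The three tables above can be built and combined with a join. The sketch below uses Python's sqlite3 module; the table and column names (`book`, `library_user`, `borrowed_by` and so on) are names chosen for this illustration, and only some of the columns are included.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE book (acc_no INTEGER PRIMARY KEY, call_no INTEGER, title TEXT);
    CREATE TABLE library_user (card_no INTEGER PRIMARY KEY, name TEXT, user_type TEXT);
    CREATE TABLE borrowed_by (
        acc_no        INTEGER REFERENCES book(acc_no),
        card_no       INTEGER REFERENCES library_user(card_no),
        date_of_issue TEXT,
        PRIMARY KEY (acc_no, card_no)
    );
    INSERT INTO book VALUES (312, 245, 'Database System Concepts');
    INSERT INTO book VALUES (433, 23, 'Fundamental of Database Systems');
    INSERT INTO library_user VALUES (422, 'Abhishek', 'Student');
    INSERT INTO library_user VALUES (4322, 'Mr. Lalit', 'Faculty');
    INSERT INTO borrowed_by VALUES (312, 422, '03/08/2010');
    INSERT INTO borrowed_by VALUES (433, 4322, '05/08/2010');
    """
)

# The relationship table holds only keys; a join recovers the full picture.
borrowings = conn.execute(
    """
    SELECT u.name, b.title, bb.date_of_issue
    FROM borrowed_by AS bb
    JOIN book AS b ON b.acc_no = bb.acc_no
    JOIN library_user AS u ON u.card_no = bb.card_no
    ORDER BY bb.acc_no
    """
).fetchall()
for name, title, issued in borrowings:
    print(name, "borrowed", title, "on", issued)
```

The Borrowed By table stores nothing about a book or a user beyond their keys, which is why each book and user is described in exactly one place.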
Network Model:
• Data in the Network Model are represented by collections of records.
• Relationships between data are represented by links or pointers.
[Figure: network of five records, each ending in a link field ‘*’: Book1 (312, 245, Database System Concepts, …), Book2 (433, 23, Fundamental of Database Systems, …), Book3 (434, 24, Fundamental of Database Systems, …), User1 (422, Abhishek, Student, …) and a second user record (4322, Mr. Lalit, Faculty, …)]
• The above diagram shows 5 data records, each having several data values for the
corresponding attributes and an extra field marked ‘*’ which is used for a link or pointer.
• Whenever a relationship exists between two data elements, it is shown explicitly by
pointers. So the relationship of BOOK1 and BOOK2 with USER1 is shown by
pointers in the BOOK1 and BOOK2 records. Similarly, USER1 is related to these books,
which can be shown by a circular linked list containing only pointers. One such
list is pointed to by the link field of USER1; it contains pointers to all of
the books borrowed by USER1, and its last pointer points back to USER1.
• So the combination of data records and links can be used in any way to form a network
of data, as per the convenience of designers and programmers.
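The idea of records connected by link fields can be imitated with object references. The following Python sketch is a simplification under stated assumptions: the class names are hypothetical, and an ordinary Python list stands in for the circular linked list of pointers described above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class UserRecord:
    card_no: int
    name: str
    # Link field: pointers to the book records borrowed by this user.
    books: List["BookRecord"] = field(default_factory=list)


@dataclass(eq=False)
class BookRecord:
    acc_no: int
    title: str
    # Link field: pointer to the user record that borrowed this book.
    borrower: Optional[UserRecord] = None


def link(book, user):
    """Record the relationship in the link fields of both records."""
    book.borrower = user
    user.books.append(book)


user1 = UserRecord(422, "Abhishek")
book1 = BookRecord(312, "Database System Concepts")
book2 = BookRecord(433, "Fundamental of Database Systems")
link(book1, user1)
link(book2, user1)

# Navigation follows pointers instead of matching key values.
print(book1.borrower.name)
print([book.acc_no for book in user1.books])
```

Unlike the relational tables, no key values are compared here: the program reaches related records only by following the links.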
Hierarchical Model:
• This model is very similar to the Network Model in that it also uses records and links to
represent data and relationships respectively.
• The difference is that the Network Model forms a network or graph of data and connections,
while the Hierarchical Model forms only trees, which do not allow cycles.
• The data elements present in the model have parent-child relationships: the
nodes which point to other nodes are called parents, and the nodes pointed to by their
parents are called children.
• A child data node cannot be pointed to by two different parents.
• For example, if we put Books as parents and Users as children, then two books cannot
point to the same user record. In that case we will have to replicate the user record for
each book.
[Figure: hierarchical arrangement of the book records Book1 (312, 245, Database System Concepts, …), Book2 (433, 23, Fundamental of Database Systems, …) and a third book record (434, 24, Fundamental of Database Systems, …), each with a link field ‘*’]
Lecture 5
Entity-Relationship Model
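Symbol                                     Meaning
Rectangle                                  Entity Type
Diamond                                    Relationship Type
Ellipse                                    Attribute
Ellipse with underlined name               Key Attribute
Doubly outlined ellipse                    Multivalued Attribute
Ellipse with component ellipses attached   Composite Attribute
Dashed ellipse                             Derived Attribute
Double line joining R and E2               Total Participation of E2 in R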
[Figure: E-R diagram with the labels Account, Owns, Branch and Customer]
Mapping Constraints:
• The two most important types of mapping constraints are mapping cardinalities and existence
dependencies.
• For a binary relationship set R between entity sets A and B, the mapping cardinality
must be one of the following:
o One to One: An entity in A is associated with at most one entity in B, and vice versa.
o One to Many: An entity in A is associated with any number of entities in B, but an
entity in B is associated with at most one entity in A.
o Many to One: An entity in A is associated with at most one entity in B, while an entity
in B is associated with any number of entities in A.
o Many to Many: An entity in A is associated with any number of entities in B, and vice
versa.
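The four cases can be told apart mechanically for a given set of associations. The Python sketch below classifies a relationship set given as (a, b) pairs; the function name and the sample pairs are hypothetical. Note the assumption involved: a mapping cardinality is a constraint on every possible instance, so the function only reports the tightest case that one sample instance satisfies.

```python
from collections import Counter


def mapping_cardinality(pairs):
    """Classify a binary relationship set from A to B given as (a, b) pairs."""
    pairs = set(pairs)
    b_per_a = Counter(a for a, _ in pairs)  # number of B entities per A entity
    a_per_b = Counter(b for _, b in pairs)  # number of A entities per B entity
    a_has_many = any(count > 1 for count in b_per_a.values())
    b_has_many = any(count > 1 for count in a_per_b.values())
    if not a_has_many and not b_has_many:
        return "one to one"
    if a_has_many and not b_has_many:
        return "one to many"
    if not a_has_many and b_has_many:
        return "many to one"
    return "many to many"


print(mapping_cardinality([("a1", "b1"), ("a2", "b2")]))
print(mapping_cardinality([("a1", "b1"), ("a1", "b2")]))
print(mapping_cardinality([("a1", "b1"), ("a2", "b1")]))
print(mapping_cardinality([("a1", "b1"), ("a1", "b2"), ("a2", "b1")]))
```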
[Figure: one-to-one and one-to-many mappings]
Lecture 6
Keys:
• Super Key: A set of one or more attributes which, taken collectively, uniquely identifies
an entity in the entity set. An entity set can have more than one super key.
• Candidate Key: A minimal super key is a candidate key, i.e. the super keys of the
entity set which have no proper subset that is also a super key are called candidate
keys. An entity set can have more than one candidate key.
• Primary Key: A candidate key which is chosen as the key by the database administrator while
implementing the database is called the primary key of the entity set.
• Example: Consider an entity set named “Student” which has the following set of attributes:
1. Student ID
2. Roll Number
3. Name
4. Father’s Name
5. DOB
6. Address
• Every entity of the above entity set will have values for all the fields, but no two
entities, i.e. no two students, will have the same values for all six attributes. So one
super key is the set containing the following attributes:
Super Key1: (Student ID, Roll Number, Name, Father’s Name, DOB, Address)
Also Super Key2: (Student ID, Roll Number, Name, Father’s Name, DOB) will not have the
same values for any two students in the “Student” entity set. Similarly, other super keys of the
above entity set are the following:
Super Key3: (Student ID, Roll Number, Name, Father’s Name)
Super Key4: (Student ID, Roll Number, Name)
Super Key5: (Student ID, Roll Number)
Super Key6: (Student ID)
Also, Roll Number is unique for a student, so
Super Key7: (Roll Number)
Each of them is sufficient to identify a particular student entity in the entity set of all
students.
Super Key6 and Super Key7 are the candidate keys of this entity set, as no proper subset of either
of them is a super key. They are minimal sets of attributes which are super keys.
Candidate Key1: (Student ID)
Candidate Key2: (Roll Number)
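The search for candidate keys in the example can be sketched in Python. Two assumptions are made loudly here: the sample students are invented, only three of the six attributes are used, and uniqueness is checked on sample data only. In practice a key is a constraint declared by the designer for all possible data, so a check like this can refute a proposed key but never prove one.

```python
from itertools import combinations

ATTRIBUTES = ("student_id", "roll_no", "name")

# Hypothetical students; two of them deliberately share a name.
STUDENTS = [
    {"student_id": "S1", "roll_no": 11, "name": "Asha"},
    {"student_id": "S2", "roll_no": 12, "name": "Asha"},
    {"student_id": "S3", "roll_no": 13, "name": "Ravi"},
]


def is_super_key(rows, attrs):
    """True if no two rows agree on all of the given attributes."""
    seen = {tuple(row[a] for a in attrs) for row in rows}
    return len(seen) == len(rows)


def candidate_keys(rows, attributes):
    """Minimal super keys, found by trying attribute sets of growing size."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for attrs in combinations(attributes, size):
            contains_smaller_key = any(set(key) <= set(attrs) for key in keys)
            if is_super_key(rows, attrs) and not contains_smaller_key:
                keys.append(attrs)
    return keys


print(is_super_key(STUDENTS, ("student_id", "roll_no")))  # super key, not minimal
print(is_super_key(STUDENTS, ("name",)))                  # not a super key
print(candidate_keys(STUDENTS, ATTRIBUTES))
```

As in the example, (student_id, roll_no) is a super key but not a candidate key, because each of its attributes is already a super key on its own.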
• The following E-R diagram represents the entity sets and relationships described above.
[Figure: E-R diagram relating the entity sets Payment and Loan through the relationship Loan-Payment; the attribute labels shown are Loan No., Amount, Cust. Id., Payment No., Payment Date and Amount]
• A weak entity set may also be modeled as a multivalued, composite attribute of the
owner entity set. This is appropriate when the
weak entity set participates only in the identifying relationship and has few
attributes. Otherwise modeling it as a weak entity set is more appropriate.
• A weak entity set may have several entities with the same values for all the attributes,
provided they are related to different strong owner entities. But all the weak entities related
to a particular strong owner entity must be distinguishable. The set of attributes which
allows a distinction to be made between the weak entities related to a particular strong entity
is called the partial key or discriminator of the weak entity set.
Lecture 7
Extended E-R Features:
• Specialization: It is the process of identifying the subclasses of an entity set which
differ from the other entities of the set in their attributes or in the relationships they take part in.
Consider the design of a database for an academic institution. While designing, we
identify an entity set “Employee” which represents all the employees of the institution. Its
attributes may be (Employee ID, Employee Name, Date of Joining, Address), but
then we see a subclass of this set of employees, the set of all “Teachers”, which is
different from the other employees. We may place the other employees in subclasses called
“Admin Staff”, “Technical Lab Staff” or “Other Staff” (peons and other workers).
All of these subclasses form a specialization of the class “Employee”. All the attributes and
associations of the class “Employee” are also present in all the subclasses: every teacher
has an Employee ID, Name, DOJ and Address, and so do the admin staff, technical lab
staff and other staff members. Subclasses may have some attributes or associations of
their own which make them different from the others. Teachers have a subject they
teach, a department they belong to and an expertise. Teachers also have
associations with different entities, such as the “Classes” they teach and the “Projects” they guide.
These attributes and associations are not present in the other subclasses of employees.
• Generalization: This is a process similar to specialization, but in the opposite
direction. It is the process of combining subclasses into a general class and moving the
common attributes and associations from the subclasses to the general class. It is just a
different practical approach: in specialization we start from the general classes and form
the special classes out of them, while in generalization we start from various low-level
classes and form general classes by combining several of them after identifying the
features they have in common. So in the above example of the academic institution we may start
by thinking of “Teachers” as an entity set, then “Technical Lab Staff”, then “Admin
Staff”, and then, observing that they have several fields in common such as “Employee ID”,
“Name”, “Address” etc., we combine them to define a general class “Employee” which
has only those attributes which are common to all three classes. These
classes then no longer hold the common fields themselves; the fields are held for all employees
collectively in the general entity set “Employee”.
• The result of both the above processes is the same: a hierarchy of classes and
subclasses which can be represented by a tree structure. The result of generalization and
specialization will look like this:
[Figure: entity set Employee with attributes EMP ID, Name, Address and DOJ, connected to its subclasses through ISA]
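[Figure: E-R diagram with the attribute labels Cust-name, Loan Number and Address, and the relationship Loan-officer connected to entity set Employee (attributes Emp-ID, Emp-name, Address)]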
o The diagram above may suggest that the relationships Borrower and Loan-Officer
can be combined into one. But that would require a loan officer to be
associated with every customer-loan pair, which is not true.
o The above diagram also has redundancy, as every customer-loan pair in “Loan-officer”
is also in “Borrower”.
• A more appropriate way of representing the above set of relationships is to consider
the entire relationship Borrower, with its associated entities Customer and Loan, as an
entity, i.e. an aggregate entity, and then to represent the relationship Loan-Officer between
the entity “Employee” and this aggregate entity, as follows:
[Figure: relationship Loan-officer connecting entity set Employee (attributes Emp-ID, Emp-name, Address) to the aggregate of Borrower, Customer and Loan]
Lecture 8
Reduction of ER Schema to Tables:
• Strong Entity Sets: These are the entity sets which have a set of attributes
forming a primary key (or simply a key). To represent such an entity set as a table,
we have a column in the table for each attribute of the entity schema. Each entity of
the entity set is represented by a row in the table having a value for each attribute.
For example, the entity set BOOK referred to earlier has the following attributes: Acc. No.,
Call No., Title, Author, Publisher and Yr. of Publication. Also suppose that we have
only two books in the library: “Database System Concepts” by Silberschatz,
Korth and Sudarshan, published in 1997, with accession number 312 and call no.
245, and “Fundamental of Database Systems” by Elmasri and Navathe,
published in 1999, with accession no. 433 and call no. 23. Then the entity set
BOOK will be represented in tabular form as follows:
Acc. No.  Call No.  Title                            Author                          Publisher       Yr. of Publication
312       245       Database System Concepts         Silberschatz, Korth, Sudarshan  McGrawHill      1997
433       23        Fundamental of Database Systems  Elmasri, Navathe                Addison Wesley  1999
• Weak Entity Sets: These are the entity sets whose entities cannot be distinguished by
looking at their own attributes alone; we must be able to establish a link between a weak entity and
an entity from another, strong entity set, which is called the owner of the weak entity
set. Such an entity set, when represented in tabular form, has columns for the key
attributes of the owner in addition to the columns for its own attributes.
For example: Consider the PAYMENT entity set, which is a weak entity set dependent on
its owner entity set LOAN. LOAN has Loan No. and Loan Amount as its attributes, with
Loan No. as the key attribute, and PAYMENT has Payment No., Payment Amount and
Payment Date as its attributes (no key, as it is a weak entity set, but Payment No. is
a partial key). Consider that a table exists for LOAN containing the loans L-1 and L-2.
Also consider that a payment of Rs. 100 is made for L-1 on 22/08/2010 as its first
payment, and a payment of Rs. 300 is made for L-2 on 25/08/2010 as its first payment. Then the
table corresponding to entity set PAYMENT will look like this:
Loan No.  Payment No.  Payment Date  Payment Amount
L-1       1            22/08/2010    100
L-2       1            25/08/2010    300
Notice: We have included Loan No. as a column even though it is not an attribute of the
entity set PAYMENT, because it is the key attribute of the owner entity set LOAN. Loan
No. and Payment No. in combination form the primary key of this table.
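The composite primary key of the PAYMENT table can be tried out with Python's sqlite3 module. In this sketch the table and column names are chosen for the illustration and the loan amounts are hypothetical, since the notes do not give them; the two payments are the ones described above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE loan (loan_no TEXT PRIMARY KEY, loan_amount INTEGER);
    CREATE TABLE payment (
        loan_no        TEXT NOT NULL REFERENCES loan(loan_no),  -- key of the owner
        payment_no     INTEGER NOT NULL,                        -- partial key
        payment_date   TEXT,
        payment_amount INTEGER,
        PRIMARY KEY (loan_no, payment_no)
    );
    INSERT INTO loan VALUES ('L-1', 1000);  -- loan amounts are hypothetical
    INSERT INTO loan VALUES ('L-2', 2000);
    INSERT INTO payment VALUES ('L-1', 1, '22/08/2010', 100);
    INSERT INTO payment VALUES ('L-2', 1, '25/08/2010', 300);
    """
)

# Payment No. 1 exists for both loans, but not twice for the same loan.
try:
    conn.execute("INSERT INTO payment VALUES ('L-1', 1, '30/08/2010', 50)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

payment_count = conn.execute("SELECT COUNT(*) FROM payment").fetchone()[0]
print(duplicate_rejected, payment_count)
```

Payment No. alone cannot be the key, because both loans have a payment numbered 1; only the combination with Loan No. is unique.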
• Relationships: To represent a relationship of an ERD in tabular form, we have a column
for the key attributes of each participating entity set, plus a column for each
attribute which is directly associated with the relationship set.
For example: We defined earlier the relationship BORROWED BY, which exists
between the entity sets BOOK and USER. It has an attribute Date of Issue directly
associated with it. The tabular representation of BORROWED BY will have a column for
Acc. No., corresponding to the key of BOOK, a column for Card No.,
corresponding to the key of USER, and a column for Date of Issue, corresponding to the attribute of the
relationship set. The table may look like this, where the rows represent all the borrowings
currently recorded in the library:
Book Acc. No. User Card No. Date of Issue
312 422 03/08/2010
433 4322 05/08/2010
• Existentially Dependent Entity Sets: The existence of every entity of a
dependent entity set depends on the existence of some entity of its dominant entity set. We may therefore
remove the table representing the relationship between an existentially
dependent entity set and its dominant entity set by adding, to the table for the dependent
entity set, a column corresponding to the key of the dominant entity set.
For example: ACCOUNT, with attributes Account No. (key) and Balance, is
existentially dependent on BRANCH, with attributes Branch Id (key) and Address. So
the table for the relationship BRANCH-ACCOUNT, which associates accounts with
branches, may be removed by adding a column named Branch Id to the table
representing ACCOUNT. The table will then have the following columns:
Account No.  Balance  Branch Id
• Identifying Relationship Sets: These are the relationship sets, represented as doubly
outlined diamonds in an ERD, which associate a weak entity set with its owner. Since
we have already included the primary key of the strong owner entity set in the table of the weak
entity set, we do not require a separate table to represent the identifying relationship.
• Multivalued Attributes: An attribute of an entity which can have more than one value
is called a multivalued attribute. Such attributes are marked by doubly outlined ovals in an ERD.
For example: Consider “Dependants”, an attribute of an EMPLOYEE. Since an employee may have
more than one dependant, we represent this as a multivalued attribute
of EMPLOYEE. But if we represent it as a column in the table for the entity set, we
will not be able to put all of the values in one row. A multivalued attribute is therefore represented as
a separate table, similar to a weak entity set, with a column corresponding
to the primary key of the entity and a column corresponding to each sub-attribute of the
multivalued attribute (there will be sub-attributes when the multivalued attribute is itself a
composite attribute). The ERD and corresponding tables for such a case are shown below:
[Figure: entity set EMPLOYEE with attributes EMPID and EMPName and the multivalued composite attribute Dependent, whose sub-attributes are Dependent No., Name and Relation]
Employee Table
EMPID EMPName
EMP-1 Rajan
EMP-2 Sartaj
Dependant Table
EMPID Dependent No. Name Relation
EMP-1 1 Shashi Mother
EMP-1 2 Rachna Spouse
EMP-2 1 Shahina Spouse
EMP-2 2 Rehman Son
Here we see that Rajan has two dependants, his mother Shashi and his wife Rachna, and Sartaj
also has two dependants, his wife Shahina and his son Rehman.
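To see that nothing is lost by moving the multivalued attribute into its own table, the two tables above can be recombined. The following Python sketch copies the rows from the Employee and Dependant tables; the function name and the plain-Python representation are choices made for this illustration.

```python
from collections import defaultdict

# Rows copied from the Employee and Dependant tables above.
employees = {"EMP-1": "Rajan", "EMP-2": "Sartaj"}
dependants = [
    ("EMP-1", 1, "Shashi", "Mother"),
    ("EMP-1", 2, "Rachna", "Spouse"),
    ("EMP-2", 1, "Shahina", "Spouse"),
    ("EMP-2", 2, "Rehman", "Son"),
]


def dependants_by_employee(employee_table, dependant_table):
    """Regroup the separate table into one list of dependants per employee."""
    grouped = defaultdict(list)
    # Sorting orders the rows by (EMPID, Dependent No.), the key of the table.
    for emp_id, _dep_no, name, relation in sorted(dependant_table):
        grouped[employee_table[emp_id]].append((name, relation))
    return dict(grouped)


result = dependants_by_employee(employees, dependants)
for employee, deps in result.items():
    print(employee, deps)
```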
[Figure: entity set ACCOUNT connected through ISA to the entity sets SAVING and CURRENT]
It shows a general class of entities ACCOUNT which has two special classes, SAVING and
CURRENT, referring to savings bank accounts and current bank accounts.
It can be represented in the following way:
1. Tables for general case:
ACCOUNT
AccNo Balance
SAVING:
AccNo InterestRate
CURRENT
AccNo OverdraftAmount
CURRENT:
AccNo Balance OverdraftAmount
[Figure: relationship Loan-officer connected to entity set Employee (attributes Emp-ID, Emp-name, Address)]
Lecture Notes For DBMS and Data Mining and Data Warehousing
Lecture 9
Relational Model
A relational database consists of a collection of tables, each of which is assigned a unique
name.
A row in a table represents a relationship among a set of values.
A table is a collection of such rows or relationships, which makes it similar to a mathematical
relation, i.e. a set of tuples.
Basic Structure
A mathematical binary relation is an association of values from one set to another set. For
example, the less-than relation associates a set of integers with another set of integers. Consider
A = {1, 2, 3, …} and B = {1, 2, 3, …}.
The less-than relation from A to B = {<x,y> | x belongs to A, y belongs to B and x < y}
= {<1,2>, <1,3>, <2,3>, …}
The Cartesian product of A and B, in contrast, contains all tuples <x,y> where x belongs to
A and y belongs to B, i.e. A × B = {<x,y> | x belongs to A and y belongs to B}.
Now we can observe that any valid table representing the entity set BOOK for a library, given
the above restriction on domains, will contain only a subset of the rows of the above table
which represents A × B × C. For example, a valid entity set for all books in the library
can be
BOOK   acc-no   title        author
       100      “DBMS”       “Silberschatz”
       200      “DBMS”       “Ramanuj”
       300      “COMPILER”   “Silberschatz”
So we can say that any table of the relational model is actually similar to a mathematical
relation.
Every row of such a relational table is similar to a tuple of a mathematical relation. Let the
tuple variable ‘t’ refer to the first tuple (first row) of the BOOK table above; then
we can refer to the various elements of the tuple as t[acc-no] = 100, t[title] = “DBMS” and
t[author] = “Silbershatz”.
Query Languages
A language in which a user requests information from the database is called a query
language.
o Procedural- user instructs the system to perform a sequence of operations on the
database to compute the desired result. E.g. Relational algebra
o Nonprocedural- user describes the information desired without giving a specific
procedure for obtaining the desired information. E.g. tuple calculus, domain
calculus.
Lecture 10
Relational Algebra:
Basic operations:
o Selection (σ) Selects a subset of rows from relation.
o Projection (π) Selects a subset of columns from relation.
o Cross-product (×) Allows us to combine two relations.
o Set-difference (−) Tuples in reln. 1, but not in reln. 2.
o Union (∪) Tuples in reln. 1 or in reln. 2 (or both).
o Rename (ρ) Use new name for the Tables or fields.
Additional operations:
o Intersection (∩), join (⋈), division (÷): Not essential, but (very!) useful.
Since each operation returns a relation, operations can be composed! (Algebra is
“closed”.)
Projection
Deletes attributes that are not in projection list.
Schema of result contains exactly the fields in the projection list, with the same names
that they had in the (only) input relation. ( Unary Operation)
Projection operator has to eliminate duplicates! (as it returns a relation which is a set)
o Note: real systems typically don’t do duplicate elimination unless the user
explicitly asks for it. (Duplicate values may be representing different real world
entity or relationship)
Consider the BOOK table:
Acc-No Title Author
100 “DBMS” “Silbershatz”
200 “DBMS” “Ramanuj”
300 “COMPILER” “Silbershatz”
400 “COMPILER” “Ullman”
500 “OS” “Sudarshan”
600 “DBMS” “Silbershatz”
πTitle(BOOK) =
Title
“DBMS”
“COMPILER”
“OS”
Selection
Selects rows that satisfy selection condition.
No duplicates in result! (Why?)
Schema of result identical to schema of (only) input relation.
Result relation can be the input for another relational algebra operation! (Operator
composition.)
σAcc-no>300(BOOK) =
Acc-No Title Author
400 “COMPILER” “Ullman”
500 “OS” “Sudarshan”
600 “DBMS” “Silbershatz”
σTitle=”DBMS”(BOOK)=
Acc-No Title Author
100 “DBMS” “Silbershatz”
200 “DBMS” “Ramanuj”
600 “DBMS” “Silbershatz”
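Selection and projection on the BOOK table can be sketched in a few lines of Python. This is only an illustration: rows are modelled as tuples, and the helper names `select` and `project_title` are assumptions, not part of any database API.

```python
# BOOK rows as (acc_no, title, author) tuples, copied from the table above
BOOK = [
    (100, "DBMS", "Silbershatz"),
    (200, "DBMS", "Ramanuj"),
    (300, "COMPILER", "Silbershatz"),
    (400, "COMPILER", "Ullman"),
    (500, "OS", "Sudarshan"),
    (600, "DBMS", "Silbershatz"),
]

def select(rows, predicate):
    """Selection: keep the rows that satisfy the predicate; the schema is unchanged."""
    return [row for row in rows if predicate(row)]

def project_title(rows):
    """Projection on Title: keep one column; using a set removes duplicates."""
    return {title for (_acc_no, title, _author) in rows}

acc_over_300 = select(BOOK, lambda r: r[0] > 300)
dbms_books = select(BOOK, lambda r: r[1] == "DBMS")
titles = project_title(BOOK)
```

Because `select` returns rows of the same shape as its input, its result can be fed straight into `project_title`, which is the operator composition mentioned above.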
List of customers who are either borrowers or depositors at the bank = πCust-name (Borrower) ∪
πCust-name (Depositor) =
Cust-name
Ram
Shyam
Suleman
Radeshyam
Customers who are both borrowers and depositors = πCust-name (Borrower) ∩ π Cust-name
(Depositor)=
Cust-name
Ram
Suleman
Customers who are borrowers but not depositors = πCust-name (Borrower) − πCust-name
(Depositor) =
Cust-name
Shyam
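The three results above can be reproduced with Python sets. The borrower names are those of the Borrower table used in these notes; the depositor names are an assumption, inferred from the results shown above because the Depositor table itself is not listed.

```python
# Projections on Cust-name, written directly as sets of names
borrower_names = {"Ram", "Shyam", "Suleman"}
depositor_names = {"Ram", "Suleman", "Radeshyam"}  # assumed: inferred from the results above

either = borrower_names | depositor_names          # union
both = borrower_names & depositor_names            # intersection
only_borrowers = borrower_names - depositor_names  # set difference
```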
Lecture-11
Cartesian-Product or Cross-Product (S1 × R1)
Each row of S1 is paired with each row of R1.
Result schema has one field per field of S1 and R1, with field names ‘inherited’ if
possible.
Consider the borrower and loan tables as follows:
Borrower: Loan:
Cust-name Loan-no Loan-no Amount
Ram L-13 L-13 1000
Shyam L-30 L-30 20000
Suleman L-42 L-42 40000
The rename operation can be used to rename the fields to avoid confusion when two field
names are same in two participating tables:
Loan-borrower:
Cust-name Loan-No-1 Loan-No-2 Amount
Ram L-13 L-13 1000
Ram L-13 L-30 20000
Ram L-13 L-42 40000
Shyam L-30 L-13 1000
Shyam L-30 L-30 20000
Shyam L-30 L-42 40000
Suleman L-42 L-13 1000
Suleman L-42 L-30 20000
Suleman L-42 L-42 40000
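The Loan-borrower table above can be generated mechanically. A small Python sketch, with rows modelled as tuples (the variable names are illustrative):

```python
from itertools import product

# Rows copied from the Borrower and Loan tables above
borrower = [("Ram", "L-13"), ("Shyam", "L-30"), ("Suleman", "L-42")]
loan = [("L-13", 1000), ("L-30", 20000), ("L-42", 40000)]

# Cross-product: each Borrower row is paired with each Loan row.
# Result columns: (Cust-name, Loan-No-1, Loan-No-2, Amount)
loan_borrower = [b + l for b, l in product(borrower, loan)]

for row in loan_borrower:
    print(row)
```

With 3 rows on each side the product has 3 × 3 = 9 rows, most of which pair a customer with a loan that is not theirs.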
Rename Operation:
It can be used in two ways :
o ρx (E) returns the result of expression E under the table name x.
o ρx(A1, A2,…, An) (E) returns the result of expression E under the table name x, with
the attributes renamed to A1, A2,…, An.
o Its benefit can be seen in the solution of the query “Find the largest
account balance in the bank”.
It can be solved by following steps:
1. Find out the relation of those balances which are not largest.
a. Consider the Cartesian product of Account with itself, i.e. Account
× Account.
b. Compare the balances of the first Account table with the balances of the
second Account table in the product.
c. For that we must rename one of the Account tables to
avoid confusion between the two copies.
d. It can be done by the following operation:
ΠAccount.balance (σAccount.balance < d.balance (Account × ρd(Account)))
e. The above relation contains the balances which are not the
largest.
2. Subtract this relation from the relation containing all the balances, i.e.
Πbalance (Account).
3. So the final expression for solving the above query is
Πbalance (Account) − ΠAccount.balance (σAccount.balance < d.balance (Account × ρd(Account)))
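The steps above can be traced on a small example. This Python sketch uses a hypothetical Account relation of three rows (the account numbers and balances are invented for illustration); the second loop variable plays the role of the renamed copy d.

```python
# Hypothetical Account relation: (account_number, balance)
account = [("A-101", 500), ("A-102", 900), ("A-103", 700)]

# All balances: projection of Account on balance
all_balances = {balance for (_acc_no, balance) in account}

# Step 1: balances that are smaller than some balance in the renamed copy d,
# i.e. the projection of the selection on Account x d
not_largest = {
    a_balance
    for (_a_no, a_balance) in account   # first copy of Account
    for (_d_no, d_balance) in account   # second copy, renamed to d
    if a_balance < d_balance
}

# Step 2: set difference leaves only the largest balance
largest = all_balances - not_largest
```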
Additional Operations
Natural Join ( ⋈ )
Forms the Cartesian product of its two arguments, performs a selection forcing equality
on those attributes that appear in both relations, and keeps only one copy of each such attribute.
For example, consider the Borrower and Loan relations. The natural join
Borrower ⋈ Loan automatically performs a selection on the table returned
by Borrower × Loan, forcing equality on the attribute that appears in both
Borrower and Loan, i.e. Loan-no, and keeps only one column named
Loan-no.
That means Borrower ⋈ Loan is σBorrower.Loan-no = Loan.Loan-no (Borrower × Loan) with one of
the two Loan-no columns removed.
The result is obtained as follows. Start from Borrower × Loan:
Borrower.Cust-name Borrower.Loan-no Loan.Loan-no Loan.Amount
Ram L-13 L-13 1000
Ram L-13 L-30 20000
Ram L-13 L-42 40000
Shyam L-30 L-13 1000
Shyam L-30 L-30 20000
Shyam L-30 L-42 40000
Suleman L-42 L-13 1000
Suleman L-42 L-30 20000
Suleman L-42 L-42 40000
Eliminate the rows that do not satisfy the selection criterion “Borrower.Loan-no = Loan.Loan-no”.
Only three rows remain:
Ram L-13 L-13 1000
Shyam L-30 L-30 20000
Suleman L-42 L-42 40000
Finally, remove one of the two columns named Loan-no.
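The same natural join can be sketched in Python as a product, a filter on the shared attribute, and a projection that keeps Loan-no only once. Rows are modelled as tuples and the variable names are illustrative.

```python
# Rows copied from the Borrower and Loan tables used above
borrower = [("Ram", "L-13"), ("Shyam", "L-30"), ("Suleman", "L-42")]
loan = [("L-13", 1000), ("L-30", 20000), ("L-42", 40000)]

# Natural join on Loan-no: pair the rows, keep the pairs whose loan numbers
# agree, and emit the shared column once
borrower_join_loan = [
    (cust_name, b_loan_no, amount)
    for (cust_name, b_loan_no) in borrower
    for (l_loan_no, amount) in loan
    if b_loan_no == l_loan_no
]
```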
Division Operation:
The division operation, denoted by ÷, is used for queries that include the phrase “for all”.
For example, “Find the customers who have an account at all branches located in the city of
Agra”. This query can be solved by the following expression:
ΠCustomer-name, branch-name (Depositor ⋈ Account) ÷ Πbranch-name (σBranch-city=”Agra” (Branch))
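Division is easiest to understand on concrete data. The sketch below is a minimal Python version for a binary relation divided by a unary one; the function name `divide` and the sample rows are hypothetical and chosen only to illustrate the "for all" behaviour.

```python
def divide(r, s):
    """r is a set of (x, y) pairs, s is a set of y values.
    Returns every x that is paired in r with ALL the y values in s."""
    xs = {x for (x, _y) in r}
    return {x for x in xs if all((x, y) in r for y in s)}

# Hypothetical result of the projection on (customer-name, branch-name)
customer_branch = {
    ("Ram", "Sadar"),
    ("Ram", "Sanjay Place"),
    ("Shyam", "Sadar"),
    ("Suleman", "Dayalbagh"),
}

# Hypothetical set of branches located in Agra
agra_branches = {"Sadar", "Sanjay Place"}

customers_at_all_agra_branches = divide(customer_branch, agra_branches)
```

Only Ram appears with every Agra branch, so only Ram is in the quotient.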
Lecture-12
Tuple Relational Calculus
Relational algebra is an example of procedural language while tuple relational
calculus is a nonprocedural query language.
A query is specified as:
{t | P(t)}, i.e it is the set of all tuples t such that predicate P is true for t.
The formula P(t) is formed using atoms which uses the relations, tuples of
relations and fields of tuples and following symbols
o ∈ (belongs to), <, >, ≤, ≥, ≠, = (comparison operators)
These atoms can then be used to form formulas with the following symbols
o ∀ (universal quantifier, generally read as "for all")
o ∃ (existential quantifier, generally read as "there exists")
o ∧ (and), ∨ (or), ¬ (not)
For example, here are some queries and a way to express them using tuple
calculus:
o Find the branch-name, loan-number and amount for loans over Rs 1200.
{t | t ∈ Loan ∧ t[amount] > 1200}
o Find the loan number for each loan of an amount greater than Rs 1200.
{t | ∃ s ∈ Loan (t[loan-number] = s[loan-number] ∧ s[amount] > 1200)}
o Find the names of all the customers who have a loan from the Sadar
branch.
{t | ∃ s ∈ Borrower (t[customer-name] = s[customer-name] ∧
∃ u ∈ Loan (u[loan-number] = s[loan-number]
∧ u[branch-name] = "Sadar"))}
o Find all customers who have a loan, an account, or both at the bank
{t | ∃ s ∈ Borrower (t[customer-name] = s[customer-name])
∨ ∃ u ∈ Depositor (t[customer-name] = u[customer-name])}
o Find only those customers who have both an account and a loan.
{t | ∃ s ∈ Borrower (t[customer-name] = s[customer-name])
∧ ∃ u ∈ Depositor (t[customer-name] = u[customer-name])}
o Find all customers who have an account but do not have a loan.
{t | ∃ u ∈ Depositor (t[customer-name] = u[customer-name]) ∧
¬ ∃ s ∈ Borrower (t[customer-name] = s[customer-name])}
o Find all customers who have an account at all branches located in Agra
{t | ∀ w ∈ Branch (w[branch-city] = "Agra" ⇒
∃ s ∈ Depositor (t[customer-name] = s[customer-name]
∧ ∃ u ∈ Account (u[account-number] = s[account-number]
∧ u[branch-name] = w[branch-name])))}
Domain Relational Calculus
In domain relational calculus the variables range over attribute values rather than over
whole tuples. The same kinds of queries are written as follows:
o Find the loan number for each loan of an amount greater than Rs 1200.
{<l> | ∃ b, a (<b, l, a> ∈ Loan ∧ a > 1200)}
o Find the names of all the customers who have a loan from the Sadar branch and
find the loan amount
{<c, a> | ∃ l (<c, l> ∈ Borrower
∧ ∃ b (<b, l, a> ∈ Loan ∧ b = "Sadar"))}
o Find names of all customers who have a loan, an account, or both at the Sadar
Branch
{<c> | ∃ l (<c, l> ∈ Borrower ∧ ∃ b, a (<b, l, a> ∈ Loan ∧ b = "Sadar"))
∨ ∃ a (<c, a> ∈ Depositor ∧ ∃ b, n (<b, a, n> ∈ Account ∧ b = "Sadar"))}
o Find only those customers who have both an account and a loan.
{<c> | ∃ l (<c, l> ∈ Borrower) ∧ ∃ a (<c, a> ∈ Depositor)}
o Find all customers who have an account but do not have a loan.
{<c> | ∃ a (<c, a> ∈ Depositor) ∧ ¬ ∃ l (<c, l> ∈ Borrower)}
o Find all customers who have an account at all branches located in Agra
{<c> | ∀ x, y, z (<x, y, z> ∈ Branch ∧ y = "Agra" ⇒
∃ a, b (<x, a, b> ∈ Account ∧ <c, a> ∈ Depositor))}
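Calculus expressions map closely onto comprehensions: ∃ corresponds to `any` and ∀ to `all`. The Python sketch below evaluates the last query on hypothetical data (all rows are invented). One assumption is made explicit in the code: the candidate customers are restricted to names that occur in Depositor, so that the result stays finite.

```python
# Hypothetical relations, with the attribute orders used in the expressions above
branch = [("Sadar", "Agra", 9000), ("Sanjay Place", "Agra", 7000),
          ("Civil Lines", "Mathura", 5000)]            # <x, y, z>
account = [("Sadar", "A-1", 500), ("Sanjay Place", "A-2", 700),
           ("Sadar", "A-3", 300), ("Civil Lines", "A-4", 800)]  # <x, a, b>
depositor = {("Ram", "A-1"), ("Ram", "A-2"),
             ("Shyam", "A-3"), ("Shyam", "A-4")}       # <c, a>

def has_account_at(customer, branch_name):
    """Exists a, b: <branch_name, a, b> in Account and <customer, a> in Depositor."""
    return any(
        acc_branch == branch_name and (customer, acc_no) in depositor
        for (acc_branch, acc_no, _balance) in account
    )

# Assumption: only customers that appear in Depositor are considered
candidates = {c for (c, _a) in depositor}

# For all branches <x, y, z>: y = "Agra" implies the customer has an account at x
result = {
    c for c in candidates
    if all(has_account_at(c, x) for (x, y, _z) in branch if y == "Agra")
}
```

The implication is encoded by filtering on `y == "Agra"` inside `all`: branches outside Agra place no requirement on the customer.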
Outer Join
Outer join operation is an extension of the join operation to deal with missing information.
Suppose that we have the following relational schemas:
Fulltime-works
employee-name branch-name salary
Ram Sadar 30000
Shyam Sanjay Place 20000
Rehman Dayalbagh 40000
Aggregate Functions
Aggregate functions are functions that take a collection of values and return a single
value as a result.
Examples are sum, avg, count, max, min.
Find the total balance of all the accounts
sumbalance(Account).
Find the no of borrowers
countcustomer-name(Borrower)
Find the number of distinct customers who are either borrowers or depositors.
count-distinctcustomer-name(Borrower ⋃ Depositor)
The aggregate functions can be applied to subgroups of the rows of a table, rather than
to all of its rows, using the grouping operator, denoted by the symbol 𝒢.
For example, suppose we want to find the total salary of all the full-time employees branch-wise. It
can be specified as follows:
branch-name 𝒢 sum(salary) (Fulltime-works)
Fulltime-works
employee-name branch-name salary
Ram Sadar 30000
Shyam Sanjay Place 20000
Rehman Dayalbagh 40000
Suleman Sadar 25000
The rows fall into three groups:
Group 1: branch-name = Sadar (Ram, Suleman)
Group 2: branch-name = Sanjay Place (Shyam)
Group 3: branch-name = Dayalbagh (Rehman)
The result of aggregate function with grouping specified above will be:
branch-name sum of salary
Sadar 55000
Sanjay Place 20000
Dayalbagh 40000
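The grouping step can be sketched in Python with a dictionary keyed by branch name. The rows are those of the Fulltime-works table above; the variable names are illustrative.

```python
from collections import defaultdict

# Rows of Fulltime-works: (employee_name, branch_name, salary)
fulltime_works = [
    ("Ram", "Sadar", 30000),
    ("Shyam", "Sanjay Place", 20000),
    ("Rehman", "Dayalbagh", 40000),
    ("Suleman", "Sadar", 25000),
]

# Group by branch name and sum the salaries within each group
salary_by_branch = defaultdict(int)
for _employee, branch_name, salary in fulltime_works:
    salary_by_branch[branch_name] += salary

for branch_name, total in salary_by_branch.items():
    print(branch_name, total)
```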
Lecture-13
Structured Query Language (SQL)
Introduction
Commercial database systems use a more user-friendly language to specify queries.
SQL is the most influential commercially marketed query language.
Other commercially used languages are QBE, Quel, and Datalog.
Basic Structure
The basic structure of an SQL query consists of three clauses: select, from and where.
select: it corresponds to the projection operation of relational algebra. Used to list the
attributes desired in the result.
from: corresponds to the Cartesian product operation of relational algebra. Used to list
the relations to be scanned in the evaluation of the expression
where: corresponds to the selection predicate of the relational algebra. It consists of a
predicate involving attributes of the relations that appear in the from clause.
A typical SQL query has the form:
select A1, A2,…, An
from r1, r2,…, rm
where P
o Ai represents an attribute
o rj represents a relation
o P is a predicate
o It is equivalent to following relational algebra expression:
o ΠA1 ,A2,…,An (σP (r1 × r2 ×…×rm ))
[Note: The words marked in dark in this text work as keywords in SQL language. For example
“select”, “from” and “where” in the above paragraph are shown in bold font to indicate that
they are keywords]
Select Clause
Let us see some simple queries and use of select clause to express them in SQL.
The asterisk “*” can be used to denote “all attributes”. The following SQL statement will
select all the attributes of Loan.
select *
from Loan
The arithmetic expressions involving operators, +, -, *, and / are also allowed in select
clause. The following statement will return the amount multiplied by 100 for the rows in
Loan table.
select branch-name, loan-number, amount * 100
from Loan
Where Clause
Find all loan numbers for loans made at “Sadar” branch with loan amounts greater than
Rs 1200.
select loan-number
from Loan
where branch-name= “Sadar” and amount > 1200
The where clause uses the logical connectives and, or, and not.
The operands of the logical connectives can be expressions involving the comparison
operators <, <=, >, >=, =, and <>.
between can be used to simplify the comparisons
select loan-number
from Loan
where amount between 90000 and 100000
From Clause
The from clause by itself defines a Cartesian product of the relations in the clause.
When an attribute is present in more than one relation they can be referred as relation-
name.attribute-name to avoid the ambiguity.
For all customers who have loan from the bank, find their names and loan numbers
select distinct customer-name, Borrower.loan-number
from Borrower, Loan
where Borrower.loan-number = Loan.loan-number
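The query above can be tried out with Python's built-in sqlite3 module. This is a sketch under stated assumptions: hyphens are not valid in unquoted SQL identifiers, so the attribute names are written with underscores, string literals use single quotes, and the branch names in the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Borrower (customer_name TEXT, loan_number TEXT);
CREATE TABLE Loan (branch_name TEXT, loan_number TEXT, amount INTEGER);
INSERT INTO Borrower VALUES ('Ram', 'L-13'), ('Shyam', 'L-30'), ('Suleman', 'L-42');
INSERT INTO Loan VALUES ('Sadar', 'L-13', 1000), ('Sadar', 'L-30', 20000),
                        ('Dayalbagh', 'L-42', 40000);
""")

# The from clause alone yields the Cartesian product of the two relations
product_size = conn.execute("SELECT count(*) FROM Borrower, Loan").fetchone()[0]

# Adding the where clause keeps only the matching pairs
rows = conn.execute("""
    SELECT DISTINCT customer_name, Borrower.loan_number
    FROM Borrower, Loan
    WHERE Borrower.loan_number = Loan.loan_number
    ORDER BY customer_name
""").fetchall()
```

Comparing `product_size` with `len(rows)` shows how much of the product the where clause discards.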
String Operation
Two special characters are used for pattern matching in strings:
o Percent ( % ) : The % character matches any substring
o Underscore ( _ ): The _ character matches any single character
“%Mandi” will match the strings ending with “Mandi”, e.g. “Raja Ki Mandi”,
“Peepal Mandi”
“_ _ _” matches any string of exactly three characters.
Find the names of all customers whose street address includes the substring “Main”
select customer-name
from Customer
where customer-street like “%Main%”
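The two patterns can be tested with sqlite3 from Python. The customer rows are hypothetical, and identifiers use underscores instead of hyphens. One caveat: SQLite's like is case-insensitive for ASCII letters by default, which may differ from other database systems.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customer (customer_name TEXT, customer_street TEXT)")
conn.executemany(
    "INSERT INTO Customer VALUES (?, ?)",
    [
        ("Ram", "12 Main Road"),        # hypothetical sample rows
        ("Shyam", "Peepal Mandi"),
        ("Suleman", "Raja Ki Mandi"),
        ("Rehman", "MG Road"),
    ],
)

def names_matching(pattern):
    """Return the customer names whose street matches the given like pattern."""
    cursor = conn.execute(
        "SELECT customer_name FROM Customer "
        "WHERE customer_street LIKE ? ORDER BY customer_name",
        (pattern,),
    )
    return [row[0] for row in cursor]

contains_main = names_matching("%Main%")
ends_with_mandi = names_matching("%Mandi")
```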
Lecture-14
Set Operations
union, intersect and except operations are set operations available in SQL.
Relations participating in any of the set operation must be compatible; i.e. they must have
the same set of attributes.
Union Operation:
o Find all customers having a loan, an account, or both at the bank
(select customer-name
from Depositor )
union
(select customer-name
from Borrower )
It will automatically eliminate duplicates.
o If we want to retain duplicates union all can be used
(select customer-name
from Depositor )
union all
(select customer-name
from Borrower )
Intersect Operation
o Find all customers who have both an account and a loan at the bank
(select customer-name
from Depositor )
intersect
(select customer-name
from Borrower )
o If we want to retain all the duplicates
(select customer-name
from Depositor )
intersect all
(select customer-name
from Borrower )
Except Operation
o Find all customers who have an account but no loan at the bank
(select customer-name
from Depositor )
except
(select customer-name
from Borrower )
o If we want to retain the duplicates:
(select customer-name
from Depositor )
except all
(select customer-name
from Borrower )
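The set operations can be exercised with sqlite3. Several things here are assumptions: the sample names are illustrative, identifiers use underscores, and the parentheses around each select are dropped because SQLite does not accept them in compound queries. SQLite also does not implement intersect all or except all, so only the other four forms are shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Depositor (customer_name TEXT);
CREATE TABLE Borrower (customer_name TEXT);
INSERT INTO Depositor VALUES ('Ram'), ('Suleman'), ('Radeshyam');
INSERT INTO Borrower VALUES ('Ram'), ('Shyam'), ('Suleman');
""")

def names(sql):
    """Run a query returning one column and return its values sorted."""
    return sorted(row[0] for row in conn.execute(sql))

D = "SELECT customer_name FROM Depositor"
B = "SELECT customer_name FROM Borrower"

union_names = names(f"{D} UNION {B}")          # duplicates eliminated
union_all_names = names(f"{D} UNION ALL {B}")  # duplicates retained
intersect_names = names(f"{D} INTERSECT {B}")
except_names = names(f"{D} EXCEPT {B}")        # account but no loan
```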
Aggregate Functions
Aggregate functions are those functions which take a collection of values as input and
return a single value.
SQL offers 5 built in aggregate functions-
o Average: avg
o Minimum:min
o Maximum:max
o Total: sum
o Count:count
The input to sum and avg must be a collection of numbers but others may have
collections of non-numeric data types as input as well
Find the average account balance at the Sadar branch
select avg(balance)
from Account
where branch-name= “Sadar”
The result will be a table which contains a single cell (one row and one column) holding the
numerical value of the average balance of all accounts at the Sadar branch.
group by clause is used to form groups, tuples with the same value on all attributes in
the group by clause are placed in one group.
Find the average account balance at each branch
select branch-name, avg(balance)
from Account
group by branch-name
By default the aggregate functions include the duplicates.
The distinct keyword is used to eliminate duplicates in an aggregate function:
Find the number of depositors for each branch
select branch-name, count(distinct customer-name)
from Depositor, Account
where Depositor.account-number = Account.account-number
group by branch-name
having clause is used to state condition that applies to groups rather than tuples.
Find the average account balance at each branch where average account balance is more
than Rs. 1200
select branch-name, avg(balance)
from Account
group by branch-name
having avg(balance) > 1200
Count the number of tuples in Customer table
select count(*)
from Customer
SQL doesn’t allow distinct with count(*)
When where and having are both present in a statement where is applied before having.
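The effect of group by and having can be seen on a small Account table using sqlite3. The balances are hypothetical and the identifiers use underscores instead of hyphens.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Account (branch_name TEXT, account_number TEXT, balance INTEGER);
INSERT INTO Account VALUES
  ('Sadar', 'A-1', 1000), ('Sadar', 'A-2', 2000),
  ('Dayalbagh', 'A-3', 800), ('Dayalbagh', 'A-4', 1000),
  ('Sanjay Place', 'A-5', 3000);
""")

# Average balance per branch (one output row per group)
per_branch = conn.execute("""
    SELECT branch_name, avg(balance) FROM Account
    GROUP BY branch_name ORDER BY branch_name
""").fetchall()

# having filters whole groups after the averages have been computed
above_1200 = conn.execute("""
    SELECT branch_name, avg(balance) FROM Account
    GROUP BY branch_name
    HAVING avg(balance) > 1200
    ORDER BY branch_name
""").fetchall()
```

Dayalbagh is dropped by the having clause because its group average, not any individual balance, is below the threshold.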
Nested Subqueries
A subquery is a select-from-where expression that is nested within another query.
Set Membership
o The in and not in connectives are used for this type of subquery.
o “Find all customers who have both a loan and an account at the bank”, this query
can be written using nested subquery form as follows
select distinct customer-name
from Borrower
where customer-name in(select customer-name
from Depositor )
o Select the names of customers who have a loan at the bank, and whose names are
neither “Smith” nor “Jones”
select distinct customer-name
from Borrower
where customer-name not in(“Smith”, “Jones”)
Set Comparison
o Find the names of all branches that have assets greater than those of at least one
branch located in Mathura
select branch-name
from Branch
where assets > some (select assets
from Branch
where branch-city = “Mathura” )
o Apart from > some, the other comparisons could be < some, <= some, >= some,
= some, <> some.
o Find the names of all branches that have assets greater than that of each branch
located in Mathura
select branch-name
from Branch
where assets > all (select assets
from Branch
where branch-city = “Mathura” )
o Apart from > all, the other comparisons could be < all, <= all, >= all, = all,
<> all.
Lecture-15
Views
• In SQL the create view command is used to define a view as follows:
create view v as <query expression>
where <query expression> is any legal query expression and v is the view name.
• Consider a view consisting of branch names and the names of customers who have either an
account or a loan at the branch. It can be defined as follows:
create view All-customer as
(select branch-name, customer-name
from Depositor, Account
where Depositor.account-number = Account.account-number)
union
(select branch-name, customer-name
from Borrower, Loan
where Borrower.loan-number = Loan.loan-number)
• The attribute names may be specified explicitly within a set of round brackets after the
name of the view.
• The view names may be used as relations in subsequent queries. Using the view
All-customer, find all customers of the Sadar branch
select customer-name
from All-customer
where branch-name= “Sadar”
• A create view clause creates a view definition in the database, which stays until the
command drop view view-name is executed.
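The life cycle of a view (create, query, drop) can be sketched with sqlite3. The rows are hypothetical, and the view is named All_customer with an underscore because hyphens are not valid in unquoted SQL identifiers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Depositor (customer_name TEXT, account_number TEXT);
CREATE TABLE Account (branch_name TEXT, account_number TEXT, balance INTEGER);
CREATE TABLE Borrower (customer_name TEXT, loan_number TEXT);
CREATE TABLE Loan (branch_name TEXT, loan_number TEXT, amount INTEGER);
INSERT INTO Depositor VALUES ('Ram', 'A-1');
INSERT INTO Account VALUES ('Sadar', 'A-1', 500);
INSERT INTO Borrower VALUES ('Shyam', 'L-30'), ('Suleman', 'L-42');
INSERT INTO Loan VALUES ('Sadar', 'L-30', 20000), ('Dayalbagh', 'L-42', 40000);

CREATE VIEW All_customer AS
  SELECT branch_name, customer_name FROM Depositor, Account
  WHERE Depositor.account_number = Account.account_number
  UNION
  SELECT branch_name, customer_name FROM Borrower, Loan
  WHERE Borrower.loan_number = Loan.loan_number;
""")

# The view is queried exactly like a stored relation
sadar_customers = sorted(
    row[0] for row in conn.execute(
        "SELECT customer_name FROM All_customer WHERE branch_name = 'Sadar'"
    )
)

# After drop view the definition is gone and the name can no longer be queried
conn.execute("DROP VIEW All_customer")
```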
Modification of Database
• Deletion
o In SQL we can delete only whole tuples, not the values of particular
attributes. The command is as follows:
delete from r
where P
where P is a predicate and r is a relation.
o delete command operates on only one relation at a time. Examples are as follows:
o Delete all tuples from the Loan relation
delete from Loan
o Delete all of Smith’s account records
delete from Depositor
where customer-name = “Smith”
o Delete all loans with loan amounts between Rs 1300 and Rs 1500.
delete from Loan
where amount between 1300 and 1500
o Delete the records of all accounts with balances below the average at the bank
delete from Account
where balance < ( select avg(balance)
from Account)
• Insertion
o In SQL we either specify a tuple to be inserted or write a query whose result is a
set of tuples to be inserted. Examples are as follows:
o Insert an account of account number A-9732 at the Sadar branch having balance
of Rs 1200
insert into Account
values(“Sadar”, “A-9732”, 1200)
the values are specified in the order in which the corresponding attributes are
listed in the relation schema.
o SQL allows the attributes to be specified as part of the insert statement
insert into Account(account-number, branch-name, balance)
values(“A-9732”, “Sadar”, 1200)
insert into Account(branch-name, account-number, balance)
values(“Sadar”, “A-9732”, 1200)
o Provide, for all loan customers of the Sadar branch, a new Rs 200 savings account
for each loan account they have. The loan number serves as the account number
for these accounts.
insert into Account
select branch-name, loan-number, 200
from Loan
where branch-name = “Sadar”
• Updates
o Used to change a value in a tuple without changing all values in the tuple.
o Suppose that annual interest payments are being made, and all balances are to be
increased by 5 percent.
update Account
set balance = balance * 1.05
o Suppose that accounts with balances over Rs10000 receive 6 percent interest,
whereas all others receive 5 percent.
update Account
set balance = balance * 1.06
where balance > 10000
update Account
set balance = balance * 1.05
where balance <= 10000
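The order of the two update statements matters. The sketch below runs them in both orders with sqlite3 on two hypothetical accounts (identifiers use underscores, balances are invented). With the 5 percent update first, an account just under Rs 10000 crosses the threshold and then receives the 6 percent interest as well.

```python
import sqlite3

HIGH = "UPDATE Account SET balance = balance * 1.06 WHERE balance > 10000"
LOW = "UPDATE Account SET balance = balance * 1.05 WHERE balance <= 10000"

def run(statements):
    """Apply the given update statements in order to a fresh Account table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Account (account_number TEXT, balance REAL)")
    conn.executemany(
        "INSERT INTO Account VALUES (?, ?)",
        [("A-1", 9600.0), ("A-2", 20000.0)],  # hypothetical balances
    )
    for statement in statements:
        conn.execute(statement)
    cursor = conn.execute("SELECT account_number, balance FROM Account")
    return {number: round(balance, 2) for number, balance in cursor}

as_in_notes = run([HIGH, LOW])     # 6 percent first, then 5 percent
reversed_order = run([LOW, HIGH])  # A-1 receives interest twice
```

This is why the statement for balances over Rs 10000 is written first in the notes.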
Lecture-16
Integrity Constraints
Integrity Constraints guard against accidental damage to the database.
Integrity constraints are predicates pertaining to the database.
Domain Constraints:
o Predicates defined on the domains are Domain constraints.
o Simplest Domain constraints are defined by defining standard data types of the
attributes like Integer, Double, Float, etc.
o We can define domains with the create domain clause, and we can also define
constraints on such domains, as follows:
create domain hourly-wage numeric(5,2)
constraint wage-value-test check(value >= 4.00)
So we can use hourly-wage as data type for any attribute where DBMS will
automatically allow only values greater than or equal to 4.00.
o Other examples for defining Domain constraints are as follows:
create domain account-number char(10)
constraint account-number-null-test check(value not null)
create domain account-type char(10)
constraint account-type-test
check(value in ( “Checking”, “Saving”))
With the latter of the two domains above, the DBMS will allow only the values Checking
and Saving for any attribute whose type is account-type.
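Not every system supports create domain. SQLite, for example, does not, but the same check can be attached directly to a column. The sketch below uses sqlite3 with a hypothetical Work table to show the wage-value test being enforced; the table and column names are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The check that the hourly-wage domain would carry is placed on the column itself
conn.execute("""
    CREATE TABLE Work (
        employee_name TEXT,
        hourly_wage NUMERIC CHECK (hourly_wage >= 4.00)
    )
""")

conn.execute("INSERT INTO Work VALUES ('Ram', 5.50)")  # satisfies the check

try:
    conn.execute("INSERT INTO Work VALUES ('Shyam', 3.25)")  # violates the check
    low_wage_rejected = False
except sqlite3.IntegrityError:
    low_wage_rejected = True

row_count = conn.execute("SELECT count(*) FROM Work").fetchone()[0]
```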
Referential Integrity:
o Foreign Key: Suppose two tables R and S are related to each other, K1 and K2 are the
primary keys of R and S respectively, and K1 is also one of the attributes of S. If we
want every row in S to have a corresponding row in R, then we define
K1 in S as a foreign key. For example, in our original library database we had a table
for the relation BORROWEDBY, containing two fields, Card No. and Acc. No.
Every row of the BORROWEDBY relation must have a corresponding row in the USER
table with the same Card No. and a row in the BOOK table with the same Acc. No.
We therefore define Card No. and Acc. No. in the BORROWEDBY relation as
foreign keys.
o In other way we can say that every row of BORROWEDBY relation must refer to
some row in BOOK and also in USER tables.
o Such referential requirement in one table to another table is called Referential
Integrity.
o Referential Integrity constraints are defined by defining some of the attributes in a
table, which forms primary key of some other table, as foreign key.
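The library example can be tried with sqlite3. The table and column names below (LibUser, Book, BorrowedBy, card_no, acc_no) are assumptions adapted from the description above, and the rows are invented. Note that SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON` has been issued on the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default

conn.executescript("""
CREATE TABLE LibUser (card_no INTEGER PRIMARY KEY);
CREATE TABLE Book (acc_no INTEGER PRIMARY KEY);
CREATE TABLE BorrowedBy (
    card_no INTEGER REFERENCES LibUser(card_no),
    acc_no INTEGER REFERENCES Book(acc_no)
);
INSERT INTO LibUser VALUES (1);
INSERT INTO Book VALUES (100);
INSERT INTO BorrowedBy VALUES (1, 100);
""")

# Card number 2 does not exist in LibUser, so this row would dangle
try:
    conn.execute("INSERT INTO BorrowedBy VALUES (2, 100)")
    dangling_rejected = False
except sqlite3.IntegrityError:
    dangling_rejected = True

borrowed_rows = conn.execute("SELECT count(*) FROM BorrowedBy").fetchone()[0]
```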
Functional Dependencies
o Suppose in a relation having schema R, α ⊆ R and β ⊆ R. A functional
dependency α→β holds on R if, in any table having schema R, whenever two rows
r1 and r2 have the same values for the attributes α, they also have the same values
for the attributes β.
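The definition translates directly into a check over the rows of one table. The Python sketch below uses a hypothetical table with attributes A, B, C and D (the rows are invented for illustration); the function name `holds` is also an assumption. A check like this can only show that a dependency is satisfied by one particular table, not that it holds on the schema in general.

```python
def holds(rows, alpha, beta):
    """True if rows that agree on the attributes in alpha also agree on beta."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in alpha)
        value = tuple(row[b] for b in beta)
        # setdefault stores the first beta-value seen for this alpha-value
        if seen.setdefault(key, value) != value:
            return False
    return True

# Hypothetical table over the schema (A, B, C, D)
rows = [
    {"A": "a1", "B": "b1", "C": "c1", "D": "d1"},
    {"A": "a1", "B": "b2", "C": "c1", "D": "d2"},
    {"A": "a2", "B": "b2", "C": "c2", "D": "d2"},
    {"A": "a2", "B": "b3", "C": "c2", "D": "d3"},
    {"A": "a3", "B": "b3", "C": "c2", "D": "d4"},
]

a_determines_c = holds(rows, ["A"], ["C"])
c_determines_a = holds(rows, ["C"], ["A"])
ab_determines_d = holds(rows, ["A", "B"], ["D"])
```

In this sample no two rows agree on both A and B, so AB→D is satisfied trivially.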
We can show that AB→D also holds: look for a pair of rows in which the values of A and B
are both the same.
There is no pair of rows in which both A and B are the same, so AB→D holds.
Find other functional dependencies that can be derived using various rules
given above
Examples are as follows-
A→H can be derived using functional dependencies 1 and 5 and
transitivity rule.
CG→HI can be derived using functional dependencies 3 and 4 and union
rule.
AG→I can be derived using 2 and 4 and Pseudotransitivity.
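Derivations like these can be checked mechanically by computing attribute closures. The numbered list of dependencies is not reproduced in this part of the notes, so the sketch below assumes the set F = {A→B, A→C, CG→H, CG→I, B→H}, which is consistent with the rule applications quoted above (1 and 5, 3 and 4, 2 and 4). Treat F as an assumption, not as the notes' own list.

```python
def closure(attributes, fds):
    """Attribute closure: all attributes functionally determined by `attributes`."""
    result = set(attributes)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the left side is already determined, the right side is too
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

# Assumed dependency set, numbered 1 to 5 in this order
F = [
    (frozenset("A"), frozenset("B")),
    (frozenset("A"), frozenset("C")),
    (frozenset("CG"), frozenset("H")),
    (frozenset("CG"), frozenset("I")),
    (frozenset("B"), frozenset("H")),
]

closure_a = closure("A", F)
closure_cg = closure("CG", F)
closure_ag = closure("AG", F)
```

A dependency X→Y follows from F exactly when Y is contained in the closure of X, so A→H, CG→HI and AG→I can each be confirmed by a membership test.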
Lecture-17
Normal Forms
Some of the undesirable properties that a bad database design may have
o Repetition of information
o Inability to represent certain information
o Incapability to maintain integrity of data
The normal forms of relational database theory provide criteria for determining a table's
degree of vulnerability to logical inconsistencies and anomalies.
The higher the normal form applicable to a table, the less vulnerable it is to
inconsistencies and anomalies.
Each table has a "highest normal form" (HNF): by definition, a table always meets the
requirements of its HNF and of all normal forms lower than its HNF; also by definition, a
table fails to meet the requirements of any normal form higher than its HNF.
Generally known hierarchy of normal forms is as follows First Normal Form(1NF),
Second Normal Form(2NF), Third Normal Form(3NF), Fourth Normal Form(4NF), Fifth
Normal Form(5NF).
We will discuss only up to 3NF of above hierarchy and another normal form Boyce-Codd
Normal Form(BCNF) in this course.
Examples of tables (or views) that would not meet this definition of 1NF are:
o A table that lacks a unique key. Such a table would be able to accommodate
duplicate rows, in violation of condition 3.
o A view whose definition mandates that results be returned in a particular order, so
that the row-ordering is an intrinsic and meaningful aspect of the view. This
violates condition 1. The tuples in true relations are not ordered with respect to
each other.
o A table that has at least one nullable attribute. A nullable attribute would
be in violation of condition 4, which requires every field to contain exactly one
value from its column's domain. It should be noted, however, that this aspect of
Codd states that the "values in the domains on which each relation is defined are required
to be atomic with respect to the DBMS." Codd defines an atomic value as one that
"cannot be decomposed into smaller pieces by the DBMS (excluding certain special
functions)." Meaning a field should not be divided into parts with more than one kind of
data in it such that what one part means to the DBMS depends on another part of the
same field.
Suppose a novice designer wishes to record the names and telephone numbers of
customers. He defines a customer table which looks like this:
Customer
Customer ID   First Name   Surname   Telephone Number
456   Jane   Wright   555-403-1659 and 555-776-4100 (two values in one field)
Assuming, however, that the Telephone Number column is defined on some Telephone
Number-like domain (e.g. the domain of strings 12 characters in length), the
representation above is not in 1NF. 1NF (and, for that matter, the RDBMS) prevents a
single field from containing more than one value from its column's domain.
Repeating groups across columns: The designer might attempt to get around this
restriction by defining multiple Telephone Number columns:
Customer ID   First Name   Surname   Tel. No. 1   Tel. No. 2   Tel. No. 3
This representation, however, makes use of nullable columns, and therefore does not
conform to Date's definition of 1NF. Even if the view is taken that nullable columns are
allowed, the design is not in keeping with the spirit of 1NF. Tel. No. 1, Tel. No. 2, and
Tel. No. 3. share exactly the same domain and exactly the same meaning; the splitting of
Telephone Number into three headings is artificial and causes logical problems. These
problems include:
Repeating groups within columns: The designer might, alternatively, retain the
single Telephone Number column but alter its domain, making it a string of sufficient
length to accommodate multiple telephone numbers:
456   Jane   Wright   555-403-1659, 555-776-4100
This design is consistent with 1NF according to Date’s definition but not according to
Codd’s definition. It presents several design issues. The Telephone Number heading
becomes semantically woolly, as it can now represent either a telephone number, a list of
telephone numbers, or indeed anything at all. A query such as "Which pairs of customers
share a telephone number?" is more difficult to formulate, given the necessity to cater for
lists of telephone numbers as well as individual telephone numbers. Meaningful
constraints on telephone numbers are also very difficult to define in the RDBMS with this
design.
A design that complies with 1NF: A design that is unambiguously in 1NF makes
use of two tables: a Customer Name table and a Customer Telephone Number table.
789 555-808-9633
Repeating groups of telephone numbers do not occur in this design. Instead, each
Customer-to-Telephone Number link appears on its own record.
It is worth noting that this design meets the additional requirements for second
and third normal form (3NF).
Lecture-18
Employees' Skills
Employee   Skill   Current Work Location
Jones   Typing   114 Main Street
Jones   Shorthand   114 Main Street
Jones   Whittling   114 Main Street
Bravo   Light Cleaning   73 Industrial Way
Ellis   Alchemy   73 Industrial Way
Ellis   Flying   73 Industrial Way
Harrison   Light Cleaning   73 Industrial Way
Neither {Employee} nor {Skill} is a candidate key for the table. This is because a
given Employee might need to appear more than once (he might have multiple
Skills), and a given Skill might need to appear more than once (it might be
possessed by multiple Employees). Only the composite key {Employee, Skill}
qualifies as a candidate key for the table.
The remaining attribute, Current Work Location, is dependent on only part of the
candidate key, namely Employee. Therefore the table is not in 2NF. Note the
redundancy in the way Current Work Locations are represented: we are told three
times that Jones works at 114 Main Street, and twice that Ellis works at 73
Industrial Way. This redundancy makes the table vulnerable to update anomalies:
it is, for example, possible to update Jones' work location on his "Typing" and
"Shorthand" records and not update his "Whittling" record. The resulting data
would imply contradictory answers to the question "What is Jones' current work
location?"
A 2NF alternative to this design would represent the same information in two tables:
an "Employees" table with candidate key {Employee}, and an "Employees' Skills"
table with candidate key {Employee, Skill}:
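The decomposition can be sketched in Python by projecting the original rows onto the two smaller schemas. The rows are those of the Employees' Skills table above; the variable names are illustrative.

```python
# Original table: (employee, skill, current_work_location)
employees_skills = [
    ("Jones", "Typing", "114 Main Street"),
    ("Jones", "Shorthand", "114 Main Street"),
    ("Jones", "Whittling", "114 Main Street"),
    ("Bravo", "Light Cleaning", "73 Industrial Way"),
    ("Ellis", "Alchemy", "73 Industrial Way"),
    ("Ellis", "Flying", "73 Industrial Way"),
    ("Harrison", "Light Cleaning", "73 Industrial Way"),
]

# "Employees" table, candidate key {Employee}: each location is stored once
employees = {(emp, location) for (emp, _skill, location) in employees_skills}

# "Employees' Skills" table, candidate key {Employee, Skill}
skills = {(emp, skill) for (emp, skill, _location) in employees_skills}

# Joining the two tables on Employee gives back the original rows
rejoined = {
    (emp, skill, location)
    for (emp, skill) in skills
    for (emp2, location) in employees
    if emp == emp2
}
```

Jones' work location now appears in exactly one row, so it cannot be updated inconsistently.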
Not all 2NF tables are free from update anomalies, however. An example of a 2NF
table which suffers from update anomalies is:
Tournament Winners
Tournament           Year   Winner           Winner Date of Birth
Des Moines Masters   1998   Chip Masterson   14 March 1977
Even though Winner and Winner Date of Birth are determined by the whole key
{Tournament / Year} and not part of it, particular Winner / Winner Date of Birth
combinations are shown redundantly on multiple records. This leads to an update
anomaly: if updates are not carried out consistently, a particular winner could be
shown as having two different dates of birth.
The underlying problem is the transitive dependency to which the Winner Date of
Birth attribute is subject. Winner Date of Birth actually depends on Winner,
which in turn depends on the key Tournament / Year.
Lecture-19
Third Normal Form:
As defined by E.F. Codd in 1971, a table is in 3NF if and only if both of the
following conditions hold:
o The relation R (table) is in second normal form (2NF).
o Every non-prime attribute of R is non-transitively dependent (i.e. directly
dependent) on every candidate key of R.
Note:
o A non-prime attribute of R is an attribute that does not belong to any
candidate key of R.
o A transitive dependency is a functional dependency in which X → Z (X
determines Z) holds indirectly, because X → Y and Y → Z (where it is not the
case that Y → X).
An equivalent definition of 3NF, given by Carlo Zaniolo in 1982, states that a table
is in 3NF if and only if, for each of its functional dependencies X → A, at least one of the
following conditions holds:
o X contains A (that is, X → A is a trivial functional dependency), or
o X is a superkey, or
o Each attribute in A − X is a prime attribute (i.e., it is contained within a candidate
key).
Zaniolo's definition gives a clear sense of the difference between 3NF and the more
stringent Boyce-Codd normal form (BCNF). BCNF simply eliminates the third
alternative ("each attribute in A − X is a prime attribute").
The difference between 2NF and 3NF can be stated as follows: requiring non-key attributes
to depend on "the whole key" ensures that a table is in 2NF, while further requiring them
to depend on "nothing but the key" ensures that the table is in 3NF.
Example: the Tournament Winners table given above.
This table is in 2NF but not in 3NF. The breach of 3NF occurs because the non-prime
attribute Winner Date of Birth is transitively dependent on the candidate key
{Tournament, Year} via the non-prime attribute Winner. The fact that Winner Date of
Birth is functionally dependent on Winner makes the table vulnerable to logical
inconsistencies, as there is nothing to stop the same person from being shown with
different dates of birth on different records.
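Zaniolo's three conditions translate directly into a test over a set of functional dependencies. The sketch below is an illustration under stated assumptions rather than a definitive implementation: the function names are hypothetical, candidate keys are found by brute force (fine for small schemas only), and only the dependencies that are listed are examined.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


def candidate_keys(schema, fds):
    """All minimal attribute sets whose closure is the whole schema (brute force)."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            combo = frozenset(combo)
            if any(key <= combo for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(combo, fds) == frozenset(schema):
                keys.append(combo)
    return keys


def violations(schema, fds, allow_prime_rhs):
    """Listed FDs X -> A that break 3NF (allow_prime_rhs=True) or BCNF (False)."""
    prime = frozenset().union(*candidate_keys(schema, fds))
    bad = []
    for lhs, rhs in fds:
        if rhs <= lhs:
            continue  # condition 1: trivial dependency
        if closure(lhs, fds) == frozenset(schema):
            continue  # condition 2: X is a superkey
        if allow_prime_rhs and (rhs - lhs) <= prime:
            continue  # condition 3: every attribute of A - X is prime (3NF only)
        bad.append((lhs, rhs))
    return bad


# Tournament Winners, with the dependencies described in the text.
T, Y, W, D = "Tournament", "Year", "Winner", "Winner Date of Birth"
schema = frozenset({T, Y, W, D})
fds = [(frozenset({T, Y}), frozenset({W})), (frozenset({W}), frozenset({D}))]

print(candidate_keys(schema, fds))
print(violations(schema, fds, allow_prime_rhs=True))
```

For Tournament Winners the only dependency reported is Winner → Winner Date of Birth, which matches the breach of 3NF described above. Calling violations with allow_prime_rhs=False drops the third condition and therefore tests BCNF instead.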
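In order to express the same facts without violating 3NF, it is necessary to split the table
into two: a "Tournament Winners" table (Tournament, Year, Winner) and a "Player Dates of
Birth" table (Player, Date of Birth).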
Boyce-Codd Normal Form:
Boyce-Codd normal form (BCNF) is a slightly stronger version of the third normal form
(3NF). A table is in BCNF if and only if, for every one of its non-trivial functional
dependencies X → Y, X is a superkey, that is, X is either a candidate key or a superset thereof.
Note that the tables "Tournament Winners" and "Player Dates of Birth", given above as being
in 3NF, are also in BCNF.
Only in rare cases does a 3NF table not meet the requirements of BCNF. A 3NF table
which does not have multiple overlapping candidate keys is guaranteed to be in BCNF.
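An example of a 3NF table that does not meet BCNF is a "Today's Court Bookings" table
with the attributes (Court, Start Time, End Time, Rate Type).
There are two courts available and there are four distinct rate types:
o SAVER, for Court 1 bookings made by members
o STANDARD, for Court 1 bookings made by non-members
o PREMIUM-A, for Court 2 bookings made by members
o PREMIUM-B, for Court 2 bookings made by non-members
So Rate Type → Court is a non-trivial functional dependency that holds, yet Rate Type is
not a superkey of the table, which violates BCNF. The design can be amended to meet BCNF
by using the following two tables: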
Rate Types
Rate Type   Court   Member Flag
SAVER       1       Yes
STANDARD    1       No
PREMIUM-A   2       Yes
PREMIUM-B   2       No
Today's Bookings
Rate Type   Start Time   End Time
SAVER       09:30        10:30
SAVER       11:00        12:00
STANDARD    14:00        15:30
PREMIUM-B   10:00        11:30
PREMIUM-B   11:30        13:30
PREMIUM-A   15:00        16:30
The candidate keys for the Rate Types table are {Rate Type} and {Court, Member Flag};
the candidate keys for the Today's Bookings table are {Rate Type, Start Time} and {Rate
Type, End Time}. Both tables are in BCNF.
Lecture-20
Consider the following table:
Lending
branch-name branch-city assets customer-name loan-number amount
Sadar Agra 200000 Ram L-12 12000
Sanjay-place Agra 100000 Ram L-13 13000
This table stores information about loans. It has the following problems:
o Since every branch is going to have several loans, the table will have one row for each
loan taken from a branch, and all of these rows repeat the same values for the columns
branch-name, branch-city and assets.
o Updating the branch-city or assets of a particular branch requires updating every row
for that branch, so the operation is costly.
o If any such row is missed during an update, the table will hold more than one value for
the branch-city or assets of a branch, which breaches data integrity.
o A branch that has no loans has no entry in this table, so we are not able to
represent complete information about it.
Decomposition
The above problems can be solved by decomposing the table. The set of relation schemas
{R1, R2, …, Rn} is a decomposition of relation schema R if R = R1 ∪ R2 ∪ … ∪ Rn. It should be
noted that every pair Ri and Ri+1 of this set should have at least one common attribute so
that they can be combined again using the join operation.
However, not every decomposition of this table is free from problems.
Consider, for example, forming two new tables out of our Lending table as follows:
Branch-customer-schema = (branch-name, branch-city, assets, customer-name)
Customer-loan-schema = (customer-name, loan-number, amount)
Then the resulting tables with data will be as follows:
Branch-customer
branch-name branch-city assets customer-name
Sadar Agra 200000 Ram
Sanjay-place Agra 100000 Ram
Customer-loan
customer-name loan-number amount
Ram L-12 12000
Ram L-13 13000
Now suppose that, to find the branch for loan L-12, we form the join of these two tables.
We get the following table:
Branch-customer ⋈ Customer-loan =
branch-name branch-city assets customer-name loan-number amount
Sadar Agra 200000 Ram L-12 12000
Sadar Agra 200000 Ram L-13 13000
Sanjay-place Agra 100000 Ram L-12 12000
Sanjay-place Agra 100000 Ram L-13 13000
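According to this join, both of the loans are taken from both of the branches. This is an
example of information loss: the join contains rows that were not in the original table, so
we can no longer tell which loan belongs to which branch. It occurred because customer-name,
the column kept common to the two tables after decomposition, is the wrong choice.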
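The information loss can be reproduced in a few lines. The following is a minimal Python sketch using the two Lending rows from the text; the helper names project, natural_join and rejoin are hypothetical. It also tries a second split that shares branch-name instead of customer-name, for comparison.

```python
COLUMNS = ["branch-name", "branch-city", "assets",
           "customer-name", "loan-number", "amount"]
LENDING = {
    ("Sadar", "Agra", 200000, "Ram", "L-12", 12000),
    ("Sanjay-place", "Agra", 100000, "Ram", "L-13", 13000),
}


def project(rows, columns, keep):
    """Project rows (tuples ordered as columns) onto keep, removing duplicates."""
    idx = [columns.index(c) for c in keep]
    return {tuple(row[i] for i in idx) for row in rows}


def natural_join(left, left_cols, right, right_cols):
    """Natural join of two sets of tuples; returns (rows, columns)."""
    common = [c for c in left_cols if c in right_cols]
    extra = [c for c in right_cols if c not in left_cols]
    joined = set()
    for lrow in left:
        for rrow in right:
            if all(lrow[left_cols.index(c)] == rrow[right_cols.index(c)]
                   for c in common):
                joined.add(lrow + tuple(rrow[right_cols.index(c)] for c in extra))
    return joined, list(left_cols) + extra


def rejoin(rows, columns, cols1, cols2):
    """Decompose rows onto cols1 and cols2, then join the two projections."""
    joined, out_cols = natural_join(project(rows, columns, cols1), cols1,
                                    project(rows, columns, cols2), cols2)
    # Put the columns back in the original order so results are comparable.
    order = [out_cols.index(c) for c in columns]
    return {tuple(row[i] for i in order) for row in joined}


# Split used in the text: the common column is customer-name.
lossy = rejoin(LENDING, COLUMNS, COLUMNS[:4], COLUMNS[3:])
# Alternative split: the common column is branch-name.
lossless = rejoin(LENDING, COLUMNS, COLUMNS[:3], [COLUMNS[0]] + COLUMNS[3:])

print(len(lossy), lossy == LENDING)
print(len(lossless), lossless == LENDING)
```

The first join yields four rows and differs from the original table, while the second returns exactly the two original rows.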
Lossless-Join Decomposition: A decomposition {R1, R2, …, Rn} of relation schema R is a
lossless-join decomposition if, for all legal relations r on schema R,
r = ΠR1(r) ⋈ ΠR2(r) ⋈ … ⋈ ΠRn(r)
In other words, when we join all of the decomposed tables with their data, the result
should be exactly the original table as it was before decomposition.
Otherwise it is called a lossy-join decomposition.
Dependency preservation: This is another desirable property of a decomposition.
Suppose a set F of functional dependencies holds on every relation on
schema R. The set F1 of functional dependencies that holds on a subschema R1
consists of all the functional dependencies in F+ that contain only attributes of
R1. If the decomposition of R is {R1, R2, …, Rn} and the corresponding sets of functional
dependencies that hold on them are {F1, F2, …, Fn}, then the following should be true:
(F1 ∪ F2 ∪ … ∪ Fn)+ = F+
Such a decomposition is called a dependency-preserving decomposition.
For example:
Consider the schema R = {A, B, C, D} on which the following set of functional dependencies
holds: F = {A→B, A→BC, C→D}.
Suppose R is decomposed into R1 = {A, B} and R2 = {B, C, D}. The
functional dependencies that hold on R1 are F1 = {A→B} (note: F1 contains the
functional dependencies that have only attributes of R1) and those on R2 are
F2 = {C→D}. Then F1 ∪ F2 = {A→B, C→D}, from which A→BC cannot be derived (A→C is
lost), so this is not a dependency-preserving decomposition.
If we instead decompose R into the relation schemas R1 = {A, B, C} and R2 = {C, D}, then
F1 = {A→B, A→BC} and F2 = {C→D}, so F1 ∪ F2 = {A→B, A→BC, C→D} = F and the
decomposition is dependency preserving.
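Dependency preservation can also be tested without writing out F1, …, Fn by hand. The sketch below uses one common textbook approach, which is an assumption here and not taken from these notes: for each dependency α → β in F, grow a result set starting from α by repeatedly taking closures under F restricted to each Ri, then check whether β is reached. The function names are hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


def preserves(fd, decomposition, fds):
    """True if fd follows from the dependencies that hold on the subschemas."""
    lhs, rhs = fd
    result = set(lhs)
    changed = True
    while changed:
        changed = False
        for ri in decomposition:
            # Attributes of ri that are determined by the part of result inside ri.
            gained = (closure(result & ri, fds) & ri) - result
            if gained:
                result |= gained
                changed = True
    return rhs <= result


def is_dependency_preserving(decomposition, fds):
    return all(preserves(fd, decomposition, fds) for fd in fds)


def fs(text):
    """Shorthand: fs("BC") -> frozenset({"B", "C"})."""
    return frozenset(text)


# F = {A -> B, A -> BC, C -> D} from the example in the text.
F = [(fs("A"), fs("B")), (fs("A"), fs("BC")), (fs("C"), fs("D"))]

print(is_dependency_preserving([fs("AB"), fs("BCD")], F))
print(is_dependency_preserving([fs("ABC"), fs("CD")], F))
```

The first decomposition fails the test because A→BC cannot be recovered, and the second passes, in agreement with the example.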
Lecture-21
Normalization Using Functional Dependency
Lossless-Join Decomposition using FD:
o Let R be a relation schema and F a set of functional dependencies on R. Let R1 and
R2 form a decomposition of R. This decomposition is a lossless-join decomposition
if at least one of the following functional dependencies is in F+:
R1 ∩ R2 → R1
R1 ∩ R2 → R2
o Example: Lending-schema = (branch-name, branch-city, assets, customer-name,
loan-number, amount). The FDs that hold on this schema are:
branch-name → assets branch-city
loan-number → amount branch-name
The decomposition of this schema into the two schemas
Branch-schema = (branch-name, branch-city, assets)
Loan-info-schema = (branch-name, customer-name, loan-number, amount)
is a lossless-join decomposition because
Branch-schema ∩ Loan-info-schema = branch-name
and we have the FD branch-name → assets branch-city. Applying the augmentation
rule with branch-name gives branch-name → branch-name assets branch-city,
i.e. branch-name → Branch-schema.
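The test above reduces to one attribute-closure computation: R1 ∩ R2 → R1 is in F+ exactly when R1 is contained in the closure of R1 ∩ R2 under F. A minimal Python sketch, with hypothetical function names, applied to the Lending example:

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


def is_lossless_binary(r1, r2, fds):
    """True if (R1 ∩ R2) -> R1 or (R1 ∩ R2) -> R2 follows from fds."""
    reachable = closure(r1 & r2, fds)
    return r1 <= reachable or r2 <= reachable


FDS = [
    (frozenset({"branch-name"}), frozenset({"assets", "branch-city"})),
    (frozenset({"loan-number"}), frozenset({"amount", "branch-name"})),
]

branch = frozenset({"branch-name", "branch-city", "assets"})
loan_info = frozenset({"branch-name", "customer-name", "loan-number", "amount"})
print(is_lossless_binary(branch, loan_info, FDS))

# The earlier split on customer-name, for comparison.
branch_customer = frozenset({"branch-name", "branch-city", "assets", "customer-name"})
customer_loan = frozenset({"customer-name", "loan-number", "amount"})
print(is_lossless_binary(branch_customer, customer_loan, FDS))
```

The first call succeeds because branch-name determines all of Branch-schema; the second fails because customer-name determines nothing beyond itself.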
Third Normal Form Using FD:
o Let R be a relation schema and F a minimal set (minimal cover) of the functional
dependencies that hold on R.
Then do the following:
1. Start with an empty set of relations and set i = 1.
2. For each FD α → β in F:
If no existing relation contains all the attributes of α and β, add a relation
Ri = (α, β) and increase i by one.
3. After adding all such relations, add another relation Ri = (any candidate
key of R) if no existing relation contains a candidate key of R.
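The three steps above can be sketched as follows. This is an illustrative Python sketch under stated assumptions: F is assumed to be a minimal cover already (computing one is not shown), the function names are hypothetical, and a relation is only compared against relations added before it.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


def find_key(schema, fds):
    """Shrink the full schema to one candidate key by dropping removable attributes."""
    key = set(schema)
    for attr in sorted(schema):
        if closure(key - {attr}, fds) == schema:
            key.discard(attr)
    return frozenset(key)


def synthesize_3nf(schema, minimal_cover):
    relations = []                       # step 1: start with no relations
    for lhs, rhs in minimal_cover:       # step 2: one relation per dependency
        candidate = lhs | rhs
        if not any(candidate <= existing for existing in relations):
            relations.append(candidate)
    # step 3: make sure some relation contains a candidate key of R
    if not any(closure(rel, minimal_cover) == schema for rel in relations):
        relations.append(find_key(schema, minimal_cover))
    return relations


SCHEMA = frozenset({"branch-name", "branch-city", "assets",
                    "customer-name", "loan-number", "amount"})
FDS = [
    (frozenset({"branch-name"}), frozenset({"assets", "branch-city"})),
    (frozenset({"loan-number"}), frozenset({"amount", "branch-name"})),
]

for relation in synthesize_3nf(SCHEMA, FDS):
    print(sorted(relation))
```

On Lending-schema this produces one relation per dependency plus a third relation (customer-name, loan-number), added in step 3 because neither of the first two contains a candidate key.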
Boyce-Codd Normal Form using FD:
1. Let Ri be a relation that is not in BCNF.
2. Let α → β be a non-trivial FD that holds on Ri such that α → Ri does not hold (i.e. α is
not a superkey of Ri).
3. Replace relation Ri by two relations (α, β) and (Ri - β).
4. Check all the resulting relations again against the FDs that hold on them, and
go back to step 1 if any relation is still not in BCNF.
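o Example:
Consider Lending-schema = (branch-name, branch-city, assets, customer-name,
loan-number, amount). The FDs that hold on this schema are:
1. branch-name → assets branch-city
2. loan-number → amount branch-name
branch-name is not a superkey of Lending-schema, so FD 1 violates BCNF. Replace
Lending-schema by
(branch-name, assets, branch-city) and
(branch-name, customer-name, loan-number, amount).
In the second relation loan-number is not a superkey, so FD 2 violates BCNF. Replace
that relation by
(loan-number, amount, branch-name) and
(customer-name, loan-number).
Now all three relations are in BCNF, so we do not have to
decompose any further.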
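The BCNF procedure can be sketched in Python as below. This is a sketch under assumptions, not a definitive implementation: function names are hypothetical, and violations are found by brute force over attribute subsets of each relation (exponential, so suitable for small schemas only), using closures under the original F so that dependencies implied on a sub-relation are not missed.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


def find_violation(relation, fds):
    """Return (alpha, beta) with alpha -> beta non-trivial and alpha not a superkey."""
    for size in range(1, len(relation)):
        for combo in combinations(sorted(relation), size):
            alpha = frozenset(combo)
            determined = closure(alpha, fds) & relation
            beta = determined - alpha
            if beta and determined != relation:
                return alpha, beta
    return None


def decompose_bcnf(schema, fds):
    todo = [frozenset(schema)]
    done = []
    while todo:
        relation = todo.pop()
        violation = find_violation(relation, fds)
        if violation is None:
            done.append(relation)           # already in BCNF
        else:
            alpha, beta = violation
            todo.append(alpha | beta)       # (alpha, beta)
            todo.append(relation - beta)    # (Ri - beta)
    return done


SCHEMA = {"branch-name", "branch-city", "assets",
          "customer-name", "loan-number", "amount"}
FDS = [
    (frozenset({"branch-name"}), frozenset({"assets", "branch-city"})),
    (frozenset({"loan-number"}), frozenset({"amount", "branch-name"})),
]

for relation in decompose_bcnf(SCHEMA, FDS):
    print(sorted(relation))
```

For Lending-schema the result is three relations: (branch-name, assets, branch-city), (loan-number, amount, branch-name) and (customer-name, loan-number).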
BCNF may not satisfy the dependency preservation criterion.
o In some cases, a non-BCNF table cannot be decomposed into tables that satisfy
BCNF and preserve the dependencies that held in the original table.
o For example, a set of functional dependencies {AB → C, C → B} cannot be
represented by a BCNF schema.
o In this sense, unlike the first three normal forms, BCNF is not always achievable.
o Consider the following non-BCNF table whose functional dependencies follow
the {AB → C, C → B} pattern:
Nearest Shops (Person, Shop Type, Nearest Shop)
For each Person / Shop Type combination, the table tells us which shop of this
type is geographically nearest to the person's home. We assume for simplicity that
a single shop cannot be of more than one type.
The candidate keys of the table are:
{Person, Shop Type}
{Person, Nearest Shop}
Because all three attributes are prime attributes (i.e. belong to candidate keys), the
table is in 3NF. The table is not in BCNF, however, as the Shop Type attribute is
functionally dependent on a non-superkey: Nearest Shop.
Decomposing on the dependency Nearest Shop → Shop Type gives two BCNF tables: a "Shop
Near Person" table (Person, Shop) and a "Shop" table (Shop, Shop Type).
The "Shop Near Person" table has a candidate key of {Person, Shop}, and the
"Shop" table has a candidate key of {Shop}. Unfortunately, although this design
adheres to BCNF, it is unacceptable on different grounds: it allows us to record
multiple shops of the same type against the same person. In other words, its
candidate keys do not guarantee that the functional dependency {Person, Shop
Type} → {Shop} will be respected.
Lecture 22
Multivalued Dependencies
Let R be a relation schema, let X and Y be disjoint subsets of R (i.e., X ⊆ R, Y ⊆ R,
X ∩ Y = ∅), and let Z = R − XY. A relation r(R) satisfies X ↠ Y if, for any two tuples
t1 and t2 in r with t1(X) = t2(X), there exists a tuple t3 in r such that
o t3(X) = t1(X), t3(Y) = t1(Y), t3(Z) = t2(Z).
By symmetry, there also exists a tuple t4 in r such that
o t4(X) = t1(X), t4(Y) = t2(Y), t4(Z) = t1(Z).
     X    Y    Z
t1   x1   y1   z1
t2   x1   y2   z2
t3   x1   y1   z2
t4   x1   y2   z1
The MVD X ↠ Y says that the relationship between X and Y is independent of the
relationship between X and R − Y.
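The definition can be checked directly on a set of tuples. The following minimal Python sketch (the function name satisfies_mvd is hypothetical) tests, for every pair of tuples that agree on X, whether the required tuple t3 is present; applying the test to every ordered pair also covers t4 by symmetry.

```python
def satisfies_mvd(rows, columns, x, y):
    """True if the rows satisfy X ->> Y, where Z is every column not in X or Y."""
    x_idx = [columns.index(c) for c in x]
    y_idx = [columns.index(c) for c in y]
    z_idx = [i for i in range(len(columns)) if i not in x_idx and i not in y_idx]

    def pick(row, idx):
        return tuple(row[i] for i in idx)

    # Each row is viewed as an (X-part, Y-part, Z-part) triple.
    present = {(pick(r, x_idx), pick(r, y_idx), pick(r, z_idx)) for r in rows}
    for x1, y1, _ in present:
        for x2, _, z2 in present:
            # t1 and t2 agree on X, so t3 = (X of t1, Y of t1, Z of t2) must exist.
            if x1 == x2 and (x1, y1, z2) not in present:
                return False
    return True


COLUMNS = ["X", "Y", "Z"]
FOUR_TUPLES = [            # t1..t4 from the table above
    ("x1", "y1", "z1"),
    ("x1", "y2", "z2"),
    ("x1", "y1", "z2"),
    ("x1", "y2", "z1"),
]

print(satisfies_mvd(FOUR_TUPLES, COLUMNS, ["X"], ["Y"]))
print(satisfies_mvd(FOUR_TUPLES[:2], COLUMNS, ["X"], ["Y"]))
```

With all four tuples the dependency is satisfied; with only t1 and t2 it is not, because the tuple (x1, y1, z2) is missing.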
For example consider the table Employee:
Lecture 23
The fifth normal form deals with join dependencies, which are a generalisation of the
MVD. The aim of fifth normal form is to have relations that cannot be decomposed
further. A relation in 5NF cannot be constructed from several smaller relations.
A relation R satisfies the join dependency *(R1, R2, ..., Rn) if and only if R is equal to the
join of its projections on R1, R2, ..., Rn, where each Ri is a subset of the set of attributes of R.
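A join dependency can be tested on concrete data by projecting the relation onto each Ri and joining the projections back together. The sketch below is illustrative only: the function names are hypothetical and the rows over attributes A, B, C are invented sample data, chosen so that the three-way join dependency holds while a two-way split does not.

```python
from functools import reduce


def project(rows, attrs):
    """Project a list of dict rows onto attrs; each result row is a frozenset of items."""
    return {frozenset((a, row[a]) for a in attrs) for row in rows}


def join(left, right):
    """Natural join of two sets of rows (each row a frozenset of (attr, value))."""
    out = set()
    for lrow in left:
        ldict = dict(lrow)
        for rrow in right:
            rdict = dict(rrow)
            if all(ldict[a] == rdict[a] for a in ldict.keys() & rdict.keys()):
                out.add(frozenset({**ldict, **rdict}.items()))
    return out


def satisfies_jd(rows, components):
    """True if rows equal the join of their projections on the given components."""
    original = {frozenset(row.items()) for row in rows}
    joined = reduce(join, (project(rows, c) for c in components))
    return joined == original


# Hypothetical sample data (not from the notes).
ROWS = [
    {"A": "a1", "B": "b1", "C": "c2"},
    {"A": "a1", "B": "b2", "C": "c1"},
    {"A": "a2", "B": "b1", "C": "c1"},
    {"A": "a1", "B": "b1", "C": "c1"},
]

print(satisfies_jd(ROWS, [("A", "B"), ("B", "C"), ("C", "A")]))
print(satisfies_jd(ROWS, [("A", "B"), ("B", "C")]))
```

On this data the three-component join dependency *(AB, BC, CA) holds, but the two-component split *(AB, BC) produces an extra row and therefore fails.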
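A relation R is in 5NF (or project-join normal form, PJNF) if for all join dependencies of
the form *(R1, R2, ..., Rn), where each Ri is a subset of the set of attributes of R and
R = R1 ∪ R2 ∪ ... ∪ Rn, at least one of the following holds:
o *(R1, R2, ..., Rn) is a trivial join dependency (that is, some Ri is equal to R), or
o every Ri is a superkey of R.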
An example of 5NF can be provided by the example below that deals with departments,
subjects and students.
o The above relation says that Comp. Sc. offers subjects CP1000, CP2000 and
CP3000 which are taken by a variety of students. No student takes all the subjects
and no subject has all students enrolled in it and therefore all three fields are
needed to represent the information.
o The above relation does not show MVDs, since the attributes subject and student
are not independent; they are related to each other and the pairings have
significant information in them. The relation therefore cannot be decomposed into
the two relations
(dept, subject) and (dept, student).
Then it is not in fifth normal form, because not all of these relation schemas are
superkeys. We should therefore decompose it into three relations as given by the join
dependency; that is, in place of the given Loan-Info-Schema we should have the
following three relation schemas:
o (loan-number, branch-name),
o (loan-number, customer-name), and
o (loan-number, amount)