DBMS

Sem III- IT, Sem IV - Comp

Index
1. DBMS Concepts : 1 – 9

2. Relational Model : 10 – 18

3. E-R Model : 19 – 33

4. SQL : 34 – 66

5. Transaction Management : 67 – 76

6. Concurrency Control : 77 – 85

7. Recovery System : 86 – 90

[ These notes are sufficient for attempting most exam questions, but they are not complete and
thorough. Please refer to standard books for the remaining topics. Also, refer to old DBMS exam
papers for Sem III-IT and Sem IV Comp. (old) ]

Santosh Kabir.
Mobile : 98336 29398.
www.santoshkabirsir.com

1. Database & DBMS Concepts


What is DBMS?
A database management system (DBMS) is a collection of interrelated data and a set
of programs to access the data. The collection of data is known as the Database. The
database consists of information relevant to a particular enterprise, stored in a
particular format.
Basically, a DBMS is software made up of a set of sub-programs that are used to
store a large amount of data for convenient and efficient access. Oracle 9i from
Oracle Corporation, Microsoft's SQL Server 2000/2005 and Microsoft Access 2000 are
some commonly used DBMS products. These are, more precisely, RDBMS (Relational
DBMS) software.
Any DBMS usually provides the following minimum features:
 Storing and retrieving large amounts of data.
 Easy and efficient ways of accessing data.
 Safety of the information from system crashes and attempts of unauthorized access.

e.g. a database can hold information about the students studying in an institute. In
this case the data will be stored for multiple students in the form of their
registration number, name, address, contact number, date of birth etc.

DBMS versus File systems :


( Advantages of DBMS over storing data in File system )
Consider a part of a bank system that keeps information about all the customers
and their savings accounts. A bank can have thousands of customers and thousands
of accounts for these customers. One way to keep the information in a computer is
to store it in files operated by the file system of the operating system (say,
plain files edited with the Notepad utility in case of Windows). To allow users to
access the information in these files, the computer must have a number of
application programs that perform different tasks such as: create a new account,
debit or credit an amount in a given account, display the balance in a specified
account, and generate monthly bank statements for customers. If any other type of
operation is to be performed on the account information, a new program must be
added to the existing system. If the information is to be processed in a different
way, new files may have to be created for that particular type of information.
Thus, as time goes by, more and more programs and files are required for handling
the entire information.

Thus, storing information in operating system files has the following major
disadvantages.

 Data redundancy and inconsistency:


Data may be duplicated in multiple files (called data redundancy), taking extra
storage space. Duplicated information can also become inconsistent if it is not
updated everywhere. e.g. customer information can appear in multiple files, and it
can happen that a changed customer address is reflected in one file but not in the
other, causing data inconsistency.

 Difficulty in accessing data:


Retrieving particular data is difficult when it is stored in a file system. e.g.
suppose information about customers from a particular city having a particular type
of account is required. Since there is no program ready for such data retrieval,
either the data-processing department has to manually prepare a list of such
customers (from the thousands of customers of the bank) and submit it to the
officer, or a new program must be developed for the retrieval. Both methods are
obviously unsatisfactory and time consuming. The requirements can be of any type,
e.g. an officer needs information about customers with account balance above 10
lakhs.
Thus, the file system environment does not allow retrieving data as and when
required.

 Data isolation :
Data is scattered over different files in different formats. Writing new
application programs to access this data is difficult.

 Data Integrity problem:


Constraints and rules cannot be imposed easily. Data stored in the database must
follow certain consistency constraints. e.g. in a bank a customer balance must not
fall below Rs.1500, or the account type can only be Savings or Current and other
types of accounts should not be allowed. Adding and changing such constraints is a
difficult task for application programmers when the data is kept in plain files.
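In a DBMS such rules are declared once, in the schema, instead of being coded into
every application program. A rough sketch in SQL (table and column names here are
assumed for illustration):

    CREATE TABLE account (
        act_no    VARCHAR(10) PRIMARY KEY,
        act_type  VARCHAR(10)   CHECK (act_type IN ('Savings', 'Current')),  -- only these account types
        balance   DECIMAL(12,2) CHECK (balance >= 1500)                      -- balance must not fall below Rs.1500
    );

Any INSERT or UPDATE that violates these CHECK constraints is rejected by the DBMS
itself.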

 Atomicity problem:
A transaction consisting of multiple operations should either succeed completely or
fail completely. Because of a system failure or a data integrity problem, a
transaction performed on the data can fail midway. If the transaction fails, the
data should be restored to the consistent state it was in before the failure. In a
bank, while transferring money from account A to B, if the money is removed from A
but not added to B, the data becomes inconsistent. Thus, the money transfer must be
atomic, i.e. it must happen in its entirety or not at all.
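In a DBMS the transfer is wrapped in a transaction, so both updates succeed
together or neither takes effect. A rough sketch in SQL (exact transaction syntax
varies between DBMS products; account numbers and column names are assumed):

    BEGIN TRANSACTION;
        UPDATE account SET balance = balance - 1000 WHERE act_no = 'A-101';  -- debit account A
        UPDATE account SET balance = balance + 1000 WHERE act_no = 'A-215';  -- credit account B
    COMMIT;
    -- if an error occurs before COMMIT, a ROLLBACK restores the data to its earlier consistent state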

 Concurrent access problems:


Data in the database may be handled by multiple users at the same time. This can
also lead to inconsistent data. e.g. consider a bank account having balance Rs.5000,
from which two users A and B withdraw amounts of say Rs.1000 and Rs.1500
respectively at the same time. The application program reads the balance Rs.5000 for
both users; say A finishes the withdrawal first, so the balance is updated to
Rs.4000 (i.e. 5000 - 1000), and when customer B finishes the withdrawal, the account
is updated with the balance Rs.3500 (i.e. 5000 - 1500) instead of the correct
Rs.2500. Handling such cases with a file system is very difficult for application
programmers.


 Security problems:
The data stored in the database must be safeguarded against illegal access. Also,
users who work with a particular type of data should be restricted from working
with other types of data. e.g. bank employees working on savings accounts are not
allowed to work with data and records related to loans.
Maintaining such different levels of security for data stored in a file system is
almost impossible.

View of data :

A database is a collection of interrelated data and a set of programs to handle the
data. The data is stored in the computer using files, which hold the data in a very
complex manner, but the details of the data storage are hidden from database users.
One of the major tasks of a database system is to provide users with an abstract
view of the data: the system hides certain details of how the data is actually
stored and maintained.

 Data Abstraction:
The data is stored in database systems using complex data structures so that it can
be accessed and modified efficiently. Database system users may not be computer
professionals; hence the developers of the database system hide the complex details
of data storage, so as to provide the users with easy access to the data.

Physical level:
This is the lowest level of data abstraction and describes how the data is actually
stored. This level describes the complex data structures used for data storage.
Database administrators sometimes handle the database at this level to make it more
efficient and easier to access.

Logical level:
This level describes what data are stored in database and what relationship exists
among those data. The entire database is described in terms of a small number of
relatively simple structures. Database administrators, who must decide what data
to keep in the database, use logical level of abstraction.

View level:
This is the highest level of abstraction, which describes only part of the database.
Not all users need to know or view all the data; they want only part of it. This
level simplifies the interaction of such users with the system.
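As a small illustration of the view level (using SQL syntax that is covered in a
later chapter; the table and column names are assumed), a view can expose only the
part of the data that a particular group of users needs:

    CREATE VIEW customer_contacts AS
        SELECT cust_name, street, city
        FROM customer;        -- balance and other sensitive columns stay hidden behind the view

Users who query customer_contacts see only these three columns, regardless of how
the full customer data is actually stored.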

 Instances and Schemas :


Databases are continuously used to view data, store new data, modify the existing
data and delete data. The collection of information stored in a database at a
particular moment is called the Instance of the database.
The overall design of the database is called the database Schema. In brief, the
schema describes what type of data can be stored in the database, what the
relationships among the data are, and what the rules for data storage are. The
physical schema describes the database design at the physical level, while the
logical schema describes the database design at the logical level, i.e. the database
tables, their structure and the constraints used for data storage.

Database Languages :

Database systems use database languages to work with the databases. The commonly
used language for database handling is called Structured Query Language (SQL).
Note: SQL is a universally accepted database language; practically all relational
DBMS use SQL for handling database operations.
SQL Server, on the other hand, is a DBMS product from Microsoft that is used for
creating and using databases. There are other popular DBMS (rather RDBMS) products
such as Oracle 9i from Oracle and Access 2000 from Microsoft. A DBMS comes in
different versions, like MS SQL Server 2000, MS SQL Server 2005 etc. The software
uses the SQL language for its database operations.

The part of the SQL language that is used to specify the database schema is called
the Data Definition Language (DDL), and the part of the language that is used to
work with the existing data is called the Data Manipulation Language (DML).

Data-Manipulation Language (DML) -


DML is a language that enables users to access or manipulate data as organized by
the appropriate data model. According to the operation performed on the database
(actually on tables), the language has four different instructions, namely:
 SELECT : retrieval of information from the database.
 INSERT : inserting new information into the database.
 UPDATE : modification of information stored in the database.
 DELETE : deletion of information from the database.

A query is a statement (SQL instruction) requesting information from the database.
The portion of a DML that involves data retrieval is called a query language.
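For example, assuming an account table with columns act_no, b_name and balance
(illustrative names only), the four DML instructions look as follows:

    SELECT act_no, balance FROM account WHERE b_name = 'Andheri';              -- retrieval (a query)
    INSERT INTO account (act_no, b_name, balance) VALUES ('A-305', 'Dadar', 5000);
    UPDATE account SET balance = balance + 500 WHERE act_no = 'A-305';         -- modification
    DELETE FROM account WHERE act_no = 'A-305';                                -- deletion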

Data Definition Language (DDL) –


The database schema is specified using a set of definitions, i.e. statements made up
of special instructions. This group of instructions is called the Data Definition
Language.
Note: In a software development process, the database design is done first (say, on paper) according
to the needs of the user (i.e. the organization for which the software is to be developed). Database
design consists of defining the different tables, their column structure and the rules that the
tables (i.e. rather the DBMS) should follow for accessing and updating data. Using DDL we then state
these specifications for the database and the tables in it.


The DDL consists of the instructions using which one can specify the storage
structure and access methods used by database system. These instructions define
the implementation details of the database schema which are usually hidden from
users. Using instructions of DDL certain constraints and data integrity rules are
specified while defining database schema.
[ SQL is discussed in detail in a later chapter ]
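As a preview, a rough sketch of DDL statements defining part of the banking schema
(column types and sizes are assumed for illustration):

    CREATE TABLE branch (
        b_name  VARCHAR(30) PRIMARY KEY,
        city    VARCHAR(30),
        assets  DECIMAL(14,2)
    );

    CREATE TABLE loan (
        loan_no VARCHAR(10) PRIMARY KEY,
        amount  DECIMAL(12,2),
        b_name  VARCHAR(30) REFERENCES branch(b_name)   -- every loan must belong to an existing branch
    );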

Database Users and Administrators:


Basic use of the database system is to retrieve or store information in the
database. Different people using the databases can be categorized as Users and
DataBase Administrators (DBAs).

Database users:
According to the way users are expected to interact with the Databases, users are
categorized as follows,
Naive users: These are users who work with the database indirectly, through some
application programs (like web applications or desktop applications) that were
previously written by application programmers. For example, a bank clerk updates a
bank account when an account holder gives a request for depositing money. Here the
bank clerk uses the account information stored in the database through some
application program on his/her computer.
Thus, such a user may not be a computer professional and may not know anything
about the underlying database.
Application Programmers: These are the computer professionals who write the
application programs. They work with the database schema already defined by the
database designers (usually the DBA).
An application programmer uses different development tools so that the (indirect)
database users can work with data easily.
Sophisticated users: These are DBMS users who interact with the database
systems by writing queries (statements in some database language like SQL).
Specialized users: These are sophisticated users who write specialized database
applications that do not fit into traditional data-processing framework. These
applications store data in more complex form like graphics, audio data.

Database Administrator (DBA):


A DBMS gives the facility of keeping data centralized so that all users can share
the same data (from multiple computers in a network, or from the Internet in case of
web applications). Thus, there should be centralized control over the data and the
programs that access that data. The person who has such central control over the
system is called the database administrator. The functions of the DBA are as
follows:
Schema definition: The DBA creates the original database schema (usually according
to the user requirements) by executing DDL statements. Here he also defines the data
storage structures (i.e. table structures) and defines methods and rules for
accessing data (called constraints).
Schema and Physical-organization modifications: The DBA carries out changes to
the schema and physical organization to reflect the changing needs of the
organization. Also, if required, he alters the physical organization of the system to
improve performance.
Granting of authorization for data access: One of the DBA's jobs is to grant
different types of authorization (access permissions); he controls which parts of
the database should be accessible to which types of users. e.g. the person handling
loan transactions in a bank may not be allowed to work with customers' account
information or the salaries of the bank employees.
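Such permissions are usually granted with statements of the following kind (the
user name savings_clerk is assumed for illustration):

    GRANT SELECT, UPDATE ON account TO savings_clerk;   -- may read and update account data
    -- no privileges are granted on the loan table, so this user cannot access it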
Routine maintenance: For routine maintenance of the database, the DBA has to perform
the following tasks:
Periodically backing up the database. The database is backed up almost daily onto
tape drives or remote servers, to prevent data loss in case of disasters.
Ensuring that there is enough storage space (disk space) to handle the database
transactions and the new data added to the database.
Monitoring the jobs running on the database and ensuring that performance is not
degraded by very heavy tasks submitted by some users.

Database Architecture :

The database system is partitioned into multiple parts (modules) that deal with
different responsibilities of the overall system. Broadly, the different functional
components of the system are divided into two parts: the Storage Manager and the
Query Processor. The storage manager's main task is to handle the physical data
present in the operating system files. The volume of data can be in megabytes for
small firms and in terabytes for large organizations. The storage manager also keeps
some data in a buffer so that the movement of data from the actual storage place to
memory is faster.
The query processor does the job of handling user requests in such a way that the
user doesn't have to interact with the actual physical data. It translates user
queries (e.g. SQL statements) into an efficient sequence of operations at the
physical level.
 Storage Manager:
It is a program module that provides the interface between the queries submitted to
the system (and requests given by application programs) and the low-level data
stored in the database. It interacts with the file manager; various DML queries are
translated into low-level file instructions.
It includes following components:
Authorization and integrity manager : tests for the satisfaction of integrity constraints
and checks the authority of users to access the data.
Transaction manager : ensures that the database remains in a consistent state
despite system failures. Also, it ensures that the concurrent transactions work
together without conflicting.
File manager: This manages the allocation of disk storage space and the data
structures used to store data.


Buffer Manager: Its job is to fetch data from disk storage (the actual database)
into main memory, and to decide what data must be held in main memory. It enables
the database to handle data sizes that are much larger than main memory.
The storage manager implements several data structures to hold the physical data:
Data files: which store the database itself.
Data dictionary: which stores metadata about the structure of the database, i.e. the
schema of the database.
Indices: which provide fast access to data items in the database that hold
particular values.
 The Query Processor :
This includes following main modules:
DDL interpreter: Interprets the DDL statements and records the definitions in data
dictionary.
DML Compiler: Translates DML statements (usually given in a query language) into an
evaluation plan consisting of low-level instructions that the query evaluation
engine understands. The DML compiler also does query optimization, i.e. it chooses
the most efficient evaluation plan from among the alternatives.
Query Evaluation Engine: Executes the low-level instructions generated by the DML
compiler. The figure below shows the database system structure.

[ Figure: Database system structure ]
2. Relational Model
Structure of Relational Databases:

A relational database consists of a collection of tables, each of which is assigned
a unique name. As we know, tables are defined with their column names and column
types. One can also define constraints for the columns or tables. Tables store data
in the form of rows (i.e. records).
A row in a table represents a relationship among a set of values. Since a table is a
collection of such relationships, there is a close correspondence between the
concept of a table and the mathematical concept of a relation; hence a table is
called a relation in this model, and hence the name Relational Model.
Each table is defined by its columns (i.e. attributes). An attribute can hold a
particular set of values, called its domain. Let's consider an Account table for a
banking system as discussed before, with attributes account_no, branch_name and
balance. account_no can take integer values, i.e. its domain is integer values in a
particular range, and branch_name can be one of the branches of the bank. Thus, if
D1, D2 and D3 represent the domains of the three attributes of Account, then the
table will be a subset of the values given by
D1 x D2 x D3
In general, a table with n attributes will be a subset of
D1 x D2 x D3 x ... x Dn
Mathematically, a relation is defined as a subset of the Cartesian product of a list
of domains.
We can use the terms relation and tuple in place of table and row. A tuple variable
stands for a tuple (the collection of values for a particular row in a table); thus,
a tuple variable is a variable whose domain is the set of all tuples.
The table below shows an Account relation. (A table representing bank account
information for multiple accounts)

account_no branch_name balance


A-101 Andheri 10000
A-215 Churchgate 15000
A-102 Borivli 20000
A-250 Dadar 4500
A-115 Andheri 35000

In the above Account relation there are five tuples. Let's consider the first tuple,
denoted by a tuple variable, say t. We use the notation t[account_no] to denote the
value of t on the account_no attribute. Thus, we can say t[account_no] is A-101,
t[branch_name] is Andheri and so on. Alternatively, one can use the notation t[1] to
refer to the first attribute (here account_no), t[2] for branch_name and so on.

Database Schema:
The database schema stands for the logical design of a database. A database instance
is a state (design and values stored) of the database at a particular instant in
time.
Every table (i.e. relation) is defined with a particular type of structure (in the
same way we define a data type or a class in OOP languages). The relation schema
corresponds to this type definition, and a relation is analogous to a variable that
holds values (as allowed by the type, i.e. the schema). A relation schema is always
given a name, e.g. a schema for an Account can be defined as follows,
Account-schema = (act_no, cust_name, b_name, balance )
Note : act_no is for bank-account number, cust_name for customer name,
b_name= name of the branch in which account exists.
For our further discussion we will use following schemas for the Banking database
system.
Customer-schema = ( cust_name, street, city )
Branch-schema = ( b_name, city, assets )
Loan-schema = ( loan_no, amount, b_name )
Further we will define two more relations which are used to define relationship
between some relations (tables).
Depositor-schema = ( act_no, cust_name )
Relates customer and account tables (i.e. relations)
And, Borrower-schema = ( cust_name, loan_no )
Relates Customer and Loan tables.

Relational Algebra:
Database users use some query language to work with the data in the database. Query
languages consist of predefined instructions used with a particular syntax.
According to their use, query languages are categorized as either procedural or
non-procedural. In a procedural language the user instructs the system to perform a
sequence of operations on the database to compute the desired result. In a
non-procedural language the user describes the desired information without giving a
specific procedure for obtaining that information.
Relational algebra is a procedural query language. It consists of a set of
operations that take one or two relations as input (also called arguments) and
produce a new relation as their result.

Fundamental operations:
These are Select, Project, Union, Set-difference, Cartesian product and Rename.
The select, project and rename operations are called as unary operations, because
they operate on a single relation.
Remember two points about the relations.
 Result of a relational algebraic expression is a relation.
 And since relations are sets, they don’t hold duplicate values.

In addition to the fundamental operations, there are several other operations
namely set intersection, natural join, division and assignment. These operations
can be defined in terms of the fundamental operations.

1. The Select operation:


The select operation selects tuples that satisfy a particular predicate (i.e. some
condition). The Greek letter sigma (σ) is used to denote the selection operation. In
the expression the condition (i.e. predicate) is written as a subscript to σ, and
the argument relation is written in parentheses after σ.
The general syntax for the select operation is as follows,
σ <selection_condition> (Relation)
< > indicates that any condition can appear at that place.
e.g. to select those tuples (i.e. records) from the loan relation (i.e. table) where
the branch is "Andheri", we write,
σ b_name = "Andheri" (loan)
The above expression is read as 'select (or retrieve) the tuples from the loan
relation where the branch name is Andheri'. As mentioned before, the result of the
expression is a relation, i.e. the set of tuples that satisfy the given predicate.
The result will be as follows,

LoanNo Amount BName


L-101 4000 Andheri
L-104 6500 Andheri

We can use the Relational operators, like =, <, ≤ , > , ≥, ≠.


To get all the tuples from loan relation, where amount is greater than 4000, will be
written as follows,
σ amount > 4000 (loan)

LoanNo Amount BName


L-102 5000 Kothrud
L-104 6500 Andheri

The operands in the predicate can be numbers or strings (text data), or can be
attributes of some relation.
String values are enclosed in double quotation marks; numeric values are written
directly as numbers.
We can also use the logical operators And (∧), Or (∨) and Not (¬).

Thus, to find the tuples where the branch name is Andheri and the amount is greater
than 4000, we can write the expression as follows,
σ amount > 4000 ∧ b_name = "Andheri" (loan)
To get the tuples where expenses equal the credit limit from the Cards relation,
σ expenses = cr_limit (Cards)
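The same selections written in SQL (covered in a later chapter) look roughly like
this, assuming tables named loan and Cards with the columns used above:

    SELECT * FROM loan WHERE b_name = 'Andheri';
    SELECT * FROM loan WHERE amount > 4000 AND b_name = 'Andheri';
    SELECT * FROM Cards WHERE expenses = cr_limit;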

2. Project operation :
By default any select operation retrieves the specified tuples with all the
attributes; thus the degree of the resultant relation is the same as that of the
original relation. The project operation is used to select only some of the
attributes (columns) from a relation. The Greek letter ∏ is used to denote the
project operation, and the required attribute list is written as a subscript to the
∏ symbol.
To get the loan number and loan amount from the loan relation (which has three
attributes, for loan number, branch name and loan amount), we can write,
∏ loan_no, amount ( loan )
The result of the expression will be the following relation,

LoanNo Amount
L-101 4000
L-102 5000
L-103 1500
L-104 6500
L-105 3500
L-106 4000

Note that the attribute for branch name is not selected.
The attributes specified in the attribute list of the expression need not be in the
same order as they appear in the original relation; the resultant relation reports
the attributes in the order specified in the expression.

We can combine multiple operations and get a required resultant relation, using a
relational algebraic expression.
e.g. to get the names of the customers who stay in "Mumbai" we can write,
∏ cust_name ( σ city = "Mumbai" (customer) )
The select operation in the parentheses returns a relation holding the tuples with
city equal to Mumbai, with all the attributes. This resultant relation is used as
the input (argument) to the project operation, which returns the final relation with
only the cust_name attribute. The result of the expression is as follows,

CustName
Sanjay
Ajay
Rita
Mack
Dinesh
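In SQL terms, projection roughly corresponds to listing the required columns in the
SELECT clause (table and column names as assumed above); DISTINCT is added because
relations are sets, whereas SQL tables may keep duplicate rows:

    SELECT loan_no, amount FROM loan;
    SELECT DISTINCT cust_name FROM customer WHERE city = 'Mumbai';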

3. Union Operation :
This works on the set theory of mathematics. A union is the set of values obtained
after merging two sets, removing duplicate values.
Let's consider the case of getting all the customers who have either an account or a
loan or both in a bank. To get this information we have to take the tuples from
depositor and borrower. The customer relation alone will not provide this
information, because not all customers have a loan or an account in the bank.
We can get the customers having a loan by the following expression,
∏ cust_name ( borrower )
We can get the customers having an account by the following expression,
∏ cust_name ( depositor )
To get the customers having either a loan or an account or both, we take the union
of the relations resulting from the above two expressions, as follows,
∏ cust_name ( borrower ) ∪ ∏ cust_name ( depositor )
Since relations are sets, duplicate values are eliminated. The resultant relation
will be as follows,

Dinesh
Mack
Neeta
Puja
Ravi
Rita
Sachin
Sanjay
Vijay

We must ensure that the union is taken between compatible relations. In the above
example both relations (on the left and right side of the union symbol) have the
same number of attributes (only cust_name) and the same type of attribute (a
string). Thus, for an expression r ∪ s to be valid,
1. The relations r and s must have the same arity, i.e. they must have the same
number of attributes.
2. The domains of the ith attribute of r and the ith attribute of s must be the
same, for all i.
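The rough SQL equivalent uses UNION, which also removes duplicates and requires the
two SELECT lists to be compatible:

    SELECT cust_name FROM borrower
    UNION
    SELECT cust_name FROM depositor;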

4. Set Difference operation :


The set-difference operation, denoted by -, allows us to find tuples that are in one
relation but not in the other. The expression r - s produces the tuples that are in
r but not in s.
e.g. we can find the customers of the bank who have an account but not a loan in the
bank using the following expression,
∏ cust_name (depositor) - ∏ cust_name (borrower)
As with the union operation, we must ensure that the set difference is taken between
compatible relations (the same conditions as for union).
The result of the expression is as follows,
Dinesh
Neeta
Puja
Sachin
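A rough SQL equivalent uses EXCEPT (some DBMS products spell this operator MINUS):

    SELECT cust_name FROM depositor
    EXCEPT
    SELECT cust_name FROM borrower;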
5. Cartesian-Product operation:
This is also called the cross product and is denoted by x. The operation gives all
combinations of tuples of the two relations given as arguments.

When any two relations are combined and they have attributes with the same names, we
need to distinguish each such attribute by attaching the relation name to the
attribute name. This is commonly done using the dot operator,
i.e. relation.attribute
For example, if r = borrower x loan
then the schema for r will be,
(borrower.custname, borrower.loanno, loan.loanno, loan.bname, loan.amount)
The resultant relation holds a large number of tuples: if r1 has n1 tuples and r2
has n2 tuples, then the relation r = r1 x r2 will have n1 x n2 tuples.
For the above expression the resultant relation will have 36 tuples, as follows,
CustName Borrower.LoanNo Loan.LoanNo Amount BName
Vijay L-101 L-101 4000 Andheri
Vijay L-101 L-102 5000 Kothrud
Vijay L-101 L-103 1500 Dadar
Vijay L-101 L-104 6500 Andheri
Vijay L-101 L-105 3500 Kothrud
Vijay L-101 L-106 4000 Safdarganj
Rita L-102 L-101 4000 Andheri
Rita L-102 L-102 5000 Kothrud
Rita L-102 L-103 1500 Dadar
Rita L-102 L-104 6500 Andheri
Rita L-102 L-105 3500 Kothrud
Rita L-102 L-106 4000 Safdarganj
Ravi L-103 L-101 4000 Andheri
Ravi L-103 L-102 5000 Kothrud
Ravi L-103 L-103 1500 Dadar
Ravi L-103 L-104 6500 Andheri
Ravi L-103 L-105 3500 Kothrud
Ravi L-103 L-106 4000 Safdarganj
Mack L-104 L-101 4000 Andheri
Mack L-104 L-102 5000 Kothrud
Mack L-104 L-103 1500 Dadar
Mack L-104 L-104 6500 Andheri
Mack L-104 L-105 3500 Kothrud
Mack L-104 L-106 4000 Safdarganj
Vijay L-105 L-101 4000 Andheri
Vijay L-105 L-102 5000 Kothrud
Vijay L-105 L-103 1500 Dadar
Vijay L-105 L-104 6500 Andheri
Vijay L-105 L-105 3500 Kothrud
Vijay L-105 L-106 4000 Safdarganj
Sanjay L-106 L-101 4000 Andheri
Sanjay L-106 L-102 5000 Kothrud
Sanjay L-106 L-103 1500 Dadar
Sanjay L-106 L-104 6500 Andheri
Sanjay L-106 L-105 3500 Kothrud
Sanjay L-106 L-106 4000 Safdarganj
It is clear from the above table that the Cartesian product operation associates
every tuple of loan with every tuple of borrower.

e.g. find the names of the customers who have a loan at the "Andheri" branch. For
the branch we need to access the loan relation, and for the customer names we need
information from borrower. We can write
σ b_name = "Andheri" ( borrower x loan )
The select operation here works on the Cartesian product of the two relations (which
contains lots of repeated information from both relations). The result is as
follows,

CustName Borrower.LoanNo Loan.LoanNo Amount BName


Mack L-104 L-101 4000 Andheri
Mack L-104 L-104 6500 Andheri
Ravi L-103 L-101 4000 Andheri
Ravi L-103 L-104 6500 Andheri
Rita L-102 L-101 4000 Andheri
Rita L-102 L-104 6500 Andheri
Sanjay L-106 L-101 4000 Andheri
Sanjay L-106 L-104 6500 Andheri
Vijay L-101 L-101 4000 Andheri
Vijay L-101 L-104 6500 Andheri
Vijay L-105 L-101 4000 Andheri
Vijay L-105 L-104 6500 Andheri

We now have to select those tuples from the result in which the loan numbers from
the two original relations match,
σ borrower.loanno = loan.loanno ( σ b_name = "Andheri" ( borrower x loan ) )
To get only the customer names, we apply a projection, as follows,
∏ cust_name ( σ borrower.loanno = loan.loanno ( σ b_name = "Andheri" ( borrower x loan ) ) )
The final result is as follows,

CustName
Vijay
Mack

6. Rename operation:
Sometimes it is required to reuse the relations which result from relational
algebraic expressions. These relations are nameless, but we can assign them a name
using the rename operation, denoted by the Greek letter rho (ρ). The general use of
rename is as shown below,
ρ name ( Expression )
e.g. ρ x (σ b_name = "Andheri" (Loan) )
Here, the result of the expression in the parentheses is a relation, which is given
the name x.

7. Set Intersection operation :


The result of this operation is a relation that includes all tuples that are in both
of the relations given as arguments. It is not a totally new type of operation,
since it can be expressed in terms of the fundamental set-difference operation.

The set intersection of relations R and S is denoted by R ∩ S, and it is equivalent
to the following operation:
R - ( R - S )
e.g. to find the customers that have both a loan and an account:
∏ cust_name ( borrower ) ∩ ∏ cust_name ( depositor )

8. The Natural Join operation :
The Cartesian product of multiple relations is used to retrieve particular types of
tuples from those relations, and a selection is commonly applied on the resultant
relation to get the required final tuples.
Natural join is a binary operation that allows us to combine certain selections and
a Cartesian product into one operation. It is denoted by the join symbol ⋈.
The natural join operation forms the Cartesian product of its two arguments,
performs a selection forcing equality on those attributes that appear in both
relation schemas, and finally removes the duplicate attributes.
e.g. finding the customer name and loan amount of the customers who have a loan in
the bank. Consider the natural join of the two relations Borrower and Loan,
Borrower ⋈ Loan
The natural join will consider the pairs of tuples that have the same loan number.
It combines each such pair of tuples into a single tuple on the union of the two
schemas.
Now we can apply the projection operation to get the customer name and loan amount,
as follows,
∏ cust_name, amount (Borrower ⋈ Loan)
e.g.
1. Find the names of all branches having customers who have an account in the bank
and who live in the city Mumbai.
∏ b_name (σ city = "Mumbai" ( Customer ⋈ Account ⋈ Depositor ) )

Natural join is associative, i.e. the following three expressions are equivalent,
Customer ⋈ Account ⋈ Depositor
( Customer ⋈ Account ) ⋈ Depositor
Customer ⋈ ( Account ⋈ Depositor )

2. Find all the customers who have both a loan and an account in the bank.
∏ cust_name (Borrower ⋈ Depositor )
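In SQL the same joins can be written with the JOIN ... ON syntax (some DBMS products
also accept NATURAL JOIN directly); table and column names are as assumed above:

    SELECT b.cust_name, l.amount
    FROM borrower b JOIN loan l ON b.loan_no = l.loan_no;

    SELECT DISTINCT b.cust_name
    FROM borrower b JOIN depositor d ON b.cust_name = d.cust_name;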

9. Division Operation :
The division operation is denoted by the symbol ÷, and is used for queries that
include the phrase 'for all'. e.g. we want to retrieve all the customers who have an
account at all the branches located in the city Mumbai.
We can obtain all the branches in Mumbai by the expression,
R1 = ∏ b_name (σ city = "Mumbai" ( Branch ) )
Next, we retrieve all the customers (with their branch name) who have an account in
the bank,
R2 = ∏ cust_name, b_name (Depositor ⋈ Account)
Now we want to retrieve all the customers who appear in R2 with every branch name in
R1. This is done by: R2 ÷ R1, i.e.
∏ cust_name, b_name (Depositor ⋈ Account) ÷ ∏ b_name (σ city = "Mumbai" ( Branch ) )

10. Assignment Operation:
It is convenient at times to write a relational-algebra expression by assigning
parts of it to temporary relation variables. The assignment operation, denoted by ←,
works like the assignment operator in a programming language.
The above division query can now be written as,
R1 ← ∏ b_name (σ city = "Mumbai" ( Branch ) )
R2 ← ∏ cust_name, b_name (Depositor ⋈ Account)
Result ← R2 ÷ R1

----000----

Classes for Engg. students:


FE – SPA …Sem II

SE – DS (Comp) , DSAA ( IT ) … Sem III


OOPM (Java) ( Comp, IT) …Sem III

Web Programming. (IT ) … Sem IV


(JSP, C#, ASP.Net, PHP etc)

* With practicals *

By – Santosh Kabir sir.

Contact: 98336 29398

Andheri, Dadar, Thane

3. Entity-Relationship Model
The first step in database design is usually an Entity-Relationship data model
(i.e. ER-Model). The model perceives the real world as consisting of basic objects,
called entities and relationships among these objects. It allows database
designers to specify the basic schema of the database. ER model is a high level
conceptual data model. This model and its variations are used for the conceptual
design of database applications.

Basic Concepts: (Entity, Attributes and Relationship)


To understand these basic concepts we will consider an organization such as a bank
(this chapter assumes students have been introduced to the basics of databases,
tables, records and a few SQL queries). The bank holds data about its customers in
the form of account number (or customer ID), name, address, age, balance amount,
telephone, and loan taken (if any). This can be stored in one database table (say
Customers), and loan information can be stored in a different table (say Loans) in
the form of loan ID, customer name, loan amount, type and the current balance of the
loan.

An Entity is a 'thing' or an 'object' in the real world that is distinguishable from
all other objects. For example, a customer (of a bank) is an entity. An entity has a
set of properties, and values can be associated with some or all of these
properties. Values of some of these properties may uniquely identify an entity, e.g.
the customer ID can uniquely identify a customer, or a loan can be identified by its
loan ID.
An entity is represented by a set of properties, called Attributes. e.g. the
customer entity can be represented by attributes such as customer ID, name, address
etc., as given above.
An Entity Set is a set of entities of the same type that share the same properties,
i.e. the same attributes. The set of all the customers of a bank can be called the
entity set Customers.
The individual entities that constitute a set are said to be the extension of the
entity set. Thus, an individual customer of the bank is an extension of the
Customers entity set.
Each entity has a value for each of its attributes, e.g. a particular customer can
have customer id = '1234', name = 'Mr Rajesh Kumar' etc. The customer ID is used to
uniquely identify customers, since there can be multiple customers with the same
name or the same balance amount.
For each attribute there is a set of permitted values, e.g. a customer name can
consist only of alphabetic characters, and a customer ID should be a number. This
set of permitted values is called the Domain or Value Set of the attribute.

Types of Attributes:

Simple and Composite attributes: A simple attribute cannot be divided further (to
describe the entity in more detail) into multiple attributes, e.g. the balance
amount attribute of a customer cannot be broken into subparts. A composite
attribute, on the other hand, can be divided into subparts, i.e. other attributes,
e.g. the customer address attribute can be divided into attributes such as apartment
name, street, city and area code (PIN code in India).

Single-valued and Multi-valued attributes: A single-valued attribute has only one
value for an entity. For example, the customer balance amount holds only one value;
it cannot refer to multiple values for one customer entity. Whereas the customer
telephone number can refer to multiple values, because a customer can have zero, one
or more telephone numbers. Thus, telephone number can be a multivalued attribute,
and a multivalued attribute can have zero or more values for a single entity.

Derived attributes: The value of this kind of attribute can be derived from other
related attributes or entities, e.g. the value of the Age attribute of a customer
can be derived from the attribute Date of birth.

An attribute takes a null value when a particular entity does not have a value for
it. This can happen for one or more reasons, such as: the attribute is not
applicable for that particular entity, or the value is currently not known.

Relationship Sets:
A relationship is an association among several entities. For example, say a customer
Rajesh Kumar has taken a loan from the bank and his loan id is L_0012. The loan
entities are part of the Loans entity set. We can define a relationship that
associates the customer Rajesh Kumar with the loan number L_0012; this relationship
specifies that Rajesh Kumar is a customer of the bank with loan number L_0012.
A Relationship Set is a set of relationships of the same type. Considering the two
entity sets Customers and Loans, we define the relationship set borrower to denote
the association between customers and bank loans.

cust_no cust_name branch City loanNo Amount


1234 Vijay Joshi Dadar Mumbai 1111 50000
1235 Ajay Sharma Banjara Chennai 1112 30000
1236 Rita Agarwal Dadar Mumbai 1113 35000
1237 Sanjay Kumar Worli Mumbai 1114 60000
1238 Simran Panessar Pimpree Pune 1115 10000

Association between the entity sets is referred to as Participation. That is, we say,
multiple entity sets E1, E2, E3, … participate in some Relationship Set R.
The Relationship Instance represents the association between the real world
entities that are modeled by the ER model. e.g. an individual customer entity say
Rajesh Kumar and the loan entity say L_0012 participate in the relationship
instance borrower.
A relationship may also have attributes, called Descriptive attributes. Consider two
entity sets related to the banking system: Customer and Account. The relationship
depositor (or operator) exists between Customer and Account. A customer accesses his
account for various reasons, like depositing money or checking the balance amount in
the account. We can add an attribute access-date to the relationship depositor to
specify the most recent date on which the customer accessed the account.

Most relationship sets are binary relationship sets, that is, they involve two
entity sets. For example, the relationship set borrower discussed above is a binary
relationship set, since it involves the two entity sets Customer and Loan. In a
database system there can also exist relationship sets that involve more than two
entity sets.

Constraints :
The ER model for a particular enterprise may define certain constraints (rules and
specifications to be followed by the actual data in the database) to which the
contents of the database must conform. The following sections explain mapping
cardinalities and participation constraints, which are the two most important types
of constraints.

 Mapping Cardinalities:
Also called cardinality ratios, these express the number of entities to which
another entity can be associated via a relationship set. We will discuss only binary
relationship sets.
For a binary relationship set R between entity sets A and B, the mapping cardinality
must be one of the following:

One to One : An entity in A is associated with at most one entity in B, and an
entity in B is associated with at most one entity in A.

One to Many : An entity in A is associated with many (zero or more) entities in B.
An entity in B, however, can be associated with at most one entity in A.

Many to One : An entity in A is associated with at most one entity in B. An entity
in B can be associated with any number (zero or more) of entities in A.

Many to Many : An entity in A is associated with any number (zero or more) of
entities in B, and an entity in B can be associated with any number (zero or more)
of entities in A.

[ Figure: mapping-cardinality diagrams for the four cases ]

To understand the above four types, let's take the example of the Customer and Loan
entity sets of the banking system (discussed in previous topics). One customer can
have multiple loans, forming a one-to-many relationship from customer to loan. Also,
one loan can be taken in multiple customers' names (say a business loan taken by
multiple partners in a business, who are all customers of the bank), forming a
many-to-one relationship from customer to loan. Since both relationships are
possible, we can say the Customer-Loan relationship is many-to-many.

 Participation Constraints:
The participation of an entity set E in a relationship set R is said to be Total if
every entity in E participates in at least one relationship in R. Every loan entity
in the Loans entity set is related to a customer, i.e. participates in the borrower
relationship set (discussed in previous topics); thus, the participation of the
Loans entity set in the borrower relationship set is total. Whereas, every customer
of the bank may not have a loan, so every entity in the Customers entity set may not
be related to a loan through the borrower relationship set. Hence, the participation
of Customers in the borrower relationship set is Partial.

 Keys :

All the entities within a given entity set are distinct, but there must be a way to
specify how entities within the entity set are distinguished. The values of the
attributes of entities must be such that they can uniquely identify the entity. In
other words no two entities in the entity set are allowed to have exactly the same
value for all attributes.
A key allows us to identify a set of attributes that make it possible to distinguish
entities from each other.

A superkey is a set of one or more attributes that, taken collectively, allow us to
uniquely identify an entity in an entity set. For example, the bill number in a
Bills entity set is sufficient to distinguish one bill from another; thus, bill
number is a superkey. Similarly, the combination of bill number and bill date is
also a superkey. But the combination of bill date and customer name may not be a
superkey, because there can be multiple bills for the same customer on the same
date.
It is clear that if K is a superkey, any superset of K is also a superkey. We are
often interested in superkeys for which no proper subset is itself a superkey; such
minimal superkeys are called Candidate keys.
There can be several sets of attributes that serve as candidate keys, e.g. for a
bank's Customers set, the combination of customer name and contact number can be a
candidate key, and the combination of customer name and address can also be a
candidate key.
Primary Key : Usually, to distinguish one entity from another, the database designer
chooses an attribute that takes a unique value, e.g. Customer ID in the Customers
entity set, or bill number in a Bills entity set. We use the term Primary Key to
denote the candidate key chosen by the database designer as the principal means of
identifying entities within an entity set.
Sometimes the primary key is a combination of multiple attributes. In any case,
primary keys must be chosen in such a way that their attribute values change very
rarely.
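In SQL terms (a sketch only; column names and types are assumed), the chosen primary
key is declared with PRIMARY KEY, while other candidate keys can be declared with
UNIQUE constraints:

    CREATE TABLE customers (
        cust_id    INT PRIMARY KEY,        -- primary key chosen by the designer
        cust_name  VARCHAR(50) NOT NULL,
        phone      VARCHAR(15),
        address    VARCHAR(100),
        UNIQUE (cust_name, phone)          -- another candidate key
    );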

Entity-Relationship Diagram:

ER diagrams are used to represent the overall logical structure of a database


graphically. The diagram consists of following major components,

Component Represents
Rectangle Entity set
Ellipse Attribute
Diamond Relationship set
Line Link attributes to entity sets and entity sets to
relationship sets
Double Ellipse Multivalued Attribute
Dashed Ellipse Derived attribute
Double line Total participation of the entity set in the
relationship set
Double Rectangle Weak Entity sets

ER Diagram example:
Let's consider the banking system example discussed above. We will consider two
entity sets, Customers and Loans, related through the binary relationship set
borrower. Let the attributes of Customers be customer-id, customer-name,
customer-street and customer-city, and the attributes of Loans be loan-number and
amount. The attributes that are members of the primary key are underlined.
A simple ER diagram for the example is drawn below with the basic components listed
above.

[ Figure: ER diagram — entity sets Customers (cust-id, cust-name, cust-street, cust-city) and Loans (loan-no, amount) linked by the relationship set borrower ]
The relationship borrower may be one-to-many, one-to-one, many-to-many or
many-to-one. To represent these different relationships we use either a directed
line (→) or an undirected line (—) between the relationship set and the entity set
under consideration.
e.g.
1. A directed line from the relationship set borrower to the entity set Loans
specifies that borrower is either a one-to-one or a many-to-one relationship set
from Customers to Loans, i.e. borrower cannot be a many-to-many or one-to-many
relationship from customer to loan.

2. An undirected line from the relationship set borrower to the entity set Loans
specifies that borrower is either a many-to-many or a one-to-many relationship set
from Customers to Loans.

In the above ER diagram the relationship set borrower is many-to-many.

ER diagram (A) below shows the relationship set borrower from Customers to Loans as
one-to-many: the line from borrower to Customers is directed. ER diagram (B) below
shows the relationship borrower as one-to-one: both lines from borrower are
directed.

[ Fig. A: ER diagram of Customers and Loans with borrower as a one-to-many relationship ]

[ Fig. B: ER diagram of Customers and Loans with borrower as a one-to-one relationship ]

Relationship set with attributes:
Consider a relationship set depositor between the entity sets Customers and Accounts
(for the banking system discussed in previous topics). In the banking system a
customer accesses his account for different transactions like depositing money,
withdrawing money or getting monthly transaction updates. The depositor relationship
set can have an attribute, say access-date, i.e. the last date on which the customer
accessed his account. This attribute is also shown in the ER diagram, with a link
between the relationship set and the attribute. The following figure shows the ER
diagram for this case.

[ Figure: ER diagram — Customers and Accounts linked by depositor, with the descriptive attribute access-date attached to depositor ]

Entity set with different types of attributes:
The Customers entity set can be made up of attributes like cust-name, cust-id,
address, date-of-birth, phone and age. cust-id is a member of the primary key and is
a simple number. cust-name can be a composite attribute, made up of component
attributes like first-name and last-name. Address can also be a composite attribute,
made up of component attributes like apartment, street, city and pin. The phone
attribute can be a multivalued attribute. Also, age is a derived attribute, derived
from the attribute date-of-birth.
Thus, the entity set Customers can be drawn as follows.

[ Figure: Customers entity set with composite attributes cust-name (first-name, last-name) and address (street, city, pincode), multivalued attribute phone-no (double ellipse), and derived attribute age (dashed ellipse) derived from date-of-birth ]

Double lines are used in an ER diagram to indicate that the participation of an
entity set in a relationship set is total, i.e. each entity in the entity set occurs
in at least one relationship in that relationship set. e.g. consider the borrower
relationship for the Loans entity set in the above banking system example: each loan
is related to a customer through borrower. This is drawn, as follows, by using a
double line between Loans and borrower.

[ Figure: ER diagram — Customers and Loans linked by borrower, with a double line between Loans and borrower showing the total participation of Loans ]

ER diagram with a Ternary relationship:
All the relationship sets we considered so far were binary relationship sets. Let's
consider the case of three entity sets: Employees, Branches and Jobs. One employee
is related to one branch and does a particular job. The Jobs set consists of the
attributes title and level. The Branches set has the attributes branch-name, city
and assets. Each entity in the Employees entity set, i.e. each employee, is related
to one job and one branch. This is a ternary relationship, and it can be shown as
follows.
[ Figure: ternary relationship works-on connecting Employees (emp-id, emp-name, street, city), Jobs (title, level) and Branches (branch-name, branch-city, assets) ]

Weak Entity Sets:
An entity set may not have sufficient attributes to form a primary key. Such an
entity set is termed a Weak entity set. An entity set that has a primary key is
termed a strong entity set.
Let's consider the entity set LoanPayment, which has three attributes:
payment-number, payment-date and amount. Obviously, there can be multiple payments
on the same date, and there can be multiple payments with the same amount. Also,
even though the payment number can be a serial number, the same payment number can
occur for payments of different loans. Thus, LoanPayment doesn't have sufficient
attributes to define a primary key. For a weak entity set to be meaningful, it must
be associated with another entity set, called the identifying or owner entity set.
Each entity in a weak entity set must be associated with an identifying entity. The
relationship associating the weak entity set with the identifying entity set is
called the Identifying relationship. The identifying relationship is many-to-one
from the weak entity set to the identifying entity set.
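When such a design is converted into tables, the table for the weak entity set
typically takes the owner's primary key plus its own partial key (discriminator) as
a composite primary key. A rough sketch, assuming a loan table with key loan_no:

    CREATE TABLE loan_payment (
        loan_no        VARCHAR(10) REFERENCES loan(loan_no),  -- key of the identifying (owner) entity set
        payment_number INT,                                   -- discriminator of the weak entity set
        payment_date   DATE,
        amount         DECIMAL(12,2),
        PRIMARY KEY (loan_no, payment_number)
    );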

Extended E-R features:

The basic ER model that we discussed before can be used to model most of the
database features, but some of the features can be better expressed using some
extensions added to basic ER-model.

1. Specialization:
In some cases an entity set might consist of sub-groups, and some of the attributes
of these sub-groups may not be shared by all the entities in the entity set. These
sub-groups can be separated out into entity sets of their own.
Consider an entity set Persons for a banking system, with attributes like name, city
and age. There can be two types of persons: Customers and Employees.
Each of these person types has the above attributes (name, city, age), but they can
also be described by further attributes that are not common to both. For example
Customers will have additional attributes like customer-id and customer-type,
whereas Employees can have attributes like salary and job.
The process of designating sub-groupings within an entity set is called
Specialization. Thus, specialization of Persons allows us to distinguish among
persons according to whether they are Employees or Customers.
Consider another example, bank Accounts. Accounts is an entity set with attributes
account-no, account-name and balance, but a bank may have two types of accounts:
Savings accounts and Current accounts. Savings accounts are given interest every
month according to the balance; Current accounts are not given monthly interest, but
a current account may be given an overdraft facility, reports of monthly
transactions, facilities for transferring money from one account to another etc. The
bank can then separate the two types of accounts according to their attributes and
behaviour.
In an ER diagram, specialization is depicted by a triangle labeled ISA. The label
ISA stands for 'is a'. The following figure shows a part of an ER diagram in which
the Customers set is shown as ISA Persons. We can also say Customers is a sub-class
of Persons (i.e. Persons is a super-class of Customers).

[ Figure: specialization — Persons (name, city, age) with an ISA triangle leading to Customers (cust-id, cust-type) and Employees (salary, job) ]

2. Generalization:
Suppose there are two entity sets in a banking database system, say SavingsAccounts
and CurrentAccounts. Except for a few attributes and small operational differences,
both entity sets have a lot of common features. A database designer can then decide
to group them into one entity set, say Accounts. This feature is called
Generalization, in which multiple entity sets are regrouped into one entity set; it
is quite obviously the opposite of specialization.
Here the two entity sets SavingsAccounts and CurrentAccounts are called lower-level
entity sets and the entity set Accounts is called the higher-level entity set.
In terms of ER diagrams, this process is again shown with the ISA label in a
triangle component, i.e. the same notation is used for specialization and
generalization. Thus, in terms of ER diagrams we don't distinguish between
specialization and generalization.

3. Aggregation:
This feature of the ER model is used to specify a relationship between
relationships. Let's consider the relationship set works-on between Employees, Jobs
and Branches, as discussed in previous topics: each employee is related to some job
at some branch. Suppose it is required that a manager keep a record of which
employee works on what job at which branch. Let's assume an entity set Managers for
this purpose. One way to represent this in an ER diagram is to show a quaternary
relationship manages between Employees, Jobs, Branches and Managers, as shown below
(the following figure doesn't include the attributes of the entity sets).

[ Figure: Quaternary relationship manages connecting Employees, Jobs, Branches and Managers, in addition to the existing works-on relationship between Employees, Jobs and Branches. ]

The ER-diagram shown above contains redundant information: every Employees, Jobs, Branches combination in manages is also in works-on.
This situation is handled by using aggregation. Aggregation is an abstraction through which relationships are treated as higher-level entities. In this example we will treat works-on as a higher-level entity set. We can then create a binary relationship between works-on and Managers to represent who manages what task.

[ Figure: Aggregation. The works-on relationship between Employees, Jobs and Branches is treated as a higher-level entity set, and a binary relationship manages connects it to Managers. ]

Problems:
Q.1 For the following problem draw an ER Diagram and convert it into tables using the rules.
Xyz Inc. is a software company that has several employees working on different types of projects, on different platforms. Projects have different schedules and may be in one of several phases. Each project has a project leader and team members at different levels. (Assume any other information.) May-03 16 marks

Tables and their attributes:

Employee ( Emp-ID, Name, Salary, Addr, Phone, TeamID )
Team ( TeamID, Leader, Member )
Project ( Proj-name, St-date, Sub-date, Leader, TeamID, Level )
ProjLevel ( Level, Status, TeamID, Leader, St-date, Sub-date )

[ ER-Diagram on the next page ]

Q.2 Draw an ER diagram for a University database consisting of four entities:
student, department, class, faculty. Assume the following information.
Each student has a unique Id and can enroll for multiple classes.
A faculty member belongs to a department and can teach multiple subjects.
Each class is taught by a faculty member.
Every student gets a grade for each class he/she enrolls in.
Dec-04 10 marks

[ See next page for ER-diagram ]

[ Figure: ER Diagram for Software Company Database. Entity sets Employee (EmpID, Name, Salary, Addr, Phone, TeamID), Team (TeamID, Leader, Member), Project (Proj-Name, St-date, Sub-date, Leader, TeamID, Level) and ProjLevel (Level, Status, TeamID, Leader, St-date, Sub-date), connected by the relationships Has and Works-on. ]
[ Figure: E-R Diagram for University Database, showing entity sets Student, Department, Faculty and Class with their attributes (Name, Grade, Dept-head, Experience, Subject, Year, Capacity, etc.), connected by the relationships Enroll and Teach. ]

4. SQL
[ These notes are for Reference & Practicals ]


SQL is a standard language for accessing and manipulating databases.

What is SQL?

 SQL stands for Structured Query Language


 SQL lets you access and manipulate databases
 SQL is an ANSI (American National Standards Institute) standard

What Can SQL do?

 SQL can execute queries against a database


 SQL can retrieve data from a database
 SQL can insert records in a database
 SQL can update records in a database
 SQL can delete records from a database
 SQL can create new databases
 SQL can create new tables in a database
 SQL can create stored procedures in a database
 SQL can create views in a database
 SQL can set permissions on tables, procedures, and views

Although SQL is an ANSI (American National Standards Institute) standard, there


are many different versions of the SQL language. However, to be compliant with the
ANSI standard, they all support at least the major commands (such as SELECT,
UPDATE, DELETE, INSERT, WHERE) in a similar manner.

Note: Most of the SQL database programs also have their own proprietary
extensions in addition to the SQL standard!

RDBMS :

RDBMS stands for Relational Database Management System.

RDBMS is the basis for SQL, and for all modern database systems like MS SQL
Server, IBM DB2, Oracle, MySQL, and Microsoft Access. The data in RDBMS is
stored in database objects called tables. A table is a collection of related data
entries and it consists of columns and rows.

SQL DML and DDL


SQL can be divided into two parts: The Data Manipulation Language (DML) and the
Data Definition Language (DDL).


The query and update commands form the DML part of SQL:

 SELECT - extracts data from a database. Data in the database is not


modified.
 UPDATE - updates data in a database
 DELETE - deletes data from a database
 INSERT INTO - inserts new data into a database

The DDL part of SQL permits database tables to be created or deleted. It also defines
indexes (keys), specifies links between tables, and imposes constraints between tables.
The most important DDL statements in SQL are:

 CREATE DATABASE - creates a new database


 ALTER DATABASE - modifies a database
 CREATE TABLE - creates a new table
 ALTER TABLE - modifies a table
 DROP TABLE - deletes a table
 CREATE INDEX - creates an index (search key)
 DROP INDEX - deletes an index

SELECT :
The SELECT statement retrieves data from the database, and the result is returned in the form of a query result table. A general syntax of the statement is as follows,
SELECT [ALL/DISTINCT] [aggregate functions] Column_list
FROM table/s
WHERE search-condition
GROUP BY column-name/s
HAVING search-condition
ORDER BY column-name/s

The compulsory part of the SELECT statement is,


SELECT columns FROM Table
There are many more keywords and operators involved in SELECT statement; but
they will be discussed in later topics.
The statement can only retrieve information. The retrieved information can be formatted to view the results in a required pattern, but the data in the actual database is never modified by a SELECT statement.
The columns are retrieved in the order they appear in the column_list of the statement (and not in the order they are present in the database table).
The WHERE part of the statement allows us to retrieve only particular records, which satisfy a certain condition. GROUP BY is used to group the retrieved information in a particular manner or by particular values. The HAVING clause tells the statement to retrieve only certain groups produced by the GROUP BY clause. The ORDER BY clause sorts the query results based on the data in one or more columns. If ORDER BY is not mentioned, by default the query results are not sorted.


The result of a query is expected in the form of a table of the retrieved columns and data. No record is received if the database table is empty or if the WHERE clause search-condition is not satisfied by any of the records in the table. If any records are retrieved, the result can have one or more rows.

Assume following database tables:

Table Account :
ACCOUNT_NO BRANCH_NAME BALANCE
--------------- --------------- ----------
A-101 Downtown 500
A-215 Mianus 700
A-102 Perryridge 400
A-305 Round Hill 350
A-201 Perryridge 900
A-222 Redwood 700
A-217 Brighton 750
A-333 Central 850
A-444 North Town 625

Table Customer:
CUSTOMER_NAME CUSTOMER_STREET CUSTOMER_CITY
--------------- ------------ ---------------
Jones Main Harrison
Smith Main Rye
Hayes Main Harrison
Curry North Rye
Lindsay Park Pittsfield
Turner Putnam Stamford
Williams Nassau Princeton
Adams Spring Pittsfield
Johnson Alma Palo Alto
Glenn Sand Hill Woodside
Brooks Senator Brooklyn
Green Walnut Stamford
Jackson University Salt Lake
Majeris First Rye
McBride Safety Rye

Simple examples:

1. Retrieving customer name and city from Customer table


Query: SELECT CUSTOMER_NAME , CUSTOMER_CITY FROM CUSTOMER;


Result:
CUSTOMER_NAME CUSTOMER_CITY
--------------- ---------------
Jones Harrison
Smith Rye
Hayes Harrison
Curry Rye
Lindsay Pittsfield
Turner Stamford
Williams Princeton
Adams Pittsfield
Johnson Palo Alto
Glenn Woodside
Brooks Brooklyn
Green Stamford
Jackson Salt Lake
Majeris Rye
McBride Rye

2. Retrieving entire information (i.e. all columns) for all accounts.


Query: SELECT * FROM ACCOUNT;

Result :
ACCOUNT_NUMBER BRANCH_NAME BALANCE
--------------- --------------- ----------
A-101 Downtown 500
A-215 Mianus 700
A-102 Perryridge 400
A-305 Round Hill 350
A-201 Perryridge 900
A-222 Redwood 700
A-217 Brighton 750
A-333 Central 850
A-444 North Town 625

3. Retrieving calculated columns:


An SQL query can include calculated columns whose values are calculated or formatted using stored values. To retrieve a calculated column, an SQL expression can be used in the column selection list. The expression can involve one or more database columns. Simple arithmetic operators, some SQL operators and some built-in functions of SQL can be used to retrieve calculated columns.

a. Retrieve customer name and the customer's street and city as one column, address.
Query: SELECT CUSTOMER_NAME, CONCAT(CUSTOMER_STREET, CONCAT('-' ,
CUSTOMER_CITY)) ADDRESS FROM CUSTOMER;

Result:

CUSTOMER_NAME ADDRESS
--------------- ----------------------------
Jones Main-Harrison
Smith Main-Rye
Hayes Main-Harrison
Curry North-Rye
Lindsay Park-Pittsfield
Turner Putnam-Stamford
Williams Nassau-Princeton
Adams Spring-Pittsfield
Johnson Alma-Palo Alto
Glenn Sand Hill-Woodside
Brooks Senator-Brooklyn
Green Walnut-Stamford
Jackson University-Salt Lake
Majeris First-Rye
McBride Safety-Rye

b. Display account number and 10% of the balance.


Query: SELECT ACCOUNT_NUMBER, (0.1*BALANCE) NEW_BALANCE FROM
ACCOUNT;
Result:
ACCOUNT_NUMBER NEW_BALANCE
--------------- -----------
A-101 50
A-215 70
A-102 40
A-305 35
A-201 90
A-222 70
A-217 75
A-333 85
A-444 62.5

4. Selecting non-duplicate values from a column.(DISTINCT keyword)


e.g. List the names of the cities in which the customers of the bank live.
The statement 'SELECT customer_city FROM Customer' would retrieve all the city names from customer_city, where some city names will be repeated.
Query: SELECT DISTINCT(CUSTOMER_CITY) FROM CUSTOMER;
Result:
CUSTOMER_CITY
---------------
Brooklyn
Harrison
Palo Alto
Pittsfield
Princeton

Rye
Salt Lake
Stamford
Woodside

Selecting Required Rows with WHERE Clause:


Commonly it is required to retrieve only some of the rows from a database, for a specific purpose. Retrieving all the records is not always required, convenient or efficient, since there can be a lot of records present in the database.
1. Retrieve the customers information who stay in Stamford city.
Query : SELECT * FROM CUSTOMER WHERE CUSTOMER_CITY= 'Stamford';
Result:
CUSTOMER_NAME CUSTOMER_STR CUSTOMER_CITY
--------------- ------------ ---------------
Turner Putnam Stamford
Green Walnut Stamford

The search condition can consist of


Relational operators such as, = , < , <=, > , >=, <> (not equal to).
Logical operators such as, AND, OR, NOT
A special SQL operators such as, LIKE, BETWEEN, IN, ALL etc.

2. Retrieve the customers information who stay in Stamford city, on Walnut street.
Query: SELECT * FROM CUSTOMER WHERE CUSTOMER_CITY='Stamford' AND
CUSTOMER_STREET='Walnut';

The ORDER BY Keyword

The ORDER BY keyword is used to sort the result-set by a specified column. The
ORDER BY keyword sorts the records in ascending order by default. If you want to
sort the records in a descending order, you can use the DESC keyword.

SQL ORDER BY Syntax


SELECT column_name(s)
FROM table_name
ORDER BY column_name(s) ASC|DESC
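For example, using the Account table shown earlier (a simple illustration; the exact layout of the result depends on the DBMS):

Query: SELECT ACCOUNT_NO, BRANCH_NAME, BALANCE FROM ACCOUNT ORDER BY BALANCE DESC;

This lists the accounts starting from the highest balance (A-201, 900) down to the lowest (A-305, 350).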

The INSERT INTO Statement


The INSERT INTO statement is used to insert a new row in a table.

SQL INSERT INTO Syntax


It is possible to write the INSERT INTO statement in two forms.

The first form doesn't specify the column names where the data will be inserted,
only their values:


INSERT INTO table_name


VALUES (value1, value2, value3,...)

The second form specifies both the column names and the values to be inserted:

INSERT INTO table_name (column1, column2, column3,...)


VALUES (value1, value2, value3,...)

SQL INSERT INTO Example

We have the following "Persons" table:

P_Id LastName FirstName Address City


1 Hansen Ola Timoteivn 10 Sandnes
2 Svendson Tove Borgvn 23 Sandnes
3 Pettersen Kari Storgt 20 Stavanger
Now we want to insert a new row in the "Persons" table.

We use the following SQL statement:

INSERT INTO Persons


VALUES (4,'Nilsen', 'Johan', 'Bakken 2', 'Stavanger' )

The "Persons" table will now look like this:

P_Id LastName FirstName Address City


1 Hansen Ola Timoteivn 10 Sandnes
2 Svendson Tove Borgvn 23 Sandnes
3 Pettersen Kari Storgt 20 Stavanger
4 Nilsen Johan Bakken 2 Stavanger

Insert Data Only in Specified Columns

It is also possible to only add data in specific columns.

The following SQL statement will add a new row, but only add data in the "P_Id",
"LastName" and the "FirstName" columns:

INSERT INTO Persons (P_Id, LastName, FirstName)


VALUES (5, 'Tjessem', 'Jakob')

The "Persons" table will now look like this:

P_Id LastName FirstName Address City


1 Hansen Ola Timoteivn 10 Sandnes


2 Svendson Tove Borgvn 23 Sandnes
3 Pettersen Kari Storgt 20 Stavanger
4 Nilsen Johan Bakken 2 Stavanger
5 Tjessem Jakob

The LIKE Operator

The LIKE operator is used to search for a specified pattern in a column.

SQL LIKE Syntax


SELECT column_name(s)
FROM table_name
WHERE column_name LIKE pattern

LIKE Operator Example

Now we want to select the persons living in a city that starts with "s" from the table
above. We use the following SELECT statement:

SELECT * FROM Persons WHERE City LIKE 's%'

Next, we want to select the persons living in a city that ends with an "s" from the
"Persons" table.

We use the following SELECT statement:

SELECT * FROM Persons WHERE City LIKE '%s'

Next, we want to select the persons living in a city that contains the pattern "tav"
from the "Persons" table.

We use the following SELECT statement:

SELECT * FROM Persons WHERE City LIKE '%tav%'

The result-set will look like this:

P_Id LastName FirstName Address City


3 Pettersen Kari Storgt 20 Stavanger

The IN Operator

The IN operator allows you to specify multiple values in a WHERE clause.


SQL IN Syntax
SELECT column_name(s)
FROM table_name
WHERE column_name IN (value1,value2,...)

Now we want to select the persons with a last name equal to "Hansen" or
"Pettersen" from the table above.

We use the following SELECT statement:

SELECT * FROM Persons WHERE LastName IN ('Hansen','Pettersen')

The result-set will look like this:

P_Id LastName FirstName Address City


1 Hansen Ola Timoteivn 10 Sandnes
3 Pettersen Kari Storgt 20 Stavanger

SQL BETWEEN Operator


The BETWEEN operator is used in a WHERE clause to select a range of data between two values. The values can be numbers, text, or dates.

SQL BETWEEN Syntax


SELECT column_name(s) FROM table_name
WHERE column_name BETWEEN value1 AND value2

BETWEEN Operator Example

Now we want to select the persons with a last name alphabetically between
"Hansen" and "Pettersen" from the table above.

We use the following SELECT statement:

SELECT * FROM Persons


WHERE LastName
BETWEEN 'Hansen' AND 'Pettersen'

The result-set will look like this:

P_Id LastName FirstName Address City


1 Hansen Ola Timoteivn 10 Sandnes

Note: The BETWEEN operator is treated differently in different databases.


In some databases, persons with the LastName of "Hansen" or "Pettersen" will not be listed, because the BETWEEN operator only selects fields that are between (and excluding) the test values.

In other databases, persons with the LastName of "Hansen" or "Pettersen" will be listed, because the BETWEEN operator selects fields that are between (and including) the test values.

SQL Aggregate Functions

SQL aggregate functions return a single value, calculated from values in a column.

Useful aggregate functions:

 AVG() - Returns the average value


 COUNT() - Returns the number of rows
 FIRST() - Returns the first value
 LAST() - Returns the last value
 MAX() - Returns the largest value
 MIN() - Returns the smallest value
 SUM() - Returns the sum

For example, the following statements use AVG() on an "Orders" table (shown later under GROUP BY):

SELECT AVG(OrderPrice) AS OrderAverage FROM Orders

SELECT Customer FROM Orders


WHERE OrderPrice>(SELECT AVG(OrderPrice) FROM Orders)

The COUNT() function returns the number of rows that matches a specified criteria.

SQL COUNT(column_name) Syntax

The COUNT(column_name) function returns the number of values (NULL values will not be counted) of the specified column:

SELECT COUNT(column_name) FROM table_name

The COUNT(*) function returns the number of records in a table:

SELECT COUNT(*) FROM table_name

SQL COUNT(DISTINCT column_name) Syntax

The COUNT(DISTINCT column_name) function returns the number of distinct


values of the specified column:

SELECT COUNT(DISTINCT column_name) FROM table_name
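For example, using the "Orders" table shown below under GROUP BY (an illustrative query; note that COUNT(DISTINCT ...) is supported by databases such as SQL Server, Oracle and MySQL, but not by MS Access):

SELECT COUNT(DISTINCT Customer) AS NumberOfCustomers FROM Orders

This returns 3, since the orders in that table come from the three distinct customers Hansen, Nilsen and Jensen.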


SQL Scalar functions

SQL scalar functions return a single value, based on the input value.

Useful scalar functions:

 UCASE() - Converts a field to upper case


 LCASE() - Converts a field to lower case
 MID() - Extract characters from a text field
 LEN() - Returns the length of a text field
 ROUND() - Rounds a numeric field to the number of decimals specified
 NOW() - Returns the current system date and time
 FORMAT() - Formats how a field is to be displayed
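For example (an illustrative query on the "Persons" table; UCASE() and LEN() are the MS-Access-style names listed above, and other databases use equivalents such as UPPER() and LENGTH()):

SELECT UCASE(LastName) AS LastName, LEN(Address) AS AddressLength FROM Persons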

The GROUP BY Statement

The GROUP BY statement is used in conjunction with the aggregate functions to


group the result-set by one or more columns.

SQL GROUP BY Syntax

SELECT column_name, aggregate_function(column_name)


FROM table_name
WHERE column_name operator value
GROUP BY column_name

SQL GROUP BY Example

We have the following "Orders" table:

O_Id OrderDate OrderPrice Customer


1 2008/11/12 1000 Hansen
2 2008/10/23 1600 Nilsen
3 2008/09/02 700 Hansen
4 2008/09/03 300 Hansen
5 2008/08/30 2000 Jensen
6 2008/10/04 100 Nilsen

Now we want to find the total sum (total order) of each customer.

We will have to use the GROUP BY statement to group the customers.

We use the following SQL statement:

SELECT Customer,SUM(OrderPrice) FROM Orders


GROUP BY Customer

The result-set will look like this:


Customer SUM(OrderPrice)
Hansen 2000
Nilsen 1700
Jensen 2000

GROUP BY More Than One Column

We can also use the GROUP BY statement on more than one column, like this:
SELECT Customer,OrderDate,SUM(OrderPrice) FROM Orders
GROUP BY Customer,OrderDate

The HAVING Clause

The HAVING clause was added to SQL because the WHERE keyword could not be
used with aggregate functions.

SQL HAVING Syntax


SELECT column_name, aggregate_function(column_name)
FROM table_name
WHERE column_name operator value
GROUP BY column_name
HAVING aggregate_function(column_name) operator value

SQL HAVING Example

Now we want to find if any of the customers have a total order of less than 2000.

We use the following SQL statement:

SELECT Customer,SUM(OrderPrice) FROM Orders
GROUP BY Customer
HAVING SUM(OrderPrice)<2000

The result-set will look like this:

Customer SUM(OrderPrice)
Nilsen 1700

SQL Joins
SQL joins are used to query data from two or more tables, based on a relationship
between certain columns in these tables.

Tables in a database are often related to each other with keys.


A primary key is a column (or a combination of columns) with a unique value for
each row. Each primary key value must be unique within the table. The purpose is
to bind data together, across tables, without repeating all of the data in every table.

Note that the "P_Id" column is the primary key in the "Persons" table. This means
that no two rows can have the same P_Id. The P_Id distinguishes two persons even
if they have the same name.

Next, we have the "Orders" table:

O_Id OrderNo P_Id


1 77895 3
2 44678 3
3 22456 1
4 24562 1
5 34764 15

Note that the "O_Id" column is the primary key in the "Orders" table and that the
"P_Id" column refers to the persons in the "Persons" table without using their
names.

Notice that the relationship between the two tables above is the "P_Id" column.

Different SQL JOINs

Before we continue with examples, we will list the types of JOIN you can use, and
the differences between them.

 JOIN: Return rows when there is at least one match in both tables
 LEFT JOIN: Return all rows from the left table, even if there are no matches
in the right table
 RIGHT JOIN: Return all rows from the right table, even if there are no
matches in the left table
 FULL JOIN: Return rows when there is a match in one of the tables

SQL INNER JOIN Keyword

The INNER JOIN keyword returns rows when there is at least one match in both
tables.

SQL INNER JOIN Syntax


SELECT column_name(s)
FROM table_name1
INNER JOIN table_name2
ON table_name1.column_name=table_name2.column_name

PS: INNER JOIN is the same as JOIN.


We use the following SELECT statement:

SELECT Persons.LastName, Persons.FirstName, Orders.OrderNo


FROM Persons
INNER JOIN Orders
ON Persons.P_Id=Orders.P_Id
ORDER BY Persons.LastName

The result-set will look like this:

LastName FirstName OrderNo


Hansen Ola 22456
Hansen Ola 24562
Pettersen Kari 77895
Pettersen Kari 44678

The INNER JOIN keyword returns rows when there is at least one match in both
tables. If there are rows in "Persons" that do not have matches in "Orders", those
rows will NOT be listed.

SQL LEFT JOIN Keyword

The LEFT JOIN keyword returns all rows from the left table (table_name1), even if
there are no matches in the right table (table_name2).

SQL LEFT JOIN Syntax


SELECT column_name(s)
FROM table_name1
LEFT JOIN table_name2
ON table_name1.column_name=table_name2.column_name

PS: In some databases LEFT JOIN is called LEFT OUTER JOIN.

We use the following SELECT statement:

SELECT Persons.LastName, Persons.FirstName, Orders.OrderNo


FROM Persons LEFT JOIN Orders ON Persons.P_Id=Orders.P_Id
ORDER BY Persons.LastName

The result-set will look like this:

LastName FirstName OrderNo


Hansen Ola 22456
Hansen Ola 24562
Pettersen Kari 77895
Pettersen Kari 44678
Svendson Tove


The LEFT JOIN keyword returns all the rows from the left table (Persons), even if
there are no matches in the right table (Orders).

SQL RIGHT JOIN Keyword

The RIGHT JOIN keyword returns all rows from the right table (table_name2), even if
there are no matches in the left table (table_name1).

SQL RIGHT JOIN Syntax


SELECT column_name(s) FROM table_name1
RIGHT JOIN table_name2 ON
table_name1.column_name=table_name2.column_name

PS: In some databases RIGHT JOIN is called RIGHT OUTER JOIN.

We use the following SELECT statement:

SELECT Persons.LastName, Persons.FirstName, Orders.OrderNo


FROM Persons
RIGHT JOIN Orders
ON Persons.P_Id=Orders.P_Id
ORDER BY Persons.LastName

The result-set will look like this:

LastName FirstName OrderNo


Hansen Ola 22456
Hansen Ola 24562
Pettersen Kari 77895
Pettersen Kari 44678
NULL NULL 34764

The RIGHT JOIN keyword returns all the rows from the right table (Orders), even if
there are no matches in the left table (Persons).

SQL FULL JOIN Keyword

The FULL JOIN keyword returns rows when there is a match in one of the tables.

SQL FULL JOIN Syntax


SELECT column_name(s)
FROM table_name1
FULL JOIN table_name2
ON table_name1.column_name=table_name2.column_name

We use the following SELECT statement:


SELECT Persons.LastName, Persons.FirstName, Orders.OrderNo


FROM Persons
FULL JOIN Orders
ON Persons.P_Id=Orders.P_Id
ORDER BY Persons.LastName

The result-set will look like this:

LastName FirstName OrderNo

Hansen Ola 22456
Hansen Ola 24562
Pettersen Kari 77895
Pettersen Kari 44678
Svendson Tove NULL
NULL NULL 34764

The FULL JOIN keyword returns all the rows from the left table (Persons), and all
the rows from the right table (Orders). If there are rows in "Persons" that do not
have matches in "Orders", or if there are rows in "Orders" that do not have matches
in "Persons", those rows will be listed as well.

SELECT INTO :

The SQL SELECT INTO statement can be used to create backup copies of tables.

The SQL SELECT INTO Statement

The SELECT INTO statement selects data from one table and inserts it into a
different table.

The SELECT INTO statement is most often used to create backup copies of tables.

SQL SELECT INTO Syntax

We can select all columns into the new table:

SELECT *
INTO new_table_name [IN externaldatabase]
FROM old_tablename

Or we can select only the columns we want into the new table:

SELECT column_name(s)
INTO new_table_name [IN externaldatabase]
FROM old_tablename

SQL SELECT INTO - With a WHERE Clause


We can also add a WHERE clause.

The following SQL statement creates a "Persons_Backup" table with only the persons who live in the city "Sandnes":

SELECT LastName,Firstname INTO Persons_Backup FROM Persons


WHERE City='Sandnes'

SQL SELECT INTO - Joined Tables

Selecting data from more than one table is also possible.

The following example creates a "Persons_Order_Backup" table that contains data from the two tables "Persons" and "Orders":

SELECT Persons.LastName,Orders.OrderNo
INTO Persons_Order_Backup
FROM Persons
INNER JOIN Orders
ON Persons.P_Id=Orders.P_Id

The SQL UNION Operator

The UNION operator is used to combine the result-set of two or more SELECT
statements.

Notice that each SELECT statement within the UNION must have the same number
of columns. The columns must also have similar data types. Also, the columns in
each SELECT statement must be in the same order.

SQL UNION Syntax


SELECT column_name(s) FROM table_name1
UNION
SELECT column_name(s) FROM table_name2

Note: The UNION operator selects only distinct values by default. To allow duplicate
values, use UNION ALL.

SQL UNION ALL Syntax


SELECT column_name(s) FROM table_name1
UNION ALL
SELECT column_name(s) FROM table_name2

PS: The column names in the result-set of a UNION are always equal to the column
names in the first SELECT statement in the UNION.

Look at the following tables:


"Employees_Norway":

E_ID E_Name
01 Hansen, Ola
02 Svendson, Tove
03 Svendson, Stephen
04 Pettersen, Kari

"Employees_USA":

E_ID E_Name
01 Turner, Sally
02 Kent, Clark
03 Svendson, Stephen
04 Scott, Stephen

Now we want to list all the different employees in Norway and USA.

We use the following SELECT statement:

SELECT E_Name FROM Employees_Norway


UNION
SELECT E_Name FROM Employees_USA

The result-set will look like this:

E_Name
Hansen, Ola
Svendson, Tove
Svendson, Stephen
Pettersen, Kari
Turner, Sally
Kent, Clark
Scott, Stephen

Note: This command cannot be used to list all employees in Norway and USA. In the
example above we have two employees with equal names, and only one of them will
be listed. The UNION command selects only distinct values.

The UPDATE Statement


The UPDATE statement is used to update records in a table.

The UPDATE statement is used to update existing records in a table.


SQL UPDATE Syntax


UPDATE table_name
SET column1=value, column2=value2,...
WHERE some_column=some_value

Note: Notice the WHERE clause in the UPDATE syntax. The WHERE clause
specifies which record or records that should be updated. If you omit the WHERE
clause, all records will be updated!

UPDATE Persons SET Address='Nissestien 67', City='Sandnes'


WHERE LastName='Tjessem' AND FirstName='Jakob'

DDL Commands :
The CREATE TABLE Statement

The CREATE TABLE statement is used to create a table in a database.

SQL CREATE TABLE Syntax


CREATE TABLE table_name
(
column_name1 data_type,
column_name2 data_type,
column_name3 data_type,
....
)

The data type specifies what type of data the column can hold. (For a complete reference of all the data types available in MS Access, MySQL, and SQL Server, refer to the documentation of the respective database system.)
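For example, the "Persons" table used earlier in this chapter could be created as follows (the column sizes are assumed for illustration):

CREATE TABLE Persons
(
P_Id int,
LastName varchar(255),
FirstName varchar(255),
Address varchar(255),
City varchar(255)
)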

The ALTER TABLE Statement

The ALTER TABLE statement is used to add, delete, or modify columns in an


existing table.

SQL ALTER TABLE Syntax

To add a column in a table, use the following syntax:

ALTER TABLE table_name ADD column_name datatype

To delete a column in a table, use the following syntax (notice that some database
systems don't allow deleting a column):

ALTER TABLE table_name DROP COLUMN column_name


To change the data type of a column in a table, use the following syntax:

ALTER TABLE table_name ALTER COLUMN column_name datatype
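For example (DateOfBirth is an assumed illustrative column, not part of the original "Persons" table):

ALTER TABLE Persons ADD DateOfBirth date

ALTER TABLE Persons DROP COLUMN DateOfBirth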

SQL Constraints

Constraints are used to limit the type of data that can go into a table.

Constraints can be specified when a table is created (with the CREATE TABLE
statement) or after the table is created (with the ALTER TABLE statement).

We will focus on the following constraints:

 NOT NULL
 UNIQUE
 PRIMARY KEY
 FOREIGN KEY
 CHECK
 DEFAULT

SQL NOT NULL Constraint

The NOT NULL constraint enforces a column to NOT accept NULL values.

The NOT NULL constraint enforces a field to always contain a value. This means
that you cannot insert a new record, or update a record without adding a value to
this field.

SQL UNIQUE Constraint

The UNIQUE constraint uniquely identifies each record in a database table.


The UNIQUE and PRIMARY KEY constraints both provide a guarantee for
uniqueness for a column or set of columns.
A PRIMARY KEY constraint automatically has a UNIQUE constraint defined on it.
Note that you can have many UNIQUE constraints per table, but only one PRIMARY
KEY constraint per table.

SQL Server / Oracle / MS Access:

CREATE TABLE Persons


(
P_Id int NOT NULL UNIQUE,
LastName varchar(255) NOT NULL,
FirstName varchar(255),
Address varchar(255),
City varchar(255)
)


SQL PRIMARY KEY Constraint

The PRIMARY KEY constraint uniquely identifies each record in a database table.
Primary keys must contain unique values.
A primary key column cannot contain NULL values.
Each table should have a primary key, and each table can have only one primary
key.

SQL Server / Oracle / MS Access:

CREATE TABLE Persons


(
P_Id int NOT NULL PRIMARY KEY,
LastName varchar(255) NOT NULL,
FirstName varchar(255),
Address varchar(255),
City varchar(255)
)

To allow naming of a PRIMARY KEY constraint, and for defining a PRIMARY KEY
constraint on multiple columns, use the following SQL syntax:

MySQL / SQL Server / Oracle / MS Access:

CREATE TABLE Persons


(
P_Id int NOT NULL,
LastName varchar(255) NOT NULL,
FirstName varchar(255),
Address varchar(255),
City varchar(255),
CONSTRAINT pk_PersonID PRIMARY KEY (P_Id,LastName)
)

MySQL / SQL Server / Oracle / MS Access:

ALTER TABLE Persons


ADD CONSTRAINT pk_PersonID PRIMARY KEY (P_Id,LastName)

SQL Server / Oracle / MS Access:

ALTER TABLE Persons


DROP CONSTRAINT pk_PersonID


SQL FOREIGN KEY Constraint

A FOREIGN KEY in one table points to a PRIMARY KEY in another table.


Let's illustrate the foreign key with an example. Look at the following two tables:
The "Persons" table:
P_Id LastName FirstName Address City
1 Hansen Ola Timoteivn 10 Sandnes
2 Svendson Tove Borgvn 23 Sandnes
3 Pettersen Kari Storgt 20 Stavanger

The "Orders" table:

O_Id OrderNo P_Id


1 77895 3
2 44678 3
3 22456 2
4 24562 1
Note that the "P_Id" column in the "Orders" table points to the "P_Id" column in the
"Persons" table.
The "P_Id" column in the "Persons" table is the PRIMARY KEY in the "Persons"
table.
The "P_Id" column in the "Orders" table is a FOREIGN KEY in the "Orders" table.
The FOREIGN KEY constraint is used to prevent actions that would destroy links between tables.
The FOREIGN KEY constraint also prevents invalid data from being inserted into the foreign key column, because the value has to be one of the values contained in the table it points to.

SQL Server / Oracle / MS Access:

CREATE TABLE Orders


(
O_Id int NOT NULL PRIMARY KEY,
OrderNo int NOT NULL,
P_Id int FOREIGN KEY REFERENCES Persons(P_Id)
)

To allow naming of a FOREIGN KEY constraint, and for defining a FOREIGN KEY
constraint on multiple columns, use the following SQL syntax:
MySQL / SQL Server / Oracle / MS Access:
CREATE TABLE Orders
(
O_Id int NOT NULL,
OrderNo int NOT NULL,
P_Id int,
PRIMARY KEY (O_Id),
CONSTRAINT fk_PerOrders FOREIGN KEY (P_Id)
REFERENCES Persons(P_Id)
)
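A FOREIGN KEY constraint can also be added to or dropped from an existing table. A sketch following the same naming pattern as above (note that the DROP syntax varies; for example, MySQL uses DROP FOREIGN KEY instead of DROP CONSTRAINT):

ALTER TABLE Orders
ADD CONSTRAINT fk_PerOrders FOREIGN KEY (P_Id) REFERENCES Persons(P_Id)

ALTER TABLE Orders
DROP CONSTRAINT fk_PerOrders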


SQL CHECK Constraint

The CHECK constraint is used to limit the value range that can be placed in a
column.

If you define a CHECK constraint on a single column it allows only certain values
for this column.

If you define a CHECK constraint on a table it can limit the values in certain
columns based on values in other columns in the row.

e.g
1) P_Id int NOT NULL CHECK (P_Id>0)
2) CONSTRAINT chk_Person CHECK (P_Id>0 AND City='Sandnes')

To create a CHECK constraint on the "P_Id" column when the table is already
created, use the following SQL:

MySQL / SQL Server / Oracle / MS Access:

ALTER TABLE Persons


ADD CHECK (P_Id>0)

To allow naming of a CHECK constraint, and for defining a CHECK constraint on


multiple columns, use the following SQL syntax:

MySQL / SQL Server / Oracle / MS Access:

ALTER TABLE Persons


ADD CONSTRAINT chk_Person CHECK (P_Id>0 AND City='Sandnes')

To DROP a CHECK Constraint

To drop a CHECK constraint, use the following SQL:

SQL Server / Oracle / MS Access:

ALTER TABLE Persons


DROP CONSTRAINT chk_Person
SQL DEFAULT Constraint

The DEFAULT constraint is used to insert a default value into a column.

The default value will be added to all new records, if no other value is specified.

e.g
1) City varchar(255) DEFAULT 'Mumbai'

2) OrderDate date DEFAULT GETDATE()

SQL Server / Oracle / MS Access:

ALTER TABLE Persons


ALTER COLUMN City SET DEFAULT 'SANDNES'

SQL Server / Oracle / MS Access:

ALTER TABLE Persons


ALTER COLUMN City DROP DEFAULT

The DROP TABLE Statement


The DROP TABLE statement is used to delete a table.
DROP TABLE table_name

Indexes :

The CREATE INDEX statement is used to create indexes in tables.

Indexes allow the database application to find data fast, without reading the whole table. An index can be created on a table to find data more quickly and efficiently.

The users cannot see the indexes, they are just used to speed up searches/queries.

Note: Updating a table with indexes takes more time than updating a table without
(because the indexes also need an update). So you should only create indexes on
columns (and tables) that will be frequently searched against.

SQL CREATE INDEX Syntax

Creates an index on a table. Duplicate values are allowed:

CREATE INDEX index_name


ON table_name (column_name)
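For example, to create an index on the LastName column of the "Persons" table (the index name PIndex is assumed for illustration):

CREATE INDEX PIndex
ON Persons (LastName)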

The DROP INDEX Statement

The DROP INDEX statement is used to delete an index in a table.

DROP INDEX Syntax for MS Access:


DROP INDEX index_name ON table_name

DROP INDEX Syntax for MS SQL Server:


DROP INDEX table_name.index_name


SQL Views
A view is a virtual table. A view contains rows and columns, just like a real table.
The fields in a view are fields from one or more real tables in the database.
You can add SQL functions, WHERE, and JOIN statements to a view and present
the data as if the data were coming from one single table.

This chapter shows how to create, update, and delete a view.


SQL CREATE VIEW Statement (Defining a View)
In SQL, a view is a virtual table based on the result-set of an SQL statement.

SQL CREATE VIEW Syntax :


CREATE VIEW view_name AS
SELECT column_name(s)
FROM table_name
WHERE condition

Note: A view always shows up-to-date data! The database engine recreates the data,
using the view's SQL statement, every time a user queries a view.

SQL CREATE VIEW Examples ( Northwind database of SQL server)


The view "Current Product List" lists all active products (products that are not
discontinued) from the "Products" table. The view is created with the following SQL:
CREATE VIEW [Current Product List] AS SELECT ProductID,ProductName
FROM Products WHERE Discontinued=No

We can query the view above as follows:


SELECT * FROM [Current Product List]

The records (i.e. the set of tuples) in a view are the result of evaluating, at that time, the query expression that defines the view. Thus, if the view relation were computed and stored, it could become outdated if the tables (relations) used to define the view are modified. To avoid this, views are not stored as the result of the query; instead, the definition of the view itself is stored with the database. Wherever the view name appears in a query (or relational expression) it is replaced with the defining query expression, so whenever we evaluate a query the view relation is recomputed.
Some databases allow the view relation (table) to be stored, but they make sure that if the actual relations (tables) in the view definition change, the view is kept up to date. Such views are called materialized views.
Views can also be defined in terms of existing views; replacing a view name by its defining expression is called view expansion.

Another view in the Northwind sample database selects every product in the
"Products" table with a unit price higher than the average unit price:
CREATE VIEW [Products Above Average Price] AS
SELECT ProductName,UnitPrice FROM Products
WHERE UnitPrice>(SELECT AVG(UnitPrice) FROM Products)


We can query the view above as follows:


SELECT * FROM [Products Above Average Price]

Another view in the Northwind database calculates the total sale for each category
in 1997. Note that this view selects its data from another view called "Product Sales
for 1997":
CREATE VIEW [Category Sales For 1997] AS
SELECT DISTINCT CategoryName,Sum(ProductSales) AS CategorySales
FROM [Product Sales for 1997]
GROUP BY CategoryName

We can query the view above as follows:


SELECT * FROM [Category Sales For 1997]
We can also add a condition to the query. Now we want to see the total sale only for
the category "Beverages":
SELECT * FROM [Category Sales For 1997] WHERE CategoryName='Beverages'

SQL Updating a View


You can update a view by using the following syntax:
SQL CREATE OR REPLACE VIEW Syntax
CREATE OR REPLACE VIEW view_name AS SELECT column_name(s)
FROM table_name
WHERE condition
Now we want to add the "Category" column to the "Current Product List" view. We
will update the view with the following SQL:
CREATE OR REPLACE VIEW [Current Product List] AS SELECT ProductID,ProductName,Category
FROM Products WHERE Discontinued=No

SQL Dropping a View


You can delete a view with the DROP VIEW command.
SQL DROP VIEW Syntax
DROP VIEW view_name

SQL NULL Values


NULL values represent missing or unknown data. By default, a table column can hold
NULL values.
Here we will explain the IS NULL and IS NOT NULL operators.

If a column in a table is optional, we can insert a new record or update an existing


record without adding a value to this column. This means that the field will be
saved with a NULL value.
NULL values are treated differently from other values.
NULL is used as a placeholder for unknown or inapplicable values.

How can we test for NULL values?


It is not possible to test for NULL values with comparison operators, such as = or <>.


We will have to use the IS NULL and IS NOT NULL operators instead.
SQL IS NULL
How do we select only the records with NULL values in the "Address" column?
We will have to use the IS NULL operator:
SELECT LastName,FirstName,Address FROM Persons
WHERE Address IS NULL
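Similarly, to select only the records that do have a value in the "Address" column, we use the IS NOT NULL operator:

SELECT LastName,FirstName,Address FROM Persons
WHERE Address IS NOT NULL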

Example Tables
In the subsequent text, the following 3 example tables are used:
p Table (parts):
pno descr color
P1 Widget Blue
P2 Widget Red
P3 Dongle Green

s Table (suppliers):
sno name city
S1 Pierre Paris
S2 John London
S3 Mario Rome

sp Table (suppliers & parts):
sno pno qty
S1 P1 NULL
S2 P1 200
S3 P1 1000
S3 P2 200

Joining Tables
The FROM clause allows more than one table in its list; however, simply listing more than one table will very rarely produce the expected results. The rows from one table must be correlated with the rows of the others. This correlation is known as joining.
An example can best illustrate the rationale behind joins. The following query:
SELECT * FROM sp, p
Produces:
sno pno qty pno descr color
S1 P1 NULL P1 Widget Blue
S1 P1 NULL P2 Widget Red
S1 P1 NULL P3 Dongle Green
S2 P1 200 P1 Widget Blue
S2 P1 200 P2 Widget Red
S2 P1 200 P3 Dongle Green
S3 P1 1000 P1 Widget Blue
S3 P1 1000 P2 Widget Red
S3 P1 1000 P3 Dongle Green
S3 P2 200 P1 Widget Blue
S3 P2 200 P2 Widget Red
S3 P2 200 P3 Dongle Green

Each row in sp is arbitrarily combined with each row in p, giving 12 result rows (4
rows in sp X 3 rows in p.) This is known as a cartesian product.

A more usable query would correlate the rows from sp with rows from p, for
instance matching on the common column -- pno:

SELECT * FROM sp, p WHERE sp.pno = p.pno

This produces:

sno pno qty pno descr color


S1 P1 NULL P1 Widget Blue
S2 P1 200 P1 Widget Blue
S3 P1 1000 P1 Widget Blue
S3 P2 200 P2 Widget Red

Rows for each part in p are combined with rows in sp for the same part by matching
on part number (pno). In this query, the WHERE Clause provides the join predicate,
matching pno from p with pno from sp.

The join in this example is known as an inner equi-join. equi meaning that the join
predicate uses = (equals) to match the join columns. Other types of joins use
different comparison operators. For example, a query might use a greater-than join.

The term inner means only rows that match are included. Rows in the first table
that have no matching rows in the second table are excluded and vice versa (in the
above join, the row in p with pno P3 is not included in the result.) An outer join
includes unmatched rows in the result. See Outer Join below.

More than 2 tables can participate in a join. This is basically just an extension of a
2 table join. 3 tables -- a, b, c, might be joined in various ways:

 a joins b which joins c


 a joins b and the join of a and b joins c
 a joins b and a joins c

Plus several other variations. With inner joins, this structure is not explicit. It is
implicit in the nature of the join predicates. With outer joins, it is explicit; see below.
This query performs a 3 table join:
SELECT name, qty, descr, color FROM s, sp, p WHERE s.sno = sp.sno
AND sp.pno = p.pno
It joins s to sp and sp to p, producing:
name qty descr color
Pierre NULL Widget Blue
John 200 Widget Blue
Mario 1000 Widget Blue
Mario 200 Widget Red
Note that the order of the tables listed in the FROM clause has no significance, nor does the order of the join predicates in the WHERE clause.

Outer Joins
An inner join excludes rows from either table that don't have a matching row in the
other table. An outer join provides the ability to include unmatched rows in the
query results. The outer join combines the unmatched row in one of the tables with
an artificial row for the other table. This artificial row has all columns set to null.

The outer join is specified in the FROM clause and has the following general format:


table-1 { LEFT | RIGHT | FULL } OUTER JOIN table-2 ON predicate-1


predicate-1 is a join predicate for the outer join. It can only reference columns from
the joined tables. The LEFT, RIGHT or FULL specifiers give the type of join:

 LEFT -- only unmatched rows from the left side table (table-1) are retained
 RIGHT -- only unmatched rows from the right side table (table-2) are retained
 FULL -- unmatched rows from both tables (table-1 and table-2) are retained

Outer join example:


SELECT pno, descr, color, sno, qty
FROM p LEFT OUTER JOIN sp ON p.pno = sp.pno
pno descr color sno qty
P1 Widget Blue S1 NULL
P1 Widget Blue S2 200
P1 Widget Blue S3 1000
P2 Widget Red S3 200
P3 Dongle Green NULL NULL

Self Joins
A query can join a table to itself. Self joins have a number of real world uses. For
example, a self join can determine which parts have more than one supplier:
SELECT DISTINCT a.pno FROM sp a, sp b
WHERE a.pno = b.pno AND a.sno <> b.sno
pno
P1
As illustrated in the above example, self joins use correlation names to distinguish
columns in the select list and where predicate. In this case, the references to the
same table are renamed - a and b.

Self joins are often used in subqueries. See Subqueries below.

Subqueries
Subqueries are an identifying feature of SQL. It is called Structured Query Language
because a query can nest inside another query.

There are 3 basic types of subqueries in SQL:

 Predicate Subqueries -- extended logical constructs in the WHERE (and HAVING) clause.
 Scalar Subqueries -- standalone queries that return a single value; they can be used anywhere a scalar value is used.
 Table Subqueries -- queries nested in the FROM clause.


All subqueries must be enclosed in parentheses.
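As an illustration of a Table Subquery, the following sketch uses the sp table above (the derived-table alias t and the column alias total_qty are assumed names); the nested query in the FROM clause behaves like a temporary table:

SELECT t.sno, t.total_qty
FROM (SELECT sno, SUM(qty) AS total_qty FROM sp GROUP BY sno) t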
Predicate Subqueries


Predicate subqueries are used in the WHERE (and HAVING) clause. Each is a
special logical construct. Except for EXISTS, predicate subqueries must retrieve one
column (in their select list.)
IN Subquery
The IN Subquery tests whether a scalar value matches the single query column
value in any subquery result row. It has the following general format:
value-1 [NOT] IN (query-1)
Using NOT is equivalent to:
NOT value-1 IN (query-1)
For example, to list parts that have suppliers:
SELECT * FROM p WHERE pno IN (SELECT pno FROM sp)
pno descr color
P1 Widget Blue
P2 Widget Red

The Self Join example in the previous subsection can be expressed with an
IN Subquery:

SELECT DISTINCT pno FROM sp a


WHERE pno IN (SELECT pno FROM sp b WHERE a.sno <> b.sno)

pno
P1

Note that the subquery where clause references a column in the outer query
(a.sno). This is known as an outer reference. Subqueries with outer references
are sometimes known as correlated subqueries.

 Quantified Subqueries

A quantified subquery allows several types of tests and can use the full set of
comparison operators. It has the following general format:
value-1 {=|>|<|>=|<=|<>} {ANY|ALL|SOME} (query-1)
The comparison operator specifies how to compare value-1 to the single query
column value from each subquery result row. The ANY, ALL, SOME specifiers give
the type of match expected. ANY and SOME must match at least one row in the
subquery. ALL must match all rows in the subquery, or the subquery must be
empty (produce no rows).
For example, to list all parts that have suppliers:
SELECT * FROM p WHERE pno =ANY (SELECT pno FROM sp)
pno descr color
P1 Widget Blue
P2 Widget Red
A self join is used to list the supplier with the highest quantity of each part
(ignoring null quantities):
SELECT * FROM sp a WHERE qty >ALL (SELECT qty FROM sp b
WHERE a.pno = b.pno AND a.sno <> b.sno AND qty IS NOT NULL)


sno pno qty


S3 P1 1000
S3 P2 200

 EXISTS Subqueries

The EXISTS Subquery tests whether a subquery retrieves at least one row,
that is, whether a qualifying row exists. It has the following general format

EXISTS(query-1)
Any valid EXISTS subquery must contain an outer reference. It must be a correlated
subquery.
Note: the select list in the EXISTS subquery is not actually used in evaluating the
EXISTS, so it can contain any valid select list (though * is normally used).
To list parts that have suppliers:
SELECT *
FROM p
WHERE EXISTS(SELECT * FROM sp WHERE p.pno = sp.pno)

pno descr color


P1 Widget Blue
P2 Widget Red

Scalar Subqueries
The Scalar Subquery can be used anywhere a value can be used. The subquery
must reference just one column in the select list. It must also retrieve no more than
one row.

When the subquery returns a single row, the value of the single select list column
becomes the value of the Scalar Subquery. When the subquery returns no rows, a
database null is used as the result of the subquery. Should the subquery retrieve
more than one row, it is a run-time error and aborts query execution.

A Scalar Subquery can appear as a scalar value in the select list and where
predicate of another query. The following query on the sp table uses a Scalar
Subquery in the select list to retrieve the supplier city associated with the supplier
number (sno column in sp):

SELECT pno, qty, (SELECT city FROM s WHERE s.sno = sp.sno)


FROM sp

pno qty city


P1 NULL Paris
P1 200 London
P1 1000 Rome
P2 200 Rome


The next query on the sp table uses a Scalar Subquery in the where clause to
match parts on the color associated with the part number (pno column in sp):
SELECT *
FROM sp
WHERE 'Blue' = (SELECT color FROM p WHERE p.pno = sp.pno)
sno pno qty
S1 P1 NULL
S2 P1 200
S3 P1 1000
Note that both example queries use outer references. This is normal in Scalar
Subqueries. Often, Scalar Subqueries are Aggregate Queries.

SQL-Transaction Statements : ( Not required for SE comp/IT )


SQL-Transaction Statements control transactions in database access. This subset
of SQL is also called the Data Control Language for SQL (SQL DCL).

There are 2 SQL-Transaction Statements:

 COMMIT Statement -- commit (make persistent) all changes for the current
transaction
 ROLLBACK Statement -- roll back (rescind i.e. cancel) all changes for the
current transaction

Transaction Overview
A database transaction is a larger unit that frames multiple SQL statements. A
transaction ensures that the action of the framed statements is atomic with respect
to recovery.

COMMIT Statement
The COMMIT Statement terminates the current transaction and makes all changes
under the transaction persistent. It commits the changes to the database. The
COMMIT statement has the following general format:
COMMIT [WORK]
WORK is an optional keyword that does not change the semantics of COMMIT.

ROLLBACK Statement
The ROLLBACK Statement terminates the current transaction and rescinds all
changes made under the transaction. It rolls back the changes to the database. The
ROLLBACK statement has the following general format:
ROLLBACK [WORK]
WORK is an optional keyword that does not change the semantics of ROLLBACK.
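As a small illustration, a transfer of Rs.100 between two accounts of the Account table used earlier in this chapter could be framed as one transaction (a sketch only; the statement that starts a transaction varies by DBMS, e.g. BEGIN TRANSACTION in SQL Server, while Oracle starts a transaction implicitly with the first statement):

UPDATE ACCOUNT SET BALANCE = BALANCE - 100 WHERE ACCOUNT_NO = 'A-101';
UPDATE ACCOUNT SET BALANCE = BALANCE + 100 WHERE ACCOUNT_NO = 'A-215';
COMMIT;

If either update fails, ROLLBACK is issued instead and both changes are canceled.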


GRANT Statement
The GRANT Statement grants access privileges for database objects to other users.
It has the following general format:
GRANT privilege-list ON [TABLE] object-list TO user-list
privilege-list is either ALL PRIVILEGES or a comma-separated list of properties:
SELECT, INSERT, UPDATE, DELETE. object-list is a comma-separated list of table
and view names. user-list is either PUBLIC or a comma-separated list of user
names.

The optional specifier WITH GRANT OPTION may follow user-list in the GRANT
statement. WITH GRANT OPTION specifies that, in addition to access privileges, the
privilege to grant those privileges to other users is granted.

GRANT Statement Examples


GRANT SELECT ON s, sp TO PUBLIC

GRANT SELECT,INSERT,UPDATE(color) ON p TO art,nan

GRANT SELECT ON supplied_parts TO sam WITH GRANT OPTION

=====0000=====

5. Transaction Management
Transaction :
Transaction is a collection of database operations that form a single logical unit
of work. Thus, a transaction can consist of multiple database updates, deletes or
data retrievals. For a database user it can appear like one unit of work. Consider
bank database system in which there is an account table that holds multiple
accounts information like account number, name and balance amount. Transfer
of money from one account to another is one unit of work for a database user, but it consists of two update operations on the database, i.e. subtracting the amount from the source account and adding it to the destination account.
A transaction is usually initiated from a high-level language like C++, Java, Visual Basic, or even directly in SQL. It is delimited by commands such as Begin Transaction and End Transaction; the group of statements (operations) between these two commands is treated as a transaction.
Once started, a transaction should either execute fully or fail totally, i.e. if the transaction fails then all the database changes done by its individual operations should be canceled and the database should be brought back to the state it was in before the transaction started.

ACID properties of transaction:


As stated above, a transaction once started should either execute fully or fail totally, with any partial changes being canceled. Also, there can be multiple transactions working simultaneously in a database system, and the database system must provide mechanisms to isolate one transaction from another.

Thus, to ensure data integrity, every database system must maintain the following four properties of transactions:
1) Atomicity, 2) Consistency, 3) Isolation and 4) Durability.
These properties are often called the ACID properties (an acronym derived from the first letter of each of the four properties).
Let's consider a transaction in which money (say Rs.1000) is transferred from account A to account B, having balances of Rs.5000 and Rs.4000 respectively before the transfer. (To complete the transaction, the database performs two update operations, as mentioned above.)

Consistency: Before any transaction starts, the data items in the database are assumed to be in a consistent, i.e. stable and meaningful, state. After the transaction completes, the database must reach a new consistent state. If the consistency requirement is not maintained, money may be deducted from one account but not added to the other, or the other way round. Ensuring consistency for an individual transaction is the responsibility of the application programmer who codes the transaction.

Atomicity: Suppose that during a transaction (consider the above example) money is deducted from A, and before the amount is added to B the operation fails for some reason. The database will then be left in an inconsistent state.
Even while a transaction is running, the database can be temporarily inconsistent: during the transfer, after the deduction from A and before the addition to B, the database is momentarily in an inconsistent state. Such intermediate inconsistencies should not be visible to other transactions or users. To achieve this, the database keeps track of the update operations already performed, and if the transaction fails these updates are canceled; the updates are not finalized in the database until the last operation completes successfully. That is the requirement of atomicity: if atomicity is ensured, either all the actions of a transaction are reflected in the database or none are. Ensuring atomicity is the responsibility of the database itself, and it is handled by the transaction-management component of the database system.

Durability: Once the transaction is complete and the database has indicated so, any system failure afterwards should not result in any loss of data.
The durability property ensures that, once a transaction completes successfully,
all the updates it carried out on the database persist, even if there is a system
failure. Ensuring durability is responsibility of Recovery-management
component of database system.

Isolation: If there are multiple transactions working simultaneously, their interleaved operations may leave the database in some undesirable, inconsistent state. The simplest way to avoid the problem is to execute the transactions one after the other, i.e. serially. But, to achieve better efficiency, some of the transactions can be performed concurrently. The isolation property of a database ensures that the concurrent execution of transactions results in a system state that is equivalent to a state that could have been obtained had these transactions executed one at a time in some order.

Transaction State:
A transaction consists of multiple database operations. These operations can be database updates (like a combination of insert, delete or change operations) or simple data retrievals. If all the operations in the transaction complete successfully, we say the transaction has completed successfully. But a transaction can also fail; such a transaction is termed Aborted. If a transaction cannot finish successfully, then any changes made by the transaction must be undone. Once the changes caused by an aborted transaction are undone, we say that the transaction has been rolled back. A transaction that completes its execution successfully is said to be Committed.
Thus, a transaction can be in one of the following states:
Active: the initial state. The transaction stays in this state while it is executing.
Partially Committed: after the last operation of the transaction has been executed.
Failed: when the transaction can no longer proceed normally.
Aborted: after the transaction has been rolled back and the database has been restored to its state prior to the start of the transaction.
Committed: after successful completion of the transaction.
The following figure shows the state diagram of a transaction.
[ Figure: transaction state diagram. Active → Partially committed → Committed;
Active → Failed; Partially committed → Failed; Failed → Aborted ]
From the state diagram it can be seen that, once a transaction starts working, it
either completes all its operations successfully (partially committed state) or fails. If
a transaction fails, then it must be aborted and rolled back.
All the operations of an active transaction are done in memory (and not on the
actual database), and hence the changes are made in a copy of the database held in
memory. Once all the operations are completed successfully, all the changes are
applied to the database; at this point the transaction is said to be committed. A
transaction is said to be Terminated if it is either committed or aborted.
Once a transaction is aborted and rolled back, the system has two options:
a) The transaction can be restarted, but only if it was aborted because
of some hardware or software problem. A restarted transaction is treated
as a new transaction.
b) The system can kill the transaction if there is a logical error in the
instructions forming the transaction. In this case the transaction must be
rewritten.

Implementing Atomicity and Durability (Shadow Copy technique) :


Ensuring the atomicity and durability of a database system is the job of the Recovery-
management component of the database system. For this, the component uses
different schemes; one of them is the shadow copy scheme. Though very simple, it is
an inefficient scheme for handling atomicity and durability. The scheme is
based on making copies of the database on the storage disk, called shadow copies.
The scheme assumes that only one transaction is active at a time and that the
database is simply a file on disk that holds the data. It maintains a pointer on disk,
called the db-pointer, which points to the current valid copy of the database on disk.
When a transaction that wants to update the database starts, a new copy of the
complete database is created (in the main memory of the computer). The updates are
done in this new copy, leaving the original copy of the database untouched. Once
all the changes are made and the transaction is complete, it is committed as follows.
The entire new copy of the database (which can consist of multiple pages of
memory) is written to the disk. After all the pages are written to the disk, the
database system updates the db-pointer to point to the new copy of the database,
and the new copy becomes the current copy of the database. The old copy of the
database is then deleted from the disk. The transaction is said to have been
committed at the point where the updated db-pointer is written to the disk. This
achieves atomicity. Also, since the new copy of the database has been written to the
disk file, the updates will survive later system failures (until new updates are
explicitly made). This also achieves durability.
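A minimal sketch (not from the notes) of the shadow-copy idea using ordinary files: the whole database is one file, and the db-pointer is a small file naming the current copy. Writing the pointer is taken to be atomic, as the notes assume; the file names used here are illustrative.

    import os, shutil

    DB_POINTER = "db-pointer"            # file whose contents name the current database copy

    def current_copy():
        with open(DB_POINTER) as f:
            return f.read().strip()

    def commit_transaction(update_fn):
        old = current_copy()
        new = old + ".new"               # a real system would pick a fresh file name
        shutil.copyfile(old, new)        # copy the complete database
        update_fn(new)                   # all updates go to the new copy only
        with open(DB_POINTER, "w") as f: # commit point: repoint the db-pointer
            f.write(new)
            f.flush()
            os.fsync(f.fileno())         # force the pointer to disk (durability)
        os.remove(old)                   # the old copy is no longer needed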

Handling incomplete transactions:


If a transaction terminates before committing, we should get back the original copy
of the database as it appeared before the transaction started. Suppose the
transaction fails because of a data-integrity problem or a problem in the
commands executed in the transaction; the transaction is to be rolled back. If
any problem occurs before the db-pointer is updated, then the old copy of the
database remains untouched and the new copy of the database is simply deleted.
Now consider a system failure that occurs before the db-pointer is updated. Then,
when the system restarts, it will read the old db-pointer, which still points to the
old copy of the database, and none of the changes will be visible in the database.
Here we assume that writing the db-pointer itself is an atomic operation.

The scheme is inefficient, especially for large databases, since executing a single
transaction requires copying the entire database. Also, the scheme does not allow
multiple transactions to execute concurrently.

Concurrent Execution of Transactions:


Transactions working concurrently create complications in maintaining data
consistency. Ensuring data consistency in spite of concurrent execution of
transactions requires extra work. Executing transactions serially, i.e. starting one
transaction only after another has terminated, is not very efficient.
Concurrent execution of transactions offers the following two advantages:
Improved throughput and resource utilization: A transaction consists of many steps,
some of which involve disk input/output (disk I/O), i.e. reading and writing data
on disk, and others CPU activity, i.e. processing data. The CPU and disk of a
computer can operate in parallel, i.e. simultaneously. This feature of a computer
system can be used to execute parts of multiple transactions concurrently. Thus,
multiple transactions work together and give results faster. Also, the disk
and the CPU, i.e. the resources, are utilized more efficiently.
Reduced waiting time: There may be different types of transactions running on a
system; some may be very long and some very short. If a long transaction is
working, a short transaction has to wait till the long transaction
finishes. This increases the waiting time for transactions, resulting in long
delays in results. If the transactions are working on different parts of the database,
then it is better to run them concurrently. This reduces the average response
time.
Concurrent execution of multiple transactions is achieved by a database system
through a variety of mechanisms called Concurrency-control schemes.

Again consider a banking system having an Account table, as discussed at the
beginning of the chapter. Suppose there are two transactions T1 and T2 which
transfer money from one account to the other: T1 transfers Rs.1000 from
account A to B, and T2 transfers 10% of the balance of A to B.
T1 can be defined as follows,

T1: Read(A)
    A = A – 1000
    Write(A)
    Read(B)
    B = B + 1000
    Write(B)
Transaction T2 can be defined as follows,
T2: Read(A)
    temp = A * 0.1
    A = A – temp
    Write(A)
    Read(B)
    B = B + temp
    Write(B)

The total amount in the two accounts is Rs.9,000 before the transactions start.
After both are executed one after the other (T1 followed by T2), the final amounts
in the two accounts will be Rs.3600 and Rs.5400 in A and B respectively. The
total of the amounts will again be Rs.9,000.
The two transactions can be written as the following schedule,

T1                        T2

Read(A)
A = A – 1000
Write(A)
Read(B)
B = B + 1000
Write(B)
                          Read(A)
                          temp = A * 0.1
                          A = A – temp
                          Write(A)
                          Read(B)
                          B = B + temp
                          Write(B)

Schedule 1

The execution sequences described above are called schedules. They represent the
chronological order in which instructions are executed in the system. Here the two
transactions are executed one after the other, so the schedule is Serial. A serial
schedule consists of a sequence of instructions from various transactions, where the
instructions belonging to one single transaction appear together in that schedule.
We can define one more serial schedule for the above transactions, in which T2
precedes T1.

If both transactions T1 and T2 are executed concurrently then one possible
schedule will be as follows,

T1                        T2

Read(A)
A = A – 1000
Write(A)
                          Read(A)
                          temp = A * 0.1
                          A = A – temp
                          Write(A)
Read(B)
B = B + 1000
Write(B)
                          Read(B)
                          B = B + temp
                          Write(B)

Schedule 2

Here, after the two transactions are complete, the account A will hold Rs.3600
and B will hold Rs. 5400, i.e. total is Rs.9000, thus maintaining the consistency.

Transactions may be performing any kind of updates and may internally contain
very complicated programming instructions. Thus, instead of considering the
details of the transactions, we consider only their read and write operations; there
can be many complicated instructions between reading, updating and writing a data
item. Thus, if a read/write is performed on data item Q, we represent the operations
in the transactions simply by Read(Q) and Write(Q).

T1                        T2

Read(A)
Write(A)
                          Read(A)
                          Write(A)
Read(B)
Write(B)
                          Read(B)
                          Write(B)

Schedule 3

Serializability:
The same set of transactions can be arranged into different serial schedules by
changing the order of the transactions. Alternatively, multiple transactions can be
executed concurrently by interleaving their instructions in such a way that the
database remains in the required consistent state after the transactions are
completed.
Schedules that leave the database in the same state are called equivalent schedules.
Depending on how equivalence between schedules is defined, there are two
concepts: Conflict Serializability and View Serializability.

Conflict Serializability:
Each transaction can consist of several operations on the same or different data
items in a database, for which multiple instructions are executed (mostly
SQL statements). Let us consider multiple transactions working
concurrently, and consider a part of a schedule, say S, in which
there are two transactions, say Ti and Tj. Also, consider two
consecutive instructions Ii and Ij, of transactions Ti and Tj respectively. If the
instructions Ii and Ij are working on two different data items, then we can swap
Ii and Ij without affecting the result of any instruction in the schedule.
However, if the instructions are working with the same data item Q, then the
order of the two instructions may matter. According to the type of operations
performed (read/write), there are four cases to be considered:
1. Ii and Ij are both Read(Q): The order of Ii and Ij does not matter, since the
same value of Q is read by Ti and Tj regardless of the order of the
instructions.
2. Ii is Read(Q) and Ij is Write(Q): If Ii comes before Ij, then Ii reads the old
value of Q, i.e. the value that has not yet been written (updated) by Ij. But if
Ii works after Ij, then Ii reads the value of Q that is written (updated) by Ij.
Thus, the order of Ii and Ij matters.
3. Ii is Write(Q) and Ij is Read(Q): This is the same as the previous case, and
the order of the instructions matters.
4. Ii and Ij are both Write(Q): Since both are write operations, the order of the
instructions does not affect either Ti or Tj. However, the value read by the
next Read(Q) in the schedule S is affected, since it will read the value
written (updated) by whichever of the two instructions (Ii or Ij) worked
last.
Thus, it is clear that only when both instructions are Read instructions can
their order be changed without any effect.
We therefore say that Ii and Ij conflict if they are operations by different
transactions on the same data item and at least one of them is a Write instruction.
Consider schedule 3 discussed previously ( draw it in your answer ). The
Write(A) instruction of T1 conflicts with the Read(A) of T2. However, the Write(A)
of T2 does not conflict with the Read(B) of T1, since they are working on two
different data items.
By swapping the non-conflicting instructions of the two transactions in schedule S,
we can produce a new schedule, say S'.
By performing the following swaps on schedule 3 (one at a time), we can produce a
new schedule, schedule 4, shown below:
First swap the Write(A) of T2 with the Read(B) of T1.
Then swap the Read(A) of T2 with the Read(B) of T1.
Then swap the Write(A) of T2 with the Write(B) of T1.
Finally, swap the Read(A) of T2 with the Write(B) of T1.

T1                        T2

Read(A)
Write(A)
Read(B)
Write(B)
                          Read(A)
                          Write(A)
                          Read(B)
                          Write(B)

Schedule 4
Thus, schedule 3 is equivalent to the serial schedule 4.
If a schedule S can be transformed into a schedule S' by a series of swaps of
non-conflicting instructions, we say that S and S' are Conflict equivalent.
We say that a schedule S is Conflict serializable if it is conflict equivalent to a serial
schedule S'. Thus, schedule 3 is conflict serializable.
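The notes establish conflict serializability by swapping instructions by hand. An equivalent standard check, not described in the notes, builds a precedence graph with an edge Ti → Tj for every pair of conflicting operations in which Ti acts first, and declares the schedule conflict serializable exactly when that graph has no cycle. A rough Python sketch under that assumption:

    # A schedule is a list of (transaction, action, data_item) tuples.
    SCHEDULE_3 = [("T1", "R", "A"), ("T1", "W", "A"),
                  ("T2", "R", "A"), ("T2", "W", "A"),
                  ("T1", "R", "B"), ("T1", "W", "B"),
                  ("T2", "R", "B"), ("T2", "W", "B")]

    def conflicts(op1, op2):
        """Operations conflict if they belong to different transactions,
        touch the same data item, and at least one of them is a write."""
        (t1, a1, q1), (t2, a2, q2) = op1, op2
        return t1 != t2 and q1 == q2 and ("W" in (a1, a2))

    def is_conflict_serializable(schedule):
        # Precedence graph: edge Ti -> Tj if an op of Ti conflicts with a later op of Tj.
        edges = set()
        for i, op1 in enumerate(schedule):
            for op2 in schedule[i + 1:]:
                if conflicts(op1, op2):
                    edges.add((op1[0], op2[0]))
        nodes = {t for t, _, _ in schedule}
        graph = {n: [b for a, b in edges if a == n] for n in nodes}
        visiting, done = set(), set()
        def has_cycle(n):
            if n in done:
                return False
            if n in visiting:
                return True
            visiting.add(n)
            cyclic = any(has_cycle(m) for m in graph[n])
            visiting.discard(n)
            done.add(n)
            return cyclic
        # Conflict serializable iff the precedence graph is acyclic.
        return not any(has_cycle(n) for n in nodes)

    print(is_conflict_serializable(SCHEDULE_3))   # True: equivalent to <T1, T2>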

View Serializability :
Consider two schedules S and S', where the same set of transactions participates
in both schedules. The schedules S and S' are said to be View equivalent if three
conditions are met:
1. For each data item Q, if transaction Ti reads the initial value of Q in
schedule S, then transaction Ti must, in schedule S', also read the initial
value of Q.
2. For each data item Q, if transaction Ti executes Read(Q) in schedule S,
and that value was produced by a Write(Q) operation executed by
transaction Tj, then the Read(Q) operation of transaction Ti must, in
schedule S', also read the value of Q that was produced by the same
Write(Q) operation of transaction Tj.
3. For each data item Q, the transaction (if any) that performs the final
Write(Q) operation in schedule S must also perform the final Write(Q)
operation in schedule S'.

Conditions 1 and 2 ensure that each transaction reads the same values in
both schedules and, therefore, performs the same computation. Condition 3,
coupled with conditions 1 and 2, ensures that both schedules result in the same
final system state.
Consider the following two schedules, schedule 5 and schedule 6. In schedule 5, T2
starts after T1 has finished; in schedule 6, T2 completes before T1 starts.

Schedule 5:
T1                        T2
Read(A)
Write(A)
Read(B)
Write(B)
                          Read(A)
                          Write(A)
                          Read(B)
                          Write(B)

Schedule 6:
T1                        T2
                          Read(A)
                          Write(A)
                          Read(B)
                          Write(B)
Read(A)
Write(A)
Read(B)
Write(B)

The two are not view equivalent, since in schedule 5 the value of A read by T2
was produced by T1, whereas this is not the case in schedule 6.

Schedule 7, shown below, is view serializable.

T1            T2            T3
Read(Q)
              Write(Q)
Write(Q)
                            Write(Q)

Schedule 7

The above schedule is equivalent to the serial schedule 8, i.e. <T1, T2, T3>:

T1            T2            T3
Read(Q)
Write(Q)
              Write(Q)
                            Write(Q)

Schedule 8

This is because the Read(Q) instruction of T1 reads the initial value of Q in both
schedules, and the final write of Q is done by T3 in both schedules.
Observe that, in the above schedules, transactions T2 and T3 perform Write(Q)
operations without having performed a Read(Q) operation. Writes of this sort are
called Blind writes. Blind writes appear in any view-serializable schedule that is
not conflict serializable.

Recoverability of Schedules:
If a transaction Ti fails, for whatever reason, we need to undo the changes done
by the transaction. In a system that allows concurrent executions, it is also
necessary to ensure that any transaction Tj that is dependent on Ti is also
aborted if Ti fails.

Recoverable schedules:
Consider schedule 9, shown below:

T1                        T2
Read(A)
Write(A)
                          Read(A)
Read(B)

Schedule 9

Here, suppose T2 commits immediately after its Read(A). But T1, which has yet to
execute Read(B), fails; then T1 is aborted and all the updates done by T1 are to
be undone. Since T2 has read the data item A that was written by T1, we have to
abort T2 also to ensure atomicity. But, since T2 has already committed, it can
not be rolled back.
Schedule 9, with the commit happening immediately after the Read(A)
instruction of T2, is an example of a nonrecoverable schedule, which should not be
allowed. Most database systems require that all schedules be recoverable. A
recoverable schedule is one where, for each pair of transactions Ti and Tj such
that Tj reads a data item previously written by Ti, the commit operation of Ti
appears before the commit operation of Tj.

Cascadeless Schedules:

Even if a schedule is recoverable, to recover correctly from the failure of a
transaction Ti we may have to roll back several other transactions. Such situations
occur if transactions have read data written by Ti. As an illustration, consider the
partial schedule below.

T10             T11             T12
Read(A)
Read(B)
Write(A)
                Read(A)
                Write(A)
                                Read(A)

Schedule 10
Transaction T10 writes a value of A that is read by transaction T11. Transaction
T11 writes a value of A read by transaction T12. Suppose that, at this point T10
fails. T10 must be rolled back. Since T11 is dependent on T10, T11 must be rolled
back. Since T12 is dependent on T11, T12 must be rolled back. This phenomenon,
in which a single transaction failure leads to a series of transaction rollbacks, is
called cascading rollback.
Cascading rollback is undesirable, since it leads to the undoing of a significant
amount of work. It is desirable to restrict the schedules to those where cascading
rollbacks cannot occur. Such schedules are called cascadeless schedules.
Formally, a cascadeless schedule is one where, for each pair of transactions Ti
and Tj such that Tj reads a data item previously written by Ti, the commit
operation of Ti appears before the read operation of Tj. It is easy to verify that
every cascadeless schedule is also recoverable.
-----000-----

6. Concurrency Control
One of the important properties of a transaction is isolation. When multiple
transactions execute concurrently in the database, the isolation property may no
longer be preserved. To ensure the isolation of transactions from each other,
various mechanisms called concurrency-control schemes are used.
[ Here we assume that all the schedules considered are serializable. ]

Lock-Based Protocols:


This protocol is based on the concept that while one data item is being accessed by
one transaction, no other transaction should be able to modify that data item. This is
achieved by allowing a transaction to access only those data items on which the
transaction currently holds a lock. Locks can be of two types: Shared and Exclusive.
Shared lock: if a transaction Ti has obtained a shared-mode lock on item Q, then
Ti can only read Q, but cannot write Q.
Exclusive lock: if a transaction Ti has obtained an exclusive-mode lock on item
Q, then Ti can both read and write data item Q.
A transaction should obtain a lock on a data item depending upon the type of
operations it wants to perform on that item. A shared-mode lock is compatible
with another shared lock, but not with an exclusive lock; an exclusive lock is not
compatible with either type of lock. At a time, several transactions can hold a
shared-mode lock on a single data item.
A transaction can proceed with an operation on a data item only after the
concurrency-control manager grants the lock to the transaction.
A transaction requests a shared lock on data item Q by executing the instruction
Lock-S(Q); similarly, it requests an exclusive lock on data item Q by
executing the instruction Lock-X(Q).
A transaction can unlock a data item by executing the Unlock(Q) instruction.
A transaction must hold a lock on a data item as long as it accesses that item.
If a transaction Ti requests a lock on Q and Q is already locked by another
transaction in an incompatible mode, then Ti has to wait till all the incompatible
locks held by other transactions have been released.
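A rough Python sketch (not from the notes) of a lock table that applies the shared/exclusive compatibility rules above. Real lock managers also queue waiting transactions; here an incompatible request simply returns False, meaning the caller must wait.

    class LockTable:
        """Per data item: mode is 'S' or 'X'; holders is the set of transactions holding it."""
        def __init__(self):
            self.locks = {}                            # data item -> (mode, holders)

        def lock_s(self, txn, item):
            mode, holders = self.locks.get(item, ("S", set()))
            if mode == "X" and holders and holders != {txn}:
                return False                           # incompatible: caller must wait
            self.locks[item] = (mode if holders else "S", holders | {txn})
            return True

        def lock_x(self, txn, item):
            mode, holders = self.locks.get(item, ("X", set()))
            if holders and holders != {txn}:
                return False                           # exclusive conflicts with any other holder
            self.locks[item] = ("X", {txn})
            return True

        def unlock(self, txn, item):
            mode, holders = self.locks.get(item, (None, set()))
            holders.discard(txn)
            if not holders:
                self.locks.pop(item, None)

    lt = LockTable()
    print(lt.lock_x("T1", "B"), lt.lock_s("T2", "B"))  # True False: T2 has to wait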
Now consider the bank accounts example. Say transaction T1 transfers amount
Rs.1000 from account B to A, and the transaction T2 displays the sum of
balances in the two accounts.
Transaction T1 can be written as follows,
T1: Lock-X(B)
Read(B)
B = B -1000
Write(B)
Lock-X(A)
Read(A)
A= A+1000
Write(A)
Unlock(B)
Unlock(A)

And the transaction T2 can be written as follows,



T2: Lock-S(A)
Read(A)
Lock-S(B)
Read(B)
Display( A + B)
Unlock(A)
Unlock(B)

Locking can lead to an undesirable situation called Deadlock. Consider that a
transaction T1 is holding an exclusive-mode lock on item A and T2 is requesting a
shared-mode lock on A; T2 has to wait till T1 unlocks A. Further, suppose T2 is
holding a shared lock on B and T1 is requesting an exclusive-mode lock on B; T1
has to wait till T2 releases its lock on B. Thus, neither of the transactions can
proceed further with its normal execution. This situation is called a deadlock.
In this case one of the transactions will be rolled back, and the data items that were
locked by that transaction will be unlocked.
The schedule for this situation is as follows.

T1                        T2

Lock-X(A)
Read(A)
A = A – 1000
Write(A)
                          Lock-S(B)
                          Read(B)
                          Lock-S(A)
Lock-X(B)

Schedule 1 ( a schedule that will lead to a deadlock )

Granting of Locks:
When a transaction requests a lock on a data item in a particular mode, and no
other transaction has a lock on the same data item in an incompatible mode, the
lock can be granted. However, in some situations a transaction may wait for a data
item to become free of all incompatible locks and may not get a lock on the data
item for a very long time. For example, a transaction T2 has a shared-mode lock on
data item Q, and another transaction T1 is requesting an exclusive lock on Q;
then T1 has to wait. In the meantime transaction T3 requests a shared-mode
lock on Q and gets it, since it is compatible with the existing lock on Q. It may
happen that T2 releases its lock but T3 is still holding one, and T1 has to wait again
for T3 to release its lock. In this way a sequence of transactions can keep requesting
compatible-mode locks, they will keep being granted, and T1 has to
wait till all such transactions are finished. Transaction T1 may never proceed
and is said to be starved.
To avoid such situations, the concurrency-control manager grants locks in the
following manner:


When a transaction Ti requests a lock on data item Q in a particular mode,
say M, the concurrency-control manager grants the lock provided that:
1. There is no other transaction holding a lock on Q in a mode that conflicts with
M.
2. There is no other transaction that is waiting for a lock on Q and that made its
lock request before Ti.

The Two-Phase Locking Protocol:

( If only this is asked in the exam then, in brief, explain locks, lock modes and
granting of locks. )
This type of locking protocol ensures serializability. The protocol requires that each
transaction issues lock and unlock requests in two phases.
In the Growing phase a transaction may obtain locks, but may not release any
lock. Initially a transaction is in the growing phase, and the transaction acquires
locks as needed. In the Shrinking phase a transaction may release locks but may not
obtain any new locks. Once a transaction releases a lock, it enters the shrinking
phase and it cannot issue further lock requests.
The transactions T1 and T2 discussed in the previous topic are two-phase.
[ Give the example of T1 and T2 (pg.1 of the chapter) here when writing an answer
on the two-phase protocol. ]
For any transaction, the point in the schedule where the transaction has
obtained its final lock is called the lock point of the transaction. The
transactions can be ordered according to their lock points; this ordering is in fact a
serializability ordering for the transactions.
This type of locking does not ensure freedom from deadlocks: in schedule 1
discussed above, T1 and T2 are two-phase, but they are deadlocked.
In addition to being serializable, schedules should be cascadeless. Cascading
rollbacks can be avoided by a modification of two-phase locking called the Strict
two-phase locking protocol. This protocol requires not only that locking be two-
phase, but also that all exclusive-mode locks taken by a transaction be released
only after that transaction commits. This requirement ensures that any data
written by an uncommitted transaction is locked in exclusive mode until the
transaction commits, preventing any other transaction from reading the data.
Another variant of two-phase locking is the Rigorous two-phase locking protocol,
which requires that all locks be held until the transaction commits.
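A small Python sketch (an illustration, not the notes' algorithm) of how two-phase behaviour can be enforced on top of a lock table such as the one sketched earlier: once a transaction has released any lock it is in its shrinking phase, and further lock requests are refused.

    class TwoPhaseTxn:
        """Wraps a lock table (with lock_s/lock_x/unlock) and enforces 2PL."""
        def __init__(self, name, lock_table):
            self.name, self.lt = name, lock_table
            self.held = set()
            self.shrinking = False               # flips to True at the first unlock

        def lock(self, item, mode):
            if self.shrinking:
                raise RuntimeError("2PL violation: cannot lock in shrinking phase")
            ok = (self.lt.lock_x if mode == "X" else self.lt.lock_s)(self.name, item)
            if ok:
                self.held.add(item)
            return ok                            # False means: wait and retry

        def unlock(self, item):
            self.shrinking = True                # growing phase is over
            self.lt.unlock(self.name, item)
            self.held.discard(item)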

Time-stamp Based Protocol:

To ensure data consistency and atomicity, an ordering between each pair of
conflicting transactions must be ensured. The method used here is to select an
ordering among the transactions in advance. One of the schemes used for doing so
is Time-stamp ordering.

Timestamps:
With each transaction Ti in the system we associate a unique, fixed timestamp,
denoted by TS(Ti). This timestamp is assigned by the database system before the
transaction Ti starts execution. If a transaction Ti has been assigned timestamp
TS(Ti), and a new transaction Tj enters the system later, then TS(Ti) < TS(Tj),
i.e. the timestamp of Ti


will be less than that of Tj. There are two simple methods for implementing this
scheme:
1. Use the value of the system clock as the timestamp; that is, a
transaction's timestamp is equal to the value of the clock when the
transaction enters the system.
2. Use a logical counter that is incremented after each new timestamp has been
assigned; that is, a transaction's timestamp is equal to the value of the
counter when the transaction enters the system.
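A tiny sketch of the logical-counter method (illustrative only):

    import itertools

    _counter = itertools.count(1)          # monotonically increasing logical clock

    def new_timestamp():
        """Assign the next timestamp when a transaction enters the system."""
        return next(_counter)

    ts_t1, ts_t2 = new_timestamp(), new_timestamp()
    assert ts_t1 < ts_t2                   # T2 entered later, so TS(T1) < TS(T2)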

The timestamps of the transactions determine the serializability order. Thus, if
TS(Ti) < TS(Tj), then the system must ensure that the produced schedule is
equivalent to a serial schedule in which transaction Ti appears before
transaction Tj.
To implement this scheme, each data item Q is associated with two timestamp values:
 W-timestamp(Q) denotes the largest timestamp of any transaction that
executed Write(Q) successfully, i.e. the timestamp of the transaction that
has the largest timestamp value among all the transactions that
wrote Q.
 R-timestamp(Q) denotes the largest timestamp of any transaction that
executed Read(Q) successfully.

The Timestamp-Ordering Protocol

1. Suppose that transaction Ti issues Read(Q).
a. If TS(Ti) < W-timestamp(Q) (i.e. the timestamp of the transaction
Ti that is reading Q is less than the current write timestamp value
of Q), then Ti needs to read a value of Q that has already been
overwritten. Hence, the read operation is rejected, and Ti is rolled
back. In other words, although Ti entered the system first, it is
trying to read a value of Q that was updated by a transaction that
entered the system later.
b. If TS(Ti) ≥ W-timestamp(Q), then the read operation is executed,
and R-timestamp(Q) is set to the maximum of R-timestamp(Q) and
TS(Ti).

2. Suppose that transaction Ti issues Write(Q).
a. If TS(Ti) < R-timestamp(Q), then the value of Q that Ti is
producing was needed previously, and the system assumed that
that value would never be produced. Hence, the system rejects the
write operation and rolls Ti back.
b. If TS(Ti) < W-timestamp(Q), then Ti is attempting to write an
obsolete value of Q. Hence, the system rejects this write operation
and rolls Ti back.
c. Otherwise, the system executes the write operation and sets
W-timestamp(Q) to TS(Ti).

If a transaction Ti is rolled back by the concurrency-control scheme as a result of
the issue of either a read or a write operation, the system assigns it a new
timestamp and restarts it.

The timestamp-ordering protocol ensures conflict serializability, because
conflicting operations are processed in timestamp order.
The protocol also ensures freedom from deadlock, since no transaction ever waits.
However, there is a possibility of starvation of long transactions if a sequence of
conflicting short transactions causes repeated restarting of the long transaction.
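A rough Python sketch of the read and write tests above (illustrative; a real system would also apply the operation to the data and restart rolled-back transactions):

    class TimestampOrdering:
        def __init__(self):
            self.r_ts = {}                       # data item -> largest read timestamp
            self.w_ts = {}                       # data item -> largest write timestamp

        def read(self, ts, q):
            if ts < self.w_ts.get(q, 0):
                return "rollback"                # Q was already overwritten by a younger txn
            self.r_ts[q] = max(self.r_ts.get(q, 0), ts)
            return "ok"

        def write(self, ts, q):
            if ts < self.r_ts.get(q, 0):         # a younger txn already read the old value
                return "rollback"
            if ts < self.w_ts.get(q, 0):         # obsolete write
                return "rollback"
            self.w_ts[q] = ts
            return "ok"

    proto = TimestampOrdering()
    print(proto.read(1, "Q"), proto.write(2, "Q"), proto.write(1, "Q"))
    # ok ok rollback  -- T1's late write of Q is rejected (see Thomas' write rule below)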

Thomas' Write Rule:

This rule gives greater concurrency than the normal timestamp protocol. One of the
rules for the write operation in the normal timestamp protocol is as follows:
if the timestamp of the current transaction Ti (that is trying to write a data item Q)
is less than the write timestamp of that data item, then, since Ti is attempting to
write an obsolete value of Q, the system rejects the operation and rolls Ti back.
Let's consider schedule 2, shown below,
T1                        T2

Read(Q)
                          Write(Q)
Write(Q)

Schedule 2
Since T1 starts before T2, TS(T1) < TS(T2). The first Read(Q) operation of T1 and
the Write(Q) operation of T2 will be successful, but according to the above-
mentioned rule, the Write(Q) operation of T1 will be rejected and T1 will be rolled
back. This is because T1 is writing an obsolete value of Q, which has already been
written by T2.
This rollback of T1 is not necessary according to Thomas' write rule, which
states that:
if the timestamp of Ti is less than the write timestamp of Q, then, since Ti is
attempting to write an obsolete value of Q, this write operation will simply be
ignored.
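Under Thomas' write rule, only the obsolete-write branch of the earlier sketch changes (a hypothetical continuation of that class, for illustration):

    class ThomasWriteRule(TimestampOrdering):
        def write(self, ts, q):
            if ts < self.r_ts.get(q, 0):
                return "rollback"                # still a conflict: a younger txn read Q
            if ts < self.w_ts.get(q, 0):
                return "ignored"                 # obsolete write: skip it, no rollback
            self.w_ts[q] = ts
            return "ok"

    proto = ThomasWriteRule()
    print(proto.read(1, "Q"), proto.write(2, "Q"), proto.write(1, "Q"))
    # ok ok ignored  -- T1's write is ignored rather than causing a rollback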

Validation-Based Protocols :
In cases where a majority of transactions are read-only transactions, the rate of
conflicts among transactions may be low. Thus, many of these transactions, if
executed without the supervision of a concurrency-control scheme, may not
create problems of data inconsistency. A concurrency-control scheme imposes
overhead of code execution and possible delay of transactions. It may be better
to use an alternative scheme that imposes less overhead. A difficulty in reducing
the overhead is that we do not know in advance which transactions will be
involved in a conflict. To gain that knowledge, we need a scheme for monitoring
the system.
We assume that each transaction Ti executes in two or three different phases in
its lifetime, depending on whether it is a read-only or an update transaction. The
phases are, in order,

1. Read phase. During this phase, the system executes transaction Ti. It
reads the values of the various data items and stores them in variables
local to Ti. It performs all write operations on temporary local variables,
without updates of the actual database.
2. Validation Phase. Transaction Ti performs a validation test to determine
whether it can copy to the database the temporary local variables that
hold the results of write operations without causing a violation of
serializability.
3. Write phase. If transaction Ti succeeds in validation (step 2), then the
system applies the actual updates to the database. Otherwise, the
system rolls back Ti.

Each transaction must go through its phases in the order shown. The
transactions can then be interleaved, and the validation phase is used to check
whether an interleaving is acceptable; for this we need to know when the various
phases took place. Three different timestamps are associated with the phases of a
transaction, say Ti:
Start(Ti): the time when Ti started its execution.
Validation(Ti): the time when Ti finished its read phase and started its validation
phase.
Finish(Ti): the time when Ti finished its write phase.

The serializability order is determined using the value of the timestamp
Validation(Ti).
For two transactions Ti and Tj, if TS(Ti) < TS(Tj), then one of the following
conditions must be true:
1. Finish(Ti) < Start(Tj). Since Ti completes its execution before Tj starts, the
serializability order is indeed maintained.
2. The set of data items written by Ti does not intersect with the set of data
items read by Tj, and Ti completes its write phase before Tj starts its
validation phase. This ensures that the writes of Ti and Tj do not overlap.
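A rough sketch of the validation test for a transaction Tj against every earlier transaction Ti (the dictionary field names here are illustrative, not from the notes):

    def validate(tj, earlier_transactions):
        """Each transaction is a dict with 'start', 'validation', 'finish',
        'read_set' and 'write_set'. Returns True if Tj may enter its write phase."""
        for ti in earlier_transactions:                      # all Ti with TS(Ti) < TS(Tj)
            if ti["finish"] < tj["start"]:
                continue                                     # condition 1: Ti finished first
            if (not (ti["write_set"] & tj["read_set"])
                    and ti["finish"] < tj["validation"]):
                continue                                     # condition 2: no write/read overlap
            return False                                     # otherwise Tj fails validation
        return True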

Deadlock Handling
A system is in a deadlock state if there exists a set of transactions such that every
transaction in the set is waiting for another transaction in the set. More precisely,
there exists a set of waiting transactions {T0, T1, . . ., Tn} such that T0 is waiting for
a data item that T1 holds, T1 is waiting for a data item that T2 holds, and so on,
Tn-1 is waiting for a data item that Tn holds, and Tn is waiting for a data item
that T0 holds. None of the transactions can make progress in such a situation.
The only remedy to this undesirable situation is for the system to invoke some
drastic action, such as rolling back some of the transactions involved in the
deadlock. Rollback of a transaction may be partial: That is, a transaction may be
rolled back to the point where it obtained a lock whose release resolves the
deadlock.
There are two principal methods for dealing with the deadlock problem. We
can use a deadlock prevention protocol to ensure that the system will never enter a
deadlock state. Alternatively, we can allow the system to enter a deadlock state,
and then try to recover by using a deadlock detection and deadlock recovery scheme.

As we shall see, both methods may result in transaction rollback. Prevention is
commonly used if the probability that the system would enter a deadlock state is
relatively high; otherwise, detection and recovery are more efficient.

Deadlock Prevention :
There are two approaches to deadlock prevention. One approach ensures that no
cyclic waits can occur by ordering the requests for locks, or requiring all locks to
be acquired together.
The simplest scheme under the first approach requires that each transaction
locks all its data items before it begins execution. Moreover, either all are locked
in one step or none are locked. There are two main disadvantages to this
protocol:
(1) it is often hard to predict, before the transaction begins, what data items
need to be locked;
(2) data-item utilization may be very low, since many of the data items may be
locked but unused for a long time.
Another approach for preventing deadlocks is to impose an ordering of all
data items, and to require that a transaction lock data items only in a sequence
consistent with the ordering.
The second approach for preventing deadlocks is to use preemption and
transaction rollbacks. In preemption, when a transaction T2 requests a lock that
transaction T1 holds, the lock granted to T1 may be preempted by rolling back of
T1, and granting of the lock to T2. To control the preemption, we assign a unique
timestamp to each transaction. The system uses these timestamps only to decide
whether a transaction should wait or roll back.

Timeout-Based Schemes
Another simple approach to deadlock handling is based on lock timeouts. In this
approach, a transaction that has requested a lock waits for a specified amount of
time. If the lock has not been granted within that time, the transaction is said to
time out; it rolls itself back and restarts. If there was in fact a deadlock, one
or more transactions involved in the deadlock will time out and roll back,
allowing the others to proceed. This scheme falls somewhere between deadlock
prevention, where a deadlock will never occur, and deadlock detection and
recovery.

Deadlock Detection and Recovery :


If the system does not provide any scheme that can avoid deadlocks, then the
system must have a scheme for deadlock detection and recovery from deadlock.
Usually the system implements some algorithm that runs periodically to test
whether a deadlock has occurred, and if one has occurred then the system attempts
to recover from the deadlock.

Deadlock Detection :
Deadlocks can be described precisely in terms of a directed graph called a Wait-for
graph. The graph consists of vertices that represent transactions, and a directed
edge from a vertex Ti to a vertex Tj (denoted Ti → Tj) indicates that Ti is waiting
for Tj.
When a transaction Tn requests a data item currently locked by Tm, the
edge Tn → Tm is inserted in the wait-for graph. The edge is removed only when
Tm no longer holds a lock on a data item needed by Tn.
A deadlock exists in the system if and only if the wait-for graph contains a cycle.
Consider the following wait-for graph.

[ Figure: a wait-for graph with vertices T1, T2, T3 and T4, containing no cycle ]

Since there is no cycle in the graph, there is currently no deadlock in the system.
Suppose now that transaction T4 requests a data item that is currently held
by T3; the edge T4 → T3 will be added to the wait-for graph, resulting in the new
system state shown in the figure below.

T2 T4

T1

T3

At this moment the system contains the cycle
T2 → T4 → T3 → T2,
implying that transactions T2, T3 and T4 are deadlocked.
If deadlocks occur frequently in the system, the algorithm for deadlock detection
should be invoked frequently. When and how frequently the algorithm
should be invoked also depends upon the number of transactions working in the
system concurrently.
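A minimal sketch of deadlock detection on a wait-for graph (an illustration, not the notes' algorithm): the graph is a mapping from each transaction to the transactions it waits for, and a cycle means deadlock. The specific edges used in the example are chosen only to be consistent with the cycle described above.

    def has_deadlock(wait_for):
        """wait_for: dict mapping a transaction to the set of transactions it waits for.
        Returns True iff the wait-for graph contains a cycle."""
        visiting, done = set(), set()

        def dfs(t):
            if t in done:
                return False
            if t in visiting:
                return True                       # back edge -> cycle -> deadlock
            visiting.add(t)
            found = any(dfs(u) for u in wait_for.get(t, ()))
            visiting.discard(t)
            done.add(t)
            return found

        return any(dfs(t) for t in wait_for)

    graph = {"T1": {"T2"}, "T2": {"T4"}, "T3": {"T2"}}
    print(has_deadlock(graph))                    # False: no cycle yet
    graph["T4"] = {"T3"}                          # T4 now waits for T3
    print(has_deadlock(graph))                    # True: T2 -> T4 -> T3 -> T2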

Recovery from Deadlocks:

When a detection algorithm determines that a deadlock exists, the system
must recover from the deadlock. The most common solution is to roll back one
or more of the transactions to break the deadlock.
The following three actions are taken:
Selection of a victim: determine which transaction (or transactions) to roll
back to break the deadlock. The transactions rolled back should be those that incur
the minimum cost, i.e. the minimum number of updates to be undone or the
minimum number of dependent transactions to be rolled back.

Rollback: Once the above step determines which transactions to roll back, it must
be decided how far each transaction should be rolled back. It can be a partial
rollback or a total rollback. In a total rollback the transaction is completely rolled
back and then restarted. A partial rollback cancels updates only up to the stage
before which the deadlock was not present.
Starvation: In systems where the selection of the victim is based on cost (as
discussed in the first point), it may happen that the same transaction is picked
up repeatedly for rollback and the transaction never completes, thus resulting
in starvation of that transaction. Here, care is taken that one transaction
is picked up as a victim only a small, finite number of times.

---000---

7. Recovery System
A computer system, like any other device, is subject to failure from a variety of
causes: disk crash, power outage, software error, a fire in the machine room,
even sabotage. In any failure, information may be lost. Therefore, the database
system must take actions in advance to ensure that the atomicity and durability
properties of transactions are preserved. An integral part of a database system is
a recovery scheme that can restore the database to the consistent state that
existed before the failure. The recovery scheme must also provide high
availability; that is, it must minimize the time for which the database is not
usable after a crash.

Failure Classification
 Transaction failure. There are two types of errors that may cause a
transaction to fail:
Logical error. The transaction can no longer continue with its normal
execution because of some internal condition, such as bad input, data not
found, overflow, or resource limit exceeded.

System error. The system has entered an undesirable state (for example,
deadlock), as a result of which a transaction cannot continue with its
normal execution. The transaction, however, can be re-executed at a later
time.

 System crash. There is a hardware malfunction, or a bug in the
database software or the operating system, that causes the loss of the
content of volatile storage and brings transaction processing to a halt.
The content of nonvolatile storage remains intact and is not corrupted.
 Disk failure. A disk block loses its content as a result of either a head
crash or failure during data transfer operation. Copies of the data on
other disks, or archival backups on tertiary media, such as tapes, are
used to recover from the failure.

To ensure database consistency and transaction atomicity despite failures,
algorithms known as recovery algorithms are used. These have two parts:
1. Actions taken during normal transaction processing to ensure
that enough information exists to allow recovery from failures. Thus,
information is gathered about the transactions currently being
executed in the system.
2. Actions taken after a failure to recover the database contents to
a state that ensures database consistency, transaction atomicity,
and durability.

Log-Based Recovery
The most widely used structure for recording database modifications is the log.
The log is a sequence of log records, recording all the update activities in the
database.
e.g. information about the currently executed operations is kept aside in the
form of records. If a transaction or the system fails, then these records are used to
recover the database system from the failure. These records are called log
records.
There are several types of log records. An update log record describes a single
database write. It has these fields:

 Transaction identifier is the unique identifier of the transaction that
performed the write operation.
 Data-item identifier is the unique identifier of the data item written.
Typically, it is the location of the data item on disk.
 Old value is the value of the data item prior to the write.
 New value is the value that the data item will have after the write.

We denote the various types of log records as:
 < Ti start >. Transaction Ti has started.
 < Ti, Xj, V1, V2 >. Transaction Ti has performed a write on data item Xj;
Xj had value V1 before the write, and will have value V2 after the write.
 < Ti commit >. Transaction Ti has committed.
 < Ti abort >. Transaction Ti has aborted.

Whenever a transaction performs a write, it is essential that the log record for
that write be created before the database is modified. Once a log record exists,
we can output the modification to the database if that is desirable. Also, we have
the ability to undo a modification that has already been output to the database.
We undo it by using the old-value field in log records.
For log records to be useful for recovery from system and disk failures, the log
must reside in stable storage. Observe that the log contains a complete record of
all database activity.
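A toy sketch (not from the notes) of update log records and undoing a transaction using the old-value field; in a real system the log lives in stable storage, not in a Python list.

    log = []                                      # in a real system: stable storage

    def write(db, txn, item, new_value):
        """Create the update log record <Ti, Xj, old, new> before modifying the database."""
        old_value = db.get(item)
        log.append((txn, item, old_value, new_value))   # log first, then modify
        db[item] = new_value

    def undo(db, txn):
        """Restore the old value of every item written by txn, newest record first."""
        for t, item, old_value, _ in reversed(log):
            if t == txn:
                db[item] = old_value

    db = {"A": 5000, "B": 4000}
    write(db, "T1", "A", 4000)
    write(db, "T1", "B", 5000)
    undo(db, "T1")
    print(db)                                     # {'A': 5000, 'B': 4000}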

Database updates are done in one of the following two ways:

1. Deferred Database Modification


The deferred-modification technique ensures transaction atomicity by recording
all database modifications in the log, but deferring the execution of all write
operations of a transaction until the transaction partially commits. Recall that a
transaction is said to be partially committed once the final action (instruction) of
the transaction has been executed. Here, we assume that transactions are
executed serially.
When a transaction partially commits, the information on the log associated
with the transaction is used in executing the deferred writes. If the system
crashes before the transaction completes its execution, or if the transaction
aborts, then the information on the log is simply ignored.

2. Immediate Database Modification

This technique allows database modifications to be output to the database
while the transaction is still in the active state. Data modifications written by
active transactions are called uncommitted modifications. In the event of a
system crash, the old values in the log records are used for data recovery.

Before a transaction Ti starts its execution, the system writes the record <Ti start>
to the log. During its execution, any Write(X) operation by Ti is preceded by
writing a new update log record to the log. When Ti partially commits, the system
writes the record <Ti commit> to the log.
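A rough sketch (not from the notes) of how the log is used after a crash under the immediate-modification scheme: transactions with a commit record are redone using the new values, and transactions without one are undone using the old values. The record format follows the notation above.

    def recover_immediate(db, log):
        """log records: ('start', Ti), ('write', Ti, X, old, new), ('commit', Ti)."""
        committed = {r[1] for r in log if r[0] == "commit"}
        # undo uncommitted transactions, scanning the log backwards (old values)
        for r in reversed(log):
            if r[0] == "write" and r[1] not in committed:
                db[r[2]] = r[3]
        # redo committed transactions, scanning the log forwards (new values)
        for r in log:
            if r[0] == "write" and r[1] in committed:
                db[r[2]] = r[4]
        return db

    log = [("start", "T1"), ("write", "T1", "A", 5000, 4000), ("commit", "T1"),
           ("start", "T2"), ("write", "T2", "B", 4000, 4400)]      # T2 never committed
    print(recover_immediate({"A": 4000, "B": 4400}, log))
    # {'A': 4000, 'B': 4000}  -- T1 is redone, T2's uncommitted write is undone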

Using the log, the system can handle any failure that does not result in the loss of
information in non-volatile storage, i.e. on the disk.

Checkpoints:
In a log-based recovery scheme, when a system failure occurs, the entire log
needs to be searched to determine which transactions should be redone and
which should be undone. This process is inefficient, since searching for particular
transactions in the entire log is time consuming. Also, although not harmful,
redoing transactions whose updates have already reached the database is
unnecessary work.

To reduce these types of overhead, systems use checkpoints. During the
execution of database operations, the system maintains a log, using one of the two
techniques discussed above.
Checkpoint records are also stored in the log.
The presence of the checkpoint records in the log allows the system to
streamline its recovery procedure.

Shadow Paging
An alternative to log-based crash-recovery techniques is shadow paging. Under
certain circumstances, shadow paging may require fewer disk accesses than do
the log-based methods discussed previously. However, it is hard to extend shadow
paging to allow multiple transactions to execute concurrently; this is one of the
limitations of shadow paging.

The shadow-paging technique works as follows.
The database is partitioned into some number of fixed-length blocks,
which are referred to as pages. Assume that there are n pages, numbered 1
through n. (In practice, n may be in the hundreds of thousands.) The technique
uses a page table, as in the following figure, for this purpose. The page table has
n entries, one for each database page, and each entry contains a pointer to a page
on disk.

[ Figure: a page table with entries 1 to n; each entry points to one page of the
database on disk ]

Also, there are some unused pages on the disk, called free pages. The system
maintains a list of these free pages, called the Free page list.
The key idea behind the shadow-paging technique is to maintain two page tables
during the life of a transaction: the current page table and the shadow page
table. Both tables hold the pointers to the database pages, including the pages in
which the records to be updated are found. When the transaction starts, both page
tables are identical. The shadow page table is never changed over the duration of
the transaction. The current page table may be changed when the transaction
performs a write operation. All input and output operations use the current page
table to locate database pages on disk.

Suppose that transaction Tj performs a Write(X) operation, and that X
resides on the ith page. The system executes the write operation as follows:
1. If the ith page (that is, the page on which X resides) is not already in
main memory, then the system issues input(X), i.e. the page is read into
memory for the update.
2. If this is the first write performed on the ith page by this transaction, then
the system modifies the current page table as follows:
a. It finds an unused page on disk. Usually, the database system has
access to a list of unused (free) pages.
b. It deletes the page found in step 2a from the list of free
pages, and it copies the contents of the ith page (which holds the
record X) to the page found in step 2a.
c. It modifies the current page table so that its ith entry points to the
page found in step 2a, i.e. the ith entry of the current page table now
points to this new page.
3. It assigns the new value xi to X in the buffer page.
The procedure is shown in the figures below.
Consider that the records to be updated by a transaction are in pages 2, 3 and 5 of
the pages on the disk. The shadow and current page tables keep track of those
pages. Suppose the record X currently being updated lies in page 2.

[ Figure: before the update, the shadow page table and the current page table both
point to pages 2, 3 and 5 on the disk; data item X is in page 2 ]

[ Figure: after the update, the shadow page table still points to the original page 2,
while the entry of the current page table for page 2 points to a new (previously free)
page in which X has been updated ]

Intuitively, the shadow-page approach to recovery is to store the shadow page
table in nonvolatile storage, so that the state of the database prior to the
execution of the transaction can be recovered in the event of a crash or a
transaction abort. When the transaction commits, the system writes the current
page table to nonvolatile storage. The current page table then becomes the new
shadow page table, and the next transaction is allowed to begin execution. It is
important that the shadow page table be stored in nonvolatile storage, since it
provides the only means of locating database pages. The current page table may
be kept in main memory (volatile storage). We do not care whether the current
page table is lost in a crash, since the system recovers by using the shadow page
table.
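A rough in-memory Python sketch of the shadow-paging write and commit steps described above (page contents are simplified to single values; a real system works with disk blocks and writes the tables to nonvolatile storage):

    disk = {1: "p1", 2: "X=5000", 3: "p3", 4: "p4", 5: "p5"}   # page number -> contents
    free_pages = [6, 7, 8]

    shadow_table = {1: 1, 2: 2, 3: 3, 4: 4, 5: 5}    # logical page -> disk page
    current_table = dict(shadow_table)               # identical at transaction start

    def write_page(logical_page, new_contents):
        """Copy-on-write: the first write to a page redirects it to a free disk page."""
        if current_table[logical_page] == shadow_table[logical_page]:
            new_page = free_pages.pop(0)
            disk[new_page] = disk[current_table[logical_page]]   # copy old contents
            current_table[logical_page] = new_page               # redirect current table
        disk[current_table[logical_page]] = new_contents

    def commit():
        """At commit, the current page table becomes the new shadow page table."""
        global shadow_table
        shadow_table = dict(current_table)    # in reality: written to nonvolatile storage

    write_page(2, "X=4000")
    print(shadow_table[2], current_table[2])  # 2 6 : the old copy is untouched until commit
    commit()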
---000---

