Index
1. DBMS Concepts : 1 - 9
2. Relational Model : 10 - 18
3. E-R Model : 19 - 33
4. SQL : 34 - 66
5. Transaction Management : 67 - 76
6. Concurrency Control : 77 - 85
7. Recovery System : 86 - 90
[ These notes are sufficient for attempting most exam questions but are not complete and
thorough. Please refer to some standard books for other topics. Also, refer to old DBMS exam
papers for Sem III-IT and Sem IV Comp. (old) ]
Santosh Kabir.
Mobile : 98336 29398.
www.santoshkabirsir.com
D.B.M.S.
happen that the changed customer address may be reflected in one file but not in
the other file, causing data inconsistency.
Data isolation :
Data is scattered over different files in different formats. Writing new application
programs to access such data is difficult.
Atomicity problem:
A transaction consisting of multiple operations should either succeed completely or
fail completely. Because of a system failure or a data-integrity problem, a transaction
performed on a database can fail. If the transaction fails, the data should be restored
to the consistent state that existed before the failure. In a bank, if while transferring
money from account A to account B the money is removed from A but not added to B,
the data becomes inconsistent. Thus, the money transfer must be atomic, i.e. it must
happen in its entirety or not at all.
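The atomic transfer described above can be sketched with Python's built-in sqlite3 module. The account table, its rows and the transfer amount are illustrative, not from a real banking schema: the point is that both updates commit together or neither does.

```python
import sqlite3

# Illustrative schema and data for the A -> B transfer example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (no TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)",
                 [("A", 1000), ("B", 500)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move amount from src to dst: both UPDATEs commit together or neither does."""
    try:
        conn.execute("UPDATE account SET balance = balance - ? WHERE no = ?",
                     (amount, src))
        conn.execute("UPDATE account SET balance = balance + ? WHERE no = ?",
                     (amount, dst))
        conn.commit()      # both changes become visible together
    except Exception:
        conn.rollback()    # neither change survives a failure
        raise
```

After transfer(conn, "A", "B", 300), A holds 700 and B holds 800; had either UPDATE raised an error, the rollback would have left both balances unchanged.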
Security problems:
The data stored in the database must be safeguarded against illegal access.
Also, users who work with a particular type of data should be restricted from
working with other types of data, e.g. the bank employees working on savings
accounts are not allowed to work with data and records related to loans.
Maintaining such different levels of security for data stored in a file system is
almost impossible.
View of data :
Data Abstraction:
Data is stored in database systems using complex data structures so that it can
be efficiently accessed and modified. Database system users may not be computer
professionals; hence the developers of database systems hide the complex details
of data storage, so as to provide users with easy access to the data.
Physical level:
This is the lowest level of data abstraction and describes how data is actually
stored. This level describes the complex data-structures for data storage. Database
administrators sometimes handle database at this level to make database more
efficient and easy for access.
Logical level:
This level describes what data are stored in database and what relationship exists
among those data. The entire database is described in terms of a small number of
relatively simple structures. Database administrators, who must decide what data
to keep in the database, use logical level of abstraction.
View level:
This is the highest level of abstraction, which describes only part of the database.
Not all users need to know and view all the data; they want only part of the
data. This level simplifies the interaction of such users with the system.
Database Languages :
Database systems use database languages to work with the databases. The
commonly used language for database handling is called Structured Query
Language (SQL).
Note: SQL is a universally accepted database language; virtually all DBMS use SQL for
handling database operations.
SQL Server, by contrast, is a DBMS product of the Microsoft company, used for creating and
using databases. There are other popular DBMS (rather, RDBMS) such as Oracle 9i from
Oracle and Access 2000 from Microsoft. A DBMS comes in different versions, like MS SQL Server
2000, MS SQL Server 2005 etc. The software uses the SQL language for its database operations.
The part of the SQL language that is used to specify a database schema is called the
Data Definition Language (DDL), and the part of the language that is used to work
with the existing database is called the Data Manipulation Language (DML).
The DDL consists of instructions with which one can specify the storage
structure and access methods used by the database system. These instructions define
the implementation details of the database schema, which are usually hidden from
users. DDL instructions also specify constraints and data-integrity rules while
defining the database schema.
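As a sketch of the DDL/DML split, the following snippet uses Python's sqlite3 module (the table and column names are made up for illustration): the CREATE TABLE statement is DDL carrying an integrity rule, while the INSERT statements are DML that the rule then guards.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define the schema, including an integrity rule (illustrative names).
conn.execute("""CREATE TABLE account (
    account_no TEXT PRIMARY KEY,
    balance    INTEGER CHECK (balance >= 0)
)""")

# DML: work with data inside the existing schema.
conn.execute("INSERT INTO account VALUES ('A-101', 500)")

# The constraint declared at the DDL level now guards every DML statement.
try:
    conn.execute("INSERT INTO account VALUES ('A-102', -50)")
    violated = False
except sqlite3.IntegrityError:
    violated = True

count = conn.execute("SELECT COUNT(*) FROM account").fetchone()[0]
```

The negative-balance INSERT is rejected by the CHECK rule declared in the schema, so only the valid row remains.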
[ The SQL is discussed in later chapter ]
Database users:
According to the way users are expected to interact with the Databases, users are
categorized as follows,
Naive users: These are users who work with the database indirectly, through
application programs (like web applications or desktop applications) written
earlier by application programmers. For example, a bank clerk updates a bank
account when an account holder requests a deposit: the clerk uses the account
information stored in the database through an application program on his/her
computer. Thus, such a user need not be a computer professional and may not know
anything about the database.
Application programmers: These are the computer professionals who write the
application programs. They work with the database schema already defined by the
database designers (usually the DBA).
An application programmer uses various development tools so that the (indirect)
database users can work with data easily.
Sophisticated users: These are DBMS users who interact with the database
systems by writing queries (statements in some database language like SQL).
Specialized users: These are sophisticated users who write specialized database
applications that do not fit into the traditional data-processing framework. These
applications store data in more complex forms, such as graphics and audio.
Database Architecture :
Database system is partitioned into multiple parts (modules) that deal with
different responsibilities of the overall system. Broadly different functional
components of the system are divided into two parts : Storage manager and Query
Processor. The storage manager's main task is to handle the physical data present in
the operating-system files. The volume of data can be in megabytes for small firms
and in terabytes for large organizations. The storage manager also keeps
some data in a buffer so that the movement of data from its actual storage place to
memory is faster.
The query processor handles user requests in such a way that users
don't have to interact with the actual data. It translates user queries (e.g. SQL
statements) into an efficient sequence of operations at the physical level.
Storage Manager:
It is a program module that provides an interface between the queries submitted to
the system (and the requests given by application programs) and the low-level data
stored in the database. It interacts with the file manager: various DML queries are
translated into low-level file instructions.
It includes following components:
Authorization and integrity manager : tests for the satisfaction of integrity constraints
and checks the authority of users to access the data.
Transaction manager : ensures that the database remains in a consistent state
despite system failures. Also, it ensures that the concurrent transactions work
together without conflicting.
File manager: This manages the allocation of disk storage space and the data
structures used to store data.
Buffer manager: Its job is to fetch data from disk storage (the actual database) into
main memory, and to decide what data must be held in main memory. It enables the
database to handle data sizes that are much larger than main memory.
Storage manager implements several data structures to hold the physical data.
Data files: which store the database itself.
Data dictionary: Which stores metadata about the structure of the database i.e.
schema of the database.
Indices: Provides fast access to data items in database that hold particular value.
The Query Processor :
This includes following main modules:
DDL interpreter: Interprets the DDL statements and records the definitions in data
dictionary.
DML compiler: Translates DML statements (usually given in a query language) into
an evaluation plan consisting of low-level instructions that the query evaluation
engine understands. The DML compiler also performs query optimization, i.e. it
chooses the most efficient evaluation plan from among the alternatives.
Query evaluation engine: Executes the low-level instructions generated by the DML
compiler. The figure below shows the database system structure.
In the above Account relation there are five tuples. Let's consider the first tuple,
given by a tuple variable, say t. We use the notation t[account_no] to denote the value
of t on the account_no attribute. Thus, we can say t[account_no] is A-101, and
t[branch_name] is Andheri, and so on. Alternatively, one can use the notation t[1] to
refer to the first attribute (here account_no), t[2] for branch_name, and so on.
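The two notations can be mirrored in a short Python sketch. The tuple values follow the Account examples above; the helper name attr is hypothetical, and note that Python indexes from 0 where the text counts attributes from 1.

```python
# One tuple of the Account relation, with its attribute names alongside.
attributes = ("account_no", "branch_name", "balance")
t = ("A-101", "Andheri", 3000)   # illustrative balance value

def attr(t, name):
    """t[name]: the value of tuple t on the named attribute."""
    return t[attributes.index(name)]

# attr(t, "account_no") plays the role of t[account_no] in the text,
# while t[0] plays the role of the positional t[1] (0-based here).
```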
Relational Algebra:
Database users use a query language to work with data in the database. Query
languages consist of predefined instructions used with a particular syntax.
According to their use, query languages are categorized as either procedural
or non-procedural. In a procedural language the user
instructs the system to perform a sequence of operations on the database to
compute the desired result. In a non-procedural language user describes the
desired information without giving specific procedure for obtaining that
information.
Relational algebra is a procedural query language. It consists of set of operations that
take one or two relations as input (also called argument) and produce a new
relation as their result.
Fundamental operations:
These are Select, Project, Union, Set-difference, Cartesian product and Rename.
The select, project and rename operations are called unary operations, because
they operate on a single relation.
Remember two points about the relations.
Result of a relational algebraic expression is a relation.
And since relations are sets, they don’t hold duplicate values.
The operands in the predicate can be numbers or strings (text data) or can be the
attributes of some relation.
The string values are enclosed in double-quotation symbols.
The numeric values are directly written as numbers.
We can also use the logical operators And (∧), Or (∨) and Not (¬).
Thus, to find the tuples where branch name is Andheri and amount is greater
than 4000 we can write expression as follows,
σ amount > 4000 ∧ b_name= “Andheri” (loan)
To get the tuples where expenses is equal to credit limit from Cards relation,
σ expenses = cr_limit (Cards)
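A minimal sketch of the select operation in Python, modeling a relation as a list of dicts. The loan tuples and the helper name select are illustrative, chosen to match the σ amount > 4000 ∧ b_name = "Andheri" example above.

```python
# Illustrative loan relation: each dict is one tuple.
loan = [
    {"loanno": "L-101", "b_name": "Andheri", "amount": 4000},
    {"loanno": "L-102", "b_name": "Dadar",   "amount": 5000},
    {"loanno": "L-103", "b_name": "Andheri", "amount": 6500},
]

def select(predicate, relation):
    """sigma_predicate(relation): keep only the tuples satisfying the predicate."""
    return [t for t in relation if predicate(t)]

# sigma amount > 4000 AND b_name = "Andheri" (loan)
result = select(lambda t: t["amount"] > 4000 and t["b_name"] == "Andheri", loan)
```

Only L-103 satisfies both conditions: L-101 fails the amount test and L-102 is at a different branch.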
LoanNo Amount
L-101 4000
L-102 5000
L-103 1500
L-104 6500
L-105 3500
L-106 4000
We can combine multiple operations and get a required resultant relation, using a
relational algebraic expression.
e.g. to get the names of the customers who stay in “Mumbai” we can write,
∏ cust_name ( σ city = “Mumbai” (customer) )
The select operation in the parentheses returns a relation which holds tuples with
city equal to Mumbai and with all the attributes. This resultant relation is used as
an input (argument) to the project operation and returns the final relation with
only cust_name attribute. The resultant of the expression is as follows,
CustName
Sanjay
Ajay
Rita
Mack
Dinesh
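The project operation, and its composition with select, can be sketched the same way in Python (the customer rows and the helper names select and project are illustrative). Note the duplicate elimination: since the result of a relational expression is a set, repeated rows are dropped.

```python
# Illustrative customer relation.
customer = [
    {"cust_name": "Sanjay", "city": "Mumbai"},
    {"cust_name": "Ajay",   "city": "Mumbai"},
    {"cust_name": "Neeta",  "city": "Pune"},
]

def select(predicate, relation):
    """sigma: keep tuples satisfying the predicate."""
    return [t for t in relation if predicate(t)]

def project(attrs, relation):
    """pi_attrs(relation): keep only the named attributes, dropping duplicates."""
    seen, result = set(), []
    for t in relation:
        row = tuple(t[a] for a in attrs)
        if row not in seen:           # relations are sets
            seen.add(row)
            result.append(dict(zip(attrs, row)))
    return result

# pi cust_name ( sigma city = "Mumbai" (customer) )
names = project(["cust_name"], select(lambda t: t["city"] == "Mumbai", customer))
```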
3. Union Operation :
This works on sets theory of mathematics. Union is a set of values obtained after
merger of two sets, removing duplicate values.
Let's consider the case of getting all the customers who have either an account or a
loan, or both, in a bank. To get this information we have to take tuples from
depositor and borrower. The customer relation alone will not provide this information,
because not every customer has a loan or an account in the bank.
We can get customers having loan by following expression,
∏ cust_name ( borrower )
We can get customers having account by following expression,
∏ cust_name ( depositor )
To get the customers having either loan or account or both we need to take a
union of the relations resulting from above two expressions, as follows,
∏ cust_name ( borrower ) U ∏ cust_name ( depositor )
Since the relations are sets, duplicate values are eliminated. The resultant relation
will be as follows,
Dinesh
Mack
Neeta
Puja
Ravi
Rita
Sachin
Sanjay
Vijay
We must ensure that a union is taken between compatible relations. In the
above example both relations (on the left and right side of the union symbol) have
the same number of attributes (only cust_name) and the same type of attributes (strings).
Thus, for an expression r U s to be valid,
1. The relations r and s must have the same arity, i.e. they must have the same
number of attributes.
2. The domains of the ith attribute of r and the ith attribute of s must be the same,
for all i.
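A Python sketch of union with the compatibility check above (the relation contents and the helper name union are illustrative): attribute sets must match, and duplicates are dropped from the result.

```python
def union(r, s):
    """r U s over relations modeled as lists of dicts."""
    # Approximate the compatibility rule: same attribute names on both sides.
    if r and s and set(r[0]) != set(s[0]):
        raise ValueError("relations are not union-compatible")
    result = []
    for t in r + s:
        if t not in result:      # relations are sets: drop duplicates
            result.append(t)
    return result

borrower_names  = [{"cust_name": "Vijay"}, {"cust_name": "Mack"}]
depositor_names = [{"cust_name": "Mack"},  {"cust_name": "Rita"}]
everyone = union(borrower_names, depositor_names)   # Mack appears once
```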
We now have to select those tuples from the result in which the loan numbers from the
two original relations match.
σ borrower.loanno = loan.loanno ( σ b_name= “Andheri” ( borrower x loan ) )
To get only customer names, we do projection, as follows,
∏ cust_name ( σ borrower.loanno = loan.loanno
( σ b_name= “Andheri” ( borrower x loan ) ) )
The final result is as follows,
CustName
Vijay
Mack
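The Cartesian-product-then-select pattern above can be sketched in Python (the borrower and loan rows are illustrative). Attribute names are prefixed with the relation name so that borrower.loanno and loan.loanno stay distinct in the product.

```python
# Illustrative relations for the Andheri-loans example.
borrower = [{"cust_name": "Vijay", "loanno": "L-101"},
            {"cust_name": "Mack",  "loanno": "L-103"}]
loan     = [{"loanno": "L-101", "b_name": "Andheri"},
            {"loanno": "L-103", "b_name": "Andheri"},
            {"loanno": "L-104", "b_name": "Dadar"}]

def product(r, r_name, s, s_name):
    """r x s: every tuple of r paired with every tuple of s, names prefixed."""
    return [{**{f"{r_name}.{k}": v for k, v in tr.items()},
             **{f"{s_name}.{k}": v for k, v in ts.items()}}
            for tr in r for ts in s]

# sigma borrower.loanno = loan.loanno ( borrower x loan )
pairs = [t for t in product(borrower, "borrower", loan, "loan")
         if t["borrower.loanno"] == t["loan.loanno"]]
```

Of the 2 x 3 = 6 tuples in the product, only the two with matching loan numbers survive the selection.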
6. Rename operation:
Sometimes it is required to reuse the relations that result from relational
algebraic expressions. These relations are nameless, but we can assign them a
name using the rename operation. The general form of rename is as shown below,
ρ name ( Expression )
e.g. ρ x (σ b_name= “Andheri” (Loan) )
Here, the result of the expression in the parentheses is a relation which is given the
name x.
Natural join is associative, i.e. the following three expressions are equivalent,
Customer ⋈ Account ⋈ Depositor
( Customer ⋈ Account ) ⋈ Depositor
Customer ⋈ ( Account ⋈ Depositor )
2. Find all the customers who have both a loan and an account in a bank.
∏ cust_name ( Borrower ⋈ Depositor )
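A Python sketch of natural join (the sample data and the helper name natural_join are illustrative): tuples are matched on every shared attribute name, and each shared attribute appears once in the result, so projecting cust_name from the join of Borrower and Depositor yields the customers with both a loan and an account.

```python
def natural_join(r, s):
    """r joined with s on all attributes the two relations share."""
    common = set(r[0]) & set(s[0]) if r and s else set()
    result = []
    for tr in r:
        for ts in s:
            if all(tr[a] == ts[a] for a in common):
                result.append({**tr, **ts})   # shared attributes kept once
    return result

borrower  = [{"cust_name": "Mack", "loanno": "L-101"}]
depositor = [{"cust_name": "Mack", "acc_no": "A-217"},
             {"cust_name": "Rita", "acc_no": "A-222"}]

# pi cust_name ( borrower join depositor ): both a loan and an account
both = sorted({t["cust_name"] for t in natural_join(borrower, depositor)})
```

Rita has an account but no loan, so only Mack appears in the result.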
9. Division Operation :
The operation is denoted by the symbol ÷, and is used in special queries that include
the phrase 'for all'. e.g. we want to retrieve all the customers who have an account at
all the branches located in the city Mumbai.
We can obtain all the branches in Mumbai by the expression,
R1 = ∏ Bname (σ city= “Mumbai” ( Branch ) )
Now, we will retrieve all the customers (with their branch name) who have account
in a bank,
R2 = ∏ cust_name, bname (Depositor ⋈ Account)
Prepared by, Santosh Kabir. 17 2. Relational Model
Mobile : 98336 29398
Now we want to retrieve all the customers who appear in R2 with every branch name
in R1. This can be done by R2 ÷ R1, i.e.
∏ cust_name, bname (Depositor ⋈ Account) ÷ ∏ Bname (σ city= “Mumbai” ( Branch ) )
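The division R2 ÷ R1 can be sketched in Python for this specific schema (the branch and customer rows are illustrative): a customer qualifies only if the set of Mumbai branches is a subset of the branches at which that customer holds an account.

```python
def divide(r2, r1):
    """R2 / R1 for R2(cust_name, bname) and R1(bname)."""
    branches  = {t["bname"] for t in r1}
    customers = {t["cust_name"] for t in r2}
    # Keep customers whose own branch set covers every branch in R1.
    return sorted(c for c in customers
                  if branches <= {t["bname"] for t in r2
                                  if t["cust_name"] == c})

r1 = [{"bname": "Andheri"}, {"bname": "Dadar"}]          # branches in Mumbai
r2 = [{"cust_name": "Mack", "bname": "Andheri"},
      {"cust_name": "Mack", "bname": "Dadar"},
      {"cust_name": "Rita", "bname": "Andheri"}]

result = divide(r2, r1)   # customers with an account at every Mumbai branch
```

Rita holds an account at only one of the two Mumbai branches, so only Mack satisfies the 'for all' condition.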
----000----
An Entity is a 'thing' or an 'object' in the real world that is distinguishable from all
other objects. For example, a Customer (of a bank) is an entity. An entity has a set
of properties, and values can be associated with some or all of these properties.
The values of some of these properties may uniquely identify an entity;
e.g. a customer ID can uniquely identify a customer, or a loan can be identified by
its Loan ID property.
An entity is represented by a set of properties, called Attributes; e.g. the customer
entity can be represented by attributes such as Customer ID, name, address
etc., as given above.
Entity Set is a set of entities of same type that share same properties i.e. same
attributes. Set of all the customers of a bank can be called as an entity set
customer.
The individual entities that constitute an entity set are said to be the extension of
the entity set. Thus, each individual customer of the bank belongs to the extension of
the Customer entity set.
Each entity has a value for each of its attributes; e.g. a particular customer can
have Customer id = '1234', name = 'Mr Rajesh Kumar' etc. The customer ID is
used to uniquely identify customers, since there can be multiple customers
with the same name or the same balance amount.
For each attribute there is a set of permitted values; e.g. a customer
name can consist of alphabets, and a customer id should be a number. This set of
permitted values is called the Domain or Value Set of the attribute.
Simple and composite attributes: Simple attributes cannot be further divided (to
describe the entity in more detail) into multiple attributes; e.g. the balance-amount
attribute of a customer cannot be broken into subparts.
Composite attributes, on the other hand, can be divided into subparts, i.e. other
attributes; e.g. the customer-address attribute can be divided into
attributes such as apartment name, street, city and area code (PIN code in
India).
Single-valued and multi-valued attributes: A single-valued attribute refers to only
one value for an entity. For example, a customer's balance amount holds only one
value; it cannot refer to multiple values for one customer entity. A customer's
telephone number, in contrast, can refer to multiple values, because a customer can
have zero, one or more telephone numbers. Thus, telephone number can be a
multivalued attribute, i.e. an attribute that can have zero or more values for a
single entity.
Derived attributes: The value for this attribute may be derived from other related
attributes or entities. e.g. the value for Age attribute for a customer can be
derived from the attribute Date of birth.
An attribute can take a null value when a particular entity doesn't have a value for
it. This can happen for one or more reasons, such as: the attribute is not
applicable for a particular entity, or the value is currently not known.
Relationship Sets:
A relationship is an association among several entities. For example, say a
customer Rajesh Kumar has taken a loan from a bank, and his loan id is L_0012.
The loan entities are part of the Loans entity set. We can define a relationship that
associates customer Rajesh Kumar with loan number L_0012. This relationship
specifies that Rajesh Kumar is a customer of the bank with loan number
L_0012.
Relationship Set is a set of relationships of the same type. Considering the two entity
sets Customers and Loans, we define the relationship set borrower to denote the
association between customers and bank loans.
Most relationship sets are binary relationship sets, that is, ones that involve
two entity sets. For example, the relationship set borrower discussed above is a
binary relationship set, since it involves the two entity sets Customer and Loan. In a
database system there can also exist relationship sets that involve more than two
entity sets.
Constraints :
ER Model for a particular enterprise may define certain Constraints (rules and
specifications to be followed by the actual data in the database) to which the
contents of the database must conform. Following sections explain the Mapping
cardinalities and Participation constraints, which are two most important types
of constraints.
Mapping Cardinalities:
Mapping cardinalities, also called cardinality ratios, express the maximum number of
entities to which another entity can be associated via a relationship set. We will
discuss only binary relationship sets.
For a binary relationship set R between entity sets A and B, the mapping
cardinality must be one of the following:
1. One-to-one
2. One-to-many
3. Many-to-one
4. Many-to-many
To understand the above four types, let's take the example of the Customer and Loan
entities of the banking system (discussed in previous topics). One customer can
have multiple loans, forming a many-to-one relationship from Loan to Customer.
Also, one loan can be taken in multiple customers' names (say a business loan is
taken by multiple partners in the business, who are customers of the bank), thus
forming a many-to-many relationship.
Participation Constraints:
The participation of an entity set E in a relationship set R is said to be Total if
every entity in E participates in at least one relationship in R. Every loan entity in the
Loan entity set is related to a customer, i.e. participates in the borrower relationship
set (discussed in previous topics); thus, the participation of the Loan entity set in
borrower is total. On the other hand, not every customer of a bank has a loan, so
not every entity in the Customer entity set is related to a Loan through the borrower
relationship set. Hence, the participation of Customer in the borrower relationship
set is Partial.
Keys :
All the entities within a given entity set are distinct, but there must be a way to
specify how entities within the entity set are distinguished. The values of the
attributes of entities must be such that they can uniquely identify the entity. In
other words no two entities in the entity set are allowed to have exactly the same
value for all attributes.
A key allows us to identify a set of attributes that make it possible to distinguish
entities from each other.
Component          Represents
Rectangle          Entity set
Ellipse            Attribute
Diamond            Relationship set
Line               Links attributes to entity sets, and entity sets to relationship sets
Double ellipse     Multivalued attribute
Dashed ellipse     Derived attribute
Double line        Total participation of the entity set in the relationship set
Double rectangle   Weak entity set
ER Diagram example:
Let's consider the banking-system example discussed above. We will consider two
entity sets, Customers and Loans, related through the binary relationship set
borrower. The attributes of Customers are customer-id, customer-name,
customer-street and customer-city; the attributes of Loans are loan-number and
amount. The attributes that are members of the primary key are underlined.
A simple ER diagram for the example, drawn with the basic components listed
above, is shown below.
[Figure: ER diagram with entity sets Customers and Loans (attributes cust-id, cust-city shown) connected by the relationship set borrower]
2. The undirected line from the relationship set borrower to the entity set Loans
specifies that borrower is either a many-to-many or a one-to-one
relationship set from Customers to Loans.
The ER diagram (A) below shows the relationship set borrower from Customers
to Loans as one-to-many: the line from borrower to Customers is directed.
The ER diagram (B) below shows the relationship borrower as one-to-one:
both the lines from borrower are directed.
[Fig. A: ER diagram of borrower between Customers and Loans with the line to Customers directed, showing a one-to-many relationship]
[Fig. B: ER diagram of borrower between Customers and Loans with both lines directed, showing a one-to-one relationship]
[Figure: ER diagram of the relationship set depositor between Customers and Accounts, with descriptive attribute Access-date]
[Figure: ER diagram of the Customers entity set with attributes cust-id, cust-name, address, Phone-no, Date-of-birth and derived attribute age]
[Figure: ER diagram of the relationship set borrower between Customers and Loans]
[Figure: ER diagram of the ternary relationship set works-on among Employees, Jobs and Branches]
The basic ER model that we discussed before can be used to model most of the
database features, but some of the features can be better expressed using some
extensions added to basic ER-model.
1. Specialization:
In some cases an entity set might consist of sub-groups, and some of the
attributes of these sub-groups may not be shared by all the entities in the entity
set. Such subgroups can be separated out into multiple entity sets.
Consider an entity set Persons for a banking system. It will have attributes like
name, city, age. But there can be two types of persons:
Customers and Employees.
Each of these person types has the above attributes (name, city, age), but each can
be described by some additional attributes that are not common to both
(Employees and Customers). For example, Customers will have additional
attributes like customer-id and customer-type, whereas Employees can have
attributes like salary and job.
The process of designating sub-groupings within an entity set is called
Specialization. Thus, the specialization of Persons allows us to distinguish
persons according to whether they are Employees or Customers.
Consider another example, of bank accounts. Accounts is an entity set with
attributes account-no, account-name and balance, but a bank may have two types
of accounts: Savings Accounts and Current Accounts. Savings accounts are given
monthly interest according to the balance. Current accounts are not given
monthly interest, but a current account may be given facilities such as overdraft,
reports of monthly transactions, and transfer of money from one account to another.
[Figure: specialization hierarchy drawn with an ISA triangle: Persons (name, city, age) specialized into Customers (Cust-id, Cust-type) and Employees (salary, job)]
2. Generalization:
Suppose there are two entity sets in a banking database system, say
SavingsAccounts and CurrentAccounts. Except for a few attributes and very
small operational differences, both entity sets have many common features.
A database designer can then decide to group them into one entity set,
say Accounts. This feature is called Generalization, in which entity sets are
regrouped into one higher-level entity set. It is, quite obviously, the opposite of
Specialization.
Here the two entity sets SavingsAccounts and CurrentAccounts are called lower-level
entity sets, and the entity set Accounts is called a higher-level entity set.
In terms of ER diagrams, this process is again shown with an ISA label in a
triangle component, i.e. the same ER diagram represents both Specialization and
Generalization. Thus, in terms of ER diagrams we don't distinguish
between Specialization and Generalization.
3. Aggregation:
This feature of the ER model is used to specify a relationship between relationships.
Let's consider the relationship set works-on among Employees, Jobs and
Branches, as discussed in previous topics: each employee is related to some job
in some branch. Suppose it is required that a manager keep a record of which
employee worked at what branch, and let's assume an entity set Managers for
this purpose. One way to represent this with an ER diagram is to show a quaternary
relationship manages among Employees, Jobs, Branches and Managers, as
shown below (the figure doesn't include attributes for the entity sets).
[Figure: quaternary relationship set manages among Employees, Jobs, Branches and Manager, drawn alongside the ternary works-on]
The ER diagram shown above for this case holds redundant information: every
Employees-Jobs-Branches combination in manages is also in works-on.
This situation is handled by using aggregation. Aggregation is an abstraction
through which relationships are treated as higher-level entities. In this example
we treat works-on as a higher-level entity set. We can then create a
binary relationship between works-on and Managers to represent who manages
what task.
[Figure: aggregation of the works-on relationship set (Employees, Jobs, Branches) related to the Manager entity set through the binary relationship set manages]

[Figure: ER Diagram for Software company Database. Recoverable labels: entity sets Employee (EmpID, Phone, TeamID), Team (TeamID, Leader, Member), Project (Proj-name, St-date, Sub-date, Leader, TeamID) and ProjLevel (Level, Status), linked by relationship sets Has and Works-on]
[Figure: E-R Diagram for university Database. Recoverable labels: entity sets Student (St-Name, Grade), Faculty (Name, Dept) and Class (Name, Year, Capacity), linked by relationship sets Enroll and Teach]
What is SQL?
Note: Most SQL database programs also have their own proprietary
extensions in addition to the SQL standard!
RDBMS :
RDBMS is the basis for SQL, and for all modern database systems like MS SQL
Server, IBM DB2, Oracle, MySQL, and Microsoft Access. The data in an RDBMS is
stored in database objects called tables. A table is a collection of related data
entries, and it consists of columns and rows.
The query and update commands form the DML part of SQL.
The DDL part of SQL permits database tables to be created or deleted; it also defines
indexes (keys), specifies links between tables, and imposes constraints between tables.
The most important DDL statements in SQL are:
SELECT :
The statement retrieves data from the database, and the result is returned in the form
of a query result. The general syntax of the statement is as follows,
SELECT [ALL/DISTINCT] [aggregate functions] Column_list
FROM table/s
WHERE search-condition
GROUP BY column-name/s
HAVING search-condition
ORDER BY column-name/s
The result of a query is a table of the retrieved columns and data. No records are
returned if the database table is empty or if the WHERE-clause search condition is
not satisfied by any of the records in the table. If any records are retrieved, the result
can have one or more columns.
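A runnable sketch of a basic SELECT with WHERE and ORDER BY, using Python's sqlite3 module on a few rows modeled after the Account table listed below (the subset of rows and the balance threshold are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account "
             "(account_no TEXT, branch_name TEXT, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?, ?)", [
    ("A-101", "Downtown",   500),
    ("A-102", "Perryridge", 400),
    ("A-201", "Perryridge", 900),
])

# SELECT column_list FROM table WHERE search-condition ORDER BY column
rows = conn.execute("""
    SELECT account_no, balance
    FROM account
    WHERE balance > 450
    ORDER BY balance DESC
""").fetchall()
```

A-102 fails the WHERE condition, and the remaining rows come back sorted by balance, largest first.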
Table Account :
ACCOUNT_NO BRANCH_NAME BALANCE
--------------- --------------- ----------
A-101 Downtown 500
A-215 Mianus 700
A-102 Perryridge 400
A-305 Round Hill 350
A-201 Perryridge 900
A-222 Redwood 700
A-217 Brighton 750
A-333 Central 850
A-444 North Town 625
Table Customer:
CUSTOMER_NAME CUSTOMER_STREET CUSTOMER_CITY
--------------- ------------ ---------------
Jones Main Harrison
Smith Main Rye
Hayes Main Harrison
Curry North Rye
Lindsay Park Pittsfield
Turner Putnam Stamford
Williams Nassau Princeton
Adams Spring Pittsfield
Johnson Alma Palo Alto
Glenn Sand Hill Woodside
Brooks Senator Brooklyn
Green Walnut Stamford
Jackson University Salt Lake
Majeris First Rye
McBride Safety Rye
Simple examples:
Result:
CUSTOMER_NAME CUSTOMER_CITY
--------------- ---------------
Jones Harrison
Smith Rye
Hayes Harrison
Curry Rye
Lindsay Pittsfield
Turner Stamford
Williams Princeton
Adams Pittsfield
Johnson Palo Alto
Glenn Woodside
Brooks Brooklyn
Green Stamford
Jackson Salt Lake
Majeris Rye
McBride Rye
Result :
ACCOUNT_NUMBER BRANCH_NAME BALANCE
--------------- --------------- ----------
A-101 Downtown 500
A-215 Mianus 700
A-102 Perryridge 400
A-305 Round Hill 350
A-201 Perryridge 900
A-222 Redwood 700
A-217 Brighton 750
A-333 Central 850
A-444 North Town 625
a. Retrieve the customer name, and the customer's street and city combined as one column ADDRESS.
Query: SELECT CUSTOMER_NAME, CONCAT(CUSTOMER_STREET, CONCAT('-' ,
CUSTOMER_CITY)) ADDRESS FROM CUSTOMER;
Result:
CUSTOMER_NAME ADDRESS
--------------- ----------------------------
Jones Main-Harrison
Smith Main-Rye
Hayes Main-Harrison
Curry North-Rye
Lindsay Park-Pittsfield
Turner Putnam-Stamford
Williams Nassau-Princeton
Adams Spring-Pittsfield
Johnson Alma-Palo Alto
Glenn Sand Hill-Woodside
Brooks Senator-Brooklyn
Green Walnut-Stamford
Jackson University-Salt Lake
Majeris First-Rye
McBride Safety-Rye
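The nested CONCAT call above is the Oracle style of string concatenation; as a sketch, the same ADDRESS column can be produced in SQLite with the standard || operator (run here through Python's sqlite3 on one illustrative row):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer "
             "(customer_name TEXT, customer_street TEXT, customer_city TEXT)")
conn.execute("INSERT INTO customer VALUES ('Jones', 'Main', 'Harrison')")

# || concatenates strings in SQLite; AS names the computed column ADDRESS.
row = conn.execute("""
    SELECT customer_name,
           customer_street || '-' || customer_city AS address
    FROM customer
""").fetchone()
```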
2. Retrieve the information of the customers who stay in Stamford city, on Walnut street.
Query: SELECT * FROM CUSTOMER WHERE CUSTOMER_CITY='Stamford' AND
CUSTOMER_STREET='Walnut';
The ORDER BY keyword is used to sort the result-set by a specified column. The
ORDER BY keyword sorts the records in ascending order by default. If you want to
sort the records in a descending order, you can use the DESC keyword.
The first form doesn't specify the column names where the data will be inserted,
only their values:
The second form specifies both the column names and the values to be inserted:
The following SQL statement will add a new row, but only add data in the "P_Id",
"LastName" and the "FirstName" columns:
Now we want to select the persons living in a city that starts with "s" from the table
above. We use the following SELECT statement:
Next, we want to select the persons living in a city that ends with an "s" from the
"Persons" table.
Next, we want to select the persons living in a city that contains the pattern "tav"
from the "Persons" table.
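The three LIKE patterns can be sketched with sqlite3 (the Persons rows are illustrative; note that SQLite's LIKE is case-insensitive for ASCII letters, so the pattern 's%' also matches cities starting with a capital S):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE persons (lastname TEXT, city TEXT)")
conn.executemany("INSERT INTO persons VALUES (?, ?)", [
    ("Hansen",    "Sandnes"),
    ("Pettersen", "Stavanger"),
    ("Svendson",  "Borgvn"),
])

# 's%'   : starts with s;  '%s'  : ends with s;  '%tav%' : contains tav
starts   = conn.execute("SELECT lastname FROM persons WHERE city LIKE 's%'").fetchall()
ends     = conn.execute("SELECT lastname FROM persons WHERE city LIKE '%s'").fetchall()
contains = conn.execute("SELECT lastname FROM persons WHERE city LIKE '%tav%'").fetchall()
```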
The IN Operator
SQL IN Syntax
SELECT column_name(s)
FROM table_name
WHERE column_name IN (value1,value2,...)
Now we want to select the persons with a last name equal to "Hansen" or
"Pettersen" from the table above.
Now we want to select the persons with a last name alphabetically between
"Hansen" and "Pettersen" from the table above.
In some databases, persons with the LastName "Hansen" or "Pettersen" will not
be listed, because in those databases the BETWEEN operator selects only values
strictly between (and excluding) the test values.
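A sketch of IN and BETWEEN with sqlite3 (illustrative rows); in SQLite, BETWEEN includes both endpoint values, i.e. the inclusive behavior rather than the excluding variant mentioned above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE persons (lastname TEXT)")
conn.executemany("INSERT INTO persons VALUES (?)",
                 [("Hansen",), ("Nilsen",), ("Pettersen",), ("Svendson",)])

# IN: match any value from an explicit list.
in_rows = conn.execute(
    "SELECT lastname FROM persons WHERE lastname IN ('Hansen', 'Pettersen')"
).fetchall()

# BETWEEN: alphabetical range, endpoints included in SQLite.
between_rows = conn.execute(
    "SELECT lastname FROM persons "
    "WHERE lastname BETWEEN 'Hansen' AND 'Pettersen'"
).fetchall()
```

Svendson falls outside the alphabetical range either way, while Hansen and Pettersen appear in the BETWEEN result because SQLite treats the endpoints as included.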
SQL aggregate functions return a single value, calculated from values in a column.
The COUNT() function returns the number of rows that matches a specified criteria.
The COUNT(column_name) function returns the number of values (NULL values will
not be counted) of the specified column:
SQL scalar functions return a single value, based on the input value.
Now we want to find the total sum (total order) of each customer.
Customer SUM(OrderPrice)
Hansen 2000
Nilsen 1700
Jensen 2000
We can also use the GROUP BY statement on more than one column, like this:
SELECT Customer,OrderDate,SUM(OrderPrice) FROM Orders
GROUP BY Customer,OrderDate
The HAVING clause was added to SQL because the WHERE keyword could not be
used with aggregate functions.
Now we want to find if any of the customers have a total order of less than 2000.
Customer SUM(OrderPrice)
Nilsen 1700
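GROUP BY with SUM() and a HAVING filter can be sketched like this; the Orders rows are assumed so the totals reproduce the tables above (Hansen 2000, Nilsen 1700, Jensen 2000).

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE Orders (Customer TEXT, OrderPrice INTEGER)")
cur.executemany("INSERT INTO Orders VALUES (?, ?)",
                [("Hansen", 1000), ("Hansen", 1000),
                 ("Nilsen", 1600), ("Nilsen", 100),
                 ("Jensen", 2000)])

# One total per customer.
totals = cur.execute(
    "SELECT Customer, SUM(OrderPrice) FROM Orders GROUP BY Customer ORDER BY Customer"
).fetchall()

# HAVING filters on the aggregate, which WHERE cannot do.
small = cur.execute(
    "SELECT Customer, SUM(OrderPrice) FROM Orders "
    "GROUP BY Customer HAVING SUM(OrderPrice) < 2000"
).fetchall()
```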
SQL Joins
SQL joins are used to query data from two or more tables, based on a relationship
between certain columns in these tables.
A primary key is a column (or a combination of columns) with a unique value for
each row. Each primary key value must be unique within the table. The purpose is
to bind data together, across tables, without repeating all of the data in every table.
Note that the "P_Id" column is the primary key in the "Persons" table. This means
that no two rows can have the same P_Id. The P_Id distinguishes two persons even
if they have the same name.
Note that the "O_Id" column is the primary key in the "Orders" table and that the
"P_Id" column refers to the persons in the "Persons" table without using their
names.
Notice that the relationship between the two tables above is the "P_Id" column.
Before we continue with examples, we will list the types of JOIN you can use, and
the differences between them.
JOIN: Return rows when there is at least one match in both tables
LEFT JOIN: Return all rows from the left table, even if there are no matches
in the right table
RIGHT JOIN: Return all rows from the right table, even if there are no
matches in the left table
FULL JOIN: Return rows when there is a match in one of the tables
The INNER JOIN keyword returns rows when there is at least one match in both
tables. If there are rows in "Persons" that do not have matches in "Orders", those
rows will NOT be listed.
The LEFT JOIN keyword returns all rows from the left table (Persons), even if
there are no matches in the right table (Orders).
The RIGHT JOIN keyword returns all rows from the right table (Orders), even if
there are no matches in the left table (Persons).
The FULL JOIN keyword returns all the rows from the left table (Persons) and all
the rows from the right table (Orders). If there are rows in "Persons" that do not
have matches in "Orders", or rows in "Orders" that do not have matches in
"Persons", those rows will be listed as well.
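The difference between INNER JOIN and LEFT JOIN can be seen on an assumed Persons/Orders pair (RIGHT JOIN and FULL JOIN need SQLite 3.39+, so only the first two are shown here).

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE Persons (P_Id INTEGER, LastName TEXT)")
cur.execute("CREATE TABLE Orders (O_Id INTEGER, OrderNo INTEGER, P_Id INTEGER)")
cur.executemany("INSERT INTO Persons VALUES (?, ?)",
                [(1, "Hansen"), (2, "Svendson"), (3, "Pettersen")])
cur.executemany("INSERT INTO Orders VALUES (?, ?, ?)",
                [(1, 77895, 3), (2, 44678, 3), (3, 22456, 1)])

# INNER JOIN: only persons with at least one matching order.
inner = cur.execute(
    "SELECT Persons.LastName, Orders.OrderNo FROM Persons "
    "INNER JOIN Orders ON Persons.P_Id = Orders.P_Id ORDER BY OrderNo"
).fetchall()

# LEFT JOIN: Svendson is kept even with no orders; his OrderNo is NULL.
left = cur.execute(
    "SELECT Persons.LastName, Orders.OrderNo FROM Persons "
    "LEFT JOIN Orders ON Persons.P_Id = Orders.P_Id ORDER BY Persons.P_Id"
).fetchall()
```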
SELECT INTO :
The SELECT INTO statement selects data from one table and inserts it into a
different table. It is most often used to create backup copies of tables.
SELECT *
INTO new_table_name [IN externaldatabase]
FROM old_tablename
Or we can select only the columns we want into the new table:
SELECT column_name(s)
INTO new_table_name [IN externaldatabase]
FROM old_tablename
The following SQL statement creates a "Persons_Backup" table with only the
persons who live in the city "Sandnes":
SELECT *
INTO Persons_Backup
FROM Persons
WHERE City='Sandnes'
The next statement backs up data selected from more than one table:
SELECT Persons.LastName,Orders.OrderNo
INTO Persons_Order_Backup
FROM Persons
INNER JOIN Orders
ON Persons.P_Id=Orders.P_Id
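SQLite does not support SELECT INTO; its equivalent is CREATE TABLE ... AS SELECT, sketched here to make a filtered backup copy of an assumed Persons table.

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE Persons (P_Id INTEGER, LastName TEXT, City TEXT)")
cur.executemany("INSERT INTO Persons VALUES (?, ?, ?)",
                [(1, "Hansen", "Sandnes"), (2, "Svendson", "Oslo")])

# Same idea as: SELECT * INTO Persons_Backup FROM Persons WHERE City='Sandnes'
cur.execute("CREATE TABLE Persons_Backup AS SELECT * FROM Persons WHERE City = 'Sandnes'")
backup = cur.execute("SELECT LastName FROM Persons_Backup").fetchall()
```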
The UNION operator is used to combine the result-set of two or more SELECT
statements.
Notice that each SELECT statement within the UNION must have the same number
of columns. The columns must also have similar data types. Also, the columns in
each SELECT statement must be in the same order.
Note: The UNION operator selects only distinct values by default. To allow duplicate
values, use UNION ALL.
PS: The column names in the result-set of a UNION are always equal to the column
names in the first SELECT statement in the UNION.
"Employees_Norway":
E_ID E_Name
01 Hansen, Ola
02 Svendson, Tove
03 Svendson, Stephen
04 Pettersen, Kari
"Employees_USA":
E_ID E_Name
01 Turner, Sally
02 Kent, Clark
03 Svendson, Stephen
04 Scott, Stephen
Now we want to list all the different employees in Norway and USA.
E_Name
Hansen, Ola
Svendson, Tove
Svendson, Stephen
Pettersen, Kari
Turner, Sally
Kent, Clark
Scott, Stephen
Note: This command cannot be used to list all employees in Norway and USA. In the
example above we have two employees with equal names, and only one of them will
be listed. The UNION command selects only distinct values.
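UNION versus UNION ALL can be demonstrated directly on the two employee tables from the text:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE Employees_Norway (E_ID TEXT, E_Name TEXT)")
cur.execute("CREATE TABLE Employees_USA (E_ID TEXT, E_Name TEXT)")
cur.executemany("INSERT INTO Employees_Norway VALUES (?, ?)",
                [("01", "Hansen, Ola"), ("02", "Svendson, Tove"),
                 ("03", "Svendson, Stephen"), ("04", "Pettersen, Kari")])
cur.executemany("INSERT INTO Employees_USA VALUES (?, ?)",
                [("01", "Turner, Sally"), ("02", "Kent, Clark"),
                 ("03", "Svendson, Stephen"), ("04", "Scott, Stephen")])

# UNION keeps only distinct values: "Svendson, Stephen" appears once.
distinct_names = cur.execute(
    "SELECT E_Name FROM Employees_Norway UNION SELECT E_Name FROM Employees_USA"
).fetchall()

# UNION ALL keeps duplicates: "Svendson, Stephen" appears twice.
all_names = cur.execute(
    "SELECT E_Name FROM Employees_Norway UNION ALL SELECT E_Name FROM Employees_USA"
).fetchall()
```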
Note: Notice the WHERE clause in the UPDATE syntax. The WHERE clause
specifies which record or records that should be updated. If you omit the WHERE
clause, all records will be updated!
DDL Commands :
The CREATE TABLE Statement
The data type specifies what type of data the column can hold (for example int,
varchar, date); the set of available data types differs between MS Access, MySQL,
and SQL Server.
To delete a column in a table, use the following syntax (notice that some database
systems don't allow deleting a column):
To change the data type of a column in a table, use the following syntax:
SQL Constraints
Constraints are used to limit the type of data that can go into a table.
Constraints can be specified when a table is created (with the CREATE TABLE
statement) or after the table is created (with the ALTER TABLE statement).
NOT NULL
UNIQUE
PRIMARY KEY
FOREIGN KEY
CHECK
DEFAULT
The NOT NULL constraint enforces a column to NOT accept NULL values.
The NOT NULL constraint enforces a field to always contain a value. This means
that you cannot insert a new record, or update a record without adding a value to
this field.
The PRIMARY KEY constraint uniquely identifies each record in a database table.
Primary keys must contain unique values.
A primary key column cannot contain NULL values.
Each table should have a primary key, and each table can have only one primary
key.
To allow naming of a PRIMARY KEY constraint, and for defining a PRIMARY KEY
constraint on multiple columns, use the following SQL syntax:
To allow naming of a FOREIGN KEY constraint, and for defining a FOREIGN KEY
constraint on multiple columns, use the following SQL syntax:
MySQL / SQL Server / Oracle / MS Access:
CREATE TABLE Orders
(
O_Id int NOT NULL,
OrderNo int NOT NULL,
P_Id int,
PRIMARY KEY (O_Id),
CONSTRAINT fk_PerOrders FOREIGN KEY (P_Id)
REFERENCES Persons(P_Id)
)
The CHECK constraint is used to limit the value range that can be placed in a
column.
If you define a CHECK constraint on a single column it allows only certain values
for this column.
If you define a CHECK constraint on a table it can limit the values in certain
columns based on values in other columns in the row.
e.g
1) P_Id int NOT NULL CHECK (P_Id>0)
2) CONSTRAINT chk_Person CHECK (P_Id>0 AND City='Sandnes')
To create a CHECK constraint on the "P_Id" column when the table is already
created, use the following SQL:
ALTER TABLE Persons
ADD CHECK (P_Id>0)
The default value will be added to all new records, if no other value is specified.
e.g
1) City varchar(255) DEFAULT 'Mumbai'
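The constraints above can be combined in one CREATE TABLE and exercised in SQLite (where foreign-key enforcement must be switched on with a PRAGMA); the data values are assumptions.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")  # SQLite ships with FK checks off
cur = con.cursor()
cur.execute("CREATE TABLE Persons ("
            "P_Id INTEGER NOT NULL PRIMARY KEY CHECK (P_Id > 0), "
            "City TEXT DEFAULT 'Mumbai')")
cur.execute("""CREATE TABLE Orders (
    O_Id INTEGER NOT NULL,
    OrderNo INTEGER NOT NULL,
    P_Id INTEGER,
    PRIMARY KEY (O_Id),
    CONSTRAINT fk_PerOrders FOREIGN KEY (P_Id) REFERENCES Persons(P_Id))""")

# DEFAULT: City falls back to 'Mumbai' when no value is specified.
cur.execute("INSERT INTO Persons (P_Id) VALUES (1)")
city = cur.execute("SELECT City FROM Persons").fetchone()[0]

# CHECK: a non-positive P_Id is rejected with an integrity error.
try:
    cur.execute("INSERT INTO Persons (P_Id) VALUES (-5)")
    check_failed = False
except sqlite3.IntegrityError:
    check_failed = True
```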
Indexes :
Indexes allow the database application to find data fast, without reading the whole
table. An index can be created in a table to find data more quickly and efficiently.
Users cannot see the indexes; they are just used to speed up searches/queries.
Note: Updating a table with indexes takes more time than updating a table without
(because the indexes also need an update). So you should only create indexes on
columns (and tables) that will be frequently searched against.
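A minimal sketch of CREATE INDEX (the index name and column are assumptions); SQLite's EXPLAIN QUERY PLAN shows the optimizer choosing the index for an equality search even though queries never name it.

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE Persons (P_Id INTEGER, LastName TEXT)")
cur.execute("CREATE INDEX idx_lastname ON Persons (LastName)")

# The index is invisible to queries, but the planner can use it.
plan = cur.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM Persons WHERE LastName = 'Hansen'"
).fetchall()
uses_index = any("idx_lastname" in str(row) for row in plan)
```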
SQL Views
A view is a virtual table. A view contains rows and columns, just like a real table.
The fields in a view are fields from one or more real tables in the database.
You can add SQL functions, WHERE, and JOIN statements to a view and present
the data as if the data were coming from one single table.
Note: A view always shows up-to-date data! The database engine recreates the data,
using the view's SQL statement, every time a user queries a view.
The records (i.e. the set of tuples) in a view are the result of evaluating, at
that time, the query expression that defines the view. Thus if the view relation
were computed and stored, it could become outdated when the tables (relations)
used to define the view are modified. To avoid this, views are not stored as the
result of the query; instead, the definition of the view itself is stored with the
database. Wherever the view name appears in a query (or relational expression) it
is replaced with the query expression, so whenever we evaluate a query the view
relation is recomputed.
Some databases allow the view relation (table) to be stored, but they make sure
that if the actual relations (tables) in the view definition change, the view is
kept up to date. Such views are called materialized views.
Views can be defined in terms of existing views; replacing each view name by its
defining query is called view expansion.
A view in the Northwind sample database selects every product in the "Products"
table with a unit price higher than the average unit price:
CREATE VIEW [Products Above Average Price] AS
SELECT ProductName,UnitPrice FROM Products
WHERE UnitPrice>(SELECT AVG(UnitPrice) FROM Products)
Another view in the Northwind database calculates the total sale for each category
in 1997. Note that this view selects its data from another view called "Product Sales
for 1997":
CREATE VIEW [Category Sales For 1997] AS
SELECT DISTINCT CategoryName,Sum(ProductSales) AS CategorySales
FROM [Product Sales for 1997]
GROUP BY CategoryName
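The "always up-to-date" behaviour of views can be seen in a sketch of the above-average-price view; the Products rows are assumed sample data.

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE Products (ProductName TEXT, UnitPrice REAL)")
cur.executemany("INSERT INTO Products VALUES (?, ?)",
                [("Chai", 18.0), ("Chang", 19.0), ("Aniseed Syrup", 10.0)])

cur.execute("""CREATE VIEW ProductsAboveAverage AS
    SELECT ProductName, UnitPrice FROM Products
    WHERE UnitPrice > (SELECT AVG(UnitPrice) FROM Products)""")

above = cur.execute(
    "SELECT ProductName FROM ProductsAboveAverage ORDER BY ProductName").fetchall()

# The view is re-evaluated on every query: inserting an expensive product raises
# the average, so Chai and Chang drop out of the view without it being redefined.
cur.execute("INSERT INTO Products VALUES ('Ipoh Coffee', 46.0)")
after = cur.execute(
    "SELECT ProductName FROM ProductsAboveAverage ORDER BY ProductName").fetchall()
```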
Comparison operators such as = cannot be used to test for NULL values; we have to
use the IS NULL and IS NOT NULL operators instead.
SQL IS NULL
How do we select only the records with NULL values in the "Address" column?
We will have to use the IS NULL operator:
SELECT LastName,FirstName,Address FROM Persons
WHERE Address IS NULL
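The difference between = NULL (never matches) and IS NULL can be checked directly; the rows are assumed sample data.

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE Persons (LastName TEXT, Address TEXT)")
cur.executemany("INSERT INTO Persons VALUES (?, ?)",
                [("Hansen", "Timoteivn 10"), ("Svendson", None)])

# '= NULL' compares with NULL, which is never true, so no rows come back.
wrong = cur.execute("SELECT LastName FROM Persons WHERE Address = NULL").fetchall()

# IS NULL is the correct test.
right = cur.execute("SELECT LastName FROM Persons WHERE Address IS NULL").fetchall()
```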
Example Tables
In the subsequent text, the following 3 example tables are used:
p Table (parts)
pno descr color
P1 Widget Blue
P2 Widget Red
P3 Dongle Green

s Table (suppliers)
sno name city
S1 Pierre Paris
S2 John London
S3 Mario Rome

sp Table (suppliers & parts)
sno pno qty
S1 P1 NULL
S2 P1 200
S3 P1 1000
S3 P2 200
Joining Tables
The FROM clause allows more than 1 table in its list, however simply listing more
than one table will very rarely produce the expected results. The rows from one
table must be correlated with the rows of the others. This correlation is known as
joining.
An example can best illustrate the rationale behind joins. The following query:
SELECT * FROM sp, p
Produces:
sno pno qty pno descr color
S1 P1 NULL P1 Widget Blue
S1 P1 NULL P2 Widget Red
S1 P1 NULL P3 Dongle Green
S2 P1 200 P1 Widget Blue
S2 P1 200 P2 Widget Red
S2 P1 200 P3 Dongle Green
S3 P1 1000 P1 Widget Blue
S3 P1 1000 P2 Widget Red
S3 P1 1000 P3 Dongle Green
S3 P2 200 P1 Widget Blue
S3 P2 200 P2 Widget Red
S3 P2 200 P3 Dongle Green
Each row in sp is arbitrarily combined with each row in p, giving 12 result rows
(4 rows in sp × 3 rows in p). This is known as a cartesian product.
A more usable query would correlate the rows from sp with rows from p, for
instance matching on the common column, pno:
SELECT * FROM sp, p WHERE sp.pno = p.pno
This produces:
sno pno qty pno descr color
S1 P1 NULL P1 Widget Blue
S2 P1 200 P1 Widget Blue
S3 P1 1000 P1 Widget Blue
S3 P2 200 P2 Widget Red
Rows for each part in p are combined with rows in sp for the same part by matching
on part number (pno). In this query, the WHERE Clause provides the join predicate,
matching pno from p with pno from sp.
The join in this example is known as an inner equi-join, "equi" meaning that the
join predicate uses = (equals) to match the join columns. Other types of joins use
different comparison operators; for example, a query might use a greater-than join.
The term inner means only rows that match are included. Rows in the first table
that have no matching rows in the second table are excluded and vice versa (in the
above join, the row in p with pno P3 is not included in the result.) An outer join
includes unmatched rows in the result. See Outer Join below.
More than 2 tables can participate in a join. This is basically just an extension
of a 2-table join. Three tables a, b and c might be joined in various ways: a
joined to b and b joined to c, a joined to b and a joined to c, plus several other
variations. With inner joins, this structure is not explicit; it is implicit in
the nature of the join predicates. With outer joins, it is explicit; see below.
This query performs a 3 table join:
SELECT name, qty, descr, color FROM s, sp, p WHERE s.sno = sp.sno
AND sp.pno = p.pno
It joins s to sp and sp to p, producing:
name qty descr color
Pierre NULL Widget Blue
John 200 Widget Blue
Mario 1000 Widget Blue
Mario 200 Widget Red
Note that the order of tables listed in the FROM clause has no significance, nor
does the order of join predicates in the WHERE clause.
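The 3-table join above can be run as-is on the example p, s and sp tables:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE p (pno TEXT, descr TEXT, color TEXT)")
cur.execute("CREATE TABLE s (sno TEXT, name TEXT, city TEXT)")
cur.execute("CREATE TABLE sp (sno TEXT, pno TEXT, qty INTEGER)")
cur.executemany("INSERT INTO p VALUES (?, ?, ?)",
                [("P1", "Widget", "Blue"), ("P2", "Widget", "Red"), ("P3", "Dongle", "Green")])
cur.executemany("INSERT INTO s VALUES (?, ?, ?)",
                [("S1", "Pierre", "Paris"), ("S2", "John", "London"), ("S3", "Mario", "Rome")])
cur.executemany("INSERT INTO sp VALUES (?, ?, ?)",
                [("S1", "P1", None), ("S2", "P1", 200), ("S3", "P1", 1000), ("S3", "P2", 200)])

# s joined to sp, and sp joined to p, via the two join predicates.
rows = cur.execute(
    "SELECT name, qty, descr, color FROM s, sp, p "
    "WHERE s.sno = sp.sno AND sp.pno = p.pno ORDER BY s.sno, sp.pno"
).fetchall()
```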
Outer Joins
An inner join excludes rows from either table that don't have a matching row in the
other table. An outer join provides the ability to include unmatched rows in the
query results. The outer join combines the unmatched row in one of the tables with
an artificial row for the other table. This artificial row has all columns set to null.
The outer join is specified in the FROM clause and has the following general format:
LEFT -- only unmatched rows from the left side table (table-1) are retained
RIGHT -- only unmatched rows from the right side table (table-2) are retained
FULL -- unmatched rows from both tables (table-1 and table-2) are retained
Self Joins
A query can join a table to itself. Self joins have a number of real world uses. For
example, a self join can determine which parts have more than one supplier:
SELECT DISTINCT a.pno FROM sp a, sp b
WHERE a.pno = b.pno AND a.sno <> b.sno
pno
P1
As illustrated in the above example, self joins use correlation names to distinguish
columns in the select list and where predicate. In this case, the references to the
same table are renamed - a and b.
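The self join with correlation names a and b runs unchanged on the sp data:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE sp (sno TEXT, pno TEXT, qty INTEGER)")
cur.executemany("INSERT INTO sp VALUES (?, ?, ?)",
                [("S1", "P1", None), ("S2", "P1", 200), ("S3", "P1", 1000), ("S3", "P2", 200)])

# Parts with more than one supplier: the table is joined to itself under two names.
multi = cur.execute(
    "SELECT DISTINCT a.pno FROM sp a, sp b WHERE a.pno = b.pno AND a.sno <> b.sno"
).fetchall()
```

Only P1 qualifies, since P2 has a single supplier (S3).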
Subqueries
Subqueries are an identifying feature of SQL. It is called Structured Query Language
because a query can nest inside another query.
Predicate subqueries are used in the WHERE (and HAVING) clause. Each is a
special logical construct. Except for EXISTS, predicate subqueries must retrieve one
column (in their select list.)
IN Subquery
The IN Subquery tests whether a scalar value matches the single query column
value in any subquery result row. It has the following general format:
value-1 [NOT] IN (query-1)
Using NOT is equivalent to:
NOT value-1 IN (query-1)
For example, to list parts that have suppliers:
SELECT * FROM p WHERE pno IN (SELECT pno FROM sp)
pno descr color
P1 Widget Blue
P2 Widget Red
The Self Join example in the previous subsection can be expressed with an
IN Subquery:
SELECT DISTINCT pno FROM sp a
WHERE pno IN (SELECT pno FROM sp b WHERE a.sno <> b.sno)
pno
P1
Note that the subquery where clause references a column in the outer query
(a.sno). This is known as an outer reference. Subqueries with outer references
are sometimes known as correlated subqueries.
Quantified Subqueries
A quantified subquery allows several types of tests and can use the full set of
comparison operators. It has the following general format:
value-1 {=|>|<|>=|<=|<>} {ANY|ALL|SOME} (query-1)
The comparison operator specifies how to compare value-1 to the single query
column value from each subquery result row. The ANY, ALL, SOME specifiers give
the type of match expected. ANY and SOME must match at least one row in the
subquery. ALL must match all rows in the subquery, or the subquery must be
empty (produce no rows).
For example, to list all parts that have suppliers:
SELECT * FROM p WHERE pno =ANY (SELECT pno FROM sp)
pno descr color
P1 Widget Blue
P2 Widget Red
A correlated quantified subquery lists the supplier with the highest quantity of
each part (ignoring null quantities):
SELECT * FROM sp a WHERE qty >ALL (SELECT qty FROM sp b
WHERE a.pno = b.pno AND a.sno <> b.sno AND qty IS NOT NULL)
EXISTS Subqueries
The EXISTS Subquery tests whether a subquery retrieves at least one row,
that is, whether a qualifying row exists. It has the following general format
EXISTS(query-1)
Any valid EXISTS subquery must contain an outer reference. It must be a correlated
subquery.
Note: the select list in the EXISTS subquery is not actually used in evaluating the
EXISTS, so it can contain any valid select list (though * is normally used).
To list parts that have suppliers:
SELECT *
FROM p
WHERE EXISTS(SELECT * FROM sp WHERE p.pno = sp.pno)
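The correlated EXISTS query can be verified on the example tables:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE p (pno TEXT, descr TEXT, color TEXT)")
cur.execute("CREATE TABLE sp (sno TEXT, pno TEXT, qty INTEGER)")
cur.executemany("INSERT INTO p VALUES (?, ?, ?)",
                [("P1", "Widget", "Blue"), ("P2", "Widget", "Red"), ("P3", "Dongle", "Green")])
cur.executemany("INSERT INTO sp VALUES (?, ?, ?)",
                [("S1", "P1", None), ("S2", "P1", 200), ("S3", "P1", 1000), ("S3", "P2", 200)])

# The subquery's outer reference p.pno makes this a correlated subquery;
# a part qualifies as soon as one sp row exists for it.
have_suppliers = cur.execute(
    "SELECT pno FROM p WHERE EXISTS (SELECT * FROM sp WHERE p.pno = sp.pno) ORDER BY pno"
).fetchall()
```

P3 (Dongle) is absent because no sp row references it.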
Scalar Subqueries
The Scalar Subquery can be used anywhere a value can be used. The subquery
must reference just one column in the select list. It must also retrieve no more than
one row.
When the subquery returns a single row, the value of the single select list column
becomes the value of the Scalar Subquery. When the subquery returns no rows, a
database null is used as the result of the subquery. Should the subquery retrieve
more than one row, it is a run-time error and aborts query execution.
A Scalar Subquery can appear as a scalar value in the select list and where
predicate of another query. The following query on the sp table uses a Scalar
Subquery in the select list to retrieve the supplier city associated with the
supplier number (sno column in sp):
SELECT sno, pno, (SELECT city FROM s WHERE s.sno = sp.sno) AS city
FROM sp
The next query on the sp table uses a Scalar Subquery in the where clause to
match parts on the color associated with the part number (pno column in sp):
SELECT *
FROM sp
WHERE 'Blue' = (SELECT color FROM p WHERE p.pno = sp.pno)
sno pno qty
S1 P1 NULL
S2 P1 200
S3 P1 1000
Note that both example queries use outer references. This is normal in Scalar
Subqueries. Often, Scalar Subqueries are Aggregate Queries.
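The where-clause scalar subquery matching on 'Blue' can be run on the example tables:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE p (pno TEXT, descr TEXT, color TEXT)")
cur.execute("CREATE TABLE sp (sno TEXT, pno TEXT, qty INTEGER)")
cur.executemany("INSERT INTO p VALUES (?, ?, ?)",
                [("P1", "Widget", "Blue"), ("P2", "Widget", "Red"), ("P3", "Dongle", "Green")])
cur.executemany("INSERT INTO sp VALUES (?, ?, ?)",
                [("S1", "P1", None), ("S2", "P1", 200), ("S3", "P1", 1000), ("S3", "P2", 200)])

# For each sp row, the subquery returns exactly one color (the part's color);
# the row qualifies when that scalar value equals 'Blue'.
blue = cur.execute(
    "SELECT sno, pno, qty FROM sp "
    "WHERE 'Blue' = (SELECT color FROM p WHERE p.pno = sp.pno) ORDER BY sno"
).fetchall()
```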
COMMIT Statement -- commit (make persistent) all changes for the current
transaction
ROLLBACK Statement -- roll back (rescind i.e. cancel) all changes for the
current transaction
Transaction Overview
A database transaction is a larger unit that frames multiple SQL statements. A
transaction ensures that the action of the framed statements is atomic with respect
to recovery.
COMMIT Statement
The COMMIT Statement terminates the current transaction and makes all changes
under the transaction persistent. It commits the changes to the database. The
COMMIT statement has the following general format:
COMMIT [WORK]
WORK is an optional keyword that does not change the semantics of COMMIT.
ROLLBACK Statement
The ROLLBACK Statement terminates the current transaction and rescinds all
changes made under the transaction. It rolls back the changes to the database. The
ROLLBACK statement has the following general format:
ROLLBACK [WORK]
WORK is an optional keyword that does not change the semantics of ROLLBACK.
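COMMIT and ROLLBACK semantics can be sketched with SQLite's explicit transactions; the account names and amounts are assumptions echoing the transfer example used elsewhere in these notes.

```python
import sqlite3

# isolation_level=None gives manual transaction control (explicit BEGIN/COMMIT).
con = sqlite3.connect(":memory:", isolation_level=None)
cur = con.cursor()
cur.execute("CREATE TABLE Accounts (name TEXT, balance INTEGER)")
cur.execute("INSERT INTO Accounts VALUES ('A', 5000), ('B', 4000)")

# A committed transfer persists.
cur.execute("BEGIN")
cur.execute("UPDATE Accounts SET balance = balance - 1000 WHERE name = 'A'")
cur.execute("UPDATE Accounts SET balance = balance + 1000 WHERE name = 'B'")
cur.execute("COMMIT")

# A rolled-back transfer is rescinded entirely.
cur.execute("BEGIN")
cur.execute("UPDATE Accounts SET balance = balance - 500 WHERE name = 'A'")
cur.execute("ROLLBACK")

balances = dict(cur.execute("SELECT name, balance FROM Accounts").fetchall())
```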
GRANT Statement
The GRANT Statement grants access privileges for database objects to other users.
It has the following general format:
GRANT privilege-list ON [TABLE] object-list TO user-list
privilege-list is either ALL PRIVILEGES or a comma-separated list of properties:
SELECT, INSERT, UPDATE, DELETE. object-list is a comma-separated list of table
and view names. user-list is either PUBLIC or a comma-separated list of user
names.
The optional specifier WITH GRANT OPTION may follow user-list in the GRANT
statement. WITH GRANT OPTION specifies that, in addition to access privileges, the
privilege to grant those privileges to other users is granted.
=====0000=====
Thus to ensure data integrity, every database system must maintain the following
four properties of transactions:
1) Atomicity, 2) Consistency, 3) Isolation and 4) Durability.
These properties are often called the ACID properties (an acronym derived from the
first letter of each of the four properties).
Let's consider a transaction in which money is transferred (say Rs.1000) from
account A to account B, having Rs.5000 and Rs.4000 balance respectively,
before the transfer is done. ( To complete the transaction, database performs two
update operations as mentioned above. )
Consistency: Before any transaction starts, the data items in the database are
assumed to be in a consistent, i.e. stable and meaningful, state.
After the transaction is complete the database must reach a new consistent
state. If the consistency requirement is not maintained then money may be
deducted from one account but not added to the other, or the other way around.
Ensuring consistency for individual transaction is the responsibility of the
application programmer who codes the transaction.
Durability: Once the transaction is complete and the database has indicated so,
any system failure afterwards should not result in any loss of data.
The durability property ensures that, once a transaction completes successfully,
all the updates it carried out on the database persist, even if there is a system
failure. Ensuring durability is responsibility of Recovery-management
component of database system.
Transaction State:
A transaction consists of multiple database operations. These operations can be
database updates (like a combination of insert, delete or change) or simple data
retrievals. If all the operations in the transaction are completed successfully
then we say the transaction completed successfully. But a transaction can fail;
such a transaction is termed Aborted. If a transaction cannot finish successfully
then any changes made by the transaction must be undone. Once the changes caused
by an aborted transaction are undone, we say that the transaction has been Rolled
back. A transaction that completes its execution successfully is said to be
Committed.
Thus, a transaction can be in one of the following states.
Active: this is the initial state. The transaction stays in this state while it is
executing.
Partially Committed: When the last operation of the transaction is executed.
Failed: When the transaction cannot execute further normally.
Aborted: After the transaction has been rolled back and the database has been
restored to its state prior to the start of transaction.
Committed: After successful completion of transaction.
Following figure shows the state diagram of a transaction.
[State diagram: Active → Partially committed → Committed; Active → Failed → Aborted]
From the state diagram it is seen that once a transaction starts working it can
complete all its operations successfully (partially committed state) or it can
fail. If a transaction fails then it must be aborted and rolled back.
All the operations done by an active transaction are done in memory (and not on
the actual database), hence the changes are made in a copy of the database in
memory. Once all the operations are completed successfully, all the changes are
updated in the database; at this point the transaction is said to be committed. A
transaction is said to be Terminated if it is either committed or aborted.
Once a transaction is aborted and rolled back system has two options as follows,
a) A transaction can be restarted, only if the transaction was aborted because
of some hardware or software problem. A restarted transaction is treated
as a new transaction.
b) It can kill a transaction if there is logical error in the instructions forming
the transaction. In this case the transaction must be rewritten.
The scheme is inefficient, especially for large databases, since executing a single
transaction requires copying the entire database. Also, the scheme doesn't allow
multiple transactions to execute concurrently.
The total amount in the two accounts is Rs.9,000 before the transactions start.
After both are executed one after the other, the final amounts in the two
accounts will be, Rs.3600 and Rs.5400 in A and B respectively. The total of the
amounts will be again Rs.9,000.
The two transactions can be written as follows; executed serially (T1 followed by
T2) this gives Schedule 1:
T1: Read(A); A = A - 1000; Write(A); Read(B); B = B + 1000; Write(B)
T2: Read(A); temp = A * 0.1; A = A - temp; Write(A); Read(B); B = B + temp; Write(B)
Schedule 1
Schedule 2 interleaves the two transactions:
T1: Read(A); A = A - 1000; Write(A)
T2: Read(A); temp = A * 0.1; A = A - temp; Write(A)
T1: Read(B); B = B + 1000; Write(B)
T2: Read(B); B = B + temp; Write(B)
Schedule 2
Here, after the two transactions are complete, the account A will hold Rs.3600
and B will hold Rs. 5400, i.e. total is Rs.9000, thus maintaining the consistency.
Transactions may be performing any kind of updates and can internally be very
complicated sequences of programming instructions. Thus, instead of considering
the details of the transactions, we consider only their read and write operations;
there can be many complicated instructions between a data read, update and write.
Thus, if a read/write is performed on data item Q, we represent the operations in
the transactions just by Read(Q) and Write(Q).
T1: Read(A); Write(A)
T2: Read(A); Write(A)
T1: Read(B); Write(B)
T2: Read(B); Write(B)
Schedule 3
Conflict Serializability:
Each transaction can consist of several operations on the same or different data
items in a database, for which multiple instructions are executed (mostly SQL
statements). Let us consider multiple transactions working concurrently, and a
part of a schedule, say S, in which there are two transactions, say Ti and Tj.
Also, consider two consecutive instructions Ii and Ij, of transactions Ti and Tj
respectively. If the instructions Ii and Ij are working on two different data
items then we can swap Ii and Ij without affecting the result of any instruction
in the schedule. However, if the instructions are working with the same data item
Q then the order of the two instructions may matter. According to the type of
operations performed (read/write) there are four cases to consider:
1. If Ii and Ij both are Read(Q) : The order of Ii and Ij does not matter, since
same value of Q is read by Ti and Tj, regardless of the order of the
instructions.
2. If Ii is Read(Q) and Ij is Write(Q) : If Ii comes before Ij then Ii reads the old
value of Q i.e. the value that is not written (or updated) by Ij. But, if Ii
works after Ij then Ii reads a value of Q that is written (or updated) by Ij.
Thus, the order of Ii and Ij matters.
3. If Ii is Write(Q) and Ij is Read(Q): This is same as previous case and order
of the instructions matters.
4. If Ii and Ij both are Write(Q): Since both are write operations, the order of
the instructions does not affect either Ti or Tj. However, the value read by the
next Read(Q) in the schedule S is affected, since it will read the value written
(or updated) by whichever of the two instructions (Ii or Ij) worked last.
Thus, it is clear that only when both instructions are Read instructions can
their order be changed without effect.
Thus, we say that Ii and Ij conflict if they are operations by different transactions
on the same data item and at least one of them is Write instruction.
Consider schedule 3 discussed previously (draw it in your answer). The Write(A)
instruction of T1 conflicts with the Read(A) of T2. However, the Write(A) of T2
does not conflict with Read(B) of T1, since they are working on two different
data items.
Swapping the non-conflicting instructions of the two transactions in schedule S,
we can produce a new schedule say S’.
By performing the following swaps on S3 (one at a time), we can produce a new
schedule S4 as shown below:
First swap Write(A) and Read(B) of T2 and T1 respectively.
Swap Read(B) and Read(A) of T1 and T2 respectively.
Swap Write(B) and Write(A) of T1 and T2 respectively.
Finally swap Write(B) and Read(A) of T1 and T2 respectively.
T1: Read(A); Write(A); Read(B); Write(B)
T2: Read(A); Write(A); Read(B); Write(B)
Schedule 4
Thus, Schedule S3 is equivalent to a serial schedule S4.
If a schedule S can be transformed into a schedule S' by a series of swaps of
non-conflicting instructions, we say that S and S' are Conflict equivalent.
We say that a schedule S is Conflict serializable if it is conflict equivalent to
a serial schedule S'. Thus, schedule S3 is conflict serializable.
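The conflict test can be sketched as a precedence-graph check (a standard construction, not spelled out in these notes): add an edge Ti → Tj whenever an operation of Ti conflicts with a later operation of Tj, and the schedule is conflict serializable exactly when the graph is acyclic. Shown here for Schedule 3.

```python
# Schedule 3, written as (transaction, operation, data item) triples.
schedule3 = [("T1", "R", "A"), ("T1", "W", "A"),
             ("T2", "R", "A"), ("T2", "W", "A"),
             ("T1", "R", "B"), ("T1", "W", "B"),
             ("T2", "R", "B"), ("T2", "W", "B")]

def precedence_edges(schedule):
    """Edge Ti -> Tj for each earlier op of Ti conflicting with a later op of Tj."""
    edges = set()
    for i, (ti, oi, qi) in enumerate(schedule):
        for tj, oj, qj in schedule[i + 1:]:
            # Conflict: different transactions, same item, at least one write.
            if ti != tj and qi == qj and "W" in (oi, oj):
                edges.add((ti, tj))
    return edges

def has_cycle(edges):
    """Depth-first reachability check: is any node reachable from itself?"""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
    def reachable(start, target, seen):
        for nxt in graph.get(start, ()):
            if nxt == target or (nxt not in seen and reachable(nxt, target, seen | {nxt})):
                return True
        return False
    return any(reachable(u, u, {u}) for u in graph)

serializable = not has_cycle(precedence_edges(schedule3))
```

For Schedule 3 every conflict orders T1 before T2, so the only edge is T1 → T2, the graph is acyclic, and the schedule is conflict serializable, matching the swap argument above.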
View Serializability :
Consider two schedules S and S', where the same set of transactions participates
in both schedules. The schedules S and S' are said to be View equivalent if three
conditions are met:
1. For each data item Q, if transaction Ti reads the initial value of Q in
schedule S, then transaction Ti must, in schedule S', also read the initial
value of Q.
2. For each data item Q, if transaction Ti executes read(Q) in schedule S,
and if that value was produced by a write(Q) operation executed by
transaction Tj, then the read(Q) operation of transaction Ti must, in
schedule S', also read the value of Q that was produced by the same write(Q)
operation of transaction Tj.
3. For each data item Q, the transaction (if any) that performs the final
write(Q) operation in schedule S must perform the final write(Q) operation in
schedule S'.
Conditions 1 and 2 ensure that each transaction reads the same values in
both schedules and, therefore, performs the same computation. Condition 3,
coupled with conditions 1 and 2, ensures that both schedules result in the same
final system state.
Consider the following two schedules S5 and S6. In S5, T2 starts after T1 has
finished; in S6, transaction T2 completes before T1.
Schedule 5:
T1: Read(A); Write(A); Read(B); Write(B)
T2: Read(A); Write(A); Read(B); Write(B)
Schedule 6:
T2: Read(A); Write(A); Read(B); Write(B)
T1: Read(A); Write(A); Read(B); Write(B)
The two are not view equivalent, since in schedule S5 the value of A read by T2
was produced by T1, whereas this does not hold in S6.
Schedule 7 is the serial schedule <T1, T2, T3>:
T1: Read(Q); Write(Q)
T2: Write(Q)
T3: Write(Q)
Schedule - 7
Schedule 8 interleaves the same transactions:
T1: Read(Q)
T2: Write(Q)
T1: Write(Q)
T3: Write(Q)
Schedule - 8
The two schedules are view equivalent, since the same Read(Q) instruction reads
the initial value of Q in both schedules and the final write is done by T3 in
both schedules.
Observe that, in above schedules, transactions T2 and T3 perform write (Q)
operations without having performed a read (Q) operation. Writes of this sort are
called Blind writes. Blind writes appear in any view-serializable schedule that is
not conflict serializable.
Recoverability of Schedules:
If a transaction Ti fails, for whatever reason, we need to undo the changes done
by the transaction. In a system that allows concurrent executions, it is also
necessary to ensure that any transaction Tj that is dependent on Ti is also
aborted if Ti fails.
Consider schedule 9, in which T2 reads the value of A written by T1:
T1: Read(A); Write(A)
T2: Read(A)
T1: Read(B)
Schedule 9
Here, suppose T2 commits immediately after Read(A). But T1, which has yet to
finish Read(B), fails; then T1 is aborted and all the updates done by T1 are to
be undone. Since T2 has read the data item A that was written by T1, we have to
abort T2 also to ensure atomicity. But, since T2 has already committed, it
cannot be rolled back.
Schedule 9, with the commit happening immediately after the read(A) instruction,
is an example of a nonrecoverable schedule, which should not be allowed. Most
database systems require that all schedules be recoverable. A recoverable
schedule is one where, for each pair of transactions Ti and Tj such that Tj
reads a data item previously written by Ti, the commit operation of Ti appears
before the commit operation of Tj.
T10: Read(A); Write(A)
T11: Read(A); Write(A)
T12: Read(A)
Schedule 10
Transaction T10 writes a value of A that is read by transaction T11. Transaction
T11 writes a value of A read by transaction T12. Suppose that, at this point T10
fails. T10 must be rolled back. Since T11 is dependent on T10, T11 must be rolled
back. Since T12 is dependent on T11, T12 must be rolled back. This phenomenon,
in which a single transaction failure leads to a series of transaction rollbacks, is
called cascading rollback.
Cascading rollback is undesirable, since it leads to the undoing of a significant
amount of work. It is desirable to restrict the schedules to those where cascading
rollbacks cannot occur. Such schedules are called cascadeless schedules.
Formally, a cascadeless schedule is one where, for each pair of transactions Ti
and Tj such that Tj reads a data item previously written by Ti, the commit
operation of Ti appears before the read operation of Tj. It is easy to verify
that every cascadeless schedule is also recoverable.
-----000-----
6. Concurrency Control
One of the important properties of a transaction is isolation. When multiple
transactions execute concurrently in the database, the isolation property may no
longer be preserved. To ensure the isolation of transactions from each other,
various mechanisms are used, called concurrency-control schemes.
[ Here we consider all the schedules to be serializable ]
Example (original two-column layout lost in extraction; operations grouped by transaction):
T2: Lock-S(A); Read(A); Lock-S(B); Read(B); Display(A + B); Unlock(A); Unlock(B)
T1: Lock-X(A); Read(A); A = A - 1000; Write(A)
A second interleaving illustrates deadlock: T2 holds Lock-S(B) after Read(B) and then requests Lock-S(A), while T1 holds Lock-X(A) and requests Lock-X(B); each waits for the other.
Granting of Locks:
When a transaction requests a lock on a data item in a particular mode, and no other transaction holds a lock on the same data item in an incompatible mode, the lock can be granted. However, a transaction waiting for a data item to become free of all incompatible locks may fail to obtain the lock for a very long time. For example, suppose transaction T2 has a shared-mode lock on data item Q, and transaction T1 requests an exclusive lock on Q; T1 has to wait. In the meantime, transaction T3 requests a shared-mode lock on Q and gets it, since it is compatible with the existing lock on Q. T2 may then release its lock, but T3 still holds one, so T1 must wait again for T3 to release it. In this way a whole sequence of transactions can request compatible-mode locks, each being granted, while T1 waits until all of them finish. Transaction T1 may never proceed and is said to be starved.
To avoid such situations, the concurrency manager grants locks in the following manner: when a transaction Ti requests a lock on data item Q in mode M, the lock is granted only if (1) no other transaction holds a lock on Q in a mode incompatible with M, and (2) no transaction that requested a lock on Q before Ti is still waiting.
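The grant rule above can be sketched as a small lock manager that keeps a FIFO queue of waiting requests per data item. This is a minimal illustrative sketch, not any real DBMS's lock manager; all class and method names are assumptions.

```python
# Sketch of starvation-free lock granting: a request is granted only if its
# mode is compatible with the held locks AND no earlier request is waiting.
from collections import deque

class LockManager:
    def __init__(self):
        self.held = {}    # item -> list of (txn, mode) currently holding a lock
        self.queue = {}   # item -> deque of waiting (txn, mode) requests

    def compatible(self, item, mode):
        # 'S' is compatible only with other 'S' locks; 'X' with nothing.
        holders = self.held.get(item, [])
        return not holders or (mode == 'S' and all(m == 'S' for _, m in holders))

    def request(self, txn, item, mode):
        q = self.queue.setdefault(item, deque())
        # Grant only if compatible and no earlier request waits, so a stream
        # of S locks cannot starve a waiting X request.
        if self.compatible(item, mode) and not q:
            self.held.setdefault(item, []).append((txn, mode))
            return True          # granted
        q.append((txn, mode))
        return False             # txn must wait

    def release(self, txn, item):
        self.held[item] = [(t, m) for t, m in self.held.get(item, []) if t != txn]
        q = self.queue.get(item, deque())
        # Wake waiters strictly in FIFO order.
        while q and self.compatible(item, q[0][1]):
            t, m = q.popleft()
            self.held.setdefault(item, []).append((t, m))
```

In the starvation scenario above, T3's shared request is queued behind T1's exclusive request even though it is compatible with T2's lock, so T1 is eventually served.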
Timestamps:
With each transaction Ti in the system, we associate a unique fixed timestamp, denoted by TS(Ti). This timestamp is assigned by the database system before the transaction Ti starts execution. If a transaction Ti has been assigned timestamp TS(Ti), and a new transaction Tj enters the system later, then TS(Ti) < TS(Tj). There are two simple methods for implementing this scheme:
1. Use the value of the system clock as the timestamp; that is, a transaction's timestamp is equal to the value of the clock when the transaction enters the system.
2. Use a logical counter that is incremented after a new timestamp has been assigned; that is, a transaction's timestamp is equal to the value of the counter when the transaction enters the system.
Schedule 2 (original two-column layout lost in extraction):
T1: Read(Q)
T2: Write(Q)
T1: Write(Q)
Since T1 starts before T2, TS(T1) < TS(T2). The first Read(Q) operation of T1 and the Write(Q) operation of T2 will be successful, but according to the rule mentioned above, the Write(Q) operation of T1 will be rejected and T1 will be rolled back. This is because T1 is writing an obsolete value of Q, which has already been written by T2.
This rolling back of T1 is not necessary according to the Thomas write rule, which states that:
If the timestamp of Ti is less than the write timestamp of Q, then Ti is attempting to write an obsolete value of Q, and this write operation is simply ignored.
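The write test, with the Thomas write rule applied, can be sketched as follows. The dictionaries R_ts and W_ts (the read and write timestamps of each data item) and the function name are illustrative assumptions.

```python
# Sketch of the timestamp-ordering write test with the Thomas write rule.
# R_ts / W_ts map each data item to its read / write timestamp.

def ts_write(ts, item, R_ts, W_ts):
    """Return 'ok', 'ignore' (Thomas write rule) or 'rollback'."""
    if ts < R_ts.get(item, 0):
        return 'rollback'      # a younger transaction already read the item
    if ts < W_ts.get(item, 0):
        return 'ignore'        # obsolete write: skip it instead of aborting
    W_ts[item] = ts
    return 'ok'
```

Replaying Schedule 2 with TS(T1) = 1 and TS(T2) = 2: T2's Write(Q) succeeds, and T1's later Write(Q) is simply ignored rather than causing a rollback.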
Validation-Based Protocols :
In cases where a majority of transactions are read-only, the rate of conflicts among transactions may be low. Many of these transactions, if executed without the supervision of a concurrency-control scheme, would not create data inconsistency. A concurrency-control scheme imposes the overhead of code execution and possible delay of transactions, so it may be better to use an alternative scheme that imposes less overhead. A difficulty in reducing the overhead is that we do not know in advance which transactions will be involved in a conflict. To gain that knowledge, we need a scheme for monitoring the system.
We assume that each transaction Ti executes in two or three different phases in its lifetime, depending on whether it is a read-only or an update transaction. The phases are, in order:
1. Read phase: Ti reads data items and performs all writes on temporary local variables, without updating the actual database.
2. Validation phase: a validation test determines whether Ti can copy its local writes to the database without violating serializability.
3. Write phase: if Ti passes validation, its local updates are applied to the database. Read-only transactions omit this phase.
Each transaction must go through the phases in the order shown. Transactions can then be interleaved by applying the validation test, and for this we need to know when the various phases took place. Three different timestamps are associated with the phases of a transaction Ti:
Start(Ti): the time when Ti started its execution.
Validation(Ti): the time when Ti finished its read phase and started its validation phase.
Finish(Ti): the time when Ti finished its write phase.
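Using these three timestamps, the standard validation test checks a transaction Tj against every earlier transaction Ti. The sketch below is illustrative; the dict-based representation of a transaction and its read/write sets is an assumption.

```python
# Sketch of the validation test: Tj passes if, for each earlier Ti,
# either Ti finished before Tj started, or Ti finished before Tj began
# validating and wrote nothing that Tj read.

def validate(tj, earlier):
    """tj and each Ti are dicts with 'start', 'validation', 'finish',
    'read_set' and 'write_set'. Return True if Tj passes validation."""
    for ti in earlier:
        if ti['finish'] < tj['start']:
            continue                       # serial: Ti ended before Tj began
        if (ti['finish'] < tj['validation']
                and not (ti['write_set'] & tj['read_set'])):
            continue                       # no item Ti wrote was read by Tj
        return False                       # possible conflict: Tj is aborted
    return True
```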
Deadlock Handling
A system is in a deadlock state if there exists a set of transactions such that every transaction in the set is waiting for another transaction in the set. More precisely, there exists a set of waiting transactions {T0, T1, ..., Tn} such that T0 is waiting for a data item that T1 holds, T1 is waiting for a data item that T2 holds, ..., Tn-1 is waiting for a data item that Tn holds, and Tn is waiting for a data item that T0 holds. None of the transactions can make progress in such a situation.
The only remedy to this undesirable situation is for the system to invoke some drastic action, such as rolling back some of the transactions involved in the deadlock. Rollback of a transaction may be partial; that is, a transaction may be rolled back only to the point where it obtained a lock whose release resolves the deadlock.
There are two principal methods for dealing with the deadlock problem. We
can use a deadlock prevention protocol to ensure that the system will never enter a
deadlock state. Alternatively, we can allow the system to enter a deadlock state,
and then try to recover by using a deadlock detection and deadlock recovery scheme.
Deadlock Prevention :
There are two approaches to deadlock prevention. One approach ensures that no
cyclic waits can occur by ordering the requests for locks, or requiring all locks to
be acquired together.
The simplest scheme under the first approach requires that each transaction
locks all its data items before it begins execution. Moreover, either all are locked
in one step or none are locked. There are two main disadvantages to this
protocol:
(1) it is often hard to predict, before the transaction begins, what data items
need to be locked;
(2) data-item utilization may be very low, since many of the data items may be
locked but unused for a long time.
Another approach for preventing deadlocks is to impose an ordering of all
data items, and to require that a transaction lock data items only in a sequence
consistent with the ordering.
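The data-item ordering approach can be sketched in a few lines: if every transaction sorts the items it needs by one global order before locking them, a cycle of waits cannot form. The function name and the order map are illustrative assumptions.

```python
# Sketch of deadlock prevention by lock ordering: every transaction locks
# its data items in one fixed global order.

def acquire_in_order(items, order):
    """Sort the items a transaction needs by the global order before
    requesting locks, so no transaction holds a later item while
    waiting for an earlier one."""
    return sorted(items, key=lambda i: order[i])

ORDER = {'A': 1, 'B': 2}
# Both the transfer and the display transaction now lock A before B,
# so neither can hold B while waiting for A.
```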
The second approach for preventing deadlocks is to use preemption and transaction rollbacks. In preemption, when a transaction T2 requests a lock that transaction T1 holds, the lock granted to T1 may be preempted by rolling back T1 and granting the lock to T2. To control the preemption, we assign a unique timestamp to each transaction. The system uses these timestamps only to decide whether a transaction should wait or roll back.
Timeout-Based Schemes
Another simple approach to deadlock handling is based on lock timeouts. In this approach, a transaction that has requested a lock waits for at most a specified amount of time. If the lock has not been granted within that time, the transaction is said to time out; it rolls itself back and restarts. If there was in fact a deadlock, one or more of the transactions involved will time out and roll back, allowing the others to proceed. This scheme falls somewhere between deadlock prevention, where a deadlock will never occur, and deadlock detection and recovery.
Deadlock Detection :
Deadlocks can be described precisely in terms of a directed graph called the wait-for graph. The vertices of the graph represent transactions, and a directed edge from Ti to Tj (denoted Ti → Tj) indicates that Ti is waiting for Tj. When a transaction Tn requests a data item currently locked by Tm, the edge Tn → Tm is inserted in the wait-for graph. The edge is removed only when Tm no longer holds a lock on a data item needed by Tn.
A deadlock exists in the system if and only if the wait-for graph contains a cycle.
Consider the following wait-for graphs over the transactions T1, T2, T3 and T4. [Figure: two wait-for graphs, one without a cycle (no deadlock) and one containing a cycle (deadlock); the edge layout was lost in extraction.]
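Detecting a deadlock therefore amounts to finding a cycle in the wait-for graph, for example with a depth-first search. The adjacency-dict representation below is an illustrative assumption.

```python
# Sketch of deadlock detection: DFS looking for a back edge (a cycle) in
# the wait-for graph, stored as {txn: set of txns it waits for}.

def has_cycle(wait_for):
    visiting, done = set(), set()

    def dfs(t):
        visiting.add(t)
        for u in wait_for.get(t, ()):
            if u in visiting:              # back edge -> cycle -> deadlock
                return True
            if u not in done and dfs(u):
                return True
        visiting.discard(t)
        done.add(t)
        return False

    return any(dfs(t) for t in wait_for if t not in done)
```

When a cycle is found, the system chooses a victim transaction on the cycle and rolls it back to break the deadlock.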
---000---
* With practicals *
7. Recovery System
Failure Classification
Transaction failure. There are two types of errors that may cause a
transaction to fail:
Logical error. The transaction can no longer continue with its normal
execution because of some internal condition, such as bad input, data not
found, overflow, or resource limit exceeded.
System error. The system has entered an undesirable state (for example,
deadlock), as a result of which a transaction cannot continue with its
normal execution. The transaction, however, can be re-executed at a later
time.
Log-Based Recovery
The most widely used structure for recording database modifications is the log.
The log is a sequence of log records, recording all the update activities in the
database.
For example, information about the currently executing operations is kept aside in the form of records. If the transaction or the system fails, these records are used to recover the database from the failure; they are called log records.
There are several types of log records. An update log record describes a single database write. It has these fields: the transaction identifier, the data-item identifier, the old value of the data item (before the write), and the new value (after the write).
Whenever a transaction performs a write, it is essential that the log record for
that write be created before the database is modified. Once a log record exists,
we can output the modification to the database if that is desirable. Also, we have
the ability to undo a modification that has already been output to the database.
We undo it by using the old-value field in log records.
For log records to be useful for recovery from system and disk failures, the log
must reside in stable storage. Observe that the log contains a complete record of
all database activity.
Using the log, the system can handle any failure that does not result in the loss of information in non-volatile storage, i.e. on disk. The recovery scheme uses two procedures: undo(Ti), which restores to their old values all data items updated by transaction Ti, and redo(Ti), which sets to their new values all data items updated by Ti.
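Undoing and redoing with update log records can be sketched as follows, assuming (hypothetically) that each record is a tuple (txn, item, old_value, new_value):

```python
# Sketch of log-based recovery: undo scans the log backwards restoring old
# values; redo scans forwards re-applying new values.

def undo(log, txn, db):
    # Restore the old value of every item that txn wrote, newest first.
    for t, item, old, new in reversed(log):
        if t == txn:
            db[item] = old

def redo(log, txn, db):
    # Re-apply the new value of every item that txn wrote, oldest first.
    for t, item, old, new in log:
        if t == txn:
            db[item] = new
```

The scan directions matter when a transaction writes the same item more than once: undoing backwards ends with the earliest old value, and redoing forwards ends with the latest new value.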
Checkpoints:
In a log-based recovery scheme, when a system failure occurs, the entire log would need to be searched to determine which transactions should be redone and which undone. This is inefficient, since searching the entire log for particular transactions is time-consuming. Also, redoing transactions that have already written their updates to the database is unnecessary, although not harmful. Checkpoints reduce this overhead: the system periodically writes a <checkpoint> record to the log, and recovery then needs to examine only the portion of the log after the most recent checkpoint.
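The effect of a checkpoint on the recovery scan can be sketched as below; the tuple-based record format is the same illustrative assumption used above.

```python
# Sketch: with a ('checkpoint',) record in the log, recovery examines only
# the suffix after the most recent checkpoint, not the whole log.

def records_to_examine(log):
    """Return the log records written after the last checkpoint."""
    for i in range(len(log) - 1, -1, -1):   # scan backwards
        if log[i] == ('checkpoint',):
            return log[i + 1:]
    return log[:]                            # no checkpoint: whole log
```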
Shadow Paging
An alternative to log-based crash-recovery techniques is shadow paging. Under certain circumstances, shadow paging may require fewer disk accesses than the log-based methods discussed previously. However, it is hard to extend shadow paging to allow multiple transactions to execute concurrently; this is one of its limitations.
[Figure: the page table, with entries pointing to pages on disk; diagram layout lost in extraction.]
There are also some unused pages on disk, called free pages. The system maintains a list of these free pages, called the free page list.
The key idea behind the shadow-paging technique is to maintain two page tables during the life of a transaction: the current page table and the shadow page table. These tables hold pointers to the pages in which the records to be updated are found. When the transaction starts, both page tables are identical. The shadow page table is never changed over the duration of the transaction. The current page table may be changed when the transaction performs a write operation. All input and output operations use the current page table to locate database pages on disk.
[Figure (layout lost in extraction; a duplicated copy removed): the shadow page table and the current page table, each with entries pointing to pages on disk. After a write, the current page table points to a new copy of the modified page, while the shadow page table still points to the original page.]
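The copy-on-write behaviour in the figure can be sketched as follows. This is a minimal illustrative model; the function names and the dict-based page tables are assumptions.

```python
# Sketch of shadow paging: the shadow table is frozen at transaction start;
# a write copies the page to a free page and updates only the current table.

def start_txn(page_table):
    shadow = dict(page_table)      # both tables identical at start
    current = dict(page_table)
    return shadow, current

def write_page(current, pages, page_no, data, free_list):
    new_slot = free_list.pop()     # take a slot from the free page list
    pages[new_slot] = data         # write the new version there
    current[page_no] = new_slot    # redirect only the current page table
```

If the transaction aborts or the system crashes, the shadow page table (kept in stable storage) still points to the unmodified pages, so no undo is needed.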