
MSc. Information Technology

Database Management System
Semester I

Amity University

Database Management Systems are primary ingredients of modern computing systems. Although
database concepts, technology and architectures have been developed and consolidated over the last
three decades, many aspects are still subject to technological evolution and revolution. Thus,
developing study material on this classical yet continuously evolving field is a great
challenge.
Key features
This study material provides a broad treatment of databases, dealing with the complete
syllabus for both an introductory course and an advanced course on databases. It offers a
balanced view of concepts, languages and architectures, with concrete reference to current
technology and to commercial database management systems (DBMS). It originates from the
authors' experience in teaching both UG and PG classes in theory and application.
The study material is composed of seven chapters. Chapters 1 and 2 are designed to expose
students to the fundamental principles of database management and RDBMS concepts. They give
an idea of how to design a database and develop its schema. Discussion of design techniques
starts with the introduction of the elements of the E-R (Entity-Relationship) model and proceeds
through a well-defined, staged process from conceptual design to logical design, which
produces a relational schema.
Chapters 3 and 4 are devoted to advanced concepts, including Normalization, Functional
Dependency and the use of Structured Query Language (SQL) required for mastering database technology.
Chapter 5 describes the fundamental and advanced concepts of the procedural query language
commonly known as PL/SQL, which extends the power of Structured Query Language. The PL/SQL
engine executes PL/SQL blocks and subprograms. This engine can run
in the Oracle server or in application development tools such as Oracle Forms and Oracle
Reports.

Chapters 6 and 7 focus on advanced concepts of database systems, including
Transaction Management, Concurrency Control Techniques, and the Backup and
Recovery methods of a database system.

Updated Syllabus
Course Contents:
Module I: Introduction to DBMS
Introduction to DBMS, Architecture of DBMS, Components of DBMS, Traditional data Models
(Network, Hierarchical and Relational), Database Users, Database Languages, Schemas and
Instances, Data Independence
Module II: Data Modeling
Entity sets, attributes and keys, Relationships (ER), Database modeling using entity, Weak and
Strong entity types, Enhanced entity-relationship (EER), Entity Relationship Diagram Design of
an E-R Database schema, Object modeling, Specialization and generalization
Module III: Relational Database Model
Basic Definitions, Properties of Relational Model, Keys, Constraints, Integrity rules, Relational
Algebra, Relational Calculus.
Module IV: Relational Database Design
Functional Dependencies, Normalization, Normal forms (1st, 2nd, 3rd, BCNF), Lossless
decomposition, Join dependencies, 4th & 5th Normal form.
Module V: Query Language
SQL Components (DDL, DML, DCL), SQL Constructs (Select, from, where, group by,
having, order by), Nested tables, Views, correlated query, Objects in Oracle.
Module VI: PL/SQL
Introduction, Basic block, Structure of PL/SQL program, Control Statements, Exception
handling, Cursor Concept, Procedure, functions and triggers.
Module VII: Database Security and Authorization

Basic security issues, Discretionary access control, Mandatory access control, Statistical
database security.
Module VIII: Transaction Management and Concurrency Control Techniques
Transaction concept, ACID properties, Schedules and recoverability, Serial and Non-serial
schedules, Serializability, Concurrency Techniques: Locking Protocols, Timestamping Protocol,
Multiversion Technique, Deadlock Concept - detection and resolution.
Module IX: Backup and Recovery
Database recovery techniques based on immediate and deferred update, ARIES recovery
algorithm, Shadow pages and Write-ahead Logging
Text & References:

Text:
Fundamentals of Database Systems, Elmasri & Navathe, Pearson Education, Asia
Data Base Management System, Leon & Leon, Vikas Publications
Database System Concepts, Korth & Sudarshan, TMH

References:
Introduction to Database Systems, Bipin C Desai, Galgotia
Oracle 9i The Complete Reference, Oracle Press

Index:
Chapter 1: Introduction to DBMS and data modeling
Chapter 2: Relational database model
Chapter 3: Functional dependency and normalization
Chapter 4: Structured query language
Chapter 5: Procedural query language
Chapter 6: Transaction management & concurrency control techniques
Chapter 7: Database recovery, backup & security

Chapter-1
INTRODUCTION TO DBMS AND DATA MODELING
1. Introductory Concepts
Data: - Data is a collection of facts upon which a conclusion is based. (Information or
knowledge has value; data has cost.) Data can be represented in terms of numbers, characters,
pictures, sounds and figures.
Data item: - The smallest named unit of data that has meaning in the real world (examples: last
name, Locality, STD_Code).
Database: - Interrelated collection of data that serves the needs of multiple users within one or
more organizations, i.e. interrelated collections of records of potentially many types.
Database administrator (DBA): - A person or group of persons responsible for the effective
use of database technology in an organization or enterprise. The DBA is said to be the custodian or
owner of the database.
Database Management System: - A DBMS is a logical collection of software programs which
allows large, structured sets of data to be stored, modified, extracted and manipulated in
different ways. A DBMS also provides security features that
protect against unauthorized users trying to gain access to confidential information, and it prevents
data loss in case of a system crash. Depending on each user's requirements, users are
allowed access to either the whole database or specific subschemas, through the use of passwords.
The DBMS is also responsible for the database's integrity, ensuring that no two users are able to
update the same record at the same time, as well as preventing duplicate entries, such as two
employees being given the same employee number.
The following are examples of database applications:
1. Computerized library systems.
2. Automated teller machines.
3. Airline reservation systems.

4. Inventory Management systems.

There are many Database Management System (DBMS) software products available in
the market. Some of the most popular ones include Oracle, IBM's DB2, Microsoft Access,
Microsoft SQL Server and MySQL. MySQL, one of the most popular database management
systems used by online entrepreneurs, is an example of a relational DBMS (not an object-oriented
one). Microsoft Access (another popular DBMS) is likewise not a fully object-oriented system, even
though it does exhibit certain aspects of one.

Example: A database may contain detailed student information. Certain users may only be
allowed access to student names, addresses and phone numbers, while other users may be able
to view the payment details or marks of students. Access and change logs can be
programmed to add even more security to a database, recording the date, time and details of any
user making any alteration to the database.

Furthermore, Database Management Systems employ a query language and report
writers to interrogate the database and analyze its data. Queries allow users to search, sort, and
analyze specific data by granting users efficient access to the required information.
Example: One would use a query command to make the system retrieve data regarding all
courses of a particular department. The most common query language used to access database
systems is the Structured Query Language (SQL).
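
The department example above can be sketched as a small runnable program. This is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module as a stand-in DBMS, and the table name course, its columns and the sample rows are hypothetical, not taken from any real system.

```python
import sqlite3

# Hypothetical schema and data: one COURSE table with a department column.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE course (course_id INTEGER PRIMARY KEY, title TEXT, department TEXT)"
)
conn.executemany(
    "INSERT INTO course (title, department) VALUES (?, ?)",
    [
        ("Database Management System", "IT"),
        ("Operating Systems", "IT"),
        ("Financial Accounting", "Commerce"),
    ],
)

# The query states WHAT is wanted (all courses of one department);
# the DBMS decides HOW to find the matching rows.
rows = conn.execute(
    "SELECT title FROM course WHERE department = ? ORDER BY title", ("IT",)
).fetchall()
it_courses = [title for (title,) in rows]
print(it_courses)
conn.close()
```

The same SELECT statement would look essentially the same in other SQL-based systems; only the way the program connects to the database differs.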

2. Objectives of Database Management:


Data availability: makes an integrated collection of data available to a wide variety of users
* At reasonable cost: performance in query and update, eliminate or control data redundancy
* In meaningful format: data definition language, data dictionary
* Easy access: query language (4GL, SQL, forms, windows, menus)
Data integrity: ensure correctness and validity
* Primary Key Constraint / Foreign Key Constraints / Check Constraints
* Concurrency control and multi-user updates
* Audit trail
Privacy (the goal) and security (the means)
* Schema / Sub-schema
* Passwords
Management control: DBA: lifecycle control, training, maintenance
Data independence (a relative term): avoids reprogramming of applications, allows easier
conversion and reorganization of data.
* Physical data independence: the application program is unaffected by changes in the storage
structure or physical method of data accessing.
* Logical data independence: the application program is unaffected by changes in the logical schema.

3. Database Models: Database information normally consists of subjects, such as customers,
employees or suppliers, as well as activities such as orders, payments or purchases. This
information must be organized into related record types through a process known as database
design. The DBMS that is chosen must be able to manage different relationships, which is where
database models come in.

3.1 Hierarchical databases organize data under the premise of a basic parent/child relationship.
Each parent can have many children, but each child can only have one parent. In hierarchical
databases, attributes of specific records are listed under an entity type and entity types are
connected to each other through one-to-many relationships, also known as 1:N mapping.
Originally, hierarchical relationships were most commonly used in mainframe systems, but with
the advent of increasingly complex relationship systems, they have now become too restrictive
and are thus rarely used in modern databases. If any of the one-to-many relationships is
violated, for example by an employee having more than one manager, the database structure
changes from a hierarchy to a network.
3.2 Network model: In the network model of a database it is possible for a record to have
multiple parents, making the system more flexible than the strict single-parent model of
the hierarchical database. The model is made to accommodate many-to-many relationships,
which allows for a more realistic representation of the relationships between entities. Even
though the network database model enjoyed popularity for a short while, it never really got off
the ground in terms of staging a revolution. It is now rarely used because of the availability of
more competitive models that offer the higher flexibility demanded today.
3.3 Relational databases (RDBMS) differ fundamentally from the
aforementioned models, as the design of the records is organized around a set of tables (with
unique identifiers) that represent both the data and their relationships. The fields to be used for
matching are often indexed in order to speed up the process, and the data can be retrieved and
manipulated in a number of ways without the need to reorganize the original database tables.
Working under the assumption that file systems (which often use the hierarchical or network
models) are not considered databases, the relational database model is the most commonly used
system today. While the concepts behind hierarchical and network database models are older
than those of the relational model, the latter was in fact the first one to be formally defined.
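
As a rough illustration of tables with unique identifiers representing both data and relationships, the sketch below links two tables through a key and matches them with a join. It assumes Python's built-in sqlite3 module; the department and employee tables, the index name and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # ask SQLite to enforce the relationship
conn.executescript("""
CREATE TABLE department (
    dept_id INTEGER PRIMARY KEY,          -- unique identifier of a department
    name    TEXT NOT NULL
);
CREATE TABLE employee (
    emp_id  INTEGER PRIMARY KEY,          -- unique identifier of an employee
    name    TEXT NOT NULL,
    dept_id INTEGER NOT NULL REFERENCES department(dept_id)
);
-- The matching field is indexed to speed up lookups and joins.
CREATE INDEX idx_employee_dept ON employee(dept_id);

INSERT INTO department VALUES (1, 'Sales'), (2, 'Research');
INSERT INTO employee VALUES (10, 'Asha', 1), (11, 'Ravi', 2), (12, 'Meena', 1);
""")

# The relationship is followed by matching key values, not by stored pointers.
pairs = conn.execute("""
    SELECT e.name, d.name
    FROM employee AS e
    JOIN department AS d ON d.dept_id = e.dept_id
    ORDER BY e.emp_id
""").fetchall()
print(pairs)
conn.close()
```

Because the link between the tables is just a shared value, new questions (for example, employees per department) can be answered with new queries, without reorganizing the tables.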

After the relational DBMS soared to popularity, the next development in DBMS
technology came in the form of the object-oriented database model, which aims to offer more flexibility
than the hierarchical, network and relational models. Under this model, data exists
in the form of objects, which include both the data and the data's behavior. Certain modern
information systems contain such convoluted combinations of information that traditional data
models (including the RDBMS) remain too restrictive to adequately model this complex data.
The object-oriented model also exhibits better cohesion and coupling than prior models, resulting
in a database which is not only more flexible and more manageable but also better suited
to modeling real-life processes. However, due to the immaturity of this model, certain
problems are bound to arise, some major ones being the lack of an SQL equivalent as well as
the lack of standardization. Furthermore, the most common use of the object-oriented model is to
have an object point to the child or parent OID (object ID) to be retrieved, leaving many
programmers with the impression that the object-oriented model is simply a reincarnation of the
network model at best. That is, however, an over-simplification of an innovative
technology.

4. Components of a DBMS
The components of a Database Management System (DBMS) are illustrated by the diagram
shown below.

4.1. Database Engine: The Database Engine is the foundation for storing, processing, and securing
data. It provides controlled access and rapid transaction processing to meet the
requirements of the most demanding data-consuming applications within an enterprise. The
Database Engine is used to create relational databases for online transaction processing or online analytical
processing data. This includes creating tables for storing data, and database objects such as
indexes, views, and stored procedures for viewing, managing, and securing data. In Microsoft SQL
Server, for example, SQL Server Management Studio is used to manage the database objects, and
SQL Server Profiler to capture server events.

4.2. Data dictionary: A data dictionary is a reserved space within a database which is used to store
information about the database itself. It is a set of tables and views that users can only
read and never alter directly; the DBMS itself maintains it. Most data dictionaries contain different information about the data used
in the enterprise. In terms of the database representation of the data, the data dictionary defines all
schema objects including views, tables, clusters, indexes, sequences, synonyms, procedures,
packages, functions, triggers and many more. This ensures that all these objects follow one
standard defined in the dictionary. The data dictionary also records how much space has been
allocated for, and/or is currently in use by, all the schema objects. A data dictionary is used when
finding information about users, objects, schema and storage structures. Every time a data
definition language (DDL) statement is issued, the DBMS updates the data dictionary.
A data dictionary may contain information such as:
* Database design information
* Stored SQL procedures
* User permissions
* User statistics
* Database process information
* Database growth statistics
* Database performance statistics
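
To make the idea of a data dictionary concrete, the following sketch reads the catalog of a small database before and after two DDL statements. It assumes Python's built-in sqlite3 module, where the catalog is a table called sqlite_master; other systems expose their dictionaries under different names. The student table and student_names view are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def catalog_names(connection):
    """Return the names of all schema objects recorded in SQLite's catalog."""
    rows = connection.execute(
        "SELECT name FROM sqlite_master ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]

before = catalog_names(conn)  # a new database has an empty catalog

# Each DDL statement makes the DBMS itself update the catalog.
conn.execute("CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE VIEW student_names AS SELECT name FROM student")

after = catalog_names(conn)
print(before, after)
conn.close()
```

The program never writes to the catalog; it only issues DDL and then reads what the DBMS recorded.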


4.3. Query Processor: A relational database consists of many parts, but at its heart are two major
components: the storage engine and the query processor. The storage engine writes data to and
reads data from the disk. It manages records, controls concurrency, and maintains log files. The
query processor accepts SQL syntax, selects a plan for executing the syntax, and then executes the
chosen plan. The user or program interacts with the query processor, and the query processor in
turn interacts with the storage engine. The query processor isolates the user from the details of
execution: The user specifies the result, and the query processor determines how this result is
obtained. The query processor components include:
* DDL interpreter
* DML compiler
* Query evaluation engine
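
The division of labour described above (the user specifies the result, the query processor chooses a plan) can be observed in a small sketch. It assumes Python's built-in sqlite3 module and SQLite's EXPLAIN QUERY PLAN statement; the course table and index are hypothetical, and the exact wording of the plan differs between SQLite versions, so the sketch only prints it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE course (course_id INTEGER PRIMARY KEY, title TEXT, department TEXT)"
)
conn.execute("CREATE INDEX idx_course_dept ON course(department)")
conn.executemany(
    "INSERT INTO course (title, department) VALUES (?, ?)",
    [("DBMS", "IT"), ("Accounting", "Commerce")],
)

query = "SELECT title FROM course WHERE department = ?"

# Ask the query processor which plan it selected, without running the query.
plan = conn.execute("EXPLAIN QUERY PLAN " + query, ("IT",)).fetchall()
for step in plan:
    print(step[-1])  # the last column holds the human-readable plan step

# Run the same query; the user never says how the rows should be located.
titles = [title for (title,) in conn.execute(query, ("IT",))]
print(titles)
conn.close()
```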


4.4. Report writer: Also called a report generator, this is a program, usually part of a database
management system, that extracts information from one or more files and presents the information
in a specified format. Most report writers allow you to select records that meet certain conditions
and to display selected fields in rows and columns. You can also format data into pie charts, bar
charts, and other diagrams. Once you have created a format for a report, you can save the format
specifications in a file and continue reusing it for new data.

5. Database Languages
5.1 Data Definition Language (DDL): The Data Definition Language is used to define the
structure of a database. The database structure definition (schema) typically includes the
following:
defining all data elements; defining data element fields and records; defining the name, field
length, and field type for each data type; and defining controls for fields that can have only selected
values.
Typical DDL operations (with their respective keywords in the structured query language SQL):
* Creation of tables and definition of attributes (CREATE TABLE ...)
* Change of tables by adding or deleting attributes (ALTER TABLE ...)
* Deletion of whole table including content (DROP TABLE ...)
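
The three DDL operations listed above can be tried in a minimal sketch. It assumes Python's built-in sqlite3 module; the student table and its columns are hypothetical, and the helper columns() is introduced here only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def columns(connection, table):
    """List the column names the DBMS currently records for a table."""
    return [row[1] for row in connection.execute(f"PRAGMA table_info({table})")]

# CREATE TABLE: define the table and its attributes.
conn.execute("CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT NOT NULL)")
cols_created = columns(conn, "student")

# ALTER TABLE: change the structure by adding an attribute.
conn.execute("ALTER TABLE student ADD COLUMN phone TEXT")
cols_altered = columns(conn, "student")

# DROP TABLE: delete the whole table, including its content.
conn.execute("DROP TABLE student")
cols_dropped = columns(conn, "student")  # no columns are reported once the table is gone

print(cols_created, cols_altered, cols_dropped)
conn.close()
```

None of these statements touches individual records; they change only the structure in which records are stored.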


5.2 Data Manipulation Language (DML): Once the
structure is defined, the database is ready for entry and manipulation of data. The Data Manipulation
Language includes the commands to enter and manipulate the data. With these commands
the user can add new records, navigate through the existing records, view the contents of various
fields, modify the data, delete existing records, and sort the records in a desired sequence. Typical
DML operations (with their respective keywords in the structured query language SQL):

* Add data (INSERT)
* Change data (UPDATE)
* Delete data (DELETE)
* Query data (SELECT)
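
A minimal sketch of the four DML operations, one statement each, is given below. It assumes Python's built-in sqlite3 module; the student table and the sample records are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The structure must exist (DDL) before data can be manipulated (DML).
conn.execute("CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT, course TEXT)")

# INSERT: add new records.
conn.execute("INSERT INTO student VALUES (10, 'Asha', 'Computer')")
conn.execute("INSERT INTO student VALUES (11, 'Ravi', 'Accounts')")

# UPDATE: change data in an existing record.
conn.execute("UPDATE student SET course = 'Economics' WHERE roll_no = 11")

# DELETE: remove an existing record.
conn.execute("DELETE FROM student WHERE roll_no = 10")

# SELECT: query the data, sorted in the desired sequence.
remaining = conn.execute(
    "SELECT roll_no, name, course FROM student ORDER BY roll_no"
).fetchall()
print(remaining)
conn.close()
```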

5.3 Data Control Language (DCL): Data control commands in SQL control access privileges
and security issues of a database system or parts of it. These commands are closely related to the
DBMS (Database Management System) and can therefore vary in different SQL
implementations. Some typical commands are:
* GRANT - gives users access privileges to a database
* REVOKE - withdraws access privileges given with the GRANT command or taken with the DENY
command
Since these commands depend on the actual database management system (DBMS), we will not
cover DCL in this module.

6. Database Users
6.1 Database Administrator (DBA): The DBA is a person or a group of persons
responsible for the management of the database. The DBA is responsible for authorizing access
to the database by granting and revoking permissions to the users, for coordinating and monitoring its
use, for managing backups and repairing damage due to hardware and/or software failures, and for
acquiring hardware and software resources as needed. In a small organization the role of
DBA is performed by a single person, and in large organizations there is a group of
DBAs who share responsibilities.
6.2 Database Designers: They are responsible for identifying the data to be stored in the
database and for choosing appropriate structures to represent and store the data. It is the
responsibility of database designers to communicate with all prospective database users in
order to understand their requirements, so that they can create a design that meets those
requirements.
6.3 End Users: End users are the people who interact with the database through applications or
utilities. The various categories of end users are:
Casual End Users - These users occasionally access the database but may need
different information each time. They use a sophisticated database query language to specify their
requests. For example: high-level managers who access the data weekly or biweekly.
Naive End Users - These users frequently query and update the database using standard
types of queries. The operations that can be performed by this class of users are very limited and
affect a precise portion of the database.
For example: - Reservation clerks for airlines/hotels check availability for a given request and
make reservations. Also, persons using Automated Teller Machines (ATMs) fall under this
category, as they have access to a limited portion of the database.
Standalone End Users/On-line End Users - These are end users who interact with the
database directly via an on-line terminal or indirectly through menu- or graphics-based interfaces.
For example: - The user of a text package, or of library management software that stores a variety of library
data such as the issue and return of books for fine purposes.
6.4 Application Programmers
Application programmers are responsible for writing application programs that use the database.
These programs could be written in general-purpose programming languages such as Visual
Basic, Developer, C, FORTRAN, COBOL etc. to manipulate the database. These application
programs operate on the data to perform various operations such as retrieving information and
creating new data.

7. ADVANTAGES OF DBMS
The DBMS (Database Management System) is preferred over the conventional file
processing system due to the following advantages:

Controlling Data Redundancy - In the conventional file processing system, every user group
maintains its own files for handling its data. This may lead to:
* Duplication of the same data in different files.
* Wastage of storage space.
* Errors generated by updating the same data in different files.
* Time wasted in entering data again and again.
* Needless use of computer resources.
* Great difficulty in combining information.
All of the above-mentioned problems are eliminated in a Database Management System.
Elimination of Inconsistency - In the file processing system, information is duplicated
throughout the system, so changes made in one file may need to be carried over to another
file. If they are not, the data becomes inconsistent. We therefore need to remove this duplication of data in multiple
files to eliminate inconsistency.
For example: - Let us consider a student result system. Suppose that in the
STUDENT file it is indicated that Roll No. = 10 has opted for the 'Computer' course, but in the RESULT
file it is indicated that Roll No. = 10 has opted for the 'Accounts' course. In this case the two
entries for a particular student do not agree with each other, and the database is said to be in an
inconsistent state. To eliminate this conflicting information we need to centralize the
database. On centralizing the database the duplication will be controlled and hence
inconsistency will be removed. Data inconsistencies are often encountered in everyday life.
Consider another example: we have all come across situations when a new address is
communicated to an organization that we deal with (e.g. a telecom company, gas company or bank). We find
that some of the communications from that organization are received at the new address while
others continue to be mailed to the old address. Combining all the data in a database
reduces redundancy as well as inconsistency, so it is likely to reduce the costs of
collection, storage and updating of data.
Better service to the users - A DBMS is often used to provide better services to the users. In a
conventional system, availability of information is often poor, since it is normally difficult to
obtain information that the existing systems were not designed for. Once several conventional
systems are combined to form one centralized database, the availability of information and its
currency are likely to improve, since the data can now be shared and the DBMS makes it easy to
respond to unanticipated information requests.
Centralizing the data in the database also means that users can easily obtain new and combined
information that would have been impossible to obtain otherwise. Also, use of a DBMS
should allow users who do not know programming to interact with the data more easily, unlike a file
processing system where the programmer may need to write new programs to meet every new
demand.

Flexibility of the System is improved - Since changes are often necessary to the contents of the
data stored in any system, these changes are made more easily in a centralized database than in a
conventional system. Application programs need not be changed when the data in the
database changes. This also helps maintain the consistency and integrity of the data in the database.

Integrity can be improved - Since the data of an organization using the database approach is
centralized and is used by a number of users at a time, it is essential to enforce integrity constraints.
In conventional systems, because the data is duplicated in multiple files, updates or
changes may sometimes lead to entry of incorrect data in some of the files where it exists.
For example: - Consider the result system that we have already discussed. Since multiple files
are to be maintained, you may sometimes enter a value for course which does not exist. Suppose
course can have the values (Computer, Accounts, Economics, and Arts) but we enter the value 'Hindi'
for it; this leads to inconsistent data, and hence a lack of integrity. Even if we centralize the
database it may still contain incorrect data. For example:
* The salary of a full-time employee may be entered as Rs. 500 rather than Rs. 5000.
* A student may be shown to have borrowed books but has no enrollment.
* A list of employee numbers for a given department may include a number of non-existent
employees.
These problems can be avoided by defining validation procedures that run whenever any
update operation is attempted.
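
One way such validation can be declared is with integrity constraints that the DBMS checks on every insert or update. The sketch below is an illustration under stated assumptions: it uses Python's built-in sqlite3 module, and the result table, its columns and the sample rows are hypothetical. The list of allowed courses follows the example above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE result (
        roll_no INTEGER PRIMARY KEY,   -- no two rows may share a roll number
        course  TEXT NOT NULL
                CHECK (course IN ('Computer', 'Accounts', 'Economics', 'Arts'))
    )
""")
conn.execute("INSERT INTO result VALUES (10, 'Computer')")  # a valid row

rejected = []
for row in [(11, "Hindi"),   # course is not in the allowed list
            (10, "Arts")]:   # roll number 10 already exists
    try:
        conn.execute("INSERT INTO result VALUES (?, ?)", row)
    except sqlite3.IntegrityError:
        rejected.append(row)  # the DBMS refused the invalid row

stored = conn.execute("SELECT roll_no, course FROM result").fetchall()
print(rejected, stored)
conn.close()
```

A constraint of this kind cannot catch every mistake (a salary of Rs. 500 instead of Rs. 5000 is still a legal number), which is why the text speaks of validation procedures in general.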

Standards can be enforced - Since all access to the database must be through the DBMS,
standards are easier to enforce. Standards may relate to the naming of data, format of data,
structure of the data etc. Standardizing stored data formats is usually desirable for the purpose of
data interchange or migration between systems.

Security can be improved - In conventional systems, applications are developed in an ad hoc or
temporary manner. Often different systems of an organization access different components
of the operational data, and in such an environment enforcing security can be quite difficult. Setting
up a database makes it easier to enforce security restrictions since the data is now centralized. It is
easier to control who has access to what parts of the database. Different checks can be
established for each type of access (retrieve, modify, delete etc.) to each piece of information in
the database.
Consider an example from banking, in which employees at different levels may be given access
to different types of data in the database. A clerk may be given the authority to know only the
names of all the customers who have a loan in the bank, but not the details of each loan the customer
may have. This can be accomplished by giving the appropriate privileges to each employee.
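
The clerk example can be partly sketched with a view that exposes only customer names. This is an assumption-laden illustration: it uses Python's built-in sqlite3 module, which has no user accounts, so the step of granting the clerk access to the view (and to nothing else) is not shown; in a multi-user DBMS that step would typically use DCL commands such as GRANT. The loan table and the view name clerk_loan_customers are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE loan (
    loan_id       INTEGER PRIMARY KEY,
    customer_name TEXT NOT NULL,
    amount        REAL NOT NULL
);
INSERT INTO loan VALUES (1, 'Asha', 250000.0), (2, 'Ravi', 90000.0), (3, 'Asha', 40000.0);

-- The view shows who has a loan, but none of the loan details.
CREATE VIEW clerk_loan_customers AS
    SELECT DISTINCT customer_name FROM loan;
""")

cursor = conn.execute("SELECT * FROM clerk_loan_customers ORDER BY customer_name")
visible_columns = [col[0] for col in cursor.description]
clerk_rows = cursor.fetchall()
print(visible_columns, clerk_rows)
conn.close()
```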
Organization's requirement can be identified - Organizations have sections and departments,
and each of these units often considers its own work the most important and therefore
considers its own needs the most important. Once a database has been set up with centralized
control, it will be necessary to identify the organization's requirements and to balance the needs of the
different units. It may therefore become necessary to ignore some requests for information if they
conflict with higher-priority needs of the organization. It is the responsibility of the DBA
(Database Administrator) to structure the database system to provide the overall service that is
best for the organization.
For example: - A DBA must choose the best file structure and access method to give a faster response
for highly critical applications than for less critical applications.

Overall cost of developing and maintaining systems is lower - It is much easier to respond to
unanticipated requests when data is centralized in a database than when it is stored in a
conventional file system. Although the initial cost of setting up a database can be large, one
normally expects the overall cost of setting up the database and of developing and maintaining
application programs to be far lower than for a similar service using conventional systems, since
the productivity of programmers can be higher when using the non-procedural languages that have been
developed with DBMSs than when using procedural languages.

Data Model must be developed - Perhaps the most important advantage of setting up a
database system is the requirement that an overall data model for the organization be built. In
conventional systems, it is more likely that files will be designed according to the needs of particular
applications, and the overall view is often not considered. Building an overall view of an
organization's data is usually cost-effective in the long term.

Provides backup and Recovery - A centralized database provides backup and recovery
schemes for failures such as disk crashes, power failures and software errors.
These help the database recover from an inconsistent state to the state that existed prior
to the occurrence of the failure, though the methods are very complex.

8. Three-Schema Architecture
The objective of the three-schema architecture is to separate the user application programs from the
physical database. The three-schema architecture is an effective tool with which the user can
visualize the schema levels in a database system. The three-level ANSI architecture has an
important place in database technology development because it clearly separates the user's
external level, the system's conceptual level, and the internal storage level for designing a
database. In the three-schema architecture, schemas can be defined at three different levels.
8.1 External Schema:
An external schema describes a specific user's view of the data, and the specific methods
and constraints connected with this information. Each external schema describes the part
of the database that a particular user group is interested in and hides the rest of
the database from that user group.
8.2 Internal Schema:
The internal schema mainly describes the physical storage structure of the database.
It describes the data from a view very close to the computer or system in
general. It completes the logical schema with technical aspects of the data, such as storage methods
or helper functions for greater efficiency.

8.3 Conceptual Schema: It describes the structure of the whole database for the entire user
community. The conceptual schema hides the details of the physical storage structure and
concentrates on describing entities, data types, relationships and constraints. The
implemented conceptual schema is based on a conceptual schema design in a high-level data
model.

9. Data Independence:
With knowledge of the three-schema architecture, the term data independence can be
explained as follows: each higher level of the data architecture is immune to changes at the
next lower level of the architecture.
Data independence is normally thought of in terms of two levels or types. Logical data
independence makes it possible to change the logical structure of the data without
modifying the application programs that make use of the data. There is no need to rewrite current
applications as part of the process of adding new data items to, or removing unused ones from, the system.
The second type or level of data independence is known as physical data independence. This
has to do with altering the organization or storage procedures related to the data, rather
than modifying the data itself. Changing the file organization or the indexing
strategy used for the data does not require any modification to the external structure of the
applications, meaning that users of the applications are not likely to notice any difference at all in
the function of their programs.
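
Both kinds of data independence can be illustrated with a small sketch in which an application query stays unchanged while the schema underneath it changes. It assumes Python's built-in sqlite3 module; the student table, the view student_directory (standing in for an external schema), the index name and the constant APP_QUERY are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT, course TEXT);
INSERT INTO student VALUES (10, 'Asha', 'Computer'), (11, 'Ravi', 'Accounts');

-- External level: the application sees only this view.
CREATE VIEW student_directory AS SELECT roll_no, name FROM student;
""")

# The application's query is written once and never edited below.
APP_QUERY = "SELECT roll_no, name FROM student_directory ORDER BY roll_no"
before = conn.execute(APP_QUERY).fetchall()

# Logical change: a new attribute is added to the stored table.
conn.execute("ALTER TABLE student ADD COLUMN phone TEXT")
# Physical change: a new index alters how the data can be accessed.
conn.execute("CREATE INDEX idx_student_name ON student(name)")

after = conn.execute(APP_QUERY).fetchall()
print(before == after)
conn.close()
```

The application keeps working because it depends on the view and on SQL, not on the storage layout or on the full list of columns.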
Database Instance: The term instance is typically used to describe a complete database
environment, including the RDBMS software, table structure, stored procedures and other
functionality. It is most commonly used when administrators describe multiple
instances of the same database. Also known as: environment.
Examples: An organization with an employees database might have three different
instances: production (used to contain live data), pre-production (used to test new
functionality prior to release into production) and development (used by database developers to
create new functionality).
Relational Schema: A relation schema can be thought of as the basic information describing a
table or relation. This includes a set of column names, the data types associated with each
column, and the name associated with the entire table.

10. Entity - Relationship Model


The Entity - Relationship Model (E-R Model) is a high-level conceptual data model developed
by Chen in 1976 to facilitate database design. Conceptual modeling is an important phase in
designing a successful database. A conceptual data model is a set of concepts that describe the
structure of a database and the associated retrieval and update transactions on the database. A
high-level model is chosen so that all the technical aspects are also covered. The E-R data model
grew out of the exercise of using commercially available DBMSs to model the database. The E-R
model is a generalization of the earlier commercial models such as the Hierarchical and
the Network Model. It also allows the representation of the various constraints as well as their
relationships.
To sum up, the Entity-Relationship (E-R) Model is based on a view of the real world as a
set of objects called entities and relationships among entity sets, where an entity set is a
group of similar objects. A relationship between entity sets is represented by a named E-R
relationship and is of type 1:1, 1:N or M:N, which describes the mapping from one entity set to
another.
The E-R model is shown diagrammatically using Entity-Relationship (E-R) diagrams,
which represent the elements of the conceptual model and show the meanings of, and the
relationships between, those elements independent of any particular DBMS and implementation
details.

10.1 What are Entity Relationship Diagrams?


Entity Relationship Diagrams (ERDs) illustrate the logical structure of databases.

An ER Diagram

10.2 Entity Relationship Diagram Notations

Entity
An entity is a real-world object (living or non-living) or concept about which you want to
store information.

Weak Entity
A weak entity is an entity that must be defined by a foreign key relationship with another entity,
as it cannot be uniquely identified by its own attributes alone.

Key attribute
A key attribute is the unique, distinguishing characteristic of the entity, which can uniquely
identify the instances of an entity set. For example, an employee's social security number might
be the employee's key attribute.

Multi valued attribute


A multi valued attribute can have more than one value. For example, an employee entity can
have multiple skill values.

Derived attribute
A derived attribute is based on another attribute. For example, an employee's monthly salary is
based on the employee's annual salary.

Relationships
Relationships illustrate how two entities share information in the database structure.
First, connect the two entities, then drop the relationship notation on the line.

Cardinality
Cardinality specifies how many instances of an entity relate to one instance of another entity.
Ordinality is also closely linked to cardinality. While cardinality specifies the occurrences of a
relationship, ordinality describes the relationship as either mandatory or optional. In other words,
cardinality specifies the maximum number of relationships and ordinality specifies the absolute
minimum number of relationships.

Recursive relationship
In some cases, entities can be self-linked. For example, employees can supervise other
employees.

10.3 How to Design Effective ER Diagrams


1) Make sure that each entity only appears once per diagram.
2) Name every entity, relationship, and attribute on your diagram.
3) Examine relationships between entities closely. Are they necessary? Are there any
relationships missing? Eliminate any redundant relationships. Don't connect relationships to each
other.
4) Use colors to highlight important portions of your diagram.

Using colors can help you highlight important features in your diagram
5) Create a polished diagram by adding shadows and color. You can choose from a number of
ready-made styles in the Edit menu under Colors and Shadows, or you can create your own.

10.4 Features of the E-R Model:


1. The E-R diagram used for representing the E-R Model can be easily converted into relations
(tables) in the Relational Model.
2. The E-R Model is used by the database developer to produce a good database design, which
can then be implemented in various DBMSs.
3. It is helpful as a problem decomposition tool, as it shows the entities and the relationships
between those entities.
4. It is inherently an iterative process. Entities can be added to the model in later
modifications.

5. It is simple and easy to understand for various types of users and designers because
specific standards are used for its representation.

11. Enhanced Entity Relationship (EER) Diagrams

EER diagrams contain all the essential modeling concepts of an ER diagram.

They add extra concepts:

o Specialization/generalization
o Subclass/superclass
o Categories
o Attribute inheritance

EER diagrams use some object-oriented concepts such as inheritance.

EER diagrams are used to model concepts more accurately than ER diagrams.

Sub classes and Super classes


In some cases, an entity type has numerous sub-groupings of its entities that are meaningful
and need to be explicitly represented because of their importance.

For example, members of entity Employee can be grouped further into Secretary, Engineer,
Manager, Technician, Salaried_Employee.

The set listed is a subset of the entities that belong to the Employee entity, which means that
every entity that belongs to one of the sub sets is also an Employee.

Each of these sub-groupings is called a subclass, and the Employee entity is called the superclass.

An entity cannot be a member of a subclass alone; it must also be a member of the superclass.

An entity can be a member of several subclasses; for example, a Secretary
may also be a Salaried_Employee. However, not every member of the superclass must be a
member of a subclass.

Type Inheritance

The type of an entity is defined by the attributes it possesses, and the relationship types it
participates in.

Because an entity in a subclass represents the same entity from the superclass, it should
possess values for its specific attributes as well as values for its attributes as a member of
the superclass.

This means that an entity that is a member of a subclass inherits all the attributes of the entity
as a member of the superclass; it also inherits all the relationships in which the
superclass participates.

Figure: The Employee superclass, which participates in the Work For relationship with
Department, and its subclasses Secretary, Engineer and Technician.
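
Type inheritance can be pictured with classes in a programming language. The following is a minimal sketch, assuming Python dataclasses; the attribute names (name, department, typing_speed, engineering_type) are illustrative assumptions and are not taken from the diagram.

```python
from dataclasses import dataclass, fields


@dataclass
class Employee:
    # Attributes of the superclass (hypothetical names).
    name: str
    department: str  # stands in for the "Work For" relationship


@dataclass
class Secretary(Employee):
    typing_speed: int  # specific attribute of the subclass


@dataclass
class Engineer(Employee):
    engineering_type: str  # specific attribute of the subclass


s = Secretary(name="Lata", department="Accounts", typing_speed=60)

# A Secretary is also an Employee and carries the inherited attributes.
secretary_attributes = [f.name for f in fields(Secretary)]
print(isinstance(s, Employee), secretary_attributes)
```

Every Secretary object is also an Employee object, which mirrors the rule that a member of a subclass is always a member of the superclass.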

Specialization
The process of defining a set of subclasses of a super class.

Specialization is the top-down refinement of a superclass into subclasses.

The set of subclasses is based on some distinguishing characteristic of the superclass.

For example, the set of subclasses {Secretary, Engineer, Technician} of Employee
differentiates among employees based on job type.

There may be several specializations of an entity type based on different distinguishing
characteristics.

Another example is the specialization {Salaried_Employee, Hourly_Employee}, which
distinguishes employees based on their method of pay.

Notation for Specialization

To represent a specialization, the subclasses that define it are attached by lines
to a circle that represents the specialization, and the circle is connected to the superclass.

The subset symbol (half-circle) shown on each line connecting a subclass to the superclass
indicates the direction of the superclass/subclass relationship.

Attributes that only apply to the sub class are attached to the rectangle representing the
subclass. They are called specific attributes.

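A subclass can also participate in specific relationship types, as in the following example.

Figure: The Employee superclass participates in the Work For relationship with Department;
the subclasses Secretary, Engineer and Technician are shown, together with a subclass-specific
Belongs To relationship with Professional Organization.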

Reasons for Specialization

Certain attributes may apply to some but not all entities of a super class. A subclass is
defined in order to group the entities to which the attributes apply.

The second reason for using subclasses is that some relationship types may be participated in
only by entities that are members of the subclass.

Summary of Specialization
Allows for:
Defining a set of subclasses of an entity type

Creating additional specific attributes for each subclass

Creating additional specific relationship types between each subclass and other entity types or
other subclasses.

Generalization

The reverse of specialization is generalization.

Several classes with common features are generalized into a super class.

For example, the entity types Car and Truck share common attributes License_PlateNo,
VehicleID and Price, therefore they can be generalized into the super class Vehicle.

Constraints on Specialization and Generalization


Several specializations can be defined on an entity type.

Entities may belong to subclasses in each of the specializations.

The specialization may also consist of a single subclass, such as the Manager specialization;
in this case we do not use the circle notation.

Types of Specializations
Predicate-defined or Condition-defined specialization
Occurs in cases where we can determine exactly the entities of each subclass by placing a
condition on the value of an attribute in the superclass.

An example is where the Employee entity has an attribute, JobType. We can specify
membership in the Secretary subclass by the condition JobType = 'Secretary'.

The condition is called the defining predicate of the subclass.

The condition is a constraint specifying that exactly those entities of the Employee entity type
whose attribute value for JobType is 'Secretary' belong to the subclass.

Predicate defined subclasses are displayed by writing the predicate condition next to the line
that connects the subclass to the specialization circle.

Attribute-defined specialization
If all subclasses in a specialization have their membership condition on the same attribute of
the super class, the specialization is called an attribute-defined specialization, and the
attribute is called the defining attribute.

Attribute-defined specializations are displayed by placing the defining attribute name next to
the arc from the circle to the super class.
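
Predicate-defined and attribute-defined specializations can be sketched in a few lines. The following is a minimal illustration in Python; the employee records and the job_type key are hypothetical and stand in for the JobType attribute discussed above.

```python
# Hypothetical Employee entities; job_type plays the role of JobType.
employees = [
    {"name": "Lata", "job_type": "Secretary"},
    {"name": "Arun", "job_type": "Engineer"},
    {"name": "Kiran", "job_type": "Secretary"},
]


def is_secretary(employee):
    # Defining predicate of the Secretary subclass: JobType = 'Secretary'.
    return employee["job_type"] == "Secretary"


# Predicate-defined subclass: membership follows from the predicate alone.
secretaries = [e["name"] for e in employees if is_secretary(e)]

# Attribute-defined specialization: every subclass is determined by the
# value of the same defining attribute (job_type).
subclasses = {}
for e in employees:
    subclasses.setdefault(e["job_type"], []).append(e["name"])

print(secretaries)
print(subclasses)
```

In both cases nobody assigns an entity to a subclass by hand; membership is computed from the stored attribute value, which is what distinguishes these cases from a user-defined specialization.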

User-defined specialization

When we do not have a condition for determining membership in a subclass the subclass is
called user-defined.

Membership to a subclass is determined by the database users when they add an entity to the
subclass.

Disjointness / Overlap Constraint

Specifies that the subclasses of the specialization must be disjoint, which means that an entity
can be a member of, at most, one subclass of the specialization.

The d in the specialization circle stands for disjoint.

If the subclasses are not constrained to be disjoint, they overlap.

Overlap means that an entity can be a member of more than one subclass of the
specialization.

Overlap constraint is shown by placing an o in the specialization circle.

Completeness Constraint

The completeness constraint may be either total or partial.

A total specialization constraint specifies that every entity in the super class must be a
member of at least one subclass of the specialization.

Total specialization is shown by using a double line to connect the super class to the circle.

A single line is used to display a partial specialization, meaning that an entity does not have
to belong to any of the subclasses.

Disjointness vs. Completeness

Disjointness constraints and completeness constraints are independent. The following
combinations of constraints on specializations are possible:

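o Disjoint, total: Department is specialized into Academic and Administrative.
o Disjoint, partial: Employee is specialized into Secretary, Analyst and Engineer.
o Overlapping, total: Part is specialized into Manufactured and Purchased.
o Overlapping, partial: Movie is specialized into Children, Comedy and Drama.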
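
The disjointness and completeness constraints can be checked mechanically once subclass membership is known. The following is a minimal sketch in Python; the function name check_specialization and the part identifiers P1 to P3 are assumptions made for this illustration.

```python
def check_specialization(superclass, subclasses, disjoint, total):
    """Return a list of constraint violations (empty if there are none).

    superclass: set of entity identifiers
    subclasses: dict mapping a subclass name to a set of entity identifiers
    disjoint:   True if an entity may belong to at most one subclass
    total:      True if every entity must belong to at least one subclass
    """
    violations = []
    for entity in sorted(superclass):
        memberships = [name for name, members in subclasses.items() if entity in members]
        if disjoint and len(memberships) > 1:
            violations.append(f"{entity} is in more than one subclass: {memberships}")
        if total and not memberships:
            violations.append(f"{entity} is in no subclass")
    return violations


# Hypothetical parts: P2 is both manufactured and purchased.
parts = {"P1", "P2", "P3"}
part_subclasses = {"Manufactured": {"P1", "P2"}, "Purchased": {"P2", "P3"}}

overlapping_total = check_specialization(parts, part_subclasses, disjoint=False, total=True)
disjoint_total = check_specialization(parts, part_subclasses, disjoint=True, total=True)

print(overlapping_total)
print(disjoint_total)
```

The same data satisfies an overlapping, total specialization but violates a disjoint, total one, which shows that the two constraints are chosen independently.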

Chapter-I
INTRODUCTION TO DBMS AND DATA MODELING.
End Chapter quizzes:

Q1. Entity is represented by the symbol
(a) Circle
(b) Ellipse
(c) Rectangle
(d) Square

Q2. A relationship is
(a) an item in an application
(b) a meaningful dependency between entities
(c) a collection of related entities
(d) related data

Q3. Overall logical structure of a database can be expressed graphically by


a). ER diagram
b). Records
c). Relations
d). Hierarchy.
Q4. In three schemas architecture a specific view of data given to a particular user is defined at
a) Internal Level
b) External Level
c) Conceptual Level
d) Physical Level
Q5. By data redundancy in a file based system we mean that

(a) Unnecessary data is stored


(b) Same data is duplicated in many files
(c) Data is unavailable
(d) Files have redundant data
Q6. Entities are identified from the word statement of a problem by
(a) picking words which are adjectives
(b) picking words which are nouns
(c) picking words which are verbs
(d) picking words which are pronouns
Q7. Data independence allows
(a) sharing the same database by several applications
(b) extensive modification of applications
(c) no data sharing between applications
(d) elimination of several application programs

Q8. Access right to a database is controlled by
(a) top management
(b) system designer
(c) system analyst
(d) database administrator
Q9. Data integrity in a file based system may be lost because
(a) the same variable may have different values in different files
(b) files are duplicated
(c) unnecessary data is stored in files
(d) redundant data is stored in files
Q10. Characteristics of an entity set are known as:
a) Attributes
b) Cardinality
c) Relationship
d) Many to Many Relation
Q11. Vehicle identification number, color, weight, and horsepower best exemplify:
a.) entities.
b.) entity types.
c.) data markers.
d.) attributes.
Q12. If each employee can have more than one skill, then skill is referred to as a:
a.) gerund.
b.) multivalued attribute.
c.) nonexclusive attribute.
d.) repeating attribute
Q13. The data structure used in the hierarchical model is
a) Tree
b) Graph
c) Table
d) None of these.
Q14. By data security in DBMS we mean
(a) preventing access to data
(b) allowing access to data only to authorized users
(c) preventing changing data
(d) introducing integrity constraints

Chapter-2
RELATIONAL DATABASE MODEL
2. Introductory Concepts
Relational Database Management System
A Relational Database Management System (RDBMS) provides a complete and integrated
approach to information management. A relational model provides the basis for a relational
database. A relational model has three aspects:

Structures

Operations

Integrity rules

Structures consist of a collection of objects or relations that store data. An example of a relation
is a table. You can store information in a table and use the table to retrieve and modify data.
Operations are used to manipulate data and structures in a database. When using operations,
you must adhere to a predefined set of integrity rules.
Integrity rules are laws that govern the operations allowed on data in a database. They ensure
data accuracy and consistency.
Relational database components include:
Table
Row
Column
Field
Primary key
Foreign key

Figure Relational database components


A Table is a basic storage structure of an RDBMS and consists of columns and rows. A table
represents an entity. For example, the S_DEPT table stores information about the departments of
an organization.
A Row is a combination of column values in a table and is identified by a primary key. Rows are
also known as records. For example, a row in the table S_DEPT contains information about one
department.

A Column is a collection of one type of data in a table. Columns represent the attributes of an
object. Each column has a column name and contains values that are bound by the same type and
size. For example, a column in the table S_DEPT specifies the names of the departments in the
organization.

A Field is an intersection of a row and a column. A field contains one data value. If there is no
data in the field, the field is said to contain a NULL value.

Figure Table, Row, Column & Field


A Primary key is a column or a combination of columns that is used to uniquely identify each
row in a table. For example, the column containing department numbers in the S_DEPT table is
created as a primary key and therefore every department number is different. A primary key must
contain a value. It cannot contain a NULL value.

A Foreign key is a column or set of columns that refers to a primary key in the same table or
another table. You use foreign keys to establish logical connections between, or within, tables.
A foreign key must either match a primary key or else be NULL. Rows are connected logically
when required. The logical connections are based upon conditions that define a relationship
between corresponding values, typically between a primary key and a matching foreign key. This
relational method of linking provides great flexibility as it is independent of physical links
between records.
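
The components described above can be tried out with a small in-memory database. The following is a minimal sketch, assuming Python's built-in sqlite3 module. The column names of S_DEPT, the S_EMP table and all data values are assumptions made for illustration; only the table name S_DEPT comes from the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite checks foreign keys only when this setting is switched on.
conn.execute("PRAGMA foreign_keys = ON")

# Table with a primary key column (id) and two ordinary columns.
conn.execute("CREATE TABLE s_dept (id INTEGER PRIMARY KEY, name TEXT NOT NULL, region TEXT)")
# dept_id is a foreign key that refers to the primary key of s_dept.
conn.execute("""CREATE TABLE s_emp (
    id INTEGER PRIMARY KEY,
    last_name TEXT NOT NULL,
    dept_id INTEGER REFERENCES s_dept (id))""")

conn.execute("INSERT INTO s_dept VALUES (10, 'Finance', NULL)")   # region field is NULL
conn.execute("INSERT INTO s_dept VALUES (31, 'Sales', 'North')")
conn.execute("INSERT INTO s_emp VALUES (1, 'Patel', 31)")         # matches a primary key
conn.execute("INSERT INTO s_emp VALUES (2, 'Rao', NULL)")         # NULL foreign key is allowed


def rejected(sql):
    """Return True if the DBMS refuses the statement on integrity grounds."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False


# A second department numbered 10 would duplicate a primary key value.
duplicate_key_rejected = rejected("INSERT INTO s_dept VALUES (10, 'Operations', NULL)")
# Department 99 does not exist, so this foreign key matches no primary key.
dangling_fk_rejected = rejected("INSERT INTO s_emp VALUES (3, 'Khan', 99)")

print(duplicate_key_rejected, dangling_fk_rejected)
```

Each row of s_dept is one department, each column holds one kind of value, and the intersection of row 10 with the region column is a field that currently contains NULL.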

Figure Primary & Foreign key

RDBMS Properties
An RDBMS is easily accessible. You execute commands in the Structured Query Language
(SQL) to manipulate data. SQL is the International Organization for Standardization (ISO)
standard language for interacting with an RDBMS.

An RDBMS provides full data independence. The organization of the data is independent of the
applications that use it. You do not need to specify the access routes to tables or know how data
is physically arranged in a database.

A relational database is a collection of individual, named objects. The basic unit of data storage
in a relational database is called a table. A table consists of rows and columns used to store
values. For access purposes, the order of rows and columns is insignificant. You can control the
access order as required.

Figure SQL & Database


When querying the database, you use conditional operations such as joins and restrictions. A join
combines data from separate tables by matching related rows. A restriction limits the specific
rows returned by a query.
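
A restriction and a join can be compared side by side. The following is a minimal sketch, assuming Python's built-in sqlite3 module; the tables s_dept and s_emp, their columns and their rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE s_dept (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE s_emp (id INTEGER PRIMARY KEY, last_name TEXT, dept_id INTEGER);
INSERT INTO s_dept VALUES (10, 'Finance'), (31, 'Sales');
INSERT INTO s_emp VALUES (1, 'Patel', 31), (2, 'Ngao', 10), (3, 'Smith', 31);
""")

# Restriction: the WHERE clause limits which rows are returned.
sales_staff = conn.execute(
    "SELECT last_name FROM s_emp WHERE dept_id = 31 ORDER BY last_name"
).fetchall()

# Join: each s_emp row is combined with the s_dept row whose id matches.
staff_with_dept = conn.execute(
    "SELECT e.last_name, d.name "
    "FROM s_emp e JOIN s_dept d ON e.dept_id = d.id "
    "ORDER BY e.last_name"
).fetchall()

print(sales_staff)
print(staff_with_dept)
```

Neither query states how the rows are located on disk; the conditions refer only to column values.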

Figure Conditional operations

An RDBMS enables data sharing between users. At the same time, you can ensure consistency
of data across multiple tables by using integrity constraints. An RDBMS uses various types of
data integrity constraints. These types include entity, column, referential and user-defined
constraints.

The entity constraint ensures uniqueness of rows, and the column constraint ensures
consistency of the type of data within a column. The referential constraint ensures validity of
foreign keys, and user-defined constraints are used to enforce specific business rules.
An RDBMS minimizes the redundancy of data. This means that similar data is not

3. Codd's 12 rules
Codd's 12 rules are a set of twelve rules proposed by E. F. Codd, a pioneer of the relational
model for databases, designed to define what is required from a database management system in
order for it to be considered relational, i.e., an RDBMS. Codd produced these rules as part of a
personal campaign to prevent his vision of the relational database being diluted.

Rule 1: The information rule:


All information in the database is to be represented in one and only one way, namely by values
in column positions within rows of tables.
Rule 2: The guaranteed access rule:
All data must be accessible with no ambiguity. This rule is essentially a restatement of the
fundamental requirement for primary keys. It says that every individual scalar value in the
database must be logically addressable by specifying the name of the containing table, the name
of the containing column and the primary key value of the containing row.
Rule 3: Systematic treatment of null values:
The DBMS must allow each field to remain null (or empty). Specifically, it must support a
representation of "missing information and inapplicable information" that is systematic, distinct
from all regular values (for example, "distinct from zero or any other number", in the case of
numeric values), and independent of data type. It is also implied that such representations must
be manipulated by the DBMS in a systematic way.
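
The difference between a null and a regular value such as zero can be observed directly. The following is a minimal sketch, assuming Python's built-in sqlite3 module and a hypothetical account table; it illustrates the idea behind the rule and is not a statement about how fully any particular product complies with it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (acc_no TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany(
    "INSERT INTO account VALUES (?, ?)",
    [("A-1", 0), ("A-2", None), ("A-3", 500)],  # None is stored as NULL
)

# A balance of zero and a missing balance are different things.
zero_rows = conn.execute("SELECT acc_no FROM account WHERE balance = 0").fetchall()
null_rows = conn.execute("SELECT acc_no FROM account WHERE balance IS NULL").fetchall()

# COUNT(balance) skips NULLs, COUNT(*) counts every row.
counted = conn.execute("SELECT COUNT(balance), COUNT(*) FROM account").fetchone()

print(zero_rows, null_rows, counted)
```

The row with the missing balance is not returned by the comparison with zero and is reached only through IS NULL, which is the kind of systematic, type-independent treatment the rule asks for.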
Rule 4: Active online catalog based on the relational model:

The system must support an online, inline, relational catalog that is accessible to authorized users
by means of their regular query language. That is, users must be able to access the database's
structure (catalog) using the same query language that they use to access the database's data.
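
As an illustration of querying the catalog with the ordinary query language, the following minimal sketch uses Python's built-in sqlite3 module. SQLite keeps its schema in a table named sqlite_master that can be read with a normal SELECT; the two user tables are hypothetical, and the example is meant only to show the idea, not to claim that SQLite meets every part of the rule.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE s_dept (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE s_emp (id INTEGER PRIMARY KEY, last_name TEXT)")

# The catalog is itself read with a SELECT statement, like any other table.
tables = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
).fetchall()

print(tables)
```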
Rule 5: The comprehensive data sublanguage rule:
The system must support at least one relational language that
o Has a linear syntax,
o Can be used both interactively and within application programs,
o Supports data definition operations (including view definitions), data manipulation
operations (update as well as retrieval), security and integrity constraints,
and transaction management operations (begin, commit, and rollback).


Rule 6: The view updating rule:
All views that are theoretically updatable must be updatable by the system.
Rule 7: High-level insert, update, and delete:
The system must support set-at-a-time insert, update, and delete operators. This means that data
can be retrieved from a relational database in sets constructed of data from multiple rows and/or
multiple tables. This rule states that insert, update, and delete operations should be supported for
any retrievable set rather than just for a single row in a single table.
Rule 8: Physical data independence:
Changes to the physical level (how the data is stored, whether in arrays or linked lists etc.) must
not require a change to an application based on the structure.
Rule 9: Logical data independence:
Changes to the logical level (tables, columns, rows, and so on) must not require a change to an
application based on the structure. Logical data independence is more difficult to achieve than
physical data independence.

Rule 10: Integrity independence:


Integrity constraints must be specified separately from application programs and stored in
the catalog. It must be possible to change such constraints as and when appropriate without
unnecessarily affecting existing applications.
Rule 11: Distribution independence:

The distribution of portions of the database to various locations should be invisible to users of
the database. Existing applications should continue to operate successfully:
o when a distributed version of the DBMS is first introduced; and
o when existing distributed data are redistributed around the system.

Rule 12: The nonsubversion rule:


If the system provides a low-level (record-at-a-time) interface, then that interface cannot be used
to subvert the system, for example, bypassing a relational security or integrity constraint.

3. Data Integrity and Integrity Rules


Data integrity is a very important concept in database operations in particular, and in data
warehousing and business intelligence in general, because it ensures that only high-quality,
correct and consistent data is accessible to users. The database designer is responsible
for incorporating elements that promote the accuracy and reliability of the data stored in the
database. Many different techniques can be used to encourage data integrity, some of
them dependent on the database technology being used. Here we discuss the
two most common integrity rules.
Integrity rule 1: Entity integrity
It says that no component of a primary key may be null.
All entities must be distinguishable. That is, they must have a unique identification of some kind.
Primary keys perform the unique identification function in a relational database. An identifier that
was wholly null would be a contradiction in terms. It would mean that there was some entity that
did not have any unique identification, that is, one that was not distinguishable from other
entities. If two entities are not distinguishable from each other, then by definition there are not
two entities but only one.
Integrity rule 2: Referential integrity
The referential integrity constraint is specified between two relations and is used to
maintain the consistency among tuples of the two relations.
Suppose we wish to ensure that a value that appears in one relation for a given set of attributes
also appears for a certain set of attributes in another relation. This is referential integrity.
The referential integrity constraint states that a tuple in one relation that refers to another
relation must refer to an existing tuple in that relation. Referential integrity is therefore
a constraint specified on more than one relation, and it ensures that consistency is maintained
across the relations.
Table A
DeptID    DeptName    DeptManager
F-1001    Financial   Nathan
S-2012    Software    Martin
H-0001    HR          Jason

Table B
EmpNo    DeptID    EmpName
1001     F-1001    Tommy
1002     S-2012    Will
1003     H-0001    Jonathan
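
The two integrity rules can be checked against Table A and Table B with a few lines of code. The following is a minimal sketch in Python. It assumes that DeptID is the primary key of Table A, that EmpNo is the primary key of Table B, and that DeptID in Table B is a foreign key referring to Table A; the function names are hypothetical.

```python
departments = [  # Table A
    {"DeptID": "F-1001", "DeptName": "Financial", "DeptManager": "Nathan"},
    {"DeptID": "S-2012", "DeptName": "Software", "DeptManager": "Martin"},
    {"DeptID": "H-0001", "DeptName": "HR", "DeptManager": "Jason"},
]
employees = [  # Table B
    {"EmpNo": 1001, "DeptID": "F-1001", "EmpName": "Tommy"},
    {"EmpNo": 1002, "DeptID": "S-2012", "EmpName": "Will"},
    {"EmpNo": 1003, "DeptID": "H-0001", "EmpName": "Jonathan"},
]


def entity_integrity_ok(rows, key):
    """Rule 1: the primary key is never null and never repeated."""
    values = [row[key] for row in rows]
    return None not in values and len(values) == len(set(values))


def dangling_references(child_rows, fk, parent_rows, pk):
    """Rule 2: list the child rows whose foreign key matches no parent row."""
    parent_keys = {row[pk] for row in parent_rows}
    return [row for row in child_rows
            if row[fk] is not None and row[fk] not in parent_keys]


print(entity_integrity_ok(departments, "DeptID"))
print(dangling_references(employees, "DeptID", departments, "DeptID"))

# A hypothetical employee assigned to a department that does not exist.
bad_employees = employees + [{"EmpNo": 1004, "DeptID": "X-9999", "EmpName": "Sam"}]
violations = dangling_references(bad_employees, "DeptID", departments, "DeptID")
print(violations)
```

With the data as given, both rules hold. Adding an employee whose DeptID has no match in Table A produces exactly one violation of referential integrity.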

4. Relational algebra
Relational algebra is a procedural query language, which consists of a set of operations that take
one or two relations as input and produce a new relation as their result. The fundamental
operations that will be discussed in this section are: select, project, union, and set difference.
Besides the fundamental operations, the following additional operation will be discussed: set
intersection.
Each operation will be applied to tables of a sample database. Each table is otherwise known as a
relation and each row within the table is referred to as a tuple. The sample database consists of
tables that one might see in a bank. It consists of the following 6
relations:

Account
branch-name    account-number    balance
Downtown       A-101             500
Mianus         A-215             700
Perryridge     A-102             400
Round Hill     A-305             350
Brighton       A-201             900
Redwood        A-222             700
Brighton       A-217             750

Branch
branch-name    branch-city    assets
Downtown       Brooklyn       9000000
Redwood        Palo Alto      2100000
Perryridge     Horseneck      1700000
Mianus         Horseneck      400000
Round Hill     Horseneck      8000000
Pownal         Bennington     300000
North Town     Rye            3700000
Brighton       Brooklyn       7100000

Customer
customer-name    customer-street    customer-city
Jones            Main               Harrison
Smith            North              Rye
Hayes            Main               Harrison
Curry            North              Rye
Lindsay          Park               Pittsfield
Turner           Putnam             Stamford
Williams         Nassau             Princeton
Adams            Spring             Pittsfield
Johnson          Alma               Palo Alto
Glenn            Sand Hill          Woodside
Brooks           Senator            Brooklyn
Green            Walnut             Stamford

Depositor
customer-name    account-number
Johnson          A-101
Smith            A-215
Hayes            A-102
Turner           A-305
Johnson          A-201
Jones            A-217
Lindsay          A-222

Loan
branch-name    loan-number    amount
Downtown       L-17           1000
Redwood        L-23           2000
Perryridge     L-15           1500
Downtown       L-14           1500
Mianus         L-93           500
Round Hill     L-11           900
Perryridge     L-16           1300

Borrower
customer-name    loan-number
Jones            L-17
Smith            L-23
Hayes            L-15
Jackson          L-14
Curry            L-93
Smith            L-11
Williams         L-17
Adams            L-16

The Select operation is a unary operation, which means it operates on one relation. Its function is
to select tuples that satisfy a given predicate. To denote selection, the lowercase Greek letter
sigma ($\sigma$) is used. The predicate appears as a subscript to $\sigma$. The argument relation
is given in parentheses following the $\sigma$.

For example, to select those tuples of the loan relation where the branch is "Perryridge," we
write:
$\sigma_{\text{branch-name} = \text{"Perryridge"}}(\text{loan})$

The results of the query are the following:

branch-name    loan-number    amount
Perryridge     L-15           1500
Perryridge     L-16           1300

Comparisons like $=$, $\neq$, $<$, $\leq$, $>$ and $\geq$ can also be used in the selection
predicate. An example query using a comparison, to find all tuples in which the amount lent is
more than $1200, would be written:
$\sigma_{\text{amount} > 1200}(\text{loan})$
The project operation is a unary operation that returns its argument relation with certain
attributes left out. Since a relation is a set, any duplicate rows are eliminated. Projection is
denoted by the Greek letter pi ($\Pi$). The attributes that we wish to appear in the result are
listed as a subscript to $\Pi$. The argument relation follows in parentheses. For example, the
query to list all loan numbers and the amount of each loan is written as:
$\Pi_{\text{loan-number}, \text{amount}}(\text{loan})$
The result of the query is the following:

loan-number    amount
L-17           1000
L-23           2000
L-15           1500
L-14           1500
L-93           500
L-11           900
L-16           1300

A more complicated example query, which finds the names of those customers who live in
Harrison, is written as:
$\Pi_{\text{customer-name}}(\sigma_{\text{customer-city} = \text{"Harrison"}}(\text{customer}))$
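
The select and project operations can be sketched with ordinary sets of tuples. The following is a minimal illustration in Python that uses the loan relation from the sample database. The function names select and project and the way a relation is represented (a tuple of attribute names plus a set of rows) are assumptions of this sketch, not a standard API.

```python
LOAN_ATTRS = ("branch-name", "loan-number", "amount")
loan = {
    ("Downtown", "L-17", 1000),
    ("Redwood", "L-23", 2000),
    ("Perryridge", "L-15", 1500),
    ("Downtown", "L-14", 1500),
    ("Mianus", "L-93", 500),
    ("Round Hill", "L-11", 900),
    ("Perryridge", "L-16", 1300),
}


def select(attrs, rows, predicate):
    """Selection: keep the tuples for which the predicate is true."""
    return {row for row in rows if predicate(dict(zip(attrs, row)))}


def project(attrs, rows, keep):
    """Projection: keep only the listed attributes; the set drops duplicates."""
    positions = [attrs.index(name) for name in keep]
    return {tuple(row[i] for i in positions) for row in rows}


# Loans made at the Perryridge branch.
perryridge = select(LOAN_ATTRS, loan, lambda t: t["branch-name"] == "Perryridge")

# Loans of more than 1200.
large = select(LOAN_ATTRS, loan, lambda t: t["amount"] > 1200)

# Projecting on amount alone: 1500 occurs twice in loan but once in the result.
amounts = project(LOAN_ATTRS, loan, ["amount"])

# Composition: loan numbers of the loans of more than 1200.
large_numbers = project(LOAN_ATTRS, large, ["loan-number"])

print(sorted(perryridge))
print(sorted(amounts))
print(sorted(large_numbers))
```

Because the result of each operation is again a set of tuples, the output of select can be passed straight to project, which is how the nested Harrison query is built.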
The union operation yields the results that appear in either or both of two relations. It is a binary
operation denoted by the symbol $\cup$.

An example query would be to find the names of all bank customers who have either an account
or a loan or both. To find this result we need the information in the depositor relation and in
the borrower relation. To find the names of all customers with a loan in the bank we would write:
$\Pi_{\text{customer-name}}(\text{borrower})$
and to find the names of all customers with an account in the bank, we would write:
$\Pi_{\text{customer-name}}(\text{depositor})$
Then, by using the union operation on these two queries, we obtain the query we need.
The final query is written as:
$\Pi_{\text{customer-name}}(\text{borrower}) \cup \Pi_{\text{customer-name}}(\text{depositor})$
The result of the query is the following:

customer-name
Johnson
Smith
Hayes
Turner
Jones
Lindsay
Jackson
Curry
Williams
Adams
The set intersection operation is denoted by the symbol $\cap$. It is not a fundamental operation;
however, it is a more convenient way to write $r - (r - s)$.

An example query of the operation, to find all customers who have both a loan and an account,
can be written as:
$\Pi_{\text{customer-name}}(\text{borrower}) \cap \Pi_{\text{customer-name}}(\text{depositor})$

The results of the query are the following:


customer-name
Hayes
Jones
Smith

Set Difference Operation: Set difference is denoted by the minus sign ($-$). It finds tuples that
are in one relation, but not in another. Thus $r - s$ results in a relation containing tuples that
are in $r$ but not in $s$.

Cartesian Product Operation: The Cartesian product of two relations $r$ and $s$ is denoted by a
cross ($\times$), written $r \times s$. The result of $r \times s$ is a new relation with a tuple
for each possible pairing of tuples from $r$ and $s$.
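
The set operations of this section map directly onto Python's set type. The following is a minimal sketch that uses the borrower and depositor relations from the sample database; representing each relation as a set of tuples is an assumption of the illustration.

```python
from itertools import product

borrower = {
    ("Jones", "L-17"), ("Smith", "L-23"), ("Hayes", "L-15"), ("Jackson", "L-14"),
    ("Curry", "L-93"), ("Smith", "L-11"), ("Williams", "L-17"), ("Adams", "L-16"),
}
depositor = {
    ("Johnson", "A-101"), ("Smith", "A-215"), ("Hayes", "A-102"), ("Turner", "A-305"),
    ("Johnson", "A-201"), ("Jones", "A-217"), ("Lindsay", "A-222"),
}

# Projection on customer-name; the set removes repeated names.
borrower_names = {name for name, _ in borrower}
depositor_names = {name for name, _ in depositor}

either = borrower_names | depositor_names           # union
both = borrower_names & depositor_names             # intersection
only_borrowers = borrower_names - depositor_names   # set difference

# Intersection written with difference alone: r - (r - s).
both_via_difference = borrower_names - (borrower_names - depositor_names)

# Cartesian product: one combined tuple for every pairing of tuples.
pairs = {b + d for b, d in product(borrower, depositor)}

print(sorted(either))
print(sorted(both))
print(sorted(only_borrowers))
print(len(pairs))
```

The union contains ten names and the intersection contains Hayes, Jones and Smith, in agreement with the query results listed in this section.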

Chapter-2
RELATIONAL DATABASE MODEL
End Chapter quizzes:
Q1. Which of the following are characteristics of an RDBMS?
a) Data are organized in a series of two-dimensional tables each of which contains records for one
entity.
b) Queries are possible on individual or groups of tables.
c) It cannot use SQL.
d) Tables are linked by common data known as keys.

Q2. The keys that can have NULL values are


a). Primary Key
b). Unique Key
c). Foreign Key
d). Both b and c
Q3 . GRANT and REVOKE are
(a) DDL statements
(b) DML statements
(c) DCL statements
(d) None of these.
Q4. Rows of a relation are called
(a) tuples
(b) a relation row
(c) a data structure
(d) an entity
Q5. Primary Key column in the Table
(a) Can't accept NULL values
(b) Can't accept duplicate values
(c) Can't be more than one
(d) All of the above
Q6. A table can have how many primary keys
A). any number
B). 1
C). 255
D). None of the above

Q7. Projection operation is:
a) Unary operation
b) Ternary operation
c) binary operation
d) None of the above

Q8. The keys that can have NULL values are


A). Primary Key
B). Unique Key
C). Foreign Key

D). Both b and c


Q9. Referential integrity constraint is specified between two relations

a) True
b) False
Q10 Union operation in relational algebra is performed on
a) Single Relation
b) Two relation
c) Both a and b
d) None
Q11. As per Codd's rule NULL value is same as
a) blank space
b) Zero
c) Character string
d) None of the above.
Q12 Relational Algebra is a non procedural query language
a) True
b) False

Chapter: 3
FUNCTIONAL DEPENDENCY AND NORMALIZATION

1. Functional Dependency
Consider a relation R that has two attributes A and B. The attribute B of the relation is
functionally dependent on the attribute A if and only if for each value of A no more than one
value of B is associated. In other words, the value of attribute A uniquely determines the value of
B and if there were several tuples that had the same value of A then all these tuples will have an
identical value of attribute B. That is, if t1 and t2 are two tuples in the relation R and t1(A) =
t2(A) then we must have t1(B) = t2(B).
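
The tuple-based definition above can be sketched as a small check over one relation instance. This is only an illustration: the function name `fd_holds` and the sample rows are assumptions, not part of the course material. Note also that an instance can only refute a functional dependency; it cannot prove that the dependency holds for every legal state of the database.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if lhs -> rhs is not contradicted by this instance.

    rows is a list of dicts (one dict per tuple); lhs and rhs are lists
    of attribute names.
    """
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first value seen for this key and returns it;
        # a different value for the same key contradicts the dependency.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical sample data, invented for illustration only.
students = [
    {"sno": 1, "sname": "Asha", "address": "Delhi"},
    {"sno": 2, "sname": "Ravi", "address": "Noida"},
    {"sno": 3, "sname": "Asha", "address": "Agra"},
]

print(fd_holds(students, ["sno"], ["sname"]))  # True
print(fd_holds(students, ["sname"], ["sno"]))  # False: two students are called Asha
```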

A and B need not be single attributes. They could be any subsets of the attributes of a relation R
(possibly single attributes). We may then write
R.A -> R.B
if B is functionally dependent on A (or A functionally determines B). Note that functional
dependency does not imply a one-to-one relationship between A and B, although a one-to-one
relationship may exist between A and B.
A simple example of the above functional dependency is when A is a primary key of an entity
(e.g. student number) and B is some single-valued property or attribute of the entity (e.g. date of
birth). A -> B then must always hold.
Functional dependencies also arise in relationships. Let C be the primary key of an entity and D
be the primary key of another entity. Let the two entities have a relationship. If the relationship is
one-to-one, we must have C -> D and D -> C. If the relationship is many-to-one, we would have
C -> D but not D -> C. For many-to-many relationships, no functional dependencies hold. For
example, if C is student number and D is subject number, there is no functional dependency
between them. If, however, we were storing marks and grades in the database as well, we would
have
(student_number, subject_number) -> marks
and we might have
marks -> grades
The second functional dependency above assumes that the grades depend only on the
marks. This may sometimes not be true, since the instructor may decide to take other
considerations into account in assigning grades, for example, the class average mark.
For example, in the student database that we have discussed earlier, we have the following
functional dependencies:
sno -> sname
sno -> address
cno -> cname
cno -> instructor
instructor -> office
These functional dependencies imply that there can be only one name for each sno, only one
address for each student and only one subject name for each cno. It is of course possible that
several students may have the same name and several students may live at the same address. If
we consider cno -> instructor, the dependency implies that no subject can have more than one
instructor (perhaps this is not a very realistic assumption). Functional dependencies therefore
place constraints on what information the database may store. In the above example, one may
wonder whether the following FDs hold:
sname -> sno
cname -> cno
Certainly there is nothing in the instance of the example database presented above that
contradicts these functional dependencies. However, whether they hold depends on whether
the university or college whose database we are considering allows duplicate
student names and subject names. If it is the enterprise policy to have unique subject names,
then cname -> cno holds. If duplicate student names are possible, and one would think there
is always the possibility of two students having exactly the same name, then sname -> sno does
not hold.

Functional dependencies arise from the nature of the real world that the database models. Often
A and B are facts about an entity where A might be some identifier for the entity and B some
characteristic. Functional dependencies cannot be automatically determined by studying one or
more instances of a database. They can be determined only by a careful study of the real world
and a clear understanding of what each attribute means.

We have noted above that the definition of functional dependency does not require that A and B
be single attributes. In fact, A and B may be collections of attributes. For example
(sno, cno) -> (mark, date)

When dealing with a collection of attributes, the concept of full functional dependence is an
important one. Let A and B be distinct collections of attributes from a relation R and let R.A ->
R.B. B is then fully functionally dependent on A if B is not functionally dependent on any
proper subset of A. The
above example of students and subjects would show full functional dependence if mark and date
are not functionally dependent on either student number (sno) or subject number (cno) alone.
This implies that we are assuming that a student may take more than one subject and a subject
may be taken by many different students. Furthermore, it has been assumed that there is at
most one enrolment of each student in the same subject.
The above example illustrates full functional dependence. However, the dependence
(sno, cno) -> instructor is not a full functional dependence because cno -> instructor holds.

As noted earlier, the concept of functional dependency is related to the concept of candidate key
of a relation, since a candidate key of a relation is an identifier which uniquely identifies a tuple
and therefore determines the values of all other attributes in the relation. Therefore, if a subset X
of the attributes of a relation R has the property that all remaining attributes of the
relation are functionally dependent on X, then X is a candidate key as long as no
attribute can be removed from X without losing that property. In the
example above, the attributes (sno, cno) form a candidate key (and the only one) since they
functionally determine all the remaining attributes.
Functional dependence is an important concept and a large body of formal theory has been
developed about it. We discuss the concept of closure that helps us derive all functional
dependencies that are implied by a given set of dependencies. Once a complete set of functional
dependencies has been obtained, we will study how these may be used to build normalised
relations.
Rules about Functional Dependencies
Let F be the set of FDs specified on R.
• We must be able to reason about the FDs in F.
• The schema designer usually states explicitly only the FDs that are obvious.
• Without knowing exactly what all the tuples are, we must be able to deduce the other FDs that
hold on R.
• This reasoning is essential when we discuss the design of good relational schemas.
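
Reasoning about FDs is usually done through the closure of a set of attributes. The sketch below uses the dependencies of the student example from this section; the helper names (`closure`, `is_candidate_key`, `fully_dependent`) and the idea of placing all the attributes in one relation are assumptions made only for illustration.

```python
def closure(attrs, fds):
    """Attributes determined by attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If we already have the whole left side, we gain the right side.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_candidate_key(attrs, all_attrs, fds):
    """True if attrs determines every attribute and no attribute can be dropped."""
    attrs = set(attrs)
    if closure(attrs, fds) != set(all_attrs):
        return False
    return all(closure(attrs - {a}, fds) != set(all_attrs) for a in attrs)


def fully_dependent(lhs, rhs, fds):
    """True if lhs -> rhs holds and fails once any attribute is removed from lhs."""
    lhs, rhs = set(lhs), set(rhs)
    if not rhs <= closure(lhs, fds):
        return False
    return all(not rhs <= closure(lhs - {a}, fds) for a in lhs)


FDS = [
    (("sno",), ("sname", "address")),
    (("cno",), ("cname", "instructor")),
    (("instructor",), ("office",)),
    (("sno", "cno"), ("mark", "date")),
]
ALL_ATTRS = {"sno", "sname", "address", "cno", "cname",
             "instructor", "office", "mark", "date"}

print(sorted(closure({"cno"}, FDS)))  # ['cname', 'cno', 'instructor', 'office']
print(is_candidate_key({"sno", "cno"}, ALL_ATTRS, FDS))      # True
print(fully_dependent({"sno", "cno"}, {"mark"}, FDS))        # True
print(fully_dependent({"sno", "cno"}, {"instructor"}, FDS))  # False
```

Removing one attribute at a time is enough for both minimality checks, because a smaller set of attributes can never determine more than a larger one.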
Design of Relational Database Schemas
Problems such as redundancy that occur when we try to cram too much into a single relation are
called anomalies. The principal kinds of anomalies that we encounter are:
• Redundancy. Information may be repeated unnecessarily in several tuples.
• Update Anomalies. We may change information in one tuple but leave the same information
unchanged in another.
• Deletion Anomalies. If a set of values becomes empty, we may lose other information as a side
effect.

2. Normalization
When designing a database, a data model is usually translated into a relational schema. The important
question is whether there is a design methodology or whether the process is arbitrary. The answer
is that a methodology exists. There are certain properties that a good database design must possess,
as dictated by Codd's rules. There are many different ways of designing a good database. One
such methodology is Normalization. Normalization theory is built
around the concept of normal forms. Normalization reduces redundancy. Redundancy is the
unnecessary repetition of data. It can cause problems with the storage and retrieval of data. During
the process of normalization, dependencies that can cause problems during
deletion and update are identified. Normalization theory is based on the fundamental notion of dependency.
Normalization helps in simplifying the structure of the schema and tables.
To illustrate the normal forms, we will take an example of a database with the following logical
design:
Relation S {S#, SUPPLIERNAME, SUPPLYSTATUS, SUPPLYCITY}, Primary Key {S#}
Relation P {P#, PARTNAME, PARTCOLOR, PARTWEIGHT, SUPPLYCITY}, Primary Key {P#}
Relation SP {S#, SUPPLYCITY, P#, PARTQTY}, Primary Key {S#, P#}
Foreign Key {S#} References S
Foreign Key {P#} References P
S#    SUPPLYCITY    P#    PARTQTY
S1    Bombay        P1    3000
S1    Bombay        P2    2000
S1    Bombay        P3    4000
S1    Bombay        P4    2000
S1    Bombay        P5    1000
S1    Bombay        P6    1000
S2    Mumbai        P1    3000
S2    Mumbai        P2    4000
S3    Mumbai        P2    2000
S4    Madras        P2    2000
S4    Madras        P4    3000
S4    Madras        P5    4000

Let us examine the table above to find any design discrepancy. A quick glance reveals that some
of the data are being repeated. This is data redundancy, which is of course undesirable. The
fact that a particular supplier is located in a city has been repeated many times. This redundancy
causes many other related problems. For instance, after an update a supplier may be shown as
located in Madras in one entry and in Mumbai in another, which in turn gives rise to further
problems.
Therefore, the tables need to be refined. This process of refining a
given schema into another schema or a set of schemas possessing the qualities of a good database is
known as Normalization. Database experts have defined a series of normal forms, each
conforming to some specified design conditions.
Decomposition
Decomposition is the process of splitting a relation into two or more relations.
This is nothing but the projection process. Decompositions may or may not lose information. As
you will learn shortly, the normalization process involves breaking a given relation into two
or more relations, and these decompositions should be reversible, so that no
information is lost in the process. Thus, we will be more interested in the decompositions that
incur no loss of information than in the ones in which information is lost.

Lossless decomposition: A decomposition that results in relations without losing any
information is known as a lossless decomposition or nonloss decomposition. A decomposition
that results in loss of information is known as a lossy decomposition.

Consider the relation S {S#, SUPPLYSTATUS, SUPPLYCITY} with some instances of the entries as
shown below.

S
S#    SUPPLYSTATUS    SUPPLYCITY
S3    100             Madras
S5    100             Mumbai
Let us decompose this table into two as shown below:

(1)
SX                        SY
S#    SUPPLYSTATUS        S#    SUPPLYCITY
S3    100                 S3    Madras
S5    100                 S5    Mumbai

(2)
SX                        SY
S#    SUPPLYSTATUS        SUPPLYSTATUS    SUPPLYCITY
S3    100                 100             Madras
S5    100                 100             Mumbai

Let us examine these decompositions. In decomposition (1) no information is lost. We can still
say that S3's status is 100 and its location is Madras, and also that supplier S5 has 100 as its status
and Mumbai as its location. This decomposition is therefore lossless.
In decomposition (2), however, we can still say that the status of both S3 and S5 is 100, but the
location of the suppliers cannot be determined from these two tables. The information regarding the
location of the suppliers has been lost in this case. This is a lossy decomposition. Certainly,
lossless decomposition is more desirable because otherwise the decomposition will be
irreversible. The decomposition process is in fact projection, where some attributes are selected
from a table. A natural question arises here: why is the first decomposition lossless while the
second one is lossy? How should a given relation be decomposed so that the resulting
projections are nonlossy? The answer to these questions lies in functional dependencies and may be
given by the following theorem.

Heath's theorem: Let R {A, B, C} be a relation, where A, B and C are sets of attributes. If R
satisfies the FD A -> B, then R is equal to the join of its projections on {A, B} and {A, C}.
Let us apply this theorem to the decompositions described above. We observe that relation S
satisfies the irreducible set of FDs
S# -> SUPPLYSTATUS
S# -> SUPPLYCITY
Now taking A as S#, B as SUPPLYSTATUS, and C as SUPPLYCITY, this theorem confirms
that relation S can be nonloss-decomposed into its projections on {S#, SUPPLYSTATUS} and
{S#, SUPPLYCITY}. Note, however, that the theorem does not say why the projections {S#,
SUPPLYSTATUS} and {SUPPLYSTATUS, SUPPLYCITY} should be lossy. Yet we can see
that one of the FDs is lost in this decomposition. While the FD S# -> SUPPLYSTATUS is still
represented by the projection on {S#, SUPPLYSTATUS}, the FD S# -> SUPPLYCITY has been
lost.
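
The difference between the two decompositions can be checked mechanically by joining the projections back together and comparing the result with the original relation. The following is a minimal sketch using the two tuples of relation S shown above; the helper names `relation`, `project` and `natural_join` are assumptions introduced for illustration.

```python
def relation(rows):
    """Represent a relation as a set of tuples (each a frozenset of pairs)."""
    return {frozenset(row.items()) for row in rows}


def project(rel, attrs):
    """Projection: keep only attrs; duplicates vanish because rel is a set."""
    return {frozenset((a, dict(t)[a]) for a in attrs) for t in rel}


def natural_join(r, s):
    """Combine every pair of tuples that agree on their common attributes."""
    out = set()
    for t in r:
        for u in s:
            dt, du = dict(t), dict(u)
            if all(dt[a] == du[a] for a in dt.keys() & du.keys()):
                out.add(frozenset({**dt, **du}.items()))
    return out


S = relation([
    {"S#": "S3", "SUPPLYSTATUS": 100, "SUPPLYCITY": "Madras"},
    {"S#": "S5", "SUPPLYSTATUS": 100, "SUPPLYCITY": "Mumbai"},
])

# Decomposition (1): both projections keep S#.
d1 = natural_join(project(S, ["S#", "SUPPLYSTATUS"]),
                  project(S, ["S#", "SUPPLYCITY"]))
# Decomposition (2): the projections share only SUPPLYSTATUS.
d2 = natural_join(project(S, ["S#", "SUPPLYSTATUS"]),
                  project(S, ["SUPPLYSTATUS", "SUPPLYCITY"]))

print(d1 == S)   # True: decomposition (1) is lossless
print(len(d2))   # 4: the join produces two spurious tuples
```

In decomposition (2) the join pairs each supplier with both cities, so the original two tuples can no longer be told apart from the spurious ones.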
An alternative criterion for lossless decomposition is as follows. Let R be a relation schema, and
let F be a set of functional dependencies on R. Let R1 and R2 form a decomposition of R. This
decomposition is a lossless-join decomposition of R if at least one of the following functional
dependencies is in F+:
R1 ∩ R2 -> R1
R1 ∩ R2 -> R2
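
This criterion can be tested without looking at any data by computing the closure of the common attributes. The sketch below is one possible way to do so; the function names are assumptions, and the FDs are those of relation S given earlier.

```python
def closure(attrs, fds):
    """Attributes determined by attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def lossless_binary(r1, r2, fds):
    """True if (R1 ∩ R2) -> R1 or (R1 ∩ R2) -> R2 follows from fds."""
    determined = closure(set(r1) & set(r2), fds)
    return set(r1) <= determined or set(r2) <= determined


FDS = [(("S#",), ("SUPPLYSTATUS",)), (("S#",), ("SUPPLYCITY",))]

print(lossless_binary({"S#", "SUPPLYSTATUS"}, {"S#", "SUPPLYCITY"}, FDS))            # True
print(lossless_binary({"S#", "SUPPLYSTATUS"}, {"SUPPLYSTATUS", "SUPPLYCITY"}, FDS))  # False
```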

2.1 First Normal Form

A relation is in 1st Normal form (1NF) if and only if, in every legal value of that relation, every
tuple contains exactly one value for each attribute.
Although it is the simplest form, a 1NF relation has a number of discrepancies and is therefore
not the most desirable form of a relation.
Let us take a relation (modified to illustrate the point in discussion) as
Rel1 {S#, SUPPLYSTATUS, SUPPLYCITY, P#, PARTQTY}
Primary Key {S#, P#}
FD {SUPPLYCITY -> SUPPLYSTATUS}

Note that SUPPLYSTATUS is functionally dependent on SUPPLYCITY, meaning that a
supplier's status is determined by the location of that supplier, e.g. all suppliers from Madras
must have a status of 200. The primary key of the relation Rel1 is {S#, P#}.

Let us discuss some of the problems with this 1NF relation. For the purpose of
illustration, let us insert some sample tuples into this relation.

REL1
S#    SUPPLYSTATUS    SUPPLYCITY    P#    PARTQTY
S1    200             Madras        P1    3000
S1    200             Madras        P2    2000
S1    200             Madras        P3    4000
S1    200             Madras        P4    2000
S1    200             Madras        P5    1000
S1    200             Madras        P6    1000
S2    100             Mumbai        P1    3000
S2    100             Mumbai        P2    4000
S3    100             Mumbai        P2    2000
S4    200             Madras        P2    2000
S4    200             Madras        P4    3000
S4    200             Madras        P5    4000

The redundancies in the above relation cause many problems, usually known as update
anomalies, that is, problems in INSERT, DELETE and UPDATE operations. Let us see the problems
due to the supplier-city redundancy corresponding to the FD S# -> SUPPLYCITY.

INSERT: In this relation, unless a supplier supplies at least one part, we cannot insert the
information regarding that supplier. Thus, a supplier located in Kolkata is missing from the relation
because he has not supplied any part so far.
DELETE: Let us see what problem we may face during deletion of a tuple. If we delete the
tuple of a supplier (if there is a single entry for that supplier), we not only delete the fact that the
supplier supplied a particular part but also the fact that the supplier is located in a particular city.
In our case, if we delete the entries corresponding to S#=S2, we lose the information that the
supplier is located at Mumbai. This is definitely undesirable. The problem here is that too
much information is attached to each tuple, so a deletion forces us to lose too much
information.
UPDATE: If we modify the city of supplier S1 from Madras to Mumbai, we have to make sure
that all the entries corresponding to S#=S1 are updated, otherwise inconsistency will be
introduced. As a result, some entries will suggest that the supplier is located at Madras while
others will contradict this fact.
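
The three anomalies can be reproduced on a small copy of REL1. The sketch below uses Python's built-in `sqlite3` module with an in-memory database; the column names `sno` and `pno` stand in for S# and P# (an assumption made only because `#` is awkward in SQL identifiers), and only three of the sample tuples are loaded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE REL1 (
        sno          TEXT NOT NULL,
        supplystatus INTEGER,
        supplycity   TEXT,
        pno          TEXT NOT NULL,
        partqty      INTEGER,
        PRIMARY KEY (sno, pno)
    )""")
conn.executemany("INSERT INTO REL1 VALUES (?, ?, ?, ?, ?)", [
    ("S1", 200, "Madras", "P1", 3000),
    ("S1", 200, "Madras", "P2", 2000),
    ("S2", 100, "Mumbai", "P1", 3000),
])

# UPDATE anomaly: only one of S1's rows is changed, so S1 now has two cities.
conn.execute("UPDATE REL1 SET supplycity = 'Mumbai' WHERE sno = 'S1' AND pno = 'P1'")
inconsistent = conn.execute(
    "SELECT sno FROM REL1 GROUP BY sno HAVING COUNT(DISTINCT supplycity) > 1"
).fetchall()
print(inconsistent)  # [('S1',)]

# INSERT anomaly: a supplier without a part has no value for the key column pno.
try:
    conn.execute("INSERT INTO REL1 VALUES ('S5', 300, 'Kolkata', NULL, NULL)")
    insert_ok = True
except sqlite3.IntegrityError:
    insert_ok = False
print(insert_ok)  # False

# DELETE anomaly: removing S2's only shipment also removes S2's city.
conn.execute("DELETE FROM REL1 WHERE sno = 'S2'")
cities = conn.execute("SELECT supplycity FROM REL1 WHERE sno = 'S2'").fetchall()
print(cities)  # []
```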

2.2 Second Normal Form

A relation is in 2NF if and only if it is in 1NF and every nonkey attribute is fully functionally
dependent on the primary key. Here it has been assumed that there is only one candidate key,
which is of course the primary key.
A relation in 1NF can always be decomposed into an equivalent set of 2NF relations. The reduction
process consists of replacing the 1NF relation by suitable projections.
We have seen the problems arising from the low level of normalization (1NF) of the relation. The
remedy is to break the relation into two simpler relations:
REL2 {S#, SUPPLYSTATUS, SUPPLYCITY} and
REL3 {S#, P#, PARTQTY}
REL2 and REL3 are in 2NF with primary keys {S#} and {S#, P#} respectively. This is because each
of the nonkey attributes of REL2, {SUPPLYSTATUS, SUPPLYCITY}, is fully functionally dependent
on the primary key, S#. By a similar argument, REL3 is also in 2NF. Evidently, these two relations
have overcome the update anomalies stated earlier. Now it is possible to insert the facts
regarding supplier S5 even when he has not supplied any part, which was not possible earlier. This
solves the insert problem. Similarly, the delete and update problems are also solved.
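
A minimal sketch of the 2NF design follows, again using `sqlite3` with the assumed column names `sno` and `pno` in place of S# and P#. It shows that a supplier without shipments can now be stored, that deleting shipments no longer deletes the supplier's city, and that the original rows can be rebuilt with a join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE REL2 (
        sno          TEXT PRIMARY KEY,
        supplystatus INTEGER,
        supplycity   TEXT
    );
    -- The REFERENCES clause documents the link between the tables; SQLite
    -- enforces it only when PRAGMA foreign_keys is switched on.
    CREATE TABLE REL3 (
        sno     TEXT NOT NULL REFERENCES REL2(sno),
        pno     TEXT NOT NULL,
        partqty INTEGER,
        PRIMARY KEY (sno, pno)
    );
    INSERT INTO REL2 VALUES ('S1', 200, 'Madras'), ('S2', 100, 'Mumbai');
    INSERT INTO REL3 VALUES ('S1', 'P1', 3000), ('S1', 'P2', 2000), ('S2', 'P1', 3000);
    -- A supplier who has not supplied any part can now be recorded.
    INSERT INTO REL2 VALUES ('S5', 300, 'Kolkata');
""")

# Deleting S2's shipments keeps the fact that S2 is located in Mumbai.
conn.execute("DELETE FROM REL3 WHERE sno = 'S2'")
city = conn.execute("SELECT supplycity FROM REL2 WHERE sno = 'S2'").fetchone()
print(city)  # ('Mumbai',)

# The remaining REL1 rows are recovered by joining the two projections.
rebuilt = conn.execute("""
    SELECT sno, supplystatus, supplycity, pno, partqty
    FROM REL2 JOIN REL3 USING (sno)
    ORDER BY sno, pno
""").fetchall()
print(rebuilt)
```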
These relations in 2NF are still not free from all the anomalies. REL3 is free from most of the
problems we are going to discuss here; however, REL2 still carries some problems. The reason is
that the dependency of SUPPLYSTATUS on S#, though functional, is transitive via
SUPPLYCITY. Thus we see that there are two dependencies, S# -> SUPPLYCITY and
SUPPLYCITY -> SUPPLYSTATUS. This implies S# -> SUPPLYSTATUS, which is a
transitive dependency. We will see that this transitive dependency gives rise to another set of
anomalies.
INSERT: We are unable to insert the fact that a particular city has a particular status until we
have some supplier actually located in that city.
DELETE: If we delete the sole REL2 tuple for a particular city, we delete the information that the
city has that particular status.
UPDATE: The status for a given city is still stored redundantly. This causes the usual redundancy
problem related to update.

2.3 Third Normal Form

A relation is in 3NF if and only if it is in 2NF and every non-key attribute is non-transitively
dependent on the primary key.
To convert the 2NF relation into 3NF, once again, REL2 is split into two simpler relations,
REL4 and REL5, as shown below.
REL4 {S#, SUPPLYCITY}
and
REL5 {SUPPLYCITY, SUPPLYSTATUS}
Sample relations are shown below.

REL4
S#    SUPPLYCITY
S1    Madras
S2    Mumbai
S3    Mumbai
S4    Madras
S5    Kolkata

REL5
SUPPLYCITY    SUPPLYSTATUS
Madras        200
Mumbai        100
Kolkata       300

Evidently, the above relations REL4 and REL5 are in 3NF, because there are no transitive
dependencies. Every 2NF relation can be reduced to 3NF by decomposing it further and removing any
transitive dependency.
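
The 3NF design can be tried out in the same way. In the sketch below the column name `sno` again stands in for S#, and the city Delhi with status 400 is a hypothetical value used only to show that a city's status can now be recorded before any supplier is located there.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE REL5 (supplycity TEXT PRIMARY KEY, supplystatus INTEGER);
    CREATE TABLE REL4 (
        sno        TEXT PRIMARY KEY,
        supplycity TEXT REFERENCES REL5(supplycity)
    );
    INSERT INTO REL5 VALUES ('Madras', 200), ('Mumbai', 100), ('Kolkata', 300);
    INSERT INTO REL4 VALUES ('S1', 'Madras'), ('S2', 'Mumbai'), ('S3', 'Mumbai'),
                            ('S4', 'Madras'), ('S5', 'Kolkata');
""")

# INSERT anomaly removed: a city's status is stored without any supplier there.
conn.execute("INSERT INTO REL5 VALUES ('Delhi', 400)")  # hypothetical city

# UPDATE anomaly removed: the status of a city is changed in exactly one row.
changed = conn.execute(
    "UPDATE REL5 SET supplystatus = 150 WHERE supplycity = 'Mumbai'"
).rowcount
print(changed)  # 1

# REL2 is recovered by joining REL4 and REL5 on SUPPLYCITY.
rel2 = conn.execute("""
    SELECT sno, supplystatus, supplycity
    FROM REL4 JOIN REL5 USING (supplycity)
    ORDER BY sno
""").fetchall()
print(rel2)
```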
2.4 Boyce-Codd Normal Form
The previous normal forms assumed that there was just one candidate key in the relation and that
this key was also the primary key. Another class of problems arises when this is not the case. Very
often there will be more than one candidate key in a practical database design situation. To be
precise, 1NF, 2NF and 3NF do not deal adequately with relations that have two or
more candidate keys, where the candidate keys are composite and overlap
(i.e. have at least one attribute in common).
A relation is in BCNF (Boyce-Codd Normal Form) if and only if every nontrivial, left-irreducible
FD has a candidate key as its determinant. Or:
A relation is in BCNF if and only if all the determinants are candidate keys.
It should be noted that the BCNF definition is conceptually simpler than the old 3NF definition,
in that it makes no explicit reference to first and second normal forms as such, nor to the concept
of transitive dependence. Furthermore, although BCNF is strictly stronger than 3NF, it is still the
case that any given relation can be nonloss-decomposed into an equivalent collection of BCNF
relations. Thus, relations REL1 and REL2, which were not in 3NF, are not in BCNF either, while
relations REL3, REL4, and REL5, which were in 3NF, are also in BCNF. Relation REL1
contains three determinants, namely {S#}, {SUPPLYCITY}, and {S#, P#}; of these, only {S#,
P#} is a candidate key, so REL1 is not in BCNF. Similarly, REL2 is not in BCNF either, because
the determinant {SUPPLYCITY} is not a candidate key. Relations REL3, REL4, and REL5,
on the other hand, are each in BCNF, because in each case the sole candidate key is the only
determinant in the respective relation.
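
The statement "all the determinants are candidate keys" can be checked mechanically once the FDs of a relation are written down. The sketch below lists, for each relation, the FDs whose determinant does not determine every attribute. The function names are assumptions, and each relation is given only the FDs stated for it in the text.

```python
def closure(attrs, fds):
    """Attributes determined by attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(attrs, fds):
    """Nontrivial FDs whose determinant does not determine all of attrs."""
    bad = []
    for lhs, rhs in fds:
        if set(rhs) <= set(lhs):
            continue  # trivial dependency, always allowed
        if not set(attrs) <= closure(lhs, fds):
            bad.append((lhs, rhs))
    return bad


REL1 = {"S#", "SUPPLYSTATUS", "SUPPLYCITY", "P#", "PARTQTY"}
REL1_FDS = [
    (("S#",), ("SUPPLYCITY",)),
    (("SUPPLYCITY",), ("SUPPLYSTATUS",)),
    (("S#", "P#"), ("PARTQTY",)),
]
REL2 = {"S#", "SUPPLYSTATUS", "SUPPLYCITY"}
REL2_FDS = REL1_FDS[:2]
REL4 = {"S#", "SUPPLYCITY"}
REL4_FDS = REL1_FDS[:1]

print(bcnf_violations(REL1, REL1_FDS))  # the FDs with determinants S# and SUPPLYCITY
print(bcnf_violations(REL2, REL2_FDS))  # [(('SUPPLYCITY',), ('SUPPLYSTATUS',))]
print(bcnf_violations(REL4, REL4_FDS))  # []
```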
2.5 Comparison of BCNF and 3NF
We have seen two normal forms for relational-database schemas: 3NF and BCNF. There is an
advantage to 3NF in that we know that it is always possible to obtain a 3NF design without
sacrificing a lossless join or dependency preservation. Nevertheless, there is a disadvantage to
3NF. If we do not eliminate all transitive dependencies, we may have to use null values to
represent some of the possible meaningful relationships among data items, and there is the
problem of repetition of information.
If we are forced to choose between BCNF and dependency preservation with 3NF, it is generally
preferable to opt for 3NF. If we cannot test for dependency preservation efficiently, we either
pay a high penalty in system performance or risk the integrity of the data in our database. Neither
of these alternatives is attractive.
With such alternatives, the limited amount of redundancy imposed by the transitive dependencies
allowed under 3NF is the lesser evil.
Thus, we normally choose to retain dependency preservation and to sacrifice BCNF.
2.6 Multi-valued dependency
Multi-valued dependency may be formally defined as follows.
Let R be a relation, and let A, B, and C be subsets of the attributes of R. Then we say that B is
multi-dependent on A, in symbols
A ->> B
(read "A multi-determines B," or simply "A double arrow B"), if and only if, in every possible
legal value of R, the set of B values matching a given (A value, C value) pair depends only on the
A value and is independent of the C value.
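
An equivalent way to state this definition is that, whenever two tuples agree on A, the tuple that takes its B values from the first and its C values from the second must also be present. The sketch below checks this on one instance. The course, teacher and text data are hypothetical, invented only for illustration, and as with FDs an instance can refute a multi-valued dependency but cannot establish it for all legal values.

```python
def mvd_holds(rows, a, b):
    """Check A ->> B on one instance; C is every attribute not in A or B."""
    c = [x for x in rows[0] if x not in a and x not in b]
    order = a + b + c
    present = {tuple(r[x] for x in order) for r in rows}
    for t1 in rows:
        for t2 in rows:
            if all(t1[x] == t2[x] for x in a):
                # Take A and B from t1 and C from t2; this tuple must exist.
                mixed = tuple(t1[x] for x in a + b) + tuple(t2[x] for x in c)
                if mixed not in present:
                    return False
    return True


# Hypothetical data: each course has a set of teachers and a set of texts.
ctx = [
    {"course": "Physics", "teacher": "Green", "text": "Mechanics"},
    {"course": "Physics", "teacher": "Green", "text": "Optics"},
    {"course": "Physics", "teacher": "Brown", "text": "Mechanics"},
    {"course": "Physics", "teacher": "Brown", "text": "Optics"},
]

print(mvd_holds(ctx, ["course"], ["teacher"]))       # True
print(mvd_holds(ctx[:3], ["course"], ["teacher"]))   # False: (Brown, Optics) is missing
```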

2.7 Fifth Normal Form

So far it may seem that the sole operation necessary or available in the further normalization
process is the replacement of a relation in a nonloss way by exactly two of its projections. This
assumption has successfully carried us as far as 4NF. It comes perhaps as a surprise, therefore, to
discover that there exist relations that cannot be nonloss-decomposed into two projections but can
be nonloss-decomposed into three (or more). Using an unpleasant but convenient term, we will
describe such a relation as "n-decomposable" (for some n > 2), meaning that the relation in
question can be nonloss-decomposed into n projections but not into m for any m with 1 < m < n.
A relation that can be nonloss-decomposed into two projections we will call "2-decomposable".
2.8 Join Dependency

Let R be a relation, and let A, B, ..., Z be subsets of the attributes of R. Then we say that R satisfies
the Join Dependency (JD)
*{A, B, ..., Z} (read "star A, B, ..., Z") if and only if every possible legal value of R is equal to
the join of its projections on A, B, ..., Z.
Fifth normal form: A relation R is in 5NF, also called projection-join normal form (PJ/NF), if
and only if every nontrivial join dependency that holds for R is implied by the candidate keys of
R. Let us understand what it means for a JD to be "implied by candidate keys."
Relation REL12 is not in 5NF: it satisfies a certain join dependency, namely Constraint 3D, that
is certainly not implied by its sole candidate key (that key being the combination of all of its
attributes).
Now let us understand, through an example, what it means for a JD to be implied by candidate
keys. Suppose that the familiar suppliers relation REL1 has two candidate keys, {S#} and
{SUPPLIERNAME}. Then that relation satisfies several join dependencies. For example, it
satisfies the JD
*{ {S#, SUPPLIERNAME, SUPPLYSTATUS}, {S#, SUPPLYCITY} }

That is, relation REL1 is equal to the join of its projections on {S#, SUPPLIERNAME,
SUPPLYSTATUS} and {S#, SUPPLYCITY}, and hence can be nonloss-decomposed into those
projections. (This fact does not mean that it should be so decomposed, of course, only that it
could be.) This JD is implied by the fact that {S#} is a candidate key (in fact it is implied by
Heath's theorem). Likewise, relation REL1 also satisfies the JD

*{ {S#, SUPPLIERNAME}, {S#, SUPPLYSTATUS}, {SUPPLIERNAME, SUPPLYCITY} }

This JD is implied by the fact that {S#} and {SUPPLIERNAME} are both candidate keys.

To conclude, we note that it follows from the definition that 5NF is the ultimate normal form
with respect to projection and join (which accounts for its alternative name, projection-join
normal form). That is, a relation in 5NF is guaranteed to be free of anomalies that can be
eliminated by taking projections. For if a relation is in 5NF, the only join dependencies are those
that are implied by candidate keys, and so the only valid decompositions are ones that are based
on those candidate keys.
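
A join dependency can be tested on an instance by joining the projections and comparing the result with the original relation. The sketch below uses a small hypothetical relation with attributes S, P and J (it is not the REL12 referred to above, whose data are not shown here) that is 3-decomposable but not 2-decomposable. The helper names are assumptions.

```python
from functools import reduce


def relation(rows):
    """Represent a relation as a set of tuples (each a frozenset of pairs)."""
    return {frozenset(row.items()) for row in rows}


def project(rel, attrs):
    """Projection on attrs; duplicates vanish because the result is a set."""
    return {frozenset((a, dict(t)[a]) for a in attrs) for t in rel}


def natural_join(r, s):
    """Combine every pair of tuples that agree on their common attributes."""
    out = set()
    for t in r:
        for u in s:
            dt, du = dict(t), dict(u)
            if all(dt[a] == du[a] for a in dt.keys() & du.keys()):
                out.add(frozenset({**dt, **du}.items()))
    return out


def jd_holds(rel, components):
    """True if rel equals the join of its projections on the given components."""
    return reduce(natural_join, (project(rel, c) for c in components)) == rel


# Hypothetical instance, invented for illustration.
SPJ = relation([
    {"S": "S1", "P": "P1", "J": "J2"},
    {"S": "S1", "P": "P2", "J": "J1"},
    {"S": "S2", "P": "P1", "J": "J1"},
    {"S": "S1", "P": "P1", "J": "J1"},
])

print(jd_holds(SPJ, [["S", "P"], ["P", "J"], ["J", "S"]]))  # True
print(jd_holds(SPJ, [["S", "P"], ["P", "J"]]))              # False
```

Joining any two of the three projections produces a spurious tuple; only the third projection removes it again.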

Chapter-3
FUNCTIONAL DEPENDENCY AND NORMALIZATION
End Chapter quizzes:
Q1. Normalization is a step-by-step process of decomposing:
(a) Table
(b) Database
(c) Group Data item
(d) All of the above

Q2. A relation is said to be in 2NF if
(i) it is in 1NF
(ii) non-key attributes are dependent on the key attribute
(iii) non-key attributes are independent of one another
(iv) if it has a composite key, no non-key attribute should be dependent on
part of the composite key.
(a) i, ii, iii
(b) i and ii
(c) i, ii, iv
(d) i, iv

Q3. A relation is said to be in 3NF if
(i) it is in 2NF
(ii) non-key attributes are independent of one another
(iii) key attribute is not dependent on part of a composite key
(iv) has no multi-valued dependency
(a) i and iii
(b) i and iv
(c) i and ii
(d) ii and iv

Q4. A relation is said to be in BCNF when
(a) it has overlapping composite keys
(b) it has no composite keys
(c) it has no multivalued dependencies
(d) it has no overlapping composite keys which have related attributes

Q5. Fourth normal form (4NF) relations are needed when
(a) there are multivalued dependencies between attributes in composite key
(b) there are more than one composite key
(c) there are two or more overlapping composite keys
(d) there are multivalued dependency between non-key attributes

Q6. A good database design
(i) is expandable with growth and changes in organization
(ii) easy to change when software changes
(iii) ensures data integrity
(iv) allows access to only authorized users
(a) i, ii
(b) ii, iii
(c) i, ii, iii, iv
(d) i, ii, iii

Q7. Given an attribute x, another attribute y is dependent on it, if for a given x
(a) there are many y values
(b) there is only one value of y
(c) there is one or more y values
(d) there is none or one y value
Q8. If a non key attribute is depending on another non key attribute, It is known as
a) Full F D
b) Partial F D
c) TRANSITIVE F D
d) None of the above
Q9. Decomposition of relation should always be
a) Lossy
b) Lossless
c) Both a and b
d) None of the above

Chapter: 4
STRUCTURED QUERY LANGUAGE

1. INTRODUCTORY CONCEPTS
1.1 What is SQL?
SQL stands for Structured Query Language

SQL allows you to access a database

SQL is an ANSI standard computer language

SQL can execute queries against a database

SQL can retrieve data from a database

SQL can insert new records in a database

SQL can delete records from a database

SQL can update records in a database

SQL is easy to learn

SQL is an ANSI (American National Standards Institute) standard computer language for
accessing and manipulating database systems. SQL statements are used to retrieve and update
data in a database. SQL works with database programs like MS Access, DB2, Informix, MS
SQL Server, Oracle, Sybase, etc
1.2 SQL Database Tables:
A database most often contains one or more tables. Each table is identified by a name (e.g.
"Customers" or "Orders"). Tables contain records (rows) with data.
Below is an example of a table called "Persons":
LastName     FirstName    Address         City
Hansen       Ola          Timoteivn 10    Sandnes
Svendson     Tove         Borgvn 23       Sandnes
Pettersen    Kari         Storgt 20       Stavanger

The table above contains three records (one for each person) and four columns (LastName,
FirstName, Address, and City).

2. DATABASE LANGUAGE
2.1 SQL Data Definition Language (DDL)
The Data Definition Language (DDL) part of SQL permits database tables to be created or
deleted. We can also define indexes (keys), specify links between tables, and impose constraints
between database tables.

The most important DDL statements in SQL are:

CREATE TABLE - creates a new database table

ALTER TABLE - alters (changes) a database table

DROP TABLE - deletes a database table

Create a Table
To create a table in a database:
CREATE TABLE table_name
(
column_name1 data_type,
column_name2 data_type,
.......
)

Example
This example demonstrates how you can create a table named "Person", with four columns. The
column names will be "LastName", "FirstName", "Address", and "Age":
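
One way to try the statement out is to run it against an in-memory SQLite database from Python. The data types and column sizes below are assumptions (the text gives only the column names), and the exact type names vary between database systems.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# CREATE TABLE for "Person" with the four columns named in the text.
conn.execute("""
    CREATE TABLE Person (
        LastName  VARCHAR(30),
        FirstName VARCHAR(30),
        Address   VARCHAR(50),
        Age       INTEGER
    )
""")

# PRAGMA table_info lists one row per column; index 1 holds the column name.
columns = [row[1] for row in conn.execute("PRAGMA table_info(Person)")]
print(columns)  # ['LastName', 'FirstName', 'Address', 'Age']
```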

ALTER TABLE
The ALTER TABLE statement is used to add, drop and modify columns in an existing table.

ALTER TABLE table_name
ADD column_name datatype

ALTER TABLE table_name
MODIFY column_name datatype

ALTER TABLE table_name
DROP COLUMN column_name

Delete a Table or Database
To delete a table (the table structure, attributes, and indexes will also be deleted):
DROP TABLE table_name

2.2 SQL Data Manipulation Language (DML)


DML language includes syntax to update, insert, and delete records. These query and update
commands together form the Data Manipulation Language (DML) part of SQL:

UPDATE - updates data in a database table

DELETE - deletes data from a database table

INSERT INTO - inserts new data into a database table

The INSERT INTO Statement
The INSERT INTO statement is used to insert new rows into a table.
Syntax
INSERT INTO table_name
VALUES (value1, value2,....)

You can also specify the columns for which you want to insert data:

INSERT INTO table_name (column1, column2,...)
VALUES (value1, value2,....)

The UPDATE Statement
The UPDATE statement is used to modify the data in a table.
Syntax
UPDATE table_name
SET column_name = new_value
WHERE column_name = some_value

The DELETE Statement
The DELETE statement is used to delete rows in a table.
Syntax
DELETE FROM table_name
WHERE column_name = some_value
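
The three DML statements can be exercised together on the "Persons" table introduced earlier. This is a minimal sketch using Python's built-in `sqlite3` module; the TEXT column types are an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Persons (LastName TEXT, FirstName TEXT, Address TEXT, City TEXT)"
)

# INSERT with a value for every column, in table order.
conn.execute("INSERT INTO Persons VALUES ('Hansen', 'Ola', 'Timoteivn 10', 'Sandnes')")
# INSERT naming only some columns; Address and City are left NULL.
conn.execute("INSERT INTO Persons (LastName, FirstName) VALUES ('Pettersen', 'Kari')")

# UPDATE fills in the missing values for one person.
conn.execute(
    "UPDATE Persons SET Address = 'Storgt 20', City = 'Stavanger' "
    "WHERE LastName = 'Pettersen'"
)

# DELETE removes the rows that match the WHERE condition.
conn.execute("DELETE FROM Persons WHERE LastName = 'Hansen'")

rows = conn.execute("SELECT * FROM Persons").fetchall()
print(rows)  # [('Pettersen', 'Kari', 'Storgt 20', 'Stavanger')]
```

Without a WHERE clause, UPDATE and DELETE act on every row of the table, so the condition should always be checked carefully.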

2.3 SQL Data Query Language (DQL)
DQL is used to retrieve existing data from the database, using SELECT statements.
SQL SELECT Example
To select the content of the columns named "LastName" and "FirstName" from the database table
called "Persons", use a SELECT statement like this:

SELECT LastName, FirstName
FROM Persons

The WHERE Clause


To conditionally select data from a table, a WHERE clause can be added to the SELECT
statement.
Syntax
SELECT column FROM table
WHERE column operator value

With the WHERE clause, the following operators can be used:

Operator    Description
=           Equal
<>          Not equal
>           Greater than
<           Less than
>=          Greater than or equal
<=          Less than or equal
BETWEEN     Between an inclusive range
LIKE        Search for a pattern

Using the WHERE Clause
To select only the persons living in the city "Sandnes", we add a WHERE clause to the SELECT
statement:
SELECT * FROM Persons
WHERE City='Sandnes'

"Persons" table
LastName     FirstName    Address         City         Year
Hansen       Ola          Timoteivn 10    Sandnes      1951
Svendson     Tove         Borgvn 23       Sandnes      1978
Svendson     Stale        Kaivn 18        Sandnes      1980
Pettersen    Kari         Storgt 20       Stavanger    1960

Result
LastName     FirstName    Address         City         Year
Hansen       Ola          Timoteivn 10    Sandnes      1951
Svendson     Tove         Borgvn 23       Sandnes      1978
Svendson     Stale        Kaivn 18        Sandnes      1980

The LIKE Condition


The LIKE condition is used to specify a search for a pattern in a column.
Syntax
SELECT column
FROM table
WHERE column LIKE pattern

A "%" sign can be used to define wildcards (missing letters in the pattern) both before and after
the pattern.

Using LIKE
The following SQL statement will return persons with first names that start with an 'O':
SELECT *
FROM Persons
WHERE FirstName LIKE 'O%'
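
The WHERE operators listed in this section can be tried on the "Persons" table shown above. The sketch below runs three of them through Python's `sqlite3` module; the column types are assumptions, and an ORDER BY is added only so that the results come back in a predictable order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Persons "
    "(LastName TEXT, FirstName TEXT, Address TEXT, City TEXT, Year INTEGER)"
)
conn.executemany("INSERT INTO Persons VALUES (?, ?, ?, ?, ?)", [
    ("Hansen", "Ola", "Timoteivn 10", "Sandnes", 1951),
    ("Svendson", "Tove", "Borgvn 23", "Sandnes", 1978),
    ("Svendson", "Stale", "Kaivn 18", "Sandnes", 1980),
    ("Pettersen", "Kari", "Storgt 20", "Stavanger", 1960),
])

# LIKE with a trailing wildcard: first names that start with 'O'.
starts_with_o = conn.execute(
    "SELECT FirstName FROM Persons WHERE FirstName LIKE 'O%'"
).fetchall()
print(starts_with_o)  # [('Ola',)]

# LIKE with a leading wildcard: last names that end with 'sen'.
ends_with_sen = conn.execute(
    "SELECT LastName FROM Persons WHERE LastName LIKE '%sen' ORDER BY LastName"
).fetchall()
print(ends_with_sen)  # [('Hansen',), ('Pettersen',)]

# BETWEEN includes both end points of the range.
in_range = conn.execute(
    "SELECT FirstName, Year FROM Persons WHERE Year BETWEEN 1960 AND 1978 ORDER BY Year"
).fetchall()
print(in_range)  # [('Kari', 1960), ('Tove', 1978)]
```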

The ORDER BY keyword is used to sort the result.

Sort the Rows

The ORDER BY clause is used to sort the rows.


Orders:
Company

OrderNumber

Sega

3412

ABC Shop

5678

W3Schools

2312

W3Schools

6798

Example
To display the companies in alphabetical order:
SELECT Company, OrderNumber FROM Orders
ORDER BY Company

Result:
Company

OrderNumber

ABC Shop

5678

Sega

3412

W3Schools

6798

W3Schools

2312

Example
To display the companies in alphabetical order AND the order numbers in numerical order:

SELECT Company, OrderNumber FROM Orders


ORDER BY Company, OrderNumber

Result:
Company      OrderNumber
ABC Shop     5678
Sega         3412
W3Schools    2312
W3Schools    6798
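
The two-column sort can be reproduced with the following sketch, which assumes Python's built-in sqlite3 module instead of Oracle. The second query is an added illustration of the ASC and DESC keywords, which set the direction of each sort column.

```python
import sqlite3

# Assumption: in-memory SQLite copy of the "Orders" table from the text.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Orders (Company TEXT, OrderNumber INTEGER)")
conn.executemany(
    "INSERT INTO Orders VALUES (?, ?)",
    [("Sega", 3412), ("ABC Shop", 5678), ("W3Schools", 2312), ("W3Schools", 6798)],
)

# Sort by company first; rows with the same company are then sorted by order number.
ascending = conn.execute(
    "SELECT Company, OrderNumber FROM Orders ORDER BY Company, OrderNumber"
).fetchall()

# Each sort column can carry its own direction.
mixed = conn.execute(
    "SELECT Company, OrderNumber FROM Orders "
    "ORDER BY Company DESC, OrderNumber ASC"
).fetchall()

for row in ascending:
    print(row)
```

With only ORDER BY Company, the relative order of the two W3Schools rows is not defined; adding OrderNumber as a second sort column makes the result fully determined.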

GROUP BY...
Aggregate functions (like SUM) often need an added GROUP BY clause.
GROUP BY... was added to SQL because aggregate functions (like SUM) return a single aggregate over all the selected rows, so without the GROUP BY clause it was impossible to find the sum for each individual group of column values.
The syntax for the GROUP BY clause is:

SELECT column, SUM(column)
FROM table
GROUP BY column

GROUP BY Example
This "Sales" Table:
Company      Amount
W3Schools    5500
IBM          4500
W3Schools    7100
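
A possible query on this "Sales" table is sketched below. It assumes Python's built-in sqlite3 module instead of Oracle, and the query itself is an illustration built from the syntax given above, not a statement taken from the text.

```python
import sqlite3

# Assumption: in-memory SQLite copy of the "Sales" table from the text.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sales (Company TEXT, Amount INTEGER)")
conn.executemany(
    "INSERT INTO Sales VALUES (?, ?)",
    [("W3Schools", 5500), ("IBM", 4500), ("W3Schools", 7100)],
)

# Without GROUP BY, SUM collapses all selected rows into one total.
grand_total = conn.execute("SELECT SUM(Amount) FROM Sales").fetchone()[0]

# With GROUP BY, SUM is computed once per distinct company.
per_company = dict(
    conn.execute("SELECT Company, SUM(Amount) FROM Sales GROUP BY Company")
)

print(grand_total)
print(per_company)
```

The grouped query returns one row per company: 12600 for W3Schools (5500 + 7100) and 4500 for IBM, while the ungrouped SUM returns the single value 17100.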

3. What is a View?
In SQL, a VIEW is a virtual table based on the result-set of a SELECT statement.
A view contains rows and columns, just like a real table. The fields in a view are fields from one
or more real tables in the database. You can add SQL functions, WHERE, and JOIN statements
to a view and present the data as if the data were coming from a single table.
Syntax
CREATE VIEW view_name AS
SELECT column_name(s)
FROM table_name
WHERE condition

Views are of two types: updateable views and non-updateable views. Through an updateable view the values in the base table can be modified, whereas through a non-updateable view the base table cannot be updated.
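
The sketch below illustrates that a view stores no data of its own. It assumes Python's built-in sqlite3 module instead of Oracle; the view name SandnesPersons and the extra row ('Nilsen', 'Anna') are hypothetical. SQLite views are read-only, so the example only queries the view and changes data through the base table.

```python
import sqlite3

# Assumption: in-memory SQLite database with a reduced "Persons" table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Persons (LastName TEXT, FirstName TEXT, City TEXT)")
conn.executemany(
    "INSERT INTO Persons VALUES (?, ?, ?)",
    [
        ("Hansen", "Ola", "Sandnes"),
        ("Svendson", "Tove", "Sandnes"),
        ("Svendson", "Stale", "Sandnes"),
        ("Pettersen", "Kari", "Stavanger"),
    ],
)

# The view is a stored SELECT statement; it holds no rows itself.
conn.execute(
    "CREATE VIEW SandnesPersons AS "
    "SELECT LastName, FirstName FROM Persons WHERE City = 'Sandnes'"
)


def view_rows():
    """Query the view exactly as if it were a table."""
    return conn.execute("SELECT LastName, FirstName FROM SandnesPersons").fetchall()


before = view_rows()
# A change to the base table is visible through the view immediately.
conn.execute("INSERT INTO Persons VALUES ('Nilsen', 'Anna', 'Sandnes')")
after = view_rows()

print(len(before), len(after))
```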

4. Rename of a Table Column


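ALTER TABLE <table> RENAME COLUMN <oldname> TO <newname>;

To rename the table itself:

RENAME TABLE student TO student_new

This SQL command will rename the student table to student_new. RENAME TABLE is MySQL syntax; in Oracle the same result is obtained with RENAME student TO student_new or ALTER TABLE student RENAME TO student_new.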

5. Renames a SQL view in the current database.


RENAME VIEW ViewName1 TO ViewName2
Parameters
ViewName1
Specifies the name of the SQL view to be renamed.
ViewName2
Specifies the new name of the SQL view.

6. Renaming Columns & Constraints

In addition to renaming tables and indexes, Oracle9i Release 2 allows the renaming of columns
and constraints on tables. In this example, once the TEST1 table is created, it is renamed along
with its columns, primary key constraint and the index that supports the primary key:
SQL> CREATE TABLE test1 (
  2    col1  NUMBER(10)    NOT NULL,
  3    col2  VARCHAR2(50)  NOT NULL);

Table created.

SQL> ALTER TABLE test1 ADD (
  2    CONSTRAINT test1_pk PRIMARY KEY (col1));

Table altered.

SQL> DESC test1
 Name                 Null?    Type
 -------------------- -------- --------------------
 COL1                 NOT NULL NUMBER(10)
 COL2                 NOT NULL VARCHAR2(50)

SQL> SELECT constraint_name
  2  FROM   user_constraints
  3  WHERE  table_name      = 'TEST1'
  4  AND    constraint_type = 'P';

CONSTRAINT_NAME
------------------------------
TEST1_PK

1 row selected.

SQL> SELECT index_name, column_name
  2  FROM   user_ind_columns
  3  WHERE  table_name = 'TEST1';

INDEX_NAME           COLUMN_NAME
-------------------- --------------------
TEST1_PK             COL1

1 row selected.

SQL> -- Rename the table, columns, primary key
SQL> -- and supporting index.
SQL> ALTER TABLE test1 RENAME TO test;

Table altered.

SQL> ALTER TABLE test RENAME COLUMN col1 TO id;

Table altered.

SQL> ALTER TABLE test RENAME COLUMN col2 TO description;

Table altered.

SQL> ALTER TABLE test RENAME CONSTRAINT test1_pk TO test_pk;

Table altered.

SQL> ALTER INDEX test1_pk RENAME TO test_pk;

Index altered.

SQL> DESC test
 Name                 Null?    Type
 -------------------- -------- --------------------
 ID                   NOT NULL NUMBER(10)
 DESCRIPTION          NOT NULL VARCHAR2(50)

SQL> SELECT constraint_name
  2  FROM   user_constraints
  3  WHERE  table_name      = 'TEST'
  4  AND    constraint_type = 'P';

CONSTRAINT_NAME
--------------------
TEST_PK

1 row selected.

SQL> SELECT index_name, column_name
  2  FROM   user_ind_columns
  3  WHERE  table_name = 'TEST';

INDEX_NAME           COLUMN_NAME
-------------------- --------------------
TEST_PK              ID

1 row selected.

STRUCTURE QUERY LANGUAGE


End Chapter quizzes:
Q1. SELECT statement is used for
a) Updating data in the database
b) Retrieving data from the database
c) Changing the structure of the database
d) None of the above

Q2. Select the correct statement


a) ALTER statement is used to modify the structure of Database.
b) UPDATE statement is used to change the data in the table.
c) SELECT statement is used to retrieve the data from the database
d) All of the above.
Q3. Which of the following statements are NOT TRUE about ORDER BY clauses?
A. Ascending or descending order can be defined with the asc or desc keywords.
B. Only one column can be used to define the sort order in an order by clause.
C. Multiple columns can be used to define sort order in an order by clause.
D. Columns can be represented by numbers indicating their listed order in the select

Q4 GRANT and REVOKE are


(a) DDL statements
(b) DML statements
(c) DCL statements
(d) None of these.

Q5. Oracle 8i can be best described as


(a) Object-based DBMS
(b) Object-oriented DBMS
(c) Object-relational DBMS
(d) Relational DBMS
Q6. Select the correct statement.
a) View has no physical existence.
b) Data from the view are retrieved through the Table.
c) Both (a) and (b)
d) None of these.
Q7. INSERT statement is used for
a) Storing data into the table
b) Deleting data from the table
c) Both a and b
d) Updating data in the table
Q8. ALTER statement is used for
a) Changing the structure of the table
b) Changing data in the table
c) Both a and b
d) Deleting data from the table
Q9. RENAME TABLE student TO student_new
a) Rename the column of the Table
b) Change the Table name student to student_new
c) Rename the row of the table
d) None of the above.

Q10. ORDER BY clause is used to


a) Sort the row of the table in a particular order
b) Remove the column of the table
c) Rename the Table
d) Both a and c

Chapter: 5
PROCEDURAL QUERY LANGUAGE

1. Introduction to PL/SQL
PL/SQL is a procedural extension of Oracle's Structured Query Language. PL/SQL is less a
separate language than a technology: you do not have a separate place or prompt for
executing your PL/SQL programs. PL/SQL technology is like an engine that executes
PL/SQL blocks and subprograms. This engine can be started in the Oracle server or in application
development tools such as Oracle Forms and Oracle Reports.

As shown in the above figure, the PL/SQL engine executes procedural statements and sends the SQL
part of the statements to the SQL statement processor in the Oracle server. PL/SQL combines the data
manipulating power of SQL with the data processing power of procedural languages.

2 Block Structure of PL/SQL:


PL/SQL is a block-structured language, which means that PL/SQL programs are made up of logical
blocks. A PL/SQL block consists of SQL and PL/SQL statements.

A PL/SQL Block consists of three sections:

The Declaration section (optional).

The Execution section (mandatory).

The Exception (or Error) Handling section (optional).

2.1 Declaration Section:


The Declaration section of a PL/SQL Block starts with the reserved keyword DECLARE. This
section is optional and is used to declare placeholders such as variables, constants and records,
which store data temporarily and are used to manipulate data in the execution section. Cursors
are also declared in this section.
Declaring Variables: Variables are declared in the DECLARE section of PL/SQL.

DECLARE
SNO NUMBER (3);
SNAME VARCHAR2 (15);

2.2 Execution Section:


The Execution section of a PL/SQL Block starts with the reserved keyword BEGIN and ends
with END. This is a mandatory section and is the section where the program logic is written to
perform any task. Programmatic constructs such as loops, conditional statements and SQL
statements form part of the execution section.

2.3 Exception Section:


The Exception section of a PL/SQL Block starts with the reserved keyword EXCEPTION. This
section is optional. Any errors in the program can be handled in this section, so that the PL/SQL
Block terminates gracefully. If the PL/SQL Block raises exceptions that are not handled,
the Block terminates abruptly with errors. Every statement in the above three sections must end
with a semicolon (;). PL/SQL blocks can be nested within other PL/SQL blocks. Comments can
be used to document code.

3. How a sample PL/SQL Block looks.


DECLARE
Variable declaration
BEGIN
Program Execution
EXCEPTION
Exception handling
END;

Variables and Constants: Variables are used to store query results. Forward references are not
allowed, so you must declare a variable before you use it. Variables can have any SQL data type,
such as CHAR, DATE or NUMBER, or any PL/SQL data type, such as BOOLEAN or BINARY_INTEGER.

Declaring Variables: Variables are declared in the DECLARE section of PL/SQL.

DECLARE
SNO NUMBER (3);
SNAME VARCHAR2 (15);
BEGIN
Assigning values to variables (the value must fit the declared size, and text values are written in single quotes):
SNO := 101;
or
SNAME := 'JOHN'; etc.
The following screenshot shows how to write a simple PL/SQL program and execute it.
SET SERVEROUTPUT ON is a command used to display results from the Oracle server.
A PL/SQL program is terminated by a / . DBMS_OUTPUT is a package and PUT_LINE is a
procedure in it.
You will learn more about procedures, functions and packages in the following sections of this
tutorial.

The above program can also be written as a text file in the Notepad editor and then executed, as
explained in the following screenshot.

4. Control Statements
This section explains how to structure the flow of control through a PL/SQL program. The
control structures of PL/SQL are simple yet powerful. Control structures in PL/SQL can be
divided into three categories:
Conditional (selection),
Iterative and
Sequential.

4.1 Conditional Control (Selection): This structure tests a condition and, depending on whether
the condition is true or false, decides the sequence of statements to be executed.

Syntax for IF-THEN:
IF <condition> THEN
Statements
END IF;

Syntax for IF-THEN-ELSE:
IF <condition> THEN
Statements
ELSE
Statements
END IF;

Syntax for IF-THEN-ELSIF:
IF <condition> THEN
Statements
ELSIF <condition> THEN
Statements
ELSE
Statements
END IF;

4.2 Iterative Control


The LOOP statement executes the body statements multiple times. The statements are placed
between the LOOP and END LOOP keywords. The simplest form of the LOOP statement is an infinite
loop. The EXIT statement is used inside LOOP to terminate it.
Syntax for LOOP - END LOOP
LOOP
Statements
END LOOP;
Example:
BEGIN
LOOP
-- Infinite loop: add EXIT or EXIT WHEN <condition> inside the loop to terminate it.
DBMS_OUTPUT.PUT_LINE ('Hello');
END LOOP;
END;

5. CURSOR
For every SQL statement execution, a certain area in memory is allocated. PL/SQL allows you to
name this area. This private SQL area is called the context area, and a cursor acts as a handle
or pointer into it. A PL/SQL program controls the context area using the cursor.
A cursor represents a structure in memory and is different from a cursor variable. When you declare
a cursor, you get a pointer variable, which does not point to anything. When the cursor is opened,
memory is allocated and the cursor structure is created. The cursor variable now points to the
cursor. When the cursor is closed, the memory allocated for the cursor is released.
Cursors allow the programmer to retrieve data from a table and perform actions on that data one
row at a time. There are two types of cursors: implicit cursors and explicit cursors.
5.1 Implicit cursors
PL/SQL declares an implicit cursor for every SQL query that returns a single row, and for all DML
statements. Implicit cursors are simple SELECT ... INTO statements written in the BEGIN block
(executable section) of the PL/SQL program. They are easy to code, and they retrieve exactly one row.

The most commonly raised exceptions here are NO_DATA_FOUND and TOO_MANY_ROWS.

Syntax:
SELECT Ename, sal INTO ena, esa FROM EMP WHERE EMPNO = 7845;

Note: Ename and sal are columns of the table EMP, and ena and esa are the variables
used to store the ename and sal fetched by the query.

5.2 Explicit Cursors


Explicit cursors are used in queries that return multiple rows. The set of rows fetched by a query
is called the active set. The active set consists of the rows that meet the search criteria in the
select statement. An explicit cursor is declared in the DECLARE section of the PL/SQL program.
Syntax:
CURSOR <cursor-name> IS <select statement>;
Sample Code:

DECLARE
CURSOR emp_cur IS SELECT ename FROM EMP;
BEGIN
-----
END;
Processing multiple rows is similar to file processing. To process a file you need to open it,
process its records and then close it. Similarly, a user-defined explicit cursor needs to be opened
before reading the rows, after which it is closed. Just as a file pointer marks the current position
in file processing, a cursor marks the current position in the active set.

5.3 Opening Cursor


Syntax: OPEN <cursor-name>;
Example:
OPEN emp_cur;
When a cursor is opened, the active set is determined: the rows satisfying the WHERE clause in the
select statement are added to the active set. A pointer is established and points to the first row in
the active set.
5.4 Fetching from the cursor: To get the next row from the cursor we need to use the FETCH
statement.
Syntax: FETCH <cursor-name> INTO <variables>;
Example: FETCH emp_cur INTO ena;

The FETCH statement retrieves one row at a time. The BULK COLLECT clause needs to be used to
fetch more than one row at a time.

Closing the cursor: After retrieving all the rows from the active set, the cursor should be closed.
Resources allocated for the cursor are then freed. Once the cursor is closed, executing a FETCH
statement will lead to an error.

Syntax: CLOSE <cursor-name>;

5.5 Explicit Cursor Attributes


Every cursor defined by the user has 4 attributes. When appended to the cursor name, these
attributes let the user access useful information about the execution of a multi-row query.
The attributes are:
1. %NOTFOUND: A Boolean attribute, which evaluates to true if the last fetch failed, i.e. when there are no rows left in the cursor to fetch.
2. %FOUND: A Boolean attribute, which evaluates to true if the last fetch succeeded.
3. %ROWCOUNT: A numeric attribute, which returns the number of rows fetched by the cursor so far.
4. %ISOPEN: A Boolean attribute, which evaluates to true if the cursor is open and to false otherwise.

In the above example I wrote a separate fetch for each row; a LOOP statement could be used
here instead. The following example explains the usage of LOOP.

6. Exceptions
An exception is an error situation that arises during program execution. When an error occurs,
an exception is raised, normal execution stops and control transfers to the exception-handling part.
Exception handlers are routines written to handle the exception. Exceptions can be internally
defined (system-defined or predefined) or user-defined.

6.1 Predefined exceptions are raised automatically whenever there is a violation of Oracle coding
rules. An example is ZERO_DIVIDE, which is raised automatically when we try to divide a number
by zero. Other built-in exceptions are given below. You can handle unexpected Oracle errors using
the OTHERS handler. It handles all raised exceptions that are not handled by any other handler,
and it must always be written as the last handler in the exception block.

CURSOR_ALREADY_OPEN    Raised when we try to open an already open cursor.
DUP_VAL_ON_INDEX       Raised when we try to insert a duplicate value into a unique column.
INVALID_CURSOR         Raised when we try to access an invalid cursor.
INVALID_NUMBER         Raised when something other than a number is used in place of a number value.
LOGIN_DENIED           Raised when a user login is denied.
TOO_MANY_ROWS          Raised when a select query returns more than one row and the destination variable can take only a single value.
VALUE_ERROR            Raised when an arithmetic, value conversion, truncation, or constraint error occurs.

Predefined exceptions are declared globally in the package STANDARD. Hence we need not
define them; we can just use them.

The biggest advantage of exception handling is that it improves the readability and reliability of
the code. Errors from many statements of code can be handled with a single handler. Instead of
checking for an error at every point, we can just add an exception handler, and any exception that
is raised is handled by it.
For checking errors at a specific spot, it is always better to put those statements in a separate
BEGIN ... END block.

Example 1: The following example shows the usage of the ZERO_DIVIDE exception.

Example 2: The following example shows the usage of the NO_DATA_FOUND exception.

The DUP_VAL_ON_INDEX exception is raised when a SQL statement tries to create a duplicate value in
a column on which a primary key or unique constraint is defined.
Example: To demonstrate the exception DUP_VAL_ON_INDEX.

More than one exception can be handled by a single handler, as shown below.

EXCEPTION
WHEN NO_DATA_FOUND OR TOO_MANY_ROWS THEN
Statements;
END;

6.2 User-defined Exceptions


A user-defined exception has to be defined by the programmer. User-defined exceptions are
declared in the declaration section with their type as EXCEPTION. They must be raised explicitly
using the RAISE statement, unlike predefined exceptions, which are raised implicitly. The RAISE
statement can also be used to raise internal exceptions.
Declaring Exception:
DECLARE
myexception EXCEPTION;
BEGIN
------
Raising Exception:
BEGIN
RAISE myexception;
------
Handling Exception:
BEGIN
------
EXCEPTION
WHEN myexception THEN
Statements;
END;

Points To Ponder:

An exception cannot be declared twice in the same block.

Exceptions declared in a block are considered local to that block and global to its sub-blocks.

An enclosing block cannot access exceptions declared in its sub-block, whereas it is
possible for a sub-block to refer to the exceptions of its enclosing block.

The following example explains the usage of a user-defined exception.

RAISE_APPLICATION_ERROR
To display your own error messages, you can use the built-in procedure RAISE_APPLICATION_ERROR.
It displays the error message in the same way as Oracle errors. You should use a negative
number between -20000 and -20999 for the error_number, and the error message
should not exceed 2048 bytes. The syntax to call raise_application_error is
RAISE_APPLICATION_ERROR (error_number, error_message, { TRUE | FALSE })

Fetch is used twice in the above example to make %FOUND available.

Using Cursor For Loop:

The cursor FOR loop can be used to process multiple records. There are two benefits of the
cursor FOR loop:
1. It implicitly declares a %ROWTYPE variable and uses it as the loop index.
2. The cursor FOR loop itself opens the cursor, reads the records and then closes the cursor
automatically. Hence OPEN, FETCH and CLOSE statements are not necessary in it.

Example:

emp_rec is an automatically created variable of %ROWTYPE. We have not used OPEN, FETCH
and CLOSE in the above example, as the cursor FOR loop does this automatically. The above example
can be rewritten, as shown in the figure, with fewer lines of code. It is called an implicit FOR loop.

Deletion or Updation Using Cursor:


The previous examples showed how to retrieve data using cursors. Now we will
see how to modify or delete rows in a table using cursors. In order to update or delete rows, the
cursor must be defined with the FOR UPDATE clause, and the UPDATE or DELETE statement must be
written with the WHERE CURRENT OF clause.
The following example updates comm of all employees with salary less than 2000 by adding 100 to
the existing comm.

7. PL/SQL subprograms
A subprogram is a named block of PL/SQL. There are two types of subprograms in PL/SQL,
namely procedures and functions. Every subprogram has a declarative part, an executable
part or body, and an exception-handling part, which is optional.

The declarative part contains variable declarations. The body of a subprogram contains executable
statements of SQL and PL/SQL. Statements to handle exceptions are written in the exception part.

When a client executes a procedure or function, the processing is done on the server. This reduces
network traffic. Subprograms are compiled and stored in the Oracle database as stored
programs and can be invoked whenever required. As they are stored in compiled form, they only
need to be executed when called, which saves the time needed for compilation.
Subprograms provide the following advantages:

1. They allow you to write PL/SQL programs that meet your needs.
2. They allow you to break the program into manageable modules.
3. They provide reusability and maintainability for the code.
7.1 Procedures
A procedure is a subprogram used to perform a specific action. A procedure contains two parts:
the specification and the body. The procedure specification begins with CREATE and ends with the
procedure name or parameter list. Procedures that do not take parameters are written without
parentheses. The body of the procedure starts after the keyword IS or AS and ends with the keyword
END.

In the above given syntax, things enclosed in angle brackets (< >) are user defined and those
enclosed in square brackets ([ ]) are optional.

OR REPLACE is used to overwrite the procedure with the same name, if there is any.
The AUTHID clause is used to decide whether the procedure should execute with invoker rights (the current user, the person who executes it) or with definer rights (the owner, the person who created it).
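Example
CREATE PROCEDURE MyProc
(ENO NUMBER)
AUTHID DEFINER AS
BEGIN
DELETE FROM EMP
WHERE EMPNO = ENO;
-- A DELETE that matches no rows does not raise NO_DATA_FOUND,
-- so the number of affected rows is checked instead.
IF SQL%ROWCOUNT = 0 THEN
DBMS_OUTPUT.PUT_LINE ('No employee with this number');
END IF;
END;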
Let us assume that the above procedure is created in the SCOTT schema (SCOTT user area) and is
executed by user SEENU. It will delete rows from the table EMP owned by SCOTT, but not
from the EMP owned by SEENU. It is also possible to use a procedure owned by one user on tables
owned by other users, by setting invoker rights:

AUTHID CURRENT_USER

PRAGMA AUTONOMOUS_TRANSACTION is used to instruct the compiler to treat the
procedure as autonomous, i.e. the procedure commits or rolls back its own changes independently
of the calling transaction.
Parameter Modes

Parameters are used to pass values to the procedure being called. There are 3 modes that can be
used with parameters, based on their usage: IN, OUT and IN OUT. An IN mode parameter is used to
pass values to the called procedure. Inside the program, an IN parameter acts like a constant, i.e.
it cannot be modified. An OUT mode parameter allows you to return a value from the procedure.
Inside the procedure, an OUT parameter acts like an uninitialized variable, so it should be
assigned a value before that value is used.
An IN OUT mode parameter allows you both to pass values to and to return values from the subprogram.
The default mode of an argument is IN.

Positional vs. Named Parameters

Values are communicated to a procedure by passing parameters to it. The parameters passed to a
procedure may follow either positional notation or named notation.
Example
If a procedure is defined as GROSS (ESAL NUMBER, ECOM NUMBER) and we call it as
GROSS (ESA, ECO), then the parameters used are called positional parameters.
For named parameters we use the following syntax:
GROSS (ECOM => ECO, ESAL => ESA)

A procedure can also be executed by invoking it as an executable statement, as shown below.

BEGIN
PROC1; -- PROC1 is the name of the procedure.
END;
/

Functions:
A function is a PL/SQL subprogram that is used to compute a value. A function is the same as a
procedure except that it has a RETURN clause.
Syntax for Function

Examples
Function without arguments

Function with arguments. Different ways of executing the function.

Chapter-5
PROCEDURAL QUERY LANGUAGE
End Chapter quizzes
Q1. Select the correct statement
a) User-defined exceptions are defined by the programmer
b) PL/SQL improves the capacity of SQL
c) %NOTFOUND: It is a Boolean attribute
d) All of the above

Q2) Select the correct statement


a) Declaration section is optional.
b) The Execution section is mandatory
c) The Exception (or Error) Handling section is mandatory.
d) Only a and c is correct.
Q3. A command used to access results from Oracle Server
a) SET SERVEROUTPUT ON
b) PRINT
c) WRITE
d) OUTPUT_SERVER
Q4. Which cursors are used in queries that return multiple rows?
a) Explicit cursor
b) Implicit cursors
c) Open Cursor
d) Both a and c
Q5 Program logic of PL SQL is written in:
a) Declaration section
b) Execution Section
c) Exception Handling
d) Program Section.
Q6 Variable and Constants are declared in
a) Variable Section
b) Declaration Section
c) Execution Section
d) Program Section
Q7. There are two types of subprograms in PL/SQL namely
a) Procedures
b) Cursor
c) Functions
d) Both a and c

Q8. User-defined exception has to be defined by


a) Programmer
b) User
c) Technical Writer
d) None
Q9. Biggest advantage of exception handling is it improves
a) Readability
b) Reliability
c) Both a and b
d) None
Q10. NO_DATA_FOUND and TOO_MANY_ROWS are
a) most commonly used functions
b) most commonly raised exceptions
c) Triggers
d) Procedures

Chapter: 6
TRANSACTION MANAGEMENT & CONCURRENCY
CONTROL TECHNIQUE

1. Introductory Concept to Database Transaction


A database transaction comprises a logical unit of work performed within a database
management system (or similar system) against a database, and treated in a coherent and reliable
way independent of other transactions.
Transactions in a database environment have two main purposes:
1. To provide reliable units of work that allow correct recovery from failures and keep a database
consistent even in cases of system failure, when execution stops (completely or partially) and
many operations upon a database remain uncompleted, with unclear status.
2. To provide isolation between programs accessing a database concurrently. Without isolation
the programs' outcomes are possibly erroneous.
A database transaction, by definition, must be atomic, consistent, isolated and durable.
Database practitioners often refer to these properties of database transactions using the acronym
ACID.
Transactions provide an "all-or-nothing" proposition, stating that each work-unit performed in a
database must either complete in its entirety or have no effect whatsoever. Further, the system
must isolate each transaction from other transactions, results must conform to existing
constraints in the database, and transactions that complete successfully must get written to
durable storage.
Most modern relational database management systems fall into the category of databases that
support transactions: transactional databases.
In a database system a transaction might consist of one or more data-manipulation statements
and queries, each reading and/or writing information in the database. Users of database systems
consider consistency and integrity of data as highly important. A simple transaction is usually
issued to the database system in a language like SQL wrapped in a transaction, using a pattern
similar to the following:

1. Begin the transaction
2. Execute several data manipulations and queries
3. If no errors occur then commit the transaction and end it
4. If errors occur then rollback the transaction and end it

If no errors occurred during the execution of the transaction then the system commits the
transaction. A transaction commit operation applies all data manipulations within the scope of
the transaction and persists the results to the database. If an error occurs during the transaction,
or if the user specifies a rollback operation, the data manipulations within the transaction are not
persisted to the database. In no case can a partial transaction be committed to the database since
that would leave the database in an inconsistent state.
Internally, multi-user databases store and process transactions, often by using a transaction ID or
XID.
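
The begin / execute / commit-or-rollback pattern described above can be sketched as follows. This is an illustration only: it assumes Python's built-in sqlite3 module (which opens a transaction implicitly before the first INSERT) rather than Oracle, and the function name run_transaction and the sample rows are hypothetical.

```python
import sqlite3

# Assumption: in-memory SQLite database; OrderNumber must be unique.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Orders (Company TEXT NOT NULL, OrderNumber INTEGER PRIMARY KEY)"
)
conn.commit()


def run_transaction(rows):
    """Insert all rows as one unit of work; return True if it was committed."""
    try:
        for company, number in rows:
            # sqlite3 begins the transaction implicitly before the first INSERT.
            conn.execute("INSERT INTO Orders VALUES (?, ?)", (company, number))
        conn.commit()        # no errors: make every change permanent
        return True
    except sqlite3.Error:
        conn.rollback()      # an error: undo every change of this transaction
        return False


def order_count():
    return conn.execute("SELECT COUNT(*) FROM Orders").fetchone()[0]


first = run_transaction([("Sega", 3412), ("ABC Shop", 5678)])
# The second statement repeats order number 3412, so the whole unit is rolled back.
second = run_transaction([("W3Schools", 2312), ("Sega", 3412)])

print(first, second, order_count())
```

After the failed second call the table still holds only the two rows of the first transaction: the W3Schools row that was inserted before the error has been undone, so no partial transaction is left in the database.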

2. ACID properties
When a transaction processing system creates a transaction, it will ensure that the transaction
will have certain characteristics. The developers of the components that comprise the transaction
are assured that these characteristics are in place. They do not need to manage these
characteristics themselves. These characteristics are known as the ACID properties. ACID is an
acronym for atomicity, consistency, isolation, and durability.
2.1 Atomicity
The atomicity property identifies that the transaction is atomic. An atomic transaction is either
fully completed, or is not begun at all. Any updates that a transaction might effect on a system
are completed in their entirety. If for any reason an error occurs and the transaction is unable to
complete all of its steps, then the system is returned to the state it was in before the transaction
was started. An example of an atomic transaction is an account transfer transaction. The money
is removed from account A then placed into account B. If the system fails after removing the
money from account A, then the transaction processing system will put the money back into
account A, thus returning the system to its original state. This is known as a rollback, as described
at the beginning of this chapter.

2.2 Consistency
A transaction enforces consistency in the system state by ensuring that at the end of any
transaction the system is in a valid state. If the transaction completes successfully, then all
changes to the system will have been properly made, and the system will be in a valid state. If
any error occurs in a transaction, then any changes already made will be automatically rolled
back. This will return the system to its state before the transaction was started. Since the system
was in a consistent state when the transaction was started, it will once again be in a consistent
state.
Looking again at the account transfer system, the system is consistent if the total of all accounts
is constant. If an error occurs and the money is removed from account A and not added to
account B, then the total in all accounts would have changed. The system would no longer be
consistent. By rolling back the removal from account A, the total will again be what it should be,
and the system back in a consistent state.
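
The account transfer example can be sketched as follows. The sketch assumes Python's built-in sqlite3 module instead of a full transaction processing system; the table Accounts, the starting balances and the function names are hypothetical. A transfer to a non-existent account plays the role of the failure between the removal from account A and the addition to account B.

```python
import sqlite3

# Assumption: in-memory SQLite database; balances may never become negative.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Accounts "
    "(Name TEXT PRIMARY KEY, Balance INTEGER NOT NULL CHECK (Balance >= 0))"
)
conn.executemany("INSERT INTO Accounts VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()


def balance(name):
    return conn.execute(
        "SELECT Balance FROM Accounts WHERE Name = ?", (name,)
    ).fetchone()[0]


def total():
    """The consistency rule of the example: this sum must stay constant."""
    return conn.execute("SELECT SUM(Balance) FROM Accounts").fetchone()[0]


def transfer(source, target, amount):
    """Move amount from source to target atomically; return True on commit."""
    try:
        conn.execute(
            "UPDATE Accounts SET Balance = Balance - ? WHERE Name = ?",
            (amount, source),
        )
        credited = conn.execute(
            "UPDATE Accounts SET Balance = Balance + ? WHERE Name = ?",
            (amount, target),
        )
        if credited.rowcount == 0:
            # The money left the source account but reached nobody.
            raise LookupError("no such account: " + target)
        conn.commit()
        return True
    except (sqlite3.Error, LookupError):
        conn.rollback()   # puts the money back into the source account
        return False


ok = transfer("A", "B", 30)          # succeeds: A = 70, B = 80
lost = transfer("A", "C", 30)        # fails after the debit, so it is rolled back
overdrawn = transfer("A", "B", 500)  # violates the CHECK constraint

print(ok, lost, overdrawn, balance("A"), balance("B"), total())
```

In every case the total of all accounts stays at 150, which is the consistency condition described above.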
2.3 Isolation
When a transaction runs in isolation, it appears to be the only action that the system is carrying
out at one time. If there are two transactions that are both performing the same function and are
running at the same time, transaction isolation will ensure that each transaction thinks it has
exclusive use of the system. This is important in that as the transaction is being executed, the
state of the system may not be consistent. The transaction ensures that the system remains
consistent after the transaction ends, but during an individual transaction, this may not be the
case. If a transaction was not running in isolation, it could access data from the system that may
not be consistent. By providing transaction isolation, this is prevented from happening.
2.4 Durability
A transaction is durable in that once it has been successfully completed, all of the changes it
made to the system are permanent. There are safeguards that will prevent the loss of information,
even in the case of system failure. By logging the steps that the transaction performs, the state of
the system can be recreated even if the hardware itself has failed. The concept of durability
allows the developer to know that a completed transaction is a permanent part of the system,
regardless of what happens to the system later on.

3 The Concept of Schedules


When transactions are executing concurrently in an interleaved fashion, not only does the
action of each transaction becomes important, but also the order of execution of operations from
each of these transactions. Hence, for analyzing any problem, it is not just the history of previous
transactions that one should be worrying about, but also the schedule of operations.
3.1 Schedule (History of transaction):
We formally define a schedule S of n transactions T1, T2, ..., Tn as an ordering of the operations
of the transactions, subject to the constraint that, for each transaction Ti that participates in S,
the operations of Ti must appear in S in the same order in which they appear in Ti. That is, if two
operations Ti1 and Ti2 are listed in Ti such that Ti1 comes before Ti2, then Ti1 must also appear
before Ti2 in the schedule. However, if Ti2 appears immediately after Ti1 in Ti, the same need not
be true in S, because some other operation Tj1 (of a transaction Tj) may be interleaved between
them. In short, a schedule lists the sequence of operations on the database in the same order in
which they were effected in the first place.
For the recovery and concurrency control operations, we concentrate mainly on read and write
operations of the transactions, because these operations actually effect changes to the database.
The other two (equally) important operations are commit and abort, since they decide when the
changes effected have actually become active on the database.
Since listing each of these operations becomes a lengthy process, we use a shorthand notation for
describing a schedule. The read operations (read_tr), write operations (write_tr), commit and
abort are indicated by r, w, c and a respectively, and each of them carries a subscript to
indicate the transaction number.
For example, SA: r1(x); r2(y); w2(y); r1(y); w1(x); a1
indicates the following operations in the same order:
read_tr(x)      transaction 1
read_tr(y)      transaction 2
write_tr(y)     transaction 2
read_tr(y)      transaction 1
write_tr(x)     transaction 1
abort           transaction 1

3.2 Conflicting operations: Two operations in a schedule are said to be in conflict if they satisfy
all of these conditions:
i) The operations belong to different transactions.
ii) They access the same item x.
iii) At least one of the operations is a write operation.

For example, the pairs
r1(x); w2(x)
w1(x); r2(x)
w1(y); w2(y)
conflict, because in each pair the two operations belong to different transactions, access the
same item, and at least one of them is a write.
But r1(x); w2(y) and r1(x); r2(x) do not conflict. In the first case the read and the write are on
different data items; in the second case both are trying to read the same data item, which they
can do without any conflict.
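
The three conditions can be checked mechanically. The following is a minimal sketch, assuming each operation is written as a small record of kind, transaction number and item; the names Op and conflicts are illustrative and are not part of the notation used in this material.

```python
from typing import NamedTuple


class Op(NamedTuple):
    kind: str   # "r" for a read, "w" for a write
    txn: int    # transaction number (the subscript in r1, w2, ...)
    item: str   # data item accessed


def conflicts(a: Op, b: Op) -> bool:
    """Return True when a and b satisfy all three conflict conditions."""
    return (
        a.txn != b.txn                 # i) different transactions
        and a.item == b.item           # ii) same data item
        and "w" in (a.kind, b.kind)    # iii) at least one write
    )


pairs = [
    (Op("r", 1, "x"), Op("w", 2, "x")),
    (Op("w", 1, "x"), Op("r", 2, "x")),
    (Op("w", 1, "y"), Op("w", 2, "y")),
    (Op("r", 1, "x"), Op("w", 2, "y")),
    (Op("r", 1, "x"), Op("r", 2, "x")),
]
for a, b in pairs:
    print(a, b, conflicts(a, b))
```

The first three pairs are the conflicting examples from the text; the last two are the non-conflicting ones.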

3.3 A Complete Schedule: A schedule S of n transactions T1, T2, ..., Tn is said to be a
complete schedule if the following conditions are satisfied:
i) The operations listed in S are exactly the same operations as in T1, T2, ..., Tn,
including the commit or abort operations. Each transaction is terminated by either a
commit or an abort operation.
ii) The operations of any transaction Ti appear in the schedule in the same order in
which they appear in the transaction.
iii) Whenever there are conflicting operations, one of the two occurs before the other in
the schedule.

A partial order of the schedule is said to occur if the first two conditions of the complete
schedule are satisfied, but non-conflicting operations in the schedule can occur without
indicating which should appear first.
This can happen because non-conflicting operations can anyway be executed in any order
without affecting the actual outcome.
However, in a practical situation, it is very difficult to come across complete schedules, because
new transactions keep getting included in the schedule. Hence, one often works with the
committed projection C(S) of a schedule S. This includes only those operations in S that
belong to committed transactions, i.e. transactions Ti whose commit operation ci is in S.
Put in simpler terms, since the operations of uncommitted transactions are not reflected in the
actual outcome of the schedule, only those transactions that have completed their commit
operations contribute to the projection, and this schedule is good enough in most cases.
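
As a small illustration, the committed projection can be computed by first collecting the committed transactions and then filtering the schedule. This is a sketch only, assuming a schedule is a list of (kind, transaction, item) tuples with kinds "r", "w", "c" and "a"; the function name and the sample schedule are hypothetical.

```python
def committed_projection(schedule):
    """Keep only the operations of transactions whose commit appears in the schedule."""
    committed = {txn for kind, txn, _ in schedule if kind == "c"}
    return [op for op in schedule if op[1] in committed]


# Hypothetical schedule: transaction 2 commits, transaction 1 aborts.
S = [
    ("r", 1, "x"), ("r", 2, "y"), ("w", 2, "y"), ("c", 2, None),
    ("r", 1, "y"), ("w", 1, "x"), ("a", 1, None),
]
print(committed_projection(S))
```

Only the three operations of transaction 2 survive, in their original order.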
3.4 Schedules and Recoverability :
Recoverability is the ability to recover from transaction failures. The success or otherwise of
recovery depends on the schedule of transactions. If fairly straightforward operations
without much interleaving of transactions are involved, error recovery is a straightforward
process. On the other hand, if a lot of interleaving of different transactions has taken place, then
recovering from the failure of any one of these transactions could be an involved affair. In
certain cases, it may not be possible to recover at all. Thus, it would be desirable to characterize
the schedules based on their recovery capabilities.
To do this, we observe certain features of recoverability and of schedules. To begin
with, we note that any recovery process most often involves a rollback operation, wherein
the operations of the failed transaction have to be undone. However, we also note that
the rollback needs to go back only as long as the transaction T has not committed. Once a
transaction T has committed, it need not be rolled back. The schedules that satisfy this
criterion are called recoverable schedules and those that do not are called non-recoverable
schedules. As a rule, non-recoverable schedules should not be permitted.
Formally, a schedule S is recoverable if no transaction T in S commits until all
transactions T' that have written an item which is read by T have committed. The concept is a
simple one. Suppose the transaction T reads an item X from the database, completes its
operations (based on this and other values) and commits, i.e. the output values of T
become permanent values of the database.
But suppose this value X was written by another transaction T' (before it was read by T), and T'
aborts after T has committed. What happens? The values committed by T are no longer valid,
because the basis of these values (namely X) has itself been changed. Obviously T would also
need to be rolled back (if possible), leading to other rollbacks and so on.
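
The formal condition can be tested on a concrete schedule. The sketch below is one possible reading of it, under two assumptions of this example: a schedule is a list of (kind, transaction, item) tuples with kinds "r", "w", "c" and "a", and a read is taken to come from the most recent writer of the item that has not aborted before the read. The function names and the two sample schedules are illustrative.

```python
def reads_from(schedule):
    """Return (reader, writer) pairs: reader read an item last written by writer."""
    writers = {}       # item -> transactions that wrote it, oldest first
    aborted = set()
    pairs = []
    for kind, txn, item in schedule:
        if kind == "w":
            writers.setdefault(item, []).append(txn)
        elif kind == "a":
            aborted.add(txn)
        elif kind == "r":
            live = [t for t in writers.get(item, []) if t not in aborted]
            if live and live[-1] != txn:
                pairs.append((txn, live[-1]))
    return pairs


def is_recoverable(schedule):
    """No reader may commit before every transaction it read from has committed."""
    commit_pos = {txn: pos for pos, (kind, txn, _) in enumerate(schedule) if kind == "c"}
    for reader, writer in reads_from(schedule):
        if reader in commit_pos:
            if writer not in commit_pos or commit_pos[writer] > commit_pos[reader]:
                return False
    return True


# Transaction 2 reads x from transaction 1 and commits; transaction 1 then aborts.
bad = [("r", 1, "x"), ("w", 1, "x"), ("r", 2, "x"), ("w", 2, "x"),
       ("c", 2, None), ("a", 1, None)]
# Same reads, but transaction 1 commits before transaction 2.
good = [("r", 1, "x"), ("w", 1, "x"), ("r", 2, "x"), ("w", 2, "x"),
        ("c", 1, None), ("c", 2, None)]
print(is_recoverable(bad), is_recoverable(good))
```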
The other aspect to note is that in a recoverable schedule, no committed transaction needs to be
rolled back. But it is possible that a cascading rollback may have to be effected, in
which an uncommitted transaction has to be rolled back because it read a value written
by a transaction which later aborted. Such cascading rollbacks can be very time consuming,
because at any instant of time a large number of uncommitted transactions may be operating.
Thus, it is desirable to have cascadeless schedules, which avoid cascading rollbacks.

This can be ensured by making transactions read only those values which were written by
committed transactions, so that there is no fear of the writer aborting or failing later on. If the
schedule has a sequence wherein a transaction T1 has to read a value X written by an uncommitted
transaction T2, then the sequence is altered so that the read is postponed till T2 either
commits or aborts.

This delays T1, but avoids any possibility of cascading rollbacks.

The third type of schedule is a strict schedule, which, as the name suggests, is highly
restrictive in nature. Here, transactions are allowed neither to read nor to write a value X
until the last transaction that wrote X has committed or aborted. Note that a strict
schedule greatly simplifies the recovery process, but in many cases it may not be
possible to devise strict schedules.

It may be noted that recoverable, cascadeless and strict schedules are each more stringent
than the predecessor. Each facilitates the recovery process further, but the transactions
may get delayed, or it may even become impossible to schedule them.
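
The difference between cascadeless and strict schedules can be made concrete with a small checker. This is a sketch under the same assumptions as before: a schedule is a list of (kind, transaction, item) tuples, and the value of an item is taken to come from its most recent writer that has not aborted. The function name classify is hypothetical.

```python
def classify(schedule):
    """Return (cascadeless, strict) for a list of (kind, txn, item) operations."""
    committed, aborted = set(), set()
    writers = {}                    # item -> transactions that wrote it, oldest first
    cascadeless = strict = True
    for kind, txn, item in schedule:
        if kind == "c":
            committed.add(txn)
        elif kind == "a":
            aborted.add(txn)
        else:
            live = [t for t in writers.get(item, []) if t not in aborted]
            last = live[-1] if live else None
            # The item currently holds a value written by another, uncommitted transaction.
            if last is not None and last != txn and last not in committed:
                strict = False              # neither reads nor writes are allowed
                if kind == "r":
                    cascadeless = False     # reading it risks a cascading rollback
            if kind == "w":
                writers.setdefault(item, []).append(txn)
    return cascadeless, strict


dirty_read = [("w", 1, "x"), ("r", 2, "x"), ("c", 1, None), ("c", 2, None)]
early_overwrite = [("w", 1, "x"), ("w", 2, "x"), ("c", 1, None), ("c", 2, None)]
careful = [("w", 1, "x"), ("c", 1, None), ("r", 2, "x"), ("w", 2, "x"), ("c", 2, None)]
for s in (dirty_read, early_overwrite, careful):
    print(classify(s))
```

The second schedule shows why strictness is the stronger property: it contains no dirty read, so it is cascadeless, yet transaction 2 overwrites a value whose writer has not yet committed.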

4 Serializability
Given that two transactions T1 and T2 are to be scheduled, they can be scheduled in a number of
ways. The simplest way is to schedule them without bothering about interleaving them, i.e.
schedule all operations of the transaction T1 followed by all operations of T2, or alternatively
schedule all operations of T2 followed by all operations of T1.
(time increases downwards)
T1                  T2
read_tr(X)
X = X + N
write_tr(X)
read_tr(Y)
Y = Y + N
write_tr(Y)
                    read_tr(X)
                    X = X + P
                    write_tr(X)

Non-interleaved (Serial Schedule): A

T1                  T2
                    read_tr(X)
                    X = X + P
                    write_tr(X)
read_tr(X)
X = X + N
write_tr(X)
read_tr(Y)
Y = Y + N
write_tr(Y)

Non-interleaved (Serial Schedule): B

These can now be termed serial schedules, since the entire sequence of operations in one
transaction is completed before the next transaction is started.
In the interleaved mode, the operations of T1 are mixed with the operations of T2. This can be
done in a number of ways. Two such sequences are given below:

T1                  T2
read_tr(X)
X = X + N
                    read_tr(X)
                    X = X + P
write_tr(X)
read_tr(Y)
                    write_tr(X)
Y = Y + N
write_tr(Y)

Interleaved (non-serial) Schedule: C

T1                  T2
read_tr(X)
X = X + N
write_tr(X)
                    read_tr(X)
                    X = X + P
                    write_tr(X)
read_tr(Y)
Y = Y + N
write_tr(Y)

Interleaved (non-serial) Schedule: D

Formally, a schedule S is serial if, for every transaction T in the schedule, all operations of T are
executed consecutively; otherwise it is called non-serial. In such a non-interleaved schedule, if
the transactions are independent, one can also presume that the schedule will be correct, since
each transaction commits or aborts before the next transaction begins. As long as the
transactions individually are error free, such sequences of events are guaranteed to give correct
results.
The problem with such a situation is the wastage of resources. If, in a serial schedule, one
of the transactions is waiting for an I/O, the other transactions cannot use the system
resources either, and hence the entire arrangement is wasteful of resources. If some transaction T
is very long, the other transactions have to keep waiting till it is completed. Moreover, in a
system wherein hundreds of machines operate concurrently, such an arrangement becomes
unthinkable. Hence, in general, the serial scheduling concept is unacceptable in practice.
However, once the operations are interleaved so that the above cited problems are overcome,
unless the interleaving sequence is well thought out, all the problems that we encountered in the
beginning of this block can reappear. Hence, a methodology is to be adopted to find out
which of the interleaved schedules give correct results and which do not.
A schedule S of n transactions is serializable if it is equivalent to some serial schedule
of the same n transactions. Note that there are n! different serial schedules possible for
n transactions. If one goes about interleaving them, the number of possible combinations
becomes unmanageably high. To ease our operations, we form two disjoint groups of
non-serial schedules: those that are equivalent to one or more serial schedules, which
we call serializable schedules, and those that are not equivalent to any serial schedule
and hence are not serializable. Once a non-serial schedule is shown to be serializable, it
is equivalent to a serial schedule and, by our previous discussion of serial schedules, is
a correct schedule. But how can one prove the equivalence of a non-serial schedule to a
serial schedule?
The simplest and the most obvious method to conclude that two such schedules are
equivalent is to find out their results. If they produce the same results, then they can be
considered equivalent, i.e. if two schedules are result equivalent, then they can be considered
equivalent. But such an oversimplification is full of problems. Two sequences may produce the
same set of results for one or even a large number of initial values, but still may not be equivalent.
Consider the following two sequences:
S1                  S2
read_tr(X)          read_tr(X)
X = X + X           X = X * X
write_tr(X)         write_tr(X)

For a value X=2, both produce the same result. Can we conclude that they are equivalent?
Though this may look like a simplistic example, with some imagination one can always come
up with more sophisticated examples wherein the bugs of treating them as equivalent are less
obvious. But the concept still holds: result equivalence cannot mean schedule equivalence. A
more refined method of finding equivalence is available. It is called conflict equivalence.
Two schedules are said to be conflict equivalent if the order of any two conflicting
operations is the same in both schedules. (Note that two operations conflict if they
belong to two different transactions, access the same data item, and at least one of
them is a write_tr(x) operation.) If two such conflicting operations appear in different orders in
the two schedules, then they can leave the database in two different final states, and
hence the schedules are not equivalent.
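
The definition can be turned into a direct comparison. The sketch below assumes a schedule is a list of (kind, transaction, item) tuples containing only reads and writes, and that no transaction repeats exactly the same operation; the names are illustrative. The four sample schedules are the read and write operations of schedules A to D of this section.

```python
from itertools import combinations


def conflict_pairs(schedule):
    """Set of ordered pairs (a, b) of conflicting operations with a before b."""
    pairs = set()
    for a, b in combinations(schedule, 2):    # combinations keeps schedule order
        if a[1] != b[1] and a[2] == b[2] and "w" in (a[0], b[0]):
            pairs.add((a, b))
    return pairs


def conflict_equivalent(s1, s2):
    """Same operations, and every conflicting pair appears in the same order."""
    return sorted(s1) == sorted(s2) and conflict_pairs(s1) == conflict_pairs(s2)


A = [("r", 1, "X"), ("w", 1, "X"), ("r", 1, "Y"), ("w", 1, "Y"), ("r", 2, "X"), ("w", 2, "X")]
B = [("r", 2, "X"), ("w", 2, "X"), ("r", 1, "X"), ("w", 1, "X"), ("r", 1, "Y"), ("w", 1, "Y")]
C = [("r", 1, "X"), ("r", 2, "X"), ("w", 1, "X"), ("r", 1, "Y"), ("w", 2, "X"), ("w", 1, "Y")]
D = [("r", 1, "X"), ("w", 1, "X"), ("r", 2, "X"), ("w", 2, "X"), ("r", 1, "Y"), ("w", 1, "Y")]

print(conflict_equivalent(D, A))   # D against serial schedule A
print(conflict_equivalent(C, A), conflict_equivalent(C, B))
```

Schedule D orders its conflicting pairs exactly as serial schedule A does, while schedule C matches neither serial schedule.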
4.1 Testing for conflict serializability of a schedule:
We suggest an algorithm that tests a schedule for conflict serializability.
1. For each transaction Ti participating in the schedule S, create a node labeled Ti in the
precedence graph.
2. For each case where Tj executes a read_tr(x) after Ti executes a write_tr(x), create an
edge from Ti to Tj in the precedence graph.
3. For each case where Tj executes a write_tr(x) after Ti executes a read_tr(x), create an
edge from Ti to Tj in the graph.
4. For each case where Tj executes a write_tr(x) after Ti executes a write_tr(x), create
an edge from Ti to Tj in the graph.
5. The schedule S is serializable if and only if there are no cycles in the graph.
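
The steps above can be sketched as follows. The code assumes a schedule is a list of (kind, transaction, item) tuples containing only reads and writes; steps 2 to 4 collapse into a single test, since all three cases are pairs of operations from different transactions on the same item with at least one write. The function names are hypothetical, and the cycle test is an ordinary depth-first search.

```python
def precedence_graph(schedule):
    """Edges (Ti, Tj): Tj performs an operation conflicting with an earlier one of Ti."""
    edges = set()
    for i, (k1, t1, x1) in enumerate(schedule):
        for k2, t2, x2 in schedule[i + 1:]:
            if t1 != t2 and x1 == x2 and "w" in (k1, k2):
                edges.add((t1, t2))
    return edges


def has_cycle(edges):
    """Depth-first search for a cycle in a directed graph given as an edge set."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set())
    visiting, done = set(), set()

    def visit(node):
        if node in done:
            return False
        if node in visiting:
            return True             # back edge: a cycle exists
        visiting.add(node)
        if any(visit(nxt) for nxt in graph[node]):
            return True
        visiting.discard(node)
        done.add(node)
        return False

    return any(visit(node) for node in graph)


def is_conflict_serializable(schedule):
    return not has_cycle(precedence_graph(schedule))


sched_C = [("r", 1, "X"), ("r", 2, "X"), ("w", 1, "X"), ("r", 1, "Y"), ("w", 2, "X"), ("w", 1, "Y")]
sched_D = [("r", 1, "X"), ("w", 1, "X"), ("r", 2, "X"), ("w", 2, "X"), ("r", 1, "Y"), ("w", 1, "Y")]
print(precedence_graph(sched_C), is_conflict_serializable(sched_C))
print(precedence_graph(sched_D), is_conflict_serializable(sched_D))
```

Here sched_C and sched_D are the read and write operations of the interleaved schedules C and D of section 4.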

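If we apply this method to the four schedules of section 4, we get the following
precedence graphs.

[Figure: precedence graphs for Schedules A, B, C and D, each drawn on the nodes T1 and T2 with edges labelled X]

Schedule A: T1 → T2
Schedule B: T2 → T1
Schedule C: T1 → T2 and T2 → T1 (a cycle)
Schedule D: T1 → T2

The graph of schedule C contains a cycle, so C is not serializable. The graph of schedule D
has no cycle and has the same single edge as that of schedule A, so we may conclude that
schedule D is equivalent to schedule A.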

4.2. View equivalence and view serializability:


Apart from the conflict equivalence of schedules and conflict serializability, another,
less restrictive definition of equivalence has been used with reasonable success in the
context of serializability. It is called view equivalence, and it leads to view serializability.
Two schedules S and S1 are said to be view equivalent if the following conditions are
satisfied:
i) The same set of transactions participates in S and S1, and S and S1 include the
same operations of those transactions.
ii) For any operation ri(X) of Ti in S, if the value of X read by the operation has been
written by an operation wj(X) of Tj (or if it is the original value of X before the
schedule started), the same condition must hold for the value of X read by
operation ri(X) of Ti in S1.
iii) If the operation wk(Y) of Tk is the last operation to write the item Y in S, then
wk(Y) of Tk must also be the last operation to write the item Y in S1.

The idea behind view equivalence is that as long as each read operation of a
transaction reads the result of the same write operation in both schedules, the
write operations of each transaction must produce the same results. Hence, the
read operations are said to see the same view in both schedules. It can easily
be verified that when S and S1 operate independently on a database with the same
initial state, they produce the same end state.

A schedule S is said to be view serializable if it is view equivalent to a serial schedule.


It can also be verified that the definitions of conflict serializability and view serializability
are similar if a condition known as the constrained write assumption holds on all transactions
of the schedules. This condition states that any write operation wi(X) in Ti is preceded by a
ri(X) in Ti, and that the value written by wi(X) in Ti depends only on the value of X read by
ri(X). This assumes that the computation of the new value of X is a function f(X) based on the
old value of X read from the database. However, the definition of view serializability is less
restrictive than that of conflict serializability under the unconstrained write assumption, where
the value written by the operation wi(X) in Ti can be independent of its old value in the
database. This is called a blind write.

But the main problem with view serializability is that testing for it is extremely complex
computationally: the problem is known to be NP-hard, and no efficient algorithm for it
is known.
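
To make both the definition and the cost concrete, here is a brute-force sketch that compares a schedule against every serial order of its transactions. It assumes a schedule is a list of (kind, transaction, item) tuples containing only reads and writes, with None standing for the original value of an item; all names are illustrative. Because it tries up to n! serial orders, it is usable only for very small examples.

```python
from collections import Counter
from itertools import permutations


def view(schedule):
    """Return (reads, final_writes): who each read reads from, and the last writer per item."""
    last_writer = {}
    reads = Counter()
    for kind, txn, item in schedule:
        if kind == "r":
            reads[(txn, item, last_writer.get(item))] += 1   # None = original value
        elif kind == "w":
            last_writer[item] = txn
    return reads, last_writer


def view_equivalent(s1, s2):
    return Counter(s1) == Counter(s2) and view(s1) == view(s2)


def view_serial_order(schedule):
    """Return a serial order that is view equivalent to the schedule, or None."""
    txns = sorted({txn for _, txn, _ in schedule})
    for order in permutations(txns):
        serial = [op for t in order for op in schedule if op[1] == t]
        if view_equivalent(schedule, serial):
            return order
    return None


# Blind writes: transactions 2 and 3 write X without reading it first.
blind = [("r", 1, "X"), ("w", 2, "X"), ("w", 1, "X"), ("w", 3, "X")]
print(view_serial_order(blind))
```

The schedule named blind is view equivalent to the serial order T1, T2, T3, although its conflicting writes w2(X) and w1(X) appear in the opposite order from that serial schedule, so it is not conflict equivalent to it.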

4.3 Uses of serializability:


If one were to prove the serializability of a schedule S, it is equivalent to saying that S is
correct. Hence, it guarantees that the schedule provides correct results. But being
serializable is not the same as being serial. Serial scheduling is inefficient for the
reasons explained earlier: it leads to under-utilization of the CPU and I/O devices and, in
some cases such as a mass reservation system, becomes untenable. On the other hand, a
serializable schedule combines the benefits of concurrent execution (efficient system
utilization, ability to cater to a larger number of concurrent users) with the guarantee of
correctness.

But all is not well yet. The scheduling process is done by the operating system routines
after taking into account various factors like system load, time of transaction submission,
priority of the process with reference to other processes and a large number of other factors.
Also, since a very large number of interleaving combinations are possible, it is
extremely difficult to determine beforehand the manner in which the transactions are
interleaved. In other words, getting the various schedules itself is difficult, let alone
testing them for serializability.

Hence, instead of generating the schedules, checking them for serializability and then
using them, most DBMS protocols use a more practical method: they impose restrictions on
the transactions themselves. These restrictions, when followed by every participating
transaction, automatically ensure serializability in all schedules that are created by these
participating transactions.

Also, since transactions are being submitted at different times, it is difficult to determine
when a schedule begins and when it ends. Serializability theory can deal with this
problem by considering only the committed projection C(S) of the schedule. Hence, as an
approximation, we can define a schedule S as serializable if its committed projection
C(S) is equivalent to some serial schedule.

5. The need for concurrency control


Let us imagine a situation wherein a large number of users (probably spread over vast
geographical areas) are operating on a concurrent system. Several problems can occur if they are
allowed to execute the operations of their transactions in an uncontrolled manner.
Consider a simple example of a railway reservation system. Since a number of people
are accessing the database simultaneously, it is obvious that multiple copies of the transactions
are to be provided so that each user can go ahead with his operations. Let us make the concept a
little more specific. Suppose we are considering the number of reservations on a particular train
on a particular date. Two persons at two different places are trying to reserve seats on this train.
By the very definition of concurrency, each of them should be able to perform the operations
irrespective of the fact that the other person is also doing the same. In fact, they will not even
know that the other person is also booking on the same train. The only way of ensuring this
is to make available to each of these users their own copies to operate upon, and finally
update the master database at the end of their operation.
Now suppose there are 10 seats available. Both the persons, say A and B, want to get
this information and book their seats. Since they are to be accommodated concurrently, the
system provides them two copies of the data. The simple way is to perform a read_tr(X), so that
the value of X is copied on to the variable X of person A (let us call it XA) and of person B
(XB). So each of them knows that there are 10 seats available.
Suppose A wants to book 8 seats. Since the number of seats he wants (say Y) is less than
the number of available seats, the program can allot him the seats, change the number of
available seats (X) to X - Y, and can even give him the seat numbers that have been booked for him.
The problem is that a similar operation can be performed by B also. Suppose he needs 7
seats. So he gets his seven seats, changes the value of X to 3 (10 - 7) and gets his reservation.
Between them, A and B have now been allotted 15 seats although only 10 were available.
The problem is noticed only when these blocks are returned to the main database (the
disk in the above case).

Before we can analyze these problems, we look at the problem from a more technical
view.

5.1 The lost update problem: This problem occurs when two transactions that access the same
database items have their operations interleaved in such a way as to make the value of some
database item incorrect. Suppose the transactions T1 and T2 are submitted at (approximately)
the same time. Because of interleaving, each transaction is executed for some period of
time and then control is passed on to the other transaction, and this sequence continues.
Because of the delay in updating, this creates a problem. This was what happened in the
previous example. Let the transactions be called TA and TB.

(time increases downwards)
TA                  TB
read_tr(X)
                    read_tr(X)
X = X - NA
                    X = X - NB
write_tr(X)
                    write_tr(X)

Note that the problem occurred because each transaction computed its new value from the
value of X that it had read at the start, without taking the other's update into account. The
transaction that writes X last overwrites the value written by the other, whose update is
therefore lost.
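
The interleaving above can be replayed step by step. The following is a minimal simulation, assuming the database is a Python dictionary and each transaction keeps its own local copy of X; the function names are hypothetical.

```python
def interleaved(seats, na, nb):
    """Replay the lost update interleaving of TA and TB."""
    db = {"X": seats}
    xa = db["X"]        # TA: read_tr(X)
    xb = db["X"]        # TB: read_tr(X), still sees the original value
    xa = xa - na        # TA: X = X - NA
    xb = xb - nb        # TB: X = X - NB
    db["X"] = xa        # TA: write_tr(X)
    db["X"] = xb        # TB: write_tr(X) overwrites TA's update
    return db["X"]


def serial(seats, na, nb):
    """Run TA to completion, then TB."""
    db = {"X": seats}
    for n in (na, nb):
        x = db["X"]     # read_tr(X)
        x = x - n
        db["X"] = x     # write_tr(X)
    return db["X"]


print(interleaved(10, 8, 7))                        # the reservation example
print(interleaved(10, 3, 4), serial(10, 3, 4))
```

With the figures of the reservation example (10 seats, bookings of 8 and 7) the interleaved run leaves X at 3, as if A had never booked. With bookings of 3 and 4 the interleaved run leaves 6 seats where the serial run correctly leaves 3.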

5.2 Dirty read problem

This happens when a transaction TA updates a data item, but later on (for some reason) the
transaction fails. It could be due to a system failure or any other operational reason, or the
system may later have noticed that the operation should not have been done and cancelled it.
To be fair, the system also ensures that the original value is restored.
But in the meanwhile, another transaction TB has accessed the data and, since it has no
indication of what happened later on, it makes use of this data and goes ahead. Once the original
value is restored by TA, the values generated by TB are obviously invalid.

(time increases downwards)
TA                  TB
read_tr(X)
X = X - N
write_tr(X)
                    read_tr(X)
                    X = X - N
                    write_tr(X)
Failure
X = X + N
write_tr(X)

The value written by TA, which comes from a transaction that did not survive, is dirty data;
TB reads it and produces an illegal value. Hence the problem is called the dirty read problem.
5.3 The Incorrect Summary Problem: Consider two concurrent transactions, again called TA and
TB. TB is calculating a summary (average, standard deviation or some such value) by
accessing all elements of a database (note that it is not updating any of them; it only reads
them and uses the resultant data to calculate some values). In the meanwhile, TA is updating
these values. Since the operations are interleaved, TB will be using the not yet updated data
for some of its operations and the updated data for the others. This is called the incorrect
summary problem.

TA                  TB
                    Sum = 0
                    read_tr(A)
                    Sum = Sum + A
read_tr(X)
X = X - N
write_tr(X)
                    read_tr(X)
                    Sum = Sum + X
                    read_tr(Y)
                    Sum = Sum + Y
read_tr(Y)
Y = Y - N
write_tr(Y)

In the above example, TA updates both X and Y. But since it first updates X and then Y, and
the operations are so interleaved that TB reads both of them in between, TB ends up using the
old value of Y with the new value of X. As a result, the sum obtained corresponds neither to
the old set of values nor to the new set of values.
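
The effect is easy to reproduce numerically. The sketch below replays the interleaving with a dictionary standing in for the database; the function name and the sample values are assumptions of this example.

```python
def incorrect_summary(a, x, y, n):
    """Replay the interleaving: TB sums A, X, Y while TA subtracts n from X and Y."""
    db = {"A": a, "X": x, "Y": y}
    old_total = sum(db.values())

    total = 0
    total += db["A"]            # TB: read_tr(A); Sum = Sum + A
    db["X"] = db["X"] - n       # TA: read_tr(X); X = X - N; write_tr(X)
    total += db["X"]            # TB: reads the NEW value of X
    total += db["Y"]            # TB: reads the OLD value of Y
    db["Y"] = db["Y"] - n       # TA: read_tr(Y); Y = Y - N; write_tr(Y)

    new_total = sum(db.values())
    return total, old_total, new_total


print(incorrect_summary(10, 20, 30, 5))
```

For the sample values the summary computed by TB is 55, which matches neither the total before TA ran (60) nor the total after it finished (50).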

6 Locking techniques for concurrency control

Many of the important techniques for concurrency control make use of the concept of the lock. A
lock is a variable associated with a data item that describes the status of the item with respect to
the possible operations that can be done on it. Normally every data item is associated with a
unique lock. They are used as a method of synchronizing the access of database items by the
transactions that are operating concurrently. Such controls, when implemented properly can
overcome many of the problems of concurrent operations listed earlier. However, the locks
themselves may create a few problems, which we shall be seeing in some detail in subsequent
sections.

6.1 Types of locks and their uses:


6.1.1 Binary locks: A binary lock can have two states or values (1 or 0); one of them indicates
that the item is locked and the other that it is unlocked. For example, if we presume that 1
indicates that the lock is on and 0 indicates that it is open, then if the lock of item X is 1, a
read_tr(X) cannot access the item as long as the lock's value continues to be 1. We can refer to
such a state as lock(X).
The concept works like this. The item X can be accessed only when it is free to be used by the
transactions. If, say, its current value is being modified, then X cannot be (in fact should not be)
accessed till the modification is complete. The simple mechanism is to lock access to X as long
as the process of modification is on, and unlock it for use by the other transactions only when
the modifications are complete.
So we need two operations: lock_item(X), which locks the item, and unlock_item(X), which opens
the lock. Any transaction that wants to make use of the data item first checks the lock status of
X through lock_item(X). If the item X is already locked (lock status = 1), the transaction has
to wait. Once the status becomes 0, the transaction locks the item (makes its status 1) and
accesses it. When the transaction has finished using the item, it issues an unlock_item(X)
command, which again sets the status to 0, so that other transactions can access the item.
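
A binary lock table can be sketched in a few lines. This is an illustrative, single-threaded model rather than a real lock manager: instead of actually suspending a transaction, lock_item returns False and remembers the transaction in a waiting list, and unlock_item returns the transaction that should be woken up to retry. The class name and the waiting list are assumptions of this example.

```python
class BinaryLockTable:
    def __init__(self):
        self.status = {}     # item -> 1 (locked) or 0 (unlocked)
        self.waiting = {}    # item -> transactions waiting for it, in arrival order

    def lock_item(self, txn, item):
        """Lock the item for txn if it is free; otherwise queue txn and return False."""
        if self.status.get(item, 0) == 0:
            self.status[item] = 1
            return True
        self.waiting.setdefault(item, []).append(txn)
        return False

    def unlock_item(self, item):
        """Release the item and return the next waiting transaction, if any."""
        self.status[item] = 0
        queue = self.waiting.get(item, [])
        return queue.pop(0) if queue else None


table = BinaryLockTable()
print(table.lock_item("T1", "X"))    # T1 gets the lock
print(table.lock_item("T2", "X"))    # T2 has to wait
print(table.unlock_item("X"))        # T1 releases; T2 should be woken up
print(table.lock_item("T2", "X"))    # T2 retries and gets the lock
```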

6.1.2 Shared and Exclusive locks


While the operation of the binary lock scheme appears satisfactory, it suffers from a
serious drawback. Once a transaction holds a lock (has issued a lock operation), no
other transaction can access the data item. In large concurrent systems, this can
become a disadvantage. It is obvious that more than one transaction should not write
into X at the same time, and that while one transaction is writing into it no other
transaction should be reading it. However, no harm is done if several transactions are
allowed to read the item simultaneously. This would save the time of all these
transactions without in any way affecting the performance.
This concept gave rise to the idea of shared/exclusive locks. When only read operations are
being performed, the data item can be shared by several transactions; it is only when a
transaction wants to write into it that the lock should be exclusive. Hence the shared/exclusive
lock is also sometimes called a multiple-mode lock. A read lock is a shared lock (which can be
held by several transactions), whereas a write lock is an exclusive lock. So we need to think of
three operations: a read lock, a write lock and an unlock. The algorithms can be as follows:

Read lock(X):
Start: if Lock(X) = "unlocked"
       then { Lock(X) ← "read-locked";
              no_of_reads(X) ← 1 }
       else if Lock(X) = "read-locked"
       then no_of_reads(X) ← no_of_reads(X) + 1
       else { wait (until Lock(X) = "unlocked" and the lock manager
              wakes up the transaction);
              go to Start }
end.

The read lock operation

Write lock(X):
Start: if Lock(X) = "unlocked"
       then Lock(X) ← "write-locked"
       else { wait (until Lock(X) = "unlocked" and
              the lock manager wakes up the transaction);
              go to Start }
end.

The write lock operation

Unlock(X):
if Lock(X) = "write-locked"
then { Lock(X) ← "unlocked";
       wake up one of the waiting transactions, if any }
else if Lock(X) = "read-locked"
then { no_of_reads(X) ← no_of_reads(X) - 1;
       if no_of_reads(X) = 0
       then { Lock(X) ← "unlocked";
              wake up one of the waiting transactions, if any } }

The unlock operation

The algorithms are fairly straightforward, except that during the unlocking operation, if a
number of read locks are held on the item, then all of them have to be released before the item
itself becomes unlocked.
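
The three operations can be modelled as a small lock table. The following is a simplified, single-threaded sketch: a request that would have to wait simply returns False instead of suspending the caller, and no record is kept of which transaction holds which lock. The class and method names are assumptions of this example.

```python
class SharedExclusiveLockTable:
    def __init__(self):
        self.lock = {}           # item -> "read-locked" or "write-locked"; absent = unlocked
        self.no_of_reads = {}    # item -> number of read locks currently held

    def read_lock(self, item):
        state = self.lock.get(item, "unlocked")
        if state == "unlocked":
            self.lock[item] = "read-locked"
            self.no_of_reads[item] = 1
            return True
        if state == "read-locked":
            self.no_of_reads[item] += 1     # share the lock with the other readers
            return True
        return False                        # write-locked: the caller must wait

    def write_lock(self, item):
        if self.lock.get(item, "unlocked") == "unlocked":
            self.lock[item] = "write-locked"
            return True
        return False                        # held in either mode: the caller must wait

    def unlock(self, item):
        state = self.lock.get(item, "unlocked")
        if state == "write-locked":
            self.lock[item] = "unlocked"
        elif state == "read-locked":
            self.no_of_reads[item] -= 1
            if self.no_of_reads[item] == 0:  # the last reader has released the item
                self.lock[item] = "unlocked"


locks = SharedExclusiveLockTable()
print(locks.read_lock("X"), locks.read_lock("X"))   # two readers share X
print(locks.write_lock("X"))                        # a writer must wait
locks.unlock("X")
locks.unlock("X")
print(locks.write_lock("X"))                        # now the writer succeeds
```

Note that the item becomes available to the writer only after both readers have unlocked it.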
To ensure smooth operation of the shared/exclusive locking system, the system must
enforce the following rules:
1. A transaction T must issue the operation read_lock(X) or write_lock(X) before any
read_tr(X) operation is performed in T.
2. A transaction T must issue the operation write_lock(X) before any write_tr(X)
operation is performed in T.
3. A transaction T must issue the operation unlock(X) after all read_tr(X) and write_tr(X)
operations are completed in T.
4. A transaction T will not issue a read_lock(X) operation if it already holds a read lock
or write lock on X.
5. A transaction T will not issue a write_lock(X) operation if it already holds a read lock
or write lock on X.
6.1.3 Two phase locking:
A transaction is said to follow two phase locking if the operations of the
transaction can be divided into two distinct phases. In the first phase, all items that are
needed by the transaction are acquired by locking them. In this phase, no item is
unlocked even if its operations are over. In the second phase, the items are unlocked
one after the other. The first phase can be thought of as a growing phase, wherein the
store of locks held by the transaction keeps growing. The second phase is called the
shrinking phase, wherein the number of locks held by the transaction keeps shrinking.

read_lock(Y)
read_tr(Y)              Phase I
write_lock(X)
----------------------------------
unlock(Y)
read_tr(X)
X = X + Y               Phase II
write_tr(X)
unlock(X)

Example: A two phase locking
Two phase locking, though it provides serializability, has a disadvantage. Since the
locks are not released immediately after the use of an item is over, but are retained till all
the other needed locks have also been acquired, the desired amount of interleaving may not be
achieved. Worse, while a transaction T may be holding an item X, though it is not using
it, just to satisfy the two phase locking protocol, another transaction T1 may
genuinely need the item, but will be unable to get it till T releases it. This is the price
that is to be paid for the guaranteed serializability provided by the two phase locking
system.
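
Whether a single transaction obeys the protocol can be checked by scanning its operations once: no lock request may appear after the first unlock. The sketch below assumes a transaction is given as a list of (operation, item) pairs using the operation names of the example above; the function name is hypothetical.

```python
def follows_two_phase_locking(operations):
    """Return True if no lock is acquired after the first unlock."""
    shrinking = False
    for name, _item in operations:
        if name == "unlock":
            shrinking = True                       # the shrinking phase has begun
        elif name in ("read_lock", "write_lock") and shrinking:
            return False                           # a lock requested too late
    return True


two_phase = [
    ("read_lock", "Y"), ("read_tr", "Y"), ("write_lock", "X"),
    ("unlock", "Y"), ("read_tr", "X"), ("write_tr", "X"), ("unlock", "X"),
]
not_two_phase = [
    ("read_lock", "Y"), ("read_tr", "Y"), ("unlock", "Y"),
    ("write_lock", "X"), ("read_tr", "X"), ("write_tr", "X"), ("unlock", "X"),
]
print(follows_two_phase_locking(two_phase))
print(follows_two_phase_locking(not_two_phase))
```

The second transaction releases Y as soon as it has finished with it, which allows more interleaving but gives up the serializability guarantee.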

6.2 Deadlock and Starvation:


A deadlock is a situation wherein each transaction T in a set of two or more
transactions is waiting for some item that is locked by some other transaction T' in the set.
Taking the case of only two transactions T1' and T2': T1' is waiting for an item X which is held
by T2', and is also holding another item Y. T1' will release Y only when X becomes available
from T2' and T1' can complete some operations. Meanwhile, T2' is waiting for Y, held by T1', and
will release X only after Y is released by T1' and T2' has performed some operations on it.
It can easily be seen that this is an infinite wait and the deadlock will never get resolved.

T1'                 T2'
read_lock(Y)
read_tr(Y)
                    read_lock(X)
                    read_tr(X)
write_lock(X)
                    write_lock(Y)

A partial schedule leading to deadlock.

[Figure: the status graph, with nodes T1' and T2' each waiting for the other]

In the case of only two transactions, it is rather easy to notice the possibility of
deadlock, though preventing it may be difficult. The case may become more complicated when
more than two transactions are in a deadlock, and even identifying a deadlock may be difficult.
6.2.1 Deadlock prevention protocols
The simplest way of preventing deadlock is to look at the problem in detail. Deadlock
occurs basically because a transaction has locked several items, but could not get one more item
and is not releasing the other items held by it. The solution is to develop a protocol wherein a
transaction locks all the items that it needs in advance, before it starts executing. If it cannot
get any one or more of the items, it does not hold the other items either, so that these items
remain available to any other transaction that may need them. This method, though it prevents
deadlocks, further limits the prospects of concurrency.

A better way to deal with deadlocks is to identify the deadlock when it occurs and then
take some decision. The transaction involved in the deadlock may be blocked or aborted or the
transaction can preempt and abort the other transaction involved. In a typical case, the concept
of transaction time stamp TS (T) is used. Based on when the transaction was started, (given by
the time stamp, larger the value of TS, younger is the transaction), two methods of deadlock
recovery are devised.

1.

Wait-die method: suppose a transaction Ti tries to lock an item X, but is unable to do

so because X is locked by Tj with a conflicting lock. Then if TS(Ti)<TS(Tj), (Ti is older then
Tj) then Ti waits. Otherwise (if Ti is younger than Tj) then Ti is aborted and restarted later with
the same time stamp. The policy is that the older of the transactions will have already spent
sufficient efforts & hence should not be aborted.
2.

Wound-wait method: If TS(Ti) <TS(Tj), (Ti is older then Tj), abort and restart Tj

with the same time stamp later. On the other hand, if Ti is younger then Ti is allowed to wait.

It may be noted that in both cases, the younger transaction will get aborted. But the actual
method of aborting is different. Both these methods can be proved to be deadlock free, because
no cycles of waiting as seen earlier are possible with these arrangements.
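
The two rules can be compared side by side in a minimal Python sketch. The function names and the returned strings are illustrative assumptions, not part of any DBMS; the requester is Ti, the lock holder is Tj, and a smaller time stamp means an older transaction.

```python
def wait_die(ts_requester, ts_holder):
    """Wait-die: an older requester waits, a younger requester dies."""
    if ts_requester < ts_holder:
        return "requester waits"
    # The younger requester is aborted and restarted later with the same time stamp.
    return "abort requester"


def wound_wait(ts_requester, ts_holder):
    """Wound-wait: an older requester wounds the holder, a younger one waits."""
    if ts_requester < ts_holder:
        # The younger holder is aborted and restarted later with the same time stamp.
        return "abort holder"
    return "requester waits"


# Compare both policies for an older requester (5 vs 9) and a younger one (9 vs 5).
for ts_i, ts_j in [(5, 9), (9, 5)]:
    print(ts_i, ts_j, wait_die(ts_i, ts_j), "|", wound_wait(ts_i, ts_j))
```

In both functions the transaction that is aborted is always the younger of the two, which is why no cycle of waiting transactions can form.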
There is another class of protocols that do not require any time stamps. They include the
no-waiting algorithm and the cautious waiting algorithm. In the no-waiting algorithm, if a
transaction cannot get a lock, it is aborted immediately (no waiting). It is restarted at a
later time. But since there is no guarantee that the new situation is deadlock free, it may have to
be aborted again. This may lead to a situation where a transaction ends up getting aborted
repeatedly.

To overcome this problem, the cautious waiting algorithm was proposed. Here, suppose the
transaction Ti tries to lock an item X, but cannot get X since X is already locked by another
transaction Tj. Then the solution is as follows: If Tj is not blocked (not waiting for some other
locked item), then Ti is blocked and allowed to wait. Otherwise Ti is aborted. This method not
only reduces repeated aborting, but can also be proved to be deadlock free, since out of Ti and Tj
only one is blocked, after ensuring that the other is not blocked.
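
A minimal sketch of the cautious waiting rule follows. The set `blocked` and the function name are assumptions made for illustration; the set simply records which transactions are currently waiting for a lock.

```python
def cautious_waiting(requester, holder, blocked):
    """Decide what happens when `requester` finds the item locked by `holder`."""
    if holder in blocked:
        # The holder is itself waiting, so letting the requester wait could close a cycle.
        return "abort"
    blocked.add(requester)  # the holder is running, so it is safe to wait for it
    return "wait"


blocked = set()
first = cautious_waiting("T1", "T2", blocked)   # T2 is not blocked, so T1 waits
second = cautious_waiting("T2", "T1", blocked)  # T1 is blocked, so T2 is aborted
print(first, second, blocked)
```

The no-waiting algorithm corresponds to always returning "abort", which is what cautious waiting avoids whenever the holder is still running.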

6.2.2 Deadlock detection & timeouts

The second method of dealing with deadlocks is to detect deadlocks as and when they happen.
The basic problem with the earlier suggested protocols is that they assume that we know what is
happening in the system: which transaction is waiting for which item, and so on. But in a
typical case of concurrent operations, the situation is fairly complex and it may not be possible to
predict the behavior of transactions.
In such cases, the easier method is to take on deadlocks as and when they happen and try
to resolve them. A simple way to detect a deadlock is to maintain a wait-for graph. One node in
the graph is created for each executing transaction. Whenever a transaction Ti is waiting to lock
an item X which is currently held by Tj, an edge (Ti → Tj) is created in the graph. When Tj
releases X, this edge is dropped. It is easy to see that whenever there is a deadlock situation,
there will be a cycle in the wait-for graph, so that suitable corrective action can be taken.
Once a deadlock has been detected, the transaction to be aborted has to be chosen. This is
called victim selection, and generally newer transactions are selected as victims.
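
The following is a small sketch of a wait-for graph with cycle detection by depth-first search. The class name `WaitForGraph`, its methods and the sample time stamps are illustrative assumptions; the victim is chosen as the youngest transaction on the cycle, following the policy described above.

```python
from collections import defaultdict


class WaitForGraph:
    def __init__(self):
        self.edges = defaultdict(set)  # waiter -> set of holders it waits for

    def add_wait(self, waiter, holder):
        self.edges[waiter].add(holder)

    def remove_wait(self, waiter, holder):
        self.edges[waiter].discard(holder)

    def find_cycle(self):
        """Return the transactions on one cycle, or None if there is no deadlock."""
        path, finished = [], set()

        def dfs(node):
            if node in path:                      # came back to a node on the current path
                return path[path.index(node):]
            if node in finished:
                return None
            path.append(node)
            for nxt in sorted(self.edges.get(node, ())):
                cycle = dfs(nxt)
                if cycle:
                    return cycle
            path.pop()
            finished.add(node)
            return None

        for start in sorted(self.edges):
            cycle = dfs(start)
            if cycle:
                return cycle
        return None


g = WaitForGraph()
g.add_wait("T1", "T2")
g.add_wait("T2", "T3")
no_deadlock = g.find_cycle()       # no cycle yet
g.add_wait("T3", "T1")
cycle = g.find_cycle()             # T1 -> T2 -> T3 -> T1

timestamps = {"T1": 10, "T2": 20, "T3": 30}   # hypothetical start times
victim = max(cycle, key=timestamps.get)       # youngest transaction on the cycle
print(no_deadlock, cycle, victim)
```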
Another easy method of dealing with deadlocks is the use of timeouts. Whenever a
transaction is made to wait longer than a predefined period, the system assumes that a
deadlock has occurred and aborts the transaction. This method is simple and has low overheads,
but may end up aborting a transaction even when there is no deadlock.

6.3 Starvation:
The other side effect of locking is starvation, which happens when a transaction cannot proceed
for indefinitely long periods, though the other transactions in the system are continuing
normally. This may happen if the waiting scheme for locked items is unfair, i.e. if some
transactions may never be able to get the items, since one or the other of the high priority
transactions may continuously be using them. Then the low priority transaction will be forced to
starve for want of resources.

The solution to starvation problems lies in choosing proper priority algorithms like
first-come-first-served. If this is not possible, then the priority of a transaction may be
increased every time it is made to wait or is aborted, so that eventually it becomes a high
priority transaction and gets the required services.
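
The effect of raising the priority of a waiting transaction (often called aging) can be seen in a toy simulation. Everything here is an assumption made for illustration: one low priority transaction competes with a new high priority arrival in every round, and the function reports the round in which the low priority transaction is finally served.

```python
def serve_with_aging(low_priority=1, high_priority=3, max_rounds=20, aging=True):
    """Return the round in which the low priority transaction is served, or None."""
    waiting = {"low": low_priority}
    for round_no in range(1, max_rounds + 1):
        waiting[f"high{round_no}"] = high_priority   # a new high priority arrival
        # Serve the highest priority; ties go to the alphabetically first name.
        chosen = max(sorted(waiting), key=waiting.get)
        del waiting[chosen]
        if chosen == "low":
            return round_no
        if aging:
            for name in waiting:
                waiting[name] += 1                   # every wait raises the priority
    return None


served_round = serve_with_aging()            # with aging the low transaction gets through
starved = serve_with_aging(aging=False)      # without aging it never does
print(served_round, starved)
```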

6.4 Concurrency control based on Time Stamp ordering


6.4.1 The concept of time stamps: A time stamp is a unique identifier created by the DBMS and
attached to each transaction, whose value is a measure of when the transaction came
into the system. Roughly, a time stamp can be thought of as the starting time of the transaction,
denoted by TS(T).
Time stamps can be generated by a counter that is initially zero and is incremented each time its
value is assigned to a transaction. The counter is also given a maximum value and if the
reading goes beyond that value, the counter is reset to zero, indicating, most often, that the
transaction has lived its life span inside the system and needs to be taken out. A better way of
creating such time stamps is to make use of the system time/date facility or even the internal
clock of the system.

6.4.2 An algorithm for ordering the time stamp: The basic concept is to order the transactions
based on their time stamps. A schedule made of such transactions is then serializable. This
concept is called time stamp ordering (TO). The algorithm should ensure that whenever a
data item is accessed by conflicting operations in the schedule, the data is available to them in
the serializability order. To achieve this, the algorithm uses two time stamp values.
1. read_TS(X): This indicates the largest time stamp among the transactions that have
successfully read the item X. Note that the largest time stamp actually refers to the
youngest of the transactions that have read X.
2. write_TS(X): This indicates the largest time stamp among all the transactions that have
successfully written the item X. Note that the largest time stamp actually refers to the
youngest transaction that has written X.

The above two values are often referred to as the read time stamp and the write time stamp of the
item X.

6.4.3 The concept of basic time stamp ordering: Whenever a transaction T tries to read or write
an item X, the algorithm compares the time stamp of T with the read time stamp or the write
time stamp of the item X, as the case may be. This is done to ensure that T does not violate the
order of time stamps. The violation can come in the following ways.
1. Transaction T is trying to write X
a) If read_TS(X) > TS(T) or write_TS(X) > TS(T), then abort and roll back T and
reject the operation. In plain words, if a transaction younger than T has already
read or written X, the time stamp ordering is violated and hence T is to be aborted
and all the values written by T so far need to be rolled back, which may also
involve cascaded rolling back.
b) Otherwise (read_TS(X) <= TS(T) and write_TS(X) <= TS(T)), execute the write_tr(X)
operation and set write_TS(X) to TS(T), i.e. allow the operation and set the write time
stamp of X to that of T, since T is the latest transaction to have written X.
2. Transaction T is trying to read X
a) If write_TS(X) > TS(T), then abort and roll back T and reject the operation. This
is because a younger transaction has written into X.
b) If write_TS(X) <= TS(T), execute read_tr(X) and set read_TS(X) to the larger of
the two values, namely TS(T) and the current read_TS(X).
This algorithm ensures proper ordering and also avoids deadlocks, since no transaction ever
waits: the older transaction is penalized when it tries to overtake an operation already done by a
younger transaction. Of course, the aborted transaction will be reintroduced later with a new
time stamp. However, in the absence of any other monitoring protocol, the algorithm may
create starvation in the case of some transactions.
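
The read and write rules of basic time stamp ordering can be sketched in a few lines of Python. The names `Item`, `Abort`, `read_tr` and `write_tr` are chosen for illustration, transaction time stamps are assumed to be positive integers, and rollback of earlier writes is left out.

```python
class Abort(Exception):
    """Raised when an operation would violate the time stamp order."""


class Item:
    def __init__(self, value):
        self.value = value
        self.read_ts = 0    # read_TS(X): largest time stamp that has read X
        self.write_ts = 0   # write_TS(X): largest time stamp that has written X


def read_tr(item, ts):
    if item.write_ts > ts:                 # a younger transaction already wrote X
        raise Abort(f"read by TS={ts} rejected")
    item.read_ts = max(item.read_ts, ts)
    return item.value


def write_tr(item, ts, value):
    if item.read_ts > ts or item.write_ts > ts:   # a younger one already read or wrote X
        raise Abort(f"write by TS={ts} rejected")
    item.value = value
    item.write_ts = ts


x = Item(100)
write_tr(x, 5, 110)          # allowed: write_TS(X) becomes 5
seen = read_tr(x, 8)         # allowed: read_TS(X) becomes 8
try:
    write_tr(x, 6, 120)      # rejected: the transaction with TS 8 has already read X
    aborted = False
except Abort:
    aborted = True
print(seen, aborted, x.read_ts, x.write_ts)
```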
6.4.4 Strict time stamp ordering:

This variation of the time stamp ordering algorithm ensures that the schedules are strict
(so that recoverability is enhanced) and serializable. In this case, any transaction T that tries to
read or write X such that write_TS(X) < TS(T) is made to wait until the transaction T' that
originally wrote into X (hence whose time stamp matches the write time stamp of X,
i.e. TS(T') = write_TS(X)) has committed or aborted. This algorithm also does not cause any
deadlock, since T waits for T' only if TS(T) > TS(T').

6.5 Multi version concurrency control techniques


The main reason why some of the transactions have to be aborted is that they try to
access data items that have been updated by transactions younger than themselves. One way of
overcoming this problem is to maintain several versions of the data items, so that if a transaction
tries to access an updated data item, instead of being aborted it may be allowed to work on the
older version of the data. This concept is called the multiversion method of concurrency control.
Whenever a transaction writes a data item, the new value of the item is made available, as
is the older version. Normally the transactions are given access to the newer version, but in
case of conflicts the policy is to allow the older transaction to have access to the older
version of the item.
The obvious drawback of this technique is that more storage is required to maintain the
different versions. But in many cases, this may not be a major drawback, since most database
applications continue to retain the older versions anyway, for the purposes of recovery or for
historical purposes.
6.5.1 Multiversion technique based on timestamp ordering

In this method, several versions of the data item X, which we call X1, X2, ..., Xk, are
maintained. For each version Xi two timestamps are kept:
i) read_TS(Xi): the read timestamp of Xi is the largest of all the time stamps of the
transactions that have read Xi. (In plain language, it is the time stamp of the youngest
transaction that has read it.)
ii) write_TS(Xi): the write timestamp of Xi is the time stamp of the transaction that wrote Xi.

Whenever a transaction T writes into X, a new version Xk+1 is created, with both
write_TS(Xk+1) and read_TS(Xk+1) set to TS(T). Whenever a transaction T reads a version Xi,
the value of read_TS(Xi) is set to the larger of the two values read_TS(Xi) and TS(T).
To ensure serializability, the following rules are adopted.

i) If T issues a write_tr(X) operation, and the version Xi that has the highest write_TS(Xi) less
than or equal to TS(T) also has read_TS(Xi) > TS(T), then abort and roll back T. Otherwise
create a new version of X, say Xk+1, with read_TS(Xk+1) = write_TS(Xk+1) = TS(T).
In plain words, take the most recent version that T is eligible to see (the one with the highest
write timestamp not exceeding that of T). If it has already been read by a transaction younger
than T, we have no option but to abort T and roll back all its effects. Otherwise a new version of
X is created with its read and write timestamps initialized to that of T.

ii) If a transaction T issues a read_tr(X) operation, find the version Xi with the highest
write_TS(Xi) that is also less than or equal to TS(T), return the value of Xi to T, and set the
value of read_TS(Xi) to the larger of TS(T) and the current read_TS(Xi).
This only means: find the most recent version Xi that T is eligible to read, and return
its value to T. Since T has now read the value, find out whether T is the youngest
transaction to have read Xi by comparing its timestamp with the current read timestamp of Xi.
If T is younger (its timestamp is higher), store its timestamp as the read timestamp of Xi; else
retain the earlier value.
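
A minimal sketch of the two rules is given below. The class names `Version`, `MVItem` and `Abort` are illustrative assumptions; the sketch also assumes positive integer time stamps, an initial version with write timestamp 0, and at most one write per item by each transaction.

```python
class Abort(Exception):
    """Raised when a write arrives too late to be placed in time stamp order."""


class Version:
    def __init__(self, value, write_ts):
        self.value = value
        self.write_ts = write_ts
        self.read_ts = write_ts      # read_TS starts equal to write_TS


class MVItem:
    def __init__(self, initial):
        self.versions = [Version(initial, 0)]

    def _visible(self, ts):
        """Version with the highest write_TS that is <= ts."""
        candidates = [v for v in self.versions if v.write_ts <= ts]
        return max(candidates, key=lambda v: v.write_ts)

    def read(self, ts):
        v = self._visible(ts)                 # rule (ii): reads are never rejected
        v.read_ts = max(v.read_ts, ts)
        return v.value

    def write(self, ts, value):
        v = self._visible(ts)
        if v.read_ts > ts:                    # rule (i): a younger transaction read v
            raise Abort(f"write by TS={ts} rejected")
        self.versions.append(Version(value, ts))


x = MVItem("a")
x.write(10, "b")
old = x.read(5)        # the older transaction still sees the older version
new = x.read(12)       # a younger transaction sees the version written at TS 10
try:
    x.write(11, "c")   # rejected: TS 12 has already read the version it would follow
    aborted = False
except Abort:
    aborted = True
print(old, new, aborted, len(x.versions))
```

Note that the read by the transaction with TS 5 succeeds even though a younger transaction has already written X, which basic time stamp ordering would have rejected.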

6.5.2 Multiversion two phase locking using certify locks:

Note that the motivation behind the two phase locking systems has been discussed
previously. In the standard locking mechanism, a write lock is an exclusive lock, i.e. only
one transaction can use a write locked data item. However, no harm is done if the item
write locked by a transaction is read by one or more other transactions. On the contrary,
it enhances the interleavability of operations, that is, more transactions can be
interleaved. This concept is extended to the multiversion locking system by using what
are known as multiple-mode locking schemes. In this, there are three locking modes
for an item: read, write and certify. I.e. an item can be locked for read(X), write(X) or
certify(X), or it can remain unlocked. To see how the scheme works, we first see
how the normal read/write system works by means of a lock compatibility table.
Lock compatibility table

            Read    Write
Read        Yes     No
Write       No      No

The explanation is as follows:

If there is an entry "Yes" in a particular cell, then when a transaction T holds the type of lock
specified in the column header and another transaction T' requests the type of lock
specified in the row header, T' can obtain the lock, because the lock modes are compatible. For
example, there is a "Yes" in the first cell. Its column header is Read. So if a transaction T holds
a read lock and another transaction T' requests a read lock, it can be granted. On the
other hand, if T holds a write lock and another transaction T' requests a read lock, it will not be
granted, because the relevant cell is now the first row, second column element, which is "No".
In the modified (multimode) locking system, the concept is extended by adding one more row
and column to the table.
            Read    Write   Certify
Read        Yes     Yes     No
Write       Yes     No      No
Certify     No      No      No

The multimode locking system works on the following lines. When one of the transactions has
obtained a write lock on a data item, the other transactions may still be given read
locks on the item. To ensure this, two versions of X are maintained. X(old) is a version
which has been written and committed by a previous transaction. When a transaction T wants a
write lock, a new version X(new) is created and handed over to T for writing.
While T continues to hold the lock on X(new), other transactions can continue to use X(old)
under read locks.
Once T is ready to commit, it should get exclusive certify locks on all the items it wants to
commit by writing. Note that the write lock is no longer an exclusive lock under the new scheme
of things, since while one transaction is holding a write lock on X, one or more other transactions
may be holding read locks on the same X. To provide the certify lock, the system waits till all
other read locks on the item are cleared. Note that this process has to be repeated for all the
items that T wants to commit.

Once all these items are under the certify lock of the transaction, it can commit its
values. From now on, X(new) becomes X(old), and a new X(new) will be created only when
another transaction wants a write lock on X. This scheme avoids cascading rollbacks. But since a
transaction has to get exclusive certify locks on all items before it can commit, a delay in
the commit operation is inevitable. This may also lead to complexities like deadlocks and
starvation.
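
The three-mode compatibility table can be expressed directly as a lookup, as in the sketch below. The dictionary layout and the helper `can_grant` are illustrative assumptions; `held_modes` stands for the locks that other transactions currently hold on the same item.

```python
# (mode held by another transaction, mode requested) -> compatible?
COMPATIBLE = {
    ("read", "read"): True,     ("read", "write"): True,     ("read", "certify"): False,
    ("write", "read"): True,    ("write", "write"): False,   ("write", "certify"): False,
    ("certify", "read"): False, ("certify", "write"): False, ("certify", "certify"): False,
}


def can_grant(requested, held_modes):
    """A lock is granted only if it is compatible with every lock already held."""
    return all(COMPATIBLE[(held, requested)] for held in held_modes)


print(can_grant("read", ["write"]))      # readers may share an item with one writer
print(can_grant("write", ["write"]))     # but two writers still conflict
print(can_grant("certify", ["read"]))    # certify must wait until the readers finish
print(can_grant("certify", []))          # no other locks, so certify is granted
```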

Chapter: 6
TRANSACTION MANAGEMENT & CONCURRENCY CONTROL
TECHNIQUE
End Chapter quizzes
Q1. The sequence of operations on the database is called
a) Schedule
b) Database Recovery
c) Locking
d) View
Q2. Two operations in a schedule are said to be in conflict if they satisfy the conditions
a) The operations belong to different transactions
b) They access the same item x
c) At least one of the operations is a write operation.
d) All of the above.
Q3. If, for every transaction T in the schedule S, all operations of T are executed
consecutively, then schedule S is called
a) Serial schedule
b) Non serial schedule
c) Time stamping
d) None of the above
Q4. Concurrency control is needed to manage
a) Transactions from large number of users
b) Maintain consistency of database
c) Both a and b
d) None of the above
Q5. A time stamp is a unique identifier created by the DBMS, attached to each
a) Data Item
b) Transaction
c) Schedule
d) All of the above

Q6. A read lock is also called a
a) Shared Lock
b) Binary Lock
c) Write Lock
d) Dead Lock
Q7. A write lock is also called
a) Two Phase Lock
b) Exclusive Lock
c) Binary Lock
d) None of the above
Q8. The ability to recover from failures of transaction is called
a) Recoverability
b) Back up
c) Database Detection
d) Both a and b
Q9. A lock that can have ONLY two states or values (1 or 0) is known as
a) Binary Lock
b) 2 Phase Lock
c) Both a and b
d) Read Lock
Q10. The property of a transaction which ensures that the transaction is either fully
completed, or is not begun at all
a) Consistency
b) Atomic
c) Durability
d) Isolation

Chapter: 7
DATABASE RECOVERY, BACKUP & SECURITY

1. Introductory Concept of Database Failures and Recovery


Database operations cannot be insulated from the system on which they run (both the hardware
and the software, including the operating system). The system should ensure that any
transaction submitted to it is terminated in one of the following ways.
a) All the operations listed in the transaction are completed, the changes are
recorded permanently in the database, and the database is informed that
the operations are complete.
b) In case the transaction has failed to achieve its desired objective, the system
should ensure that no change whatsoever is reflected in the database. Any
intermediate changes made to the database are restored to their original
values before the transaction is called off and the database is informed of
this.
In the second case, we say the system should be able to recover from the failure.

1.1 Database failure


Database Failures can occur in a variety of ways.
i) A system crash: A hardware, software or network error can make the completion
of the transaction impossible.

ii) A transaction or system error: The transaction submitted may be faulty, for example
causing a division by zero or producing a negative number that cannot be handled
(in a reservation system, a negative number of seats conveys no meaning). In such
cases, the system simply discontinues the transaction and reports an error.

iii) User interruption: Some programs allow the user to interrupt during execution. If the
user changes his mind during execution (but before the transactions are complete), he
may opt out of the operation.

iv) Local exceptions: Certain conditions during operation may force the system to
raise what are known as exceptions. For example, a bank account holder may
not have sufficient balance for some transaction to be done, or special instructions
might have been given in a bank transaction that prevent further continuation of
the process. In all such cases, the transactions are terminated.

v) Concurrency control enforcement: In certain cases, when concurrency constraints
are violated, the enforcement regime simply aborts the transaction, to be restarted later.

Other reasons can be physical problems like theft or fire, or system problems like
disk failure or viruses. In all such cases of failure, a recovery mechanism has to be in
place.
1.2 Database Recovery
Recovery most often means bringing the database back to the most recent consistent state in the
case of transaction failures. This obviously demands that status information about the previous
consistent states is made available in the form of a log (which has been discussed in one of the
previous sections in some detail).
A typical algorithm for recovery should proceed on the following lines.
1. If the database has been physically damaged or there are catastrophic crashes like a disk
crash, the database has to be recovered from the archives. In many cases, a
reconstruction process has to be adopted using various other sources of information.
2. In situations where the database is not damaged but has lost consistency because of
transaction failures etc., the method is to retrace the steps from the state of the crash
(which has created the inconsistency) until the previously encountered state of consistency is
reached. The method normally involves undoing certain operations, restoring previous
values using the log, etc.
In general, two broad categories of these retracing operations can be identified. As we
have seen previously, most often the transactions do not update the database as and when
they complete an operation. So, if a transaction fails or the system crashes before the
commit operation, those values have never reached the database and need not be retraced.
So no undo operation is needed. However, the results of transactions that had committed
but had not yet been written to the database must still be applied, so a redo operation will
have to be taken up. Hence, this type of retracing is often called the No-Undo/Redo
algorithm. The whole concept works only when the system is working in a deferred update
mode.
However, this may not always be the case. In certain situations, where the system is
working in the immediate update mode, the transactions keep updating the database
without waiting for the commit operation. In such cases, the updates will
normally be written onto the disk also. Hence, if the system fails while the immediate updates
are being made, it becomes necessary to undo the operations using the log entries on the
disk. This will help us to reach the previous consistent state. From there onwards, the
transactions will have to be redone. Hence, this method of recovery is often termed the
Undo/Redo algorithm.
2. Role of check points in recovery:
A check point, as the name suggests, indicates that everything is fine up to that point.
In a log, when a check point is encountered, it indicates that all values up to that point have been
written back to the database on the disk. Any further crash or system failure will have to take
care of the data appearing beyond this point only. Put the other way, all transactions that have
their commit entries in the log before this point need no redoing.

The recovery manager of the DBMS will decide at what intervals check points need to be
inserted (in turn, at what intervals data is to be written back to the disk). It can be either after
specific periods of time (say M minutes) or after a specific number of transactions (t
transactions), etc. When the protocol decides to take a check point, it does the following:

a) Suspend all transaction executions temporarily.
b) Force write all memory buffers to the disk.
c) Insert a check point in the log and force write the log to the disk.
d) Resume the execution of transactions.

The force writing need not refer only to the modified data items, but can include the various lists
and other auxiliary information indicated previously.
However, the force writing of all the data pages may take some time and it would be wasteful to
halt all transactions until then. A better way is to make use of fuzzy check pointing, wherein
the check point is inserted and, while the buffers are being written back (beginning from the
previous check point), the transactions are allowed to resume. This way the I/O time is saved.
Until all data up to the new check point is written back, the previous check point is held valid for
recovery purposes.
3. Write ahead logging:
When in-place updating is being used, it is necessary to maintain a log for recovery purposes.
Normally, before the updated value is written on to the disk, the earlier value (called the
Before Image (BFIM)) has to be noted down elsewhere on the disk for recovery purposes. This
process of recording the log entries ahead of the data itself is called write ahead logging. It is
to be noted that the type of logging also depends on the type of recovery. If the No-Undo/Redo
type of recovery is being used, then only those values which could not be written back before
the crash need to be logged. But in the Undo/Redo type, the before image values as well as the
values that were computed but could not be written back need to be logged.
Two other update mechanisms need brief mention. If the cache pages updated by a transaction
cannot be written back to the disk by the DBMS manager until and unless the transaction
commits, and the system strictly follows this approach, then it is called a no-steal approach.
However, in some cases, the protocol allows the writing of the updated buffer back to the disk
even before the transaction commits. This may be done, for example, when some other
transaction is in need of the buffer space. This is called the steal approach.

Secondly, if all updated pages are written to the disk as soon as the transaction commits, then it
is a force approach; otherwise it is called a no-force approach.
Most protocols make use of steal/no-force strategies, so that there is no urgency of writing the
buffers back to the disk once the transaction commits.

However, just the before image (BFIM) and after image (AFIM) values may not be sufficient for
successful recovery. A number of lists, including the list of active transactions (those that have
started operating, but have not committed yet), committed transactions and aborted
transactions need to be maintained, to avoid a brute force method of recovery.

4. Recovery techniques based on Deferred Update:


This is a very simple method of recovery. Theoretically, no transaction can write back
into the database until it has committed. Till then, it can only write into a buffer. Thus in case
of any crash, the buffer needs to be reconstructed, but the database itself need not be recovered.

However, in practice, many transactions are very long and it is dangerous to hold all their
updates in the buffer, since the buffers can run out of space and may need a page replacement.
To avoid such situations, wherein a page is removed inadvertently, a simple two pronged
protocol is used.

1. A transaction cannot change the database values on the disk until it commits.
2. A transaction does not reach the commit stage until all its update values are written on to the
log and the log itself is force written on to the disk.

Notice that in case of failures, recovery is by the NO-UNDO/REDO technique, since all the data
needed will be in the log if a failure occurs after a transaction has committed.

4.1 An algorithm for recovery using the deferred update in a single user environment.
In a single user environment, the algorithm is a straight application of the REDO
procedure. It uses two lists of transactions: the committed transactions since the last check
point and the currently active transactions when the crash occurs. Apply REDO to all write_tr
operations of the committed transactions from the log, and let the active transaction run again.

The assumption is that the REDO operations are idempotent, i.e. the operations produce the
same results irrespective of the number of times they are redone, provided they start from the
same initial state. This is essential to ensure that the recovery operation does not produce a result
that is different from the case where no crash was there to begin with.

(Though this may look like a trivial constraint, students may verify for themselves that not all
DBMS applications satisfy this condition.)

Also, since there was only one transaction active (because it was a single user system)
and it had not updated the database yet, all that remains to be done is to restart this
transaction.
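
The NO-UNDO/REDO procedure can be sketched on a toy log. The log record layout used here, tuples of the form ("start", T), ("write", T, item, new_value) and ("commit", T), is an assumption made for illustration, and the database is modelled as a plain dictionary.

```python
def recover_deferred(log, database):
    """Redo the writes of committed transactions; return the ones to resubmit."""
    committed = {rec[1] for rec in log if rec[0] == "commit"}
    for rec in log:                                   # forward pass, in log order
        if rec[0] == "write" and rec[1] in committed:
            _, _, item, new_value = rec
            database[item] = new_value                # REDO: reinstall the new value
    started = {rec[1] for rec in log if rec[0] == "start"}
    return sorted(started - committed)                # never touched the database


log = [
    ("start", "T1"), ("write", "T1", "A", 50), ("commit", "T1"),
    ("start", "T2"), ("write", "T2", "B", 99),        # crash before T2 commits
]
db = {"A": 10, "B": 20}
resubmit = recover_deferred(log, db)
again = recover_deferred(log, db)                     # redoing twice changes nothing
print(db, resubmit)
```

Because each redo simply reinstalls a logged value, running the recovery a second time leaves the database unchanged, which is the idempotence property assumed above.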

4.2 Deferred update with Concurrent execution:


Most DBMS applications, as we have insisted repeatedly, are multi-user in nature and
the best way to run them is by concurrent execution. Hence, protocols for recovery from a crash
in such cases are of prime importance.

To simplify matters, we presume that we are talking of strict and serializable
schedules, i.e. there is strict two phase locking and the locks remain effective till the
transactions commit. In such a scenario, an algorithm for recovery could
be as follows:

Use two lists: the list of committed transactions T since the last check point and the list of
active transactions T'. REDO all the write operations of the committed transactions in the order
in which they were written into the log. The active transactions are simply cancelled and
resubmitted.

Note that once we impose the strict serializability conditions, the recovery process does not
vary too much from the single user system.

Note that in the actual process, a given item X may be updated a number of times, either
by the same transaction or by different transactions at different times. What is important to the
user is its final value. However, the above algorithm updates the value every time an update
of it appears in the log. This can be made more efficient in the following manner. Instead
of starting from the check point and proceeding towards the time of the crash, traverse the log
from the time of the crash backwards. Whenever an update of an item is met for the first time,
apply it and record that the item has been restored. Any further (earlier) updates of the same
item can be ignored.
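
The backward traversal can be sketched as follows. The log format, tuples of the form ("write", T, item, new_value) and ("commit", T), is again an assumption for illustration.

```python
def redo_backward(log, database):
    """Scan the log from the crash backwards, restoring each item at most once."""
    committed = {rec[1] for rec in log if rec[0] == "commit"}
    restored = set()
    for rec in reversed(log):
        if rec[0] != "write" or rec[1] not in committed:
            continue
        _, _, item, new_value = rec
        if item not in restored:          # first hit from the end is the final value
            database[item] = new_value
            restored.add(item)
    return restored


log = [
    ("write", "T1", "X", 1), ("write", "T1", "Y", 5), ("commit", "T1"),
    ("write", "T2", "X", 2), ("commit", "T2"),
    ("write", "T3", "X", 3),              # T3 had not committed at the crash
]
db = {"X": 0, "Y": 0}
restored = redo_backward(log, db)
print(db, restored)
```

Here X is written three times in the log but is installed only once, with the value written by the last committed transaction.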

The deferred update method, though it guarantees correct recovery, has some drawbacks.
Since the items remain locked by the transactions until they commit, the efficiency of
concurrent execution comes down. Also, a lot of buffer space is used up in holding the values
till the transactions commit. The number of such values can be large when long transactions
are working in concurrent mode, since they delay the commit operations of one another.

5. Recovery techniques based on immediate update:

In these techniques, whenever a write_tr(X) is issued, the data is written on to the
database without waiting for the commit operation of the transaction. However,
as a rule, the update operation is accompanied by writing on to the log (on the disk), using
a write ahead logging protocol.
This helps in undoing the update operations whenever a transaction fails. This rolling
back can be done by using the data in the log. Further, if the transaction is made to commit only
after all its updates have been written on to the database, there is no need for a redo of these
operations after a failure, because the values are already in the database. This concept is called
the UNDO/NO-REDO recovery algorithm. On the other hand, if a transaction is allowed to
commit before all its values have been written, then a general UNDO/REDO type of recovery
algorithm is necessary.

5.1 A typical UNDO/REDO algorithm for immediate update in a single user environment

Here, at the time of failure, the changes envisaged by the transaction may have
already been recorded in the database. These must be undone. A typical procedure
for recovery should follow the following lines:

a) The system maintains two lists: the list of committed transactions since the last
checkpoint and the list of active transactions (only one active transaction, in fact,
because it is a single user system).
b) In case of failure, undo all the write_tr operations of the active transaction, by using
the information in the log, using the UNDO procedure.
c) For undoing a write_tr(X) operation, examine the corresponding log entry
write_tr(T, X, oldvalue, newvalue) and set the value of X to oldvalue. The undoing must
proceed in the reverse of the order in which the operations were written on to the log.
d) REDO the write_tr operations of the committed transactions from the log in the order in
which they were written in the log, using the REDO procedure.
5.2 The UNDO/REDO recovery based on immediate update with concurrent
execution:
In the concurrent execution scenario, the process becomes slightly more complex. In the
following algorithm, we presume that the log includes checkpoints and the concurrency protocol
uses strict schedules, i.e. the schedule does not allow a transaction to read or write an item until
the transaction that wrote the item previously has committed. Hence, the danger of transaction
failures is minimal. However, deadlocks can force abort and UNDO operations. The simplistic
procedure is as follows:
a) Use two lists maintained by the system: the list of committed transactions (since the last
check point) and the list of active transactions.
b) Undo all write_tr(X) operations of the active transactions which have not yet
committed, using the UNDO procedure. The undoing must proceed in the
reverse of the order in which the operations were written into the log.
c) Redo all write_tr(X) operations of the committed transactions from the log in the order
in which they were written into the log.
The redo step can be made more efficient by beginning at the end of the log and
proceeding in the reverse order, so that when an item X is written more than once in the log, only
the latest entry is applied, as discussed in a previous section.
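
The UNDO and REDO passes can be sketched on a toy log as follows. The record layout, tuples of the form ("write", T, item, old_value, new_value) and ("commit", T), mirrors the write_tr(T, X, oldvalue, newvalue) entry described above but is otherwise an assumption, and a strict schedule is assumed.

```python
def recover_immediate(log, database):
    """UNDO the writes of uncommitted transactions, then REDO the committed ones."""
    committed = {rec[1] for rec in log if rec[0] == "commit"}
    # UNDO pass: reverse log order, restoring old values of uncommitted writes.
    for rec in reversed(log):
        if rec[0] == "write" and rec[1] not in committed:
            database[rec[2]] = rec[3]
    # REDO pass: forward log order, reinstalling new values of committed writes.
    for rec in log:
        if rec[0] == "write" and rec[1] in committed:
            database[rec[2]] = rec[4]
    return database


log = [
    ("write", "T1", "A", 10, 20), ("commit", "T1"),
    ("write", "T2", "A", 20, 30), ("write", "T2", "B", 5, 6),   # T2 still active
]
db = {"A": 30, "B": 6}        # state on disk at the time of the crash
recover_immediate(log, db)
print(db)
```

After recovery the database holds the values left by the committed transaction T1, and every trace of the active transaction T2 has been removed.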

6. Shadow paging
It is not always necessary to update the original database by overwriting the
previous values. As discussed in an earlier section, we can instead create a new
version of a data item whenever it is updated. The concept of shadow paging
illustrates this:

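[Figure: Shadow paging. A current directory and a shadow directory each hold pointers to the
pages on disk. The pages shown include the original Page 2, Page 5 and Page 7 and their
updated copies Page 7 (new), Page 5 (new) and Page 2 (new).]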

In a typical case, the database is divided into pages and only those pages that need
updating are brought to the main memory (or cache, as the case may be). A shadow directory
holds pointers to these pages. Whenever an update is done, a new block of the page is created
(indicated by the suffix (new) in the figure) and the updated values are included there. Note that
the new pages are created in the order of the updates and not in the serial order of the pages. A
current directory holds pointers to these new pages. For all practical purposes, these are the
valid pages and they are written back to the database at regular intervals.

Now, if a rollback is to be done, the only operation needed is to discard the current
directory and treat the shadow directory as the valid directory.

One difficulty is that the new, updated pages are kept at unrelated locations and hence the
concept of a contiguous database is lost. More importantly, what happens when the new
pages are discarded as a part of the UNDO strategy? These blocks form garbage in the system.
(The same thing happens when a transaction commits: the new pages become valid pages, while
the old pages become garbage.) A mechanism to systematically identify all these pages and
reclaim them becomes essential.
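
The behaviour of the two directories, including the garbage left behind, can be illustrated with a small Python model. It is a sketch under simplifying assumptions: blocks live in a list instead of on disk, and the class ShadowPagedDB and its method names are hypothetical.

```python
class ShadowPagedDB:
    """Toy model: a directory is a list of indexes into a block store."""

    def __init__(self, pages):
        self.blocks = list(pages)               # physical blocks
        self.shadow = list(range(len(pages)))   # directory at transaction start
        self.current = list(self.shadow)        # directory used by the transaction

    def read(self, page_no):
        return self.blocks[self.current[page_no]]

    def write(self, page_no, value):
        # The old block is never overwritten: the new version goes into a
        # fresh block, and only the current directory is redirected to it.
        self.blocks.append(value)
        self.current[page_no] = len(self.blocks) - 1

    def commit(self):
        # The current directory becomes the new shadow directory.
        self.shadow = list(self.current)

    def rollback(self):
        # Discard the current directory and fall back on the shadow directory.
        self.current = list(self.shadow)

    def garbage(self):
        """Block numbers that neither directory points to any more."""
        live = set(self.current) | set(self.shadow)
        return [b for b in range(len(self.blocks)) if b not in live]

db = ShadowPagedDB(["page0", "page1", "page2"])
db.write(1, "page1 (new)")
db.rollback()                 # page 1 reads as "page1" again
after_rollback = db.garbage()
db.write(2, "page2 (new)")
db.commit()
after_commit = db.garbage()
```

After the rollback the discarded new block is garbage. After the commit the old block of page 2 is garbage as well, which is why a reclaiming mechanism is needed in both cases.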

7. Database security and authorization


It is common knowledge that databases should be kept secure against damage,
unauthorized access and unauthorized updates. A DBMS typically includes a database security
and authorization subsystem that is responsible for the security of the database against
unauthorized accesses and attacks. Traditionally, two types of security mechanisms
are in use.

i) Discretionary security mechanisms: Here each user (or a group of
users) is granted privileges and authorities to access certain records,
pages or files and denied access to others. The discretion normally lies
with the database administrator (DBA).

ii) Mandatory security mechanisms: These are standard security
mechanisms that are used to enforce multilevel security by classifying the
data into different levels and allowing the users (or a group of users)
access to certain levels only, based on the security policies of the
organization. Here the rules apply uniformly across the board and the
discretionary powers are limited.
While all these discussions assume that a user is allowed access to the system,
but not to all parts of the database, at another level, efforts should be made to prevent
unauthorized access to the system by outsiders. This comes under the purview of the
security system.

Another type of security is statistical database security. Often,
large databases are used to provide statistical information about various aspects like,
say, income levels, qualifications, health conditions etc. These are derived by collecting
a large number of individual data. A person who is doing the statistical analysis may be
allowed access to the statistical data, which is aggregated data, but he should not
be allowed access to individual data, i.e. he may know, for example, the average
income level of a region, but cannot verify the income level of a particular individual.
This problem is more often encountered in government and quasi-government
organizations and is studied under the concept of statistical database security.
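
The idea of exposing only aggregated data can be sketched in Python as follows. The class StatisticalView and the minimum group size check are assumptions made for this illustration; the text above only requires that individual values are not revealed.

```python
from statistics import mean

class StatisticalView:
    """Answers aggregate queries on income data, never individual rows."""

    def __init__(self, records, min_group_size=3):
        # records: list of (person, region, income) tuples
        self._records = list(records)
        # Assumed threshold: smaller groups are refused because their
        # average would practically reveal individual incomes.
        self._min = min_group_size

    def average_income(self, region):
        incomes = [inc for _, reg, inc in self._records if reg == region]
        if len(incomes) < self._min:
            raise PermissionError("group too small for a statistical query")
        return mean(incomes)

records = [("P1", "north", 100), ("P2", "north", 200),
           ("P3", "north", 300), ("P4", "south", 50)]
stats = StatisticalView(records)
north_average = stats.average_income("north")
```

Here the analyst can obtain the average for the north region, while a query on the south region, which contains a single person, is refused.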

It may be noted that in all these cases, the role of the DBA becomes critical. He
normally logs into the system under a DBA account or a superuser account, which
provides full capabilities to manage the database, ordinarily not available to the other
users. Under the superuser account, he can manage the following aspects regarding
security.

i) Account creation: He can create new accounts and passwords for users
or user groups.

ii) Privilege granting: He can pass on privileges like the ability to access
certain files or certain records to the users.

iii) Privilege revocation: The DBA can revoke certain or all privileges
granted to one or several users.

iv) Security level assignment: The security level of a particular user account
can be assigned, so that based on the policies, the users become
eligible or not eligible for accessing certain levels of information.
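
The four DBA operations listed above can be modelled by a small Python class. This is a minimal sketch, not the interface of any real DBMS: the class AuthorizationSubsystem, its method names and the rule that access needs both a privilege and a sufficient security level are assumptions made for illustration.

```python
class AuthorizationSubsystem:
    """Toy model of the security tasks performed under the DBA account."""

    def __init__(self):
        self.passwords = {}    # account -> password (plain text only for brevity)
        self.privileges = {}   # account -> set of (action, object) pairs
        self.levels = {}       # account -> security level (higher sees more)

    def create_account(self, user, password):
        # i) Account creation
        self.passwords[user] = password
        self.privileges[user] = set()
        self.levels[user] = 0

    def grant(self, user, action, obj):
        # ii) Privilege granting
        self.privileges[user].add((action, obj))

    def revoke(self, user, action=None, obj=None):
        # iii) Privilege revocation: one privilege, or all if none is named
        if action is None:
            self.privileges[user].clear()
        else:
            self.privileges[user].discard((action, obj))

    def assign_level(self, user, level):
        # iv) Security level assignment
        self.levels[user] = level

    def can_access(self, user, action, obj, obj_level=0):
        has_privilege = (action, obj) in self.privileges.get(user, set())
        cleared = self.levels.get(user, -1) >= obj_level
        return has_privilege and cleared

auth = AuthorizationSubsystem()
auth.create_account("asha", "secret")
auth.grant("asha", "read", "payroll")
allowed_plain = auth.can_access("asha", "read", "payroll")
allowed_classified = auth.can_access("asha", "read", "payroll", obj_level=2)
auth.assign_level("asha", 2)
allowed_after_level = auth.can_access("asha", "read", "payroll", obj_level=2)
auth.revoke("asha")
allowed_after_revoke = auth.can_access("asha", "read", "payroll")
```

The privilege set plays the role of the discretionary mechanism and the security level plays the role of the mandatory mechanism. A real system would also store passwords in hashed form.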

Another aspect of having individual accounts is the concept of database audit.
It is similar to the system log that has been created and used for recovery purposes. If
we include in the log entries the name and account number of the user
who created or used the transactions that write the log entries, we can have a
record of the accesses and other usage made by each user. This concept becomes
useful in follow-up actions, including legal examinations, especially in sensitive and high
security installations.

Another concept is the creation of views. While the database record may have a
large number of fields, a particular user may be authorized to have information only
about certain fields. In such cases, whenever he requests the data item, a view of the
data item is created for him, which includes only those fields which he is authorized
to have access to. He may not even know that there are many other fields in the
records.

The concept of views becomes very important when large databases, which
cater to the needs of various types of users, are being maintained. Every user can have
and operate upon his view of the database, without being bogged down by the details.
It also makes the security maintenance operations convenient.
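
A view that hides some fields can be tried out with Python's built-in sqlite3 module. The table employee, its columns and the view name employee_public are made up for this example. SQLite itself has no GRANT statement, so the step in which the DBA gives the user access to the view only is left out of the sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Full record: the salary field must not be visible to ordinary users.
conn.execute(
    "CREATE TABLE employee (emp_id INTEGER, name TEXT, dept TEXT, salary REAL)"
)
conn.execute("INSERT INTO employee VALUES (1, 'Asha', 'Sales', 52000.0)")

# The view contains only the fields the user is authorized to see.
conn.execute(
    "CREATE VIEW employee_public AS SELECT emp_id, name, dept FROM employee"
)

cur = conn.execute("SELECT * FROM employee_public")
columns = [d[0] for d in cur.description]
rows = cur.fetchall()
print(columns)   # ['emp_id', 'name', 'dept']
print(rows)      # [(1, 'Asha', 'Sales')]
```

A user who queries employee_public sees three fields and has no indication that a salary field exists in the underlying records.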

Chapter: 7
DATABASE RECOVERY, BACKUP & SECURITY
End Chapter quizzes

Q1. Database failures can occur due to:
a) Transaction Failure
b) System crash
c) Both a and b
d) Data backup
Q2. The granting of a right or privilege that enables a subject to have legitimate access to a
system or a system's objects.
a) Authentication
b) Authorization
c) Data Unlocking
d) Data Encryption
Q3. The process of periodically taking a copy of the database and log file on to offline
storage media
a) Back up
b) Data Recovery
c) Data Mining
d) Data Locking
Q4. The encoding of the data by a special algorithm that renders the data unreadable
a) Data hiding
b) Encryption
c) Data Mining
d) Both a and c

Q5. Access right to a database is controlled by
(a) top management
(b) system designer
(c) system analyst
(d) database administrator
Q6. A firewall is a system that prevents unauthorized access to or from
a) Locks
b) Private network
c) Email
d) Data Recovery

Q7. Digital Certificate is an attachment to an electronic message used for
a) security purposes
b) Recovery purpose
c) Database Locking
d) Both a and c
Q8. Rollback and Commit affect
(a) Only DML statements
(b) Only DDL statements
(c) Both (a) and (b)
(d) All statements executed in SQL*PLUS
Q9. A large database that is used to provide statistical information is known as a:
a) Geographical Database
b) Statistical Database
c) Web Database
d) Time Database