Database Management Systems Riyaz Mohammed

JNTU HYDERABAD

DATABASE MANAGEMENT SYSTEMS

B.TECH/CSE & IT/R18

SYLLABUS

UNIT – I

Database System Applications: A Historical Perspective, File Systems versus a DBMS, the Data Model, Levels of Abstraction in a DBMS, Data Independence, Structure of a DBMS.

Introduction to Database Design: Database Design and ER Diagrams, Entities, Attributes, and Entity Sets, Relationships and Relationship Sets, Additional Features of the ER Model, Conceptual Design with the ER Model.

UNIT – II

Introduction to the Relational Model: Integrity constraints over relations, enforcing integrity constraints, querying relational data, logical database design, introduction to views, destroying/altering tables and views. Relational Algebra, Tuple Relational Calculus, Domain Relational Calculus.

UNIT – III

SQL: QUERIES, CONSTRAINTS, TRIGGERS: form of basic SQL query, UNION, INTERSECT, and EXCEPT, nested queries, aggregation operators, NULL values, complex integrity constraints in SQL, triggers and active databases.

Schema Refinement: Problems caused by redundancy, decompositions, problems related to decomposition, reasoning about functional dependencies, FIRST, SECOND, THIRD normal forms, BCNF, lossless join decomposition, multi-valued dependencies, FOURTH normal form, FIFTH normal form.

UNIT – IV

Transaction Concept, Transaction State, Implementation of Atomicity and Durability, Concurrent Executions, Serializability, Recoverability, Implementation of Isolation, Testing for Serializability, Lock-Based Protocols, Timestamp-Based Protocols, Validation-Based Protocols, Multiple Granularity, Recovery and Atomicity, Log-Based Recovery, Recovery with Concurrent Transactions.

UNIT – V

Data on External Storage, File Organization and Indexing, Cluster Indexes, Primary and Secondary Indexes, Index Data Structures, Hash-Based Indexing, Tree-Based Indexing, Comparison of File Organizations, Indexes and Performance Tuning, Intuitions for Tree Indexes, Indexed Sequential Access Method (ISAM), B+ Trees: A Dynamic Index Structure.

*****


UNIT – I

INTRODUCTION

Data:

 Facts that can be recorded.
 Raw facts.
 Unprocessed information.
 Unorganized data.

The facts that can be recorded and which have implicit meaning are known as
'data'.

Data Base:

 It is a collection of interrelated data. These data can be stored in the form of
tables.
 A database can be of any size and varying complexity.
 Example: A customer database consists of fields such as cname, cno and
ccity.

Database Management System (DBMS):

It is a collection of programs that enables users to create and maintain a
database. In other words, it is general-purpose software that provides
users with the facilities for defining, constructing and manipulating the
database for various applications.

Advantages:

1) Redundancy can be reduced.
2) Inconsistency can be avoided.
3) Data independence.
4) Reduced application development time.
5) Security restrictions can be applied.
6) Integrity can be maintained.


7) Concurrent access and Crash recovery.

Drawbacks of the file-system approach (addressed by a DBMS):

1) Data redundancy and inconsistency.
2) Difficulty in accessing data.
3) Data isolation.
4) Data integrity problems.
5) Concurrent-access anomalies.
6) Security problems.

Database Applications:

1) Banking: all transactions.


2) Airlines: reservations, schedules.
3) Universities: registration, grades.
4) Sales: customers, products, purchases.
5) Online retailers: order tracking, customized recommendations.
6) Manufacturing: production, inventory, orders, supply chain.
7) Human resources: employee records, salaries, tax deductions.

A HISTORICAL PERSPECTIVE

From the earliest days of computers, storing and manipulating data have been
a major application focus. The first general-purpose DBMS, designed by
Charles Bachman at General Electric in the early 1960s, was called the
Integrated Data Store. It formed the basis for the network data model, which
was standardized by the Conference on Data Systems Languages (CODASYL)
and strongly influenced database systems through the 1960s. Bachman was
the first recipient of ACM's Turing Award (the computer science equivalent of
a Nobel Prize) for work in the database area; he received the award in 1973.

In the late 1960s, IBM developed the Information Management System (IMS)
DBMS, used even today in many major installations. IMS formed the basis for
an alternative data representation framework called the hierarchical data
model.

In 1970, Edgar Codd, at IBM's San Jose Research Laboratory, proposed a new
data representation framework called the relational data model. This proved
to be a watershed in the development of database systems. Codd won the
1981 Turing Award for his seminal work.

In the 1980s, the relational model consolidated its position as the dominant
DBMS paradigm, and database systems continued to gain widespread use.
The SQL query language for relational databases, developed as part of IBM's
System R project, is now the standard query language. SQL was standardized
in the late 1980s, and the standard has been revised several times since.

FILE SYSTEMS VERSUS A DBMS


THE DATA MODEL

 A data model is a collection of high-level data description constructs that
hide many low-level storage details.
 Data models are fundamental entities to introduce abstraction in a
DBMS.
 A data model defines how data is connected to each other and how it is
processed and stored in a system.
 Some of the most common data models are:
1) Relational model.
2) Entity-relationship model.
3) Hierarchical database model.
4) Network model.
5) Object-oriented database model.
6) Document model, etc.
 Most database management systems today are based on the relational
data model.
 Relational databases are typically queried using Structured Query Language
(SQL).

Relational model:

 The central data description construct in this model is a relation, which
can be thought of as a set of records.
 A description of data in terms of a data model is called a schema. In the
relational model, the schema for a relation specifies its name, the name
of each field (or attribute or column), and the type of each field.
 As an example, student information in a university database may be
stored in a relation with the following schema:

Students (sid: string, name: string, login: string, age: integer, gpa: real)
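The same schema can be declared in SQL DDL. A minimal sketch (the SQL column
types below are assumptions chosen to match the field types above):

CREATE TABLE Students (
    sid   CHAR(20),
    name  CHAR(30),
    login CHAR(20),
    age   INTEGER,
    gpa   REAL
);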


LEVELS OF ABSTRACTION IN A DBMS

The data in a DBMS is described at three levels of abstraction. The database
description consists of a schema at each of these three levels of abstraction:

1) Physical/Internal schema.
2) Conceptual schema.
3) External schema.


1) Physical schema/Internal schema:

 The physical (internal) schema summarizes how the data is actually stored on
secondary storage devices such as disks and tapes. This is the lowest level.
 The process of developing a good physical schema is called physical
database design.
 Indexes are used at the internal level to speed up data retrieval operations.

2) Conceptual schema:

 The conceptual schema describes the stored data in terms of the data model of
the DBMS.
 In a relational DBMS, the conceptual schema describes all relations that
are stored in the database.
 The conceptual schema acts as a connection between the external schema and
the physical schema.

3) External schema:

 An external schema allows data access to be customized for an individual user or a
group of users.
 Any given database has exactly one conceptual schema and one
physical schema, but it may have several external schemas.
 External schema design is guided by end-user requirements.

DATA INDEPENDENCE

A very important advantage of using a DBMS is that it offers data
independence. Data independence is the property of a DBMS that lets you
change the database schema at one level of the system without having to
change the schema at the next higher level. There are two kinds:

1) Physical data independence.
2) Logical data independence.

That is, application programs are insulated from changes in the way the data
is structured and stored. Data independence is achieved through use of the


three levels of data abstraction; in particular, the conceptual schema and the
external schema provide distinct benefits in this area.

Logical data independence means users can be shielded from changes in the
logical (conceptual) structure of the data. For example, if a relation in our
university database is replaced by two smaller relations, an external view can
be defined over the new relations so that existing applications continue to see
the original structure and need not be rewritten.

The conceptual schema insulates users from changes in physical storage
details. This property is referred to as physical data independence. The
conceptual schema hides details such as how the data is actually laid out on
disk, the file structure, and the choice of indexes.

STRUCTURE OF A DBMS

The DBMS accepts SQL commands generated from a variety of user
interfaces, produces query evaluation plans, executes these plans against the
database, and returns the answers. (This is a simplification: SQL commands
can be embedded in host-language application programs, e.g., Java or COBOL
programs.)

When a user issues a query, the parsed query is presented to a query
optimizer, which uses information about how the data is stored to produce an
efficient execution plan for evaluating the query.

An execution plan is a blueprint for evaluating a query, usually represented
as a tree of relational operations. Relational operators serve as the building
blocks for evaluating queries posed against the data.

The code that implements relational operators sits on top of the file and access
methods layer. This layer supports the concept of a file, which, in a DBMS, is
a collection of pages or a collection of records. Heap files, or files of unordered
pages, as well as indexes are supported. In addition to keeping track of the
pages in a file, this layer organizes the information within a page.


Fig: Structure of a DBMS

The files and access methods layer code sits on top of the buffer manager,
which brings pages in from disk to main memory as needed in response to
read requests.

The DBMS supports concurrency and crash recovery by carefully scheduling
user requests and maintaining a log of all changes to the database. DBMS
components associated with concurrency control and recovery include the
transaction manager, which ensures that transactions request and release
locks according to a suitable locking protocol and schedules the execution of
transactions; the lock manager, which keeps track of requests for locks and
grants locks on database objects when they become available; and the
recovery manager, which is responsible for maintaining a log and restoring
the system to a consistent state after a crash.

The disk space manager, buffer manager, and file and access methods layers
must interact with these components.


INTRODUCTION TO DATABASE DESIGN

DATABASE DESIGN AND ER DIAGRAMS

The database design process can be divided into six steps. The ER model is
most relevant to the first three steps.

1) Requirements analysis.
2) Conceptual Database Design.
3) Logical Database Design.
4) Schema Refinement.
5) Physical Database Design.
6) Application and security Design.

1) Requirements Analysis: The very first step in designing a database
application is to understand what data is to be stored in the database, what
applications must be built on top of it, and what operations are most frequent
and subject to performance requirements. In other words, we must find out
what the users want from the database. This is usually an informal process
that involves discussions with user groups, a study of the current operating
environment and how it is expected to change, analysis of any available
documentation on existing applications that are expected to be replaced or
complemented by the database, and so on.

2) Conceptual Database Design: The information gathered in the
requirements analysis step is used to develop a high-level description of the
data to be stored in the database, along with the constraints known to hold
over this data. This step is often carried out using the ER model and is
discussed in the rest of this chapter. The ER model is one of several high-
level, or semantic, data models used in database design. The goal is to create
a simple description of the data that closely matches how users and
developers think of the data (and the people and processes to be represented
in the data). This facilitates discussion among all the people involved in the
design process, even those who have no technical background. At the same
time, the initial design must be sufficiently precise to enable a straightforward


translation into a data model supported by a commercial database system
(which, in practice, means the relational model).

3) Logical Database Design: We must choose a DBMS to implement our
database design, and convert the conceptual database design into a database
schema in the data model of the chosen DBMS. We will consider only
relational DBMSs, and therefore, the task in the logical design step is to
convert an ER schema into a relational database schema.

4) Schema Refinement: Analyze the collection of relations in our relational
database schema to identify potential problems and to refine it.

5) Physical Database Design: In this step, we consider typical expected
workloads that our database must support and further refine the database
design to ensure that it meets desired performance criteria. This step may
simply involve building indexes on some tables and clustering some tables, or
it may involve a substantial redesign of parts of the database schema obtained
from the earlier design steps.

6) Application and Security Design: Any software project that involves a
DBMS must consider aspects of the application that go beyond the database
itself. We must describe the role of each entity in every process that is reflected
in some application task, as part of a complete workflow for that task. For
each role, we must identify the parts of the database that must be accessible
and the parts of the database that must not be accessible, and we must take
steps to ensure that these access rules are enforced.

E-R diagrams:

Entity: An entity is an object in the real world that is distinguishable from
other objects.

Attribute: Properties of an entity. An entity is described using a set of
attributes.

Relationship: Association among several entities.

We collect a set of similar relationships in a relationship set.


ER-diagram:

RELATIONSHIPS AND RELATIONSHIP SETS

 A relationship is an association among two or more entities.
 A relationship is represented using a diamond symbol.
 A relationship can also have a descriptive attribute.
 We collect a set of similar relationships in a relationship set.

Fig: works in relationship set

Mapping cardinality constraints:

Mapping cardinality means the number of entities to which another entity can be
associated via a relationship.

1) One to one (1:1) relationship.


2) One to many (1:M) relationship.
3) Many to one (M:1) relationship.
4) Many to many (M:N) relationship.


ER-Diagram for college:


ADDITIONAL FEATURES OF THE ER MODEL

1) Key constraints.
2) Participating constraints.
3) Weak entity.
4) Class hierarchies.
5) Aggregation.

1) Key constraints:

 Constraint means condition.
 Every relation has some conditions.
 In a relation with a key attribute, no two tuples can have the same values
in the key attribute.
 A key attribute cannot have a null value.
 Example: An employee can work in several departments, and a
department can have several employees, but a department will have
only one manager.

2) Participation Constraints:

The key constraint on Manages tells us that a department has at most one
manager. Let us say that every department is required to have a manager.
This requirement is an example of a participation constraint; the participation
of the entity set Departments in the relationship set Manages is said to be
total. A participation that is not total is said to be partial. Participation of the
entity set Employees in Manages is partial, since not every employee gets to
manage a department.


3) Class Hierarchies:

Sometimes it is natural to classify the entities in an entity set into subclasses.
For example, we might want to talk about an Hourly_Employees entity set and
a Contract_Employees entity set to distinguish the basis on which employees are
paid. We might have attributes hours_worked and hourly_wage defined for
Hourly_Employees and an attribute contract_id defined for Contract_Employees.
Class hierarchies can be viewed in two ways:

i. Specialization: Specialization is the process of identifying subsets of
an entity set (the superclass). The superclass is defined first, and the
subclasses are defined next.
ii. Generalization: Generalization means identifying some common
characteristics of a collection of entity sets and creating a new entity
set that contains entities possessing these common characteristics.
The subclasses are defined first, and the superclass is defined next.

4) Aggregation:

 We use aggregation when we need to express a relationship among
relationships.
 Aggregation is indicated using a dotted box.
 Example: Consider a department that sponsors a project; the department
will assign an employee to monitor the sponsorship.

*****


UNIT – II

INTRODUCTION TO THE RELATIONAL MODEL

Relational algebra is one of the two formal query languages associated with
the relational model. Queries in algebra are composed using a collection of
operators. A fundamental property is that every operator in the algebra
accepts (one or two) relation instances as arguments and returns a relation
instance as the result. This property makes it easy to compose operators to
form a complex query —a relational algebra expression is recursively defined
to be a relation, a unary algebra operator applied to a single expression, or a
binary algebra operator applied to two expressions. We describe the basic
operators of the algebra (selection, projection, union, cross product, and
difference).

Relational algebra is a procedural query language. It gives a step-by-step
process to obtain the result of a query. It uses operators to perform queries.

Types of Relational operation:

INTEGRITY CONSTRAINT (IC) OVER RELATIONS

 An integrity constraint (IC) is a condition that is specified on a database
schema and restricts the data that can be stored in an instance of the
database.
 Various restrictions on data can be specified on a relational
database schema in the form of 'constraints'.
 A DBMS enforces integrity constraints, in that it permits only legal
instances to be stored in the database.


 Integrity constraints are specified and enforced at different times, as
below:
1) When the DBA or end user defines a database schema, he or she
specifies the ICs that must hold on any instance of this database.
2) When a database application is run, the DBMS checks for
violations and disallows changes to the data that violate the
specified ICs.

The constraints can be classified into 4 types as below:

1) Domain Constraints.
2) Key Constraints.
3) Entity Integrity Constraints.
4) Referential Integrity Constraints.

1) Domain Constraints:

 Domain constraints are the most elementary form of integrity
constraint. They are tested easily by the system whenever a new data
item is entered into the database.
 Domain constraints specify the set of possible values that may be
associated with an attribute. Such constraints may also prohibit the
use of null values for particular attributes.
 The data types associated with domains typically include standard
numeric data types for integers, as well as character and date/time types.
 A relation schema specifies the domain of each field or column in the
relation instance.
 These domain constraints in the schema specify an important condition
that each instance of the relation must satisfy: the values that appear in
a column must be drawn from the domain associated with that column.
Thus the domain of a field is essentially the type of that field.
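As an illustration, domain constraints correspond to the column types (and, where
supported, CHECK conditions) declared when a table is created. A hedged sketch
using the Sailors relation that appears in later examples in these notes (the types
and the CHECK condition are illustrative):

CREATE TABLE Sailors (
    sid    INTEGER,
    sname  CHAR(30),
    rating INTEGER CHECK (rating BETWEEN 1 AND 10),  -- rating restricted to the domain 1..10
    age    REAL
);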

2) Key Constraints:

 A key constraint is a statement that a certain minimal subset of the
fields of a relation is a unique identifier for a tuple.


 Example: The 'Students' relation and the constraint that no two students
have the same student id (sid).
 Key constraints can be classified into three types as below:
i. Candidate Key or Key.
ii. Super Key.
iii. Primary Key.

i. Candidate Key or Key:

 A set of fields that uniquely identifies a tuple according to a key
constraint is called a 'candidate key' for the relation.
 This is also called simply a 'key'.
 From the definition of a candidate key: two distinct tuples in a
legal instance cannot have identical values in all the fields of a key; that is,
in any legal instance, the values in the key fields uniquely identify a
tuple in the instance.
 No proper subset of the set of fields in a key is a unique identifier for a
tuple; e.g., the set of fields {sid, name} is not a key for Students.

ii. Super Key:

 A set of fields that contains a key is called a 'super key'.
 It is a set of one or more attributes that allows us to identify uniquely an
entity in the entity set.
 A super key specifies a uniqueness constraint: no two distinct tuples
can have the same values in all of its fields.
 Every relation has at least one default super key: the set of all attributes.


iii. Primary Key:

 This is also a candidate key, whose values are used to identify tuples in
the relation.
 It is common to designate one of the candidate keys as a primary key
of the relation.
 The attributes that form the primary key of a relation schema are
underlined.
 It is used to denote a candidate key that is chosen by the database
designer as the principal means of identifying entities with an entity set.
 Example: ‘Sid’ of Students relation.
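In SQL, candidate keys are declared with UNIQUE and the chosen primary key with
PRIMARY KEY. A minimal sketch for the Students relation (column definitions
abbreviated; the constraint name is illustrative):

CREATE TABLE Students (
    sid   CHAR(20),
    name  CHAR(30),
    login CHAR(20),
    UNIQUE (login),                            -- another candidate key
    CONSTRAINT StudentsKey PRIMARY KEY (sid)   -- the designated primary key
);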

3) Entity Integrity Constraints:

 This states that no primary key value can be null.


 The primary key value is used to identify individual tuples in a relation.
 Having null values for the primary key implies that we cannot identify
some tuples.
 NOTE: Key constraints and entity integrity constraints are specified on
individual relations. Primary keys come under this category.

4) Referential Integrity Constraints:

 A referential integrity constraint is specified between two relations and
is used to maintain consistency among tuples of the two relations.
 Informally, the referential integrity constraint states that a tuple in one
relation that refers to another relation must refer to an existing tuple in
that relation.
 We can diagrammatically display referential integrity constraints by
drawing a directed arc from each foreign key to the relation it
references. The arrowhead points to the primary key of the
referenced relation.
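In SQL, referential integrity is declared with a FOREIGN KEY clause. A hedged
sketch using the Sailors and Reserves relations that appear in the SQL examples
later in these notes (column types are assumed):

CREATE TABLE Reserves (
    sid INTEGER,
    bid INTEGER,
    day DATE,
    PRIMARY KEY (sid, bid, day),
    FOREIGN KEY (sid) REFERENCES Sailors   -- sid must match the sid of an existing Sailors tuple
);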

RELATIONAL ALGEBRA

 Relational algebra is one of the two formal query languages associated
with the relational model.


 The mathematical concepts of the relational model are implemented in SQL.
 Relational algebra mainly provides the theoretical foundation for relational
databases and SQL.
 Queries in the algebra are composed using a collection of operators; these
operators make it easy to form complex queries.

Unary relational operations:

1) Select (σ).
2) Project (π).
3) Rename (ρ).

Binary relational operations:

1) Division.
2) Join.

Set operations:

1) Union (U).
2) Intersection (∩).
3) Difference (-).
4) Cartesian product or cross product (×).

Selection operator in relational algebra (σ):

1) It selects a subset of tuples/records from a relation.
2) The select operator is indicated by σ.
3) This operator allows us to manipulate data in a single relation.
4) The selected tuples must satisfy a selection condition.

Syntax: σ condition (relation name)

Projection operator in relational algebra (π):

 The projection operator π allows us to extract columns from a relation.
 Duplicate values will not be projected using the π operator.


 Example:

Π gender(s1)

Π sname, gender(s1)
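The operators can be composed. For example, assuming S2 is an instance of the
Sailors relation (sid, sname, rating, age) used in the rename example below:

π sname, rating (σ rating>8 (S2))

returns the names and ratings of the sailors in S2 whose rating exceeds 8. An
equivalent SQL query would be:

SELECT DISTINCT S.sname, S.rating FROM Sailors S WHERE S.rating > 8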

Rename operator in relational algebra:

 Renames a relation or its column names.
 Syntax: ρ new_name/old_name (table_name)
 Example: ρ rating/sail_rating (S2) renames the column sail_rating of S2 to rating.

RELATIONAL CALCULUS: TUPLE & DOMAIN RELATIONAL CALCULUS

 Relational calculus is an alternative to relational algebra.
 Relational calculus is a non-procedural query language. In a non-
procedural query language, the user is not concerned with the details of
how to obtain the end results.
 The relational calculus tells what to do but never explains how to do it.
 Relational calculus allows us to describe the set of answers without
describing how to compute them.

Types of Relational calculus:

In relational calculus there are two variants:

1) Tuple Relational Calculus (TRC).
2) Domain Relational Calculus (DRC).


1) Tuple Relational Calculus (TRC):

 The tuple relational calculus is used to select tuples from a relation.
In TRC, the filtering variable ranges over the tuples of a relation.
 The result of the query can have one or more tuples.
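Example (a sketch using the Sailors relation from the SQL examples later in these
notes): { T | T ∈ Sailors ∧ T.rating > 7 } returns all Sailors tuples whose rating is
greater than 7.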

2) Domain Relational Calculus (DRC):

 A second form of relational calculus is known as domain relational calculus.
In domain relational calculus, the filtering variables range over the domains
of attributes rather than over whole tuples.
 Domain relational calculus uses the same operators as tuple calculus.
It uses the logical connectives ∧ (and), ∨ (or) and ¬ (not).


 It uses the existential (∃) and universal (∀) quantifiers to bind the variables.

Example: { <article, page, subject> | ∈ javatpoint ∧ subject = 'database' }

Output: This query yields the article, page and subject from the relation
javatpoint, where the subject is 'database'.

KEY DIFFERENCES BETWEEN RELATIONAL ALGEBRA & RELATIONAL CALCULUS

1) The basic difference between relational algebra and relational
calculus is that relational algebra is a procedural language whereas
relational calculus is a non-procedural, declarative language.
2) Relational algebra defines how to obtain the result whereas relational
calculus defines what information the result must contain.
3) Relational algebra specifies the sequence in which operations have to
be performed in the query. On the other hand, relational calculus does
not specify the sequence of operations to be performed in the query.
4) Relational algebra is not domain dependent whereas relational
calculus can be domain dependent, as we have domain relational
calculus.
5) The relational algebra query language is closely related to
programming languages whereas relational calculus is closely
related to natural language.


Relational algebra and relational calculus have equivalent expressive
power. The main difference between them is just that relational algebra
specifies how to retrieve the data, whereas relational calculus defines what data
is to be retrieved.

*****


UNIT – III

SQL

 SQL stands for Structured Query Language. It is used for storing and
managing data in a relational database management system (RDBMS).
 It is a standard language for relational database systems. It enables a
user to create, read, update and delete relational databases and tables.
 All RDBMSs, such as MySQL, Informix, Oracle, MS Access and SQL
Server, use SQL as their standard database language.
 SQL allows users to query the database in a number of ways, using
English-like statements.

Rules:

SQL follows the following rules:

1) Structured Query Language is not case sensitive. Generally, keywords of
SQL are written in uppercase.
2) SQL statements are not tied to text lines: a single SQL statement can be
written on one line or across multiple text lines.
3) Using SQL statements, you can perform most of the actions in a
database.
4) SQL is based on tuple relational calculus and relational algebra.

SQL process:

 When an SQL command is executed on any RDBMS, the system
figures out the best way to carry out the request, and the SQL engine
determines how to interpret the task.
 Various components are included in this process, such as the query
dispatcher, optimization engines, the classic query engine and the SQL
query engine.
 All non-SQL queries are handled by the classic query engine, but the
SQL query engine won't handle logical files.


Characteristics of SQL:

1) SQL is easy to learn.


2) SQL is used to access data from relational database management
systems.
3) SQL can execute queries against the database.
4) SQL is used to describe the data.
5) SQL is used to define the data in the database and manipulate it when
needed.
6) SQL is used to create and drop the database and table.
7) SQL is used to create view, stored procedure, function in a database.
8) SQL allows users to set permissions on tables, procedures and views.

FORM OF BASIC SQL QUERY

SELECT [DISTINCT] select-list

FROM from-list

WHERE qualifications

Every query must have a SELECT clause, which specifies the columns to be
retained in the result, and a FROM clause, which specifies a cross-product of
tables. The optional WHERE clause specifies selection conditions on the
tables mentioned in the FROM clause.

Eg: 1. Find the names and ages of all sailors.


SELECT DISTINCT S.sname, S.age FROM Sailors S

2. Find all sailors with a rating above 7.

SELECT S.sid, S.sname, S.rating, S.age FROM Sailors AS S WHERE S.rating > 7

We now consider the syntax of a basic SQL query in detail.

 The from-list in the FROM clause is a list of table names. A table name
can be followed by a range variable; a range variable is particularly
useful when the same table name appears more than once in the from-
list.
 The select-list is a list of (expressions involving) column names of tables
named in the from-list.

Column names can be prefixed by a range variable.

 The qualification in the WHERE clause is a boolean combination (i.e.,
an expression using the logical connectives AND, OR, and NOT) of
conditions of the form expression op expression, where op is one of the
comparison operators {<, <=, =, <>, >=, >}. An expression is a column
name, a constant, or an (arithmetic or string) expression.
 The DISTINCT keyword is optional. It indicates that the table computed
as an answer to this query should not contain duplicates, that is, two
copies of the same row. The default is that duplicates are not
eliminated.
 The following is the conceptual evaluation strategy of SQL query:
1) Compute the cross-product of the tables in the from-list.
2) Delete those rows in the cross-product that fail the qualification
conditions.
3) Delete all columns that do not appear in the select-list.
4) If DISTINCT is specified, eliminate duplicate rows.

UNION, INTERSECT & EXCEPT

SQL provides three set-manipulation constructs that extend the basic query
form. Since the answer to a query is a multiset of rows, it is natural to consider


the use of operations such as union, intersection, and difference. SQL
supports these operations under the names UNION, INTERSECT, and
EXCEPT.
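As an illustration (a sketch using the Sailors and Reserves relations of the
following sections; the boat ids are arbitrary):

SELECT R.sid FROM Reserves R WHERE R.bid = 103
UNION
SELECT R.sid FROM Reserves R WHERE R.bid = 104

This returns the sids of sailors who have reserved boat 103 or boat 104; replacing
UNION with INTERSECT gives the sailors who have reserved both boats, and
replacing it with EXCEPT gives the sailors who have reserved boat 103 but not 104.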


NESTED QUERIES

 A nested query is a query that has another query embedded within it;
the embedded query is called a subquery.
 SQL provides other set operations: IN (to check if an element is in a
given set), NOT IN (to check if an element is not in a given set).
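Example (a sketch of the classic query that the following paragraph explains, over
Sailors (sid, sname, rating, age) and Reserves (sid, bid, day)): find the names of
sailors who have reserved boat 103.

SELECT S.sname
FROM Sailors S
WHERE S.sid IN (SELECT R.sid
                FROM Reserves R
                WHERE R.bid = 103)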

The nested subquery computes the (multi)set of sids for sailors who have
reserved boat 103, and the top-level query retrieves the names of sailors
whose sid is in this set. The IN operator allows us to test whether a value is
in a given set of elements; an SQL query is used to generate the set to be
tested.


AGGREGATION OPERATORS
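SQL supports five aggregate operators, which can be applied to any column of a
relation: COUNT, SUM, AVG, MAX and MIN (COUNT, SUM and AVG may also be
applied to DISTINCT values). A hedged example over the Sailors relation used in
this unit: the first query counts the sailors, the second finds the average age of
sailors with rating 10.

SELECT COUNT(*)
FROM Sailors S

SELECT AVG(S.age)
FROM Sailors S
WHERE S.rating = 10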

NULL VALUES

SQL provides a special column value called null to use where some column
does not have a value to hold or the value is unknown. We use null when the
column value is either unknown or inapplicable.
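Because ordinary comparisons with a null value do not evaluate to true, SQL
provides the special predicates IS NULL and IS NOT NULL. A hedged example
(assuming the rating column may contain nulls):

SELECT S.sname
FROM Sailors S
WHERE S.rating IS NULL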


TRIGGERS AND ACTIVE DATA BASES

A trigger is a procedure that is automatically invoked by the DBMS in
response to specified changes to the database, and is typically specified by
the DBA. A database that has a set of associated triggers is called an active
database.

A trigger description contains three parts:

1) Event: A change to the database that activates the trigger.


2) Condition: A query or test that is run when the trigger is activated.
3) Action: A procedure that is executed when the trigger is activated and
its condition is true.

OR

A trigger is invoked by the Oracle engine automatically whenever a specified event
occurs. The trigger is stored in the database and is invoked repeatedly whenever
the specified condition matches.

Triggers are stored programs, which are automatically executed or fired when
some event occurs.

Triggers are written to be executed in response to any of the following events:

 A database manipulation (DML) statement (DELETE, INSERT, or
UPDATE).
 A database definition (DDL) statement (CREATE, ALTER, or DROP).
 A database operation (SERVERERROR, LOGON, LOGOFF, STARTUP,
or SHUTDOWN).

Triggers could be defined on the table, view, schema, or database with which
the event is associated.

Advantages of Triggers:

1) Trigger generates some derived column values automatically.


2) Enforces referential integrity.
3) Event logging and storing information on table access.
4) Auditing.
5) Synchronous replication of tables.


6) Imposing security authorizations.


7) Preventing invalid transactions.

Creating a trigger:

Syntax for creating a trigger (reconstructed from the clause-by-clause explanation
that follows):

CREATE [OR REPLACE] TRIGGER trigger_name
{BEFORE | AFTER | INSTEAD OF}
{INSERT [OR] | UPDATE [OR] | DELETE}
[OF col_name]
[ON table_name]
[REFERENCING OLD AS o NEW AS n]
[FOR EACH ROW]
WHEN (condition)
BEGIN
   -- executable statements
END;

Here,

1) CREATE [OR REPLACE] TRIGGER trigger_name: It creates or replaces
an existing trigger with the trigger_name.
2) {BEFORE | AFTER | INSTEAD OF} : This specifies when the trigger
would be executed. The INSTEAD OF clause is used for creating trigger
on a view.
3) {INSERT [OR] | UPDATE [OR] | DELETE}: This specifies the DML
operation.
4) [OF col_name]: This specifies the column name that would be updated.
5) [ON table_name]: This specifies the name of the table associated with
the trigger.


6) [REFERENCING OLD AS o NEW AS n]: This allows you to refer new and
old values for various DML statements, like INSERT, UPDATE, and
DELETE.
7) [FOR EACH ROW]: This specifies a row level trigger, i.e., the trigger
would be executed for each row being affected. Otherwise the trigger
will execute just once when the SQL statement is executed, which is
called a table level trigger.
8) WHEN (condition): This provides a condition for rows for which the
trigger would fire.

This clause is valid only for row level triggers.
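A hedged example of a row-level trigger in the Oracle syntax described above (the
table, column and trigger names are illustrative, not from the text); it logs every
change made to a sailor's rating:

CREATE OR REPLACE TRIGGER rating_audit
AFTER UPDATE OF rating ON Sailors
REFERENCING OLD AS o NEW AS n
FOR EACH ROW
WHEN (n.rating <> o.rating)
BEGIN
   -- record the old and new rating of the affected sailor
   INSERT INTO RatingLog (sid, old_rating, new_rating)
   VALUES (:o.sid, :o.rating, :n.rating);
END;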

SCHEMA REFINEMENT

 Schema refinement refers to refining the schema by using some
technique. The best technique for schema refinement is decomposition.
 Normalisation or Schema Refinement is a technique of organizing
the data in the database. It is a systematic approach of
decomposing tables to eliminate data redundancy and undesirable
characteristics like Insertion, Update and Deletion Anomalies.
 Redundancy refers to repetition of same data or duplicate copies of
same data stored in different locations.
 Anomalies: Anomalies refer to the problems that occur in poorly planned,
un-normalised databases where all the data is stored in one table, which
is sometimes called a flat file database.

PROBLEMS CAUSED BY REDUNDANCY [OR] ANOMALIES [OR] PROBLEMS FACED
WITHOUT NORMALIZATION [OR] PROBLEMS DUE TO REDUNDANCY

Anomalies refer to the problems that occur in poorly planned, un-normalised
databases where all the data is stored in one table, which is sometimes called
a flat file database. Let us consider such a schema: a single table holding, for
example, the fields SID, Sname, CID and FEE together.

Here all the data is stored in a single table, which causes redundancy of data
(anomalies), as SID and Sname are repeated once for the same CID. Let us
discuss the anomalies one by one. Due to the redundancy of data we may get the
following problems:

1) Insertion anomalies: It may not be possible to store some information
unless some other information is stored as well.
2) Redundant storage: some information is stored repeatedly.
3) Update anomalies: If one copy of redundant data is updated, then
inconsistency is created unless all redundant copies of data are
updated.
4) Deletion anomalies: It may not be possible to delete some information
without losing some other information as well.

Problem in updation / updation anomaly: If the fee is updated from
5000 to 7000, then we have to update the FEE column in all the rows, else the
data will become inconsistent.

Insertion Anomaly and Deletion Anomaly: These anomalies exist only due
to redundancy, otherwise they do not exist.


Insertion Anomaly: A new course C4 is introduced, but no student has taken
the C4 subject yet. Because of the insertion of some data, we are forced to
insert some other dummy data.

Deletion Anomaly: Deletion of student S3 causes the deletion of the course
information as well. Because of the deletion of some data, we are forced to
delete some other useful data.

Solutions To Anomalies: Decomposition of Tables – Schema Refinement


There are some Anomalies in this again:

What is the Solution?

Decomposing into relations as shown below:

To avoid redundancy and the problems due to redundancy, we use a refinement
technique called decomposition.


DECOMPOSITIONS

 Decomposition is the process of breaking a larger relation into smaller relations.
 Each of the smaller relations contains a subset of the attributes of the original
relation.

FUNCTIONAL DEPENDENCIES

 A functional dependency is a relationship that exists when one attribute
uniquely determines another attribute.
 A functional dependency is a form of integrity constraint that can identify
schemas with redundant storage problems and suggest refinements.
 A functional dependency A -> B holds in a relation if any two tuples
having the same value of attribute A also have the same value of
attribute B.
 Formally: if t1.X = t2.X then t1.Y = t2.Y, where t1, t2 are tuples and X, Y are
sets of attributes.
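For example (a small illustrative relation, not from the text): in Students(sid, name,
gpa), the FD sid -> name holds, because any two tuples that agree on sid must
agree on name; the FD gpa -> sid does not hold, because two different students may
have the same gpa.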

REASONING ABOUT FUNCTIONAL DEPENDENCIES

Armstrong Axioms: Armstrong axioms defines the set of rules for reasoning
about functional dependencies and also to infer all the functional
dependencies on a relational database.

Various axioms rules or inference rules:

Primary axioms:

1) Reflexivity: if Y ⊆ X, then X -> Y.
2) Augmentation: if X -> Y, then XZ -> YZ for any Z.
3) Transitivity: if X -> Y and Y -> Z, then X -> Z.

Secondary or derived axioms:

1) Union: if X -> Y and X -> Z, then X -> YZ.
2) Decomposition: if X -> YZ, then X -> Y and X -> Z.
3) Pseudo-transitivity: if X -> Y and WY -> Z, then WX -> Z.


Attribute closure: The attribute closure of an attribute set can be defined as the
set of attributes which can be functionally determined from it.

Note: To find the attribute closure of an attribute set:

1) Add the elements of the attribute set to the result set.
2) Recursively add to the result set those elements which can be functionally
determined from the elements of the result set.
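A worked example (an illustrative relation, not from the text): let R(A, B, C, D) with
FDs A -> B and B -> C. Starting from {A}, A -> B adds B and then B -> C adds C, so
{A}+ = {A, B, C}. Since D is not in {A}+, A alone is not a key of R.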

Types of functional dependencies:

Prime and non-prime attributes: Attributes which are part of any
candidate key of a relation are called prime attributes; the others are non-prime
attributes.

Candidate Key: A candidate key is a minimal set of attributes of a relation which
can be used to identify a tuple uniquely.


Super Key: Super Key is set of attributes of a relation which can be used to
identify a tuple uniquely.

Finding candidate keys problems:
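A worked example of the kind of problem this heading refers to (illustrative, not
from the text): let R(A, B, C, D) with FDs A -> B, B -> C and C -> D. Then
{A}+ = {A, B, C, D}, which covers all attributes, so A is a candidate key; B, C and D
do not determine A, so every candidate key must contain A, and any larger set
containing A (such as {A, B}) is a super key but not a candidate key because it is
not minimal.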


NORMALIZATION: FIRST, SECOND, THIRD, BCNF, FOURTH & FIFTH NORMAL FORMS

 Normalization is a process of designing a consistent database with
minimum redundancy which supports data integrity, by decomposing a
given relation into smaller relations while preserving the constraints on
the relation.
 Normalisation removes data redundancy and helps in designing
a good database. It involves a set of normal forms as follows:
1) First normal form (1NF).
2) Second normal form (2NF).
3) Third normal form (3NF).
4) Boyce-Codd normal form (BCNF).
5) Fourth normal form (4NF).
6) Fifth normal form (5NF).
7) Sixth normal form (6NF).
8) Domain key normal form.


1) First normal form: A relation is said to be in first normal form if every
attribute contains only atomic (single) values.
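For example (illustrative): a Student table that stores phone = '98401, 98402' in a
single cell violates 1NF; storing one phone number per row restores 1NF.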


2) Second normal form: A relation is said to be in second normal form if it is
in first normal form and has no partial dependencies. In second normal form,
non-prime attributes must not depend on a proper subset of any candidate key.

3) Third normal form: A relation is said to be in third normal form if it is
already in second normal form and no transitive dependencies exist.


4) Boyce-Codd normal form (BCNF): This is an extension of third normal form.
A relation is in BCNF if, in every non-trivial functional dependency X -> Y that
holds in it, X is a super key.
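A worked example (illustrative): in R(Student, Course, Teacher) with FDs
{Student, Course} -> Teacher and Teacher -> Course, the candidate keys are
{Student, Course} and {Student, Teacher}. The dependency Teacher -> Course
violates BCNF, because Teacher is not a super key, but it does not violate 3NF,
because Course is a prime attribute; so R is in 3NF but not in BCNF.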

5) Fourth normal form: A relation is said to be in fourth normal form if it is
in Boyce-Codd normal form and no non-trivial multi-valued dependencies exist
between attributes.

Note: In some cases multi value dependencies may exist not more than one
time in a given relation.

6) Fifth normal form: Fifth normal form is related to join dependencies. A relation
is in fifth normal form if every non-trivial join dependency in it is implied by its
candidate keys.

7) Domain-key normal form: A relation is in domain-key normal form if every
constraint on the relation is a logical consequence of the definition of its keys
and domains.

8) Sixth normal form: A relation is said to be in sixth normal form such that
the relation R should not contain any non-trivial join dependencies. Also sixth
normal form considers temporal dimensions (time) to the relational model.


Key Points related to normal forms:

1) BCNF is free from redundancy.


2) If a relation is in BCNF, then 3NF is also satisfied.
3) If all attributes of relation are prime attribute, then the relation is
always in 3NF.
4) A relation in a Relational Database is always and at least in 1NF form.
5) Every Binary Relation (a Relation with only 2 attributes) is always in
BCNF.
6) If a Relation has only singleton candidate keys (i.e. every candidate key
consists of only 1 attribute), then the Relation is always in 2NF
(because no Partial functional dependency possible).
7) Sometimes going for BCNF form may not preserve functional
dependency. In that case go for BCNF only if the lost FD(s) is not
required, else normalize till 3NF only.
8) There are many more Normal forms that exist after BCNF, like 4NF and
more. But in real world database systems it’s generally not required to
go beyond BCNF.

Problems on normal forms:
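A worked problem of the type this heading refers to (illustrative): R(A, B, C) with
FDs A -> B and B -> C has the single candidate key A. There is no partial
dependency, so R is in 2NF, but B -> C is a transitive dependency of a non-prime
attribute on the key, so R is not in 3NF. Decomposing R into R1(A, B) and R2(B, C)
gives relations in 3NF (in fact BCNF), and the decomposition is lossless because the
common attribute B is a key of R2.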


LOSSLESS JOIN DECOMPOSITION

When the decomposed tables are joined back, no data should be lost, and the
decomposition should satisfy all the rules of decomposition. No additional
(spurious) tuples should be generated on a natural join of the decomposed tables.
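The standard test for this property: a decomposition of R into R1 and R2 is lossless
if the set of common attributes R1 ∩ R2 is a super key of at least one of the two
relations, i.e. if R1 ∩ R2 -> R1 or R1 ∩ R2 -> R2 holds.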


*****


UNIT – IV

TRANSACTION CONCEPT

 A transaction is a set of logically related operations. It contains a group
of tasks.
 A transaction is an action or series of actions performed by a single
user to access the contents of the database.
 Example: Suppose an employee of a bank transfers Rs 800 from X's
account to Y's account. This small transaction contains several low-
level tasks: read X's balance, decrease it by 800, write it back, then read
Y's balance, increase it by 800, and write it back.

Operations of Transaction:

Following are the main operations of transaction:

1) Read(X): The read operation is used to read the value of X from the
database and store it in a buffer in main memory.
2) Write(X): The write operation is used to write the value back to the
database from the buffer.

Let's take an example of a debit transaction on an account, which consists of the
following operations:

1. R(X);
2. X = X - 500;
3. W(X);


Let's assume the value of X before starting of the transaction is 4000.

 The first operation reads X's value from database and stores it in a
buffer.
 The second operation will decrease the value of X by 500. So buffer will
contain 3500.
 The third operation will write the buffer's value to the database. So X's
final value will be 3500.

But it may be possible that, because of a failure of hardware, software,
power, etc., the transaction fails before finishing all the operations in the
set.

For example: If in the above transaction, the debit transaction fails after
executing operation 2 then X's value will remain 4000 in the database which
is not acceptable by the bank.

To solve this problem, we have two important operations:

Commit: It is used to save the work done permanently.

Rollback: It is used to undo the work done.
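A hedged SQL sketch of such a transfer (the table and column names are
illustrative, and transaction-control syntax varies slightly between DBMSs):

BEGIN;
UPDATE Account SET balance = balance - 800 WHERE acc_no = 'X';
UPDATE Account SET balance = balance + 800 WHERE acc_no = 'Y';
COMMIT;

If anything goes wrong before the COMMIT, a ROLLBACK is issued instead, which
undoes both updates.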

Transaction properties: A transaction has four properties. These are
used to maintain consistency in a database, before and after the transaction.

Property of Transaction:

1) Atomicity.
2) Consistency.
3) Isolation.
4) Durability.


TRANSACTION STATE [OR] STATES OF TRANSACTION

In a database, the transaction can be in one of the following states:

1) Active state:

 The active state is the first state of every transaction. In this state, the
transaction is being executed.
 For example: Insertion or deletion or updating a record is done here.
But all the records are still not saved to the database.


2) Partially committed:

 In the partially committed state, a transaction executes its final
operation, but the data is still not saved to the database.
 In the total mark calculation example, a final display of the total marks
step is executed in this state.

3) Committed: A transaction is said to be in a committed state if it executes
all its operations successfully. In this state, all the effects are now
permanently saved in the database system.

4) Failed state:

 If any of the checks made by the database recovery system fails, then
the transaction is said to be in the failed state.
 In the example of total mark calculation, if the database is not able to
fire a query to fetch the marks, then the transaction will fail to execute.

5) Aborted:

 If any of the checks fail and the transaction has reached a failed state
then the database recovery system will make sure that the database is
in its previous consistent state. If not then it will abort or roll back the
transaction to bring the database into a consistent state.
 If the transaction fails in the middle of the transaction then before
executing the transaction, all the executed transactions are rolled back
to its consistent state.
 After aborting the transaction, the database recovery module will select
one of the two operations:
i. Re-start the transaction.
ii. Kill the transaction.

IMPLEMENTATION OF ATOMICITY AND DURABILITY

The recovery-management component of a database system can support
atomicity and durability by a variety of schemes. We first consider a simple,
but extremely inefficient, scheme called the shadow copy scheme. This
scheme, which is based on making copies of the database, called shadow


copies, assumes that only one transaction is active at a time. The scheme also
assumes that the database is simply a file on disk. A pointer called db-pointer
is maintained on disk; it points to the current copy of the database.

In the shadow-copy scheme, a transaction that wants to update the database
first creates a complete copy of the database. All updates are done on the new
database copy, leaving the original copy, the shadow copy, untouched. If at
any point the transaction has to be aborted, the system merely deletes the
new copy. The old copy of the database has not been affected.

If the transaction completes, it is committed as follows. First, the operating
system is asked to make sure that all pages of the new copy of the database
have been written out to disk. (Unix systems use the flush command for this
purpose.) After the operating system has written all the pages to disk, the
database system updates the pointer db-pointer to point to the new copy of
the database; the new copy then becomes the current copy of the database.
The old copy of the database is then deleted. Figure 15.2 depicts the scheme,
showing the database state before and after the update.

The transaction is said to have been committed at the point where the updated
db- pointer is written to disk.

We now consider how the technique handles transaction and system failures.
First, consider transaction failure. If the transaction fails at any time before
db-pointer is updated, the old contents of the database are not affected. We
can abort the transaction by just deleting the new copy of the database.


Once the transaction has been committed, all the updates that it performed
are in the database pointed to by db-pointer. Thus, either all updates of the
transaction are reflected, or none of the effects are reflected, regardless of
transaction failure.

Now consider the issue of system failure. Suppose that the system fails at any
time before the updated db-pointer is written to disk. Then, when the system
restarts, it will read db-pointer and will thus see the original contents of the
database, and none of the effects of the transaction will be visible on the
database. Next, suppose that the system fails after db-pointer has been
updated on disk. Before the pointer is updated, all updated pages of the new
copy of the database were written to disk. Again, we assume that, once a file
is written to disk, its contents will not be damaged even if there is a system
failure. Therefore, when the system restarts, it will read db-pointer and will
thus see the contents of the database after all the updates performed by the
transaction.

CONCURRENT EXECUTIONS IN DBMS

 In a multi-user system, multiple users can access and use the same
database at one time, which is known as the concurrent execution of
the database. It means that the same database is executed
simultaneously on a multi-user system by different users.
 While working on the database transactions, there occurs the
requirement of using the database by multiple users for performing
different operations, and in that case, concurrent execution of the
database is performed.
 The thing is that the simultaneous execution that is performed should
be done in an interleaved manner, and no operation should affect the
other executing operations, thus maintaining the consistency of the
database. Thus, on making the concurrent execution of the transaction
operations, there occur several challenging problems that need to be
solved.


Problems with Concurrent Execution:

In a database transaction, the two main operations are READ and WRITE.
These operations need to be managed carefully in the concurrent execution of
transactions, because if they are interleaved in an uncontrolled manner the
data may become inconsistent. The following problems occur with concurrent
execution of the operations:

1) Lost Update Problem (W - W Conflict): The problem occurs when two
different database transactions perform the read/write operations on
the same database items in an interleaved manner (i.e., concurrent
execution) that makes the values of the items incorrect hence making
the database inconsistent.
2) Dirty Read Problems (W-R Conflict): The dirty read problem occurs
when one transaction updates an item of the database, and somehow
the transaction fails, and before the data gets rollback, the updated
database item is accessed by another transaction. There comes the
Read-Write Conflict between both transactions.
3) Unrepeatable Read Problem (R-W Conflict): Also known as the
Inconsistent Retrievals Problem, which occurs when, within a transaction,
two different values are read for the same database item.

CONCURRENCY CONTROL PROTOCOLS: LOCK BASED PROTOCOLS, TIMESTAMP
BASED PROTOCOLS & VALIDATION-BASED PROTOCOLS

The concurrency control protocols ensure the atomicity, consistency,
isolation, durability and serializability of the concurrent execution of
database transactions. These protocols are categorized as:

1) Lock Based Concurrency Control Protocol.


2) Time Stamp Concurrency Control Protocol.
3) Validation Based Concurrency Control Protocol.

1) Lock-Based Protocol: In this type of protocol, a transaction cannot read
or write data until it acquires an appropriate lock on it. There are two types
of lock:


i. Shared lock: It is also known as a read-only lock. With a shared lock, the
data item can only be read by the transaction. It can be shared between
transactions because, when a transaction holds a shared lock, it
can't update the data item.
ii. Exclusive lock: With an exclusive lock, the data item can be both read
and written by the transaction. The lock is exclusive: multiple
transactions cannot modify the same data item
simultaneously.

There are four types of lock protocols available:

i. Simplistic lock protocol: It is the simplest way of locking data during a
transaction. Simplistic lock-based protocols require every transaction to get a
lock on the data before an insert, delete or update on it. The data item is
unlocked after the transaction completes.

ii. Pre-claiming Lock Protocol:

 Pre-claiming Lock Protocols evaluate the transaction to list all the data
items on which they need locks.
 Before initiating an execution of the transaction, it requests DBMS for
all the lock on all those data items.
 If all the locks are granted, then this protocol allows the transaction to
begin. When the transaction is completed, it releases all the locks.
 If all the locks are not granted, then the transaction rolls back and
waits until all the locks are granted.

iii. Two-phase locking (2PL):

 The two-phase locking protocol divides the execution phase of the
transaction into three parts.


 In the first part, when the execution of the transaction starts, it seeks
permission for the lock it requires.
 In the second part, the transaction acquires all the locks. The third
phase is started as soon as the transaction releases its first lock.
 In the third phase, the transaction cannot demand any new locks. It
only releases the acquired locks.

iv. Strict Two-phase locking (Strict-2PL):

 The first phase of Strict-2PL is similar to 2PL. In the first phase, after
acquiring all the locks, the transaction continues to execute normally.
 The only difference between 2PL and strict 2PL is that Strict-2PL does
not release a lock after using it.
 Strict-2PL waits until the whole transaction commits, and then it
releases all the locks at a time.
 Strict-2PL protocol does not have shrinking phase of lock release.

 It does not have cascading abort as 2PL does.
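
The difference shows up in when unlock operations are allowed. The sketch
below (Python, illustrative only; the operation lists are made up) encodes one
transaction's sequence of actions under each discipline and checks the basic
2PL rule that no lock may be requested after any lock has been released.

# Under plain 2PL a lock may be released before commit (shrinking phase),
# which lets other transactions read data that may still be rolled back.
# Under Strict-2PL every lock is held until the transaction commits.

def schedule_2pl():
    return ["lock-X(A)", "lock-X(B)", "write(A)", "unlock(A)",   # early release allowed
            "write(B)", "unlock(B)", "commit"]

def schedule_strict_2pl():
    return ["lock-X(A)", "lock-X(B)", "write(A)", "write(B)",
            "commit", "unlock(A)", "unlock(B)"]                   # released only at commit

def violates_2pl(ops):
    """True if a lock is requested after some lock has already been released."""
    released = False
    for op in ops:
        if op.startswith("unlock"):
            released = True
        elif op.startswith("lock") and released:
            return True
    return False

print(violates_2pl(schedule_2pl()))         # False - a legal 2PL schedule
print(violates_2pl(schedule_strict_2pl()))  # False - also legal, and cascade-free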

2) Timestamp Ordering Protocol:

 The timestamp ordering protocol is used to order the transactions based
on their timestamps. The order of the transactions is the ascending
order of their creation times.


 The priority of the older transaction is higher, which is why it executes
first. To determine the timestamp of a transaction, this protocol uses
system time or a logical counter.
 The lock-based protocol manages the order between conflicting pairs of
transactions at execution time, whereas timestamp-based protocols
start working as soon as a transaction is created.
 Let's assume there are two transactions T1 and T2. Suppose transaction
T1 entered the system at time 007 and transaction T2 entered at time
009. T1 has the higher priority, so it executes first because it entered
the system first.
 The timestamp ordering protocol also maintains the timestamps of the
last 'read' and 'write' operations on each data item.

Basic Timestamp ordering protocol works as follows:
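
The rules that follow are the standard textbook checks for basic timestamp
ordering, summarised here as a small sketch (Python, illustrative only). Each
data item keeps the timestamps of its last read and last write, and an
operation is rejected (the transaction is rolled back) when it arrives too late.

# Basic timestamp ordering checks:
#   read(X) by T is rejected  if TS(T) < W-timestamp(X)
#   write(X) by T is rejected if TS(T) < R-timestamp(X) or TS(T) < W-timestamp(X)

items = {"A": {"read_ts": 0, "write_ts": 0}}

def read(txn_ts, name):
    item = items[name]
    if txn_ts < item["write_ts"]:
        return "rollback"                          # a younger transaction already wrote X
    item["read_ts"] = max(item["read_ts"], txn_ts)
    return "read OK"

def write(txn_ts, name):
    item = items[name]
    if txn_ts < item["read_ts"] or txn_ts < item["write_ts"]:
        return "rollback"                          # a younger transaction already read or wrote X
    item["write_ts"] = txn_ts
    return "write OK"

print(read(7, "A"))    # T1 (TS = 007) reads A          -> read OK
print(write(9, "A"))   # T2 (TS = 009) writes A         -> write OK
print(write(7, "A"))   # T1 tries to write A too late   -> rollback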


Advantages and Disadvantages of TO protocol:

 The TO protocol ensures conflict serializability, because every edge in
the precedence graph goes from an older transaction to a younger one,
so the graph can never contain a cycle.
 The TO protocol ensures freedom from deadlock, which means no
transaction ever waits.
 But the schedule may not be recoverable and may not even be cascade-
free.

3) Validation Based Protocol: The validation-based protocol is also known
as the optimistic concurrency control technique. In this protocol, a
transaction is executed in the following three phases:

i. Read phase: In this phase, the transaction T reads the values of the
various data items and stores them in temporary local variables. It
performs all its write operations on these temporary variables without
updating the actual database.
ii. Validation phase: In this phase, the temporary variable values are
validated against the actual data to check whether serializability would
be violated.
iii. Write phase: If the transaction passes validation, the temporary results
are written to the database; otherwise the transaction is rolled back.

Each transaction Ti is associated with the following three timestamps:

Start (Ti): It contains the time when Ti started its execution.

Validation (Ti): It contains the time when Ti finishes its read phase and starts
its validation phase.

Finish (Ti): It contains the time when Ti finishes its write phase.
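
A simplified sketch of the validation test itself is given below (Python,
illustrative only; the timestamps, sets and transaction names are made up).
A validating transaction Ti is compared against each transaction Tj that has
already been validated: Ti passes if Tj finished before Ti started, or if Tj
finished its write phase before Ti entered validation and Tj's write set does
not overlap Ti's read set.

# Optimistic (validation-based) concurrency control: a simplified validation test.

committed = [
    {"name": "Tj", "start": 1, "validation": 3, "finish": 5, "write_set": {"A"}},
]

def validate(ti_start, ti_validation, ti_read_set):
    for tj in committed:
        if tj["finish"] < ti_start:
            continue                                   # Tj finished before Ti started: no conflict
        if tj["finish"] < ti_validation and not (tj["write_set"] & ti_read_set):
            continue                                   # Tj's writes do not touch Ti's reads
        return False                                   # otherwise Ti must be rolled back
    return True

print(validate(ti_start=6, ti_validation=8, ti_read_set={"A", "B"}))  # True: Tj finished at 5, before 6
print(validate(ti_start=2, ti_validation=8, ti_read_set={"A"}))       # False: Ti overlapped Tj and read A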


SERIALIZABILITY

 The serializability of schedules is used to find non-serial schedules that
allow the transactions to execute concurrently without interfering with
one another.
 It identifies which schedules are correct when the executions of the
transactions have interleaved operations.
 A non-serial schedule is serializable if its result is equal to the result of
its transactions executed serially.


Here, in the accompanying figure,

Schedule A and Schedule B are serial schedules.

Schedule C and Schedule D are non-serial schedules.

TESTING FOR SERIALIZABILITY

Serialization Graph is used to test the Serializability of a schedule.

Assume a schedule S. For S, we construct a graph known as the precedence
graph. This graph is a pair G = (V, E), where V consists of a set of vertices and
E consists of a set of edges. The set of vertices contains all the transactions
participating in the schedule. The set of edges contains an edge Ti → Tj
whenever one of the following three conditions holds:

1) Ti executes write(Q) before Tj executes read(Q).
2) Ti executes read(Q) before Tj executes write(Q).
3) Ti executes write(Q) before Tj executes write(Q).


If a precedence graph contains a single edge Ti → Tj, then all the instructions
of Ti are executed before the first instruction of Tj is executed.

If a precedence graph for schedule S contains a cycle, then S is
non-serializable. If the precedence graph has no cycle, then S is known as
serializable.
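
A small sketch of this test is given below (Python, illustrative only). It
builds the precedence graph from a schedule written as a list of
(transaction, operation, item) triples and then looks for a cycle with a
depth-first search.

# Build a precedence graph from a schedule and test it for cycles.
# A schedule is a list of (transaction, op, item) with op in {"R", "W"}.

def precedence_graph(schedule):
    edges = set()
    for i, (ti, op_i, x) in enumerate(schedule):
        for tj, op_j, y in schedule[i + 1:]:
            if ti != tj and x == y and (op_i == "W" or op_j == "W"):
                edges.add((ti, tj))                  # conflicting pair: edge Ti -> Tj
    return edges

def has_cycle(edges):
    nodes = {t for edge in edges for t in edge}
    visited, in_stack = set(), set()

    def dfs(u):
        visited.add(u)
        in_stack.add(u)
        for (a, b) in edges:
            if a == u and (b in in_stack or (b not in visited and dfs(b))):
                return True
        in_stack.discard(u)
        return False

    return any(dfs(n) for n in nodes if n not in visited)

# T1 reads A, T2 writes A, then T1 writes A: edges T1->T2 and T2->T1 form a cycle.
s1 = [("T1", "R", "A"), ("T2", "W", "A"), ("T1", "W", "A")]
print(has_cycle(precedence_graph(s1)))   # True  -> non-serializable

# T1 finishes with A before T2 touches it: only the edge T1->T2 exists.
s2 = [("T1", "R", "A"), ("T1", "W", "A"), ("T2", "R", "A"), ("T2", "W", "A")]
print(has_cycle(precedence_graph(s2)))   # False -> serializable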

For example, consider the following two schedules S1 and S2 and their
precedence graphs (shown in the figures).


Precedence graph for schedule S1:

The precedence graph for schedule S1 contains a cycle, so schedule S1 is
non-serializable.



Precedence graph for schedule S2:

The precedence graph for schedule S2 contains no cycle, so schedule S2 is
serializable.

RECOVERABILITY

Sometimes a transaction may not execute completely due to a software issue,
a system crash or a hardware failure. In that case, the failed transaction has
to be rolled back. But some other transaction may also have used a value
produced by the failed transaction, so we have to roll back those transactions
as well.

The above table 1 shows a schedule which has two transactions. T1 reads and
writes the value of A and that value is read and written by T2. T2 commits


but later on, T1 fails. Due to the failure, we have to roll back T1. T2 should
also be rolled back because it read the value written by T1, but T2 cannot be
rolled back because it has already committed. This type of schedule is known
as an irrecoverable schedule.

Irrecoverable schedule: The schedule is irrecoverable if Tj reads the updated
value of Ti and Tj commits before Ti commits.

The above table 2 shows a schedule with two transactions. Transaction T1
reads and writes A, and that value is read and written by transaction T2. But
later on, T1 fails. Due to this, we have to roll back T1. T2 should also be
rolled back because T2 has read the value written by T1. Since T2 has not
committed before T1 commits, we can roll back transaction T2 as well. So
this schedule is recoverable with cascading rollback.

Recoverable with cascading rollback: The schedule is recoverable with
cascading rollback if Tj reads the updated value of Ti and the commit of Tj is
delayed until the commit of Ti.


The above table 3 shows a schedule with two transactions. Transaction T1
reads and writes A and then commits, and only after that is the value read
and written by T2. So this is a cascadeless recoverable schedule.
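
The three cases can be checked mechanically. The sketch below (Python,
illustrative only; the sample schedules mirror the descriptions of tables 1
and 3 above) finds every place where a transaction Tj reads a value written
by Ti and classifies the schedule by comparing the commit positions.

# Classify a schedule as irrecoverable, recoverable with cascading rollback,
# or cascadeless recoverable. A schedule is a list of (transaction, op, item)
# with op in {"R", "W", "C"} (C = commit).

def classify(schedule):
    commit_pos = {t: i for i, (t, op, _) in enumerate(schedule) if op == "C"}
    verdict = "cascadeless recoverable"
    for i, (ti, op_i, x) in enumerate(schedule):
        if op_i != "W":
            continue
        for j in range(i + 1, len(schedule)):
            tj, op_j, y = schedule[j]
            if tj != ti and op_j == "R" and y == x:
                if commit_pos.get(ti, len(schedule)) > j:          # Tj read an uncommitted value
                    if commit_pos.get(tj, len(schedule)) < commit_pos.get(ti, len(schedule)):
                        return "irrecoverable"                     # Tj committed before Ti
                    verdict = "recoverable with cascading rollback"
    return verdict

# Table 1: T2 reads A written by T1 and commits before T1.
t1 = [("T1", "W", "A"), ("T2", "R", "A"), ("T2", "C", None), ("T1", "C", None)]
print(classify(t1))   # irrecoverable

# Table 3: T1 commits before T2 reads A.
t3 = [("T1", "W", "A"), ("T1", "C", None), ("T2", "R", "A"), ("T2", "C", None)]
print(classify(t3))   # cascadeless recoverable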

LOG–BASED RECOVERY

 The log is a sequence of records. The log of each transaction is
maintained in some stable storage so that if any failure occurs, the
database can be recovered from it.
 If any operation is performed on the database, it is recorded in the log.
 The process of storing the logs must be completed before the actual
change is applied to the database.

Let's assume there is a transaction to modify the City of a student. The
following logs are written for this transaction.

 When the transaction is initiated, it writes the 'start' log record:
<Tn, Start>
 When the transaction modifies the City from 'Noida' to 'Bangalore',
another log record is written to the file:
<Tn, City, 'Noida', 'Bangalore'>
 When the transaction is finished, it writes another log record to indicate
the end of the transaction:
<Tn, Commit>

There are two approaches to modify the database:

1) Deferred database modification: The deferred modification technique
is used when the transaction does not modify the database until it has
committed. In this method, all the logs are created and stored in stable
storage first, and the database is updated only when the transaction
commits.
2) Immediate database modification: The immediate modification
technique is used when the database is modified while the transaction
is still active. In this technique, the database is modified immediately
after every operation of the transaction (a small sketch of immediate
modification with write-ahead logging follows).
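
The sketch below (Python, illustrative only; the attribute and transaction
names follow the City example above) shows immediate modification with
write-ahead logging: the log record for an update is always appended before
the change is applied, so the old value remains available for undoing the
transaction if it fails.

# Immediate database modification: the log record is written first,
# then the database itself is changed.

log = []
database = {"City": "Noida"}

def start(txn):
    log.append(f"<{txn}, Start>")

def update(txn, field, new_value):
    old_value = database[field]
    log.append(f"<{txn}, {field}, '{old_value}', '{new_value}'>")   # log first ...
    database[field] = new_value                                     # ... then modify

def commit(txn):
    log.append(f"<{txn}, Commit>")

start("Tn")
update("Tn", "City", "Bangalore")
commit("Tn")
print(log)        # ['<Tn, Start>', "<Tn, City, 'Noida', 'Bangalore'>", '<Tn, Commit>']
print(database)   # {'City': 'Bangalore'}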


RECOVERY WITH CONCURRENT TRANSACTIONS

 Whenever more than one transaction is being executed, the logs of the
transactions get interleaved. During recovery, it would become difficult
for the recovery system to backtrack through all the logs and then start
recovering.
 To ease this situation, the 'checkpoint' concept is used by most DBMSs.

Checkpoint:

 The checkpoint is a mechanism by which all the previous logs are
removed from the system and stored permanently on the storage disk.
 The checkpoint is like a bookmark. During the execution of transactions,
such checkpoints are marked, and as the transactions execute, their log
files are created from the steps of the transactions.
 When a checkpoint is reached, the transactions' updates are written to
the database, and the entire log file up to that point is removed from the
file. The log file is then updated with the new steps of the transactions
until the next checkpoint, and so on.
 The checkpoint is used to declare a point before which the DBMS was in
a consistent state and all transactions were committed.

Recovery using Checkpoint: A recovery system recovers the database from a
failure in the following manner:

 The recovery system reads the log files from the end to the start, i.e.
from T4 back to T1.
 The recovery system maintains two lists, a redo-list and an undo-list.

Page 69 of 75
Database Management Systems Riyaz Mohammed

 A transaction is put into the redo-list if the recovery system sees a log
with <Tn, Start> and <Tn, Commit>, or just <Tn, Commit>. All the
transactions in the redo-list are redone by replaying their logged
updates.
 For example: in the log file, transactions T2 and T3 have both <Tn,
Start> and <Tn, Commit>. Transaction T1 has only <Tn, Commit> in this
part of the log because it started before the checkpoint and committed
after the checkpoint was crossed. Hence T1, T2 and T3 are put into the
redo-list.
 A transaction is put into the undo-list if the recovery system sees a log
with <Tn, Start> but no commit or abort log. All the transactions in the
undo-list are undone, and their logs are removed.
 For example: transaction T4 has only <Tn, Start>, so T4 is put into the
undo-list since it is not yet complete and failed in between (a small
sketch of this classification follows).
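
The sketch below (Python, illustrative only; the log mirrors the example
above) builds the redo-list and undo-list from the log records written after
the last checkpoint.

# Separate transactions into a redo-list and an undo-list from the log
# records that appear after the last checkpoint.

log_after_checkpoint = [
    ("T1", "Commit"),                       # T1 started before the checkpoint
    ("T2", "Start"), ("T2", "Commit"),
    ("T3", "Start"), ("T3", "Commit"),
    ("T4", "Start"),                        # T4 never commits
]

def classify_transactions(log):
    started = {t for (t, record) in log if record == "Start"}
    committed = {t for (t, record) in log if record == "Commit"}
    redo_list = committed                   # <Start, Commit> or just <Commit>
    undo_list = started - committed         # <Start> without a later Commit or Abort
    return sorted(redo_list), sorted(undo_list)

redo, undo = classify_transactions(log_after_checkpoint)
print("redo:", redo)    # redo: ['T1', 'T2', 'T3']
print("undo:", undo)    # undo: ['T4']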

*****


UNIT – V

FILE ORGANIZATION

The method of mapping file records to disk blocks defines file organization,
i.e. how the file records are organized. The following are the types of file
organization.

1) Heap File Organization: When a file is created using the heap file
organization mechanism, the operating system allocates a memory area
to that file without any further accounting details. File records can be
placed anywhere in that memory area.
2) Sequential File Organization: Every file record contains a data field
(attribute) to uniquely identify that record. In the sequential file
organization mechanism, records are placed in the file in some
sequential order based on the unique key field or search key.
Practically, it is not possible to store all the records sequentially in
physical form.
3) Hash File Organization: This mechanism uses a hash function
computed on some field of the records. As we know, a file is a collection
of records, which have to be mapped onto the blocks of the disk space
allocated to the file (a small sketch of this mapping is given after this
list).
4) Clustered File Organization: Clustered file organization is not
considered good for large databases. In this mechanism, related records
from one or more relations are kept in the same disk block, that is, the
ordering of records is not based on the primary key or search key.
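
The sketch below (Python, illustrative only; the block count and key values
are made up) shows the hash file organization idea: the block (bucket) that
holds a record is computed by applying a simple modulo hash to its key
field, so a lookup only has to read one block.

# Hash file organization: the block for a record is computed from its key field.

NUM_BLOCKS = 4
blocks = {b: [] for b in range(NUM_BLOCKS)}       # block number -> list of records

def insert(record):
    block_no = record["key"] % NUM_BLOCKS         # hash function on the key field
    blocks[block_no].append(record)
    return block_no

def lookup(key):
    block_no = key % NUM_BLOCKS                   # only this block has to be read
    return [r for r in blocks[block_no] if r["key"] == key]

insert({"key": 101, "name": "record-101"})
insert({"key": 205, "name": "record-205"})
print(lookup(101))    # [{'key': 101, 'name': 'record-101'}]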


OPERATIONS ON FILES

Operations on database files can be broadly classified into two categories:

1) Update Operations.
2) Retrieval Operations.

Update operations change the data values by insertion, deletion or update.
Retrieval operations, on the other hand, do not alter the data but retrieve
them after optional conditional filtering. In both types of operations, selection
plays a significant role. Other than the creation and deletion of a file, there
are several other operations that can be performed on files.

Open: A file can be opened in one of two modes, read mode or write mode. In
read mode, the operating system does not allow anyone to alter data; it is
solely for reading purposes. Files opened in read mode can be shared among
several entities. The other mode is write mode, in which data modification is
allowed. Files opened in write mode can also be read, but cannot be shared.

Locate: Every file has a file pointer, which tells the current position where the
data is to be read or written. This pointer can be adjusted as required. Using
the find (seek) operation, it can be moved forward or backward.

Read: By default, when a file is opened in read mode, the file pointer points
to the beginning of the file. There are options by which the user can tell the
operating system where the file pointer should be located at the time of
opening the file. The data immediately following the file pointer is read.

Write: The user can choose to open a file in write mode, which enables them
to edit its content. The edit can be a deletion, insertion or modification. The
file pointer can be positioned at the time of opening or can be changed
dynamically if the operating system allows it.

Close: This is also a very important operation from the operating system's
point of view. When a request to close a file is generated, the operating system
removes all the locks (if the file was in shared mode), saves the contents of
the data (if altered) to the secondary storage media, and releases all the
buffers and file handlers associated with the file.
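
A small illustration of these operations using ordinary file handling in Python
is given below (illustrative only; the file name and contents are made up). It
opens a file in write mode, closes it, reopens it in read mode, locates (seeks)
the file pointer and reads from there.

# Open, write, close, then open again, locate (seek) and read.

f = open("students.txt", "w")       # open in write mode (cannot be shared)
f.write("1, Ramesh, Delhi\n")
f.write("2, Suresh, Noida\n")
f.close()                           # close: buffers are flushed and released

f = open("students.txt", "r")       # open in read mode (can be shared)
f.seek(3)                           # locate: move the file pointer to byte offset 3
print(f.read())                     # read from the current pointer to the end of file
f.close()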


INDEXING: CLUSTER INDEXES, PRIMARY & SECONDARY INDEXES

We know that information in DBMS files is stored in the form of records. Every
record is equipped with some key field, which helps it to be recognized
uniquely.

Indexing is defined based on its indexing attributes. Indexing can be of the
following types:

1) Primary Index: If the index is built on the ordering 'key field' of the file,
it is called a primary index. Generally it is the primary key of the
relation.
2) Secondary Index: If the index is built on a non-ordering field of the file,
it is called a secondary index.
3) Clustering Index: If the index is built on an ordering non-key field of
the file, it is called a clustering index.
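
To make the idea concrete, the sketch below (Python, illustrative only; the
keys and block names are made up) treats a primary index as a sorted list of
(search key, block pointer) entries over the ordering key field and locates the
block that holds a record using binary search.

# A primary index: sorted (search key, block pointer) entries.

import bisect

index = [(10, "block-1"), (20, "block-2"), (30, "block-3"), (40, "block-4")]
keys = [k for (k, _) in index]

def find_block(search_key):
    pos = bisect.bisect_left(keys, search_key)     # binary search over the sorted keys
    if pos < len(keys) and keys[pos] == search_key:
        return index[pos][1]
    return None                                    # no record with this key

print(find_block(30))   # block-3
print(find_block(25))   # None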

B+ TREES

A B+ tree is a multi-level index format based on a balanced search tree. As
mentioned earlier, single-level index records become large as the database
size grows, which also degrades performance. All leaf nodes of a B+ tree hold
actual data pointers. A B+ tree ensures that all leaf nodes remain at the same
height, and is thus balanced. Additionally, all leaf nodes are linked using a
linked list, which makes the B+ tree support random access as well as
sequential access.

Structure of B+ Tree:

Every leaf node is at equal distance from the root node. A B+ tree is of order
n where n is fixed for every B+ tree.


Internal nodes:

 Internal (non-leaf) nodes contain at least ⌈n/2⌉ pointers, except the root
node.
 At most, an internal node contains n pointers.

Leaf nodes:

 Leaf nodes contain at least ⌈n/2⌉ record pointers and ⌈n/2⌉ key values.
 At most, leaf nodes contain n record pointers and n key values.
 Every leaf node contains one block pointer P to point to next leaf node
and forms a linked list.

B+ tree insertion: A B+ tree is filled from the bottom, and each entry is
inserted at a leaf node.

If a leaf node overflows:

 Split the node into two parts.
 Partition at i = ⌊(m+1)/2⌋.
 The first i entries are stored in one node.
 The rest of the entries (i+1 onwards) are moved to a new node.
 The ith key is duplicated in the parent of the leaf (a small sketch of this
split is given after these steps).

If a non-leaf node overflows:

 Split the node into two parts.
 Partition the node at i = ⌈(m+1)/2⌉.
 Entries up to i are kept in one node.
 The rest of the entries are moved to a new node.
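
The leaf split rule can be sketched as follows (Python, illustrative only; the
order m and the key values are made up). The overflowing key list is
partitioned at i = ⌊(m+1)/2⌋: the first i keys stay in the old leaf, the remaining
keys move to a new leaf, and the ith key is duplicated in the parent (this
sketch assumes the convention that keys less than or equal to the separator
go to the left child).

# Split an overflowing B+ tree leaf of order m (the leaf holds more than m keys).

def split_leaf(keys, m):
    i = (m + 1) // 2             # partition point i = floor((m+1)/2)
    left = keys[:i]              # first i entries stay in the old leaf
    right = keys[i:]             # entries from i+1 onwards go to a new leaf
    copied_up = keys[i - 1]      # the ith key is duplicated in the parent
    return left, right, copied_up

# Order m = 3: a leaf may hold at most 3 keys, so inserting 25 overflows it.
overflowing = sorted([10, 20, 30, 25])          # [10, 20, 25, 30]
left, right, parent_key = split_leaf(overflowing, m=3)
print(left, right, parent_key)                  # [10, 20] [25, 30] 20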

B+ tree deletion:

 B+ tree entries are deleted at the leaf nodes.
 The target entry is searched for and deleted.
 If the target entry also appears in an internal node, delete it there and
replace it with the entry from the left position.
 After deletion, underflow is tested:


 If underflow occurs, distribute the entries from the node to its left.
 If distribution from the left is not possible, distribute from the node to
its right.
 If distribution from both the left and the right is not possible, merge the
node with the node to its left or right.

*****

Prepared by:

RIYAZ MOHAMMED

