Design theory:
- When a project is deleted, all the employees who work on that project are deleted as well.
- Alternatively, if an employee is the sole employee on a project, deleting that employee also deletes the corresponding project.
Using the two relations EMP_PROJ1 and EMP_LOCS as the base relations of EMP_PROJ is not a good schema design.
The problem is that performing a natural join on these two relations produces more tuples than the original set of tuples in EMP_PROJ.
- The additional tuples that were not in EMP_PROJ are called spurious tuples, because they represent spurious or wrong information that is not valid.
- This is because the PLOCATION attribute, which is used for joining, is neither a primary key nor a foreign key in either EMP_LOCS or EMP_PROJ1.
The example below shows the spurious tuples that can be generated if the relations are joined improperly.
The method for designing a relational database is to use a process commonly known
as normalization.
The goal is to generate a set of relation schemas that allows us to store information
without unnecessary redundancy, yet also allows us to retrieve information easily.
e.g., based on functional dependencies and multivalued dependencies.
Decomposition:
The only way to avoid the repetition-of-information problem in the in_dep schema is to decompose it into two schemas – the instructor and department schemas.
Not all decompositions are good. Suppose we decompose the employee schema into two schemas that share only the employee's name.
The problem arises when we have two employees with the same name.
The next slide shows how we lose information -- we cannot reconstruct the original employee relation -- and so, this is a lossy decomposition.
Minimal Cover:
A minimal cover of a set of functional dependencies F satisfies the following conditions:
1. Every dependency in F has a single attribute on its right-hand side.
2. We cannot replace any dependency X -> A in F with a dependency Y -> A, where Y is a proper subset of X, and still have a set of dependencies that is equivalent to F.
3. We cannot remove any dependency from F and still have a set of dependencies that is equivalent to F (removing the redundancies by dropping any dependency that can be inferred from the remaining FDs in F).
Functional dependencies:
A functional dependency (FD) is a relationship between two attributes (or sets of attributes) in which the value of one can be derived, directly or indirectly, from the other.
● FDs are used to specify formal measures of the "goodness" of relational designs.
● FDs and keys are used to define normal forms for relations.
● FDs are constraints that are derived from the meaning and interrelationships of the data attributes.
X → Y holds if whenever two tuples have the same value for X, they must have the same
value for Y. For any two tuples t1 and t2 in any relation instance r(R): If t1[X]=t2[X], then
t1[Y]=t2[Y]
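This definition can be checked mechanically on a concrete relation instance. Below is a minimal Python sketch (the relation instance and attribute names are made up for illustration):

    # Check whether X -> Y holds in a relation instance r,
    # following the definition: if t1[X] = t2[X] then t1[Y] = t2[Y].
    def fd_holds(r, X, Y):
        seen = {}  # value of t[X] -> value of t[Y] seen so far
        for t in r:
            x_val = tuple(t[a] for a in X)
            y_val = tuple(t[a] for a in Y)
            if x_val in seen and seen[x_val] != y_val:
                return False
            seen[x_val] = y_val
        return True

    # Hypothetical instance: SSN -> ENAME holds, ENAME -> SSN does not.
    r = [
        {"SSN": 1, "ENAME": "Smith", "PNUMBER": 10},
        {"SSN": 1, "ENAME": "Smith", "PNUMBER": 20},
        {"SSN": 2, "ENAME": "Smith", "PNUMBER": 30},
    ]
    print(fd_holds(r, ["SSN"], ["ENAME"]))    # True
    print(fd_holds(r, ["ENAME"], ["SSN"]))    # False

Note that such a check can only refute an FD on a given instance; an FD is a constraint on all legal instances, not a property derived from one instance.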
However, the FDs Teacher → Course, Teacher → Text, and Course → Text are ruled out.
(This is because there exist tuples that agree on the left-hand-side attribute but differ on the right-hand-side attribute.)
Decomposition:
The flaw in this decomposition arises from the possibility that the enterprise has two
employees with the same name.
Inference rules:
Def: An FD X → Y is inferred from or implied by a set of dependencies F specified on R if X
→ Y holds in every legal relation state r of R; that is, whenever r satisfies all the
dependencies in F, X → Y also holds in r.
Additional Rules:
Examples:
Consider the relation schema <R, F> where R = (A, B, C, D, E, G, H, I) and the set of dependencies F = { AB -> E, AG -> J, BE -> I, E -> G, GI -> H }. Show that AB -> GH is derived by F.

Step  Statement   Explanation
1     AB -> E     Given
2     E -> G      Given
3     BE -> I     Given
4     GI -> H     Given
5     AB -> G     Transitivity on (1) and (2)
6     AB -> BE    Augmentation of (1) by B
7     AB -> I     Transitivity on (6) and (3)
8     AB -> GI    Union of (5) and (7)
9     AB -> H     Transitivity on (8) and (4)
10    AB -> GH    Union of (5) and (9)
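The same result can be cross-checked by computing the attribute closure {A, B}+ under F. A minimal Python sketch (FDs represented as pairs of attribute sets):

    # Compute the closure X+ of a set of attributes X under a set of FDs.
    def closure(X, fds):
        result = set(X)
        changed = True
        while changed:
            changed = False
            for lhs, rhs in fds:
                if lhs <= result and not rhs <= result:
                    result |= rhs
                    changed = True
        return result

    F = [({"A", "B"}, {"E"}), ({"A", "G"}, {"J"}), ({"B", "E"}, {"I"}),
         ({"E"}, {"G"}), ({"G", "I"}, {"H"})]

    print(closure({"A", "B"}, F))                 # contains G and H (among others)
    print({"G", "H"} <= closure({"A", "B"}, F))   # True, so AB -> GH is implied by F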
Equivalence:
Two sets of FDs F and G are equivalent if F covers G (every FD in G can be inferred from F) and G covers F. To check whether F covers G, compute the closure X+ under F for each FD X -> Y in G and verify that Y is contained in X+.
Examples:
F (set of functional dependencies):
A -> B
B -> C
G (set of functional dependencies):
B -> C
C -> A
In this example, let's check if F and G are equivalent:
Does F cover G?
B -> C: B+ under F = {B, C}, which includes C.
C -> A: C+ under F = {C}, which does not include A.
Does G cover F?
A -> B: A+ under G = {A}, which does not include B.
So neither set covers the other, and F and G are not equivalent.
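The covers/equivalence test can be written directly in terms of attribute closures. A small sketch, reusing the closure() helper from the previous example:

    # F covers G if every FD in G follows from F (checked via closures under F).
    def covers(F, G):
        return all(rhs <= closure(lhs, F) for lhs, rhs in G)

    def equivalent(F, G):
        return covers(F, G) and covers(G, F)

    F = [({"A"}, {"B"}), ({"B"}, {"C"})]
    G = [({"B"}, {"C"}), ({"C"}, {"A"})]
    print(equivalent(F, G))   # False for the sets above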
Closure:
Dependency Preservation:
Testing functional dependency constraints each time the database is updated can be costly. It is useful to design the database in a way that constraints can be tested efficiently. If testing a functional dependency can be done by considering just one relation, then the cost of testing this constraint is low.
Example:
Consider a schema:
dept_advisor(s_ID, i_ID, dept_name)
With the functional dependencies:
i_ID → dept_name
s_ID, dept_name → i_ID
In the above design we are forced to repeat the department name once for each time an instructor participates in a dept_advisor relationship.
To fix this, we need to decompose dept_advisor.
However, any decomposition will fail to include all the attributes in
s_ID, dept_name → i_ID
Thus, the decomposition will NOT be dependency preserving (see the sketch below).
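A minimal sketch of how dependency preservation can be tested against a decomposition, assuming the usual BCNF-style split of dept_advisor into (s_ID, i_ID) and (i_ID, dept_name), and reusing the closure() helper from above:

    # Standard restriction-closure test: FD alpha -> beta is preserved by the
    # decomposition if beta ends up in the closure computed only through the
    # fragments Ri.
    def preserved(alpha, beta, F, decomposition):
        result = set(alpha)
        changed = True
        while changed:
            changed = False
            for Ri in decomposition:
                t = closure(result & Ri, F) & Ri
                if not t <= result:
                    result |= t
                    changed = True
        return beta <= result

    F = [({"i_ID"}, {"dept_name"}), ({"s_ID", "dept_name"}, {"i_ID"})]
    decomposition = [{"s_ID", "i_ID"}, {"i_ID", "dept_name"}]

    print(preserved({"i_ID"}, {"dept_name"}, F, decomposition))           # True
    print(preserved({"s_ID", "dept_name"}, {"i_ID"}, F, decomposition))   # False: not preserved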
1NF, 2NF, 3NF-Comparison:
Normalization of Relations:
2NF, 3NF, and BCNF are based on the keys and FDs of a relation schema.
4NF is based on keys and multivalued dependencies (MVDs).
5NF is based on keys and join dependencies (JDs).
Examples:
{SSN, PNUMBER} -> HOURS is a full FD since neither SSN
-> HOURS nor PNUMBER -> HOURS hold.
{SSN, PNUMBER} -> ENAME is not a full FD (it is called a partial dependency) since SSN -> ENAME also holds.
Definition:
Transitive functional dependency: an FD X -> Z that can be derived from two FDs X -> Y and Y -> Z.
Examples:
SSN -> DMGRSSN is a transitive FD, since SSN -> DNUMBER and DNUMBER -> DMGRSSN hold.
SSN -> ENAME is non-transitive, since there is no set of attributes X where SSN -> X and X -> ENAME.
Chapter Summary:
Remember the following diagram, which implies:
A relation in BCNF will surely be in all other normal forms.
A relation in 3NF will surely be in 2NF and 1NF.
A relation in 2NF will surely be in 1NF.
Transaction Concept
▪ A transaction is a unit of program execution that accesses and
possibly updates various data items.
ACID Properties
❖ Atomicity requirement
➢ If the transaction fails after step 3 and before step 6, money
will be “lost” leading to an inconsistent database state
■ Failure could be due to software or hardware
➢ The system should ensure that updates of a partially
executed transaction are not reflected in the database
❖ Durability: Once the user has been notified that the transaction
has completed (i.e., the transfer of the $50 has taken place), the
updates to the database by the transaction must persist even if
there are software or hardware failures.
❖ Consistency:
➢ The sum of A and B is unchanged by the execution of the
transaction
In general, consistency requirements include
➢ Explicitly specified integrity constraints such as primary keys
and foreign keys
➢ Implicit integrity constraints
➢ A transaction must see a consistent database.
➢ During transaction execution the database may be
temporarily inconsistent.
➢ When the transaction completes successfully the database
must be consistent
T1                                   T2
1. read(A)
2. A := A – 50
3. write(A)
                                     read(A), read(B), print(A+B)
4. read(B)
5. B := B + 50
6. write(B)
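A tiny Python sketch of why this interleaving is a problem (hypothetical balances A = 1000, B = 2000): T2 sees A after the debit but B before the credit, so the total it prints is $50 too low.

    # Simulate the interleaving shown above.
    A, B = 1000, 2000        # hypothetical initial balances; invariant: A + B = 3000

    # T1, steps 1-3: debit A
    a = A                    # read(A)
    a = a - 50               # A := A - 50
    A = a                    # write(A)

    # T2 runs here and sees the debited A but the un-credited B.
    print(A + B)             # 2950, not 3000

    # T1, steps 4-6: credit B
    b = B                    # read(B)
    b = b + 50               # B := B + 50
    B = b                    # write(B)

    print(A + B)             # 3000 again once T1 has completed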
Storage Structure
Transaction State
• Active – the initial state; the transaction stays in this state while it
is executing
• Partially committed – after the final statement has been
executed.
• Failed -- after the discovery that normal execution can no longer
proceed.
• Aborted – after the transaction has been rolled back and the
database restored to its state prior to the start of the transaction.
Two options after it has been aborted:
• Restart the transaction
• Can be done only if no internal logical error
• Kill the transaction
• Committed – after successful completion.
• The system can restart the transaction, but only if the transaction was aborted as a result of some hardware or software error that was not created through the internal logic of the transaction. A restarted transaction is considered to be a new transaction.
Transaction Isolation
Concurrent Executions
Schedules
Conflicting Instructions
Conflict Serializability
View Serializability
Concurrency Control
Recoverable Schedules
Cascading Rollbacks
Cascadeless Schedule
▪ Cascadeless schedules — cascading rollbacks cannot occur;
• For each pair of transactions Ti and Tj such that Tj reads a
data item previously written by Ti, the commit operation of Ti
appears before the read operation of Tj.
▪ Serializable — default
▪ Locking
• Lock on whole database vs lock on items
• How long to hold lock?
• Shared vs exclusive locks
▪ Timestamps
• Transaction timestamp assigned e.g. when a transaction
begins
• Data items store two timestamps
▪ Read timestamp
▪ Write timestamp
• Timestamps are used to detect out of order accesses
▪ E.g., Transaction 1:
select ID, name from instructor where salary > 90000
▪ E.g., Transaction 2:
insert into instructor values ('11111', 'James', 'Marketing',
100000)
▪ Suppose
• T1 starts, finds tuples salary > 90000 using index and locks
them
• And then T2 executes.
• Do T1 and T2 conflict? Does tuple level locking detect the
conflict?
• Instance of the phantom phenomenon
Lock-Based Protocols
● Shared Lock (S):
○ A transaction holding a shared lock (S) on a data item can read the
data but cannot write to it.
○ Multiple transactions can hold shared locks on the same data item
concurrently for reading.
● Exclusive Lock (X):
○ A transaction holding an exclusive lock (X) on a data item can both
read and write to it.
○ Only one transaction at a time can hold an exclusive lock on a data
item.
● Compatibility between lock modes is determined by a compatibility
function. If a mode A is compatible with mode B, it means a transaction
requesting mode A can be granted the lock even if another transaction
holds mode B on the same data item.
● The compatibility matrix for shared and exclusive locks is shown below.
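The standard compatibility matrix for these two modes is (true means the requested lock can be granted while the other is held):

            S        X
    S       true     false
    X       false    false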
● Transactions:
○ A transaction requests a lock in an appropriate mode (either shared
or exclusive) on a data item before it can proceed with read or
write operations.
○ If a data item is already locked in an incompatible mode, the
requesting transaction is made to wait until the incompatible locks
are released.
○ Transactions must hold a lock on a data item as long as they access
that item to ensure data consistency and isolation.
○ Transactions can unlock a data item when they no longer need it.
● Locks are used to control access to data items, ensuring data consistency.
● Locking Strategies:
○ Unlocking can be delayed until the end of a transaction to avoid inconsistencies.
○ Locks are granted by a concurrency-control manager.
○ The exact timing of lock grants is not critical.
● Deadlock occurs when transactions are waiting for each other's locks,
leading to a standstill. Deadlocks require a system to roll back one of the
transactions to resolve the situation.
● Deadlocks, while undesirable, are preferable to inconsistent states. This is
because deadlocks can be resolved by rolling back transactions, while
inconsistent states can cause real-world problems.
● Locking protocols define rules for when transactions can lock and unlock
data items.
● Locking protocols restrict the possible schedules of transactions.
● Conflict-serializable schedules are schedules that are equivalent, with respect to their conflicting operations, to some serial schedule, and therefore preserve isolation.
● "Ti → Tj" denotes that transaction Ti precedes transaction Tj in a
schedule due to locking.
● Precedence is established when Ti holds a lock on a data item, and Tj
subsequently holds a different type of lock on the same data item.
● A schedule is considered legal under a locking protocol if it adheres to
the rules specified by the protocol.
● A locking protocol ensures conflict serializability if all legal schedules
produced by the protocol are conflict serializable.
● A schedule is conflict serializable exactly when the precedence relation (Ti → Tj) it induces is acyclic.
Granting of lock
● Transactions can request locks on data items in different modes, such as
shared-mode (read) or exclusive-mode (write). Locks requested in
conflicting modes cannot coexist. For example, an exclusive-mode lock
conflicts with a shared-mode lock.
● To grant a lock to a transaction Ti, certain conditions must be met:
○ No other transaction should currently hold a conflicting lock on the
same data item Q.
○ No other transaction that requested a lock on data item Q before Ti
should be waiting for that lock.
● Locks can be granted when there is no conflict between the requested
mode M and the existing locks. For instance, a shared-mode lock can be
granted as long as there are no exclusive-mode locks in place.
● The goal is to prevent transactions from waiting indefinitely and
potentially starving. Lock requests should not be blocked by lock requests
made after them. This means that older lock requests are given priority.
● When a transaction is continually blocked from acquiring a lock due to
newer transactions continually being granted the lock, it can be
considered starved. This situation can lead to a lack of progress for the
starved transaction.
● Transactions should release their locks as soon as they no longer need
them to allow other transactions to progress.
● This lock granting strategy ensures fairness by prioritizing older lock
requests, preventing newer requests from always taking precedence. This
way, all transactions have a chance to make progress.
● The system's concurrency-control manager or scheduler is responsible for
enforcing these rules and making decisions about when to grant or deny
lock requests.
● While this strategy prevents starvation, it's essential to strike a balance
between fairness and system performance. Allowing older transactions to
have absolute priority may not always be the most efficient approach.
● To enforce these rules effectively, the system may use queues to track and
manage lock requests, ensuring that older requests are processed before
newer ones.
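A minimal sketch (not how any particular DBMS implements it) of a lock table that grants shared/exclusive locks in FIFO order, so that older requests are never overtaken by newer ones:

    from collections import deque

    # Compatibility: only S with S is compatible.
    COMPATIBLE = {("S", "S"): True, ("S", "X"): False,
                  ("X", "S"): False, ("X", "X"): False}

    class LockTable:
        def __init__(self):
            self.granted = {}   # data item -> list of (txn, mode) currently holding locks
            self.waiting = {}   # data item -> FIFO queue of (txn, mode) waiting

        def request(self, txn, item, mode):
            held = self.granted.setdefault(item, [])
            queue = self.waiting.setdefault(item, deque())
            # Grant only if compatible with every held lock AND nobody is already
            # waiting (earlier requests must not be overtaken, preventing starvation).
            if not queue and all(COMPATIBLE[(mode, m)] for _, m in held):
                held.append((txn, mode))
                return "granted"
            queue.append((txn, mode))
            return "waiting"

        def release(self, txn, item):
            held = self.granted.get(item, [])
            held[:] = [(t, m) for t, m in held if t != txn]
            queue = self.waiting.get(item, deque())
            # Wake waiters strictly in FIFO order while their modes stay compatible.
            while queue:
                t, m = queue[0]
                if all(COMPATIBLE[(m, hm)] for _, hm in held):
                    held.append(queue.popleft())
                else:
                    break

Waking waiters strictly in queue order lets a batch of compatible shared requests proceed together while still guaranteeing that an exclusive request at the head of the queue is not starved.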
Deadlock Prevention
● Two approaches: preventing cyclic waits and using preemption with
transaction rollbacks.
● Cyclic waits can be prevented by requiring transactions to lock all their
data items before execution, which can lead to low data item utilization.
● Alternatively, a total order of data items and two-phase locking can be
used.
● Preemption approach uses timestamps to control locking and decide
whether to wait or rollback.
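One standard timestamp-based preemption scheme is wait-die. A minimal sketch of its decision rule (assuming a smaller timestamp means an older transaction):

    # Wait-die: an older transaction may wait for a younger one; a younger
    # transaction that requests a lock held by an older one is rolled back
    # ("dies") and is later restarted with its original timestamp.
    def wait_die(requester_ts, holder_ts):
        if requester_ts < holder_ts:    # requester is older than the holder
            return "wait"
        return "rollback"

Because a transaction keeps its original timestamp across restarts, it eventually becomes the oldest active transaction and can no longer be preempted, so the scheme avoids starvation.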
Deadlock Detection
1. Selection of a Victim
   a. When a deadlock is detected, the system must decide which transaction(s) to roll back.
   b. The goal is to minimize the cost associated with the rollback.
   c. Factors affecting the cost of a rollback include:
      - How far the transaction has progressed in its execution.
      - The number of data items the transaction has used.
      - The number of data items needed for the transaction to complete.
      - The number of transactions involved in the rollback.
2. Rollback
a. Once a victim transaction is selected, the system must determine
how far to roll it back.
b. Total rollback involves aborting the transaction and restarting it,
but partial rollback is more efficient.
c. Partial rollback requires the system to maintain information about
the state of all running transactions.
d. The deadlock detection mechanism decides which locks the
selected transaction must release to break the deadlock.
e. The transaction is rolled back to the point where it obtained the
first of these locks, undoing all actions taken after that point.
f. The recovery mechanism must support partial rollbacks, and
transactions must be able to resume execution after such a rollback.
3. Starvation
a. In a system where victim selection is primarily based on cost
factors, the same transaction may be repeatedly chosen as a victim.
b. This can lead to starvation, where the transaction is never able to
complete its designated task.
c. To mitigate starvation, it's important to ensure that a transaction
can only be selected as a victim a finite number of times.
d. One common approach is to include the number of rollbacks in the
cost factor when selecting victims.
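Deadlocks are typically detected by maintaining a wait-for graph and periodically checking it for cycles. A minimal Python sketch (the graph is assumed to be given as adjacency lists of transaction IDs):

    # Edge Ti -> Tj means Ti is waiting for a lock held by Tj.
    def has_deadlock(wait_for):
        WHITE, GRAY, BLACK = 0, 1, 2
        color = {t: WHITE for t in wait_for}

        def visit(t):
            color[t] = GRAY
            for u in wait_for.get(t, []):
                if color.get(u, WHITE) == GRAY:
                    return True                  # back edge: a cycle (deadlock) exists
                if color.get(u, WHITE) == WHITE and visit(u):
                    return True
            color[t] = BLACK
            return False

        return any(color[t] == WHITE and visit(t) for t in wait_for)

    print(has_deadlock({"T1": ["T2"], "T2": ["T3"], "T3": ["T1"]}))   # True
    print(has_deadlock({"T1": ["T2"], "T2": [], "T3": ["T2"]}))       # False

Any transaction on the detected cycle is a candidate victim; the cost factors listed above guide which one is actually chosen.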
Validation-Based Protocols
● Validation-Based Protocols are introduced as an alternative to traditional
concurrency control schemes, particularly in cases where a majority of
transactions are read-only, resulting in fewer conflicts.
● These protocols aim to minimise the overhead imposed by concurrency
control and transaction delays, especially in scenarios where many
transactions can be executed without supervision and still maintain a
consistent system state.
● The main challenge is that it's difficult to predict in advance which
transactions will conflict with each other, requiring a system monitoring
scheme to gain such knowledge.
● The validation protocol consists of three distinct phases for each
transaction (Ti):
○ Read phase: During this phase, Ti executes and reads values from
data items, storing them in local variables. It performs all write
operations on temporary local variables without affecting the actual
database.
○ Validation phase: A validation test is applied to Ti during this phase
to determine whether it can proceed to the write phase without
violating serializability. If Ti fails the validation test, it is aborted.
○ Write phase: If Ti passes the validation test, the temporary local
variables containing the results of any write operations are copied
to the actual database. Read-only transactions skip this phase.
● Transactions must follow the prescribed order of these phases, but
concurrent transactions can interleave their phases.
● To conduct the validation test, each transaction is associated with three
timestamps:
○ StartTS(Ti): The time when Ti begins its execution.
○ ValidationTS(Ti): The time when Ti completes its read phase and
enters the validation phase.
○ FinishTS(Ti): The time when Ti completes its write phase.
● These timestamps help in determining when each phase of a transaction
occurs and are crucial for applying the validation test to ensure
serializability.
● The serializability order in the validation-based protocol is determined
using the timestamp ValidationTS(Ti).
● To ensure serializability, for every transaction Tk with TS(Tk) < TS(Ti), one of the following two conditions must hold:
○ FinishTS(Tk) < StartTS(Ti). This condition ensures that Tk finishes execution before Ti starts, preserving the serializability order.
○ The set of data items written by Tk does not intersect with the set of data items read by Ti, and Tk completes its write phase before Ti starts its validation phase (StartTS(Ti) < FinishTS(Tk) < ValidationTS(Ti)). This condition ensures that the writes of Tk do not affect the reads of Ti and maintains the serializability order.
● The validation scheme automatically prevents cascading rollbacks, as
actual writes occur only after a transaction commits.
● Starvation of long transactions can occur due to repeated restarts caused
by short conflicting transactions. To prevent starvation, conflicting
transactions should be temporarily blocked to allow long transactions to
complete.
● The validation conditions need only be applied to transactions Tk that finished after Ti started and are serialized before Ti. Transactions that finished before Ti started, or those serialized after Ti, can be ignored in the validation tests.
● This validation scheme is called an optimistic concurrency-control
scheme because transactions execute optimistically, assuming they will
complete and validate at the end, as opposed to locking and
timestamp-ordering schemes, which are pessimistic and force waits or
rollbacks upon conflict detection.
● Using TS(Ti) = StartTS(Ti) instead of ValidationTS(Ti) would result in a
situation where a transaction Ti enters the validation phase before another
transaction Tj with TS(Tj) < TS(Ti). This could cause a delay in Ti's
validation, as it would have to wait for Tj to complete. Using
ValidationTS avoids this problem.
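A minimal sketch of the validation test described above (each transaction record is assumed to carry its timestamps and its read/write sets):

    # Validate Ti against every transaction Tk that is serialized before it
    # (e.g., every Tk with ValidationTS(Tk) < ValidationTS(Ti)).
    def validate(Ti, earlier):
        for Tk in earlier:
            if Tk["finish_ts"] < Ti["start_ts"]:
                continue          # Tk finished before Ti started: no conflict possible
            if (not (Tk["write_set"] & Ti["read_set"])
                    and Tk["finish_ts"] < Ti["validation_ts"]):
                continue          # Tk's writes do not touch Ti's reads
            return False          # neither condition holds: Ti must be aborted
        return True

If validate() returns True, Ti proceeds to its write phase; otherwise it is rolled back and restarted.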
RECOVERY SYSTEM
A computer system, like any other device, is subject to failure from a variety of
causes such as disk crashes, power outages, software errors, a fire in the
machine room, and even sabotage.
In any failure, information may be lost. Therefore, the database system must
take action in advance to preserve the atomicity and durability properties of
transactions.
An integral part of a database system is a recovery scheme that can restore the
database to the consistent state that existed before the failure.
The recovery scheme must support high availability by keeping a synchronized
backup copy of the database for use in case of machine failure or maintenance.
FAILURE CLASSIFICATION
There are various types of failure that may occur in a system, each of which
needs to be
dealt with in a different manner.
TRANSACTION FAILURE
1) Transaction Failure:
There are two types of errors that may cause a transaction to fail:
● Logical error
● System error
Logical error: The transaction can no longer continue with its normal
execution
because of some internal condition, such as bad input, data not found, overflow,
or resource limit exceeded.
System error: The system has entered an undesirable state (e.g., deadlock), as
a result of which a transaction cannot continue with its normal execution. The
transaction, however, can be re-executed at a later time.
SYSTEM CRASH
System crash: A power failure or other hardware or software failure causes the
system to crash.
There are three potential causes for the loss of volatile storage and interruption
of transaction processing: hardware malfunction, database software bug, or
operating system issue. However, the content of non-volatile storage remains
unaffected.
● Fail-stop assumption: The assumption that hardware errors and bugs in
the software bring the system to a halt, but do not corrupt the non-volatile
storage contents, is known as the fail-stop assumption.
Well-designed systems have numerous internal checks, at the hardware and the
software level, that bring the system to a halt when there is an error.
Hence, the fail-stop assumption is a reasonable one.
DISK FAILURE
Disk failure:
A head crash or similar disk failure destroys all or part of the disk
storage
Stable-Storage Implementation
Note: Refer to the textbook for more details on how to implement stable
storage
Data Access
We assume, for simplicity, that each data item fits in, and is stored inside
a single block.
Each transaction Ti has its private work area in which local copies of all
data items accessed and updated by it are kept.
Ti's local copy of a data item X is called xi.
Transferring data items between system buffer blocks and their private work area is done by:
read(X) assigns the value of data item X to the local variable xi.
write(X) assigns the value of local variable xi to data item {X} in the
buffer block.
Note: output(BX) need not immediately follow write(X). The system can
perform the output operation when it deems fit.
Transactions
Must perform read(X) before accessing X for the first time (subsequent
reads can be from local copy)
write(X) can be executed at any time before the transaction commits
Example of Data Access
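A rough Python sketch of this read/write/output flow, simplified so that each data item occupies its own block (the initial values are made up):

    buffer = {}                    # in-memory buffer blocks: block -> value
    disk = {"X": 100, "Y": 50}     # hypothetical on-disk values
    local = {}                     # transaction Ti's private work area (xi, yi, ...)

    def input_block(B):
        buffer[B] = disk[B]        # bring the block from disk into the buffer

    def read(X):
        if X not in buffer:
            input_block(X)
        local[X] = buffer[X]       # read(X): buffer block -> local copy xi

    def write(X):
        if X not in buffer:
            input_block(X)
        buffer[X] = local[X]       # write(X): local copy xi -> buffer block

    def output(B):
        disk[B] = buffer[B]        # output(B): buffer block -> disk, possibly much later

    read("X")
    local["X"] -= 50               # compute in the private work area
    write("X")                     # the buffer is updated; the disk is not yet touched
    output("X")                    # the system performs the output when it deems fit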
Recovery Atomicity, Recovery Algorithm
Database Modification
As we noted earlier, a transaction creates a log record prior to modifying
the database.
The log records allow the system to undo changes made by a
transaction in the event that the transaction must be aborted; they allow
the system also to redo changes made by a transaction if the transaction
has committed but the system crashed before those changes could be
stored in the database on disk. In order for us to understand the role of
these log records in recovery, we need to consider the steps a
transaction takes in modifying a data item:
1. The transaction performs some computations in its own private part of
main memory.
2. The transaction modifies the data block in the disk buffer in main
memory holding the data item.
3. The database system executes the output operation that writes the
data block to disk.
Immediate Database Modification
The immediate-modification scheme allows updates of an uncommitted
transaction to be made to the buffer, or the disk itself before the
transaction commits
Update log record must be written before the database item is written
We assume that the log record is output directly to stable storage
(We will see later how to postpone log record output to some extent.)
Output of updated blocks to disk can take place at any time before or
after the transaction commit
Order in which blocks are output can be different from the order in which
they are written.
The deferred-modification scheme performs updates to buffer/disk only
at the time of transaction commit
Simplifies some aspects of recovery
But has overhead of storing a local copy
Transaction Commit
Checkpoints
When a system crash occurs, we must consult the log to determine
those transactions that need to be redone and those that need to be
undone. In principle, we need to search the entire log to determine this
information. There are two major difficulties with this approach:
1. The search process is time-consuming.
2. Most of the transactions that, according to our algorithm, need
to be redone have
already written their updates into the database. Although redoing them
will cause no harm, it will nevertheless cause recovery to take longer.
To reduce these types of overhead, we introduce checkpoints.
Example of Checkpoints
T1 can be ignored (updates already output to disk due to checkpoint)
T2 and T3 redone.
T4 undone.
Recovery Algorithm
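The full algorithm also handles checkpoints and logs its undo actions; the following is only a minimal Python sketch of the core redo-then-undo idea, under the simplifying assumption that each log record carries both the old and the new value:

    # Log records: ("start", T), ("update", T, item, old, new), ("commit", T).
    def recover(log, db):
        committed, started = set(), set()
        # Redo phase: replay every update in log order.
        for rec in log:
            if rec[0] == "start":
                started.add(rec[1])
            elif rec[0] == "update":
                _, T, item, old, new = rec
                db[item] = new
            elif rec[0] == "commit":
                committed.add(rec[1])
        # Undo phase: roll back, in reverse log order, the updates of
        # transactions that started but never committed.
        for rec in reversed(log):
            if rec[0] == "update" and rec[1] in (started - committed):
                _, T, item, old, new = rec
                db[item] = old
        return db

    log = [("start", "T1"), ("update", "T1", "A", 100, 50), ("commit", "T1"),
           ("start", "T2"), ("update", "T2", "B", 200, 300)]   # T2 never committed
    print(recover(log, {"A": 100, "B": 200}))                  # {'A': 50, 'B': 200}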
Database Security:
Secrecy: Users should not be able to see things they are not supposed
to.
•E.g: A student can’t see other students’ grades.
Integrity: Users should not be able to modify things they are not
supposed to.
•E.g., Only instructors can assign grades.
Availability: Users should be able to see and modify things they are
allowed to.
Threats to Database:
Loss of integrity: Database integrity refers to the requirement
that information be protected from improper modification.
Modification of data includes creating, inserting, and updating
data; changing the status of data; and deleting data. Integrity is
lost if unauthorized changes are made to the data by either
intentional or accidental acts.
Loss of availability: Loss of availability occurs when users or programs cannot access the database objects they are entitled to access.
Loss of confidentiality: Database confidentiality refers to the
protection of data from unauthorized disclosure.
Control Measures:
To protect databases against the threats discussed above, it is common to implement four kinds of control measures: access control, inference control, flow control, and encryption.
i) Access Control: Restricting access to the database to authorized users only. Access control is done by creating user accounts and controlling the login process in the DBMS.
Access Control:
A security policy specifies who is authorized to do what.
A security mechanism allows us to enforce a chosen security policy.
Two main mechanisms at the DBMS level:
Discretionary access control
Mandatory access control
Grant Permission:
GRANT privileges ON object TO users [WITH GRANT
OPTION]
The following privileges can be specified:
SELECT: Can read all columns (including those added later via
ALTER TABLE command).
INSERT(col-name): Can insert tuples with non-null or non-default values in this column. INSERT without a column list means the same right with respect to all columns.
DELETE: Can delete tuples.
REFERENCES (col-name): Can define foreign keys (in other
tables) that refer to this column.
Revoke:
The REVOKE command is used to revoke permissions previously granted to users with the GRANT command.
Syntax: