Design theory:
- When a project is deleted, all the employees who work on that project are deleted as well.
- Alternatively, if an employee is the sole employee on a project, deleting that employee also deletes the corresponding project.
Using the two relations EMP_PROJ1 and EMP_LOCS as the base relations of EMP_PROJ is not a good schema design.
The problem is that performing a natural join on these two relations produces more tuples than the original set of tuples in EMP_PROJ.
- The additional tuples that were not in EMP_PROJ are called spurious tuples, because they represent spurious or wrong information that is not valid.
- This is because the PLOCATION attribute, which is used for joining, is neither a primary key nor a foreign key in either EMP_LOCS or EMP_PROJ1.
The example below shows the spurious tuples that can be generated if the relations are joined improperly.
The method for designing a relational database is to use a process commonly known
as normalization.
The goal is to generate a set of relation schemas that allows us to store information
without unnecessary redundancy, yet also allows us to retrieve information easily.
e.g., based on functional dependencies and multivalued dependencies.
Decomposition:
The only way to avoid the repetition-of-information problem in the in_dep schema is to decompose it into two schemas – the instructor and department schemas.
Not all decompositions are good. Suppose we decompose the employee schema into two schemas that share only the employee's name.
The problem arises when we have two employees with the same name.
The next slide shows how we lose information -- we cannot reconstruct the original employee relation -- and so, this is a lossy decomposition.
Minimal Cover:
A minimal cover of a set of functional dependencies F satisfies the following conditions:
1. Every dependency in F has a single attribute on its right-hand side.
2. We cannot replace any dependency X -> A in F with a dependency Y -> A, where Y is a proper subset of X, and still have a set of dependencies that is equivalent to F.
3. We cannot remove any dependency from F and still have a set of dependencies that is equivalent to F (removing the redundancies by dropping any dependency that can be inferred from the remaining FDs in F).
Functional dependencies:
A functional dependency (FD) is a relationship between two attributes (or sets of attributes) in which the value of one can be derived, directly or indirectly, from the other.
● FDs are used to specify formal measures of the "goodness" of relational designs.
● FDs and keys are used to define normal forms for relations.
● FDs are constraints that are derived from the meaning and interrelationships of the data attributes.
X → Y holds if whenever two tuples have the same value for X, they must have the same
value for Y. For any two tuples t1 and t2 in any relation instance r(R): If t1[X]=t2[X], then
t1[Y]=t2[Y]
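This definition can be checked mechanically on a concrete relation instance. Below is a minimal Python sketch (the relation instance and attribute names are made up for illustration):

    # Check whether X -> Y holds in a relation instance r,
    # following the definition: if t1[X] = t2[X] then t1[Y] = t2[Y].
    def fd_holds(r, X, Y):
        seen = {}  # value of t[X] -> value of t[Y] seen so far
        for t in r:
            x_val = tuple(t[a] for a in X)
            y_val = tuple(t[a] for a in Y)
            if x_val in seen and seen[x_val] != y_val:
                return False
            seen[x_val] = y_val
        return True

    # Hypothetical instance: SSN -> ENAME holds, ENAME -> SSN does not.
    r = [
        {"SSN": 1, "ENAME": "Smith", "PNUMBER": 10},
        {"SSN": 1, "ENAME": "Smith", "PNUMBER": 20},
        {"SSN": 2, "ENAME": "Smith", "PNUMBER": 30},
    ]
    print(fd_holds(r, ["SSN"], ["ENAME"]))    # True
    print(fd_holds(r, ["ENAME"], ["SSN"]))    # False

Note that such a check can only refute an FD on a given instance; an FD is a constraint on all legal instances, not a property derived from one instance.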
However, the FDs Teacher → Course, Teacher → Text, and Course → Text are ruled out.
(This is because there exist tuples that agree on the left-hand-side attribute but differ on the right-hand-side attribute.)
Decomposition:
The flaw in this decomposition arises from the possibility that the enterprise has two
employees with the same name.
Inference rules:
Def: An FD X → Y is inferred from or implied by a set of dependencies F specified on R if X
→ Y holds in every legal relation state r of R; that is, whenever r satisfies all the
dependencies in F, X → Y also holds in r.
Additional Rules:
Examples:
Consider the relation schema <R, F> where R = (A, B, C, D, E, G, H, I) and the set of dependencies F = { AB -> E, AG -> J, BE -> I, E -> G, GI -> H }. Show that AB -> GH is derived by F.

Step  Statement   Explanation
1     AB -> E     Given
2     E -> G      Given
3     BE -> I     Given
4     GI -> H     Given
5     AB -> G     Transitivity on (1) and (2)
6     AB -> BE    Augmentation of (1) by B
7     AB -> I     Transitivity on (6) and (3)
8     AB -> GI    Union of (5) and (7)
9     AB -> H     Transitivity on (8) and (4)
10    AB -> GH    Union of (5) and (9)
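The same result can be cross-checked by computing the attribute closure {A, B}+ under F. A minimal Python sketch (FDs represented as pairs of attribute sets):

    # Compute the closure X+ of a set of attributes X under a set of FDs.
    def closure(X, fds):
        result = set(X)
        changed = True
        while changed:
            changed = False
            for lhs, rhs in fds:
                if lhs <= result and not rhs <= result:
                    result |= rhs
                    changed = True
        return result

    F = [({"A", "B"}, {"E"}), ({"A", "G"}, {"J"}), ({"B", "E"}, {"I"}),
         ({"E"}, {"G"}), ({"G", "I"}, {"H"})]

    print(closure({"A", "B"}, F))                 # contains G and H (among others)
    print({"G", "H"} <= closure({"A", "B"}, F))   # True, so AB -> GH is implied by F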
Equivalence:
Two sets of FDs F and G are equivalent if F covers G (every FD in G can be inferred from F) and G covers F. To check whether F covers G, compute the closure X+ under F for each FD X -> Y in G and verify that Y is contained in X+.
Examples:
F (set of functional dependencies):
A -> B
B -> C
G (set of functional dependencies):
B -> C
C -> A
In this example, let's check if F and G are equivalent:
Does F cover G?
B -> C: B+ under F = {B, C}, which includes C.
C -> A: C+ under F = {C}, which does not include A.
Does G cover F?
A -> B: A+ under G = {A}, which does not include B.
So neither set covers the other, and F and G are not equivalent.
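The covers/equivalence test can be written directly in terms of attribute closures. A small sketch, reusing the closure() helper from the previous example:

    # F covers G if every FD in G follows from F (checked via closures under F).
    def covers(F, G):
        return all(rhs <= closure(lhs, F) for lhs, rhs in G)

    def equivalent(F, G):
        return covers(F, G) and covers(G, F)

    F = [({"A"}, {"B"}), ({"B"}, {"C"})]
    G = [({"B"}, {"C"}), ({"C"}, {"A"})]
    print(equivalent(F, G))   # False for the sets above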
Closure:
Dependency Preservation:
Testing functional dependency constraints each time the database is updated can be costly. It is useful to design the database in a way that constraints can be tested efficiently. If testing a functional dependency can be done by considering just one relation, then the cost of testing this constraint is low.
Example:
Consider a schema:
dept_advisor(s_ID, i_ID, dept_name)
With the functional dependencies:
i_ID → dept_name
s_ID, dept_name → i_ID
In the above design we are forced to repeat the department name once for each time an instructor participates in a dept_advisor relationship.
To fix this, we need to decompose dept_advisor.
However, any decomposition will fail to include all the attributes in
s_ID, dept_name → i_ID
Thus, the decomposition will NOT be dependency preserving (see the sketch below).
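A minimal sketch of how dependency preservation can be tested against a decomposition, assuming the usual BCNF-style split of dept_advisor into (s_ID, i_ID) and (i_ID, dept_name), and reusing the closure() helper from above:

    # Standard restriction-closure test: FD alpha -> beta is preserved by the
    # decomposition if beta ends up in the closure computed only through the
    # fragments Ri.
    def preserved(alpha, beta, F, decomposition):
        result = set(alpha)
        changed = True
        while changed:
            changed = False
            for Ri in decomposition:
                t = closure(result & Ri, F) & Ri
                if not t <= result:
                    result |= t
                    changed = True
        return beta <= result

    F = [({"i_ID"}, {"dept_name"}), ({"s_ID", "dept_name"}, {"i_ID"})]
    decomposition = [{"s_ID", "i_ID"}, {"i_ID", "dept_name"}]

    print(preserved({"i_ID"}, {"dept_name"}, F, decomposition))           # True
    print(preserved({"s_ID", "dept_name"}, {"i_ID"}, F, decomposition))   # False: not preserved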
1NF, 2NF, 3NF-Comparison:
Normalization of Relations:
2NF, 3NF, and BCNF are based on the keys and FDs of a relation schema.
4NF is based on keys and multivalued dependencies (MVDs).
5NF is based on keys and join dependencies (JDs).
Examples:
{SSN, PNUMBER} -> HOURS is a full FD since neither SSN
-> HOURS nor PNUMBER -> HOURS hold.
{SSN, PNUMBER} -> ENAME is not a full FD (it is called a partial dependency) since SSN -> ENAME also holds.
Definition:
Transitive functional dependency: an FD X -> Z that can be derived from two FDs X -> Y and Y -> Z.
Examples:
SSN -> DMGRSSN is a transitive FD, since SSN -> DNUMBER and DNUMBER -> DMGRSSN hold.
SSN -> ENAME is non-transitive, since there is no set of attributes X where SSN -> X and X -> ENAME.
Chapter Summary:
Remember the following diagram, which implies:
A relation in BCNF will surely be in all other normal forms.
A relation in 3NF will surely be in 2NF and 1NF.
A relation in 2NF will surely be in 1NF.
Transaction Concept
▪ A transaction is a unit of program execution that accesses and
possibly updates various data items.
ACID Properties
❖ Atomicity requirement
➢ If the transaction fails after step 3 and before step 6, money
will be “lost” leading to an inconsistent database state
■ Failure could be due to software or hardware
➢ The system should ensure that updates of a partially
executed transaction are not reflected in the database
❖ Durability: Once the user has been notified that the transaction
has completed (i.e., the transfer of the $50 has taken place), the
updates to the database by the transaction must persist even if
there are software or hardware failures.
❖ Consistency:
➢ The sum of A and B is unchanged by the execution of the
transaction
In general, consistency requirements include
➢ Explicitly specified integrity constraints such as primary keys
and foreign keys
➢ Implicit integrity constraints
➢ A transaction must see a consistent database.
➢ During transaction execution the database may be
temporarily inconsistent.
➢ When the transaction completes successfully the database
must be consistent
T1                                   T2
1. read(A)
2. A := A – 50
3. write(A)
                                     read(A), read(B), print(A+B)
4. read(B)
5. B := B + 50
6. write(B)
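A tiny Python sketch of why this interleaving is a problem (hypothetical balances A = 1000, B = 2000): T2 sees A after the debit but B before the credit, so the total it prints is $50 too low.

    # Simulate the interleaving shown above.
    A, B = 1000, 2000        # hypothetical initial balances; invariant: A + B = 3000

    # T1, steps 1-3: debit A
    a = A                    # read(A)
    a = a - 50               # A := A - 50
    A = a                    # write(A)

    # T2 runs here and sees the debited A but the un-credited B.
    print(A + B)             # 2950, not 3000

    # T1, steps 4-6: credit B
    b = B                    # read(B)
    b = b + 50               # B := B + 50
    B = b                    # write(B)

    print(A + B)             # 3000 again once T1 has completed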
Storage Structure
Transaction State
• Active – the initial state; the transaction stays in this state while it
is executing
• Partially committed – after the final statement has been
executed.
• Failed -- after the discovery that normal execution can no longer
proceed.
• Aborted – after the transaction has been rolled back and the
database restored to its state prior to the start of the transaction.
Two options after it has been aborted:
• Restart the transaction
• Can be done only if no internal logical error
• Kill the transaction
• Committed – after successful completion.
• The system can restart the transaction, but only if the transaction was aborted as a result of some hardware or software error that was not created through the internal logic of the transaction. A restarted transaction is considered to be a new transaction.
Transaction Isolation
Concurrent Executions
Schedules
Conflicting Instructions
Conflict Serializability
View Serializability
Concurrency Control
Recoverable Schedules
Cascading Rollbacks
Cascadeless Schedule
▪ Cascadeless schedules — cascading rollbacks cannot occur;
• For each pair of transactions Ti and Tj such that Tj reads a
data item previously written by Ti, the commit operation of Ti
appears before the read operation of Tj.
▪ Serializable — default
▪ Locking
• Lock on whole database vs lock on items
• How long to hold lock?
• Shared vs exclusive locks
▪ Timestamps
• Transaction timestamp assigned e.g. when a transaction
begins
• Data items store two timestamps
▪ Read timestamp
▪ Write timestamp
• Timestamps are used to detect out of order accesses
▪ E.g., Transaction 1:
select ID, name from instructor where salary > 90000
▪ E.g., Transaction 2:
insert into instructor values ('11111', 'James', 'Marketing',
100000)
▪ Suppose
• T1 starts, finds tuples salary > 90000 using index and locks
them
• And then T2 executes.
• Do T1 and T2 conflict? Does tuple level locking detect the
conflict?
• Instance of the phantom phenomenon
Lock-Based Protocols
● Shared Lock (S):
○ A transaction holding a shared lock (S) on a data item can read the
data but cannot write to it.
○ Multiple transactions can hold shared locks on the same data item
concurrently for reading.
● Exclusive Lock (X):
○ A transaction holding an exclusive lock (X) on a data item can both
read and write to it.
○ Only one transaction at a time can hold an exclusive lock on a data
item.
● Compatibility between lock modes is determined by a compatibility
function. If a mode A is compatible with mode B, it means a transaction
requesting mode A can be granted the lock even if another transaction
holds mode B on the same data item.
● The compatibility matrix for shared and exclusive locks is shown below.
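The standard compatibility matrix for these two modes is (true means the requested lock can be granted while the other is held):

            S        X
    S       true     false
    X       false    false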
● Transactions:
○ A transaction requests a lock in an appropriate mode (either shared
or exclusive) on a data item before it can proceed with read or
write operations.
○ If a data item is already locked in an incompatible mode, the
requesting transaction is made to wait until the incompatible locks
are released.
○ Transactions must hold a lock on a data item as long as they access
that item to ensure data consistency and isolation.
○ Transactions can unlock a data item when they no longer need it.
● Locks are used to control access to data items, ensuring data consistency.
● Locking Strategies:
○ Unlocking can be delayed until the end of a transaction to avoid inconsistencies.
○ Locks are granted by a concurrency-control manager.
○ The exact timing of lock grants is not critical.
● Deadlock occurs when transactions are waiting for each other's locks,
leading to a standstill. Deadlocks require a system to roll back one of the
transactions to resolve the situation.
● Deadlocks, while undesirable, are preferable to inconsistent states. This is
because deadlocks can be resolved by rolling back transactions, while
inconsistent states can cause real-world problems.
● Locking protocols define rules for when transactions can lock and unlock
data items.
● Locking protocols restrict the possible schedules of transactions.
● Conflict-serializable schedules are schedules that are equivalent, with respect to their conflicting operations, to some serial schedule, and therefore preserve isolation.
● "Ti → Tj" denotes that transaction Ti precedes transaction Tj in a
schedule due to locking.
● Precedence is established when Ti holds a lock on a data item, and Tj
subsequently holds a different type of lock on the same data item.
● A schedule is considered legal under a locking protocol if it adheres to
the rules specified by the protocol.
● A locking protocol ensures conflict serializability if all legal schedules
produced by the protocol are conflict serializable.
● A schedule is conflict serializable exactly when the precedence relation (Ti → Tj) it induces is acyclic.
Granting of lock
● Transactions can request locks on data items in different modes, such as
shared-mode (read) or exclusive-mode (write). Locks requested in
conflicting modes cannot coexist. For example, an exclusive-mode lock
conflicts with a shared-mode lock.
● To grant a lock to a transaction Ti, certain conditions must be met:
○ No other transaction should currently hold a conflicting lock on the
same data item Q.
○ No other transaction that requested a lock on data item Q before Ti
should be waiting for that lock.
● Locks can be granted when there is no conflict between the requested
mode M and the existing locks. For instance, a shared-mode lock can be
granted as long as there are no exclusive-mode locks in place.
● The goal is to prevent transactions from waiting indefinitely and
potentially starving. Lock requests should not be blocked by lock requests
made after them. This means that older lock requests are given priority.
● When a transaction is continually blocked from acquiring a lock due to
newer transactions continually being granted the lock, it can be
considered starved. This situation can lead to a lack of progress for the
starved transaction.
● Transactions should release their locks as soon as they no longer need
them to allow other transactions to progress.
● This lock granting strategy ensures fairness by prioritizing older lock
requests, preventing newer requests from always taking precedence. This
way, all transactions have a chance to make progress.
● The system's concurrency-control manager or scheduler is responsible for
enforcing these rules and making decisions about when to grant or deny
lock requests.
● While this strategy prevents starvation, it's essential to strike a balance
between fairness and system performance. Allowing older transactions to
have absolute priority may not always be the most efficient approach.
● To enforce these rules effectively, the system may use queues to track and
manage lock requests, ensuring that older requests are processed before
newer ones.
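A minimal sketch (not how any particular DBMS implements it) of a lock table that grants shared/exclusive locks in FIFO order, so that older requests are never overtaken by newer ones:

    from collections import deque

    # Compatibility: only S with S is compatible.
    COMPATIBLE = {("S", "S"): True, ("S", "X"): False,
                  ("X", "S"): False, ("X", "X"): False}

    class LockTable:
        def __init__(self):
            self.granted = {}   # data item -> list of (txn, mode) currently holding locks
            self.waiting = {}   # data item -> FIFO queue of (txn, mode) waiting

        def request(self, txn, item, mode):
            held = self.granted.setdefault(item, [])
            queue = self.waiting.setdefault(item, deque())
            # Grant only if compatible with every held lock AND nobody is already
            # waiting (earlier requests must not be overtaken, preventing starvation).
            if not queue and all(COMPATIBLE[(mode, m)] for _, m in held):
                held.append((txn, mode))
                return "granted"
            queue.append((txn, mode))
            return "waiting"

        def release(self, txn, item):
            held = self.granted.get(item, [])
            held[:] = [(t, m) for t, m in held if t != txn]
            queue = self.waiting.get(item, deque())
            # Wake waiters strictly in FIFO order while their modes stay compatible.
            while queue:
                t, m = queue[0]
                if all(COMPATIBLE[(m, hm)] for _, hm in held):
                    held.append(queue.popleft())
                else:
                    break

Waking waiters strictly in queue order lets a batch of compatible shared requests proceed together while still guaranteeing that an exclusive request at the head of the queue is not starved.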
Deadlock Prevention
● Two approaches: preventing cyclic waits and using preemption with
transaction rollbacks.
● Cyclic waits can be prevented by requiring transactions to lock all their
data items before execution, which can lead to low data item utilization.
● Alternatively, a total order of data items and two-phase locking can be
used.
● Preemption approach uses timestamps to control locking and decide
whether to wait or rollback.
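One standard timestamp-based preemption scheme is wait-die. A minimal sketch of its decision rule (assuming a smaller timestamp means an older transaction):

    # Wait-die: an older transaction may wait for a younger one; a younger
    # transaction that requests a lock held by an older one is rolled back
    # ("dies") and is later restarted with its original timestamp.
    def wait_die(requester_ts, holder_ts):
        if requester_ts < holder_ts:    # requester is older than the holder
            return "wait"
        return "rollback"

Because a transaction keeps its original timestamp across restarts, it eventually becomes the oldest active transaction and can no longer be preempted, so the scheme avoids starvation.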
Deadlock Detection
1. Selection of a Victim
   a. When a deadlock is detected, the system must decide which transaction(s) to roll back.
   b. The goal is to minimize the cost associated with the rollback.
   c. Factors affecting the cost of a rollback include:
      - How far the transaction has progressed in its execution.
      - The number of data items the transaction has used.
      - The number of data items needed for the transaction to complete.
      - The number of transactions involved in the rollback.
2. Rollback
a. Once a victim transaction is selected, the system must determine
how far to roll it back.
b. Total rollback involves aborting the transaction and restarting it,
but partial rollback is more efficient.
c. Partial rollback requires the system to maintain information about
the state of all running transactions.
d. The deadlock detection mechanism decides which locks the
selected transaction must release to break the deadlock.
e. The transaction is rolled back to the point where it obtained the
first of these locks, undoing all actions taken after that point.
f. The recovery mechanism must support partial rollbacks, and
transactions must be able to resume execution after such a rollback.
3. Starvation
a. In a system where victim selection is primarily based on cost
factors, the same transaction may be repeatedly chosen as a victim.
b. This can lead to starvation, where the transaction is never able to
complete its designated task.
c. To mitigate starvation, it's important to ensure that a transaction
can only be selected as a victim a finite number of times.
d. One common approach is to include the number of rollbacks in the
cost factor when selecting victims.
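Deadlocks are typically detected by maintaining a wait-for graph and periodically checking it for cycles. A minimal Python sketch (the graph is assumed to be given as adjacency lists of transaction IDs):

    # Edge Ti -> Tj means Ti is waiting for a lock held by Tj.
    def has_deadlock(wait_for):
        WHITE, GRAY, BLACK = 0, 1, 2
        color = {t: WHITE for t in wait_for}

        def visit(t):
            color[t] = GRAY
            for u in wait_for.get(t, []):
                if color.get(u, WHITE) == GRAY:
                    return True                  # back edge: a cycle (deadlock) exists
                if color.get(u, WHITE) == WHITE and visit(u):
                    return True
            color[t] = BLACK
            return False

        return any(color[t] == WHITE and visit(t) for t in wait_for)

    print(has_deadlock({"T1": ["T2"], "T2": ["T3"], "T3": ["T1"]}))   # True
    print(has_deadlock({"T1": ["T2"], "T2": [], "T3": ["T2"]}))       # False

Any transaction on the detected cycle is a candidate victim; the cost factors listed above guide which one is actually chosen.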
Validation-Based Protocols
● Validation-Based Protocols are introduced as an alternative to traditional
concurrency control schemes, particularly in cases where a majority of
transactions are read-only, resulting in fewer conflicts.
● These protocols aim to minimise the overhead imposed by concurrency
control and transaction delays, especially in scenarios where many
transactions can be executed without supervision and still maintain a
consistent system state.
● The main challenge is that it's difficult to predict in advance which
transactions will conflict with each other, requiring a system monitoring
scheme to gain such knowledge.
● The validation protocol consists of three distinct phases for each
transaction (Ti):
○ Read phase: During this phase, Ti executes and reads values from
data items, storing them in local variables. It performs all write
operations on temporary local variables without affecting the actual
database.
○ Validation phase: A validation test is applied to Ti during this phase
to determine whether it can proceed to the write phase without
violating serializability. If Ti fails the validation test, it is aborted.
○ Write phase: If Ti passes the validation test, the temporary local
variables containing the results of any write operations are copied
to the actual database. Read-only transactions skip this phase.
● Transactions must follow the prescribed order of these phases, but
concurrent transactions can interleave their phases.
● To conduct the validation test, each transaction is associated with three
timestamps:
○ StartTS(Ti): The time when Ti begins its execution.
○ ValidationTS(Ti): The time when Ti completes its read phase and
enters the validation phase.
○ FinishTS(Ti): The time when Ti completes its write phase.
● These timestamps help in determining when each phase of a transaction
occurs and are crucial for applying the validation test to ensure
serializability.
● The serializability order in the validation-based protocol is determined
using the timestamp ValidationTS(Ti).
● To ensure serializability, for every transaction Tk with TS(Tk) < TS(Ti), one of the following two conditions must hold:
○ FinishTS(Tk) < StartTS(Ti). This condition ensures that Tk finishes execution before Ti starts, preserving the serializability order.
○ The set of data items written by Tk does not intersect with the set of data items read by Ti, and Tk completes its write phase before Ti starts its validation phase (StartTS(Ti) < FinishTS(Tk) < ValidationTS(Ti)). This condition ensures that the writes of Tk do not affect the reads of Ti and maintains the serializability order.
● The validation scheme automatically prevents cascading rollbacks, as
actual writes occur only after a transaction commits.
● Starvation of long transactions can occur due to repeated restarts caused
by short conflicting transactions. To prevent starvation, conflicting
transactions should be temporarily blocked to allow long transactions to
complete.
● The validation conditions need only be applied to transactions Tk that finished after Ti started and are serialized before Ti. Transactions that finished before Ti started, or those serialized after Ti, can be ignored in the validation tests.
● This validation scheme is called an optimistic concurrency-control
scheme because transactions execute optimistically, assuming they will
complete and validate at the end, as opposed to locking and
timestamp-ordering schemes, which are pessimistic and force waits or
rollbacks upon conflict detection.
● Using TS(Ti) = StartTS(Ti) instead of ValidationTS(Ti) would result in a
situation where a transaction Ti enters the validation phase before another
transaction Tj with TS(Tj) < TS(Ti). This could cause a delay in Ti's
validation, as it would have to wait for Tj to complete. Using
ValidationTS avoids this problem.
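A minimal sketch of the validation test described above (each transaction record is assumed to carry its timestamps and its read/write sets):

    # Validate Ti against every transaction Tk that is serialized before it
    # (e.g., every Tk with ValidationTS(Tk) < ValidationTS(Ti)).
    def validate(Ti, earlier):
        for Tk in earlier:
            if Tk["finish_ts"] < Ti["start_ts"]:
                continue          # Tk finished before Ti started: no conflict possible
            if (not (Tk["write_set"] & Ti["read_set"])
                    and Tk["finish_ts"] < Ti["validation_ts"]):
                continue          # Tk's writes do not touch Ti's reads
            return False          # neither condition holds: Ti must be aborted
        return True

If validate() returns True, Ti proceeds to its write phase; otherwise it is rolled back and restarted.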
RECOVERY SYSTEM
A computer system, like any other device, is subject to failure from a variety of
causes such as disk crashes, power outages, software errors, a fire in the
machine room, and even sabotage.
In any failure, information may be lost. Therefore, the database system must
take action in advance to preserve the atomicity and durability properties of
transactions.
An integral part of a database system is a recovery scheme that can restore the
database to the consistent state that existed before the failure.
The recovery scheme must support high availability by keeping a synchronized
backup copy of the database for use in case of machine failure or maintenance.
FAILURE CLASSIFICATION
There are various types of failure that may occur in a system, each of which
needs to be
dealt with in a different manner.
TRANSACTION FAILURE
1) Transaction Failure:
There are two types of errors that may cause a transaction to fail:
● Logical error
● System error
Logical error: The transaction can no longer continue with its normal
execution
because of some internal condition, such as bad input, data not found, overflow,
or resource limit exceeded.
System error: The system has entered an undesirable state (e.g., deadlock), as
a result of which a transaction cannot continue with its normal execution. The
transaction, however, can be re-executed at a later time.
SYSTEM CRASH
System crash: A power failure or other hardware or software failure causes the
system to crash.
There are three potential causes for the loss of volatile storage and interruption
of transaction processing: hardware malfunction, database software bug, or
operating system issue. However, the content of non-volatile storage remains
unaffected.
● Fail-stop assumption: The assumption that hardware errors and bugs in
the software bring the system to a halt, but do not corrupt the non-volatile
storage contents, is known as the fail-stop assumption.
Well-designed systems have numerous internal checks, at the hardware and the
software level, that bring the system to a halt when there is an error.
Hence, the fail-stop assumption is a reasonable one.
DISK FAILURE
Disk failure:
A head crash or similar disk failure destroys all or part of the disk
storage
Stable-Storage Implementation
Note: Refer to the textbook for more details on how to implement stable
storage
Data Access
We assume, for simplicity, that each data item fits in, and is stored inside
a single block.
Each transaction Ti has its private work area in which local copies of all
data items accessed and updated by it are kept.
Ti's local copy of a data item X is called xi.
Transferring data items between system buffer blocks and their private work area is done by:
read(X) assigns the value of data item X to the local variable xi.
write(X) assigns the value of local variable xi to data item {X} in the
buffer block.
Note: output(BX) need not immediately follow write(X). The system can
perform the output operation when it deems fit.
Transactions
Must perform read(X) before accessing X for the first time (subsequent
reads can be from local copy)
write(X) can be executed at any time before the transaction commits
Example of Data Access
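A rough Python sketch of this read/write/output flow, simplified so that each data item occupies its own block (the initial values are made up):

    buffer = {}                    # in-memory buffer blocks: block -> value
    disk = {"X": 100, "Y": 50}     # hypothetical on-disk values
    local = {}                     # transaction Ti's private work area (xi, yi, ...)

    def input_block(B):
        buffer[B] = disk[B]        # bring the block from disk into the buffer

    def read(X):
        if X not in buffer:
            input_block(X)
        local[X] = buffer[X]       # read(X): buffer block -> local copy xi

    def write(X):
        if X not in buffer:
            input_block(X)
        buffer[X] = local[X]       # write(X): local copy xi -> buffer block

    def output(B):
        disk[B] = buffer[B]        # output(B): buffer block -> disk, possibly much later

    read("X")
    local["X"] -= 50               # compute in the private work area
    write("X")                     # the buffer is updated; the disk is not yet touched
    output("X")                    # the system performs the output when it deems fit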
Recovery Atomicity, Recovery Algorithm
Database Modification
As we noted earlier, a transaction creates a log record prior to modifying
the database.
The log records allow the system to undo changes made by a
transaction in the event that the transaction must be aborted; they allow
the system also to redo changes made by a transaction if the transaction
has committed but the system crashed before those changes could be
stored in the database on disk. In order for us to understand the role of
these log records in recovery, we need to consider the steps a
transaction takes in modifying a data item:
1. The transaction performs some computations in its own private part of
main memory.
2. The transaction modifies the data block in the disk buffer in main
memory holding the data item.
3. The database system executes the output operation that writes the
data block to disk.
Immediate Database Modification
The immediate-modification scheme allows updates of an uncommitted
transaction to be made to the buffer, or the disk itself before the
transaction commits
Update log record must be written before the database item is written
We assume that the log record is output directly to stable storage
(We will see later how to postpone log record output to some extent.)
Output of updated blocks to disk can take place at any time before or
after the transaction commit
Order in which blocks are output can be different from the order in which
they are written.
The deferred-modification scheme performs updates to buffer/disk only
at the time of transaction commit
Simplifies some aspects of recovery
But has overhead of storing a local copy
Transaction Commit
Checkpoints
When a system crash occurs, we must consult the log to determine
those transactions that need to be redone and those that need to be
undone. In principle, we need to search the entire log to determine this
information. There are two major difficulties with this approach:
1. The search process is time-consuming.
2. Most of the transactions that, according to our algorithm, need
to be redone have
already written their updates into the database. Although redoing them
will cause no harm, it will nevertheless cause recovery to take longer.
To reduce these types of overhead, we introduce checkpoints.
Example of Checkpoints
T1 can be ignored (updates already output to disk due to checkpoint)
T2 and T3 redone.
T4 undone.
Recovery Algorithm
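The full algorithm also handles checkpoints and logs its undo actions; the following is only a minimal Python sketch of the core redo-then-undo idea, under the simplifying assumption that each log record carries both the old and the new value:

    # Log records: ("start", T), ("update", T, item, old, new), ("commit", T).
    def recover(log, db):
        committed, started = set(), set()
        # Redo phase: replay every update in log order.
        for rec in log:
            if rec[0] == "start":
                started.add(rec[1])
            elif rec[0] == "update":
                _, T, item, old, new = rec
                db[item] = new
            elif rec[0] == "commit":
                committed.add(rec[1])
        # Undo phase: roll back, in reverse log order, the updates of
        # transactions that started but never committed.
        for rec in reversed(log):
            if rec[0] == "update" and rec[1] in (started - committed):
                _, T, item, old, new = rec
                db[item] = old
        return db

    log = [("start", "T1"), ("update", "T1", "A", 100, 50), ("commit", "T1"),
           ("start", "T2"), ("update", "T2", "B", 200, 300)]   # T2 never committed
    print(recover(log, {"A": 100, "B": 200}))                  # {'A': 50, 'B': 200}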
Database Security:
Secrecy: Users should not be able to see things they are not supposed
to.
•E.g: A student can’t see other students’ grades.
Integrity: Users should not be able to modify things they are not
supposed to.
•E.g., Only instructors can assign grades.
Availability: Users should be able to see and modify things they are
allowed to.
Threats to Database:
Loss of integrity: Database integrity refers to the requirement
that information be protected from improper modification.
Modification of data includes creating, inserting, and updating
data; changing the status of data; and deleting data. Integrity is
lost if unauthorized changes are made to the data by either
intentional or accidental acts.
Loss of availability: Loss of availability occurs when users or programs cannot access the database objects they are entitled to access.
Loss of confidentiality: Database confidentiality refers to the
protection of data from unauthorized disclosure.
Control Measures:
To protect databases against the threats discussed above, it is common to implement four kinds of control measures: access control, inference control, flow control, and encryption.
i) Access Control: Restricting access to the database to authorized users only. Access control is done by creating user accounts and controlling the login process in the DBMS.
Access Control:
A security policy specifies who is authorized to do what.
A security mechanism allows us to enforce a chosen security policy.
Two main mechanisms at the DBMS level:
Discretionary access control
Mandatory access control
Grant Permission:
GRANT privileges ON object TO users [WITH GRANT
OPTION]
The following privileges can be specified:
SELECT: Can read all columns (including those added later via
ALTER TABLE command).
INSERT(col-name): Can insert tuples with non-null or non-default values in this column. INSERT without a column list means the same right with respect to all columns.
DELETE: Can delete tuples.
REFERENCES (col-name): Can define foreign keys (in other
tables) that refer to this column.
Revoke:
The REVOKE command is used to revoke permissions previously granted to users with the GRANT command.
Syntax: