
UNIT 6

Definition of Normalization
Normalization is a systematic method of breaking down complex table structures into simple table
structures by using certain rules. Using this method, you can reduce redundancy in a table and eliminate
the problems of inconsistency and wasted disk space. You can also ensure that there is no loss of
information.
Normalization has several benefits: it enables faster sorting, and it helps to simplify the structure
of tables. The performance of an application is closely linked to the database design; a poor design
hinders the performance of the system. The logical design of the database lays the foundation of an
optimal database.
Some rules that should be followed to achieve a good database design are:
* Each table should have an identifier.
* Each table should store data for a single type of entity.
* Columns that accept NULL should be avoided.
* The repetition of values in columns should be avoided.
Normalization results in the formation of tables that satisfy certain specified rules and represent certain
normal forms. The normal forms are used to ensure that various types of anomalies and inconsistencies are
not introduced into the database. A table structure is always in a certain normal form. Several normal forms
have been identified. The most important and widely used normal forms are:
First Normal Form (1NF)
Second Normal Form (2NF)
Third Normal Form (3NF)
Boyce-Codd Normal Form (BCNF)
Fourth Normal Form (4NF)
Fifth Normal Form (5NF)
First Normal Form (1NF)
A table is said to be in the 1NF when each cell of the table contains precisely one value. Consider
the following table PROJECT.

PROJECT
ECODE  DEPT     DEPTHEAD  PROJCODE  HOURS
E101   SYSTEMS  E901      P27       90
                          P51       101
                          P20       60
E305   SALES    E906      P27       109
                          P22       98
E508   ADMIN    E908      P51       NULL
                          P27       72
The data in the table is not normalized because the cells in PROJCODE and HOURS have more
than one value.
By applying the 1NF definition to the PROJECT table, you arrive at the following table:
PROJECT
ECODE  DEPT     DEPTHEAD  PROJCODE  HOURS
E101   SYSTEMS  E901      P27       90
E101   SYSTEMS  E901      P51       101
E101   SYSTEMS  E901      P20       60
E305   SALES    E906      P27       109
E305   SALES    E906      P22       98
E508   ADMIN    E908      P51       NULL
E508   ADMIN    E908      P27       72

Functional Dependency

The normalization theory is based on the fundamental notion of functional dependency. First, let us
examine the concept of functional dependency. Given a relation R, attribute A is functionally
dependent on attribute B if each value of B in R is associated with precisely one value of A. In other
words, attribute A is functionally dependent on B if and only if, for each value of B, there is exactly
one value of A. Attribute B is called the determinant.

Consider the following table EMPLOYEE.

EMPLOYEE

CODE  NAME    CITY
E1    Raj     Delhi
E2    Ravi    Meerut
E3    Pankaj  Goa

Given a particular value of CODE, there is precisely one corresponding value of NAME. For
example, for CODE E1 there is exactly one value of NAME, Raj. Hence, NAME is functionally
dependent on CODE. Similarly, there is exactly one value of CITY for each value of CODE. Hence,
the attribute CITY is functionally dependent on the attribute CODE. The attribute CODE is the
determinant. You can also say that CODE determines CITY and NAME.
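The definition above can be checked mechanically against a table's data. The following is a minimal sketch (the helper name and dictionary layout are assumptions made here, not from the text): a column is functionally dependent on a determinant when no determinant value maps to two different dependent values.

```python
def is_functionally_dependent(rows, determinant, dependent):
    """True if each value of `determinant` is associated with exactly one
    value of `dependent` across the rows (determinant -> dependent)."""
    seen = {}
    for row in rows:
        key, val = row[determinant], row[dependent]
        if key in seen and seen[key] != val:
            return False
        seen[key] = val
    return True

# The EMPLOYEE table from the text.
employees = [
    {"CODE": "E1", "NAME": "Raj",    "CITY": "Delhi"},
    {"CODE": "E2", "NAME": "Ravi",   "CITY": "Meerut"},
    {"CODE": "E3", "NAME": "Pankaj", "CITY": "Goa"},
]

# CODE is the determinant: it determines both NAME and CITY.
assert is_functionally_dependent(employees, "CODE", "NAME")
assert is_functionally_dependent(employees, "CODE", "CITY")
```

Strictly, a functional dependency is a constraint on all possible states of the relation, not just the current rows, so a check like this can only refute a dependency, never prove it.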

Second Normal Form (2NF)

A table is said to be in 2NF when it is in 1NF and every non-key attribute in the row is
functionally dependent upon the whole key, and not just part of the key.

Consider the PROJECT Table

PROJECT
ECODE  PROJCODE  DEPT  DEPTHEAD  HOURS

The table has the following values:


ECODE PROJCODE DEPT DEPTHEAD HOURS
E101 P27 Systems E901 90
E305 P27 Finance E909 10
E508 P51 Admin E908 NULL
E101 P51 Systems E901 101
E101 P20 Systems E901 60
E508 P27 Admin E908 72

This situation could lead to the following problems:


■ Insertion: The department of a particular employee cannot be recorded until the employee is
assigned a project.
■ Updation: For a given employee, the employee code, department name, and department head are
repeated several times. Hence, if an employee is transferred to another department, this change
will have to be recorded in every row of the PROJECT table pertaining to that employee. Any
omission will lead to inconsistencies.
■ Deletion: When an employee completes work on a project, the employee's record is deleted.
The information regarding the department to which the employee belongs will also be lost.
The primary key here is composite (ECODE, PROJCODE).
The table satisfies the definition of 1NF. You now need to check if it satisfies 2NF.

In the table, for each value of ECODE, there is more than one value of HOURS. For example, for
ECODE E101, there are three values of HOURS: 90, 101, and 60. Hence, HOURS is not functionally
dependent on ECODE. Similarly, for each value of PROJCODE, there is more than one value of
HOURS. For example, for PROJCODE P27, there are three values of HOURS: 90, 10, and 72.
However, for a combination of the ECODE and PROJCODE values, there is exactly one value of
HOURS. Hence, HOURS is functionally dependent on the whole key, ECODE+PROJCODE.
Now, you must check if DEPT is functionally dependent on the whole key, ECODE+PROJCODE.
For each value of ECODE, there is exactly one value of DEPT. For example, for ECODE E101, there
is exactly one value, the Systems department. Hence, DEPT is functionally dependent on ECODE.
However, for each value of PROJCODE, there is more than one value of DEPT. For example,
PROJCODE P27 is associated with two values of DEPT, Systems and Finance. Hence, DEPT is not
functionally dependent on PROJCODE. DEPT is, therefore, functionally dependent on part of the
key (which is ECODE) and not on the whole key (ECODE+PROJCODE).
A similar dependency holds for the DEPTHEAD attribute. Therefore, the table PROJECT is not in
2NF. For the table to be in 2NF, the non-key attributes must be functionally dependent on the whole
key and not just part of the key.
Guidelines for Converting a Table to 2NF
■ Find and remove attributes that are functionally dependent on only a part of the key and not
on the whole key. Place them in a different table.
■ Group the remaining attributes.
To convert the table PROJECT into 2NF, you must remove the attributes that are not functionally
dependent on the whole key and place them in a different table along with the attribute that they are
functionally dependent on. In the above example, since DEPT is not functionally dependent on the
whole key ECODE+PROJCODE, you place DEPT along with ECODE in a separate table called
EMPLOYEEDEPT. DEPTHEAD is also moved to the EMPLOYEEDEPT table.

Now the table PROJECT will contain ECODE, PROJCODE, and HOURS.
EMPLOYEEDEPT
ECODE DEPT DEPTHEAD
E101 SYSTEMS E901
E305 FINANCE E909
E508 ADMIN E908

PROJECT
ECODE PROJCODE HOURS
E101 P27 90
E101 P51 101
E101 P20 60
E305 P27 10
E508 P51 NULL
E508 P27 72
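The decomposition above is lossless: joining the two new tables on ECODE reconstructs the original rows. A minimal sketch using SQLite follows (the use of sqlite3 and an in-memory database is an assumption for demonstration; the table and column names follow the text's example):

```python
import sqlite3

# In-memory database holding the two 2NF tables from the text.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE EMPLOYEEDEPT (ECODE TEXT PRIMARY KEY, DEPT TEXT, DEPTHEAD TEXT);
    CREATE TABLE PROJECT (ECODE TEXT, PROJCODE TEXT, HOURS INTEGER,
                          PRIMARY KEY (ECODE, PROJCODE));
""")
con.executemany("INSERT INTO EMPLOYEEDEPT VALUES (?, ?, ?)",
                [("E101", "SYSTEMS", "E901"),
                 ("E305", "FINANCE", "E909"),
                 ("E508", "ADMIN", "E908")])
con.executemany("INSERT INTO PROJECT VALUES (?, ?, ?)",
                [("E101", "P27", 90), ("E101", "P51", 101), ("E101", "P20", 60),
                 ("E305", "P27", 10), ("E508", "P51", None), ("E508", "P27", 72)])

# A join on ECODE reconstructs the original un-decomposed rows losslessly.
rows = con.execute("""
    SELECT p.ECODE, p.PROJCODE, e.DEPT, e.DEPTHEAD, p.HOURS
    FROM PROJECT p JOIN EMPLOYEEDEPT e ON p.ECODE = e.ECODE
""").fetchall()
```

Note that each (ECODE, DEPT, DEPTHEAD) fact is now stored exactly once, so a department transfer is a single-row update in EMPLOYEEDEPT.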

Transitive Dependency: A functional dependency X → Y in a relation scheme R is a transitive
dependency if there is a set of attributes Z that is neither a candidate key nor a subset of any
candidate key of R, and both X → Z and Z → Y hold.
A general case of transitive dependency is as follows: A, B, and C are three columns in a table. If
C is related to B, and B is related to A, then C is indirectly related to A. This is the case when a
transitive dependency exists in a table. We can remove a transitive dependency by splitting the
relation into two separate tables, which are linked using a foreign key.
When one non-key attribute depends on another non-key attribute, it is called a transitive
dependency. For example, if a student is in the 1st year, he is assigned Hostel Gandhi; if he is in the
2nd year, the hostel assigned is Jawahar; in the 3rd year, the hostel assigned is Indra. Thus, the
hostel assigned to a student depends on the year of study in the college.

RNo Name Dept Year Hostel

Third Normal Form (3NF)

A table is said to be in 3NF when it is in 2NF and every non-key attribute is functionally
dependent only on the primary key.
Consider the table EMPLOYEE.

ECODE DEPT DEPTHEAD


E101 Systems E901
E305 Finance E909
E402 Sales E906
E508 Admin E908
E607 Finance E909
E608 Finance E909

The problems with dependencies of this kind are:


■ Insertion: The department head of a new department that does not have any employees at
present cannot be entered in the DEPTHEAD column. This is because the primary key is
unknown.
■ Updation: For a given department, the code for a particular department head (DEPTHEAD)
is repeated several times. Hence, if a department head moves to another department, the
change will have to be made consistently across the table.
■ Deletion: If the record of an employee is deleted, the information regarding the head of the
department will also be deleted. Hence, there will be a loss of information.

You must check if the table is in 3NF. Since each cell in the table has a single value, the table is in
1NF.
The primary key in the EMPLOYEE table is ECODE. For each value of ECODE, there is exactly
one value of DEPT. Hence, the attribute DEPT is functionally dependent on the primary key,
ECODE. Similarly, for each value of ECODE, there is exactly one value of DEPTHEAD. Therefore,
DEPTHEAD is functionally dependent on the primary key ECODE. Hence, all the attributes are
functionally dependent on the whole key, ECODE. Hence the table is in 2NF.
However, the attribute DEPTHEAD is dependent on the attribute DEPT also. As per 3NF, all non-
key attributes have to be functionally dependent only on the primary key. This table is not in 3NF
since DEPTHEAD is functionally dependent on DEPT, which is not a primary key.

To convert the table into 3NF, you move DEPT and DEPTHEAD into a separate DEPARTMENT table:

EMPLOYEE
ECODE  DEPT
E101   Systems
E305   Finance
E402   Sales
E508   Admin
E607   Finance
E608   Finance

DEPARTMENT
DEPT     DEPTHEAD
Systems  E901
Finance  E909
Sales    E906
Admin    E908
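The transitive dependency in the EMPLOYEE table can be verified directly from its data. The sketch below (the helper name `determines` is an assumption made here, not from the text) confirms that ECODE → DEPT and DEPT → DEPTHEAD both hold, which is exactly the ECODE → DEPT → DEPTHEAD chain that 3NF forbids:

```python
def determines(rows, lhs, rhs):
    """True if each value of column `lhs` is paired with exactly
    one value of column `rhs` (i.e. lhs -> rhs holds in this data)."""
    mapping = {}
    for row in rows:
        if row[lhs] in mapping and mapping[row[lhs]] != row[rhs]:
            return False
        mapping[row[lhs]] = row[rhs]
    return True

# The EMPLOYEE table from the text.
employee = [
    {"ECODE": "E101", "DEPT": "Systems", "DEPTHEAD": "E901"},
    {"ECODE": "E305", "DEPT": "Finance", "DEPTHEAD": "E909"},
    {"ECODE": "E402", "DEPT": "Sales",   "DEPTHEAD": "E906"},
    {"ECODE": "E508", "DEPT": "Admin",   "DEPTHEAD": "E908"},
    {"ECODE": "E607", "DEPT": "Finance", "DEPTHEAD": "E909"},
    {"ECODE": "E608", "DEPT": "Finance", "DEPTHEAD": "E909"},
]

# ECODE -> DEPT and DEPT -> DEPTHEAD both hold, so DEPTHEAD depends on
# the key only transitively, through the non-key attribute DEPT: not 3NF.
transitive = (determines(employee, "ECODE", "DEPT")
              and determines(employee, "DEPT", "DEPTHEAD"))
```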

Boyce-Codd Normal Form (BCNF)

The original definition of 3NF was inadequate in some situations. It was not satisfactory for the
tables:
■ that had multiple candidate keys
■ where the multiple candidate keys were composite.
■ where the multiple candidate keys overlapped (had at least one attribute in common)
Therefore, a new normal form, the BCNF was introduced. You must understand that in tables
where the above three conditions do not apply, you can stop at the third normal form. In such
cases, the third NF is the same as the BCNF.
A relation is in the BCNF if and only if every determinant is a candidate key.

Consider the following PROFESSOR table.


PROF CODE DEPT HOD T_PERCENTAGE
P1 Physics Ravi Singh 50
P1 Math C.V.Rao 50
P2 Chemistry P.K. Singh 25
P2 Physics Ravi Singh 75
P3 Math C.V.Rao 100

PROFCODE+DEPT is the primary key. You will notice that PROFCODE+HOD could also have
been chosen as the primary key; hence, it is a candidate key.
You will notice that this table has:
■ Multiple candidate keys, that is, PROFCODE+DEPT and PROFCODE+HOD.
■ Candidate keys that are composite.
■ Candidate keys that overlap, since the attribute PROFCODE is common.
This is a situation that requires conversion to BCNF.
DEPT and HOD are determinants, since they are functionally dependent on each other. However,
they are not candidate keys by themselves. As per BCNF, the determinants have to be candidate keys.
Guidelines for Converting a Table to BCNF
■ Find and remove the determinants that are not candidate keys, and place them, along with
the attributes they determine, in a different table.
■ Group the remaining attributes into a table.
Hence, remove DEPT and HOD and place them in a different table. You will arrive at the following
tables:

DEPARTMENT
DEPT       HOD
Physics    Ravi Singh
Math       C.V. Rao
Chemistry  P.K. Singh

PROFESSOR
PROFCODE  DEPT       T_PERCENTAGE
P1        Physics    50
P1        Math       50
P2        Chemistry  25
P2        Physics    75
P3        Math       100

Multi-valued Dependency
Functional dependencies rule out certain tuples from being in a relation. If A —> B, then we cannot have two
tuples with the same A value but different B values. Multi-valued dependencies, on the other hand, do not rule
out the existence of certain tuples, which have multiple dependencies. Instead, they require that other tuples of a
certain form be present in the relation.
Multi-valued dependencies are a consequence of first normal form. First normal form does not allow an attribute
in a tuple to have more than one value. If we have two or more multi-valued independent attributes in the same
relation schema, then we would get into a problem of having to repeat every value of one of the attributes with
every value of the other attribute to keep the relation instances consistent. This constraint is specified by a multi
valued dependency.
Functional dependencies are also referred to as equality generating dependencies, and multi valued
dependencies are also referred to as tuple generating dependencies.
Consider the following FACULTY table.
FACULTY  SUBJECT         COMMITTEE
Amit     DBMS            Placement
Amit     Networking      Placement
Amit     Data Structure  Placement
Amit     DBMS            Scholarship
Amit     Networking      Scholarship
Amit     Data Structure  Scholarship
A tuple in this FACULTY relation represents the fact that a faculty member teaches different
subjects and is in charge of different committees.
For a FACULTY attribute value there are multiple values of the SUBJECT and COMMITTEE
attributes, but SUBJECT and COMMITTEE are not related to each other. So a multi-valued
dependency exists in this table.

Fourth Normal Form (4NF):
A table is in 4NF if it is in BCNF and it contains no multi-valued dependency.
Consider the following FACULTY table with a multi-valued dependency, where a faculty member
has multiple subjects to teach and is heading several committees.
FACULTY  SUBJECT         COMMITTEE
Amit     DBMS            Placement
Amit     Networking      Placement
Amit     Data Structure  Placement
Amit     DBMS            Scholarship
Amit     Networking      Scholarship
Amit     Data Structure  Scholarship

This relation is in BCNF, but it needs decomposition. The rule for decomposition is to decompose
the offending table into two, with the multi-determinant attribute as part of the key of both. In this
case, to put the relation in 4NF, two separate relations are formed as follows.

FACULTY  SUBJECT
Amit     DBMS
Amit     Networking
Amit     Data Structure

FACULTY  COMMITTEE
Amit     Placement
Amit     Scholarship
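That this 4NF decomposition is lossless can be checked with a small sketch: joining the two relations on FACULTY regenerates every SUBJECT × COMMITTEE combination, i.e. the six rows of the original table. (The Python representation below is an illustration, not from the text.)

```python
from itertools import product

# The two 4NF relations produced by the decomposition above.
faculty_subject = [("Amit", "DBMS"), ("Amit", "Networking"),
                   ("Amit", "Data Structure")]
faculty_committee = [("Amit", "Placement"), ("Amit", "Scholarship")]

# A natural join on FACULTY pairs every subject with every committee:
# 3 subjects x 2 committees -> the 6 rows of the original FACULTY table.
rejoined = {(f1, s, c)
            for (f1, s), (f2, c) in product(faculty_subject, faculty_committee)
            if f1 == f2}
```

The decomposed form stores 3 + 2 = 5 rows instead of 6, and adding a fourth subject adds one row instead of two.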

Join Dependency:

(Figure 5.9: Hierarchical representation of a NURS_HOME database; the WARD level holds
PATIENT details such as COMPLAINTS, TREATMENT, and DOCTOR.)

Join dependency is a constraint, similar to a functional dependency or a multi-valued dependency. It is
satisfied if and only if the relation concerned is the join of a certain number of its projections; such a
constraint is therefore called a join dependency.
We now consider a special class of join dependencies which help to capture data dependencies present in a
hierarchical data structure. For example, in the NURS_HOME database shown in Figure 5.9, data has an
inherent hierarchical organization. It implies that information regarding wards and patients currently admitted
to a ward depends only on the Nurs_home but not on the facilities present in that hospital (and vice versa).
Since a Nurs_home can have multiple wards, functional dependencies are not adequate to describe
the data dependency among NURS_HOME and WARDS or FACILITIES. In this case, multivalued
dependencies, NURS_HOME ->-> WARD or NURS_HOME ->-> FACILITIES hold.
Using first-order hierarchical decomposition (FOHD) would enable us to represent data
dependencies present in a hierarchical data structure in a more natural way.

Thus we can store the NURS_HOME database as the lossless join of the relations
NURS_FACILITY (NURS_HOME, FACILITY) and
NURS_WARD (NURS_HOME, WARD, PATIENT, COMPLAINTS, TREATMENT, DOCTOR).
Fifth Normal Form (5NF): A relation is in 5NF if it is in 4NF and cannot be further non-loss
decomposed.
Non-loss decomposition is possible because of the availability of the join operator in the
relational model. Such decomposition can sometimes only be achieved by decomposing into
three or more separate tables, and it is not always possible: in some cases we lose information
if we do the decomposition. The following example shows this.
Consider the following table:
COMPANY   PRODUCT  SUPPLIER
Godrej    Soap     Mr. X
Godrej    Shampoo  Mr. X
Godrej    Shampoo  Mr. Y
Godrej    Shampoo  Mr. Z
H. Lever  Soap     Mr. X
H. Lever  Soap     Mr. Y
H. Lever  Shampoo  Mr. Y
The table is in fourth normal form, as there is no multi-valued dependency. It does, however,
have a lot of redundancy. For example, Mr. X appears twice as a supplier for Godrej, and
Mr. Y appears twice for Hindustan Lever. But if we decompose the table, we will lose
information, as shown below:
Suppose the tables decomposed into two parts as:
COMPANY_PRODUCT
COMPANY PRODUCT
Godrej Soap
Godrej Shampoo
H. Lever Soap
H. Lever Shampoo

COMPANY_SUPPLIER
COMPANY   SUPPLIER
Godrej    Mr. X
Godrej    Mr. Y
Godrej    Mr. Z
H. Lever  Mr. X
H. Lever  Mr. Y

The redundancy described above has been eliminated, but we have lost information. For example, if we
want to display the products and their suppliers, we will have to use a join based on the COMPANY
attribute. The result will display some spurious records. For Mr. Z, it will display both products, soap
and shampoo, because the company for which Mr. Z is the supplier (Godrej) produces both soap and
shampoo, which is incorrect.
Now suppose that the original table were decomposed into three parts: COMPANY_PRODUCT,
COMPANY_SUPPLIER, and one more, PRODUCT_SUPPLIER, which is shown below:
PRODUCT_SUPPLIER
PRODUCT SUPPLIER
Soap Mr.X
Soap Mr. Y
Shampoo Mr.X
Shampoo Mr. Y
Shampoo Mr. Z

If a join is taken of all three projections, we will again get wrong results. So it is not possible to
decompose the original table without losing information. Thus, normalization techniques cannot
eliminate all redundancies, because it cannot be assumed that all decompositions will be non-loss.
So it is clear that if a table is in 4NF and cannot be further non-loss decomposed, it is said to be
in 5NF.
Concurrency Control

Transaction Concept : A transaction is a unit of program execution that accesses and possibly updates
various data items. Usually, a transaction is initiated by a user program written in a high-level data-
manipulation language (typically SQL), or programming language (for example, C++, or Java), with
embedded database accesses in JDBC or ODBC. A transaction is delimited by statements (or function calls) of
the form begin transaction and end transaction. The transaction consists of all operations executed between the
begin transaction and end transaction. This collection of steps must appear to the user as a single, indivisible
unit.

ACID Properties:

a) Atomicity: Since a transaction is indivisible, it either executes in its entirety or not at all. Thus, if a
transaction begins to execute but fails for whatever reason, any changes to the database that the
transaction may have made must be undone. This requirement holds regardless of whether the
transaction itself failed (for example, if it divided by zero), the operating system crashed, or the computer
itself stopped operating. As we shall see, ensuring that this requirement is met is difficult since some
changes to the database may still be stored only in the main-memory variables of the transaction,
while others may have been written to the database and stored on disk. This “all-or-none” property is
referred to as atomicity.

b) Isolation: Furthermore, since a transaction is a single unit, its actions cannot appear to be separated
by other database operations not part of the transaction. While we wish to present this user-level
impression of transactions, we know that reality is quite different. Even a single SQL statement
involves many separate accesses to the database, and a transaction may consist of several SQL
statements. Therefore, the database system must take special actions to ensure that transactions
operate properly without interference from concurrently executing database statements. This property
is referred to as isolation.

c) Durability: Even if the system ensures correct execution of a transaction, this serves little purpose if
the system subsequently crashes and, as a result, the system “forgets” about the transaction. Thus, a
transaction’s actions must persist across crashes. This property is referred to as durability.

d) Consistency: Because of the above three properties, transactions are an ideal way of structuring
interaction with a database. This leads us to impose a requirement on transactions themselves. A
transaction must preserve database consistency—if a transaction is run atomically in isolation starting
from a consistent database, the database must again be consistent at the end of the transaction. This
consistency requirement goes beyond the data integrity constraints we have seen earlier (such as
primary-key constraints, referential integrity, check constraints, and the like). Rather, transactions are
expected to go beyond that to ensure preservation of those application-dependent consistency
constraints that are too complex to state using the SQL constructs for data integrity. How this is done
is the responsibility of the programmer who codes a transaction. This property is referred to as
consistency.

Example : We shall illustrate the transaction concept using a simple bank application consisting of several
accounts and a set of transactions that access and update those accounts. Transactions access data using two
operations:
• read(X), which transfers the data item X from the database to a variable, also called X, in a buffer in main
memory belonging to the transaction that executed the read operation.
• write(X), which transfers the value in the variable X in the main-memory buffer of the transaction that
executed the write to the data item X in the database.
It is important to know if a change to a data item appears only in main memory or if it has been written to the
database on disk. In a real database system, the write operation does not necessarily result in the immediate
update of the data on the disk; the write operation may be temporarily stored elsewhere and executed on the
disk later. For now, however, we shall assume that the write operation updates the database immediately.

Let Ti be a transaction that transfers $50 from account A to account B. This transaction can be defined as:
Ti : 1. read(A);
     2. A := A − 50;
     3. write(A);
     4. read(B);
     5. B := B + 50;
     6. write(B).
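The transfer, and the atomicity requirement discussed next, can be sketched with SQLite, whose transactions provide rollback. This is a minimal illustration, not the text's own example; the `transfer` helper, the `account` table, and the simulated failure are assumptions made here.

```python
import sqlite3

# Accounts A and B with starting balances of 100 and 200.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
con.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 200)])
con.commit()

def transfer(con, amount, fail_midway=False):
    """Ti: move `amount` from account A to account B in one transaction."""
    try:
        con.execute("UPDATE account SET balance = balance - ? WHERE name = 'A'",
                    (amount,))
        if fail_midway:          # simulated failure between step 3 and step 6
            raise RuntimeError("crash after write(A), before write(B)")
        con.execute("UPDATE account SET balance = balance + ? WHERE name = 'B'",
                    (amount,))
        con.commit()
    except RuntimeError:
        con.rollback()           # atomicity: undo the partial update

transfer(con, 50, fail_midway=True)
after_failure = dict(con.execute("SELECT name, balance FROM account"))
# after_failure == {"A": 100, "B": 200}: the partial update was undone.

transfer(con, 50)
after_success = dict(con.execute("SELECT name, balance FROM account"))
# after_success == {"A": 50, "B": 250}: the whole transfer took effect.
```

In both outcomes the sum A + B stays 300, which is exactly the consistency requirement discussed below.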
Let us now consider each of the ACID properties.
a) Atomicity requirement

 if the transaction fails after step 3 and before step 6, money will be “lost” leading to an
inconsistent database state
 Failure could be due to software or hardware
 the system should ensure that updates of a partially executed transaction are not reflected in
the database

b) Durability requirement: once the user has been notified that the transaction has completed (i.e., the
transfer of the $50 has taken place), the updates to the database by the transaction must persist even if
there are software or hardware failures

c) Consistency requirement: The consistency requirement here is that the sum of A and B be
unchanged by the execution of the transaction. Without the consistency requirement, money could be
created or destroyed by the transaction! It can be verified easily that, if the database is consistent
before an execution of the transaction, the database remains consistent after the execution of the
transaction.
Ensuring consistency for an individual transaction is the responsibility of the application programmer
who codes the transaction. This task may be facilitated by automatic testing of integrity constraints.

d) Isolation requirement: if between steps 3 and 6, another transaction T2 is allowed to access the
partially updated database, it will see an inconsistent database (the sum A + B will be less than it
should be).
T1                      T2
1. read(A)
2. A := A − 50
3. write(A)
                        read(A), read(B), print(A+B)
4. read(B)
5. B := B + 50
6. write(B)
• Isolation can be ensured trivially by running transactions serially
o that is, one after the other.
Transaction State: We need to be more precise about what we mean by successful completion of a
transaction. We therefore establish a simple abstract transaction model. A transaction must be in one of the
following states:

1. Active: the initial state; the transaction stays in this state while it is executing
2. Partially committed: after the final statement has been executed.
3. Failed: after the discovery that normal execution can no longer proceed.
4. Aborted: after the transaction has been rolled back and the database restored to its state prior to the
start of the transaction. Two options after it has been aborted:
❖ restart the transaction (can be done only if no internal logical error)
❖ kill the transaction
5. Committed: after successful completion

State diagram of a transaction.

Transaction Isolation (Need of concurrency): Transaction-processing systems usually allow


multiple transactions to run concurrently. Allowing multiple transactions to update data concurrently causes
several complications with consistency of the data, as we saw earlier. Ensuring consistency in spite of
concurrent execution of transactions requires extra work; it is far easier to insist that transactions run
serially—that is, one at a time, each starting only after the previous one has completed. However, there are
two good reasons for allowing concurrency:
1- Improved throughput and resource utilization: A transaction consists of many steps. Some involve
I/O activity; others involve CPU activity. The CPU and the disks in a computer system can operate in
parallel. Therefore, I/O activity can be done in parallel with processing at the CPU. The parallelism of
the CPU and the I/O system can therefore be exploited to run multiple transactions in parallel. While a
read or write on behalf of one transaction is in progress on one disk, another transaction can be
running in the CPU, while another disk may be executing a read or write on behalf of a third
transaction. All of this increases the throughput of the system—that is, the number of transactions
executed in a given amount of time. Correspondingly, the processor and disk utilization also increase;
in other words, the processor and disk spend less time idle, or not performing any useful work.

2- Reduced waiting time. There may be a mix of transactions running on a system, some short and
some long. If transactions run serially, a short transaction may have to wait for a preceding long
transaction to complete, which can lead to unpredictable delays in running a transaction. If the
transactions are operating on different parts of the database, it is better to let them run concurrently,
sharing the CPU cycles and disk accesses among them. Concurrent execution reduces the
unpredictable delays in running transactions. Moreover, it also reduces the average response time: the
average time for a transaction to be completed after it has been submitted.

Schedule: a sequence of instructions that specifies the chronological order in which the instructions of
concurrent transactions are executed.
❖ A schedule for a set of transactions must consist of all instructions of those transactions.
❖ It must preserve the order in which the instructions appear in each individual transaction.
❖ A transaction that successfully completes its execution will have a commit instruction as the
last statement (by default, a transaction is assumed to execute a commit instruction as its last step).
❖ A transaction that fails to successfully complete its execution will have an abort instruction as
the last statement.

Schedule 1
➢ Let T1 transfer $50 from A to B, and T2 transfer 10% of the balance from A to B.
➢ A serial schedule in which T1 is followed by T2 :

Schedule 2
➢ A serial schedule where T2 is followed by T1

Schedule 3
➢ Let T1 and T2 be the transactions defined previously. The following schedule is not a serial
schedule, but it is equivalent to Schedule 1.
Note: In Schedules 1, 2 and 3, the sum A + B is preserved.

Schedule 4
➢ The following concurrent schedule does not preserve the value of (A + B ).

Serializability

Before we can consider how the concurrency-control component of the database system can ensure
serializability, we consider how to determine when a schedule is serializable. Certainly, serial schedules are
serializable, but if steps of multiple transactions are interleaved, it is harder to determine whether a schedule is
serializable.
A (possibly concurrent) schedule is serializable if it is equivalent to a serial schedule. Different forms of
schedule equivalence give rise to the notions of:
1. Conflict serializability
2. View serializability

Conflicting Instructions:
Instructions li and lj of transactions Ti and Tj respectively, conflict if and only if there exists some item Q
accessed by both li and lj, and at least one of these instructions wrote Q.
1. li = read(Q), lj = read(Q): li and lj do not conflict.
2. li = read(Q), lj = write(Q): they conflict.
3. li = write(Q), lj = read(Q): they conflict.
4. li = write(Q), lj = write(Q): they conflict.
1) Conflict Serializability: If a schedule S can be transformed into a schedule S´ by a series of
swaps of non-conflicting instructions, we say that S and S´ are conflict equivalent. We say that a
schedule S is conflict serializable if it is conflict equivalent to a serial schedule.
Example: Schedule 3 can be transformed into Schedule 6, a serial schedule where T2 follows T1, by a
series of swaps of non-conflicting instructions. Therefore Schedule 3 is conflict serializable.
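A standard way to test conflict serializability is to build a precedence graph (an edge Ti → Tj for every conflict where Ti's operation comes first) and check it for cycles: the schedule is conflict serializable exactly when the graph is acyclic. The sketch below is an illustration constructed here, not from the text; the schedule encoding as (transaction, operation, item) triples is an assumption.

```python
def precedence_edges(schedule):
    """schedule: list of (txn, op, item) triples, op in {'r', 'w'}.
    Edge Ti -> Tj when an operation of Ti conflicts with a LATER
    operation of Tj (same item, at least one of the two is a write)."""
    edges = set()
    for i, (ti, op_i, item_i) in enumerate(schedule):
        for tj, op_j, item_j in schedule[i + 1:]:
            if ti != tj and item_i == item_j and "w" in (op_i, op_j):
                edges.add((ti, tj))
    return edges

def has_cycle(edges):
    """True if the precedence graph contains a cycle (not conflict serializable)."""
    nodes = {n for edge in edges for n in edge}
    def reachable(start, target, seen):
        for a, b in edges:
            if a == start:
                if b == target or (b not in seen and reachable(b, target, seen | {b})):
                    return True
        return False
    return any(reachable(n, n, {n}) for n in nodes)

# A Schedule-3-style interleaving of T1 and T2 on items A and B:
s3 = [("T1", "r", "A"), ("T1", "w", "A"), ("T2", "r", "A"), ("T2", "w", "A"),
      ("T1", "r", "B"), ("T1", "w", "B"), ("T2", "r", "B"), ("T2", "w", "B")]
# A T3/T4-style schedule that cannot be serialized:
s4 = [("T3", "r", "Q"), ("T4", "w", "Q"), ("T3", "w", "Q")]
# s3's only edge is T1 -> T2 (acyclic: serializable as T1, T2);
# s4 has both T3 -> T4 and T4 -> T3 (a cycle: not serializable).
```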

Example of a schedule that is not conflict serializable:

We are unable to swap instructions in the above schedule to obtain either the serial schedule < T3, T4 >, or the
serial schedule < T4, T3 >

2) View Serializability: Let S and S´ be two schedules with the same set of transactions. S and S´
are view equivalent if the following three conditions are met, for each data item Q,
1. If in schedule S, transaction Ti reads the initial value of Q, then in schedule S’ also transaction
Ti must read the initial value of Q.
2. If in schedule S transaction Ti executes read(Q), and that value was produced by transaction Tj
(if any), then in schedule S’ also transaction Ti must read the value of Q that was produced by
the same write(Q) operation of transaction Tj .
3. The transaction (if any) that performs the final write(Q) operation in schedule S must also
perform the final write(Q) operation in schedule S’.

A schedule S is view serializable if it is view equivalent to a serial schedule. Every conflict


serializable schedule is also view serializable. Below is a schedule which is view-serializable but not
conflict serializable.
Concurrency Control

In a multiprogramming environment where multiple transactions can be executed simultaneously, it


is highly important to control the concurrency of transactions. We have concurrency control
protocols to ensure atomicity, isolation, and serializability of concurrent transactions. Concurrency
control protocols can be broadly divided into two categories −

• Lock based protocols


• Time stamp based protocols

Lock-based Protocols
Database systems equipped with lock-based protocols use a mechanism by which any transaction
cannot read or write data until it acquires an appropriate lock on it. Locks are of two kinds −

• Binary Locks − A lock on a data item can be in two states; it is either locked or unlocked.
• Shared/exclusive − This type of locking mechanism differentiates the locks based on their
uses. If a lock is acquired on a data item to perform a write operation, it is an exclusive lock.
Allowing more than one transaction to write on the same data item would lead the database
into an inconsistent state. Read locks are shared because no data value is being changed.
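The shared/exclusive rule reduces to a small compatibility check: a shared request is compatible only with other shared locks, while an exclusive request is compatible with nothing. A minimal sketch of such a lock table (the class and method names are illustrative, not any real DBMS API):

```python
# Minimal shared/exclusive lock table: a request is granted only when it
# is compatible with the locks other transactions already hold on the item.

class LockTable:
    def __init__(self):
        # item -> list of (transaction, mode), where mode is 'S' or 'X'
        self.held = {}

    def request(self, txn, item, mode):
        holders = self.held.setdefault(item, [])
        others = [(t, m) for t, m in holders if t != txn]
        if mode == 'S':
            # Shared locks coexist with other shared locks.
            granted = all(m == 'S' for _, m in others)
        else:
            # An exclusive lock requires no other holders at all.
            granted = not others
        if granted:
            holders.append((txn, mode))
        return granted

    def release(self, txn, item):
        self.held[item] = [(t, m) for t, m in self.held.get(item, [])
                           if t != txn]
```

With this table, two readers can hold 'S' locks on the same item at once, but a writer's 'X' request is refused until both readers release, which is exactly why concurrent writes cannot drive the database into an inconsistent state.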

There are four types of lock protocols available −

Simplistic Lock Protocol

Simplistic lock-based protocols require a transaction to obtain a lock on every object before a 'write'
operation is performed. The transaction may unlock the data item after completing the 'write'
operation.

Pre-claiming Lock Protocol

Under the pre-claiming protocol, a transaction evaluates its operations and creates a list of the data
items on which it needs locks. Before initiating execution, the transaction requests all of these locks
from the system. If all the locks are granted, the transaction executes and releases the locks when all
its operations are over. If not all the locks are granted, the transaction rolls back and waits until all
the locks are granted.
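The all-or-nothing grant at the heart of pre-claiming can be sketched as follows (a simplified model with a single lock mode; the function names are illustrative):

```python
# Pre-claiming sketch: a transaction asks for every lock it needs up
# front, and it may run only if all of them can be granted at once.

locked = set()  # data items currently locked by some transaction

def try_preclaim(needed):
    """Grant all requested locks or none.
    Returns True if the transaction may begin executing; False means
    the caller rolls back and waits until the locks become available."""
    if locked & set(needed):      # any conflict -> grant nothing
        return False
    locked.update(needed)         # no conflict -> grant everything
    return True

def release(items):
    """Release all locks when the transaction's operations are over."""
    locked.difference_update(items)
```

A transaction needing {A, B} succeeds on an empty table; a second transaction needing {B, C} is refused until the first releases, after which its pre-claim succeeds.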

Two-Phase Locking (2PL)


This locking protocol divides the execution of a transaction into two phases. When the transaction
starts executing, it seeks permission for the locks it requires and keeps acquiring them as it proceeds.
As soon as the transaction releases its first lock, the second phase starts; from that point on, the
transaction cannot demand any new locks and can only release the locks it already holds.

The first phase is the growing phase, in which all the locks are acquired by the transaction; the
second phase is the shrinking phase, in which the locks held by the transaction are released.

In variants that support lock conversion, a transaction may first acquire a shared (read) lock on an
item and later upgrade it to an exclusive (write) lock; such an upgrade is allowed only during the
growing phase.
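The growing/shrinking discipline can be enforced mechanically: once a transaction has released any lock, a further lock request is a protocol violation. A minimal sketch (illustrative names, not a real lock manager):

```python
# Two-phase locking sketch: lock requests are legal only during the
# growing phase; the first unlock starts the shrinking phase, after
# which any new lock request violates the protocol.

class TwoPhaseTxn:
    def __init__(self, name):
        self.name = name
        self.locks = set()
        self.shrinking = False    # False = growing phase

    def lock(self, item):
        if self.shrinking:
            raise RuntimeError(
                f"{self.name}: lock({item!r}) requested after shrinking began")
        self.locks.add(item)

    def unlock(self, item):
        self.shrinking = True     # the growing phase is over for good
        self.locks.discard(item)
```

For example, a transaction that locks A and B, unlocks A, and then requests C triggers the RuntimeError: the request for C arrives in the shrinking phase.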

Strict Two-Phase Locking

The first phase of Strict-2PL is the same as in 2PL. After acquiring all the locks in the first phase, the
transaction continues to execute normally. But in contrast to basic 2PL, Strict-2PL does not release a
lock immediately after using it: it holds all the locks until the commit point and releases them all at
once.

Unlike basic 2PL, Strict-2PL does not suffer from cascading aborts.

Timestamp-based Protocols
The most commonly used concurrency protocol is the timestamp-based protocol. This protocol uses
either the system time or a logical counter as a timestamp.

Lock-based protocols manage the order between conflicting pairs of transactions at execution time,
whereas timestamp-based protocols start working as soon as a transaction is created. Every
transaction has a timestamp associated with it, and the ordering is determined by the age of the
transaction: a transaction created at clock time 0002 is older than every transaction that enters after
it. For example, a transaction entering the system at 0004 is two seconds younger, and priority is
given to the older one.

In addition, every data item carries its latest read timestamp and write timestamp. These let the
system know when the last read and write operations were performed on the data item.

Timestamp Ordering Protocol


The timestamp-ordering protocol ensures serializability among transactions with conflicting read
and write operations. It is the responsibility of the protocol that each conflicting pair of operations
is executed according to the timestamp values of the transactions.

• The timestamp of transaction Ti is denoted as TS(Ti).


• Read time-stamp of data-item X is denoted by R-timestamp(X).
• Write time-stamp of data-item X is denoted by W-timestamp(X).

Timestamp ordering protocol works as follows −

• If a transaction Ti issues a read(X) operation −
o If TS(Ti) < W-timestamp(X)
▪ Operation rejected and Ti rolled back.
o If TS(Ti) >= W-timestamp(X)
▪ Operation executed, and R-timestamp(X) updated to the maximum of R-timestamp(X) and TS(Ti).
• If a transaction Ti issues a write(X) operation −
o If TS(Ti) < R-timestamp(X)
▪ Operation rejected and Ti rolled back.
o If TS(Ti) < W-timestamp(X)
▪ Operation rejected and Ti rolled back.
o Otherwise
▪ Operation executed, and W-timestamp(X) set to TS(Ti).
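The read and write rules above can be sketched directly in code (a simplified model with numeric timestamps; the class and method names are illustrative). A return value of False means the operation is rejected and the issuing transaction must be rolled back:

```python
# Timestamp-ordering sketch: each data item keeps the largest read
# timestamp and the last write timestamp; operations that would violate
# timestamp order are rejected.

class TimestampOrdering:
    def __init__(self):
        self.r_ts = {}   # item -> largest TS of any transaction that read it
        self.w_ts = {}   # item -> TS of the last transaction that wrote it

    def read(self, ts, item):
        if ts < self.w_ts.get(item, 0):
            return False     # Ti would read a value written "in its future"
        self.r_ts[item] = max(self.r_ts.get(item, 0), ts)
        return True

    def write(self, ts, item):
        if ts < self.r_ts.get(item, 0) or ts < self.w_ts.get(item, 0):
            return False     # a younger transaction already read/wrote item
        self.w_ts[item] = ts
        return True
```

For example, after a transaction with timestamp 2 reads Q, a write of Q by an older transaction with timestamp 1 is rejected, because the read that the older write should have preceded has already happened.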

Database security

Introduction
A DB2 database and its functions can be managed using two different modes of security controls:

1. Authentication
2. Authorization

Authentication
Authentication is the process of confirming that a user is who they claim to be, so that a user logs in
only with the rights to perform the activities they are authorized to perform. User authentication can
be performed at the operating-system level or at the database level itself. Biometric authentication
tools, such as retina and fingerprint scanners, are also used to protect the database from hackers and
malicious users.

Database security can be managed from outside the DB2 database system. Here are some types of
security authentication processes:

• Operating-system-based authentication
• Lightweight Directory Access Protocol (LDAP)

For DB2, the security service is part of the operating system rather than the database product itself.
For authentication, it requires two credentials: a user ID (or username) and a password.

Authorization

Access to the DB2 database and its functionality within the DB2 database system is managed by the
DB2 database manager. Authorization is a process managed by the DB2 database manager: the
manager obtains information about the currently authenticated user, which indicates the database
operations the user can perform or access.

Here are different ways of permissions available for authorization:

Primary permission: granted directly to the authorization ID.

Secondary permission: granted to the groups and roles of which the user is a member.

Public permission: granted publicly to all users.

Context-sensitive permission: granted to a trusted context role.

Authorization can be given to users based on the categories below:

• System-level authorization
o System administrator [SYSADM]
o System control [SYSCTRL]
o System maintenance [SYSMAINT]
o System monitor [SYSMON]

These authorities provide control over instance-level functionality. They group privileges together
to control maintenance and utility operations on the instance, its databases, and database objects.

• Database-level authorization
o Security administrator [SECADM]
o Database administrator [DBADM]
o Access control [ACCESSCTRL]
o Data access [DATAACCESS]
o SQL administrator [SQLADM]
o Workload management administrator [WLMADM]
o Explain [EXPLAIN]

These authorities provide controls within the database. Other database-level authorities include
LOAD and CONNECT.

• Object-level authorization: Object-level authorization involves verifying privileges when an
operation is performed on an object.
• Content-based authorization: Users can have read and write access to individual rows and
columns of a particular table using Label-Based Access Control [LBAC].

DB2 tables and configuration files are used to record the permissions associated with authorization
names. When a user tries to access data, the recorded permissions are checked against the following:

• The authorization name of the user
• The groups to which the user belongs
• The roles granted directly to the user or indirectly to a group
• Permissions acquired through a trusted context
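How these recorded permissions combine can be illustrated with a small sketch: a user's effective permissions are the union of grants made directly to the authorization ID, to the user's groups and roles, and to all users (PUBLIC). The grants table and every name below are illustrative, not the actual DB2 catalog tables:

```python
# Illustrative grants table: authorization name -> set of permissions.
grants = {
    "alice":   {"SELECT"},    # primary: directly to the authorization ID
    "staff":   {"INSERT"},    # secondary: to a group
    "auditor": {"EXPLAIN"},   # secondary: to a role
    "PUBLIC":  {"CONNECT"},   # public: to all users
}

def effective_permissions(user, groups=(), roles=()):
    """Union of direct, group, role, and public grants for a user."""
    perms = set(grants.get(user, set())) | set(grants.get("PUBLIC", set()))
    for name in list(groups) + list(roles):
        perms |= grants.get(name, set())
    return perms
```

Resolving permissions for "alice" as a member of the "staff" group holding the "auditor" role yields the union of all four grant sources.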
