DBMS Unit 6
Definition of Normalization
Normalization is a scientific method of breaking down complex table structures into simple table
structures by using certain rules. Using this method, you can reduce redundancy in a table and eliminate
the problems of inconsistency and wasted disk space. You can also ensure that there is no loss of
information.
Normalization has several benefits: it enables faster sorting and helps to simplify the structure
of tables. The performance of an application is directly linked to the database design; a poor design
hinders the performance of the system. The logical design of the database lays the foundation of an
optimal database.
Some rules that should be followed to achieve a good database design are:
*Each table should have an identifier.
*Each table should store data for a single type of entity.
*Columns that accept NULL should be avoided.
*The repetition of values in columns should be avoided.
Normalization results in the formation of tables that satisfy certain specified rules and represent certain
normal forms. The normal forms are used to ensure that various types of anomalies and inconsistencies are
not introduced in the database. A table structure is always in a certain normal form. Several normal forms
have been identified. The most important and widely used normal forms are:
First Normal form (1NF)
Second Normal form (2NF)
Third Normal form (3NF)
Boyce-Codd Normal form (BCNF)
Fourth Normal form (4NF)
Fifth Normal form (5NF)
First Normal Form (1NF)
A table is said to be in the 1NF when each cell of the table contains precisely one value. Consider
the following table PROJECT.
PROJECT
ECODE DEPT DEPTHEAD PROJCODE HOURS
E101 SYSTEMS E901 P27 90
P51 101
P20 60
E305 SALES E906 P27 109
P22 98
E508 ADMIN E908 P51 NULL
P27 72
The data in the table is not normalized because the cells in PROJCODE and HOURS have more
than one value.
By applying the 1NF definition to the PROJECT table, you arrive at the following table:
PROJECT
ECODE DEPT DEPTHEAD PROJCODE HOURS
E101 SYSTEMS E901 P27 90
E101 SYSTEMS E901 P51 101
E101 SYSTEMS E901 P20 60
E305 SALES E906 P27 109
E305 SALES E906 P22 98
E508 ADMIN E908 P51 NULL
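The flattening step above can be sketched in Python. This is an illustrative sketch only; the nested-list representation of the unnormalized table is an assumption, not something a relational database would actually store.

```python
# Hypothetical non-1NF data: PROJCODE and HOURS hold lists, i.e. a cell
# contains more than one value, which violates 1NF.
unnormalized = [
    {"ECODE": "E101", "DEPT": "SYSTEMS", "DEPTHEAD": "E901",
     "PROJCODE": ["P27", "P51", "P20"], "HOURS": [90, 101, 60]},
    {"ECODE": "E305", "DEPT": "SALES", "DEPTHEAD": "E906",
     "PROJCODE": ["P27", "P22"], "HOURS": [109, 98]},
]

def to_1nf(rows):
    """Flatten multi-valued cells: emit one row per (PROJCODE, HOURS) pair."""
    flat = []
    for row in rows:
        for proj, hrs in zip(row["PROJCODE"], row["HOURS"]):
            flat.append({"ECODE": row["ECODE"], "DEPT": row["DEPT"],
                         "DEPTHEAD": row["DEPTHEAD"],
                         "PROJCODE": proj, "HOURS": hrs})
    return flat

rows_1nf = to_1nf(unnormalized)
```

Note how the single-valued attributes (ECODE, DEPT, DEPTHEAD) are repeated in every flattened row, which is exactly the redundancy the later normal forms address.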
Functional Dependency
Normalization theory is based on the fundamental notion of functional dependency. First, let us
examine the concept of functional dependency. Given a relation R, attribute A is functionally
dependent on attribute B if each value of B in R is associated with precisely one value of A. In other
words, attribute A is functionally dependent on B if and only if, for each value of B, there is exactly
one value of A. Attribute B is called the determinant.
EMPLOYEE
Given a particular value of CODE, there is precisely one corresponding value for NAME. For
example, for CODE E1, there is exactly one value of NAME, Ra. Hence, NAME is functionally
dependent on CODE. Similarly, there is exactly one value of CITY for each value of CODE. Hence,
the attribute CITY is functionally dependent on the attribute CODE. The attribute CODE is the
determinant. You can also say that CODE determines CITY and NAME.
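This definition can be checked mechanically. The sketch below is illustrative; the sample EMPLOYEE rows are assumed, since the notes mention the table but omit its contents.

```python
def functionally_determines(rows, determinant, dependent):
    """True if determinant -> dependent holds in rows: every value of the
    determinant is associated with exactly one value of the dependent."""
    seen = {}
    for row in rows:
        key, val = row[determinant], row[dependent]
        if key in seen and seen[key] != val:
            return False          # same determinant value, two dependent values
        seen[key] = val
    return True

# Assumed sample rows for the EMPLOYEE relation mentioned in the text.
employee = [
    {"CODE": "E1", "NAME": "Ra",  "CITY": "Delhi"},
    {"CODE": "E2", "NAME": "Sam", "CITY": "Mumbai"},
    {"CODE": "E3", "NAME": "Ra",  "CITY": "Delhi"},
]

assert functionally_determines(employee, "CODE", "NAME")     # CODE -> NAME
assert not functionally_determines(employee, "NAME", "CODE") # "Ra" maps to E1 and E3
```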
PROJECT (ECODE, PROJCODE, DEPT, DEPTHEAD, HOURS)
■ Update: For a given employee, the employee code, department name, and department head are
repeated several times. Hence, if an employee is transferred to another department, this change
will have to be recorded in every row of the PROJECT table pertaining to that employee. Any
omission will lead to inconsistencies.
■ Deletion: When an employee completes work on a project, the employee's record is deleted.
The information regarding the department to which the employee belongs will also be lost.
Second Normal Form (2NF)
The primary key here is composite (ECODE+PROJCODE).
The table satisfies the definition of 1NF. You now need to check if it satisfies 2NF.
In the table, for each value of ECODE, there is more than one value of HOURS. For example, for
ECODE E101, there are three values of HOURS: 90, 101, and 60. Hence, HOURS is not functionally
dependent on ECODE alone. Similarly, for each value of PROJCODE, there is more than one value of
HOURS. For example, for PROJCODE P27, there are three values of HOURS: 90, 109, and 72.
However, for a combination of the ECODE and PROJCODE values, there is exactly one value of
HOURS. Hence, HOURS is functionally dependent on the whole key, ECODE+PROJCODE.
Now, you must check if DEPT is functionally dependent on the whole key, ECODE+PROJCODE.
For each value of ECODE, there is exactly one value of DEPT. For example, for ECODE E101, there
is exactly one value, the SYSTEMS department. Hence, DEPT is functionally dependent on ECODE.
However, for each value of PROJCODE, there is more than one value of DEPT. For example,
PROJCODE P27 is associated with more than one value of DEPT. Hence, DEPT is not
functionally dependent on PROJCODE. DEPT is, therefore, functionally dependent on part of the
key (ECODE) and not functionally dependent on the whole key (ECODE+PROJCODE).
A similar dependency holds for the DEPTHEAD attribute. Therefore, the table PROJECT is not in
2NF. For the table to be in 2NF, the non-key attributes must be functionally dependent on the whole
key and not on part of the key.
Guidelines for Converting a Table to 2NF
• Find and remove attributes that are functionally dependent on only a part of the key and not
on the whole key. Place them in a different table.
■ Group the remaining attributes.
To convert the table PROJECT into 2NF, you must remove the attributes that are not functionally
dependent on the whole key and place them in a different table along with the attribute that they are
functionally dependent on. In the above example, since DEPT is not functionally dependent on the
whole key ECODE+PROJCODE, you place DEPT along with ECODE in a separate table called
EMPLOYEEDEPT. DEPTHEAD is also moved to the EMPLOYEEDEPT table.
Now the table PROJECT will contain ECODE, PROJCODE, and HOURS.
EMPLOYEEDEPT
ECODE DEPT DEPTHEAD
E101 SYSTEMS E901
E305 SALES E906
E508 ADMIN E908
PROJECT
ECODE PROJCODE HOURS
E101 P27 90
E101 P51 101
E101 P20 60
E305 P27 109
E508 P51 NULL
E508 P27 72
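The decomposition above can be sketched as follows. This is a hypothetical sketch; the tuple representation and attribute order are assumptions.

```python
# 1NF rows of PROJECT: (ECODE, DEPT, DEPTHEAD, PROJCODE, HOURS).
project_1nf = [
    ("E101", "SYSTEMS", "E901", "P27", 90),
    ("E101", "SYSTEMS", "E901", "P51", 101),
    ("E101", "SYSTEMS", "E901", "P20", 60),
    ("E305", "SALES",   "E906", "P27", 109),
    ("E508", "ADMIN",   "E908", "P51", None),
    ("E508", "ADMIN",   "E908", "P27", 72),
]

# EMPLOYEEDEPT keeps the attributes dependent on ECODE alone; using a dict
# keyed by ECODE de-duplicates the repeated (DEPT, DEPTHEAD) facts.
employeedept = {e: (d, h) for (e, d, h, _, _) in project_1nf}

# PROJECT keeps only attributes dependent on the whole key ECODE+PROJCODE.
project_2nf = [(e, p, hrs) for (e, _, _, p, hrs) in project_1nf]
```

The department facts now appear once per employee instead of once per project row, which removes the update anomaly described earlier.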
Third Normal Form (3NF)
A table is said to be in 3NF when it is in 2NF and every non-key attribute is functionally
dependent only on the primary key.
Consider the table EMPLOYEE.
You must check if the table is in 3NF. Since each cell in the table has a single value, the table is in
1NF.
The primary key in the EMPLOYEE table is ECODE. For each value of ECODE, there is exactly
one value of DEPT. Hence, the attribute DEPT is functionally dependent on the primary key,
ECODE. Similarly, for each value of ECODE, there is exactly one value of DEPTHEAD. Therefore,
DEPTHEAD is functionally dependent on the primary key ECODE. Thus, all the attributes are
functionally dependent on the whole key, ECODE. Hence, the table is in 2NF.
However, the attribute DEPTHEAD is dependent on the attribute DEPT also. As per 3NF, all non-
key attributes have to be functionally dependent only on the primary key. This table is not in 3NF
since DEPTHEAD is functionally dependent on DEPT, which is not a primary key.
EMPLOYEE
ECODE DEPT
E101 Systems
E305 Finance
E402 Sales
E508 Admin
E607 Finance
E608 Finance

DEPARTMENT
DEPT DEPTHEAD
Systems E901
Sales E906
Admin E908
Finance E909
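The 3NF split can be sketched the same way (illustrative; the row values are taken from the tables above):

```python
# Rows of the EMPLOYEE table: (ECODE, DEPT, DEPTHEAD).
employee = [
    ("E101", "Systems", "E901"),
    ("E305", "Finance", "E909"),
    ("E402", "Sales",   "E906"),
    ("E508", "Admin",   "E908"),
    ("E607", "Finance", "E909"),
    ("E608", "Finance", "E909"),
]

# DEPT -> DEPTHEAD is a transitive dependency through the key ECODE, so
# the (DEPT, DEPTHEAD) pairs move into their own DEPARTMENT table.
employee_3nf = [(e, d) for (e, d, _) in employee]
department = {d: h for (_, d, h) in employee}
```

After the split, a department head is recorded exactly once per department rather than once per employee.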
Boyce-Codd Normal Form (BCNF)
The original definition of 3NF was inadequate in some situations. It was not satisfactory for
tables:
■ that had multiple candidate keys
■ where the multiple candidate keys were composite
■ where the multiple candidate keys overlapped (had at least one attribute in common)
Therefore, a new normal form, BCNF, was introduced. You must understand that in tables
where the above three conditions do not apply, you can stop at the third normal form. In such
cases, 3NF is the same as BCNF.
A relation is in the BCNF if and only if every determinant is a candidate key.
PROFCODE+DEPT is the primary key. You will notice that PROFCODE+HOD could also be chosen as
the primary key and is, hence, a candidate key.
You will notice that this table has:
■ Multiple candidate keys, that is, PROFCODE+DEPT and PROFCODE+HOD.
■ Candidate keys that are composite.
■ Candidate keys that overlap, since the attribute PROFCODE is common.
This is a situation that requires conversion to BCNF.
DEPT and HOD are determinants, since they are functionally dependent on each other. However, they
are not candidate keys by themselves. As per BCNF, the determinants have to be candidate keys.
Guidelines for Converting a Table to BCNF
• Find the overlapping candidate keys. Remove the part of the candidate key that is a
determinant, along with the attribute it determines, and place them in a different table.
• Group the remaining attributes into a table.
Hence, remove DEPT and HOD and place them in a different table. You will arrive at the following
tables:
DEPARTMENT
Multi-valued Dependency
Functional dependencies rule out certain tuples from being in a relation. If A —> B, then we cannot have two
tuples with the same A value but different B values. Multi-valued dependencies, on the other hand, do not rule
out the existence of certain tuples, which have multiple dependencies. Instead, they require that other tuples of a
certain form be present in the relation.
Multi-valued dependencies are a consequence of first normal form. First normal form does not allow an attribute
in a tuple to have more than one value. If we have two or more multi-valued independent attributes in the same
relation schema, then we would get into a problem of having to repeat every value of one of the attributes with
every value of the other attribute to keep the relation instances consistent. This constraint is specified by a
multi-valued dependency.
Functional dependencies are also referred to as equality-generating dependencies, and multi-valued
dependencies are also referred to as tuple-generating dependencies.
Consider the following FACULTY table.
FACULTY SUBJECT COMMITTEE
Amit DBMS Placement
Amit Networking Placement
Amit Data Structure Placement
Amit DBMS Scholarship
Amit Networking Scholarship
Amit Data Structure Scholarship
A tuple in this FACULTY relation represents the fact that a faculty member teaches different subjects
and the committees for which they are in charge.
For each FACULTY value there are multiple values of the SUBJECT and COMMITTEE attributes, but
SUBJECT and COMMITTEE are not related to each other. So a multi-valued dependency exists in this
table.
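A multi-valued dependency FACULTY →→ SUBJECT can be verified by checking that, for each faculty member, the rows form the full cross product of that member's subjects and committees. This is an illustrative sketch for this specific table shape, not a general MVD checker.

```python
faculty = [
    ("Amit", "DBMS", "Placement"),
    ("Amit", "Networking", "Placement"),
    ("Amit", "Data Structure", "Placement"),
    ("Amit", "DBMS", "Scholarship"),
    ("Amit", "Networking", "Scholarship"),
    ("Amit", "Data Structure", "Scholarship"),
]

def mvd_holds(rows):
    """FACULTY ->> SUBJECT (and ->> COMMITTEE) holds when, per faculty,
    the rows are exactly the cross product of subjects and committees."""
    by_fac = {}
    for f, s, c in rows:
        subj, comm, pairs = by_fac.setdefault(f, (set(), set(), set()))
        subj.add(s)
        comm.add(c)
        pairs.add((s, c))
    return all(pairs == {(s, c) for s in subj for c in comm}
               for subj, comm, pairs in by_fac.values())

assert mvd_holds(faculty)  # every subject is paired with every committee
```

Dropping any single row breaks the cross-product property, which is exactly the "other tuples of a certain form must be present" requirement described above.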
Fourth Normal Form (4NF)
A table is in 4NF if it is in BCNF and it does not contain any multi-valued dependency.
Consider the following FACULTY table with a multi-valued dependency, where a faculty member
has multiple subjects to teach and is heading several committees.
FACULTY SUBJECT COMMITTEE
Amit DBMS Placement
Amit Networking Placement
Amit Data Structure Placement
Amit DBMS Scholarship
Amit Networking Scholarship
Amit Data Structure Scholarship
This relation is in BCNF, but it needs decomposition. The rule for decomposition is to decompose
the offending table into two, with the multi-determinant attribute as part of the key of both. In this
case, to put the relation in 4NF, two separate relations are formed as follows.
FACULTY SUBJECT
Amit DBMS
Amit Networking
Amit Data Structure
FACULTY COMMITTEE
Amit Placement
Amit Scholarship
Join Dependency
COMPANY_SUPPLIER
COMPANY SUPPLIER
Soap Mr.X
Soap Mr.Y
Shampoo Mr.Z
Shampoo Mr.X
Shampoo Mr.Y
The above-said redundancy has been eliminated, but we have lost some information. For example, if we
want to display the products and their suppliers, then we will have to use a join based on the
company attribute. The result will display some spurious records. For Mr. Z, it will display both
products, soap and shampoo, as the company for which Mr. Z is the supplier (Godrej) produces both
soap and shampoo, which is incorrect.
Now suppose that the original table were to be decomposed into three parts: company_product,
company_supplier, and one more, product_supplier, which is as shown:
PRODUCT_SUPPLIER
PRODUCT SUPPLIER
Soap Mr.X
Soap Mr. Y
Shampoo Mr.X
Shampoo Mr. Y
Shampoo Mr. Z
If a join is taken of all the projections, again we will get wrong results. So it is not possible to
decompose the original table without losing information. Thus, normalization
techniques cannot eliminate all redundancies, because it cannot be assumed that all
decompositions will be non-loss. So it is clear that if a table is in 4NF and cannot be further
non-loss decomposed, it is said to be in 5NF.
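The spurious-tuple problem can be demonstrated concretely. The three-column original table is not shown in the notes, so the rows below are assumed purely for illustration: Mr.Z supplies only Godrej's soap and Mr.Y only its shampoo.

```python
# Assumed original rows: (COMPANY, PRODUCT, SUPPLIER).
original = {
    ("Godrej", "Soap",    "Mr.Z"),
    ("Godrej", "Shampoo", "Mr.Y"),
}

# Two binary projections of the original table.
company_product  = {(c, p) for (c, p, _) in original}
company_supplier = {(c, s) for (c, _, s) in original}

# Natural join of the projections on the COMPANY attribute.
joined = {(c, p, s)
          for (c, p) in company_product
          for (c2, s) in company_supplier if c == c2}

# The join contains tuples that were never in the original table.
spurious = joined - original
```

The join produces four tuples where the original had two: the decomposition into these two projections is lossy, which is exactly why 5NF requires every remaining decomposition to be non-loss.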
Concurrency Control
Transaction Concept : A transaction is a unit of program execution that accesses and possibly updates
various data items. Usually, a transaction is initiated by a user program written in a high-level data-
manipulation language (typically SQL), or programming language (for example, C++, or Java), with
embedded database accesses in JDBC or ODBC. A transaction is delimited by statements (or function calls) of
the form begin transaction and end transaction. The transaction consists of all operations executed between the
begin transaction and end transaction. This collection of steps must appear to the user as a single, indivisible
unit.
ACID Properties:
a) Atomicity: Since a transaction is indivisible, it either executes in its entirety or not at all. Thus, if a
transaction begins to execute but fails for whatever reason, any changes to the database that the
transaction may have made must be undone. This requirement holds regardless of whether the
transaction itself failed (for example, if it divided by zero), the operating system crashed, or the computer
itself stopped operating. As we shall see, ensuring that this requirement is met is difficult since some
changes to the database may still be stored only in the main-memory variables of the transaction,
while others may have been written to the database and stored on disk. This “all-or-none” property is
referred to as atomicity.
b) Isolation: Furthermore, since a transaction is a single unit, its actions cannot appear to be separated
by other database operations not part of the transaction. While we wish to present this user-level
impression of transactions, we know that reality is quite different. Even a single SQL statement
involves many separate accesses to the database, and a transaction may consist of several SQL
statements. Therefore, the database system must take special actions to ensure that transactions
operate properly without interference from concurrently executing database statements. This property
is referred to as isolation.
c) Durability: Even if the system ensures correct execution of a transaction, this serves little purpose if
the system subsequently crashes and, as a result, the system “forgets” about the transaction. Thus, a
transaction’s actions must persist across crashes. This property is referred to as durability.
d) Consistency: Because of the above three properties, transactions are an ideal way of structuring
interaction with a database. This leads us to impose a requirement on transactions themselves. A
transaction must preserve database consistency—if a transaction is run atomically in isolation starting
from a consistent database, the database must again be consistent at the end of the transaction. This
consistency requirement goes beyond the data integrity constraints we have seen earlier (such as
primary-key constraints, referential integrity, check constraints, and the like). Rather, transactions are
expected to go beyond that to ensure preservation of those application-dependent consistency
constraints that are too complex to state using the SQL constructs for data integrity. How this is done
is the responsibility of the programmer who codes a transaction. This property is referred to as
consistency.
Example : We shall illustrate the transaction concept using a simple bank application consisting of several
accounts and a set of transactions that access and update those accounts. Transactions access data using two
operations:
• read(X), which transfers the data item X from the database to a variable, also called X, in a buffer in main
memory belonging to the transaction that executed the read operation.
• write(X), which transfers the value in the variable X in the main-memory buffer of the transaction that
executed the write to the data item X in the database.
It is important to know if a change to a data item appears only in main memory or if it has been written to the
database on disk. In a real database system, the write operation does not necessarily result in the immediate
update of the data on the disk; the write operation may be temporarily stored elsewhere and executed on the
disk later. For now, however, we shall assume that the write operation updates the database immediately.
Let Ti be a transaction that transfers $50 from account A to account B. This transaction can be defined as:
Ti : read(A);
A := A − 50;
write(A);
read(B);
B := B + 50;
write(B).
Let us now consider each of the ACID properties.
a) Atomicity requirement: if the transaction fails after step 3 and before step 6, money will be
"lost", leading to an inconsistent database state. Failure could be due to software or hardware.
The system should ensure that updates of a partially executed transaction are not reflected in
the database.
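The all-or-nothing behaviour can be sketched with an explicit undo step. This is a toy model; a real DBMS implements atomicity with logging and recovery, not by snapshotting Python dicts.

```python
def transfer(accounts, src, dst, amount):
    """All-or-nothing transfer: snapshot the old values and restore them
    (roll back) if any step fails part-way through."""
    old = {src: accounts[src], dst: accounts[dst]}
    try:
        accounts[src] -= amount          # read(A); A := A - amount; write(A)
        if accounts[src] < 0:
            raise ValueError("insufficient funds")
        accounts[dst] += amount          # read(B); B := B + amount; write(B)
    except Exception:
        accounts.update(old)             # undo the partial update
        raise

accounts = {"A": 100, "B": 200}
transfer(accounts, "A", "B", 50)         # succeeds: A=50, B=250
```

A failing transfer leaves the accounts exactly as they were, so no partial update ever becomes visible.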
b) Durability requirement: once the user has been notified that the transaction has completed (i.e., the
transfer of the $50 has taken place), the updates to the database by the transaction must persist even if
there are software or hardware failures
c) Consistency requirement: The consistency requirement here is that the sum of A and B be
unchanged by the execution of the transaction. Without the consistency requirement, money could be
created or destroyed by the transaction! It can be verified easily that, if the database is consistent
before an execution of the transaction, the database remains consistent after the execution of the
transaction.
Ensuring consistency for an individual transaction is the responsibility of the application programmer
who codes the transaction. This task may be facilitated by automatic testing of integrity constraints.
d) Isolation requirement: if between steps 3 and 6, another transaction T2 is allowed to access the
partially updated database, it will see an inconsistent database (the sum A + B will be less than it
should be).
T1 T2
1. read(A)
2. A := A – 50
3. write(A)
read(A), read(B), print(A+B)
4. read(B)
5. B := B + 50
6. write(B)
• Isolation can be ensured trivially by running transactions serially
o that is, one after the other.
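The anomaly in the isolation example can be reproduced directly; the following is a toy trace of the schedule above.

```python
A, B = 100, 200          # consistent state: A + B = 300

# T1 steps 1-3: debit A, but B has not yet been credited.
A = A - 50               # write(A)

# T2 runs here, between steps 3 and 6: it observes an inconsistent sum.
seen_by_t2 = A + B       # 250, not 300

# T1 steps 4-6: credit B, restoring consistency.
B = B + 50               # write(B)

final_sum = A + B        # back to 300 once T1 completes
```

Running the two transactions serially (one after the other) would have let T2 see only 300, never the intermediate 250.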
Transaction State: We need to be more precise about what we mean by successful completion of a
transaction. We therefore establish a simple abstract transaction model. A transaction must be in one of the
following states:
1. Active: the initial state; the transaction stays in this state while it is executing
2. Partially committed: after the final statement has been executed.
3. Failed: after the discovery that normal execution can no longer proceed.
4. Aborted: after the transaction has been rolled back and the database restored to its state prior to the
start of the transaction. Two options after it has been aborted:
❖ restart the transaction (can be done only if no internal logical error)
❖ kill the transaction
5. Committed: after successful completion
1- Improved throughput and resource utilization: the CPU and I/O activity of one transaction can
overlap with that of others.
2- Reduced waiting time. There may be a mix of transactions running on a system, some short and
some long. If transactions run serially, a short transaction may have to wait for a preceding long
transaction to complete, which can lead to unpredictable delays in running a transaction. If the
transactions are operating on different parts of the database, it is better to let them run concurrently,
sharing the CPU cycles and disk accesses among them. Concurrent execution reduces the
unpredictable delays in running transactions. Moreover, it also reduces the average response time: the
average time for a transaction to be completed after it has been submitted.
Schedule: a sequence of instructions that specifies the chronological order in which instructions of
concurrent transactions are executed.
❖ a schedule for a set of transactions must consist of all instructions of those transactions
❖ Must preserve the order in which the instructions appear in each individual transaction.
❖ A transaction that successfully completes its execution will have a commit instruction as the
last statement (by default a transaction is assumed to execute a commit instruction as its last step)
❖ A transaction that fails to successfully complete its execution will have an abort instruction as
the last statement
Schedule 1
➢ Let T1 transfer $50 from A to B, and T2 transfer 10% of the balance from A to B.
➢ A serial schedule in which T1 is followed by T2 :
Schedule 2
➢ A serial schedule where T2 is followed by T1
Schedule 3
➢ Let T1 and T2 be the transactions defined previously. The following schedule is not a serial
schedule, but it is equivalent to Schedule 1.
Note: In Schedules 1, 2 and 3, the sum A + B is preserved.
Schedule 4
➢ The following concurrent schedule does not preserve the value of (A + B ).
Serializability
Before we can consider how the concurrency-control component of the database system can ensure
serializability, we consider how to determine when a schedule is serializable. Certainly, serial schedules are
serializable, but if steps of multiple transactions are interleaved, it is harder to determine whether a schedule is
serializable.
A (possibly concurrent) schedule is serializable if it is equivalent to a serial schedule. Different forms of
schedule equivalence give rise to the notions of:
1. Conflict serializability
2. View serializability
Conflicting Instructions:
Instructions li and lj of transactions Ti and Tj respectively, conflict if and only if there exists some item Q
accessed by both li and lj, and at least one of these instructions wrote Q.
1. li = read(Q), lj = read(Q). li and lj don’t conflict.
2. li = read(Q), lj = write(Q). They conflict.
3. li = write(Q), lj = read(Q). They conflict.
4. li = write(Q), lj = write(Q). They conflict.
1) Conflict Serializability: If a schedule S can be transformed into a schedule S´ by a series of
swaps of non-conflicting instructions, we say that S and S´ are conflict equivalent. We say that a
schedule S is conflict serializable if it is conflict equivalent to a serial schedule.
Example: Schedule 3 can be transformed into Schedule 6, a serial schedule where T2 follows T1, by a
series of swaps of non-conflicting instructions. Therefore, Schedule 3 is conflict serializable.
We are unable to swap instructions in the above schedule to obtain either the serial schedule < T3, T4 >, or the
serial schedule < T4, T3 >
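Conflict serializability is usually tested with a precedence graph: add an edge Ti → Tj for every conflicting pair in which Ti's operation comes first, and check the graph for cycles. The sketch below is illustrative; the tuple encoding of a schedule is an assumption.

```python
def conflict_serializable(schedule):
    """schedule: list of (txn, op, item) tuples with op in {'r', 'w'}.
    Builds the precedence graph and tests it for cycles."""
    edges = set()
    for i, (ti, op_i, qi) in enumerate(schedule):
        for tj, op_j, qj in schedule[i + 1:]:
            if ti != tj and qi == qj and "w" in (op_i, op_j):
                edges.add((ti, tj))      # Ti's conflicting op comes first
    nodes = {t for t, _, _ in schedule}
    while nodes:                         # Kahn-style acyclicity check
        sources = {n for n in nodes
                   if not any(b == n and a in nodes for a, b in edges)}
        if not sources:
            return False                 # a cycle remains: not serializable
        nodes -= sources
    return True

# Schedule-3-style interleaving, conflict equivalent to the serial <T1, T2>.
s3 = [("T1", "r", "A"), ("T1", "w", "A"), ("T2", "r", "A"), ("T2", "w", "A"),
      ("T1", "r", "B"), ("T1", "w", "B"), ("T2", "r", "B"), ("T2", "w", "B")]
```

A schedule whose precedence graph contains a cycle, such as T1 reading A, T2 writing A, then T1 writing A, is rejected.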
2) View Serializability: Let S and S´ be two schedules with the same set of transactions. S and S´
are view equivalent if the following three conditions are met, for each data item Q,
1. If in schedule S, transaction Ti reads the initial value of Q, then in schedule S’ also transaction
Ti must read the initial value of Q.
2. If in schedule S transaction Ti executes read(Q), and that value was produced by transaction Tj
(if any), then in schedule S’ also transaction Ti must read the value of Q that was produced by
the same write(Q) operation of transaction Tj .
3. The transaction (if any) that performs the final write(Q) operation in schedule S must also
perform the final write(Q) operation in schedule S’.
Lock-based Protocols
Database systems equipped with lock-based protocols use a mechanism by which any transaction
cannot read or write data until it acquires an appropriate lock on it. Locks are of two kinds −
• Binary Locks − A lock on a data item can be in two states; it is either locked or unlocked.
• Shared/exclusive − This type of locking mechanism differentiates the locks based on their
uses. If a lock is acquired on a data item to perform a write operation, it is an exclusive lock.
Allowing more than one transaction to write on the same data item would lead the database
into an inconsistent state. Read locks are shared because no data value is being changed.
Simplistic lock-based protocols allow transactions to obtain a lock on every object before a 'write'
operation is performed. Transactions may unlock the data item after completing the ‘write’
operation.
In pre-claiming protocols, a transaction evaluates its operations and creates a list of data items on
which it needs locks. Before initiating execution, the transaction requests the system for all the locks it
needs beforehand. If all the locks are granted, the transaction executes and releases all the locks when all
its operations are over. If all the locks are not granted, the transaction rolls back and waits until all the
locks are granted.
Two-phase locking has two phases, one is growing, where all the locks are being acquired by the
transaction; and the second phase is shrinking, where the locks held by the transaction are being
released.
To claim an exclusive (write) lock, a transaction must first acquire a shared (read) lock and then
upgrade it to an exclusive lock.
The first phase of Strict-2PL is the same as that of 2PL. After acquiring all the locks in the first phase, the
transaction continues to execute normally. But in contrast to 2PL, Strict-2PL does not release a lock
after using it. Strict-2PL holds all the locks until the commit point and releases them all at once.
Timestamp-based Protocols
The most commonly used concurrency protocol is the timestamp based protocol. This protocol uses
either system time or logical counter as a timestamp.
Lock-based protocols manage the order between the conflicting pairs among transactions at the time
of execution, whereas timestamp-based protocols start working as soon as a transaction is created.
Every transaction has a timestamp associated with it, and the ordering is determined by the age of the
transaction. A transaction created at clock time 0002 would be older than all other transactions that
come after it. For example, any transaction 'y' entering the system at 0004 is two seconds younger,
and priority would be given to the older one.
In addition, every data item is given the latest read and write timestamps. This lets the system know
when the last read and write operations were performed on the data item.
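The read/write timestamp checks can be sketched as follows. This is a simplified model of basic timestamp ordering; Thomas' write rule and transaction rollback/restart are omitted.

```python
class TimestampScheduler:
    """Basic timestamp ordering: a read is rejected if a younger transaction
    already wrote the item; a write is rejected if a younger transaction
    already read or wrote it."""
    def __init__(self):
        self.rts = {}   # item -> largest timestamp that has read it
        self.wts = {}   # item -> largest timestamp that has written it

    def read(self, ts, item):
        if ts < self.wts.get(item, 0):
            return False                  # too old: transaction must roll back
        self.rts[item] = max(self.rts.get(item, 0), ts)
        return True

    def write(self, ts, item):
        if ts < self.rts.get(item, 0) or ts < self.wts.get(item, 0):
            return False                  # a younger txn already used the item
        self.wts[item] = ts
        return True

sched = TimestampScheduler()
assert sched.write(2, "A")      # transaction with timestamp 2 writes A
assert not sched.read(1, "A")   # older txn (ts=1) may not read the newer value
assert sched.read(3, "A")       # younger txn (ts=3) reads it fine
assert not sched.write(2, "A")  # A was read at ts=3, so a ts=2 write is rejected
```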
Database security
Introduction
DB2 database and functions can be managed by two different modes of security controls:
1. Authentication
2. Authorization
Authentication
Authentication is the process of confirming that a user logs in only in accordance with the rights to
perform the activities he is authorized to perform. User authentication can be performed at the operating
system level or the database level itself. Biometric authentication tools, such as retina and fingerprint
scanners, are also in use to keep the database safe from hackers or malicious users.
The database security can be managed from outside the DB2 database system.
For DB2, the security service is a part of the operating system, as a separate product. For authentication,
it requires two different credentials: a user ID or username, and a password.
Authorization
You can access the DB2 database and its functionality within the DB2 database system, which is
managed by the DB2 database manager. Authorization is a process managed by the DB2 database
manager. The manager obtains information about the current authenticated user that indicates which
database operations the user can perform or access.
Secondary permissions: granted through the groups and roles of which the user is a member.
• System-level authorization
  • System administrator [SYSADM]
  • System control [SYSCTRL]
  • System maintenance [SYSMAINT]
  • System monitor [SYSMON]
• Database-level authorization
  • Security administrator [SECADM]
  • Database administrator [DBADM]
  • Access control [ACCESSCTRL]
  • Data access [DATAACCESS]
  • SQL administrator [SQLADM]
  • Workload management administrator [WLMADM]
  • Explain [EXPLAIN]
Authorities provide controls within the database. Other authorities for the database include LOAD
and CONNECT.
DB2 tables and configuration files are used to record the permissions associated with authorization
names. When a user tries to access the data, the recorded permissions verify the following
permissions: