You are on page 1of 15

Types of keys in DBMS

Primary Key – A primary is a column or set of columns in a table that uniquely


identifies tuples (rows) in that table.

Super Key – A super key is a set of one of more columns (attributes) to


uniquely identify rows in a table.

Candidate Key – A super key with no redundant attribute is known as


candidate key

Alternate Key – Out of all candidate keys, only one gets selected as primary
key, remaining keys are known as alternate or secondary keys.

Composite Key – A key that consists of more than one attribute to uniquely
identify rows (also known as records & tuples) in a table is called composite
key.

Foreign Key – Foreign keys are the columns of a table that points to the
primary key of another table. They act as a cross-reference between tables.

Normalization in DBMS: 1NF, 2NF, 3NF and BCNF in Database

Normalization is a process of organizing the data in database to avoid data redundancy,


insertion anomaly, update anomaly & deletion anomaly. Let’s discuss about anomalies first
then we will discuss normal forms with examples.

Anomalies in DBMS
There are three types of anomalies that occur when the database is not normalized. These
are: Insertion, update and deletion anomaly. Let’s take an example to understand this.

Example: A manufacturing company stores the employee details in a table Employee that


has four attributes: Emp_Id for storing employee’s id, Emp_Name for storing employee’s
name, Emp_Address for storing employee’s address and Emp_Dept for storing the
department details in which the employee works. At some point of time the table looks like
this:

Emp_Id Emp_Name Emp_Address Emp_Dept

101 Rick Delhi D001

101 Rick Delhi D002


123 Maggie Agra D890

166 Glenn Chennai D900

166 Glenn Chennai D004


This table is not normalized. We will see the problems that we face when a table in
database is not normalized.

Update anomaly: In the above table we have two rows for employee Rick as he belongs to
two departments of the company. If we want to update the address of Rick then we have to
update the same in two rows or the data will become inconsistent. If somehow, the correct
address gets updated in one department but not in other then as per the database, Rick would
be having two different addresses, which is not correct and would lead to inconsistent data.

Insert anomaly: Suppose a new employee joins the company, who is under training and
currently not assigned to any department then we would not be able to insert the data into the
table if Emp_Dept field doesn’t allow null.

Delete anomaly: Let’s say in future, company closes the department D890 then deleting the
rows that are having Emp_Dept as D890 would also delete the information of
employee Maggie since she is assigned only to this department.

To overcome these anomalies we need to normalize the data. In the next section we will
discuss about normalization.

Normalization
Here are the most commonly used normal forms:

 First normal form(1NF)


 Second normal form(2NF)
 Third normal form(3NF)
 Boyce & Codd normal form (BCNF)

First normal form (1NF)


A relation is said to be in 1NF (first normal form), if it doesn’t contain any multi-valued
attribute. In other words you can say that a relation is in 1NF if each attribute contains only
atomic(single) value only.

As per the rule of first normal form, an attribute (column) of a table cannot hold multiple
values. It should hold only atomic values.

Example: Let’s say a company wants to store the names and contact details of its employees.
It creates a table in the database that looks like this:

Emp_Id Emp_Name Emp_Address Emp_Mobile


101 Herschel New Delhi 8912312390

8812121212 ,
102 Jon Kanpur
9900012222

103 Ron Chennai 7778881212

9990000123,
104 Lester Bangalore
8123450987
Two employees (Jon & Lester) have two mobile numbers that caused the Emp_Mobile field
to have multiple values for these two employees.

This table is not in 1NF as the rule says “each attribute of a table must have atomic (single)
values”, the Emp_Mobile values for employees Jon & Lester violates that rule.

To make the table complies with 1NF we need to create separate rows for the each mobile
number in such a way so that none of the attributes contains multiple values.

Emp_Id Emp_Name Emp_Address Emp_Mobile

101 Herschel New Delhi 8912312390

102 Jon Kanpur 8812121212

102 Jon Kanpur 9900012222

103 Ron Chennai 7778881212

104 Lester Bangalore 9990000123

104 Lester Bangalore 8123450987

To learn more about 1NF refer this article: 1NF

Second normal form (2NF)


A table is said to be in 2NF if both the following conditions hold:

 Table is in 1NF (First normal form)


 No non-prime attribute is dependent on the proper subset of any candidate
key of table.

An attribute that is not part of any candidate key is known as non-prime attribute.

Example: Let’s say a school wants to store the data of teachers and the subjects they teach.
They create a table Teacher that looks like this: Since a teacher can teach more than one
subjects, the table can have multiple rows for a same teacher.
Teacher_Id Subject Teacher_Age
111 Maths 38
111 Physics 38
222 Biology 38
333 Physics 40
333 Chemistry 40
Candidate Keys: {Teacher_Id, Subject}
Non prime attribute: Teacher_Age

This table is in 1 NF because each attribute has atomic values. However, it is not in 2NF
because non prime attribute Teacher_Age is dependent on Teacher_Id alone which is a
proper subset of candidate key. This violates the rule for 2NF as the rule says “no non-prime
attribute is dependent on the proper subset of any candidate key of the table”.

To make the table complies with 2NF we can disintegrate it in two tables like this:
Teacher_Details table:

Teacher_Id Teacher_Age

111 38

222 38

333 40
Teacher_Subject table:

Teacher_Id Subject
111 Maths
111 Physics
222 Biology
333 Physics
333 Chemistry
Now the tables are in Second normal form (2NF). To learn more about 2NF refer this
guide: 2NF

Third Normal form (3NF)


A table design is said to be in 3NF if both the following conditions hold:

 Table must be in 2NF


 Transitive functional dependency of non-prime attribute on any super key
should be removed.

An attribute that is not part of any candidate key is known as non-prime attribute.

In other words 3NF can be explained like this: A table is in 3NF if it is in 2NF and for each
functional dependency X-> Y at least one of the following conditions hold:

 X is a super key of table


 Y is a prime attribute of table

An attribute that is a part of one of the candidate keys is known as prime attribute.

Example: Let’s say a company wants to store the complete address of each employee, they
create a table named Employee_Details that looks like this:

Emp Emp_NaEmp_ Emp_St Emp_ Emp_Dis


_Id me Zip ate City trict

28200 Dayal
1001 John UP Agra
5 Bagh

22200 Chenn
1002 Ajeet TN M-City
8 ai

28200 Chenn Urrapakk


1006 Lora TN
7 ai am

29200
1101 Lilly UK Pauri Bhagwan
8

22299 Gwalio
1201 Steve MP Ratan
9 r

Super keys: {Emp_Id}, {Emp_Id, Emp_Name}, {Emp_Id, Emp_Name, Emp_Zip}…so on


Candidate Keys: {Emp_Id}
Non-prime attributes: all attributes except Emp_Id are non-prime as they are not part of any
candidate keys.

Here, Emp_State, Emp_City & Emp_District dependent on Emp_Zip. Further Emp_zip is


dependent on Emp_Id that makes non-prime attributes
(Emp_State, Emp_City & Emp_District) transitively dependent on super key (Emp_Id). This
violates the rule of 3NF.
To make this table complies with 3NF we have to disintegrate the table into two tables to
remove the transitive dependency:

Employee Table:

Emp_Id Emp_Name Emp_Zip

1001 John 282005

1002 Ajeet 222008

1006 Lora 282007

1101 Lilly 292008

1201 Steve 222999

Employee_Zip table:

Emp_Zip Emp_State Emp_City Emp_District

282005 UP Agra Dayal Bagh

222008 TN Chennai M-City

282007 TN Chennai Urrapakkam

292008 UK Pauri Bhagwan

222999 MP Gwalior Ratan


Boyce Codd normal form (BCNF)
It is an advance version of 3NF that’s why it is also referred as 3.5NF. BCNF is stricter than
3NF. A table complies with BCNF if it is in 3NF and for every functional dependency X-
>Y, X should be the super key of the table.

Example: Suppose there is a company wherein employees work in more than one
department. They store the data like this:

Emp_Id Emp_Nationality Emp_Dept Dept_Type Dept_No_Of_Emp


1001 Austrian Production and planning D001 200
1001 Austrian stores D001 250
1002 American design and technical support D134 100
1002 American Purchasing department D134 600

Functional dependencies in the table above:


Emp_Id -> Emp_Nationality
Emp_Dept -> {Dept_Type, Dept_No_Of_Emp}

Candidate key: {Emp_Id, Emp_Dept}

The table is not in BCNF as neither Emp_Id nor Emp_Dept alone are keys.

To make the table comply with BCNF we can break the table in three tables like this:
Emp_Nationality table:

Emp_Id Emp_Nationality

1001 Austrian

1002 American

Emp_Dept table:

Emp_Dept Dept_Type Dept_No_Of_Emp

Production and planning D001 200

stores D001 250

design and technical support D134 100

Purchasing department D134 600

Emp_Dept_Mapping table:

Emp_Id Emp_Dept
1001 Production and planning
1001 stores
1002 design and technical support

1002 Purchasing department

Functional dependencies:
Emp_Id -> Emp_Nationality
Emp_Dept -> {Dept_Type, Dept_No_Of_Emp}

Candidate keys:
For first table: Emp_Id
For second table: Emp_Dept
For third table: {Emp_Id, Emp_Dept}

This table is now in BCNF as in both the functional dependencies left side part is a key.

Functional dependency in DBMS

The attributes of a table is said to be dependent on each other when an attribute of a table
uniquely identifies another attribute of the same table.

For example: Suppose we have a student table with attributes: Stu_Id, Stu_Name, Stu_Age.
Here Stu_Id attribute uniquely identifies the Stu_Name attribute of student table because if
we know the student id we can tell the student name associated with it. This is known as
functional dependency and can be written as Stu_Id->Stu_Name or in words we can say
Stu_Name is functionally dependent on Stu_Id.

Formally:
If column A of a table uniquely identifies the column B of same table then it can represented
as A->B (Attribute B is functionally dependent on attribute A)

Types of Functional Dependencies

 Trivial functional dependency


 non-trivial functional dependency
 Multivalued dependency
 Transitive dependency

Transaction Management in DBMS

A transaction is a set of logically related operations. For example, you are transferring
money from your bank account to your friend’s account, the set of operations would be like
this:
Simple Transaction Example
1. Read your account balance
2. Deduct the amount from your balance
3. Write the remaining balance to your account
4. Read your friend’s account balance
5. Add the amount to his account balance
6. Write the new updated balance to his account

This whole set of operations can be called a transaction. Although I have shown you read,
write and update operations in the above example but the transaction can have operations like
read, write, insert, update, delete.

In DBMS, we write the above 6 steps transaction like this:


Lets say your account is A and your friend’s account is B, you are transferring 10000 from A
to B, the steps of the transaction are:

1. R(A);
2. A = A - 10000;
3. W(A);
4. R(B);
5. B = B + 10000;
6. W(B);
In the above transaction R refers to the Read operation and W refers to the write operation.

Transaction failure in between the operations


Now that we understand what is transaction, we should understand what are the problems
associated with it.

The main problem that can happen during a transaction is that the transaction can fail before
finishing the all the operations in the set. This can happen due to power failure, system crash
etc. This is a serious problem that can leave database in an inconsistent state. Assume that
transaction fail after third operation (see the example above) then the amount would be
deducted from your account but your friend will not receive it.

To solve this problem, we have the following two operations

Commit: If all the operations in a transaction are completed successfully then commit those
changes to the database permanently.

Rollback: If any of the operation fails then rollback all the changes done by previous
operations.

Even though these operations can help us avoiding several issues that may arise during
transaction but they are not sufficient when two transactions are running concurrently. To
handle those problems we need to understand database ACID properties.
ACID properties in DBMS

 Atomicity: This property ensures that either all the operations of a transaction
reflect in database or none. Let’s take an example of banking system to
understand this: Suppose Account A has a balance of 400$ & B has 700$.
Account A is transferring 100$ to Account B. This is a transaction that has two
operations a) Debiting 100$ from A’s balance b) Creating 100$ to B’s balance.
Let’s say first operation passed successfully while second failed, in this case
A’s balance would be 300$ while B would be having 700$ instead of 800$.
This is unacceptable in a banking system. Either the transaction should fail
without executing any of the operation or it should process both the operations.
The Atomicity property ensures that.
 Consistency: To preserve the consistency of database, the execution of
transaction should take place in isolation (that means no other transaction
should run concurrently when there is a transaction already running). For
example account A is having a balance of 400$ and it is transferring 100$ to
account B & C both. So we have two transactions here. Let’s say these
transactions run concurrently and both the transactions read 400$ balance, in
that case the final balance of A would be 300$ instead of 200$. This is wrong.
If the transaction were to run in isolation then the second transaction would
have read the correct balance 300$ (before debiting 100$) once the first
transaction went successful.
 Isolation: For every pair of transactions, one transaction should start execution
only when the other finished execution. I have already discussed the example of
Isolation in the Consistency property above.
 Durability: Once a transaction completes successfully, the changes it has made
into the database should be permanent even if there is a system failure. The
recovery-management component of database systems ensures the durability of
transaction.

DBMS Transaction States


DBMS Transaction States Diagram

Lets discuss these states one by one.

Active State
As we have discussed in the DBMS transaction introduction that a transaction is a
sequence of operations. If a transaction is in execution then it is said to be in active state. It
doesn’t matter which step is in execution, until unless the transaction is executing, it remains
in active state.

Failed State
If a transaction is executing and a failure occurs, either a hardware failure or a software
failure then the transaction goes into failed state from the active state.

Partially Committed State


As we can see in the above diagram that a transaction goes into “partially committed” state
from the active state when there are read and write operations present in the transaction.

A transaction contains number of read and write operations. Once the whole transaction is
successfully executed, the transaction goes into partially committed state where we have all
the read and write operations performed on the main memory (local memory) instead of the
actual database.

The reason why we have this state is because a transaction can fail during execution so if we
are making the changes in the actual database instead of local memory, database may be left
in an inconsistent state in case of any failure. This state helps us to rollback the changes
made to the database in case of a failure during execution.

Committed State
If a transaction completes the execution successfully then all the changes made in the local
memory during partially committed state are permanently stored in the database. You can
also see in the above diagram that a transaction goes from partially committed state to
committed state when everything is successful.

Aborted State
As we have seen above, if a transaction fails during execution then the transaction goes into a
failed state. The changes made into the local memory (or buffer) are rolled back to the
previous consistent state and the transaction goes into aborted state from the failed state.
Refer the diagram to see the interaction between failed and aborted state.

DBMS Schedules and the Types of Schedules

We know that transactions are set of instructions and these instructions perform operations


on database. When multiple transactions are running concurrently then there needs to be a
sequence in which the operations are performed because at a time only one operation can be
performed on the database. This sequence of operations is known as Schedule.

Lets take an example to understand what is a schedule in DBMS.

DBMS Schedule example


The following sequence of operations is a schedule. Here we have two transactions T1 & T2
which are running concurrently.

This schedule determines the exact order of operations that are going to be performed on
database. In this example, all the instructions of transaction T1 are executed before the
instructions of transaction T2, however this is not always necessary and we can have various
types of schedules which we will discuss in this article.

T1 T2
---- ----
R(X)
W(X)
R(Y)
R(Y)
R(X)
W(Y)
Types of Schedules in DBMS
We have various types of schedules in DBMS. Lets discuss them one by one.

Serial Schedule
In Serial schedule, a transaction is executed completely before starting the execution of
another transaction. In other words, you can say that in serial schedule, a transaction does not
start execution until the currently running transaction finished execution. This type of
execution of transaction is also known as non-interleaved execution. The example we have
seen above is the serial schedule.

Lets take another example.

Serial Schedule example


Here R refers to the read operation and W refers to the write operation. In this example, the
transaction T2 does not start execution until the transaction T1 is finished.

T1 T2
---- ----
R(A)
R(B)
W(A)
commit
R(B)
R(A)
W(B)
commit
Strict Schedule
In Strict schedule, if the write operation of a transaction precedes a conflicting operation
(Read or Write operation) of another transaction then the commit or abort operation of such
transaction should also precede the conflicting operation of other transaction.

Lets take an example.

Strict Schedule example


Lets say we have two transactions Ta and Tb. The write operation of transaction Ta precedes
the read or write operation of transaction Tb, so the commit or abort operation of transaction
Ta should also precede the read or write of Tb.

Ta Tb
----- -----
R(X)
R(X)
W(X)
commit
W(X)
R(X)
commit
Here the write operation W(X) of Ta precedes the conflicting operation (Read or Write
operation) of Tb so the conflicting operation of Tb had to wait the commit operation of Ta.

Cascadeless Schedule
In Cascadeless Schedule, if a transaction is going to perform read operation on a value, it has
to wait until the transaction who is performing write on that value commits.

Cascadeless Schedule example


For example, lets say we have two transactions Ta and Tb. Tb is going to read the value X
after the W(X) of Ta then Tb has to wait for the commit operation of transaction Ta before it
reads the X.

Ta Tb
----- -----
R(X)
W(X)
W(X)
commit
R(X)
W(X)
commit
Recoverable Schedule
In Recoverable schedule, if a transaction is reading a value which has been updated by some
other transaction then this transaction can commit only after the commit of other transaction
which is updating value.
Recoverable Schedule example
Here Tb is performing read operation on X after the Ta has made changes in X using W(X)
so Tb can only commit after the commit operation of Ta.

Ta Tb
----- -----
R(X)
W(X)
R(X)
W(X)
R(X)
commit
commit

DBMS Serializability

When multiple transactions are running concurrently then there is a possibility that the
database may be left in an inconsistent state. Serializability is a concept that helps us to check
which schedules are serializable. A serializable schedule is the one that always leaves the
database in consistent state.

What is a serializable schedule?


A serializable schedule always leaves the database in consistent state. A serial schedule is
always a serializable schedule because in serial schedule, a transaction only starts when the
other transaction finished execution. However a non-serial schedule needs to be checked for
Serializability.

A non-serial schedule of n number of transactions is said to be serializable schedule, if it is


equivalent to the serial schedule of those n transactions. A serial schedule doesn’t allow
concurrency, only one transaction executes at a time and the other starts when the already
running transaction finished.

Types of Serializability
There are two types of Serializability.

1. Conflict Serializability
2. View Serializability

You might also like