
UNIT-2

DATABASE INTEGRITY AND NORMALIZATION


Relational Database Integrity
• A database is a collection of related data organised as relations (tables).
• Relations must satisfy certain properties: no duplicate tuples, no ordering of
the tuples, atomic attribute values, etc.
• Integrity means maintaining the consistency of the data.
• Integrity constraints ensure that authorised users cannot compromise data
consistency.
• They prevent damage to the database.
• Two main integrity constraints: the entity integrity constraint and the
referential integrity constraint.
Keys
• Key
• Simple key
• Super key
• Composite key
• Candidate key
• Primary key
• Foreign key
• Alternate key
Referential Integrity constraint
• refers to the relationship between tables.
• Because each table in a database must have a primary key, this
primary key can appear in other tables because of its relationship to
data within those tables.
• When a primary key from one table appears in another table, it is
called a foreign key.
• Foreign keys join tables and establish dependencies between tables.
• Tables can form a hierarchy of dependencies in such a way that if you
change or delete a row in one table, you destroy the meaning of
rows in other tables.
Example
Customer Table
Customer_No.   First_Name   Last_Name
103            Phillip      Currie
106            George       Watson

Orders Table
Order_No.   Order_Date    Customer_No.
1002        05/21/1998    101
1003        05/22/1998    104
1004        05/22/1998    106

Customer Calls Table
Customer_No.   Call_time    User_id
106            1998-06-12   maryl
119            1998-07-07   richc
119            1998-07-01   riche
The tables above show that the Customer_No. column of the Customer table is a
primary key for that table and a foreign key in the Orders and Customer Calls
tables. Customer number 106, George Watson, is referenced in both the Orders
and Customer Calls tables. If customer 106 is deleted from the Customer table,
the link between the three tables and this particular customer is destroyed.
When you delete a row that contains a primary key or update it with a
different primary key, you destroy the meaning of any rows that contain that
value as a foreign key.
Referential integrity is the logical dependency of a foreign key on a primary
key.
The integrity of a row that contains a foreign key depends on the integrity of
the row that it references—the row that contains the matching primary key.
By default, the database server does not allow you to violate referential
integrity and gives you an error message if you attempt to delete rows from
the parent table before you delete rows from the child table.
You can, however, use the ON DELETE CASCADE option to cause deletes from a
parent table to trigger deletes on child tables.
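A minimal sketch of this behaviour using Python's built-in sqlite3 module (the
table and column names loosely follow the Customer/Orders example above; the
exact schema is an assumption, not taken from the notes):

    import sqlite3

    # In-memory database; SQLite needs foreign-key enforcement switched on per connection.
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")

    conn.executescript("""
    CREATE TABLE customer (
        customer_no INTEGER PRIMARY KEY,
        first_name  TEXT,
        last_name   TEXT
    );
    CREATE TABLE orders (
        order_no    INTEGER PRIMARY KEY,
        order_date  TEXT,
        customer_no INTEGER REFERENCES customer(customer_no) ON DELETE CASCADE
    );
    INSERT INTO customer VALUES (106, 'George', 'Watson');
    INSERT INTO orders   VALUES (1004, '1998-05-22', 106);
    """)

    # Inserting an order for a non-existent customer violates referential integrity.
    try:
        conn.execute("INSERT INTO orders VALUES (1005, '1998-05-23', 999)")
    except sqlite3.IntegrityError as e:
        print("rejected:", e)          # FOREIGN KEY constraint failed

    # Deleting the parent row cascades to its child rows instead of raising an error.
    conn.execute("DELETE FROM customer WHERE customer_no = 106")
    print(conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0])   # 0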
Entity Integrity Constraint
• The entity integrity constraint states that a primary key value cannot be
null.
• This is because the primary key value is used to identify individual rows in
a relation; if the primary key were null, we could not identify those rows.
• Columns other than the primary key may contain null values.
Example
Let us take the example of a table Employee having columns:

Emp_Id   Name   Address   Pincode   Passport_No   Salary

Let's say Emp_Id is the primary key of the table. From the definition of entity
integrity, the value of Emp_Id cannot be null, as it uniquely identifies an
employee record in the table.

Thus no primary key column of any row in a table can have a null value.
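A small sketch of the constraint in action, again with sqlite3 (note that
SQLite needs NOT NULL stated explicitly alongside a non-integer PRIMARY KEY;
other engines imply it):

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("""
    CREATE TABLE employee (
        emp_id      TEXT PRIMARY KEY NOT NULL,  -- entity integrity: the key can never be null
        name        TEXT,
        passport_no TEXT                        -- non-key columns may be null
    )""")

    conn.execute("INSERT INTO employee VALUES ('E101', 'Asha', NULL)")   # accepted

    try:
        conn.execute("INSERT INTO employee VALUES (NULL, 'Ravi', 'P42')")
    except sqlite3.IntegrityError as e:
        print("rejected:", e)   # NOT NULL constraint failed: employee.emp_id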
Redundancy and associated problems
Redundancy means having multiple copies of the same data in the database.
Problems caused by redundancy are:
 Insertion anomaly,
 Deletion anomaly, and
 Update anomaly.
A database anomaly is a flaw that usually arises from poor planning and from
storing everything in a flat database (one table). It is generally removed by
the process of normalization, which is performed by splitting and joining
tables.
EXAMPLE
Consider a relation emp_dept with attributes:
1. E# (the primary key)
2. Ename
3. Address
4. D#
5. Dname
6. Dmgr#
• Insertion anomaly: Let us assume that a new department has been
started by the organization but initially there is no employee
appointed for that department, then the tuple for this department
cannot be inserted into this table as the E# will have NULL, which is
not allowed as E# is primary key.
This kind of a problem in the relation where some tuple cannot be
inserted is known as insertion anomaly.
• Deletion anomaly: Now consider there is only one employee in some
department and that employee leaves the organization, then the
tuple of that employee has to be deleted from the table, but in
addition to that the information about the department also will get
deleted.
This kind of a problem in the relation where deletion of some tuples can
lead to loss of some other data not intended to be removed is known as
deletion anomaly.
• Modification /update anomaly: Suppose the manager of a
department has changed, this requires that the Dmgr# in all the
tuples corresponding to that department must be changed to reflect
the new status. If we fail to update all the tuples of the given
department, then two different records of employee working in the
same department might show different Dmgr# leading to
inconsistency in the database.
This is known as a modification/update anomaly. Data redundancy cannot be
totally removed from the database, but there should be controlled redundancy.
For example, consider a relation Student_report(S#, Sname, Course#, SubjectName,
Marks) to store the marks of a student for a course having some optional
subjects, where all the students might not select the same optional papers. The
student name appears in every tuple, which is redundant, so we can split it into
two tables:
1. Students(S#, Sname, CourseName)
2. Report(S#, SubjectName, Marks)
However, if we want to print the mark sheet for every student using these
tables, then a join operation, which is costly in terms of the resources
required, has to be performed in order to get the name of the student. So, to
save on resource utilisation, we might opt to store the single relation
Student_report only.
Dependencies
• A database is a collection of related information
• Thus it is mandatory that some items of information in the DB would
depend on some other items of information.
• Information can be:
i. single-valued (Name of the person)or
ii. multi-valued (qualification of a person)
Functional Dependencies
A functional dependency is a constraint that specifies the relationship between
two sets of attributes, where one set can accurately determine the value of the
other. It is denoted as X → Y, where X is a set of attributes capable of
determining the value of Y. The attribute set on the left side of the arrow, X,
is called the determinant, while Y on the right side is called the dependent.
Example:

roll_no   name   dept_name   dept_building
42        abc    CS          A4
43        pqr    IT          A3
44        xyz    CS          A4
45        xyz    IT          A3
46        mno    EC          B2
47        jkl    ME          B2
From the table we can conclude some valid functional dependencies:
 roll_no → {name, dept_name, dept_building}: roll_no can determine the values
of the fields name, dept_name and dept_building, hence this is a valid
functional dependency.
 roll_no → dept_name: since roll_no can determine the whole set {name,
dept_name, dept_building}, it can also determine its subset dept_name.
 dept_name → dept_building: dept_name identifies the dept_building accurately,
since each department name is associated with exactly one building.
 More valid functional dependencies: roll_no → name, {roll_no, name} →
{dept_name, dept_building}, etc.
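These checks can be automated. Below is a small sketch (not from the original
notes) that tests whether a candidate dependency X → Y holds in a table of
rows:

    # Each row is a dict; X -> Y holds if every value of X maps to exactly one value of Y.
    rows = [
        {"roll_no": 42, "name": "abc", "dept_name": "CS", "dept_building": "A4"},
        {"roll_no": 43, "name": "pqr", "dept_name": "IT", "dept_building": "A3"},
        {"roll_no": 44, "name": "xyz", "dept_name": "CS", "dept_building": "A4"},
        {"roll_no": 45, "name": "xyz", "dept_name": "IT", "dept_building": "A3"},
        {"roll_no": 46, "name": "mno", "dept_name": "EC", "dept_building": "B2"},
        {"roll_no": 47, "name": "jkl", "dept_name": "ME", "dept_building": "B2"},
    ]

    def holds(rows, X, Y):
        seen = {}                                  # maps X-values -> Y-values already observed
        for r in rows:
            x = tuple(r[a] for a in X)
            y = tuple(r[a] for a in Y)
            if seen.setdefault(x, y) != y:         # same determinant, different dependent
                return False
        return True

    print(holds(rows, ["roll_no"], ["name", "dept_name"]))   # True  (valid FD)
    print(holds(rows, ["dept_name"], ["dept_building"]))     # True  (valid FD)
    print(holds(rows, ["dept_building"], ["dept_name"]))     # False (B2 has EC and ME)
    print(holds(rows, ["name"], ["roll_no"]))                # False (xyz appears twice)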
Properties of functional dependencies

1. Reflexivity: If Y is a subset of X, then X → Y holds by the reflexivity rule.
For example, {roll_no, name} → name is valid.
2. Augmentation: If X → Y is a valid dependency, then XZ → YZ is also valid by
the augmentation rule.
For example, if {roll_no, name} → dept_building is valid, then
{roll_no, name, dept_name} → {dept_building, dept_name} is also valid.
3. Transitivity: If X → Y and Y → Z are both valid dependencies, then X → Z is
also valid by the transitivity rule.
For example, if roll_no → dept_name and dept_name → dept_building, then
roll_no → dept_building is also valid.
Types of Functional dependencies

1. Trivial functional dependency
2. Non-Trivial functional dependency
3. Multivalued functional dependency
4. Transitive functional dependency
1. Trivial Functional Dependency
In Trivial Functional Dependency, a dependent is always a subset of the
determinant.
i.e. If X → Y and Y is the subset of X, then it is called trivial functional
dependency
For example,
roll_no   name   age
42        abc    17
43        pqr    18
44        xyz    18

Here, {roll_no, name} → name is a trivial functional dependency, since the
dependent name is a subset of the determinant set {roll_no, name}.
Similarly, roll_no → roll_no is also an example of a trivial functional
dependency.
Here are some invalid functional dependencies:

 name → dept_name: students with the same name can have different dept_name
values, hence this is not a valid functional dependency.
 dept_building → dept_name: there can be multiple departments in the same
building. For example, in the table above the departments ME and EC are in the
same building B2, hence dept_building → dept_name is an invalid functional
dependency.
 More invalid functional dependencies: name → roll_no, {name, dept_name} →
roll_no, dept_building → roll_no, etc.
2. Non-trivial Functional Dependency
In Non-trivial functional dependency, the dependent is strictly not a subset of
the determinant.
i.e. If X → Y and Y is not a subset of X, then it is called Non-trivial functional
dependency.
For example,
roll_no   name   age
42        abc    17
43        pqr    18
44        xyz    18

Here, roll_no → name is a non-trivial functional dependency, since the
dependent name is not a subset of the determinant roll_no.
Similarly, {roll_no, name} → age is also a non-trivial functional dependency,
since age is not a subset of {roll_no, name}.
3. Multivalued Functional Dependency
In Multivalued functional dependency, entities of the dependent set are
not dependent on each other.
i.e. If a → {b, c} and there exists no functional dependency between b
and c, then it is called a multivalued functional dependency.
For example,

roll_no   name   age
42        abc    17
43        pqr    18
44        xyz    18
45        abc    19

Here, roll_no → {name, age} is a multivalued functional dependency, since the
dependents name and age are not dependent on each other (i.e. neither
name → age nor age → name holds).
4. Transitive Functional Dependency
In transitive functional dependency, dependent is indirectly dependent on
determinant.
i.e. If a → b & b → c, then according to axiom of transitivity, a → c. This is a
transitive functional dependency.
For example,

enrol_no   name   dept   building_no
42         abc    CO     4
43         pqr    EC     2
44         xyz    IT     1
45         abc    EC     2

Here, enrol_no → dept and dept → building_no.
Hence, according to the axiom of transitivity, enrol_no → building_no is a
valid functional dependency. This is an indirect (transitive) functional
dependency.
5. Join Dependency
Join dependency is a constraint which is similar to functional
dependency or multivalued dependency. It is satisfied if and only if the
relation concerned is the join of a certain number of projections. Such
type of constraint is called join dependency.
For example, ENROLLMENT
student course lecturer
1001 COMP104 1
1001 COMP171 3
1002 COMP104 2
1002 COMP171 3
1003 ELEC102 4
1003 ELEC151 5
1004 ELEC102 4
1004 ELEC151 6
In the relation ENROLLMENT:
– JD ((student, course), (course, lecturer), (student, lecturer)) holds,
– but JD ((student, course), (course, lecturer)) does not.
– Decompose the relation ENROLLMENT into 3 relations as follows:
(student, course)        (course, lecturer)       (student, lecturer)
student   course         course    lecturer       student   lecturer
1001      COMP104        COMP104   1              1001      1
1001      COMP171        COMP104   2              1001      3
1002      COMP104        COMP171   3              1002      2
1002      COMP171        ELEC102   4              1002      3
1003      ELEC102        ELEC151   5              1003      4
1003      ELEC151        ELEC151   6              1003      5
1004      ELEC102                                 1004      4
1004      ELEC151                                 1004      6

– When the three relations are joined back together, the result is the same as
before decomposing.
– But joining only the first two relations ((student, course), (course,
lecturer)) would generate some spurious tuples.
– The natural join of those two relations is shown below.
student   course    lecturer
1001      COMP104   1
1001      COMP104   2      (*)
1001      COMP171   3
1002      COMP104   1      (*)
1002      COMP104   2
1002      COMP171   3
1003      ELEC102   4
1003      ELEC151   5
1003      ELEC151   6      (*)
1004      ELEC102   4
1004      ELEC151   5      (*)
1004      ELEC151   6

The tuples marked (*) are spurious; they do not exist in the original
ENROLLMENT relation. Therefore the join dependency on just these two
projections does not hold.
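A small sketch (not part of the original notes) that reproduces this check in
code: it projects the relation, re-joins the projections with a natural join,
and compares the result with the original.

    from itertools import product

    # The ENROLLMENT relation as a set of (student, course, lecturer) tuples.
    ENROLLMENT = {
        (1001, "COMP104", 1), (1001, "COMP171", 3),
        (1002, "COMP104", 2), (1002, "COMP171", 3),
        (1003, "ELEC102", 4), (1003, "ELEC151", 5),
        (1004, "ELEC102", 4), (1004, "ELEC151", 6),
    }
    ATTRS = ("student", "course", "lecturer")

    def project(rel, attrs):
        idx = [ATTRS.index(a) for a in attrs]
        return {tuple(t[i] for i in idx) for t in rel}

    def natural_join(rel_a, attrs_a, rel_b, attrs_b):
        """Join two projections back into (student, course, lecturer) tuples."""
        out = set()
        for ta, tb in product(rel_a, rel_b):
            row_a = dict(zip(attrs_a, ta))
            row_b = dict(zip(attrs_b, tb))
            common = set(attrs_a) & set(attrs_b)
            if all(row_a[c] == row_b[c] for c in common):
                merged = {**row_a, **row_b}
                out.add(tuple(merged[a] for a in ATTRS))
        return out

    sc = project(ENROLLMENT, ("student", "course"))
    cl = project(ENROLLMENT, ("course", "lecturer"))
    sl = project(ENROLLMENT, ("student", "lecturer"))

    two_way = natural_join(sc, ("student", "course"), cl, ("course", "lecturer"))
    print(two_way == ENROLLMENT)           # False: 4 spurious tuples appear
    print(sorted(two_way - ENROLLMENT))    # the spurious tuples

    # Joining the third projection back removes them, so the 3-way JD holds.
    three_way = {t for t in two_way if (t[0], t[2]) in sl}
    print(three_way == ENROLLMENT)         # True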
Normalization
 The inventor of the relational model Edgar Codd
proposed the theory of normalization of data with
the introduction of the First Normal Form, and he
continued to extend theory with Second and Third
Normal Form.
 Later he joined Raymond F. Boyce to develop the
theory of Boyce-Codd Normal Form.
 Normalization is a step-by-step procedure of removing different kinds of
redundancy and anomaly at each step.
 At each step a specific rule is followed to remove a specific kind of
impurity, in order to give a clean and slim look to the DB.
 Normalization theory draws heavily on the theory of
functional dependencies which we already
discussed
 Normalization theory defines six normal forms (NF).
 Each normal form involves a set of dependency
properties that a schema must satisfy and each
normal form gives guarantees about the presence
and/or absence of update anomalies.
 This means that higher normal forms have less
redundancy, and as a result, fewer update
problems.
Normal Forms
All the tables in any database can be in one of the normal forms we will
discuss next. Ideally we only want minimal redundancy, limited to the PK-to-FK
links; everything else should be derivable from other tables. There are six
normal forms, but we will only look at the first four, which are:

 First normal form (1NF)
 Second normal form (2NF)
 Third normal form (3NF)
 Boyce-Codd normal form (BCNF)

BCNF is rarely used.
Un-Normalised Form (UNF)
 It is the simplest database model, also known as non-first normal form (NF2).
 A UNF model suffers from problems like data redundancy and thus lacks the
efficiency that database normalization provides.
 Assume a video library maintains a database of movies rented out. Without any
normalization, all information is stored in one table, as shown below. Let us
walk through normalization with this example.
Here you see the Movies Rented column has multiple values. Now let's move into
First Normal Form.

1NF (First Normal Form) Rules
• Each table cell should contain a single value.
• Each record needs to be unique.
• In 1NF, the previous table is rewritten with one row per member per rented
movie, so every cell holds a single value.
2NF (Second Normal Form) Rules

Rule 1 - Be in 1NF.
Rule 2 - No partial dependency: every non-key attribute must depend on the
whole of every candidate key, not on just a part of it (a single-column primary
key satisfies this automatically).

It is clear that we can't move forward to put our simple database in second
normal form unless we partition the previous table.

We have divided our 1NF table into two tables, Table 1 and Table 2. Table 1
contains member information. Table 2 contains information on movies rented.

We have introduced a new column called Membership_id, which is the primary key
for Table 1. Records in Table 1 can be uniquely identified using Membership_id.
Let's move into 3NF.

3NF (Third Normal Form) Rules
Rule 1 - Be in 2NF.
Rule 2 - Has no transitive functional dependencies.
To move our 2NF tables into 3NF, we need to divide our tables once more.

3NF Example
We have again divided our tables and created a new table which stores
salutations.
There are no transitive functional dependencies, and hence our tables are in
3NF.
In Table 3, Salutation ID is the primary key, and in Table 1, Salutation ID is
a foreign key referencing the primary key of Table 3.
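Since the original figures are not reproduced here, the following is a
hypothetical sketch of what the three 3NF tables could look like (the column
names are assumptions, not taken from the notes), written as SQL executed
through Python's sqlite3 module:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")

    # Table 3: salutations, Table 1: members, Table 2: movies rented.
    conn.executescript("""
    CREATE TABLE salutation (
        salutation_id INTEGER PRIMARY KEY,
        salutation    TEXT                    -- Mr, Mrs, Ms, Dr, ...
    );
    CREATE TABLE member (
        membership_id INTEGER PRIMARY KEY,
        full_name     TEXT,
        address       TEXT,
        salutation_id INTEGER REFERENCES salutation(salutation_id)
    );
    CREATE TABLE movie_rented (
        membership_id INTEGER REFERENCES member(membership_id),
        movie_title   TEXT,
        PRIMARY KEY (membership_id, movie_title)
    );
    """)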
Now our little example is at a level that
cannot further be decomposed to attain higher
normal form types of normalization in
DBMS. In fact, it is already in higher
normalization forms. Separate efforts for
moving into next levels of normalizing data
are normally needed in complex databases.
BCNF (Boyce-Codd Normal Form)
 Even when a database is in third normal form, anomalies can still result if it
has more than one candidate key.
 BCNF is sometimes referred to as 3.5 Normal Form.
 BCNF is an advanced version of 3NF; it is stricter than 3NF.
 A table is in BCNF if, for every functional dependency X → Y, X is a super key
of the table.
 In other words, the table should be in 3NF and, for every FD, the left-hand
side must be a super key.
Example: Let us assume there is a company where employees
work in more than one department
Employee Table

EMP_ID   EMP_COUNTRY   EMP_DEPT     DEPT_TYPE   EMP_DEPT_NO
264      India         Designing    D394        283
264      India         Testing      D394        300
364      UK            Stores       D283        232
364      UK            Developing   D283        549
In the above table the functional dependencies are as follows:

EMP_ID → EMP_COUNTRY
EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}
Candidate key: {EMP_ID, EMP_DEPT}

The table is not in BCNF because neither EMP_ID nor EMP_DEPT alone is a key.
To convert the given table into BCNF, we decompose it into three tables:

EMP_COUNTRY table
EMP_ID   EMP_COUNTRY
264      India
364      UK

EMP_DEPT table
EMP_DEPT     DEPT_TYPE   EMP_DEPT_NO
Designing    D394        283
Testing      D394        300
Stores       D283        232
Developing   D283        549

EMP_DEPT_MAPPING table
EMP_ID   EMP_DEPT
264      Designing
264      Testing
364      Stores
364      Developing

Functional dependencies:
EMP_ID → EMP_COUNTRY
EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}

Candidate keys:
For the first table: EMP_ID
For the second table: EMP_DEPT
For the third table: {EMP_ID, EMP_DEPT}

Now, this is in BCNF because the left-hand side of every functional dependency
is a key for its table, and the mapping table's only dependency is on its
whole key.
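As a sanity check (a sketch, not from the notes), we can verify in code that
joining the three decomposed tables reconstructs the original relation exactly,
i.e. the decomposition is lossless:

    # Original relation: (EMP_ID, EMP_COUNTRY, EMP_DEPT, DEPT_TYPE, EMP_DEPT_NO)
    employee = {
        (264, "India", "Designing",  "D394", 283),
        (264, "India", "Testing",    "D394", 300),
        (364, "UK",    "Stores",     "D283", 232),
        (364, "UK",    "Developing", "D283", 549),
    }

    # The three BCNF projections.
    emp_country = {(e, c) for (e, c, d, t, n) in employee}
    emp_dept    = {(d, t, n) for (e, c, d, t, n) in employee}
    mapping     = {(e, d) for (e, c, d, t, n) in employee}

    # Natural join of the projections on their shared attributes.
    rejoined = {
        (e, c, d, t, n)
        for (e, d) in mapping
        for (e2, c) in emp_country if e2 == e
        for (d2, t, n) in emp_dept if d2 == d
    }

    print(rejoined == employee)   # True: the decomposition is lossless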
Decomposition
 Decomposition is the process of splitting a relation into projections that,
when joined, reproduce the original relation.
 A good decomposition has 3 properties:
• Attribute preservation
• Lossless-join decomposition
• Dependency preservation
File Organisation
Introduction
• A database consists of a huge amount of data.
• The data is grouped within tables in an RDBMS, and each table has related
records.
• A user sees the data as stored in the form of tables, but in reality this
huge amount of data is stored in physical memory in the form of files.
• File: a file is a named collection of related information that is recorded
on secondary storage such as magnetic disks, magnetic tapes and optical disks.
Meaning
• File Organization refers to the logical relationships among various
records that constitute the file, particularly with respect to the means
of identification and access to any specific record.
• In simple terms, Storing the files in certain order is called file
Organization.
• File Structure refers to the format of the label and data blocks and of
any logical control record.
• File organization refers to the way data is stored in a file.
• File organization is very important because it determines the
methods of access, efficiency, flexibility and storage devices to use.
Factors to be considered in File Organisation
• Access should be fast
• Storage space has to be efficiently used
• Minimizing the need for reorganisation
• Accommodating growth
Issues in physical database design
• Purpose: to translate the logical description of data into technical
specifications for storing and retrieving the data.
• Goal: to create a DB design that ensures DB integrity, security and
recoverability.
Basic inputs required for physical DB design
• Normalized relations
• Attribute definitions
• Data usage
• Security, backup, recovery, retention, integrity
• DBMS characteristics
• Performance criteria
Decisions to be taken while designing physical DB
• Optimising attribute data types
• Modifying logical design
• Specifying file organisation
• Choosing indexes
Considerations to be followed while designing the fields in
DB
• choosing data type
• coding, compression, encryption
• controlling data integrity
• default values (range control, null value control, referential integrity)
• handling missing data
 substitute an estimate of the missing value
 trigger a report listing missing values
 ignore missing data
Types of File Organisation

1. Heap files (unordered files)
2. Sequential
3. Indexed sequential
4. Hashed / Direct / Random / Relative
5. Tree structured (BST, B+)
6. Multi-key
7. Multi-list
8. Inverted file organisation
Multi-key file organisation

• When the records of a file can be accessed based on more than one key, the
file is said to use multi-key file organization.
• Generally these are indexed sequential files in which the file is stored
sequentially based on the primary key, and more than one index table is
provided, each based on a different key.
• This technique allows a file to be accessed through multiple key values.
• Multi-key file organization allows access to a data file by several
different key fields. Example: a library file that requires access by author,
by subject matter and by title.
Heap file organisation
• It is the simplest and most basic type of organization.
• It works with data blocks.
• In heap file organization, the records are inserted at the file's end.
• When the records are inserted, it doesn't require the sorting and
ordering of records.
• When a data block is full, the new record is stored in some other block.
• This new data block need not be the very next data block; the DBMS can
select any data block in memory to store new records.
• The heap file is also known as an unordered file.
• In the file, every record has a unique id, and every page in a file is of
the same size.
• It is the DBMS responsibility to store and manage the new records.
Heap File Organisation
Insertion of a new record

Suppose we have five records R1, R3, R6, R4 and R5 in a heap, and suppose we
want to insert a new record R2. If data block 3 is full, R2 will be inserted
into any other data block selected by the DBMS, let's say data block 1.

• If we want to search, update or delete data in heap file organization, we
need to traverse the data from the start of the file until we get the
requested record.

• If the database is very large then searching, updating or deleting a record
will be time-consuming, because there is no sorting or ordering of records. In
heap file organization we need to check all the data until we get the
requested record.
Heap File Organisation

Pros
• It is a very good method of file organization for bulk insertion. If a large
number of records needs to be loaded into the database at one time, this
method is best suited.
• In the case of a small database, fetching and retrieving records is faster
than with sequential organization.

Cons
• This method is inefficient for large databases, because it takes time to
search for or modify a record.
Binary Search Tree
• A Binary Search Tree (BST) is a tree in which all the nodes follow the
below-mentioned properties −

• The value of the key of the left sub-tree is less than the value of its
parent (root) node's key.

• The value of the key of the right sub-tree is greater than or equal to the
value of its parent (root) node's key.

• Thus, BST divides all its sub-trees into two segments; the left sub-tree
and the right sub-tree and can be defined as −

left_subtree (keys) < node (key) ≤ right_subtree (keys)


Representation
A BST is a collection of nodes arranged so that they maintain the BST
properties. Each node has a key and an associated value. While searching, the
desired key is compared to the keys in the BST and, if found, the associated
value is retrieved.
In a typical pictorial representation, a root node with key 27 has all
lower-valued keys in its left sub-tree and the higher-valued keys in its right
sub-tree.

Basic Operations
Following are the basic operations of a tree −

• Search − Searches an element in a tree.

• Insert − Inserts an element in a tree.

• Pre-order Traversal − Traverses a tree in a pre-order manner.

• In-order Traversal − Traverses a tree in an in-order manner.

• Post-order Traversal − Traverses a tree in a post-order manner .
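A compact sketch of these operations in Python (illustrative only): recursive
insert, search and in-order traversal.

    class Node:
        def __init__(self, key, value=None):
            self.key, self.value = key, value
            self.left = self.right = None

    def insert(root, key, value=None):
        """Insert a key; smaller keys go left, greater-or-equal keys go right."""
        if root is None:
            return Node(key, value)
        if key < root.key:
            root.left = insert(root.left, key, value)
        else:
            root.right = insert(root.right, key, value)
        return root

    def search(root, key):
        """Return the node with the given key, or None if it is absent."""
        if root is None or root.key == key:
            return root
        return search(root.left, key) if key < root.key else search(root.right, key)

    def in_order(root):
        """In-order traversal yields the keys in ascending order."""
        if root:
            yield from in_order(root.left)
            yield root.key
            yield from in_order(root.right)

    root = None
    for k in (27, 14, 35, 10, 19, 31, 42):
        root = insert(root, k)

    print(list(in_order(root)))          # [10, 14, 19, 27, 31, 35, 42]
    print(search(root, 19) is not None)  # True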


Sequential File Organisation

• This is the easiest method of file organization. In this method, files are
stored sequentially. It can be implemented in two ways:
1. Pile File Method
2. Sorted File Method
• Input-Output Control System (IOCS - system software)
Pile File Method
• It is a quite simple method. In this method, we store the records in
sequence, i.e., one after another. The records are inserted in the order in
which they arrive.
• In case of updating or deleting a record, the record is searched for in the
memory blocks. When it is found, it is marked for deletion, and the new record
is inserted.

Insertion of a new record
Suppose we have records R1, R3 and so on up to R9 and R8 in a sequence.
Records here are nothing but rows in a table. If we want to insert a new
record R2 in the sequence, it will simply be placed at the end of the file.
Sorted File Method
• In this method, the new record is always inserted at the file's end, and
then the sequence is sorted in ascending or descending order. Sorting of
records is based on the primary key or on any other key.
• In the case of modification of any record, the record is updated, the file
is sorted, and the updated record ends up in the right place.

Insertion of a new record
Suppose there is a pre-existing sorted sequence of records R1, R3 and so on up
to R6 and R7. If a new record R2 has to be inserted in the sequence, it is
first appended at the end of the file, and then the sequence is sorted.
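A tiny sketch of the difference between the two methods in Python (record keys
only, for illustration): the pile file simply appends, while the sorted file
keeps the sequence ordered after each insert.

    import bisect

    # Pile file: records are kept in arrival order; a new record goes at the end.
    pile = ["R1", "R3", "R9", "R8"]
    pile.append("R2")
    print(pile)            # ['R1', 'R3', 'R9', 'R8', 'R2']

    # Sorted file: the sequence is kept ordered on the key after every insertion.
    sorted_file = ["R1", "R3", "R6", "R7"]
    bisect.insort(sorted_file, "R2")     # equivalent to append-then-sort
    print(sorted_file)     # ['R1', 'R2', 'R3', 'R6', 'R7']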
Sequential file organization

Pros
• It is a fast and efficient method for handling huge amounts of data.
• Files can easily be stored on cheaper storage media such as magnetic tape.
• It is simple in design and requires little effort to store the data.
• It is suitable when most of the records have to be accessed, such as grade
calculation for students or generating salary slips.
• It is suitable for report generation and statistical calculations.

Cons
• It wastes time, because we cannot jump directly to a required record but
have to move through the file sequentially.
• The sorted file method takes extra time and space for sorting the records.
Indexed Sequential file organisation

• The ISAM (indexed sequential access method) is an advanced form of
sequential file organization.
• In this method, records are stored in the file using the primary key.
• An index value is generated for each primary key and mapped to the record.
• The index contains the address of the record in the file.
• If a record has to be retrieved based on its index value, the address of the
data block is fetched from the index and the record is retrieved from memory.
Indexed Sequential file organisation

Pros
• Since each record has the address of its data block in the index, searching
for a record in a huge database is quick and easy.
• This method supports range retrieval and partial retrieval of records. Since
the index is based on the primary key values, we can retrieve the data for a
given range of values. In the same way, a partial value can also be easily
searched, e.g., student names starting with 'JA'.

Cons
• This method requires extra space on the disk to store the index values.
• When new records are inserted, these files have to be reconstructed to
maintain the sequence.
• When a record is deleted, the space used by it needs to be released;
otherwise, the performance of the database will slow down.
Hashed File organisation
• In this method of file organization, hash function is used to calculate
the address of the block to store the records.
• The hash function can be any simple or complex mathematical
function.
• The hash function is applied on some columns/attributes – either key
or non-key columns to get the block address.
• Hence each record is stored randomly irrespective of the order they
come.
• Hence this method is also known as Direct or Random file
organization.
• If the hash function is generated on key column, then that column is
called hash key, and if hash function is generated on non-key column,
then the column is hash column.
When a record has to be retrieved, the address is generated from the hash key
column, and the whole record is retrieved directly from that address; there is
no need to traverse the whole file. Similarly, when a new record has to be
inserted, the address is generated from the hash key and the record is
inserted directly. The same applies to update and delete: there is no effort
spent searching the entire file or sorting the files. Each record is stored at
a computed location in memory.
These types of file organizations are useful in online transaction systems,
where retrieval, insertion and update should be fast.
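A minimal sketch of the idea in Python (the hash function and block count are
illustrative assumptions): the key is hashed to a block address, so lookups go
straight to one block instead of scanning the file.

    NUM_BLOCKS = 8
    blocks = [[] for _ in range(NUM_BLOCKS)]      # each block holds a list of records

    def block_address(key):
        """Hash function: map a key to a block number (a very simple example)."""
        return hash(key) % NUM_BLOCKS

    def insert(record):                           # record is a dict with a 'key' field
        blocks[block_address(record["key"])].append(record)

    def lookup(key):
        # Only the one block computed from the key is examined, not the whole file.
        for record in blocks[block_address(key)]:
            if record["key"] == key:
                return record
        return None

    insert({"key": "EMP264", "name": "Asha"})
    insert({"key": "EMP364", "name": "Ravi"})
    print(lookup("EMP364"))                       # {'key': 'EMP364', 'name': 'Ravi'}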
Advantages of Hash File Organization

 Records need not be sorted after any transaction, so the effort of sorting is
avoided in this method.
 Since the block address is known from the hash function, accessing any record
is very fast. Similarly, updating or deleting a record is also very quick.
 This method can handle multiple transactions, as each record is independent
of the others; since there is no dependency on the storage location of other
records, multiple records can be accessed at the same time.
 It is suitable for online transaction systems like online banking, ticket
booking systems, etc.
Disadvantages of Hash File Organization
 This method may accidentally overwrite data. For example, in a Student table,
if the hash field is the STD_NAME column and there are two students with the
same name 'Antony', then the same address is generated. In such a case, the
older record will be overwritten by the newer one, causing data loss. Hash
columns therefore need to be selected with utmost care, and a correct backup
and recovery mechanism has to be established.
 Since all the records are stored at hashed addresses, they are scattered in
memory; hence memory is not used efficiently.
 If we are searching for a range of data, this method is not suitable, because
each record is stored at an essentially random address. A range search will
not map to a contiguous address range, so the search will be inefficient. For
example, searching for employees with salaries from 20K to 30K will not be
efficient.
 Searching for records by an exact name or value is efficient, but searching
for student names starting with 'B' is not, because it does not give the exact
value to hash.
 If a search is done on a column which is not the hash column, the search will
not be efficient. This method is efficient only when the search is on the hash
column; otherwise, it cannot find the correct address of the data.
 If multiple hash columns (say the name and phone number of a person) are used
together to generate the address, then searching for a record using the phone
number or the name alone will not give correct results.
 If the hash columns are frequently updated, the data block address changes
accordingly; each update generates a new address, which is also not
acceptable.
 The hardware and software required for memory management are costlier in this
case, and complex programs need to be written to make this method efficient.
B+ Tree
• A B+ Tree is primarily utilized for implementing dynamic multilevel
indexing.
• Compared to a B Tree, the B+ Tree stores the data pointers only at the leaf
nodes of the tree, which makes the search process more accurate and faster.
• Rules for B+ Tree
Here are the essential rules for a B+ Tree:
 Leaves are used to store data records.
 Keys are stored in the internal nodes of the tree.
 If a target key value is less than the key in an internal node, then the
pointer just to its left is followed.
 If a target key value is greater than or equal to the key in an internal
node, then the pointer just to its right is followed.
 The root has a minimum of two children.
Uses of B+ Tree

 Keys are primarily utilized to aid the search by directing it to the proper
leaf.
 A B+ Tree uses a "fill factor" to manage the increase and decrease in the
tree.
 In B+ trees, numerous keys can easily be placed on a page of memory because
the interior nodes do not carry the associated data. Therefore, the tree data
on the leaf nodes can be accessed quickly.
 A comprehensive full scan of all the elements in the tree needs just one
linear pass, because all the leaf nodes of a B+ tree are linked to each other.
Multi-list file organisation
• The basic approach to providing the linkage between an index and the file of
data records is called multi-list organisation.
• A multi-list file maintains an index for each secondary key.
• When the records of a file can be accessed based on more than one key, the
file uses multi-key file organization.
• Generally these are indexed sequential files in which the file is stored
sequentially based on the primary key and more than one index table is
provided, each based on a different key.
Inverted file organisation
• An inverted index is an index data structure storing a mapping from
content, such as words or numbers, to its locations in a document or
a set of documents.
• In simple words, it is a hashmap-like data structure that directs you from a
word to a document or a web page.
• There are two types of inverted indexes:
 A record-level inverted index contains a list of references to
documents for each word.
 A word-level inverted index additionally contains the positions of
each word within a document.
• The latter form offers more functionality, but needs more processing
power and space to be created.
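A short sketch in Python (the document names and text are made up) showing
both forms: a record-level index maps each word to the documents containing
it, while a word-level index also records the positions.

    from collections import defaultdict

    docs = {
        "doc1": "database integrity and normalization",
        "doc2": "normalization removes redundancy in a database",
    }

    record_level = defaultdict(set)     # word -> set of documents
    word_level = defaultdict(list)      # word -> list of (document, position)

    for doc_id, text in docs.items():
        for pos, word in enumerate(text.split()):
            record_level[word].add(doc_id)
            word_level[word].append((doc_id, pos))

    print(record_level["normalization"])   # {'doc1', 'doc2'} (order may vary)
    print(word_level["normalization"])     # [('doc1', 3), ('doc2', 0)]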
Types of Indexes
• Primary index
• Secondary index
• Clustering index
 Dense index
 Sparse index
 Multi-level index
