Let’s say Emp_Id is the primary key of the table. By the definition of
Entity Integrity, the value of Emp_Id cannot be null, as it uniquely
identifies an employee record in the table.
Thus no primary key column of any row in a table can have a null value.
Redundancy and associated problems
Redundancy means having multiple copies of the same data in the
database. Problems caused by redundancy are:
Insertion anomaly,
Deletion anomaly, and
Update anomaly
A database anomaly is a flaw in a database that usually occurs
because of poor planning and storing everything in a flat database (one
table). Generally this is removed by the process of normalization, which
is performed by splitting/joining of tables.
EXAMPLE
Consider a relation emp_dept with attributes:
1. E# (the primary key)
2. Ename
3. Address
4. D#
5. Dname
6. Dmgr#
• Insertion anomaly: Let us assume that a new department has been
started by the organization but initially there is no employee
appointed for that department, then the tuple for this department
cannot be inserted into this table as the E# will have NULL, which is
not allowed as E# is primary key.
This kind of a problem in the relation where some tuple cannot be
inserted is known as insertion anomaly.
• Deletion anomaly: Now consider there is only one employee in some
department and that employee leaves the organization, then the
tuple of that employee has to be deleted from the table, but in
addition to that the information about the department also will get
deleted.
This kind of a problem in the relation where deletion of some tuples can
lead to loss of some other data not intended to be removed is known as
deletion anomaly.
• Modification /update anomaly: Suppose the manager of a
department has changed, this requires that the Dmgr# in all the
tuples corresponding to that department must be changed to reflect
the new status. If we fail to update all the tuples of the given
department, then two different records of employee working in the
same department might show different Dmgr# leading to
inconsistency in the database.
This is known as modification/update anomaly. Data redundancy
cannot be totally removed from the database, but it should be
controlled redundancy.
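The deletion and update anomalies described above can be reproduced on a tiny in-memory copy of emp_dept. This is only a sketch: the rows, names and manager numbers are made up for illustration, and the helper functions `departments` and `inconsistent_departments` are hypothetical.

```python
# Hypothetical rows of the flat emp_dept relation:
# (E#, Ename, Address, D#, Dname, Dmgr#). All values are made up.
emp_dept = [
    (1, "Asha", "Pune",   10, "Sales", 101),
    (2, "Ravi", "Delhi",  10, "Sales", 101),
    (3, "Mina", "Mumbai", 20, "HR",    102),
]

def departments(rows):
    """Return the set of D# values that the table still knows about."""
    return {row[3] for row in rows}

def inconsistent_departments(rows):
    """Return D# values that appear with more than one Dmgr#."""
    managers = {}
    for row in rows:
        managers.setdefault(row[3], set()).add(row[5])
    return {dept for dept, mgrs in managers.items() if len(mgrs) > 1}

# Deletion anomaly: removing the only HR employee also removes department 20.
after_delete = [row for row in emp_dept if row[0] != 3]

# Update anomaly: the Sales manager changes, but only one tuple is updated.
after_partial_update = [
    row[:5] + (105,) if row[0] == 1 else row for row in emp_dept
]

print(departments(after_delete))
print(inconsistent_departments(after_partial_update))
```

Splitting the department attributes into their own table would store each Dmgr# once, so neither problem could arise in this form.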
For example
42 abc 17
43 pqr 18
44 xyz 18
name → dept_name: Students with the same name can have different
dept_name values, hence this is not a valid functional dependency.
dept_building → dept_name: There can be multiple departments in the
same building. For example, in the departments table, ME and EC are in
the same building B2, hence dept_building → dept_name is an invalid
functional dependency.
More invalid functional dependencies: name → roll_no, {name,
dept_name} → roll_no, dept_building → roll_no, etc.
2. Non-trivial Functional Dependency
In a non-trivial functional dependency, the dependent is not a subset of
the determinant.
i.e. if X → Y and Y is not a subset of X, then it is called a non-trivial functional
dependency.
For example,
roll_no name age
42 abc 17
43 pqr 18
44 xyz 18
42 abc 17
43 pqr 18
44 xyz 18
45 abc 19
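As a rough illustration, the two definitions can be checked mechanically on the rows listed above. The function names `is_trivial` and `holds` are hypothetical, and a dependency that holds in one sample of rows is only evidence, not proof, that it holds for the relation in general.

```python
def is_trivial(lhs, rhs):
    """X -> Y is trivial when Y is a subset of X."""
    return set(rhs) <= set(lhs)

def holds(rows, lhs, rhs):
    """Check whether lhs -> rhs holds in the given rows (a list of dicts)."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # The same determinant value must always map to the same dependent value.
        if seen.setdefault(key, value) != value:
            return False
    return True

students = [
    {"roll_no": 42, "name": "abc", "age": 17},
    {"roll_no": 43, "name": "pqr", "age": 18},
    {"roll_no": 44, "name": "xyz", "age": 18},
    {"roll_no": 45, "name": "abc", "age": 19},
]

print(holds(students, ["roll_no"], ["name"]))   # roll_no -> name
print(holds(students, ["name"], ["roll_no"]))   # name -> roll_no
print(is_trivial(["roll_no", "name"], ["name"]))
```

Here roll_no → name holds and is non-trivial, while name → roll_no fails because two students share the name abc.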
Rule 1- Be in 1NF
Rule 2- Single-column primary key that is not functionally
dependent on any subset of a candidate key.
We have introduced a new column called Membership_id which is the primary key for table 1. Records can be uniquely identified in Table 1 using membership id.
Let’s move into 3NF
3NF (Third Normal Form) Rules
Rule 1- Be in 2NF
Rule 2- Has no transitive functional dependencies
To move our 2NF table into 3NF, we again need to divide our table.
3NF Example
Below is a 3NF example in SQL database:
We have again divided our tables and created a new table which stores Salutations.
There are no transitive functional dependencies, and hence our table is in 3NF.
EMP_ID → EMP_COUNTRY
EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}
Candidate key: {EMP_ID, EMP_DEPT}
The table is not in BCNF because neither EMP_DEPT nor EMP_ID alone is a
key.
To convert the given table into BCNF, we decompose it into three tables:
EMP_COUNTRY table
EMP_ID EMP_COUNTRY
264 India
264 India
EMP_DEPT table
EMP_DEPT DEPT_TYPE EMP_DEPT_NO
Designing D394 283
Testing D394 300
Stores D283 232
Developing D283 549
EMP_DEPT_MAPPING table
EMP_ID EMP_DEPT
D394 283
D394 300
D283 232
D283 549
Functional dependencies:
EMP_ID → EMP_COUNTRY
EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}
Candidate keys:
For the first table: EMP_ID
For the second table: EMP_DEPT
For the third table: {EMP_ID, EMP_DEPT}
Now, this is in BCNF because the left side of both functional
dependencies is a key.
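The BCNF test used in this example can be sketched with an attribute-closure computation: a dependency violates BCNF when the closure of its left side does not cover the whole relation. The function names `closure` and `violates_bcnf` are hypothetical, and the sketch assumes each decomposed table is checked only against the dependencies that apply to it.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) frozensets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def violates_bcnf(relation, fds):
    """Return the FDs whose left side is not a superkey of relation."""
    bad = []
    for lhs, rhs in fds:
        if rhs <= lhs:
            continue  # trivial dependencies never violate BCNF
        if not relation <= closure(lhs, fds):
            bad.append((lhs, rhs))
    return bad

fd1 = (frozenset({"EMP_ID"}), frozenset({"EMP_COUNTRY"}))
fd2 = (frozenset({"EMP_DEPT"}), frozenset({"DEPT_TYPE", "EMP_DEPT_NO"}))

original = {"EMP_ID", "EMP_COUNTRY", "EMP_DEPT", "DEPT_TYPE", "EMP_DEPT_NO"}
emp_country = {"EMP_ID", "EMP_COUNTRY"}
emp_dept = {"EMP_DEPT", "DEPT_TYPE", "EMP_DEPT_NO"}
emp_dept_mapping = {"EMP_ID", "EMP_DEPT"}

print(len(violates_bcnf(original, [fd1, fd2])))  # both FDs violate BCNF
print(violates_bcnf(emp_country, [fd1]))
print(violates_bcnf(emp_dept, [fd2]))
print(violates_bcnf(emp_dept_mapping, []))
```

The closure of {EMP_ID, EMP_DEPT} covers every attribute of the original table, which is why that pair is its candidate key.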
Decomposition
The process of splitting a relation into its projections, which will not be
disjoint.
3 properties
• Attribute preservation
• Lossless join
• Dependency preservation
File Organisation
Introduction
• A database consists of a huge amount of data.
• The data is grouped within tables in an RDBMS, and each table has
related records.
• A user sees the data stored in the form of tables, but in reality
this huge amount of data is stored in physical memory in the form of
files.
• File: A file is a named collection of related information that is
recorded on secondary storage such as magnetic disks, magnetic
tapes and optical disks.
Meaning
• File Organization refers to the logical relationships among various
records that constitute the file, particularly with respect to the means
of identification and access to any specific record.
• In simple terms, storing the files in a certain order is called file
organization.
• File Structure refers to the format of the label and data blocks and of
any logical control record.
• File organization refers to the way data is stored in a file.
• File organization is very important because it determines the
methods of access, efficiency, flexibility and storage devices to use.
Factors to be considered in File Organisation
• Access should be fast
• Storage space has to be efficiently used
• Minimizing the need for reorganisation
• Accommodating growth
Issues in physical database design
• Purpose: to translate the logical description of data into technical
specifications for storing and retrieving the data
• Goal: to create a DB design that ensures DB integrity, security and
recoverability.
Basic inputs required for physical DB design
• Normalized relations
• Attribute definitions
• Data usage
• Security, backup, recovery, retention, integrity
• DBMS characteristics
• Performance criteria
Decisions to be taken while designing physical DB
• Optimising attribute data types
• Modifying logical design
• Specifying file organisation
• Choosing indexes
Considerations to be followed while designing the fields in
DB
• choosing data type
• coding, compression, encryption
• controlling data integrity
• default value (range control, null value control, referential integrity)
• handling missing data:
  substitute an estimate of the missing value
  trigger a report listing missing values
  ignore missing data
Types of File Organisation
• When the records of a file are accessed based on more than one key,
it is called multikey file organization.
• Generally these files are indexed sequential files, in which the file is stored
sequentially based on the primary key and more than one index table is
provided, based on different keys.
• This technique is used to sort a file based on multiple key values.
• Multikey file organization allows access to a data file by several
different key fields. Example: a library file that requires access by
author, by subject matter and by title.
Heap file organisation
• It is the simplest and most basic type of organization.
• It works with data blocks.
• In heap file organization, the records are inserted at the file's end.
• When the records are inserted, it doesn't require the sorting and
ordering of records.
• When the data block is full, the new record is stored in some other
block.
• This new data block need not be the very next data block; the DBMS
can select any data block in the memory to store new records.
• The heap file is also known as an unordered file.
• In the file, every record has a unique id, and every page in a file is of
the same size.
• It is the DBMS's responsibility to store and manage the new records.
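The behaviour described in the points above can be sketched in a few lines. The class `HeapFile`, the constant `BLOCK_CAPACITY` and the sample records are hypothetical, and the list of blocks stands in for whichever physical blocks the DBMS would actually pick.

```python
BLOCK_CAPACITY = 2  # hypothetical: number of records that fit in one data block

class HeapFile:
    def __init__(self):
        self.blocks = [[]]  # each inner list models one data block

    def insert(self, record):
        """Append the record at the end of the file, with no sorting."""
        if len(self.blocks[-1]) >= BLOCK_CAPACITY:
            self.blocks.append([])  # last block is full: use another block
        self.blocks[-1].append(record)
        return len(self.blocks) - 1  # block number where the record landed

    def search(self, predicate):
        """Linear scan: every block is read because records are unordered."""
        return [r for block in self.blocks for r in block if predicate(r)]

heap = HeapFile()
placements = [heap.insert({"id": i}) for i in (7, 3, 9)]
print(placements)
print(heap.search(lambda r: r["id"] == 9))
```

Insertion touches only the last block, which is why bulk loading is cheap, while a search may have to read every block.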
Heap File Organisation
Insertion of a new record
Pros
• It is a very good method of file organization for bulk insertion. If
there is a large amount of data which needs to be loaded into the
database at a time, then this method is best suited.
• In case of a small database, fetching and retrieving of records is
faster than with sequential file organization.
Cons
• This method is inefficient for large databases because it takes
time to search or modify a record.
Binary Search Tree
• A Binary Search Tree (BST) is a tree in which all the nodes follow the
below-mentioned properties −
• The value of the key of the left sub-tree is less than the value of its
parent (root) node's key.
• The value of the key of the right sub-tree is greater than or equal to the
value of its parent (root) node's key.
• Thus, a BST divides all its sub-trees into two segments, the left sub-tree
and the right sub-tree, and can be defined as:
keys(left sub-tree) < key(node) ≤ keys(right sub-tree)
Basic Operations
Following are the basic operations of a tree −
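Insert, search and in-order traversal can be sketched as follows. This is a minimal, unbalanced version that follows the ordering rule stated above (equal keys go to the right); the class and function names and the sample keys are chosen here for illustration.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def insert(root, key):
    """Insert key and return the (possibly new) root of the subtree."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    else:  # keys greater than or equal to the parent go to the right
        root.right = insert(root.right, key)
    return root

def search(root, key):
    """Return True if key is present in the tree."""
    while root is not None:
        if key == root.key:
            return True
        root = root.left if key < root.key else root.right
    return False

def inorder(root):
    """In-order traversal returns the keys in sorted order."""
    if root is None:
        return []
    return inorder(root.left) + [root.key] + inorder(root.right)

root = None
for k in (27, 14, 35, 10, 19, 31, 42):
    root = insert(root, k)

print(search(root, 19), search(root, 20))
print(inorder(root))
```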
Pros
• It contains a fast and efficient method for the huge amount of data.
• In this method, files can be easily stored in cheaper storage
mechanisms like magnetic tapes.
• It is simple in design. It does not require much effort to store the data.
• This method is used when most of the records have to be accessed,
like grade calculation of a student, generating the salary slip, etc.
• This method is used for report generation or statistical calculations.
Cons
• It will waste time, as we cannot jump to a particular record that is
required but have to move sequentially, which takes time.
• Sorted file method takes more time and space for sorting the records.
Indexed Sequential file organisation
Pros
• In this method, each record has the address of its data block, so
searching for a record in a huge database is quick and easy.
• This method supports range retrieval and partial retrieval of
records. Since the index is based on the primary key values, we can
retrieve the data for the given range of values. In the same way,
a partial value can also be easily searched, i.e., the student names
starting with 'JA' can be easily searched.
Cons
• This method requires extra space on the disk to store the index value.
• When new records are inserted, these files have to be
reconstructed to maintain the sequence.
• When a record is deleted, the space used by it needs to be
released. Otherwise, the performance of the database will slow down.
Hashed File organisation
• In this method of file organization, a hash function is used to calculate
the address of the block in which to store each record.
• The hash function can be any simple or complex mathematical
function.
• The hash function is applied on some columns/attributes, either key
or non-key columns, to get the block address.
• Hence each record is stored randomly, irrespective of the order in which
the records arrive.
• Hence this method is also known as Direct or Random file
organization.
• If the hash function is applied to a key column, then that column is
called the hash key, and if the hash function is applied to a non-key column,
then the column is called the hash column.
When a record has to be retrieved, the address is generated from the hash key
column and the whole record is retrieved directly from that address. There is no
need to traverse the whole file. Similarly, when a new record has to be inserted, the
address is generated from the hash key and the record is inserted directly. The same
is true for update and delete. There is no effort spent searching the entire file or
sorting the file. Each record will be stored randomly in the memory.
These types of file organizations are useful in online transaction
systems, where retrieval or insertion/update should be fast.
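A minimal sketch of this address calculation is shown below. The modulo hash function, the constant `NUM_BLOCKS`, the class `HashFile` and the sample records are all assumptions made for illustration. Unlike a scheme that overwrites on collision, this sketch keeps colliding records together in the same block.

```python
NUM_BLOCKS = 4  # hypothetical number of data blocks

def block_address(hash_key):
    """A simple hash function: map an integer key to a block number."""
    return hash_key % NUM_BLOCKS

class HashFile:
    def __init__(self):
        self.blocks = [[] for _ in range(NUM_BLOCKS)]

    def insert(self, key, record):
        """Store the record directly in the block named by the hash function."""
        self.blocks[block_address(key)].append((key, record))

    def get(self, key):
        """Only the one block named by the hash function is examined."""
        for stored_key, record in self.blocks[block_address(key)]:
            if stored_key == key:
                return record
        return None

students = HashFile()
students.insert(104, "Jane")
students.insert(105, "Kumar")
students.insert(108, "Antony")  # 108 % 4 == 0, same block as 104

print(block_address(104), block_address(105), block_address(108))
print(students.get(108))
```

Records with consecutive keys land in unrelated blocks, which is why exact-match lookups are direct but range queries gain nothing from the hash.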
Advantages of Hash File Organization
• Records need not be sorted after any transaction. Hence the
effort of sorting is reduced in this method.
• Since the block address is known from the hash function, accessing any record
is very fast. Similarly, updating or deleting a record is also very
quick.
• This method can handle multiple transactions, as each record is
independent of the others, i.e., since there is no dependency on storage
location for each record, multiple records can be accessed at the
same time.
• It is suitable for online transaction systems like online banking, ticket
booking systems, etc.
Disadvantages of Hash File Organization
• This method may accidentally delete data. For example, in a Student table,
when the hash field is the STD_NAME column and there are two identical names
(‘Antony’), the same address is generated. In such a case, the older record will be
overwritten by the newer one, so there will be data loss. Thus hash columns need to
be selected with utmost care. Also, a correct backup and recovery mechanism
has to be established.
• Since all the records are randomly stored, they are scattered in the memory.
Hence memory is not efficiently used.
• If we are searching for a range of data, then this method is not suitable,
because each record is stored at a random address. Hence a range search
will not give the correct address range and searching will be inefficient. For
example, searching for the employees with salary from 20K to 30K will be
inefficient.
• Searching for records with an exact name or value will be efficient. Searching
for student names starting with ‘B’ will not be efficient, as it does not give the
exact name of the student.
• If there is a search on some column which is not a hash column,
then the search will not be efficient. This method is efficient only
when the search is done on the hash column. Otherwise, it will not be
able to find the correct address of the data.
• If there are multiple hash columns, say the name and phone number of a
person, used to generate the address, then searching for a record
using phone or name alone will not give correct results.
• If these hash columns are frequently updated, then the data block
address also changes accordingly. Each update will generate a new
address. This is also not acceptable.
• Hardware and software required for the memory management are
costlier in this case. Complex programs need to be written to make
this method efficient.
B+ Tree
• A B+ Tree is primarily utilized for implementing dynamic indexing on
multiple levels.
• Compared to a B Tree, the B+ Tree stores the data pointers only at the
leaf nodes of the tree, which makes the search process more accurate
and faster.
• Rules for B+ Tree
Here are the essential rules for a B+ Tree.
• Leaves are used to store data records.
• Keys are stored in the internal nodes of the tree.
• If a target key value is less than the internal node's key, then the pointer just to
its left side is followed.
• If a target key value is greater than or equal to the internal node's key, then
the pointer just to its right side is followed.
• The root has a minimum of two children (unless it is itself a leaf).
Uses of B+ Tree
• Keys are primarily utilized to aid the search by directing it to the proper
leaf.
• A B+ Tree uses a “fill factor” to manage the increase and decrease in a
tree.
• In B+ trees, numerous keys can easily be placed on a page of
memory because the interior nodes do not have data associated with
them. Therefore, the tree data on the leaf nodes can be accessed
quickly.
• A full scan of all the elements in a tree needs just
one linear pass because all the leaf nodes of a B+ tree are linked with
each other.
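The search rules and the linked-leaf scan can be sketched on a small hand-built tree. This is only an illustration: it covers lookup and scanning, not insertion, splitting or fill-factor handling, and the classes `Leaf` and `Internal` and all keys and values are hypothetical.

```python
from bisect import bisect_right

class Leaf:
    def __init__(self, keys, values):
        self.keys, self.values, self.next = keys, values, None

class Internal:
    def __init__(self, keys, children):
        self.keys, self.children = keys, children

def search(node, target):
    """Follow the pointers down to a leaf, then look the key up there."""
    while isinstance(node, Internal):
        # target < key: pointer on its left; target >= key: pointer on its right
        node = node.children[bisect_right(node.keys, target)]
    if target in node.keys:
        return node.values[node.keys.index(target)]
    return None

def scan(first_leaf):
    """One linear pass over the linked leaves returns every key in order."""
    keys, leaf = [], first_leaf
    while leaf is not None:
        keys.extend(leaf.keys)
        leaf = leaf.next
    return keys

# A hand-built two-level tree with made-up keys and values
l1 = Leaf([5, 10], ["a", "b"])
l2 = Leaf([20, 25], ["c", "d"])
l3 = Leaf([30, 40], ["e", "f"])
l1.next, l2.next = l2, l3
root = Internal([20, 30], [l1, l2, l3])

print(search(root, 20), search(root, 15))
print(scan(l1))
```

Only keys sit in the internal node, so one node can route to many leaves, and the `next` links let a full scan skip the internal levels entirely.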
Multi-list file organisation
• The basic approach to providing the linkage between an index and
the file of data records is called multilist organisation.
• A multilist file maintains an index for each secondary key.
• When the records of a file are accessed based on more than one key,
it is called multikey file organization.
• Generally these files are indexed sequential files, in which the file is stored
sequentially based on the primary key and more than one index table is
provided, based on different keys.
Inverted file organisation
• An inverted index is an index data structure storing a mapping from
content, such as words or numbers, to its locations in a document or
a set of documents.
• In simple words, it is a hashmap-like data structure that directs you
from a word to a document or a web page.
• There are two types of inverted indexes:
  A record-level inverted index contains a list of references to
  documents for each word.
  A word-level inverted index additionally contains the positions of
  each word within a document.
• The latter form offers more functionality, but needs more processing
power and space to be created.
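Both kinds of inverted index can be sketched with a dictionary keyed by word. The sample documents and the function names are made up for illustration, and words are split on whitespace only, which is an assumption and far simpler than real tokenization.

```python
from collections import defaultdict

def record_level_index(docs):
    """Map each word to the sorted list of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return {word: sorted(ids) for word, ids in index.items()}

def word_level_index(docs):
    """Map each word to (document id, position) pairs."""
    index = defaultdict(list)
    for doc_id, text in docs.items():
        for position, word in enumerate(text.lower().split()):
            index[word].append((doc_id, position))
    return dict(index)

docs = {1: "the quick brown fox", 2: "the lazy dog", 3: "quick quick dog"}

print(record_level_index(docs)["quick"])
print(word_level_index(docs)["quick"])
```

The word-level index stores one entry per occurrence rather than one per document, which is where its extra space and processing cost comes from.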
Types of Indexes
• Primary index
• Secondary index
• Clustering index
• Dense index
• Sparse index
• Multi-level index