
Assignment No. 02

Submitted To:
Engr. M Ubaidullah
Submitted By:
Muhammad Usama Saghar
Roll Number:
2019-CPE-27
Subject:
(CPE-221)
Database Management Systems

Department of Computer Engineering


UCE&T
Bahauddin Zakariya University, Multan.
Question: Discuss the different anomalies in database design, normalization, functional
dependency (FD), Armstrong’s Axioms, closures, and equivalence of FDs.

Normalization:
If a database design is not perfect, it may contain anomalies, which are like a bad dream for any database
administrator. Managing a database with anomalies is next to impossible.

Anomalies

1- Update Anomaly: Let's say we have a table with 10 columns, two of which are Employee Name and
Employee Address. If an employee changes their address, we have to update the table. The problem is
that if the table is not normalized, one employee can appear in multiple rows, and while updating all of
those rows one of them might be missed, leaving the same employee with two different addresses.

2- Insertion Anomaly: Let's say we have a table with four columns: Student ID, Student Name, Student
Address and Student Grades. When a new student enrolls in the school, the first three attributes can be
filled in, but the fourth will have to hold a NULL value because the student does not have any grades yet.

3- Deletion Anomaly: This anomaly is the unintended loss of important information when a row is
deleted. Let's say we store students' information together with the courses they have taken, as (Student ID,
Student Name, Course, Address). If a student leaves the school, the rows for that student are deleted.
However, if that student was the only one taking a course, the deletion also removes all information about
that course, even though the course belongs to the school and not to the student.
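The update and deletion anomalies above can be reproduced in a few lines of Python. This is a minimal sketch, assuming a hypothetical in-memory table of student rows (the names, cities and course codes are made up); a real DBMS table behaves the same way when it is not normalized.

```python
# Hypothetical unnormalized table (student_id, name, address, course):
# the address is repeated once for every course the student takes.
rows = [
    {"student_id": 1, "name": "Ali", "address": "Multan", "course": "DBMS"},
    {"student_id": 1, "name": "Ali", "address": "Multan", "course": "OOP"},
    {"student_id": 2, "name": "Sara", "address": "Lahore", "course": "DBMS"},
]


def careless_update(table, student_id, new_address):
    """Update anomaly: changes only the first matching row and misses the rest."""
    for row in table:
        if row["student_id"] == student_id:
            row["address"] = new_address
            break


def addresses_of(table, student_id):
    return sorted({row["address"] for row in table if row["student_id"] == student_id})


def courses_offered(table):
    return sorted({row["course"] for row in table})


careless_update(rows, 1, "Lahore")
print(addresses_of(rows, 1))  # ['Lahore', 'Multan']: one student, two addresses

# Deletion anomaly: removing student 1 also removes the only record of "OOP".
rows = [row for row in rows if row["student_id"] != 1]
print(courses_offered(rows))  # ['DBMS']
```

Storing the address once per student and the courses in their own table removes both problems, which is exactly what normalization aims for.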

Normalization tries to bring tables to a granular state in which these issues are avoided. In simple words, it
splits tables into multiple smaller tables and defines the relationships between them using keys.

Important Keys

a) Primary Key: This key uniquely identifies each entry in a table. Its values cannot be repeated within
the table and cannot be NULL. The first column is often chosen as the primary key. Example: (Student
ID).
b) Foreign Key: A foreign key is a column whose values refer to the primary key of another table, and in
this way it creates a relationship between the two tables. Unlike a primary key, a foreign key column can
contain repeated values, so the table still needs its own primary key column to uniquely identify each entry.

c) Compound Key: This is the method of defining multiple columns together as the primary key. When no
single column in a table has unique values, we can declare the combination of two or more columns as
unique and set it as the primary key. For example, in (Student Name, Address, Marks, …, etc.), two
students can easily have the same name, so we define the combination of Student Name and Address as
the primary key. It is much less likely that two students have both the same name and the same address.

d) Candidate Key: In simple words, a candidate key is a key that could also serve as the primary key. For
example, in (Student ID, Student Roll No., Address, Marks), Student ID is the primary key because it has
no repeated values and no NULL values. However, Student Roll No. also has all the properties of a primary
key and is therefore considered a candidate key. The primary key is simply the candidate key that was chosen.

e) Surrogate Key: This is an artificially created value that uniquely identifies each entry in a table when
no existing column can satisfy the properties of a primary key. It is an additional column and generally
holds integer values.
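The key types above can be declared in SQL. The following is a minimal sketch using Python's built-in sqlite3 module with an in-memory database; the table names, columns and the roll number format are hypothetical, chosen only to mirror the examples in the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when this is on
conn.executescript("""
-- Primary key: student_id. Candidate key: roll_no is also unique and non-null.
CREATE TABLE student (
    student_id INTEGER PRIMARY KEY,
    roll_no    TEXT NOT NULL UNIQUE,
    name       TEXT NOT NULL,
    address    TEXT
);

-- Compound key: neither name nor address is unique alone, the pair is.
CREATE TABLE applicant (
    name    TEXT NOT NULL,
    address TEXT NOT NULL,
    marks   INTEGER,
    PRIMARY KEY (name, address)
);

-- Surrogate key (enrollment_id) plus a foreign key to student.
CREATE TABLE enrollment (
    enrollment_id INTEGER PRIMARY KEY AUTOINCREMENT,
    student_id    INTEGER NOT NULL REFERENCES student(student_id),
    course        TEXT NOT NULL
);
""")

conn.execute("INSERT INTO student VALUES (1, 'R-001', 'Ali', 'Multan')")
conn.execute("INSERT INTO enrollment (student_id, course) VALUES (1, 'DBMS')")
conn.execute("INSERT INTO enrollment (student_id, course) VALUES (1, 'OOP')")
enrollments = conn.execute(
    "SELECT enrollment_id, course FROM enrollment ORDER BY enrollment_id"
).fetchall()
print(enrollments)  # [(1, 'DBMS'), (2, 'OOP')]: surrogate values generated automatically

conn.execute("INSERT INTO applicant VALUES ('Ali', 'Multan', 70)")
conn.execute("INSERT INTO applicant VALUES ('Ali', 'Lahore', 65)")  # same name, new address: allowed
try:
    conn.execute("INSERT INTO applicant VALUES ('Ali', 'Multan', 80)")  # duplicate pair
except sqlite3.IntegrityError as exc:
    print("rejected:", exc)
```

The explicit NOT NULL on the compound key columns matters in SQLite, which (unlike most DBMSs) otherwise tolerates NULLs in non-integer primary key columns.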

Remember, a primary key value that is referenced by a foreign key in another table cannot simply be changed
or deleted. In other words, we cannot change or delete the primary key of a row in the parent table while a
row in the child table still refers to it through its foreign key, because the child row would then point to a key
that no longer exists (unless a cascading action such as ON UPDATE CASCADE or ON DELETE CASCADE
is defined for the foreign key).
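This restriction can be observed directly. The sketch below again uses Python's sqlite3 module with hypothetical student and enrollment tables and the default referential action (no cascading); it tries to change and delete a referenced primary key, and then to change an ordinary column of the same parent row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforcement is off by default in SQLite
conn.execute("CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE enrollment ("
    " student_id INTEGER NOT NULL REFERENCES student(student_id),"
    " course TEXT NOT NULL)"
)
conn.execute("INSERT INTO student VALUES (1, 'Ali')")
conn.execute("INSERT INTO enrollment VALUES (1, 'DBMS')")  # child row refers to student 1


def try_sql(sql):
    """Run one statement and report whether the DBMS accepted it."""
    try:
        conn.execute(sql)
        return "ok"
    except sqlite3.IntegrityError as exc:
        return f"rejected ({exc})"


results = [
    try_sql("UPDATE student SET student_id = 2 WHERE student_id = 1"),     # change referenced key
    try_sql("DELETE FROM student WHERE student_id = 1"),                   # delete referenced row
    try_sql("UPDATE student SET name = 'Ali Khan' WHERE student_id = 1"),  # non-key column
]
for r in results:
    print(r)
```

The first two statements are rejected because they would orphan the enrollment row; the third is accepted because it does not touch the referenced key.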

Normalization

1NF: The first normal form requires that each cell of the table hold only a single value, i.e. each
intersection of a row and a column must hold an atomic value. For example, if we have a column named
phone_number, then each row must store only a single phone number in that column.
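Bringing such a column into 1NF means producing one row per value. A minimal sketch, assuming the multi-valued cell is stored as a comma-separated string (the phone numbers are placeholders):

```python
# Hypothetical rows that break 1NF: one cell holds two phone numbers.
unnormalized = [
    {"student_id": 1, "phone_number": "555-0101, 555-0102"},
    {"student_id": 2, "phone_number": "555-0199"},
]


def to_first_normal_form(rows):
    """Return one row per (student_id, phone_number) so every cell is atomic."""
    atomic_rows = []
    for row in rows:
        for phone in row["phone_number"].split(","):
            atomic_rows.append({"student_id": row["student_id"], "phone_number": phone.strip()})
    return atomic_rows


normalized = to_first_normal_form(unnormalized)
for row in normalized:
    print(row)
```

In a real schema the resulting rows would usually live in a separate table whose key is (student_id, phone_number).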

2NF: We saw candidate keys above, and this is where they play a role. A table is in 2NF if it is in 1NF and
no non-prime attribute (an attribute that is not part of any candidate key) depends on only part of a
candidate key; every non-prime attribute must depend on the whole key. In simple words, if a table
represents two or more different entities, it should be broken down into one table per entity. For example,
the table (Student ID, Student Name, Course Number, Course Name, Teacher ID, Teacher Name) stores
information about each student enrolled in each course taught by each teacher. Assuming each course is
taught by one teacher, its key is (Student ID, Course Number), but Student Name depends only on Student
ID and Course Name depends only on Course Number. Since the table represents three different entities,
it must be decomposed to reach 2NF.
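Partial dependencies like these can be found mechanically: for every proper subset of the key, compute which attributes it determines. This is a minimal sketch; the FDs listed are assumptions about the example table (one name per student, one name and one teacher per course), and it treats the given key as the only candidate key, so "non-key" and "non-prime" coincide.

```python
from itertools import combinations


def closure(attrs, fds):
    """All attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def partial_dependencies(key, fds):
    """Map each proper subset of the key to the non-key attributes it determines."""
    report = {}
    for size in range(1, len(key)):
        for subset in combinations(sorted(key), size):
            determined = closure(subset, fds) - key
            if determined:
                report[subset] = sorted(determined)
    return report


# Assumed FDs for the example table.
key = {"StudentID", "CourseNo"}
fds = [
    ({"StudentID"}, {"StudentName"}),
    ({"CourseNo"}, {"CourseName", "TeacherID"}),
    ({"TeacherID"}, {"TeacherName"}),
]
print(partial_dependencies(key, fds))
# {('CourseNo',): ['CourseName', 'TeacherID', 'TeacherName'], ('StudentID',): ['StudentName']}
```

Each entry of the report suggests a table of its own (a Student table keyed by StudentID, a Course table keyed by CourseNo), with the enrollment table keeping only the key pair.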

3NF: This rule requires that a table be in 2NF and that every non-key column depend on the primary key
of its own table directly, not transitively. For example, in the table (Transaction ID, price, quantity,
total_sales), total_sales is the product of price and quantity (price*quantity). Hence total_sales depends
on Transaction ID only transitively (Transaction ID → price, quantity → total_sales), which violates 3NF.
Each attribute must depend directly on the primary key, so total_sales should be removed from the table
and computed when needed.
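One way to avoid storing the transitively dependent column is to compute it on demand. This is a minimal sketch with a hypothetical Python `Transaction` class; in SQL the same idea would be a view or an expression in the query rather than a stored column.

```python
from dataclasses import dataclass


@dataclass
class Transaction:
    transaction_id: int
    price: float
    quantity: int

    @property
    def total_sales(self) -> float:
        # Derived from price and quantity on every read, so it can never
        # disagree with them.
        return self.price * self.quantity


t = Transaction(transaction_id=1, price=250.0, quantity=4)
print(t.total_sales)  # 1000.0
t.quantity = 5
print(t.total_sales)  # 1250.0
```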

Functional Dependency
A functional dependency (FD) is a constraint between two sets of attributes in a relation. It says that if two
tuples have the same values for attributes A1, A2, ..., An, then those two tuples must also have the same
values for attributes B1, B2, ..., Bm.
Functional dependency is represented by an arrow sign (→) that is, X→Y, where X functionally
determines Y. The left-hand side attributes determine the values of attributes on the right-hand side.
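The definition translates directly into a check on a table's rows. A minimal sketch, assuming rows are Python dictionaries (the sample data is made up); note that a set of rows can only show that an FD is violated, it cannot prove the FD holds for all future data.

```python
def fd_holds(rows, lhs, rhs):
    """Check X -> Y on a relation instance: tuples that agree on X must agree on Y."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        if seen.setdefault(x, y) != y:
            return False
    return True


students = [
    {"roll_no": "R1", "name": "Ali", "city": "Multan"},
    {"roll_no": "R2", "name": "Sara", "city": "Multan"},
    {"roll_no": "R3", "name": "Ali", "city": "Lahore"},
]
print(fd_holds(students, ["roll_no"], ["name", "city"]))  # True
print(fd_holds(students, ["name"], ["city"]))             # False
```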

Armstrong's Axioms
If F is a set of functional dependencies, then the closure of F, denoted F+, is the set of all functional
dependencies logically implied by F. Armstrong's Axioms are a set of rules that, when applied
repeatedly, generate the closure of a set of functional dependencies.
 Reflexive rule − If α is a set of attributes and β ⊆ α, then α → β holds.
 Augmentation rule − If α → β holds and γ is an attribute set, then αγ → βγ also holds. That is,
adding the same attributes to both sides of a dependency does not change the basic dependency.
 Transitivity rule − Same as the transitive rule in algebra: if α → β holds and β → γ holds, then
α → γ also holds. α → β means that α functionally determines β.
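Each axiom is a small operation on a dependency. In this minimal sketch an FD is represented as a pair of frozensets (left side, right side) and attributes are single letters; both are assumptions made for brevity.

```python
def fs(attrs):
    """Build an attribute set from a string of single-letter attribute names."""
    return frozenset(attrs)


def reflexivity(alpha, beta):
    if not beta <= alpha:
        raise ValueError("reflexivity requires beta to be a subset of alpha")
    return (alpha, beta)


def augmentation(fd, gamma):
    alpha, beta = fd
    return (alpha | gamma, beta | gamma)


def transitivity(fd1, fd2):
    alpha, beta = fd1
    beta2, gamma = fd2
    if beta != beta2:
        raise ValueError("the right side of fd1 must equal the left side of fd2")
    return (alpha, gamma)


def show(fd):
    lhs, rhs = fd
    return "".join(sorted(lhs)) + " -> " + "".join(sorted(rhs))


a_b = (fs("A"), fs("B"))
b_c = (fs("B"), fs("C"))
print(show(reflexivity(fs("AB"), fs("A"))))  # AB -> A
print(show(augmentation(a_b, fs("C"))))      # AC -> BC
print(show(transitivity(a_b, b_c)))          # A -> C
```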

Closures:
The closure of a set of FDs is the set of all FDs that can be derived from that set. It is also referred to as
the complete set of FDs. If F denotes the set of FDs for a relation R, then the closure of F, i.e. the set of
all FDs implied by F, is denoted F+. Let's consider the set F of functional dependencies given below:
F = {A -> B, B -> C, C -> D}
From F, it is possible to derive the following dependencies:
A -> A   ...by the reflexive rule (self-determination).
A -> B   ...already given in F.
A -> C   ...by the transitivity rule (A -> B and B -> C).
A -> D   ...by the transitivity rule (A -> C and C -> D).
Now, by applying the union rule (if A -> B and A -> C hold, then A -> BC holds; this rule follows from
Armstrong's Axioms), we can derive A -> ABCD, i.e. the closure of A is A+ = ABCD. All FDs derived in this
way from the FDs of F together form the closure of F. Steps to determine F+:

 Determine each set of attributes X that appears as the left-hand side of some FD in F.
 Determine the set X+ of all attributes that are functionally dependent on X, as in the example above.
 In other words, X+ is the set of attributes functionally determined by X based on F, and X+ is called
the closure of X under F.
 Combining all such X -> X+ gives the closure of F (every FD X -> Y with Y ⊆ X+ belongs to F+).
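The step "determine X+" is the standard attribute-closure loop: start from X and keep adding the right side of any FD whose left side is already covered, until nothing changes. A minimal sketch for the example F above, assuming single-letter attribute names:

```python
def attribute_closure(attrs, fds):
    """Compute X+: start from X and apply every FD whose left side is already included."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure


# F = {A -> B, B -> C, C -> D}; each attribute is a single letter.
F = [("A", "B"), ("B", "C"), ("C", "D")]

for x in sorted({lhs for lhs, _ in F}):
    print(f"{x}+ = {''.join(sorted(attribute_closure(x, F)))}")
# A+ = ABCD
# B+ = BCD
# C+ = CD
```

Since X -> Y belongs to F+ exactly when Y ⊆ X+, the same function tests whether an FD is implied by F; two FD sets F and G are equivalent when every FD of G is implied by F and every FD of F is implied by G.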

Discuss index structures: primary, secondary and clustering indices, single-level and multi-level indexing,
B-trees and B+ trees.
We know that data is stored in the form of records. Every record has a key field, which helps it to be
recognized uniquely.
Indexing is a data structure technique to efficiently retrieve records from the database files based on
some attributes on which the indexing has been done. Indexing in database systems is similar to what we
see in books.
Indexing is defined based on its indexing attributes. Indexing can be of the following types −
 Primary Index − Primary index is defined on an ordered data file. The data file is ordered on
a key field. The key field is generally the primary key of the relation.
 Secondary Index − Secondary index may be generated from a field which is a candidate key and
has a unique value in every record, or a non-key with duplicate values.
 Clustering Index − Clustering index is defined on an ordered data file. The data file is ordered
on a non-key field.
Ordered Indexing is of two types −

 Dense Index
 Sparse Index

Dense Index
In dense index, there is an index record for every search key value in the database. This makes searching
faster but requires more space to store index records itself. Index records contain search key value and a
pointer to the actual record on the disk.
Sparse Index
In a sparse index, index records are not created for every search key. An index record here contains a
search key and an actual pointer to the data on the disk. To search for a record, we first find the index
record with the largest search key that is less than or equal to the key we are looking for, and follow its
pointer to the data. If the record we want is not at that location, the system searches sequentially from
there until the desired record is found.
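The difference between dense and sparse indexes can be sketched over a sorted data file split into blocks. This is a minimal illustration, assuming a hypothetical file of 8 records in blocks of at most 3; the dense index has one entry per key, while the sparse index stores only the first key of each block and finishes with a sequential scan.

```python
import bisect

# Hypothetical data file sorted on the search key, stored in blocks.
blocks = [
    [(10, "Ali"), (20, "Sara"), (30, "Omar")],
    [(40, "Hina"), (50, "Zain"), (60, "Amna")],
    [(70, "Bilal"), (80, "Noor")],
]

# Dense index: one entry per search-key value -> (block number, slot).
dense = {key: (b, s) for b, block in enumerate(blocks) for s, (key, _) in enumerate(block)}

# Sparse index: one entry per block, holding the block's first key.
sparse_keys = [block[0][0] for block in blocks]


def dense_lookup(key):
    loc = dense.get(key)
    if loc is None:
        return None
    b, s = loc
    return blocks[b][s][1]


def sparse_lookup(key):
    # Find the last index entry whose key is <= the search key...
    i = bisect.bisect_right(sparse_keys, key) - 1
    if i < 0:
        return None
    # ...then scan that block sequentially.
    for k, value in blocks[i]:
        if k == key:
            return value
    return None


print(dense_lookup(50), sparse_lookup(50))  # Zain Zain
print(len(dense), len(sparse_keys))         # 8 3
```

The sparse index is smaller (3 entries instead of 8) at the cost of the extra scan inside a block.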

Multilevel Index
Index records comprise search-key values and data pointers. Multilevel index is stored on the disk along
with the actual database files. As the size of the database grows, so does the size of the indices. There is
an immense need to keep the index records in the main memory so as to speed up the search operations.
If single-level index is used, then a large size index cannot be kept in memory which leads to multiple
disk accesses.
Multi-level Index helps in breaking down the index into several smaller indices in order to make the
outermost level so small that it can be saved in a single disk block, which can easily be accommodated
anywhere in the main memory.
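A two-level index can be sketched by building a sparse index over the index itself. In this minimal illustration the block size (`FANOUT = 4` entries) and the 16 data blocks with first keys 0, 100, ..., 1500 are hypothetical; the small outer index is searched in memory, and only one inner index block has to be read to find the data block.

```python
import bisect

FANOUT = 4  # hypothetical: index entries per disk block

# Inner (level-1) sparse index: first key of each of 16 data blocks.
inner = [k * 100 for k in range(16)]  # 0, 100, ..., 1500
inner_blocks = [inner[i:i + FANOUT] for i in range(0, len(inner), FANOUT)]

# Outer (level-2) index: first key of each inner index block; fits in one block.
outer = [blk[0] for blk in inner_blocks]  # [0, 400, 800, 1200]


def find_data_block(key):
    """Return the data block number that may hold `key`, using the two-level index."""
    j = bisect.bisect_right(outer, key) - 1  # search the in-memory outer index
    if j < 0:
        return None
    blk = inner_blocks[j]                    # one disk read for this inner block
    i = bisect.bisect_right(blk, key) - 1
    return j * FANOUT + i                    # position of the entry in the full inner index


print(find_data_block(950))  # 9
```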
B+ Tree
A B+ tree is a balanced multi-way search tree (each node can have many children, not just two) that
follows a multi-level index format. The leaf nodes of a B+ tree hold the actual data pointers. A B+ tree
ensures that all leaf nodes remain at the same height, so it is always balanced. Additionally, the leaf
nodes are linked together in a linked list; therefore, a B+ tree supports random access as well as
sequential access.
Structure of a B+ Tree
Every leaf node is at an equal distance from the root node. A B+ tree is of order n, where n is fixed for
every B+ tree.
Internal nodes −

 Internal (non-leaf) nodes contain at least ⌈n/2⌉ pointers, except the root node.
 At most, an internal node can contain n pointers.
Leaf nodes −

 Leaf nodes contain at least ⌈n/2⌉ record pointers and ⌈n/2⌉ key values.
 At most, a leaf node can contain n record pointers and n key values.
 Every leaf node contains one block pointer P to point to next leaf node and forms a linked list.
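The occupancy rules above reduce to simple arithmetic on n. A minimal sketch that computes the bounds for non-root nodes exactly as stated above (the root is exempt and is not modelled here):

```python
import math


def bplus_bounds(n):
    """(minimum, maximum) occupancy for non-root nodes of a B+ tree of order n."""
    half = math.ceil(n / 2)
    return {
        "internal_pointers": (half, n),  # child pointers in an internal node
        "leaf_keys": (half, n),          # key values (and record pointers) in a leaf
    }


def node_ok(kind, count, n):
    """Check a non-root node's occupancy against the bounds for order n."""
    low, high = bplus_bounds(n)[kind]
    return low <= count <= high


print(bplus_bounds(5))             # {'internal_pointers': (3, 5), 'leaf_keys': (3, 5)}
print(node_ok("leaf_keys", 2, 5))  # False: underflow
```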

B+ Tree Insertion
 B+ trees are filled from the bottom, and every entry is inserted at a leaf node.
 If a leaf node overflows −
o Split the node into two parts.
o Partition at i = ⌊(n+1)/2⌋, where n is the order of the tree.
o The first i entries are stored in one node.
o The rest of the entries (i+1 onwards) are moved to a new node.
o The ith key is duplicated at the parent of the leaf.
 If a non-leaf node overflows −
o Split the node into two parts.
o Partition the node at i = ⌈(n+1)/2⌉.
o Entries up to i are kept in one node.
o The rest of the entries are moved to a new node.
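The leaf-split steps can be sketched as a single function. This is a minimal illustration that models only one leaf as a sorted Python list; `n` is the order, i.e. the maximum number of keys a leaf may hold, and is the value plugged into the partition formula. Updating the parent is left out; the function just returns the key that would be copied up.

```python
import bisect


def insert_into_leaf(leaf, key, n):
    """Insert key into a sorted leaf of a B+ tree of order n.

    Returns (left, right, separator). If the leaf did not overflow,
    right and separator are None.
    """
    keys = leaf[:]
    bisect.insort(keys, key)
    if len(keys) <= n:
        return keys, None, None
    i = (n + 1) // 2       # partition point i = floor((n+1)/2)
    left, right = keys[:i], keys[i:]
    separator = left[-1]   # the i-th key, duplicated (not moved) at the parent
    return left, right, separator


print(insert_into_leaf([10, 20, 30], 25, n=4))      # ([10, 20, 25, 30], None, None)
print(insert_into_leaf([10, 20, 30, 40], 25, n=4))  # ([10, 20], [25, 30, 40], 20)
```

Many textbooks instead copy up the first key of the new right node; either convention works as long as the search rule (go left when the key is less than or equal to the separator, or strictly less than it) matches.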
B+ Tree Deletion
 B+ tree entries are deleted at the leaf nodes.
 The target entry is searched for and deleted.
o If the key also appears in an internal node, delete it there and replace it with the entry from the left position.
 After deletion, underflow is tested:
o If underflow occurs, redistribute entries from the left sibling node.
 If redistribution from the left is not possible, then
o Redistribute entries from the right sibling node.
 If redistribution is possible neither from the left nor from the right, then
o Merge the node with its left or right sibling.
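The underflow handling at the leaf level can be sketched as follows. This is a hypothetical simplification: the leaves are a Python list of sorted lists in left-to-right order, the minimum occupancy is ⌈n/2⌉ keys, and the parent's separator keys are not updated.

```python
import math


def delete_from_leaf(leaves, idx, key, n):
    """Delete key from leaves[idx], then fix underflow by borrowing or merging.

    Only the leaf level is modelled; parent separators are not updated.
    """
    leaf = leaves[idx]
    leaf.remove(key)
    min_keys = math.ceil(n / 2)
    if len(leaf) >= min_keys:
        return leaves                                  # no underflow
    left = leaves[idx - 1] if idx > 0 else None
    right = leaves[idx + 1] if idx + 1 < len(leaves) else None
    if left is not None and len(left) > min_keys:      # borrow from the left sibling
        leaf.insert(0, left.pop())
    elif right is not None and len(right) > min_keys:  # otherwise borrow from the right
        leaf.append(right.pop(0))
    elif left is not None:                             # otherwise merge into the left sibling
        left.extend(leaf)
        del leaves[idx]
    elif right is not None:                            # or absorb the right sibling
        leaf.extend(right)
        del leaves[idx + 1]
    return leaves


leaves = [[10, 20, 30], [40, 50], [60, 70]]
print(delete_from_leaf(leaves, 1, 50, n=4))  # [[10, 20], [30, 40], [60, 70]]
print(delete_from_leaf(leaves, 1, 40, n=4))  # [[10, 20, 30], [60, 70]]
```

A merge never overflows: a sibling at the minimum plus an underflowing leaf holds at most 2⌈n/2⌉ - 1 ≤ n keys.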

B-Tree Index

The B-tree index is one of the most widely used data structures for tree-based indexing in DBMSs. It is a
multi-level, tree-based indexing technique built on a balanced multi-way search tree (not a binary tree). In a
classic B-tree, keys and their data pointers are stored in internal nodes as well as in leaf nodes.

The variant in which all data pointers sit in the leaf nodes and the leaves are linked together in a list is the
B+ tree described above; that linking is what lets it support both random and sequential access.

 Every path from the root to a leaf has the same length.
 Every node that is neither the root nor a leaf has between ⌈n/2⌉ and n children, where n is the order of the tree.
 For example, in a B-tree of order 5, leaf nodes must have between 2 and 4 keys, and non-leaf nodes apart
from the root have between 3 and 5 children.
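In a classic B-tree, unlike a B+ tree, a key can be found in an internal node, so a search may stop before reaching a leaf. A minimal search sketch over a hypothetical order-5 tree (root with 2 keys and 3 children, leaves holding 2 to 4 keys); insertion and node splitting are not shown.

```python
from bisect import bisect_left
from dataclasses import dataclass, field


@dataclass
class BTreeNode:
    keys: list
    children: list = field(default_factory=list)  # empty for a leaf


def btree_search(node, key):
    """Classic B-tree search: return the node holding key, or None."""
    while node is not None:
        i = bisect_left(node.keys, key)
        if i < len(node.keys) and node.keys[i] == key:
            return node
        node = node.children[i] if node.children else None
    return None


# Hypothetical order-5 tree.
root = BTreeNode([30, 60], [
    BTreeNode([10, 20]),
    BTreeNode([40, 50]),
    BTreeNode([70, 80, 90]),
])
print(btree_search(root, 60) is root)  # True: found in the root, no leaf visited
print(btree_search(root, 50).keys)     # [40, 50]
print(btree_search(root, 55))          # None
```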
