
CO-3

Database Design
Design is the process of planning and creating a system or environment. It
involves identifying the needs and constraints of the project, developing a concept or idea, and
refining it through iteration until a final design is achieved.

In the context of databases, design involves creating a structured and organized approach to storing
and managing data. The goal of database design is to create a database that is efficient, effective, and
easy to use. Database systems are designed to manage large amounts of information that are typically
related to the operations of an organization or enterprise. The information stored in a database is often
used to support the activities of the organization, whether it is for internal operations or to provide
services to customers or clients.

Good database design gives an organization efficient data retrieval and manipulation, accurate and
secure data, and easy maintenance and updates. It is essential for organizations that want to manage
their data effectively and avoid the consequences of a bad design, such as redundant or inconsistent
data.

The following six steps are followed during the design process:

1. Requirements Analysis: In requirement analysis for database design, the main goal is to
understand the needs and expectations of the stakeholders for the database system to be developed.
The following are the steps involved in requirement analysis for database design:

• Identify stakeholders: The first step is to identify the stakeholders who will be using the
database system.
• Gather requirements: Once the stakeholders are identified, the next step is to conduct
interviews or surveys to gather information about their requirements and expectations for the
system.
• Define requirements: The information gathered from the stakeholders can then be used to
create a list of requirements for the database system.

Once gathered, the requirements are organized and represented using appropriate tools and are given
as input to the conceptual database design phase.

2. Conceptual Database Design: The requirement specifications are converted into an ER model or
another similarly high-level conceptual data model. Conceptual design is the first of the three
modelling stages (conceptual, logical, physical). The ER model provides a simple description of the
data: a high-level view of the entire database that describes what the database should contain and how
the data items are related to each other. This design is usually presented as an Entity-Relationship
(ER) diagram, and its main focus is the overall structure and the relationships between entities. Once
the requirement specifications have been converted into an ER model, it is given as input to the
logical database design phase.

3. Logical Database Design Schema: The logical database design is an important stage of database
design. It focuses on converting the conceptual design into a detailed logical model that can be
implemented in a database management system (DBMS). The main focus is on defining the data
elements, their relationships, and the data constraints. This design is usually presented in the form of
tables, columns, and relationships.

Logical database design means that ER diagrams are now converted into actual relational database
schemas, and these relational database schemas are given as input to the schema refinement phase.

4. Schema Refinement: A database designed from the E-R model may still contain some amount of
• Inconsistency
• Uncertainty
• Redundancy

The refinement process is called Normalization. It is defined as a step-by-step process of decomposing
a complex relation into simpler relations. It is a formal process that can be followed to achieve a good
database design, and it is also used to check that an existing design is of good quality. The different
stages of normalization are known as “normal forms”. The following guidelines may be used as
measures to determine the quality of a relation schema design:

• Guideline:1: Making sure that the semantics of the attributes is clear in the schema
• Guideline:2: Reducing the redundant information in tuples
• Guideline:3: Reducing the NULL values in tuples
• Guideline:4: Disallowing the possibility of generating spurious tuples

Guideline:1 Making sure that the semantics of the attributes is clear in the relations
The semantics of a relation refers to its meaning resulting from the interpretation of attribute values in
a tuple. Design a relation schema so that it is easy to explain its meaning. Do not combine attributes
from multiple entity types and relationship types into a single relation. Attributes of different entities
(EMPLOYEEs, DEPARTMENTs, PROJECTs) should not be mixed in the same relation. Only
foreign keys should be used to refer to other entities.
Guideline:2 Redundant Information in Tuples and Update Anomalies

• Wastes storage: Grouping attributes into relation schemas has a significant effect on storage
space.
• Problems with update anomalies: Storing natural joins of base relations leads to an additional
problem referred to as update anomalies. These can be classified into insertion anomalies,
deletion anomalies, and modification anomalies.
• Insertion anomalies: To insert a new employee tuple into EMP_DEPT, we must include either
the attribute values for the department that the employee works for, or NULLs (if the
employee does not work for a department as yet). For example, to insert a new tuple for an
employee who works in department number 5, we must enter all the attribute values of
department 5 correctly so that they are consistent with the corresponding values for
department 5 in other tuples in EMP_DEPT. It is difficult to insert a new department that has
no employees as yet in the EMP_DEPT relation.
• Deletion anomalies: If we delete from EMP_DEPT an employee tuple that happens to
represent the last employee working for a particular department, the information concerning
that department is lost inadvertently from the database. This problem does not occur in a
design where DEPARTMENT tuples are stored separately.
• Modification anomalies: In EMP_DEPT, if we change the value of one of the attributes of a
particular department (say, the manager of department 5), we must update the tuples of all
employees who work in that department; otherwise, the database will become inconsistent.

Design the base relation schemas so that no insertion, deletion, or modification anomalies are present
in the relations. If any anomalies are present, note them clearly and make sure that the programs that
update the database will operate correctly.

Guideline:3 Reducing the NULL values in tuples:

As far as possible, avoid placing attributes in a base relation whose values may frequently be NULL.
If NULLs are unavoidable, make sure that they apply in exceptional cases only and do not apply to a
majority of tuples in the relation.

Reasons for nulls:

• The attribute does not apply to this tuple. For example, Visa_status may not apply to U.S.
students.
• The attribute value for this tuple is unknown. For example, the Date_of_birth may be
unknown for an employee.

Guideline:4 Disallowing the possibility of generating spurious tuples:

• Bad designs for a relational database may result in erroneous results for certain JOIN
operations.
• The "lossless join" property is used to guarantee meaningful results for join operations.

Design relation schemas so that they can be joined with equality conditions on attributes that are
appropriately related (primary key, foreign key) pairs in a way that guarantees that no spurious tuples
are generated. Avoid relations that contain matching attributes that are not (foreign key, primary key)
combinations because joining on such attributes may produce spurious tuples.
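
The effect of joining on a non-key attribute can be seen on a tiny example. The sketch below uses a
hypothetical relation WORKS(Ename, Pname, Plocation) with made-up rows (neither the relation nor
the data comes from the text above). It splits the relation into two projections that share only
Plocation, which is not a key of either part, and joins them back.

```python
# Hypothetical relation WORKS(Ename, Pname, Plocation); the rows are illustrative only.
works = {
    ("Smith", "ProductX", "Houston"),
    ("Wong", "ProductY", "Houston"),
}

# Decompose into two projections that share only Plocation (not a key of either).
emp_locs = {(ename, loc) for ename, _pname, loc in works}
proj_locs = {(pname, loc) for _ename, pname, loc in works}

# Natural join of the two projections on Plocation.
joined = {
    (ename, pname, loc)
    for ename, loc in emp_locs
    for pname, loc2 in proj_locs
    if loc == loc2
}

# Tuples that appear after the join but were never in the original relation.
spurious = joined - works
print(sorted(spurious))
# [('Smith', 'ProductY', 'Houston'), ('Wong', 'ProductX', 'Houston')]
```

The join returns every original tuple plus two that were never stored, so the decomposition is lossy:
the extra tuples make it impossible to tell who actually works on which project.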

5. Physical Database Design: Physical database design is the third and last of the three modelling
stages (conceptual, logical, physical). It focuses on implementing the logical design in a specific
database management system by defining the physical database schema. This includes defining the
storage structures, access methods, indexes, and other physical parameters. The main focus is on how
the database will be physically implemented on a specific platform.

6. Application and Security Design: When designing the applications and the security of a database
management system (DBMS), several key considerations must be taken into account. Some important
points to keep in mind:

• Authentication and authorization: Implement robust authentication and authorization
mechanisms to ensure that only authorized users can access the database. This can include
password policies, two-factor authentication, and access control lists.
• Logging and monitoring: Implement logging and monitoring to detect unusual activity or
unauthorized access attempts.
• Auditing: Perform regular audits of the database to ensure compliance with security policies
and regulations.
• Performance optimization: When designing an application that uses a DBMS, it's important to
optimize performance to ensure that the application runs smoothly and efficiently. This may
involve using indexing and other performance tuning techniques to speed up database queries.
• Backup and recovery: Establish regular backup and recovery procedures to ensure that data is
not lost in case of a system failure or data breach.

Functional Dependencies
A Functional Dependency is a relationship between or among attributes of a relation. For example,
if knowing the value of Customer Account No always lets us find exactly one value of Customer
Balance, then we say that Customer Balance is functionally dependent on Customer Account No.

AccountNo → Balance

As another example:

ISBN → Title

Let X and Y be two attributes (or sets of attributes) of a relation. If, for each value of X, there is
only one value of Y corresponding to it, then Y is said to be functionally dependent on X, and this is
indicated by the notation:

X → Y

It means:

➢ Y is functionally dependent on X.
➢ X determines Y
➢ X is called determinant or attributes in the left side of the arrow are called determinants.

A B C D
a1 b1 c1 d1
a1 b2 c1 d2
a2 b2 c2 d2
a2 b3 c2 d3
a3 b4 c2 d4

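In this instance A → C is satisfied (each value of A appears with a single value of C), but C → A is
not satisfied: c2 appears with both a2 and a3.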
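
Whether an FD is satisfied by a given set of tuples can be checked mechanically. The following is a
minimal Python sketch over the five tuples of the table; the function name fd_holds and the
list-of-dicts representation are illustrative choices, not part of the notes.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if lhs -> rhs is satisfied by this relation instance.

    rows is a list of dicts; lhs and rhs are lists of attribute names.
    """
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # setdefault stores y the first time x is seen and returns the stored value.
        if seen.setdefault(x, y) != y:
            return False
    return True


# The five tuples of the relation R(A, B, C, D) shown in the table.
R = [
    dict(A="a1", B="b1", C="c1", D="d1"),
    dict(A="a1", B="b2", C="c1", D="d2"),
    dict(A="a2", B="b2", C="c2", D="d2"),
    dict(A="a2", B="b3", C="c2", D="d3"),
    dict(A="a3", B="b4", C="c2", D="d4"),
]

print(fd_holds(R, ["A"], ["C"]))  # True
print(fd_holds(R, ["C"], ["A"]))  # False
```

A check like this can only refute an FD. An FD is a statement about every legal state of the relation,
so an instance that happens to satisfy it does not prove that it holds for the schema.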


Consider the relation schema:

Emp_Proj (EmpId,Pnumber,Hours,Ename,Pname,Plocation)

1) EmpId → Ename (since the value of an employee Id uniquely determines the employee name)
2) Pnumber → {Pname, Plocation}
3) {EmpId, Pnumber} → Hours

Functional dependencies may also have a composite determinant, for example:

X, Z→Y

It means that there is only one value of Y corresponding to the given values of X, Z.

Armstrong’s Axioms (Inference Axioms, or Inference Rules for Functional Dependencies)

Suppose we have F, a set of functional dependencies. To determine whether an FD X → Y is
logically implied by F, we use a set of rules or axioms. Let R be a relation and W, X, Y, Z be
attributes or subsets of attributes in R:

1) Reflexivity or Reflexive Rule: If Y⊆X, then X →Y. This axiom states that a set of
attributes determines itself and any of its subsets.

2) Augmentation Rule: If X →Y then XZ →YZ. We can augment the left side of the FD or
both sides conveniently with one or more attributes but the axiom does not allow
augmenting the right side alone.

3) Transitivity Rule: If X →Y and Y →Z then X →Z.

4) Union Rule: If X →Y and X →Z then X →YZ.

5) Decomposition Rule: If X →YZ then X →Y and X →Z.

6) Pseudo Transitivity Rule: If X →Y and YZ →W then XZ→W.
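
Repeated application of these rules is usually mechanised as the attribute closure X+: the set of all
attributes determined by X under F. X → Y is implied by F exactly when Y is a subset of X+. A
minimal Python sketch follows, assuming each attribute is a single letter so that a string such as
"YZ" stands for the set {Y, Z}; the function names closure and implies are illustrative.

```python
def closure(attrs, fds):
    """Return the closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the whole left side is already known, the right side follows.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def implies(fds, lhs, rhs):
    """True if lhs -> rhs is logically implied by fds."""
    return set(rhs) <= closure(lhs, fds)


# F = {X -> Y, YZ -> W}, the premises of the pseudo-transitivity rule.
F = [("X", "Y"), ("YZ", "W")]

print(implies(F, "XZ", "W"))   # True  (pseudo-transitivity)
print(implies(F, "XZ", "YZ"))  # True  (augmentation of X -> Y with Z)
print(implies(F, "XY", "X"))   # True  (reflexivity)
print(implies(F, "X", "W"))    # False (Z is missing on the left side)
```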

Normalization
The basic objective of Normalization is to reduce redundancy, which means that each piece of
information is stored only once. Storing information several times leads to insertion, update, and
deletion anomalies, wasted storage space, and an increase in the total size of the data stored.

Normalization of data can be considered a process of analyzing the given relation schemas based on
their FDs and primary keys to achieve the desirable properties of:

(1) Minimizing redundancy
(2) Minimizing the insertion, deletion, and update anomalies

Why Are Relations Normalized?

Student_Course Relation:

StudentNo  StudentName  Address  CourseNo  CourseName     Instructor
85001      Mukul        Sec-G    CP302     Database       Mishra
85001      Mukul        Sec-G    CP303     Communication  Tripathi
85001      Mukul        Sec-G    CP304     Software Engg  Khan
85005      Vipul        Sec-A    CP302     Database       Mishra

Primary Key: (StudentNo, CourseNo)

The relation has the following undesirable features or anomalies:

1) Repetition of Information: A lot of information is repeated. The student number, name, and
address are repeated for every course a student takes.

2) Insertion Anomalies: It is the inability to represent certain information. The primary key is
(StudentNo, CourseNo), and any new tuple inserted in the relation must have a value for the
primary key, since no part of a key may be NULL. So we cannot insert the number and name of a
new course in the database until a student enrolls in the course. Similarly, information about a
new student cannot be inserted in the database until the student enrolls in a course.

3) Updation Anomalies: If we want to change the value of one or more attributes of a
particular course in the relation, for example, the Instructor for course no CP302, we
must update all the tuples containing a CP302 enrollment. If this modification is not carried
out properly, the database will become inconsistent.

4) Deletion Anomalies: Useful information may be lost when a tuple is deleted. For example, if
we delete the tuple corresponding to student 85001 doing course CP304, we will lose the
relevant information about the course CP304. Similarly, deletion of course CP302 from the
database may remove all information about the student named Vipul.

The above problems arise because the relation Student_Course holds information about students as
well as courses. One solution to these problems is to decompose the relation into two or
more smaller relations.

Student (StudentNo, StudentName, Address)
Course (CourseNo, CourseName, Instructor)
StudentCourse (StudentNo, CourseNo)

Such decomposition is called Normalization and is essential if we wish to overcome undesirable
anomalies.
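
The decomposition can be tried out on the four sample tuples. The sketch below is only an
illustration in Python: the helper names project, natural_join and as_set and the list-of-dicts
representation are assumptions, not part of the notes. It builds the three smaller relations by
projection and confirms that joining them back yields exactly the original Student_Course tuples.

```python
def project(rows, attrs):
    """Project a list of dict rows onto attrs, dropping duplicate tuples."""
    out = []
    for row in rows:
        t = {a: row[a] for a in attrs}
        if t not in out:
            out.append(t)
    return out


def natural_join(r, s):
    """Natural join of two lists of dict rows on their common attributes."""
    out = []
    for a in r:
        for b in s:
            if all(a[k] == b[k] for k in a.keys() & b.keys()):
                out.append({**a, **b})
    return out


def as_set(rows):
    """Order-independent form of a relation, used only for comparison."""
    return {tuple(sorted(r.items())) for r in rows}


COLS = ["StudentNo", "StudentName", "Address", "CourseNo", "CourseName", "Instructor"]
DATA = [
    (85001, "Mukul", "Sec-G", "CP302", "Database", "Mishra"),
    (85001, "Mukul", "Sec-G", "CP303", "Communication", "Tripathi"),
    (85001, "Mukul", "Sec-G", "CP304", "Software Engg", "Khan"),
    (85005, "Vipul", "Sec-A", "CP302", "Database", "Mishra"),
]
student_course_all = [dict(zip(COLS, row)) for row in DATA]

student = project(student_course_all, ["StudentNo", "StudentName", "Address"])
course = project(student_course_all, ["CourseNo", "CourseName", "Instructor"])
enrolment = project(student_course_all, ["StudentNo", "CourseNo"])

rejoined = natural_join(natural_join(student, enrolment), course)

print(len(student), len(course), len(enrolment))       # 2 3 4
print(as_set(rejoined) == as_set(student_course_all))  # True
```

Each student and each course is now stored once, so a new course can be added to Course before
anyone enrolls, and changing the instructor of CP302 touches a single tuple.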

Normal Forms
A number of Normal forms have been defined for classifying relations. Each Normal form places
a number of constraints on the kind of FDs that may hold in the relation.

The Normal Forms are used to ensure that various types of anomalies and inconsistencies are not
introduced into the database. A relation is said to be in a normal form if it satisfies a certain
prescribed set of conditions.

There are several stages of the Normalization process. These are called the First Normal Form (1NF),
Second Normal Form (2NF), Third Normal Form (3NF), Boyce-Codd Normal Form (BCNF), Fourth
Normal Form (4NF), etc.

First Normal Form:

It was defined to disallow multivalued attributes, composite attributes, and their combinations. It
states that the domain of an attribute must include only atomic (simple, indivisible) values and that
the value of any attribute in a tuple must be a single value from the domain of that attribute. Hence,
1NF disallows having a set of values, a tuple of values, or a combination of both as an attribute
value for a single tuple.

A relation that is not in 1NF:

Student   Subject Information
          Code   Instructor
Ashish    CS1    Prof A
          CS2    Prof B
          CS3    Prof C
Mukesh    CS1    Prof A
          CS4    Prof D

1NF version of the same relation, with redundancy:

Student   Code   Instructor
Ashish    CS1    Prof A
Ashish    CS2    Prof B
Ashish    CS3    Prof C
Mukesh    CS1    Prof A
Mukesh    CS4    Prof D

Prime Attributes and Non-Prime Attributes:

An attribute of relation schema R is called a prime attribute of R if it is a member of some
candidate key of R. An attribute is called nonprime if it is not a prime attribute, that is, if it is not
a member of any candidate key.

For example, if AH is the only candidate key of R(A, B, C, D, E, F, H), then the attributes A and H
are prime attributes and B, C, D, E, F are non-prime attributes.

Full Functional Dependency and Partial Dependency:


A functional dependency X → Y is a full functional dependency if removal of any attribute A from
X means that the dependency does not hold anymore; that is, for any attribute A ∈ X, (X −
{A}) does not functionally determine Y.

A functional dependency X → Y is a partial dependency if some attribute A ∈ X can be removed
from X and the dependency still holds; that is, for some A ∈ X, (X − {A}) → Y.

EMP_PROJ

EmpId Pnumber Hours Ename Pname Plocation

{EmpId, Pnumber} → Hours is a full dependency (neither EmpId → Hours nor Pnumber → Hours
holds).
However, the dependency {EmpId, Pnumber} → Ename is partial because EmpId→ Ename holds.
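
The two definitions translate directly into a test based on attribute closure: X → Y is partial if Y
is still determined after dropping some single attribute from X. The following is a minimal Python
sketch for the EMP_PROJ dependencies; the function names closure and classify are illustrative.

```python
def closure(attrs, fds):
    """Closure of a set of attributes under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def classify(lhs, rhs, fds):
    """Return 'full', 'partial', or 'does not hold' for the dependency lhs -> rhs."""
    if not rhs <= closure(lhs, fds):
        return "does not hold"
    for a in lhs:
        # Dropping one attribute and still reaching rhs means the FD is partial.
        if rhs <= closure(lhs - {a}, fds):
            return "partial"
    return "full"


FDS = [
    ({"EmpId", "Pnumber"}, {"Hours"}),
    ({"EmpId"}, {"Ename"}),
    ({"Pnumber"}, {"Pname", "Plocation"}),
]
KEY = {"EmpId", "Pnumber"}

for attr in ["Hours", "Ename", "Pname", "Plocation"]:
    print(attr, classify(KEY, {attr}, FDS))
# Hours full
# Ename partial
# Pname partial
# Plocation partial
```

Testing single-attribute removals is enough: if no set X − {A} determines Y, then no smaller subset
of X can determine it either.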

Transitive Dependency

A functional dependency X → Y in a relation schema R is a transitive dependency if there exists a
set of attributes Z in R that is neither a candidate key nor a subset of any key of R and both X → Z
and Z → Y hold.

For example, consider R(A, B, C, D, E) with the set of FDs F = {AB → C, B → D, C → E}, where AB is
the candidate key. Since AB → C and C → E, it follows that AB → E. C is neither a candidate key nor
a subset of any key, so E is transitively dependent on the key.

Second Normal Form:


A relation schema R is in 2NF if it is in 1NF and every nonprime attribute A in R is fully
functionally dependent on the key of R. In other words, 2NF does not permit a partial
dependency between a nonprime attribute and the key of the relation. Consider the relation
EMP_PROJ:

EMP_PROJ

EmpId Pnumber Hours Ename Pname Plocation

FDs are

1) {EmpId, Pnumber} → Hours
2) EmpId → Ename
3) Pnumber → {Pname, Plocation}

The test for 2NF involves testing for functional dependencies whose left-hand side attributes are
part of the primary key. If the primary key contains a single attribute, the test need not be applied at
all.

The EMP_PROJ relation above is in 1NF but is not in 2NF because:

1) The nonprime attribute Ename is partially functionally dependent on the key.
2) The nonprime attributes Pname and Plocation are partially functionally dependent on the key.
3) Only the nonprime attribute Hours is fully functionally dependent on the key.

If a relation schema is not in 2NF, it can be second normalized or 2NF normalized into a number of
2NF relations in which nonprime attributes are associated only with the part of the primary key on
which they are fully functionally dependent. Therefore, we decompose EMP_PROJ into the
three relation schemas EP1, EP2, and EP3 shown below, each of which is in 2NF.

EP1

EmpId Pnumber Hours

EP2

EmpId EName

EP3

Pnumber Pname PLocation
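
The grouping behind EP1, EP2, and EP3 can be sketched in code: each nonprime attribute is placed
with the smallest part of the primary key that determines it. This Python sketch assumes a single
candidate key (the primary key) and uses illustrative function names; it is not a general 2NF
synthesis algorithm.

```python
from itertools import combinations


def closure(attrs, fds):
    """Closure of a collection of attributes under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def decompose_2nf(attrs, key, fds):
    """Map each smallest determining part of the key to the nonprime attributes it determines."""
    groups = {}
    for attr in sorted(attrs - key):
        # Try parts of the key from smallest to largest and keep the first that works.
        for size in range(1, len(key) + 1):
            parts = [c for c in combinations(sorted(key), size) if attr in closure(c, fds)]
            if parts:
                groups.setdefault(parts[0], []).append(attr)
                break
    return groups


EMP_PROJ = {"EmpId", "Pnumber", "Hours", "Ename", "Pname", "Plocation"}
KEY = {"EmpId", "Pnumber"}
FDS = [
    ({"EmpId", "Pnumber"}, {"Hours"}),
    ({"EmpId"}, {"Ename"}),
    ({"Pnumber"}, {"Pname", "Plocation"}),
]

for part, dependents in decompose_2nf(EMP_PROJ, KEY, FDS).items():
    print(list(part) + dependents)
# ['EmpId', 'Ename']
# ['EmpId', 'Pnumber', 'Hours']
# ['Pnumber', 'Plocation', 'Pname']
```

The three printed attribute lists correspond to EP2, EP1, and EP3 respectively.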

Third Normal Form:

A Relation schema R is in 3NF if it satisfies 2NF and no nonprime attribute is transitively
dependent on the key.

A Relation schema in Third normal form does not allow partial or transitive dependencies of
nonprime attributes on a key. The relation schema EMP_DEPT below is in 2NF, since no partial
dependencies on a key exist:

EMP_DEPT

Ename EmpId Bdate Address Dnumber Dname Dmgr_no


FDs are

1) EmpId → {Ename, Bdate, Address, Dnumber}
2) Dnumber → {Dname, Dmgr_no}

However, EMP_DEPT is not in 3NF because of the transitive dependency of Dmgr_no (and also
Dname) on EmpId via Dnumber.

Since EmpId → Dnumber and Dnumber → Dname,
therefore EmpId → Dname (transitive dependency).

Since EmpId → Dnumber and Dnumber → Dmgr_no,
therefore EmpId → Dmgr_no (transitive dependency).

We can normalize EMP_DEPT by decomposing it into the two 3NF relation schemas ED1 and
ED2 shown below: the attributes that violate 3NF are removed and placed, together with the
attribute through which they are transitively dependent, into another relation.

ED1

Ename EmpId Bdate Address Dnumber

ED2

Dnumber Dname Dmgr_no
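
A 3NF check can be sketched with attribute closure. The Python sketch below uses the general form
of the test, which is an assumption beyond the wording above: for every given FD X → A, either X is
a superkey or A is a prime attribute. The function names are illustrative, and the candidate keys
are supplied by hand rather than computed.

```python
def closure(attrs, fds):
    """Closure of a set of attributes under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def violations_3nf(attrs, fds, candidate_keys):
    """List (determinant, attribute) pairs that break 3NF for the given FDs."""
    prime = set().union(*candidate_keys)
    bad = []
    for lhs, rhs in fds:
        if closure(lhs, fds) >= attrs:
            continue  # the determinant is a superkey, so this FD is harmless
        for a in sorted(rhs - lhs):
            if a not in prime:
                bad.append((tuple(sorted(lhs)), a))
    return bad


FD1 = ({"EmpId"}, {"Ename", "Bdate", "Address", "Dnumber"})
FD2 = ({"Dnumber"}, {"Dname", "Dmgr_no"})

EMP_DEPT = {"Ename", "EmpId", "Bdate", "Address", "Dnumber", "Dname", "Dmgr_no"}
ED1 = {"Ename", "EmpId", "Bdate", "Address", "Dnumber"}
ED2 = {"Dnumber", "Dname", "Dmgr_no"}

print(violations_3nf(EMP_DEPT, [FD1, FD2], [{"EmpId"}]))
# [(('Dnumber',), 'Dmgr_no'), (('Dnumber',), 'Dname')]
print(violations_3nf(ED1, [FD1], [{"EmpId"}]))    # []
print(violations_3nf(ED2, [FD2], [{"Dnumber"}]))  # []
```

The two reported pairs are exactly the attributes that move to ED2, and neither ED1 nor ED2
reports a violation.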

Boyce-Codd Normal Form:

A Relation is in BCNF when every determinant is a candidate key. Informally, BCNF normalization
is needed when an attribute of one composite candidate key is dependent on an attribute of another
composite candidate key.

When a table contains only one candidate key, 3NF and BCNF are equivalent. BCNF can be
violated when the table contains more than one candidate key.

Consider the relation TEACH (Student, Course, Instructor):

FDs are given as:

FD1: {Student, Course} → Instructor
FD2: Instructor → Course

No single attribute is a candidate key. The candidate keys are:

Candidate Key 1: {Student, Instructor}
Candidate Key 2: {Student, Course}

The relation TEACH is in 3NF since no nonprime attribute is partially or transitively dependent on a
key (every attribute is prime). The relation is not in BCNF, however, because although Instructor is a
determinant, it is not a candidate key.

We can convert the TEACH relation into BCNF by dividing it into two relations. The attribute that
is a determinant but not a Candidate key must also be placed in a separate relation and must be the
key of that relation.

TEACH1

Instructor Course

TEACH2

Instructor Student
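
The BCNF condition can be checked by testing whether the closure of each determinant covers the
whole relation. The following is a minimal Python sketch for TEACH and its decomposition; the
function names are illustrative.

```python
def closure(attrs, fds):
    """Closure of a set of attributes under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(attrs, fds):
    """Return the nontrivial FDs whose determinant is not a superkey of attrs."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not rhs <= lhs and not closure(lhs, fds) >= attrs
    ]


TEACH = {"Student", "Course", "Instructor"}
FD1 = ({"Student", "Course"}, {"Instructor"})
FD2 = ({"Instructor"}, {"Course"})

# Both candidate keys determine every attribute of TEACH.
print(closure({"Student", "Instructor"}, [FD1, FD2]) == TEACH)  # True
print(closure({"Student", "Course"}, [FD1, FD2]) == TEACH)      # True

print(bcnf_violations(TEACH, [FD1, FD2]))                # [({'Instructor'}, {'Course'})]
print(bcnf_violations({"Instructor", "Course"}, [FD2]))  # []  (TEACH1)
print(bcnf_violations({"Instructor", "Student"}, []))    # []  (TEACH2)
```

Only FD2 is reported for TEACH, and once Instructor becomes the key of TEACH1 neither of the two
smaller relations has a violating determinant.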

Fourth Normal Form:

When a relation is in BCNF, there are no longer any anomalies that result from functional
dependencies. However, there may still be anomalies that result from Multivalued Dependency.
For example:
StudentId Subject Activity
100 Music Swimming
100 Accounting Swimming
100 Music Tennis
100 Accounting Tennis
150 Math Jogging

The candidate key is (StudentId, Subject, Activity).

Multivalued dependencies are:

StudentId →→ Subject

StudentId →→Activity

In general, a multivalued dependency exists when a relation has at least three attributes, two of
them are multivalued, and their values depend only on the third attribute. In other words, in a
relation R(A, B, C) a multivalued dependency exists if A determines multiple values of B
(A →→ B), A determines multiple values of C (A →→ C), and B and C are independent of
each other.

A relation is in 4NF if it is in BCNF and has no nontrivial multivalued dependencies other than
those whose left side is a superkey of the relation. So, we can say that 4NF normalization is
needed when a relation has undesirable multivalued dependencies. We eliminate the resulting
anomalies by creating two relations, each one storing data for only one of the two multivalued
attributes.

StudentId Subject
100 Music
100 Accounting
150 Math

StudentId Activity
100 Swimming
100 Tennis
150 Jogging

Both of these relations are now in Fourth Normal Form, as each relation holds only one multivalued
attribute.
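
The independence of Subject and Activity can be tested on the sample tuples: the multivalued
dependency holds when, for every student, the stored rows are exactly the cross product of that
student's subjects and activities. The following is a minimal Python sketch; the function name
mvd_holds and the set-of-tuples representation are illustrative.

```python
from itertools import product

# The five tuples from the example relation: (StudentId, Subject, Activity).
rows = {
    (100, "Music", "Swimming"),
    (100, "Accounting", "Swimming"),
    (100, "Music", "Tennis"),
    (100, "Accounting", "Tennis"),
    (150, "Math", "Jogging"),
}


def mvd_holds(rows):
    """Check StudentId ->-> Subject (and StudentId ->-> Activity) on this instance."""
    subjects, activities = {}, {}
    for sid, subject, activity in rows:
        subjects.setdefault(sid, set()).add(subject)
        activities.setdefault(sid, set()).add(activity)
    # Every subject of a student must be paired with every activity of that student.
    expected = {
        (sid, s, a)
        for sid in subjects
        for s, a in product(subjects[sid], activities[sid])
    }
    return expected == set(rows)


# 4NF decomposition: one relation per multivalued attribute.
student_subject = {(sid, s) for sid, s, _ in rows}
student_activity = {(sid, a) for sid, _, a in rows}

# Joining the two projections on StudentId gives back the original relation.
rejoined = {
    (sid, s, a)
    for sid, s in student_subject
    for sid2, a in student_activity
    if sid == sid2
}

print(mvd_holds(rows))                                    # True
print(len(student_subject), len(student_activity))        # 3 3
print(rejoined == rows)                                   # True
print(mvd_holds(rows - {(100, "Accounting", "Tennis")}))  # False
```

The last line shows the anomaly in the unnormalized relation: deleting a single row breaks the
dependency, so every update has to add or remove a whole group of rows to stay consistent.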
ANOTHER EXAMPLE OF FOURTH NORMAL FORM (4NF)

LOSSLESS AND LOSSY DECOMPOSITION:


Fifth Normal Form:

The Fifth Normal Form (5NF) is rarely applied in real-life database design, but the concept is
still worth learning. 5NF is also known as Project-Join Normal Form (PJ/NF). A relation is in 5NF if

• It is in 4NF
• It has no nontrivial join dependency that is not implied by its candidate keys.
