
Introduction to DBMS

A Database Management System (DBMS) is a software system that is designed to manage and organize data in a structured manner. It allows users to create, modify, and query a database, as well as manage the security and access controls for that database.

Some key features of a DBMS include:

1. Data modeling: A DBMS provides tools for creating and modifying data models, which define the structure and relationships of the data in a database.
2. Data storage and retrieval: A DBMS is responsible for storing and
retrieving data from the database, and can provide various
methods for searching and querying the data.
3. Concurrency control: A DBMS provides mechanisms for
controlling concurrent access to the database, to ensure that
multiple users can access the data without conflicting with each
other.
4. Data integrity and security: A DBMS provides tools for enforcing
data integrity and security constraints, such as constraints on
the values of data and access controls that restrict who can
access the data.
5. Backup and recovery: A DBMS provides mechanisms for backing
up and recovering the data in the event of a system failure.
6. DBMSs can be classified into two types: Relational Database Management Systems (RDBMS) and Non-Relational Database Management Systems (NoSQL or Non-SQL).
7. RDBMS: Data is organized in the form of tables and each table
has a set of rows and columns. The data is related to each other
through primary and foreign keys.
8. NoSQL: Data is organized in the form of key-value pairs,
document, graph, or column-based. These are designed to
handle large-scale, high-performance scenarios.
A database is a collection of interrelated data, organized in the form of tables, views, schemas, reports, etc., which supports the efficient retrieval, insertion, and deletion of data. For example, a university database organizes the data about students, faculty, admin staff, etc., so that this data can be retrieved, inserted, and deleted efficiently.
There are four types of database languages:
1. Data Definition Language (DDL)
2. Data Manipulation Language (DML)
3. Data Control Language (DCL)
4. Transaction Control Language (TCL)
DDL is the short name for Data Definition Language, which deals with database schemas and descriptions of how the data should reside in the database.
• CREATE: create a database and its objects (tables, indexes, views, stored procedures, functions, and triggers)
• ALTER: alters the structure of the existing database
• DROP: delete objects from the database

• TRUNCATE: remove all records from a table, including all space allocated for the records
• COMMENT: add comments to the data dictionary
• RENAME: rename an object
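The DDL commands above can be sketched with Python's built-in sqlite3 module (SQLite's DDL dialect is slightly reduced, and the table and column names here are illustrative, not from the text):

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
cur = conn.cursor()

# CREATE: define a new database object (here, a table)
cur.execute("CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT)")

# ALTER: change the structure of the existing table
cur.execute("ALTER TABLE student ADD COLUMN phone TEXT")
columns = [row[1] for row in cur.execute("PRAGMA table_info(student)")]

# DROP: delete the object (and its data) from the database
cur.execute("DROP TABLE student")
tables = [r[0] for r in cur.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]
```

After the ALTER, `columns` holds `['roll_no', 'name', 'phone']`; after the DROP, `tables` is empty. (SQLite has no TRUNCATE statement; `DELETE FROM t` without a WHERE clause plays that role.)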
DML is the short name for Data Manipulation Language, which deals with data manipulation and includes the most common SQL statements such as SELECT, INSERT, UPDATE, DELETE, etc. It is used to store, modify, retrieve, delete, and update data in a database.
• SELECT: retrieve data from a database
• INSERT: insert data into a table
• UPDATE: updates existing data within a table
• DELETE: delete records from a table (all records if no condition is given)
• MERGE: UPSERT operation (insert or update)
• CALL: call a PL/SQL or Java subprogram
• EXPLAIN PLAN: show the access path the optimizer will use for a query
• LOCK TABLE: control concurrent access to a table
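A minimal sketch of the core DML statements, again using sqlite3 with a hypothetical `student` table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT)")

# INSERT: add rows to the table
cur.executemany("INSERT INTO student VALUES (?, ?)",
                [(1, "Asha"), (2, "Ravi")])

# UPDATE: modify existing data in place
cur.execute("UPDATE student SET name = 'Asha K' WHERE roll_no = 1")

# SELECT: retrieve data
rows = cur.execute("SELECT name FROM student ORDER BY roll_no").fetchall()

# DELETE: remove the rows matching a condition
cur.execute("DELETE FROM student WHERE roll_no = 2")
remaining = cur.execute("SELECT COUNT(*) FROM student").fetchone()[0]
```

Here `rows` comes back as `[('Asha K',), ('Ravi',)]`, and one row survives the DELETE. MERGE, CALL, and EXPLAIN PLAN are Oracle-style commands with no direct SQLite equivalent.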
DCL is short for Data Control Language, which acts as an access specifier to the database (basically to grant and revoke permissions to users in the database).
• GRANT: grant permissions to the user for running DML(SELECT,
INSERT, DELETE,…) commands on the table
• REVOKE: revoke permissions from the user for running DML (SELECT, INSERT, DELETE, …) commands on the specified table
TCL is short for Transaction Control Language, which acts as a manager for all types of transactional data and all transactions. Some of the commands of TCL are:
• ROLLBACK: used to cancel or undo changes made in the database
• COMMIT: used to apply or save changes in the database
• SAVEPOINT: used to save the data on a temporary basis in the database
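The TCL commands can be sketched the same way; the `account` table and amounts are invented for illustration (SQLite accepts COMMIT, ROLLBACK, and SAVEPOINT directly):

```python
import sqlite3

# isolation_level=None puts the connection in autocommit mode, so the
# transaction-control statements below reach SQLite verbatim
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO account VALUES (1, 100)")

conn.execute("BEGIN")                    # start a transaction
conn.execute("SAVEPOINT before_debit")   # SAVEPOINT: temporary marker
conn.execute("UPDATE account SET balance = balance - 150 WHERE id = 1")

# ROLLBACK: undo everything back to the marker (say, an overdraft was detected)
conn.execute("ROLLBACK TO before_debit")
conn.execute("COMMIT")                   # COMMIT: make surviving changes permanent

balance = conn.execute(
    "SELECT balance FROM account WHERE id = 1").fetchone()[0]
```

The debit is undone by the rollback, so `balance` is still 100 after the commit.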
Database Management System: The software which is used to manage databases is called a Database Management System (DBMS). For example, MySQL, Oracle, etc. are popular commercial DBMSs used in different applications. A DBMS allows users to perform the following tasks:
• Data Definition: It helps in the creation, modification, and
removal of definitions that define the organization of data in the
database.
• Data Updation: It helps in the insertion, modification, and
deletion of the actual data in the database.
• Data Retrieval: It helps in the retrieval of data from the database
which can be used by applications for various purposes.
• User Administration: It helps in registering and monitoring
users, enforcing data security, monitoring performance,
maintaining data integrity, dealing with concurrency control,
and recovering information corrupted by unexpected failure.
Paradigm Shift from File System to DBMS
File System manages data using files on a hard disk. Users are allowed to
create, delete, and update the files according to their requirements. Let
us consider the example of file-based University Management System.
Data of students is available to their respective Departments, Academics
Section, Result Section, Accounts Section, Hostel Office, etc. Some of the
data is common for all sections like Roll No, Name, Father Name, Address,
and Phone number of students but some data is available to a particular

section only like Hostel allotment number which is a part of the hostel
office. Let us discuss the issues with this system:
• Redundancy of data: Data is said to be redundant if the same
data is copied at many places. If a student wants to change their
Phone number, he or she has to get it updated in various
sections. Similarly, old records must be deleted from all sections
representing that student.
• Inconsistency of Data: Data is said to be inconsistent if multiple
copies of the same data do not match each other. If the Phone
number is different in Accounts Section and Academics Section,
it will be inconsistent. Inconsistency may be because of typing
errors or not updating all copies of the same data.
• Difficult Data Access: A user should know the exact location of the file to access data, so the process is very cumbersome and tedious. If the user wants to search for the hostel allotment number of one student among 10,000 unsorted student records, it can be very difficult.
• Unauthorized Access: File Systems may lead to unauthorized
access to data. If a student gets access to a file having his marks,
he can change it in an unauthorized way.
• No Concurrent Access: The access of the same data by multiple
users at the same time is known as concurrency. The file system
does not allow concurrency as data can be accessed by only one
user at a time.
• No Backup and Recovery: The file system does not incorporate
any backup and recovery of data if a file is lost or corrupted.

ADVANTAGES AND DISADVANTAGES:

Advantages of using a DBMS:

1. Data organization: A DBMS allows for the organization and storage of data in a structured manner, making it easy to retrieve and query the data as needed.
2. Data integrity: A DBMS provides mechanisms for enforcing data
integrity constraints, such as constraints on the values of data
and access controls that restrict who can access the data.
3. Concurrent access: A DBMS provides mechanisms for controlling
concurrent access to the database, to ensure that multiple users
can access the data without conflicting with each other.
4. Data security: A DBMS provides tools for managing the security
of the data, such as controlling access to the data and encrypting
sensitive data.
5. Backup and recovery: A DBMS provides mechanisms for backing
up and recovering the data in the event of a system failure.
6. Data sharing: A DBMS allows multiple users to access and share
the same data, which can be useful in a collaborative work
environment.

Disadvantages of using a DBMS:

1. Complexity: DBMS can be complex to set up and maintain, requiring specialized knowledge and skills.
2. Performance overhead: The use of a DBMS can add overhead to
the performance of an application, especially in cases where
high levels of concurrency are required.
3. Scalability: The use of a DBMS can limit the scalability of an
application, since it requires the use of locking and other
synchronization mechanisms to ensure data consistency.
4. Cost: The cost of purchasing, maintaining and upgrading a DBMS
can be high, especially for large or complex systems.
5. Limited use cases: Not all use cases are suitable for a DBMS; some solutions don't need high reliability, consistency, or security and may be better served by other types of data storage.
Data Models

A Data Model is the modeling of the data description, data semantics, and consistency constraints of the data. It provides the conceptual tools for describing the design of a database at each level of data abstraction. The following four data models are used for understanding the structure of the database:

1) Relational Data Model: This type of model designs the data in the
form of rows and columns within a table. Thus, a relational model
uses tables for representing data and in-between relationships.
Tables are also called relations. This model was initially described by Edgar F. Codd in 1969. The relational data model is the most widely used model and is primarily used by commercial data processing applications.
2) Entity-Relationship Data Model: An ER model is the logical
representation of data as objects and relationships among them.
These objects are known as entities, and relationship is an
association among these entities. This model was designed by Peter Chen and published in a 1976 paper. It is widely used in database design. A set of attributes describes the entities. For example,
student_name, student_id describes the 'student' entity. A set of the
same type of entities is known as an 'Entity set', and the set of the
same type of relationships is known as 'relationship set'.
3) Object-based Data Model: An extension of the ER model with notions of functions, encapsulation, and object identity. This model supports a rich type system that includes structured and collection types. In the 1980s, various database systems following the object-oriented approach were developed. Here, the objects are data items carrying their own properties.
4) Semistructured Data Model: This type of data model is different
from the other three data models (explained above). The
semistructured data model allows the data specifications at places
where the individual data items of the same type may have different
attribute sets. The Extensible Markup Language, also known as XML, is widely used for representing semistructured data. Although XML was initially designed for including markup information in text documents, it gained importance because of its application in the exchange of data.

Database Languages in DBMS

o A DBMS has appropriate languages and interfaces to express database queries and updates.
o Database languages can be used to read, store, and update the data in the database.

Types of Database Languages


1. Data Definition Language (DDL)
o DDL stands for Data Definition Language. It is used to define the database structure or schema.
o It is used to create schema, tables, indexes, constraints, etc. in the
database.
o Using the DDL statements, you can create the skeleton of the
database.
o Data Definition Language is used to store metadata, such as the number of tables and schemas, their names, indexes, columns in each table, constraints, etc.

Here are some tasks that come under DDL:

o Create: It is used to create objects in the database.
o Alter: It is used to alter the structure of the database.
o Drop: It is used to delete objects from the database.
o Truncate: It is used to remove all records from a table.
o Rename: It is used to rename an object.
o Comment: It is used to add comments to the data dictionary.

These commands are used to update the database schema that's why they
come under Data definition language.

2. Data Manipulation Language (DML)

DML stands for Data Manipulation Language. It is used for accessing and
manipulating data in a database. It handles user requests.

Here are some tasks that come under DML:

o Select: It is used to retrieve data from a database.
o Insert: It is used to insert data into a table.

o Update: It is used to update existing data within a table.
o Delete: It is used to delete records from a table (all records if no condition is given).
o Merge: It performs UPSERT operation, i.e., insert or update
operations.
o Call: It is used to call a PL/SQL or Java subprogram.
o Explain Plan: It shows the access path used to retrieve the data.
o Lock Table: It controls concurrency.

3. Data Control Language (DCL)


o DCL stands for Data Control Language. It is used to control access rights to the stored data.
o The DCL execution is transactional. It also has rollback parameters.

(But in the Oracle database, the execution of Data Control Language does not have the feature of rolling back.)

Here are some tasks that come under DCL:

o Grant: It is used to give user access privileges to a database.
o Revoke: It is used to take back permissions from the user.

The following privileges can be granted or revoked: CONNECT, INSERT, USAGE, EXECUTE, DELETE, UPDATE, and SELECT.

4. Transaction Control Language (TCL)

TCL is used to manage the changes made by DML statements. These statements can be grouped into a logical transaction.

Here are some tasks that come under TCL:

o Commit: It is used to save the transaction on the database.
o Rollback: It is used to restore the database to its state as of the last Commit.

Database Users and Administrators:-


Database Users
Database users are the ones who really use and take the benefits of the
database. There will be different types of users depending on their needs
and way of accessing the database.

1. Application Programmers – They are the developers who interact with the database by means of DML queries. These DML queries are written in application programs in languages like C, C++, Java, Pascal, etc. The queries are converted into object code to communicate with the database. For example, writing a C program to generate a report of employees working in a particular department will involve a query to fetch the data from the database; it will include an embedded SQL query in the C program.
2. Sophisticated Users – They are database developers, who
write SQL queries to select/insert/delete/update data. They do not use any application programs to access the database.
They directly interact with the database by means of a query
language like SQL. These users will be scientists, engineers,
analysts who thoroughly study SQL and DBMS to apply the
concepts in their requirements. In short, we can say this
category includes designers and developers of DBMS and SQL.
3. Specialized Users – These are also sophisticated users, but
they write special database application programs. They are the
developers who develop the complex programs to the
requirement.
4. Stand-alone Users – These users will have a stand-alone
database for their personal use. These kinds of databases come as ready-made packages with menus and graphical interfaces.
5. Naive Users – these are the users who use an existing application to interact with the database. For example, online library systems, ticket booking systems, ATMs, etc. have existing applications, and users use them to interact with the database to fulfill their requests.

Database Administrators
The life cycle of a database starts from designing, implementing to the
administration of it. A database for any kind of requirement needs to be
designed perfectly so that it should work without any issues. Once all the
design is complete, it needs to be installed. Once this step is complete,
users start using the database. The database grows as the data in it grows. When the database becomes huge, its performance degrades, and accessing the data becomes a challenge. There will be unused space in the database, making it unnecessarily large. This administration and maintenance of the database is taken care of by the Database Administrator (DBA).
A DBA has many responsibilities. A well-performing database is in the hands of the DBA.
• Installing and upgrading the DBMS Servers: – DBA is
responsible for installing a new DBMS server for the new
projects. He is also responsible for upgrading these servers as new versions come into the market or as requirements demand. If there is any failure in the upgrade of the existing servers, he should be able to revert the new changes back to the older version, thus keeping the DBMS working.
He is also responsible for updating the service packs/
hotfixes/ patches to the DBMS servers.
• Design and implementation: – Designing and implementing the database is also the DBA's responsibility. He should be able to decide on proper memory management, file organization, error handling, log maintenance, etc. for the database.
• Performance tuning: – Since the database is huge and it will
have lots of tables, data, constraints, and indices, there will be
variations in the performance from time to time. Also, because
of some designing issues or data growth, the database will not
work as expected. It is the responsibility of the DBA to tune the database performance. He is responsible for making sure all the queries and programs run in a fraction of a second.
• Migrate database servers: – Sometimes, users using Oracle would like to shift to SQL Server or Netezza. It is the
responsibility of DBA to make sure that migration happens
without any failure, and there is no data loss.
• Backup and Recovery: – Proper backup and recovery programs need to be developed by the DBA and maintained by him.
This is one of the main responsibilities of DBA. Data/objects
should be backed up regularly so that if there is any crash, it
should be recovered without much effort and data loss.
• Security: – DBA is responsible for creating various database
users and roles, and giving them different levels of access
rights.
• Documentation: – The DBA should properly document all his activities so that if he quits or a new DBA comes in, the new DBA should be able to understand the database without much effort. He should maintain documentation of all his installation, backup, recovery, and security methods. He should keep various reports
about database performance.
In order to perform all these tasks, he should have a very good command of the DBMS.
Types of DBA
There are different kinds of DBA depending on the responsibility that he
owns.

• Administrative DBA – This DBA is mainly concerned with installing and maintaining DBMS servers. His prime tasks are installation, backups, recovery, security, replication, memory management, configuration, and tuning. He is mainly responsible for all administrative tasks of a database.
• Development DBA – He is responsible for creating queries and
procedures for the requirement. Basically, his task is similar to
any database developer.
• Database Architect – Database architect is responsible for
creating and maintaining the users, roles, access rights, tables,
views, constraints, and indexes. He is mainly responsible for
designing the structure of the database depending on the
requirement. These structures will be used by developers and
development DBA to code.
• Data Warehouse DBA – This DBA should be able to maintain the data and procedures from various sources in the data warehouse. These sources can be files, COBOL programs, or any other programs. Here data and programs come from different sources. A good DBA should be able to keep the performance and function levels from these sources at the same pace to make the data warehouse work.
• Application DBA – He acts as a bridge between the application program and the database. He makes sure all the application programs are optimized to interact with the database. He ensures all the activities, from installing, upgrading, patching, and maintaining to backup and recovery, work without any issues.
• OLAP DBA – He is responsible for installing and maintaining
the database in OLAP systems. He maintains only OLAP
databases.

Three schema Architecture:-

o The three schema architecture is also called ANSI/SPARC architecture or three-level architecture.
o This framework is used to describe the structure of a specific
database system.
o The three schema architecture is also used to separate the user
applications and physical database.
o The three schema architecture contains three-levels. It breaks the
database down into three different categories.

The three-schema architecture is as follows:

[Diagram: the three levels of the DBMS architecture (external, conceptual, internal), connected by mappings]

In the diagram:

o It shows the DBMS architecture.
o Mapping is used to transform the request and response between
various database levels of architecture.
o Mapping is not good for small DBMS because it takes more time.
o In External / Conceptual mapping, it is necessary to transform the
request from external level to conceptual schema.
o In Conceptual / Internal mapping, the DBMS transforms the request from the conceptual level to the internal level.

Objectives of Three schema Architecture

The main objective of the three-level architecture is to enable multiple users to access the same data with a personalized view while storing the underlying data only once. Thus it separates the user's view from the physical structure of the database. This separation is desirable for the following reasons:

o Different users need different views of the same data.
o The way in which a particular user needs to see the data may change over time.
o The users of the database should not worry about the physical
implementation and internal workings of the database such as data
compression and encryption techniques, hashing, optimization of
the internal structures etc.
o All users should be able to access the same data according to their
requirements.
o The DBA should be able to change the conceptual structure of the database without affecting the users' views.
o Internal structure of the database should be unaffected by changes
to physical aspects of the storage.

1. Internal Level

o The internal level has an internal schema which describes the
physical storage structure of the database.
o The internal schema is also known as a physical schema.
o It uses the physical data model. It defines how the data will be stored in a block.
o The physical level is used to describe complex low-level data
structures in detail.

The internal level is generally concerned with the following activities:


o Storage space allocations.
For Example: B-Trees, Hashing, etc.
o Access paths.
For Example: Specification of primary and secondary keys, indexes, pointers and sequencing.
o Data compression and encryption techniques.
o Optimization of internal structures.
o Representation of stored fields.
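These internal-level constructs can be glimpsed even from SQL: an index is an access path, and the optimizer's plan shows whether it is used. A small sketch with Python's sqlite3 module (table and index names are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT)")

# Access path: a secondary index (stored as a B-tree in SQLite) on `name`
conn.execute("CREATE INDEX idx_student_name ON student (name)")

# EXPLAIN QUERY PLAN exposes the access path the optimizer chose
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM student WHERE name = 'Asha'"
).fetchall()
uses_index = any("idx_student_name" in row[-1] for row in plan)
```

For an equality match on the indexed column, the plan reports a search using `idx_student_name` rather than a full table scan.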

2. Conceptual Level

o The conceptual schema describes the design of a database at the conceptual level. The conceptual level is also known as the logical level.
o The conceptual schema describes the structure of the whole
database.
o The conceptual level describes what data are to be stored in the
database and also describes what relationship exists among those
data.

o In the conceptual level, internal details such as the implementation of the data structures are hidden.
o Programmers and database administrators work at this level.

3. External Level

o At the external level, a database contains several schemas, sometimes called subschemas. A subschema is used to describe a particular view of the database.
o An external schema is also known as view schema.
o Each view schema describes the part of the database that a particular user group is interested in and hides the remaining database from that user group.
o The view schema describes the end user interaction with database
systems.
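An external schema is exactly what SQL views provide. The sketch below (hypothetical `student` table and hostel-office view) hides the phone and marks columns from one user group:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE student (
    roll_no INTEGER PRIMARY KEY, name TEXT, phone TEXT, marks INTEGER)""")
conn.execute("INSERT INTO student VALUES (1, 'Asha', '555-0101', 92)")

# View schema for one user group: only roll_no and name are exposed;
# phone and marks stay hidden behind the view
conn.execute("CREATE VIEW hostel_view AS SELECT roll_no, name FROM student")

visible = [row[1] for row in conn.execute("PRAGMA table_info(hostel_view)")]
row = conn.execute("SELECT * FROM hostel_view").fetchone()
```

Queries against `hostel_view` see only the two permitted columns; the underlying table is stored once, while each user group gets its own view of it.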

What is Data Independence in DBMS?


Data independence is the ability to modify the schema at one level without requiring the programs and applications to be rewritten. Data is separated from the programs, so changes made to the data will not affect program execution or the application.
We know the main purpose of the three levels of data abstraction is to
achieve data independence. If the database changes and expands over
time, it is very important that the changes in one level should not affect
the data at other levels of the database. This would save time and cost
required when changing the database.
There are two levels of data independence based on three levels of
abstraction. These are as follows −
• Physical Data Independence
• Logical Data Independence
Physical Data Independence
Physical Data Independence means changing the physical level without
affecting the logical level or conceptual level. Using this property, we can
change the storage device of the database without affecting the logical
schema.
The changes in the physical level may include changes using the following

• A new storage device like magnetic tape, hard disk, etc.
• A new data structure for storage.
• A different data access method or using an alternative files
organization technique.
• Changing the location of the database.

Logical Data Independence
Logical view of data is the user view of the data. It presents data in the
form that can be accessed by the end users.
Codd’s Rule of Logical Data Independence says that users should be able
to manipulate the Logical View of data without any information of its
physical storage. Software or the computer program is used to manipulate
the logical view of the data.
Database administrator is the one who decides what information is to be
kept in the database and how to use the logical level of abstraction. It
provides the global view of Data. It also describes what data is to be stored
in the database along with the relationship.
Logical data independence presents the database to users in a simple structure. It is based on application-domain entities and provides an abstraction over the system's functional requirements. The static structure of the logical view is defined in class or object diagrams. Users cannot directly manipulate the physical structure underlying the logical view.
The changes in the logical level may include −
• Change the data definition.
• Adding, deleting, or updating any new attribute, entity or
relationship in the database

E.F. Codd’s 12 Rules for RDBMS



A Database Management System (DBMS) essentially consists of a comprehensive set of application programs that can be leveraged to access, manage, and update data, provided the data is interrelated and persistent. Just like any management system, the goal of a DBMS is to provide an efficient and convenient environment in which it becomes easy to retrieve and store information in the database. It goes without mentioning that databases are used to store and manage large amounts of information.
To achieve this, the following are the absolute must-haves:

• Data Modeling − It is all about defining the structures for information storage.
• Provision of Mechanisms − To manipulate processed data and
modify file and system structures, it is important to provide
query processing mechanisms.
• Crash Recovery and Security − To avoid any discrepancies and ensure that the data is secure, crash recovery and security mechanisms are a must.
• Concurrency Control − If the system is shared by multiple
users, concurrency control is the need of the hour.

Dr Edgar F Codd
Dr E.F. Codd, also known to the world as the 'Father of Database Management Systems', propounded 12 rules which are in fact 13 in number, as they are numbered from zero to twelve. According to him, a DBMS is fully relational if it abides by all twelve rules. Till now, only a few databases abide by all twelve rules. His twelve rules are fondly called 'E.F. Codd's Twelve Commandments'. His brilliant and seminal research paper 'A Relational Model of Data for Large Shared Data Banks' is, in its entirety, a treat to read.
Relational Database Management System
There is an unspoken rule in the jargon of Database Management Systems. As the databases that implement all of E.F. Codd's rules are scarce, the unspoken rule has been gaining traction.

• If a management system or software follows 5-6 of the rules proposed by E.F. Codd, it qualifies to be a Database Management System (DBMS).
• If a management system or software follows 7-9 of the rules proposed by E.F. Codd, it qualifies to be a semi-Relational Database Management System (semi-RDBMS).
• If a management system or software follows 9-12 rules
proposed by E.F. Codd, it qualifies to be a complete Relational
Database Management System (RDBMS).
Dr Edgar F Codd’s Twelve Commandments
Here is brief note on E.F Codd’s Twelve rules:
Rule 0 − Foundation rule
Any relational database management system that is propounded to be
RDBMS or advocated to be a RDBMS should be able to manage the stored
data in its entirety through its relational capabilities.
Rule 1 − Rule of Information
Relational Databases should store the data in the form of relations. Tables
are relations in Relational Database Management Systems. Be it any user
defined data or meta-data, it is important to store the value as an entity
in the table cells.
Rule 2 − Rule of Guaranteed Access
The use of pointers to access data logically is strictly forbidden. Every
data entity which is atomic in nature should be accessible logically by using the right combination of the table name, the primary key value identifying a specific row, and the column name identifying the attribute.
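Rule 2 in miniature: a value is addressed purely by table name + column name + primary-key value, with no pointers involved (the table and values below are invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO student VALUES (42, 'Asha')")

# table name (student) + column name (name) + primary key value (42)
# is enough to reach any atomic value; no physical pointer is ever used
value = conn.execute(
    "SELECT name FROM student WHERE roll_no = 42").fetchone()[0]
```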
Rule 3 − Rule of Systematic Null Value Support
Null values are completely supported in relational databases. They should
be uniformly considered as ‘missing information’. Null values are
independent of any data type. They should not be mistaken for blanks or
zeroes or empty strings. Null values can also be interpreted as
‘inapplicable data’ or ‘unknown information.’
Rule 4 − Rule of Active and online relational Catalog

In the Database Management Systems lexicon, ‘metadata’ is the data about
the database or the data about the data. The active online catalog that
stores the metadata is called the 'Data dictionary'. The so-called data dictionary is accessible only to authorized users who have the required privileges, and the same query language used for accessing the database should be used for accessing the data dictionary.
Rule 5 − Rule of Comprehensive Data Sub-language
A single robust language should be able to define integrity constraints, views, data manipulations, transactions and authorizations. If the database allows access to these only through means outside this language, it is violating this rule.
Rule 6 − Rule of Updating Views
Views should reflect the updates of their respective base tables and vice
versa. A view is a logical table which shows restricted data. Views
generally make the data readable but not modifiable. Views help in data
abstraction.
Rule 7 − Rule of Set-level insertion, update and deletion
A single operation should be sufficient to retrieve, insert, update or delete data at the level of sets of rows, not just one row at a time.
Rule 8 − Rule of Physical Data Independence
Batch and end user operations are logically separated from physical
storage and respective access methods.
Rule 9 − Rule of Logical Data Independence
Batch and end users can change the database schema without having to
recreate it or recreate the applications built upon it.
Rule 10 − Rule of Integrity Independence
Integrity constraints should be available and stored as metadata in data
dictionary and not in the application programs.
Rule 11 − Rule of Distribution Independence
The Data Manipulation Language of the relational system should not be
concerned about the physical data storage and no alterations should be
required if the physical data is centralized or distributed.
Rule 12 − Rule of Non Subversion
Any row should obey the security and integrity constraints imposed. No
special privileges are applicable.
Almost all full-scale DBMSs are RDBMSs. Oracle implements 11+ of these
rules, and so does Sybase. SQL Server also implements 11+ rules, while
FoxPro implements 7+ rules.
ER Model:-
ER Diagram:
• The Entity Relationship model is a model for identifying the
entities to be represented in the database and representing how
those entities are related.
• The ER data model specifies enterprise schema that represents
the overall logical structure of a database graphically.

• E-R diagrams are used to model real-world objects like a person,
a car, a company and the relation between these real-world
objects.
Features of ER model
i) E-R diagrams are used to represent E-R model in a database, which
makes them easy to be converted into relations (tables).
ii) E-R diagrams serve the purpose of modeling real-world objects,
which makes them highly useful in practice.
iii) E-R diagrams require no technical knowledge and no hardware
support.
iv) These diagrams are very easy to understand and easy to create even
by a naive user.
v) They give a standard way of visualizing the data logically.
ER Model is used to model the logical view of the system from a data
perspective which consists of these components:

Components of the E-R Model


Entity, Entity Type, Entity Set –
An Entity may be an object with a physical existence – a particular person,
car, house, or employee – or it may be an object with a conceptual
existence – a company, a job, or a university course.
An entity is an instance of an entity type, and the set of all entities of a
type is called an entity set. For example, E1 is an entity of entity type
Student, and the set of all students is the entity set. In an ER diagram, an
entity type is represented by a rectangle.

Attribute(s):
Attributes are the properties that define the entity type. For example,
Roll_No, Name, DOB, Age, Address, Mobile_No are the attributes that
define entity type Student. In ER diagram, the attribute is represented by
an oval.

1. Key Attribute –
The attribute which uniquely identifies each entity in the entity set is
called the key attribute. For example, Roll_No will be unique for each
student. In an ER diagram, a key attribute is represented by an oval with
the attribute name underlined.

2. Composite Attribute –
An attribute composed of many other attributes is called a composite
attribute. For example, the Address attribute of the Student entity type
consists of Street, City, State, and Country. In an ER diagram, a composite
attribute is represented by an oval comprising other ovals.

3. Multivalued Attribute –
An attribute consisting of more than one value for a given entity. For
example, Phone_No (a student can have more than one). In an ER
diagram, a multivalued attribute is represented by a double oval.

4. Derived Attribute –
An attribute that can be derived from other attributes of the entity type
is known as a derived attribute. e.g.; Age (can be derived from DOB). In
ER diagram, the derived attribute is represented by a dashed oval.

The complete entity type Student with its attributes can be represented
as:

Relationship Type and Relationship Set:


A relationship type represents the association between entity types. For
example,‘Enrolled in’ is a relationship type that exists between entity type
Student and Course. In ER diagram, the relationship type is represented
by a diamond and connecting the entities with lines.

A set of relationships of the same type is known as a relationship set. The
following relationship set depicts that S1 is enrolled in C2, S2 in C1, and
S3 in C3.

Degree of a relationship set:


The number of different entity sets participating in a relationship set is
called as the degree of a relationship set.
1. Unary Relationship –
When only ONE entity set participates in a relationship, the relationship
is called a unary relationship. For example, one person is married to one
other person, with both participants drawn from the same Person entity
set.

2. Binary Relationship –
When TWO entity sets participate in a relationship, the relationship is
called a binary relationship. For example, a Student is enrolled in a
Course.

3. n-ary Relationship –
When n entity sets participate in a relationship, the relationship is called
an n-ary relationship.
Cardinality:

The number of times an entity of an entity set participates in a
relationship set is known as cardinality. Cardinality can be of different
types:
1. One-to-one – When each entity in each entity set can take part only
once in the relationship, the cardinality is one-to-one. Let us assume that
a male can marry one female and a female can marry one male. So the
relationship will be one-to-one.
The total number of tables that can be used in this case is 2.

Using Sets, it can be represented as:

2. Many to one – When entities in one entity set can take part only once
in the relationship set and entities in other entity sets can take part
more than once in the relationship set, cardinality is many to one. Let
us assume that a student can take only one course but one course can be
taken by many students. So the cardinality will be n to 1. It means that
for one course there can be n students but for one student, there will be
only one course.
The total number of tables that can be used in this is 3.

Using Sets, it can be represented as:

In this case, each student is taking only 1 course but 1 course has been
taken by many students.
3. Many to many – When entities in all entity sets can take part more
than once in the relationship, the cardinality is many-to-many. Let us
assume that a student can take more than one course and one course can
be taken by many students, so the relationship will be many-to-many.
The total number of tables that can be used in this case is 3.

Using sets, it can be represented as:

In this example, student S1 is enrolled in C1 and C3, and course C3 is
taken by S1, S3, and S4, so it is a many-to-many relationship.
There is also a one-to-many mapping, in which each entity on one side
can be related to many entities on the other; the total number of tables
that can be used in that case is 2.
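As a sketch of how these cardinalities translate into tables, the many-to-many case above is usually implemented with a third "junction" table holding one row per (student, course) pair. The SQLite schema below (via Python's sqlite3) is an illustrative assumption; the document itself gives no DDL, and the table and column names are borrowed from the Student/Course example:

```python
import sqlite3

# In-memory database; schema names (Student, Course, Enrolled) are illustrative.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Student (Stu_Id INTEGER PRIMARY KEY, Stu_Name TEXT);
CREATE TABLE Course  (Cou_Id INTEGER PRIMARY KEY, Cou_Name TEXT);
-- Junction table: one row per (student, course) pair of the m:n relationship.
CREATE TABLE Enrolled (
    Stu_Id INTEGER REFERENCES Student(Stu_Id),
    Cou_Id INTEGER REFERENCES Course(Cou_Id),
    PRIMARY KEY (Stu_Id, Cou_Id)
);
""")
con.executemany("INSERT INTO Student VALUES (?, ?)", [(1, "S1"), (2, "S2")])
con.executemany("INSERT INTO Course VALUES (?, ?)", [(1, "C1"), (2, "C2")])
# S1 takes two courses and C1 is taken by two students: many-to-many.
con.executemany("INSERT INTO Enrolled VALUES (?, ?)", [(1, 1), (1, 2), (2, 1)])
rows = con.execute("SELECT COUNT(*) FROM Enrolled").fetchone()[0]
print(rows)  # 3 enrollment pairs
```

For one-to-one or many-to-one cardinalities, the junction table can instead be folded into one of the entity tables as a foreign key column.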
Participation Constraint:
Participation Constraint is applied to the entity participating in the
relationship set.
1. Total Participation – Each entity in the entity set must participate in
the relationship. If each student must enroll in a course, the participation

of students will be total. Total participation is shown by a double line in
the ER diagram.
2. Partial Participation – An entity in the entity set may or may NOT
participate in the relationship. If some courses are not taken by any
student, the participation of Course will be partial.
The diagram depicts the ‘Enrolled in’ relationship set with Student Entity
set having total participation and Course Entity set having partial
participation.

Using set, it can be represented as,

Every student in the Student Entity set is participating in a relationship


but there exists a course C4 that is not taking part in the relationship.
Weak Entity Type and Identifying Relationship:
As discussed before, an entity type has a key attribute that uniquely
identifies each entity in the entity set. But there exists some entity type
for which key attributes can’t be defined. These are called Weak Entity
types.
For example, a company may store the information of dependents
(Parents, Children, Spouse) of an Employee, but the dependents do not
exist without the employee. So Dependent will be a weak entity type and
Employee will be the identifying entity type for Dependent.
A weak entity type is represented by a double rectangle. The participation
of weak entity types is always total. The relationship between the weak
entity type and its identifying strong entity type is called identifying
relationship and it is represented by a double diamond.

ER Design Issues:-

In the previous sections on data modeling, we learned to design an ER
diagram. We also discussed different ways of defining entity sets and
relationships among them, and the various shapes that represent a
relationship, an entity, and its attributes. However, users often
misunderstand these elements and the design process of the ER diagram.
This leads to a complex structure of the ER diagram and to designs that
do not meet the characteristics of the real-world enterprise being
modelled.

Here, we will discuss the basic design issues of an ER database schema in
the following points:

1) Use of Entity Set vs Attributes

The use of an entity set or an attribute depends on the structure of the
real-world enterprise being modelled and the semantics associated with
its attributes. It is a mistake to use the primary key of one entity set as
an attribute of another entity set; a relationship should be used instead.
Also, primary-key attributes are already implicit in a relationship set, so
they should not be designated again within it.

2) Use of Entity Set vs. Relationship Sets


It is sometimes difficult to decide whether an object is best expressed by
an entity set or a relationship set. To make the right choice, the user
should designate a relationship set to describe an action that occurs
between entities. If the object is best represented as a relationship set, it
is better not to model it as an entity set as well.

3) Use of Binary vs n-ary Relationship Sets


Generally, the relationships described in databases are binary
relationships. However, non-binary relationships can be represented by
several binary relationships. For example, we can create and represent a
ternary relationship 'parent' that relates a child to both father and
mother. Such a relationship can also be represented by two binary
relationships, 'mother' and 'father', each relating a parent to the child.
Thus, it is possible to represent a non-binary relationship by a set of
distinct binary relationships.

4) Placing Relationship Attributes

The cardinality ratios can be an effective guide for the placement of
relationship attributes. It is better to associate the attributes of
one-to-one or one-to-many relationship sets with one of the participating
entity sets, instead of with the relationship set. The decision of placing
a given attribute on a relationship or on an entity should reflect the
characteristics of the real-world enterprise being modelled.

For example, an attribute whose value is determined by the combination
of participating entity sets, rather than by any single entity, must be
associated with the many-to-many relationship set itself instead of being
modelled as a separate entity.

Thus, it requires overall knowledge of each part involved in designing
and modelling an ER diagram. The basic requirement is to analyse the
real-world enterprise and the connectivity of one entity or attribute with
another.

Keys:-

o Keys play an important role in the relational database.
o They are used to uniquely identify any record or row of data in a
table, and also to establish and identify relationships between
tables.

For example, ID is used as a key in the Student table because it is unique
for each student. In the PERSON table, passport_number, license_number,
and SSN are keys since they are unique for each person.

Types of keys:
1. Primary key
o It is the key used to identify one and only one instance of an
entity uniquely. An entity can contain multiple candidate keys, as
we saw in the PERSON table; the most suitable one becomes the
primary key.
o In the EMPLOYEE table, ID can be the primary key since it is unique
for each employee. In the EMPLOYEE table, we can even select
License_Number and Passport_Number as primary keys since they
are also unique.
o For each entity, the primary key selection is based on requirements
and developers.
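The uniqueness guarantee of a primary key can be sketched with SQLite through Python's sqlite3 (the document names no specific SQL dialect, so this is an illustrative choice; the EMPLOYEE table here is a minimal stand-in for the one described above):

```python
import sqlite3

# A minimal sketch: declaring ID as PRIMARY KEY makes duplicates an error.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE EMPLOYEE (ID INTEGER PRIMARY KEY, Name TEXT)")
con.execute("INSERT INTO EMPLOYEE VALUES (1, 'Ram')")
try:
    con.execute("INSERT INTO EMPLOYEE VALUES (1, 'Shyam')")  # duplicate ID
    duplicate_rejected = False
except sqlite3.IntegrityError:
    # The DBMS enforces the key: the second row with ID=1 is refused.
    duplicate_rejected = True
print(duplicate_rejected)
```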

2. Candidate key
o A candidate key is an attribute or set of attributes that can uniquely
identify a tuple.
o Apart from the one chosen as the primary key, the remaining
unique attribute sets are also considered candidate keys. A
candidate key is as strong as the primary key.

For example: In the EMPLOYEE table, id is best suited for the primary key.
The rest of the attributes, like SSN, Passport_Number, License_Number,
etc., are considered a candidate key.

3. Super Key

Super key is an attribute set that can uniquely identify a tuple. A super key
is a superset of a candidate key.

For example: In the above EMPLOYEE table, for (EMPLOYEE_ID,
EMPLOYEE_NAME), the names of two employees can be the same, but their
EMPLOYEE_ID can't be the same. Hence, this combination can also be a key.
The super keys would therefore include EMPLOYEE_ID itself, (EMPLOYEE_ID,
EMPLOYEE_NAME), etc.

4. Foreign key
o Foreign keys are the column of the table used to point to the primary
key of another table.
o Every employee works in a specific department in a company, and
employee and department are two different entities. So we can't
store the department's information in the employee table. That's why
we link these two tables through the primary key of one table.
o We add the primary key of the DEPARTMENT table, Department_Id,
as a new attribute in the EMPLOYEE table.
o In the EMPLOYEE table, Department_Id is the foreign key, and both
the tables are related.
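The EMPLOYEE/DEPARTMENT link described above can be sketched in SQLite (an illustrative choice; note that SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`, which is a SQLite-specific detail, not something the text mandates):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")  # SQLite requires this opt-in
con.executescript("""
CREATE TABLE DEPARTMENT (Department_Id INTEGER PRIMARY KEY, Dept_Name TEXT);
CREATE TABLE EMPLOYEE (
    ID INTEGER PRIMARY KEY,
    Name TEXT,
    Department_Id INTEGER REFERENCES DEPARTMENT(Department_Id)
);
""")
con.execute("INSERT INTO DEPARTMENT VALUES (10, 'Sales')")
con.execute("INSERT INTO EMPLOYEE VALUES (1, 'Ram', 10)")  # valid reference
try:
    # Department 99 does not exist, so the foreign key rejects this row.
    con.execute("INSERT INTO EMPLOYEE VALUES (2, 'Shyam', 99)")
    fk_enforced = False
except sqlite3.IntegrityError:
    fk_enforced = True
print(fk_enforced)
```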

5. Alternate key

There may be one or more attributes, or combinations of attributes, that
uniquely identify each tuple in a relation. These attributes or
combinations of attributes are called the candidate keys. One key is
chosen as the primary key from these candidate keys, and the remaining
candidate keys, if any, are termed alternate keys. In other words, the
total number of alternate keys is the total number of candidate keys
minus the primary key. An alternate key may or may not exist: if there is
only one candidate key in a relation, there is no alternate key.

For example, an employee relation has two attributes, Employee_Id and
PAN_No, that act as candidate keys. In this relation, Employee_Id is chosen
as the primary key, so the other candidate key, PAN_No, acts as the
alternate key.

6. Composite key

Whenever a primary key consists of more than one attribute, it is known
as a composite key. This key is also known as a concatenated key.

For example, in an employee relation, assume that an employee may be
assigned multiple roles and may work on multiple projects
simultaneously. The primary key will then be composed of three
attributes in combination: Emp_ID, Emp_role, and Proj_ID. These
attributes act as a composite key, since the primary key comprises more
than one attribute.
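The composite key above can be sketched in SQLite (an illustrative dialect choice): the PRIMARY KEY clause spans all three attributes, so only the full combination must be unique.

```python
import sqlite3

# Sketch of the composite key described above: the primary key spans
# Emp_ID, Emp_role and Proj_ID together (names taken from the example).
con = sqlite3.connect(":memory:")
con.execute("""
CREATE TABLE EMP_PROJECT (
    Emp_ID   INTEGER,
    Emp_role TEXT,
    Proj_ID  INTEGER,
    PRIMARY KEY (Emp_ID, Emp_role, Proj_ID)
)""")
# Same employee, same role, different projects: allowed.
con.execute("INSERT INTO EMP_PROJECT VALUES (1, 'dev', 100)")
con.execute("INSERT INTO EMP_PROJECT VALUES (1, 'dev', 101)")
try:
    # Repeating the full (Emp_ID, Emp_role, Proj_ID) triple is rejected.
    con.execute("INSERT INTO EMP_PROJECT VALUES (1, 'dev', 100)")
    composite_unique = False
except sqlite3.IntegrityError:
    composite_unique = True
print(composite_unique)
```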

7. Artificial key

Keys created using arbitrarily assigned data are known as artificial keys.
These keys are created when the natural primary key is large and complex
and has no relationship with many other relations. The data values of
artificial keys are usually numbered in serial order.

For example, the primary key composed of Emp_ID, Emp_role, and
Proj_ID in the employee relation is large. So it would be better to add a
new virtual attribute to identify each tuple in the relation uniquely.

ER Diagram:-

An Entity Relationship Diagram (ER Diagram) pictorially explains the
relationship between entities to be stored in a database. Fundamentally,
the ER Diagram is a structural design of the database. It acts as a
framework created with specialized symbols for the purpose of defining
the relationship between the database entities. An ER diagram is created
based on three principal components: entities, attributes, and
relationships.

The following diagram showcases two entities - Student and Course, and
their relationship. The relationship described between student and course
is many-to-many, as a course can be opted by several students, and a
student can opt for more than one course. Student entity possesses
attributes - Stu_Id, Stu_Name & Stu_Age. The course entity has attributes
such as Cou_ID & Cou_Name.

Weak Entity Set in ER diagrams:-
An entity type should have a key attribute which uniquely identifies each
entity in the entity set, but there exists some entity type for which key
attribute can’t be defined. These are called Weak Entity type.
The entity sets which do not have sufficient attributes to form a primary
key are known as weak entity sets and the entity sets which have a
primary key are known as strong entity sets.
As weak entities do not have a primary key, they cannot be identified on
their own, so they depend on some other entity (known as the owner
entity). Weak entities have a total participation constraint (existence
dependency) in their identifying relationship with the owner entity.
Weak entity types have partial keys. Partial keys are sets of attributes
with the help of which the tuples of the weak entities can be
distinguished and identified.
Note – A weak entity always has total participation, but a strong entity
may not have total participation.
A weak entity depends on a strong entity for its existence. Unlike a
strong entity, a weak entity does not have a primary key; it has a partial
discriminator key. A weak entity is represented by a double rectangle,
and the relationship between a strong and a weak entity is represented
by a double diamond.

Weak entities are represented with a double rectangular box in the ER
diagram, and identifying relationships are represented with a double
diamond. Partial key attributes are represented with a dashed underline.

Example-1:
In the below ER Diagram, ‘Payment’ is the weak entity. ‘Loan Payment’ is
the identifying relationship and ‘Payment Number’ is the partial key.
Primary Key of the Loan along with the partial key would be used to
identify the records.
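Example-1 translates naturally into a table whose primary key combines the owner's key with the partial key. The SQLite sketch below is an assumption about the schema (the document only names Loan, Payment and Payment Number), but it shows why the partial key alone is not enough: Payment Number 1 repeats across loans yet never within one loan.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")
con.executescript("""
CREATE TABLE Loan (Loan_No INTEGER PRIMARY KEY, Amount INTEGER);
CREATE TABLE Payment (
    Loan_No        INTEGER REFERENCES Loan(Loan_No),
    Payment_Number INTEGER,            -- partial key, unique only per loan
    Amount         INTEGER,
    PRIMARY KEY (Loan_No, Payment_Number)
);
""")
con.execute("INSERT INTO Loan VALUES (17, 1000)")
con.execute("INSERT INTO Loan VALUES (23, 2000)")
# Payment_Number 1 appears under both loans, which the composite key allows.
con.executemany("INSERT INTO Payment VALUES (?, ?, ?)",
                [(17, 1, 500), (17, 2, 500), (23, 1, 2000)])
n = con.execute("SELECT COUNT(*) FROM Payment").fetchone()[0]
print(n)  # 3 payment rows
```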

Example-2:
The existence of rooms is entirely dependent on the existence of a hotel.
So room can be seen as the weak entity of the hotel.
Example-3:
The bank account of a particular bank has no existence if the bank doesn’t
exist anymore.
Example-4:
A company may store the information of dependents (Parents, Children,
Spouse) of an Employee. But the dependents don’t have existence without
the employee. So Dependent will be weak entity type and Employee will be
Identifying Entity type for Dependent.
Other examples:

Strong entity | Weak entity
Order | Order Item
Employee | Dependent
Class | Section
Host | Logins

Extended or Enhanced ER Model in DBMS:-

The Extended ER model is a high-level data model that incorporates
extensions to the original ER model. Enhanced ER models are high-level
models that represent the requirements and complexities of complex
databases.
The extended Entity Relationship (ER) model provides three main
extensions, given below −
• Aggregation
• Specialization
• Generalization
Specialization
The process of designing sub-groupings within an entity set is called
specialization. It is a top-down process. If the instances of an entity set
can be differentiated according to the value of some attribute, then
sub-classes (sub-entity sets) can be formed based on that attribute.
Example
Specialization of a person allows us to distinguish a person according to
whether they are employees or customers. Specialization of account
creates two entity sets: savings account and current account.
In the E-R diagram, specialization is represented by a triangle component
labeled ISA. The ISA relationship is referred to as a superclass-subclass
relationship, as shown below −

Generalization
It is the reverse process of specialization. It is a bottom-up approach.
It converts subclasses to superclasses. This process combines a number
of entity sets that share the same features into higher-level entity sets.
If the sub-class information is given for the given entity set then, ISA
relationship type will be used to represent the connectivity between the
subclass and superclass as shown below −
Example

Aggregation
It is an abstraction in which relationship sets are treated as higher level
entity sets and can participate in relationships. Aggregation allows us to
indicate that a relationship set participates in another relationship set.
Aggregation is used to simplify the details of a given database, for
example where a ternary relationship is changed into binary
relationships. A ternary relationship is simply one kind of relationship,
involving three entities.
Aggregation is shown in the image below −

Relational Model in DBMS:-
The relational model was proposed by E.F. Codd to model data in the
form of relations or tables. After designing the conceptual model of
the database using an ER diagram, we need to convert the conceptual
model into a relational model, which can be implemented using any
RDBMS, such as Oracle or MySQL. So we will see what the relational
model is.
The relational model represents how data is stored in Relational
Databases. A relational database stores data in the form of relations
(tables). Consider a relation STUDENT with attributes ROLL_NO, NAME,
ADDRESS, PHONE, and AGE shown in Table 1.
STUDENT
ROLL_NO NAME ADDRESS PHONE AGE
1 RAM DELHI 9455123451 18
2 RAMESH GURGAON 9652431543 18
3 SUJIT ROHTAK 9156253131 20
4 SURESH DELHI 18
IMPORTANT TERMINOLOGIES
• Attribute: Attributes are the properties that define a relation.
e.g.; ROLL_NO, NAME
• Relation Schema: A relation schema represents the name of the
relation with its attributes. e.g.; STUDENT (ROLL_NO, NAME,
ADDRESS, PHONE, and AGE) is the relation schema for STUDENT.
If a schema has more than 1 relation, it is called Relational
Schema.
• Tuple: Each row in the relation is known as a tuple. The above
relation contains 4 tuples, one of which is shown as:
1 RAM DELHI 9455123451 18
• Relation Instance: The set of tuples of a relation at a particular
instance of time is called a relation instance. Table 1 shows the
relation instance of STUDENT at a particular time. It can change
whenever there is an insertion, deletion, or update in the
database.

• Degree: The number of attributes in the relation is known as the
degree of the relation. The STUDENT relation defined above has
degree 5.
• Cardinality: The number of tuples in a relation is known as
cardinality. The STUDENT relation defined above has cardinality
4.
• Column: The column represents the set of values for a particular
attribute. The column ROLL_NO is extracted from the relation
STUDENT.
ROLL_NO
1
2
3
4
• NULL Values: The value which is not known or unavailable is
called a NULL value. It is represented by blank space. e.g.; PHONE
of STUDENT having ROLL_NO 4 is NULL.
Constraints in Relational Model
While designing the relational model, we define some conditions which
must hold for the data present in the database; these are called
constraints. These constraints are checked before performing any
operation (insertion, deletion, or updation) in the database. If any
constraint is violated, the operation fails.
Domain Constraints: These are attribute-level constraints. An attribute
can only take values that lie inside its domain range. For example, if a
constraint AGE > 0 is applied to the STUDENT relation, inserting a
negative value of AGE will result in failure.
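The AGE > 0 domain constraint can be expressed as a CHECK clause; the SQLite sketch below (an illustrative dialect choice, with a minimal STUDENT table) shows the insert of a negative age failing exactly as described:

```python
import sqlite3

# The AGE > 0 domain constraint from the text, expressed as a CHECK clause.
con = sqlite3.connect(":memory:")
con.execute("""
CREATE TABLE STUDENT (
    ROLL_NO INTEGER PRIMARY KEY,
    NAME    TEXT,
    AGE     INTEGER CHECK (AGE > 0)
)""")
con.execute("INSERT INTO STUDENT VALUES (1, 'RAM', 18)")  # passes the check
try:
    con.execute("INSERT INTO STUDENT VALUES (2, 'RAMESH', -5)")  # violates it
    check_enforced = False
except sqlite3.IntegrityError:
    check_enforced = True
print(check_enforced)
```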
Key Integrity: Every relation in the database should have at least one set
of attributes that defines a tuple uniquely. Such a set of attributes is
called a key. For example, ROLL_NO in STUDENT is a key: no two students
can have the same roll number. So a key has two properties:
• It should be unique for all tuples.
• It can’t have NULL values.
Referential Integrity: When one attribute of a relation can only take
values from another attribute of the same relation or any other relation,
it is called referential integrity. Let us suppose we have 2 relations
STUDENT
ROLL_NO NAME ADDRESS PHONE AGE BRANCH_CODE
1 RAM DELHI 9455123451 18 CS
2 RAMESH GURGAON 9652431543 18 CS
3 SUJIT ROHTAK 9156253131 20 ECE
4 SURESH DELHI 18 IT
BRANCH
BRANCH_CODE BRANCH_NAME
CS COMPUTER SCIENCE
IT INFORMATION TECHNOLOGY
ECE ELECTRONICS AND COMMUNICATION ENGINEERING
CV CIVIL ENGINEERING
BRANCH_CODE of STUDENT can only take the values which are present in
BRANCH_CODE of BRANCH which is called referential integrity constraint.
The relation which is referencing another relation is called REFERENCING
RELATION (STUDENT in this case) and the relation to which other relations

refer is called REFERENCED RELATION (BRANCH in this case).

ANOMALIES
An anomaly is an irregularity or something which deviates from the
expected or normal state. When designing databases, we identify three
types of anomalies: Insert, Update and Delete.
Insertion Anomaly in Referencing Relation:
We can’t insert a row in REFERENCING RELATION if referencing attribute’s
value is not present in the referenced attribute value. e.g.; Insertion of a
student with BRANCH_CODE ‘ME’ in STUDENT relation will result in an
error because ‘ME’ is not present in BRANCH_CODE of BRANCH.
Deletion/Updation Anomaly in Referenced Relation:
We can't delete or update a row of the REFERENCED RELATION if the value
of the REFERENCED ATTRIBUTE is used by a value of the REFERENCING
ATTRIBUTE. e.g., if we try to delete a tuple from BRANCH having
BRANCH_CODE 'CS', it will result in an error because 'CS' is referenced by
BRANCH_CODE of STUDENT; but if we try to delete the row from BRANCH
with BRANCH_CODE 'CV', it will be deleted because that value is not used
by the referencing relation. This can be handled by the following methods:
ON DELETE CASCADE: It deletes the tuples from the REFERENCING
RELATION whose REFERENCING ATTRIBUTE value is deleted from the
REFERENCED RELATION. e.g., if we delete the row from BRANCH with
BRANCH_CODE 'CS', the rows in the STUDENT relation with BRANCH_CODE
'CS' (ROLL_NO 1 and 2 in this case) will be deleted.
ON UPDATE CASCADE: It updates the REFERENCING ATTRIBUTE in the
REFERENCING RELATION when the attribute value is updated in the
REFERENCED RELATION. e.g., if we update the row in BRANCH with
BRANCH_CODE 'CS' to 'CSE', the rows in the STUDENT relation with
BRANCH_CODE 'CS' (ROLL_NO 1 and 2 in this case) will be updated to
BRANCH_CODE 'CSE'.
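The ON DELETE CASCADE behaviour above can be demonstrated with SQLite via Python's sqlite3 (an illustrative dialect choice; the BRANCH/STUDENT tables are a trimmed version of the ones in the text, and SQLite needs foreign keys switched on explicitly):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")
con.executescript("""
CREATE TABLE BRANCH (BRANCH_CODE TEXT PRIMARY KEY, BRANCH_NAME TEXT);
CREATE TABLE STUDENT (
    ROLL_NO INTEGER PRIMARY KEY,
    NAME TEXT,
    BRANCH_CODE TEXT REFERENCES BRANCH(BRANCH_CODE) ON DELETE CASCADE
);
""")
con.executemany("INSERT INTO BRANCH VALUES (?, ?)",
                [("CS", "COMPUTER SCIENCE"), ("ECE", "ELECTRONICS")])
con.executemany("INSERT INTO STUDENT VALUES (?, ?, ?)",
                [(1, "RAM", "CS"), (2, "RAMESH", "CS"), (3, "SUJIT", "ECE")])
# Deleting the CS branch cascades: both CS students disappear with it.
con.execute("DELETE FROM BRANCH WHERE BRANCH_CODE = 'CS'")
remaining = [r[0] for r in con.execute("SELECT NAME FROM STUDENT")]
print(remaining)  # only the ECE student survives the cascade
```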
SUPER KEYS:
Any set of attributes that allows us to identify unique rows (tuples) in a
given relation is known as a super key. Among these super keys, the
minimal ones can serve as candidate keys, and one of them is chosen as
the primary key. If a combination of two or more attributes is used as the
primary key, we call it a composite key.

Advantages:
• Simple model
• It is Flexible
• It is Secure
• Data accuracy
• Data integrity
• Operations can be applied easily
Disadvantages:
• Not well suited for very large databases
• Relationships between tables can sometimes become difficult to
manage
Characteristics of Relational Model:
• Data is represented in rows and columns, and each table is called
a relation.
• Data is stored in tables having relationships between them, hence
the name relational model.
• The relational model supports operations like data definition,
data manipulation, and transaction management.
• Each column has a distinct name and represents an attribute.
• Each row represents a single entity (tuple).

Relational Model in DBMS:-

The relational model represents data as tables with columns and rows.
Each row is known as a tuple. Each column of a table has a name, or
attribute.

Domain: The set of atomic values that an attribute can take.

Attribute: The name of a column in a particular table. Each attribute Ai
must have a domain, dom(Ai).

Relational instance: In the relational database system, a relational
instance is represented by a finite set of tuples. Relation instances do not
have duplicate tuples.

Relational schema: A relational schema contains the name of the
relation and the names of all its columns or attributes.

Relational key: Each row has one or more attributes, known as a
relational key, that can identify the row in the relation uniquely.

Example: STUDENT Relation

NAME ROLL_NO PHONE_NO ADDRESS AGE


Ram 14795 7305758992 Noida 24
Shyam 12839 9026288936 Delhi 35
Laxman 33289 8583287182 Gurugram 20
Mahesh 27857 7086819134 Ghaziabad 27
Ganesh 17282 9028 9i3988 Delhi 40
o In the given table, NAME, ROLL_NO, PHONE_NO, ADDRESS, and AGE
are the attributes.
o The instance of schema STUDENT has 5 tuples.
o t3 = <Laxman, 33289, 8583287182, Gurugram, 20>

Relational Algebra in DBMS:-

Relational algebra is a procedural query language: it gives a step-by-step
process, built from operators, to obtain the result of a query. It mainly
provides a theoretical foundation for relational databases and SQL. The
main purpose of relational algebra is to define operators that transform
one or more input relations into an output relation. Since these operators
accept relations as input and produce relations as output, they can be
combined to express potentially complex queries that transform many
input relations (whose data are stored in the database) into a single
output relation (the query result). As it is pure mathematics, relational
algebra uses no English keywords, and operators are represented using
symbols.

Types of Relational Operations

1. Select Operation:
o The select operation selects tuples that satisfy a given predicate.
o It is denoted by sigma (σ).

Notation: σ p(r)

Where:
σ denotes selection,
r denotes the relation, and
p is a propositional logic formula which may use connectives such as
AND, OR, and NOT, together with relational operators such as =, ≠, ≥,
<, >, ≤.

For example: LOAN Relation

BRANCH_NAME LOAN_NO AMOUNT
Downtown L-17 1000
Redwood L-23 2000
Perryride L-15 1500
Downtown L-14 1500
Mianus L-13 500
Roundhill L-11 900
Perryride L-16 1300

Input:

σ BRANCH_NAME = "Perryride" (LOAN)

Output:

BRANCH_NAME LOAN_NO AMOUNT


Perryride L-15 1500
Perryride L-16 1300
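The selection above can be sketched in plain Python, modelling a relation as a list of dictionaries (this representation is my assumption for illustration; the operator itself follows the definition in the text):

```python
# A small sketch of σ BRANCH_NAME="Perryride" (LOAN): selection keeps whole
# tuples that satisfy the predicate p.
LOAN = [
    {"BRANCH_NAME": "Downtown",  "LOAN_NO": "L-17", "AMOUNT": 1000},
    {"BRANCH_NAME": "Redwood",   "LOAN_NO": "L-23", "AMOUNT": 2000},
    {"BRANCH_NAME": "Perryride", "LOAN_NO": "L-15", "AMOUNT": 1500},
    {"BRANCH_NAME": "Downtown",  "LOAN_NO": "L-14", "AMOUNT": 1500},
    {"BRANCH_NAME": "Mianus",    "LOAN_NO": "L-13", "AMOUNT": 500},
    {"BRANCH_NAME": "Roundhill", "LOAN_NO": "L-11", "AMOUNT": 900},
    {"BRANCH_NAME": "Perryride", "LOAN_NO": "L-16", "AMOUNT": 1300},
]

def select(relation, predicate):
    """sigma: keep the tuples for which the predicate holds."""
    return [t for t in relation if predicate(t)]

result = select(LOAN, lambda t: t["BRANCH_NAME"] == "Perryride")
print([t["LOAN_NO"] for t in result])  # ['L-15', 'L-16']
```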
2. Project Operation:
o This operation shows the list of those attributes that we wish to
appear in the result. Rest of the attributes are eliminated from the
table.
o It is denoted by ∏.

Notation: ∏ A1, A2, …, An (r)

Where A1, A2, …, An are attribute names of relation r.

Example: CUSTOMER RELATION

NAME STREET CITY


Jones Main Harrison
Smith North Rye
Hays Main Harrison
Curry North Rye
Johnson Alma Brooklyn
Brooks Senator Brooklyn

Input:

∏ NAME, CITY (CUSTOMER)

Output:

NAME CITY
Jones Harrison
Smith Rye
Hays Harrison
Curry Rye
Johnson Brooklyn
Brooks Brooklyn
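Projection can likewise be sketched in Python, here modelling the relation as tuples plus a header (my illustrative representation); using a set mirrors the relational-algebra rule that duplicate result tuples are dropped:

```python
# Sketch of ∏ NAME, CITY (CUSTOMER): keep only the named columns.
CUSTOMER = [
    ("Jones",   "Main",    "Harrison"),
    ("Smith",   "North",   "Rye"),
    ("Hays",    "Main",    "Harrison"),
    ("Curry",   "North",   "Rye"),
    ("Johnson", "Alma",    "Brooklyn"),
    ("Brooks",  "Senator", "Brooklyn"),
]
HEADER = ("NAME", "STREET", "CITY")

def project(relation, header, columns):
    idx = [header.index(c) for c in columns]
    # A set removes duplicate tuples, matching relational-algebra semantics.
    return {tuple(t[i] for i in idx) for t in relation}

result = project(CUSTOMER, HEADER, ["NAME", "CITY"])
print(len(result))  # 6 distinct (NAME, CITY) pairs, as in the output table
```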
3. Union Operation:
o Suppose there are two relations R and S. The union operation contains
all the tuples that are either in R or in S or in both.

o It eliminates the duplicate tuples. It is denoted by ∪.

Notation: R ∪ S

A union operation must hold the following conditions:

o R and S must have the same number of attributes (with compatible
domains).
o Duplicate tuples are eliminated automatically.

Example:

DEPOSITOR RELATION

CUSTOMER_NAME ACCOUNT_NO
Johnson A-101
Smith A-121
Mayes A-321
Turner A-176
Johnson A-273
Jones A-472
Lindsay A-284

BORROW RELATION

CUSTOMER_NAME LOAN_NO
Jones L-17
Smith L-23
Hayes L-15
Jackson L-14
Curry L-93
Smith L-11
Williams L-17

Input:
∏ CUSTOMER_NAME (BORROW) ∪ ∏ CUSTOMER_NAME (DEPOSITOR)

Output:
CUSTOMER_NAME
Johnson
Smith
Hayes
Turner
Jones
Lindsay
Jackson
Curry
Williams
Mayes

4. Set Intersection:
o Suppose there are two relations R and S. The set intersection operation
contains all tuples that are in both R and S.
o It is denoted by ∩.

Notation: R ∩ S

Example: Using the above DEPOSITOR table and BORROW table

Input:

1. ∏ CUSTOMER_NAME (BORROW) ∩ ∏ CUSTOMER_NAME (DEPOSITOR)

Output:

CUSTOMER_NAME
Smith
Jones
5. Set Difference:
o Suppose there are two relations R and S. The set difference operation
contains all tuples that are in R but not in S.
o It is denoted by minus (−).

1. Notation: R - S

Example: Using the above DEPOSITOR table and BORROW table

Input:

1. ∏ CUSTOMER_NAME (BORROW) - ∏ CUSTOMER_NAME (DEPOSITOR)

Output:

CUSTOMER_NAME
Jackson
Hayes
Williams
Curry
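Because union, intersection, and difference are plain set operations on union-compatible relations, they map directly onto Python sets. The sketch below models ∏ CUSTOMER_NAME (BORROW) and ∏ CUSTOMER_NAME (DEPOSITOR) as sets of names (illustrative, not part of the original notes):

```python
# Each set stands for a projected single-attribute relation.
BORROW_NAMES    = {"Jones", "Smith", "Hayes", "Jackson", "Curry", "Williams"}
DEPOSITOR_NAMES = {"Johnson", "Smith", "Mayes", "Turner", "Jones", "Lindsay"}

union        = BORROW_NAMES | DEPOSITOR_NAMES   # R ∪ S, duplicates removed
intersection = BORROW_NAMES & DEPOSITOR_NAMES   # R ∩ S
difference   = BORROW_NAMES - DEPOSITOR_NAMES   # R − S
```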
6. Cartesian product
o The Cartesian product is used to combine each row in one table with
each row in the other table. It is also known as a cross product.
o It is denoted by X.

1. Notation: E X D

Example:

EMPLOYEE

EMP_ID EMP_NAME EMP_DEPT


1 Smith A
2 Harry C
3 John B

DEPARTMENT

DEPT_NO DEPT_NAME
A Marketing
B Sales
C Legal

Input:

1. EMPLOYEE X DEPARTMENT

Output:

EMP_ID EMP_NAME EMP_DEPT DEPT_NO DEPT_NAME


1 Smith A A Marketing
1 Smith A B Sales
1 Smith A C Legal
2 Harry C A Marketing
2 Harry C B Sales
2 Harry C C Legal
3 John B A Marketing
3 John B B Sales
3 John B C Legal
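The Cartesian product pairs every tuple of one relation with every tuple of the other, which `itertools.product` expresses directly (an illustrative sketch using plain tuples):

```python
from itertools import product

EMPLOYEE = [(1, "Smith", "A"), (2, "Harry", "C"), (3, "John", "B")]
DEPARTMENT = [("A", "Marketing"), ("B", "Sales"), ("C", "Legal")]

# E X D: every EMPLOYEE tuple concatenated with every DEPARTMENT tuple.
cross = [e + d for e, d in product(EMPLOYEE, DEPARTMENT)]
# 3 rows x 3 rows gives 9 combined tuples, as in the table above
```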
7. Rename Operation:

The rename operation is used to rename the output relation. It is denoted


by rho (ρ).

Example: We can use the rename operator to rename STUDENT relation to


STUDENT1.

ρ(STUDENT1, STUDENT)

Extended Operators in Relational Algebra:-


This section assumes a basic idea of the relational model and of the basic
operators in Relational Algebra. Extended operators are those operators
which can be derived from the basic operators.
There are mainly three types of extended operators in Relational Algebra:
• Join
• Intersection
• Divide

The relations used to understand extended operators are STUDENT,
STUDENT_SPORTS, ALL_SPORTS and EMPLOYEE which are shown in Table
1, Table 2, Table 3 and Table 4 respectively. STUDENT
ROLL_NO NAME ADDRESS PHONE AGE
1 RAM DELHI 9455123451 18
2 RAMESH GURGAON 9652431543 18
3 SUJIT ROHTAK 9156253131 20
4 SURESH DELHI 9156768971 18
Table 1
STUDENT_SPORTS
ROLL_NO SPORTS
1 Badminton
2 Cricket
2 Badminton
4 Badminton
Table 2
ALL_SPORTS
SPORTS
Badminton
Cricket
Table 3
EMPLOYEE
EMP_NO NAME ADDRESS PHONE AGE
1 RAM DELHI 9455123451 18
5 NARESH HISAR 9782918192 22
6 SWETA RANCHI 9852617621 21
4 SURESH DELHI 9156768971 18
Table 4
Intersection (∩): Intersection on two relations R1 and R2 can only be
computed if R1 and R2 are union compatible (the two relations should
have the same number of attributes, and corresponding attributes in the
two relations should have the same domain). The intersection operator
applied on two relations as R1 ∩ R2 gives a relation with the tuples which
are in R1 as well as in R2. Syntax:
Relation1 ∩ Relation2
Example: Find a person who is student as well as employee- STUDENT ∩
EMPLOYEE
In terms of basic operators (set difference):
STUDENT ∩ EMPLOYEE = STUDENT − (STUDENT − EMPLOYEE)
RESULT:
ROLL_NO NAME ADDRESS PHONE AGE
1 RAM DELHI 9455123451 18
4 SURESH DELHI 9156768971 18
Conditional Join(⋈c): Conditional Join is used when you want to join two
or more relation based on some conditions. Example: Select students
whose ROLL_NO is greater than EMP_NO of employees
STUDENT ⋈c (STUDENT.ROLL_NO > EMPLOYEE.EMP_NO) EMPLOYEE
In terms of basic operators (cross product and selection) :
σ (STUDENT.ROLL_NO>EMPLOYEE.EMP_NO)(STUDENT×EMPLOYEE)
RESULT:

ROLL_NO NAME   ADDRESS PHONE      AGE  EMP_NO NAME ADDRESS PHONE      AGE
2       RAMESH GURGAON 9652431543 18   1      RAM  DELHI   9455123451 18
3       SUJIT  ROHTAK  9156253131 20   1      RAM  DELHI   9455123451 18
4       SURESH DELHI   9156768971 18   1      RAM  DELHI   9455123451 18
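Since a conditional join is just a selection over the cross product, it can be sketched as a filtered double loop (illustrative; attributes are trimmed for brevity):

```python
STUDENT = [
    {"ROLL_NO": 1, "NAME": "RAM"},
    {"ROLL_NO": 2, "NAME": "RAMESH"},
    {"ROLL_NO": 3, "NAME": "SUJIT"},
    {"ROLL_NO": 4, "NAME": "SURESH"},
]
EMPLOYEE = [
    {"EMP_NO": 1, "NAME": "RAM"},
    {"EMP_NO": 5, "NAME": "NARESH"},
    {"EMP_NO": 6, "NAME": "SWETA"},
    {"EMP_NO": 4, "NAME": "SURESH"},
]

# Conditional join = sigma(ROLL_NO > EMP_NO)(STUDENT x EMPLOYEE)
joined = [(s, e) for s in STUDENT for e in EMPLOYEE
          if s["ROLL_NO"] > e["EMP_NO"]]
# only students 2, 3 and 4 pair with employee 1, matching the table above
```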
Equijoin(⋈): Equijoin is a special case of conditional join where only an
equality condition holds between a pair of attributes. As the values of the
two attributes will be equal in the result of an equijoin, only one of them
appears in the result. Example: Select students whose ROLL_NO is equal to
EMP_NO of employees.
STUDENT ⋈ (STUDENT.ROLL_NO = EMPLOYEE.EMP_NO) EMPLOYEE
In terms of basic operators (cross product, selection and projection):
∏(STUDENT.ROLL_NO, STUDENT.NAME, STUDENT.ADDRESS, STUDENT.PHONE, STUDENT.AGE, EMPLOYEE.NAME, EMPLOYEE.ADDRESS,
EMPLOYEE.PHONE, EMPLOYEE.AGE) (σ(STUDENT.ROLL_NO = EMPLOYEE.EMP_NO) (STUDENT × EMPLOYEE))

RESULT:
ROLL_NO NAME   ADDRESS PHONE      AGE NAME   ADDRESS PHONE      AGE
1       RAM    DELHI   9455123451 18  RAM    DELHI   9455123451 18
4       SURESH DELHI   9156768971 18  SURESH DELHI   9156768971 18
Natural Join(⋈): It is a special case of equijoin in which the equality
condition holds on all attributes that have the same name in relations R
and S (the relations on which the join operation is applied). While applying
natural join on two relations, there is no need to write the equality
condition explicitly. Natural join also returns each common attribute only
once, as its values will be the same in the resulting relation. Example:
Select students whose ROLL_NO is equal to ROLL_NO of STUDENT_SPORTS as:
STUDENT ⋈ STUDENT_SPORTS
In terms of basic operators (cross product, selection and projection):
∏(STUDENT.ROLL_NO, STUDENT.NAME, STUDENT.ADDRESS, STUDENT.PHONE, STUDENT.AGE, STUDENT_SPORTS.SPORTS)
(σ(STUDENT.ROLL_NO = STUDENT_SPORTS.ROLL_NO) (STUDENT × STUDENT_SPORTS))

RESULT:
ROLL_NO NAME ADDRESS PHONE AGE SPORTS
1 RAM DELHI 9455123451 18 Badminton
2 RAMESH GURGAON 9652431543 18 Cricket
2 RAMESH GURGAON 9652431543 18 Badminton
4 SURESH DELHI 9156768971 18 Badminton
Natural Join is by default an inner join: tuples that do not satisfy the join
condition do not appear in the result set. For example, the tuple having
ROLL_NO 3 in STUDENT does not match any tuple in STUDENT_SPORTS, so
it is not part of the result set.
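A natural join can be sketched by matching on the attributes the two relations share (an illustrative sketch; attributes are trimmed to ROLL_NO/NAME/SPORTS for brevity):

```python
STUDENT = [
    {"ROLL_NO": 1, "NAME": "RAM"},
    {"ROLL_NO": 2, "NAME": "RAMESH"},
    {"ROLL_NO": 3, "NAME": "SUJIT"},
    {"ROLL_NO": 4, "NAME": "SURESH"},
]
STUDENT_SPORTS = [
    {"ROLL_NO": 1, "SPORTS": "Badminton"},
    {"ROLL_NO": 2, "SPORTS": "Cricket"},
    {"ROLL_NO": 2, "SPORTS": "Badminton"},
    {"ROLL_NO": 4, "SPORTS": "Badminton"},
]

def natural_join(r, s):
    """Join on all same-named attributes; assumes both relations are non-empty."""
    common = set(r[0]) & set(s[0])  # here just {"ROLL_NO"}
    return [{**tr, **ts} for tr in r for ts in s
            if all(tr[a] == ts[a] for a in common)]

result = natural_join(STUDENT, STUDENT_SPORTS)
# ROLL_NO 3 has no match in STUDENT_SPORTS, so it is absent (inner join)
```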
Left Outer Join(⟕): When a join is applied on two relations R and S, tuples
that do not satisfy the join condition do not appear in the result set. A Left
Outer Join instead returns all tuples of R: tuples of R that do not satisfy
the join condition appear with NULL values for the attributes of S.
Example: Select students whose ROLL_NO is greater than EMP_NO of
employees, along with the details of the remaining students as well

STUDENT ⟕(STUDENT.ROLL_NO > EMPLOYEE.EMP_NO) EMPLOYEE
RESULT
ROLL_NO NAME   ADDRESS PHONE      AGE  EMP_NO NAME ADDRESS PHONE      AGE
2       RAMESH GURGAON 9652431543 18   1      RAM  DELHI   9455123451 18
3       SUJIT  ROHTAK  9156253131 20   1      RAM  DELHI   9455123451 18
4       SURESH DELHI   9156768971 18   1      RAM  DELHI   9455123451 18
1       RAM    DELHI   9455123451 18   NULL   NULL NULL    NULL       NULL
Right Outer Join(⟖): When a join is applied on two relations R and S, tuples
that do not satisfy the join condition do not appear in the result set. A
Right Outer Join instead returns all tuples of S: tuples of S that do not
satisfy the join condition appear with NULL values for the attributes of R.
Example: Select students whose ROLL_NO is greater than EMP_NO of
employees, along with the details of the remaining employees as well
STUDENT ⟖(STUDENT.ROLL_NO > EMPLOYEE.EMP_NO) EMPLOYEE
RESULT:
ROLL_NO NAME   ADDRESS PHONE      AGE  EMP_NO NAME   ADDRESS PHONE      AGE
2       RAMESH GURGAON 9652431543 18   1      RAM    DELHI   9455123451 18
3       SUJIT  ROHTAK  9156253131 20   1      RAM    DELHI   9455123451 18
4       SURESH DELHI   9156768971 18   1      RAM    DELHI   9455123451 18
NULL    NULL   NULL    NULL       NULL 5      NARESH HISAR   9782918192 22
NULL    NULL   NULL    NULL       NULL 6      SWETA  RANCHI  9852617621 21
NULL    NULL   NULL    NULL       NULL 4      SURESH DELHI   9156768971 18
Full Outer Join(⟗): When a join is applied on two relations R and S, tuples
that do not satisfy the join condition do not appear in the result set. A Full
Outer Join instead returns all tuples of R and all tuples of S. Tuples of S
that do not satisfy the join condition have NULL values for the attributes
of R, and vice versa. Example: Select students whose ROLL_NO is greater
than EMP_NO of employees, along with the details of the remaining
employees and the remaining students as well
STUDENT ⟗(STUDENT.ROLL_NO > EMPLOYEE.EMP_NO) EMPLOYEE
RESULT:
ROLL_NO NAME   ADDRESS PHONE      AGE  EMP_NO NAME   ADDRESS PHONE      AGE
2       RAMESH GURGAON 9652431543 18   1      RAM    DELHI   9455123451 18
3       SUJIT  ROHTAK  9156253131 20   1      RAM    DELHI   9455123451 18
4       SURESH DELHI   9156768971 18   1      RAM    DELHI   9455123451 18
NULL    NULL   NULL    NULL       NULL 5      NARESH HISAR   9782918192 22
NULL    NULL   NULL    NULL       NULL 6      SWETA  RANCHI  9852617621 21
NULL    NULL   NULL    NULL       NULL 4      SURESH DELHI   9156768971 18
1       RAM    DELHI   9455123451 18   NULL   NULL   NULL    NULL       NULL
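The outer-join padding behaviour can be sketched in Python: keep every tuple of the left relation and fill the right side with None (standing in for NULL) when no match exists. This is an illustrative sketch with trimmed, renamed attributes (S_NAME/E_NAME avoid a dict-key clash):

```python
STUDENT = [
    {"ROLL_NO": 1, "S_NAME": "RAM"},
    {"ROLL_NO": 2, "S_NAME": "RAMESH"},
    {"ROLL_NO": 3, "S_NAME": "SUJIT"},
    {"ROLL_NO": 4, "S_NAME": "SURESH"},
]
EMPLOYEE = [
    {"EMP_NO": 1, "E_NAME": "RAM"},
    {"EMP_NO": 5, "E_NAME": "NARESH"},
    {"EMP_NO": 6, "E_NAME": "SWETA"},
    {"EMP_NO": 4, "E_NAME": "SURESH"},
]

def left_outer_join(r, s, cond, s_attrs):
    """All tuples of r; unmatched ones padded with None for s's attributes."""
    out = []
    for tr in r:
        matches = [ts for ts in s if cond(tr, ts)]
        if matches:
            out.extend({**tr, **ts} for ts in matches)
        else:
            out.append({**tr, **{a: None for a in s_attrs}})  # NULL padding
    return out

result = left_outer_join(STUDENT, EMPLOYEE,
                         lambda tr, ts: tr["ROLL_NO"] > ts["EMP_NO"],
                         ["EMP_NO", "E_NAME"])
# student 1 matches no employee, so its EMP_NO / E_NAME come out as None
```

A right outer join is the same with the roles of the two relations swapped, and a full outer join is the union of both.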
Division Operator (÷): Division A÷B (also written A/B) can be applied if and
only if:
• Attributes of B are a proper subset of the attributes of A.
• The relation returned by the division operator has attributes =
(All attributes of A − All attributes of B)
• The relation returned by the division operator contains those
tuples from relation A which are associated with every tuple of B.
A
x y
a 1
b 2
a 2
d 4
÷
B
y
1
2
The resultant of A/B is
A÷B
x
a
Division can be expressed in terms of cross product, set difference and
projection.
In the above example, for A/B, compute all x values that are not
disqualified by some y in B.
An x value is disqualified if, by attaching some y value from B, we obtain
an xy tuple that is not in A.
Disqualified x values: ∏x(( ∏x(A) × B ) − A)
So A/B = ∏x( A ) − all disqualified tuples
A/B = ∏x( A ) − ∏x(( ∏x(A) × B ) − A)
In the above example , disqualified tuples are
b 2
d 4
So, the resultant is
x
a
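The disqualified-values formulation above translates almost literally into Python, with sets of tuples standing in for relations (an illustrative sketch):

```python
# Relation A over attributes (x, y), and B over (y), as in the example above.
A = {("a", 1), ("b", 2), ("a", 2), ("d", 4)}
B = {1, 2}

xs = {x for x, _ in A}  # pi_x(A)
# x is disqualified if some y in B yields an (x, y) pair missing from A.
disqualified = {x for x in xs if any((x, y) not in A for y in B)}
result = xs - disqualified  # A ÷ B: only 'a' pairs with every y in B
```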

Relational Database Model:-


The relational data model was introduced by E. F. Codd in 1970. Currently,
it is the most widely used data model. The relational data model describes

the world as “a collection of inter-related relations (or tables).” A relational
data model involves the use of data tables that collect groups of elements
into relations. These models work based on the idea that each table setup
will include a primary key or identifier. Other tables use that identifier to
provide "relational" data links and results.
Today, there are many commercial Relational Database Management
System (RDBMS), such as Oracle, IBM DB2, and Microsoft SQL Server. There
are also many free and open-source RDBMS, such as MySQL, mSQL (mini-
SQL) and the embedded Java DB (Apache Derby). Database administrators
use Structured Query Language (SQL) to retrieve data elements from a
relational database.
As mentioned, the primary key is a fundamental tool in creating and using
relational data models. It must be unique for each member of a data set.
It must be populated for all members. Inconsistencies can cause problems
in how developers retrieve data. Other issues with relational database
designs include excessive duplication of data, faulty or partial data, or
improper links or associations between tables. A large part of routine
database administration involves evaluating all the data sets in a database
to make sure that they are consistently populated and will respond well
to SQL or any other data retrieval method.
For example, a conventional database row would represent a tuple, which
is a set of data that revolves around an instance or virtual object so that
the primary key is its unique identifier. A column name in a data table is
associated with an attribute, an identifier or feature that all parts of a data
set have. These and other strict conventions help to provide database
administrators and designers with standards for crafting relational
database setups.
Database Design Objective
• Eliminate Data Redundancy: the same piece of data shall not
be stored in more than one place. This is because duplicate
data not only waste storage spaces but also easily lead to
inconsistencies.
• Ensure Data Integrity and Accuracy: data integrity is the
maintenance and assurance of the accuracy and consistency of
data over its entire life-cycle, and is a critical aspect of the
design, implementation, and usage of any system which stores,
processes, or retrieves data.
The relational model has provided the basis for:

• Research on the theory of data/relationship/constraint


• Numerous database design methodologies
• The standard database access language called structured query
language (SQL)
• Almost all modern commercial database management systems
Relational databases go together with the development of SQL. The
simplicity of SQL - where even a novice can learn to perform basic queries
in a short period of time - is a large part of the reason for the popularity
of the relational model.

The two tables below relate to each other through the product code field.
Any two tables can relate to each other simply by creating a field they have
in common.
Table 1
Product_code Description Price
A416 Colour Pen ₹ 25.00
C923 Pencil box ₹ 45.00
Table 2
Invoice_code Invoice_line Product_code Quantity
3804 1 A416 15
3804 2 C923 24
There are four stages of an RDM which are as follows −

• Relations and attributes − The various tables and attributes


related to each table are identified. The tables represent
entities, and the attributes represent the properties of the
respective entities.
• Primary keys − The attribute or set of attributes that help in
uniquely identifying a record is identified and assigned as the
primary key.
• Relationships − The relationships between the various tables
are established with the help of foreign keys. Foreign keys are
attributes occurring in a table that are primary keys of another
table. The types of relationships that can exist between the
relations (tables) are One to one, One to many, and Many to
many
• Normalization − This is the process of optimizing the database
structure. Normalization simplifies the database design to
avoid redundancy and confusion. The different normal forms
are as follows:
1. First normal form 2. Second normal form 3.
Third normal form 4. Boyce-Codd normal form 5. Fifth
normal form
By applying a set of rules, a table is normalized into the above normal
forms in a linearly progressive fashion. The efficiency of the design gets
better with each higher degree of normalization.
Advantages of Relational Databases
The main advantages of relational databases are that they enable users to
easily categorize and store data that can later be queried and filtered to
extract specific information for reports. Relational databases are also easy
to extend and aren't reliant on the physical organization. After the original
database creation, a new data category can be added without all existing
applications being modified.

Other Advantages
• Accurate − Data is stored just once, which eliminates data
duplication.
• Flexible − Complex queries are easy for users to carry out.
• Collaborative − Multiple users can access the same database.
• Trusted − Relational database models are mature and well-
understood.
• Secure − Data in tables within relational database management
systems (RDBMS) can be limited to allow access by only
particular users.

Functional Dependency:-

The functional dependency is a relationship that exists between two


attributes. It typically exists between the primary key and non-key
attribute within a table.

1. X → Y

The left side of FD is known as a determinant, the right side of the


production is known as a dependent.

For example:

Assume we have an employee table with attributes: Emp_Id, Emp_Name,


Emp_Address.

Here Emp_Id attribute can uniquely identify the Emp_Name attribute of


employee table because if we know the Emp_Id, we can tell that employee
name associated with it.

Functional dependency can be written as:

1. Emp_Id → Emp_Name

We can say that Emp_Name is functionally dependent on Emp_Id.

Types of Functional dependency

1. Trivial functional dependency

o A → B has trivial functional dependency if B is a subset of A.


o The following dependencies are also trivial like: A → A, B → B

Example:

1. Consider a table with two columns: Employee_Id and Employee_Name.
2. {Employee_Id, Employee_Name} → Employee_Id is a trivial functional dependency, as
3. Employee_Id is a subset of {Employee_Id, Employee_Name}.
4. Also, Employee_Id → Employee_Id and Employee_Name → Employee_Name are trivial dependencies too.
2. Non-trivial functional dependency
o A → B has a non-trivial functional dependency if B is not a subset of
A.
o When A intersection B is empty (Φ), then A → B is called complete non-
trivial.

Example:

1. ID → Name,
2. Name → DOB
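Whether a given FD actually holds in a table instance can be checked mechanically: tuples that agree on the left-hand side must also agree on the right-hand side. A minimal sketch, using a hypothetical employee table (the duplicate name "Ram" is invented to break Emp_Name → Emp_Address):

```python
def fd_holds(rows, lhs, rhs):
    """X -> Y holds if tuples agreeing on X also agree on Y."""
    seen = {}
    for t in rows:
        x = tuple(t[a] for a in lhs)
        y = tuple(t[a] for a in rhs)
        if x in seen and seen[x] != y:
            return False  # two tuples agree on X but differ on Y
        seen[x] = y
    return True

EMPLOYEE = [
    {"Emp_Id": 1, "Emp_Name": "Ram",  "Emp_Address": "Delhi"},
    {"Emp_Id": 2, "Emp_Name": "Ram",  "Emp_Address": "Hisar"},
    {"Emp_Id": 3, "Emp_Name": "Sita", "Emp_Address": "Delhi"},
]

ok  = fd_holds(EMPLOYEE, ["Emp_Id"], ["Emp_Name"])       # Emp_Id -> Emp_Name
bad = fd_holds(EMPLOYEE, ["Emp_Name"], ["Emp_Address"])  # two Rams, two cities
```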

Anomalies:-
Anomalies in the relational model refer to inconsistencies or errors that
can arise when working with relational databases, specifically in the
context of data insertion, deletion, and modification. There are different
types of anomalies that can occur in referencing and referenced relations
which can be discussed as:
These anomalies can be categorized into three types:
1. Insertion Anomalies
2. Deletion Anomalies
3. Update Anomalies.
How Are Anomalies Caused in DBMS?
Database anomalies are the faults in the database caused due to poor
management of storing everything in the flat database. It can be removed
with the process of Normalization, which generally splits the database
which results in reducing the anomalies in the database.
STUDENT Table
STUD_NO STUD_NAME STUD_PHONE STUD_STATE STUD_COUNTRY STUD_AGE
1       RAM       9716271721 Haryana    India        20
2       RAM       9898291281 Punjab     India        19
3       SUJIT     7898291981 Rajasthan  India        18
4       SURESH               Punjab     India        21
Table 1
STUDENT_COURSE
STUD_NO COURSE_NO COURSE_NAME
1 C1 DBMS
2 C2 Computer Networks

1 C2 Computer Networks
Table 2
Insertion anomaly: If a tuple is inserted in referencing relation and
referencing attribute value is not present in referenced attribute, it will
not allow insertion in referencing relation.
Example: If we try to insert a record in STUDENT_COURSE with STUD_NO
=7, it will not allow it.
Deletion and Updation anomaly: If a tuple is deleted or updated from
referenced relation and the referenced attribute value is used by
referencing attribute in referencing relation, it will not allow deleting the
tuple from referenced relation.
Example: If we want to update a record from STUDENT_COURSE with
STUD_NO =1, We have to update it in both rows of the table. If we try to
delete a record from STUDENT with STUD_NO =1, it will not allow it.
To avoid this, the following can be used in query:
• ON DELETE/UPDATE SET NULL: If a tuple is deleted or updated
from referenced relation and the referenced attribute value is
used by referencing attribute in referencing relation, it will
delete/update the tuple from referenced relation and set the
value of referencing attribute to NULL.
• ON DELETE/UPDATE CASCADE: If a tuple is deleted or updated
from referenced relation and the referenced attribute value is
used by referencing attribute in referencing relation, it will
delete/update the tuple from referenced relation and
referencing relation as well.
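Both behaviours can be demonstrated with SQLite from Python. This is an illustrative sketch, not from the original notes; the table and column names are simplified, and note that SQLite only enforces foreign keys when the pragma is switched on:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled

con.execute("CREATE TABLE student (stud_no INTEGER PRIMARY KEY)")
con.execute("""CREATE TABLE student_course (
    stud_no   INTEGER REFERENCES student(stud_no) ON DELETE CASCADE,
    course_no TEXT)""")

con.execute("INSERT INTO student VALUES (1), (2)")
con.executemany("INSERT INTO student_course VALUES (?, ?)",
                [(1, "C1"), (2, "C2"), (1, "C2")])

# Insertion anomaly: STUD_NO 7 is absent from STUDENT, so the insert is rejected.
try:
    con.execute("INSERT INTO student_course VALUES (7, 'C3')")
    inserted = True
except sqlite3.IntegrityError:
    inserted = False

# ON DELETE CASCADE: removing student 1 also removes the rows referencing it.
con.execute("DELETE FROM student WHERE stud_no = 1")
remaining = con.execute("SELECT stud_no, course_no FROM student_course").fetchall()
```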

How these Anomalies Occur

• Insertion Anomalies: These anomalies occur when it is not


possible to insert data into a database because the required
fields are missing or because the data is incomplete. For
example, if a database requires that every record has a primary
key, but no value is provided for a particular record, it cannot be
inserted into the database.
• Deletion anomalies: These anomalies occur when deleting a
record from a database and can result in the unintentional loss
of data. For example, if a database contains information about
customers and orders, deleting a customer record may also
delete all the orders associated with that customer.
• Update anomalies: These anomalies occur when modifying data
in a database and can result in inconsistencies or errors. For
example, if a database contains information about employees
and their salaries, updating an employee’s salary in one record
but not in all related records could lead to incorrect calculations
and reporting.
Removal of Anomalies
These anomalies can be avoided or minimized by designing databases
that adhere to the principles of normalization. Normalization involves
organizing data into tables and applying rules to ensure data is stored in
a consistent and efficient manner. By reducing data redundancy and

ensuring data integrity, normalization helps to eliminate anomalies and
improve the overall quality of the database
According to E.F.Codd, who is the inventor of the Relational Database, the
goals of Normalization include:
• it helps in removing all the repeated data from the database.
• it helps in removing undesirable deletion, insertion, and update
anomalies.
• it helps in making a proper and useful relationship between
tables.

Normalization:-
Functional Dependency
Functional dependency (FD) is a set of constraints between two attributes
in a relation. Functional dependency says that if two tuples have the same
values for attributes A1, A2, ..., An, then those two tuples must have the
same values for attributes B1, B2, ..., Bn.
Functional dependency is represented by an arrow sign (→) that is, X→Y,
where X functionally determines Y. The left-hand side attributes determine
the values of attributes on the right-hand side.
Armstrong's Axioms
If F is a set of functional dependencies then the closure of F, denoted as
F+, is the set of all functional dependencies logically implied by F.
Armstrong's Axioms are a set of rules, that when applied repeatedly,
generates a closure of functional dependencies.
• Reflexive rule − If alpha is a set of attributes and beta is a
subset of alpha, then alpha → beta holds.
• Augmentation rule − If a → b holds and y is attribute set, then
ay → by also holds. That is adding attributes in dependencies,
does not change the basic dependencies.
• Transitivity rule − Same as the transitive rule in algebra: if a →
b holds and b → c holds, then a → c also holds. a → b is read as
a functionally determines b.
Trivial Functional Dependency
• Trivial − If a functional dependency (FD) X → Y holds, where Y
is a subset of X, then it is called a trivial FD. Trivial FDs always
hold.
• Non-trivial − If an FD X → Y holds, where Y is not a subset of X,
then it is called a non-trivial FD.
• Completely non-trivial − If an FD X → Y holds, where x
intersect Y = Φ, it is said to be a completely non-trivial FD.
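Computing the closure F+ of a whole FD set is expensive, but the closure of an attribute set (everything that set determines under F) is a simple fixed-point loop, and is what the axioms are used for in practice. A minimal sketch, assuming FDs are represented as (LHS, RHS) pairs of attribute sets; the example FD set is illustrative:

```python
def closure(attrs, fds):
    """Closure of an attribute set under FDs given as (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:  # apply FDs until nothing new is derivable
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

# Example FDs: A -> BC, B -> D, CD -> E, E -> A
fds = [(set("A"), set("BC")), (set("B"), set("D")),
       (set("CD"), set("E")), (set("E"), set("A"))]

full = closure("A", fds)  # {A} determines every attribute, so A is a key
```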
Normalization
If a database design is not perfect, it may contain anomalies, which are like
a bad dream for any database administrator. Managing a database with
anomalies is next to impossible.
• Update anomalies − If data items are scattered and are not
linked to each other properly, then it could lead to strange
situations. For example, when we try to update one data item
having its copies scattered over several places, a few instances

get updated properly while a few others are left with old values.
Such instances leave the database in an inconsistent state.
• Deletion anomalies − We tried to delete a record, but parts of
it were left undeleted because, without our knowledge, the data
was also saved somewhere else.
• Insert anomalies − We tried to insert data in a record that does
not exist at all.
Normalization is a method to remove all these anomalies and bring the
database to a consistent state.
First Normal Form
First Normal Form is defined in the definition of relations (tables) itself.
This rule defines that all the attributes in a relation must have atomic
domains. The values in an atomic domain are indivisible units.

We re-arrange the relation (table) as below, to convert it to First Normal


Form.

Each attribute must contain only a single value from its pre-defined
domain.
Second Normal Form
Before we learn about the second normal form, we need to understand the
following −
• Prime attribute − An attribute, which is a part of the candidate-
key, is known as a prime attribute.
• Non-prime attribute − An attribute, which is not a part of the
prime-key, is said to be a non-prime attribute.
If we follow second normal form, then every non-prime attribute should
be fully functionally dependent on prime key attribute. That is, if X → A
holds, then there should not be any proper subset Y of X, for which Y → A
also holds true.

We see here in Student_Project relation that the prime key attributes are
Stu_ID and Proj_ID. According to the rule, non-key attributes, i.e. Stu_Name
and Proj_Name must be dependent upon both and not on any of the prime
key attribute individually. But we find that Stu_Name can be identified by
Stu_ID and Proj_Name can be identified by Proj_ID independently. This is
called partial dependency, which is not allowed in Second Normal Form.

We broke the relation in two as depicted in the above picture. So there


exists no partial dependency.
Third Normal Form
For a relation to be in Third Normal Form, it must be in Second Normal
form and the following must satisfy −

• No non-prime attribute is transitively dependent on prime key


attribute.
• For any non-trivial functional dependency, X → A, then either −
o X is a superkey or,

o A is prime attribute.

We find that in the above Student_detail relation, Stu_ID is the key and only
prime key attribute. We find that City can be identified by Stu_ID as well
as Zip itself. Neither Zip is a superkey nor is City a prime attribute.
Additionally, Stu_ID → Zip → City, so there exists transitive dependency.

To bring this relation into third normal form, we break the relation into
two relations as follows −

Boyce-Codd Normal Form


Boyce-Codd Normal Form (BCNF) is an extension of Third Normal Form on
strict terms. BCNF states that −

• For any non-trivial functional dependency, X → A, X must be a


super-key.
In the above image, Stu_ID is the super-key in the relation Student_Detail
and Zip is the super-key in the relation ZipCodes. So,
Stu_ID → Stu_Name, Zip
and
Zip → City
Which confirms that both the relations are in BCNF.

Normal Forms in DBMS:-


Normalization is the process of minimizing redundancy from a relation
or set of relations. Redundancy in relation may cause insertion, deletion,
and update anomalies. So, it helps to minimize the redundancy in
relations. Normal forms are used to eliminate or reduce redundancy in
database tables.

1. First Normal Form –

If a relation contains a composite or multi-valued attribute, it violates first
normal form; conversely, a relation is in first normal form if it does not
contain any composite or multi-valued attribute. A relation is in first
normal form if every attribute in that relation is a single-valued attribute.
• Example 1 – Relation STUDENT in table 1 is not in 1NF because
of multi-valued attribute STUD_PHONE. Its decomposition into
1NF has been shown in table 2.

• Example 2 –

• ID Name Courses
• ------------------
• 1 A c1, c2
• 2 E c3
• 3 M c2, c3
In the above table Course is a multi-valued attribute so it is not
in 1NF.
Below Table is in 1NF as there is no multi-valued attribute
ID Name Course
------------------
1 A c1
1 A c2
2 E c3
3 M c2
3 M c3

2. Second Normal Form –

To be in second normal form, a relation must be in first normal form and


relation must not contain any partial dependency. A relation is in 2NF if
it has No Partial Dependency, i.e., no non-prime attribute (attributes
which are not part of any candidate key) is dependent on any proper
subset of any candidate key of the table.
Partial Dependency – If the proper subset of candidate key determines
non-prime attribute, it is called partial dependency.
• Example 1 – Consider table-3 as following below.
• STUD_NO COURSE_NO COURSE_FEE
• 1       C1        1000
• 2       C2        1500
• 1       C4        2000
• 4       C3        1000
• 4       C1        1000
• 2       C5        2000
{Note that, there are many courses having the same course fee. }
Here,
COURSE_FEE cannot alone decide the value of COURSE_NO or
STUD_NO;
COURSE_FEE together with STUD_NO cannot decide the value of
COURSE_NO;
COURSE_FEE together with COURSE_NO cannot decide the value
of STUD_NO;
Hence,
COURSE_FEE would be a non-prime attribute, as it does not
belong to the one only candidate key {STUD_NO, COURSE_NO} ;
But, COURSE_NO -> COURSE_FEE, i.e., COURSE_FEE is dependent
on COURSE_NO, which is a proper subset of the candidate key.
Non-prime attribute COURSE_FEE is dependent on a proper
subset of the candidate key, which is a partial dependency and
so this relation is not in 2NF.
To convert the above relation to 2NF,
we need to split the table into two tables such as :
Table 1: STUD_NO, COURSE_NO
Table 2: COURSE_NO, COURSE_FEE
Table 1:
STUD_NO COURSE_NO
1       C1
2       C2
1       C4
4       C3
4       C1
2       C5

Table 2:
COURSE_NO COURSE_FEE
C1        1000
C2        1500
C3        1000
C4        2000
C5        2000
NOTE: 2NF tries to reduce the redundant data getting stored in
memory. For instance, if there are 100 students taking C1 course,
we don’t need to store its Fee as 1000 for all the 100 records,
instead, once we can store it in the second table as the course
fee for C1 is 1000.
• Example 2 – Consider following functional dependencies in
relation R (A, B , C, D )
• AB -> C [A and B together determine C]

BC -> D [B and C together determine D]


In the above relation, AB is the only candidate key and there is
no partial dependency, i.e., any proper subset of AB doesn’t
determine any non-prime attribute.

3. Third Normal Form –

A relation is in third normal form, if there is no transitive


dependency for non-prime attributes as well as it is in second
normal form.
A relation is in 3NF if at least one of the following conditions
holds in every non-trivial functional dependency X –> Y:
1. X is a super key.
2. Y is a prime attribute (each element of Y is part of some
candidate key).

Transitive dependency – If A->B and B->C are two FDs then A->C is
called transitive dependency.
3. Example 1 – In relation STUDENT given in Table 4,
FD set: {STUD_NO -> STUD_NAME, STUD_NO ->
STUD_STATE, STUD_STATE -> STUD_COUNTRY, STUD_NO -
> STUD_AGE}
Candidate Key: {STUD_NO}
For this relation in table 4, STUD_NO -> STUD_STATE and
STUD_STATE -> STUD_COUNTRY are true. So
STUD_COUNTRY is transitively dependent on STUD_NO. It
violates the third normal form. To convert it in third
normal form, we will decompose the relation STUDENT
(STUD_NO, STUD_NAME, STUD_PHONE, STUD_STATE,
STUD_COUNTRY, STUD_AGE) as:
STUDENT (STUD_NO, STUD_NAME, STUD_PHONE,
STUD_STATE, STUD_AGE)
STATE_COUNTRY (STATE, COUNTRY)
4. Example 2 – Consider relation R(A, B, C, D, E)
A -> BC,
CD -> E,
B -> D,
E -> A
All possible candidate keys in the above relation are {A, E, CD,
BC}. All attributes on the right side of every functional
dependency are prime, so the relation is in 3NF.

4. Boyce-Codd Normal Form (BCNF) –

A relation R is in BCNF if R is in Third Normal Form and for


every FD, LHS is super key. A relation is in BCNF iff in every
non-trivial functional dependency X –> Y, X is a super key.
• Example 1 – Find the highest normal form of a
relation R(A,B,C,D,E) with FD set as {BC->D, AC->BE,
B->E}

Step 1. As we can see, (AC)+ = {A, C, B, E, D}, but no subset of
AC can determine all attributes of the relation, so AC is a
candidate key. Neither A nor C can be derived from any other
attribute of the relation, so {AC} is the only candidate key.
Step 2. Prime attributes are those attributes that are
part of candidate key {A, C} in this example and
others will be non-prime {B, D, E} in this example.
Step 3. The relation R is in 1st normal form, as a
relational DBMS does not allow multi-valued or
composite attributes.
The relation is in 2nd normal form because BC->D is
in 2nd normal form (BC is not a proper subset of
candidate key AC) and AC->BE is in 2nd normal form
(AC is candidate key) and B->E is in 2nd normal form
(B is not a proper subset of candidate key AC).
The relation is not in 3rd normal form because of BC-
>D (neither BC is a super key nor D a prime attribute)
and B->E (neither B is a super key nor E a prime
attribute); to satisfy 3rd normal form, either the LHS
of an FD should be a super key or the RHS should be
a prime attribute.
So the highest normal form of relation will be 2nd
Normal form.
• Example 2 − Consider relation R(A, B, C) with FDs
A -> BC,
B -> A
A and B both are super keys, so the above relation is
in BCNF.
Key Points –
• BCNF is free from redundancy.
• If a relation is in BCNF, then 3NF is also satisfied.
• If all attributes of a relation are prime attributes,
then the relation is always in 3NF.
• A relation in a relational database is always at
least in 1NF.
• Every binary relation (a relation with only 2
attributes) is always in BCNF.
• If a relation has only singleton candidate keys (i.e.
every candidate key consists of only 1 attribute),
then the relation is always in 2NF (because no partial
functional dependency is possible).
• Sometimes decomposing into BCNF may not preserve
functional dependencies. In that case, go for BCNF only
if the lost FD(s) are not required; otherwise normalize
till 3NF only.
• There are more normal forms after BCNF, like 4NF and
beyond, but in real-world database systems it is
generally not required to go beyond BCNF.

Exercise 1: Find the highest normal form in R (A, B, C, D, E)


under following functional dependencies.

ABC --> D
CD --> AE
Important Points for solving above type of question.
1) It is always a good idea to start checking from BCNF, then
3NF, and so on.
2) If a functional dependency satisfies a normal form, then
there is no need to check it for lower normal forms. For example,
ABC -> D is in BCNF (note that ABC is a superkey), so there is no
need to check this dependency for lower normal forms.
Candidate keys in the given relation are {ABC, BCD}
BCNF: ABC -> D is in BCNF. Let us check CD -> AE, CD is not a
super key so this dependency is not in BCNF. So, R is not in
BCNF.
3NF: We don't need to check ABC -> D as it already satisfies
BCNF. Let us consider CD -> AE. Since CD is not a super key and E
is not a prime attribute, the relation is not in 3NF.
2NF: In 2NF, we need to check for partial dependency. CD is
a proper subset of a candidate key and it determines E, which
is a non-prime attribute. So, the given relation is also not in 2NF.
So, the highest normal form is 1NF.

SQL Constraints:-

SQL constraints are used to specify rules for the data in a table.

Constraints are used to limit the type of data that can go into a table. This
ensures the accuracy and reliability of the data in the table. If there is any
violation between the constraint and the data action, the action is aborted.

Constraints can be column level or table level. Column level constraints


apply to a column, and table level constraints apply to the whole table.

The following constraints are commonly used in SQL:

• NOT NULL - Ensures that a column cannot have a NULL value


• UNIQUE - Ensures that all values in a column are different
• PRIMARY KEY - A combination of a NOT NULL and UNIQUE. Uniquely
identifies each row in a table
• FOREIGN KEY - Prevents actions that would destroy links between
tables
• CHECK - Ensures that the values in a column satisfy a specific
condition
• DEFAULT - Sets a default value for a column if no value is specified
• CREATE INDEX - Creates an index, which is used to retrieve data from the
database very quickly
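The constraints listed above can be exercised with Python's built-in sqlite3 module. The table and column names below are invented for illustration; each bad INSERT trips a different constraint.

```python
# Sketch: declaring NOT NULL, UNIQUE, PRIMARY KEY, CHECK and DEFAULT
# constraints, then watching SQLite reject rows that violate them.
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("""
    CREATE TABLE Student (
        Student_id INTEGER PRIMARY KEY,              -- NOT NULL + UNIQUE
        Email      TEXT UNIQUE,
        Name       TEXT NOT NULL,
        Marks      INTEGER CHECK (Marks BETWEEN 0 AND 100),
        Country    TEXT DEFAULT 'India'
    )
""")
# Country is omitted, so the DEFAULT value is filled in.
cur.execute("INSERT INTO Student (Student_id, Email, Name, Marks) "
            "VALUES (1, 'a@x.com', 'Guneet', 90)")

# Each of these violates one constraint and raises sqlite3.IntegrityError.
for bad in [
    "INSERT INTO Student VALUES (1, 'b@x.com', 'Ahan', 92, 'India')",  # duplicate PRIMARY KEY
    "INSERT INTO Student VALUES (2, 'a@x.com', 'Ahan', 92, 'India')",  # duplicate UNIQUE email
    "INSERT INTO Student VALUES (3, 'c@x.com', NULL, 92, 'India')",    # NOT NULL violated
    "INSERT INTO Student VALUES (4, 'd@x.com', 'Yash', 187, 'India')", # CHECK violated
]:
    try:
        cur.execute(bad)
    except sqlite3.IntegrityError as e:
        print("rejected:", e)

print(cur.execute("SELECT Country FROM Student WHERE Student_id = 1")
         .fetchone()[0])  # India  (DEFAULT applied)
```

Only the first, valid row survives; every violating action is aborted, exactly as described above.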

Integrity Constraints :-

o Integrity constraints are a set of rules. It is used to maintain the


quality of information.

o Integrity constraints ensure that the data insertion, updating, and
other processes have to be performed in such a way that data
integrity is not affected.
o Thus, integrity constraint is used to guard against accidental damage
to the database.

Types of Integrity Constraint


1. Domain constraints
o Domain constraints can be defined as the definition of a valid set of
values for an attribute.
o The data type of domain includes string, character, integer, time,
date, currency, etc. The value of the attribute must be available in
the corresponding domain.

Example:

2. Entity integrity constraints


o The entity integrity constraint states that primary key value can't be
null.
o This is because the primary key value is used to identify individual
rows in relation and if the primary key has a null value, then we can't
identify those rows.
o A table can contain a null value other than the primary key field.

Example:

3. Referential Integrity Constraints
o A referential integrity constraint is specified between two tables.
o In the Referential integrity constraints, if a foreign key in Table 1
refers to the Primary Key of Table 2, then every value of the Foreign
Key in Table 1 must be null or be available in Table 2.

Example:
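As a concrete sketch of the rule above, the Dept/Employee tables below are invented for illustration; note that SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON` is issued on the connection.

```python
# Sketch: a foreign key value must be NULL or present in the parent table.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.execute("CREATE TABLE Dept (dept_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("""CREATE TABLE Employee (
    emp_id  INTEGER PRIMARY KEY,
    dept_id INTEGER REFERENCES Dept(dept_id))""")

conn.execute("INSERT INTO Dept VALUES (10, 'Sales')")
conn.execute("INSERT INTO Employee VALUES (1, 10)")    # 10 exists in Dept -> accepted
conn.execute("INSERT INTO Employee VALUES (2, NULL)")  # NULL is allowed by the rule

try:
    conn.execute("INSERT INTO Employee VALUES (3, 99)")  # 99 is not in Dept
except sqlite3.IntegrityError as e:
    print("rejected:", e)  # foreign key constraint failed
```

Only the two rows whose foreign key is NULL or matches a Dept primary key are kept.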

4. Key constraints
o Keys are the entity set that is used to identify an entity within its
entity set uniquely.
o An entity set can have multiple keys, but out of them one key will
be the primary key. A primary key must contain unique values and
cannot contain a null value in the relational table.

DDL Commands & Syntax:-


In this article, we will discuss the overview of DDL commands and will
understand DDL commands like create, alter, truncate, drop. We will
cover each command syntax with the help of an example for better
understanding. Let’s discuss it one by one.
Overview :
Data Definition Language (DDL) is a subset of SQL and a part
of DBMS (Database Management System). DDL consists of commands
like CREATE, ALTER, TRUNCATE, DROP and RENAME. These commands
are used to create or modify the tables in SQL.

DDL Commands : In this section, We will cover the following DDL
commands as follows.
1. Create
2. Alter
3. Truncate
4. Drop
5. Rename
Let’s discuss it one by one.
Command-1 :
CREATE
This command is used to create a new table in SQL. The user has to give
information like table name, column names, and their datatypes.
Syntax –
CREATE TABLE table_name
(
column_1 datatype,
column_2 datatype,
column_3 datatype,
....
);
Example
We need to create a table for storing Student information of a particular
College. Create syntax would be as below.
CREATE TABLE Student_info
(
College_Id number(2),
College_name varchar(30),
Branch varchar(10)
);
Command-2
ALTER
This command is used to add, delete or change columns in the existing
table. The user needs to know the existing table name and can do add,
delete or modify tasks easily.
Syntax
Syntax to add a column to an existing table.
ALTER TABLE table_name
ADD column_name datatype;
Example
In our Student_info table, we want to add a new column for CGPA. The
syntax would be as below as follows.
ALTER TABLE Student_info
ADD CGPA number;

Command-3
TRUNCATE
This command is used to remove all rows from the table, but the structure
of the table still exists.
Syntax
Syntax to remove an existing table.
TRUNCATE TABLE table_name;
Example
The College Authority wants to remove the details of all students for new
batches but wants to keep the table structure. The command they can use
is as follows.
TRUNCATE TABLE Student_info;
Command-4
DROP
This command is used to remove an existing table along with its structure
from the Database.
Syntax
Syntax to drop an existing table.
DROP TABLE table_name;
Example
If the College Authority wants to change their Database by deleting the
Student_info Table.
DROP TABLE Student_info;
Command -5
RENAME:
It is possible to change the name of a table, with or without data in it,
using the simple RENAME command.
We can rename any table object at any point of time.
Syntax –
RENAME TABLE <Table Name> To <New_Table_Name>;
Example:
If you want to change the name of the table from Employee to Emp we
can use rename command as
RENAME TABLE Employee To EMP;
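The DDL commands above can be replayed against an in-memory SQLite database as a sketch. Two dialect differences to note: SQLite has no TRUNCATE statement (an unqualified DELETE is its equivalent) and spells the rename as ALTER TABLE ... RENAME TO.

```python
# Sketch: CREATE, ALTER, TRUNCATE-equivalent, RENAME and DROP in SQLite.
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# CREATE: the Student_info table from the example
cur.execute("""CREATE TABLE Student_info (
    College_Id   INTEGER,
    College_name VARCHAR(30),
    Branch       VARCHAR(10))""")

# ALTER: add the CGPA column
cur.execute("ALTER TABLE Student_info ADD COLUMN CGPA REAL")
cur.execute("INSERT INTO Student_info VALUES (1, 'ABC College', 'CSE', 9.1)")

# TRUNCATE-equivalent: remove all rows, keep the table structure
cur.execute("DELETE FROM Student_info")
print(cur.execute("SELECT COUNT(*) FROM Student_info").fetchone()[0])  # 0, table still exists

# RENAME: SQLite uses ALTER TABLE ... RENAME TO
cur.execute("ALTER TABLE Student_info RENAME TO Emp")

# DROP: remove the table along with its structure
cur.execute("DROP TABLE Emp")
```

After DROP, the table is gone entirely, whereas after the TRUNCATE-equivalent only the rows were gone.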

Data Manipulation Language (DML):-


A DML (data manipulation language) refers to a computer programming
language that allows you to add (insert), delete (delete), and alter (update)
data in a database. A DML is typically a sublanguage of a larger database
language like SQL, with the DML containing some of the language’s
operators. A DML (data manipulation language) is a group of computer
languages that provide commands for manipulating data in databases.
The majority of SQL statements are categorised as DML (Data Manipulation
Language), which includes the SQL commands that deal with modifying data
in a database. DML statements are sometimes grouped together with DCL
statements, which control who has access to the database and its data.
Because DML commands are not auto-committed, the changes they make are
not saved to the database permanently right away. There's a chance
they'll be rolled back.
Here are some different DML commands:

INSERT INTO Command


This command can be used to insert data into a row of a table. INSERT
INTO would insert the values that are mentioned in the ‘Student’ table
below.

Syntax:
INSERT INTO NAME_OF_TABLE (1_column, 2_column, 3_column, ….
N_column)
VALUES (1_value, 2_value, 3_value, …. N_value);
Or
INSERT INTO NAME_OF_TABLE
VALUES (1_value, 2_value, 3_value, …. N_value);

Example:
INSERT INTO Student(Stu_Name, DOB, Phone, Mail)
VALUES('Phoebe', '1998-05-26', 7812865845, 'user@xyz.com');

UPDATE Command
This statement in SQL is used to update the data that is present in an
existing table of a database. The UPDATE statement can be used to update
single or multiple columns on the basis of our specific needs.

Syntax:
UPDATE name_of_table SET 1_column = 1_value, 2_column = 2_value,
3_column = 3_value, … , N_column = N_value
WHERE condition;
And here,
name_of_table: name of the table
1_column, 2_column, 3_column, …. N_column: names of the first, second,
third, …. nth columns.
1_value, 2_value, 3_value, …. N_value: the new values for the first, second,
third, …. nth columns.
condition: the condition used to select those rows for which the column
values need to be updated.

Example:
UPDATE Student SET Phone = 9039462901 WHERE Stu_Name = 'Phoebe';
The WHERE clause in the preceding query is used to select the rows for
which the columns need to be adjusted, and the SET statement has been

used to assign new values to a particular column. If the WHERE clause is
not used at all, then all of the rows’ columns will be modified. As a result,
the WHERE clause is used to pick specific rows from the table.
Thus, the example query would update the phone number of the student
with the name ‘Phoebe’.

DELETE Command
The DELETE statement can be used in SQL to delete various records from a
given table. On the basis of the condition that has been set in the WHERE
clause, one can delete single or multiple records.

Syntax:
DELETE FROM name_of_table [WHERE condition];

Example:
DELETE FROM Student WHERE Stu_Name = 'Phoebe';
The command given above would delete the record for the student with
the name 'Phoebe' from the 'Student' table. Apart from this, one can also
use the LOCK TABLE statement to explicitly acquire a shared or exclusive
table lock on a specified table.
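The three DML statements above can be run end-to-end with sqlite3; the Student table below mirrors the examples in the text.

```python
# Sketch: INSERT, UPDATE and DELETE on the Student table from the examples.
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE Student (Stu_Name TEXT, DOB TEXT, Phone INTEGER, Mail TEXT)")

# INSERT: add one row
cur.execute("INSERT INTO Student (Stu_Name, DOB, Phone, Mail) "
            "VALUES ('Phoebe', '1998-05-26', 7812865845, 'user@xyz.com')")

# UPDATE: only the rows matching the WHERE clause are changed
cur.execute("UPDATE Student SET Phone = 9039462901 WHERE Stu_Name = 'Phoebe'")
print(cur.execute("SELECT Phone FROM Student WHERE Stu_Name = 'Phoebe'")
         .fetchone()[0])  # 9039462901

# DELETE: remove the matching record
cur.execute("DELETE FROM Student WHERE Stu_Name = 'Phoebe'")
print(cur.execute("SELECT COUNT(*) FROM Student").fetchone()[0])  # 0
```

Omitting the WHERE clause on the UPDATE or DELETE would have affected every row, which is why the clause matters.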

DCL commands in DBMS:-


Data control language (DCL) is used to control access to the stored data. It
is mainly used to grant and revoke the required access to a database for
users. In the database, this language does not have the feature of rollback.
It is a part of the structured query language (SQL).

It helps in controlling access to information stored in a

database. It complements the data manipulation language
(DML) and the data definition language (DDL).
• It is the simplest among the three command categories.
• It allows administrators to set and remove database
permissions for desired users as needed.
• These commands are employed to grant, remove and deny
permissions to users for retrieving and manipulating a
database.
DCL Commands
The Data Control Language (DCL) commands are as follows −
GRANT Command
It is employed to grant a privilege to a user. The GRANT command allows
specified users to perform specified tasks.
Syntax
GRANT privilege_name on objectname to user;
Here,
• privilege_name is one of SELECT, UPDATE, DELETE, INSERT, ALTER, ALL
• objectname is the table name
• user is the name of the user to whom we grant privileges

REVOKE Command
It is employed to remove a privilege from a user. REVOKE helps the owner
to cancel previously granted permissions.
Syntax
REVOKE privilege_name on objectname from user;
SQL Operators:-
An operator is a reserved word or a character that is used to query our
database in a SQL expression. To query a database using operators, we use
a WHERE clause. Operators are necessary to define a condition in SQL, as
they act as a connector between two or more conditions. The operator
manipulates the data and gives the result based on the operator’s
functionality.


What Are the Types of SQL Operators?

Generally, there are three types of operators that are used in SQL.

• Arithmetic Operators
• Comparison Operators
• Logical Operators

Now, let’s look at each one of them in detail.

1. Arithmetic SQL Operators

Arithmetic operators are used to perform arithmetic operations such as


addition, subtraction, division, and multiplication. These operators
usually accept numeric operands. Different operators that come under this
category are given below-

Operator   Operation        Description
+          Addition         Adds operands on either side of the operator
-          Subtraction      Subtracts the right-hand operand from the left-hand operand
*          Multiplication   Multiplies the values on each side
/          Division         Divides the left-hand operand by the right-hand operand
%          Modulus          Divides the left-hand operand by the right-hand operand and returns the remainder
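All five arithmetic operators can be evaluated directly in a SELECT, as this sqlite3 sketch shows; note that with integer operands, `/` performs truncating integer division in SQLite.

```python
# Sketch: the five SQL arithmetic operators evaluated in one SELECT.
import sqlite3

conn = sqlite3.connect(":memory:")
row = conn.execute("SELECT 7 + 2, 7 - 2, 7 * 2, 7 / 2, 7 % 2").fetchone()
print(row)  # (9, 5, 14, 3, 1) -- integer division truncates 3.5 down to 3
```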

2. Comparison SQL Operators

Comparison operators in SQL are used to check the equality of two


expressions. It checks whether one expression is identical to another.
Comparison operators are generally used in the WHERE clause of a SQL
query. The result of a comparison operation may be TRUE, FALSE or
UNKNOWN. When one or both of the expressions is NULL, the operator
returns UNKNOWN. These operators can be used on all types of
expressions except those that contain text, ntext or image data. The
table below shows different types of comparison operators in SQL:

3. Logical SQL Operators

Logical operators are those operators that take two expressions as
operands and return TRUE or FALSE as output. While working with complex
SQL statements and queries, logical operators come in handy, and these
operators work in much the same way as logic gates do.

Operator   Description
ALL        Compares a value to all other values in a set
AND        Returns the records if all the conditions separated by AND are TRUE
ANY        Compares a specific value to any other values in a set
SOME       Compares a value to each value in a set; similar to the ANY operator
LIKE       Returns the rows for which the operand matches a specific pattern
IN         Used to compare a value to a specified list of values
BETWEEN    Returns the rows for which the value lies between the mentioned range
NOT        Used to reverse the output of any logical operator
EXISTS     Used to search for a row in a specified table in the database
OR         Returns the records for which any of the conditions separated by OR is true
NULL       Returns the rows where the operand is NULL

Operation:-

The set operators UNION, INTERSECT and MINUS operate on relations,
corresponding to the relational algebra operations ∪, ∩ and −. Relations
participating in these operations must have the same set of attributes.
The syntax for the set operators is as follows −
<query1><set operator><query2>
Now, let us understand the set operators in the database management
system (DBMS).
UNION − It returns a table which consists of all rows either appearing in
the result of <query1> or in the result of <query2>
For example,

select ename from emp where job='manager' UNION select ename from
emp where job='analyst';
UNION ALL − It returns all rows selected by either query, including all
duplicates.
For example,
select salary from emp where job='manager' UNION ALL select salary from
emp where job='analyst';
INTERSECT − It returns all rows that appear in both results <query1> and
<query2>
For example,
select * from orderList1 INTERSECT select * from orderList2;
INTERSECT ALL − It is the same as INTERSECT, but duplicate rows selected
by both queries are not eliminated.
For example,
select * from orderList1 INTERSECT ALL select * from orderList2;
MINUS − It returns those rows which appear in result of <query1> but not
in the result of <query2>
For example,
select * from (select salary from emp where job='manager' MINUS select
salary from emp where job='CEO');
Example
Consider the step by step query given below −
Step 1
Create table T1(regno number(10), branch varchar2(10));
The output is given herewith: Table created.
Step 2
insert into T1 values(100,'CSE');
insert into T1 values(101,'CSE');
insert into T1 values(102,'CSE');
insert into T1 values(103,'CSE');
insert into T1 values(104,'CSE');
The output will be as follows: 5 rows inserted.
Step 3
create table T2 (regno number(10), branch varchar2(10));
The output is as follows: Table created.
Step 4
insert into T2 values(101,'CSE');
insert into T2 values(102,'CSE');
insert into T2 values(103,'CSE');

The output is given herewith: 3 rows inserted.
Step 5
select * from T1;
Output
You will get the following output −
100|CSE
101|CSE
102|CSE
103|CSE
104|CSE
Step 6
select * from T2;
Output
You will get the following output −
101|CSE
102|CSE
103|CSE
Application of set operators
Now apply the set operators on the two tables which are created above.
The syntax for use of set operators is as follows −
select columnname(s) from tablename1 operatorname select
columnname(s) from tablename2;
Union
Given below is the command for usage of Union set operator −
select regno from T1 UNION select regno from T2;
Output
You will get the following output −
100
101
102
103
104
Intersect
Given below is the command for usage of Intersect set operator −
select regno from T1 INTERSECT select regno from T2;
Output
You will get the following output −

101
102
103
Minus
Given below is the command for usage of Minus set operator −
select regno from T1 MINUS select regno from T2;
Output
You will get the following output −
100
104

Aggregate Functions in DBMS;-


Aggregate functions in DBMS take multiple rows from the table and return
a single value according to the query.
All the aggregate functions are used in the SELECT statement.
Syntax −
SELECT <FUNCTION NAME> (<PARAMETER>) FROM <TABLE NAME>;
AVG Function
This function returns the average value of the numeric column that is
supplied as a parameter.
Example: Write a query to select average salary from employee table.
Select AVG(salary) from Employee
COUNT Function
The count function returns the number of rows in the result. It does not
count the null values.
Example: Write a query to return number of rows where salary > 20000.
Select COUNT(*) from Employee where Salary > 20000;
Types −
• COUNT(*): Counts all rows of the table, including those with null
values.
• COUNT(COLUMN_NAME): Counts the number of non-null values in the
column.
• COUNT(DISTINCT COLUMN_NAME): Counts the number of distinct
values in a column.
MAX Function
The MAX function is used to find maximum value in the column that is
supplied as a parameter. It can be used on any type of data.
Example − Write a query to find the maximum salary in employee table.
Select MAX(salary) from Employee

SUM Function
This function sums up the values in the column supplied as a parameter.
Example: Write a query to get the total salary of employees.
Select SUM(salary) from Employee
STDDEV Function
The STDDEV function is used to find standard deviation of the column
specified as argument.
Example − Write a query to find standard deviation of salary in Employee
table.
Select STDDEV(salary) from Employee
VARIANCE Function
The VARIANCE Function is used to find variance of the column specified
as argument.
Example −
Select VARIANCE(salary) from Employee
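The aggregates above can be tried on a small, made-up Employee table with sqlite3. Note that SQLite ships AVG, COUNT, MAX and SUM but has no built-in STDDEV or VARIANCE (those are Oracle functions), so this sketch covers only the first four.

```python
# Sketch: AVG, COUNT, MAX and SUM over a tiny Employee table.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (name TEXT, salary INTEGER)")
conn.executemany("INSERT INTO Employee VALUES (?, ?)",
                 [("A", 10000), ("B", 25000), ("C", 30000), ("D", None)])

print(conn.execute("SELECT AVG(salary) FROM Employee").fetchone()[0])        # NULL salary is ignored
print(conn.execute("SELECT COUNT(*) FROM Employee").fetchone()[0])           # 4 -- counts NULL rows too
print(conn.execute("SELECT COUNT(salary) FROM Employee").fetchone()[0])      # 3 -- skips NULLs
print(conn.execute("SELECT COUNT(*) FROM Employee "
                   "WHERE salary > 20000").fetchone()[0])                    # 2
print(conn.execute("SELECT MAX(salary), SUM(salary) FROM Employee")
         .fetchone())                                                        # (30000, 65000)
```

The COUNT(*) vs COUNT(salary) difference demonstrates the null-handling rule from the Types list above.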

Constraints in DBMS:-

Constraints in DBMS are the restrictions that are applied to data or


operations on the data. This means that constraints allow only a particular
kind of data to be inserted in the database or only some particular kind of
operations to be performed on the data in the database.

Thus, constraints ensure the correctness of data in a Database


Management System (DBMS).

Now, let us discuss the different types of constraints in DBMS.

Types of Constraints in DBMS

In relational databases, there are mainly 5 types of constraints in DBMS


called relational constraints. They are as follows:

1. Domain Constraints

2. Key Constraints

3. Entity Integrity Constraints

4. Referential Integrity Constraints

5. Tuple Uniqueness Constraints

We will discuss all the constraints in DBMS one by one.

Domain Constraints in DBMS

1. The domain means a range of values. In mathematics, the concept of


Domain means the allowed values for a function.

2. Similarly, in DBMS, the Domain Constraint specifies the domain or


set of values.

3. This is a constraint applied to attributes, not tuples. This means that


it defines what values are allowed to be kept inside a particular
column (attribute) for a table.

4. The domain constraint specifies that the value of an attribute must


be an atomic value in its own domain.

Consider the following example table.

Student ID Student Name Marks (in %)

1 Guneet 90

2 Ahan 92

3 Yash 87

4 Lavish 90

5 Ashish 79

So, we can say that this is a valid table. This is because the Student ID
attribute is meant to hold only integers, and it does hold only integers.
Also, the names can be strings only and the marks can be integers or
floating values only. So, every attribute of every tuple in this table has its
value within its domain.

Now, consider the table shown below.

Student ID Student Name Marks (in %)

1 Guneet 90

2 Ahan 92

3 Yash 87

4 Lavish A

5 Ashish 79

Now, in the table above, the tuple with Student ID = 4 and name = “Lavish”
has marks = A. This is not an integer or float value. So, the domain
constraint is violated here.

Types of Domain constraints in DBMS

1. Not Null: The values that are not assigned or are unknown can be kept
null. These can be the default values for an attribute if the answer is not
known. For instance, in the above table, if we don't know the marks of the
student with Student ID = 1, we can keep null in the marks attribute.
However, certain attributes cannot be null. For instance, the Student ID is
a must-known value and it can never be null, as we can identify each
student uniquely only with the help of Student ID. So, we can apply a "not
null" constraint on the Student ID attribute in the above table.

Create table Students

(Student_id NUMBER not null,

Student_name varchar(30),

marks NUMBER);

2. Check: Let us say we have a class of students. Now, the school decides
that only the students with marks greater than 35% will be declared
qualified in the current class. Also, they decide that only the results of
those students will be entered in the table who have passed the class. So,
they want to check whether the marks of a student are greater than 35% or
not. If not, that student’s data will not be entered in the table. So, for this
kind of constraint application, the Domain Constraint – Check is used.

Create table Students

(Student_id NUMBER not null,

Student_name varchar(30),

marks NUMBER check(marks > 35))

So, this is the Domain Constraint in DBMS. Now, let us move ahead and
study the next constraint i.e. the Tuple Uniqueness Constraint.

Tuple Uniqueness Constraint in DBMS

1. This is a very simple constraint. Tuple in DBMS means row or record.

2. As the name suggests, the tuple uniqueness constraint in DBMS


specifies that each tuple in the table must be unique.

3. A tuple is said to be duplicate if all the corresponding attribute


values of that tuple are present in some other tuple simultaneously
in the table.

For instance, let us consider the table shown below.

Student ID Student Name Marks (in %)

1 Guneet 90

2 Ahan 92

3 Yash 87

4 Lavish 90

5 Ashish 79

In this table, all the tuples are unique. So, we can say that the tuple
uniqueness constraint is not violated here.

Consider the table shown below.

Student ID Student Name Marks (in %)

1 Guneet 90

2 Ahan 92

1 Guneet 90

4 Lavish 90

5 Ashish 79

In the above table, tuples 1 and 3 have all the same attribute values. Thus
these 2 tuples are not unique, or we can simply say that they are duplicate
tuples. So, the tuple uniqueness constraint is violated here.

We can use the UNIQUE keyword for applying the tuple uniqueness
constraint.

Key Constraint in DBMS

1. As the name suggests, this is a constraint applied on an attribute that


we consider to be a primary key. So, the conditions for a primary key
in a table is in fact this constraint.

2. So, we know that a primary key cannot be null.

3. Also, a primary key must be unique.

Consider the table shown below

Student ID Student Name Marks (in %)

1 Guneet 90

2 Ahan 92

3 Yash 87

4 Lavish 90

5 Ashish 79

Here, the Student_ID is a primary key. So, we can see that all the tuples
have a unique value for Student_ID and there is no tuple in which the
Student_ID attribute is null. So, in the above table, the attribute Student_ID
satisfies the Key constraint.

Create table Students

(Student_ID NUMBER not null,

Student_name varchar(30),

marks NUMBER,

PRIMARY KEY (Student_ID))

Consider the table shown below.

Student ID Student Name Marks (in %)

1 Guneet 90

2 Ahan 92

1 Yash 87

4 Lavish 90

null Ashish 79

In the table above, tuples 1 and 3 have the value Student_ID = 1. So, the
primary key constraint is violated here, as the values are not unique.

Also, in the 5th tuple, we can see that the value of Student_ID is null. This
can’t be allowed as the Student_ID is the primary key. Hence, we can say
that the Key constraint is violated there as well.

Views:- A database view is a subset of a database and is based on a query


that runs on one or more database tables. Database views are saved in the
database as named queries and can be used to save frequently used,
complex queries.

There are two types of database views: dynamic views and static views.
Dynamic views can contain data from one or two tables and automatically
include all of the columns from the specified table or tables. Dynamic
views are automatically updated when related objects or extended objects
are created or changed. Static views can contain data from multiple tables
and the required columns from these tables must be specified in the
SELECT and WHERE clauses of the static view. Static views must be
manually updated when related objects or extended objects are created or
changed.

When you create a dynamic view with data from two tables, you must
ensure that both tables have the same PRIMARYKEYCOLSEQ columns or
contain unique indexes with the same column name in the same order.

Database views are populated depending on the object on which they are
based. For example, if you add or remove an attribute from the
WORKORDER object, the attribute is either added or removed from the
dynamic view that is based on the object. When you change an attribute,
not all changes are applied to the associated database view. For example,
if you change the data type of an attribute, the change is applied to the
database view. However, if you change or add a domain to the default value
of the WORKORDER object, the change is not automatically applied to the
database view. Instead, you must apply this change to the database view.

A view is nothing more than a SQL statement that is stored in the database
with an associated name. A view is actually a composition of a table in the
form of a predefined SQL query.
A view can contain all rows of a table or select rows from a table. A view
can be created from one or many tables which depends on the written SQL
query to create a view.
Views, which are a type of virtual tables allow users to do the following −
• Structure data in a way that users or classes of users find
natural or intuitive.
• Restrict access to the data in such a way that a user can see and
(sometimes) modify exactly what they need and no more.
• Summarize data from various tables which can be used to
generate reports.
Creating Views
Database views are created using the CREATE VIEW statement. Views can
be created from a single table, multiple tables or another view.
To create a view, a user must have the appropriate system privilege
according to the specific implementation.
The basic CREATE VIEW syntax is as follows −
CREATE VIEW view_name AS
SELECT column1, column2.....
FROM table_name
WHERE [condition];
You can include multiple tables in your SELECT statement in a similar way
as you use them in a normal SQL SELECT query.
Example
Consider the CUSTOMERS table having the following records −

+----+----------+-----+-----------+----------+
| ID | NAME | AGE | ADDRESS | SALARY |
+----+----------+-----+-----------+----------+
| 1 | Ramesh | 32 | Ahmedabad | 2000.00 |
| 2 | Khilan | 25 | Delhi | 1500.00 |
| 3 | kaushik | 23 | Kota | 2000.00 |
| 4 | Chaitali | 25 | Mumbai | 6500.00 |
| 5 | Hardik | 27 | Bhopal | 8500.00 |
| 6 | Komal | 22 | MP | 4500.00 |
| 7 | Muffy | 24 | Indore | 10000.00 |
+----+----------+-----+-----------+----------+
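As a sketch, the CUSTOMERS table above can be loaded into sqlite3 and a view created over it; the view name and its SALARY filter are invented for illustration.

```python
# Sketch: CREATE VIEW over the CUSTOMERS table shown above.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CUSTOMERS "
             "(ID INTEGER, NAME TEXT, AGE INTEGER, ADDRESS TEXT, SALARY REAL)")
conn.executemany("INSERT INTO CUSTOMERS VALUES (?,?,?,?,?)", [
    (1, 'Ramesh',   32, 'Ahmedabad',  2000.00),
    (2, 'Khilan',   25, 'Delhi',      1500.00),
    (3, 'kaushik',  23, 'Kota',       2000.00),
    (4, 'Chaitali', 25, 'Mumbai',     6500.00),
    (5, 'Hardik',   27, 'Bhopal',     8500.00),
    (6, 'Komal',    22, 'MP',         4500.00),
    (7, 'Muffy',    24, 'Indore',    10000.00),
])

# The view hides ADDRESS and SALARY and restricts the rows users can see.
conn.execute("CREATE VIEW CUSTOMERS_VIEW AS "
             "SELECT ID, NAME, AGE FROM CUSTOMERS WHERE SALARY > 5000")

view_rows = list(conn.execute("SELECT * FROM CUSTOMERS_VIEW ORDER BY ID"))
for row in view_rows:
    print(row)  # (4, 'Chaitali', 25) / (5, 'Hardik', 27) / (7, 'Muffy', 24)
```

Querying the view is just like querying a table, but users of it never see the salary data, illustrating the restrict-access point above.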

nested queries in DBMS:-


A nested query is a query that has another query embedded within it. The
embedded query is called a subquery.
A subquery typically appears within the WHERE clause of a query. It can
sometimes appear in the FROM clause or HAVING clause.

Example
Let’s learn about nested queries with the help of an example.
Find the names of employees who have regno = 103.
The query is as follows −
select E.ename from employee E where E.eid IN (select S.eid from salary S
where S.regno=103);
Student table
The student table is created as follows −

create table student(id number(10), name varchar2(20),classID


number(10), marks varchar2(20));
Insert into student values(1,'pinky',3,2.4);
Insert into student values(2,'bob',3,1.44);
Insert into student values(3,'Jam',1,3.24);
Insert into student values(4,'lucky',2,2.67);
Insert into student values(5,'ram',2,4.56);
select * from student;
Output
You will get the following output −
Id Name classID Marks
1 Pinky 3 2.4
2 Bob 3 1.44
3 Jam 1 3.24
4 Lucky 2 2.67
5 Ram 2 4.56
Teacher table
The teacher table is created as follows −
Example

Create table teacher(id number(10), name varchar(20), subject


varchar2(10), classID number(10), salary number(30));
Insert into teacher values(1,'bhanu','computer',3,5000);
Insert into teacher values(2,'rekha','science',1,5000);
Insert into teacher values(3,'siri','social',NULL,4500);
Insert into teacher values(4,'kittu','maths',2,5500);
select * from teacher;
Output
You will get the following output −
Id Name Subject classID Salary
1 Bhanu Computer 3 5000
2 Rekha Science 1 5000
3 Siri Social NULL 4500
4 Kittu Maths 2 5500

Class table
The class table is created as follows −
Example

Create table class(id number(10), grade number(10), teacherID number(10),


noofstudents number(10));
insert into class values(1,8,2,20);
insert into class values(2,9,3,40);
insert into class values(3,10,1,38);
select * from class;
Output
You will get the following output −
Id Grade teacherID No.ofstudents
1 8 2 20
2 9 3 40
3 10 1 38
Now let’s work on nested queries
Example 1

Select AVG(noofstudents) from class where teacherID IN(


Select id from teacher
Where subject='science' OR subject='maths');
Output
You will get the following output −
20.0
Example 2

SELECT * FROM student


WHERE classID = (
SELECT id
FROM class
WHERE noofstudents = (
SELECT MAX(noofstudents)
FROM class));
Output
You will get the following output −
4|lucky |2|2.67
5|ram |2|4.56
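Example 2 above can be reproduced with sqlite3: the student and class tables are built exactly as in the text, and the nested query returns the students in the class with the most students (an ORDER BY is added only to make the output order deterministic).

```python
# Sketch: the nested query of Example 2, run against the text's tables.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student "
             "(id INTEGER, name TEXT, classID INTEGER, marks REAL)")
conn.executemany("INSERT INTO student VALUES (?,?,?,?)", [
    (1, 'pinky', 3, 2.4), (2, 'bob', 3, 1.44), (3, 'Jam', 1, 3.24),
    (4, 'lucky', 2, 2.67), (5, 'ram', 2, 4.56)])
conn.execute("CREATE TABLE class "
             "(id INTEGER, grade INTEGER, teacherID INTEGER, noofstudents INTEGER)")
conn.executemany("INSERT INTO class VALUES (?,?,?,?)",
                 [(1, 8, 2, 20), (2, 9, 3, 40), (3, 10, 1, 38)])

# Innermost subquery finds the largest class size (40), the middle one maps
# it to the class id (2), and the outer query lists that class's students.
rows = conn.execute("""
    SELECT * FROM student
    WHERE classID = (SELECT id FROM class
                     WHERE noofstudents = (SELECT MAX(noofstudents) FROM class))
    ORDER BY id
""").fetchall()
print(rows)  # [(4, 'lucky', 2, 2.67), (5, 'ram', 2, 4.56)]
```

The result matches the output shown above: lucky and ram, the two students of class 2.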

PL/SQL – Cursors:-
In this chapter, we will discuss the cursors in PL/SQL. Oracle creates a
memory area, known as the context area, for processing an SQL statement,

which contains all the information needed for processing the statement;
for example, the number of rows processed, etc.
A cursor is a pointer to this context area. PL/SQL controls the context area
through a cursor. A cursor holds the rows (one or more) returned by a SQL
statement. The set of rows the cursor holds is referred to as the active set.
You can name a cursor so that it could be referred to in a program to fetch
and process the rows returned by the SQL statement, one at a time. There
are two types of cursors −

• Implicit cursors
• Explicit cursors
Implicit Cursors
Implicit cursors are automatically created by Oracle whenever an SQL
statement is executed, when there is no explicit cursor for the statement.
Programmers cannot control the implicit cursors and the information in it.
Whenever a DML statement (INSERT, UPDATE and DELETE) is issued, an
implicit cursor is associated with this statement. For INSERT operations,
the cursor holds the data that needs to be inserted. For UPDATE and
DELETE operations, the cursor identifies the rows that would be affected.
In PL/SQL, you can refer to the most recent implicit cursor as the SQL
cursor, which always has attributes such as %FOUND, %ISOPEN,
%NOTFOUND, and %ROWCOUNT. The SQL cursor has additional
attributes, %BULK_ROWCOUNT and %BULK_EXCEPTIONS, designed for
use with the FORALL statement. The following table provides the
description of the most used attributes −
S.No Attribute & Description
1 %FOUND
Returns TRUE if an INSERT, UPDATE, or DELETE statement affected
one or more rows or a SELECT INTO statement returned one or
more rows. Otherwise, it returns FALSE.
2 %NOTFOUND
The logical opposite of %FOUND. It returns TRUE if an INSERT,
UPDATE, or DELETE statement affected no rows, or a SELECT INTO
statement returned no rows. Otherwise, it returns FALSE.
3 %ISOPEN
Always returns FALSE for implicit cursors, because Oracle closes
the SQL cursor automatically after executing its associated SQL
statement.
4 %ROWCOUNT
Returns the number of rows affected by an INSERT, UPDATE, or
DELETE statement, or returned by a SELECT INTO statement.

Any SQL cursor attribute is accessed as sql%attribute_name, as shown in
the example below.

Example
We will be using the CUSTOMERS table we had created and used in the
previous chapters.
Select * from customers;

+----+----------+-----+-----------+----------+
| ID | NAME | AGE | ADDRESS | SALARY |
+----+----------+-----+-----------+----------+
| 1 | Ramesh | 32 | Ahmedabad | 2000.00 |
| 2 | Khilan | 25 | Delhi | 1500.00 |
| 3 | kaushik | 23 | Kota | 2000.00 |
| 4 | Chaitali | 25 | Mumbai | 6500.00 |
| 5 | Hardik | 27 | Bhopal | 8500.00 |
| 6 | Komal | 22 | MP | 4500.00 |
+----+----------+-----+-----------+----------+
The following program will update the table and increase the salary of each
customer by 500 and use the SQL%ROWCOUNT attribute to determine the
number of rows affected −

DECLARE
total_rows number(2);
BEGIN
UPDATE customers
SET salary = salary + 500;
IF sql%notfound THEN
dbms_output.put_line('no customers selected');
ELSIF sql%found THEN
total_rows := sql%rowcount;
dbms_output.put_line( total_rows || ' customers selected ');
END IF;
END;
/
When the above code is executed at the SQL prompt, it produces the
following result −
6 customers selected

PL/SQL procedure successfully completed.


If you check the records in customers table, you will find that the rows
have been updated −
Select * from customers;

+----+----------+-----+-----------+----------+
| ID | NAME | AGE | ADDRESS | SALARY |
+----+----------+-----+-----------+----------+
| 1 | Ramesh | 32 | Ahmedabad | 2500.00 |
| 2 | Khilan | 25 | Delhi | 2000.00 |
| 3 | kaushik | 23 | Kota | 2500.00 |
| 4 | Chaitali | 25 | Mumbai | 7000.00 |
| 5 | Hardik | 27 | Bhopal | 9000.00 |
| 6 | Komal | 22 | MP | 5000.00 |
+----+----------+-----+-----------+----------+
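The same idea — an implicit cursor whose row count can be inspected after a DML statement — appears in most database client APIs. A sketch with Python's sqlite3, where cursor.rowcount plays the role of SQL%ROWCOUNT (table contents as above):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE customers(id INTEGER, name TEXT, salary REAL)")
con.executemany("INSERT INTO customers VALUES (?,?,?)", [
    (1, "Ramesh", 2000), (2, "Khilan", 1500), (3, "kaushik", 2000),
    (4, "Chaitali", 6500), (5, "Hardik", 8500), (6, "Komal", 4500),
])

# Like SQL%ROWCOUNT, cursor.rowcount reports how many rows the UPDATE hit.
cur = con.execute("UPDATE customers SET salary = salary + 500")
print(cur.rowcount, "customers selected")  # 6 customers selected
```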
Explicit Cursors
Explicit cursors are programmer-defined cursors for gaining more control
over the context area. An explicit cursor should be defined in the
declaration section of the PL/SQL Block. It is created on a SELECT Statement
which returns more than one row.
The syntax for creating an explicit cursor is −
CURSOR cursor_name IS select_statement;
Working with an explicit cursor includes the following steps −

• Declaring the cursor for initializing the memory
• Opening the cursor for allocating the memory
• Fetching the cursor for retrieving the data
• Closing the cursor to release the allocated memory
Declaring the Cursor
Declaring the cursor defines the cursor with a name and the associated
SELECT statement. For example −

CURSOR c_customers IS
SELECT id, name, address FROM customers;
Opening the Cursor
Opening the cursor allocates the memory for the cursor and makes it ready
for fetching the rows returned by the SQL statement into it. For example,
we will open the above defined cursor as follows −

OPEN c_customers;
Fetching the Cursor
Fetching the cursor involves accessing one row at a time. For example, we
will fetch rows from the above-opened cursor as follows −

FETCH c_customers INTO c_id, c_name, c_addr;


Closing the Cursor
Closing the cursor means releasing the allocated memory. For example, we
will close the above-opened cursor as follows −

CLOSE c_customers;
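The declare/open/fetch/close cycle maps naturally onto cursor objects in client APIs. A sketch with Python's sqlite3 and a cut-down CUSTOMERS table:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE customers(id INTEGER, name TEXT, address TEXT)")
con.executemany("INSERT INTO customers VALUES (?,?,?)",
                [(1, "Ramesh", "Ahmedabad"), (2, "Khilan", "Delhi")])

# "Declare"/"open": executing the SELECT gives a cursor over its rows.
cur = con.execute("SELECT id, name, address FROM customers")

# "Fetch": retrieve one row at a time until the active set is exhausted.
rows = []
while True:
    row = cur.fetchone()
    if row is None:
        break
    rows.append(row)

# "Close": release the resources held by the cursor.
cur.close()
print(rows)  # [(1, 'Ramesh', 'Ahmedabad'), (2, 'Khilan', 'Delhi')]
```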

Trigger:-
A trigger is a stored procedure in a database that is automatically
invoked whenever a specified event occurs in the database. For example, a
trigger can be invoked when a row is inserted into a specified table or
when certain table columns are being updated.
Syntax:
create trigger [trigger_name]
[before | after]
{insert | update | delete}
on [table_name]
[for each row]
[trigger_body]
Explanation of syntax:
1. create trigger [trigger_name]: Creates or replaces an existing
trigger with the trigger_name.
2. [before | after]: This specifies when the trigger will be executed.
3. {insert | update | delete}: This specifies the DML operation.
4. on [table_name]: This specifies the name of the table associated
with the trigger.
5. [for each row]: This specifies a row-level trigger, i.e., the trigger
will be executed for each row being affected.
6. [trigger_body]: This provides the operation to be performed when
the trigger is fired.
BEFORE and AFTER of Trigger:
BEFORE triggers run the trigger action before the triggering statement is
run. AFTER triggers run the trigger action after the triggering statement
is run.
Example:
Consider a Student Report database in which students' marks assessments
are recorded. In this schema, create a trigger so that the total and
percentage of the specified marks are automatically inserted whenever a
record is inserted. Here, as the trigger fires before the record is
inserted, the BEFORE tag is used.
Suppose the database Schema –
mysql> desc Student;
+-------+-------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------+-------------+------+-----+---------+----------------+
| tid | int(4) | NO | PRI | NULL | auto_increment |
| name | varchar(30) | YES | | NULL | |
| subj1 | int(2) | YES | | NULL | |
| subj2 | int(2) | YES | | NULL | |
| subj3 | int(2) | YES | | NULL | |
| total | int(3) | YES | | NULL | |
| per | int(3) | YES | | NULL | |
+-------+-------------+------+-----+---------+----------------+
7 rows in set (0.00 sec)
The SQL trigger for the problem statement follows. Note that inside a
row-level trigger the incoming row is referenced through NEW, not through
the table name:
create trigger stud_marks
before INSERT
on Student
for each row
set NEW.total = NEW.subj1 + NEW.subj2 + NEW.subj3,
    NEW.per = NEW.total * 60 / 100;
The above SQL statement creates a trigger in the student database:
whenever subject marks are entered, the trigger computes the total and the
percentage before the data is inserted into the database, and stores them
along with the entered values. i.e.,
mysql> insert into Student values(0, "ABCDE", 20, 20, 20, 0, 0);
Query OK, 1 row affected (0.09 sec)

mysql> select * from Student;


+-----+-------+-------+-------+-------+-------+------+
| tid | name | subj1 | subj2 | subj3 | total | per |
+-----+-------+-------+-------+-------+-------+------+
| 100 | ABCDE | 20 | 20 | 20 | 60 | 36 |
+-----+-------+-------+-------+-------+-------+------+
1 row in set (0.00 sec)
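The same before-insert computation can be checked with Python's sqlite3 module. SQLite does not allow assigning to NEW columns in a BEFORE trigger, so this sketch uses an AFTER INSERT trigger that updates the freshly inserted row; table and column names follow the example above:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Student(tid INTEGER PRIMARY KEY, name TEXT,
                     subj1 INT, subj2 INT, subj3 INT, total INT, per INT);

-- SQLite cannot assign to NEW columns in a BEFORE trigger, so an
-- AFTER INSERT trigger updates the freshly inserted row instead.
CREATE TRIGGER stud_marks AFTER INSERT ON Student
FOR EACH ROW
BEGIN
    UPDATE Student
    SET total = NEW.subj1 + NEW.subj2 + NEW.subj3,
        per   = (NEW.subj1 + NEW.subj2 + NEW.subj3) * 60 / 100
    WHERE tid = NEW.tid;
END;
""")

con.execute("INSERT INTO Student(name, subj1, subj2, subj3) "
            "VALUES ('ABCDE', 20, 20, 20)")
row = con.execute("SELECT total, per FROM Student").fetchone()
print(row)  # (60, 36)
```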

What is RDBMS?
RDBMS stands for Relational Database Management System. RDBMS is the
basis for SQL, and for all modern database systems like MS SQL Server, IBM
DB2, Oracle, MySQL, and Microsoft Access.
A Relational database management system (RDBMS) is a database
management system (DBMS) that is based on the relational model as
introduced by E. F. Codd in 1970.
What is a table?
The data in an RDBMS is stored in database objects called tables. A table
is basically a collection of related data entries and consists of numerous
columns and rows.
Remember, a table is the most common and simplest form of data storage
in a relational database. Following is an example of a CUSTOMERS table
which stores a customer's ID, Name, Age, Salary, City and Country −
ID Name Age Salary City Country
1 Ramesh 32 2000.00 Hyderabad India
2 Mukesh 40 5000.00 New York USA
3 Sumit 45 4500.00 Muscat Oman
4 Kaushik 25 2500.00 Kolkata India
5 Hardik 29 3500.00 Bhopal India
6 Komal 38 3500.00 Saharanpur India
7 Ayush 25 3500.00 Delhi India
8 Javed 29 3700.00 Delhi India
What is a field?
Every table is broken up into smaller entities called fields. For example,
our CUSTOMERS table consists of the fields ID, Name, Age, Salary, City and
Country.
A field is a column in a table that is designed to maintain specific
information about every record in the table.
What is a Record or a Row?
A record, also called a row of data, is each individual entry that exists
in a table. For example, there are 8 records in the above CUSTOMERS table.
Following is a single row of data or record in the CUSTOMERS table −
ID Name Age Salary City Country
1 Ramesh 32 2000.00 Hyderabad India
A record is a horizontal entity in a table.
What is a column?
A column is a vertical entity in a table that contains all information
associated with a specific field in a table.
For example, our CUSTOMERS table has different columns to represent
ID, Name, Age, Salary, City and Country.
What is a NULL value?
A NULL value in a table is a value in a field that appears to be blank, which
means a field with a NULL value is a field with no value.
It is very important to understand that a NULL value is different from a
zero value or a field that contains spaces. A field with a NULL value is
one that has been left blank during record creation. The following table
has three records, where the first record has a NULL value for the salary
and the second record has a zero value for the salary.
ID Name Age Salary City Country
1 Ramesh 32 Hyderabad India
2 Mukesh 40 00.00 New York USA
3 Sumit 45 4500.00 Muscat Oman
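The NULL-versus-zero distinction is easy to demonstrate with Python's sqlite3 module: a `= 0` predicate finds the zero salary but not the NULL one, which needs `IS NULL`.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE customers(id INTEGER, name TEXT, salary REAL)")
con.executemany("INSERT INTO customers VALUES (?,?,?)", [
    (1, "Ramesh", None),   # NULL salary: no value recorded at all
    (2, "Mukesh", 0.0),    # zero salary: a real, stored value
    (3, "Sumit", 4500.0),
])

# '= 0' matches only the zero salary; NULL rows require 'IS NULL'.
zero_rows = con.execute(
    "SELECT name FROM customers WHERE salary = 0").fetchall()
null_rows = con.execute(
    "SELECT name FROM customers WHERE salary IS NULL").fetchall()
print(zero_rows)  # [('Mukesh',)]
print(null_rows)  # [('Ramesh',)]
```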
SQL Constraints
Constraints are the rules enforced on data columns on a table. These are
used to limit the type of data that can go into a table. This ensures the
accuracy and reliability of the data in the database.
Constraints can either be column level or table level. Column level
constraints are applied only to one column whereas, table level constraints
are applied to the entire table.
Following are some of the most commonly used constraints available in
SQL −
S.N. Constraints
1 NOT NULL Constraint
Ensures that a column cannot have a NULL value.

2 DEFAULT Constraint
Provides a default value for a column when none is specified.
3 UNIQUE Constraint
Ensures that all the values in a column are different.
4 PRIMARY Key
Uniquely identifies each row/record in a database table.
5 FOREIGN Key
Uniquely identifies a row/record in another database table.
6 CHECK Constraint
Ensures that all values in a column satisfy certain conditions.
7 INDEX
Used to create and retrieve data from the database very quickly.
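Several of these constraints can be exercised with Python's sqlite3 module (the table and data below are hypothetical); each failing INSERT raises IntegrityError:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE customers(
    id      INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    age     INTEGER CHECK (age >= 18),
    email   TEXT UNIQUE,
    country TEXT DEFAULT 'India')""")

# One valid row; the DEFAULT constraint fills in the country.
con.execute("INSERT INTO customers(name, age, email) "
            "VALUES ('Ramesh', 32, 'r@x.in')")

violations = []
for stmt in [
    "INSERT INTO customers(name, age) VALUES (NULL, 30)",            # NOT NULL
    "INSERT INTO customers(name, age) VALUES ('Ayush', 10)",         # CHECK
    "INSERT INTO customers(name, email) VALUES ('Javed', 'r@x.in')", # UNIQUE
]:
    try:
        con.execute(stmt)
    except sqlite3.IntegrityError as e:
        violations.append(str(e))

print(len(violations))  # 3 — each insert breaks exactly one constraint
country = con.execute("SELECT country FROM customers").fetchone()[0]
print(country)  # India (DEFAULT applied)
```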
Data Integrity
The following categories of data integrity exist with each RDBMS −
• Entity Integrity − This ensures that there are no duplicate rows
in a table.
• Domain Integrity − Enforces valid entries for a given column
by restricting the type, the format, or the range of values.
• Referential Integrity − Rows that are referenced by other
records cannot be deleted.
• User-Defined Integrity − Enforces some specific business
rules that do not fall into entity, domain or referential integrity.
Database Normalization
Database normalization is the process of efficiently organizing data in a
database. There are two reasons for this normalization process −
• Eliminating redundant data, for example, storing the same data
in more than one table.
• Ensuring data dependencies make sense.
Both these reasons are worthy goals, as they reduce the amount of space a
database consumes and ensure that data is logically stored.
Normalization consists of a series of guidelines that help guide you in
creating a good database structure.
Normalization guidelines are divided into normal forms; think of a form
as the format or the way a database structure is laid out. The aim of normal
forms is to organize the database structure, so that it complies with the
rules of first normal form, then second normal form and finally the third
normal form.
It is your choice to take it further and go to the Fourth Normal Form, Fifth
Normal Form and so on, but in general, the Third Normal Form is more
than enough for a normal Database Application.
• First Normal Form (1NF)
• Second Normal Form (2NF)
• Third Normal Form (3NF)
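As a toy sketch of what normalization buys (a hypothetical department/employee schema, using Python's sqlite3): department facts are stored once instead of being repeated on every employee row, and a join reconstructs the flat view on demand.

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Normalized: each department fact is stored once and referenced by key,
# instead of repeating dept_name on every employee row.
con.executescript("""
CREATE TABLE department(dept_id INTEGER PRIMARY KEY, dept_name TEXT);
CREATE TABLE employee(emp_id INTEGER PRIMARY KEY, name TEXT,
                      dept_id INTEGER REFERENCES department(dept_id));
INSERT INTO department VALUES (10, 'Sales'), (20, 'HR');
INSERT INTO employee VALUES (1, 'Ramesh', 10), (2, 'Khilan', 10),
                            (3, 'Komal', 20);
""")

# A join recovers the flat (denormalized) view whenever it is needed.
rows = con.execute("""
    SELECT e.name, d.dept_name
    FROM employee e JOIN department d ON e.dept_id = d.dept_id
    ORDER BY e.emp_id
""").fetchall()
print(rows)  # [('Ramesh', 'Sales'), ('Khilan', 'Sales'), ('Komal', 'HR')]
```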
Query Optimization:-
Query optimization is of great importance for the performance of a
relational database, especially for the execution of complex SQL
statements. A query optimizer decides the best methods for implementing
each query.

The query optimizer selects, for instance, whether or not to use indexes
for a given query, and which join methods to use when joining multiple
tables. These decisions have a tremendous effect on SQL performance, and
query optimization is a key technology for every application, from
operational Systems to data warehouse and analytical systems to content
management systems.
The main principles of query optimization are as follows −
• Understand how your database is executing your query −
The first phase of query optimization is understanding what
the database is performing. Different databases have different
commands for this. For example, in MySQL, one can use the
“EXPLAIN [SQL Query]” keyword to see the query plan. In
Oracle, one can use the “EXPLAIN PLAN FOR [SQL Query]” to see
the query plan.
• Retrieve as little data as possible − The more data a query
returns, the more resources the database must expend to
process and transfer those records. For example, if you only
need to fetch one column from a table, do not use 'SELECT *'.
• Store intermediate results − Sometimes the logic for a query
can be quite complex. It is possible to produce the desired
outcomes through the use of subqueries, inline views, and
UNION-type statements. With those methods, the intermediate
results are not saved in the database but are used directly
within the query. This can lead to performance issues,
particularly when the intermediate results contain a huge
number of rows; in such cases, storing them can help.
There are various query optimization strategies, as follows −
• Use Index − Using an index is the first strategy one should
try to speed up a query.
• Aggregate Table − Pre-populate tables at higher aggregation
levels so that less data needs to be parsed.
• Vertical Partitioning − Partition the table by columns. This
method reduces the amount of data a SQL query needs to
process.
• Horizontal Partitioning − Partition the table by data value,
most often by time. This method reduces the amount of data a
SQL query needs to process.
• De-normalization − The process of de-normalization combines
multiple tables into a single table. This speeds up query
execution because fewer table joins are required.
• Server Tuning − Each server has its own parameters; tuning
them so the server takes full advantage of the hardware
resources can significantly speed up query execution.
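The first principle above — understand how your database executes your query — can be tried directly in SQLite, whose EXPLAIN QUERY PLAN plays the role of MySQL's EXPLAIN. A sketch with a hypothetical customers table (the exact plan wording varies between SQLite versions):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE customers(id INTEGER, name TEXT, city TEXT)")

def plan(sql):
    # EXPLAIN QUERY PLAN reports how SQLite intends to run the query;
    # the human-readable detail is the fourth column of the result row.
    return con.execute("EXPLAIN QUERY PLAN " + sql).fetchone()[3]

q = "SELECT * FROM customers WHERE city = 'Delhi'"
before = plan(q)   # full table scan, e.g. "SCAN customers"
con.execute("CREATE INDEX idx_city ON customers(city)")
after = plan(q)    # index search, e.g. "SEARCH customers USING INDEX idx_city (city=?)"
print(before)
print(after)
```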
Optimization algorithms:-

The process of selecting an efficient execution plan for processing a query
is known as query optimization.

Query parsing is the process by which, for a given query, the different
ways in which the query can run are worked out. The parsed query is then
delivered to the query optimizer, which generates various execution plans
for the parsed query and selects the plan with the lowest estimated cost.
The catalog manager assists the optimizer in selecting the optimum plan
to perform the query by providing the cost of each plan.

Query optimization is used to access and modify the database in the most
efficient way possible. It is the art of obtaining necessary information in a
predictable, reliable, and timely manner. Query optimization is formally
described as the process of transforming a query into an equivalent form
that may be evaluated more efficiently. The goal of query optimization is
to find an execution plan that reduces the time required to process a query.
We must complete two major tasks to attain this optimization target.

The first is to determine the optimal plan to access the database, and the
second is to reduce the time required to execute the query plan.

Purpose of the Query Optimizer in DBMS

The optimizer tries to come up with the best execution plan possible for a
SQL statement.

Among all the candidate plans reviewed, the optimizer chooses the plan
with the lowest cost. The optimizer computes costs based on available
facts. The cost computation takes into account query execution factors
such as I/O, CPU, and communication for a certain query in a given context.

Sr. No  Class  Name    Role
01      10     Shreya  CR
02      10     Ritik

For example, there is a query that requests information about students who
are in leadership roles, such as being a class representative. If the
optimizer statistics show that 50% of students are in positions of
leadership, the optimizer may decide that a full table scan is the most
efficient. However, if data show that just a small number of students are
in positions of leadership, reading an index followed by table access by
row id may be more efficient than a full table scan.

Because the database has so many internal statistics and tools at its
disposal, the optimizer is frequently in a better position than the user to
decide the best way to execute a statement. As a result, the optimizer is
used by all SQL statements.

Optimizer Components

The optimizer is made up of three parts: the transformer, the estimator,
and the plan generator. The figure below depicts those components.

• Query Transformer

The query transformer determines whether it is advantageous to rewrite
the original SQL statement into a semantically equivalent SQL statement
with a lower cost for some statements.

When a plausible alternative exists, the database compares the costs of
each alternative and chooses the one with the lowest cost. The query below
can be taken as an example of how the query transformer optimizes by
rewriting an OR-based input query into a UNION ALL-based output query.

SELECT *
FROM sales
WHERE promo_id=12
OR prod_id=125;
The given query is transformed by the query transformer into:
SELECT *
FROM sales
WHERE prod_id=125
UNION ALL
SELECT *
FROM sales
WHERE promo_id=12
AND LNNVL(prod_id=125); /* LNNVL provides a concise way to evaluate a
condition when one or both operands of the condition may be null. */
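The equivalence of the two forms can be checked on a small dataset. SQLite has no LNNVL function, so this sketch emulates LNNVL(prod_id=125) as (prod_id <> 125 OR prod_id IS NULL); the sales rows are hypothetical:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales(sale_id INTEGER, promo_id INTEGER, "
            "prod_id INTEGER)")
con.executemany("INSERT INTO sales VALUES (?,?,?)", [
    (1, 12, 125), (2, 12, 999), (3, 33, 125), (4, 33, 999), (5, 12, None),
])

original = con.execute("""
    SELECT sale_id FROM sales WHERE promo_id = 12 OR prod_id = 125
    ORDER BY sale_id""").fetchall()

# UNION ALL form: the LNNVL-style branch excludes rows the first branch
# already returned, so no sale is counted twice.
rewritten = con.execute("""
    SELECT sale_id FROM sales WHERE prod_id = 125
    UNION ALL
    SELECT sale_id FROM sales WHERE promo_id = 12
      AND (prod_id <> 125 OR prod_id IS NULL)
    ORDER BY sale_id""").fetchall()

print(original == rewritten)  # True
```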

• Estimator

The estimator is the optimizer component that calculates the total cost of
a given execution plan.

To determine the cost, the estimator uses three measures — selectivity,
cardinality, and cost:

• Selectivity: The query picks a percentage of the rows in the row set,
with 0 indicating no rows and 1 indicating all rows. Selectivity is
determined by a query predicate, such as WHERE last_name LIKE 'X%',
or by a mix of predicates. As the selectivity value
approaches zero, a predicate gets more selective, and as the value
nears one, it becomes less selective (or more unselective).

For example, the row set can be a base table, a view, or the result of a join.
The selectivity is tied to a query predicate, such as last_name = 'Prakash',
or a combination of predicates, such as last_name = 'Prakash' AND job_id
= 'SDE'.
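Selectivity can be illustrated with a tiny in-memory row set (the rows below are hypothetical): it is simply the fraction of rows a predicate keeps.

```python
# Selectivity = fraction of rows in the row set that a predicate keeps
# (0 means no rows, 1 means all rows).
rows = [
    {"last_name": "Prakash", "job_id": "SDE"},
    {"last_name": "Prakash", "job_id": "QA"},
    {"last_name": "Rao",     "job_id": "SDE"},
    {"last_name": "Iyer",    "job_id": "PM"},
]

def selectivity(pred, rows):
    return sum(1 for r in rows if pred(r)) / len(rows)

# A single predicate, and a more selective combination of predicates.
s1 = selectivity(lambda r: r["last_name"] == "Prakash", rows)
s2 = selectivity(lambda r: r["last_name"] == "Prakash"
                 and r["job_id"] == "SDE", rows)
print(s1, s2)  # 0.5 0.25
```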

Transaction processing:-

o The transaction is a set of logically related operations. It contains a
group of tasks.
o A transaction is an action, or series of actions, performed by a
single user to access the contents of the database.

Example: Suppose an employee of a bank transfers Rs 800 from X's account
to Y's account. This small transaction consists of several low-level tasks:

X's Account

1. Open_Account(X)
2. Old_Balance = X.balance
3. New_Balance = Old_Balance - 800
4. X.balance = New_Balance
5. Close_Account(X)

Y's Account

1. Open_Account(Y)
2. Old_Balance = Y.balance
3. New_Balance = Old_Balance + 800
4. Y.balance = New_Balance
5. Close_Account(Y)
Operations of Transaction:

Following are the main operations of transaction:

Read(X): Read operation is used to read the value of X from the database
and stores it in a buffer in main memory.

Write(X): Write operation is used to write the value back to the database
from the buffer.

Let's take the example of a debit transaction on an account, which consists
of the following operations:

1. R(X);
2. X = X - 500;
3. W(X);

Let's assume the value of X before starting of the transaction is 4000.

o The first operation reads X's value from database and stores it in a
buffer.

o The second operation will decrease the value of X by 500. So buffer
will contain 3500.
o The third operation will write the buffer's value to the database. So
X's final value will be 3500.

However, it is possible that, because of a hardware, software, or power
failure, the transaction may fail before it has finished all the
operations in the set.

For example: If in the above transaction, the debit transaction fails after
executing operation 2 then X's value will remain 4000 in the database
which is not acceptable by the bank.

To solve this problem, we have two important operations: Commit, which
makes the transaction's changes permanent, and Rollback, which undoes the
changes of an incomplete transaction.
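The commit and rollback operations can be sketched with Python's sqlite3 module: if the transfer fails between the debit and the credit, rollback undoes the partial debit, so X's balance stays at 4000.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE account(name TEXT PRIMARY KEY, balance INTEGER)")
con.executemany("INSERT INTO account VALUES (?,?)",
                [("X", 4000), ("Y", 2000)])
con.commit()

def transfer(amount, fail_midway=False):
    try:
        con.execute("UPDATE account SET balance = balance - ? "
                    "WHERE name = 'X'", (amount,))
        if fail_midway:
            # simulated crash between the debit and the credit
            raise RuntimeError("failure after the debit")
        con.execute("UPDATE account SET balance = balance + ? "
                    "WHERE name = 'Y'", (amount,))
        con.commit()    # Commit: make both writes permanent together
    except RuntimeError:
        con.rollback()  # Rollback: undo the partial debit

transfer(800, fail_midway=True)
x = con.execute("SELECT balance FROM account WHERE name = 'X'").fetchone()[0]
print(x)  # 4000: the failed transfer left no trace

transfer(800)
x, y = [r[0] for r in con.execute("SELECT balance FROM account ORDER BY name")]
print(x, y)  # 3200 2800
```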

Concurrency Control:-

The concept of concurrency control comes under transactions in a database
management system (DBMS). It is a procedure in DBMS that manages
simultaneous processes so that they execute without conflicting with each
other; such conflicts occur in multi-user systems. Concurrency simply
means executing multiple transactions at a time. It is required to
increase time efficiency. If many transactions try to access the same
data, then inconsistency arises. Concurrency control is required to
maintain data consistency.
For example, with ATM machines, if concurrency is not used, multiple
persons cannot withdraw money at the same time in different places. This
is where we need concurrency.
Advantages
The advantages of concurrency control are as follows −
• Waiting time will be decreased.
• Response time will decrease.
• Resource utilization will increase.
• System performance & Efficiency is increased.
Control concurrency
The simultaneous execution of transactions over shared databases can
create several data integrity and consistency problems.
For example, when too many people are using the ATM machines, serial
updates and synchronization in the bank servers must happen whenever a
transaction completes; if not, the database holds wrong information and
wrong data.
Main problems in using Concurrency
The problems which arise while using concurrency are as follows −
• Updates will be lost − One transaction makes some changes and
another transaction overwrites or deletes those changes. One
transaction nullifies the updates of another transaction.
• Uncommitted Dependency or dirty read problem − One
transaction reads a value that another transaction has updated
but not yet committed. If the updating transaction later rolls
back or fails, the reading transaction has acted on false
values, or on the previous values of the variables. This is a
major problem.
• Inconsistent retrievals − One transaction reads multiple
variables while another transaction is in the process of
updating those variables; the problem that occurs is
inconsistency of the same variable in different instances.
Concurrency control techniques
The concurrency control techniques are as follows −
Locking
A lock guarantees exclusive use of a data item to the current transaction.
The transaction first accesses the data item by acquiring a lock; after
completion of the transaction, it releases the lock.
Types of Locks
The types of locks are as follows −
• Shared Lock [Transaction can read only the data item values]
• Exclusive Lock [Used for both read and write data item values]
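The effect of an exclusive lock can be sketched with an ordinary mutex (a toy in-memory model, not actual DBMS lock-manager code): each read-modify-write sequence holds the lock for its whole duration, so no update is lost even with several concurrent writers.

```python
import threading

balance = 0                        # shared data item
lock = threading.Lock()            # plays the role of an exclusive lock

def deposit(times):
    global balance
    for _ in range(times):
        with lock:                 # acquire before the read-modify-write...
            current = balance      # read
            balance = current + 1  # ...release only after the write back

threads = [threading.Thread(target=deposit, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(balance)  # 40000: no lost updates
```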
Time Stamping
A timestamp is a unique identifier created by the DBMS that indicates the
relative starting time of a transaction. Whatever transaction we perform,
it stores the starting time of the transaction and denotes a specific time.
This can be generated using a system clock or logical counter. This can be
started whenever a transaction is started. Here, the logical counter is
incremented after a new timestamp has been assigned.
Database Recovery Techniques in DBMS:-
When a system crashes, it may have several transactions being executed
and various files opened for them to modify the data items. Transactions
are made of various operations, which are atomic in nature. But according
to ACID properties of DBMS, atomicity of transactions as a whole must be
maintained, that is, either all the operations are executed or none.
When a DBMS recovers from a crash, it should maintain the following −
• It should check the states of all the transactions, which were
being executed.
• A transaction may be in the middle of some operation; the
DBMS must ensure the atomicity of the transaction in this case.
• It should check whether the transaction can be completed now
or it needs to be rolled back.
• No transactions would be allowed to leave the DBMS in an
inconsistent state.
There are two types of techniques, which can help a DBMS in recovering as
well as maintaining the atomicity of a transaction −
• Maintaining the logs of each transaction, and writing them onto
some stable storage before actually modifying the database.
• Maintaining shadow paging, where the changes are done on a
volatile memory, and later, the actual database is updated.

Database recovery techniques are used in database management systems
(DBMS) to restore a database to a consistent state after a failure or error
has occurred. The main goal of recovery techniques is to ensure data
integrity and consistency and prevent data loss. There are mainly two
types of recovery techniques used in DBMS:
Rollback/Undo Recovery Technique: The rollback/undo recovery
technique is based on the principle of backing out or undoing the effects
of a transaction that has not completed successfully due to a system
failure or error. This technique is accomplished by undoing the changes
made by the transaction using the log records stored in the transaction
log. The transaction log contains a record of all the transactions that have
been performed on the database. The system uses the log records to undo
the changes made by the failed transaction and restore the database to
its previous state.
Commit/Redo Recovery Technique: The commit/redo recovery
technique is based on the principle of reapplying the changes made by a
transaction that has been completed successfully to the database. This
technique is accomplished by using the log records stored in the
transaction log to redo the changes made by the transaction that was in
progress at the time of the failure or error. The system uses the log
records to reapply the changes made by the transaction and restore the
database to its most recent consistent state.
In addition to these two techniques, there is also a third technique called
checkpoint recovery. Checkpoint recovery is a technique used to reduce
the recovery time by periodically saving the state of the database in a
checkpoint file. In the event of a failure, the system can use the
checkpoint file to restore the database to the most recent consistent state
before the failure occurred, rather than going through the entire log to
recover the database.
Overall, recovery techniques are essential to ensure data consistency and
availability in DBMS, and each technique has its own advantages and
limitations that must be considered in the design of a recovery system
Database systems, like any other computer system, are subject to
failures but the data stored in them must be available as and when
required. When a database fails it must possess the facilities for fast
recovery. It must also have atomicity, i.e. either transactions are
completed successfully and committed (the effect is recorded permanently
in the database) or the transaction should have no effect on the database.
There
are both automatic and non-automatic ways for both, backing up of data
and recovery from any failure situations. The techniques used to recover
the lost data due to system crashes, transaction errors, viruses,
catastrophic failure, incorrect commands execution, etc. are database
recovery techniques. So to prevent data loss recovery techniques based
on deferred update and immediate update or backing up data can be used.
Recovery techniques are heavily dependent upon the existence of a
special file known as a system log. It contains information about the start
and end of each transaction and any updates which occur during
the transaction. The log keeps track of all transaction operations that
affect the values of database items. This information is needed to recover
from transaction failure.

The log is kept on disk.
• start_transaction(T): This log entry records that transaction T
starts the execution.
• read_item(T, X): This log entry records that transaction T reads
the value of database item X.
• write_item(T, X, old_value, new_value): This log entry records
that transaction T changes the value of the database item X from
old_value to new_value. The old value is sometimes known as a
before an image of X, and the new value is known as an
afterimage of X.
• commit(T): This log entry records that transaction T has
completed all accesses to the database successfully and its effect
can be committed (recorded permanently) to the database.
• abort(T): This records that transaction T has been aborted.
• checkpoint: Checkpoint is a mechanism where all the previous
logs are removed from the system and stored permanently in a
storage disk. Checkpoint declares a point before which the DBMS
was in a consistent state, and all the transactions were
committed.
A transaction T reaches its commit point when all its operations that
access the database have been executed successfully i.e. the transaction
has reached the point at which it will not abort (terminate without
completing). Once committed, the transaction is permanently recorded in
the database. Commitment always involves writing a commit entry to the
log and writing the log to disk. At the time of a system crash, the log is
searched backwards for all transactions T that have written a
start_transaction(T) entry into the log but have not written a commit(T)
entry yet; these transactions may have to be rolled back to undo their
effect on the database during the recovery process.
• Undoing – If a transaction crashes, then the recovery manager
may undo transactions i.e. reverse the operations of a
transaction. This involves examining a transaction for the log
entry write_item(T, x, old_value, new_value) and set the value of
item x in the database to old-value. There are two major
techniques for recovery from non-catastrophic transaction
failures: deferred updates and immediate updates.
• Deferred update – This technique does not physically update
the database on disk until a transaction has reached its commit
point. Before reaching commit, all transaction updates are
recorded in the local transaction workspace. If a transaction fails
before reaching its commit point, it will not have changed the
database in any way so UNDO is not needed. It may be necessary
to REDO the effect of the operations that are recorded in the local
transaction workspace, because their effect may not yet have
been written in the database. Hence, a deferred update is also
known as the NO-UNDO/REDO algorithm.
• Immediate update – In the immediate update technique, the
database may be updated by some operations of a transaction
before the transaction reaches its commit point. However, these
operations are recorded in the log on disk before they are
applied to the database, making recovery still possible. If a
transaction fails to reach its commit point, the effect of its
operations must be undone, i.e. the transaction must be rolled
back; hence we require both UNDO and REDO. This technique is
therefore known as the UNDO/REDO algorithm.
• Caching/Buffering – In this one or more disk pages that include
data items to be updated are cached into main memory buffers
and then updated in memory before being written back to disk.
A collection of in-memory buffers called the DBMS cache is kept
under the control of DBMS for holding these buffers. A directory
is used to keep track of which database items are in the buffer.
A dirty bit is associated with each buffer, which is 0 if the buffer
is not modified else 1 if modified.
• Shadow paging – It provides atomicity and durability. A
directory with n entries is constructed, where the ith entry
points to the ith database page on disk. When a transaction
begins executing, the current directory is copied into a shadow
directory. When a page is to be modified, a new page is
allocated in which the changes are made, and when the
transaction is ready to become durable, all directory entries
that refer to the original pages are updated to refer to the
new replacement pages.
• Backward Recovery – The terms “rollback” and “UNDO” also
refer to backward recovery. When a backup of the data is not
available and previous modifications need to be undone, this
technique is helpful. With the backward recovery method,
unwanted modifications are removed and the database is returned
to its prior state. All changes made during the previous
transaction are reversed. In other words, it undoes the
erroneous database updates while keeping valid transactions
intact.
• Forward Recovery – The terms “roll forward” and “REDO” refer
to forward recovery. This technique is helpful when a database
needs to be brought up to date with all verified changes. The
database is restored from preserved (backed-up) data, and the
changes of valid, committed transactions recorded since that
save are reapplied, rolling those modifications forward.
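Shadow paging in particular lends itself to a small sketch. The
following toy model (an assumption for illustration: pages are strings
and the "disk" is a Python list) shows how the shadow directory
preserves the pre-transaction state:

```python
# Toy shadow-paging model: a directory maps page numbers to page slots
# on a simulated "disk" (a Python list). Names are illustrative.

disk = ["page0-v1", "page1-v1", "page2-v1"]   # stored pages
current_dir = [0, 1, 2]                       # ith entry -> slot of page i

def begin_transaction(directory):
    # Copy the current directory into a shadow directory.
    return list(directory)

def write_page(directory, page_no, new_contents):
    # Allocate a new page instead of overwriting the original in place.
    disk.append(new_contents)
    directory[page_no] = len(disk) - 1

shadow_dir = begin_transaction(current_dir)
write_page(current_dir, 1, "page1-v2")        # modify page 1

# Commit: make the current directory durable; discard the shadow.
# Abort/crash: reinstate the shadow directory; originals are untouched.
print([disk[slot] for slot in current_dir])   # new state after the write
print([disk[slot] for slot in shadow_dir])    # pre-transaction state
```

Because the original pages are never overwritten, atomicity reduces to
a single switch between the two directories.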
Some of the backup techniques are as follows:
• Full database backup – In this, the full database, including
the data and the metadata needed to restore the whole database
(such as full-text catalogs), is backed up at predefined
intervals.
• Differential backup – It stores only the data changes that
have occurred since the last full database backup. When some
data has changed many times since the last full database
backup, a differential backup stores only the most recent
version of the changed data. To restore from it, the last full
database backup must be restored first.
• Transaction log backup – In this, all events that have
occurred in the database, i.e. a record of every single
statement executed, are backed up. It is the backup of the
transaction log entries and contains all transactions that have
happened to the database. Through this, the database can be
recovered to a specific point in time. It is even possible to
take a backup of the transaction log when the data files are
destroyed, so that not even a single committed transaction is
lost.
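How the three backup types combine at restore time can be sketched as
follows; the data structures and timestamps are illustrative
assumptions:

```python
# Illustrative restore sequence: full backup, latest differential,
# then replay of the transaction log up to the desired point in time.

full_backup = {"A": 1, "B": 2, "C": 3}                 # taken at t=0
differential = {"B": 20}                               # changes since full, taken at t=5
txn_log = [(6, "C", 30), (7, "A", 10), (9, "B", 25)]   # (time, item, value)

def restore(point_in_time):
    db = dict(full_backup)          # 1. restore the full backup
    db.update(differential)         # 2. apply the latest differential
    for t, item, value in txn_log:  # 3. replay log entries up to the target
        if t <= point_in_time:
            db[item] = value
    return db

print(restore(7))   # {'A': 10, 'B': 20, 'C': 30}
print(restore(10))  # {'A': 10, 'B': 25, 'C': 30}
```

The log replay step is what makes point-in-time recovery possible:
choosing a different cut-off time yields a different database state.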
Advanced Database Management System:-
Database Management System or DBMS in short refers to the technology
of storing and retrieving users' data with utmost efficiency along with
appropriate security measures. This tutorial explains the basics of DBMS
such as its architecture, data models, data schemas, data independence, E-
R model, relation model, relational database design, and storage and file
structure and much more.
Why Learn DBMS?
Traditionally, data was organized in file formats. DBMS was a new concept
then, and all the research was done to make it overcome the deficiencies
in traditional style of data management. A modern DBMS has the following
characteristics −
• Real-world entity − A modern DBMS is more realistic and uses
real-world entities to design its architecture. It uses the
behavior and attributes too. For example, a school database
may use students as an entity and their age as an attribute.
• Relation-based tables − DBMS allows entities and relations
among them to form tables. A user can understand the
architecture of a database just by looking at the table names.
• Isolation of data and application − A database system is
entirely different than its data. A database is an active entity,
whereas data is said to be passive, on which the database works
and organizes. DBMS also stores metadata, which is data about
data, to ease its own process.
• Less redundancy − DBMS follows the rules of normalization,
which split a relation when any of its attributes has redundant
values. Normalization is a mathematically rich and scientific
process that reduces data redundancy.
• Consistency − Consistency is a state where every relation in a
database remains consistent. There exist methods and techniques
that can detect any attempt to leave the database in an
inconsistent state. A DBMS can provide greater consistency than
earlier forms of data-storing applications like file-processing
systems.
• Query Language − A DBMS is equipped with a query language,
which makes it more efficient to retrieve and manipulate data.
A user can apply as many different filtering options as
required to retrieve a set of data. This was not possible with
traditional file-processing systems.
Applications of DBMS
Database is a collection of related data and data is a collection of facts and
figures that can be processed to produce information.
Mostly, data represents recordable facts. Data aids in producing
information, which is based on facts. For example, if we have data about
the marks obtained by all students, we can then draw conclusions about
toppers and average marks.
A database management system stores data in such a way that it becomes
easier to retrieve, manipulate, and produce information. Following are the
important characteristics and applications of DBMS.
• ACID Properties − DBMS follows the concepts
of Atomicity, Consistency, Isolation, and Durability (normally
shortened as ACID). These concepts are applied on
transactions, which manipulate data in a database. ACID
properties help the database stay healthy in multi-
transactional environments and in case of failure.
• Multiuser and Concurrent Access − A DBMS supports a multi-user
environment and allows users to access and manipulate data in
parallel. Though there are restrictions on transactions when
users attempt to handle the same data item, users are always
unaware of them.
• Multiple views − DBMS offers multiple views for different
users. A user in the Sales department will have a different
view of the database than a person working in the Production
department. This feature enables users to have a concentrated
view of the database according to their requirements.
• Security − Features like multiple views offer security to some
extent, as users are unable to access the data of other users
and departments. A DBMS offers methods to impose constraints
while entering data into the database and retrieving it at a
later stage. A DBMS offers many different levels of security
features, which enables multiple users to have different views
with different features. For example, a user in the Sales
department cannot see the data that belongs to the Purchase
department. Additionally, how much data of the Sales department
should be displayed to the user can also be managed. Since a
DBMS does not store its data on disk in the same way as
traditional file systems, it is much harder for miscreants to
read the data directly.
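The transactional behaviour described above (atomicity via commit and
rollback) can be demonstrated with Python's built-in sqlite3 module;
the table and failure here are made up for illustration:

```python
import sqlite3

# In-memory database; the default isolation level groups DML statements
# into a transaction until commit() or rollback() is called.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 50)")
conn.commit()

try:
    # Debit one account, then fail before the matching credit is applied.
    conn.execute("UPDATE accounts SET balance = balance - 30 "
                 "WHERE name = 'alice'")
    raise RuntimeError("simulated crash mid-transaction")
except RuntimeError:
    conn.rollback()  # atomicity: the half-done transfer is undone

balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(balances)  # {'alice': 100, 'bob': 50}
```

After the rollback, neither account has changed: the partial update was
never made visible, which is exactly the ACID guarantee in miniature.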

object-oriented database management system (OODBMS):-

An object-oriented database management system (OODBMS), sometimes
shortened to ODBMS for object database management system, is a
database management system (DBMS) that supports the modelling and
creation of data as objects. This includes some kind of support
for classes of objects and the inheritance of class properties
and methods by subclasses and their objects.

There is no widely agreed-upon standard for what constitutes an OODBMS,
although the Object Data Management Group (ODMG) published The Object
Data Standard ODMG 3.0 in 2001 which describes an object model as well
as standards for defining and querying objects. The group has since
disbanded.

Currently, Not Only SQL (NoSQL) document database systems are a popular
alternative to the object database. Although they lack all the capabilities
of a true ODBMS, NoSQL document databases provide key-based access to
semi-structured data as documents, typically using JavaScript Object
Notation (JSON).

Features of an ODBMS:-

In their influential paper, The Object-Oriented Database Manifesto,
Malcolm Atkinson and others define an OODBMS as follows:

An object-oriented database system must satisfy two criteria: it should be
a DBMS, and it should be an object-oriented system, i.e., to the extent
possible, it should be consistent with the current crop of object-oriented
programming languages.

The first criterion translates into five features: persistence, secondary
storage management, concurrency, recovery and an ad hoc query facility.

The second one translates into eight features: complex objects, object
identity, encapsulation, types or classes, inheritance, overriding combined
with late binding, extensibility and computational completeness.
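Several of these features (object identity, classes, inheritance, and
overriding with late binding) map directly onto object-oriented
language constructs. A minimal Python sketch, where the ObjectStore is
a hypothetical stand-in rather than a real ODBMS API:

```python
import itertools

class ObjectStore:
    """Hypothetical object store: every persisted object gets an OID."""
    def __init__(self):
        self._next_oid = itertools.count(1)
        self._objects = {}           # object identity: OID -> object

    def persist(self, obj):
        oid = next(self._next_oid)
        self._objects[oid] = obj
        return oid

    def fetch(self, oid):
        return self._objects[oid]

class Person:                        # a class with encapsulated state
    def __init__(self, name):
        self.name = name
    def describe(self):
        return f"Person {self.name}"

class Student(Person):               # inherits properties and methods
    def describe(self):              # overriding, resolved by late binding
        return f"Student {self.name}"

store = ObjectStore()
oid = store.persist(Student("Asha"))
print(store.fetch(oid).describe())   # Student Asha
```

A real ODBMS adds what the sketch lacks: persistence to secondary
storage, concurrency, recovery, and an ad hoc query facility.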

Web based DB:-

The Web-based database management system is one of the essential parts
of DBMS and is used to store web application data. A Web-based database
management system handles databases containing data for e-commerce,
e-business, blogs, e-mail, and other online applications.

Requirement :- While many DBMS vendors are working to provide
proprietary Web-connectivity solutions for their databases, most
organizations require a more general solution to avoid being tied to a
single technology. Here is a list of some of the most significant
requirements for database integration applications on the Web. These
requirements are ideals and are not fully attainable at present; they
are listed in no particular order:

• The ability and right to use valuable corporate data in a fully secured
manner.
• Vendor-independent data connectivity that allows freedom of
choice in selecting the DBMS for present and future use.
• The capability to interface to the database, independent of any
proprietary Web browser and/or Web server.
• A connectivity solution that takes benefit of all the features of an
organization's DBMS.
• An open-architectural structure that allows interoperability with a
variety of systems and technologies; such as:
o Different types of Web servers
o Microsoft's Component Object Model (COM) / Distributed
Component Object Model (DCOM)
o CORBA / IIOP
o Java / RMI which is Remote Method Invocation
o XML (Extensible Markup Language)
o Various Web services (SOAP, UDDI, etc.)
• A cost-effective approach that allows for scalability, growth,
and changes in strategic direction, and helps lessen the costs
of developing and maintaining applications.
• Provides support for transactions that span multiple HTTP requests.
• Gives minimal administration overhead.

Benefits of the Web-DBMS Approach


Here are various benefits that come from using a Web-based DBMS:

• Provides simplicity
• Platform independence
• Provides Graphical User Interface (GUI)
• Standardization
• Provides Cross-platform support
• Facilitates transparent network access
• Scalability
• Innovation

Data Warehousing:-
Background
A Database Management System (DBMS) stores data in the form of tables,
uses the ER model, and its goal is the ACID properties. For example, a
college DBMS has tables for students, faculty, etc.
A Data Warehouse is separate from a DBMS; it stores a huge amount of
data, typically collected from multiple heterogeneous sources like
files, DBMSs, etc. The goal is to produce statistical results that may
help in decision making. For example, a college might want to quickly
see different results, such as how the placement of CS students has
improved over the last 10 years in terms of salaries, counts, etc.
Need for Data Warehouse
An ordinary database can store MBs to GBs of data, and that too for a
specific purpose. For storing data on the terabyte scale, storage
shifted to the Data Warehouse. Besides this, a transactional database
doesn't lend itself to analytics. To perform analytics effectively, an
organization keeps a central Data Warehouse to closely study its
business by organizing, understanding, and using its historical data
for taking strategic decisions and analyzing trends.
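The kind of trend analysis mentioned above, such as placement
statistics per year, amounts to aggregating historical records. A
minimal sketch with made-up data:

```python
from collections import defaultdict

# (year, salary) placement records -- made-up historical data.
placements = [
    (2021, 40000), (2021, 52000),
    (2022, 55000), (2022, 61000), (2022, 48000),
    (2023, 70000),
]

totals, counts = defaultdict(int), defaultdict(int)
for year, salary in placements:
    totals[year] += salary
    counts[year] += 1

# Year-by-year placement count and average salary.
for year in sorted(totals):
    print(year, counts[year], round(totals[year] / counts[year]))
```

A warehouse answers queries like this quickly because the historical
data is already collected, cleaned, and organized for aggregation.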
Benefits of Data Warehouse:
1. Better business analytics: A data warehouse plays an important
role in every business by storing all of the company's past
data and records for analysis, which further improves the
company's understanding of its data.
2. Faster queries: A data warehouse is designed to handle large
queries, which is why it runs them faster than an operational
database.
3. Improved data quality: The data gathered from different
sources is stored and analyzed in the warehouse without being
altered or added to, so data quality is maintained; if any
data-quality issue arises, the data warehouse team resolves it.
4. Historical insight: The warehouse stores all historical data,
which contains details about the business, so that one can
analyze it at any time and extract insights from it.
Data Warehouse vs DBMS
In short, a DBMS is optimized for day-to-day transactional work on
current data, using ER-modelled tables and ACID guarantees, whereas a
Data Warehouse is optimized for analytical queries over large volumes
of historical data collected from many heterogeneous sources.
Example Applications of Data Warehousing


Data Warehousing can be applied anywhere where we have a huge amount
of data and we want to see statistical results that help in decision
making.

• Social Media Websites: Social networking websites like
Facebook, Twitter, LinkedIn, etc. are based on analyzing large
data sets. These sites gather data related to members, groups,
locations, etc., and store it in a single central repository.
Given the large amount of data involved, a data warehouse is
needed to implement this.
• Banking: Most of the banks these days use warehouses to see the
spending patterns of account/cardholders. They use this to
provide them with special offers, deals, etc.
• Government: Government uses a data warehouse to store and
analyze tax payments which are used to detect tax thefts.
There can be many more applications in different sectors, like
e-commerce, telecommunications, transportation services, marketing and
distribution, healthcare, and retail.

Data Mining:-
Data mining is the process of extracting useful information from large
sets of data. It involves using various techniques from statistics, machine
learning, and database systems to identify patterns, relationships, and
trends in the data. This information can then be used to make data-driven
decisions, solve business problems, and uncover hidden insights.
Applications of data mining include customer profiling and
segmentation, market basket analysis, anomaly detection, and predictive
modeling. Data mining tools and technologies are widely used in various
industries, including finance, healthcare, retail, and telecommunications.
In general terms, “Mining” is the process of extraction of some valuable
material from the earth e.g. coal mining, diamond mining, etc. In the
context of computer science, “Data Mining” can be referred to
as knowledge mining from data, knowledge extraction, data/pattern
analysis, data archaeology, and data dredging. It is basically the
process carried out for the extraction of useful information from a bulk
of data or data warehouses. One can see that the term itself is a little
confusing. In the case of coal or diamond mining, the result of the
extraction process is coal or diamond. But in the case of Data Mining, the
result of the extraction process is not data!! Instead, data mining results
are the patterns and knowledge that we gain at the end of the extraction
process. In that sense, we can think of Data Mining as a step in the process
of Knowledge Discovery or Knowledge Extraction.
Gregory Piatetsky-Shapiro coined the term “Knowledge Discovery in
Databases” in 1989. However, the term ‘data mining’ became more
popular in the business and press communities. Currently, Data Mining
and Knowledge Discovery are used interchangeably.
Nowadays, data mining is used in almost all places where a large amount
of data is stored and processed. For example, banks typically use ‘data
mining’ to find out their prospective customers who could be interested
in credit cards, personal loans, or insurance as well. Since banks have the
transaction details and detailed profiles of their customers, they analyze
all this data and try to find out patterns that help them predict that
certain customers could be interested in personal loans, etc.
Main Purpose of Data Mining

102

Basically, data mining has been integrated with many techniques from
other domains such as statistics, machine learning, pattern
recognition, database and data warehouse systems, information
retrieval, visualization, etc., to gather more information about the
data, help predict hidden patterns, future trends, and behaviors, and
allow businesses to make decisions.
Technically, data mining is the computational process of analyzing data
from different perspectives, dimensions, and angles, and of
categorizing/summarizing it into meaningful information.
Data Mining can be applied to any type of data e.g. Data Warehouses,
Transactional Databases, Relational Databases, Multimedia Databases,
Spatial Databases, Time-series Databases, World Wide Web.
Data Mining as a whole process
The whole process of Data Mining consists of three main phases:
1. Data Pre-processing – data cleaning, integration, selection,
and transformation take place
2. Data Extraction – the actual data mining takes place
3. Data Evaluation and Presentation – analyzing and presenting
results



Applications of Data Mining
1. Financial Analysis
2. Biological Analysis
3. Scientific Analysis
4. Intrusion Detection
5. Fraud Detection
6. Research Analysis
There are several benefits of data mining, including:
1. Improved decision making: Data mining can provide valuable
insights that can help organizations make better decisions by
identifying patterns and trends in large data sets.
2. Increased efficiency: Data mining can automate repetitive and
time-consuming tasks, such as data cleaning and preparation,
which can help organizations save time and resources.
3. Enhanced competitiveness: Data mining can help organizations
gain a competitive edge by uncovering new business
opportunities and identifying areas for improvement.
4. Improved customer service: Data mining can help
organizations better understand their customers and tailor their
products and services to meet their needs.
5. Fraud detection: Data mining can be used to identify fraudulent
activities by detecting unusual patterns and anomalies in data.
6. Predictive modeling: Data mining can be used to build models
that can predict future events and trends, which can be used to
make proactive decisions.
7. New product development: Data mining can be used to identify
new product opportunities by analyzing customer purchase
patterns and preferences.
8. Risk management: Data mining can be used to identify potential
risks by analyzing data on customer behavior, market
conditions, and other factors.
Real-life examples of Data Mining
Market Basket Analysis: This technique involves the careful study of
purchases made by customers in a supermarket. It is applied to identify
items that are bought together: say, if a person buys bread, what are
the chances that he/she will also purchase butter? This analysis helps
companies promote offers and deals, and it is done with the help of
data mining.
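At its simplest, market basket analysis means counting how often items
co-occur in the same basket. A small sketch with made-up baskets:

```python
from collections import Counter
from itertools import combinations

# Made-up baskets: each set is one customer's purchase.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "jam"},
]

pair_counts = Counter()
for basket in baskets:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

# Support of a pair = fraction of baskets containing both items.
support = {pair: n / len(baskets) for pair, n in pair_counts.items()}
print(support[("bread", "butter")])  # 0.5
```

A support of 0.5 here means bread and butter appear together in half
the baskets, which is the kind of pattern a retailer can turn into an
offer or deal.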
Protein Folding: This technique carefully studies biological cells and
predicts protein interactions and functionality within them.
Applications of this research include determining the causes of, and
possible cures for, Alzheimer's, Parkinson's, and cancers caused by
protein misfolding.
Fraud Detection: Nowadays, in this age of cell phones, we can use data
mining to analyze cell phone activity and compare it against known
patterns of suspicious phone activity. This can help us detect calls
made on cloned phones. Similarly, with credit cards, comparing
purchases with historical purchases can detect activity on stolen
cards.
Data mining also has many successful applications, such as business
intelligence, Web search, bioinformatics, health informatics, finance,
digital libraries, and digital governments.