
BACHELORS OF COMPUTER APPLICATION

BCA-PC(L)-242
RELATIONAL DATABASE MANAGEMENT
SYSTEM

Directorate of Distance Education


Guru Jambheshwar University of
Science & Technology
Hisar - 125001
CONTENTS

1. Relational Model Concepts

2. Codd’s Rules for Relational Model

3. Relational Algebra

4. Relational Calculus

5. Functional Dependencies and Normalization

6. Types of Functional Dependencies

7. Decomposition and Normal forms

8. SQL

9. Basic Queries in SQL

10. Procedural Language for SQL

11. PL/SQL Character set and Data Types

SUBJECT: RELATIONAL DATABASE MANAGEMENT
SYSTEM
COURSE CODE: BCA-244 AUTHOR: DR. DEEPAK NANDAL

LESSON NO. 1 VETTER:

RELATIONAL MODEL CONCEPTS

STRUCTURE

1.0 Learning Objective

1.1 Introduction

1.2 Definition Relational Models

1.2.1 What is RDBMS?

1.3 Difference between DBMS and RDBMS

1.3.1 Using Parameters

1.4 Domains, Attributes, Tuples and Relations

1.5 Check Your Progress

1.6 Summary

1.7 Keywords

1.8 Self-Assessment Test

1.9 Answers to check your progress

1.10 References / Suggested Readings

1.0 LEARNING OBJECTIVE
• To understand the concepts of relational databases.
• To know the difference between DBMS and RDBMS.
• To understand the parameters and components of an RDBMS in detail.
• To know the concepts and notations of the relational model.

1.1 INTRODUCTION
The relational model was first introduced by Ted Codd of IBM Research in 1970 in a classic
paper (Codd 1970), and it attracted immediate attention due to its simplicity and
mathematical foundation. The model uses the concept of a mathematical relation, which
looks somewhat like a table of values, as its basic building block, and has its theoretical
basis in set theory and first-order predicate logic. In this chapter we discuss the basic
characteristics of the model and its constraints. The first commercial implementations of
the relational model became available in the early 1980s, such as the Oracle DBMS and the
SQL/DS system on the MVS operating system by IBM. Since then, the model has been
implemented in a large number of commercial systems. Current popular relational DBMSs
(RDBMSs) include DB2 and Informix Dynamic Server (from IBM), Oracle and Rdb (from
Oracle), and SQL Server and Access (from Microsoft).
Most of the problems faced when implementing a system are the outcome of poor database
design. In many cases a system has to be modified continuously in multiple respects because
of changing user requirements, so proper planning is very important. A relation in a
relational database is based on a relation schema, which consists of a number of
attributes. A relational database is made up of a number of relations and the corresponding
relational database schema. The goal of relational database design is to generate a set of
relation schemas that allows us to store information without unnecessary redundancy and
also to retrieve information easily. One approach is to design schemas that are in an
appropriate normal form. The normal forms are used to ensure that various types of
anomalies and inconsistencies are not introduced into the database.

1.2 DEFINITION RELATIONAL MODELS

The relational model represents the database as a collection of relations. Informally, each
relation resembles a table of values or, to some extent, a "flat" file of records. When a
relation is thought of as a table of values, each row in the table represents a collection of
related data values. In the relational model, each row in the table represents a fact that
typically corresponds to a real-world entity or relationship. The table name and column
names are used to help in interpreting the meaning of the values in each row. For example,
the first table of Figure 1.1 is called STUDENT because each row represents facts about a
particular student entity. The column names (Name, Student-Number, Class, and Major)
specify how to interpret the data values in each row, based on the column each value is in.
All values in a column are of the same data type.
In the formal relational model terminology, a row is called a tuple, a column header
is called an attribute, and the table is called a relation. The data type describing the types
of values that can appear in each column is represented by a domain of possible values.
Section 1.4 defines these terms (domain, tuple, attribute, and relation) more precisely.

1.2.1 What is RDBMS?

RDBMS stands for Relational Database Management System. RDBMS data is structured
in database tables, fields and records. Each RDBMS table consists of table rows, and each
row consists of one or more fields. An RDBMS stores data in a collection of tables, which
may be related by common fields (table columns). An RDBMS also provides relational
operators to manipulate the data stored in the tables. Most RDBMSs use SQL as the database
query language. Popular RDBMSs include MS SQL Server, DB2, Oracle and MySQL. The
relational model is an example of a record-based model. Record-based models are so named
because the database is structured in fixed-format records of several types. Each table
contains records of a particular type. Each record type defines a fixed number of fields, or
attributes.

Figure 1.1 Database stores information of students and course
The columns of the table correspond to the attributes of the record types. The relational
data model is the most widely used data model, and a vast majority of current database
systems are based on the relational model. The relational model was designed by the IBM
research scientist and mathematician Dr. E.F. Codd. Many modern DBMSs do not conform
to Codd’s definition of an RDBMS, but they are nonetheless still considered to be
RDBMSs.
Two of Dr. Codd’s main focal points when designing the relational model were to further
reduce data redundancy and to improve data integrity within database systems.
The relational model originated from a paper authored by Dr. Codd entitled “A
Relational Model of Data for Large Shared Data Banks”, written in 1970. This paper
included the following concepts that apply to database management systems for relational
databases. The relation is the only data structure used in the relational data model to
represent both entities and relationships between them. Rows of the relation are referred to
as tuples of the relation and columns are its attributes. The values of each attribute are
drawn from a set of values known as its domain. The domain of an attribute contains the set
of values that the attribute may assume. From a historical perspective, the relational data
model is relatively new. The first database systems were based on either the network or the
hierarchical model. The relational data model has established itself as the primary data
model for commercial data processing applications. Its success in this domain has led to its
application outside data processing, in systems for computer-aided design and other
environments. A relational database management system (RDBMS) is a collection of
programs and capabilities that enable IT teams and others to create, update, administer and
otherwise interact with a relational database. RDBMSs store data in the form of tables, with
most commercial relational database management systems using Structured Query
Language (SQL) to access the database. However, since SQL was invented after the initial
development of the relational model, it is not necessary for RDBMS use.
The RDBMS is the most popular database system among organizations across the
world. It provides a dependable method of storing and retrieving large amounts of data
while offering a combination of system performance and ease of implementation.
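
The idea of tables related by a common field can be sketched with Python's built-in sqlite3
module. This is only an illustration under stated assumptions: the table names, column
names and sample rows are hypothetical, and SQLite stands in for any SQL-based RDBMS.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with two tables that share a common field."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE student (
            student_number INTEGER PRIMARY KEY,
            name           TEXT NOT NULL,
            major          TEXT
        );
        CREATE TABLE grade_report (
            student_number INTEGER REFERENCES student(student_number),
            course         TEXT,
            grade          TEXT
        );
        -- Hypothetical sample rows, for illustration only.
        INSERT INTO student VALUES (17, 'Smith', 'CS'), (8, 'Brown', 'CS');
        INSERT INTO grade_report VALUES (17, 'MATH2410', 'B'), (8, 'CS1310', 'A');
    """)
    return conn


def grades_with_names(conn):
    """Join the two tables on the common field student_number."""
    return conn.execute("""
        SELECT s.name, g.course, g.grade
        FROM student AS s
        JOIN grade_report AS g ON s.student_number = g.student_number
        ORDER BY s.name
    """).fetchall()


conn = build_demo_db()
result = grades_with_names(conn)
print(result)
```

The join is one of the relational operators mentioned above: it combines rows of the two
tables wherever the shared column holds the same value.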

1.3 DIFFERENCE BETWEEN DBMS AND RDBMS


A DBMS has to be persistent; that is, the data should remain accessible after the program
that created it ceases to exist, or even after the application that created it has been
restarted. A DBMS also has to provide uniform methods, independent of any specific
application, for accessing the stored information. An RDBMS is a Relational Database
Management System (relational DBMS). This adds the condition that the system supports a
tabular structure for the data, with enforced relationships between the tables. This excludes
databases that do not support a tabular structure or do not enforce relationships between
tables. A DBMS in this broad sense does not impose any constraints or security on data
manipulation; it is the responsibility of the user or the programmer to ensure the ACID
properties of the database. An RDBMS is stricter in this regard because it defines integrity
constraints for the purpose of upholding the ACID properties.
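
As a minimal sketch of how an integrity constraint and atomicity work together, the example
below uses Python's built-in sqlite3 module. The account table, the CHECK rule and the
transfer function are hypothetical names chosen for illustration, not part of the course
text, and SQLite is only one possible RDBMS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE account (
        id      INTEGER PRIMARY KEY,
        balance INTEGER NOT NULL CHECK (balance >= 0)  -- integrity constraint
    )
""")
conn.execute("INSERT INTO account VALUES (1, 100), (2, 50)")  # hypothetical data
conn.commit()


def transfer(conn, src, dst, amount):
    """Move amount from src to dst; either both updates apply or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected a negative balance; the credit is undone too.
        return False


def balances(conn):
    return conn.execute("SELECT id, balance FROM account ORDER BY id").fetchall()


failed = transfer(conn, 1, 2, 500)   # would overdraw account 1
after_failed = balances(conn)
ok = transfer(conn, 1, 2, 30)        # a valid transfer
after_ok = balances(conn)
```

The credit is deliberately executed before the debit so that the failed transfer leaves a
partial update behind, which the rollback then removes.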

In general, databases store sets of data that can be queried for use in other
applications. A database management system supports the development, administration and
use of database platforms. An RDBMS is a type of database management system (DBMS)
that stores data in a row-based table structure which connects related data elements. An
RDBMS includes functions that maintain the security, accuracy, integrity and consistency
of the data. This differs from the file storage used in a DBMS. Other differences
between database management systems and relational database management systems
include:

• Number of allowed users: While a DBMS can only accept one user at a time, an
RDBMS can operate with multiple users.
• Hardware and software requirements: A DBMS needs less software and
hardware than an RDBMS.
• Amount of data: An RDBMS can handle any amount of data, from small to large,
while a DBMS can only manage small amounts.
• Database structure: In a DBMS, data is kept in a hierarchical form, whereas an
RDBMS uses a table in which the headers are the column names and the rows
contain the corresponding values.
• ACID implementation: A DBMS does not use the atomicity, consistency, isolation and
durability (ACID) model for storing data. An RDBMS, on the other hand, bases the
structure of its data on the ACID model to ensure consistency.
• Distributed databases: While an RDBMS offers complete support for distributed
databases, a DBMS does not provide such support.
• Types of programs managed: While an RDBMS helps manage the relationships
between its incorporated tables of data, a DBMS focuses on maintaining databases
that are present within the computer network and system hard disks.
• Support of database normalization: An RDBMS can be normalized, but a DBMS
cannot.

DBMS vs RDBMS using different parameters

| Parameter | DBMS | RDBMS |
|---|---|---|
| Storage | DBMS stores data as a file. | Data is stored in the form of tables. |
| Database structure | DBMS stores data in either a navigational or a hierarchical form. | RDBMS uses a tabular structure where the headers are the column names and the rows contain the corresponding values. |
| Number of users | DBMS supports a single user only. | It supports multiple users. |
| ACID | In a regular database, the data may not be stored following the ACID model. This can cause inconsistencies in the database. | Relational databases are harder to construct, but they are consistent and well structured. They obey ACID (Atomicity, Consistency, Isolation, Durability). |
| Type of program | It is the program for managing the databases on the computer networks and the system hard disks. | It is the database system used for maintaining the relationships among the tables. |
| Hardware and software needs | Low software and hardware needs. | Higher hardware and software needs. |
| Integrity constraints | DBMS does not support integrity constraints. Integrity constraints are not imposed at the file level. | RDBMS supports integrity constraints at the schema level. Values beyond a defined range cannot be stored in the particular RDBMS column. |
| Normalization | DBMS does not support normalization. | RDBMS can be normalized. |
| Distributed databases | DBMS does not support distributed databases. | RDBMS offers support for distributed databases. |
| Ideally suited for | DBMS mainly deals with small quantities of data. | RDBMS is designed to handle a large amount of data. |
| Dr. E.F. Codd rules | DBMS satisfies fewer than seven of Dr. E.F. Codd's rules. | RDBMS satisfies 8 to 10 of Dr. E.F. Codd's rules. |
| Client server | DBMS does not support client-server architecture. | RDBMS supports client-server architecture. |
| Data fetching | Data fetching is slower for complex and large amounts of data. | Data fetching is rapid because of its relational approach. |
| Data redundancy | Data redundancy is common in this model. | Keys and indexes do not allow data redundancy. |
| Data relationship | No relationship between data. | Data is stored in the form of tables which are related to each other with the help of foreign keys. |
| Security | There is no security. | Multiple levels of security. Log files are created at OS, command, and object level. |
| Data access | Data elements need to be accessed individually. | Data can be easily accessed using an SQL query. Multiple data elements can be accessed at the same time. |
| Examples | Examples of DBMS are a file system, XML, Windows Registry, etc. | Examples of RDBMS are MySQL, Oracle, SQL Server, etc. |

1.4 DOMAINS, ATTRIBUTES, TUPLES AND RELATIONS
A domain D is a set of atomic values. By atomic we mean that each value in the domain is
indivisible as far as the relational model is concerned. A common method of specifying a
domain is to specify a data type from which the data values forming the domain are drawn.
It is also useful to specify a name for the domain, to help in interpreting its values. Some
examples of domains follow:

• USA_phone_numbers: The set of ten-digit phone numbers valid in the United States.

• Local_phone_numbers: The set of seven-digit phone numbers valid within a particular
area code in the United States.

• Social_security_numbers: The set of valid nine-digit social security numbers.

• Names: The set of character strings that represent names of persons.

• Grade_point_averages: Possible values of computed grade point averages; each must be
a real (floating-point) number between 0 and 4.

• Employee_ages: Possible ages of employees of a company; each must be a value between
15 and 80 years old.

• Academic_department_names: The set of academic department names in a university,
such as Computer Science, Economics, and Physics.

• Academic_department_codes: The set of academic department codes, such as CS, ECON,
and PHYS.

The preceding are called logical definitions of domains. A data type or format is also
specified for each domain. For example, the data type for the domain USA_phone_numbers
can be declared as a character string of the form (ddd) ddd-dddd, where each d is
a numeric (decimal) digit and the first three digits form a valid telephone area code. The
data type for Employee_ages is an integer number between 15 and 80. For
Academic_department_names, the data type is the set of all character strings that represent
valid department names. A domain is thus given a name, data type, and format. Additional
information for interpreting the values of a domain can also be given; for example, a
numeric domain such as Person_weights should have the units of measurement, such as
pounds or kilograms.
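
One way to picture a domain is as a membership test: a value either belongs to the named
set or it does not. The Python sketch below models three of the domains above as
predicates. The DOMAINS dictionary and the in_domain helper are hypothetical names, and
the phone-number check only verifies the (ddd) ddd-dddd format, not whether the area code
is actually valid.

```python
import re

# Each domain is modelled as a predicate that takes a value and returns True or False.
DOMAINS = {
    # Format check only: (ddd) ddd-dddd. Area-code validity is NOT checked here.
    "USA_phone_numbers": lambda v: isinstance(v, str)
    and re.fullmatch(r"\(\d{3}\) \d{3}-\d{4}", v) is not None,
    # An integer between 15 and 80, inclusive.
    "Employee_ages": lambda v: isinstance(v, int) and 15 <= v <= 80,
    # A floating-point number between 0 and 4, inclusive.
    "Grade_point_averages": lambda v: isinstance(v, float) and 0.0 <= v <= 4.0,
}


def in_domain(domain_name, value):
    """Return True if value is a member of the named domain."""
    return DOMAINS[domain_name](value)


print(in_domain("USA_phone_numbers", "(713) 555-0123"))
print(in_domain("Employee_ages", 14))
print(in_domain("Grade_point_averages", 3.2))
```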

A relation schema R, denoted by R(A1, A2, ..., An), is made up of a relation name
R and a list of attributes A1, A2, ..., An. Each attribute Ai is the name of a role played by
some domain D in the relation schema R. D is called the domain of Ai and is denoted by
dom(Ai). A relation schema is used to describe a relation; R is called the name of this
relation. The degree (or arity) of a relation is the number of attributes n of its relation
schema. An example of a relation schema for a relation of degree seven, which describes
university students, is the following:

STUDENT(Name, SSN, HomePhone, Address, OfficePhone, Age, GPA)

Using the data type of each attribute, the definition is sometimes written as:

STUDENT(Name: string, SSN: string, HomePhone: string, Address: string, OfficePhone:
string, Age: integer, GPA: real)

For this relation schema, STUDENT is the name of the relation, which has seven attributes.
In the above definition, we showed assignment of generic types such as string or integer to
the attributes. More precisely, we can specify the following previously defined domains for
some of the attributes of the STUDENT relation: dom(Name) = Names; dom(SSN) =
Social_security_numbers; dom(HomePhone) = Local_phone_numbers; dom(OfficePhone)
= Local_phone_numbers; and dom(GPA) = Grade_point_averages. It is also possible to refer
to attributes of a relation schema by their position within the relation; thus, the second
attribute of the STUDENT relation is SSN, whereas the fourth attribute is Address.
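
The STUDENT schema, its degree and the positional reference to attributes can be
illustrated with a small Python sketch. The use of typing.NamedTuple is an assumption made
here for illustration; the generic types string, integer and real are mapped to Python's
str, int and float.

```python
from typing import NamedTuple


class Student(NamedTuple):
    """Relation schema STUDENT with the generic type of each attribute."""
    Name: str
    SSN: str
    HomePhone: str
    Address: str
    OfficePhone: str
    Age: int
    GPA: float


# The degree (arity) is the number of attributes in the schema.
degree = len(Student._fields)

# Attributes can be referred to by position (Python counts from 0).
second_attribute = Student._fields[1]
fourth_attribute = Student._fields[3]

print(degree, second_attribute, fourth_attribute)
```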

Figure 1.2: The attributes and Tuples of a relation STUDENT

Figure 1.2 shows an example of a STUDENT relation, which corresponds to the STUDENT
schema just specified. Each tuple in the relation represents a particular student entity. We
display the relation as a table, where each tuple is shown as a row and each attribute
corresponds to a column header indicating a role or interpretation of the values in that
column. Null values represent attributes whose values are unknown or do not exist for some
individual STUDENT tuple. The earlier definition of a relation can be restated more
formally as follows. A relation (or relation state) r(R) is a mathematical relation of degree
n on the domains dom(A1), dom(A2), ..., dom(An), which is a subset of the Cartesian
product of the domains that define R:

r(R) ⊆ (dom(A1) × dom(A2) × ... × dom(An))

The Cartesian product specifies all possible combinations of values from the underlying
domains. Hence, if we denote the total number of values, or cardinality, in a domain D by
|D| (assuming that all domains are finite), the total number of tuples in the Cartesian
product is

|dom(A1)| × |dom(A2)| × ... × |dom(An)|

Of all these possible combinations, a relation state at a given time (the current relation
state) reflects only the valid tuples that represent a particular state of the real world. In
general, as the state of the real world changes, so does the relation, by being transformed
into another relation state. However, the schema R is relatively static and changes only
very infrequently, for example, as a result of adding an attribute to represent new
information that was not originally stored in the relation. It is possible for several
attributes to have the same domain. The attributes indicate different roles, or
interpretations, for the domain. For example, in the STUDENT relation, the same domain
Local_phone_numbers plays the role of HomePhone, referring to the "home phone of a
student," and the role of OfficePhone, referring to the "office phone of the student."
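
The counting argument above can be checked on a toy example. The three small domains below
are hypothetical and chosen only to keep the numbers readable; the sketch builds the
Cartesian product with itertools.product and tests whether a candidate relation state is a
subset of it.

```python
from itertools import product
from math import prod

# Three small, finite domains (hypothetical values for illustration).
dom_name = {"Ann", "Bob"}
dom_class = {1, 2, 3}
dom_major = {"CS", "MATH"}
domains = [dom_name, dom_class, dom_major]

# All possible combinations of values from the underlying domains.
cartesian = set(product(*domains))

# |dom(A1)| x |dom(A2)| x |dom(A3)| = 2 x 3 x 2
expected_size = prod(len(d) for d in domains)

# A relation state holds only the tuples that are currently valid.
relation_state = {("Ann", 1, "CS"), ("Bob", 3, "MATH")}
is_valid_state = relation_state <= cartesian  # subset test

print(len(cartesian), expected_size, is_valid_state)
```

A tuple such as ("Ann", 9, "CS") could never appear in a valid state, because 9 is not in
the domain of the second attribute.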

1.5 CHECK YOUR PROGRESS


1. A relation in a relational database is based on a relational schema, which consists
of a number of ………………… .
2. ………………… is a Relational Database Management System.
3. Rows of the relation are referred to as ………………… of the relation.
4. The relational model was designed by the IBM research scientist and
mathematician, Dr. ………………… .
5. The ………………… is the only data structure used in the relational data model to
represent both entities and relationships between them.
6. True or false: Normal forms never remove anomalies.
7. True or false: The values of each attribute are drawn from a set of values known as
its domain.

1.6 SUMMARY
A DBMS is software used to store and manage data. DBMSs were introduced during the
1960s to store any kind of data. A DBMS also offers manipulation of the data, such as
insertion, deletion, and updating. It also performs functions such as defining, creating,
revising and controlling the database. It is specially designed to create and maintain data
and to enable individual business applications to extract the desired data.

A Relational Database Management System (RDBMS) is an advanced version of a
DBMS. It came into existence during the 1970s. An RDBMS also allows an organization to
access data more efficiently than a DBMS. An RDBMS is a software system used to store
only data that needs to be stored in the form of tables. In this kind of system, data is
managed and stored in rows and columns, which are known as tuples and attributes
respectively. The RDBMS is a powerful data management system and is widely used across the
world. The goal of relational database design is to generate a set of relation schemas that
allows us to store information without unnecessary redundancy and also to retrieve
information easily. A database system is an integrated collection of related files, along with
details of the interpretation of the data contained therein. A DBMS is software that allows
access to data contained in a database. The objective of the DBMS is to provide a
convenient and effective method of defining, storing and retrieving the information
contained in the database. The DBMS interfaces with application programs so that the data
contained in the database can be used by multiple applications and users. The DBMS allows
these users to access and manipulate the data contained in the database in a convenient and
effective manner. In addition, the DBMS exerts centralized control of the database, prevents
unauthorized users from accessing the data and ensures privacy of data.

In the relational database model, a table is a collection of data elements organised in
terms of rows and columns. A table is also considered a convenient representation of a
relation. However, a table can have duplicate rows of data, while a true relation cannot.
The table is the simplest form of data storage, and an RDBMS provides the tables in which
all data is stored. An RDBMS ensures that all stored data is in the form of rows and columns;
facilitates primary keys, which help in the unique identification of rows; supports index
creation for retrieving data at a higher speed; and allows a common column to be shared
between two or more tables. The major components of an RDBMS are table, record or tuple,
field, domain, instance, schema, and keys. A relational database stores data in tables.
Tables are organized into columns, and each column stores one type of data (integer, real
number, character string, date). The data for a single “instance” of a table is stored as a
row. Many relational database systems have an option of using SQL (Structured Query
Language) for querying and maintaining the database.

1.7 KEYWORDS
• Domain: A domain describes the set of possible values for a given attribute, and
can be considered a constraint on the value of the attribute. Mathematically,
attaching a domain to an attribute means that any value for the attribute must be an
element of the specified set. The character string "ABC", for instance, is not in the
integer domain, but the integer value 123 is.
• Constraints: Constraints make it possible to further restrict the domain of an
attribute. For instance, a constraint can restrict a given integer attribute to values
between 1 and 10.
• Tuple: A data set representing a single item.
• Column: A labeled element of a tuple, e.g. "Address" or "Date of birth".
• Table: A set of tuples sharing the same attributes; a set of columns and rows.
• View: Any set of tuples; a data report from the RDBMS in response to a query.

1.8 SELF-ASSESSMENT TEST


1. Explain the following terms
i) Domain
ii) Tuple
iii) Relation
iv) Attribute
2. Explain the difference between DBMS and RDBMS.
3. Why is the relational data model so popular?
4. What are record-based models?
5. How does an RDBMS store its data?

1.9 ANSWERS TO CHECK YOUR PROGRESS


1. Attributes
2. RDBMS
3. Tuples
4. E.F. Codd
5. Relation
6. False
7. True

1.10 REFERENCES / SUGGESTED READINGS

• C.J. Date, “An Introduction to Database Systems”, 8th edition, Addison Wesley, N.
Delhi.
• Ivan Bayross, “SQL, PL/SQL-The Programming Language of ORACLE”, 3rd edition, BPB
Publication.
• Elmasri and Navathe, “Fundamentals of Database Systems”, 5th edition, Pearson
Education.
• https://www.geeksforgeeks.org/difference-between-rdbms-and-dbms/
• https://en.wikipedia.org/wiki/Relational_database
• https://www.javatpoint.com/what-is-rdbms
• https://searchdatamanagement.techtarget.com/definition/RDBMS-relational-database-management-system

SUBJECT: RELATIONAL DATABASE MANAGEMENT
SYSTEM
COURSE CODE: BCA-244 AUTHOR: DR. DEEPAK NANDAL

LESSON NO. 2 VETTER:

CODD’S RULES FOR RELATIONAL MODEL

STRUCTURE

2.0 Learning Objective

2.1 Introduction

2.2 Definition

2.3 Codd’s Rules for relational model

2.3.1 Foundation Rule

2.3.2 Information Rule

2.3.3 Guaranteed Access Rule

2.3.4 Systematic Treatment of NULL Values

2.3.5 Active Online Catalog

2.3.6 Comprehensive Data Sub-Language Rule

2.3.7 View Updating Rule

2.3.8 High-Level Insert, Update and Delete Rule

2.3.9 Physical Data Independence

2.3.10 Logical Data Independence

2.3.11 Integrity Independence

2.3.12 Distribution Independence

2.3.13 Non-Subversion Rule

2.4 Check Your Progress

2.5 Summary

2.6 Keywords

2.7 Self-Assessment Test

2.8 Answers to check your progress

2.9 References / Suggested Readings

2.0 LEARNING OBJECTIVE


To understand the concepts of Codd’s relational database. To understand the rules that a
database must satisfy in order to be regarded as an RDBMS. To understand what a perfect
RDBMS is, these rules need to be understood well.

2.1 INTRODUCTION

The relational model was first introduced by Ted Codd of IBM Research in 1970 in a classic
paper (Codd 1970), and it attracted immediate attention due to its simplicity and
mathematical foundation. The model uses the concept of a mathematical relation, which
looks somewhat like a table of values, as its basic building block, and has its theoretical
basis in set theory and first-order predicate logic. In this chapter we briefly discuss the
history of Dr. Codd and his research, and the rules he stated to define a database as a
relational database. Dr. Edgar F. Codd, after his extensive research on the relational model
of database systems, came up with twelve rules of his own which, according to him, a
database must obey in order to be regarded as a true relational database. These rules can
be applied to any database system that manages stored data using only its relational
capabilities. This requirement is the foundation rule, which acts as a base for all the
other rules.
A Database Management System (DBMS) essentially consists of a comprehensive set of
application programs that can be used to access, manage and update the data, provided
the data is interrelated and persistent. Like any management system, the goal of a DBMS is
to provide an efficient and convenient environment in which it is easy to store information
in the database and retrieve it again. Databases are used to store and manage large amounts
of information.

To achieve this, the following are the absolute must-haves:

• Data Modeling − Defining the structures for information storage.
• Provision of Mechanisms − To manipulate processed data and modify file and
system structures, it is important to provide query processing mechanisms.
• Crash Recovery and Security − To avoid discrepancies and ensure that the data
is secure, crash recovery and security mechanisms are a must.
• Concurrency Control − If the system is shared by multiple users, concurrency
control is required.
The relational database was created based on the relational model. Codd proposed 13 rules,
popularly known as Codd's 12 rules, to test a DBMS against his relational model.
Codd's rules define what qualities a DBMS requires in order to become a Relational
Database Management System (RDBMS). Till now, there is hardly any commercial
product that follows all 13 of Codd's rules.

Terminology used:

• Relational Model: The relational model represents data in the form of relations or
tables.
• Relational Instance: The set of values present in a relation at a particular instant
of time is known as a relational instance, as shown in Table 1 and Table 2.
• Relational Schema: A schema represents the structure of a relation. For example, the
relational schema of the STUDENT relation can be represented as:
STUDENT (STUD_NO, STUD_NAME, STUD_PHONE, STUD_STATE,
STUD_COUNTRY, STUD_AGE)
• Attribute: Each relation is defined in terms of some properties, each of which is
known as an attribute. For example, STUD_NO, STUD_NAME, etc. are attributes of the
relation STUDENT.
• Domain of an attribute: The set of possible values an attribute can take in a relation
is called its domain. For example, the domain of STUD_AGE can be from 18 to 40.
• Tuple: Each row of a relation is known as a tuple. For example, the STUDENT relation
given below has 4 tuples.
• NULL values: Values of some attributes for some tuples may be unknown,
missing or undefined; these are represented by NULL. Two NULL values in a relation
are considered different from each other.
Table 1 and Table 2 represent a relational model having two relations, STUDENT and
STUDENT_COURSE.
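
The statement that two NULL values are considered different can be observed in SQL, where
comparing NULL with NULL gives NULL (unknown) rather than true. The sketch below uses
Python's built-in sqlite3 module; the reduced STUDENT table and its rows are hypothetical
and only loosely follow the schema above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (stud_no INTEGER PRIMARY KEY, stud_name TEXT, stud_phone TEXT)"
)
# Hypothetical rows: two students have an unknown phone, two share a known one.
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [
        (1, "RAM", "9716271721"),
        (2, "RAMESH", None),
        (3, "SUJIT", None),
        (4, "SURESH", "9716271721"),
    ],
)

# Comparing NULL with NULL yields NULL (unknown), which Python receives as None.
null_equals_null = conn.execute("SELECT NULL = NULL").fetchone()[0]

# Students paired by equal phone: the two NULL phones are never treated as equal.
pairs = conn.execute("""
    SELECT a.stud_name, b.stud_name
    FROM student AS a
    JOIN student AS b
      ON a.stud_phone = b.stud_phone AND a.stud_no < b.stud_no
""").fetchall()

# Missing values have to be tested with IS NULL instead of '='.
missing = conn.execute(
    "SELECT stud_name FROM student WHERE stud_phone IS NULL ORDER BY stud_no"
).fetchall()

print(null_equals_null, pairs, missing)
```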

2.2 DEFINITION
Dr. E.F. Codd, also known to the world as the ‘Father of Database Management Systems’,
propounded 12 rules, which are in fact 13 in number because they are numbered from zero
to twelve. According to him, a DBMS is fully relational if it abides by all of these rules.
Till now, only a few databases abide by all of them. His twelve rules are fondly called
‘E.F. Codd’s Twelve Commandments’. His seminal research paper ‘A Relational Model of
Data for Large Shared Data Banks’ is well worth reading in its entirety.

2.3 CODD’S RULES FOR RELATIONAL MODEL

The relational model was first introduced by Ted Codd of IBM Research in 1970 in a classic
paper (Codd 1970), and it attracted immediate attention due to its simplicity and
mathematical foundation. Codd also introduced relational algebra and laid the theoretical
foundations for the relational model in a series of papers (Codd 1971, 1972, 1972a, 1974);
he was later given the Turing Award, the highest honor of the ACM, for his work on the
relational model. In a later paper, Codd (1979) discussed extending the relational model to
incorporate more meta-data and semantics about the relations; he also proposed a
three-valued logic to deal with uncertainty in relations and to incorporate NULLs in the
relational algebra. The resulting model is known as RM/T. Childs (1968) had earlier used set
theory to model databases. Later, Codd (1990) published a book examining over 300 features
of the relational data model and database systems. E.F. Codd was a computer scientist who
invented the relational model for database management. The relational database was created
based on the relational model. Codd proposed 13 rules, popularly known as Codd's
12 rules, to test a DBMS against his relational model. Codd's rules define what qualities
a DBMS requires in order to become a Relational Database Management System (RDBMS).
Till now, there is hardly any commercial product that follows all 13 of Codd's rules.
Even Oracle follows only eight and a half (8.5) out of 13.

The following are Codd's original 13 rules:

2.3.1 Codd’s Rule 0- Foundation Rule

This is the foundation rule. It states that, to be an RDBMS, a system must qualify as
relational, as a database, and as a management system. Being relational means that the
tables in the database are related to one another by means of constraints/relationships;
there should not be independent tables left hanging in the database. Being a database means
that it stores data in a well-organized form, namely tables, that it can handle large
amounts of information and, in short, that it meets the objectives of a database.

Being a management system means that it can manage the data, relations, retrieval,
updates, deletes and permissions on the objects. It should handle all these administrative
tasks without compromising the objectives of the database, and it should perform them by
using query languages.

 The system must qualify as relational, as a database, and as a management
system. For a system to qualify as a relational database management system
(RDBMS), that system must use its relational facilities (exclusively) to manage
the database.
 The other 12 rules derive from this rule. The rules are as follows:

2.3.2 Codd’s Rule 1- Rule of Information

A relational database should store all data in the form of relations; in a Relational
Database Management System, relations are tables. A database holds a lot of data: user data
and the data about that data, or metadata. Whether it is user-defined data or metadata, each
group of data must be stored in a table in the form of rows and columns, with each value
held in a table cell. The order of rows and columns in the table should not affect the
meaning of the table. Each cell should hold a single value; there should not be any group or
range of values separated by commas, spaces or hyphens (normalized data). This should be
the only way to store data in the database. This rule is satisfied by all relational
databases.

For Example: The order in which the personal details of ‘James’ and ‘Antony’ are stored in
the PERSON table should make no difference; their rows can be stored in any order.
Similarly, storing a person's name first and then his address should be the same as storing
the address and then the name. It makes no difference to the meaning of the table.

2.3.3 Codd’s Rule 2- Rule of Guaranteed Access

The use of pointers to access data logically is strictly forbidden. Every atomic data value
should be logically accessible by using the right combination of the table name, the
primary key value that identifies the row, and the column name that identifies the
attribute. Each unique piece of data (atomic value) should be accessible by: Table
Name + Primary Key (Row) + Attribute (Column).

NOTE: Ability to directly access via POINTER is a violation of this rule.

This rule refers to the primary key. It states that any data item in a table should be
logically accessible by using the table in which it is stored, the primary key value of its
row, and the name of the column we want to access. When the combination of these three is
used, it should give the correct result. No column or cell value should be accessed directly
without specifying the table and the primary key. From Figure 2.1:

Address of Kathy: STUDENT + STUDENT_ID (Kathy) + ADDRESS is the right way of getting
any cell value. Kathy's address (Troy) should be accessible in this way.

Figure 2.1: Database of STUDENT

Each item of data in an RDBMS is guaranteed to be logically accessible by resorting to a
combination of table name, primary key value, and column name.
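The access path described above (table name + primary key value + column name) can be sketched with Python's built-in sqlite3 module. This is only an illustration: the STUDENT rows, the numeric STUDENT_ID values and the in-memory database are assumptions and are not taken from Figure 2.1.

```python
import sqlite3

# Hypothetical STUDENT table; the rows and ID values are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student ("
    "student_id INTEGER PRIMARY KEY, name TEXT, address TEXT)"
)
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [(100, "Joseph", "Fraser Town"), (101, "Kathy", "Troy")],
)

# Table name (student) + primary key value (101) + column name (address)
# together identify exactly one cell; no pointer or row position is used.
address = conn.execute(
    "SELECT address FROM student WHERE student_id = ?", (101,)
).fetchone()[0]
print(address)
```

The query names only logical things (a table, a key value, a column), which is the point of the rule.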

2.3.4 Codd’s Rule 3- Rule of Systematic Null Value Support

Null values are fully supported in relational databases. They should be uniformly treated
as ‘missing information’, and can also be interpreted as ‘inapplicable data’ or ‘unknown
information’. Null values are independent of any data type and should not be mistaken for
blanks, zeroes or empty strings. This rule is about the handling of NULLs in the database.
As a database consists of various types of data, different cells have different data types.
If a cell value is unknown, not applicable or missing, it cannot be represented as zero or
as an empty string; it is always represented as NULL. NULL should behave the same way
irrespective of the data type of the cell, and when it is used in a logical or arithmetic
operation the result should be handled correctly: the outcome is itself unknown.

For example:
Adding NULL to the number 5 should result in NULL:

5 + unknown = unknown
5 + NULL = NULL
5 + NULL ≠ 5 or 0

It should not result in zero or any other numeric value. The DBMS should be robust enough
to handle these NULLs according to the situation and the data types. Null values (distinct
from an empty character string or a string of blank characters and distinct from zero or
any other number) are supported in a fully relational DBMS for representing missing
information and inapplicable information in a systematic way, independent of the data
type.
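The NULL arithmetic above can be checked with a short sketch using Python's built-in sqlite3 module (an assumption made for illustration; any SQL database could be used). SQLite returns NULL to Python as None.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Arithmetic and comparison with NULL yield NULL (unknown);
# only the IS NULL test gives a definite answer.
total, is_equal, is_null = conn.execute(
    "SELECT 5 + NULL, NULL = NULL, NULL IS NULL"
).fetchone()

# An empty string and zero are ordinary values, not NULL.
empty_is_null, zero_is_null = conn.execute(
    "SELECT '' IS NULL, 0 IS NULL"
).fetchone()

print(total, is_equal, is_null, empty_is_null, zero_is_null)
```

Here `total` and `is_equal` come back as None, while the three IS NULL tests return 1, 0 and 0.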

2.3.5 Codd’s Rule 4- Rule of Active and online relational Catalog

In the lexicon of database management systems, ‘metadata’ is the data about the database,
or the data about the data. The active online catalog that stores the metadata is called
the ‘data dictionary’. The data dictionary is accessible only to authorized users who have
the required privileges, and the query language used for accessing the database should
also be used for accessing the data dictionary. The database description is represented at
the logical level in the same way as ordinary data, so that authorized users can apply the
same relational language to its interrogation as they apply to the regular data. In other
words, this rule describes the data dictionary. Metadata should be maintained for all the
data in the database, and it should itself be stored as tables, rows and columns, with
access privileges of its own. In short, the metadata stored in the data dictionary should
obey all the characteristics of a database and should always be correct and up to date. We
should be able to access it with the same query language that we use to access the database.

Active online catalog based on the relational model: The system must support an
online, inline, relational catalog that is accessible to authorized users by means of their
regular query language. That is, users must be able to access the database's structure
(catalog) using the same query language that they use to access the database's data. The
structure description of the entire database must be stored in an online catalog, known as
data dictionary, which can be accessed by authorized users. Users can use the same query
language to access the catalog which they use to access the database itself.
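As a small illustration of querying the catalog with the ordinary query language, the sketch below uses SQLite, whose catalog is exposed as a table named sqlite_master. The two tables created here are hypothetical, and other products expose their data dictionary under different names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE course (course_id INTEGER PRIMARY KEY, title TEXT)")

# The catalog is itself a table, so it is read with a plain SELECT,
# exactly like user data.
table_names = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]
print(table_names)
```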

2.3.6 Codd’s Rule 5- Rule of Comprehensive Data Sub-language

A single robust language should be able to define integrity constraints, views, data
manipulations, transactions and authorizations. If the database allows the data to be
accessed without the use of this language, it violates this rule. A relational system may
support several languages and various modes of terminal use (for example, the
fill-in-blanks mode). However, there must be at least one language whose statements are
expressible, per some well-defined syntax, as character strings and whose ability to
support all of the following is comprehensive: data definition, view definition, data
manipulation (interactive and by program), integrity constraints, authorization, and
transaction boundaries (begin, commit, and rollback). An RDBMS database should not be
accessed directly; it should always be accessed through a capable query language. This
query language should be able to access the data, manipulate the data and maintain the
consistency and integrity of the database. It should also make sure that a transaction is
either fully completed or not done at all.

For Example:

SQL (Structured Query Language) supports creating tables, views, constraints and indexes;
accessing the records of tables and views (SELECT); manipulating records by INSERT,
DELETE and UPDATE; providing security through different levels of access rights
(GRANT and REVOKE); and maintaining integrity and consistency by using constraints. A
database without any query language is not an RDBMS. The database can be accessed by
using the query language directly or by embedding it in an application.
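The sketch below shows one language (SQL, issued through Python's built-in sqlite3 module) covering data definition, a constraint, a view, data manipulation and transaction boundaries. The EMPLOYEE table and its rows are hypothetical. SQLite is an embedded database without user accounts, so GRANT and REVOKE are not shown.

```python
import sqlite3

# isolation_level=None: the module does not open transactions for us,
# so BEGIN / COMMIT / ROLLBACK are issued explicitly as SQL statements.
conn = sqlite3.connect(":memory:", isolation_level=None)

# Data definition with integrity constraints.
conn.execute(
    "CREATE TABLE employee ("
    "emp_id INTEGER PRIMARY KEY, "
    "name TEXT NOT NULL, "
    "salary INTEGER CHECK (salary >= 0))"
)
# View definition.
conn.execute("CREATE VIEW employee_names AS SELECT emp_id, name FROM employee")

# Data manipulation inside a committed transaction.
conn.execute("BEGIN")
conn.execute("INSERT INTO employee VALUES (1, 'James', 1000)")
conn.execute("COMMIT")

# A transaction that is rolled back leaves no trace.
conn.execute("BEGIN")
conn.execute("INSERT INTO employee VALUES (2, 'Antony', 2000)")
conn.execute("ROLLBACK")

names = [row[0] for row in conn.execute(
    "SELECT name FROM employee_names ORDER BY emp_id")]
print(names)
```

Only the committed row is visible through the view, which illustrates the "fully complete or not done at all" requirement.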

2.3.7 Codd’s Rule 6- Rule of Updating Views

Views should reflect the updates of their respective base tables and vice versa. A view is a
logical (virtual) table, created by a query, that shows a restricted part of the data: a
subset of the rows and columns of one or more tables. Views help in data abstraction and are
generally used to read data rather than to modify it. This rule states that views should
also be updatable, just as their base tables are.

For Example:

Suppose we have created a view on the EMPLOYEE table that holds the details of the
employees who work for a particular department, say ‘Testing’. Here EMPLOYEE is the
whole table and EMPLOYEE_TEST is the view with the Testing employees. According to this
rule, we should be able to update the records in EMPLOYEE_TEST.

But real database systems support this only partially. The basic intention of creating a
view is to present a group of data to the user in the form of a table. When lengthy queries
have to be written to get some details from the database, a view shortens the query and
makes it more meaningful. Such views, built on joins, aggregates or other complex queries,
often cannot be updated, because the change cannot be mapped unambiguously onto the base
tables. Although updating a simple view updates the table used for creating it, most
databases restrict it. Hence most databases do not fully satisfy this rule, which reads: all
views of the data which are theoretically updatable must be updatable in practice by the
DBMS.
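How far a product goes with this rule varies. As one illustration, SQLite treats views as read-only unless an INSTEAD OF trigger spells out how a change maps to the base table. The sketch below assumes a hypothetical EMPLOYEE table and EMPLOYEE_TEST view; the trigger name is also made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT, dept TEXT)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [(1, "James", "Testing"), (2, "Antony", "Design")],
)
conn.execute(
    "CREATE VIEW employee_test AS "
    "SELECT emp_id, name FROM employee WHERE dept = 'Testing'"
)

# Without further help, SQLite refuses to modify a view.
try:
    conn.execute("UPDATE employee_test SET name = 'Jim' WHERE emp_id = 1")
    direct_update_ok = True
except sqlite3.OperationalError:
    direct_update_ok = False

# An INSTEAD OF trigger tells the system how to apply the change
# to the base table, which makes the view updatable in practice.
conn.execute(
    """
    CREATE TRIGGER employee_test_update
    INSTEAD OF UPDATE ON employee_test
    BEGIN
        UPDATE employee SET name = NEW.name WHERE emp_id = OLD.emp_id;
    END
    """
)
conn.execute("UPDATE employee_test SET name = 'Jim' WHERE emp_id = 1")
base_name = conn.execute(
    "SELECT name FROM employee WHERE emp_id = 1").fetchone()[0]
print(direct_update_ok, base_name)
```

Some other systems accept updates on simple single-table views directly, so the trigger step is specific to this sketch.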

2.3.8 Codd’s Rule 7- Rule of Set level insertion, update and deletion

A single operation should be sufficient to retrieve, insert, update or delete a whole set of
rows. The capability of handling a base relation or a derived relation as a single operand
applies not only to the retrieval of data but also to the insertion, update, and deletion
of data. This rule states that the query language used by the database should support
INSERT, DELETE and UPDATE on sets of records. It should also support set operations such
as UNION, UNION ALL, MINUS, INTERSECT and INTERSECT ALL. None of these operations
should be restricted to a single table or a single row at a time; they should be able to
handle multiple tables and rows.

For Example:

Suppose employees get a 5% hike in a year. Their salaries then have to be updated to reflect
the new amounts. Since this is the annual hike, the increment applies to all employees.
Hence, the update should not have to be written one employee at a time for thousands of
employees; a single query should be able to update the salaries of all employees at once. A
database must support high-level insertion, update and deletion. These must not be limited
to a single row; the database must also support union, intersection and minus operations to
yield sets of data records.
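The salary hike can be written as a single set-level statement. The sketch below uses Python's built-in sqlite3 module with a hypothetical EMPLOYEE table of three rows; integer salaries are assumed so that the arithmetic stays exact.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [(1, "James", 1000), (2, "Antony", 2000), (3, "Kathy", 3000)],
)

# One statement raises every salary by 5%; no row-by-row loop is written.
cursor = conn.execute("UPDATE employee SET salary = salary * 105 / 100")
rows_changed = cursor.rowcount

salaries = [row[0] for row in conn.execute(
    "SELECT salary FROM employee ORDER BY emp_id")]
print(rows_changed, salaries)
```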

2.3.9 Codd’s Rule 8- Rule of Physical Data Independence

Batch and end-user operations are logically separated from physical storage and the
respective access methods. Application programs and terminal activities remain logically
unimpaired whenever any changes are made in either storage representations or access
methods. If there is any change in the physical storage of the data, it should not affect
the data at the logical or external (view) level; how the data is physically stored should
not matter to the applications. If, say, a file that holds a table is renamed or moved from
one disk to another, it should not affect the application.

For Example:

If the data stored on one disk is transferred to another disk, the user viewing the data
should not notice any difference in how the data is accessed. The user should be able to
access the data exactly as before. Similarly, if the name of the file that stores the table
is changed, it should not affect the table or the user viewing the table. This is known as
physical data independence, and a database should support this feature.

2.3.10 Codd’s Rule 9- Rule of Logical Data Independence

Batch and end users can change the database schema without having to recreate it or
recreate the applications built upon it. Application programs and terminal activities remain
logically unimpaired when information-preserving changes of any kind that theoretically
permit unimpairment are made to the base tables. This is similar to physical data
independence. Here, changes to the logical structure (the base tables) should not be
reflected in the user view.

For Example:

If we split the EMPLOYEE table by department into multiple employee tables, the user
viewing the employee data should not notice that the records are coming from different
tables. It should be possible to combine the split tables and show the result; in our
example we can use UNION and display the result to the user.

In practice, however, this is difficult to achieve, since the logical view and the user
view are usually tied together so strongly that they are almost the same.
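One way to picture the EMPLOYEE split is a view that recombines the per-department tables with UNION ALL, so that existing queries keep using the old name. The table names, rows and department labels below are hypothetical; this is a sketch of the idea, not a recommendation for any particular product.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# After the split there is one table per department.
conn.execute("CREATE TABLE employee_testing (emp_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE employee_design (emp_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO employee_testing VALUES (1, 'James')")
conn.execute("INSERT INTO employee_design VALUES (2, 'Antony')")

# A view named after the original table hides the split from the user.
conn.execute(
    "CREATE VIEW employee AS "
    "SELECT emp_id, name, 'Testing' AS dept FROM employee_testing "
    "UNION ALL "
    "SELECT emp_id, name, 'Design' AS dept FROM employee_design"
)

# The user's query is unchanged: it still reads from 'employee'.
all_names = [row[0] for row in conn.execute(
    "SELECT name FROM employee ORDER BY emp_id")]
print(all_names)
```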

2.3.11 Codd’s Rule 10- Rule of Integrity Independence

Integrity constraints should be stored as metadata in the data dictionary and not in the
application programs. The database should be able to apply integrity rules by using its
query language, without depending on any external factor or application to maintain
integrity. The keys and constraints in the database should be strong enough to enforce it.
A good RDBMS should be independent of the front-end application and should support at
least primary key and foreign key integrity constraints. Integrity constraints must be
definable in the RDBMS sub-language and stored in the system catalogue, not within
individual application programs.

For Example:

Suppose we want to insert an employee for department 50 using an application, but
department 50 does not exist in the system. In such a case, the application should not have
to check whether department 50 exists and, if it does not, insert the department before
inserting the employee. All of this should be handled by the database.
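A foreign key constraint is one way for the database itself to deal with the department 50 case: the invalid insert is rejected without any check in the application. The sketch below uses Python's built-in sqlite3 module with hypothetical DEPARTMENT and EMPLOYEE tables; note that SQLite enforces foreign keys only after the PRAGMA shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific: enable enforcement

conn.execute("CREATE TABLE department (dept_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE employee ("
    "emp_id INTEGER PRIMARY KEY, "
    "name TEXT, "
    "dept_id INTEGER NOT NULL REFERENCES department(dept_id))"
)
conn.execute("INSERT INTO department VALUES (10, 'Testing')")

# Department 50 does not exist, so the database refuses the row.
try:
    conn.execute("INSERT INTO employee VALUES (1, 'James', 50)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

employee_count = conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]
print(rejected, employee_count)
```

The constraint lives in the table definition, which is stored in the catalog, so every application that uses the database gets the same check.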

2.3.12 Codd’s Rule 11- Rule of Distribution Independence

The Data Manipulation Language of the relational system should not be concerned with
where the data is physically stored, and no changes should be required whether the data is
centralized or distributed. The end user must not be able to see that the data is
distributed over various locations; users should always get the impression that the data is
located at one site only. This rule has been regarded as the foundation of distributed
database systems. The database can be located on the user's own server or elsewhere on the
network. The end user does not need to know about the database servers and should be able
to get the records as if pulling them locally. Even if the database is spread over different
servers, access time should remain reasonably low. In short, an RDBMS has distribution
independence, which implies that users should not have to be aware of whether a database is
distributed.

2.3.13 Codd’s Rule 12- Rule of Non Subversion

Every access to a row must obey the security and integrity constraints imposed; no special
privileges allow them to be bypassed.

Almost all full-scale DBMSs are RDBMSs. Oracle implements 11+ rules and so does Sybase.
SQL Server also implements 11+ rules while FoxPro implements 7+ rules.

If a system has an interface that provides access to low-level records, then the interface
must not be able to subvert the system and bypass security and integrity constraints. When
a query is fired against the database, it is converted into a low-level form that the
underlying system can understand in order to retrieve the data. When records are accessed
or manipulated at this low level, there should not be any loopholes that compromise the
integrity of the database. In other words, the low-level operations must do exactly what
the query says: the query must not be translated into low-level operations that change the
data integrity or perform unwanted actions in the database.

For Example:

A query that updates a student's address should always be converted into low-level
operations that update only that address in the student file. It should not update any
other record in the file, nor insert a malicious record into the file or memory.

2.4 CHECK YOUR PROGRESS


1. Which rule states the conditions a system must meet to qualify as an RDBMS?
2. _________ is the rule stating that if the system provides a low-level
(record-at-a-time) interface, then that interface cannot be used to subvert the
system, for example, by bypassing a relational security or integrity constraint.
3. An entity can be a ____________, either animate or inanimate, that is easily
identifiable. For example, in a school database, students, teachers and classes.
4. There are _______ Codd's rules.
5. Dr. E. F. Codd first published the rules in ______.

2.5 SUMMARY
Not every database that has tables and constraints is a relational database system, and a
database that merely has a relational data model is not necessarily a relational database
management system (RDBMS). There are certain rules that a database must satisfy to be a
perfect RDBMS. These rules were published by Dr. Edgar F. Codd (E. F. Codd) in 1985. Codd
formulated 13 rules; according to him, following all of them yields a perfect RDBMS and
hence correct data and correct relations among the objects in the database. In practice, no
RDBMS product obeys all the rules; each obeys them only to some extent. For example,
Oracle is said to follow only 8.5 of Codd's rules.

Since Codd's pioneering work, much research has been conducted on various
aspects of the relational model. Todd (1976) describes an experimental DBMS called
PRTV that directly implements the relational algebra operations. Schmidt and Swenson
(1975) introduce additional semantics into the relational model by classifying different
types of relations. Chen's (1976) entity-relationship model is a means of communicating the
real-world semantics of a relational database at the conceptual level. Wiederhold and
Elmasri (1979) introduce various types of connections.

Several characteristics differentiate relations from ordinary tables or files. The first
is that tuples in a relation are not ordered. The second involves the ordering of attributes in
a relation schema and the corresponding ordering of values within a tuple. We gave an
alternative definition of relation that does not require these two orderings, but we continued
to use the first definition, which requires attributes and tuple values to be ordered, for
convenience. We then discussed values in tuples and introduced null values to represent
missing or unknown information. We then classified database constraints into inherent
model-based constraints, schema-based constraints and application-based constraints. We
then discussed the schema constraints pertaining to the relational model, starting with
domain constraints, then key constraints, including the concepts of super key, candidate
key, and primary key, and the NOT NULL constraint on attributes. We then defined
relational databases and relational database schemas. Additional relational constraints
include the entity integrity constraint, which prohibits primary key attributes from being
null. The interrelation referential integrity constraint was then described, which is used to
maintain consistency of references among tuples from different relations.

2.6 KEYWORDS
 DML- A data manipulation language (DML) is a computer programming language
used for adding (inserting), deleting, and modifying (updating) data in a database.
A DML is often a sublanguage of a broader database language such as SQL, with
the DML comprising some of the operators in the language.
 Super Key- A superkey is a set of attributes within a table whose values can be
used to uniquely identify a tuple. A candidate key is a minimal set of attributes
necessary to identify a tuple; this is also called a minimal superkey.
 Primary Key- A primary key, also called a primary keyword, is a key in a relational
database that is unique for each record. It is a unique identifier, such as a driver
license number, telephone number (including area code), or vehicle identification
number (VIN). Each table in a relational database has at most one primary key.
 SQL- SQL is Structured Query Language, which is a computer language for
storing, manipulating and retrieving data stored in a relational database. SQL is the
standard language for relational database systems.

 Schema- The database schema of a database is its structure described in a formal
language supported by the database management system (DBMS). The term
"schema" refers to the organization of data as a blueprint of how the database is
constructed (divided into database tables in the case of relational databases).

2.7 SELF-ASSESSMENT TEST


1. What is database schema?
2. Explain the foundation rule of RDBMS.
3. Discuss in detail Codd's 13 rules for qualifying a DBMS as an RDBMS.
4. What is the FUNCTION operation? What is it used for?
5. Define foreign key. What is this concept used for?
6. What is the difference between a key and a super key?
7. Why are tuples in a relation not ordered?

2.8 ANSWERS TO CHECK YOUR PROGRESS


1. Foundation Rule
2. Non subversion rule
3. Real world Object
4. 13
5. 1985

2.9 REFERENCES / SUGGESTED READINGS


 C.J Date, “An Introduction to Database Systems”, 8th edition, Addison Wesley N.
Delhi.
 Ivan Bayross, “SQL, PL/SQL-The Programming Language of ORACLE”, BPB
Publication 3rd edition.
 Elmasri and Navathe, “Fundamentals of Database Systems”, 5th edition, Pearson
Education.
 https://www.tutorialcup.com/dbms/codds-rule.htm
 https://www.webopedia.com/TERM/C/Codds_Rules.html
 https://www.tutorialspoint.com/e-f-codd-s-12-rules-for-rdbms

SUBJECT: RELATIONAL DATABASE MANAGEMENT
SYSTEM
COURSE CODE: BCA-244 AUTHOR: DR. DEEPAK NANDAL

LESSON NO. 3 VETTER:

RELATIONAL ALGEBRA

STRUCTURE

3.0 Learning Objective

3.1 Introduction

3.2 Definition

3.3 Unary Relational Operations

3.3.1 Select

3.3.2 Project

3.3.3 Rename

3.4 Relational Algebra Operations from Set Theory

3.4.1 Union Operator

3.4.2 Intersection Operator

3.4.3 Minus or Set Operator

3.4.4 Cartesian Product

3.5 Check Your Progress

3.6 Summary

3.7 Keywords

3.8 Self-Assessment Test

3.9 Answers to check your progress

3.10 References / Suggested Readings

3.0 LEARNING OBJECTIVE
 To understand the concepts of relational algebra, which is an integral part of the
relational data model.
 To learn the different operations, unary as well as binary, and their notation, with
examples in detail.

3.1 INTRODUCTION
Database management systems (DBMS) must have a query language so that users can
access the data stored in the database. Relational algebra (RA) is considered
a procedural query language, in which the user tells the system to carry out a set of
operations to obtain the desired result; that is, the user specifies both what data should
be retrieved from the database and how to retrieve it. This lesson gives a brief
introduction to relational algebra and goes through a few operations with examples.

The relational algebra is often considered to be an integral part of the relational data
model, and its operations can be divided into two groups. One group includes set operations
from mathematical set theory; these are applicable because each relation is defined to be a
set of tuples in the formal relational model. Set operations include UNION,
INTERSECTION, SET DIFFERENCE, and CARTESIAN PRODUCT. The other group
consists of operations developed specifically for relational databases; these include
SELECT, PROJECT, and JOIN, among others. This chapter first discusses the SELECT and
PROJECT operations, because they are unary operations that operate on single relations.
Then the chapter discusses the JOIN and other complex binary operations, which operate on
two tables. Some common database requests cannot be performed with the original relational
algebra operations, so additional operations were created to express these requests. These
include aggregate functions, which are operations that can summarize data from the tables,
as well as additional types of JOIN and UNION operations. These operations were added to
the original relational algebra because of their importance to many database applications.
Once the discussion of relational algebra is complete, the subsequent chapter will focus
on describing the other main formal language for relational databases, the relational
calculus.

3.2 DEFINITION
Relational algebra is a procedural query language that works on the relational model. The
purpose of a query language is to retrieve data from a database or to perform operations
such as insert, update and delete on the data. Saying that relational algebra is procedural
means that a query states both what data is to be retrieved and how it is to be retrieved:
it gives a step-by-step process for obtaining the result, using operators. Relational
calculus, on the other hand, is a non-procedural query language: it states what data is to
be retrieved but not how to retrieve it. Relational calculus is discussed in the next
lesson.

The relational algebra is a theoretical procedural query language which takes instances of
relations and performs operations on one or more relations to define another relation,
without altering the original relation(s). Thus, both the operands and the outputs are
relations, so the output of one operation can become the input to another operation. This
allows expressions to be nested in the relational algebra, just as arithmetic operations
are nested. This property is called closure: relations are closed under the algebra, just
as numbers are closed under arithmetic operations.
The relational algebra is a relation-at-a-time (or set) language in which all tuples are
manipulated in one statement without the use of a loop. There are several variations of
syntax for relational algebra commands; we use a common symbolic notation for the
commands and present it informally.

The primary operations of relational algebra are as follows:

 Select

 Project

 Union

 Set difference

 Cartesian product

 Rename

Relational algebra is a family of algebras with a well-founded semantics, used for modelling
the data stored in relational databases and defining queries on it. Its operators, which
can be either unary or binary, accept instances of relations as input and yield instances of
relations as output. Relational algebra operations are performed recursively on a relation:
the output of an operation is a new relation, which may be formed from one or more input
relations, and intermediate results are also considered relations.

Figure 3.1 shows how we use relational algebra to fetch information or data from a
bigger dataset or table. In relational algebra, input is a relation (table from which data has
to be accessed) and output is also a relation (a temporary table holding the data asked for
by the user).

Figure 3.1: Use of Relational Algebra

Relational Algebra works on the whole table at once, so we do not have to use loops etc.
to iterate over all the rows (tuples) of data one by one. All we have to do is specify the
table name from which we need the data, and in a single line of command, relational
algebra will traverse the entire given table to fetch data for you.

3.3 UNARY RELATIONAL OPERATIONS
In mathematics, a unary operation is an operation with only one operand, i.e. a single
input. This is in contrast to binary operations, which use two operands. An example is a
function f : A → A, where A is a set; the function f is a unary operation on A. Operators
act on what are known as operands. A relational operator that acts on one relation is
called a unary operator, and one that acts on two relations is called a binary operator.
Operators on more than two operands are possible, but we will not go into them here.
Figure 3.2 shows the different relational operations.

Figure 3.2: Types of Relational Operation

3.3.1 SELECT (σ)


The SELECT operation selects the subset of the tuples (rows) of a relation (table) that
satisfy a given selection condition, or predicate. It is denoted by the sigma (σ) symbol.
During selection, we can specify the conditions that the data must satisfy.

Notation − σ_p(r)

where

σ denotes the selection operator,

r stands for the relation, which is the name of the table, and

p is the selection predicate, a propositional logic formula which may use connectors like
and, or, and not. Its terms may use relational operators like =, ≠, ≥, <, >, ≤.

For example −

σ subject = "database" (Books)

Output − Selects tuples from Books where subject is 'database'.

σ subject = "database" and price = "450" (Books)

Output − Selects tuples from Books where subject is 'database' and price is 450.

σ subject = "database" and price = "450" or year > "2010" (Books)

Output − Selects tuples from Books where subject is 'database' and price is 450, or those
books published after 2010.

Let's do one more example:
Table: CUSTOMER

Customer_Id   Customer_Name   Customer_City
-----------   -------------   -------------
C10100        Steve           Agra
C10111        Raghu           Agra
C10115        Chaitanya       Noida
C10117        Ajeet           Delhi
C10118        Carl            Delhi

Syntax of SELECT is: σ Condition/Predicate(Relation/Table name)

Query:
σ Customer_City="Agra" (CUSTOMER)

Output:
Customer_Id   Customer_Name   Customer_City
-----------   -------------   -------------
C10100        Steve           Agra
C10111        Raghu           Agra

One more example:

Table: LOAN

BRANCH_NAME   LOAN_NO   AMOUNT
-----------   -------   ------
Downtown      L-17      1000
Redwood       L-23      2000
Perryride     L-15      1500
Downtown      L-14      1500
Mianus        L-13      500
Roundhill     L-11      900
Perryride     L-16      1300

Query:

σ BRANCH_NAME="Perryride" (LOAN)

Output:

BRANCH_NAME   LOAN_NO   AMOUNT
-----------   -------   ------
Perryride     L-15      1500
Perryride     L-16      1300
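The SELECT operation can also be sketched in a few lines of Python. This is an illustrative model only, not how a DBMS implements selection: the relation is represented as a frozenset of named tuples, and the names Customer and select are assumptions made for the example. The data is the CUSTOMER table shown above.

```python
from typing import Callable, FrozenSet, NamedTuple


class Customer(NamedTuple):
    customer_id: str
    customer_name: str
    customer_city: str


# A relation is modelled as a set of tuples, so it has no row order.
CUSTOMER: FrozenSet[Customer] = frozenset({
    Customer("C10100", "Steve", "Agra"),
    Customer("C10111", "Raghu", "Agra"),
    Customer("C10115", "Chaitanya", "Noida"),
    Customer("C10117", "Ajeet", "Delhi"),
    Customer("C10118", "Carl", "Delhi"),
})


def select(relation: FrozenSet[Customer],
           predicate: Callable[[Customer], bool]) -> FrozenSet[Customer]:
    """sigma_predicate(relation): keep the tuples that satisfy the predicate."""
    return frozenset(t for t in relation if predicate(t))


# sigma Customer_City="Agra" (CUSTOMER)
agra_customers = select(CUSTOMER, lambda t: t.customer_city == "Agra")
agra_names = sorted(t.customer_name for t in agra_customers)
print(agra_names)
```

Because the result is again a relation (a frozenset of the same tuples), it can be passed straight into another operation.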

3.3.2 PROJECT (∏)
The projection eliminates all attributes of the input relation except those mentioned in
the projection list. It works on a single relation R and defines a relation that contains a
vertical subset of R, extracting the values of the specified attributes and eliminating
duplicates. The Project operator is denoted by the ∏ (pi) symbol and is used to select the
desired columns (or attributes) from a table (or relation), discarding the other columns.
In simple words, if you want to see only the names of all the students in the Student
table, you can use the Project operation.

The Project operator in relational algebra is similar to the column list of the SELECT
statement in SQL. The operation lists the attributes that we wish to appear in the result;
the rest of the attributes are eliminated. Unlike SQL's SELECT, which keeps duplicate rows
unless DISTINCT is specified, projection removes duplicates.

Notation − ∏A , A , A (r)
1 2 n

Where A1, A2 , An are attribute names of relation r.

Duplicate rows are automatically eliminated, as relation is a set.

Produce a list of salaries for all staff, showing only the staffNo, fName, lName, and

salary details

ΠstaffNo, fName, lName, salary(Staff)

For example − ∏subject, author (Books)

Output − Projects the columns named subject and author from the relation Books.

Let's take some more examples to better understand the PROJECT notation.

Table: Customers

CustomerID CustomerName Status
---------- ------------ --------
1 Google Active
2 Amazon Active
3 Apple Inactive
4 Alibaba Active

Query: Π CustomerName, Status (Customers)

Output:

CustomerName Status
------------ --------
Google Active
Amazon Active
Apple Inactive
Alibaba Active

As another example, take the CUSTOMER table with three columns. We want to fetch only
two of them, which we can do with the help of the PROJECT operator ∏.

Table: CUSTOMER

Customer_Id Customer_Name Customer_City
----------- ------------- -------------
C10100 Steve Agra
C10111 Raghu Agra
C10115 Chaitanya Noida
C10117 Ajeet Delhi
C10118 Carl Delhi

Query: ∏ Customer_Name, Customer_City (CUSTOMER)

Output:

Customer_Name Customer_City
------------- -------------
Steve Agra
Raghu Agra
Chaitanya Noida
Ajeet Delhi
Carl Delhi
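
A minimal Python sketch of projection is given here, assuming each row is stored as a dictionary. The helper name `project` is hypothetical. Building the result as a set is what removes duplicate rows, which becomes visible when only Customer_City is projected.

```python
# Hypothetical in-memory copy of the CUSTOMER relation from the example above.
CUSTOMER = [
    {"Customer_Id": "C10100", "Customer_Name": "Steve", "Customer_City": "Agra"},
    {"Customer_Id": "C10111", "Customer_Name": "Raghu", "Customer_City": "Agra"},
    {"Customer_Id": "C10115", "Customer_Name": "Chaitanya", "Customer_City": "Noida"},
    {"Customer_Id": "C10117", "Customer_Name": "Ajeet", "Customer_City": "Delhi"},
    {"Customer_Id": "C10118", "Customer_Name": "Carl", "Customer_City": "Delhi"},
]

def project(relation, attributes):
    """Illustrative pi: keep only the listed attributes; the set drops duplicates."""
    return {tuple(row[a] for a in attributes) for row in relation}

# pi Customer_Name, Customer_City (CUSTOMER): five distinct rows remain.
names_and_cities = project(CUSTOMER, ["Customer_Name", "Customer_City"])

# pi Customer_City (CUSTOMER): five input rows collapse to three distinct cities.
cities = project(CUSTOMER, ["Customer_City"])
print(sorted(cities))
```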

3.3.3 RENAME (ρ)


RENAME is a unary operation used to rename a relation or its attributes. It can be used to
rename the output relation of any query operation that returns a result, such as SELECT or
PROJECT, or simply to rename an existing relation (table).

ρ (a/b)R renames the attribute 'b' of relation R to 'a'.

Syntax: ρ(RelationNew, RelationOld)
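
As a rough sketch, attribute renaming can be pictured in Python as rebuilding each row with new keys. The function `rename` and the sample row are assumptions made for illustration; the original relation is left unchanged, just as an algebra operation returns a new relation.

```python
def rename(relation, mapping):
    """Illustrative rho: return a new relation with attributes renamed (old -> new)."""
    return [{mapping.get(name, name): value for name, value in row.items()}
            for row in relation]

# Hypothetical one-row relation.
STUDENT = [{"Student_Id": "S901", "Student_Name": "Aditya"}]

# Rename the attribute Student_Name to Name.
renamed = rename(STUDENT, {"Student_Name": "Name"})
print(renamed)
```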

3.4 RELATIONAL ALGEBRA OPERATIONS FROM SET THEORY

3.4.1 Union Operator (∪)


The UNION operation combines the tuples of two relations (tables) or temporary relations
(results of other operations). It is denoted by the ∪ symbol. The result includes all tuples
that are in A, in B, or in both, and duplicate tuples are automatically eliminated. For
example, if two relations R1 and R2 have the same columns and we want all the tuples
(rows) from both of them, we can apply the union operator to these relations.

Syntax: A ∪ B

where A and B are relations.

For a union operation to be valid, the following conditions must hold -

 A and B must have the same number of attributes.
 Corresponding attribute domains need to be compatible.

Duplicate tuples are automatically removed from the result.

For example, if we have two tables RegularClass and ExtraClass, both have a
column student to save name of student, then,

∏Student(RegularClass) ∪ ∏Student(ExtraClass)

The above operation gives us the names of students who attend regular classes, extra
classes, or both, eliminating repetition.

Example:

table_name1 ∪ table_name2

Table 1: COURSE

Course_Id Student_Name Student_Id


--------- ------------ ----------
C101 Aditya S901
C104 Aditya S901
C106 Steve S911
C109 Paul S921
C115 Lucy S931

Table 2: STUDENT

Student_Id Student_Name Student_Age


------------ ---------- -----------
S901 Aditya 19
S911 Steve 18
S921 Paul 19
S931 Lucy 17
S941 Carl 16
S951 Rick 18

Query:

∏ Student_Name (COURSE) ∪ ∏ Student_Name (STUDENT)

Output:

Student_Name
------------
Aditya
Carl
Paul
Lucy
Rick
Steve

Note: There are no duplicate names in the output, even though several names appear in
both tables and the name Aditya appears twice in the COURSE table itself.
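
The same union can be sketched with Python sets. This is a minimal illustration that stores the two tables as lists of plain tuples; the variable names are assumptions. The set comprehension plays the role of the projection on Student_Name, and the `|` operator is set union.

```python
# Hypothetical in-memory copies of the COURSE and STUDENT tables shown above.
COURSE = [
    ("C101", "Aditya", "S901"),
    ("C104", "Aditya", "S901"),
    ("C106", "Steve", "S911"),
    ("C109", "Paul", "S921"),
    ("C115", "Lucy", "S931"),
]
STUDENT = [
    ("S901", "Aditya", 19),
    ("S911", "Steve", 18),
    ("S921", "Paul", 19),
    ("S931", "Lucy", 17),
    ("S941", "Carl", 16),
    ("S951", "Rick", 18),
]

# Projection on Student_Name; building a set removes the repeated "Aditya".
course_names = {name for (_course_id, name, _student_id) in COURSE}
student_names = {name for (_student_id, name, _age) in STUDENT}

# Union: names that occur in either relation, each listed once.
all_names = course_names | student_names
print(sorted(all_names))
```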

3.4.2 INTERSECTION OPERATOR (∩)

The intersection operator is denoted by the ∩ symbol and is used to select the common rows
(tuples) of two tables (relations). Let's say we have two relations R1 and R2 with the same
columns, and we want to select all those tuples (rows) that are present in both relations;
in that case we can apply the intersection operation to them: R1 ∩ R2.

Note: Only those rows that are present in both the tables will appear in the result set.

Syntax:

table_name1 ∩ table_name2

Table 1: COURSE

Course_Id Student_Name Student_Id


--------- ------------ ----------
C101 Aditya S901
C104 Aditya S901
C106 Steve S911
C109 Paul S921
C115 Lucy S931

Table 2: STUDENT

Student_Id Student_Name Student_Age


------------ ---------- -----------
S901 Aditya 19
S911 Steve 18
S921 Paul 19
S931 Lucy 17
S941 Carl 16
S951 Rick 18

Query:

∏ Student_Name (COURSE) ∩ ∏ Student_Name (STUDENT)

Output:

Student_Name
------------
Aditya
Steve
Paul
Lucy

3.4.3 MINUS or SET DIFFERENCE (-)

Set difference in relational algebra is the same operation as set difference in set theory,
with the constraint that both relations must have the same set of attributes. The result of
the set difference operation is the set of tuples that are present in the first relation but not
in the second. Let's take the same tables COURSE and STUDENT that we have seen above.

Notation − r − s or A − B

Query:

∏ Student_Name (STUDENT) - ∏ Student_Name (COURSE)

Output:

Student_Name
------------
Carl
Rick
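
Intersection and set difference map directly onto Python's set operators. The short sketch below starts from the two projected name sets of the examples above, written out by hand, and also shows that set difference depends on the order of its operands.

```python
# Student_Name values projected from COURSE and from STUDENT (as in the examples).
course_names = {"Aditya", "Steve", "Paul", "Lucy"}
student_names = {"Aditya", "Steve", "Paul", "Lucy", "Carl", "Rick"}

# Intersection: names present in both relations.
common = course_names & student_names

# Set difference: names in STUDENT that do not appear in COURSE.
only_in_student = student_names - course_names

# Reversing the operands gives a different (here empty) result.
only_in_course = course_names - student_names

print(sorted(common))
print(sorted(only_in_student))
print(sorted(only_in_course))
```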

3.4.4 Cartesian product (X)

The Cartesian product is denoted by the X symbol. Given two relations R1 and R2, the
Cartesian product of these two relations (R1 X R2) combines each tuple of the first relation
R1 with each tuple of the second relation R2. The following example makes this clear.

Syntax:

R1 X R2

Table 1: R

Col_A Col_B
----- ------
AA 100
BB 200
CC 300

Table 2: S

Col_X Col_Y
----- -----
XX 99
YY 11
ZZ 101

Query:

R X S

Output:

Col_A Col_B Col_X Col_Y


----- ------ ------ ------
AA 100 XX 99
AA 100 YY 11
AA 100 ZZ 101
BB 200 XX 99
BB 200 YY 11
BB 200 ZZ 101
CC 300 XX 99
CC 300 YY 11
CC 300 ZZ 101
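
The pairing of every tuple of R with every tuple of S can be sketched with `itertools.product` from the Python standard library. The list names are assumptions; each result row is the concatenation of one tuple from R and one from S.

```python
from itertools import product

# In-memory copies of the relations R(Col_A, Col_B) and S(Col_X, Col_Y) shown above.
R = [("AA", 100), ("BB", 200), ("CC", 300)]
S = [("XX", 99), ("YY", 11), ("ZZ", 101)]

# R X S: concatenate each tuple of R with each tuple of S.
r_x_s = [r + s for r, s in product(R, S)]

for row in r_x_s:
    print(*row)
```

With 3 tuples in R and 3 in S the result has 3 × 3 = 9 tuples, each with 2 + 2 = 4 attributes.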

3.5 CHECK YOUR PROGRESS


1. Rename operator is represented by________.
2. The union operator comes under which type? Unary or binary.
3. Rename operator comes under which category? Unary or binary.
4. SELECT is a ________ operation (unary or binary).
5. CARTESIAN product is denoted by ________.

3.6 SUMMARY
In this chapter we presented two formal languages for the relational model of data.
They are used to manipulate relations and produce new relations as answers to queries. We
discussed the relational algebra and its operations, which are used to specify a sequence of
operations to specify a query. Then we introduced two types of relational calculi called
tuple calculus and domain calculus; they are declarative in that they specify the result of a
query without specifying how to produce the query result. The data for a single “instance”
of a table is stored as a row. Many relational database systems have an option of using the
SQL (Structured Query Language) for querying and maintaining the database.

We introduced the basic relational algebra operations and illustrated the types of
queries for which each is used. The unary relational operators SELECT and PROJECT, as
well as the RENAME operation, were discussed first. Then we discussed binary set
theoretic operations requiring that relations on which they are applied be union compatible;
these include UNION, INTERSECTION, and SET DIFFERENCE. The CARTESIAN
PRODUCT operation is a set operation that can be used to combine tuples from two
relations, producing all possible combinations. It is rarely used in practice; however, we
showed how CARTESIAN PRODUCT followed by SELECT can be used to define
matching tuples from two relations and leads to the JOIN operation. Different JOIN
operations called THETA JOIN, EQUIJOIN, and NATURAL JOIN were introduced. Some
important types of queries cannot be stated with the basic relational algebra operations
but are important for practical situations. We introduced the AGGREGATE FUNCTION
operation to deal with aggregate types of requests. We discussed recursive queries, for
which there is no direct support in the algebra but which can be handled step by
step, as we demonstrated. We then presented the OUTER JOIN and OUTER
UNION operations, which extend JOIN and UNION and allow all information in source
relations to be preserved in the result.

3.7 KEYWORDS
 DATASET- A data set (or dataset) is a collection of data. In the case of tabular
data, a data set corresponds to one or more database tables, where every column of
a table represents a particular variable, and each row corresponds to a given record
of the data set in question
 TUPLE- In the relational model, a tuple is a single row of a relation. It holds one
value for each attribute of the relation. Because a relation is a set, the tuples of a
relation have no particular order and no two tuples are identical.
 RDBMS- Stands for "Relational Database Management System." An RDBMS is a
DBMS designed specifically for relational databases. Therefore, RDBMSes are a
subset of DBMSes. A relational database refers to a database that stores data in a
structured format, using rows and columns.
 QUERY- A database query is a request for data from a database. Usually the
request is to retrieve data; however, data can also be manipulated using queries.
 SQL- Structured Query Language, which is a computer language for storing,
manipulating and retrieving data stored in a relational database. SQL is the standard
language for Relational Database System.

3.8 SELF-ASSESSMENT TEST
1. Explain the role of relational algebra in relational database.
2. What are the different types of relational algebra operations? Discuss in detail.
3. How are different notations used in relational algebra? Discuss with examples.
4. Rename operator comes under which category, when it comes to relational algebra?
5. What are the major differences between unary and binary operations in relational
algebra?

3.9 ANSWERS TO CHECK YOUR PROGRESS


1. ρ
2. Binary
3. Unary
4. Unary
5. X

3.10 REFERENCES / SUGGESTED READINGS


 C.J Date, “An Introduction to Database Systems”, 8th edition, Addison Wesley N.
Delhi.
 Ivan Bayross, “SQL, PL/SQL-The Programming Language of ORACLE”, BPB
Publication 3rd edition.
 Elmasri and Navathe, “Fundamentals of Database Systems”, 5th edition, Pearson
Education.
 https://beginnersbook.com/2019/02/dbms-relational-algebra/
 https://www.studytonight.com/dbms/relational-algebra.php
 https://www.guru99.com/relational-algebra-dbms.html
 https://www.tutorialspoint.com/dbms/relational_algebra.htm

SUBJECT: RELATIONAL DATABASE MANAGEMENT
SYSTEM
COURSE CODE: BCA-244 AUTHOR: DR. DEEPAK NANDAL

LESSON NO. 4 VETTER:

RELATIONAL CALCULUS

STRUCTURE

4.0 Learning Objective

4.1 Introduction

4.2 Definition of Relational Calculus

4.3 Tuple Relational Calculus

4.4 Domain Relational Calculus

4.5 Check Your Progress

4.6 Summary

4.7 Keywords

4.8 Self-Assessment Test

4.9 Answers to check your progress

4.10 References / Suggested Readings

4.0 LEARNING OBJECTIVE


 To understand the term relational calculus, how it is different from relational
algebra, what is the need of relational calculus. To know the tuple relational calculus
and domain relational calculus in depth.

4.1 INTRODUCTION
In the previous chapter we discussed relational algebra, which is a procedural query
language. In this chapter we discuss relational calculus, which is a non-procedural query
language. A relational algebra expression explicitly states a certain order of operations,
and so implies a plan for evaluating the query. In relational calculus there is no
description of how to evaluate a query; instead, a relational calculus query focuses on what
is to be retrieved rather than how to retrieve it. Relational calculus is not related to the
differential and integral calculus of mathematics; it takes its name from a branch of
symbolic logic called predicate calculus. When applied to databases, it is found in two
forms. These are

 Tuple relational calculus which was originally proposed by Codd in the year 1972
and
 Domain relational calculus which was proposed by Lacroix and Pirotte in the year
1977

A calculus expression specifies what is to be retrieved rather than how to retrieve it.
Therefore, the relational calculus is considered to be a nonprocedural language. This differs
from relational algebra, where we must write a sequence of operations to specify a retrieval
request; hence, it can be considered as a procedural way of stating a query. It is possible to
nest algebra operations to form a single expression; however, a certain order among the
operations is always explicitly specified in a relational algebra expression. This order also
influences the strategy for evaluating the query. A calculus expression may be written in
different ways, but the way it is written has no bearing on how a query should be evaluated.

It has been shown that any retrieval that can be specified in the basic relational
algebra can also be specified in relational calculus, and vice versa; in other words, the
expressive power of the two languages is identical. This led to the definition of the concept
of a relationally complete language. A relational query language L is considered
relationally complete if we can express in L any query that can be expressed in relational
calculus. Relational completeness has become an important basis for comparing the
expressive power of high-level query languages. However, certain frequently required
queries in database applications cannot be expressed in basic relational algebra or calculus.
Most relational query languages are relationally complete but have more expressive power
than relational algebra or relational calculus because of additional operations such as
aggregate functions, grouping, and ordering.

4.2 DEFINITION OF RELATIONAL CALCULUS
What is Relational Calculus?

Relational calculus is a non-procedural query language: it tells the system what data is to
be retrieved but does not tell it how to retrieve that data. In a non-procedural query
language, the user is not concerned with the details of how to obtain the end results. In
contrast, relational algebra is a procedural query language, in which a query also states the
sequence of operations used to fetch the data. Relational calculus has no description of how
the query will work or how the data will be fetched. It focuses only on what to do, and not
on how to do it.

Relational Calculus exists in two forms as shown in figure 4.1:

1. Tuple Relational Calculus (TRC)


2. Domain Relational Calculus (DRC)

Figure 4.1: Types of relational calculus

In first-order logic or predicate calculus, a predicate is a truth-valued function with
arguments. When we substitute values for the arguments, the function yields an
expression, called a proposition, which is either true or false.

For example, in relational algebra the steps involved in listing all the employees who attend
the 'Networking' course would be:

SELECT the tuples from the COURSE relation with COURSENAME = 'NETWORKING'

PROJECT the COURSE_ID from the above result

SELECT the tuples from the EMP relation with the COURSE_ID values obtained above.

A relational calculus query for the same request states only the condition that the result
tuples must satisfy, not these steps.
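
The difference in style can be sketched in Python. The sample rows, the EMP_NAME attribute and all variable names below are made up for illustration. The first version spells out the three algebra steps in order; the second states a single condition in the spirit of the calculus. Python still evaluates the second form in a fixed order, so the analogy concerns only how the query is written.

```python
# Hypothetical sample data for the COURSE and EMP relations.
COURSE = [
    {"COURSE_ID": "C1", "COURSENAME": "NETWORKING"},
    {"COURSE_ID": "C2", "COURSENAME": "DATABASE"},
]
EMP = [
    {"EMP_NAME": "Asha", "COURSE_ID": "C1"},
    {"EMP_NAME": "Ravi", "COURSE_ID": "C2"},
    {"EMP_NAME": "Meena", "COURSE_ID": "C1"},
]

# Algebra style: an explicit sequence of operations.
step1 = [c for c in COURSE if c["COURSENAME"] == "NETWORKING"]  # SELECT
step2 = {c["COURSE_ID"] for c in step1}                         # PROJECT
step3 = [e for e in EMP if e["COURSE_ID"] in step2]             # SELECT
algebra_result = {e["EMP_NAME"] for e in step3}

# Calculus style: describe which employees qualify, in one condition.
calculus_result = {
    e["EMP_NAME"]
    for e in EMP
    if any(c["COURSE_ID"] == e["COURSE_ID"] and c["COURSENAME"] == "NETWORKING"
           for c in COURSE)
}

print(sorted(algebra_result), sorted(calculus_result))
```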

4.3 TUPLE RELATIONAL CALCULUS

In the tuple relational calculus, we find the tuples for which a predicate is true. The
calculus is based on the use of tuple variables. A tuple variable is a variable that 'ranges
over' a named relation, i.e., a variable whose only permitted values are tuples of that
relation. The result of a query is a relation that can have one or more tuples. Unlike
relational algebra, tuple relational calculus is a non-procedural query language: it provides
only the description of the query, not the method to evaluate it. Thus, it explains what to
do but not how to do it.

Syntax:

{ T | Condition }

For example, to specify the range of a tuple variable S as the Staff relation, we write:

Staff(S)

To express the query 'Find the set of all tuples S such that F(S) is true,' we can write:

{S | F(S)}

Here, F is called a formula (well-formed formula, or wff in mathematical logic). For
example, to express the query 'Find the staffNo, fName, lName, position, sex, DOB, salary,
and branchNo of all staff earning more than £10,000', we can write:

{S | Staff(S) ∧ S.salary > 10000}

In this form of relational calculus, we define a tuple variable, specify the table (relation)
in which the tuple is to be searched for, and state a condition.

We can also specify a column name with the tuple variable, using the dot (.) operator, to get
only a certain attribute (column) in the result. A tuple variable is nothing but a name; it
can be anything, but generally a single letter is used, so let's say T is a tuple variable. To
specify the name of the relation (table) in which we want to look for data, we write:

Relation(T), where T is our tuple variable.

For example, if our table is Student, we write Student(T).

Then comes the condition part. To specify a condition on a particular attribute (column),
we use the dot operator with the tuple variable. For example, in the table Student, to get
data for students with age greater than 17, we write:

T.age > 17, where T is our tuple variable.

Putting it all together, to use tuple relational calculus to fetch the names of students from
the table Student with age greater than 17, with T as our tuple variable, we write:

{ T.name | Student(T) AND T.age > 17 }

Let us take one more detailed example

Table: STUDENT

First_Name Last_Name Age


---------- --------- ----
Ajeet Singh 30
Chaitanya Singh 31
Rajeev Bhatia 27
Carl Pratap 28

Let’s write relational calculus queries.

Query to display the last name of those students where age is greater than 30.

{ t.Last_Name | Student(t) AND t.Age > 30 }

The above query has two parts separated by the | symbol. The first part specifies the fields
that we want to display for the selected tuples, and the second part defines the condition.

The result of the above query would be:

Last_Name
---------
Singh

Query to display all the details of students where Last name is ‘Singh’.

{ t | Student(t) AND t.Last_Name = 'Singh' }

Output:

First_Name Last_Name Age


---------- --------- ----
Ajeet Singh 30
Chaitanya Singh 31

Let’s take one more example for better understanding of Tuple relational calculus.

Table-1: Customer

CUSTOMER NAME  STREET  CITY
-------------  ------  ---------
Saurabh        A7      Patiala
Mehak          B6      Jalandhar
Sumiti         D9      Ludhiana
Ria            A5      Patiala

Table-2: Branch

BRANCH NAME  BRANCH CITY
-----------  -----------
ABC          Patiala
DEF          Ludhiana
GHI          Jalandhar

Table-3: Account

ACCOUNT NUMBER  BRANCH NAME  BALANCE
--------------  -----------  -------
1111            ABC          50000
1112            DEF          10000
1113            GHI          9000
1114            ABC          7000

Table-4: Loan

LOAN NUMBER  BRANCH NAME  AMOUNT
-----------  -----------  ------
L33          ABC          10000
L35          DEF          15000
L49          GHI          9000
L98          DEF          65000

Table-5: Borrower

CUSTOMER NAME  LOAN NUMBER
-------------  -----------
Saurabh        L33
Mehak          L49
Ria            L98

Table-6: Depositor

CUSTOMER NAME  ACCOUNT NUMBER
-------------  --------------
Saurabh        1111
Mehak          1113
Sumiti         1114

Queries-1: Find the loan number, branch name and amount of loans with an amount greater
than or equal to 10000.

{t| t ∈ loan ∧ t[amount]>=10000}


Resulting relation:

LOAN NUMBER BRANCH NAME AMOUNT

L33 ABC 10000

L35 DEF 15000

L98 DEF 65000

In the above query, t is the tuple variable, and t[amount] denotes the value of the attribute
amount in the tuple t.

Queries-2: Find the loan number for each loan of an amount greater than or equal to 10000.

{t | ∃ s ∈ loan (t[loan-number] = s[loan-number] ∧ s[amount] >= 10000)}
Resulting relation:

LOAN NUMBER

L33

L35

L98

Queries-3: Find the names of all customers who have a loan and an account at the bank.

{t | ∃ s ∈ borrower (t[customer-name] = s[customer-name])
∧ ∃ u ∈ depositor (t[customer-name] = u[customer-name])}
Resulting relation:

CUSTOMER NAME

Saurabh

Mehak

Queries-4: Find the names of all customers having a loan at the “ABC” branch.

{t | ∃ s ∈ borrower (t[customer-name] = s[customer-name]
∧ ∃ u ∈ loan (u[branch-name] = “ABC” ∧ u[loan-number] = s[loan-number]))}
Resulting relation:

CUSTOMER NAME

Saurabh
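
The existential quantifier ∃ in Queries-3 and Queries-4 corresponds closely to Python's built-in `any()`: "there exists a tuple u in the relation such that the condition holds". The following sketch assumes the Borrower, Depositor and Loan tables are stored as sets of plain tuples; the variable names are illustrative.

```python
# Hypothetical in-memory copies of the tables above.
BORROWER = {("Saurabh", "L33"), ("Mehak", "L49"), ("Ria", "L98")}
DEPOSITOR = {("Saurabh", 1111), ("Mehak", 1113), ("Sumiti", 1114)}
LOAN = {
    ("L33", "ABC", 10000),
    ("L35", "DEF", 15000),
    ("L49", "GHI", 9000),
    ("L98", "DEF", 65000),
}

# Queries-3: customers who have both a loan and an account.
loan_and_account = {
    s_name
    for (s_name, _loan_no) in BORROWER
    if any(u_name == s_name for (u_name, _account_no) in DEPOSITOR)
}

# Queries-4: customers who have a loan at the "ABC" branch.
abc_borrowers = {
    s_name
    for (s_name, s_loan_no) in BORROWER
    if any(branch == "ABC" and loan_no == s_loan_no
           for (loan_no, branch, _amount) in LOAN)
}

print(sorted(loan_and_account))
print(sorted(abc_borrowers))
```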

4.4 DOMAIN RELATIONAL CALCULUS

In contrast to tuple relational calculus, domain relational calculus uses a list of attributes
to be selected from the relation based on the condition. It is similar to TRC, but differs by
selecting attributes rather than whole tuples. In the tuple relational calculus, we use
variables that range over the tuples of a relation. In the domain relational calculus, we also
use variables, but in this case the variables take their values from the domains of attributes
rather than from the tuples of relations. Domain relational calculus is the second form of
relational calculus.

 In domain relational calculus, the filtering variables range over the domains of attributes.
 Domain relational calculus uses the same operators as tuple calculus.
 It uses the logical connectives ∧ (and), ∨ (or) and ¬ (not).
 It uses the existential (∃) and universal (∀) quantifiers to bind variables.

Domain relational calculus is a non-procedural query language equivalent in power to
tuple relational calculus. It provides only the description of the query, not the method to
evaluate it. In domain relational calculus, a query is expressed as,

Notation:

{ a1, a2, a3, ..., an | P (a1, a2, a3, ... ,an)}

Or

{ < x1, x2, x3, ..., xn > | P (x1, x2, x3, ..., xn ) }

where < x1, x2, x3, …, xn > represents the resulting domain variables and P (x1, x2, x3, …,
xn ) represents the condition, a formula of the predicate calculus.

Example 1:

Table: STUDENT

First_Name Last_Name Age


---------- --------- ----
Ajeet Singh 30
Chaitanya Singh 31
Rajeev Bhatia 27
Carl Pratap 28

Query to find the first name and age of students whose age is greater than 27.

Query:

{< First_Name, Age > | ∃ Last_Name (< First_Name, Last_Name, Age > ∈ Student ∧ Age > 27)}

Note: The symbols used for logical operators are: ∧ for AND, ∨ for OR and ¬ for NOT.

Output:

First_Name Age
---------- ----
Ajeet 30
Chaitanya 31
Carl 28

Example 2:

Table-1: Customer

CUSTOMER NAME  STREET      CITY
-------------  ----------  ----------
Debomit        Kadamtala   Alipurduar
Sayantan       Udaypur     Balurghat
Soumya         Nutanchati  Bankura
Ritu           Juhu        Mumbai

Table-2: Loan

LOAN NUMBER  BRANCH NAME  AMOUNT
-----------  -----------  ------
L01          Main         200
L03          Main         150
L10          Sub          90
L08          Main         60

Table-3: Borrower

CUSTOMER NAME  LOAN NUMBER
-------------  -----------
Ritu           L01
Debomit        L08
Soumya         L03

Query-1: Find the loan number, branch name and amount of loans with an amount greater than or equal to 100.

{< l, b, a > | < l, b, a > ∈ loan ∧ (a ≥ 100)}


Resulting relation:

LOAN NUMBER BRANCH NAME AMOUNT

L01 Main 200

L03 Main 150

Query-2: Find the loan number for each loan of an amount greater than or equal to 150.

{< l > | ∃ b, a (< l, b, a > ∈ loan ∧ (a ≥ 150))}
Resulting relation:

LOAN NUMBER

L01

L03

Query-3: Find the names of all customers having a loan at the “Main” branch, together with
the loan amount.

{< c, a > | ∃ l (< c, l > ∈ borrower ∧ ∃ b (< l, b, a > ∈ loan ∧ (b = “Main”)))}
Resulting relation:

CUSTOMER NAME AMOUNT

Ritu 200

Debomit 60

Soumya 150
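
In a Python sketch of domain relational calculus, each variable of the comprehension stands for one attribute value rather than for a whole tuple, which mirrors the way domain variables are used in the three queries above. The table contents are copied from the example; the variable names are assumptions.

```python
# Hypothetical in-memory copies of the Loan and Borrower tables above.
LOAN = {("L01", "Main", 200), ("L03", "Main", 150), ("L10", "Sub", 90), ("L08", "Main", 60)}
BORROWER = {("Ritu", "L01"), ("Debomit", "L08"), ("Soumya", "L03")}

# Query-1: all three attributes of loans with amount >= 100.
q1 = {(loan_no, branch, amount) for (loan_no, branch, amount) in LOAN if amount >= 100}

# Query-2: only the loan number; branch and amount are existentially quantified.
q2 = {loan_no for (loan_no, _branch, amount) in LOAN if amount >= 150}

# Query-3: customer name and amount for loans at the "Main" branch.
q3 = {
    (customer, amount)
    for (customer, borrowed_no) in BORROWER
    for (loan_no, branch, amount) in LOAN
    if loan_no == borrowed_no and branch == "Main"
}

print(sorted(q1))
print(sorted(q2))
print(sorted(q3))
```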

4.5 CHECK YOUR PROGRESS


1. TRC stands for________?
2. DRC stands for________?
3. Relational Calculus is
a. A non-procedural query language
b. A procedural query language
c. Programming language
d. None of these
4. Find the ID, name, dept name, salary for instructors whose salary is greater than
$80,000.
a) {t | t ∈ instructor ∧ t[salary] > 80000}
b) ∃ t ∈ r (Q(t))
c) {t | ∃ s ∈ instructor (t[ID] = s[ID] ∧ s[salary] > 80000)}
d) None of the mentioned
5. A query in the tuple relational calculus is expressed as:
a) {t | P() | t}
b) {P(t) | t }
c) {t | P(t)}
d) All of the mentioned
6. Which of the following symbols is used in place of except?
a) ^
b) V
c) ¬
d) ~
7. An expression in the domain relational calculus is of the form
a) {P(x1, x2, . . . , xn) | < x1, x2, . . . , xn > }
b) {x1, x2, . . . , xn | < x1, x2, . . . , xn > }
c) { x1, x2, . . . , xn | x1, x2, . . . , xn}
d) {< x1, x2, . . . , xn > | P(x1, x2, . . . , xn)}

4.6 SUMMARY
Relational calculus is a non-procedural query language. It uses mathematical predicate
calculus instead of algebra. It provides the description about the query to get the result
whereas relational algebra gives the method to get the result. It informs the system what to
do with the relation, but does not inform how to perform it. For example, steps involved in
listing all the students who attend ‘Database’ Course in relational algebra would be

 SELECT the tuples from COURSE relation with COURSE_NAME = ‘DATABASE’
 PROJECT the COURSE_ID from above result
 SELECT the tuples from STUDENT relation with COURSE_ID resulted above.

There are two types of relational calculus – Tuple Relational Calculus (TRC) and Domain
Relational Calculus (DRC).

TRC - Tuple relational calculus is a non-procedural query language that specifies which
tuples of a relation to select. It can select tuples with a range of values, tuples with
certain attribute values, and so on. The resulting relation can have one or more tuples. It
is denoted as: {t | P (t)} or {t | condition (t)}

DRC - In contrast to tuple relational calculus, domain relational calculus uses a list of
attributes to be selected from the relation based on the condition. It is similar to TRC, but
differs by selecting attributes rather than whole tuples. It is denoted as below:

{< a1, a2, a3, … an > | P(a1, a2, a3, … an)}

Where a1, a2, a3, … an are attributes of the relation and P is the condition.

4.7 KEYWORDS
 DATABASE- A database is a collection of information that is organized so that it
can be easily accessed, managed and updated. Computer databases typically contain
aggregations of data records or files, containing information about sales
transactions or interactions with specific customers.
 NON PROCEDURAL LANGUAGE- A computer language that does not require
writing traditional programming logic. Also known as a "declarative language,"
users concentrate on defining the input and output rather than the program steps
required in a procedural programming language such as C++ or Java.
 RELATIONAL QUERY LANGUAGE- Relational query languages use
relational algebra to break the user requests and instruct the DBMS to execute the
requests. It is the language by which user communicates with the database. These
relational query languages can be procedural or non-procedural.
 PROCEDURAL QUERY LANGUAGE- In a procedural query language, the user
instructs the system to perform a series of operations to produce the desired results.
Here the user tells the system what data to retrieve from the database and how to retrieve it.

4.8 SELF-ASSESSMENT TEST

1. Given the following relational schemas

Student (studId, name, age, sex, deptNo, advisor)

Department (deptId, DName, hod, phoneNo)

Which of the following will be the TRC query to obtain the department names that
do not have any girl students?
Option 1.
{d.DName | department (d) ∧ ~ ((∃(s)) student(s) ∧ s.sex ≠ ‘F’ ∧ s.deptNo = d.deptId)}

Option 2.
{d.DName | department (d) ∧ ((∀ (s)) student(s) ∧ s.sex ≠ ‘F’ ∧ s.deptNo = d.deptId)}

Option 3.
{d.DName | department (d) ∧ ~ ((∃(s)) student(s) ∧ s.sex = ‘F’ ∧ s.deptNo = d.deptId)}

2. Explain tuple relational calculus in detail.

3. What do you mean by relational calculus? How is it different from relational algebra?

4. Discuss domain relational calculus in detail.

4.9 ANSWERS TO CHECK YOUR PROGRESS


1. Tuple Relational Calculus
2. Domain Relational Calculus
3. A non-procedural query language
4. A
5. C
6. C
7. D

4.10 REFERENCES / SUGGESTED READINGS


 C.J Date, “An Introduction to Database Systems”, 8th edition, Addison Wesley N.
Delhi.
 Ivan Bayross, “SQL, PL/SQL-The Programming Language of ORACLE”, BPB
Publication 3rd edition.
 Elmasri and Navathe, “Fundamentals of Database Systems”, 5th edition, Pearson
Education.
 https://www.javatpoint.com/dbms-relational-calculus
 https://www.studytonight.com/dbms/relational-calculus.php
 https://www.w3schools.in/dbms/relational-calculus/
 https://www.tutorialcup.com/dbms/relational-calculus.htm

SUBJECT: RELATIONAL DATABASE MANAGEMENT
SYSTEM
COURSE CODE: BCA-244 AUTHOR: DR. DEEPAK NANDAL

LESSON NO. 5 VETTER:

FUNCTIONAL DEPENDENCY AND NORMALIZATION - BASICS

STRUCTURE

5.0 Learning Objective

5.1 Introduction

5.2 Definition

5.3 Purpose of Functional Dependency

5.4 Data Redundancy in Functional Dependency

5.5 Update Anomalies

5.6 Check Your Progress

5.7 Summary

5.8 Keywords

5.9 Self-Assessment Test

5.10 Answers to check your progress

5.11 References / Suggested Readings

5.0 LEARNING OBJECTIVE


 To understand the concepts of Functional Dependency, Normalization.
 To learn the purpose of functional dependency.
 To discuss the update anomalies related to functional dependency.
 To understand the data redundancy in Functional Dependency.

5.1 INTRODUCTION
Each relation schema consists of a number of attributes, and the relational database schema
consists of a number of relation schemas. So far, we have assumed that attributes are
grouped to form a relation schema by using the common sense of the database designer or
by mapping a database schema design from a conceptual data model such as the ER or
enhanced ER (EER) or some other conceptual data model. These models make the designer
identify entity types and relationship types and their respective attributes, which leads to a
natural and logical grouping of the attributes into relations when the mapping procedures
are followed.
We have not developed any measure of appropriateness or "goodness" to measure the
quality of the design, other than the intuition of the designer. In this chapter we discuss
some of the theory that has been developed with the goal of evaluating relational schemas
for design quality-that is, to measure formally why one set of groupings of attributes into
relation schemas is better than another.

There are two levels at which we can discuss the "goodness" of relation schemas.
The first is the logical (or conceptual) level-how users interpret the relation schemas and
the meaning of their attributes. Having good relation schemas at this level enables users to
understand clearly the meaning of the data in the relations, and hence to formulate their
queries correctly. The second is the implementation (or storage) level-how the tuples in a
base relation are stored and updated. This level applies only to schemas of base relations-
which will be physically stored as files-whereas at the logical level we are interested in
schemas of both base relations and views (virtual relations). The relational database design
theory developed in this chapter applies mainly to base relations, although some criteria of
appropriateness also apply to views. As with many design problems, database design may
be performed using two approaches: bottom-up or top-down. A bottom-up design
methodology (also called design by synthesis) considers the basic relationships among
individual attributes as the starting point and uses those to construct relation schemas. This
approach is not very popular in practice because it suffers from the problem of having to
collect a large number of binary relationships among attributes as the starting point. In
contrast, a top-down design methodology (also called design by analysis) starts with a
number of groupings of attributes into relations that exist together naturally, for example,
on an invoice, a form, or a report. The relations are then analysed individually and
collectively, leading to further decomposition until all desirable properties are met. The
theory described in this chapter is applicable to both the top-down and bottom-up design
approaches, but is more practical when used with the top-down approach.

We define the concept of functional dependency, a formal constraint among attributes that
is the main tool for formally measuring the appropriateness of attribute groupings into
relation schemas. Properties of functional dependencies are also studied and analysed. We
then discuss how functional dependencies can be used to group attributes into relation
schemas that are in a normal form. A relation schema is in a normal form when it satisfies
certain desirable properties. The process of normalization consists of analysing relations to
meet increasingly more stringent normal forms, leading to progressively better groupings of
attributes. Normal forms are specified in terms of functional dependencies, which are
identified by the database designer, and key attributes of relation schemas.
When developing the schema of a relational database, one of the most important
aspects to take into account is ensuring that duplication of data is minimized. This is done
for two purposes:
 Reducing the amount of storage needed to store the data.
 Avoiding unnecessary data conflicts that may creep in because of multiple copies
of the same data getting stored.

5.2 DEFINITION
Functional Dependency: A functional dependency is a constraint between two sets of
attributes from the database. Suppose that our relational database schema has n attributes
A1, A2, ..., An; let us think of the whole database as being described by a single universal
relation schema R = {A1, A2, ..., An}. We do not imply that we will actually store the
database as a single universal table; we use this concept only in developing the formal
theory of data dependencies.

In practice, a functional dependency (FD) often relates the primary key (PK) of a table to
its non-key attributes, although the definition applies to any two sets of attributes. For any
relation R, attribute Y is functionally dependent on attribute X (usually the PK) if every
valid value of X uniquely determines the value of Y.

A functional dependency thus describes how one attribute (or set of attributes) in a
database determines another. Functional dependencies help to maintain the quality of data
in the database. A functional dependency is denoted by an arrow (→): the functional
dependency of Y on X is written X → Y. Functional dependencies play a vital role in
distinguishing good database designs from bad ones.

Example:

Employee number   Employee Name   Salary   City
1                 Dana            50000    San Francisco
2                 Francis         38000    London
3                 Andrew          25000    Tokyo

In this example, if we know the value of Employee number, we can obtain Employee Name,
City, and Salary. We can therefore say that City, Employee Name, and Salary are
functionally dependent on Employee number.

Definition of Normalization:

Database normalization is a technique for organizing the data in a database. It is a
systematic approach of decomposing tables to eliminate data redundancy (repetition) and
undesirable characteristics such as insertion, update, and deletion anomalies. It is a
multi-step process that puts data into tabular form and removes duplicated data from the
relations.

Normalization is used mainly for two purposes:

 Eliminating redundant data.
 Ensuring that data dependencies make sense, i.e. that data is logically stored.

5.3 PURPOSE OF FUNCTIONAL DEPENDENCY


A functional dependency, denoted by X → Y, between two sets of attributes X and Y that
are subsets of R specifies a constraint on the possible tuples that can form a relation state
r of R: for any two tuples t1 and t2 in r that have t1[X] = t2[X], they must also have
t1[Y] = t2[Y].

This means that the values of the Y component of a tuple in r depend on, or are determined
by, the values of the X component; we say that the values of the X component of a tuple
uniquely (or functionally) determine the values of the Y component. We say that there is a
functional dependency from X to Y, or that Y is functionally dependent on X.
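
The definition can be checked mechanically on a given relation state. The following is a minimal Python sketch, assuming that a relation state is represented as a list of dictionaries; the function name fd_holds and the lower-case attribute names are illustrative and do not come from the text. The check can show that a state violates an FD, but a state that passes does not prove that the FD holds for the schema in general.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}  # maps each X-value to the Y-value first seen with it
    for row in rows:
        x_value = tuple(row[a] for a in lhs)
        y_value = tuple(row[a] for a in rhs)
        if x_value in seen and seen[x_value] != y_value:
            return False  # two tuples agree on X but not on Y
        seen.setdefault(x_value, y_value)
    return True


# The employee table from Section 5.2 (attribute names are assumptions).
employees = [
    {"emp_no": 1, "name": "Dana", "salary": 50000, "city": "San Francisco"},
    {"emp_no": 2, "name": "Francis", "salary": 38000, "city": "London"},
    {"emp_no": 3, "name": "Andrew", "salary": 25000, "city": "Tokyo"},
]
print(fd_holds(employees, ["emp_no"], ["name", "salary", "city"]))  # True

# A second tuple for employee 1 with a different city violates emp_no -> city.
violating = employees + [
    {"emp_no": 1, "name": "Dana", "salary": 50000, "city": "Paris"}
]
print(fd_holds(violating, ["emp_no"], ["city"]))  # False
```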

Functional dependency is represented as FD or f.d. The set of attributes X is called the left-
hand side of the FD, and Y is called the right-hand side.

X functionally determines Y in a relation schema R if, and only if, whenever two tuples of
r(R) agree on their X-value, they must necessarily agree on their Y-value. If a constraint on
R states that there cannot be more than one tuple with a given X-value in any relation
instance r(R) (that is, X is a candidate key of R), this implies that X → Y for any subset
of attributes Y of R.

If X is a candidate key of R, then X → R.

If X → Y in R, this does not imply that Y → X in R.

A functional dependency is a property of the semantics or meaning of the attributes.


Whenever the semantics of two sets of attributes in R indicate that a functional dependency
should hold, we specify the dependency as a constraint.

Legal Relation States:

Relation extensions r(R) that satisfy the functional dependency constraints are called legal
relation states (or legal extensions) of R. Functional dependencies are used to describe
further a relation schema R by specifying constraints on its attributes that must hold at all
times. Certain FDs can be specified without referring to a specific relation, but as a property
of those attributes given their commonly understood meaning.

For example, {State, Driver_license_number} → Ssn should hold for any adult in the
United States and hence should hold whenever these attributes appear in a relation.
Consider the relation schema EMP_PROJ. From the semantics of the attributes and the
relation, we know that the following functional dependencies should hold:
a. Ssn → Ename
b. Pnumber → {Pname, Plocation}
c. {Ssn, Pnumber} → Hours

A functional dependency is a property of the relation schema R, not of a particular legal
relation state r of R. Therefore, an FD cannot be inferred automatically from a given relation
extension r but must be defined explicitly by someone who knows the semantics of the
attributes of R.

Example 1: For the relation Student(studentID, name, DateOfBirth, phoneNumber),
assuming each student has only one name and one date of birth, the following functional
dependency holds:
{studentID} → {name, DateOfBirth}
However, assuming a student may have multiple phone numbers, the FD
{studentID} → {phoneNumber}
does not hold for the table.
By convention, we often omit the curly braces { } for the set, and write the first functional
dependency in Example 1 as
studentID → name, DateOfBirth.
Note that the above FD can equivalently be written as the two FDs below:
studentID → name
studentID → DateOfBirth

5.4 DATA REDUNDANCY IN FUNCTIONAL DEPENDENCY

Data redundancy is a condition created within a database or data storage technology in
which the same piece of data is held in two or more separate places. This can mean two
different fields within a single database, or two different spots in multiple software
environments or platforms. It is a common occurrence in many businesses. As more
companies move away from siloed data towards a central repository for storing
information, they find that their database is filled with inconsistent duplicates of the same
entry. Although it can be challenging to reconcile duplicate data entries, or even to benefit
from them, understanding how to reduce and track data redundancy efficiently can help
mitigate long-term inconsistency issues for a business.

Sometimes data redundancy happens by accident while other times it is intentional.
Accidental data redundancy can be the result of a complex process or inefficient coding
while intentional data redundancy can be used to protect data and ensure consistency —
simply by leveraging the multiple occurrences of data for disaster recovery and quality
checks. If data redundancy is intentional, it’s important to have a central field or space for
the data. This allows you to easily update all records of redundant data when necessary.
Four major advantages of Data Redundancy:
Although data redundancy sounds like a negative event, there are many organizations that
can benefit from this process when it’s intentionally built into daily operations.
1. Alternative data backup method
Backing up data involves creating compressed and encrypted versions of data and storing
it in a computer system or the cloud. Data redundancy offers an extra layer of protection
and reinforces the backup by replicating data to an additional system. It’s often an
advantage when companies incorporate data redundancy into their disaster recovery plans.
2. Better data security
Data security relates to protecting data, in a database or a file storage system, from
unwanted activities such as cyberattacks or data breaches. Having the same data stored in
two or more separate places can protect an organization in the event of a cyberattack or
breach — an event which can result in lost time and money, as well as a damaged
reputation.
3. Faster data access and updates
When data is redundant, employees enjoy fast access and quick updates because the
necessary information is available on multiple systems. This is particularly important for
customer service-based organizations whose customers expect promptness and efficiency.
4. Improved data reliability
Data that is reliable is complete and accurate. Organizations can use data redundancy to
double check data and confirm it’s correct and completed in full — a necessity when
interacting with customers, vendors, internal staff, and others.

Although there are noteworthy advantages of intentional data redundancy, there are also
several significant drawbacks when organizations are unaware of its presence.
Possible data inconsistency

Data redundancy occurs when the same piece of data exists in multiple places, whereas
data inconsistency is when the same data exists in different formats in multiple tables.
Unfortunately, data redundancy can cause data inconsistency, which can provide a
company with unreliable and/or meaningless information.
Increase in data corruption
Data corruption is when data becomes damaged as a result of errors in writing, reading,
storage, or processing. When the same data fields are repeated in a database or file storage
system, there are more copies that can become corrupted. If a file gets corrupted, for
example, and an employee tries to open it, they may get an error message and not be able
to complete their task.
Increase in database size
Data redundancy may increase the size and complexity of a database — making it more of
a challenge to maintain. A larger database can also lead to longer load times and a great
deal of headaches and frustrations for employees as they’ll need to spend more time
completing daily tasks.
Increase in cost
When more data is created due to data redundancy, storage costs suddenly increase. This
can be a serious issue for organizations who are trying to keep costs low in order to increase
profits and meet their goals. In addition, implementing a database system can become more
expensive.
There are four informal measures of quality for relation schema design.

 Semantics of the attributes.

 Reducing the redundant values in tuples.

 Reducing the null values in tuples.

 Disallowing the possibility of generating spurious tuples.

Semantics of the Relation Attributes: The easier it is to explain the semantics of the
relation, the better the relation schema design will be.

GUIDELINE 1: Design a relation schema so that it is easy to explain its meaning. Do not
combine attributes from multiple entity types and relationship types into a single relation.
Intuitively, if a relation schema corresponds to one entity type or one relationship type, the
meaning tends to be clear. Otherwise, the relation corresponds to a mixture of multiple
entities and relationships and hence becomes semantically unclear.

Example: A relation that involves two entities (employee and department) is a poor design.
EMP_DEPT
ENAME   SSN   BDATE   ADDRESS   DNUMBER   DNAME   DMGRSSN

5.5 UPDATE ANOMALIES

Consider the two relation schemas EMP_LOCS and EMP_PROJ1 in Figure 5.1(a). A tuple
in EMP_LOCS means that the employee whose name is ENAME works on some project
whose location is PLOCATION.

Figure 5.1 (a): The two relation schemas EMP_LOCS and EMP_PROJ1

Figure 5.1 (b): The result of projecting the extension of EMP_PROJ from Figure 5.1(a) onto
the relations EMP_LOCS and EMP_PROJ1
Update anomalies for the base relations EMP_DEPT and EMP_PROJ in Figure 5.1:
 Insertion anomalies: For the EMP_DEPT relation in Figure 5.1:
 To insert a new employee tuple, we need to make sure that the values of
attributes DNUMBER, DNAME, and DMGRSSN are consistent with those of other
employees (tuples) in EMP_DEPT.
 It is difficult to insert a new department that has no employees as yet in the
EMP_DEPT relation.
 Deletion anomalies: If we delete from EMP_DEPT an employee tuple that happens
to represent the last employee working for a particular department, the information
concerning that department is lost from the database.
 Modification anomalies: If we update the value of DMGRSSN for a particular
department, we must update the tuples of all employees who work in that
department; otherwise, the database will become inconsistent.
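
The deletion and modification anomalies described above can be reproduced on a small in-memory copy of EMP_DEPT. The sketch below is only an illustration: the sample tuples, the lower-case attribute names, and the helper functions departments and managers_of are assumptions made for this example and are not taken from Figure 5.1.

```python
# Hypothetical EMP_DEPT state: employee and department facts share one relation.
emp_dept = [
    {"ename": "Smith", "ssn": "111", "dnumber": 5, "dname": "Research", "dmgrssn": "333"},
    {"ename": "Wong", "ssn": "333", "dnumber": 5, "dname": "Research", "dmgrssn": "333"},
    {"ename": "Borg", "ssn": "888", "dnumber": 1, "dname": "Headquarters", "dmgrssn": "888"},
]


def departments(rows):
    """Department numbers that are still recorded somewhere in the relation."""
    return {row["dnumber"] for row in rows}


def managers_of(rows, dnumber):
    """Distinct manager SSNs recorded for one department."""
    return {row["dmgrssn"] for row in rows if row["dnumber"] == dnumber}


# Deletion anomaly: Borg is the last employee of department 1, so deleting
# the employee tuple also removes every trace of that department.
after_delete = [row for row in emp_dept if row["ssn"] != "888"]
print(departments(after_delete))  # {5}

# Modification anomaly: the manager of department 5 is changed in one tuple only,
# so the relation now records two different managers for the same department.
after_update = [dict(row) for row in emp_dept]
after_update[0]["dmgrssn"] = "111"
print(len(managers_of(after_update, 5)))  # 2
```

In an anomaly-free design the department facts would be stored once, in a separate relation keyed on the department number, so neither problem could arise.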

GUIDELINE 2: Design the base relation schemas so that no insertion, deletion, or
modification anomalies are present in the relations. If any anomalies are present, note them
clearly and make sure the programs that update the database will operate correctly. It is
advisable to use anomaly-free base relations and to specify views that include the JOINs
needed to bring together frequently referenced attributes.

5.6 CHECK YOUR PROGRESS


1. We can use the following three rules to find logically implied functional
dependencies. This collection of rules is called
a) Axioms
b) Armstrong’s axioms
c) Armstrong
d) Closure
2. Which of the following is not Armstrong’s Axiom?
a) Reflexivity rule
b) Transitivity rule
c) Pseudo transitivity rule
d) Augmentation rule
3. The relation employee(ID,name,street,Credit,street,city,salary) is decomposed into
employee1 (ID, name)
employee2 (name, street, city, salary)
This type of decomposition is called
a) Lossless decomposition
b) Lossless-join decomposition
c) All of the mentioned
d) None of the mentioned
4. Inst_dept (ID, name, salary, dept name, building, budget) is decomposed into
instructor (ID, name, dept name, salary)
department (dept name, building, budget)
This comes under
a) Lossy-join decomposition

b) Lossy decomposition
c) Lossless-join decomposition
d) Both Lossy and Lossy-join decomposition
5. Suppose relation R(A,B,C,D,E) has the following functional dependencies:
A -> B
B -> C
BC -> A
A -> D
E -> A
D -> E
Which of the following is not a key?
a) A
b) E
c) B, C
d) D

5.7 SUMMARY
Functional dependency (FD) is a constraint between two sets of attributes in a relation.
A functional dependency says that if two tuples have the same values for attributes A1, A2, ...,
An, then those two tuples must also have the same values for attributes B1, B2, ..., Bn.
A functional dependency is represented by an arrow sign (→), that is, X → Y, where X
functionally determines Y. The left-hand side attributes determine the values of the attributes
on the right-hand side. Database normalization is the process of efficiently organizing data
in a database so that redundant data is eliminated. This process can ensure that all of a
company’s data looks and reads similarly across all records. By implementing data
normalization, an organization standardizes data fields such as customer names, addresses,
and phone numbers. Normalizing data involves organizing the columns and tables of a
database to make sure their dependencies are enforced correctly. A “normal form” is a
set of rules for normalizing data, and a database is said to be “normalized” if it is free
of delete, update, and insert anomalies. When it comes to standardizing data values, each
company has its own set of criteria. Therefore, what one organization considers
“normal” may not be “normal” for another. For instance, one company may
want to store the state or province field as a two-letter code, while another may prefer the
full name. Regardless, database normalization can be the key to reducing data redundancy
across any company.

Efficient data redundancy is possible. Many organizations like home improvement
companies, real estate agencies, and companies focused on customer interactions have
customer relationship management (CRM) systems. When a CRM system is integrated
with another business software like an accounting software that combines customer and
financial data, redundant manual data is eliminated, leading to more insightful reports and
improved customer service. Database management systems are also used in a variety of
organizations. They are administered by a database administrator (DBA) and allow users
to load, retrieve, or change existing data. Databases designed according to the rules of
normalization have reduced data redundancy. Hospitals,
nursing homes, and other healthcare entities use database management systems to generate
reports that provide useful information for physicians and other employees. When data
redundancy is efficient and does not lead to data inconsistency, these systems can alert
healthcare providers to rises in claim denial rates, how successful a certain medication is,
and other important pieces of information.

5.8 KEYWORDS
 AXIOM - Axioms are a set of inference rules used to infer all the functional
dependencies on a relational database.
 DECOMPOSITION - A rule suggesting that if a table appears to contain two entities
determined by the same primary key, you should consider breaking it up into two
different tables.
 DEPENDENT - The attribute displayed on the right side of the functional dependency
diagram.
 UNION - A rule suggesting that if two tables are separate and the PK is the same, you
should consider putting them together.
 DETERMINANT - The attribute displayed on the left side of the functional dependency
diagram.

5.9 SELF-ASSESSMENT TEST


1. Explain functional dependency in detail.
2. Discuss insertion and update anomalies and how they relate to functional dependencies.
3. What is the key role of normalization?
4. How are normalization and functional dependency related to each other?
5. Discuss, with an example, redundancy in relation to functional dependencies.

5.10 ANSWERS TO CHECK YOUR PROGRESS


1. B
2. C
3. D
4. C
5. C

5.11 REFERENCES / SUGGESTED READINGS


 C.J Date, “An Introduction to Database Systems”, 8th edition, Addison Wesley N.
Delhi.
 Ivan Bayross, “SQL, PL/SQL-The Programming Language of ORACLE”, BPB
Publication 3rd edition.
 Elmasri and Navathe, “Fundamentals of Database Systems”, 5th edition, Pearson
Education.
 https://opentextbc.ca/dbdesign01/chapter/chapter-11-functional-dependencies/
 https://hackr.io/blog/dbms-normalization
 https://www.guru99.com/database-normalization.html
 https://www.javatpoint.com/dbms-normalization

SUBJECT: RELATIONAL DATABASE MANAGEMENT
SYSTEM
COURSE CODE: BCA-244 AUTHOR: DR. DEEPAK NANDAL

LESSON NO. 6 VETTER:

TYPES OF FUNCTIONAL DEPENDENCIES

STRUCTURE

6.0 Learning Objective

6.1 Introduction

6.2 Definition

6.3 Types of Functional Dependency

6.3.1 Full Functional Dependency

6.3.2 Transitive Functional Dependency

6.3.3 Multivalued Functional Dependency

6.3.4 Partial Functional Dependency

6.4 Characteristics of Functional Dependency

6.5 Check Your Progress

6.6 Summary

6.7 Keywords

6.8 Self-Assessment Test

6.9 Answers to check your progress

6.10 References / Suggested Readings

6.0 LEARNING OBJECTIVE


 To understand the concept of functional dependency.
 To study and understand the different types of dependencies.
 To know the characteristics of functional dependencies.

6.1 INTRODUCTION
Relational database design ultimately produces a set of relations. The implicit goals of the
design activity are information preservation and minimum redundancy. We therefore first
focus on the informal design guidelines for relation schemas.

Four informal guidelines that may be used as measures to determine the quality of relation
schema design:
 Making sure that the semantics of the attributes is clear in the schema
 Reducing the redundant information in tuples
 Reducing the NULL values in tuples
 Disallowing the possibility of generating spurious tuples

The semantics of a relation refers to its meaning resulting from the interpretation of
attribute values in a tuple. The relational schema design should have a clear meaning.

Guideline 1:

1. Design a relation schema so that it is easy to explain.

2. Do not combine attributes from multiple entity types and relationship types into a single
relation.

Redundant Information in Tuples and Update Anomalies

One goal of schema design is to minimize the storage space used by the base relations (and
hence the corresponding files). Grouping attributes into relation schemas has a significant
effect on storage space. Storing natural joins of base relations also leads to an additional
problem referred to as update anomalies. These are insertion anomalies, deletion anomalies,
and modification anomalies.

Insertion anomalies happen:

 When a new tuple is inserted with values that are not consistent with the existing
tuples, which can make the database inconsistent.
 When the insertion of a new tuple introduces a NULL value (for example, a
department in which no employee works as yet). This will violate the
integrity constraint of the table, since ESsn is a primary key for the table.

Deletion Anomalies:

The problem of deletion anomalies is related to the second insertion anomaly situation just
discussed. Example: If we delete from EMP_DEPT an employee tuple that happens to
represent the last employee working for a particular department, the information
concerning that department is lost from the database.

Modification Anomalies happen if we fail to update all the affected tuples when the value in
a single one changes. Example: if the manager changes for a department, the tuples of all
employees who work for that department must be updated. It is easy to see that these three
anomalies are undesirable: they make it difficult to maintain consistency of data and
require unnecessary updates that could be avoided. This leads to the second guideline.

Guideline 2

Design the base relation schemas so that no insertion, deletion, or modification anomalies
are present in the relations. If any anomalies are present, note them clearly and make sure
that the programs that update the database will operate correctly. The second guideline is
consistent with and, in a way, a restatement of the first guideline.

NULL Values in Tuples

Fat Relations: A relation in which too many attributes are grouped. If many of the attributes
do not apply to all tuples in the relation, we end up with many NULLs in those tuples. This
can waste space at the storage level and may also lead to problems with understanding the
meaning of the attributes and with specifying JOIN operations at the logical level. Another
problem with NULLs is how to account for them when aggregate operations such as
COUNT or SUM are applied. SELECT and JOIN operations involve comparisons; if
NULL values are present, the results may become unpredictable. Moreover, NULLs can
have multiple interpretations, such as the following:

 The attribute does not apply to this tuple. For example, Visa_status may not apply
to U.S. students.
 The attribute value for this tuple is unknown. For example, the Date_of_birth may
be unknown for an employee.
 The value is known but absent; that is, it has not been recorded yet. For example,
the Home_Phone_Number for an employee may exist, but may not be available and
recorded yet.

Having the same representation for all NULLs compromises the different meanings they
may have. Therefore, we may state another guideline.

Guideline 3

As much as possible, avoid placing attributes in a base relation whose values may
frequently be NULL. If NULLs are unavoidable, make sure that they apply in exceptional
cases only.

For example, if only 15 percent of employees have individual offices, there is little
justification for including an attribute Office_number in the EMPLOYEE relation; rather,
a relation EMP_OFFICES(Essn, Office_number) can be created.
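
The effect of NULLs on aggregate operations and comparisons can be observed directly. The sketch below uses Python's built-in sqlite3 module with an in-memory database; the table layout, column names, and sample values are assumptions made for illustration, and other database systems may differ in detail.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (essn TEXT PRIMARY KEY, salary INTEGER, office_number INTEGER)"
)
# Only one of three employees has an office; one salary is not recorded.
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [("111", 30000, 12), ("222", 40000, None), ("333", None, None)],
)

# COUNT(*) counts tuples, COUNT(column) skips NULLs, and SUM/AVG ignore NULLs.
count_all, count_office, sum_salary, avg_salary = conn.execute(
    "SELECT COUNT(*), COUNT(office_number), SUM(salary), AVG(salary) FROM employee"
).fetchone()
print(count_all, count_office, sum_salary, avg_salary)  # 3 1 70000 35000.0

# A comparison with NULL is never true, so this selection returns no tuples.
matches = conn.execute(
    "SELECT COUNT(*) FROM employee WHERE office_number = NULL"
).fetchone()[0]
print(matches)  # 0
conn.close()
```

Note that the average salary is computed over two employees, not three, which may or may not be what the person writing the query intended.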

Guideline 4

Design relation schemas so that they can be joined with equality conditions on attributes
that are appropriately related (primary key, foreign key) pairs in a way that guarantees that
no spurious tuples are generated. Avoid relations that contain matching attributes that are
not (foreign key, primary key) combinations because joining on such attributes may
produce spurious tuples.
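
Guideline 4 can be illustrated with a small experiment. The following Python sketch, under assumed sample data, decomposes a relation in two ways and joins the parts back together. The tuples and the helper functions project and natural_join are hypothetical; the relation names follow the EMP_LOCS and EMP_PROJ1 schemas of Lesson 5.

```python
def project(rows, attrs):
    """Relational projection: keep only attrs and drop duplicate tuples."""
    return {tuple((a, row[a]) for a in attrs) for row in rows}


def natural_join(left, right):
    """Natural join of two sets of tuples made of (attribute, value) pairs."""
    result = set()
    for l in left:
        for r in right:
            ld, rd = dict(l), dict(r)
            common = ld.keys() & rd.keys()
            if all(ld[a] == rd[a] for a in common):
                result.add(tuple(sorted({**ld, **rd}.items())))
    return result


# Hypothetical state: two employees work on different projects at one location.
emp_proj = [
    {"ssn": "111", "ename": "Smith", "pnumber": 1, "plocation": "Bellaire"},
    {"ssn": "222", "ename": "Wong", "pnumber": 2, "plocation": "Bellaire"},
]
original = {tuple(sorted(row.items())) for row in emp_proj}

# Bad decomposition: the only common attribute, plocation, is not a key.
emp_locs = project(emp_proj, ["ename", "plocation"])
emp_proj1 = project(emp_proj, ["ssn", "pnumber", "plocation"])
bad_join = natural_join(emp_locs, emp_proj1)
print(len(bad_join), len(bad_join - original))  # 4 2  (two spurious tuples)

# Better decomposition: the common attribute ssn identifies the employee.
emp = project(emp_proj, ["ssn", "ename"])
works = project(emp_proj, ["ssn", "pnumber", "plocation"])
print(natural_join(emp, works) == original)  # True
```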

6.2 DEFINITION
A functional dependency (FD) describes how one attribute (or set of attributes) in a
database determines another. Functional dependencies help to maintain the quality of
data in the database. A functional dependency is denoted by an arrow (→): the functional
dependency of Y on X is written X → Y. Functional dependencies play a vital role in
distinguishing good database designs from bad ones.

Example:

Employee number   Employee Name   Salary   City
1                 Dana            50000    San Francisco
2                 Francis         38000    London
3                 Andrew          25000    Tokyo

In this example, if we know the value of Employee number, we can obtain Employee Name,
City, and Salary. We can therefore say that City, Employee Name, and Salary are
functionally dependent on Employee number.

A functional dependency A->B holds in a relation if any two tuples that have the same value
of attribute A also have the same value of attribute B. For example, in the relation STUDENT
shown in Table 1, the functional dependencies

STUD_NO->STUD_NAME, STUD_NO->STUD_PHONE hold,

but

STUD_NAME->STUD_ADDR does not hold.

Functional Dependencies in a relation are dependent on the domain of the relation.


Consider the STUDENT relation given in Table 1.

 We know that STUD_NO is unique for each student. So STUD_NO->STUD_NAME,
STUD_NO->STUD_PHONE, STUD_NO->STUD_STATE, STUD_NO->STUD_COUNTRY and
STUD_NO->STUD_AGE will all be true.
 Similarly, STUD_STATE->STUD_COUNTRY will be true, because if two records have the
same STUD_STATE, they will have the same STUD_COUNTRY as well.
 For relation STUDENT_COURSE, COURSE_NO->COURSE_NAME will be true,
as two records with the same COURSE_NO will have the same COURSE_NAME.

Functional Dependency Set: The functional dependency set (FD set) of a relation is the set
of all FDs present in the relation. For example, the FD set for the relation STUDENT shown
in Table 1 is:

{ STUD_NO->STUD_NAME, STUD_NO->STUD_PHONE, STUD_NO->STUD_STATE,
STUD_NO->STUD_COUNTRY, STUD_NO->STUD_AGE, STUD_STATE->STUD_COUNTRY }

6.3 TYPES OF FUNCTIONAL DEPENDENCY

A dependency in a DBMS is a relationship between two or more attributes. The following
types are discussed here:

 Fully-Functional Dependency
 Transitive Dependency
 Multivalued Dependency
 Partial Dependency

6.3.1 FULLY FUNCTIONAL DEPENDENCY

An attribute is fully functionally dependent on a set of attributes if it is functionally
dependent on that set and not on any of its proper subsets.

For example, an attribute Q is fully functionally dependent on a set of attributes P if it is
functionally dependent on P and not on any proper subset of P.

Example:

<ProjectCost>
ProjectID   ProjectCost
001         1000
002         5000

<EmployeeProject>
EmpID   ProjectID   Days (spent on the project)
E099    001         320
E056    002         190

Taken together, the above relations satisfy:
{EmpID, ProjectID, ProjectCost} -> Days
However, Days is not fully functionally dependent on this set, because the proper subset
{EmpID, ProjectID} is already enough to determine the Days spent on the project by the
employee.
This gives our fully functional dependency:
{EmpID, ProjectID} -> {Days}

6.3.2 TRANSITIVE FUNCTIONAL DEPENDENCY


When an indirect relationship causes a functional dependency, it is called a transitive
dependency.
If P -> Q and Q -> R are true, then P -> R is a transitive dependency.
A transitive dependency in a database is an indirect relationship between values in the same
table that causes a functional dependency. To achieve the normalization standard of Third
Normal Form (3NF), you must eliminate any transitive dependency. By its nature, a
transitive dependency requires three or more attributes (or database columns) that have a
functional dependency between them, meaning that Column A in a table relies on Column
B through an intermediate Column C.

Example:

Author_ID   Author             Book                   Author_Nationality
Auth_001    Orson Scott Card   Ender's Game           United States
Auth_001    Orson Scott Card   Children of the Mind   United States
Auth_002    Margaret Atwood    The Handmaid's Tale    Canada

In the AUTHORS example above:

 Book → Author: Here, the Book attribute determines the Author attribute. If you
know the book name, you can learn the author's name. However, Author does not
determine Book, because an author can write multiple books. For example, just
because we know the author's name Orson Scott Card, we still don't know the book
name.
 Author → Author_Nationality: Likewise, the Author attribute determines
the Author_Nationality, but not the other way around; just because we know the
nationality does not mean we can determine the author.

But this table introduces a transitive dependency:

 Book → Author_Nationality: If we know the book name, we can determine the
nationality via the Author column.
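
One common way to remove such a transitive dependency is to split the table so that the nationality is stored once per author. The Python sketch below shows one possible split, keyed on Author_ID; the choice of key and the names book_table, author_table, and nationality_of are assumptions for illustration and are not prescribed by the text.

```python
# The AUTHORS example as (Author_ID, Author, Book, Author_Nationality) tuples.
books = [
    ("Auth_001", "Orson Scott Card", "Ender's Game", "United States"),
    ("Auth_001", "Orson Scott Card", "Children of the Mind", "United States"),
    ("Auth_002", "Margaret Atwood", "The Handmaid's Tale", "Canada"),
]

# Table 1: Book -> Author_ID (one row per book).
book_table = {book: author_id for author_id, _author, book, _nat in books}

# Table 2: Author_ID -> (Author, Author_Nationality), stored once per author.
author_table = {author_id: (author, nat) for author_id, author, _book, nat in books}


def nationality_of(book):
    """Follow Book -> Author_ID -> Author_Nationality across the two tables."""
    return author_table[book_table[book]][1]


print(nationality_of("Ender's Game"))  # United States
print(len(author_table))  # 2  (each nationality is now recorded only once)

# Joining the two tables back together reproduces the original rows.
rebuilt = {
    (aid, author_table[aid][0], book, author_table[aid][1])
    for book, aid in book_table.items()
}
print(rebuilt == set(books))  # True
```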

6.3.3 MULTIVALUED FUNCTIONAL DEPENDENCY


A multivalued dependency occurs when the existence of one or more rows in a table implies
the existence of one or more other rows in the same table. If a table has attributes P, Q and R,
and Q and R are independent multi-valued facts about P, then every combination of a Q
value and an R value recorded for a given P must appear in the table.

It is represented by a double arrow:

->->

Example:

P->->Q
P->->R
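
A multivalued dependency can be tested on a relation state by checking, for each P value, that the recorded (Q, R) pairs form a complete cross product. The following Python sketch assumes rows stored as dictionaries; the function name mvd_holds and the sample values are illustrative only.

```python
from itertools import product


def mvd_holds(rows, x, y):
    """Check X ->-> Y on a list of dict rows; Z is every attribute not in X or Y."""
    if not rows:
        return True
    z = [a for a in rows[0] if a not in x and a not in y]
    groups = {}  # X-value -> set of (Y-value, Z-value) pairs seen with it
    for row in rows:
        key = tuple(row[a] for a in x)
        pair = (tuple(row[a] for a in y), tuple(row[a] for a in z))
        groups.setdefault(key, set()).add(pair)
    for pairs in groups.values():
        y_values = {p[0] for p in pairs}
        z_values = {p[1] for p in pairs}
        # Every Y-value must be combined with every Z-value.
        if pairs != set(product(y_values, z_values)):
            return False
    return True


# Hypothetical state: for p1, the Q facts and R facts are independent.
complete = [
    {"P": "p1", "Q": "q1", "R": "r1"},
    {"P": "p1", "Q": "q1", "R": "r2"},
    {"P": "p1", "Q": "q2", "R": "r1"},
    {"P": "p1", "Q": "q2", "R": "r2"},
]
print(mvd_holds(complete, ["P"], ["Q"]))  # True

# Dropping one combination breaks the dependency.
print(mvd_holds(complete[:3], ["P"], ["Q"]))  # False
```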

6.3.4 PARTIAL FUNCTIONAL DEPENDENCY


Partial Dependency occurs when a non-prime attribute is functionally dependent on part of
a candidate key.

The Second Normal Form (2NF) eliminates partial dependencies. Let us see an example:

<StudentProject>

StudentID   ProjectNo   StudentName   ProjectName
S01         199         Katie         Geo Location
S02         120         Ollie         Cluster Exploration

The prime (key) attributes are StudentID and ProjectNo. The non-prime attributes,
StudentName and ProjectName, are partially dependent if they are functionally dependent
on only part of the candidate key. StudentName can be determined by StudentID alone, and
ProjectName can be determined by ProjectNo alone, so both are partial dependencies.
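
Given a candidate key and a list of declared FDs, partial dependencies can be listed mechanically. The sketch below is a simplified illustration: it inspects only the FDs exactly as they are declared (it does not derive implied ones), and the attribute Grade is a hypothetical addition used to show a full dependency for contrast.

```python
def partial_dependencies(candidate_key, fds):
    """List (determinant, attribute) pairs where a non-prime attribute
    depends on a proper subset of the candidate key."""
    key = set(candidate_key)
    found = []
    for lhs, rhs in fds:
        if set(lhs) < key:  # determinant is a proper subset of the key
            for attr in rhs:
                if attr not in key:  # only non-prime attributes count
                    found.append((lhs, attr))
    return found


fds = [
    (("StudentID",), ("StudentName",)),
    (("ProjectNo",), ("ProjectName",)),
    # Hypothetical attribute that needs the whole key (a full dependency).
    (("StudentID", "ProjectNo"), ("Grade",)),
]
result = partial_dependencies(("StudentID", "ProjectNo"), fds)
for determinant, attribute in result:
    print(determinant, "->", attribute)
# ('StudentID',) -> StudentName
# ('ProjectNo',) -> ProjectName
```

Moving each reported attribute into a table keyed on its determinant is the usual step towards 2NF.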

6.4 CHARACTERISTICS OF FUNCTIONAL DEPENDENCY


Functional dependencies have several important properties. Among the most important are
the following inference rules, usually called Armstrong's axioms:

 Reflexivity: If Y is a subset of X, then X → Y.
 Augmentation: If X → Y, then XZ → YZ.
 Transitivity: If X → Y and Y → Z, then X → Z.
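
In practice, the FDs that follow from a given set by these rules are usually found by computing the closure of an attribute set, written X+. The Python sketch below shows the standard closure computation applied to part of the STUDENT FD set from Section 6.2; the function names closure and implies are illustrative choices.

```python
def closure(attributes, fds):
    """Compute X+, the set of attributes determined by `attributes` under `fds`."""
    result = set(attributes)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the left-hand side is already determined, so is the right-hand side.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def implies(fds, lhs, rhs):
    """X -> Y follows from `fds` exactly when Y is contained in X+."""
    return set(rhs) <= closure(lhs, fds)


student_fds = [
    ({"STUD_NO"}, {"STUD_NAME"}),
    ({"STUD_NO"}, {"STUD_STATE"}),
    ({"STUD_STATE"}, {"STUD_COUNTRY"}),
]

# Transitivity: STUD_NO -> STUD_STATE and STUD_STATE -> STUD_COUNTRY
# together give STUD_NO -> STUD_COUNTRY, although it is not listed above.
print(implies(student_fds, {"STUD_NO"}, {"STUD_COUNTRY"}))  # True

# STUD_STATE does not determine STUD_NAME.
print(implies(student_fds, {"STUD_STATE"}, {"STUD_NAME"}))  # False

print(sorted(closure({"STUD_STATE"}, student_fds)))  # ['STUD_COUNTRY', 'STUD_STATE']
```

An attribute set whose closure contains every attribute of the relation is a superkey, which is how closure is used to test candidate keys.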

Main characteristics of functional dependencies used in normalization:

 Each value of the left-hand side attributes (the determinant) is associated with
exactly one value of the right-hand side attributes
 The dependency holds for all time
 The determinant has the minimal number of necessary attributes

6.5 CHECK YOUR PROGRESS


1. A functional dependency that is satisfied by all relations is called _______.
2. _________ refers to an attribute or group of attributes mentioned on the left-hand side
of the arrow in an FD.
3. In a functional dependency X --> Y, if Y is functionally dependent on X, but not on
X's proper subsets, then we would call the functional dependency as__________.
4. In RDBMS, FD stands for ___________.
5. _________allow us to identify uniquely a tuple in the relation.

6.6 SUMMARY
A functional dependency is a constraint between two sets of attributes from the database.
Suppose that our relational database schema has n attributes A1, A2, ..., An, and think of
the whole database as being described by a single universal relation schema R = {A1, A2,
..., An}. A functional dependency, denoted by X → Y, between two sets of attributes X
and Y that are subsets of R specifies a constraint on the possible tuples that can form a
relation state r of R: any two tuples t1 and t2 in r that have t1[X] = t2[X] must also
have t1[Y] = t2[Y]. This means that the values of the Y component of a tuple in r depend
on, or are determined by, the values of the X component; we say that the values of the X
component of a tuple uniquely (or functionally) determine the values of the Y component.
We say that there is a functional dependency from X to Y, or that Y is functionally
dependent on X.

Functional dependency is abbreviated as FD or f.d. The set of attributes X is called the
left-hand side of the FD, and Y is called the right-hand side. X functionally determines Y
in a relation schema R if, and only if, whenever two tuples of r(R) agree on their X-value,
they must necessarily agree on their Y-value. If a constraint on R states that there cannot
be more than one tuple with a given X-value in any relation instance r(R), that is, X is a
candidate key of R, this implies that X → Y for any subset of attributes Y of R. If X is a
candidate key of R, then X → R.

If X → Y in R, this does not imply that Y → X in R.

A functional dependency is a property of the semantics or meaning of the attributes.


Whenever the semantics of two sets of attributes in R indicate that a functional dependency
should hold, we specify the dependency as a constraint. A functional dependency is a
property of the relation schema R, not of a particular legal relation state r of R. Therefore,
an FD cannot be inferred automatically from a given relation extension r but must be
defined explicitly by someone who knows the semantics of the attributes of R.

6.7 KEYWORDS
 AXIOM- An axiom or postulate is a statement that is taken to be true, to serve as a
premise or starting point for further reasoning and arguments. The word comes from
the Greek axíōma (ἀξίωμα) 'that which is thought worthy or fit' or 'that which
commends itself as evident'.
 TRIVIAL − If a functional dependency (FD) X → Y holds, where Y is a subset of
X, then it is called a trivial FD.
 COMPLETELY NON-TRIVIAL − If an FD X → Y holds, where X ∩ Y = Φ, it is said
to be a completely non-trivial FD.
 FOREIGN KEY- Foreign keys are the columns of a table that point to the primary
key of another table. They act as a cross-reference between tables.
 JOIN DEPENDENCY- A join dependency is a constraint on the set of legal
relations over a database scheme. A table is subject to a join dependency if it can
always be recreated by joining multiple tables, each having a subset of the attributes
of the table.

6.8 SELF-ASSESSMENT TEST
1. What do you mean by fully functional dependency?
2. Write a short note on transitive dependency, give an example of transitive
dependency.
3. What is functional dependency and its types?
4. Discuss the characteristics of functional dependencies.
5. What is the use of functional dependency in RDBMS?

6.9 ANSWERS TO CHECK YOUR PROGRESS


1. Trivial
2. Determinant
3. Full functional dependency
4. Functional Dependency
5. Super key

6.10 REFERENCES / SUGGESTED READINGS


 C.J Date, “An Introduction to Database Systems”, 8th edition, Addison Wesley N.
Delhi.
 Ivan Bayross, “SQL, PL/SQL-The Programming Language of ORACLE”, BPB
Publication 3rd edition.
 Elmasri and Navathe, “Fundamentals of Database Systems”, 5th edition, Pearson
Education.
 https://www.guru99.com/dbms-functional-dependency.html
 https://www.tutorialspoint.com/Types-of-dependencies-in-DBMS
 https://beginnersbook.com/2015/04/functional-dependency-in-dbms/
 https://www.javatpoint.com/dbms-functional-dependency

SUBJECT: RELATIONAL DATABASE MANAGEMENT
SYSTEM
COURSE CODE: BCA-244 AUTHOR: DR. DEEPAK NANDAL

LESSON NO. 7 VETTER:

DECOMPOSITION AND NORMAL FORMS

STRUCTURE

7.0 Learning Objective

7.1 Introduction

7.2 Definition of Normalization

7.3 Decomposition

7.4 First Normal Form (1NF)

7.5 Second Normal Form (2NF)

7.6 Third Normal Form (3NF)

7.7 Boyce-Codd normal form (BCNF)

7.8 Check Your Progress

7.9 Summary

7.10 Keywords

7.11 Self-Assessment Test

7.12 Answers to check your progress

7.13 References / Suggested Readings

7.0 LEARNING OBJECTIVE


 To understand the concept of anomalies in a database
 To learn about update, insert and delete anomalies
 To understand the role of normalization in removing anomalies from a database
 To study decomposition methods and the different normal forms

7.1 INTRODUCTION
Normalization is a database design technique that reduces data redundancy and
eliminates undesirable characteristics like Insertion, Update and Deletion Anomalies.
Normalization rules divide larger tables into smaller tables and link them using
relationships. The purpose of normalization in SQL is to eliminate redundant (repetitive)
data and ensure data is stored logically. The inventor of the relational model, Edgar Codd,
proposed the theory of normalization with the introduction of the First Normal Form, and
he continued to extend the theory with the Second and Third Normal Forms. Later he joined
Raymond F. Boyce to develop the theory of Boyce-Codd Normal Form.

There are three types of anomalies that occur when the database is not normalized. These
are – Insertion, update and deletion anomaly. Let’s take an example to understand this.

Example: Suppose a manufacturing company stores the employee details in a table named
employee that has four attributes: emp_id for storing employee’s id, emp_name for storing
employee’s name, emp_address for storing employee’s address and emp_dept for storing
the department details in which the employee works. At some point of time the table looks
like this in table 7.1:

emp_id    emp_name    emp_address    emp_dept

101       Rick        Delhi          D001
101       Rick        Delhi          D002
123       Maggie      Agra           D890
166       Glenn       Chennai        D900
166       Glenn       Chennai        D004

Table 7.1: Un-Normalized Data in a table

Update anomaly: In the above table we have two rows for employee Rick as he belongs
to two departments of the company. If we want to update the address of Rick then we have
to update the same in two rows or the data will become inconsistent. If the new address is
recorded in the row for one department but not in the other, then as per the database Rick
would have two different addresses, which is not correct.

Insert anomaly: Suppose a new employee joins the company who is under training and
currently not assigned to any department. We would not be able to insert the data into
the table if the emp_dept field doesn’t allow nulls.

Delete anomaly: Suppose that at some point the company closes the department D890.
Deleting the rows that have emp_dept as D890 would also delete the information
of employee Maggie, since she is assigned only to this department.

To overcome these anomalies we need to normalize the data.
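
The update and delete anomalies can be reproduced on Table 7.1 with a short sketch. The rows are those of the table; the new address "Mumbai" is a hypothetical value used only to show an incomplete update.

```python
employee = [
    (101, "Rick", "Delhi", "D001"),
    (101, "Rick", "Delhi", "D002"),
    (123, "Maggie", "Agra", "D890"),
    (166, "Glenn", "Chennai", "D900"),
    (166, "Glenn", "Chennai", "D004"),
]

# Update anomaly: the address is changed in only one of Rick's two rows.
partly_updated = list(employee)
partly_updated[0] = (101, "Rick", "Mumbai", "D001")
rick_addresses = {addr for emp_id, _, addr, _ in partly_updated if emp_id == 101}

# Delete anomaly: closing department D890 removes every trace of Maggie.
after_closing = [row for row in employee if row[3] != "D890"]
remaining_ids = {row[0] for row in after_closing}
```

After the partial update Rick has two different addresses on record, and after the department is closed employee 123 no longer appears anywhere in the table.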

7.2 DEFINITION OF NORMALIZATION


Database Normalization is a technique that helps in designing the schema of the database
in an optimal manner so as to avoid the anomalies described above. The core idea of
database normalization is to divide the tables into smaller sub tables and store pointers
to data rather than replicating it. For a better understanding of what we just said, here
is a simple DBMS Normalization example:

To understand (RDBMS) normalization in the database with example tables, let's assume
that we are supposed to store the details of courses and instructors in a university. Here is
what a sample database could look like:

Course code    Course venue       Instructor Name    Instructor’s phone number

CS101          Lecture Hall 20    Prof. George       +91 6514821924
CS152          Lecture Hall 21    Prof. Atkins       +91 6519272918
CS154          CS Auditorium      Prof. George       +91 6514821924

Here, the data basically stores the course code, course venue, instructor name, and
instructor’s phone number. At first, this design seems to be good. However, issues start to
develop once we need to modify information. For instance, suppose, if Prof. George
changed his mobile number. In such a situation, we will have to make edits in 2 places.
What if someone just edited the mobile number against CS101, but forgot to edit it for
CS154? This will lead to stale/wrong information in the database.

This problem, however, can be easily tackled by dividing our table into 2 simpler tables:

Table 1 (Instructor):

1. Instructor ID
2. Instructor Name
3. Instructor mobile number

Table 2 (Course):

 Course code
 Course venue
 Instructor ID

Now, our data will look like the following:

Table 1 (Instructor):
Instructor's ID    Instructor's name    Instructor's number

1                  Prof. George         +91 6514821924
2                  Prof. Atkins         +91 6519272918

Table 2 (Course):

Course code    Course venue       Instructor ID

CS101          Lecture Hall 20    1
CS152          Lecture Hall 21    2
CS154          CS Auditorium      1

Basically, we store the instructors separately and in the course table, we do not store the
entire data of the instructor. We rather store the ID of the instructor. Now, if someone wants
to know the mobile number of the instructor, he/she can simply look up the instructor table.
Also, if we were to change the mobile number of Prof. George, it can be done in exactly
one place. This avoids the stale/wrong data problem.

Further, if you observe, the mobile number now need not be stored 2 times. We
have stored it at just 1 place. This also saves storage. This may not be obvious in the above
simple example. However, think about the case when there are hundreds of courses and
instructors and for each instructor, we have to store not just the mobile number, but also
other details like office address, email address, specialization, availability, etc. In such a
situation, replicating so much data will increase the storage requirement unnecessarily. The
above is a simplified example of how database normalization works. We will now more
formally study it.
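
The two-table design can be tried out with Python's built-in sqlite3 module. This is only a sketch under stated assumptions: the table and column names (instructor, course, instructor_id) and the replacement phone number are chosen here for illustration and are not part of the lesson's tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE instructor (id INTEGER PRIMARY KEY, name TEXT, phone TEXT);
CREATE TABLE course (code TEXT PRIMARY KEY, venue TEXT,
                     instructor_id INTEGER REFERENCES instructor(id));
INSERT INTO instructor VALUES (1, 'Prof. George', '+91 6514821924');
INSERT INTO instructor VALUES (2, 'Prof. Atkins', '+91 6519272918');
INSERT INTO course VALUES ('CS101', 'Lecture Hall 20', 1);
INSERT INTO course VALUES ('CS152', 'Lecture Hall 21', 2);
INSERT INTO course VALUES ('CS154', 'CS Auditorium', 1);
""")

# The phone number is stored once, so one UPDATE is enough.
conn.execute("UPDATE instructor SET phone = ? WHERE id = 1", ("+91 6500000000",))

# Both of Prof. George's courses now see the new number through the join.
george_phones = conn.execute("""
    SELECT c.code, i.phone
    FROM course c JOIN instructor i ON i.id = c.instructor_id
    WHERE i.name = 'Prof. George'
    ORDER BY c.code
""").fetchall()
```

Because the course table stores only the instructor ID, there is no second copy of the phone number that could be left stale.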

 Normalization is the process of organizing the data in the database.
 Normalization is used to minimize the redundancy from a relation or set of
relations. It is also used to eliminate the undesirable characteristics like
Insertion, Update and Deletion Anomalies.
 Normalization divides the larger table into smaller tables and links them
using relationships.
 The normal form is used to reduce redundancy from the database table.

Normalization rules are divided into the following normal forms:

1. First Normal Form
2. Second Normal Form
3. Third Normal Form
4. BCNF
5. Fourth Normal Form

7.3 DECOMPOSITION

Definition. The normal form of a relation refers to the highest normal form condition that
it meets, and hence indicates the degree to which it has been normalized. Normal forms,
when considered in isolation from other factors, do not guarantee a good database design.
It is generally not sufficient to check separately that each relation schema in the database
is, say, in BCNF or 3NF. Rather, the process of normalization through decomposition must
also confirm the existence of additional properties that the relational schemas, taken
together, should possess. These would include two properties:

 The nonadditive join or lossless join property, which guarantees that the spurious
tuple generation problem does not occur with respect to the relation schemas
created after decomposition.
 The dependency preservation property, which ensures that each functional
dependency is represented in some individual relation resulting after
decomposition.
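
The lossless join property can be illustrated by projecting a relation onto two schemas and joining the projections back. The sketch below is an illustration under assumptions: it uses three rows of the TEACH relation from Section 7.7, and the helper names project and natural_join are chosen here.

```python
def project(rows, attrs):
    """Projection: keep only attrs and drop duplicate rows."""
    return {frozenset((a, row[a]) for a in attrs) for row in rows}

def natural_join(left, right):
    """Natural join of two sets of rows encoded as frozensets of pairs."""
    joined = set()
    for l in left:
        for r in right:
            l_row, r_row = dict(l), dict(r)
            shared = l_row.keys() & r_row.keys()
            if all(l_row[a] == r_row[a] for a in shared):
                joined.add(frozenset({**l_row, **r_row}.items()))
    return joined

TEACH = [
    {"Student": "Neeraj", "Course": "DBMS", "Teacher": "H K Lal"},
    {"Student": "Saroj", "Course": "DBMS", "Teacher": "Radhe Shyam"},
    {"Student": "Shikha", "Course": "DBMS", "Teacher": "H K Lal"},
]
original = project(TEACH, ["Student", "Course", "Teacher"])

# Splitting on Course loses the link between a student and a teacher.
lossy = natural_join(project(TEACH, ["Student", "Course"]),
                     project(TEACH, ["Course", "Teacher"]))
spurious = lossy - original

# Splitting on Teacher works because Teacher -> Course holds.
lossless = natural_join(project(TEACH, ["Student", "Teacher"]),
                        project(TEACH, ["Teacher", "Course"]))
```

The first decomposition produces rows such as (Neeraj, DBMS, Radhe Shyam) that were never in TEACH; these are the spurious tuples. The second decomposition joins back to exactly the original rows.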

In fact, normalization is carried out in practice so that the resulting designs are of high
quality and meet the desirable properties stated previously. The practical utility of these
normal forms becomes questionable when the constraints on which they are based are rare,
and hard to understand or to detect by the database designers and users who must discover
these constraints. Thus, database design as practiced in industry today pays particular
attention to normalization only up to 3NF, BCNF, or at most 4NF. Another point worth
noting is that the database designers need not normalize to the highest possible normal
form. Relations may be left in a lower normalization status, such as 2NF.

7.4 FIRST NORMAL FORM (1NF)

First normal form (1NF) is now considered to be part of the formal definition of a relation
in the basic (flat) relational model; historically, it was defined to disallow multivalued
attributes, composite attributes, and their combinations. It states that the domain of an
attribute must include only atomic (simple, indivisible) values and that the value of any
attribute in a tuple must be a single value from the domain of that attribute. Hence, 1NF
disallows having a set of values, a tuple of values, or a combination of both as an attribute
value for a single tuple. In other words, 1NF disallows relations within relations or
relations as attribute values within tuples. The only attribute values permitted by 1NF are
single atomic (or indivisible) values.

Consider the DEPARTMENT relation schema shown below, whose primary key is Dnumber,
and suppose that we extend it by including the Dlocations attribute. We assume that each
department can have a number of locations. As we can see, this is not in 1NF because
Dlocations is not an atomic attribute.

DEPARTMENT
Dnumber    Dname    Dmgr_SSN    Dlocations

There are three main techniques to achieve first normal form for such a relation:

 Remove the attribute Dlocations that violates 1NF and place it in a separate relation
DEPT_LOCATIONS along with the primary key Dnumber of DEPARTMENT.
The primary key of this relation is the combination {Dnumber, Dlocation}.
 Expand the key so that there will be a separate tuple in the original DEPARTMENT
relation for each location of a DEPARTMENT. This introduces redundancy, because
the other department attributes are repeated for every location.
 If a maximum number of values is known for the attribute (for example, if it is
known that at most three locations can exist for a department), replace the
Dlocations attribute by three atomic attributes: Dlocation1, Dlocation2, and
Dlocation3. This solution has the disadvantage of introducing NULL values if most
departments have fewer than three locations.

Of the three solutions above, the first is generally considered best because it does not suffer
from redundancy and it is completely general, having no limit placed on a maximum
number of values.

Example:

The First normal form simply says that each cell of a table should contain exactly one value.
Let us take an example. Suppose we are storing the courses that a particular instructor takes,
we can store it like this:

Instructor's name Course code


Prof. George (CS101, CS154)
Prof. Atkins (CS152)

Here, the issue is that in the first row, we are storing 2 courses against Prof. George. This
isn’t the optimal way since that’s not how SQL databases are designed to be used. A better
method would be to store the courses separately. For instance:

Instructor's name Course code


Prof. George CS101
Prof. George CS154
Prof. Atkins CS152

This way, if we want to edit some information related to CS101, we do not have to touch
the data corresponding to CS154. Also, observe that each row stores unique information.
There is no repetition. This is the First Normal Form.
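
The conversion shown in the two tables above amounts to emitting one row per value of the multi-valued cell. A minimal sketch, with variable names chosen here for illustration:

```python
# Each instructor is stored with a list of course codes (not in 1NF).
non_1nf = [
    ("Prof. George", ["CS101", "CS154"]),
    ("Prof. Atkins", ["CS152"]),
]

# One row per (instructor, course) pair, so every cell holds a single value.
in_1nf = [(name, code) for name, codes in non_1nf for code in codes]
```

The result has three rows, matching the second table of the example.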

7.5 SECOND NORMAL FORM (2NF)

Second normal form (2NF) is based on the concept of full functional dependency. A
functional dependency X → Y is a full functional dependency if removal of any attribute
A from X means that the dependency does not hold any more; that is, for any attribute
A ∈ X, (X – {A}) does not functionally determine Y. A functional dependency X → Y is a
partial dependency if some attribute A ∈ X can be removed from X and the dependency still
holds; that is, for some A ∈ X, (X – {A}) → Y.

Definition. A relation schema R is in 2NF if every nonprime attribute A in R is fully
functionally dependent on the primary key of R. The test for 2NF involves testing for
functional dependencies whose left-hand side attributes are part of the primary key. If the
primary key contains a single attribute, the test need not be applied at all.

General Definition of 2NF

A relation schema R is in second normal form (2NF) if every nonprime attribute A in R is
not partially dependent on any key of R.

If a relation schema is not in 2NF, it can be second normalized or 2NF normalized into a
number of 2NF relations in which nonprime attributes are associated only with the part of
the primary key on which they are fully functionally dependent. The following example
shows how we can decompose a relation not in 2NF into three relations which are now in
2NF. For a table to be in second normal form, the following 2 conditions are to be met:

1. The table should be in the first normal form.
2. No nonprime column should depend on only a part of the primary key. A table whose
primary key consists of exactly 1 column meets this condition automatically.

The first point is straightforward since we just studied 1NF. To understand the second
point, recall what a primary key is: a set of columns that uniquely identifies a row.
Basically, no 2 rows have the same primary key values.
(a) Relation not in 2NF

SSN    Pnumber    Hours    Ename    Pname    PLocation

FD1: {SSN, Pnumber} → Hours
FD2: SSN → Ename
FD3: Pnumber → {Pname, PLocation}

(b) Relation decomposed in 2NF

FD1: (SSN, Pnumber, Hours)
FD2: (SSN, Ename)
FD3: (Pnumber, Pname, PLocation)
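
The 2NF test on this relation can be sketched in code. The sketch assumes that the figure's dependencies read FD1: {SSN, Pnumber} → Hours, FD2: SSN → Ename and FD3: Pnumber → {Pname, PLocation}, and that {SSN, Pnumber} is the only candidate key; the function names are chosen here for illustration.

```python
KEY = {"SSN", "Pnumber"}

FDS = {
    "FD1": ({"SSN", "Pnumber"}, {"Hours"}),
    "FD2": ({"SSN"}, {"Ename"}),
    "FD3": ({"Pnumber"}, {"Pname", "PLocation"}),
}

def partial_dependencies(key, fds):
    """FDs whose left side is a proper subset of the key and whose right
    side contains at least one nonprime attribute."""
    return [name for name, (lhs, rhs) in fds.items() if lhs < key and rhs - key]

def decompose_2nf(fds):
    """One relation per FD: the determinant plus the attributes it determines."""
    return [sorted(lhs | rhs) for lhs, rhs in fds.values()]

violations = partial_dependencies(KEY, FDS)
relations = decompose_2nf(FDS)
```

FD2 and FD3 are reported as partial dependencies, and the decomposition yields the three relations of part (b) of the figure.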

Example:

Course code    Course venue       Instructor Name    Instructor’s phone number

CS101          Lecture Hall 20    Prof. George       +91 6514821924
CS152          Lecture Hall 21    Prof. Atkins       +91 6519272918
CS154          CS Auditorium      Prof. George       +91 6514821924

Here, in this table, the course code is unique. So, that becomes our primary key. Let us take
another example of storing student enrollment in various courses. Each student may enroll
in multiple courses. Similarly, each course may have multiple enrollments. A sample table
may look like this (student name and course code):

Student name    Course code

Rahul           CS152
Rajat           CS101
Rahul           CS154
Raman           CS101

Here, the first column is the student name and the second column is the course taken by the
student. Clearly, the student name column isn’t unique as we can see that there are 2 entries
corresponding to the name ‘Rahul’ in row 1 and row 3. Similarly, the course code column
is not unique as we can see that there are 2 entries corresponding to course code CS101 in
row 2 and row 4. However, the tuple (student name, course code) is unique since a student
cannot enroll in the same course more than once. So, these 2 columns when combined form
the primary key for the table.

Strictly speaking, this enrollment table has no nonprime columns, so it cannot contain a
partial dependency. The problem appears as soon as student-specific details (for example,
an address) are stored in it, because they would depend on the student name alone, which
is only part of the key. To prepare for this (1NF to 2NF), we can break the data into 2
tables:

Students:

Student name    Enrollment number

Rahul           1
Rajat           2
Raman           3

Here the second column indicates the enrollment number for the student, which is unique.
Now, we can attach each of these enrollment numbers with course codes.

Courses:

Course code     Enrollment number

CS101           2
CS101           3
CS152           1
CS154           1
These 2 tables together provide us with the exact same information as our original table.
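
This claim can be checked by joining the two tables on the enrollment number. A minimal sketch, with the tables held as plain Python structures:

```python
# Students table: enrollment number -> student name.
students = {1: "Rahul", 2: "Rajat", 3: "Raman"}

# Courses table: (course code, enrollment number).
courses = [("CS101", 2), ("CS101", 3), ("CS152", 1), ("CS154", 1)]

# Join on the enrollment number to rebuild (student name, course code) rows.
rebuilt = sorted((students[number], code) for code, number in courses)

original = sorted([
    ("Rahul", "CS152"),
    ("Rajat", "CS101"),
    ("Rahul", "CS154"),
    ("Raman", "CS101"),
])
```

The rebuilt rows are the same four enrollments as in the original table, so no information was lost by the split.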

7.6 THIRD NORMAL FORM (3NF)

Third normal form (3NF) is based on the concept of transitive dependency. A functional
dependency X→Y in a relation schema R is a transitive dependency if there exists a set of
attributes Z in R that is neither a candidate key nor a subset of any key of R, and both X→Z
and Z→Y hold.

Definition. According to Codd’s original definition, a relation schema R is in 3NF if it
satisfies 2NF and no nonprime attribute of R is transitively dependent on the primary key.

The relation schema EMP_DEPT in Figure (a) below is in 2NF, since no partial
dependencies on a key exist. However, EMP_DEPT is not in 3NF because of the transitive
dependency of Dmgr_Adhar_no (and also Dname) on Adhar_no via Dnumber. We can
normalize EMP_DEPT by decomposing it into the two 3NF relation schemas shown in
Figure (b). Intuitively, we see that the two relations represent independent entity facts
about employees and departments:

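(a) EMP_DEPT

Ename    Adhar_no    Bdate    Address    Dnumber    Dname    Dmgr_Adhar_no

FD1: Adhar_no → {Ename, Bdate, Address, Dnumber}
FD2: Dnumber → {Dname, Dmgr_Adhar_no}

(b) EMP_DEPT decomposed into two 3NF relations

FD1: (Ename, Adhar_no, Bdate, Address, Dnumber)
FD2: (Dnumber, Dname, Dmgr_Adhar_no)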
General Definition

Definition. A relation schema R is in third normal form (3NF) if, whenever a nontrivial
functional dependency X→A holds in R, either (a) X is a superkey of R, or (b) A is a prime
attribute of R. A relation schema R violates the general definition of 3NF if a functional
dependency X → A holds in R that does not meet either condition, meaning that it violates
both conditions (a) and (b) of 3NF. This can occur due to two types of problematic
functional dependencies:

■ A nonprime attribute determines another nonprime attribute. Here we typically have a
transitive dependency that violates 3NF.
■ A proper subset of a key of R functionally determines a nonprime attribute.
Here we have a partial dependency that violates 3NF (and also 2NF).

Therefore, we can state a general alternative definition of 3NF as follows:

Alternative Definition. A relation schema R is in 3NF if every nonprime attribute of R
meets both of the following conditions:

■ It is fully functionally dependent on every key of R.
■ It is nontransitively dependent on every key of R.
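
The general definition translates directly into a test: for each dependency X → A, check whether X is a superkey or A is prime. The following sketch applies it to EMP_DEPT, assuming the dependencies Adhar_no → {Ename, Bdate, Address, Dnumber} and Dnumber → {Dname, Dmgr_Adhar_no} described in the text. The function names are chosen here, candidate keys are found by brute force, and only the listed dependencies are examined.

```python
from itertools import combinations

def closure(attrs, fds):
    """All attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def candidate_keys(all_attrs, fds):
    """Minimal attribute sets whose closure is the whole schema."""
    keys = []
    for size in range(1, len(all_attrs) + 1):
        for combo in combinations(sorted(all_attrs), size):
            candidate = set(combo)
            if closure(candidate, fds) == all_attrs and not any(k <= candidate for k in keys):
                keys.append(candidate)
    return keys

def violations_3nf(all_attrs, fds):
    """Pairs (X, A) where X is not a superkey and A is a nonprime attribute."""
    prime = set().union(*candidate_keys(all_attrs, fds))
    bad = []
    for lhs, rhs in fds:
        if closure(lhs, fds) != all_attrs:          # condition (a) fails
            bad += [(tuple(sorted(lhs)), a)
                    for a in sorted(rhs - lhs) if a not in prime]  # (b) fails
    return bad

FD1 = ({"Adhar_no"}, {"Ename", "Bdate", "Address", "Dnumber"})
FD2 = ({"Dnumber"}, {"Dname", "Dmgr_Adhar_no"})
EMP_DEPT = {"Ename", "Adhar_no", "Bdate", "Address", "Dnumber", "Dname", "Dmgr_Adhar_no"}

emp_dept_violations = violations_3nf(EMP_DEPT, [FD1, FD2])
employee_violations = violations_3nf(EMP_DEPT - {"Dname", "Dmgr_Adhar_no"}, [FD1])
```

EMP_DEPT fails the test because Dnumber is not a superkey and neither Dname nor Dmgr_Adhar_no is prime, while the employee relation obtained after decomposition passes.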

Example:

Before we delve into details of third normal form, let us again understand the
concept of a functional dependency on a table. Column B is said to be functionally
dependent on column A if the value of A determines the value of B, so that changing the
value of A in a row may require a change in the value of B. As an example, consider the
following table:

Course code    Course venue           Instructor's name    Department

MA214          Lecture Hall 18        Prof. George         CS Department
ME112          Auditorium building    Prof. John           Electronics Department

Here, the department column is dependent on the professor name column. This is because
if in a particular row, we change the name of the professor, we will also have to change the
department value. As an example, suppose MA214 is now taken by Prof. Ronald who
happens to be from the Mathematics department, the table will look like this:

Course code    Course venue           Instructor's name    Department

MA214          Lecture Hall 18        Prof. Ronald         Mathematics Department
ME112          Auditorium building    Prof. John           Electronics Department

Here, when we changed the name of the professor, we also had to change the department
column. This is not desirable since someone who is updating the database may remember
to change the name of the professor, but may forget updating the department value. This
can cause inconsistency in the database.

Third normal form avoids this by breaking this into separate tables:

Course code    Course venue           Instructor's ID

MA214          Lecture Hall 18        1
ME112          Auditorium building    2

Here, the third column is the ID of the professor who’s taking the course.

Instructor's ID Instructor's Name Department

1 Prof. Ronald Mathematics Department

2 Prof. John Electronics Department

Here, in the above table, we store the details of the professor against his/her ID. This way,
whenever we want to reference the professor somewhere, we don’t have to put the other
details of the professor in that table again. We can simply use the ID.

Therefore, in the third normal form, the following conditions are required:

 The table should be in the second normal form.
 There should not be any transitive functional dependency of a non-key column on
the primary key.

7.7 BOYCE-CODD NORMAL FORM (BCNF)

Boyce-Codd normal form (BCNF) was proposed as a simpler form of 3NF, but it was
found to be stricter than 3NF. That is, every relation in BCNF is also in 3NF; however, a
relation in 3NF is not necessarily in BCNF.

Definition. A relation schema R is in BCNF if whenever a nontrivial functional
dependency X→A holds in R, then X is a superkey of R. The formal definition of BCNF
differs from the definition of 3NF in that condition (b) of 3NF, which allows A to be prime,
is absent from BCNF. That makes BCNF a stronger normal form compared to 3NF. In
practice, most relation schemas that are in 3NF are also in BCNF. Only if X→A holds in a
relation schema R with X not being a superkey and A being a prime attribute will R be in
3NF but not in BCNF. Consider an example which shows a relation TEACH with the
following dependencies:

FD1: {Student, Course} → Teacher

FD2: Teacher → Course

TEACH
Student (A)    Course (B)          Teacher (C)

Neeraj         DBMS                H K Lal
Saroj          Operating System    P K Sharma
Saroj          DBMS                Radhe Shyam
Saroj          Automata            M K Gupta
Shikha         DBMS                H K Lal
Shikha         Operating System    RajNath

The relation is in 3NF but not in BCNF.
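
The BCNF condition for TEACH can be checked with the attribute closure. This is a minimal sketch in which the function names are assumptions chosen for illustration and only the two listed dependencies are examined.

```python
def closure(attrs, fds):
    """All attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def bcnf_violations(all_attrs, fds):
    """Nontrivial FDs whose left-hand side is not a superkey."""
    return [(sorted(lhs), sorted(rhs)) for lhs, rhs in fds
            if not rhs <= lhs and closure(lhs, fds) != all_attrs]

TEACH = {"Student", "Course", "Teacher"}
FDS = [
    ({"Student", "Course"}, {"Teacher"}),   # FD1
    ({"Teacher"}, {"Course"}),              # FD2
]

violations = bcnf_violations(TEACH, FDS)
```

Only FD2 is reported: Teacher determines Course but not Student, so Teacher is not a superkey. The relation still satisfies 3NF because Course is a prime attribute, being part of the key {Student, Course}.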

Boyce-Codd Normal Form is a stronger generalization of third normal form. A table is in
Boyce-Codd Normal Form if and only if at least one of the following conditions is met for
each functional dependency A → B:

 A is a superkey
 It is a trivial functional dependency.

Let us first understand what a superkey means. To understand BCNF in DBMS, consider
the following example table:

Course code    Course venue       Instructor Name    Instructor’s phone number

CS101          Lecture Hall 20    Prof. George       +91 6514821924
CS152          Lecture Hall 21    Prof. Atkins       +91 6519272918
CS154          CS Auditorium      Prof. George       +91 6514821924

Here, the first column (course code) is unique across various rows. So, it is a superkey.
Consider the combination of columns (course code, professor name). It is also unique
across various rows. So, it is also a superkey. A superkey is basically a set of columns such
that the value of that set of columns is unique across various rows. That is, no 2 rows have
the same set of values for those columns. Some of the superkeys for the table above are:

 Course code
 Course code, professor name
 Course code, professor mobile number

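A superkey from which no column can be removed without losing uniqueness (a minimal
superkey) is called a candidate key. For instance, the first superkey above has just 1
column, so nothing can be removed from it. The second one and the last one have 2 columns,
but each still contains Course code, which is unique on its own. So, only the first
superkey (Course code) is a candidate key.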
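
Superkeys and candidate keys of the sample rows can be enumerated by brute force. The sketch below uses short column names (code, venue, instructor, phone) chosen here for illustration. Note that it only inspects the three sample rows, so it can report a column as unique merely because the sample is small.

```python
from itertools import combinations

COLUMNS = ("code", "venue", "instructor", "phone")
ROWS = [
    ("CS101", "Lecture Hall 20", "Prof. George", "+91 6514821924"),
    ("CS152", "Lecture Hall 21", "Prof. Atkins", "+91 6519272918"),
    ("CS154", "CS Auditorium", "Prof. George", "+91 6514821924"),
]

def is_unique(cols):
    """True if no two rows share the same values in the given columns."""
    positions = [COLUMNS.index(c) for c in cols]
    values = {tuple(row[i] for i in positions) for row in ROWS}
    return len(values) == len(ROWS)

# Every column combination that identifies each row uniquely.
superkeys = [set(c) for size in range(1, len(COLUMNS) + 1)
             for c in combinations(COLUMNS, size) if is_unique(c)]

# Candidate keys: superkeys with no smaller superkey inside them.
candidate_keys = [k for k in superkeys if not any(other < k for other in superkeys)]
```

On these rows the course code is a candidate key, and {code, instructor} is a superkey but not a candidate key. The venue also comes out as a candidate key, but only because each of the three sample courses happens to use a different venue; whether that is a real constraint must come from the meaning of the data, not from the sample.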

Boyce-Codd Normal Form says that if there is a functional dependency A → B, then either
A is a superkey or it is a trivial functional dependency. A trivial functional dependency
means that all columns of B are contained in the columns of A. For instance, (course code,
professor name) → (course code) is a trivial functional dependency because when we know
the value of course code and professor name, we do know the value of course code and so,
the dependency becomes trivial.

Let us understand what’s going on:

A is a superkey: this means that other columns should depend only on a superkey.
Basically, if a set of columns (B) can be determined knowing some other set of columns
(A), then A should be a superkey. A superkey determines each row uniquely.

It is a trivial functional dependency: this means that the only dependencies allowed on
something other than a superkey are trivial ones. For instance, we saw how the professor’s
department was dependent on the professor’s name. This may create integrity issues since
someone may edit the professor’s name without changing the department. This may lead to
an inconsistent database.

There are also 2 other normal forms, the fourth normal form (4NF) and the fifth normal
form (5NF), which are summarized under Keywords in this lesson.

7.8 CHECK YOUR PROGRESS


1. If F is a set of functional dependencies, then the closure of F is denoted by?
A. F*
B. Fo
C. F+
D. F
2. In the _________ normal form, a composite attribute is converted to individual
attributes.
A. First
B. Second
C. Third
D. Fourth
3. A table in 2NF eliminates _______________.
4. Functional dependencies are the types of constraints that are based on ________.
5. ____________ is the bottom-up approach to database design that works by
examining the relationships between attributes.

7.9 SUMMARY
Normalization of data can be considered a process of analyzing the given relation schemas
based on their FDs and primary keys to achieve the desirable properties of (1) minimizing
redundancy and (2) minimizing the insertion, deletion, and update anomalies. It can be
considered as a “filtering” or “purification” process to make the design have successively
better quality. Unsatisfactory relation schemas that do not meet certain conditions (the
normal form tests) are decomposed into smaller relation schemas that meet the tests and
hence possess the desirable properties. Thus, the normalization procedure provides
database designers with the following:
■ A formal framework for analyzing relation schemas based on their keys and on the
functional dependencies among their attributes.
■ A series of normal form tests that can be carried out on individual relation schemas so
that the relational database can be normalized to any desired degree.
Database Normalization is a technique of organizing the data in the database.
Normalization is a systematic approach of decomposing tables to eliminate data
redundancy (repetition) and undesirable characteristics like Insertion, Update and Deletion
Anomalies. It is a multi-step process that puts data into tabular form, removing duplicated
data from the relation tables. Normalization is used for mainly two purposes:
 Eliminating redundant (useless) data.
 Ensuring data dependencies make sense, i.e. data is logically stored.

7.10 KEYWORDS
 SUPERKEY: A superkey is a set of attributes within a table whose values can be
used to uniquely identify a tuple. A candidate key is a minimal set of attributes
necessary to identify a tuple; this is also called a minimal superkey.
 ANOMALY: Anomalies are problems that can occur in poorly planned,
un-normalised databases where all the data is stored in one table (a flat-file database).
 CANDIDATE KEY: A candidate key is a unique key that identifies a record uniquely
in a table, and a table can have multiple candidate keys. The primary key is the one
candidate key chosen as the main identifier: it is unique and non-null, and a table
can have only one primary key.
 4NF: Fourth normal form (4NF) is a level of database normalization where there
are no non-trivial multivalued dependencies other than a candidate key. It builds on
the first three normal forms (1NF, 2NF and 3NF) and the Boyce-Codd Normal Form
(BCNF).
 5NF: Fifth normal form (5NF), also known as project-join normal form (PJ/NF), is
a level of database normalization designed to reduce redundancy in relational
databases recording multi-valued facts by isolating semantically related multiple
relationships.

7.11 SELF-ASSESSMENT TEST


1. Explain why normalization is needed.
2. What are anomalies in a database? How do we handle them?
3. Discuss 3NF in detail.
4. In which normal forms is a relation that possesses data about an individual entity?
Explain.
5. Which forms are based on the concept of functional dependency?

7.12 ANSWERS TO CHECK YOUR PROGRESS


1. C
2. A
3. All hidden dependencies
4. Key
5. Normalization

7.13 REFERENCES / SUGGESTED READINGS


 C.J Date, “An Introduction to Database Systems”, 8th edition, Addison Wesley N.
Delhi.
 Ivan Bayross, “SQL, PL/SQL-The Programming Language of ORACLE”, BPB
Publication 3rd edition.
 Elmasri and Navathe, “Fundamentals of Database Systems”, 5th edition, Pearson
Education.
 https://www.tutorialspoint.com/dbms/database_normalization.htm
 https://www.guru99.com/database-normalization.html
 https://www.studytonight.com/dbms/database-normalization.php
 https://www.javatpoint.com/dbms-normalization

SUBJECT: RELATIONAL DATABASE MANAGEMENT
SYSTEM
COURSE CODE: BCA-244 AUTHOR: DR. DEEPAK NANDAL

LESSON NO. 8 VETTER:

SQL

STRUCTURE

8.0 Learning Objective

8.1 Introduction

8.2 Definition of SQL

8.3 Characteristics of SQL

8.4 SQL Data Types

8.5 SQL Literals

8.6 SQL Constraints

8.7 Check Your Progress

8.8 Summary

8.9 Keywords

8.10 Self-Assessment Test

8.11 Answers to check your progress

8.12 References / Suggested Readings

8.0 LEARNING OBJECTIVE

 The objective of this chapter is to make the reader understand SQL, the most popular
and widely used query language. This chapter presents the main features of the
SQL standard for commercial relational DBMSs: the main characteristics of SQL,
SQL data types, SQL literals and SQL constraints.

8.1 INTRODUCTION
The SQL language may be considered one of the major reasons for the commercial success
of relational databases. Because it became a standard for relational databases, users were
less concerned about migrating their database applications from other types of database
systems—for example, network or hierarchical systems—to relational systems. This is
because even if the users became dissatisfied with the particular relational DBMS product
they were using, converting to another relational DBMS product was not expected to be
too expensive and time-consuming because both systems followed the same language
standards. However, the relational algebra operations are considered to be too technical for
most commercial DBMS users because a query in relational algebra is written as a sequence
of operations that, when executed, produces the required result. Hence, the user must
specify how—that is, in what order—to execute the query operations. On the other hand,
the SQL language provides a higher-level declarative language interface, so the user only
specifies what the result is to be, leaving the actual optimization and decisions on how to
execute the query to the DBMS. Although SQL includes some features from relational
algebra, it is based to a greater extent on the tuple relational calculus.
The name SQL is presently expanded as Structured Query Language. Originally,
SQL was called SEQUEL (Structured English QUEry Language) and was designed and
implemented at IBM Research as the interface for an experimental relational database
system called SYSTEM R. SQL is now the standard language for commercial relational
DBMSs. A joint effort by the American National Standards Institute (ANSI) and the
International Organization for Standardization (ISO) has led to a standard version of SQL
(ANSI 1986), called SQL-86 or SQL1. A revised and much expanded standard called SQL-92
(also referred to as SQL2) was subsequently developed. The next standard that is well-
recognized is SQL:1999, which started out as SQL3. Two later updates to the standard are
SQL:2003 and SQL:2006, which added XML features among other updates to the
language. Another update in 2008 incorporated more object database features in SQL.
SQL is a comprehensive database language: it has statements for data definitions, queries,
and updates. Hence, it is both a DDL and a DML. In addition, it has facilities for defining
views on the database, for specifying security and authorization, for defining integrity
constraints, and for specifying transaction controls. It also has rules for embedding SQL
statements into a general-purpose programming language such as Java, COBOL, or C/C++.

8.2 DEFINITION OF SQL
SQL uses the terms table, row, and column for the formal relational model terms relation,
tuple, and attribute, respectively. Unlike most programming languages, SQL is unique in
that it is not procedural but declarative in nature. This means that when using this language
one states what data is desired and not how to get that data. A component within the
database server known as the optimizer will automatically determine how to get the data
most efficiently. Therefore the user may concentrate solely on what data is desired and then
allow the database to automatically select the optimum method by which to retrieve that
data. The SQL language has several aspects to it:
 The Data Definition Language (DDL): This subset of SQL supports the creation,
deletion, and modification of definitions for tables and views. Integrity constraints
can be defined on tables, either when the table is created or later. The DDL also
provides commands for specifying access rights or privileges to tables and views.
Although the standard does not discuss indexes, commercial implementations also
provide commands for creating and deleting indexes.
 The Data Manipulation Language (DML): This subset of SQL allows users to
pose queries and to insert, delete, and modify rows.
 Embedded and dynamic SQL: Embedded SQL features allow SQL code to be
called from a host language such as C or COBOL. Dynamic SQL features allow a
query to be constructed (and executed) at run-time.
 Triggers: The SQL:1999 standard includes support for triggers, which are
actions executed by the DBMS whenever changes to the database meet conditions
specified in the trigger.
 Security: SQL provides mechanisms to control users' access to data objects such
as tables and views.
 Transaction management: Various commands allow a user to explicitly control
aspects of how a transaction is to be executed.

 Client-server execution and remote database access: These commands control
how a client application program can connect to an SQL database server, or access
data from a database over a network.
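The DDL, DML and transaction-management aspects listed above can be tried out with a small sketch. It assumes Python's built-in sqlite3 module with an in-memory database as a stand-in for a general SQL DBMS; the STUDENT table and its rows are hypothetical and used only for illustration.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for a generic SQL DBMS.
conn = sqlite3.connect(":memory:")

# DDL: define a (hypothetical) table.
conn.execute("CREATE TABLE STUDENT (Roll_no INTEGER PRIMARY KEY, Name VARCHAR(20))")

# DML: insert two rows and make the change permanent.
conn.execute("INSERT INTO STUDENT VALUES (1, 'Asha')")
conn.execute("INSERT INTO STUDENT VALUES (2, 'Ravi')")
conn.commit()

# Transaction management: a change that is rolled back leaves no trace.
conn.execute("DELETE FROM STUDENT WHERE Roll_no = 2")
conn.rollback()

# Declarative query: we state what data we want, not how to find it.
rows = conn.execute("SELECT Roll_no, Name FROM STUDENT ORDER BY Roll_no").fetchall()
print(rows)
```

Both rows are still present after the rollback, because the DELETE was never committed.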

8.3 CHARACTERISTICS OF SQL


SQL is both an easy-to-understand language and a comprehensive tool for managing data.
Here are some of the major features of SQL and the market forces that have made it
successful:
a) Vendor Independence
A SQL-based database and the programs that use it can be moved from one DBMS
to another vendor's DBMS with minimal conversion effort and little retraining of
personnel.
b) SQL Standards
In 1986, the American National Standards Institute (ANSI) and the International
Organization for Standardization (ISO) published the first official standard for SQL,
which was expanded in 1989, 1992 and 1999. The evolving standards serve as an official
stamp of approval for SQL and have sped its market acceptance.
c) Portability across Computer Systems
SQL databases run on various computer systems, ranging from mainframes to
stand-alone computers. SQL-based applications that begin on single-user or
departmental server systems can be moved to larger server systems as they grow.
d) Relational Foundation
We already know that SQL is a language for relational databases. The relational
database model and row/column structure make SQL simple and easy to
understand. The relational model also has a strong theoretical foundation that has
guided the evolution and implementation of relational databases.
e) Programmatic Database Access
SQL is also a database language used by programmers to write applications that
access a database. The same SQL statements are used for both interactive and
programmatic access, so the database access parts of a program can be tested first
with interactive SQL and then embedded into the program.
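As an illustration of programmatic database access, the following sketch runs one statement first as plain SQL text, as it might be typed interactively, and then from inside a program with a value supplied by the host language. It assumes Python's built-in sqlite3 module as the host-language interface; the EMPLOYEE rows and the helper employees_in are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE EMPLOYEE (Fname VARCHAR(15), Dno INT);
    INSERT INTO EMPLOYEE VALUES ('Anil', 5), ('Meena', 4), ('Sunita', 5);
""")

# The statement as it might first be tested interactively.
interactive = conn.execute(
    "SELECT Fname FROM EMPLOYEE WHERE Dno = 5 ORDER BY Fname").fetchall()

def employees_in(dno):
    """Hypothetical helper: the same statement embedded in a program, with the
    department number passed in through a '?' placeholder."""
    cur = conn.execute(
        "SELECT Fname FROM EMPLOYEE WHERE Dno = ? ORDER BY Fname", (dno,))
    return [name for (name,) in cur]

programmatic = employees_in(5)
print(interactive, programmatic)
```

Passing the value through a placeholder, rather than pasting it into the SQL text, lets the same tested statement be reused for any department number.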

8.4 SQL DATA TYPES

The basic data types available for attributes include numeric, character string, bit
string, Boolean, date, and time.
■ Numeric data types include integer numbers of various sizes (INTEGER or INT, and
SMALLINT) and floating-point (real) numbers of various precision (FLOAT or REAL,
and DOUBLE PRECISION). Formatted numbers can be declared by using
DECIMAL(i,j)—or DEC(i,j) or NUMERIC(i,j)—where i, the precision, is the total number
of decimal digits and j, the scale, is the number of digits after the decimal point. The default
for scale is zero, and the default for precision is implementation-defined.
■ Character-string data types are either fixed length—CHAR(n) or CHARACTER(n),
where n is the number of characters—or varying length— VARCHAR(n) or CHAR
VARYING(n) or CHARACTER VARYING(n), where n is the maximum number of
characters. When specifying a literal string value, it is placed between single quotation
marks (apostrophes), and it is case sensitive (a distinction is made between uppercase and
lowercase). For fixed-length strings, a shorter string is padded with blank characters to the
right. For example, if the value 'Sudha' is for an attribute of type CHAR(10), it is padded
with five blank characters to become 'Sudha     ' if needed. Padded blanks are generally
ignored when strings are compared.
■ Bit-string data types are either of fixed length n, written BIT(n), or of varying length,
written BIT VARYING(n), where n is the maximum number of bits. The default for n, the
length of a character string or bit string, is 1. Literal bit strings are placed between single
quotes but preceded by a B to distinguish them from character strings; for example,
B'10101'. Another variable-length bit-string data type called BINARY LARGE OBJECT or BLOB
is also available to specify columns that have large binary values, such as images. As for
the character large object type CLOB, the maximum length of a BLOB can be specified in
kilobits (K), megabits (M), or gigabits (G). For example, BLOB(30G) specifies a maximum
length of 30 gigabits.
■ A Boolean data type has the traditional values of TRUE or FALSE. In SQL, because of
the presence of NULL values, a three-valued logic is used, so a third possible value for a
Boolean data type is UNKNOWN.

■ The DATE data type has ten positions, and its components are YEAR, MONTH, and
DAY in the form YYYY-MM-DD. The TIME data type has at least eight positions, with
the components HOUR, MINUTE, and SECOND in the form HH:MM:SS. Only valid dates
and times should be allowed by the SQL implementation. This implies that months should
be between 1 and 12 and days must be between 1 and 31; furthermore, a day should be a
valid day for the corresponding month. The < (less than) comparison can be used with
dates or times—an earlier date is considered to be smaller than a later date, and similarly
with time.
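The three-valued logic and the date comparison described above can be checked with a short sketch. It assumes Python's built-in sqlite3 module; note that SQLite has no separate Boolean or DATE type, so it reports TRUE as 1, FALSE as 0 and UNKNOWN as NULL, and the dates here are simply strings in YYYY-MM-DD form. The helper name value is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def value(expr):
    """Hypothetical helper: evaluate one SQL expression and return its value."""
    return conn.execute("SELECT " + expr).fetchone()[0]

# Three-valued logic (SQLite: TRUE -> 1, FALSE -> 0, UNKNOWN -> NULL/None).
unknown = value("NULL = NULL")     # any comparison with NULL is UNKNOWN
false_wins = value("NULL AND 0")   # UNKNOWN AND FALSE is FALSE
true_wins = value("NULL OR 1")     # UNKNOWN OR TRUE is TRUE

# Dates written as YYYY-MM-DD: an earlier date compares as smaller.
earlier = value("'1999-12-31' < '2000-01-01'")

print(unknown, false_wins, true_wins, earlier)
```

A DBMS with native BOOLEAN and DATE types would present the results differently, but the logic is the same.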
Some additional data types are discussed below. The list of types discussed here is not
exhaustive; different implementations have added more data types to SQL.
■ A timestamp data type (TIMESTAMP) includes the DATE and TIME fields, plus a
minimum of six positions for decimal fractions of seconds and an optional WITH TIME
ZONE qualifier. Literal values are represented by single quoted strings preceded by the
keyword TIMESTAMP, with a blank space between date and time; for example,
TIMESTAMP '2008-09-27 09:12:47.648302'.
■ Another data type related to DATE, TIME, and TIMESTAMP is the INTERVAL data
type. This specifies an interval—a relative value that can be used to increment or
decrement an absolute value of a date, time, or timestamp. Intervals are qualified to be
either YEAR/MONTH intervals or DAY/TIME intervals. The format of DATE, TIME, and
TIMESTAMP can be considered as a special type of string. Hence, they can generally be
used in string comparisons by being cast (or coerced or converted) into the equivalent
strings.
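The idea of incrementing or decrementing an absolute date or timestamp by a relative amount can be sketched as follows. This is only an approximation of the INTERVAL data type: it assumes Python's built-in sqlite3 module, and SQLite has no INTERVAL type, so the relative value is passed as a modifier string to SQLite's date() and datetime() functions. The helper name value is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def value(expr):
    """Hypothetical helper: evaluate one SQL expression and return its value."""
    return conn.execute("SELECT " + expr).fetchone()[0]

# A YEAR/MONTH style interval added to a date.
plus_month = value("date('2008-09-27', '+1 month')")

# A DAY/TIME style interval subtracted from a date.
minus_week = value("date('2008-09-27', '-7 days')")

# Hours added to a timestamp, rolling over into the next day.
plus_hours = value("datetime('2008-09-27 09:12:47', '+15 hours')")

print(plus_month, minus_week, plus_hours)
```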

8.5 SQL LITERALS

Data Literal: A program source element that represents a data value. Data literals can be
divided into groups according to the type of data they represent and how they represent it.
1. Character String Literals are used to construct character strings, exact numbers,
approximate numbers and date and time values. The syntax rules of character string literals
are simple:
 A character string literal is a sequence of characters enclosed by quote characters.
 The quote character is the single quote character "'".
 If "'" is part of the sequence, it needs to be doubled as "''".
Examples of character string literals:
'Hello'
'world!'
'Loews'
'123'
2. Hex String Literals are used to construct character strings and exact numbers.
Hexadecimal literals consist of hexadecimal digits delimited by a matching pair of single
quotes, where a hexadecimal digit is a character from 0 to 9, a to f, or A to F. Some
implementations allow 0 to 62000 digits; such limits are implementation-specific. The
syntax rules for hex string literals are also very simple:
 A hex string literal is a sequence of hex digits enclosed by quote characters and
prefixed with "x".
 The quote character is the single quote character "'".
Examples of hex string literals:
x'41423534'
x'57664873'
3. Numeric Literals are used to construct exact numbers and approximate numbers. A
numeric literal is a string of 1 to 40 characters selected from the following:
• plus sign
• minus sign
• digits 0 through 9
• decimal point
Numeric literals are also referred to as numeric constants. Syntax rules of numeric literals
are:
 A numeric literal can be written in signed integer form, signed real numbers without
exponents, or real numbers with exponents.
Examples of numeric literals:
1
22.33
-345
4. Date and Time Literals are used to construct date and time values. The syntax of date
and time literals is:
 A date literal is written in the form "DATE 'yyyy-mm-dd'".
 A timestamp literal, which combines a date and a time, is written in the form
"TIMESTAMP 'yyyy-mm-dd hh:mm:ss'".
Examples of date and time literals:
DATE '2013-07-15'
TIMESTAMP '2013-07-15 01:02:03'
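The character string, hex string and numeric literal forms described in this section can be tried out in a single query. The sketch assumes Python's built-in sqlite3 module; SQLite returns a hex string literal as a sequence of bytes, and it does not accept the DATE '...' and TIMESTAMP '...' literal forms, so those are left out here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One literal of each kind: a string with a doubled quote, a hex string,
# a signed integer, a real number, and a real number with an exponent.
row = conn.execute(
    "SELECT 'It''s', x'41423534', -345, 22.33, 1e3").fetchone()
text, blob, integer, real, exponent = row

print(text, blob, integer, real, exponent)
```

The doubled quote inside the string literal comes back as a single apostrophe, and the hex digits 41 42 35 34 are the character codes of A, B, 5 and 4.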

8.6 SQL CONSTRAINTS


Constraints are the rules enforced on data columns on a table. These are used to limit the
type of data that can go into a table. This ensures the accuracy and reliability of the data in
the database. Constraints can either be column level or table level. Column level constraints
are applied only to one column whereas, table level constraints are applied to the entire
table.

Following are some of the most commonly used constraints available in SQL:

 NOT NULL Constraint: Ensures that a column cannot have a NULL value.

 DEFAULT Constraint: Provides a default value for a column when none is specified.

 UNIQUE Constraint: Ensures that all the values in a column are different.

 PRIMARY Key: Uniquely identifies each row/record in a database table.

 FOREIGN Key: Refers to the primary (or a unique) key of another table, so that each value
in the foreign key column must match an existing row/record in that table.

 CHECK Constraint: Ensures that all values in a column satisfy certain conditions.

 INDEX: Not a constraint in the strict sense; an index is created on a table so that data
can be retrieved from the database very quickly.
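The constraints listed above can be seen at work in a small sketch. It assumes Python's built-in sqlite3 module (in SQLite, foreign keys are checked only after PRAGMA foreign_keys = ON); the DEPT and EMP tables, their rows and the helper rejected are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.executescript("""
    CREATE TABLE DEPT (
        Dnumber INT PRIMARY KEY,
        Dname   VARCHAR(15) NOT NULL UNIQUE
    );
    CREATE TABLE EMP (
        Emp_id INT PRIMARY KEY,
        Ename  VARCHAR(15) NOT NULL,
        Salary DECIMAL(10,2) DEFAULT 10000 CHECK (Salary > 0),
        Dno    INT REFERENCES DEPT(Dnumber)
    );
    INSERT INTO DEPT VALUES (1, 'Research');
    INSERT INTO EMP (Emp_id, Ename, Dno) VALUES (100, 'Kiran', 1);
""")

def rejected(statement):
    """Hypothetical helper: True if the DBMS refuses the statement
    because it violates a constraint."""
    try:
        conn.execute(statement)
    except sqlite3.IntegrityError:
        conn.rollback()
        return True
    conn.rollback()
    return False

results = {
    "NOT NULL":    rejected("INSERT INTO EMP (Emp_id, Ename) VALUES (101, NULL)"),
    "UNIQUE":      rejected("INSERT INTO DEPT VALUES (2, 'Research')"),
    "PRIMARY KEY": rejected("INSERT INTO EMP (Emp_id, Ename) VALUES (100, 'Dev')"),
    "FOREIGN KEY": rejected("INSERT INTO EMP (Emp_id, Ename, Dno) VALUES (102, 'Dev', 9)"),
    "CHECK":       rejected("INSERT INTO EMP (Emp_id, Ename, Salary) VALUES (103, 'Dev', -5)"),
}

# DEFAULT: employee 100 was inserted without a salary.
default_salary = conn.execute(
    "SELECT Salary FROM EMP WHERE Emp_id = 100").fetchone()[0]

print(results, default_salary)
```

Each of the five offending statements is refused, while the row inserted without a salary receives the default value.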

8.7 CHECK YOUR PROGRESS


1. SQL is a combination of a _________ language and a _________ language.
2. SQL stands for _________.
3. SQL was developed by ______ in the late 1970s.
4. The ______________ maintains the standards for SQL.
5. SQL is not a complete programming language. Rather it is a _____________.

8.8 SUMMARY
SQL (pronounced "ess-que-el") stands for Structured Query Language. SQL is used to
communicate with a database. According to ANSI (American National Standards Institute),
it is the standard language for relational database management systems. SQL statements
are used to perform tasks such as updating data in a database or retrieving data from a
database. Some common relational database management systems that use SQL are Oracle,
Sybase, Microsoft SQL Server, Access and Ingres. Although most database systems use SQL,
most of them also have their own additional proprietary extensions that are usually only
used on their system. However, the standard SQL commands such as "Select", "Insert",
"Update", "Delete", "Create", and "Drop" can be used to accomplish almost everything that
one needs to do with a database.

SQL is a computer language for storing, manipulating and retrieving data stored in a
relational database. All the Relational Database Management Systems (RDBMS) like MySQL,
MS Access, Oracle, Sybase, Informix, Postgres and SQL Server use SQL as their standard
database language, although they use different dialects, such as:

 MS SQL Server using T-SQL,

 Oracle using PL/SQL,

 MS Access version of SQL is called JET SQL (native format) etc.

Why is SQL so widely used? It offers the following advantages:

 Allows users to access data in the relational database management systems.

 Allows users to describe the data.

 Allows users to define the data in a database and manipulate that data.

 Allows SQL to be embedded within other languages using SQL modules, libraries and
pre-compilers.

 Allows users to create and drop databases and tables.

 Allows users to create view, stored procedure, functions in a database.

 Allows users to set permissions on tables, procedures and views.

8.9 KEYWORDS
 PL (PROGRAMMING LANGUAGE)- A programming language is a vocabulary
and set of grammatical rules for instructing a computer or computing device to
perform specific tasks. The term programming language usually refers to
high-level languages, such as BASIC, C, C++, COBOL, Java, FORTRAN, Ada, and
Pascal.
 CONSTRAINTS- Constraints make it possible to further restrict the domain of an
attribute. For instance, a constraint can restrict a given integer attribute to values
between 1 and 10.
 TUPLE- A data set representing a single item.
 COLUMN- A labeled element of a tuple, e.g. "Address" or "Date of birth"
 TABLE- A set of tuples sharing the same attributes; a set of columns and rows
 VIEW- Any set of tuples; a data report from the RDBMS in response to a query
 OPEN-SOURCE- MySQL is an open-source relational database management
system (RDBMS). ... SQL is a language programmers use to create, modify and
extract data from the relational database, as well as to control user access to the
database.

8.10 SELF-ASSESSMENT TEST


1. List the data types that are allowed for SQL attributes.

2. How does SQL allow implementation of the entity integrity and referential integrity
constraints described in Chapter 3? What about referential triggered actions?

3. How do the relations (tables) in SQL differ from the relations defined formally in
relational algebra? Discuss the other differences in terminology. Why does SQL
allow duplicate tuples in a table or in a query result?

4. What are literals in SQL? Write a short note.

5. Discuss in detail the constraints in SQL.

6. What is MySQL?

8.11 ANSWERS TO CHECK YOUR PROGRESS


1. Data definition, data manipulation
2. Structured Query language
3. IBM
4. ANSI (American National Standards Institute)
5. Data sublanguage

8.12 REFERENCES / SUGGESTED READINGS


 C.J Date, “An Introduction to Database Systems”, 8th edition, Addison Wesley N.
Delhi.
 Ivan Bayross, “SQL, PL/SQL-The Programming Language of ORACLE”, BPB
Publication 3rd edition.
 Elmasri and Navathe, “Fundamentals of Database Systems”, 5th edition, Pearson
Education.
 https://www.tutorialspoint.com/sql/sql_tutorial.pdf
 https://www.hcoe.edu.np/uploads/attachments/r96oytechsacgzi4.pdf
 http://www.sqlcourse.com/intro.html

SUBJECT: RELATIONAL DATABASE MANAGEMENT
SYSTEM
COURSE CODE: BCA-244 AUTHOR: DR. DEEPAK NANDAL

LESSON NO. 9 VETTER:

BASIC QUERIES IN SQL

STRUCTURE

9.0 Learning Objective

9.1 Introduction

9.2 Definition

9.3 Basic Queries in SQL

9.4 Data Definition in SQL

9.4.1 Create Table

9.4.2 Alter Table

9.4.3 Drop Table

9.4.4 Renaming a Table

9.4.5 Truncate a Table

9.5 Data Manipulation Commands

9.5.1 Select

9.5.2 Insert

9.5.3 Delete

9.5.4 Update

9.6 Data Control Commands and Views

9.7 Check Your Progress

9.8 Summary

9.9 Keywords

9.10 Self-Assessment Test

9.11 Answers to check your progress

9.12 References / Suggested Readings

9.0 LEARNING OBJECTIVE


 The objective of this chapter is to make the reader understand the queries of
SQL and use of some important and widely used commands in SQL, so the main
commands for creating, manipulating and updating the databases are presented
here with simple examples.

9.1 INTRODUCTION

When you execute an SQL command on any RDBMS, the system determines the best way to
carry out your request, and the SQL engine figures out how to interpret the task. Various
components are involved in this process: the Query Dispatcher, the Optimization Engines,
the Classic Query Engine and the SQL Query Engine, among others. The classic query
engine handles all non-SQL queries, but the SQL query engine does not handle logical files.
Figure 9.1 shows the SQL architecture:

Figure 9.1: SQL Architecture
9.2 DEFINITION
Query Dispatcher- The function of the dispatcher is to route the query request to either
the Classic Query Engine (CQE) or the SQL Query Engine (SQE), depending on the attributes
of the query. All queries are processed by the dispatcher; it cannot be bypassed.

Optimization Engines- The query optimizer determines the most efficient way to execute
a SQL statement after considering many factors related to the objects referenced and the
conditions specified in the query.

Classic Query Engine- The classic query engine handles all non-SQL queries, whereas the
SQL query engine does not handle logical files.

Among the SQL commands, CREATE creates a new table, a view of a table, or another
object in the database, and ALTER modifies an existing database object, such as a table.

SQL Query Engine- SQL engine is defined as software that recognizes and interprets SQL
commands to access a relational database and interrogate data. SQL engine is also
commonly referred to as a SQL database engine or a SQL query engine.

9.3 BASIC QUERIES IN SQL

The SQL language is very rich in functionality and it is constantly expanding. It is
comprised of a series of declarative statements. One may categorize the most important
statements within the language as shown here in figure 9.2:
Data definition language (DDL) – These are statements which create database objects, such
as tables, within the database. DDL statements can also alter the definition of objects
already stored within the database or drop the objects altogether.
Data manipulation language (DML) – These are statements which query or modify data
within the database.
Transaction Control Language – These are related to DML action. They define the context,
otherwise known as the transaction, in which DML statements execute. They thus control
the manner in which DML statements modify or update information stored within the
database.

Figure 9.2: SQL Commands types

9.4 DATA DEFINITION IN SQL

DDL (Data Definition Language): DDL consists of the SQL commands that can be used to
define the database schema. It deals with descriptions of the database schema and is used
to create and modify the structure of database objects in the database.

9.4.1 CREATE TABLE

The CREATE TABLE command is used to specify a new relation by giving it a name
and specifying its attributes and initial constraints. The attributes are specified first,
and each attribute is given a name, a data type to specify its domain of values, and
any attribute constraints, such as NOT NULL. The key, entity integrity, and referential
integrity constraints can be specified within the CREATE TABLE statement after
the attributes are declared. The syntax of the statement is as follows:
CREATE TABLE tablename ( attribute_name1 datatype, attribute_name2 datatype );

Example
CREATE TABLE EMPLOYEE
( Fname VARCHAR(15) NOT NULL,
  Minit CHAR,
  Lname VARCHAR(15) NOT NULL,
  Adhar_No CHAR(9) NOT NULL,
  Bdate DATE,
  Address VARCHAR(30),
  Sex CHAR,
  Salary DECIMAL(10,2),
  Super_Adhar_No CHAR(9),
  Dno INT NOT NULL,
  PRIMARY KEY (Adhar_No),
  FOREIGN KEY (Super_Adhar_No) REFERENCES EMPLOYEE(Adhar_No),
  FOREIGN KEY (Dno) REFERENCES DEPARTMENT(Dnumber) );
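A cut-down version of the EMPLOYEE definition can be created and then inspected to see what the DBMS recorded. The sketch assumes Python's built-in sqlite3 module; the DEPARTMENT table is a hypothetical minimal definition added only so that the foreign key has something to reference, the column names are written without a trailing period, and PRAGMA table_info is SQLite's own way of listing columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE DEPARTMENT (Dnumber INT PRIMARY KEY, Dname VARCHAR(15));
    CREATE TABLE EMPLOYEE (
        Fname          VARCHAR(15) NOT NULL,
        Adhar_No       CHAR(9)     NOT NULL,
        Salary         DECIMAL(10,2),
        Super_Adhar_No CHAR(9),
        Dno            INT         NOT NULL,
        PRIMARY KEY (Adhar_No),
        FOREIGN KEY (Super_Adhar_No) REFERENCES EMPLOYEE(Adhar_No),
        FOREIGN KEY (Dno) REFERENCES DEPARTMENT(Dnumber)
    );
""")

# PRAGMA table_info returns one row per column:
# (cid, name, type, notnull, dflt_value, pk)
columns = conn.execute("PRAGMA table_info(EMPLOYEE)").fetchall()
names = [c[1] for c in columns]
not_null = [c[1] for c in columns if c[3] == 1]
primary_key = [c[1] for c in columns if c[5] > 0]

print(names, not_null, primary_key)
```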

9.4.2 ALTER TABLE

The definition of a base table or of other named schema elements can be changed by using
the ALTER command. For base tables, the possible alter table actions include adding or
dropping a column (attribute), changing a column definition, and adding or dropping table
constraints. For example, to add an attribute for keeping track of jobs of employees to the
EMPLOYEE base relation in the COMPANY schema, we can use the command
ALTER TABLE COMPANY.EMPLOYEE ADD COLUMN Job VARCHAR(12);
We must still enter a value for the new attribute Job for each individual EMPLOYEE tuple.
This can be done either by specifying a default clause or by using the UPDATE command
individually on each tuple. If no default clause is specified, the new attribute will have
NULLs in all the tuples of the relation immediately after the command is executed; hence,
the NOT NULL constraint is not allowed in this case. To drop a column, we must choose
either CASCADE or RESTRICT for drop behavior. If CASCADE is chosen, all constraints
and views that reference the column are dropped automatically from the schema, along with
the column. If RESTRICT is chosen, the command is successful only if no views or
constraints (or other schema elements) reference the column.
ALTER TABLE COMPANY.EMPLOYEE DROP COLUMN Address CASCADE;
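The effect of adding a column, with and without a default clause, can be observed in a small sketch. It assumes Python's built-in sqlite3 module; the EMPLOYEE rows and the extra Grade column are hypothetical. The CASCADE and RESTRICT options of DROP COLUMN are not shown, since SQLite does not provide them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE EMPLOYEE (Fname VARCHAR(15), Dno INT);
    INSERT INTO EMPLOYEE VALUES ('Anil', 5), ('Meena', 4);
    ALTER TABLE EMPLOYEE ADD COLUMN Job VARCHAR(12);
""")

# Without a default clause, the new attribute is NULL in every existing tuple.
after_alter = conn.execute(
    "SELECT Fname, Job FROM EMPLOYEE ORDER BY Fname").fetchall()

# A value is then entered for an individual tuple with UPDATE.
conn.execute("UPDATE EMPLOYEE SET Job = 'Engineer' WHERE Fname = 'Anil'")
after_update = conn.execute(
    "SELECT Fname, Job FROM EMPLOYEE ORDER BY Fname").fetchall()

# With a default clause, existing tuples receive the default value instead.
conn.execute("ALTER TABLE EMPLOYEE ADD COLUMN Grade CHAR(1) DEFAULT 'C'")
grades = [g for (g,) in conn.execute("SELECT Grade FROM EMPLOYEE")]

print(after_alter, after_update, grades)
```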

9.4.3 DROP TABLE


The DROP command can be used to drop named schema elements, such as tables, domains,
or constraints. One can also drop a schema. For example, if a whole schema is no longer
needed, the DROP SCHEMA command can be used. There are two drop behavior options:
CASCADE and RESTRICT. For example, to remove the COMPANY database schema
and all its tables, domains, and other elements, the CASCADE option is used as follows:
DROP SCHEMA COMPANY CASCADE;
If the RESTRICT option is chosen in place of CASCADE, the schema is dropped only if it
has no elements in it; otherwise, the DROP command will not be executed. To use the
RESTRICT option, the user must first individually drop each element in the schema, then
drop the schema itself. If a base relation within a schema is no longer needed, the relation
and its definition can be deleted by using the DROP TABLE command. If the RESTRICT
option is chosen instead of CASCADE, a table is dropped only if it is not referenced in any
constraints (for example, by foreign key definitions in another relation) or views or by any
other elements. With the CASCADE option, all such constraints, views, and other elements
that reference the table being dropped are also dropped automatically from the schema,
along with the table itself.

Notice that the DROP TABLE command not only deletes all the records in the table if
successful, but also removes the table definition from the catalog. If it is desired to delete
only the records but to leave the table definition for future use, then the DELETE command
should be used instead of DROP TABLE. The DROP command can also be used to drop
other types of named schema elements, such as constraints or domains.
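The difference between deleting the records and dropping the table can be checked against the catalog. The sketch assumes Python's built-in sqlite3 module, where the catalog is the sqlite_master table; the PROJECT rows and the helper names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE PROJECT (Pnumber INT PRIMARY KEY, Pname VARCHAR(15));
    INSERT INTO PROJECT VALUES (1, 'ProductX'), (2, 'ProductY');
""")

def scalar(sql, params=()):
    """Hypothetical helper: run a query and return its single value."""
    return conn.execute(sql, params).fetchall()[0][0]

def table_exists(name):
    """True if the catalog still holds a definition for the table."""
    return scalar(
        "SELECT COUNT(*) FROM sqlite_master WHERE type = 'table' AND name = ?",
        (name,)) == 1

# DELETE removes the records but keeps the table definition.
conn.execute("DELETE FROM PROJECT")
conn.commit()
rows_after_delete = scalar("SELECT COUNT(*) FROM PROJECT")
exists_after_delete = table_exists("PROJECT")

# DROP TABLE removes the definition from the catalog as well.
conn.execute("DROP TABLE PROJECT")
exists_after_drop = table_exists("PROJECT")

print(rows_after_delete, exists_after_delete, exists_after_drop)
```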

9.4.4 RENAMING A TABLE


With the RENAME statement you can rename a table. Some relational database
management systems (RDBMS) do not support this command, because it is not a
standardized statement. The syntax of the RENAME command is as follows:
RENAME TABLE {OLDTABLE} TO {NEWTABLE}
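Because renaming is not standardized, the exact statement depends on the DBMS. As one illustration, SQLite does not accept RENAME TABLE but offers the form ALTER TABLE ... RENAME TO. The sketch below assumes Python's built-in sqlite3 module and uses the placeholder table names from the syntax above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE OLDTABLE (Id INT)")

# SQLite's way of renaming a table.
conn.execute("ALTER TABLE OLDTABLE RENAME TO NEWTABLE")

# The catalog now lists the table only under its new name.
tables = [name for (name,) in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]
print(tables)
```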

9.4.5 TRUNCATE A TABLE


The SQL TRUNCATE TABLE command is used to delete all the data from an existing
table. You can also use the DROP TABLE command to delete a complete table, but it would
remove the complete table structure from the database, and you would need to re-create
the table if you wished to store some data in it again.
The basic syntax of TRUNCATE TABLE is as follows:
TRUNCATE TABLE {tablename};
Difference between DELETE and TRUNCATE Statements:
DELETE statement: This command deletes only the rows from the table that satisfy the
condition given in the WHERE clause, or deletes all the rows from the table if no condition
is specified. It does not free the space occupied by the table.
TRUNCATE statement: This command deletes all the rows from the table
and frees the space occupied by the table.

9.5 DATA MANIPULATION COMMANDS


9.5.1 SELECT Command
SQL has one basic statement for retrieving information from a database: the SELECT
statement. The SELECT statement is not the same as the SELECT operation of relational
algebra.
 SELECT-FROM-WHERE
The basic form of the SELECT statement, sometimes called a mapping or a
select-from-where block, is formed of the three clauses SELECT, FROM, and WHERE and
has the following form:
SELECT <attribute list>
FROM <table list>
WHERE <condition>;
where
■ <attribute list> is a list of attribute names whose values are to be retrieved
by the query.
■ <table list> is a list of the relation names required to process the query.
■ <condition> is a conditional (Boolean) expression that identifies the tuples
to be retrieved by the query.
The SELECT clause of SQL specifies the attributes whose values are to be
retrieved, which are called the projection attributes, and the WHERE clause
specifies the Boolean condition that must be true for any retrieved tuple, which is known
as the selection condition.
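 Unspecified WHERE Clause and Use of the Asterisk
A missing WHERE clause indicates no condition on tuple selection; hence, all
tuples of the relation specified in the FROM clause qualify and are selected for the
query result. If more than one relation is specified in the FROM clause and there is
no WHERE clause, then the CROSS PRODUCT (all possible tuple combinations) of
these relations is selected. To retrieve all the attribute values of the selected tuples,
we do not have to list the attribute names explicitly; we can specify an asterisk (*),
which stands for all the attributes. For example, the following query retrieves all the
attribute values of every EMPLOYEE who works in department number 5:
SELECT *
FROM EMPLOYEE
WHERE Dno=5;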
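The effect of a WHERE clause, of leaving it out, and of listing two relations without a join condition can be compared on a small example. The sketch assumes Python's built-in sqlite3 module; the three EMPLOYEE rows and two DEPARTMENT rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE DEPARTMENT (Dnumber INT, Dname VARCHAR(15));
    CREATE TABLE EMPLOYEE (Fname VARCHAR(15), Dno INT);
    INSERT INTO DEPARTMENT VALUES (4, 'Administration'), (5, 'Research');
    INSERT INTO EMPLOYEE VALUES ('Anil', 5), ('Meena', 4), ('Sunita', 5);
""")

# Asterisk with a selection condition: all attributes of matching tuples.
dept5 = conn.execute(
    "SELECT * FROM EMPLOYEE WHERE Dno = 5 ORDER BY Fname").fetchall()

# No WHERE clause: every tuple of the relation qualifies.
everyone = conn.execute("SELECT Fname FROM EMPLOYEE").fetchall()

# Two relations and no WHERE clause: the cross product (3 x 2 combinations).
cross = conn.execute("SELECT Fname, Dname FROM EMPLOYEE, DEPARTMENT").fetchall()

# Adding a join condition keeps only the meaningful combinations.
joined = conn.execute(
    "SELECT Fname, Dname FROM EMPLOYEE, DEPARTMENT "
    "WHERE Dno = Dnumber ORDER BY Fname").fetchall()

print(dept5, len(everyone), len(cross), joined)
```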
 Substring Pattern Matching and Arithmetic Operators
In this section we discuss several more features of SQL. The first feature allows
comparison conditions on only parts of a character string, using the LIKE comparison
operator. This can be used for string pattern matching. Partial strings are specified
using two reserved characters: % replaces an arbitrary number of zero or more characters,
and the underscore (_) replaces a single character. For example, consider the following
query.
Query. Retrieve all employees whose address is in Sirsa, Haryana.
SELECT Fname, Lname
FROM EMPLOYEE
WHERE Address LIKE '%Sirsa, Haryana%';
Another example:
Find all employees who were born during the 1950s. Here the third character of the birth
date, which has the form YYYY-MM-DD, must be 5, and each underscore stands for one
arbitrary character.
SELECT Fname, Lname
FROM EMPLOYEE
WHERE Bdate LIKE '__5_______';
Another feature allows the use of arithmetic in queries. The standard arithmetic
operators for addition (+), subtraction (–), multiplication (*), and division (/) can
be applied to numeric values or attributes with numeric domains. For example,
suppose that we want to see the effect of giving all employees who work on the
‘ProductX’ project a 10 percent raise; we can use the following query:
SELECT E.Fname, E.Lname, 1.1 * E.Salary AS Increased_sal
FROM EMPLOYEE AS E, WORKS_ON AS W, PROJECT AS P
WHERE E.Adhar_No=W.Adhar_No AND W.Pno=P.Pnumber AND
P.Pname='ProductX';
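Pattern matching with LIKE and arithmetic in the SELECT clause can be tried on a few rows. The sketch assumes Python's built-in sqlite3 module, in which the birth dates are stored as YYYY-MM-DD strings; the employees, addresses and salaries are hypothetical, and the raise is applied to every employee rather than to one project, to keep the example to a single table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE EMPLOYEE (
        Fname VARCHAR(15), Bdate DATE, Address VARCHAR(30), Salary DECIMAL(10,2));
    INSERT INTO EMPLOYEE VALUES
        ('Anil',   '1955-01-09', '12 Model Town, Sirsa, Haryana', 30000),
        ('Meena',  '1968-07-19', '4 Civil Lines, Sirsa, Haryana', 40000),
        ('Sunita', '1952-12-08', '9 Main Road, Hisar, Haryana',   25000);
""")

# % matches any run of characters before and after the fixed text.
in_sirsa = [n for (n,) in conn.execute(
    "SELECT Fname FROM EMPLOYEE "
    "WHERE Address LIKE '%Sirsa, Haryana%' ORDER BY Fname")]

# Ten positions, one underscore per character; the third must be 5.
born_1950s = [n for (n,) in conn.execute(
    "SELECT Fname FROM EMPLOYEE "
    "WHERE Bdate LIKE '__5_______' ORDER BY Fname")]

# Arithmetic in the SELECT clause: a 10 percent raise (rounded for display).
raised = {name: round(sal, 2) for name, sal in conn.execute(
    "SELECT Fname, 1.1 * Salary AS Increased_sal FROM EMPLOYEE")}

print(in_sirsa, born_1950s, raised)
```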
 GROUP BY & HAVING CLAUSE
In many cases we want to apply the aggregate functions to subgroups of tuples in a relation,
where the subgroups are based on some attribute values. For example, we may want to find
the average salary of employees in each department or the number of employees who work
on each project. In these cases we need to partition the relation into nonoverlapping
subsets (or groups) of tuples. Each group (partition) will consist of the tuples that have the
same value of some attribute(s), called the grouping attribute(s). We can then apply the
function to each such group independently to produce summary information about each
group. SQL has a GROUP BY clause for this purpose. The GROUP BY clause specifies
the grouping attributes, which should also appear in the SELECT clause, so that the value
resulting from applying each aggregate function to a group of tuples appears along with the
value of the grouping attribute(s).
For each department, retrieve the department number, the number of employees in the
department, and their average salary.
SELECT Dno, COUNT (*), AVG (Salary)
FROM EMPLOYEE
GROUP BY Dno;
Sometimes we want to retrieve the values of these functions only for groups that satisfy
certain conditions. For example, suppose that we want to count the employees working on
each project, but list only the projects with more than two employees. SQL provides a HAVING
clause, which can appear in conjunction with a GROUP BY clause, for this purpose.
HAVING provides a condition on the summary information regarding the group of tuples
associated with each value of the grouping attributes. Only the groups that satisfy the
condition are retrieved in the result of the query. This is illustrated by the query below:
For each project on which more than two employees work, retrieve the project
number, the project name, and the number of employees who work on the project.
SELECT Pnumber, Pname, COUNT (*)
FROM PROJECT, WORKS_ON
WHERE Pnumber=Pno
GROUP BY Pnumber, Pname
HAVING COUNT (*) > 2;
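The effect of HAVING is easiest to see by running the same grouped query with and without it. The sketch below assumes SQLite (through Python's sqlite3 module) in place of Oracle, and the project assignments are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE PROJECT  (Pnumber INTEGER PRIMARY KEY, Pname TEXT);
CREATE TABLE WORKS_ON (Adhar_No TEXT, Pno INTEGER);
INSERT INTO PROJECT  VALUES (1, 'ProductX'), (2, 'ProductY');
INSERT INTO WORKS_ON VALUES ('111', 1), ('222', 1), ('333', 1), ('111', 2), ('444', 2);
""")

# One result row per project: the group is formed by (Pnumber, Pname).
per_project = conn.execute("""
    SELECT Pnumber, Pname, COUNT(*)
    FROM PROJECT, WORKS_ON
    WHERE Pnumber = Pno
    GROUP BY Pnumber, Pname
    ORDER BY Pnumber
""").fetchall()

# HAVING filters whole groups after the counts have been computed.
busy_projects = conn.execute("""
    SELECT Pnumber, Pname, COUNT(*)
    FROM PROJECT, WORKS_ON
    WHERE Pnumber = Pno
    GROUP BY Pnumber, Pname
    HAVING COUNT(*) > 2
""").fetchall()

print(per_project)
print(busy_projects)
```

Note that WHERE is applied to individual tuples before grouping, while HAVING is applied to the groups afterwards; ProductY has only two workers and is therefore dropped by the second query.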
A retrieval query in SQL can consist of up to six clauses, but only the first two, SELECT
and FROM, are mandatory. The query can span several lines, and is ended by a semicolon.
Query terms are separated by spaces, and parentheses can be used to group relevant parts
of a query in the standard way. The clauses are specified in the following order, with the
clauses between square brackets [ ... ] being optional:
SELECT <attribute and function list>
FROM <table list>
[ WHERE <condition> ]
[ GROUP BY <grouping attribute(s)> ]
[ HAVING <group condition> ]
[ ORDER BY <attribute list> ];
The SELECT clause lists the attributes or functions to be retrieved. The FROM clause
specifies all relations (tables) needed in the query, including joined relations, but not those
in nested queries. The WHERE clause specifies the conditions for selecting the tuples from
these relations, including join conditions if needed. GROUP BY specifies grouping
attributes, whereas HAVING specifies a condition on the groups being selected rather than
on the individual tuples. The built-in aggregate functions COUNT, SUM, MIN, MAX, and
AVG are used in conjunction with grouping, but they can also be applied to all the selected
tuples in a query without a GROUP BY clause. Finally, ORDER BY specifies an order for
displaying the result of a query.

9.5.2 INSERT Command


In its simplest form, INSERT is used to add a single tuple to a relation. We must specify
the relation name and a list of values for the tuple. The values should be listed in the same
order in which the corresponding attributes were specified in the CREATE TABLE
command. For example, to add a new tuple to the EMPLOYEE relation, we can use:
INSERT INTO EMPLOYEE
VALUES ('Radha', 'Rani', 'Bansal', '653298653', '1962-12-30',
'98 Kirti Nagar, Sirsa, Haryana', 'M', 37000, '653298653', 4);
A second form of the INSERT statement allows the user to specify explicit attribute names
that correspond to the values provided in the INSERT command. This is useful if a relation
has many attributes but only a few of those attributes are assigned values in the new tuple.
However, the values must include all attributes with NOT NULL specification and no
default value. Attributes with NULL allowed or DEFAULT values are the ones that can be
left out. For example, to enter a tuple for a new EMPLOYEE for whom we know only the
Fname, Lname, Dno, and Adhar_No attributes, we can use:
INSERT INTO EMPLOYEE (Fname, Lname, Dno, Adhar_No)
VALUES ('Radha', 'Bansal', 4, '653298653');
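The two forms of INSERT, and the rule about NOT NULL attributes, can be sketched as follows. This is an illustration under stated assumptions: SQLite via Python's sqlite3 module stands in for Oracle, and the EMPLOYEE table is a cut-down, hypothetical version with only five columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE EMPLOYEE (
        Fname    TEXT NOT NULL,
        Lname    TEXT NOT NULL,
        Adhar_No TEXT PRIMARY KEY,
        Salary   INTEGER,
        Dno      INTEGER NOT NULL DEFAULT 1
    )
""")

# Form 1: values listed in the column order of CREATE TABLE.
conn.execute("INSERT INTO EMPLOYEE VALUES ('Radha', 'Bansal', '653298653', 37000, 4)")

# Form 2: explicit column list; Salary becomes NULL, Dno takes its default.
conn.execute(
    "INSERT INTO EMPLOYEE (Fname, Lname, Adhar_No) VALUES ('Ram', 'Kumar', '123456789')"
)

# Leaving out a NOT NULL column that has no default (Fname) is rejected.
try:
    conn.execute("INSERT INTO EMPLOYEE (Lname, Adhar_No) VALUES ('Devi', '987654321')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

rows = conn.execute("SELECT Fname, Salary, Dno FROM EMPLOYEE ORDER BY Fname").fetchall()
print(rows, rejected)
```

In the result, the omitted Salary appears as None (NULL) and the omitted Dno appears as its default value 1.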

9.5.3 DELETE Command


The DELETE command removes tuples from a relation. It includes a WHERE clause,
similar to that used in an SQL query, to select the tuples to be deleted. Tuples are explicitly
deleted from only one table at a time. However, the deletion may propagate to tuples in
other relations if referential triggered actions are specified in the referential integrity
constraints of the DDL. Depending on the number of tuples selected by the condition in the
WHERE clause, zero, one, or several tuples can be deleted by a single DELETE command.
A missing WHERE clause specifies that all tuples in the relation are to be deleted; however,
the table remains in the database as an empty table. We must use the DROP TABLE
command to remove the table definition. The DELETE commands below, if applied
independently to the database, will delete zero, one, four, and all tuples, respectively, from
the EMPLOYEE relation:

DELETE FROM EMPLOYEE
WHERE Lname = 'Bansal';
DELETE FROM EMPLOYEE
WHERE Adhar_No = '123456789';
DELETE FROM EMPLOYEE
WHERE Dno = 5;
DELETE FROM EMPLOYEE;
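The "zero, one, four, and all" behaviour can be checked with a short sketch. It assumes SQLite through Python's sqlite3 module rather than Oracle, and a hypothetical five-row EMPLOYEE table in which nobody is named Bansal and four employees belong to department 5. Each statement is run against a freshly built table, mirroring "applied independently".

```python
import sqlite3

def fresh_db():
    """Build a small hypothetical EMPLOYEE table for one independent run."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE EMPLOYEE (Adhar_No TEXT PRIMARY KEY, Lname TEXT, Dno INTEGER)")
    conn.executemany(
        "INSERT INTO EMPLOYEE VALUES (?, ?, ?)",
        [("123456789", "Kumar", 5), ("333445555", "Singh", 5),
         ("666884444", "Gupta", 5), ("453453453", "Sharma", 5),
         ("987654321", "Verma", 4)],
    )
    return conn

statements = [
    "DELETE FROM EMPLOYEE WHERE Lname = 'Bansal'",
    "DELETE FROM EMPLOYEE WHERE Adhar_No = '123456789'",
    "DELETE FROM EMPLOYEE WHERE Dno = 5",
    "DELETE FROM EMPLOYEE",
]

deleted = []    # number of tuples removed by each statement
remaining = []  # tuples left afterwards; the table itself always survives
for sql in statements:
    conn = fresh_db()
    deleted.append(conn.execute(sql).rowcount)
    remaining.append(conn.execute("SELECT COUNT(*) FROM EMPLOYEE").fetchone()[0])
    conn.close()

print(deleted, remaining)
```

Even after the last statement the SELECT still succeeds, which shows that DELETE without WHERE empties the table but does not remove its definition.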

9.5.4 UPDATE Command


The UPDATE command is used to modify attribute values of one or more selected tuples.
As in the DELETE command, a WHERE clause in the UPDATE command selects the
tuples to be modified from a single relation. However, updating a primary key value may
propagate to the foreign key values of tuples in other relations if such a referential triggered
action is specified in the referential integrity constraints of the DDL. An additional SET
clause in the UPDATE command specifies the attributes to be modified and their new
values. For example, to change the location and controlling department number of project
number 10 to 'Bombay' and 5, respectively, we use:
UPDATE PROJECT
SET Plocation = 'Bombay', Dnum = 5
WHERE Pnumber = 10;
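The sketch below runs the UPDATE above and then illustrates the propagation of a primary key change. It assumes SQLite via Python's sqlite3 module instead of Oracle; the sample rows and the ON UPDATE CASCADE clause on WORKS_ON.Pno are assumptions added for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
CREATE TABLE PROJECT (
    Pnumber   INTEGER PRIMARY KEY,
    Pname     TEXT,
    Plocation TEXT,
    Dnum      INTEGER
);
CREATE TABLE WORKS_ON (
    Adhar_No TEXT,
    Pno      INTEGER REFERENCES PROJECT(Pnumber) ON UPDATE CASCADE
);
INSERT INTO PROJECT  VALUES (10, 'Computerization', 'Delhi', 4);
INSERT INTO WORKS_ON VALUES ('123456789', 10);
""")

# SET names the attributes to change; WHERE selects the tuples.
cur = conn.execute("UPDATE PROJECT SET Plocation = 'Bombay', Dnum = 5 WHERE Pnumber = 10")
updated_rows = cur.rowcount
project = conn.execute("SELECT Plocation, Dnum FROM PROJECT WHERE Pnumber = 10").fetchone()

# Changing the primary key propagates to WORKS_ON because of ON UPDATE CASCADE.
conn.execute("UPDATE PROJECT SET Pnumber = 11 WHERE Pnumber = 10")
works_on_pno = conn.execute("SELECT Pno FROM WORKS_ON").fetchone()[0]

print(updated_rows, project, works_on_pno)
```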

9.6 DATA CONTROL COMMANDS AND VIEWS


Transaction control statements manage changes made by DML statements. The transaction
control statements are:
COMMIT
ROLLBACK
SAVEPOINT
SET TRANSACTION
All transaction control statements, except certain forms of the COMMIT and ROLLBACK
commands, are supported in PL/SQL.
COMMIT
Use the COMMIT statement to end your current transaction and make permanent all
changes performed in the transaction. A transaction is a sequence of SQL statements
that Oracle Database treats as a single unit. This statement also erases all savepoints
in the transaction and releases transaction locks. Oracle Database issues an
implicit COMMIT before and after any data definition language (DDL) statement.
You can also use this statement to
 Commit an in-doubt distributed transaction manually
 Terminate a read-only transaction begun by a SET TRANSACTION statement
Committing an Insert: Example
This statement inserts a row into the hr.regions table and commits this
change:
INSERT INTO regions VALUES (5, 'Antarctica');
COMMIT WORK;
ROLLBACK
Use the ROLLBACK statement to undo work done in the current transaction or to manually
undo the work done by an in-doubt distributed transaction. To roll back your current
transaction, no privileges are necessary. To manually roll back an in-doubt distributed
transaction that you originally committed, you must have the FORCE TRANSACTION
system privilege. To manually roll back an in-doubt distributed transaction originally
committed by another user, you must have the FORCE ANY TRANSACTION system
privilege. The following statement rolls back your entire current transaction:
ROLLBACK;
TO SAVEPOINT clause
Specify the savepoint to which you want to roll back the current transaction. If you omit
this clause, then the ROLLBACK statement rolls back the entire transaction. Using
ROLLBACK without the TO SAVEPOINT clause performs the following operations:
 Ends the transaction
 Undoes all changes in the current transaction
 Erases all savepoints in the transaction
 Releases any transaction locks
Using ROLLBACK with the TO SAVEPOINT clause performs the following operations:
 Rolls back just the portion of the transaction after the savepoint
 Erases all savepoints created after that savepoint. The named savepoint is retained,
so you can roll back to the same savepoint multiple times. Prior savepoints are also
retained.
 Releases all table and row locks acquired since the savepoint. Other transactions
that have requested access to rows locked after the savepoint must continue to wait
until the transaction is committed or rolled back. Other transactions that have not
already requested the rows can request and access the rows immediately.
The following statement rolls back your current transaction to savepoint banda_sal:
ROLLBACK TO SAVEPOINT banda_sal;
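The difference between a full ROLLBACK and ROLLBACK TO SAVEPOINT can be sketched as below. The example is an approximation: it uses SQLite through Python's sqlite3 module, whose SAVEPOINT statements behave similarly for this purpose but are not Oracle's implementation, and the savepoint name after_first and the extra region names are hypothetical.

```python
import sqlite3

# isolation_level=None leaves BEGIN/COMMIT/ROLLBACK entirely to our statements.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE regions (region_id INTEGER PRIMARY KEY, region_name TEXT)")

def names():
    """Return the region names currently visible in this connection."""
    return [r[0] for r in conn.execute("SELECT region_name FROM regions ORDER BY region_id")]

conn.execute("BEGIN")
conn.execute("INSERT INTO regions VALUES (5, 'Antarctica')")
conn.execute("SAVEPOINT after_first")
conn.execute("INSERT INTO regions VALUES (6, 'Atlantis')")
conn.execute("ROLLBACK TO SAVEPOINT after_first")   # undoes only the second insert
after_partial = names()

conn.execute("INSERT INTO regions VALUES (7, 'Arctic')")
conn.execute("ROLLBACK TO SAVEPOINT after_first")   # the savepoint is retained and reusable
after_second_partial = names()

conn.execute("COMMIT")                               # the first insert becomes permanent
after_commit = names()

conn.execute("BEGIN")
conn.execute("DELETE FROM regions")
conn.execute("ROLLBACK")                             # undoes the whole transaction
after_full_rollback = names()

print(after_partial, after_second_partial, after_commit, after_full_rollback)
```

Work done before the savepoint survives both partial rollbacks, and the committed row is unaffected by the later full rollback.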

VIEWS
A view in SQL terminology is a single table that is derived from other tables. These other
tables can be base tables or previously defined views. A view does not necessarily exist in
physical form; it is considered to be a virtual table, in contrast to base tables, whose tuples
are always physically stored in the database. This limits the possible update operations that
can be applied to views, but it does not provide any limitations on querying a view. We can
think of a view as a way of specifying a table that we need to reference frequently, even
though it may not exist physically.
Specification of Views in SQL
In SQL, the command to specify a view is CREATE VIEW. The view is given a (virtual)
table name (or view name), a list of attribute names, and a query to specify the contents of
the view. If none of the view attributes results from applying functions or arithmetic
operations, we do not have to specify new attribute names for the view, since they would
be the same as the names of the attributes of the defining tables in the default case.
CREATE VIEW WORKS_ON1
AS SELECT Fname, Lname, Pname, Hours
FROM EMPLOYEE AS E, PROJECT AS P, WORKS_ON AS W
WHERE E.Adhar_No = W.Adhar_No AND W.Pno = P.Pnumber;
We can now specify SQL queries on a view—or virtual table—in the same way we specify
queries involving base tables. For example, to retrieve the last name and first name of all
employees who work on the ‘ProductX’ project, we can utilize the WORKS_ON1 view
and specify the query as in QV1:
QV1: SELECT Fname, Lname
FROM WORKS_ON1
WHERE Pname = 'ProductX';
A view is supposed to be always up-to-date; if we modify the tuples in the base tables on
which the view is defined, the view must automatically reflect these changes. Hence, the
view is not realized or materialized at the time of view definition but rather at the time when
we specify a query on the view. It is the responsibility of the DBMS and not the user to
make sure that the view is kept up-to-date. If we do not need a view any more,
we can use the DROP VIEW command to dispose of it. For example, to get rid of the view
WORKS_ON1, we can use the SQL statement in V1A:
V1A: DROP VIEW WORKS_ON1;
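The following sketch defines the WORKS_ON1 view, queries it, changes a base table, and queries it again to show that the view reflects the change. It assumes SQLite via Python's sqlite3 module in place of Oracle, the sample rows are hypothetical, and the final check against sqlite_master is SQLite-specific.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EMPLOYEE (Adhar_No TEXT PRIMARY KEY, Fname TEXT, Lname TEXT);
CREATE TABLE PROJECT  (Pnumber INTEGER PRIMARY KEY, Pname TEXT);
CREATE TABLE WORKS_ON (Adhar_No TEXT, Pno INTEGER, Hours REAL);
INSERT INTO EMPLOYEE VALUES ('111', 'Ram', 'Kumar'), ('222', 'Sita', 'Devi');
INSERT INTO PROJECT  VALUES (1, 'ProductX');
INSERT INTO WORKS_ON VALUES ('111', 1, 32.5);

CREATE VIEW WORKS_ON1 AS
    SELECT Fname, Lname, Pname, Hours
    FROM EMPLOYEE AS E, PROJECT AS P, WORKS_ON AS W
    WHERE E.Adhar_No = W.Adhar_No AND W.Pno = P.Pnumber;
""")

# The view is queried exactly like a base table.
qv1 = "SELECT Fname, Lname FROM WORKS_ON1 WHERE Pname = 'ProductX' ORDER BY Fname"
before = conn.execute(qv1).fetchall()

# A change to a base table is visible through the view immediately.
conn.execute("INSERT INTO WORKS_ON VALUES ('222', 1, 10.0)")
after = conn.execute(qv1).fetchall()

# DROP VIEW removes the view definition only; the base tables are untouched.
conn.execute("DROP VIEW WORKS_ON1")
view_exists = conn.execute(
    "SELECT COUNT(*) FROM sqlite_master WHERE type = 'view' AND name = 'WORKS_ON1'"
).fetchone()[0] > 0
base_rows = conn.execute("SELECT COUNT(*) FROM WORKS_ON").fetchone()[0]

print(before, after, view_exists, base_rows)
```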

9.7 CHECK YOUR PROGRESS


1. The SQL keyword __________ is used to specify the columns to be obtained.
2. The SQL keyword __________ is used to specify the table(s) that contain the data
to be retrieved.
3. To remove duplicate rows from the result of a query, specify the SQL qualifier
__________.
4. To obtain all columns, use a(n) __________ instead of listing all the column names.
5. The SQL _________ clause contains the condition that specifies which rows are to
be selected.

9.8 SUMMARY
Query languages are used to make queries in a database, and Structured Query
Language (SQL) is the standard for relational databases. Several database products
implement their own dialects of SQL, including MySQL, Oracle SQL and NuoDB. Query languages
for other types of databases, such as NoSQL databases and graph databases, include
Cassandra Query Language (CQL), Neo4j's Cypher, Data Mining Extensions (DMX) and
XQuery. The original version of SQL was implemented in the experimental DBMS called
SYSTEM R, which was developed at IBM Research. SQL is designed to be a
comprehensive language that includes statements for data definition, queries, updates,
constraint specification, and view definition. We discussed the following features of SQL
in this chapter: simple retrieval queries, database update commands, transaction control
statements, and views.

9.9 KEYWORDS
 SQL Commands: DDL, DML and DCL commands

 DDL Commands: CREATE, ALTER, DROP, RENAME, TRUNCATE

 DML Commands: INSERT, DELETE, UPDATE

 Transaction control commands (listed under data control in this lesson): COMMIT, ROLLBACK, SAVEPOINT

9.10 SELF-ASSESSMENT TEST


1. Write the following commands, each with an example:
i) SELECT
ii) DELETE
iii) UPDATE
iv) ALTER
2. Retrieve the birth date and address of the employee(s) whose name is ‘Ram Kumar’.
3. Retrieve the name and address of all employees who work for the ‘Research’
Department.
4. Select all EMPLOYEE Adhar_No values, and all combinations of EMPLOYEE
Adhar_No and DEPARTMENT Dname, in the database.
5. For every project located in ‘Delhi’, list the project number, the controlling
department number, and the department manager’s last name, address, and birth
date.

9.11 ANSWERS TO CHECK YOUR PROGRESS


1. SELECT
2. FROM
3. DISTINCT
4. Asterisk(*)
5. WHERE

9.12 REFERENCES / SUGGESTED READINGS

 C.J Date, “An Introduction to Database Systems”, 8th edition, Addison Wesley N.
Delhi.
 Ivan Bayross, “SQL, PL/SQL-The Programming Language of ORACLE”, BPB
Publication 3rd edition.
 Elmasri and Navathe, “Fundamentals of Database Systems”, 5th edition, Pearson
Education.
 https://www.tutorialspoint.com/sql/sql-overview.htm
 https://study.com/academy/lesson/what-is-query-in-sql.html
 https://searchsqlserver.techtarget.com/definition/query


LESSON NO. 10 VETTER:

PROCEDURAL LANGUAGE FOR SQL

STRUCTURE

10.0 Learning Objective

10.1 Introduction

10.2 Definition

10.3 Architecture of PL/SQL

10.4 Features and Advantages of PL/SQL

10.5 Blocks in PL/SQL

10.5.1 Anonymous Blocks

10.5.2 Named Blocks

10.6 Check Your Progress

10.7 Summary

10.8 Keywords

10.9 Self-Assessment Test

10.10 Answers to check your progress

10.11 References / Suggested Readings

10.0 LEARNING OBJECTIVE
 The objective of this chapter is to make the reader understand the procedural
language for SQL. The architecture of PL/SQL will be studied in detail, and the
reader will become familiar with the loops that can be used in the procedural
language for SQL.

10.1 INTRODUCTION

PL/SQL stands for Procedural Language/SQL. PL/SQL extends SQL by adding constructs
found in procedural languages, resulting in a structured language that is more powerful than
SQL. PL/SQL is not case sensitive. C-style comments (/* ... */) may be used in
PL/SQL programs wherever required.
All PL/SQL programs are made up of blocks; each block performs a logical action in the
program. A PL/SQL block consists of three parts:
1. Declaration section
2. Executable section
3. Exception handling section
Only the executable section is required. The other sections are optional.

A PL/SQL block has the following structure:


DECLARE
/* Declaration section */
BEGIN
/* Executable section */
EXCEPTION
/* Exception handling section */
END;

1. Declaration section:
This is the first section and starts with the keyword DECLARE. All identifiers (constants and
variables) are declared in this section before they are used in the executable section.
2. Executable section:
This section contains procedural and SQL statements. It is the only section of the block
that is required. This section starts with the keyword BEGIN.
 The only SQL statements allowed directly in a PL/SQL program are SELECT, INSERT,
UPDATE, DELETE and several other data manipulation statements.
 Data definition statements like CREATE, DROP or ALTER are not allowed directly;
they can be issued only through dynamic SQL.
 The executable section also contains constructs such as assignments, branches,
loops, procedure calls and triggers, which are all discussed in detail in subsequent
chapters.
3. Exception handling section:
This section is used to handle errors that occur during execution of a PL/SQL
program. This section starts with the keyword EXCEPTION.
The keyword END indicates the end of the PL/SQL block.
An Oracle PL/SQL program can be invoked either by typing it in SQL*Plus or by putting
the code in a file and invoking the file. To execute the block, type '/' at the SQL prompt, or
type '.' to end the input and then use the RUN command.

10.2 DEFINITION
Oracle PL/SQL is an extension of the SQL language, designed for seamless processing of SQL
statements while enhancing the security, portability, and robustness of the database. Important
aspects of the PL/SQL language include block structure, data types, packages, triggers, and
exception handling.

The PL/SQL programming language was developed by Oracle Corporation in the late
1980s as a procedural extension language for SQL and the Oracle relational database.
Following are certain notable facts about PL/SQL:

 PL/SQL is a completely portable, high-performance transaction-processing language.

 PL/SQL provides a built-in, interpreted and OS-independent programming environment.

 PL/SQL can be called directly from the command-line SQL*Plus interface.

 PL/SQL can also be called directly from external programming languages that issue calls to the database.

 PL/SQL's general syntax is based on that of the Ada and Pascal programming languages.

 Apart from Oracle, PL/SQL is available in the TimesTen in-memory database and IBM DB2.

10.3 ARCHITECTURE OF PL/SQL

The PL/SQL compilation and run-time system is a technology, not an independent product.
Think of this technology as an engine that compiles and executes PL/SQL blocks and
subprograms. The engine can be installed in an Oracle server or in an application
development tool such as Oracle Forms or Oracle Reports. So, PL/SQL can reside in two
environments:
1. The Oracle server
2. Oracle tools.
These two environments are independent. PL/SQL is bundled with the Oracle server but
might be unavailable in some tools. In either environment, the PL/SQL engine accepts as
input any valid PL/SQL block or subprogram. Fig. 10.1 shows the PL/SQL engine
processing an anonymous block. The engine executes procedural statements but sends SQL
statements to the SQL Statement Executor in the Oracle server.

Figure 10.1: PL/SQL Architecture


Application development tools that lack a local PL/SQL engine must rely on Oracle to
process PL/SQL blocks and subprograms. When it contains the PL/SQL engine, an Oracle
server can process PL/SQL blocks and subprograms as well as single SQL statements. The
Oracle server passes the blocks and subprograms to its local PL/SQL engine.

Anonymous Blocks

Anonymous PL/SQL blocks can be embedded in an Oracle Precompiler or OCI program.
At run time, the program, lacking a local PL/SQL engine, sends these blocks to the Oracle
server, where they are compiled and executed. Likewise, interactive tools such as
SQL*Plus and Enterprise Manager, lacking a local PL/SQL engine, must send anonymous
blocks to Oracle.

Stored Subprograms

Subprograms can be compiled separately and stored permanently in an Oracle database,
ready to be executed. A subprogram explicitly created (with a CREATE statement) using an
Oracle tool is called a stored subprogram. Once compiled and stored in the data dictionary,
it is a schema object, which can be referenced by any number of applications connected to
that database.

Stored subprograms defined within a package are called packaged subprograms. Those
defined independently are called standalone subprograms. Those defined within another
subprogram or within a PL/SQL block are called local subprograms, which cannot be
referenced by other applications and exist only for the convenience of the enclosing block.
Stored subprograms offer higher productivity, better performance, memory savings,
application integrity, and tighter security. For example, by designing applications around a
library of stored procedures and functions, you can avoid redundant coding and increase
your productivity. You can call stored subprograms from a database trigger, another stored
subprogram, an Oracle Precompiler application, an OCI application, or interactively from
SQL*Plus or Enterprise Manager. For example, you might call the standalone procedure
create_dept from SQL*Plus as follows:

SQL> CALL create_dept('FINANCE', 'NEW YORK');

Subprograms are stored in parsed, compiled form. So, when called, they are loaded and
passed to the PL/SQL engine immediately. Also, they take advantage of shared memory.
So, only one copy of a subprogram need be loaded into memory for execution by multiple
users.

10.4 FEATURES AND ADVANTAGES OF PL/SQL

Features:

PL/SQL has the following features:

 PL/SQL is tightly integrated with SQL.

 It offers extensive error checking.

 It offers numerous data types.

 It offers a variety of programming structures.

 It supports structured programming through functions and procedures.

 It supports object-oriented programming.

 It supports the development of web applications and server pages.

Advantages:

PL/SQL is a completely portable, high-performance transaction processing language that
offers the following advantages:

 Support for SQL
 Support for object-oriented programming
 Better performance
 Higher productivity
 Full portability
 Tight integration with SQL
 Tight security

Support for SQL

SQL has become the standard database language because it is flexible, powerful, and easy
to learn. A few English-like commands such as SELECT, INSERT, UPDATE, and
DELETE make it easy to manipulate the data stored in a relational database. SQL is
non-procedural, meaning that you can state what you want done without stating how to do it.
Oracle determines the best way to carry out your request. There is no necessary connection
between consecutive statements because Oracle executes SQL statements one at a time.

PL/SQL lets you use all the SQL data manipulation, cursor control, and transaction control
commands, as well as all the SQL functions, operators, and pseudocolumns. So, you can
manipulate Oracle data flexibly and safely. Also, PL/SQL fully supports SQL datatypes.
That reduces the need to convert data passed between your applications and the database.
PL/SQL also supports dynamic SQL, an advanced programming technique that makes your
applications more flexible and versatile. Your programs can build and process SQL data
definition, data control, and session control statements "on the fly" at run time.

Support for Object-Oriented Programming

Object types are an ideal object-oriented modeling tool, which you can use to reduce the
cost and time required to build complex applications. Besides allowing you to create
software components that are modular, maintainable, and reusable, object types allow
different teams of programmers to develop software components concurrently. By
encapsulating operations with data, object types let you move data-maintenance code out
of SQL scripts and PL/SQL blocks into methods. Also, object types hide implementation
details, so that you can change the details without affecting client programs. In addition,
object types allow for realistic data modeling. Complex real-world entities and
relationships map directly into object types. That helps your programs better reflect the
world they are trying to simulate.

Better Performance

Without PL/SQL, Oracle must process SQL statements one at a time. Each SQL statement
results in another call to Oracle and higher performance overhead. In a networked
environment, the overhead can become significant. Every time a SQL statement is issued,
it must be sent over the network, creating more traffic. However, with PL/SQL, an entire
block of statements can be sent to Oracle at one time. This can drastically reduce
communication between your application and Oracle. As Figure 10.2 shows, if your
application is database intensive, you can use PL/SQL blocks and subprograms to group
SQL statements before sending them to Oracle for execution.

PL/SQL stored procedures are compiled once and stored in executable form, so procedure
calls are quick and efficient. Also, stored procedures, which execute in the server, can be
invoked over slow network connections with a single call. That reduces network traffic and
improves round-trip response times. Executable code is automatically cached and shared
among users. That lowers memory requirements and invocation overhead.

PL/SQL also improves performance by adding procedural processing power to Oracle
tools. Using PL/SQL, a tool can do any computation quickly and efficiently without calling
on the Oracle server. This saves time and reduces network traffic.

Figure 10.2: PL/SQL Boosts Performance

Higher Productivity

PL/SQL adds functionality to non-procedural tools such as Oracle Forms and Oracle
Reports. With PL/SQL in these tools, you can use familiar procedural constructs to build
applications. For example, you can use an entire PL/SQL block in an Oracle Forms trigger.
You need not use multiple trigger steps, macros, or user exits. Thus, PL/SQL increases
productivity by putting better tools in your hands.

Also, PL/SQL is the same in all environments. As soon as you master PL/SQL with one
Oracle tool, you can transfer your knowledge to other tools, and so multiply the
productivity gains. For example, scripts written with one tool can be used by other tools.

Full Portability

Applications written in PL/SQL are portable to any operating system and platform on which
Oracle runs. In other words, PL/SQL programs can run anywhere Oracle can run; you need
not tailor them to each new environment. That means you can write portable program
libraries, which can be reused in different environments.

Tight Integration with SQL

The PL/SQL and SQL languages are tightly integrated. PL/SQL supports all the SQL
datatypes and the non-value NULL. That allows you to manipulate Oracle data easily and
efficiently. It also helps you to write high-performance code. The %TYPE and
%ROWTYPE attributes further integrate PL/SQL with SQL. For example, you can use the
%TYPE attribute to declare variables, basing the declarations on the definitions of database
columns. If a definition changes, the variable declaration changes accordingly the next time
you compile or run your program. The new definition takes effect without any effort on
your part. This provides data independence, reduces maintenance costs, and allows
programs to adapt as the database changes to meet new business needs.

Tight Security

PL/SQL stored procedures enable you to partition application logic between the client and
server. That way, you can prevent client applications from manipulating sensitive Oracle
data. Database triggers written in PL/SQL can disable application updates selectively and
do content-based auditing of user inserts. Furthermore, you can restrict access to Oracle
data by allowing users to manipulate it only through stored procedures that execute with
their definer’s privileges. For example, you can grant users access to a procedure that
updates a table, but not grant them access to the table itself.

10.5 BLOCKS IN PL/SQL

In PL/SQL, all statements are grouped into units called blocks. PL/SQL blocks
can include variables, SQL statements, loops, constants, conditional statements and
exception handling, as shown in Figure 10.3. Blocks can also form the body of a function,
a procedure or a package.

Broadly, PL/SQL blocks are of two types: anonymous blocks and named blocks.

10.5.1 Anonymous blocks: PL/SQL blocks that do not have a header are
known as anonymous blocks. These blocks do not form the body of a function, trigger
or procedure.
Example: The following anonymous block finds the greater of two numbers.

DECLARE
   -- declare variables a, b and c;
   -- all three are of datatype NUMBER
   a NUMBER;
   b NUMBER;
   c NUMBER;
BEGIN
   a := 10;
   b := 100;
   -- find the larger number
   -- and store it in variable c
   IF a > b THEN
      c := a;
   ELSE
      c := b;
   END IF;
   -- in SQL*Plus, run SET SERVEROUTPUT ON first to see the output
   dbms_output.put_line('Maximum number in 10 and 100: ' || c);
END;
/
-- Program End

Output:

Maximum number in 10 and 100: 100

Figure 10.3 Blocks in PL/SQL
10.5.2 Named blocks: PL/SQL blocks that have a header or label are known
as named blocks. These blocks can be subprograms such as functions and procedures,
or packages or triggers.
Example: The following code finds the greater of two numbers using a named block, here a
function.

DECLARE
   -- declare variables a, b and c;
   -- all three are of datatype NUMBER
   a NUMBER;
   b NUMBER;
   c NUMBER;
   -- function returning the larger of
   -- two given numbers
   FUNCTION findMax(x IN NUMBER, y IN NUMBER)
   RETURN NUMBER
   IS
      z NUMBER;
   BEGIN
      IF x > y THEN
         z := x;
      ELSE
         z := y;
      END IF;
      RETURN z;
   END;
BEGIN
   a := 10;
   b := 100;
   c := findMax(a, b);
   dbms_output.put_line('Maximum number in 10 and 100 is: ' || c);
END;
/
-- Program End

Output:
Maximum number in 10 and 100 is: 100

10.6 CHECK YOUR PROGRESS


1. __________ Section is used for declaration of variables.
2. SQL statements are written in ______________section.
3. PL/SQL is _________ type of language.
4. Raw types are used to store _______ type of data.
5. Which statement, if left out, may cause an infinite loop to occur in a simple loop?

10.7 SUMMARY
SQL is a data-oriented language. PL/SQL is an application-oriented language. SQL is used to
write queries and to create and execute DDL and DML statements. PL/SQL is used to write
program blocks, functions, procedures, triggers and packages. PL/SQL is a block-structured
language whose code is organized into blocks. A PL/SQL block consists of
three sections: declaration, executable, and exception-handling sections. In a block, the
executable section is mandatory while the declaration and exception-handling sections are
optional. A PL/SQL block may have a name (a named block) or no name (an anonymous block).

SQL                                      PL/SQL
SQL is a single query that is used to    PL/SQL is a block of code that is used to write
perform DML and DDL operations.          entire program blocks, procedures, functions,
                                         etc.

PL/SQL has these advantages:

 Tight Integration with SQL.


 High Performance.
 High Productivity.
 Portability.
 Scalability.
 Manageability.
 Support for Object-Oriented Programming.
 Support for Developing Web Applications.

10.8 KEYWORDS
 DML- A data manipulation language (DML) is a computer programming language
used for adding (inserting), deleting, and modifying (updating) data in a database.
A DML is often a sublanguage of a broader database language such as SQL, with
the DML comprising some of the operators in the language.
 MySQL- MySQL is a freely available open source Relational Database
Management System (RDBMS) that uses Structured Query Language (SQL). SQL
is the most popular language for adding, accessing and managing content in a
database. It is most noted for its quick processing, proven reliability, ease and
flexibility of use.
 ORACLE- Oracle Database is an RDBMS from Oracle Corporation. The
software is built around the relational database framework. It allows data objects
to be accessed by users using the SQL language. Oracle is a fully scalable
RDBMS architecture which is widely used all over the world.
 SQL*Plus - SQL*Plus is a command-line tool that provides access to the Oracle
RDBMS. SQL*Plus enables you to connect to an Oracle database, enter and
execute SQL commands and PL/SQL blocks, and format and print query results.

10.9 SELF-ASSESSMENT TEST


1. Create a program that outputs the message “I am soon to be a PL/SQL expert.”
2. Accept two numbers and print the largest number.
3. Accept a number and check whether it is odd or even.
4. Print 1st 10 terms of Fibonacci series.
5. Accept 10 numbers in a loop and print sum of accepted even numbers and odd
numbers separately.

10.10 ANSWERS TO CHECK YOUR PROGRESS


1. Declare
2. Begin
3. Block-Structured
4. Binary
5. EXIT

10.11 REFERENCES / SUGGESTED READINGS


 C.J Date, “An Introduction to Database Systems”, 8th edition, Addison Wesley N.
Delhi.
 Ivan Bayross, “SQL, PL/SQL-The Programming Language of ORACLE”, BPB
Publication 3rd edition.
 Elmasri and Navathe, “Fundamentals of Database Systems”, 5th edition, Pearson
Education.
 https://www.geeksforgeeks.org/blocks-in-pl-sql/
 https://www.tutorialspoint.com/plsql/index.htm
 https://www.tutorialspoint.com/plsql/plsql_tutorial.pdf
 https://www.mobt3ath.com/uplode/book/book-48086.pdf

SUBJECT: RELATIONAL DATABASE MANAGEMENT
SYSTEM
COURSE CODE: BCA-244 AUTHOR: DR. DEEPAK NANDAL

LESSON NO. 11 VETTER:

PL/SQL CHARACTER SET AND DATA TYPES

STRUCTURE

11.0 Learning Objective

11.1 Introduction

11.2 Definition

11.3 Understanding the Main Features

11.3.1 Block Structure

11.3.2 Variables and Constants

11.3.3 Attributes

11.4 Control Structure in PL/SQL

11.5 Datatypes

11.6 Check Your Progress

11.7 Summary

11.8 Keywords

11.9 Self-Assessment Test

11.10 Answers to check your progress

11.11 References / Suggested Readings

11.0 LEARNING OBJECTIVE
 To understand the procedural language for SQL (PL/SQL).
 To understand the main features of PL/SQL, such as block structure, variables,
datatypes, exception handling and control structures.

11.1 INTRODUCTION

A good way to get acquainted with PL/SQL is to look at a sample program. The program
below processes an order for a tennis racket. First, it declares a variable of type NUMBER
to store the quantity of tennis rackets on hand. Then, it retrieves the quantity on hand from
a database table named inventory. If the quantity is greater than zero, the program updates
the table and inserts a purchase record into another table named purchase_record.
Otherwise, the program inserts an out-of-stock record into the purchase_record table.

-- available online in file 'examp1'
DECLARE
   qty_on_hand NUMBER(5);
BEGIN
   SELECT quantity INTO qty_on_hand FROM inventory
      WHERE product = 'TENNIS RACKET'
      FOR UPDATE OF quantity;
   IF qty_on_hand > 0 THEN -- check quantity
      UPDATE inventory SET quantity = quantity - 1
         WHERE product = 'TENNIS RACKET';
      INSERT INTO purchase_record
         VALUES ('Tennis racket purchased', SYSDATE);
   ELSE
      INSERT INTO purchase_record
         VALUES ('Out of tennis rackets', SYSDATE);
   END IF;
   COMMIT;
END;

With PL/SQL, you can use SQL statements to manipulate Oracle data and flow-of-control
statements to process the data. Moreover, you can declare constants and variables, define
procedures and functions, and trap runtime errors. Thus, PL/SQL combines the data
manipulating power of SQL with the data processing power of procedural languages.

11.2 DEFINITION
PL/SQL is a block-structured language: programs are divided into, and written as,
logical blocks of code. Each block consists of three sub-parts:

1. Declarations - This section starts with the keyword DECLARE. It is an optional
section that defines all variables, cursors, subprograms, and other elements to be
used in the program.
2. Executable Commands - This section is enclosed between the keywords BEGIN
and END and is mandatory. It consists of the executable PL/SQL statements of the
program. It must have at least one executable line of code, which may be just a
NULL statement to indicate that nothing should be executed.
3. Exception Handling - This section starts with the keyword EXCEPTION. This
optional section contains the exception handler(s) that deal with errors in the
program.

Every PL/SQL statement ends with a semicolon (;). PL/SQL blocks can be nested within
other PL/SQL blocks using BEGIN and END. Following is the basic structure of a PL/SQL
block:

DECLARE
<declarations section>
BEGIN
<executable command(s)>
EXCEPTION
<exception handling>
END;
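
As a minimal sketch of this structure, the anonymous block below fills in all three sections. The variable name message and its text are illustrative assumptions, not part of the lesson's own examples. The trailing slash (/) is the SQL*Plus convention for running a block, and the output is visible only if server output has been enabled in the client (for example, with SET SERVEROUTPUT ON in SQL*Plus).

```plsql
DECLARE
   -- declarations section (optional)
   message VARCHAR2(40) := 'Hello from PL/SQL';
BEGIN
   -- executable section (mandatory)
   DBMS_OUTPUT.PUT_LINE(message);
EXCEPTION
   -- exception-handling section (optional)
   WHEN OTHERS THEN
      DBMS_OUTPUT.PUT_LINE('Unexpected error: ' || SQLERRM);
END;
/
```

The smallest legal block drops both optional sections and contains a single NULL statement: BEGIN NULL; END;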

11.3 UNDERSTANDING THE MAIN FEATURES

11.3.1 BLOCK STRUCTURE

PL/SQL is a block-structured language. That is, the basic units (procedures, functions, and
anonymous blocks) that make up a PL/SQL program are logical blocks, which can contain
any number of nested sub-blocks. Typically, each logical block corresponds to a problem
or subproblem to be solved. Thus, PL/SQL supports the divide-and-conquer approach to
problem solving called stepwise refinement. A block (or sub-block) lets you group logically
related declarations and statements. That way, you can place declarations close to where
they are used. The declarations are local to the block and cease to exist when the block
completes. A PL/SQL block has three parts: a declarative part, an executable part, and an
exception-handling part. (In PL/SQL, a warning or error condition is called an exception.)
Only the executable part is required. The order of the parts is logical. First comes the
declarative part, in which items can be declared. Once declared, items can be manipulated
in the executable part. Exceptions raised during execution can be dealt with in the exception-
handling part.

[DECLARE
-- declarations]
BEGIN
-- statements
[EXCEPTION
-- handlers]
END;
You can nest sub-blocks in the executable and exception-handling parts of a PL/SQL block
or subprogram but not in the declarative part. Also, you can define local subprograms in
the declarative part of any block. However, you can call local subprograms only from the
block in which they are defined.
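
The scoping rules above can be sketched as follows. The names outer_total, inner_total and doubled are hypothetical and chosen only for illustration; the block assumes server output is enabled in the client so that DBMS_OUTPUT text is displayed.

```plsql
DECLARE
   outer_total NUMBER := 10;

   -- local subprogram: callable only from within this block
   FUNCTION doubled (n NUMBER) RETURN NUMBER IS
   BEGIN
      RETURN n * 2;
   END doubled;
BEGIN
   DECLARE
      -- sub-block nested in the executable part of the outer block
      inner_total NUMBER := doubled(outer_total);
   BEGIN
      DBMS_OUTPUT.PUT_LINE('inner_total = ' || inner_total);  -- prints inner_total = 20
   END;
   -- inner_total has ceased to exist here; only outer_total is visible
   DBMS_OUTPUT.PUT_LINE('outer_total = ' || outer_total);     -- prints outer_total = 10
END;
/
```

Note that the local function is declared after the variable: in a declarative part, subprograms come after the other declarations.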

11.3.2 VARIABLES AND CONSTANTS


PL/SQL lets you declare constants and variables, then use them in SQL and procedural
statements anywhere an expression can be used. However, forward references are not
allowed. So, you must declare a constant or variable before referencing it in other
statements, including other declarative statements.

Declaring Variables

Variables can have any SQL datatype, such as CHAR, DATE, or NUMBER, or any
PL/SQL datatype, such as BOOLEAN or BINARY_INTEGER. For example, assume that
you want to declare a variable named part_no to hold 4-digit numbers and a variable named
in_stock to hold the Boolean value TRUE or FALSE. You declare these variables as
follows:

part_no NUMBER(4);
in_stock BOOLEAN;
You can also declare nested tables, variable-size arrays (varrays for short), and records
using the TABLE, VARRAY, and RECORD composite datatypes.

Declaring Constants

Declaring a constant is like declaring a variable except that you must add the keyword
CONSTANT and immediately assign a value to the constant. Thereafter, no more
assignments to the constant are allowed. In the following example, you declare a constant
named credit_limit:

credit_limit CONSTANT REAL := 5000.00;
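
To see these declarations working together, here is a small sketch that reuses part_no, in_stock and credit_limit from the text. The initial values and the extra variable half_limit are assumptions added for illustration, and the output requires server output to be enabled in the client.

```plsql
DECLARE
   part_no      NUMBER(4) := 1234;
   in_stock     BOOLEAN   := TRUE;
   credit_limit CONSTANT REAL := 5000.00;
   -- legal: credit_limit is declared before it is referenced
   half_limit   REAL := credit_limit / 2;
BEGIN
   IF in_stock THEN
      DBMS_OUTPUT.PUT_LINE('Part ' || part_no || ' is in stock');
   END IF;
   DBMS_OUTPUT.PUT_LINE('Half of the credit limit: ' || half_limit);
   -- credit_limit := 6000;  -- would not compile: a constant cannot be reassigned
END;
/
```

Moving the half_limit declaration above credit_limit would be a forward reference and the block would fail to compile.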

11.3.3 ATTRIBUTES

PL/SQL variables and cursors have attributes, which are properties that let you reference
the datatype and structure of an item without repeating its definition. Database columns
and tables have similar attributes, which you can use to ease maintenance. A percent sign
(%) serves as the attribute indicator.

%TYPE

The %TYPE attribute provides the datatype of a variable or database column. This is
particularly useful when declaring variables that will hold database values. For example,
assume there is a column named title in a table named books. To declare a variable named
my_title that has the same datatype as column title, use dot notation and the %TYPE
attribute, as follows:

my_title books.title%TYPE;

Declaring my_title with %TYPE has two advantages. First, you need not know the exact
datatype of title. Second, if you change the database definition of title (make it a longer
character string, for example), the datatype of my_title changes accordingly.
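
%TYPE can also be applied to a previously declared variable, which makes it possible to try the attribute without any table. The sketch below is one such case; the names balance and min_balance and their values are assumptions for illustration, and the output requires server output to be enabled in the client.

```plsql
DECLARE
   balance     NUMBER(7,2) := 150.75;
   -- min_balance takes its datatype, NUMBER(7,2), from balance
   min_balance balance%TYPE := 10.00;
BEGIN
   IF balance >= min_balance THEN
      DBMS_OUTPUT.PUT_LINE('Balance is above the minimum');
   END IF;
END;
/
```

If the declaration of balance is later widened, min_balance follows it without any further change to the code.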

%ROWTYPE

In PL/SQL, records are used to group data. A record consists of a number of related fields
in which data values can be stored. The %ROWTYPE attribute provides a record type that
represents a row in a table. The record can store an entire row of data selected from the
table or fetched from a cursor or cursor variable.

Columns in a row and corresponding fields in a record have the same names and datatypes.
In the example below, you declare a record named dept_rec. Its fields have the same names
and datatypes as the columns in the dept table.

DECLARE
   dept_rec dept%ROWTYPE; -- declare record variable

You use dot notation to reference fields, as the following example shows:

my_deptno := dept_rec.deptno;

If you declare a cursor that retrieves the last name, salary, hire date, and job title of an
employee, you can use %ROWTYPE to declare a record that stores the same information,
as follows:

DECLARE
   CURSOR c1 IS
      SELECT ename, sal, hiredate, job FROM emp;
   emp_rec c1%ROWTYPE; -- declare record variable that represents
                       -- a row fetched from the emp table
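
The following sketch shows a cursor-based record being fetched and read with dot notation. So that it can be run without the emp table, the cursor selects one illustrative row from Oracle's DUAL table; the values 'SMITH' and 800 are made-up sample data, not taken from the lesson. Server output must be enabled in the client to see the result.

```plsql
DECLARE
   CURSOR c1 IS
      SELECT 'SMITH' AS ename, 800 AS sal FROM dual;
   emp_rec c1%ROWTYPE;   -- fields ename and sal mirror the cursor's columns
BEGIN
   OPEN c1;
   FETCH c1 INTO emp_rec;   -- the whole row lands in the record
   CLOSE c1;
   DBMS_OUTPUT.PUT_LINE(emp_rec.ename || ' earns ' || emp_rec.sal);  -- prints SMITH earns 800
END;
/
```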

11.4 CONTROL STRUCTURE IN PL/SQL
Control structures are the most important PL/SQL extension to SQL. Not only does
PL/SQL let you manipulate Oracle data, it lets you process the data using conditional,
iterative, and sequential flow-of-control statements such as IF-THEN-ELSE, CASE, FOR-
LOOP, WHILE-LOOP, EXIT-WHEN, and GOTO. Collectively, these statements can
handle any situation.

Conditional Control

Often, it is necessary to take alternative actions depending on circumstances. The
IF-THEN-ELSE statement lets you execute a sequence of statements conditionally. The IF
clause checks a condition; the THEN clause defines what to do if the condition is true; the
ELSE clause defines what to do if the condition is false or null. Consider the program
below, which processes a bank transaction. Before allowing you to withdraw $500 from
account 3, it makes sure the account has sufficient funds to cover the withdrawal. If the
funds are available, the program debits the account. Otherwise, the program inserts a record
into an audit table.

-- available online in file 'examp2'
DECLARE
   acct_balance NUMBER(11,2);
   acct         CONSTANT NUMBER(4) := 3;
   debit_amt    CONSTANT NUMBER(5,2) := 500.00;
BEGIN
   SELECT bal INTO acct_balance FROM accounts
      WHERE account_id = acct
      FOR UPDATE OF bal;
   IF acct_balance >= debit_amt THEN
      UPDATE accounts SET bal = bal - debit_amt
         WHERE account_id = acct;
   ELSE
      INSERT INTO temp VALUES
         (acct, acct_balance, 'Insufficient funds');
         -- insert account, current balance, and message
   END IF;
   COMMIT;
END;
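
The point that the ELSE branch also runs when the condition is null is easy to overlook. The sketch below reuses acct_balance and debit_amt from the program above but deliberately leaves the balance unset; the messages are illustrative, and server output must be enabled in the client to see them.

```plsql
DECLARE
   acct_balance NUMBER(11,2);                     -- never assigned, so it is NULL
   debit_amt    CONSTANT NUMBER(5,2) := 500.00;
BEGIN
   -- NULL >= 500 evaluates to NULL, not TRUE, so control goes to ELSE
   IF acct_balance >= debit_amt THEN
      DBMS_OUTPUT.PUT_LINE('Withdrawal allowed');
   ELSE
      DBMS_OUTPUT.PUT_LINE('Withdrawal refused');  -- this line is printed
   END IF;
END;
/
```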

To choose among several values or courses of action, you can use CASE constructs. The
CASE expression evaluates a condition and returns a value for each case. The CASE
statement evaluates a condition and performs an action (which might be an entire PL/SQL
block) for each case.

-- This CASE statement performs different actions based
-- on a set of conditional tests.
CASE
   WHEN shape = 'square' THEN area := side * side;
   WHEN shape = 'circle' THEN
      BEGIN
         area := pi * (radius * radius);
         DBMS_OUTPUT.PUT_LINE('Value is not exact because pi is irrational.');
      END;
   WHEN shape = 'rectangle' THEN area := length * width;
   ELSE
      BEGIN
         DBMS_OUTPUT.PUT_LINE('No formula to calculate area of a ' || shape);
         RAISE PROGRAM_ERROR;
      END;
END CASE;
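
To contrast the two CASE constructs in a form that can be run as it stands, the sketch below uses a CASE expression to compute a value and a CASE statement to perform an action. The variable sides and the sample values are assumptions for illustration; server output must be enabled in the client.

```plsql
DECLARE
   shape VARCHAR2(10) := 'square';
   side  NUMBER := 4;
   area  NUMBER;
   sides PLS_INTEGER;
BEGIN
   -- CASE expression: yields a value, which is assigned to sides
   sides := CASE shape
               WHEN 'square'   THEN 4
               WHEN 'triangle' THEN 3
               ELSE 0
            END;

   -- CASE statement: performs an action for the first matching branch
   CASE
      WHEN shape = 'square' THEN area := side * side;
      ELSE area := NULL;
   END CASE;

   DBMS_OUTPUT.PUT_LINE(shape || ': ' || sides || ' sides, area ' || area);
   -- prints square: 4 sides, area 16
END;
/
```

The expression form ends with END, while the statement form ends with END CASE.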

A sequence of statements that uses query results to select alternative actions is common in
database applications. Another common sequence inserts or deletes a row only if an
associated entry is found in another table. You can bundle these common sequences into a
PL/SQL block using conditional logic.

Iterative Control

LOOP statements let you execute a sequence of statements multiple times. You place the
keyword LOOP before the first statement in the sequence and the keywords END LOOP
after the last statement in the sequence. The following example shows the simplest kind of
loop, which repeats a sequence of statements continually:

LOOP
   -- sequence of statements
END LOOP;
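
Because the basic loop repeats continually, it needs an explicit way out. A minimal sketch using the EXIT-WHEN statement follows; the variable counter and the limit of 3 are illustrative assumptions, and server output must be enabled in the client.

```plsql
DECLARE
   counter PLS_INTEGER := 0;
BEGIN
   LOOP
      counter := counter + 1;
      EXIT WHEN counter >= 3;   -- leave the loop once the condition is true
   END LOOP;
   DBMS_OUTPUT.PUT_LINE('Loop ran ' || counter || ' times');  -- prints Loop ran 3 times
END;
/
```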

The FOR-LOOP statement lets you specify a range of integers, then execute a sequence of
statements once for each integer in the range. For example, the following loop inserts 500
numbers and their square roots into a database table:

FOR num IN 1..500 LOOP
   INSERT INTO roots VALUES (num, SQRT(num));
END LOOP;
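
The same construct can be tried without the roots table. The sketch below sums the integers in a range; the variable total and the range 1..10 are assumptions for illustration, and server output must be enabled in the client.

```plsql
DECLARE
   total PLS_INTEGER := 0;
BEGIN
   -- num is declared implicitly by the loop and exists only inside it
   FOR num IN 1..10 LOOP
      total := total + num;
   END LOOP;
   DBMS_OUTPUT.PUT_LINE('Sum of 1..10 = ' || total);  -- prints Sum of 1..10 = 55
END;
/
```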

The WHILE-LOOP statement associates a condition with a sequence of statements. Before
each iteration of the loop, the condition is evaluated. If the condition is true, the sequence
of statements is executed, then control resumes at the top of the loop. If the condition is
false or null, the loop is bypassed and control passes to the next statement. In the following
example, you find the first employee who has a salary over $2500 and is higher in the chain
of command than employee 7499:

-- available online in file 'examp3'
DECLARE
   salary         emp.sal%TYPE := 0;
   mgr_num        emp.mgr%TYPE;
   last_name      emp.ename%TYPE;
   starting_empno emp.empno%TYPE := 7499;
BEGIN
   SELECT mgr INTO mgr_num FROM emp
      WHERE empno = starting_empno;
   WHILE salary <= 2500 LOOP
      SELECT sal, mgr, ename INTO salary, mgr_num, last_name
         FROM emp WHERE empno = mgr_num;
   END LOOP;
   INSERT INTO temp VALUES (NULL, salary, last_name);
   COMMIT;
EXCEPTION
   WHEN NO_DATA_FOUND THEN
      INSERT INTO temp VALUES (NULL, NULL, 'Not found');
      COMMIT;
END;
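
For a WHILE loop that does not depend on the emp table, consider the sketch below, which keeps doubling a number while it is at most 2500. The variables amount and steps are hypothetical names chosen for illustration; server output must be enabled in the client.

```plsql
DECLARE
   amount NUMBER      := 1;
   steps  PLS_INTEGER := 0;
BEGIN
   -- the condition is tested before every iteration
   WHILE amount <= 2500 LOOP
      amount := amount * 2;
      steps  := steps + 1;
   END LOOP;
   DBMS_OUTPUT.PUT_LINE(amount || ' reached after ' || steps || ' doublings');
   -- prints 4096 reached after 12 doublings
END;
/
```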

Sequential Control

The GOTO statement lets you branch to a label unconditionally. The label, an undeclared
identifier enclosed by double angle brackets, must precede an executable statement or a
PL/SQL block. When executed, the GOTO statement transfers control to the labeled
statement or block, as the following example shows:

IF rating > 90 THEN
   GOTO calc_raise; -- branch to label
END IF;
...
<<calc_raise>>
IF job_title = 'SALESMAN' THEN -- control resumes here
   amount := commission * 0.25;
ELSE
   amount := salary * 0.10;
END IF;
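
A complete, if contrived, sketch of the same branch is given below. The value assigned to rating and the two messages are assumptions for illustration; server output must be enabled in the client.

```plsql
DECLARE
   rating NUMBER := 95;
BEGIN
   IF rating > 90 THEN
      GOTO calc_raise;   -- skips the statement that follows END IF
   END IF;
   DBMS_OUTPUT.PUT_LINE('Rating needs a manual review first');
   <<calc_raise>>
   DBMS_OUTPUT.PUT_LINE('Calculating raise');   -- the label precedes an executable statement
END;
/
```

With rating set to 95 only the second message is printed; with a rating of 90 or less both are printed, because control simply falls through to the labeled statement.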

Packages

PL/SQL lets you bundle logically related types, variables, cursors, and subprograms into a
package. Each package is easy to understand and the interfaces between packages are
simple, clear, and well defined. This aids application development. Packages usually have
two parts: a specification and a body. The specification is the interface to your applications;
it declares the types, constants, variables, exceptions, cursors, and subprograms available
for use. The body defines cursors and subprograms and so implements the specification. In
the following example, you package two employment procedures:

CREATE PACKAGE emp_actions AS -- package specification
   PROCEDURE hire_employee (empno NUMBER, ename CHAR, ...);
   PROCEDURE fire_employee (emp_id NUMBER);
END emp_actions;

CREATE PACKAGE BODY emp_actions AS -- package body
   PROCEDURE hire_employee (empno NUMBER, ename CHAR, ...) IS
   BEGIN
      INSERT INTO emp VALUES (empno, ename, ...);
   END hire_employee;

   PROCEDURE fire_employee (emp_id NUMBER) IS
   BEGIN
      DELETE FROM emp WHERE empno = emp_id;
   END fire_employee;
END emp_actions;

Only the declarations in the package specification are visible and accessible to applications.
Implementation details in the package body are hidden and inaccessible. Packages can be
compiled and stored in an Oracle database, where their contents can be shared by many
applications. When you call a packaged subprogram for the first time, the whole package
is loaded into memory. So, subsequent calls to related subprograms in the package require
no disk I/O. Thus, packages can enhance productivity and improve performance.
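
The emp_actions example elides its parameter lists, so it cannot be compiled as printed. A complete sketch of the same specification-and-body pattern is given below. The package name pay_utils and the function bonus are hypothetical; the rates 0.10 and 0.15 are borrowed from the bonus formula used in the Error Handling discussion. Creating the package assumes the user has the privilege to create procedures, and the output requires server output to be enabled in the client.

```plsql
CREATE OR REPLACE PACKAGE pay_utils AS   -- specification: the public interface
   FUNCTION bonus (salary NUMBER, commission NUMBER) RETURN NUMBER;
END pay_utils;
/

CREATE OR REPLACE PACKAGE BODY pay_utils AS   -- body: hidden implementation
   salary_rate     CONSTANT NUMBER := 0.10;   -- private, not visible to callers
   commission_rate CONSTANT NUMBER := 0.15;

   FUNCTION bonus (salary NUMBER, commission NUMBER) RETURN NUMBER IS
   BEGIN
      RETURN (salary * salary_rate) + (commission * commission_rate);
   END bonus;
END pay_utils;
/

BEGIN
   -- packaged subprograms are called with dot notation
   DBMS_OUTPUT.PUT_LINE(pay_utils.bonus(1000, 200));   -- prints 130
END;
/
```

A caller can use pay_utils.bonus but cannot reference salary_rate or commission_rate, since these are declared only in the body.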

Error Handling

PL/SQL makes it easy to detect and process predefined and user-defined error conditions
called exceptions. When an error occurs, an exception is raised. That is, normal execution
stops and control transfers to the exception-handling part of your PL/SQL block or
subprogram. To handle raised exceptions, you write separate routines called exception
handlers. Predefined exceptions are raised implicitly by the runtime system. For example,
if you try to divide a number by zero, PL/SQL raises the predefined exception
ZERO_DIVIDE automatically. You must raise user-defined exceptions explicitly with the
RAISE statement.

You can define exceptions of your own in the declarative part of any PL/SQL block or
subprogram. In the executable part, you check for the condition that needs special attention.
If you find that the condition exists, you execute a RAISE statement. In the example below,
you compute the bonus earned by a salesperson. The bonus is based on salary and
commission. So, if the commission is null, you raise the exception comm_missing.

DECLARE
   ...
   comm_missing EXCEPTION; -- declare exception
BEGIN
   ...
   IF commission IS NULL THEN
      RAISE comm_missing; -- raise exception
   END IF;
   bonus := (salary * 0.10) + (commission * 0.15);
EXCEPTION
   WHEN comm_missing THEN ... -- process the exception
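
Filling in the elided parts gives a block that can be run as it stands. The sample salary, the deliberately unset commission and the messages are assumptions for illustration; server output must be enabled in the client.

```plsql
DECLARE
   salary       NUMBER := 2000;
   commission   NUMBER;            -- left NULL to trigger the exception
   bonus        NUMBER;
   comm_missing EXCEPTION;         -- user-defined exception
BEGIN
   IF commission IS NULL THEN
      RAISE comm_missing;          -- control jumps to the EXCEPTION part
   END IF;
   bonus := (salary * 0.10) + (commission * 0.15);
   DBMS_OUTPUT.PUT_LINE('Bonus: ' || bonus);
EXCEPTION
   WHEN comm_missing THEN
      DBMS_OUTPUT.PUT_LINE('Commission is missing; bonus not computed');
   WHEN ZERO_DIVIDE THEN           -- predefined exception, raised implicitly
      DBMS_OUTPUT.PUT_LINE('Attempted to divide by zero');
END;
/
```

As written, the block prints the comm_missing message and never reaches the bonus calculation.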

11.5 DATATYPES

This section discusses PL/SQL datatypes under three headings:

 Predefined Datatypes
 User-Defined Datatypes
 Datatype Conversion

Predefined Datatypes

The predefined datatypes fall into four families: scalar, composite, reference, and LOB.
A scalar type has no internal components. A composite type has internal components that
can be manipulated individually. A reference type holds values, called pointers, that
designate other program items. A LOB type holds values, called lob locators, that specify
the location of large objects (graphic images, for example) stored out-of-line. Figure 11.1
shows the predefined datatypes available for your use.

Figure 11.1: Pre-defined Datatypes

User-Defined Datatypes

Each PL/SQL base type specifies a set of values and a set of operations applicable to items
of that type. Subtypes specify the same set of operations as their base type but only a subset
of its values. Thus, a subtype does not introduce a new type; it merely places an optional
constraint on its base type. Subtypes can increase reliability, provide compatibility with
ANSI/ISO types, and improve readability by indicating the intended use of constants and
variables. PL/SQL predefines several subtypes in package STANDARD. For example,
PL/SQL predefines the subtypes CHARACTER and INTEGER as follows:

SUBTYPE CHARACTER IS CHAR;
SUBTYPE INTEGER IS NUMBER(38,0); -- allows only whole numbers
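
You can define subtypes of your own in the declarative part of a block. The sketch below is one possible illustration: the subtype small_count, its range and the variable items are hypothetical. It shows that a subtype accepts the operations of its base type but rejects values outside its subset. Server output must be enabled in the client.

```plsql
DECLARE
   SUBTYPE small_count IS PLS_INTEGER RANGE 0 .. 99;   -- subset of PLS_INTEGER values
   items small_count := 10;
BEGIN
   items := items + 5;                        -- same operations as the base type
   DBMS_OUTPUT.PUT_LINE('items = ' || items); -- prints items = 15
   items := 100;                              -- outside the range: raises VALUE_ERROR
EXCEPTION
   WHEN VALUE_ERROR THEN
      DBMS_OUTPUT.PUT_LINE('100 is not a valid small_count');
END;
/
```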

Datatype Conversion

Sometimes it is necessary to convert a value from one datatype to another. For example, if
you want to examine a rowid, you must convert it to a character string. PL/SQL supports
both explicit and implicit (automatic) datatype conversion.
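
The difference between the two kinds of conversion can be sketched as follows. The variables text_amount and total are illustrative assumptions; TO_NUMBER and TO_CHAR are built-in conversion functions. Server output must be enabled in the client.

```plsql
DECLARE
   text_amount VARCHAR2(10) := '250';
   total       NUMBER;
BEGIN
   -- implicit conversion: PL/SQL converts the string to a number by itself
   total := text_amount + 50;

   -- explicit conversion: the intent is stated with a conversion function
   total := TO_NUMBER(text_amount) + 50;

   DBMS_OUTPUT.PUT_LINE('Total: ' || TO_CHAR(total));                   -- prints Total: 300
   DBMS_OUTPUT.PUT_LINE('Today: ' || TO_CHAR(SYSDATE, 'YYYY-MM-DD'));   -- date to string
END;
/
```

Explicit conversion is usually the safer habit, because the result of an implicit conversion can depend on session settings such as the default date format.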

11.6 CHECK YOUR PROGRESS


1. True or false
Routines written in PL/SQL can be called in Oracle call interface, Java,
Pro*C/C++, COBOL etc.
2. True or false
PL/SQL does not have data types or variables.
3. What is wrong in the following assignment statement?
balance = balance + 2000;
4. Write a single statement that concatenates the words ‘Hello’ and ‘World’ and assigns
the result to a variable named greeting.
5. Which operator has the highest precedence among the following −
AND, NOT, OR?

11.7 SUMMARY
A PL/SQL program consists of a sequence of statements, each made up of one or
more lines of text. The precise characters available to you depend on the database
character set you are using. For example, Table 11.1 lists the characters available in
the US7ASCII character set.

Type         Characters
Letters      A-Z, a-z
Digits       0-9
Symbols      ~!@#$%&*()_-+=|[]{}:;"'<>,.?/^
Whitespace   Tab, space, newline, carriage return
Table 11.1: Characters available to PL/SQL in the US7ASCII character set

Every keyword in PL/SQL is made from various combinations of characters in this
character set. Now you just have to figure out how to put them all together! By default,
PL/SQL is a case-insensitive language. That is, uppercase letters are treated the same way
as lowercase letters, except when the characters are surrounded by single quotes, which
makes them a literal string. A number of these characters, both singly and in combination
with other characters, have a special significance in PL/SQL. For example, := is the
assignment operator, || is the concatenation operator, ; terminates a statement, and --
begins a single-line comment.

Control structures in PL/SQL follow the basic control structures used by procedural
programs: selection, iteration, and sequence. The selection structure tests a condition, then
executes one sequence of statements instead of another, depending on whether the
condition is true or false.

11.8 KEYWORDS
 TRIGGER- A trigger is a special type of stored procedure that automatically runs
when an event occurs in the database server. DML triggers run when a user tries to
modify data through a data manipulation language (DML) event. DML events are
INSERT, UPDATE, or DELETE statements on a table or view.
 CURSOR- Implicit cursors are created automatically when SQL statements such as
SELECT ... INTO are executed; they can fetch only a single row at a time. Explicit
cursors must be declared by the user with a name; they can fetch multiple rows.
 PL/SQL PACKAGE- A package is a database object that groups logically related
functions, cursors, stored procedures, and variables in one place.

11.9 SELF-ASSESSMENT TEST


1. Explain the purpose of the PL/SQL language.
2. Discuss in detail the striking features of PL/SQL.
3. What are the different datatypes in PL/SQL?
4. What do you mean by the control structure?
5. How do you declare a user-defined exception?

11.10 ANSWERS TO CHECK YOUR PROGRESS
1. True
2. False
3. Use of wrong assignment operator. The correct syntax is: balance := balance +
2000;
4. greeting := 'Hello' || 'World';
5. NOT

11.11 REFERENCES / SUGGESTED READINGS


 C.J Date, “An Introduction to Database Systems”, 8th edition, Addison Wesley N.
Delhi.
 Ivan Bayross, “SQL, PL/SQL-The Programming Language of ORACLE”, BPB
Publication 3rd edition.
 Elmasri and Navathe, “Fundamentals of Database Systems”, 5th edition, Pearson
Education.
 https://www.mobt3ath.com/uplode/book/book-48086.pdf
 https://www.tutorialspoint.com/plsql/plsql_tutorial.pdf
