BCA-PC(L)-242
RELATIONAL DATABASE MANAGEMENT
SYSTEM
SUBJECT: RELATIONAL DATABASE MANAGEMENT
SYSTEM
COURSE CODE: BCA-244 AUTHOR: DR. DEEPAK NANDAL
STRUCTURE
1.1 Introduction
1.6 Summary
1.7 Keywords
1.0 LEARNING OBJECTIVE
To understand the concepts of relational database.
To know the difference between DBMS and RDBMS.
To understand the parameters and components of RDBMS in detail.
To know the concepts and notations of the relational model.
1.1 INTRODUCTION
The relational model was first introduced by Ted Codd of IBM Research in 1970 in a classic
paper (Codd 1970), and attracted immediate attention due to its simplicity and
mathematical foundation. The model uses the concept of a mathematical relation, which
looks somewhat like a table of values, as its basic building block, and has its theoretical
basis in set theory and first-order predicate logic. In this chapter we discuss the basic
characteristics of the model and its constraints. The first commercial implementations of
the relational model became available in the early 1980s, such as the Oracle DBMS and the
SQL/DS system on the MVS operating system by IBM. Since then, the model has been
implemented in a large number of commercial systems. Current popular relational DBMSs
(RDBMSs) include DB2 and Informix Dynamic Server (from IBM), Oracle and Rdb (from
Oracle), and SQL Server and Access (from Microsoft). Most of the problems faced at the
time of implementing any system are the outcome of poor database design. In many cases
the system has to be continuously modified in multiple respects due to the changing
requirements of users, so it is very important that proper planning be done. A relation
in a relational database is based on a relational schema, which consists of a number of
attributes. A relational database is made up of a number of relations and a corresponding
relational database schema. The goal of relational database design is to generate a set of
relation schemas that allows us to store information without unnecessary redundancy and
also to retrieve information easily. One approach is to design schemas that are in an
appropriate normal form; the normal forms are used to ensure that various types of
anomalies and inconsistencies are not introduced into the database.
1.2 DEFINITION OF THE RELATIONAL MODEL
The relational model represents the database as a collection of relations. Informally, each
relation resembles a table of values or, to some extent, a "flat" file of records. When a
relation is thought of as a table of values, each row in the table represents a collection of
related data values. In the relational model, each row in the table represents a fact that
typically corresponds to a real-world entity or relationship. The table name and column
names are used to help in interpreting the meaning of the values in each row. For example,
the first table of Figure 1.1 is called STUDENT because each row represents facts about a
particular student entity. The column names (Name, Student-Number, Class, and Major)
specify how to interpret the data values in each row, based on the column each value is in.
All values in a column are of the same data type.
In the formal relational model terminology, a row is called a tuple, a column header
is called an attribute, and the table is called a relation. The data type describing the kinds
of values that can appear in each column is represented by a domain of possible values. We
now define these terms (domain, tuple, attribute, and relation) more precisely.
RDBMS stands for Relational Database Management System. RDBMS data is structured
in database tables, fields, and records. Each RDBMS table consists of rows, and each row
consists of one or more fields. An RDBMS stores the data in a collection of tables, which
may be related by common fields (table columns). An RDBMS also provides relational
operators to manipulate the data stored in the tables. Most RDBMSs use SQL as their
query language. The most popular RDBMSs are MS SQL Server, DB2, Oracle, and MySQL.
The relational model is an example of a record-based model. Record-based models are so
named because the database is structured in fixed-format records of several types. Each
table contains records of a particular type, and each record type defines a fixed number of
fields, or attributes.
Figure 1.1: A database storing information about students and courses
The columns of the table correspond to the attributes of the record types. The relational
data model is the most widely used data model, and a vast majority of current database
systems are based on it. The relational model was designed by the IBM research scientist
and mathematician Dr. E. F. Codd. Many modern DBMSs do not conform to Codd's
definition of an RDBMS, but they are nonetheless still considered to be RDBMSs.
Two of Dr. Codd's main focal points when designing the relational model were to further
reduce data redundancy and to improve data integrity within database systems.
The relational model originated from a paper authored by Dr. Codd entitled "A
Relational Model of Data for Large Shared Data Banks", written in 1970. This paper
included the following concepts that apply to database management systems for relational
databases. The relation is the only data structure used in the relational data model to
represent both entities and the relationships between them. Rows of the relation are referred
to as tuples of the relation, and its columns are its attributes. The values of each attribute
are drawn from a set of values known as a domain; the domain of an attribute contains the
set of values that the attribute may assume. From a historical perspective, the relational
data model is relatively new. The first database systems were based on either the network
or the hierarchical model. The relational data model has established itself as the primary
data model for commercial data processing applications. Its success in this domain has led
to its application outside data processing, in systems for computer-aided design and other
environments. A relational database management system (RDBMS) is a collection of
programs and capabilities that enable IT teams and others to create, update, administer and
otherwise interact with a relational database. RDBMSs store data in the form of tables, with
most commercial relational database management systems using Structured Query
Language (SQL) to access the database. However, since SQL was invented after the initial
development of the relational model, it is not necessary for RDBMS use.
The RDBMS is the most popular database system among organizations across the
world. It provides a dependable method of storing and retrieving large amounts of data
while offering a combination of system performance and ease of implementation.
A DBMS is less strict about the integrity of the database, whereas an RDBMS is stricter in
this regard, because the RDBMS defines integrity constraints for the purpose of upholding
the ACID properties.
In general, databases store sets of data that can be queried for use in other
applications. A database management system supports the development, administration and
use of database platforms. An RDBMS is a type of database management system (DBMS)
that stores data in a row-based table structure which connects related data elements. An
RDBMS includes functions that maintain the security, accuracy, integrity and consistency
of the data. This is different from the file storage used in a DBMS. Other differences
between database management systems and relational database management systems
include:
• Number of allowed users- While a DBMS can only accept one user at a time, an
RDBMS can operate with multiple users.
• Hardware and software requirements- A DBMS needs less software and hardware
than an RDBMS.
• Amount of data- An RDBMS can handle any amount of data, from small to large,
while a DBMS can only manage small amounts.
• Database structure- In a DBMS, data is kept in a hierarchical form, whereas an
RDBMS utilizes a table where the headers are used as column names and the rows
contain the corresponding values.
• ACID implementation- A DBMS does not use the atomicity, consistency, isolation,
and durability (ACID) model for storing data. On the other hand, an RDBMS bases
the structure of its data on the ACID model to ensure consistency.
• Distributed databases- While an RDBMS offers complete support for distributed
databases, a DBMS does not provide support.
• Types of programs managed- While an RDBMS helps manage the relationships
between its incorporated tables of data, a DBMS focuses on maintaining databases
that are present within the computer network and system hard disks.
• Support of database normalization- An RDBMS can be normalized, but a DBMS
cannot.
DBMS vs RDBMS using different parameters

Parameter: Storage
    DBMS: Stores data as a file.
    RDBMS: Stores data in the form of tables.

Parameter: Database structure
    DBMS: Stores data in either a navigational or hierarchical form.
    RDBMS: Uses tables, where the headers are the column names and the rows contain
    the corresponding values.

Parameter: Number of users
    DBMS: Supports a single user only.
    RDBMS: Supports multiple users.

Parameter: ACID
    DBMS: The data may not be stored following the ACID model.
    RDBMS: The data are consistent and well structured; they obey the ACID
    constraints at the schema level.

Parameter: Integrity constraints
    DBMS: Does not support integrity constraints; they are not imposed at the file level.
    RDBMS: Supports integrity constraints; values beyond a defined range cannot be
    stored in the particular column.

Parameter: Distributed databases
    DBMS: Does not support distributed databases.
    RDBMS: Offers support for distributed databases.

Parameter: Ideally suited for
    DBMS: Mainly deals with small quantities of data.
    RDBMS: Designed to handle large amounts of data.

Parameter: Dr. E. F. Codd rules
    DBMS: Satisfies fewer than seven of Dr. E. F. Codd's rules.
    RDBMS: Satisfies 8 to 10 of Dr. E. F. Codd's rules.

Parameter: Data fetching
    DBMS: Data fetching is slower for complex and large amounts of data.
    RDBMS: Data fetching is rapid because of its relational approach.

Parameter: Data redundancy
    DBMS: Data redundancy is common in this model.
    RDBMS: Keys and indexes do not allow data redundancy.

Parameter: Data relationship
    DBMS: No relationship between data.
    RDBMS: Data are related to each other at the table (object) level.
1.4 DOMAINS, ATTRIBUTES, TUPLES AND RELATIONS
A domain D is a set of atomic values. By atomic we mean that each value in the domain is
indivisible as far as the relational model is concerned. A common method of specifying a
domain is to specify a data type from which the data values forming the domain are drawn.
It is also useful to specify a name for the domain, to help in interpreting its values. Some
examples of domains follow:
• USA_phone_numbers: The set of ten-digit phone numbers valid in the United States.
The preceding are called logical definitions of domains. A data type or format is also
specified for each domain. For example, the data type for the domain USA_phone_
numbers can be declared as a character string of the form (ddd) ddd-dddd, where each d is
a numeric (decimal) digit and the first three digits form a valid telephone area code. The
data type for Employee_ages is an integer number between 15 and 80. For Academic_
department_names, the data type is the set of all character strings that represent valid
department names. A domain is thus given a name, data type, and format. Additional
information for interpreting the values of a domain can also be given; for example, a
numeric domain such as Person_weights should have the units of measurement, such as
pounds or kilograms.
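A logical domain such as Employee_ages can be enforced in SQL as a data type plus a CHECK constraint. The sketch below uses SQLite and a hypothetical EMPLOYEE table; the table and its columns are illustrative assumptions, not taken from the text:

```python
import sqlite3

# Minimal sketch: the Employee_ages domain (integers between 15 and 80)
# enforced as an INTEGER column with a CHECK constraint.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE EMPLOYEE (
        Name TEXT,
        Age  INTEGER CHECK (Age BETWEEN 15 AND 80)  -- the domain constraint
    )
""")

conn.execute("INSERT INTO EMPLOYEE VALUES ('Alice', 30)")     # value in the domain: accepted
try:
    conn.execute("INSERT INTO EMPLOYEE VALUES ('Bob', 200)")  # value outside the domain
    rejected = False
except sqlite3.IntegrityError:
    rejected = True                                           # the DBMS refuses the row
```

The data type gives the coarse domain (integers), while the CHECK constraint narrows it to the logical definition.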
A relation schema R, denoted by R(A1, A2, ..., An), is made up of a relation name
R and a list of attributes A1, A2, ..., An. Each attribute Ai is the name of a role played by
some domain D in the relation schema R. D is called the domain of Ai and is denoted by
dom(Ai). A relation schema is used to describe a relation; R is called the name of this
relation. The degree (or arity) of a relation is the number of attributes n of its relation
schema. An example of a relation schema for a relation of degree seven, which describes
university students, is the following:

STUDENT(Name, SSN, HomePhone, Address, OfficePhone, Age, GPA)

Using the data type of each attribute, the definition is sometimes written as:

STUDENT(Name: string, SSN: string, HomePhone: string, Address: string,
OfficePhone: string, Age: integer, GPA: real)

For this relation schema, STUDENT is the name of the relation, which has seven attributes.
In the definition above, we showed the assignment of generic types such as string or integer
to the attributes. More precisely, we can specify the following previously defined domains
for some of the attributes of the STUDENT relation: dom(Name) = Names; dom(SSN) =
Social_security_numbers; dom(HomePhone) = Local_phone_numbers; dom(OfficePhone)
= Local_phone_numbers; and dom(GPA) = Grade_point_averages. It is also possible to refer
to attributes of a relation schema by their position within the relation; thus, the second
attribute of the STUDENT relation is SSN, whereas the fourth attribute is Address.
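The STUDENT schema can be sketched as a SQL table definition, with each logical domain approximated by a built-in data type (SQLite syntax; the specific type choices are assumptions, since SQL types are coarser than logical domains):

```python
import sqlite3

# The degree-seven STUDENT relation schema rendered as a table definition.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE STUDENT (
        Name        TEXT,
        SSN         TEXT PRIMARY KEY,
        HomePhone   TEXT,
        Address     TEXT,
        OfficePhone TEXT,
        Age         INTEGER,
        GPA         REAL
    )
""")

# The degree (arity) is the number of attributes of the relation schema;
# attributes can also be referred to by position.
columns = [d[0] for d in conn.execute("SELECT * FROM STUDENT").description]
degree = len(columns)
print(degree)      # 7
print(columns[1])  # SSN -- the second attribute, as noted in the text
```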
Figure 1.2 shows an example of a STUDENT relation, which corresponds to the STUDENT
schema just specified. Each tuple in the relation represents a particular student entity. We
display the relation as a table, where each tuple is shown as a row and each attribute
corresponds to a column header indicating a role or interpretation of the values in that
column. Null values represent attributes whose values are unknown or do not exist for some
individual STUDENT tuple. The earlier definition of a relation can be restated more
formally as follows. A relation (or relation state) r(R) is a mathematical relation of degree
n on the domains dom(A1), dom(A2), ..., dom(An), which is a subset of the Cartesian
product of the domains that define R:

r(R) ⊆ dom(A1) × dom(A2) × ... × dom(An)

The Cartesian product specifies all possible combinations of values from the underlying
domains. Hence, if we denote the total number of values, or cardinality, in a domain D by
|D| (assuming that all domains are finite), the total number of tuples in the Cartesian
product is

|dom(A1)| × |dom(A2)| × ... × |dom(An)|
Of all these possible combinations, a relation state at a given time (the current relation
state) reflects only the valid tuples that represent a particular state of the real world. In
general, as the state of the real world changes, so does the relation, by being transformed
into another relation state. However, the schema R is relatively static and does not change
except very infrequently, for example, as a result of adding an attribute to represent new
information that was not originally stored in the relation. It is possible for several attributes
to have the same domain. The attributes indicate different roles, or interpretations, for the
domain. For example, in the STUDENT relation, the same domain Local_phone_numbers
plays the role of HomePhone, referring to the "home phone of a student," and the role of
OfficePhone, referring to the "office phone of the student."
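The cardinality formula can be checked directly on small finite domains. The sketch below enumerates a Cartesian product with Python's itertools; the tiny domains and their values are hypothetical:

```python
from itertools import product

# Two small, finite domains (illustrative values only).
dom_Class = {1, 2, 3, 4}
dom_Major = {"CS", "MATH"}

# The Cartesian product contains every possible combination of values.
all_tuples = set(product(dom_Class, dom_Major))
assert len(all_tuples) == len(dom_Class) * len(dom_Major)  # |dom(A1)| x |dom(A2)| = 8

# A relation state is a subset of the Cartesian product: only the tuples
# that are valid in the current state of the real world.
current_state = {(1, "CS"), (3, "MATH")}
assert current_state <= all_tuples
```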
mathematician, Dr. …………………..
5. The ………………… is the only data structure used in the relational data model to
represent both entities and relationships between them.
6. Do the normal forms remove anomalies?
7. Are the values of each attribute drawn from a set of values known as a domain?
1.6 SUMMARY
A DBMS is software used to store and manage data. The DBMS was introduced during
the 1960s. It also offers manipulation of the data, such as insertion, deletion, and updating.
A DBMS also performs functions like defining, creating, revising, and controlling the
database. It is specially designed to create and maintain data and to enable individual
business applications to extract the desired data.
An RDBMS reduces duplicate data. The table is the simplest form of data storage, and all
data stored by an RDBMS is held in tables. An RDBMS ensures that all data are stored in
the form of rows and columns; facilitates a primary key, which helps in the unique
identification of rows; supports index creation for retrieving data at a higher speed; and
facilitates a common column to be shared between two or more tables. The major
components of an RDBMS are the table, record (or tuple), field, domain, instance, schema,
and keys. A relational database stores data in tables. Tables are organized into columns,
and each column stores one type of data (integer, real number, character string, date). The
data for a single "instance" of a table is stored as a row. Many relational database systems
offer the SQL (Structured Query Language) for querying and maintaining the database.
1.7 KEYWORDS
Domain- A domain describes the set of possible values for a given attribute, and
can be considered a constraint on the value of the attribute. Mathematically,
attaching a domain to an attribute means that any value for the attribute must be an
element of the specified set. The character string "ABC", for instance, is not in the
integer domain, but the integer value 123 is.
Constraints- Constraints make it possible to further restrict the domain of an
attribute. For instance, a constraint can restrict a given integer attribute to values
between 1 and 10.
Tuple- A data set representing a single item.
Column- A labeled element of a tuple, e.g. "Address" or "Date of birth".
Table- A set of tuples sharing the same attributes; a set of columns and rows.
View- Any set of tuples; a data report from the RDBMS in response to a query.
4. What are record-based models?
5. How does an RDBMS store its data?
STRUCTURE
2.1 Introduction
2.2 Definition
2.5 Summary
2.6 Keywords
2.1 INTRODUCTION
The relational model was first introduced by Ted Codd of IBM Research in 1970 in a classic
paper (Codd 1970), and attracted immediate attention due to its simplicity and
mathematical foundation. The model uses the concept of a mathematical relation-which
looks somewhat like a table of values-as its basic building block, and has its theoretical
basis in set theory and first-order predicate logic. In this chapter we discuss the history of
Dr. Codd's research and the rules he stated for a database system to qualify as a relational
database. Dr. Edgar F. Codd, after his extensive research on the relational model of
database systems, came up with twelve rules of his own which, according to him, a database
must obey in order to be regarded as a true relational database. These rules can be applied
to any database system that manages stored data using only its relational capabilities. This
is a foundation rule, which acts as a base for all the other rules. A Database Management
System, or DBMS, essentially consists of a comprehensive set of application programs that
can be leveraged to access, manage, and update data, provided the data is interrelated and
persistent. Just like any management system, the goal of a DBMS is to provide an efficient
and convenient environment in which it is easy to retrieve and store information in the
database. It goes without saying that databases are used to store and manage large amounts
of information.
Data Modeling − It is all about defining the structures for information storage.
Provision of Mechanisms − To manipulate processed data and modify file and
system structures, it is important to provide query processing mechanisms.
Crash Recovery and Security − To avoid any discrepancies and ensure that the data
is secure, crash recovery and security mechanisms are a must.
Concurrency Control − If the system is shared by multiple users, concurrency
control is the need of the hour.
Based on the relational model, the relational database was created. Codd proposed 13 rules,
popularly known as Codd's 12 rules, to test a DBMS against his relational model. Codd's
rules define the qualities a DBMS requires in order to become a Relational Database
Management System (RDBMS). To date, there is hardly any commercial product that
follows all 13 of Codd's rules.
Terminology used:
are considered different from each other.
Table 1 and Table 2 represent relational model having two relations STUDENT and
STUDENT_COURSE.
2.2 DEFINITION
Dr. E. F. Codd, also known to the world as the 'Father of Database Management Systems',
propounded 12 rules which are in fact 13 in number; the rules are numbered from zero to
twelve. According to him, a DBMS is fully relational if it abides by all of these rules. Till
now, only a few databases abide by all of them. His twelve rules are fondly called
'E. F. Codd's Twelve Commandments'. His brilliant and seminal research paper 'A
Relational Model of Data for Large Shared Data Banks' is a treat to read in its entirety.
The relational model was first introduced by Ted Codd of IBM Research in 1970 in a classic
paper (Codd 1970), and attracted immediate attention due to its simplicity and
mathematical foundation. Codd also introduced relational algebra and laid the theoretical
foundations for the relational model in a series of papers (Codd 1971, 1972, 1972a, 1974);
he was later given the Turing Award, the highest honor of the ACM, for his work on the
relational model. In a later paper, Codd (1979) discussed extending the relational model to
incorporate more
meta-data and semantics about the relations; he also proposed a three-valued logic to deal
with uncertainty in relations and incorporated NULLs into the relational algebra. The
resulting model is known as RM/T. Childs (1968) had earlier used set theory to model
databases. Later, Codd (1990) published a book examining over 300 features of the
relational data model and database systems. E. F. Codd was a computer scientist who
invented the relational model for database management. Based on the relational model,
the relational database was created. Codd proposed 13 rules, popularly known as Codd's
12 rules, to test a DBMS against his relational model. Codd's rules define the qualities a
DBMS requires in order to become a Relational Database Management System (RDBMS).
To date, there is hardly any commercial product that follows all 13 of Codd's rules; even
Oracle follows only eight and a half (8.5) out of 13.
2.3.1 Codd's Rule 0- The Foundation Rule
This is the foundational rule. It states that any database system must have relational,
database, and management-system characteristics in order to be an RDBMS. That means
the database should be relational by having relations / mappings among the tables in the
database; the tables have to be related to one another by means of constraints / relations,
and there should not be any independent tables hanging in the database. An RDBMS is a
database, i.e., it stores the data in a well-organized form called tables, and it should be able
to handle a large amount of information too. In short, it should meet the objectives of a
database. An RDBMS is a management system, which means it should be able to manage
the data: relations, retrieval, update, delete, and permissions on the objects. It should be
able to handle all these administrative tasks without affecting the objectives of the
database, and it should perform all these tasks using query languages.
2.3.2 Codd’s Rule 1- Rule of Information
Relational databases should store the data in the form of relations; tables are the relations
in relational database management systems. Be it any user-defined data or meta-data, it is
important to store the value as an entity in the table cells. A database consists of a lot of
data: user data and the data about that data, or metadata. Each group of data must be
stored in a table in the form of rows and columns, and each cell in the table should hold one
of these data values. The order of rows and columns in the table should not affect the
meaning of the table. Each cell should hold a single value; there should not be any group or
range of values separated by commas, spaces, or hyphens (normalized data). This should be
the only way to store data in a database. This rule is satisfied by all databases.
For example: the order of storing personal details about 'James' and 'Antony' in a PERSON
table should not make any difference; there should be the flexibility to store them in any
order in a row. Similarly, storing a person's name first and then his address should be the
same as storing the address and then the name. It makes no difference to the meaning of
the table.
2.3.3 Codd's Rule 2- Rule of Guaranteed Access
The use of pointers to access data is strictly forbidden. Every data entity which is atomic
in nature should be accessed logically by using the right combination of the name of the
table, the primary key represented by a specific row value, and the column name
represented by the attribute name. That is, each unique piece of data (atomic value) should
be accessible as: Table Name + Primary Key (row) + Attribute (column).
This rule refers to the primary key. It states that any data / column / attribute in the table
should be logically accessible by using the table in which it is stored, the primary key
column of the table, and the column which we want to access. When the combination of
these three is used, it should give the correct result; no column or cell value should be
accessed without specifying the table and primary key. From figure 2.1:
Figure 2.1: Database of STUDENT
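The access path this rule prescribes can be sketched with a small table (the table layout and data values below are hypothetical): the combination of table name, primary-key value, and column name pins down exactly one atomic value.

```python
import sqlite3

# Sketch of Rule 2: every atomic value is reachable by
# table name + primary key value + column name.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE STUDENT (RollNo INTEGER PRIMARY KEY, Name TEXT, Class TEXT)")
conn.execute("INSERT INTO STUDENT VALUES (1, 'James', 'BCA')")
conn.execute("INSERT INTO STUDENT VALUES (2, 'Antony', 'BCA')")

# Table (STUDENT) + primary key (RollNo = 2) + column (Name):
name = conn.execute("SELECT Name FROM STUDENT WHERE RollNo = 2").fetchone()[0]
print(name)  # Antony
```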
2.3.4 Codd's Rule 3- Systematic Treatment of NULL Values
Null values are fully supported in relational databases. They should be uniformly treated
as 'missing information'. Null values are independent of any data type and should not be
mistaken for blanks, zeroes, or empty strings; they can also be interpreted as 'inapplicable
data' or 'unknown information'. This rule is about handling NULLs in the database. As a
database consists of various types of data, each cell will have a datatype. If any cell value
is unknown, not applicable, or missing, it cannot be represented as zero or empty; it is
always represented as NULL. This NULL behaves the same way irrespective of the datatype
of the cell, and when used in a logical or arithmetic operation it should propagate correctly.
For example, adding NULL to the numeric value 5 should result in NULL; it should not
result in zero or any numeric value. The DBMS should be strong enough to handle these
NULLs according to the situation and the datatypes. Null values (distinct from an empty
character string or a string of blank characters, and distinct from zero or any other number)
are supported in a fully relational DBMS for representing missing information and
inapplicable information in a systematic way, independent of the data type.
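The NULL behaviour described above can be observed directly in SQLite, which follows the usual SQL three-valued logic (a minimal sketch):

```python
import sqlite3

# Sketch of Rule 3: NULL propagates through arithmetic instead of
# silently becoming zero, and is distinct from '' and 0.
conn = sqlite3.connect(":memory:")
result = conn.execute("SELECT 5 + NULL").fetchone()[0]
print(result)  # None: adding NULL to 5 yields NULL, not a number

# Comparing NULL with zero does not give true or false -- it gives NULL:
comparison = conn.execute("SELECT NULL = 0").fetchone()[0]

# An empty string is NOT the same thing as NULL:
empty_is_null = conn.execute("SELECT '' IS NULL").fetchone()[0]
print(comparison, empty_is_null)  # None 0
```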
2.3.5 Codd’s Rule 4- Rule of Active and online relational Catalog
In the database management systems lexicon, 'metadata' is the data about the database,
or the data about the data. The active online catalog that stores the metadata is called the
'data dictionary'. The data dictionary is accessible only to authorized users who have the
required privileges, and the same query languages used for accessing the database should
be used for accessing the data dictionary. The database description is represented at the
logical level in the same way as ordinary data, so that authorized users can apply the same
relational language to its interrogation as they apply to the regular data. This rule
illustrates the data dictionary. Metadata should be maintained for all the data in the
database, and this metadata should also be stored as tables, rows, and columns, with access
privileges. In short, the metadata stored in the data dictionary should obey all the
characteristics of a database and should hold correct, up-to-date data. We should be able
to access this metadata using the same query language that we use to access the database.
Active online catalog based on the relational model: The system must support an
online, inline, relational catalog that is accessible to authorized users by means of their
regular query language. That is, users must be able to access the database's structure
(catalog) using the same query language that they use to access the database's data. The
structure description of the entire database must be stored in an online catalog, known as
data dictionary, which can be accessed by authorized users. Users can use the same query
language to access the catalog which they use to access the database itself.
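As a concrete sketch of this rule: in SQLite the catalog is the sqlite_master table, and it is queried with exactly the same SELECT syntax as ordinary data (SQLite is an illustrative choice; other systems expose catalogs such as INFORMATION_SCHEMA):

```python
import sqlite3

# Sketch of Rule 4: the catalog (metadata) is queried with the same
# language used for ordinary data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE STUDENT (RollNo INTEGER PRIMARY KEY, Name TEXT)")
conn.execute("INSERT INTO STUDENT VALUES (1, 'James')")

# An ordinary query against user data:
students = conn.execute("SELECT Name FROM STUDENT").fetchall()

# The same SELECT syntax against the data dictionary:
tables = [row[0] for row in
          conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")]
print(tables)  # ['STUDENT']
```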
2.3.6 Codd's Rule 5- Comprehensive Data Sub-language Rule
A single robust language should be able to define integrity constraints, views, data
manipulations, transactions, and authorizations. If the database allows access to data by
any means other than such a language, it violates this rule. A relational system may support
several languages and various modes of terminal use (for example, the fill-in-the-blanks
mode). However, there must be at least one language whose statements are expressible, per
some well-defined syntax, as character strings, and which comprehensively supports all of
the following: data definition, view definition, data manipulation (interactive and by
program), integrity constraints, and transaction boundaries (begin, commit, and rollback).
An RDBMS database should never be accessed directly; it should always be accessed
through such a query language. This query language should be able to access the data,
manipulate the data, and maintain the consistency and integrity of the database, and it
should ensure that a transaction is either fully completed or not performed at all.
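The "fully complete or not done at all" property can be sketched with SQL transaction boundaries; here the sqlite3 connection's context manager issues the commit or rollback, and the ACCOUNT table and its values are hypothetical:

```python
import sqlite3

# Sketch of Rule 5: one language (SQL) covers data definition, data
# manipulation, and transaction boundaries (begin/commit/rollback).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ACCOUNT (Id INTEGER PRIMARY KEY, Balance INTEGER)")  # definition
conn.execute("INSERT INTO ACCOUNT VALUES (1, 100)")                             # manipulation
conn.commit()

try:
    with conn:  # transaction: commits on success, rolls back on error
        conn.execute("UPDATE ACCOUNT SET Balance = Balance - 500 WHERE Id = 1")
        raise RuntimeError("simulated failure mid-transaction")
except RuntimeError:
    pass

balance = conn.execute("SELECT Balance FROM ACCOUNT WHERE Id = 1").fetchone()[0]
print(balance)  # 100: the failed transaction was rolled back
```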
2.3.7 Codd's Rule 6- Rule of View Updating
Views should reflect the updates of their respective base tables and vice versa. A view is a
logical table which shows restricted data. Views generally make the data readable but not
modifiable. Views help in data abstraction. Views are the virtual tables created by using
queries to show the partial view of the table. That is views are subset of table, it is only
partial table with few rows and columns. This rule states that views are also be able to get
updated as we do with its table.
For example, suppose we have created a view on the EMPLOYEE table containing details
of the employees who work for a particular department, say 'Testing'. Here EMPLOYEE is
the whole table and EMPLOYEE_TEST is the view with the testing employees. According
to this rule, we should be able to update records through EMPLOYEE_TEST.
But in real database systems, we cannot always grant this privilege on views. The basic
intention of creating a view is to give a group of data to the user in the form of a table.
Where lengthy queries would otherwise have to be written to get some details from the
database, a view shortens the query and makes it more meaningful. In such cases, updating
the view is not feasible. Although updating a view would update the table used for creating
it, this is not recommended by most databases; hence this rule is not honored by most
databases. All views of the data which are theoretically updatable must be updatable in
practice by the DBMS.
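The gap between this rule and practice can be seen in SQLite (used here as an illustration; the EMPLOYEE data is hypothetical): the view faithfully reflects reads from the base table, but a direct UPDATE through the view is rejected.

```python
import sqlite3

# Sketch of Rule 6: a view over EMPLOYEE restricted to the 'Testing'
# department reflects the base table on reads.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMPLOYEE (Id INTEGER PRIMARY KEY, Name TEXT, Dept TEXT)")
conn.execute("""
    CREATE VIEW EMPLOYEE_TEST AS
        SELECT Id, Name FROM EMPLOYEE WHERE Dept = 'Testing'
""")
conn.execute("INSERT INTO EMPLOYEE VALUES (1, 'James', 'Testing')")
conn.execute("INSERT INTO EMPLOYEE VALUES (2, 'Antony', 'Sales')")

rows = conn.execute("SELECT Name FROM EMPLOYEE_TEST").fetchall()
print(rows)  # [('James',)] -- the view tracks the base table

# But writing through the view is refused, illustrating why the rule
# is rarely satisfied in full:
try:
    conn.execute("UPDATE EMPLOYEE_TEST SET Name = 'Jim' WHERE Id = 1")
    view_updatable = True
except sqlite3.OperationalError:
    view_updatable = False
print(view_updatable)  # False
```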
2.3.8 Codd’s Rule 7- Rule of Set level insertion, update and deletion
A single operation should be sufficient to retrieve, insert, update and delete the data. The
capability of handling a base relation or a derived relation as a single operand applies not
only to the retrieval of data but also to the insertion, update, and deletion of data. This rule
states that every query language used by the database should support INSERT, DELETE,
and UPDATE on the records. It should also support set operations like UNION, UNION
ALL, MINUS, INTERSECT, and INTERSECT ALL. These operations should not be
restricted to a single table or row at a time; they should be able to handle multiple tables
and rows in one operation.
For example, suppose employees get a 5% hike in a year; then their salaries have to be
updated to reflect the new salary. Since this is the annual hike given to all employees, the
increment applies to every employee, so the query should not be written to update the
salaries one by one for thousands of employees. A single query should be strong enough to
update the entire set of employee salaries at a time. A database must support high-level
insertion, updating, and deletion. This must not be limited to a single row; that is, it must
also support union, intersection, and minus operations to yield sets of data records.
2.3.9 Codd’s Rule 8- Rule of Physical Data Independence
Batch and end-user operations are logically separated from physical storage and the respective access methods. Application programs and terminal activities remain logically unimpaired whenever any changes are made in either storage representations or access methods. If there is any change in the physical storage of the data, it should not affect the data at the logical or external view. The physical storage of data should not matter to the system. If, say, a file supporting a table is renamed or moved from one disk to another, it should not affect the application.
For Example:
If the data stored on one disk is transferred to another disk, the user viewing the data should not notice any difference or delay in access time; the user should be able to access the data just as before. Similarly, if the file name for the table is changed in memory, it should not affect the table or the user viewing it. This is known as physical independence, and a database should support this feature.
2.3.10 Codd’s Rule 9- Rule of Logical Data Independence
Batch and end users can change the database schema without having to recreate it or the applications built upon it. Application programs and terminal activities remain logically unimpaired when information-preserving changes of any kind that theoretically permit unimpairment are made to the base tables. This is similar to physical data independence: if there are any changes to the logical schema, they should not affect the user's view.
For Example:
If we split the EMPLOYEE table according to department into multiple employee tables, the user viewing the employee table should not feel that the records are coming from different tables. The split tables should be able to be joined to show the combined result; in our example we can use UNION and display the results to the user. In an ideal scenario, however, this is difficult to achieve, since the logical and user views are often tied so strongly that they are almost the same.
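The split-EMPLOYEE idea can be sketched with a UNION view in SQLite. All table, column, and employee names below are invented for illustration; a real system would follow the same pattern.

```python
import sqlite3

# Logical data independence sketch: the base table has been split by
# department, but a UNION view reassembles it, so users still query a
# single "employee" relation. All names here are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee_sales (emp_id INTEGER, name TEXT)")
conn.execute("CREATE TABLE employee_hr (emp_id INTEGER, name TEXT)")
conn.execute("INSERT INTO employee_sales VALUES (1, 'Asha')")
conn.execute("INSERT INTO employee_hr VALUES (2, 'Ravi')")

# The view hides the split: existing queries against "employee"
# keep working unchanged.
conn.execute("""CREATE VIEW employee AS
                SELECT emp_id, name FROM employee_sales
                UNION
                SELECT emp_id, name FROM employee_hr""")

print(list(conn.execute("SELECT * FROM employee ORDER BY emp_id")))
```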
2.3.11 Codd’s Rule 10- Rule of Integrity Independence
Integrity constraints should be available and stored as metadata in the data dictionary, not in the application programs. The database should be able to apply integrity rules using its own query language; it should not depend on any external factor or application to maintain integrity. The keys and constraints in the database should be strong enough to enforce integrity. A good RDBMS should be independent of the front-end application, and should at least support primary key and foreign key integrity constraints. Integrity constraints must be definable in the RDBMS sub-language and stored in the system catalogue, not within individual application programs.
For Example:
When inserting a new employee for department 50, the application should not have to first fetch and check whether department 50 exists, insert the department if it does not, and only then insert the employee. All of this should be handled by the database itself through its integrity constraints.
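A minimal sketch of this rule with SQLite: the foreign key lives in the schema (the system catalogue), not in application code, and the DBMS itself rejects a violating insert. Table names and values are illustrative; note that SQLite specifically requires the pragma below to enforce foreign keys.

```python
import sqlite3

# Integrity independence sketch: the constraint is declared in the
# schema, so no application code is needed to enforce it.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite-specific switch
conn.execute("CREATE TABLE department (dept_id INTEGER PRIMARY KEY)")
conn.execute("""CREATE TABLE employee (
                    emp_id  INTEGER PRIMARY KEY,
                    dept_id INTEGER REFERENCES department(dept_id))""")
conn.execute("INSERT INTO department VALUES (50)")
conn.execute("INSERT INTO employee VALUES (1, 50)")    # OK: dept 50 exists

try:
    conn.execute("INSERT INTO employee VALUES (2, 99)")  # dept 99 missing
except sqlite3.IntegrityError as e:
    print("rejected by the DBMS itself:", e)
```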
2.3.12 Codd’s Rule 11- Rule of Distribution Independence
The Data Manipulation Language of the relational system should not be concerned with physical data storage, and no alterations should be required whether the data is centralized or distributed. The end user must not be able to see that the data is distributed over various locations; users should always get the impression that the data is located at one site only. This rule has been regarded as the foundation of distributed database systems. The database may be located on the user's server or anywhere else on the network, but the end user need not know which servers are involved and should be able to get records as if pulling them locally. Even if the database is spread across different servers, the access time should remain comparable. An RDBMS therefore has distribution independence, which implies that users should not have to be aware of whether a database is distributed.
Any row must obey the security and integrity constraints imposed; no special privileges that bypass them are applicable. Almost all full-scale DBMSs are RDBMSs. Oracle implements 11+ of the rules, and so does Sybase; SQL Server also implements 11+ rules, while FoxPro implements 7+ rules.
2.3.13 Codd’s Rule 12- Rule of Nonsubversion
If a system has an interface that provides access to low-level records, that interface must not be able to subvert the system and bypass security and integrity constraints. When a query is fired at the database, it is converted into a low-level language so that the underlying system can retrieve the data. When accessing or manipulating the records in that low-level language, there should be no loopholes that alter the integrity of the database. In other words, even though the query as written does not change the integrity of the tables, the converted low-level form must be faithful to the query; it must not be converted into some other low-level operation that changes data integrity or performs unwanted actions in the database.
For Example:
An "update student address" query should always be converted into a low-level form that updates the address record in the student file in memory. It should not update any other record in the file, nor insert some malicious record into the file or memory.
2.5 SUMMARY
Every database that has tables and constraints need not be a relational database system. A database that merely uses a relational data model is not automatically a relational database system (RDBMS). There are certain rules for a database to be a true RDBMS. These rules were set out by Dr. Edgar F. Codd (E. F. Codd) in 1985 to define a perfect RDBMS: for an RDBMS to be perfect, it has to follow his rules, yet no RDBMS obeys them all. Codd developed 13 rules (Rule 0 to Rule 12) for a database to be an RDBMS. According to him, these rules help achieve a perfect RDBMS and hence correct data and relations among the objects in the database. In practice, no database follows all of these rules; each obeys them to some extent. Oracle, for example, follows only eight and a half of Codd's rules.
Since Codd's pioneering work, much research has been conducted on various aspects of the relational model. Todd (1976) describes an experimental DBMS called PRTV that directly implements the relational algebra operations. Schmidt and Swenson (1975) introduce additional semantics into the relational model by classifying different types of relations. Chen's (1976) entity-relationship model is a means to communicate the real-world semantics of a relational database at the conceptual level. Wiederhold and Elmasri (1979) introduce various types of connections.
Several characteristics differentiate relations from ordinary tables or files. The first
is that tuples in a relation are not ordered. The second involves the ordering of attributes in
a relation schema and the corresponding ordering of values within a tuple. We gave an
alternative definition of relation that does not require these two orderings, but we continued
to use the first definition, which requires attributes and tuple values to be ordered, for
convenience. We then discussed values in tuples and introduced null values to represent
missing or unknown information. We then classified database constraints into inherent
model-based constraints, schema-based constraints and application-based constraints. We
then discussed the schema constraints pertaining to the relational model, starting with
domain constraints, then key constraints, including the concepts of super key, candidate
key, and primary key, and the NOT NULL constraint on attributes. We then defined
relational databases and relational database schemas. Additional relational constraints
include the entity integrity constraint, which prohibits primary key attributes from being
null. The interrelation referential integrity constraint was then described, which is used to
maintain consistency of references among tuples from different relations.
2.6 KEYWORDS
DML- A data manipulation language (DML) is a computer programming language
used for adding (inserting), deleting, and modifying (updating) data in a database.
A DML is often a sublanguage of a broader database language such as SQL, with
the DML comprising some of the operators in the language.
Super Key- A superkey is a set of attributes within a table whose values can be
used to uniquely identify a tuple. A candidate key is a minimal set of attributes
necessary to identify a tuple; this is also called a minimal superkey.
Primary Key- A primary key, also called a primary keyword, is a key in a relational database that is unique for each record. It is a unique identifier, such as a driver license number, telephone number (including area code), or vehicle identification number (VIN). Each table in a relational database has one and only one primary key.
SQL- SQL is Structured Query Language, which is a computer language for
storing, manipulating and retrieving data stored in a relational database. SQL is the
standard language for Relational Database System.
Schema- The database schema of a database is its structure described in a formal
language supported by the database management system (DBMS). The term
"schema" refers to the organization of data as a blueprint of how the database is
constructed (divided into database tables in the case of relational databases).
RELATIONAL ALGEBRA
STRUCTURE
3.1 Introduction
3.2 Definition
3.3.1 Select
3.3.2 Project
3.3.3 Rename
3.6 Summary
3.7 Keywords
3.0 LEARNING OBJECTIVE
To understand the concepts of relational algebra, which is an integral part of the relational data model.
To learn the different notations, unary as well as binary, with detailed examples.
3.1 INTRODUCTION
Database management systems (DBMS) must have a query language so that users can access the data stored in the database. Relational algebra (RA) is considered a procedural query language, where the user tells the system to carry out a set of operations to obtain the desired results; i.e., the user tells what data should be retrieved from the database and how to retrieve it. This chapter gives a brief introduction to relational algebra and goes through a few operations with examples.
The relational algebra is often considered to be an integral part of the relational data
model, and its operations can be divided into two groups. One group includes set operations
from mathematical set theory; these are applicable because each relation is defined to be a
set of tuples in the formal relational model. Set operations include UNION,
INTERSECTION, SET DIFFERENCE, and CARTESIAN PRODUCT. The other group
consists of operations developed specifically for relational databases; these include SELECT, PROJECT, and JOIN, among others. This chapter first discusses the SELECT and PROJECT operations, because they are unary operations that operate on single relations. Then the chapter discusses the JOIN and other complex binary operations, which operate on two tables. Some common database requests cannot be performed with the original relational algebra operations, so additional operations were created to express these requests. These include aggregate functions, which are operations that can summarize data from the tables, as well as additional types of JOIN and UNION operations. These operations were added to the original relational algebra because of their importance to many database applications. As the chapter ends with the discussion of relational algebra, the subsequent chapter will focus on the other main formal language for relational databases, relational calculus.
3.2 DEFINITION
Relational algebra is a procedural query language that works on the relational model. The purpose of a query language is to retrieve data from a database or perform various operations, such as insert, update, and delete, on the data. When we say that relational algebra is a procedural query language, we mean that it tells both what data is to be retrieved and how it is to be retrieved. On the other hand, relational calculus is a non-procedural query language: it tells what data is to be retrieved but does not tell how to retrieve it. We will discuss relational calculus in a separate chapter. Relational algebra gives a step-by-step process to obtain the result of a query, using the following operators:
Select
Project
Union
Set difference
Cartesian product
Rename
Relational algebra is a family of algebras with a well-founded semantics, used for modelling the data stored in relational databases and defining queries on it. It takes instances of relations as input and yields instances of relations as output, using operators to perform queries. An operator can be either unary or binary. Operators accept relations as their input and yield relations as their output. Relational algebra is performed recursively on a relation, and intermediate results are also considered relations. The output of these operations is a new relation, which might be formed from one or more input relations.
The figure 3.1 shows how we use relational algebra to fetch information or data from a
bigger dataset or table. In relational algebra, input is a relation (table from which data has
to be accessed) and output is also a relation (a temporary table holding the data asked for
by the user).
Relational algebra works on the whole table at once, so we do not have to use loops to iterate over the rows (tuples) of data one by one. All we have to do is specify the table name from which we need the data, and in a single command relational algebra will traverse the entire table to fetch the data.
3.3 UNARY RELATIONAL OPERATIONS
In mathematics, a unary operation is an operation with only one operand, i.e. a single
input. This is in contrast to binary operations, which use two operands. An example is the
function f : A → A, where A is a set; here f is a unary operation on A. In relational algebra, too, an operator can be either unary or binary. Operators act on what are known as operands: they accept relations as input and yield relations as output, and relational algebra is performed recursively on a relation, with intermediate results also being relations. An operator that acts on one operand is called a unary operator; one that acts on two operands is called a binary operator. Operators can act on more than two operands, but we won't go into that now. Figure 3.2 shows different relational operations.
3.3.1 SELECT (σ)
Notation − σp(r)
where σ stands for the selection operator, r stands for the relation, and p is a propositional logic formula that may use connectives like AND, OR, and NOT, together with comparison operators such as =, ≠, ≥, <, >, ≤. In other words, p is the selection predicate that σ applies to each tuple.
For example:
σ subject = 'database' ∧ price = 450 (Books)
Output − selects tuples from Books where subject is 'database' and price is 450.
Query:
σ Customer_City="Agra" (CUSTOMER)
Output:
Customer_Id Customer_Name Customer_City
----------- ------------- -------------
C10100 Steve Agra
C10111 Raghu Agra
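The selection above can be mimicked in a few lines of plain Python. This is a sketch of the operator's semantics, not DBMS code; the CUSTOMER rows mirror the sample output, plus one extra invented row.

```python
# A minimal sketch of SELECT (sigma): filter whole tuples of a
# relation by a predicate. Rows mirror the sample CUSTOMER data;
# the third row is illustrative only.
CUSTOMER = [
    ("C10100", "Steve",     "Agra"),
    ("C10111", "Raghu",     "Agra"),
    ("C10115", "Chaitanya", "Noida"),
]

def select(relation, predicate):
    """sigma_p(r): keep every tuple t of r for which p(t) is true."""
    return [t for t in relation if predicate(t)]

# sigma Customer_City = "Agra" (CUSTOMER)
agra = select(CUSTOMER, lambda t: t[2] == "Agra")
print(agra)   # only the two Agra tuples survive the selection
```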
Query:
σ BRANCH_NAME="perryride" (LOAN)
Output:
3.3.2 PROJECT (∏)
The projection eliminates all attributes of the input relation except those mentioned in the projection list. The projection method defines a relation that contains a vertical subset of a relation; it extracts the values of specified attributes and eliminates duplicate values. The ∏ (pi) symbol is used to choose attributes from a relation: this operator keeps specific columns from a relation and discards the others. In simple words, if you want to see only the names of all the students in the Student table, you can use the Project operation. It will only show the columns (attributes) asked for, and will also remove duplicate rows from the result. The Project operator is denoted by the ∏ symbol and is used to select desired columns (or attributes) from a table (or relation). The Project operator in relational algebra is similar to the column list of a SELECT statement in SQL: the operation shows only those attributes we wish to appear in the result, and the rest are eliminated from the table. The Projection operation works on a single relation R and defines a relation that contains a vertical subset of R, extracting the values of specified attributes and eliminating duplicates.
Notation − ∏A1, A2, ..., An (r)
Produce a list of salaries for all staff, showing only the staffNo, fName, lName, and salary details:
∏ staffNo, fName, lName, salary (Staff)
Let's take some more examples to better understand the Project notation.
Id CustomerName Status
-- ------------ --------
1  Google       Active
2  Amazon       Active
3  Apple        Inactive
4  Alibaba      Active

Projecting CustomerName and Status gives:

CustomerName Status
------------ --------
Google       Active
Amazon       Active
Apple        Inactive
Alibaba      Active
In another example, let's take the CUSTOMER table with three columns; we want to fetch only two columns of the table, which we can do with the help of the Project operator ∏.
Table: CUSTOMER
Query:
∏ Customer_Name, Customer_City (CUSTOMER)
Output:
Customer_Name Customer_City
------------- -------------
Steve Agra
Raghu Agra
Chaitanya Noida
Ajeet Delhi
Carl Delhi
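The projection above can be sketched in plain Python; note how duplicate rows are removed, since a relation is a set. The data mirrors the sample CUSTOMER output, with invented customer ids, and column positions stand in for column names.

```python
# Sketch of PROJECT (pi): keep only the chosen columns and drop
# duplicate rows. Data mirrors the sample output; ids are invented.
CUSTOMER = [
    ("C10100", "Steve",     "Agra"),
    ("C10111", "Raghu",     "Agra"),
    ("C10115", "Chaitanya", "Noida"),
]

def project(relation, *cols):
    seen, result = set(), []
    for t in relation:
        row = tuple(t[c] for c in cols)
        if row not in seen:          # eliminate duplicates, keep order
            seen.add(row)
            result.append(row)
    return result

# pi Customer_Name, Customer_City (CUSTOMER)
print(project(CUSTOMER, 1, 2))
```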
Union (∪)
The Union operator is denoted by the ∪ symbol and combines the tuples of two relations, eliminating duplicates.
Syntax: A ∪ B
For a union operation to be valid, the following conditions must hold: both relations must have the same number of attributes, and the domains of corresponding attributes must be compatible.
For example, if we have two tables RegularClass and ExtraClass, and both have a column Student to store the name of the student, then
∏Student(RegularClass) ∪ ∏Student(ExtraClass)
will give us the names of students who attend regular classes or extra classes (or both), eliminating repetition.
Example:
table_name1 ∪ table_name2
Table 1: COURSE
Table 2: STUDENT
Query:
∏ Student_Name (COURSE) ∪ ∏ Student_Name (STUDENT)
Output:
Student_Name
------------
Aditya
Carl
Paul
Lucy
Rick
Steve
Note: As you can see, there are no duplicate names in the output even though there were a few common names in both tables, and the COURSE table itself contained a duplicate name.
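Because UNION is a set operation, it can be sketched directly with Python sets. The name lists below are reconstructed from the sample output above, including the duplicate in COURSE.

```python
# Sketch of UNION on the Student_Name columns of COURSE and STUDENT.
# Lists are reconstructed from the sample output above.
course_names  = ["Aditya", "Aditya", "Steve", "Paul", "Lucy"]   # note the duplicate
student_names = ["Aditya", "Carl", "Paul", "Lucy", "Rick", "Steve"]

# Union is a set operation, so duplicates vanish automatically.
result = sorted(set(course_names) | set(student_names))
print(result)
```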
Intersection (∩)
The Intersection operator is denoted by the ∩ symbol and is used to select common rows (tuples) from two tables (relations). Let's say we have two relations R1 and R2 that have the same columns, and we want to select all the tuples (rows) present in both; in that case we can apply the intersection operation on the two relations: R1 ∩ R2.
Note: Only those rows that are present in both the tables will appear in the result set.
Syntax:
table_name1 ∩ table_name2
Table 1: COURSE
C109 Paul S921
C115 Lucy S931
Table 2: STUDENT
Query:
∏ Student_Name (COURSE) ∩ ∏ Student_Name (STUDENT)
Output:
Student_Name
------------
Aditya
Steve
Paul
Lucy
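Intersection has the same set-based sketch; only names present in both relations remain. The name sets are reconstructed from the sample tables above.

```python
# Sketch of INTERSECTION: only names present in BOTH relations remain.
# Name sets are reconstructed from the sample tables above.
course_names  = {"Aditya", "Steve", "Paul", "Lucy"}
student_names = {"Aditya", "Carl", "Paul", "Lucy", "Rick", "Steve"}

result = sorted(course_names & student_names)
print(result)
```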
Set Difference (−)
Set difference in relational algebra is the same as the set difference operation in set theory, with the constraint that both relations should have the same set of attributes. The result of the set difference operation is the set of tuples that are present in one relation but not in the second. Let's take the same COURSE and STUDENT tables we have seen above.
Notation − r − s or A − B
Query:
∏ Student_Name (STUDENT) − ∏ Student_Name (COURSE)
Output:
Student_Name
------------
Carl
Rick
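Set difference follows the same pattern: tuples in STUDENT but not in COURSE. The name sets are again reconstructed from the sample outputs above.

```python
# Sketch of SET DIFFERENCE: names in STUDENT but not in COURSE.
# Name sets are reconstructed from the sample outputs above.
course_names  = {"Aditya", "Steve", "Paul", "Lucy"}
student_names = {"Aditya", "Carl", "Paul", "Lucy", "Rick", "Steve"}

result = sorted(student_names - course_names)
print(result)   # the two names appearing only in STUDENT
```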
Cartesian Product (X)
The Cartesian product is denoted by the X symbol. Let's say we have two relations R1 and R2; the Cartesian product of these two relations (R1 X R2) combines each tuple of the first relation R1 with each tuple of the second relation R2. This may sound confusing, but the example below makes it clear.
Syntax:
R1 X R2
Table 1: R
Col_A Col_B
----- ------
AA 100
BB 200
CC 300
Table 2: S
Col_X Col_Y
----- -----
XX 99
YY 11
ZZ 101
Query:
R X S
Output:
Col_A Col_B Col_X Col_Y
----- ----- ----- -----
AA    100   XX    99
AA    100   YY    11
AA    100   ZZ    101
BB    200   XX    99
BB    200   YY    11
BB    200   ZZ    101
CC    300   XX    99
CC    300   YY    11
CC    300   ZZ    101
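The Cartesian product R X S can be computed directly with the standard library, using the sample R and S tables above: every tuple of R is paired with every tuple of S, giving 3 x 3 = 9 combined tuples.

```python
from itertools import product

# Sketch of CARTESIAN PRODUCT R x S using the sample tables above:
# each tuple of R is concatenated with each tuple of S.
R = [("AA", 100), ("BB", 200), ("CC", 300)]
S = [("XX", 99), ("YY", 11), ("ZZ", 101)]

rxs = [r + s for r, s in product(R, S)]
print(len(rxs))   # 9 combined tuples
print(rxs[0])     # ('AA', 100, 'XX', 99)
```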
3.6 SUMMARY
In this chapter we presented two formal languages for the relational model of data.
They are used to manipulate relations and produce new relations as answers to queries. We
discussed the relational algebra and its operations, which are used to specify a sequence of
operations to specify a query. Then we introduced two types of relational calculi called
tuple calculus and domain calculus; they are declarative in that they specify the result of a
query without specifying how to produce the query result. The data for a single “instance”
of a table is stored as a row. Many relational database systems have an option of using the
SQL (Structured Query Language) for querying and maintaining the database.
We introduced the basic relational algebra operations and illustrated the types of
queries for which each is used. The unary relational operators SELECT and PROJECT, as well as the RENAME operation, were discussed first. Then we discussed binary set-theoretic operations requiring that relations on which they are applied be union compatible;
these include UNION, INTERSECTION, and SET DIFFERENCE. The CARTESIAN
PRODUCT operation is a set operation that can be used to combine tuples from two
relations, producing all possible combinations. It is rarely used in practice; however, we
showed how CARTESIAN PRODUCT followed by SELECT can be used to define
matching tuples from two relations and leads to the JOIN operation. Different JOIN
operations called THETA JOIN, EQUIJOIN, and NATURAL JOIN were introduced. Some
important types of queries cannot be stated with the basic relational algebra operations
but are nevertheless important for practical situations. We introduced the AGGREGATE FUNCTION
operation to deal with aggregate types of requests. We discussed recursive queries, for
which there is no direct support in the algebra but which can be approached in a step-by-
step approach, as we demonstrated. We then presented the OUTER JOIN and OUTER
UNION operations, which extend JOIN and UNION and allow all information in source
relations to be preserved in the result.
3.7 KEYWORDS
DATASET- A data set (or dataset) is a collection of data. In the case of tabular
data, a data set corresponds to one or more database tables, where every column of
a table represents a particular variable, and each row corresponds to a given record
of the data set in question
TUPLE- In the relational model, a tuple is an ordered collection of attribute values that makes up one row of a relation. Each tuple corresponds to a single record of the table, and its values are drawn from the domains of the relation's attributes.
RDBMS- Stands for "Relational Database Management System." An RDBMS is a
DBMS designed specifically for relational databases. Therefore, RDBMSes are a
subset of DBMSes. A relational database refers to a database that stores data in a
structured format, using rows and columns.
QUERY- A database query is a request for data from a database. Usually the
request is to retrieve data; however, data can also be manipulated using queries.
SQL- Structured Query Language, which is a computer language for storing,
manipulating and retrieving data stored in a relational database. SQL is the standard
language for Relational Database System.
3.8 SELF-ASSESSMENT TEST
1. Explain the role of relational algebra in relational database.
2. What are the different types of relational algebra operations? Discuss in detail.
3. How different notations are used in relational algebra, discuss with examples.
4. Rename operator comes under which category, when it comes to relational algebra?
5. What are the major differences between unary and binary notations in relational algebra?
RELATIONAL CALCULUS
STRUCTURE
4.1 Introduction
4.6 Summary
4.7 Keywords
4.1 INTRODUCTION
In the previous chapter we discussed relational algebra, which is a procedural query language. In this chapter, we will discuss relational calculus, which is a non-procedural query language, and learn about its concepts in the context of database management systems. In a relational algebra expression, a certain order of operations is explicitly stated, and a plan for evaluating the query is implied. In relational calculus, there is no description of how to evaluate a query; instead, a relational calculus query focuses on what is to be retrieved rather than how to retrieve it. It uses
mathematical predicate calculus. The relational calculus is not the same as the differential and integral calculus of mathematics; it takes its name from a branch of symbolic logic termed predicate calculus. When applied to databases, it is found in two forms:
Tuple relational calculus, originally proposed by Codd in 1972, and
Domain relational calculus, proposed by Lacroix and Pirotte in 1977.
A calculus expression specifies what is to be retrieved rather than how to retrieve it.
Therefore, the relational calculus is considered to be a nonprocedural language. This differs
from relational algebra, where we must write a sequence of operations to specify a retrieval
request; hence, it can be considered as a procedural way of stating a query. It is possible to
nest algebra operations to form a single expression; however, a certain order among the
operations is always explicitly specified in a relational algebra expression. This order also
influences the strategy for evaluating the query. A calculus expression may be written in
different ways, but the way it is written has no bearing on how a query should be evaluated.
It has been shown that any retrieval that can be specified in the basic relational
algebra can also be specified in relational calculus, and vice versa; in other words, the
expressive power of the two languages is identical. This led to the definition of the concept
of a relationally complete language. A relational query language L is considered
relationally complete if we can express in L any query that can be expressed in relational
calculus. Relational completeness has become an important basis for comparing the
expressive power of high-level query languages. However, certain frequently required
queries in database applications cannot be expressed in basic relational algebra or calculus.
Most relational query languages are relationally complete but have more expressive power
than relational algebra or relational calculus because of additional operations such as
aggregate functions, grouping, and ordering.
4.2 DEFINITION OF RELATIONAL CALCULUS
What is Relational Calculus?
Relational calculus is a non-procedural query language that tells the system what data is to be retrieved but not how to retrieve it. In a non-procedural query language, the user is not concerned with the details of how to obtain the end results: relational calculus tells what to do but never explains how to do it. Contrary to relational algebra, which is a procedural query language that fetches data and also explains how it is done, relational calculus is a non-procedural query language with no description of how the query will work or how the data will be fetched. It focuses only on what to do, not on how to do it.
For example, steps involved in listing all the employees who attend the 'Networking' Course
would be:
SELECT the tuples from COURSE relation with COURSENAME =
'NETWORKING'
In the tuple relational calculus, you have to find tuples for which a predicate is true. The calculus depends on the use of tuple variables. A tuple variable is a variable that 'ranges over' a named relation: i.e., a variable whose only permitted values are tuples of the relation. The tuple relational calculus is used to select the tuples in a relation; in TRC, the filtering variable ranges over the tuples of the relation, and the result can have one or more tuples. Tuple relational calculus is a non-procedural query language, unlike relational algebra: it provides only a description of the query, not a method to solve it. Thus, it explains what to do but not how to do it.
Syntax:
{ T | Condition }
For example, to specify the range of a tuple variable S as the Staff relation, we write:
Staff(S)
To express the query 'Find the set of all tuples S such that F(S) is true,' we can write:
{S | F(S)}
In this form of relational calculus, we define a tuple variable, specify the table (relation)
name in which the tuple is to be searched for, along with a condition.
We can also specify a column name using the dot (.) operator with the tuple variable, to get only a certain attribute (column) in the result. A lot of information, right! Give it some time to sink in. A tuple variable is nothing but a name; it can be anything, and generally we use a single letter, so let's say T is our tuple variable. To specify the name of the relation (table) in which we want to look for data, we write:
Student(T)
Then comes the condition part. To specify a condition on a particular attribute (column), we use the dot operator with the tuple variable; for table Student, if we want students with age greater than 17, we can write it as:
T.age > 17
Putting it all together, to use tuple relational calculus to fetch names of students from table Student with age greater than 17, with T as our tuple variable:
{ T.name | Student(T) ∧ T.age > 17 }
Table: STUDENT
Query to display the last name of those students where age is greater than 30:
{ T.Last_Name | STUDENT(T) ∧ T.Age > 30 }
In the above query you can see two parts separated by | symbol. The second part is where
we define the condition and in the first part we specify the fields which we want to display
for the selected tuples.
Last_Name
---------
Singh
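The query above can be evaluated in Python by letting a variable T range over the tuples of STUDENT and filtering with the predicate. The STUDENT rows below are illustrative, chosen so that only 'Singh' has Age greater than 30.

```python
# Sketch of the tuple-relational-calculus query
# { T.Last_Name | STUDENT(T) AND T.Age > 30 } in Python.
# The rows are illustrative; only 'Singh' has Age > 30.
STUDENT = [
    {"First_Name": "Ajeet", "Last_Name": "Singh", "Age": 33},
    {"First_Name": "Carl",  "Last_Name": "Davis", "Age": 28},
]

# T ranges over the tuples of STUDENT; the predicate filters them.
result = [T["Last_Name"] for T in STUDENT if T["Age"] > 30]
print(result)
```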
Query to display all the details of students whose last name is 'Singh':
{ T | STUDENT(T) ∧ T.Last_Name = 'Singh' }
Output:
Let’s take one more example for better understanding of Tuple relational calculus.
Table-1: Customer
Saurabh A7 Patiala
Mehak B6 Jalandhar
Sumiti D9 Ludhiana
Ria A5 Patiala
Table-2: Branch
ABC Patiala
DEF Ludhiana
GHI Jalandhar
Table-3: Account
Table-4: Loan
Table-5: Borrower
Saurabh L33
Mehak L49
Ria L98
Table-6: Depositor
Saurabh 1111
Mehak 1113
Sumiti 1114
Query-1: Find the loan number, branch, and amount of loans greater than or equal to 10000.
Query-2: Find the loan number for each loan with an amount greater than or equal to 10000.
LOAN NUMBER
L33
L35
L98
Query-3: Find the names of all customers who have a loan and an account at the bank.
CUSTOMER NAME
Saurabh
Mehak
Query-4: Find the names of all customers having a loan at the “ABC” branch.
{ t | ∃ s ∈ borrower (t[customer-name] = s[customer-name]
    ∧ ∃ u ∈ loan (u[branch-name] = “ABC” ∧ u[loan-number] = s[loan-number])) }
Resulting relation:
CUSTOMER NAME
Saurabh
In contrast to tuple relational calculus, domain relational calculus uses a list of attributes to be selected from the relation, based on a condition. It is similar to TRC but differs by selecting attributes rather than whole tuples. In the tuple relational calculus you use variables that range over the tuples of a relation; in the domain relational calculus you also use variables, but here the variables take their values from domains of attributes rather than from tuples of relations. In domain relational calculus, filtering is done based on the domains of the attributes and not on tuple values. This second form of relational calculus is known as domain relational calculus.
Notation:
{ a1, a2, a3, ..., an | P (a1, a2, a3, ... ,an)}
Or
{ < x1, x2, x3, ..., xn > | P (x1, x2, x3, ..., xn ) }
where < x1, x2, x3, ..., xn > represents the resulting domain variables and P (x1, x2, x3, ..., xn) represents the condition or formula expressed in predicate calculus.
Example 1:
Table: STUDENT
Query:
Note: The symbols used for logical operators are: ∧ for AND, ∨ for OR and ┓ for NOT.
Output:
First_Name Age
---------- ----
Ajeet 30
Chaitanya 31
Carl 28
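The distinctive feature of DRC, binding variables to attribute domains rather than to whole tuples, can be sketched in Python. The relation and the condition (Age < 33) below are illustrative assumptions, chosen so the result matches the sample output above.

```python
# Sketch of a domain-relational-calculus query: the variables f and a
# bind to attribute DOMAINS, not whole tuples. Data and the condition
# (a < 33) are illustrative.
STUDENT = [
    ("Ajeet",     "Singh",  30),
    ("Chaitanya", "Kumar",  31),
    ("Carl",      "Davis",  28),
    ("Amit",      "Sharma", 33),
]

# { <f, a> | <f, l, a> in STUDENT and a < 33 }
result = [(f, a) for (f, l, a) in STUDENT if a < 33]
print(result)
```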
Example 2:
Table-1: Customer
Debomit Kadamtala Alipurduar
Table-2: Loan
LOAN NUMBER  BRANCH  AMOUNT
L01          Main    200
L03          Main    150
L10          Sub     90
L08          Main    60
Table-3: Borrower
Ritu L01
Debomit L08
Soumya L03
Query-1: Find the loan number, branch, and amount of loans greater than or equal to 100.
Query-2: Find the loan number for each loan with an amount greater than or equal to 150.
{<l> | ∃ b, a (<l, b, a> ∈ loan ∧ (a ≥ 150))}
Resulting relation:
LOAN NUMBER
L01
L03
Query-3: Find the names of all customers having a loan at the “Main” branch and find the loan
amount.
Ritu 200
Debomit 60
Soumya 150
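The two DRC queries above can be verified mechanically by rebuilding the Loan and Borrower tables from the example and running the equivalent SQL:

```python
import sqlite3

# Rebuild the Loan and Borrower tables (rows taken from the text's example).
con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE loan (loan_number TEXT, branch_name TEXT, amount INTEGER)")
cur.execute("CREATE TABLE borrower (customer_name TEXT, loan_number TEXT)")
cur.executemany("INSERT INTO loan VALUES (?, ?, ?)",
                [("L01", "Main", 200), ("L03", "Main", 150),
                 ("L10", "Sub", 90), ("L08", "Main", 60)])
cur.executemany("INSERT INTO borrower VALUES (?, ?)",
                [("Ritu", "L01"), ("Debomit", "L08"), ("Soumya", "L03")])

# Query-2: {<l> | exists b, a (<l, b, a> in loan and a >= 150)}
q2 = cur.execute(
    "SELECT loan_number FROM loan WHERE amount >= 150 "
    "ORDER BY loan_number").fetchall()
print(q2)  # [('L01',), ('L03',)]

# Query-3: customer names and loan amounts at the 'Main' branch.
q3 = cur.execute(
    "SELECT b.customer_name, l.amount "
    "FROM borrower b JOIN loan l ON b.loan_number = l.loan_number "
    "WHERE l.branch_name = 'Main'").fetchall()
print(sorted(q3))
```

The WHERE clause is exactly the predicate P of the DRC expression, and the selected columns are its result variables.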
d) None of the mentioned
5. A query in the tuple relational calculus is expressed as:
a) {t | P() | t}
b) {P(t) | t }
c) {t | P(t)}
d) All of the mentioned
6. Which of the following symbol is used in the place of except?
a) ^
b) V
c) ¬
d) ~
7. An expression in the domain relational calculus is of the form
a) {P(x1, x2, . . . , xn) | < x1, x2, . . . , xn > }
b) {x1, x2, . . . , xn | < x1, x2, . . . , xn > }
c) { x1, x2, . . . , xn | x1, x2, . . . , xn}
d) {< x1, x2, . . . , xn > | P(x1, x2, . . . , xn)}
4.6 SUMMARY
Relational calculus is a non-procedural query language. It uses mathematical predicate
calculus instead of algebra. It provides the description about the query to get the result
whereas relational algebra gives the method to get the result. It informs the system what to
do with the relation, but does not inform how to perform it. For example, in relational
algebra, listing all the students who attend the ‘Database’ course would require specifying a
sequence of operations, whereas relational calculus only states the condition that the result
must satisfy.
There are two types of relational calculus – Tuple Relational Calculus (TRC) and Domain
Relational Calculus (DRC).
TRC- Tuple relational calculus filters the tuples of a relation based on a given condition,
such as certain attribute values. The resulting relation can have one or more tuples. It can be
denoted as: {t | P (t)} or {t | condition (t)}
DRC- In contrast to tuple relational calculus, domain relational calculus uses a list of
attributes to be selected from the relation based on a condition. It is similar to TRC, but
differs by selecting attributes rather than whole tuples. It is denoted as:
{ a1, a2, a3, ..., an | P (a1, a2, a3, ..., an) }
Where a1, a2, a3, … an are attributes of the relation and P is the condition.
4.7 KEYWORDS
DATABASE- A database is a collection of information that is organized so that it
can be easily accessed, managed and updated. Computer databases typically contain
aggregations of data records or files, containing information about sales
transactions or interactions with specific customers.
NON PROCEDURAL LANGUAGE- A computer language that does not require
writing traditional programming logic. Also known as a "declarative language,"
users concentrate on defining the input and output rather than the program steps
required in a procedural programming language such as C++ or Java.
RELATIONAL QUERY LANGUAGE- Relational query languages use
relational algebra to break down the user requests and instruct the DBMS to execute the
requests. It is the language by which user communicates with the database. These
relational query languages can be procedural or non-procedural.
PROCEDURAL QUERY LANGUAGE: In a procedural query language, the user
instructs the system to perform a series of operations to produce the desired results.
Here the user tells what data is to be retrieved from the database and how to retrieve it.
Which of the following will be the TRC query to obtain the department names that
do not have any girl students?
Option 1.
{d.Dname | department (d) ∧ ~ ((∃(s)) student(s) ∧ s.sex ≠ ‘F’ ∧ s.deptNo = d.deptId)}
Option 2.
{d.Dname | department (d) ∧ ((∀ (s)) student(s) ∧ s.sex ≠ ‘F’ ∧ s.deptNo = d.deptId)}
Option 3.
{d.Dname | department (d) ∧ ~ ((∃(s)) student(s) ∧ s.sex = ‘F’ ∧ s.deptNo = d.deptId)}
3. What do you mean by relational calculus? How is it different from relational algebra?
SUBJECT: RELATIONAL DATABASE MANAGEMENT
SYSTEM
COURSE CODE: BCA-244 AUTHOR: DR. DEEPAK NANDAL
STRUCTURE
5.1 Introduction
5.2 Definition
5.7 Summary
5.8 Keywords
5.1 INTRODUCTION
Each relation schema consists of a number of attributes, and the relational database schema
consists of a number of relation schemas. So far, we have assumed that attributes are
grouped to form a relation schema by using the common sense of the database designer or
by mapping a database schema design from a conceptual data model such as the ER or
enhanced ER (EER) or some other conceptual data model. These models make the designer
identify entity types and relationship types and their respective attributes, which leads to a
natural and logical grouping of the attributes into relations when the mapping procedures
are followed.
We have not developed any measure of appropriateness or "goodness" to measure the
quality of the design, other than the intuition of the designer. In this chapter we discuss
some of the theory that has been developed with the goal of evaluating relational schemas
for design quality-that is, to measure formally why one set of groupings of attributes into
relation schemas is better than another.
There are two levels at which we can discuss the "goodness" of relation schemas.
The first is the logical (or conceptual) level-how users interpret the relation schemas and
the meaning of their attributes. Having good relation schemas at this level enables users to
understand clearly the meaning of the data in the relations, and hence to formulate their
queries correctly. The second is the implementation (or storage) level-how the tuples in a
base relation are stored and updated. This level applies only to schemas of base relations-
which will be physically stored as files-whereas at the logical level we are interested in
schemas of both base relations and views (virtual relations). The relational database design
theory developed in this chapter applies mainly to base relations, although some criteria of
appropriateness also apply to views. As with many design problems, database design may
be performed using two approaches: bottom-up or top-down. A bottom-up design
methodology (also called design by synthesis) considers the basic relationships among
individual attributes as the starting point and uses those to construct relation schemas. This
approach is not very popular in practice because it suffers from the problem of having to
collect a large number of binary relationships among attributes as the starting point. In
contrast, a top-down design methodology (also called design by analysis) starts with a
number of groupings of attributes into relations that exist together naturally, for example,
on an invoice, a form, or a report. The relations are then analysed individually and
collectively, leading to further decomposition until all desirable properties are met. The
theory described in this chapter is applicable to both the top-down and bottom-up design
approaches, but is more practical when used with the top-down approach. We define the
concept of functional dependency, a formal constraint among attributes that is the main tool
for formally measuring the appropriateness of attribute groupings into relation schemas.
Properties of functional dependencies are also studied and analysed. Then we will discuss how
functional dependencies can be used to group attributes into relation schemas that are in a
normal form. A relation schema is in a normal form when it satisfies certain desirable
properties. The process of normalization consists of analysing relations to meet
increasingly more stringent normal forms leading to progressively better groupings of
attributes. Normal forms are specified in terms of functional dependencies-which are
identified by the database designer-and key attributes of relation schemas.
When developing the schema of a relational database, one of the most important
aspects to be taken into account is to ensure that the duplication is minimized. This is done
for 2 purposes:
Reducing the amount of storage needed to store the data.
Avoiding unnecessary data conflicts that may creep in because of multiple copies
of the same data getting stored.
5.2 DEFINITION
Functional Dependency: A functional dependency is a constraint between two sets of
attributes from the database. Suppose that our relational database schema has n attributes
A1, A2, ..., An; let us think of the whole database as being described by a single universal
relation schema R = {A1, A2, A3, ..., An}. We do not imply that we will actually store
the database as a single universal table; we use this concept only in developing the formal
theory of data dependencies
Definition: A functional dependency is a constraint between two sets of attributes from the
database. Suppose that our relational database schema has n attributes A1, A2, ..., An, and
think of the whole database as being described by a single universal relation schema R
= {A1, A2, ... , An}. A functional dependency (FD) is a relationship between two attributes,
typically between the PK and other non-key attributes within a table. For any relation R,
attribute Y is functionally dependent on attribute X (usually the PK), if for every valid
instance of X, that value of X uniquely determines the value of Y.
It determines the relation of one attribute to another attribute in a database management
system (DBMS). Functional dependency helps you to maintain the quality of data
in the database. A functional dependency is denoted by an arrow →. The functional
dependency of Y on X is represented by X → Y. Functional Dependency plays a vital role
to find the difference between good and bad database design.
Example:
In this example, if we know the value of Employee number, we can obtain Employee Name,
city, salary, etc. By this, we can say that the city, Employee Name, and salary are
functionally dependent on Employee number.
Definition of Normalization:
This means that the values of the Y component of a tuple in r depend on, or are determined
by, the values of the X component; we say that the values of the X component of a tuple
uniquely (or functionally) determine the values of the Y component. We say that there is a
functional dependency from X to Y, or that Y is functionally dependent on X.
Functional dependency is represented as FD or f.d. The set of attributes X is called the left-
hand side of the FD, and Y is called the right-hand side.
X functionally determines Y in a relation schema R if, and only if, whenever two tuples of
r(R) agree on their X-value, they must necessarily agree on their Y-value. If a constraint on
R states that there cannot be more than one tuple with a given X-value in any relation
instance r(R)—that is, X is a candidate key of R—this implies that X → Y for any subset
of attributes Y of R.
Relation extensions r(R) that satisfy the functional dependency constraints are called legal
relation states (or legal extensions) of R. Functional dependencies are used to describe
further a relation schema R by specifying constraints on its attributes that must hold at all
times. Certain FDs can be specified without referring to a specific relation, but as a property
of those attributes given their commonly understood meaning.
For example, {State, Driver_license_number} → Ssn should hold for any adult in the
United States and hence should hold whenever these attributes appear in a relation.
Consider the relation schema EMP_PROJ. From the semantics of the attributes and the
relation, we know that the following functional dependencies should hold:
a. Ssn → Ename
b. Pnumber → {Pname, Plocation}
c. {Ssn, Pnumber} → Hours
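The definition can be checked mechanically: X → Y holds in a relation instance exactly when no two tuples agree on X but differ on Y. A small sketch in Python follows; the EMP_PROJ rows below are hypothetical:

```python
def fd_holds(rows, X, Y):
    """Return True if the functional dependency X -> Y holds in `rows`.

    rows: list of dicts mapping attribute name -> value
    X, Y: lists of attribute names
    """
    seen = {}
    for t in rows:
        x_val = tuple(t[a] for a in X)
        y_val = tuple(t[a] for a in Y)
        if x_val in seen and seen[x_val] != y_val:
            return False  # two tuples agree on X but disagree on Y
        seen[x_val] = y_val
    return True

# Hypothetical EMP_PROJ instance.
emp_proj = [
    {"Ssn": "123", "Ename": "Smith", "Pnumber": 1, "Hours": 32},
    {"Ssn": "123", "Ename": "Smith", "Pnumber": 2, "Hours": 8},
    {"Ssn": "453", "Ename": "Joyce", "Pnumber": 1, "Hours": 20},
]
print(fd_holds(emp_proj, ["Ssn"], ["Ename"]))             # True
print(fd_holds(emp_proj, ["Ssn", "Pnumber"], ["Hours"]))  # True
print(fd_holds(emp_proj, ["Pnumber"], ["Hours"]))         # False
```

Note that a check like this can only refute an FD on a given instance; whether the FD should hold in general is decided from the semantics of the attributes, as the text explains.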
Sometimes data redundancy happens by accident while other times it is intentional.
Accidental data redundancy can be the result of a complex process or inefficient coding
while intentional data redundancy can be used to protect data and ensure consistency —
simply by leveraging the multiple occurrences of data for disaster recovery and quality
checks. If data redundancy is intentional, it’s important to have a central field or space for
the data. This allows you to easily update all records of redundant data when necessary.
Four major advantages of Data Redundancy:
Although data redundancy sounds like a negative event, there are many organizations that
can benefit from this process when it’s intentionally built into daily operations.
1. Alternative data backup method
Backing up data involves creating compressed and encrypted versions of data and storing
it in a computer system or the cloud. Data redundancy offers an extra layer of protection
and reinforces the backup by replicating data to an additional system. It’s often an
advantage when companies incorporate data redundancy into their disaster recovery plans.
2. Better data security
Data security relates to protecting data, in a database or a file storage system, from
unwanted activities such as cyberattacks or data breaches. Having the same data stored in
two or more separate places can protect an organization in the event of a cyberattack or
breach — an event which can result in lost time and money, as well as a damaged
reputation.
3. Faster data access and updates
When data is redundant, employees enjoy fast access and quick updates because the
necessary information is available on multiple systems. This is particularly important for
customer service-based organizations whose customers expect promptness and efficiency.
4. Improved data reliability
Data that is reliable is complete and accurate. Organizations can use data redundancy to
double check data and confirm it’s correct and completed in full — a necessity when
interacting with customers, vendors, internal staff, and others.
Although there are noteworthy advantages of intentional data redundancy, there are also
several significant drawbacks when organizations are unaware of its presence.
Possible data inconsistency
Data redundancy occurs when the same piece of data exists in multiple places, whereas
data inconsistency is when the same data exists in different formats in multiple tables.
Unfortunately, data redundancy can cause data inconsistency, which can provide a
company with unreliable and/or meaningless information.
Increase in data corruption
Data corruption is when data becomes damaged as a result of errors in writing, reading,
storage, or processing. When the same data fields are repeated in a database or file storage
system, the chances of data corruption increase. If a file gets corrupted, for example, and an employee tries
to open it, they may get an error message and not be able to complete their task.
Increase in database size
Data redundancy may increase the size and complexity of a database — making it more of
a challenge to maintain. A larger database can also lead to longer load times and a great
deal of headaches and frustrations for employees as they’ll need to spend more time
completing daily tasks.
Increase in cost
When more data is created due to data redundancy, storage costs suddenly increase. This
can be a serious issue for organizations who are trying to keep costs low in order to increase
profits and meet their goals. In addition, implementing a database system can become more
expensive.
There are four informal measures of quality for relation schema design.
Semantics of the Relation Attributes- The easier it is to explain the semantics of the
relation, the better the relation schema design will be.
GUIDELINE 1: Design a relation schema so that it is easy to explain its meaning. Do not
combine attributes from multiple entity types and relationship types into a single relation.
Intuitively, if a relation schema corresponds to one entity type or one relationship type, the
meaning tends to be clear. Otherwise, the relation corresponds to a mixture of multiple
entities and relationships and hence becomes semantically unclear.
Example: A relation involves two entities- poor design.
EMP DEPT
ENAME SSN BDATE ADDRESS DNUMBER DNAME DMGRSSN
Consider the two relation schemas EMP_LOCS and EMP_PROJ1 in Figure 5.1(a). A tuple
in EMP_LOCS means that the employee whose name is ENAME works on some project
located at PLOCATION.
Figure 5.1 (a): The two relation schemas EMP_LOCS and EMP_PROJ1
Figure 5.1 (b) The result of projecting the extension of EMP_PROJ form Figure 5.1(a) on
the relations EMP_LOCS and EMP_PROJ1
Update anomalies for base relations EMP DEPT and EMP PROJ in Figure 5.1
Insertion anomalies: For EMP DEPT relation in Figure 5.1
To insert a new employee tuple, we need to make sure that the values of
attributes DNUMBER, DNAME, and DMGRSSN are consistent with those of other
employees (tuples) in EMP DEPT.
It is difficult to insert a new department that has no employees as yet in the EMP
DEPT relation.
Deletion anomalies: If we delete from EMP DEPT an employee tuple that happens
to represent the last employee working for a particular department, the information
concerning that department is lost from the database.
Modification anomalies: If we update the value of MGRSSN in a particular
department, we must update the tuples of all employees who work in that
department; otherwise, the database will become inconsistent.
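A small sqlite3 sketch of the deletion anomaly described above follows; the EMP DEPT rows are hypothetical:

```python
import sqlite3

# EMP_DEPT stores employee and department facts together (hypothetical rows).
con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE emp_dept "
            "(ename TEXT, ssn TEXT, dnumber INTEGER, dname TEXT, dmgrssn TEXT)")
cur.executemany("INSERT INTO emp_dept VALUES (?, ?, ?, ?, ?)",
                [("Smith",  "123", 5, "Research", "333"),
                 ("Wong",   "999", 5, "Research", "333"),
                 ("Zelaya", "453", 4, "Admin",    "987")])

# Deletion anomaly: removing the last Admin employee silently loses the
# fact that department 4 ("Admin") exists at all.
cur.execute("DELETE FROM emp_dept WHERE ssn = '453'")
depts = cur.execute("SELECT DISTINCT dnumber FROM emp_dept").fetchall()
print(depts)  # [(5,)] -- department 4 is gone from the database
```

In a properly decomposed design, department facts would live in their own relation and would survive the deletion of the employee tuple.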
b) Lossy decomposition
c) Lossless-join decomposition
d) Both Lossy and Lossy-join decomposition
5. Suppose relation R(A,B,C,D,E) has the following functional dependencies:
A -> B
B -> C
BC -> A
A -> D
E -> A
D -> E
Which of the following is not a key?
a) A
b) E
c) B, C
d) D
5.7 SUMMARY
Functional dependency (FD) is a set of constraints between two attributes in a relation.
Functional dependency states that if two tuples have the same values for attributes A1, A2,...,
An, then those two tuples must also have the same values for attributes B1, B2, ..., Bn.
Functional dependency is represented by an arrow sign (→) that is, X→Y, where X
functionally determines Y. The left-hand side attributes determine the values of attributes
on the right-hand side. Database normalization is the process of efficiently organizing data
in a database so that redundant data is eliminated. This process can ensure that all of a
company’s data looks and reads similarly across all records. By implementing data
normalization, an organization standardizes data fields such as customer names, addresses,
and phone numbers. Normalizing data involves organizing the columns and tables of a
database to make sure their dependencies are enforced correctly. The “normal form” refers
to the set of rules for normalizing data, and a database is known as “normalized” if it’s free
of delete, update, and insert anomalies. When it comes to normalizing data, each company
has their own unique set of criteria. Therefore, what one organization believes to be
“normal,” may not be “normal” for another organization. For instance, one company may
want to normalize the state or province field with two digits, while another may prefer the
full name. Regardless, database normalization can be the key to reducing data redundancy
across any company.
Efficient data redundancy is possible. Many organizations like home improvement
companies, real estate agencies, and companies focused on customer interactions have
customer relationship management (CRM) systems. When a CRM system is integrated
with another business software like an accounting software that combines customer and
financial data, redundant manual data is eliminated, leading to more insightful reports and
improved customer service. Database management systems are also used in a variety of
organizations. They receive direction from a database administrator (DBA) and allow the
system to load, retrieve, or change existing data from the systems. Database management
systems adhere to the rules of normalization, which reduces data redundancy. Hospitals,
nursing homes, and other healthcare entities use database management systems to generate
reports that provide useful information for physicians and other employees. When data
redundancy is efficient and does not lead to data inconsistency, these systems can alert
healthcare providers of rises in denial claim rates, how successful a certain medication is,
and other important pieces of information.
5.8 KEYWORDS
AXIOM - Axioms are a set of inference rules used to infer all the functional
dependencies on a relational database.
DECOMPOSITION- It is a rule that suggests if you have a table that appears to
contain two entities which are determined by the same primary key then you should
consider breaking them up into two different tables.
DEPENDENT - It is displayed on the right side of the functional dependency
diagram.
UNION - It suggests that if two tables are separate, and the PK is the same, you
should consider putting them together.
DETERMINANT - It is displayed on the left side of the functional dependency
Diagram.
5. Discuss with example the redundancy in functional dependency.
SUBJECT: RELATIONAL DATABASE MANAGEMENT
SYSTEM
COURSE CODE: BCA-244 AUTHOR: DR. DEEPAK NANDAL
STRUCTURE
6.1 Introduction
6.2 Definition
6.6 Summary
6.7 Keywords
To know the characteristics of functional dependencies
6.1 INTRODUCTION
Relational database design ultimately produces a set of relations. The implicit goals of the
design activity are: information preservation and minimum redundancy. So we first need to
focus on the informal design guidelines for relation schemas.
Four informal guidelines that may be used as measures to determine the quality of relation
schema design:
Making sure that the semantics of the attributes is clear in the schema
Reducing the redundant information in tuples
Reducing the NULL values in tuples
Disallowing the possibility of generating spurious tuples
The semantics of a relation refers to its meaning resulting from the interpretation of
attribute values in a tuple. The relational schema design should have a clear meaning.
Guideline 1:
1. Design a relation schema so that it is easy to explain its meaning.
2. Do not combine attributes from multiple entity types and relationship types into a single
relation.
One goal of schema design is to minimize the storage space used by the base relations (and
hence the corresponding files). Grouping attributes into relation schemas has a significant
effect on storage space. Storing natural joins of base relations leads to an additional problem
referred to as update anomalies. These are: insertion anomalies, deletion anomalies, and
modification anomalies.
Insertion Anomalies:
When the insertion of a new tuple is not done properly, it can make the database
inconsistent.
When the insertion of a new tuple introduces a NULL value (for example a
department in which no employee works as of yet). This will violate the
integrity constraint of the table since ESsn is a primary key for the table.
Deletion Anomalies:
The problem of deletion anomalies is related to the second insertion anomaly situation just
discussed. Example: If we delete from EMP_DEPT an employee tuple that happens to
represent the last employee working for a particular department, the information
concerning that department is lost from the database.
Modification Anomalies happen if we fail to update all tuples as a result in the change in
a single one. Example: if the manager changes for a department, all employees who work
for that department must be updated in all the tables. It is easy to see that these three
anomalies are undesirable and cause difficulties to maintain consistency of data as well as
require unnecessary updates that can be avoided; hence
Guideline 2
Design the base relation schemas so that no insertion, deletion, or modification anomalies
are present in the relations. If any anomalies are present, note them clearly and make sure
that the programs that update the database will operate correctly. The second guideline is
consistent with and, in a way, a restatement of the first guideline.
Fat Relations: A relation in which too many attributes are grouped. If many of the attributes
do not apply to all tuples in the relation, we end up with many NULLs in those tuples. This
can waste space at the storage level and may also lead to problems with understanding the
meaning of the attributes and with specifying JOIN operations at the logical level. Another
problem with NULLs is how to account for them when aggregate operations such as
COUNT or SUM are applied. SELECT and JOIN operations involve comparisons; if
NULL values are present, the results may become unpredictable. Moreover, NULLs can
have multiple interpretations, such as the following:
The attribute does not apply to this tuple. For example, Visa_status may not apply
to U.S. students.
The attribute value for this tuple is unknown. For example, the Date_of_birth may
be unknown for an employee.
The value is known but absent; that is, it has not been recorded yet. For example,
the Home_Phone_Number for an employee may exist, but may not be available and
recorded yet. Having the same representation for all NULLs compromises the
different meanings they may have. Therefore, we may state another guideline.
Guideline 3
As much as possible, avoid placing attributes in a base relation whose values may
frequently be NULL. If NULLs are unavoidable, make sure that they apply in exceptional
cases only.
For example, if only 15 percent of employees have individual offices, there is little
justification for including an attribute Office_number in the EMPLOYEE relation; rather,
a relation EMP_OFFICES(Essn, Office_number) can be created.
Guideline 4
Design relation schemas so that they can be joined with equality conditions on attributes
that are appropriately related (primary key, foreign key) pairs in a way that guarantees that
no spurious tuples are generated. Avoid relations that contain matching attributes that are
not (foreign key, primary key) combinations because joining on such attributes may
produce spurious tuples.
6.2 DEFINITION
Functional Dependency (FD) determines the relation of one attribute to another attribute in
a database management system (DBMS). Functional dependency helps you to
maintain the quality of data in the database. A functional dependency is denoted by an
arrow →. The functional dependency of Y on X is represented by X → Y. Functional
Dependency plays a vital role to find the difference between good and bad database design.
Example:
1 Dana 50000 San Francisco
In this example, if we know the value of Employee number, we can obtain Employee Name,
city, salary, etc. By this, we can say that the city, Employee Name, and salary are
functionally dependent on Employee number.
A functional dependency A → B holds in a relation if any two tuples that have the same
value of attribute A also have the same value for attribute B. For example, in the relation
STUDENT shown in Table 1, some functional dependencies hold while others do not.
Functional Dependency Set: The functional dependency set (FD set) of a relation is the set
of all FDs present in the relation, such as the FD set for the relation STUDENT shown in
Table 1.
Dependencies in DBMS is a relation between two or more attributes. It has the following
types in DBMS –
Fully-Functional Dependency
Transitive Dependency
Multivalued Dependency
Partial Dependency
Example:
<ProjectCost>
ProjectID ProjectCost
001 1000
002 5000
<EmployeeProject>
EmpID ProjectID Days (spent on the project)
E099 001 320
E056 002 190
Example:
Book → Author: Here, the Book attribute determines the Author attribute. If you
know the book name, you can learn the author's name. However, Author does not
determine Book, because an author can write multiple books. For example, just
because we know the author's name Orson Scott Card, we still don't know the book
name.
Author → Author_Nationality: Likewise, the Author attribute determines
the Author_Nationality, but not the other way around; just because we know the
nationality does not mean we can determine the author.
Multivalued Dependency: A multivalued dependency is denoted by a double arrow (→→).
Example:
P →→ Q
Q →→ R
The Second Normal Form (2NF) eliminates partial dependencies. Let us see an example −
<StudentProject>
The prime attributes are StudentID and ProjectID. As stated, the non-prime attributes,
i.e. StudentName and ProjectName, are each functionally dependent on only part of a
candidate key, making them partially dependent. StudentName can be determined by
StudentID alone, which makes the relation partially dependent; likewise, ProjectName can
be determined by ProjectID alone.
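The decomposition that 2NF calls for can be sketched in a few lines of Python; the <StudentProject> rows below are hypothetical:

```python
# The <StudentProject> relation mixes two entities: StudentName depends only on
# StudentID, and ProjectName only on ProjectID (partial dependencies on the
# composite key). The rows below are hypothetical.
student_project = [
    {"StudentID": "S89", "StudentName": "Olivia", "ProjectID": "P09", "ProjectName": "Geo"},
    {"StudentID": "S76", "StudentName": "Jacob",  "ProjectID": "P07", "ProjectName": "Cluster"},
]

# 2NF decomposition: every non-prime attribute now depends on a whole key.
students  = {(r["StudentID"], r["StudentName"]) for r in student_project}
projects  = {(r["ProjectID"], r["ProjectName"]) for r in student_project}
enrolment = {(r["StudentID"], r["ProjectID"]) for r in student_project}

print(sorted(students))  # [('S76', 'Jacob'), ('S89', 'Olivia')]
print(sorted(projects))  # [('P07', 'Cluster'), ('P09', 'Geo')]
```

Joining the three smaller relations back on StudentID and ProjectID reproduces the original rows, so no information is lost by the split.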
There is a one-to-one relationship between the left-hand side and right-hand side
attributes
Holds for all time
The determinant has the minimal number of necessary attributes
6.6 SUMMARY
A functional dependency is a constraint between two sets of attributes from the database.
Suppose that our relational database schema has n attributes A1, A2, ..., An, and think of
the whole database as being described by a single universal relation schema R = {A1, A2,
... , An}. A functional dependency, denoted by X → Y, between two sets of attributes X
and Y that are subsets of R, specifies that for any two tuples t1 and t2 in r with t1[X] =
t2[X], we must also have t1[Y] = t2[Y].
This means that the values of the Y component of a tuple in r depend on, or are determined
by, the values of the X component; we say that the values of the X component of a tuple
uniquely (or functionally) determine the values of the Y component. We say that there is a
functional dependency from X to Y, or that Y is functionally dependent on X.
Functional dependency is represented as FD or f.d. The set of attributes X is called the left-
hand side of the FD, and Y is called the right-hand side. X functionally determines Y in a
relation schema R if, and only if, whenever two tuples of r(R) agree on their X-value, they
must necessarily agree on their Y-value. If a constraint on R states that there cannot be
more than one tuple with a given X-value in any relation instance r(R)—that is, X is a
candidate key of R—this implies that X → Y for any subset of attributes Y of R. If X is a
candidate key of R, then X → R.
6.7 KEYWORDS
AXIOM- An axiom or postulate is a statement that is taken to be true, to serve as a
premise or starting point for further reasoning and arguments. The word comes from
the Greek axíōma (ἀξίωμα) 'that which is thought worthy or fit' or 'that which
commends itself as evident.
TRIVIAL − If a functional dependency (FD) X → Y holds, where Y is a subset of
X, then it is called a trivial FD. Completely non-trivial − If an FD X → Y holds,
where X ∩ Y = Φ, it is said to be a completely non-trivial FD.
FOREIGN KEY- Foreign keys are the columns of a table that point to the primary
key of another table. They act as a cross-reference between tables.
JOIN DEPENDENCY- A join dependency is a constraint on the set of legal
relations over a database scheme. A table is subject to a join dependency if it can
always be recreated by joining multiple tables, each having a subset of the attributes
of the original table.
6.8 SELF-ASSESSMENT TEST
1. What do you mean by fully functional dependency?
2. Write a short note on transitive dependency, give an example of transitive
dependency.
3. What is functional dependency and its types?
4. Discuss the characteristics of FD?
5. What is the use of functional dependency in RDBMS?
SUBJECT: RELATIONAL DATABASE MANAGEMENT
SYSTEM
COURSE CODE: BCA-244 AUTHOR: DR. DEEPAK NANDAL
STRUCTURE
7.1 Introduction
7.3 Decomposition
7.9 Summary
7.10 Keywords
To understand the concept of Normalization in removing anomalies in
database
Study and learn decomposition methods and different forms of Normalization
7.1 INTRODUCTION
NORMALIZATION is a database design technique that reduces data redundancy and
eliminates undesirable characteristics like Insertion, Update and Deletion Anomalies.
Normalization rules divide larger tables into smaller tables and link them using
relationships. The purpose of Normalization in SQL is to eliminate redundant (repetitive)
data and ensure data is stored logically. The inventor of the relational model Edgar Codd
proposed the theory of normalization with the introduction of the First Normal Form, and
he continued to extend the theory with the Second and Third Normal Forms. Later he joined
Raymond F. Boyce to develop the theory of Boyce-Codd Normal Form.
There are three types of anomalies that occur when the database is not normalized. These
are – Insertion, update and deletion anomaly. Let’s take an example to understand this.
Example: Suppose a manufacturing company stores the employee details in a table named
employee that has four attributes: emp_id for storing employee’s id, emp_name for storing
employee’s name, emp_address for storing employee’s address and emp_dept for storing
the department details in which the employee works. At some point of time the table looks
like this in table 7.1:
emp_id | emp_name | emp_address | emp_dept
166 | Glenn | Chennai | D900
Update anomaly: In the above table we have two rows for employee Rick as he belongs
to two departments of the company. If we want to update the address of Rick then we have
to update the same in two rows or the data will become inconsistent. If somehow, the correct
address gets updated in one department but not in other then as per the database, Rick would
be having two different addresses, which is not correct and would lead to inconsistent data.
Insert anomaly: Suppose a new employee joins the company, who is under training and
currently not assigned to any department then we would not be able to insert the data into
the table if emp_dept field doesn’t allow nulls.
Delete anomaly: Suppose, if at a point of time the company closes the department D890
then deleting the rows that are having emp_dept as D890 would also delete the information
of employee Maggie since she is assigned only to this department.
To understand (RDBMS) normalization in the database with example tables, let's assume
that we are supposed to store the details of courses and instructors in a university. Here is
what a sample database could look like:
Course code | Course venue | Instructor name | Instructor's phone
CS101 | Lecture Hall 20 | Prof. George | +91 6514821924
Here, the data basically stores the course code, course venue, instructor name, and
instructor’s phone number. At first, this design seems to be good. However, issues start to
develop once we need to modify information. For instance, suppose, if Prof. George
changed his mobile number. In such a situation, we will have to make edits in 2 places.
What if someone just edited the mobile number against CS101, but forgot to edit it for
CS154? This will lead to stale/wrong information in the database.
This problem, however, can be easily tackled by dividing our table into 2 simpler tables:
Table 1 (Instructor):
1. Instructor ID
2. Instructor Name
3. Instructor mobile number
Table 2 (Course):
Course code
Course venue
Instructor ID
Table 1 (Instructor):
Instructor ID | Instructor name | Instructor number

Table 2 (Course):
Course code | Course venue | Instructor ID
CS154 | CS Auditorium | 1
Basically, we store the instructors separately and in the course table, we do not store the
entire data of the instructor. We rather store the ID of the instructor. Now, if someone wants
to know the mobile number of the instructor, he/she can simply look up the instructor table.
Also, if we were to change the mobile number of Prof. George, it can be done in exactly
one place. This avoids the stale/wrong data problem.
Further, if you observe, the mobile number now need not be stored 2 times. We
have stored it at just 1 place. This also saves storage. This may not be obvious in the above
simple example. However, think about the case when there are hundreds of courses and
instructors and for each instructor, we have to store not just the mobile number, but also
other details like office address, email address, specialization, availability, etc. In such a
situation, replicating so much data will increase the storage requirement unnecessarily. The
above is a simplified example of how database normalization works. We will now more
formally study it.
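As a rough sketch of this decomposition (the table and column names below are our own, chosen to match the running example), using Python's built-in sqlite3 module:

```python
import sqlite3

# Sketch of the decomposition: instructor details live in one table,
# and courses reference the instructor by ID.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE instructor (
        instructor_id INTEGER PRIMARY KEY,
        name          TEXT NOT NULL,
        mobile        TEXT
    );
    CREATE TABLE course (
        course_code   TEXT PRIMARY KEY,
        venue         TEXT,
        instructor_id INTEGER REFERENCES instructor(instructor_id)
    );
    INSERT INTO instructor VALUES (1, 'Prof. George', '+91 6514821924');
    INSERT INTO course VALUES ('CS101', 'Lecture Hall 20', 1);
    INSERT INTO course VALUES ('CS154', 'CS Auditorium', 1);
""")

# The mobile number is updated in exactly one place ...
conn.execute("UPDATE instructor SET mobile = '+91 9999999999' WHERE instructor_id = 1")

# ... and every course row sees the new value through the join.
rows = conn.execute("""
    SELECT c.course_code, i.mobile
    FROM course AS c JOIN instructor AS i ON c.instructor_id = i.instructor_id
    ORDER BY c.course_code
""").fetchall()
print(rows)
```

After the single UPDATE, both course rows see the new number through the join, which is exactly the stale-data problem the decomposition avoids.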
1. First Normal Form
2. Second Normal Form
3. Third Normal Form
4. BCNF
5. Fourth Normal Form
7.3 DECOMPOSITION
Definition. The normal form of a relation refers to the highest normal form condition that
it meets, and hence indicates the degree to which it has been normalized. Normal forms,
when considered in isolation from other factors, do not guarantee a good database design.
It is generally not sufficient to check separately that each relation schema in the database
is, say, in BCNF or 3NF. Rather, the process of normalization through decomposition must
also confirm the existence of additional properties that the relational schemas, taken
together, should possess. These include:
The dependency preservation property, which ensures that each functional dependency
is represented in some individual relation resulting after decomposition.
The nonadditive join or lossless join property, which guarantees that the spurious
tuple generation problem does not occur with respect to the relation schemas created
after decomposition.
In fact Normalization is carried out in practice so that the resulting designs are of high
quality and meet the desirable properties stated previously. The practical utility of these
normal forms becomes questionable when the constraints on which they are based are rare,
and hard to understand or to detect by the database designers and users who must discover
these constraints. Thus, database design as practiced in industry today pays particular
attention to normalization only up to 3NF, BCNF, or at most 4NF. Another point worth
noting is that the database designers need not normalize to the highest possible normal
form.
7.4 FIRST NORMAL FORM (1NF)
First normal form (1NF) is now considered to be part of the formal definition of a relation
in the basic (flat) relational model; historically, it was defined to disallow multivalued
attributes, composite attributes, and their combinations. It states that the domain of an
attribute must include only atomic (simple, indivisible) values and that the value of any
attribute in a tuple must be a single value from the domain of that attribute. Hence, 1NF
disallows having a set of values, a tuple of values, or a combination of both as an attribute
value for a single tuple. In other words, 1NF disallows relations within relations or
relations as attribute values within tuples. The only attribute values permitted by 1NF are
single atomic (indivisible) values.
Consider the DEPARTMENT relation schema shown below, whose primary key
is Dnumber, and suppose that we extend it by including the Dlocations attribute as shown.
We assume that each department can have a number of locations. As we can see, the
extended relation is not in 1NF, because Dlocations is not an atomic attribute.
DEPARTMENT
Dnumber | Dname | Dmgr_SSN | Dlocations
There are three main techniques to achieve first normal form for such a relation:
1. Remove the attribute Dlocations that violates 1NF and place it in a separate relation
DEPT_LOCATIONS along with the primary key Dnumber of DEPARTMENT. The primary
key of the new relation is the combination {Dnumber, Dlocation}.
2. Expand the key so that there will be a separate tuple in the original DEPARTMENT
relation for each location of a department. In this case, the primary key becomes the
combination {Dnumber, Dlocation}, but redundancy is introduced in the tuples.
3. If a maximum number of values is known for the attribute—for example, if it is
known that at most three locations can exist for a department—replace the
Dlocations attribute by three atomic attributes: Dlocation1, Dlocation2, and
Dlocation3. This solution has the disadvantage of introducing NULL values if most
departments have fewer than three locations.
Of the three solutions above, the first is generally considered best because it does not suffer
from redundancy and it does not limit the maximum number of values.
Example:
The First normal form simply says that each cell of a table should contain exactly one value.
Let us take an example. Suppose we are storing the courses that a particular instructor takes,
we can store it like this:
Here, the issue is that in the first row, we are storing 2 courses against Prof. George. This
isn’t the optimal way since that’s not how SQL databases are designed to be used. A better
method would be to store the courses separately. For instance:
This way, if we want to edit some information related to CS101, we do not have to touch
the data corresponding to CS154. Also, observe that each row stores unique information.
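The 1NF fix can be sketched in a few lines of Python (the instructor and course values are illustrative, borrowed from the running example):

```python
# A non-1NF cell holding two values, flattened to one (instructor, course) pair per row.
non_1nf = [
    ("Prof. George", "CS101, CS154"),   # two courses in one cell violates 1NF
    ("Prof. Atkins", "MA214"),
]

flat = [
    (name, course.strip())
    for name, courses in non_1nf
    for course in courses.split(",")
]
print(flat)
# [('Prof. George', 'CS101'), ('Prof. George', 'CS154'), ('Prof. Atkins', 'MA214')]
```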
7.5 Second Normal Form (2NF)
Second normal form (2NF) is based on the concept of full functional dependency. A
functional dependency X → Y is a full functional dependency if removal of any attribute
A from X means that the dependency does not hold any more; that is, for any attribute
A ε X, (X − {A}) does not functionally determine Y. A functional dependency X → Y is a
partial dependency if some attribute A ε X can be removed from X and the dependency
still holds. A relation schema R is in 2NF if every nonprime attribute A in R is fully
functionally dependent on the primary key of R. The test for 2NF involves testing for
functional dependencies whose left-hand side attributes are part of the primary key. If the
primary key contains a single attribute, the test need not be applied at all.
If a relation schema is not in 2NF, it can be second normalized or 2NF normalized into a
number of 2NF relations in which nonprime attributes are associated only with the part of
the primary key on which they are fully functionally dependent. The following example
shows how we can decompose a relation not in 2NF into relations which are in 2NF.
For a table to be in second normal form, the following two conditions are to be met:
1. The table should be in first normal form.
2. There should be no partial dependency: every nonprime attribute should be fully
functionally dependent on the whole primary key, not on just a part of it.
The first condition is straightforward since we just studied 1NF. Let us understand the
second condition. A primary key is a set of columns that uniquely identifies a row.
(Figure: (a) a relation not in 2NF, with its functional dependencies FD1, FD2, and FD3
drawn on the attributes.)
Example:
Here, in this table, the course code is unique. So, that becomes our primary key. Let us take
another example of storing student enrollment in various courses. Each student may enroll
in multiple courses. Similarly, each course may have multiple enrollments. A sample table
may look like this (student name and course code):
Student name | Course code
Rajat | CS101
Rahul | CS154
Raman | CS101
Here, the first column is the student name and the second column is the course taken by the
student. Clearly, the student name column isn’t unique as we can see that there are 2 entries
corresponding to the name ‘Rahul’ in row 1 and row 3. Similarly, the course code column
is not unique as we can see that there are 2 entries corresponding to course code CS101 in
row 2 and row 4. However, the tuple (student name, course code) is unique since a student
cannot enroll in the same course more than once. So, these 2 columns when combined form
the primary key for the database.
As per the second normal form definition, our enrollment table above isn’t in the second
normal form. To achieve the same (1NF to 2NF), we can rather break it into 2 tables:
Students:
Here the second column is unique and it indicates the enrollment number for the student.
Clearly, the enrollment number is unique. Now, we can attach each of these enrollment
numbers with course codes.
Courses:
These 2 tables together provide us with the exact same information as our original table.
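The 2NF test can be sketched as a small check over sample rows: a relation violates 2NF when some proper subset of the key already determines a nonprime attribute. The helper below is our own illustration, not a standard API, and it can only detect dependencies that happen to hold in the sample data, not schema-level constraints:

```python
from itertools import combinations

def determines(rows, lhs, rhs):
    """True if, in these rows, equal lhs values always imply equal rhs values."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        if seen.setdefault(key, val) != val:
            return False
    return True

def partial_dependencies(rows, key, nonprime):
    """List (subset-of-key, attribute) pairs that witness a 2NF violation."""
    found = []
    for attr in nonprime:
        for r in range(1, len(key)):
            for subset in combinations(key, r):
                if determines(rows, subset, (attr,)):
                    found.append((subset, attr))
    return found

# Enrollment-style rows where a hypothetical course fee depends only on the
# course, i.e. on part of the (student, course) key.
rows = [
    {"student": "Rahul", "course": "CS101", "fee": 100},
    {"student": "Rajat", "course": "CS101", "fee": 100},
    {"student": "Rahul", "course": "CS154", "fee": 150},
]
print(partial_dependencies(rows, key=("student", "course"), nonprime=("fee",)))
# [(('course',), 'fee')]
```

The reported pair says the fee is determined by the course alone, which is exactly the kind of attribute that should move into a separate course table.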
Third normal form (3NF) is based on the concept of transitive dependency. A functional
dependency X → Y in a relation schema R is a transitive dependency if there exists a set of
attributes Z in R that is neither a candidate key nor a subset of any key of R, and both
X → Z and Z → Y hold. A relation schema R is in 3NF if it
satisfies 2NF and no nonprime attribute of R is transitively dependent on the primary key.
The relation schema EMP_DEPT in Figure (a) below is in 2NF, since no partial
dependencies on a key exist. However, EMP_DEPT is not in 3NF because of the transitive
dependency of Dmgr_ssn (and also Dname) on Ssn via Dnumber. We can normalize
EMP_DEPT by decomposing it into the two 3NF relation schemas shown in Figure (b).
Intuitively, we see that the two relations represent independent entity facts about employees
and departments:
(Figure: (a) the EMP_DEPT relation with its functional dependencies FD1 and FD2;
(b) the two 3NF relations obtained by decomposition, each with its own dependency.)
General Definition
Definition. A relation schema R is in third normal form (3NF) if, whenever a nontrivial
functional dependency X → A holds in R, either (a) X is a superkey of R, or (b) A is a
prime attribute of R. R violates 3NF when a dependency X → A holds that meets neither
condition. This can occur due to two types of problematic functional dependencies: a
partial dependency, where a nonprime attribute depends on only part of a key, and a
transitive dependency, where a nonprime attribute depends on another nonprime attribute.
A partial dependency violates 3NF (and also 2NF), and we can remove it by decomposing
the relation.
Example:
Before we delve into the details of third normal form, let us again understand the
concept of a functional dependency on a table. Column B is said to be functionally
dependent on column A if the value of A determines the value of B; that is, changing
the value of A may require a change in the value of B. As an example, consider the
following table:
Here, the department column is dependent on the professor name column. This is because
if in a particular row, we change the name of the professor, we will also have to change the
department value. As an example, suppose MA214 is now taken by Prof. Ronald who
happens to be from the Mathematics department, the table will look like this:
Here, when we changed the name of the professor, we also had to change the department
column. This is not desirable since someone who is updating the database may remember
to change the name of the professor, but may forget updating the department value. This
can cause inconsistency in the database.
Third normal form avoids this by breaking this into separate tables:
Here, the third column is the ID of the professor who’s taking the course.
Here, in the above table, we store the details of the professor against his/her ID. This way,
whenever we want to reference the professor somewhere, we don’t have to put the other
details of the professor in that table again. We can simply use the ID.
Therefore, in the third normal form, the following conditions are required:
The table should be in the second normal form.
There should not be any transitive dependency: a nonprime attribute must not depend on
another nonprime attribute.
Boyce-Codd normal form (BCNF) was proposed as a simpler form of 3NF, but it was
found to be stricter than 3NF. That is, every relation in BCNF is also in 3NF; however, a
relation in 3NF is not necessarily in BCNF. A relation schema R is in BCNF if, whenever
a nontrivial functional dependency X → A holds in R, X is a superkey of R. This definition
differs from the definition of 3NF in that condition (b) of 3NF, which allows A to be prime,
is absent from BCNF. That makes BCNF a stronger normal form compared to 3NF. In
practice, most relation schemas that are in 3NF are also in BCNF. Only if X → A holds in a
relation schema R with X not being a superkey and A being a prime attribute will R be in
3NF but not in BCNF. Consider an example which shows a relation TEACH(Student,
Course, Instructor) with the following dependencies: FD1: {Student, Course} → Instructor,
and FD2: Instructor → Course. Here {Student, Course} is a candidate key, so FD2 violates
BCNF (Instructor is not a superkey), although TEACH is in 3NF (Course is a prime
attribute).
Boyce-Codd Normal form is a stronger generalization of third normal form. A table is in
Boyce-Codd Normal form if and only if at least one of the following conditions is met for
each functional dependency A → B:
A is a superkey
It is a trivial functional dependency.
Let us first understand what a superkey means. To understand BCNF in DBMS, consider
the following BCNF example table:
Here, the first column (course code) is unique across various rows. So, it is a superkey.
Consider the combination of columns (course code, professor name). It is also unique
across various rows. So, it is also a superkey. A superkey is basically a set of columns such
that the value of that set of columns is unique across various rows. That is, no 2 rows have
the same set of values for those columns. Some of the superkeys for the table above are:
Course code
Course code, professor name
Course code, professor mobile number
A superkey from which no column can be removed without losing uniqueness is called a
candidate key (a minimal superkey). For instance, the first superkey above has just 1
column, while the second one and the last one have 2 columns each and contain the first
as a subset. So, the first superkey (Course code) is a candidate key.
Boyce-Codd Normal Form says that if there is a functional dependency A → B, then either
A is a superkey or it is a trivial functional dependency. A trivial functional dependency
means that all columns of B are contained in the columns of A. For instance, (course code,
professor name) → (course code) is a trivial functional dependency because when we know
the value of course code and professor name, we do know the value of course code and so,
the dependency becomes trivial.
A is a superkey: this means that only and only on a superkey column should it be the case
that there is a dependency of other columns. Basically, if a set of columns (B) can be
determined knowing some other set of columns (A), then A should be a superkey. Superkey
basically determines each row uniquely.
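The uniqueness test behind a superkey is easy to state in code; the helper below is a simple illustration over in-memory rows (our own sketch, not a standard library function):

```python
def is_superkey(rows, columns):
    """A column set is a superkey if no two rows agree on all of its columns."""
    seen = set()
    for row in rows:
        key = tuple(row[c] for c in columns)
        if key in seen:
            return False
        seen.add(key)
    return True

rows = [
    {"course": "CS101", "professor": "Prof. George", "mobile": "111"},
    {"course": "CS154", "professor": "Prof. George", "mobile": "111"},
]

print(is_superkey(rows, ["course"]))               # unique -> superkey
print(is_superkey(rows, ["professor"]))            # repeated value -> not a superkey
print(is_superkey(rows, ["course", "professor"]))  # superkey, but not minimal
```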
7.9 SUMMARY
Normalization of data can be considered a process of analyzing the given relation schemas
based on their FDs and primary keys to achieve the desirable properties of (1) minimizing
redundancy and (2) minimizing the insertion, deletion, and update anomalies. It can be
considered as a “filtering” or “purification” process to make the design have successively
better quality. Unsatisfactory relation schemas that do not meet certain conditions—the
normal form tests—are decomposed into smaller relation schemas that meet the tests and
hence possess the
desirable properties. Thus, the normalization procedure provides database designers with
the following:
■ A formal framework for analyzing relation schemas based on their keys and on the
functional dependencies among their attributes.
■ A series of normal form tests that can be carried out on individual relation schemas so
that the relational database can be normalized to any desired degree.
Database Normalization is a technique of organizing the data in the database.
Normalization is a systematic approach of decomposing tables to eliminate data
redundancy(repetition) and undesirable characteristics like Insertion, Update and Deletion
Anomalies. It is a multi-step process that puts data into tabular form, removing duplicated
data from the relation tables. Normalization is used for mainly two purposes,
Eliminating redundant(useless) data.
Ensuring data dependencies make sense i.e data is logically stored.
7.10 KEYWORDS
SUPERKEY: A superkey is a set of attributes within a table whose values can be
used to uniquely identify a tuple. A candidate key is a minimal set of attributes
necessary to identify a tuple; this is also called a minimal superkey.
ANOMALY: Anomalies are problems that can occur in poorly planned, un-
normalised databases where all the data is stored in one table (a flat-file database).
CANDIDATE KEY: Primary Key is a unique and non-null key which identify a
record uniquely in table. A table can have only one primary key. Candidate key is
also a unique key to identify a record uniquely in a table but a table can have
multiple candidate keys
4NF: Fourth normal form (4NF): Fourth normal form (4NF) is a level of database
normalization where there are no non-trivial multivalued dependencies other than a
candidate key. It builds on the first three normal forms (1NF, 2NF and 3NF) and
the Boyce-Codd Normal Form (BCNF).
5NF: Fifth normal form (5NF), also known as project-join normal form (PJ/NF), is
a level of database normalization designed to reduce redundancy in relational
databases recording multi-valued facts by isolating semantically related multiple
relationships.
SQL
STRUCTURE
8.1 Introduction
8.8 Summary
8.9 Keywords
The objective of this chapter is to make the reader understand the most popular and
widely used query language SQL. This chapter presents the main features of the
SQL standard for commercial relational DBMSs: the main characteristics of SQL,
SQL data types, and SQL literals.
8.1 INTRODUCTION
The SQL language may be considered one of the major reasons for the commercial success
of relational databases. Because it became a standard for relational databases, users were
less concerned about migrating their database applications from other types of database
systems—for example, network or hierarchical systems—to relational systems. This is
because even if the users became dissatisfied with the particular relational DBMS product
they were using, converting to another relational DBMS product was not expected to be
too expensive and time-consuming because both systems followed the same language
standards. However, the relational algebra operations are considered to be too technical for
most commercial DBMS users because a query in relational algebra is written as a sequence
of operations that, when executed, produces the required result. Hence, the user must
specify how—that is, in what order—to execute the query operations. On the other hand,
the SQL language provides a higher-level declarative language interface, so the user only
specifies what the result is to be, leaving the actual optimization and decisions on how to
execute the query to the DBMS. Although SQL includes some features from relational
algebra, it is based to a greater extent on the tuple relational calculus.
The name SQL is presently expanded as Structured Query Language. Originally,
SQL was called SEQUEL (Structured English QUEry Language) and was designed and
implemented at IBM Research as the interface for an experimental relational database
system called SYSTEM R. SQL is now the standard language for commercial relational
DBMSs. A joint effort by the American National Standards Institute (ANSI) and the
International Standards Organization (ISO) has led to a standard version of SQL (ANSI
1986), called SQL-86 or SQL1. A revised and much expanded standard called SQL-92
(also referred to as SQL2) was subsequently developed. The next standard that is well-
recognized is SQL:1999, which started out as SQL3. Two later updates to the standard are
SQL:2003 and SQL:2006, which added XML features among other updates to the
language. Another update in 2008 incorporated more object database features into SQL. SQL
is a comprehensive database language: It has statements for data definitions, queries, and
updates. Hence, it is both a DDL and a DML. In addition, it has facilities for defining views
on the database, for specifying security and authorization, for defining integrity constraints,
and for specifying transaction controls. It also has rules for embedding SQL statements into
a general-purpose programming language such as Java, COBOL, or C/C++.
8.2 DEFINITION
SQL uses the terms table, row, and column for the formal relational model terms relation,
tuple, and attribute, respectively. Unlike most programming languages, SQL is unique in
that it is not procedural but declarative in nature. This means that when using this language
one states what data is desired and not how to get that data. A component within the
database server known as the optimizer will automatically determine how to get the data
most efficiently. Therefore the user may concentrate solely on what data is desired and then
allow the database to automatically select the optimum method by which to retrieve that
data. The SQL language has several aspects to it:
The Data Definition Language (DDL): This subset of SQL supports the creation,
deletion, and modification of definitions for tables and views. Integrity constraints
can be defined on tables, either when the table is created or later. The DDL also
provides commands for specifying access rights or privileges to tables and views.
Although the standard does not discuss indexes, commercial implementations also
provide commands for creating and deleting indexes.
The Data Manipulation Language (DML): This subset of SQL allows users to
pose queries and to insert, delete, and modify rows.
Embedded and dynamic SQL: Embedded SQL features allow SQL code to be
called from a host language such as C or COBOL. Dynamic SQL features allow a
query to be constructed (and executed) at run-time.
Triggers: The new SQL:1999 standard includes support for triggers, which are
actions executed by the DBMS whenever changes to the database meet conditions
specified in the trigger.
Security: SQL provides mechanisms to control users' access to data objects such
as tables and views.
Transaction management: Various commands allow a user to explicitly control
aspects of how a transaction is to be executed.
Client-server execution and remote database access: These commands control
how a client application program can connect to an SQL database server, or access
data from a database over a network.
8.4 SQL DATA TYPES
The basic data types available for attributes include numeric, character string, bit
string, Boolean, date, and time.
■ Numeric data types include integer numbers of various sizes (INTEGER or INT, and
SMALLINT) and floating-point (real) numbers of various precision (FLOAT or REAL,
and DOUBLE PRECISION). Formatted numbers can be declared by using
DECIMAL(i,j)—or DEC(i,j) or NUMERIC(i,j)—where i, the precision, is the total number
of decimal digits and j, the scale, is the number of digits after the decimal point. The default
for scale is zero, and the default for precision is implementation-defined.
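The precision/scale distinction can be mimicked with Python's decimal module (this is an analogy for how a DECIMAL(5,2) column behaves, not SQL itself):

```python
from decimal import Decimal, ROUND_HALF_UP

# Scale 2: values are kept with exactly two digits after the decimal point.
value = Decimal("123.456")
scaled = value.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
print(scaled)  # 123.46

# Precision 5 limits the total number of digits: 1234.56 needs six digits,
# so a DECIMAL(5,2) column would reject it.
```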
■ Character-string data types are either fixed length—CHAR(n) or CHARACTER(n),
where n is the number of characters—or varying length— VARCHAR(n) or CHAR
VARYING(n) or CHARACTER VARYING(n), where n is the maximum number of
characters. When specifying a literal string value, it is placed between single quotation
marks (apostrophes), and it is case sensitive (a distinction is made between uppercase and
lowercase). For fixed-length strings, a shorter string is padded with blank characters to the
right. For example, if the value ‘Sudha’ is for an attribute of type CHAR(10), it is padded
with five blank characters to become ‘Sudha     ’ if needed. Padded blanks are generally
ignored when strings are compared.
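Blank-padding and padding-insensitive comparison can be sketched with ordinary strings (an analogy for CHAR(n) behaviour; the helper names are our own):

```python
def char_pad(s, n):
    """Pad with blanks on the right, the way a CHAR(n) column stores a short string."""
    return s.ljust(n)

def char_equal(a, b):
    """Compare while ignoring trailing blanks, as many SQL systems do for CHAR."""
    return a.rstrip(" ") == b.rstrip(" ")

padded = char_pad("Sudha", 10)
print(repr(padded))                  # 'Sudha     ' with five trailing blanks
print(char_equal(padded, "Sudha"))   # True
```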
■ Bit-string data types are either of fixed length n—BIT(n)—or varying length—BIT
VARYING(n), where n is the maximum number of bits. The default for n, the length of a
character string or bit string, is 1. Literal bit strings are placed between single quotes but
preceded by a B to distinguish them from character strings; for example, B‘10101’.
Another variable-length bit-string data type called BINARY LARGE OBJECT or BLOB is
also available
to specify columns that have large binary values, such as images. As for CLOB, the
maximum length of a BLOB can be specified in kilobits (K), megabits (M), or gigabits (G).
For example, BLOB(30G) specifies a maximum length of 30 gigabits.
■ A Boolean data type has the traditional values of TRUE or FALSE. In SQL, because of
the presence of NULL values, a three-valued logic is used, so a third possible value for a
Boolean data type is UNKNOWN.
■ The DATE data type has ten positions, and its components are YEAR, MONTH, and
DAY in the form YYYY-MM-DD. The TIME data type has at least eight positions, with
the components HOUR, MINUTE, and SECOND in the form HH:MM:SS. Only valid dates
and times should be allowed by the SQL implementation. This implies that months should
be between 1 and 12 and days must be between 1 and 31; furthermore, a date should be a
valid date for the corresponding month. The < (less than) comparison can be used with
dates or times—an earlier date is considered to be smaller than a later date, and similarly
with time.
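Because the YYYY-MM-DD form orders lexicographically the same way it orders chronologically, the < comparison works as described. A quick check, using sqlite3 as one illustrative SQL engine:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# In ISO form, string order equals chronological order, so the earlier
# date compares as smaller and the expression is true (1).
earlier, = conn.execute("SELECT DATE('2008-09-26') < DATE('2008-09-27')").fetchone()
print(earlier)  # 1
```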
Some additional data types are discussed below. The list of types discussed here is not
exhaustive; different implementations have added more data types to SQL.
■ A timestamp data type (TIMESTAMP) includes the DATE and TIME fields, plus a
minimum of six positions for decimal fractions of seconds and an optional WITH TIME
ZONE qualifier. Literal values are represented by single quoted strings preceded by the
keyword TIMESTAMP, with a blank space between date and time; for example,
TIMESTAMP ‘2008-09-27 09:12:47.648302’.
■ Another data type related to DATE, TIME, and TIMESTAMP is the INTERVAL data
type. This specifies an interval—a relative value that can be used to increment or
decrement an absolute value of a date, time, or timestamp. Intervals are qualified to be
either YEAR/MONTH intervals or DAY/TIME intervals. The format of DATE, TIME, and
TIMESTAMP can be considered as a special type of string. Hence, they can generally be
used in string comparisons by being cast (or coerced or converted) into the equivalent
strings.
Data Literal: A program source element that represents a data value. Data literals can be
divided into multiple groups depending upon the type of data they represent and how
they represent it.
1. Character String Literals are used to construct character strings, exact numbers,
approximate numbers and data and time values. The syntax rules of character string literals
are pretty simple:
A character string literal is a sequence of characters enclosed by quote characters.
The quote character is the single quote character "'".
If "'" is part of the sequence, it needs to be doubled as "''".
Examples of character string literals:
'Hello'
'world!'
'Loews'
'123'
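The doubling rule for an embedded quote can be checked directly (sqlite3 shown as one convenient SQL engine):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The literal 'It''s a string' contains one single quote, written twice.
text, = conn.execute("SELECT 'It''s a string'").fetchone()
print(text)  # It's a string
```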
2. Hex String Literals are used to construct character strings and exact numbers.
Hexadecimal literals consist of 0 to 62000 hexadecimal digits delimited by a matching pair
of single quotes, where a hexadecimal digit is a character from 0 to 9, a to f, or A to F. The
syntax rules for hex string literals are also very simple:
A hex string literal is a sequence of hex digits enclosed by quote characters and
prefixed with "x".
The quote character is the single quote character "'".
Examples of hex string literals:
x'41423534'
x'57664873'
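The correspondence between a hex string and the bytes it denotes can be checked in Python (the x'…' literal syntax itself varies between SQL implementations):

```python
# Each pair of hex digits is one byte: 41 42 35 34 are the ASCII codes
# for 'A', 'B', '5', '4'.
data = bytes.fromhex("41423534")
print(data)                   # b'AB54'
print(data.decode("ascii"))   # AB54
```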
3. Numeric Literals are used to construct exact numbers and approximate numbers. A
numeric literal is a string of 1 to 40 characters selected from the following:
• plus sign
• minus sign
• digits 0 through 9
• decimal point
Numeric literals are also referred to as numeric constants. Syntax rules of numeric literals
are:
A numeric literal can be written in signed integer form, signed real numbers without
exponents, or real numbers with exponents.
Examples of numeric literals:
1
22.33
-345
4. Date and Time Literals are used to construct date and time values. The syntax rules of
date and time literals are:
A date literal is written in the form "DATE 'yyyy-mm-dd'".
A timestamp literal is written in the form "TIMESTAMP 'yyyy-mm-dd hh:mm:ss'".
Examples of date and time literals:
DATE '2013-07-15'
TIMESTAMP '2013-07-15 01:02:03'
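The bodies of these literals follow fixed patterns, so they can be validated with Python's datetime module (an illustrative check, not part of SQL):

```python
from datetime import datetime

# Parse the body of a DATE literal and of a TIMESTAMP literal.
d = datetime.strptime("2013-07-15", "%Y-%m-%d").date()
ts = datetime.strptime("2013-07-15 01:02:03", "%Y-%m-%d %H:%M:%S")

print(d.isoformat())           # 2013-07-15
print(ts.isoformat(sep=" "))   # 2013-07-15 01:02:03
```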
Following are some of the most commonly used constraints available in SQL:
NOT NULL Constraint: Ensures that a column cannot have a NULL value.
DEFAULT Constraint: Provides a default value for a column when none is specified.
UNIQUE Constraint: Ensures that all the values in a column are different.
CHECK Constraint: The CHECK constraint ensures that all values in a column satisfy
certain conditions.
INDEX: Used to create and retrieve data from the database very quickly.
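The constraints above can be exercised together in one table definition; the sketch below uses sqlite3, and the table and column names (including the default city) are illustrative, not from the text:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE student (
        roll_no INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,                            -- NOT NULL
        email   TEXT UNIQUE,                              -- UNIQUE
        city    TEXT DEFAULT 'Rohtak',                    -- DEFAULT
        marks   INTEGER CHECK (marks BETWEEN 0 AND 100)   -- CHECK
    )
""")
conn.execute("INSERT INTO student (roll_no, name, email, marks) "
             "VALUES (1, 'Asha', 'asha@example.com', 91)")

# The DEFAULT fills in the omitted city column.
city, = conn.execute("SELECT city FROM student WHERE roll_no = 1").fetchone()
print(city)  # Rohtak

# A value outside 0..100 violates the CHECK constraint.
try:
    conn.execute("INSERT INTO student (roll_no, name, marks) VALUES (2, 'Ravi', 150)")
except sqlite3.IntegrityError as exc:
    err = str(exc)
    print("rejected:", err)
```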
8.8 SUMMARY
SQL (pronounced "ess-que-el") stands for Structured Query Language. SQL is used to
communicate with a database. According to ANSI (American National Standards Institute),
it is the standard language for relational database management systems. SQL statements
are used to perform tasks such as update data on a database, or retrieve data from a database.
Some common relational database management systems that use SQL are: Oracle, Sybase,
Microsoft SQL Server, Access, Ingres, etc. Although most database systems use SQL, most
of them also have their own additional proprietary extensions that are usually only used on
their system. However, the standard SQL commands such as "Select", "Insert", "Update",
"Delete", "Create", and "Drop" can be used to accomplish almost everything that one needs
to do with a database. This tutorial will provide you with the instruction on the basics of
each of these commands as well as allow you to put them to practice using the SQL
Interpreter.
Allows users to define the data in a database and manipulate that data.
Allows SQL to be embedded within other languages using SQL modules, libraries & pre-compilers
Allows users to create and drop databases and tables.
8.9 KEYWORDS
PL (PROGRAMMING LANGUAGE)- A programming language is a vocabulary
and set of grammatical rules for instructing a computer or computing device to
perform specific tasks. The term programming language usually refers to high-
level languages, such as BASIC, C, C++, COBOL, Java, FORTRAN, Ada, and
Pascal.
CONSTRAINTS- Constraints make it possible to further restrict the domain of an
attribute. For instance, a constraint can restrict a given integer attribute to values
between 1 and 10.
TUPLE- A data set representing a single item.
COLUMN- A labeled element of a tuple, e.g. "Address" or "Date of birth"
TABLE- A set of tuples sharing the same attributes; a set of columns and rows
VIEW- Any set of tuples; a data report from the RDBMS in response to a query
OPEN-SOURCE- MySQL is an open-source relational database management
system (RDBMS). SQL is the language programmers use to create, modify and
extract data from the relational database, as well as control user access to the
database.
2. How does SQL allow implementation of the entity integrity and referential integrity
constraints?
3. How do the relations (tables) in SQL differ from the relations defined formally in
relational algebra? Discuss the other differences in terminology. Why does SQL
allow duplicate tuples in a table or in a query result?
STRUCTURE
9.1 Introduction
9.2 Definition
9.5.1 Select
9.5.2 Insert
9.5.3 Delete
9.5.4 Update
9.8 Summary
9.9 Keywords
9.1 INTRODUCTION
When we execute an SQL command on any RDBMS, the system determines the best
way to carry out the request, and the SQL engine figures out how to interpret the task.
Various components are included in this process: the Query Dispatcher, Optimization
Engines, the Classic Query Engine and the SQL Query Engine. The classic query engine
handles all non-SQL queries, but the SQL query engine won't handle logical files.
Following is a simple diagram in figure 9.1 showing the SQL architecture:
Figure 9.1: SQL Architecture
9.2 DEFINITION
Query Dispatcher- The function of the dispatcher is to route the query request to either
CQE or SQE, depending on the attributes of the query. All queries are processed by the
dispatcher. It cannot be bypassed.
Optimization Engines- The query optimizer determines the most efficient way to execute
a SQL statement after considering many factors related to the objects referenced and the
conditions specified in the query.
Classic Query Engine- The classic query engine handles all non-SQL queries, but the
SQL query engine does not handle logical files.
CREATE- Creates a new table, a view of a table, or another object in the database.
ALTER- Modifies an existing database object, such as a table.
SQL Query Engine- SQL engine is defined as software that recognizes and interprets SQL
commands to access a relational database and interrogate data. SQL engine is also
commonly referred to as a SQL database engine or a SQL query engine.
Figure 9.2: SQL Commands types
DDL (Data Definition Language): DDL or Data Definition Language actually consists of
the SQL commands that can be used to define the database schema. It simply deals with
descriptions of the database schema and is used to create and modify the structure of
database objects in the database.
The CREATE TABLE command is used to specify a new relation by giving it a name
and specifying its attributes and initial constraints. The attributes are specified first,
and each attribute is given a name, a data type to specify its domain of values, and
any attribute constraints, such as NOT NULL. The key, entity integrity, and referential
integrity constraints can be specified within the CREATE TABLE statement after
the attributes are declared. The syntax of the statement is as follows:
CREATE TABLE tablename ( attribute_name1 datatype, attribute_name2 datatype );
Example
CREATE TABLE EMPLOYEE
( Fname VARCHAR(15) NOT NULL, Minit CHAR, Lname VARCHAR(15) NOT NULL,
Adhar_No CHAR(9) NOT NULL, Bdate DATE, Address VARCHAR(30), Sex CHAR,
Salary DECIMAL(10,2), Super_Adhar_No CHAR(9), Dno INT NOT NULL,
PRIMARY KEY (Adhar_No),
FOREIGN KEY (Super_Adhar_No) REFERENCES EMPLOYEE(Adhar_No),
FOREIGN KEY (Dno) REFERENCES DEPARTMENT(Dnumber) );
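The constraints in the statement above can be exercised directly. The following sketch uses Python's sqlite3 module (an illustration in SQLite, not Oracle, so the data types are simplified and foreign-key checking must be switched on; the table and column names follow the example above):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")   # SQLite enforces FKs only when enabled
con.execute("CREATE TABLE DEPARTMENT (Dnumber INTEGER PRIMARY KEY, Dname TEXT)")
con.execute("""CREATE TABLE EMPLOYEE (
    Fname TEXT NOT NULL,
    Adhar_No TEXT PRIMARY KEY,            -- key constraint: entity integrity
    Dno INTEGER NOT NULL,
    FOREIGN KEY (Dno) REFERENCES DEPARTMENT(Dnumber))""")
con.execute("INSERT INTO DEPARTMENT VALUES (5, 'Research')")
con.execute("INSERT INTO EMPLOYEE VALUES ('Ram', '123456789', 5)")

# Referential integrity: an employee row pointing at a department
# that does not exist is rejected.
rejected = False
try:
    con.execute("INSERT INTO EMPLOYEE VALUES ('Sita', '987654321', 99)")
except sqlite3.IntegrityError:
    rejected = True
print(rejected)  # True
```

SQLite, unlike Oracle, leaves foreign keys unenforced unless the PRAGMA is set, which is why the sketch enables it first.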
The definition of a base table or of other named schema elements can be changed by using
the ALTER command. For base tables, the possible alter table actions include adding or
dropping a column (attribute), changing a column definition, and adding or dropping table
constraints. For example, to add an attribute for keeping track of jobs of employees to the
EMPLOYEE base relation in the COMPANY schema, we can use the command
ALTER TABLE COMPANY.EMPLOYEE ADD COLUMN Job VARCHAR(12);
We must still enter a value for the new attribute Job for each individual EMPLOYEE tuple.
This can be done either by specifying a default clause or by using the UPDATE command
individually on each tuple. If no default clause is specified, the new attribute will have
NULLs in all the tuples of the relation immediately after the command is executed; hence,
the NOT NULL constraint is not allowed in this case. To drop a column, we must choose
either CASCADE or RESTRICT for drop behavior. If CASCADE is chosen, all constraints
and views that reference the column are dropped automatically from the schema, along with
the column. If RESTRICT is chosen, the command is successful only if no views or
constraints (or other schema elements) reference the column.
ALTER TABLE COMPANY.EMPLOYEE DROP COLUMN Address
CASCADE;
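The effect described above, a new column filled with NULLs that must then be populated with UPDATE, can be sketched with Python's sqlite3 module (SQLite syntax; column names follow the example):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE EMPLOYEE (Fname TEXT, Lname TEXT)")
con.execute("INSERT INTO EMPLOYEE VALUES ('Ram', 'Bansal')")
con.execute("INSERT INTO EMPLOYEE VALUES ('Sita', 'Rani')")

# After ADD COLUMN, every existing tuple holds NULL for Job, which is
# why a plain NOT NULL constraint is not allowed on the new column.
con.execute("ALTER TABLE EMPLOYEE ADD COLUMN Job VARCHAR(12)")
jobs = con.execute("SELECT Fname, Job FROM EMPLOYEE").fetchall()
print(jobs)  # [('Ram', None), ('Sita', None)]

# The values must then be filled in, e.g. with UPDATE on each tuple.
con.execute("UPDATE EMPLOYEE SET Job = 'Engineer' WHERE Fname = 'Ram'")
```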
Notice that the DROP TABLE command not only deletes all the records in the table if
successful, but also removes the table definition from the catalog. If it is desired to delete
only the records but to leave the table definition for future use, then the DELETE command
should be used instead of DROP TABLE. The DROP command can also be used to drop
other types of named schema elements, such as constraints or domains.
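A minimal sketch of the DELETE versus DROP TABLE distinction, using Python's sqlite3 module (an illustration in SQLite, not Oracle):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE EMPLOYEE (Fname TEXT)")
con.execute("INSERT INTO EMPLOYEE VALUES ('Ram')")

# DELETE empties the table but leaves its definition in the catalog,
# so the relation can still be queried and refilled.
con.execute("DELETE FROM EMPLOYEE")
remaining = con.execute("SELECT COUNT(*) FROM EMPLOYEE").fetchone()[0]
print(remaining)  # 0

# DROP TABLE removes the definition as well; the name is now unknown.
con.execute("DROP TABLE EMPLOYEE")
table_gone = False
try:
    con.execute("SELECT * FROM EMPLOYEE")
except sqlite3.OperationalError:   # "no such table: EMPLOYEE"
    table_gone = True
print(table_gone)  # True
```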
The basic form of the SELECT statement, sometimes called a mapping or a
select-from-where block, is formed of the three clauses SELECT, FROM, and WHERE and
has the following form:
SELECT <attribute list>
FROM <table list>
WHERE <condition>;
where
■ <attribute list> is a list of attribute names whose values are to be retrieved
by the query.
■ <table list> is a list of the relation names required to process the query.
■ <condition> is a conditional (Boolean) expression that identifies the tuples
to be retrieved by the query.
The SELECT clause of SQL specifies the attributes whose values are to be
retrieved, which are called the projection attributes, and the WHERE clause
specifies the Boolean condition that must be true for any retrieved tuple, which is known
as the selection condition.
Unspecified WHERE Clause and Use of the Asterisk
A missing WHERE clause indicates no condition on tuple selection; hence, all
tuples of the relation specified in the FROM clause qualify and are selected for the
query result. If more than one relation is specified in the FROM clause and there is
no WHERE clause, then the CROSS PRODUCT—all possible tuple combinations—of
these relations is selected. The asterisk (*) in the SELECT clause stands for all the
attributes. For example, the following query retrieves all attributes of the employees
who work in department 5:
SELECT *
FROM EMPLOYEE
WHERE Dno=5;
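The cross-product behaviour of a missing WHERE clause can be checked with a small sqlite3 sketch (hypothetical sample data; three employees and two departments give six combinations):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE EMPLOYEE (Fname TEXT, Dno INTEGER)")
con.execute("CREATE TABLE DEPARTMENT (Dnumber INTEGER, Dname TEXT)")
con.executemany("INSERT INTO EMPLOYEE VALUES (?, ?)",
                [("Ram", 5), ("Sita", 4), ("Mohan", 5)])
con.executemany("INSERT INTO DEPARTMENT VALUES (?, ?)",
                [(4, "Admin"), (5, "Research")])

# No WHERE clause with two relations in FROM: the CROSS PRODUCT,
# here 3 employees x 2 departments = 6 tuple combinations.
cross = con.execute("SELECT * FROM EMPLOYEE, DEPARTMENT").fetchall()
print(len(cross))  # 6

# A join condition in WHERE keeps only the matching combinations.
joined = con.execute(
    "SELECT Fname, Dname FROM EMPLOYEE, DEPARTMENT WHERE Dno = Dnumber").fetchall()
print(len(joined))  # 3
```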
Substring Pattern Matching and Arithmetic Operators
In this section we discuss several more features of SQL. The first feature allows
comparison conditions on only parts of a character string, using the LIKE
comparison
operator. This can be used for string pattern matching. Partial strings are specified
using two reserved characters: % replaces an arbitrary number of zero or more characters,
and the underscore (_) replaces a single character. For example, consider the following
query.
Query. Retrieve all employees whose address is in Sirsa, Haryana.
SELECT Fname, Lname
FROM EMPLOYEE
WHERE Address LIKE ‘%Sirsa, Haryana%’;
Another example:
Find all employees who were born during the 1950s.
SELECT Fname, Lname
FROM EMPLOYEE
WHERE Bdate LIKE ‘__5_______’;
Here the date is assumed to be stored as a ten-character string with the year first, so
a 5 in the third position selects birth years in the 1950s.
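Both pattern operators can be tried out with Python's sqlite3 module (the sample rows are hypothetical; dates are assumed stored as ten-character YYYY-MM-DD strings):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE EMPLOYEE (Fname TEXT, Address TEXT, Bdate TEXT)")
con.executemany("INSERT INTO EMPLOYEE VALUES (?, ?, ?)", [
    ("Ram",   "12 Main Road, Sirsa, Haryana", "1955-03-01"),
    ("Sita",  "8 Park Street, Delhi",         "1962-07-15"),
    ("Mohan", "Sirsa, Haryana",               "1958-11-30"),
])

# % matches any run of zero or more characters.
sirsa = con.execute(
    "SELECT Fname FROM EMPLOYEE WHERE Address LIKE '%Sirsa, Haryana%'").fetchall()
print(sirsa)  # [('Ram',), ('Mohan',)]

# _ matches exactly one character: a 5 in the third position of a
# ten-character YYYY-MM-DD date means a birth year in the 1950s.
fifties = con.execute(
    "SELECT Fname FROM EMPLOYEE WHERE Bdate LIKE '__5_______'").fetchall()
print(fifties)  # [('Ram',), ('Mohan',)]
```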
Another feature allows the use of arithmetic in queries. The standard arithmetic
operators for addition (+), subtraction (–), multiplication (*), and division (/) can
be applied to numeric values or attributes with numeric domains. For example,
suppose that we want to see the effect of giving all employees who work on the
‘ProductX’ project a 10 percent raise; we can use the following query:
SELECT E.Fname, E.Lname, 1.1 * E.Salary AS Increased_sal
FROM EMPLOYEE AS E, WORKS_ON AS W, PROJECT AS P
WHERE E.Adhar_No=W.Adhar_No AND W.Pno=P.Pnumber AND
P.Pname=‘ProductX’;
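A small sqlite3 sketch of arithmetic in the SELECT clause (hypothetical salaries; note that the stored values are not changed, only the displayed result):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE EMPLOYEE (Fname TEXT, Salary REAL)")
con.executemany("INSERT INTO EMPLOYEE VALUES (?, ?)",
                [("Ram", 30000.0), ("Sita", 40000.0)])

# Arithmetic in the SELECT clause: the stored Salary is untouched;
# only the query result shows the 10 percent raise.
rows = con.execute(
    "SELECT Fname, 1.1 * Salary AS Increased_sal FROM EMPLOYEE").fetchall()
for name, sal in rows:
    print(name, sal)
```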
GROUP BY & HAVING CLAUSE
In many cases we want to apply the aggregate functions to subgroups of tuples in a relation,
where the subgroups are based on some attribute values. For example, we may want to find
the average salary of employees in each department or the number of employees who work
on each project. In these cases we need to partition the relation into nonoverlapping
subsets (or groups) of tuples. Each group (partition) will consist of the tuples that have the
same value of some attribute(s), called the grouping attribute(s). We can then apply the
function to each such group independently to produce summary information about each
group. SQL has a GROUP BY clause for this purpose. The GROUP BY clause specifies
the grouping attributes, which should also appear in the SELECT clause, so that the value
resulting from applying each aggregate function to a group of tuples appears along with the
value of the grouping attribute(s).
For each department, retrieve the department number, the number of employees in the
department, and their average salary.
SELECT Dno, COUNT (*), AVG (Salary)
FROM EMPLOYEE
GROUP BY Dno;
Sometimes we want to retrieve the values of these functions only for groups that satisfy
certain conditions. For example, suppose that we want only those projects with more than
two employees to appear in the result. SQL provides a HAVING
clause, which can appear in conjunction with a GROUP BY clause, for this purpose.
HAVING provides a condition on the summary information regarding the group of tuples
associated with each value of the grouping attributes. Only the groups that satisfy the
condition are retrieved in the result of the query. This is illustrated by Query below:
For each project on which more than two employees work, retrieve the project
number, the project name, and the number of employees who work on the project.
SELECT Pnumber, Pname, COUNT (*)
FROM PROJECT, WORKS_ON
WHERE Pnumber=Pno
GROUP BY Pnumber, Pname
HAVING COUNT (*) > 2;
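GROUP BY and HAVING can be sketched with sqlite3 (hypothetical employees; HAVING filters whole groups after aggregation, unlike WHERE, which filters individual tuples):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE EMPLOYEE (Fname TEXT, Salary REAL, Dno INTEGER)")
con.executemany("INSERT INTO EMPLOYEE VALUES (?, ?, ?)", [
    ("Ram", 30000, 5), ("Sita", 40000, 5), ("Radha", 43000, 5),
    ("Mohan", 25000, 4), ("Arjun", 25000, 4),
])

# One result row per department: the grouping attribute plus aggregates.
per_dept = con.execute(
    "SELECT Dno, COUNT(*), AVG(Salary) FROM EMPLOYEE GROUP BY Dno").fetchall()
print(per_dept)

# HAVING keeps only the groups that satisfy the condition.
big = con.execute(
    "SELECT Dno, COUNT(*) FROM EMPLOYEE GROUP BY Dno HAVING COUNT(*) > 2").fetchall()
print(big)  # [(5, 3)]
```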
A retrieval query in SQL can consist of up to six clauses, but only the first two— SELECT
and FROM—are mandatory. The query can span several lines, and is ended by a semicolon.
Query terms are separated by spaces, and parentheses can be used to group relevant parts
of a query in the standard way. The clauses are specified in the following order, with the
clauses between square brackets [ ... ] being optional:
SELECT <attribute and function list>
FROM <table list>
[ WHERE <condition> ]
[ GROUP BY <grouping attribute(s)> ]
[ HAVING <group condition> ]
[ ORDER BY <attribute list> ];
The SELECT clause lists the attributes or functions to be retrieved. The FROM clause
specifies all relations (tables) needed in the query, including joined relations, but not those
in nested queries. The WHERE clause specifies the conditions for selecting the tuples from
these relations, including join conditions if needed. GROUP BY specifies grouping
attributes, whereas HAVING specifies a condition on the groups being selected rather than
on the individual tuples. The built-in aggregate functions COUNT, SUM, MIN, MAX, and
AVG are used in conjunction with grouping, but they can also be applied to all the selected
tuples in a query without a GROUP BY clause. Finally, ORDER BY specifies an order for
displaying the result of a query.
DELETE FROM EMPLOYEE
WHERE Lname=‘Bansal’;
DELETE FROM EMPLOYEE
WHERE Adhar_No.=‘123456789’;
DELETE FROM EMPLOYEE
WHERE Dno=5;
DELETE FROM EMPLOYEE;
COMMIT
Use the COMMIT statement to end your current transaction and make permanent all
changes performed in it, a collection of SQL statements that Oracle Database treats as a
single unit. This statement also erases all savepoints in the transaction and releases
transaction locks. Oracle Database issues an implicit COMMIT before and after any data
definition language (DDL) statement. You can also use this statement to:
Commit an in-doubt distributed transaction manually
Terminate a read-only transaction begun by a SET TRANSACTION statement
Committing an Insert: Example
This statement inserts a row into the hr.regions table and commits this
change:
INSERT INTO regions VALUES (5, 'Antarctica');
COMMIT WORK;
ROLLBACK
Use the ROLLBACK statement to undo work done in the current transaction or to manually
undo the work done by an in-doubt distributed transaction. To roll back your current
transaction, no privileges are necessary. To manually roll back an in-doubt distributed
transaction that you originally committed, you must have the FORCE TRANSACTION
system privilege. To manually roll back an in-doubt distributed transaction originally
committed by another user, you must have the FORCE ANY TRANSACTION system
privilege. The following statement rolls back your entire current transaction:
ROLLBACK;
ROLLBACK TO SAVEPOINT
Specify the savepoint to which you want to roll back the current transaction. If you omit
this clause, then the ROLLBACK statement rolls back the entire transaction. Using
ROLLBACK without the TO SAVEPOINT clause performs the following operations:
Ends the transaction
Undoes all changes in the current transaction
Erases all savepoints in the transaction
Releases any transaction locks
Using ROLLBACK with the TO SAVEPOINT clause performs the following operations:
Rolls back just the portion of the transaction after the savepoint
Erases all savepoints created after that savepoint. The named savepoint is retained,
so you can roll back to the same savepoint multiple times. Prior savepoints are also
retained.
Releases all table and row locks acquired since the savepoint. Other transactions
that have requested access to rows locked after the savepoint must continue to wait
until the transaction is committed or rolled back. Other transactions that have not
already requested the rows can request and access the rows immediately. The
following statement rolls back your current transaction to savepoint banda_sal:
ROLLBACK TO SAVEPOINT banda_sal;
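The COMMIT / ROLLBACK / savepoint behaviour described above also exists in SQLite, so it can be sketched with Python's sqlite3 module (isolation_level=None gives explicit transaction control; the savepoint name follows the example above):

```python
import sqlite3

# isolation_level=None puts sqlite3 in autocommit mode, so BEGIN,
# COMMIT, ROLLBACK and SAVEPOINT can be issued explicitly.
con = sqlite3.connect(":memory:", isolation_level=None)
con.execute("CREATE TABLE regions (region_id INTEGER, region_name TEXT)")

con.execute("BEGIN")
con.execute("INSERT INTO regions VALUES (5, 'Antarctica')")
con.execute("SAVEPOINT banda_sal")        # mark a point inside the transaction
con.execute("INSERT INTO regions VALUES (6, 'Atlantis')")

# Roll back only the portion of the transaction after the savepoint;
# the savepoint itself and the earlier insert are retained.
con.execute("ROLLBACK TO SAVEPOINT banda_sal")
con.execute("COMMIT")                     # make the surviving work permanent

final = con.execute("SELECT region_id, region_name FROM regions").fetchall()
print(final)  # [(5, 'Antarctica')]
```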
VIEW-
A view in SQL terminology is a single table that is derived from other tables. These other
tables can be base tables or previously defined views. A view does not necessarily exist in
physical form; it is considered to be a virtual table, in contrast to base tables, whose tuples
are always physically stored in the database. This limits the possible update operations that
can be applied to views, but it does not provide any limitations on querying a view. We can
think of a view as a way of specifying a table that we need to reference frequently, even
though it may not exist physically.
Specification of Views in SQL
In SQL, the command to specify a view is CREATE VIEW. The view is given a (virtual)
table name (or view name), a list of attribute names, and a query to specify the contents of
the view. If none of the view attributes results from applying functions or arithmetic
operations, we do not have to specify new attribute names for the view, since they would
be the same as the names of the attributes of the defining tables in the default case.
CREATE VIEW WORKS_ON1
AS SELECT Fname, Lname, Pname, Hours
FROM EMPLOYEE E, PROJECT P, WORKS_ON W
WHERE E.Adhar_No=W.Adhar_No AND W.Pno=P.Pnumber;
We can now specify SQL queries on a view—or virtual table—in the same way we specify
queries involving base tables. For example, to retrieve the last name and first name of all
employees who work on the ‘ProductX’ project, we can utilize the WORKS_ON1 view
and specify the query as in QV1:
QV1: SELECT Fname, Lname
FROM WORKS_ON1
WHERE Pname=‘ProductX’;
A view is supposed to be always up-to-date; if we modify the tuples in the base tables on
which the view is defined, the view must automatically reflect these changes. Hence, the
view is not realized or materialized at the time of view definition but rather at the time when
we specify a query on the view. It is the responsibility of the DBMS and not the user to
make sure that the view is kept up-to-date. We will discuss various ways the DBMS can
apply to keep a view up-to-date in the next subsection. If we do not need a view any more,
we can use the DROP VIEW command to dispose of it. For example, to get rid of the
WORKS_ON1 view, we can use the SQL statement in V1A:
V1A: DROP VIEW WORKS_ON1;
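The whole view life-cycle, define, query, drop, can be sketched with sqlite3 (schema and names follow the WORKS_ON1 example; the sample data is hypothetical):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE EMPLOYEE (Fname TEXT, Lname TEXT, Adhar_No TEXT)")
con.execute("CREATE TABLE PROJECT (Pname TEXT, Pnumber INTEGER)")
con.execute("CREATE TABLE WORKS_ON (Adhar_No TEXT, Pno INTEGER, Hours REAL)")
con.execute("INSERT INTO EMPLOYEE VALUES ('Ram', 'Bansal', '123456789')")
con.execute("INSERT INTO PROJECT VALUES ('ProductX', 1)")
con.execute("INSERT INTO WORKS_ON VALUES ('123456789', 1, 32.5)")

# The view stores only its defining query, not any tuples.
con.execute("""CREATE VIEW WORKS_ON1 AS
    SELECT Fname, Lname, Pname, Hours
    FROM EMPLOYEE E, PROJECT P, WORKS_ON W
    WHERE E.Adhar_No = W.Adhar_No AND W.Pno = P.Pnumber""")

# A view is queried exactly like a base table.
result = con.execute(
    "SELECT Fname, Lname FROM WORKS_ON1 WHERE Pname = 'ProductX'").fetchall()
print(result)  # [('Ram', 'Bansal')]

con.execute("DROP VIEW WORKS_ON1")   # removes the view, not the base tables
```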
9.8 SUMMARY
Query languages are used to make queries in a database, and Structured Query
Language (SQL) is the standard. Under the SQL umbrella there are several dialects and
extensions of the language, including MySQL, Oracle SQL and NuoDB. Query languages
for other types of databases, such as NoSQL databases and graph databases, include
Cassandra Query Language (CQL), Neo4j's Cypher, Data Mining Extensions (DMX) and
XQuery. The original version of SQL was implemented in the experimental DBMS called
SYSTEM R, which was developed at IBM Research. SQL is designed to be a
comprehensive language that includes statements for data definition, queries, updates,
constraint specification, and view definition. We discussed the following features of SQL
in this chapter: the data definition commands for creating tables, commands for constraint
specification, simple retrieval queries, and database update commands.
9.9 KEYWORDS
SQL Commands: DDL, DML and DCL commands
9.12 REFERENCES / SUGGESTED READINGS
C.J. Date, “An Introduction to Database Systems”, 8th edition, Addison Wesley, New
Delhi.
Ivan Bayross, “SQL, PL/SQL-The Programming Language of ORACLE”, BPB
Publication 3rd edition.
Elmasri and Navathe, “Fundamentals of Database Systems”, 5th edition, Pearson
Education.
https://www.tutorialspoint.com/sql/sql-overview.htm
https://study.com/academy/lesson/what-is-query-in-sql.html
https://searchsqlserver.techtarget.com/definition/query
STRUCTURE
10.1 Introduction
10.2 Definition
10.7 Summary
10.8 Keywords
10.0 LEARNING OBJECTIVE
The objective of this chapter is to make the reader understand the procedural
language for SQL. The architecture of PL/SQL will be studied in detail, and the
reader will become familiar with the loops that can be used in the procedural
language for SQL.
10.1 INTRODUCTION
PL/SQL stands for Procedural Language/SQL. PL/SQL extends SQL by adding constructs
found in procedural languages, resulting in a structured language that is more powerful than
SQL. PL/SQL is not case sensitive. ‘C’ style comments (/* ……… */) may be used in
PL/SQL programs whenever required.
All PL/SQL programs are made up of blocks; each block performs a logical action in the
program. A PL/SQL block consists of three parts:
1. Declaration section
2. Executable section
3. Exception handling section
Only the executable section is required. The other sections are optional.
1. Declaration section:
This is the first section, and it starts with the keyword DECLARE. All the identifiers
(constants and variables) are declared in this section before they are used in any other
statement.
2. Executable section:
This section contains procedural and SQL statements, and it is the only section of the
block that is required. It starts with the keyword BEGIN.
The only SQL statements allowed in a PL/SQL program are SELECT, INSERT,
UPDATE, DELETE and several other data manipulation statements.
Data definition statements like CREATE, DROP or ALTER are not allowed.
The executable section also contains constructs such as assignments, branches,
loops, procedure calls and triggers, which are all discussed in detail in subsequent
chapters.
3. Exception handling section:
This section is used to handle errors that occur during the execution of a PL/SQL
program. It starts with the keyword EXCEPTION.
The keyword END indicates the end of a PL/SQL block.
Oracle PL/SQL programs can be invoked either by typing the code in SQL*Plus or by
putting the code in a file and invoking the file. To execute the block, use ‘/’ at the SQL
prompt, or use ‘.’ and then run.
10.2 DEFINITION
Oracle PL/SQL is an extension of the SQL language, designed for seamless processing of
SQL statements while enhancing the security, portability, and robustness of the database.
This chapter explains some important aspects of the PL/SQL language, such as block
structure, data types, packages, triggers, and exception handling.
The PL/SQL programming language was developed by Oracle Corporation in the late
1980s as procedural extension language for SQL and the Oracle relational database.
Following are certain notable facts about PL/SQL:
PL/SQL can also directly be called from the command-line SQL*Plus interface.
Direct calls can also be made from external programming languages to the database.
PL/SQL's general syntax is based on that of the Ada and Pascal programming languages.
Apart from Oracle, PL/SQL is available in TimesTen in-memory database and IBM DB2.
10.3 ARCHITECTURE OF PL/SQL
The PL/SQL compilation and run-time system is a technology, not an independent product.
Think of this technology as an engine that compiles and executes PL/SQL blocks and
subprograms. The engine can be installed in an Oracle server or in an application
development tool such as Oracle Forms or Oracle Reports. So, PL/SQL can reside in two
environments:
1. The Oracle server
2. Oracle tools.
These two environments are independent. PL/SQL is bundled with the Oracle server but
might be unavailable in some tools. In either environment, the PL/SQL engine accepts as
input any valid PL/SQL block or subprogram. Fig. 10.1 shows the PL/SQL engine
processing an anonymous block. The engine executes procedural statements but sends SQL
statements to the SQL statement executor in the Oracle server.
Anonymous Blocks
Stored Subprograms
Stored subprograms defined within a package are called packaged subprograms. Those
defined independently are called standalone subprograms. Those defined within another
subprogram or within a PL/SQL block are called local subprograms, which cannot be
referenced by other applications and exist only for the convenience of the enclosing block.
Stored subprograms offer higher productivity, better performance, memory savings,
application integrity, and tighter security. For example, by designing applications around a
library of stored procedures and functions, you can avoid redundant coding and increase
your productivity. You can call stored subprograms from a database trigger, another stored
subprogram, an Oracle Precompiler application, an OCI application, or interactively from
SQL*Plus or Enterprise Manager. For example, you might call the standalone procedure
create_dept from SQL*Plus.
Subprograms are stored in parsed, compiled form. So, when called, they are loaded and
passed to the PL/SQL engine immediately. Also, they take advantage of shared memory.
So, only one copy of a subprogram need be loaded into memory for execution by multiple
users.
10.4 FEATURES AND ADVANTAGES OF PL/SQL
Features:
Advantages:
SQL has become the standard database language because it is flexible, powerful, and easy
to learn. A few English-like commands such as SELECT, INSERT, UPDATE, and
DELETE make it easy to manipulate the data stored in a relational database. SQL is
non-procedural, meaning that you can state what you want done without stating how to do it.
Oracle determines the best way to carry out your request. There is no necessary connection
between consecutive statements because Oracle executes SQL statements one at a time.
PL/SQL lets you use all the SQL data manipulation, cursor control, and transaction control
commands, as well as all the SQL functions, operators, and pseudocolumns. So, you can
manipulate Oracle data flexibly and safely. Also, PL/SQL fully supports SQL datatypes.
That reduces the need to convert data passed between your applications and the database.
PL/SQL also supports dynamic SQL, an advanced programming technique that makes your
applications more flexible and versatile. Your programs can build and process SQL data
definition, data control, and session control statements "on the fly" at run time.
Object types are an ideal object-oriented modeling tool, which you can use to reduce the
cost and time required to build complex applications. Besides allowing you to create
software components that are modular, maintainable, and reusable, object types allow
different teams of programmers to develop software components concurrently. By
encapsulating operations with data, object types let you move data-maintenance code out
of SQL scripts and PL/SQL blocks into methods. Also, object types hide implementation
details, so that you can change the details without affecting client programs. In addition,
object types allow for realistic data modeling. Complex real-world entities and
relationships map directly into object types. That helps your programs better reflect the
world they are trying to simulate.
Better Performance
Without PL/SQL, Oracle must process SQL statements one at a time. Each SQL statement
results in another call to Oracle and higher performance overhead. In a networked
environment, the overhead can become significant. Every time a SQL statement is issued,
it must be sent over the network, creating more traffic. However, with PL/SQL, an entire
block of statements can be sent to Oracle at one time. This can drastically reduce
communication between your application and Oracle. As Figure 10.2 shows, if your
application is database intensive, you can use PL/SQL blocks and subprograms to group
SQL statements before sending them to Oracle for execution.
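PL/SQL itself cannot be run outside Oracle, but the batching idea, one call carrying a whole group of statements instead of one round trip per statement, can be illustrated by analogy with sqlite3's executescript (an analogy only, not Oracle's PL/SQL engine; the table and data are hypothetical):

```python
import sqlite3

con = sqlite3.connect(":memory:")

# One executescript call carries a whole block of statements,
# instead of one call per statement.
con.executescript("""
    CREATE TABLE ORDERS (item TEXT, qty INTEGER);
    INSERT INTO ORDERS VALUES ('racket', 2);
    INSERT INTO ORDERS VALUES ('net', 1);
    UPDATE ORDERS SET qty = qty + 1 WHERE item = 'racket';
""")
orders = con.execute("SELECT item, qty FROM ORDERS ORDER BY item").fetchall()
print(orders)  # [('net', 1), ('racket', 3)]
```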
PL/SQL stored procedures are compiled once and stored in executable form, so procedure
calls are quick and efficient. Also, stored procedures, which execute in the server, can be
invoked over slow network connections with a single call. That reduces network traffic and
improves round-trip response times. Executable code is automatically cached and shared
among users. That lowers memory requirements and invocation overhead.
Higher Productivity
PL/SQL adds functionality to non-procedural tools such as Oracle Forms and Oracle
Reports. With PL/SQL in these tools, you can use familiar procedural constructs to build
applications. For example, you can use an entire PL/SQL block in an Oracle Forms trigger.
You need not use multiple trigger steps, macros, or user exits. Thus, PL/SQL increases
productivity by putting better tools in your hands.
Also, PL/SQL is the same in all environments. As soon as you master PL/SQL with one
Oracle tool, you can transfer your knowledge to other tools, and so multiply the
productivity gains. For example, scripts written with one tool can be used by other tools.
Full Portability
Applications written in PL/SQL are portable to any operating system and platform on which
Oracle runs. In other words, PL/SQL programs can run anywhere Oracle can run; you need
not tailor them to each new environment. That means you can write portable program
libraries, which can be reused in different environments.
The PL/SQL and SQL languages are tightly integrated. PL/SQL supports all the SQL
datatypes and the non-value NULL. That allows you to manipulate Oracle data easily and
efficiently. It also helps you to write high-performance code. The %TYPE and
%ROWTYPE attributes further integrate PL/SQL with SQL. For example, you can use the
%TYPE attribute to declare variables, basing the declarations on the definitions of database
columns. If a definition changes, the variable declaration changes accordingly the next time
you compile or run your program. The new definition takes effect without any effort on
your part. This provides data independence, reduces maintenance costs, and allows
programs to adapt as the database changes to meet new business needs.
Tight Security
PL/SQL stored procedures enable you to partition application logic between the client and
server. That way, you can prevent client applications from manipulating sensitive Oracle
data. Database triggers written in PL/SQL can disable application updates selectively and
do content-based auditing of user inserts. Furthermore, you can restrict access to Oracle
data by allowing users to manipulate it only through stored procedures that execute with
their definer’s privileges. For example, you can grant users access to a procedure that
updates a table, but not grant them access to the table itself.
In PL/SQL, all statements are classified into units called blocks. PL/SQL blocks
can include variables, SQL statements, loops, constants, conditional statements and
exception handling, as shown in figure 10.3. Blocks can also build a function, a procedure
or a package.
Broadly, PL/SQL blocks are of two types: anonymous blocks and named blocks.
10.5.1 Anonymous blocks: In PL/SQL, blocks which do not have a header are
known as anonymous blocks. These blocks do not form the body of a function, trigger
or procedure.
Example: Here is a code example that finds the greater of two numbers using an
anonymous block.
DECLARE
-- declare variables a, b and c
-- all three variables are of the NUMBER datatype
a number;
b number;
c number;
BEGIN
a:= 10;
b:= 100;
--find largest number
--take it in c variable
IF a > b THEN
c:= a;
ELSE
c:= b;
END IF;
dbms_output.put_line(' Maximum number in 10 and 100: ' || c);
END;
/
-- Program End
Output:
Figure 10.3 Blocks in PL/SQL
10.5.2 Named blocks: PL/SQL blocks which have a header or label are known
as named blocks. These blocks can be subprograms such as functions, procedures
and packages, or triggers.
Example: Here is a code example that finds the greater of two numbers using a
named block, i.e. a function.
DECLARE
Output:
Maximum number in 10 and 100: 100
10.7 SUMMARY
SQL is a data oriented language, while PL/SQL is an application oriented language. SQL
is used to write queries and to create and execute DDL and DML statements. PL/SQL is
used to write program blocks, functions, procedures, triggers and packages. PL/SQL is a
block-structured language whose code is organized into blocks. A PL/SQL block consists
of three sections: declaration, executable, and exception-handling sections. In a block, the
executable section is mandatory while the declaration and exception-handling sections are
optional. A named PL/SQL block has a name; an anonymous block does not.
SQL: a single query that is used to perform DML and DDL operations.
PL/SQL: a block of code that is used to write entire program blocks, procedures,
functions, etc.
10.8 KEYWORDS
DML- A data manipulation language (DML) is a computer programming language
used for adding (inserting), deleting, and modifying (updating) data in a database.
A DML is often a sublanguage of a broader database language such as SQL, with
the DML comprising some of the operators in the language.
MYSQL- MySQL is a freely available open source Relational Database
Management System (RDBMS) that uses Structured Query Language (SQL). SQL
is the most popular language for adding, accessing and managing content in a
database. It is most noted for its quick processing, proven reliability, ease and
flexibility of use.
ORACLE- Oracle database is an RDBMS from Oracle Corporation. The
software is built around the relational database framework. It allows data objects
to be accessed by users using SQL language. Oracle is a completely scalable
RDBMS architecture which is widely used all over the world.
SQL*Plus - SQL*Plus is a command-line tool that provides access to the Oracle
RDBMS. SQL*Plus enables you to connect to an Oracle database, enter and
execute SQL commands and PL/SQL blocks, and format and print query results.
STRUCTURE
11.1 Introduction
11.2 Definition
11.2.3 Attributes
11.5 Datatypes
11.7 Summary
11.8 Keywords
11.0 LEARNING OBJECTIVE
The objective of this chapter is to make the reader understand the procedural
language for SQL. To understand the main features of PL/SQL such as Block
structure, variables, datatypes, exception handling and control structure.
11.1 INTRODUCTION
A good way to get acquainted with PL/SQL is to look at a sample program. The program
below processes an order for a tennis racket. First, it declares a variable of type NUMBER
to store the quantity of tennis rackets on hand. Then, it retrieves the quantity on hand from
a database table named inventory. If the quantity is greater than zero, the program updates
the table and inserts a purchase record into another table named purchase_record.
Otherwise, the program inserts an out-of-stock record into the purchase_record table.
DECLARE
   qty_on_hand NUMBER(5);
BEGIN
   SELECT quantity INTO qty_on_hand
     FROM inventory WHERE product = 'TENNIS RACKET';
   IF qty_on_hand > 0 THEN
      UPDATE inventory SET quantity = quantity - 1
       WHERE product = 'TENNIS RACKET';
      INSERT INTO purchase_record VALUES ('Tennis racket purchased', SYSDATE);
   ELSE
      INSERT INTO purchase_record VALUES ('Out of tennis rackets', SYSDATE);
   END IF;
   COMMIT;
END;
With PL/SQL, you can use SQL statements to manipulate Oracle data and flow-of-control
statements to process the data. Moreover, you can declare constants and variables, define
procedures and functions, and trap runtime errors. Thus, PL/SQL combines the data
manipulating power of SQL with the data processing power of procedural languages.
11.2 DEFINITION
PL/SQL is a block-structured language; this means that PL/SQL programs are divided
and written in logical blocks of code. Each block consists of three sub-parts:
Every PL/SQL statement ends with a semicolon (;). PL/SQL blocks can be nested within
other PL/SQL blocks using BEGIN and END. Following is the basic structure of a PL/SQL
block:
DECLARE
<declarations section>
BEGIN
<executable command(s)>
EXCEPTION
<exception handling>
END;
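To make the skeleton concrete, a minimal anonymous block that fills in all three sub-parts might look like the following (the variable name and messages are illustrative, not from the text):

```
DECLARE
   v_result NUMBER := 0;                            -- declarations section
BEGIN
   v_result := 100 / 4;                             -- executable commands
   DBMS_OUTPUT.PUT_LINE('Result: ' || v_result);
EXCEPTION
   WHEN ZERO_DIVIDE THEN                            -- exception handling
      DBMS_OUTPUT.PUT_LINE('Attempted to divide by zero.');
END;
```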
11.3 UNDERSTANDING THE MAIN FEATURES
PL/SQL is a block-structured language. That is, the basic units (procedures, functions, and
anonymous blocks) that make up a PL/SQL program are logical blocks, which can contain
any number of nested sub-blocks. Typically, each logical block corresponds to a problem
or subproblem to be solved. Thus, PL/SQL supports the divide-and-conquer approach to
problem solving called stepwise refinement. A block (or sub-block) lets you group logically
related declarations and statements. That way, you can place declarations close to where
they are used. The declarations are local to the block and cease to exist when the block
completes. A PL/SQL block has three parts: a declarative part, an executable part, and an
exception-handling part. (In PL/SQL, a warning or error condition is called an exception.)
Only the executable part is required. The order of the parts is logical. First comes the
declarative part, in which items can be declared. Once declared, items can be manipulated
in the executable part. Exceptions raised during execution can be dealt with in the exception-
handling part.
[DECLARE
-- declarations]
BEGIN
-- statements
[EXCEPTION
-- handlers]
END;
You can nest sub-blocks in the executable and exception-handling parts of a PL/SQL block
or subprogram but not in the declarative part. Also, you can define local subprograms in
the declarative part of any block. However, you can call local subprograms only from the
block in which they are defined.
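A sketch of this nesting rule (identifiers are ours, not from the text):

```
DECLARE
   x NUMBER := 1;
BEGIN
   DECLARE              -- sub-block nested in the executable part
      y NUMBER := 2;
   BEGIN
      x := x + y;
   END;
   -- y has ceased to exist here; x is still visible and equals 3
END;
```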
Forward references are not
allowed. So, you must declare a constant or variable before referencing it in other
statements, including other declarative statements.
Declaring Variables
Variables can have any SQL datatype, such as CHAR, DATE, or NUMBER, or any
PL/SQL datatype, such as BOOLEAN or BINARY_INTEGER. For example, assume that
you want to declare a variable named part_no to hold 4-digit numbers and a variable named
in_stock to hold the Boolean value TRUE or FALSE. You declare these variables as
follows:
part_no NUMBER(4);
in_stock BOOLEAN;
You can also declare nested tables, variable-size arrays (varrays for short), and records
using the TABLE, VARRAY, and RECORD composite datatypes.
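For illustration, declarations using all three composite datatypes might look like this (the type and variable names are our own):

```
DECLARE
   TYPE NameList IS TABLE OF VARCHAR2(30);                       -- nested table
   TYPE NameArray IS VARRAY(5) OF VARCHAR2(30);                  -- varray
   TYPE EmpRec IS RECORD (emp_id NUMBER(4), emp_name VARCHAR2(30)); -- record
   names   NameList  := NameList('Brown');
   crew    NameArray := NameArray('Smith', 'Jones');
   one_emp EmpRec;
BEGIN
   one_emp.emp_id   := 10;
   one_emp.emp_name := crew(1);
END;
```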
Declaring Constants
Declaring a constant is like declaring a variable except that you must add the keyword
CONSTANT and immediately assign a value to the constant. Thereafter, no more
assignments to the constant are allowed. In the following example, you declare a constant
named credit_limit:
credit_limit CONSTANT REAL := 5000.00;
11.3.3 ATTRIBUTES
PL/SQL variables and cursors have attributes, which are properties that let you reference
the datatype and structure of an item without repeating its definition. Database columns
and tables have similar attributes, which you can use to ease maintenance. A percent sign
(%) serves as the attribute indicator.
%TYPE
The %TYPE attribute provides the datatype of a variable or database column. This is
particularly useful when declaring variables that will hold database values. For example,
assume there is a column named title in a table named books. To declare a variable named
my_title that has the same datatype as column title, use dot notation and the %TYPE
attribute, as follows:
my_title books.title%TYPE;
Declaring my_title with %TYPE has two advantages. First, you need not know the exact
datatype of title. Second, if you change the database definition of title (make it a longer
character string, for example), the datatype of my_title changes accordingly at run time.
%ROWTYPE
In PL/SQL, records are used to group data. A record consists of a number of related fields
in which data values can be stored. The %ROWTYPE attribute provides a record type that
represents a row in a table. The record can store an entire row of data selected from the
table or fetched from a cursor or cursor variable.
Columns in a row and corresponding fields in a record have the same names and datatypes.
In the example below, you declare a record named dept_rec. Its fields have the same names
and datatypes as the columns in the dept table.
DECLARE
   dept_rec dept%ROWTYPE; -- declare record variable
You use dot notation to reference fields, as the following example shows:
my_deptno := dept_rec.deptno;
If you declare a cursor that retrieves the last name, salary, hire date, and job title of an
employee, you can use %ROWTYPE to declare a record that stores the same information,
as follows:
DECLARE
   CURSOR c1 IS
      SELECT ename, sal, hiredate, job FROM emp;
   emp_rec c1%ROWTYPE; -- record whose fields match the cursor's row
11.4 CONTROL STRUCTURE IN PL/SQL
Control structures are the most important PL/SQL extension to SQL. Not only does
PL/SQL let you manipulate Oracle data, it lets you process the data using conditional,
iterative, and sequential flow-of-control statements such as IF-THEN-ELSE, CASE, FOR-
LOOP, WHILE-LOOP, EXIT-WHEN, and GOTO. Collectively, these statements can
handle any situation.
Conditional Control
DECLARE
   acct_balance NUMBER(11,2);
   acct         CONSTANT NUMBER(4)   := 3;      -- values reconstructed for illustration
   debit_amt    CONSTANT NUMBER(5,2) := 500.00;
BEGIN
   -- table names (accounts, temp) are reconstructed from the fragment
   SELECT bal INTO acct_balance FROM accounts
      WHERE account_id = acct
      FOR UPDATE OF bal;
   IF acct_balance >= debit_amt THEN
      UPDATE accounts SET bal = bal - debit_amt
         WHERE account_id = acct;
   ELSE
      INSERT INTO temp VALUES (acct, acct_balance, 'Insufficient funds');
   END IF;
   COMMIT;
END;
To choose among several values or courses of action, you can use CASE constructs. The
CASE expression evaluates a condition and returns a value for each case. The case
statement evaluates a condition and performs an action (which might be an entire PL/SQL
block) for each case.
-- assumes shape, area, side, radius, and pi are already declared
CASE
   WHEN shape = 'square' THEN area := side * side;
   WHEN shape = 'circle' THEN
      BEGIN
         area := pi * (radius * radius);
         DBMS_OUTPUT.PUT_LINE('Value is not exact because pi is irrational.');
      END;
   ELSE
      BEGIN
         DBMS_OUTPUT.PUT_LINE('No formula to calculate area of a ' || shape);
         RAISE PROGRAM_ERROR;
      END;
END CASE;
A sequence of statements that uses query results to select alternative actions is common in
database applications. Another common sequence inserts or deletes a row only if an
associated entry is found in another table. You can bundle these common sequences into a
PL/SQL block using conditional logic.
Iterative Control
LOOP statements let you execute a sequence of statements multiple times. You place the
keyword LOOP before the first statement in the sequence and the keywords END LOOP
after the last statement in the sequence. The following example shows the simplest kind of
loop, which repeats a sequence of statements continually:
LOOP
-- sequence of statements
END LOOP;
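A plain LOOP statement is usually terminated with the EXIT-WHEN statement mentioned earlier; a small sketch (the variable name is ours):

```
DECLARE
   total NUMBER := 0;
BEGIN
   LOOP
      total := total + 10;
      EXIT WHEN total >= 50;  -- leave the loop once the condition is true
   END LOOP;
   -- control continues here with total = 50
END;
```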
The FOR-LOOP statement lets you specify a range of integers, then execute a sequence of
statements once for each integer in the range. For example, the following loop inserts 500
numbers and their square roots into a database table:
FOR num IN 1..500 LOOP
   INSERT INTO roots VALUES (num, SQRT(num)); -- table name is illustrative
END LOOP;
The WHILE-LOOP statement associates a condition with a sequence of statements.
Before each iteration of the loop, the condition is evaluated. If the condition is true, the
sequence of statements is executed, then control resumes at the top of the loop. If the
condition is false or null, the loop is bypassed and control passes to the next statement. In
the following example, you find the first employee who has a salary over $2500 and is
higher in the chain of command than employee 7499:
DECLARE
   salary    emp.sal%TYPE := 0;
   mgr_num   emp.mgr%TYPE;
   last_name emp.ename%TYPE;
   starting_empno CONSTANT emp.empno%TYPE := 7499;
BEGIN
   -- the temp table is reconstructed for illustration
   SELECT mgr INTO mgr_num FROM emp
      WHERE empno = starting_empno;
   WHILE salary <= 2500 LOOP
      SELECT sal, mgr, ename INTO salary, mgr_num, last_name
         FROM emp WHERE empno = mgr_num;
   END LOOP;
   INSERT INTO temp VALUES (NULL, salary, last_name);
   COMMIT;
EXCEPTION
   WHEN NO_DATA_FOUND THEN
      INSERT INTO temp VALUES (NULL, NULL, 'Not found');
      COMMIT;
END;
Sequential Control
The GOTO statement lets you branch to a label unconditionally. The label, an undeclared
identifier enclosed by double angle brackets, must precede an executable statement or a
PL/SQL block. When executed, the GOTO statement transfers control to the labeled
statement or block, as the following example shows:
IF rating < 3 THEN
   GOTO calc_raise; -- branch to label
END IF;
...
<<calc_raise>>
IF job_title = 'SALESMAN' THEN -- control resumes here
   amount := commission * 0.25;
ELSE
   amount := salary * 0.10;
END IF;
Packages
PL/SQL lets you bundle logically related types, variables, cursors, and subprograms into a
package. Each package is easy to understand and the interfaces between packages are
simple, clear, and well defined. This aids application development. Packages usually have
two parts: a specification and a body. The specification is the interface to your applications;
it declares the types, constants, variables, exceptions, cursors, and subprograms available
for use. The body defines cursors and subprograms and so implements the specification. In
the following example, you package two employment procedures:
CREATE PACKAGE emp_actions AS -- package specification
   PROCEDURE hire_employee (empno NUMBER, ename CHAR, ...);
   PROCEDURE fire_employee (emp_id NUMBER);
END emp_actions;

CREATE PACKAGE BODY emp_actions AS -- package body
   PROCEDURE hire_employee (empno NUMBER, ename CHAR, ...) IS
   BEGIN
      INSERT INTO emp VALUES (empno, ename, ...);
   END hire_employee;
   PROCEDURE fire_employee (emp_id NUMBER) IS
   BEGIN
      DELETE FROM emp WHERE empno = emp_id;
   END fire_employee;
END emp_actions;
Only the declarations in the package specification are visible and accessible to applications.
Implementation details in the package body are hidden and inaccessible. Packages can be
compiled and stored in an Oracle database, where their contents can be shared by many
applications. When you call a packaged subprogram for the first time, the whole package
is loaded into memory. So, subsequent calls to related subprograms in the package require
no disk I/O. Thus, packages can enhance productivity and improve performance.
Error Handling
PL/SQL makes it easy to detect and process predefined and user-defined error conditions
called exceptions. When an error occurs, an exception is raised. That is, normal execution
stops and control transfers to the exception-handling part of your PL/SQL block or
subprogram. To handle raised exceptions, you write separate routines called exception
handlers. Predefined exceptions are raised implicitly by the runtime system. For example,
if you try to divide a number by zero, PL/SQL raises the predefined exception
ZERO_DIVIDE automatically. You must raise user-defined exceptions explicitly with the
RAISE statement.
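A minimal sketch of a predefined exception being raised implicitly and then handled (the variable names are ours):

```
DECLARE
   result  NUMBER;
   divisor NUMBER := 0;
BEGIN
   result := 10 / divisor;  -- raises ZERO_DIVIDE automatically
EXCEPTION
   WHEN ZERO_DIVIDE THEN
      DBMS_OUTPUT.PUT_LINE('Cannot divide by zero.');
END;
```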
You can define exceptions of your own in the declarative part of any PL/SQL block or
subprogram. In the executable part, you check for the condition that needs special attention.
If you find that the condition exists, you execute a RAISE statement. In the example below,
you compute the bonus earned by a salesperson. The bonus is based on salary and
commission. So, if the commission is null, you raise the exception comm_missing.
DECLARE
   comm_missing EXCEPTION; -- declare exception
   ...
BEGIN
   ...
   IF commission IS NULL THEN
      RAISE comm_missing; -- raise exception explicitly
   END IF;
   bonus := (salary * 0.10) + (commission * 0.15);
EXCEPTION
   WHEN comm_missing THEN
      ... -- process the error
11.5 DATATYPES
Predefined Datatypes
User-Defined Datatypes
Datatype Conversion
Predefined Datatypes
A scalar type has no internal components. A composite type has internal components that
can be manipulated individually. A reference type holds values, called pointers, that
designate other program items. A LOB type holds values, called lob locators, that specify
the location of large objects (graphic images, for example) stored out-of-line. Figure 11.1
shows the predefined datatypes available for your use.
User-Defined Datatypes
Each PL/SQL base type specifies a set of values and a set of operations applicable to items
of that type. Subtypes specify the same set of operations as their base type but only a subset
of its values. Thus, a subtype does not introduce a new type; it merely places an optional
constraint on its base type. Subtypes can increase reliability, provide compatibility with
ANSI/ISO types, and improve readability by indicating the intended use of constants and
variables. PL/SQL predefines several subtypes in package STANDARD. For example,
PL/SQL predefines the subtypes CHARACTER and INTEGER as follows:
SUBTYPE CHARACTER IS CHAR;
SUBTYPE INTEGER IS NUMBER(38,0); -- allows only whole numbers
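You can declare your own subtypes the same way in the declarative part of a block; for example (the subtype names here are our own):

```
DECLARE
   SUBTYPE Accumulator IS NUMBER;  -- user-defined subtype of NUMBER
   SUBTYPE BirthDate   IS DATE;    -- name documents the intended use
   total Accumulator := 0;
   born  BirthDate   := SYSDATE;
BEGIN
   total := total + 1;             -- a subtype accepts its base type's operations
END;
```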
Datatype Conversion
Sometimes it is necessary to convert a value from one datatype to another. For example, if
you want to examine a rowid, you must convert it to a character string. PL/SQL supports
both explicit and implicit (automatic) datatype conversion.
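For instance, explicit conversion uses built-in functions such as TO_CHAR and TO_NUMBER, while implicit conversion happens when PL/SQL can resolve the types itself (a sketch; variable names are ours):

```
DECLARE
   n NUMBER := 17;
   s VARCHAR2(10);
BEGIN
   s := TO_CHAR(n);       -- explicit conversion: NUMBER to character string
   n := TO_NUMBER('42');  -- explicit conversion: string to NUMBER
   s := n;                -- implicit (automatic) conversion, same effect
END;
```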
11.7 SUMMARY
A PL/SQL program consists of a sequence of statements, each made up of one or
more lines of text. The precise characters available to you will depend on what database
character set you're using. For example, Table 11.1 illustrates the available
characters in the US7ASCII character set.
Type         Characters
Letters      A-Z, a-z
Digits       0-9
Symbols      ~ ! @ # $ % & * ( ) _ - + = | [ ] { } : ; " ' < > , . ? / ^
Whitespace   Tab, space, newline, carriage return
Table 11.1: Characters available to PL/SQL in the US7ASCII character set
Every keyword in PL/SQL is made from various combinations of characters in this
character set. Now you just have to figure out how to put them all together! By default,
PL/SQL is a case-insensitive language. That is, uppercase letters are treated the same way
as lowercase letters except when characters are surrounded by single quotes, which makes
them a literal string. A number of these characters, both singly and in combination with
other characters, have a special significance in PL/SQL; for example, := is the assignment
operator and || is the concatenation operator.
The control structures in PL/SQL follow the basic control structures of procedural
programming. The selection structure, for example, tests a condition and then executes one
sequence of statements instead of another, depending on whether the condition is true or
false.
11.8 KEYWORDS
TRIGGER- A trigger is a special type of stored procedure that automatically runs
when an event occurs in the database server. DML triggers run when a user tries to
modify data through a data manipulation language (DML) event. DML events are
INSERT, UPDATE, or DELETE statements on a table or view.
CURSOR- Implicit cursors are automatically created when SELECT statements are
executed, and they fetch a single row at a time. Explicit cursors need to be defined
explicitly by the user by providing a name, and they can fetch multiple rows.
PL/SQL PACKAGE- A package is a schema object that groups logically related
functions, cursors, stored procedures, and variables in one place.
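As a sketch of the TRIGGER keyword above, a DML trigger that fires on UPDATE might look like the following (the table and column names are hypothetical):

```
CREATE OR REPLACE TRIGGER trg_emp_audit
AFTER UPDATE ON emp          -- runs automatically on each UPDATE
FOR EACH ROW                 -- once per modified row
BEGIN
   INSERT INTO emp_audit (empno, changed_on)
   VALUES (:OLD.empno, SYSDATE);
END;
```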
11.10 ANSWERS TO CHECK YOUR PROGRESS
1. True
2. False
3. Use of wrong assignment operator. The correct syntax is: balance := balance +
2000;
4. greeting := 'Hello' || 'World';
5. NOT