DBMS Unit 1,2 Notes - R21

UNIT- I
Introduction: Database systems applications, Purpose of Database Systems, View of
Data, Database Languages, Relational Databases, Database Design, Data Storage
and Querying, Transaction Management, Database Architecture, Data Mining and
Information Retrieval, Specialty Databases, Database users and Administrators.
Introduction to Relational Model: Structure of Relational Databases, Database
Schema, Keys, Schema Diagrams, Relational Query Languages, Relational
Operations.
Introduction
Data is unprocessed facts and figures without any added interpretation or analysis, i.e., data is
raw, unorganized facts that need to be processed. Data can seem simple, random, and
useless until it is organized.
Information is data that has been interpreted so that it has meaning for the user, i.e., when data is
processed, organized, structured, or presented in a given context so as to make it useful, it is
called information.
Knowledge is a combination of information, experience and insight that may benefit the
individual or the organization.
Database: A database is an organized collection of data. The data is typically organized to
model aspects of reality in a way that supports processes requiring information.
A database is a collection of information that is organized so that it can easily be
accessed, managed, and updated. The software used to create and manage databases is called a
Database Management System (DBMS).
Fig 0.1 Database

Examples of databases include those for an educational institute, a bank, a library, or a railway
reservation system.
UNIT – I / 1
DBMS NOTES (R21) PBR VITS – CSE / AI / IOT
File system: A file system is the way in which files are named and where they are placed
logically for storage and retrieval. Every operating system has its own file system, in which
files are placed in a hierarchical (tree) structure. A file is placed in a directory (folder
in Windows) or subdirectory at the desired place in the tree structure. In a file-based system,
data is kept as a collection of individual files accessed by application programs.
A database-management system (DBMS) is a collection of interrelated data and a set of
programs to access those data. The collection of data, usually referred to as the database,
contains information relevant to an enterprise. The primary goal of a DBMS is to provide a way
to store and retrieve database information that is both convenient and efficient.
• A DBMS consists of two things: a database and a set of programs.
• The database is a very large, integrated collection of data.
• The set of programs is used to access and process the database.
• So a DBMS can be defined as the software package designed to store and manage
or process the database.
• Management of data involves
◦ Definition of structures for the storage of information
◦ Methods to manipulate information
◦ Safety of the information stored despite system crashes.
• A database models a real-world enterprise by entities and relationships
◦ Entities (e.g., students, courses, class, subject)
◦ Relationships
The types of DBMS depend on how the database is structured by that particular
DBMS:
1. Hierarchical DBMS
2. Network DBMS
3. Relational DBMS
4. Object-oriented DBMS
Advantages of DBMS
Due to its centralized nature, a database system can overcome the disadvantages of a
file-based system:
1. Data independence: Application programs should not be exposed to details of data
representation and storage. The DBMS provides an abstract view that hides these details.
2. Efficient data access: The DBMS utilizes a variety of sophisticated techniques to store and
retrieve data efficiently.
3. Data integrity and security: Because data is always accessed through the DBMS, it can enforce
integrity constraints and access controls.
4. Data administration: When several users share data, centralizing its administration is important.
Experienced professionals can minimize data redundancy and fine-tune storage to reduce
retrieval time.
5. Concurrent access and crash recovery: The DBMS schedules concurrent access to the data
and protects users from the effects of system failures.
6. Reduced application development time: The DBMS supports important functions that are
common to many applications.
Functions of DBMS
• Data Definition: The DBMS provides functions to define the structure of the data in the
application.
• Data Manipulation: Once the data structure is defined, data needs to be inserted, modified, or
deleted. The functions that perform these operations are part of the DBMS.
• Data Security & Integrity: The DBMS contains modules which handle the security and
integrity of data in the application.
• Data Recovery and Concurrency: Recovery of the data after system failure and concurrent
access of records by multiple users are also handled by the DBMS.
• Data Dictionary Maintenance: Maintaining the data dictionary, which contains the data
definition of the application, is also one of the functions of the DBMS.
• Performance: Optimizing the performance of queries is one of the important functions of
the DBMS.
The Components of a Database
The basic structure for storing data in a database is a table. The table is the whole
collection of information. In a table, data is entered in rows. Each row is known as a record. A
record includes all of the pieces of information related to one individual entry in your database.
In a record, the name for each category or each piece of information that makes up the
record is known as a field. The actual information you type into each cell is called the data.
Difference between DBMS and File-processing system:
1.1 Database-System Applications
Databases are widely used. Here are some representative applications:
• Enterprise Information
◦ Sales: For customer, product, and purchase information.
◦ Accounting: For payments, receipts, account balances, assets and other accounting
information.
◦ Human resources: For information about employees, salaries, payroll taxes, and
benefits, and for generation of paychecks.
◦ Manufacturing: For management of the supply chain and for tracking production of
items in factories, inventories of items in warehouses and stores, and orders for items.
◦ Online retailers: For sales data noted above plus online order tracking, generation of
recommendation lists, and maintenance of online product evaluations.
• Banking and Finance
◦ Banking: For customer information, accounts, loans, and banking transactions.
◦ Credit card transactions: For purchases on credit cards and generation of monthly
statements.
◦ Finance: For storing information about holdings, sales, and purchases of financial
instruments such as stocks and bonds; also for storing real-time market data to enable
online trading by customers and automated trading by the firm.
• Universities: For student information, course registrations, and grades.
• Airlines: For reservations and schedule information. Airlines were among the first to use
databases in a geographically distributed manner.
• Telecommunication: For keeping records of calls made, generating monthly bills, maintaining
balances on prepaid calling cards, and storing information about the communication
networks.
1.2 Purpose of Database Systems

Database systems arose in response to early methods of computerized management of
commercial data. As an example of such methods, typical of the 1960s, consider part of a
university organization that, among other data, keeps information about all instructors, students,
departments, and course offerings. One way to keep the information on a computer is to store it
in operating system files. To allow users to manipulate the information, the system has a number
of application programs that manipulate the files, including programs to:
• Add new students, instructors, and courses
• Register students for courses and generate class rosters
• Assign grades to students, compute grade point averages (GPA), and generate
transcripts
System programmers wrote these application programs to meet the needs of the university. New
application programs are added to the system as the need arises. For example, suppose that the
university decides to create a new major. As a result, the university
creates a new department and creates new permanent files (or adds information to existing files)
to record information about all the instructors in the department, students in that major, course
offerings, degree requirements, etc. The university may have to write new application programs
to deal with rules specific to the new major. New application programs may also have to be
written to handle new rules in the university. Thus, as time goes by, the system acquires more
files and more application programs.
This typical file-processing system is supported by a conventional operating system. The
system stores permanent records in various files, and it needs different application programs to
extract records from, and add records to, the appropriate files. Before database management
systems (DBMSs) were introduced, organizations usually stored information in such systems.
Keeping organizational information in a file-processing system has a number of major
disadvantages:
• Data redundancy and inconsistency. Since different programmers create the files and
application programs over a long period, the various files are likely to have different structures
and the programs may be written in several programming languages. Moreover, the same
information may be duplicated in several places (files). This redundancy leads to higher storage
and access cost. In addition, it may lead to data inconsistency; that is, the various copies of the
same data may no longer agree.
• Difficulty in accessing data. The conventional file-processing environments do not allow
needed data to be retrieved in a convenient and efficient manner. More responsive data-retrieval
systems are required for general use.
• Data isolation. Because data are scattered in various files, and files may be in different
formats, writing new application programs to retrieve the appropriate data is difficult.
• Integrity problems. The data values stored in the database must satisfy certain types of
consistency constraints. In a file-processing system these constraints are enforced by code in the
various application programs, so adding or changing a constraint is difficult.
• Atomicity problems. A computer system, like any other device, is subject to failure. In many
applications, it is crucial that, if a failure occurs, the data be restored to the consistent state that
existed prior to the failure. The activities of a transaction must be atomic: they must happen in
their entirety or not at all. It is difficult to ensure atomicity in a conventional file-processing system.
• Concurrent-access anomalies. For the sake of overall performance of the system and faster
response, many systems allow multiple users to update the data simultaneously. Indeed, today,
the largest Internet retailers may have millions of accesses per day to their data by shoppers. In
such an environment, interaction of concurrent updates is possible and may result in inconsistent
data.
• Security problems. Not every user of the database system should be able to access all the data.
Because application programs are added to the file-processing system in an ad hoc manner, enforcing
such security constraints is difficult.
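The atomicity problem above can be illustrated with a minimal sketch in Python, using the standard sqlite3 module. The account table, its CHECK constraint, and the functions make_bank, transfer, and balances are assumptions made for this illustration and do not come from the notes. A transfer credits one account and then debits another inside a single transaction, so a failed debit also undoes the credit.

```python
import sqlite3


def make_bank():
    """Create a hypothetical in-memory bank with two accounts."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE account ("
        " id TEXT PRIMARY KEY,"
        " balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 50)])
    conn.commit()
    return conn


def transfer(conn, src, dst, amount):
    """Move amount from src to dst; return True on success, False if rolled back."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst)
            )
            # If this debit would make the balance negative, the CHECK
            # constraint fails and the credit above is rolled back as well.
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    """Return {account id: balance} for all accounts."""
    return dict(conn.execute("SELECT id, balance FROM account"))


if __name__ == "__main__":
    bank = make_bank()
    print(transfer(bank, "A", "B", 30), balances(bank))
    print(transfer(bank, "A", "B", 500), balances(bank))
```

With plain files, the application itself would have to detect the half-finished transfer and repair it; here the DBMS does so.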
1.3 View of Data

A database system is a collection of interrelated data and a set of programs that allow
users to access and modify these data. A major purpose of a database system is to provide users
with an abstract view of the data. That is, the system hides certain details of how the data are
stored and maintained.
1.3.1 Data Abstraction
Database systems are made up of complex data structures. To ease user interaction with the
database, the developers hide irrelevant internal details from users. This process of hiding
irrelevant details from the user is called data abstraction. The system hides certain details
of how data is created, stored, and maintained, so that this complexity is hidden from database
users.
There are several levels of abstraction:
• Physical level. The lowest level of abstraction describes how the data are actually stored.
The physical level describes complex low-level data structures in detail.
• Logical level. The next-higher level of abstraction describes what data are stored in the
database, and what relationships exist among those data. The logical level thus describes the
entire database in terms of a small number of relatively simple structures. Although
implementation of the simple structures at the logical level may involve complex physical-level
structures, the user of the logical level does not need to be aware of this complexity. This is
referred to as physical data independence. Database administrators, who must decide what
information to keep in the database, use the logical level of abstraction. It is also called the
conceptual level.
• View level. The highest level of abstraction describes only part of the entire database. Even
though the logical level uses simpler structures, complexity remains because of the variety of
information stored in a large database. Many users of the database system do not need all this
information; instead, they need to access only a part of the database. The view level of
abstraction exists to simplify their interaction with the system. The system may provide many
views for the same database.
Figure 1.1 The three levels of abstraction.

Data Independence: A major objective of the three-level architecture is to provide data
independence, which means that upper levels are unaffected by changes in lower levels. There
are two kinds of data independence:
Logical Data Independence: Logical data independence indicates that the conceptual
schema can be changed without affecting the existing external schemas. The change would be
absorbed by the mapping between the external and conceptual levels. Logical data independence
also insulates application programs from operations such as combining two records into one or
splitting an existing record into two or more records. This would require a change in the
external/conceptual mapping so as to leave the external view unchanged.

Physical Data Independence: Physical data independence indicates that the physical
storage structures or devices could be changed without affecting the conceptual schema. The
change would be absorbed by the mapping between the conceptual and internal levels. Physical
data independence is achieved by the presence of the internal level of the database.
Logical data independence is more difficult to achieve than physical data independence, as
it requires flexibility in the design of the database.
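As a rough sketch of logical data independence, the Python example below (standard sqlite3 module) lets an application query a view rather than a base table. The table names, the view name instructor_view, and the sample rows are hypothetical. When the base table is split in two, only the view definition (the external/conceptual mapping) changes, and the application query stays the same.

```python
import sqlite3

# The application only ever runs this query against the view.
APP_QUERY = "SELECT name, dept_name FROM instructor_view ORDER BY name"


def build_original(conn):
    """Original conceptual schema: one instructor table plus a view on it."""
    conn.executescript("""
        CREATE TABLE instructor (id TEXT PRIMARY KEY, name TEXT,
                                 dept_name TEXT, salary NUMERIC);
        INSERT INTO instructor VALUES ('10101', 'Srinivasan', 'Comp. Sci.', 65000);
        INSERT INTO instructor VALUES ('12121', 'Wu', 'Finance', 90000);
        CREATE VIEW instructor_view AS
            SELECT id, name, dept_name FROM instructor;
    """)


def restructure(conn):
    """Split the record into two tables and remap the view onto the new layout."""
    conn.executescript("""
        CREATE TABLE instructor_core (id TEXT PRIMARY KEY, name TEXT, dept_name TEXT);
        CREATE TABLE instructor_pay (id TEXT PRIMARY KEY, salary NUMERIC);
        INSERT INTO instructor_core SELECT id, name, dept_name FROM instructor;
        INSERT INTO instructor_pay SELECT id, salary FROM instructor;
        DROP VIEW instructor_view;
        DROP TABLE instructor;
        CREATE VIEW instructor_view AS
            SELECT id, name, dept_name FROM instructor_core;
    """)


def demo():
    """Run the same application query before and after the restructuring."""
    conn = sqlite3.connect(":memory:")
    build_original(conn)
    before = conn.execute(APP_QUERY).fetchall()
    restructure(conn)
    after = conn.execute(APP_QUERY).fetchall()
    return before, after
```

Both results returned by demo() should be identical, because the change is absorbed by the view definition.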
1.3.2 Instances and Schemas
A database schema is the skeleton structure of the database and represents the logical view of the
entire database, i.e., the overall design of the database is called the database schema. It describes how
the data is organized and how the relations among them are associated. It formulates all the
constraints that are to be applied to the data in the relations that reside in the database.
A database schema defines its entities and the relationships among them. It
is a descriptive detail of the database. The schema is prepared by the database designer to help
programmers understand all aspects of the database.
A database schema does not contain any data. It can be
divided broadly into two categories:
Physical Database Schema: This schema pertains to the actual storage of data and its form of
storage, such as files and indices. It defines how the data will be stored in secondary storage. It
describes the database design at the physical level.
Logical Database Schema: This defines all logical constraints that need to be applied to the data
stored. It defines tables, views, integrity constraints, etc. It describes the database design at the
logical level.
A database may also have several schemas at the view level, sometimes called subschemas, that
describe different views of the database.
Fig 1.2 Different levels of Schemas
Database Instance
The collection of information stored in the database at a particular moment is called an instance.
A database instance is the state of the operational database, with its data, at a given time, i.e., a
snapshot of the database. Database instances tend to change with time. The DBMS ensures that every
instance (state) is a valid state by enforcing all the validations, constraints, and conditions
that the database designers have imposed.
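The difference between a schema and an instance can be sketched in Python with the standard sqlite3 module. The department table (dept_name, building, budget), the sample row, and the helper names below are assumptions chosen for illustration. Inserting a row changes the instance, while the stored schema text stays the same.

```python
import sqlite3


def schema_of(conn, table):
    """Return the stored CREATE TABLE text (the schema) of a table."""
    row = conn.execute(
        "SELECT sql FROM sqlite_master WHERE type = 'table' AND name = ?", (table,)
    ).fetchone()
    return row[0]


def instance_of(conn):
    """Return the current rows (the instance) of the department table."""
    return conn.execute(
        "SELECT dept_name, building, budget FROM department ORDER BY dept_name"
    ).fetchall()


def demo():
    """Compare schema and instance before and after one insertion."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table department"
        " (dept_name char (20), building char (15), budget numeric (12,2))"
    )
    schema_before, instance_before = schema_of(conn, "department"), instance_of(conn)
    # Hypothetical sample row; only the instance should change.
    conn.execute("INSERT INTO department VALUES ('Physics', 'Watson', 70000)")
    schema_after, instance_after = schema_of(conn, "department"), instance_of(conn)
    return schema_before == schema_after, instance_before, instance_after
```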
1.3.3 Data Models

Data models are a collection of conceptual tools for describing data, data relationships, data
semantics and data constraints. A data model provides a way to describe the design of a database
at the physical, logical, and view levels. The data models can be classified into four different
categories:
• Relational Model. The relational model uses a collection of tables to represent both data and
the relationships among those data. Each table has multiple columns, and each column has a
unique name. Tables are also known as relations. The relational model is an example of a
record-based model. Record-based models are so named because the database is structured in
fixed-format records of several types. Each table contains records of a particular type. Each
record type defines a fixed number of fields, or attributes. The columns of the table correspond to
the attributes of the record type. The relational data model is the most widely used data model,
and a vast majority of current database systems are based on the relational model.
• Entity-Relationship Model. The entity-relationship (E-R) data model uses a collection of
basic objects, called entities, and relationships among these objects. An entity is a “thing” or
“object” in the real world that is distinguishable from other objects. The entity-relationship
model is widely used in database design.
• Object-Based Data Model. Object-oriented programming (especially in Java, C++, or C#) has
become the dominant software-development methodology. This led to the development of an
object-oriented data model that can be seen as extending the E-R model with notions of
encapsulation, methods (functions), and object identity. The object-relational data model
combines features of the object-oriented data model and relational data model.
• Semistructured Data Model. The semistructured data model permits the specification of data
where individual data items of the same type may have different sets of attributes. This is in
contrast to the data models mentioned earlier, where every data item of a particular type must
have the same set of attributes. The Extensible Markup Language (XML) is widely used to
represent semistructured data.
The network data model and the hierarchical data model preceded the relational data
model.
1.4 Database Languages

A database system provides a data-definition language to specify the database schema and a
data-manipulation language to express database queries and updates. In practice, these are
not two separate languages; both form parts of a single database language, such as the widely used SQL.
1.4.1 Data-Manipulation Language
A data-manipulation language (DML) is a language that enables users to access or manipulate
data as organized by the appropriate data model. The types of access are:
• Retrieval of information stored in the database
• Insertion of new information into the database
• Deletion of information from the database
• Modification of information stored in the database
There are basically two types of DML:
• Procedural DMLs require a user to specify what data are needed and how to get those data.
• Declarative DMLs (also referred to as nonprocedural DMLs) require a user to specify what
data are needed without specifying how to get those data.
Declarative DMLs are usually easier to learn and use than are procedural DMLs.
A query is a statement requesting the retrieval of information. The portion of a DML that
involves information retrieval is called a query language.
There are a number of database query languages in use, either commercially or
experimentally. The most widely used query language is SQL (Structured Query Language).
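The contrast between procedural and declarative access can be sketched in Python. The instructor rows and the function names are hypothetical. The procedural version spells out how to find the data (loop, test, collect, sort), while the declarative version only states what is wanted in SQL and leaves the how to the database system (here SQLite, through the standard sqlite3 module).

```python
import sqlite3

# Hypothetical sample data: (id, name, dept_name, salary)
ROWS = [
    ("10101", "Srinivasan", "Comp. Sci.", 65000),
    ("12121", "Wu", "Finance", 90000),
    ("22222", "Einstein", "Physics", 95000),
    ("15151", "Mozart", "Music", 40000),
]


def procedural(rows, min_salary):
    """Say HOW: scan every record, test it, collect the names, then sort."""
    result = []
    for _id, name, _dept, salary in rows:
        if salary > min_salary:
            result.append(name)
    result.sort()
    return result


def make_db():
    """Load the same rows into an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE instructor (id TEXT, name TEXT, dept_name TEXT, salary NUMERIC)"
    )
    conn.executemany("INSERT INTO instructor VALUES (?, ?, ?, ?)", ROWS)
    conn.commit()
    return conn


def declarative(conn, min_salary):
    """Say WHAT: the system decides how to evaluate the query."""
    cur = conn.execute(
        "SELECT name FROM instructor WHERE salary > ? ORDER BY name", (min_salary,)
    )
    return [name for (name,) in cur]
```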
1.4.2 Data-Definition Language
A database schema is specified by a set of definitions expressed in a special language
called a data-definition language (DDL). The DDL is also used to specify additional properties
of the data.
We specify the storage structure and access methods used by the database system by a set
of statements in a special type of DDL called a data storage and definition language. These
statements define the implementation details of the database schemas, which are usually hidden
from the users.
The data values stored in the database must satisfy certain consistency constraints.
The DDL provides facilities to specify such constraints. The database system checks these
constraints every time the database is updated.
• Domain Constraints. A domain of possible values must be associated with every attribute (for
example, integer types, character types, date/time types). Domain constraints are the most
elementary form of integrity constraint. They are tested easily by the system whenever a new
data item is entered into the database.
• Referential Integrity. There are cases where we wish to ensure that a value that appears in one
relation for a given set of attributes also appears in a certain set of attributes in another relation
(referential integrity). Database modifications can cause violations of referential integrity. When
a referential-integrity constraint is violated, the normal procedure is to reject the action that
caused the violation.
• Assertions. An assertion is any condition that the database must always satisfy. Domain
constraints and referential-integrity constraints are special forms of assertions. When an assertion
is created, the system tests it for validity. If the assertion is valid, then any future modification to
the database is allowed only if it does not cause that assertion to be violated.
• Authorization. We may want to differentiate among the users as far as the type of access they
are permitted on various data values in the database. These differentiations are expressed in
terms of authorization, the most common being: read authorization, which allows reading, but
not modification, of data; insert authorization, which allows insertion of new data, but not
modification of existing data; update authorization, which allows modification, but not
deletion, of data; and delete authorization, which allows deletion of data. We may assign the
user all, none, or a combination of these types of authorization.
The output of the DDL is placed in the data dictionary, which contains metadata—that is, data
about data.
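Two of the constraint types above can be sketched in Python with the standard sqlite3 module. The table definitions, the CHECK condition, and the helper try_sql are assumptions for illustration. SQLite enforces foreign keys only after PRAGMA foreign_keys = ON, and it has no CREATE ASSERTION or GRANT statements, so assertions and authorization are not shown.

```python
import sqlite3


def make_db():
    """Create two hypothetical tables with a domain-style and a referential constraint."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only if enabled
    conn.executescript("""
        CREATE TABLE department (
            dept_name TEXT PRIMARY KEY,
            building  TEXT,
            budget    NUMERIC CHECK (budget >= 0)   -- restricts allowed values
        );
        CREATE TABLE instructor (
            id        TEXT PRIMARY KEY,
            name      TEXT NOT NULL,
            dept_name TEXT REFERENCES department(dept_name)  -- referential integrity
        );
        INSERT INTO department VALUES ('Physics', 'Watson', 70000);
    """)
    return conn


def try_sql(conn, sql, params=()):
    """Run one statement; report whether the DBMS accepted or rejected it."""
    try:
        with conn:
            conn.execute(sql, params)
        return "accepted"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"


if __name__ == "__main__":
    db = make_db()
    print(try_sql(db, "INSERT INTO department VALUES ('Music', 'Packard', -5)"))
    print(try_sql(db, "INSERT INTO instructor VALUES ('1', 'Kim', 'Biology')"))
    print(try_sql(db, "INSERT INTO instructor VALUES ('2', 'Gold', 'Physics')"))
    print(try_sql(db, "DELETE FROM department WHERE dept_name = 'Physics'"))
```

In this sketch the violating action is rejected, which matches the normal procedure described above for referential-integrity violations.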
1.5 Relational Databases

A relational database is based on the relational model and uses a collection of tables to represent
both data and the relationships among those data. It also includes a DML and DDL. Most
commercial relational database systems employ the SQL language.
1.5.1 Tables
A table is a collection of data organized in rows and columns. In a DBMS, a
table is known as a relation and a row as a tuple. Each table has multiple columns, and each column
has a unique name; a column is also called an attribute. A table is a simple form of data storage and
a convenient representation of a relation. The following table represents the
Employee table.
EMP_ID  EMP_NAME  CITY        PHONE_NO
1       Kristen   Washington  7289201223
2       Anna      Franklin    9378282882
3       Jackson   Bristol     9264783838
4       Kellan    California  7254728346

The relational model is an example of a record-based model. Record-based models are so
named because the database is structured in fixed-format records of several types. Each table
contains records of a particular type. Each record type defines a fixed number of fields, or
attributes. The columns of the table correspond to the attributes of the record type.
1.5.2 Data-Manipulation Language
The data-manipulation language is used to manipulate the data itself. For example, insert,
update, and delete are DML statements in SQL. For instance, the following SQL DML insert statement
is used to insert data into a table.
Syntax:
insert into <table_name> (column list) values (column values);
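As a small illustration, the insert syntax and the other two DML statements can be run against the Employee table shown earlier, using Python's standard sqlite3 module. The column types, the new city value 'Boston', and the function name run_dml are assumptions made for this sketch.

```python
import sqlite3

EMPLOYEES = [
    (1, "Kristen", "Washington", "7289201223"),
    (2, "Anna", "Franklin", "9378282882"),
    (3, "Jackson", "Bristol", "9264783838"),
    (4, "Kellan", "California", "7254728346"),
]


def run_dml():
    """Insert, update, and delete rows, then return what remains."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE employee ("
        " emp_id INTEGER PRIMARY KEY, emp_name TEXT, city TEXT, phone_no TEXT)"
    )
    # insert: add the four rows of the Employee table
    conn.executemany(
        "INSERT INTO employee (emp_id, emp_name, city, phone_no) VALUES (?, ?, ?, ?)",
        EMPLOYEES,
    )
    # update: modify one attribute of an existing row (hypothetical new value)
    conn.execute("UPDATE employee SET city = ? WHERE emp_id = ?", ("Boston", 2))
    # delete: remove one row
    conn.execute("DELETE FROM employee WHERE emp_id = ?", (4,))
    conn.commit()
    return conn.execute(
        "SELECT emp_id, emp_name, city FROM employee ORDER BY emp_id"
    ).fetchall()
```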
1.5.3 Data Definition Language
SQL provides a rich DDL that allows one to define tables, integrity constraints, assertions, etc.
For instance, the following SQL DDL statement defines the department table:
create table department
(dept_name char (20),
building char (15),
budget numeric (12,2));
Execution of the above DDL statement creates the department table with three columns:
dept_name, building, and budget, each of which has a specific data type associated with it. The DDL
statement updates the data dictionary, which contains metadata. The schema of a table is an
example of metadata.
1.5.4 Database Access from Application Programs
SQL is not as powerful as a universal Turing machine; that is, there are some computations that
are possible using a general-purpose programming language but are not possible using SQL.
SQL also does not support actions such as input from users, output to displays, or
communication over the network. Such computations and actions must be written in a host
language, such as C, C++, or Java, with embedded SQL queries that access the data in the
database. Application programs are programs that are used to interact with the database in this
fashion. To access the database, DML statements need to be executed from the host language.
There are two ways to do this:
• By providing an application program interface (set of procedures) that can be used to send
DML and DDL statements to the database and retrieve the results.
The Open Database Connectivity (ODBC) standard for use with the C language is a
commonly used application program interface standard. The Java Database Connectivity
(JDBC) standard provides corresponding features to the Java language.
• By extending the host language syntax to embed DML calls within the host language
program. Usually, a special character prefaces DML calls, and a preprocessor, called the
DML pre-compiler, converts the DML statements to normal procedure calls in the host
language.
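The first approach, an application program interface, can be sketched in Python, whose standard sqlite3 module plays a role comparable to ODBC for C or JDBC for Java. The department rows and the function names are hypothetical. The host language sends a DML statement through API calls, fetches the result, and then does the formatting that SQL itself does not provide.

```python
import sqlite3


def make_db():
    """Create a small department table with hypothetical sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table department"
        " (dept_name char (20), building char (15), budget numeric (12,2))"
    )
    conn.executemany(
        "INSERT INTO department VALUES (?, ?, ?)",
        [
            ("Physics", "Watson", 70000),
            ("Biology", "Watson", 90000),
            ("Finance", "Painter", 120000),
        ],
    )
    conn.commit()
    return conn


def department_report(conn, building):
    """Send a query through the API and format the rows in the host language."""
    cur = conn.cursor()
    # The "?" placeholder lets the API pass the value separately from the SQL text.
    cur.execute(
        "SELECT dept_name, budget FROM department WHERE building = ? ORDER BY dept_name",
        (building,),
    )
    lines = [f"{name}: {budget:,.2f}" for name, budget in cur.fetchall()]
    cur.close()
    return lines


if __name__ == "__main__":
    for line in department_report(make_db(), "Watson"):
        print(line)  # output to the display is done by the host language
```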
1.6 Database Design

Database design is a collection of processes that facilitate the design, development,
implementation, and maintenance of enterprise data management systems. It
mainly involves the design of the database schema. A properly designed database is easy to
maintain, improves data consistency, and is cost-effective in terms of disk storage space. The
database designer decides how the data elements correlate and what data must be stored.
The main objective of database design is to produce logical and physical design
models of the proposed database system.
The logical model concentrates on the data requirements and the data to be stored, independent
of physical considerations. It does not concern itself with how the data will be stored or where it
will be stored physically.
The physical design model involves translating the logical design of the database onto
physical media using hardware resources and software systems such as database management
systems (DBMS).
1.6.1 Database Design for a University Organization

To illustrate the design process, consider a database for a university. The initial specification
of user requirements may be based on interviews with the database users, and on the designer’s own
analysis of the organization. The description that arises from this design phase serves as the basis
for specifying the conceptual structure of the database. Here are the major characteristics of the university.
• The university is organized into departments. Each department is identified by a unique
name (dept_name), is located in a particular building, and has a budget.
• Each department has a list of courses it offers. Each course has associated with it a
course_id, title, dept_name, and credits, and may also have associated prerequisites.
• Instructors are identified by their unique ID. Each instructor has a name, an associated
department (dept_name), and a salary.
• Students are identified by their unique ID. Each student has a name, an associated major
department (dept_name), and tot_credits.
• The university maintains a list of classrooms, specifying the name of the building, room
number, and room capacity.
• The university maintains a list of all classes (sections) taught. Each section is identified
by a course_id, sec_id, year, and semester, and has associated with it a semester, year,
building, room_number, and time_slot_id (the time slot when the class meets).
• The department has a list of teaching assignments specifying, for each instructor, the
sections the instructor is teaching.
• The university has a list of all student course registrations, specifying, for each student,
the courses and the associated sections that the student has taken.
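One possible translation of these requirements into tables is sketched below in Python with the standard sqlite3 module. The table names prereq, teaches, and takes, all column types, and the choice of keys are assumptions made for illustration; the requirements above do not fix them.

```python
import sqlite3

UNIVERSITY_DDL = """
CREATE TABLE department (dept_name TEXT PRIMARY KEY, building TEXT, budget NUMERIC);
CREATE TABLE course (
    course_id TEXT PRIMARY KEY, title TEXT,
    dept_name TEXT REFERENCES department(dept_name), credits INTEGER);
CREATE TABLE prereq (
    course_id TEXT REFERENCES course(course_id),
    prereq_id TEXT REFERENCES course(course_id),
    PRIMARY KEY (course_id, prereq_id));
CREATE TABLE instructor (
    id TEXT PRIMARY KEY, name TEXT NOT NULL,
    dept_name TEXT REFERENCES department(dept_name), salary NUMERIC);
CREATE TABLE student (
    id TEXT PRIMARY KEY, name TEXT NOT NULL,
    dept_name TEXT REFERENCES department(dept_name), tot_credits INTEGER);
CREATE TABLE classroom (
    building TEXT, room_number TEXT, capacity INTEGER,
    PRIMARY KEY (building, room_number));
CREATE TABLE section (
    course_id TEXT REFERENCES course(course_id),
    sec_id TEXT, semester TEXT, year INTEGER,
    building TEXT, room_number TEXT, time_slot_id TEXT,
    PRIMARY KEY (course_id, sec_id, semester, year),
    FOREIGN KEY (building, room_number)
        REFERENCES classroom(building, room_number));
CREATE TABLE teaches (
    id TEXT REFERENCES instructor(id),
    course_id TEXT, sec_id TEXT, semester TEXT, year INTEGER,
    PRIMARY KEY (id, course_id, sec_id, semester, year),
    FOREIGN KEY (course_id, sec_id, semester, year)
        REFERENCES section(course_id, sec_id, semester, year));
CREATE TABLE takes (
    id TEXT REFERENCES student(id),
    course_id TEXT, sec_id TEXT, semester TEXT, year INTEGER,
    PRIMARY KEY (id, course_id, sec_id, semester, year),
    FOREIGN KEY (course_id, sec_id, semester, year)
        REFERENCES section(course_id, sec_id, semester, year));
"""


def create_university_schema(conn):
    """Create the sketched university tables on an open connection."""
    conn.execute("PRAGMA foreign_keys = ON")  # make SQLite check the references
    conn.executescript(UNIVERSITY_DDL)
```

Teaching assignments and course registrations become separate tables here because each one links two other tables (instructor or student with section).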
1.6.2 The Entity-Relationship Model
The entity-relationship (E-R) data model uses a collection of basic objects, called
entities, and relationships among these objects. An entity is a “thing” or “object” in the real
world that is distinguishable from other objects. For example, each person is an entity, and bank
accounts can be considered as entities.
Entities are described in a database by a set of attributes. For example, the attributes
dept_name, building, and budget may describe one particular department in a university, and they
form attributes of the department entity set. Similarly, attributes ID, name, and salary may
describe an instructor entity. The extra attribute ID is used to identify an instructor uniquely, so a
unique instructor identifier must be assigned to each instructor.
A relationship is an association among several entities. For example, a member
relationship associates an instructor with his/her department. The set of all entities of the same
type and the set of all relationships of the same type are termed an entity set and a relationship
set, respectively.
The overall logical structure (schema) of a database can be expressed graphically by an
entity-relationship (E-R) diagram. There are several ways in which to draw these diagrams. One
of the most popular is to use the Unified Modeling Language (UML).
• Entity sets are represented by a rectangular box with the entity set name in the header and
the attributes listed below it.
• Relationship sets are represented by a diamond connecting a pair of related entity sets.
The name of the relationship is placed inside the diamond.
The E-R diagram indicates that there are two entity sets, instructor and department, with
attributes as outlined earlier. The diagram also shows a relationship member between instructor
and department. In addition to entities and relationships, the E-R model represents certain
constraints to which the contents of a database must conform. One important constraint is
mapping cardinalities, which express the number of entities to which another entity can be
associated via a relationship set.
1.6.3 Normalization
Another method for designing a relational database is to use a process commonly known as
normalization. The goal is to generate a set of relation schemas that allows us to store
information without unnecessary redundancy. The approach is to design schemas that are in an
appropriate normal form. The most common way to decide whether a schema is in a given normal
form is to use functional dependencies.
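A functional dependency X -> Y holds when any two rows that agree on X also agree on Y. The Python sketch below checks this on a small, made-up table that mixes instructor and department data, and then removes the repeated department information by projection. The rows and the function names holds and project are hypothetical and serve only to illustrate the idea.

```python
# Hypothetical unnormalized rows: the building is repeated for every instructor
# of the same department.
INST_DEPT = [
    {"id": "22222", "name": "Einstein", "dept_name": "Physics", "building": "Watson"},
    {"id": "33456", "name": "Gold", "dept_name": "Physics", "building": "Watson"},
    {"id": "12121", "name": "Wu", "dept_name": "Finance", "building": "Painter"},
]


def holds(rows, lhs, rhs):
    """Return True if the functional dependency lhs -> rhs holds in rows."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first value seen for this key and returns it
        if seen.setdefault(key, value) != value:
            return False
    return True


def project(rows, attrs):
    """Project rows onto attrs and drop duplicate results, keeping order."""
    out = []
    for row in rows:
        projected = {a: row[a] for a in attrs}
        if projected not in out:
            out.append(projected)
    return out


if __name__ == "__main__":
    print(holds(INST_DEPT, ["dept_name"], ["building"]))
    print(holds(INST_DEPT, ["dept_name"], ["id"]))
    # Decomposition: department data is stored once per department.
    print(project(INST_DEPT, ["dept_name", "building"]))
    print(project(INST_DEPT, ["id", "name", "dept_name"]))
```

Because dept_name -> building holds, the building can be moved into a separate department table and stored once per department instead of once per instructor.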
1.7 Data Storage and Querying
A database system is partitioned into modules that deal with each of the responsibilities
of the overall system. The functional components of a database system can be broadly divided
into the storage manager and the query processor components.
The storage manager is important because databases typically require a large amount of
storage space. Corporate databases range in size from hundreds of gigabytes to, for the largest
databases, terabytes of data. Since the main memory of computers cannot store this much
information, the information is stored on disks. Data are moved between disk storage and main
memory as needed. Since the movement of data to and from disk is slow relative to the speed of
the central processing unit, it is imperative that the database system structure the data so as to
minimize the need to move data between disk and main memory.
The query processor is important because it helps the database system to simplify and
facilitate access to data. The query processor allows database users to obtain good performance
while being able to work at the view level and not be burdened with understanding the physical-
level details of the implementation of the system. It is the job of the database system to translate
updates and queries written in a nonprocedural language, at the logical level, into an efficient
sequence of operations at the physical level.
1.7.1 Storage Manager
The storage manager is the component of a database system that provides the interface
between the low-level data stored in the database and the application programs and queries
submitted to the system. The storage manager is responsible for the interaction with the file
manager. The raw data are stored on the disk using the file system provided by the operating
system. The storage manager translates the various DML statements into low-level file-system
commands.
Thus, the storage manager is responsible for storing, retrieving, and updating data in the
database.
The storage manager components include:
• Authorization and integrity manager, which tests for the satisfaction of integrity constraints
and checks the authority of users to access data.
• Transaction manager, which ensures that the database remains in a consistent (correct) state
despite system failures, and that concurrent transaction executions proceed without conflicting.
• File manager, which manages the allocation of space on disk storage and the data structures
used to represent information stored on disk.
• Buffer manager, which is responsible for fetching data from disk storage into main memory,
and deciding what data to cache in main memory. The buffer manager is a critical part of the
database system, since it enables the database to handle data sizes that are much larger than the
size of main memory.
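The buffer manager's role can be illustrated with a small sketch. It assumes a least-recently-used (LRU) replacement policy, which is one common choice and is not prescribed by these notes; the class name BufferPool and the simulated disk are hypothetical.

```python
from collections import OrderedDict

class BufferPool:
    """Toy buffer manager: caches pages read from a simulated disk (LRU policy assumed)."""

    def __init__(self, disk, capacity):
        self.disk = disk              # dict page_id -> contents, stands in for disk storage
        self.capacity = capacity      # number of pages that fit in "main memory"
        self.frames = OrderedDict()   # cached pages, least recently used first
        self.disk_reads = 0

    def get_page(self, page_id):
        if page_id in self.frames:
            self.frames.move_to_end(page_id)   # hit: mark as most recently used
            return self.frames[page_id]
        self.disk_reads += 1                   # miss: the page must be fetched from disk
        if len(self.frames) >= self.capacity:
            self.frames.popitem(last=False)    # evict the least recently used page
        self.frames[page_id] = self.disk[page_id]
        return self.frames[page_id]

disk = {i: f"page-{i}" for i in range(10)}     # 10 pages on "disk"
pool = BufferPool(disk, capacity=3)            # only 3 fit in memory
for pid in [0, 1, 2, 0, 3, 0, 1]:
    pool.get_page(pid)
```

Seven page requests cause only five disk reads here, because page 0 is served from memory twice.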
The storage manager implements several data structures as part of the physical system
implementation:
• Data files, which store the database itself.
• Data dictionary, which stores metadata about the structure of the database, in particular the
schema of the database.
• Indices, which can provide fast access to data items. Like the index at the back of a textbook, a database
index provides pointers to those data items that hold a particular value.
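As a rough illustration of the index idea, the sketch below maps each value of one column to the positions of the records that hold it. The records and the helper names build_index and lookup are hypothetical; real systems typically use B+-trees or hashing on disk rather than an in-memory dictionary.

```python
from collections import defaultdict

# Hypothetical data file: a list of (ID, name, dept_name) records.
data_file = [
    ("1", "Asha", "Physics"),
    ("2", "Ravi", "History"),
    ("3", "Meena", "Physics"),
]

def build_index(records, column):
    """Map each value of the given column to the positions of matching records."""
    index = defaultdict(list)
    for position, record in enumerate(records):
        index[record[column]].append(position)
    return index

def lookup(index, records, value):
    """Follow the index 'pointers' instead of scanning every record."""
    return [records[pos] for pos in index.get(value, [])]

dept_index = build_index(data_file, column=2)
physics = lookup(dept_index, data_file, "Physics")
```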
1.7.2 The Query Processor
The query processor components include:
• DDL interpreter, which interprets DDL statements and records the definitions in the data
dictionary.
• DML compiler, which translates DML statements in a query language into an evaluation plan
consisting of low-level instructions that the query evaluation engine understands. The DML
compiler also performs query optimization; that is, it picks the lowest cost evaluation plan from
among the alternatives.
• Query evaluation engine, which executes low-level instructions generated by the DML
compiler.
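The idea of an evaluation plan can be observed in SQLite, used here only as a convenient stand-in for the DBMS discussed in these notes. The sketch assumes a simplified instructor table and a hypothetical index name; EXPLAIN QUERY PLAN is an SQLite feature, and the exact wording of its output varies between versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# DDL: the definitions are recorded in SQLite's catalog (its data dictionary).
conn.execute("CREATE TABLE instructor (ID TEXT PRIMARY KEY, name TEXT, dept_name TEXT, salary REAL)")
conn.execute("CREATE INDEX idx_instructor_dept ON instructor(dept_name)")

# DML: ask the system which plan it chose for this query instead of running it.
query = "SELECT * FROM instructor WHERE dept_name = ?"
plan = conn.execute("EXPLAIN QUERY PLAN " + query, ("Physics",)).fetchall()
plan_text = " ".join(str(row[-1]) for row in plan)   # last column holds the description
print(plan_text)
```

The printed plan should mention idx_instructor_dept, showing that the optimizer preferred an index lookup over scanning the whole table.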
1.8 Transaction Management

A transaction is a single logical unit of work which accesses and possibly modifies the
contents of a database. Transactions access data using read and write operations. In order to
maintain consistency in a database, before and after the transaction, certain properties are
followed. These are called the ACID properties: atomicity, consistency, isolation, and durability.
Fig : 1.4 ACID Properties
Ensuring the atomicity and durability properties is the responsibility of the database system itself,
specifically of the recovery manager. In the absence of failures, all transactions complete
successfully, and atomicity is achieved easily. However, because of various types of failure, a
transaction may not always complete its execution successfully. Thus, the database must be
restored to the state in which it was before the transaction in question started executing. The
database system must therefore perform failure recovery, that is, detect system failures and
restore the database to the state that existed prior to the occurrence of the failure.
Finally, when several transactions update the database concurrently, the consistency of
data may no longer be preserved, even though each individual transaction is correct. It is the
responsibility of the concurrency-control manager to control the interaction among the
concurrent transactions, to ensure the consistency of the database.
The transaction manager consists of the concurrency-control manager and the recovery
manager.
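Atomicity can be demonstrated with a minimal sketch using Python's sqlite3 module. The account table, its balances, and the transfer function are hypothetical; the CHECK constraint plays the role of a consistency rule that a failing transaction would violate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move amount from src to dst as one all-or-nothing unit of work."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes the block
            conn.execute("UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst))
            conn.execute("UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False

ok = transfer(conn, "A", "B", 30)        # succeeds: A=70, B=80
failed = transfer(conn, "A", "B", 500)   # credit to B is undone when the debit from A fails
balances = dict(conn.execute("SELECT id, balance FROM account"))
```

After the failed transfer the balances are unchanged from the last committed state, so the partial credit to B never becomes visible.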
1.9 Database Architecture

The architecture of a database system is greatly influenced by the underlying computer
system on which the database system runs. Database systems can be centralized, or client-server,
where one server machine executes work on behalf of multiple client machines. Database
systems can also be designed to exploit parallel computer architectures.
Database applications are usually partitioned into two or three parts, as in Figure 1.5. In a
two-tier architecture, the application resides at the client machine, where it invokes database
system functionality at the server machine through query language statements. Application
program interface standards like ODBC and JDBC are used for interaction between the client
and the server.
Fig : 1.5 Two-tier and three-tier architectures
In a three-tier architecture, the client machine acts as merely a front end and does not
contain any direct database calls. Instead, the client end communicates with an application
server, usually through a forms interface. The application server in turn communicates with a
database system to access data. The business logic of the application, which says what actions to
carry out under what conditions, is embedded in the application server, instead of being
distributed across multiple clients. Three-tier applications are more appropriate for large
applications, and for applications that run on the World Wide Web.
Fig 1.6 System structure.
1.10 Data Mining and Information Retrieval
The term data mining refers loosely to the process of semiautomatically analyzing large
databases to find useful patterns. Like knowledge discovery in artificial intelligence (also called
machine learning) or statistical analysis, data mining attempts to discover rules and patterns
from data. However, data mining differs from machine learning and statistics in that it deals with
large volumes of data, stored primarily on disk. That is, data mining deals with “knowledge
discovery in databases.”
Some types of knowledge discovered from a database can be represented by a set of
rules. Large companies have diverse sources of data that they need to use for making business
decisions. To execute queries efficiently on such diverse data, companies have built data
warehouses. Data warehouses gather data from multiple sources under a unified schema, at a
single site. Thus, they provide the user a single uniform interface to data.
Textual data, too, has grown explosively. Textual data is unstructured, unlike the rigidly
structured data in relational databases. Querying of unstructured textual data is referred to as
information retrieval.
Information retrieval systems have much in common with database systems—in
particular, the storage and retrieval of data on secondary storage.
1.11 Specialty Databases

A specialty database is a collection of focused information on one or more specific fields of
study. The information is stored in such a way that the user can locate and
retrieve it quickly and easily.
1.11.1 Object-Based Data Models: As the name suggests, an object-based data model is built on
object-oriented programming: it associates methods (procedures) with objects, and objects can
benefit from class hierarchies. Objects are abstractions that bundle properties with actions.
This type of data model focuses on how data is expressed. The data is divided into units, each
of which has some defining properties. The object-oriented data model also supports a rich type
system, including structured and collection types. Examples of Object-Based Data Models:
• ER (Entity Relationship) Data Model
• Semantic Data Model
• Functional Data Model
1.11.2 Semi-Structured Data Models: These data models were planned as an evolution of the
relational data model. A semi-structured data model makes no separation between the data and
the schema, which allows data to be represented with a flexible structure. In this data model,
individual items can have different numbers of attributes, and one item may contain items with
different structures. Because the data values and the schema components are carried together,
they stay consistent with each other. Some characteristics of semi-structured data models are:
• The schema can be changed easily.
• It gives a flexible format for exchanging data between different types of databases.
• The data transfer format may be portable.
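A short sketch of semi-structured data, using JSON as one widely used format. The notes do not name a specific format, and the records below are hypothetical. Each item carries its own attribute names, so two items in the same collection may have different attributes and nested structure.

```python
import json

# Two items in the same collection with different attributes (hypothetical data).
items = [
    {"name": "Asha", "phone": "555-0101"},
    {"name": "Ravi", "email": "ravi@example.com",
     "address": {"city": "Kavali", "state": "AP"}},   # nested item with its own structure
]

payload = json.dumps(items)       # text form suitable for exchange between systems
restored = json.loads(payload)    # the receiver needs no separately declared schema
attribute_sets = [sorted(item) for item in restored]
```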
1.12 Database Users and Administrators

A primary goal of a database system is to retrieve information from and store new
information into the database. People who work with a database can be categorized as database
users or database administrators.
1.12.1 Database Users and User Interfaces
There are four different types of database-system users, differentiated by the way they
expect to interact with the system. Different types of user interfaces have been designed for the
different types of users.
• Naive users are unsophisticated users who interact with the system by invoking one of the
application programs that have been written previously. Naive users may also simply read
reports generated from the database.
• Application programmers are computer professionals who write application programs.
Application programmers can choose from many tools to develop user interfaces. Rapid
application development (RAD) tools are tools that enable an application programmer to
construct forms and reports with minimal programming effort.
• Sophisticated users interact with the system without writing programs. Instead, they form their
requests either using a database query language or by using tools such as data analysis software.
Analysts who submit queries to explore data in the database fall in this category.
• Specialized users are sophisticated users who write specialized database applications that do
not fit into the traditional data-processing framework. Among these applications are computer-
aided design systems, knowledgebase and expert systems, systems that store data with complex
data types (for example, graphics data and audio data), and environment-modeling systems.
1.12.2 Database Administrator
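One of the main reasons for using a DBMS is to have central control of both the data and the
programs that access those data. A person who has such central control over the system is called
a database administrator (DBA). The functions of a DBA include: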
• Schema definition. The DBA creates the original database schema by executing a set of data
definition statements in the DDL.
• Storage structure and access-method definition.
• Schema and physical-organization modification. The DBA carries out changes to the schema
and physical organization to reflect the changing needs of the organization, or to alter the
physical organization to improve performance.
• Granting of authorization for data access. By granting different types of authorization, the
database administrator can regulate which parts of the database various users can access. The
authorization information is kept in a special system structure that the database system consults
whenever someone attempts to access the data in the system.
• Routine maintenance. Examples of the database administrator’s routine maintenance activities
are:
◦ Periodically backing up the database, either onto tapes or onto remote servers, to
prevent loss of data in case of disasters such as flooding.
◦ Ensuring that enough free disk space is available for normal operations, and upgrading
disk space as required.
◦ Monitoring jobs running on the database and ensuring that performance is not degraded
by very expensive tasks submitted by some users.
Introduction to the Relational Model
A relational database is a collection of one or more ‘relations’, where each relation is a table
with rows and columns. The relational model is the primary data model for commercial data
processing applications. The major advantages of the relational model over the older data models are:
1. It is simple and elegant.
2. Its data representation is simple.
3. Even complex queries can be expressed with ease.
The main construct for representing data in the relational model is a ‘relation’. A relation
consists of
1. Relation Schema.
2. Relation Instance.
1. Relation Schema: The relation schema describes the column heads for the table. The schema
specifies the relation’s name, the name of each field (column, attribute) and the ‘domain’ of each
field. A domain is referred to in a relation schema by the domain name and has a set of
associated values.
Example:
Student information in a university database illustrates the parts of a relation schema.
Students (sid: string, name: string, login: string, age: integer, gross: real)
This says that the field named ‘sid’ has a domain named ‘string’.
The set of values associated with domain ‘string’ is the set of all character strings.
2. Relation Instance: This is a table holding the actual data. An instance of a relation is a set
of ‘tuples’, also called ‘records’, in which each tuple has the same number of fields as the
relation schema.
A relation instance can be thought of as a table in which each tuple is a row and all rows have the
same number of fields. A relation instance is often simply called a ‘relation’. Each relation is
defined to be a set of unique tuples or rows.
Example:
This example is an instance of the Students relation, which consists of 4 tuples and 5 fields. No two
rows are identical.
Degree: The number of fields is called the ‘degree’ of the relation. It is also called the ‘arity’.
Cardinality: The cardinality of a relation instance is the number of tuples in it.
Example: In the above example, the degree of the relation is 5 and the cardinality is 4.
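Degree and cardinality can be computed directly from a relation instance. The sketch below uses SQLite through Python's sqlite3 module. The four rows are hypothetical sample data, since the instance table referred to above is not reproduced in these notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Students (sid TEXT, name TEXT, login TEXT, age INTEGER, gross REAL)")

# Hypothetical instance with 4 tuples.
rows = [
    ("s01", "Asha", "asha@cs", 18, 3.4),
    ("s02", "Ravi", "ravi@ee", 19, 3.2),
    ("s03", "Meena", "meena@cs", 18, 3.8),
    ("s04", "Kiran", "kiran@me", 20, 2.9),
]
conn.executemany("INSERT INTO Students VALUES (?, ?, ?, ?, ?)", rows)

cursor = conn.execute("SELECT * FROM Students")
degree = len(cursor.description)       # number of fields (columns)
cardinality = len(cursor.fetchall())   # number of tuples (rows)
```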
Relational database: It is a collection of relations with distinct relation names.
Relational database schema: It is the collection of schemas for the relations in the database.
Instance:
An instance of a relational database is a collection of relation instances, one per relation schema
in the database schema. Each relation instance must satisfy the domain constraints in its schema.
1.13 Structure of Relational Databases

A relational database consists of a collection of tables, each of which is assigned a
unique name. For example, consider the following instructor table, which stores
information about instructors. The table has four column headers: ID, name, dept_name,
and salary. Each row of this table records information about an instructor,
consisting of the instructor’s ID, name, dept_name, and salary. Similarly, the course
table stores information about courses, consisting of a course_id, title,
dept_name, and credits, for each course. Note that each instructor is identified by
the value of the column ID, while each course is identified by the value of the
column course_id.
A third table, prereq, stores the prerequisite courses for
each course. The table has two columns, course_id and prereq_id. Each row consists
of a pair of course identifiers such that the second course is a prerequisite for the
first course.
Thus, a row in the prereq table indicates that two courses are related in the
sense that one course is a prerequisite for the other. As another example, consider
the instructor table: a row in the table can be thought of as representing
the relationship between a specified ID and the corresponding values for name,
dept_name, and salary.
In general, a row in a table represents a relationship among a set of values.
Since a table is a collection of such relationships, there is a close correspondence
between the concept of table and the mathematical concept of relation, from which
the relational data model takes its name. In mathematical terminology, a tuple is
simply a sequence (or list) of values. A relationship between n values is represented
mathematically by an n-tuple of values, i.e., a tuple with n values, which
corresponds to a row in a table.
Thus, in the relational model the term relation is used to refer to a table, while
the term tuple is used to refer to a row. Similarly, the term attribute refers to a column of a
table.
1.14 Database Schema
A database schema is the skeleton structure that represents the logical view of the entire
database. It defines how the data is organized and how the relations among them are associated.
It formulates all the constraints that are to be applied on the data.
A database schema defines its entities and the relationship among them. It contains a descriptive
detail of the database, which can be depicted by means of schema diagrams. It’s the database
designers who design the schema to help programmers understand the database and make it
useful.
A database schema can be divided broadly into two categories:
• Physical Database Schema: This schema pertains to the actual storage of data and its
form of storage like files, indices, etc. It defines how the data will be stored in a
secondary storage.
• Logical Database Schema: This schema defines all the logical constraints that need to
be applied on the data stored. It defines tables, views, and integrity constraints.
Database Instance
It is important to distinguish a database schema from a database instance. A database schema is
the skeleton of the database. It is designed before the database exists. Once the database is
operational, it is very difficult to make any changes to the schema. A database schema does not
contain any data or information.
A database instance is the state of an operational database, with its data, at a given point in
time. It contains a snapshot of the database. Database instances tend to change with time. A DBMS
ensures that every instance (state) is valid, by diligently enforcing all the validations, constraints,
and conditions that the database designers have imposed.
Difference between Schema and Instance
Schema | Instance
It is the overall description of the database. | It is the collection of information stored in a database at a particular moment.
The schema is the same for the whole database. | Data in instances can be changed using addition, deletion, and updation.
Does not change frequently. | Changes frequently.
Defines the basic structure of the database, i.e. how the data will be stored in the database. | It is the set of information stored at a particular time.
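The difference can be seen in a small SQLite sketch. The course table and the inserted row are illustrative assumptions; sqlite_master is SQLite's own catalog table, used here to read back the stored schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE course (course_id TEXT PRIMARY KEY, title TEXT, credits INTEGER)")

def schema(conn):
    """The stored table definitions (the description of the database)."""
    return [row[0] for row in conn.execute("SELECT sql FROM sqlite_master WHERE type = 'table'")]

def instance(conn):
    """The data held in the course table at this moment."""
    return conn.execute("SELECT * FROM course ORDER BY course_id").fetchall()

schema_before, instance_before = schema(conn), instance(conn)
conn.execute("INSERT INTO course VALUES ('C-101', 'Sample Course', 4)")   # hypothetical row
schema_after, instance_after = schema(conn), instance(conn)
```

Inserting a row changes the instance from empty to one tuple, while the schema read from the catalog stays exactly the same.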
1.15 Keys
A key refers to an attribute or a set of attributes that help us identify a row (or tuple) uniquely
in a table (or relation). A key is also used when we want to establish relationships between the
different columns and tables of a relational database. The individual values present in a key are
commonly referred to as key values. The following 10 keys are important in a DBMS:
1. Super key 2. Candidate key 3. Primary key 4. Alternate key 5. Foreign key
6. Partial key 7. Composite key 8. Unique key 9. Surrogate key 10. Secondary key
1. Super Key
• A super key is a set of attributes that can identify each tuple uniquely in the given relation.
• A super key is not restricted to have any specific number of attributes.
• Thus, a super key may consist of any number of attributes.
<STUDENT> Table
Student_Number Student_Name Student_Phone Subject_Number
1 Ruthwiz 6615927284 10
2 Sahasra 6583654865 20
3 Karthikeya 4647567463 10
<SUBJECT> Table
Subject_Number Subject_Name Subject_Instructor
10 DBMS Korth
20 Algorithms Cormen
30 Algorithms Leiserson
<ENROLL> Table
Student_Number Subject_Number
1 10
2 20
3 10
The Super Keys in <Student> table are :
{Student_Number}
{Student_Phone}
{Student_Number,Student_Name}
{Student_Number,Student_Phone}
{Student_Number,Subject_Number}
{Student_Phone,Student_Name}
{Student_Phone,Subject_Number}
{Student_Number,Student_Name,Student_Phone}
{Student_Number,Student_Phone,Subject_Number}
{Student_Number,Student_Name,Subject_Number}
{Student_Phone,Student_Name,Subject_Number}
The Super Keys in <Subject> table are :
{Subject_Number}
{Subject_Number,Subject_Name}
{Subject_Number,Subject_Instructor}
{Subject_Number,Subject_Name,Subject_Instructor}
{Subject_Name,Subject_Instructor}
NOTE : All the attributes in a super key are definitely sufficient to identify each tuple uniquely
in the given relation but all of them may not be necessary.
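The uniqueness requirement can be checked against the sample <Student> rows with a short sketch. The helper name is_unique_on_instance is hypothetical. Note the limitation: a key is a property of the schema, so sample data can only show that a set of attributes is not a super key (when duplicates appear); passing the check on three rows does not prove that the attributes form a super key in general.

```python
STUDENT_COLUMNS = ("Student_Number", "Student_Name", "Student_Phone", "Subject_Number")
STUDENT_ROWS = [
    (1, "Ruthwiz", "6615927284", 10),
    (2, "Sahasra", "6583654865", 20),
    (3, "Karthikeya", "4647567463", 10),
]

def is_unique_on_instance(columns, rows, attrs):
    """True if no two rows agree on all of the given attributes."""
    positions = [columns.index(a) for a in attrs]
    projected = [tuple(row[p] for p in positions) for row in rows]
    return len(set(projected)) == len(projected)

number_ok = is_unique_on_instance(STUDENT_COLUMNS, STUDENT_ROWS, ["Student_Number"])
subject_ok = is_unique_on_instance(STUDENT_COLUMNS, STUDENT_ROWS, ["Subject_Number"])
phone_subject_ok = is_unique_on_instance(
    STUDENT_COLUMNS, STUDENT_ROWS, ["Student_Phone", "Subject_Number"])
```

Here {Subject_Number} fails because the value 10 appears twice, which is why it is absent from the list of super keys above.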
2. Candidate Key
A minimal super key is called a candidate key; that is, a candidate key is a minimal set of
attributes that can identify each tuple uniquely in the given relation.
• The candidate keys in the <Student> table are {Student_Number} and {Student_Phone}.
• The candidate keys in the <Subject> table are {Subject_Number} and
{Subject_Name,Subject_Instructor}.
• The candidate key in the <Enroll> table is {Student_Number, Subject_Number}.
NOTE :
• All the attributes in a candidate key are sufficient as well as necessary to identify each
tuple uniquely.
• The value of a candidate key must always be unique.
• The value of a candidate key can never be NULL.
• It is possible to have multiple candidate keys in a relation.
• The attributes which appear in some candidate key are called prime attributes.
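Minimality can be made concrete with a sketch that picks the candidate keys out of a list of super keys. It takes the super keys of the <Subject> table exactly as listed earlier and assumes that list is complete; the function name candidate_keys is hypothetical.

```python
# Super keys of the <Subject> table, as listed in the notes.
SUBJECT_SUPER_KEYS = [
    frozenset({"Subject_Number"}),
    frozenset({"Subject_Number", "Subject_Name"}),
    frozenset({"Subject_Number", "Subject_Instructor"}),
    frozenset({"Subject_Number", "Subject_Name", "Subject_Instructor"}),
    frozenset({"Subject_Name", "Subject_Instructor"}),
]

def candidate_keys(super_keys):
    """Keep the super keys that have no proper subset which is also a super key."""
    return [k for k in super_keys if not any(other < k for other in super_keys)]

subject_candidates = candidate_keys(SUBJECT_SUPER_KEYS)
```

The result is {Subject_Number} and {Subject_Name, Subject_Instructor}; every other super key contains {Subject_Number} and is therefore not minimal.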
3. Primary Key
A primary key, also called a primary keyword, is a key in a relational database that is
unique for each record. It is a unique identifier; it prevents NULL values and
duplicate values from being entered into the key column(s) of a table. A table can have only one
primary key.
A primary key is a candidate key that the database designer selects while designing the
database.
Ex : SQL> create table student(
             Student_Number number(4) primary key,
             Student_Name varchar2(20),
             Student_Phone number(10),
             Subject_Number number(10));
• The Primary Key in <Student> table is {Student_Number}
• The Primary Key in <Subject> table is {Subject_Number}
• The Primary Key in <Enroll> table is {Student_Number, Subject_Number}
NOTE :
• The value of a primary key can never be NULL.
• The value of a primary key must always be unique.
• The value of a primary key should not be changed once assigned; SQL permits updating it, but
doing so is discouraged.
• The value of a primary key must be assigned when inserting a record.
• A relation is allowed to have only one primary key.
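The uniqueness rule can be observed with SQLite through Python's sqlite3 module. This is a sketch, not the Oracle session shown above: SQLite column types stand in for number and varchar2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE student (
        Student_Number INTEGER PRIMARY KEY,
        Student_Name   TEXT,
        Student_Phone  TEXT,
        Subject_Number INTEGER
    )
""")
conn.execute("INSERT INTO student VALUES (1, 'Ruthwiz', '6615927284', 10)")

try:
    # Same Student_Number as the existing row: the primary key rejects it.
    conn.execute("INSERT INTO student VALUES (1, 'Sahasra', '6583654865', 20)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM student").fetchone()[0]
```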
4. Alternate Key
A candidate key other than the primary key is called an alternate key.
For example, Student_Number and Student_Phone are both candidate keys for the relation
Student; if Student_Number is chosen as the primary key, Student_Phone is an alternate key.
An alternate key is also called a secondary key, i.e. all the candidate keys that are not chosen
as the primary key are alternate keys.
5. Foreign Key
A foreign key is an attribute (or set of attributes) in one table whose values must also appear in
another table. The foreign key in the child table will generally reference a primary key in the
parent table. The referencing table is called the child table and the referenced table is called the
parent table.
To provide referential integrity, the following conditions must hold:
1) The data types of the two columns must be the same.
2) The referenced key must be a primary key (or a unique key).
{Subject_Number} is the foreign key of the <Student> table and the primary key of the <Subject> table.
EX : SQL> create table department(
             deptno number(2) primary key,
             dname varchar2(15),
             location varchar2(15));
     SQL> create table emp(
             empno number(6) primary key,
             empname varchar2(20) not null,
             designation varchar2(15),
             deptno number(2) references department(deptno));
NOTE :
• A foreign key references the primary key of the referenced table.
• A foreign key can take only those values which are present in the primary key of the
referenced relation.
• A foreign key can take the NULL value.
• There is no restriction on a foreign key to be unique.
• In fact, a foreign key is not unique most of the time.
• The referenced relation may also be called the master table or primary table.
• The referencing relation may also be called the foreign table.
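The points in the note can be tried out with the department and emp tables in SQLite. This is a sketch under assumptions: SQLite types replace the Oracle ones, the inserted rows are hypothetical, and SQLite enforces foreign keys only after PRAGMA foreign_keys = ON.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite-specific: enable enforcement
conn.execute("CREATE TABLE department (deptno INTEGER PRIMARY KEY, dname TEXT, location TEXT)")
conn.execute("""
    CREATE TABLE emp (
        empno       INTEGER PRIMARY KEY,
        empname     TEXT NOT NULL,
        designation TEXT,
        deptno      INTEGER REFERENCES department(deptno)
    )
""")
conn.execute("INSERT INTO department VALUES (10, 'Research', 'Kavali')")
conn.execute("INSERT INTO emp VALUES (1, 'Anil', 'Analyst', 10)")   # parent row exists
conn.execute("INSERT INTO emp VALUES (2, 'Bala', 'Clerk', NULL)")   # NULL foreign key is allowed

try:
    conn.execute("INSERT INTO emp VALUES (3, 'Chitra', 'Manager', 99)")   # no department 99
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

emp_count = conn.execute("SELECT COUNT(*) FROM emp").fetchone()[0]
```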
6. Partial Key
• A partial key is a key using which all the records of the table cannot be identified uniquely.
• However, a bunch of related tuples can be selected from the table using the partial key.
Ex : Consider the following schema-
Department ( Emp_no , Dependent_name , Relation )
Emp_no Dependent_name Relation
E1 Suman Mother
E1 Ajay Father
E2 Vijay Father
E2 Ankush Son
Here, using partial key Emp_no, we can not identify a tuple uniquely but we can select a bunch
of tuples from the table.
7. Composite Key
If any single attribute of a table is not capable of being the key i.e it cannot identify a row
uniquely, then we combine two or more attributes to form a key. This is known as a composite
key.
The Composite Key in <Enroll> table is {Student_Number, Subject_Number}
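A composite key on the <Enroll> table can be sketched in SQLite as follows, with SQLite types assumed in place of Oracle's. Neither column is unique on its own, but the pair must be.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE enroll (
        Student_Number INTEGER,
        Subject_Number INTEGER,
        PRIMARY KEY (Student_Number, Subject_Number)
    )
""")
conn.executemany("INSERT INTO enroll VALUES (?, ?)", [(1, 10), (2, 20), (3, 10)])

conn.execute("INSERT INTO enroll VALUES (1, 20)")   # student 1 and subject 20 each repeat: allowed
try:
    conn.execute("INSERT INTO enroll VALUES (1, 10)")   # the same pair again: rejected
    repeat_rejected = False
except sqlite3.IntegrityError:
    repeat_rejected = True

enroll_count = conn.execute("SELECT COUNT(*) FROM enroll").fetchone()[0]
```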
8. Unique Key
Unique is used for preventing duplicate values in a column of a table. Unlike a primary key, this
constraint allows NULL values. Its value can be updated, provided the new value does not
duplicate an existing one.
create table student(
    Student_Number number(4) primary key,
    Student_Name varchar2(20),
    Student_Phone number(10) unique,
    Subject_Number number(10));
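The behaviour of a unique constraint can be checked with a small SQLite sketch. SQLite types are assumed in place of Oracle's, and the treatment of NULLs shown here is SQLite's; other database systems may handle NULLs in unique columns differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE student (
        Student_Number INTEGER PRIMARY KEY,
        Student_Name   TEXT,
        Student_Phone  TEXT UNIQUE,
        Subject_Number INTEGER
    )
""")
conn.execute("INSERT INTO student VALUES (1, 'Ruthwiz', '6615927284', 10)")
conn.execute("INSERT INTO student VALUES (2, 'Sahasra', NULL, 20)")      # NULL is accepted
conn.execute("INSERT INTO student VALUES (3, 'Karthikeya', NULL, 10)")   # and may repeat in SQLite

try:
    # Hypothetical fourth student reusing an existing phone number.
    conn.execute("INSERT INTO student VALUES (4, 'Test', '6615927284', 10)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

student_count = conn.execute("SELECT COUNT(*) FROM student").fetchone()[0]
```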
9. Surrogate Key
Surrogate key is a key with the following properties-
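• It is unique for all the records of the table.
• It is updatable.
• It cannot be NULL i.e. it must have some value.
Example-
Student_Phone of student, where every student owns a mobile phone.
In practice, a surrogate key is usually an artificial, system-generated identifier (for example, an
auto-incremented number) rather than a real-world attribute.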
10. Secondary Key

Secondary key is required for the indexing purpose for better and faster searching. Only one of
the candidate keys is selected as the primary key. The rest of them are known as secondary keys.
The Secondary Key in <Student> table is {Student_Phone}

The Secondary Key in <Subject> table is {Subject_Name,Subject_Instructor}
1.16 Schema Diagrams

A database schema, along with primary key and foreign key dependencies, can be depicted by
schema diagrams. The following Figure shows the schema diagram for our university
organization. Each relation appears as a box, with the relation name at the top, and the attributes
listed inside the box. Primary key attributes are shown underlined. Foreign key dependencies
appear as arrows from the foreign key attributes of the referencing relation to the primary key of
the referenced relation.
Referential integrity constraints other than foreign key constraints are not shown
explicitly in schema diagrams. Many database systems provide design tools with a graphical user
interface for creating schema diagrams.
1.17 Relational Query Languages
A query language is a language in which a user requests information from the database. These
languages are usually on a level higher than that of a standard programming language. Query
languages can be categorized as either procedural or nonprocedural. In a procedural language,
the user instructs the system to perform a sequence of operations on the database to compute the
desired result. In a nonprocedural language, the user describes the desired information without
giving a specific procedure for obtaining that information.
Query languages used in practice include elements of both the procedural and the
nonprocedural approaches. There are a number of “pure” query languages: The relational algebra
is procedural, whereas the tuple relational calculus and domain relational calculus are
nonprocedural. These query languages are terse and formal, lacking the “syntactic sugar” of
commercial languages, but they illustrate the fundamental techniques for extracting data from the
database. The relational algebra consists of a set of operations that take one or two relations as
input and produce a new relation as their result. The relational calculus uses predicate logic to
define the result desired without giving any specific algebraic procedure for obtaining that result.
1.18 Relational Operations

The relational algebra is a procedural query language. It consists of a set of operations that take
one or two relations as input and produce a new relation as their result. The fundamental
operations in the relational algebra are select, project, union, set difference, Cartesian
product, and rename. In addition to the fundamental operations, there are several other
operations namely, set intersection, natural join, division, and assignment.
Fundamental Operations The select, project, and rename operations are called unary
operations, because they operate on one relation. The other three operations operate on pairs of
relations and are, therefore, called binary operations. Various operations are shown as follows:
Select Operation (σ) :

Relational algebra includes operators to select rows from a relation and to project columns (π).
These operations allow us to manipulate data in a single relation. It selects tuples that satisfy the
given predicate from a relation. i.e The selection operator σ allows us to extract rows
from a relation
Notation : σp (r)
where σ is the selection operator, p is the selection predicate, and r is the relation. p is a
propositional logic formula whose terms may be combined with the connectives and, or, and not.
The terms may use the comparison operators =, ≠, ≥, <, >, ≤.
Ex 1 : Select those tuples of the loan relation where the branch is “Perryridge”.
Ans : σ branch-name = “Perryridge” (loan)
Ex 2 : Select tuples from Books where subject is 'database'.
σ subject = "database" (Books)
Ex 3 : Select tuples from Books where subject is 'database' and price is 450.
σ subject = "database" and price = 450 (Books)
Ex 4 : Select tuples from Books where subject is 'database' and price is
450, or where the book was published after 2010.
σ (subject = "database" and price = 450) or year > 2010 (Books)
Let us consider the following schemas
Project Operation (∏) : Projection is the operation of selecting certain attributes from a
relation R to form a new relation, i.e., the projection operator ∏ allows us to extract
columns from a relation.
It keeps only the listed column(s) and discards the rest; unlike selection, it takes no predicate.

Notation − ∏ A1, A2, ..., An (r)
where A1, A2, ..., An are attribute names of relation r.
Duplicate rows are automatically eliminated, as a relation is a set.
For example −
Ex 1 : Selects columns named as subject and author from the relation Books
∏subject, author (Books)
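The select and project operations can be sketched in a few lines of Python. This is only an illustration: a relation is modelled as a list of dictionaries, the Books rows are made-up sample data, and the function names select and project are chosen here for the sketch.

```python
# A relation is modelled as a list of dicts; the Books rows are made-up sample data.
books = [
    {"title": "DB Concepts", "subject": "database", "author": "Korth", "price": 450, "year": 2011},
    {"title": "DB Design", "subject": "database", "author": "Korth", "price": 300, "year": 2008},
    {"title": "Networks", "subject": "networks", "author": "Tanenbaum", "price": 450, "year": 2012},
]


def select(relation, predicate):
    """sigma_p(r): keep the tuples for which predicate(tuple) is true."""
    return [t for t in relation if predicate(t)]


def project(relation, attributes):
    """pi_{A1..An}(r): keep only the named attributes and drop duplicate rows."""
    seen, result = set(), []
    for t in relation:
        row = tuple(t[a] for a in attributes)
        if row not in seen:
            seen.add(row)
            result.append(dict(zip(attributes, row)))
    return result


# Ex 3 of the select operation: subject = "database" and price = 450
db_450 = select(books, lambda t: t["subject"] == "database" and t["price"] == 450)
# Ex 1 of the project operation: columns subject and author
subject_author = project(books, ["subject", "author"])
print(db_450)
print(subject_author)
```

Note that project returns two rows for three input tuples, because the two Korth database books collapse into one (subject, author) pair.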
Union Operation (∪) :

It returns a relation consisting of all rows appearing in either or both of the two specified
relations, i.e., R ∪ S returns a relation containing all tuples that occur in R, in S, or in both.
Here, R and S must be union compatible, and the schema of the result is defined to be identical to the
schema of R.
For the union operation to be valid, the following conditions must hold:
• r and s must have the same number of attributes.
• The domains of corresponding attributes must be compatible.
Duplicate tuples are automatically eliminated from the result.
Ex1 : Select the names of the authors who have either written a book or an article or both.
∏ author (Books) ∪ ∏ author (Articles)
Set Difference (−)
R-S returns a relation instance containing all tuples that occur in R but not in S. The
relations R and S must be union-compatible, and the schema of the result is defined to be
identical to the schema of R.
Notation : r − s
Ex2 : Provides the name of authors who have written books but not articles
∏ author (Books) − ∏ author (Articles)
Cartesian Product (Χ)

R x S returns a relation instance whose schema contains all the fields of R followed by S
Notation : r Χ s
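Union, set difference, and Cartesian product map directly onto Python sets of tuples. The sketch below assumes small made-up relations; the author names and the variables r and s are illustrative only.

```python
from itertools import product

# Single-attribute relations standing in for pi_author(Books) and pi_author(Articles).
book_authors = {("Korth",), ("Navathe",), ("Date",)}
article_authors = {("Date",), ("Codd",)}

# Union: authors of a book, an article, or both (duplicates vanish because these are sets).
either = book_authors | article_authors
# Set difference: authors who wrote books but not articles.
books_only = book_authors - article_authors

# Cartesian product: every tuple of r concatenated with every tuple of s.
r = {(1, "a"), (2, "b")}
s = {("x",), ("y",)}
r_times_s = {tr + ts for tr, ts in product(r, s)}

print(sorted(either))
print(sorted(books_only))
print(sorted(r_times_s))
```

The product of a 2-tuple relation and a 2-tuple relation has 2 × 2 = 4 tuples, each carrying all the fields of r followed by the fields of s.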
Rename Operation (ρ)
The rename operation allows us to rename the output relation. 'rename' operation is denoted with
small Greek letter rho ρ.

Notation − ρ x (E)
Where the result of expression E is saved with name of x.
Additional operations are −
• Set intersection
• Assignment
• Natural join
Set intersection : R ∩ S returns a relation instance containing all tuples that occur in both R and
S. The relations R and S must be union-compatible.
Assignment : It provides a convenient way to express complex queries. Assignment must always
be made to a temporary relation variable.
Joins : The join operation is one of the most useful operations in relational algebra and the
most commonly used way to combine information from two or more relations. A join
can be defined as a cross-product followed by selections and projections. It is denoted by ⋈.
There are TWO types of Joins

1. Inner Join 2. Outer Joins
Inner Join can be classified into three ways

1) Condition Join or Theta Join 2) Equi Join 3) Natural Join
Outer Join can be classified as

1) Left outer Join 2) Right outer Join 3) Full outer Join
Condition Join or Theta Join : Theta join combines tuples from different relations provided
they satisfy the theta condition. The join condition is denoted by the symbol θ.
Notation : R1 ⋈θ R2
R1 and R2 are relations having attributes (A1, A2, ..., An) and (B1, B2, ..., Bn) such that the
attributes have nothing in common, that is, R1 ∩ R2 = Φ.
Theta join can use all kinds of comparison operators.
Student
SID   Name    Std
101   Alex    10
102   Maria   11
Subjects
Class   Subject
10      Math
10      English
11      Music
11      Sports
Ex : Student ⋈ Student.Std = Subjects.Class Subjects
Student_detail
SID   Name    Std   Class   Subject
101   Alex    10    10      Math
101   Alex    10    10      English
102   Maria   11    11      Music
102   Maria   11    11      Sports
Equijoin
When Theta join uses only equality comparison operator, it is said to be equijoin. The above
example corresponds to equijoin.
Natural Join ( ⋈)
It is a binary operator that is written as (R ⋈S) where R and S are relations. The result of the
natural join is the set of all combinations of tuples in R and S that are equal on their common
attribute names. For an example consider the tables Employee and Dept and their natural join:
Employee
Name      EmpId   DeptName
Harry     3415    Finance
Sally     2241    Sales
George    3401    Finance
Harriet   2202    Sales
Dept
DeptName     Manager
Finance      George
Sales        Harriet
Production   Charles
Employee ⋈ Dept
Name      EmpId   DeptName   Manager
Harry     3415    Finance    George
Sally     2241    Sales      Harriet
George    3401    Finance    George
Harriet   2202    Sales      Harriet
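Note : When ⋈ is written without a join condition, it denotes the natural join. A natural join
can lose information: tuples that have no matching tuple in the other relation (such as the
Production department above) do not appear in the result.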
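The natural join of the Employee and Dept tables can be reproduced with a short Python sketch. The function natural_join is a name chosen for this illustration, and relations are modelled as lists of dictionaries; it is not how a DBMS actually implements joins.

```python
def natural_join(r, s):
    """Join two lists of dicts on every attribute name they share."""
    common = set(r[0]) & set(s[0]) if r and s else set()
    result = []
    for tr in r:
        for ts in s:
            # Keep the pair only if the tuples agree on all common attributes.
            if all(tr[a] == ts[a] for a in common):
                result.append({**tr, **ts})
    return result


employee = [
    {"Name": "Harry", "EmpId": 3415, "DeptName": "Finance"},
    {"Name": "Sally", "EmpId": 2241, "DeptName": "Sales"},
    {"Name": "George", "EmpId": 3401, "DeptName": "Finance"},
    {"Name": "Harriet", "EmpId": 2202, "DeptName": "Sales"},
]
dept = [
    {"DeptName": "Finance", "Manager": "George"},
    {"DeptName": "Sales", "Manager": "Harriet"},
    {"DeptName": "Production", "Manager": "Charles"},
]

joined = natural_join(employee, dept)
for row in joined:
    print(row)
```

The Production department has no employee, so it does not appear in joined; this is the loss of information that outer joins are meant to avoid.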
Outer Joins
Theta Join, Equijoin, and Natural Join are called inner joins. An inner join includes only
those tuples with matching attributes and the rest are discarded in the resulting relation.
Therefore, we need to use outer joins to include all the tuples from the participating relations in
the resulting relation. There are three kinds of outer joins − left outer join, right outer join,
and full outer join.
Left Outer Join (R ⟕ S)
All the tuples from the Left relation, R, are included in the resulting relation. If there are
tuples in R without any matching tuple in the Right relation S, then the S-attributes of the
resulting relation are made NULL.
Right Outer Join (R ⟖ S)
All the tuples from the Right relation, S, are included in the resulting relation. If there are tuples
in S without any matching tuple in R, then the R-attributes of resulting relation are made NULL.
Full Outer Join (R ⟗ S)
All the tuples from both participating relations are included in the resulting relation. Tuples of
either relation that have no matching tuple in the other are padded with NULL in the attributes
of the other relation.
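Employee
Employee_name   Street     City
Coyote          Toon       Hollywood
Rabbit          Tunnel     Carrotville
Smith           Revolver   Death Valley
Williams        Seaview    Seattle
Works
Employee_name   Branch_name   Salary
Coyote          Mesa          1500
Rabbit          Mesa          1300
Gates           Redmond       5300
Williams        Redmond       1500
Result of Employee ⟕ Works (left outer join)
Employee_name   Street     City           Branch_name   Salary
Coyote          Toon       Hollywood      Mesa          1500
Rabbit          Tunnel     Carrotville    Mesa          1300
Williams        Seaview    Seattle        Redmond       1500
Smith           Revolver   Death Valley   NULL          NULL
Result of Employee ⟖ Works (right outer join)
Employee_name   Street     City           Branch_name   Salary
Coyote          Toon       Hollywood      Mesa          1500
Rabbit          Tunnel     Carrotville    Mesa          1300
Williams        Seaview    Seattle        Redmond       1500
Gates           NULL       NULL           Redmond       5300
Result of Employee ⟗ Works (full outer join)
Employee_name   Street     City           Branch_name   Salary
Coyote          Toon       Hollywood      Mesa          1500
Rabbit          Tunnel     Carrotville    Mesa          1300
Williams        Seaview    Seattle        Redmond       1500
Smith           Revolver   Death Valley   NULL          NULL
Gates           NULL       NULL           Redmond       5300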
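The outer joins of the Employee and Works tables can be checked with a small Python sketch. It assumes a single shared attribute (Employee_name), uses None to stand for NULL, and the helper name left_outer_join is chosen here for illustration. A right outer join is obtained by swapping the arguments.

```python
def left_outer_join(r, s, key, s_attrs):
    """R left-outer-join S on one shared attribute; unmatched R tuples get None (NULL)."""
    result = []
    for tr in r:
        matches = [ts for ts in s if ts[key] == tr[key]]
        if matches:
            result.extend({**tr, **ts} for ts in matches)
        else:
            # No partner in s: pad the s-attributes with NULL.
            result.append({**tr, **{a: None for a in s_attrs}})
    return result


employee = [
    {"Employee_name": "Coyote", "Street": "Toon", "City": "Hollywood"},
    {"Employee_name": "Rabbit", "Street": "Tunnel", "City": "Carrotville"},
    {"Employee_name": "Smith", "Street": "Revolver", "City": "Death Valley"},
    {"Employee_name": "Williams", "Street": "Seaview", "City": "Seattle"},
]
works = [
    {"Employee_name": "Coyote", "Branch_name": "Mesa", "Salary": 1500},
    {"Employee_name": "Rabbit", "Branch_name": "Mesa", "Salary": 1300},
    {"Employee_name": "Gates", "Branch_name": "Redmond", "Salary": 5300},
    {"Employee_name": "Williams", "Branch_name": "Redmond", "Salary": 1500},
]

left = left_outer_join(employee, works, "Employee_name", ["Branch_name", "Salary"])
# Right outer join of Employee and Works = left outer join with the roles swapped.
right = left_outer_join(works, employee, "Employee_name", ["Street", "City"])

smith = next(row for row in left if row["Employee_name"] == "Smith")
gates = next(row for row in right if row["Employee_name"] == "Gates")
print(smith)
print(gates)
```

Smith survives the left outer join with NULL branch and salary, and Gates survives the right outer join with NULL street and city; a full outer join would contain both padded rows.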
Division : The division is a binary operation that is written as R ÷ S. The result consists of the
restrictions of tuples in R to the attribute names unique to R, i.e., in the header of R but not in the
header of S, for which it holds that all their combinations with tuples in S are present in R.
Ex 1 :
Completed
Student   Task
Fred      Database1
Fred      Database2
Fred      Compiler1
Eugene    Database1
Eugene    Compiler1
Sarah     Database1
Sarah     Database2
DBProject
Task
Database1
Database2
Completed ÷ DBProject
Student
Fred
Sarah
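Division can be sketched in Python for the Completed and DBProject relations of Ex 1. The function divide is a name chosen for this illustration and is specialised to a two-attribute dividend R(Student, Task) and a one-attribute divisor S(Task).

```python
completed = {
    ("Fred", "Database1"), ("Fred", "Database2"), ("Fred", "Compiler1"),
    ("Eugene", "Database1"), ("Eugene", "Compiler1"),
    ("Sarah", "Database1"), ("Sarah", "Database2"),
}
db_project = {"Database1", "Database2"}


def divide(r, s):
    """R ÷ S: the students that are paired in R with every task in S."""
    students = {student for student, _ in r}
    return {st for st in students if all((st, task) in r for task in s)}


result = divide(completed, db_project)
print(sorted(result))
```

Eugene is excluded because the pair (Eugene, Database2) is missing from Completed, so not all combinations with DBProject are present.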
Ex 2 :
More Examples of Algebra Queries
UNIT- II
Introduction to SQL: Overview of the SQL Query Language, SQL Data Definition,
Basic Structure of SQL Queries, Additional Basic Operations, Set Operations, Null
Values, Aggregate Functions, Nested Sub-queries, Modification of the Database.
Intermediate SQL: Join Expressions, Views, Transactions, Integrity Constraints,
SQL Data types and schemas, Authorization.
Advanced SQL: Accessing SQL from a Programming Language, Functions and
Procedures, Triggers, Recursive Queries, OLAP, Formal relational query languages.
2.1 Overview of the SQL Query Language

IBM developed the original version of SQL, originally called Sequel, as part of the
System R project in the early 1970s. The Sequel language has evolved since then, and its name
has changed to SQL (Structured Query Language). SQL has clearly established itself as the
standard relational database language.
In 1986, the American National Standards Institute (ANSI) and the International
Organization for Standardization (ISO) published an SQL standard, called SQL-86. ANSI
published an extended standard for SQL, SQL-89, in 1989. The next version of the standard was
SQL-92 standard, followed by SQL:1999, SQL:2003, SQL:2006, and most recently SQL:2008.
The SQL language has several parts:
• Data-definition language (DDL): The SQL DDL provides commands for defining
relation schemas, deleting relations, and modifying relation schemas.
• Data-manipulation language (DML): The SQL DML provides the ability to query
information from the database and to insert tuples into, delete tuples from, and modify
tuples in the database.
• Integrity: The SQL DDL includes commands for specifying integrity constraints that the
data stored in the database must satisfy. Updates that violate integrity constraints are
disallowed.
• View definition: The SQL DDL includes commands for defining views.
• Transaction control: SQL includes commands for specifying the beginning and ending
of transactions.
• Embedded SQL and dynamic SQL: Embedded and dynamic SQL define how SQL
statements can be embedded within general-purpose programming languages, such as C,
C++, and Java.
• Authorization: The SQL DDL includes commands for specifying access rights to
relations and views.
2.2 SQL Data Definition

The set of relations in a database must be specified to the system by means of a data-definition
language (DDL). The SQL DDL allows specification of not only a set of relations, but also
information about each relation, including:
o The schema for each relation.
o The types of values associated with each attribute.
o The integrity constraints.
o The set of indices to be maintained for each relation.
o The security and authorization information for each relation.
o The physical storage structure of each relation on disk.
2.2.1 Basic Types
The SQL standard supports a variety of built-in types, including:
• char(n): A fixed-length character string with user-specified length n.
• varchar(n): A variable-length character string with user-specified maximum length n.
• int: An integer (a finite subset of the integers that is machine dependent).
• smallint: A small integer (a machine-dependent subset of the integer type).
• numeric(p, d): A fixed-point number with user-specified precision. The number consists
of p digits (plus a sign), and d of the p digits are to the right of the decimal point. Thus,
numeric(3,1) allows 44.5 to be stored exactly, but neither 444.5 nor 0.32 can be stored
exactly in a field of this type.
• real, double precision: Floating-point and double-precision floating-point numbers with
machine-dependent precision.
• float(n): A floating-point number, with precision of at least n digits.
Each type may include a special value called the null value. A null value indicates an absent
value that may exist but be unknown or that may not exist at all.
The char data type stores fixed length strings.
For example, consider an attribute A of type char(10). If we store the string “PBRVITS” in this
attribute, 3 spaces are appended to the string to make it 10 characters long.
In contrast, if attribute B were of type varchar(10), and we store “PBRVITS” in attribute B, no
spaces would be added. When comparing two values of type char, if they are of different lengths
extra spaces are automatically added to the shorter one to make them the same size, before
comparison.
When comparing a char type with a varchar type, one may expect extra spaces to be
added to the varchar type to make the lengths equal, before comparison; however, this may or
may not be done, depending on the database system. As a result, even if the same value
“PBRVITS” is stored in the attributes A and B above, a comparison A=B may return false. We
recommend you always use the varchar type instead of the char type to avoid these problems.
2.2.2 Basic Schema Definition
We define an SQL relation by using the create table command. The following command
creates a relation department in the database.
create table department
(dept_name varchar (20),
building varchar (15),
budget numeric (12,2),
primary key (dept_name));
The relation created above has three attributes, dept_name, which is a character string of
maximum length 20, building, which is a character string of maximum length 15, and budget,
which is a number with 12 digits in total, 2 of which are after the decimal point. The create table
command also specifies that the dept_name attribute is the primary key of the department
relation.
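The department definition can be tried out with Python's built-in sqlite3 module. This is a sketch under assumptions: SQLite is used only because it ships with Python, it treats the declared types more loosely than Oracle or PostgreSQL would, and the inserted rows are made-up sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute("""
    create table department
        (dept_name varchar(20),
         building  varchar(15),
         budget    numeric(12,2),
         primary key (dept_name))
""")
conn.execute("insert into department values ('Physics', 'Watson', 70000.00)")

# Each table_info row is (cid, name, type, notnull, default, pk).
columns = [(row[1], row[5]) for row in conn.execute("pragma table_info(department)")]
print(columns)

# A second row with the same dept_name violates the primary key.
try:
    conn.execute("insert into department values ('Physics', 'Taylor', 1.00)")
    duplicate_accepted = True
except sqlite3.IntegrityError:
    duplicate_accepted = False
print(duplicate_accepted)
```

The pk flag is set only for dept_name, and the duplicate insert is rejected, which is what declaring dept_name as the primary key is meant to guarantee.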
The general form of the create table command is:
create table r
(A1 D1,
A2 D2,
...,
An Dn,
<integrity-constraint 1>
...,
<integrity-constraint k> );
where r is the name of the relation, each Ai is the name of an attribute in the schema of relation r,
and Di is the domain of attribute Ai; that is, Di specifies the type of attribute Ai along with
optional constraints that restrict the set of allowed values for Ai.
SQL supports a number of different integrity constraints. An integrity constraint (IC) is a
condition specified on a database schema and restricts the data that can be stored in an instance
of the database. If a database instance satisfies all the integrity constraints specified on the
database schema, it is a legal instance. A DBMS permits only legal instances to be stored in the
database. Many kinds of integrity constraints can be specified in the relational model:
Domain Constraints:
A relation schema specifies the domain of each field in the relation instance. These
domain constraints in the schema specify the condition that each instance of the relation has to
satisfy: The values that appear in a column must be drawn from the domain associated with that
column. Thus, the domain of a field is essentially the type of that field. It can be enforced using:
• Check
• NOT NULL.
The CHECK Constraint enables a condition to check the value being entered into a record. If the
condition evaluates to false, the record violates the constraint and isn't entered into the table.
SYNTAX:
CHECK (<conditional expression>)
Ex : SQL> create table client(clno char(4),
clname varchar(20),
clcity varchar(15),
check(clno like 'c%'),
check(clname = upper(clname)),
check(clcity in('mumbai','newdelhi','chennai','culcutta')));
Table created.
NOT NULL
The NOT NULL constraint is a restriction placed on a column in a relational database table. It
enforces the condition that, in that column, every row of data must contain a value. It cannot be
left blank during insert or update operations. If this column is left blank, this will produce an
error message and the entire insert or update operation will fail.
Ex : SQL> create table student (rollno number(7) NOT NULL,
name varchar2(15),
branch char(4),dob date);
Note : NOT NULL constraint can be added only at Column level, but not table level.
Entity Integrity Constraints
Entity integrity is an integrity rule which states that every table must have a primary key and
that the column or columns chosen to be the primary key should be unique and not NULL.
There are TWO types of Entity Integrity Constraints
1. Unique 2. Primary key
A unique constraint ensures that no two rows have the same value in the constrained column(s).
Unlike a primary key, a unique column may allow NULL values.
A primary key is a key in a relational database that is unique for each record. It is a unique
identifier that prevents both NULL values and duplicate values in a particular column (or set
of columns) of a table. A table can have at most one primary key.
Ex : SQL> create table student(rno number(4 ) primary key,
name varchar(20),
login varchar(20) unique,
dob date);
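The effect of these constraints can be observed with a short sqlite3 sketch. It assumes a variant of the student table: the NOT NULL on name and the CHECK on upper-case names are added here purely for demonstration, the rows are made-up, and the helper try_insert is a name chosen for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    create table student
        (rno   integer primary key,
         name  varchar(20) not null,
         login varchar(20) unique,
         dob   date,
         check (name = upper(name)))
""")


def try_insert(row):
    """Return 'ok' if the row is accepted, 'rejected' if a constraint is violated."""
    try:
        conn.execute("insert into student values (?, ?, ?, ?)", row)
        return "ok"
    except sqlite3.IntegrityError:
        return "rejected"


rows = [
    (1, "RAVI", "ravi01", "2003-05-01"),   # valid row
    (1, "SITA", "sita02", "2003-06-11"),   # duplicate primary key
    (2, None, "anon02", "2003-07-21"),     # NULL in a NOT NULL column
    (3, "KIRAN", "ravi01", "2003-08-02"),  # duplicate value in a UNIQUE column
    (4, "kiran", "kiran04", "2003-08-02"), # fails the CHECK (not upper case)
    (5, "MEENA", None, "2003-09-15"),      # NULL login is allowed by UNIQUE
    (6, "ANIL", None, "2003-10-30"),       # a second NULL login is also allowed in SQLite
]
outcomes = [try_insert(row) for row in rows]
print(outcomes)
```

The last two rows illustrate the difference between the two entity integrity constraints: a unique column accepts NULL, while a primary key column never does.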
Referential integrity constraint
Referential Integrity Constraint is used to specify interdependencies between relations. This

constraint specifies a column or list of columns as a foreign key of the referencing table.
In order to provide Referential Integrity the conditions must exist

dname varchar(15),
location varchar(15));

empname varchar(20) not null,
designation varchar(15),
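Since the table definitions above survive only in part, the following is a self-contained sketch of a foreign key rather than a reconstruction of them. The dept and emp tables, their key columns deptno and empno, and the sample rows are all assumptions made for this illustration. It uses sqlite3, where foreign keys are enforced only after the pragma shown in the first statement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("pragma foreign_keys = on")  # SQLite enforces foreign keys only when enabled

# Referenced (parent) table; deptno is an assumed primary key.
conn.execute("""
    create table dept
        (deptno   integer primary key,
         dname    varchar(15),
         location varchar(15))
""")
# Referencing (child) table; emp.deptno is a foreign key into dept.
conn.execute("""
    create table emp
        (empno       integer primary key,
         empname     varchar(20) not null,
         designation varchar(15),
         deptno      integer references dept(deptno))
""")
conn.execute("insert into dept values (10, 'SALES', 'KAVALI')")
conn.execute("insert into emp values (1, 'RAVI', 'CLERK', 10)")

# An employee pointing at a department that does not exist is rejected.
try:
    conn.execute("insert into emp values (2, 'SITA', 'CLERK', 99)")
    orphan_allowed = True
except sqlite3.IntegrityError:
    orphan_allowed = False

# A department that is still referenced by an employee cannot be deleted.
try:
    conn.execute("delete from dept where deptno = 10")
    parent_deleted = True
except sqlite3.IntegrityError:
    parent_deleted = False

print(orphan_allowed, parent_deleted)
```

Both operations fail, which is the point of referential integrity: every foreign key value in the referencing table must match a primary key value in the referenced table.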
A newly created relation is empty initially. We can use the insert command to load data into the
relation. The Data Manipulation Language (DML) is used to retrieve, insert and modify
database information. These commands will be used by all database users during the routine
operation of the database. Let's take a brief look at the basic DML commands:
1. INSERT 2. UPDATE 3. DELETE
1. INSERT INTO: This is used to add records into a relation. Two forms of INSERT
INTO are shown below:
a) Inserting a single record
Syntax: INSERT INTO relation_name(field_1,field_2,...,field_n) VALUES
(data_1,data_2,...,data_n);
Example: SQL>INSERT INTO student(sno,sname,class,address) VALUES
(1,'SRI AJITH','B.Tech','KAVALI');
b) Inserting multiple records (using substitution variables, which are prompted for at run time)
Syntax: INSERT INTO relation_name(field_1,field_2,...,field_n) VALUES
(&data_1,&data_2,...,&data_n);
Example: SQL>INSERT INTO student(sno,sname,class,address)
VALUES(&sno,'&sname','&class','&address');
Enter value for sno: 101
Enter value for sname: PREETHAM
Enter value for class: B.Tech
Enter value for address: KAVALI
2. UPDATE-SET-WHERE: This is used to update the content of a record in a relation.
Syntax: SQL>UPDATE relation_name SET field_name1=data, field_name2=data, ...
WHERE field_name=data;
Example: SQL>UPDATE student SET sname = 'kumar' WHERE sno=1;
3. DELETE-FROM: This is used to delete all the records of a relation but it will retain the
structure of that relation.
a) DELETE-FROM: This is used to delete all the records of relation.
Syntax: SQL>DELETE FROM relation_name;
Example: SQL>DELETE FROM std;
b) DELETE -FROM-WHERE: This is used to delete a selected record from a relation.
Syntax: SQL>DELETE FROM relation_name WHERE condition;
Example: SQL>DELETE FROM student WHERE sno = 2;
To remove a relation from an SQL database, we use the drop table command. The drop table
command deletes all information about the dropped relation from the database.
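Syntax: drop table <relation name>;
The command drop table <relation name>; is a more drastic action than delete from <relation name>;
because delete removes all the tuples but retains the relation itself, whereas drop removes the
relation's schema as well.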
We use the alter table command to add attributes to an existing relation. All tuples in the
relation are assigned null as the value for the new attribute. The form of the alter table command
is
(a)ALTER TABLE ...ADD...: This is used to add some extra fields into existing relation.
Syntax: ALTER TABLE relation_name ADD(new field_1 data_type(size), new field_2
data_type(size),..);
Example : SQL>ALTER TABLE std ADD(Address CHAR(10));
(b)ALTER TABLE...MODIFY...: This is used to change the width as well as data type of
fields of existing relations.
Syntax: ALTER TABLE relation_name MODIFY (field_1 newdata_type(Size), field_2
newdata_type(Size),....field_newdata_type(Size));
Example: SQL>ALTER TABLE student MODIFY(sname VARCHAR(10),class
VARCHAR(5));
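The DML and DDL commands of this section can be exercised end to end with sqlite3. This is a sketch under assumptions: the student rows beyond the first are made-up, SQLite spells the add form as ALTER TABLE ... ADD COLUMN, and the ALTER TABLE ... MODIFY form shown above is Oracle syntax that SQLite does not provide, so it is left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table student (sno integer, sname varchar(20), class varchar(10), address varchar(20))"
)

# INSERT: one record, then several records in a single call.
conn.execute(
    "insert into student (sno, sname, class, address) values (1, 'SRI AJITH', 'B.Tech', 'KAVALI')"
)
conn.executemany(
    "insert into student values (?, ?, ?, ?)",
    [(2, "PREETHAM", "B.Tech", "KAVALI"), (3, "ANITHA", "B.Tech", "NELLORE")],
)

# UPDATE and DELETE with a WHERE clause touch only the selected records.
conn.execute("update student set sname = 'kumar' where sno = 1")
conn.execute("delete from student where sno = 2")

# ALTER TABLE ... ADD: existing tuples get NULL for the new attribute.
conn.execute("alter table student add column phone varchar(10)")
rows = conn.execute("select * from student order by sno").fetchall()
print(rows)

# DELETE without WHERE empties the relation but keeps its structure.
conn.execute("delete from student")
count_after_delete = conn.execute("select count(*) from student").fetchone()[0]

# DROP TABLE removes the relation itself; querying it afterwards is an error.
conn.execute("drop table student")
try:
    conn.execute("select * from student")
    table_gone = False
except sqlite3.OperationalError:
    table_gone = True
print(count_after_delete, table_gone)
```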
2.3 Basic Structure of SQL Queries
The basic form of an SQL query is as follows:
SELECT [DISTINCT] select-list
FROM from-list
WHERE < Condition >
Every query must have a SELECT clause, which specifies columns to be retained in the
result, and a FROM clause, which specifies a cross-product of tables. The optional WHERE
clause specifies selection conditions on the tables mentioned in the FROM clause.
• The from-list in the FROM clause is a list of table names. A table name can be followed
by a range variable; a range variable is particularly useful when the same table name appears
more than once in the from-list.
• The select-list is a list of column names of tables named in the from-list. Column names
can be prefixed by a range variable.
• The condition in the WHERE clause is a boolean combination (i.e., an expression using
the logical connectives AND, OR, and NOT)
• The DISTINCT keyword is optional. It indicates that the table computed as an answer
to this query should not contain duplicates, that is, two copies of the same row. The default is
that duplicates are not eliminated.
Ex 1 : Find the names and ages of all sailors.
SELECT DISTINCT S.sname, S.age FROM Sailors S
The answer is a set of rows, each of which is a pair (sname, age). If two or more sailors
have the same name and age, the answer still contains just one pair with that name and age. This
query is equivalent to applying the projection operator of relational algebra.
The Natural Join
Natural join is an SQL join operation that joins tables on the basis of the columns they have in
common. To perform a natural join there should be at least one common attribute (column) between
the two tables. Conceptually, a natural join works in the three steps listed under its features below.
Syntax: SQL > SELECT *
FROM TABLE1
NATURAL JOIN TABLE2;
Features of Natural Join :
1. It will perform the Cartesian product.
2. It finds consistent tuples and deletes inconsistent tuples.
3. Then it deletes the duplicate attributes.
Ex : Consider the query “For all instructors in the university who have taught
some course, find their names and the course ID of all courses they taught”,
which we wrote earlier as:
select name, course_id
from instructor, teaches
where instructor.ID= teaches.ID;
This query can be written more concisely using the natural-join operation in
SQL as:
select name, course_id
from instructor natural join teaches;
Both of the above queries generate the same result.
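That the two formulations agree can be checked with sqlite3. The sketch assumes cut-down instructor and teaches tables (teaches is reduced to ID and course_id) and a few made-up rows; in the sample data ID is the only column the two tables share, which is what makes the natural join equivalent to the explicit condition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table instructor (ID varchar(5) primary key, name varchar(20),
                             dept_name varchar(20), salary numeric(8,2));
    create table teaches (ID varchar(5), course_id varchar(8));
    insert into instructor values ('10101', 'Srinivasan', 'Comp. Sci.', 65000);
    insert into instructor values ('12121', 'Wu', 'Finance', 90000);
    insert into instructor values ('15151', 'Mozart', 'Music', 40000);
    insert into teaches values ('10101', 'CS-101');
    insert into teaches values ('10101', 'CS-315');
    insert into teaches values ('12121', 'FIN-201');
""")

# Cartesian product in the from clause plus an explicit join condition.
explicit = conn.execute("""
    select name, course_id
    from instructor, teaches
    where instructor.ID = teaches.ID
    order by name, course_id
""").fetchall()

# The same query written with natural join.
natural = conn.execute("""
    select name, course_id
    from instructor natural join teaches
    order by name, course_id
""").fetchall()

print(explicit)
print(natural)
```

Mozart has taught no course, so the instructor appears in neither result.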
2.4 Additional Basic Operations
There are a number of additional basic operations that are supported in SQL.
2.4.1 The Rename Operation
Consider the query below:
SQL > select name, course_id
from instructor, teaches
where instructor.ID = teaches.ID;
The result of this query is a relation with the following attributes: name, course_id
SQL provides a way of renaming the attributes of a result relation. It uses the as clause,
taking the form old-name as new-name. For example:
SQL>select name as instructor_name, course_id
from instructor, teaches
where instructor.ID = teaches.ID;

'as' clause - Renaming relations
The as clause is particularly useful to replace a long relation name with a shortened version that
is more convenient to use elsewhere in the query.
SQL> select T.name, S.course_id
from instructor as T, teaches as S
where T.ID = S.ID;

Ex :
Q. Find the names of all instructors whose salary is greater than at least one instructor in
the Biology department.
• instructor (ID, name, dept_name, salary)
SQL > select distinct T.name
from instructor as T, instructor as S
where T.salary > S.salary and S.dept_name = 'Biology';
In the above query, T and S can be thought as aliases, that is as alternative names, for the relation
instructor.
2.4.2 String Operations
SQL specifies strings by enclosing them in single quotes, for example, 'Computer'. A single
quote character that is part of a string can be specified by using two single quote characters;
for example, the string “It’s right” can be specified by 'It''s right'.
The SQL standard specifies that the equality operation on strings is case sensitive; as a result the
expression 'comp. sci.' = 'Comp. Sci.' evaluates to false.
Pattern matching can be performed on strings, using the operator like. We describe patterns by
using two special characters:
1. Percent (%): The % character matches any substring.
2. Underscore (_): The _ character matches any single character.
To illustrate pattern matching, we consider the following examples:
o ’Intro%’ matches any string beginning with “Intro”.
o ’%Comp%’ matches any string containing “Comp” as a substring, for example,

’Intro. to Computer Science’, and ’Computational Biology’.
o ’_ _ _ ’ matches any string of exactly three characters.
o ’_ _ _%’ matches any string of at least three characters.
o Patterns are case sensitive; that is, uppercase characters do not match lowercase
characters, or vice versa.
o SQL allows us to search for mismatches instead of matches by using the not
like comparison operator.
Example: “Find the names of all departments whose building name includes the substring
‘Watson’.”
select dept_name
from department
where building like '%Watson%';
For patterns to include the special pattern characters (that is, % and _), SQL allows the
specification of an escape character. The escape character is used immediately before a special
pattern character to indicate that the special pattern character is to be treated like a normal
character. We define the escape character for a like comparison using the escape keyword.
To illustrate, consider the following patterns, which use a backslash (\) as the escape character:
Q. Write a pattern that matches all strings beginning with “ab%cd”.
like 'ab\%cd%' escape '\'
Q. Write a pattern that matches all strings beginning with “ab\cd”.
like 'ab\\cd%' escape '\'
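The patterns of this section can be tried with sqlite3. This is a sketch with made-up strings; the table t and the helper matching are names chosen for the illustration. One caveat: SQLite's like is case-insensitive for ASCII letters by default, which differs from the case-sensitive behaviour described above, so the sketch does not rely on case.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t (s text)")
conn.executemany(
    "insert into t values (?)",
    [("ab%cdef",), ("abXcdef",), ("ab\\cd!",), ("Intro. to Computer Science",)],
)


def matching(condition):
    """Return the strings in t that satisfy the given where-clause condition."""
    return [row[0] for row in conn.execute("select s from t where " + condition + " order by s")]


# Raw Python strings keep the backslashes exactly as SQL should see them.
percent = matching(r"s like 'ab\%cd%' escape '\'")     # literal % after ab
backslash = matching(r"s like 'ab\\cd%' escape '\'")   # literal \ after ab
intro = matching("s like 'Intro%'")                    # begins with Intro
three_plus = matching("s like '___%'")                 # at least three characters

print(percent)
print(backslash)
print(intro)
print(len(three_plus))
```

Without the escape character, 'ab%cd%' would also match "abXcdef", because the first % would act as a wildcard.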
2.4.3 Attribute Specification in Select Clause

The asterisk symbol “ * ” can be used in the select clause to denote “all attributes.” Thus, the use
of instructor.* in the select clause of the query:
SQL > select instructor.*
from instructor, teaches
where instructor.ID= teaches.ID;
indicates that all attributes of instructor are to be selected. A select clause of the form select *
indicates that all attributes of the result relation of the from clause are selected.
2.4.4 Ordering the Display of Tuples
SQL offers the user some control over the order in which tuples in a relation are displayed. The
order by clause causes the tuples in the result of a query to appear in sorted order. To list in
alphabetic order all instructors in the Physics department, we write:
SQL > select name
from instructor
where dept_name = 'Physics'
order by name;
By default, the order by clause lists items in ascending order. To specify the sort order, we may
specify desc for descending order or asc for ascending order. Ordering can also be performed on
multiple attributes. Suppose we wish to list the entire instructor relation in descending order of
salary and, if several instructors have the same salary, to order them in ascending order by name.
We express this query in SQL as follows:
SQL > select *
from instructor
order by salary desc, name asc;
2.4.5 Where Clause Predicates
SQL includes a between comparison operator to simplify where clauses that specify that a value
be less than or equal to some value and greater than or equal to some other value. If we wish to
find the names of instructors with salary amounts between $90,000 and $100,000, we can use the
between comparison to write:
SQL > select name
from instructor
where salary between 90000 and 100000;
instead of:
SQL > select name
from instructor
where salary <= 100000 and salary >= 90000;
Similarly, we can use the not between comparison operator.
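The equivalence of between and the pair of comparisons can be checked with sqlite3. The instructor rows below are made-up sample data, and the table is reduced to three columns for brevity. The sketch also uses order by with two sort keys.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table instructor (name varchar(20), dept_name varchar(20), salary numeric(8,2))")
conn.executemany(
    "insert into instructor values (?, ?, ?)",
    [
        ("Wu", "Finance", 90000),
        ("Einstein", "Physics", 95000),
        ("Gold", "Physics", 87000),
        ("Brandt", "Comp. Sci.", 92000),
        ("Katz", "Comp. Sci.", 75000),
    ],
)


def names(condition):
    """Names of instructors satisfying the condition, highest salary first."""
    query = "select name from instructor where " + condition + " order by salary desc, name asc"
    return [row[0] for row in conn.execute(query)]


with_between = names("salary between 90000 and 100000")
with_comparisons = names("salary <= 100000 and salary >= 90000")
outside = names("salary not between 90000 and 100000")

print(with_between)
print(with_comparisons)
print(outside)
```

Wu, whose salary is exactly 90000, is included, which shows that between is inclusive at both ends.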
2.5 Set Operations

The SQL operations union, intersect, and except operate on relations and correspond to
the mathematical set-theory operations ∪, ∩, and −. Set operators are used to join the results of
two (or more) SELECT statements.
UNION is used to combine the results of two or more Select statements. However it will eliminate
duplicate rows from its result set. In case of union, number of columns and data type must be
same in both the tables.
Example of UNION
The First table,
ID   Name
1    abhi
2    adam
The Second table,
ID   Name
2    adam
3    Chester
Union SQL query will be,
select * from First

UNION
select * from second
The result table will look like,
ID NAME
1 abhi
2 adam
3 Chester
Union All
This operation is similar to Union, but it also shows the duplicate rows.
Union All query will be like,
select * from First
UNION ALL
select * from second
The result table will look like,
ID   NAME
1    abhi
2    adam
2    adam
3    Chester
Intersect
Intersect operation is used to combine two SELECT statements, but it returns only the records
that are common to both SELECT statements. In case of Intersect the number of columns
and data type must be same. MySQL versions before 8.0.31 do not support the INTERSECT operator.
Intersect query will be,
select * from First
INTERSECT
select * from second
The result table will look like
ID   NAME
2    adam
Minus
Minus operation combines the results of two Select statements and returns only those rows of the
first result that do not appear in the second. MINUS is the Oracle keyword; the SQL standard
calls the same operation EXCEPT. MySQL versions before 8.0.31 do not support EXCEPT.
Minus query will be,
select * from First
MINUS
select * from second
The result table will look like
ID   NAME
1    abhi
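The four set operations can be run on the two example tables with sqlite3. Two assumptions in this sketch: the tables are named first_table and second_table here to stay clear of SQL keywords, and SQLite spells the difference operation EXCEPT rather than MINUS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table first_table  (ID integer, Name varchar(20));
    create table second_table (ID integer, Name varchar(20));
    insert into first_table  values (1, 'abhi');
    insert into first_table  values (2, 'adam');
    insert into second_table values (2, 'adam');
    insert into second_table values (3, 'Chester');
""")


def combine(operator):
    """Apply a set operator to the two tables and return the rows sorted by ID."""
    query = f"select * from first_table {operator} select * from second_table order by ID"
    return conn.execute(query).fetchall()


union = combine("union")
union_all = combine("union all")
intersect = combine("intersect")
minus = combine("except")  # EXCEPT is the standard name for MINUS

print(union)
print(union_all)
print(intersect)
print(minus)
```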
2.6 Null Values
The SQL NULL is the term used to represent a missing value. A NULL value in a table is a
value in a field that appears to be blank.
A field with a NULL value is a field with no value. It is very important to understand that a
NULL value is different than a zero value or a field that contains spaces.
A field with a NULL value is one that has been left blank during record creation.
SQL uses the special keyword null in a predicate to test for a null value. Thus, to find all
instructors who appear in the instructor relation with null values for salary, we write:
SQL > select name
from instructor
where salary is null;
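The difference between NULL, zero, and the is null test can be seen in a short sqlite3 sketch. The three instructor rows are made-up sample data, with None in Python standing for NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table instructor (name varchar(20), salary numeric(8,2))")
conn.executemany(
    "insert into instructor values (?, ?)",
    [("Wu", 90000), ("Gold", None), ("Katz", 0)],  # None is stored as NULL
)


def names(condition):
    query = "select name from instructor where " + condition + " order by name"
    return [row[0] for row in conn.execute(query)]


is_null = names("salary is null")    # the correct test for a missing value
equals_null = names("salary = null") # a comparison with NULL is never true
zero = names("salary = 0")           # zero is an ordinary value, not NULL

# count(*) counts rows; count(salary) skips rows whose salary is NULL.
counts = conn.execute("select count(*), count(salary) from instructor").fetchone()

print(is_null, equals_null, zero, counts)
```

Writing salary = null returns no rows at all, which is why SQL provides the separate is null predicate.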
Logical Connectives AND, OR, and NOT
There are three Logical Operators namely, AND, OR, and NOT. AND and OR combine two
conditions at a time, and NOT negates a single condition, to determine whether a row can be
selected for the output.
When retrieving data using a SELECT statement, you can use logical operators in the WHERE
clause, which allows you to combine more than one condition.
Logical Operator   Description
OR                 For the row to be selected at least one of the conditions must be true.
AND                For a row to be selected all the specified conditions must be true.
NOT                For a row to be selected the specified condition must be false.
2.7 Aggregate Functions
An aggregate function is a function that derives a single value from a set of values from a
column. Aggregate functions must be used with SELECT or HAVING clauses.
Common aggregate functions include
Function Description
AVG(column) Returns the average value of a column
COUNT(column) Returns the number of rows (without a NULL value) of a column
COUNT(*) Returns the number of selected rows
COUNT(DISTINCT column) Returns the number of distinct results
MAX(column) Returns the highest value of a column
MIN(column) Returns the lowest value of a column
SUM(column) Returns the sum of a column
Ex : List the sum salary of all employees dept wise.
SQL > Select deptno,sum(sal) from emp group by deptno;
Ex : Display the average salary dept wise
SQL > Select deptno,avg(sal) from emp group by deptno;
Ex : Display the maximum salary in each department.
SQL > Select deptno,max(sal) from emp group by deptno;
Ex : Display the minimum salary in each department
SQL > Select deptno,min(sal) from emp group by deptno;
Ex : List the number of employees working in each department.
SQL > Select deptno,count(*) from emp group by deptno;
Ex : Display total salary department wise where the dept wise total salary is above 5000.
SQL > select deptno,sum(sal) from emp group by deptno having sum(sal) > 5000;
Ex : List the deptname and average salary of them.
SQL > Select dname,avg(sal) from emp,dept

where emp.deptno=dept.deptno group by dname;
Ex : List the deptname and sum of salary of them.
SQL > Select dname,sum(sal) from emp,dept

where emp.deptno=dept.deptno group by dname;
EX : Display the details of the employee whose salary is maximum.
SQL > Select * from emp where sal = (select max(sal) from emp);
Ex : Find second maximum salary
SQL > Select max(sal) from emp where

sal < (select max(sal) from emp);
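The aggregate queries above can be checked against a small data set. The sketch below assumes Python's sqlite3 module and six invented emp rows, one of which has a NULL salary to show the difference between count(*) and count(sal).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table emp (empno integer, deptno integer, sal integer)")
conn.executemany("insert into emp values (?, ?, ?)", [
    (1, 10, 1000), (2, 10, 3000), (3, 20, 2000),
    (4, 20, 4000), (5, 30, 1500), (6, 30, None),  # employee 6 has no salary
])

# Total salary per department.
totals = conn.execute(
    "select deptno, sum(sal) from emp group by deptno order by deptno"
).fetchall()

# HAVING filters the groups after the aggregation has been computed.
high = conn.execute(
    "select deptno, sum(sal) from emp group by deptno having sum(sal) > 5000"
).fetchall()

# count(*) counts rows, count(sal) skips the NULL salary.
counts = conn.execute("select count(*), count(sal) from emp").fetchone()

# Second maximum salary, using the subquery form shown above.
second_max = conn.execute(
    "select max(sal) from emp where sal < (select max(sal) from emp)"
).fetchone()[0]

print(totals, high, counts, second_max)
```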
2.8 Nested Subqueries

A nested query is a query that has another query embedded within it; the embedded query is
called a subquery. i.e. a subquery (also called an inner query or nested query) is a query within
a query. A subquery typically appears within the WHERE clause of a query. Subqueries can
sometimes appear in the FROM clause or the HAVING clause.
• The subquery can be nested inside a SELECT, INSERT, UPDATE, or DELETE statement or
inside another subquery.
• A subquery is usually added within the WHERE clause of another SQL SELECT statement.
• You can use the comparison operators, such as >, <, or =. The comparison operator can also
be a multiple-row operator, such as IN, ANY, or ALL.
• A subquery is also called an inner query or inner select, while the statement containing a
subquery is also called an outer query or outer select.
• An ordinary (non-correlated) inner query executes first, before its parent query, so that the
results of the inner query can be passed to the outer query.
Correlated Subquery
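A subquery is called a correlated subquery when the inner query refers to columns of the outer
query. For every row processed by the outer query, the inner query is evaluated again, because
the inner query depends on the current outer row before it can be processed. In the examples
below, the queries using IN, ANY, and ALL are ordinary subqueries, while the queries using
EXISTS and NOT EXISTS are correlated.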
Ex : To find the employees whose salary is equal to the salary of at least one employee in
department of id 300?
SQL > SELECT EMPLOYEE_ID, SALARY

FROM EMPLOYEES
WHERE SALARY IN
( SELECT SALARY
FROM EMPLOYEES
WHERE DEPARTMENT_ID = 300);
Ex : To find the employees whose salary is greater than at least one employee in department of id
500?
SQL > SELECT EMPLOYEE_ID, SALARY

FROM EMPLOYEES
WHERE SALARY > ANY
( SELECT SALARY
FROM EMPLOYEES
WHERE DEPARTMENT_ID = 500);
Ex : Write a query to find the employees whose salary is less than the salary of all employees in
department of id 100?
SQL > SELECT EMPLOYEE_ID, SALARY

FROM EMPLOYEES
WHERE SALARY < ALL
( SELECT SALARY
FROM EMPLOYEES
WHERE DEPARTMENT_ID = 100 );
Ex : Write a query to find the employees whose manager and department match those of the
employee with id 20 or 30?
SQL > SELECT EMPLOYEE_ID, MANAGER_ID,

DEPARTMENT_ID
FROM EMPLOYEES
WHERE (MANAGER_ID,DEPARTMENT_ID) IN
( SELECT MANAGER_ID,
DEPARTMENT_ID
FROM EMPLOYEES
WHERE EMPLOYEE_ID IN (20,30) );
Ex : Write a query to list the department names which have at least one employee?
SQL > SELECT DEPARTMENT_ID,

DEPARTMENT_NAME
FROM DEPARTMENTS D
WHERE EXISTS
(
SELECT 1
FROM EMPLOYEES E
WHERE E.DEPARTMENT_ID = D.DEPARTMENT_ID);
Ex : Write a query to find the departments which do not have employees at all?
SQL > SELECT DEPARTMENT_ID,

DEPARTMENT_NAME
FROM DEPARTMENTS D
WHERE NOT EXISTS
(
SELECT 1
FROM EMPLOYEES E
WHERE E.DEPARTMENT_ID = D.DEPARTMENT_ID);
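The subquery forms above can be exercised on a tiny data set. This is only a sketch: it assumes Python's sqlite3 module, invented rows, and department ids 10, 20 and 30 instead of the ids used in the examples. SQLite has no ANY operator, so "greater than at least one salary" is written here as a comparison with the minimum, which is equivalent.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table departments (department_id integer, department_name text)")
conn.execute("create table employees (employee_id integer, salary integer, "
             "department_id integer)")
conn.executemany("insert into departments values (?, ?)",
                 [(10, "Sales"), (20, "HR"), (30, "Ops")])
conn.executemany("insert into employees values (?, ?, ?)",
                 [(1, 5000, 10), (2, 3000, 20), (3, 5000, 20), (4, 2000, 10)])

# Ordinary subquery with IN: salary equal to some salary paid in department 20.
same_salary = [r[0] for r in conn.execute(
    "select employee_id from employees where salary in "
    "(select salary from employees where department_id = 20) "
    "order by employee_id")]

# Equivalent of '> ANY': greater than the smallest salary in department 20.
above_any = [r[0] for r in conn.execute(
    "select employee_id from employees where salary > "
    "(select min(salary) from employees where department_id = 20) "
    "order by employee_id")]

# Correlated subquery: the inner query refers to d, the current outer row.
empty_depts = [r[0] for r in conn.execute(
    "select department_name from departments d where not exists "
    "(select 1 from employees e where e.department_id = d.department_id)")]

print(same_salary, above_any, empty_depts)
```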
2.9 Modification of the Database

The data modification statements in SQL are INSERT, UPDATE, and DELETE. They are
used for inserting new rows, updating existing values, or deleting rows from the database.
1. INSERT 2. UPDATE 3. DELETE

1. INSERT INTO: This is used to add records into a relation. There are three types of INSERT
INTO queries, which are as follows:
a) Inserting a single record
Syntax: INSERT INTO relation_name(field_1,field_2,.....field_n) VALUES
(data_1,data_2,........data_n);
Example: SQL>INSERT INTO student(sno,sname,class,address) VALUES
(1,'SRI AJITH','B.Tech','KAVALI');
b) Inserting all records from another relation
Syntax: INSERT INTO relation_name_1 SELECT field_1,field_2,.....field_n
FROM relation_name_2 WHERE field_x=data;
Example: SQL>INSERT INTO std SELECT sno,sname FROM student
WHERE sname = 'SRI AJITH';
c) Inserting multiple records

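Syntax: INSERT INTO relation_name(field_1,field_2,.....field_n) VALUES
(&data_1,&data_2,........&data_n);
Example: SQL>INSERT INTO student(sno,sname,class,address)
VALUES(&sno,'&sname','&class','&address');
Enter value for sno: 101
Enter value for sname: PREETHAMi
Enter value for class: B.Tech
Enter value for address: KAVALI
The & substitution variables are a feature of the SQL*Plus client: each execution prompts for
fresh values, so the same statement can be re-run (with /) to insert further records.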
2. UPDATE-SET-WHERE: This is used to update the content of a record in a relation.
Syntax: SQL>UPDATE relation_name SET field_name1=data, field_name2=data
WHERE field_name=data;
Example: SQL>UPDATE student SET sname = 'kumar' WHERE sno=1;
3. DELETE-FROM: This is used to delete records of a relation; the structure of the relation
is retained.
a) DELETE-FROM: This is used to delete all the records of relation.
Syntax: SQL>DELETE FROM relation_name;
Example: SQL>DELETE FROM std;
b) DELETE -FROM-WHERE: This is used to delete a selected record from a relation.
Syntax: SQL>DELETE FROM relation_name WHERE condition;
Example: SQL>DELETE FROM student WHERE sno = 2;
Intermediate SQL
2.10 Joins
The join operation is one of the most useful operations in relational algebra and the most
commonly used way to combine information from two or more relations. A join can be
defined as a cross-product followed by selections and projections. It is denoted by ⋈.
There are TWO types of Joins

1. Inner Join 2. Outer Join
Inner Join can be classified into three ways

1) Condition Join or Theta Join 2) Equi Join 3) Natural Join
Outer Join can be classified as
1) Left outer Join 2) Right outer Join 3) Full outer Join
Condition Join or Theta Join : Theta join combines tuples from different relations provided
they satisfy the theta condition. The join condition is denoted by the symbol θ.
Notation : R1 ⋈θ R2
R1 and R2 are relations having attributes (A1, A2, .., An) and (B1, B2,.. ,Bn) such that the
attributes don't have anything in common, that is R1 ∩ R2 = Φ.
Theta join can use all kinds of comparison operators.
Student
SID   Name    Std
101   Alex    10
102   Maria   11
Subjects
Class   Subject
10      Math
10      English
11      Music
11      Sports
Ex : STUDENT ⋈ (Student.Std = Subjects.Class) SUBJECTS
Student_detail
SID   Name    Std   Class   Subject
101   Alex    10    10      Math
101   Alex    10    10      English
102   Maria   11    11      Music
102   Maria   11    11      Sports
Equijoin
When Theta join uses only equality comparison operator, it is said to be equijoin. The above
example corresponds to equijoin.
Natural Join ( ⋈)
It is a binary operator that is written as (R ⋈ S) where R and S are relations. The result of the
natural join is the set of all combinations of tuples in R and S that are equal on their common
attribute names. For an example consider the tables Employee and Dept and their natural join:
Employee
Name      EmpId   DeptName
Harry     3415    Finance
Sally     2241    Sales
George    3401    Finance
Harriet   2202    Sales
Dept
DeptName     Manager
Finance      George
Sales        Harriet
Production   Charles
Employee ⋈ Dept
Name      EmpId   DeptName   Manager
Harry     3415    Finance    George
Sally     2241    Sales      Harriet
George    3401    Finance    George
Harriet   2202    Sales      Harriet
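Note : When no join condition is written, ⋈ denotes the natural join. Natural join, like every
inner join, keeps only tuples that have a match in the other relation, so it results in some loss of
information. In the example above, the Production department does not appear in the result
because no employee belongs to it.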
Outer Joins
Theta Join, Equijoin, and Natural Join are called inner joins. An inner join includes only
those tuples with matching attributes and the rest are discarded in the resulting relation.
Therefore, we need to use outer joins to include all the tuples from the participating relations in
the resulting relation. There are three kinds of outer joins − left outer join, right outer join,
and full outer join.
Left Outer Join (R ⟕ S)
All the tuples from the Left relation, R, are included in the resulting relation. If there are
tuples in R without any matching tuple in the Right relation S, then the S-attributes of the
resulting relation are made NULL.
Right Outer Join (R ⟖ S)
All the tuples from the Right relation, S, are included in the resulting relation. If there are tuples
in S without any matching tuple in R, then the R-attributes of resulting relation are made NULL.
Full Outer Join (R ⟗ S)
All the tuples from both participating relations are included in the resulting relation. Where a
tuple of either relation has no matching tuple in the other, the attributes of the other relation are
made NULL.
Employee
Employee_name   Street    City
Coyote          Toon      Hollywood
Rabbit          Tunnel    Carrotvile
Williams        Seaview   Seattle
Works
Employee_name   Branch_name   Salary
Coyote          Mesa          1500
Rabbit          Mesa          1300
Gates           Redmond       5300
Williams        Redmond       1500
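Fig : Results of the left outer join (Employee ⟕ Works), right outer join (Employee ⟖ Works)
and full outer join (Employee ⟗ Works). Attributes with no matching tuple are shown as NULL;
the tuple (Gates, Redmond, 5300) from Works appears only in the right and full outer join
results, with NULL for Street and City.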
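The effect of an outer join on the Employee and Works tables can be reproduced with a short sketch. It assumes Python's sqlite3 module; because older SQLite versions lack RIGHT and FULL OUTER JOIN, the right outer join is written as a LEFT JOIN that starts from works, which gives the same tuples.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table employee (employee_name text, street text, city text)")
conn.execute("create table works (employee_name text, branch_name text, "
             "salary integer)")
conn.executemany("insert into employee values (?, ?, ?)", [
    ("Coyote", "Toon", "Hollywood"), ("Rabbit", "Tunnel", "Carrotvile"),
    ("Williams", "Seaview", "Seattle")])
conn.executemany("insert into works values (?, ?, ?)", [
    ("Coyote", "Mesa", 1500), ("Rabbit", "Mesa", 1300),
    ("Gates", "Redmond", 5300), ("Williams", "Redmond", 1500)])

# Inner join: only employees present in both relations survive.
inner = conn.execute(
    "select e.employee_name, e.city, w.branch_name "
    "from employee e join works w on e.employee_name = w.employee_name "
    "order by e.employee_name").fetchall()

# Every Works tuple is kept (Employee right outer join Works);
# the Employee attributes of an unmatched tuple come back as NULL (None).
keep_works = conn.execute(
    "select w.employee_name, e.city, w.branch_name "
    "from works w left join employee e on e.employee_name = w.employee_name "
    "order by w.employee_name").fetchall()

for row in keep_works:
    print(row)
```

Gates has no tuple in Employee, so the inner join drops that row while the outer join keeps it with a NULL city.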
2.11 Views
A view is a table whose rows are not explicitly stored in the database but are computed as needed
from a view definition. A view is nothing more than a SQL statement that is stored in the
database with an associated name. A view is actually a composition of a table in the form of a
predefined SQL query.
A view can contain all rows of a table or select rows from a table. A view can be created from
one or many tables which depends on the written SQL query to create a view.
Views, which are kind of virtual tables, allow users to do the following:
• Structure data in a way that users or classes of users find natural or intuitive.
• Restrict access to the data such that a user can see and (sometimes) modify exactly what
they need and no more.
• Summarize data from various tables which can be used to generate reports.
Creating Views:
Database views are created using the CREATE VIEW statement. Views can be created from a
single table, multiple tables, or another view.
To create a view, a user must have the appropriate system privilege according to the specific
implementation. The basic CREATE VIEW syntax is as follows:
CREATE VIEW view_name AS

SELECT column1, column2.....
FROM table_name
WHERE [condition];
You can include multiple tables in your SELECT statement in very similar way as you use them
in normal SQL SELECT query.
Updating a View: A view can be updated under certain conditions:
• The SELECT clause may not contain the keyword DISTINCT.
• The SELECT clause may not contain summary functions.
• The SELECT clause may not contain set functions.
• The SELECT clause may not contain set operators.
• The SELECT clause may not contain an ORDER BY clause.
• The FROM clause may not contain multiple tables.
• The WHERE clause may not contain subqueries.
• The query may not contain GROUP BY or HAVING.
• Calculated columns may not be updated.
Deleting Rows from a View:
Rows of data can be deleted from a view. The same rules that apply to the UPDATE and
INSERT commands apply to the DELETE command.
Dropping Views:
Obviously, where you have a view, you need a way to drop the view if it is no longer needed.
The syntax is very simple as given below:
DROP VIEW view_name;
Ex :
1) Create table for Customer and insert records
Consider the CUSTOMERS table having the following records:
+----+----------+-----+-----------+----------+
| ID | NAME | AGE | ADDRESS | SALARY |
+----+----------+-----+-----------+----------+
| 1 | Ramesh | 32 | Ahmedabad | 2000.00 |
| 2 | Khilan | 25 | Delhi | 1500.00 |
| 3 | kaushik | 23 | Kota | 2000.00 |
| 4 | Chaitali | 25 | Mumbai | 6500.00 |
| 5 | Hardik | 27 | Bhopal | 8500.00 |
| 6 | Komal | 22 | MP | 4500.00 |
| 7 | Muffy | 24 | Indore | 10000.00 |
+----+----------+-----+-----------+----------+
2) This view would be used to have customer name and age from CUSTOMERS table
SQL > CREATE VIEW CUSTOMERS_VIEW AS

SELECT name, age
FROM CUSTOMERS;
Now, you can query CUSTOMERS_VIEW in similar way as you query an actual table.
SQL > SELECT * FROM CUSTOMERS_VIEW;

+----------+-----+
| name | age |
+----------+-----+
| Ramesh | 32 |
| Khilan | 25 |
| kaushik | 23 |
| Chaitali | 25 |
| Hardik | 27 |
| Komal | 22 |
| Muffy | 24 |
+----------+-----+
3)Update the age of Ramesh using CUSTOMERS_VIEW
SQL > UPDATE CUSTOMERS_VIEW

SET AGE = 35
WHERE name='Ramesh';
This would ultimately update the base table CUSTOMERS and the same would be reflected in the
view itself. Now, try to query the base table, and the SELECT statement would produce the
following result:
+----+----------+-----+-----------+----------+
| ID | NAME | AGE | ADDRESS | SALARY |
+----+----------+-----+-----------+----------+
| 1 | Ramesh | 35 | Ahmedabad | 2000.00 |
| 2 | Khilan | 25 | Delhi | 1500.00 |
| 3 | kaushik | 23 | Kota | 2000.00 |
| 4 | Chaitali | 25 | Mumbai | 6500.00 |
| 5 | Hardik | 27 | Bhopal | 8500.00 |
| 6 | Komal | 22 | MP | 4500.00 |
| 7 | Muffy | 24 | Indore | 10000.00 |
+----+----------+-----+-----------+----------+
4)Delete a record having AGE= 22 using CUSTOMERS_VIEW
SQL > DELETE FROM CUSTOMERS_VIEW WHERE age = 22;
This would ultimately delete a row from the base table CUSTOMERS and the same would be
reflected in the view itself. Now, try to query the base table, and the SELECT statement would
produce the following result:
+----+----------+-----+-----------+----------+
| ID | NAME | AGE | ADDRESS | SALARY |
+----+----------+-----+-----------+----------+
| 1 | Ramesh | 35 | Ahmedabad | 2000.00 |
| 2 | Khilan | 25 | Delhi | 1500.00 |
| 3 | kaushik | 23 | Kota | 2000.00 |
| 4 | Chaitali | 25 | Mumbai | 6500.00 |
| 5 | Hardik | 27 | Bhopal | 8500.00 |
| 7 | Muffy | 24 | Indore | 10000.00 |
+----+----------+-----+-----------+----------+
5) Drop the view CUSTOMERS_VIEW:
SQL> DROP VIEW CUSTOMERS_VIEW;
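That a view stores no rows of its own can be seen in a small sketch. It assumes Python's sqlite3 module and only three of the CUSTOMERS rows. Views in SQLite are read-only, so, unlike the Oracle example above, the update and delete are issued against the base table; the view still reflects them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table customers (id integer primary key, name text, "
             "age integer, address text, salary real)")
conn.executemany("insert into customers values (?, ?, ?, ?, ?)", [
    (1, "Ramesh", 32, "Ahmedabad", 2000.0),
    (2, "Khilan", 25, "Delhi", 1500.0),
    (6, "Komal", 22, "MP", 4500.0)])

# The view is just a stored query over the base table.
conn.execute("create view customers_view as select name, age from customers")
before = conn.execute("select * from customers_view order by name").fetchall()

# Changes go to the base table here because SQLite views cannot be updated.
conn.execute("update customers set age = 35 where name = 'Ramesh'")
conn.execute("delete from customers where age = 22")

# The view is recomputed from the base table each time it is queried.
after = conn.execute("select * from customers_view order by name").fetchall()
conn.execute("drop view customers_view")

print(before)
print(after)
```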
2.12 Transactions
A transaction consists of a sequence of query and/or update statements. The SQL standard
specifies that a transaction begins implicitly when an SQL statement is executed. One of the
following SQL statements must end the transaction:
• Commit : COMMIT is a transaction control language (TCL) command that is used to
permanently save the changes done in the transaction in tables/databases. The database cannot
regain its previous state after the execution of COMMIT.
• Rollback : ROLLBACK is a transaction control language (TCL) command that is used to undo
the changes of a transaction that have not been saved in the database. The command can only
undo changes made since the last COMMIT.
Difference between COMMIT and ROLLBACK

1. COMMIT: permanently saves the changes made by the current transaction.
   ROLLBACK: undoes the changes made by the current transaction.
2. COMMIT: the transaction cannot undo changes after COMMIT execution.
   ROLLBACK: the transaction returns to its previous state after ROLLBACK.
3. COMMIT: applied when the transaction is successful.
   ROLLBACK: occurs when the transaction is aborted, or on incorrect execution or system failure.
4. COMMIT: permanently saves the state when all the statements are executed successfully
   without any error.
   ROLLBACK: if any operation fails during the completion of a transaction, the change cannot
   be permanently saved and can be undone using this statement.
5. Syntax of the COMMIT statement: COMMIT;
   Syntax of the ROLLBACK statement: ROLLBACK;
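The behaviour of COMMIT and ROLLBACK can be observed with a minimal sketch. It assumes Python's sqlite3 module, whose default mode opens a transaction implicitly before an insert or update, and a made-up account table; commit() and rollback() are the module's counterparts of the two SQL statements.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table account (id integer primary key, balance integer)")
conn.execute("insert into account values (1, 100)")
conn.commit()                      # the starting balance is now permanent

def balance():
    return conn.execute("select balance from account where id = 1").fetchone()[0]

# An uncommitted change is undone by ROLLBACK.
conn.execute("update account set balance = balance - 30 where id = 1")
conn.rollback()
after_rollback = balance()

# A committed change survives a later ROLLBACK.
conn.execute("update account set balance = balance - 30 where id = 1")
conn.commit()
conn.rollback()                    # nothing left to undo
after_commit = balance()

print(after_rollback, after_commit)
```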
2.13 Integrity Constraints

An integrity constraint (IC) is a condition that is specified on a database schema and restricts
the data that can be stored in an instance of the database. If a database instance satisfies all the
integrity constraints specified on the database schema, it is a legal instance. A DBMS permits
only legal instances to be stored in the database.
Many kinds of integrity constraints can be specified in the relational model:
Domain Constraints:
A relation schema specifies the domain of each field in the relation instance. These
domain constraints in the schema specify the condition that each instance of the relation has to
satisfy: The values that appear in a column must be drawn from the domain associated with that
column. Thus, the domain of a field is essentially the type of that field. It can be enforced using:
• CHECK
• NOT NULL
The CHECK constraint specifies a condition that is checked against each value being entered into
a record. If the condition evaluates to false, the record violates the constraint and isn't entered
into the table.
SYNTAX:
CHECK (<conditional expression>)
Ex : SQL> create table client(clno char(4),

clname varchar2(20),
clcity varchar2(15),
check(clno like 'c%'),
check(clname = upper(clname)),
check(clcity in('mumbai','newdelhi','chennai','culcutta')));
Table created.
NOT NULL
The NOT NULL constraint is a restriction placed on a column in a relational database table. It
enforces the condition that, in that column, every row of data must contain a value. It cannot be
left blank during insert or update operations. If this column is left blank, this will produce an
error message and the entire insert or update operation will fail.
Ex : SQL> create table student (rollno number(7) NOT NULL,
name varchar2(15),
branch char(4),dob date);
Note : NOT NULL constraint can be added only at Column level, but not table level.
Entity Integrity Constraints
Entity integrity is an integrity rule which states that every table must have a primary key and
that the column or columns chosen to be the primary key should be unique and not NULL.
There are TWO types of Entity Integrity Constraints
1. Unique 2. Primary key
A unique constraint prevents duplicate values in a column but, unlike a primary key, it does
allow NULL values.
A primary key is a key in a relational database that is unique for each record. It is a unique
identifier: it is used to prevent NULL values and also duplicate values in a particular column
(or set of columns) of a table. A table can have at most one primary key.
Ex : SQL> create table student(rno number(4) primary key,
name varchar2(20),
login varchar2(20) unique,
dob date);
Referential integrity constraint
Referential Integrity Constraint is used to specify interdependencies between relations. This
constraint specifies a column or list of columns as a foreign key of the referencing table.
In order to provide Referential Integrity the conditions must exist

dname varchar2(15),
location varchar2(15));
empname varchar2(20) not null,
designation varchar2(15),
Current database systems support such general constraints in the form of table
constraints and assertions. Table constraints are associated with a single table and checked
whenever that table is modified. In contrast, assertions involve several tables and are checked
whenever any of these tables is modified.
Assertions
An assertion is a predicate expressing a condition we wish the database to always satisfy.
DBMS checks the assertion after any change that may violate the expression
Ex : A table constraint which ensures that the salary of an employee is always above 1000:
CREATE TABLE employee (eid number(10), ename varchar2(20), salary number(10,2),
CHECK(salary>1000));
Ex : create assertion spousal_supervisor check(supervisorid <> spousalid);
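How a DBMS rejects illegal instances can be seen in a small sketch covering CHECK, NOT NULL, primary key and foreign key constraints. It assumes Python's sqlite3 module; the dept and emp tables, their columns and the helper try_insert are hypothetical, and SQLite enforces foreign keys only after the pragma shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("pragma foreign_keys = on")   # SQLite checks foreign keys only if asked
conn.execute("create table dept (deptno integer primary key, dname text not null)")
conn.execute(
    "create table emp ("
    " eid integer primary key,"                # entity integrity
    " ename text not null,"                    # domain constraint
    " salary numeric check (salary > 1000),"   # domain constraint
    " deptno integer references dept(deptno))" # referential integrity
)
conn.execute("insert into dept values (10, 'Sales')")

def try_insert(row):
    """Insert one emp row and report whether the DBMS accepted it."""
    try:
        conn.execute("insert into emp values (?, ?, ?, ?)", row)
        return "ok"
    except sqlite3.IntegrityError:
        return "rejected"

results = {
    "valid row": try_insert((1, "Asha", 2000, 10)),
    "low salary": try_insert((2, "Ravi", 500, 10)),
    "null name": try_insert((3, None, 2000, 10)),
    "duplicate key": try_insert((1, "Dup", 3000, 10)),
    "unknown dept": try_insert((4, "Mia", 2000, 99)),
}
print(results)
```

Only the first row forms a legal instance; each of the other rows violates exactly one constraint and is refused.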
2.14 SQL Data Types and Schemas

2.14.1 Date and Time Types in SQL
SQL standard supports several data types relating to dates and times:
• date: A calendar date containing a (four-digit) year, month, and day of the month.
• time: The time of day, in hours, minutes, and seconds. A variant, time(p), can be used to
specify the number of fractional digits for seconds (the default being 0).
• timestamp: A combination of date and time. A variant, timestamp(p), can be used to
specify the number of fractional digits for seconds (the default here being 6). Time-zone
information is also stored if with time zone is specified.
Date and time values can be specified like this:
date '2022-10-22'
time '09:30:00'
timestamp '2022-10-22 10:29:01.45'
Dates must be specified in the format year followed by month followed by day, as shown. The
seconds field of time or timestamp can have a fractional part, as in the timestamp above.
SQL defines several functions to get the current date and time. For example, current_date
returns the current date, current_time returns the current time (with time zone), and localtime
returns the current local time (without time zone). Timestamps (date plus time) are returned by
current_timestamp (with time zone) and localtimestamp (local date and time without time
zone).
2.14.2 Default Values
A default integrity constraint is also called a default column value. It defines the value
that a column takes when an insert does not supply one. A default such as zero for a numeric
column helps to avoid errors that arise when a column has no entry.
SYNTAX: <columnname> <datatype>(<size>) default <typical value>
EX1 : SQL> create table enrolled (rno number(4),

cousno char(3) primary key,
grade char(1) default 'A');
SQL > insert into enrolled(rno,cousno) values(&rno,'&cousno');
EX2 : SQL> create table student

(ID varchar (5),
name varchar (20) not null,
dept_name varchar (20),
totcred numeric (3,0) default 0,
primary key (ID));
The default value of the totcred attribute is declared to be 0. As a result, when a tuple is inserted
into the student relation, if no value is provided for the totcred attribute, its value is set to 0. The
following insert statement illustrates how an insertion can omit the value for the totcred
attribute.
SQL> insert into student(ID, name, dept_name) values ('12789', 'Newman', 'Comp. Sci.');
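The effect of the default clause can be confirmed with a short sketch, assuming Python's sqlite3 module in place of the Oracle prompt; the table definition and the inserted row are the ones from EX2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table student ("
    " ID varchar(5),"
    " name varchar(20) not null,"
    " dept_name varchar(20),"
    " totcred numeric(3,0) default 0,"
    " primary key (ID))"
)

# totcred is omitted from the column list, so the default value is used.
conn.execute("insert into student(ID, name, dept_name) "
             "values ('12789', 'Newman', 'Comp. Sci.')")

row = conn.execute("select ID, totcred from student").fetchone()
print(row)
```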
2.14.3 Index Creation
An index on an attribute of a relation is a data structure that allows the database system to find
those tuples in the relation that have a specified value for that attribute efficiently, without
scanning through all the tuples of the relation.
SQL> create index studentID_index on student(ID);
The above statement creates an index named studentID_index on the attribute ID of the relation
student.
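Whether a lookup actually uses the index can be inspected with a sketch like the following. It assumes Python's sqlite3 module and SQLite's explain query plan statement (other systems expose the plan differently); the 1000 student rows are generated purely for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table student (ID varchar(5), name varchar(20))")
conn.executemany(
    "insert into student values (?, ?)",
    [(str(10000 + i), "student" + str(i)) for i in range(1000)],
)
conn.execute("create index studentID_index on student(ID)")

# Ask SQLite how it would evaluate a lookup by ID.
plan = conn.execute(
    "explain query plan select name from student where ID = ?", ("10042",)
).fetchall()
plan_text = " ".join(str(row[-1]) for row in plan)   # last column is the description

# The lookup itself returns the same answer with or without the index.
name = conn.execute(
    "select name from student where ID = ?", ("10042",)).fetchone()[0]

print(plan_text)
print(name)
```

The plan description names studentID_index, which indicates a search through the index rather than a scan of all tuples.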
2.14.4 Large-Object Types

Many current-generation database applications need to store attributes that can be large (of the
order of many kilobytes), such as a photograph, or very large (of the order of many megabytes or
even gigabytes), such as a high-resolution medical image or video clip. SQL therefore provides
large-object data types for character data (clob) and binary data (blob). The letters “lob” in these
data types stand for “Large OBject.” For example, we may declare attributes
book_review clob(10KB)
image blob(10MB)
movie blob(2GB)
2.14.5 User-Defined Types
SQL supports two forms of user-defined data types. The first form, which we cover here, is
called distinct types. The other form, called structured data types, allows the creation of
complex data types with nested record structures, arrays, and multisets.
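The create type clause can be used to define new types. For example, the statements:
create type Dollars as numeric(12,2);
create type Pounds as numeric(12,2);
define the user-defined types Dollars and Pounds to be decimal numbers with a total of 12 digits,
two of which are placed after the decimal point. The new types can then be used as the types of
attributes of relations. For example, we can declare the department table as:
create table department
(dept_name varchar (20),
building varchar (15),
budget Dollars);
SQL provides drop type and alter type clauses to drop or modify types that have been created
earlier.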
2.14.6 Create Table Extensions
Applications often require creation of tables that have the same schema as an existing table.
EX : creating an employee table from existing table
Description: To copy the structure and records from the old table into the new table
Syntax:
SQL > Create table <new table name> as select <column name> from <old table
name> [where <condition>];
SQL > Insert into <table name> (select * from <old table name>);
Sol :
SQL> create table employee1(eno number(10),ename varchar2(20),salary number(10));

Table created.
SQL> desc employee1
Name Null? Type
-----------------------------------------------------------------
ENO Number(10)
ENAME Varchar2(20)
SALARY Number(10)
SQL> insert into employee1 values(&eno,'&ename',&salary);

SQL> create table employee2(eno ,ename ,salary)as select eno ,ename ,salary from employee1;
Table created.
SQL> desc employee2
SQL> create table employee3 as select * from employee2 where 1=2;
Table created.
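The two uses of create table ... as select can be compared in a short sketch: one copies structure and records, the other (with the always-false condition 1=2) copies the structure only. It assumes Python's sqlite3 module and two invented employee rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table employee1 (eno integer, ename varchar(20), salary integer)")
conn.executemany("insert into employee1 values (?, ?, ?)",
                 [(1, "Anil", 20000), (2, "Bina", 25000)])

# Copy structure and records.
conn.execute("create table employee2 as select eno, ename, salary from employee1")

# 1=2 is never true, so no rows qualify and only the structure is copied.
conn.execute("create table employee3 as select * from employee2 where 1=2")

rows2 = conn.execute("select count(*) from employee2").fetchone()[0]
rows3 = conn.execute("select count(*) from employee3").fetchone()[0]
cols3 = [r[1] for r in conn.execute("pragma table_info(employee3)")]

print(rows2, rows3, cols3)
```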
2.15 Authorization
Authorizations on data include:
o Authorization to read data.
o Authorization to insert new data.
o Authorization to update data.
o Authorization to delete data.
Each of these types of authorizations is called a privilege. We may authorize the user all, none,
or a combination of these types of privileges on specified parts of a database, such as a relation
or a view. When a user submits a query or an update, the SQL implementation first checks if the
query or update is authorized, based on the authorizations that the user has been granted. If the
query or update is not authorized, it is rejected.
A user who has some form of authorization may be allowed to pass on (grant) this authorization
to other users, or to withdraw (revoke) an authorization that was granted earlier. The ultimate
form of authority is that given to the database administrator. The database administrator may
authorize new users, restructure the database, and so on. This form of authorization is analogous
to that of a superuser, administrator, or operator for an operating system.
2.15.1 Granting and Revoking of Privileges
The SQL standard includes the privileges select, insert, update, and delete. The privilege all
privileges can be used as a short form for all the allowable privileges. A user who creates a new
relation is given all privileges on that relation automatically.
The SQL data-definition language includes commands to grant and revoke privileges. The grant
statement is used to confer authorization. The basic form of this statement is:
grant <privilege list>
on <relation name or view name>
to <user/role list>;
The privilege list allows the granting of several privileges in one command. The select
authorization on a relation is required to read tuples in the relation. The following grant
statement grants database users Amit and Satoshi select authorization on the department relation:
grant select on department to Amit, Satoshi;
To revoke an authorization, we use the revoke statement. It takes a form almost identical to that
of grant:
revoke <privilege list>
on <relation name or view name>
from <user/role list>;
Thus, to revoke the privileges that we granted previously, we write:
revoke select on department from Amit, Satoshi;
revoke update (budget) on department from Amit, Satoshi;
Advanced SQL
2.16 Accessing SQL From a Programming Language
SQL provides a powerful declarative query language. Writing queries in SQL is usually much
easier than coding the same queries in a general-purpose programming language. However, a
database programmer must have access to a general-purpose programming language for at least
two reasons:
1. Not all queries can be expressed in SQL, since SQL does not provide the full
expressive power of a general-purpose language. That is, there exist queries that can be
expressed in a language such as C, Java, or Cobol that cannot be expressed in SQL. To
write such queries, we can embed SQL within a more powerful language.
2. Nondeclarative actions—such as printing a report, interacting with a user, or sending the
results of a query to a graphical user interface—cannot be done from within SQL.
Applications usually have several components, and querying or updating data is only one
component; other components are written in general-purpose programming languages.
There are two approaches to accessing SQL from a general-purpose programming language:
• Dynamic SQL: A general-purpose program can connect to and communicate with a
database server using a collection of functions (for procedural languages) or methods (for
object-oriented languages). Dynamic SQL allows the program to construct an SQL query
as a character string at runtime, submit the query, and then retrieve the result into
program variables a tuple at a time. The dynamic SQL component of SQL allows
programs to construct and submit SQL queries at runtime.
• Embedded SQL: Like dynamic SQL, embedded SQL provides a means by which a
program can interact with a database server. However, under embedded SQL, the SQL
statements are identified at compile time using a pre-processor. The pre-processor
submits the SQL statements to the database system for precompilation and optimization;
then it replaces the SQL statements in the application program with appropriate code and
function calls before invoking the programming-language compiler.
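The dynamic SQL approach described above can be sketched as follows: the program builds the query text at runtime, submits it, and reads the result one tuple at a time. The sketch assumes Python's sqlite3 module instead of JDBC or ODBC; the function build_query, the allow-list of column names and the department rows are hypothetical.

```python
import sqlite3

ALLOWED_COLUMNS = {"dept_name", "building"}   # hypothetical allow-list

def build_query(filters):
    """Build a select on department from a dict of column -> value filters."""
    sql = "select dept_name, building, budget from department"
    clauses, params = [], []
    for column, value in filters.items():
        if column not in ALLOWED_COLUMNS:
            raise ValueError("unexpected column: " + column)
        clauses.append(column + " = ?")   # the value is passed as a parameter
        params.append(value)
    if clauses:
        sql += " where " + " and ".join(clauses)
    return sql, params

conn = sqlite3.connect(":memory:")
conn.execute("create table department (dept_name text, building text, budget real)")
conn.executemany("insert into department values (?, ?, ?)", [
    ("Comp. Sci.", "Taylor", 100000), ("Biology", "Watson", 90000),
    ("Physics", "Watson", 70000)])

# The query string is only known at runtime.
sql, params = build_query({"building": "Watson"})

names = []
for row in conn.execute(sql + " order by dept_name", params):
    names.append(row[0])      # the cursor hands back one tuple at a time

print(sql)
print(names)
```

Passing values as parameters, rather than pasting them into the string, is a common design choice because it keeps user input from being interpreted as SQL.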
2.16.1 JDBC
The JDBC standard defines an application program interface (API) that Java programs can
use to connect to database servers. (The word JDBC was originally an abbreviation for Java
Database Connectivity)
Connecting to the Database: The first step in accessing a database from a Java program is to
open a connection to the database. This step is required to select which database to use. A
connection is opened using the getConnection method of the DriverManager class (within
java.sql). This method takes three parameters:
• The first parameter to the getConnection call is a string that specifies the URL, or
machine name, where the server runs, along with possibly some other information such
as the protocol to be used to communicate with the database, the port number the
database system uses for communication, and the specific database on the server to be
used.
• The second parameter to getConnection is a database user identifier, which is a string.
• The third parameter is a password, which is also a string.
Each database product that supports JDBC provides a JDBC driver that must be dynamically
loaded in order to access the database from Java. In fact, loading the driver must be done first,
before connecting to the database.
Shipping SQL Statements to the Database System:
Once a database connection is open, the program can use it to send SQL statements to the
database system for execution. This is done via an instance of the class Statement. A Statement
object is not the SQL statement itself, but rather an object that allows the Java program to invoke
methods that ship an SQL statement given as an argument for execution by the database system.
Other Features :
JDBC provides a number of other features, such as updatable result sets. It can create an
updatable result set from a query that performs a selection and/or a projection on a database
relation. An update to a tuple in the result set then results in an update to the corresponding tuple
of the database relation.
2.16.2 ODBC
The Open Database Connectivity (ODBC) standard defines an API that applications can use to
open a connection with a database, send queries and updates, and get back results. Applications
such as graphical user interfaces, statistics packages, and spreadsheets can make use of the same
ODBC API to connect to any database server that supports ODBC. Each database system
supporting ODBC provides a library that must be linked with the client program. When the client
program makes an ODBC API call, the code in the library communicates with the server to carry
out the requested action, and fetch results. The ODBC standard defines conformance levels,
which specify subsets of the functionality defined by the standard. An ODBC implementation
may provide only core level features, or it may provide more advanced (level 1 or level 2)
features. Level 1 requires support for fetching information about the catalog, such as information
about what relations are present and the types of their attributes. Level 2 requires further
features, such as the ability to send and retrieve arrays of parameter values and to retrieve more
detailed catalog information.
The SQL standard defines a call level interface (CLI) that is similar to the ODBC interface.
2.16.3 Embedded SQL
The SQL standard defines embeddings of SQL in a variety of programming languages, such as
C, C++, Cobol, Pascal, Java, PL/I, and Fortran. A language in which SQL queries are embedded
is referred to as a host language, and the SQL structures permitted in the host language constitute
embedded SQL.
Programs written in the host language can use the embedded SQL syntax to access and
update data stored in a database. An embedded SQL program must be processed by a special
pre-processor prior to compilation. The pre-processor replaces embedded SQL requests with
host-language declarations and procedure calls that allow runtime execution of the database
accesses. Then, the resulting program is compiled by the host-language compiler. This is the main
distinction between embedded SQL and JDBC or ODBC, which pass SQL statements to the database
through API calls at run time instead of through a pre-processing step.
To identify embedded SQL requests to the pre-processor, we use the EXEC SQL statement; it
has the form:
EXEC SQL <embedded SQL statement >;
2.17 Functions and Procedures

A procedure or function is a named group of SQL and PL/SQL statements that performs a
specific task.
Procedures and functions are both named PL/SQL blocks and are similar in structure. The major
difference between them is that a function must always return a value, whereas a procedure
may or may not return a value.
Procedures:
A procedure is a named PL/SQL block which performs one or more specific tasks. This is similar
to a procedure in other programming languages. A procedure has a header and a body.
The header consists of the name of the procedure and the parameters or variables passed to the
procedure.
The body consists of a declaration section, an execution section and an exception section, similar
to a general PL/SQL block. A procedure is similar to an anonymous PL/SQL block but it is
named for repeated usage. We can pass parameters to procedures in three ways :
Parameters | Description
IN type | These types of parameters are used to send values to stored procedures.
OUT type | These types of parameters are used to get values from stored procedures. This is similar to a return type in functions.
IN OUT type | These types of parameters are used to send values and get values from stored procedures.
Syntax:
CREATE [OR REPLACE] PROCEDURE procedure_name
    [(argument [IN | OUT | IN OUT] datatype, ...)]
IS
    Declaration section <variable, constant>;
BEGIN
    Execution section
EXCEPTION
    Exception section
END;
IS - marks the beginning of the body of the procedure and is similar to DECLARE in anonymous
PL/SQL Blocks. The code between IS and BEGIN forms the Declaration section.
The syntax within the brackets [ ] is optional.
Using CREATE OR REPLACE creates the procedure if no procedure with the same name exists;
otherwise the existing procedure is replaced with the current code.
To execute a procedure:
• From the SQL prompt : EXECUTE [or EXEC] procedure_name;
Functions:
A function is a named PL/SQL Block which is similar to a procedure. The major difference
between a procedure and a function is, a function must always return a value, but a procedure
may or may not return a value.
Syntax:
CREATE [OR REPLACE] FUNCTION function_name [(parameters)]
RETURN return_datatype
{IS | AS}
    Declaration_section <variable, constant>;
BEGIN
    Execution_section
    RETURN return_variable;
EXCEPTION
    Exception_section
    RETURN return_variable;
END;
RETURN TYPE: The header section defines the return type of the function. The return data type
can be any Oracle data type, such as VARCHAR or NUMBER.
The execution section and the exception section should both return a value of the data type
defined in the header section.
A function can be executed in the following ways.
• As a part of a SELECT statement : SELECT emp_details_func FROM dual;
• In a PL/SQL statement : dbms_output.put_line(emp_details_func);
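The idea of calling a function from inside a SELECT statement can be tried out without an Oracle installation. The sketch below is not PL/SQL: it assumes Python's standard-library sqlite3 module and registers a Python function so that SQL can call it. The name emp_details_func is reused from the example above, and the string it returns is hypothetical.

```python
import sqlite3


def emp_details_func():
    """Hypothetical function body; like a stored function, it always returns a value."""
    return "Employee details"


conn = sqlite3.connect(":memory:")

# Register the Python function under a SQL-callable name that takes zero arguments.
conn.create_function("emp_details_func", 0, emp_details_func)

# Invoke the function as part of a SELECT statement (SQLite needs no FROM dual).
result = conn.execute("SELECT emp_details_func()").fetchone()[0]
print(result)

conn.close()
```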
2.18 Triggers
A trigger is a statement that the system executes automatically as a side effect of a modification
to the database. To design a trigger mechanism, we must meet two requirements:
1. Specify when a trigger is to be executed. This is broken up into an event (INSERT,
UPDATE or DELETE) that causes the trigger to be checked and a condition that must be
satisfied for trigger execution to proceed.
2. Specify the actions to be taken when the trigger executes.
Once we enter a trigger into the database, the database system takes on the responsibility of
executing it whenever the specified event occurs and the corresponding condition is satisfied.
Syntax
CREATE TRIGGER trigger_name
{BEFORE | AFTER}
{INSERT | UPDATE | DELETE}
ON table_name
[FOR EACH ROW]
trigger_body
1. CREATE TRIGGER: These two keywords specify that a trigger is going to be declared.
2. TRIGGER_NAME: It gives the name of the trigger being created. The trigger name
should be unique.
3. BEFORE | AFTER: It specifies when the trigger will be initiated, i.e. before the
triggering event or after the triggering event.
4. INSERT | UPDATE | DELETE: These are the DML operations, and we can use any one
of them in a given trigger.
5. ON table_name: It specifies the name of the table on which the trigger is going
to be applied.
6. FOR EACH ROW: It makes the trigger a row-level trigger, which is executed once for
each row affected by the triggering statement.
7. TRIGGER BODY: It consists of the statements that need to be executed when the trigger
is fired.
Example
Suppose we have a table named Student containing the attributes Student_id, Name, Address,
and Marks.
Now, we want to create a trigger that adds 100 to the Marks value of each new row
whenever a new student is inserted into the table.
The SQL trigger (in MySQL syntax) will be:
CREATE TRIGGER Add_marks
BEFORE
INSERT
ON Student
FOR EACH ROW
SET new.Marks = new.Marks + 100;
The new keyword refers to the row that is being inserted.
After creating the trigger, we will write the query for inserting a new student in the database.
SQL > INSERT INTO Student(Name, Address, Marks) VALUES('Alizeh', 'Maldives', 110);
The Student_id column is an auto-increment field and will be generated automatically when a
new record is inserted into the table. Because of the trigger, the new row is stored with
Marks = 210 (110 + 100).
To see the final output the query would be: SELECT * FROM Student;
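The same behaviour can be checked with a small runnable sketch. It assumes Python's standard-library sqlite3 module rather than MySQL. SQLite triggers cannot assign to the new row, so the sketch swaps in an AFTER INSERT trigger that updates the row just inserted; the effect on the stored Marks value is the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Student (
    Student_id INTEGER PRIMARY KEY AUTOINCREMENT,
    Name TEXT,
    Address TEXT,
    Marks INTEGER
);
-- SQLite cannot assign to NEW, so adjust the row right after it is inserted.
CREATE TRIGGER Add_marks
AFTER INSERT ON Student
FOR EACH ROW
BEGIN
    UPDATE Student SET Marks = Marks + 100 WHERE Student_id = NEW.Student_id;
END;
""")

# Insert a new student; the trigger fires automatically.
conn.execute(
    "INSERT INTO Student (Name, Address, Marks) VALUES (?, ?, ?)",
    ("Alizeh", "Maldives", 110),
)

rows = conn.execute("SELECT * FROM Student").fetchall()
print(rows)
conn.close()
```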
Advantages of Triggers
1. Triggers provide a way to check the integrity of the data. When there is a change in the
database, triggers can automatically make the related adjustments elsewhere in the database.
2. Triggers help in keeping the user interface lightweight. Instead of putting the same
function call all over the application, you can define a trigger once and it will be executed
automatically.
Disadvantages of Triggers
1. Triggers may be difficult to troubleshoot as they execute automatically in the database.
If there is an error, it is hard to trace the trigger logic because triggers are fired
implicitly before or after updates/inserts happen.
2. Triggers may increase the overhead of the database, as they are executed every time
the triggering event occurs on the table.
2.19 OLAP
An online analytical processing (OLAP) system is an interactive system that permits an
analyst to view different summaries of multidimensional data.
Online analytical processing (OLAP) is defined as “The dynamic synthesis, analysis, and
consolidation of large volumes of multi-dimensional data.”
OLAP enables users to gain a deeper understanding and knowledge about various aspects
of their corporate data through fast, consistent, interactive access to a wide variety of possible
views of the data.
OLAP databases are divided into one or more cubes, and these cubes are known as hypercubes.
There are the following key features of OLAP:
Multi-dimensional views of data : A multi-dimensional view of data provides the basis for
analytical processing through flexible access to corporate data. It enables users to analyze data
across any dimension at any level of aggregation with equal functionality and ease.
Support for complex calculations : OLAP software must provide a range of powerful
computational methods, such as the moving averages and percentage growth required by
sales forecasting.
Time intelligence : Time intelligence is used to judge the performance of almost any analytical
application over time. For example, a user may compare this month with last month or with the
same month last year, or may need to view the sales for the month of May or the sales for the
first five months of 2007. Concepts such as year-to-date and period-over-period comparisons
should be easily defined in an OLAP system.
Types of OLAP Servers
We have four types of OLAP servers
• Relational OLAP (ROLAP)
• Multidimensional OLAP (MOLAP)
• Hybrid OLAP (HOLAP)
• Specialized SQL Servers
Relational OLAP
ROLAP servers are placed between relational back-end server and client front-end tools. To
store and manage warehouse data, ROLAP uses relational or extended-relational DBMS.
ROLAP includes the following −
• Implementation of aggregation navigation logic.
• Optimization for each DBMS back end.
• Additional tools and services.
Multidimensional OLAP
MOLAP uses array-based multidimensional storage engines for multidimensional views of data.
With multidimensional data stores, the storage utilization may be low if the data set is sparse.
Therefore, many MOLAP servers use two levels of data storage representation to handle dense
and sparse data sets.
Hybrid OLAP
Hybrid OLAP is a combination of both ROLAP and MOLAP. It offers the higher scalability of
ROLAP and the faster computation of MOLAP. HOLAP servers can store large volumes of
detailed information, while the aggregations are stored separately in a MOLAP store.
Specialized SQL Servers

Specialized SQL servers provide advanced query language and query processing support for
SQL queries over star and snowflake schemas in a read-only environment.
OLAP Operations
Since OLAP servers are based on a multidimensional view of data, we will discuss OLAP
operations on multidimensional data.
Here is the list of OLAP operations −
• Roll-up
• Drill-down
• Slice and dice
• Pivot (rotate)
Roll-up
Roll-up performs aggregation on a data cube in any of the following ways −
• By climbing up a concept hierarchy for a dimension
• By dimension reduction
The following diagram illustrates how roll-up works.
• Roll-up is performed by climbing up a concept hierarchy for the dimension location.
• Initially the concept hierarchy was "street < city < province < country".
• On rolling up, the data is aggregated by ascending the location hierarchy from the level of
city to the level of country.
• The data is now grouped by country rather than by city.
• When roll-up is performed by dimension reduction, one or more dimensions are removed
from the data cube.
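Roll-up along the location hierarchy can be sketched in a few lines of Python. The cube cells, the sales figures, and the city-to-country mapping below are hypothetical, and the province level is skipped to keep the sketch short.

```python
from collections import defaultdict

# Hypothetical one-step mapping for the location hierarchy: city < country.
city_to_country = {
    "Toronto": "Canada",
    "Vancouver": "Canada",
    "Chicago": "USA",
    "New York": "USA",
}

# Hypothetical base cells: (city, quarter) -> units sold.
sales_by_city = {
    ("Toronto", "Q1"): 100,
    ("Vancouver", "Q1"): 150,
    ("Chicago", "Q1"): 200,
    ("New York", "Q1"): 250,
    ("Toronto", "Q2"): 120,
}


def roll_up(cells, hierarchy):
    """Aggregate cells by replacing each city with its parent in the hierarchy."""
    totals = defaultdict(int)
    for (city, quarter), units in cells.items():
        totals[(hierarchy[city], quarter)] += units
    return dict(totals)


sales_by_country = roll_up(sales_by_city, city_to_country)
print(sales_by_country)
```

After the roll-up there is one cell per (country, quarter) pair, so the cube has fewer, coarser cells than before.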
Drill-down
Drill-down is the reverse operation of roll-up. It is performed in either of the following ways −
• By stepping down a concept hierarchy for a dimension
• By introducing a new dimension.
The following diagram illustrates how drill-down works
• Drill-down is performed by stepping down a concept hierarchy for the dimension time.
• Initially the concept hierarchy was "day < month < quarter < year".
• On drilling down, the time dimension is descended from the level of quarter to the level
of month.
• When drill-down is performed by introducing a new dimension, one or more dimensions
are added to the data cube.
• It navigates the data from less detailed data to highly detailed data.
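Drill-down from quarter to month can be sketched as follows. The monthly cells and figures are hypothetical; the point is only that the quarterly view is a summary, and drilling down exposes the more detailed cells behind one summarized value.

```python
from collections import defaultdict

# One step of the time hierarchy: month < quarter.
month_to_quarter = {"Jan": "Q1", "Feb": "Q1", "Mar": "Q1", "Apr": "Q2"}

# Hypothetical detailed cells: (month, item) -> units sold.
monthly_sales = {
    ("Jan", "Mobile"): 30,
    ("Feb", "Mobile"): 40,
    ("Mar", "Mobile"): 50,
    ("Apr", "Mobile"): 60,
}


def quarterly_view(cells):
    """Summarized view at the quarter level (the starting point of a drill-down)."""
    totals = defaultdict(int)
    for (month, item), units in cells.items():
        totals[(month_to_quarter[month], item)] += units
    return dict(totals)


def drill_down(cells, quarter):
    """Return the month-level cells that make up the given quarter."""
    return {key: units for key, units in cells.items()
            if month_to_quarter[key[0]] == quarter}


summary = quarterly_view(monthly_sales)
q1_detail = drill_down(monthly_sales, "Q1")
print(summary)
print(q1_detail)
```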
Slice
The slice operation performs a selection on one particular dimension of a given cube and
provides a new sub-cube. Consider the following diagram that shows how slice works.
• Here Slice is performed for the dimension "time" using the criterion time = "Q1".
• It forms a new sub-cube in which the time dimension is fixed to "Q1", so the sub-cube has
one dimension fewer than the original cube.
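A minimal sketch of the slice time = "Q1", assuming the cube is held as a Python dictionary keyed by (location, time, item). The cells and values are hypothetical.

```python
# Hypothetical cube cells: (location, time, item) -> units sold.
cube = {
    ("Toronto", "Q1", "Mobile"): 605,
    ("Toronto", "Q2", "Mobile"): 680,
    ("Vancouver", "Q1", "Modem"): 825,
    ("Chicago", "Q1", "Phone"): 14,
}


def slice_cube(cells, time_value):
    """Fix the time dimension to one value and drop it from the cell keys."""
    return {(location, item): units
            for (location, time, item), units in cells.items()
            if time == time_value}


q1_slice = slice_cube(cube, "Q1")
print(q1_slice)
```

The keys of the result have two components instead of three, which is the "one dimension fewer" effect of a slice.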
Dice
Dice performs a selection on two or more dimensions of a given cube and provides a new
sub-cube. Consider the following diagram that shows the dice operation.
The dice operation on the cube based on the following selection criteria involves three
dimensions.
• (location = "Toronto" or "Vancouver")
• (time = "Q1" or "Q2")
• (item = "Mobile" or "Modem")
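The three selection criteria above can be applied to a dictionary-based cube as in this sketch. The cube cells and values are hypothetical; only the criteria come from the example.

```python
# Hypothetical cube cells: (location, time, item) -> units sold.
cube = {
    ("Toronto", "Q1", "Mobile"): 605,
    ("Vancouver", "Q2", "Modem"): 825,
    ("Chicago", "Q1", "Mobile"): 440,
    ("Toronto", "Q3", "Modem"): 310,
    ("Toronto", "Q2", "Phone"): 14,
}


def dice(cells, locations, times, items):
    """Keep only the cells whose value on every dimension is in the allowed set."""
    return {key: units for key, units in cells.items()
            if key[0] in locations and key[1] in times and key[2] in items}


sub_cube = dice(
    cube,
    locations={"Toronto", "Vancouver"},
    times={"Q1", "Q2"},
    items={"Mobile", "Modem"},
)
print(sub_cube)
```

Unlike a slice, the result keeps all three dimensions; each one is merely restricted to a subset of its values.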
Pivot
The pivot operation is also known as rotation. It rotates the data axes in view in order to provide
an alternative presentation of data. Consider the following diagram that shows the pivot
operation.
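For a two-dimensional view, pivoting amounts to swapping the row and column axes. The sketch below assumes the view is stored as a nested dictionary with location as rows and item as columns; the figures are hypothetical.

```python
# Hypothetical 2-D view: rows are locations, columns are items.
view = {
    "Toronto": {"Mobile": 605, "Modem": 825},
    "Vancouver": {"Mobile": 400, "Modem": 512},
}


def pivot(table):
    """Rotate the axes so that column keys become row keys and vice versa."""
    rotated = {}
    for row_key, row in table.items():
        for col_key, value in row.items():
            rotated.setdefault(col_key, {})[row_key] = value
    return rotated


rotated_view = pivot(view)
print(rotated_view)
```

No value changes during a pivot; only the presentation of the same cells is different.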
OLAP vs OLTP
S.No. | Data Warehouse (OLAP) | Operational Database (OLTP)
1 | Involves historical processing of information. | Involves day-to-day processing.
2 | OLAP systems are used by knowledge workers such as executives, managers and analysts. | OLTP systems are used by clerks, DBAs, or database professionals.
3 | Useful in analyzing the business. | Useful in running the business.
4 | It focuses on Information out. | It focuses on Data in.
5 | Based on Star Schema, Snowflake Schema and Fact Constellation Schema. | Based on Entity Relationship Model.
6 | Contains historical data. | Contains current data.
7 | Provides summarized and consolidated data. | Provides primitive and highly detailed data.
8 | Provides summarized and multidimensional view of data. | Provides detailed and flat relational view of data.
9 | Number of users is in hundreds. | Number of users is in thousands.
10 | Number of records accessed is in millions. | Number of records accessed is in tens.
11 | Database size is from 100 GB to 1 TB. | Database size is from 100 MB to 1 GB.
12 | Highly flexible. | Provides high performance.
