Introduction
Data is unprocessed facts and figures without any added interpretation or analysis; that is, data is
raw, unorganized facts that need to be processed. Data can seem simple, random, and useless
until it is organized.
Information is data that has been interpreted so that it has meaning for the user; that is, when data is
processed, organized, structured, or presented in a given context so as to make it useful, it is
called information.
Knowledge is a combination of information, experience and insight that may benefit the
individual or the organization.
UNIT – I / 1
DBMS NOTES (R21) PBR VITS – CSE / AI / IOT
File system: A file system is the way in which files are named and where they are placed
logically for storage and retrieval. Every operating system has its own file system, in which
files are placed in a hierarchical (tree) structure. A file is placed in a directory (a folder
in Windows) or subdirectory at the desired place in the tree structure. In a file system, data is
kept as a collection of individual files accessed by application programs.
The types of DBMS are entirely dependent upon how the database is structured by that particular
DBMS.
1. Hierarchical DBMS
2. Network DBMS
3. Relational DBMS
4. Object-oriented DBMS
Advantages of DBMS
Due to its centralized nature, a database system can overcome the disadvantages of a
file-based system:
1. Data independence: Application programs should not be exposed to the details of data
representation and storage. The DBMS provides an abstract view that hides these details.
2. Efficient data access: The DBMS uses a variety of sophisticated techniques to store and
retrieve data efficiently.
3. Data integrity and security: Because data is accessed through the DBMS, the DBMS can enforce
integrity constraints and access controls.
4. Data administration: When users share data, centralizing the administration of that data is an
important task. Experienced professionals can minimize data redundancy and perform fine-tuning
that reduces retrieval time.
5. Concurrent access and crash recovery: The DBMS schedules concurrent access to the data and
protects users from the effects of system failures.
6. Reduced application development time: The DBMS supports important functions that are
common to many applications.
Functions of DBMS
Data Definition: The DBMS provides functions to define the structure of the data in the
application.
Data Manipulation: Once the data structure is defined, data needs to be inserted, modified, or
deleted. The functions that perform these operations are part of the DBMS.
Data Security & Integrity: The DBMS contains modules which handle the security and
integrity of data in the application.
Data Recovery and Concurrency: Recovery of the data after system failure and concurrent
access of records by multiple users is also handled by DBMS.
Data Dictionary Maintenance: Maintaining the data dictionary which contains the data
definition of the application is also one of the functions of DBMS.
Performance: Optimizing the performance of queries is one of the important functions of the
DBMS.
The Components of a Database
The basic structure for storing data in a database is a table. The table is the whole
collection of information. In a table, data is entered in rows. Each row is known as a record. A
record includes all of the pieces of information related to one individual entry in your database.
In a record, the name for each category or piece of information that makes up the
record is known as a field. The actual information you type into each cell is called the data.
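The terms table, record, and field can be seen in a small runnable sketch. It uses Python's built-in sqlite3 module with an in-memory database; the student table and its values are hypothetical and chosen only for illustration.

```python
import sqlite3

# In-memory database; the table and its contents are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (id INTEGER, name TEXT, city TEXT)")
conn.execute("INSERT INTO student VALUES (1, 'Asha', 'Nellore')")
conn.execute("INSERT INTO student VALUES (2, 'Ravi', 'Kavali')")

cursor = conn.execute("SELECT * FROM student ORDER BY id")
fields = [column[0] for column in cursor.description]  # field (column) names
records = cursor.fetchall()                            # one tuple per record (row)

print(fields)      # the fields of the table
print(records[0])  # the first record; each value in it is a piece of data
```

Here the table is student, each tuple in records is one record, and the names in fields label the individual values.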
Difference between DBMS and File-processing system:
1.1 Database-System Applications
Databases are widely used. Here are some representative applications:
• Enterprise Information
◦ Sales: For customer, product, and purchase information.
◦ Accounting: For payments, receipts, account balances, assets and other accounting
information.
◦ Human resources: For information about employees, salaries, payroll taxes, and
benefits, and for generation of paychecks.
◦ Manufacturing: For management of the supply chain and for tracking production of
items in factories, inventories of items in warehouses and stores, and orders for items.
◦ Online retailers: For sales data noted above plus online order tracking, generation of
recommendation lists, and maintenance of online product evaluations.
• Banking and Finance
◦ Banking: For customer information, accounts, loans, and banking transactions.
◦ Credit card transactions: For purchases on credit cards and generation of monthly
statements.
◦ Finance: For storing information about holdings, sales, and purchases of financial
instruments such as stocks and bonds; also for storing real-time market data to enable
online trading by customers and automated trading by the firm.
• Universities: For student information, course registrations, and grades.
• Airlines: For reservations and schedule information. Airlines were among the first to use
databases in a geographically distributed manner.
• Telecommunication: For keeping records of calls made, generating monthly bills, maintaining
balances on prepaid calling cards, and storing information about the communication
networks.
System programmers wrote these application programs to meet the needs of the university. New
application programs are added to the system as the need arises. For example, when the university
introduces a new major, it creates a new department and creates new permanent files (or adds
information to existing files) to record information about all the instructors in the department,
students in that major, course offerings, degree requirements, etc. The university may have to
write new application programs to deal with rules specific to the new major. New application
programs may also have to be written to handle new rules in the university. Thus, as time goes
by, the system acquires more files and more application programs.
This typical file-processing system is supported by a conventional operating system. The
system stores permanent records in various files, and it needs different application programs to
extract records from, and add records to, the appropriate files. Before database management
systems (DBMSs) were introduced, organizations usually stored information in such systems.
Keeping organizational information in a file-processing system has a number of major
disadvantages:
• Data redundancy and inconsistency. Since different programmers create the files and
application programs over a long period, the various files are likely to have different structures
and the programs may be written in several programming languages. Moreover, the same
information may be duplicated in several places (files). This redundancy leads to higher storage
and access cost. In addition, it may lead to data inconsistency; that is, the various copies of the
same data may no longer agree.
• Difficulty in accessing data. The conventional file-processing environments do not allow
needed data to be retrieved in a convenient and efficient manner. More responsive data-retrieval
systems are required for general use.
• Data isolation. Because data are scattered in various files, and files may be in different
formats, writing new application programs to retrieve the appropriate data is difficult.
• Integrity problems. The data values stored in the database must satisfy certain types of
consistency constraints.
• Atomicity problems. A computer system, like any other device, is subject to failure. In many
applications, it is crucial that, if a failure occurs, the data be restored to the consistent state that
existed prior to the failure. The activities of a transaction must be atomic: they must happen in
their entirety or not at all. It is difficult to ensure atomicity in a conventional file-processing system.
• Concurrent-access anomalies. For the sake of overall performance of the system and faster
response, many systems allow multiple users to update the data simultaneously. Indeed, today,
the largest Internet retailers may have millions of accesses per day to their data by shoppers. In
such an environment, interaction of concurrent updates is possible and may result in inconsistent
data.
• Security problems. Not every user of the database system should be able to access all the data.
Because application programs are added to the file-processing system in an ad hoc manner, enforcing
such security constraints is difficult.
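The redundancy and inconsistency problem from the list above can be imitated in a few lines of Python. This is only a sketch: the two dictionaries stand in for files maintained by separate application programs, and all names and values are hypothetical.

```python
# Two hypothetical "files" kept by different application programs.
registration_file = {"S101": {"name": "Asha", "address": "12 Lake Road"}}
library_file = {"S101": {"name": "Asha", "address": "12 Lake Road"}}

def update_address(file_records, student_id, new_address):
    """Update the address in ONE file only, as an isolated program would."""
    file_records[student_id]["address"] = new_address

def inconsistent_ids(file_a, file_b):
    """Return the ids whose address differs between the two files."""
    return [sid for sid in file_a
            if sid in file_b and file_a[sid]["address"] != file_b[sid]["address"]]

update_address(registration_file, "S101", "7 Hill Street")
print(inconsistent_ids(registration_file, library_file))
```

Because the address is stored twice and only one copy is updated, the two copies no longer agree, which is exactly the inconsistency described above.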
• Logical level. The next-higher level of abstraction describes what data are stored in the
database, and what relationships exist among those data. The logical level thus describes the
entire database in terms of a small number of relatively simple structures. Although
implementation of the simple structures at the logical level may involve complex physical-level
structures, the user of the logical level does not need to be aware of this complexity. This is
referred to as physical data independence. Database administrators, who must decide what
information to keep in the database, use the logical level of abstraction. The logical level is
also called the conceptual level.
• View level. The highest level of abstraction describes only part of the entire database. Even
though the logical level uses simpler structures, complexity remains because of the variety of
information stored in a large database. Many users of the database system do not need all this
information; instead, they need to access only a part of the database. The view level of
abstraction exists to simplify their interaction with the system. The system may provide many
views for the same database.
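A view in SQL is one concrete way to realize the view level. The sketch below assumes a hypothetical instructor table and a view named faculty that hides the salary column; it uses Python's sqlite3 module only as a convenient way to run the SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE instructor (id INTEGER PRIMARY KEY, name TEXT, dept_name TEXT, salary REAL)")
conn.execute("INSERT INTO instructor VALUES (1, 'Srinivasan', 'Comp. Sci.', 65000)")

# The view exposes only part of the database: salary is not visible through it.
conn.execute("CREATE VIEW faculty AS SELECT id, name, dept_name FROM instructor")

cursor = conn.execute("SELECT * FROM faculty")
view_columns = [column[0] for column in cursor.description]
view_rows = cursor.fetchall()
print(view_columns, view_rows)
```

A user who is given access only to faculty works with a simpler picture of the data and never sees salaries.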
Logical Data Independence : Logical data independence indicates that the conceptual
schema can be changed without affecting the existing external schemas. The change would be
absorbed by the mapping between the external and conceptual levels. Logical data independence
also insulates application programs from operations such as combining two records into one or
splitting an existing record into two or more records. This would require a change in the
Logical data independence is more difficult to achieve than physical data independence, as
it requires flexibility in the design of the database.
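As a rough illustration of logical data independence, the sketch below splits one record type into two while the application's query stays the same. It assumes the application reads through a view (student_info, a hypothetical name) that plays the role of the external schema; only the view definition, that is, the mapping, is rewritten.

```python
import sqlite3

# The query issued by the application (external level); it never changes.
APP_QUERY = "SELECT id, name, city FROM student_info ORDER BY id"

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT, city TEXT);
    INSERT INTO student VALUES (1, 'Asha', 'Nellore'), (2, 'Ravi', 'Kavali');
    CREATE VIEW student_info AS SELECT id, name, city FROM student;
""")
before = conn.execute(APP_QUERY).fetchall()

# Change the conceptual schema: split each student record into two records.
conn.executescript("""
    DROP VIEW student_info;
    CREATE TABLE student_name (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE student_city (id INTEGER PRIMARY KEY, city TEXT);
    INSERT INTO student_name SELECT id, name FROM student;
    INSERT INTO student_city SELECT id, city FROM student;
    DROP TABLE student;
    -- Only the external/conceptual mapping (the view definition) changes.
    CREATE VIEW student_info AS
        SELECT n.id AS id, n.name AS name, c.city AS city
        FROM student_name n JOIN student_city c ON n.id = c.id;
""")
after = conn.execute(APP_QUERY).fetchall()
print(before == after)
```

The application program sees the same rows before and after the split, because the change is absorbed by the mapping.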
A database schema is the skeleton structure of the database and represents the logical view
of the entire database; that is, the overall design of the database is called the database schema.
It describes how the data is organized and how the relations among the data are associated. It
formulates all the constraints that are to be applied to the data in the relations that reside in
the database.
A database schema defines its entities and the relationships among them. It is a descriptive
detail of the database. The database designer prepares the schema to help programmers
understand all aspects of the database.
A database schema does not contain any data or information. Database schemas can be
divided broadly into two categories:
Physical Database Schema: This schema pertains to the actual storage of data and its form of
storage, such as files and indices. It defines how the data will be stored in secondary storage. It
describes the database design at the physical level.
Logical Database Schema: This schema defines all the logical constraints that need to be applied
to the stored data. It defines tables, views, integrity constraints, etc. It describes the database
design at the logical level.
A database may also have several schemas at the view level, sometimes called sub schemas, that
describe different views of the database.
Database Instance
The collection of information stored in the database at a particular moment is called an instance.
A database instance is the state of the operational database, with its data, at a given time; it is a
snapshot of the database. Database instances tend to change with time. The DBMS ensures that
every instance (state) is a valid state by enforcing all the validations, constraints, and conditions
that the database designers have imposed or that are expected of the DBMS itself.
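The difference between schema and instance can be observed directly. In this sketch (SQLite-specific: the sqlite_master catalog is where SQLite keeps table definitions), the schema of a hypothetical department table stays the same while its instance changes after an insertion.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE department (dept_name TEXT PRIMARY KEY, building TEXT, budget REAL)")

def schema(connection):
    """The stored definition of the table: part of the database schema."""
    return connection.execute(
        "SELECT sql FROM sqlite_master WHERE name = 'department'").fetchone()[0]

def instance(connection):
    """The rows present right now: a snapshot (instance) of the table."""
    return connection.execute("SELECT * FROM department ORDER BY dept_name").fetchall()

schema_before, instance_before = schema(conn), instance(conn)
conn.execute("INSERT INTO department VALUES ('Physics', 'Watson', 70000)")
schema_after, instance_after = schema(conn), instance(conn)

print(schema_before == schema_after)    # the schema did not change
print(instance_before, instance_after)  # the instance did
```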
fixed-format records of several types. Each table contains records of a particular type. Each
record type defines a fixed number of fields, or attributes. The columns of the table correspond to
the attributes of the record type. The relational data model is the most widely used data model,
and a vast majority of current database systems are based on the relational model.
• Object-Based Data Model. Object-oriented programming (especially in Java, C++, or C#) has
become the dominant software-development methodology. This led to the development of an
object-oriented data model that can be seen as extending the E-R model with notions of
encapsulation, methods (functions), and object identity. The object-relational data model
combines features of the object-oriented data model and relational data model.
• Semistructured Data Model. The semistructured data model permits the specification of data
where individual data items of the same type may have different sets of attributes. This is in
contrast to the data models mentioned earlier, where every data item of a particular type must
have the same set of attributes. The Extensible Markup Language (XML) is widely used to
represent semistructured data.
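To make the semistructured model concrete, the sketch below parses a small, made-up XML fragment in which two items of the same type (person) carry different sets of attributes. It uses Python's standard xml.etree.ElementTree module; the element names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Two <person> items of the same type with different sets of attributes.
xml_text = """
<people>
  <person><name>Asha</name><email>asha@example.com</email></person>
  <person><name>Ravi</name><phone>555-0100</phone><city>Kavali</city></person>
</people>
""".strip()

root = ET.fromstring(xml_text)
attribute_sets = [sorted(child.tag for child in person)
                  for person in root.findall("person")]
print(attribute_sets)
```

In a relational table both persons would need the same columns; here each item simply lists the attributes it has.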
The network data model and the hierarchical data model preceded the relational data
model.
• Insertion of new information into the database
• Deletion of information from the database
• Modification of information stored in the database
There are basically two types of DML:
• Procedural DMLs require a user to specify what data are needed and how to get those data.
• Declarative DMLs (also referred to as nonprocedural DMLs) require a user to specify what
data are needed without specifying how to get those data.
Declarative DMLs are usually easier to learn and use than are procedural DMLs.
A query is a statement requesting the retrieval of information. The portion of a DML that
involves information retrieval is called a query language.
There are a number of database query languages in use, either commercially or
experimentally. The most widely used query language is SQL (Structured Query Language).
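The contrast between the two styles can be sketched side by side. The procedural part below is plain Python that spells out how to find the rows; the declarative part is an SQL query run through Python's sqlite3 module. The instructor data is hypothetical.

```python
import sqlite3

rows = [("Srinivasan", "Comp. Sci.", 65000), ("Wu", "Finance", 90000),
        ("Mozart", "Music", 40000)]

# Procedural style: we spell out HOW to get the data (scan, test, collect, sort).
procedural_result = []
for name, dept_name, salary in rows:
    if salary > 50000:
        procedural_result.append(name)
procedural_result.sort()

# Declarative style: we state WHAT we want and let the DBMS decide how.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE instructor (name TEXT, dept_name TEXT, salary REAL)")
conn.executemany("INSERT INTO instructor VALUES (?, ?, ?)", rows)
declarative_result = [row[0] for row in conn.execute(
    "SELECT name FROM instructor WHERE salary > 50000 ORDER BY name")]

print(procedural_result == declarative_result)
```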
1.4.2 Data-Definition Language
A database schema is specified by a set of definitions expressed in a special language
called a data-definition language (DDL). The DDL is also used to specify additional properties
of the data.
We specify the storage structure and access methods used by the database system by a set
of statements in a special type of DDL called a data storage and definition language. These
statements define the implementation details of the database schemas, which are usually hidden
from the users.
The data values stored in the database must satisfy certain consistency constraints.
The DDL provides facilities to specify such constraints. The database system checks these
constraints every time the database is updated.
• Domain Constraints. A domain of possible values must be associated with every attribute (for
example, integer types, character types, date/time types). Domain constraints are the most
elementary form of integrity constraint. They are tested easily by the system whenever a new
data item is entered into the database.
• Referential Integrity. There are cases where we wish to ensure that a value that appears in one
relation for a given set of attributes also appears in a certain set of attributes in another relation
(referential integrity). Database modifications can cause violations of referential integrity. When
a referential-integrity constraint is violated, the normal procedure is to reject the action that
caused the violation.
• Assertions. An assertion is any condition that the database must always satisfy. Domain
constraints and referential-integrity constraints are special forms of assertions. When an assertion
is created, the system tests it for validity. If the assertion is valid, then any future modification to
the database is allowed only if it does not cause that assertion to be violated.
• Authorization. We may want to differentiate among the users as far as the type of access they
are permitted on various data values in the database. These differentiations are expressed in
terms of authorization, the most common being: read authorization, which allows reading, but
not modification, of data; insert authorization, which allows insertion of new data, but not
modification of existing data; update authorization, which allows modification, but not
deletion, of data; and delete authorization, which allows deletion of data. We may assign the
user all, none, or a combination of these types of authorization.
The output of the DDL is placed in the data dictionary, which contains metadata—that is, data
about data.
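Two of the constraint kinds described above can be tried out in SQLite through Python's sqlite3 module. This is a sketch under stated assumptions: the tables are hypothetical, a CHECK clause stands in for a restriction on the domain of budget, and SQLite enforces foreign keys only after PRAGMA foreign_keys = ON. Assertions and authorization are not shown because SQLite does not provide them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.execute(
    "CREATE TABLE department (dept_name TEXT PRIMARY KEY, budget REAL CHECK (budget >= 0))")
conn.execute("""CREATE TABLE instructor (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    dept_name TEXT REFERENCES department(dept_name))""")
conn.execute("INSERT INTO department VALUES ('Physics', 70000)")

def try_statement(sql):
    """Run one statement; report whether a constraint rejected it."""
    try:
        conn.execute(sql)
        return "ok"
    except sqlite3.IntegrityError:
        return "rejected"

results = [
    try_statement("INSERT INTO instructor VALUES (1, 'Einstein', 'Physics')"),
    try_statement("INSERT INTO instructor VALUES (2, 'Gold', 'Geology')"),  # no such department
    try_statement("INSERT INTO department VALUES ('Music', -5)"),           # violates CHECK
]
print(results)
```

The second and third statements are rejected, which matches the normal procedure of refusing the action that would cause a violation.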
Syntax:
insert into <table_name> (column list) values (column values);
1.5.3 Data Definition Language
SQL provides a rich DDL that allows one to define tables, integrity constraints, assertions, etc.
For instance, the following SQL DDL statement defines the department table:
create table department
(dept_name char (20),
building char (15),
budget numeric (12,2));
Execution of the above DDL statement creates the department table with three columns:
dept_name, building, and budget, each of which has a specific data type associated with it. The DDL
statement also updates the data dictionary, which contains metadata. The schema of a table is an
example of metadata.
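The effect of the DDL statement on the metadata can be inspected in SQLite, used here through Python's sqlite3 module. Note that PRAGMA table_info is SQLite-specific; other database systems expose their data dictionary in other ways, so this is only one possible illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""create table department
    (dept_name char(20),
     building  char(15),
     budget    numeric(12,2))""")

# Each row of table_info describes one column: (cid, name, type, notnull, default, pk).
metadata = conn.execute("PRAGMA table_info(department)").fetchall()
column_names = [row[1] for row in metadata]
for row in metadata:
    print(row[1], row[2])  # column name and its declared type
```

The query returns data about the table rather than data in the table, which is what is meant by metadata.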
1.5.4 Database Access from Application Programs
SQL is not as powerful as a universal Turing machine; that is, there are some computations that
are possible using a general-purpose programming language but are not possible using SQL.
SQL also does not support actions such as input from users, output to displays, or
communication over the network. Such computations and actions must be written in a host
language, such as C, C++, or Java, with embedded SQL queries that access the data in the
database. Application programs are programs that are used to interact with the database in this
fashion. To access the database, DML statements need to be executed from the host language.
There are two ways to do this:
• By providing an application program interface (set of procedures) that can be used to send
DML and DDL statements to the database and retrieve the results.
The Open Database Connectivity (ODBC) standard for use with the C language is a
commonly used application program interface standard. The Java Database Connectivity
(JDBC) standard provides corresponding features to the Java language.
• By extending the host language syntax to embed DML calls within the host language
program. Usually, a special character prefaces DML calls, and a preprocessor, called the
DML pre-compiler, converts the DML statements to normal procedure calls in the host
language.
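The first approach, an application program interface, can be sketched in Python. The sqlite3 module used here follows Python's DB-API and is offered only as an analogy to ODBC and JDBC, not as either of those standards; the function instructors_in and the table contents are hypothetical.

```python
import sqlite3

def instructors_in(conn, dept_name):
    """Send a DML query through the API and return the matching instructor names."""
    cursor = conn.execute(
        "SELECT name FROM instructor WHERE dept_name = ? ORDER BY name", (dept_name,))
    return [row[0] for row in cursor.fetchall()]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE instructor (id INTEGER, name TEXT, dept_name TEXT, salary REAL)")
conn.executemany(
    "INSERT INTO instructor VALUES (?, ?, ?, ?)",
    [(1, "Einstein", "Physics", 95000), (2, "Gold", "Physics", 87000),
     (3, "Wu", "Finance", 90000)])

physics = instructors_in(conn, "Physics")
print(physics)
```

The host language handles input, output, and control flow, while the SQL text travels to the database through procedure calls. The ? placeholder passes the department name as a parameter instead of pasting it into the SQL string.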
The logical model concentrates on the data requirements and the data to be stored independent
of physical considerations. It does not concern itself with how the data will be stored or where it
will be stored physically.
The physical data design model involves translating the logical design of the database onto
physical media using hardware resources and software systems such as database management
systems (DBMS).
organization. The description that arises from this design phase serves as the basis for specifying
the conceptual structure of the database. Here are the major characteristics of the university.
• The university is organized into departments. Each department is identified by a unique
name (dept_name), is located in a particular building, and has a budget.
• Each department has a list of courses it offers. Each course has associated with it a
course_id, title, dept_name, and credits, and may also have associated prerequisites.
• Instructors are identified by their unique ID. Each instructor has a name, an associated
department (dept_name), and a salary.
• Students are identified by their unique ID. Each student has a name, an associated major
department (dept_name), and tot_credits.
• The university maintains a list of classrooms, specifying the name of the building, room
number, and room capacity.
• The university maintains a list of all classes (sections) taught. Each section is identified
by a course_id, sec_id, year, and semester, and has associated with it a semester, year,
building, room number, and time_slot_id (the time slot when the class meets).
• The department has a list of teaching assignments specifying, for each instructor, the
sections the instructor is teaching.
• The university has a list of all student course registrations, specifying, for each student,
the courses and the associated sections that the student has taken.
1.6.2 The Entity-Relationship Model
The entity-relationship (E-R) data model uses a collection of basic objects, called
entities, and relationships among these objects. An entity is a “thing” or “object” in the real
world that is distinguishable from other objects. For example, each person is an entity, and bank
accounts can be considered as entities.
Entities are described in a database by a set of attributes. For example, the attributes
dept_name, building, and budget may describe one particular department in a university, and they
form attributes of the department entity set. Similarly, attributes ID, name, and salary may
describe an instructor entity. The extra attribute ID is used to identify an instructor uniquely. A
unique instructor identifier must be assigned to each instructor.
A relationship is an association among several entities. For example, a member
relationship associates an instructor with his/her department. The set of all entities of the same
type and the set of all relationships of the same type are termed an entity set and relationship
set.
The overall logical structure (schema) of a database can be expressed graphically by an
entity-relationship (E-R) diagram. There are several ways in which to draw these diagrams. One
of the most popular is to use the Unified Modeling Language (UML).
Entity sets are represented by a rectangular box with the entity set name in the header and
the attributes listed below it.
Relationship sets are represented by a diamond connecting a pair of related entity sets.
The name of the relationship is placed inside the diamond.
The E-R diagram indicates that there are two entity sets, instructor and department, with
attributes as outlined earlier. The diagram also shows a relationship member between instructor
and department. In addition to entities and relationships, the E-R model represents certain
constraints to which the contents of a database must conform. One important constraint is
mapping cardinalities, which express the number of entities to which another entity can be
associated via a relationship set.
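One possible way to carry the instructor, department, and member example into tables is sketched below, using SQLite through Python's sqlite3 module. The table layout and sample rows are assumptions made for illustration; the point is that a key on the relationship table can express a mapping cardinality (here, each instructor belongs to at most one department, while a department may have many instructors).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE department (dept_name TEXT PRIMARY KEY, building TEXT, budget REAL);
    CREATE TABLE instructor (id TEXT PRIMARY KEY, name TEXT, salary REAL);
    -- Relationship set 'member': the primary key on instructor_id allows each
    -- instructor at most one department (many-to-one cardinality).
    CREATE TABLE member (
        instructor_id TEXT PRIMARY KEY REFERENCES instructor(id),
        dept_name TEXT NOT NULL REFERENCES department(dept_name));
    INSERT INTO department VALUES ('Physics', 'Watson', 70000);
    INSERT INTO instructor VALUES ('22222', 'Einstein', 95000), ('33456', 'Gold', 87000);
    INSERT INTO member VALUES ('22222', 'Physics'), ('33456', 'Physics');
""")

per_department = conn.execute(
    "SELECT dept_name, COUNT(*) FROM member GROUP BY dept_name").fetchall()

# A second membership for the same instructor violates the cardinality constraint.
try:
    conn.execute("INSERT INTO member VALUES ('22222', 'Physics')")
    second_membership = "accepted"
except sqlite3.IntegrityError:
    second_membership = "rejected"

print(per_department, second_membership)
```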
1.6.3 Normalization
Another method for designing a relational database is to use a process commonly known as
normalization. The goal is to generate a set of relation schemas that allows us to store
information without unnecessary redundancy. The approach is to design schemas that are in an
appropriate normal form. The most common approach is to use functional dependencies.
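A small decomposition shows what removing unnecessary redundancy means in practice. The sketch assumes a hypothetical table inst_dept in which the building of a department is repeated for every instructor, and assumes the functional dependency dept_name → building holds. It is run in SQLite through Python's sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Redundant design: the building is repeated for every Physics instructor.
    CREATE TABLE inst_dept (id TEXT, name TEXT, dept_name TEXT, building TEXT);
    INSERT INTO inst_dept VALUES
        ('22222', 'Einstein', 'Physics', 'Watson'),
        ('33456', 'Gold',     'Physics', 'Watson'),
        ('12121', 'Wu',       'Finance', 'Painter');
    -- Decomposition guided by dept_name -> building.
    CREATE TABLE instructor AS SELECT id, name, dept_name FROM inst_dept;
    CREATE TABLE department AS SELECT DISTINCT dept_name, building FROM inst_dept;
""")

original = conn.execute("SELECT * FROM inst_dept ORDER BY id").fetchall()
rejoined = conn.execute("""
    SELECT i.id, i.name, i.dept_name, d.building
    FROM instructor i JOIN department d ON i.dept_name = d.dept_name
    ORDER BY i.id""").fetchall()
department_rows = conn.execute("SELECT COUNT(*) FROM department").fetchone()[0]

print(original == rejoined, department_rows)
```

Each building is now stored once per department, and joining the two tables gives back the original information.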
1.7 Data Storage and Querying
A database system is partitioned into modules that deal with each of the responsibilities
of the overall system. The functional components of a database system can be broadly divided
into the storage manager and the query processor components.
The storage manager is important because databases typically require a large amount of
storage space. Corporate databases range in size from hundreds of gigabytes to, for the largest
databases, terabytes of data. Since the main memory of computers cannot store this much
information, the information is stored on disks. Data are moved between disk storage and main
memory as needed. Since the movement of data to and from disk is slow relative to the speed of
the central processing unit, it is imperative that the database system structure the data so as to
minimize the need to move data between disk and main memory.
The query processor is important because it helps the database system to simplify and
facilitate access to data. The query processor allows database users to obtain good performance
while being able to work at the view level and not be burdened with understanding the physical-
level details of the implementation of the system. It is the job of the database system to translate
updates and queries written in a nonprocedural language, at the logical level, into an efficient
sequence of operations at the physical level.
1.7.1 Storage Manager
The storage manager is the component of a database system that provides the interface
between the low-level data stored in the database and the application programs and queries
submitted to the system. The storage manager is responsible for the interaction with the file
manager. The raw data are stored on the disk using the file system provided by the operating
system. The storage manager translates the various DML statements into low-level file-system
commands.
Thus, the storage manager is responsible for storing, retrieving, and updating data in the
database.
The storage manager components include:
• Authorization and integrity manager, which tests for the satisfaction of integrity constraints
and checks the authority of users to access data.
• Transaction manager, which ensures that the database remains in a consistent (correct) state
despite system failures, and that concurrent transaction executions proceed without conflicting.
• File manager, which manages the allocation of space on disk storage and the data structures
used to represent information stored on disk.
• Buffer manager, which is responsible for fetching data from disk storage into main memory,
and deciding what data to cache in main memory. The buffer manager is a critical part of the
database system, since it enables the database to handle data sizes that are much larger than the
size of main memory.
The storage manager implements several data structures as part of the physical system
implementation:
• Data files, which store the database itself.
• Data dictionary, which stores metadata about the structure of the database, in particular the
schema of the database.
• Indices, which can provide fast access to data items. Like the index in this textbook, a database
index provides pointers to those data items that hold a particular value.
1.7.2 The Query Processor
The query processor components include:
• DDL interpreter, which interprets DDL statements and records the definitions in the data
dictionary.
• DML compiler, which translates DML statements in a query language into an evaluation plan
consisting of low-level instructions that the query evaluation engine understands. The DML
compiler also performs query optimization; that is, it picks the lowest cost evaluation plan from
among the alternatives.
• Query evaluation engine, which executes low-level instructions generated by the DML
compiler.
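The choice of an evaluation plan can be observed in SQLite, whose EXPLAIN QUERY PLAN statement reports the plan chosen for a query. This is a SQLite-specific sketch run through Python's sqlite3 module; the table, the index name idx_instructor_name, and the data are hypothetical, and the exact wording of the plan text varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE instructor (id INTEGER, name TEXT, salary REAL)")
conn.executemany("INSERT INTO instructor VALUES (?, ?, ?)",
                 [(i, f"name{i}", 50000 + i) for i in range(1000)])

def plan(sql):
    """Return the plan description SQLite reports for a query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " ".join(row[-1] for row in rows)  # the last column holds the description

QUERY = "SELECT salary FROM instructor WHERE name = 'name500'"

plan_without_index = plan(QUERY)  # no index yet: the whole table is scanned
conn.execute("CREATE INDEX idx_instructor_name ON instructor(name)")
plan_with_index = plan(QUERY)     # the plan now searches via the index

print(plan_without_index)
print(plan_with_index)
```

The query text is identical in both cases; only the plan picked by the system changes once a cheaper access path exists.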
Fig. 1.4: ACID Properties
Ensuring the atomicity and durability properties is the responsibility of the database system itself,
specifically of the recovery manager. In the absence of failures, all transactions complete
successfully, and atomicity is achieved easily. However, because of various types of failure, a
transaction may not always complete its execution successfully. Thus, the database must be
restored to the state in which it was before the transaction in question started executing. The
database system must therefore perform failure recovery, that is, detect system failures and
restore the database to the state that existed prior to the occurrence of the failure.
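Atomicity can be demonstrated with a funds transfer. The sketch below uses Python's sqlite3 module; the account table, the balances, and the transfer function are hypothetical, and a CHECK constraint is used to make the second step of a transfer fail.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()

def transfer(amount):
    """Move amount from A to B as one transaction; undo everything on failure."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute("UPDATE account SET balance = balance + ? WHERE name = 'B'", (amount,))
            conn.execute("UPDATE account SET balance = balance - ? WHERE name = 'A'", (amount,))
        return True
    except sqlite3.IntegrityError:
        return False

def balances():
    return dict(conn.execute("SELECT name, balance FROM account"))

ok = transfer(30)
after_ok = balances()
failed = transfer(500)  # would make A negative, so the credit to B is undone too
after_failed = balances()
print(ok, after_ok, failed, after_failed)
```

In the failed transfer the credit to B had already been applied when the debit from A was rejected; the rollback restores the state that existed before the transaction started.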
Finally, when several transactions update the database concurrently, the consistency of
data may no longer be preserved, even though each individual transaction is correct. It is the
responsibility of the concurrency-control manager to control the interaction among the
concurrent transactions, to ensure the consistency of the database.
The transaction manager consists of the concurrency-control manager and the recovery
manager.
In a three-tier architecture, the client machine acts as merely a front end and does not
contain any direct database calls. Instead, the client end communicates with an application
server, usually through a forms interface. The application server in turn communicates with a
database system to access data. The business logic of the application, which says what actions to
carry out under what conditions, is embedded in the application server, instead of being
distributed across multiple clients. Three-tier applications are more appropriate for large
applications, and for applications that run on the World Wide Web.
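The division of work in a three-tier architecture can be outlined in a single Python file. This is only a structural sketch: in a real deployment the three tiers run on separate machines and communicate over a network, and the class ApplicationServer, the credit limit, and the takes table are all hypothetical.

```python
import sqlite3

class ApplicationServer:
    """Middle tier: holds the business logic and is the only tier that issues SQL."""
    MAX_CREDITS = 20  # hypothetical business rule

    def __init__(self):
        self._db = sqlite3.connect(":memory:")  # stands in for the database tier
        self._db.execute(
            "CREATE TABLE takes (student_id TEXT, course_id TEXT, credits INTEGER)")

    def register(self, student_id, course_id, credits):
        total = self._db.execute(
            "SELECT COALESCE(SUM(credits), 0) FROM takes WHERE student_id = ?",
            (student_id,)).fetchone()[0]
        if total + credits > self.MAX_CREDITS:
            return "rejected: credit limit exceeded"
        self._db.execute("INSERT INTO takes VALUES (?, ?, ?)",
                         (student_id, course_id, credits))
        return "registered"

def client_submit_form(server, student_id, course_id, credits):
    """Client tier: forwards form input and contains no database calls."""
    return server.register(student_id, course_id, credits)

server = ApplicationServer()
first = client_submit_form(server, "S101", "CS-101", 12)
second = client_submit_form(server, "S101", "CS-347", 12)
print(first, second)
```

The rule about the credit limit lives in one place, the application server, so changing it does not require touching any client.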
UNIT – I / 22
DBMS NOTES (R21) PBR VITS – CSE / AI / IOT
A database system is partitioned into modules that deal with each of the responsibilities
of the overall system. The functional components of a database system can be broadly divided
into the storage manager and the query processor components.
The storage manager is important because databases typically require a large amount of
storage space. Corporate databases range in size from hundreds of gigabytes to, for the largest
databases, terabytes of data. Since the main memory of computers cannot store this much
information, the information is stored on disks. Data are moved between disk storage and main
memory as needed. Since the movement of data to and from disk is slow relative to the speed of
the central processing unit, it is imperative that the database system structure the data so as to
minimize the need to move data between disk and main memory.
The query processor is important because it helps the database system to simplify and
facilitate access to data. The query processor allows database users to obtain good performance
while being able to work at the view level and not be burdened with understanding the physical-
level details of the implementation of the system. It is the job of the database system to translate
updates and queries written in a nonprocedural language, at the logical level, into an efficient
sequence of operations at the physical level.
1.9.1 Storage Manager
The storage manager is the component of a database system that provides the interface
between the low-level data stored in the database and the application programs and queries
submitted to the system. The storage manager is responsible for the interaction with the file
manager. The raw data are stored on the disk using the file system provided by the operating
system. The storage manager translates the various DML statements into low-level file-system
commands.
Thus, the storage manager is responsible for storing, retrieving, and updating data in the
database.
The storage manager components include:
• Authorization and integrity manager, which tests for the satisfaction of integrity constraints
and checks the authority of users to access data.
• Transaction manager, which ensures that the database remains in a consistent (correct) state
despite system failures, and that concurrent transaction executions proceed without conflicting.
• File manager, which manages the allocation of space on disk storage and the data structures
used to represent information stored on disk.
• Buffer manager, which is responsible for fetching data from disk storage into main memory,
and deciding what data to cache in main memory. The buffer manager is a critical part of the
database system, since it enables the database to handle data sizes that are much larger than the
size of main memory.
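The role of the buffer manager can be sketched with a toy example. The notes do not say which replacement policy is used, so the least-recently-used (LRU) policy below is an assumption, and the names BufferPool, fetch and disk_reads are hypothetical.

```python
from collections import OrderedDict

class BufferPool:
    """Toy buffer manager: keeps at most `capacity` disk blocks in main memory."""

    def __init__(self, disk, capacity):
        self.disk = disk              # a dict standing in for disk storage
        self.capacity = capacity
        self.frames = OrderedDict()   # block_id -> data, least recently used first
        self.disk_reads = 0

    def fetch(self, block_id):
        if block_id in self.frames:
            self.frames.move_to_end(block_id)   # hit: mark as most recently used
            return self.frames[block_id]
        self.disk_reads += 1                    # miss: the block must come from "disk"
        if len(self.frames) >= self.capacity:
            self.frames.popitem(last=False)     # evict the least recently used block
        self.frames[block_id] = self.disk[block_id]
        return self.frames[block_id]

disk = {i: f"block-{i}" for i in range(100)}    # 100 blocks on "disk"
pool = BufferPool(disk, capacity=3)             # room for only 3 in memory
for block_id in [1, 2, 3, 1, 4, 2]:
    pool.fetch(block_id)
print(pool.disk_reads, list(pool.frames))
```

Only three blocks ever reside in memory, yet all 100 blocks remain reachable. This is the sense in which the buffer manager lets the database handle data much larger than main memory.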
The storage manager implements several data structures as part of the physical system
implementation:
• Data files, which store the database itself.
• Data dictionary, which stores metadata about the structure of the database, in particular the
schema of the database.
• Indices, which can provide fast access to data items. Like the index in this textbook, a database
index provides pointers to those data items that hold a particular value.
1.9.2 The Query Processor
The query processor components include:
• DDL interpreter, which interprets DDL statements and records the definitions in the data
dictionary.
• DML compiler, which translates DML statements in a query language into an evaluation plan
consisting of low-level instructions that the query evaluation engine understands. The DML
compiler also performs query optimization; that is, it picks the lowest cost evaluation plan from
among the alternatives.
• Query evaluation engine, which executes low-level instructions generated by the DML
compiler.
1.10 Data Mining and Information Retrieval
The term data mining refers loosely to the process of semiautomatically analyzing large
databases to find useful patterns. Like knowledge discovery in artificial intelligence (also called
machine learning) or statistical analysis, data mining attempts to discover rules and patterns
from data. However, data mining differs from machine learning and statistics in that it deals with
large volumes of data, stored primarily on disk. That is, data mining deals with “knowledge
discovery in databases.”
Some types of knowledge discovered from a database can be represented by a set of
rules. Large companies have diverse sources of data that they need to use for making business
decisions. To execute queries efficiently on such diverse data, companies have built data
warehouses. Data warehouses gather data from multiple sources under a unified schema, at a
single site. Thus, they provide the user a single uniform interface to data.
Textual data, too, has grown explosively. Textual data is unstructured, unlike the rigidly
structured data in relational databases. Querying of unstructured textual data is referred to as
information retrieval.
Information retrieval systems have much in common with database systems—in
particular, the storage and retrieval of data on secondary storage.
Semantic Data Model
Functional Data Model
1.11.2 Semi-Structured Data Models: These data models were planned as an evolution of the
relational data model. A semi-structured model is a database model in which there is no
separation between the data and the schema. It allows the representation of data with a flexible
structure: in this data model, items of the same type can have different numbers of attributes,
and one item may contain items with different structures. Because the schema is carried along
with the data values, the two stay consistent with each other. Some characteristics of
semi-structured data models are:
One can change the schema easily.
It gives a flexible format to exchange data between different types of databases.
The data transfer format may be portable.
• Specialized users are sophisticated users who write specialized database applications that do
not fit into the traditional data-processing framework. Among these applications are computer-
aided design systems, knowledgebase and expert systems, systems that store data with complex
data types (for example, graphics data and audio data), and environment-modeling systems.
1.12.2 Database Administrator
A person who has such central control over the system is called a database administrator
(DBA). The functions of a DBA include:
• Schema definition. The DBA creates the original database schema by executing a set of data
definition statements in the DDL.
• Storage structure and access-method definition.
• Schema and physical-organization modification. The DBA carries out changes to the schema
and physical organization to reflect the changing needs of the organization, or to alter the
physical organization to improve performance.
• Granting of authorization for data access. By granting different types of authorization, the
database administrator can regulate which parts of the database various users can access. The
authorization information is kept in a special system structure that the database system consults
whenever someone attempts to access the data in the system.
• Routine maintenance. Examples of the database administrator’s routine maintenance activities
are:
◦ Periodically backing up the database, either onto tapes or onto remote servers, to
prevent loss of data in case of disasters such as flooding.
◦ Ensuring that enough free disk space is available for normal operations, and upgrading
disk space as required.
◦ Monitoring jobs running on the database and ensuring that performance is not degraded
by very expensive tasks submitted by some users.
Introduction to the Relational Model
A database is a collection of one or more ‘relations’, where each relation is a table with rows
and columns. The relational model is the primary data model for commercial data processing
applications. The major advantages of the relational model over the older data models are:
1. It is simple and elegant.
2. It offers a simple data representation.
3. Even complex queries can be expressed with ease.
The main construct for representing data in the relational model is a ‘relation’. A relation
consists of
1. Relation Schema.
2. Relation Instance.
1. Relation Schema: The relation schema describes the column heads for the table. The schema
specifies the relation’s name, the name of each field (column, attribute) and the ‘domain’ of each
field. A domain is referred to in a relation schema by the domain name and has a set of
associated values.
Example:
The following student information in a university database illustrates the parts of a relation schema.
Students (sid: string, name: string, login: string, age: integer, gross: real)
This says, for instance, that the field named ‘sid’ has a domain named ‘string’.
The set of values associated with domain ‘string’ is the set of all character strings.
2. Relation Instance: This is a table specifying the information. An instance of a relation is a set
of ‘tuples’, also called ‘records’, in which each tuple has the same number of fields as the
relation schema.
A relation instance can be thought of as a table in which each tuple is a row and all rows have the
same number of fields. The relation instance is often simply called a ‘relation’. Each relation is
defined to be a set of unique tuples or rows.
Example:
This example is an instance of the Students relation, which consists of 4 tuples and 5 fields. No two
rows are identical.
Degree: The number of fields is called the ‘degree’. This is also called the ‘arity’.
Cardinality: The cardinality of a relation instance is the number of tuples in it.
Example: In the above example, the degree of the relation is 5 and the cardinality is 4.
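Degree and cardinality can be checked with a short sketch. It models a relation instance as a Python set of tuples. The four rows are invented for illustration and are not the rows of the original table.

```python
# Students(sid, name, login, age, gross): hypothetical sample rows.
students = {
    ("S1", "Asha",  "asha@cs",  18, 3.4),
    ("S2", "Ravi",  "ravi@ee",  19, 3.2),
    ("S3", "Meena", "meena@cs", 18, 3.8),
    ("S4", "Kiran", "kiran@me", 20, 3.0),
}

degree = len(next(iter(students)))   # number of fields in any one tuple
cardinality = len(students)          # number of tuples in the instance

# A relation is a set, so adding an identical tuple changes nothing.
students.add(("S1", "Asha", "asha@cs", 18, 3.4))
cardinality_after_duplicate = len(students)

print(degree, cardinality, cardinality_after_duplicate)
```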
Relational database: It is a collection of relations with distinct relation names.
Relational database schema: It is the collection of schemas for the relations in the database.
Instance:
An instance of a relational database is a collection of relation instances, one per relation schema
in the database schema. Each relation instance must satisfy the domain constraints in its schema.
the value of the column ID, while each course is identified by the value of the
column course_id.
Thus, a row in the prereq table indicates that two courses are related in the
sense that one course is a prerequisite for the other. As another example, we
consider the table instructor, a row in the table can be thought of as representing
the relationship between a specified ID and the corresponding values for name,
dept_name, and salary values.
mathematically by an n-tuple of values, i.e., a tuple with n values, which
corresponds to a row in a table.
Thus, in the relational model the term relation is used to refer to a table, while
the term tuple is used to refer to a row. Similarly, the term attribute refers to a column of a
table.
1.14 Database Schema
A database schema is the skeleton structure that represents the logical view of the entire
database. It defines how the data is organized and how the relations among them are associated.
It formulates all the constraints that are to be applied on the data.
A database schema defines its entities and the relationship among them. It contains a descriptive
detail of the database, which can be depicted by means of schema diagrams. It’s the database
designers who design the schema to help programmers understand the database and make it
useful.
Database Instance
It is important that we distinguish these two terms individually. Database schema is the skeleton
of database. It is designed when the database doesn't exist at all. Once the database is
operational, it is very difficult to make any changes to it. A database schema does not contain
any data or information.
A database instance is a state of operational database with data at any given time. It contains a
snapshot of the database. Database instances tend to change with time. A DBMS ensures that
its every instance (state) is in a valid state, by diligently following all the validations, constraints,
and conditions that the database designers have imposed.
Schema: Defines the basic structure of the database, i.e., how the data will be stored in the database.
Instance: It is the set of information stored in the database at a particular time.
1.15 Keys
A key refers to an attribute or a set of attributes that helps us identify a row (or tuple) uniquely
in a table (or relation). A key is also used when we want to establish relationships between the
different tables of a relational database. The individual values present in a key are
commonly referred to as key values. The following 10 keys are important in DBMS:
1. Super key 2. Candidate key 3. Primary key 4. Alternate key 5. Foreign key
6. Partial key 7. Composite key 8. Unique key 9. Surrogate key 10. Secondary key
1. Super Key
A super key is a set of attributes that can identify each tuple uniquely in the given relation.
A super key is not restricted to have any specific number of attributes.
Thus, a super key may consist of any number of attributes.
<STUDENT> Table
Student_Number   Student_Name   Student_Phone   Subject_Number
1                Ruthwiz        6615927284      10
2                Sahasra        6583654865      20
3                Karthikeya     4647567463      10
<SUBJECT> Table
Subject_Number   Subject_Name   Subject_Instructor
10               DBMS           Korth
20               Algorithms     Cormen
30               Algorithms     Leiserson
<ENROLL> Table
Student_Number   Subject_Number
1                10
2                20
3                10
The Super Keys in <Student> table are :
{Student_Number}
{Student_Phone}
{Student_Number,Student_Name}
{Student_Number,Student_Phone}
{Student_Number,Subject_Number}
{Student_Phone,Student_Name}
{Student_Phone,Subject_Number}
{Student_Number,Student_Name,Student_Phone}
{Student_Number,Student_Phone,Subject_Number}
{Student_Number,Student_Name,Subject_Number}
{Student_Phone,Student_Name,Subject_Number}
The Super Keys in <Subject> table are :
{Subject_Number}
{Subject_Number,Subject_Name}
{Subject_Number,Subject_Instructor}
{Subject_Number,Subject_Name,Subject_Instructor}
{Subject_Name,Subject_Instructor}
NOTE : All the attributes in a super key are definitely sufficient to identify each tuple uniquely
in the given relation but all of them may not be necessary.
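The idea that a super key may carry unnecessary attributes can be explored with a small sketch. It tests which attribute subsets have no duplicate values in the three STUDENT rows shown earlier. The helper names are hypothetical. A key is really a property of the schema, not of one instance, so such a test can only rule subsets out.

```python
from itertools import combinations

ATTRS = ("Student_Number", "Student_Name", "Student_Phone", "Subject_Number")
STUDENT = [
    (1, "Ruthwiz", 6615927284, 10),
    (2, "Sahasra", 6583654865, 20),
    (3, "Karthikeya", 4647567463, 10),
]

def is_unique(attr_subset, rows=STUDENT, attrs=ATTRS):
    """True if no two rows agree on every attribute in attr_subset."""
    idx = [attrs.index(a) for a in attr_subset]
    projected = [tuple(row[i] for i in idx) for row in rows]
    return len(set(projected)) == len(projected)

def unique_subsets(attrs=ATTRS):
    """All non-empty attribute subsets that pass the uniqueness test."""
    return [set(c) for n in range(1, len(attrs) + 1)
            for c in combinations(attrs, n) if is_unique(c)]

def minimal(subsets):
    """Keep only the subsets that have no proper subset in the list."""
    return [s for s in subsets if not any(t < s for t in subsets)]

print(is_unique(("Subject_Number",)))   # value 10 repeats, so this cannot be a key
print(minimal(unique_subsets()))
```

In this small sample Student_Name also passes the test, but only because no two sample students happen to share a name. That is why it is not listed as a super key above.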
2. Candidate Key
A minimal super key is called a candidate key, i.e., a minimal set of attributes that can
identify each tuple uniquely in the given relation is called a candidate key.
3. Primary Key
A primary key, also called a primary keyword, is a key in a relational database that is
unique for each record. It is a unique identifier: the primary key constraint prevents both NULL
values and duplicate values in the key column(s) of a table. A table can have only one
primary key, although that key may consist of more than one column.
A primary key is a candidate key that the database designer selects while designing the
database.
Ex : SQL> create table student(Student_Number number(4) primary key,
Student_Name varchar2(20),
Student_Phone number(10),
Subject_Number number(10));
NOTE :
The value of primary key can never be NULL.
The value of a primary key should not normally be changed, since other tables may refer to it.
SQL does not forbid such an update, but it is avoided in practice.
4. Alternate Key
A candidate key other than the primary key is called an alternate key.
For example, STUD_NO and STUD_PHONE are both candidate keys for the relation
STUDENT; if STUD_NO is chosen as the primary key, STUD_PHONE becomes an alternate key.
It is also called a secondary key, i.e., all the candidate keys that are not chosen as the primary
key are called alternate keys.
5. Foreign Key
A foreign key is an attribute (or set of attributes) whose values in one table must also appear in
another table. The foreign key in the child table will generally reference a primary key in the
parent table. The referencing table is called the child table and the referenced table is called the
parent table.
In order to provide Referential Integrity, the conditions must exist
{Subject_Number} is the foreign key of the <Student> table and the primary key of the <Subject> table.
NOTE :
A foreign key references the primary key of the referenced table.
A foreign key can take only those values which are present in the primary key of the
referenced relation.
A foreign key can take the NULL value.
There is no restriction on a foreign key to be unique.
In fact, a foreign key is not unique most of the time.
The referenced relation may also be called the master table or primary table.
The referencing relation may also be called the foreign table.
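These foreign key rules can be tried out with Python's built-in sqlite3 module. This is only a sketch: SQLite syntax differs slightly from the Oracle syntax used in these notes, SQLite enforces foreign keys only after the PRAGMA shown, and the rows named NoSubject and Orphan are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite checks foreign keys only when enabled
conn.execute("CREATE TABLE subject (subject_number INTEGER PRIMARY KEY, subject_name TEXT)")
conn.execute("""CREATE TABLE student (
    student_number INTEGER PRIMARY KEY,
    student_name   TEXT,
    subject_number INTEGER REFERENCES subject(subject_number))""")

conn.execute("INSERT INTO subject VALUES (10, 'DBMS')")
conn.execute("INSERT INTO student VALUES (1, 'Ruthwiz', 10)")      # value exists in parent
conn.execute("INSERT INTO student VALUES (3, 'Karthikeya', 10)")   # repeated value is allowed
conn.execute("INSERT INTO student VALUES (4, 'NoSubject', NULL)")  # NULL is allowed

def try_insert(sql):
    """Run one INSERT and report whether the DBMS accepted it."""
    try:
        conn.execute(sql)
        return "ok"
    except sqlite3.IntegrityError:
        return "rejected"

# Subject 99 does not exist in the parent table, so this row is refused.
orphan_result = try_insert("INSERT INTO student VALUES (5, 'Orphan', 99)")
student_count = conn.execute("SELECT COUNT(*) FROM student").fetchone()[0]
print(orphan_result, student_count)
```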
6. Partial Key
A partial key is a key that cannot, by itself, identify all the records of the table uniquely.
However, a group of related tuples can be selected from the table using the partial key.
Ex : Consider the following schema-
Department ( Emp_no , Dependent_name , Relation )
Emp_no   Dependent_name   Relation
E1       Suman            Mother
E1       Ajay             Father
E2       Vijay            Father
E2       Ankush           Son
Here, using the partial key Emp_no, we cannot identify a tuple uniquely, but we can select a group
of tuples from the table.
7. Composite Key
If any single attribute of a table is not capable of being the key i.e it cannot identify a row
uniquely, then we combine two or more attributes to form a key. This is known as a composite
key.
8. Unique Key
The unique constraint is used to prevent duplicate values in a column of a table. Unlike a primary
key, this constraint can allow NULL values. A unique column can still be updated, provided the
new value does not duplicate an existing one.
create table student(Student_Number number(4) primary key,
Student_Name varchar2(20),
Student_Phone number(10) unique,
Subject_Number number(10));
9. Surrogate Key
Surrogate key is a key with the following properties-
It is unique for all the records of the table.
It is updatable.
It can not be NULL i.e. it must have some value.
Example-
Student_Phone of student, where every student owns a mobile phone.
1.17 Relational Query Languages
A query language is a language in which a user requests information from the database. These
languages are usually on a level higher than that of a standard programming language. Query
languages can be categorized as either procedural or nonprocedural. In a procedural language,
the user instructs the system to perform a sequence of operations on the database to compute the
desired result. In a nonprocedural language, the user describes the desired information without
giving a specific procedure for obtaining that information.
Query languages used in practice include elements of both the procedural and the
nonprocedural approaches. There are a number of “pure” query languages: The relational algebra
is procedural, whereas the tuple relational calculus and domain relational calculus are
nonprocedural. These query languages are terse and formal, lacking the “syntactic sugar” of
commercial languages, but they illustrate the fundamental techniques for extracting data from the
database. The relational algebra consists of a set of operations that take one or two relations as
input and produce a new relation as their result. The relational calculus uses predicate logic to
define the result desired without giving any specific algebraic procedure for obtaining that result.
from a relation
Notation : σp (r)
Where σ denotes the selection operator, p is the selection predicate and r stands for the relation.
p is a propositional logic formula which may use connectives like and, or, and not. Its terms may
use relational operators like =, ≠, ≥, <, >, ≤.
Ex 1 : To select those tuples of the loan relation where the branch is “Perryridge,”
Ans : σ branch-name = “Perryridge” (loan)
Ex 2 : Selects tuples from books where subject is 'database'.
σsubject = "database"(Books)
Ex 3 : Selects tuples from books where subject is 'database' and 'price' is 450.
σsubject = "database" and price = "450"(Books)
Ex 4 : Selects tuples from books where subject is 'database' and 'price' is
450 or those books published after 2010.
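A minimal sketch of the select operation follows. It treats a relation as a list of rows and the predicate p as a Python function. The Books rows and the titles T1 to T3 are invented for illustration.

```python
def select(relation, predicate):
    """sigma_p(r): keep exactly those tuples of r for which p is true."""
    return [row for row in relation if predicate(row)]

# Hypothetical Books relation.
books = [
    {"title": "T1", "subject": "database", "price": 450},
    {"title": "T2", "subject": "database", "price": 300},
    {"title": "T3", "subject": "networks", "price": 450},
]

# Like Ex 2: subject = "database"
ex2 = select(books, lambda b: b["subject"] == "database")
# Like Ex 3: subject = "database" and price = 450
ex3 = select(books, lambda b: b["subject"] == "database" and b["price"] == 450)
# The connective not: every book whose subject is not "database"
others = select(books, lambda b: not b["subject"] == "database")

print([b["title"] for b in ex2], [b["title"] for b in ex3], [b["title"] for b in others])
```

Selection never changes the columns of a relation. It only filters rows.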
Project Operation (∏) : Projection is the operation of selecting certain attributes from a
relation R to form a new relation. i.e The projection operator π allows us to extract
columns from a relation
Ex 1 : Selects columns named as subject and author from the relation Books
Ex1 : Select the names of the authors who have either written a book or an article or both.
Set Difference (−)
R-S returns a relation instance containing all tuples that occur in R but not in S. The
relations R and S must be union-compatible, and the schema of the result is defined to be
identical to the schema of R.
Notation : r − s
Ex2 : Provides the name of authors who have written books but not articles
∏ author (Books) − ∏ author (Articles)
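Projection, union and set difference map directly onto Python sets, as the sketch below suggests. The Books and Articles rows are invented, and project is a hypothetical helper.

```python
def project(relation, *attrs):
    """pi_attrs(r): keep only the named columns; duplicates vanish because the result is a set."""
    return {tuple(row[a] for a in attrs) for row in relation}

# Hypothetical relations.
books = [
    {"title": "B1", "author": "Asha"},
    {"title": "B2", "author": "Ravi"},
    {"title": "B3", "author": "Ravi"},
]
articles = [
    {"title": "A1", "author": "Ravi"},
    {"title": "A2", "author": "Meena"},
]

# Authors who have written a book or an article or both (union).
either = project(books, "author") | project(articles, "author")
# Authors who have written books but not articles (set difference).
books_only = project(books, "author") - project(articles, "author")

print(sorted(either), sorted(books_only))
```

Both operands have the same single column, author, so they are union-compatible as the definition requires.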
Notation : r Χ s
The rename operation allows us to rename the output relation. 'rename' operation is denoted with
Set intersection
Assignment
Natural join
Set intersection : R ∩ S returns a relation instance containing all tuples that occur in both R and
S. The relations R and S must be union-compatible.
Assignment : It provides a convenient way to express complex queries. Assignment must always
be made to a temporary relation variable.
Joins : The join operation is one of the most useful operations in relational algebra and the
most commonly used way to combine information from two or more relations. A join
can be defined as a cross-product followed by selections and projections. It is denoted by ⋈.
Condition Join or Theta Join : Theta join combines tuples from different relations provided
they satisfy the theta condition. The join condition is denoted by the symbol θ.
Notation : R1 ⋈θ R2
or
R1 and R2 are relations having attributes (A1, A2, .., An) and (B1, B2,.. ,Bn) such that the
attributes don’t have anything in common, that is R1 ∩ R2 = Φ.
Student
SID Name Std
101 Alex 10
102 Maria 11
Subjects
Class Subject
10 Math
10 English
11 Music
11 Sports
Ex : STUDENT ⋈Student.Std = Subject.Class SUBJECT
Student_detail
SID Name Std Class Subject
101 Alex 10 10 Math
101 Alex 10 10 English
102 Maria 11 11 Music
102 Maria 11 11 Sports
Equijoin
When Theta join uses only equality comparison operator, it is said to be equijoin. The above
example corresponds to equijoin.
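The theta join above can be reproduced with a short sketch: form every pair of tuples (the cross-product) and keep the pairs that satisfy θ. The data is that of the Student and Subjects tables. The function name theta_join is hypothetical.

```python
student = [
    {"SID": 101, "Name": "Alex", "Std": 10},
    {"SID": 102, "Name": "Maria", "Std": 11},
]
subjects = [
    {"Class": 10, "Subject": "Math"},
    {"Class": 10, "Subject": "English"},
    {"Class": 11, "Subject": "Music"},
    {"Class": 11, "Subject": "Sports"},
]

def theta_join(r, s, theta):
    """Cross-product of r and s, keeping only the pairs for which theta(a, b) holds."""
    return [{**a, **b} for a in r for b in s if theta(a, b)]

# Equijoin: theta uses only the equality comparison.
student_detail = theta_join(student, subjects, lambda a, b: a["Std"] == b["Class"])
# A theta join that is not an equijoin: Std < Class.
lower_std = theta_join(student, subjects, lambda a, b: a["Std"] < b["Class"])

for row in student_detail:
    print(row)
```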
Natural Join ( ⋈)
It is a binary operator that is written as (R ⋈S) where R and S are relations. The result of the
natural join is the set of all combinations of tuples in R and S that are equal on their common
attribute names. For an example consider the tables Employee and Dept and their natural join:
Employee
Name      EmpId   DeptName
Harry     3415    Finance
Sally     2241    Sales
George    3401    Finance
Harriet   2202    Sales
Dept
DeptName     Manager
Finance      George
Sales        Harriet
Production   Charles
Employee ⋈ Dept
Name      EmpId   DeptName   Manager
Harry     3415    Finance    George
Sally     2241    Sales      Harriet
George    3401    Finance    George
Harriet   2202    Sales      Harriet
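The natural join of Employee and Dept can be sketched as follows: find the attribute names the two relations share, then combine every pair of tuples that agree on them. natural_join is a hypothetical helper and assumes both relations are non-empty.

```python
employee = [
    {"Name": "Harry",   "EmpId": 3415, "DeptName": "Finance"},
    {"Name": "Sally",   "EmpId": 2241, "DeptName": "Sales"},
    {"Name": "George",  "EmpId": 3401, "DeptName": "Finance"},
    {"Name": "Harriet", "EmpId": 2202, "DeptName": "Sales"},
]
dept = [
    {"DeptName": "Finance",    "Manager": "George"},
    {"DeptName": "Sales",      "Manager": "Harriet"},
    {"DeptName": "Production", "Manager": "Charles"},
]

def natural_join(r, s):
    """Combine tuples of r and s that are equal on all common attribute names."""
    common = [a for a in r[0] if a in s[0]]          # here: ["DeptName"]
    return [{**a, **b} for a in r for b in s
            if all(a[c] == b[c] for c in common)]

joined = natural_join(employee, dept)
for row in joined:
    print(row)
```

The common column DeptName appears only once in each result row, and the Production department, which has no employees, does not appear at all.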
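Note : When a join is written without a condition, the natural join is taken as the default join
operation. A natural join can lose information: tuples that have no matching tuple in the other
relation (such as the Production department above) do not appear in the result.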
Outer Joins
Theta Join, Equijoin, and Natural Join are called inner joins. An inner join includes only
those tuples with matching attributes and the rest are discarded in the resulting relation.
Therefore, we need to use outer joins to include all the tuples from the participating relations in
the resulting relation. There are three kinds of outer joins − left outer join, right outer join,
and full outer join.
Left Outer Join (R ⟕ S)
All the tuples from the Left relation, R, are included in the resulting relation. If there are
tuples in R without any matching tuple in the Right relation S, then the S-attributes of the
resulting relation are made NULL.
Right Outer Join (R ⟖ S)
All the tuples from the Right relation, S, are included in the resulting relation. If there are tuples
in S without any matching tuple in R, then the R-attributes of resulting relation are made NULL.
Full Outer Join (R ⟗ S)
All the tuples from both participating relations are included in the resulting relation. If a tuple
in either relation has no matching tuple in the other, its unmatched attributes are made NULL.
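The difference between an inner join and a left outer join can be observed with Python's built-in sqlite3 module. The sketch below reuses the Employee and Dept data from the natural join example. Only LEFT OUTER JOIN is shown, since support for right and full outer joins depends on the SQLite version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (Name TEXT, EmpId INTEGER, DeptName TEXT);
    CREATE TABLE dept (DeptName TEXT, Manager TEXT);
    INSERT INTO employee VALUES ('Harry', 3415, 'Finance'), ('Sally', 2241, 'Sales'),
                                ('George', 3401, 'Finance'), ('Harriet', 2202, 'Sales');
    INSERT INTO dept VALUES ('Finance', 'George'), ('Sales', 'Harriet'),
                            ('Production', 'Charles');
""")

# Inner join: departments without employees are discarded.
inner_rows = conn.execute("""
    SELECT d.DeptName, d.Manager, e.Name
    FROM dept d JOIN employee e ON d.DeptName = e.DeptName
""").fetchall()

# Left outer join: every dept row is kept; missing employee attributes become NULL.
left_rows = conn.execute("""
    SELECT d.DeptName, d.Manager, e.Name
    FROM dept d LEFT OUTER JOIN employee e ON d.DeptName = e.DeptName
""").fetchall()

print(len(inner_rows), len(left_rows))
print([row for row in left_rows if row[2] is None])
```

Python shows the SQL NULL as None, so the Production row appears with None in the employee column.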
Employee
Works
Rabbit Mesa 1300
.Result of Employee works
Employee_name Street City Branch Name Salary
Division : The division is a binary operation that is written as R ÷ S. The result consists of the
restrictions of tuples in R to the attribute names unique to R, i.e., in the header of R but not in the
header of S, for which it holds that all their combinations with tuples in S are present in R.
Ex 1 :
Ex 2 :
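Division can be illustrated with a minimal sketch for the common two-column case. R holds (student, subject) pairs, S holds subjects, and R ÷ S yields the students paired with every subject in S. The data and the name divide are invented for illustration.

```python
def divide(r, s):
    """r: set of (x, y) pairs; s: set of y values.
    Returns every x such that (x, y) is in r for all y in s."""
    candidates = {x for x, _ in r}
    return {x for x in candidates if all((x, y) in r for y in s)}

# Hypothetical data: which student has completed which subject.
completed = {
    ("Alex", "Math"), ("Alex", "English"),
    ("Maria", "Math"),
}
required = {"Math", "English"}

print(divide(completed, required))   # students who completed all required subjects
```

Maria is missing the pair (Maria, English), so she is not in the result. This is why division is the natural way to express queries containing the phrase "for all".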
More Examples of Algebra Queries
UNIT- II
Introduction to SQL: Overview of the SQL Query Language, SQL Data Definition,
Basic Structure of SQL Queries, Additional Basic Operations, Set Operations, Null
Values, Aggregate Functions, Nested Sub-queries, Modification of the Database.
Intermediate SQL: Join Expressions, Views, Transactions, Integrity Constraints,
SQL Data types and schemas, Authorization.
Advanced SQL: Accessing SQL from a Programming Language, Functions and
Procedures, Triggers, Recursive Queries, OLAP, Formal relational query languages.
Embedded SQL and dynamic SQL: Embedded and dynamic SQL define how SQL
statements can be embedded within general-purpose programming languages, such as C,
C++, and Java.
Authorization: The SQL DDL includes commands for specifying access rights to
relations and views.
The char data type stores fixed length strings.
For example, consider an attribute A of type char(10). If we store the string “PBRVITS” in this
attribute, 3 spaces are appended to the string to make it 10 characters long.
In contrast, if attribute B were of type varchar(10), and we store “PBRVITS” in attribute B, no
spaces would be added. When comparing two values of type char, if they are of different lengths
extra spaces are automatically added to the shorter one to make them the same size, before
comparison.
When comparing a char type with a varchar type, one may expect extra spaces to be
added to the varchar type to make the lengths equal, before comparison; however, this may or
may not be done, depending on the database system. As a result, even if the same value
“PBRVITS” is stored in the attributes A and B above, a comparison A=B may return false. We
recommend you always use the varchar type instead of the char type to avoid these problems.
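The padding behaviour can be imitated with plain Python strings. This is only a simulation of what a database may do: store_char is a hypothetical helper, and real systems differ in how they compare char and varchar values.

```python
def store_char(value, n):
    """Simulate char(n): pad with trailing spaces to exactly n characters."""
    return value.ljust(n)

a = store_char("PBRVITS", 10)   # attribute A of type char(10)
b = "PBRVITS"                   # attribute B of type varchar(10): stored as is

print(repr(a), len(a))          # 7 letters followed by 3 spaces
print(a == b)                   # a naive comparison sees different strings
print(a.rstrip() == b)          # equal once the padding is ignored
```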
2.2.2 Basic Schema Definition
We define an SQL relation by using the create table command. The following command
creates a relation department in the database.
create table department
(dept_name varchar (20),
building varchar (15),
budget numeric (12,2),
primary key (dept_name));
The relation created above has three attributes, dept_name, which is a character string of
maximum length 20, building, which is a character string of maximum length 15, and budget,
which is a number with 12 digits in total, 2 of which are after the decimal point. The create table
command also specifies that the dept_name attribute is the primary key of the department
relation.
The general form of the create table command is:
create table r
(A1 D1,
A2 D2,
...,
An Dn,
<integrity-constraint 1>
...,
<integrity-constraint k> );
where r is the name of the relation, each Ai is the name of an attribute in the schema of relation r,
and Di is the domain of attribute Ai; that is, Di specifies the type of attribute Ai along with
optional constraints that restrict the set of allowed values for Ai.
SQL supports a number of different integrity constraints. An integrity constraint (IC) is a
condition that is specified on a database schema and restricts the data that can be stored in an
instance of the database. If a database instance satisfies all the integrity constraints specified on
the database schema, it is a legal instance. A DBMS permits only legal instances to be stored in the
database. Many kinds of integrity constraints can be specified in the relational model:
Domain Constraints:
A relation schema specifies the domain of each field in the relation instance. These
domain constraints in the schema specify the condition that each instance of the relation has to
satisfy: The values that appear in a column must be drawn from the domain associated with that
column. Thus, the domain of a field is essentially the type of that field. It can be enforced using:
Check
NOT NULL.
The CHECK Constraint enables a condition to check the value being entered into a record. If the
condition evaluates to false, the record violates the constraint and isn't entered into the table.
SYNTAX:
Check <Conditional expression>
Table created.
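The effect of a CHECK constraint can be observed with Python's built-in sqlite3 module. This is a sketch only: the age column, the condition age >= 17 and the sample rows are assumptions made for illustration, and SQLite types differ from the Oracle types used in these notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE student (
    rollno INTEGER,
    name   TEXT,
    age    INTEGER CHECK (age >= 17))""")   # hypothetical condition

conn.execute("INSERT INTO student VALUES (1, 'Asha', 18)")        # condition is true
try:
    conn.execute("INSERT INTO student VALUES (2, 'Ravi', 15)")    # condition is false
    check_outcome = "inserted"
except sqlite3.IntegrityError:
    check_outcome = "rejected"

# A NULL age makes the condition unknown rather than false, so the row is accepted.
conn.execute("INSERT INTO student VALUES (3, 'Meena', NULL)")

row_count = conn.execute("SELECT COUNT(*) FROM student").fetchone()[0]
print(check_outcome, row_count)
```

Because a NULL value slips past a CHECK condition, CHECK is often combined with NOT NULL when a value is mandatory.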
NOT NULL
The NOT NULL constraint is a restriction placed on a column in a relational database table. It
enforces the condition that, in that column, every row of data must contain a value. It cannot be
left blank during insert or update operations. If this column is left blank, this will produce an
error message and the entire insert or update operation will fail.
Ex : SQL> create table student (rollno number(7) NOT NULL,
name varchar2(15),
branch char(4),dob date);
Note : NOT NULL constraint can be added only at Column level, but not table level.
Entity Integrity Constraints
Entity integrity is an integrity rule which states that every table must have a primary key and
that the column or columns chosen to be the primary key should be unique and not NULL.
There are TWO types of Entity Integrity Constraints
1. Unique 2. Primary key
The unique constraint is used to prevent duplicate values in a column of a table, but it can
allow NULL values.
A primary key, also called a primary keyword, is a key in a relational database that is unique
for each record. It is a unique identifier: the primary key constraint prevents both NULL values
and duplicate values in the key column(s) of a table. A table can have only one primary
key.
Ex : SQL> create table student(rno number(4) primary key,
name varchar(20),
login varchar(20) unique,
dob date);
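The behaviour of PRIMARY KEY, UNIQUE and NOT NULL can be compared side by side with Python's built-in sqlite3 module. The table mirrors the example above, except that SQLite types are used and NOT NULL is added on name as an assumption for illustration. All sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE student (
    rno   INTEGER PRIMARY KEY,
    name  TEXT NOT NULL,      -- NOT NULL added here for illustration
    login TEXT UNIQUE,
    dob   TEXT)""")

def attempt(sql):
    """Run one INSERT and report whether the constraints accepted it."""
    try:
        conn.execute(sql)
        return "ok"
    except sqlite3.IntegrityError:
        return "rejected"

results = {
    "first row":         attempt("INSERT INTO student VALUES (1, 'Asha', 'asha', NULL)"),
    "duplicate rno":     attempt("INSERT INTO student VALUES (1, 'Ravi', 'ravi', NULL)"),
    "duplicate login":   attempt("INSERT INTO student VALUES (2, 'Ravi', 'asha', NULL)"),
    "missing name":      attempt("INSERT INTO student VALUES (3, NULL, 'meena', NULL)"),
    "null login":        attempt("INSERT INTO student VALUES (4, 'Kiran', NULL, NULL)"),
    "second null login": attempt("INSERT INTO student VALUES (5, 'Divya', NULL, NULL)"),
}
for case, outcome in results.items():
    print(case, "->", outcome)
```

The last two cases show that a UNIQUE column accepts NULL, even more than once, whereas the primary key column never repeats a value.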
2) The referenced key must be Primary Key.
A newly created relation is empty initially. We can use the insert command to load data into the
relation. The Data Manipulation Language (DML) is used to retrieve, insert and modify
database information. These commands will be used by all database users during the routine
operation of the database. Let's take a brief look at the basic DML commands:
1. INSERT 2. UPDATE 3. DELETE
1. INSERT INTO: This is used to add records into a relation. Two common forms of the INSERT
INTO query are:
a) Inserting a single record
Syntax: INSERT INTO relation_name(field_1,field_2,...,field_n) VALUES
(data_1,data_2,...,data_n);
Example: SQL>INSERT INTO student(sno,sname,class,address) VALUES
(1,'SRI AJITH','B.Tech','KAVALI');
b) Inserting multiple records (using SQL*Plus substitution variables, which prompt for values
each time the statement is run)
Syntax: INSERT INTO relation_name(field_1,field_2,...,field_n) VALUES
(&data_1,&data_2,...,&data_n);
Example: SQL>INSERT INTO student(sno,sname,class,address)
VALUES(&sno,'&sname','&class','&address');
Enter value for sno: 101
Enter value for sname: PREETHAM
Enter value for class: B.Tech
Enter value for address: KAVALI
2. UPDATE-SET-WHERE: This is used to update the content of a record in a relation.
Syntax: SQL>UPDATE relation_name SET field_name1=data, field_name2=data, ...
WHERE field_name=data;
Example: SQL>UPDATE student SET sname = 'kumar' WHERE sno=1;
3. DELETE-FROM: This is used to delete records of a relation while retaining the
structure of that relation.
a) DELETE-FROM: This is used to delete all the records of a relation.
Syntax: SQL>DELETE FROM relation_name;
Example: SQL>DELETE FROM std;
b) DELETE-FROM-WHERE: This is used to delete selected records from a relation.
Syntax: SQL>DELETE FROM relation_name WHERE condition;
Example: SQL>DELETE FROM student WHERE sno = 2;
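The three DML commands can be run end to end with Python's built-in sqlite3 module. This is a sketch under assumptions: SQLite is used instead of Oracle, the ? placeholders play the role that the & substitution variables play in SQL*Plus, and the row with sno 2 is invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (sno INTEGER, sname TEXT, class TEXT, address TEXT)")

def count():
    return conn.execute("SELECT COUNT(*) FROM student").fetchone()[0]

insert_sql = "INSERT INTO student (sno, sname, class, address) VALUES (?, ?, ?, ?)"
conn.execute(insert_sql, (1, "SRI AJITH", "B.Tech", "KAVALI"))       # single record
conn.executemany(insert_sql, [                                       # several records
    (101, "PREETHAM", "B.Tech", "KAVALI"),
    (2, "RAVI", "B.Tech", "NELLORE"),                                # hypothetical row
])
count_after_insert = count()

conn.execute("UPDATE student SET sname = 'kumar' WHERE sno = 1")
name_after_update = conn.execute("SELECT sname FROM student WHERE sno = 1").fetchone()[0]

conn.execute("DELETE FROM student WHERE sno = 2")    # removes only the selected record
count_after_delete_where = count()

conn.execute("DELETE FROM student")                  # removes all records
count_after_delete_all = count()

# The relation itself still exists, with its four columns.
columns = [row[1] for row in conn.execute("PRAGMA table_info(student)")]
print(count_after_insert, name_after_update, count_after_delete_where,
      count_after_delete_all, columns)
```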
To remove a relation from an SQL database, we use the drop table command. The drop table
command deletes all information about the dropped relation from the database.
Syntax: drop table <relation name>;
This is a more drastic action than delete from <relation name>; the latter retains the relation but
deletes all tuples in it, whereas the former deletes the schema as well.
We use the alter table command to add attributes to an existing relation. All tuples in the
relation are assigned null as the value for the new attribute. The form of the alter table command
is
(a) ALTER TABLE ...ADD...: This is used to add some extra fields to an existing relation.
Syntax: ALTER TABLE relation_name ADD(new_field_1 data_type(size), new_field_2
data_type(size),..);
Example : SQL>ALTER TABLE std ADD(Address CHAR(10));
(b) ALTER TABLE...MODIFY...: This is used to change the width as well as the data type of
fields of existing relations.
Syntax: ALTER TABLE relation_name MODIFY (field_1 newdata_type(Size), field_2
newdata_type(Size),....field_n newdata_type(Size));
Example: SQL>ALTER TABLE student MODIFY(sname VARCHAR(10),class
VARCHAR(5));
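That existing tuples receive null for a newly added attribute can be confirmed with Python's built-in sqlite3 module. Note the assumptions: SQLite writes the command as ADD COLUMN rather than the Oracle form ADD(...), and it has no MODIFY clause, so only the ADD case is sketched. The sample row is invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE std (sno INTEGER, sname TEXT)")
conn.execute("INSERT INTO std VALUES (1, 'kumar')")      # row exists before the change

# SQLite spelling of: ALTER TABLE std ADD(Address CHAR(10));
conn.execute("ALTER TABLE std ADD COLUMN address CHAR(10)")

rows = conn.execute("SELECT sno, sname, address FROM std").fetchall()
columns = [row[1] for row in conn.execute("PRAGMA table_info(std)")]
print(columns, rows)     # the old row shows None (NULL) in the new column
```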
2.3 Basic Structure of SQL Queries
The basic form of an SQL query is as follows:
SELECT [DISTINCT] select-list
FROM from-list
WHERE < Condition >
Every query must have a SELECT clause, which specifies columns to be retained in the
result, and a FROM clause, which specifies a cross-product of tables. The optional WHERE
clause specifies selection conditions on the tables mentioned in the FROM clause.
• The from-list in the FROM clause is a list of table names. A table name can be followed
by a range variable; a range variable is particularly useful when the same table name appears
more than once in the from-list.
• The select-list is a list of column names of tables named in the from-list. Column names
can be prefixed by a range variable.
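• The condition in the WHERE clause is a boolean combination (i.e., an expression using
the logical connectives AND, OR, and NOT) of conditions of the form expression op expression,
where op is one of the comparison operators <, <=, =, <>, >=, >.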
• The DISTINCT keyword is optional. It indicates that the table computed as an answer
to this query should not contain duplicates, that is, two copies of the same row. The default is
that duplicates are not eliminated.
Ex 1 : Find the names and ages of all sailors.
SELECT DISTINCT S.sname, S.age FROM Sailors S
The answer is a set of rows, each of which is a pair (sname, age). If two or more sailors
have the same name and age, the answer still contains just one pair with that name and age. This
query is equivalent to applying the projection operator of relational algebra.
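The effect of DISTINCT can be seen by running the query with and without the keyword. This is a minimal sketch assuming SQLite through Python's sqlite3 module; the Sailors rows are illustrative and include two sailors with the same name and age.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sailors (sid INTEGER, sname TEXT, rating INTEGER, age REAL)")
conn.executemany(
    "INSERT INTO Sailors VALUES (?, ?, ?, ?)",
    [(22, "Dustin", 7, 45.0), (64, "Horatio", 7, 35.0), (74, "Horatio", 9, 35.0)],
)

# With DISTINCT: one (sname, age) pair per distinct combination.
with_distinct = conn.execute(
    "SELECT DISTINCT S.sname, S.age FROM Sailors S ORDER BY S.sname"
).fetchall()

# Without DISTINCT (the default): duplicates are kept.
without_distinct = conn.execute(
    "SELECT S.sname, S.age FROM Sailors S ORDER BY S.sname"
).fetchall()

print(with_distinct)
print(without_distinct)
```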
The Natural Join
Natural join is an SQL join operation that joins tables on the basis of their common columns.
To perform a natural join there must be at least one common attribute (column) between the two
tables. Natural join retrieves data from multiple relations. Conceptually, it works in three steps.
Syntax: SQL > SELECT *
FROM TABLE1
NATURAL JOIN TABLE2;
Conceptual steps of Natural Join :
1. It forms the Cartesian product of the two tables.
2. It keeps the tuples whose values agree on the common columns and discards the rest.
3. It removes the duplicate copies of the common columns.
Ex : Consider the query “For all instructors in the university who have taught
some course, find their names and the course ID of all courses they taught”,
which we wrote earlier as:
select name, course_id
from instructor, teaches
where instructor.ID= teaches.ID;
This query can be written more concisely using the natural-join operation in
SQL as:
select name, course_id
from instructor natural join teaches;
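Both formulations can be compared directly. This is a minimal sketch assuming SQLite through Python's sqlite3 module; the instructor and teaches rows are illustrative, and the tables are cut down to the columns the query needs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE instructor (ID TEXT, name TEXT)")
conn.execute("CREATE TABLE teaches (ID TEXT, course_id TEXT)")
conn.executemany("INSERT INTO instructor VALUES (?, ?)",
                 [("10101", "Srinivasan"), ("12121", "Wu"), ("33456", "Gold")])
conn.executemany("INSERT INTO teaches VALUES (?, ?)",
                 [("10101", "CS-101"), ("12121", "FIN-201")])

# Cartesian product followed by an explicit equality on the common column.
explicit = conn.execute(
    "SELECT name, course_id FROM instructor, teaches "
    "WHERE instructor.ID = teaches.ID ORDER BY name"
).fetchall()

# Natural join: the equality on the common column ID is implied.
natural = conn.execute(
    "SELECT name, course_id FROM instructor NATURAL JOIN teaches ORDER BY name"
).fetchall()

# SELECT * on a natural join lists the common column only once.
cursor = conn.execute("SELECT * FROM instructor NATURAL JOIN teaches")
natural_columns = [col[0] for col in cursor.description]

print(explicit == natural, natural_columns)
```

The instructor Gold teaches no course, so that row has no matching tuple and appears in neither result.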
2.4 Additional Basic Operations
There are a number of additional basic operations that are supported in SQL.
2.4.1 The Rename Operation
Consider the query below:
SQL > select name, courseid
The result of this query is a relation with the following attributes: name, courseid
SQL provides a way of renaming the attributes of a result relation. It uses the as clause,
taking the form: old-name as new-name
In the above query, T and S can be thought as aliases, that is as alternative names, for the relation
instructor.
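The following is a minimal sketch of both uses of the as clause: renaming a result attribute, and giving the relation instructor the aliases T and S so that it can be compared with itself. It assumes SQLite through Python's sqlite3 module; the rows and the comparison against the Biology department are illustrative assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE instructor (ID TEXT, name TEXT, dept_name TEXT, salary INTEGER)")
conn.executemany("INSERT INTO instructor VALUES (?, ?, ?, ?)",
                 [("1", "Crick", "Biology", 72000),
                  ("2", "Wu", "Finance", 90000),
                  ("3", "Mozart", "Music", 40000)])

# Renaming an attribute of the result relation.
cursor = conn.execute("SELECT name AS instructor_name, salary FROM instructor")
result_attributes = [col[0] for col in cursor.description]

# Renaming a relation: T and S are two aliases for instructor, which lets
# the query compare tuples of the same relation with each other.
higher_paid = conn.execute(
    "SELECT DISTINCT T.name FROM instructor AS T, instructor AS S "
    "WHERE T.salary > S.salary AND S.dept_name = 'Biology'"
).fetchall()

print(result_attributes, higher_paid)
```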
2.4.2 String Operations
SQL specifies strings by enclosing them in single quotes, for example, 'Computer'. A single
quote character that is part of a string can be specified by using two single quote characters;
for example, the string "It's right" can be specified by "It''s right".
The SQL standard specifies that the equality operation on strings is case sensitive; as a result the
expression "'comp. sci.' = 'Comp. Sci.'" evaluates to false.
Pattern matching can be performed on strings, using the operator like. We describe patterns by
using two special characters:
Percent (%): The % character matches any substring.
Underscore (_): The _ character matches any single character.
Example: “Find the names of all departments whose building name includes the substring
‘Watson’.”
select deptname
from department
where building like '%Watson%';
For patterns to include the special pattern characters (that is, % and _), SQL allows the
specification of an escape character. The escape character is used immediately before a special
pattern character to indicate that the special pattern character is to be treated like a normal
character. We define the escape character for a like comparison using the escape keyword.
To illustrate, consider the following patterns, which use a backslash (\) as the escape character:
Q. Write a pattern that matches all strings beginning with “ab%cd”.
like 'ab\%cd%' escape '\'
Q. Write a pattern that matches all strings beginning with “ab\cd”.
like 'ab\\cd%' escape '\'
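The two escape patterns, and the case sensitivity of string equality, can be checked with a short sketch. It assumes SQLite through Python's sqlite3 module; the table t and its strings are illustrative. Python raw strings (r"...") are used so that the backslashes reach SQL unchanged.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (s TEXT)")
conn.executemany("INSERT INTO t VALUES (?)",
                 [("ab%cdef",), ("abXcdef",), ("ab\\cdef",)])

# Escaped %: matches only strings that literally begin with ab%cd.
percent_matches = conn.execute(
    r"SELECT s FROM t WHERE s LIKE 'ab\%cd%' ESCAPE '\'"
).fetchall()

# Escaped backslash: matches strings that literally begin with ab\cd.
backslash_matches = conn.execute(
    r"SELECT s FROM t WHERE s LIKE 'ab\\cd%' ESCAPE '\'"
).fetchall()

# String equality is case sensitive, so this comparison yields 0 (false).
case_sensitive_equal = conn.execute(
    "SELECT 'comp. sci.' = 'Comp. Sci.'"
).fetchone()[0]

print(percent_matches, backslash_matches, case_sensitive_equal)
```

Without the escape character, the pattern 'ab%cd%' would also match 'abXcdef', because an unescaped % stands for any substring.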
To find the names of instructors with salary amounts between $90,000 and $100,000, we can use the
between comparison to write:
SQL > select name
from instructor
where salary between 90000 and 100000;
instead of:
SQL > select name
from instructor
where salary <= 100000 and salary >= 90000;
UNION is used to combine the results of two or more SELECT statements; it eliminates
duplicate rows from its result set. For a union, the number of columns and their data types must be
the same in all the SELECT statements.
Example of UNION
The First table
ID Name
1 abhi
2 adam
The Second table
ID Name
2 adam
3 Chester
Result of the union
ID NAME
1 abhi
2 adam
3 Chester
Union All
This operation is similar to Union. But it also shows the duplicate rows.
ID NAME
1 abhi
2 adam
2 adam
3 Chester
Intersect
Intersect operation is used to combine two SELECT statements, but it only returns the records
which are common to both SELECT statements. In case of Intersect the number of columns
and data type must be same. MySQL versions before 8.0.31 do not support the INTERSECT operator.
select * from First
INTERSECT
select * from second
ID NAME
2 adam
Minus
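Minus operation combines the results of two SELECT statements and returns only those rows of
the first result that do not appear in the second result. Oracle names this operator MINUS, while
the SQL standard names it EXCEPT; the two behave the same way. MySQL does not support
MINUS, and supports EXCEPT only from version 8.0.31.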
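The four set operations can be compared on the two example tables. This is a minimal sketch assuming SQLite through Python's sqlite3 module; the table names first_t and second_t are hypothetical stand-ins for First and second, and SQLite spells the difference operator EXCEPT rather than MINUS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE first_t (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE second_t (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO first_t VALUES (?, ?)", [(1, "abhi"), (2, "adam")])
conn.executemany("INSERT INTO second_t VALUES (?, ?)", [(2, "adam"), (3, "Chester")])

def run(operator):
    """Combine the two tables with the given set operator, ordered by id."""
    sql = f"SELECT * FROM first_t {operator} SELECT * FROM second_t ORDER BY id"
    return conn.execute(sql).fetchall()

union_rows = run("UNION")          # duplicates eliminated
union_all_rows = run("UNION ALL")  # duplicates kept
intersect_rows = run("INTERSECT")  # rows common to both
except_rows = run("EXCEPT")        # rows of the first result not in the second

print(union_rows, union_all_rows, intersect_rows, except_rows, sep="\n")
```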
The SQL NULL is the term used to represent a missing value. A NULL value in a table is a
value in a field that appears to be blank.
A field with a NULL value is a field with no value, for example one that was left blank during
record creation. It is very important to understand that a NULL value is different from a zero
value or a field that contains spaces.
SQL uses the special keyword null in a predicate to test for a null value. Thus, to find all
instructors who appear in the instructor relation with null values for salary, we write:
SQL > select name
from instructor
where salary is null;
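The difference between a null, a zero, and a comparison with = can be checked with a short sketch. It assumes SQLite through Python's sqlite3 module, where None is stored as NULL; the instructor rows are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE instructor (name TEXT, salary INTEGER)")
conn.executemany("INSERT INTO instructor VALUES (?, ?)",
                 [("Wu", 90000), ("Gold", None), ("Kim", 0)])

# The is null predicate finds the missing salary.
is_null_rows = conn.execute(
    "SELECT name FROM instructor WHERE salary IS NULL"
).fetchall()

# Comparing with = never succeeds, because null = null is unknown, not true.
equals_null_rows = conn.execute(
    "SELECT name FROM instructor WHERE salary = NULL"
).fetchall()

# A zero salary is an ordinary value, not a null.
zero_rows = conn.execute(
    "SELECT name FROM instructor WHERE salary = 0"
).fetchall()

print(is_null_rows, equals_null_rows, zero_rows)
```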
Logical Connectives AND, OR, and NOT
There are three logical operators, namely AND, OR, and NOT. AND and OR combine two
conditions at a time, while NOT negates a single condition, to determine whether a row can be
selected for the output. When retrieving data using a SELECT statement, you can use logical
operators in the WHERE clause, which allows you to combine more than one condition.
Logical Operator   Description
OR                 For a row to be selected, at least one of the conditions must be true.
AND                For a row to be selected, all the specified conditions must be true.
NOT                For a row to be selected, the specified condition must be false.
2.7 Aggregate Functions
An aggregate function is a function that derives a single value from a set of values from a
column. Aggregate functions may appear only in the SELECT list or in the HAVING clause of a query.
Common aggregate functions include
Function Description
AVG(column) Returns the average value of a column
COUNT(column) Returns the number of rows (without a NULL value) of a column
COUNT(*) Returns the number of selected rows
COUNT(DISTINCT column) Returns the number of distinct results
MAX(column) Returns the highest value of a column
MIN(column) Returns the lowest value of a column
SUM(column) Returns the sum of a column
SQL > Select deptno,min(sal) from emp group by deptno;
Ex : Display the total salary department wise where the department-wise total salary is above 5000.
SQL > select deptno,sum(sal) from emp group by deptno having sum(sal) > 5000;
SQL > Select * from emp where sal = (select max(sal) from emp);
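The grouping queries above can be run on a small table. This is a minimal sketch assuming SQLite through Python's sqlite3 module; the emp rows are illustrative and include one null salary to show how COUNT treats nulls.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (empno INTEGER, deptno INTEGER, sal INTEGER)")
conn.executemany("INSERT INTO emp VALUES (?, ?, ?)",
                 [(1, 10, 3000), (2, 10, 2500), (3, 20, 1000),
                  (4, 20, None), (5, 30, 6000)])

# Minimum salary per department.
min_per_dept = conn.execute(
    "SELECT deptno, MIN(sal) FROM emp GROUP BY deptno ORDER BY deptno"
).fetchall()

# HAVING filters whole groups after aggregation.
big_totals = conn.execute(
    "SELECT deptno, SUM(sal) FROM emp GROUP BY deptno "
    "HAVING SUM(sal) > 5000 ORDER BY deptno"
).fetchall()

# COUNT(*) counts rows; COUNT(sal) skips rows whose sal is null.
count_all, count_sal = conn.execute("SELECT COUNT(*), COUNT(sal) FROM emp").fetchone()

# Aggregate inside a subquery: the employee(s) with the highest salary.
top_paid = conn.execute(
    "SELECT empno FROM emp WHERE sal = (SELECT MAX(sal) FROM emp)"
).fetchall()

print(min_per_dept, big_totals, count_all, count_sal, top_paid)
```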
A subquery is also called an inner query or inner select, while the statement containing a
subquery is also called an outer query or outer select.
The inner query executes first before its parent query so that the results of inner query can be
passed to the outer query.
Correlated Subquery
A query is called a correlated subquery when the inner query depends on the outer query: the
inner query refers to columns of the outer query, so it cannot be processed on its own.
For every row processed by the outer query, the inner query is executed once.
Ex : Find the employees whose salary is equal to the salary of at least one employee in the
department with id 300.
Ex : Find the employees whose salary is greater than that of at least one employee in the
department with id 500.
Ex : Write a query to find the employees whose salary is less than the salary of all employees in
the department with id 100.
FROM EMPLOYEES
WHERE DEPARTMENT_ID = 100 );
Ex : Write a query to find the employees whose manager and department match those of the
employee with id 20 or 30.
Ex : Write a query to list the department names which have at least one employee.
Ex : Write a query to find the departments which do not have any employees.
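Some of these exercises can be sketched as follows. This is one possible formulation, assuming SQLite through Python's sqlite3 module; the schemas and rows are hypothetical. SQLite has no ANY or ALL operators, so the sketch uses IN for "equal to at least one" and the correlated predicates EXISTS and NOT EXISTS for the department questions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE departments (department_id INTEGER, department_name TEXT)")
conn.execute("CREATE TABLE employees (employee_id INTEGER, salary INTEGER, department_id INTEGER)")
conn.executemany("INSERT INTO departments VALUES (?, ?)",
                 [(100, "Finance"), (300, "Sales"), (500, "Research")])
conn.executemany("INSERT INTO employees VALUES (?, ?, ?)",
                 [(1, 5000, 100), (2, 7000, 300), (3, 7000, 100), (4, 4000, 100)])

# Salary equal to the salary of at least one employee in department 300.
same_salary = conn.execute(
    "SELECT employee_id FROM employees WHERE salary IN "
    "(SELECT salary FROM employees WHERE department_id = 300) "
    "ORDER BY employee_id"
).fetchall()

# Correlated subquery: the inner query refers to d from the outer query.
with_employees = conn.execute(
    "SELECT d.department_name FROM departments d WHERE EXISTS "
    "(SELECT 1 FROM employees e WHERE e.department_id = d.department_id) "
    "ORDER BY d.department_name"
).fetchall()

without_employees = conn.execute(
    "SELECT d.department_name FROM departments d WHERE NOT EXISTS "
    "(SELECT 1 FROM employees e WHERE e.department_id = d.department_id)"
).fetchall()

print(same_salary, with_employees, without_employees)
```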
a) Inserting a single record
Syntax: INSERT INTO relation_name(field_1,field_2,...,field_n) VALUES
(data_1,data_2,...,data_n);
Example: SQL>INSERT INTO student(sno,sname,class,address) VALUES
(1,'SRI AJITH','B.Tech','KAVALI');
b) Inserting all records from another relation
Syntax: INSERT INTO relation_name_1 SELECT field_1,field_2,...,field_n
FROM relation_name_2 WHERE field_x=data;
Example: SQL>INSERT INTO std SELECT sno,sname FROM student
WHERE sname = 'SRI AJITH';
Example: SQL>DELETE FROM student WHERE sno = 2;
Intermediate SQL
2.10 Joins
The join operation is one of the most useful operations in relational algebra and the most
commonly used way to combine information from two or more relations. A join can be
defined as a cross-product followed by selections and projections. It is denoted by ⋈.
Condition Join or Theta Join : Theta join combines tuples from different relations provided
they satisfy the theta condition. The join condition is denoted by the symbol θ.
Notation : R1 ⋈θ R2
R1 and R2 are relations having attributes (A1, A2, .., An) and (B1, B2,.. ,Bn) such that the
attributes don’t have anything in common, that is R1 ∩ R2 = Φ.
Theta join can use all kinds of comparison operators.
Student
SID Name Std
101 Alex 10
102 Maria 11
Subjects
Class Subject
10 Math
10 English
11 Music
11 Sports
Ex : STUDENT ⋈Student.Std = Subject.Class SUBJECT
Student_detail
SID Name Std Class Subject
101 Alex 10 10 Math
101 Alex 10 10 English
102 Maria 11 11 Music
102 Maria 11 11 Sports
Equijoin
When Theta join uses only equality comparison operator, it is said to be equijoin. The above
example corresponds to equijoin.
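The Student and Subjects example can be reproduced in SQL, where a theta join is written as JOIN ... ON. This is a minimal sketch assuming SQLite through Python's sqlite3 module; the second query, which uses < instead of =, is an added illustration of a theta join that is not an equijoin.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Student (SID INTEGER, Name TEXT, Std INTEGER)")
conn.execute("CREATE TABLE Subjects (Class INTEGER, Subject TEXT)")
conn.executemany("INSERT INTO Student VALUES (?, ?, ?)",
                 [(101, "Alex", 10), (102, "Maria", 11)])
conn.executemany("INSERT INTO Subjects VALUES (?, ?)",
                 [(10, "Math"), (10, "English"), (11, "Music"), (11, "Sports")])

# Equijoin: the theta condition uses only equality.
equijoin_rows = conn.execute(
    "SELECT SID, Name, Std, Class, Subject FROM Student "
    "JOIN Subjects ON Student.Std = Subjects.Class ORDER BY SID, Subject"
).fetchall()

# Theta join with another comparison operator.
less_than_rows = conn.execute(
    "SELECT Name, Subject FROM Student "
    "JOIN Subjects ON Student.Std < Subjects.Class ORDER BY Subject"
).fetchall()

print(equijoin_rows)
print(less_than_rows)
```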
Natural Join ( ⋈)
It is a binary operator that is written as (R ⋈S) where R and S are relations. The result of the
natural join is the set of all combinations of tuples in R and S that are equal on their common
attribute names. For an example consider the tables Employee and Dept and their natural join:
Employee
Name EmpId DeptName
Harry 3415 Finance
Sally 2241 Sales
George 3401 Finance
Harriet 2202 Sales
Dept
DeptName Manager
Finance George
Sales Harriet
Production Charles
Employee ⋈ Dept
Name EmpId DeptName Manager
Harry 3415 Finance George
Sally 2241 Sales Harriet
George 3401 Finance George
Harriet 2202 Sales Harriet
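Note : When a join is written without a condition, it is taken to be the natural join. A natural
join can lose information: a tuple with no matching tuple in the other relation (such as the
Production department above) does not appear in the result.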
Outer Joins
Theta Join, Equijoin, and Natural Join are called inner joins. An inner join includes only
those tuples with matching attributes and the rest are discarded in the resulting relation.
Therefore, we need to use outer joins to include all the tuples from the participating relations in
the resulting relation. There are three kinds of outer joins − left outer join, right outer join,
and full outer join.
Left Outer Join (R ⟕ S): All the tuples from the Left relation, R, are included in the resulting
relation. If there are tuples in R without any matching tuple in the Right relation S, then the
S-attributes of the resulting relation are made NULL.
Right Outer Join (R ⟖ S): All the tuples from the Right relation, S, are included in the resulting
relation. If there are tuples in S without any matching tuple in R, then the R-attributes of the
resulting relation are made NULL.
Full Outer Join (R ⟗ S): All the tuples from both participating relations are included in the
resulting relation. For a tuple with no matching tuple in the other relation, the attributes of
the other relation are made NULL.
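The difference between an inner join and an outer join can be seen on the Employee and Dept tables used for the natural join. This is a minimal sketch assuming SQLite through Python's sqlite3 module. Only LEFT OUTER JOIN is shown because older SQLite versions lack RIGHT and FULL OUTER JOIN; a right outer join of R and S gives the same tuples as a left outer join of S and R.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (Name TEXT, EmpId INTEGER, DeptName TEXT)")
conn.execute("CREATE TABLE Dept (DeptName TEXT, Manager TEXT)")
conn.executemany("INSERT INTO Employee VALUES (?, ?, ?)",
                 [("Harry", 3415, "Finance"), ("Sally", 2241, "Sales"),
                  ("George", 3401, "Finance"), ("Harriet", 2202, "Sales")])
conn.executemany("INSERT INTO Dept VALUES (?, ?)",
                 [("Finance", "George"), ("Sales", "Harriet"), ("Production", "Charles")])

# Inner join: the Production department has no employee and is discarded.
inner_rows = conn.execute(
    "SELECT d.DeptName, e.Name FROM Dept d "
    "JOIN Employee e ON d.DeptName = e.DeptName"
).fetchall()

# Left outer join: every Dept tuple is kept; unmatched Employee attributes are NULL.
left_rows = conn.execute(
    "SELECT d.DeptName, e.Name FROM Dept d "
    "LEFT OUTER JOIN Employee e ON d.DeptName = e.DeptName "
    "ORDER BY d.DeptName, e.Name"
).fetchall()

print(len(inner_rows), left_rows)
```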
Employee
Smith Revolver Death Valley
Works
2.11 Views
A view is a table whose rows are not explicitly stored in the database but are computed as needed
from a view definition. A view is nothing more than a SQL statement that is stored in the
database with an associated name. A view is actually a composition of a table in the form of a
predefined SQL query.
A view can contain all rows of a table or select rows from a table. A view can be created from
one or many tables which depends on the written SQL query to create a view.
Views, which are a kind of virtual table, allow users to do the following:
• Structure data in a way that users or classes of users find natural or intuitive.
• Restrict access to the data such that a user can see and (sometimes) modify exactly what
they need and no more.
• Summarize data from various tables which can be used to generate reports.
Creating Views:
Database views are created using the CREATE VIEW statement. Views can be created from a
single table, multiple tables, or another view.
To create a view, a user must have the appropriate system privilege according to the specific
implementation. The basic CREATE VIEW syntax is as follows:
CREATE VIEW view_name AS
SELECT column1, column2, ...
FROM table_name
WHERE [condition];
You can include multiple tables in your SELECT statement in very similar way as you use them
in normal SQL SELECT query.
The WHERE clause may not contain subqueries.
Rows of data can be deleted from a view. The same rules that apply to the UPDATE and
INSERT commands apply to the DELETE command.
Dropping Views:
When a view is no longer needed, it can be dropped. The syntax is very simple, as given below:
DROP VIEW view_name;
Ex :
1) Create the CUSTOMERS table and insert records
+----+----------+-----+-----------+----------+
| ID | NAME | AGE | ADDRESS | SALARY |
+----+----------+-----+-----------+----------+
| 1 | Ramesh | 32 | Ahmedabad | 2000.00 |
| 2 | Khilan | 25 | Delhi | 1500.00 |
| 3 | kaushik | 23 | Kota | 2000.00 |
| 4 | Chaitali | 25 | Mumbai | 6500.00 |
| 5 | Hardik | 27 | Bhopal | 8500.00 |
| 6 | Komal | 22 | MP | 4500.00 |
| 7 | Muffy | 24 | Indore | 10000.00 |
+----+----------+-----+-----------+----------+
2) This view would be used to have customer name and age from CUSTOMERS table
Now, you can query CUSTOMERS_VIEW in similar way as you query an actual table.
| kaushik | 23 |
| Chaitali | 25 |
| Hardik | 27 |
| Komal | 22 |
| Muffy | 24 |
+----------+-----+
This would ultimately update the base table CUSTOMERS and same would reflect in the view
itself. Now, try to query base table, and SELECT statement would produce the following result:
+----+----------+-----+-----------+----------+
| ID | NAME | AGE | ADDRESS | SALARY |
+----+----------+-----+-----------+----------+
| 1 | Ramesh | 35 | Ahmedabad | 2000.00 |
| 2 | Khilan | 25 | Delhi | 1500.00 |
| 3 | kaushik | 23 | Kota | 2000.00 |
| 4 | Chaitali | 25 | Mumbai | 6500.00 |
| 5 | Hardik | 27 | Bhopal | 8500.00 |
| 6 | Komal | 22 | MP | 4500.00 |
| 7 | Muffy | 24 | Indore | 10000.00 |
+----+----------+-----+-----------+----------+
This would ultimately delete a row from the base table CUSTOMERS and same would reflect in
the view itself. Now, try to query base table, and SELECT statement would produce the
following result:
+----+----------+-----+-----------+----------+
| ID | NAME | AGE | ADDRESS | SALARY |
+----+----------+-----+-----------+----------+
| 1 | Ramesh | 35 | Ahmedabad | 2000.00 |
| 2 | Khilan | 25 | Delhi | 1500.00 |
| 3 | kaushik | 23 | Kota | 2000.00 |
| 4 | Chaitali | 25 | Mumbai | 6500.00 |
| 5 | Hardik | 27 | Bhopal | 8500.00 |
| 7 | Muffy | 24 | Indore | 10000.00 |
+----+----------+-----+-----------+----------+
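The life cycle of a view can be sketched as follows, assuming SQLite through Python's sqlite3 module and a three-row subset of the CUSTOMERS table. SQLite views are read-only, so in this sketch the update is issued against the base table; the point shown is that the view is computed from the base table each time it is queried.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CUSTOMERS (ID INTEGER, NAME TEXT, AGE INTEGER, "
             "ADDRESS TEXT, SALARY REAL)")
conn.executemany("INSERT INTO CUSTOMERS VALUES (?, ?, ?, ?, ?)",
                 [(1, "Ramesh", 32, "Ahmedabad", 2000.00),
                  (2, "Khilan", 25, "Delhi", 1500.00),
                  (3, "kaushik", 23, "Kota", 2000.00)])

# The view stores only the query, not the rows.
conn.execute("CREATE VIEW CUSTOMERS_VIEW AS SELECT NAME, AGE FROM CUSTOMERS")
view_before = conn.execute("SELECT * FROM CUSTOMERS_VIEW ORDER BY NAME").fetchall()

# A change to the base table is visible through the view immediately.
conn.execute("UPDATE CUSTOMERS SET AGE = 35 WHERE NAME = 'Ramesh'")
view_after = conn.execute("SELECT * FROM CUSTOMERS_VIEW ORDER BY NAME").fetchall()

# Dropping the view leaves the base table untouched.
conn.execute("DROP VIEW CUSTOMERS_VIEW")
views_left = conn.execute("SELECT name FROM sqlite_master WHERE type = 'view'").fetchall()
base_rows = conn.execute("SELECT COUNT(*) FROM CUSTOMERS").fetchone()[0]

print(view_before, view_after, views_left, base_rows)
```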
2.12 Transactions
A transaction consists of a sequence of query and/or update statements. The SQL standard
specifies that a transaction begins implicitly when an SQL statement is executed. One of the
following SQL statements must end the transaction:
• Commit : COMMIT in SQL is a transaction control command that is used to permanently
save the changes done in the transaction to the tables/database. The database cannot be returned
to its previous state after the execution of COMMIT.
• Rollback : ROLLBACK in SQL is a transaction control command that is used to undo
the changes of a transaction that have not yet been saved in the database. The command can
only undo changes made since the last COMMIT.
     COMMIT                                         ROLLBACK
1.   COMMIT permanently saves the changes made      ROLLBACK undoes the changes made by
     by the current transaction.                    the current transaction.
2.   The transaction cannot undo changes after      The transaction returns to its previous state
     COMMIT execution.                              after ROLLBACK.
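The behaviour of COMMIT and ROLLBACK can be sketched as follows, assuming SQLite through Python's sqlite3 module with its default settings, under which a transaction begins implicitly before the first data-modifying statement. The account table is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER, balance INTEGER)")
conn.execute("INSERT INTO account VALUES (1, 100)")
conn.commit()  # the inserted row is now permanent

def balance():
    return conn.execute("SELECT balance FROM account WHERE id = 1").fetchone()[0]

# The update starts a new transaction implicitly; rollback undoes it.
conn.execute("UPDATE account SET balance = 0 WHERE id = 1")
conn.rollback()
balance_after_rollback = balance()

# Once committed, a later rollback has nothing left to undo.
conn.execute("UPDATE account SET balance = 50 WHERE id = 1")
conn.commit()
conn.rollback()
balance_after_commit = balance()

print(balance_after_rollback, balance_after_commit)
```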
Domain Constraints:
A relation schema specifies the domain of each field in the relation instance. These
domain constraints in the schema specify the condition that each instance of the relation has to
satisfy: The values that appear in a column must be drawn from the domain associated with that
column. Thus, the domain of a field is essentially the type of that field. It can be enforced using:
Check
NOT NULL.
The CHECK Constraint enables a condition to check the value being entered into a record. If the
condition evaluates to false, the record violates the constraint and isn't entered into the table.
SYNTAX:
Table created.
NOT NULL
The NOT NULL constraint is a restriction placed on a column in a relational database table. It
enforces the condition that, in that column, every row of data must contain a value. It cannot be
left blank during insert or update operations. If this column is left blank, this will produce an
error message and the entire insert or update operation will fail.
Ex : SQL> create table student (rollno number(7) NOT NULL,
name varchar2(15),
branch char(4),dob date);
Note : NOT NULL constraint can be added only at Column level, but not table level.
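Both domain constraints can be seen rejecting bad rows in a short sketch. It assumes SQLite through Python's sqlite3 module, which reports a violated constraint as sqlite3.IntegrityError; the marks column and its range are hypothetical additions used to illustrate CHECK.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student ("
    " rollno INTEGER NOT NULL,"
    " name TEXT,"
    " marks INTEGER CHECK (marks BETWEEN 0 AND 100))"
)

def try_insert(row):
    """Return 'ok' if the row is accepted, 'rejected' if a constraint fails."""
    try:
        conn.execute("INSERT INTO student VALUES (?, ?, ?)", row)
        return "ok"
    except sqlite3.IntegrityError:
        return "rejected"

valid_row = try_insert((1, "Asha", 85))
check_violation = try_insert((2, "Ravi", 150))    # marks outside 0..100
null_violation = try_insert((None, "Ravi", 50))   # rollno left blank
stored_rows = conn.execute("SELECT COUNT(*) FROM student").fetchone()[0]

print(valid_row, check_violation, null_violation, stored_rows)
```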
Entity Integrity Constraints
Entity integrity is an integrity rule which states that every table must have a primary key and
that the column or columns chosen to be the primary key should be unique and not NULL.
There are TWO types of Entity Integrity Constraints
1. Unique 2. Primary key
Unique is used for preventing duplicate values in a column of a table. But this constraint can
allow NULL values.
A primary key, also called a primary keyword, is a key in a relational database that is unique
for each record. It is a unique identifier; it is used to prevent NULL values and also duplicate
values in a particular column of a table. A table can have only one primary key.
Ex : SQL> create table student(rno number(4) primary key,
name varchar2(20),
login varchar2(20) unique,
dob date);
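The behaviour of the two entity integrity constraints can be sketched as follows, assuming SQLite through Python's sqlite3 module with SQLite column types in place of Oracle's number and varchar2. The rows are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student ("
    " rno INTEGER PRIMARY KEY,"
    " name TEXT,"
    " login TEXT UNIQUE)"
)

def try_insert(row):
    """Return 'ok' if the row is accepted, 'rejected' if a constraint fails."""
    try:
        conn.execute("INSERT INTO student VALUES (?, ?, ?)", row)
        return "ok"
    except sqlite3.IntegrityError:
        return "rejected"

first_row = try_insert((1, "Asha", "asha01"))
duplicate_key = try_insert((1, "Ravi", "ravi01"))    # same primary key value
duplicate_login = try_insert((2, "Ravi", "asha01"))  # same unique value
null_login_a = try_insert((3, "Mina", None))         # NULL allowed under UNIQUE
null_login_b = try_insert((4, "Joy", None))          # and more than once

print(first_row, duplicate_key, duplicate_login, null_login_a, null_login_b)
```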
3) The data types of TWO columns must be same
4) The referenced key must be Primary Key.
Current database systems support such general constraints in the form of table
constraints and assertions. Table constraints are associated with a single table and checked
whenever that table is modified. In contrast, assertions involve several tables and are checked
whenever any of these tables is modified.
Assertions
An assertion is a predicate expressing a condition we wish the database to always satisfy.
The DBMS checks the assertion after any change that may violate it.
Ex : A table constraint which ensures that the salary of an employee is always above 1000:
CREATE TABLE employee (eid number(10), ename varchar2(20), salary number(10,2),
CHECK(salary>1000));
Ex : create assertion spousal_supervisor check(supervisorid <> spousalid);
Date and time values can be specified like this:
date '2022-10-22'
time '09:30:00'
timestamp '2022-10-22 10:29:01.45'
Dates must be specified in the format year followed by month followed by day, as shown. The
seconds field of time or timestamp can have a fractional part, as in the timestamp above.
SQL defines several functions to get the current date and time. For example, current_date
returns the current date, current_time returns the current time (with time zone), and localtime
returns the current local time (without time zone). Timestamps (date plus time) are returned by
current_timestamp (with time zone) and localtimestamp (local date and time without time
zone).
2.14.2 Default Values
Default integrity constraint is also called default column value. It is used to define the
value that is stored in a column when no value is supplied for it. A default value, such as zero for
a numeric column, can help to avoid errors when a column has no entry.
SYNTAX: <columnname> <datatype>(<size>) default <typical value>
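A default value takes effect when an insert omits the column. This is a minimal sketch assuming SQLite through Python's sqlite3 module; the account table is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (acc_no INTEGER, balance INTEGER DEFAULT 0)")

# No value is supplied for balance, so the default is stored.
conn.execute("INSERT INTO account (acc_no) VALUES (1)")

# An explicit value overrides the default.
conn.execute("INSERT INTO account (acc_no, balance) VALUES (2, 500)")

balances = conn.execute("SELECT acc_no, balance FROM account ORDER BY acc_no").fetchall()
print(balances)
```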
An index on an attribute of a relation is a data structure that allows the database system to find
those tuples in the relation that have a specified value for that attribute efficiently, without
scanning through all the tuples of the relation.
SQL> create index studentID_index on student(ID);
The above statement creates an index named studentID_index on the attribute ID of the relation
student.
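Whether a query actually uses an index can be checked by asking the database for its query plan. This is a minimal sketch assuming SQLite through Python's sqlite3 module; EXPLAIN QUERY PLAN is SQLite-specific, and other systems provide their own commands for the same purpose.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (ID INTEGER, name TEXT)")
conn.execute("CREATE INDEX studentID_index ON student(ID)")

# The index is recorded in the catalog under its own name.
index_names = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'index'"
).fetchall()

# The last field of each plan row is a textual description of one step.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT name FROM student WHERE ID = 42"
).fetchall()
uses_index = any("studentID_index" in row[-1] for row in plan)

print(index_names, uses_index)
```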
SQL provides drop type and alter type clauses to drop or modify types that have been created
earlier.
2.14.6 Create Table Extensions
Applications often require creation of tables that have the same schema as an existing table.
EX : creating an employee table from an existing table
Description: To copy the structure and records from the old table into the new table
Syntax:
SQL > Create table <new table name> as select <column name> from <old table
name> [where <condition>];
Sol :
Table created.
Table created.
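The create table ... as select form can copy a table with or without its records. This is a minimal sketch assuming SQLite through Python's sqlite3 module; the table names emp, emp_copy and emp_empty are hypothetical, and the always-false condition 1 = 0 is a common way to copy only the structure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (empno INTEGER, ename TEXT, sal INTEGER)")
conn.executemany("INSERT INTO emp VALUES (?, ?, ?)",
                 [(1, "Asha", 3000), (2, "Ravi", 2500)])

# Copy the structure and all the records.
conn.execute("CREATE TABLE emp_copy AS SELECT * FROM emp")

# Copy the structure only: no row satisfies the condition.
conn.execute("CREATE TABLE emp_empty AS SELECT * FROM emp WHERE 1 = 0")

copied_rows = conn.execute("SELECT COUNT(*) FROM emp_copy").fetchone()[0]
empty_rows = conn.execute("SELECT COUNT(*) FROM emp_empty").fetchone()[0]
# PRAGMA table_info lists one row per column; the name is the second field.
empty_columns = [row[1] for row in conn.execute("PRAGMA table_info(emp_empty)")]

print(copied_rows, empty_rows, empty_columns)
```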
2.15 Authorization
Authorizations on data include:
o Authorization to read data.
o Authorization to insert new data.
o Authorization to update data.
o Authorization to delete data.
Each of these types of authorizations is called a privilege. We may authorize the user all, none,
or a combination of these types of privileges on specified parts of a database, such as a relation
or a view. When a user submits a query or an update, the SQL implementation first checks if the
query or update is authorized, based on the authorizations that the user has been granted. If the
query or update is not authorized, it is rejected.
A user who has some form of authorization may be allowed to pass on (grant) this authorization
to other users, or to withdraw (revoke) an authorization that was granted earlier. The ultimate
form of authority is that given to the database administrator. The database administrator may
authorize new users, restructure the database, and so on. This form of authorization is analogous
to that of a superuser, administrator, or operator for an operating system.
2.15.1 Granting and Revoking of Privileges
The SQL standard includes the privileges select, insert, update, and delete. The privilege all
privileges can be used as a short form for all the allowable privileges. A user who creates a new
relation is given all privileges on that relation automatically.
The SQL data-definition language includes commands to grant and revoke privileges. The grant
statement is used to confer authorization. The basic form of this statement is:
grant <privilege list>
on <relation name or view name>
to <user/role list>;
The privilege list allows the granting of several privileges in one command. The select
authorization on a relation is required to read tuples in the relation. The following grant
statement grants database users Amit and Satoshi select authorization on the department relation:
grant select on department to Amit, Satoshi;
To revoke an authorization, we use the revoke statement. It takes a form almost identical to that
of grant:
revoke <privilege list>
on <relation name or view name>
from <user/role list>;
Thus, to revoke the privileges that we granted previously, we write:
revoke select on department from Amit, Satoshi;
revoke update (budget) on department from Amit, Satoshi;
Advanced SQL
2.16 Accessing SQL From a Programming Language
SQL provides a powerful declarative query language. Writing queries in SQL is usually much
easier than coding the same queries in a general-purpose programming language. However, a
database programmer must have access to a general-purpose programming language for at least
two reasons:
1. Not all queries can be expressed in SQL, since SQL does not provide the full
expressive power of a general-purpose language. That is, there exist queries that can be
expressed in a language such as C, Java, or Cobol that cannot be expressed in SQL. To
write such queries, we can embed SQL within a more powerful language.
2. Nondeclarative actions—such as printing a report, interacting with a user, or sending the
results of a query to a graphical user interface—cannot be done from within SQL.
Applications usually have several components, and querying or updating data is only one
component; other components are written in general-purpose programming languages.
There are two approaches to accessing SQL from a general-purpose programming language:
Dynamic SQL: A general-purpose program can connect to and communicate with a
database server using a collection of functions (for procedural languages) or methods (for
object-oriented languages). Dynamic SQL allows the program to construct an SQL query
as a character string at runtime, submit the query, and then retrieve the result into
program variables a tuple at a time. The dynamic SQL component of SQL allows
programs to construct and submit SQL queries at runtime.
Embedded SQL: Like dynamic SQL, embedded SQL provides a means by which a
program can interact with a database server. However, under embedded SQL, the SQL
statements are identified at compile time using a pre-processor. The pre-processor
submits the SQL statements to the database system for pre compilation and optimization;
then it replaces the SQL statements in the application program with appropriate code and
function calls before invoking the programming-language compiler.
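The dynamic SQL approach can be sketched in Python, whose sqlite3 module plays the role that JDBC plays for Java; this is an analogy, not the JDBC API itself. The function find_instructors and the instructor rows are hypothetical. The query text is assembled as a string at runtime, the value is passed as a bound parameter, and the result is retrieved one tuple at a time.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE instructor (name TEXT, dept_name TEXT, salary INTEGER)")
conn.executemany("INSERT INTO instructor VALUES (?, ?, ?)",
                 [("Wu", "Finance", 90000), ("Crick", "Biology", 72000)])

ALLOWED_COLUMNS = {"name", "dept_name", "salary"}

def find_instructors(connection, column, value):
    """Build a query at runtime that filters instructor on the given column."""
    # Column names cannot be bound as parameters, so they are checked
    # against a fixed list before being placed in the query string.
    if column not in ALLOWED_COLUMNS:
        raise ValueError(f"unknown column: {column}")
    query = f"SELECT name FROM instructor WHERE {column} = ? ORDER BY name"
    names = []
    # Iterating over the cursor fetches the result a tuple at a time.
    for (name,) in connection.execute(query, (value,)):
        names.append(name)
    return names

finance_names = find_instructors(conn, "dept_name", "Finance")
print(finance_names)
```

Passing the value through the ? placeholder, rather than concatenating it into the string, keeps user input from being interpreted as SQL.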
2.16.1 JDBC
The JDBC standard defines an application program interface (API) that Java programs can
use to connect to database servers. (The word JDBC was originally an abbreviation for Java
Database Connectivity)
Connecting to the Database: The first step in accessing a database from a Java program is to
open a connection to the database. This step is required to select which database to use. A
connection is opened using the getConnection method of the DriverManager class (within
java.sql). This method takes three parameters:
• The first parameter to the getConnection call is a string that specifies the URL, or
machine name, where the server runs, along with possibly some other information such
as the protocol to be used to communicate with the database, the port number the
database system uses for communication, and the specific database on the server to be
used.
• The second parameter to getConnection is a database user identifier, which is a string.
• The third parameter is a password, which is also a string.
Each database product that supports JDBC provides a JDBC driver that must be dynamically
loaded in order to access the database from Java. In fact, loading the driver must be done first,
before connecting to the database.
Shipping SQL Statements to the Database System:
Once a database connection is open, the program can use it to send SQL statements to the
database system for execution. This is done via an instance of the class Statement. A Statement
object is not the SQL statement itself, but rather an object that allows the Java program to invoke
methods that ship an SQL statement given as an argument for execution by the database system.
Other Features :
JDBC provides a number of other features, such as updatable result sets. It can create an
updatable result set from a query that performs a selection and/or a projection on a database
relation. An update to a tuple in the result set then results in an update to the corresponding tuple
of the database relation.
2.16.2 ODBC
The Open Database Connectivity (ODBC) standard defines an API that applications can use to
open a connection with a database, send queries and updates, and get back results. Applications
such as graphical user interfaces, statistics packages, and spreadsheets can make use of the same
ODBC API to connect to any database server that supports ODBC. Each database system
supporting ODBC provides a library that must be linked with the client program. When the client
program makes an ODBC API call, the code in the library communicates with the server to carry
out the requested action, and fetch results. The ODBC standard defines conformance levels,
which specify subsets of the functionality defined by the standard. An ODBC implementation
may provide only core level features, or it may provide more advanced (level 1 or level 2)
features. Level 1 requires support for fetching information about the catalog, such as information
about what relations are present and the types of their attributes. Level 2 requires further
features, such as the ability to send and retrieve arrays of parameter values and to retrieve more
detailed catalog information.
The SQL standard defines a call level interface (CLI) that is similar to the ODBC interface.
2.16.3 Embedded SQL
The SQL standard defines embeddings of SQL in a variety of programming languages, such as
C, C++, Cobol, Pascal, Java, PL/I, and Fortran. A language in which SQL queries are embedded
is referred to as a host language, and the SQL structures permitted in the host language constitute
embedded SQL.
Programs written in the host language can use the embedded SQL syntax to access and
update data stored in a database. An embedded SQL program must be processed by a special pre-
processor prior to compilation. The pre-processor replaces embedded SQL requests with host-
language declarations and procedure calls that allow runtime execution of the database accesses.
Then, the resulting program is compiled by the host-language compiler. This is the main
distinction between embedded SQL and JDBC or ODBC.
To identify embedded SQL requests to the pre-processor, we use the EXEC SQL statement; it
has the form:
EXEC SQL <embedded SQL statement >;
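For contrast with the pre-processor approach, the sketch below shows the call-level style that ODBC and JDBC follow: SQL is handed to library calls as ordinary strings at run time, so no pre-processing step is needed. It uses Python's standard sqlite3 module only as a stand-in for an ODBC driver; the instructor table, its rows and the helper names are made-up illustration data, not part of these notes.

```python
import sqlite3


def build_demo_connection():
    """Create an in-memory database with hypothetical sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE instructor (name TEXT, dept_name TEXT)")
    conn.executemany(
        "INSERT INTO instructor VALUES (?, ?)",
        [("Katz", "Comp. Sci."), ("Wu", "Finance"), ("Brandt", "Comp. Sci.")],
    )
    return conn


def fetch_names_by_dept(conn, dept_name):
    """Run a parameterised query through library calls (call-level style)."""
    # The SQL text is a plain string; the '?' placeholder is bound at run time.
    cursor = conn.execute(
        "SELECT name FROM instructor WHERE dept_name = ? ORDER BY name",
        (dept_name,),
    )
    return [row[0] for row in cursor.fetchall()]


conn = build_demo_connection()
print(fetch_names_by_dept(conn, "Comp. Sci."))
```

In embedded SQL the same query would appear directly in the host-language source after EXEC SQL and be translated by the pre-processor before compilation.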
The body of a stored procedure consists of a declaration section, an execution section and an
exception section, similar to a general PL/SQL block. A procedure is similar to an anonymous
PL/SQL block, but it is named so that it can be used repeatedly. We can pass parameters to
procedures in three ways:
Parameter type | Description
IN | Used to send values to stored procedures.
OUT | Used to get values from stored procedures. This is similar to a return type in functions.
IN OUT | Used to both send values to and get values from stored procedures.
Syntax:
CREATE [OR REPLACE] PROCEDURE procedure_name (<Argument> {IN, OUT, IN OUT} <Datatype>,…)
IS
Declaration section <variable, constant>;
BEGIN
Execution section
EXCEPTION
Exception section
END;
IS - marks the beginning of the body of the procedure and is similar to DECLARE in anonymous
PL/SQL Blocks. The code between IS and BEGIN forms the Declaration section.
Syntax within square brackets [ ] is optional.
With CREATE OR REPLACE, the procedure is created if no other procedure with the same name
exists; otherwise the existing procedure is replaced with the current code.
To execute a procedure:
From the SQL prompt : EXECUTE [or EXEC] procedure_name;
Functions:
A function is a named PL/SQL Block which is similar to a procedure. The major difference
between a procedure and a function is, a function must always return a value, but a procedure
may or may not return a value.
Syntax:
CREATE [OR REPLACE] FUNCTION function_name [parameters]
RETURN return_datatype {IS | AS}
Declaration_section <variable, constant>;
BEGIN
Execution_section
RETURN return_variable;
EXCEPTION
Exception_section
RETURN return_variable;
END;
RETURN TYPE: The header section defines the return type of the function. The return datatype
can be any Oracle datatype, such as VARCHAR or NUMBER.
Both the execution section and the exception section should return a value of the datatype
defined in the header section.
A function can be executed in the following ways.
As part of a SELECT statement: SELECT emp_details_func FROM dual;
In a PL/SQL statement: dbms_output.put_line(emp_details_func);
2.18 Triggers
A trigger is a statement that the system executes automatically as a side effect of a modification
to the database. To design a trigger mechanism, we must meet two requirements:
1. Specify when a trigger is to be executed. This is broken up into an event (INSERT,
UPDATE or DELETE) that causes the trigger to be checked and a condition that must be
satisfied for trigger execution to proceed.
2. Specify the actions to be taken when the trigger executes.
Once we enter a trigger into the database, the database system takes on the responsibility of
executing it whenever the specified event occurs and the corresponding condition is satisfied.
Syntax
create trigger Trigger_name
(before | after)
[insert | update | delete]
on [table_name]
[for each row]
[trigger_body]
1. CREATE TRIGGER: These two keywords specify that a triggered block is going to
be declared.
2. TRIGGER_NAME: The name given to the trigger being created. The trigger name should
be unique.
3. BEFORE | AFTER: It specifies when the trigger is fired, i.e. before or after the
triggering event.
4. INSERT | UPDATE | DELETE: These are the DML operations; any one of them can be
used as the event in a given trigger.
5. ON [TABLE_NAME]: It specifies the name of the table on which the trigger is going
to be applied.
6. FOR EACH ROW: It makes this a row-level trigger, which is executed once for each row
affected by the triggering statement.
7. TRIGGER BODY: It consists of the statements that are executed when the trigger
fires.
Example
Suppose we have a table named Student containing the attributes Student_id, Name, Address,
and Marks.
Now, we want to create a trigger that adds 100 to the Marks value of each new row
whenever a new student is inserted into the table.
SQL > INSERT INTO Student(Name, Address, Marks) VALUES('Alizeh', 'Maldives', 110);
The Student_id column is an auto-increment field and will be generated automatically when a
new record is inserted into the table.
To see the final output the query would be: SELECT * FROM Student;
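The example above can be tried out with the following minimal sketch, which uses Python's standard sqlite3 module rather than the database system assumed in these notes. Two points are assumptions of the sketch: the trigger name add_marks is invented for illustration, and because SQLite cannot modify the incoming row in a BEFORE trigger, the sketch uses an AFTER INSERT trigger that updates the newly inserted row instead.

```python
import sqlite3


def demo_trigger():
    """Insert one student and return the table contents after the trigger ran."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE Student (
            Student_id INTEGER PRIMARY KEY AUTOINCREMENT,
            Name       TEXT,
            Address    TEXT,
            Marks      INTEGER
        );

        -- Hypothetical trigger name; fires once for every inserted row.
        CREATE TRIGGER add_marks
        AFTER INSERT ON Student
        FOR EACH ROW
        BEGIN
            UPDATE Student
            SET Marks = Marks + 100
            WHERE Student_id = NEW.Student_id;
        END;
        """
    )
    conn.execute(
        "INSERT INTO Student(Name, Address, Marks) VALUES ('Alizeh', 'Maldives', 110)"
    )
    rows = conn.execute("SELECT * FROM Student").fetchall()
    conn.close()
    return rows


print(demo_trigger())
```

The inserted Marks value of 110 is stored as 210, and Student_id is generated automatically.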
Advantages of Triggers
1. Triggers provide a way to check the integrity of the data. When there is a change in the
database the triggers can adjust the entire database.
2. Triggers help in keeping the user interface lightweight. Instead of putting the same
function call all over the application, you can define a trigger and it will be executed automatically.
Disadvantages of Triggers
2. Triggers may increase the overhead of the database, as they are executed every time
the triggering event occurs.
2.19 OLAP
An online analytical processing (OLAP) system is an interactive system that permits an
analyst to view different summaries of multidimensional data.
Online analytical processing (OLAP) is defined as “The dynamic synthesis, analysis, and
consolidation of large volumes of multi-dimensional data.”
OLAP enables users to gain a deeper understanding and knowledge about various aspects
of their corporate data through fast, consistent, interactive access to a wide variety of possible
views of the data.
OLAP databases are divided into one or more cubes, and these cubes are known as hypercubes.
The key features of OLAP are the following:
Multi-dimensional views of data : A multi-dimensional view of data provides the basis for
analytical processing through flexible access to corporate data. It enables users to analyze data
across any dimension at any level of aggregation with equal functionality and ease.
Support for complex calculations : OLAP software must provide a range of powerful
computational methods, such as those required for sales forecasting, for example moving
averages and percentage growth.
Time intelligence : Time intelligence is used to judge the performance of almost any analytical
application over time, for example this month versus last month, or this month versus the same
month last year. A user may also need to view the sales for the month of May or the sales for the
first five months of 2007. Concepts such as year-to-date and period-over-period comparisons
should be easily defined in an OLAP system.
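The calculations named in these features can be sketched in a few lines of Python. This is only an illustration: the function names and the monthly sales figures are invented, and a real OLAP server would compute such measures over cube data rather than a plain list.

```python
def moving_average(values, window):
    """Simple moving average over each full window of consecutive periods."""
    if window <= 0:
        raise ValueError("window must be positive")
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]


def percentage_growth(previous, current):
    """Period-over-period growth in percent (e.g. this month vs last month)."""
    return (current - previous) * 100.0 / previous


def year_to_date(values):
    """Running total from the first period up to each period."""
    totals, running = [], 0
    for value in values:
        running += value
        totals.append(running)
    return totals


# Hypothetical sales for January to May.
monthly_sales = [100, 120, 90, 150, 180]

print(moving_average(monthly_sales, 3))
print(percentage_growth(monthly_sales[0], monthly_sales[1]))
print(year_to_date(monthly_sales))
```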
Relational OLAP
ROLAP servers are placed between the relational back-end server and the client front-end tools.
To store and manage warehouse data, ROLAP uses a relational or extended-relational DBMS.
ROLAP includes the following:
Implementation of aggregation navigation logic.
Optimization for each DBMS back end.
Additional tools and services.
Multidimensional OLAP
MOLAP uses array-based multidimensional storage engines for multidimensional views of data.
With multidimensional data stores, the storage utilization may be low if the data set is sparse.
Therefore, many MOLAP servers use two levels of data storage representation to handle dense
and sparse data sets.
Hybrid OLAP
Hybrid OLAP is a combination of ROLAP and MOLAP. It offers the higher scalability of
ROLAP and the faster computation of MOLAP. HOLAP servers can store large volumes of
detailed information, while the aggregations are stored separately in a MOLAP store.
OLAP Operations
Since OLAP servers are based on a multidimensional view of data, we will discuss OLAP
operations on multidimensional data. The operations are:
Roll-up
Drill-down
Slice and dice
Pivot (rotate)
Roll-up
Roll-up performs aggregation on a data cube in either of the following ways: by climbing up a
concept hierarchy for a dimension, or by dimension reduction.
Roll-up is performed by climbing up a concept hierarchy for the dimension location.
Initially the concept hierarchy was "street < city < province < country".
On rolling up, the data is aggregated by ascending the location hierarchy from the level of
city to the level of country.
The data is grouped into countries rather than cities.
When roll-up is performed, one or more dimensions from the data cube are removed.
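In relational terms, rolling up from city to country amounts to a GROUP BY on the coarser level. The sketch below shows this with Python's standard sqlite3 module; the sales rows, column names and the function name are hypothetical, chosen only to illustrate the location hierarchy described above.

```python
import sqlite3

# Hypothetical fact rows: (city, country, units_sold)
SALES = [
    ("Toronto", "Canada", 400),
    ("Vancouver", "Canada", 300),
    ("Chicago", "USA", 500),
    ("New York", "USA", 700),
]


def roll_up_to_country(rows):
    """Aggregate city-level sales up to the country level."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (city TEXT, country TEXT, units_sold INTEGER)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    # Grouping by country instead of city climbs one step up the hierarchy.
    result = conn.execute(
        "SELECT country, SUM(units_sold) FROM sales "
        "GROUP BY country ORDER BY country"
    ).fetchall()
    conn.close()
    return result


print(roll_up_to_country(SALES))
```

Four city-level rows become two country-level rows, each holding the total for its cities.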
Drill-down
Drill-down is the reverse operation of roll-up. It is performed in either of the following ways:
by stepping down a concept hierarchy for a dimension, or by introducing a new dimension.
Drill-down is performed by stepping down a concept hierarchy for the dimension time.
Initially the concept hierarchy was "day < month < quarter < year."
On drilling down, the time dimension is descended from the level of quarter to the level
of month.
When drill-down is performed, one or more dimensions from the data cube are added.
It navigates the data from less detailed data to highly detailed data.
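Drill-down can be sketched the same way as roll-up, by grouping on a finer level of the time hierarchy. The example uses Python's standard sqlite3 module; the rows, the column names quarter and month_no, and the function name are assumptions made for illustration.

```python
import sqlite3

# Hypothetical fact rows: (quarter, month_no, units_sold)
SALES = [
    ("Q1", 1, 150), ("Q1", 1, 50), ("Q1", 2, 100), ("Q1", 3, 150),
    ("Q2", 4, 200), ("Q2", 5, 250), ("Q2", 6, 150),
]


def totals_by(level, rows=SALES):
    """Total units sold at the given level of the time hierarchy."""
    # Only known column names are accepted, since the name is placed in the SQL text.
    if level not in ("quarter", "month_no"):
        raise ValueError("level must be 'quarter' or 'month_no'")
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sales (quarter TEXT, month_no INTEGER, units_sold INTEGER)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    result = conn.execute(
        f"SELECT {level}, SUM(units_sold) FROM sales "
        f"GROUP BY {level} ORDER BY {level}"
    ).fetchall()
    conn.close()
    return result


print(totals_by("quarter"))   # less detailed view
print(totals_by("month_no"))  # drilled down to months
```

Moving from the quarter view to the month view exposes more rows with finer detail, which is the effect of descending the hierarchy.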
Slice
The slice operation selects one particular dimension from a given cube and provides a new sub-
cube. Consider the following diagram that shows how slice works.
Here Slice is performed for the dimension "time" using the criterion time = "Q1".
It will form a new sub-cube by selecting one or more dimensions.
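A slice with the criterion time = "Q1" corresponds to a simple selection on the time dimension. The following sketch uses Python's standard sqlite3 module; the cube rows, the column names and the function name are hypothetical.

```python
import sqlite3

# Hypothetical cube cells: (quarter, location, item, units_sold)
CUBE = [
    ("Q1", "Toronto", "Mobile", 605),
    ("Q1", "Vancouver", "Modem", 825),
    ("Q2", "Toronto", "Mobile", 680),
    ("Q2", "Chicago", "Phone", 400),
]


def slice_by_quarter(rows, quarter):
    """Fix the time dimension to one quarter and return the remaining sub-cube."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE cube (quarter TEXT, location TEXT, item TEXT, units_sold INTEGER)"
    )
    conn.executemany("INSERT INTO cube VALUES (?, ?, ?, ?)", rows)
    # The time dimension is fixed, so only location and item remain in the result.
    result = conn.execute(
        "SELECT location, item, units_sold FROM cube "
        "WHERE quarter = ? ORDER BY location, item",
        (quarter,),
    ).fetchall()
    conn.close()
    return result


print(slice_by_quarter(CUBE, "Q1"))
```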
Dice
Dice selects two or more dimensions from a given cube and provides a new sub-cube. Consider
the following diagram that shows the dice operation.
The dice operation on the cube based on the following selection criteria involves three
dimensions.
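A dice over three dimensions can be sketched as a selection with one condition per dimension. The criteria used below (two quarters, two locations, two items) are hypothetical stand-ins, as are the cube rows and names; the sketch uses Python's standard sqlite3 module.

```python
import sqlite3

# Hypothetical cube cells: (quarter, location, item, units_sold)
CUBE = [
    ("Q1", "Toronto", "Mobile", 605),
    ("Q2", "Vancouver", "Modem", 825),
    ("Q3", "Toronto", "Mobile", 680),    # excluded by the time criterion
    ("Q1", "Chicago", "Mobile", 400),    # excluded by the location criterion
    ("Q2", "Toronto", "Security", 512),  # excluded by the item criterion
]


def dice(rows, quarters, locations, items):
    """Keep only cells whose value on every dimension is in the allowed set."""
    def marks(values):
        # One '?' placeholder per allowed value, e.g. "?,?" for two values.
        return ",".join("?" * len(values))

    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE cube (quarter TEXT, location TEXT, item TEXT, units_sold INTEGER)"
    )
    conn.executemany("INSERT INTO cube VALUES (?, ?, ?, ?)", rows)
    sql = (
        "SELECT quarter, location, item, units_sold FROM cube "
        f"WHERE quarter IN ({marks(quarters)}) "
        f"AND location IN ({marks(locations)}) "
        f"AND item IN ({marks(items)}) "
        "ORDER BY quarter, location, item"
    )
    result = conn.execute(sql, [*quarters, *locations, *items]).fetchall()
    conn.close()
    return result


print(dice(CUBE, ["Q1", "Q2"], ["Toronto", "Vancouver"], ["Mobile", "Modem"]))
```

Unlike a slice, all three dimensions are still present in the result; each one is merely restricted to a subset of its values.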
OLAP vs OLTP
S.No. | Data Warehouse (OLAP) | Operational Database (OLTP)
2 | OLAP systems are used by knowledge workers such as executives, managers and analysts. | OLTP systems are used by clerks, DBAs, or database professionals.