You are on page 1of 80


B. Tech II/CSE II Semester


Text Books: (1) DBMS by Raghu Ramakrishnan

(2) DBMS by Sudarshan and Korth

Slide No:L3-5
Database System Applications
• DBMS contains information about a particular enterprise
–Collection of interrelated data
–Set of programs to access the data
–An environment that is both convenient and efficient to use
• Database Applications:
– Banking: all transactions
– Airlines: reservations, schedules
– Universities: registration, grades
– Sales: customers, products, purchases
– Online retailers: order tracking, customized recommendations
– Manufacturing: production, inventory, orders, supply chain
– Human resources: employee records, salaries, tax
• Databases touch all aspects of our lives

Slide No:L1-1
What Is a DBMS?

• A very large, integrated collection of data.

• Models real-world enterprise.
– Entities (e.g., students, courses)
– Relationships (e.g., Madonna is taking
• A Database Management System (DBMS) is a
software package designed to store and
manage databases.

Slide No:L1-2
Why Use a DBMS?

• Data independence and efficient

• Reduced application development
• Data integrity and security.
• Uniform data administration.
• Concurrent access, recovery from

Slide No:L1-3
Why Study Databases?? ?

• Shift from computation to information

– at the “low end”: scramble to webspace (a mess!)
– at the “high end”: scientific applications

• Datasets increasing in diversity and volume.

– Digital libraries, interactive video, Human
Genome project, EOS project
– ... need for DBMS exploding

• DBMS encompasses most of CS

– OS, languages, theory, AI, multimedia, logic

Slide No:L1-4
Files vs. DBMS

• Application must stage large datasets

between main memory and secondary
storage (e.g., buffering, page-oriented
access, 32-bit addressing, etc.)
• Special code for different queries
• Must protect data from inconsistency
due to multiple concurrent users
• Crash recovery
• Security and access control

Slide No:L1-5
Purpose of Database Systems
• In the early days, database applications were built
directly on top of file systems
• Drawbacks of using file systems to store data:
– Data redundancy and inconsistency
• Multiple file formats, duplication of information in
different files
– Difficulty in accessing data
• Need to write a new program to carry out each
new task
– Data isolation — multiple files and formats
– Integrity problems
• Integrity constraints (e.g. account balance > 0)
become “buried” in program code rather than
being stated explicitly
• Hard to add new constraints or change existing
Slide No:L1-6
Purpose of Database Systems (Cont.)

• Drawbacks of using file systems (cont.)

– Atomicity of updates
• Failures may leave database in an inconsistent state
with partial updates carried out
• Example: Transfer of funds from one account to
another should either complete or not happen at all
– Concurrent access by multiple users
• Concurrent accessed needed for performance
• Uncontrolled concurrent accesses can lead to
– Example: Two people reading a balance and
updating it at the same time
– Security problems
• Hard to provide user access to some, but not all, data
• Database systems offer solutions to all the above problems

Slide No:L1-7
Levels of Abstraction

• Physical level: describes how a record (e.g., customer) is

• Logical level: describes data stored in database, and the
relationships among the data.
type customer = record
customer_id : string;
customer_name : string;
customer_street : string;
customer_city : string;
• View level: application programs hide details of data
types. Views can also hide information (such as an
employee’s salary) for security purposes.

Slide No:L1-8
• DBMS used to maintain, query large datasets.
• Benefits include recovery from system crashes,
concurrent access, quick application development,
data integrity and security.
• Levels of abstraction give data independence.
• A DBMS typically has a layered architecture.
• DBAs hold responsible jobs
and are well-paid! 
• DBMS R&D is one of the broadest,
most exciting areas in CS.

Slide No:L1-9
View of Data

An architecture for a database system

Slide No:L2-1
Instances and Schemas

• Similar to types and variables in programming

• Schema – the logical structure of the database
– Example: The database consists of information
about a set of customers and accounts and the
relationship between them)
– Analogous to type information of a variable in a
– Physical schema: database design at the
physical level
– Logical schema: database design at the logical
Slide No:L2-2
Instances and Schemas
• Instance – the actual content of the database
at a particular point in time
– Analogous to the value of a variable
• Physical Data Independence – the ability to
modify the physical schema without changing
the logical schema
– Applications depend on the logical schema
– In general, the interfaces between the
various levels and components should be
well defined so that changes in some parts
do not seriously influence others.

Slide No:L2-3
Data Models

• A collection of tools for describing

– Data
– Data relationships
– Data semantics
– Data constraints
• Relational model
• Entity-Relationship data model (mainly for
database design)
• Object-based data models (Object-oriented and
• Semi structured data model (XML)
• Other older models:
– Network model
– Hierarchical model

Slide No:L2-4
Data Models

• A data model is a collection of concepts for

describing data.
• A schema is a description of a particular
collection of data, using the a given data
• The relational model of data is the most
widely used model today.
– Main concept: relation, basically a table with
rows and columns.
– Every relation has a schema, which describes
the columns, or fields.

Slide No:L2-5
Example: University Database
• Conceptual schema:
– Students(sid: string, name: string, login:
age: integer, gpa:real)
– Courses(cid: string, cname:string,
– Enrolled(sid:string, cid:string, grade:string)

• Physical schema:
– Relations stored as unordered files.
– Index on first column of Students.

• External Schema (View):

– Course_info(cid:string,enrollment:integer)
Slide No:L2-6
Data Independence

• Applications insulated from how data

is structured and stored.
• Logical data independence: Protection
from changes in logical structure of
• Physical data independence:
Protection from changes in physical
structure of data.
☛ One of the most important benefits of using a DBMS!
Slide No:L2-7
Data Manipulation Language (DML)

• Language for accessing and manipulating the

data organized by the appropriate data model
– DML also known as query language
• Two classes of languages
– Procedural – user specifies what data is
required and how to get those data
– Declarative (nonprocedural) – user
specifies what data is required without
specifying how to get those data
• SQL is the most widely used query language

Slide No:L3-1
Data Definition Language (DDL)
• Specification notation for defining the database schema
Example: create table account (
account_number char(10),
branch_name char(10),
balance integer)
• DDL compiler generates a set of tables stored in a data
• Data dictionary contains metadata (i.e., data about data)
– Database schema
– Data storage and definition language
• Specifies the storage structure and access methods
– Integrity constraints
• Domain constraints
• Referential integrity (e.g. branch_name must
correspond to a valid branch in the branch table)
– Authorization
Slide No:L3-2
Relational Model

• Example of tabular data in the Attributes

relational model

Slide No:L3-3
A Sample Relational Database

Slide No:L3-4
• SQL: widely used non-procedural language
– Example: Find the name of the customer with
customer-id 192-83-7465
select customer.customer_name
from customer
where customer.customer_id = ‘192-83-7465’
– Example: Find the balances of all accounts held by
the customer with customer-id 192-83-7465
select account.balance
from depositor, account
where depositor.customer_id = ‘192-83-7465’
depositor.account_number =

Slide No:L3-5

• Application programs generally access databases

through one of
– Language extensions to allow embedded SQL
– Application program interface (e.g., ODBC/JDBC)
which allow SQL queries to be sent to a database

Slide No:L3-6
Database Users

Users are differentiated by the way they expect to interact with

the system
• Application programmers – interact with system through
DML calls
• Sophisticated users – form requests in a database query
• Specialized users – write specialized database applications
that do not fit into the traditional data processing framework
• Naïve users – invoke one of the permanent application
programs that have been written previously
– Examples, people accessing database over the web, bank
tellers, clerical staff

Slide No:L4-1
Database Administrator

• Coordinates all the activities of the database system

– has a good understanding of the enterprise’s
information resources and needs.
• Database administrator's duties include:
– Storage structure and access method definition
– Schema and physical organization modification
– Granting users authority to access the database
– Backing up data
– Monitoring performance and responding to
• Database tuning

Slide No:L4-2
Data storage and Querying

• Storage management
• Query processing
• Transaction processing

Slide No:L5-1
Storage Management

• Storage manager is a program module that provides

the interface between the low-level data stored in the
database and the application programs and queries
submitted to the system.
• The storage manager is responsible to the following
–Interaction with the file manager
–Efficient storing, retrieving and updating of data
• Issues:
– Storage access
– File organization
– Indexing and hashing

Slide No:L5-2
Query Processing

1. Parsingand translation
2. Optimization
3. Evaluation

Slide No:L5-3
Query Processing (Cont.)

• Alternative ways of evaluating a given query

– Equivalent expressions
– Different algorithms for each operation
• Cost difference between a good and a bad way of
evaluating a query can be enormous
• Need to estimate the cost of operations
– Depends critically on statistical information about
relations which the database must maintain
– Need to estimate statistics for intermediate
results to compute cost of complex expressions

Slide No:L5-4
Transaction Management

• A transaction is a collection of operations that

performs a single logical function in a database
• Transaction-management component
ensures that the database remains in a
consistent (correct) state despite system
failures (e.g., power failures and operating
system crashes) and transaction failures.
• Concurrency-control manager controls the
interaction among the concurrent transactions,
to ensure the consistency of the database.

Slide No:L5-5
Database Architecture

The architecture of a database systems is greatly

influenced by
the underlying computer system on which the
database is running:
• Centralized
• Client-server
• Parallel (multiple processors and disks)
• Distributed

Slide No:L6-1
Overall System Structure

Slide No:L6-2
Database Application Architectures

(web browser)

Old Modern

Slide No:L6-3
History of Database Systems
• 1950s and early 1960s:
–Data processing using magnetic tapes for storage
• Tapes provide only sequential access
–Punched cards for input
• Late 1960s and 1970s:
– Hard disks allow direct access to data
– Network and hierarchical data models in widespread
– Ted Codd defines the relational data model
• Would win the ACM Turing Award for this work
• IBM Research begins System R prototype
• UC Berkeley begins Ingres prototype
– High-performance (for the era) transaction processing

Slide No:L1-1
Magnetic tape Hard disk
Magnetic tape unit

Slide No:L1-2
History (cont.)
• 1980s:
– Research relational prototypes evolve into commercial
• SQL becomes industry standard
– Parallel and distributed database systems
–Object-oriented database systems
• 1990s:
– Large decision support and data-mining applications
– Large multi-terabyte data warehouses
–Emergence of Web commerce
• 2000s:
– XML and XQuery standards
– Automated database administration
– Increasing use of highly parallel database systems
– Web-scale distributed data storage systems
Slide No:L1-3
Slide No:L1-4
Slide No:L1-5
Slide No:L1-6
Slide No:L1-7
Slide No:L1-8
Slide No:L1-9
Slide No:L1-10
Database Design

• Conceptual design: (ER Model is used at this stage.)

– What are the entities and relationships in the
– What information about these entities and
relationships should we store in the database?
– What are the integrity constraints or business rules
that hold?
– A database `schema’ in the ER Model can be
represented pictorially (ER diagrams).
– Can map an ER diagram into a relational schema.

Slide No:L2-1
• A database can be modeled as:
–a collection of entities,
–relationship among entities.
• An entity is an object that exists and is
distinguishable from other objects.
–Example: specific person, company, event, plant
• Entities have attributes
– Example: people have names and addresses
• An entity set is a set of entities of the same type
that share the same properties.
– Example: set of all persons, companies, trees,

Slide No:L2-2
Entity Sets customer and loan
customer_id customer_ customer_ customer_ loan_ amount
name street city number

Slide No:L2-3
• An entity is represented by a set of attributes, that is
descriptive properties possessed by all members of an
entity set.
customer = (customer_id, customer_name,
customer_street, customer_city )
loan = (loan_number, amount )
• Domain – the set of permitted values for each
• Attribute types:
– Simple and composite attributes.
– Single-valued and multi-valued attributes
• Example: multivalued attribute: phone_numbers
– Derived attributes
• Can be computed from other attributes
• Example: age, Slide
given date_of_birth
Composite Attributes

Slide No:L2-5
Mapping Cardinality Constraints

• Express the number of entities to which another

entity can be associated via a relationship set.
• Most useful in describing binary relationship sets.
• For a binary relationship set the mapping
cardinality must be one of the following types:
– One to one
– One to many
– Many to one
– Many to many

Slide No:L2-6
Mapping Cardinalities

One to one One to many

Note: Some elements in A and B may not be mapped to any
elements in the other set

Slide No:L2-7
Mapping Cardinalities

Many to one Many to many

Note: Some elements in A and B may not be mapped to any
elements in the other set

Slide No:L2-8
ER Model Basics

ssn lot


• Entity: Real-world object distinguishable from other

objects. An entity is described (in DB) using a set of
• Entity Set: A collection of similar entities. E.g., all
– All entities in an entity set have the same set of
attributes. (Until we consider ISA hierarchies,
– Each entity set has a key.
– Each attribute has a domain.
Slide No:L2-9
ER Model Basics (Contd.)

ssn lot
name dname
ssn lot did budget Employees

super- subord
Employees Works_In Departments visor inate

• Relationship: Association among two or more entities. E.g.,

Attishoo works in Pharmacy department.
• Relationship Set: Collection of similar relationships.
– An n-ary relationship set R relates n entity sets E1 ... En;
each relationship in R involves entities e1 E1, ..., en En
• Same entity set could participate in different
relationship sets, or in different “roles” in same set.

Slide No:L2-10
Relationship Sets
• A relationship is an association among several
Hayes depositor A-102
customer entityrelationship setaccount entity
• A relationship set is a mathematical relation among
n  2 entities, each taken from entity sets
{(e1, e2, … en) | e1  E1, e2  E2, …, en  En}

where (e1, e2, …, en) is a relationship

– Example:
(Hayes, A-102)  depositor

Slide No:L3-1
Relationship Set borrower

Slide No:L3-2
Relationship Sets (Cont.)
• An attribute can also be property of a
relationship set.
• For instance, the depositor relationship set
between entity sets customer and account may
have the attribute access-date

Slide No:L3-3
Degree of a Relationship Set

• Refers to number of entity sets that

participate in a relationship set.
• Relationship sets that involve two entity
sets are binary (or degree two).
Generally, most relationship sets in a
database system are binary.
• Relationship sets may involve more than
two entity sets.

Slide No:L3-4
Degree of a Relationship Set
Example: Suppose employees of a bank
may have jobs (responsibilities) at
multiple branches, with different jobs at
different branches. Then there is a
ternary relationship set between entity
sets employee, job, and branch
• Relationships between more than two entity sets
are rare. Most relationships are binary. (More on
this later.)

Slide No:L3-5
features of the ER name
ssn lot did budget

Key Constraints
Employees Manages Departments

• Consider Works_In:
An employee can
work in many
departments; a dept
can have many
• In contrast, each dept
has at most one
manager, according
to the key constraint
on Manages. 1-to-1 1-to Many Many-to-1

Slide No:L4-1
Participation Constraints
• Does every department have a manager?
– If so, this is a participation constraint: the
participation of Departments in Manages is said to be
total (vs. partial).
• Every Departments entity must appear in an
instance of the Manages relationship.
name dname
ssn lot did budget

Employees Manages Departments



Slide No:L4-2
Weak Entities
• A weak entity can be identified uniquely only by considering
the primary key of another (owner) entity.
– Owner entity set and weak entity set must participate in
a one-to-many relationship set (one owner, many weak
– Weak entity set must have total participation in this
identifying relationship set.

cost pname age
ssn lot

Employees Policy Dependents

Slide No:L4-3
Weak Entity Sets
• An entity set that does not have a primary key is
referred to as a weak entity set.
• The existence of a weak entity set depends on the
existence of a identifying entity set
– it must relate to the identifying entity set via a
total, one-to-many relationship set from the
identifying to the weak entity set
– Identifying relationship depicted using a double
• The discriminator (or partial key) of a weak entity set
is the set of attributes that distinguishes among all
the entities of a weak entity set.
• The primary key of a weak entity set is formed by the
primary key of the strong entity set on which the
weak entity set is existence dependent, plus the weak
entity set’s discriminator.
Slide No:L4-4
Weak Entity Sets (Cont.)
• We depict a weak entity set by double rectangles.
• We underline the discriminator of a weak entity set
with a dashed line.
• payment_number – discriminator of the payment
entity set
• Primary key for payment – (loan_number,

Slide No:L4-5
Weak Entity Sets (Cont.)

• Note: the primary key of the strong entity set is

not explicitly stored with the weak entity set,
since it is implicit in the identifying relationship.
• If loan_number were explicitly stored, payment
could be made a strong entity, but then the
relationship between payment and loan would be
duplicated by an implicit relationship defined by
the attribute loan_number common to payment
and loan

Slide No:L4-6
More Weak Entity Set Examples

• In a university, a course is a strong entity and a

course_offering can be modeled as a weak entity
• The discriminator of course_offering would be
semester (including year) and section_number (if
there is more than one section)
• If we model course_offering as a strong entity we
would model course_number as an attribute.
Then the relationship with course would be
implicit in the course_number attribute

Slide No:L4-7
ISA (`is a’) Hierarchies
As in C++, or other PLs, ssn lot

attributes are inherited.

 If we declare A ISA B,
every A entity is also hours_worked
considered to be a B ISA

• Overlap constraints: Can Joe be an Hourly_Emps as well as a

Contract_Emps entity? (Allowed/disallowed)
• Covering constraints: Does every Employees entity also have to
be an Hourly_Emps or a Contract_Emps entity? (Yes/no)
• Reasons for using ISA:
– To add descriptive attributes specific to a subclass.
– To identify entitities that participate in a relationship.

Slide No:L5-1
• Used when we have to name
ssn lot
model a relationship
involving (entitity sets
and) a relationship set.
– Aggregation allows
us to treat a Monitors until
relationship set as
an entity set for
purposes of started_on since
participation in pid pbudget did
(other) budget
relationships. Projects Sponsors Departments

Aggregation vs. ternary relationship:

 Monitors is a distinct relationship, with a descriptive attribute.
 Also, can say that each sponsorship is monitored by at most one

Slide No:L5-2
Consider the ternary relationship works_on, which we
saw earlier
Suppose we want to record managers for tasks
performed by an employee at a branch

Slide No:L5-3
Aggregation (Cont.)
• Relationship sets works_on and manages represent
overlapping information
– Every manages relationship corresponds to a works_on
– However, some works_on relationships may not
correspond to any manages relationships
• So we can’t discard the works_on relationship
• Eliminate this redundancy via aggregation
– Treat relationship as an abstract entity
– Allows relationships between relationships
– Abstraction of relationship into new entity

Slide No:L5-4
Aggregation (Cont.)
• Eliminate this redundancy via aggregation
– Treat relationship as an abstract entity
– Allows relationships between relationships
– Abstraction of relationship into new entity
• Without introducing redundancy, the following diagram
– An employee works on a particular job at a particular
– An employee, branch, job combination may have an
associated manager

Slide No:L5-5
E-R Diagram With Aggregation

Slide No:L5-6
Conceptual Design Using the ER Model

• Design choices:
– Should a concept be modeled as an entity or an
– Should a concept be modeled as an entity or a
– Identifying relationships: Binary or ternary?
• Constraints in the ER Model:
– A lot of data semantics can (and should) be captured.
– But some constraints cannot be captured in ER

Slide No:L6-1
Entity vs. Attribute
• Should address be an attribute of Employees or an entity
(connected to Employees by a relationship)?
• Depends upon the use we want to make of address
information, and the semantics of the data:
• If we have several addresses per employee, address
must be an entity (since attributes cannot be set-
• If the structure (city, street, etc.) is important, e.g.,
we want to retrieve employees in a given city,
address must be modeled as an entity (since
attribute values are atomic).

Slide No:L6-2
Entity vs. Attribute (Contd.)
• Works_In4 does not
allow an employee to
work in a department name
for two or more ssn lot did budget
• Similar to the Employees Works_In4 Departments
problem of wanting
to record several
addresses for an
employee: We want to
record several values
of the descriptive ssn
lot did
attributes for each budget

instance of this Employees Works_In4 Departments

Accomplished by
introducing new from Duration to
entity set, Duration.
Slide No:L6-3
Entity vs. Relationship

• First ER diagram OK if a
manager gets a separate name
since dbudget
discretionary budget for ssn lot did budget
each dept.
• What if a manager gets a Employees Manages2 Departments
discretionary budget
that covers all managed
depts? ssn lot
– Redundancy: dbudget
since dname
stored for each dept Employees did budget
managed by manager.
– Misleading: Suggests
Manages2 Departments
dbudget associated ISA
with department-mgr
combination. This fixes
Managers dbudget
the problem!
Slide No:L6-4
Binary vs. Ternary Relationships

• If each policy is name

owned by just 1 ssn lot pname age
employee, and Employees Dependents
each dependent
is tied to the Bad design Policies
covering policy,
first diagram is policyid cost
inaccurate. name pname age
ssn lot
• What are the
additional Employees

constraints in
the 2nd Purchaser
Better design Policies

Slide No:L6-5
policyid cost
Binary vs. Ternary Relationships (Contd.)

• Previous example illustrated a case when two binary

relationships were better than one ternary relationship.
• An example in the other direction: a ternary relation
Contracts relates entity sets Parts, Departments and
Suppliers, and has descriptive attribute qty. No
combination of binary relationships is an adequate
– S “can-supply” P, D “needs” P, and D “deals-with” S
does not imply that D has agreed to buy P from S.
– How do we record qty?

Slide No:L6-6
Summary of Conceptual Design

• Conceptual design follows requirements analysis,

– Yields a high-level description of data to be stored

• ER model popular for conceptual design

– Constructs are expressive, close to the way people think
about their applications.
• Basic constructs: entities, relationships, and attributes (of
entities and relationships).
• Some additional constructs: weak entities, ISA hierarchies,
and aggregation.
• Note: There are many variations on ER model.

Slide No:L7-1
Summary of ER (Contd.)

• Several kinds of integrity constraints can be expressed in the

ER model: key constraints, participation constraints, and
overlap/covering constraints for ISA hierarchies. Some foreign
key constraints are also implicit in the definition of a
relationship set.
– Some constraints (notably, functional dependencies) cannot
be expressed in the ER model.
– Constraints play an important role in determining the best
database design for an enterprise.

Slide No:L7-2
Summary of ER (Contd.)

• ER design is subjective. There are often many ways to

model a given scenario! Analyzing alternatives can be
tricky, especially for a large enterprise. Common choices

Entity vs. attribute, entity vs. relationship, binary or n- ary
relationship, whether or not to use ISA hierarchies, and
whether or not to use aggregation.

Ensuring good database design: resulting relational schema
should be analyzed and refined further. FD information and
normalization techniques are especially useful.

Slide No:L7-3