You are on page 1of 373

Database Management Systems

Module 1
Introduction to Database
and its Conceptual Model

Presidency University, Bengaluru


Contents
• Introduction
• Simplified database system environment
• Typical DBMS Functionality
• Main Characteristics of the Database Approach
• Database Users
• Advantages of Using the Database Approach
• Data Models
• History of Data Models
• Three-Schema Architecture
• DBMS Languages
• DBMS Interfaces
• Database system environment

Slide 1-2
Basic Definitions
• Database: A collection of related data.
• Data: Known facts that can be recorded and have an implicit
meaning.
• Mini-world: Some part of the real world about which data is
stored in a database. For example, student grades and
transcripts at a university.
• Database Management System (DBMS): A software package/
system to facilitate the creation and maintenance of a
computerized database.
• Database System: The DBMS software together with the data
itself. Sometimes, the applications are also included.

Slide 1-3
What is a File system?

• A file system is a technique of arranging the files in a storage


medium like a hard disk, pen drive, DVD, etc. It helps you to
organizes the data and allows easy retrieval of files when they
are required. It mostly consists of different types of files like
mp3, mp4, txt, doc, etc. that are grouped into directories.
• A file system enables you to handle the way of reading and
writing data to the storage medium. It is directly installed into
the computer with the Operating systems such as Windows and
Linux.

4
KEY DIFFERENCES
• A file system is a software that manages and organizes the files in a
storage medium, whereas DBMS is a software application that is used
for accessing, creating, and managing databases.
• The file system doesn't have a crash recovery mechanism on the other
hand, DBMS provides a crash recovery mechanism.
• Data inconsistency is higher in the file system. On the contrary Data
inconsistency is low in a database management system.
• File system does not provide support for complicated transactions,
while in the DBMS system, it is easy to implement complicated
transactions using SQL.
• File system does not offer concurrency, whereas DBMS provides a
concurrency facility.

5
Simplified database system environment

6
Typical DBMS Functionality
• Define a database : in terms of data types, structures and
constraints
• Construct or Load the Database on a secondary storage
medium
• Manipulating the database : querying, generating reports,
insertions, deletions and modifications to its content
• Concurrent Processing and Sharing by a set of users and
programs – yet, keeping all data valid and consistent
Other features:
• Protection or Security measures to prevent unauthorized
access
• “Active” processing to take internal actions on data
• Presentation and Visualization of data

Slide 1-7
Example of a Database (with a Conceptual Data Model)
• Mini-world for the example: Part of a UNIVERSITY
environment.
• Some mini-world entities:
• STUDENTs
• COURSEs
• SECTIONs (of COURSEs)
• (academic) DEPARTMENTs
• INSTRUCTORs
Note: The above could be expressed in the ENTITY-
RELATIONSHIP data model.

Slide 1-8
Example of a Database (with a Conceptual Data Model)
• Some mini-world relationships:
• SECTIONs are of specific COURSEs
• STUDENTs take SECTIONs
• COURSEs have prerequisite COURSEs
• INSTRUCTORs teach SECTIONs
• COURSEs are offered by DEPARTMENTs
• STUDENTs major in DEPARTMENTs
Note: The above could be expressed in the ENTITY-
RELATIONSHIP data model.

Slide 1-9
Main Characteristics of the Database Approach

• Self-describing nature of a database system: A DBMS catalog


stores the description of the database. The description is called
meta-data). This allows the DBMS software to work with
different databases.
• Insulation between programs and data: Called program-data
independence. Allows changing data storage structures and
operations without having to change the DBMS access programs.
• Data Abstraction: A data model is used to hide storage details
and present the users with a conceptual view of the database.

Slide 1-10
Main Characteristics of the Database Approach

• Support of multiple views of the data: Each user may see


a different view of the database, which describes only
the data of interest to that user.
• Sharing of data and multiuser transaction processing :
allowing a set of concurrent users to retrieve and to
update the database. Concurrency control within the
DBMS guarantees that each transaction is correctly
executed or completely aborted. OLTP (Online
Transaction Processing) is a major part of database
applications.

Slide 1-11
Types of Databases and Database Applications
• Numeric and Textual Databases
• Multimedia Databases
• Geographic Information Systems (GIS)
• Data Warehouses
• Real-time and Active Databases

Slide 1-12
Database Users
Users may be divided into those who actually use and control the content
(called “Actors on the Scene”) and those who enable the database to be
developed and the DBMS software to be designed and implemented (called
“Workers Behind the Scene”).
Actors on the scene
• Database administrators: responsible for authorizing access to the
database, for coordinating and monitoring its use, acquiring software,
and hardware resources, controlling its use and monitoring efficiency of
operations.
• Database Designers: responsible to define the content, the structure,
the constraints, and functions or transactions against the database. They
must communicate with the end-users and understand their needs.
• End-users: they use the data for queries, reports and some of them
actually update the database content.

Slide 1-13
Categories of End-users
• Casual : access database occasionally when needed
• Naïve or Parametric : they make up a large section of the end-
user population. They use previously well-defined functions in
the form of “canned transactions” against the database.
Examples are bank-tellers or reservation clerks who do this
activity for an entire shift of operations.
• Sophisticated : these include business analysts, scientists,
engineers, others thoroughly familiar with the system
capabilities. Many use tools in the form of software packages that
work closely with the stored database.
• Stand-alone : mostly maintain personal databases using ready-
to-use packaged applications. An example is a tax program user
that creates his or her own internal database.

Slide 1-14
Workers behind the Scene
• DBMS system designers and implementers design and
implement the DBMS modules and interfaces as a software
package.
• Tool developers design and implement tools—the software
packages that facilitate database modeling and design,
database system design, and improved performance.
• Operators and maintenance personnel (system
administration personnel) are responsible for the actual
running and maintenance of the hardware and software
environment for the database system.

15
Advantages of Using the Database Approach
• Controlling redundancy in data storage and in development and
maintenance efforts.
• Sharing of data among multiple users.
• Restricting unauthorized access to data.
• Providing persistent storage for program Objects .
• Providing Storage Structures for efficient Query Processing
• Providing backup and recovery services.
• Providing multiple interfaces to different classes of users.
• Representing complex relationships among data.
• Enforcing integrity constraints on the database.
• Drawing Inferences and Actions using rules

Slide 1-16
Additional Implications of Using the Database Approach
• Potential for enforcing standards: this is very crucial for the
success of database applications in large organizations
Standards refer to data item names, display formats, screens,
report structures, meta-data (description of data) etc.
• Reduced application development time: incremental time to add
each new application is reduced.
• Flexibility to change data structures: database structure may
evolve as new requirements are defined.
• Availability of up-to-date information – very important for on-
line transaction systems such as airline, hotel, car reservations.
• Economies of scale: by consolidating data and applications
across departments wasteful overlap of resources and personnel
can be avoided.

Slide 1-17
Extending Database Capabilities
• New functionality is being added to DBMSs in
the following areas:
• Scientific Applications
• Image Storage and Management
• Audio and Video data management
• Data Mining
• Spatial data management
• Time Series and Historical Data Management

Slide 1-18
When not to use a DBMS
• Main inhibitors (costs) of using a DBMS:
• High initial investment and possible need for additional hardware.
• Overhead for providing generality, security, concurrency control, recovery, and
integrity functions.
• When a DBMS may be unnecessary:
• If the database and applications are simple, well defined, and not expected to
change.
• If there are stringent real-time requirements that may not be met because of
DBMS overhead.
• If access to data by multiple users is not required.
• When no DBMS may suffice:
• If the database system is not able to handle the complexity of data because of modeling
limitations
• If the database users need special operations not supported by the DBMS.

Slide 1-19
Data Models
• Data Model: A set of concepts to describe the structure of a
database, and certain constraints that the database should
obey.
• Data Model Operations: Operations for specifying database
retrievals and updates by referring to the concepts of the data
model. Operations on the data model may include basic
operations and user-defined operations.

Slide 2-20
Categories of data models
• Conceptual (high-level, semantic) data models: Provide
concepts that are close to the way many users perceive
data. (Also called entity-based or object-based data
models.)
• Physical (low-level, internal) data models: Provide
concepts that describe details of how data is stored in the
computer.
• Implementation (representational) data models:
Provide concepts that fall between the above two,
balancing user views with some computer storage
details.

Slide 2-21
History of Data Models
Relational Model:
• proposed in 1970 by E.F. Codd (IBM),
first commercial system in 1981-82.
Now in several commercial products
(DB2, ORACLE, SQL Server, SYBASE,
INFORMIX).
• The relational model represents the
database as a collection of relations.
A relation is nothing but a table of
values. Every row in the table
represents a collection of related
data values. These rows in the table
denote a real-world entity or
relationship.

22
Network model
• Is a database model that is designed as a
flexible approach to representing objects
and their relationships. A unique feature
of the network model is its schema,
which is viewed as a graph where
relationship types are arcs and object
types are nodes.
• the first one to be implemented by
Honeywell in 1964-65 (IDS System).
Adopted heavily due to the support by
CODASYL (CODASYL - DBTG report of
1971). Later implemented in a large
variety of systems - IDMS (Cullinet -
now CA), DMS 1100 (Unisys), IMAGE
(H.P.), VAX -DBMS (Digital Equipment
Corp.).

23
Network Model
• ADVANTAGES:
• Network Model is able to model complex relationships and represents
semantics of add/delete on the relationships.
• Can handle most situations for modeling using record types and
relationship types.
• Language is navigational; uses constructs like FIND, FIND member,
FIND owner, FIND NEXT within set, GET etc. Programmers can do
optimal navigation through the database.
• DISADVANTAGES:
• Navigational and procedural nature of processing
• Database contains a complex array of pointers that thread through a set
of records. Little scope for automated "query optimization”

Slide 2-24
Hierarchical database model
• is a data model in which the data are • implemented in a joint effort by IBM
organized into a tree-like structure. and North American Rockwell
The data are stored as records which around 1965. Resulted in the IMS
are connected to one another family of systems. The most popular
through links. A record is a collection model.
of fields, with each field containing
only one value. The type of a record
defines which fields the record
contains.
• The hierarchical database model
mandates that each child record has
only one parent, whereas each parent
record can have one or more child
records. In order to retrieve data from
a hierarchical database the whole tree
needs to be traversed starting from
the root node.

25
• ADVANTAGES: Hierarchical Model
• It promotes data sharing.
• Parent/child relationship
• Promotes conceptual simplicity.
• Database security is provided and enforced by DBMS.
• Parent/child relationship promotes data integrity.
• It is efficient with 1:M relationships.
DISADVANTAGES:
• Complex implementation requires knowledge of physical data storage
characteristics.
• Navigational system yields complex application development, management,
and use; requires knowledge of hierarchical path.
• Changes in structure require changes in all application programs.
• There are implementation limitations (no multiparent or M:N
relationships).
• There is no data definition or data manipulation language in the DBMS.
• There is a lack of standards.

Slide 2-26
Object Oriented (OO) Data Model
• Increasingly complex real-world
problems demonstrated a need
for a data model that more
closely represented the real
world. In the object
oriented data model (OODM),
both data and their relationships
are contained in a single
structure known as an object.

27
Object relational model
• is a combination of a Object oriented database model and a
Relational database model. So, it supports objects, classes,
inheritance etc. just like Object Oriented models and has support
for data types, tabular structures etc. like Relational data model.
• One of the major goals of Object relational data model is to close
the gap between relational databases and the object oriented
practices frequently used in many programming languages such
as C++, C#, Java etc.
• Both Relational data models and Object oriented data models are
very useful. But it was felt that they both were lacking in some
characteristics and so work was started to build a model that was
a combination of them both. Hence, Object relational data model
was created as a result of research that was carried out in the
1990’s.

Slide 2-28
Schemas versus Instances
• Database Schema: A database schema is the skeleton structure that represents
the logical view of the entire database. It defines how the data is organized and
how the relations among them are associated. It formulates all the constraints
that are to be applied on the data.
• The description of a database. Includes descriptions of the database structure
and the constraints that should hold on the database.
• Schema Diagram: A diagrammatic display of (some aspects of) a database
schema.
• Schema Construct: A component of the schema or an object within the schema,
e.g., STUDENT, COURSE.
• Database Instance: The actual data stored in a database at a particular moment
in time. Also called database state (or occurrence).

Slide 2-29
Database Schema Vs. Database State
• Database State: Refers to the content of a database at a moment in
time.
• Initial Database State: Refers to the database when it is loaded
• Valid State: A state that satisfies the structure and constraints of the
database.
• Distinction
• The database schema changes very infrequently. The database
state changes every time the database is updated.
• Schema is also called intension, whereas state is called extension.

Slide 2-30
Example of a Database Schema

31
Example of a database state

32
Three-Schema Architecture
• Proposed to support DBMS characteristics of:

• Program-data independence.
Data Independence is the property of DBMS that helps
you to change the Database schema at one level of a
database system without requiring to change the
schema at the next higher level.

• Support of multiple views of the data.

Slide 2-33
Three-Schema Architecture

34
Ex:
Type of Schema Implementation

External Schema View 1: Course


info(cid:int,cname:string)
View 2: studeninfo(id:int.
name:string)

Conceptual Shema Students(id: int, name: string,


login: string, age: integer)
Courses(id: int, cname.string,
credits:integer) Enrolled(id:
int, grade:string)

Physical Schema •Relations stored as unordered


files.
•Index on the first column of
Students.

35
Three-Schema Architecture

• Defines DBMS schemas at three levels:


• Internal schema at the internal level to describe physical
storage structures and access paths. Typically uses a
physical data model.
• Conceptual schema at the conceptual level to describe
the structure and constraints for the whole database for a
community of users. Uses a conceptual or an
implementation data model.
• External schemas at the external level to describe the
various user views. Usually uses the same data model as
the conceptual level.

Slide 2-36
Three-Schema Architecture
Mappings among schema levels are needed to transform
requests and data. Programs refer to an external schema, and
are mapped by the DBMS to the internal schema for execution.

Data Independence:
• Logical Data Independence: The capacity to change the
conceptual schema without having to change the external
schemas and their application programs.
• Physical Data Independence: The capacity to change the
internal schema without having to change the conceptual
schema.

Slide 2-37
Data Independence

When a schema at a lower level is changed, only the


mappings between this schema and higher-level
schemas need to be changed in a DBMS that fully
supports data independence. The higher-level schemas
themselves are unchanged. Hence, the application
programs need not be changed since they refer to the
external schemas.

Slide 2-38
DBMS Languages
• Data Definition Language (DDL): Used by the DBA and
database designers to specify the conceptual schema of a
database. In many DBMSs, the DDL is also used to define
internal and external schemas (views). In some DBMSs,
separate storage definition language (SDL) and view
definition language (VDL) are used to define internal and
external schemas.
• Data Manipulation Language (DML): Used to specify
database retrievals and updates.
• DML commands (data sublanguage) can be embedded in a
general-purpose programming language (host language),
such as COBOL, C or an Assembly Language.
• Alternatively, stand-alone DML commands can be applied
directly (query language).

Slide 2-39
DBMS Languages

Slide 2-40
DBMS Interfaces
• Stand-alone query language interfaces.
• Programmer interfaces for embedding DML in programming
languages:
• Pre-compiler Approach
• Procedure (Subroutine) Call Approach
• User-friendly interfaces:
• Menu-based, popular for browsing on the web
• Forms-based, designed for naïve users
• Graphics-based (Point and Click, Drag and Drop etc.)
• Natural language: requests in written English
• Combinations of the above

Slide 2-41
Other DBMS Interfaces

• Speech Input and Output


• Web Browser as an interface
• Parametric interfaces (e.g., bank tellers) using
function keys.
• Interfaces for the DBA:
• Creating accounts, granting authorizations
• Setting system parameters
• Changing schemas or access path

Slide 2-42
Database System Utilities
• To perform certain functions such as:
• Loading data stored in files into a database. Includes
data conversion tools.
• Backing up the database periodically on tape.
• Reorganizing database file structures.
• Report generation utilities.
• Performance monitoring utilities.
• Other functions, such as sorting, user monitoring, data
compression, etc.

Slide 2-43
Database system environment

1) DBMS Component Modules


2) Centralized and Client-Server Architectures

44
Typical DBMS Component Modules

45
Centralized and Client-Server Architectures

• Centralized DBMS: combines everything into single


system including- DBMS software, hardware, application
programs and user interface processing software.

Slide 2-46
•Basic Client-Server Architectures:

Specialized Servers with Specialized functions


Clients
DBMS Server

Specialized Servers with Specialized functions:


• File Servers
• Printer Servers
• Web Servers
• E-mail Servers

Slide 2-47
Clients:
•Provide appropriate interfaces and a client-version of the system to
access and utilize the server resources.
•Clients maybe diskless machines or PCs or Workstations with disk
with only the client software installed.
•Connected to the servers via some form of a network
(LAN: local area network, wireless network, etc.)
DBMS Server
• Provides database query and transaction services to the
clients
• Sometimes called query and transaction servers

Slide 2-48
Two Tier Client-Server Architecture
•User Interface Programs and Application Programs run on the client
side
•Interface called ODBC (Open Database Connectivity – see Ch 9)
provides an Application program interface (API) allow client side
programs to call the DBMS. Most DBMS vendors provide ODBC
drivers.
• A client program may connect to several DBMSs.
• Other variations of clients are possible: e.g., in some DBMSs,
more functionality is transferred to clients including data
dictionary functions, optimization and recovery across multiple
servers, etc. In such situations the server may be called the Data
Server.

49
Three Tier Client-Server Architecture
• Common for Web applications
• Intermediate Layer called Application Server or Web Server:
• stores the web connectivity software and the rules and
business logic (constraints) part of the application used to
access the right amount of data from the database server
• acts like a conduit for sending partially processed data
between the database server and the client.
• Additional Features- Security:
• encrypt the data at the server before transmission
• decrypt data at the client

Slide 2-50
51
Classification of DBMSs
• Based on the data model used:
• Traditional: Relational, Network, Hierarchical.
• Emerging: Object-oriented, Object-relational.
• Other classifications:
• Single-user (typically used with micro- computers) vs.
multi-user (most DBMSs).
• Centralized (uses a single computer with one database) vs.
distributed (uses multiple computers, multiple databases)

• Distributed Database Systems have now come to be


known as client server based database systems because they
do not support a totally distributed environment, but rather
a set of database servers supporting a set of clients.
• Etc…….

Slide 2-52
Data Modelling using
Entities and Relationships

53
ER Diagram
• The ER or (Entity Relational Model) is a high-level conceptual data model diagram.
Entity-Relation model is based on the notion of real-world entities and the
relationship between them.
• Helps you to define terms related to entity relationship modeling
• Provide a preview of how all your tables should connect, what fields are going to be
on each table
• Helps to describe entities, attributes, relationships
• ER diagrams are translatable into relational tables which allows you to build
databases quickly
• ER diagrams can be used by database designers as a blueprint for implementing data
in specific software applications
• The database designer gains a better understanding of the information to be
contained in the database with the help of ERP diagram
• ERD is allowed you to communicate with the logical structure of the database to
users

54
Components of the ER Diagram
This model is based on three basic concepts:
• Entities Attributes Relationships

55
Example COMPANY Database
We need to create a database schema design based on the following (simplified)
Requirements of the COMPANY Database:
• The company is organized into DEPARTMENTs. Each department
has a name, number and an employee who manages the
department. We keep track of the start date of the department
manager. A department may have several locations.
• Each department controls a number of PROJECTs. Each project has
a unique name, unique number and is located at a single location.
• We store each EMPLOYEE’s social security number, address, salary,
sex, and birthdate.
• Each employee works for one department but may work on several projects.
• We keep track of the number of hours per week that an employee currently
works on each project.
• We also keep track of the direct supervisor of each employee.

56
• Each employee may have a number of DEPENDENTs.
• For each dependent, we keep track of their name, sex,
birthdate, and relationship to the employee.
• Entities and Attributes
• Entities are specific objects or things in the mini-world that are represented in
the database.
• For example the EMPLOYEE John Smith, the Research DEPARTMENT, the ProductX PROJECT
• Attributes are properties used to describe an entity.
• For example an EMPLOYEE entity may have the attributes Name, SSN, Address, Sex,
BirthDate
• A specific entity will have a value for each of its attributes.
• For example a specific employee entity may have Name='John Smith', SSN='123456789',
Address ='731, Fondren, Houston, TX', Sex='M', BirthDate='09-JAN-55‘
• Each attribute has a value set (or data type) associated with it – e.g. integer,
string, subrange, enumerated type, …

57
Types of Attributes

58
Types of Attributes Examples
• Simple
• Each entity has a single atomic value for the attribute. For example,
SSN or Sex.
• Composite
• The attribute may be composed of several components. For
example, Address (Apt#, House#, Street, City, State, ZipCode,
Country) or Name (FirstName, MiddleName, LastName).
Composition may form a hierarchy where some components are
themselves composite.
• Multi-valued
• An entity may have multiple values for that attribute. For example,
Color of a CAR or PreviousDegrees of a STUDENT. Denoted as
{Color} or {PreviousDegrees}.

Chapter 3-59
Types of Attributes (2)

• In general, composite and multi-valued attributes may be


nested arbitrarily to any number of levels although this is
rare.
• For example, PreviousDegrees of a STUDENT is a
composite multi-valued attribute denoted by
{PreviousDegrees (College, Year, Degree, Field)}.

Chapter 3-60
Entity Types and Key Attributes
• Entities with the same basic attributes are grouped or typed into
an entity type.
• For example, the EMPLOYEE entity type or the PROJECT entity type.
• An attribute of an entity type for which each entity must have a
unique value is called a key attribute of the entity type.
• For example, SSN of EMPLOYEE.
• A key attribute may be composite.
• For example, VehicleTagNumber is a key of the CAR entity type with
components (Number, State).
• An entity type may have more than one key.
• For example, the CAR entity type may have two keys:
• VehicleIdentificationNumber (popularly called VIN) and
• VehicleTagNumber (Number, State), also known as license_plate number.

Chapter 3-61
Entity Type CAR with two keys and a corresponding
Entity Set

62
Entity Set
• Each entity type will have a collection of entities
stored in the database
• Called the entity set
• Previous slide shows three CAR entity instances
in the entity set for CAR
• Same name (CAR) used to refer to both the entity
type and the entity set
• Entity set is the current state of the entities of
that type that are stored in the database

63
Initial Design of Entity Types for the COMPANY
Database Schema
• Based on the requirements, we can identify four initial entity
types in the COMPANY database:
• DEPARTMENT
• PROJECT
• EMPLOYEE
• DEPENDENT
• Their initial design is shown on the following slide
• The initial attributes shown are derived from the requirements
description

64
Initial Design of Entity Types:
EMPLOYEE, DEPARTMENT, PROJECT, DEPENDENT

65
Refining the initial design by introducing relationships
• ER model has three main concepts:
• Entities (and their entity types and entity sets)
• Attributes (simple, composite, multivalued)
• Relationships (and their relationship types and relationship sets)
• A relationship relates two or more distinct entities
with a specific meaning.
• For example, EMPLOYEE John Smith works on the ProductX PROJECT, or
EMPLOYEE Franklin Wong manages the Research DEPARTMENT.

• Relationships of the same type are grouped or typed


into a relationship type.
• For example, the WORKS_ON relationship type in which EMPLOYEEs and
PROJECTs participate, or the MANAGES relationship type in which EMPLOYEEs
and DEPARTMENTs participate.

66
Relationship instances of the WORKS_FOR N:1 relationship
between EMPLOYEE and DEPARTMENT

67
Relationship instances of the M:N WORKS_ON
relationship between EMPLOYEE and PROJECT

68
• Relationship Type:
• Is the schema description of a relationship
• Identifies the relationship name and the participating entity
types
• Also identifies certain relationship constraints
• Relationship Set:
• The current set of relationship instances represented in the
database
• The current state of a relationship type

• In ER diagrams, we represent the relationship type as follows:


• Diamond-shaped box is used to display a relationship type
• Connected to the participating entity types via straight lines

69
Refining the COMPANY database schema by
introducing relationships
• By examining the requirements, six relationship types are
identified
• All are binary relationships( degree 2)
• Listed below with their participating entity types:
• WORKS_FOR (between EMPLOYEE, DEPARTMENT)
• MANAGES (also between EMPLOYEE, DEPARTMENT)
• CONTROLS (between DEPARTMENT, PROJECT)
• WORKS_ON (between EMPLOYEE, PROJECT)
• SUPERVISION (between EMPLOYEE (as subordinate),
• EMPLOYEE (as supervisor))
• DEPENDENTS_OF (between EMPLOYEE, DEPENDENT)

70
ER DIAGRAM

71
Recursive Relationship Type
• An relationship type whose with the same participating
entity type in distinct roles
• Example: the SUPERVISION relationship
• EMPLOYEE participates twice in two distinct roles:
• supervisor (or boss) role
• supervisee (or subordinate) role
• Each relationship instance relates two distinct
EMPLOYEE entities:
• One employee in supervisor role
• One employee in supervisee role

72
Weak Entity Types
• An entity that does not have a key attribute
• A weak entity must participate in an identifying relationship
type with an owner or identifying entity type
• Entities are identified by the combination of:
• A partial key of the weak entity type
• The particular entity they are related to in the identifying entity type
• Example:
• A DEPENDENT entity is identified by the dependent’s first name, and the specific
EMPLOYEE with whom the dependent is related
• Name of DEPENDENT is the partial key
• DEPENDENT is a weak entity type
• EMPLOYEE is its identifying entity type via the identifying relationship type
DEPENDENT_OF

73
Constraints on Relationships
• Constraints on Relationship Types(Also known as ratio
constraints)
• Cardinality Ratio (specifies maximum participation)
• One-to-one (1:1)
• One-to-many (1:N) or Many-to-one (N:1)
• Many-to-many (M:N)
• Existence Dependency Constraint (specifies minimum
• participation) (also called participation constraint)
• zero (optional participation, not existence-dependent)
• one or more (mandatory participation, existence-dependent)

74
Many-to-one (N:1) RELATIONSHIP

Chapter 3-75
Many-to-many (M:N) RELATIONSHIP

Chapter 3-76
A RECURSIVE RELATIONSHIP SUPERVISION

© The Benjamin/Cummings Publishing Company, Inc. 1994, Elmasri/Navathe, Fundamentals of Database Systems, Second Edition

Chapter 3-77
Alternative (min, max) notation for relationship structural constraints:
• Specified on each participation of an entity type E in a relationship type R
• Specifies that each entity e in E participates in at least min and at most max
relationship instances in R
• Default(no constraint): min=0, max=n (signifying no limit)
• Must have min<=max, min>=0, max >=1
• Derived from the knowledge of mini-world constraints
• Examples
• A department has exactly one manager and an employee can manage at
most one department.
• Specify (0,1) for participation of EMPLOYEE in MANAGES
• Specify (1,1) for participation of DEPARTMENT in MANAGES
• An employee can work for exactly one department but a department can
have any number of employees.
• Specify (1,1) for participation of EMPLOYEE in WORKS_FOR
• Specify (0,n) for participation of DEPARTMENT in WORKS_FOR

78
The (min,max) notation for relationship
constraints

79
COMPANY ER Schema Diagram using (min,
max) notation

80
Summary of notation for ER diagrams

81
82
83
ER DIAGRAM FOR A BANK
DATABASE

© The Benjamin/Cummings Publishing Company, Inc. 1994, Elmasri/Navathe, Fundamentals of Database Systems, Second Edition

Chapter 3-84
Relationships of Higher Degree
 Relationship types of degree 2 are called binary
 Relationship types of degree 3 are called ternary and
of degree n are called n-ary
 In general, an n-ary relationship is not equivalent to n
binary relationships
 Higher-order relationships discussed further in
Chapter 4

Chapter 3-85
Example of a ternary relationship

86
Some of the Currently Available Automated
Database Design Tools

87
Thank YOU

88
Database Management Systems
Module 2
Structured Query
Language(SQL)
&
The Relational Algebra

Presidency University, Bengaluru


Contents
 The Relational Data Model
 Characteristics Of Relations
 Types of Keys
 Relational Database Constraints
 Update Operations on Relations

 SQL data definition and datatypes


 Data Definition, Constraints, and Schema Changes
 Retrieval Queries in SQL
 Complex Queries, Triggers, Views
 Query Languages
 Relational Algebra
 Select Operation - Example
 Project Operation
 Set Opertions
 Rename Operation
 Binary relational Operations(Joins/Division)
 Queries

Slide 1-2
Relational Model Concepts
Example of a Relation

• Key of a Relation:
• Each row has a value of a data item (or set of items) that uniquely
identifies that row in the table Called the key
• In the STUDENT table, SSN is the key
• Sometimes row-ids or sequential numbers are assigned as keys to
identify the rows in a table Called artificial key or surrogate key

3
• The Schema (or description) of a Relation:
• Denoted by R(A1, A2, .....An)
• R is the name of the relation
• The attributes of the relation are A1, A2, ..., An
• Each attribute has a domain or a set of valid values.
• The attribute name designates the role played by a
domain in a relation
• A tuple is an ordered set of values (enclosed in
angled brackets ‘< … >’)
• A relation is a set of such tuples (rows)

4
Informal Terms Formal Terms
Table Relation
Column Header Attribute
All possible Column Domain
Values
Row Tuple
Table Definition Schema of a Relation
Populated Table State of the Relation

5
Characteristics Of Relations
• Ordering of tuples in a relation r(R):
• The tuples are not considered to be ordered, even though they appear to
be in the tabular form.
• Ordering of attributes in a relation schema R (and of values
within each tuple):
• We will consider the attributes in R(A1, A2, ..., An) and the values in t=<v1,
v2, ..., vn> to be ordered .
• (However, a more general alternative definition of relation
does not require this ordering).

6
• Values in a tuple:
• All values are considered atomic (indivisible).
• Each value in a tuple must be from the domain of the attribute
for that column
• If tuple t = <v1, v2, …, vn> is a tuple (row) in the relation state r of
R(A1, A2, …, An)
• Then each vi must be a value from dom(Ai)
• A special null value is used to represent values that are
unknown or inapplicable to certain tuples.

7
What are Keys in DBMS?
• KEYS in DBMS is an attribute or set of attributes which
helps you to identify a row(tuple) in a relation(table).
• They allow you to find the relation between two tables.
Keys help you uniquely identify a row in a table by a
combination of one or more columns in that table.
• Key is also helpful for finding unique record or row from
the table. Database key is also helpful for finding unique
record or row from the table.

8
Types of Keys
• Super Key - A super key is a group of single or multiple
keys which identifies rows in a table.
• Primary Key - is a column or group of columns in a table
that uniquely identify every row in that table.
• Candidate Key - is a set of attributes that uniquely
identify tuples in a table. Candidate Key is a super key
with no repeated attributes.
• Alternate Key - is a column or group of columns in a
table that uniquely identify every row in that table.

9
Types of Keys
• Foreign Key - is a column that creates a relationship
between two tables. The purpose of Foreign keys is to
maintain data integrity and allow navigation between
two different instances of an entity.
• Compound Key - has two or more attributes that allow
you to uniquely recognize a specific record. It is possible
that each column may not be unique by itself within the
database.
• Composite Key - An artificial key which aims to
uniquely identify each record is called a surrogate key.
These kind of key are unique because they are created
when you don't have any natural primary key.

10
Constraints:
• Constraints are conditions that must hold on all valid
relation states.
• There are three main types of constraints in the relational
model:
• Key constraints
• Entity integrity constraints
• Referential integrity constraints
• Another implicit constraint is the domain constraint
• Every value in a tuple must be from the domain of its attribute
(or it could be null, if allowed for that attribute)

11
Key Constraints
• Superkey of R:
• Is a set of attributes SK of R with the following condition:
• No two tuples in any valid relation state r(R) will have the same
value for SK
• That is, for any distinct tuples t1 and t2 in r(R), t1[SK] ¹ t2[SK]
• This condition must hold in any valid state r(R)
• Key of R:
• A "minimal" superkey
• That is, a key is a superkey K such that removal of any attribute
from K results in a set of attributes that is not a superkey (does
not possess the superkey uniqueness property)

12
Key Constraints (continued)
• Example: Consider the CAR relation schema:
• CAR(State, Reg#, SerialNo, Make, Model, Year)
• CAR has two keys:
• Key1 = {State, Reg#}
• Key2 = {SerialNo}
• Both are also superkeys of CAR
• {SerialNo, Make} is a superkey but not a key.
• In general:
• Any key is a superkey (but not vice versa)
• Any set of attributes that includes a key is a superkey
• A minimal superkey is also a key

13
Key Constraints (continued)
• If a relation has several candidate keys, one is chosen arbitrarily to be
the primary key.
• The primary key attributes are underlined.
• Example: Consider the CAR relation schema:
• CAR(State, Reg#, SerialNo, Make, Model, Year)
• We chose SerialNo as the primary key
• The primary key value is used to uniquely identify each tuple in a relation
• Provides the tuple identity
• Also used to reference the tuple from another tuple
• General rule: Choose as primary key the smallest of the candidate keys
(in terms of size)
• Not always applicable – choice is sometimes subjective

14
COMPANY Database Schema

15
Entity Integrity
• The primary key attributes PK of each relation schema R in S
cannot have null values in any tuple of r(R).
• This is because primary key values are used to identify the
individual tuples.
• t[PK]  null for any tuple t in r(R)
• If PK has several attributes, null is not allowed in any of
these attributes
• Note: Other attributes of R may be constrained to disallow null
values, even though they are not members of the primary key.

16
Referential Integrity
• A constraint involving two relations
• The previous constraints involve a single relation.
• Used to specify a relationship among tuples in two relations:
• The referencing relation and the referenced relation.
• Tuples in the referencing relation R1 have attributes FK (called
foreign key attributes) that reference the primary key attributes
PK of the referenced relation R2.
• A tuple t1 in R1 is said to reference a tuple t2 in R2 if t1[FK] = t2[PK].
• A referential integrity constraint can be displayed in a relational
database schema as a directed arc from R1.FK to R2.

17
Referential Integrity Constraints for COMPANY database

18
Populated database state
• Each relation will have many tuples in its current relation
state
• The relational database state is a union of all the
individual relation states
• Whenever the database is changed, a new state arises
• Basic operations for changing the database:
• INSERT a new tuple in a relation
• DELETE an existing tuple from a relation
• MODIFY an attribute of an existing tuple

19
Populated database state for COMPANY

20
Update Operations on Relations
• INSERT a tuple.
• DELETE a tuple.
• MODIFY a tuple.
• Integrity constraints should not be violated by the update operations.
• Several update operations may have to be grouped together.
• Updates may propagate to cause other updates automatically. This may be
necessary to maintain integrity constraints.
In case of integrity violation, several actions can be taken:
• Cancel the operation that causes the violation (RESTRICT or REJECT option)
• Perform the operation but inform the user of the violation
• Trigger additional updates so the violation is corrected (CASCADE option, SET
NULL option)
• Execute a user-specified error-correction routine

21
Possible violations for each operation
• INSERT may violate any of the constraints:
• Domain constraint:
• if one of the attribute values provided for the new tuple is not of the
specified attribute domain
• Key constraint:
• if the value of a key attribute in the new tuple already exists in
another tuple in the relation
• Referential integrity:
• if a foreign key value in the new tuple references a primary key value
that does not exist in the referenced relation
• Entity integrity:
• if the primary key value is null in the new tuple

22
Possible violations for each operation (Cont..)
• DELETE may violate only referential integrity:
• If the primary key value of the tuple being deleted is referenced
from other tuples in the database
• Can be remedied by several actions: RESTRICT, CASCADE, SET NULL
• RESTRICT option: reject the deletion
• CASCADE option: propagate the new primary key value into the foreign keys of the
referencing tuples
• SET NULL option: set the foreign keys of the referencing tuples to NULL
• One of the above options must be specified during database design for each
foreign key constraint

23
Possible violations for each operation (Cont..)
• UPDATE may violate domain constraint and NOT NULL
constraint on an attribute being modified
• Any of the other constraints may also be violated,
depending on the attribute being updated:
• Updating the primary key (PK):
• Similar to a DELETE followed by an INSERT
• Need to specify similar options to DELETE
• Updating a foreign key (FK):
• May violate referential integrity
• Updating an ordinary attribute (neither PK nor FK):
• Can only violate domain constraints

24
SQL DATA DEFINITION AND DATATYPES
SQL Schema:
An SQL schema is identified by a schema name and includes an authorization identifier
name to indicate user or account who owns the schema, as well as descriptor for each
elements in the schema.
Schema creation with authorization:
CREATE SCHEMA DATABASE_NAME AUTHORIZATION IDENTIFIER;
Eg: CREATE SCHEMA COMPANY AUTHORIZATION ‘JSMITH’;
Catalog: Named collection of schemas.
Data Definition, Constraints, and Schema Changes
• Used to CREATE, DROP, and ALTER the descriptions of the tables (relations) of a database
Creating a Database
• Syntax: CREATE DATABASE database_name;
Creating a Table
• Syntax: CREATE TABLE table_name (Column_name datatype[(size)], Column_name
datatype[(size)], );
CREATE TABLE
• Specifies a new base relation by giving it a name, and specifying each of its
attributes and their data types (INTEGER, FLOAT, DECIMAL(i,j), CHAR(n),
VARCHAR(n))
• A constraint NOT NULL may be specified on an attribute
CREATE TABLE DEPARTMENT (
DNAME VARCHAR(10) NOT NULL,
DNUMBER INTEGER NOT NULL,
MGRSSN CHAR(9),
MGRSTARTDATE CHAR(9) );
• In SQL, can use the CREATE TABLE command for specifying the primary key attributes,
secondary keys, and referential integrity constraints (foreign keys).
• Key attributes can be specified via the PRIMARY KEY and UNIQUE phrases
CREATE TABLE DEPT (
DNAME VARCHAR(10) NOT NULL,
DNUMBER INTEGER NOT NULL,
MGRSSN CHAR(9),
MGRSTARTDATE CHAR(9),
PRIMARY KEY (DNUMBER),
UNIQUE (DNAME),
FOREIGN KEY (MGRSSN) REFERENCES EMP );

Slide 8- 26
Create tables for Company Database
COMPANY ER Schema Diagram

Chapter 3-28
DROP TABLE
• Used to remove a relation (base table) and its definition
• The relation can no longer be used in queries, updates, or any other commands
since its description no longer exists
• Example:

DROP TABLE DEPENDENT;

ALTER TABLE
• Used to add an attribute to one of the base relations
• The new attribute will have NULLs in all the tuples of the relation right
after the command is executed; hence, the NOT NULL constraint is not
allowed for such an attribute
• Example:
ALTER TABLE EMPLOYEE ADD JOB VARCHAR(12);

• The database users must still enter a value for the new attribute JOB for each
EMPLOYEE tuple.
• This can be done using the UPDATE command.

Slide 8- 29
Attribute Data Types and Domains in SQL
Following broad categories of data types exist in most databases:
• String Data
• Numeric Data
• Temporal Data
• Bit String
• Boolean
• DDL - String Data
• Fixed Length:
• Occupies the same length of space in memory no matter how much data is stored in
them.
• Syntax: char(n) where n is the length of the String
e.g. name char(50)
• If the variable stored for name is ‘Presidency’ the extra 40 fields are padded with
blanks
DDL - Numeric Data Types
• Store all the data related to purely numeric data.
• Some numeric data may also be stored as a character field e.g. zip codes
• Common Numeric Types:
• Decimal Floating point number
• Float Floating point number
• Integer(size) Integer of specified length
• Money A number which contains exactly two digits
after the decimal point
• Number A standard number field that can
hold a floating point data

31
DDL - Temporal Data Types
• These represent the dates and time:
• Three basic types are supported:
• Dates
• Times
• Date-Time Combinations
• MySQL comes with the following data types for storing a
date or a date/time value in the database:
• DATE - format YYYY-MM-DD Bit String :
• DATETIME - format: YYYY-MM-DD HH:MI:SS
• TIMESTAMP - format: YYYY-MM-DD HH:MI:SS • BIT(n)- Fixed Length(n)
• BIT varying(n) -Varying
• YEAR - format YYYY or YY length string
Boolean:
• Takes value true or false.
• Because of the presence of
NULL values Boolean also
32
take the value unknown.
Constraints in SQL:
• NOT NULL - Ensures that a column cannot have a NULL value
• UNIQUE - Ensures that all values in a column are different
• PRIMARY KEY - A combination of a NOT NULL and UNIQUE.
Uniquely identifies each row in a table
• FOREIGN KEY - Uniquely identifies a row/record in another table
• CHECK - Ensures that all values in a column satisfies a specific
condition
• DEFAULT - Sets a default value for a column when no value is
specified

33
DDL - Specifying Keys- Introduction
• Unique keyword is used to specify keys.
• This ensures that duplicate rows are not created in
the database.
• Both Primary keys and Candidate Keys can be specified in the database.
• Once a set of columns has been declared unique any data entered that
duplicates the data in these columns is rejected.
• Specifying a single column as unique:
Example:
CREATE TABLE Student
(snum Number,
sname varchar(20),
major varchar(10),
level char(2),
UNIQUE (name));

34
DDL - Specifying Keys- Multiple Columns
• Specifying multiple columns as unique:
• Example:
CREATE TABLE Student
(snum Number,
sname varchar(20),
major varchar(10),
level varchar(10),
UNIQUE(snum, sname));
• Here both name and snum combination are declared as candidate keys.

35
DDL - Specifying Keys- Primary Key

• Specifying multiple columns as unique:


• To specify the Primary Key the Primary Key clause is
used
• Example:
CREATE TABLE Student
( snum Number,
sname varchar(20),
major varchar(10),
level
varchar(10),
DOB DATE,
PRIMARY KEY (snum),
UNIQUE (sname),
);

36
DDL - Specifying Keys- Single and MultiColumn Keys

• Single column keys can be defined at the column level instead of at the table level at
the end of the field descriptions.
• MultiColumn keys still need to be defined separately at the table level
CREATE TABLE Student ( snum int PRIMARY KEY,
Sname varchar(20) UNIQUE,
major varchar(10),
level varchar(10),
DOB date,
Unique(DOB,MAJOR));

DDL - Specifying Keys- Foreign Keys


• References clause is used to create a relationship between a set of columns in one
table and a candidate key(primary key) in the table that is being referenced.

• Example: CREATE TABLE ENROLL


( cname varchar(20) references Course(cname) ,
snum int REFERENCES Student(snum));

Creates a relationship between Student and Course.


37
The Mgr_ssn Example
CREATE TABLE DEPARTMENT (

Mgr_ssn CHAR(9),

FOREIGN KEY (Mgr_ssn)
REFERENCES EMPLOYEE (Ssn)
ON DELETE ???
ON UPDATE ???
);

38
Referential Integrity Options
• Causes of referential integrity violation for a foreign key FK (consider the
Mgr_ssn of DEPARTMENT).
• On Delete: when deleting the foreign tuple
• What to do when deleting the manager tuple in EMPLOYEE ?
• On Update: when updating the foreign tuple
• What to do when updating/changing the SSN of the manager tuple in
EMPLOYEE is changed ?
• Actions when the above two causes occur.
• Set Null: the Mgr_ssn is set to null.
• Set Default: the Mgr_ssn is set to the default value.
• Cascade: the Mgr_ssn is updated accordingly
• If the manager is deleted, the department is also deleted.

39
Referential Integrity Options
An Example:
Create table EMP(

ESSN CHAR(9),
DNO INTEGER DEFAULT 1,
SUPERSSN CHAR(9),
PRIMARY KEY (ESSN),
FOREIGN KEY (DNO) REFERENCES DEPT
ON DELETE SET DEFAULT
ON UPDATE CASCADE,
FOREIGN KEY (SUPERSSN) REFERENCES EMP ON DELETE SET NULL ON
UPDATE CASCADE);

40
DDL - Constraints- Disallowing Null Values
Disallowing Null Values:
• Null values entered into a column means that the data in
not known.
• These can cause problems in Querying the database.
• Specifying Primary Key automatically prevents null being
entered in columns which specify the primary key
• Not Null clause is used in preventing null values from being entered in a column.

• Example:
CREATE TABLE Student
( snum number PRIMARY KEY,
sname varchar(20) NOT NULL,
major varchar(10) NOT NULL,
level varchar(10) NOT NULL
DOB date);
• Null clause can be used to explicitly allow null values in a column
also
41
DDL - Constraints- Value Constraints
Value Constraints:
• Allows value inserted in the column to be checked
condition in the column constraint.
• Check clause is used to create a constraint in SQL
• Example:
CREATE TABLE STUDENT
(snum NumberPRIMARY KEY,
sname varchar(20),
Age Number check (Age > = 50000));

• Table level constraints can also be defined using the Constraint keyword
Example:
CREATE TABLE STUDENT
(SNUM Number PRIMARY KEY,
sname varchar(20) not null,
age Number ,
CONSTRAINT age_constraint Check (age between 17 and 22));

Such constraints can be activated and deactivated as required.


42
DDL - Constraints- Default Value
Default Value:
• A default value can be inserted in any
column by using the Default keyword.
Example:

CREATE TABLE STUDENT(


SNUM NUMBER NOT NULL UNIQUE PRIMARY KEY,
SNAME varchar(20) NOT NULL,
MAJOR VARCHAR(10) DEFAULT ‘CSE’,
DOB DATE DEFAULT NULL );

43
DDL -Constraints- AUTO INCREMENT
• Auto-increment allows a unique number to be generated automatically when a new
record is inserted into a table.
• Often this is the primary key field that we would like to be created automatically every
time a new record is inserted.
• The following SQL statement defines the “SNUM" column to be an auto-increment
primary key field in the "Persons" table:
• CREATE TABLE STUDENT(
SNUM int NOT NULL AUTO_INCREMENT,
LastName varchar(255) NOT NULL,
FirstName varchar(255),
Age int,
PRIMARY KEY (SNUM));

44
continued..
• MySQL uses the AUTO_INCREMENT keyword to perform an auto-increment
feature.
• By default, the starting value for AUTO_INCREMENT is 1, and it will
increment by 1 for each new record.
• To let the AUTO_INCREMENT sequence start with another value, use the
following SQL statement:
• ALTER TABLE STUDENT AUTO_INCREMENT=100;
• To insert a new record into the “STUDENT" table, we will NOT have to
specify a value for the “SNUM" column (a unique value will be added
automatically):
• INSERT INTO STUDENT (Sname,Major, Level,DOB)
VALUES ('Lakshman', ‘IS‘, ’JR’, ‘2001-05-01’);

45
Retrieval Queries in SQL
• SQL has one basic statement for retrieving information from a database; the
SELECT statement
• This is not the same as the SELECT operation of the relational algebra
• Important distinction between SQL and the formal relational model:
• SQL allows a table (relation) to have two or more tuples that are identical in all their
attribute values
• Hence, an SQL relation (table) is a multi-set (sometimes called a bag) of tuples; it is
not a set of tuples
• SQL relations can be constrained to be sets by specifying PRIMARY KEY or UNIQUE
attributes, or by using the DISTINCT option in a query
• A bag or multi-set is like a set, but an element may appear more than once.
• Example: {A, B, C, A} is a bag. {A, B, C} is also a bag that also is a set.
• Bags also resemble lists, but the order is irrelevant in a bag.
• Example:
• {A, B, A} = {B, A, A} as bags
• However, [A, B, A] is not equal to [B, A, A] as lists

Slide 8- 47
Retrieval Queries in SQL (contd.)
• Basic form of the SQL SELECT statement is called a mapping or a SELECT-
FROM-WHERE block

SELECT <attribute list>


FROM <table list>
WHERE <condition>

• <attribute list> is a list of attribute names whose


values are to be retrieved by the query
• <table list> is a list of the relation names required to
process the query
• <condition> is a conditional (Boolean) expression
that identifies the tuples to be retrieved by the query

Slide 8- 48
Relational Database Schema

Slide 8- 49
Populated Database

Slide 8- 50
Simple SQL Queries
• Basic SQL queries correspond to using the following operations of the
relational algebra:
• SELECT
• PROJECT
• JOIN
• All subsequent examples use the COMPANY database
• Example of a simple query on one relation
• Query 0: Retrieve the birthdate and address of the employee whose name is 'John B.
Smith'.
Q0: SELECT BDATE, ADDRESS
FROM EMPLOYEE
WHERE FNAME='John' AND MINIT='B’
AND LNAME='Smith’;

• Similar to a SELECT-PROJECT pair of relational algebra operations:


• The SELECT-clause specifies the projection attributes and the WHERE-clause
specifies the selection condition
• However, the result of the query may contain duplicate tuples

Slide 8- 51
Simple SQL Queries (contd.)

• Query 1: Retrieve the name and address of all employees who work for the
'Research' department.

Q1: SELECT FNAME, LNAME, ADDRESS


FROM EMPLOYEE, DEPARTMENT
WHERE DNAME='Research' AND DNUMBER=DNO;

• Similar to a SELECT-PROJECT-JOIN sequence of relational algebra operations


• (DNAME='Research') is a selection condition (corresponds to a SELECT
operation in relational algebra)
• (DNUMBER=DNO) is a join condition (corresponds to a JOIN operation in
relational algebra)

Slide 8- 52
Simple SQL Queries (contd.)
• Query 2: For every project located in 'Stafford', list the project number, the
controlling department number, and the department manager's last name,
address, and birthdate.

Q2: SELECT PNUMBER, DNUM, LNAME, BDATE, ADDRESS


FROM PROJECT, DEPARTMENT, EMPLOYEE
WHERE DNUM=DNUMBER AND MGRSSN=SSN
AND PLOCATION='Stafford‘;

• In Q2, there are two join conditions


• The join condition DNUM=DNUMBER relates a project to
its controlling department
• The join condition MGRSSN=SSN relates the controlling
department to the employee who manages that
department.

Slide 8- 53
Aliases, * and DISTINCT, Empty WHERE-clause

• In SQL, we can use the same name for two (or more) attributes as long as the
attributes are in different relations
• A query that refers to two or more attributes with the same name must
qualify the attribute name with the relation name by prefixing the relation
name to the attribute name
• Example:

• EMPLOYEE.LNAME, DEPARTMENT.DNAME

Slide 8- 54
ALIASES
• Some queries need to refer to the same relation twice
• In this case, aliases are given to the relation name
• Query 8: For each employee, retrieve the employee's name, and the name of his or
her immediate supervisor.

Q8: SELECT E.FNAME, E.LNAME, S.FNAME, S.LNAME


FROM EMPLOYEE E S
WHERE E.SUPERSSN=S.SSN;

• In Q8, the alternate relation names E and S are called


aliases or tuple variables for the EMPLOYEE relation
• We can think of E and S as two different copies of
EMPLOYEE; E represents employees in role of supervisees
and S represents employees in role of supervisors

Slide 8- 55
ALIASES (contd.)

• Aliasing can also be used in any SQL query for


convenience
• Can also use the AS keyword to specify aliases

Q8: SELECT E.FNAME, E.LNAME,


S.FNAME, S.LNAME
FROM EMPLOYEE AS E,
EMPLOYEE AS S
WHERE E.SUPERSSN=S.SSN;

Slide 8- 56
UNSPECIFIED WHERE-clause
• A missing WHERE-clause indicates no condition; hence, all tuples of
the relations in the FROM-clause are selected
• This is equivalent to the condition WHERE TRUE
• Query 9: Retrieve the SSN values for all employees.

• Q9: SELECT SSN


FROM EMPLOYEE;

• If more than one relation is specified in the FROM-clause and


there is no join condition, then the CARTESIAN PRODUCT of tuples
is selected
• Ex:
Q10: SELECT SSN, DNAME
FROM EMPLOYEE, DEPARTMENT;

• It is extremely important not to overlook specifying any


selection and join conditions in the WHERE-clause; otherwise,
incorrect and very large relations may result

Slide 8- 57
USE OF *

• To retrieve all the attribute values of the selected tuples, a * is used, which
stands for all the attributes
Examples:

Q1C: SELECT *
FROM EMPLOYEE
WHERE DNO=5;

Q1D: SELECT *
FROM EMPLOYEE, DEPARTMENT
WHERE DNAME='Research' AND
DNO=DNUMBER;

Slide 8- 58
USE OF DISTINCT

• SQL does not treat a relation as a set; duplicate tuples can appear
• To eliminate duplicate tuples in a query result, the keyword DISTINCT is
used
• For example, the result of Q11 may have duplicate SALARY values whereas
Q11A does not have any duplicate values

Q11: SELECT ALL SALARY


FROM EMPLOYEE;
Q11A: SELECT DISTINCT SALARY
FROM EMPLOYEE;

Slide 8- 59
SET OPERATIONS

• SQL has directly incorporated some set operations


• There is a union operation (UNION), and in some versions of SQL there are
set difference (MINUS) and intersection (INTERSECT) operations
• The resulting relations of these set operations are sets of tuples; duplicate
tuples are eliminated from the result
• The set operations apply only to union compatible relations; the two
relations must have the same attributes and the attributes must appear in
the same order

Slide 8- 60
SET OPERATIONS (contd.)

• Query 4: Make a list of all project numbers for projects that involve an employee whose last
name is 'Smith' as a worker or as a manager of the department that controls the project.

Q4: (SELECT DISTINCT PNUMBER


FROM PROJECT, DEPARTMENT, EMPLOYEE
WHERE DNUM=DNUMBER AND
MGRSSN=SSN AND LNAME='Smith')
UNION
(SELECT PNUMBER
FROM PROJECT, WORKS_ON, EMPLOYEE
WHERE PNUMBER=PNO AND
ESSN=SSN AND NAME='Smith');

Slide 8- 61
SUBSTRING PATTERN MATCHING AND ARITHMATIC
OPERATORS
In SQL, the LIKE comparison operator is used for string pattern
matching.
• Partial strings are specified used two reserved characters:
- % replaces an arbitrary number of zero or more
characters.
- Underscore(_) replaces a single character.
Q12: Retreive all employees whose address is in Houston, Texas.
SELECT Fname, Lname
FROM Employee
WHERE Address LIKE ‘%Houston,Tex%’;
Q12A: Find all the employees who were born during the 1950s.
SELECT Fname, Lname
FROM Employee
WHERE Bdate LIKE ‘_ _ 5_ _ _ _ _ _ _’;

Q13: Show the resulting salaries if every employee working on the ‘ProductX’ project is
given a 10% rise.
SELECT E.Fname, E.Lname, 1.1*E.Salary AS Increased_Salary
FROM Employee AS E, Works_On AS W, Project AS P
WHERE E.SSN = W.ESSN and W.Pno = P.Pno and P.Pname = ‘ProductX’;

Q14: Retrieve all employees in dept. No 5 whose salary is between $30000 and
$40000.
SELECT * FROM Employee WHERE (Salary BETWEEN 30000 AND
40000) AND Dno = 5;
INSERT
• In its simplest form, it is used to add one or more tuples to a
relation
• Attribute values should be listed in the same order as the
attributes were specified in the CREATE TABLE command
• Example:
U1: INSERT INTO EMPLOYEE
VALUES ('Richard','K','Marini', '653298653', '30-DEC-52',
'98 Oak Forest,Katy,TX', 'M', 37000,'987654321', 4 );

• An alternate form of INSERT specifies explicitly the attribute names that


correspond to the values in the new tuple
• Attributes with NULL values can be left out
• Example: Insert a tuple for a new EMPLOYEE for whom we only know the
FNAME, LNAME, and SSN attributes.
U1A: INSERT INTO EMPLOYEE (FNAME, LNAME, SSN)
VALUES ('Richard', 'Marini', '653298653');

Slide 8- 64
INSERT (contd.)
• Important Note: Only the constraints specified in the DDL
commands are automatically enforced by the DBMS when
updates are applied to the database
• Another variation of INSERT allows insertion of multiple tuples
resulting from a query into a relation
• Example: Suppose we want to create a temporary table that has the name, number
of employees, and total salaries for each department.
• A table DEPTS_INFO is created by U3A, and is loaded with the summary
information retrieved from the database by the query in U3B.
U3A: CREATE TABLE DEPTS_INFO
(DEPT_NAME VARCHAR(10),
NO_OF_EMPS INTEGER,
TOTAL_SAL INTEGER);

U3B: INSERT INTO DEPTS_INFO (DEPT_NAME,


NO_OF_EMPS, TOTAL_SAL)
SELECT DNAME, COUNT (*), SUM (SALARY)
FROM DEPARTMENT, EMPLOYEE
WHERE DNUMBER=DNO
GROUP BY DNAME ;

Slide 8- 65
INSERT (contd.)
• Note: The DEPTS_INFO table may not be up-to-date if we
change the tuples in either the DEPARTMENT or the
EMPLOYEE relations after issuing U3B. We have to create
a view (see later) to keep such a table up to date.
DELETE
• Removes tuples from a relation
• Includes a WHERE-clause to select the tuples to be deleted
• Referential integrity should be enforced
• Tuples are deleted from only one table at a time (unless CASCADE is specified
on a referential integrity constraint)
• A missing WHERE-clause specifies that all tuples in the relation are to be
deleted; the table then becomes an empty table
• The number of tuples deleted depends on the number of tuples in the relation
that satisfy the WHERE-clause

Slide 8- 66
DELETE (contd.)

• Examples:
U4A: DELETE FROM EMPLOYEE
WHERE LNAME='Brown’;

U4B: DELETE FROM EMPLOYEE


WHERE SSN='123456789’;

U4C: DELETE FROM EMPLOYEE


WHERE DNO IN
(SELECT DNUMBER
FROM DEPARTMENT
WHERE DNAME='Research');

U4D: DELETE FROM EMPLOYEE;

Slide 8- 67
UPDATE
• Used to modify attribute values of one or more selected tuples
• A WHERE-clause selects the tuples to be modified
• An additional SET-clause specifies the attributes to be modified and their
new values
• Each command modifies tuples in the same relation
• Referential integrity should be enforced
• Example: Change the location and controlling department number of
project number 10 to 'Bellaire' and 5, respectively.

U5: UPDATE PROJECT


SET PLOCATION = 'Bellaire',
DNUM = 5
WHERE PNUMBER=10;

Slide 8- 68
UPDATE (contd.)

• Example: Give all employees in the 'Research' department a 10% raise in


salary.
U6: UPDATE EMPLOYEE
SET SALARY = SALARY *1.1
WHERE DNO IN (SELECT DNUMBER
FROM DEPARTMENT
WHERE DNAME='Research');

• In this request, the modified SALARY value depends on the original SALARY
value in each tuple
• The reference to the SALARY attribute on the right of = refers to the old
SALARY value before modification
• The reference to the SALARY attribute on the left of = refers to the new
SALARY value after modification

Slide 8- 69
MORE COMPLEX SQL
RETRIEVAL QUERIES
COMPARISON INVOLVING NULL AND 3 VALUED LOGIC

SQL has various rules for dealing with NULL values. The NULL is used to
represent a missing values but that is usually has one of the 3 different
interpretations
Populated Database

Slide 8- 73
NESTING OF QUERIES
• A complete SELECT query, called a nested query, can be specified within the
WHERE-clause of another query, called the outer query
• Many of the previous queries can be specified in an alternative form using
nesting
• Query 1: Retrieve the name and address of all employees who work for the
'Research' department.

Q1: SELECT FNAME, LNAME, ADDRESS


FROM EMPLOYEE
WHERE DNO IN (SELECT DNUMBER
FROM DEPARTMENT
WHERE DNAME='Research' );
• The nested query selects the number of the 'Research' department
• The outer query select an EMPLOYEE tuple if its DNO value is in the result of either
nested query
• The comparison operator IN compares a value v with a set (or multi-set) of values V, and
evaluates to TRUE if v is one of the elements in V
• In general, we can have several levels of nested queries
• A reference to an unqualified attribute refers to the relation declared in the innermost
nested query
• In this example, the nested query is not correlated with the outer query

Slide 8- 74
CORRELATED NESTED QUERIES
• If a condition in the WHERE-clause of a nested query references an attribute of a
relation declared in the outer query, the two queries are said to be correlated
• The result of a correlated nested query is different for each tuple (or combination
of tuples) of the relation(s) the outer query

• Query 12: Retrieve the name of each employee who has a dependent with the same
first name as the employee.

Q12: SELECT E.FNAME, E.LNAME


FROM EMPLOYEE AS E
WHERE E.SSN IN
(SELECT D.ESSN
FROM DEPENDENT AS D
WHERE E.ESSN=E.SSN AND
E.FNAME=DEPENDENT_NAME);

Slide 8- 75
CORRELATED NESTED QUERIES (contd.)

• In Q12, the nested query has a different result in the outer query
• A query written with nested SELECT... FROM... WHERE... blocks and using the = or IN
comparison operators can always be expressed as a single block query. For example,
Q12 may be written as in Q12A
Q12A: SELECT E.FNAME, E.LNAME
FROM EMPLOYEE E, DEPENDENT D
WHERE E.SSN=D.ESSN AND
E.FNAME=D.DEPENDENT_NAME;

• Nested Subqueries Versus Correlated Subqueries :


• With a normal nested subquery, the inner SELECT query runs first and executes once,
returning values to be used by the main query. A correlated subquery, however,
executes once for each candidate row considered by the outer query. In other words,
the inner query is driven by the outer query.

Slide 8- 76
CORRELATED NESTED QUERIES (contd.)
• The original SQL as specified for SYSTEM R also had a CONTAINS
comparison operator, which is used in conjunction with nested correlated
queries
• This operator was dropped from the language,
possibly because of the difficulty in implementing it
efficiently
• Most implementations of SQL do not have this
operator
• The CONTAINS operator compares two sets of values,
and returns TRUE if one set contains all values in the
other set
• Reminiscent of the division operation of algebra

Slide 8- 77
CORRELATED NESTED QUERIES (contd.)
• Query 3: Retrieve the name of each employee who works on all the projects
controlled by department number 5.

Q3: SELECT FNAME, LNAME


FROM EMPLOYEE
WHERE ( (SELECT PNO
FROM WORKS_ON
WHERE SSN=ESSN)
CONTAINS
(SELECT PNUMBER
FROM PROJECT
WHERE DNUM=5) );
• In Q3, the second nested query, which is not correlated with the outer query,
retrieves the project numbers of all projects controlled by department 5
• The first nested query, which is correlated, retrieves the project numbers on
which the employee works, which is different for each employee tuple
because of the correlation

Slide 8- 78
THE EXISTS FUNCTION
• EXISTS is used to check whether the result of a correlated nested query is empty
(contains no tuples) or not
• The result of EXISTS is a boolean value True or False. It can be used in a SELECT,
UPDATE, INSERT or DELETE statement.
• SELECT column_name(s) FROM table_name WHERE EXISTS
(SELECT column_name(s) FROM table_name WHERE condition);
• We can formulate Query 12 in an alternative form that uses EXISTS as
Q12B

• Query 12: Retrieve the name of each employee who has a


dependent with the same first name as the employee.

Q12B: SELECT FNAME, LNAME


FROM EMPLOYEE
WHERE EXISTS (SELECT *
FROM DEPENDENT
WHERE SSN=ESSN
AND FNAME=DEPENDENT_NAME);

Slide 8- 79
THE EXISTS FUNCTION (contd.)

• Query 6: Retrieve the names of employees who have no dependents.

Q6: SELECT FNAME, LNAME


FROM EMPLOYEE
WHERE NOT EXISTS (SELECT *
FROM DEPENDENT
WHERE SSN=ESSN);
• In Q6, the correlated nested query retrieves all DEPENDENT tuples related
to an EMPLOYEE tuple. If none exist, the EMPLOYEE tuple is selected
• EXISTS is necessary for the expressive power of SQL

Slide 8- 80
EXPLICIT SETS
• It is also possible to use an explicit
(enumerated) set of values in the WHERE-
clause rather than a nested query
• Query 13: Retrieve the social security numbers of
all employees who work on project number 1, 2,
or 3.
Q13: SELECT DISTINCT ESSN
FROM WORKS_ON
WHERE PNO IN (1, 2, 3);

Slide 8- 82
Joined Relations Feature in SQL
• Can specify a "joined relation" in the FROM-clause
• Looks like any other relation but is the result of a join
• Allows the user to specify different types of joins (regular "theta" JOIN,
NATURAL JOIN, LEFT OUTER JOIN, RIGHT OUTER JOIN, CROSS JOIN,
etc)
• Examples:
Q8: SELECT E.FNAME, E.LNAME, S.FNAME, S.LNAME
FROM EMPLOYEE E S
WHERE E.SUPERSSN=S.SSN;

• can be written as:


Q8: SELECT E.FNAME, E.LNAME, S.FNAME, S.LNAME
FROM (EMPLOYEE E LEFT OUTER JOIN
EMPLOYEES ON E.SUPERSSN=S.SSN);

Slide 8- 83
Joined Relations Feature in SQL (contd.)
• Examples:
Q1: SELECT FNAME, LNAME, ADDRESS
FROM EMPLOYEE, DEPARTMENT
WHERE DNAME='Research' AND DNUMBER=DNO;
• could be written as:
Q1: SELECT FNAME, LNAME, ADDRESS
FROM (EMPLOYEE JOIN DEPARTMENT
ON DNUMBER=DNO)
WHERE DNAME='Research’;
• or as:
Q1: SELECT FNAME, LNAME, ADDRESS
FROM (EMPLOYEE NATURAL JOIN DEPARTMENT
AS DEPT(DNAME, DNO, MSSN, MSDATE)
WHERE DNAME='Research’;

Slide 8- 84
Joined Relations Feature in SQL(contd.)
• Another Example: Q2 could be written as follows; this illustrates
multiple joins in the joined tables

Q2: SELECT PNUMBER, DNUM, LNAME,


BDATE, ADDRESS
FROM (PROJECT JOIN
DEPARTMENT ON DNUM=DNUMBER)
JOIN EMPLOYEE ON MGRSSN=SSN) )
WHERE PLOCATION='Stafford’;

Illustration for LEFT OUTER JOIN

Slide 8- 85
AGGREGATE FUNCTIONS
• Include COUNT, SUM, MAX, MIN, and AVG
• Query 15: Find the maximum salary, the minimum salary, and the average
salary among all employees.
Q15: SELECT MAX(SALARY),
MIN(SALARY), AVG(SALARY), SUM(SALARY)
FROM EMPLOYEE;

• Some SQL implementations may not allow more than one function in the
SELECT-clause
• Query 16: Find the maximum salary, the minimum salary,
and the average salary among employees who work for
the 'Research' department.
Q16: SELECT MAX(SALARY),
MIN(SALARY), AVG(SALARY)
FROM EMPLOYEE, DEPARTMENT
WHERE DNO=DNUMBER AND
DNAME='Research‘;

Slide 8- 86
AGGREGATE FUNCTIONS (contd.)

• Queries 17 and 18: Retrieve the total number of employees in the company
(Q17), and the number of employees in the 'Research' department (Q18).

Q17:SELECT COUNT (*)


FROM EMPLOYEE;

Q18:SELECT COUNT (*)


FROM EMPLOYEE, DEPARTMENT
WHERE DNO=DNUMBER AND
DNAME='Research’;

Slide 8- 87
GROUPING

• In many cases, we want to apply the aggregate functions to subgroups of


tuples in a relation
• Each subgroup of tuples consists of the set of tuples that have the same value
for the grouping attribute(s)
• The function is applied to each subgroup independently
• SQL has a GROUP BY-clause for specifying the grouping attributes, which
must also appear in the SELECT-clause

Slide 8- 88
GROUPING (contd.)
• Query 20: For each department, retrieve the department
number, the number of employees in the department, and
their average salary.

Q20: SELECT DNO, COUNT (*), AVG (SALARY)


FROM EMPLOYEE
GROUP BY DNO;

• In Q20, the EMPLOYEE tuples are divided into groups-


• Each group having the same value for the grouping attribute DNO
• The COUNT and AVG functions are applied to each such
group of tuples separately
• The SELECT-clause includes only the grouping attribute
and the functions to be applied on each group of tuples
• A join condition can be used in conjunction with grouping

Slide 8- 89
GROUPING (contd.)
• Query 21: For each project, retrieve the project number, project name, and
the number of employees who work on that project.

Q21: SELECT PNUMBER, PNAME, COUNT (*)


FROM PROJECT, WORKS_ON
WHERE PNUMBER=PNO
GROUP BY PNUMBER, PNAME;

• In this case, the grouping and functions are applied after the joining of the two
relations

THE HAVING-CLAUSE

• Sometimes we want to retrieve the values of these functions for only those
groups that satisfy certain conditions
• The HAVING-clause is used for specifying a selection condition on groups
(rather than on individual tuples)

Slide 8- 90
THE HAVING-CLAUSE (contd.)

• Query 22: For each project on which more than two employees work, retrieve
the project number, project name, and the number of employees who work
on that project.

Q22: SELECT PNUMBER, PNAME,


COUNT(*)
FROM PROJECT, WORKS_ON
WHERE PNUMBER=PNO
GROUP BY PNUMBER, PNAME
HAVING COUNT (*) > 2;

Slide 8- 91
COUNT(*) returns the number
of rows in a specified table,
and it preserves duplicate
rows. It counts each row
separately. This includes rows
that contain null values.
ORDER BY
• The ORDER BY clause is used to sort the tuples in a query result based on
the values of some attribute(s)
• Query 28: Retrieve a list of employees and the projects each works in,
ordered by the employee's department, and within each department ordered
alphabetically by employee last name.
Q28: SELECT DNAME, LNAME, FNAME, PNAME
FROM DEPARTMENT, EMPLOYEE,
WORKS_ON, PROJECT
WHERE DNUMBER=DNO AND SSN=ESSN
AND PNO=PNUMBER
ORDER BY DNAME, LNAME;

• The default order is in ascending order of values


• We can specify the keyword DESC if we want a descending order; the
keyword ASC can be used to explicitly specify ascending order, even
though it is the default

Slide 8- 95
Constraints as Assertions

• General constraints: constraints that do not fit in the basic SQL categories
(presented in chapter 8)
• Mechanism: CREATE ASSERTION
• Components include:
• a constraint name,
• followed by CHECK,
• followed by a condition

Slide 9- 96
Assertions: An Example

• “The salary of an employee must not be greater


than the salary of the manager of the department
that the employee works for’’

CREATE ASSERTION SALARY_CONSTRAINT


CHECK (NOT EXISTS (SELECT *
FROM EMPLOYEE E, EMPLOYEE M,
DEPARTMENT D
WHERE E.SALARY > M.SALARY AND
E.DNO=D.NUMBER AND
D.MGRSSN=M.SSN));

Slide 9- 97
SQL Triggers
• Objective: to monitor a database and take initiate action when a condition
occurs
• A trigger is a stored procedure in database which automatically invokes
whenever a special event in the database occurs. For example, a trigger can be
invoked when a row is inserted into a specified table or when certain table
columns are being updated.
• Triggers are expressed in a syntax similar to assertions and include the
following:
• Event
• Such as an insert, deleted, or update operation
• BEFORE or AFTER the triggering operation is executed
• Condition: Determines whether the rule action should be executed.
Optional Condition: If Condition exists
If no condition exists
• Action
• To be taken when the condition is satisfied

Slide 9- 98
Syntax
create [or replace ] trigger [trigger_name] //Creates or replaces an existing trigger with the
trigger_name.
[before | after] //This specifies when the trigger will be executed.
{insert | update | delete} //This specifies the DML operation.
on [table_name] //This specifies the name of the table associated with the trigger.
[for each row] //This specifies a row-level trigger, i.e., the trigger will be executed for each row
being affected.
[trigger_body] //This provides the operation to be performed as trigger is fired

99
Example:
Given Student Report Database, in which student marks assessment is recorded. In such schema,
create a trigger so that the total and average of specified marks is automatically inserted
whenever a record is insert.

mysql> desc Student;


+-------+-------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------+-------------+------+-----+---------+----------------+
| tid | int(4) | NO | PRI | NULL | auto_increment |
| name | varchar(30) | YES | | NULL| |
| subj1 | int(2) | YES | | NULL | |
| subj2 | int(2) | YES | | NULL | |
| subj3 | int(2) | YES | | NULL | |
| total | int(3) | YES | | NULL | |
| avg | int(3) | YES | | NULL | |
+-------+-------------+------+-----+---------+----------------+

100
Example:
Create trigger stumarks
Before insert on student for each row set
student.total=student.sub1+student.sub2+student.sub3,
Student.avg=student.total/3;

mysql> insert into Student values(0, “Ram", 20, 20, 20, 0, 0);
Query OK, 1 row affected (0.09 sec)

101
mysql> select * from Student;
+-----+-------+-------+-------+-------+-------+------+
| tid | name | subj1 | subj2 | subj3 | total | avg |
+-----+-------+-------+-------+-------+-------+------+
| 100 | Ram | 20 | 20 | 20 | 60 | 20 |
+-----+-------+-------+-------+-------+-------+------+
1 row in set (0.00 sec)

102
Example
Given Library Book Management database schema
with Student database schema. In these databases, if
any student borrows a book from library then the
count of that specified book should be decremented.
mysql> select * from book_det;
mysql> select * from book_issue;
+-----+-------------+--------+
+------+------+--------+
| bid | btitle | copies |
| bid | sid | btitle |
+-----+-------------+--------+
+------+------+--------+
| 1 | Java | 10 |
1 row in set (0.00 sec)
| 2 | C++ | 5 |
| 3 | MySql | 10|
| 4 | DBMS | 5|
+-----+-------------+--------+
4 rows in set (0.00 sec)

103
OLD and NEW.
• There is two MySQL extension to triggers 'OLD' and 'NEW'.
• OLD and NEW are not case sensitive.
• Within the trigger body, the OLD and NEW keywords enable you to access
columns in the rows affected by a trigger
• In an INSERT trigger, only NEW.col_name can be used.
• In a UPDATE trigger, you can use OLD.col_name to refer to the columns of a
row before it is updated and NEW.col_name to refer to the columns of the
row after it is updated.
• In a DELETE trigger, only OLD.col_name can be used; there is no new row.

104
mysql> insert into book_issue values(1, 100, "Java");

Example • mysql> select * from book_det;


+-----+-------------+--------+

Create or replace trigger | bid | btitle | copies |

book_deduction +-----+-------------+--------+
| 1 | Java | 9|

After insert on book_issue | 2 | C++ | 5|

for each row update | 3 | MySql | 10 |

book_det set
| 4 | DBMS | 5|
+-----+-------------+--------+
copies=copies-1 where • mysql> select * from book_issue;
bid=new.bid; +------+------+--------+
| bid | sid | btitle |
+------+------+--------+
| 1 | 100 | Java |
+------+------+--------+

105
Populated Database

Slide 8- 106
SQL Triggers: An Example

• A trigger to compare an employee’s salary to his/her supervisor during


insert or update operations:

CREATE TRIGGER INFORM_SUPERVISOR


BEFORE INSERT OR UPDATE OF SALARY, SUPERVISOR_SSN ON EMPLOYEE
FOR EACH ROW
WHEN
(NEW.SALARY> (SELECT SALARY FROM EMPLOYEE
WHERE SSN=NEW.SUPERVISOR_SSN))
INFORM_SUPERVISOR
(NEW.SUPERVISOR_SSN,NEW.SSN);

Slide 9- 107
Views in SQL

• A view is a “virtual” table that is derived from other tables


• Allows for limited update operations
• Since the table may not physically be stored
• Allows full query operations
• A convenience for expressing certain operations

Slide 9- 108
Specification of Views
The view has primarily two purposes:
• Simplify the complex SQL queries.
• Provide restriction to users from accessing sensitive data.

• SQL command: CREATE VIEW


• a table (view) name
• a possible list of attribute names (for example, when
arithmetic operations are specified or when we want
the names to be different from the attributes in the
base relations)
• a query to specify the table contents

Slide 9- 109
Views in SQL
Deptnam Empname Salary
e
Admin XYZ 20000
Department table HR PQR 16000

VHR----- HR Accounts ABC 13000

VADMIN- --- ADMIN

Create view VHR as select


deptname,empname,salary where deptname=‘HR’;

Select * from VHR;

110
Example
Department Employee empbydept
D_id D_name Id Name Salar Gende D_id Id Name Salar Gende D_name
y r y r
1 IT
1 Aman 6000 Male 3 1 Aman 6000 Male HR
2 Accounts
2 Bhavya 4999 Female 2 2 Bhavya 4999 Female Accounts
3 HR
3 Chang 7000 Male 1 3 Chang 7000 Male IT
4 Admin
4 Deep 5000 Male 4 4 Deep 5000 Male Admin
5 Ekta 3500 Female 3 5 Ekta 3500 Female HR
6 Francis 4500 Male 1 6 Francis 4500 Male IT

Create view empbydept as


Select
emp.id,emp.name,emp.salary,emp.gender,dept.d_name from
emp join dept on emp.d_id=dept.d_id;
Select * from empbydept;

111
Create view Itemp as select
id,name,salary,gender,d_name from emp join dept in
emp.d_id= dept.d_id where dept.d_name=‘IT’;
Select * from Itemp;

Create view nonconfidentialdata as select


name,gender,d_name from emp join dept in emp.d_id=
dept.d_id ;

Create view summery as select d_name,count(id) as


total_emp from emp join dept on dept.d_id=emp.d_id
group by d_name;

112
Types of View

113
Simple View

114
Complex View

115
SQL Views: An Example

• Specify a different WORKS_ON table

CREATE VIEW WORKS_ON_NEW AS


SELECT FNAME, LNAME, PNAME, HOURS
FROM EMPLOYEE, PROJECT, WORKS_ON
WHERE SSN=ESSN AND PNO=PNUMBER;

Slide 9- 116
Using a Virtual Table

• We can specify SQL queries on a newly create


table (view):
SELECT FNAME, LNAME
FROM WORKS_ON_NEW
WHERE PNAME=‘Seena’;

• When no longer needed, a view can be dropped:


DROP WORKS_ON_NEW;

Slide 9- 117
VIEWS Example
Efficient View Implementation

• Query modification:
• Present the view query in terms of a query on the
underlying base tables
• Disadvantage:
• Inefficient for views defined via complex queries
• Especially if additional queries are to be applied to the view within
a short time period

Slide 9- 119
Materialized View

120
Efficient View Implementation
• View materialization:
• Involves physically creating and keeping a temporary
table
• Assumption:
• Other queries on the view will follow
• Concerns:
• Maintaining correspondence between the base table
and the view when the base table is updated
• Strategy:
• Incremental update

Slide 9- 121
Update Views

• Update on a single view without aggregate operations:


• Update may map to an update on the underlying base
table
• Views involving joins:
• An update may map to an update on the underlying
base relations
• Not always possible

Slide 9- 122
Un-updatable Views

• Views defined using groups and aggregate functions are not updateable
• Views defined on multiple tables using joins are generally not updateable
• WITH CHECK OPTION: must be added to the definition of a view if the view
is to be updated
• To allow check for updatability and to plan for an
execution strategy

Slide 9- 123
Database Stored Procedures
• Persistent procedures/functions (modules) are stored locally and executed
by the database server
• As opposed to execution by clients
• Advantages:
• If the procedure is needed by many applications, it
can be invoked by any of them (thus reduce
duplications)
• Execution by the server reduces communication costs
• Enhance the modeling power of views
• Disadvantages:
• Every DBMS has its own syntax and this can make the
system less portable

Slide 9- 124
Stored Procedure Constructs

• A stored procedure
CREATE PROCEDURE procedure-name (params)
local-declarations
procedure-body;

• A stored function
CREATE FUNCTION fun-name (params) RETRUNS return-type
local-declarations
function-body;

• Calling a procedure or function


CALL procedure-name/fun-name (arguments);

Slide 9- 125
Query Languages
• Language in which user requests information from the
database.
• Categories of languages
• Procedural
• Non-procedural, or declarative
• “Pure” languages:
• Relational algebra
• Tuple relational calculus
• Domain relational calculus
• Pure languages form underlying basis of query languages
that people use.

126
Relational Algebra
• The relational algebra is a procedural query language. Relational
algebra is the basic set of operations for the relational model
• It consists of a set of operations that take one or two relations as
input and produce a new relation as their result.
• Six basic operators
– select: 
– project: 
– union: 
– set difference: –
– Cartesian product: x
– rename: 

127
Relational Algebra Overview
• Relational Algebra consists of several groups of operations
• Unary Relational Operations
• SELECT (symbol:  (sigma))
• PROJECT (symbol: p (pi))
• RENAME (symbol:  (rho))
• Relational Algebra Operations From Set Theory
• UNION (), INTERSECTION ( ), DIFFERENCE (or MINUS, – )
• CARTESIAN PRODUCT ( x )
• Binary Relational Operations
• JOIN (several variations of JOIN exist)
• DIVISION
• Additional Relational Operations
• OUTER JOINS, OUTER UNION
• AGGREGATE FUNCTIONS (These compute summary of information: for example,
SUM, COUNT, AVG, MIN, MAX)

128
Select Operation
• The select operation selects tuples that satisfy a given predicate.
• Notation:  p(r)
• p is called the selection predicate
• The selection condition acts as a filter
• comparisons are done using =, ≠, <, ≤, >, and ≥ in the selection
predicate.
• we can combine several predicates into a larger predicate by using the connectives and
(∧), or (∨), and not (¬)
Example of selection:
1. To select tuples of the instructor who is in the “Physics” department
 dept_name=“Physics”(instructor)
2. find all instructors with salary greater than 90,000
 salary>90000(instructor)
3. to find the instructors in Physics with a salary greater than $90,000
 dept name =“Physics”∧salary>90000 (instructor )

129
Project Operation
• Project is used to display the required attributes from a relation.
• Notation:  A1 , A2 ,, Ak (r )
where A1, A2 are attribute names and r is a relation name.
• The result is defined as the relation of k columns obtained by erasing the
columns that are not listed
• Duplicate rows removed from result, since relations are sets
Example:
• To list all instructors’ ID, name, and salary attributes of instructor
ID, name, salary (instructor)
• Find the name of all instructors in the Physics department
name ( dept name =“Physics” (instructor))

130
RENAME
• The RENAME operator is denoted by  (rho)
• The general RENAME operation  can be expressed
by any of the following forms:
• S (B1, B2, …, Bn )(R) changes both:
• the relation name to S, and
• the column (attribute) names to B1, B1, …..Bn
• S(R) changes:
• the relation name only to S
• (B1, B2, …, Bn )(R) changes:
• the column (attribute) names only to B1, B1, …..Bn

131
Union Operation
• Notation: r  s
• Defined as:
r  s = {t | t  r or t  s}
• The result of this operation, denoted by R ∪ S, is a relation that includes all tuples that are
either in R or in S or in both R and S. Duplicate tuples are eliminated.
• For r  s to be valid (r and s should be union compatible).
1. r, s must have the same arity (same number of attributes)
2. The attribute domains must be compatible (example: 2nd column
of r deals with the same type of values as does the 2nd
column of s)
• Example: to find all courses taught in the Fall 2009 semester, or in the Spring 2010
semester, or in both
course_id ( semester=“Fall” Λ year=2009 (section)) 
course_id ( semester=“Spring” Λ year=2010 (section))

132
Set Difference Operation
• Notation r – s
• Defined as:
• r – s = {t | t  r and t  s}

• Set differences must be taken between compatible relations.


• r and s must have the same arity
• attribute domains of r and s must be compatible
• Example: to find all courses taught in the Fall 2009 semester, but not in the
Spring 2010 semester

course_id ( semester=“Fall” Λ year=2009 (section)) −


course_id ( semester=“Spring” Λ year=2010 (section))

133
Set-Intersection Operation
• Notation: r  s
• Defined as:
• r  s = { t | t  r and t  s }
• The result of this operation, denoted by r∩ s, is a relation that
includes all tuples that are in both r and s.
• Assume:
– r, s have the same arity
– attributes of r and s are compatible
• Example: to find all courses taught in the Fall 2009 semester, or in
the Spring 2010 semester, or in both
course_id ( semester=“Fall” Λ year=2009 (section)) 
course_id ( semester=“Spring” Λ year=2010 (section))

134
Set operations example

135
Cartesian-Product Operation

The Cartesian-product operation, denoted by a cross (×), allows us


to combine information from any two relations.
Cartesian product of relations r1 and r2 is written as r1 × r2.
Ex: to find the names of all instructors in the Physics department
together with the course id of all courses they taught.

136
Join / Cartesian Product
• Binary Operation between two relation A and B
• The operator generates all possible combination between all tuples
of A and B
• Denoted by ‘× ‘
• Synonym as ‘cross join’ . e.g
A B A×B
P Q M N
P Q M N p1 q1 m1 n1
m1 n1
p1 q1 p1 q1 m2 n2
m2 n2
p1 q1 m3 n3
p2 q2 m3 n3
p2 q2 m1 n1
p2 q2 m2 n2
p2 q2 m3 n3

137
Join / Cartesian Product
• A B A×B
p1
p1 q1 q1
p q2
p2 q2 P2
P Q M N
P Q M N p1 q1 m1 n1
m1 n1
p1 q1 p1 q1 m2 n2
m2 n2
p1 q1 m3 n3
p2 q2 m3 n3
p2 q2 m1 n1
p2 q2 m2 n2
p2 q2 m3 n3

138
Join / Cartesian Product

Properties of Join operation


Given two relation A and B with :
degree(A) = m degree(B)=n
cardinality(A)= c1 cardinality(B)=c2
Then
degree(A×B) = degree(A)+ degree(B)=>m+n
cardinality(A×B) = cardinality(c1) * cardinality(c2) => c1*c2

139
Join in relational Algebra
Join is a combination of a Cartesian product followed by a selection process.
A Join operation pairs two tuples from different relations, if and only if a given join condition is satisfied.
Various forms of join operation are:
 Inner Joins:
Theta join
EQUI join
Natural join
 Outer join:
Left Outer Join
Right Outer Join
Full Outer Join
Inner Join:
In an inner join, only those tuples that satisfy the matching criteria are included, while the rest are excluded.

140
141
142
143
144
145
OUTER
A
JOIN
B A⋈B

Id Name Na Mar Id Name Marks


me ks
1 Jay
0 Roh 20 20 Veer 18
an 30 John 14
2 Veer
0 Veer 18
3 John John 14
0 Sam 13
•In an outer join, along with tuples that satisfy the matching criteria, we also include
some or all tuples that do not match the criteria.
Outer join:
•Left Outer Join (A B)
•Right Outer Join (A B)
•Full Outer Join (A B)

146
LEFT JOIN ( )
• This join returns all the rows of the table on the left side
of the join and matching rows for the table on the right
side of join.
• The rows for which there is no matching row on right
side, the result-set will contain null.
• LEFT JOIN is also known as LEFT OUTER JOIN

id Name Marks
10 Jay NULL
20 Veer 18
30 John 14

147
RIGHT JOIN( )
• RIGHT JOIN is similar to LEFT JOIN.
• This join returns all the rows of the table on the right side
of the join and matching rows for the table on the left
side of join.
• The rows for which there is no matching row on left side,
the result-set will contain null.
• RIGHT JOIN is also known as RIGHT OUTER JOIN
id name marks
Null Rohan 20
20 Veer 18
30 John 14
Null Sam 13

148
FULL JOIN ( )
• FULL OUTER JOIN creates the result-set by combining
result of both LEFT JOIN and RIGHT JOIN.
• The result-set will contain all the rows from both the
tables.
• The rows for which there is no matching, the result-set
will contain NULL values.

ID Name Marks
10 Jay NULL
Table 20 Veer 18
Table A
B 30 John 14
NULL Rohan 20
Null Sam 13

149
OUTER UNION Operations
• The outer union operation was developed to take the union of
tuples from two relations if the relations are not type compatible.
• This operation will take the union of tuples in two relations R(X, Y)
and S(X, Z) that are partially compatible, meaning that only
some of their attributes, say X, are type compatible.
• The attributes that are type compatible are represented only once
in the result, and those attributes that are not type compatible from
either relation are also kept in the result relation T(X, Y, Z).

150
Division ÷
• Binary Operation between two relation C and B
• Implicitly C is A × B where A is any Relation
• C÷B => (A × B) ÷ B
• The operator ‘÷ ‘ splits B from C and produces A
• e.g C÷B=>
C= A × B B A
P Q M N M N P Q
p1 q1 m1 n1 m1 n1 p1 q1
p1 q1 m2 n2 m2 n2
p2 q2
p1 q1 m3 n3 m3 n3
p2 q2 m1 n1
p2 q2 m2 n2
p2 q2 m3 n3

151
Division ÷
Student Subject TestQP(Student ×Subject)
1 1 120 combinations
DBM 2 DBM
2
S .. S
p3 COA
.. COA
60 60
TestQP(Student ×Subject) Student Subject(TestQP ÷ Student)
1,DBMS
1,COA
1
2,DBMS 2
2,COA
…. 3
60,DBMS
60,COA
..
60

152
Division ÷

TestQP(Student ×Subject)Subject Student(TestQP ÷


Subject)
1,DBMS 1
1,COA
2,DBMS • Separateses Subject 2
2,COA 3
….
60,DBMS ..
60,COA
60

153
Formal examples: Division ÷
Cases :
Case 1:
Given A,B,C are relations and X,Y are attributes
C(X,Y) ÷ A(X) => B(Y)
C(X,Y) ÷ A(Y) => B(X)
Case 2:
X Y ÷ Y = X
X1 Y1 Y1 X1
X2 Y2
÷ Y2
X1 Y2
÷
X4 y4

154
Division ÷
Formal examples:
Case 3:
X Y
÷ X = Y
X1 Y1 X1 Y1
X2 Y2 Y2
X1 Y2
X4 y4
Case 4:
X Y Y X
X1 Y1 Y1 Null
X2 Y2
÷ =
Y2
X1 Y2
Y3
X4 y4
Y4

155
Division ÷
Formal examples:
Case 5:
X Y ÷ Y = x
X1 Y1 Y1 X1
X2 Y1 X2
X3 Y1 X3
X4 y1 X4
Case 6:
X Y Y X
X1 Y1 Y1
÷ = X1
X2 Y1 X2
Y2
X2 Y2
X1 y2

156
Recap of Relational Algebra Operations

157
Aggregate Function Operation
• Use of the Aggregate Functional operation ℱ
• ℱMAX Salary (EMPLOYEE) retrieves the maximum salary value
from the EMPLOYEE relation
• ℱMIN Salary (EMPLOYEE) retrieves the minimum Salary value
from the EMPLOYEE relation
• ℱSUM Salary (EMPLOYEE) retrieves the sum of the Salary from
the EMPLOYEE relation
• ℱCOUNT SSN, AVERAGE Salary (EMPLOYEE) computes the count
(number) of employees and their average salary
• Note: count just counts the number of rows, without removing duplicates

158
Examples of applying aggregate functions and grouping

159
Examples of Queries in Relational Algebra

• Query 1. Retrieve the name and address of all


employees who work for the ‘Research’ department.

160
• Query 2. For every project located in ‘Stafford’,
list the project number, the controlling
department number, and the department
manager’s last name, address, and birth date.

161
Query 3. Find the names of employees who work on
all the projects controlled by department number 5.

162
Query 4. Make a list of project numbers for
projects that involve an employee whose last name
is ‘Smith’, either as a worker or as a manager of the
department that controls the project.

163
Query 5. List the names of all employees
with two or more dependents.

164
Query 6. Retrieve the names of employees who
have no dependents.

165
Query 7. List the names of managers who
have at least one dependent.

166
Thank YOU

167
Database Management Systems

Module 3
Database Design Theory &
Normalization

Presidency University, Bengaluru


Contents
1 Informal Design Guidelines for Relational Databases
1.1Semantics of the Relation Attributes
1.2 Redundant Information in Tuples and Update Anomalies
1.3 Null Values in Tuples
1.4 Spurious Tuples

2 Functional Dependencies (FDs)


2.1 Definition of FD
2.2 Inference Rules for FDs
3 Normal Forms Based on Primary Keys
3.1 Normalization of Relations
3.2 Practical Use of Normal Forms
3.3 Definitions of Keys and Attributes Participating in Keys
3.4 First Normal Form
3.5 Second Normal Form
3.6 Third Normal Form
3.7 BCNF (Boyce-Codd Normal Form)
3.8 4NF and 5NF

Slide 1-2
Introduction
Each Relation schema consists of a number of attributes.
the Relational Database schema consists of a number of relation
schemas.
What is relational database design?
• The grouping of attributes to form "good" relation schemas
• Produces set of relations.

Two levels of relation schemas


The logical level-users interpret the relation schemas and the
meaning of their attributes.
 The storage level (Implementation level)
Approaches to database design:
Bottom-up or top-down.

3
1.Informal Design Guidelines for Relation Schemas

Used as measures to determine the quality of relation


schema design
Making sure attribute semantics are clear
Reducing redundant information in tuples
Reducing NULL values in tuples
Disallowing possibility of generating spurious tuples

4
1.1 Semantics to Attributes in Relations

Semantics of a relation Attributes : Informally, each tuple in a relation


should represent one entity or relationship instance.
• Attributes of different entities (EMPLOYEEs, DEPARTMENTs,
PROJECTs) should not be mixed in the same relation
• Only foreign keys should be used to refer to other entities
• Entity and relationship attributes should be kept apart as much as
possible

5
Bottom Line: Design a schema that can be explained
easily relation by relation. The semantics of attributes
should be easy to interpret.

6
Guideline 1
Design relation schema so that it is easy to explain its meaning
Do not combine attributes from multiple entity types and relationship
types into a single relation
Example of violating Guideline 1: Figure 15.3

7
1.2 Redundant Information in Tuples and
Update Anomalies
Mixing attributes of multiple entities may cause
problems
Information is stored redundantly
• wastes storage
• Problems with update anomalies
• Insertion anomalies
• Deletion anomalies
• Modification anomalies

8
EXAMPLE OF AN INSERT ANOMALY

Consider the relation:


EMP_PROJ(Emp#, Proj#, Ename, Pname, No_hours)
Insert Anomaly: Cannot insert a project unless an employee is
assigned to it.
Inversely - Cannot insert an employee unless an he/she is
assigned to a project.

9
EXAMPLE OF AN DELETE ANOMALY
Consider the relation:
EMP_PROJ(Emp#, Proj#, Ename, Pname, No_hours)
Delete Anomaly:
• When a project is deleted, it will result in deleting all the
employees who work on that project.
• Alternately, if an employee is the sole employee on a
project, deleting that employee would result in deleting the
corresponding project.

10
EXAMPLE OF AN UPDATE ANOMALY

Consider the relation:


EMP_PROJ ( Emp#, Proj#, Ename, Pname, No_hours)

Update Anomaly: Changing the name of project


number 1 from “ProductX” to “Billing” may cause this
update to be made for all Other employees working
on project 1.

11
Two relation schemas suffering from
update anomalies

12
13
Guideline 2

• Design a schema that does not suffer from the


insertion, deletion and update anomalies.
•If there are any anomalies present, then note them so
that applications can be made to take them into
account.

14
1.3 Null Values in Tuples
GUIDELINE 3:
• Relations should be designed such that their tuples will have
as few NULL values as possible
• Attributes that are NULL frequently could be placed in
separate relations (with the primary key)
Reasons for nulls:
• attribute not applicable or invalid
• attribute value unknown (may exist)
• value known to exist, but unavailable

15
1.4 Generation of Spurious Tuples

•Bad designs for a relational database may result in


erroneous results for certain JOIN operations
• The "lossless join" property is used to guarantee
meaningful results for join operations
GUIDELINE 4:
•The relations should be designed to satisfy the lossless join
condition.
• No spurious tuples should be generated by doing a natural
join of any relations.

16
17
18
There are two important properties of decompositions:
a) Non-additive or losslessness of the corresponding join
b) Preservation of the functional dependencies.

Note that:
• Property (a) is extremely important and cannot be sacrificed.
•Property (b) is less stringent and may be sacrificed.

19
2. Functional Dependencies
•Are used to specify formal measures of the "goodness" of relational
designs
•And keys are used to define normal forms for relations
•Are constraints that are derived from the meaning and
interrelationships of the data attributes
•A set of attributes X functionally determines a set of attributes Y if
the value of X determines a unique value for Y
Constraint between two sets of attributes from the database
2.1

20
• X -> Y holds if whenever two tuples have the same value
for X, they must have the same value for Y
• For any two tuples t1 and t2 in any relation instance r(R): If
t1[X]=t2[X], then t1[Y]=t2[Y]
• X -> Y in R specifies a constraint on all relation instances r(R)
Examples of FD constraints
• Social security number determines employee name
• SSN -> ENAME
• Project number determines project name and location
• PNUMBER -> {PNAME, PLOCATION}
• Employee SSN and project number determines the hours per week that the
employee works on the project
• {SSN, PNUMBER} -> HOURS

21
• An FD is a property of the attributes in the schema R
• The constraint must hold on every relation instance r(R)
• If K is a key of R, then K functionally determines all attributes in
R
(since we never have two distinct tuples with t1[K]=t2[K])

X Y
1 1
2 1
3 2
4 3
2 5
X->Y
If t1.x=t2.x
Then t1.y=t2.y

22
R.NO NAME MARKS DEPT COURSE
1 A 78 CS C1
2 B 60 EE C1
3 A 78 CS C2
4 B 60 EE C3
5 C 80 IT C3
6 D 80 EC C2

R.NO ->NAME
NAME ->R.NO
R.NO->MARKS
DEPT->COURSE
NAME,MARKS->DEPT
NAME,MARKS->DEPT,COURSE
Name,Marks->Marks

23
Practice Questions
R.NO ->NAME, MARKS
DEPT ,COURSE->NAME
R.NO,MARKS->DEPT
NAME->COURSE
NAME,MARKS,DEPT->R.NO

24
2.2 Inference Rules for FDs
• Given a set of FDs F, we can infer additional FDs that
hold whenever the FDs in F hold
Armstrong's inference rules:
– IR1. (Reflexive) If Y subset-of X, then X -> Y
– IR2. (Augmentation) If X -> Y, then XZ -> YZ
(Notation: XZ stands for X U Z)
– IR3. (Transitive) If X -> Y and Y -> Z, then X -> Z
• IR1, IR2, IR3 form a sound and complete set of
inference rules
– These are rules hold and all other rules that hold can be
deduced from these

25
Inference Rules for FDs
Some additional inference rules that are useful:

(Decomposition) If X -> YZ, then X -> Y and X -> Z

(Union) If X -> Y and X -> Z, then X -> YZ

(Psuedotransitivity) If X -> Y and WY -> Z, then WX -> Z

• The last three inference rules, as well as any other inference


rules, can be deduced from IR1, IR2, and IR3 (completeness
property)

26
Inference Rules for FDs

• Closure of a set F of FDs is the set F+ of all FDs that can


be inferred from F

• Closure of a set of attributes X with respect to F is the set


X + of all attributes that are functionally determined by X

• X + can be calculated by repeatedly applying IR1, IR2, IR3


using the FDs in F

27
Equivalence of Sets of FDs
sets of FDs F and G are equivalent if:
• Every FD in F can be inferred from G, and
• Every FD in G can be inferred from F
• Hence, F and G are equivalent if F+ =G+
Definition (Covers):
• F covers G if every FD in G can be inferred from F
• (i.e., if G+ subset-of F+)
F and G are equivalent if F covers G and G covers F
There is an algorithm for checking equivalence of sets
of FDs

28
Minimal Sets of FDs

A set of FDs is minimal if it satisfies the following


conditions:
1. Every dependency in F has a single attribute for its RHS.
2. We cannot remove any dependency from F and have a

set of dependencies that is equivalent to F.


3. We cannot replace any dependency X -> A in F with a

dependency Y -> A, where Y proper-subset-of X ( Y


subset-of X) and still have a set of dependencies that is
equivalent to F.

29
Minimal Sets of FDs

• Every set of FDs has an equivalent minimal set


• There can be several equivalent minimal sets
• There is no simple algorithm for computing a minimal set
of FDs that is equivalent to a set F of FDs
• To synthesize a set of relations, we assume that we start
with a set of dependencies that is a minimal set
• E.g., see algorithms 11.2 and 11.4

30
Q1.R(A,B,C,D,E)
A->B
B->C
C->D
D->E
1.Find the closure of A,AD,B
A + ={A,B,C,D,E}
{AD} + ={A,D,B,C,E}
B + ={B,C,D,E}

2. Find super keys.

Q2.R(A,B,C,D,E)
A->B
D->E
1.Find the super keys.

31
F = {SSN -> ENAME
PNO -> PNAME, PLOC
SSN,PNO -> HRS }

{SSN} + ={SSN,ENAME}
{PNO} + ={PNO,PNAME,PLOC}
{SSN,PNO} + ={SSN,PNO,ENAME,PNAME,PLOC,HRS}

32
3.Normalization of Relations
• Normalization: The process of decomposing
unsatisfactory "bad" relations by breaking up their
attributes into smaller relations
• Normal form: Condition using keys and FDs of a
relation to certify whether a relation schema is in a
particular normal form
• 2NF, 3NF, BCNF
• based on keys and FDs of a relation schema

• 4NF based on keys, multi-valued dependencies : MVDs

• 5NF based on keys, join dependencies : JDs

33
Practical Use of Normal Forms

• Normalization is carried out in practice so that the resulting


designs are of high quality and meet the desirable properties
• The practical utility of these normal forms becomes questionable
when the constraints on which they are based are hard to
understand or to detect
• The database designers need not normalize to the highest possible
normal form. (usually up to 3NF, BCNF or 4NF)
• Denormalization: the process of storing the join of higher normal
form relations as a base relation—which is in a lower normal form

34
Definitions of Keys and Attributes Participating in Keys
• A superkey of a relation schema R = {A1, A2, ...., An} is a set of
attributes S subset-of R with the property that no two tuples t1 and
t2 in any legal relation state r of R will have t1[S] = t2[S]

• A key K is a superkey with the additional property that removal of


any attribute from K will cause K not to be a superkey any more.

• If a relation schema has more than one key, each is called a


candidate key.
– One of the candidate keys is arbitrarily designated to be the primary key, and
the others are called secondary keys.

35
Definitions of Keys and Attributes Participating in Keys

• A Prime attribute must be a member of some candidate key

• A Nonprime attribute is not a prime attribute—that is, it is not a


member of any candidate key.

36
First Normal Form
• Disallows
• composite attributes
• multivalued attributes
• nested relations; attributes whose values for an individual
tuple are non-atomic
• Only attribute values permitted are single atomic (or indivisible)
values
• Techniques to achieve first normal form
– Remove attribute and place in separate relation
– Expand the key
– Use several atomic attributes

37
First Normal Form (cont’d.)

• To change to 1NF:

– Remove nested relation attributes into a new relation

– Propagate the primary key into it

– Unnest relation into a set of 1NF relations

38
39
40
Second Normal Form
Uses the concepts of FDs, primary key
Definitions
• Prime attribute: An attribute that is member of the primary key K
• Full functional dependency: a FD Y -> Z where removal of any
attribute from Y means the FD does not hold any more
Examples:
{SSN, PNUMBER} -> HOURS is a full FD since neither SSN
-> HOURS nor PNUMBER -> HOURS hold
{SSN, PNUMBER} -> ENAME is not a full FD (it is called a partial dependency )
since SSN -> ENAME also holds

41
• A relation schema R is in second normal form (2NF) if every
non-prime attribute A in R is fully functionally dependent on the
primary key
• R can be decomposed into 2NF relations via the process of
2NF normalization

42
Third Normal Form
• A functional dependency X->Y in a relation schema R is a
transitive dependency if there exists a set of attributes Z in R
that is neither a candidate key nor a subset of any key of R, and
both X->Z and Z->Y hold.
• Based on concept of transitive dependency

Examples:
SSN -> DMGRSSN is a transitive FD
• Since SSN -> DNUMBER and DNUMBER -> DMGRSSN hold
SSN -> ENAME is non-transitive
• Since there is no set of attributes X where SSN -> X and X -> ENAME

43
Problematic FD X->Y
Left-hand side X is part of primary key (violates 2NF)
Left-hand side X is a nonkey attribute (violates 3NF)
NOTE:
• In X -> Y and Y -> Z, with X as the primary key, we consider
this a problem only if Y is not a candidate key.
• When Y is a candidate key, there is no problem with the
transitive dependency .
• E.g., Consider EMP (SSN, Emp#, Salary ).
• Here, SSN -> Emp# -> Salary and Emp# is a candidate key.

44
A relation schema R is in third normal form (3NF) if whenever a FD
X -> A holds in R, then either:
(a) X is a superkey of R, or
(b) A is a prime attribute of R
NOTE: Boyce-Codd normal form disallows condition (b) above

45
46
General Definitions of Second
and Third Normal Forms

47
Boyce-Codd Normal Form

• Every relation in BCNF is also in 3NF


– Relation in 3NF is not necessarily in BCNF

• Difference:
– Condition which allows A to be prime is absent from BCNF
• Most relation schemas that are in 3NF are also in BCNF

48
49
Assume the following FD:
Student, Course ->Instructor
Instructor->Course

50
Multivalued Dependencies and Fourth Normal Form
Definition
A multivalued dependency MVD X ->> Y specified on relation
schema R, where X and Y are both subsets of R, specifies the
following constraint on any relation state r of R: If two tuples t1
and t2 exist in r such that t1[X] =t2[X] then two tuples t3 and t4
should also exist in r with the following properties, where we use Z
to denote (R-( X U Y)):

t3[X]=t4[X]=t1[X]=t2[X].
t3[Y]=t1[Y]andt4[Y]=t2[Y].
t3[Z]=t2[Z]andt4[Z]=t1[Z].

51
An MVD X -->>Y in R is called a trivial MVD if (a) Y is a subset of
X, or (b) X U Y =R

Definition:
A relation schema R is in 4 NF with respect to a set of dependencies F
(that includes functional dependencies and multivalued dependencies)
if, for every nontrivial multivalued dependency X ->>Y in F+, X is a
superkey for R
Note: F+ is the (complete) set of all dependencies (functional
or multivalued) that will hold in every relation state r of R that
satisfies F, It is also called the closure of F

52
Multivalued Dependencies and Fourth Normal Form
(a)The EMP relation with two MVDs: ENAME —>> PNAME and
ENAME —>> DNAME.
(b)Decomposing the EMP relation into two 4NF relations
EMP_PROJECTS and EMP_DEPENDENTS.

53
Join Dependencies and Fifth Normal Form
Definition:
A join dependency (JD) denoted by JD( R1, R2,…. Rn), specified on
relation schema R, specifies a constraint on the states r of R

• The constraint states that every legal state r of R should have a


non additive join decomposition into R 1 R 2 R n that is, for every
such r we have
• *(ΠR1(r), ΠR2(r),..., ΠRn(r))=r
Note: an MVD is a special case of a JD where n 2
A join dependency JD( R1,R2,Rn specified on relation schema R is a
trivial JD if one of the relation schemas R i in JD( R 1 R 2 R n is
equal to R

54
Definition
•A relation schema R is in fifth normal form(5NF) (orProject-
JoinNormalForm(PJNF)) with respect to a set F of functional,
multivalued, and join dependencies if,
–for every nontrivial join dependency JD (R1,R2,...,Rn) in F+ (that is,implied
by F),
•every Ri is a superkey of R.

55
56
Summary

• Informal guidelines for good design


• Functional dependency
– Basic tool for analyzing relational schemas
• Normalization:
– 1NF, 2NF, 3NF, BCNF,4NF,5NF

57
Exercise 1
Consider a relation R(A, B, C, D), with FDs AB -> C, BC -> D, CD -> A.
• (a) Find the closure of AB.
• (b) Find candidate keys.
• (c) find the normal form of relation R. If the relation is not in BCNF
then convert into BCNF?
Exercise 2
• Consider relation R(A,B,C,D,E) with the following functional
dependencies: AB -> C, D -> E, DE -> B.

• Is R in BCNF? If not, decompose R into a collection of BCNF


relations.

58
Exercise 3
Compute the closure of the following set F of functional
dependencies for relation schema R = {A, B, C, D, E}.
A -> BC
CD -> E
B -> D
E -> A
List the candidate keys for R.
Exercise 4
Consider a relation R(A,B,C,D,E) with the following
dependencies:
{AB-> C, CD -> E, DE -> B} List all candidate keys.

59
Exercise 5
R(A,B,C,D) and FDs {AB -> C, C -> D, D -> A}.

(1) List all nontrivial FDs that can be inferred from the given FDs.

(2) Find all candidate keys.

(3) Find all BCNF violations.

(4) Decompose R into relations in BCNF.

(5) What FDs are not preserved by BCNF.

60
Thank YOU

61
Database Management Systems (CSE
2012)

MODULE 4

Transaction Management

Presidency University, Bengaluru


Module Topics

Transaction Management
Introduction to Transaction Processing
Transaction and System concepts
Desirable properties of transactions
Schedules of transactions

Presidency University, Bengaluru


Introduction to Transaction Processing

 Transaction processing systems- are systems with large


databases and hundreds of concurrent users executing database
transactions.

 Examples of such systems include

• airline reservations, banking, credit card processing, online


retail purchasing, stock markets, supermarket checkouts, and
many other applications.

Presidency University, Bengaluru


Introduction to Transaction Processing
• A transaction is a unit of program execution that accesses and possibly
updates various data items.
– Read operations (database retrieval, such as SQL SELECT)
– Write operations (modify database, such as SQL INSERT, UPDATE,
DELETE)
• E.g., transaction to transfer $50 from account A to account B:
1. read(A)
2. A := A – 50
3. write(A)
4. read(B)
5. B := B + 50
6. write(B)
• Note: Each execution of a program is a distinct transaction with different
parameters
– Bank transfer program parameters: savings account number, checking
account number, transfer amount

Presidency University, Bengaluru


Introduction to Transaction Processing (contd.)

• A transaction (set of operations) may be:


– stand-alone, specified in a high level language like SQL
submitted interactively, or
– consist of database operations embedded within a program
(most transactions)
• Transaction boundaries: Begin and End transaction.
– Note: An application program may contain several
transactions separated by Begin and End transaction
boundaries

Presidency University, Bengaluru


Introduction to Transaction Processing (contd.)

• Transaction Processing Systems: Large multi-user database systems


supporting thousands of concurrent transactions (user processes)
per minute
• Two Modes of Concurrency
– Interleaved processing: concurrent execution of processes is
interleaved in a single CPU
– Parallel processing: processes are concurrently executed in
multiple CPUs
– Basic transaction processing theory assumes interleaved
concurrency

Presidency University, Bengaluru


Presidency University, Bengaluru
Introduction to Transaction Processing (contd.)

For transaction processing purposes, a simple database model is used:


• A database - collection of named data items
• Granularity (size) of a data item - a field (data item value), a record,
or a whole disk block
– Transaction Processing concepts are independent of
granularity
• Basic operations on an item X:
– read_item(X): Reads a database item named X into a
program variable. To simplify our notation, we assume that
the program variable is also named X.
– write_item(X): Writes the value of program variable X into
the database item named X.

Presidency University, Bengaluru


Introduction to Transaction Processing (contd.)
READ AND WRITE OPERATIONS:
Basic unit of data transfer from the disk to the computer main memory is one disk
block (or page). A data item X (what is read or written) will usually be the field of some
record in the database, although it may be a larger unit such as a whole record or even
a whole block.
o read_item(X) command includes the following steps:
 Find the address of the disk block that contains item X.
 Copy that disk block into a buffer in main memory (if that disk block is not
already in some main memory buffer).
 Copy item X from the buffer to the program variable named X.
o write_item(X) command includes the following steps:
 Find the address of the disk block that contains item X.
 Copy that disk block into a buffer in main memory (if it is not already in some
main memory buffer).
 Copy item X from the program variable named X into its correct location in
the buffer.
 Store the updated block from the buffer back to disk (either immediately or
at some later point in time).
Presidency University, Bengaluru
Transaction Notation

• Notation focuses on the read and write operations


• Can also write in shorthand notation:
– T1: b1; r1(X); w1(X); r1(Y); w1(Y); e1;
– T2: b2; r2(Y); w2(Y); e2;
• bi and ei specify transaction boundaries (begin and end)
• i specifies a unique transaction identifier (Tid)

Presidency University, Bengaluru


Why Concurrency Control Is Needed

Without Concurrency Control, problems may occur with concurrent


transactions:
• Lost Update Problem (Write-Write Conflict).
Occurs when two transactions update the same data item, but
both read the same original value before update (Figure a, next
slide)
• The Temporary Update (Write-Read Conflict or Dirty Read)
Problem.
This occurs when one transaction T1 updates a database item X,
which is accessed (read) by another transaction T2; then T1 fails
for some reason (Figure b); X was (read) by T2 before its value is
changed back (rolled back or UNDONE) after T1 fails

Presidency University, Bengaluru


Why Concurrency Control Is Needed

Presidency University, Bengaluru


Why we need concurrency control (contd.)

• The Incorrect Summary Problem


One transaction is calculating an aggregate summary function on a
number of records (for example, sum (total) of all bank account
balances) while other transactions are updating some of these records
(for example, transferring a large amount between two accounts, see
Figure c); the aggregate function may read some values before they are
updated and others after they are updated.

Presidency University, Bengaluru


Why we need concurrency control (contd.)

• The Unrepeatable Read Problem (Read-Write Conflict)


A transaction T1 may read an item (say, available seats on a flight);
later, T1 may read the same item again and get a different value
because another transaction T2 has updated the item (reserved
seats on the flight) between the two reads by T1

Presidency University, Bengaluru


Why recovery is needed
Causes of transaction failure:
1. A computer failure (system crash): A hardware or software error
occurs during transaction execution. If the hardware crashes, the
contents of the computer’s internal main memory may be lost.
2. A transaction or system error : Some operation in the transaction
may cause it to fail, such as integer overflow or division by zero.
Transaction failure may also occur because of erroneous parameter
values or because of a logical programming error. In addition, the
user may interrupt the transaction during its execution.
3. Local errors or exception conditions detected by the transaction:
– For example, data for the transaction may not be found.
– A condition, such as insufficient account balance in a banking
database, may cause a transaction, such as a fund withdrawal, to
be canceled
– a programmed abort causes the transaction to fail.

Presidency University, Bengaluru


Why recovery is needed (contd.)

4. Concurrency control enforcement: The concurrency control


method may decide to abort the transaction, to be restarted later,
because it violates serializability or because several transactions
are in a state of deadlock.
5. Disk failure: Some disk blocks may lose their data because of a
read or write malfunction or because of a disk read/write head
crash.
6. Physical problems and catastrophes: This refers to an endless list
of problems that includes power or air-conditioning failure, fire,
theft, sabotage, overwriting disks or tapes by mistake, and
mounting of a wrong tape by the operator.

Presidency University, Bengaluru


Transaction and System Concepts

A transaction is an atomic unit of work that is either completed in its


entirety or not done at all. A transaction passes through several states
(Figure 20.4, similar to process states in operating systems).
Transaction states:
• Active state (executing read, write operations)
• Partially committed state (ended but waiting for system checks to
determine success or failure)
• Committed state (transaction succeeded)
• Failed state (transaction failed, must be rolled back)
• Terminated State (transaction leaves system)

Presidency University, Bengaluru


Transaction and System Concepts (contd.)

DBMS Recovery Manager needs system to keep track of the


following operations (in the system log file):
• begin_transaction: Start of transaction execution.
• read or write: Read or write operations on the database items
that are executed as part of a transaction.
• end_transaction: Specifies end of read and write transaction
operations have ended. System may still have to check whether
the changes (writes) introduced by transaction can be
permanently applied to the database (commit transaction); or
whether the transaction has to be rolled back (abort transaction)
because it violates concurrency control or for some other reason.
• commit_transaction: Signals successful end of transaction; any
changes (writes) executed by transaction can be safely
committed to the database and will not be undone.

Presidency University, Bengaluru


Transaction and System Concepts (contd.)

Recovery manager keeps track of the following operations (cont.):


• abort_transaction (or rollback): Signals transaction has ended
unsuccessfully; any changes or effects that the transaction may
have applied to the database must be undone.

System operations used during recovery:


• undo(X): Similar to rollback except that it applies to a single write
operation rather than to a whole transaction.
• redo(X): This specifies that a write operation of a committed
transaction must be redone to ensure that it has been applied
permanently to the database on disk.

Presidency University, Bengaluru


Transaction and System Concepts (contd.)

The System Log File


• The log is a sequential , append-only file to keep track of all operations
of all transactions in the order in which they occurred. This information
is needed during recovery from failures
• Log is kept on disk - not affected except for disk or catastrophic failure
• As with other disk files, a log main memory buffer is kept for holding
the records being appended until the whole buffer is appended to the
end of the log file on disk
• Log is periodically backed up to archival storage (tape) to guard against
catastrophic failures
• protocols for recovery that avoid cascading rollbacks do not require
that read operations be written to the system log; most recovery
protocols fall in this category
• strict protocols require simpler write entries that do not include
new_value

Presidency University, Bengaluru


Transaction and System Concepts (contd.)

Types of records (entries) in log file:


• [start_transaction,T]: Records that transaction T has started
execution.
• [write_item,T,X,old_value,new_value]: T has changed the value of
item X from old_value to new_value.
• [read_item,T,X]: T has read the value of item X (not needed in
many cases).
• [end_transaction,T]: T has ended execution
• [commit,T]: T has completed successfully, and committed.
• [abort,T]: T has been aborted.

Presidency University, Bengaluru


Transaction and System Concepts (contd.)

Commit Point of a Transaction:


o Definition: A transaction T reaches its commit point when all its
operations that access the database have been executed
successfully and the effect of all the transaction operations on the
database has been recorded in the log file (on disk). The
transaction is then said to be committed.

Presidency University, Bengaluru


Desirable Properties of Transactions

Called ACID properties – Atomicity, Consistency, Isolation,


Durability:

• Atomicity: A transaction is an atomic unit of processing;


it is either performed in its entirety or not performed at
all.

• Consistency preservation: A correct execution of the


transaction must take the database from one consistent
state to another.

Presidency University, Bengaluru


Desirable Properties of Transactions (contd.)

ACID properties (cont.):


• Isolation: Even though transactions are executing concurrently, they
should appear to be executed in isolation – that is, their final effect
should be as if each transaction was executed in isolation from start
to finish.

• Durability: Once a transaction is committed, its changes (writes)


applied to the database must never be lost because of subsequent
failure.

Presidency University, Bengaluru


Desirable Properties of Transactions

• Atomicity: Enforced by the recovery protocol.


• Consistency preservation: Specifies that each transaction does a
correct action on the database on its own. Application
programmers and DBMS constraint enforcement are responsible
for this.
• Isolation: Responsibility of the concurrency control protocol.
• Durability : Enforced by the recovery protocol.

Presidency University, Bengaluru


Self study for flipped class
activity

Presidency University, Bengaluru


Schedules of Transactions
• Transaction schedule (or history) :A schedule S of n transactions T1,
T2, ..., Tn is an ordering of the operations of the transactions.
Operations from different transactions can be interleaved in the
schedule S.
• However, for each transaction Ti that participates in the schedule S,
the operations of Ti in S must appear in the same order in which
they occur in Ti.
• Figure (next slide) shows 4 possible schedules (A, B, C, D) of two
transactions T1 and T2:
– Order of operations from top to bottom
– Each schedule includes same operations
– Different order of operations in each schedule

Presidency University, Bengaluru


Presidency University, Bengaluru
Schedules of Transactions (contd.)

• Schedules can also be displayed in more compact notation


• Order of operations from left to right
• Include only read (r) and write (w) operations, with transaction id
(1, 2, …) and item name (X, Y, …)
• Can also include other operations such as b (begin), e (end), c
(commit), a (abort)
• Schedules in previous slide would be displayed as follows:
– Schedule A: r1(X); w1(X); r1(Y); w1(Y); r2(X); w2(x);
– Schedule B: r2(X); w2(X); r1(X); w1(X); r1(Y); w1(Y);
– Schedule C: r1(X); r2(X); w1(X); r1(Y); w2(X); w1(Y);
– Schedule D: r1(X); w1(X); r2(X); w2(X); r1(Y); w1(Y);

Presidency University, Bengaluru


Schedules of Transactions (contd.)

• Generally very large number of possible schedules


• Some schedules are easy to recover from after a failure, while
others are not
• Some schedules produce correct results, while others produce
incorrect results
• Rest of chapter characterizes schedules by classifying them based
on ease of recovery (recoverability) and correctness
(serializability)

Presidency University, Bengaluru


Characterizing Schedules based on Recoverability

Schedules classified into two main classes:


• Recoverable schedule: One where no committed transaction
needs to be rolled back (aborted).
A schedule S is recoverable if no transaction T in S commits until
all transactions T’ that have written an item that T reads have
committed.
• Non-recoverable schedule: A schedule where a committed
transaction may have to be rolled back during recovery.
This violates Durability from ACID properties (a committed
transaction cannot be rolled back) and so non-recoverable
schedules should not be allowed.

Presidency University, Bengaluru


Characterizing Schedules based on
Recoverability (contd.)
• Example: Schedule A below is non-recoverable because T2 reads
the value of X that was written by T1, but then T2 commits before
T1 commits or aborts
• To make it recoverable, the commit of T2 (c2) must be delayed
until T1 either commits, or aborts (Schedule B)
• If T1 commits, T2 can commit
• If T1 aborts, T2 must also abort because it read a value that was
written by T1; this value must be undone (reset to its old value)
when T1 is aborted
– known as cascading rollback

• Schedule A: r1(X); w1(X); r2(X); w2(X); c2; r1(Y); w1(Y); c1 (or a1)
• Schedule B: r1(X); w1(X); r2(X); w2(X); r1(Y); w1(Y); c1 (or a1); ...

Presidency University, Bengaluru


Characterizing Schedules based on
Recoverability (contd.)

Recoverable schedules can be further refined:


• Cascadeless schedule: A schedule in which a transaction T2
cannot read an item X until the transaction T1 that last wrote X has
committed.
• The set of cascadeless schedules is a subset of the set of
recoverable schedules.

Schedules requiring cascaded rollback: A schedule in which an


uncommitted transaction T2 that read an item that was written by
a failed transaction T1 must be rolled back.

Presidency University, Bengaluru


Characterizing Schedules based on
Recoverability (contd.)
• Example: Schedule B below is not cascadeless because T2 reads
the value of X that was written by T1 before T1 commits
• If T1 aborts (fails), T2 must also be aborted (rolled back) resulting
in cascading rollback
• To make it cascadeless, the r2(X) of T2 must be delayed until T1
commits (or aborts and rolls back the value of X to its previous
value) – see Schedule C

• Schedule B: r1(X); w1(X); r2(X); w2(X); r1(Y); w1(Y); c1 (or a1);


• Schedule C: r1(X); w1(X); r1(Y); w1(Y); c1; r2(X); w2(X); ...

Presidency University, Bengaluru


Characterizing Schedules based on
Recoverability (contd.)

Cascadeless schedules can be further refined:


• Strict schedule: A schedule in which a transaction T2 can
neither read nor write an item X until the transaction T1 that last
wrote X has committed.
• The set of strict schedules is a subset of the set of cascadeless
schedules.
• If blind writes are not allowed, all cascadeless schedules are also
strict

Blind write: A write operation w2(X) that is not preceded by a


read r2(X).

Presidency University, Bengaluru


Characterizing Schedules based on
Recoverability (contd.)
• Example: Schedule C below is cascadeless and also strict
(because it has no blind writes)
• Schedule D is cascadeless, but not strict (because of the blind
write w3(X), which writes the value of X before T1 commits)
• To make it strict, w3(X) must be delayed until after T1 commits –
see Schedule E

• Schedule C: r1(X); w1(X); r1(Y); w1(Y); c1; r2(X); w2(X); …


• Schedule D: r1(X); w1(X); w3(X); r1(Y); w1(Y); c1; r2(X); w2(X); …
• Schedule E: r1(X); w1(X); r1(Y); w1(Y); c1; w3(X); r2(X); w2(X); …

Presidency University, Bengaluru


Characterizing Schedules based on
Recoverability (contd.)
Summary:
• Many schedules can exist for a set of transactions
• The set of all possible schedules can be partitioned into two
subsets: recoverable and non-recoverable
• A subset of the recoverable schedules are cascadeless
• If blind writes are allowed, a subset of the cascadeless schedules
are strict
• If blind writes are not allowed, the set of cascadeless schedules is
the same as the set of strict schedules

Presidency University, Bengaluru


Characterizing Schedules based on Serializability

• Among the large set of possible schedules, we want to


characterize which schedules are guaranteed to give a
correct result
• The consistency preservation property of the ACID
properties states that: each transaction if executed on its
own (from start to finish) will transform a consistent state of
the database into another consistent state
• Hence, each transaction is correct on its own

Presidency University, Bengaluru


Characterizing Schedules based on
Serializability (contd.)

• Serial schedule: A schedule S is serial if, for every transaction


T participating in the schedule, all the operations of T are
executed consecutively (without interleaving of operations
from other transactions) in the schedule. Otherwise, the
schedule is called nonserial.
• Based on the consistency preservation property, any serial
schedule will produce a correct result (assuming no inter-
dependencies among different transactions)

Presidency University, Bengaluru


Characterizing Schedules based on
Serializability (contd.)

• Serial schedules are not feasible for performance reasons:


– No interleaving of operations
– Long transactions force other transactions to wait
– System cannot switch to other transaction when a
transaction is waiting for disk I/O or any other event
– Need to allow concurrency with interleaving without
sacrificing correctness

Presidency University, Bengaluru


Characterizing Schedules based on
Serializability (contd.)

• Serializable schedule: A schedule S is serializable if it is


equivalent to some serial schedule of the same n
transactions.
• There are (n)! serial schedules for n transactions – a
serializable schedule can be equivalent to any of the
serial schedules
• Question: How do we define equivalence of schedules?

Presidency University, Bengaluru


Equivalence of Schedules

• Result equivalent: Two schedules are called result equivalent


if they produce the same final state of the database.
• Difficult to determine without analyzing the internal
operations of the transactions, which is not feasible in
general.
• May also get result equivalence by chance for a particular
input parameter even though schedules are not equivalent in
general (see Figure 20.6)

Presidency University, Bengaluru


Equivalence of Schedules (contd.)

• Conflict equivalent: Two schedules are conflict equivalent if


the relative order of any two conflicting operations is the
same in both schedules.
• Commonly used definition of schedule equivalence
• Two operations are conflicting if:
– They access the same data item X
– They are from two different transactions
– At least one is a write operation
• Read-Write conflict example: r1(X) and w2(X)
• Write-write conflict example: w1(Y) and w2(Y)

Presidency University, Bengaluru


Characterizing Schedules based on
Serializability (contd.)

• Conflict equivalence of schedules is used to determine


which schedules are correct in general (serializable)

A schedule S is said to be serializable if it is conflict


equivalent to some serial schedule S’.

Presidency University, Bengaluru


Characterizing Schedules based on
Serializability (contd.)

• A serializable schedule is considered to be correct


because it is equivalent to a serial schedule, and any
serial schedule is considered to be correct
– It will leave the database in a consistent state.
– The interleaving is appropriate and will result in a state
as if the transactions were serially executed, yet will
achieve efficiency due to concurrent execution and
interleaving of operations from different transactions.

Presidency University, Bengaluru


Characterizing Schedules based on
Serializability (contd.)

Testing for conflict serializability


Algorithm 21.1:
• Looks at only r(X) and w(X) operations in a schedule
• Constructs a precedence graph (serialization graph) – one node
for each transaction, plus directed edges
• An edge is created from Ti to Tj if one of the operations in Ti
appears before a conflicting operation in Tj
• The schedule is serializable if and only if the precedence graph
has no cycles.

Presidency University, Bengaluru


Presidency University, Bengaluru
Presidency University, Bengaluru
Presidency University, Bengaluru
Presidency University, Bengaluru
Presidency University, Bengaluru
Characterizing Schedules based on Serializability
(contd.)

• View equivalence: A less restrictive definition of


equivalence of schedules than conflict serializability
when blind writes are allowed

• View serializability: definition of serializability based on


view equivalence. A schedule is view serializable if it is
view equivalent to a serial schedule.

Presidency University, Bengaluru


Characterizing Schedules based on Serializability
(contd.)

Two schedules are said to be view equivalent if the following three


conditions hold:
• The same set of transactions participates in S and S’, and S and S’
include the same operations of those transactions.
• For any operation Ri(X) of Ti in S, if the value of X read was
written by an operation Wj(X) of Tj (or if it is the original value of
X before the schedule started), the same condition must hold for
the value of X read by operation Ri(X) of Ti in S’.
• If the operation Wk(Y) of Tk is the last operation to write item Y
in S, then Wk(Y) of Tk must also be the last operation to write
item Y in S’.

Presidency University, Bengaluru


Characterizing Schedules based on Serializability
(contd.)

The premise behind view equivalence:


o Each read operation of a transaction reads the result
of the same write operation in both schedules.
o “The view”: the read operations are said to see the
same view in both schedules.
o The final write operation on each item is the same
on both schedules resulting in the same final
database state in case of blind writes

Presidency University, Bengaluru


Characterizing Schedules based on Serializability
(contd.)

Relationship between view and conflict equivalence:

o The two are same under constrained write


assumption (no blind writes allowed)
o Conflict serializability is stricter than view
serializability when blind writes occur (a schedule
that is view serializable is not necessarily conflict
serialiable.
o Any conflict serializable schedule is also view
serializable, but not vice versa.

Presidency University, Bengaluru


Characterizing Schedules based on Serializability
(contd.)

Relationship between view and conflict equivalence


(cont):
Consider the following schedule of three transactions
T1: r1(X); w1(X); T2: w2(X); and T3: w3(X):
Schedule Sa: r1(X); w2(X); w1(X); w3(X); c1; c2; c3;

In Sa, the operations w2(X) and w3(X) are blind writes, since T2
and T3 do not read the value of X.

Sa is view serializable, since it is view equivalent to the serial


schedule T1, T2, T3. However, Sa is not conflict serializable,
since it is not conflict equivalent to any serial schedule.

Presidency University, Bengaluru


Presidency University, Bengaluru

You might also like