DBMS ESE
1. Introduction to database
● A database is a structured collection of data designed for efficient
storage, retrieval, and management.
● Its primary purpose is to organize data in a systematic manner, facilitating
easy access and manipulation.
● Different types of databases exist, including relational (SQL), NoSQL,
object-oriented, etc.
● For example: Social media platforms use databases to manage user
profiles, posts, connections, and interactions.
● Advantages of Databases:
○ Databases ensure data consistency, accuracy, and reliability.
○ They offer robust security measures, scalability, and concurrent
access for enhanced efficiency.
2. Characteristics of Database
● Self-Describing Nature:
○ The DBMS catalog stores metadata, providing descriptions of the
database structure.
○ This self-describing nature simplifies database management and
allows for more efficient querying and manipulation of data.
● Insulation between Programs and Data:
○ Program-data independence allows modifications to data storage
structures and operations without necessitating changes in DBMS
access programs.
○ This insulation enables easier adaptation to evolving business
needs without affecting the applications utilizing the database.
● Data Abstraction:
○ Utilizes a data model to offer users a conceptual view of the
database, concealing intricate storage details.
○ Data abstraction enhances user understanding by presenting a
simplified and logical view of the database structure.
● Support of Multiple Views:
○ Enables diverse users to have customized views of the database,
displaying only the data relevant to their specific needs or
permissions.
○ Multiple views enhance usability by tailoring data presentation
based on different user roles or requirements.
● Sharing of Data and Multi User Transaction Processing:
○ Allows multiple users to concurrently access, retrieve, and update
the database.
○ Concurrency control mechanisms within the DBMS ensure proper
execution or complete cancellation of each transaction to maintain
data consistency and integrity.
3. Properties of database
● Representation of Real-World Aspects:
○ Databases aim to represent aspects of the real world. Through
tables, records, and relationships, databases emulate real-world
structures to facilitate data organization and retrieval.
● Logical Coherence and Meaningful Data:
○ Data within databases is logically coherent, organized in a
structured manner following predefined rules and relationships.
● Purposeful Design, Construction, and Population:
○ Databases are designed, built, and populated with a specific
objective in mind, addressing particular needs or requirements.
○ The creation process involves defining schemas, tables, and
constraints based on the intended use and application
requirements.
● Tailored for Target Users and Applications:
○ Databases are constructed considering a target group of users and
specific applications they serve.
○ They are optimized to efficiently store, retrieve, and manage data
relevant to the intended users and applications, ensuring
performance meets their requirements.
4. File vs Database system
5. Users of database
● Actors on the Scene
● These individuals directly interact with, control, or use the database and
its content:
○ Database Administrators (DBAs):
■ Responsible for authorizing access, coordinating and
monitoring database use, acquiring resources, controlling
access, and ensuring operational efficiency.
○ Database Designers:
■ Define the database content, structure, constraints, and
functions or transactions. They liaise with end-users to
understand their requirements and translate them into the
database design.
○ End-users:
■ Utilize the data for queries, reports, and some update
operations.
● Categories of End-users
○ Casual Users:
■ Access the database occasionally as needed.
○ Naïve or Parametric Users:
■ Utilize pre-defined functions ("canned transactions")
regularly. Examples include bank-tellers or reservation
clerks.
○ Application Programmers:
■ Develop application programs using various tools, including
those for user interface development.
○ Sophisticated Users:
■ Business analysts, scientists, engineers, etc., familiar with
system capabilities, often using software packages that
work closely with the stored database.
○ Stand-alone Users:
■ Maintain personal databases using packaged applications,
e.g., an individual using a tax program to manage their
personal database.
● Database Administrator (DBA)
○ Oversee central control of data and programs accessing the data
within DBMS.
○ Functions include schema definition, storage structure
modification, authorization management, and routine maintenance.
● Workers Behind the Scene
● These individuals contribute to the development, implementation, and
maintenance of the database system:
○ DBMS System Designers and Implementers:
■ Design and implement DBMS modules and interfaces as
software packages.
○ Tool Developers:
■ Design and implement software tools facilitating database
modeling, design, system optimization, and performance
enhancement.
○ Operators and Maintenance Personnel:
■ Responsible for running and maintaining the hardware and
software environment for the database system.
6. Three schema architecture
● External Schema:
○ Provides a tailored view of data for each user group or application.
○ Offers specific data formats & access rights unique to each user's
needs
● Conceptual Schema:
○ Presents a unified, logical structure of the entire database.
○ Ensures consistency across different user perspectives without
detailing specific user needs.
● Internal Schema:
○ Describes how data is physically stored within the database.
○ Focuses on optimizing storage and retrieval mechanisms.
● Operations and Mappings
○ Request Processing:
■ Transforms user requests from their view (external schema)
to the conceptual schema.
■ Converts these requests to operations accessing the
internal schema for efficient processing.
○ Mappings:
■ Ensure accurate processing of requests between schema
levels without compromising data integrity.
○ Data Formatting:
■ Adjusts internally stored data to match the specific user's
view during data retrieval.
7. Database Independence
● Data Independence refers to the database system's ability to modify one
level of the system without impacting other levels, ensuring flexibility
and adaptability. There are two primary types:
● Logical Data Independence:
○ Enables modifications to the conceptual schema without affecting
external schemas or application programs.
○ Changes such as additions or deletions of data within the
database should not disrupt or alter the external view or access to
data.
○ Applications using the external schema should continue
functioning normally even after significant changes are made to
the logical structure of the database.
● Physical Data Independence:
○ Allows alterations to the internal schema without necessitating
changes in the conceptual or external schemas.
○ Changes in the physical schema, like optimizing storage,
restructuring files, or improving performance through indexing, can
be implemented without affecting the external view of data.
○ Modifications to the physical organization should not demand
changes in the conceptual schema as long as the data remains
consistent.
8. Database architecture
● Query Processor
● Handles user queries and facilitates data retrieval.
● Components:
○ DDL Interpreter: Interprets and executes Data Definition Language
(DDL) statements, retrieving data definitions from the data
dictionary.
○ DML Compiler: Translates Data Manipulation Language (DML)
statements into executable low-level instructions for the query
evaluation engine.
○ Query Evaluation Engine: Executes instructions generated by the
DML compiler to fetch data from the DBMS.
● Storage Manager
● Responsible for managing data storage and ensuring data integrity.
● Components:
○ Authorization and Integrity Manager: Enforces integrity constraints
and manages user access rights.
○ Transaction Manager: Maintains database consistency, especially
during system failures or concurrent access.
○ File Manager: Controls disk storage space allocation and data
structures on disk.
○ Buffer Manager: Handles efficient data transfer between disk
storage and main memory, optimizing data retrieval.
● Storage Components
○ Data Files: Store actual data within the database.
○ Data Dictionary: Contains metadata providing essential
information about the database structure.
○ Indices: Facilitate swift access to specific data items, enhancing
data retrieval speed.
9. Schema and instance
● A schema represents the logical plan defining the structure, organization,
and relationships of data within a database.
● It provides the blueprint for how data is stored, accessed, and
manipulated.
● Key Points about Schemas:
○ Structure Definition:
■ Specifies the table structure, attributes (columns), data
types, constraints, and relationships between tables.
■ Defines the layout and arrangement of data elements,
ensuring data organization and coherence.
○ Data Integrity Constraints:
■ Enforces data integrity by implementing constraints such as
primary keys, foreign keys, unique constraints, and check
constraints.
■ Ensures accuracy, consistency, and reliability of data stored
within the database.
○ Security and Access Control:
■ Defines user roles and access privileges within the database
environment.
■ Specifies who has permission to view, modify, or manipulate
data in tables or the entire database.
■ Includes security policies and access control mechanisms to
safeguard sensitive information and regulate user actions.
○ Data Dictionary:
■ Often contains a data dictionary that serves as a repository
for metadata about the database.
■ Contains descriptions of tables, columns, constraints,
indexes, and other database components.
■ Provides essential information describing the database's
structure, aiding in understanding and managing the
database effectively.
Chapter 2
10.Relational schema
● A relational schema is a blueprint that defines the structure, constraints,
and relationships of tables in a relational database.
● It outlines the logical view of the database and describes how data is
organized and stored.
● Components of Relational Schema:
○ Table Definition: Specifies table names and their associated
attributes.
○ Constraints: Enforces rules and restrictions on data integrity, such
as primary keys, foreign keys, and unique constraints.
○ Relationships: Defines associations between tables, establishing
connections using keys.
11.ORDBMS vs RDBMS
12.Relational algebra
● The algebraic operations create new relations by manipulating existing
ones using operations within the algebra.
● Operations can be chained or combined to form relational algebra
expressions, defining complex queries or data manipulations.
● The output of a relational algebra expression is a relation, representing
the result of a database query or retrieval request.
● Relational algebra forms the fundamental operations for the relational
model, allowing users to specify basic retrieval requests or queries.
● All operations in relational algebra produce relations as output, making
the algebra "closed" as all objects within it are relations.
● Types of Relational Algebra Operations:
● Unary Relational Operations:
○ SELECT (σ):
■ Filters rows from a relation based on a specified condition.
○ PROJECT (π):
■ Selects specific columns from a relation, removing
duplicates.
○ RENAME (ρ):
■ Changes the name of a relation or its attributes.
● Operations from Set Theory:
○ UNION (∪), INTERSECTION (∩), DIFFERENCE (MINUS, −):
■ Set-based operations to combine or compare relations.
○ CARTESIAN PRODUCT (×):
■ Combines tuples from two relations to create a new
relation.
13.Select
● Purpose: Filters rows from a relation based on specified conditions.
● Symbol: Represented by σ (sigma).
● Functionality: Retrieves rows that satisfy specified criteria.
● Syntax: σ<condition>(Relation)
● Example: σ(Salary > 50000)(Employees)
● Result: Generates a new relation with rows meeting the condition.
14.Project
● Purpose: Selects specific columns from a relation.
● Symbol: Represented by π (pi).
● Functionality: Retrieves chosen attributes from a relation, eliminating
duplicates.
● Syntax: π<attribute(s)>(Relation)
● Example: π(Name, Salary)(Employees)
● Result: Generates a new relation with specified attributes.
15.Rename
● Purpose: Changes the name of a relation or its attributes.
● Symbol: Represented by ρ (rho).
● Functionality: Renames the relation or its attributes for clarity or
convenience.
● Syntax: ρ<NewName/Attribute>(OldName)
● Example: ρ(NewSalary/Salary)(Employees)
● Result: Renames the attribute "Salary" to "NewSalary" in the Employees relation.
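● The three unary operations can be illustrated with a minimal Python sketch (not part of the original notes): a relation is modelled as a list of dictionaries, and the names select_, project_, and rename_attr are invented for this example. Note how every result is itself a relation, which is why the algebra is "closed".
    def select_(relation, predicate):
        # σ: keep only the rows that satisfy the predicate
        return [row for row in relation if predicate(row)]

    def project_(relation, attributes):
        # π: keep only the listed columns and drop duplicate rows
        seen, result = set(), []
        for row in relation:
            projected = tuple((a, row[a]) for a in attributes)
            if projected not in seen:
                seen.add(projected)
                result.append(dict(projected))
        return result

    def rename_attr(relation, old, new):
        # ρ: rename one attribute in every row
        return [{(new if k == old else k): v for k, v in row.items()} for row in relation]

    employees = [{"Name": "Asha", "Salary": 60000}, {"Name": "Ravi", "Salary": 45000}]
    print(project_(select_(employees, lambda r: r["Salary"] > 50000), ["Name"]))
    # [{'Name': 'Asha'}]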
16. UNION (∪)
● Purpose: Combines tuples from two relations, removing duplicates.
● Symbol: Represented by ∪ (union).
● Functionality: Merges rows from two relations into a single relation.
● Syntax: R ∪ S (R and S are relations with the same
attributes)
● Example: If R represents the set of employees in
Department A and S represents employees in Department B, R
∪ S would provide a combined list of employees from both
departments, removing duplicates.
● Result: Generates a new relation with all distinct tuples from both
relations.
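● Continuing the list-of-dictionaries sketch above (an illustration, not library behaviour), UNION merges two union-compatible relations and drops duplicates:
    def union_(r, s):
        # R ∪ S: both relations must have the same attributes (union compatibility)
        result = []
        for row in r + s:
            if row not in result:      # duplicate elimination
                result.append(row)
        return result

    dept_a = [{"EmpID": 1}, {"EmpID": 2}]
    dept_b = [{"EmpID": 2}, {"EmpID": 3}]
    print(union_(dept_a, dept_b))      # [{'EmpID': 1}, {'EmpID': 2}, {'EmpID': 3}]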
20. EQUIJOIN
● Purpose: Performs a type of JOIN operation based on equality between
attributes.
● Functionality: Merges rows from two relations where specified attributes
are equal.
● Syntax: R ⨝ (R.Attribute = S.Attribute) S (R and S are relations)
● Example: If R represents the set of employees and S represents their respective
departments, R ⨝ (R.DepartmentID = S.DepartmentID) S would combine employee
data with their corresponding departments where the DepartmentID matches.
● Result: Generates a new relation by matching tuples from both input
relations where the specified attributes are equal.
21. NATURAL JOIN
● Purpose: Performs a JOIN operation based on common attributes
between two relations.
● Functionality: Merges rows from two relations with matching attribute
names.
● Syntax: R * S (also written R ⨝ S with no join condition; R and S are relations)
● Example: If R represents employee data and S represents departments, R * S would combine data where both relations share common attribute names, such as "DepartmentID".
● Result: Generates a new relation by combining tuples from both input
relations based on their shared attribute names.
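● A rough nested-loop sketch of EQUIJOIN and NATURAL JOIN over the same list-of-dictionaries representation; the function names are invented, and real systems use far more efficient join algorithms:
    def equijoin(r, s, r_attr, s_attr):
        # R ⨝ (R.r_attr = S.s_attr) S: pair rows whose join attributes are equal
        return [{**row_r, **row_s} for row_r in r for row_s in s
                if row_r[r_attr] == row_s[s_attr]]

    def natural_join(r, s):
        # Join on every attribute name the two relations share (assumes non-empty relations);
        # the common columns appear only once in the result
        common = set(r[0]) & set(s[0])
        return [{**row_r, **row_s} for row_r in r for row_s in s
                if all(row_r[a] == row_s[a] for a in common)]

    employees = [{"Name": "Asha", "DepartmentID": 10}]
    departments = [{"DepartmentID": 10, "DeptName": "Sales"}]
    print(natural_join(employees, departments))
    # [{'Name': 'Asha', 'DepartmentID': 10, 'DeptName': 'Sales'}]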
22.Aggregate functions
● Purpose: Computes summaries or aggregations of data within a relation.
● Usage: Operates on groups of tuples to generate a single result.
● Functions: Common aggregate functions include:
○ SUM: Computes the sum of values in a specified attribute.
○ COUNT: Calculates the number of tuples or non-null values.
○ AVG: Determines the average of values in a specified attribute.
○ MIN: Finds the minimum value in a specified attribute.
○ MAX: Identifies the maximum value in a specified attribute.
● Syntax: AggregateFunction<Attribute>(Relation)
● Example: COUNT(EmployeeID)(Employees) - Counts the number of
EmployeeID entries in the Employees relation.
● Result: Returns a single value that represents the result of the aggregate
function applied to the specified attribute in the relation.
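● A small sketch of how an aggregate collapses a relation to a single value, reusing the hypothetical list-of-dictionaries representation from the earlier sketches:
    def aggregate(relation, attribute, func):
        # Apply one aggregate function (SUM, COUNT, AVG, MIN, MAX) to one attribute
        values = [row[attribute] for row in relation if row[attribute] is not None]
        return func(values)

    employees = [{"EmployeeID": 1, "Salary": 40000},
                 {"EmployeeID": 2, "Salary": 60000}]
    print(aggregate(employees, "EmployeeID", len))                    # COUNT -> 2
    print(aggregate(employees, "Salary", sum))                        # SUM   -> 100000
    print(aggregate(employees, "Salary", lambda v: sum(v) / len(v)))  # AVG   -> 50000.0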
23. OUTER JOIN Operation
● Purpose: Performs a JOIN operation, including unmatched tuples from
one or both relations.
● Types: Includes LEFT OUTER JOIN, RIGHT OUTER JOIN, and FULL
OUTER JOIN.
● Functionality: Combines tuples from two relations based on a specified
condition and includes unmatched tuples.
● Syntax (for LEFT OUTER JOIN): R ⟕<condition> S (R and S are relations)
● Example: If R represents employees and S represents departments, R ⟕ (R.DepartmentID = S.DepartmentID) S would combine employee data with matching departments and keep every employee, even those whose DepartmentID has no match in S.
● Result: Generates a new relation by matching tuples based on the
condition and including unmatched tuples from one or both relations.
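● Extending the join sketch above, a LEFT OUTER JOIN keeps every tuple of the left relation and pads missing matches with nulls (None here); this is only an illustration of the semantics:
    def left_outer_join(r, s, attr):
        # Keep every row of r; pad s's columns with None when no match exists
        s_columns = [c for c in s[0] if c != attr]      # assumes s is non-empty
        result = []
        for row_r in r:
            matches = [row_s for row_s in s if row_s[attr] == row_r[attr]]
            if matches:
                result.extend({**row_r, **m} for m in matches)
            else:
                result.append({**row_r, **{c: None for c in s_columns}})
        return result

    emps = [{"Name": "Asha", "DeptID": 10}, {"Name": "Ravi", "DeptID": 99}]
    depts = [{"DeptID": 10, "DeptName": "Sales"}]
    print(left_outer_join(emps, depts, "DeptID"))
    # [{'Name': 'Asha', 'DeptID': 10, 'DeptName': 'Sales'},
    #  {'Name': 'Ravi', 'DeptID': 99, 'DeptName': None}]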
Chapter 3
1. Overview of SQL
● Data-definition language(DDL):
○ The SQL DDL provides commands for defining relation schemas,
deleting relations, and modifying relation schemas.
● Data-manipulation language(DML):
○ The SQL DML provides the ability to query information from the
database and to insert tuples into, delete tuples from, and modify
tuples in the database.
● Integrity:
○ The SQL DDL includes commands for specifying integrity
constraints that the data stored in the database must satisfy.
Updates that violate integrity constraints are disallowed.
● View definition:
○ The SQL DDL includes commands for defining views.
● Transaction control:
○ SQL includes commands for specifying the beginning and ending
of transactions.
● Embedded and dynamic SQL:
○ Define how SQL statements can be embedded within general-
purpose programming languages, such as C, C++, and Java.
● Authorization:
○ The SQL DDL includes commands for specifying access rights to
relations and views.
2. Domains of SQL
● char(n): Fixed length, user-defined size string.
● varchar(n): Variable length, user-defined maximum size string.
● int: Machine-dependent integer.
● smallint: Machine-dependent small integer.
● numeric(p,d): Fixed point number with precision p and d digits to the right
of the decimal point.
● real, double precision: Floating-point numbers with machine-dependent
precision.
● float(n): Floating-point number with at least n digits of precision.
3. DDL commands
● CREATE Command:
○ Used to create new database objects like tables, views, indexes,
etc.
○ Syntax for creating a table: CREATE TABLE table_name (column1
datatype, column2 datatype, ...);
○ Requires specifying table name, column names, & their data types.
○ Example: CREATE TABLE students (id INT, name VARCHAR(50),
age INT);
● DROP Command:
○ Used to remove database objects like tables, views, indexes, etc.
○ Syntax for dropping a table: DROP TABLE table_name;
○ Irreversibly deletes the entire table and its data.
○ Example: DROP TABLE employees;
● ALTER Command:
○ Modifies the structure of existing database objects like tables.
○ Syntax for adding a column: ALTER TABLE table_name ADD
column_name datatype;
○ Allows adding, modifying, or dropping columns, changing data
types, etc.
○ Example: ALTER TABLE customers ADD email VARCHAR(100);
● TRUNCATE Command:
○ Removes all records from a table but keeps the table structure
intact.
○ Syntax: TRUNCATE TABLE table_name;
○ Faster than DELETE as it doesn't generate individual deletion logs
for each row.
○ Example: TRUNCATE TABLE logs;
● RENAME Command:
○ Renames an existing database object.
○ Syntax for renaming a table: RENAME TABLE old_table_name TO
new_table_name;
○ Useful for altering object names without changing their structures.
○ Example: RENAME TABLE users TO customers;
4. DML commands
● SELECT Command:
○ Fundamental command used to retrieve data from a database.
○ Syntax: SELECT column1, column2 FROM table_name WHERE
condition;
○ Allows fetching specific columns or all columns from a table.
○ The DISTINCT keyword removes duplicate rows, and * selects all columns.
○ The WHERE clause is optional but used for filtering data based on
specified conditions.
○ Example: SELECT * FROM customers WHERE country='USA';
● INSERT Command:
○ Used to add new records/rows into a database table.
○ Syntax: INSERT INTO table_name (column1, column2, ...) VALUES
(value1, value2, ...);
○ Requires specifying the table and columns where data needs to be
inserted along with corresponding values.
○ Example: INSERT INTO employees (emp_name, emp_salary)
VALUES ('John Doe', 50000);
● UPDATE Command:
○ Modifies existing records in a table.
○ Syntax: UPDATE table_name SET column1 = value1, column2 =
value2, ... WHERE condition;
○ Indicates which columns to update and their new values.
○ The WHERE clause is crucial to specify which records to update;
without it, all records in the table might be affected.
○ Example: UPDATE products SET quantity = 100 WHERE
product_id = 123;
● DELETE Command:
○ Removes records from a table.
○ Syntax: DELETE FROM table_name WHERE condition;
○ Requires the WHERE clause to prevent accidentally deleting all
records and to specify which records to remove based on certain
conditions.
○ Example: DELETE FROM customers WHERE customer_id = 456;
● Where:
○ The where clause specifies conditions that the result must satisfy
○ SQL includes a between comparison operator to find records by
specifying range
○ select loan_number from loan where branch_name = 'Perryridge' and amount > 1200
● From:
○ The from clause lists the relations involved in the query
○ Multiple relations can be listed in the from clause, separated by commas
○ select * from borrower, loan
5. Constraints
● Key Constraints:
○ Primary Key Constraint: Ensures uniqueness and non-null values
in a column or a set of columns.
○ It uniquely identifies each record in a table.
○ Example: CREATE TABLE students (student_id INT PRIMARY KEY,
name VARCHAR(50));
● Cascadeless Rollback:
○ Cascadeless rollback ensures that the rollback of a transaction
does not cause other transactions (not directly dependent on the
failed transaction) to be rolled back.
○ Prevents the cascading effect of rollbacks, maintaining a more
stable and consistent system.
○ Advantages of Cascadeless Rollback:
■ Isolation Maintenance: Preserves the isolation level
between independent transactions, allowing them to
proceed independently without unnecessary rollbacks.
■ Reduced Impact of Failures: Limits the impact of a single
transaction failure on the overall system by avoiding
unnecessary rollbacks.
6. Concurrent Executions
● Concurrent execution refers to the simultaneous execution of multiple
transactions in a multi-user database system.
● Concurrent execution allows transactions to operate concurrently,
potentially improving system throughput and response time.
● Benefits of Concurrent Executions:
○ Improved Throughput: Allows multiple transactions to run
simultaneously, potentially increasing the system's overall
throughput.
○ Enhanced Responsiveness: Provides faster response times by
allowing concurrent transactions to execute simultaneously,
serving multiple users concurrently.
● Challenges in Concurrent Executions:
○ Isolation Levels: Different isolation levels might lead to issues like
lost updates, dirty reads, or non-repeatable reads that need to be
managed.
○ Deadlocks and Resource Contention: Situations where
transactions are waiting for resources held by other transactions,
leading to deadlocks or resource contention.
● Concurrency Control Mechanisms:
○ Locking: Granular locks on data items to control access and
maintain consistency.
○ Multiversion Concurrency Control (MVCC): Maintains multiple
versions of data for read consistency.
○ Timestamp Ordering: Assigning timestamps to transactions to
order their execution and manage conflicts.
7. Lock-based Concurrency Control Protocols
● Lock-based concurrency control is a technique used to manage
simultaneous access to shared resources by multiple transactions.
● It employs locks to control the access and ensure that transactions
maintain data consistency and integrity.
● Types of Locks:
○ Shared Lock (S-Lock): Allows multiple transactions to read
resources concurrently but prevents any transaction from writing
to it.
○ Exclusive Lock (X-Lock): Grants exclusive access to a transaction
for both read and write operations, blocking other transactions
from accessing the resource.
● Lock-based Protocols:
○ Two-Phase Locking (2PL):
■ Transactions acquire locks in two phases: the growing
phase (acquiring locks) and the shrinking phase (releasing
locks).
■ Ensures conflict serializability: a transaction cannot request a new lock after it has released any lock (a lock-manager sketch appears at the end of this section).
○ Strict Two-Phase Locking (Strict 2PL):
■ Similar to 2PL but holds all locks until the transaction
reaches its commit point.
■ Guarantees strictness by holding all locks until the
transaction completes, reducing the possibility of cascading
rollbacks.
● Granularity of Locks:
○ Fine-grained Locking: Locks are applied at a finer level, like
individual data items or rows, allowing more concurrency but
potentially increasing overhead.
○ Coarse-grained Locking: Locks are applied at a higher level, like
entire tables, reducing overhead but potentially limiting
concurrency.
● Advantages: Ensures data consistency and integrity, allows for effective
coordination among transactions, and prevents conflicting operations.
● Limitations: May lead to deadlocks, increased contention, and potential
performance overhead due to lock management.
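● A toy sketch of the shared/exclusive compatibility test a lock manager performs; it ignores wait queues, lock upgrades, and deadlock handling, and the class name is invented:
    # Lock compatibility: S is compatible only with S; X conflicts with everything.
    COMPATIBLE = {("S", "S"): True, ("S", "X"): False,
                  ("X", "S"): False, ("X", "X"): False}

    class SimpleLockManager:
        def __init__(self):
            self.locks = {}                          # data item -> list of (txn, mode)

        def request(self, txn, item, mode):
            held = self.locks.get(item, [])
            if all(COMPATIBLE[(m, mode)] for t, m in held if t != txn):
                self.locks.setdefault(item, []).append((txn, mode))
                return "granted"
            return "wait"                            # under 2PL the transaction blocks here

        def release_all(self, txn):
            # Shrinking phase; strict 2PL releases everything only at commit time
            for item in self.locks:
                self.locks[item] = [(t, m) for t, m in self.locks[item] if t != txn]

    lm = SimpleLockManager()
    print(lm.request("T1", "Q", "S"))   # granted
    print(lm.request("T2", "Q", "S"))   # granted (shared locks are compatible)
    print(lm.request("T3", "Q", "X"))   # wait    (exclusive conflicts with shared)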
8. Timestamp based Concurrency Control Protocols
● Timestamp-based concurrency control uses timestamps assigned to
transactions or data items to manage concurrency and maintain
consistency.
● It ensures serializability by using transaction timestamps to determine the
order of access to shared resources.
● Types of Timestamps:
○ Transaction Timestamps: Assigned to transactions when they start
or request access to resources.
○ Data Item Timestamps: Assigned to shared data items to track
their last read or write operations by transactions.
● Timestamp Ordering:
○ Concurrency Control Policy: Transactions are ordered based on
their timestamps to control their access to data items.
○ Read and Write Rules: A transaction may read a data item only if no transaction with a newer timestamp has already written it, and may write an item only if no newer transaction has already read or written it; an operation that violates these rules causes the transaction to be rolled back and restarted with a new timestamp (sketched at the end of this section).
● Concurrency Control Protocols:
○ Thomas' Write Rule (TW):
■ Relaxes the basic write rule: if T's timestamp is smaller than the write timestamp of Q, T's write is obsolete and is simply ignored (skipped) rather than rolling T back; T must still be no older than the last transaction that read Q.
■ Improves concurrency by discarding obsolete writes while preserving the effect of executing transactions in timestamp order.
○ Wound-Wait Protocol:
■ If transaction T1 requests a resource held by T2, T2 is aborted (wounded) if T1's timestamp is older (smaller) than T2's; otherwise T1 waits.
■ Prevents younger transactions from making older transactions wait indefinitely, so older transactions are never starved.
● Advantages: Efficiently manages concurrency by using timestamps to
order transactions, reduces contention, and ensures consistency.
● Limitations: May lead to increased aborts (transaction rollbacks),
potential timestamp overflow, and complexity in implementation.
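● A minimal sketch of the basic timestamp-ordering checks, keeping a read and a write timestamp per data item; this simplification omits restart logic and assumes timestamps are plain integers:
    class Item:
        def __init__(self):
            self.read_ts = 0    # largest timestamp of any transaction that read the item
            self.write_ts = 0   # largest timestamp of any transaction that wrote the item

    def read(item, ts):
        if ts < item.write_ts:            # a newer transaction already wrote the item
            return "rollback"
        item.read_ts = max(item.read_ts, ts)
        return "ok"

    def write(item, ts, thomas_rule=False):
        if ts < item.read_ts:             # a newer transaction already read the item
            return "rollback"
        if ts < item.write_ts:            # a newer transaction already wrote the item
            return "skip" if thomas_rule else "rollback"   # Thomas' write rule ignores obsolete writes
        item.write_ts = ts
        return "ok"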
9. Validation Based Concurrency Control Protocols
● Validation-based concurrency control protocols, primarily associated with
optimistic concurrency control, manage simultaneous access to data in a
database without locking mechanisms.
● These protocols allow transactions to execute concurrently and validate
their changes before committing them to ensure consistency.
● Optimistic Concurrency Control (OCC):
● OCC operates under the assumption that conflicts among transactions
are infrequent.
● Transactions proceed without acquiring locks during read or write
operations.
● They retain a local copy of data items they access during their execution.
● Upon completion, before committing, transactions undergo a validation
phase.
● The database system validates if the changes made by the current
transaction conflict with other transactions that committed during its
execution.
● Conflicts typically include situations where multiple transactions attempt
to modify the same data concurrently.
● If no conflicts are found during validation, the transaction is committed,
and its changes become visible to other transactions.
● In case of conflicts, the transaction is rolled back, and the process may be
retried.
● Timestamps are often used to track transaction order and validate
changes. A transaction's timestamp helps determine its relative order
concerning other transactions.
● Advantages:
○ Effective when conflicts are infrequent, leading to a higher overall
throughput.
● Challenges:
○ Overhead due to validation checks might impact performance.
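● A rough sketch of the validation test, assuming each transaction records the set of items it read and the system keeps the write sets of transactions that committed during its execution (data structures invented for illustration):
    def validate(txn_read_set, overlapping_committed_write_sets):
        # txn_read_set: items read by the validating transaction
        # overlapping_committed_write_sets: write sets of transactions that
        # committed while the validating transaction was running
        for write_set in overlapping_committed_write_sets:
            if txn_read_set & write_set:       # conflict: the transaction read stale data
                return False                   # abort and retry
        return True                            # safe to commit

    print(validate({"A", "B"}, [{"C"}]))       # True  -> commit
    print(validate({"A", "B"}, [{"B", "D"}]))  # False -> rollback and retry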
10.Deadlock and its Handling
● A deadlock occurs when two or more transactions hold locks on
resources and wait indefinitely for locks held by others. As a result, no
transaction can proceed further.
● Starvation occurs if the concurrency control manager is badly designed.
● When a deadlock occurs there is a possibility of cascading roll-backs.
● Deadlock prevention protocols ensure that the system will never enter
into a deadlock state. Some prevention strategies :
○ Require that each transaction locks all its data items before it
begins execution (predeclaration).
○ Impose partial ordering of all data items and require that a
transaction can lock data items only in the order specified by the
partial order.
● Deadlock prevention strategies:
○ wait-die scheme — non-preemptive
■ An older transaction may wait for a younger one to release a data item (older means smaller timestamp). Younger transactions never wait for older ones; they are rolled back instead.
■ A transaction may die several times before acquiring the needed data item (see the sketch after this list).
○ wound-wait scheme — preemptive
■ older transaction wounds (forces rollback) of younger
transaction instead of waiting for it. Younger transactions
may wait for older ones.
■ may be fewer rollbacks than wait-die scheme.
○ Timeout-Based Schemes:
■ a transaction waits for a lock only for a specified amount of
time.
■ If the lock has not been granted within that time, the transaction is rolled back and restarted.
■ Thus, deadlocks are not possible.
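● A small sketch of the wait-die and wound-wait decisions, assuming an older transaction has a smaller timestamp (illustrative only):
    def wait_die(requester_ts, holder_ts):
        # Non-preemptive: only older transactions are allowed to wait.
        return "wait" if requester_ts < holder_ts else "rollback requester (die)"

    def wound_wait(requester_ts, holder_ts):
        # Preemptive: an older requester forces the younger holder to roll back.
        return "rollback holder (wound)" if requester_ts < holder_ts else "wait"

    print(wait_die(5, 9))    # older T5 requesting from younger T9 -> wait
    print(wait_die(9, 5))    # younger T9 requesting from older T5 -> rollback requester (die)
    print(wound_wait(5, 9))  # older T5 requesting from younger T9 -> rollback holder (wound)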
11.Deadlock detection
● Deadlock detection in a database management system involves
identifying if a deadlock has occurred among transactions.
● One of the popular methods used for deadlock detection is the Wait-for
Graph.
● The wait-for graph represents transactions as nodes and waiting relationships as directed edges.
● A directed edge from Ti to Tj means that Ti is waiting for Tj to release a data item it needs.
● Each transaction that is waiting for a resource held by another transaction contributes an edge to the graph.
● Node and Edge Representations:
○ Nodes: Represent transactions awaiting resources.
○ Edges: Show the resource dependency, i.e., a transaction waiting
for a resource held by another transaction.
● If a cycle exists in the wait-for graph, it indicates the presence of a
deadlock.
● A cycle signifies a circular chain of transactions, each waiting for a
resource held by the next transaction in the chain.
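● A brief sketch of deadlock detection as cycle detection on the wait-for graph, using depth-first search; the graph contents below are made up for illustration:
    def has_deadlock(wait_for):
        # wait_for: dict mapping a transaction to the transactions it is waiting for
        visited, on_stack = set(), set()

        def dfs(txn):
            visited.add(txn)
            on_stack.add(txn)
            for nxt in wait_for.get(txn, []):
                if nxt in on_stack:                 # back edge -> cycle -> deadlock
                    return True
                if nxt not in visited and dfs(nxt):
                    return True
            on_stack.remove(txn)
            return False

        return any(dfs(t) for t in wait_for if t not in visited)

    # T1 waits for T2, T2 waits for T3, T3 waits for T1 -> cycle -> deadlock
    print(has_deadlock({"T1": ["T2"], "T2": ["T3"], "T3": ["T1"]}))   # True
    print(has_deadlock({"T1": ["T2"], "T2": ["T3"]}))                 # False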
12.Snapshot Isolation,read,write
● Snapshot Isolation is a concurrency control method used in database
systems to provide consistent reads without blocking writes.
● It ensures that a transaction sees a consistent snapshot of the database
at the beginning of its execution and prevents certain concurrency
anomalies.
● Properties:
○ Consistency: Guarantees consistent reads throughout the
transaction's lifespan.
○ Read Consistency: Reads from a consistent snapshot irrespective
of concurrent writes.
○ Reduced Blocking: Allows concurrent reads and writes without
significant blocking, improving system performance.
○ Potential Write Conflicts: Write operations might encounter
conflicts if multiple transactions modify the same data.
● Read Operation:
○ When a transaction starts, it takes a snapshot of the database,
capturing the committed state of all data items at that moment.
○ Throughout the transaction, reads are performed from this
consistent snapshot rather than directly from the live database.
○ Changes made by other transactions after the snapshot was taken
are not visible during the transaction's execution.
● Write Operation:
○ When a transaction performs write operations, the changes are
not immediately reflected in the database.
○ If a write conflicts with another transaction's uncommitted data,
the system might reject the write or cause the transaction to wait.
○ The changes made by the transaction are visible to other
transactions only upon commit, ensuring atomicity and isolation.
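● A tiny sketch of the read path under snapshot isolation, assuming (for illustration only) that each data item keeps a list of committed versions tagged with commit timestamps:
    # Each item maps to a list of (commit_ts, value) versions, oldest first.
    def snapshot_read(versions, item, snapshot_ts):
        # Return the latest version committed at or before the transaction's snapshot
        visible = [v for ts, v in versions[item] if ts <= snapshot_ts]
        return visible[-1] if visible else None

    versions = {"X": [(5, "old"), (12, "new")]}
    print(snapshot_read(versions, "X", 10))   # 'old' (version committed at ts 12 is invisible)
    print(snapshot_read(versions, "X", 20))   # 'new'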
13.Recovery System: Failure classification
● Transaction Failures:
○ Logical Error:
■ Logical errors occur due to incorrect data input or incorrect
application logic within a transaction.
■ These errors lead to improper or invalid data modifications
within the database.
○ System Error:
■ System errors occur due to software or hardware issues
within the database management system (DBMS).
■ They may cause unexpected termination or incorrect
functioning of a transaction.
● System Crash:
○ A system crash is an abrupt and unexpected failure of the entire
database system or a significant component of it.
○ It results in the loss of volatile data in memory, potential
corruption of data, and interruption of ongoing transactions.
● Disk Failure:
○ Disk failures occur when there is a malfunction or damage to the
storage disks where the database is stored.
○ They can lead to the loss or corruption of stored data, affecting the
database's integrity and availability.
14.Log based recovery
● Log-based recovery is a fundamental mechanism in database systems
used to restore the database to a consistent state after a system failure.
● Every transaction's operation (e.g., read, write) is recorded in a log before
the changes are applied to the database.
● Each log entry contains details like transaction ID, operation type, before
and after values, and a unique sequence number or timestamp.
● Types of Log Records:
○ Undo (Backward Recovery): Records necessary to reverse or undo
the changes made by incomplete or aborted transactions.
○ Redo (Forward Recovery): Records needed to reapply changes
made by committed transactions that were not yet written to the
database before a failure occurred.
● Checkpointing:
○ Periodic checkpoints are created, indicating a consistent state of
the database.
○ Reduces the recovery time by providing a starting point closer to
the failure time rather than starting from the beginning.
● Recovery after Failure:
○ Starting from the last checkpoint or the end of the log, the system
analyzes the log to identify transactions that were in progress
during the failure.
○ It identifies committed transactions and those whose changes
need to be undone or redone.
○ Redo changes recorded in the log that were committed but not yet
written to the database before the failure.
○ Undo or rollback changes made by incomplete or aborted
transactions, restoring the database to a consistent state.
● Advantages of Log-Based Recovery:
○ Provides a systematic method to recover the database after
various types of failures.
○ Ensures durability and maintains data consistency by replaying
committed transactions and undoing incomplete ones.
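● A highly simplified sketch of a redo-then-undo pass over such a log; the record layout is invented here, and real recovery managers additionally use checkpoints and log sequence numbers:
    # Each log record: (txn, action, item, before_value, after_value)
    log = [
        ("T1", "write", "A", 100, 150),
        ("T1", "commit", None, None, None),
        ("T2", "write", "B", 200, 250),        # T2 never committed before the crash
    ]

    def recover(log, db):
        committed = {t for (t, a, *_) in log if a == "commit"}
        # Redo phase: reapply the after-values of committed transactions
        for txn, action, item, _before, after in log:
            if action == "write" and txn in committed:
                db[item] = after
        # Undo phase: scan backwards and restore before-values of uncommitted transactions
        for txn, action, item, before, _after in reversed(log):
            if action == "write" and txn not in committed:
                db[item] = before
        return db

    print(recover(log, {"A": 100, "B": 200}))   # {'A': 150, 'B': 200}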
15.Storage devices and stable storage implementation
● Volatile storage:
○ does not survive system crashes
○ examples: main memory, cache memory
● Nonvolatile storage:
○ survives system crashes
○ examples: disk, tape, flash memory, non-volatile (battery backed
up) RAM
○ but may still fail, losing data
● Stable storage:
○ a mythical form of storage that survives all failures
○ approximated by maintaining multiple copies on distinct
nonvolatile media
○ Maintain multiple copies of each block on separate disks for
implementation
16. Data access
● Physical blocks are those blocks residing on the disk.
● Buffer blocks are the blocks residing temporarily in main memory.
● Block movements between disk and main memory are initiated through
the following two operations:
○ input(B) transfers the physical block B to main memory.
○ output(B) transfers the buffer block B to the disk, and replaces the
appropriate physical block there.
● We assume, for simplicity, that each data item fits in, and is stored inside,
a single block.
17.Shadow Paging
● Shadow paging is a technique used in database systems for
implementing the concept of atomicity during transaction management.
● It ensures that the changes made by a transaction become visible to other
transactions atomically, meaning either all changes are committed or
none.
● Implementation Approach:
○ Duplicate Database Copy: Maintains two physical copies of the
database: the current version (visible to users) and a shadow
version (used during transaction execution).
○ Shadow Page Table: Keeps track of the mapping between the
current database pages and their corresponding shadow pages.
● Transaction Execution:
○ Read Operations: Transactions read data from the current version
of the database.
○ Write Operations: Changes made by a transaction are applied to
the shadow version of the database, leaving the current version
unchanged.
● Transaction Commit:
○ Atomic Commit: Upon successful completion of a transaction, the
changes in the shadow version are atomically committed to the
current version.
○ Switching Pointers: The shadow page table is updated to redirect
pointers from the current version to the shadow version, making
the changes visible.
● Rollback Handling:
○ Aborted Transactions: If a transaction aborts or fails, no changes
from the shadow version are applied to the current version.
○ Clean-Up: The changes in the shadow version related to the
aborted transaction are discarded, maintaining consistency.
● Advantages of Shadow Paging:
○ Unlike logging-based approaches, shadow paging doesn't require
maintaining undo logs for rollback purposes.
● Challenges of Shadow Paging:
○ Overhead of Duplication: Requires space for maintaining two
copies of the database, which might be substantial for large
databases.
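● A minimal sketch of the idea, assuming the whole page table fits in memory and ignoring disk layout; the class and method names are invented:
    import copy

    class ShadowPagedDB:
        def __init__(self, pages):
            self.current = pages                        # version visible to readers
            self.shadow = None

        def begin(self):
            self.shadow = copy.deepcopy(self.current)   # private working copy

        def write(self, page, value):
            self.shadow[page] = value                   # the current version stays untouched

        def commit(self):
            self.current = self.shadow                  # single pointer switch = atomic commit

        def abort(self):
            self.shadow = None                          # discard every uncommitted change

    db = ShadowPagedDB({"P1": "old"})
    db.begin(); db.write("P1", "new")
    print(db.current["P1"])   # 'old' (change not yet visible)
    db.commit()
    print(db.current["P1"])   # 'new'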
18.Checkpoints
● Checkpoints capture a point-in-time snapshot of the database, ensuring
all changes are written to stable storage.
● Provides a starting point for recovery in case of system crashes or
failures.
● Before the checkpoint, the system ensures that all transaction log entries
are flushed to the stable storage.
● After a system crash or failure, the recovery process starts from the last
checkpoint, minimizing the amount of log to be replayed.
● By providing a consistent starting point, recovery time is reduced
compared to starting from the beginning of the logs.
● Advantages of Checkpoints:
○ Ensures a consistent and recoverable state of the database.
○ Shortens recovery time by providing a known consistent state.
○ Minimizes the risk of data loss during system failures.
● Impact on Performance:
○ Frequent checkpoints might affect transaction processing by
introducing slight delays during the checkpoint process.
○ Writing database pages to disk during checkpoints incurs I/O
overhead, impacting system performance.
19.ARIES recovery algorithm
● The ARIES (Algorithm for Recovery and Isolation Exploiting Semantics)
recovery algorithm is a widely-used and efficient database recovery
method that provides high-performance crash recovery and ensures
database consistency.
● It consists of three main phases: Analysis, Redo, and Undo, and is based
on the principles of write-ahead logging (WAL).
● 1. Analysis Phase:
○ Scan of Transaction Log:
■ Begins by analyzing the transaction log starting from the
last checkpoint.
■ Identifies the transactions that were active but not
committed during the crash or failure.
○ Reconstruction of Transaction Table (Transaction Table and Dirty
Page Table):
■ Rebuilds the in-memory transaction table, storing
information about active transactions (including their states)
during the failure.
■ Constructs the dirty page table, which records modified
pages by active transactions not yet committed.
● 2. Redo Phase:
○ Redo Pass:
■ Reapplies changes recorded in the log for committed
transactions and for transactions whose changes were not
reflected in the database before the failure.
■ Re-executes the updates described in the log, bringing the
database to a consistent state.
○ Write-Ahead Logging (WAL):
■ Utilizes the log to ensure that committed transactions'
changes, which might not have been written to the
database, are reapplied during recovery.
● 3. Undo Phase:
○ Undo Pass:
■ Rolls back the changes made by transactions that were
active but not committed during the failure.
■ Uses the log to perform backward log traversal and undo
uncommitted changes, returning the database to a
consistent state.
○ Logging Undo Operations:
■ Generates undo log records to ensure atomicity and
durability during the rollback process.
● Advantages of ARIES:
○ Minimizes the amount of work needed to recover the database
after a crash by utilizing the log efficiently.
○ Ensures atomicity, consistency, isolation, and durability (ACID
properties) during recovery.
Chapter 6
1. Hashing techniques
● Hashing techniques are used in computing to transform data (often keys
or values) into fixed-size values called hash codes or hash values.
● 1. Division Method:
○ It uses the modulo operation by dividing the key by a prime
number (not close to a power of 2) and taking the remainder as the
hash value.
○ Hash = key % divisor, where the divisor is a prime number.
○ Prone to clustering if the divisor isn't chosen properly, leading to
collisions (multiple keys hashing to the same location).
● 2. Mid Square Method:
○ It involves squaring the key and extracting a portion (often middle
digits) as the hash value.
○ Hash = Middle digits of (key^2)
○ Can produce non-uniform hash values depending on the key
distribution and can be computationally intensive for larger keys.
● 3. Folding Method:
○ Splits the key into smaller parts (e.g., digits or bytes), applies some
operation (e.g., addition), and then reduces it to the hash value.
○ Key is divided into segments, and these segments are added
together to obtain the hash.
○ Requires careful design to handle keys of different lengths, and
folding technique may need adjustment for specific key
distributions.
● 4. Multiplication Method:
○ Involves multiplying the key by a constant (usually a fraction),
extracting the fractional part, and multiplying it by the table size to
obtain the hash value.
○ Hash = floor(table_size × fractional part of (key × constant))
○ The choice of the constant is critical; improper selection may lead
to non-uniform distribution and increased chances of collisions.
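● A minimal Python sketch of the four methods above; the table size, constants, and function names are arbitrary choices for illustration:
    import math

    TABLE_SIZE = 97                      # a prime not close to a power of 2

    def division_hash(key):
        return key % TABLE_SIZE

    def mid_square_hash(key, digits=2):
        squared = str(key * key)
        mid = len(squared) // 2          # take digits from the middle of key^2
        return int(squared[max(0, mid - digits // 2): mid + (digits + 1) // 2]) % TABLE_SIZE

    def folding_hash(key, part_len=2):
        s = str(key)                     # split the key into parts and add them
        parts = [int(s[i:i + part_len]) for i in range(0, len(s), part_len)]
        return sum(parts) % TABLE_SIZE

    def multiplication_hash(key, A=0.6180339887):   # a commonly suggested constant
        return math.floor(TABLE_SIZE * ((key * A) % 1))

    for h in (division_hash, mid_square_hash, folding_hash, multiplication_hash):
        print(h.__name__, h(123456))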
2. Collision
● In hash tables, collisions occur when two or more keys hash to the same
index or slot in the hash table.
● Collisions need to be managed to ensure the proper storage and retrieval
of data.
● Collision Handling:
● Open Addressing (Closed Hashing):
○ Open addressing is a collision resolution technique where, upon a
collision, the system searches for an alternative (i.e., an open or
empty slot) within the hash table itself to store the collided key-
value pair.
○ When a collision occurs, the algorithm probes the table by looking
for the next available slot using various methods like linear
probing, quadratic probing, or double hashing until an empty slot
is found.
○ Linear probing: if a collision occurs at index 'i', the algorithm checks slots 'i + 1', 'i + 2', and so on, wrapping around, until an empty slot is found (see the sketch at the end of this section).
○ Quadratic probing uses a quadratic function to probe the hash
table. If a collision occurs at index 'i', the algorithm checks slots at
positions 'i + 1', 'i + 4', 'i + 9', and so on (quadratic increments) until
an empty slot is found.
○ Double hashing uses two hash functions to calculate the sequence of probe positions. If a collision occurs at index 'i', the second hash function generates the increment used to find the next available slot.
○ Clustering can occur over time as consecutive collisions may lead
to longer probe sequences, affecting performance.
● Separate Chaining (Open Hashing):
○ Separate chaining handles collisions by allowing each slot in the
hash table to point to a linked list or another data structure (such
as an array or linked list).
○ Collided keys are stored in the same slot using a data structure
(e.g., linked list) at that location, avoiding overwriting.
○ Avoids clustering, offers flexibility in handling multiple collided
keys, and doesn't require additional probing.
○ Additional memory overhead due to pointers or extra data
structures, and longer search times if the linked lists become
lengthy.
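● A compact sketch contrasting the two strategies, with a fixed table size and no resizing or deletion (purely illustrative):
    SIZE = 7

    def probe_insert(table, key):
        # Open addressing with linear probing: scan forward until an empty slot appears
        i = key % SIZE
        while table[i] is not None:
            i = (i + 1) % SIZE          # next slot, wrapping around
        table[i] = key

    def chain_insert(buckets, key):
        # Separate chaining: every slot holds a list of the keys that hash there
        buckets[key % SIZE].append(key)

    open_table = [None] * SIZE
    chained = [[] for _ in range(SIZE)]
    for k in (10, 17, 24):              # all hash to slot 3, forcing collisions
        probe_insert(open_table, k)
        chain_insert(chained, k)
    print(open_table)                   # [None, None, None, 10, 17, 24, None]
    print(chained)                      # [[], [], [], [10, 17, 24], [], [], []]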
3. Types of Indexes: Single Level, Ordered Indexes, Multilevel Indexes
● Single Level Indexes:
○ Single Level Indexes maintain a single-level structure, typically
mapping keys to corresponding data records in a one-to-one
relationship.
○ Directly maps index entries to the data records, allowing fast
access to specific records based on keys.
○ Offers quick access to data as there is a direct mapping between
index entries and data records.
○ Suffers from scalability issues when handling a large volume of
data due to the single-level structure.
● Ordered Indexes:
○ Ordered Indexes maintain a sorted sequence of index entries
based on the indexed attribute's values.
○ Entries are sorted based on key values, allowing efficient range
queries and traversal.
○ Facilitates rapid search operations using binary search, enabling
efficient retrieval of ranges of values.
○ Insertions and deletions might require restructuring of the index,
impacting performance.
● Multilevel Indexes:
○ Multilevel Indexes utilize a hierarchical structure consisting of
multiple levels to manage and access the index.
○ Hierarchical levels of indexes help in navigating through the data,
reducing search times by using primary and secondary levels of
indexes.
○ Efficiently handles large datasets by organizing index entries into
multiple levels, allowing faster access to data.
○ Requires additional memory and maintenance as the number of
levels increases, impacting storage and performance.
4. Overview of B-Trees and B+ Trees
● B-Trees:
○ B-trees are balanced tree structures in which each node holds a variable number of keys, bounded by a fixed maximum determined by the "order" of the tree.
○ Each internal node (except the root) has between ceil(order/2) and order children, and one fewer keys than children.
○ Properties:
■ All leaf nodes are at the same level, maintaining balance
within the tree.
■ B-trees are optimized for disk access due to their ability to
minimize the number of I/O operations.
■ Offers logarithmic time complexity O(log n) for search,
insertions, and deletions.
○ Commonly used in databases and file systems for indexing and
organizing large amounts of data.
● B+ Trees:
● B+ trees are similar to B-trees but with some distinct differences,
particularly in how they store and organize data.
● In B+ trees, all keys are present in the leaf nodes, and the leaf nodes are
linked together in a linked list to allow sequential access.
● Properties:
○ Similar to B-trees, B+ trees maintain a balanced structure and
allow sequential access due to the linked list structure of leaf
nodes.
○ Particularly efficient for range queries due to the sequential nature
of leaf nodes.
○ Offers logarithmic time complexity O(log n) for search, insertions,
and deletions, similar to B-trees.
● Widely used in databases to build indexes and efficiently support range
queries.
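● A toy sketch (an illustration only, not a full B+-tree implementation) of why the linked leaf level makes range queries cheap: internal nodes are omitted and only the leaf chain is modelled.
    class Leaf:
        def __init__(self, keys):
            self.keys = sorted(keys)
            self.next = None          # link to the next leaf for sequential access

    def range_query(first_leaf, low, high):
        # Walk the linked leaves once; no need to revisit internal nodes
        result, leaf = [], first_leaf
        while leaf:
            result.extend(k for k in leaf.keys if low <= k <= high)
            if leaf.keys and leaf.keys[-1] > high:
                break
            leaf = leaf.next
        return result

    a, b, c = Leaf([5, 10]), Leaf([15, 20]), Leaf([25, 30])
    a.next, b.next = b, c
    print(range_query(a, 8, 22))      # [10, 15, 20]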
5. Separate chaining vs open addressing