
Chapter 1

1. Introduction to database
● A database is a structured collection of data designed for efficient
storage, retrieval, and management.
● Its primary purpose is to organize data in a systematic manner, facilitating
easy access and manipulation.
● Different types of databases exist, including relational (SQL), NoSQL,
object-oriented, etc.
● For example: Social media platforms use databases to manage user
profiles, posts, connections, and interactions.
● Advantages of Databases:
○ Databases ensure data consistency, accuracy, and reliability.
○ They offer robust security measures, scalability, and concurrent
access for enhanced efficiency.
2. Characteristics of Database
● Self-Describing Nature:
○ The DBMS catalog stores metadata, providing descriptions of the
database structure.
○ This self-describing nature simplifies database management and
allows for more efficient querying and manipulation of data.
● Insulation between Programs and Data:
○ Program-data independence allows modifications to data storage
structures and operations without necessitating changes in DBMS
access programs.
○ This insulation enables easier adaptation to evolving business
needs without affecting the applications utilizing the database.
● Data Abstraction:
○ Utilizes a data model to offer users a conceptual view of the
database, concealing intricate storage details.
○ Data abstraction enhances user understanding by presenting a
simplified and logical view of the database structure.
● Support of Multiple Views:
○ Enables diverse users to have customized views of the database,
displaying only the data relevant to their specific needs or
permissions.
○ Multiple views enhance usability by tailoring data presentation
based on different user roles or requirements.
● Sharing of Data and Multi-User Transaction Processing:
○ Allows multiple users to concurrently access, retrieve, and update
the database.
○ Concurrency control mechanisms within the DBMS ensure proper
execution or complete cancellation of each transaction to maintain
data consistency and integrity.
3. Properties of database
● Representation of Real-World Aspects:
○ Databases aim to represent aspects of the real world. Through
tables, records, and relationships, databases emulate real-world
structures to facilitate data organization and retrieval.
● Logical Coherence and Meaningful Data:
○ Data within databases is logically coherent, organized in a
structured manner following predefined rules and relationships.
● Purposeful Design, Construction, and Population:
○ Databases are designed, built, and populated with a specific
objective in mind, addressing particular needs or requirements.
○ The creation process involves defining schemas, tables, and
constraints based on the intended use and application
requirements.
● Tailored for Target Users and Applications:
○ Databases are constructed considering a target group of users and
specific applications they serve.
○ They are optimized to efficiently store, retrieve, and manage data
relevant to the intended users and applications, ensuring
performance meets their requirements.
4. File vs Database system
5. Users of database
● Actors on the Scene
● These individuals directly interact with, control, or use the database and
its content:
○ Database Administrators (DBAs):
■ Responsible for authorizing access, coordinating and
monitoring database use, acquiring resources, controlling
access, and ensuring operational efficiency.
○ Database Designers:
■ Define the database content, structure, constraints, and
functions or transactions. They liaise with end-users to
understand their requirements and translate them into the
database design.
○ End-users:
■ Utilize the data for queries, reports, and some update
operations.
● Categories of End-users
○ Casual Users:
■ Access the database occasionally as needed.
○ Naïve or Parametric Users:
■ Utilize pre-defined functions ("canned transactions")
regularly. Examples include bank-tellers or reservation
clerks.
○ Application Programmers:
■ Develop application programs using various tools, including
those for user interface development.
○ Sophisticated Users:
■ Business analysts, scientists, engineers, etc., familiar with
system capabilities, often using software packages that
work closely with the stored database.
○ Stand-alone Users:
■ Maintain personal databases using packaged applications,
e.g., an individual using a tax program to manage their
personal database.
● Database Administrator (DBA)
○ Oversee central control of data and programs accessing the data
within DBMS.
○ Functions include schema definition, storage structure
modification, authorization management, and routine maintenance.
● Workers Behind the Scene
● These individuals contribute to the development, implementation, and
maintenance of the database system:
○ DBMS System Designers and Implementers:
■ Design and implement DBMS modules and interfaces as
software packages.
○ Tool Developers:
■ Design and implement software tools facilitating database
modeling, design, system optimization, and performance
enhancement.
○ Operators and Maintenance Personnel:
■ Responsible for running and maintaining the hardware and
software environment for the database system.
6. Three schema architecture
● External Schema:
○ Provides a tailored view of data for each user group or application.
○ Offers specific data formats & access rights unique to each user's
needs
● Conceptual Schema:
○ Presents a unified, logical structure of the entire database.
○ Ensures consistency across different user perspectives without
detailing specific user needs.
● Internal Schema:
○ Describes how data is physically stored within the database.
○ Focuses on optimizing storage and retrieval mechanisms.
● Operations and Mappings
○ Request Processing:
■ Transforms user requests from their view (external schema)
to the conceptual schema.
■ Converts these requests to operations accessing the
internal schema for efficient processing.
○ Mappings:
■ Ensure accurate processing of requests between schema
levels without compromising data integrity.
○ Data Formatting:
■ Adjusts internally stored data to match the specific user's
view during data retrieval.
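A minimal SQL sketch of how the three levels can surface in practice (the employee table, view, and index names here are hypothetical, not from the notes): the base table corresponds roughly to the conceptual schema, a view gives one user group its external schema, and an index is an internal-schema decision about physical access paths.

-- Conceptual schema: the logical structure of the data
CREATE TABLE employee (
    emp_id  INT PRIMARY KEY,
    name    VARCHAR(50),
    salary  NUMERIC(10, 2),
    dept_id INT
);

-- External schema: a tailored view for one user group (hides salary)
CREATE VIEW employee_directory AS
SELECT emp_id, name, dept_id
FROM employee;

-- Internal schema: a physical storage/access decision
CREATE INDEX idx_employee_dept ON employee (dept_id);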
7. Data Independence
● Data Independence refers to the database system's ability to modify one
level of the system without impacting other levels, ensuring flexibility
and adaptability. There are two primary types:
● Logical Data Independence:
○ Enables modifications to the conceptual schema without affecting
external schemas or application programs.
○ Changes such as additions or deletions of data within the
database should not disrupt or alter the external view or access to
data.
○ Applications using the external schema should continue
functioning normally even after significant changes are made to
the logical structure of the database.
● Physical Data Independence:
○ Allows alterations to the internal schema without necessitating
changes in the conceptual or external schemas.
○ Changes in the physical schema, like optimizing storage,
restructuring files, or improving performance through indexing, can
be implemented without affecting the external view of data.
○ Modifications to the physical organization should not demand
changes in the conceptual schema as long as the data remains
consistent.
8. Database architecture
● Query Processor
● Handles user queries and facilitates data retrieval.
● Components:
○ DDL Interpreter: Interprets and executes Data Definition Language
(DDL) statements, retrieving data definitions from the data
dictionary.
○ DML Compiler: Translates Data Manipulation Language (DML)
statements into executable low-level instructions for the query
evaluation engine.
○ Query Evaluation Engine: Executes instructions generated by the
DML compiler to fetch data from the DBMS.
● Storage Manager
● Responsible for managing data storage and ensuring data integrity.
● Components:
○ Authorization and Integrity Manager: Enforces integrity constraints
and manages user access rights.
○ Transaction Manager: Maintains database consistency, especially
during system failures or concurrent access.
○ File Manager: Controls disk storage space allocation and data
structures on disk.
○ Buffer Manager: Handles efficient data transfer between disk
storage and main memory, optimizing data retrieval.
● Storage Components
○ Data Files: Store actual data within the database.
○ Data Dictionary: Contains metadata providing essential
information about the database structure.
○ Indices: Facilitate swift access to specific data items, enhancing
data retrieval speed.
9. Schema and instance
● A schema represents the logical plan defining the structure, organization,
and relationships of data within a database.
● It provides the blueprint for how data is stored, accessed, and
manipulated.
● Key Points about Schemas:
○ Structure Definition:
■ Specifies the table structure, attributes (columns), data
types, constraints, and relationships between tables.
■ Defines the layout and arrangement of data elements,
ensuring data organization and coherence.
○ Data Integrity Constraints:
■ Enforces data integrity by implementing constraints such as
primary keys, foreign keys, unique constraints, and check
constraints.
■ Ensures accuracy, consistency, and reliability of data stored
within the database.
○ Security and Access Control:
■ Defines user roles and access privileges within the database
environment.
■ Specifies who has permission to view, modify, or manipulate
data in tables or the entire database.
■ Includes security policies and access control mechanisms to
safeguard sensitive information and regulate user actions.
○ Data Dictionary:
■ Often contains a data dictionary that serves as a repository
for metadata about the database.
■ Contains descriptions of tables, columns, constraints,
indexes, and other database components.
■ Provides essential information describing the database's
structure, aiding in understanding and managing the
database effectively.
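As a brief illustration of the schema/instance distinction (table and data are hypothetical): the CREATE TABLE statement below defines the schema, which rarely changes, while the rows inserted afterwards form the current instance, which changes with every update.

-- Schema: the time-invariant blueprint
CREATE TABLE student (
    student_id INT PRIMARY KEY,
    name       VARCHAR(50) NOT NULL,
    age        INT CHECK (age >= 16)
);

-- Instance: the data stored at this particular point in time
INSERT INTO student (student_id, name, age) VALUES (1, 'Asha', 19);
INSERT INTO student (student_id, name, age) VALUES (2, 'Ravi', 21);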
Chapter 2

1. ER diagram & entity types


● ERD is a visual representation displaying relationships between entities
in a database.
● Entity types are representations of real-world objects or concepts stored
in the database. These entities can be either tangible (e.g., a person, a
product) or intangible (e.g., an event, an invoice).
● Entity types in an ERD encompass specific attributes defining their
characteristics and properties, aiding in the understanding of the data
they store.
● ERDs employ various symbols (like rectangles for entities, lines for
relationships) to depict connections, cardinality, and the nature of
associations between different entities.
● Strong Entity Sets: These entities exist independently and possess their
attributes. They are represented by rectangles in ERDs and typically have
a primary key to uniquely identify each instance. Example: 'Customer'
entity in a 'Customer-Order' scenario.
● Weak Entity Sets: Weak entities depend on a related strong entity for
identification and do not have a primary key on their own. They're often
depicted using double rectangles in ERDs. Example: an 'OrderItem' entity
relying on the 'OrderID' of its owning 'Order' entity, since it lacks a
unique identifier of its own.
2. Data models
● Relational Model:
○ Uses tables to represent data and relationships.
○ Each table has rows (records) and columns (attributes).
○ Tables follow fixed-format records with defined attributes.
● Entity-Relationship Model (E-R Model):
○ Utilizes entities and relationships to represent data.
○ Entities signify distinguishable real-world objects.
○ Relationships illustrate connections between entities.
● Object-Based Data Model:
○ Extends E-R model with encapsulation, methods, and object
identity.
○ Incorporates object-oriented principles within data modeling.
● Object-Relational Data Model:
○ Combines features of object-oriented and relational models.
○ Integrates object-oriented concepts with relational database
capabilities.
● Semistructured Data Model:
○ Allows varying attributes within the same data type.
○ Differs from traditional models by permitting different attribute
sets.
3. Attributes
● Simple (Atomic) Attribute:
○ Attributes that cannot be divided further, such as 'Age' of a person.
● Composite Attribute:
○ Attributes that can be divided into smaller subparts, representing
more basic attributes with independent meanings.
○ For instance, 'Address' is composed of subparts like street number,
city, state, and postal code.
● Single-Valued Attribute:
○ Represents an attribute with a single value for a specific entity.
○ Example: 'DateOfBirth' for an individual entity.
● Multivalued Attributes:
○ Attributes holding multiple values for a particular entity.
○ An example could be 'PhoneNumbers' for a person having
multiple contact numbers.
● Complex Attributes:
○ Attributes that combine composite and multivalued attributes,
allowing nested structures.
○ Consider the 'Education' attribute consisting of 'Degree' and
'University' nested within.
● Derived Attributes:
○ Attribute values derived from other entities or attributes.
○ An instance could be 'Age,' calculated from 'DateOfBirth' in a
database.
○ Another example is 'TotalCost' in an e-commerce system, derived
from 'UnitPrice' and 'Quantity.'
4. Cardinality
● Cardinality describes the number of entities associated with another
entity through a relationship set in a database.
● Cardinality is primarily used to comprehend the relationships between
two entities within a database.
● Types of Mapping Cardinality in Binary Relationships:
○ One-to-One (1:1):
■ Each entity in one set is linked to precisely one entity in
another set.
■ Example: A marriage relationship between two individuals,
where each person is married to only one other person.
○ One-to-Many (1:N):
■ Each entity in one set can be associated with multiple
entities in another set, but each entity in the second set is
connected to only one entity in the first set.
■ Example: A university department (one) having multiple
students (many), yet each student is affiliated with only one
department.
○ Many-to-One (N:1):
■ Multiple entities in one set can relate to only one entity in
another set.
■ Example: Many employees (many) belong to a single
department (one), but each department can have multiple
employees.
○ Many-to-Many (N:M):
■ Multiple entities in one set can be associated with multiple
entities in another set.
■ Example: Students (many) enrolling in multiple courses
(many), and courses having multiple students enrolled
concurrently.
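These cardinalities map directly onto table designs. A hedged SQL sketch (names are illustrative): a 1:N relationship is modeled with a foreign key on the "many" side, while an N:M relationship needs a separate junction table.

-- One-to-Many: each student belongs to exactly one department
CREATE TABLE department (
    dept_id   INT PRIMARY KEY,
    dept_name VARCHAR(50)
);

CREATE TABLE student (
    student_id INT PRIMARY KEY,
    name       VARCHAR(50),
    dept_id    INT REFERENCES department(dept_id)  -- the "many" side holds the key
);

-- Many-to-Many: students enroll in many courses and vice versa
CREATE TABLE course (
    course_id INT PRIMARY KEY,
    title     VARCHAR(50)
);

CREATE TABLE enrollment (
    student_id INT REFERENCES student(student_id),
    course_id  INT REFERENCES course(course_id),
    PRIMARY KEY (student_id, course_id)            -- junction table
);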
5. Participation
● Participation describes the involvement of entities from an entity set in
relationships within a relationship set in a database schema.
● Total Participation (Indicated by Double Line):
○ Total participation involves every entity within an entity set
participating in at least one relationship within the relationship set.
○ Contributes to data integrity and completeness.
○ Example: Consider the relationship between 'student' and
'advisor.' If the participation of students in the advisor relationship
is total, it means every student must have an associated advisor.
No student can exist without an advisor relationship.
● Partial Participation:
○ In contrast, partial participation refers to a scenario where some
entities within an entity set may not participate in any relationship
within the relationship set.
○ Offers flexibility, permitting some entities in the entity set to
remain unconnected in certain relationships.
○ Example: In the relationship between 'instructor' and 'advisor,' if
the participation of instructors in the advisor relationship is partial,
it implies that some instructors may not be associated with any
advisor relationship. They might not have an advisor assigned to
them.
6. Extended Entity-Relationship Model: Generalization, Specialization &
Aggregation
● The Extended Entity-Relationship model is an advanced and flexible
database design concept that enhances the Entity-Relationship model.
● Generalization:
○ It's a bottom-up approach merging lower-level entities that share
common attributes to create a higher-level entity.
○ Example: In a university database, "Person" can be a general entity
from which "Student" and "Faculty" inherit common attributes like
"Name" and "Address."
● Specialization:
○ A top-down approach that divides a higher-level entity into
multiple lower-level entities based on specific attributes or
relationships.
○ Example: In a vehicle database, "Vehicle" can be specialized into
"Car" and "Motorcycle," each with unique attributes like "Number
of Doors" for cars and "Engine Displacement" for motorcycles.
● Aggregation:
○ Represents whole-part relationships, combining smaller related
entities to form a higher-level entity.
○ Example: In an e-commerce system, an "Order" entity can
aggregate multiple "Order Items," each representing a product,
quantity, and price.
○ These concepts within the Extended Entity-Relationship model
enhance the traditional Entity-Relationship model by allowing for
more complex and structured data representation.
7. ER to Schema
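A hedged sketch of the usual mapping rules, applied to the earlier Customer/Order/OrderItem example (names and types are illustrative): each strong entity becomes a table keyed by its own primary key, a 1:N relationship becomes a foreign key on the "many" side, and a weak entity becomes a table whose primary key combines the owner's key with the weak entity's partial key.

-- Strong entities map to tables with their own primary keys
CREATE TABLE customer (
    customer_id INT PRIMARY KEY,
    name        VARCHAR(50)
);

CREATE TABLE orders (
    order_id    INT PRIMARY KEY,
    customer_id INT REFERENCES customer(customer_id)  -- 1:N relationship
);

-- Weak entity: primary key = owner's key + partial key
CREATE TABLE order_item (
    order_id INT REFERENCES orders(order_id),
    line_no  INT,
    quantity INT,
    PRIMARY KEY (order_id, line_no)
);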
8. Keys
● A key is a specific attribute or combination of attributes within a relation
(table) that uniquely identifies each tuple (row).
● Keys ensure data uniqueness, enable relational connections, and support
efficient data retrieval.
● Primary Key:
○ A primary key is a unique attribute or a combination of attributes
within a table that uniquely identifies each row in that table.
○ Purpose: Ensures data integrity and acts as the main link between
different tables in a relational database.
○ Example: In an "Employees" table, the "EmployeeID" attribute
serves as the primary key.
● Candidate Key:
○ Definition: A candidate key is an attribute or a set of attributes that
could potentially function as a primary key due to their unique
identification property.
○ Purpose: Provides multiple options for choosing the primary key.
○ Example: In the same "Employees" table, both "EmployeeID" and
"Social Security Number" could be candidate keys.
● Superkey:
○ Definition: A superkey is a set of attributes that uniquely identifies
tuples but may contain more attributes than required.
○ Purpose: Represents a broader identification set that includes
attributes not strictly necessary for uniqueness.
○ Example: A superkey for the "Employees" table might include
"EmployeeID," "First Name," and "Last Name."
● Foreign Key:
○ Definition: A foreign key is an attribute in one table that references
the primary key in another table, establishing a relationship
between the two tables.
○ Purpose: Creates relationships between tables to enforce
referential integrity.
○ Example: In an "Orders" table, a "CustomerID" attribute might
serve as a foreign key, linking each order to a specific customer in
the "Customers" table.
9. Relational model
● Relational Model:
● The relational model is a database management approach that organizes
data into tables (relations) consisting of rows (tuples) and columns
(attributes).
● Developed by E.F. Codd in the 1970s, it forms the foundation for most
modern database systems.
● Key Characteristics:
○ Data is structured into tables representing entities and
relationships.
○ Each table contains rows representing individual records and
columns representing specific attributes or properties.
○ Utilizes the principles of set theory and predicate logic to
manipulate data.

10. Relational schema
● A relational schema is a blueprint that defines the structure, constraints,
and relationships of tables in a relational database.
● It outlines the logical view of the database and describes how data is
organized and stored.
● Components of Relational Schema:
○ Table Definition: Specifies table names and their associated
attributes.
○ Constraints: Enforces rules and restrictions on data integrity, such
as primary keys, foreign keys, and unique constraints.
○ Relationships: Defines associations between tables, establishing
connections using keys.
11. ORDBMS vs RDBMS
12. Relational algebra
● The algebraic operations create new relations by manipulating existing
ones using operations within the algebra.
● Operations can be chained or combined to form relational algebra
expressions, defining complex queries or data manipulations.
● The output of a relational algebra expression is a relation, representing
the result of a database query or retrieval request.
● Relational algebra forms the fundamental operations for the relational
model, allowing users to specify basic retrieval requests or queries.
● All operations in relational algebra produce relations as output, making
the algebra "closed" as all objects within it are relations.
● Types of Relational Algebra Operations:
● Unary Relational Operations:
○ SELECT (σ):
■ Filters rows from a relation based on a specified condition.
○ PROJECT (π):
■ Selects specific columns from a relation, removing
duplicates.
○ RENAME (ρ):
■ Changes the name of a relation or its attributes.
● Operations from Set Theory:
○ UNION (∪), INTERSECTION (∩), SET DIFFERENCE (or MINUS, −):
■ Set-based operations to combine or compare relations.
○ CARTESIAN PRODUCT (×):
■ Combines tuples from two relations to create a new
relation.
13. Select
● Purpose: Filters rows from a relation based on specified conditions.
● Symbol: Represented by σ (sigma).
● Functionality: Retrieves rows that satisfy specified criteria.
● Syntax: σ<condition>(Relation)
● Example: σ(Salary > 50000)(Employees)
● Result: Generates a new relation with rows meeting the condition.
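For comparison, the same selection written in SQL (assuming an Employees table with a Salary column):

SELECT * FROM Employees WHERE Salary > 50000;  -- SQL counterpart of σ(Salary > 50000)(Employees)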

14. Project
● Purpose: Selects specific columns from a relation.
● Symbol: Represented by π (pi).
● Functionality: Retrieves chosen attributes from a relation, eliminating
duplicates.
● Syntax: π<attribute(s)>(Relation)
● Example: π(Name, Salary)(Employees)
● Result: Generates a new relation with specified attributes.
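The SQL counterpart needs DISTINCT, since SQL keeps duplicate rows by default while the algebra's PROJECT removes them:

SELECT DISTINCT Name, Salary FROM Employees;  -- SQL counterpart of π(Name, Salary)(Employees)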

15. Rename
● Purpose: Changes the name of a relation or its attributes.
● Symbol: Represented by ρ (rho).
● Functionality: Renames the relation or its attributes for clarity or
convenience.
● Syntax: ρ<NewName/OldName>(Relation)
● Example: ρ(NewSalary/Salary)(Employees)
● Result: Renames the attribute "Salary" to "NewSalary" in the Employees relation.
16. UNION (∪)
● Purpose: Combines tuples from two relations, removing duplicates.
● Symbol: Represented by ∪ (union).
● Functionality: Merges rows from two relations into a single relation.
● Syntax: R ∪ S (R and S are relations with the same
attributes)
● Example: If R represents the set of employees in
Department A and S represents employees in Department B, R
∪ S would provide a combined list of employees from both
departments, removing duplicates.
● Result: Generates a new relation with all distinct tuples from both
relations.

17. INTERSECTION (∩)


● Purpose: Retrieves common tuples from two relations.
● Symbol: Represented by ∩ (intersection).
● Functionality: Selects rows that appear in both relations.
● Syntax: R ∩ S (R and S are relations with the same
attributes)
● Example: If R represents the set of employees in
Department A and S represents employees in Department B, R
∩ S would provide a list of employees who belong to both
departments.
● Result: Generates a new relation with tuples that exist in both input
relations.

18. SET DIFFERENCE (or MINUS, −)


● Purpose: Retrieves tuples present in one relation but not in another.
● Symbol: Represented by MINUS (−) or the subtraction
symbol.
● Functionality: Subtracts tuples from one relation that also exist in
another.
● Syntax: R - S (R and S are relations with the same attributes)
● Example: If R represents the set of all employees and S represents
employees in Department A, R - S would provide a list of employees not
belonging to Department A.
● Result: Generates a new relation with tuples from the first relation that
are not present in the second relation.
19. JOIN
● Purpose: Combines tuples from two relations based on a specified
condition.
● Functionality: Merges rows from two relations that satisfy a given
condition.
● Types: Various types of joins exist—INNER JOIN, LEFT JOIN, RIGHT JOIN,
FULL JOIN, etc.
● Syntax (for INNER JOIN): R ⨝<condition> S (R and S are relations)
● Example: If R represents the set of employees and S represents their respective
departments, R ⨝ (R.DepartmentID = S.DepartmentID) S would combine employee
data with their corresponding departments.
● Result: Generates a new relation by matching tuples from both input
relations based on the specified condition.

20. EQUIJOIN
● Purpose: Performs a type of JOIN operation based on equality between
attributes.
● Functionality: Merges rows from two relations where specified attributes
are equal.
● Syntax: R ⨝ (R.Attribute = S.Attribute) S (R and S are relations)
● Example: If R represents the set of employees and S represents their respective
departments, R ⨝ (R.DepartmentID = S.DepartmentID) S would combine employee
data with their corresponding departments where the DepartmentID matches.
● Result: Generates a new relation by matching tuples from both input
relations where the specified attributes are equal.
21. NATURAL JOIN
● Purpose: Performs a JOIN operation based on common attributes
between two relations.
● Functionality: Merges rows from two relations with matching attribute
names.
● Syntax: R ⨝ S (R and S are relations; no condition is written, since the join is
taken over all attributes with the same name)
● Example: If R represents employee data and S represents departments, R ⨝ S
would combine data where both relations share common attribute names, such as
"DepartmentID".
● Result: Generates a new relation by combining tuples from both input
relations based on their shared attribute names.

22. Aggregate functions
● Purpose: Computes summaries or aggregations of data within a relation.
● Usage: Operates on groups of tuples to generate a single result.
● Functions: Common aggregate functions include:
○ SUM: Computes the sum of values in a specified attribute.
○ COUNT: Calculates the number of tuples or non-null values.
○ AVG: Determines the average of values in a specified attribute.
○ MIN: Finds the minimum value in a specified attribute.
○ MAX: Identifies the maximum value in a specified attribute.
● Syntax: AggregateFunction<Attribute>(Relation)
● Example: COUNT(EmployeeID)(Employees) - Counts the number of
EmployeeID entries in the Employees relation.
● Result: Returns a single value that represents the result of the aggregate
function applied to the specified attribute in the relation.
23. OUTER JOIN Operation
● Purpose: Performs a JOIN operation, including unmatched tuples from
one or both relations.
● Types: Includes LEFT OUTER JOIN, RIGHT OUTER JOIN, and FULL
OUTER JOIN.
● Functionality: Combines tuples from two relations based on a specified
condition and includes unmatched tuples.
● Syntax (for LEFT OUTER JOIN): R ⟕<condition> S (R and S are relations)
● Example: If R represents employees and S represents departments, R ⟕
(R.DepartmentID = S.DepartmentID) S would combine employee data and
include departments even if no employee belongs to that department.
● Result: Generates a new relation by matching tuples based on the
condition and including unmatched tuples from one or both relations.
Chapter 3

1. Overview of SQL
● Data-definition language(DDL):
○ The SQL DDL provides commands for defining relation schemas,
deleting relations, and modifying relation schemas.
● Data-manipulation language(DML):
○ The SQL DML provides the ability to query information from the
database and to insert tuples into, delete tuples from, and modify
tuples in the database.
● Integrity:
○ The SQL DDL includes commands for specifying integrity
constraints that the data stored in the database must satisfy.
Updates that violate integrity constraints are disallowed.
● View definition:
○ The SQL DDL includes commands for defining views.
● Transaction control:
○ SQL includes commands for specifying the beginning and ending
of transactions.
● Embedded and dynamic SQL:
○ Define how SQL statements can be embedded within general-
purpose programming languages, such as C, C++, and Java.
● Authorization:
○ The SQL DDL includes commands for specifying access rights to
relations and views.
2. Domains of SQL
● char(n): Fixed length, user-defined size string.
● varchar(n): Variable length, user-defined maximum size string.
● int: Machine-dependent integer.
● smallint: Machine-dependent small integer.
● numeric(p,d): Fixed point number with precision p and d digits to the right
of the decimal point.
● real, double precision: Floating-point numbers with machine-dependent
precision.
● float(n): Floating-point number with at least n digits of precision.
3. DDL commands
● CREATE Command:
○ Used to create new database objects like tables, views, indexes,
etc.
○ Syntax for creating a table: CREATE TABLE table_name (column1
datatype, column2 datatype, ...);
○ Requires specifying table name, column names, & their data types.
○ Example: CREATE TABLE students (id INT, name VARCHAR(50),
age INT);
● DROP Command:
○ Used to remove database objects like tables, views, indexes, etc.
○ Syntax for dropping a table: DROP TABLE table_name;
○ Irreversibly deletes the entire table and its data.
○ Example: DROP TABLE employees;
● ALTER Command:
○ Modifies the structure of existing database objects like tables.
○ Syntax for adding a column: ALTER TABLE table_name ADD
column_name datatype;
○ Allows adding, modifying, or dropping columns, changing data
types, etc.
○ Example: ALTER TABLE customers ADD email VARCHAR(100);
● TRUNCATE Command:
○ Removes all records from a table but keeps the table structure
intact.
○ Syntax: TRUNCATE TABLE table_name;
○ Faster than DELETE as it doesn't generate individual deletion logs
for each row.
○ Example: TRUNCATE TABLE logs;
● RENAME Command:
○ Renames an existing database object.
○ Syntax for renaming a table: RENAME TABLE old_table_name TO
new_table_name;
○ Useful for altering object names without changing their structures.
○ Example: RENAME TABLE users TO customers;
4. DML commands
● SELECT Command:
○ Fundamental command used to retrieve data from a database.
○ Syntax: SELECT column1, column2 FROM table_name WHERE
condition;
○ Allows fetching specific columns or all columns from a table.
○ The DISTINCT keyword removes duplicate rows, and * selects all
columns.
○ The WHERE clause is optional but used for filtering data based on
specified conditions.
○ Example: SELECT * FROM customers WHERE country='USA';
● INSERT Command:
○ Used to add new records/rows into a database table.
○ Syntax: INSERT INTO table_name (column1, column2, ...) VALUES
(value1, value2, ...);
○ Requires specifying the table and columns where data needs to be
inserted along with corresponding values.
○ Example: INSERT INTO employees (emp_name, emp_salary)
VALUES ('John Doe', 50000);
● UPDATE Command:
○ Modifies existing records in a table.
○ Syntax: UPDATE table_name SET column1 = value1, column2 =
value2, ... WHERE condition;
○ Indicates which columns to update and their new values.
○ The WHERE clause is crucial to specify which records to update;
without it, all records in the table might be affected.
○ Example: UPDATE products SET quantity = 100 WHERE
product_id = 123;
● DELETE Command:
○ Removes records from a table.
○ Syntax: DELETE FROM table_name WHERE condition;
○ Requires the WHERE clause to prevent accidentally deleting all
records and to specify which records to remove based on certain
conditions.
○ Example: DELETE FROM customers WHERE customer_id = 456;
● WHERE:
○ The WHERE clause specifies conditions that the result must satisfy.
○ SQL includes a BETWEEN comparison operator to find records within a
specified range.
○ Example: SELECT loan_number FROM loan WHERE branch_name = 'Perryridge'
AND amount > 1200;
● FROM:
○ The FROM clause lists the relations involved in the query.
○ Multiple relations can be included in a query, separated by commas.
○ Example: SELECT * FROM borrower, loan;
5. Constraints
● Key Constraints:
○ Primary Key Constraint: Ensures uniqueness and non-null values
in a column or a set of columns.
○ It uniquely identifies each record in a table.
○ Example: CREATE TABLE students (student_id INT PRIMARY KEY,
name VARCHAR(50));

○ Unique Constraint: Restricts values in a column(s) to be unique but
allows NULL values.
○ Example: CREATE TABLE employees (emp_id INT UNIQUE, email
VARCHAR(100));
● Domain Constraints:
○ Defines allowable data values for a column.
○ Specifies the data type for a column.
○ Example: CREATE TABLE products (product_id INT, product_name
VARCHAR(50));
● Referential Integrity:
○ Maintains relationships between tables using foreign keys.
○ FOREIGN KEY: Ensures data consistency by enforcing a link
between two tables.
○ Example: CREATE TABLE orders (order_id INT, product_id INT,
FOREIGN KEY (product_id) REFERENCES products(product_id));
● Check Constraints:
○ Imposes conditions on data entered into a table.
○ Verifies that data meets a specific condition.
○ Example: CREATE TABLE employees (emp_id INT, age INT
CHECK (age >= 18));
● NOT NULL:
○ The NOT NULL constraint on an attribute specifies that the null value
is not allowed for that attribute.
○ Example: CREATE TABLE ConstraintDemo1 (ID INT NOT NULL, Name
VARCHAR(50) NULL);
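All of these constraints can appear together in a single table definition; a hedged sketch with illustrative names (a departments table is created first so the foreign key has something to reference):

CREATE TABLE departments (
    dept_id   INT PRIMARY KEY,
    dept_name VARCHAR(50)
);

CREATE TABLE employees (
    emp_id  INT PRIMARY KEY,                     -- key constraint
    email   VARCHAR(100) UNIQUE,                 -- unique constraint
    name    VARCHAR(50) NOT NULL,                -- not null constraint
    age     INT CHECK (age >= 18),               -- check constraint
    dept_id INT REFERENCES departments(dept_id)  -- referential integrity
    -- the declared data types (INT, VARCHAR) act as domain constraints
);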
6. DCL commands
● GRANT Command:
○ Allows users to grant specific privileges or permissions to
database objects (such as tables, views, procedures) to other users
or roles.
○ Syntax: GRANT privilege_name ON object_name TO {user_name |
PUBLIC | role_name};
○ privilege_name can be SELECT, INSERT, UPDATE, DELETE, ALL,
etc.
○ object_name refers to the database object on which the privileges
are granted.
○ user_name, PUBLIC, or role_name specifies the users, public, or
roles to whom the privileges are granted.
○ Example: GRANT SELECT ON employees TO manager;
● REVOKE Command:
○ Allows users to revoke previously granted privileges or
permissions from other users or roles.
○ Syntax: REVOKE privilege_name ON object_name FROM
{user_name | PUBLIC | role_name};
○ Similar parameters to the GRANT command for specifying
privilege, object, and user/role.
○ Example: REVOKE INSERT ON customers FROM salesperson;
7. TCL commands
● COMMIT Command:
○ Definition: COMMIT is a vital TCL command used in SQL to
permanently save changes made within a transaction.
○ Syntax: COMMIT;
○ Example:
■ BEGIN TRANSACTION;
■ -- Execute SQL statements (INSERT, UPDATE, DELETE)
■ COMMIT;
○ Explanation: The COMMIT command finalizes and makes
permanent all the changes made during the transaction. For
instance, after inserting records into a database table within a
transaction, executing COMMIT; will permanently save those
records, making them visible to other transactions.
● ROLLBACK Command:
○ Definition: ROLLBACK is a crucial TCL command used to undo
changes within a transaction that has not been committed.
○ Syntax: ROLLBACK;
○ Example:
■ BEGIN TRANSACTION;
■ -- Execute SQL statements (INSERT, UPDATE, DELETE)
■ ROLLBACK;
○ Explanation: Upon executing ROLLBACK, any modifications made
within the transaction are discarded, reverting the database to its
state before the transaction began. For instance, if an error occurs
during an operation within a transaction, executing ROLLBACK;
will undo those changes, ensuring database consistency.
● SAVEPOINT Command:
○ Definition: SAVEPOINT sets a named point within a transaction to
which you can later roll back.
○ Syntax: SAVEPOINT savepoint_name;
○ Example:
■ BEGIN TRANSACTION;
■ -- Execute SQL statements
■ SAVEPOINT sp1;
■ -- Further SQL operations
■ ROLLBACK TO SAVEPOINT sp1;
○ Explanation: SAVEPOINT divides a transaction into smaller parts
and allows rolling back to a specific point within the transaction,
enhancing flexibility. If a part of the transaction encounters an
issue, using ROLLBACK TO SAVEPOINT allows reverting to a
designated SAVEPOINT, ensuring data consistency.
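Putting the three commands together, a hedged end-to-end sketch (table names are illustrative, and the exact BEGIN syntax varies by DBMS):

BEGIN TRANSACTION;

INSERT INTO orders (order_id, customer_id) VALUES (101, 7);
SAVEPOINT after_order;              -- a named point we can return to

INSERT INTO order_items (order_id, line_no, quantity) VALUES (101, 1, 3);
-- Suppose the second insert turns out to be wrong:
ROLLBACK TO SAVEPOINT after_order;  -- undoes only the order_items insert

COMMIT;                             -- the order itself is saved permanently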
8. String operations
● Concatenation:
○ Concatenation is the process of combining two or more strings into
a single string.
○ Syntax: In SQL, concatenation can be achieved using the
CONCAT() function or the + operator.
○ Example:
○ SELECT CONCAT('Hello', ' ', 'World') AS ConcatenatedString;
○ -- Output: 'Hello World'
○ Explanation: Concatenation allows combining strings together,
useful for creating full names, addresses, or custom messages
within SQL queries by joining multiple string values.
● UPPER() & LOWER():
○ UPPER() and LOWER() are SQL functions used to convert strings
to uppercase and lowercase, respectively.
○ Syntax:
○ UPPER(string) converts a string to uppercase.
○ LOWER(string) converts a string to lowercase.
○ Example:
○ SELECT UPPER('hello') AS UppercaseString, LOWER('WORLD')
AS LowercaseString;
○ -- Output: 'HELLO' and 'world'
○ Explanation: These functions are helpful for standardizing text
input, making searches case-insensitive, or displaying consistent
text formatting.
● SUBSTRING:
○ Definition: SUBSTRING retrieves a portion of a string based on
specified starting position and length.
○ Syntax:
○ SUBSTRING(string, start_position, length) extracts a substring
from the given string.
○ Example:
○ SELECT SUBSTRING('Hello World', 7, 5) AS SubstringResult;
○ -- Output: 'World'
○ Explanation: SUBSTRING extracts part of a string, starting from a
specified position and of a specified length. This is useful for
extracting specific segments of text from larger strings.
● ORDER BY:
○ Used to sort the result set by one or more columns.
○ Syntax: ORDER BY column_name [ASC | DESC]
○ The keywords ASC and DESC specify ascending and descending order,
respectively; ascending is the default.
○ Example: SELECT * FROM employees ORDER BY first_name ASC;
9. Set operations
● UNION:
○ UNION is a set operation in SQL that combines the results of two
or more SELECT statements into a single result set.
○ Syntax:
SELECT column1, column2 FROM table1
UNION
SELECT column1, column2 FROM table2;
○ Example:
SELECT employee_id, employee_name FROM employees
UNION
SELECT contractor_id, contractor_name FROM contractors;
○ Explanation: UNION merges the results of multiple SELECT
queries, eliminating duplicates, and returns a combined result set
containing unique records from both queries.
● INTERSECT:
○ INTERSECT is a set operation that returns common records
present in the results of two SELECT statements.
○ Syntax:
○ SELECT column1, column2 FROM table1
○ INTERSECT
○ SELECT column1, column2 FROM table2;
○ Example:
○ SELECT product_id, product_name FROM online_store
○ INTERSECT
○ SELECT product_id, product_name FROM physical_store;
○ Explanation: INTERSECT retrieves the records that are common or
shared between the result sets of both SELECT queries, returning
only the matching rows.
● EXCEPT:
○ EXCEPT is a set operation that returns records present in the first
SELECT statement but not in the second SELECT statement.
○ Syntax:
○ SELECT column1, column2 FROM table1
○ EXCEPT
○ SELECT column1, column2 FROM table2;
○ Example:
○ SELECT customer_id, customer_name FROM online_subscribers
○ EXCEPT
○ SELECT customer_id, customer_name FROM
physical_store_customers;
○ Explanation: EXCEPT retrieves records from the first query that are
not present in the result set of the second query, effectively
performing a set difference operation.
10. Aggregate functions
● Aggregate Functions:
○ Aggregate functions perform calculations on a set of values and
return a single result. Common aggregate functions include AVG,
MIN, MAX, SUM, and COUNT.
○ Examples:
■ AVG(column_name) calculates the average value of a
numeric column.
■ MIN(column_name) retrieves the minimum value from a
column.
■ MAX(column_name) fetches the maximum value from a
column.
■ SUM(column_name) computes the total sum of values in a
column.
■ COUNT(column_name) counts the number of values in a
column.
○ Usage:
○ SELECT AVG(salary) AS average_salary FROM employees;
○ SELECT MIN(age) AS youngest_employee_age FROM employees;
○ SELECT MAX(age) AS oldest_employee_age FROM employees;
○ SELECT SUM(sales) AS total_sales FROM transactions;
○ SELECT COUNT(*) AS total_records FROM records;
● Group By:
○ Definition: GROUP BY is used with aggregate functions to group
rows that have the same values into summary rows. It divides the
result set into groups based on a specified column.
○ Syntax:
○ SELECT column_name, aggregate_function(column_name)
○ FROM table_name
○ GROUP BY column_name;
○ Example:
○ SELECT department, AVG(salary) AS avg_salary_per_department
○ FROM employees
○ GROUP BY department;
○ Explanation: GROUP BY combines rows with the same values in
the specified column(s) and performs aggregate functions within
each group, producing summary results for each distinct group.
● Having Clause:
○ Definition: HAVING is used in combination with GROUP BY to
filter the groups based on specified conditions applied to the result
of aggregate functions.
○ Syntax:
○ SELECT column_name, aggregate_function(column_name)
○ FROM table_name
○ GROUP BY column_name
○ HAVING condition;
○ Example:
○ SELECT department, AVG(salary) AS avg_salary_per_department
○ FROM employees
○ GROUP BY department
○ HAVING AVG(salary) > 50000;
○ Explanation: HAVING clause filters groups based on the result of
aggregate functions. In this example, it selects departments with
an average salary greater than $50,000.
11. Views
● View:
○ A view in SQL is a virtual table based on the result set of a
SELECT query. It represents a stored SQL query that acts as a
table, allowing users to retrieve specific data without duplicating
the query logic.
○ Creation Syntax:
○ CREATE VIEW view_name AS
○ SELECT column1, column2...
○ FROM table_name
○ WHERE condition;
○ Example:
○ CREATE VIEW sales_summary AS
○ SELECT product_name, SUM(sales_amount) AS total_sales
○ FROM sales
○ GROUP BY product_name;
○ Explanation: The view "sales_summary" summarizes sales data by
product, allowing users to query it as if it were a table, simplifying
complex queries by encapsulating logic.
● Updating a View:
○ Definition: Views in SQL are generally read-only, meaning they
cannot be directly updated using INSERT, UPDATE, or DELETE
statements unless they meet specific criteria.
○ Modification of Underlying Tables: To update a view indirectly, the
underlying tables on which the view is based must be modified
using appropriate SQL operations. Changes in the base tables are
reflected in the view.
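A hedged illustration (exact updatability rules vary by DBMS): a simple single-table view without aggregates or DISTINCT is usually updatable, and changes made through it pass to the base table, whereas an aggregating view like sales_summary above is read-only.

-- A simple single-table view: typically updatable
CREATE VIEW active_employees AS
SELECT emp_id, name, status
FROM employees
WHERE status = 'active';

-- This UPDATE is applied to the underlying employees table
UPDATE active_employees SET name = 'Jane Doe' WHERE emp_id = 42;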
● Materialized View:
○ A materialized view is a physical copy or snapshot of the data from
the base tables at a specific point in time. Unlike a regular view, a
materialized view stores the results of the query and updates
periodically or on demand.
○ Creation Syntax:
○ CREATE MATERIALIZED VIEW mv_name AS
○ SELECT column1, column2...
○ FROM table_name
○ WHERE condition;
○ Example:
○ CREATE MATERIALIZED VIEW daily_sales_summary AS
○ SELECT date, SUM(sales_amount) AS total_sales
○ FROM sales
○ GROUP BY date;
○ Explanation: A materialized view precomputes and stores the
aggregated data, enhancing query performance by reducing the
need for complex calculations at query time.
12. Joins
● Joins in SQL are used to combine rows from different tables based on a
related column between them. There are different types of joins:
● INNER JOIN:
○ INNER JOIN returns rows from both tables that have matching
values in the specified columns.
○ Syntax:
○ SELECT table1.column_name, table2.column_name
○ FROM table1
○ INNER JOIN table2 ON table1.column_name =
table2.column_name;
○ Example:
○ SELECT employees.name, departments.department_name
○ FROM employees
○ INNER JOIN departments ON employees.department_id =
departments.department_id;
○ Explanation: INNER JOIN retrieves rows from both "employees"
and "departments" tables where the "department_id" column
matches, combining related information from both tables.
● LEFT JOIN (or LEFT OUTER JOIN):
○ LEFT JOIN returns all rows from the left table and matching rows
from the right table. If there are no matches, NULL values are
returned for the right table.
○ Syntax:
○ SELECT table1.column_name, table2.column_name
○ FROM table1
○ LEFT JOIN table2 ON table1.column_name = table2.column_name;
○ Example:
○ SELECT employees.name, departments.department_name
○ FROM employees
○ LEFT JOIN departments ON employees.department_id =
departments.department_id;
○ Explanation: LEFT JOIN retrieves all rows from "employees" table
and matching rows from "departments" table, providing NULL for
department information if no match exists.
● RIGHT JOIN (or RIGHT OUTER JOIN):
○ RIGHT JOIN returns all rows from the right table and matching
rows from the left table. If there are no matches, NULL values are
returned for the left table.
○ Syntax:
○ SELECT table1.column_name, table2.column_name
○ FROM table1
○ RIGHT JOIN table2 ON table1.column_name =
table2.column_name;
○ Example:
○ SELECT employees.name, departments.department_name
○ FROM employees
○ RIGHT JOIN departments ON employees.department_id =
departments.department_id;
○ Explanation: RIGHT JOIN retrieves all rows from "departments"
table and matching rows from "employees" table, providing NULL
for employee information if no match exists.
● FULL JOIN (or FULL OUTER JOIN):
○ FULL JOIN returns all rows when there is a match in either the left
or right table. If no match exists, NULL values are returned for the
columns.
○ Syntax:
○ SELECT table1.column_name, table2.column_name
○ FROM table1
○ FULL JOIN table2 ON table1.column_name =
table2.column_name;
○ Example:
○ SELECT employees.name, departments.department_name
○ FROM employees
○ FULL JOIN departments ON employees.department_id =
departments.department_id;
○ Explanation: FULL JOIN retrieves all rows from both tables,
combining related information where matches exist, and providing
NULL where there is no match.
● NATURAL JOIN:
○ A Natural Join is a type of join that performs the join operation
based on columns with the same name in both tables. It
automatically matches columns with identical names and retrieves
the results.
○ Syntax:
○ SELECT table1.column1, table1.column2, table2.column1,
table2.column2...
○ FROM table1
○ NATURAL JOIN table2;
○ Example:
○ SELECT employees.name, employees.department_id,
departments.department_name
○ FROM employees
○ NATURAL JOIN departments;
○ Explanation: NATURAL JOIN automatically matches columns with
identical names (e.g., "department_id") between the "employees"
and "departments" tables, combining the data based on these
common columns.
13. Triggers
● Triggers in SQL are database objects that automatically perform actions
(such as inserting, updating, deleting, etc.) in response to specific events
occurring on tables. These events could include INSERT, UPDATE,
DELETE operations on the table.
● Types of Triggers:
○ BEFORE Trigger: Executes before the triggering event (e.g.,
BEFORE INSERT).
○ AFTER Trigger: Executes after the triggering event (e.g., AFTER
UPDATE).
● Syntax:
● CREATE TRIGGER trigger_name
● {BEFORE/AFTER} {INSERT/UPDATE/DELETE} ON table_name
● FOR EACH ROW
● BEGIN
● -- SQL statements
● END;
● Example:
● CREATE TRIGGER update_salary_trigger
● AFTER UPDATE ON employees
● FOR EACH ROW
● BEGIN
● INSERT INTO salary_changes (employee_id, old_salary, new_salary,
change_date)
● VALUES (OLD.employee_id, OLD.salary, NEW.salary, NOW());
● END;
● Explanation: This trigger, named "update_salary_trigger," fires after an
update operation on the "employees" table. It logs salary changes in the
"salary_changes" table, recording the employee ID, old salary, new
salary, and the change date.
● OLD and NEW Pseudo Records: In triggers, the OLD pseudo record refers
to the old values of the row being updated or deleted, while the NEW
pseudo record represents the new values during updates or inserts.
● Syntax of Dropping Triggers:
● DROP TRIGGER IF EXISTS trigger_name;
● Example:
● DROP TRIGGER IF EXISTS update_salary_trigger;
● Explanation: This command drops the trigger named
"update_salary_trigger" if it exists in the database.
14. Security and authorization
● Security and authorization in SQL databases are essential to control
access, protect sensitive data, and ensure data integrity. Here are the
fundamental aspects:
● User Authentication:
○ User Accounts: SQL databases manage user accounts with
usernames and passwords to authenticate users.
○ Authentication Methods: Common methods include password-
based authentication, integrated Windows authentication, and
certificate-based authentication.
● Authorization:
○ GRANT Statement: Allows users or roles specific permissions to
perform operations on database objects.
○ REVOKE Statement: Withdraws previously granted permissions
from users or roles.
○ Example:
○ GRANT SELECT, INSERT ON table_name TO user_name;
○ REVOKE UPDATE ON table_name FROM user_name;
● Types of Permissions:
○ SELECT: Grants permission to retrieve data from tables.
○ INSERT: Allows inserting new records into tables.
○ UPDATE: Enables modifying existing records in tables.
○ DELETE: Permits the removal of records from tables.
○ ALL PRIVILEGES: Grants all possible permissions on a specific
database object.
● Database Roles:
○ Roles are named groups of related privileges assigned to users.
They simplify permissions management by assigning privileges to
roles rather than individual users.
○ Example:
○ CREATE ROLE admin_role;
○ GRANT ALL PRIVILEGES ON database_name.* TO admin_role;
● Views for Security:
○ Restricted Access: Views can restrict access to sensitive data by
displaying only specific columns or rows to certain users or roles.
○ Example:
○ CREATE VIEW sensitive_view AS
○ SELECT sensitive_column
○ FROM sensitive_table
○ WHERE condition;
Chapter 4

1. Pitfalls in Relational-Database Design


● Inadequate Indexing Strategies:
○ Pitfall: Failure to establish proper indexes or creating excessive
indexes can impact query performance. Over-indexing can lead to
slower data modification operations, while insufficient indexing
may hinder query performance.
○ Solution: Identify key columns for indexing based on query
patterns and usage. Avoid unnecessary indexing and periodically
review and optimize indexes for optimal database performance.
● Normalization Oversight:
○ Pitfall: Ignoring normalization principles may result in data
redundancy, anomalies, and difficulties in maintaining consistency.
○ Solution: Adhere to normalization rules (1NF, 2NF, 3NF) to
structure data efficiently, minimize redundancy, and ensure data
integrity.
● Data Integrity Constraint Neglect:
○ Pitfall: Neglecting to enforce data integrity constraints (foreign
keys, unique constraints) can lead to data inconsistencies or invalid
entries.
○ Solution: Implement constraints to maintain data integrity,
ensuring that data aligns with defined rules and relationships.
● Inefficient Query Design:
○ Pitfall: Poorly crafted queries, inefficient join types, or lack of query
optimization may lead to performance degradation.
○ Solution: Optimize queries by analyzing execution plans,
employing suitable indexing, and avoiding unnecessary operations
to enhance performance.
2. Concept of normalization
● Normalization is the process of organizing data in a database to reduce
redundancy and dependency.
● It aims to create well-structured relations (tables) by breaking them
down into smaller, more manageable entities.
● Normalization Forms:
○ 1NF (First Normal Form)
○ 2NF (Second Normal Form)
○ 3NF (Third Normal Form)
○ BCNF(Boyce Codd Normal Form)
● Purpose and Benefits:
○ Data Integrity: Normalization reduces data redundancy, minimizing
anomalies and inconsistencies in the database.
○ Optimized Storage: It optimizes storage space by organizing data
efficiently and eliminating duplication.
● Process of Normalization:
○ Identify Entities: Recognize entities and their attributes within the
database.
○ Define Relationships: Establish relationships between entities
using primary and foreign keys.
○ Apply Normalization Rules: Organize data to adhere to
normalization forms (1NF, 2NF, 3NF) by eliminating redundancy
and dependencies.
● Example:
○ Scenario: Consider a table containing customer information
(CustomerID, Name, Phone, Email) and orders (OrderID,
CustomerID, Date, Product).
○ Normalization Steps: Break the table into separate entities
(Customers and Orders), establish relationships, and apply
normalization rules to ensure each table satisfies the required
normal forms.
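A hedged SQL rendering of that decomposition (column types are assumptions): customer details are stored exactly once, and each order points back to its customer by key, removing the redundancy of repeating customer data on every order row.

CREATE TABLE customers (
    customer_id INT PRIMARY KEY,
    name        VARCHAR(50),
    phone       VARCHAR(20),
    email       VARCHAR(100)
);

CREATE TABLE orders (
    order_id    INT PRIMARY KEY,
    customer_id INT REFERENCES customers(customer_id),
    order_date  DATE,
    product     VARCHAR(50)
);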
3. Function Dependencies
● Function dependencies are a fundamental concept in database design
that describe the relationship between attributes within a relation.
● Functional Dependency (FD):
○ A functional dependency occurs when the value of one attribute
(or a set of attributes) in a relation uniquely determines the value
of another attribute.
○ It is denoted as X -> Y, where X determines Y. If two rows have the
same value for X, they must also have the same value for Y.
● Types of Functional Dependencies:
○ Full Functional Dependency:
■ When an attribute is functionally dependent on a whole
composite key (multiple attributes forming the primary key),
it's a full functional dependency.
○ Transitive Dependency:
■ Occurs when an attribute is functionally dependent on
another non-key attribute instead of a primary key.
● Understanding function dependencies aids in normalizing a database by
identifying and eliminating redundancy.
● Determining Functional Dependencies:
○ Observing Data Patterns: Analyze the data to identify patterns
where one attribute uniquely determines another.
○ Understanding Business Rules: Consider business rules and
relationships between attributes to establish functional
dependencies accurately.
● Example:
○ Scenario: Consider a table with attributes (StudentID, CourseID,
InstructorID, InstructorName).
○ Functional Dependencies:
■ CourseID -> InstructorID, InstructorName (Each course is
taught by a specific instructor)
■ InstructorID -> InstructorName (An instructor's ID determines
the name), making InstructorName transitively dependent on CourseID
■ Note that StudentID -> CourseID does not hold, because a
student may enroll in many courses.
4. Normal forms
● 1NF (First Normal Form):
○ 1NF ensures that each attribute in a table contains only atomic
(indivisible) values, and there are no repeating groups or arrays
within rows.
○ Steps:
i. Place all items that appear in the repeating group in a new
table
ii. Designate a primary key for each new table produced.
iii. Duplicate in the new table the primary key of the table from
which the repeating group was extracted or vice versa.
○ Example: A table adhering to 1NF has single values in each cell
and avoids storing multiple values in a single field.
● 2NF (Second Normal Form):
○ 2NF builds upon 1NF and ensures that no partial dependencies
exist, i.e., every non-key attribute is fully functionally dependent on
the entire primary key.
○ Steps:
i. If a data item is fully functionally dependent on only a part
of the primary key, move that data item and that part of the
primary key to a new table.
ii. If other data items are functionally dependent on the same
part of the key, place them in the new table also
iii. Make the partial primary key copied from the original table
the primary key for the new table.
○ Example: In 2NF, each non-key attribute depends on the entire
composite primary key, removing partial dependencies.
● 3NF (Third Normal Form):
○ 3NF further eliminates transitive dependencies by ensuring that no
non-key attribute depends on another non-key attribute within the
same table.
○ Steps:
i. Move all items involved in transitive dependencies to a new
entity.
ii. Identify a primary key for the new entity.
iii. Place the primary key for the new entity as a foreign key on
the original entity.
○ Example: It resolves dependencies between non-key attributes,
preventing indirect relationships.
● BCNF (Boyce-Codd Normal Form):
○ BCNF is a stricter version of 3NF and eliminates all non-trivial
functional dependencies where the determinant is not a candidate
key.
○ Steps:
i. Place the two candidate primary keys in separate entities
ii. Place each of the remaining data items in one of the
resulting entities according to its dependency on the
primary key.
○ Example: In BCNF, every non-trivial functional dependency implies
that the determinant is a superkey.
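Applying the 3NF steps to the (StudentID, CourseID, InstructorID, InstructorName) example from the functional-dependencies section, a hedged sketch: the transitive dependency CourseID -> InstructorID -> InstructorName is removed by moving the instructor data into its own entity.

CREATE TABLE instructors (
    instructor_id   INT PRIMARY KEY,
    instructor_name VARCHAR(50)
);

CREATE TABLE courses (
    course_id     INT PRIMARY KEY,
    instructor_id INT REFERENCES instructors(instructor_id)  -- foreign key to the new entity
);

CREATE TABLE enrollments (
    student_id INT,
    course_id  INT REFERENCES courses(course_id),
    PRIMARY KEY (student_id, course_id)
);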
5. Decomposition
● If a decomposition does not cause any loss of information, it is called a lossless
decomposition.
● If a decomposition does not cause any dependencies to be lost, it is called a
dependency-preserving decomposition.
● Any table schema can be decomposed in a lossless way into a collection of
smaller schemas that are in BCNF. However, dependency preservation is not
guaranteed.
● Any table can be decomposed in a lossless way into third normal form (3NF)
while also preserving the dependencies.
○ For this reason, 3NF may be preferable to BCNF in some cases.
Chapter 5
1. Transaction Concept
● A transaction in a database represents a unit of work that is executed as
a single, indivisible operation.
● It comprises multiple database operations that need to be executed
together.
● Properties of transaction:
○ Atomicity
○ Consistency
○ Isolation
○ Durability
● Transaction States:
○ Active
○ Partially Committed
○ Failed
○ Committed
○ Aborted
● Transaction Control Commands:
○ COMMIT: Finalizes the transaction, making all changes permanent.
○ ROLLBACK: Reverts the database to its state before the
transaction began, undoing all of its changes.
○ SAVEPOINT: Establishes points within a transaction to which a
rollback can be performed.
● Concurrency Control:
○ Concurrency Issues: Transactions executed concurrently might
lead to problems like lost updates, dirty reads, or inconsistency.
○ Concurrency Control Techniques: Locking mechanisms, isolation
levels, and timestamp-based methods manage concurrency to
ensure data consistency.
● Example:
○ E.g., transaction to transfer $50 from account A to account B:
read(A)
A := A – 50
write(A)
read(B)
B := B + 50
write(B)
○ ACID Compliance: Ensures either the entire transaction of fund
transfer is completed, or no changes occur if any operation fails.
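● The same transfer can be run concretely with Python's built-in sqlite3 module, which groups the two updates into one transaction so they commit or roll back together. This is a minimal sketch; the accounts table and balances are invented for the example.

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
    conn.execute("INSERT INTO accounts VALUES ('A', 100), ('B', 200)")
    conn.commit()

    try:
        with conn:  # opens a transaction: commits on success, rolls back on error
            conn.execute("UPDATE accounts SET balance = balance - 50 WHERE name = 'A'")
            conn.execute("UPDATE accounts SET balance = balance + 50 WHERE name = 'B'")
    except sqlite3.Error:
        pass  # on failure neither update persists (atomicity)

    print(dict(conn.execute("SELECT name, balance FROM accounts")))  # {'A': 50, 'B': 250}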
2. ACID properties
● Atomicity:
○ Atomicity ensures that a transaction is treated as a single,
indivisible unit of work.
○ It either executes all its operations successfully or none at all.
○ If any part of a transaction fails, the entire transaction is rolled
back to its initial state to maintain consistency.
● Consistency:
○ Consistency ensures that a transaction moves the database from
one valid state to another, preserving data integrity and all
declared constraints.
○ If executing a transaction would violate a constraint, the
transaction is rolled back so the database never remains in an
invalid state.
● Isolation:
○ Isolation ensures that transactions are executed independently of
each other, and the changes made by one transaction are not
visible to other transactions until they are committed.
○ Prevents interference or conflicts between concurrent transactions,
maintaining data integrity and reliability.
● Durability:
○ Durability guarantees that once a transaction is committed, the
changes made by the transaction persist even in the event of
system failures or crashes.
○ Ensures that committed transactions are permanently saved and
are not lost due to system failures, providing data persistence.
3. Transaction States
● Active:
○ The initial state where a transaction is actively executing its
operations.
○ In this state, the transaction is performing various database
operations like reads, writes, or modifications.
● Partially Committed:
○ The transaction reaches this state after executing all its operations
successfully but has not been officially committed yet.
○ At this stage, all changes made by the transaction are temporarily
held until the commit command is issued.
● Failed:
○ The transaction enters this state when an error or exception occurs
during its execution.
○ It might result from a hardware failure, software error, or violation
of constraints, leading to the termination of the transaction.
● Committed:
○ The transaction reaches this state after successful completion of
all its operations and commits the changes to the database.
○ In this state, all changes made by the transaction are permanently
saved and become a durable part of the database.
● Aborted:
○ The transaction enters this state after encountering a failure and
needs to be rolled back to its initial state.
○ Any changes made by the transaction are undone or reverted,
ensuring the database's consistency and integrity.
4. Schedules & Serializability (Conflict and View Serializability)
● A schedule in a database represents the chronological order of
operations (read and write actions) performed by concurrent transactions.
● It depicts the execution sequence of transactions in a multi-user
environment.
● Serializability:
○ Serializability ensures that the final outcome of executing multiple
transactions concurrently is equivalent to the outcome achieved if
transactions were executed serially (one after another).
○ It maintains data consistency and integrity despite concurrent
execution.
● Conflict Serializability:
○ Conflict Operation:
■ A conflict occurs when two operations from different
transactions access the same data item and at least one of
the operations is a write.
■ Types of Conflicts: Read-Write, Write-Read, Write-Write.
○ Conflict Serializable Schedule:
■ A schedule is conflict serializable if it can be transformed into
some serial schedule by swapping adjacent non-conflicting
operations, so its outcome is equivalent to that serial
execution.
■ The precedence graph method tests this: draw one node per
transaction and an edge Ti -> Tj for every conflicting pair in
which Ti's operation comes first; the schedule is conflict
serializable if and only if the graph has no cycle (see the
sketch after this section).
● View Serializability:
○ View serializability ensures that a schedule is view equivalent to
some serial execution of the transactions: every transaction reads
the same values, and the final write on each data item is the
same.
○ This guarantees equivalent final outcomes regardless of the actual
interleaving of operations.
○ Testing for View Serializability:
■ A schedule is view serializable if it is view equivalent to at
least one serial schedule.
■ Testing this directly is NP-complete in general; since every
conflict-serializable schedule is also view serializable, conflict
serializability is used as the practical sufficient test.
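● A minimal Python sketch of the precedence-graph test described above. A schedule is given as a list of (transaction, operation, item) triples in execution order; the schedule shown is invented.

    def conflict_serializable(schedule):
        """schedule: list of (txn, op, item) triples with op in {'R', 'W'}."""
        nodes = {t for t, _, _ in schedule}
        edges = set()
        for i, (t1, op1, x1) in enumerate(schedule):
            for t2, op2, x2 in schedule[i + 1:]:
                # a conflict: same item, different txns, at least one write
                if x1 == x2 and t1 != t2 and 'W' in (op1, op2):
                    edges.add((t1, t2))
        graph = {n: [b for a, b in edges if a == n] for n in nodes}
        state = {}  # node -> 0 while being visited, 1 when finished

        def has_cycle(n):
            state[n] = 0
            for m in graph[n]:
                if state.get(m) == 0 or (m not in state and has_cycle(m)):
                    return True
            state[n] = 1
            return False

        return not any(n not in state and has_cycle(n) for n in nodes)

    # T1 reads X, T2 writes X, T1 writes X: edges T1->T2 and T2->T1 form a cycle
    s = [("T1", "R", "X"), ("T2", "W", "X"), ("T1", "W", "X")]
    print(conflict_serializable(s))  # False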
5. Recoverable schedule, cascading rollback, cascadeless rollback
● Recoverable Schedule:
○ A recoverable schedule ensures that if a transaction T1 reads a
data item previously written by another transaction T2, T2's
commit must occur before T1's commit.
○ This commit ordering guarantees that if T2 later aborts, no
committed transaction has read (and exposed) T2's uncommitted
data, so the schedule can always be recovered correctly.
○ Recoverable Schedule Properties:
■ Commit Order: Transactions commit in an order consistent
with their read-from dependencies, preventing a transaction
from committing after reading uncommitted data.
■ Safe Rollbacks: Aborting a transaction never invalidates
updates that other transactions have already committed.
● Cascading Rollback:
○ Cascading rollback occurs when a single transaction failure
triggers a chain reaction causing the rollback of other transactions
that were dependent on the failed transaction's changes.
○ It can lead to a domino effect where multiple transactions are
rolled back unnecessarily.
○ Impact of Cascading Rollback:
■ Data Inconsistency: Rollback of transactions not directly
related to the failure leads to unnecessary data reverting,
impacting consistency.
■ Concurrency Issues: Cascading rollbacks can impede
concurrency and degrade overall system performance.

● Cascadeless Rollback:
○ Cascadeless rollback ensures that the rollback of a transaction
does not cause other transactions (not directly dependent on the
failed transaction) to be rolled back.
○ Prevents the cascading effect of rollbacks, maintaining a more
stable and consistent system.
○ Advantages of Cascadeless Rollback:
■ Isolation Maintenance: Preserves the isolation level
between independent transactions, allowing them to
proceed independently without unnecessary rollbacks.
■ Reduced Impact of Failures: Limits the impact of a single
transaction failure on the overall system by avoiding
unnecessary rollbacks.
6. Concurrent Executions
● Concurrent execution refers to the simultaneous execution of multiple
transactions in a multi-user database system.
● Concurrent execution allows transactions to operate concurrently,
potentially improving system throughput and response time.
● Benefits of Concurrent Executions:
○ Improved Throughput: Allows multiple transactions to run
simultaneously, potentially increasing the system's overall
throughput.
○ Enhanced Responsiveness: Provides faster response times by
allowing concurrent transactions to execute simultaneously,
serving multiple users concurrently.
● Challenges in Concurrent Executions:
○ Isolation Levels: Different isolation levels might lead to issues like
lost updates, dirty reads, or non-repeatable reads that need to be
managed.
○ Deadlocks and Resource Contention: Situations where
transactions are waiting for resources held by other transactions,
leading to deadlocks or resource contention.
● Concurrency Control Mechanisms:
○ Locking: Granular locks on data items to control access and
maintain consistency.
○ Multiversion Concurrency Control (MVCC): Maintains multiple
versions of data for read consistency.
○ Timestamp Ordering: Assigning timestamps to transactions to
order their execution and manage conflicts.
7. Lock-based Concurrency Control Protocols
● Lock-based concurrency control is a technique used to manage
simultaneous access to shared resources by multiple transactions.
● It employs locks to control the access and ensure that transactions
maintain data consistency and integrity.
● Types of Locks:
○ Shared Lock (S-Lock): Allows multiple transactions to read
resources concurrently but prevents any transaction from writing
to it.
○ Exclusive Lock (X-Lock): Grants exclusive access to a transaction
for both read and write operations, blocking other transactions
from accessing the resource.
● Lock-based Protocols:
○ Two-Phase Locking (2PL):
■ Transactions acquire locks in two phases: the growing
phase (acquiring locks) and the shrinking phase (releasing
locks).
■ Ensures serializability by preventing conflict operations, i.e.,
a transaction cannot request a new lock after releasing any
lock.
○ Strict Two-Phase Locking (Strict 2PL):
■ Similar to 2PL but holds all locks until the transaction
reaches its commit point.
■ Guarantees strictness by holding all locks until the
transaction completes, reducing the possibility of cascading
rollbacks.
● Granularity of Locks:
○ Fine-grained Locking: Locks are applied at a finer level, like
individual data items or rows, allowing more concurrency but
potentially increasing overhead.
○ Coarse-grained Locking: Locks are applied at a higher level, like
entire tables, reducing overhead but potentially limiting
concurrency.
● Advantages: Ensures data consistency and integrity, allows for effective
coordination among transactions, and prevents conflicting operations.
● Limitations: May lead to deadlocks, increased contention, and potential
performance overhead due to lock management.
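● A minimal Python sketch of a lock table supporting the shared/exclusive modes above. It shows only lock compatibility and release at end of transaction (as in strict 2PL); real lock managers add wait queues, deadlock handling, and more. All names are illustrative.

    class LockManager:
        def __init__(self):
            self.locks = {}  # item -> (mode, set of holding txns), mode 'S' or 'X'

        def acquire(self, txn, item, mode):
            """Grant txn a lock on item if compatible; return True if granted."""
            if item not in self.locks:
                self.locks[item] = (mode, {txn})
                return True
            held_mode, holders = self.locks[item]
            if mode == 'S' and held_mode == 'S':
                holders.add(txn)  # shared locks are mutually compatible
                return True
            if holders == {txn}:  # sole holder may keep or upgrade its lock
                self.locks[item] = ('X' if 'X' in (mode, held_mode) else 'S', holders)
                return True
            return False  # incompatible request: the caller must wait

        def release_all(self, txn):
            """Shrinking phase; under strict 2PL this runs only at commit/abort."""
            for item in list(self.locks):
                mode, holders = self.locks[item]
                holders.discard(txn)
                if not holders:
                    del self.locks[item]

    lm = LockManager()
    print(lm.acquire("T1", "A", "S"))  # True
    print(lm.acquire("T2", "A", "S"))  # True: S is compatible with S
    print(lm.acquire("T2", "A", "X"))  # False: T1 still holds an S lock
    lm.release_all("T1")
    print(lm.acquire("T2", "A", "X"))  # True once T1 has released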
8. Timestamp based Concurrency Control Protocols
● Timestamp-based concurrency control uses timestamps assigned to
transactions or data items to manage concurrency and maintain
consistency.
● It ensures serializability by using transaction timestamps to determine the
order of access to shared resources.
● Types of Timestamps:
○ Transaction Timestamps: Assigned to transactions when they start
or request access to resources.
○ Data Item Timestamps: Assigned to shared data items to track
their last read or write operations by transactions.
● Timestamp Ordering:
○ Concurrency Control Policy: Conflicting operations must execute in
timestamp order, making the schedule equivalent to a serial order
sorted by transaction timestamps.
○ Read and Write Rules: A transaction T may read item Q only if Q
was last written by a transaction no newer than T (TS(T) >=
W-timestamp(Q)); T may write Q only if no newer transaction has
read or written Q (TS(T) >= R-timestamp(Q) and TS(T) >=
W-timestamp(Q)). A transaction that violates a rule is rolled back
and restarted with a new timestamp (see the sketch after this
section).
● Concurrency Control Protocols:
○ Thomas' Write Rule (TWR):
■ A refinement of the basic write rule: when transaction T writes
item Q, if TS(T) < R-timestamp(Q) then T is rolled back; if
TS(T) < W-timestamp(Q) then the write is obsolete (a newer
transaction has already overwritten Q) and is simply skipped;
otherwise the write proceeds.
■ Skipping obsolete writes instead of aborting allows more
schedules than basic timestamp ordering while still ensuring
correct (view-serializable) results.
○ Wound-Wait Protocol:
■ If transaction T1 requests a resource held by T2: when T1 is
older (smaller timestamp), T1 wounds T2, forcing T2 to abort
and restart; when T1 is younger, T1 waits.
■ Because older transactions are never blocked by younger ones,
every transaction eventually obtains its resources, avoiding
both deadlock and starvation.
● Advantages: Efficiently manages concurrency by using timestamps to
order transactions, reduces contention, and ensures consistency.
● Limitations: May lead to increased aborts (transaction rollbacks),
potential timestamp overflow, and complexity in implementation.
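● A minimal Python sketch of the read/write rules above, with Thomas' write rule available as a flag. Timestamps and items are illustrative.

    class TimestampOrdering:
        def __init__(self):
            self.r_ts = {}  # item -> largest timestamp that has read it
            self.w_ts = {}  # item -> largest timestamp that has written it

        def read(self, ts, item):
            if ts < self.w_ts.get(item, 0):
                return "rollback"  # a newer transaction already wrote item
            self.r_ts[item] = max(ts, self.r_ts.get(item, 0))
            return "ok"

        def write(self, ts, item, thomas=False):
            if ts < self.r_ts.get(item, 0):
                return "rollback"  # a newer transaction already read item
            if ts < self.w_ts.get(item, 0):
                # basic rule rolls back; Thomas' write rule skips the obsolete write
                return "skip" if thomas else "rollback"
            self.w_ts[item] = ts
            return "ok"

    to = TimestampOrdering()
    print(to.write(5, "Q"))               # ok
    print(to.write(3, "Q", thomas=True))  # skip: obsolete write is ignored
    print(to.read(3, "Q"))                # rollback: Q was written by a newer txn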
9. Validation Based Concurrency Control Protocols
● Validation-based concurrency control protocols, primarily associated with
optimistic concurrency control, manage simultaneous access to data in a
database without locking mechanisms.
● These protocols allow transactions to execute concurrently and validate
their changes before committing them to ensure consistency.
● Optimistic Concurrency Control (OCC):
○ OCC operates under the assumption that conflicts among
transactions are infrequent.
○ Transactions proceed without acquiring locks during read or write
operations, retaining a local copy of the data items they access
during execution.
○ Upon completion, before committing, each transaction undergoes a
validation phase: the system checks whether its changes conflict
with transactions that committed during its execution (typically,
concurrent modifications of the same data).
○ If no conflicts are found during validation, the transaction is
committed, and its changes become visible to other transactions.
○ In case of conflicts, the transaction is rolled back, and the process
may be retried.
○ Timestamps are often used to track transaction order and validate
changes; a transaction's timestamp determines its order relative to
other transactions.
● Advantages:
○ Effective when conflicts are infrequent, leading to a higher overall
throughput.
● Challenges:
○ Overhead due to validation checks might impact performance.
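● A minimal Python sketch of backward validation: a committing transaction fails validation if any transaction that committed after it started wrote an item it read. The structures are illustrative.

    committed = []  # list of (commit_order, write_set) for finished transactions

    def validate(start_order, read_set):
        """Fail if an overlapping committed txn wrote an item we read."""
        return not any(order > start_order and read_set & writes
                       for order, writes in committed)

    def try_commit(start_order, read_set, write_set):
        if validate(start_order, read_set):
            committed.append((len(committed) + 1, set(write_set)))
            return "committed"
        return "rolled back; retry"

    print(try_commit(0, {"A"}, {"B"}))  # committed: nothing committed before it
    print(try_commit(0, {"B"}, {"C"}))  # rolled back: B was written after our start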
10.Deadlock and its Handling
● A deadlock occurs when two or more transactions hold locks on
resources and wait indefinitely for locks held by others. As a result, no
transaction can proceed further.
● Starvation can also occur if the concurrency control manager is badly
designed, e.g., the same transaction is repeatedly rolled back or kept
waiting while others proceed.
● When a deadlock occurs there is a possibility of cascading roll-backs.

● Deadlock prevention protocols ensure that the system will never enter
into a deadlock state. Some prevention strategies :
○ Require that each transaction locks all its data items before it
begins execution (predeclaration).
○ Impose partial ordering of all data items and require that a
transaction can lock data items only in the order specified by the
partial order.
● Deadlock prevention strategies:
○ Wait-die scheme (non-preemptive):
■ An older transaction may wait for a younger one to release a
data item (older means smaller timestamp). Younger
transactions never wait for older ones; they are rolled back
("die") instead.
■ A transaction may die several times before acquiring the
needed data item.
○ Wound-wait scheme (preemptive):
■ An older transaction wounds (forces the rollback of) a younger
transaction instead of waiting for it. Younger transactions may
wait for older ones.
■ Typically causes fewer rollbacks than the wait-die scheme
(both schemes are contrasted in the sketch after this list).
○ Timeout-Based Schemes:
■ A transaction waits for a lock only for a specified amount of
time.
■ If the lock has not been granted within that time, the
transaction is rolled back and restarted.
■ Thus, deadlocks are not possible, though transactions may
occasionally be rolled back unnecessarily.
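● A minimal Python sketch contrasting the two schemes: given the timestamps of the requesting and holding transactions (smaller = older), decide who waits and who is rolled back. Purely illustrative.

    def wait_die(ts_requester, ts_holder):
        # non-preemptive: older requester waits; younger requester dies
        return "wait" if ts_requester < ts_holder else "requester rolled back"

    def wound_wait(ts_requester, ts_holder):
        # preemptive: older requester wounds the younger holder; younger waits
        return "holder rolled back" if ts_requester < ts_holder else "wait"

    print(wait_die(1, 2))    # wait: older requester may wait for younger holder
    print(wait_die(2, 1))    # requester rolled back: younger never waits
    print(wound_wait(1, 2))  # holder rolled back: older wounds younger
    print(wound_wait(2, 1))  # wait: younger requester waits for older holder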
11.Deadlock detection
● Deadlock detection in a database management system involves
identifying if a deadlock has occurred among transactions.
● One of the popular methods used for deadlock detection is the Wait-for
Graph.
● The wait-for graph has one node per transaction; a directed edge
Ti -> Tj means that Ti is waiting for a data item currently held by Tj.
● Each transaction that is waiting for a resource held by another
transaction contributes an edge to the graph.
● Node and Edge Representations:
○ Nodes: Represent transactions awaiting resources.
○ Edges: Show the resource dependency, i.e., a transaction waiting
for a resource held by another transaction.
● If a cycle exists in the wait-for graph, it indicates the presence of a
deadlock.
● A cycle signifies a circular chain of transactions, each waiting for a
resource held by the next transaction in the chain.
12.Snapshot Isolation, Read and Write
● Snapshot Isolation is a concurrency control method used in database
systems to provide consistent reads without blocking writes.
● It ensures that a transaction sees a consistent snapshot of the database
taken at the beginning of its execution, preventing many concurrency
anomalies (though not all; write skew can still occur).
● Properties:
○ Consistency: Guarantees consistent reads throughout the
transaction's lifespan.
○ Read Consistency: Reads from a consistent snapshot irrespective
of concurrent writes.
○ Reduced Blocking: Allows concurrent reads and writes without
significant blocking, improving system performance.
○ Potential Write Conflicts: Write operations might encounter
conflicts if multiple transactions modify the same data.
● Read Operation:
○ When a transaction starts, it takes a snapshot of the database,
capturing the committed state of all data items at that moment.
○ Throughout the transaction, reads are performed from this
consistent snapshot rather than directly from the live database.
○ Changes made by other transactions after the snapshot was taken
are not visible during the transaction's execution.
● Write Operation:
○ When a transaction performs write operations, the changes are
not immediately reflected in the database.
○ If a write conflicts with another transaction's uncommitted data,
the system might reject the write or cause the transaction to wait.
○ The changes made by the transaction are visible to other
transactions only upon commit, ensuring atomicity and isolation.
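● A minimal Python sketch of snapshot reads over a versioned store: each committed write carries a commit timestamp, a transaction reads the newest version no newer than its start timestamp, and writes follow a first-committer-wins check. Everything here is illustrative.

    versions = {"X": [(0, "x0"), (5, "x1")]}  # item -> sorted [(commit_ts, value)]

    def snapshot_read(item, start_ts):
        """Return the latest version of item committed at or before start_ts."""
        return [v for ts, v in versions[item] if ts <= start_ts][-1]

    def try_commit(item, value, start_ts, commit_ts):
        """First-committer-wins: abort if item changed after our snapshot."""
        if any(ts > start_ts for ts, _ in versions[item]):
            return "abort: write-write conflict"
        versions[item].append((commit_ts, value))
        return "committed"

    print(snapshot_read("X", 3))        # 'x0': the version at ts 5 is not visible
    print(try_commit("X", "x2", 3, 7))  # abort: X was written at ts 5 > 3
    print(try_commit("X", "x2", 6, 8))  # committed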
13.Recovery System: Failure classification
● Transaction Failures:
○ Logical Error:
■ Logical errors occur due to incorrect data input or incorrect
application logic within a transaction.
■ These errors lead to improper or invalid data modifications
within the database.
○ System Error:
■ System errors occur due to software or hardware issues
within the database management system (DBMS).
■ They may cause unexpected termination or incorrect
functioning of a transaction.
● System Crash:
○ A system crash is an abrupt and unexpected failure of the entire
database system or a significant component of it.
○ It results in the loss of volatile data in memory, potential
corruption of data, and interruption of ongoing transactions.
● Disk Failure:
○ Disk failures occur when there is a malfunction or damage to the
storage disks where the database is stored.
○ They can lead to the loss or corruption of stored data, affecting the
database's integrity and availability.
14.Log based recovery
● Log-based recovery is a fundamental mechanism in database systems
used to restore the database to a consistent state after a system failure.
● Every transaction's operation (e.g., read, write) is recorded in a log before
the changes are applied to the database.
● Each log entry contains details like transaction ID, operation type, before
and after values, and a unique sequence number or timestamp.
● Types of Log Records:
○ Undo (Backward Recovery): Records necessary to reverse or undo
the changes made by incomplete or aborted transactions.
○ Redo (Forward Recovery): Records needed to reapply changes
made by committed transactions that were not yet written to the
database before a failure occurred.
● Checkpointing:
○ Periodic checkpoints are created, indicating a consistent state of
the database.
○ Reduces the recovery time by providing a starting point closer to
the failure time rather than starting from the beginning.
● Recovery after Failure:
○ Starting from the last checkpoint or the end of the log, the system
analyzes the log to identify transactions that were in progress
during the failure.
○ It identifies committed transactions and those whose changes
need to be undone or redone.
○ Redo changes recorded in the log that were committed but not yet
written to the database before the failure.
○ Undo or rollback changes made by incomplete or aborted
transactions, restoring the database to a consistent state.
● Advantages of Log-Based Recovery:
○ Provides a systematic method to recover the database after
various types of failures.
○ Ensures durability and maintains data consistency by replaying
committed transactions and undoing incomplete ones.
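● A minimal Python sketch of the redo-then-undo recovery described above, over a log of update records (txn, item, before-value, after-value) and commit records. Real systems use sequence numbers and checkpoints (see ARIES below); the log here is invented.

    log = [
        ("T1", "update", "A", 100, 50),
        ("T2", "update", "B", 200, 250),
        ("T1", "commit", None, None, None),
        # crash occurs before T2 commits
    ]

    def recover(log):
        db = {}
        committed = {t for t, rec, *_ in log if rec == "commit"}
        for t, rec, item, before, after in log:            # redo pass, forward
            if rec == "update":
                db[item] = after
        for t, rec, item, before, after in reversed(log):  # undo pass, backward
            if rec == "update" and t not in committed:
                db[item] = before
        return db

    print(recover(log))  # {'A': 50, 'B': 200}: T1 is redone, T2 is undone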
15.Storage devices and stable storage implementation
● Volatile storage:
○ does not survive system crashes
○ examples: main memory, cache memory
● Nonvolatile storage:
○ survives system crashes
○ examples: disk, tape, flash memory, non-volatile (battery backed
up) RAM
○ but may still fail, losing data
● Stable storage:
○ a mythical form of storage that survives all failures
○ approximated by maintaining multiple copies on distinct
nonvolatile media
○ Maintain multiple copies of each block on separate disks for
implementation
16. Data access
● Physical blocks are those blocks residing on the disk.
● Buffer blocks are the blocks residing temporarily in main memory.
● Block movements between disk and main memory are initiated through
the following two operations:
○ input(B) transfers the physical block B to main memory.
○ output(B) transfers the buffer block B to the disk, and replaces the
appropriate physical block there.
● We assume, for simplicity, that each data item fits in, and is stored inside,
a single block.
17.Shadow Paging
● Shadow paging is a technique used in database systems for
implementing the concept of atomicity during transaction management.
● It ensures that the changes made by a transaction become visible to other
transactions atomically, meaning either all changes are committed or
none.
● Implementation Approach:
○ Duplicate Database Copy: Maintains two physical copies of the
database: the current version (visible to users) and a shadow
version (used during transaction execution).
○ Shadow Page Table: Keeps track of the mapping between the
current database pages and their corresponding shadow pages.
● Transaction Execution:
○ Read Operations: Transactions read data from the current version
of the database.
○ Write Operations: Changes made by a transaction are applied to
the shadow version of the database, leaving the current version
unchanged.
● Transaction Commit:
○ Atomic Commit: Upon successful completion of a transaction, the
changes in the shadow version are atomically committed to the
current version.
○ Switching Pointers: The shadow page table is updated to redirect
pointers from the current version to the shadow version, making
the changes visible.
● Rollback Handling:
○ Aborted Transactions: If a transaction aborts or fails, no changes
from the shadow version are applied to the current version.
○ Clean-Up: The changes in the shadow version related to the
aborted transaction are discarded, maintaining consistency.
● Advantages of Shadow Paging:
○ Unlike logging-based approaches, shadow paging doesn't require
maintaining undo logs for rollback purposes.
● Challenges of Shadow Paging:
○ Overhead of Duplication: Requires space for maintaining two
copies of the database, which might be substantial for large
databases.
18.Checkpoints
● Checkpoints capture a point-in-time snapshot of the database, ensuring
all changes are written to stable storage.
● Provides a starting point for recovery in case of system crashes or
failures.
● Before the checkpoint, the system ensures that all transaction log entries
are flushed to the stable storage.
● After a system crash or failure, the recovery process starts from the last
checkpoint, minimizing the amount of log to be replayed.
● By providing a consistent starting point, recovery time is reduced
compared to starting from the beginning of the logs.
● Advantages of Checkpoints:
○ Ensures a consistent and recoverable state of the database.
○ Shortens recovery time by providing a known consistent state.
○ Minimizes the risk of data loss during system failures.
● Impact on Performance:
○ Frequent checkpoints might affect transaction processing by
introducing slight delays during the checkpoint process.
○ Writing database pages to disk during checkpoints incurs I/O
overhead, impacting system performance.
19.ARIES recovery algorithm
● The ARIES (Algorithm for Recovery and Isolation Exploiting Semantics)
recovery algorithm is a widely-used and efficient database recovery
method that provides high-performance crash recovery and ensures
database consistency.
● It consists of three main phases: Analysis, Redo, and Undo, and is based
on the principles of write-ahead logging (WAL).
● 1. Analysis Phase:
○ Scan of Transaction Log:
■ Begins by analyzing the transaction log starting from the
last checkpoint.
■ Identifies the transactions that were active but not
committed during the crash or failure.
○ Reconstruction of Transaction Table (Transaction Table and Dirty
Page Table):
■ Rebuilds the in-memory transaction table, storing
information about active transactions (including their states)
during the failure.
■ Constructs the dirty page table, which records modified
pages by active transactions not yet committed.
● 2. Redo Phase:
○ Redo Pass:
■ Reapplies changes recorded in the log for committed
transactions and for transactions whose changes were not
reflected in the database before the failure.
■ Re-executes the updates described in the log, bringing the
database to a consistent state.
○ Write-Ahead Logging (WAL):
■ Utilizes the log to ensure that committed transactions'
changes, which might not have been written to the
database, are reapplied during recovery.
● 3. Undo Phase:
○ Undo Pass:
■ Rolls back the changes made by transactions that were
active but not committed during the failure.
■ Uses the log to perform backward log traversal and undo
uncommitted changes, returning the database to a
consistent state.
○ Logging Undo Operations:
■ Generates undo log records to ensure atomicity and
durability during the rollback process.
● Advantages of ARIES:
○ Minimizes the amount of work needed to recover the database
after a crash by utilizing the log efficiently.
○ Ensures atomicity, consistency, isolation, and durability (ACID
properties) during recovery.
Chapter 6

1. Hashing techniques
● Hashing techniques are used in computing to transform data (often keys
or values) into fixed-size values called hash codes or hash values.
● 1. Division Method:
○ It uses the modulo operation by dividing the key by a prime
number (not close to a power of 2) and taking the remainder as the
hash value.
○ Hash = key % divisor, where the divisor is a prime number.
○ Prone to clustering if the divisor isn't chosen properly, leading to
collisions (multiple keys hashing to the same location).
● 2. Mid Square Method:
○ It involves squaring the key and extracting a portion (often middle
digits) as the hash value.
○ Hash = Middle digits of (key^2)
○ Can produce non-uniform hash values depending on the key
distribution and can be computationally intensive for larger keys.
● 3. Folding Method:
○ Splits the key into smaller parts (e.g., digits or bytes), applies some
operation (e.g., addition), and then reduces it to the hash value.
○ Key is divided into segments, and these segments are added
together to obtain the hash.
○ Requires careful design to handle keys of different lengths, and
folding technique may need adjustment for specific key
distributions.
● 4. Multiplication Method:
○ Involves multiplying the key by a constant fraction between 0 and
1, extracting the fractional part of the product, and multiplying it
by the table size to obtain the hash value.
○ Hash = floor(table_size × fractional part of (key × constant))
○ The choice of the constant is critical; improper selection may lead
to non-uniform distribution and increased chances of collisions.
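● Minimal Python sketches of the four methods above; the table size, constant, and digit counts are illustrative choices.

    import math

    def division_hash(key, divisor=97):  # divisor: a prime table size
        return key % divisor

    def mid_square_hash(key, digits=2):  # middle digits of key squared
        sq = str(key * key).zfill(digits)
        start = (len(sq) - digits) // 2
        return int(sq[start:start + digits])

    def folding_hash(key, part_len=2, table_size=100):  # sum the key's segments
        s = str(key)
        parts = [int(s[i:i + part_len]) for i in range(0, len(s), part_len)]
        return sum(parts) % table_size

    def multiplication_hash(key, table_size=100, a=0.6180339887):  # a ≈ (√5-1)/2
        return math.floor(table_size * ((key * a) % 1.0))

    print(division_hash(1234))        # 70: 1234 % 97
    print(mid_square_hash(123))       # 51: middle digits of 15129
    print(folding_hash(123456))       # 2: (12 + 34 + 56) % 100
    print(multiplication_hash(1234))  # 65: floor(100 * frac(1234 * a))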
2. Collision
● In hash tables, collisions occur when two or more keys hash to the same
index or slot in the hash table.
● Collisions need to be managed to ensure the proper storage and retrieval
of data.
● Collision Handling:
● Open Addressing (Closed Hashing):
○ Open addressing is a collision resolution technique where, upon a
collision, the system searches for an alternative (i.e., an open or
empty) slot within the hash table itself to store the collided
key-value pair.
○ When a collision occurs, the algorithm probes the table by looking
for the next available slot using various methods like linear
probing, quadratic probing, or double hashing until an empty slot
is found.
○ Linear probing probes the hash table linearly & If a collision occurs,
the algorithm checks the next slot in a linear fashion until an
empty slot is found.
○ Quadratic probing uses a quadratic function to probe the hash
table. If a collision occurs at index 'i', the algorithm checks slots at
positions 'i + 1', 'i + 4', 'i + 9', and so on (quadratic increments) until
an empty slot is found.
○ Double hashing uses two hash functions to calculate the sequence
of probe positions. If a collision occurs at index 'i', the second hash
function generates a different increment value to find the next
available slot.
○ Clustering can occur over time as consecutive collisions may lead
to longer probe sequences, affecting performance.
● Separate Chaining (Open Hashing):
○ Separate chaining handles collisions by allowing each slot in the
hash table to point to a linked list or another data structure (such
as an array or linked list).
○ Collided keys are stored in the same slot using a data structure
(e.g., linked list) at that location, avoiding overwriting.
○ Avoids clustering, offers flexibility in handling multiple collided
keys, and doesn't require additional probing.
○ Additional memory overhead due to pointers or extra data
structures, and longer search times if the linked lists become
lengthy.
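● Minimal Python sketches of both strategies, using division hashing; the table size and keys are illustrative.

    class LinearProbingTable:  # open addressing
        def __init__(self, size=7):
            self.slots = [None] * size

        def insert(self, key, value):
            i = key % len(self.slots)
            for _ in range(len(self.slots)):  # probe linearly for a free slot
                if self.slots[i] is None or self.slots[i][0] == key:
                    self.slots[i] = (key, value)
                    return
                i = (i + 1) % len(self.slots)
            raise RuntimeError("table full")

        def get(self, key):
            i = key % len(self.slots)
            for _ in range(len(self.slots)):
                if self.slots[i] is None:
                    raise KeyError(key)
                if self.slots[i][0] == key:
                    return self.slots[i][1]
                i = (i + 1) % len(self.slots)
            raise KeyError(key)

    class ChainingTable:  # separate chaining: each slot holds a list
        def __init__(self, size=7):
            self.buckets = [[] for _ in range(size)]

        def insert(self, key, value):
            self.buckets[key % len(self.buckets)].append((key, value))

        def get(self, key):
            for k, v in self.buckets[key % len(self.buckets)]:
                if k == key:
                    return v
            raise KeyError(key)

    t = LinearProbingTable()
    t.insert(10, "a")
    t.insert(17, "b")   # 10 % 7 == 17 % 7 == 3: collision, probes to slot 4
    print(t.get(17))    # 'b'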
3. Types of Indexes: Single Level, Ordered Indexes, Multilevel Indexes
● Single Level Indexes:
○ Single Level Indexes maintain a single-level structure, typically
mapping keys to corresponding data records in a one-to-one
relationship.
○ Directly maps index entries to the data records, allowing fast
access to specific records based on keys.
○ Offers quick access to data as there is a direct mapping between
index entries and data records.
○ Suffers from scalability issues when handling a large volume of
data due to the single-level structure.
● Ordered Indexes:
○ Ordered Indexes maintain a sorted sequence of index entries
based on the indexed attribute's values.
○ Entries are sorted based on key values, allowing efficient range
queries and traversal.
○ Facilitates rapid search operations using binary search, enabling
efficient retrieval of ranges of values.
○ Insertions and deletions might require restructuring of the index,
impacting performance.
● Multilevel Indexes:
○ Multilevel Indexes utilize a hierarchical structure consisting of
multiple levels to manage and access the index.
○ Hierarchical levels of indexes help in navigating through the data,
reducing search times by using primary and secondary levels of
indexes.
○ Efficiently handles large datasets by organizing index entries into
multiple levels, allowing faster access to data.
○ Requires additional memory and maintenance as the number of
levels increases, impacting storage and performance.
4. Overview of B-Trees and B+ Trees
● B-Trees:
○ B-trees are balanced search trees in which each node holds a
variable number of keys and children, bounded by a fixed maximum
called the "order" of the tree.
○ Each internal node in a B-tree of order m can have between
ceil(m/2) and m children (with one fewer keys than children); only
the root is exempt from the minimum.
○ Properties:
■ All leaf nodes are at the same level, maintaining balance
within the tree.
■ B-trees are optimized for disk access due to their ability to
minimize the number of I/O operations.
■ Offers logarithmic time complexity O(log n) for search,
insertions, and deletions.
○ Commonly used in databases and file systems for indexing and
organizing large amounts of data.
● B+ Trees:
○ B+ trees are similar to B-trees but with some distinct differences,
particularly in how they store and organize data.
○ In B+ trees, all keys are present in the leaf nodes, and the leaf
nodes are linked together in a linked list to allow sequential access.
○ Properties:
■ Like B-trees, B+ trees maintain a balanced structure; the
linked leaf level additionally allows sequential access.
■ Particularly efficient for range queries due to the sequential
nature of leaf nodes.
■ Offer logarithmic time complexity O(log n) for search, insertion,
and deletion, similar to B-trees.
○ Widely used in databases to build indexes and efficiently support
range queries.
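● A minimal Python sketch of the linked leaf level of a B+ tree: a range query finds the first qualifying leaf (normally by descending from the root, omitted here) and then follows the sibling links. Node contents are illustrative, and rebalancing is omitted.

    from bisect import bisect_left

    class Leaf:
        def __init__(self, keys, next_leaf=None):
            self.keys = keys       # sorted keys stored in this leaf
            self.next = next_leaf  # link to the right sibling leaf

    def range_scan(first_leaf, lo, hi):
        """Yield every key in [lo, hi] by walking the leaf chain."""
        leaf = first_leaf
        while leaf and leaf.keys[0] <= hi:
            for k in leaf.keys[bisect_left(leaf.keys, lo):]:
                if k > hi:
                    return
                yield k
            leaf = leaf.next

    l3 = Leaf([50, 60])
    l2 = Leaf([30, 40], l3)
    l1 = Leaf([10, 20], l2)
    print(list(range_scan(l1, 15, 45)))  # [20, 30, 40]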
5. Separate chaining vs open addressing
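● A brief summary of the trade-offs covered in the Collision section above:
○ Storage: Open addressing keeps every entry inside the table itself;
separate chaining stores collided entries in a per-slot structure such
as a linked list.
○ Memory: Open addressing has no pointer overhead; separate chaining
needs extra memory for chain pointers or auxiliary structures.
○ Clustering: Open addressing can suffer from clustering and lengthening
probe sequences; separate chaining avoids clustering entirely.
○ Load factor: Open addressing requires the load factor to stay below 1
(the table can fill up); separate chaining tolerates load factors above
1, at the cost of longer chains and slower lookups.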
