JNTU HYDERABAD
SYLLABUS
UNIT – I
UNIT – II
UNIT – III
UNIT – IV
Page 1 of 75
Database Management Systems Riyaz Mohammed
UNIT – V
*****
UNIT – I
INTRODUCTION
Data:
Facts that can be recorded and that have an implicit meaning are known as
'data'.
Data Base:
Advantages:
Disadvantages:
Database Applications:
A HISTORICAL PERSPECTIVE
From the earliest days of computers, storing and manipulating data have been
a major application focus. The first general-purpose DBMS, designed by
Charles Bachman at General Electric in the early 1960s, was called the
Integrated Data Store. It formed the basis for the network data model, which
was standardized by the Conference on Data Systems Languages (CODASYL)
and strongly influenced database systems through the 1960s. Bachman was
the first recipient of ACM's Turing Award (the computer science equivalent of
a Nobel Prize) for work in the database area; he received the award in 1973.
In the late 1960s, IBM developed the Information Management System (IMS)
DBMS, used even today in many major installations. IMS formed the basis for
an alternative data representation framework called the hierarchical data
model.
In 1970, Edgar Codd, at IBM's San Jose Research Laboratory, proposed a new
data representation framework called the relational data model. This proved
In the 1980s, the relational model consolidated its position as the dominant
DBMS paradigm, and database systems continued to gain widespread use.
The SQL query language for relational databases, developed as part of IBM's
System R project, is now the standard query language. SQL was standardized
in the late 1980s, and the current standard.
Relational model:
1) Physical/Internal schema.
2) Conceptual schema.
3) External schema.
2) Conceptual schema:
3) External schema:
DATA INDEPENDENCE
That is, application programs are insulated from changes in the way the data
is structured and stored. Data independence is achieved through use of the
three levels of data abstraction; in particular, the conceptual schema and the
external schema provide distinct benefits in this area.
For example, suppose that the faculty relation in our university database is
replaced by the following two relations:
STRUCTURE OF A DBMS
The code that implements relational operators sits on top of the file and access
methods layer. This layer supports the concept of a file, which, in a DBMS, is
a collection of pages or a collection of records. Heap files, or files of unordered
pages, as well as indexes are supported. In addition to keeping track of the
pages in a file, this layer organizes the information within a page.
The files and access methods layer code sits on top of the buffer manager,
which brings pages in from disk to main memory as needed in response to
read requests.
The disk space manager, buffer manager, and file and access method layers
must interact with components.
The database design process can be divided into six steps. The ER model is
most relevant to the first three steps.
1) Requirements analysis.
2) Conceptual Database Design.
3) Logical Database Design.
4) Schema Refinement.
5) Physical Database Design.
6) Application and security Design.
E-R diagrams:
ER-diagram:
1) Key constraints.
2) Participation constraints.
3) Weak entity.
4) Class hierarchies.
5) Aggregation.
1) Key constraints:
2) Participation Constraints:
The key constraint on Manages tells us that a department has at most one
manager. Let us say that every department is required to have a manager.
This requirement is an example of a participation constraint; the participation
of the entity set Departments in the relationship set Manages is said to be
total. A participation that is not total is said to be partial. Participation of the
entity set Employees in Manages is partial, since not every employee gets to
manage a department.
3) Class Hierarchies:
4) Aggregation:
*****
UNIT – II
Relational algebra is one of the two formal query languages associated with
the relational model. Queries in algebra are composed using a collection of
operators. A fundamental property is that every operator in the algebra
accepts (one or two) relation instances as arguments and returns a relation
instance as the result. This property makes it easy to compose operators to
form a complex query: a relational algebra expression is recursively defined
to be a relation, a unary algebra operator applied to a single expression, or a
binary algebra operator applied to two expressions. We describe the basic
operators of the algebra (selection, projection, union, cross product, and
difference).
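The composition property can be illustrated with a minimal Python sketch. It assumes a relation instance is modeled as a set of rows, each row being a frozenset of (attribute, value) pairs; the helper names and the sample rows of s1 are hypothetical and serve only as an illustration.

```python
from itertools import product

def row(**attrs):
    """Build a hashable row from attribute=value pairs."""
    return frozenset(attrs.items())

def select(rel, pred):
    """Selection: keep the rows that satisfy the predicate."""
    return {r for r in rel if pred(dict(r))}

def project(rel, *attrs):
    """Projection: keep only the named attributes (duplicates disappear)."""
    return {frozenset((a, v) for a, v in r if a in attrs) for r in rel}

def union(r1, r2):
    return r1 | r2

def difference(r1, r2):
    return r1 - r2

def cross(r1, r2):
    """Cross product; assumes the two relations share no attribute names."""
    return {a | b for a, b in product(r1, r2)}

# Hypothetical instance s1 (sample data, not taken from the notes).
s1 = {
    row(sid=22, sname="Dustin", gender="M"),
    row(sid=31, sname="Lubber", gender="M"),
    row(sid=58, sname="Rusty", gender="F"),
}

# Every operator returns a relation, so operators nest freely.
female_names = project(select(s1, lambda r: r["gender"] == "F"), "sname")
genders = project(s1, "gender")
```

Because projection returns a set, the two rows with gender M collapse into one row in genders.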
1) Domain Constraints.
2) Key Constraints.
3) Entity Integrity Constraints.
4) Referential Integrity Constraints.
1) Domain Constraints:
2) Key Constraints:
This is also a candidate key, whose values are used to identify tuples in
the relation.
It is common to designate one of the candidate keys as a primary key
of the relation.
The attributes that form the primary key of a relation schema are
underlined.
It is used to denote a candidate key that is chosen by the database
designer as the principal means of identifying entities within an entity set.
Example: ‘Sid’ of Students relation.
RELATIONAL ALGEBRA
1) Select (σ).
2) Project (π).
3) Rename (ρ).
1) Division.
2) Joins.
Set operations:
1) Union (∪).
2) Intersection (∩).
3) Difference (-).
4) Cartesian product or cross product (×).
Syntax:
Example:
Π gender(s1)
Π sname, gender(s1)
It uses Existential (∃) and Universal Quantifiers (∀) to bind the variable.
Output: This query will yield the article, page and subject from the relation
javatpoint, where the subject is 'database'.
*****
UNIT – III
SQL
SQL stands for Structured Query Language. It is used for storing and
managing data in a relational database management system (RDBMS).
It is a standard language for relational database systems. It enables a
user to create, read, update and delete relational databases and tables.
All RDBMSs, such as MySQL, Informix, Oracle, MS Access and SQL
Server, use SQL as their standard database language.
SQL allows users to query the database in a number of ways, using
English-like statements.
Rules:
SQL process:
When an SQL command is executed on any RDBMS, the system
figures out the best way to carry out the request, and the SQL engine
determines how to interpret the task.
Various components are involved in this process. These include the
optimization engine, the query engine, the query dispatcher, the classic
query engine, and so on.
The classic query engine handles all the non-SQL queries, but the
SQL query engine won't handle logical files.
Characteristics of SQL:
SELECT [DISTINCT] select-list
FROM from-list
WHERE qualification
The from-list in the FROM clause is a list of table names. A table name
can be followed by a range variable; a range variable is particularly
useful when the same table name appears more than once in the from-
list.
The select-list is a list of (expressions involving) column names of tables
named in the from-list.
SQL provides three set-manipulation constructs that extend the basic query
form. Since the answer to a query is a multiset of rows, it is natural to consider
NESTED QUERIES
A nested query is a query that has another query embedded within it;
the embedded query is called a subquery.
SQL provides other set operations: IN (to check if an element is in a
given set), NOT IN (to check if an element is not in a given set).
The nested subquery computes the (multi)set of sids for sailors who have
reserved boat 103, and the top-level query retrieves the names of sailors
whose sid is in this set. The IN operator allows us to test whether a value is
in a given set of elements; an SQL query is used to generate the set to be
tested.
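The query described above can be tried with a minimal sketch using Python's built-in sqlite3 module. The reduced schemas Sailors(sid, sname) and Reserves(sid, bid) and the sample rows are assumptions made for illustration; the notes do not show the table contents.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Sailors (sid INTEGER PRIMARY KEY, sname TEXT);
    CREATE TABLE Reserves (sid INTEGER, bid INTEGER);
    INSERT INTO Sailors VALUES (22, 'Dustin'), (31, 'Lubber'), (58, 'Rusty');
    INSERT INTO Reserves VALUES (22, 101), (22, 103), (31, 103), (58, 102);
""")

# The subquery builds the set of sids that reserved boat 103;
# the outer query keeps the sailors whose sid is IN that set.
reserved_103 = [name for (name,) in conn.execute("""
    SELECT S.sname
    FROM Sailors S
    WHERE S.sid IN (SELECT R.sid FROM Reserves R WHERE R.bid = 103)
    ORDER BY S.sname
""")]

# Replacing IN with NOT IN returns the sailors who have not reserved boat 103.
not_reserved_103 = [name for (name,) in conn.execute("""
    SELECT S.sname
    FROM Sailors S
    WHERE S.sid NOT IN (SELECT R.sid FROM Reserves R WHERE R.bid = 103)
""")]
```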
AGGREGATION OPERATORS
NULL VALUES
SQL provides a special column value called null. We use null when the
column value is either unknown or inapplicable.
OR
Triggers are stored programs, which are automatically executed or fired when
some event occurs.
Triggers could be defined on the table, view, schema, or database with which
the event is associated.
Advantages of Triggers:
Creating a trigger:
Here,
6) [REFERENCING OLD AS o NEW AS n]: This allows you to refer to the new and
old values for various DML statements, like INSERT, UPDATE, and
DELETE.
7) [FOR EACH ROW]: This specifies a row level trigger, i.e., the trigger
would be executed for each row being affected. Otherwise the trigger
will execute just once when the SQL statement is executed, which is
called a table level trigger.
8) WHEN (condition): This provides a condition for rows for which the
trigger would fire.
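A row-level trigger with a WHEN condition can be tried with Python's built-in sqlite3 module. This is only a sketch in SQLite's trigger dialect, which differs from the syntax outlined above: SQLite has no REFERENCING clause and exposes OLD and NEW directly. The table and trigger names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, salary REAL);
    CREATE TABLE salary_log (id INTEGER, old_salary REAL, new_salary REAL);

    -- Row-level trigger: runs once per updated row, and only for rows
    -- where the WHEN condition holds.
    CREATE TRIGGER log_salary_change
    AFTER UPDATE OF salary ON customers
    FOR EACH ROW
    WHEN NEW.salary <> OLD.salary
    BEGIN
        INSERT INTO salary_log VALUES (OLD.id, OLD.salary, NEW.salary);
    END;

    INSERT INTO customers VALUES (1, 2000), (2, 1500);
    UPDATE customers SET salary = salary + 500 WHERE id = 1;
    UPDATE customers SET salary = salary WHERE id = 2;
""")

# Only the first UPDATE changed a value, so only it produced a log row.
log = conn.execute("SELECT * FROM salary_log").fetchall()
```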
SCHEMA REFINEMENT
Anomalies are the problems that arise in poorly planned, un-normalised
databases where all the data is stored in one table, which is
sometimes called a flat file database. Let us consider a schema of this type:
Here all the data is stored in a single table, which causes redundancy of data,
or anomalies, since SID and Sname are repeated for the same CID. Let us
discuss the anomalies one by one. Due to redundancy of data, we may get the
following problems:
Insertion Anomaly and Deletion Anomaly: These anomalies exist only due
to redundancy, otherwise they do not exist.
DECOMPOSITIONS
FUNCTIONAL DEPENDENCIES
Armstrong Axioms: Armstrong's axioms are a set of rules for reasoning
about functional dependencies. They can be used to infer all the functional
dependencies that hold on a relational database.
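In practice, inference with these rules is usually mechanized as an attribute-closure computation. The following is a minimal sketch of that standard technique; the function names and the sample dependencies A -> B and B -> C are hypothetical.

```python
def closure(attrs, fds):
    """Return the closure of `attrs` under the functional dependencies `fds`.

    `fds` is a list of (lhs, rhs) pairs of attribute sets, read as lhs -> rhs.
    An FD is applied whenever its left side is already inside the result.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def implies(fds, lhs, rhs):
    """lhs -> rhs follows from `fds` exactly when rhs is in the closure of lhs."""
    return set(rhs) <= closure(lhs, fds)

# Hypothetical schema R(A, B, C, D) with A -> B and B -> C.
fds = [({"A"}, {"B"}), ({"B"}, {"C"})]

a_closure = closure({"A"}, fds)         # A, B and C are all determined by A
a_gives_c = implies(fds, {"A"}, {"C"})  # follows by transitivity
c_gives_a = implies(fds, {"C"}, {"A"})  # does not follow
```

The same function answers key questions: a set of attributes is a super key when its closure contains every attribute of the relation.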
Primary axioms:
Super Key: A super key is a set of attributes of a relation that can be used to
identify a tuple uniquely.
Note: In some cases, a multivalued dependency may occur no more than once
in a given relation.
7) Domain key normal form: A relation is in domain key normal form if
every constraint on the relation is a logical consequence of the definitions of
keys and domains.
8) Sixth normal form: A relation R is said to be in sixth normal form if
it contains no non-trivial join dependencies. Sixth normal form also
adds the temporal dimension (time) to the relational model.
When the two smaller tables are joined, no data should be lost, and the
decomposition should satisfy all the rules of decomposition. The natural join
of the decomposed tables should not generate any additional (spurious) rows.
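The condition can be checked on a sample instance with a short Python sketch. The relation R(SID, Sname, CID) and its rows are hypothetical, and testing one instance only illustrates the idea; a lossless decomposition must behave this way on every legal instance.

```python
def natural_join(r1, r2):
    """Natural join of two relations given as lists of dicts."""
    joined = []
    for a in r1:
        for b in r2:
            common = a.keys() & b.keys()
            if all(a[k] == b[k] for k in common):
                merged = {**a, **b}
                if merged not in joined:
                    joined.append(merged)
    return joined

def project(rel, attrs):
    """Projection onto `attrs`, with duplicate rows removed."""
    out = []
    for r in rel:
        reduced = {k: r[k] for k in attrs}
        if reduced not in out:
            out.append(reduced)
    return out

def same_rows(x, y):
    return all(r in y for r in x) and all(r in x for r in y)

# Hypothetical instance of R(SID, Sname, CID).
r = [
    {"SID": 1, "Sname": "Ann", "CID": "C1"},
    {"SID": 1, "Sname": "Ann", "CID": "C2"},
    {"SID": 2, "Sname": "Bob", "CID": "C1"},
]

# Splitting on SID rebuilds exactly the original rows.
good = natural_join(project(r, ["SID", "Sname"]), project(r, ["SID", "CID"]))
# Splitting on CID produces extra rows when the parts are joined back.
bad = natural_join(project(r, ["SID", "CID"]), project(r, ["CID", "Sname"]))
```

Here good contains the three original rows, while bad contains five rows, two of which were never in R.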
*****
UNIT – IV
TRANSACTION CONCEPT
Operations of Transaction:
The first operation reads X's value from database and stores it in a
buffer.
The second operation will decrease the value of X by 500. So buffer will
contain 3500.
The third operation will write the buffer's value to the database. So X's
final value will be 3500.
For example: If, in the above transaction, the debit fails after
executing operation 2, then X's value will remain 4000 in the database, which
is not acceptable to the bank.
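The three operations and the failure case can be simulated with a minimal Python sketch. The dictionary standing in for the database, the SimulatedCrash exception, and the fail_after_step parameter are all hypothetical devices for illustration.

```python
class SimulatedCrash(Exception):
    """Stands in for a failure in the middle of the transaction."""

def debit(db, account, amount, fail_after_step=None):
    """Run R(X); X = X - amount; W(X) against the dict `db`."""
    buffer = db[account]          # operation 1: read X into a buffer
    if fail_after_step == 1:
        raise SimulatedCrash
    buffer -= amount              # operation 2: change the value in the buffer
    if fail_after_step == 2:
        raise SimulatedCrash
    db[account] = buffer          # operation 3: write the buffer to the database

db = {"X": 4000}

try:
    debit(db, "X", 500, fail_after_step=2)
except SimulatedCrash:
    pass
after_failure = db["X"]           # the write never happened

debit(db, "X", 500)
after_success = db["X"]           # all three operations completed
```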
Transaction property: A transaction has four properties. These are
used to maintain consistency in a database before and after the transaction.
Property of Transaction:
1) Atomicity.
2) Consistency.
3) Isolation.
4) Durability.
1) Active state:
The active state is the first state of every transaction. In this state, the
transaction is being executed.
For example: insertion, deletion, or updating of a record is done here.
But the records are not yet saved to the database.
2) Partially committed:
4) Failed state:
If any of the checks made by the database recovery system fails, then
the transaction is said to be in the failed state.
In the example of total mark calculation, if the database is not able to
fire a query to fetch the marks, then the transaction will fail to execute.
5) Aborted:
If any of the checks fail and the transaction has reached a failed state,
then the database recovery system makes sure that the database is
in its previous consistent state. If it is not, the recovery system aborts or
rolls back the transaction to bring the database into a consistent state.
If the transaction fails midway, all the operations it has already
executed are rolled back, so the database returns to the consistent state
it was in before the transaction began.
After aborting the transaction, the database recovery module will select
one of the two operations:
i. Re-start the transaction.
ii. Kill the transaction.
copies, assumes that only one transaction is active at a time. The scheme also
assumes that the database is simply a file on disk. A pointer called db-pointer
is maintained on disk; it points to the current copy of the database.
The transaction is said to have been committed at the point where the updated
db-pointer is written to disk.
We now consider how the technique handles transaction and system failures.
First, consider transaction failure. If the transaction fails at any time before
db-pointer is updated, the old contents of the database are not affected. We
can abort the transaction by just deleting the new copy of the database.
Once the transaction has been committed, all the updates that it performed
are in the database pointed to by db-pointer. Thus, either all updates of the
transaction are reflected, or none of the effects are reflected, regardless of
transaction failure.
Now consider the issue of system failure. Suppose that the system fails at any
time before the updated db-pointer is written to disk. Then, when the system
restarts, it will read db-pointer and will thus see the original contents of the
database, and none of the effects of the transaction will be visible on the
database. Next, suppose that the system fails after db-pointer has been
updated on disk. Before the pointer is updated, all updated pages of the new
copy of the database were written to disk. Again, we assume that, once a file
is written to disk, its contents will not be damaged even if there is a system
failure. Therefore, when the system restarts, it will read db-pointer and will
thus see the contents of the database after all the updates performed by the
transaction.
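The scheme can be sketched in Python with ordinary files. This is an illustration under stated assumptions, not a real DBMS implementation: the database is one small text file, the file names db_v1 and db_v2 are hypothetical, and the pointer update relies on os.replace swapping the pointer file in as a single step.

```python
import os
import tempfile

def read_db(directory):
    """Follow db-pointer to the current copy and return its contents."""
    with open(os.path.join(directory, "db-pointer")) as f:
        current = f.read().strip()
    with open(os.path.join(directory, current)) as f:
        return f.read()

def set_pointer(directory, name):
    """Write the new pointer to a temp file, then swap it in with one rename."""
    tmp = os.path.join(directory, "db-pointer.tmp")
    with open(tmp, "w") as f:
        f.write(name)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, os.path.join(directory, "db-pointer"))

def run_transaction(directory, update, new_name, fail_before_commit=False):
    """Apply `update` to a new copy; commit by repointing db-pointer."""
    new_path = os.path.join(directory, new_name)
    with open(new_path, "w") as f:
        f.write(update(read_db(directory)))
        f.flush()
        os.fsync(f.fileno())           # the new copy is on disk before commit
    if fail_before_commit:
        os.remove(new_path)            # abort: just delete the new copy
        return False
    set_pointer(directory, new_name)   # commit point
    return True

workdir = tempfile.mkdtemp()
with open(os.path.join(workdir, "db_v1"), "w") as f:
    f.write("X=4000")
set_pointer(workdir, "db_v1")

run_transaction(workdir, lambda old: "X=3500", "db_v2", fail_before_commit=True)
after_abort = read_db(workdir)         # pointer still names the old copy

run_transaction(workdir, lambda old: "X=3500", "db_v2")
after_commit = read_db(workdir)        # pointer now names the new copy
```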
In a multi-user system, multiple users can access and use the same
database at the same time, which is known as concurrent execution.
That is, different users run transactions against the same database
simultaneously.
Database transactions often require several users to perform
different operations on the database at once, and in that case the
transactions are executed concurrently.
The simultaneous execution should be done in an interleaved manner,
and no operation should affect the other executing operations, so that
the consistency of the database is maintained. Concurrent execution of
transaction operations therefore raises several challenging problems
that need to be solved.
In a database transaction, the two main operations are READ and WRITE.
These two operations must be managed during the concurrent execution of
transactions, because if they are interleaved without control, the data may
become inconsistent. The following problems occur with the concurrent
execution of operations:
i. Simplistic lock protocol: It is the simplest way of locking data during a
transaction. Simplistic lock-based protocols require every transaction to obtain
a lock on the data before it inserts, deletes, or updates it. The data item is
unlocked after the transaction completes.
Pre-claiming lock protocols evaluate the transaction to list all the data
items on which it needs locks.
Before initiating execution of the transaction, the protocol requests
locks on all those data items from the DBMS.
If all the locks are granted, this protocol allows the transaction to
begin. When the transaction is completed, it releases all the locks.
If all the locks are not granted, the transaction rolls back and waits
until all the locks are granted.
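The all-or-nothing granting step of pre-claiming can be sketched as a toy lock table in Python. The LockManager class and its method names are hypothetical, only exclusive locks are modeled, and waiting is represented simply by a False return value.

```python
class LockManager:
    """Toy exclusive-lock table: maps each data item to its holder."""

    def __init__(self):
        self.owner = {}

    def preclaim(self, txn, items):
        """Grant every lock in `items` to `txn`, or grant none of them."""
        if any(item in self.owner for item in items):
            return False               # some lock is taken: txn must wait
        for item in items:
            self.owner[item] = txn
        return True

    def release_all(self, txn):
        """Called when `txn` completes; frees every lock it holds."""
        for item in [i for i, t in self.owner.items() if t == txn]:
            del self.owner[item]

lm = LockManager()
t1_started = lm.preclaim("T1", {"A", "B"})
t2_started = lm.preclaim("T2", {"B", "C"})   # B is held by T1, so T2 gets nothing
c_locked_early = "C" in lm.owner
lm.release_all("T1")
t2_retry = lm.preclaim("T2", {"B", "C"})     # succeeds once T1 has finished
```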
In the first phase, when the execution of the transaction starts, it seeks
permission for the locks it requires.
In the second phase, the transaction acquires all the locks. The third
phase starts as soon as the transaction releases its first lock.
In the third phase, the transaction cannot demand any new locks. It
only releases the acquired locks.
The first phase of Strict-2PL is similar to 2PL. In the first phase, after
acquiring all the locks, the transaction continues to execute normally.
The only difference between 2PL and strict 2PL is that Strict-2PL does
not release a lock after using it.
Strict-2PL waits until the whole transaction commits, and then it
releases all the locks at once.
Strict-2PL protocol does not have shrinking phase of lock release.
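The locking rules of 2PL and Strict-2PL can be expressed as a small checker in Python. The TwoPhaseTxn class is a hypothetical sketch that tracks a single transaction's locks; it does not model lock conflicts between transactions.

```python
class TwoPhaseTxn:
    """Tracks one transaction's locks and rejects protocol violations."""

    def __init__(self, strict=False):
        self.held = set()
        self.shrinking = False     # becomes True at the first release
        self.strict = strict

    def lock(self, item):
        if self.shrinking:
            raise RuntimeError("2PL violation: lock requested after a release")
        self.held.add(item)

    def unlock(self, item):
        if self.strict:
            raise RuntimeError("Strict-2PL: locks are released only at commit")
        self.held.remove(item)
        self.shrinking = True

    def commit(self):
        """Release every remaining lock at once."""
        released = sorted(self.held)
        self.held.clear()
        self.shrinking = True
        return released

# Plain 2PL: once A is released, asking for C is rejected.
t = TwoPhaseTxn()
t.lock("A")
t.lock("B")
t.unlock("A")
try:
    t.lock("C")
    late_lock_allowed = True
except RuntimeError:
    late_lock_allowed = False

# Strict-2PL: an early unlock is rejected; commit releases everything.
s = TwoPhaseTxn(strict=True)
s.lock("A")
s.lock("B")
try:
    s.unlock("A")
    early_unlock_allowed = True
except RuntimeError:
    early_unlock_allowed = False
released_at_commit = s.commit()
```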
The priority of the older transaction is higher, so it executes first.
To determine the timestamp of a transaction, this protocol uses the
system time or a logical counter.
The lock-based protocol is used to manage the order between conflicting
pairs among transactions at the execution time. But Timestamp based
protocols start working as soon as a transaction is created.
Let's assume there are two transactions T1 and T2. Suppose that
transaction T1 entered the system at time 007 and transaction T2
entered the system at time 009. T1 has the higher priority, so it
executes first, because it entered the system first.
The timestamp ordering protocol also maintains the timestamps of the last
'read' and 'write' operations on a data item.
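The checks of the basic timestamp ordering protocol can be sketched in a few lines of Python. The read and write rules below are the standard textbook ones and are not spelled out in these notes; the Item class and function names are hypothetical, and a False return value stands for rolling the transaction back.

```python
class Item:
    """A data item with the timestamps of its last read and last write."""

    def __init__(self):
        self.r_ts = 0
        self.w_ts = 0

def read(item, ts):
    """Allow the read unless a younger transaction already wrote the item."""
    if ts < item.w_ts:
        return False               # the reader is too old: roll it back
    item.r_ts = max(item.r_ts, ts)
    return True

def write(item, ts):
    """Allow the write unless a younger transaction already read or wrote."""
    if ts < item.r_ts or ts < item.w_ts:
        return False               # the writer is too old: roll it back
    item.w_ts = ts
    return True

T1, T2 = 7, 9                      # T1 entered at time 007, T2 at time 009
x = Item()
results = [
    read(x, T2),                   # allowed; the read timestamp becomes 9
    write(x, T1),                  # rejected: T2 (younger) has already read x
    write(x, T2),                  # allowed; the write timestamp becomes 9
    read(x, T1),                   # rejected: T2 (younger) has already written x
]
```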
Validation (Ti): It contains the time when Ti finishes its read phase and starts
its validation phase.
Finish (Ti): It contains the time when Ti finishes its write phase.
SERIALIZABILITY
Here,
If a precedence graph contains a single edge Ti → Tj, then all the instructions
of Ti are executed before the first instruction of Tj is executed.
For example:
Explanation:
The precedence graph for schedule S1 contains a cycle, so schedule
S1 is non-serializable.
Explanation:
The precedence graph for schedule S2 contains no cycle, so schedule
S2 is serializable.
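The cycle test can be automated with a short Python sketch. It assumes a schedule is written as a list of (transaction, operation, item) triples with operations R and W. The two sample schedules are hypothetical; they are not the S1 and S2 discussed above, whose tables are not reproduced here.

```python
def precedence_graph(schedule):
    """Build the edges Ti -> Tj from a schedule of (txn, op, item) triples.

    Two operations conflict when they come from different transactions,
    touch the same item, and at least one of them is a write.
    """
    edges = set()
    for i, (ti, op_i, x_i) in enumerate(schedule):
        for tj, op_j, x_j in schedule[i + 1:]:
            if ti != tj and x_i == x_j and "W" in (op_i, op_j):
                edges.add((ti, tj))
    return edges

def has_cycle(edges):
    """Depth-first search for a cycle in the directed graph."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set())
    state = {node: 0 for node in graph}   # 0 = new, 1 = on stack, 2 = done

    def visit(node):
        state[node] = 1
        for nxt in graph[node]:
            if state[nxt] == 1 or (state[nxt] == 0 and visit(nxt)):
                return True
        state[node] = 2
        return False

    return any(state[n] == 0 and visit(n) for n in graph)

def is_conflict_serializable(schedule):
    return not has_cycle(precedence_graph(schedule))

interleaved = [("T1", "R", "A"), ("T2", "W", "A"), ("T1", "W", "A")]
serial_like = [("T1", "R", "A"), ("T1", "W", "A"), ("T2", "R", "A")]
```

For interleaved the graph has the edges T1 → T2 and T2 → T1, which form a cycle; for serial_like it has only T1 → T2.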
RECOVERABILITY
Table 1 above shows a schedule with two transactions. T1 reads and
writes the value of A, and that value is then read and written by T2. T2 commits,
but later on T1 fails. Because of the failure, we have to roll back T1. T2 should
also be rolled back, because it read the value written by T1, but T2 cannot be
rolled back because it has already committed. This type of schedule is known as
an irrecoverable schedule.
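The commit-order rule behind this example can be checked with a small Python sketch. It assumes a schedule is a list of (transaction, operation, item) triples with operations R, W and C (commit), and it ignores aborts for simplicity; the function name is hypothetical.

```python
def is_recoverable(schedule):
    """Return True if no transaction commits before one it has read from.

    `schedule` is a list of (txn, op, item) triples with op in
    {'R', 'W', 'C'}; item is None for a commit.
    """
    last_writer = {}      # item -> transaction that wrote it most recently
    reads_from = {}       # txn -> set of transactions whose writes it read
    committed = set()
    for txn, op, item in schedule:
        if op == "W":
            last_writer[item] = txn
        elif op == "R":
            writer = last_writer.get(item)
            if writer is not None and writer != txn:
                reads_from.setdefault(txn, set()).add(writer)
        elif op == "C":
            if not reads_from.get(txn, set()) <= committed:
                return False          # committed before its source did
            committed.add(txn)
    return True

# The situation described above: T2 reads T1's write and commits first.
irrecoverable = [
    ("T1", "R", "A"), ("T1", "W", "A"),
    ("T2", "R", "A"), ("T2", "W", "A"),
    ("T2", "C", None),
]
# The same operations, with T1 committing before T2.
recoverable = [
    ("T1", "R", "A"), ("T1", "W", "A"),
    ("T2", "R", "A"), ("T2", "W", "A"),
    ("T1", "C", None), ("T2", "C", None),
]
```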
LOG–BASED RECOVERY
Checkpoint:
The checkpoint is a mechanism by which all the previous logs are
removed from the system and stored permanently on the storage disk.
The checkpoint is like a bookmark. During the execution of a
transaction, such checkpoints are marked, and as the transaction
executes its steps, log files are created.
When the transaction reaches a checkpoint, its updates are written
to the database, and the entire log file up to that point is removed.
The log file is then updated with the next steps of the transaction
until the next checkpoint, and so on.
The checkpoint is used to declare a point before which the DBMS was
in a consistent state and all transactions were committed.
The recovery system reads log files from the end to the start. It reads log
files from T4 to T1.
Recovery system maintains two lists, a redo-list, and an undo-list.
A transaction is put into the redo state if the recovery system sees a log
with <Tn, Start> and <Tn, Commit>, or just <Tn, Commit>. All the
transactions in the redo-list and their previous logs are removed and then
redone before their logs are saved.
For example: In the log file, transactions T2 and T3 will have both <Tn, Start>
and <Tn, Commit>. Transaction T1 will have only <Tn, Commit> in
the log file, because it committed after the checkpoint
was crossed. Hence T1, T2 and T3 are put into the redo list.
A transaction is put into the undo state if the recovery system sees a log
with <Tn, Start> but finds no commit or abort log. All the transactions
in the undo-list are undone, and their logs are removed.
For example: Transaction T4 will have only <Tn, Start>. So T4 is put
into the undo list, since this transaction is not yet complete and failed midway.
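The rule for building the two lists can be sketched in Python. The log format, a list of (transaction, action) records, and the function name are assumptions made for illustration; the sample log is arranged to match the T1 to T4 example above. The sketch scans forward for brevity, although the text describes the recovery system reading the log from the end.

```python
def classify(log):
    """Split the transactions seen after the last checkpoint into redo and undo.

    `log` is a list of (txn, action) records with action in
    {'Start', 'Commit', 'Abort', 'Checkpoint'}; txn is None for a checkpoint.
    """
    start = 0
    for i, (_, action) in enumerate(log):
        if action == "Checkpoint":
            start = i + 1              # only records after this point matter
    started, finished, redo = [], set(), []
    for txn, action in log[start:]:
        if action == "Start":
            started.append(txn)
        elif action == "Commit":
            redo.append(txn)           # a commit record means redo
            finished.add(txn)
        elif action == "Abort":
            finished.add(txn)
    undo = [t for t in started if t not in finished]
    return redo, undo

log = [
    ("T1", "Start"),
    (None, "Checkpoint"),
    ("T2", "Start"), ("T1", "Commit"), ("T2", "Commit"),
    ("T3", "Start"), ("T3", "Commit"),
    ("T4", "Start"),                   # the failure happens before T4 commits
]
redo_list, undo_list = classify(log)
```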
*****
UNIT – V
FILE ORGANIZATION
The method of mapping file records to disk blocks defines file organization,
i.e. how the file records are organized. The following are the types of file
organization.
FILE ORGANIZATIONS
1) Update Operations.
2) Retrieval Operations.
Open: A file can be opened in one of two modes, read mode or write mode. In
read mode, the operating system does not allow anyone to alter the data; the
file is solely for reading. Files opened in read mode can be shared among several
entities. The other mode is write mode, in which data modification is allowed.
Files opened in write mode can also be read but cannot be shared.
Locate: Every file has a file pointer, which gives the current position where
data is to be read or written. This pointer can be adjusted as needed. Using
the find (seek) operation, it can be moved forward or backward.
Read: By default, when a file is opened in read mode, the file pointer points
to the beginning of the file. There are options by which the user can tell the
operating system where to place the file pointer when the file is opened. The
data immediately after the file pointer is read.
Write: The user can choose to open a file in write mode, which allows the
content of the file to be edited. The edit can be a deletion, an insertion, or a
modification. The file pointer can be positioned when the file is opened, or it
can be changed dynamically if the operating system allows it.
Close: This is the most important operation from the operating system's point
of view. When a request to close a file is generated, the operating system removes
all the locks (if in shared mode), saves the data (if altered) to
the secondary storage media, and releases all the buffers and file handles
associated with the file.
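These operations map directly onto the file API of most languages. The following Python sketch walks through them on a temporary file; the file name and the four-byte record layout are hypothetical.

```python
import io
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "records.dat")

# Write: open in write mode and store three fixed-width records.
with open(path, "wb") as f:
    f.write(b"rec1rec2rec3")

# Open in read mode: the file pointer starts at the beginning of the file.
f = open(path, "rb")
start_pos = f.tell()
first = f.read(4)              # reads the data right after the pointer

# Locate: seek moves the pointer forward or backward.
f.seek(8)                      # jump forward to the third record
third = f.read(4)
f.seek(-8, os.SEEK_CUR)        # move backward to the second record
second = f.read(4)

# Close: releases the handle and buffers held for the file.
f.close()

# A file opened in read mode cannot be modified.
with open(path, "rb") as g:
    try:
        g.write(b"x")
        write_allowed = True
    except io.UnsupportedOperation:
        write_allowed = False
```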
We know that information in the DBMS files is stored in form of records. Every
record is equipped with some key field, which helps it to be recognized
uniquely.
B+ TREES
Structure of B+ Tree:
Every leaf node is at equal distance from the root node. A B+ tree is of order
n where n is fixed for every B+ tree.
Internal nodes:
Leaf nodes:
Leaf nodes contain at least ⌈n/2⌉ record pointers and ⌈n/2⌉ key values.
At most, leaf nodes contain n record pointers and n key values.
Every leaf node contains one block pointer P to point to next leaf node
and forms a linked list.
B+ tree insertion: B+ trees are filled from the bottom, and each entry is
inserted at a leaf node.
Partition at i = ⌊(m+1)/2⌋
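The partition rule can be illustrated with a small Python sketch of a leaf split. It assumes that m is the maximum number of keys a leaf may hold, which the notes do not state, and it leaves out the update of the parent node; the function name is hypothetical.

```python
import bisect

def insert_into_leaf(keys, key, m):
    """Insert `key` into the sorted leaf `keys`; split the leaf if it overflows.

    `m` is taken here to be the maximum number of keys per leaf (an
    assumption). Returns (left, right); `right` is None when no split
    was needed.
    """
    keys = list(keys)
    bisect.insort(keys, key)       # keep the leaf sorted
    if len(keys) <= m:
        return keys, None
    i = (m + 1) // 2               # partition point: floor((m + 1) / 2)
    return keys[:i], keys[i:]      # first i entries stay, the rest move

no_split = insert_into_leaf([5, 10], 7, m=4)
left, right = insert_into_leaf([5, 10, 15, 20], 12, m=4)
```

With m = 4 the overflowing leaf holds five keys and i = 2, so the first two keys stay in the old leaf and the remaining three move to the new one.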
B+ tree deletion:
If the entry is in an internal node, delete it and replace it with the entry from the left position.
If underflow occurs
*****
Prepared by:
RIYAZ MOHAMMED