
CIT-503 (Database Administration and Management)

Advance Data Models In Database System


Data Model:
A data model is a collection of concepts that can be used to describe the structure
of a database. Here, "structure" means the data types, relationships, and constraints
that apply to the data. Most data models also include a set of basic operations for
retrievals and updates on the database.

Categories of Data Models

Many data models have been proposed. They are broadly categorized into three types:

1) Object based or conceptual data models (e.g., the ER model and the object-oriented model)
2) Record based or representational data models (e.g., the relational, network, and hierarchical models)
3) Physical data models

1) Conceptual/Object Based Data Models


High-level or conceptual data models describe the database at a very high level
and are useful for understanding the needs or requirements of the database. This
model is used in the requirement-gathering process, i.e., before the database
designers start building a particular database. Popular models of this kind are
the entity-relationship model (ER model) and the object-oriented model.

• ER Model
The ER model centers on the entities, relationships, and attributes used by
database designers. Conceptual data models use concepts such as
entities, attributes, and relationships. An entity represents a real-world object or
concept, such as an employee or a project from the miniworld that is described in
the database. An attribute represents some property of interest that further
describes an entity, such as the employee’s name or salary. A relationship among
two or more entities represents an association among the entities, for example, a
works-on relationship between an employee and a project.

• Object oriented Data Model


In an object-oriented data model, information is represented as objects, and
these objects store their values in instance variables. This model borrows ideas
from object-oriented programming and works with object-oriented programming
languages such as Python, Java, VB.NET, and Perl.

Characteristics of a Conceptual Data Model

• Offers organization-wide coverage of the business concepts.

• This type of data model is designed and developed for a business
audience.
• Conceptual data models, also known as domain models, create a common
vocabulary for all stakeholders by establishing basic concepts and scope.

2) Representational/Record Based Data Model


Representational or implementation data models are the models used most
frequently in traditional commercial DBMSs. This type of data model represents
only the logical part of the database and not its physical structure, which allows
us to focus primarily on the design part of the database. Popular representational
models are the relational, network, and hierarchical models.
• Relational Model:
In the relational model, tables are used to represent data and the relationships
between them. It is a theoretical concept whose practical implementation is
done in the physical data model.
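As a concrete sketch of this idea, the following uses SQLite from Python; the department/employee tables and their columns are invented for the illustration, not taken from the text above.

```python
import sqlite3

# A minimal sketch of the relational model using SQLite; the table and
# column names (department, employee, dept_id) are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE department (dept_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT, "
             "dept_id INTEGER REFERENCES department(dept_id))")
conn.execute("INSERT INTO department VALUES (1, 'Sales')")
conn.execute("INSERT INTO employee VALUES (10, 'Ayesha', 1)")

# The relationship between the two tables lives in the shared dept_id
# column and is recovered with a join.
row = conn.execute(
    "SELECT e.name, d.name FROM employee e "
    "JOIN department d ON e.dept_id = d.dept_id").fetchone()
print(row)  # ('Ayesha', 'Sales')
```

The point of the sketch is that the relationship itself is just data (a shared column value), which is what distinguishes the relational model from pointer-based models such as the network model.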

• Network Data Model:


In the network data model, data is organized as a graph, and a record can have
more than one parent node. It permits the modeling of many-to-many relationships in data.

• Hierarchical Data Model:


The hierarchical data model organizes data in a tree structure. In this model each
entity has only one parent but may have many child nodes.
The advantage of using a representational data model is that it provides a
foundation for the physical model.

Physical Data Model:


Physical data models describe how data is stored as files in the computer by
representing information such as record formats, record orderings, and access
paths. These low-level models provide concepts that describe the details of how
data is stored on computer storage media, typically magnetic disks.
Advantages of Data Models
1. Data models help in representing data accurately.
2. They help us find missing data and minimize data redundancy.
3. A data model provides data security in a better way.
4. The data model should be detailed enough to be used for building the
physical database.
5. The information in the data model can be used for defining the relationships
between tables, primary and foreign keys, and stored procedures.
Disadvantages of Data Models
1. In the case of a vast database, it can become difficult to understand
the data model.
2. You must have proper knowledge of SQL to use physical models.
3. Even a small change in structure may require modification of the entire
application.
4. There is no standard data manipulation language in a DBMS.
5. To develop a data model, one should know the characteristics of the
physically stored data.

Database Programming
Database programming involves designing and maintaining a database for an
application. Best practices include establishing relationships between different
data sets and testing for errors and duplicate records. Retrieving instances of data
from the database is another key responsibility in database programming. Efficient
database design, implementation, programming, and SQL use in a web-based
application are the most critical elements of website performance. Poorly
written queries can cause havoc in the database. Because in many organizations
power users access the production databases via reporting tools and direct
queries, efficiently written SQL not only results in better application performance
but also reduces traffic on the network.

Programming Languages
Our programmers have the ability to work with essentially any programming
language; however, we are more focused on extending our expertise in those
languages that are most efficient in web-based applications.

SQL (Structured Query Language)


SQL is a programming language for querying and modifying data and managing
databases. SQL was standardized first by ANSI (American National Standards
Institute) and later by ISO (International Organization for Standardization).
Most database management systems implement a majority of one of these
standards and add their proprietary extensions. SQL allows the retrieval,
insertion, updating, and deletion of data. A database management system also
includes management and administrative functions.
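The four data operations named above can be sketched against an in-memory SQLite database; the account table and its values are hypothetical.

```python
import sqlite3

# Hypothetical single-table schema, used only to show the four operations.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, owner TEXT, balance INTEGER)")

# Insertion
db.execute("INSERT INTO account VALUES (1, 'X', 4000)")
db.execute("INSERT INTO account VALUES (2, 'Y', 1000)")

# Updating
db.execute("UPDATE account SET balance = balance - 500 WHERE id = 1")

# Retrieval
balance = db.execute("SELECT balance FROM account WHERE id = 1").fetchone()[0]

# Deletion
db.execute("DELETE FROM account WHERE id = 2")
remaining = db.execute("SELECT COUNT(*) FROM account").fetchone()[0]

print(balance, remaining)  # 3500 1
```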
MySQL
MySQL is an RDBMS (Relational Database Management System) with more
than 11 million installations. The program runs as a server providing multi-user
access to a number of databases. MySQL is owned and sponsored by a single for-
profit firm based in Sweden. The project's source code is available under the terms
of the GNU General Public License, as well as under a variety of proprietary
agreements. MySQL is a popular open-source relational database management
system which uses a subset of ANSI SQL (Structured Query Language).

HTML (HyperText Markup Language)


It's the authoring language used to create documents on the World Wide Web.
HTML defines the structure and layout of a Web document by using a variety of
tags and attributes. HyperText is the method by which you move around on the
web by clicking special text called hyperlinks, which take you to the next page.
Markup is what HTML tags do to the text inside them.

PHP (Hypertext Preprocessor)


PHP is a server-side scripting language for creating dynamic Web pages. When a
visitor opens the page, the server processes the PHP commands and then sends
the results to the visitor's browser. PHP is open source and cross-platform: it runs
on Windows NT and many Unix versions, and it can be built as an Apache module
or as a binary that can run as a CGI (Common Gateway Interface). When built as
an Apache module, PHP is especially lightweight and speedy. Designed to operate
on the web, many PHP applications are open-source and thus less expensive to
develop.

JavaScript
JavaScript is an object-oriented scripting language that interacts with website
visitors and instantly responds to what they do without needing to reload the
page, which can make pages feel more dynamic and give feedback to the user.
JavaScript is growing in popularity due to its simple learning curve relative to the
amount of power it provides. A substitute for CGI (Common Gateway Interface)
scripting, JavaScript is designed for the web.

PERL (Practical Extraction and Report Language)


Developed by Larry Wall, this language is especially designed for processing text.
Because of its strong text-processing abilities, Perl has become one of the most
popular server-side scripting languages for writing CGI programs. Perl programs,
or scripts, are text files which are parsed (run through and executed) by a
program called an interpreter on the server. Perl is an interpreted language,
which makes it easy to build and test simple programs.

CGI (Common Gateway Interface)


The Common Gateway Interface (CGI) is a standard for interfacing external
applications with information servers, such as HTTP or Web servers. A plain HTML
document that the Web daemon retrieves is static, which means it exists in a
constant state: a text file that doesn't change. A CGI program, on the other hand,
is executed in real time, so it can output dynamic information. A CGI program is
any program designed to accept and return data that conforms to the CGI
specification; it could be written in any programming language, including C, Perl,
Java, or Visual Basic.

XML (Extensible Markup Language)


Extensible Markup Language is a general-purpose specification for creating
custom markup languages and supports multilingual documents. XML, in
combination with other standards, makes it possible to define the content of a
document separately from its formatting, making it easy to use that content in
other applications or for other presentation environments. It allows programmers
to create their own customized tags, enabling the definition, transmission,
validation, and interpretation of data between applications and between
organizations.

File Organization Concepts


File:
A file is a sequence of records stored in binary format. A disk drive is formatted
into several blocks that can store records. File records are mapped onto those
disk blocks.

File Organization:
File organization describes the way in which records are stored in terms of
blocks, and how the blocks are placed on the storage medium. Files of fixed-length
records are easier to implement than files of variable-length records.

(A database consists of files, a file consists of records, and a record consists of attributes.)
Objectives of file organization
o It allows optimal selection of records, i.e., records can be selected as
fast as possible.
o Insert, delete, or update transactions on the records should be quick
and easy.
o Duplicate records should not be induced as a result of an insert, update, or
delete.
o Records should be stored efficiently, for a minimal cost of storage.

Organization of Records in Files


Several of the possible ways of organizing records in files are:

1) Heap File Organization:


It is also known as unordered file organization. Any record can be placed
anywhere in the file where there is space for it; there is no ordering of records.
When a file is created using heap file organization, the operating system allocates
a memory area to that file without any further accounting details. File records can
be placed anywhere in that memory area, and it is the responsibility of the
software to manage the records. A heap file does not support any ordering,
sequencing, or indexing on its own.
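A toy model of the behaviour described above — records dropped into the first block with free space, and lookups forced to scan every block — might look like this (the block size and record layout are arbitrary choices for the illustration):

```python
BLOCK_SIZE = 4  # records per block (arbitrary for the sketch)

class HeapFile:
    def __init__(self):
        self.blocks = []

    def insert(self, record):
        # First block with free space wins; no ordering is maintained.
        for block in self.blocks:
            if len(block) < BLOCK_SIZE:
                block.append(record)
                return
        self.blocks.append([record])  # otherwise allocate a new block

    def find(self, key, value):
        # No ordering or index: every block must be scanned.
        for block in self.blocks:
            for record in block:
                if record.get(key) == value:
                    return record
        return None

f = HeapFile()
for i in range(6):
    f.insert({"id": i, "name": f"rec{i}"})
print(len(f.blocks))            # 2 blocks for 6 records
print(f.find("id", 5)["name"])  # rec5
```

The `find` scan is exactly why the drawbacks below mention that finding or modifying a record in a large database is slow.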

Benefits of Heap File Organization


• It's an efficient way to organize files for bulk insertion. This method is most
effective when a large volume of data needs to be loaded into the database
at once.
• Fetching and retrieving records is faster in a small database than with
sequential records.

Drawbacks of Heap File Organization


o As it takes time to find or modify a record in a large database, this method
is relatively inefficient.
o For large or complex databases, this type of organization may not be
suitable.

2) Sequential File Organization:


Every file record contains a data field (attribute) that uniquely identifies the
record. In sequential file organization, records are placed in the file in some
sequential order based on the unique key field, or search key. In practice, it is
not always possible to store all the records sequentially in physical form.

Advantages of Sequential File Organization


1. It is a simple method to adopt; the implementation is simple compared to
other file organization methods.
2. It is fast and efficient when dealing with huge amounts of data.
3. This method of file organization is mostly used for generating reports
and performing statistical operations on data.
4. Data can be stored on cheap storage devices.

Disadvantages of Sequential File Organization


1. Sorting the file takes extra time, and the sorting operation requires
additional storage.
2. Searching for a record is a time-consuming process in sequential file
organization, as the records are searched in sequential order.
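The ordering requirement can be sketched as follows: every insert keeps the records sorted on the key, so reading the file back returns them in key order (the record layout is invented for the example):

```python
import bisect

# A toy sequential file: records are kept ordered on a unique key, so a
# full scan returns them in key order, but every insert must keep the order.
class SequentialFile:
    def __init__(self):
        self.keys = []
        self.records = []

    def insert(self, record):
        # Find the position that preserves key order, then insert there.
        pos = bisect.bisect_left(self.keys, record["id"])
        self.keys.insert(pos, record["id"])
        self.records.insert(pos, record)

    def scan(self):
        # Sequential read: records come back sorted by the key field.
        return [r["id"] for r in self.records]

f = SequentialFile()
for rid in [30, 10, 20]:
    f.insert({"id": rid})
print(f.scan())  # [10, 20, 30]
```

The extra work inside `insert` is the cost the first disadvantage above refers to: keeping the file sorted is paid on every modification.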

3) Hash/Direct File Organization


A hash function is computed on some attribute of each record; the result of the
hash function specifies in which block of the file the record should be placed.
When a record has to be retrieved using the hash key columns, the address is
generated and the whole record is fetched from that address. In the same way,
when a new record has to be inserted, the address is generated using the hash
key and the record is inserted directly. The same process applies to delete and
update. In this method, no effort is spent searching and sorting the entire file;
each record is stored at an essentially random position in memory.
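A minimal sketch of the mechanism, assuming the hash is taken over an SSN attribute and a fixed bucket count (both are assumptions made for the example):

```python
N_BUCKETS = 8  # fixed bucket count, arbitrary for the sketch

def bucket_of(key):
    # The hash function maps a key attribute to a block (bucket) number.
    return hash(key) % N_BUCKETS

buckets = [[] for _ in range(N_BUCKETS)]

def insert(record):
    # The address (bucket) is generated from the hash key; the record
    # goes straight there, with no sorting of the file.
    buckets[bucket_of(record["ssn"])].append(record)

def lookup(ssn):
    # Only the one bucket the hash points at is searched.
    for record in buckets[bucket_of(ssn)]:
        if record["ssn"] == ssn:
            return record
    return None

insert({"ssn": "111-22-3333", "name": "Steve"})
insert({"ssn": "444-55-6666", "name": "Steve"})  # same name, distinct key
found = lookup("444-55-6666")
print(found["name"])  # Steve
```

Note that hashing on SSN rather than on the name is what keeps the two "Steve" records distinct; hashing on a non-unique column is exactly the pitfall described in the disadvantages below.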

Advantages of Hash File Organization


1. This method doesn't require explicit sorting, as records are automatically
placed in memory based on their hash keys.
2. Reading and fetching a record is faster compared to other methods, as the
hash key is used to quickly locate and retrieve the data from the database.
3. Records are not dependent on each other and are not stored in consecutive
memory locations, which prevents read, write, update, and delete anomalies.

Disadvantages of Hash File Organization


1. It can cause accidental deletion of data if the columns for the hash function
are not selected properly. For example, deleting an employee "Steve" using
Employee_Name as the hash column can accidentally delete other
employees' records if another employee is also named "Steve". This can be
avoided by selecting the attributes properly; in this case, combining age,
department, or SSN with Employee_Name in the hash key makes it more
accurate to find the distinct record.
2. Memory is not used efficiently in hash file organization, as records are not
stored in consecutive memory locations.
3. If there is more than one hash column, searching for a record using a single
attribute will not give accurate results.

4) Clustered File Organization:


Clustered file organization is not considered good for large databases. In this
mechanism, related records from one or more relations are kept in the same disk
block; that is, the ordering of records is not based on a primary key or search key.

Clusters are created when records from two or more tables are saved in the same
file. There will be two or more tables in the same block of data in these files, and
the key attributes used to link the tables together are stored only once. This
strategy lowers the cost of searching several files for various records.

Cluster file organization is employed when tables are joined with the same
condition on a regular basis. Only a few records from both tables are returned by
these joins.

Pros of Cluster File Organization


• When there are many requests for connecting tables with the same joining
condition, the cluster file organization is employed.

Cons of Cluster File Organization


• For a very large database, this approach performs poorly.
• If the joining condition changes, this method no longer works well; when
the joining condition is updated, traversing the file takes a long time.

Distributed Database
A distributed database is basically a database that is not limited to one system; it
is spread over different sites, i.e., on multiple computers or over a network of
computers. A distributed database system is located on various sites that don't
share physical components. This may be required when a particular database
needs to be accessed by various users globally. It needs to be managed such that,
to users, it looks like one single database.

Features of Distributed Databases


In general, distributed databases include the following features:

1. Location independence: Data is independently stored at multiple sites and
managed by independent distributed database management systems (DDBMSs).

2. Network linking: All distributed databases in a collection are linked by a
network and communicate with each other.

3. Distributed query processing: Distributed query processing is the procedure
of answering queries (which means mainly read operations on large data
sets) in a distributed environment.

o Query processing involves the transformation of a high-level
query (e.g., formulated in SQL) into a query execution
plan (consisting of lower-level query operators in some variation
of relational algebra) as well as the execution of this plan.

4. Hardware independence: The different sites where data is stored
are hardware-independent. There is no physical contact between these
distributed databases, which is often accomplished through virtualization.

5. Distributed transaction management: A distributed database provides
consistent distribution through commit protocols, distributed recovery
methods, and distributed concurrency control techniques in case of
transaction failures.

Types of Distributed Databases

There are two types of distributed databases:

• Homogeneous distributed database.

• Heterogeneous distributed database.

Homogeneous Distributed Database

• A homogeneous distributed database is a network of identical databases
stored on multiple sites. All databases store data identically; the operating
system, DDBMS, and the data structures used are all the same at all sites,
making them easy to manage.

Heterogeneous Distributed Database

• It is the opposite of a homogeneous distributed database. It uses different
schemas, operating systems, DDBMSs, and data models, making it difficult
to manage.

• In a heterogeneous distributed database, a particular site can be completely
unaware of the other sites. This limits cooperation in processing user
requests, which is why translations are required to establish communication
between sites.

A heterogeneous DDBMS has local users, while a homogeneous DDBMS does not.

Distributed Data Storage

There are two ways in which data can be stored at different sites. These are,

1. Replication.

2. Fragmentation.

Replication
• As the name suggests, the system stores copies of data at different sites. If
an entire database is available on multiple sites, it is a fully redundant
database.

• The advantage of data replication is that it increases the availability of data
at different sites. As the data is available at different sites, queries can be
processed in parallel.

• However, data replication has some disadvantages as well. Data needs to
be constantly updated and synchronized with the other sites; if any site
fails to achieve this, it leads to inconsistencies in the database. Replication
chiefly benefits the availability of data.

Fragmentation
In fragmentation, the relations are fragmented, which means they are split
into smaller parts. Each fragment is stored at a different site, where it is
required. The data is not replicated, and no copies are created. Fragmentation
chiefly benefits the consistency of data.

The prerequisite for fragmentation is to make sure that the fragments can
later be reconstructed into the original relation without losing any data.

Consistency is not a problem here, as each site holds a different piece of
information.

There are two types of fragmentation,

Horizontal Fragmentation – Splitting by rows.

Vertical fragmentation – Splitting by columns.
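Both kinds of fragmentation can be sketched on a toy employee relation (the column names, values, and sites are invented for the example):

```python
# The full relation, before fragmentation.
employees = [
    {"id": 1, "name": "Ali",  "city": "Lahore",  "salary": 50000},
    {"id": 2, "name": "Sara", "city": "Karachi", "salary": 60000},
]

# Horizontal fragmentation: each site keeps the rows relevant to it.
lahore_site  = [r for r in employees if r["city"] == "Lahore"]
karachi_site = [r for r in employees if r["city"] == "Karachi"]

# Vertical fragmentation: each site keeps a subset of columns; the key
# (id) is repeated in every fragment so the original rows can be
# reconstructed by joining the fragments on it.
hr_site  = [{"id": r["id"], "name": r["name"]} for r in employees]
pay_site = [{"id": r["id"], "salary": r["salary"]} for r in employees]

# Reconstruction check for the horizontal fragments: their union,
# ordered by key, gives back the original relation with no data lost.
rebuilt = sorted(lahore_site + karachi_site, key=lambda r: r["id"])
print(rebuilt == employees)  # True
```

Keeping the key in every vertical fragment is what satisfies the reconstruction prerequisite stated above.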

Advantages of Distributed Databases


1. Better reliability: Distributed databases offer better reliability than
centralized databases. When a database failure occurs in a centralized
database, the system comes to a complete stop. In a distributed database,
the system keeps functioning even when a failure occurs; only
performance-related issues arise, which are tolerable.

2. Modular development: The system can be expanded by adding new
computers and local data at a new site and connecting them to the
distributed system without interruption.

3. Lower communication cost: Storing data locally reduces communication
costs for data manipulation in distributed databases. In centralized
databases, local storage is not possible.

4. Better response time: As the data is distributed efficiently, distributed
databases provide a better response time when user queries are met
locally. In centralized databases, all queries have to pass through the
central machine, which increases response time.

Disadvantages of Distributed Databases

1. Costly software: Maintaining a distributed database is costly, because
ensuring data transparency and coordination across multiple sites requires
costly software.

2. Large overhead: Many operations on multiple sites require complex and
numerous calculations, causing a lot of processing overhead.

3. Improper data distribution: If data is not properly distributed across the
different sites, responsiveness to user requests suffers, which in turn
increases response time.

Emerging Research Trends in Database Systems


The database management system is an integral part of any enterprise software
application. It has been so for decades. Over time, newer technologies have
pushed the boundaries on what these systems can do. For example, retrieval of
hundreds of thousands of records in a fraction of a second was not possible
earlier.

Concepts such as indexing, along with hardware improvements in CPU and RAM,
have made it possible for database systems to perform at lightning speed. And
rather than slowing down, we are witnessing new DBMS strategies and the
development of more modern processes.

1. Cloud Database

Databases in the cloud are not a new concept; many organizations have adopted
them at some point in their application life cycle. However, the trend we see now
is the adoption of native cloud support for databases: databases that are built
with the cloud's advantages in mind. Cloud databases already power everyday
services such as your grocery store, bank, restaurant, online shopping sites,
hospital, favorite clothing store, and mobile service provider.

2. AI in Database Management Systems

One of the trends today is to leverage AI to automate small, independent
processes that improve database performance. Database AI allows organizations
to save both cost and time, which is the primary reason these tools are trending
today; common applications are fraud detection and predictive analysis.

3. Graph Database

NoSQL databases were designed to solve the challenges of unstructured data,
providing a framework to store data and retrieve it in the best possible manner.
A graph database provides features to store and relate unstructured data at
scale. It also provides interfaces to query information with the relationships in
mind. Netflix uses a graph database for its Digital Asset Management, because it
is a perfect way to track which movies (assets) each viewer has already watched,
and which movies they are allowed to watch (access management).

Transaction Processing
o A transaction is a set of logically related operations. It contains a group of
tasks.
o A transaction is an action or series of actions performed by a single user to
access the contents of the database.

Example: Suppose an employee of bank transfers Rs 800 from X's account to Y's
account. This small transaction contains several low-level tasks:

X's Account

1. Open_Account(X)
2. Old_Balance = X.balance
3. New_Balance = Old_Balance - 800
4. X.balance = New_Balance
5. Close_Account(X)

Y's Account

1. Open_Account(Y)
2. Old_Balance = Y.balance
3. New_Balance = Old_Balance + 800
4. Y.balance = New_Balance
5. Close_Account(Y)
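The low-level tasks above can be sketched as a single Python function over an in-memory account store (the opening balances are hypothetical):

```python
# Hypothetical opening balances for the two accounts.
accounts = {"X": 2000, "Y": 1000}

def transfer(src, dst, amount):
    old_balance = accounts[src]         # Old_Balance = X.balance
    new_balance = old_balance - amount  # New_Balance = Old_Balance - 800
    accounts[src] = new_balance         # X.balance = New_Balance
    old_balance = accounts[dst]         # Old_Balance = Y.balance
    new_balance = old_balance + amount  # New_Balance = Old_Balance + 800
    accounts[dst] = new_balance         # Y.balance = New_Balance

transfer("X", "Y", 800)
print(accounts)  # {'X': 1200, 'Y': 1800}
```

Note that a failure between the two halves of the function would leave money deducted from X but never added to Y, which is exactly the problem the commit/rollback discussion below addresses.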

Operations of Transaction:

Read(X): Read operation is used to read the value of X from the database and
stores it in a buffer in main memory.

Write(X): Write operation is used to write the value back to the database from
the buffer.

Let's take an example of a debit transaction on an account, which consists of the
following operations:

1. R(X);

2. X = X - 500;

3. W(X);

Let's assume the value of X before starting of the transaction is 4000.

o The first operation reads X's value from database and stores it in a buffer.

o The second operation will decrease the value of X by 500. So buffer will
contain 3500.

o The third operation will write the buffer's value to the database. So X's final
value will be 3500.

But because of a hardware, software, or power failure, the transaction may fail
before finishing all the operations in the set.

For example: if, in the above transaction, the debit fails after executing
operation 2, then X's value will remain 4000 in the database, which is not
acceptable to the bank.

To solve this problem, two important operations are available:

Commit: It is used to save the work done permanently.

Rollback: It is used to undo the work done.
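Both operations can be demonstrated with SQLite: the transfer below is interrupted partway, and rollback restores X's original balance (the schema and the simulated failure are illustrative):

```python
import sqlite3

# Commit and rollback sketched with sqlite3. If the transaction fails
# partway (simulated here), rollback undoes the uncommitted debit so X
# keeps the original 4000.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
db.execute("INSERT INTO account VALUES ('X', 4000)")
db.commit()  # save the starting state permanently

try:
    db.execute("UPDATE account SET balance = balance - 500 WHERE name = 'X'")
    raise RuntimeError("power failure")  # simulated mid-transaction crash
    db.commit()                          # never reached
except RuntimeError:
    db.rollback()                        # undo the uncommitted debit

balance = db.execute(
    "SELECT balance FROM account WHERE name = 'X'").fetchone()[0]
print(balance)  # 4000
```

Had the `commit()` been reached instead, the debit would have been saved permanently; rollback is only able to undo work that has not yet been committed.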

Transaction Properties
A transaction has four properties. These are used to maintain consistency in a
database, before and after the transaction.

1. Atomicity
• It states that all operations of the transaction take place at once; if not,
the transaction is aborted.
• There is no midway, i.e., the transaction cannot occur partially. Each
transaction is treated as one unit and either runs to completion or is not
executed at all.

Atomicity involves the following two operations:

Abort: If a transaction aborts, none of the changes it made are visible.

Commit: If a transaction commits, all of the changes it made are visible.

Example: Let's assume a transaction T consisting of T1 and T2. A holds Rs 600
and B holds Rs 300; T transfers Rs 100 from account A to account B.

T1           T2

Read(A)      Read(B)
A := A - 100 B := B + 100
Write(A)     Write(B)

After completion of the transaction, A holds Rs 500 and B holds Rs 400.

If transaction T fails after the completion of T1 but before the completion of T2,
the amount will be deducted from A but not added to B, leaving the database in
an inconsistent state. To ensure a correct database state, the transaction must be
executed in its entirety.

2. Consistency
o The integrity constraints are maintained so that the database is consistent
before and after the transaction.
o The execution of a transaction will leave a database in either its prior stable
state or a new stable state.
o The consistency property of a database states that every transaction sees a
consistent database instance.
o The transaction is used to transform the database from one consistent
state to another consistent state. For example: the total amount must be
maintained before and after the transaction.

1. Total before T occurs = 600+300=900


2. Total after T occurs= 500+400=900

Therefore, the database is consistent. In the case where T1 completes but T2
fails, an inconsistency will occur.

3. Isolation
o Data used during the execution of one transaction cannot be used by a
second transaction until the first one is completed.
o In isolation, if transaction T1 is being executed and is using the data item
X, then that data item can't be accessed by any other transaction T2 until
T1 ends.
o The concurrency control subsystem of the DBMS enforces the isolation
property.

4. Durability
o The durability property states that once a transaction completes, the
changes it made are permanent.
o These changes cannot be lost through the erroneous operation of a faulty
transaction or through system failure. When a transaction is completed,
the database reaches a state known as the consistent state, and that
consistent state cannot be lost, even in the event of a system failure.
o The recovery subsystem of the DBMS has the responsibility for the
durability property.

Transaction States
A transaction may go through a subset of five states, active, partially committed,
committed, failed and aborted.

• Active − The initial state where the transaction enters is the active state.
The transaction remains in this state while it is executing read, write or
other operations.
• Partially Committed − The transaction enters this state after the last
statement of the transaction has been executed.
• Committed − The transaction enters this state after successful completion
of the transaction, once the system checks have issued the commit signal.
• Failed − The transaction goes from the partially committed or active state
to the failed state when it is discovered that normal execution can no
longer proceed or a system check fails.
• Aborted − This is the state after the transaction has been rolled back after
failure and the database has been restored to its state that was before the
transaction began.

Schedule
A schedule is a series of operations from one or more transactions. A schedule
can be of two types:
Serial Schedule: When one transaction completely executes before starting
another transaction, the schedule is called a serial schedule. A serial schedule is
always consistent. A serial schedule has low throughput and less resource
utilization.
Concurrent Schedule: When operations of a transaction are interleaved with
operations of other transactions of a schedule, the schedule is called a
Concurrent schedule. But concurrency can lead to inconsistency in the
database.
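The inconsistency risk can be shown with a toy lost-update example: two transactions that each add 10 to X produce different results under a serial schedule and under one bad interleaving (the schedules and values are invented for the illustration):

```python
# Each schedule is a list of (transaction, operation) steps over one
# shared data item X, with per-transaction buffers for the read value.
def run(schedule):
    x = 100          # shared data item in the database
    local = {}       # each transaction's private buffer
    for txn, op in schedule:
        if op == "read":
            local[txn] = x          # R(X): copy X into the buffer
        else:
            x = local[txn] + 10     # W(X): write back buffer + 10
    return x

# Serial: T1 runs to completion before T2 starts.
serial = [("T1", "read"), ("T1", "write"), ("T2", "read"), ("T2", "write")]
# Concurrent: the reads of T1 and T2 are interleaved before either write.
interleaved = [("T1", "read"), ("T2", "read"), ("T1", "write"), ("T2", "write")]

print(run(serial))       # 120 -- both updates survive
print(run(interleaved))  # 110 -- T1's update is lost
```

The interleaved run loses T1's update because T2 read X before T1 wrote it back; this is the kind of anomaly the concurrency control protocols below are designed to prevent.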

Concurrency Control

When several transactions execute concurrently without any rules or protocols,
various problems arise that may harm the data integrity of the database. These
problems are known as concurrency control problems. Therefore, rules are
designed to maintain consistency among transactions while they execute
concurrently; these rules are known as concurrency control protocols.

A transaction is a single logical unit of work that can retrieve or change the data
of a database. Executing each transaction individually increases the waiting time
for the other transactions and delays overall execution. Hence, to increase
throughput and reduce waiting time, transactions are executed concurrently.

Example: Suppose 5 trains have to travel between two railway stations, A and B. If the trains are set in a row and only one train is allowed to move from station A to B while the others wait for the first train to reach its destination, it will take a long time for all the trains to complete the journey. To reduce the total time, the trains should be allowed to move concurrently from station A to B while ensuring there is no risk of collision between them.

When several transactions execute simultaneously, there is a risk of violating the integrity of the database. Concurrency control in a DBMS is the procedure of managing simultaneous transactions while ensuring their atomicity, isolation, consistency, and serializability.

Concurrent Execution in DBMS

• In a multi-user system, multiple users can access and use the same database at the same time, which is known as concurrent execution of the database. It means the same database is used simultaneously by different users on a multi-user system.

• While working with database transactions, multiple users may need to use the database to perform different operations, and in that case concurrent execution of the database takes place.

Concurrency Control Problems

The problems that arise when numerous transactions execute simultaneously in an uncontrolled manner are referred to as concurrency control problems.

1. Dirty Read Problem

The dirty read problem occurs when a transaction reads data that has been updated by another transaction that is still uncommitted. It arises when multiple uncommitted transactions execute simultaneously.

• Example: Consider two transactions A and B performing read/write operations on a data item DT in database DB. The current value of DT is 1000. The following table shows the read/write operations in transactions A and B.

Transaction A reads the value of DT as 1000 and modifies it to 1500, which is stored in a temporary buffer. Transaction B reads DT as 1500 and commits, so the value of DT is permanently changed to 1500 in database DB. Then a server error occurs in transaction A, which must roll back to its initial value of 1000; this is when the dirty read problem occurs.
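The scenario above can be simulated with a toy buffer model (a teaching sketch only; the classes and method names are invented, not a real DBMS API):

```python
class ToyDB:
    """Committed storage plus a buffer of uncommitted writes."""
    def __init__(self, value):
        self.committed = value
        self.buffer = {}            # txn name -> uncommitted value

    def read(self, txn):
        # Dirty read: any uncommitted write is visible to other txns.
        return next(iter(self.buffer.values()), self.committed)

    def write(self, txn, value):
        self.buffer[txn] = value    # held in the buffer until commit

    def commit(self, txn):
        self.committed = self.buffer.pop(txn)

    def rollback(self, txn):
        self.buffer.pop(txn, None)  # discard the uncommitted write

db = ToyDB(1000)
db.write("A", 1500)        # A updates DT but does not commit
dirty = db.read("B")       # B reads 1500 -- a dirty read
db.write("B", dirty)
db.commit("B")             # B commits the dirty value
db.rollback("A")           # A fails and rolls back
# DT is now permanently 1500 although A's update never committed.
```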

Unrepeatable Read Problem

The unrepeatable read problem occurs when a transaction reads two or more different values of the same data item in its successive read operations.

Example: Consider two transactions A and B performing read/write operations on a data item DT in database DB. The current value of DT is 1000. The following table shows the read/write operations in transactions A and B.

Transactions A and B both initially read the value of DT as 1000. Transaction A then modifies DT from 1000 to 1500; when transaction B reads the value again, it finds 1500. Transaction B thus sees two different values of DT in its two read operations.

Lost Update Problem

The lost update problem arises when an update to the data is overwritten by another update performed by a different transaction.

Example: Consider two transactions A and B performing read/write operations on a data item DT in database DB. The current value of DT is 1000. The following table shows the read/write operations in transactions A and B.

Transaction A initially reads the value of DT as 1000 and modifies it to 1500; transaction B then modifies the value to 1800. When transaction A reads DT again, it finds 1800, so the update made by transaction A has been lost.
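The same effect shows up with a plain interleaving of read-modify-write cycles (an illustrative sketch; the +500 and +800 deltas are invented for the example):

```python
DT = 1000            # shared data item, initial value

a_read = DT          # transaction A reads 1000
b_read = DT          # transaction B also reads 1000
DT = a_read + 500    # A writes 1500
DT = b_read + 800    # B writes 1800, blindly overwriting A's update
print(DT)            # 1800 -- A's +500 is lost (a serial run would give 2300)
```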

Advantages of Concurrency

In general, concurrency means that more than one transaction can run on a system at the same time. The advantages of a concurrent system are:

• Waiting Time: The time a process spends in the ready state waiting for the system to execute it. Concurrency leads to less waiting time.

• Response Time: The time taken to get the first response from the CPU. Concurrency leads to less response time.

• Resource Utilization: The fraction of a system's resources that are in use. Because multiple transactions can run in parallel, concurrency leads to higher resource utilization.

• Efficiency: The amount of output produced for a given input. Concurrency leads to higher efficiency.

Disadvantages of Concurrency

• Overhead: Implementing concurrency control requires additional overhead, such as acquiring and releasing locks on database objects. This overhead can lead to slower performance and increased resource consumption, particularly in systems with high levels of concurrency.

• Deadlocks: Deadlocks can occur when two or more transactions are waiting
for each other to release resources, causing a circular dependency that can
prevent any of the transactions from completing. Deadlocks can be difficult
to detect and resolve, and can result in reduced throughput and increased
latency.

• Reduced concurrency: Concurrency control can limit the number of users or applications that can access the database simultaneously. This can lead to reduced concurrency and slower performance in systems with high levels of contention.

• Complexity: Implementing concurrency control can be complex, particularly in distributed systems or in systems with complex transactional logic. This complexity can lead to increased development and maintenance costs.

• Inconsistency: In some cases, concurrency control can lead to inconsistencies in the database. For example, a transaction that is rolled back may leave the database in an inconsistent state, or a long-running transaction may cause other transactions to wait for extended periods, leading to data staleness and reduced accuracy.

Query Processing and Optimization

Query processing involves extracting data from a database through multiple steps. This includes translating high-level queries into low-level expressions at the physical level of the file system, optimizing queries, and executing them to obtain the actual results.

Steps
As shown in the image above, query processing can be divided into a compile-time phase and a run-time phase. The compile-time phase includes:

1. Parsing and Translation (Query Compilation)

2. Query Optimization

3. Evaluation (Code Generation)

In the run-time phase, the database engine is primarily responsible for interpreting and executing the generated query with physical operators and delivering the query output.

1) Parsing and Translation

The first step in query processing is parsing and translation. The query is broken down into tokens, and white space and comments are removed (lexical analysis). Next, the query is checked for correctness, both syntactically and semantically: the query processor first checks whether the rules of SQL have been followed (syntactic analysis), and then checks whether the meaning of the query is valid (semantic analysis) — for example, whether the table(s) mentioned in the query exist in the database and whether the column(s) referenced actually belong to those tables.

Query:

SELECT emp_name
FROM employee
WHERE salary > 10000;

The above query is divided into the following tokens: SELECT, emp_name, FROM, employee, WHERE, salary, >, 10000.

The tokens (and hence the query) are validated as follows:

• The name of the queried table is looked up in the data dictionary.

• The column names mentioned in the tokens (emp_name and salary) are validated for existence.

• The types of the values being compared must match (salary and the value 10000 should have the same data type).

The next step is to translate the generated set of tokens into a relational algebra query, which is easier for the optimizer to handle in the later stages:

∏ emp_name (σ salary > 10000 (employee))
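The lexical-analysis step above can be sketched as a toy tokenizer (an illustration only; real SQL lexers handle string literals, comments, and many more token kinds):

```python
import re

# A toy lexer for the query above: keywords, identifiers, numbers, operators.
TOKEN_RE = re.compile(r"\s*(?:([A-Za-z_]\w*)|(\d+)|(>=|<=|<>|[><=*,;()]))")
KEYWORDS = {"SELECT", "FROM", "WHERE"}

def tokenize(sql):
    sql = sql.rstrip()
    tokens, pos = [], 0
    while pos < len(sql):
        m = TOKEN_RE.match(sql, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at position {pos}")
        word, number, op = m.groups()
        if word is not None:
            kind = "KEYWORD" if word.upper() in KEYWORDS else "IDENT"
            tokens.append((kind, word))
        elif number is not None:
            tokens.append(("NUMBER", number))
        else:
            tokens.append(("OP", op))
        pos = m.end()
    return tokens

# Tokenizes the example query into (kind, text) pairs.
print(tokenize("SELECT emp_name FROM employee WHERE salary>10000;"))
```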

2) Query Evaluation
The next step is to apply certain rules and algorithms to generate other efficient data structures that help in constructing query evaluation plans. For example, if a relational graph is constructed, there may be multiple paths from source to destination, and a query evaluation plan is generated for each of the paths.

3) Query Optimization
Next, the DBMS picks the most efficient evaluation plan based on the cost of each plan, the aim being to minimize query evaluation time. The optimizer also considers the indexes present on the table and the columns being used, and determines the best order in which to execute subqueries so that only the best plan gets executed.

Simply put, for any query there are multiple evaluation plans, and choosing the one with the least cost is called query optimization. Some of the factors the optimizer weighs to calculate the cost of an evaluation plan are:

• CPU time

• Number of tuples to be scanned

• Disk access time

• Number of operations
Database System

Structured Query Language

SQL, or Structured Query Language, is a powerful database management and manipulation tool. It provides a standardized language for operations such as data retrieval and updating, letting users interact with databases effortlessly. SQL's ease of use and adaptability make it an essential tool for anybody working with databases.

SQL is the language used to interface with relational databases. It is the bonding agent that allows software applications to efficiently handle and retrieve data. In essence, SQL lets you interact with databases by writing queries that perform various activities such as retrieving, updating, and removing data.

Components of SQL

The components of SQL can be combined to create a powerful application.

• Data Definition Language (DDL):
The architect of the database. DDL commands include CREATE, ALTER, and DROP, shaping the structure of your data.

• Data Manipulation Language (DML):
Visualize DML as the construction crew; it handles operations like SELECT, INSERT, UPDATE, and DELETE, working with the content within your database.

• Data Control Language (DCL):
Security is the priority here. DCL commands like GRANT and REVOKE control access permissions, ensuring that only authorized users enter the gates.

• Transaction Control Language (TCL):
The directing body. TCL commands such as COMMIT and ROLLBACK manage the flow of transactions, ensuring data integrity.

Understanding these components enables you to use SQL with expertise, maximizing its potential to organize, query, and preserve your data.

SQL Commands

SQL commands are the foundation of database management, allowing users to easily interact with and alter data. SQL makes it easier to retrieve, update, and manage data in databases.

SELECT commands, which serve as SQL's eyes, are used to retrieve data. The INSERT, UPDATE, and DELETE commands serve as the hands, maintaining data entries. JOIN commands combine tables, forming associations that improve data retrieval.

Consider SQL to be a chef in a kitchen: SELECT is for choosing ingredients, INSERT is for adding new items to the menu, UPDATE is for revising recipes, and DELETE is for removing unwanted dishes. INDEX commands organize the kitchen for efficiency, while GROUP BY and ORDER BY organize the final presentation.

The SELECT Statement

SELECT column(s)
FROM table(s)
WHERE condition
GROUP BY field(s)
ORDER BY sort_order;
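A runnable sketch of this skeleton using Python's built-in sqlite3 module (the table and column names are invented for the example):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (emp_name TEXT, dept TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [("Ali", "IT", 12000), ("Sara", "IT", 9000), ("Bilal", "HR", 15000)],
)

# SELECT ... FROM ... WHERE ... ORDER BY, following the skeleton above.
rows = conn.execute(
    "SELECT emp_name FROM employee WHERE salary > 10000 ORDER BY emp_name"
).fetchall()
print(rows)   # [('Ali',), ('Bilal',)]
```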

Types of SQL Databases

There are many popular RDBMS available to work with. Some of the most popular
RDBMS are listed below −

• MySQL

• MS SQL Server

• ORACLE

• MS ACCESS

• PostgreSQL

• SQLite

SQL Join and Subquery

A join is a query that combines records from two or more tables. A join is performed whenever multiple tables appear in the FROM clause of a query, and the select list can include columns from any of those tables. If the join condition is omitted or invalid, a Cartesian product is formed. If any two of the tables have a column name in common, you must qualify those columns throughout the query with table names or aliases to avoid ambiguity. Most join queries contain at least one join condition, either in the FROM clause or in the WHERE clause.
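A minimal join in sqlite3 (the employee/department schema is invented for illustration); note how dept_id is qualified with aliases because the column name appears in both tables:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dept (dept_id INTEGER, dept_name TEXT)")
conn.execute("CREATE TABLE emp (emp_name TEXT, dept_id INTEGER)")
conn.execute("INSERT INTO dept VALUES (1, 'IT'), (2, 'HR')")
conn.execute("INSERT INTO emp VALUES ('Ali', 1), ('Sara', 2)")

# Join condition in the FROM clause, matching rows on dept_id.
rows = conn.execute(
    """SELECT e.emp_name, d.dept_name
       FROM emp e JOIN dept d ON e.dept_id = d.dept_id
       ORDER BY e.emp_name"""
).fetchall()
print(rows)   # [('Ali', 'IT'), ('Sara', 'HR')]
```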
Advantages of Joins:
• A join generally executes faster; the retrieval time of a query using joins will almost always be lower than that of an equivalent subquery.
• Joins minimize the calculation burden on the database: one join query replaces multiple queries, making better use of the database's abilities to search, filter, and sort.
Disadvantages of Joins:

• Joins are not as easy to read as subqueries.

• More joins in a query mean more work for the database server, which makes retrieving the data more time-consuming.

Subquery
A subquery (also called an inner query or nested query) is a query embedded within another SQL query, typically a SELECT statement placed inside a clause of the outer statement. Subqueries are useful for selecting rows from a table with a condition that depends on data in the same or another table: the subquery returns data that the main query uses as a condition to further restrict the rows retrieved. A subquery can be placed in the WHERE clause, the HAVING clause, or the FROM clause.
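A subquery in the WHERE clause, again with sqlite3 and an invented schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (emp_name TEXT, salary INTEGER)")
conn.executemany("INSERT INTO employee VALUES (?, ?)",
                 [("Ali", 12000), ("Sara", 9000), ("Bilal", 15000)])

# The inner SELECT computes the average salary; the outer query
# keeps only the employees earning more than that average.
rows = conn.execute(
    """SELECT emp_name FROM employee
       WHERE salary > (SELECT AVG(salary) FROM employee)
       ORDER BY emp_name"""
).fetchall()
print(rows)   # the average is 12000, so only Bilal qualifies: [('Bilal',)]
```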

Advantages Of Subquery:

• Subqueries divide a complex query into isolated parts, so that it can be broken down into a series of logical steps.

• They are easy to understand, and the code is easy to maintain.

• Subqueries allow you to use the results of another query in the outer query.

• In some cases, subqueries can replace complex joins and unions.

Disadvantages of Subquery:

• In MySQL, the optimizer is more mature for joins than for subqueries, so in many cases a statement that uses a subquery executes more efficiently if it is rewritten as a join.

• You cannot modify a table and select from the same table within a subquery in the same SQL statement.

How to Group and Aggregate Data Using SQL?

A database table can have many columns, so it can become difficult and time-consuming to find the same type of data across them. The GROUP BY statement groups the identical rows present in the columns of a table, and used in conjunction with SQL aggregate functions (COUNT(), MAX(), MIN(), SUM(), AVG(), etc.) it helps us analyze the data efficiently.

• COUNT counts how many rows are in a particular column.

• SUM adds together all the values in a particular column.

• MIN and MAX return the lowest and highest values in a particular column,
respectively.

• AVG calculates the average of a group of selected values.
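A GROUP BY with two of these aggregates in sqlite3 (invented schema again):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (emp_name TEXT, dept TEXT, salary INTEGER)")
conn.executemany("INSERT INTO employee VALUES (?, ?, ?)",
                 [("Ali", "IT", 12000), ("Sara", "IT", 9000),
                  ("Bilal", "HR", 15000)])

# One row per department: how many employees it has and its total salary.
rows = conn.execute(
    """SELECT dept, COUNT(*), SUM(salary)
       FROM employee GROUP BY dept ORDER BY dept"""
).fetchall()
print(rows)   # [('HR', 1, 15000), ('IT', 2, 21000)]
```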

Database Backup and Recovery

Backup and recovery are both data-protection terms. An efficient backup and recovery system is critical for any firm that wants to secure its precious data. However, backup and recovery are distinct techniques: a backup stores a copy of the complete database onto storage media, whereas recovery is the process of retrieving lost data from the backup storage medium.

Backup
Backup refers to the storage of a replication of the original data that may be
utilized in the event of data loss. Backup is considered one of the best data
security methods, and organizations should secure their important data by
utilizing the backup process. Backup can be accomplished by storing a backup
copy of the original data on storage devices or in a database.

The frequency of backup creation might vary depending on the importance of the
data. For example, if the data is particularly valuable, it must be backed up
daily. Monthly and quarterly backups are the same as daily backups but are only
performed on the last day of the month or quarter.

Nowadays, backups are often created in the cloud due to technological advancements, because it offers highly feasible storage and simple management. There are numerous backup types available, including full backup, incremental backup, local backup, mirror backup, and others.

Types of Data Backup

There are mainly three types of data backup: full, incremental, and differential.

1. Full Backup

A full backup is a simple procedure that copies all of your data to another media set, such as a tape, disk, or CD, giving you a complete copy of all your data on a single media set. It takes longer to complete and requires the most storage space.

2. Incremental Backup

An incremental backup copies only the data that has changed since the previous backup. Incremental backups take up less space and time than differential and full backups, but they are the most time-consuming technique when restoring a full system, and there is no way to predict how much space future backups will use.

3. Differential Backup

Differential backups combine aspects of full and incremental backups: like incremental backups, they store data about changes, but always relative to the last full backup. They make it simple to recover a whole copy of a database or server from a single file, which makes them ideal for restoring databases and servers swiftly without rebuilding everything.

These backups are very useful because they allow you to recover a database or server quickly: you do not have to generate a completely new version of it. Instead, you apply the most recent code changes, restore a differential backup, and have a working copy of the database or server.

Features of Backup

There are various features of Backup. Some of the backup features are as follows:

1. It is generally a data replica that is utilized to restore the actual data in the
case of data loss/damage.

2. It makes the process of data recovery simple and easy.

3. It is commonly utilized in production environments.

4. It provides data security to the users.

5. It is a cost-effective process to retrieve the data.

What is Recovery?

A database recovery system is an essential component of a database management system that assures data consistency even after a system failure. The process of restoring lost data is referred to as recovery: if the data was backed up, it can be recovered by employing various recovery procedures. When a database fails, there is a risk of data loss, so the recovery procedure improves the database's reliability. Moreover, if a transaction fails partway through its activities, data recovery becomes a critical task and the only way to save the lost data.

In this case, the failure could be any type, including system failure, concurrency
control enforcement, transaction errors, exception conditions, disk failure, and
disasters. Any event that results in downtime would require recovery. There are
various recovery processes, including Steal/no-steal and force/no-force policies,
shadowing, caching, before and after images of the data item, UNDO, REDO
recovery, etc.

Features of recovery

There are various features of recovery. Some of them are as follows:

1. It is a process for restoring lost, damaged, or corrupted data to its original state.
2. The process of recovering is expensive.
3. When there is a failure, it refers to recovering the lost data.
4. It increases the database's reliability.
5. It is rarely utilized in production environments.

Difference Between Backup and Recovery

There are various key differences between backup and recovery. Some of them are as follows:

1. A backup is a replication of data that is utilized to recover the original data in the event of a data loss. In contrast, recovery is the process of restoring inaccessible, damaged, lost, corrupted, or formatted data to its original state.
2. A backup is a data replica. On the other hand, recovery is the process of restoring the database.
3. Taking backups does not by itself determine how they will be used. On the other hand, recovery techniques are extremely beneficial, and there are various recovery options, including image-based backup, continuous replication, snapshots, etc.
4. Backups are used very commonly in production. In contrast, recovery in production is comparatively rare.
5. A backup necessitates additional storage space. On the other hand, recovery does not require additional external storage space because restoring is done internally.

Indexing in Databases
Indexing improves database performance by minimizing the number of disk accesses required to fulfill a query. It is a data-structure technique used to locate and quickly access data in a database. Indexes are generated from one or more database fields. The first column, the search key, holds a copy of the primary key or candidate key of the table; these values are usually kept in sorted order to speed up retrieval, although sorting the data itself is not required. The second column, the data reference or pointer, contains a set of pointers holding the address of the disk block where that particular key value can be found.

Attributes of Indexing
• Access Types: This refers to the type of access such as value-based search,
range access, etc.

• Access Time: It refers to the time needed to find a particular data element
or set of elements.

• Insertion Time: It refers to the time taken to find the appropriate space and
insert new data.

• Deletion Time: Time taken to find an item and delete it as well as update
the index structure.

• Space Overhead: It refers to the additional space required by the index.
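A small illustration with sqlite3: creating an index and checking, via EXPLAIN QUERY PLAN, that the engine uses it (the table and index names are invented, and the exact plan wording varies between SQLite versions):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (emp_name TEXT, salary INTEGER)")
conn.executemany("INSERT INTO employee VALUES (?, ?)",
                 [(f"emp{i}", i * 100) for i in range(1000)])

# Without an index this query scans the whole table; with the index
# the engine can search the sorted key values instead.
conn.execute("CREATE INDEX idx_salary ON employee(salary)")

plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT emp_name FROM employee WHERE salary = 50000"
).fetchall()
print(plan)   # the plan row mentions: SEARCH ... USING INDEX idx_salary ...
```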



NoSQL System
NoSQL is a type of database management system (DBMS) that is designed to
handle and store large volumes of unstructured and semi-structured data. Unlike
traditional relational databases that use tables with pre-defined schemas to store
data, NoSQL databases use flexible data models that can adapt to changes in data
structures and are capable of scaling horizontally to handle growing amounts of
data.

The term NoSQL originally referred to “non-SQL” or “non-relational” databases, but it has since evolved to mean “not only SQL,” as NoSQL databases have expanded to include a wide range of database architectures and data models.

NoSQL databases are generally classified into four main categories:

1. Document databases: These databases store data as semi-structured documents, such as JSON or XML, and can be queried using document-oriented query languages.

2. Key-value stores: These databases store data as key-value pairs and are optimized for simple and fast read/write operations.

3. Column-family stores: These databases store data as column families, which are sets of columns treated as a single entity. They are optimized for fast and efficient querying of large amounts of data.

4. Graph databases: These databases store data as nodes and edges and are designed to handle complex relationships between data.
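The first two categories can be illustrated with a toy in-memory store in Python (purely a teaching sketch; real systems such as MongoDB or Redis add persistence, indexing, and replication):

```python
import json

class ToyDocumentStore:
    """A key-value store whose values are schemaless JSON documents."""
    def __init__(self):
        self._data = {}                       # key -> JSON text

    def put(self, key, document):
        self._data[key] = json.dumps(document)

    def get(self, key):
        return json.loads(self._data[key])

    def find(self, **criteria):
        # Document-oriented query: match on any subset of fields.
        return [json.loads(v) for v in self._data.values()
                if all(json.loads(v).get(k) == val
                       for k, val in criteria.items())]

db = ToyDocumentStore()
db.put("u1", {"name": "Ali", "role": "dba"})
# A different shape for the second document: dynamic schema.
db.put("u2", {"name": "Sara", "role": "dev", "team": "web"})
print(db.find(role="dba"))   # [{'name': 'Ali', 'role': 'dba'}]
```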

NoSQL databases are often used in applications where there is a high volume of
data that needs to be processed and analyzed in real-time, such as social media
analytics, e-commerce, and gaming. They can also be used for other applications,
such as content management systems, document management, and customer
relationship management.

However, NoSQL databases may not be suitable for all applications, as they may
not provide the same level of data consistency and transactional guarantees as
traditional relational databases. It is important to carefully evaluate the specific
needs of an application when choosing a database management system.

Key Features of NoSQL:

1. Dynamic schema: NoSQL databases do not have a fixed schema and can accommodate changing data structures without the need for migrations or schema alterations.

2. Horizontal scalability: NoSQL databases are designed to scale out by adding more nodes to a database cluster, making them well suited for handling large amounts of data and high levels of traffic.

3. Document-based: Some NoSQL databases, such as MongoDB, use a document-based data model, where data is stored in a semi-structured format such as JSON or BSON.

4. Key-value-based: Other NoSQL databases, such as Redis, use a key-value data model, where data is stored as a collection of key-value pairs.

5. Column-based: Some NoSQL databases, such as Cassandra, use a column-based data model, where data is organized into columns instead of rows.

6. Distributed and highly available: NoSQL databases are often designed to be highly available and to automatically handle node failures and data replication across multiple nodes in a database cluster.

7. Flexibility: NoSQL databases allow developers to store and retrieve data in a flexible and dynamic manner, with support for multiple data types and changing data structures.

8. Performance: NoSQL databases are optimized for high performance and can handle a high volume of reads and writes, making them suitable for big data and real-time applications.
Shared Preferences
What are shared preferences in Android?

Android provides many ways of storing app data, such as an SQLite database, saving data to a text file, etc. One of these is SharedPreferences, which lets you save and retrieve data in the form of key-value pairs and provides simple methods to read and write them.

• One of the ways to store data in Android.
• It saves and retrieves data in the form of key-value pairs.

Methods of SharedPreferences:
• apply()
• clear()
• remove(String key)
• putString(String key, String value)
• putLong(String key, long value)
• putFloat(String key, float value)

Serializability

A schedule is serializable if it is equivalent to a serial schedule. A concurrent schedule must produce the same result as if the transactions had executed serially, one after another; that is, the sequence of actions such as read, write, abort, and commit must behave as if performed in a serial manner.

Example

Let us take two transactions T1 and T2.

If both transactions are performed without interfering with each other, the schedule is called a serial schedule, and it can be represented as follows −

Non-serial schedule − When the operations of transactions T1 and T2 are overlapped.

Types of serializability

There are two types of serializability −

View serializability

A schedule is view-serializable if it is view-equivalent to a serial schedule.

The rules it follows are as follows −

• If a transaction reads the initial value of a data item A in the serial schedule, it also reads the initial value of A in the given schedule.
• If a transaction reads a value of A written by another transaction in the serial schedule, it reads the value written by that same transaction in the given schedule.
• If a transaction performs the final write on a data item in the serial schedule, it also performs the final write in the given schedule.

Conflict serializability

A schedule is conflict-serializable if it orders all conflicting operations in the same way as some serial execution. A pair of operations is said to conflict if they belong to different transactions, operate on the same data item, and at least one of them is a write.

That means:

• Read_i(x), Read_j(x) − non-conflicting read-read operations
• Read_i(x), Write_j(x) − conflicting read-write operations
• Write_i(x), Read_j(x) − conflicting write-read operations
• Write_i(x), Write_j(x) − conflicting write-write operations
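Conflict serializability is commonly tested by building a precedence graph from these conflicting pairs and checking it for cycles. A compact sketch (the schedule encoding and helper names are our own):

```python
def is_conflict_serializable(schedule):
    """schedule: list of (txn, op, item) tuples, op in {'R', 'W'}."""
    txns = {t for t, _, _ in schedule}
    edges = {t: set() for t in txns}
    # Edge Ti -> Tj for every conflicting pair where Ti's operation comes first.
    for i, (ti, op_i, x_i) in enumerate(schedule):
        for tj, op_j, x_j in schedule[i + 1:]:
            if ti != tj and x_i == x_j and 'W' in (op_i, op_j):
                edges[ti].add(tj)
    # The schedule is conflict-serializable iff the graph is acyclic.
    visited, on_stack = set(), set()
    def has_cycle(node):
        visited.add(node); on_stack.add(node)
        for nxt in edges[node]:
            if nxt in on_stack or (nxt not in visited and has_cycle(nxt)):
                return True
        on_stack.discard(node)
        return False
    return not any(has_cycle(t) for t in txns if t not in visited)

# Interleaved reads and writes on x create a cycle T1 <-> T2.
bad = [("T1", "R", "x"), ("T2", "R", "x"), ("T1", "W", "x"), ("T2", "W", "x")]
# T1 finishes before T2 starts: equivalent to the serial order T1, T2.
ok = [("T1", "R", "x"), ("T1", "W", "x"), ("T2", "R", "x"), ("T2", "W", "x")]
print(is_conflict_serializable(bad), is_conflict_serializable(ok))  # False True
```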

Recoverability

The characteristics of non-serializable schedules are as follows −

• The transactions may or may not be consistent.
• The transactions may or may not be recoverable.

Irrecoverable schedules

If a transaction performs a dirty read from an uncommitted transaction and commits before the transaction from which it read, the schedule is called an irrecoverable schedule.

The above schedule is irrecoverable for the reasons mentioned below −

• Transaction T2 performs a dirty read of A.
• Transaction T2 commits before transaction T1 completes.
• Transaction T1 later fails and is rolled back.
• Transaction T2 has therefore read an incorrect value.
• Finally, transaction T2 cannot be recovered because it has already committed.

Recoverable Schedules

If a transaction that performs a dirty read from an uncommitted transaction delays its commit until the uncommitted transaction has either committed or rolled back, the schedule is called a recoverable schedule.

Example

Let us consider two transaction schedules as given below −

The above schedule is a recoverable schedule for the reasons mentioned below −

• Transaction T2 performs a dirty read of A.
• The commit operation of transaction T2 is delayed until transaction T1 commits or rolls back.
• Transaction T2 therefore commits only later.
• In the above schedule, transaction T2 is not allowed to commit while T1 is still uncommitted.
• If transaction T1 fails, transaction T2 still has a chance to recover by rolling back.

Concurrency Control Protocols

Concurrency control protocols ensure the atomicity, consistency, isolation, durability, and serializability of concurrently executing database transactions. These protocols are categorized as:

o Lock Based Concurrency Control Protocol

o Two phase locking

o Time Stamp Concurrency Control Protocol

Lock-Based Protocol

In this type of protocol, a transaction cannot read or write data until it acquires an appropriate lock on it. There are two types of lock:

1. Shared lock:

o Also known as a read-only lock: with a shared lock, the data item can only be read by the transaction.
o It can be shared among transactions, because a transaction holding a shared lock cannot update the data item.

2. Exclusive lock:

o With an exclusive lock, the data item can be both read and written by the transaction.
o The lock is exclusive: multiple transactions cannot modify the same data simultaneously.

Lock Compatibility Matrix:


Problems with the shared-exclusive locking protocol:

• It may not be sufficient to produce a serializable schedule.
• It is not necessarily free from irrecoverability.
• It is not necessarily free from deadlock.
• It is not necessarily free from starvation.

Two-phase locking (2PL)

o The two-phase locking protocol divides the execution of a transaction into three parts.

o In the first part, when the execution of the transaction starts, it seeks permission for the locks it requires.

o In the second part, the transaction acquires all the locks. The third part starts as soon as the transaction releases its first lock.

o In the third part, the transaction cannot demand any new locks; it only releases the locks it has acquired.

There are two phases of 2PL:

Growing phase: In the growing phase, a new lock on the data item may be
acquired by the transaction, but none can be released.

Shrinking phase: In the shrinking phase, existing locks held by the transaction may be released, but no new locks can be acquired.
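The two phases above can be sketched as a small transaction object that refuses new locks once the shrinking phase has begun (illustrative code; the class and exception names are my own):

```python
# Sketch: a transaction object that enforces two-phase locking -- once any lock
# is released (shrinking phase), no new lock may be acquired.

class TwoPhaseLockError(Exception):
    pass

class Transaction2PL:
    def __init__(self, name):
        self.name = name
        self.locks = set()
        self.shrinking = False   # flips to True at the first release

    def acquire(self, item):
        if self.shrinking:
            raise TwoPhaseLockError(f"{self.name}: cannot lock {item} in shrinking phase")
        self.locks.add(item)

    def release(self, item):
        self.shrinking = True    # the first release ends the growing phase
        self.locks.discard(item)

t = Transaction2PL("T1")
t.acquire("A")      # growing phase
t.acquire("B")      # lock point reached after the last acquire
t.release("A")      # shrinking phase begins
try:
    t.acquire("C")  # violates 2PL
except TwoPhaseLockError as e:
    print(e)
```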

Example (based on a numbered lock/unlock schedule for T1 and T2):

Transaction T1:

o Growing phase: from step 1-3

o Shrinking phase: from step 5-7

o Lock point: at 3

Transaction T2:

o Growing phase: from step 2-6

o Shrinking phase: from step 8-9

o Lock point: at 6

Timestamp Ordering Protocol

o The Timestamp Ordering Protocol is used to order transactions based on their timestamps. The order of the transactions is simply the ascending order of their creation times.

o The older transaction has the higher priority, which is why it executes first. To determine the timestamp of a transaction, this protocol uses the system time or a logical counter.

o A lock-based protocol manages the order between conflicting pairs of transactions at execution time, whereas timestamp-based protocols start working as soon as a transaction is created.

o Let's assume there are two transactions, T1 and T2. Suppose T1 entered the system at time 007 and T2 entered at time 009. T1 has the higher priority, so it executes first, as it entered the system first.

o The timestamp ordering protocol also maintains the timestamps of the last 'read' and 'write' operation on each data item.

Rule No. 01 is used when a transaction Ti wants to perform a Read(A) operation:

• If WTS(A) > TS (Ti), then Ti Rollback

• Else (otherwise) execute R(A) operation and SET RTS (A) = MAX {RTS(A),
TS(Ti)}

Rule No. 02 is used when a transaction Ti needs to perform a Write(A) operation:

• If RTS(A) > TS (Ti), then Ti Rollback

• If WTS(A) > TS (Ti), then Ti Rollback

• Else (otherwise) execute W(A) operation and SET WTS (A) = TS(Ti)

where "A" is a data item, TS(Ti) is the timestamp of transaction Ti, and RTS(A) and WTS(A) are the read and write timestamps of A.
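The two rules above can be written directly as code (an illustrative sketch; the function names and the dictionary representation of a data item are my own):

```python
# Sketch of the two timestamp-ordering rules. Each data item keeps RTS (read
# timestamp) and WTS (write timestamp), both 0 initially.

def ts_read(item, ts):
    """Rule No. 01: Ti reads the item. Returns False (rollback Ti) or True (read executed)."""
    if item['WTS'] > ts:
        return False                      # a younger transaction already wrote it: rollback
    item['RTS'] = max(item['RTS'], ts)    # record the youngest reader
    return True

def ts_write(item, ts):
    """Rule No. 02: Ti writes the item. Returns False (rollback Ti) or True (write executed)."""
    if item['RTS'] > ts or item['WTS'] > ts:
        return False                      # a younger transaction already read/wrote it
    item['WTS'] = ts
    return True

A = {'RTS': 0, 'WTS': 0}
print(ts_read(A, 100), A)   # True {'RTS': 100, 'WTS': 0}
print(ts_write(A, 50))      # False: the writer (ts=50) is older than the reader (ts=100)
```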

Example of Timestamp Ordering Protocol

Let's explain with an example. Assume three transactions with timestamps TS(T1) = 100, TS(T2) = 200, and TS(T3) = 300, operating on data items A, B, and C.

Solution:

The initial table, with all read and write timestamp values set to "0":

Data    RTS    WTS
A       0      0
B       0      0
C       0      0

The operations happen from time 1 to time 7. Let's discuss them one by one.

At time 1, transaction 1 wants to perform a read operation on data “A.” then,


according to Rule No 01,

• WTS(A) > TS(T1) = 0>100 // condition false

• Go to the else part and SET RTS(A) = MAX {RTS(A), TS(T1)}. So,

• RTS(A) = MAX{0,100} = 100.

• So, finally, RTS(A) is updated to 100.

The updated table: A(RTS=100, WTS=0), B(RTS=0, WTS=0), C(RTS=0, WTS=0).

At time 2, transaction 2 wants to perform a read operation on data “B.” then,


according to Rule No 01,

• WTS(B) > TS(T2) = 0>200 // condition false

• Go to else part and SET RTS (B) = MAX {RTS(B), TS(T2)} So,

• RTS (B) = MAX{0,200} = 200.

• So, finally RTS(B) is updated with 200

The updated table: A(RTS=100, WTS=0), B(RTS=200, WTS=0), C(RTS=0, WTS=0).

At time 3, transaction 1 wants to perform a write operation on data “C.” then,


according to Rule No 02,

• RTS(C) > TS(T1) = 0>100 // condition false

• Go to second condition, WTS(C) > TS(T1) = 0>100 // again condition false

• Go to the else part and SET WTS (C) = TS(T1) So,

• WTS (C) = TS(T1) = 100.

• So, finally WTS(C) is updated with 100

The updated table: A(RTS=100, WTS=0), B(RTS=200, WTS=0), C(RTS=0, WTS=100).

At time 4, transaction 3 wants to perform a read operation on data “B.” then,


according to Rule No 01,

• WTS(B) > TS(T3) = 0>300 // condition false

• Go to else part and SET RTS (B) = MAX {RTS(B), TS(T3)} So,

• RTS (B) = MAX{200,300} = 300.

• So, finally, RTS(B) replaced 200 and updated it with 300.

The updated table: A(RTS=100, WTS=0), B(RTS=300, WTS=0), C(RTS=0, WTS=100).



At time 5, transaction T1 wants to perform a read operation on data "C." Then, according to Rule No. 01,

• WTS(C) > TS(T1) = 100>100 // condition false

• Go to the else part and SET RTS(C) = MAX {RTS(C), TS(T1)}. So,

• RTS(C) = MAX{0,100} = 100.

• So, finally, RTS(C) is updated to 100.

The updated table: A(RTS=100, WTS=0), B(RTS=300, WTS=0), C(RTS=100, WTS=100).

At time 6, transaction 2 wants to perform a write operation on data “B.” then,


according to Rule No 02,

• RTS(B) > TS(T2) = 300>200 // condition True

According to Rule 2, if the condition is true, then Rollback T2.

When T2 is rolled back, it does not resume; it will restart with a new timestamp value. Keep in mind that T2 restarts only after the currently running transactions complete, so in this example T2 will restart after T3 completes.

This happens due to a conflict: the older transaction (T2) wants to perform a write operation on data "B," but a younger transaction (T3) has already read the same data "B."

The table will remain the same

At time 7, transaction 3 wants to perform a write operation on data “A” Then


according to Rule No 02,

• RTS(A) > TS(T3) = 100>300 // condition false

• Go to second condition, WTS(A) > TS(T3) = 100>300 // again condition false

• Go to the else part and SET WTS (A) = TS(T3) So,

• WTS (A) = 300.

• So, finally WTS(A) is updated with 300

The updated table: A(RTS=100, WTS=300), B(RTS=300, WTS=0), C(RTS=100, WTS=100).
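The whole worked example above can be replayed in code (an illustrative sketch, not from the notes; it applies the two rules at each time step and reports the final read/write timestamps):

```python
# Replay of the worked example: TS(T1)=100, TS(T2)=200, TS(T3)=300;
# all RTS/WTS start at 0. A 'rollback' result means the transaction aborts
# (its restart with a fresh timestamp is not modeled here).

items = {x: {'RTS': 0, 'WTS': 0} for x in 'ABC'}

def read(item, ts):
    if items[item]['WTS'] > ts:
        return 'rollback'
    items[item]['RTS'] = max(items[item]['RTS'], ts)
    return 'ok'

def write(item, ts):
    if items[item]['RTS'] > ts or items[item]['WTS'] > ts:
        return 'rollback'
    items[item]['WTS'] = ts
    return 'ok'

schedule = [  # (time, operation, data item, transaction timestamp)
    (1, read,  'A', 100), (2, read,  'B', 200), (3, write, 'C', 100),
    (4, read,  'B', 300), (5, read,  'C', 100), (6, write, 'B', 200),
    (7, write, 'A', 300),
]
for t, op, item, ts in schedule:
    print(f"time {t}: {op.__name__}({item}) by T(ts={ts}) -> {op(item, ts)}")
print(items)
# time 6 rolls back T2 (RTS(B)=300 > 200); the final state matches the table:
# A: RTS=100, WTS=300; B: RTS=300, WTS=0; C: RTS=100, WTS=100
```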



Recovery Techniques

A DBMS (Database Management System) is used to store, monitor, and manipulate data in a fast and efficient manner. A database has the properties of atomicity, consistency, isolation, and durability. The durability of a system is marked by its ability to preserve the data and the changes made to the data. A database may fail due to any of the following reasons:

• System failures are caused due to hardware or software problems


in the system.

• Transaction failures occur when a particular process that deals


with the modification of data can't be completed.

• Disk crashes may be due to the inability of the system to read the
disk.

• Physical damage includes problems like power failure or natural disasters.

Even though the database system fails, the data in the database must
be recoverable to the last state before the failure of the system.

The database recovery techniques in DBMS are used to recover the data at such times of system failure. These techniques maintain the atomicity and durability properties of the database. A system is not durable if it fails during a transaction and loses all its data, and a system is not atomic if the data is left in a partially updated state during a transaction. The recovery techniques in DBMS make sure that the state of the data is preserved to protect the atomicity property, and that the data is always recoverable to protect the

durability property. The following techniques are used to recover data


in a DBMS,

• Log-based recovery in DBMS.

• Recovery through Deferred Update

• Recovery through Immediate Update

The atomicity property of DBMS protects the state of the data. If a manipulation is performed on the data, it must be performed completely, or the state of the data must remain as if the manipulation never occurred. A DBMS failure during a transaction may violate this property, and the property is protected by the recovery techniques in DBMS.

• Deferred Update: This technique does not physically update the


database on disk until a transaction has reached its commit point.
Before reaching commit, all transaction updates are recorded in
the local transaction workspace. If a transaction fails before
reaching its commit point, it will not have changed the database
in any way so UNDO is not needed. It may be necessary to REDO
the effect of the operations that are recorded in the local
transaction workspace, because their effect may not yet have
been written in the database. Hence, a deferred update is also
known as the No-undo/redo algorithm.
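The deferred-update (NO-UNDO/REDO) recovery described above can be sketched as follows (illustrative code; the log record tuples and function name are my own assumptions about the format):

```python
# Sketch of deferred-update (NO-UNDO/REDO) recovery, assuming log records shaped
# ('start', T), ('write', T, item, new_value), ('commit', T).

def recover_deferred(db, log):
    committed = {rec[1] for rec in log if rec[0] == 'commit'}
    for rec in log:
        if rec[0] == 'write' and rec[1] in committed:
            _, txn, item, new_value = rec
            db[item] = new_value          # REDO committed writes only
    # uncommitted writes never reached the database, so no UNDO is needed
    return db

db = {'A': 100, 'B': 200, 'C': 300}
log = [('start', 'T1'), ('write', 'T1', 'A', 200), ('write', 'T1', 'B', 400),
       ('commit', 'T1'),
       ('start', 'T2'), ('write', 'T2', 'C', 500)]   # T2 never committed
print(recover_deferred(db, log))  # {'A': 200, 'B': 400, 'C': 300}
```

Note how T2's write to C is simply ignored: since the database was never physically updated before commit, there is nothing to undo.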

Example: Database contents before T1 runs:

A    100
B    200

T1              Log File (new value)
R(A)
A = A + 100
W(A)            <T1, A, 200>
R(B)
B = B + 200
W(B)            <T1, B, 400>
Commit          <T1, Commit>

Second Case (failure before commit): since there is no <T1, Commit> record in the log, T1's writes were never applied to the database, so recovery simply ignores T1; no UNDO is needed.

T1              Log File (new value)
R(A)
A = A + 100
W(A)            <T1, A, 200>
R(B)
B = B + 200
W(B)            <T1, B, 400>

Another Example:

<T1, Start>

<T1, A, 200>

<T1, B, 400>

<T1, Commit>

<T2, Start>

<T2, C, 500>

On recovery, T1 has a commit record, so its writes are redone; T2 has no commit record, so it is ignored (no undo is needed because its writes never reached the database).

• Immediate Update: In the immediate update, the database may


be updated by some operations of a transaction before the
transaction reaches its commit point. However, these operations
are recorded in a log on disk before they are applied to the
database, making recovery still possible. If a transaction fails to reach its commit point, the effect of its operations must be undone, i.e., the transaction must be rolled back; hence we require both undo and redo. This technique is known as the UNDO/REDO algorithm.
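The immediate-update (UNDO/REDO) recovery described above can be sketched as follows (illustrative code; the log record tuples and function name are my own assumptions about the format):

```python
# Sketch of immediate-update (UNDO/REDO) recovery, assuming log records shaped
# ('start', T), ('write', T, item, old_value, new_value), ('commit', T).

def recover_immediate(db, log):
    committed = {rec[1] for rec in log if rec[0] == 'commit'}
    # UNDO: scan backwards, restoring old values written by uncommitted transactions
    for rec in reversed(log):
        if rec[0] == 'write' and rec[1] not in committed:
            _, txn, item, old, new = rec
            db[item] = old
    # REDO: scan forwards, reapplying new values written by committed transactions
    for rec in log:
        if rec[0] == 'write' and rec[1] in committed:
            _, txn, item, old, new = rec
            db[item] = new
    return db

db = {'A': 2000, 'B': 6000, 'C': 800}   # state on disk at crash time
log = [('start', 'T1'), ('write', 'T1', 'A', 1000, 2000),
       ('write', 'T1', 'B', 5000, 6000), ('commit', 'T1'),
       ('start', 'T2'), ('write', 'T2', 'C', 700, 800)]  # T2 uncommitted
print(recover_immediate(db, log))  # {'A': 2000, 'B': 6000, 'C': 700}
```

Because the database may be updated before commit, the old values stored in the log are essential here: they are what allow the uncommitted write by T2 to be rolled back.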

Database contents before T1 runs:

A    100
B    200

T1              Log File (old, new values)
R(A)
A = A + 100
W(A)            <T1, A, 100, 200>
R(B)
B = B + 200
W(B)            <T1, B, 200, 400>
Commit          <T1, Commit>

Second Case (failure before commit): the log has no <T1, Commit> record, but some of T1's writes may already be in the database, so recovery must UNDO them by restoring the old values (100 and 200) recorded in the log.

T1              Log File (old, new values)
R(A)
A = A + 100
W(A)            <T1, A, 100, 200>
R(B)
B = B + 200
W(B)            <T1, B, 200, 400>

Another Example:

<T1, Start>

<T1, A, 1000, 2000>

<T1, B, 5000, 6000>

<T1, Commit>

<T2, Start>

<T2, C, 700, 800>

On recovery, T1 has a commit record, so its writes are redone; T2 is uncommitted, so its write is undone by restoring C to the old value 700.

Difference Between Deferred and Immediate Update

Deferred Update:

• Changes are not applied to the data immediately during the transaction.
• The log file holds the changes that are going to be applied (new values only).
• Buffering and caching are used in this technique.
• More time is required to recover the data when a system failure occurs during the transaction.
• If a rollback is made, the log records are discarded and no change is made to the database.

Immediate Update:

• Changes are made in the database as soon as they occur in the transaction.
• The log file holds the changes along with both the old and new values.
• Shadow paging is used in this technique.
• A large number of I/O operations are performed to manage the logs.
• If a rollback is made, the old state of the data is restored using the records in the log file.
