Storage System UNIT-III

Unit III
Highly available and Disaster-tolerant designs
IEEE defines high availability as “…the availability of resources in a computer system, in the
wake of component failures in the system.” The Disaster Recovery Journal defines
disaster recovery as “Resources and activities to re-establish information technology services
(including components such as infrastructure, telecommunications, systems, applications and
data) at an alternate site following a disruption of IT services.”
1) Ordered writes:-
2) Soft updates and Transactions :-
Soft updates: - Soft updates is an approach to maintaining file system metadata integrity in
the event of a crash or power outage. It works by tracking and enforcing
dependencies among updates to file system metadata. It is an alternative to the
more commonly used approach of journaling file systems.
In file systems, metadata (e.g., directories, inodes, and free block maps) gives structure to
raw storage capacity. Metadata consists of pointers and descriptions for linking multiple disk
sectors into files and identifying those files. To be useful for persistent storage, a file system
must maintain the integrity of its metadata in the face of unpredictable system crashes, such
as power interruptions and operating system failures. Because such crashes usually result in
the loss of all information in volatile main memory, the information in nonvolatile storage
(i.e., disk) must always be consistent enough to deterministically reconstruct a coherent file
system state. Specifically, the on-disk image of the file system must have no dangling
pointers to uninitialized space, no ambiguous resource ownership caused by multiple
pointers, and no live resources to which there are no pointers. Maintaining these invariants
generally requires sequencing (or atomic grouping) of updates to small on-disk metadata
objects.
The soft updates mechanism tracks dependencies among updates to cached (i.e., in-memory)
copies of metadata and enforces these dependencies, via update sequencing, as the dirty
metadata blocks are written back to nonvolatile storage. Because most metadata blocks
contain many pointers, cyclic dependencies occur frequently when dependencies are
recorded only at the block level. Therefore, soft updates tracks dependencies on a per-pointer
basis and allows blocks to be written in any order. Any still-dependent updates in a metadata
block are rolled back before the block is written and rolled forward afterward. Thus,
dependency cycles are eliminated as an issue.
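The roll-back and roll-forward step can be sketched in Python as follows. This is only an illustration under simplifying assumptions: the names `Pointer`, `Block` and `write_block` are hypothetical, and a dependency is treated as satisfied as soon as the named block has been written once, which is much coarser than what a real soft updates implementation tracks.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Pointer:
    """One pointer slot inside a cached (in-memory) metadata block."""
    new_value: int                    # value held by the in-memory copy
    old_value: int                    # last value known to be safe on disk
    depends_on: Optional[str] = None  # block that must reach disk first (assumption)


@dataclass
class Block:
    name: str
    pointers: Dict[str, Pointer] = field(default_factory=dict)


def write_block(block: Block, on_disk: Dict[str, Dict[str, int]]) -> None:
    """Write one block, rolling back pointers whose dependency is not on disk yet."""
    image = {}
    for slot, ptr in block.pointers.items():
        satisfied = ptr.depends_on is None or ptr.depends_on in on_disk
        # Roll back: the disk image gets the old value if the dependency is unmet.
        image[slot] = ptr.new_value if satisfied else ptr.old_value
    on_disk[block.name] = image
    # The cached copy is never modified, so the update is rolled forward
    # implicitly and is included the next time this block is written.


# A directory entry points to inode 7, which lives in the "inodes" block.
inode_block = Block("inodes", {"inode7": Pointer(new_value=1, old_value=0)})
dir_block = Block("dir", {"entry_a": Pointer(new_value=7, old_value=0, depends_on="inodes")})

disk: Dict[str, Dict[str, int]] = {}
write_block(dir_block, disk)      # inode block not on disk: entry is rolled back
first_write = disk["dir"]["entry_a"]
write_block(inode_block, disk)
write_block(dir_block, disk)      # dependency satisfied: entry now points to inode 7
second_write = disk["dir"]["entry_a"]
```

The directory block can be written at any time. Its on-disk image never points at an inode that has not reached disk, while the in-memory copy always holds the newest value.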
With soft updates, applications always see the most current copies of metadata blocks, and
the disk always sees copies that are consistent with its other contents. With soft updates, the
cost of maintaining integrity is low, and disk-based file system performance can be within a
few percent of a memory-based file system’s performance.
Definition - What does Transaction mean?

A transaction, in the context of a database, is a logical unit that is independently executed for
data retrieval or updates. In relational databases, database transactions must be atomic,
consistent, isolated and durable--summarized as the ACID acronym.
The ACID acronym defines the properties of a database transaction, as follows:
Atomicity: A transaction must either complete in full and be saved (committed) or be completely
undone (rolled back). A sale in a retail store database illustrates atomicity: the sale consists of
an inventory reduction and a record of incoming cash. Either both happen or neither does;
it's all or nothing.
Consistency: A transaction must take the database from one valid state to another. In other
words, the transaction cannot break the database’s constraints. For example, if a database
table’s Phone Number column can only contain numerals, then consistency dictates that any
transaction attempting to enter an alphabetical letter may not commit.
Isolation: Changes made by a transaction must not be visible to other transactions until the
original transaction is committed or rolled back.
Durability: Once a transaction is committed, its data changes must remain available, even in
the event of database failure.
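The retail sale example can be tried out with Python's built-in `sqlite3` module. This is a minimal sketch, not a production design: the table names `inventory` and `cash`, the `CHECK` constraints, and the helper `record_sale` are assumptions made for illustration.

```python
import sqlite3


def record_sale(conn, item, qty, amount):
    """Reduce inventory and record incoming cash as one all-or-nothing unit."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE inventory SET stock = stock - ? WHERE item = ?", (qty, item)
            )
            conn.execute(
                "INSERT INTO cash (item, amount) VALUES (?, ?)", (item, amount)
            )
        return True
    except sqlite3.IntegrityError:
        # A constraint was violated, so the whole transaction was rolled back.
        return False


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE inventory (item TEXT PRIMARY KEY, stock INTEGER CHECK (stock >= 0))"
)
conn.execute("CREATE TABLE cash (item TEXT, amount REAL CHECK (amount > 0))")
conn.execute("INSERT INTO inventory VALUES ('pen', 5)")
conn.commit()

ok = record_sale(conn, "pen", 2, 3.0)    # both statements succeed and commit
bad = record_sale(conn, "pen", 1, -1.0)  # cash insert violates CHECK: rolled back
stock = conn.execute("SELECT stock FROM inventory WHERE item = 'pen'").fetchone()[0]
cash_rows = conn.execute("SELECT COUNT(*) FROM cash").fetchone()[0]
```

In the second call the inventory update has already run when the cash insert fails, yet the stock stays at 3 because the whole transaction is undone. The failed insert also shows consistency: a transaction that would break a constraint may not commit.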
Two-phase commit protocol(2PC):-
Overview
The two-phase commit protocol is a distributed algorithm which lets all sites in a distributed
system agree to commit a transaction. The protocol results in either all nodes committing the
transaction or all nodes aborting it, even in the case of site failures and message losses. However,
according to the work of Skeen and Stonebraker, the protocol cannot handle more than one random
site failure at a time. The algorithm has two phases: the COMMIT-REQUEST
phase, where the COORDINATOR attempts to prepare all the COHORTS, and the
COMMIT phase, where the COORDINATOR completes the transactions at all COHORTS.
Basic Algorithm
During phase 1, the coordinator first sends a query-to-commit message to all cohorts. Then
it waits for all cohorts to report back with an agreement message. Each cohort, if the
transaction was successful, writes an entry to the undo log and an entry to the redo log. Then
the cohort replies with an agree message, or with an abort message if the transaction failed at
that cohort.
During phase 2, if the coordinator receives an agree message from all cohorts, it writes a
commit record into its log and sends a commit message to all the cohorts. If agreement
messages do not come back from all cohorts, the coordinator sends an abort message. Next the
coordinator waits for the acknowledgement from the cohorts. When acks are received from all
cohorts, the coordinator writes a complete record to its log. Note that the coordinator will wait
forever for all the acknowledgements to come back. If a cohort receives a commit message, it
releases all the locks and resources held during the transaction and sends an acknowledgement to the
coordinator. If the message is abort, the cohort undoes the transaction using the undo log
and releases the resources and locks held during the transaction. Then it sends an
acknowledgement.
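The message flow described above can be sketched as a small in-process simulation. This is a simplified illustration only: the class `Cohort` and the function `two_phase_commit` are hypothetical names, messages are modelled as direct method calls, and failures, timeouts and lock handling are not modelled.

```python
class Cohort:
    """A participant that votes in phase 1 and obeys the decision in phase 2."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit  # whether the local transaction succeeded
        self.log = []
        self.state = "init"

    def on_commit_request(self):
        # Phase 1: write undo and redo log entries, then vote.
        if self.can_commit:
            self.log += ["undo", "redo"]
            self.state = "agreed"
            return "agree"
        self.state = "aborted"
        return "abort"

    def on_decision(self, decision):
        # Phase 2: apply the coordinator's decision and acknowledge it.
        if decision == "commit":
            self.state = "committed"
        else:
            self.state = "aborted"  # a real cohort would replay its undo log here
        return "ack"


def two_phase_commit(cohorts):
    """Run both phases and return (decision, coordinator_log)."""
    coordinator_log = []
    votes = [c.on_commit_request() for c in cohorts]   # COMMIT-REQUEST phase
    decision = "commit" if all(v == "agree" for v in votes) else "abort"
    if decision == "commit":
        coordinator_log.append("commit")
    acks = [c.on_decision(decision) for c in cohorts]  # COMMIT phase
    if all(a == "ack" for a in acks):
        coordinator_log.append("complete")
    return decision, coordinator_log


good = [Cohort("a"), Cohort("b")]
mixed = [Cohort("a"), Cohort("b", can_commit=False)]
good_result = two_phase_commit(good)
mixed_result = two_phase_commit(mixed)
```

A single abort vote is enough to abort the transaction at every cohort, which is the all-or-nothing outcome the protocol is designed to give.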
Disadvantages
The greatest disadvantage of the two-phase commit protocol is the fact that it is a
blocking protocol. A node will block while it is waiting for a message. This means that
other processes competing for resource locks held by the blocked processes will have to
wait for the locks to be released. A single node will continue to wait even if all other sites
have failed. If the coordinator fails permanently, some cohorts will never resolve their
transactions. As a result, resources are tied up forever.
Another disadvantage is that the protocol is conservative. It is biased toward the abort case
rather than the complete case.
Three-phase commit protocol(3PC):-

In computer networking and databases, the three-phase commit protocol (3PC)[1] is a distributed
algorithm which lets all nodes in a distributed system agree to commit a transaction. Unlike the
two-phase commit protocol (2PC), however, 3PC is non-blocking. Specifically, 3PC places an
upper bound on the amount of time required before a transaction either commits or aborts. This
property ensures that if a given transaction is attempting to commit via 3PC and holds some
resource locks, it will release the locks after the timeout.
In describing the protocol, we use terminology similar to that used in the two-phase commit
protocol. Thus we have a single coordinator site leading the transaction and a set of one or more
cohorts being directed by the coordinator.
Coordinator
i) The coordinator receives a transaction request. If there is a failure at this point, the coordinator
aborts the transaction (i.e. upon recovery, it will consider the transaction aborted). Otherwise, the
coordinator sends a canCommit? message to the cohorts and moves to the waiting state.
ii) If there is a failure, timeout, or if the coordinator receives a No message in the waiting state,
the coordinator aborts the transaction and sends an abort message to all cohorts. Otherwise the
coordinator will receive Yes messages from all cohorts within the time window, so it sends
preCommit messages to all cohorts and moves to the prepared state.
iii) If the coordinator succeeds in the prepared state, it will move to the commit state. However, if
the coordinator times out while waiting for an acknowledgement from a cohort, it will abort the
transaction. In the case where an acknowledgement is received from the majority of cohorts, the
coordinator moves to the commit state as well.
Cohort
The cohort receives a canCommit? message from the coordinator. If the cohort agrees it sends a
Yes message to the coordinator and moves to the prepared state. Otherwise it sends a No message
and aborts. If there is a failure, it moves to the abort state.
In the prepared state, if the cohort receives an abort message from the coordinator, fails, or times
out waiting for a commit, it aborts. If the cohort receives a preCommit message, it sends an ACK
message back and awaits a final commit or abort.
If, after a cohort member receives a preCommit message, the coordinator fails or times out, the
cohort member goes forward with the commit.
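The cohort's behaviour can be written as a small state machine. This is a sketch under stated assumptions: the state label `precommitted`, the event names (`canCommit_yes`, `canCommit_no`, `doCommit`, `timeout`, `failure`) and the function `run_cohort` are hypothetical labels chosen for illustration; only the transitions themselves follow the description above.

```python
# (current state, event) -> (next state, reply sent to the coordinator or None)
TRANSITIONS = {
    ("initial", "canCommit_yes"): ("prepared", "Yes"),
    ("initial", "canCommit_no"): ("aborted", "No"),
    ("initial", "failure"): ("aborted", None),
    ("prepared", "abort"): ("aborted", None),
    ("prepared", "failure"): ("aborted", None),
    ("prepared", "timeout"): ("aborted", None),
    ("prepared", "preCommit"): ("precommitted", "ACK"),
    ("precommitted", "doCommit"): ("committed", None),
    ("precommitted", "abort"): ("aborted", None),
    # After preCommit, a coordinator failure or timeout leads to commit.
    ("precommitted", "timeout"): ("committed", None),
}


def run_cohort(events):
    """Feed a sequence of events to one cohort; return (final state, replies)."""
    state, replies = "initial", []
    for event in events:
        # Events with no listed transition leave the state unchanged.
        state, reply = TRANSITIONS.get((state, event), (state, None))
        if reply is not None:
            replies.append(reply)
    return state, replies


timeout_before_precommit = run_cohort(["canCommit_yes", "timeout"])
timeout_after_precommit = run_cohort(["canCommit_yes", "preCommit", "timeout"])
```

The two runs show the key asymmetry: a timeout before preCommit leads to abort, while a timeout after preCommit leads to commit, so the cohort never holds its locks indefinitely.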
Disadvantages
i) The main disadvantage of this algorithm is that it cannot recover in the event the network is
segmented in any manner. The original 3PC algorithm assumes a fail-stop model, where
processes fail by crashing and crashes can be accurately detected, and does not work with
network partitions or asynchronous communication. Keidar and Dolev's E3PC algorithm
eliminates this disadvantage.
ii) The protocol requires at least three round trips to complete, so each transaction takes a
minimum of three round trip times (RTTs). This is potentially a long latency for each transaction.
Paxos commit protocols:-

Paxos – How it Works
The basic steps in Paxos are very similar to 2PC:
i) Elect a node to be a Leader / Proposer.
ii) The Leader selects a value and sends it to all nodes (called Acceptors in Paxos) in an
accept-request message. Acceptors can reply with reject or accept.
iii) Once a majority of the nodes have accepted, consensus (a general agreement) is reached and
the Leader broadcasts a commit message to all nodes.
Phase 1a: Prepare:
A Proposer (the leader) creates a proposal identified with a number N. This number must be
greater than any previous proposal number used by this Proposer. Then, it sends a Prepare
message containing this proposal to a Quorum of Acceptors. The Proposer decides who is in
the Quorum.
Phase 1b: Promise:-

If the proposal's number N is higher than any previous proposal number received from any
Proposer by the Acceptor, then the Acceptor must return a promise to ignore all future
proposals having a number less than N. If the Acceptor accepted a proposal at some point in
the past, it must include the previous proposal number and previous value in its response to the
Proposer.
Otherwise, the Acceptor can ignore the received proposal. It does not have to answer in this
case for Paxos to work. However, for the sake of optimization, sending a denial (Nack)
response would tell the Proposer that it can stop its attempt to create consensus with proposal
N.
Phase 2a: Accept Request
If a Proposer receives enough promises from a Quorum of Acceptors, it needs to set a value for
its proposal. If any Acceptors had previously accepted any proposal, then they will have sent their
values to the Proposer, which must now set the value of its proposal to the value associated with
the highest proposal number reported by the Acceptors. If none of the Acceptors had accepted
a proposal up to this point, then the Proposer may choose the value it originally wanted to
propose.[15]
The Proposer sends an Accept Request message to a Quorum of Acceptors with the chosen
value for its proposal.
Phase 2b: Accepted
If an Acceptor receives an Accept Request message for a proposal N, it must accept it if and
only if it has not already promised to only consider proposals having an identifier greater than
N. In this case, it should register the corresponding value v and send an Accepted message to
the Proposer and every Learner. Else, it can ignore the Accept Request.
Note that an Acceptor can accept multiple proposals. These proposals may even have different
values in the presence of certain failures. However, the Paxos protocol will guarantee that the
Acceptors will ultimately agree on a single value.
Rounds fail when multiple Proposers send conflicting Prepare messages, or when the Proposer
does not receive a Quorum of responses (Promise or Accepted). In these cases, another round
must be started with a higher proposal number.
Notice that when Acceptors accept a request, they also acknowledge the leadership of the
Proposer. Hence, Paxos can be used to select a leader in a cluster of nodes.
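The four phases can be sketched as a single-value, in-process simulation. This is an illustration under stated assumptions: the class `Acceptor` and the function `propose` are hypothetical names, a Quorum is taken to be a simple majority, messages are direct method calls sent to every Acceptor, and Learners, message loss and retries are not modelled.

```python
class Acceptor:
    def __init__(self):
        self.promised = 0           # highest proposal number promised so far
        self.accepted_n = None      # number of the last accepted proposal
        self.accepted_value = None  # value of the last accepted proposal

    def prepare(self, n):
        """Phase 1b: promise to ignore proposals numbered below n."""
        if n > self.promised:
            self.promised = n
            return ("promise", self.accepted_n, self.accepted_value)
        return ("nack", None, None)  # optional denial, as an optimization

    def accept(self, n, value):
        """Phase 2b: accept unless a higher-numbered promise was made."""
        if n >= self.promised:
            self.promised = n
            self.accepted_n, self.accepted_value = n, value
            return "accepted"
        return "nack"


def propose(acceptors, n, value):
    """Phases 1a and 2a for one Proposer; returns the chosen value or None."""
    quorum = len(acceptors) // 2 + 1  # assumption: Quorum = simple majority
    promises = [r for r in (a.prepare(n) for a in acceptors) if r[0] == "promise"]
    if len(promises) < quorum:
        return None  # round failed; retry with a higher proposal number
    # Adopt the value of the highest-numbered proposal already accepted, if any.
    prior = [(pn, pv) for _, pn, pv in promises if pn is not None]
    if prior:
        value = max(prior, key=lambda p: p[0])[1]
    accepted = sum(a.accept(n, value) == "accepted" for a in acceptors)
    return value if accepted >= quorum else None


acceptors = [Acceptor() for _ in range(3)]
first = propose(acceptors, 1, "A")   # nothing accepted before: "A" is chosen
second = propose(acceptors, 2, "B")  # must adopt the previously accepted "A"
stale = propose(acceptors, 1, "C")   # number too low: every Acceptor refuses
```

The second proposal carries a higher number but still ends up with the value "A", which shows how the protocol keeps the Acceptors agreed on a single value once one has been accepted by a majority.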
Here is a graphic representation of the Basic Paxos protocol. Note that the values returned in
the Promise message are null the first time a proposal is made, since no Acceptor has accepted
a value before in this round.
Impossibility Results from Distributed Systems
In a fully asynchronous message-passing distributed system in which even one process may have a
halting failure, it has been proved that no deterministic algorithm can guarantee consensus.
However, this impossibility result derives from a worst-case process schedule, which is highly
unlikely in practice.
What does “impossibility” mean?
1) In formal proofs, an algorithm is totally correct if
- It computes the right thing.
- And it always terminates.
2) When we say something is possible, we mean “there is a totally correct algorithm” solving
the problem. Conversely, “impossible” means that no totally correct algorithm exists.
(Use the following paper for better understanding.)
Choose 2 of 3: Availability, Consistency and Partition Tolerance.

(Please refer to the following NPTEL Lect-30 PDF (page no. 10 onwards) for a better understanding of
“Choose 2 of 3: Availability, Consistency and Partition Tolerance.”)
