Log-Based Recovery Schemes

Log-Based Recovery Schemes
If you are going to be in the logging

business, one of the things that you
have to do is to learn about heavy
equipment.
Robert VanNatta,
Logging History of
Columbia County
CS3223 – Crash Recovery 1
Integrity or consistency constraints
• Predicates/constraints data must satisfy, e.g.
• x is key of relation R
• x  y holds in R
• Domain(x) = {Red, Blue, Green}
• no employee should make more than twice the average
salary
• Definitions
• Consistent state: satisfies all constraints
• Consistent DB: DB in consistent state

Observation:
DB cannot always be consistent!
Example: Transfer 100 from a2 to a10
a2  a2 - 100
a10  a10 + 100
. . .
. . .
a2 500 400 400
. . .
. . .
a10 1000 1000 1100

Transaction: collection of actions that
preserve consistency
Consistent DB T Consistent DB’
If T starts with a consistent state + T executes in

isolation (and absence of errors)
 T leaves a consistent state

Reasons for failures
• Transaction failures
• Logical errors, deadlocks
• System crash
• Power failures, operating system bugs etc
• Memory data lost
• Disk failure
• Disk Read-Write Head crashes
STABLE STORAGE: Data is never lost. Can approximate by using RAID
and maintaining geographically distant copies of the data
but if earthquake everything will still be lost
Key problem: Unfinished transaction
Example Transfer fund from A to B

T1: A  A - 100
B  B + 100

T1: Read (A);
A  A-100
Write (A);
Read (B);
B  B+100
Write (B);
A: 800
A: 800
B: 800
B: 800
memory disk
T1: Read (A);
A  A-100
Write (A);
Read (B);
B  B+100
Write (B);
A: 800 700
A: 800
B: 800
B: 800
memory disk
T1: Read (A);
A  A-100
Write (A);
Read (B);
B  B+100
Updated A value is written to disk.
Write (B); This may be triggered ANYTIME
by explicit command or DBMS or OS
A: 800 700
A: 800 700
B: 800
B: 800
memory disk
T1: Read (A);
A  A-100
Write (A);
Read (B);
B  B+100
Write (B);
A: 800 700
A: 800 700
B: 800 900
B: 800
memory disk
T1: Read (A);
A  A-100
Write (A);
Read (B);
B  B+100
Failure before commit
Write (B);
(memory content lost
before disk updated)!
A: 800 700
A: 800 700
B: 800 900
B: 800
memory disk
What is the disk content before
the crash?
Disk not updated yet
A = 700; B = 800?
A = 800; B = 700?
Disk fully updated
A: 700 A = 800; B = 800?
Disk partially updated
B: 800
disk
Need atomicity: execute all
actions of a transaction or
none at all
Recovery Manager
• Recovery Manager guarantees atomicity and durability
properties of Xacts
• Undo: remove effects of aborted Xact to preserve atomicity
• Redo: re-installing effects of committed Xact for durability
• Processes three operations:
• Commit(T) - install T’s updated “pages” into disk
• Abort(T) - restore all data that T updated to their prior values
• Restart - recover database to a consistent state from system failure
• abort all active Xacts at the time of system failure
• installs updates of all committed Xacts
need to remember old values
• Desirable properties:
• Add little overhead to the normal processing of Xacts
• Recover quickly from a failure

Interaction Between Recovery and Buffer
Managers: Dirty pages in buffer pool
• Can a dirty page updated by Xact T be written to disk
before T commits?
•yesYes policy
-> steal need to 
steal->policy need old
remember to remember
value old value
no -> no steal policy
• No  no steal policy
• Must all dirty pages that are updated by Xact T be
written to disk when T commits?
• Yes
yes policy
-> force force policy
no -> no force policy -> need to remember the new value
• No  no force policy  need to remember the new value

Recovery schemes: Design options
• Four possible design options

Force No-force
Steal Undo & no redo Undo & redo
No steal No undo & no redo No undo & redo
No-steal policy  No undo

Force policy  No redo
Which is the best solution??

Log-based Recovery
• Log (aka trail/journal): history of actions executed by
DBMS remembering old and new values
• Contains a log record for each write, commit, & abort
• Each log record has a unique identifier called Log
Sequence Number (LSN)
• Log is stored as a sequential file of records in stable
storage log should never be lost
• LSN of log record = address of log record

UNDO AND NO REDO
One Solution: Undo logging
(Immediate modification/Steal-Force)
T1: Read (A); A  A-100
Write (A); Undo log: <TID, Object, oldValue>
Read (B); B  B+100 (not showing LSN)
Write (B);
<T1, start>
A:800 A:800
B:800 B:800
memory disk Log (Stable)

(Immediate modification)
T1: Read (A); A  A-100
Read (B); B  B+100
Write (B);
<T1, start>
A:800 700 A:800

B:800 900 B:800
memory disk log

T1: Read (A); A  A-100
Read (B); B  B+100
Write (B);
<T1, start>
<T1, A, 800>
A:800 700 A:800
B:800 900 B:800
memory disk log

T1: Read (A); A  A-100
Write (A);
Read (B); B  B+100
Write (B);
<T1, start>
<T1, A, 800>
A:800 700 A:800 700
B:800 900 B:800
memory disk log

T1: Read (A); A  A-100
Write (A);
Read (B); B  B+100
Write (B);
<T1, start>
<T1, A, 800>
A:800 700 A:800 700 <T1, B, 800>
B:800 900 B:800
memory disk log

T1: Read (A); A  A-100
Write (A);
Read (B); B  B+100
Write (B);
<T1, start>
<T1, A, 800>
A:800 700 A:800 700 <T1, B, 800>
B:800 900 B:800 900
memory disk log

T1: Read (A); A  A-100 yes forcing -> upon commit, all
updates written to disk
Write (A);
Read (B); B  B+100
Write (B);
<T1, start>
<T1, A, 800>
A:800 700 A:800 700 <T1, B, 800>
B:800 900 B:800 900 <T1, commit>
memory disk log

inefficient, alot of random writes to disk.
Complications
• Log is first written in memory
memory A: 800
B: 800 DB
A: 800 700
B: 800 900
Log: Log
<T1,start>
<T1, A, 800>
<T1, B, 800>

Complications
memory A: 800 700

B: 800 DB BAD STATE
A: 800 700
B: 800 900 #1
Log: Log Failure occurs after
<T1,start> partial updates on
<T1, A, 800> disk but before log
<T1, B, 800> is written to disk

Complications
memory A: 800 700

B: 800 DB BAD STATE
A: 800 700
B: 800 900 #1
Log: Log Failure occurs after
<T1,start> partial updates on
<T1, A, 800> disk but before log
<T1, B, 800> is written to disk
This means log record for A must be on log disk

before A can be updated on data disk (DB)
Complications
• Updates are not written to disk on every action
memory A: 800
B: 800 DB
A: 800 700
B: 800 900
Log: Log
<T1,start> <T1, B, 800>
<T1, A, 800> <T1, commit>
<T1, B, 800>
<T1, commit>

Complications
memory A: 800 700
B: 800 DB BAD STATE
A: 800 700
B: 800 900 #2
Log: Log All logs are on
disk (including
<T1,start> <T1, B, 800> commit log) but
<T1, A, 800> <T1, commit> only partial updates
<T1, B, 800> on disk.

Complications
memory A: 800 700
B: 800 DB BAD STATE
A: 800 700
B: 800 900 #2
Log: Log All logs are on
disk (including
<T1,start> <T1, B, 800> commit log) but
<T1, A, 800> <T1, commit> only partial updates
<T1, B, 800> on disk.
Before COMMIT log is written to Log, all updates
must be on disk (DB)
CS3223 – Crash Recovery force policy 29
Undo logging rules
(1) For every action generate undo log record
(containing old value)
(2) Before x is modified on disk, log record
pertaining to x must be on disk (write ahead
logging: WAL) write the log ahead of the actual modification to rem old val
(3) Before commit is flushed to log, all writes of
transaction must be reflected on disk

Undo Logging
T1: Read (A); A  A-100
Write (A);
Read (B); B  B+100
Write (B);
A: 800 700
B: 800 900
Log: A: 800
<T1,start> B: 800
<T1, A, 800>
<T1, B, 800>
log
Undo Logging
T1: Read (A); A  A-100
Write (A);
Read (B); B  B+100
Write (B);
A: 800 700
B: 800 900 <T1, Start>
Log: A: 800 <T1, A, 800>
<T1,start> B: 800 <T1, B, 800>
<T1, A, 800>
<T1, B, 800>
log
Undo Logging
T1: Read (A); A  A-100
Write (A);
Read (B); B  B+100
Write (B);
A: 800 700
B: 800 900 <T1, Start>
Log: A: 800 700 <T1, A, 800>
<T1,start> B: 800 <T1, B, 800>
<T1, A, 800>
<T1, B, 800>
log
Undo Logging
T1: Read (A); A  A-100
Write (A);
Read (B); B  B+100
Write (B);
A: 800 700
B: 800 900 <T1, Start>
Log: A: 800 700 <T1, A, 800>
<T1,start> B: 800 900 <T1, B, 800>
<T1, A, 800>
<T1, B, 800>
log
Undo Logging
T1: Read (A); A  A-100
Write (A);
Read (B); B  B+100
Write (B);
A: 800 700
B: 800 900 <T1, Start>
Log: A: 800 700 <T1, A, 800>
<T1,start> B: 800 900 <T1, B, 800>
<T1, A, 800> <T1, Commit>
<T1, B, 800>
<T1, commit> log
Recovery rules: Undo logging
(1)Let S = set of transactions with <Ti, start> in log, but
no <Ti, commit> (or <Ti, abort>) record in log
• What about those with Commit/Abort? commited transactions
dont care since updates
alr on disk
(2) For each <Ti, X, v> in log,
aborted txs
in reverse order (latest  earliest) do: have already
been undone
- if Ti  S then - X  v
- Update disk
(3) For each Ti  S do
- write <Ti, abort> to log
What if failure during recovery?
No problem! Undo is idempotent

Redo logging (deferred
modification/no-steal-no-force)
• In UNDO logging, we remember only the “old” value
• How about remembering only the “new” (updated)
values instead?
• Log record of the form <TID, object, newValue>
• What does this mean?
• NO old values, so NO updates must be written to disk
until a transaction commits!
• All updates have to be buffered in memory!
• Can also store dirty pages in temporary disk storage but very
inefficient
Redo logging (deferred modification)
T1: Read(A); A  A-100; write (A);
Read(B); B  B+100; write (B);
A: 800 A: 800
B: 800 B: 800
memory DB LOG
Redo log: <TID, Object, newValue>
A: 800 700 A: 800

B: 800900 B: 800
memory DB LOG

<T1, start>
<T1, A, 700>
A: 800 700 A: 800 <T1, B, 900>
B: 800900 B: 800 <T1, commit>
memory DB LOG

<T1, start>
<T1, A, 700>
A: 800 700 output A: 800 700 <T1, B, 900>
B: 800900 B: 800 <T1, commit>
memory DB LOG

Redo logging rules
(1) For every action, generate redo log record
(containing new value)
(2) Before X is modified on disk (DB), ALL
log records for transaction that modified X
(including commit) must be on disk
no urgency to ensure that data dirty page has to be written to disk.

Recovery rules: Redo logging
(1) Let S = set of transactions with
<Ti, commit> in log
(2) For each <Ti, X, v> in log, in forward
order (earliest  latest) do:
- if Ti  S then X  v
go ALL the way back
Update X on disk
Redo is also idempotent

Key drawbacks:
• Undo logging (steal/force)
• increase the number of disk I/Os
• Redo logging (no-steal/no-force)
• need to keep all modified blocks in memory until
commit inefficient use of space

Another Solution: undo/redo logging!
Update  <TID, object, newValue, oldValue>
page X
Rules:
1) Page X can be flushed before or after Ti commit
2) Log record flushed before corresponding updated page (WAL)
3) All log records flushed at commit
This is adopted in IBM DB2 – known as the

Aries Recovery Manager
Recovery process:
• Backwards pass
• construct set S of committed transactions
• undo actions of transactions not in S
• Forward pass
• redo actions of S transactions

Checkpointing

Recovery can be very, very SLOW !
Redo log:
... ... ...
First T1 wrote A,B Last

Crash
Record Committed a year ago Record
(1 year ago) --> STILL, Need to redo after crash!!

Solution: Checkpoint (simple version)
Periodically:
(1) Do not accept new transactions Tx freeze
(2) Wait until all (active) transactions finish
non-ideal
(3) Flush all log records to disk (log)
(4) Flush all buffers to disk (DB)
(5) Write “checkpoint” record on disk (log)
(6) Resume transaction processing
database now considered to be "new"

Example: what to do at recovery?
Redo log (disk): <T1,commit>
<T2,commit>
<T1,A,16>
<T3,C,21>
<T2,B,17>
Checkpoint Crash
... ... ... ... ... ...
No need to examine log records before the most recent Checkpoint

Non-quiescent Checkpoint
• Processing continues in the midst of
checkpointing
• No blocking of newly arrived transactions
too frequent - lots of overhead

rarely - log file very long
tradeoff

Non-quiescent Checkpoint: Undo Log
Only uncommitted transactions
started from this point onwards
need to be undone
L Start-ckpt
end ...
O ... active TR: ...
ckpt
T1,T2,... Tk
G
if theres no end, means that
cp did not happen successfully
Can be discarded Wait till all of
when checkpoint T1,. . . , Tk
completes commit or abort,
successfully but do not prohibit
other transactions
guaranteed T1-Tk completed
successfully, same for everything from starting
else before
Non-quiescent Checkpoint: Redo Log
We do not need to look further even if they started b4
start-cp
back than the earliest of the Start
Need to redo transactions that
of the active transactions T1… Tk
committed from this point
onwards (including T1… Tk)
L Start-ckpt
end ...
O ... active TR: ...
ckpt
T1,T2,... Tk
G
Committed transactions’ Write to disk all database elements that
updates all reflected on were written to buffers but not yet to disk
disk. So, no need to redo by transactions that had already committed
them. when the START CKPT record was written
to the log
CS3223 – Crash Recovery if commited tx have dirty pages that are not flushed 54
out, flush them out
Non-quiesce checkpoint
(Undo/Redo logging)
Committed transactions’
updates all reflected on Need to redo actions of transactions
disk. So, no need to redo committed from this point
them. onwards. No need to redo
actions of T1… Tk (if committed)
before Start-Ckpt.
L Start-ckpt
end
O ... active TR: ... ... need to undo
actions after start
T1,T2,... ckpt cp that have not been
G commited.
...
for Undo All dirty buffer pages prior to

(because Start-Ckpt are flushed without
dirty pages disrupting runtime operations
are flushed) write to disk, even if not completed. since
CS3223 – Crash Recovery we have undo value in case it crashes. 55
Examples: what to do at recovery time?
no T1 commit
L
T1,- Ckpt Ckpt T 1-
O ... ... ... ...
a T1 end b
G
 Undo T1 (undo a,b)

Example
L
O T1 ckpt-s T1 ckpt- T1 T1
... ... T1 ... ... ... ... ...
G a b end c cmt
 Redo T1: (redo b,c)

if crash happens after commit, only need to redo b and c,
because those before cp start are guaranteed to be flushed to disk

Media failure
(Loss of non-volatile storage)
A: 16
Solution: Make copies of data!

stateless storage. read from 1 copy but write to m copies
hopefully most copies will survive when theres a failure

Triple Modular Redundancy
• Keep 3 copies on separate disks
• Output(X) --> three outputs
• Input(X) --> three inputs + vote
X1 X2 X3

DB Dump + Log
backup active
database database
log
• If active database is lost,

– restore active database from backup
– bring up-to-date using redo entries in log

Summary
• Consistency of data
• Two sources of problems:
• Failures
• Logging
• Redundancy
• Data sharing
• Concurrency Control
• Log-based recovery mechanisms
• Undo, Redo, Undo/Redo
• What about No Undo/No Redo???
no recovery needed. if failure, restart
not used in practice because of high overhead.
crashes are uncommon anyway, not worth the overhead

Log-Based Recovery Schemes

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Log-Based Recovery Schemes

Uploaded by

Copyright:

Available Formats

Log-Based Recovery Schemes

If you are going to be in the logging

CS3223 – Crash Recovery 2

CS3223 – Crash Recovery 3

If T starts with a consistent state + T executes in

CS3223 – Crash Recovery 4

Example Transfer fund from A to B

CS3223 – Crash Recovery 6

CS3223 – Crash Recovery 13

CS3223 – Crash Recovery 14

• Four possible design options

Steal Undo & no redo Undo & redo

No steal No undo & no redo No undo & redo

No-steal policy  No undo

Which is the best solution??

• LSN of log record = address of log record

CS3223 – Crash Recovery 16

memory disk Log (Stable)

A:800 700 A:800

memory disk log

memory disk log

memory disk log

memory disk log

memory disk log

memory disk log

CS3223 – Crash Recovery 24

memory A: 800 700

CS3223 – Crash Recovery 25

memory A: 800 700

This means log record for A must be on log disk

CS3223 – Crash Recovery 27

CS3223 – Crash Recovery 28

CS3223 – Crash Recovery 30

CS3223 – Crash Recovery 37

A: 800 700 A: 800

CS3223 – Crash Recovery 40

CS3223 – Crash Recovery 41

CS3223 – Crash Recovery 42

CS3223 – Crash Recovery 43

Redo is also idempotent

CS3223 – Crash Recovery 44

CS3223 – Crash Recovery 45

This is adopted in IBM DB2 – known as the

CS3223 – Crash Recovery 47

CS3223 – Crash Recovery 48

... ... ...

First T1 wrote A,B Last

CS3223 – Crash Recovery 49

CS3223 – Crash Recovery 50

No need to examine log records before the most recent Checkpoint

CS3223 – Crash Recovery 51

too frequent - lots of overhead

CS3223 – Crash Recovery 52

for Undo All dirty buffer pages prior to

 Undo T1 (undo a,b)

CS3223 – Crash Recovery 56

 Redo T1: (redo b,c)

CS3223 – Crash Recovery 57

Solution: Make copies of data!

CS3223 – Crash Recovery 58

CS3223 – Crash Recovery 59

• If active database is lost,

CS3223 – Crash Recovery 60