Failure Classification
Transaction failure: logical errors (an internal error condition) or system errors (e.g., deadlock) cause an individual transaction to fail.
System crash: a power failure or other hardware or software failure causes the system to crash; nonvolatile storage contents are assumed to survive (the fail-stop assumption).
Disk failure: a head crash or similar disk failure destroys all or part of disk storage.
Recovery Algorithms
Recovery algorithms are techniques to ensure database consistency and transaction atomicity and durability despite failures. They have two parts:
1. Actions taken during normal transaction processing to ensure enough information exists to recover from failures.
2. Actions taken after a failure to recover the database contents to a state that ensures atomicity, consistency and durability.
Storage Structure
Volatile storage: does not survive system crashes; examples are main memory and cache memory.
Nonvolatile storage: survives system crashes; examples are disk, tape, and flash memory.
Stable storage: a mythical form of storage that survives all failures; approximated by maintaining multiple copies on distinct nonvolatile media.
Stable-Storage Implementation
Maintain multiple copies of each block on separate disks; copies can be kept at remote sites to protect against disasters such as fire or flooding.
Failure during data transfer can still result in inconsistent copies: a block transfer can end in successful completion, partial failure (the destination block has incorrect information), or total failure (the destination block was never updated).
Protecting storage media from failure during data transfer (one solution), assuming two copies of each block:
1. Write the information onto the first physical block.
2. When the first write successfully completes, write the same information onto the second physical block.
3. The output is completed only after the second write successfully completes.
Copies of a block may differ due to a failure during an output operation. To recover from such a failure:
1. First find inconsistent blocks. The expensive solution is to compare the two copies of every disk block. Better solution: record in-progress disk writes on nonvolatile storage (nonvolatile RAM or a special area of disk), and during recovery compare only the copies of blocks whose writes were in progress.
2. If either copy of an inconsistent block has a detectable error (bad checksum), overwrite it with the other copy; if both are error-free but differ, overwrite the second block with the first.
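As a concrete illustration of the two-copy output protocol and the recovery comparison above, here is a minimal Python sketch. The image-file names, the fixed block size, and the plain byte comparison are assumptions of the sketch (the slides do not prescribe an implementation), and the two image files are assumed to already exist.

```python
import os

BLOCK_SIZE = 4096  # assumed block size for this sketch

def stable_write(block_no, data, disk1="disk1.img", disk2="disk2.img"):
    """Output one logical block by writing the two physical copies in sequence:
    the write to the second copy begins only after the first has reached the disk."""
    for path in (disk1, disk2):
        with open(path, "r+b") as f:
            f.seek(block_no * BLOCK_SIZE)
            f.write(data.ljust(BLOCK_SIZE, b"\x00"))
            f.flush()
            os.fsync(f.fileno())   # force this copy to disk before touching the next one

def recover_block(block_no, disk1="disk1.img", disk2="disk2.img"):
    """Recovery comparison: if the two copies differ, overwrite the second with the first."""
    def read_copy(path):
        with open(path, "rb") as f:
            f.seek(block_no * BLOCK_SIZE)
            return f.read(BLOCK_SIZE)
    c1, c2 = read_copy(disk1), read_copy(disk2)
    if c1 != c2:
        with open(disk2, "r+b") as f:
            f.seek(block_no * BLOCK_SIZE)
            f.write(c1)
            f.flush()
            os.fsync(f.fileno())
```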
Data Access
Physical blocks are those blocks residing on the disk.
Buffer blocks are the blocks residing temporarily in main memory.
Block movements between disk and main memory are initiated through two operations:
input(B) transfers the physical block B to main memory.
output(B) transfers the buffer block B to the disk, and replaces the appropriate physical block there.
We assume, for simplicity, that each data item fits in, and is stored inside, a single block.
Each transaction Ti has a private work area in which it keeps local copies of the data items it accesses and updates; Ti's local copy of data item X is denoted xi.
Transactions transfer data items between the system buffer blocks and their private work areas using two operations:
read(X) assigns the value of data item X to the local variable xi.
write(X) assigns the value of the local variable xi to data item X in the buffer block.
Either command may first require an input(BX) operation if the block BX containing X is not already in memory.
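A minimal Python sketch of these operations may help; the dictionaries standing in for the disk, the buffer, and the per-transaction work areas are assumptions of the sketch, not structures from the slides.

```python
# Disk, buffer, and per-transaction work areas, modeled as dictionaries.
disk = {"BX": {"X": 100}, "BY": {"Y": 200}}    # physical blocks on disk
buffer = {}                                     # buffer blocks in main memory
work_area = {"T1": {}, "T2": {}}                # local copies x1, y1, ... per transaction

def block_of(item):
    return "B" + item                           # assumption: data item X is stored in block BX

def input_block(b):                             # input(B): transfer physical block B into memory
    buffer[b] = dict(disk[b])

def output_block(b):                            # output(B): write buffer block B back to disk
    disk[b] = dict(buffer[b])

def read(tid, item):                            # read(X): buffer block -> local variable xi
    b = block_of(item)
    if b not in buffer:
        input_block(b)
    work_area[tid][item] = buffer[b][item]

def write(tid, item):                           # write(X): local variable xi -> buffer block
    b = block_of(item)
    if b not in buffer:
        input_block(b)
    buffer[b][item] = work_area[tid][item]

read("T1", "X")
work_area["T1"]["X"] -= 50
write("T1", "X")   # BX in the buffer now holds 50; the disk copy is unchanged until output_block("BX")
```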
[Figure: example of data access. Blocks A and B on disk are brought into memory as buffer blocks by input(A) and written back by output(B); read(X) and write(Y) move values between the buffer blocks and the work areas of transactions T1 and T2, which hold the local copies x1, x2 and y1.]
Recovery and Atomicity
Modifying the database without ensuring that the transaction will commit may leave the database in an inconsistent state: several output operations may be needed for one transaction, and a failure may occur after one of these modifications has been made but before all of them are made.
To ensure atomicity despite failures, we first output information describing the modifications to stable storage, without modifying the database itself. We study two approaches: log-based recovery and shadow paging.
We assume (initially) that transactions run serially, that is, one after the other.
Log-Based Recovery
A log is kept on stable storage. The log is a sequence of log records, and maintains a record of update activities on the database.
When transaction Ti starts, the record <Ti start> is written to the log. Before Ti executes write(X), a log record <Ti, X, V1, V2> is written, where V1 is the value of X before the write and V2 is the value to be written to X.
When Ti finishes its last statement, the log record <Ti commit> is written.
We assume for now that log records are written directly to stable storage (they are not buffered).
Two approaches using logs: deferred database modification and immediate database modification.
Deferred Database Modification
The deferred database modification scheme records all modifications to the log, but defers all the writes to after partial commit.
Assume that transactions execute serially.
A transaction starts by writing a <Ti start> record to the log.
A write(X) operation results in a log record <Ti, X, V> being written, where V is the new value for X; the old value is not needed in this scheme, and the write to X itself is deferred.
When Ti partially commits, <Ti commit> is written to the log, and the deferred writes are then executed using the logged new values.
During recovery after a crash, a transaction needs to be redone if and only if both <Ti start> and <Ti commit> are there in the log.
Redoing a transaction Ti (redo Ti) sets the value of all data items updated by Ti to the new values recorded in the log.
Example transactions T0 and T1 (T0 executes before T1):
T0: read(A); A := A - 50; write(A); read(B); B := B + 50; write(B)
T1: read(C); C := C - 100; write(C)
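Under the rule just stated (redo Ti only if both <Ti start> and <Ti commit> appear in the log), recovery for the deferred scheme is a single forward scan. A minimal Python sketch, assuming log records are encoded as tuples and initial values chosen to match the example transactions above (this encoding and the helper name recover_deferred are illustrative only):

```python
def recover_deferred(log, db):
    """Deferred-modification recovery: redo Ti iff both <Ti start> and <Ti commit> are in the log."""
    committed = {rec[1] for rec in log if rec[0] == "commit"}
    for rec in log:                        # single forward scan
        if rec[0] == "update" and rec[1] in committed:
            _, _, item, new_value = rec
            db[item] = new_value           # redo: write the logged new value

db = {"A": 1000, "B": 2000, "C": 700}      # assumed initial values
log = [("start", "T0"), ("update", "T0", "A", 950), ("update", "T0", "B", 2050),
       ("commit", "T0"),
       ("start", "T1"), ("update", "T1", "C", 600)]   # T1 never committed
recover_deferred(log, db)
print(db)   # {'A': 950, 'B': 2050, 'C': 700} -- T0 is redone, T1 is ignored
```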
Immediate Database Modification
The immediate database modification scheme allows database updates of an uncommitted transaction to be made as the writes are issued; since undoing may be needed, update log records must have both the old value and the new value.
The update log record must be written before the database item is written.
Output of updated blocks can take place at any time before or after transaction commit.
The order in which blocks are output can be different from the order in which they are written.
Immediate Database Modification Example

Log                     Write           Output
<T0 start>
<T0, A, 1000, 950>
<T0, B, 2000, 2050>
                        A = 950
                        B = 2050
<T0 commit>
<T1 start>
<T1, C, 700, 600>
                        C = 600
                                        BB, BC
<T1 commit>
                                        BA
Note: BX denotes the block containing data item X.
The recovery procedure has two operations instead of one:
undo(Ti) restores the value of all data items updated by Ti to their old values, going backwards from the last log record for Ti.
redo(Ti) sets the value of all data items updated by Ti to the new values, going forward from the first log record for Ti.
Both operations must be idempotent. That is, even if the operation is executed multiple times the effect is the same as if it is executed once. This is needed since operations may get re-executed during recovery.
When recovering after a failure: transaction Ti needs to be undone if the log contains <Ti start> but does not contain <Ti commit>; Ti needs to be redone if the log contains both <Ti start> and <Ti commit>. Undo operations are performed first, then redo operations.
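A minimal Python sketch of undo(Ti) and redo(Ti) for the immediate-modification scheme, assuming update log records are tuples of the form ("update", Ti, X, old, new). Running either function twice leaves the database in the same state, which is the idempotence property required above.

```python
def undo(tid, log, db):
    """undo(Ti): restore old values, scanning Ti's update records backwards."""
    for rec in reversed(log):
        if rec[0] == "update" and rec[1] == tid:
            _, _, item, old, new = rec
            db[item] = old

def redo(tid, log, db):
    """redo(Ti): apply new values, scanning Ti's update records forwards."""
    for rec in log:
        if rec[0] == "update" and rec[1] == tid:
            _, _, item, old, new = rec
            db[item] = new

db = {"A": 950, "B": 2000}                    # crash left a partial update of T0 on disk
log = [("start", "T0"),
       ("update", "T0", "A", 1000, 950),
       ("update", "T0", "B", 2000, 2050)]     # no <T0 commit> in the log
undo("T0", log, db)                           # T0 must be undone
undo("T0", log, db)                           # idempotent: a second undo changes nothing
print(db)                                     # {'A': 1000, 'B': 2000}
```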
Checkpoints
Problems with the recovery procedure as discussed so far: 1. searching the entire log is time-consuming; 2. we might unnecessarily redo transactions that have already output their updates to the database.
Recovery is streamlined by periodically performing checkpointing: 1. output all log records currently residing in main memory onto stable storage; 2. output all modified buffer blocks to disk; 3. write a log record <checkpoint> onto stable storage.
Checkpoints (Cont.)
During recovery we need to consider only the most recent transaction Ti that started before the checkpoint, and transactions that started after Ti:
1. Scan backwards from the end of the log to find the most recent <checkpoint> record.
2. Continue scanning backwards until a record <Ti start> is found.
3. Need only consider the part of the log following the above start record. The earlier part of the log can be ignored during recovery, and can be erased whenever desired.
4. For all transactions (starting from Ti or later) with no <Ti commit>, execute undo(Ti). (Needed only in the immediate-modification scheme.)
5. Scanning forward in the log, for all transactions starting from Ti or later with a <Ti commit>, execute redo(Ti).
Example of Checkpoints
[Figure: timeline with a checkpoint at time Tc and a system failure at time Tf; transactions T1 through T4 start and finish at various points relative to Tc and Tf.]
Recovery With Concurrent Transactions
We now modify the log-based recovery schemes to allow multiple transactions to execute concurrently; all transactions share a single disk buffer and a single log, so log records of different transactions may be interspersed in the log.
The checkpointing technique and the actions taken on recovery have to be changed, since several transactions may be active when a checkpoint is performed.
Checkpoints are performed as before, except that the checkpoint log record is now of the form
<checkpoint L>
where L is the list of transactions active at the time of the checkpoint.
We assume no updates are in progress while the checkpoint is carried out.
When the system recovers from a crash, it first does the following:
1. Initialize undo-list and redo-list to empty.
2. Scan the log backwards from the end, stopping when the first <checkpoint L> record is found. For each record found during the backward scan: if the record is <Ti commit>, add Ti to redo-list; if the record is <Ti start> and Ti is not in redo-list, add Ti to undo-list.
3. For every Ti in L, if Ti is not in redo-list, add Ti to undo-list.
At this point undo-list consists of incomplete transactions that must be undone, and redo-list consists of finished transactions that must be redone. Recovery now continues as follows:
1. Scan the log backwards from the most recent record, stopping when <Ti start> records have been found for every Ti in undo-list. During the scan, perform undo for each log record that belongs to a transaction in undo-list.
2. Locate the most recent <checkpoint L> record.
3. Scan the log forwards from the <checkpoint L> record till the end of the log. During the scan, perform redo for each log record that belongs to a transaction on redo-list.
Example of Recovery
Go over the steps of the recovery algorithm on the following log:
<T0 start>
<T0, A, 0, 10>
<T0 commit>
/* Scan at step 1 comes up to here */
<T1 start>
<T1, B, 0, 10>
<T2 start>
<T2, C, 0, 10>
<T2, C, 10, 20>
<checkpoint {T1, T2}>
<T3 start>
<T3, A, 10, 20>
<T3, D, 0, 10>
<T3 commit>
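A hedged Python sketch of the first step of this recovery, run on the example log above; log records are modeled as tuples, and applying the actual undo and redo operations is left out so the focus stays on how the two lists are built.

```python
def build_lists(log):
    """Scan backwards from the end to the first <checkpoint L> record,
    building redo-list and undo-list as described above."""
    redo_list, undo_list = set(), set()
    for rec in reversed(log):
        kind = rec[0]
        if kind == "commit":
            redo_list.add(rec[1])
        elif kind == "start" and rec[1] not in redo_list:
            undo_list.add(rec[1])
        elif kind == "checkpoint":
            for tid in rec[1]:                 # every Ti in L not known to have committed
                if tid not in redo_list:
                    undo_list.add(tid)
            break
    return undo_list, redo_list

log = [("start", "T0"), ("update", "T0", "A", 0, 10), ("commit", "T0"),
       ("start", "T1"), ("update", "T1", "B", 0, 10),
       ("start", "T2"), ("update", "T2", "C", 0, 10), ("update", "T2", "C", 10, 20),
       ("checkpoint", ["T1", "T2"]),
       ("start", "T3"), ("update", "T3", "A", 10, 20), ("update", "T3", "D", 0, 10),
       ("commit", "T3")]

print(build_lists(log))   # undo-list = {T1, T2}, redo-list = {T3}
```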
Log Record Buffering
Log records are buffered in main memory instead of being output directly to stable storage. Log records are output to stable storage when a block of log records in the buffer is full, or a log force operation is executed. Log force is performed to commit a transaction by forcing all its log records (including the commit record) to stable storage; several log records can thus be output in a single output operation.
The rules below must be followed if log records are buffered:
Log records are output to stable storage in the order in which they are created.
Transaction Ti enters the commit state only when the log record <Ti commit> has been output to stable storage.
Before a block of data in main memory is output to the database, all log records pertaining to data in that block must have been output to stable storage. This rule is called the write-ahead logging (WAL) rule.
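A minimal Python sketch of a buffered log enforcing these rules; the class and method names are assumptions of the sketch, not an API from the slides.

```python
class LogBuffer:
    """Log records are buffered in memory and moved to stable storage in creation order."""
    def __init__(self):
        self.stable = []          # records already on stable storage
        self.memory = []          # records still buffered in main memory
        self.next_lsn = 0

    def append(self, record):
        lsn = self.next_lsn
        self.next_lsn += 1
        self.memory.append((lsn, record))
        return lsn

    def force(self):
        """Log force: output all buffered records, preserving the order of creation."""
        self.stable.extend(self.memory)
        self.memory.clear()

    def on_stable_storage(self, lsn):
        return any(l == lsn for l, _ in self.stable)


log = LogBuffer()
log.append(("T0", "A", 1000, 950))
commit_lsn = log.append(("T0", "commit"))

log.force()                               # T0 may enter the commit state only after this
assert log.on_stable_storage(commit_lsn)

# WAL rule: before a buffer block holding A is output to the database,
# the update record for A must already be on stable storage (true here after force()).
```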
Database Buffering
The database maintains an in-memory buffer of data blocks. When a new block is needed and the buffer is full, an existing block must be removed from the buffer.
If the block chosen for removal has been updated, it must be output to disk.
If a block with uncommitted updates is output to disk, log records with undo information for the updates are output to the log on stable storage first (the write-ahead logging rule).
No updates should be in progress on a block when it is output to disk. This can be ensured as follows: before writing a data item, a transaction acquires an exclusive lock (a short-duration latch) on the block containing the item, released once the write completes; before a block is output to disk, the system acquires an exclusive latch on the block, ensuring no update is in progress on it.
Buffer Management
The database buffer can be implemented either in an area of real main memory reserved for the database, or in virtual memory.
Database buffers are generally implemented in virtual memory in spite of some drawbacks: the operating system may evict a modified buffer page to swap space, so a later database write of that page may first have to read it back from swap, causing extra I/O (the dual paging problem). Ideally, when the OS needs to evict a buffer page it should pass control to the database, which would 1. output the page to the database (writing log records first) if it is modified, and 2. release the page from the buffer for the OS to use; common operating systems do not support such functionality.
Failure with Loss of Nonvolatile Storage
So far we have assumed no loss of nonvolatile storage. A technique similar to checkpointing is used to deal with such loss: periodically dump the entire content of the database to stable storage, and output a <dump> record to the log on stable storage.
To recover from a disk failure: restore the database from the most recent dump, then consult the log and redo all transactions that committed after the dump.
Recovery with Early Lock Release and Logical Undo
Support for high-concurrency locking techniques that release locks early (for example, on B+-tree index pages) requires logical undo logging.
Key benefits: supports logical undo; easier to understand and to show correctness.
Redo information is logged physically (that is, the new value for each write), even for operations with logical undo. Physical redo logging does not conflict with early lock release.
Operation logging is done as follows:
1. When an operation starts, log <Ti, Oj, operation-begin>, where Oj identifies the operation instance.
2. While the operation is executing, normal log records with physical redo and physical undo information are logged.
3. When the operation completes, <Ti, Oj, operation-end, U> is logged, where U contains the information needed to perform a logical undo.
Transaction rollback during normal operation scans the log backwards from the last log record of the transaction:
1. If a log record <Ti, X, V1, V2> is found, perform the undo and log a special redo-only log record <Ti, X, V1>.
2. If an operation-end record is found, roll back the operation logically using its undo information, logging the updates performed during the rollback, write an operation-abort record, and skip back to the matching operation-begin record.
3. If a redo-only record is found, ignore it.
4. If an operation-abort record is found, skip all preceding log records of the transaction until the matching operation-begin record.
5. Stop the scan when the <Ti start> record is found.
6. Add a <Ti abort> record to the log.
Cases 3 and 4 above can occur only if the database crashes while a transaction is being rolled back.
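A small Python sketch of step 1 above: each undone update is recorded as a redo-only record <Ti, X, V1>, so a crash during rollback leaves enough information to finish the rollback later. The tuple encoding is an assumption of the sketch, and logical undo of whole operations is omitted.

```python
def rollback(tid, log, db):
    """Undo a transaction's physical updates, appending redo-only records to the log."""
    # snapshot the records to scan so newly appended redo-only records are not re-scanned
    for rec in reversed(list(log)):
        if rec[0] == "update" and rec[1] == tid:
            _, _, item, old, new = rec
            db[item] = old                               # perform the undo
            log.append(("redo-only", tid, item, old))    # log <Ti, X, V1>
        elif rec[0] == "start" and rec[1] == tid:
            break
    log.append(("abort", tid))                           # <Ti abort>

db = {"A": 950, "B": 2050}
log = [("start", "T0"), ("update", "T0", "A", 1000, 950), ("update", "T0", "B", 2000, 2050)]
rollback("T0", log, db)   # db restored to A=1000, B=2000; log gains two redo-only records and <T0 abort>
```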
Recovery from failure has two phases:
Redo phase: scan the log forward from the last <checkpoint L> record till the end of the log.
1. Initialize undo-list to L.
2. Whenever an update record <Ti, X, V1, V2> or a redo-only record <Ti, X, V2> is found, redo it by writing V2 to X. Whenever a <Ti start> record is found, add Ti to undo-list; whenever a <Ti commit> or <Ti abort> record is found, remove Ti from undo-list.
Undo phase: scan the log backwards from the end, undoing log records of transactions in undo-list as in transaction rollback. Stop the scan when <Ti start> records have been found for all Ti in undo-list.
Fuzzy Checkpointing
To avoid a long interruption of normal processing while modified buffer blocks are written out, fuzzy checkpointing allows updates to resume before the writes complete:
1. Temporarily stop all updates by transactions.
2. Write a <checkpoint L> log record and force the log to stable storage.
3. Note the list M of modified buffer blocks.
4. Now permit transactions to proceed with their actions.
5. Output to disk all modified buffer blocks in list M, following the write-ahead logging rule; blocks must not be updated while being output.
6. Store a pointer to the checkpoint record in a fixed position last_checkpoint on disk.
[Figure: a log containing successive <checkpoint L> records, with last_checkpoint pointing to the most recent completed checkpoint record.]
When recovering using a fuzzy checkpoint, start the scan from the checkpoint record pointed to by last_checkpoint; log records before it have their updates reflected in the database on disk and need not be redone.
ARIES
ARIES is a state-of-the-art recovery method that incorporates numerous optimizations to reduce overheads during normal processing and to speed up recovery. Unlike the recovery algorithm described earlier, ARIES:
1. Uses a log sequence number (LSN) to identify log records, and stores LSNs in pages to identify which updates have already been applied to a database page.
2. Uses physiological redo.
3. Uses a dirty page table to avoid unnecessary redos during recovery.
4. Uses fuzzy checkpointing that only records information about dirty pages, and does not require dirty pages to be written out at checkpoint time.
ARIES Optimizations
Physiological redo: the affected page is physically identified, but the action within the page can be logical.
ARIES Data Structures
Each log record is identified by a log sequence number (LSN), which must be sequentially increasing; typically the LSN is an offset from the beginning of the log file, to allow fast access.
Page LSN: each page contains a PageLSN, the LSN of the last log record whose effects are reflected on the page.
To update a page: X-latch the page and write the log record; update the page; record the LSN of the log record in the PageLSN field; unlock the page.
The PageLSN is used during recovery to prevent repeated redo, thus ensuring idempotence.
Each log record also contains the LSN of the previous log record of the same transaction (PrevLSN) and undo information for the update.
A special redo-only log record called a compensation log record (CLR) is used to log actions taken during recovery that never need to be undone. A CLR has a field UndoNextLSN noting the next (earlier) record to be undone; records in between have already been undone. This is required to avoid repeated undo of already-undone actions.
[Figure: update log records 1, 2, 3, 4 of a transaction and the corresponding CLRs 4', 3', 2', 1' written as the updates are rolled back, with UndoNextLSN pointers.]
DirtyPageTable: a list of pages in the buffer that have been updated. For each such page it contains the PageLSN of the page and the RecLSN, an LSN such that log records before it have already been applied to the page version on disk; the RecLSN is set to the current end of the log when a page is inserted into the dirty page table.
[Figure: buffer pool pages (e.g., P1, P6, P23, P15) with their PageLSNs, the corresponding DirtyPageTable entries (PageLSN and RecLSN per page), and the PageLSNs of the page versions on disk.]
The checkpoint log record contains the DirtyPageTable and the list of active transactions; for each active transaction it records LastLSN, the LSN of the last log record written by that transaction. Dirty pages are not written out at checkpoint time; they are flushed out continuously in the background, so checkpoints are very cheap.
ARIES recovery involves three passes:
Analysis pass: determines which transactions to undo, which pages were dirty (disk version not up to date) at the time of the crash, and the RedoLSN, the LSN from which the redo pass should start.
Redo pass: repeats history, redoing all actions from the RedoLSN; RecLSNs and PageLSNs are used to avoid redoing actions already reflected on a page.
Undo pass: rolls back all incomplete transactions.
[Figure: log timeline from the last checkpoint to the end of the log, showing the analysis pass starting at the last checkpoint, the redo pass starting at the RedoLSN, and the undo pass scanning backwards from the end of the log.]
Analysis pass: starts from the last complete checkpoint log record; reads the DirtyPageTable from that record; sets RedoLSN to the minimum of the RecLSNs of the pages in the DirtyPageTable; sets undo-list to the list of transactions in the checkpoint log record; and reads the LSN of the last log record for each transaction in undo-list from the checkpoint log record. It then scans forward from the checkpoint, updating the DirtyPageTable and undo-list as log records are encountered.
Redo pass: scans forward from the RedoLSN. Whenever an update log record is found:
1. If the page is not in the DirtyPageTable, or the LSN of the log record is less than the RecLSN of the page in the DirtyPageTable, skip the log record.
2. Otherwise fetch the page from disk. If the PageLSN of the page fetched from disk is less than the LSN of the log record, redo the log record.
NOTE: if either test is negative the effects of the log record have already appeared on the page. The first test avoids even fetching the page from disk!
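The two tests can be written down directly. A hedged Python sketch follows; dictionaries stand in for the dirty page table, the buffer, and the disk, and redo_action is a placeholder callable for applying the physiological redo (all of these are assumptions of the sketch).

```python
def redo_pass(log, redo_lsn, dirty_page_table, disk_pages, buffer_pages):
    """ARIES redo pass: repeat history from redo_lsn, skipping already-applied updates.
    log: list of (lsn, page_id, redo_action); pages are dicts containing a 'PageLSN' key."""
    for lsn, page_id, redo_action in log:
        if lsn < redo_lsn:
            continue
        # Test 1: page not dirty, or update already on the disk version per RecLSN -> skip (no fetch).
        if page_id not in dirty_page_table or lsn < dirty_page_table[page_id]["RecLSN"]:
            continue
        # Test 2: fetch the page; redo only if its PageLSN is older than this log record.
        page = buffer_pages.setdefault(page_id, dict(disk_pages[page_id]))
        if page["PageLSN"] < lsn:
            redo_action(page)             # apply the update (physiological redo)
            page["PageLSN"] = lsn         # record that this update is now reflected on the page
```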
Undo pass: performs a backward scan on the log, undoing all transactions in undo-list. When an update log record is undone, a CLR is written; set the UndoNextLSN of the CLR to the PrevLSN value of the update log record.
The backward scan is optimized by skipping unneeded log records: the next LSN to be undone for each transaction is initialized to the LSN of its last log record found by the analysis pass. At each step pick the largest of these LSNs to undo, skip back to it and undo it. After undoing a log record, for ordinary records set the next LSN to be undone to the record's PrevLSN, and for CLRs set it to the record's UndoNextLSN; all intervening records are skipped, since their actions have already been undone.
[Figure: update log records of a transaction and the compensation log records (e.g., 6', 5', ..., 1') written while undoing them; arrows indicate UndoNextLSN values.]
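A hedged Python sketch of this backward scan. Log records are modeled as dictionaries keyed by LSN, the PrevLSN of a transaction's first update record is assumed to be None, and only the PrevLSN/UndoNextLSN bookkeeping is shown; applying the actual undo is abstracted into undo_action.

```python
def undo_pass(records, last_lsn_of, log_append, undo_action):
    """records: {lsn: record dict}; last_lsn_of: {tid: last LSN found by the analysis pass}."""
    to_undo = dict(last_lsn_of)                    # next LSN to undo, per transaction
    while to_undo:
        lsn = max(to_undo.values())                # pick the largest LSN, skip back to it
        rec = records[lsn]
        tid = rec["tid"]
        if rec["type"] == "update":
            undo_action(rec)                       # undo the update
            log_append({"type": "CLR", "tid": tid, "UndoNextLSN": rec["PrevLSN"]})
            nxt = rec["PrevLSN"]
        else:                                      # a CLR: skip records that are already undone
            nxt = rec["UndoNextLSN"]
        if nxt is None:                            # reached the transaction's first record
            log_append({"type": "abort", "tid": tid})
            del to_undo[tid]
        else:
            to_undo[tid] = nxt
```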
Other ARIES Features
Savepoints: transactions can record savepoints and roll back to a savepoint; useful for complex transactions, and for rolling back just far enough to release locks on deadlock.
Recovery optimizations: the dirty page table can be used to prefetch pages during redo, and out-of-order redo is possible; redo of a page being fetched from disk can be postponed until the page arrives, and other log records can meanwhile continue to be processed.
Remote Backup Systems
Remote backup systems provide high availability: transaction processing can continue even if the primary site has failed.
Detection of failure: the backup site must detect when the primary site has failed. To distinguish primary-site failure from link failure, several communication links are maintained between the primary and the remote backup, and heart-beat messages are exchanged.
Transfer of control: to take over control, the backup site first performs recovery using its copy of the database and all the log records it has received from the primary. When the backup site takes over processing it becomes the new primary.
Time to recover: to reduce the delay in takeover, the backup site periodically processes the redo log records (in effect, performing recovery from the previous database state), performs a checkpoint, and can then delete earlier parts of the log.
Hot-spare configuration permits very fast takeover: the backup continually processes redo log records as they arrive, so when the primary fails it only needs to roll back incomplete transactions before it is ready to process new transactions. More on this in Chapter 19.
Durability of updates can be strengthened by delaying transaction commit until the update has been logged at the backup; one-safe, two-safe, and two-very-safe configurations trade off this durability against availability.
End of Chapter
Shadow Paging
Shadow paging is an alternative to log-based recovery; this scheme is useful if transactions execute serially. The idea is to maintain two page tables during the lifetime of a transaction: the current page table and the shadow page table. The shadow page table is stored in nonvolatile storage and is never modified during execution, so the state of the database prior to transaction execution can be recovered from it.
To start with, both page tables are identical. Only the current page table is used for data item accesses during execution of the transaction. Whenever a page is about to be written for the first time, a copy of the page is made onto an unused page, the current page table is made to point to the copy, and the update is performed on the copy.
To commit a transaction: flush all modified pages in main memory to disk, output the current page table to disk, and then make the current page table the new shadow page table by simply updating the fixed disk pointer to point to the current page table on disk. Once this pointer has been written, the transaction is committed.
No recovery is needed after a crash: new transactions can start right away, using the shadow page table. Pages not pointed to from the current or shadow page table can be freed (garbage collected).
Advantages over log-based schemes: no overhead of writing log records, and recovery is trivial.
Disadvantages:
Copying the entire page table is expensive; the cost can be reduced by structuring the page table like a B+-tree, so there is no need to copy the entire tree, only the paths in the tree that lead to updated leaf nodes.
Commit overhead is still high (every updated page and the page table must be flushed), data gets fragmented, old page versions must be garbage collected, and the scheme is hard to extend to concurrent transactions.
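A minimal Python sketch of the copy-on-write page-table idea; in-memory lists stand in for the disk page tables and the fixed pointer, so this illustrates the mechanism rather than the on-disk layout.

```python
pages = {0: "A=100", 1: "B=200"}           # "disk" pages, addressed by page number
next_free = 2

shadow_table = [0, 1]                       # shadow page table: logical page -> physical page
db_pointer = shadow_table                   # fixed location pointing at the committed table
current_table = list(shadow_table)          # current page table, identical at transaction start

def write_page(logical, data):
    """First write to a logical page copies it and redirects the current table (copy-on-write)."""
    global next_free
    if current_table[logical] == shadow_table[logical]:
        pages[next_free] = pages[current_table[logical]]   # copy onto an unused page
        current_table[logical] = next_free
        next_free += 1
    pages[current_table[logical]] = data                    # update only the copy

def commit():
    """Atomically switch the fixed pointer so the current table becomes the new shadow table."""
    global db_pointer, shadow_table
    db_pointer = current_table
    shadow_table = current_table

write_page(0, "A=50")
print(pages[shadow_table[0]], pages[current_table[0]])   # old value via shadow, new value via current
commit()
```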