Capacity:
Main memory size is some orders of magnitude smaller than what large databases need
Economics:
[Figure: access time versus typical capacity; labels: electronic RAM, main memory, and bulk storage]
Disk Components
[Figure: top view of a single disk platter; labels: upper surface, platter, lower surface, cylinder, track, sector, actuator]
[Figure panels: about to read blue sector; after reading blue sector]
[Figure panels: after BLUE read; seek for RED; rotational latency; after RED read]
Response time for disks
Access time: (service time for a disk access)
Command + Seek + Rotation + Transfer
Response time:
Queue time + Access time

Seek Time
Time required to move head over desired track
A seek has up to four components
Accelerate
Coast at max velocity
Only if going far enough to reach max velocity
Decelerate
Settle onto correct track
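
The two sums above can be written out as a minimal sketch. The class name, field names, and the millisecond values are illustrative assumptions, not measurements from any particular drive.

```python
from dataclasses import dataclass


@dataclass
class DiskRequest:
    """One disk access, with each component given in milliseconds (assumed values)."""
    command_ms: float
    seek_ms: float
    rotation_ms: float
    transfer_ms: float
    queue_ms: float = 0.0

    def access_time_ms(self) -> float:
        # Access time = Command + Seek + Rotation + Transfer
        return self.command_ms + self.seek_ms + self.rotation_ms + self.transfer_ms

    def response_time_ms(self) -> float:
        # Response time = Queue time + Access time
        return self.queue_ms + self.access_time_ms()


# Hypothetical request: the numbers are placeholders chosen for easy arithmetic.
req = DiskRequest(command_ms=0.5, seek_ms=5.0, rotation_ms=3.0,
                  transfer_ms=0.5, queue_ms=4.0)
print(req.access_time_ms(), req.response_time_ms())
```

Keeping queue time separate from access time makes it easy to see how much of the response time comes from waiting behind other requests rather than from the drive mechanics.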
[Figure: seek time (ms) versus seek distance (cylinders), 0 to 8000 cylinders]
Rotation speed: measured in Revolutions Per Minute (RPM)
Computing average rotational latency
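
A short sketch of the rotational latency computation, assuming the usual model in which the target sector is equally likely to be anywhere on the track, so the average wait is half a revolution. The function names are illustrative.

```python
def rotation_time_ms(rpm: float) -> float:
    """Time for one full revolution, in milliseconds."""
    return 60_000.0 / rpm


def average_rotational_latency_ms(rpm: float) -> float:
    """Average wait for the target sector: half a revolution (assumed uniform position)."""
    return rotation_time_ms(rpm) / 2.0


for rpm in (10_000, 15_000):
    print(rpm, "RPM:", average_rotational_latency_ms(rpm), "ms average latency")
```

At 10,000 RPM one revolution takes 6 ms, so the average latency under this model is 3 ms; at 15,000 RPM it is 2 ms.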
Logical Block Number Mappings
Outline
[Figure panels: LOOK, CLOOK, SCAN, CSCAN]
Remember Slotted-Pages (from System-R)
A Record in a Slotted-Page
Non-volatile RAM (NVRAM) can be used to speed up disk writes
Non-volatile RAM retains changes after power is lost
How it works:
Database issues disk write
Disk controller writes contents to NVRAM and returns immediately
Disk controller writes contents of NVRAM to disk in background
On crash-recovery contents of NVRAM buffers flushed to disk

Devote a disk as a log disk
All writes are sequential (to end of the log)
Eliminates seeks
Allows large sequential writes, uses disk bandwidth well
Data written out to main disks in the background
On crash-recovery end of log is examined
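
The NVRAM write path described above can be sketched as a toy model. This is only an illustration under simplifying assumptions: the class and method names are hypothetical, and plain dictionaries stand in for the NVRAM buffer and the disk.

```python
class NvramController:
    """Toy disk controller with a non-volatile write buffer (hypothetical model)."""

    def __init__(self) -> None:
        self.nvram: dict[int, bytes] = {}  # assumed to survive power loss
        self.disk: dict[int, bytes] = {}   # stands in for the slow medium

    def write(self, block: int, data: bytes) -> str:
        # The write is acknowledged as soon as the data is in NVRAM.
        self.nvram[block] = data
        return "ack"

    def background_flush(self) -> None:
        # Copy each buffered block to disk, then release its NVRAM slot.
        for block in list(self.nvram):
            self.disk[block] = self.nvram[block]
            del self.nvram[block]

    def recover(self) -> None:
        # After a crash, whatever is still buffered in NVRAM is flushed to disk.
        self.background_flush()


ctrl = NvramController()
ctrl.write(7, b"page contents")
print("on disk before flush:", 7 in ctrl.disk)
ctrl.recover()
print("on disk after recovery:", 7 in ctrl.disk)
```

The block is removed from the buffer only after it has been copied to disk, so a crash in the middle of a flush never loses an acknowledged write in this model.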
1. Reducing number of disk accesses
Use a few large transfers instead of many small ones
Disk page size is growing (2KB – 8KB+ in last decade)
Favor sequential transfers over random ones
Reduces number of seeks, uses disk bandwidth better
Make use of disk mirroring (for redundancy)
Optimize for number of I/Os rather than for space (space is cheap)

2. Disk data must become cooler
Disks of 1990:
50 Kaps (KB accesses per second) to 1GB of data
1 Kaps per 20MB
$10 / MB for disk storage
Disks of today:
120 Kaps to 80GB of data
1 Kaps per 500MB
< $1 / MB for RAM storage
So disk data today must be 25x cooler than in 1990
1990s disk data can live in RAM today
Large main memories can help cool disk data
Mirroring can also help spread out read accesses
Archival storage: automated archives (e.g. optical disk jukeboxes, tape robots, etc.)
[Figure axis: typical capacity]
Break Even Reference Interval
The 5 minute and the 1 minute rule
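
The break-even reference interval is usually written, following Gray and Graefe (listed in the References), as (PagesPerMBofRAM / AccessPerSecondPerDisk) x (PricePerDiskDrive / PricePerMBofRAM). The sketch below evaluates that expression. The input values are illustrative late-1990s figures chosen as assumptions for this example; they do not come from this deck.

```python
def break_even_reference_interval_s(pages_per_mb_ram: float,
                                    accesses_per_s_per_disk: float,
                                    price_per_disk: float,
                                    price_per_mb_ram: float) -> float:
    """Seconds between references at which caching a page costs the same as re-reading it."""
    technology_ratio = pages_per_mb_ram / accesses_per_s_per_disk
    economic_ratio = price_per_disk / price_per_mb_ram
    return technology_ratio * economic_ratio


# Assumed inputs: 8KB pages (128 pages per MB), 64 random accesses/s per disk,
# $2000 per disk drive, $15 per MB of RAM.
interval = break_even_reference_interval_s(128, 64, 2000.0, 15.0)
print(f"break-even interval: {interval:.0f} s (about {interval / 60:.1f} minutes)")
```

With these assumed inputs the interval comes out at a little under five minutes, which is where the rule gets its name. Pages re-referenced more often than that are cheaper to keep in memory.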
Optimal Index Page Size
IndexPageBenefitCost = IndexPageUtility / IndexPageAccessCost
IndexPageAccessCost gives the cost of fetching the page from disk

Benefit Cost Ratio and Record Sizes
[Figure: benefit cost ratio versus page size (KB), 0 to 128 KB]
Benefit Cost Ratio and Disk Characteristics
[Figure: benefit cost ratio versus page size (KB), 0 to 128 KB, one curve per disk]
7ms, 20MB/s, 10,000RPM Disk
6ms, 25MB/s, 10,000RPM Disk
5.5ms, 30MB/s, 15,000RPM Disk
4.5ms, 40MB/s, 15,000RPM Disk

Index Page Size Summary
Current trends lead to increasing page sizes:
Increased record sizes lead to increased page sizes
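
A rough sketch of how a benefit cost curve of this kind could be computed for the four disks listed above. Several modelling choices here are assumptions rather than content from the deck: utility is taken as log2 of the number of index entries per page, each entry is assumed to be 20 bytes, the millisecond figure for each disk is read as its positioning latency, and access cost is that latency plus the page transfer time.

```python
import math


def index_page_benefit_cost(page_kb: float, latency_ms: float,
                            transfer_mb_per_s: float,
                            entry_bytes: int = 20) -> float:
    """IndexPageUtility / IndexPageAccessCost under the assumptions stated above."""
    entries_per_page = page_kb * 1024 / entry_bytes
    utility = math.log2(entries_per_page)               # assumed utility measure
    transfer_ms = page_kb / 1024 / transfer_mb_per_s * 1000
    access_cost_ms = latency_ms + transfer_ms           # cost of fetching the page
    return utility / access_cost_ms


# (latency in ms, transfer rate in MB/s) for the four disks named in the figure.
DISKS = [(7.0, 20.0), (6.0, 25.0), (5.5, 30.0), (4.5, 40.0)]
PAGE_SIZES_KB = range(8, 129, 8)


def best_page_kb(latency_ms: float, transfer_mb_per_s: float) -> int:
    """Page size (KB) with the highest benefit cost ratio among the candidates."""
    return max(PAGE_SIZES_KB,
               key=lambda kb: index_page_benefit_cost(kb, latency_ms, transfer_mb_per_s))


for latency, rate in DISKS:
    print(f"{latency}ms, {rate}MB/s -> best page size {best_page_kb(latency, rate)} KB")
```

The curve is fairly flat near its maximum, so the exact optimum depends on the assumed entry size and on how the latency figure is interpreted.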
Outline
RAID Introduction
RAID 0 (striping)
Stripes (interleaves) data across multiple disks
No redundancy – more disks worse MTTF
Potentially provides N * Bandwidth with N drives
$\mathrm{MTTF}_{\mathrm{ARRAY}} = \mathrm{MTTF}_{\mathrm{DISK}} / N$
Disk 0  Disk 1  Disk 2  Disk 3
B0      B1      B2      B3
B4      B5      B6      B7
B8      B9      B10     B11

RAID 1 (mirroring)
Mirrors (duplicates) data across multiple disks
Data will survive N-1 disk failures
Performance increase for reads
Potential performance loss on writes
Increases space usage by 2x
$\mathrm{MTTF}_{\mathrm{ARRAY}} = (\mathrm{MTTF}_{\mathrm{DISK}})^2 / (2 \cdot \mathrm{MTTR}_{\mathrm{DISK}})$ (given a 2 disk set)
Disk 0  Disk 1
B0      B0
B1      B1
B2      B2
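
The two MTTF expressions for striping and for a two-disk mirror can be evaluated with a short sketch. The function names and the disk MTTF and MTTR figures are assumptions picked for illustration.

```python
def mttf_striped_array(mttf_disk_h: float, n_disks: int) -> float:
    """Striping without redundancy: any single disk failure loses the array."""
    return mttf_disk_h / n_disks


def mttf_mirrored_pair(mttf_disk_h: float, mttr_disk_h: float) -> float:
    """Two-disk mirror: data is lost only if the second disk fails during repair."""
    return mttf_disk_h ** 2 / (2 * mttr_disk_h)


# Assumed figures: 100,000 hours MTTF per disk, 10 hours to repair a failed disk.
MTTF_DISK_H = 100_000.0
MTTR_DISK_H = 10.0

print("4-disk stripe  :", mttf_striped_array(MTTF_DISK_H, 4), "hours")
print("2-disk mirror  :", mttf_mirrored_pair(MTTF_DISK_H, MTTR_DISK_H), "hours")
```

With these placeholder numbers the four-disk stripe has a quarter of the single-disk MTTF, while the mirrored pair is several orders of magnitude more reliable than one disk.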
RAID 5 (striping with distributed parity)
Also known as poor-man’s mirroring
Data striped across disks with parity
Data will survive 1 disk loss (reconstruct using parity)
Read performance equal to RAID 0 – striping
Write performance worse due to writing parity
More economical in terms of capacity than mirroring
$\mathrm{MTTF}_{\mathrm{ARRAY}} = (\mathrm{MTTF}_{\mathrm{DISK}})^2 / (N \cdot (G - 1) \cdot \mathrm{MTTR}_{\mathrm{DISK}})$
Disk 0  Disk 1  Disk 2  Disk 3
B0      B1      B2      P0
B3      B4      P1      B5
B6      P2      B7      B8

Disks are becoming bigger, faster, and cheaper
Disk accesses are precious
Cache random data that is re-used in 5 minutes
Cache sequential data that is re-used in 1 minute
Current page sizes are growing (16-64KB)
RAID offers redundancy as well as faster accesses
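
A minimal sketch of the parity scheme, assuming byte-wise XOR parity and a parity block that moves one disk to the left on each successive stripe, as in the four-disk layout shown. The function names, block contents, and the MTTF and MTTR figures are illustrative assumptions. N and G follow the formula above, with G read here as the parity group size.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def parity_disk(stripe: int, n_disks: int) -> int:
    """Disk that holds parity for a stripe: last disk first, then rotating left."""
    return (n_disks - 1 - stripe) % n_disks


def mttf_parity_array(mttf_disk_h: float, mttr_disk_h: float, n: int, g: int) -> float:
    """MTTF_ARRAY = MTTF_DISK^2 / (N * (G - 1) * MTTR_DISK)."""
    return mttf_disk_h ** 2 / (n * (g - 1) * mttr_disk_h)


# One stripe of three data blocks (placeholder contents) plus its parity block.
data = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
parity = xor_blocks(data)

# Simulate losing the disk holding data[1]; rebuild it from the survivors and parity.
rebuilt = xor_blocks([data[0], data[2], parity])
print("rebuilt block matches:", rebuilt == data[1])
print("parity disks for stripes 0..2:", [parity_disk(s, 4) for s in range(3)])
```

XOR is its own inverse, so the same routine computes parity and reconstructs a missing block. That is also why a small write has to touch the parity block as well as the data block.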
References
P. M. Chen, E. K. Lee, G. A. Gibson, R. H. Katz, and D. A. Patterson. “RAID: High-Performance, Reliable Secondary Storage,” ACM Computing Surveys, 26(2):145-185, June 1994
J. Gray and G. R. Putzolu. “The 5 Minute Rule for Trading Memory for Disk Accesses and The 10 Byte Rule for Trading Memory for CPU Time,” SIGMOD 1987, pp. 395-398
J. Gray and G. Graefe. “The 5 minute rule, ten years later,” SIGMOD Record 26(4):63-68, 1997
J. Gray and P. Shenoy. “Rules of Thumb in Data Engineering,” ICDE 2000, April 2000, San Diego
C. Ruemmler and J. Wilkes. “An Introduction to Disk Drive Modeling,” IEEE Computer, 27(3), March 1994
A. Silberschatz, H. Korth, and S. Sudarshan. “Database System Concepts,” Chapter 10, McGraw-Hill, 1998