Disks and Databases: Outline

15-823
Advanced Topics in Database Systems Performance
Disks and Databases

Outline
  Memory Hierarchy Overview
  Disk Drive Characteristics
  Disk Block Accesses
  Trends and the Five Minute Rule
  Disk Arrays (RAID)

Disks vs. Main Memory
Capacity:
  Main memory size is some orders of magnitude smaller than what large databases need
Economics:
  Cost/MB for disks is much less than for memory
  Today: Disks = $10/GB, Memory = $1/MB
Durability:
  Main memory is volatile
  Disks are inherently durable, storing persistent objects
  Recovery begins with what is found in durable storage

The Storage Hierarchy
[Figure: the storage hierarchy, drawn as access time against typical capacity. Levels: main memory (electronic RAM and bulk storage); online external storage (magnetic/optical disks); archival storage (automated archives, e.g. optical disk jukeboxes, tape robots, etc.).]

Outline
  Memory Hierarchy Overview
  Disk Drive Characteristics
    Physical Characteristics
    Disk Block Addressing
    Disk Block Scheduling
  Disk Block Accesses
  Trends and the Five Minute Rule
  Disk Arrays (RAID)

What's Inside A Disk Drive?
[Figure: an opened disk drive, labeled: spindle, arm, platters, actuator, electronics, SCSI connector. Image courtesy of Seagate Technology Corporation.]

Disk Components
[Figure: disk drive components, labeled: read/write head, arm, upper surface, platter, lower surface, cylinder, track, sector, actuator.]

Top view of a single disk platter
[Figure: top view of one platter.]

[Figure sequence: top view of a platter during two reads. One line per slide, with the timeline segments shown on that slide in parentheses.]
  Surface organized into tracks
  Tracks broken up into sectors
  Disk head position
  Rotation is counter-clockwise
  About to read blue sector
  After reading blue sector (After BLUE read)
  Red request scheduled next (After BLUE read)
  Seek to Red's track (After BLUE read, Seek for RED)
  Wait for Red sector to reach head (After BLUE read, Seek for RED, Rotational latency)
  Read Red sector (After BLUE read, Seek for RED, Rotational latency, After RED read)

Response time for disks
Access time: (service time for a disk access)
  Command + Seek + Rotation + Transfer
Response time:
  Queue time + Access time

Seek Time
Time required to move head over desired track
A seek has up to four components
  Accelerate
  Coast at max velocity
    Only if going far enough to reach max velocity
  Decelerate
  Settle onto correct track
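
The two sums on the "Response time for disks" slide can be sketched in a few lines of Python. The function names and the millisecond figures are illustrative assumptions, not values from the lecture or measurements of a real drive.

```python
def access_time_ms(command_ms, seek_ms, rotation_ms, transfer_ms):
    """Service time for one disk access: Command + Seek + Rotation + Transfer."""
    return command_ms + seek_ms + rotation_ms + transfer_ms


def response_time_ms(queue_ms, access_ms):
    """Time seen by the requester: Queue time + Access time."""
    return queue_ms + access_ms


# Assumed example values, chosen only to make the arithmetic visible.
access = access_time_ms(command_ms=0.5, seek_ms=5.0, rotation_ms=3.0, transfer_ms=0.25)
print("access time:", access, "ms")                             # 8.75 ms
print("response time:", response_time_ms(10.0, access), "ms")   # 18.75 ms
```

With these assumed numbers the mechanical terms (seek and rotation) dominate the access, and queueing can easily exceed the access time itself.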

A real seek profile
[Figure: Full Seek Curve (Quantum Atlas III). Seek Time [ms], 0 to 16, plotted against Seek Distance [Cylinders], 0 to 8000.]

Rotational Latency
Time required for the first desired sector to reach head
Depends on rotation speed
  Measured in Revolutions Per Minute (RPM)
Computing average rotational latency
  Average rotational latency is time for 1/2 revolution
Example: 7200 RPM
  One rotation = 60s / 7200 = 8.33 ms
  Average rotational latency = 8.33 ms / 2 = 4.17 ms
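
A short sketch of the arithmetic on the "Rotational Latency" slide, assuming the requested sector is equally likely to be anywhere on the track, so the average wait is half a revolution. The function names are hypothetical.

```python
def rotation_time_ms(rpm):
    """Time for one full revolution, in milliseconds."""
    return 60_000.0 / rpm


def average_rotational_latency_ms(rpm):
    """Average wait for the first desired sector: half a revolution."""
    return rotation_time_ms(rpm) / 2.0


for rpm in (7200, 10_000, 15_000):
    print(rpm, "RPM:",
          round(rotation_time_ms(rpm), 2), "ms per rotation,",
          round(average_rotational_latency_ms(rpm), 2), "ms average latency")
# 7200 RPM gives 8.33 ms per rotation and 4.17 ms average latency.
```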

Modern disk performance characteristics
Seek times: 1-15ms, depending on distance
  Average 5-6ms
  Improving at 7-10% per year
Rotation speeds: 10,000-15,000 RPM
  Average latency of 3ms
  Improving at 7-10% per year
Data rates: 15-30 MB/s, depending on zone
  Average sector transfer time of 100-200us
  Improving at 40+% per year

Storage Device Interface
[Figure: OS/Database's view of the storage device as a row of numbered blocks: … 5 6 7 12 23 …]
Storage exposed as linear array of blocks
  Common block size: 512 bytes
  Number of blocks: device capacity / block size
  Blocks accessed by Logical Block Number (LBN)
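
The linear-array view on the "Storage Device Interface" slide reduces to integer division. A minimal sketch, assuming 512-byte blocks and a decimal 9 GB capacity picked purely for illustration; the function names are hypothetical.

```python
BLOCK_SIZE = 512  # bytes, the common block size named on the slide


def num_blocks(capacity_bytes, block_size=BLOCK_SIZE):
    """Number of blocks = device capacity / block size."""
    return capacity_bytes // block_size


def lbn_for_offset(byte_offset, block_size=BLOCK_SIZE):
    """Return (LBN, offset within that block) for a byte offset on the device."""
    return byte_offset // block_size, byte_offset % block_size


print(num_blocks(9 * 10**9))        # 17578125 blocks on an assumed 9 GB device
print(lbn_for_offset(1_000_000))    # (1953, 64)
```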

Logical Block Number Mappings
[Figure: logical block number mappings.]

Outline
  Memory Hierarchy Overview
  Disk Drive Characteristics
  Disk Block Accesses
    Scheduling
    File/Record Layout
    Non-volatile Write Buffers
    Logging
  Trends and the Five Minute Rule
  Disk Arrays (RAID)

SCAN scheduling
The SCAN algorithm
  Introduced in 1967
  Start at one end of the disk, sweep to the other, then reverse direction, and repeat
  Service requests as they pass by
The CSCAN algorithm
  Like SCAN, but at end go straight back to start
  Fairer than SCAN for requests at either end
[Figure: head movement under SCAN and CSCAN.]

LOOK scheduling
The LOOK and CLOOK algorithms
  Like SCAN and CSCAN, but don't blindly go to end of disk
  Instead, stop and start only where there are actual requests
  Sometimes called "Elevator" scheduling
[Figure: head movement under LOOK and CLOOK.]
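
The service order produced by LOOK and CLOOK can be sketched as follows. SCAN and CSCAN serve requests in the same order and differ only in travelling on to the edge of the disk before turning. The request queue, starting head position, and function names are made up for illustration.

```python
def look_order(requests, head, direction=1):
    """LOOK: sweep in `direction`, then reverse at the last pending request."""
    ahead = sorted(r for r in requests if (r - head) * direction >= 0)
    behind = sorted(r for r in requests if (r - head) * direction < 0)
    if direction > 0:
        return ahead + behind[::-1]   # up, then back down
    return ahead[::-1] + behind       # down, then back up


def clook_order(requests, head):
    """CLOOK: sweep upward only, then jump to the lowest pending request."""
    ahead = sorted(r for r in requests if r >= head)
    behind = sorted(r for r in requests if r < head)
    return ahead + behind


def total_seek_distance(order, head):
    """Cylinders travelled when serving `order` starting from `head`."""
    distance = 0
    for cylinder in order:
        distance += abs(cylinder - head)
        head = cylinder
    return distance


queue = [98, 183, 37, 122, 14, 124, 65, 67]   # assumed cylinder numbers
start = 53                                    # assumed head position
print(look_order(queue, start))    # [65, 67, 98, 122, 124, 183, 37, 14]
print(clook_order(queue, start))   # [65, 67, 98, 122, 124, 183, 14, 37]
print(total_seek_distance(look_order(queue, start), start))    # 299
print(total_seek_distance(clook_order(queue, start), start))   # 322
```

In this example CLOOK travels farther in total, which is the price it pays for treating requests at both ends of the disk more evenly.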

SSTF scheduling
Shortest-Seek-Time-First (SSTF)
  Pick the request that would incur the shortest seek time
  Often done by assuming that distance predicts time
Very "greedy" algorithm
  Generally best for average response time
  Down-side: Can be very unfair and even cause starvation

File/Record Organization
Lay out blocks on disk in the expected access pattern
Example: Sequential Access Optimization
  Store blocks sequentially on the same or adjacent cylinders
  Allocate large extents (physically contiguous block ranges)
  Allows for some growth in file size
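
A minimal sketch of SSTF that, like the slide, assumes seek distance predicts seek time. The queue and head position are invented for illustration, and `sstf_order` is a hypothetical name.

```python
def sstf_order(requests, head):
    """Greedy SSTF: always serve the pending request closest to the head."""
    pending = list(requests)
    order = []
    while pending:
        nearest = min(pending, key=lambda cylinder: abs(cylinder - head))
        pending.remove(nearest)
        order.append(nearest)
        head = nearest
    return order


def total_seek_distance(order, head):
    """Cylinders travelled when serving `order` starting from `head`."""
    distance = 0
    for cylinder in order:
        distance += abs(cylinder - head)
        head = cylinder
    return distance


queue = [98, 183, 37, 122, 14, 124, 65, 67]   # assumed cylinder numbers
order = sstf_order(queue, 53)
print(order)                            # [65, 67, 37, 14, 98, 122, 124, 183]
print(total_seek_distance(order, 53))   # 236
```

The request for cylinder 183 is served last here; if new requests kept arriving near the head it could wait indefinitely, which is the starvation risk the slide mentions.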

Remember Slotted-Pages (from System-R)
Records are stored sequentially
Offsets to start of each record stored at end of page
[Figure: page layout: PAGE HEADER | RH1 1237 Jane 30 | RH2 4322 John 45 | RH3 1563 Jim 20 | RH4 7658 Susan 52]

RID  SSN   Name   Age  etc.
1    1237  Jane   30
2    4322  John   45
3    1563  Jim    20
4    7658  Susan  52
5    2543  Leon   43
6    8791  Dan    37

A Record in a Slotted-Page
Record layout: Header | Fixed-Length Fields | Variable-Length Fields
Header: record length, null bitmap, offsets to variable-length fields
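
A toy sketch of the slotted-page idea: record bytes are appended sequentially, and a slot directory (conceptually growing from the end of the page) records where each one starts. The class name, the 4-byte slot cost, and the `|`-separated record encoding are assumptions made for illustration; this is not System R's actual on-disk format.

```python
class SlottedPage:
    SLOT_BYTES = 4  # assumed space cost of one slot-directory entry

    def __init__(self, size=4096):
        self.size = size
        self.data = bytearray()   # record bytes, stored sequentially
        self.slots = []           # (offset, length) per record

    def free_space(self):
        """Bytes left between the record area and the slot directory."""
        return self.size - len(self.data) - self.SLOT_BYTES * len(self.slots)

    def insert(self, record):
        """Append a record and return its slot number."""
        if len(record) + self.SLOT_BYTES > self.free_space():
            raise ValueError("page full")
        self.slots.append((len(self.data), len(record)))
        self.data += record
        return len(self.slots) - 1

    def read(self, slot):
        """Fetch a record through its slot entry."""
        offset, length = self.slots[slot]
        return bytes(self.data[offset:offset + length])


page = SlottedPage(size=64)
jane = page.insert(b"1237|Jane|30")
john = page.insert(b"4322|John|45")
print(page.read(john))     # b'4322|John|45'
print(page.free_space())   # 32
```

Because callers hold slot numbers rather than byte offsets, a real implementation can move records within the page and only update the slot directory.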

Non-Volatile Write Buffers
Non-volatile RAM (NVRAM) can be used to speed up disk writes
  Non-volatile RAM retains changes after power is lost
How it works:
  Database issues disk write
  Disk controller writes contents to NVRAM and returns immediately
  Disk controller writes contents of NVRAM to disk in background
  On crash-recovery contents of NVRAM buffers flushed to disk

Log Disks
Devote a disk as a log disk
All writes are sequential (to end of the log)
  Eliminates seeks
  Allows large sequential writes, uses disk bandwidth well
Data written out to main disks in the background
On crash-recovery end of log is examined
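
The NVRAM write path on the "Non-Volatile Write Buffers" slide can be modelled with two dictionaries. This is a toy simulation under stated assumptions: `NvramWriteBuffer` and its methods are hypothetical names, and a Python dict stands in for both the battery-backed RAM and the disk.

```python
class NvramWriteBuffer:
    def __init__(self):
        self.nvram = {}   # LBN -> data; stands in for non-volatile RAM
        self.disk = {}    # LBN -> data; stands in for the disk itself

    def write(self, lbn, data):
        """Controller stores the block in NVRAM and acknowledges immediately."""
        self.nvram[lbn] = data
        return "ack"

    def flush(self):
        """Background destage to disk; crash recovery runs the same step."""
        flushed = len(self.nvram)
        self.disk.update(self.nvram)
        self.nvram.clear()
        return flushed

    def read(self, lbn):
        """Serve the newest copy: NVRAM first, then disk."""
        return self.nvram.get(lbn, self.disk.get(lbn))


controller = NvramWriteBuffer()
controller.write(7, b"new page image")
print(controller.read(7))    # b'new page image', served from NVRAM
print(controller.flush())    # 1 block written to disk
print(controller.disk[7])    # b'new page image'
```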

Outline
  Memory Hierarchy Overview
  Disk Drive Characteristics
  Disk Block Accesses
  Trends and the Five Minute Rule
  Disk Arrays (RAID)

Current Trends
1. Capacity / Accesses per Second ratio increasing by 10x/decade
  Disk accesses becoming more and more precious
2. Capacity / Bandwidth ratio increasing by 10x/decade
  Disk data must become cooler (fewer accesses/byte stored)

1. Reducing number of disk accesses
Use a few large transfers instead of many small ones
  Disk page size is growing (2KB to 8KB+ in last decade)
Favor sequential transfers over random ones
  Reduces number of seeks, uses disk bandwidth better
Make use of disk mirroring (for redundancy)
  Optimize for number of I/Os rather than for space (space is cheap)

2. Disk data must become cooler
Disks of 1990:
  50 Kaps (KB accesses per second) to 1GB of data
  1 Kaps per 20MB
  $10 / MB for disk storage
Disks of today:
  120 Kaps to 80GB of data
  1 Kaps per 500MB
  < $1 / MB for RAM storage
So disk data today must be 25x cooler than in 1990
  1990s disk data can live in RAM today
  Large main memories can help cool disk data
  Mirroring can also help spread out read accesses

Summary of Disk Trends
Disk capacity increases 100x per decade
Disk page size increases 5x per decade
Disk data cools at 10x per decade
In 10 years RAM will cost what disks do today

Remember The Storage Hierarchy
[Figure: the storage hierarchy again (access time against typical capacity), annotated with "Stale Data" and "Fresh Data". Levels: main memory (electronic RAM and bulk storage); online external storage (magnetic/optical disks); archival storage (automated archives, e.g. optical disk jukeboxes, tape robots, etc.).]

Caching: Location, Location, and Location
The movement of data through the hierarchy is guided by locality
Locality of active data:
  Data that have recently been referenced will very likely be referenced again
Locality of passive data:
  Data that have not been referenced recently will most likely not be referenced in the future

Disk accesses are precious
Example:
  Disk costs $1200 and does 120 accesses/second
  Each access/second costs $10
  One can buy 10MB of RAM for $10
  Good investment if a 10MB cache would save one access per second
In general:

$$\text{BreakEvenReferenceInterval (seconds)} = \underbrace{\frac{\text{PagesPerMBofRAM}}{\text{AccessesPerSecPerDisk}}}_{\text{Technology Ratio}} \times \underbrace{\frac{\text{PricePerDisk}}{\text{PricePerMBofRAM}}}_{\text{Economic Ratio}}$$

Break Even Reference Interval
Random Workloads:
  Technology Ratio: ~1-2
  Economic Ratio: 100-400
  Break Even Interval: 2-5 minutes
Sequential Workloads:
  Technology Ratio: ~0.1
  Break Even Interval: 10-40 seconds
Example (TPC-C from 1997)
  PagesPerMBofRAM = 128 pages/MB (8KB Pages)
  AccessesPerSecPerDisk = 64
  PricePerDisk = $2000 (9GB Disk)
  PricePerMBofRAM = $15/MB
  BreakEvenInterval = (128 / 64) x (2000 / 15) = 2 x 133.3 = 266.7 seconds, about 4½ minutes

The 5 minute and the 1 minute rule
5 Minute Rule:
  Cache randomly accessed pages that are re-used every 5 minutes.
1 Minute Rule:
  Cache sequentially accessed pages that are re-used every 1 minute.
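
The break-even formula from the "Disk accesses are precious" slide, evaluated on the TPC-C figures from the "Break Even Reference Interval" slide. The function and parameter names are Python spellings of the slide's terms; nothing beyond the slide's own numbers is assumed.

```python
def break_even_interval_seconds(pages_per_mb_ram, accesses_per_sec_per_disk,
                                price_per_disk, price_per_mb_ram):
    """Technology ratio times economic ratio, in seconds."""
    technology_ratio = pages_per_mb_ram / accesses_per_sec_per_disk
    economic_ratio = price_per_disk / price_per_mb_ram
    return technology_ratio * economic_ratio


# TPC-C example from 1997: 8KB pages, 64 accesses/s, $2000 disk, $15/MB RAM.
interval = break_even_interval_seconds(128, 64, 2000, 15)
print(round(interval, 1), "seconds")        # 266.7 seconds
print(round(interval / 60, 1), "minutes")   # 4.4 minutes
```

A page that is re-referenced more often than this interval is cheaper to keep in RAM than to fetch from disk each time.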

Optimal Index Page Size
Index Page Utility:
  Gives number of levels of a binary tree that fit on a page
  Increases with log2 of page size
  IndexPageUtility = log2(EntriesPerPage)
Index Page Access Cost:
  Gives the cost of fetching the page from disk
  IndexPageAccessCost = DiskLatency + PageSize/DiskTransferRate
Index Page Benefit/Cost:
  Ratio between the page utility and the page access cost
  IndexPageBenefitCost = IndexPageUtility/IndexPageAccessCost

Benefit Cost Ratio and Record Sizes
[Figure: Benefit Cost Ratio vs. Page Size (7ms, 20MB/s, 10,000RPM Disk). Benefit Cost Ratio, 0 to 1, plotted against Page Size (KB), 0 to 128, with one curve each for 16, 32, 64, and 128 Byte Records.]
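
The three formulas on the "Optimal Index Page Size" slide can be evaluated directly. This sketch assumes 16-byte index entries, a 20 MB/s transfer rate, and a 10 ms DiskLatency (a 7 ms seek plus roughly 3 ms of rotational latency at 10,000 RPM); the latency figure and the function names are assumptions, and the absolute ratios depend on the units chosen, so they need not match the scale of the lecture's plot.

```python
import math

KB = 1024


def index_page_utility(page_size_bytes, entry_size_bytes):
    """IndexPageUtility = log2(EntriesPerPage)."""
    return math.log2(page_size_bytes / entry_size_bytes)


def index_page_access_cost_ms(page_size_bytes, disk_latency_ms, transfer_mb_per_s):
    """IndexPageAccessCost = DiskLatency + PageSize / DiskTransferRate (ms)."""
    transfer_ms = page_size_bytes / (transfer_mb_per_s * 1_000_000) * 1000
    return disk_latency_ms + transfer_ms


def index_page_benefit_cost(page_size_bytes, entry_size_bytes,
                            disk_latency_ms, transfer_mb_per_s):
    """IndexPageBenefitCost = IndexPageUtility / IndexPageAccessCost."""
    utility = index_page_utility(page_size_bytes, entry_size_bytes)
    cost = index_page_access_cost_ms(page_size_bytes, disk_latency_ms,
                                     transfer_mb_per_s)
    return utility / cost


page_sizes = [2 * KB, 4 * KB, 8 * KB, 16 * KB, 32 * KB, 64 * KB, 128 * KB]
ratios = {size: index_page_benefit_cost(size, 16, 10.0, 20.0)
          for size in page_sizes}
best = max(ratios, key=ratios.get)
for size in page_sizes:
    print(size // KB, "KB:", round(ratios[size], 3))
print("best page size:", best // KB, "KB")   # 32 KB under these assumptions
```

The ratio rises while the logarithmic utility gain outpaces the added transfer time, then falls once transfer time becomes comparable to the fixed latency.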

Benefit Cost Ratio and Disk Characteristics
[Figure: Benefit Cost Ratio vs. Page Size (Across Disk Drives). Benefit Cost Ratio, 0 to 1.4, plotted against Page Size (KB), 0 to 128, with one curve per disk: 7ms, 20MB/s, 10,000RPM; 6ms, 25MB/s, 10,000RPM; 5.5ms, 30MB/s, 15,000RPM; 4.5ms, 40MB/s, 15,000RPM.]

Index Page Size Summary
Current technology dictates pages of ~32-64KB
Current trends lead to increasing page sizes:
  Increased record sizes lead to increased page sizes
  Newer (faster) disks lead to increased page sizes

Outline
  Memory Hierarchy Overview
  Disk Drive Characteristics
  Disk Block Accesses
  Trends and the Five Minute Rule
  Disk Arrays (RAID)
    RAID 0, 1, and 5

RAID Introduction
What is RAID?
  Redundant Array of Independent Disks
  Using multiple disks in an organized useful manner (usually for better reliability, but also better performance)
Why?
  Increased Mean Time to Failure (MTTF) of a storage array
  Increase the performance of a storage array
Large databases may span many, many disks
  Without redundancy, with N disks, MTTF decreases as 1/N
  So with MTTF(1 Disk) = 5-10 yrs, MTTF(100 Disks) is on the order of a month (about 18-37 days)
We will look at RAID 0, 1, and 5
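
Taking the 1/N scaling on the "RAID Introduction" slide at face value, the array MTTF for 100 disks works out as follows. The helper name and the 365-day year are assumptions.

```python
DAYS_PER_YEAR = 365  # assumed; leap years ignored


def array_mttf_days(disk_mttf_years, n_disks):
    """MTTF of N disks without redundancy: single-disk MTTF divided by N."""
    return disk_mttf_years * DAYS_PER_YEAR / n_disks


for years in (5, 10):
    print(years, "year disks, 100 of them:",
          array_mttf_days(years, 100), "days")
# 5 year disks give 18.25 days; 10 year disks give 36.5 days.
```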

RAID 0 - Striping
Stripes (interleaves) data across multiple disks
No redundancy: more disks, worse MTTF
Potentially provides N * Bandwidth with N drives

$$\text{MTTF}_{\text{ARRAY}} = \frac{\text{MTTF}_{\text{DISK}}}{N}$$

Disk 0  Disk 1  Disk 2  Disk 3
B0      B1      B2      B3
B4      B5      B6      B7
B8      B9      B10     B11

RAID 1 - Mirroring
Mirrors (duplicates) data across multiple disks
Data will survive N-1 disk failures
Performance increase for reads
Potential performance loss on writes
Increases space usage by 2x

$$\text{MTTF}_{\text{ARRAY}} = \frac{(\text{MTTF}_{\text{DISK}})^2}{2 \times \text{MTTR}_{\text{DISK}}} \quad \text{(given a 2-disk set)}$$

Disk 0  Disk 1
B0      B0
B1      B1
B2      B2
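
A sketch of the RAID 0 block placement shown in the striping figure (one block per stripe unit, round-robin across disks) together with the two MTTF formulas from the RAID 0 and RAID 1 slides. The 50,000-hour MTTF and 24-hour MTTR are assumed example values, and the function names are hypothetical.

```python
def raid0_location(block, n_disks):
    """Map a logical block to (disk index, row on that disk)."""
    return block % n_disks, block // n_disks


def raid0_mttf(disk_mttf, n_disks):
    """Striping without redundancy: MTTF_DISK / N."""
    return disk_mttf / n_disks


def raid1_pair_mttf(disk_mttf, disk_mttr):
    """Two-disk mirror: MTTF_DISK^2 / (2 * MTTR_DISK)."""
    return disk_mttf ** 2 / (2 * disk_mttr)


print(raid0_location(6, 4))    # (2, 1): B6 sits on Disk 2, second row
print(raid0_location(11, 4))   # (3, 2): B11 sits on Disk 3, third row

mttf_hours, mttr_hours = 50_000, 24          # assumed example values
print(raid0_mttf(mttf_hours, 4))             # 12500.0 hours
print(round(raid1_pair_mttf(mttf_hours, mttr_hours)))   # 52083333 hours
```

Both MTTF functions return the same time unit that is passed in, so MTTF and MTTR must be given in matching units.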

RAID 5 - Distributed Parity
Also known as poor-man's mirroring
Data striped across disks with parity
Data will survive 1 disk loss (reconstruct using parity)
Read performance equal to RAID 0 striping
Write performance worse due to writing parity
More economical in terms of capacity than mirroring

$$\text{MTTF}_{\text{ARRAY}} = \frac{(\text{MTTF}_{\text{DISK}})^2}{N \times (G - 1) \times \text{MTTR}_{\text{DISK}}}$$

where N is the number of disks in the array and G is the number of disks in a parity group.

Disk 0  Disk 1  Disk 2  Disk 3
B0      B1      B2      P0
B3      B4      P1      B5
B6      P2      B7      B8

Summary
Disks are becoming bigger, faster, and cheaper
Disk accesses are precious
  Cache random data that is re-used within 5 minutes
  Cache sequential data that is re-used within 1 minute
Current page sizes are growing (16-64KB)
RAID offers redundancy as well as faster accesses
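
The "reconstruct using parity" step on the RAID 5 slide is conventionally a byte-wise XOR, which is what this sketch assumes; the slide itself does not name the parity function. The block contents, the example MTTF and MTTR values, and the function names are made up for illustration.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def raid5_mttf(disk_mttf, disk_mttr, n_disks, group_size):
    """MTTF_DISK^2 / (N * (G - 1) * MTTR_DISK)."""
    return disk_mttf ** 2 / (n_disks * (group_size - 1) * disk_mttr)


# First stripe of the figure: B0, B1, B2 and their parity block P0.
b0, b1, b2 = b"\x01\x02", b"\x10\x20", b"\xff\x00"
p0 = xor_blocks([b0, b1, b2])

# Suppose Disk 1 fails: rebuild B1 from the surviving blocks and the parity.
rebuilt_b1 = xor_blocks([b0, b2, p0])
print(rebuilt_b1 == b1)   # True

# Assumed example: 4 disks forming one parity group, 50,000 h MTTF, 24 h MTTR.
print(round(raid5_mttf(50_000, 24, n_disks=4, group_size=4)))   # 8680556 hours
```

XOR works here because every block cancels itself out, so XOR-ing the parity with the surviving data blocks leaves exactly the missing one.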

References
P. M. Chen, E. K. Lee, G. A. Gibson, R. H. Katz, and D. A. Patterson. "RAID: High-Performance, Reliable Secondary Storage," ACM Computing Surveys, 26(2):145-185, June 1994
J. Gray and G. R. Putzolu. "The 5 Minute Rule for Trading Memory for Disk Accesses and The 10 Byte Rule for Trading Memory for CPU Time," SIGMOD 1987, 395-398
J. Gray and G. Graefe. "The 5 minute rule, ten years later," SIGMOD Record 26(4):63-68, 1997
J. Gray and P. Shenoy. "Rules of Thumb in Data Engineering," ICDE 2000, April 2000, San Diego
C. Ruemmler and J. Wilkes. "An Introduction to Disk Drive Modeling," IEEE Computer, 27(3), March 1994
A. Silberschatz, H. Korth, and S. Sudarshan. "Database System Concepts," Chapter 10, McGraw-Hill, 1998
