
15-823

Advanced Topics in Database Systems Performance

Disks and Databases

Outline

 Memory Hierarchy Overview
 Disk Drive Characteristics
 Disk Block Accesses
 Trends and the Five Minute Rule
 Disk Arrays (RAID)

Disks vs. Main Memory

 Capacity:
   Main memory size is some orders of magnitude smaller than what large databases need
 Economics:
   Cost/MB for disks is much less than memory
   Today: Disks = $10/GB, Memory = $1/MB
 Durability:
   Main memory is volatile
   Disks are inherently durable, storing persistent objects
   Recovery begins with what is found in durable storage

The Storage Hierarchy

[Figure: storage hierarchy pyramid; axes: Access Time, Typical capacity; levels from top to bottom: Electronic storage (RAM: main memory and bulk storage), Online external storage (magnetic/optical disks), Archival storage (automated archives, e.g. optical disk jukeboxes, tape robots, etc.)]

Outline

 Memory Hierarchy Overview
 Disk Drive Characteristics
   Physical Characteristics
   Disk Block Addressing
   Disk Block Scheduling
 Disk Block Accesses
 Trends and the Five Minute Rule
 Disk Arrays (RAID)

What’s Inside A Disk Drive?

[Figure: photograph of an opened disk drive; labeled parts: Spindle, Arm, Platters, Actuator, Electronics, SCSI connector. Image courtesy of Seagate Technology Corporation]
Disk Components

Top view of a single disk platter

[Figures: disk components and top view of a single platter; labels: Read/Write Head, Arm, Upper Surface, Platter, Lower Surface, Cylinder, Track, Sector, Actuator]

[Figure sequence: top view of a disk platter, one step per slide]

 Surface organized into tracks
 Tracks broken up into sectors
 Disk head position
 Rotation is counter-clockwise
 About to read blue sector
 After reading blue sector
 Red request scheduled next
 Seek to Red’s track
 Wait for Red sector to reach head
 Read Red sector

Timeline shown under the figures: After BLUE read, Seek for RED, Rotational latency, After RED read
Response time for disks

 Access time: (service time for a disk access)
   Command + Seek + Rotation + Transfer
 Response time:
   Queue time + Access time

Seek Time

 Time required to move head over desired track
 A seek has up to four components
   Accelerate
   Coast at max velocity
     Only if going far enough to reach max velocity
   Decelerate
   Settle onto correct track
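
The two sums on the "Response time for disks" slide can be written out directly. The sketch below is only illustrative: the function names and the sample timings are assumptions chosen for the example, not measurements from the lecture.

```python
def access_time_ms(command_ms, seek_ms, rotation_ms, transfer_ms):
    """Service time for one disk access: Command + Seek + Rotation + Transfer."""
    return command_ms + seek_ms + rotation_ms + transfer_ms


def response_time_ms(queue_ms, access_ms):
    """Response time seen by the requester: Queue time + Access time."""
    return queue_ms + access_ms


# Hypothetical timings, in milliseconds, for a single random access.
access = access_time_ms(command_ms=0.5, seek_ms=5.5, rotation_ms=3.0, transfer_ms=0.15)
response = response_time_ms(queue_ms=4.0, access_ms=access)
print(f"access time   = {access:.2f} ms")
print(f"response time = {response:.2f} ms")
```

With numbers in this range, seek and rotation dominate the access time, which is why the later slides focus on avoiding or shortening them.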

A real seek profile

[Figure: Full Seek Curve (Quantum Atlas III); x-axis: Seek Distance [Cylinders], 0 to 8000; y-axis: Seek Time [ms], 0 to 16]

Rotational Latency

 Time required for the first desired sector to reach head
 Depends on rotation speed
   Measured in Revolutions Per Minute (RPM)
 Computing average rotational latency
   Average rotational latency is time for 1/2 revolution
 Example: 7200 RPM
   One rotation = 60s / 7200 = 8.33 ms
   Average rotational latency = 4.17 ms
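
The rotational latency arithmetic can be checked for other spindle speeds with a few lines of Python. This is a minimal sketch; the function names are made up for the example.

```python
def rotation_time_ms(rpm):
    """Time for one full revolution, in milliseconds."""
    return 60_000.0 / rpm


def avg_rotational_latency_ms(rpm):
    """Average rotational latency: the time for half a revolution."""
    return rotation_time_ms(rpm) / 2.0


for rpm in (7200, 10_000, 15_000):
    print(f"{rpm:>6} RPM: rotation = {rotation_time_ms(rpm):.2f} ms, "
          f"average latency = {avg_rotational_latency_ms(rpm):.2f} ms")
```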

Modern disk performance characteristics

 Seek times: 1-15ms, depending on distance
   Average 5-6ms
   Improving at 7-10% per year
 Rotation speeds: 10,000-15,000 RPM
   Average latency of 3ms
   Improving at 7-10% per year
 Data rates: 15-30 MB/s, depending on zone
   Average sector transfer time of 100-200us
   Improving at 40+% per year

Storage Device Interface

[Figure: OS/Database’s view of storage device, a row of numbered blocks: … 5 6 7 12 23 …]

 Storage exposed as linear array of blocks
   Common block size: 512 bytes
   Number of blocks: device capacity / block size
 Blocks accessed by Logical Block Number (LBN)
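
To make the linear block array concrete, the sketch below computes the number of blocks on a device and the range of LBNs touched by a byte range. The helper names and the 9 GB (decimal) capacity are assumptions for illustration only.

```python
BLOCK_SIZE = 512  # bytes, the common block size named on the slide


def num_blocks(capacity_bytes, block_size=BLOCK_SIZE):
    """Number of whole blocks: device capacity / block size."""
    return capacity_bytes // block_size


def lbn_range(offset, length, block_size=BLOCK_SIZE):
    """First and last LBN covered by `length` bytes starting at byte `offset`."""
    if length <= 0:
        raise ValueError("length must be positive")
    first = offset // block_size
    last = (offset + length - 1) // block_size
    return first, last


print(num_blocks(9 * 10**9))   # blocks on a hypothetical 9 GB device
print(lbn_range(1000, 100))    # bytes 1000..1099 fall in blocks 1 and 2
```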
Logical Block Number Mappings

[Figure: logical block number mappings]

Outline

 Memory Hierarchy Overview
 Disk Drive Characteristics
 Disk Block Accesses
   Scheduling
   File/Record Layout
   Non-volatile Write Buffers
   Logging
 Trends and the Five Minute Rule
 Disk Arrays (RAID)

SCAN scheduling

 The SCAN algorithm
   Introduced in 1967
   Start at one end of the disk, sweep to the other, then reverse direction, and repeat
   Service requests as they pass by
 The CSCAN algorithm
   Like SCAN, but at end go straight back to start
   Fairer than SCAN for requests at either end

[Figure: head movement under SCAN and CSCAN]

LOOK scheduling

 The LOOK and CLOOK algorithms
   Like SCAN and CSCAN, but don’t blindly go to end of disk
   Instead, stop and start only where there are actual requests
   Sometimes called “Elevator” scheduling

[Figure: head movement under LOOK and CLOOK]
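
The four sweep policies differ only in where the head turns around, which a small simulation makes visible. The sketch below is a simplified model under stated assumptions: requests are cylinder numbers, the head sweeps toward higher cylinders first, no new requests arrive during the sweep, and the return stroke of CSCAN/CLOOK counts as head travel. The function names and the request queue are hypothetical.

```python
def head_path(requests, head, policy, max_cyl):
    """Cylinders the head visits, in order, sweeping upward first."""
    ahead = sorted(c for c in requests if c >= head)
    behind = sorted(c for c in requests if c < head)
    if policy == "SCAN":    # run to the last cylinder, then reverse
        return ahead + [max_cyl] + behind[::-1]
    if policy == "CSCAN":   # run to the end, jump to cylinder 0, sweep up again
        return ahead + [max_cyl, 0] + behind
    if policy == "LOOK":    # reverse at the last pending request
        return ahead + behind[::-1]
    if policy == "CLOOK":   # jump back to the lowest pending request
        return ahead + behind
    raise ValueError(f"unknown policy: {policy}")


def head_travel(path, head):
    """Total cylinders crossed while following `path` from `head`."""
    total, pos = 0, head
    for cyl in path:
        total += abs(cyl - pos)
        pos = cyl
    return total


queue = [98, 183, 37, 122, 14, 124, 65, 67]   # hypothetical pending requests
for policy in ("SCAN", "CSCAN", "LOOK", "CLOOK"):
    path = head_path(queue, head=53, policy=policy, max_cyl=199)
    print(f"{policy:5} travel = {head_travel(path, 53)} cylinders")
```

For this queue the LOOK variants travel less than their SCAN counterparts because they skip the empty stretches at the edges of the disk.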

SSTF scheduling

 Shortest-Seek-Time-First (SSTF)
   Pick the request that would incur the shortest seek time
   Often done by assuming that distance predicts time
 Very “greedy” algorithm
 Generally best for average response time
 Down-side: Can be very unfair and even cause starvation

File/Record Organization

 Layout blocks on disk in expected access pattern
 Example: Sequential Access Optimization
   Store blocks sequentially on the same or adjacent cylinders
   Allocate large extents (physically contiguous block ranges)
     Allows for some growth in file size
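
Returning to SSTF scheduling, the greedy choice can be sketched in a few lines. As the slide notes, this version assumes that seek distance predicts seek time; the function name and the request queue are hypothetical, and no new requests arrive while the queue drains.

```python
def sstf_order(requests, head):
    """Serve pending cylinder requests greedily, nearest cylinder first."""
    pending = list(requests)
    order, pos = [], head
    while pending:
        # Distance stands in for seek time; ties go to the earlier request.
        nearest = min(pending, key=lambda cyl: abs(cyl - pos))
        pending.remove(nearest)
        order.append(nearest)
        pos = nearest
    return order


queue = [98, 183, 37, 122, 14, 124, 65, 67]
print(sstf_order(queue, head=53))
```

In a live system, requests that keep arriving near the head can postpone a distant request indefinitely, which is the starvation risk mentioned on the slide.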
Remember Slotted-Pages (from System-R)

 Records are stored sequentially
 Offsets to start of each record stored at end of page

RID  SSN   Name   Age  etc.
1    1237  Jane   30
2    4322  John   45
3    1563  Jim    20
4    7658  Susan  52
5    2543  Leon   43
6    8791  Dan    37

[Figure: slotted page holding the first four records; contents in order: PAGE HEADER, RH1 1237 Jane 30, RH2 4322 John 45, RH3 1563 Jim 20, RH4 7658 Susan 52]

A Record in a Slotted-Page

[Figure: record layout: Header, Fixed-Length Fields, Variable-Length Fields; header annotations: null bitmap, record length, offsets to variable-length fields]
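
The slotted-page idea (records packed from the front of the page, offsets kept at the end) can be sketched as follows. This is a minimal illustration, not the System-R format: the 4-byte page header, the 4-byte slot layout, and the class and method names are all assumptions made for the example, and deletion and compaction are left out.

```python
import struct


class SlottedPage:
    """Records grow forward from the header; the slot array grows back from the end."""

    HEADER = struct.Struct("<HH")  # assumed header: (slot count, free-space offset)
    SLOT = struct.Struct("<HH")    # assumed slot entry: (record offset, record length)

    def __init__(self, size=8192):
        self.buf = bytearray(size)
        self.HEADER.pack_into(self.buf, 0, 0, self.HEADER.size)

    def _slot_pos(self, slot):
        # Slot 0 sits in the last bytes of the page, slot 1 just before it, etc.
        return len(self.buf) - (slot + 1) * self.SLOT.size

    def insert(self, record):
        """Append a record and return its slot number."""
        count, free = self.HEADER.unpack_from(self.buf, 0)
        slot_pos = self._slot_pos(count)
        if free + len(record) > slot_pos:
            raise ValueError("page full")
        self.buf[free:free + len(record)] = record
        self.SLOT.pack_into(self.buf, slot_pos, free, len(record))
        self.HEADER.pack_into(self.buf, 0, count + 1, free + len(record))
        return count

    def read(self, slot):
        """Return the record stored in the given slot."""
        count, _ = self.HEADER.unpack_from(self.buf, 0)
        if not 0 <= slot < count:
            raise IndexError("no such slot")
        offset, length = self.SLOT.unpack_from(self.buf, self._slot_pos(slot))
        return bytes(self.buf[offset:offset + length])


page = SlottedPage()
rid = page.insert(b"1237|Jane|30")
page.insert(b"4322|John|45")
print(rid, page.read(rid))
```

Because readers reach a record only through its slot, a record can be moved within the page by updating one slot entry, leaving its record ID unchanged.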

Non-Volatile Write Buffers

 Non-volatile RAM (NVRAM) can be used to speed up disk writes
 Non-volatile RAM retains changes after power is lost
 How it works:
   Database issues disk write
   Disk controller writes contents to NVRAM and returns immediately
   Disk controller writes contents of NVRAM to disk in background
   On crash-recovery contents of NVRAM buffers flushed to disk

Log Disks

 Devote a disk as a log disk
 All writes are sequential (to end of the log)
   Eliminates seeks
   Allows large sequential writes, uses disk bandwidth well
 Data written out to main disks in the background
 On crash-recovery end of log is examined

Outline

 Memory Hierarchy Overview
 Disk Drive Characteristics
 Disk Block Accesses
 Trends and the Five Minute Rule
 Disk Arrays (RAID)

Current Trends

1. Capacity / Accesses per Second ratio increasing by 10x/decade
    Disk accesses becoming more and more precious
2. Capacity / Bandwidth ratio increasing by 10x/decade
    Disk data must become cooler (fewer accesses/byte stored)
1. Reducing number of disk accesses

 Use a few large transfers instead of many small ones
   Disk page size is growing (2KB – 8KB+ in last decade)
 Favor sequential transfers to random ones
   Reduces number of seeks, uses disk bandwidth better
 Make use of disk mirroring (for redundancy)
 Optimize for number of I/Os rather than for space (space is cheap)

2. Disk data must become cooler

 Disks of 1990:
   50 Kaps (KB accesses per second) to 1GB of data
   1 Kaps per 20MB
   $10 / MB for disk storage
 Disks of today:
   120 Kaps to 80GB of data
   1 Kaps per 500MB
   < $1 / MB for RAM storage
 So disk data today must be 25x cooler than in 1990
   1990s disk data can live in RAM today
   Large main memories can help cool disk data
   Mirroring can also help spread out read accesses

Summary of Disk Trends

 Disk capacity increases 100x per decade
 Disk page size increases 5x per decade
 Disk data cools at 10x per decade
 In 10 years RAM will cost what disks do today

Remember The Storage Hierarchy

[Figure: storage hierarchy pyramid, annotated with "Fresh Data" and "Stale Data"; axis: Typical capacity; levels from top to bottom: Electronic storage (RAM: main memory and bulk storage), Online external storage (magnetic/optical disks), Archival storage (automated archives, e.g. optical disk jukeboxes, tape robots, etc.)]

Caching: Location, Location, and Location

The movement of data through the hierarchy is guided by locality

 Locality of active data:
   Data that have recently been referenced will very likely be referenced again
 Locality of passive data:
   Data that have not been referenced recently will most likely not be referenced in the future

Disk accesses are precious

 Example:
   Disk costs $1200 and does 120 accesses/second
   Each access/second costs $10
   One can buy 10MB of RAM for $10
   Good investment if a 10MB cache would save one access per second
 In general:

$$\text{BreakEvenReferenceInterval (seconds)} = \underbrace{\frac{\text{PagesPerMBofRAM}}{\text{AccessesPerSecPerDisk}}}_{\text{Technology Ratio}} \times \underbrace{\frac{\text{PricePerDisk}}{\text{PricePerMBofRAM}}}_{\text{Economic Ratio}}$$
Break Even Reference Interval

 Random Workloads:
   Technology Ratio: ~1 – 2
   Economic Ratio: 100 – 400
   Break Even Interval: 2 – 5 minutes
 Sequential Workloads:
   Technology Ratio: ~0.1
   Break Even Interval: 10 – 40 seconds
 Example (TPC-C from 1997)
   PagesPerMBofRAM = 128 pages/MB (8KB Pages)
   AccessesPerSecPerDisk = 64
   PricePerDisk = $2000 (9GB Disk)
   PricePerMBofRAM = $15/MB
   BreakEvenInterval = 2 x 133.3 ≈ 266.7 seconds ≈ 4½ minutes

The 5 minute and the 1 minute rule

 5 Minute Rule:
  Cache randomly accessed pages that are re-used every 5 minutes.
 1 Minute Rule:
  Cache sequentially accessed pages that are re-used every 1 minute.
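
The break-even interval is the product of the technology ratio and the economic ratio, so the TPC-C example can be reproduced in a few lines. This is a minimal sketch; the function and parameter names are chosen for the example and simply mirror the slide's variable names.

```python
def break_even_interval_s(pages_per_mb_ram, accesses_per_sec_per_disk,
                          price_per_disk, price_per_mb_ram):
    """Break-even reference interval in seconds."""
    technology_ratio = pages_per_mb_ram / accesses_per_sec_per_disk
    economic_ratio = price_per_disk / price_per_mb_ram
    return technology_ratio * economic_ratio


# TPC-C (1997) numbers from the slide: 8KB pages, 64 accesses/s, $2000 disk, $15/MB RAM.
interval = break_even_interval_s(128, 64, 2000, 15)
print(f"break-even interval = {interval:.1f} s (about {interval / 60:.1f} minutes)")
```

A page that is re-referenced more often than this interval is cheaper to keep in RAM than to fetch from disk each time.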

Optimal Index Page Size

 Index Page Utility:
   Gives number of levels of a binary tree that fit on a page
   Increases with log2 of page size
  IndexPageUtility = log2(EntriesPerPage)
 Index Page Access Cost:
   Gives the cost of fetching the page from disk
  IndexPageAccessCost = DiskLatency + PageSize/DiskTransferRate
 Index Page Benefit/Cost:
   Ratio between the page utility and the page access cost
  IndexPageBenefitCost = IndexPageUtility/IndexPageAccessCost

Benefit Cost Ratio and Record Sizes

[Figure: Benefit Cost Ratio vs. Page Size (7ms, 20MB/s, 10,000RPM Disk); x-axis: Page Size (KB), 0 to 128; y-axis: Benefit Cost Ratio, 0 to 1; curves: 16 Byte Records, 32 Byte Records, 64 Byte Records, 128 Byte Records]
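
The three index-page formulas can be evaluated over a range of page sizes to see where the benefit/cost ratio peaks. The sketch below rests on several assumptions: each index entry is taken to be as large as one record, the disk latency is a hypothetical 10 ms, the transfer rate is 20 MB/s with 1 MB = 10^6 bytes, and the ratio is left unnormalized, so its absolute values do not match the plot's 0 to 1 scale.

```python
import math


def index_page_benefit_cost(page_size, entry_size, disk_latency_s, transfer_rate):
    """IndexPageUtility / IndexPageAccessCost for one page size (sizes in bytes)."""
    entries_per_page = page_size // entry_size
    utility = math.log2(entries_per_page)                   # binary-tree levels per page
    access_cost = disk_latency_s + page_size / transfer_rate
    return utility / access_cost


page_sizes = [2**k * 1024 for k in range(1, 8)]             # 2KB .. 128KB
ratios = {p: index_page_benefit_cost(p, 16, 0.010, 20e6) for p in page_sizes}
best = max(ratios, key=ratios.get)
for p in page_sizes:
    print(f"{p // 1024:>4} KB: {ratios[p]:7.1f}")
print(f"best page size under these assumptions: {best // 1024} KB")
```

The curve is flat near its maximum, so neighboring page sizes cost little; that is consistent with the slides quoting a range rather than a single optimum.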

Benefit Cost Ratio and Disk Characteristics

[Figure: Benefit Cost Ratio vs. Page Size (Across Disk Drives); x-axis: Page Size (KB), 0 to 128; y-axis: Benefit Cost Ratio, 0 to 1.4; curves: 7ms, 20MB/s, 10,000RPM Disk; 6ms, 25MB/s, 10,000RPM Disk; 5.5ms, 30MB/s, 15,000RPM Disk; 4.5ms, 40MB/s, 15,000RPM Disk]

Index Page Size Summary

 Current technology dictates pages of ~32-64KB
 Current trends lead to increasing page sizes:
   Increased record sizes lead to increased page sizes
   Newer (faster) disks lead to increased page sizes
Outline

 Memory Hierarchy Overview
 Disk Drive Characteristics
 Disk Block Accesses
 Trends and the Five Minute Rule
 Disk Arrays (RAID)
   RAID 0, 1, and 5

RAID Introduction

 What is RAID?
   Redundant Array of Independent Disks
   Using multiple disks in an organized useful manner (usually for better reliability, but also better performance)
 Why?
   Increased Mean Time to Failure (MTTF) of a storage array
   Increased performance of a storage array
   Large databases may span many, many disks
   Without redundancy, with N disks, MTTF decreases as 1/N
   So with MTTF(1 Disk) = 5-10 yrs, MTTF(100 Disks) = 1-2 months
 We will look at RAID 0, 1, and 5

RAID 0 - Striping

 Stripes (interleaves) data across multiple disks
 No redundancy – more disks worse MTTF
 Potentially provides N * Bandwidth with N drives

$$\mathrm{MTTF}_{\mathrm{ARRAY}} = \frac{\mathrm{MTTF}_{\mathrm{DISK}}}{N}$$

Disk 0  Disk 1  Disk 2  Disk 3
B0      B1      B2      B3
B4      B5      B6      B7
B8      B9      B10     B11

RAID 1 – Mirroring

 Mirrors (duplicates) data across multiple disks
 Data will survive N-1 disk failures
 Performance increase for reads
 Potential performance loss on writes
 Increases space usage by 2x

$$\mathrm{MTTF}_{\mathrm{ARRAY}} = \frac{(\mathrm{MTTF}_{\mathrm{DISK}})^2}{2 \cdot \mathrm{MTTR}_{\mathrm{DISK}}} \quad \text{(given a 2 disk set)}$$

Disk 0  Disk 1
B0      B0
B1      B1
B2      B2
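
The striping layout and the two MTTF formulas can be tried out numerically. The sketch below uses hypothetical inputs (a 100,000-hour disk MTTF and a 24-hour repair time) purely for illustration; the function names are made up for the example.

```python
def raid0_location(block, n_disks):
    """(disk, row) of logical block `block` when striped round-robin over n_disks."""
    return block % n_disks, block // n_disks


def raid0_mttf(mttf_disk, n_disks):
    """Array MTTF without redundancy: MTTF_DISK / N."""
    return mttf_disk / n_disks


def raid1_mttf(mttf_disk, mttr_disk):
    """Array MTTF of a 2-disk mirror: MTTF_DISK^2 / (2 * MTTR_DISK)."""
    return mttf_disk ** 2 / (2 * mttr_disk)


print(raid0_location(5, 4))              # B5 sits on Disk 1, second row
mttf_h, mttr_h = 100_000, 24             # hypothetical values, in hours
print(f"RAID 0, 4 disks: {raid0_mttf(mttf_h, 4):,.0f} hours")
print(f"RAID 1, 2 disks: {raid1_mttf(mttf_h, mttr_h):,.0f} hours")
```

The mirror's MTTF grows with the square of the disk MTTF because data is lost only if the second disk fails during the short repair window of the first.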

RAID 5 – Distributed Parity

 Also known as poor-man’s mirroring
 Data striped across disks with parity
 Data will survive 1 disk loss (reconstruct using parity)
 Read performance equal to RAID 0 – striping
 Write performance worse due to writing parity
 More economical in terms of capacity than mirroring

$$\mathrm{MTTF}_{\mathrm{ARRAY}} = \frac{(\mathrm{MTTF}_{\mathrm{DISK}})^2}{N \cdot (G - 1) \cdot \mathrm{MTTR}_{\mathrm{DISK}}}$$

Disk 0  Disk 1  Disk 2  Disk 3
B0      B1      B2      P0
B3      B4      P1      B5
B6      P2      B7      B8

Summary

 Disks are becoming bigger, faster, and cheaper
 Disk accesses are precious
   Cache random data that is re-used in 5 minutes
   Cache sequential data that is re-used in 1 minute
 Current page sizes are growing (16-64KB)
 RAID offers redundancy as well as faster accesses
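
For the RAID 5 slide, parity-based reconstruction can be illustrated with byte-wise XOR, the usual way RAID 5 parity is computed. The sketch below makes some assumptions: blocks in a stripe have equal length, the function names are made up for the example, and G in the MTTF formula is read as the parity group size, following the Chen et al. survey in the references, since the slide does not define it.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def raid5_mttf(mttf_disk, mttr_disk, n_disks, group_size):
    """MTTF_DISK^2 / (N * (G - 1) * MTTR_DISK); G assumed to be the parity group size."""
    return mttf_disk ** 2 / (n_disks * (group_size - 1) * mttr_disk)


# One stripe: data blocks B0..B2 and their parity P0.
b0, b1, b2 = b"\x01\x02", b"\x10\x20", b"\xff\x00"
p0 = xor_blocks([b0, b1, b2])

# Suppose the disk holding B1 is lost: rebuild it from the survivors and the parity.
rebuilt = xor_blocks([b0, b2, p0])
print(rebuilt == b1)
```

The same XOR identity explains the write penalty: a small write has to update the parity block along with the data block.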
References

 P. M. Chen, E. K. Lee, G. A. Gibson, R. H. Katz, and D. A. Patterson. “RAID: High-Performance, Reliable Secondary Storage,” ACM Computing Surveys, 26(2):145-185, June 1994
 J. Gray and G. R. Putzolu. “The 5 Minute Rule for Trading Memory for Disk Accesses and The 10 Byte Rule for Trading Memory for CPU Time,” SIGMOD 1987, pp. 395-398
 J. Gray and G. Graefe. “The 5 minute rule, ten years later,” SIGMOD Record, 26(4):63-68, 1997
 J. Gray and P. Shenoy. “Rules of Thumb in Data Engineering,” ICDE 2000, April 2000, San Diego
 C. Ruemmler and J. Wilkes. “An Introduction to Disk Drive Modeling,” IEEE Computer, 27(3), March 1994
 A. Silberschatz, H. Korth, and S. Sudarshan. “Database System Concepts,” Chapter 10, McGraw-Hill, 1998
