
I/O Systems: Devices, Buses, & Queues

You ARE the weakest link!

“I/O certainly has been lagging in the last decade.”
- Seymour Cray (1976)
“Also, I/O needs a lot of work.”
- David Kuck, 15th ISCA (1988)

1
Today’s Menu:
 I/O Systems
 Design
 Performance: Throughput vs Latency
 Basic Disk Drive Anatomy
 Busses
 Types of busses
 Design Choices
 Arbitration

2
The Big Picture: Where are We Now?
 Today’s Topic: I/O Systems
[Figure: two computers connected by a Network; each contains a Processor (Control and Datapath), Memory, Input, and Output]

3
I/O System Design Issues
 Performance
 Expandability
 Resilience in the face of failure
[Figure: a Processor with its Cache sits on a shared Memory-I/O Bus together with Main Memory and three I/O Controllers (serving two Disks, Graphics, and a Network); the controllers signal the Processor through interrupts]

4
Application Performance
 1996 - 1997
 CPU performance improves by
 N = 400/200 = 2
 program performance improves by
 N = 100/55 = 1.82
 1997 - 1998
 CPU performance - factor of 2
 program performance
 N = 55/32.5 = 1.7
 1998 - 1999
 CPU performance - factor of 2
 program performance
 N = 32.5/21.25 = 1.53
 1999 - 2000
 CPU performance - factor of 2
 program performance
 N = 21.25/15.6 = 1.36

[Figure: stacked bar chart of CPU Time and I/O Time in seconds (0 to 100) for 1996 through 2000]
5
Performance for Web Surfing
 Assume 50 seconds CPU & 50 seconds I/O
 1996 - 1997
 CPU performance improves by
 N = 400/200 = 2
 program performance improves by
 N = 100/75 = 1.33
 1997 - 1998
 CPU performance - factor of 2
 program performance
 N = 75/62.5 = 1.2
 1998 - 1999
 CPU performance - factor of 2
 program performance
 N = 62.5/56.25 = 1.11

[Figure: stacked bar chart of CPU Time and I/O Time in seconds (0 to 100) for 1996 through 2000]
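The speedups on the last two slides follow from halving only the CPU portion of the run time each year while the I/O time stays fixed. A minimal sketch of that arithmetic follows. The 90 s CPU / 10 s I/O split for the first workload is inferred from its 100, 55, 32.5 sequence rather than stated on the slide, and the function name is made up for illustration.

```python
def yearly_speedups(cpu_s, io_s, years, cpu_factor=2.0):
    """Return (total_time, speedup_vs_previous_year) for each year.

    Only the CPU part shrinks by cpu_factor per year; I/O time is constant.
    """
    results = []
    prev_total = cpu_s + io_s
    for _ in range(years):
        cpu_s /= cpu_factor
        total = cpu_s + io_s
        results.append((total, prev_total / total))
        prev_total = total
    return results


# Assumed split for the "Application Performance" slide: 90 s CPU, 10 s I/O.
app = yearly_speedups(90.0, 10.0, years=4)
# Split stated on the "Web Surfing" slide: 50 s CPU, 50 s I/O.
web = yearly_speedups(50.0, 50.0, years=3)

for name, rows in (("application", app), ("web surfing", web)):
    for total, speedup in rows:
        print(f"{name}: total = {total:.2f} s, speedup = {speedup:.2f}")
```

Even though the CPU doubles in speed every year, the yearly program speedup keeps shrinking toward 1 because the fixed I/O time makes up a growing share of the total.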
6
I/O Device Examples

Device            Behavior         Partner   Data Rate (KB/sec)
Keyboard          Input            Human     0.01
Mouse             Input            Human     0.02
Printer           Output           Human     3.00
Floppy disk       Storage          Machine   50.00
Laser Printer     Output           Human     100.00
Optical Disk      Storage          Machine   500.00
Magnetic Disk     Storage          Machine   5,000.00
Network-LAN       Input or Output  Machine   20 – 1,000.00
Graphics Display  Output           Human     30,000.00

7
I/O System Performance

 I/O System performance depends on many aspects of the system (“limited by weakest link in the chain”):
 The CPU
 The memory system:
 Internal and external caches
 Main Memory
 The underlying interconnection (buses)
 The I/O controller
 The I/O device
 The speed of the I/O software (Operating System)
 The efficiency of the software’s use of the I/O devices
 Two common performance metrics:
 Throughput: I/O bandwidth
 Response time: Latency
8
Throughput versus Response Time
[Figure: response time in ms (0 to 300) plotted against percentage of maximum throughput (20% to 100%)]

9
What’s Inside A Disk Drive?
[Figure: disk drive internals, labeled Spindle, Arm, Platters, Actuator, Electronics, and SCSI]
Image courtesy of Seagate Technology Corporation
10


Magnetic Disk
 Purpose:
 Long term, nonvolatile storage
 Large, inexpensive, and slow
 Lowest level in the memory hierarchy
[Figure: memory hierarchy, from top to bottom: Registers, Cache, Memory, Disk]
 Two major types:
 Floppy disk
 Hard disk
 Both types of disks:
 Rely on a rotating platter coated with a magnetic surface
 Use a moveable read/write head to access the disk
 Advantages of hard disks over floppy disks:
 Platters are more rigid (metal or glass) so they can be larger
 Higher density because a rigid platter can be controlled more precisely
 Higher data rate because the platter spins faster
 Can incorporate more than one platter
11
And If You Look More Closely

[Figure: stacked Platters with Tracks and Sectors marked]
Two sides: write on top and bottom.
Cylinders: the set of corresponding tracks on all the platters.
12
Organization of a Hard Magnetic Disk

[Figure: Platters with one Track and one Sector highlighted]
 Typical numbers (depending on the disk size):
 500 to 2,000 tracks per surface
 32 to 128 sectors per track
 A sector is the smallest unit that can be read or written
 Traditionally all tracks have the same number of sectors:
 Constant bit density: record more sectors on the outer tracks
 Recently relaxed: constant bit size, speed varies with track location
13
Disk Drive Performance: the Numbers
 Seek time
 move head to the desired track
 today’s drives - 5 to 15 ms
 average seek = (sum of the times for all possible seeks) / (no. of possible seeks)
 actual average seek = 25% to 33% of the advertised average, due to locality
 Rotational latency
 today’s drives - 5,400 to 12,000 RPM
 approximately 11 ms to 5 ms per full rotation
 average rotational latency = (0.5)(time for one rotation)
 Transfer time
 time to transfer a sector (1 KB/sector)
 function of rotation speed, recording density
 today’s drives - 10 to 40 MBytes/second
 Controller time
 overhead that the drive electronics add to manage the drive
 but also gives prefetching and caching
[Figure: disk geometry, labeled Track, Sector, Cylinder, Platter, and Head]

14
Disk Drive Performance (cont.)
 Average access time =
 (seek time) + (rotational latency) + (transfer) + (controller time)
 Track and cylinder skew
 cylinder switch time
 delay to change from one cylinder to the next
 may have to wait an extra rotation
 solution - drives incorporate skew
 offset sectors between cylinders to account for switch time
 head switch time
 change heads to go from one track to next on same cylinder
 incur additional settling time
 Prefetching
 disks usually read an entire track at a time
 assumes that request for the next sector will come soon
 Caching
 limited amount of caching across requests, but prefetching is preferred
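The skew described above can be estimated from the switch time and the time one sector spends under the head. A rough sketch follows. The 5400 RPM speed, 64 sectors per track, and 1 ms switch time are made-up illustrative values, and `skew_sectors` is a hypothetical helper, not a drive API.

```python
import math


def skew_sectors(rpm, sectors_per_track, switch_time_ms):
    """Sectors to offset the next track so that its first sector arrives
    under the head just after a head or cylinder switch completes."""
    rotation_ms = 60_000.0 / rpm                 # time for one full rotation
    sector_ms = rotation_ms / sectors_per_track  # time one sector is under the head
    return math.ceil(switch_time_ms / sector_ms)


# Illustrative numbers only (assumptions, not taken from a real drive).
skew = skew_sectors(rpm=5400, sectors_per_track=64, switch_time_ms=1.0)
print(f"offset the next track by {skew} sectors")
```

Without the offset, the head would arrive just after the first sector of the new track had passed and would have to wait almost a full extra rotation.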
15
Example

 Disk characteristics
 512 byte sector, rotate at 5400 RPM, advertised average seek is 12 ms,
transfer rate is 4 MB/sec, controller overhead is 1 ms,
queue idle so no queuing delay
 Disk access time = ?
 Access Time = Seek time + Rotational Latency + Transfer time
+ Controller Time + Queuing Delay
Access Time = 12 ms + 0.5 rotation / 5400 RPM + 0.5 KB / 4 MB/s + 1 ms + 0 ms
= 12 ms + 0.5 rotation / 90 RPS + 0.125 / 1024 s + 1 ms + 0 ms
= 12 ms + 5.6 ms + 0.1 ms + 1 ms + 0 ms
= 18.7 ms
Be very careful about the units. For example, rotations per minute (RPM) are
converted to rotations per second (RPS) above so that the “rotations” cancel
and the result comes out in seconds.
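The same calculation can be written as a short script that keeps the unit conversions explicit. This is a sketch, and the function and parameter names are chosen here for illustration. Carried at full precision the sum is about 18.7 ms, with seek and rotation dominating and the transfer itself contributing only about 0.1 ms.

```python
def disk_access_time_ms(seek_ms, rpm, sector_bytes, transfer_mb_per_s,
                        controller_ms, queue_ms=0.0):
    """Average access time in ms = seek + rotational latency + transfer
    + controller overhead + queuing delay."""
    rotations_per_s = rpm / 60.0
    # On average the wanted sector is half a rotation away.
    rotational_ms = 0.5 / rotations_per_s * 1000.0
    # The worked example treats 1 MB as 1024 KB and 1 KB as 1024 bytes.
    bytes_per_s = transfer_mb_per_s * 1024 * 1024
    transfer_ms = sector_bytes / bytes_per_s * 1000.0
    return seek_ms + rotational_ms + transfer_ms + controller_ms + queue_ms


total = disk_access_time_ms(seek_ms=12, rpm=5400, sector_bytes=512,
                            transfer_mb_per_s=4, controller_ms=1)
print(f"access time = {total:.2f} ms")
```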
16
ASIDE: Disk I/O Performance
[Figure: the Processor issues requests at request rate λ into two Queues; each Queue feeds a Disk Controller and its Disk, which serve requests at service rate µ]
 Disk Access Time
 Access time = Seek time + Rotational Latency + Transfer time
+ Controller Time + Queuing Delay
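The queuing delay term depends on how close the request rate λ comes to the service rate µ. As an illustration only, the sketch below assumes a single disk modeled as an M/M/1 queue (random arrivals, exponentially distributed service times), which the slides do not specify. Under that assumption the mean time a request spends in the system is 1/(µ - λ). The 50 requests per second service rate is a hypothetical value.

```python
def mm1_response_time_ms(arrival_per_s, service_per_s):
    """Mean time in system (queue + service) for an M/M/1 queue, in ms.

    Assumption: Poisson arrivals and exponential service times.
    """
    if arrival_per_s >= service_per_s:
        raise ValueError("queue is unstable: arrival rate must be below service rate")
    return 1000.0 / (service_per_s - arrival_per_s)


# Hypothetical disk that can serve 50 requests per second (20 ms each).
service_rate = 50.0
for utilization in (0.2, 0.4, 0.6, 0.8, 0.9):
    arrival_rate = utilization * service_rate
    t = mm1_response_time_ms(arrival_rate, service_rate)
    print(f"utilization {utilization:.0%}: response time = {t:.0f} ms")
```

In this model response time grows slowly at low utilization and sharply as utilization approaches 100%, which is why pushing a disk toward its maximum throughput costs latency.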

17
I/O Benchmarks for Magnetic Disks
 Supercomputer application:
 Large-scale scientific problems => large files
 One large read and many small writes to snapshot computation
 Data Rate: MB/second between memory and disk

 Transaction processing:
 Examples: Airline reservations systems and bank ATMs
 Small changes to large shared software
 I/O Rate: No. disk accesses / second given upper limit for latency

 File system:
 Measurements of UNIX file systems in an engineering environment:
 80% of accesses are to files less than 10 KB
 90% of all file accesses are to data with sequential addresses on disk
 67% of the accesses are reads, 27% writes, 6% read-write
 I/O Rate & Latency: No. disk accesses /second and response time
18
Magnetic Storage Is Cheaper Than Paper

 File cabinet:
cabinet (four drawer)     $250
paper (24,000 sheets)     $250
space (2x3 @ $10/ft2)     $180
total                     $700
3¢/sheet
 Disk:
disk (40 GB)              $100
ASCII = 20 million pages
0.0005¢/sheet (6000x cheaper)
 Capacity (per unit area) doubles every 12 months!
 Conclusion - Store Everything on Disk
Courtesy of Jim Gray, Microsoft Research
19
But What Do We Have To Store?
Information at Your Fingertips™, Information Network™, Knowledge Navigator™
One popular suggestion: Databases
 You might record everything you
 read - 10 MB/day, 400 GB/lifetime
 (eight tapes today)
 hear - 400 MB/day, 16 TB/lifetime
 (three tapes/year today)
 see - 1 MB/s, 40 GB/day, 1.6 PB/lifetime
 (maybe someday)
 All information will be in an online database (somewhere)
Courtesy of Jim Gray, Microsoft Research
20


System-Level View - Bandwidth

[Figure: bandwidth along the path from the Processor out to the Disk]
System Bus (Processor to Memory): 1600 MB/s
PCI: 133 MB/s
SCSI: 40 MB/s
Disk: 10 MB/s
 Disks are pretty far away...
21
System-Level View - Latency

[Figure: latency along the path from the Processor out to the Disk]
Processor: 1 ns
Memory (over the System Bus): 40 ns
Disk (over PCI and SCSI): 7 ms
 And slow too...

22
Busses

 Lots of sub-systems need to communicate
[Figure: CPU, Video, Disk, and Mem all attached to a single Bus]
 Busses: Shared wires for common communication


23
Other Bus Issues

 PRO: System flexibility
 Buy new components and integrate
 Build and integrate new components
 PRO: Shared resource
 No point-to-point interconnects that might not be fully utilized
 CON: Physical constraints
 Performance is limited by physical design
 CON: Standards trail the state of the art
 By the time a standard is fully adopted, it is five years old
 CON: Shared Resource
 Simultaneous usage not possible
24
Bus Classifications

 CPU-memory busses
 Fast
 Proprietary
 Closed and controlled
 Support only memory transactions
 IO busses
 Standardized (SCSI, PCI)
 More diversity
 More length
 Bus Bridges/Adapter
 Cross from one bus to another
[Figure: the CPU, Cache, and Main Memory sit on the CPU-Memory Bus; a Bus Adapter connects it to the I/O Bus, which hosts the IO controllers]

25
Bus Design Decisions

                 High Performance      Low Performance
Structure        Split Addr & Data     Multiplex Addr & Data
Width            Wide                  Narrow
Transfer Size    Large / Flexible      Small
Split Transact.  Yes                   No
Mastering        Multiple bus masters  Single bus master
Clocking         Synchronous           Asynchronous

26
Bus Clocking: Synchronous

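 Synchronous
 Sample the control signals at the clock edge
[Figure: timing diagram; on successive clock cycles the Addr lines carry Addr 0, Addr 1, Addr 2, the Data lines carry Data 0, Data 1, Data 2, and R/~W selects read or write]
 Pro: Fast and High Performance
 Con:
 Can’t be both long and fast at the same time (clock skew)
 All bus members must run at the same clock speed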
27
Bus Clocking: Asynchronous
 Asynchronous
 Edge of control signals determines communication
 “Handshake Protocol”
[Figure: timing diagram of two writes (Addr 0/Data 0, then Addr 1/Data 1) on the Write Req, Addr, Data, and Ack lines, with the four numbered transitions below marked]
1. Request (with actual transaction)
2. Acknowledge causes de-assert of Request
3. De-assert of Request causes de-assert of Ack
4. De-assert of Ack allows re-assertion of Request
28
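The four handshake steps can be traced with a toy model of the two control lines. This is only a sketch of the ordering of events. The class and signal names are invented here, and real hardware reacts to signal edges rather than calling methods in sequence.

```python
class HandshakeBus:
    """Toy four-phase handshake: tracks the Req and Ack lines and logs events."""

    def __init__(self):
        self.req = False
        self.ack = False
        self.memory = {}
        self.log = []

    def write(self, addr, data):
        # Step 4 precondition: Ack must be low before Req may be asserted again.
        assert not self.req and not self.ack, "previous handshake not finished"
        # Step 1: the master asserts Req along with the address and data.
        self.req = True
        self.log.append("1: Req asserted")
        # The slave latches the data, then asserts Ack.
        self.memory[addr] = data
        self.ack = True
        # Step 2: the master sees Ack and de-asserts Req.
        self.req = False
        self.log.append("2: Ack seen, Req de-asserted")
        # Step 3: the slave sees Req low and de-asserts Ack.
        self.ack = False
        self.log.append("3: Req low, Ack de-asserted")
        # Step 4: with Ack low, the bus is ready for the next request.
        self.log.append("4: Ack low, ready for next Req")


bus = HandshakeBus()
bus.write(0x10, "Data 0")
bus.write(0x14, "Data 1")
print("\n".join(bus.log))
```

Each transfer needs two round trips on the control lines (Req up, Ack up, Req down, Ack down), which is the price of not sharing a clock.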
Asynchronous Busses

 Pros:
 No clock
 Slow and fast components on the same bus

 Con:
 Inefficient: two round trips
Like somebody who always repeats what was said to them

[Figure: chart with axes “Clock Skew (bus length)” and “Mixture of IO speeds”; one region is labeled “Synchronous Better”]

29
Structure, Width, and Transfer Length

 Separate vs. Multiplexed Address/Data
 Multiplexed: save wires
 Separate: more performance
 Wide words: higher throughput, less control per transfer
 On-chip cache to CPU busses: 256 bits wide
 Serial Busses
 Data Transfer Length
 More data per address/control transfer
 Example: Multiplexed Addr/Data with Data transfer of 4
Addr/Data: | Addr | Data 0 | Data 1 | Data 2 | Data 3 |
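To see why longer transfers help on a multiplexed bus, count the cycles. The sketch below assumes one bus cycle for each address and one for each data word, which is a simplification not stated on the slide, and `bus_cycles` is a hypothetical helper.

```python
def bus_cycles(words, burst_length):
    """Cycles to move `words` data words over a multiplexed addr/data bus,
    sending one address cycle per burst of up to `burst_length` words.

    Assumption: each address and each data word takes exactly one cycle.
    """
    bursts = -(-words // burst_length)  # ceiling division
    return bursts + words


for burst in (1, 4):
    cycles = bus_cycles(words=4, burst_length=burst)
    print(f"burst of {burst}: {cycles} cycles, "
          f"{4 / cycles:.0%} of cycles carry data")
```

With a transfer length of 4, as in the example above, five cycles move four words; sending an address before every word would take eight.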

30
Split Transactions

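 Problem: Long wait times
[Figure: timing diagram with Clock, Addr, and Data lines; the transaction spans 6 cycles]
 Solution: Split Transaction Bus
[Figure: timing diagram with Clock, Addr, Data, and Tag lines]
Addr: | Addr 0 | Addr 1 | Addr 2 | Addr 3 |
Data: | Data0  | Data1  | Data0  | Data1  |
Tag:  | Tag 0  | Tag 0  | Tag 1  | Tag 1  |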
31
Bus Mastering

 Bus Master: a device that can initiate a bus transfer
[Figure: CPU, Mem, and Disk on a bus, with arrows for the three numbered steps below]
 Example:
1. CPU makes memory request
2. Page Fault in VM requires disk access to load page
3. Move data from disk to memory
 If the CPU is master, does it have to check to see if the disk is ready to transfer?
32
Multiple Bus Masters

 What if multiple devices could initiate transfers?


 Update might take place in background while CPU operates

 Multiple CPUs on shared memory systems

 Challenge: Arbitration
 If two or more masters want the bus at the same time, who gets it?

33
Arbitration Goals

 Functionality
 Prevent bus conflicts (two simultaneous bus drivers)

 Performance
 Need to make decisions quickly

 Priority
 Some masters are more desperate than others
 Example: DRAM refresh

 Fairness
 Every equal-priority master should get equal service
 No “starvation”: every requestor should eventually get the bus

34
Arbitration Options
 Bus Request
 Bus Grant
 Bus Release

 Option 1: Daisy Chain

[Figure: Device 1, Device2, Device3, and Device4 in a chain, with the Grant signal passed from each device to the next]
 Problems:
 Not fair
 Not fast, especially for lowest priority
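A daisy chain can be modeled as a grant that enters at one end and is passed along until the first requesting device keeps it. The sketch below assumes the grant enters at Device 1, so position in the chain fixes the priority. The function name is illustrative.

```python
def daisy_chain_grant(requests):
    """Return the index of the device that wins the bus, or None.

    `requests[i]` is True if device i is requesting. The grant enters at
    device 0 and each non-requesting device passes it to its neighbor.
    """
    for position, wants_bus in enumerate(requests):
        if wants_bus:
            return position  # this device keeps the grant; it goes no further
    return None


# The devices at indices 0 and 2 request continuously for five rounds.
wins = [daisy_chain_grant([True, False, True, False]) for _ in range(5)]
print(wins)
```

The device nearer the start of the chain wins every round, so the later requester starves (not fair), and a grant for the last device has to ripple through every device ahead of it (not fast).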

35
Centralized and Distributed Arbitration

 Centralized: Arbiter
 Require roundtrip communication
[Figure: Device 1, Device2, Device3, and Device4 all connected to one central arbiter]
 Distributed:
 Self-selection
 Faster
 Require duplicated state
[Figure: Device 1, Device2, Device3, and Device4, each with its own Arb logic]
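One way a centralized arbiter can meet the fairness goal is round-robin selection, which is a common technique but not one the slides name. A minimal sketch with invented class and method names:

```python
class RoundRobinArbiter:
    """Central arbiter that rotates priority so no requester starves."""

    def __init__(self, num_devices):
        self.num_devices = num_devices
        self.last_granted = num_devices - 1  # so device 0 is checked first

    def grant(self, requests):
        """Return the index of the granted device, or None if nobody asks.

        The search starts just after the most recent winner.
        """
        for offset in range(1, self.num_devices + 1):
            candidate = (self.last_granted + offset) % self.num_devices
            if requests[candidate]:
                self.last_granted = candidate
                return candidate
        return None


arbiter = RoundRobinArbiter(4)
# The devices at indices 0 and 2 request on every round.
wins = [arbiter.grant([True, False, True, False]) for _ in range(4)]
print(wins)
```

The two requesters alternate, in contrast to a fixed-priority scheme where the same device would win every round. The arbiter has to remember the last winner, which is the kind of state a distributed scheme would need to duplicate in every device.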

36
Summary:

 I/O performance…
 … is limited by weakest link in chain between OS and device
 Disk I/O Benchmarks
 I/O rate vs. Data rate vs. latency
 Three Components of Disk Access Time:
 Seek Time: advertised to be 5 to 15 ms. May be lower in real life.
 Rotational Latency: on average 4 ms at 7200 RPM and 6 ms at 5400 RPM
 Transfer Time: set by the transfer rate, 10 to 40 MB per second

 Busses
 Synchronous vs. Asynchronous
 Serial and Parallel
 Bus Mastering and Arbitration

37
