
Session WV23

Fast and Easy Disk Workload Characterization on VMware ESX Server
Irfan Ahmad
R&D Engineer
VMware, Inc.
Motivation

This session is useful to you if you:


Have a disk-performance-sensitive workload
Want to improve disk performance of your production VMs
Need to communicate better with your SAN admin about
your I/O workload
Want to match different RAID arrays to different workloads
Wish to monitor your disk workload characteristics over time
Like to pick the best filesystem for each job
Abstract and Outline

Disk I/O characterization of applications is the first step in tuning disk subsystems; key questions:
I/O block size
Spatial locality
I/O interarrival period
Active queue depth
Latency
Read/Write ratios
Our technique allows transparent and online collection of essential workload characteristics
Applicable to arbitrary, unmodified operating systems running in virtual machines
We demonstrate our technique using:
Filebench OLTP
Differences between ZFS and UFS filesystems on Solaris
OSDL DBT-2
Windows large file copy
Negligible overheads in CPU, memory, and latency
Target Audience

System administrators interested in optimizing the disk subsystem
Good tool for sysadmins to gather data for the
consumption of SAN administrators
Meant for power users
In its current form, the user interface isn’t very pretty
Command line only; no graphical interface
Data produced is very detailed
Requires minimal post-processing for basic analysis
Workload Characterization Technique

Histograms of observed data values can be much more informative than single numbers such as the mean, median, or standard deviation
E.g., multimodal behaviors are easily identified by plotting a histogram, but obscured by a mean (see the made-up example and the sketch below)
Histograms can actually be calculated efficiently online
Why take one number if you can have a distribution?
Made-up Example
[Figure: made-up latency histogram. X-axis: latency of an operation (microseconds); y-axis: frequency. The distribution has modes near 1 and 10 microseconds, yet the mean is 5.3.]
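To make the point concrete, here is a minimal Python sketch (not from the original slides; the latency values and bin limits are invented) showing how a bimodal distribution vanishes into a single mean but stands out in a histogram.

```python
# Toy illustration with made-up data: the mean hides the two latency
# modes that a histogram makes obvious.
from bisect import bisect_left
from collections import Counter

BIN_LIMITS = [1, 2, 5, 10, 20, 50, 100]   # hypothetical bucket upper bounds (us)

# Invented latencies: one mode near 1-2 us, a second mode near 10 us.
latencies = [1, 1, 2, 1, 2, 1, 9, 10, 11, 10, 1, 2, 10, 9, 1]

mean = sum(latencies) / len(latencies)
hist = Counter(BIN_LIMITS[bisect_left(BIN_LIMITS, v)] for v in latencies)

print(f"mean = {mean:.1f} us")            # one number, no hint of two modes
for limit in BIN_LIMITS:
    print(f"<= {limit:3d} us : {hist.get(limit, 0)}")
```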
Workload Characterization Technique

The ESX disk I/O workload characterization is on a per-virtual-disk basis
Data is collected per virtual disk
Allows us to separate each different type of workload into its own container and observe trends
Histograms only collected if enabled; no overhead otherwise
Technique:
For each virtual machine I/O request in ESX, we insert some values into histograms (a small sketch follows)
E.g., size of I/O request → 4KB
[Figure: example I/O size histogram with buckets at 1024, 2048, 4096, and 8192 bytes; y-axis: frequency.]
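A rough Python sketch of the bookkeeping described above, purely illustrative: the class and method names (VDiskStats, record_io) are hypothetical, and this is not ESX's actual implementation.

```python
# Illustrative sketch of per-virtual-disk online histogram collection.
# VDiskStats and record_io are hypothetical names; ESX's real code differs.
from bisect import bisect_left
from collections import defaultdict

IO_SIZE_BINS = [512, 1024, 2048, 4095, 4096, 8191, 8192, 16383, 16384]

class VDiskStats:
    def __init__(self):
        self.io_size_hist = defaultdict(int)   # bucket upper bound -> count

    def record_io(self, size_bytes):
        i = bisect_left(IO_SIZE_BINS, size_bytes)
        key = IO_SIZE_BINS[i] if i < len(IO_SIZE_BINS) else "overflow"
        self.io_size_hist[key] += 1            # O(log #bins) work per request

# One container per virtual disk keeps each workload separate.
stats = defaultdict(VDiskStats)
stats["vm1:scsi0:0"].record_io(4096)           # e.g. a 4KB request
```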
Workload Characterization Technique
Full List of Histograms

Read/Write distributions are available for our histograms
Overall Read/Write ratio?
Are Writes smaller or larger than Reads in this workload?
Are Reads more sequential than Writes?
Which type of I/O is incurring more latency?
In reality, the problem is not knowing which question to ask
Collect data, see what you find
Full list:
I/O Size: All, Reads, Writes
Seek Distance: All, Reads, Writes
Seek Distance Shortest Among Last 16
Outstanding IOs: All, Reads, Writes
I/O Interarrival Times: All, Reads, Writes
Latency: All, Reads, Writes
Workload Characterization Technique
Histogram Buckets

To make the histograms practical, bin boundaries are on a rather irregular scale
E.g., the I/O length histogram bin boundaries run …, 2048, 4095, 4096, 8191, 8192, … which looks rather odd: some buckets are big and others are as small as just 1
Certain block sizes are really special since the underlying storage subsystems may optimize for them; single those out from the start (else lose that precise information)
E.g., it is important to know if the I/O was 16KB or some other size in the interval (8KB, 16KB); see the sketch after this slide
[Figure: example bin boundaries 2048, 4095, 4096, 8191, 8192, 16383, 16384, 32768.]
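The sketch below illustrates how such irregular buckets keep exact power-of-two sizes distinguishable. The bin-edge list is an assumption modeled on the values shown above, not the exact table ESX uses.

```python
# Irregular bucket boundaries (assumed values, modeled on the slide above):
# exact power-of-two sizes get a one-wide bucket so they remain visible.
from bisect import bisect_left

BIN_EDGES = [512, 1024, 2048, 4095, 4096, 8191, 8192,
             16383, 16384, 32768, 49152, 65535, 65536]

def bucket(size_bytes):
    i = bisect_left(BIN_EDGES, size_bytes)
    return BIN_EDGES[i] if i < len(BIN_EDGES) else ">max"

print(bucket(16384))   # -> 16384 : exactly 16KB, its own bucket
print(bucket(12288))   # -> 16383 : "some size in (8KB, 16KB)"
print(bucket(8192))    # -> 8192  : exactly 8KB
```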
Test Setup

Machine Model            HP DL 585 G2
CPU                      8 CPUs (4 sockets, dual-core) @ 2.4 GHz
Total Memory             8 GB
Hypervisor               VMware ESX Server
Disk Subsystem (FC SAN)  EMC Symmetrix, 500GB RAID
                         QLogic 2340 (Fibre Channel)

Table 1. Machine/storage specifications


Filebench OLTP (Solaris)

Filebench is a model-based workload generator for file systems, developed by Sun Microsystems
The input to this program is a model file that specifies processes and threads in a workflow
Filebench OLTP “personality” is a model to emulate an
Oracle database server generating I/Os under an online
transaction processing workload
Other personalities include fileserver, webserver, etc.
Used two different filesystems (UFS and ZFS)
To study what effect a filesystem can have on I/O characteristics
Ran filebench on Solaris 5.11 (build 55)
I/O Length
Filebench OLTP

[Figure: I/O Length Histogram, UFS. X-axis: length (bytes); y-axis: frequency.]
[Figure: I/O Length Histogram, ZFS. X-axis: length (bytes); y-axis: frequency.]
Annotation: 4K and 8K I/Os transformed into 128K by ZFS?
Seek Distance
Filebench OLTP

Seek distance: a measure of sequentiality versus randomness in a workload
Somehow a random workload is transformed into a sequential one by ZFS! More details needed...

[Figure: Seek Distance Histogram, UFS. X-axis: distance (sectors); y-axis: frequency.]
[Figure: Seek Distance Histogram, ZFS. X-axis: distance (sectors); y-axis: frequency.]
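The slides do not spell out how seek distance is computed; a common definition, assumed in this sketch, is the gap in sectors between an I/O's starting LBA and the end of the previous I/O, so 0 means perfectly sequential and large positive or negative values mean random access.

```python
# Assumed seek-distance definition: gap in sectors between an I/O's start
# LBA and the end of the previous I/O (0 = sequential, large +/- = random).
def seek_distances(ios):
    """ios: iterable of (start_lba, length_in_sectors) tuples."""
    prev_end = None
    for lba, nsectors in ios:
        if prev_end is not None:
            yield lba - prev_end            # negative = backward seek
        prev_end = lba + nsectors

trace = [(1000, 8), (1008, 8), (1016, 8), (500000, 8), (1024, 8)]
print(list(seek_distances(trace)))          # [0, 0, 498976, -498984]
```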
Seek Distance
Filebench OLTP, More Detail: Split Out Reads and Writes

[Figure: Seek Distance Histograms, UFS, writes and reads plotted separately. X-axis: distance (sectors); y-axis: frequency.]
[Figure: Seek Distance Histograms, ZFS, writes and reads plotted separately. X-axis: distance (sectors); y-axis: frequency.]

Transformation from random to sequential: primarily for Writes
Reads: seek distance is reduced (look at histogram shape and scales)
Filebench OLTP
Summary

So, what have we learnt about Filebench OLTP?


I/O is primarily 4K but 8K isn’t uncommon (~30%)
Access pattern is mostly random
Reads are entirely random
Writes do have a forward-leaning pattern
ZFS is able to transform random Writes into sequential:
Aggressive I/O scheduling
Copy-on-write (COW) technique (blocks on disk not modified in place)
Changes to blocks from app writes are written to alternate locations
This streams otherwise random data writes into a sequential pattern on disk
Performed this detailed analysis in just a few minutes
OSDL Database Test 2 (Linux 2.6.17-10)

OSDL DBT-2:
A fair usage implementation of the TPC-C benchmark specification
Simulates a wholesale parts supplier
Several workers access a database, update customer information
and check on parts inventories
More info on DBT-2 @
More info on TPC-C @ http://www.tpc.org/tpcc
Used the PostgreSQL 8.1 RDBMS implementation of DBT-2
http://www.postgresql.org/docs/techdocs
Ubuntu 6.10 server distribution
Linux 2.6.17-10
Scaling factor of 250 (warehouses) with 50 connections
OSDL Database Test 2 (Linux 2.6.17-10)
Analysis
Workload is primarily random (big spikes towards the right and left edges of the graph)
Still, many I/Os are within 500 sectors (20%) or within 5,000 sectors (33%) of the previous command
The workload is almost exclusively 8K for both reads and writes

[Figure: Seek Distance Histogram (Writes). X-axis: distance (sectors); y-axis: frequency.]
[Figure: I/O Length Histogram. X-axis: length (bytes); y-axis: frequency.]
OSDL Database Test 2 (Linux 2.6.17-10)
Analysis (2)
The number of outstanding I/Os is very different in this workload between reads and writes
PostgreSQL almost always issues 32 write I/Os simultaneously
The I/O rate from this workload varies over time, by as much as 15% over a 2-minute period

[Figure: Outstanding I/Os Histogram (Reads, Writes). X-axis: I/Os outstanding at arrival time; y-axis: frequency.]
[Figure: Outstanding I/Os Histogram over time (6-second intervals). X-axis: I/Os outstanding at arrival time; y-axis: frequency.]
OSDL Database Test 2 (Linux 2.6.17-10)
Summary

In aggregate, the workload appears random
But 20% of I/Os are within 250KB and 33% are within 2.4MB of the previous one! (500 sectors × 512 bytes ≈ 250KB; 5,000 sectors × 512 bytes ≈ 2.4MB)
I/O size is 8K for both reads and writes
Outstanding I/Os very different between reads and writes
PostgreSQL almost always issues 32 write I/Os simultaneously
I/O rate varies over time (up to 15%)
Don’t assume that every database workload behaves the
same; measure and determine for yourself
Windows File Copy
XP versus Vista

XP issues 64KB I/Os
I/Os are largely sequential
Vista is issuing very large I/Os (1MB)
Latency is higher
Number of commands is lower
I/Os are very sequential
Vista enables large I/Os to be issued; file copy is just an example
Keep an eye out for increasing I/O sizes in future workloads

[Figure: I/O Latency Histogram, Vista Enterprise vs. XP Pro. X-axis: latency (microseconds); y-axis: frequency.]
[Figure: I/O Length Histogram, Vista Enterprise vs. XP Pro. X-axis: length (bytes); y-axis: frequency.]
[Figure: Seek Distance Histogram, Vista Enterprise vs. XP Pro. X-axis: distance; y-axis: frequency.]
Performance Overhead of Stats Collection

Used Iometer to generate 4KB sequential reads
16 outstanding I/Os
Windows 2003 Enterprise Edition 64-bit VM
4KB is the most realistic worst-case scenario for overheads
Overhead is negligible (tested on internal build)

Online Histo Service             Disabled   Enabled
IOps                             8187       8137
IOps Std. Dev.                   6.5        200
MBps                             35.1       34.8
CPU (out of 800)                 106.0      108.0
CPU Std. Dev.                    2.7        4.8
CPU Efficiency (UsedSec/IOps)    0.0417     0.0424
Latency (ms)                     1.6        1.6

Table 2. Microbenchmark Performance


How to Run

Simple command-line interface; help is self-explanatory


$ /usr/lib/vmware/bin/vscsiStats -h
Use Excel to graph the output or inspect visually
Not detailed enough?
For analysis that cannot be done efficiently online, we provide a
virtual SCSI command tracing framework.
No actual customer data is included in the trace; only a list of (command type, logical block address, time) tuples is recorded
$ vscsiStats -t …
Users can write scripts to post-process and analyze this trace
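Since the slides leave trace post-processing to the user, here is a small hedged example: it assumes the trace has been exported as CSV lines of command type, logical block address, and a timestamp in microseconds (the real `vscsiStats -t` output format may differ) and computes per-command-type interarrival times.

```python
# Example trace post-processing script. The input format assumed here
# (CSV lines of command_type,lba,timestamp_us) is an assumption; adapt the
# parsing to whatever your vscsiStats build actually emits.
import csv
import sys
from collections import defaultdict

def interarrival_times(path):
    last_ts = {}
    gaps = defaultdict(list)
    with open(path) as f:
        for cmd, lba, ts in csv.reader(f):
            ts = int(ts)
            if cmd in last_ts:
                gaps[cmd].append(ts - last_ts[cmd])
            last_ts[cmd] = ts
    return gaps

if __name__ == "__main__":
    for cmd, deltas in sorted(interarrival_times(sys.argv[1]).items()):
        print(f"{cmd}: mean interarrival {sum(deltas) / len(deltas):.1f} us")
```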
How to Run
Sample Output
$ /usr/lib/vmware/bin/vscsiStats -p iolength
Histogram: IO lengths of commands {
 min : 512
 max : 32768
 mean : 11731
 count : 241
 {
   5 (<= 512)
   14 (<= 1024)
   5 (<= 2048)
   17 (<= 4095)
   76 (<= 4096)
   1 (<= 8191)
   20 (<= 8192)
   18 (<= 16383)
   36 (<= 16384)
   49 (<= 32768)
   0 (<= 49152)
   0 (<= 65535)
   0 (<= 65536)
   0 (<= 81920)
   0 (<= 131072)
   0 (<= 262144)
   0 (<= 524288)
   0 (> 524288)
 }
}

$ /usr/lib/vmware/bin/vscsiStats -p latency
Histogram: latency of IOs in Microseconds (us) {
 min : 191
 max : 13391
 mean : 598
 count : 288
 {
   0 (<= 1)
   0 (<= 10)
   0 (<= 100)
   248 (<= 500)
   28 (<= 1000)
   4 (<= 5000)
   8 (<= 15000)
   0 (<= 30000)
   0 (<= 50000)
   0 (<= 100000)
   0 (> 100000)
 }
}

Bin ranges (bucket limits): think of these as the x-axis of the histogram plots
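Because the slides suggest graphing this output in Excel, a small helper like the following can convert the bracketed "count (<= limit)" lines shown above into CSV. The regular expression is based only on the sample output above and may need adjusting for other tool versions.

```python
# Convert vscsiStats histogram output (format as in the sample above) to CSV
# for Excel. The "count (<= limit)" line format is assumed from that sample.
import re
import sys

ROW = re.compile(r"^\s*(\d+)\s+\((<=|>)\s*(\d+)\)")

def to_csv(lines):
    print("bucket_limit,relation,count")
    for line in lines:
        m = ROW.match(line)
        if m:
            count, relation, limit = m.groups()
            print(f"{limit},{relation},{count}")

if __name__ == "__main__":
    # e.g.: /usr/lib/vmware/bin/vscsiStats -p iolength | python to_csv.py
    to_csv(sys.stdin)
```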
Best Practices

When to use
When deploying a new disk performance sensitive workload
When optimizing an existing disk performance critical production VM
Which metrics to start with
Most important metrics to discuss with SAN Admin:
I/O Size
Read/Write Ratios
Outstanding I/Os
How to interpret
Pay attention to changes in distribution shape as well as magnitude
Corrective actions
Tune disk subsystem and remeasure; pay attention to latency histogram
Limitation: Histograms Are Per Virtual Disk

Strength: allows deep analysis of the separate flavors of workload in each VM by splitting workloads by virtual disk
Place DB redo logs on a separate virtual disk from the DB tablespaces
Weakness: doesn't give a complete picture of the I/O going to a VMFS LUN
Many VMs might be doing I/O from the same ESX host
VMs from different ESX hosts might also be doing I/O
In general, figuring out the aggregate pattern at the LUN is a hard problem
Rule of thumb: I/O to a LUN from different apps is effectively random
Still, storage arrays are often smart enough to pick out individual sequential streams and schedule I/O per stream
Interference with Multiple VMs
Analogous to multiple hosts connected to the same storage
Two workloads:
Windows XP filecopy
Windows 2003 Iometer: 512K random writes with 64 outstanding I/Os
Run together, the Windows filecopy experienced a slowdown
See the histograms shift to the right, indicating increasing disk I/O latencies

[Figure: I/O Latency Histogram (Writes), Filecopy Solo vs. Filecopy (Iometer Interference). X-axis: latency (microseconds); y-axis: frequency.]
[Figure: I/O Latency Histogram (Reads), Filecopy Solo vs. Filecopy (Iometer Interference). X-axis: latency (microseconds); y-axis: frequency.]
Potential Future Work
If Enough Interest Exists

Further work in this area really depends on your feedback
Some ideas worth talking about:
A graphical control and visualization UI
Recommendations for virtual disk placement
Binning virtual disks by workload similarity
User-specified histogram bin sizes
Provide feedback and file feature requests
Email: irfan@vmware.com
Stay tuned at author’s blog: http://virtualscoop.org
VROOM blog: http://blogs.vmware.com/performance
Questions? What’s Next?

Meet The Engineers Session
Area A @ Tues 6:00 pm - 7:30 pm
Read my academic paper
“Easy and Efficient Disk I/O Workload Characterization in
VMware ESX Server”
Submitted to IEEE International Symposium on Workload
Characterization (IISWC) 2007. To be presented: Sept 29, 2007
Try it out for yourself
For VMware ESX 3.0, vm-support collects much of this data. If
you send us the output we can parse it for you.
The interactive version of this tool (as shown today) is under
development. Not in any currently shipping products.
