
2022 Cloud Report

Authors:
Keith McClellan, Lidor Carmel, Charlie Custer,
Yevgeniy Miretskiy, Stan Rosenberg, Jane Xing
Cockroach Labs 1
Introduction
General methodology
  Open source testing methodology and reproduction steps
  Cloud instance naming conventions
Insights
  AMD beats Intel on overall performance
  All three clouds have price-competitive offerings
  You get what you pay for, but not always what you don’t
  Storage and transfer costs have an outsized impact on total cost to operate
  Small instances outperform large instances
  vCPU to RAM ratio directly impacts performance consistency
Detailed benchmarks
  OLTP benchmark: Methodology, AWS results, GCP results, Azure results
  CPU benchmark: Methodology, AWS results, GCP results, Azure results
  Network benchmarks: Methodology, Latency, Throughput, AWS results, GCP results, Azure results
  Storage benchmarks: Methodology, AWS results, GCP results, Azure results
Conclusion
About Cockroach Labs
Creative Commons
Appendix i: Instance types tested
Appendix ii: OLTP benchmark: small vs. large node investigation
Appendix iii: Cloud terminology

Introduction
The 2022 Cloud Report evaluates the performance of Amazon Web Services (AWS), Microsoft Azure (Azure), and
Google Cloud Platform (GCP) for common OLTP workloads.

Using the product we know best – CockroachDB – we ran an industry-standard-based database benchmark
configured consistently across runs and instance types, as well as leading micro-benchmarks in the areas of
compute, networking, and storage, to compare the performance and value of 56 instance types and 107 discrete
configurations across the three clouds. In this report, we aim to provide an unbiased picture of the performance
users are paying for when they provision a specific configuration in one of the clouds.

2022 testing methodology

We tested multiple node sizes, performing an 8 vCPU test and a roughly 32 vCPU test for each instance type. Because not all the large instances are identical (large nodes varied from 30 to 36 vCPUs), we decided to focus on per-vCPU metrics for the majority of our analysis. This allows us to compare instances that are similar but not quite the same, and to compare different node sizes for relative performance and cost effectiveness.

We expanded and improved our OLTP benchmarking this year, running over 3,000 different OLTP iterations leveraging CockroachDB. We disabled wait times in the benchmark because it enabled us to narrow the variation in results across runs. This change has allowed us to achieve a margin of error below 1.5%, but it does mean that the results in this report aren’t directly comparable to results from previous years.

We enhanced our micro-benchmarking suites, particularly the networking benchmark, which now includes cross-region tests. CockroachDB is a multi-region, multi-active database, and we felt that network performance, including intra-AZ and cross-region performance, was likely to have a direct impact on customers’ experiences running applications like ours in production.
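As a minimal illustration of that normalization (the numbers here are illustrative round figures, not measured results):

```python
# Per-vCPU normalization: large nodes vary from 30 to 36 vCPUs, so raw TPM
# is divided by the vCPU count before instances are compared.

def tpm_per_vcpu(median_tpm: float, vcpus: int) -> float:
    return median_tpm / vcpus

# Two hypothetical nodes with identical raw TPM but different core counts:
print(round(tpm_per_vcpu(60_000, 36), 1))  # 1666.7
print(round(tpm_per_vcpu(60_000, 32), 1))  # 1875.0
```

Dividing by vCPUs is what lets a 36-core c5n.9xlarge and a 32-core t2d-standard-32 land on the same axis in the charts that follow.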

General methodology
For this year’s Cloud Report, we focused on a question we get a lot when we are working with CockroachDB users: is it better to use more, smaller nodes, or fewer large ones? This was our primary motivation for testing multiple node sizes and focusing on per-vCPU metrics.

For instance types that present multiple CPU generations, we explicitly requested the newest available CPU platforms whenever possible. In cases where that was not a configuration option, we ran some additional iterations so we had sufficient runs using the best available processors.

In a few cases, we ended up doing some additional testing outside our preferred region to round out the comparison due to availability, so as not to overly skew the results. All instance types tested were in GA (general availability) and should be available for use by the general public as of the time of writing.

We chose not to test ARM instance types this year, as CockroachDB still does not provide official binaries for that processor platform. Official support for ARM binaries is slated for our Fall release (22.2), so we expect to return to testing this processor platform next year.

Since each of the cloud providers is now offering at least two performant persistent storage options that meet CockroachDB’s requirements, we chose not to test large direct-attached volumes like we did in previous years. However, we enhanced the storage benchmark to better reflect the way CockroachDB interacts with storage, to determine if there was a relative performance or latency impact to picking one storage type over another.

For the network benchmarks, we picked the Northern Virginia and Pacific Northwest regions for each cloud provider except where otherwise noted.

All of our pricing calculations were based on the publicly available pricing for the Northern Virginia region for each of the cloud providers.

All testing was conducted on instances running the latest LTS version of Ubuntu at the time we started testing, which was 20.04 LTS (Focal Fossa).

For more details about the methodology used for each benchmark, please refer to the “Methodology” section that is included with each of the four detailed benchmarks:

• OLTP Methodology
• CPU Methodology
• Network Methodology
• Storage Methodology

Open source testing methodology and reproduction steps

In the 2022 Cloud Report, we benchmarked 56 instance types and 107 discrete configurations across the following broad axes:

• OLTP Performance
• CPU Performance
• Network Performance
• Storage I/O Performance

Machine selection was finalized on November 24, 2021, and tests were run over a variety of dates from November 24, 2021 through March 14, 2022.

The full list of all instance types tested is available in Appendix I.

Reproduction steps are fully open source, available in this repo: github.com/cockroachlabs/cloud-report

Additional benchmark-specific methodology and configuration details are available in the “Methodology” sections included with each of the four detailed benchmarks.

Cloud instance naming conventions

AWS
Of the three clouds, AWS has the most machine configurations. Instead of specifying a flag during provisioning to get better network performance or locally-attached disks, AWS users must choose a different machine type. AWS provides a uniform pattern of naming across all instances.

a variants: denote an AMD processor
b variants: denote instances optimized for EBS storage
n variants: suggest better network performance

Azure
Similar to the other clouds, Azure has its own unique naming scheme. The VM series is referenced by the capital letter preceding the vCPU count.

a suffix: denotes an AMD processor
s suffix: designates the ability to attach a premium managed disk, such as premium-disk or ultra-disk
v4 suffix: previous-generation nodes, i.e. Intel Cascade Lake or AMD Rome processors
v5 suffix: current-generation nodes, i.e. Intel Ice Lake or AMD Milan processors

GCP
In contrast to AWS, GCP has very few named machines, but they can be configured appropriately to serve user needs. For example, GCP is the only cloud that does not market a “storage-optimized machine,” but it is also the only cloud that allows users to configure the number of local SSDs.

d suffix: denotes an AMD processor (for example, the n2d machine is the AMD counterpart to the n2 machine)
highcpu/standard/highmem nomenclature: dictates the vCPU to memory (RAM) ratio of 1:1, 1:4, and 1:8, respectively
For our purposes, custom refers to a vCPU to RAM ratio of 1:2
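The GCP ratio rules above can be captured in a few lines. This is our own sketch (the helper name is ours; custom machine types such as n2-custom-8-32768 instead encode their memory in MB as a final field, so they are excluded here):

```python
# Derive vCPU count and RAM (GB) from a GCP machine-type name using the
# vCPU:RAM ratios described above: highcpu 1:1, standard 1:4, highmem 1:8.
GB_PER_VCPU = {"highcpu": 1, "standard": 4, "highmem": 8}

def gcp_shape(machine_type: str) -> tuple[int, int]:
    """e.g. 'n2-highmem-32' -> (32, 256): 32 vCPUs, 256 GB RAM."""
    _family, kind, vcpus = machine_type.split("-")
    return int(vcpus), int(vcpus) * GB_PER_VCPU[kind]

print(gcp_shape("n2-highmem-32"))   # (32, 256)
print(gcp_shape("t2d-standard-8"))  # (8, 32)
```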


Insights

AMD beats Intel on overall performance
This year’s report features brand-new instance types featuring both Intel Ice Lake processors and AMD Milan (3rd Generation EPYC) processors. While both performed well,
GCP’s t2d instance type (which uses AMD Milan processors) took the top overall spot for both large and small nodes in our OLTP testing.

Across both the OLTP and CPU benchmarks, AMD’s Milan processors outperformed Intel in many cases, and at worst statistically tied with Intel’s latest-gen Ice Lake processors.
In past years, we saw Intel lead the pack in overall performance, with AMD competing on price-for-performance metrics. This year, both the overall performance leader
and the price-for-performance leader were AMD-based instances.

Among the large instance types, GCP’s t2d instance was followed by its n2-standard instance running Intel Ice Lake processors. AWS’s large m6i instance, which also uses Ice Lake processors, finished third, and other AWS instances rounded out the top ten. Two of the Azure AMD instance types had individual runs that could have broken into the top ten, but when looking at median runs they were less performant than either AWS or GCP’s offerings.

[Chart: Median TPM for the top ten large instance types, ranging from 55,037 to 69,117. Instances shown: t2d-standard-32, n2-standard-32-icelake, m6i.8xlarge, m5n.8xlarge, c5n.9xlarge, r5b.8xlarge, m5.8xlarge, r5n.8xlarge, c5.9xlarge, and r5.8xlarge, color-coded by processor (AMD Milan, Intel Ice Lake, Intel Cascade Lake).]

Among small instance types, the fastest run overall, the fastest median run, and the fastest average run all belonged to GCP’s t2d-standard-8, which uses AMD Milan processors. Both AWS and GCP’s Ice Lake based instances were also competitive. Azure instance types were also more competitive here, with several Ice Lake and AMD Milan options cracking the top ten.

[Chart: Median TPM for the top ten small instance types, ranging from 16,774 to 22,018. Instances shown: t2d-standard-8, n2-standard-8-icelake, m6i.2xlarge, c5n.2xlarge, c5.2xlarge, r5n.2xlarge, Standard_D8as_v5, Standard_D8s_v5, Standard_E8as_v5, and Standard_E8s_v5, color-coded by processor (AMD Milan, Intel Ice Lake, Intel Cascade Lake).]
The power of the new AMD Milan processors was even more apparent in the multi-core CPU microbenchmark. Instances running AMD Milan
outperformed instances running both Intel Ice Lake and Intel Cascade Lake processors. One big surprise here is that instances running Intel Ice
Lake processors performed worse than older Intel Cascade Lake processors across all three clouds. This is contrary to the results we saw in the
OLTP benchmark.

Note: Because of our machine selection and testing cutoff times, we were unable to test AWS’s m6a instances, which also run AMD’s Milan processors.
Based on the rest of our testing, we expect that the m6a instances could have outperformed m6i.

[Chart: Median CoreMark scores by instance type, ranging from 18,181 to 26,709. Instances shown include GCP’s t2d and n2d variants, Azure’s Das/Eas v4 and v5 series, and AWS’s c5a.2xlarge, color-coded by processor (AMD Milan, Intel Cascade Lake).]
All three clouds have price-competitive offerings
Regardless of the cloud you choose, all three providers offer competitive price-for-performance solutions.1

Using our OLTP benchmark as the baseline, all three providers had at least one instance type and storage combination in the $0.04-$0.05 reserved $/TPM range (all prices are per
month, reserved assumes a one-year commitment). Even instance and storage combinations that are a bit more expensive than that are potentially very competitive depending
on the requirements of a specific workload. Note that the winning instance type, GCP’s t2d-standard-32, is not available in all GCP regions as of this writing.
1 All instance types were priced using Northern Virginia pricing for consistency (all three clouds have regions there), but some of our tests were run in other regions due to availability at the time of testing.

At a fixed workload complexity of 1,200 warehouses, larger nodes offered the best price-for-performance, primarily because a greater proportion of the small nodes’ price is storage. This is partially due to having the smaller nodes (relatively) overprovisioned in comparison to the larger nodes. We feel this reflects likely real-world use cases, but it does give the larger nodes a pricing advantage in this comparison.

Unsurprisingly, given the impact of storage costs on overall cost-to-operate (discussed later in this report), all of the top ten instances here use the clouds’ more affordable general-purpose storage options.

Overall, pricing is very linear, scaling with the number of vCPUs, GB of RAM, or the amount of disk allocated. In most cases you can expect to pay roughly the same amount for these resources across all three clouds, at least for all the configurations we considered.

Instance type        Reserved $/TPM   On-demand $/TPM
t2d-standard-32      $0.040           $0.051
n2d-highcpu-32       $0.041           $0.051
n2-highcpu-32        $0.042           $0.052
n2-standard-32       $0.044           $0.056
Standard-D8as-v5     $0.046           $0.063
m6i.2xlarge          $0.046           $0.060
c5a.2xlarge          $0.048           $0.064
c5.2xlarge           $0.049           $0.065
c2-standard-30       $0.049           $0.067
n2-custom-32         $0.050           $0.065
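The $/TPM figures come down to simple division. A sketch of the calculation with made-up round numbers (not prices from our testing):

```python
# $/TPM: total monthly cost of the configuration (instance + storage),
# divided by the median TPM it sustained in the OLTP benchmark.

def dollars_per_tpm(instance_usd_month: float,
                    storage_usd_month: float,
                    median_tpm: float) -> float:
    return (instance_usd_month + storage_usd_month) / median_tpm

# A hypothetical $2,400/month instance with $400/month of storage,
# sustaining 56,000 TPM:
print(round(dollars_per_tpm(2400, 400, 56_000), 3))  # 0.05
```

Because storage is part of the numerator, the same instance can land in a very different place in this ranking depending on the volume type attached to it.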

You get what you pay for, but not always what you don’t
In general, when we paid explicitly for a resource we got the promised performance out of that resource. For example, cross-region throughput was extremely
consistent for all three providers, as was storage performance.

Here, we’re looking at the median IOPS for small and large instance types, aggregated by cloud provider and storage type, with a 50/50 split of read and write IOPS. For AWS instances with general-purpose gp3 volumes, we are showing both base IOPS (advertised as 3,000 IOPS) and provisioned IOPS (8,000 for small instances, 16,000 for large instances).

For all of these storage options, we got at least the advertised performance.

[Chart: Median read and write IOPS for small and large instances by storage type: ebs-gp3, ebs-gp3-provisioned-iops, ebs-io2, pd-ssd, pd-extreme, premium-disk, and ultra-disk. Medians ranged from roughly 1,590 to 48,950 IOPS, with read and write medians tracking each other closely.]

In cases where the cloud provider advertised a performance range (e.g. “up to 10 Gbps”) and did not explicitly charge
for a specific level of performance, the results varied between cloud providers and between runs.

This was best highlighted by the throughput tests in AWS. For intra-AZ data transfers (which are not priced explicitly),
we saw a wide range in performance for non-network optimized instance types, including some apparent throttling.

[Charts: 3/5/22 throughput tests on AWS m5.2xlarge and AWS c5n.2xlarge, plotting intra-AZ throughput (0 to 25 Gbps) over roughly twelve minutes.]

This demonstrates throughput dropping during our node-to-node testing of AWS’s m5.2xlarge instance, which is not network-optimized and advertises “up to 10 Gbps” of throughput. In contrast, AWS’s network-optimized c5n.2xlarge instance advertises network throughput of 25 Gbps, and during testing we saw almost exactly that.

The results of throughput tests across regions (where data transfers are priced explicitly) show very little difference between runs, and throughput remained close to the advertised maximum network throughput for the entire test duration.

Storage and transfer costs have an outsized impact on total cost to operate
For even relatively small amounts of persistent block storage, the cost of running a particular workload is much more influenced by the cost of the storage than it is the cost
of the instance. For persistent workloads, it is extremely important to optimize cost calculations based on this consideration.

This means preferring mid-tier storage options (pd-ssd on GCP, gp3 on AWS, premium-disk on Azure) unless your workload requires either an extremely high number of IOPS or very low storage latency. Among the instance types we tested this year, we did not find a single case where the most cost-effective choice was to use top-tier storage (pd-extreme on GCP, io2 on AWS, ultra-disk on Azure) for any of the OLTP benchmarks.

This chart shows average storage costs as a percentage of the total instance costs (including storage). For configurations using high-performance storage, the storage cost accounted for the majority of the total cost to operate.

[Chart: average storage cost as a percentage of total cost to operate, by configuration.]

While not accounted for in our $/TPM analysis, the biggest cost for this
testing was data transfer. None of the three cloud providers explicitly
charges for node-to-node data transfers within an availability zone, but
all of them charge $0.01-$0.02 per GB transferred across zones within a
region. Cross-region costs vary substantially depending on location.
Within the United States, all of the cloud providers charge between
$0.01-$0.02 per GB transferred between regions as well (as of this writing).

This means that in some cases, it may cost the same to run a multi-region
workload as it does to run a multi-AZ workload, while improving the
survivability of the application.

Here, we see the average cost for a single run of the cross-region network throughput test for each instance type and the corresponding
throughput we saw. For our specific tests, AWS and Azure both charged $0.02/per GB and GCP charged $0.01/per GB. These tests ran for a
total of 12 minutes each. Given those conditions, GCP had an overall advantage in both performance and cost.

When building a highly resilient stateful application, the “hidden” costs of storage and data transfer can have a larger impact on total cost
than the price of the instances themselves. In future years, we hope to include more of these “hidden costs” in our price-for-performance
analysis as a part of the OLTP benchmark.
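A back-of-the-envelope model of that cost, assuming throughput is measured in gigabits per second and billed per GB transferred (the rates cited above are $0.02/GB for AWS and Azure and $0.01/GB for GCP; the run below is hypothetical):

```python
# Estimate cross-region transfer cost for a fixed-length throughput test:
# sustained Gbps -> GB transferred over the run, times the per-GB rate.

def transfer_cost(mean_gbps: float, minutes: float, usd_per_gb: float) -> float:
    gigabytes = mean_gbps / 8 * minutes * 60  # bits to bytes, over the run
    return gigabytes * usd_per_gb

# A hypothetical 12-minute run sustaining 10 Gbps at $0.01/GB:
print(round(transfer_cost(10, 12, 0.01), 2))  # 9.0 (900 GB moved)
```

The model makes the trade-off concrete: doubling either the sustained throughput or the per-GB rate doubles the bill for the same test duration.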

[Chart: Average cost per cross-region throughput run (USD, ranging from $32 to $343) and mean throughput (Gbps, ranging from roughly 2 to 25) for each instance type tested across AWS, GCP, and Azure.]
Small instances outperform large instances
In both the OLTP and CPU benchmarks, we saw a per-vCPU performance advantage to running on smaller instances of a particular instance type, regardless of CPU platform,
cloud, or instance type. Depending on the instance type, these differences were anywhere from 4-5% on the low end and upwards of 30% on the high end.

Looking at the top ten instance types on a TPM per vCPU basis, small instance sizes largely won in terms of both overall and average performance (regardless of storage configuration). GCP’s t2d-standard-8 and AWS’s m6i.2xlarge led the pack, and overall eight out of the top ten instances were the small 8 vCPU nodes.

Instance type                     Median TPM per vCPU
t2d-standard-8 (AMD Milan)        906
m6i.2xlarge (Ice Lake)            826
n2-standard-8 (Ice Lake)          814
t2d-standard-32 (AMD Milan)       785
Standard-E8s-v5 (Ice Lake)        783
Standard-D8s-v5 (Ice Lake)        755
n2-standard-32 (Ice Lake)         754
Standard-E8as-v5 (AMD Milan)      748
Standard-D8as-v5 (AMD Milan)      747
c5n.2xlarge (Cascade Lake)        712

To account for some of these differences, we performed tests including:

• Short-Running Tests: In the short-running CPU benchmark, the smaller nodes had a 3-4% per-vCPU speed advantage.
• Scaled-Up Tests: We did some scaled-out runs of our OLTP benchmark that had an identical number of cores allocated to both large-node and small-node database clusters to account for benchmark overhead. While this closed the gap slightly, it didn’t account for the entire performance difference.

We did discover some factors that contributed to the overall finding, although none is sufficient to explain the entire difference in performance for all instance types:

• Instances that have multiple NUMA nodes substantially under-perform instances running on a single NUMA node for many workloads. For some instance types, this accounted for the entire performance difference between small and large instances.
• All instances have some form of variable clock-speed technology implemented. Sample tests showed that smaller instances tended to run at slightly higher average clock speeds than larger instances under sustained load, but this is not enough to explain the performance difference.

This is a non-exhaustive list of things we looked at to account for the performance differences between the instance sizes, and there were several additional dimensions we didn’t have the chance to fully explore. More details on our investigation into this performance discrepancy are available in Appendix II.
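Since NUMA topology turned out to matter, here is a minimal sketch of the kind of topology check described above; the parsing helper is ours, and the `lscpu`-style sample output is fabricated for illustration:

```python
# Flag multi-NUMA instances from `lscpu`-style output, as one might when
# verifying that all nodes in a benchmark cluster share the same topology.

def numa_nodes(lscpu_output: str) -> int:
    for line in lscpu_output.splitlines():
        if line.startswith("NUMA node(s):"):
            return int(line.split(":", 1)[1].strip())
    return 1  # assume a single node if the field is absent

SAMPLE = """CPU(s):             32
NUMA node(s):       2
NUMA node0 CPU(s):  0-15
NUMA node1 CPU(s):  16-31
"""

if numa_nodes(SAMPLE) > 1:
    print("multi-NUMA instance: expect lower per-vCPU throughput")
```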

vCPU to RAM ratio directly impacts performance consistency
When looking across all of the different OLTP benchmark configurations we ran, all of the top ten instance types for both small and large instance types have a ratio of at least 4 GB of RAM per vCPU.

In these charts, warehouses per vCPU serves as a proxy for workload complexity; runs with higher warehouses-per-vCPU counts involve larger, more complicated queries running in the background. That complexity scales non-linearly with the number of vCPUs and the number of warehouses (this is why there are separate charts for the small and large instances).

Looking at TPM across a variety of workload complexities can tell us the suitability of a particular instance type under different types of load, and can expose situations where the performance of an instance type degrades as workload complexity increases.

[Charts: TPM per vCPU vs. warehouses per vCPU for the top small instance types: c2-standard-8, c5.2xlarge, c5n.2xlarge, m6i.2xlarge, n2-standard-8-icelake, Standard-D8as-v5, Standard-D8s-v5, Standard-E8as-v5, Standard-E8s-v5, and t2d-standard-8, across AWS, GCP, and Azure.]
[Charts: TPM per vCPU vs. warehouses per vCPU for the top large instance types: c2-standard-30, m6i.8xlarge, n2-standard-32-icelake, r5.8xlarge, r5n.8xlarge, Standard-E32s-v5, Standard-E32as-v5, t2d-standard-32, r5b.8xlarge, and m5n.8xlarge, across AWS, GCP, and Azure.]
For example, when comparing otherwise-identical GCP instance types (highcpu with a 1:1 vCPU:RAM ratio, custom with 1:2, standard with 1:4, and highmem with 1:8), we can see that both n2-custom and n2-highcpu have inconsistent performance, whereas n2-standard and n2-highmem remained relatively steady as workload complexity increased.

When comparing this to other testing results, we interpret this to mean that while you may save a bit of money by choosing instance types with a lower vCPU to RAM ratio,
you will likely see more consistent performance from instance types with more available memory, at least for the purposes of running CockroachDB.
The impact of this decision is more apparent with larger, more complicated workloads.

In our tests, we found the sweet spot to be a vCPU:RAM ratio of 1:4.

[Chart: TPM per vCPU vs. warehouses per vCPU for n2-highmem-32, n2-standard-32, n2-custom-32-65536, and n2-highcpu-32.]
Detailed benchmarks
OLTP benchmarks
Methodology
This year, we scaled up the number and variations of our OLTP benchmark dramatically, and designed a
Cockroach Labs Derivative TPC-C nowait benchmark that allowed us to scale the benchmark with a fine
granularity using a fixed load multiplier per vCPU.

As the name suggests, our benchmark is based on TPC-C. TPC-C is a standards-based benchmark from
the Transaction Processing Performance Council that attempts to simulate a real-world logistics system,
creating and taking virtual customer orders from initial receipt through the manufacturing process and then
out for delivery. It imitates real-life user interactions with the system by limiting how fast warehouses move
orders through the fulfillment process.

As warehouses are added to the system, both query complexity and system load increase. However, this
means that the core metric of TPC-C (transactions per minute, i.e. the number of new orders that can be
processed per minute while the rest of the workload is running) is not directly comparable across runs with
different warehouse counts. Also, because of the simulated wait times, you need to be relatively close to
over-saturating the database before differences between different cloud configurations become apparent.

In an effort to compare across instance types and instance sizes fairly, we attempted to separate scaling
the number of transactions processed from the complexity of the workload by removing wait times from
this year’s testing. This allowed us to get a better comparative signal across instances all running the same
database. To do this, we moved to a new Cockroach Labs Derivative TPC-C nowait benchmark. While we think
that this has improved our ability to discern small performance differences between configurations, it does
mean that the numbers we’re reporting for this benchmark are not directly comparable to what we reported
in the 2021 Cloud Report.

OLTP benchmark: configuration details

Our testing with the Cockroach Labs Derivative TPC-C nowait benchmark used the following configuration parameters:

• Wait times disabled (wait=0 in our test harness).
• “Warehouse” per vCPU scaling (50, 75, 100, 125, and 150 warehouses per vCPU).
• Since query complexity scales with the warehouse count, we picked a discrete warehouse count (1,200 warehouses) for our direct performance comparisons between large and small nodes.
• The “Active Workers” count set equal to the “Warehouse” count.
• The number of active connections at the load generator set equal to 4x the number of vCPUs, as per CockroachDB’s production guidelines.

Most of our comparisons were made on a per-vCPU basis to avoid a bias toward any instances with higher core counts. This includes pricing comparisons as well as raw performance numbers. Since these are all multi-server runs, we also collected information to determine whether or not we got identical nodes (i.e. CPU info, including the number of NUMA nodes our vCPUs are running across).

All configurations (combinations of instance type, instance size, and storage used) had a minimum of four runs executed for that configuration. In cases where we had fewer than three valid runs (for example, if one of the instances got a slightly different processor, or a different NUMA count than the others), we attempted additional runs until we got a minimum of four wherever possible. For the few cases where we couldn’t reliably get a consistent configuration across all nodes, we may have chosen to report the other runs so we could include that instance type in the final report.

All reported database tests were run on a cluster of three database nodes and leveraged one additional load generator node, all identically provisioned with CPU, network, and storage.

Metrics

The core metrics measured by the Cockroach Labs Derivative TPC-C nowait benchmark are new-order transactions per minute (TPM) and cost per new-order transaction per minute ($/TPM). These are all calculated while running the other queries in the workload as well; they are not raw query metrics.

All prices were calculated as cost per month, with reserved pricing assuming a one-year commitment.
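The scaling rules above can be sketched as follows. This is our own illustration; whether the 4x connection rule keys off per-node or cluster-wide vCPUs is simplified here to a single count:

```python
# Derive OLTP run parameters from a vCPU count: warehouses scale at
# 50-150 per vCPU, active workers equal the warehouse count, and the load
# generator opens 4x the vCPU count in connections.

WAREHOUSE_STEPS = (50, 75, 100, 125, 150)

def oltp_configs(vcpus: int) -> list[dict]:
    return [
        {
            "warehouses": step * vcpus,
            "active_workers": step * vcpus,  # workers == warehouses
            "connections": 4 * vcpus,        # per production guidelines
        }
        for step in WAREHOUSE_STEPS
    ]

# For an 8 vCPU node, the top step lands on the 1,200-warehouse count used
# in the direct large-vs-small comparisons:
print(oltp_configs(8)[-1]["warehouses"])  # 1200
```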

AWS results

Among large instance types, m6i.8xlarge (Intel Ice Lake) was the winner in terms of overall median TPM, but most of the others performed comparably. Two instance types, m5a and r5a, underperformed compared to the rest of the pack; both use first-generation AMD Naples processors.

The m6i instance was also the best among the small nodes, but most of them were quite competitive, and many could be good options depending on the specifics of your workload. Again, the m5a and r5a instances with AMD Naples processors lagged behind.

[Chart: Median TPM by AWS instance type. Large instances ranged from 28,047 to 62,502 TPM, topped by m6i.8xlarge; small instances ranged from 10,538 to 19,815 TPM, topped by m6i.2xlarge.]
AWS results

When we compared all instance types on a per-vCPU basis in the 1,200 warehouse test, the small m6i instance (m6i.2xlarge) was the clear winner, followed by a statistical tie between two other small instance types, c5n.2xlarge and c5.2xlarge.

The best-performing large instance was m6i.8xlarge, but it finished ninth overall, behind most of the small instances.

[Chart: Median TPM per vCPU by AWS instance type, ranging from 292 to 826, with m6i.2xlarge leading at 826.]
AWS results

When we looked at the 1,200 warehouse test for each instance with general-purpose and high-performance storage options, we saw that generally the performance difference
between high-performance io2 storage and general-purpose gp3 storage was negligible. The variation between them was within the margin of error for many instance types.
One notable exception was m6i.8xlarge, which was much more competitive with the smaller instances on a per-vCPU basis when using the high-performance io2 storage.

[Chart: Median TPM per vCPU for each AWS instance type with high-performance io2 storage vs. general-purpose gp3 storage, ranging from 580 to 854. Instances shown: m6i.2xlarge, c5n.2xlarge, c5.2xlarge, r5n.2xlarge, m5n.2xlarge, m5.2xlarge, r5.2xlarge, c5a.2xlarge, m6i.8xlarge, m5n.8xlarge, r5.8xlarge, m5.8xlarge, r5b.2xlarge, r5n.8xlarge, and r5b.8xlarge.]
These charts compare the per-vCPU TPM of AWS's small and large instance types over a range of workload complexities. In both cases, the m6i instance types were the clear winners. Among the large instance types, second place was a statistical tie between c5n.9xlarge and c5.9xlarge, but it should be noted that this ranking may be artificially skewed: both have a mathematical advantage in that each has 36 vCPUs, and thus both were run under slightly more complex workloads at each increment.

[Chart: TPM per vCPU vs. warehouses per vCPU (workload complexity) for AWS small instance types]
[Chart: TPM per vCPU vs. warehouses per vCPU (workload complexity) for AWS large instance types]
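The workload-complexity charts hold warehouses-per-vCPU constant, so the absolute TPC-C workload scales with instance size. A minimal sketch of that scaling, showing why the 36 vCPU instances run slightly larger workloads than the 32 vCPU ones at each increment (the ratio values here are hypothetical, not the report's actual steps):

```python
# Scale total TPC-C warehouses with vCPU count at a fixed
# warehouses-per-vCPU "workload complexity" ratio.
def warehouses(vcpus: int, per_vcpu_ratio: float) -> int:
    """Total TPC-C warehouses for a given workload-complexity ratio."""
    return round(vcpus * per_vcpu_ratio)

# Hypothetical complexity increments; note 36 vCPU instances
# (c5.9xlarge, c5n.9xlarge) land on larger absolute workloads
# than 32 vCPU instances at every step.
for vcpus in (8, 32, 36):
    print(vcpus, [warehouses(vcpus, r) for r in (25, 37.5, 50)])
```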
Here, we have broken down the price-for-performance ($/TPM) of small instance types across all workload complexities (warehouse counts). When comparing reserved pricing, m6i.2xlarge was the least expensive, followed by a tie for second place between c5a.2xlarge and c5.2xlarge. The m6i and c5a nodes also led the pack in $/TPM using on-demand pricing.

Average $/TPM, small AWS instance types:

Instance type    Reserved $/TPM    On-demand $/TPM
m6i.2xlarge      $0.047            $0.061
c5a.2xlarge      $0.049            $0.064
c5.2xlarge       $0.049            $0.065
m5.2xlarge       $0.055            $0.075
c5n.2xlarge      $0.056            $0.077
m5n.2xlarge      $0.063            $0.086
r5.2xlarge       $0.065            $0.089
r5n.2xlarge      $0.071            $0.100
m5a.2xlarge      $0.077            $0.103
r5b.2xlarge      $0.078            $0.110
r5a.2xlarge      $0.093            $0.127

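The report does not spell out the exact $/TPM formula, but a plausible reading (and one consistent with the magnitudes shown) is the instance's monthly cost divided by its median TPM. A minimal sketch under that assumption, with hypothetical prices:

```python
# Sketch of a price-for-performance ($/TPM) calculation. Assumption: $/TPM is
# the monthly instance cost divided by the median TPM achieved.
HOURS_PER_MONTH = 730  # assumed average hours per month

def dollars_per_tpm(hourly_price: float, median_tpm: float) -> float:
    """Monthly cost of the instance divided by its median TPM."""
    return hourly_price * HOURS_PER_MONTH / median_tpm

# Hypothetical example: an 8 vCPU instance at $0.384/hour sustaining ~5,600 TPM.
print(round(dollars_per_tpm(0.384, 5600), 3))
```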
Among large nodes, c5a.8xlarge was the least expensive in terms of reserved pricing, with a tie in second place between c5.9xlarge and m6i.8xlarge. The c5a, m6i, and c5 nodes also led the pack in terms of on-demand pricing.

Average $/TPM, large AWS instance types:

Instance type    Reserved $/TPM    On-demand $/TPM
c5a.8xlarge      $0.052            $0.072
c5.9xlarge       $0.053            $0.074
m6i.8xlarge      $0.053            $0.073
m5.8xlarge       $0.055            $0.078
c5n.9xlarge      $0.064            $0.093
m5n.8xlarge      $0.065            $0.094
r5.8xlarge       $0.066            $0.095
r5n.8xlarge      $0.076            $0.111
r5b.8xlarge      $0.078            $0.114
m5a.8xlarge      $0.094            $0.131
r5a.8xlarge      $0.121            $0.173
GCP results

Understanding the results

In the results below, the *-custom-* instance types are built at a 1:2 vCPU to RAM ratio (i.e., one vCPU for every 2 GB of RAM). Azure and AWS both have standard instance types with that profile, and we wanted to make sure we had an equal testing comparison for all CPU:Memory ratios of 1:1, 1:2, 1:4, and 1:8.

We tested a few GCP instance types with latest-generation CPUs in GCP's central region, because they were generally available there but not yet available in the Northern Virginia region when we conducted our testing. However, all pricing analysis is based on prices in the Northern Virginia region to ensure a fair comparison.

The best performer in terms of overall median TPM was t2d-standard-32, which uses AMD Milan processors. In second place was n2-standard-32, which uses Intel Ice Lake. Among the small instance types, the t2d instance was again the standout, and n2-standard-8 was a close second.

[Chart: overall median TPM by GCP instance type]
On a per-vCPU basis, the t2d and n2-standard instances (which use latest-generation AMD Milan and Intel Ice Lake processors, respectively) were all at the top of the heap, with the small 8 vCPU instance types ranked just above the large ones.

Median TPM per vCPU, GCP instance types:

Instance type            Median TPM per vCPU
t2d-standard-8           906
n2-standard-8-icelake    814
t2d-standard-32          785
n2-standard-32-icelake   754
n2-standard-8            693
c2-standard-8            686
n2-custom-8-32768        663
n2d-standard-8           658
n2-highmem-8             654
n2d-highcpu-8            636
n2d-highmem-8            622
c2-standard-30           586
n2-highcpu-8             584
n2-highmem-32            539
n2d-highcpu-32           536
n2-standard-32           524
n2-custom-32-131072      506
n2d-highmem-32           504
n2d-standard-32          504
n2-highcpu-32            473
Breaking out the per-vCPU results by storage type, we saw that even on the latest-generation processors, there seemed to be little OLTP performance difference between pd-ssd and pd-extreme storage. Our best run overall on GCP was actually using the general-purpose pd-ssd storage.

[Chart: median TPM per vCPU by GCP instance type and storage (high-performance pd-extreme vs. general-purpose pd-ssd)]
These charts compare the per-vCPU TPM of GCP's small and large instance types over a range of workload complexities. In both cases, we saw the t2d and n2-icelake instances come out on top. Particularly among large nodes, a vCPU:RAM ratio of 1:4 appeared to be the sweet spot for achieving consistent performance.

[Chart: TPM per vCPU vs. warehouses per vCPU (workload complexity) for GCP small instance types]

[Chart: TPM per vCPU vs. warehouses per vCPU (workload complexity) for GCP large instance types]
When looking at price-for-performance ($/TPM) among the small instance types, the t2d and n2-icelake instances, with latest-generation processors and vCPU:RAM ratios of 1:4, offered the best value in terms of both reserved and on-demand pricing.

Average $/TPM, small GCP instance types:

Instance type            Reserved $/TPM    On-demand $/TPM
t2d-standard-8           $0.056            $0.065
n2-standard-8-icelake    $0.058            $0.068
n2d-highcpu-8            $0.062            $0.070
n2d-standard-8           $0.065            $0.075
n2-custom-8-16384        $0.065            $0.077
n2-standard-8            $0.070            $0.082
c2-standard-8            $0.070            $0.087
n2-highcpu-8             $0.072            $0.083
n2d-highmem-8            $0.077            $0.092
n2-highmem-8             $0.084            $0.100
Price-for-performance among the large GCP nodes looked similar to the small nodes: faster processors with a vCPU:RAM ratio of 1:4 were the most cost-effective.

Average $/TPM, large GCP instance types:

Instance type            Reserved $/TPM    On-demand $/TPM
t2d-standard-32          $0.044            $0.056
n2-standard-32-icelake   $0.045            $0.057
n2-highcpu-32            $0.048            $0.060
n2d-highcpu-32           $0.049            $0.061
n2-custom-32-65536       $0.055            $0.072
n2d-standard-32          $0.058            $0.074
n2-standard-32           $0.060            $0.076
n2d-highmem-32           $0.072            $0.092
n2-highmem-32            $0.076            $0.098
Azure results

Large v5 instances sporting AMD Milan (as_v5 variants) and Intel Ice Lake (s_v5 variants) processors made up Azure's top four in terms of overall median TPM, with AMD instances grabbing the top two spots. However, among small instances, the Intel Ice Lake machines outperformed the AMD Milan instances.

Median TPM by instance type:

Instance type        Median TPM
Standard_E32as_v5    51,577
Standard_D32as_v5    50,136
Standard_D32s_v5     48,974
Standard_E32s_v5     47,662
Standard_F32s_v2     45,839
Standard_D32s_v4     45,200
Standard_E32s_v4     44,570
Standard_E32as_v4    41,608
Standard_D32as_v4    37,517
Standard_E8s_v5      18,794
Standard_D8s_v5      18,129
Standard_E8as_v5     17,963
Standard_D8as_v5     17,925
Standard_D8s_v4      15,492
Standard_E8s_v4      15,228
Standard_D8as_v4     14,958
Standard_E8as_v4     14,867
Standard_F8s_v2      13,699
When comparing median TPM per vCPU, the small Intel Ice Lake instances were best, followed closely by the small AMD Milan instances. Small nodes with previous-generation processors (both AMD and Intel) outperformed the larger latest-generation instances by a substantial margin, although it is worth noting that we did see a large amount of run-to-run variability in the results of the large latest-generation AMD nodes.

[Chart: median TPM per vCPU by Azure instance type]
Among smaller instance types, the general-purpose premium-disk storage performed as well as, if not better than, ultra-disk in many scenarios. For larger instance types, it may be worth considering ultra-disk depending on your use case.

[Chart: median TPM per vCPU by Azure instance type and storage (high-performance ultra-disk vs. general-purpose premium storage)]
These charts compare the per-vCPU TPM of Azure's small and large instance types over a range of workload complexities. In both cases, the latest-generation v5 instances won out, although among small instance types it was Intel-based instances that took the top spot, whereas among large instance types, the v5 AMD instances performed best. Surprisingly, some of the large v4 Intel Cascade Lake instances were very competitive with the v5 Intel Ice Lake instances.

[Chart: TPM per vCPU vs. warehouses per vCPU (workload complexity) for Azure small instance types]

[Chart: TPM per vCPU vs. warehouses per vCPU (workload complexity) for Azure large instance types]
Among the small node sizes, a v5 AMD instance with a vCPU:RAM ratio of 1:4 was the price-for-performance leader, followed by a v5 Intel variant with the same vCPU:RAM ratio.

Average $/TPM, small Azure instance types:

Instance type        Reserved $/TPM    On-demand $/TPM
Standard_D8as_v5     $0.048            $0.067
Standard_D8s_v5      $0.051            $0.073
Standard_E8as_v5     $0.059            $0.084
Standard_E8s_v5      $0.059            $0.086
Standard_D8as_v4     $0.065            $0.092
Standard_F8s_v2      $0.065            $0.092
Standard_D8s_v4      $0.066            $0.094
Standard_E8as_v4     $0.074            $0.108
Standard_E8s_v4      $0.076            $0.111
Among large instance types, a v5 AMD instance with a 1:4 vCPU:RAM ratio again topped the heap for both reserved and on-demand pricing, and was again followed by a v5 Intel variant with the same vCPU:RAM ratio.

The F-series instance type running an older Intel Cascade Lake processor with a 1:2 vCPU:RAM ratio was a surprise performer among large instance types, and nearly tied with the newer Intel Ice Lake node. This is likely because the F-series instance type was heavily discounted.

Average $/TPM, large Azure instance types:

Instance type        Reserved $/TPM    On-demand $/TPM
Standard_D32as_v5    $0.057            $0.085
Standard_D32s_v5     $0.066            $0.101
Standard_F32s_v2     $0.068            $0.104
Standard_D32s_v4     $0.074            $0.114
Standard_E32as_v5    $0.074            $0.115
Standard_E32s_v5     $0.077            $0.120
Standard_D32as_v4    $0.088            $0.135
Standard_E32s_v4     $0.090            $0.140
Standard_E32as_v4    $0.104            $0.162
CPU benchmarks
Methodology
We evaluated each cloud’s CPU performance using the CoreMark version 1.0 benchmark.

The CoreMark benchmark can be limited to run on a single vCPU, or it can execute workloads on multiple vCPUs in
parallel. We ran CoreMark in both modes ten times and recorded the average number of iterations/second for both
the single-core and multi-core results.

Here, we have reported only multi-core results, as we expect the multi-core benchmark to be more representative of real-world performance. To compare multi-core performance across different instance sizes, we derived a per-vCPU score for each multi-core run. While this does not correlate to the single-core score for a variety of reasons, it allows us to compare multi-core runs across instances with different vCPU counts.

About CoreMark
CoreMark is an open-source, cloud-agnostic benchmark that evaluates general CPU performance. It executes various workloads representative of the types of operations often found in real-world applications, including list management, list sort and search, matrix manipulation, and checksum computation. The benchmark then reports the number of operations performed and produces a single-number score (the higher the score, the better the performance).
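The aggregation described above can be sketched briefly: average the multi-core iterations/second over the recorded runs, then divide by the instance's vCPU count to get the per-vCPU score. The run values below are hypothetical:

```python
# Derive a per-vCPU CoreMark-style score from multi-core runs, as described
# above: average the runs, then divide by vCPU count.
def per_vcpu_score(multicore_runs: list[float], vcpus: int) -> float:
    """Average iterations/sec across runs, normalized per vCPU."""
    avg = sum(multicore_runs) / len(multicore_runs)
    return avg / vcpus

runs = [140_000.0, 138_500.0, 141_200.0]  # iterations/sec, hypothetical
print(round(per_vcpu_score(runs, 8), 1))
```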

AWS results

Compute-optimized AWS instance types led the pack here – unsurprising, given that they run at a higher clock speed. As mentioned in the Insights section, the new Ice Lake (m6i) instances underperformed on this benchmark. We're not sure why; this may reflect a benchmark abnormality rather than a real performance difference.

Average CoreMark score by instance type:

Instance type   Processor                        Average CoreMark score
c5a.2xlarge     AMD Rome                         18,620
c5.2xlarge      Intel Cascade Lake or Skylake    18,176
c5a.8xlarge     AMD Rome                         18,002
c5n.9xlarge     Intel Skylake                    17,400
c5.9xlarge      Intel Cascade Lake or Skylake    17,367
c5n.2xlarge     Intel Skylake                    17,276
m6i.2xlarge     Intel Ice Lake                   17,019
m6i.8xlarge     Intel Ice Lake                   16,955
r5.8xlarge      Intel Cascade Lake or Skylake    16,159
r5b.8xlarge     Intel Cascade Lake or Skylake    16,097
r5b.2xlarge     Intel Cascade Lake or Skylake    15,960
m5n.2xlarge     Intel Cascade Lake or Skylake    15,788
r5n.8xlarge     Intel Cascade Lake               15,767
m5n.8xlarge     Intel Cascade Lake or Skylake    15,763
m5.2xlarge      Intel Cascade Lake or Skylake    15,744
r5.2xlarge      Intel Cascade Lake or Skylake    15,714
m5.8xlarge      Intel Cascade Lake or Skylake    15,710
r5n.2xlarge     Intel Cascade Lake               15,636
m5a.2xlarge     AMD Naples                       13,036
r5a.2xlarge     AMD Naples                       13,030
m5a.8xlarge     AMD Naples                       12,662
r5a.8xlarge     AMD Naples                       12,557
GCP results

The latest-generation AMD instances (t2d) excelled in this benchmark on a per-vCPU basis and overall.

Average CoreMark score by instance type:

Instance type         Processor                              Average CoreMark score
t2d-standard-8        AMD Milan                              26,973
t2d-standard-32       AMD Milan                              26,475
n2d-highcpu-8         AMD Rome or AMD Milan                  19,120
n2d-highcpu-32        AMD Rome or AMD Milan                  18,907
n2d-standard-8        AMD Rome or AMD Milan                  18,792
n2d-highmem-8         AMD Rome or AMD Milan                  18,663
n2d-standard-32       AMD Rome or AMD Milan                  18,478
n2d-highmem-32        AMD Rome or AMD Milan                  18,291
c2-standard-8         Intel Cascade Lake                     18,010
c2-standard-30        Intel Cascade Lake                     17,917
n2-highcpu-8          Intel Cascade Lake                     16,232
n2-highmem-8          Intel Cascade Lake                     16,227
n2-custom-8-16384     Intel Cascade Lake                     16,201
n2-standard-8         Intel Cascade Lake or Intel Ice Lake   16,183
n2-highcpu-32         Intel Cascade Lake                     16,147
n2-custom-32-65536    Intel Cascade Lake                     15,764
n2-highmem-32         Intel Cascade Lake                     15,417
n2-standard-32        Intel Cascade Lake or Intel Ice Lake   15,317
Azure results

Overall, AMD Milan instances led the pack for Azure, followed by most of the Intel Cascade Lake instances. This is in line with what we saw with the other cloud providers, although given their dominance on GCP, we were expecting to see the AMD Milan instances perform even better here. Azure AMD v5 instances run a different AMD processor model than the Google nodes running AMD Milan processors, which could explain the difference.

Average CoreMark score by instance type:

Instance type        Processor                        Average CoreMark score
Standard_E8as_v5     AMD Milan                        19,651
Standard_D8as_v5     AMD Milan                        19,550
Standard_E32as_v5    AMD Milan                        19,213
Standard_D32as_v5    AMD Milan                        19,185
Standard_E8as_v4     AMD Rome                         18,411
Standard_D8as_v4     AMD Rome                         18,181
Standard_F8s_v2      Intel Cascade Lake or Skylake    17,399
Standard_D8s_v4      Intel Cascade Lake               17,386
Standard_E32s_v4     Intel Cascade Lake or Skylake    17,340
Standard_D32s_v4     Intel Cascade Lake               17,274
Standard_D32as_v4    AMD Rome                         17,168
Standard_E32as_v4    AMD Rome                         17,122
Standard_D8s_v5      Intel Ice Lake                   17,032
Standard_E8s_v5      Intel Ice Lake                   17,005
Standard_E32s_v5     Intel Ice Lake                   16,961
Standard_D32s_v5     Intel Ice Lake                   16,910
Standard_F32s_v2     Intel Cascade Lake or Skylake    16,215
Standard_E8s_v4      Intel Cascade Lake or Skylake    14,107
Network benchmarks
Methodology
For this benchmark, we focused on node-to-node (single availability zone) and cross-region latency and throughput, since CockroachDB leverages these capabilities heavily when running in a production environment.

The network benchmark consists of two separate measurements, latency and throughput, implemented in netperf 2.7.1.

In this year's report, we have made numerous improvements² to this benchmark to suit the usage characteristics of CockroachDB more closely. These tests were run intra-AZ as well as inter-region, leveraging each cloud provider's Northern Virginia region as the client node and each provider's Pacific Northwest US region as the server node.

Specifically, this year we made the following changes:

• Implemented a multi-stream method for the throughput test, with a fixed effective TCP socket buffer size of 16 MB. This size is consistent with the maximum flow-control window size allowed in gRPC, which is the primary node-to-node communication protocol of CockroachDB.

• Incremented the stream count stepwise until the aggregate throughput converged (i.e., stopped increasing) to determine the number of streams for the highest available throughput of a particular configuration.

• Increased the duration of the throughput test from 60 seconds to 12 minutes.

• Ran three sequential runs for each iteration and collected only the results from the median run in the throughput test, to try to avoid skew and bursting errors associated with netperf 2.7.1.

• Extended the output of the throughput test to include the min/max/avg throughput during the test duration, and also investigated time-series plots to illustrate throughput fluctuation.

To assist with measuring cross-region latency, we used publicly available distances from Dulles, Virginia to Portland, Oregon and Seattle, Washington, and entered them into https://wintelguy.com/wanlat.html to calculate the theoretical latency floor for our cross-region network tests. We did not change any defaults in the calculator.

² In the 2021 Cloud Report, we ran vanilla netperf TCP_RR and TCP_STREAM tests between two nodes of the same machine type from the same availability zone of the same cloud provider, each test for 60 seconds. We extended the runtime and tuned some parameters to ensure more consistent results in this year's testing.

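The stepwise stream-count search described above can be sketched as a simple control loop. This is our illustration, not the report's actual harness; `measure` is a stand-in for running N parallel netperf TCP_STREAM tests and summing their Gbps, and the convergence tolerance is an assumption:

```python
# Add streams until aggregate throughput stops increasing (converges),
# returning the stream count that achieved the highest throughput.
def find_stream_count(measure, max_streams=64, tolerance=0.01):
    best = measure(1)
    streams = 1
    for n in range(2, max_streams + 1):
        agg = measure(n)
        if agg <= best * (1 + tolerance):  # no meaningful gain: converged
            return streams, best
        best, streams = agg, n
    return streams, best

# Hypothetical link that saturates at ~36 Gbps with 4 streams.
caps = {1: 10.0, 2: 20.0, 3: 30.0, 4: 36.0}
print(find_stream_count(lambda n: caps.get(n, 36.2)))
```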
Latency
There was no significant difference in latency for any of the instance types we tested across all three clouds. Since the variation between runs was greater than the gross latency of our best available runs, we see this as a statistical three-way tie.

In our intra-AZ latency testing, the highest overall latency was 279 microseconds (µs), or 0.279 milliseconds, across all of the clouds, and the average latency was 70 µs. The lowest overall latency for each cloud was between 27 and 29 µs. This was without optimizing our nodes into placement groups, which could have pushed the average numbers even lower.

It's also important to note that since we did not set up placement groups to control node placement within an AZ, the variation in performance between nodes could be a measure of physical distance between the instances rather than of the provisioned infrastructure.

Ultimately, outside of a true high-performance computing use case, we don't expect intra-AZ latency to be an issue on any of the clouds.

[Chart: intra-AZ latency per cloud, in µs (maximum of mean, median of mean, and minimum of min latency)]
In our cross-region testing, we measured latency between each provider's Northern Virginia and Pacific Northwest regions. Using a WAN latency estimator, we calculated the best possible latency between Northern Virginia and Portland, Oregon (GCP) at 54 milliseconds (ms), and between Northern Virginia and Seattle, Washington (Azure and AWS) at 53 ms.

Overall, the minimum and average latencies were pretty tightly clustered. Despite being at a slight distance disadvantage, GCP performed best overall with a ~56 ms average latency. Azure had an average latency of ~62 ms, while AWS had an average latency of ~67 ms. (These results hold even when we remove outliers.)

[Chart: cross-region latency per cloud, in µs (maximum of mean, median of mean, and minimum of min latency); medians of mean latency: 56,025 µs (GCP), 62,608 µs (Azure), 67,239 µs (AWS)]
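The theoretical latency floor the WAN calculator estimates can be sketched from first principles: light in fiber travels at roughly 2/3 the vacuum speed of light, and real fiber routes run longer than the great-circle distance. The route factor and the distance below are assumptions for illustration, not the calculator's internals:

```python
# Estimate a round-trip latency floor over long-haul fiber.
C_KM_PER_MS = 299_792.458 / 1000   # vacuum speed of light, km per ms
FIBER_FACTOR = 2 / 3               # light in fiber is ~2/3 as fast
ROUTE_FACTOR = 1.3                 # assumed routing overhead vs. great circle

def rtt_floor_ms(distance_km: float) -> float:
    one_way = distance_km * ROUTE_FACTOR / (C_KM_PER_MS * FIBER_FACTOR)
    return 2 * one_way

# Dulles, VA to Seattle, WA is roughly 3,700 km great-circle.
print(round(rtt_floor_ms(3700), 1))
```

With these assumptions the floor lands in the high-40s of milliseconds, in the same ballpark as the calculator's 53–54 ms estimates.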
Throughput
Unlike the latency numbers, which were extremely consistent across all the different instance types we tested, throughput varied
quite a bit by machine type, so we will break it down on a per-cloud basis.

For the throughput tests, we present both the median result of the mean throughput observed per instance type, and the median
result of the max throughput observed per instance type to demonstrate the extent to which throughput varied during runs.
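These two summary statistics can be sketched directly; for each instance type, take each run's mean and max throughput, then report the median of each across runs (run values below are hypothetical):

```python
# Compute median-of-mean and median-of-max throughput across runs.
from statistics import median

def summarize(runs: list[dict]) -> tuple[float, float]:
    med_of_mean = median(r["mean"] for r in runs)
    med_of_max = median(r["max"] for r in runs)
    return med_of_mean, med_of_max

runs = [{"mean": 14.7, "max": 15.0}, {"mean": 14.9, "max": 15.2},
        {"mean": 11.2, "max": 14.9}]  # Gbps, hypothetical
print(summarize(runs))
```

A large gap between the two numbers indicates throughput that fluctuated (or was throttled) during runs.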

AWS results

Unsurprisingly, network-optimized instances (c5n, m5n, r5n) had the best intra-AZ throughput on AWS. Other instances had roughly expected results, capping out around 10 Gbps.

One thing to note is that many of the instances encountered apparent throttling at some point during the test, which is why some AWS throughput medians are lower than the other clouds'. We have included details relating to this in the Insights section.

Because of the apparent throttling, most AWS instance types we tested will not sustain their advertised maximum network throughput. This leads us to believe that if you're planning on using AWS for a project requiring a lot of high-throughput data transfers, you should lean towards the network-optimized instance types.

Intra-AZ throughput in Gbps:

Instance type    Median of mean    Median of max
c5n.9xlarge      36.70             37.08
m5n.8xlarge      18.35             18.52
r5n.8xlarge      18.12             18.51
c5n.2xlarge      14.79             14.95
r5n.2xlarge      11.67             14.90
m5n.2xlarge      10.50             14.88
m6i.8xlarge      9.06              9.31
c5.9xlarge       8.83              8.97
m5.8xlarge       7.34              7.48
m5a.8xlarge      7.30              7.44
r5b.8xlarge      7.12              7.21
r5.8xlarge       7.12              7.23
c5a.8xlarge      7.08              7.22
r5a.8xlarge      6.85              7.42
m6i.2xlarge      6.13              8.66
r5b.2xlarge      5.87              7.28
m5a.2xlarge      5.82              7.19
c5.2xlarge       5.72              7.24
r5a.2xlarge      5.56              7.42
r5.2xlarge       5.44              7.47
c5a.2xlarge      5.23              7.26
m5.2xlarge       5.14              7.20
AWS's cross-region median and max throughput speeds were pretty consistent, with larger, network-optimized instance types having the highest overall throughput. We did not see the same throttling here that we saw in the intra-AZ comparison.

Cross-region throughput in Gbps:

Instance type    Median of mean    Median of max
c5n.9xlarge      23.82             24.29
r5n.8xlarge      11.95             12.17
m5n.8xlarge      11.59             12.15
c5.9xlarge       5.74              5.93
m6i.8xlarge      5.54              6.12
c5n.2xlarge      4.77              4.91
m5n.2xlarge      4.73              4.94
m5.8xlarge       4.73              4.89
r5n.2xlarge      4.70              4.89
r5b.8xlarge      4.70              4.83
r5b.2xlarge      4.67              4.94
r5a.2xlarge      4.66              4.89
r5.2xlarge       4.66              4.90
r5.8xlarge       4.63              4.89
c5a.8xlarge      4.63              4.89
m5a.8xlarge      4.62              4.88
c5.2xlarge       4.61              4.90
r5a.8xlarge      4.58              4.82
m5a.2xlarge      4.58              4.81
m6i.2xlarge      4.57              4.90
c5a.2xlarge      4.53              4.84
m5.2xlarge       4.52              4.74
GCP results

All of GCP's average throughput numbers were roughly in line with their advertised throughput caps. However, many of the instance types showed performance bursts that significantly exceeded the throughput numbers GCP advertises.

Intra-AZ throughput in Gbps:

Instance type         Median of mean    Median of max
c2-standard-30        31.20             32.68
n2-highmem-32         29.28             32.58
n2-custom-32-65536    28.69             32.70
n2d-highcpu-32        28.04             32.14
n2d-standard-32       26.85             32.17
n2-highcpu-32         26.39             32.03
n2-standard-32        26.12             31.08
n2d-highmem-32        21.88             28.68
c2-standard-8         16.74             17.07
n2-standard-8         15.82             15.92
n2-highcpu-8          15.80             15.88
n2d-highmem-8         15.78             16.04
n2-custom-8-16384     15.75             15.93
n2d-standard-8        15.72             15.96
n2-highmem-8          15.67             15.90
n2d-highcpu-8         15.54             15.89
GCP's cross-region throughput was extremely consistent with the intra-AZ testing, with no evidence of mid-run throttling.

Cross-region throughput in Gbps:

Instance type         Median of mean    Median of max
n2-highmem-32         27.15             32.49
n2-custom-32-65536    24.65             32.55
c2-standard-30        23.98             27.67
n2d-highcpu-32        22.11             31.25
n2d-standard-32       21.22             31.64
n2-highcpu-32         20.76             27.04
n2-standard-32        19.97             28.06
n2d-highmem-32        18.71             25.66
c2-standard-8         16.46             17.80
n2d-highmem-8         15.75             16.37
n2-highcpu-8          15.37             16.64
n2-standard-8         15.34             16.04
n2d-highcpu-8         15.25             16.24
n2-highmem-8          15.21             15.88
n2d-standard-8        12.10             15.99
n2-custom-8-16384     10.82             16.02
Azure results

All of the Azure instance types performed as expected, and some of the new v5 instances had single-run bursts well above their advertised throughput speeds.

Intra-AZ throughput in Gbps:

Instance type        Median of mean    Median of max
Standard_E32as_v5    15.22             15.38
Standard_E32as_v4    15.21             15.43
Standard_D32as_v4    15.21             15.38
Standard_E32s_v4     15.17             15.30
Standard_D32s_v5     15.12             17.14
Standard_D32as_v5    14.83             15.31
Standard_E32s_v5     13.89             16.37
Standard_D8as_v5     11.90             12.05
Standard_D8s_v5      11.88             15.22
Standard_D8s_v4      11.87             11.97
Standard_F8s_v2      11.67             11.89
Standard_E8as_v5     11.48             12.02
Standard_E8s_v5      10.76             12.72
Standard_D32s_v4     10.58             11.45
Standard_E8s_v4      8.98              10.30
Standard_F32s_v2     8.69              9.90
Standard_E8as_v4     7.61              7.75
Standard_D8as_v4     7.61              7.73
Azure's cross-region performance was less consistent than the other providers', but it still roughly hit its promised performance in most cases. There were also some single-run bursts that scored well above advertised throughput speeds.

Cross-region throughput in Gbps:

Instance type        Median of mean    Median of max
Standard_E32as_v4    14.97             15.59
Standard_D32as_v4    14.89             15.61
Standard_E32as_v5    14.34             15.51
Standard_E32s_v5     12.53             16.58
Standard_D8s_v4      11.73             12.61
Standard_D8as_v5     11.54             12.23
Standard_D32s_v5     10.71             14.63
Standard_E8as_v5     10.63             12.05
Standard_D32as_v5    10.10             15.45
Standard_D8s_v5      9.64              17.40
Standard_D8as_v4     7.61              8.11
Standard_E8as_v4     7.47              7.85
Standard_E8s_v5      6.17              10.13
Standard_E32s_v4     3.87              6.21
Standard_D32s_v4     3.64              6.19
Standard_E8s_v4      2.92              8.62
Standard_F8s_v2      2.66              4.10
Standard_F32s_v2     2.17              3.92
Storage benchmarks
Methodology
We tested each cloud's storage I/O performance using FIO (Flexible I/O tester) version 3.1, an open-source benchmarking suite designed for measuring storage I/O performance. FIO allowed us to isolate storage performance characteristics by running workloads against the storage volume itself, as opposed to a file system.

Overall, we assessed each cloud's storage I/O performance on durable network-attached storage by analyzing the following metrics:

• Input/output operations per second (IOPS): the number of operations per second.

• Throughput (also known as bandwidth): the quantity of data sent and received over a time period.

• Fsync I/O latency: since CockroachDB is a durable application that executes an fsync at the end of each write transaction, we added some tests allowing us to measure the write-latency impact of fsyncs.

We looked at general-purpose volumes (gp3 for AWS, premium-disk for Azure, and pd-ssd for GCP) as well as high-performance volumes (io2 for AWS, ultra-disk for Azure, and pd-extreme for GCP).

For the smaller instance types we provisioned 1.0 TB volumes, and for the larger instances we provisioned 2.5 TB volumes. This allowed us sufficient space to run identical tests across all platforms, while also accommodating storage platforms where the provisioned IOPS scale per GiB of storage. CockroachDB requires a minimum of 500 IOPS/vCPU for general usage, and can sometimes leverage more than that depending on the workload and specific processor. We wanted to be certain that each instance type had sufficient IOPS to not be I/O bound.

Unlike in past years, we did not test local SSD storage options, as each cloud provider now offers multiple performant persistent-disk options, and for a database we have a strong bias towards durability. Our usage of network-attached storage volumes in CockroachDB Dedicated and Serverless served as our primary motivation for making them the focus of this report.

We have also added some additional testing to better simulate the way CockroachDB interacts with storage – largely, by working with a variety of block sizes and adding some tests where fsyncs are called to determine the performance impact of an fsync on writes and overall IOPS.
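A write-IOPS-with-fsync test of the kind described above could be expressed as an fio job along these lines. This is our hedged sketch, not the report's published job file; the target device path is an assumption:

```ini
; Sketch of a 4k write test with an fsync every 512 KB (128 x 4k writes).
[global]
direct=1                 ; bypass the page cache to measure the volume itself
time_based=1
runtime=60
filename=/dev/sdb        ; raw volume under test (assumed device path)

[write-4k-fsync]
rw=write
bs=4k
iodepth=1
fsync=128                ; fsync after every 128 writes = every 512 KB
```

Varying `bs` and the `fsync` interval covers the block-size and fsync-impact dimensions the methodology mentions.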
AWS results

Understanding the results
Note: General-purpose gp3 storage volumes have 8,000 provisioned IOPS for small instance types and 16,000 provisioned IOPS for large instance types. The high-performance io2 volumes do not have provisioned IOPS.

Looking at write IOPS with a 4k block size and an fsync every 512 KB demonstrated how fsyncs might impact overall write performance. Overall, we got what was provisioned, but we did see some slowdowns on some of the larger nodes using the general-purpose storage.

Average write IOPS by instance type and storage:

Instance type    High-performance io2    General-purpose gp3
m5a.8xlarge      16,002                  16,036
c5.9xlarge       16,003                  16,003
r5b.8xlarge      16,001                  16,002
m6i.8xlarge      16,001                  16,001
r5.8xlarge       16,003                  15,186
c5n.9xlarge      16,002                  15,071
r5n.8xlarge      16,002                  14,455
r5a.8xlarge      16,001                  14,048
m5.8xlarge       16,002                  13,805
m5n.8xlarge      16,002                  13,625
c5a.8xlarge      13,300                  12,543
c5n.2xlarge      16,002                  8,025
m6i.2xlarge      15,993                  8,000
r5a.2xlarge      16,001                  8,000
r5.2xlarge       16,003                  8,000
m5.2xlarge       16,002                  8,000
c5.2xlarge       16,002                  8,000
m5n.2xlarge      16,002                  8,000
m5a.2xlarge      15,999                  8,000
r5b.2xlarge      16,001                  8,000
c5a.2xlarge      13,300                  8,000
r5n.2xlarge      16,000                  7,941
We tested read and write IOPS with a dynamic block size based on the Cockroach Labs Derivative TPC-C nowait workload to demonstrate the optimal read and write I/O performance we think we could see in the OLTP benchmark.

Average read IOPS by instance type and storage:

Instance type    High-performance io2    General-purpose gp3
c5n.9xlarge      17,340                  9,986
r5.8xlarge       17,448                  9,963
r5n.8xlarge      17,443                  9,963
m5n.8xlarge      17,442                  9,962
c5.9xlarge       17,342                  9,960
c5a.8xlarge      13,300                  9,960
m5a.8xlarge      17,454                  9,954
m5.8xlarge       17,446                  9,952
m6i.8xlarge      17,339                  9,948
r5a.8xlarge      17,457                  9,946
r5b.8xlarge      17,349                  9,946
c5n.2xlarge      17,445                  8,728
m5n.2xlarge      17,449                  8,728
m5.2xlarge       17,664                  8,725
r5.2xlarge       17,447                  8,722
c5.2xlarge       17,454                  8,721
r5n.2xlarge      17,539                  8,721
r5a.2xlarge      16,002                  8,720
c5a.2xlarge      13,299                  8,718
r5b.2xlarge      17,337                  8,670
m6i.2xlarge      17,336                  8,668
m5a.2xlarge      16,003                  8,000
AWS results

Average write IOPS by instance type and storage (high-performance io2 vs. general-purpose gp3):

Instance type    io2      gp3
r5a.8xlarge      16,375   16,420
m5n.8xlarge      16,377   16,416
r5n.8xlarge      16,376   16,415
r5.8xlarge       16,377   16,409
m5a.8xlarge      16,380   16,398
c5n.9xlarge      16,310   16,337
m6i.8xlarge      16,309   16,327
r5b.8xlarge      16,303   16,317
c5.9xlarge       16,308   16,316
c5a.8xlarge      13,300   13,300
c5n.2xlarge      16,372   8,197
m5.2xlarge       16,373   8,193
r5n.2xlarge      16,395   8,192
m5a.2xlarge      15,999   8,191
r5.2xlarge       16,372   8,190
c5a.2xlarge      13,300   8,187
c5.2xlarge       16,375   8,185
r5a.2xlarge      16,001   8,185
m5n.2xlarge      16,378   8,184
m6i.2xlarge      16,303   8,153
r5b.2xlarge      16,299   8,153

Cockroach Labs 60
GCP results

Understanding the results
GCP scales the available IOPS by the size of the instance.

Here, we see write IOPS with a 4k block size and an fsync every 512 KB, demonstrating how fsyncs might impact overall write performance. The pd-extreme storage outperformed pd-ssd in general, but we hit instance caps at the top end for all the volumes (64+ vCPUs are required to get optimal performance from GCP volumes), and there were even a few cases where the cheaper pd-ssd volumes outperformed the pd-extreme volumes.

Average write IOPS by instance type and storage (high-performance pd-extreme vs. general-purpose pd-ssd):

Instance type         pd-extreme   pd-ssd
n2-highmem-32         47,691       50,487
n2-standard-32        47,611       48,953
n2-highcpu-32         44,627       46,663
n2-custom-32-65536    38,538       37,611
n2d-highcpu-32        30,551       35,936
n2d-standard-32       33,559       33,496
n2d-highmem-32        34,443       30,794
n2-standard-8         25,001       15,001
n2-custom-8-16384     25,001       15,001
n2-highmem-8          25,001       15,000
n2-highcpu-8          25,001       15,000
n2d-standard-8        25,000       15,000
n2d-highcpu-8         25,000       14,999
n2d-highmem-8         24,999       14,999
c2-standard-30        8,000        8,000
c2-standard-8         4,000        3,999

Cockroach Labs 61
GCP results

We tested read and write IOPS with a dynamic block size based on the Cockroach Labs Derivative TPC-C nowait workload to demonstrate the optimal read and write I/O performance we think we could see in the OLTP benchmark. The pd-extreme volumes outperformed pd-ssd across the board.

Average read IOPS by instance type and storage (high-performance pd-extreme vs. general-purpose pd-ssd):

Instance type         pd-extreme   pd-ssd
n2-custom-32-65536    98,334       58,982
n2-standard-32        96,190       57,698
n2-highcpu-32         96,189       57,697
n2-highmem-32         96,190       57,693
n2d-highcpu-32        90,557       57,691
n2d-highmem-32        96,435       57,690
n2d-standard-32       92,289       57,690
n2-custom-8-16384     24,567       14,734
n2d-highcpu-8         24,028       14,414
n2-highmem-8          24,030       14,411
n2-highcpu-8          24,029       14,410
n2-standard-8         24,030       14,409
n2d-standard-8        24,028       14,409
n2d-highmem-8         24,029       14,408
c2-standard-30        14,406       14,407
c2-standard-8         3,830        3,829

Cockroach Labs 62
GCP results

Average write IOPS by instance type and storage (high-performance pd-extreme vs. general-purpose pd-ssd):

Instance type         pd-extreme   pd-ssd
n2-highmem-32         117,314      58,525
n2-standard-32        117,268      58,525
n2-custom-32-65536    97,593       58,523
n2-highcpu-32         117,351      58,522
n2d-highmem-32        98,042       58,517
n2d-standard-32       95,335       58,517
n2d-highcpu-32        97,930       58,515
n2-highmem-8          24,666       14,795
n2-custom-8-16384     24,667       14,795
n2d-highcpu-8         24,665       14,794
n2d-standard-8        24,664       14,794
n2-standard-8         24,665       14,794
n2-highcpu-8          24,665       14,794
n2d-highmem-8         24,665       14,793
c2-standard-30        7,887        7,888
c2-standard-8         3,941        3,941

Cockroach Labs 63
Azure results

Understanding the results
There are a couple of instance types missing ultra-disk runs. This is because we were unable to get ultra-disk quota for those instance types when we were testing.

Here we see write IOPS with a 4k block size and an fsync every 512 KB, which we think demonstrates how fsyncs might impact overall write performance. Ultra-disk outperformed premium-disk across the board.

Average write IOPS by instance type and storage (high-performance ultra-disk vs. general-purpose premium-disk; n/a marks missing ultra-disk runs):

Instance type       ultra-disk   premium-disk
Standard_F32s_v2    16,321       7,651
Standard_E32as_v5   16,320       7,650
Standard_D32as_v5   16,320       7,649
Standard_D32s_v5    n/a          7,635
Standard_D32as_v4   16,320       7,601
Standard_E32as_v4   16,320       7,587
Standard_E32s_v5    n/a          7,537
Standard_D32s_v4    16,320       7,506
Standard_E32s_v4    16,320       7,337
Standard_E8as_v4    16,322       5,101
Standard_E8s_v4     16,322       5,100
Standard_D8s_v4     16,320       5,100
Standard_D8s_v5     n/a          5,100
Standard_E8as_v5    16,319       5,099
Standard_D8as_v5    16,320       5,099
Standard_E8s_v5     n/a          5,091
Standard_D8as_v4    16,286       5,090
Standard_F8s_v2     16,320       4,889

Cockroach Labs 64
Azure results

We tested read and write IOPS with a dynamic block size based on the Cockroach Labs Derivative TPC-C nowait workload to demonstrate the optimal read and write I/O performance we think we could see in the OLTP benchmark. Again, ultra-disk volumes outperformed premium-disk across the board.

Average read IOPS by instance type and storage (high-performance ultra-disk vs. general-purpose premium-disk; n/a marks missing ultra-disk runs):

Instance type       ultra-disk   premium-disk
Standard_D32as_v4   16,312       7,790
Standard_D32as_v5   16,313       7,800
Standard_D32s_v4    16,313       7,767
Standard_D32s_v5    n/a          7,805
Standard_D8as_v4    16,303       5,147
Standard_D8as_v5    16,309       5,242
Standard_D8s_v4     16,307       5,151
Standard_D8s_v5     n/a          5,269
Standard_E32as_v4   16,312       7,779
Standard_E32as_v5   16,313       7,810
Standard_E32s_v4    16,313       7,803
Standard_E32s_v5    n/a          7,808
Standard_E8as_v4    16,304       5,146
Standard_E8as_v5    16,310       5,252
Standard_E8s_v4     16,305       5,151
Standard_E8s_v5     n/a          5,225
Standard_F32s_v2    16,313       7,795
Standard_F8s_v2     16,303       5,111

Cockroach Labs 65
Azure results

Average write IOPS by instance type and storage (high-performance ultra-disk vs. general-purpose premium-disk; n/a marks missing ultra-disk runs):

Instance type       ultra-disk   premium-disk
Standard_D32as_v5   16,316       7,647
Standard_D32s_v4    16,316       7,647
Standard_E32as_v5   16,316       7,646
Standard_D32s_v5    n/a          7,646
Standard_E32s_v5    n/a          7,645
Standard_F32s_v2    16,316       7,645
Standard_E32as_v4   16,316       7,610
Standard_E32s_v4    16,316       7,483
Standard_D32as_v4   16,316       7,446
Standard_F8s_v2     16,319       5,108
Standard_D8as_v5    16,316       5,106
Standard_D8s_v5     n/a          5,097
Standard_E8s_v4     16,320       5,096
Standard_E8as_v5    16,316       5,096
Standard_E8as_v4    16,317       5,096
Standard_D8as_v4    16,311       5,096
Standard_E8s_v5     n/a          5,094
Standard_D8s_v4     16,315       5,079

Cockroach Labs 66
This year, we are not declaring
an overall winner of the Cloud Report.
Our testing shows that all three cloud providers offer price-competitive options.

All three clouds can also offer top tier performance for OLTP applications. In terms of
per-vCPU TPM, GCP took the top spot, but all three clouds had at least one instance
type in the top five, and multiple instance types in the top ten.

However, our testing did reveal important considerations for customers assessing
their cloud and instance type options, including:

• It is generally not worth paying for high-performance storage unless your workload
specifically requires it.

• Storage and transfer represent a large portion of the total cost of operating a
database in the cloud, and in some cases we found that cross-region transfers
cost the same as intra-region transfers – getting the availability, latency, and
survivability advantages of a multi-region database doesn’t always come at
a higher cost than single-region.

• For consistency across a variety of workload complexities, we recommend instance types with at least 4 GB of RAM per vCPU.

• Small instance types may outperform larger ones for OLTP applications.
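The RAM-per-vCPU guideline is easy to apply mechanically when short-listing instance shapes. A sketch, using a few shapes from this report (vCPU and RAM figures come from Appendix I):

```python
# Filter candidate shapes by the >= 4 GB RAM per vCPU recommendation above.
instances = {
    # name: (vCPU count, RAM in GB), per Appendix I
    "n2-highcpu-8": (8, 8),
    "n2-standard-8": (8, 32),
    "c5.2xlarge": (8, 16),
    "m6i.2xlarge": (8, 32),
    "r5.2xlarge": (8, 64),
}
recommended = sorted(
    name for name, (vcpus, ram_gb) in instances.items() if ram_gb / vcpus >= 4
)
print(recommended)  # -> ['m6i.2xlarge', 'n2-standard-8', 'r5.2xlarge']
```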

Cockroach Labs 67
About Cockroach Labs
Cockroach Labs is the creator of CockroachDB, the most highly evolved, cloud-native, distributed SQL database on the planet, helping companies of all sizes, and the apps they develop, to scale fast, survive failures, and thrive everywhere. CockroachDB is in use at some of the world's largest enterprises across all industries, including Bose, Comcast, Hard Rock Digital, and some of the most recognized companies in banking, media & entertainment, retail, and technology. Headquartered in New York City with offices in Toronto and San Francisco, Cockroach Labs is backed by Altimeter, Benchmark, BOND, Coatue, FirstMark, Greenoaks, GV, Index Ventures, J.P. Morgan, Lone Pine Capital, Redpoint Ventures, Sequoia Capital, Tiger Global, and Workbench.

SERVERLESS, CODE MORE.
Never worry about your database again.

Cockroach Labs 68
This project is open-source and you are welcome to run the benchmarks yourself.

Reproduction steps are fully open source, available in this repo:
https://github.com/cockroachlabs/cloud-report

You may just end up saving money, increasing performance, and learning something
new. If you have any questions about the report, we encourage you to get in touch with
us via our public Slack channel or learn more via our YouTube channel.

Cockroach Labs 69
Appendix

Cockroach Labs 70
Appendix I: Instance types tested
For instance types that show multiple processor generations (in some tests, we received machines with different processors
on different runs), the latest-generation CPU’s info is provided.

AWS

Machine Type | vCPU Count | RAM (GB) | Storage Types Tested | Processor Type | Processor, Speed

c5.2xlarge 8 16 ebs-gp3, ebs-io2 Intel Cascade Lake or Skylake Intel Xeon Platinum 8275CL 3.00GHz(3.90GHz)

c5.9xlarge 36 72 ebs-gp3, ebs-io2 Intel Cascade Lake or Skylake Intel Xeon Platinum 8275CL 3.00GHz(3.90GHz)

c5a.2xlarge 8 16 ebs-gp3, ebs-io2 AMD Rome (EPYC Gen 2) AMD EPYC 7R32 2.80GHz(3.20GHz)

c5a.8xlarge 32 64 ebs-gp3, ebs-io2 AMD Rome (EPYC Gen 2) AMD EPYC 7R32 2.80GHz(3.20GHz)

c5n.2xlarge 8 16 ebs-gp3, ebs-io2 Intel Skylake Intel Xeon Platinum 8124M 3.00GHz(3.50GHz)

c5n.9xlarge 36 72 ebs-gp3, ebs-io2 Intel Skylake Intel Xeon Platinum 8124M 3.00GHz(3.50GHz)

m5.2xlarge 8 32 ebs-gp3, ebs-io2 Intel Cascade Lake or Skylake Intel Xeon Platinum 8259CL 2.50GHz(3.50GHz)

m5.8xlarge 32 128 ebs-gp3, ebs-io2 Intel Cascade Lake or Skylake Intel Xeon Platinum 8259CL 2.50GHz(3.50GHz)

m5a.2xlarge 8 32 ebs-gp3, ebs-io2 AMD Naples (EPYC Gen 1) AMD EPYC 7571 2.20GHz(3.00GHz)

m5a.8xlarge 32 128 ebs-gp3 AMD Naples (EPYC Gen 1) AMD EPYC 7571 2.20GHz(3.00GHz)

m5n.2xlarge 8 32 ebs-gp3, ebs-io2 Intel Cascade Lake or Skylake Intel Xeon Platinum 8259CL 2.50GHz(3.50GHz)

m5n.8xlarge 32 128 ebs-gp3, ebs-io2 Intel Cascade Lake or Skylake Intel Xeon Platinum 8259CL 2.50GHz(3.50GHz)

m6i.2xlarge 8 32 ebs-gp3, ebs-io2 Intel Ice Lake Intel Xeon Platinum 8375C 2.90GHz(3.50GHz)

m6i.8xlarge 32 128 ebs-gp3, ebs-io2 Intel Ice Lake Intel Xeon Platinum 8375C 2.90GHz(3.50GHz)

r5.2xlarge 8 64 ebs-gp3, ebs-io2 Intel Cascade Lake or Skylake Intel Xeon Platinum 8259CL 2.50GHz(3.50GHz)

r5.8xlarge 32 256 ebs-gp3, ebs-io2 Intel Cascade Lake or Skylake Intel Xeon Platinum 8259CL 2.50GHz(3.50GHz)

r5a.2xlarge 8 64 ebs-gp3, ebs-io2 AMD Naples (EPYC Gen 1) AMD EPYC 7571 2.20GHz(3.00GHz)

r5a.8xlarge 32 256 ebs-gp3, ebs-io2 AMD Naples (EPYC Gen 1) AMD EPYC 7571 2.20GHz(3.00GHz)

r5b.2xlarge 8 64 ebs-gp3, ebs-io2 Intel Cascade Lake or Skylake Intel Xeon Platinum 8259CL 2.50GHz(3.50GHz)

r5b.8xlarge 32 256 ebs-gp3, ebs-io2 Intel Cascade Lake or Skylake Intel Xeon Platinum 8259CL 2.50GHz(3.50GHz)

r5n.2xlarge 8 64 ebs-gp3, ebs-io2 Intel Cascade Lake Intel Xeon Platinum 8259CL 2.50GHz(3.50GHz)

r5n.8xlarge 32 256 ebs-gp3, ebs-io2 Intel Cascade Lake Intel Xeon Platinum 8259CL 2.50GHz(3.50GHz)

Cockroach Labs 71
Appendix I: Instance types tested
GCP

Machine Type | vCPU Count | RAM (GB) | Storage Types Tested | Processor Type | Processor, Speed

c2-standard-8 8 32 pd-ssd, pd-extreme Intel Cascade Lake Intel Xeon 3.10GHz(3.80/3.90GHz)

c2-standard-30 30 120 pd-ssd, pd-extreme Intel Cascade Lake Intel Xeon 3.10GHz(3.80/3.90GHz)

n2-custom-32-131072 32 64 pd-ssd, pd-extreme Intel Cascade Lake Intel Xeon 2.80GHz(3.40/3.90GHz)

n2-custom-8-32768 8 16 pd-ssd, pd-extreme Intel Cascade Lake Intel Xeon 2.80GHz(3.40/3.90GHz)

n2-highcpu-32 32 32 pd-ssd, pd-extreme Intel Cascade Lake Intel Xeon 2.80GHz(3.40/3.90GHz)

n2-highcpu-8 8 8 pd-ssd, pd-extreme Intel Cascade Lake Intel Xeon 2.80GHz(3.40/3.90GHz)

n2-highmem-32 32 256 pd-ssd, pd-extreme Intel Cascade Lake Intel Xeon 2.80GHz(3.40/3.90GHz)

n2-highmem-8 8 64 pd-ssd, pd-extreme Intel Cascade Lake Intel Xeon 2.80GHz(3.40/3.90GHz)

n2-standard-32 32 128 pd-ssd, pd-extreme Intel Cascade Lake or Intel Ice Lake (-icelake) Intel Xeon 2.60GHz(3.10/3.40GHz)

n2-standard-8 8 32 pd-ssd, pd-extreme Intel Cascade Lake or Intel Ice Lake (-icelake) Intel Xeon 2.60GHz(3.10/3.40GHz)

n2d-highcpu-32 32 32 pd-ssd, pd-extreme AMD Rome (EPYC Gen 2) or AMD Milan (EPYC Gen 3) AMD EPYC 7B13 2.45GHz(3.5GHz)

n2d-highcpu-8 8 8 pd-ssd, pd-extreme AMD Rome (EPYC Gen 2) or AMD Milan (EPYC Gen 3) AMD EPYC 7B13 2.45GHz(3.5GHz)

n2d-highmem-32 32 256 pd-ssd, pd-extreme AMD Rome (EPYC Gen 2) or AMD Milan (EPYC Gen 3) AMD EPYC 7B13 2.45GHz(3.5GHz)

n2d-highmem-8 8 64 pd-ssd, pd-extreme AMD Rome (EPYC Gen 2) or AMD Milan (EPYC Gen 3) AMD EPYC 7B13 2.45GHz(3.5GHz)

n2d-standard-32 32 128 pd-ssd, pd-extreme AMD Rome (EPYC Gen 2) or AMD Milan (EPYC Gen 3) AMD EPYC 7B13 2.45GHz(3.5GHz)

n2d-standard-8 8 32 pd-ssd, pd-extreme AMD Rome (EPYC Gen 2) or AMD Milan (EPYC Gen 3) AMD EPYC 7B13 2.45GHz(3.5GHz)

t2d-standard-8 8 32 pd-ssd, pd-extreme AMD Milan (EPYC Gen 3) AMD EPYC 7B13 2.45GHz(3.5GHz)

t2d-standard-32 32 128 pd-ssd, pd-extreme AMD Milan (EPYC Gen 3) AMD EPYC 7B13 2.45GHz(3.5GHz)

Cockroach Labs 72
Appendix I: Instance types tested
Azure

Machine Type | vCPU Count | RAM (GB) | Storage Types Tested | Processor Type | Processor, Speed

Standard_D32as_v4 32 128 premium-disk, ultra-disk AMD Rome (EPYC Gen 2) AMD EPYC 7452 2.35GHz(3.35GHz)

Standard_D32as_v5 32 128 premium-disk, ultra-disk AMD Milan (EPYC Gen 3) AMD EPYC 7763 2.45GHz(3.5GHz)

Standard_D32s_v4 32 128 premium-disk, ultra-disk Intel Cascade Lake Intel Xeon Platinum 8272CL 2.60GHz(3.40/3.70GHz)

Standard_D32s_v5 32 128 premium-disk Intel Ice Lake Intel Xeon Platinum 8370C 2.80GHz(3.5GHz)

Standard_D8as_v4 8 32 premium-disk, ultra-disk AMD Rome (EPYC Gen 2) AMD EPYC 7452 2.35GHz(3.35GHz)

Standard_D8as_v5 8 32 premium-disk, ultra-disk AMD Milan (EPYC Gen 3) AMD EPYC 7763 2.45GHz(3.5GHz)

Standard_D8s_v4 8 32 premium-disk, ultra-disk Intel Cascade Lake Intel Xeon Platinum 8272CL 2.60GHz(3.40/3.70GHz)

Standard_D8s_v5 8 32 premium-disk Intel Ice Lake Intel Xeon Platinum 8370C 2.80GHz(3.5GHz)

Standard_E32as_v4 32 256 premium-disk, ultra-disk AMD Rome (EPYC Gen 2) AMD EPYC 7452 2.35GHz(3.35GHz)

Standard_E32as_v5 32 256 premium-disk, ultra-disk AMD Milan (EPYC Gen 3) AMD EPYC 7763 2.45GHz(3.5GHz)

Standard_E32s_v4 32 256 premium-disk, ultra-disk Intel Cascade Lake Intel Xeon Platinum 8272CL 2.60GHz(3.40/3.70GHz)

Standard_E32s_v5 32 256 premium-disk Intel Ice Lake Intel Xeon Platinum 8370C 2.80GHz(3.5GHz)

Standard_E8as_v4 8 64 premium-disk, ultra-disk AMD Rome (EPYC Gen 2) AMD EPYC 7452 2.35GHz(3.35GHz)

Standard_E8as_v5 8 64 premium-disk, ultra-disk AMD Milan (EPYC Gen 3) AMD EPYC 7763 2.45GHz(3.5GHz)

Standard_E8s_v4 8 64 premium-disk, ultra-disk Intel Cascade Lake Intel Xeon Platinum 8272CL 2.60GHz(3.40/3.70GHz)

Standard_E8s_v5 8 64 premium-disk Intel Ice Lake Intel Xeon Platinum 8370C 2.80GHz(3.5GHz)

Standard_F32s_v2 32 64 premium-disk, ultra-disk Intel Cascade Lake or Skylake Intel Xeon Platinum 8272CL 2.60GHz(3.40/3.70GHz)

Standard_F8s_v2 8 16 premium-disk, ultra-disk Intel Cascade Lake or Skylake Intel Xeon Platinum 8272CL 2.60GHz(3.40/3.70GHz)

Cockroach Labs 73
Appendix II: OLTP benchmark: small vs. large node investigation

To investigate the performance discrepancy between smaller and larger nodes, we performed experiments including:

• Looking at specific CPU/instance information to control for bias.

• Sample runs looking at processor frequency (Turbo Boost).

• Reconsidering some of the parameters of our scaled-up test.

• Designing a test that had the same vCPU count between the large and small nodes (sixteen 8 vCPU nodes compared against four 32 vCPU nodes at the same warehouse count).

We started by considering whether or not it was appropriate to use the same benchmark harness configuration for both the smaller and larger instance shapes. Our initial hypothesis was that we were artificially limiting the performance of the larger nodes. We did several tests changing individual load-generator parameters, and the only one that had a material impact on the performance of the database was changing the overall size of the connection pool at the load generator.

While our production guidance recommends four active connections per vCPU, we found that we could run a slightly higher number of active connections per vCPU (either 5 or 6) in certain cases without pushing the database cluster into a potentially unstable state, and we saw a corresponding slight increase in overall throughput. However, some instance types performed worse than with the original parameters, and both the small and large instance sizes for a particular instance type seemed to benefit from tuning the size of the connection pool. We concluded that this was probably not the cause of the small/large instance type performance discrepancy.

Next, we analyzed the detailed CPU info we collected, using lscpu and cpufetch to determine the NUMA node count for each run. The NUMA node count describes the number of physical processors a particular virtual machine is running across, and running across multiple physical processors has a material impact on the likelihood of a processor cache miss (among other things). We trusted that the hypervisors were reporting that information honestly, but that may not be the case.

Overall, the gap for most AMD-based processors closed almost immediately when we controlled for NUMA nodes – in other words, when we only considered runs where each instance showed all vCPUs running across a single NUMA node. When we did this, the performance gap dropped from 22% to 1%, which is smaller than our margin of error. While there was still some variability in runs, the best AMD runs on the big instances showed roughly the same per-vCPU performance as the smaller instances after controlling for NUMA nodes.

For Intel-based processors the results were less dramatic, with the gap narrowing from 16% to 12%. This led us to follow-up experiments to explain the remaining gap between the small and large instance sizes for Intel-based instance types.

We looked into variable CPU frequency technologies (Turbo Boost for Intel, Turbo Core for AMD). Our first hypothesis was that the difference between how variable processor speeds work in Intel- and AMD-based processors could explain why we were still seeing the difference in large/small node performance for Intel-based instance types even after controlling for NUMA nodes.
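The connection-pool guidance discussed above reduces to simple arithmetic; here is a sketch (the helper name is ours for illustration, not part of any CockroachDB API):

```python
def pool_size(vcpus_per_node, nodes, conns_per_vcpu=4):
    """Active-connection budget: roughly four active connections per vCPU,
    per the production guidance; 5-6 per vCPU worked in some runs but
    risked pushing the cluster toward instability."""
    return vcpus_per_node * nodes * conns_per_vcpu

print(pool_size(8, 3))     # three 8-vCPU nodes -> 96
print(pool_size(8, 3, 6))  # at the aggressive 6-per-vCPU setting -> 144
```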

Cockroach Labs 74
AMD’s variable frequency technology (Turbo Core) is focused on multi-core performance, allowing all cores on a socket to scale their frequencies concurrently as long as the thermal load of the processor does not get too high. Intel’s variable frequency technology (Turbo Boost) is more focused on single-core performance, allowing a limited number of cores per processor to burst to much higher frequencies at the cost of other cores not running at quite as high a frequency. Our supposition was that smaller Intel instances were getting an outsized benefit from Intel’s Turbo Boost, which could explain the remaining performance differences.

However, when we collected the average per-core frequency every 30 seconds on both small and large instance sizes for a select number of instance types, we did not see a substantial difference between the big and small samples. The average frequency per core for the larger instances was 47 MHz lower than the average frequency for the smaller instances. This works out to an advantage of roughly 1.5% for the smaller nodes – not enough to explain the entire performance discrepancy between large and small nodes on Intel machines.

Pulling in data from the CPU micro-benchmark, we can account for another 2-3% performance difference between the small and large instance types: the smaller nodes pretty consistently beat the larger nodes on a per-vCPU basis by that percentage, with only a couple of exceptions.

To confirm these results, and to make an overall recommendation between fewer large instances and more small instances, we ran a handful of tests with identical core counts. We did this by comparing 16 small nodes to 4 large nodes for a handful of instance types and measuring overall performance, with the fastest overall Intel instance types running Intel Ice Lake processors and the fastest GCP and Azure instances running AMD Milan (EPYC Gen 3) processors. We also ran a single Cascade Lake scale-up test for comparison purposes.

All tests were run using the highest-tier storage available for that instance type, using the same parameters as the flat 1,200 warehouse tests from our wide-scale analysis, with the exception of Azure’s Standard_D*s_v5. We were not able to get ultra-disks for that configuration, so the 32-core machine was likely I/O bound – in retrospect, we should have provisioned 8 TB volumes to accommodate this. Unfortunately, we didn’t have time for multiple runs, so this was a single-run benchmark.

The small instances all outperformed the large instances even after we scaled up to run the exact same number of cores per cluster. In some cases, the performance discrepancy was greater than 30%. Oddly, the difference between the small and large instance sizes increases as the performance of the CPU architecture increases.
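As a quick sanity check on the frequency analysis above: a 47 MHz average deficit against a roughly 3.1 GHz average core frequency (an assumed figure, in line with the base clocks listed in Appendix I) works out to about 1.5%:

```python
# How much of the small/large gap can a 47 MHz frequency deficit explain?
base_mhz = 3100   # assumed average per-core frequency (~base clock of the Intel parts)
deficit_mhz = 47
print(f"{deficit_mhz / base_mhz:.1%}")  # -> 1.5%
```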

Cockroach Labs 75
[Chart: TPM per vCPU, small nodes (8 vCPU) vs. large nodes (32 vCPU), for AWS m6i (Ice Lake), Azure D*as (EPYC), Azure D*s (Cascade Lake), Azure D*s (Ice Lake), GCP n2 (Ice Lake), and GCP t2d (EPYC) instance types. Per-vCPU throughput ranged from 478 to 904 TPM, with the small node in each pair outperforming its large counterpart.]

Ultimately, we’re still not certain of the cause of the discrepancy. It could be because of host oversubscription (we are pushing the database close to max load here), some
limitation of the configuration we chose (perhaps we needed to have separate Linux kernel parameters for the small and large instance sizes), some limitation of the database
parameters (perhaps we needed to set different database configuration defaults for large nodes than we did for small nodes), or other scalability issues in Go or CockroachDB.

Cockroach Labs 76
Appendix III: Cloud terminology

Availability Zones or Zones: An Availability Zone (AZ) or Zone is an isolated location, often a data center, within a region. A region is made up of multiple AZs.

Regions: Regions are separate geographic areas defined by the cloud providers (e.g. among many others, GCP has regions called us-east1 and us-west1 located on the US East
Coast and West Coast, respectively). The closer a region is to the end user or application, the better the network latency.

gp3 volumes: AWS’s general-purpose storage volumes.

io2 volumes: AWS’s high-performance storage volumes.

pd-ssd (SSD persistent disks): GCP’s general-purpose storage volumes.

pd-extreme: GCP’s high-performance storage volumes.

Placement Policy / Placement Group: Rules designating where to place machines within an availability zone.

Premium-disk: Azure’s general-purpose storage volumes.

Ultra-disk: Azure’s high-performance storage volumes.

Volume: A unit of data storage, synonymous with “disk” in the context of cloud data storage.

Cockroach Labs 77
Cockroach Labs 78
