Index Terms: Distributed systems, distributed applications, performance evaluation, metrics/measurement, performance measures.
1 INTRODUCTION
TABLE 1
The resource characteristics for the instance types offered by the four selected clouds.

Name           Cores     RAM     Archi.   Disk     Cost
               (ECUs)    [GB]    [bit]    [GB]     [$/h]
Amazon EC2
m1.small       1 (1)     1.7     32       160      0.1
m1.large       2 (4)     7.5     64       850      0.4
m1.xlarge      4 (8)     15.0    64       1,690    0.8
c1.medium      2 (5)     1.7     32       350      0.2
c1.xlarge      8 (20)    7.0     64       1,690    0.8
GoGrid (GG)
GG.small       1         1.0     32       60       0.19
GG.large       1         1.0     64       60       0.19
GG.xlarge      3         4.0     64       240      0.76
Elastic Hosts (EH)
EH.small       1         1.0     32       30       0.042
EH.large       1         4.0     64       30       0.09
Mosso
Mosso.small    4         1.0     64       40       0.06
Mosso.large    4         4.0     64       160      0.24
TABLE 2
The characteristics of the workload traces.

Trace ID, Source           Time    Number of          System Size      Load
(Trace ID in Archive)      [mo.]   Jobs     Users     Sites   CPUs     [%]
Grid Workloads Archive [13], 6 traces
1. DAS-2 (1)               18      1.1M     333       5       0.4K     15+
2. RAL (6)                 12      0.2M     208       1       0.8K     85+
3. GLOW (7)                3       0.2M     18        1       1.6K     60+
4. Grid3 (8)               18      1.3M     19        29      3.5K     -
5. SharcNet (10)           13      1.1M     412       10      6.8K     -
6. LCG (11)                1       0.2M     216       200+    24.4K    -
Parallel Workloads Archive [16], 4 traces
7. CTC SP2 (6)             11      0.1M     679       1       0.4K     66
8. SDSC SP2 (9)            24      0.1M     437       1       0.1K     83
9. LANL O2K (10)           5       0.1M     337       1       2.0K     64
10. SDSC DS (19)           13      0.1M     460       1       1.7K     63
TABLE 3
The MTC part of the workload traces: the percentage of jobs and of consumed CPU time (CPUT) selected by each of three criteria: at least 100 BoTs per user, at least 1,000 tasks per user, and the complex criterion (at least 10,000 jobs and 1,000 BoTs per user).

             BoTs >= 100     Tasks >= 1,000   Jobs >= 10,000 &
                                              BoTs >= 1,000
Trace ID     Jobs    CPUT    Jobs    CPUT     Jobs    CPUT
             [%]     [%]     [%]     [%]      [%]     [%]
DAS-2        90      65      95      73       57      26
RAL          81      78      98      94       33      28
GLOW         95      75      99      91       83      69
Grid3        100     97      100     97       97      95
SharcNet     40      35      95      85       15      9
LCG          61      39      87      61       0       0
CTC SP2      16      13      18      14       0       0
SDSC SP2     13      93      25      6        0       0
LANL O2K     66      14      73      18       34      6
SDSC DS      29      7       44      18       0       0
Fig. 1. Number of MTC users for the DAS-2 trace (top) and the San Diego Supercomputer Center (SDSC) SP2 trace (bottom) when considering only the submitted BoT count criterion (left) and only the submitted task count criterion (right). [Figure: number of users vs. BoT count and vs. task count, for counts from 100 to 100,000.]
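The user selection behind Table 3 and Fig. 1 reduces to counting, per user, the BoTs and tasks submitted and applying a threshold. The following is a minimal sketch in Python, assuming the trace is loaded as a pandas DataFrame; the column names (user_id, bot_id, job_id) are hypothetical, as the actual GWA/PWA trace formats differ in detail:

    # Count BoTs and tasks per user, then apply the MTC selection criteria.
    # Column names are hypothetical; adapt to the actual GWA/PWA trace format.
    import pandas as pd

    def select_mtc_users(trace: pd.DataFrame,
                         min_bots: int = 1_000,
                         min_tasks: int = 10_000) -> pd.DataFrame:
        per_user = trace.groupby("user_id").agg(
            bots=("bot_id", "nunique"),  # distinct bags-of-tasks submitted
            tasks=("job_id", "count"),   # total tasks (jobs) submitted
        )
        # Complex criterion of Table 3: at least min_tasks tasks AND min_bots
        # BoTs; the single-criterion variants of Fig. 1 use only one test.
        return per_user[(per_user.tasks >= min_tasks) &
                        (per_user.bots >= min_bots)]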
TABLE 4
The benchmarks used for cloud performance evaluation. B, FLOP, U, and PS stand for bytes, floating point operations, updates, and per second, respectively.

Type   Suite/Benchmark               Resource   Unit
SI     LMbench/all [24]              Many       Many
SI     Bonnie/all [25], [26]         Disk       MBps
SI     CacheBench/all [27]           Memory     MBps
MI     HPCC/HPL [28], [29]           CPU        GFLOPS
MI     HPCC/DGEMM [30]               CPU        GFLOPS
MI     HPCC/STREAM [30]              Memory     GBps
MI     HPCC/RandomAccess [31]        Network    MUPS
MI     HPCC/b_eff (lat., bw.) [32]   Comm.      μs, GBps
TABLE 5
The VM images used in our experiments.

VM image            OS, MPI          Archi      Benchmarks
EC2/ami-2bb65342    FC6              32bit      SI
EC2/ami-36ff1a5f    FC6              64bit      SI
EC2/ami-3e836657    FC6, MPI         32bit      MI
EC2/ami-e813f681    FC6, MPI         64bit      MI
GG/server1          RHEL 5.1, MPI    32&64bit   SI&MI
EH/server1          Knoppix 5.3.1    32bit      SI
EH/server2          Ubuntu 8.10      64bit      SI
Mosso/server1       Ubuntu 8.10      32&64bit   SI
4.3 Results
The experimental results of the Amazon EC2 performance evaluation are presented in the following.
4.3.1 Resource Acquisition and Release
We study two resource acquisition and release scenarios:
for single instances, and for multiple instances allocated
at once.
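As an illustration of the measurement procedure, the sketch below times both phases for one instance. It is a minimal Python example using today's boto3 EC2 API as a stand-in for the 2008-era tools used in the original experiments; the AMI ID is one of the images from Table 5:

    # Sketch: time single-instance acquisition and release on EC2.
    # boto3 is used here as a modern stand-in for the 2008-era EC2 API.
    import time
    import boto3

    ec2 = boto3.client("ec2")

    def time_acquire_release(instance_type, ami):
        t0 = time.time()
        res = ec2.run_instances(ImageId=ami, InstanceType=instance_type,
                                MinCount=1, MaxCount=1)
        iid = res["Instances"][0]["InstanceId"]
        ec2.get_waiter("instance_running").wait(InstanceIds=[iid])
        t_alloc = time.time() - t0  # acquisition: request until usable

        t1 = time.time()
        ec2.terminate_instances(InstanceIds=[iid])
        ec2.get_waiter("instance_terminated").wait(InstanceIds=[iid])
        t_release = time.time() - t1  # release: terminate until gone
        return t_alloc, t_release

    # As in Table 6: repeat 20 times per instance type, report min/avg/max.
    samples = [time_acquire_release("m1.small", "ami-2bb65342")
               for _ in range(20)]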
Single instances. We first repeat 20 times, for each instance type, a resource acquisition followed by a release as soon as the resource status becomes installed (see
TABLE 6
Statistics for single resource allocation/release (values in seconds).

Instance      Res. Allocation          Res. Release
Type          Min    Avg     Max       Min    Avg    Max
m1.small      69     82      126       18     21     23
m1.large      50     90      883       17     20     686
m1.xlarge     57     64      91        17     18     25
c1.medium     60     65      72        17     20     22
c1.xlarge     49     65      90        17     18     20
GG.large      240    540     900       180    210    240
GG.xlarge     180    1,209   3,600     120    192    300
Fig. 2. [Figure: box plots (quartiles, median, mean, outliers) of the duration [s] of single-instance resource acquisition and release for the m1.small, m1.large, m1.xlarge, c1.medium, and c1.xlarge instance types.]
Fig. 3. Single-instance resource acquisition and release overheads when acquiring multiple c1.xlarge instances at the same time. [Figure panels: total time, VM install time, and VM boot time for resource acquisition, and total time for resource release, vs. instance count (2, 4, 8, 16, 20).]
Fig. 4. LMbench results (top) for the EC2 instances, and (bottom) for the other instances. Each row depicts the performance of 32- and 64-bit integer operations in giga-operations per second (GOPS) (left), and of floating-point operations with single and double precision (right).
TABLE 7
The I/O of clouds vs. 2002 [25] and 2007 [26] systems.

              Seq. Output                   Seq. Input           Rand.
Instance      Char     Block    ReWr        Char     Block       Input
Type          [MB/s]   [MB/s]   [MB/s]      [MB/s]   [MB/s]      [Seek/s]
m1.small      22.3     60.2     33.3        25.9     73.5        74.4
m1.large      50.9     64.3     24.4        35.9     63.2        124.3
m1.xlarge     57.0     87.8     33.3        41.2     74.5        387.9
c1.medium     49.1     58.7     32.8        47.4     74.9        72.4
c1.xlarge     64.8     87.8     30.0        45.0     74.5        373.9
GG.small      11.4     10.7     9.2         29.2     40.24       39.8
GG.large      17.0     17.5     16.0        34.1     97.5        29.0
GG.xlarge     80.7     136.9    92.6        79.26    369.15      157.5
EH.large      7.1      7.1      7.1         27.9     35.7        177.9
Mosso.sm      41.0     102.7    43.88       32.1     130.6       122.6
Mosso.lg      40.3     115.1    55.3        41.3     165.5       176.7
'02 Ext3      12.2     38.7     25.7        12.7     173.7       -
'02 RAID5     14.4     14.3     12.2        13.5     73.0        -
'07 RAID5     30.9     40.6     29.0        41.9     112.7       192.9
Multi-Machine Benchmarks
TABLE 8
HPL performance and cost comparison for various EC2 and GoGrid instance types.

             Peak Perf.   HPL        GFLOPS   GFLOPS
Name         [GFLOPS]     [GFLOPS]   /Unit    /$1
m1.small     4.4          2.0        2.0      19.6
m1.large     17.6         7.1        1.8      17.9
m1.xlarge    35.2         11.4       1.4      14.2
c1.medium    22.0         3.9        0.8      19.6
c1.xlarge    88.0         50.0       2.5      62.5
GG.large     12.0         8.8        8.8      46.4
GG.xlarge    36.0         28.1       7.0      37.0
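The last two columns are derived values. GFLOPS/$1 divides the achieved HPL result by the hourly price from Table 1 (e.g., c1.xlarge: 50.0 / 0.8 = 62.5). GFLOPS/Unit divides by the number of compute units, that is, ECUs for EC2 and, by our reading (an assumption), the GB of RAM that GoGrid bills for, which matches GG.xlarge at 28.1 / 4.0 = 7.0. A quick check in Python:

    # Recompute the derived columns of Table 8 from Tables 1 and 8.
    # units = ECUs for EC2; for GoGrid we assume the billed GB of RAM.
    rows = {  # name: (HPL GFLOPS, units, cost in $/h)
        "m1.small":  (1.96, 1,  0.10),
        "c1.xlarge": (50.0, 20, 0.80),
        "GG.xlarge": (28.1, 4,  0.76),
    }
    for name, (hpl, units, cost) in rows.items():
        print(f"{name}: {hpl / units:.1f} GFLOPS/unit, "
              f"{hpl / cost:.1f} GFLOPS/$1")
    # m1.small: 2.0, 19.6; c1.xlarge: 2.5, 62.5; GG.xlarge: 7.0, 37.0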
Fig. 5. The HPL (LINPACK) performance of virtual clusters formed with EC2 m1.small, EC2 c1.xlarge, GoGrid large, and GoGrid xlarge instances. (left) Throughput [GFLOPS]. (right) Efficiency [%].
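The efficiency in the right panel is the achieved HPL throughput as a fraction of the theoretical peak (both appear in Table 9); for example, the 16-node c1.xlarge cluster reaches 425.82 of 1,408.00 GFLOPS, about 30%, versus about 85% for the HPCC-289 reference system. In code:

    # Efficiency as plotted in Fig. 5: achieved HPL over theoretical peak.
    def efficiency_pct(hpl_gflops, peak_gflops):
        return 100.0 * hpl_gflops / peak_gflops

    print(efficiency_pct(425.82, 1408.00))   # 16 x c1.xlarge: ~30%
    print(efficiency_pct(1220.61, 1433.60))  # HPCC-289: ~85%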
TABLE 9
The HPCC performance for various platforms. HPCC-x is the system with the HPCC ID x [38]. The machines HPCC-224 and HPCC-227 are of brand TopSpin/Cisco, and HPCC-286 and HPCC-289 are Intel Endeavor machines. Smaller values are better for the Latency column; larger values are better for the other columns.
Provider, System            Cores   Peak Perf.   HPL         HPL       DGEMM
                                    [GFLOPS]     [GFLOPS]    N         [GFLOPS]
EC2, 1 x m1.small           1       4.40         1.96        13,312    2.62
EC2, 1 x m1.large           2       17.60        7.15        28,032    6.83
EC2, 1 x m1.xlarge          4       35.20        11.38       39,552    8.52
EC2, 1 x c1.medium          2       22.00        3.9         13,312    11.85
EC2, 1 x c1.xlarge          8       88.00        51.58       27,392    44.05
EC2, 2 x c1.xlarge          16      176.00       84.63       38,656    34.59
EC2, 4 x c1.xlarge          32      352.00       138.08      54,784    27.74
EC2, 8 x c1.xlarge          64      704.00       252.34      77,440    3.58
EC2, 16 x c1.xlarge         128     1,408.00     425.82      109,568   0.23
EC2, 16 x m1.small          16      70.40        27.80       53,376    4.36
GoGrid, 1 x GG.large        1       12.00        8.805       10,240    10.01
GoGrid, 4 x GG.large        4       48.00        24.97       20,608    10.34
GoGrid, 8 x GG.large        8       96.00        45.436      29,184    10.65
GoGrid, 1 x GG.xlarge       3       36.00        28.144      20,608    10.82
GoGrid, 4 x GG.xlarge       12      144.00       40.03       41,344    11.31
GoGrid, 8 x GG.xlarge       24      288.00       48.686      58,496    18.00
HPCC-227, TopSpin/Cisco     16      102.40       55.23       81,920    4.88
HPCC-224, TopSpin/Cisco     128     819.20       442.04      231,680   4.88
HPCC-286, Intel Endeavor    16      179.20       153.25      60,000    10.50
HPCC-289, Intel Endeavor    128     1,433.60     1,220.61    170,000   10.56
Fig. 6. [Figure: CacheBench Rd-Mod-Wr performance [MBps] (range, median, mean) vs. working set size (2^8, 2^10, 2^15, 2^20, 2^25 B) for the m1.xlarge, GG.xlarge, EH.small, and Mosso.large instance types.]
Mosso.large resources. Figure 6 summarizes the results for one example benchmark from the CacheBench suite, Rd-Mod-Wr. The GG.large and EH.small types have important differences between the min, mean, and max performance even for medium working set sizes, such as 10^10 B. The best performer in terms of computation, GG.xlarge, is unstable; this makes cloud vendor selection an even more difficult problem. We have performed a longer-term investigation in other work [52].
5 CLOUDS VS. OTHER SCIENTIFIC COMPUTING INFRASTRUCTURES
Method
Complete Workloads
TABLE 10
The results of the comparison between workload execution in source environments (grids, PPIs, etc.) and in clouds. The "-" sign denotes missing data in the original traces. For the two cloud models, AWT = 80 s (see text). The total cost for the two cloud models is expressed in millions of CPU-hours.
Trace ID      Source env. (Grid/PPI)
              AWT [s]    AReT [s]   ABSD (10s)
DAS-2         432        802        11
RAL           13,214     27,807     68
GLOW          9,162      17,643     55
Grid3         -          7,199      -
SharcNet      31,017     61,682     242
LCG           -          9,011      -
CTC SP2       25,748     37,019     78
SDSC SP2      26,705     33,388     389
LANL O2K      4,658      9,594      61
SDSC DS       32,271     33,807     516
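The ABSD (10s) column is the average bounded slowdown with a 10 s threshold. The paper's exact definition appears in text not reproduced here; the standard bounded-slowdown formula it presumably follows is sketched below:

    # Bounded slowdown of one job, with threshold tau = 10 s; ABSD averages
    # this over all jobs in a trace. Standard definition, assumed here to
    # match the paper's.
    def bounded_slowdown(wait_s, run_s, tau=10.0):
        return max(1.0, (wait_s + run_s) / max(run_s, tau))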
Fig. 7. Performance and cost of using cloud resources for MTC workloads with various slowdown factors, for sequential jobs (left) and parallel jobs (right), using the DAS-2 trace.
TABLE 12
Relative strategy performance: resource bulk allocation (S2) vs. resource acquisition and release per job (S1). Only performance differences above 5% are shown.

Trace ID      Relative Cost |S2 - S1| / S1 × 100 [%]
DAS-2         30.2
Grid3         11.5
LCG           9.3
LANL O2K      9.1
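The relative-cost metric compares the two strategies as a percentage of S1; for DAS-2, for example, bulk allocation differs from per-job acquisition by 30.2%. As a one-line check:

    # Relative difference between strategies S2 and S1, as in Table 12.
    def relative_cost_pct(s1, s2):
        return abs(s2 - s1) / s1 * 100.0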
TABLE 11
The results of the comparison between workload execution in source environments (grids, PPIs, etc.) and in clouds
with only the MTC part extracted from the complete workloads. The LCG, CTC SP2, SDSC SP2, and SDSC DS
traces are not presented, as they do not have enough MTC users (the criterion is described in text).
Trace ID      Source env. (Grid/PPI)
              AWT [s]    AReT [s]   ABSD (10s)
DAS-2         70         243        3
RAL           4,866      18,694     24
GLOW          6,062      13,396     41
Grid3         -          7,422      -
SharcNet      10,387     30,092     6
LANL O2K      635.78     1,715      4
6 RELATED WORK
In this section we review related work from three areas: clouds, virtualization, and system performance evaluation. Our work also comprises the first characterization of the MTC component in existing scientific computing workloads.
Performance Evaluation of Clouds and Virtualized Environments. There has been a recent surge of research activity in assessing the performance of virtualized resources, in cloud computing environments [9], [10], [11], [56], [57] and in general [8], [24], [51], [58], [59], [60], [61]. In contrast to this body of previous work, ours is different in scope: we perform extensive measurements using general-purpose and high-performance computing benchmarks to compare several clouds, and we compare clouds with other environments based on real long-term scientific computing traces. Our study is also much broader in size: we perform in this work an evaluation using over 25 individual benchmarks on over 10 cloud instance types, which is an order of magnitude larger than previous work (though size does not simply add to quality).
Performance studies using general-purpose benchmarks have shown that the overhead incurred by virtualization can be below 5% for computation [24], [51] and below 15% for networking [24], [58]. Similarly, the performance loss due to virtualization for parallel I/O and web server I/O has been shown to be below 30% [62] and 10% [63], [64], respectively. In contrast to these, our work shows that virtualized resources obtained from public clouds can have much lower performance than the theoretical peak.
Recently, much interest in the use of virtualization has been shown by the HPC community, spurred by two seminal studies [8], [65] that find virtualization overhead to be negligible for compute-intensive HPC kernels and applications such as the NAS NPB benchmarks; other studies have investigated virtualization performance for specific HPC application domains [61], [66], or for mixtures of Web and HPC workloads running on virtualized
REFERENCES
[52] A. Iosup, N. M. Yigitbasi, and D. Epema, "On the performance variability of production cloud services," TU Delft, Tech. Rep. PDS-2010-002, Jan 2010, [Online] Available: http://pds.twi.tudelft.nl/reports/2010/PDS-2010-002.pdf.
[53] T. Killalea, "Meet the virts," Queue, vol. 6, no. 1, pp. 14-18, 2008.
[54] D. Nurmi, R. Wolski, and J. Brevik, "VARQ: Virtual advance reservations for queues," in HPDC. ACM, 2008, pp. 75-86.
[55] A. Iosup, D. H. J. Epema, T. Tannenbaum, M. Farrellee, and M. Livny, "Inter-operating grids through delegated matchmaking," in SC. ACM, 2007, p. 13.
[56] D. Nurmi, R. Wolski, C. Grzegorczyk, G. Obertelli, S. Soman, L. Youseff, and D. Zagorodnov, "The Eucalyptus open-source cloud-computing system," 2008, UCSB Tech. Rep. 2008-10. [Online] Available: http://eucalyptus.cs.ucsb.edu/.
[57] B. Quetier, V. Neri, and F. Cappello, "Scalability comparison of four host virtualization tools," J. Grid Comput., vol. 5, pp. 83-98, 2007.
[58] A. Menon, J. R. Santos, Y. Turner, G. J. Janakiraman, and W. Zwaenepoel, "Diagnosing performance overheads in the Xen virtual machine environment," in VEE. ACM, 2005, pp. 13-23.
[59] B. Sotomayor, K. Keahey, and I. Foster, "Overhead matters: A model for virtual resource management," in VTDC. IEEE, 2006, pp. 4-11.
[60] A. B. Nagarajan, F. Mueller, C. Engelmann, and S. L. Scott, "Proactive fault tolerance for HPC with Xen virtualization," in ICS. ACM, 2007, pp. 23-32.
[61] L. Youseff, K. Seymour, H. You, J. Dongarra, and R. Wolski, "The impact of paravirtualized memory hierarchy on linear algebra computational kernels and software," in HPDC. ACM, 2008, pp. 141-152.
[62] W. Yu and J. S. Vetter, "Xen-based HPC: A parallel I/O perspective," in CCGrid. IEEE, 2008, pp. 154-161.
[63] L. Cherkasova and R. Gardner, "Measuring CPU overhead for I/O processing in the Xen virtual machine monitor," in USENIX ATC, 2005, pp. 387-390.
[64] U. F. Minhas, J. Yadav, A. Aboulnaga, and K. Salem, "Database systems on virtual machines: How much do you lose?" in ICDE Workshops. IEEE, 2008, pp. 35-41.
[65] W. Huang, J. Liu, B. Abali, and D. K. Panda, "A case for high performance computing with virtual machines," in ICS. ACM, 2006, pp. 125-134.
[66] L. Gilbert, J. Tseng, R. Newman, S. Iqbal, R. Pepper, O. Celebioglu, J. Hsieh, and M. Cobban, "Performance implications of virtualization and hyper-threading on high energy physics applications in a grid environment," in IPDPS. IEEE, 2005.
[67] J. Zhan, L. Wang, B. Tu, Y. Li, P. Wang, W. Zhou, and D. Meng, "Phoenix cloud: Consolidating different computing loads on shared cluster system for large organization," in CCA-08 Posters, 2008, pp. 7-11.
[68] C. Evangelinos and C. N. Hill, "Cloud computing for parallel scientific HPC applications: Feasibility of running coupled atmosphere-ocean climate models on Amazon's EC2," in CCA-08, 2008, pp. 1-6.
[69] M.-E. Bégin, B. Jones, J. Casey, E. Laure, F. Grey, C. Loomis, and R. Kubli, "Comparative study: Grids and clouds, evolution or revolution?" CERN EGEE-II Report, June 2008, [Online] Available: https://edms.cern.ch/file/925013/3/EGEE-Grid-Cloud.pdf.
[70] S. Williams, J. Shalf, L. Oliker, S. Kamil, P. Husbands, and K. A. Yelick, "The potential of the Cell processor for scientific computing," in Conf. Computing Frontiers. ACM, 2006, pp. 9-20.
[71] B. Hindman, A. Konwinski, M. Zaharia, A. Ghodsi, A. D. Joseph, R. H. Katz, S. Shenker, and I. Stoica, "Mesos: A platform for fine-grained resource sharing in the data center," UC Berkeley, Tech. Rep. UCB/EECS-2010-87, May 2010. [Online] Available: http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-87.html.
[72] L. Vu, "Berkeley Lab contributes expertise to new Amazon Web Services offering," Jul 2010, [Online] Available: http://www.lbl.gov/cs/Archive/news071310.html.
[73] J. P. Jones and B. Nitzberg, "Scheduling for parallel supercomputing: A historical perspective of achievable utilization," in JSSPP, ser. LNCS, vol. 1659. Springer-Verlag, 1999, pp. 1-16.
[74] lxc Linux Containers Team, "Linux containers overview," Aug 2010, [Online] Available: http://lxc.sourceforge.net/.
Alexandru Iosup received his Ph.D. in Computer Science in 2009 from the Delft University of Technology (TU Delft), the Netherlands. He is currently an Assistant Professor with the Parallel and Distributed Systems Group at TU Delft. He was a visiting scholar at U. Wisconsin-Madison and U. California-Berkeley in the summers of 2006 and 2010, respectively. He is the co-founder of the Grid Workloads, the Peer-to-Peer Trace, and the Failure Trace Archives, which provide open access to workload and resource operation traces from large-scale distributed computing environments. He is the author of over 50 scientific publications and has received several awards and distinctions, including best paper awards at IEEE CCGrid 2010, Euro-Par 2009, and IEEE P2P 2006. His research interests are in the area of distributed computing (keywords: massively multiplayer online games, grid and cloud computing, peer-to-peer systems, scheduling, performance evaluation, workload characterization).
M. Nezih Yigitbasi received B.Sc. and M.Sc. degrees from the Computer Engineering Department of Istanbul Technical University, Turkey, in 2006 and 2008, respectively. Since September 2008, he has been following a doctoral track in Computer Science within the Parallel and Distributed Systems Group, Delft University of Technology. His research interests are in the areas of resource management, scheduling, and the design and performance evaluation of large-scale distributed systems, in particular grids and clouds.