Akshat Verma, Gargi Dasgupta, Tapan Kumar Nayak, Pradipta De, Ravi Kothari
Cluster       Servers   Days
AppSuite-2    18        13
AppSuite-3    13        25
AppSuite-4    16        37

Table 1: Workload Details for each cluster
ing that terminated all daemons including the sensors). Thus, the traces had missing or incorrect data for many time intervals during the trace period. We used a simple interval graph technique to identify the longest contiguous interval in which all the servers in one cluster had monitored data available. Hence, for each server cluster we identified a smaller period with accurate monitored data and used these smaller periods for our analysis.

The data center had a large number of clusters, and we have selected 4 representative clusters (Table 1) for this analysis. 'AppSuite-1', 'AppSuite-2' and 'AppSuite-4' each hosted a 2-tiered application with application server components and DB server components. 'AppSuite-3' was a 3-tiered application suite with a few web servers, a few application servers, and a few DB servers. In most cases, multiple application servers used a common DB server. However, for 'AppSuite-2', a few application servers had dedicated DB servers assigned to them. There were no restrictions on co-locating two or more components of an application suite on the same physical server. Detailed information about the applications running in the data center and the virtual server to physical server mapping is withheld for privacy and business reasons.

3.2 Macro Workload Analysis

We begin our workload characterization study with the utilization of individual servers. Due to space limitations, we primarily report our observations on only one server cluster, 'AppSuite-1', broadening our observations to other clusters only for important findings.

3.2.1 CPU Utilization Distribution

[Figure 1: CPU Utilization for AppSuite-1 with Time (one panel per server, CPU utilization 0-100% over the trace period)]

Fig. 1 shows the CPU utilization of each server in 'AppSuite-1', server1 (top) through server10 (bottom). The important observation to make here is that all servers barring server8 reach a CPU utilization of 100% at some point during the trace. This has an important implication for consolidation. Observation 1: If consolidation is performed by reserving the maximum utilization for each application, the application may require capacity equal to the size of its current entitlement.

[Figure 2: Cumulative Distribution Function of CPU Utilization for AppSuite-1 (cumulative probability vs. CPU utilization for each server)]

This observation is reinforced by taking a look at the Cumulative Distribution Function (CDF) of CPU utilization (Fig. 2) for each server of 'AppSuite-1'. An interesting observation in the CDF plot, however, is the large skew in the distribution. For most of the applications, the CPU utilization at the 90-percentile of the distribution is less than half of the peak CPU utilization. Such a skew can be exploited for tighter consolidation by provisioning less than the peak resource consumed by each application.

We drill down further into the skew of the CPU utilization distribution in Fig. 3(a). We observe that the 99-percentile CPU utilization value is significantly less than the maximum CPU utilization in many cases. This is in line with observations made on other enterprise workloads in [16]. Interestingly, the 90-percentile CPU utilization is about half or less of the maximum CPU utilization for 9 out of 10 servers. Further, the gap between the 80 and 90-percentile values is less than 10% CPU utilization in all cases and less than 5% in many cases. We also look at the other server clusters in Fig. 3 and find the observations to hold there as well. However, in the 'AppSuite-2' cluster, a few servers (Servers 15 to 18) have high utilization for most of the interval. Hence, in these cases, both the 80 and 90-percentile values are reasonably close to the peak CPU utilization. The above findings lead us to our second important observation. Observation 2: If we could size an application based on the 90-percentile CPU utilization instead of the maximum CPU utilization, it could lead to significant savings.

[Figure 3: Comparison of Peak, Mode and Percentile CPU Utilization for the (a) AppSuite-1, (b) AppSuite-2, (c) AppSuite-3 and (d) AppSuite-4 clusters (Mode, 80-percentile, 90-percentile, 99-percentile and Max per server)]

We next observe the variability of the CPU utilization of the different servers. To measure the variability, we computed the coefficient of variation (COV) for all the applications in a cluster. The coefficient of variation is a normalized measure of dispersion of a probability distribution and is defined as COV = σ/µ, where σ is the standard deviation and µ is the mean of the distribution.

[Figure 4: Coefficient of Variation for all clusters: (a) AppSuite-1, (b) AppSuite-2, (c) AppSuite-3, (d) AppSuite-4. Servers in each cluster are sorted by COV for easy comparison.]

COV is a useful statistic for comparing the degree of variation and equals 1 for the exponential distribution. Distributions with COV > 1 (such as a hyper-exponential distribution) are considered high-variance, while those with COV < 1 are considered low-variance. The coefficients of variation for all the clusters are shown in Fig. 4. We observe that all clusters have at least a few applications with high-variance distributions, and 'AppSuite-3' has the largest number of applications with COV > 1. There are also applications with low-variance distributions. However, it is well known that combining a heavy-tailed distribution (COV > 1) with another independent (or positively correlated) distribution with an exponentially decaying tail (COV = 1) leads to an aggregate distribution which is heavy-tailed. This leads to our third important observation. Observation 3: If a statistical measure that ignores the tail of the distribution is used for sizing an application, the consolidated server may observe a large number of SLA capacity violations.

3.2.2 Correlation

Our first few observations bring out the potential savings of sizing applications by off-peak values as opposed to peak values. However, sizing based on off-peak values may lead to capacity violations if co-located applications peak together. Hence, we next study the correlation between applications belonging to the same application suite. The correlation between a pair of applications with timeseries {x_1, x_2, ..., x_N} and {y_1, y_2, ..., y_N} is represented by the Pearson correlation coefficient,

  r_xy = (N Σ x_i y_i − Σ x_i Σ y_i) / ( √(N Σ x_i² − (Σ x_i)²) · √(N Σ y_i² − (Σ y_i)²) )   (1)

[Figure 5: Inter-Server Correlation for AppSuite-1 cluster (pair-wise correlation coefficients, Server Id vs. Server Id)]

Fig. 5 shows the pair-wise correlation between the applications of 'AppSuite-1'. One may observe that there are many applications that have significant positive correlation. On the other hand, there are also a few applications (e.g., on Servers 3, 8, and 10) that have minimal correlation with other applications. These observations highlight that (a) there is a risk of SLA violation if consolidation methods are not aware of correlation, and (b) there is potential for placing non-correlated applications together to mitigate this risk. The other clusters have less correlation between servers, but there are still a significant number of servers (more than 25%) that exhibit correlation with one or more other servers. One may observe that virtual servers that are part of a multi-component application have a high likelihood of being correlated. However, since in most cases multiple (4 or 8) application servers were sharing a common DB server, the correlation was not strong. 'AppSuite-2', however, had 4 (application server, DB server) pairs that were dedicated. As a result, even though the workload to this suite had low intrinsic correlation, the two-tier nature of the application suite introduced correlation. Hence, multi-tier applications with a one-to-one mapping between servers in different tiers are likely to exhibit correlation even for workloads with no intrinsic correlation. This leads to our next important observation. Observation 4: There are both positively correlated and uncorrelated applications in a typical server cluster. Hence, correlation needs to be considered during placement to avoid SLA capacity violations.

3.2.3 Temporal Distribution of Peaks

[Figure 6: Duration of Peak CPU utilization (> 90-percentile) for AppSuite-1 cluster; time (min) vs. Server Id. Dark lines indicate sustained peaks.]

We have used the correlation coefficient as an indicator of the temporal similarity between two applications. However, correlation is a comprehensive metric that captures temporal similarity between two applications at all levels (both peak and off-peak). Capacity violations, though, occur when two applications sized by an off-peak value peak together. Hence, we look at the correlation between only the peaks of the various applications in Fig. 6. We observe that there are apps with low correlation whose peaks may be correlated. Further, there also exist correlated apps whose peaks typically do not occur at the same time (e.g., Servers 5 and 7). This leads to our next important observation. Observation 5: Correlated applications may not always peak together. Similarly, non-correlated applications may also peak together in some cases.

3.3 Stability Analysis

Static and semi-static placement decisions are made for extended periods of time. Hence, there is a need to analyze the stability of workload statistical properties to ensure the reliability of the placement decisions. In this section, we study workload periodicity and the variation in statistical properties like the mean, the 90-percentile and the correlation coefficient over the observation period.

3.3.1 Periodicity analysis of utilization data

We first analyze the periodicity of the collected data. This helps to find repeating patterns, such as the presence of a periodic signal buried under noise. The usual method for deciding if a signal is periodic, and then estimating its period, is the auto-correlation function. For a discrete timeseries {x_1, x_2, ..., x_N} with mean µ and variance σ², the auto-correlation function for any non-negative integer k < N is given by

  R(k) = (1 / ((N − k) σ²)) Σ_{n=1}^{N−k} [x_n − µ][x_{n+k} − µ],   (2)

Essentially, the signal {x_n} is convolved with a time-lagged version of itself, and the peaks in the auto-correlation indicate lag times at which the signal is relatively highly correlated with itself; these can be interpreted as periods at which the signal repeats. To enhance the analysis, we also computed the magnitude spectrum of the timeseries, |X_k|, where {X_k} is the Discrete Fourier Transform (DFT) of {x_n} and is defined by

  X_k = (1/N) Σ_{n=1}^{N} x_n e^{−(2πi/N)(k−1)(n−1)},  1 ≤ k ≤ N.   (3)

We study the auto-correlation function and magnitude spectrum of the utilization data for all the applications and find that some servers exhibit nice periodic behavior, whereas others do not follow any particular pattern. Fig. 7 shows a periodic pattern with a time period of one day: the lag between two consecutive peaks in the auto-correlation function is one day and there is a corresponding peak in the magnitude spectrum.
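As an illustration, the two diagnostics of Eqs. (2) and (3) can be computed directly. The sketch below (plain Python, no libraries) runs both on a synthetic hourly trace with a daily cycle; the trace and all names are illustrative and not part of our monitoring setup.

```python
import math

def autocorrelation(x, k):
    """R(k) from Eq. (2): normalized auto-correlation of x at lag k."""
    n = len(x)
    mu = sum(x) / n
    var = sum((v - mu) ** 2 for v in x) / n
    return sum((x[i] - mu) * (x[i + k] - mu) for i in range(n - k)) / ((n - k) * var)

def magnitude_spectrum(x):
    """|X_k| from Eq. (3); the paper indexes k from 1, here k runs from 0."""
    n = len(x)
    mags = []
    for k in range(n):
        re = sum(x[i] * math.cos(2 * math.pi * k * i / n) for i in range(n))
        im = -sum(x[i] * math.sin(2 * math.pi * k * i / n) for i in range(n))
        mags.append(math.hypot(re, im) / n)
    return mags

# Synthetic utilization trace: hourly samples over 14 days, with a strong
# 24-hour cycle plus a smaller high-frequency component acting as "noise".
trace = [50 + 30 * math.sin(2 * math.pi * t / 24) + 5 * math.sin(t)
         for t in range(24 * 14)]

# A daily period shows up as a high auto-correlation at a 24-sample lag ...
print(autocorrelation(trace, 24) > 0.9)
# ... and as a spectral peak at 14 cycles over the 14-day trace (once per day).
mags = magnitude_spectrum(trace)
print(max(range(1, len(mags) // 2), key=lambda k: mags[k]))
```

On this trace the auto-correlation at a one-day lag is close to 1 and the dominant frequency bin (ignoring the DC term) is 14, i.e., one cycle per day, which is exactly the diagnosis described for Fig. 7.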
[Figure 7: The timeseries, auto-correlation and frequency spectrum of this workload show a periodicity of 1 day (utilization over 12 days; auto-correlation vs. lag in days; magnitude spectrum vs. frequency in cycles per day)]

This kind of workload can be predicted with significant reliability. Many applications do not show any periodic pattern in the observed period; however, their statistical properties remain consistent over a long period. To analyze this consistency, we computed the mean and 90-percentile statistics over several windows of length 1 day. Fig. 8 shows that although the workload has no periodic pattern, these statistics remain stable over the observed period. For some highly variable workloads, even the daily statistics vary over the trace period. However, for these applications, the moving averages follow the actual mean and 90-percentiles closely over the observed period (Fig. 9) and can be used for estimation. These observations lead to the following conclusion. Observation 6: Some servers exhibit periodic behavior and the future pattern can be reliably forecasted with a day or a week of data. For many non-periodic servers, the statistical properties are fairly stable over time. For highly variable servers, an adaptive prediction method like MovingAverage should be used to estimate the statistical properties.
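A minimal sketch of the window statistics and the MovingAverage estimate referred to in Observation 6; the nearest-rank percentile, the 3-day window and the synthetic drifting trace are illustrative choices of ours, not details from our tooling.

```python
import random

def percentile(xs, p):
    """Nearest-rank percentile (one simple convention; implementations differ)."""
    s = sorted(xs)
    idx = max(0, min(len(s) - 1, int(round(p / 100.0 * len(s))) - 1))
    return s[idx]

def daily_stats(trace, samples_per_day):
    """Per-day (mean, 90-percentile) pairs, as in the stability analysis."""
    days = [trace[i:i + samples_per_day]
            for i in range(0, len(trace), samples_per_day)]
    return [(sum(d) / len(d), percentile(d, 90))
            for d in days if len(d) == samples_per_day]

def moving_average(values, window=3):
    """Adaptive estimate: average of the trailing `window` daily statistics."""
    out = []
    for i in range(len(values)):
        lo = max(0, i - window + 1)
        out.append(sum(values[lo:i + 1]) / (i + 1 - lo))
    return out

# Non-periodic trace whose level drifts slowly upward (96 samples per day).
random.seed(7)
trace = [min(100.0, max(0.0, 40 + 2 * day + random.gauss(0, 10)))
         for day in range(10) for _ in range(96)]
means = [m for m, _ in daily_stats(trace, 96)]
p90s = [p for _, p in daily_stats(trace, 96)]
ma = moving_average(p90s)
# The 3-day moving average tracks the drifting daily 90-percentile closely.
print(all(abs(a - b) < 10 for a, b in zip(ma[3:], p90s[3:])))
```

The moving average lags the drift by roughly one day's increment, which is the price of smoothing; for stable non-periodic servers the lag is negligible.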
[Figure 8: The timeseries, auto-correlation and frequency spectrum plot of the workload do not show any periodicity, but the mean and 90-percentile values show stability.]

[Figure 9: The timeseries has no regular pattern and the mean and 90-percentile statistics also vary significantly over the time period, but the moving averages track the statistic well.]

3.3.2 Stability in Correlation

We have observed that the correlation between applications should be used while making placement decisions. We next study the stability in correlation for AppSuite-1, which has the highest correlation amongst all the clusters. For this purpose, we compute the correlation between all pairs of applications for every day separately during the whole duration of the trace and compute the standard deviation across these daily correlation values. We observe in Fig. 10 that the standard deviation is fairly low, indicating the stability of correlation across time. Observation 7: The correlation between the CPU utilization of various servers is fairly stable across time.

[Figure 10: Stability of Correlation for AppSuite-1: standard deviation of the daily correlation coefficients for each server pair (half the values have been deleted for visual clarity)]
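The stability check behind Fig. 10 and Observation 7 can be sketched as follows, using Eq. (1) for the correlation. The synthetic app-server/DB-server pair and all parameters are illustrative assumptions, not our data.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient r_xy, Eq. (1)."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    num = n * sxy - sx * sy
    den = math.sqrt(n * sxx - sx * sx) * math.sqrt(n * syy - sy * sy)
    return num / den

def daily_correlation_std(a, b, samples_per_day):
    """Std-dev across per-day correlations of two traces (the Fig. 10 statistic)."""
    rs = [pearson(a[i:i + samples_per_day], b[i:i + samples_per_day])
          for i in range(0, len(a) - samples_per_day + 1, samples_per_day)]
    mu = sum(rs) / len(rs)
    return math.sqrt(sum((r - mu) ** 2 for r in rs) / len(rs)), rs

random.seed(1)
# Two correlated traces: an app server and the DB server it drives share a
# common daily load shape (96 samples per day, 12 days).
load = [50 + 30 * math.sin(2 * math.pi * t / 96) for t in range(96 * 12)]
app = [l + random.gauss(0, 5) for l in load]
db = [0.8 * l + random.gauss(0, 5) for l in load]
std, daily_rs = daily_correlation_std(app, db, 96)
# Every day shows a strong correlation, and the day-to-day spread is tiny.
print(std < 0.1 and min(daily_rs) > 0.8)
```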
4 Placement Methodologies

We now present various placement methodologies.

4.1 Workload-unaware Placement

We presented the pMapper power-aware application placement methodology and system in [26] in the context of a runtime placement controller. pMapper minimizes fragmentation using an enhancement of the First Fit Decreasing (FFD) bin-packing algorithm and uses a novel Order Preservation property to select the right server for any application being migrated in order to minimize power. The algorithm optimizes the use of one resource (typically CPU utilization) during packing and treats other resources (e.g., memory, I/O) as constraints. Hence, it always comes up with a feasible packing for all resources but allocates only one resource in an optimized manner. The pMapper methodology does not itself focus on the resource sizing of each VM for the next placement interval; the size is predicted by a Performance Manager. An important thing to note here is that pMapper is designed for short consolidation intervals. There are two important implications of such an approach. Firstly, each application is sized independently and a single number is used to size an application. Secondly, as placement decisions need to be aware of application migration costs, few applications are migrated and the relocation decision takes the existing (old) placement into account. However, such an approach can still be applied to static consolidation with much longer consolidation intervals. In a static placement scenario such as the one considered in this paper, the pMapper methodology is naturally adapted by sizing an application based on its peak resource usage over the (longer) placement period. Note further that in the case of SLA-governed data centers, one can use less strict sizing functions. For example, if the SLA requires that resource requirements be met for at least 99% of the time, one could use a VM size that ensures a tail violation probability of 1%. Similarly, one may also choose to size all applications based on a metric like the mode or median, if short-term violations are acceptable. We term this family of placement methodologies using peak, percentile, mode etc. based sizing Existing placement methodologies.

4.2 Correlation Based Placement

We now present our first methodology that leverages the observations made in our workload analysis to place applications in a more effective manner. This methodology, aptly named Correlation Based Placement (CBP), is based on the following important observations.

• The peak resource usage of an application is significantly higher than its resource usage at most other times (e.g., its size at the 90% point of the CDF) (Figs. 2, 3).

• If we size applications by an off-peak metric and place correlated applications together, there is a high risk of SLA capacity violation.

• If two uncorrelated applications are placed together and each is sized individually for a violation probability of X%, the probability that both of them violate their sizes at the same time is only (X²/100)%.

To take an example, consider two applications A1 and A2. Assume that both A1 and A2 have a maximum size of 1000 RPE2 units with a 90-percentile value of 500 RPE2 units. Further, assume that A1 and A2 are uncorrelated with each other. It is now easy to see that if we place A1 and A2 on a single server and allocate 500 RPE2 units to each application, the probability that both of them exceed their allocation at the same time is only 1%. Hence, provisioning based on the 90-percentile and placing uncorrelated applications together can lead to a potential savings of 50% over peak-based sizing. CBP uses exactly these ideas to size individual applications based on a tail bound instead of the maximum size of the application. Further, it adds co-location constraints between positively correlated applications so that two such applications are not placed on the same server. The number of constraints actually added can be controlled using a tunable CorrelationCutoff.

Hence, CBP proceeds in a very similar manner to the pMapper algorithm, with a few key differences: (i) we add a co-location constraint between any two positively correlated applications (Ai, A'i) that exhibit a correlation coefficient above the correlation threshold; (ii) we size applications based on a tail bound instead of the maximum value; and (iii) in the inner loop of pMapper, where we find the most power-efficient server Sj that has resources for an application Ai, we also check that none of the applications already placed on Sj has a co-location constraint with Ai. If there is such an application, we mark the server ineligible and consider the next server for placing the application.

It is easy to see that CBP incurs an overhead in the computation of correlation for all application pairs.

Theorem 1 Given N applications and a timeseries with d points, CBP takes O(N²d) time to find the new placement.

We have proposed the CBP methodology, which takes an existing dynamic consolidation algorithm and adapts it to work in a static or semi-static consolidation scenario. CBP adds co-location constraints between correlated applications to ensure that an application can be sized based on an off-peak value. However, it adds a hard constraint between correlated applications.
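A schematic sketch of the CBP placement loop described above — first-fit-decreasing packing on tail-bound sizes plus correlation-based co-location constraints. This is our reading of differences (i)-(iii) rather than the actual implementation, and the sizes, correlations and server capacity are toy values.

```python
import itertools

def cbp_place(apps, sizes, corr, server_capacity, cutoff=0.5):
    """Sketch of CBP: FFD on tail-bound (e.g. 90-percentile) sizes, with
    co-location constraints between pairs whose correlation exceeds `cutoff`."""
    conflicts = {pair for pair in itertools.combinations(sorted(apps), 2)
                 if corr[pair] > cutoff}
    servers = []  # each entry: [free_capacity, set_of_placed_apps]
    for app in sorted(apps, key=lambda a: sizes[a], reverse=True):  # FFD order
        for srv in servers:
            no_conflict = all(tuple(sorted((app, other))) not in conflicts
                              for other in srv[1])
            if srv[0] >= sizes[app] and no_conflict:
                srv[0] -= sizes[app]
                srv[1].add(app)
                break
        else:  # no eligible server: power on a new one
            servers.append([server_capacity - sizes[app], {app}])
    return [srv[1] for srv in servers]

# Toy example: A and B are strongly correlated; C is roughly independent.
sizes = {"A": 500, "B": 500, "C": 400}                       # 90-percentile sizes
corr = {("A", "B"): 0.9, ("A", "C"): 0.1, ("B", "C"): 0.05}
placement = cbp_place(["A", "B", "C"], sizes, corr, server_capacity=1000)
print(placement)  # A and B are kept apart; C shares a server with A
```

In a real run the correlation map would come from Eq. (1) over the traces, which is exactly the O(N²d) cost noted in Theorem 1.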
[Figure 11: Fractional Optimal Solutions: Balanced and Skewed (CPU (RPE2) over time on Server 1 and Server 2, against the server capacity)]

We now take a closer look at the problem to understand the nature of the optimal solution.

4.3 Peak Clustering Based Placement

We address the issues with CBP in a new consolidation method called Peak Clustering based Placement (PCP).

[Figure: The PCP flow: 1. Envelop; 2. Cluster; 3. Server Selection; 4. Per-Cluster VM Shortlisting for the next server; 5. VM Placement for the server; the loop repeats until all VMs are placed]

[Figure: Probability density of the capacity used by a VM, showing the original time series and its envelop at PB = 0.67]

[Figure 14: Server Reservation: Each cluster gets a proportional reservation. There is a buffer for the maximum peak across all clusters.]

The final steps in PCP pack all the applications on the minimal number of servers, while packing the more power-efficient servers first. The method picks the next highest ranked server and selects a set of applications from each cluster in a proportional manner. Given a server with capacity Cap_j to pack and a set of applications yet to be placed, we calculate the sum of the sizes (C_B) of all remaining applications as TotalSize. For each cluster M_k, we calculate the sum of the sizes of its remaining applications as Size_k.

We have performed extensive experiments to evaluate the proposed methodologies. We first detail our evaluation setup and then present some of our key findings.

5.1 Evaluation Setup

The methodologies have been implemented as part of a Consolidation Planning Tool from IBM called Emerald [11]. Emerald has built-in adapters to input trace data in various formats, a module to collect the current server inventory, and a knowledge base that includes a catalog of various server platforms, their RPE2 values and power models (power vs. CPU utilization). The power models are derived from actual measurements on the servers and are used by a Power Estimation module to estimate the power drawn by any candidate placement. We feed the traces to the tool and allow an administrator to specify an analysis period for the traces followed by an evaluation period, in which the effectiveness of the computed placement is evaluated.

[Figure 15: Normalized power and violations per day for Peak_Existing, Mode_Existing, CBP and PCP]
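The Power Estimation step can be illustrated with a simple sketch. The linear power model below is an assumption for illustration (the tool derives its models from measurements, and their shape is not specified here), as are all the numbers.

```python
def server_power(util, p_idle, p_peak):
    """Hypothetical linear power model: P(u) = P_idle + (P_peak - P_idle) * u."""
    return p_idle + (p_peak - p_idle) * util

def placement_power(placement, sizes, capacity, p_idle, p_peak):
    """Estimate total power drawn by a candidate placement (one app set per
    active server); only powered-on servers contribute."""
    total = 0.0
    for apps in placement:
        util = min(1.0, sum(sizes[a] for a in apps) / capacity)
        total += server_power(util, p_idle, p_peak)
    return total

sizes = {"A": 500, "B": 500, "C": 400}   # RPE2 sizing of each application
# Consolidating onto two servers vs. spreading over three:
two = placement_power([{"A", "C"}, {"B"}], sizes, 1000, p_idle=200, p_peak=400)
three = placement_power([{"A"}, {"B"}, {"C"}], sizes, 1000, p_idle=200, p_peak=400)
print(two < three)  # fewer active servers avoid paying idle power twice
```

Because idle power dominates in such models, packing the same load onto fewer servers lowers the estimate even though the remaining servers run hotter.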
Some peaks still overshoot the server capacity, thereby causing violations. Thus PCP sees a nominal number of violations per hour in all clusters.

We observed in Fig. 15 that Mode_Existing has the potential to save a lot of power but may incur a large number of violations. The correlation in the first two clusters of AppSuite-1 and AppSuite-2 especially impacts Mode_Existing, leading to very high violations. Results show that among the 9 active servers in all clusters, 5 of them faced SLA capacity violations. We study the size of the violations in the placement computed by Mode_Existing in Fig. 16. In practice, administrators allow a buffer capacity (typically 10% of the server) and hope that any peaks can be served from this buffer capacity. We observe that even a buffer capacity of 20% is not able to handle the burstiness of the workload. Amongst the 5 servers, the average violation exceeds 20% of the server capacity on 4 servers and the peak violation exceeds the size of the server itself on all of them. Hence, a workload-unaware strategy that uses a buffer capacity to handle peaks is not feasible in practice.

[Figure 16: Violations for Mode_Existing on all active servers (average and largest violation, relative to server capacity, per application suite)]

Workload Balancing: We next investigate the load balance across the active servers achieved by the various methodologies in Fig. 17. We observe that Mode_Existing and CBP have servers that are always overloaded (the average utilization of the highest loaded server is 100%) for AppSuite-1 and AppSuite-2. Such a placement is not suitable in practice. Further, for all methodologies other than PCP, there is a large imbalance between the average utilization of the highest loaded server and the remaining servers: PCP favors balancing and the other algorithms favor creating a skew.

[Figure 17: Average and maximum server utilization on the active servers for each methodology]

Running Time: We study the running time of the various methods in Table 3. The CBP algorithm very clearly shows a super-linear relationship with the number of applications (N) because of the N² correlation coefficient computations between all pairs of applications. The Existing method in general scales linearly with the increase in the number of applications. PCP has a dependence on (a) the number of applications, (b) the number of peak ranges in a cluster and (c) the number of clusters it creates. Recall that AppSuite-3 has the highest CoV (Fig. 4), which manifests in a large number of peak ranges. As a result, AppSuite-3 has the highest running time, even though its number of applications N is only 13. Overall, PCP has a running time that is about double that of Existing and fairly stable. The running time of CBP, on the other hand, increases super-linearly with the number of applications N.

[Figure 18: Impact of change in virtual servers per physical server in AppSuite-2: (a) Power, (b) Violations]

Fragmentation: We next investigate the ability of CBP and PCP to deal with fragmentation (the ratio of application capacity C(Ai) to server capacity C(Sj)). We change the original servers in the data center and simulate placement on a larger number of lower capacity servers. Fig. 18 shows the behavior of CBP and PCP with an increase in the number of physical servers for AppSuite-2, which has the largest number of virtual servers. CBP adds a fixed number of co-location constraints and needs at least as many servers as the maximum number of constraints added to any application. Hence, it suffers when a few high capacity servers are replaced by a larger number of low capacity servers.

[Figure: Daily violations against the length of the learning period (hours) for two clusters]

The second cluster has a weekly pattern. We observe that PCP is able to infer the characteristics of the workload in about half of the length of the period. Hence, for AppSuite-2, it takes about half a day of training data to reduce violations, whereas for AppSuite-4, we require about 4 days of training data. We also observe that the impact of historical trends happens in a discrete manner. Hence, for AppSuite-4, the availability of 2 full days of data leads to a significant decrease in violations as diurnal trends become available. However, any further data is not useful until we reach 5 days, when weekly trends become available. A key strength of PCP is that a user can easily determine the length of the training period for a server cluster using a few sample runs and then store only the relevant history for future analysis.

[Figure 19: Power drawn and SLA Violations for CBP with changing correlation cutoff in AppSuite-4: (a) Violations, (b) Power]

The performance of CBP depends on the correlation cutoff parameter that is used to decide if correlation constraints need to be added between a pair of applications. Fig. 19 shows CBP performance with different thresholds, with the corresponding PCP metric shown as a reference line for AppSuite-4. Using a very low correlation threshold (0.2) creates constraints even between weakly correlated workloads, thereby reducing the SLA violations (to even below those of PCP). However, this comes