2 authors, including:
Rukma Talwadker
Games24x7
All content following this page was uploaded by Rukma Talwadker on 17 July 2020.
The peak (or amplitude) indicates the highest value of the counter during the burst.
The width (or duration) indicates the time along the x axis between the start and the end of the burst.
We now define the metrics for determining the periodic and a-periodic burst scores. These metrics are applied independently to both the seasonal and the residual constituents of both counter types (read, write) to obtain the respective scores.
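The two burst properties can be made concrete with a short Python sketch. The burst criterion itself is not stated in this section, so the sketch assumes, hypothetically, that a burst is a maximal run of consecutive samples strictly above a fixed threshold; `Burst`, `find_bursts` and the example threshold are illustrative names and values, not part of Yodea.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Burst:
    """One burst in a counter series (hypothetical helper type)."""
    start: int    # index of the first sample in the burst
    end: int      # index of the last sample in the burst (inclusive)
    peak: float   # peak or amplitude: highest counter value during the burst

    @property
    def width(self) -> int:
        # Width or duration, measured in samples (hours for hourly counters).
        return self.end - self.start + 1


def find_bursts(series: Sequence[float], threshold: float) -> List[Burst]:
    """Return maximal runs of samples strictly above `threshold` (assumed burst rule)."""
    bursts: List[Burst] = []
    start: Optional[int] = None
    for i, value in enumerate(series):
        if value > threshold and start is None:
            start = i  # a new burst begins
        elif value <= threshold and start is not None:
            bursts.append(Burst(start, i - 1, max(series[start:i])))
            start = None  # the burst has ended
    if start is not None:  # a burst is still open at the end of the series
        bursts.append(Burst(start, len(series) - 1, max(series[start:])))
    return bursts


# Hourly read counts containing two bursts above a threshold of 50.
reads = [10, 12, 80, 95, 11, 9, 70, 10]
for b in find_bursts(reads, threshold=50):
    print(b.start, b.end, b.peak, b.width)
# Prints "2 3 95 2" and then "6 6 70 1".
```

Measuring width in samples keeps the sketch independent of the sampling interval; with hourly counters it is simply a duration in hours.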
2) The Metrics: Peak to mean (PtM): The amplitude of a burst alone can distort the burst score in cases where a) the peak does not appear frequently (the peak inter-arrival time is high), or b) the peak appears but does not last long (the width of the peak is small). Yodea uses the ratio of the maximum observed value of the burst to the mean of the counter series as the metric.
Coefficient of Variation (CV): This metric captures the notion of relative variation. CV is the ratio of the standard deviation to the mean of the time series, so two time series with distinct mean and peak values can be compared directly.
Fig. 4: Two instances of periodic burst scores, panels (a) and (b); occasional predictive bursts are ranked higher than constant low-amplitude bursts.
Burst Percentage (Bp): This metric denotes the amount of time spent in a burst, as a percentage. It further separates two candidate workloads on the basis of the overall spread of their bursts.
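The following Python sketch computes the four metrics for one constituent series under the same assumed threshold rule for bursts (repeated so the block stands alone). PtM, CV and Bp follow the definitions above; BIaT, which is defined below, is interpreted here as the mean spacing between consecutive burst starts. That interpretation, Bp as a percentage of samples, the function names and the requirement of a positive series mean are all assumptions: how constituents with a near-zero mean, such as a residual, are handled is not described in this section.

```python
import statistics
from typing import Dict, List, Optional, Sequence, Tuple


def burst_runs(series: Sequence[float], threshold: float) -> List[Tuple[int, int]]:
    """(start, end) index pairs of maximal runs strictly above `threshold` (assumed burst rule)."""
    runs: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, value in enumerate(series):
        if value > threshold and start is None:
            start = i
        elif value <= threshold and start is not None:
            runs.append((start, i - 1))
            start = None
    if start is not None:
        runs.append((start, len(series) - 1))
    return runs


def burst_metrics(series: Sequence[float], threshold: float) -> Dict[str, float]:
    """PtM, CV, Bp and BIaT for one constituent series (illustrative sketch)."""
    mean = statistics.fmean(series)
    if mean <= 0:
        raise ValueError("this sketch assumes a series with a positive mean")
    cv = statistics.pstdev(series) / mean  # standard deviation over mean
    runs = burst_runs(series, threshold)
    if not runs:
        # No bursts: only CV carries information; burst metrics are reported as 0.
        return {"PtM": 0.0, "CV": cv, "Bp": 0.0, "BIaT": 0.0}
    peak = max(max(series[s:e + 1]) for s, e in runs)  # highest value inside any burst
    in_burst = sum(e - s + 1 for s, e in runs)         # samples spent in bursts
    starts = [s for s, _ in runs]
    gaps = [later - earlier for earlier, later in zip(starts, starts[1:])]
    return {
        "PtM": peak / mean,                    # peak-to-mean ratio
        "CV": cv,
        "Bp": 100.0 * in_burst / len(series),  # percentage of samples inside bursts
        # Assumed BIaT: mean spacing between consecutive burst starts (0 for a single burst).
        "BIaT": statistics.fmean(gaps) if gaps else 0.0,
    }


# Two short, well-separated bursts in an otherwise flat series.
print(burst_metrics([10, 10, 50, 10, 10, 50, 10, 10], threshold=30))
# PtM is 2.5, Bp is 25.0 and BIaT is 3.0; CV is about 0.866.
```

With hourly data, BIaT comes out in hours. Because each metric is later normalized by its maximum across workloads, expressing Bp as a percentage rather than a fraction does not change the normalized values.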
As the burst percentage increases, the ratio of peak to mean decreases; the two metrics therefore balance the overall score.
Burst Inter-arrival time (BIaT): This metric favours workloads whose bursts are well separated over workloads whose bursts and non-bursts are localized (clustered together). PtM and Bp can favour both kinds of workload alike, and the BIaT metric brings in the differentiation.
The final score is a combination of these four metrics and is calculated for both the seasonal and the residual constituents.
3) The Score: Putting it All Together: Assume that a total of N workloads are under Yodea evaluation. Each of the four metrics is computed independently for the seasonal and the residual constituents. Yodea also computes the trend metrics via regression line fitting.
Yodea normalizes each of the four metric values by the maximum (max) value of the corresponding metric across the workloads under evaluation. The two counter types are considered separately when normalizing. For example, the normalized PtM value for the seasonal component of workload i is:
$$\text{normalized PtM}^{\text{periodic}}_i = \frac{\text{PtM}^{\text{periodic}}_i}{\max_{1 \le j \le N} \text{PtM}^{\text{periodic}}_j}$$
A similar transformation is applied to the other metrics as well, so the normalized metric values lie between 0 and 1. Normalizing by the max is a conscious decision: there is no absolute measure of fitness for the cloud. Given the CIO budget, an enterprise can migrate only a few applications to the cloud, and a relative ranking helps to break the ties. The final periodic burst score for workload i is calculated as:
$$\text{periodic burst score}_i = \text{normalized PtM}^{\text{periodic}}_i + \text{normalized CV}^{\text{periodic}}_i + \text{normalized Bp}^{\text{periodic}}_i + \text{normalized BIaT}^{\text{periodic}}_i$$
The a-periodic burst score for workload i is computed similarly. The periodic and a-periodic burst scores are values between 0 and 4, whereas the trend score is between 0 and 1.
C. Finding the Dominant Pattern
It is quite unlikely that a given raw counter time series exhibits a trend, a periodic and an a-periodic burst pattern all at once. For example, in Figure 2 a significant portion of the raw counter time series seems to come from the seasonal component. To obtain that knowledge, Yodea extracts the most dominant dimension of each raw counter time series (read, write). The weights of the constituents are obtained from the variance (σ²) of the respective time series. For example, the periodic burst score weight of workload i can be calculated as:
$$\text{periodic burst score weight}_i = \frac{\text{periodic burst variance}_i}{\text{raw time series variance}_i}$$
The weights for the trend and a-periodic scores are calculated similarly from their variances. The constituent with the highest weight is considered dominant for the corresponding raw time series.
D. Workload Ranking
Each workload is tagged with two scores, one for each of the counter types, and each score corresponds to its respective dominant component. For each counter type, workloads are ranked in descending order of score. The workload with the highest score is considered the most suitable for the cloud, for the particular pattern that best describes it. We presently do not have a way to combine the read and write scores.
E. Related Work
The Yodea approach has been inspired by [5]. From the technique perspective, the nearest work to ours is [6]. The approach
claims to provide an effective way to increase utilization in data centers. Our work differs from and also complements this work in many ways, two of which are: 1) the mentioned paper forecasts trends from observations over a limited period of time, whereas Yodea works on long-term trends; and 2) Yodea produces workload rankings based on patterns. These metrics are complementary and can be fed directly to the tool discussed in [6].
Fig. 5: Two instances of a-periodic burst scores, panels (a) and (b); infrequent high bursts are scored higher than not-so-frequent low-amplitude bursts.
III. VALIDATION
Our preliminary analysis is based on the read counter data available at the storage volume level, which can be termed a workload. Our goal is first to verify the pattern classification technique of Yodea. A volume is a logical storage container, and multiple volumes could be mapped to a single application. For the purpose of validation, we selected about 4,565 volumes belonging to a single customer. For each volume, the total number of 4KB reads done per hour over 12 weeks from 1st of September 2017 was considered as a workload. This data was obtained via NetApp® AutoSupport® [4].
A. Periodic Bursts
The volume in Figure 4a scores much higher than the one in Figure 4b, though both volumes are dominated by periodic bursts. The reason is twofold: 1) a higher burst amplitude when compared with the burst threshold; and 2) burst periods that are well spread out and distinct in Figure 4a. Manual analysis of the volume in Figure 4a indicated that this was an Oracle™ database's data volume, which might be subjected to periodic application bursts. The same analysis has to be repeated on the compute demand to re-confirm the pattern.
Fig. 6: Top trend pattern for a datalake workload.
B. A-Periodic Bursts
Figures 5a and 5b represent the respective high- and low-scored instances of volumes in the a-periodic burst category for read operations. Infrequent high-amplitude bursts are scored higher than not-so-frequent low-amplitude bursts. Further analysis of the volume in Figure 5a revealed that this volume hosted a media application with unpredictable active peaks. Several other volumes, whose dominant scores were also a-periodic, matched this volume closely and were also part of the same media application.
C. Trend/High Growth
Figure 6 shows a volume which scored much higher on the trend metric than the other volumes, while more than 98% of the other volumes indicated a flat trend. Drill-down analysis revealed that this volume was part of the customer's datalake cluster, where such a growth pattern is expected.
In all three cases above, the Yodea analysis has to be repeated on the compute pattern as well, to confirm the opportunity and rule out the need for a better storage configuration. Also, the volumes under comparison in Figures 4b and 5b were ranked much lower and were not recommended for the cloud.
IV. CONCLUSION AND FUTURE WORK
This paper motivates the need for workload pattern analysis as one of the precursors to earmarking applications for the cloud. Yodea defines statistical metrics over time series data to extract workload pattern properties. It classifies workloads into patterns and recommends the top-ranked workloads as a fit for the cloud. We would like to extend our present validation to compute patterns and build an end-to-end recommender tool for self-help.
REFERENCES
[1] 5 key essentials of cloud workloads migration. http://www.iamondemand.com/blog/5-key-essentials-of-cloud-workloads-migration/.
[2] Cloud management blog. https://www.rightscale.com/blog/enterprise-cloud-strategies/identifying-workloads-cloud.
[3] Insight in the age of digital disruption. https://451research.com/.
[4] NetApp, Inc. Proactive health management with AutoSupport. http://media.netapp.com/documents/wp-7027.pdf.
[5] R. B. Cleveland, W. S. Cleveland, J. McRae, and I. Terpenning. STL: A seasonal-trend decomposition procedure based on loess. Journal of Official Statistics, pages 3–73, 1990.
[6] Y. Zhang, G. Prekas, G. M. Fumarola, M. Fontoura, I. Goiri, and R. Bianchini. History-based harvesting of spare cycles and storage in large-scale datacenters. In 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16).