
18th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing

Tuning Logstash Garbage Collection for High Throughput in a Monitoring Platform

Dong Nguyen Doan, Computer Science Department, West University of Timişoara, Romania. Email: dong.nguyen10@e-uvt.ro
Gabriel Iuhasz, Computer Science Department, West University of Timişoara, Romania. Email: iuhasz.gabriel@info.uvt.ro

Abstract—The collection and aggregation of monitoring data from distributed applications is an extremely important topic. The scale of these applications, such as those designed for Big Data, makes the performance of the services responsible for parsing and aggregating logs a key issue. Logstash is a well-known open source framework for centralizing and parsing both structured and unstructured monitoring data. As with many parsing applications, throttling is a common issue, caused by the incoming data exceeding Logstash's processing ability. The conventional approach for improving performance usually entails increasing the number of workers as well as the buffer size. However, it is unknown whether these approaches can tackle the issue when scaling to thousands of nodes. In this paper, by profiling the Java virtual machine, we optimize Garbage Collection in order to fine-tune a Logstash instance in the DICE monitoring platform and increase its throughput. A Logstash shipper simulation tool was developed to transfer simulated data to the Logstash instance; it is capable of simulating thousands of monitored nodes. The obtained results show that with our suggestions for minimizing Garbage Collection impact, the Logstash throughput increases considerably.

Index Terms—performance tuning, big data, distributed monitoring, logstash

I. INTRODUCTION

Big Data tools and services are ever more important both in academic and business applications. This wide adoption means that the problems present in these technologies need to be addressed. One of the most urgent challenges is log management and analytics. The logs vary both in content and format, and even in sources. Thus, there is no single tool to process heterogeneous logs.

Logstash is a well-known and widely used open source framework for centralizing and parsing a large variety of both structured and unstructured data types. Logstash is part of the ELK stack, which is comprised of Elasticsearch, Logstash and Kibana. The ELK stack is a powerful tool, providing a complete solution for log management and analytics [1]. Elasticsearch is based on the open source search engine Lucene [2] and is used for indexing and analyzing data. Logstash is used to collect and parse monitoring data from heterogeneous sources and feed the results of the pre-processing into Elasticsearch for indexing and storage. Lastly, Kibana is designed for data visualization and is the standard tool for interacting with the ELK stack.

DICE¹ is a project that aims at developing a quality engineering tool chain offering simulation, verification and architectural optimization for Big Data applications [3]. The DICE monitoring platform (named Dmon) is responsible for collecting, storing and serving data collected from Big Data technologies to other DICE tools. Dmon uses the ELK stack as its core services, as this allows the processing and ingestion of unstructured logs while satisfying most if not all Dmon requirements, such as scalability, reliability and timeliness [4].

For Elasticsearch, there is some work on improving performance, including tips, guides and even profiling of the Java virtual machine (JVM) [5]–[8]. Logstash plays a significant role in the ELK stack as well. It handles the incoming data and forwards the processed data to Elasticsearch. Hence, Logstash has a big effect on the throughput of the whole stack. To the best of our knowledge, there are some guides on improving performance by tuning the filter configuration [9] and by increasing the number of workers and the buffer size [10]. However, tuning Logstash for high throughput in terms of optimizing Garbage Collection (GC) has not been reported yet.

For Dmon, most of the filter configuration used in Logstash instances consists of regular expressions for the grok filter, so the instances demand many CPU cycles. As a result, there is a requirement to reduce other impacts on CPU utilization.

Logstash is implemented in JRuby, so it runs on the JVM. To date, researchers have conducted experiments on the JVM and concluded that the major factor in the throughput reduction of Java applications is the GC [11], [12]. Therefore, we have looked deeply into the Logstash JVM to minimize the GC impact on the application.

Tuning the GC for Java applications is a tedious task and various techniques have been proposed in the last decades. We are interested in particular in finding a suitable configuration for the GC in order to increase the throughput of the Logstash instances. The approach that we considered is to collect and analyze the GC activity logs. The concept and the experimental results are presented in this paper.

The paper is organized as follows. The next section discusses previous work. In Section III, we provide an overview of Logstash and its scalability, a brief description of the Garbage Collectors in the JVM, and the algorithm used to calculate the Logstash throughput metric.

¹http://www.dice-h2020.eu/

2470-881X/16 $31.00 © 2016 IEEE
DOI 10.1109/SYNASC.2016.57
Subsequently, Section IV describes the methodology and Section V covers the implementation and experimental setup. Section VI discusses the obtained results and, finally, the conclusion and future work are in Section VII.

II. PREVIOUS WORK

Tuning of the GC for Java applications is not a new field of study; hundreds of papers have been published dealing with this problem. Guan et al. [13] have investigated the effects of three nursery sizing policies on performance: GC ergonomics, Fixed Ratio and the Heap Availability policy. They have developed a hybrid policy which enables them to improve performance by switching between policies efficiently. Brecht et al. [14] have examined the impact of different heap sizes on application execution time, GC pause times and footprint. Based on the results, they have proposed an approach to control the GC and heap size. Velasco et al. [15] have introduced a technique to dynamically reorganize the heap in a generational collector. They have also presented two strategies for choosing the percentage of reserved space. These techniques improve execution times considerably. Singer et al. [16] have implemented a framework for auto-tuning the GC for MapReduce. The framework uses decision trees to choose a suitable GC policy for each MapReduce program. The chosen GC policy includes the GC algorithm (Serial, Parallel or Concurrent-Mark-Sweep) and the new-to-old generation ratio (1:2 or 1:8). Lengauer et al. [17] have proposed a technique to tune the GC by finding the optimal GC parameters using a black-box optimization method. Their algorithm uses the ParamILS algorithm, performing iterated local search. Joshi [18] has provided guidelines to tune not only the software but also the hardware stack, such as the OS configuration, BIOS configuration and JVM configuration for the Apache Hadoop framework. Blackburn et al. [12] have investigated the cost of three GC algorithms on different heap sizes and architectures by comparing these GC algorithms. Ghike [19] has given instructions on tuning the GC for high-throughput and low-latency Java applications by experimenting on the G1 collector.

The above mentioned work has been invaluable in deciding how we should tackle the problem of increasing the throughput of a Logstash instance. We started by profiling the JVM, collecting and analyzing GC activity logs. This formed the basis of our search for a suitable GC configuration that enables us to increase the throughput of a Logstash instance.

III. BACKGROUND

A. Logstash Overview

Logstash is an open source data collection engine with near real-time pipelining capabilities.

The Logstash pipeline consists of 3 main components. First, we have the input, which enables the collection of logs in a large variety of formats such as files, TCP/UDP, Graphite, lumberjack etc. Second, we have the filter plugins, which enable Logstash to execute transformations on the input data. Lastly, the output plugin allows the processed and transformed data to be written in a large variety of formats, ranging from json to Elasticsearch.

Logstash processing pipeline: From Logstash version 2.2, the output and filter stages are processed in the same threads. The execution model is as follows:
• Input threads: Each input{} statement in the Logstash configuration file runs in its own thread. The input threads write events to a queue (default size is 20). This queue transfers events to the worker threads, blocking if the worker threads are busy.
• Worker threads: Each worker thread takes a batch of events off the queue, placing the events in a buffer (default size is 125). It then runs the batch of events through the filters defined in the configuration file and writes them through the outputs. The number of workers and the buffer size per worker can be configured.

Scaling Logstash: Logstash can be scaled from a standalone instance to clusters [20]. The basic architecture is a Logstash instance connected directly to an Elasticsearch instance. However, there can be any number of distinct Logstash instances serving the same Elasticsearch instance. In fact, each Logstash instance is configurable so that it adheres to one archetype only, meaning that it can be used solely for input, filtering or output [10]. Data loss prevention plays an important role in any monitoring deployment. When the incoming data rate to the pipeline is higher than its consumption ability, data loss will occur. Therefore, a message broker can be used in front of the Logstash instances. The message broker is able to hold the events while Logstash is feeding data into Elasticsearch [1]. Currently there is a wide array of technologies which can fulfill the task of a message broker: Redis [21], ZeroMQ, RabbitMQ, Kafka [22] etc.

As seen in Fig. 1, we used two archetypes for the Logstash instances:
• Logstash Shippers: receive data from the data sources and store it in message queues.
• Logstash Indexer: retrieves events from the queues, applies filters and writes the processed data to Elasticsearch.

Fig. 1. ELK architecture with message broker [1]

Both the Indexer and the Shippers can be easily scaled because they are in essence stateless, requiring only the Elasticsearch endpoints.
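The execution model described above lends itself to a compact illustration. The following Python sketch is illustrative only, not Logstash code: it models input threads feeding a bounded queue of 20 events and worker threads draining it in batches of up to 125, with the filter function and worker count as placeholders.

```python
import queue
import threading

INPUT_QUEUE_SIZE = 20   # default input queue size quoted in the text
BATCH_SIZE = 125        # default per-worker buffer size quoted in the text

def run_pipeline(events, n_workers=2, apply_filter=lambda e: e):
    """Toy model of the Logstash 2.2 pipeline: inputs feed a small bounded
    queue; workers drain it in batches and run filter+output together."""
    q = queue.Queue(maxsize=INPUT_QUEUE_SIZE)
    out = []
    out_lock = threading.Lock()

    def worker():
        stop = False
        while not stop:
            batch = []
            e = q.get()                    # block for the first event of a batch
            while True:
                if e is None:              # sentinel: finish after this batch
                    stop = True
                    break
                batch.append(e)
                if len(batch) >= BATCH_SIZE:
                    break
                try:
                    e = q.get_nowait()     # opportunistically fill the buffer
                except queue.Empty:
                    break
            with out_lock:                 # stand-in for the output stage
                out.extend(apply_filter(x) for x in batch)

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for e in events:
        q.put(e)                           # input thread blocks if the queue is full
    for _ in threads:
        q.put(None)                        # one stop sentinel per worker
    for t in threads:
        t.join()
    return out
```

The small queue is what produces the back-pressure mentioned above: when the workers lag, `q.put` blocks the input thread instead of buffering unboundedly.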
B. Garbage Collectors in the JVM

Arguably the most interesting feature of Java is automatic memory management, meaning that developers do not need to handle the memory used by objects. In the HotSpot JVM that function is performed by a Garbage Collector, which is responsible for allocating memory for objects, reclaiming the memory of objects no longer in use and keeping referenced objects in memory. GC is the process of finding and freeing objects that are no longer reachable [23].

There are four types of GC in the JVM [23]–[25]: the Serial Collector, the Parallel Collector, the Concurrent Mark Sweep Collector and the Garbage First Collector (G1).

The Serial Collector performs minor and major collections serially, with a single thread. It is recommended for programs with a small heap size (below 100M).

The Throughput Collector uses multiple threads for minor and major collections to take advantage of multiple CPUs, and is therefore known as the Parallel Collector. However, minor and major collections using the Throughput Collector still cause a stop-the-world scenario. The old generation is compacted during major collections. This collector outperforms the serial one.

The Concurrent Mark and Sweep Collector (CMS) aims at low-pause collections. The CMS Collector collects the Young Generation (YG) in the same way as the Throughput Collector. However, it does not stop the application threads to perform major collections; it uses background threads to find and free unused objects in the old generation. Besides, the CMS Collector does not compact the old generation, so it can leave the heap fragmented.

The G1 Collector [25] is the newest collector. It aims at low pauses for applications with a large heap (greater than 4G of RAM).

A Logstash instance comes with the CMS Collector as the default. The CMS Collector reduces pause times when processing the old generation but needs extra CPU cycles. The Throughput Collector brings high throughput for an application but takes long pauses on account of Full GCs. In theory, the Throughput Collector should be used for applications demanding high CPU utilization, as it yields better performance, while the CMS collector is for applications requiring low pauses or short response times at the detriment of application throughput [24]. However, the effect of the CMS Collector on application throughput is unknown; thus, we will compare the effect of the CMS and Parallel collectors on Logstash.

C. Exponentially weighted moving average (EWMA)

The exponentially weighted moving average (EWMA) is a statistic with the characteristic that it gives less and less weight to data as they get older [26]. EWMA is used commonly in the financial field. However, the algorithm used to calculate the load average in UNIX and GNU/Linux is Exponential Smoothing, a basic form of EWMA [27]. This algorithm is used in some UNIX commands such as uptime, w and top [28]. The exponential smoothing equation is given by:

    Y(t) = Y(t-1) + α[X(t) - Y(t-1)],    (1)

where X(t), Y(t-1) and Y(t) are the raw input, the previous smoothed value and the current value, respectively.

For Logstash, the raw input X(t) is calculated as:

    X(0) = 0,    (2)
    X(t) = N(events)/Δt,    (3)

where Δt = 5 seconds and N(events) is the number of events in each 5-second interval, so X(t) is equivalent to the average rate of events over 5 seconds.

If the variation of X(t) is small, the value of Y(t) will converge to a limit and then fluctuate around that point. The 1-minute rate reaches the limit value more quickly than the 5-minute and 15-minute rates. The size of the fluctuation depends on how homogeneous the data is.

IV. METHODOLOGY

In this section, we describe the method used to tune Logstash for high throughput.

According to [11], the overall execution time of an application (T) can be described by the expression:

    T = Tapp + Tgc + Tjit + Tsync,    (4)

where Tapp is the time for the application itself, Tgc is the time spent on GC, Tjit is the time for compilation and, finally, Tsync is the synchronization time.

For high application throughput, Tgc, Tjit and Tsync have to be minimized in order to increase Tapp. GC time has a strong relation to the heap size, the heap layout and the garbage collection algorithm. Consequently, if we consider GC time as a function, it can be represented as follows:

    Tgc ⇐ F(S, L, A),    (5)

where S, L and A are parameters representing the Heap Size, Heap Layout and Garbage Collection Algorithm, respectively. As Eq. 5 indicates, each parameter has a big impact on GC time; changing the value of a parameter will lead to a change in Tgc. Therefore, we investigate the GC characteristics of Logstash with different input parameter values in order to optimize Tapp.

In our experiments, we use different heap sizes from 1G to 4G and consider two garbage collectors: the CMS Collector and the Parallel Collector. For the heap layout (the ratio of the Young Generation to the Old Generation), we choose ratios of one and two.

To reduce Tjit and Tsync, we consider flags that optimize the JVM such as -XX:+AggressiveOpts and -XX:+UseFastAccessorMethods. These flags have a positive effect on most Java applications, but not all; therefore, experiments are needed.

As stated in the previous section, the Logstash throughput in each experiment is compared using the EWMA algorithm.
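Eqs. (1)-(3) can be illustrated with a few lines of Python. This is a sketch only; the smoothing constant α is a placeholder, since the exact value used by the Logstash metrics meter for each rate window is not stated here.

```python
ALPHA = 0.1  # illustrative smoothing constant, not the value used by Logstash

def ewma_rates(event_counts, dt=5.0, alpha=ALPHA):
    """Exponential smoothing of the event rate per Eqs. (1)-(3):
    X(t) = N(events)/dt for each 5-second window, Y starts at 0,
    Y(t) = Y(t-1) + alpha * (X(t) - Y(t-1))."""
    y = 0.0                      # X(0) = 0, so the smoothed series starts at 0
    series = []
    for n in event_counts:
        x = n / dt               # raw rate over the window, Eq. (3)
        y = y + alpha * (x - y)  # Eq. (1)
        series.append(y)
    return series
```

With a constant input of 500 events per 5-second window (a raw rate of 100 events/s), the smoothed value climbs monotonically toward 100, matching the convergence behavior described above; a larger α would converge faster, which is why the 1-minute rate settles before the 5- and 15-minute rates.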
V. IMPLEMENTATION AND EXPERIMENTAL SETUP

This section describes our simulation tool, the experimental settings and the measurement metrics, with the goal of creating a high throughput configuration for a Logstash instance for use in Dmon.

A. Logstash Shipper simulation tool

We conduct experiments with a Logstash instance under heavy load conditions. The Logstash instance uses the filter configurations generated by Dmon. A Redis server is used as a buffer in front of the Logstash filter instance. There is no output plugin writing data out to Elasticsearch in this instance because we want to improve Logstash performance with the filter plugin. Future work will focus on a more holistic approach in which the Logstash instance will feed the processed data into Elasticsearch.

In Dmon, monitored nodes send metrics to Logstash Shippers. The shippers feed data to the Redis server via output plugins. Therefore, to simulate the shippers, a simulation tool was developed to mimic Logstash shipper behavior. The dummy data are the metrics collected from monitored nodes. Each dummy payload mimics the metric format received by Dmon during normal operation. After being constructed into json format, the dummy data are transferred to the buffer repeatedly. Each shipper is developed as a thread, in which a Redis connection is used to communicate with the Redis buffer.

For the input plugin of the Logstash instance, we use a list as the data type in Redis and the key is specified as "logstash". The number of input threads is set equal to the number of CPU cores to get the best performance [9]. An example input configuration follows:

Listing 1. Logstash Redis input configuration

    redis {
        host => "ip_redis_host"
        port => "redis_port"  # default: 6379
        data_type => "list"
        key => "logstash"
        codec => "json"
        threads => 4
    }

To avoid starvation of the filter instance due to insufficient incoming data, we set a threshold on the length of the list in Redis. When the length reaches the threshold, the simulation tool stops sending data. If the length falls below the threshold, the simulation triggers the sending method.

B. Measurement metrics

When analyzing GC logs, we measure GC throughput, the minor GC interval and the average promotion per collection. For the Logstash filter, we use the metrics filter plugin to measure its throughput. This plugin is light and has a low impact on the system.

Logstash metrics: It is possible to measure the total count of events and the rate of events over 1-minute, 5-minute and 15-minute windows by using the metrics filter plugin. The meter in the metrics plugin uses EWMA. We use this metric to compare Logstash throughput in each experiment.

GC performance metrics: The following metrics are used:
• Throughput - % of total time spent in application running time.
• GC overhead - % of total time spent in GC.
• Collection frequency - how often collection happens.
• GC pause - the application stop time for each collection.
• Promotion rate - the amount of data promoted from the Young Generation to the Old Generation per time unit.

C. Experimental setup

Our experimental topology comprised 2 VMs and a physical machine: the first VM hosted the simulation tool, the second hosted the Redis server, and the physical machine hosted the Logstash instance.

As stated in the previous section, the filter configuration was generated by the Dmon platform and then statically loaded into the Logstash instance. Swappiness (vm.swappiness) was set to zero.

With the metrics filter configuration, the flush interval is configured as 60s. To ensure that we collect accurate values, the clear interval is configured as 1800s (30 minutes).

The JVM flags shown in Table I were used in order to obtain GC logs.

TABLE I
GC FLAGS FOR ANALYSIS (BASED ON [29])

-XX:+PrintGCDetails: Print details at garbage collection.
-XX:+PrintGCTimeStamps: Print timestamps at garbage collection.
-XX:+HeapDumpOnOutOfMemoryError: Dump the heap to a file when java.lang.OutOfMemoryError is thrown.
-XX:+PrintCodeCache: Print detailed info on the compiled code cache when exiting.
-XX:+PrintFlagsFinal: Print a list of all available JVM parameters.

The flags in Table II and Table III are used to enable the Parallel Collector and the CMS Collector, respectively.

TABLE II
JVM FLAGS FOR THE PARALLEL COLLECTOR

-XX:+UseParallelGC: Enable the parallel garbage collector.
-XX:+UseParallelOldGC: Enable the parallel old generation garbage collector. This flag is enabled automatically when enabling -XX:+UseParallelGC.
-XX:-UseAdaptiveSizePolicy: Disable adaptive generation sizing, which is enabled by default.

We ran 1-hour-long warm-up experiments in order to ensure a steady code cache. Then, we performed a Full GC using the jcmd tool and collected the GC activity logs for the next 30 minutes of the experiment. We then analyzed the GC activities to decide which JVM flags should be changed and re-ran the experiments with the new flag values. After some experiments we chose the most suitable JVM flags for the Logstash instance.
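The back-pressure rule used by the simulation tool in Section V-A (stop sending when the Redis list reaches a threshold, resume when it falls below) can be sketched as follows. This is illustrative Python with a plain list standing in for the Redis list; the threshold and batch values are hypothetical, and with the real redis-py client the length check and push would be llen and rpush on the "logstash" key.

```python
import json

THRESHOLD = 1000  # hypothetical backlog cap; the paper does not state the value used

def shipper_tick(backlog, make_payload, batch=50, threshold=THRESHOLD):
    """One iteration of a simulated shipper thread: push json-encoded dummy
    metric payloads onto the buffer only while the list is below the
    threshold, so the filter instance never starves but is never drowned."""
    sent = 0
    while len(backlog) < threshold and sent < batch:
        backlog.append(json.dumps(make_payload()))  # rpush with real Redis
        sent += 1
    return sent
```

Each simulated shipper thread would call a function like this in a loop, so sending pauses automatically once the backlog reaches the threshold and resumes as Logstash drains the list.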
To report Logstash throughput, we choose the 5-minute rate because it converges to a steady value within 30 minutes and the fluctuation around the steady value is more stable than at the 1-minute rate.

TABLE III
JVM FLAGS FOR THE CMS COLLECTOR

-XX:+UseConcMarkSweepGC: Enables the CMS collector for the old generation.
-XX:+UseParNewGC: Enables parallel threads for new generation collection.
-XX:+UseCMSInitiatingOccupancyOnly: Uses the occupancy value as the only criterion for initiating the CMS collector.
-XX:CMSInitiatingOccupancyFraction=<percent>: Sets the percentage of old generation occupancy at which to start a CMS collection cycle. In Logstash it is set to 75%.

VI. RESULTS

A. Comparing the CMS and Parallel collectors

In this section, the two collectors applied to Logstash with different heap sizes (1G, 2G and 4G) are compared by Logstash throughput.

Fig. 2. Comparing the CMS and Parallel Collectors: the 5-minute rate of events over 30 minutes for each collector with 1G, 2G and 4G heap sizes.

As Fig. 2 and Table IV show, for the Parallel collector a bigger heap size yields better application throughput as well as GC throughput. In contrast, with the CMS collector the throughput of Logstash decreases as the heap size increases and is lower than with the Parallel collector at the same heap size. As a result, we conclude that the CMS collector is unsuitable for Logstash under heavy load. It might be a reasonable choice under light load.

In the next experiment, we set 2G of RAM for the Logstash heap size because this is the configuration chosen for application development in the project.

B. Tuning Logstash GC for the DICE monitoring platform

1) Analyzing results with the baseline configuration: We set the Logstash heap to 2G of RAM and use the Parallel Collector with the JVM flags of Table II.

From the results in Table V, the application running time takes up 98.89% of the total time. If the application running time increases further, the application throughput will also rise.

To reduce GC overhead, it is required to decrease the GC frequency and the GC pause time. Increasing the YG can reduce the GC frequency, but it may degrade application throughput and account for longer GC pauses: because of the longer interval between minor GCs, more data could be promoted and copied to the survivor spaces in each collection, which has a negative impact on application throughput. If the promotion rate does not increase much, however, the decrease in GC frequency leads to an increase in overall application throughput. The results in Table V show that the average promotion is small, just 1.3K per collection. That means most new objects are discarded at each collection. Therefore, we decided to increase the YG size by using the flag NewRatio=1.

Furthermore, the default thread stack size on 64-bit Linux is 2048K; we decided to decrease this value to 512K in order to save memory.

With the Parallel Collector, the default values of the initial tenuring threshold and the maximum tenuring threshold are 7 and 15, respectively. To reduce the promotion rate to the Old Generation, both flags are set to the maximum value of 15. These flags are described in Table VI.

By analyzing the compilation and metadata logs during the experiment, we found that the metaspace takes up 42MB, with a maximum code cache size of 22MB. Therefore, we decided to initialize these values as shown in Table VII. With the flags in Table VII, the application start-up time is reduced.

Fig. 3. Logstash throughput in 3 experiments: the 5-minute rate of events over 30 minutes for the baseline configuration, the Young-to-Old generation ratio = 1 configuration, and ratio = 1 with the -XX:+AggressiveOpts flag.
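For reference, the tuning decisions described above (Parallel Collector, NewRatio=1, a 512K thread stack, maximum tenuring thresholds and the metaspace/code cache sizes of Table VII) can be collected into a single options list. This is an illustrative sketch assembled from the flags named in the text and tables, not the exact command line used in the experiments.

```python
def tuned_jvm_options(heap="2g"):
    """Assemble the JVM options corresponding to the tuned configuration
    described in the text; the heap default reflects the 2G used here."""
    return [
        f"-Xms{heap}", f"-Xmx{heap}",         # fixed heap used in the experiments
        "-XX:+UseParallelGC",
        "-XX:+UseParallelOldGC",
        "-XX:-UseAdaptiveSizePolicy",         # Table II
        "-XX:NewRatio=1",                     # young:old generation ratio of 1:1
        "-Xss512k",                           # reduced thread stack size
        "-XX:InitialTenuringThreshold=15",
        "-XX:MaxTenuringThreshold=15",        # Table VI
        "-XX:MetaspaceSize=64m",
        "-XX:MaxMetaspaceSize=64m",
        "-XX:ReservedCodeCacheSize=32m",
        "-XX:InitialCodeCacheSize=32m",       # Table VII
    ]
```

Keeping the options in one place like this makes it easy to diff a tuned configuration against the baseline when re-running the experiments.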
TABLE IV
GC STATISTICS FOR THE CMS AND PARALLEL COLLECTORS WITH DIFFERENT HEAP SIZES

Heap size:     1G                       2G                       4G
Collector      GC thr.  Avg GC pause    GC thr.  Avg GC pause    GC thr.  Avg GC pause
CMS            97.93%   0.009s          96.95%   0.0118s         95.45%   0.02s
Parallel       97.97%   0.00742s        98.89%   0.00743s        99.6%    0.00723s

TABLE V
GC STATISTICS WITH THE BASELINE CONFIGURATION

GC throughput: 98.89%
Full GC: none
Average promotion: 1,034B per collection
Average GC pause: 0.00743s
Min/Max GC pause: 0.00557s/0.0247s
Average GC pause interval: 0.72834s

TABLE VI
TENURING THRESHOLD VALUES USED FOR THE NEXT EXPERIMENTS

-XX:InitialTenuringThreshold: default 7, used value 15
-XX:MaxTenuringThreshold: default 15, used value 15

TABLE VII
METASPACE AND CODE CACHE SIZES FOR THE NEXT EXPERIMENTS

-XX:MetaspaceSize: default depends on platform, used value 64m
-XX:MaxMetaspaceSize: default unlimited, used value 64m
-XX:ReservedCodeCacheSize: default 240m, used value 32m
-XX:InitialCodeCacheSize: default 500k, used value 32m

TABLE VIII
GC STATISTICS WITH YOUNG-TO-OLD GENERATION RATIO = 1

GC throughput: 99.27%
Full GC: none
Average promotion: 1,307B per collection
Average GC pause: 0.00785s
Min/Max GC pause: 0.00597s/0.01312s
Average GC pause interval: 1.08494s

2) Analyzing results with Young-to-Old generation ratio = 1: With the Young-to-Old ratio set to 1, the GC throughput in Table VIII increases to 99.27%, compared to 98.89% with the baseline, because the GC frequency is reduced (the GC pause interval is longer) while the GC duration stays nearly the same. Likewise, Fig. 3 shows that the throughput after changing the heap layout is better, rising from 11800 events (5-minute rate) with the baseline configuration to 12000 events with the ratio = 1 configuration.

3) Analyzing results with the JVM optimization flag AggressiveOpts: There are some JVM optimization flags that may improve application performance. We conducted an experiment with the flags shown in Table IX.

TABLE IX
JVM OPTIMIZATION FLAGS

-XX:+AggressiveOpts: Enables point performance compiler optimizations.
-XX:+UseFastAccessorMethods: Enables optimization for Get<Primitive>Field accessor methods.

When the AggressiveOpts flag is enabled, several other flags come along with it: AutoBoxCacheMax, DoEscapeAnalysis, UseBiasedLocking, EliminateLocks, OptimizeStringConcat and AutoFill [24] [30].

The AutoFill flag enables better loop optimizations by the compiler. This feature is disabled by default.

The AutoBoxCacheMax flag is set to 20,000; thus, the performance of certain applications is improved.

The BiasedLocking flag is an optimization technique that biases an object to the thread that last acquired its lock. This flag improves un-contended synchronization performance. It is on by default in Java SE 6 or later.

The EscapeAnalysis flag is an optimization technique that evaluates the scope of objects. By default, this feature is off.

The OptimizeStringConcat flag optimizes the use of StringBuilder objects.

The EliminateLocks flag is on by default. It eliminates the unlock and relock operations in un-observed regions, which reduces synchronization time.

Finally, the value of the BiasedLockingStartupDelay flag is set to 500 (the default is 2000). This means biased locking triggers sooner.

TABLE X
GC STATISTICS WITH THE OPTIMIZATION FLAGS

GC throughput: 99.2%
Full GC: none
Average promotion: 1,407B per collection
Average GC pause: 0.0080s
Min/Max GC pause: 0.00653s/0.0238s
Average GC pause interval: 1.084s

As Table X shows, although the GC throughput value is slightly lower than with the Young-to-Old ratio = 1, the optimization flags have improved Logstash throughput, as shown in Fig. 3, because of the optimization techniques applied to the JVM. The rate reaches 12150 events (5-minute rate), compared to 12000 and 11800 with the ratio = 1 and baseline configurations, respectively.
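As a rough cross-check of the statistics above, GC throughput can be estimated from the average pause and the average pause interval alone. The sketch below (illustrative Python) reproduces the direction of the improvement; the estimates do not match the tables exactly, since the real figures come from the full GC logs rather than these two averages.

```python
def gc_throughput(avg_pause, avg_interval):
    """Approximate GC throughput: the percentage of wall time the
    application runs between stop-the-world pauses."""
    return 100.0 * avg_interval / (avg_interval + avg_pause)

# Baseline (Table V): 0.00743s pauses roughly every 0.72834s
base = gc_throughput(0.00743, 0.72834)
# Ratio = 1 (Table VIII): 0.00785s pauses roughly every 1.08494s
ratio1 = gc_throughput(0.00785, 1.08494)
```

The estimate for the ratio = 1 configuration comes out close to the 99.27% of Table VIII and above the baseline estimate, illustrating how lengthening the pause interval raises GC throughput even when individual pauses grow slightly.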
VII. CONCLUSION

In this paper we have presented a method to tune Logstash performance with the filter configuration of Dmon by optimizing GC. Our analysis of the GC activities shows that most objects created in Logstash are short-lived. Therefore, increasing the YG size yields better performance. Moreover, applying JVM optimization flags brings even higher throughput for Logstash.

As for next steps, we intend to study the impact of increasing the number of workers and the buffer size in Logstash on throughput, since an increase in worker threads might lead to lock contention and an increased buffer size might degrade GC throughput. Therefore, further profiling and analysis of the Logstash JVM is needed.

We can also use the insights and data gained from profiling Logstash to create a machine learning based predictive model able to detect event trends and, using a multi-agent based self-management module, autonomously enact the optimizations detailed in this article, thus enabling Dmon to autonomously adapt and provide the best throughput possible for any given use case.

VIII. ACKNOWLEDGMENT

This work was partially funded by the European Union's Horizon 2020 research and innovation program under grant agreement No. 644869 (DICE).

REFERENCES

[1] S. Chhajed, Learning ELK Stack. Packt Publishing, 2015.
[2] M. McCandless et al., Lucene in Action, Second Edition: Covers Apache Lucene 3.0. Greenwich, CT, USA: Manning Publications Co., 2010.
[3] G. Casale et al., "DICE: Quality-driven development of data-intensive cloud applications," in 2015 IEEE/ACM 7th International Workshop on Modeling in Software Engineering, May 2015, pp. 78–83.
[4] G. Iuhasz and I. Dragan, "An overview of monitoring tools for big data and cloud applications," in 2015 17th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing (SYNASC), Sept. 2015, pp. 363–366.
[5] S. Thies, "10 elasticsearch metrics to watch," Apr. 2015. [Online]. Available: https://www.oreilly.com/ideas/10-elasticsearch-metrics-to-watch
[6] "Indexing performance tips." [Online]. Available: https://www.elastic.co/guide/en/elasticsearch/guide/current/indexing-performance.html
[7] J. Prante, "Elasticsearch java virtual machine settings explained," Nov. 2012. [Online]. Available: http://jprante.github.io/2012/11/28/Elasticsearch-Java-Virtual-Machine-settings-explained.html
[8] C. Gormley and Z. Tong, Elasticsearch: The Definitive Guide. O'Reilly Media, Jan. 2015.
[9] "Logstash configuration tuning." [Online]. Available: https://www.elastic.co/blog/logstash-configuration-tuning
[10] "Logstash processing pipeline." [Online]. Available: https://www.elastic.co/guide/en/logstash/current/pipeline.html
[11] F. Xian et al., "Investigating throughput degradation behavior of java application servers: A view from inside a virtual machine," in Proceedings of the 4th International Symposium on Principles and Practice of Programming in Java, ser. PPPJ '06. New York, NY, USA: ACM, 2006, pp. 40–49.
[12] S. M. Blackburn et al., "Myths and realities: The performance impact of garbage collection," in Proceedings of the Joint International Conference on Measurement and Modeling of Computer Systems, ser. SIGMETRICS '04/Performance '04. New York, NY, USA: ACM, 2004, pp. 25–36.
[13] X. Guan et al., "Investigating the effects of using different nursery sizing policies on performance," in Proceedings of the 2009 International Symposium on Memory Management, ser. ISMM '09. New York, NY, USA: ACM, 2009, pp. 59–68.
[14] T. Brecht et al., "Controlling garbage collection and heap growth to reduce the execution time of java applications," ACM Trans. Program. Lang. Syst., vol. 28, no. 5, pp. 908–941, Sep. 2006.
[15] J. M. Velasco et al., "Dynamic management of nursery space organization in generational collection," in Interaction between Compilers and Computer Architectures (INTERACT-8), Feb. 2004, pp. 33–40.
[16] J. Singer et al., "Garbage collection auto-tuning for java mapreduce on multi-cores," in Proceedings of the International Symposium on Memory Management, ser. ISMM '11. New York, NY, USA: ACM, 2011, pp. 109–118.
[17] P. Lengauer and H. Mössenböck, "The taming of the shrew: Increasing performance by automatic parameter tuning for java garbage collectors," in Proceedings of the 5th ACM/SPEC International Conference on Performance Engineering, ser. ICPE '14. New York, NY, USA: ACM, 2014, pp. 111–122.
[18] S. B. Joshi, "Apache hadoop performance-tuning methodologies and best practices," in Proceedings of the 3rd ACM/SPEC International Conference on Performance Engineering, ser. ICPE '12. New York, NY, USA: ACM, 2012, pp. 241–242.
[19] S. Ghike, "Garbage collection optimization for high-throughput and low-latency java applications," Apr. 2014. [Online]. Available: https://engineering.linkedin.com/garbage-collection/garbage-collection-optimization-high-throughput-and-low-latency-java-applications
[20] "Deploying and scaling logstash." [Online]. Available: https://www.elastic.co/guide/en/logstash/current/deploying-and-scaling.html
[21] J. L. Carlson, Redis in Action. Greenwich, CT, USA: Manning Publications Co., 2013.
[22] N. Garg, Apache Kafka. Packt Publishing, 2013.
[23] Memory Management in the Java HotSpot Virtual Machine, Sun Microsystems, 2006.
[24] S. Oaks, Java Performance: The Definitive Guide, First Edition. O'Reilly Media, 2014.
[25] D. Detlefs et al., "Garbage-first garbage collection," in Proceedings of the 4th International Symposium on Memory Management, ser. ISMM '04. New York, NY, USA: ACM, 2004, pp. 37–48.
[26] J. S. Hunter, "The exponentially weighted moving average," Journal of Quality Technology, vol. 18, no. 4, pp. 203–210, 1986.
[27] N. J. Gunther, UNIX Load Average - Part 2: Not Your Average Average, TeamQuest, 2010.
[28] N. J. Gunther, UNIX Load Average - Part 1: How It Works, TeamQuest, 2010.
[29] "Java platform, standard edition tools reference." [Online]. Available: http://docs.oracle.com/javase/8/docs/technotes/tools/unix/java.html
[30] C. Hunt and B. John, Java Performance, 1st ed. Addison Wesley, 2011.