
Performance tuning Grails applications
by Lari Hotari @lhotari
2014 SpringOne 2GX. All rights reserved. Do not distribute without permission.

"Programmers waste enormous amounts of time thinking


about, or worrying about, the speed of noncritical parts
of their programs, and these attempts at efficiency
actually have a strong negative impact when debugging
and maintenance are considered. We should forget
about small efficiencies, say about 97% of the time:
premature optimization is the root of all evil. Yet we
should not pass up our opportunities in that critical 3%."
- Donald Knuth, 1974

Mature performance optimisation


Find out the quality requirements of your solution. Keep
learning about them and keep them up-to-date. It's a
moving target.
Keep up the clarity and the consistency of your solution.
Don't introduce accidental complexity.
Don't do things just "because this is faster" or
someone thinks so.
Start doing mature performance tuning and optimisation
today!

How do we define application performance?


Performance aspects
Latency of operations
Throughput of operations
Quality of operations - efficiency, usability,
responsiveness, correctness, consistency, integrity,
reliability, availability, resilience, robustness,
recoverability, security, safety, maintainability

Amdahl's law

Little's law

L = λW
MeanNumberInSystem = MeanThroughput * MeanResponseTime

MeanThroughput = MeanNumberInSystem / MeanResponseTime

TL;DR
Lari's Grails Performance Tuning Method

Lari's Grails Performance Tuning Method


Look for 3 things:
Slow database operations - use a profiler that shows SQL statements
Thread blocking - shows up as high object monitor usage in the profiler
Exceptions used in normal program flow - easy to check in the profiler
Pick the low-hanging fruit
Find the most limiting bottleneck and eliminate it
Iterate

What are we aiming for?


The goals of Grails application tuning


What's the goal of performance tuning?


The primary goal of performance tuning is to assist in
fulfilling the quality requirements and constraints of your
system.
Meeting the quality requirements makes you and your
stakeholders happy: your customers, your business
owners, and you in dev and ops.


Performance - Quality of operations

efficiency
usability
responsiveness
correctness
consistency / integrity / reliability
availability
resilience / robustness / recoverability
security / safety
maintainability


Operational efficiency
Tuning your system to meet its quality requirements
at optimal cost
Optimising costs to run your system - operational
efficiency


How do you succeed in performance tuning?
The continuous improvement strategy for performance tuning anything


Performance tuning improvement cycle


Measure & profile
o start with the tools you have available. You can add more tools and methods in the next iteration.
Think & learn, analyse and plan the next change
o find tools and methods for measuring, in the next iteration, whatever you want to know more about
Implement a change

[Figure: the performance tuning feedback cycle: Measure & profile -> Think and learn -> Do tuning and fixes -> back to Measure & profile]


Iterate, Iterate, Iterate


Iterate: do a lot of iterations and change one thing at a
time
learn gradually about your system's performance and
operational aspects


Feedback from production


Set up a different feedback cycle for production
environments.
Don't forget that usually it's irrelevant if the system
performs well on your laptop.
If you are not involved in operations, use innovative
means to set up a feedback cycle.


More specific derived goals


If your requirement is to lower latency


Amdahl's law - you won't be able to effectively speed up a
single computation task if you cannot parallelize it.
In an ordinary synchronous blocking Servlet API
programming model, you have to make sure that the use
of shared locks and resources is minimised.
Reducing thread blocking (object monitor usage) is a key
principle for improving performance - Amdahl's law
explains why.
The ideal is lock-free request handling when the synchronous
Servlet API is used.
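
A minimal Java sketch of the arithmetic behind Amdahl's law, added for illustration: speedup = 1 / (serial share + parallel share / workers). The class name, method names and the 90% parallel figure are hypothetical and not from the talk.

```java
// Amdahl's law: the serial share of a task caps the speedup that extra workers can give.
final class AmdahlSketch {

    /**
     * @param parallelFraction share of the work that can run in parallel (0..1)
     * @param workers          number of parallel workers (threads, cores)
     */
    static double speedup(double parallelFraction, int workers) {
        double serialFraction = 1.0 - parallelFraction;
        return 1.0 / (serialFraction + parallelFraction / workers);
    }

    /** Upper bound on the speedup as the number of workers grows without limit. */
    static double maxSpeedup(double parallelFraction) {
        return 1.0 / (1.0 - parallelFraction);
    }

    public static void main(String[] args) {
        // Hypothetical request: 90% of the work is independent, 10% sits behind a shared lock.
        for (int workers : new int[] {1, 2, 8, 64}) {
            System.out.printf("workers=%d speedup=%.2f%n", workers, speedup(0.9, workers));
        }
        System.out.printf("limit=%.2f%n", maxSpeedup(0.9));
    }
}
```

With a 10% serial share the speedup stays below 10x no matter how many workers are added, which is one way to read the advice to minimise shared locks.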

Understand Little's law in your context

With Little's law you can do calculations and reasoning
about programming models that fit your requirements
and available resources.
The traditional Servlet API thread-per-request model
could fit your requirements, and you can still make it
"fast" (low latency) in most cases.


Cons of the thread-per-request model in the light of Little's law and Amdahl's law

From Little's law: MeanNumberInSystem = MeanThroughput * MeanResponseTime
In the thread-per-request model, the upper bound for
MeanNumberInSystem is the maximum number of
request handling threads. This might limit the throughput of
the system, especially when the response time gets higher
or request handling threads get blocked and hang.
Shared locks and resources might set the upper bound to
a very low value. Such problems get worse under error
conditions.
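
The bound described above can be worked through with a few lines of Java. This is only a sketch of the arithmetic: the pool size of 200 threads and the response times are assumed numbers chosen for the example, and the class and method names are hypothetical.

```java
// Little's law applied to a thread-per-request server:
// MeanNumberInSystem is capped by the number of request handling threads.
final class LittlesLawSketch {

    /** MeanThroughput = MeanNumberInSystem / MeanResponseTime, with the pool size as the cap. */
    static double maxThroughputPerSecond(int requestThreads, double meanResponseTimeSeconds) {
        return requestThreads / meanResponseTimeSeconds;
    }

    /** MeanNumberInSystem = MeanThroughput * MeanResponseTime, i.e. threads busy on average. */
    static double busyThreads(double throughputPerSecond, double meanResponseTimeSeconds) {
        return throughputPerSecond * meanResponseTimeSeconds;
    }

    public static void main(String[] args) {
        int threads = 200; // assumed pool size, for illustration only
        System.out.println("50 ms responses: " + maxThroughputPerSecond(threads, 0.05) + " req/s at most");
        System.out.println("2 s responses:   " + maxThroughputPerSecond(threads, 2.0) + " req/s at most");
        System.out.println("1000 req/s at 50 ms keeps " + busyThreads(1000, 0.05) + " threads busy");
    }
}
```

Under these assumptions the same pool that can sustain about 4000 requests per second at 50 ms drops to about 100 requests per second when a slow dependency pushes the response time to 2 seconds.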

Advantages of thread-per-request model


We are used to debugging the thread-per-request model
- adding breakpoints, attaching the debugger and going
through the stack
The synchronous blocking procedural programming
model is something that programmers are used to doing.
There is friction in switching to different programming
models and paradigms.


KillerApp for non-blocking async model


Responsive streaming for a high number of clients on a
single box
continuously connected real-time apps where low latency and high availability are requirements
limited resources (must be efficient/optimal)


Profiling concepts and tools


JVM code profiler concepts


Sampling
o a statistical way to get information about the execution: the JVM profiling interfaces are polled at a given time interval, for example every 100 milliseconds, and values are calculated from the samples with statistical methods.
o Unreliable results, but certainly useful in some cases, since the overhead of sampling is minimal compared to instrumentation.
o Usually helps to get a better understanding of the problem if you learn to look past the numeric values returned from measurements.
Instrumentation
o exact measurements of method execution details


Load testing tools and services


Simple command line tools
wrk (https://github.com/wg/wrk)
modern HTTP benchmarking tool
o has Lua scripting support for doing things like
verifying the reply
Load testing toolkits and service providers
Support testing of full use cases and stateful flows
toolkits: JMeter (http://jmeter.apache.org/),
Gatling (http://gatling.io/)


Common pitfalls in profiling Grails


Measuring wall clock time
Measuring CPU time
Instrumentation usually provides false results because
of JIT compilation and other reasons like spin locks
lack of proper JVM warmup
Relying on gut feeling and being lazy
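
One reading of the first two pitfalls is that wall clock time and CPU time answer different questions. The Java sketch below, added for illustration, measures both around the same task using the standard ThreadMXBean API; the class and method names are hypothetical. A sleeping (or blocked) thread accumulates wall clock time but almost no CPU time.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

// Measures wall clock time and CPU time of a task on the current thread.
final class WallVsCpuTimeSketch {

    /** Returns {wallNanos, cpuNanos}; cpuNanos is -1 when the JVM cannot measure it. */
    static long[] measure(Runnable task) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        boolean cpuSupported = threads.isCurrentThreadCpuTimeSupported();
        long cpuStart = cpuSupported ? threads.getCurrentThreadCpuTime() : -1;
        long wallStart = System.nanoTime();
        task.run();
        long wall = System.nanoTime() - wallStart;
        long cpu = (cpuSupported && cpuStart >= 0) ? threads.getCurrentThreadCpuTime() - cpuStart : -1;
        return new long[] {wall, cpu};
    }

    /** Stands in for time spent blocked: waiting on a lock, a database or the network. */
    static void sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        long[] blocked = measure(() -> sleepQuietly(200));
        System.out.printf("blocked task: wall=%d ms, cpu=%d ms%n",
                blocked[0] / 1_000_000, blocked[1] / 1_000_000);
    }
}
```

A method that looks slow in wall clock terms but cheap in CPU terms is usually waiting on something, which points towards blocking or I/O rather than towards the code in the method itself.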


Ground your feet


Find a way to review production performance graphs regularly,
especially after making changes to the system
system utilisation over time (CPU load, IO load & wait, Memory
usage), system input workload (requests) over time, etc.
In the Cloud, use tools like New Relic to get a view into operations
CloudFoundry-based Pivotal Web Services and IBM Bluemix
have New Relic available
In the development environment, use a profiler and debugger to
gain understanding. You can use the grails-melody plugin to get
insight into the SQL that's executed.

Grails - The low hanging fruit

Improper JVM config


Slow SQL
Blocking caused by caching
Bad regexps
Unnecessary database transactions
Watch out for blocking in the Java API: Hashtable


Environment related problems


Improper JVM configuration for Grails apps
out-of-the-box Tomcat parameters
a single JVM running with a huge heap on a big box
o If you have a big powerful box, it's better to run
multiple small JVMs and put a load balancer in front
of them


Example of proper Tomcat config for *nix


Create a file setenv.sh in tomcat_home/bin directory:

export JAVA_HOME=/usr/lib/jvm/jdk1.7.0_60
export LC_ALL=en_US.UTF-8
export LANG=en_US.UTF-8
CATALINA_OPTS="$CATALINA_OPTS -server -noverify"
CATALINA_OPTS="$CATALINA_OPTS -XX:MaxPermSize=256M -Xms768M -Xmx768M" # tune heap size
CATALINA_OPTS="$CATALINA_OPTS -Djava.net.preferIPv4Stack=true" # disable IPv6 if not used
# set default file encoding and locale
CATALINA_OPTS="$CATALINA_OPTS -Dfile.encoding=UTF-8 -Duser.language=en -Duser.country=US"
CATALINA_OPTS="$CATALINA_OPTS -Duser.timezone=CST" # set default timezone
CATALINA_OPTS="$CATALINA_OPTS -Dgrails.env=production" # set grails environment
# set timeouts for JVM URL handler
CATALINA_OPTS="$CATALINA_OPTS -Dsun.net.client.defaultConnectTimeout=10000 -Dsun.net.client.defaultReadTimeout=10000"
CATALINA_OPTS="$CATALINA_OPTS -Duser.dir=$CATALINA_HOME" # set user.dir
export CATALINA_OPTS
export CATALINA_PID="$CATALINA_HOME/logs/tomcat.pid"


JVM heap size


Assumption: optimising throughput and latency at the cost of
memory consumption
set minimum and maximum heap size to the same value to
prevent compaction (that causes full GC)
look at the presentation recording of the "Tuning Large scale
Java platforms" by Emad Benjamin and Jamie O'Meara for more.
rule-of-thumb recommendation for heap size: survivor
space size x 3...4, and don't exceed the NUMA node's local
memory size for your server configuration (use "numactl --hardware" to find out the NUMA node size on Linux).

The most common problem: SQL


SQL and database related bottlenecks: learn how to profile
SQL queries and tune your database queries and your
database
grails-melody plugin can be used to spot costly SQL
queries in development and testing environments.
Nothing prevents its use in production; however, there is a
risk that running it in a production environment has
negative side effects.
New Relic in CloudFoundry (works for production
environments)

Use a non-blocking cache implementation


Guava LoadingCache is a good candidate: https://code.google.com/p/guava-libraries/wiki/CachesExplained
"While the new value is loading the previous value (if any)
will continue to be returned by get(key) unless it is evicted.
If the new value is loaded successfully it will replace the
previous value in the cache; if an exception is thrown while
refreshing the previous value will remain, and the exception
will be logged and swallowed." (http://docs.guava-libraries.googlecode.com/git-history/release/javadoc/com/google/common/cache/LoadingCache.html#refresh(K))
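
To make the quoted behaviour concrete, here is a small JDK-only sketch of the same idea: readers keep getting the previous value while a refresh runs in the background, and a failed refresh leaves the previous value in place. It is not Guava's implementation and has none of its eviction or expiry features; the class name and its methods are hypothetical and exist only for this illustration.

```java
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executor;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Serve-stale-while-refreshing cache sketch (illustration only, not Guava).
final class StaleWhileRefreshCache<K, V> {
    private final ConcurrentHashMap<K, V> values = new ConcurrentHashMap<>();
    private final Set<K> refreshing = ConcurrentHashMap.newKeySet();
    private final Function<K, V> loader;
    private final Executor executor;

    StaleWhileRefreshCache(Function<K, V> loader, Executor executor) {
        this.loader = loader;
        this.executor = executor;
    }

    /** Returns the cached value; only the very first load of a key waits for the loader. */
    V get(K key) {
        V cached = values.get(key);
        return cached != null ? cached : values.computeIfAbsent(key, loader);
    }

    /** Reloads the key in the background; readers see the old value until the new one is stored. */
    CompletableFuture<Void> refresh(K key) {
        if (!refreshing.add(key)) {
            return CompletableFuture.completedFuture(null); // a refresh is already in flight
        }
        return CompletableFuture.runAsync(() -> {
            try {
                values.put(key, loader.apply(key)); // loader runs outside any map lock
            } catch (RuntimeException e) {
                System.err.println("refresh failed, keeping previous value: " + e);
            } finally {
                refreshing.remove(key);
            }
        }, executor);
    }

    public static void main(String[] args) {
        AtomicInteger version = new AtomicInteger();
        StaleWhileRefreshCache<String, String> cache = new StaleWhileRefreshCache<>(
                key -> key + "-v" + version.incrementAndGet(), ForkJoinPool.commonPool());
        System.out.println(cache.get("config"));
        cache.refresh("config").join(); // wait here only so the demo output is stable
        System.out.println(cache.get("config"));
    }
}
```

In a real application the Guava cache recommended above is the better choice; the sketch only shows why request threads do not queue up behind a reload.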


Some regexps are CPU hogs

https://twitter.com/lhotari/status/474591343923449856


Verify regexps against catastrophic backtracking

Verify regexps that are used a lot
use the profiler's CPU time measurement to spot hot regexp code
search the code for candidate regexps
Use a regexp analyser to check regexps with different input sizes
(jRegExAnalyser/RegexBuddy).
Make sure valid input doesn't trigger "catastrophic backtracking".
Understand what it is.
http://www.regular-expressions.info/catastrophic.html
"The solution is simple. When nesting repetition operators, make
absolutely sure that there is only one way to match the same
match"
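
A small Java sketch of what nested repetition can do, added for illustration. The pattern `(x+x+)+y` is a textbook example: both patterns below accept the same strings, but the nested one gives the engine many ways to split a run of x characters, so a non-matching input can take a rapidly growing amount of time. The class name, the helper and the input sizes are hypothetical, and actual timings depend on the JDK version.

```java
import java.util.regex.Pattern;

// Compares a regexp with nested repetition against an equivalent one without it.
final class BacktrackingSketch {
    static final Pattern RISKY = Pattern.compile("(x+x+)+y"); // many ways to match the same run of x
    static final Pattern SAFE = Pattern.compile("xx+y");      // only one way to match it

    /** Builds a string of n 'x' characters with no trailing 'y', so neither pattern can match. */
    static String xs(int n) {
        StringBuilder sb = new StringBuilder(n);
        for (int i = 0; i < n; i++) {
            sb.append('x');
        }
        return sb.toString();
    }

    static long timeFindNanos(Pattern pattern, String input) {
        long start = System.nanoTime();
        pattern.matcher(input).find();
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        // Keep n small: each extra character can multiply the work done by the risky pattern.
        for (int n : new int[] {10, 14, 18, 22}) {
            String input = xs(n);
            System.out.printf("n=%d risky=%d us safe=%d us%n", n,
                    timeFindNanos(RISKY, input) / 1000, timeFindNanos(SAFE, input) / 1000);
        }
    }
}
```

Running this with increasing n is a cheap way to see the effect before reaching for a dedicated analyser.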


Eliminate unnecessary database transactions in Grails

Use "static transactional = false" in services that
don't need transactions
Don't call transactional services from GSP taglibs (or
GSP views); that might cause a large number of short
transactions during view rendering


JDK has a lot of unnecessary blocking


java.util.Hashtable/Properties is blocking
these block:
System.getProperty("some.config.value", "some.default"), Boolean.getBoolean("some.feature.flag")
Instantiation of PrintWriter, Locale, NumberFormats,
CurrencyFormats etc.: a lot of them have blocking
problems because of System.getProperty calls.
Consider monkey patching the JDK's Hashtable class:
https://github.com/stephenc/high-scale-lib
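
For your own code, a simpler alternative to patching the JDK is to read such properties once and keep the result in a constant, so the request path never touches the synchronized Hashtable. A minimal sketch, assuming the values do not need to change at runtime; the class name is hypothetical and the property names are the placeholders used above.

```java
// Reads system properties once at class initialisation instead of on every request.
final class CachedSystemProperties {

    // Evaluated a single time when the class is loaded; later reads are plain field reads.
    static final boolean SOME_FEATURE_ENABLED = Boolean.getBoolean("some.feature.flag");
    static final String SOME_CONFIG_VALUE = System.getProperty("some.config.value", "some.default");

    private CachedSystemProperties() {
    }

    public static void main(String[] args) {
        // Hot code paths refer to the constants, not to System.getProperty.
        System.out.println("feature enabled: " + SOME_FEATURE_ENABLED);
        System.out.println("config value: " + SOME_CONFIG_VALUE);
    }
}
```

The trade-off is that a property changed after startup is not picked up. This does not help with the lookups made inside JDK classes themselves, which is where the Hashtable replacement comes in.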

Misc Grails tips


Use singleton scope in controllers
grails.controllers.defaultScope = 'singleton'
default for new apps for a long time, might be a
problem for upgraded apps
when changing, make sure that you previously didn't
use controller fields for request state handling (that
was ok for prototype scope)
Use controller methods (replace closures with
methods in upgraded apps)


Tools for performance environments


Simple inspection in production environments


kill -3 <PID> or jstack <PID>
Makes a thread dump of all threads. kill -3 outputs it to the JVM's
System.out, which ends up in catalina.out in the default
Tomcat config; jstack prints it to its own standard output.
The java process keeps running and doesn't get
terminated
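
When shell access to the server is not available, a similar dump can be produced from inside the JVM with the standard ThreadMXBean API. The sketch below is an illustration only; the class name and output format are hypothetical, and how you expose it (an admin-only endpoint, a scheduled log entry) is up to you.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Builds a thread dump of the current JVM as text, similar in spirit to jstack output.
final class ThreadDumpSketch {

    static String dump() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        // Ask for lock details only when this JVM supports them.
        ThreadInfo[] infos = threads.dumpAllThreads(
                threads.isObjectMonitorUsageSupported(), threads.isSynchronizerUsageSupported());
        StringBuilder out = new StringBuilder();
        for (ThreadInfo info : infos) {
            out.append('"').append(info.getThreadName()).append("\" ").append(info.getThreadState());
            if (info.getLockName() != null) {
                out.append(" on ").append(info.getLockName()); // the monitor or lock being waited for
            }
            out.append('\n');
            for (StackTraceElement frame : info.getStackTrace()) {
                out.append("    at ").append(frame).append('\n');
            }
            out.append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(dump());
    }
}
```

Many threads in the BLOCKED state waiting on the same lock name is the pattern to look for when hunting thread blocking.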


Java Mission Control & Flight Recorder


Oracle JDK 7 and 8 include Java Mission Control since
1.7.0_40.
JAVA_HOME/bin/jmc is the executable for launching the client
UI for JMC
JMC includes Java Flight Recorder, which has been
designed to be used in production.
JFR can record data without the UI and store events in
a circular buffer for investigation of production
problems.

JFR isn't free


JFR is a commercial non-free feature, available only in
Oracle JVMs (originally from JRockit).
You must buy a license from Oracle for each JVM using
it.
"... require Oracle Java SE Advanced or Oracle Java
SE Suite licenses for the computer running the
observed JVM", http://www.oracle.com/technetwork/java/javase/documentation/java-se-product-editions-397069.pdf, page 5


Controlling JFR
Enabling JFR with the default continuous "black box"
recording:
export _JAVA_OPTIONS="-XX:+UnlockCommercialFeatures -XX:+FlightRecorder -XX:FlightRecorderOptions=defaultrecording=true"

Runtime control using jcmd commands

help for the commands:
jcmd <pid> help JFR.start
jcmd <pid> help JFR.stop
jcmd <pid> help JFR.dump
jcmd <pid> help JFR.check

Demo


wrk http load testing tool sample output


Running 10s test @ http://localhost:8080/empty-test-app/empty/index
  10 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     1.46ms    4.24ms  17.41ms   93.28%
    Req/Sec     2.93k     0.90k    5.11k    85.67%
  Latency Distribution
     50%  320.00us
     75%  352.00us
     90%  406.00us
     99%   17.34ms
  249573 requests in 10.00s, 41.22MB read
  Socket errors: connect 1, read 0, write 0, timeout 5
Requests/sec:  24949.26
Transfer/sec:      4.12MB

What to check: the latency, its max and its distribution; Requests/sec is the total throughput.
https://github.com/lhotari/grails-perf-testapps/empty-test-app


Questions?


Thanks!
Lari Hotari @lhotari
Pivotal Software, Inc.
