
Performance Tuning

Java Applications
Dave Scruggs
Systems Engineer
Outline

1. General overview
2. Suggested toolkit and procedures
„ Demo 1—Temporary Objects
3. Compilers and the JVM
„ Demo 2—Memory Leaks
4. Networking and I/O
„ Demo 3—non-blocking I/O
5. Threading
„ Demo 4—CPU Bottlenecks
Know your target
„ What kind of application are you writing?
ƒ Embedded, Workstation, Workgroup, Intranet, or Enterprise?
ƒ Personalized or Generalized?
„ What is your user audience?
ƒ Technically Sophisticated?
ƒ Impatient?
„ What is your data like?
ƒ Detailed, hard-to-generate reports or simple data?
ƒ Persistent or volatile?
ƒ Order-entry or decision support?
„ How are you interacting?
ƒ Synchronous or asynchronous?
Why doesn’t my application perform?
„ Political Constraints (boss or client requires it done “a certain
way”)
„ Infrastructure—both the infrastructure you have and what you
don’t.
ƒ Memory, CPU, OS, Disk Space
ƒ Directory Servers, Databases, Legacy Applications
ƒ Network
ƒ Application Server
„ Time—you have limited resources
„ Money
„ Third-party technologies
„ JVM and/or JDK
„ Compiler
How It Doesn’t Perform
„ Scalability: “It worked great for the first 50 users…”
ƒ CPU Utilization
ƒ Memory Utilization
ƒ I/O Utilization
„ Reliability: “Where did it go?”
„ Latency: “It works great, once it gets started…”
„ Throughput: “How long, on average, does it take to do ‘x’?”
ƒ Algorithm Design
ƒ Architecture
What to Do…
„ Political Constraints:
ƒ Work within constraints
ƒ Get a friend who knows how to work these situations
„ Infrastructure:
ƒ Test software on the systems you’ll be using
ƒ Identify weak portions of technology and avoid them
ƒ Tune your code
„ Time—get good tools, choose methodology well
„ Money—see “Political Constraints”
„ Third-party technologies
ƒ Tune your code
„ JVM, JDK, and Compiler—choose faster option, if supported and
available
A Tuner’s Toolkit
„ Packet Sniffer
„ Profiler (Code coverage, Memory profiler, Thread profiler)
„ Unit tests
„ Load tester
ƒ JMeter
ƒ java.awt.Robot
ƒ Expect (if you can use Tcl)
ƒ Mercury, Silk, other commercial packages
„ Network, Architecture, and Class Diagrams
„ Other environments (OS, Foreign languages, AppServer,
VMWare, Browsers…)
Performance Testing Workflow
[Flowchart: start → analysis → design → develop → unit test → O.K.? (yes: perf test; no: loop back) → perf o.k.? (yes: stop; no: profile and loop back)]
Temporary Objects
Profiling Demo
What Is a JVM?
„ Can be thought of as a “processor on top of a processor.”
ƒ Has a program counter, stacks, method memory space, and
heap space.
ƒ Direct support for the following types: int, long, float, double,
“references”, and return addresses (hidden, no pointers).
ƒ Partial support for boolean, byte, char, and short—partially
treated as integers.
ƒ Return values are kept in “frames”, which are allocated and
deallocated as methods run and complete.
„ Distinct from the class libraries supplied with the JDK or Java
runtime.
„ Native methods throw a “twist” into how the JVM “hardware”
values are interpreted: native method “stacks” may be handled
differently by different VMs (or not supported at all).
JVM Diagram
[Diagram: execution engine (“Clock”, ALU, ...) with pc and ir registers; heap; thread-local stack and native stack]
JVM Characteristics to Watch Out for
„ Many JVMs (including Sun’s) allow the heap to grow. Although
garbage collection frees space within the heap, the heap itself
doesn’t shrink. Consequently, creating a lot of objects can take
memory away from other processes, and that memory can’t be
reclaimed without killing the Java process.
„ There are two “sizes” of values that go on the stack: int width
and long width. Variables of long and double width take up two
entries on the stack, and take twice as long to hop on or off.
Short, boolean, byte, char, and reference variables each take up
an int entry; the rest of the space (if any) is wasted.
„ Frames hold local variables, incoming operands, and return
values. Again, doubles and longs take up twice as much room.
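The heap-growth point above can be watched from inside a program. What follows is a minimal sketch using java.lang.Runtime. The class name HeapWatch and the 16 MB of throwaway arrays are illustrative assumptions, and the numbers printed depend entirely on the JVM and its settings, so no output is shown.

```java
class HeapWatch {
    /** Bytes currently occupied inside the heap the JVM has claimed so far. */
    static long usedBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    /** Bytes the JVM has claimed for the heap at this moment. */
    static long claimedBytes() {
        return Runtime.getRuntime().totalMemory();
    }

    public static void main(String[] args) {
        long claimedBefore = claimedBytes();

        // Create roughly 16 MB of short-lived objects (hypothetical workload).
        byte[][] junk = new byte[16][];
        for (int i = 0; i < junk.length; i++) {
            junk[i] = new byte[1024 * 1024];
        }
        System.out.println("used while junk is live: " + usedBytes());

        junk = null;
        System.gc(); // only a hint; the JVM may ignore it
        System.out.println("used after gc hint:      " + usedBytes());
        System.out.println("claimed before / after:  "
                + claimedBefore + " / " + claimedBytes());
    }
}
```

Comparing the "used" and "claimed" figures shows whether space freed by the collector is kept by the process on a given JVM.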
A Very Simple Class
package hello;

/**
 * This is a <B>very</B> simple class, just to show what it takes
 * to run a simple method.
 */
public class SimpleHello {
    // private attributes
    private String message;
    private String username;

    // Constructors
    public SimpleHello() {
        message = "Hello";
        username = "Dude";
    }

    public void setMessage(String message) {
        this.message = message;
    }

    public String getMessage() {
        return message;
    }

    public void setUsername(String username) {
        this.username = username;
    }

    public String getUsername() {
        return username;
    }

    public void sayHello() {
        System.out.println(message + " " + username + "!");
    }

    public long addTwoLong(long a, long b) {
        return a + b;
    }

    public int addTwoInt(int a, int b) {
        return a + b;
    }

    public static void main(String[] args) {
        SimpleHello simpleHello1 = new SimpleHello();
        simpleHello1.sayHello();
    }
}
How “SimpleHello” Appears to the VM
SimpleHello Constructor (source)
public SimpleHello() {
    message = "Hello";
    username = "Dude";
}
SimpleHello Constructor (bytecode)
0 aload_0
1 invokespecial #1 <Method java.lang.Object()>
4 aload_0
5 ldc #2 <String "Hello">
7 putfield #3 <Field java.lang.String message>
10 aload_0
11 ldc #4 <String "Dude">
13 putfield #5 <Field java.lang.String username>
16 return

addTwoLong (source)
public long addTwoLong(long a, long b) {
    return a + b;
}
addTwoLong (bytecode)
0 lload_1
1 lload_3
2 ladd
3 lreturn
Local variables for method long addTwoLong(long, long)
long b pc=0, length=4, slot=3
long a pc=0, length=4, slot=1
hello.SimpleHello this pc=0, length=4, slot=0
How to “Say Hello”
sayHello (source)
public void sayHello() {
    System.out.println(message + " " + username + "!");
}
sayHello (bytecode)
0 getstatic #6 <Field java.io.PrintStream out>
3 new #7 <Class java.lang.StringBuffer>
6 dup
7 aload_0
8 getfield #3 <Field java.lang.String message>
11 invokestatic #8 <Method java.lang.String valueOf(java.lang.Object)>
14 invokestatic #8 <Method java.lang.String valueOf(java.lang.Object)>
17 invokespecial #9 <Method java.lang.StringBuffer(java.lang.String)>
20 ldc #10 <String " ">
22 invokevirtual #11 <Method java.lang.StringBuffer append(java.lang.String)>
25 aload_0
26 getfield #5 <Field java.lang.String username>
29 invokevirtual #11 <Method java.lang.StringBuffer append(java.lang.String)>
32 ldc #12 <String "!">
34 invokevirtual #11 <Method java.lang.StringBuffer append(java.lang.String)>
37 invokestatic #8 <Method java.lang.String valueOf(java.lang.Object)>
40 invokestatic #8 <Method java.lang.String valueOf(java.lang.Object)>
43 invokevirtual #13 <Method void println(java.lang.String)>
46 return
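The sayHello bytecode allocates a StringBuffer for a single concatenation expression. Inside a loop, a compiler that behaves this way creates one temporary buffer (and one new String) per pass. The sketch below contrasts that pattern with a single reused buffer. ConcatSketch and its method names are hypothetical, and StringBuilder assumes Java 5 or later (StringBuffer would play the same role on older JDKs).

```java
class ConcatSketch {
    /**
     * With a compiler like the one shown above, every += allocates a
     * temporary buffer and a new String on each iteration.
     */
    static String joinWithPlus(String[] words) {
        String result = "";
        for (String w : words) {
            result += w + " ";
        }
        return result;
    }

    /** One buffer is reused; only the final toString() creates a String. */
    static String joinWithBuilder(String[] words) {
        StringBuilder sb = new StringBuilder();
        for (String w : words) {
            sb.append(w).append(' ');
        }
        return sb.toString();
    }
}
```

Both methods return the same text, so the second can replace the first wherever a profiler shows the temporaries to be a real cost.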
Compilers
„ javac
„ BCJ: JBuilder’s compiler, with optimizations including obfuscation,
targeting specific releases of the JVM, and controlling the
amount of debug information provided within the classes.
„ Jikes: Provides some optimization, and a very useful “lint”
feature
„ Indiana University’s “JAVAR”: finds parallelism in code
Memory Leak Profiling
Demo
Networking and I/O
„ JDK 1.4 has improved I/O classes (nonblocking I/O, channels)
„ Nonblocking I/O is available for earlier JDKs at
http://www.cs.berkeley.edu/~mdw/proj/java-nbio/
„ Watch FileInputStream, FileOutputStream, and
RandomAccessFile: they are unbuffered, so each small read or
write goes to the operating system. (FileReader and FileWriter
use the streams.)
„ Be careful passing collections of objects through RMI/IIOP or
CORBA: these objects are passed as “any” values, which are heavy.
„ Minimize objects transferred: transfer only what you have to.
„ Speed generally follows this order:
1. Register Access and Cache (local primitive variables)
2. RAM (heap variables, arrays, and all classes)
3. Disk (files)
4. Network (remote calls)
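As a small illustration of the non-blocking channels mentioned above, the following sketch puts a java.nio ServerSocketChannel into non-blocking mode and polls accept() once. The class name, the loopback address, and the ephemeral port (0) are assumptions for the example; a real server would register the channel with a Selector instead of polling.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

class NonBlockingAcceptSketch {
    /**
     * Opens a server channel on an ephemeral loopback port, polls accept()
     * once, and reports whether a client connection was pending.
     */
    static boolean clientWaiting() {
        try (ServerSocketChannel server = ServerSocketChannel.open()) {
            server.configureBlocking(false);
            server.socket().bind(new InetSocketAddress("127.0.0.1", 0));

            // In non-blocking mode accept() returns immediately;
            // the result is null when no connection is pending.
            SocketChannel client = server.accept();
            if (client != null) {
                client.close();
                return true;
            }
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The calling thread is never parked inside accept(), so it stays free to service other channels.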
How Do I Cut Down Network Traffic?
„ Keep a local (cached) copy. Value Objects work well here.
„ Choose your technology wisely (CORBA and RMI beat SOAP).
„ Message Queues can help when the data transfer doesn’t require
synchronous connections—they won’t block.
„ Identify tight communication “communities”, and colocate that
code.
„ Use value objects for getters and setters, and reduce the
network traffic from a number of small calls to one big one.
„ Validate input data as close to the client as possible.
„ Use constants where possible (for instance, “1” is cheaper to send
than “Could not get a database connection, stack trace…”).
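The value-object advice above can be sketched as follows. FakeRemoteUser is a hypothetical stand-in for a remote object: it only counts method calls, on the assumption that each call would be one network round trip in a real RMI or CORBA deployment. UserValue and its fields are likewise invented for illustration.

```java
import java.io.Serializable;

class ValueObjectSketch {
    /** Immutable snapshot that crosses the wire in one piece. */
    static final class UserValue implements Serializable {
        private static final long serialVersionUID = 1L;
        final String username;
        final String message;
        final int visits;

        UserValue(String username, String message, int visits) {
            this.username = username;
            this.message = message;
            this.visits = visits;
        }
    }

    /** Stand-in for a remote object; every call counts as one round trip. */
    static final class FakeRemoteUser {
        int roundTrips = 0;

        // Fine-grained getters: one round trip per attribute.
        String getUsername() { roundTrips++; return "Dude"; }
        String getMessage()  { roundTrips++; return "Hello"; }
        int getVisits()      { roundTrips++; return 3; }

        // Coarse-grained getter: one round trip for all attributes.
        UserValue getUserValue() {
            roundTrips++;
            return new UserValue("Dude", "Hello", 3);
        }
    }
}
```

Reading three attributes costs three calls through the fine-grained getters and one call through getUserValue(), at the price of a slightly larger single payload.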
What About Databases?
„ Use prepared statements, when possible.
„ Don’t use “SELECT *”.
„ Use the lightest transaction isolation level possible (java.sql.Connection constants, lightest first):
1. TRANSACTION_NONE
2. TRANSACTION_READ_UNCOMMITTED
3. TRANSACTION_READ_COMMITTED
4. TRANSACTION_REPEATABLE_READ
5. TRANSACTION_SERIALIZABLE
„ ODBMS systems work better for pure OO data.
„ Use Connection Pools.
„ Batch queries, if possible.
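Two of the points above, prepared statements and batching, combine naturally in JDBC. The sketch below assumes a hypothetical table app_user with a single username column; the Connection would normally come from a connection pool, which is outside the scope of the example.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

class BatchInsertSketch {
    /**
     * Inserts all usernames with one prepared statement and one batch,
     * and returns the number of statements the driver reports for the batch.
     */
    static int insertUsernames(Connection conn, List<String> usernames) {
        // Hypothetical table and column names.
        String sql = "INSERT INTO app_user (username) VALUES (?)";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            for (String name : usernames) {
                ps.setString(1, name); // bind the parameter; the SQL is parsed once
                ps.addBatch();         // queue the row instead of sending it now
            }
            return ps.executeBatch().length; // one trip for the whole batch
        } catch (SQLException e) {
            throw new IllegalStateException("batch insert failed", e);
        }
    }
}
```

Whether the driver really sends the batch in a single network exchange depends on the driver and database, so it is worth confirming with a packet sniffer or profiler.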
Non-blocking I/O Demo
Threads
„ Even though two threads can theoretically synchronize at the
same time, the mutex locking process itself is single-threaded.
„ Prefer HashMaps to Hashtables, and ArrayLists to Vectors, since the
former allow a choice of synchronization.
„ Parallelizing work between multiple threads can help
ƒ If you can, keep as much code as you can within thread-local
storage, and “synchronize at the end”
ƒ Thread pools can save work for heavily loaded applications.
„ Produce working, thread-safe code first—then optimize.
„ Synchronization on its own is not enough to provide proper
code—threads have to act in the proper order.
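The “thread-local work, synchronize at the end” idea, combined with a thread pool, might look like the sketch below. It uses java.util.concurrent, which assumes Java 5 or later; ParallelSumSketch and the way the array is chunked are illustrative choices, not a prescribed design.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ParallelSumSketch {
    /** Sums data using the given number of pooled worker threads (at least 1). */
    static long sum(final int[] data, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            int chunk = (data.length + workers - 1) / workers;
            List<Future<Long>> parts = new ArrayList<>();
            for (int w = 0; w < workers; w++) {
                final int from = Math.min(data.length, w * chunk);
                final int to = Math.min(data.length, from + chunk);
                Callable<Long> task = () -> {
                    long local = 0; // lives on this worker's stack, no locking
                    for (int i = from; i < to; i++) {
                        local += data[i];
                    }
                    return local;
                };
                parts.add(pool.submit(task));
            }
            long total = 0;
            for (Future<Long> part : parts) {
                total += part.get(); // the only point where threads meet
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

Each worker touches shared state only when its result is collected, so no worker blocks another while summing.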
Collection Synchronization
Vector and Hashtable each synchronize the entire object on these methods:

Vector                                  Hashtable
add()                                   clear()
addAll()                                clone()
clone()                                 contains(Object obj)
containsAll()                           containsKey(Object obj)
elements().nextElement()                elements()
equals()                                equals(Object obj)
get()                                   get(Object obj)
hashCode()                              hashCode()
remove() (calls "removeElement")        keys()
removeAll()                             put(Object obj, Object obj1)
removeAllElements()                     putAll(Map map)
set()                                   readObject(ObjectInputStream objectinputstream)
toArray()                               remove() [synchronized(Hashtable.this)]
toString()                              remove(Object obj)
                                        toString()
                                        writeObject(ObjectOutputStream objectoutputstream)

If you use a Vector or Hashtable for configuration data, you’re blocking other
parts of your application.
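For configuration data that is loaded once and then only read, one option is an unsynchronized HashMap published through a read-only view, with locking added only where writers exist. The sketch below assumes hypothetical keys and values, and it assumes the read-only map is safely published (for example through a static final field) before other threads read it.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

class ConfigSketch {
    /**
     * Builds the map once and returns a read-only view.
     * Lookups on the result take no lock.
     */
    static Map<String, String> readOnlyConfig() {
        Map<String, String> m = new HashMap<>();
        m.put("greeting", "Hello"); // hypothetical settings
        m.put("username", "Dude");
        return Collections.unmodifiableMap(m);
    }

    /** When writers do exist, opt in to locking explicitly. */
    static Map<String, String> sharedMutableMap() {
        return Collections.synchronizedMap(new HashMap<String, String>());
    }
}
```

This is the “choice of synchronization” that Hashtable and Vector do not offer: the caller decides per use whether to pay for a lock.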
Execution Order
„ Actions performed by one thread are ordered.
„ For any single variable on the heap, access is implicitly
synchronized.
ƒ If two threads write to the same variable, one always writes
before the other.
ƒ Which thread accesses a heap variable first is undetermined
ƒ WAW, WAR, and RAW hazards are still possible.
„ Within a monitored (locked) section, access is synchronized.
„ Actions only happen once.
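The hazard and monitor points above can be made concrete with a counter. An increment is a read followed by a write, so two threads can interleave and lose updates unless the whole step runs inside a monitor. The sketch below shows the monitored version; CounterSketch and runTwoThreads are hypothetical names.

```java
class CounterSketch {
    private int count = 0;

    /** Read-modify-write performed while holding this object's monitor. */
    synchronized void increment() {
        count++;
    }

    synchronized int get() {
        return count;
    }

    /** Runs two threads that each increment a shared counter perThread times. */
    static int runTwoThreads(final int perThread) {
        final CounterSketch counter = new CounterSketch();
        Runnable work = () -> {
            for (int i = 0; i < perThread; i++) {
                counter.increment();
            }
        };
        Thread a = new Thread(work);
        Thread b = new Thread(work);
        a.start();
        b.start();
        try {
            a.join(); // wait for both threads before reading the result
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return counter.get();
    }
}
```

Dropping the synchronized keyword from increment() would leave each individual read and write intact but allow updates to be lost between them, which is the read-after-write and write-after-write hazard the slide refers to.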
CPU Profiling Demo
