Garbage Collection Tuning in the Java HotSpot™ Virtual Machine

Tony Printezis, Charlie Hunt
Sun Microsystems

Trademarks and Abbreviations
> Java™ Virtual Machine (JVM)
> Java HotSpot™ Virtual Machine (HotSpot JVM)

Who We Are
> Tony Printezis
  ● GC Group / HotSpot JVM development team
  ● Been working on the HotSpot JVM since 2006
  ● 10+ years of GC experience
> Charlie Hunt
  ● Java Platform Performance Engineering Group
  ● Works with many Sun product teams and customers
  ● 10+ years of Java technology performance work

And if you remember only one thing...
GC Tuning is an Art!

GC Tuning is an Art
> Unfortunately, we can't give you a flawless recipe or a flowchart that will apply to all your GC tuning scenarios
> GC tuning involves a lot of common pattern recognition
> This pattern recognition requires experience
  ● We have a lot of it. :-)

Agenda
> Introductions
> Brief GC Overview
> GC Tuning
  ● Tuning the young generation
  ● Tuning Parallel GC
  ● Tuning CMS
> Monitoring the GC
> Conclusions

GCs in the HotSpot JVM
> Three available GCs:
  ● Serial GC
  ● Parallel GC / Parallel Old GC
  ● Concurrent Mark-Sweep GC (CMS)

Heap Layout (same for all GCs)
> Young Generation
> Old Generation
> Permanent Generation

Young Generation
> Eden
  ● Allocation (new Object())
> Survivor Spaces

Old Generation
> Promotion (survivors from the Young Generation)

Permanent Generation
> Allocation (only directly from the JVM)


Your Dream GC
> You would really like a GC that has
  ● Low GC overhead,
  ● Low GC pause times, and
  ● Good space efficiency
> Unfortunately, you'll have to pick two (any two!)

Heap Sizing Tuning Advice
Supersize it!

Heap Sizing Trade-Offs
> Generally, the larger the heap space, the better
  ● For both young and old generation
  ● Larger space: less frequent GCs, lower GC overhead, objects more likely to become garbage
  ● Smaller space: faster GCs (not always! see later)
> Sometimes max heap size is dictated by available memory and/or max space the JVM can address
  ● You have to find a good balance between young and old generation size

Generation Size Roles
> Young Generation Size
  ● Dictates frequency of minor GCs
  ● Dictates how many objects will be reclaimed in the young generation
    ● Along with tenuring threshold + survivor space size tuning
> Old Generation
  ● Should comfortably hold the application's steady-state live size
  ● Decrease the major GC frequency as much as possible

Two Very Important Points
> You should try to maximize the number of objects reclaimed in the young generation
  ● This is probably the most important piece of advice when sizing a heap and/or tuning the young generation
> Your application's memory footprint should not exceed the available physical memory
  ● This is probably the second most important piece of advice when sizing a heap
> The above apply to all our GCs

Sizing Heap Spaces
> -Xmx<size> : max heap size
  ● young generation + old generation
> -Xms<size> : initial heap size
  ● young generation + old generation
> -Xmn<size> : young generation size
> Applications with emphasis on performance tend to set -Xms and -Xmx to the same value
> When -Xms != -Xmx, heap growth or shrinking requires a Full GC
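
Whether a JVM is running with a fixed-size heap can be checked from inside the process. The following is a minimal sketch and not part of the original talk: the class HeapSizingCheck and its helper are hypothetical names, and the values reported by the standard java.lang.management API can differ slightly from the command-line flags on some collectors, so the result is a hint rather than a guarantee.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Hypothetical helper, for illustration only.
class HeapSizingCheck {
    // True when the initial and maximum heap sizes match, which suggests
    // the JVM was started with -Xms equal to -Xmx.
    static boolean isFixedSize(long initBytes, long maxBytes) {
        // MemoryUsage reports -1 when a value is undefined.
        return initBytes > 0 && initBytes == maxBytes;
    }

    public static void main(String[] args) {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        System.out.println("initial heap (bytes): " + heap.getInit());
        System.out.println("max heap (bytes):     " + heap.getMax());
        System.out.println("fixed-size heap:      "
                + isFixedSize(heap.getInit(), heap.getMax()));
    }
}
```

Running it once with equal and once with different -Xms and -Xmx values shows how the two settings surface at runtime.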

Should -Xms == -Xmx?
> Set -Xms to what you think would be your desired heap size
  ● It's expensive to grow the heap
> If memory allows, set -Xmx to something larger than -Xms “just in case”
  ● Maybe the application is hit with more load
  ● Maybe the DB gets larger over time
> In most occasions, it's better to do a Full GC and grow the heap than to get an OOM and crash

Sizing Heap Spaces (ii)
> -XX:PermSize=<size> : permanent generation initial size
> -XX:MaxPermSize=<size> : permanent generation max size
> Applications with emphasis on performance almost always set -XX:PermSize and -XX:MaxPermSize to the same value
  ● Growing or shrinking the permanent generation requires a Full GC too
> Unfortunately, the permanent generation occupancy is hard to predict


Young Generation Sizing
> Eden size determines
  ● The frequency of minor GCs
  ● Which objects will be reclaimed at age 0
    ● Newly-allocated objects in Eden start from age 0
    ● Their age is incremented at every minor GC
> Increasing the size of the Eden will not always affect minor GC times
  ● Remember: minor GC times are proportional to the amount of objects they copy (i.e., the live objects), not the young generation size

Young Object Survivor Ratio
[Figure: survivor ratio plotted against object age, from newly allocated (age 0, youngest) to oldest]

Young Object Survivor Ratio (ii)
[Figure: same axes as the previous slide]

Young Object Survivor Ratio (iii)
[Figure: same axes as the previous slide]

Sizing Heap Spaces (iii)
> -XX:NewSize=<size> : initial young generation size
> -XX:MaxNewSize=<size> : max young generation size
> -XX:NewRatio=<ratio> : young generation to old generation ratio, expressed as 1:<ratio>
> Applications with emphasis on performance tend to use -Xmn to size the young generation since it combines the use of -XX:NewSize and -XX:MaxNewSize
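
As a worked example of what -XX:NewRatio implies, the sketch below splits a heap between the two generations. It is an illustration rather than anything from the talk: the class and method names are hypothetical, the 768 MB heap and the ratio of 2 are assumed example values, and the arithmetic ignores the alignment and rounding a real JVM applies.

```java
// Hypothetical helper, for illustration only.
class NewRatioSizing {
    // With -XX:NewRatio=<ratio> the old generation is <ratio> times the
    // size of the young generation, so young = heap / (ratio + 1).
    static long youngGenBytes(long heapBytes, int newRatio) {
        if (heapBytes < 0 || newRatio < 1) {
            throw new IllegalArgumentException("heapBytes must be >= 0 and newRatio >= 1");
        }
        return heapBytes / (newRatio + 1);
    }

    // Whatever is not young generation is old generation.
    static long oldGenBytes(long heapBytes, int newRatio) {
        return heapBytes - youngGenBytes(heapBytes, newRatio);
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long heap = 768L * mb; // assumed heap size, example value only
        int ratio = 2;         // assumed -XX:NewRatio=2
        System.out.println("young: " + youngGenBytes(heap, ratio) / mb + " MB");
        System.out.println("old:   " + oldGenBytes(heap, ratio) / mb + " MB");
    }
}
```

With these assumed values the young generation comes out at a third of the heap, which is why a fixed -Xmn is often easier to reason about than a ratio.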

Tenuring
> -XX:TargetSurvivorRatio=<percent>, e.g., 50
  ● How much of the survivor space should be filled
    ● Typically leave extra space to deal with “spikes”
> -XX:InitialTenuringThreshold=<threshold> (PGC only)
> -XX:MaxTenuringThreshold=<threshold>
> -XX:+AlwaysTenure
  ● Never keep any objects in the survivor spaces
> -XX:+NeverTenure
  ● Very bad idea!

Tenuring Threshold Trade-Offs
> Try to retain as many objects as possible in the survivor spaces so that they can be reclaimed in the young generation
  ● Less promotion into the old generation
  ● Less frequent old GCs
> But also, try not to unnecessarily copy very long-lived objects between the survivors
  ● Unnecessary overhead on minor GCs
> Not always easy to find the perfect balance
  ● Generally: better copy more, than promote more

Tenuring Distribution
> Monitor tenuring distribution with -XX:+PrintTenuringDistribution
Desired survivor size 6684672 bytes, new threshold 8 (max 8)
- age 1: 2315488 bytes, 2315488 total
- age 2: 19528 bytes, 2335016 total
- age 3: 96 bytes, 2335112 total
- age 4: 32 bytes, 2335144 total
> Young generation seems well tuned here
  ● We can even decrease the survivor space size

Tenuring Distribution (ii)
Desired survivor size 3342336 bytes, new threshold 1 (max 6)
- age 1: 3956928 bytes, 3956928 total
> Survivor space too small!
  ● Increase survivor space and/or eden size

Tenuring Distribution (iii)
Desired survivor size 3342336 bytes, new threshold 6 (max 6)
- age 1: 2483440 bytes, 2483440 total
- age 2: 501240 bytes, 2984680 total
- age 3: 50016 bytes, 3034696 total
- age 4: 49088 bytes, 3083784 total
- age 5: 48616 bytes, 3132400 total
- age 6: 50128 bytes, 3182528 total
> Might be able to do better
  ● Either increase max tenuring threshold
  ● Or even set max tenuring threshold to 2
    ● If ages > 6 still have around 50K of surviving bytes
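
The "new threshold" values in the three tenuring distributions can be reproduced with a simple rule: walk the ages, accumulate the surviving bytes, and stop at the first age where the running total exceeds the desired survivor size. The sketch below is a simplified model written for this purpose, not the HotSpot source; the class and method names are hypothetical.

```java
// Simplified model of the threshold choice, for illustration only.
class TenuringThresholdSketch {
    // bytesPerAge[0] holds the surviving bytes at age 1, bytesPerAge[1] at age 2, ...
    static int newThreshold(long[] bytesPerAge, long desiredSurvivorBytes, int maxThreshold) {
        long total = 0;
        for (int i = 0; i < bytesPerAge.length; i++) {
            total += bytesPerAge[i];
            if (total > desiredSurvivorBytes) {
                // Survivors up to this age no longer fit the target size.
                return Math.min(i + 1, maxThreshold);
            }
        }
        // Everything fits: objects may age up to the configured maximum.
        return maxThreshold;
    }

    public static void main(String[] args) {
        long[] wellTuned = {2315488L, 19528L, 96L, 32L};
        long[] tooSmall = {3956928L};
        long[] couldBeBetter = {2483440L, 501240L, 50016L, 49088L, 48616L, 50128L};
        System.out.println(newThreshold(wellTuned, 6684672L, 8));     // prints 8
        System.out.println(newThreshold(tooSmall, 3342336L, 6));      // prints 1
        System.out.println(newThreshold(couldBeBetter, 3342336L, 6)); // prints 6
    }
}
```

The second case shows why an undersized survivor space is costly: the threshold collapses to 1, so objects are promoted after a single minor GC.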

Stop-The-World Parallel GC Threads
> The number of parallel GC threads is controlled by -XX:ParallelGCThreads=<num>
> Default value assumes only one JVM per system
> Set the parallel GC thread number according to:
  ● Number of JVMs deployed on the system / processor set / zone
  ● CPU chip architecture
    ● Multiple hardware threads per chip core, i.e., UltraSPARC T1 / T2
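
One way to pick a starting value when several JVMs share a machine is to divide the hardware threads among them. The talk gives no formula, so the even split below is purely an assumption for illustration, and the class GcThreadBudget is a hypothetical name; chip architecture may justify a different number.

```java
// Hypothetical rule of thumb, for illustration only.
class GcThreadBudget {
    // Splits the available hardware threads evenly across the JVMs that
    // share the machine, never going below one GC thread per JVM.
    static int threadsPerJvm(int hardwareThreads, int jvmCount) {
        if (hardwareThreads < 1 || jvmCount < 1) {
            throw new IllegalArgumentException("both arguments must be >= 1");
        }
        return Math.max(1, hardwareThreads / jvmCount);
    }

    public static void main(String[] args) {
        int cpus = Runtime.getRuntime().availableProcessors();
        // Number of JVMs on this machine, passed as the first argument (default 1).
        int jvms = args.length > 0 ? Integer.parseInt(args[0]) : 1;
        System.out.println("-XX:ParallelGCThreads=" + threadsPerJvm(cpus, jvms));
    }
}
```

The printed flag is only a first guess to be validated against GC logs.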


Parallel GC Ergonomics
> The Parallel GC has ergonomics
  ● i.e., auto-tuning
> Ergonomics help in improving out-of-the-box GC performance
> To get maximum performance, most customers we know do manual tuning

Parallel GC Tuning Advice
> Tune the young generation as described so far
> Try to avoid / decrease the frequency of major GCs
> We know of customers who use the Parallel GC in low-pause environments
  ● Avoid Full GCs by avoiding / minimizing promotion
  ● Maximize heap size

NUMA
> Non-Uniform Memory Access
  ● Applicable to most SPARC, Opteron, more recently Intel platforms
> -XX:+UseNUMA
> Splits the young generation into partitions
  ● Each partition “belongs” to a CPU
> Allocates new objects into the partition that belongs to the allocating CPU
> Big win for some applications


CMS Tuning Advice
> Tune the young generation as described so far
> Need to be even more careful about avoiding premature promotion
  ● Originally we were using an +AlwaysTenure policy
  ● We have since changed our mind :-)
> Promotion in CMS is expensive (free lists)
> The more often promotion / reclamation happens, the more likely fragmentation will settle in

CMS Tuning Advice (ii)
> We know customers who tune their applications to do mostly minor GCs, even with CMS
  ● CMS is used as a “safety net”, when application load exceeds what they have provisioned for
  ● Schedule Full GCs at non-critical times (say, late at night) to “tidy up” the heap and minimize fragmentation

Fragmentation
> Two types
  ● External fragmentation
    ● No free chunk is large enough to satisfy an allocation
  ● Internal fragmentation
    ● Allocator rounds up allocation requests
    ● Free space wasted due to this rounding up
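
Internal fragmentation can be made concrete with a little arithmetic. The sketch below rounds a request up to an allocation granularity and reports the bytes lost; it is only an illustration, the class name is hypothetical, and the 8-byte granularity is an assumed example value rather than a CMS constant.

```java
// Illustration only; the granularity used in main is an assumption.
class InternalFragmentationSketch {
    // Rounds a request up to the next multiple of the allocation granularity.
    static long roundUp(long requestBytes, long granularity) {
        return ((requestBytes + granularity - 1) / granularity) * granularity;
    }

    // Bytes handed out beyond what was asked for, i.e. internal fragmentation.
    static long wastedBytes(long requestBytes, long granularity) {
        return roundUp(requestBytes, granularity) - requestBytes;
    }

    public static void main(String[] args) {
        long request = 20L;    // bytes requested, example value
        long granularity = 8L; // assumed allocation granularity
        System.out.println("allocated: " + roundUp(request, granularity));     // prints allocated: 24
        System.out.println("wasted:    " + wastedBytes(request, granularity)); // prints wasted:    4
    }
}
```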

Fragmentation (ii)
> The bad news: you can never eliminate it!
  ● It has been proven
> The good news: you can decrease its likelihood
  ● Decrease promotion into the CMS old generation
  ● Be careful when coding
    ● Large objects of various sizes are the main cause
> But, when is the heap fragmented anyway?

Concurrent CMS GC Threads
> Number of parallel CMS threads is controlled by -XX:ParallelCMSThreads=<num>
  ● Available in post 6 JVMs
> Trade-Off
  ● CMS cycle duration vs.
  ● Concurrent overhead during a CMS cycle

Permanent Generation and CMS
> To date, classes will not be unloaded by default from the permanent generation when using CMS
  ● Both -XX:+CMSClassUnloadingEnabled and -XX:+PermGenSweepingEnabled need to be set to enable class unloading in CMS
  ● The 2nd switch is not needed in post 6u4 JVMs

Setting CMS Initiating Threshold
> Again, a tricky trade-off!
> Starting a CMS cycle too early
  ● Frequent CMS cycles
  ● High concurrent overhead
> Starting a CMS cycle too late
  ● Chance of an evacuation failure / Full GC
> Initiating heap occupancy should be (much) higher than the application steady-state live size
> Otherwise, CMS will constantly do CMS cycles

Common CMS Scenarios
> Applications that promote non-trivial amounts of objects to the old generation
  ● Old generation grows at a non-trivial rate
  ● Very frequent CMS cycles
  ● CMS cycles need to start relatively early
> Applications that promote very few or even no objects to the old generation
  ● Old generation grows very slowly, if at all
  ● Very infrequent CMS cycles
  ● CMS cycles can start quite late

Initiating CMS Cycles
> CMS will try to automatically find the best initiating occupancy
  ● It first does a CMS cycle early to collect stats
  ● Then, it tries to start cycles as late as possible, but early enough not to run out of heap before the cycle completes
  ● It keeps collecting stats and adjusting when to start cycles
  ● Sometimes, the second cycle starts too late

Initiating CMS Cycles (ii)
> -XX:CMSInitiatingOccupancyFraction=<percent>
  ● Occupancy percentage of CMS old generation that triggers a CMS cycle
> -XX:+UseCMSInitiatingOccupancyOnly
  ● Don't use the ergonomic initiating occupancy

Initiating CMS Cycles (iii)
> -XX:CMSInitiatingPermOccupancyFraction=<percent>
  ● Occupancy percentage of permanent generation that triggers a CMS cycle
  ● Class unloading must be enabled

CMS Cycle Initiation Example
> Cycle started too early:
[GC log excerpt; the order of the entries and their timings were scrambled in extraction, so entries are grouped by type and shown without timings]
[CMS-initial-mark 295026K(773376K)]
[CMS-initial-mark 298458K(773376K)]
[CMS-concurrent-mark]
[CMS-concurrent-preclean]
[CMS-concurrent-abortable-preclean]
[CMS-remark 374049K(773376K)]
[CMS-concurrent-sweep]
[CMS-concurrent-reset]
[ParNew 390868K->296358K(773376K)]
[ParNew 407285K->312829K(773376K)]
[ParNew 404913K->310361K(773376K)]
[ParNew 406005K->311878K(773376K)]
[ParNew 401318K->306863K(773376K)]
[ParNew 397885K->303822K(773376K)]
[ParNew 387767K->292925K(773376K)]
[ParNew 405554K->311100K(773376K)]

CMS Cycle Initiation Example (ii)
> Cycle started too late:
[GC log excerpt; the order of the entries and their timings were scrambled in extraction, so entries are grouped by type and shown without timings; one pause in this excerpt, presumably the Full GC, lasted more than 8 seconds]
[CMS-initial-mark 661142K(773376K)]
[Full GC 645986K->234335K(655360K)]
[ParNew 742993K->648506K(773376K)]
[ParNew 753466K->659042K(773376K)]
[ParNew 339295K->247490K(773376K)]
[ParNew 352450K->259959K(773376K)]

CMS Cycle Initiation Example (iii)
> This is better:
[GC log excerpt; the order of the entries and their timings were scrambled in extraction, so entries are grouped by type and shown without timings]
[CMS-initial-mark 548460K(773376K)]
[CMS-concurrent-mark]
[CMS-concurrent-preclean]
[CMS-concurrent-abortable-preclean]
[CMS-remark 623877K(773376K)]
[CMS-concurrent-sweep]
[CMS-concurrent-reset]
[ParNew 640710K->546360K(773376K)]
[ParNew 651320K->556690K(773376K)]
[ParNew 648882K->554390K(773376K)]
[ParNew 655656K->561336K(773376K)]
[ParNew 445124K->350518K(773376K)]
[ParNew 463096K->368901K(773376K)]
[ParNew 455478K->361141K(773376K)]
...
[ParNew 489586K->395012K(773376K)]
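
To compare how early or late a cycle started across the three examples, it helps to turn each CMS-initial-mark entry into an occupancy percentage. The sketch below is an illustration only: the class name is hypothetical, and it takes the figure in parentheses as printed. In these simplified logs that figure may be the whole committed heap, whereas -XX:CMSInitiatingOccupancyFraction refers to the CMS old generation, so the percentages are meant for comparing the examples with each other.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical log helper, for illustration only.
class InitialMarkOccupancy {
    // Matches e.g. "[CMS-initial-mark 548460K(773376K)" and captures both sizes.
    private static final Pattern INITIAL_MARK =
            Pattern.compile("CMS-initial-mark:?\\s+(\\d+)K\\((\\d+)K\\)");

    // Returns used/capacity as a whole percentage (rounded down), or -1 if
    // the line is not an initial-mark entry.
    static int occupancyPercent(String logLine) {
        Matcher m = INITIAL_MARK.matcher(logLine);
        if (!m.find()) {
            return -1;
        }
        long used = Long.parseLong(m.group(1));
        long capacity = Long.parseLong(m.group(2));
        if (capacity == 0) {
            return -1;
        }
        return (int) (used * 100 / capacity);
    }

    public static void main(String[] args) {
        System.out.println(occupancyPercent("[CMS-initial-mark 298458K(773376K)")); // prints 38 (too early)
        System.out.println(occupancyPercent("[CMS-initial-mark 661142K(773376K)")); // prints 85 (too late)
        System.out.println(occupancyPercent("[CMS-initial-mark 548460K(773376K)")); // prints 70 (better)
    }
}
```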

Start CMS Cycles Explicitly
> If relying on explicit GCs and want them to be concurrent, use:
  ● -XX:+ExplicitGCInvokesConcurrent
    ● Requires a post 6 JVM
  ● -XX:+ExplicitGCInvokesConcurrentAndUnloadClasses
    ● Requires a post 6u4 JVM
> Useful when wanting to cause references / finalizers to be processed


Monitoring the GC
> Online
  ● VisualVM: http://visualvm.dev.java.net/
  ● VisualGC:
    ● http://java.sun.com/performance/jvmstat/
    ● VisualGC is also available as a VisualVM plug-in
    ● Can monitor multiple JVMs within the same tool
> Offline
  ● GC Logging
  ● PrintGCStats
  ● GChisto

GC Logging in Production
> Don't be afraid to enable GC logging in production
  ● Very helpful when diagnosing production issues
> Extremely low / non-existent overhead
  ● Maybe some large files in your file system. :-)
  ● We are surprised that customers are still afraid to enable it
> Real customer quote:
  ● “If someone doesn't enable GC logging in production, I shoot them!”

Most Important GC Logging Parameters
> You need at least:
  ● -XX:+PrintGCTimeStamps
    ● Add -XX:+PrintGCDateStamps if you must
  ● -XX:+PrintGCDetails
    ● Preferred over -verbose:gc as it's more detailed
> Also useful:
  ● -Xloggc:<file>
  ● Separates GC logging output from application output
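
A running application can check for itself whether it was started with the minimum logging flags. The sketch below is an illustration and not part of the talk: the class name and helper are hypothetical, and it only looks for exact flag strings among the JVM's input arguments, as reported by the standard RuntimeMXBean. The flags are the ones for the JVM versions the talk covers; later JDKs replaced them with unified logging.

```java
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical startup check, for illustration only.
class GcLoggingFlagCheck {
    // The two flags the slide lists under "You need at least".
    static final List<String> RECOMMENDED = Arrays.asList(
            "-XX:+PrintGCTimeStamps", "-XX:+PrintGCDetails");

    // Returns the recommended flags that are absent from the given JVM arguments.
    static List<String> missingFlags(List<String> jvmArgs) {
        List<String> missing = new ArrayList<>();
        for (String flag : RECOMMENDED) {
            if (!jvmArgs.contains(flag)) {
                missing.add(flag);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("missing GC logging flags: " + missingFlags(jvmArgs));
    }
}
```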

PrintGCStats
> Summarizes GC logs
> Downloadable script from
  ● http://java.sun.com/developer/technicalArticles/Programming/turbo/PrintGCStats.zip
> Usage
  ● PrintGCStats -v cpus=<num> <gc log file>
  ● Where <num> is the number of CPUs on the machine where the GC log was obtained
> It might not work with some of the printing flags

PrintGCStats Parallel GC
[PrintGCStats output table; the values of the total, mean, max, and stddev columns were scrambled in extraction and are omitted]
what          count
gen0t(s)      193
gen1t(s)      1
GC(s)         194
alloc(MB)     193
promo(MB)     193
used0(MB)     193
used1(MB)     1
used(MB)      194
commit0(MB)   193
commit1(MB)   193
commit(MB)    193
[Summary figures reported below the table, values not recoverable: alloc/elapsed_time, alloc/tot_cpu_time, alloc/mut_cpu_time, promo/elapsed_time, promo/gc0_time (MB/s); gc_seq_load, gc_conc_load, gc_tot_load (%)]

PrintGCStats CMS
[PrintGCStats output table; the values of the total, mean, max, and stddev columns were scrambled in extraction and are omitted]
what          count
gen0(s)       110
gen0t(s)      110
cmsIM(s)      3
cmsRM(s)      3
GC(s)         113
cmsCM(s)      3
cmsCP(s)      6
cmsCS(s)      3
cmsCR(s)      3
alloc(MB)     110
promo(MB)     110
used0(MB)     110
used(MB)      110
commit0(MB)   110
commit1(MB)   110
commit(MB)    110
[Summary figures reported below the table, values not recoverable: alloc/elapsed_time, alloc/tot_cpu_time, alloc/mut_cpu_time, promo/elapsed_time, promo/gc0_time (MB/s); gc_seq_load, gc_conc_load, gc_tot_load (%)]

GChisto
> Graphical GC log visualizer
> Under development
  ● Currently, can only show pause times
> Open source at
  ● http://gchisto.dev.java.net/
> It might not work with some of the printing flags

Demo
GChisto Demo


Conclusions
> Remember: GC tuning is an art
> The talk contained
  ● Basic GC tuning concepts
  ● How to monitor GCs
  ● What to look out for
  ● Examples of good tuning practices
> ...and practice makes perfect!

Tony Printezis, Charlie Hunt
tony.printezis@sun.com
charlie.hunt@sun.com
