Garbage Collection Tutorial
What are the benefits of knowing how garbage collection (GC) works in Java? Satisfying the
intellectual curiosity of a software engineer would be a valid reason, but understanding how
GC works can also help you write much better Java applications.
This is a very personal and subjective opinion of mine, but I believe that a person well versed in
GC tends to be a better Java developer. If you are interested in the GC process, that means you have
experience in developing applications of a certain size. If you have thought carefully about choosing
the right GC algorithm, that means you completely understand the features of the application you
have developed. Of course, this may not be a common standard for a good developer. However, few
would object when I say that understanding GC is a requirement for being a great Java developer.
This is the first of a series of "Become a Java GC Expert" articles. I will cover the GC introduction this
time, and in the next article, I will talk about analyzing GC status and GC tuning examples from
NHN.
The purpose of this article is to introduce GC to you in an easy way. I hope this article proves to be
very helpful. Actually, my colleagues have already published a few great articles on Java
Internals which became quite popular on Twitter. You may refer to them as well.
Returning to garbage collection, there is a term that you should know before learning about
GC. The term is "stop-the-world." Stop-the-world will occur no matter which GC algorithm you choose.
Stop-the-world means that the JVM stops the application from running in order to execute a GC. When
stop-the-world occurs, every thread except for the threads needed for the GC will stop its tasks.
The interrupted tasks will resume only after the GC task has completed. GC tuning often means
reducing this stop-the-world time.
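To get a rough feel for how much time a running JVM has spent in GC, the standard java.lang.management API can be queried. The following is a minimal sketch; the class name GcTimeReport is hypothetical, and the reported value is the approximate accumulated collection time per collector, which is not necessarily identical to the stop-the-world pause time.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcTimeReport {

    /** Sums the approximate accumulated collection time (ms) over all collectors. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime(); // -1 if undefined for this collector
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // One line per collector; the collector names depend on the GC type in use.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: count=%d, time=%d ms%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
        System.out.println("total GC time: " + totalGcMillis() + " ms");
    }
}
```

Comparing these counters before and after a load test is one simple way to see whether a tuning change reduced the time spent in GC.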
In Java, program code does not explicitly designate memory and release it. Some people set the
relevant object to null or use the System.gc() method to remove the memory explicitly. Setting it to null
is not a big deal, but calling the System.gc() method will affect the system performance drastically, and
must not be carried out. (Thankfully, I have not yet seen any developer in NHN calling this method.)
In Java, as the developer does not explicitly remove the memory in the program code, the garbage
collector finds the unnecessary (garbage) objects and removes them. This garbage collector was
created based on the following two hypotheses. (It is more correct to call them suppositions or
preconditions, rather than hypotheses.)
Most objects soon become unreachable.
References from old objects to young objects exist only in small numbers.
These hypotheses are called the weak generational hypothesis. In order to preserve the strengths
of this hypothesis, the heap is physically divided into two areas, the young generation and the old
generation, in the HotSpot VM.
Young generation: Most of the newly created objects are located here. Since most objects soon
become unreachable, many objects are created in the young generation and then disappear. When
objects disappear from this area, we say a "minor GC" has occurred.
Old generation: The objects that did not become unreachable and survived the young
generation are copied here. It is generally larger than the young generation. As it is bigger in size,
GC occurs less frequently than in the young generation. When objects disappear from the old
generation, we say a "major GC" (or a "full GC") has occurred.
The permanent generation from the chart above is also called the "method area," and it stores classes
or interned character strings. So, this area is definitely not a place where objects that survived the old
generation stay permanently. A GC may occur in this area. A GC that takes place here is still
counted as a major GC.
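The areas described above can be observed on a running JVM as memory pools. The sketch below lists them through the standard java.lang.management API. The class name MemoryPools is hypothetical, and the pool names printed depend on the JVM version and the GC type in use, so no particular output should be assumed (for example, newer HotSpot versions report a Metaspace pool rather than a permanent generation).

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

public class MemoryPools {

    /** Returns the names of all memory pools reported by this JVM. */
    static List<String> poolNames() {
        List<String> names = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            names.add(pool.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage(); // may be null if the pool is not valid
            long used = (usage == null) ? -1 : usage.getUsed();
            System.out.printf("%-32s %-16s used=%d bytes%n",
                    pool.getName(), pool.getType(), used);
        }
    }
}
```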
To handle the case where an object in the old generation needs to reference an object in the young
generation, there is something called a "card table" in the old generation, in which each entry covers a
512-byte chunk of the old generation. Whenever an object in the old generation references an object in
the young generation, it is recorded in this table. When a GC is executed for the young generation, only
this card table is searched to determine whether or not an object is subject to GC, instead of checking
the references of all the objects in the old generation. This card table is managed with a write barrier.
The write barrier is a device that allows faster performance for minor GC. Though a bit of overhead
occurs because of this, the overall GC time is reduced.
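The idea of the card table can be illustrated with a small simulation. This is only a sketch under simplifying assumptions, not HotSpot's implementation: the class CardTableSketch and its methods are hypothetical, the old generation is modeled as a range of byte offsets, and the caller plays the role of the write barrier by reporting each store of a young-generation reference.

```java
import java.util.ArrayList;
import java.util.List;

public class CardTableSketch {
    static final int CARD_SIZE = 512; // bytes of old generation covered by one card

    private final boolean[] dirty;    // one flag per card

    CardTableSketch(int oldGenBytes) {
        dirty = new boolean[(oldGenBytes + CARD_SIZE - 1) / CARD_SIZE];
    }

    /** Write barrier: a field at this old-generation offset now refers to a young object. */
    void onReferenceStore(int oldGenOffset) {
        dirty[oldGenOffset / CARD_SIZE] = true;
    }

    /** A minor GC would scan only the dirty cards; this returns them and resets the table. */
    List<Integer> dirtyCardsAndClear() {
        List<Integer> cards = new ArrayList<>();
        for (int i = 0; i < dirty.length; i++) {
            if (dirty[i]) {
                cards.add(i);
                dirty[i] = false;
            }
        }
        return cards;
    }

    public static void main(String[] args) {
        CardTableSketch table = new CardTableSketch(4096); // 8 cards
        table.onReferenceStore(100);   // card 0
        table.onReferenceStore(1030);  // card 2
        table.onReferenceStore(1500);  // card 2 again, already dirty
        System.out.println(table.dirtyCardsAndClear()); // prints [0, 2]
    }
}
```

Only two of the eight cards need to be examined in this example, which is the saving the card table is meant to provide, at the cost of the extra work done on every recorded store.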
In order to understand GC, let's learn about the young generation, where objects are created for
the first time. The young generation is divided into 3 spaces.
One Eden space
Two Survivor spaces
There are 3 spaces in total, two of which are Survivor spaces. The order of execution for
each space is as below:
As you can see by checking these steps, one of the Survivor spaces must remain empty. If data exists
in both Survivor spaces, or the usage is 0 for both spaces, then take that as a sign that something is wrong
with your system.
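The way objects move between Eden and the two Survivor spaces, with one Survivor space always left empty, can be sketched as a toy simulation. Everything here is an assumption made for illustration: the class YoungGenSketch, the string "objects," the explicit set of live objects passed to each GC, and the promotion age of 3 are all hypothetical and do not reflect HotSpot's actual data structures or defaults.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class YoungGenSketch {
    static final int TENURE_AGE = 3; // assumed promotion age, for illustration only

    final List<String> eden = new ArrayList<>();
    List<String> from = new ArrayList<>(); // Survivor space currently holding objects
    List<String> to = new ArrayList<>();   // Survivor space kept empty between GCs
    final List<String> old = new ArrayList<>();
    final Map<String, Integer> age = new HashMap<>();

    void allocate(String name) {
        eden.add(name);
        age.put(name, 0);
    }

    /** Copies live objects from Eden and "from" into "to" (or old), then swaps the Survivors. */
    void minorGc(Set<String> live) {
        List<String> candidates = new ArrayList<>(eden);
        candidates.addAll(from);
        for (String obj : candidates) {
            if (!live.contains(obj)) {
                age.remove(obj); // unreachable: simply not copied
                continue;
            }
            int newAge = age.merge(obj, 1, Integer::sum);
            if (newAge >= TENURE_AGE) {
                old.add(obj);    // survived long enough: promoted
            } else {
                to.add(obj);
            }
        }
        eden.clear();
        from.clear();
        List<String> tmp = from; // swap roles so that "to" is empty again
        from = to;
        to = tmp;
    }

    public static void main(String[] args) {
        YoungGenSketch heap = new YoungGenSketch();
        heap.allocate("a");
        heap.allocate("b");
        heap.minorGc(new HashSet<>(Arrays.asList("a")));      // b is garbage
        heap.allocate("c");
        heap.minorGc(new HashSet<>(Arrays.asList("a", "c")));
        heap.minorGc(new HashSet<>(Arrays.asList("a", "c"))); // a reaches age 3
        System.out.println("old=" + heap.old + " from=" + heap.from + " to=" + heap.to);
        // prints old=[a] from=[c] to=[]
    }
}
```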
The process of data piling up into the old generation through minor GCs can be shown as in the
chart below:
Figure 3: Before & After a GC.
Note that in HotSpot VM, two techniques are used for faster memory allocations. One is called
"bump-the-pointer," and the other is called "TLABs (Thread-Local Allocation Buffers)."
The bump-the-pointer technique tracks the last object allocated in the Eden space. That object will be
located on top of the Eden space. When a new object is created afterwards, it is only necessary to check
whether the size of the object fits in the remaining Eden space. If it does, it is placed in
the Eden space, and the new object goes on top. So, when new objects are created, only the most recently
added object needs to be checked, which allows much faster memory allocations. However, it is a
different story if we consider a multithreaded environment. To keep objects used by multiple
threads in the Eden space in a thread-safe way, locking is inevitable, and performance will
drop due to lock contention. TLABs are the solution to this problem in HotSpot VM. They allow
each thread to have a small portion of the Eden space as its own share. As each
thread can only access its own TLAB, even the bump-the-pointer technique will allow memory
allocations without a lock.
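Both techniques can be sketched in a few lines. This is a simplified model under stated assumptions, not the JVM's code: the classes Region and Eden are hypothetical, addresses are plain integer offsets, and a TLAB is modeled as a sub-region carved out of Eden under a lock, after which its owner thread allocates without any synchronization.

```java
public class BumpPointerSketch {

    /** A memory region that allocates by bumping a single "top" offset. */
    static final class Region {
        private final int end;
        private int top;

        Region(int start, int size) {
            this.top = start;
            this.end = start + size;
        }

        /** Returns the start offset of the new object, or -1 if it does not fit. */
        int allocate(int size) {
            if (top + size > end) {
                return -1;      // the only check needed: does it fit above top?
            }
            int address = top;
            top += size;        // bump the pointer
            return address;
        }
    }

    /** Shared Eden: handing out a TLAB is synchronized, but happens only occasionally. */
    static final class Eden {
        private final Region region;

        Eden(int size) {
            region = new Region(0, size);
        }

        synchronized Region newTlab(int tlabSize) {
            int start = region.allocate(tlabSize);
            return start < 0 ? null : new Region(start, tlabSize);
        }
    }

    public static void main(String[] args) {
        Eden eden = new Eden(1024);
        Region tlab = eden.newTlab(256);         // this thread's private buffer
        System.out.println(tlab.allocate(100));  // prints 0
        System.out.println(tlab.allocate(100));  // prints 100
        System.out.println(tlab.allocate(100));  // prints -1 (TLAB exhausted)
        Region next = eden.newTlab(256);         // a fresh buffer from shared Eden
        System.out.println(next.allocate(16));   // prints 256
    }
}
```

The design point is that the lock is taken once per buffer rather than once per object, so the common allocation path stays a simple comparison and addition.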
This has been a quick overview of the GC in the young generation. You do not necessarily have to
remember the two techniques that I have just mentioned. You will not go to jail for not knowing
them. But please remember that objects are first created in the Eden space, and the
long-surviving objects are moved to the old generation through the Survivor spaces.
The old generation basically performs a GC when it is full of data. The execution procedure varies by
the GC type, so it would be easier to understand if you know the different types of GC.
1. Serial GC
2. Parallel GC
3. Parallel Old GC (Parallel Compacting GC)
4. Concurrent Mark & Sweep GC (or "CMS")
5. Garbage First (G1) GC
Among these, the serial GC must not be used on an operating server. This GC type was created when
there was only one CPU core on desktop computers. Using the serial GC will drop the application
performance significantly.
Serial GC (-XX:+UseSerialGC)
The GC in the young generation uses the type we explained in the previous paragraph. The GC in
the old generation uses an algorithm called "mark-sweep-compact."
1. The first step of this algorithm is to mark the surviving objects in the old generation.
2. Then, it checks the heap from the front and leaves only the surviving ones behind (sweep).
3. In the last step, it fills up the heap from the front with the objects so that the objects are piled
up consecutively, and divides the heap into two parts: one with objects and one without
objects (compact).
The serial GC is suitable for a small memory and a small number of CPU cores.
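The three steps of mark-sweep-compact can be illustrated with a toy heap. This is a sketch under simplifying assumptions, not the serial collector's implementation: the class MarkSweepCompactSketch, the Obj type, the fixed array of slots standing in for the old generation, and the explicit list of roots are all hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

public class MarkSweepCompactSketch {

    static final class Obj {
        final String name;
        final List<Obj> refs = new ArrayList<>();
        boolean marked;

        Obj(String name) {
            this.name = name;
        }
    }

    /** Step 1: mark every object reachable from the roots. */
    static void mark(List<Obj> roots) {
        Deque<Obj> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            Obj obj = pending.pop();
            if (obj.marked) {
                continue;
            }
            obj.marked = true;
            pending.addAll(obj.refs);
        }
    }

    /** Step 2: walk the heap from the front and drop everything unmarked. */
    static void sweep(Obj[] heap) {
        for (int i = 0; i < heap.length; i++) {
            if (heap[i] != null && !heap[i].marked) {
                heap[i] = null;
            }
        }
    }

    /** Step 3: slide survivors to the front; returns the number of live objects. */
    static int compact(Obj[] heap) {
        int free = 0;
        for (int i = 0; i < heap.length; i++) {
            if (heap[i] != null) {
                Obj obj = heap[i];
                heap[i] = null;
                heap[free++] = obj;
                obj.marked = false; // reset for the next collection
            }
        }
        return free;
    }

    static String layout(Obj[] heap) {
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < heap.length; i++) {
            if (i > 0) {
                sb.append(", ");
            }
            sb.append(heap[i] == null ? "-" : heap[i].name);
        }
        return sb.append("]").toString();
    }

    /** Runs one collection on a small example heap and returns the resulting layout. */
    static String demo() {
        Obj a = new Obj("a"), b = new Obj("b"), c = new Obj("c"), d = new Obj("d");
        a.refs.add(c);                          // a -> c; b is unreachable
        Obj[] heap = {a, b, c, d, null, null};
        mark(Arrays.asList(a, d));
        sweep(heap);
        compact(heap);
        return layout(heap);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [a, c, d, -, -, -]
    }
}
```

After compaction the heap is split into a filled front part and an empty rear part, which matches the description in step 3.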
Parallel GC (-XX:+UseParallelGC)
From the picture, you can easily see the difference between the serial GC and the parallel GC. While
the serial GC uses only one thread to process a GC, the parallel GC uses several threads to process
a GC, and is therefore faster. This GC is useful when there is enough memory and a large number of
cores. It is also called the "throughput GC."
Parallel Old GC (-XX:+UseParallelOldGC)
Parallel Old GC has been supported since a JDK 5 update. Compared to the parallel GC, the only difference
is the GC algorithm for the old generation. It goes through three steps: mark, summary, and compaction.
The summary step identifies the surviving objects separately for the areas where GC has
previously been performed, and is thus different from the sweep step of the mark-sweep-compact
algorithm. It goes through somewhat more complicated steps.
CMS GC (-XX:+UseConcMarkSweepGC)
Figure 5: Serial GC & CMS GC.
As you can see from the picture, the Concurrent Mark-Sweep GC is much more complicated than any
other GC type that I have explained so far. The early initial mark step is simple. Only the surviving
objects closest to the class loader are searched, so the pause time is very
short. In the concurrent mark step, the objects referenced by the surviving objects that have just been
confirmed are tracked and checked. What distinguishes this step is that it proceeds while other
threads are running at the same time. In the remark step, the objects that were newly added or
stopped being referenced during the concurrent mark step are checked. Lastly, in the concurrent sweep
step, the garbage collection procedure takes place. The garbage collection is carried out while
other threads are still running. Since this GC type is performed in this manner, the pause
time for GC is very short. The CMS GC is also called the low latency GC, and is used when the response
time of all applications is crucial.
While this GC type has the advantage of a short stop-the-world time, it also has the following
disadvantages.
You need to carefully review before using this type. Also, if the compaction task needs to be carried
out because of the many memory fragments, the stop-the-world time can be longer than with any other
GC type. You need to check how often and how long the compaction task is carried out.
G1 GC
If you want to understand G1 GC, forget everything you know about the young generation and the
old generation. As you can see in the picture, one object is allocated to each grid, and then a GC is
executed. Then, once one area is full, the objects are allocated to another area, and then a GC is
executed. The steps where the data moves from the three spaces of the young generation to the old
generation cannot be found in this GC type. This type was created to replace the CMS GC, which has
caused a lot of issues and complaints in the long term.
The biggest advantage of the G1 GC is its performance. It is faster than any other GC type that we
have discussed so far. But in JDK 6, it is called an early access release and can be used only for testing. It is
officially included in JDK 7. In my personal opinion, we need to go through a long test period (at
least 1 year) before NHN can use JDK 7 in actual services, so you probably should wait a while. Also,
I have heard a few times that a JVM crash occurred after applying G1 in JDK 6. Please wait until it is
more stable.
I will talk about GC tuning in the next issue, but I would like to ask you one thing in advance. If
the size and the type of all objects created in the application were identical, all the GC options for the
WAS used in our company could be the same. But the size and the lifespan of the objects created by a
WAS vary depending on the service, and the type of equipment varies as well. In other words, just
because a certain service uses the GC option "A," it does not mean that the same option will bring
the best results for a different service. It is necessary to find the best values for the WAS threads,
the WAS instances for each piece of equipment, and each GC option by constant tuning and monitoring.
This did not come from my personal experience, but from a discussion among the engineers who make the
Oracle JVM at JavaOne 2010.
In this issue, we have only glanced at the GC for Java. Please look forward to our next issue, where
I will talk about how to monitor the Java GC status and tune GC.
I would like to note that I referred to a new book released in December 2011 called "Java
Performance" (Amazon; it can also be viewed on Safari Online, if your company provides an
account), as well as "Memory Management in the Java HotSpot™ Virtual Machine," a white paper
provided on the Oracle website. (The book is different from "Java Performance Tuning.")
By Sangmin Lee, Senior Engineer at Performance Engineering Lab, NHN Corporation.