Automatic Garbage Collection

The alternative to manual deallocation of heap space is garbage
collection.

Compiler-generated code tracks pointer usage. When a heap object is no
longer pointed to, it is garbage, and is automatically collected for
subsequent reuse.

Many garbage collection techniques exist. Here are some of the most
important approaches:

Reference Counting

This is one of the oldest and simplest garbage collection techniques.

A reference count field is added to each heap object. It counts how many
references to the heap object exist. When an object’s reference count
reaches zero, it is garbage and may be collected.

The reference count field is updated whenever a reference is created,
copied, or destroyed. When a reference count reaches zero and an object
is collected, all pointers in the collected object are also followed and
the corresponding reference counts decremented.
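
The bookkeeping described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names Obj, store, dec_ref, freed and globals_table are hypothetical, and a real implementation would have the compiler emit the count updates around every pointer assignment.

```python
class Obj:
    """Heap object with an explicit reference count field."""
    def __init__(self, name):
        self.name = name
        self.refcount = 0
        self.fields = {}          # pointer fields: field name -> Obj or None

freed = []                        # names of collected objects, in order

def dec_ref(obj):
    """Destroy one reference to obj; collect obj if its count reaches zero."""
    if obj is None:
        return
    obj.refcount -= 1
    if obj.refcount == 0:
        # Follow every pointer in the collected object and decrement
        # the count of the object it refers to.
        for child in obj.fields.values():
            dec_ref(child)
        obj.fields.clear()
        freed.append(obj.name)

def store(slots, key, new):
    """Perform slots[key] = new while keeping reference counts up to date."""
    if new is not None:
        new.refcount += 1         # a reference is created or copied
    old = slots.get(key)
    slots[key] = new
    dec_ref(old)                  # the overwritten reference is destroyed

globals_table = {}                # stands in for global pointer variables
a, b = Obj("A"), Obj("B")
store(globals_table, "P", a)      # A: count 1
store(a.fields, "link", b)        # B: count 1
store(globals_table, "Q", b)      # B: count 2
store(globals_table, "Q", None)   # B: count 1, still reachable through A
store(globals_table, "P", None)   # A: count 0; its pointer to B is followed
print(freed)                      # ['B', 'A']
```

Incrementing the new target before decrementing the old one keeps the sketch safe when a slot is assigned the value it already holds.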

© CS 536 Spring 2005

As shown below, reference counting has difficulty with circular
structures.

[Figure: global pointer P and two heap objects, each with Link and Data
fields. One object has Reference Count = 2, the other has
Reference Count = 1.]

If pointer P is set to null, the object’s reference count is reduced
to 1. Both objects have a non-zero count, but neither is accessible
through any external pointer. The two objects are garbage, but won’t be
recognized as such.

If circular structures are common, then an auxiliary technique, like
mark-sweep collection, is needed to collect garbage that reference
counting misses.

Mark-Sweep Collection

Many collectors, including mark & sweep, do nothing until heap space is
nearly exhausted.

The collector then executes a marking phase that identifies all live
heap objects. Starting with global pointers and pointers in stack
frames, it marks reachable heap objects. Pointers in marked heap objects
are also followed, until all live heap objects are marked.

After the marking phase, any object not marked is garbage that may be
freed. We then sweep through the heap, collecting all unmarked objects.
During the sweep phase we also clear all marks from heap objects found
to be still in use.
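
The two phases can be sketched as follows. This is a minimal Python illustration, not a real collector: Obj, mark and sweep are hypothetical names, each object has a single pointer field, and the heap is a plain list. The demo includes an unreachable two-object cycle (an assumption of the example) to show that mark-sweep reclaims the kind of garbage reference counting misses.

```python
class Obj:
    def __init__(self, name):
        self.name = name
        self.marked = False
        self.link = None          # the object's only pointer field

def mark(roots):
    """Mark every object reachable from the root pointers."""
    pending = [r for r in roots if r is not None]
    while pending:
        obj = pending.pop()
        if obj.marked:
            continue              # already visited, so cycles terminate
        obj.marked = True
        if obj.link is not None:
            pending.append(obj.link)

def sweep(heap):
    """Visit every heap object; unmarked objects go to the free list."""
    live, free_list = [], []
    for obj in heap:
        if obj.marked:
            obj.marked = False    # clear the mark for the next collection
            live.append(obj)
        else:
            free_list.append(obj)
    return live, free_list

o1, o2, o3, o4, o5 = (Obj(f"Object {i}") for i in range(1, 6))
heap = [o1, o2, o3, o4, o5]
o3.link = o5                      # internal pointer from a live object
o2.link, o4.link = o4, o2         # unreachable cycle
roots = [o1, o3]                  # global pointers

mark(roots)
live, free_list = sweep(heap)
print([o.name for o in live])       # ['Object 1', 'Object 3', 'Object 5']
print([o.name for o in free_list])  # ['Object 2', 'Object 4']
```

Note that sweep touches all five objects, live or dead, while mark touches only the three reachable ones.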

Mark-sweep garbage collection is illustrated below.

[Figure: two global pointers and one internal pointer; Object 1,
Object 3 and Object 5 are labeled, and the remaining objects are shaded.]

Objects 1 and 3 are marked because they are pointed to by global
pointers. Object 5 is marked because it is pointed to by object 3, which
is marked. Shaded objects are not marked and will be added to the
free-space list.

In any mark-sweep collector, it is vital that we mark all accessible
heap objects. If we miss a pointer, we may fail to mark a live heap
object and later incorrectly free it. Finding all pointers is a bit
tricky in languages like Java, C and C++, that have pointers mixed with
other types within data structures, implicit pointers to temporaries,
and so forth. Considerable information about data structures and frames
must be available at run-time for this purpose. In cases where we can’t
be sure if a value is a pointer or not, we may need to do conservative
garbage collection.

In mark-sweep garbage collection all heap objects must be swept. This is
costly if most objects are dead. We’d prefer to examine only live
objects.

Compaction

After the sweep phase, live heap objects are distributed throughout the
heap space. This can lead to poor locality. If live objects span many
memory pages, paging overhead may be increased. Cache locality may be
degraded too.

We can add a compaction phase to mark-sweep garbage collection. After
live objects are identified, they are placed together at one end of the
heap. This involves another tracing phase in which global, local and
internal heap pointers are found and adjusted to reflect the object’s
new location.

Pointers are adjusted by the total size of all garbage objects between
the start of the heap and the current object. This is illustrated below:

[Figure: a global pointer, an adjusted global pointer and an adjusted
internal pointer; Object 1, Object 3 and Object 5 placed together.]

Compaction merges together freed objects into one large block of free
heap space. Fragments are no longer a problem.

Moreover, heap allocation is greatly simplified. Using an “end of heap”
pointer, whenever a heap request is received, the end of heap pointer is
adjusted, making heap allocation no more complex than stack allocation.
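
The pointer adjustment rule and the “end of heap” allocation can be sketched as follows. This is a Python illustration under stated assumptions: the heap is modeled as a list of Block records sorted by address and starting at address 0, pointers are plain integer addresses, and the names Block, compact and allocate are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Block:
    addr: int                     # start address within the heap
    size: int
    live: bool                    # result of the marking phase
    ptrs: list = field(default_factory=list)   # addresses this object points to

def compact(heap, roots):
    """Slide live blocks toward address 0 and adjust every pointer.

    heap is a list of Blocks sorted by address and starting at address 0;
    roots is a list of addresses held in global or local pointers.
    """
    new_addr = {}
    garbage_before = 0            # total size of garbage seen so far
    for block in heap:
        if block.live:
            new_addr[block.addr] = block.addr - garbage_before
        else:
            garbage_before += block.size
    compacted = [
        Block(new_addr[b.addr], b.size, True, [new_addr[p] for p in b.ptrs])
        for b in heap if b.live
    ]
    new_roots = [new_addr[r] for r in roots]
    end_of_heap = sum(b.size for b in compacted)
    return compacted, new_roots, end_of_heap

def allocate(end_of_heap, size, heap_limit):
    """Bump allocation: return (address of new object, new end of heap)."""
    if end_of_heap + size > heap_limit:
        raise MemoryError("heap exhausted; a collection would run here")
    return end_of_heap, end_of_heap + size

heap = [
    Block(0, 16, True),           # Object 1
    Block(16, 24, False),         # garbage
    Block(40, 8, True, [80]),     # Object 3, points to Object 5
    Block(48, 32, False),         # garbage
    Block(80, 16, True),          # Object 5
]
compacted, roots, end_of_heap = compact(heap, roots=[0, 40])
print([(b.addr, b.ptrs) for b in compacted])    # [(0, []), (16, [24]), (24, [])]
print(roots, end_of_heap)                       # [0, 16] 40
print(allocate(end_of_heap, 8, heap_limit=96))  # (40, 48)
```

Object 5 moves from address 80 to 24 because 24 + 32 = 56 bytes of garbage precede it, and the pointer stored in Object 3 is adjusted by the same amount.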

Because pointers are adjusted, compaction may not be suitable for
languages like C and C++, in which it is difficult to unambiguously
identify pointers.

Copying Collectors

Compaction provides many valuable benefits. Heap allocation is simple
and efficient. There is no fragmentation problem, and because live
objects are adjacent, paging and cache behavior is improved.

An entire family of garbage collection techniques, called copying
collectors, is designed to integrate copying with recognition of live
heap objects. Copying collectors are very popular and are widely used.

Consider a simple copying collector that uses semispaces. We start with
the heap divided into two halves: the from and to spaces.
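
A minimal Python sketch of a semispace collector of this kind may help when reading the walk-through of the algorithm. It is an illustration under stated assumptions, not the course's implementation: the names Obj, copy_object and collect are hypothetical, each object has a single pointer field, and the two spaces are Python lists rather than raw memory. Live objects reachable from the roots are copied to the to space, a forwarding pointer is left in each old object so that an object reached through several pointers is copied only once, and the copied objects are then scanned to update their internal pointers.

```python
class Obj:
    def __init__(self, name, link=None):
        self.name = name
        self.link = link          # the object's only pointer field
        self.forward = None       # forwarding pointer, set when the object is copied

def copy_object(obj, to_space):
    """Return obj's copy in to_space, copying it on the first visit only."""
    if obj is None:
        return None
    if obj.forward is None:
        clone = Obj(obj.name, obj.link)   # link still refers to the from space
        to_space.append(clone)            # next available position in to space
        obj.forward = clone               # leave a forwarding pointer behind
    return obj.forward

def collect(roots):
    """Copy everything reachable from roots; return (new roots, to space)."""
    to_space = []
    new_roots = {name: copy_object(obj, to_space) for name, obj in roots.items()}
    scan = 0
    while scan < len(to_space):           # traverse the copied objects
        copied = to_space[scan]
        copied.link = copy_object(copied.link, to_space)
        scan += 1
    return new_roots, to_space

o5 = Obj("Object 5")
o3 = Obj("Object 3", link=o5)
o1 = Obj("Object 1")
dead = Obj("Object 2")                    # garbage: never visited by collect
from_space = [o1, dead, o3, o5]
roots = {"g1": o1, "g2": o3, "g3": o3}    # g2 and g3 refer to the same object

roots, to_space = collect(roots)
from_space, to_space = to_space, []       # interchange the two spaces
print([o.name for o in from_space])       # ['Object 1', 'Object 3', 'Object 5']
```

The work done by collect is proportional to the number of live objects; the dead object is never examined.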

Initially, we allocate heap requests from the from space, using a simple
“end of heap” pointer. When the from space is exhausted, we stop and do
garbage collection.

Actually, though, we don’t collect garbage. We collect live heap
objects; garbage is never touched.

We trace through global and local pointers, finding live objects. As
each object is found, it is moved from its current position in the from
space to the next available position in the to space.

The pointer is updated to reflect the object’s new location. A
“forwarding pointer” is left in the object’s old location in case there
are multiple pointers to the same object.

This is illustrated below:

[Figure: the From Space holding Object 1, Object 3 and Object 5, with
two global pointers and an internal pointer; the To Space.]

The from space is completely filled. We trace global and local pointers,
moving live objects to the to space and updating pointers. This is
illustrated in Figure 0.1. (Dashed arrows are forwarding pointers.) We
have yet to handle pointers internal to copied heap objects. All copied
heap objects are traversed. Objects referenced are copied and internal
pointers are updated. Finally, the to
and from spaces are interchanged, and heap allocation resumes just
beyond the last copied object. This is illustrated in Figure 0.2.

[Figure 0.1 Copying Garbage Collection (b). Labels: From Space,
To Space, Object 1, Object 3, Object 5, two global pointers, internal
pointer.]

[Figure 0.2 Copying Garbage Collection (c). Labels: To Space,
From Space, Object 1, Object 3, Object 5, two global pointers, internal
pointer, End of Heap pointer.]

The biggest advantage of copying collectors is their speed. Only live
objects are copied; deallocation of dead objects is essentially free. In
fact, garbage collection can be made, on average, as fast as you wish:
simply make the heap bigger. As the heap gets bigger, the time between
collections increases, reducing the number of times a live object must
be copied. In the limit, objects are never copied, so garbage collection
becomes free!

Of course, we can’t increase the size of heap memory to infinity. In
fact, we don’t want to make the heap so large that paging is required,
since swapping pages to disk is dreadfully slow. If we can make the heap
large enough that the lifetime of most heap objects is less than the
time between collections, then deallocation of short-lived objects
will appear to be free, though longer-lived objects will still exact a
cost.

Aren’t copying collectors terribly wasteful of space? After all, at most
only half of the heap space is actually used. The reason for this
apparent inefficiency is that any garbage collector that does compaction
must have an area to copy live objects to. Since in the worst case all
heap objects could be live, the target area must be as large as the heap
itself. To avoid copying objects more than once, copying collectors
reserve a to space as big as the from space. This is essentially a
space-time trade-off, making such collectors very fast at the expense of
possibly wasted space.

If we have reason to believe that the time between garbage collections
will be greater than the average lifetime of most heap objects, we can
improve our use of heap space. Assume that 50% or more of the heap will
be garbage when the collector is called. We can then divide the heap
into 3 segments, which we’ll call A, B and C. Initially, A and B will be
used as the from space, utilizing 2/3 of the heap. When we copy live
objects, we’ll copy them into segment C, which will be big enough if
half or more of the heap objects are garbage. Then we treat C and A as
the from space, using B as the to space for the next collection. If we
are unlucky and more than 1/2 the heap contains live objects, we can
still get by. Excess objects are copied onto an auxiliary data space
(perhaps the
stack), then copied into A after all live objects in A have been moved.
This slows collection down, but only rarely (if our estimate of 50%
garbage per collection is sound). Of course, this idea generalizes to
more than 3 segments. Thus if 2/3 of the heap were garbage (on average),
we could use 3 of 4 segments as from space and the last segment as to
space.

Generational Techniques

The great strength of copying collectors is that they do no work for
objects that are born and die between collections. However, not all heap
objects are so short-lived. In fact, some heap objects are very
long-lived. For example, many programs create a dynamic data structure
at their start, and utilize that structure throughout the program.
Copying collectors handle long-lived objects poorly. They are repeatedly
traced and moved between semispaces without any real benefit.

Generational garbage collection techniques [Ungar 1984] were developed
to better handle objects with varying lifetimes. The heap is divided
into two or more generations, each with its own to and from space. New
objects are allocated in the youngest generation, which is collected
most frequently. If an object survives across one or more collections of
the youngest generation, it is “promoted” to the next older generation,
which is collected less often. Objects that
survive one or more collections of this generation are then moved to the
next older generation. This continues until very long-lived objects
reach the oldest generation, which is collected very infrequently
(perhaps even never).

The advantage of this approach is that long-lived objects are “filtered
out,” greatly reducing the cost of repeatedly processing them. Of
course, some long-lived objects will die and these will be caught when
their generation is eventually collected.

An unfortunate complication of generational techniques is that although
we collect older generations infrequently, we must still trace their
pointers in case they reference an object in a newer generation. If we
don’t do this, we may mistake a live object for a dead one. When an
object is promoted to an older generation, we can check to see if it
contains a pointer into a younger generation. If it does, we record its
address so that we can trace and update its pointer. We must also detect
when an existing pointer inside an object is changed. Sometimes we can
do this by checking “dirty bits” on heap pages to see which have been
updated. We then trace all objects on a page that is dirty. Otherwise,
whenever we assign to a pointer that already has a value, we record the
address of the pointer that is changed. This information then allows us
to only trace those objects in older
generations that might point to younger objects.

Experience shows that carefully designed generational garbage collectors
can be very effective. They focus on objects most likely to become
garbage, and spend little overhead on long-lived objects. Generational
garbage collectors are widely used in practice.

Conservative Garbage Collection

The garbage collection techniques we’ve studied all require that we
identify pointers to heap objects accurately. In strongly typed
languages like Java or ML, this can be done. We can keep a table of the
addresses of all global pointers. We can include a code value in a frame
(or use the return address stored in a frame) to determine the routine a
frame corresponds to. This allows us to then determine what offsets in
the frame contain pointers. When heap objects are allocated, we can
include a type code in the object’s header, again allowing us to
identify pointers internal to the object.

Languages like C and C++ are weakly typed, and this makes identification
of pointers much harder. Pointers may be type-cast into integers and
then back into pointers. Pointer arithmetic allows pointers into the
middle of an object. Pointers in frames and heap objects need not be
initialized, and may contain random values. Pointers may overlay
integers in unions,
making the current type a dynamic property.

As a result of these complications, C and C++ have the reputation of
being incompatible with garbage collection. Surprisingly, this belief is
false. Using conservative garbage collection, C and C++ programs can be
garbage collected.

The basic idea is simple: if we can’t be sure whether a value is a
pointer or not, we’ll be conservative and assume it is a pointer. If
what we think is a pointer isn’t, we may retain an object that’s really
dead, but we’ll find all valid pointers, and never incorrectly collect a
live object. We may mistake an integer (or a floating value, or even a
string) for a pointer, so compaction in any form can’t be done: moving
an object would mean adjusting a value that may not really be a pointer.
However, mark-sweep collection will work.

Garbage collectors that work with ordinary C programs have been
developed [BW 1988]. User programs need not be modified. They simply are
linked to different library routines, so that malloc and free properly
support the garbage collector. When new heap space is required, dead
heap objects may be automatically collected, rather than relying
entirely on explicit free commands (though frees are allowed; they
sometimes simplify or speed heap reuse).

With garbage collection available, C programmers need not worry about
explicit heap management. This reduces programming effort and
eliminates errors in which objects are
prematurely freed, or perhaps never
freed. In fact, experiments have
shown [Zorn 93] that conservative
garbage collection is very competitive
in performance with application-
specific manual heap management.
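
The conservative marking idea can be sketched as follows. This is a Python model under stated assumptions, not the interface of any real collector: the Heap class and its methods are hypothetical, addresses are plain integers, and every word found in the roots or inside an object is treated as a pointer if it falls anywhere within an allocated object, including its middle.

```python
import bisect

class Heap:
    def __init__(self):
        self.starts = []          # sorted start addresses of allocated objects
        self.objects = {}         # start address -> (size, list of words stored)

    def add(self, start, size, words=()):
        bisect.insort(self.starts, start)
        self.objects[start] = (size, list(words))

    def containing(self, word):
        """Start of the object whose address range contains word, else None."""
        i = bisect.bisect_right(self.starts, word) - 1
        if i < 0:
            return None
        start = self.starts[i]
        size, _ = self.objects[start]
        return start if word < start + size else None

    def mark(self, root_words):
        """Treat every word that could be a heap pointer as if it were one."""
        marked = set()
        pending = list(root_words)
        while pending:
            start = self.containing(pending.pop())
            if start is not None and start not in marked:
                marked.add(start)
                pending.extend(self.objects[start][1])
        return marked

heap = Heap()
heap.add(0x1000, 16, words=[0x2008, 7])   # holds a pointer into the middle of 0x2000
heap.add(0x2000, 16, words=[42])
heap.add(0x3000, 16)                      # dead, but a root word looks like its address
heap.add(0x4000, 16)                      # dead, nothing resembles its address

stack_words = [0x1000, 12345, 0x3000]     # suppose 0x3000 is really an integer
marked = heap.mark(stack_words)
garbage = set(heap.starts) - marked
print(sorted(hex(a) for a in marked))     # ['0x1000', '0x2000', '0x3000']
print(sorted(hex(a) for a in garbage))    # ['0x4000']
```

The object at 0x3000 is retained even though it is dead, which is the price of being conservative; no live object is ever freed, and nothing is moved, so the integer that looked like a pointer is left unchanged.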
