The memory management of an application is said to be dynamic if its
storage needs vary during the execution. The application then
allocates space in the heap to satisfy new storage requests. Since the
size of storage space is bounded, the application must free unused
objects to ensure that future requests can be satisfied. Storage space
can be freed either explicitly or implicitly. When explicit memory
reclamation is used, it is up to the application developer to free
previously allocated objects. When the space is reclaimed implicitly a
garbage collector searches and frees the objects which are no
longer used by the application. Applications written in C or C++
typically use explicit memory reclamation. Lisp-style
languages and Java rely on a garbage collector to reclaim storage
space.
With implicit memory management, the developer need not worry about
the bookkeeping of previously allocated memory. When the application
runs out of storage space a garbage collector reclaims the allocated
memory that is no longer used (hence garbage) by the application. This
makes programming simpler and more straightforward. Objects that are
still used by the application are called ``live.'' The collection of
garbage normally proceeds in two steps: 1) distinguishing the live
objects from the garbage (garbage detection or tracing), 2) reclaiming
the garbage objects so that their occupied space can be reused
(garbage reclamation). The garbage collector traces the live objects
starting from the active variables. These includes the statically
allocated and global objects, the local variables on the stack, and
the variables in the registers. These objects are called the root
set. Any object that is reachable from the root set by dereferencing
is also reachable by the application. It is therefore considered
alive. The objects that are not reachable by traversal of the
application's data structures are considered garbage.
Both explicit and implicit memory reclamation have their advantages
and drawbacks. Explicit reclamation can lead to hard-to-find
errors. Forgetting to free memory leads to an accumulation of garbage
(memory leaks) until the application runs out of memory. Reclaiming
memory too early can lead to strange behavior: the freed memory can
be reused for a new object, so that two objects occupy the same
physical space and overwrite each other's data, causing unpredictable
mutations. These errors are hard to find. Moreover, it is possible that the
application runs correctly in most situations and that the errors only
show up in particular situations. Garbage collection, however, is able
to reclaim most garbage and therefore eliminates the risk of running
out of memory due to memory leaks.
Garbage collection is often considered expensive, both in CPU time and
in storage space. For every object the collector may allocate an
additional header to store information about the object. Furthermore,
traditional collectors interrupt the application when it runs out of
memory to perform a full garbage detection and reclamation. This
interruption can take a fair amount of time and result in a poor
responsiveness of the application. More advanced garbage collectors
using some of the techniques discussed later are sometimes cheaper,
are less disruptive, and are usually competitive with explicit
deallocation [Zor93,NG95].
In terms of software design, the convincing argument in favor of
implicit memory management is that garbage collection is necessary
for fully modular programming. The ``liveness'' of an object is a
global property which depends on the liveness and state of other
objects. With implicit storage reclamation, the developer of a
routine can reason locally: ``if I don't need this object, I don't
need to reference it anymore.'' With explicit storage reclamation the
developer has to ask the question whether the object in question is
still needed by other routines or whether the object can be
freed. This introduces non-local bookkeeping into routines that
otherwise would be locally understandable and flexibly composable.
Determining liveness ``by hand'' in the case of explicit storage
reclamation becomes more difficult as the application grows. More
importantly, it creates interdependencies between the different modules
(data structures, libraries). The user of a library must know by whom
and when objects are allocated and freed. It is, therefore, not
uncommon that developers fall back on variants of reference counting
or other garbage collection techniques to help with the bookkeeping
of unused objects. Integrating an optimized garbage collector in the
design of large, dynamic, and modular applications then becomes the
better choice, both in terms of development effort and runtime
efficiency [App87,Zor93].
Before we discuss existing garbage collection techniques, we briefly
recall the notions of fragmentation and locality of reference. When
objects are allocated and freed dynamically, free space gets
interleaved with live objects and the heap space gets
fragmented. Without special care the maximum size of contiguous free
space may become too small to satisfy allocation requests, even though
the total size of free space is large enough. Locality of reference
means that if a memory page is referenced, chances are that the next
reference will be in the same page. Good locality of reference is
important for good performance when virtual memory is used. Part
of the virtual memory is kept on the hard disk. Referencing a page
that is not loaded but stored on the hard disk results in a page
fault. The page will be loaded into memory. When objects of the same
age end up in different pages the assumptions about locality of
reference do not hold. This will result in many page faults and poor
performance. Fragmentation and locality of reference are two factors
that influence the efficiency of a collection technique.
In the rest of the section we will give an overview of garbage
collection techniques. This brief and incomplete discussion will help
us understand the behavior of our system in
chapter 7. Much of this section is based on Paul
Wilson's overview of uniprocessor garbage collection techniques
[Wil94].
1.2.1 Basic garbage collection techniques
We will discuss four garbage collection techniques: reference
counting, mark-and-sweep garbage collectors, copy collectors, and
non-copying implicit collectors.
Reference counting is not always considered to be a true garbage
collection technique. We include it for completeness. In this
technique, every allocated object has an associated count. Every time
a variable is set to point to an object, the object's count is
incremented. When the reference is destroyed, the count is
decremented. When the count goes to zero the object can be freed since
no variables refer to it. The object is scanned for pointer variables
and all the objects to which it refers have their reference count
decremented. One advantage of reference counting is that it is
``incremental:'' the garbage collection is closely interleaved with
the program execution and advances step-wise. Updating the reference
count takes a few operations that can be bounded in time. The
reclamation of objects can be deferred by linking them into a
free list and freeing them a few at a time. There are two major
problems with reference counting. The first is that it fails to
reclaim cyclic structures: the reference counts of the objects that
form a cycle never drop to zero, which can lead to an accumulation
of garbage. The second problem is that its cost is generally
proportional to the amount of work done by the running program, with
a fairly large constant of proportionality.
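As a concrete illustration, a minimal reference-counting sketch in C
might look as follows. The object layout and the helper names are our
own assumptions for this example, not taken from any particular system.

```c
#include <stdlib.h>

#define MAX_CHILDREN 4

typedef struct object {
    unsigned refcount;                     /* header: number of references */
    struct object *children[MAX_CHILDREN]; /* outgoing pointers, may be NULL */
} object;

object *rc_alloc(void)
{
    object *obj = calloc(1, sizeof *obj);  /* children start out NULL */
    if (obj) obj->refcount = 1;            /* one reference: the caller's */
    return obj;
}

void rc_retain(object *obj)
{
    if (obj) obj->refcount++;              /* a new variable points here */
}

void rc_release(object *obj)
{
    if (!obj || --obj->refcount > 0)
        return;                            /* still referenced elsewhere */
    /* count dropped to zero: scan the object for pointers, release
       every object it refers to, then free the space */
    for (int i = 0; i < MAX_CHILDREN; i++)
        rc_release(obj->children[i]);
    free(obj);
}

/* overwriting a pointer field combines both count updates */
void rc_assign(object **field, object *value)
{
    rc_retain(value);                      /* retain first: *field may equal value */
    rc_release(*field);
    *field = value;
}
```

Note how a cycle defeats the scheme: two objects pointing at each
other keep both counts at one even after the last outside reference
disappears.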
Figure 1.1: The mark-and-sweep collector: the collector traces the
live objects starting from the root set. The root set includes the
variables on the stack, in the data segment (global and static
variables), and in the registers. The traversal marks the live
objects (black in the figure). The objects not marked (white) are
linked in a free list.
A mark-and-sweep collector proceeds in two phases as in the general
case stated above. The garbage detection is called the mark phase, the
garbage reclamation the sweep phase. During the mark phase the live
objects are traced by traversing the application's data structures
starting from the root set. Live objects are marked, often by
toggling a bit in the header of the object. Once the live objects have
been made distinguishable from the garbage objects the memory is
swept: all the objects that are not marked (the garbage objects) are
linked into a free list (Fig. 1.1). There are
three major problems with mark-and-sweep collectors: fragmentation,
efficiency, and locality. Since the heap is never compacted, objects
stay in place; new objects are allocated in the free space between
old objects. This has negative implications for fragmentation and
locality of reference. Mark-and-sweep collectors are therefore often
considered unsuitable for virtual memory applications. The
mark-compact collector remedies the fragmentation and locality
problems of the mark-and-sweep collector. After the marking phase it
compacts the heap: live objects are slid to one side of the heap,
adjacent to one another, so that after compaction the free space
forms one contiguous region at the other side of the heap. The second
problem of mark-and-sweep collectors, efficiency, is that the amount
of work is proportional to the size of the heap: to find the garbage,
the complete heap has to be scanned.
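A minimal sketch of the two phases in C, assuming a heap in which
every object carries a mark bit in its header and a link chaining all
allocated objects (both are simplifications of ours):

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_CHILDREN 4

typedef struct object {
    bool marked;                            /* mark bit in the header */
    struct object *heap_next;               /* chains every allocated object */
    struct object *free_next;               /* chains the free list */
    struct object *children[MAX_CHILDREN];  /* outgoing pointers */
} object;

static object *heap_objects;  /* head of the allocated chain (maintained
                                 by the allocator, not shown) */
static object *free_list;

/* mark phase: trace the live objects starting from a root */
static void mark(object *obj)
{
    if (obj == NULL || obj->marked)
        return;
    obj->marked = true;                     /* black in Fig. 1.1 */
    for (int i = 0; i < MAX_CHILDREN; i++)
        mark(obj->children[i]);
}

/* sweep phase: link every unmarked object into the free list */
static void sweep(void)
{
    free_list = NULL;                       /* rebuilt on every collection */
    for (object *o = heap_objects; o != NULL; o = o->heap_next) {
        if (o->marked) {
            o->marked = false;              /* reset for the next cycle */
        } else {
            o->free_next = free_list;       /* garbage stays in place,  */
            free_list = o;                  /* only its link changes    */
        }
    }
}

void collect(object **roots, size_t nroots)
{
    for (size_t i = 0; i < nroots; i++)
        mark(roots[i]);
    sweep();
}
```

The sweep loop visits the complete heap, which is exactly the
efficiency problem noted above.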
Figure 1.2: The copy collector.
The third type of collectors we discuss is the copy collector. The
simple model of a copy collector is the stop-and-copy collector. This
collector uses two semi-spaces, the to-space and the from-space. New objects are allocated in the from-space. When no
more space is available all live objects are copied to the
to-space. This can be done in one traversal. The from-space is now
known to contain only garbage, and the roles of the two spaces are
inverted. The term scavenging is sometimes used to denote this
traversal, since only the worthwhile objects amid the garbage are
saved. If an object can be reached along two different paths it must
still be copied only once. To this end, extra space is reserved in
the header of each object to store a forwarding pointer. When an
object is copied, the forwarding pointer is set to the address of the
new copy. During scavenging the forwarding pointer is checked first;
if the object has already been copied, only the pointer that led to
the object needs to be updated (Fig. 1.2).
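The forwarding-pointer mechanism can be sketched as follows; the
object layout and the bump pointer into the to-space are assumptions
of this example.

```c
#include <stddef.h>
#include <string.h>

typedef struct object {
    struct object *forward;  /* header slot: NULL until the object is copied */
    size_t size;             /* total size of the object in bytes */
    /* ... the object's own fields follow ... */
} object;

static char *to_space_free;  /* next free address in the to-space,
                                set up when the collection starts */

/* copy an object into the to-space exactly once, regardless of how
   many paths lead to it */
object *scavenge(object *obj)
{
    if (obj == NULL)
        return NULL;
    if (obj->forward != NULL)       /* already copied: just follow */
        return obj->forward;        /* the forwarding pointer      */

    object *copy = (object *)to_space_free;
    to_space_free += obj->size;     /* bump allocation in the to-space */
    memcpy(copy, obj, obj->size);
    copy->forward = NULL;
    obj->forward = copy;            /* leave the forwarding address behind */
    return copy;
}
```

Every pointer field the traversal encounters is then updated as
`field = scavenge(field);`, so a second path to an already-copied
object only updates the pointer.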
The advantages of the copying collector are that allocation can be
done very fast: it only requires incrementing the pointer that points
to the available space in the from-space. The copying also
defragments the heap, and fast algorithms exist to traverse it
(Cheney's algorithm). The amount of work that needs to be done is
proportional to the amount of live data. Copy collectors do not need
to traverse the entire heap, as is the case for mark-and-sweep
collectors, which has positive implications for virtual memory
applications. A simple way to decrease the frequency of garbage
collection is to increase the size of the heap. Large, long-lived
objects, however, are copied at every collection. Also, only one of
the two semi-spaces is in use at any time, which results in a memory
utilization of less than fifty percent.
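Allocation in such a collector reduces to a pointer increment plus a
boundary test. A sketch, where `collect_and_flip` stands for an
assumed (not shown) scavenge-and-flip routine:

```c
#include <stddef.h>

static char *alloc_ptr;    /* first free byte in the current semi-space */
static char *alloc_limit;  /* end of the current semi-space */

extern void collect_and_flip(void);   /* assumed: scavenge, then swap spaces */

void *gc_alloc(size_t size)
{
    if (alloc_limit - alloc_ptr < (ptrdiff_t)size) {
        collect_and_flip();            /* reclaim, then retry once */
        if (alloc_limit - alloc_ptr < (ptrdiff_t)size)
            return NULL;               /* heap genuinely exhausted */
    }
    void *obj = alloc_ptr;
    alloc_ptr += size;                 /* allocation is a pointer bump */
    return obj;
}
```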
Copy collectors move all the live data out of one region of memory
into another. The emptied region is known to contain only garbage
and can be reclaimed without further examination. The amount of work
that needs to be done is proportional to the amount of live
data. The objects do not really have to be in different areas of
memory to apply this strategy; they can be thought of as members of
different sets. Initially the to-set is empty and all
allocated objects belong to the from-set. During the
traversal, the live objects are removed from the from-set and inserted
into the to-set. All objects remaining in the from-set are known to be
garbage and can be linked into the free-set. Instead of using
different memory regions, doubly linked lists can be used. This
collection technique is called non-copying implicit collection. With
copy collectors it shares the advantage of dispensing with a sweep phase
and visiting only the live objects. In addition, it avoids the copying
of objects. However, since it performs no compaction, it risks
fragmenting the heap.
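A sketch of this set-migration idea in C, using circular doubly
linked lists with dummy heads; the layout and the fixed child count
are our own simplifications.

```c
#include <stdbool.h>

#define MAX_CHILDREN 4

/* the list links live in the object header, so moving an object
   between the from-set, to-set, and free-set is a constant-time
   relink; the object itself never changes address */
typedef struct object {
    struct object *prev, *next;
    bool in_to_set;                         /* the object's "color" */
    struct object *children[MAX_CHILDREN];
} object;

/* a set is a circular list around a dummy head; it is initialized
   with head.next = head.prev = &head (not shown) */
typedef struct { object head; } set;

static void unlink_obj(object *o)
{
    o->prev->next = o->next;
    o->next->prev = o->prev;
}

static void push(set *s, object *o)
{
    o->next = s->head.next;
    o->prev = &s->head;
    s->head.next->prev = o;
    s->head.next = o;
}

/* "copying without copying": a reached object migrates from the
   from-set into the to-set instead of moving in memory */
void scavenge_obj(set *to, object *o)
{
    if (o == NULL || o->in_to_set)
        return;
    unlink_obj(o);
    push(to, o);
    o->in_to_set = true;
    for (int i = 0; i < MAX_CHILDREN; i++)
        scavenge_obj(to, o->children[i]);
}
```

After the traversal, everything still linked in the from-set is
garbage, and the whole list can be spliced into the free-set in
constant time.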
Garbage collection techniques can be accurate or conservative. They
are accurate if they can distinguish pointers from non-pointer
data. They are conservative when they only have partial information
about the locations of pointers. In that case they must treat every
memory word as a potential pointer. They may misidentify words as
pointers and wrongly retain garbage. Furthermore, they cannot move
objects around (as copy collectors do) since this might overwrite
data mistakenly interpreted as pointers [PB98].
Conservatism is needed for languages and compilers that do not provide
any help for implicit memory management, in particular Pascal, C, and
C++. Conservative collectors for C have been proposed. Well known is
the collector developed by Boehm [Boe93].
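A conservative scan might look like the following sketch;
`mark_object_at` is a hypothetical marking routine, and a real
collector such as Boehm's additionally checks that the address falls
on a valid object boundary.

```c
#include <stdint.h>

extern char *heap_start, *heap_end;       /* bounds of the collected heap */
extern void mark_object_at(void *addr);   /* hypothetical marking routine */

/* conservative scan: every aligned word in the region is treated as a
   potential pointer; a word that merely looks like a heap address will
   wrongly keep its "target" alive */
void scan_region(void *start, void *end)
{
    for (uintptr_t *p = start; (void *)(p + 1) <= end; p++) {
        uintptr_t word = *p;
        if (word >= (uintptr_t)heap_start && word < (uintptr_t)heap_end)
            mark_object_at((void *)word);
    }
}
```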
Traditional collectors perform a full scale garbage collection when
the application runs out of memory. During this period the application
is interrupted. This reduces the responsiveness of the system. In the
next section we consider incremental collectors, which improve
responsiveness through a stepwise collection.
1.2.2 Incremental collectors
Incremental collectors interleave small units of garbage collection
with small units of program execution. The garbage collection is not
performed as one atomic operation while the application is halted.
However, the scanning of the root set is normally done as one atomic
operation. Incremental copy collectors, for example, start with a flip operation during which the root set is copied into the to-space
[Bak78].
The main problem with incremental collectors
is that while the collector is tracing out the graph of reachable
objects the running application can mutate the graph while the
collector is not looking. The running application is therefore called
the mutator. An incremental collector must be able to keep track of
the changes made to the graph of reachable objects and retrace parts
of it if necessary. Collectors that move objects, such as the copying
collector, must also protect the mutator. Which of the two, the
application or the collector, is regarded as the mutator depends on
whose side you take; the two can be regarded as two processes that
share changing data while maintaining some kind of consistent view.
Figure 1.3: A violation of the color invariant after the mutator
stored a pointer to C into A and erased the reference from D to
C. If the collector is not protected against modifications by the
mutator, object C in the figure will be collected and the data
structures will be corrupted.
The abstraction of tricolor marking is helpful in understanding
incremental garbage collection. Garbage detection algorithms can be
described as a process of traversing the graph of reachable objects
and coloring them. The objects subject to collection are conceptually
colored white, and by the end of the collection the objects that must
be retained must be colored black. The garbage detection phase ends
when all reachable objects have been traversed and there are no more
objects to blacken. In the mark-and-sweep algorithm marked objects are
considered black, unreached objects are white. In the copying
algorithm the objects in the to-space are black, the unreached objects
in the from-space are white (see Fig. 1.1 and
1.2). To understand the interactions between an
incremental collector and the mutator it is helpful to introduce a
third color, gray, to signify that an object has been reached but that
its descendants might not have been. The gray colored objects form a
wave front separating the white from the black objects. No direct
reference from the black to the white objects should exist. This
property is referred to as the color invariant. The mutator moves
pointers around and may disturb the gray wave front
(Figure 1.3). The mutator corrupts the
collector's tracing when 1) it writes a pointer to a white object into
a black object, and 2) it erases the original pointer
before the collector sees it. The collector must somehow be informed
when the color invariant is violated. There
are two basic techniques for the coordination between the two: read
barriers and write barriers. Read barriers catch every attempt to read
a pointer to a white object. The white object is immediately colored
gray. Since the mutator can never access a white object, no pointers
to white objects can be stored in black objects. Write barriers catch
any attempt to write a pointer into another object. Write barriers
come in two flavors. One strategy, snapshot-at-beginning, is to
ensure that objects never get lost by preventing pointers from
getting lost. When a path to an object is changed, a new path is
created to
ensure that the object will be found. One implementation of a write
barrier is to store the original pointer onto a stack for later
examination before it is overwritten with a new value.
Snapshot-at-beginning techniques retain all the objects that were
alive at the beginning of the collection. Another strategy,
incremental update, focuses on the writing of pointers
into objects that have already been examined by the collector. The
collector's view of the reachable data structures is incrementally
updated as changes to the object graph are made. Conceptually,
whenever a pointer to a white object is written into a black object,
the black object or the white object is reverted to gray. Incremental
update strategies are able to reclaim objects that become garbage
during the collection.
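The two flavors can be sketched as follows; the color field, the
stack size, and the helper `shade_gray` are assumptions of this
example, not part of any particular collector.

```c
#include <stddef.h>

typedef enum { WHITE, GRAY, BLACK } color;

typedef struct object {
    color color;
    /* ... fields ... */
} object;

extern int collecting;               /* nonzero while a collection is active */
extern void shade_gray(object *o);   /* assumed: color o gray, queue for tracing */

/* snapshot-at-beginning: before a pointer is overwritten, save the old
   value on a stack so the collector can still trace the original path */
static object *saved[4096];
static size_t saved_top;

void write_barrier_snapshot(object **field, object *new_value)
{
    if (collecting && *field != NULL)
        saved[saved_top++] = *field;       /* examined later by the collector */
    *field = new_value;
}

/* incremental update: when a pointer to a white object is written into
   a black object, revert one of them to gray to restore the invariant */
void write_barrier_incremental(object *holder, object **field, object *new_value)
{
    if (collecting && holder->color == BLACK &&
        new_value != NULL && new_value->color == WHITE)
        shade_gray(new_value);
    *field = new_value;
}
```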
The mutator also has to be protected from changes made by the
collector. An incremental copying collector must prevent the mutator
from accessing an old copy in the from-space instead of the new copy
in the to-space. Copying collectors use read barriers to ensure that
the mutator finds the object at the correct location.
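A read barrier for an incremental copying collector can be sketched
as below, reusing the `scavenge` routine from the earlier sketch;
`in_from_space` is an assumed address range check.

```c
typedef struct object object;

extern int in_from_space(object *obj);  /* assumed address range check */
extern object *scavenge(object *obj);   /* copy, or follow the forwarding pointer */

/* every pointer load by the mutator goes through this check, so the
   mutator can never see an obsolete from-space copy */
object *read_barrier(object *obj)
{
    if (obj != NULL && in_from_space(obj))
        obj = scavenge(obj);            /* object now lives in the to-space */
    return obj;
}
```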
Read barriers and write barriers are conceptually synchronization
mechanisms. Before the mutator can access or write a location, the
collector must perform some bookkeeping. This bookkeeping, in general,
requires only a few actions. These operations can be inlined into the
mutator's code for efficiency. Since write operations occur less
frequently than read operations, write barriers are potentially much
cheaper.
1.2.3 Generational collectors
Generational collectors try to benefit from the empirically observed
property that most allocated objects live for a very short time, while
a small percentage of them live much longer. If an object survives one
collection, chances are it will survive many collections. If the
collector copies objects or compacts the heap, these long lived
objects are copied over and over again. Generational techniques split
the heap into several sub-heaps and segregate the objects in the
sub-heaps according to their age. New objects are allocated in the
first generation sub-heap. When the application runs out of memory,
only the first generation sub-heap is scanned. The objects that
survive one or more collections are moved to the second generation
heap. If most objects are indeed short-lived, most of the first
generation heap is freed and few objects need copying to the second
generation heap. The older generation sub-heaps are scanned less
frequently. Since smaller fragments of the heap are scanned and
proportionally more storage space is recovered, the overall efficiency
of the garbage collector improves.
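A sketch of the promotion decision during a minor (first-generation)
collection; the age threshold and the copy helpers are our own
illustrative assumptions.

```c
#include <stddef.h>

#define PROMOTE_AFTER 2      /* collections an object must survive */

typedef struct object {
    struct object *forward;  /* forwarding pointer, as in a copy collector */
    unsigned age;            /* minor collections survived so far */
    size_t size;
} object;

extern int in_nursery(object *obj);           /* assumed range check */
extern object *copy_to_nursery(object *obj);  /* assumed copy helpers */
extern object *copy_to_old_gen(object *obj);

/* minor collection: only first-generation objects are scavenged; young
   survivors stay in the nursery, old enough ones are promoted */
object *scavenge_young(object *obj)
{
    if (obj == NULL || !in_nursery(obj))
        return obj;                    /* older generations are left alone */
    if (obj->forward != NULL)
        return obj->forward;
    obj->age++;
    object *copy = (obj->age >= PROMOTE_AFTER)
                 ? copy_to_old_gen(obj)
                 : copy_to_nursery(obj);
    obj->forward = copy;
    return copy;
}
```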
1.2.4 Dynamic memory management in real-time applications
Real-time applications must be guaranteed sufficient storage space to
execute their task. Real-time application design carefully calculates
and allocates all storage space statically before
execution. Traditional real-time applications, therefore, do not
manage their memory usage dynamically. However, when the application
deals with problems that are unpredictable in size and complexity,
dynamic memory management is unavoidable. The real-time application
designer will therefore need to apply real-time design techniques to
the memory allocation and deallocation. This implies a number of
constraints:
- The CPU and memory usage of the allocation and deallocation
technique must be efficient.
- The time required to allocate and deallocate memory must be
tightly bounded.
- The time required to read/write a heap-allocated object must
be tightly bounded.
- The application must make sufficient progress and not be
interrupted by the collector for too long or too frequently.
- The availability of storage space must be guaranteed.
The efficiency of the CPU and memory usage of the allocation and
deallocation technique is not, strictly speaking, a real-time
constraint, but it can be decisive in whether dynamic memory
management is possible at all. The CPU efficiency of dynamic memory
management in general greatly depends upon the type of the
application. Furthermore, the efficiency of a particular garbage
collection technique depends on the targeted platform: CPU-type,
amount of available memory, memory-management hardware support, data
caches, etc. When comparing explicit and implicit memory management
techniques, both seem to be comparably efficient for the CPU usage,
although techniques relying on garbage collection require more
memory. In an early article, Appel even reported how implicit memory
techniques can be faster than traditional techniques
[App87]. He comes to this conclusion by making the heap
arbitrarily big. Zorn tested six existing, large C programs with four
explicit and one implicit memory strategy and concludes that the CPU
overhead of storage management is application dependent and that
garbage collection compares well with other explicit storage
management techniques [Zor93]. The memory usage is another
important factor, in particular for embedded systems with limited
hardware resources [PB98]. On this point garbage
collectors score low. Copy collectors have a memory utilization of
less than fifty percent and are for this reason alone ruled out for
memory-scarce systems.
One of the major problems with dynamic memory management for real-time
applications is that the time needed to allocate and deallocate memory
is unpredictable or has unacceptable worst case values. Nilsen
examined the average and worst case delays needed for well-known
memory allocation algorithms of several operating systems. The
conclusion is clear: dynamic memory allocation, whether garbage
collected or not, represents a potential source of unpredictable
timing behavior [NG95]. On this point, implicit
memory management techniques can offer better performance than
traditional allocation techniques. Copy collectors only require the
incrementing of a pointer plus a boundary test for the allocation, and
deallocation comes for free.
Incremental garbage collection techniques require the synchronization
between the collector and mutator. Read or write barriers may include
extra instructions for every referencing of a heap object. A range
check may be required for every pointer dereferencing. This introduces
an extra overhead. Alternatively, virtual memory protection can be
configured to generate page faults on those pages that require
synchronization [AEL88]. This solution offers unacceptably bad
worst-case times [WJ93,Nil94]. Incremental copy
collectors move objects whenever the mutator references an object in
the from-space. Baker's real-time copy collector was designed for Lisp
and only handles objects of the same size: two words
[Bak78]. It therefore offers predictable performance.
However, copying during dereferencing is unacceptable if the objects
can be arbitrarily large. Nilsen remedies this problem with lazy
copying: the collector only reserves the space for the copied object
and defers the actual copying until later. In general, though,
synchronization reduces the system throughput and in many cases offers
bad worst case delays. Wilson & Johnstone's non-copying implicit
collector uses a write barrier to synchronize the collector and the
mutator, making the coordination costs cheaper and more predictable.
The garbage collection can itself be considered as a real-time task
with a fixed upper time limit and a fixed period. There are, however,
some atomic operations which the garbage collector must perform. The
delay caused by these operations might determine the limitations on the
real-time guarantees. One such atomic operation is the root set
scanning. Copy collectors must perform a flip operation. This may
require the copying of the root set (unless objects are copied
lazily). This flip operation must also invalidate the memory caches to
ensure that all stored data is written through. Another concern is that
the application must be guaranteed sufficient progress: even if the
collector's interruptions are small and bounded in time, they must
not occur too often. The collector must guarantee that for any given
increment of computation, a minimum amount of the CPU is always
available for the running application [WJ93]. Baker's
algorithm, for example, closely integrates program execution and
collector actions; the collector's interruptions may cluster together,
preventing the application from meeting its deadlines.
One of the reasons why real-time developers are reluctant to use
dynamic memory management is that they fear that the heap will get so
fragmented that future storage requests cannot be satisfied. They
either do not use dynamic memory allocation at all, or use predictable
allocation techniques that use a number of fixed size allocation cells
to prevent fragmentation at the expense of storage overhead (for
example, Kingsley's powers-of-two buddy algorithm
[Kin82]). Wilson & Johnstone limit the risk of
fragmentation in their real-time non-copying implicit collector by
rounding the size of storage space requests up to a power of two, much
like the buddy allocation algorithm [WJ93] (see the sketch
after this paragraph). Another issue
of real-time garbage collection is the guarantee of sufficient
progress. The garbage collector has a real-time task of its own:
collect the heap before all storage space is allocated. The garbage
collection must therefore be able to keep up with the mutator's
allocation rate.
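The power-of-two rounding mentioned above can be sketched in a few
lines; the minimum block size is our own assumption.

```c
#include <stddef.h>

/* round a request up to the next power of two, as in the buddy-style
   size classes above; this bounds external fragmentation at the price
   of up to almost fifty percent internal fragmentation per block */
size_t round_up_pow2(size_t request)
{
    size_t size = sizeof(void *);   /* smallest block that can hold a free-list link */
    while (size < request)
        size <<= 1;
    return size;
}
```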
Nilsen's study of available techniques [Nil94]
concludes that without the assistance of specialized hardware, the
existing real-time garbage collection techniques may generalize to
soft real-time applications. According to him, real-time system
designers argue that these techniques do not provide the fine
granularity of timing behavior that is required for the creation of
reliable real-time systems. Wilson & Johnstone argue that their
incremental non-copying implicit garbage collector satisfies hard
real-time constraints. We have found no studies that test their claims
in fine grained real-time applications. As discussed in
section 1.1, real-time design techniques measure the maximum
time needed for every instruction. Wilson remarks that this approach
unrealistically emphasizes the smallest operations
[Wil94]. For some complex tasks thousands of instructions are
executed. For most applications, a more realistic requirement for
real-time performance is that the application always be able to use
the CPU for a given fraction of time at a time scale relevant to the
application. Many real-time applications can function correctly if the
garbage collector sometimes stops the application, provided that these
pauses are infrequent and short enough not to threaten the
application's deadlines.
This overview of garbage collection techniques may give the reader an
idea of the complexity of the issue. Real-time garbage collection
remains a topic of discussion. Many authors propose real-time garbage
collection techniques, but their claims often seem to be valid for
particular applications, not for the general case or for heavy-duty or
short-delay real-time systems. We have not found any article that
clearly states that it is possible or impossible to implement fine
grained real-time garbage collection on commercial uniprocessor
systems.