previous up next
1.2 Dynamic memory management

The memory management of an application is said to be dynamic if its storage needs vary during the execution. The application then allocates space in the heap to satisfy new storage requests. Since the size of storage space is bounded, the application must free unused objects to assure that future requests can be satisfied. Storage space can be freed either explicitly or implicitly. When explicit memory reclamation is used, it is up to the application developer to free previously allocated objects. When the space is reclaimed implicitly a garbage collector searches and frees the objects which are no longer used by the application. Applications written in the C/C++ language typically use explicit memory reclamation. Lisp-style languages and Java rely on a garbage collector to reclaim storage space.

With implicit memory management, the developer need not worry about the bookkeeping of previously allocated memory. When the application runs out of storage space a garbage collector reclaims the allocated memory that is no longer used (hence garbage) by the application. This makes programming simpler and more straightforward. Objects that are still used by the application are called ``live.'' The collection of garbage normally proceeds in two steps: 1) distinguishing the live objects from the garbage (garbage detection or tracing), 2) reclaiming the garbage objects so that their occupied space can be reused (garbage reclamation). The garbage collector traces the live objects starting from the active variables. These includes the statically allocated and global objects, the local variables on the stack, and the variables in the registers. These objects are called the root set. Any object that is reachable from the root set by dereferencing is also reachable by the application. It is therefore considered alive. The objects that are not reachable by traversal of the application's data structures are considered garbage.

Both explicit and implicit memory reclamation have their advantages and inconveniences. Explicit reclamation can lead to hard to find errors. Forgetting to free memory leads to an accumulation of garbage (memory leaks) until the application runs out of memory. Reclaiming memory too early can lead to strange behavior. The freed memory can be used to allocate a new object; two objects occupy the same physical space and overwrite each other data causing unpredictable mutations. These errors are hard to find. Moreover, it's possible that the application runs correctly in most situations and that the errors only show up in particular situations. Garbage collection, however, is able to reclaim most garbage and therefore eliminates the risk of running out of memory due to memory leaks.

Garbage collection is often considered expensive, both in CPU time and in storage space. For every object the collector may allocate an additional header to store information about the object. Furthermore, traditional collectors interrupt the application when it runs out of memory to perform a full garbage detection and reclamation. This interruption can take a fair amount of time and result in a poor responsiveness of the application. More advanced garbage collectors using some of the techniques discussed later are sometimes cheaper, are less disruptive, and are usually competitive with explicit deallocation [Zor93,NG95].

In terms of software design, the convincing arguments in favor of implicit memory management is that garbage collection is necessary for fully modular programming. The ``liveness'' of an object is a global property which depends on the liveness and state of other objects. With implicit storage reclamation, the developer of a routine can reason locally: ``if I don't need this object, I don't need reference it anymore.'' With explicit storage reclamation the developer has to ask the question whether the object in question is still needed by other routines or whether the object can be freed. This introduces non-local bookkeeping into routines that otherwise would be locally understandable and flexibly composable. Determining the liveness ``by hand'' in the case of explicit storage reclamation becomes more difficult as the application grows. More important, it creates interdependencies between the different modules (data structures, libraries). The user of a library must know by who and when objects are allocated and freed. It is, therefore, not uncommon that the developer falls back on variants of reference counting or other garbage collection techniques to help him with the bookkeeping of unused objects. The integration of an optimized garbage collector for the design of large, dynamic, and modular applications becomes the better choice both in terms of development efforts and runtime efficiency [App87,Zor93].

Before we discuss existing garbage collection techniques, we briefly recall the notions of fragmentation and locality of reference. When objects are allocated and freed dynamically, free space gets interleaved with live objects and the heap space gets fragmented. Without special care the maximum size of contiguous free space may become too small to satisfy allocation requests, even though the total size of free space is large enough. Locality of reference means that if a memory page is referenced, chances are that the next reference will be in the same page. A good locality of reference is important to obtain good performance when virtual memory is used. Part of the virtual memory is kept on the hard disk. Referencing a page that is not loaded but stored on the hard disk results in a page fault. The page will be loaded into memory. When objects of the same age end up in different pages the assumptions about locality of reference do not hold. This will result in many page faults and poor performance. Fragmentation and locality of reference are two factors that influence the efficiency of a collection technique.

In the rest of the section we will give an overview of garbage collection techniques. This brief and incomplete discussion will help us understand the behavior of our system in chapter 7. Much of this section is based on Paul Wilson's overview of uniprocessor garbage collection techniques [Wil94].

1.2.1 Basic garbage collection techniques

We will discuss four garbage collection techniques: reference counting, mark-and-sweep garbage collectors, copy collectors, and non-copying implicit collectors.

Reference counting is not always considered to be a true garbage collection technique. We include it for completeness. In this technique, every allocated object has an associated count. Every time a variable is set to point to an object, the object's count is incremented. When the reference is destroyed, the count is decremented. When the count goes to zero the object can be freed since no variables refer to it. The object is scanned for pointer variables and all the objects to which it refers have their reference count decremented. One advantage of reference counting is that it is ``incremental:'' the garbage collection is closely interleaved with the program execution and advances step-wise. Updating the reference count takes a few operations that can be bounded in time. The reclamation of objects can be deferred by linking them in a free-list and freeing them few at a time. There are two major problems with reference counting. The first is that reference counting fails to reclaim cyclic structures. The reference counts of the objects that form a cyclic structure are never set to zero. This can lead to an accumulation of garbage. The second problem with reference counting is that its cost is generally proportional to the amount of work done by the running program, with a fairly high large constant of proportionality.

Figure 1.1
Figure 1.1: The mark-and-sweep collector: the collector traces the live objects starting from the root set. The root set includes the variables on the stack, in the data segment (global and static variables), and in the registers. The traversal marks the live objects (black in the figure.) The objects not marked (white) are linked in a free list.

A mark-and-sweep collector proceeds in two phases as in the general case stated above. The garbage detection is called the mark phase, the garbage reclamation the sweep phase. During the mark phase the live objects are traced by traversing the application's data structures starting from the root set. Live objects are marked, often by toggling a bit in the header of the object. Once the live objects have been made distinguishable from the garbage objects the memory is swept: all the objects that are not marked (the garbage objects) are linked into a free list (Fig. 1.1). There are three major problems with mark-and-sweep collectors: fragmentation, efficiency, and locality. Since the heap is never compacted objects stay in their place. New objects will be allocated in the free space between old objects. This has negative implications for fragmentation and locality of reference. Mark-and-sweep collectors are therefore often considered unsuitable for virtual memory applications. The mark-compact collector remedies the fragmentation and locality problem of the mark-and-sweep collector. After the marking phase it compacts the heap. Live objects are slid to one side of the heap adjacent one to another. After compaction the free space forms one contiguous space at the other side of the heap. The second problem of mark-and-sweep collectors is that the amount of work is proportional to the size of the heap. To find the garbage the complete heap has to be scanned.

Figure 1.2.a
Figure 1.2.b
Figure 1.2.c
Figure 1.2.d
Figure 1.2: The copy collector.

The third type of collectors we discuss is the copy collector. The simple model of a copy collector is the stop-and-copy collector. This collector uses two semi-spaces, the to-space and the from-space. New objects are allocated in the from-space. When no more space is available all live objects are copied to the to-space. This can be done in one traversal. The from-space is now known to contain only garbage, and the roles of the two spaces are inverted. The term scavenging is sometimes used to denote this traversal since only the worthwhile objects amid the garbage are saved. If an object can be reached following two different paths it must be copied only once. To this effect extra space is reserved in the header of an object to store a forwarding pointer. When an object is copied the forwarding pointer is set to the address of the new object. During the scavenging the forwarding pointer is checked first. If the object has already been copied only the pointer that led to the object needs to be updated (Fig. 1.2). The advantages of the copying collector are that allocation can be done very fast. It requires to increment the pointer that points to the available space in the from-space. The copying also results in a defragmentation. Fast algorithms exist to traverse the heap (Cheney's algorithm). The amount of work that needs to be done is proportional to the amount of live objects. Copy collectors do not need to traverse the entire heap, as is the case for mark-and-sweep collectors, which has positive implications for virtual memory applications. A simple way to decrease the frequency of the garbage collection is to increase the size of the heap. Large, long lived objects, however, will be copied at every collection. Also, only one of the two semi-spaces is used which results in a memory usage of less than fifty percent.

Copy collectors move all the live data out of one region of the memory into another region. The last region is known to contain only garbage and can be reclaimed without further examination. The amount of work that needs to be done is proportional to the amount of life objects. The objects do not really have to be in different areas of the memory to apply this strategy. They can be thought of as part of different sets. Initially the to-set is empty and all allocated objects are inserted into the from-set. During the traversal, the live objects are removed from the from-set and inserted into the to-set. All objects remaining in the from-set are known to be garbage and can be linked into the free-set. Instead of using different memory regions, double linked lists can be used. This collection technique is called non-copying implicit collection. With copy collectors it shares the advantage of discarding a sweep phase and only visiting the live object. In addition, it avoids the copying of objects. However, since it performs no compaction, they risk to fragment the heap.

Garbage collecting techniques can be accurate or conservative. They are accurate if they can distinguish pointers from non-pointer data. They are conservative when they only have partial information about the locations of pointers. In that case they must treat every memory word as a potential pointer. They may misidentify words as pointers and wrongly retain garbage. Furthermore, they can not move objects around (as do copy collectors) since this might overwrite data mistakenly interpreted as pointers [PB98]. Conservatism is needed for languages and compilers that do not provide any help for implicit memory management, in particular Pascal, C, and C++. Conservative collectors for C have been proposed. Well known is the collector developed by Boehm [Boe93].

Traditional collectors perform a full scale garbage collection when the application runs out of memory. During this period the application is interrupted. This reduces the responsiveness of a system. In the next section we consider incremental collectors. They improve the responsiveness thru a stepwise collection.

1.2.2 Incremental collectors

Incremental collectors interleave small units of garbage collection with small units of program execution. The garbage collection is not performed as one atomic operation while the application is halted. However, the scanning of the root set is normally done as one atomic operation. Incremental copy collectors, for example, start with a flip operation during which the root set is copied into the to-space [Bak78].

The main problem with incremental collectors is that while the collector is tracing out the graph of reachable objects the running application can mutate the graph while the collector is not looking. The running application is therefore called the mutator. An incremental collector must be able to keep track of the changes made to the graph of reachable objects and retrace some if necessary. Collectors that move objects such as the copying collector must also protect the mutator. Which one of the two, the application or the collector, is regarded as a mutator depends on who's side you stand. The two can be regarded as two processes that share changing data while maintaining some kind of consistent view.

Figure 1.3
Figure 1.3: The figure shows a violation of the color invariant after the mutator stored a pointer to C into A and erased the reference from D to C. If the collector is not protected against modifications by the mutator, object C in the figure will be collected and the data structures will be corrupted.

The abstraction of tricolor marking is helpful in understanding incremental garbage collection. Garbage detection algorithms can be described as a process of traversing the graph of reachable objects and coloring them. The objects subject to collection are conceptually colored white, and by the end of the collection the objects that must be retained must be colored black. The garbage detection phase ends when all reachable objects have been traversed and there are no more objects to blacken. In the mark-and-sweep algorithm marked objects are considered black, unreached objects are white. In the copying algorithm the objects in the to-space are black, the unreached objects in the from-space are white (see Fig. 1.1 and 1.2). To understand the interactions between an incremental collector and the mutator it is helpful to introduce a third color, gray, to signify that an object has been reached but that its descendants might not have been. The gray colored objects form a wave front separating the white from the black objects. No direct reference from the black to the white objects should exist. This property is referred to as the color invariant. The mutator moves pointers around and may the gray wave front (Figure 1.3). The mutator corrupts the collectors tracing when 1) it writes a pointer to a white object into a black object, and 2) when it erased the original pointer before the collector sees it. The mutator must prevent the collector somehow when the assumption of the color invariant is violated. There are two basic techniques for the coordination between the two: read barriers and write barriers. Read barriers catch every attempt to read a pointer to a white object. The white object is immediately colored gray. Since the mutator can never access a white object, no pointers to white objects can be stored in black objects. Write barriers catch any attempt to write a pointer into another object. Write barriers come in two flavors. One strategy, snapshot-at-beginning, is to ensure that objects never get lost by preventing pointers to get lost. When a path to an object is changed, a new path is created to ensure that the object will be found. One implementation of a write barrier is to store the original pointer onto a stack for later examination before it is overwritten with a new value. Snapshot-at-beginning techniques retain all the objects that were alive at the beginning of the collection. Another strategy, incremental update collectors, focuses on the writing of pointers into objects that already have been examined by the collector. The collector's view of the reachable data structures is incrementally updated as changes to the object graph are made. Conceptually, whenever a pointer to a white object is written into a black object, the black object or the white object is reverted gray. Incremental update strategies are able to reclaim objects that become garbage during the collection.

The mutator also has to be protected from changes made by the collector. An incremental copying collector must prevent the mutator to access an old copy in the from-heap instead of the new copy in the to-heap. Copying collectors use read barriers to assure the mutator finds the object at the correct location.

Read barriers and write barriers are conceptually synchronization mechanisms. Before the mutator can access or write a location, the collector must perform some bookkeeping. This bookkeeping, in general, requires only a few actions. These operations can be inlined into the mutators code for efficiency. Since write operations occur less frequent than read operations, write barriers are potentially much cheaper.

1.2.3 Generational collectors

Generational collectors try to benefit from the empirically observed property that most allocated objects live for a very short time, while a small percentage of them live much longer. If an object survives one collection, chances are it will survive many collections. If the collector copies objects or compacts the heap, these long lived objects are copied over and over again. Generational techniques split the heap into several sub-heaps and segregates the objects in the sub-heaps according to their age. New objects are allocated in the first generation sub-heap. When the application runs out of memory, only the first generation sub-heap is scanned. The objects that survive one or more collections are moved to the second generation heap. If most objects are indeed short-lived, most of the first generation heap is freed and few objects need copying to the second generation heap. The older generation sub-heaps are scanned less frequently. Since smaller fragments of the heap are scanned and proportionally more storage space is recovered, the overall efficiency of the garbage collector improves.

1.2.4 Dynamic memory management in real-time applications

Real-time applications must be guaranteed sufficient storage space to execute their task. Real-time application design carefully calculates and allocates all storage space statically before execution. Traditional real-time applications, therefore, do not manage their memory usage dynamically. However, when the application deals with problems that are unpredictable in size and complexity, dynamic memory management is unavoidable. The real-time application designer will therefore need to apply real-time design techniques to the memory allocation and deallocation. This implies a number of constraints:

  • The use of CPU and memory of the allocation and deallocation technique must be efficient.

  • The time required to allocate and deallocate memory must be tightly bound.

  • The time required to read/write a heap allocated object must be tightly bound.

  • The application must make sufficient progress and not be interrupted by the collector for too long or too frequent.

  • The availability of storage space must be guaranteed.

The efficiency of the CPU and memory usage of the allocation and deallocation technique is not, strictly speaking, a real-time constraint, but can be decisive in the question whether dynamic memory management is possible or not. The CPU efficiency of dynamic memory management in general greatly depends upon the type of the application. Furthermore, the efficiency of a particular garbage collection technique depends on the targeted platform: CPU-type, amount of available memory, memory-management hardware support, data caches, etc. When comparing explicit and implicit memory management techniques, both seem to be comparably efficient for the CPU usage, although techniques relying on garbage collection require more memory. In an early article, Appel even reported how implicit memory techniques can be faster than traditional techniques [App87]. He comes to this conclusion by making the heap arbitrarily big. Zorn tested six existing, large C programs with four explicit and one implicit memory strategy and concludes that CPU overhead of storage managements is application dependent and that garbage collection compares well with other explicit storage management techniques [Zor93]. The memory usage is another important factor, in particular for embedded systems with limited hardware resources [PB98]. On this point garbage collectors score low. Copy collectors have a memory usage of less than fifty percent and are for this reason alone ruled out for memory scarce systems.

One of the major problems with dynamic memory management for real-time applications is that the time needed to allocate and deallocate memory is unpredictable or has unacceptable worst case values. Nilsen examined the average and worst case delays needed for well-known memory allocation algorithms of several operating systems. The conclusion is clear: dynamic memory allocation, independent of garbage collected or not, represent an potential source for unpredictable timing performance [NG95]. On this point, implicit memory management techniques can offer better performance than traditional allocation techniques. Copy collectors only require the incrementing of a pointer plus a boundary test for the allocation, and deallocation comes for free.

Incremental garbage collection techniques require the synchronization between the collector and mutator. Read or write barriers may include extra instructions for every referencing of a heap object. A range check may be required for every pointer dereferencing. This introduces an extra overhead. Also virtual memory protection can be configured to generate page faults on those pages that require synchronization [AEL88]. This solution offers unacceptable bad worst-case times [WJ93,Nil94]. Incremental copy collectors move objects whenever the mutator references an object in the from-space. Baker's real-time copy collector was designed for Lisp and only handles objects of the same size: two words [Bak78]. It therefore offers predictable performance. However, copying during dereferencing is unacceptable if the objects can be arbitrarily large. Nilsen remedies this problem with a lazy copying: the collector only reserves the space for the copied object and keeps the copying for a later date. In general, though, synchronization reduces the system throughput and in many cases offers bad worst case delays. Wilson & Johnstone's non-copying implicit collector uses a write barrier to synchronize the collector and the mutator, making the coordination costs cheaper and more predictable.

The garbage collection can itself be considered as a real-time task with a fixed upper time limit and a fixed period. There are, however, some atomic operations which the garbage collector must do. The delay caused by these operation might determine the limitations on the real-time guarantees. One such atomic operation is the root set scanning. Copy collectors must perform a flip operation. This may require the copying of the root set (unless objects are copied lazily). This flip operation must also invalidate the memory caches to assure that all stored data is written thru. Another concern is that the application must be guaranteed sufficient progress: even if the interruption of the collector is small and bounded in time, they must not occur too often. The collector must guarantee that for any given increment of computation, a minimum amount of the CPU is always available for the running application [WJ93]. Baker's algorithm, for example, closely integrated program execution and collector actions. The collector's interruptions may cluster together, preventing the application to meet its deadlines.

One of the reasons why real-time developers are reluctant to use dynamic memory management, is that they fear that the heap will get so fragmented that future storage request can not be satisfied. They do not use dynamic memory allocation at all, or use predictable allocation techniques that use a number of fixed size allocation cells to prevent fragmentation at the expense of storage overhead (for example, Kingsley's powers-of-two buddy algorithm [Kin82]). Wilson & Johnstone limit the risk of fragmentation of their real-time non-copying implicit collector by rounding the size of storage space requests up to a power of two, much like the buddy allocation algorithm [WJ93]. Another issue of real-time garbage collection is the guarantee of sufficient progress. The garbage collector has a real-time task of its own: collect the heap before all storage space is allocated. The garbage collection must therefore be able to keep up with the mutator's allocation rate.

Nilsen's study of available techniques [Nil94] concludes that without the assistance of specialized hardware, the existing real-time garbage collection techniques may generalize to soft real-time applications. According to him real-time system designers argue that these techniques do not provide the fine granularity of timing behavior that is required for the creation of reliable real-time systems. Wilson & Johnstone argue that their incremental non-copying implicit garbage collector satisfies hard real-time constraints. We have found no studies that test their claims in fine grained real-time applications. As discussed in section 1.1, real-time design techniques measure the maximum time needed for every instruction. Wilson remarks that this approach unrealistically emphasizes the smallest operations [Wil94]. For some complex tasks thousand of instruction are executed. For most applications, a more realistic requirement for real-time performance is that the application always be able to use the CPU for a given fraction of time at a time scale relevant to the application. Many real-time applications can function correctly if the garbage collector sometimes stops the application, provided that these pauses are not to frequent and too short to be relevant to the applications deadlines.

This overview of garbage collection techniques may give the reader an idea of the complexity of the issue. Real-time garbage collections remains a topic of discussion. Many authors propose real-time garbage collection techniques, but there claims often seem to be valid for particular applications, not for the general case or for heavy duty or short delay real-time systems. We have not found any article that clearly states that it is possible or impossible to implement fine grained real-time garbage collection on commercial uniprocessor systems.

previous up next