
SOFTWARE—PRACTICE AND EXPERIENCE, VOL. 24(6), 527–542 (JUNE 1994)

Memory Allocation Costs in Large C and C++ Programs

DAVID DETLEFS AND AL DOSSER
Systems Research Center, Digital Equipment Corporation, 130 Lytton Avenue, Palo Alto, CA 94301, U.S.A. (e-mail: detlefs@src.dec.com, dosser@zso.dec.com)

AND

BENJAMIN ZORN
Department of Computer Science, Campus Box #430, University of Colorado, Boulder CO 80309-0430, U.S.A. (e-mail: zorn@cs.colorado.edu)

SUMMARY

Dynamic storage allocation is an important part of a large class of computer programs written in C and C++. High-performance algorithms for dynamic storage allocation have been, and will continue to be, of considerable interest. This paper presents detailed measurements of the cost of dynamic storage allocation in 11 diverse C and C++ programs using five very different dynamic storage allocation implementations, including a conservative garbage collection algorithm. Four of the allocator implementations measured are publicly available on the Internet. A number of the programs used in these measurements are also available on the Internet to facilitate further research in dynamic storage allocation. Finally, the data presented in this paper is an abbreviated version of more extensive statistics that are also publicly available on the Internet.
KEY WORDS: Garbage collection; Dynamic storage allocation; Dynamic memory management; Performance evaluation; Conservative collection

(Received 23 August 1993; revised 26 January 1994)

INTRODUCTION

Dynamic storage allocation (DSA) is an important part of many C and C++ programs, including language interpreters, simulators, CAD tools, and interactive applications. Many different algorithms for dynamic storage allocation have been designed, implemented, and compared. In the past, implementation comparisons have most often been based on synthetic allocation behavior patterns, where object lifetime, size, and interarrival time are taken from probability distributions (e.g. References 1 and 2). Recently, Zorn and Grunwald have shown that the use of synthetic behavior patterns may not lead to an accurate estimation of the performance of a particular algorithm.3 The existence of instruction-level profiling tools4,5 has made it possible to directly count the number of instructions required by various allocation algorithms in large, allocation-intensive programs. In this paper, we present a large number of detailed measurements of the performance of five very different dynamic storage allocation implementations in 11 large, allocation-intensive C and C++ programs.

One of the DSA implementations measured is a publicly available conservative garbage collector for C and C++ (BW 2.6 ms/ip).7 Our measurements show that this collector is competitive in both CPU time and memory usage with existing commercial-quality malloc/free allocators. Specifically, this allocator can be used to replace the operating system calls to malloc/free without any modifications to the program source code, and is compatible with both C and C++ programs. We conclude that conservative garbage collection is a competitive alternative to malloc/free implementations, and programmers should keep this technology in mind when building allocation-intensive programs.

The results of this paper extend and complement the C-program measurements of Zorn.6 The purpose of this paper is to make additional detailed measurements available to a broad audience. To allow other researchers to reproduce our results, we have made the tested versions of a number of our test programs available on the Internet. Specifically, the programs Perl, Ghost, Make, Espresso, Ptc, Gawk, and Cfrac are available via anonymous FTP from the machine ftp.cs.colorado.edu in the directory pub/cs/misc/malloc-benchmarks. A README file in that directory describes how these benchmarks were used to gather the statistics presented, and each benchmark includes a number of test inputs. Furthermore, four of the five allocators measured are publicly available and can also be obtained via the Internet.

PROGRAMS

The programs we measured were drawn from a wide variety of application areas and were written in C and C++. Many of the programs measured are publicly available, whereas others are proprietary. These programs have been used in a variety of dynamic storage allocation studies (e.g. References 3, 8, and 9). All the programs measured make explicit calls to malloc and free to allocate and deallocate heap storage. Some of the programs, notably Sis, also make a relatively large number of calls to the realloc function. Table I summarizes the functionality of all of the programs that we measured, and Table II summarizes the allocation behavior of those programs. In these and all subsequent tables, the programs are ordered by approximate size in lines of source code.

Table I. General information about the test programs

Sis (C): SIS, version 1.1, is a tool for synthesis of synchronous and asynchronous circuits. It includes a number of capabilities such as state minimization and optimization. The input used in the run was the full simplification of an exclusive-lazy-and circuit.

Geodesy (C++): Geodesy, Release 1.1, is a programming language debugger. The input to this program is a C++ compiler front end. The operations performed include setting breakpoints, examining the stack, and continuing.

Ild (C++): Ild is an incremental loader. The input to the program involved incrementally loading 16 saved versions of a set of 26 object files.

Perl (C): Perl 4.10 is a publicly available report extraction and printing language commonly used on UNIX systems. The input script formatted the words in a dictionary into filled paragraphs.

Xfig (C): Xfig, version 2.1.1, is an interactive drawing program. The test case used included the creation of a large number of circles and splines that were duplicated, resized, and reshaped.

Ghost (C): GhostScript, version 2.1, is a publicly available interpreter for the PostScript page-description language. This execution of GhostScript did not run as an interactive application as it is often used, but instead was executed with the NODISPLAY option that simply forces the interpretation of the PostScript (without displaying the results). The input file was a 126-page user manual.

Make (C): GNU make, version 3.62, is a version of the common 'make' utility used on UNIX. The input set was the makefile of another large application.

Espresso (C): Espresso, version 2.3, is a logic optimization program. The input file is an example provided with the release code.

Ptc (C): PTC is a Pascal to C translator. The input file was a 19,500 line Pascal program (mf2psv.p) that is part of the TEX release.

Gawk (C): GNU Awk, version 2.11, is a publicly available interpreter for the AWK report and extraction language. The input script formatted words in a dictionary.

Cfrac (C): CFRAC, version 2.00, is a program to factor large integers using the continued fraction method. The input was a 35-digit product of two large prime numbers.

Table II. Performance information about the memory allocation behavior of each of the test programs. Total bytes and Total objects refer to the total bytes and objects allocated by each program. Average size shows the average size of the objects allocated. Maximum bytes and Maximum objects show the maximum number of bytes and objects, respectively, that were allocated by each program at any one time. Instructions executed shows the total instructions executed by the program using the Ultrix allocator.

Program    Lines of  Instructions  Total     Total       Average  Maximum  Maximum  Allocation
           source    executed      objects   bytes       size     bytes    objects  rate
                     (x10^6)       (x10^3)   (x10^3)     (bytes)  (x10^3)  (x10^3)  (Kbytes/s)
Sis        172000    64794.1       63395     15797173    249.2    1932.2   48.5     4120.8
Geodesy     82500     2648.1        2517     42152       16.7     3880.6   113.8    324.3
Ild         36000      381.3          33     24829       752.4    1278.7   2.1      220.3
Perl        34500     1091.0        1604     34089       21.3     116.4    2.3      714.2
Xfig        30500       52.4          25     1852        72.7     1129.3   19.8     372.1
Ghost       29500     1196.5         924     89782       97.2     2129.0   26.5     1861.7
Make        21000       53.7          23     539         23.0     208.1    10.4     282.5
Espresso    15500     2400.0        1675     107062      63.9     280.1    4.4      1497.1
Ptc          9500      353.9         103     2386        23.2     2385.8   102.7    202.4
Gawk         8500      957.3        1704     67559       39.6     41.0     1.6      2050.1
Cfrac        6000      202.5         522     8001        15.3     21.4     1.5      1145.9

ALLOCATORS

The allocators we measured are summarized in Table III. Based on our experience, these allocators implement some of the most efficient dynamic storage allocation algorithms currently available. In particular, they are significantly faster than both the Cartesian tree implementation and the first-fit implementation measured by Zorn.6 For more details about the implementations measured and the measurement methods used, the reader is referred to other papers; in particular, the Gnu, G++, and Qf allocators are described in more detail in References 8 and 9. Other recent papers comparing the performance of various aspects of dynamic storage allocation also describe these algorithms, including the ones used in this paper, and we refer the interested reader to those papers.

The conservative collection allocator (BW 2.6 ms/ip) differs significantly from the other allocators in that programmers using it are not required to call free. Instead, the allocator periodically determines what heap objects are no longer in use and automatically frees those objects. As such, this allocator provides significant advantages to users that are not emphasized at all in the performance measurements provided in this paper. The BW 2.6 ms/ip allocator is described in detail by Boehm and Weiser7 and summarized by Zorn.6 With various other authors, Boehm describes a number of related versions of this collector.10,11 Black-listing, an enhancement present in version 2.6 of the collector, is described by Boehm,11 and the most recent version of the collector is also described in that paper.

Furthermore, the BW 2.6 ms/ip allocator in these measurements is placed at somewhat of a disadvantage, because all the application programs measured were written with explicit calls to malloc and free. Specifically, while the allocator provides a malloc_atomic function for allocating objects that do not contain pointers, these programs do not take advantage of that function. Other operations that the test programs perform for the sole purpose of allowing them to correctly call free (e.g. maintaining object reference counts) are also unnecessary when the BW 2.6 ms/ip allocator is used.
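The collector's interface can also be used directly. The following is a minimal sketch of that style of use, assuming the gc.h API of the Boehm collector (GC_INIT, GC_malloc, GC_malloc_atomic); in the measurements reported here the collector was instead linked in as a transparent malloc/free replacement, so the test programs needed no such source changes.

    /* Sketch: allocating from a conservative collector directly.      */
    #include <string.h>
    #include <gc.h>

    struct node {
        struct node *next;
        char        *name;
    };

    int main(void)
    {
        GC_INIT();                       /* initialize the collector    */
        struct node *head = NULL;
        for (int i = 0; i < 1000; i++) {
            /* Objects from GC_malloc are scanned for pointers.         */
            struct node *n = GC_malloc(sizeof *n);
            /* GC_malloc_atomic promises the object holds no pointers,
             * so the collector never scans it; the measured programs
             * did not take advantage of this function.                 */
            n->name = GC_malloc_atomic(16);
            strcpy(n->name, "payload");
            n->next = head;
            head = n;
        }
        /* No free() calls: once the list becomes unreachable, a later
         * collection reclaims every node automatically.                */
        head = NULL;
        return 0;
    }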

Table III. General information about the allocators. All the allocators except BW 2.6 ms/ip are described in more detail in Reference 9.

Ultrix: A variant of the malloc implementation, written by Chris Kingsley, that is supplied with the Berkeley 4.2 BSD Unix release. It is not publicly available, but comes with the DEC Ultrix operating system.

Gnu: A hybrid first-fit/segregated algorithm written by Mike Haertel (version dated 930716). It is an ancestor/sibling of the malloc used in the GNU C library, but is smaller and faster than the GNU version. Contact person: Mike Haertel (mike@cs.uoregon.edu). FTP site: anonymous@ftp.cs.uoregon.edu:pub/mike/malloc.tar.gz

G++: An enhancement of the first-fit roving pointer algorithm using bins of different sizes.12 It is distributed with libg++ (through version 2.5) and is also available separately. Contact person: Doug Lea (dl@g.oswego.edu). FTP site: anonymous@g.oswego.edu:pub/misc/malloc.c

Qf: An implementation of Weinstock and Wulf's fast segregated-storage algorithm based on an array of free lists.13 Qf is a hybrid algorithm that allocates small and large objects in different ways; large objects are handled by a general algorithm (in this case, G++). Contact person: Dirk Grunwald (grunwald@cs.colorado.edu). FTP site: anonymous@ftp.cs.colorado.edu:pub/cs/misc/qf.c

BW 2.6 ms/ip: Version 2.6 of the Boehm-Demers-Weiser conservative garbage collector.7,10,11 For the measurements collected, the definitions of MERGE_SIZES (ms) and ALL_INTERIOR_POINTERS (ip) were enabled. Contact person: Hans Boehm (Hans_Boehm.PARC@xerox.com). FTP site: anonymous@arisia.xerox.com:/pub/russell/gc.tar.Z

RESULTS

The results presented were gathered using a variety of measurement tools on a DECstation 5000/240 with 112 megabytes of memory. The measurements concern the CPU overhead (in terms of total execution time and time spent in allocation routines) and the memory usage of the various combinations of allocators and programs. Instruction counts were gathered by instrumenting the programs with Larus' QPT tool,14 which presents per-procedure instruction counts with an output format similar to that of gprof.15 Program execution time was measured using the Unix C-shell built-in 'time' command. The measurement of each program's live data was gathered using a modified version of malloc/free, and allocator maximum heap sizes were measured using a modified version of the Unix sbrk system call.

In all of the tables presented, we indicate both the absolute performance of each allocator and the relative performance of each allocator compared to the Ultrix allocator. The Ultrix allocator was chosen as the baseline for comparison because it is a commercially implemented allocator distributed with a widely used operating system.

Tables IV and V show how many instructions, on average, each program/allocator required to perform the malloc and free operations, respectively. These two tables should be used only for comparing the explicit malloc/free implementations, as substantial overhead in the BW 2.6 ms/ip allocator, resulting from garbage collections, sometimes occurs in the realloc routine. One should note that the overhead of an allocator has a per-object-allocated and a per-byte-allocated component; for the measurements collected, the per-object component dominates. Note also that outliers for BW 2.6 ms/ip, such as Ild, which allocates a small number of relatively large objects, would be less unusual if these tables showed instructions per allocated byte.

In Table V, we also see that many of the allocators perform a constant number of instructions in free. The Ultrix allocator requires 18 instructions to place the freed objects on the appropriate free list. The G++ allocator requires only eight instructions by deferring the size determination of the freed objects until a subsequent malloc; thus, G++ requires fewer instructions per free than Ultrix but more instructions per malloc. The BW 2.6 ms/ip allocator requires only two instructions per free because we have intentionally caused frees for this allocator to have no effect. In fact, the Boehm-Weiser collector does support explicit programmer frees, but we disabled them to observe the performance of the collection algorithm.

Table VI shows the average number of instructions per object allocated that each program/allocator spent doing storage allocation, that is, the total instructions in malloc, free, realloc, and any related routines, divided by the total number of objects allocated. This table should be used to compare the per-object overhead of all the allocators, including BW 2.6 ms/ip.
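To make the constant-cost behavior of free concrete, the sketch below shows the segregated free-list structure that underlies allocators like Qf and Ultrix. The names qf_malloc and qf_free are hypothetical, and the measured allocators differ in important details (Ultrix rounds requests to powers of two, and Qf passes large requests to a general allocator), but the free path illustrates why freeing can cost only a handful of instructions: the block is simply pushed onto the free list for its size class.

    /* Hypothetical segregated free-list allocator (simplified). */
    #include <stddef.h>
    #include <unistd.h>

    #define NCLASSES 32
    #define GRAIN    8              /* size classes in 8-byte steps */

    struct header {
        size_t         cls;         /* size class, kept while allocated */
        struct header *next;        /* free-list link, used while free  */
    };
    static struct header *freelist[NCLASSES];

    void *qf_malloc(size_t n)
    {
        size_t c = (n + GRAIN - 1) / GRAIN;
        if (c >= NCLASSES)
            return NULL;            /* real Qf: fall back to G++ here */
        struct header *b = freelist[c];
        if (b)                      /* fast path: pop the class's list */
            freelist[c] = b->next;
        else {                      /* slow path: carve fresh heap     */
            b = sbrk(sizeof *b + c * GRAIN);
            if (b == (void *)-1)
                return NULL;
            b->cls = c;
        }
        return b + 1;
    }

    void qf_free(void *p)
    {
        /* Constant work: locate the header, push onto the free list. */
        struct header *b = (struct header *)p - 1;
        b->next = freelist[b->cls];
        freelist[b->cls] = b;
    }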

Table IV. Absolute and relative instructions per call to malloc. Relative is relative to Ultrix (= 1.00).

Program    Ultrix       Gnu          G++          Qf           BW 2.6 ms/ip
           abs   rel    abs   rel    abs   rel    abs   rel    abs    rel
Sis         60   1.00   101   1.68    84   1.40   192   3.20    489   8.15
Geodesy     46   1.00    81   1.76    53   1.15    29   0.63    179   3.89
Ild         57   1.00    92   1.61    99   1.74    76   1.33   3544  62.18
Perl        46   1.00    82   1.78    44   0.96    37   0.80    207   4.50
Xfig        59   1.00   105   1.78    62   1.05    89   1.51    543   9.20
Ghost       56   1.00   101   1.80    71   1.27    74   1.32    614  10.96
Make        48   1.00    91   1.90    74   1.54    43   0.90    296   6.17
Espresso    50   1.00    90   1.80    77   1.54    40   0.80    208   4.16
Ptc         55   1.00   101   1.84    66   1.20    47   0.85    456   8.29
Gawk        49   1.00    84   1.71    54   1.10    39   0.80     83   1.69
Cfrac       47   1.00    86   1.83    30   0.64    33   0.70    106   2.26
Average     52   1.00    92   1.77    65   1.24    64   1.17    611  11.04

Table V. Absolute and relative instructions per call to free. Relative is relative to Ultrix (= 1.00).

Program    Ultrix       Gnu          G++          Qf           BW 2.6 ms/ip
           abs   rel    abs   rel    abs   rel    abs   rel    abs   rel
Sis         18   1.00    71   3.94     8   0.44    53   2.94     2   0.11
Geodesy     18   1.00    66   3.67     8   0.44    25   1.39     2   0.11
Ild         18   1.00    74   4.11     8   0.44    42   2.33     2   0.11
Perl        18   1.00    82   4.56     8   0.44    31   1.72     2   0.11
Xfig        18   1.00    71   3.94     8   0.44    39   2.17     2   0.11
Ghost       18   1.00    88   4.89     8   0.44    56   3.11     2   0.11
Make        18   1.00    83   4.61     8   0.44    28   1.56     2   0.11
Espresso    18   1.00    84   4.67     8   0.44    32   1.78     2   0.11
Ptc         18   1.00   113   6.28     8   0.44    90   5.00     2   0.11
Gawk        18   1.00    81   4.50     8   0.44    32   1.78     2   0.11
Cfrac       18   1.00    83   4.61     8   0.44    26   1.44     2   0.11
Average     18   1.00    81   4.53     8   0.44    41   2.29     2   0.11

Table VI. Absolute and relative instructions per object allocated. Relative is relative to Ultrix (= 1.00).

Program    Ultrix       Gnu          G++          Qf           BW 2.6 ms/ip
           abs   rel    abs   rel    abs   rel    abs   rel    abs    rel
Sis         78   1.00   172   2.21    92   1.18   246   3.15    655   8.40
Geodesy     63   1.00   144   2.29    61   0.97    53   0.84    194   3.08
Ild         74   1.00   161   2.18   107   1.45   115   1.55   3806  51.43
Perl        64   1.00   163   2.55    52   0.81    68   1.06    229   3.58
Xfig        63   1.00   119   1.89    64   1.02    97   1.54    671  10.65
Ghost       73   1.00   187   2.56    79   1.08   129   1.77    670   9.18
Make        58   1.00   137   2.36    78   1.34    59   1.02    319   5.50
Espresso    68   1.00   174   2.56    85   1.25    72   1.06    318   4.68
Ptc         55   1.00   101   1.84    66   1.20    47   0.85    485   8.82
Gawk        67   1.00   165   2.46    62   0.93    71   1.06    349   5.21
Cfrac       65   1.00   169   2.60    38   0.58    59   0.91    110   1.69
Average     66   1.00   154   2.32    71   1.07    92   1.35    710  10.20
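The per-object numbers in Table VI can be roughly cross-checked against Tables IV and V: for an explicit allocator, instructions per object are approximately the per-malloc cost plus the per-free cost scaled by the fraction of allocated objects that are eventually freed. The sketch below works this through for Sis under Ultrix; the frees/objects ratio of 1.0 is an inference from the tables, not a figure reported above.

    /* Back-of-the-envelope check of Table VI from Tables IV and V. */
    #include <stdio.h>

    int main(void)
    {
        double malloc_instr = 60.0;  /* Table IV: Sis under Ultrix      */
        double free_instr   = 18.0;  /* Table V:  Ultrix, constant      */
        double free_ratio   = 1.0;   /* assumed: Sis frees ~every object */

        double per_object = malloc_instr + free_ratio * free_instr;
        printf("predicted instr/object: %.0f\n", per_object); /* 78 */
        /* Table VI indeed reports 78 instructions per object for
         * Sis/Ultrix; programs that free little, such as Ptc, have a
         * per-object cost close to their per-malloc cost alone.       */
        return 0;
    }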

Each program spends a certain number of instructions outside the storage allocation routines doing program-specific work. This number (we will call it the 'application instructions') remains constant across all the allocators used. We call the instructions spent doing storage allocation the 'allocation instructions'. Table VII shows what fraction the allocation instructions are of the application instructions. This percentage provides a good measure for comparing the relative CPU overhead of the different algorithms. As is clear from the table, the Ultrix and G++ allocators have the best performance and the BW 2.6 ms/ip collector has the worst performance overall.

Table VIII shows the absolute and relative execution times of the different program/allocator combinations. The times presented are the sums of the user and system times reported by the 'time' command. These data were collected from a single run of each program/allocator, and thus some variation in execution time, which has not been measured, should be expected. However, our experience collecting similar results indicates that the variation observed between different runs with the same input is not a significant fraction of the total execution time.

In comparing Tables VII and VIII we see some unexpected results, specifically with the Sis program: the G++ allocator spends more of its instructions executing storage allocation code but executes faster than the Ultrix allocator. This behavior is more understandable when we look more closely at the separate components of the execution times as reported by the 'time' utility (a breakdown of these times into user and system components is publicly available on the Internet). The user time of Sis Ultrix is 3675.7 seconds, whereas the system time is 157.9 seconds. The user time of Sis G++ is 3688.9 seconds, whereas the system time is just 3.3 seconds. Thus, we see that in user time the Ultrix allocator is the faster allocator, just as the data in Table VII would lead us to believe. The added overhead of Sis Ultrix is caused by additional time being spent in the operating system. From our measurements, we have determined that this overhead is not directly attributable to increased page faults in the Ultrix allocator. Unfortunately, we currently do not have the tools necessary to definitively determine the cause of the added system overhead.

Table IX shows the maximum size of the heap for each program/allocator, as measured by calls to the Unix operating system sbrk system call. To measure this value, an instrumented version of sbrk that maintained a high-water mark was used. As is clear from the table, Gnu, G++, and Qf are all quite space efficient, whereas Ultrix and especially BW 2.6 ms/ip require more space. Although the average heap expansion of the BW 2.6 ms/ip allocator is almost 2.5 times that of the Ultrix allocator, we also note that three programs, namely Perl, Gawk, and Cfrac, contribute significantly to this average. All of these programs require a relatively small heap, as indicated in Table II (i.e. 116, 41, and 21 thousand bytes, respectively). Because the BW 2.6 ms/ip allocator allocates space in units of 64 kilobytes, the fragmentation of these programs is somewhat exaggerated. If these programs are not considered in the average, the average heap expansion of the BW 2.6 ms/ip allocator is only 1.53 times that of the Ultrix allocator.

Finally, Table X shows the maximum amount of fragmentation that occurred in each program/allocator combination. In this case, fragmentation was measured as the ratio between the maximum heap size (as shown in Table IX) and the maximum bytes that were alive in each program at any time (shown in Table II).
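The following sketch is a hypothetical reconstruction of the two instrumentation hooks just described: malloc/free wrappers that track live bytes and objects (the basis of the maxima in Table II), and an sbrk wrapper that maintains a high-water mark for the heap (the basis of Table IX). The function names are illustrative only.

    /* Sketch of the measurement instrumentation (reconstruction). */
    #include <stddef.h>
    #include <stdint.h>
    #include <stdlib.h>
    #include <unistd.h>

    static void  *heap_high;                 /* highest break observed  */
    static size_t live_bytes, live_objects;
    static size_t max_live_bytes, max_live_objects;

    void *counting_sbrk(intptr_t incr)
    {
        void *p = sbrk(incr);
        if (p != (void *)-1) {
            void *brk_now = (char *)p + incr;
            if ((char *)brk_now > (char *)heap_high)
                heap_high = brk_now;         /* new high-water mark     */
        }
        return p;
    }

    void *counting_malloc(size_t n)
    {
        /* Prepend a header so the size is known at free time. */
        size_t *p = malloc(sizeof(size_t) + n);
        if (!p) return NULL;
        *p = n;
        live_bytes += n;
        live_objects++;
        if (live_bytes > max_live_bytes)     max_live_bytes = live_bytes;
        if (live_objects > max_live_objects) max_live_objects = live_objects;
        return p + 1;
    }

    void counting_free(void *q)
    {
        if (!q) return;
        size_t *p = (size_t *)q - 1;
        live_bytes -= *p;
        live_objects--;
        free(p);
    }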

Table VII. Percent storage allocation instructions/application instructions. Relative is relative to Ultrix (= 1.00).

Program    Ultrix       Gnu           G++           Qf            BW 2.6 ms/ip
           %     rel    %      rel    %      rel    %      rel    %      rel
Sis        8.2   1.00   18.2   2.22    9.8   1.20   26.0   3.17   69.3   8.45
Geodesy    6.4   1.00   14.5   2.27    6.1   0.95    5.3   0.83   19.6   3.06
Ild        0.6   1.00    1.6   2.67    0.9   1.50    0.9   1.50   37.1  61.83
Perl       9.4   1.00   26.1   2.78    7.5   0.80   10.8   1.15   33.7   3.59
Xfig       3.2   1.00    6.6   2.06    3.5   1.09    4.9   1.53   34.5  10.78
Ghost      6.0   1.00   15.3   2.55    6.5   1.08   10.5   1.75   54.8   9.13
Make       2.6   1.00    6.2   2.38    3.5   1.35    2.6   1.00   14.3   5.50
Espresso   5.0   1.00   12.8   2.56    6.2   1.24    5.3   1.06   23.2   4.64
Ptc        1.6   1.00    3.0   1.88    1.9   1.19    1.4   0.87   14.3   8.94
Gawk      13.6   1.00   33.6   2.47   11.9   0.88   14.5   1.07   65.5   4.82
Cfrac     20.1   1.00   52.3   2.60   11.7   0.58   18.3   0.91   34.1   1.70
Average    6.97  1.00   17.29  2.40    6.32  1.08    9.14  1.35   36.40 11.13

Table VIII. Total program execution time (seconds). Relative is relative to Ultrix (= 1.00).

Program    Ultrix         Gnu            G++            Qf             BW 2.6 ms/ip
           (s)     rel    (s)     rel    (s)     rel    (s)     rel    (s)     rel
Sis        3833.5  1.00   3932.0  1.03   3692.1  0.96   4581.2  1.20   5867.3  1.53
Geodesy     516.1  1.00    518.2  1.00    504.3  0.98    494.1  0.96    520.8  1.01
Ild          74.2  1.00     76.1  1.03     75.3  1.01     75.9  1.02     75.6  1.02
Perl         47.7  1.00     50.1  1.05     49.8  1.04     44.3  0.93     60.3  1.26
Xfig          7.0  1.00      7.1  1.02      7.3  1.05      7.3  1.04      6.7  0.96
Ghost        48.2  1.00     57.0  1.18     48.2  1.00     50.2  1.04     69.8  1.45
Make          1.9  1.00      2.0  1.03      1.8  0.92      2.0  1.07      2.1  1.11
Espresso     71.5  1.00     74.0  1.03     72.6  1.02     72.5  1.01     82.1  1.15
Ptc          11.8  1.00     12.3  1.04     13.2  1.12     11.9  1.01     13.3  1.13
Gawk         33.0  1.00     32.1  0.98     31.2  0.95     27.6  0.84     51.9  1.57
Cfrac         7.3  1.00      8.7  1.19      6.6  0.90      7.0  0.95      8.3  1.13
Average     423    1.00    434    1.05    409    1.00    489    1.01    614    1.21

Table IX. Maximum heap size (Kbytes). Relative is relative to Ultrix (= 1.00).

Program    Ultrix        Gnu           G++           Qf            BW 2.6 ms/ip
           (KB)   rel    (KB)   rel    (KB)   rel    (KB)   rel    (KB)   rel
Sis        3387   1.00   2363   0.70   2776   0.82   8608   2.54   7532   2.22
Geodesy    6487   1.00   4595   0.71   4928   0.76   4864   0.75   7348   1.13
Ild        1800   1.00   1392   0.77   1520   0.84   1456   0.81   3572   1.98
Perl        226   1.00    162   0.72    144   0.64    144   0.64    616   2.73
Xfig       1784   1.00   1582   0.89   1552   0.87   1576   0.88   2436   1.37
Ghost      3541   1.00   2837   0.80   2632   0.74   2408   0.68   5268   1.49
Make        390   1.00    306   0.78    328   0.84    336   0.86    584   1.50
Espresso    792   1.00    340   0.43    320   0.40    408   0.52   1188   1.50
Ptc        3438   1.00   3414   0.99   3360   0.98   3144   0.91   3448   1.00
Gawk         79   1.00     83   1.05     64   0.81     64   0.81    352   4.43
Cfrac        64   1.00     64   1.00     48   0.75     64   1.00    504   7.87
Average    1999   1.00   1558   0.80   1607   0.77   2097   0.95   2986   2.48

Table X. Heap expansion due to fragmentation (maximum heap size divided by maximum live bytes). Relative is relative to Ultrix (= 1.00).

Program    Ultrix        Gnu           G++           Qf            BW 2.6 ms/ip
           frag   rel    frag   rel    frag   rel    frag   rel    frag    rel
Sis        1.79   1.00   1.25   0.70   1.47   0.82   4.56   2.54    3.99   2.22
Geodesy    1.71   1.00   1.21   0.71   1.30   0.76   1.28   0.75    1.94   1.13
Ild        1.44   1.00   1.11   0.77   1.22   0.84   1.17   0.81    2.86   1.98
Perl       1.99   1.00   1.42   0.72   1.27   0.64   1.27   0.64    5.42   2.73
Xfig       1.62   1.00   1.43   0.89   1.41   0.87   1.43   0.88    2.21   1.37
Ghost      1.70   1.00   1.36   0.80   1.27   0.74   1.16   0.68    2.53   1.49
Make       1.92   1.00   1.51   0.78   1.61   0.84   1.65   0.86    2.87   1.50
Espresso   2.90   1.00   1.24   0.43   1.17   0.40   1.49   0.52    4.34   1.50
Ptc        1.48   1.00   1.47   0.99   1.44   0.98   1.35   0.91    1.48   1.00
Gawk       1.98   1.00   2.08   1.05   1.60   0.81   1.60   0.81    8.80   4.43
Cfrac      3.06   1.00   3.06   1.00   2.29   0.75   3.06   1.00   24.09   7.87
Average    1.96   1.00   1.56   0.80   1.46   0.77   1.82   0.95    5.50   2.48
Fragmentation, as measured in Table X, is the result of many different causes. Unfortunately, we were not able to identify the specific sources of fragmentation in the programs and allocators measured; in the future, we may instrument the allocators to collect more specific information about the sources of fragmentation. Some of these causes are summarized below:

1. Internal fragmentation. This fragmentation results from rounding up allocation request sizes to common sizes (e.g. some algorithms round requests up to sizes that are powers of two), even when all the storage allocated may not be used. Clearly, if request sizes are rounded up, then the result is more internal fragmentation and less external fragmentation.

2. External fragmentation. This fragmentation results from the available free space being split into pieces that are too small to satisfy most requests, thus making the space unusable.

3. Overhead from the allocator data structures. For example, many allocation algorithms require additional words of storage per object to store information such as the object's status (allocated or free) and the object's size.

4. Additional storage allocated due to the heap-expansion granularity. As mentioned above, the BW 2.6 ms/ip allocator grows the heap in units of 64 kilobytes; the other allocators typically have a smaller unit of expansion (e.g. 8 kilobytes).

In addition to the forms of fragmentation that are common to all allocation algorithms, the BW 2.6 ms/ip allocator incurs additional sources of fragmentation, summarized below:

1. Storage required to prevent frequent garbage collections. The CPU overhead of the BW 2.6 ms/ip allocator is reduced if garbage collections occur less frequently. However, since no storage is reclaimed between collections, less frequent collections result in a larger heap. Another way to view this form of fragmentation is to observe that the collector only reclaims inaccessible storage when the algorithm runs, and as such unreachable objects are not available for reuse until the collector is invoked.

2. Additional storage that is retained due to collector conservatism. Sometimes integer variables contain values that appear to be pointers, and the collector conservatively preserves such objects (see the sketch following this list).

3. Storage that is 'freed' but remains accessible. In some cases, a programmer will correctly free an object that still has pointers pointing to it. In such cases, the conservative collector (as used in this paper) will ignore the free and reclaim the object only after the last pointer to the object has been removed.

4. Black-listed blocks in the heap. The BW 2.6 ms/ip allocator uses a heuristic called 'black-listing' to reduce the amount of incorrectly retained garbage. As a result, some of the blocks in the heap are deemed inappropriate for heap allocation and contribute to overall storage usage.
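The second and fourth causes above stem from the conservative pointer test, which the following sketch illustrates (this is an illustration of the idea, not the collector's actual code): any word whose value falls inside the heap must be treated as a pointer, so an unlucky integer can retain an otherwise dead object, and black-listing works by avoiding allocation at addresses previously observed as such false pointers.

    /* Why conservatism retains storage: the pointer test, sketched. */
    #include <stdint.h>
    #include <stdio.h>

    static char *heap_lo, *heap_hi;    /* bounds of the collected heap */

    /* A word is treated as a pointer if it falls inside the heap. */
    static int looks_like_pointer(uintptr_t word)
    {
        return word >= (uintptr_t)heap_lo && word < (uintptr_t)heap_hi;
    }

    int main(void)
    {
        static char heap[1 << 16];     /* stand-in for the real heap   */
        heap_lo = heap;
        heap_hi = heap + sizeof heap;

        /* An unlucky integer: its value equals some heap address, so a
         * conservative scan must assume it is a pointer and retain the
         * object it "points" to, even if that object is garbage.      */
        uintptr_t unlucky = (uintptr_t)(heap + 1024);
        printf("%d\n", looks_like_pointer(unlucky));   /* prints 1 */
        return 0;
    }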

SUMMARY

This paper presents detailed measurements of the cost of dynamic storage allocation (DSA) in 11 diverse C and C++ programs using five very different dynamic storage allocation implementations, including a conservative garbage collection algorithm. The measurements include the CPU overhead of storage allocation and the memory usage of the different allocators. All of the DSA implementations we measured are highly efficient, well-crafted programs. Four out of five of these implementations are publicly available on the Internet, and we provide the Internet sites and the contact persons who are responsible for them. Likewise, seven of the eleven programs measured are also available on the Internet, and we provide their location as well. It is our hope that when other researchers implement new algorithms, they will use the programs, allocators, and techniques used in this paper to provide comparable measurements of the new algorithm. We see these programs and allocators not as the final word in allocator benchmarking, but as a first small step along the way.

The data presented in this paper are a subset of data available in textual form on the Internet. These data are available via anonymous FTP from the machine ftp.cs.colorado.edu in the file pub/cs/misc/malloc-benchmarks/SPE-MEASUREMENTS.txt. Further results will be added to this file as they become available. Please feel free to use these data, but we would appreciate your sending one of us e-mail (zorn@cs.colorado.edu) indicating that you intend to use the data and how you intend to use it.

ACKNOWLEDGEMENTS

We would like to thank Hans Boehm, Dirk Grunwald, Michael Haertel, and Doug Lea for all implementing very efficient dynamic storage allocation algorithms and making them available to the public. We also thank them for their comments on drafts of this paper. We would also like to thank the anonymous reviewers for their insightful comments. This material is based upon work supported by the National Science Foundation under Grant No. CCR-9121269 and by Digital Equipment Corporation External Research Grant No. 1580.

REFERENCES

1. David G. Korn and Kiem-Phong Vo, 'In search of a better malloc', Proceedings of the Summer 1985 USENIX Conference, 1985, pp. 489–506.
2. Donald E. Knuth, Fundamental Algorithms, volume 1 of The Art of Computer Programming, chapter 2, pp. 435–451, 2nd edn, Addison Wesley, Reading, MA, 1973.
3. Benjamin Zorn and Dirk Grunwald, 'Evaluating models of memory allocation', ACM Transactions on Modeling and Computer Simulation, 4(1), (1994).
4. Thomas Ball and James R. Larus, 'Optimally profiling and tracing programs', Conference Record of the Nineteenth ACM Symposium on Principles of Programming Languages, Albuquerque, NM, January 1992, pp. 59–70.
5. Unix Manual Page for pixie, Digital Equipment Corporation, ULTRIX V4.2 (rev 96) edition.
6. Benjamin Zorn, 'The measured cost of conservative garbage collection', Software—Practice and Experience, 23(7), 733–756 (1993).
7. Hans-Juergen Boehm and Mark Weiser, 'Garbage collection in an uncooperative environment', Software—Practice and Experience, 18(9), 807–820 (1988).
8. Dirk Grunwald and Benjamin Zorn, 'CustoMalloc: efficient synthesized memory allocators', Software—Practice and Experience, 23(8), 851–869 (1993).
9. Dirk Grunwald, Benjamin Zorn and Robert Henderson, 'Improving the cache locality of memory allocation', Proceedings of the SIGPLAN'93 Conference on Programming Language Design and Implementation, Albuquerque, NM, June 1993, pp. 177–186.
10. Hans-Juergen Boehm, Alan Demers and Scott Shenker, 'Mostly parallel garbage collection', Proceedings of the SIGPLAN'91 Conference on Programming Language Design and Implementation, Toronto, Canada, June 1991, pp. 157–164.
11. Hans-Juergen Boehm, 'Space efficient conservative garbage collection', Proceedings of the SIGPLAN'93 Conference on Programming Language Design and Implementation, Albuquerque, NM, June 1993, pp. 197–206.
12. Thomas Standish, Data Structures Techniques, Addison-Wesley Publishing Company, 1980.
13. Charles B. Weinstock and William A. Wulf, 'Quickfit: an efficient algorithm for heap storage allocation', ACM SIGPLAN Notices, 23(10), 141–144 (1988).
14. James R. Larus and Thomas Ball, 'Rewriting executable files to measure program behavior', Technical Report 1083, Computer Sciences Department, University of Wisconsin-Madison, Madison, WI, March 1992.
15. Susan L. Graham, Peter B. Kessler and Marshall K. McKusick, 'An execution profiler for modular programs', Software—Practice and Experience, 13, 671–685 (1983).