
SLIMalloc: a Safer, Faster, and more Capable

Heap Allocator
July 3, 2020
Pierre Gauthier
TWD Industries AG
pierre@trustleap.com

Abstract

One of the most puzzling issues when writing computer programs comes from memory allocation
errors or patterns causing instant or delayed crashes, memory leaks, or poor resource
management compromising performance, scalability – and process lifespan. Significant advances
have been made, but performance, security, and convenience are often seen as incompatible
goals. We introduce SLIMalloc, a rewrite of the 2019 secure SlimGuard heap allocator.
SLIMalloc, as compared to the non-secure GLIBC standard allocator and 2019 Microsoft
Research secure Mimalloc, delivers (1) the most advanced security features available, (2) an
unprecedented real-time invalid-pointer detection capability preventing allocator misuse such as
double-free or invalid-free/realloc errors, (3) new troubleshooting features to assist developers
with the exact location of allocation errors, (4) the detection, location and correction of memory
leaks, (5) tracing of allocation calls in third-party code, (6) a much smaller source-code, (7)
higher performance than GLIBC and Mimalloc, (8) a novel architecture making room for
advanced features without compromising performance and scalability, (9) and the automatic
release of memory to the OS during or after heavy workloads. As far as we know, SLIMalloc is
the first scalable allocator able to catch and report invalid pointers in real-time. Speed matters:
memory allocation consumes 7% of all CPU cycles in Google datacenters.

Keywords: IoT, Data-centers, Software Engineering, Troubleshooting, Security and Privacy, System Security, Memory Allocation, Malloc.

1. Introduction

Never before has I.T. been deployed so widely – from ubiquitous IoT to datacenters – have the required skills been so diverse, and have operating systems and runtime libraries been under such pressure to deliver optimal results[6] and resilience across such a wide variety of use cases and exposures to risk[7].

In this context, novel architectures and capabilities are required, as well as carefully implemented tools, so that the cost of poor design does not impact billions of users – and even more devices.

For example, when a developer has made a mistake, with effects eventually seen in his code or in third-party code, a crash-dump is not enough, since it may be the long-deferred consequence of something that (1) could have been detected immediately, (2) reported in the most intelligible manner available at the time, and (3) automatically blocked or corrected when possible.

[Figure: “Profiling a warehouse-scale computer”[6]]

Another major goal is to let operating systems, IT professionals and end-users rely on the
memory allocator to deliver the best possible performance, locality, and most relevant features.

2. The GLIBC Standard Allocator (1987), and Microsoft Research's Mimalloc (2019)

The GLIBC allocator[1] (derived from ptmalloc2, itself derived from dlmalloc) has evolved into a fast and scalable tool, but it still lacks modern security features. Mimalloc[3] is often (but not always: see our KV test) slower than GLIBC, but it offers more security:

(1) The GLIBC allocator's metadata is embedded in the allocated blocks, exposed to accidental errors and to competent opponents. Mimalloc segregates metadata, but at a known offset, which excludes accidental errors while leaving its encoded internal structures exposed to opponents.

(2) GLIBC's canary implementation requires instrumentation, so it is slow, and it uses known values, which helps to fix errors but not to stop attacks. Mimalloc supports canaries in debug mode only.

(3) GLIBC and Mimalloc lack most SLIMalloc features, half inherited from SlimGuard[2]:
randomly segregated metadata, guard-page density, over-provisioning, address randomization,
delayed memory reuse, zeroed freed blocks, allocation location tracing, leaks detection, location
and correction, and the ability to locate, signal and prevent allocation errors in real-time.

(4) GLIBC's allocation error reporting capabilities are primitive. mallopt(M_CHECK_ACTION, 1) did not work as expected (there is no “detailed error message” after the double-free), and the invalid-free error caused a SEGFAULT instead of the expected abort:

--- test double-free() ----------------------------------------------------


--- test free(NULL) -------------------------------------------------------
--- test free(0xbadbeef) --------------------------------------------------
Segmentation fault

The GLIBC allocator statistics are either not thread-capable and limited to the “main arena” (mallinfo and malloc_stats), or not human-readable (malloc_info, which emits XML).

Mimalloc does a bit better but also crashes: it caught the free(0xbadbeef) test only because this
pointer did not use the mimalloc-page alignment, and the realloc test caused a crash without any
mimalloc error message – despite the problem being the same as for free(0xbadbeef):

--- test double-free() ----------------------------------------------------


mimalloc: error: double free detected of block 0x56d60403880 with size 16

--- test free(NULL) -------------------------------------------------------


--- test free(0xbadbeef) --------------------------------------------------
mimalloc: error: trying to free an invalid (unaligned) pointer: 0xbadbeef

--- test realloc(0xbadbeef) -----------------------------------------------


Segmentation fault
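
For reference, here is a minimal sketch of the kind of test driver that produces the outputs above (the actual test program is not listed in this paper; these are plain C calls that deliberately misuse the allocator):

#include <malloc.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
   mallopt(M_CHECK_ACTION, 1); // GLIBC: print a diagnostic, then continue

   puts("--- test double-free() ---");
   char *p = malloc(16);
   free(p);
   free(p);                    // double-free: should be caught, not crash

   puts("--- test free(NULL) ---");
   free(NULL);                 // legal no-op per the C standard

   puts("--- test free(0xbadbeef) ---");
   free((void*)0xbadbeef);     // invalid pointer: should be reported

   puts("--- test realloc(0xbadbeef) ---");
   p = realloc((void*)0xbadbeef, 32); // invalid pointer: same expectation
   return 0;
}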

Yet, usability is also lacking:

(5) With GLIBC, if you need to allocate a new heap dedicated to a given task (for isolation, security, performance, locality, etc.) then you have to write your own. This is a major concern, as dedicated heaps can greatly enhance performance, reduce memory fragmentation, and lower the hurdle of identifying problems hidden within a program-wide set of allocations.

In this regard, Mimalloc does much better and offers mi_heap_new and mi_heap_destroy.

(6) GLIBC's detection of memory leaks via mtrace is cumbersome and slow, and requires malloc API instrumentation and debug symbols, but it produces human-readable output. Again, this is a development feature – not something fast enough to be usable in production.

Mimalloc leaves it as an exercise to the developer via mi_heap_visit_blocks – but, since this comes after the fact, doing so will miss the location of the leaks (the name and/or address of the functions, and the source-code file names, of the code that made these allocations).

(7) The GLIBC malloc custom functions malloc_info, malloc_stats, malloc_usable_size, malloc_trim, mcheck and mtrace are sometimes redundant, not always reliable nor even thread-safe, and some significantly slow down the allocator – on top of a convoluted API and documentation.

The Mimalloc API is more reliable and provides very detailed statistics reporting. Unfortunately,
the Mimalloc features come at a price (for example, Mimalloc's trimming strategy is expensive):

--- Microsoft Research malloc stress-test -----------------------------------
GLIBC
-----------------------------------------------------------------------------
- THREADS:6, SCALE:10%, ITER:1000, LARGE:0, 1-SIZE:0
- total time: 1.936 seconds (hh:mm:ss 00:00:01)
- user CPU time ................ 4.358 sec (0.872 per thread)
- system CPU time .............. 2.515 sec (0.503 per thread)
- VM, current virtual memory ... 442904576 bytes (422.3 MB)
- RSS, current real RAM use .... 2510848 bytes (2.4 MB)
- RSS peak ..................... 9011200 bytes (8.6 MB)
- page reclaims ................ 7290859520 bytes (6.8 GB)
- voluntary context switches ... 8472 (threads waiting, locked)
- involuntary context switches . 6324 (time slice expired)

--- Microsoft Research malloc stress-test -----------------------------------
MIMalloc (secure: 4)
-----------------------------------------------------------------------------
- THREADS:6, SCALE:10%, ITER:1000, LARGE:0, 1-SIZE:0
- total time: 6.708 seconds (hh:mm:ss 00:00:06)
- user CPU time ................ 10.928 sec (2.186 per thread)
- system CPU time .............. 11.193 sec (2.239 per thread)
- VM, current virtual memory ... 308785152 bytes (294.4 MB)
- RSS, current real RAM use .... 1892352 bytes (1.8 MB)
- RSS peak ..................... 14848000 bytes (14.1 MB)
- page reclaims ................ 15408136192 bytes (14.3 GB)
- voluntary context switches ... 1227420 (threads waiting, locked)
- involuntary context switches . 7003 (time slice expired)

3. SLIMalloc Allocator Features

Despite being faster, SLIMalloc offers all the GLIBC and Mimalloc features (and new ones) by design, as options that can be enabled or disabled at run time and on a per-heap basis:

heap->opt.abort      = false; // warn & abort on double/invalid-free (or continue)
heap->opt.canary     = false; // slightly enlarge (small) blocks for canary byte
heap->opt.guardpages = true;  // catch buffer overflows (small blocks)
heap->opt.random     = false; // randomized block addresses (over-provisioning)
heap->opt.reclaim    = false; // @free() release unused OS PAGES to the system
heap->opt.trace      = false; // record functions making/freeing allocations
heap->opt.trim       = true;  // @free() release all unused memory to the system
heap->opt.zeronfree  = false; // useful for short-life confidential data

“Small blocks” (up to 2MB) are picked from areas allocated by mmap. “Large blocks” are
directly allocated by mmap. These options can be enabled for a portion of your application and
disabled later – this is useful to avoid their memory and performance penalty elsewhere.
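
For instance, here is a minimal sketch of that pattern, assuming (as in the heap examples later in this section) that s_heap is the explicit handle of the current thread's default heap, and that password and use_secret are hypothetical application code:

s_heap->opt.zeronfree = true;     // freed blocks will be zeroed from now on

char *secret = strdup(password);  // hypothetical short-lived confidential copy
use_secret(secret);               // hypothetical consumer of the secret
free(secret);                     // the block's contents are wiped here

s_heap->opt.zeronfree = false;    // avoid the cost for the rest of the program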

There are no complex APIs involved in any of the tasks associated with the options above. And these options only marginally slow down the allocator when enabled (we have measured an average 5-15% execution-time increase with the very demanding Microsoft Research malloc stress-test).

Performance matters because some problems can only be experienced at high concurrencies[19].
Therefore, tools aimed at assisting users during a troubleshooting session should not prevent the
inspected program from reaching the state at which trouble is expected. This is even more true for
core system organs, like the system memory allocator.

This is how SLIMalloc compares to GLIBC and Mimalloc, using the features seen earlier:

(1) The allocator metadata is stored at random addresses and the allocated blocks are stored at
unrelated addresses, avoiding accidental errors and making it much easier to keep competent
opponents at bay.

(2) The canary implementation is very fast and uses a different byte value for each block in order
to discourage (instead of invite) abuse of the protection – even for production.

(3) SLIMalloc has inherited some of its security features from SlimGuard, but a very optimized
implementation made it possible to have all these features available at all times (SlimGuard used
#defines instead, which require recompilation and linkage), and to add new desirable features,
including the kind never seen before in allocators – without impacting performance.

Developers can declare custom heaps (not just per-thread heaps), all of which can use distinct
options. This is also true for per-thread default heaps (which can free the memory of another
thread's heap), making them scale ideally on parallelized applications, even with legacy code.

(4) Error reporting, allocation tracing, and memory-leak detection get the most out of the information available in a process:

Executable           source file name   line number   function name   address
exported symbols     yes                no            partial         yes
debug symbols        yes                yes           yes             yes
nothing (stripped)   partial            no            no              yes

And, as a major novel step, SLIMalloc prevents allocation errors in real-time, such as double-free/realloc or invalid-free/realloc, so that the memory allocator will be neither the cause of a crash nor an avenue for hackers to corrupt the allocator metadata.

Of course, developers can still raise a SEGFAULT by dereferencing an invalid pointer, but doing so does not involve the memory allocator.

And, if you have a doubt before dereferencing a pointer, SLIMalloc's isgoodptr and/or isfreeableptr will tell you whether this can safely be done – without impacting performance.
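
A minimal sketch of such a guard (assuming isgoodptr and isfreeableptr take a pointer and return a boolean; the exact prototypes are defined by the SLIMalloc header):

#include <stdio.h>
#include <stdlib.h>

// guard a dubious pointer before using it; isgoodptr()/isfreeableptr()
// are assumed to behave as described above
void guarded_use(char *p)
{
   if(isgoodptr(p))        // p points into allocator-managed memory
      printf("%c\n", *p);  // safe to dereference

   if(isfreeableptr(p))    // p points to a block still in use
      free(p);             // safe to free, exactly once
}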

Mimalloc provides a function called mi_check_owned but, unlike SLIMalloc, it cannot afford to use it internally to prevent allocation errors because, as its documentation states, this is an “expensive function, linear in the pages in the heap” – hence its crashes.

It demonstrates that design and implementation matter, making the difference between
features that can – or cannot – be used. As far as we know, SLIMalloc is the first
scalable memory allocator able to catch and report invalid pointers in real-time.

Here are properly characterized errors handled by SLIMalloc, which prevents the error condition from corrupting data before reporting it in human-readable text:

--- test CANARY: malloc(10), memset(), free() -------------------
(10 bytes requested, 16 allocated, now writing 16 bytes)
> ERROR: heap[test-1] buffer overflow (canary)
ptr:0x800000000
end:0x80000000f block-size:16
in slim.c:320 get_canary()
caller slim.c:514 heap_free()
caller test.c:100 main()

--- test double-free() ------------------------------------------
> ERROR: !sfree(h[test-1] 0x14000000000 sz:2048):double-free
in slim.c:354 mark_free()
caller test.c:194 main()

--- test free(0xbadbeef) ----------------------------------------
> ERROR: !free(h[null] 0xbadbeef):invalid-ptr
in slim.c:346 heap_free()
caller test.c:197 main()

--- test realloc(0xbadbeef) -------------------------------------
> ERROR: !realloc(h[null] 0xbadbeef):invalid-ptr
in slim.c:541 heap_realloc()
caller test.c:204 main()

Exploiting heaps to hack a program requires bugs (without them, programs are not vulnerable). Programming languages predictably allocate variables behind the scenes, in standard libraries or, even worse, as part of the language design (C++, Java, C#, PHP...). Hackers then just have to read the memory layout and find where to alter memory (a block size/address, a function pointer, a return address) to trigger control-flow and code-injection attacks[7, 11, 12, 14, 15, 16]:

1. buffer overflows/underflows, a block is written beyond its size, or before its start
2. use-after-free, a block is freed, maliciously modified, and reused
3. double-free, a block is deleted multiple times, corrupting metadata
4. invalid-free/realloc, a never-allocated block is deleted, corrupting metadata
5. uninitialized-reads, a newly allocated block exposes previously set data
6. format bugs, integer overflows, signedness bugs, bad casts, variadic arguments, etc.

SLIMalloc contributes to the line of (investigation and) defense with[2, 4, 5, 8, 9, 13, 15]:

1. canaries, guard-pages, segregated block size classes, segregated metadata
2. delayed block reuse, over-provisioning, and block address randomization
3. double-free real-time detection, blocking and detailed reporting
4. invalid-free/realloc real-time detection, blocking and detailed reporting
5. zero-on-free, and blocked access to unallocated areas restrict the surface of vulnerability
6. refresh of memory areas used by the heaps, destruction/reconstruction of new heaps,
ability to discard invalid pointers in real-time – a feature rarely implemented in software
(too slow) but which helps to prevent accidental errors and planned abuses (as previously
seen with the first GLIBC and Mimalloc tests at the beginning of this document).

SLIMalloc reports memory usage and block-size usage without malloc/free overhead. We use the
list of blocks currently in use, and the class-size area pointer (anything below this pointer has
been used in the past – or reserved for use if block address randomization is enabled):

allocated block-sizes [in-use / used]:
16[0 / 1]
2,048[0 / 1]
2,097,152[0 / 1]

(5) New heaps dedicated to a given task are trivial to create and use (see the heap0.c example) and can be given a name to identify them in error messages or during tracing (as in the above tests for canary overwrite, double-free, and invalid-free/realloc).

char *p = malloc(size);          // using the implicit per-thread default heap
heap_stats(s_heap, 0,0,0, true); // s_heap: explicit per-thread default heap

// custom heap (can be shared by threads)
heap_t heap = { .options = GUARDPAGES | ABORT, .name = "custom-1" },
     *h = &heap;                 // it can free() blocks of other heaps (and vice-versa)
p = heap_malloc(h, size);        // allocate block
heap_stats(h, 0,0,0, true);      // get heap statistics

Given its crucial value for application development, it is beyond understanding that a venerable
allocator like GLIBC (which is 33 years old) does not offer the ability to create custom heaps.

(6) SLIMalloc can easily detect, locate and fix memory leaks – even in system libraries:

--- Microsoft Research malloc stress-test ----------------------------------------
SLIMalloc heap[default].opt(40): guardpages trace trim
----------------------------------------------------------------------------------
- THREADS:6, SCALE:10%, ITER:1000, LARGE:0, 1-SIZE:0
> 4 GLIBC memory leak(s) detected:
1.1 KB in 4 small-block(s)
calloc(0x7c8800000240, 288) 0x7f0bdf9e7ee5 ld-linux-x86-64.so _dl_allocate_tls()
calloc(0x7c8800000360, 288) 0x7f0bdf9e7ee5 ld-linux-x86-64.so _dl_allocate_tls()
calloc(0x7c8800000480, 288) 0x7f0bdf9e7ee5 ld-linux-x86-64.so _dl_allocate_tls()
calloc(0x7c88000005a0, 288) 0x7f0bdf9e7ee5 ld-linux-x86-64.so _dl_allocate_tls()
leaked blocks are freed now
- total time: 1.383 seconds (hh:mm:ss 00:00:01)
- user CPU time ................ 4.720 sec (0.944 per thread)
- system CPU time .............. 0.144 sec (0.029 per thread)
- VM, current virtual memory ... 17226698752 bytes (16.0 GB)
- RSS, current real RAM use .... 1908736 bytes (1.8 MB)
- RSS peak ..................... 9793536 bytes (9.3 MB)
- page reclaims ................ 33251328 bytes (31.7 MB)
- voluntary context switches ... 5627 (threads waiting, locked)
- involuntary context switches . 5302 (time slice expired)

In this particular example, if you launch new threads after the GLIBC “leaks” were freed by
SLIMalloc freeleaks, then pthread_create will crash because the memory cached by GLIBC to
skip malloc calls (probably in an attempt to scale better on multicore systems) is missing.

Third-party code “leaks” might be designed as an optimization for future use; if you decide to remove them, then you must know what you are doing (for example, freeing the GLIBC blocks above has no consequences as long as you no longer create threads).

Many other GLIBC functions keep allocated memory instead of just using the stack – or dedicated PAGES allocated by mmap if persistence is required. This inefficient practice generates deferred trouble and should be banned, especially from core system libraries.

Due to the way blocks are allocated, these leaks can amount to a megabyte or more(!), as with GLIBC setlocale(LC_ALL, ""), despite its few allocations amounting to very little memory:

--- allocations performed by GLIBC setlocale() -------------------------
malloc (0x7c0400000000, 5) at 0x7fcb186176c5 libc.so.6
free (0x7c0400000000, 8) at 0x7fcb18611c8f libc.so.6
malloc (0x7c3c00000000, 120) at 0x7fcb18611ac6 libc.so.6
malloc (0x7c0800000000, 12) at 0x7fcb1866bb4a libc.so.6 __strdup()
malloc (0x7ce400000000, 776) at 0x7fcb18610df0 libc.so.6
malloc (0x7c3800000000, 112) at 0x7fcb18610df0 libc.so.6
malloc (0x7cf800000000, 952) at 0x7fcb18610df0 libc.so.6
malloc (0x7c6c00000000, 216) at 0x7fcb18610df0 libc.so.6
malloc (0x7cac00000000, 432) at 0x7fcb18610df0 libc.so.6
malloc (0x7c3400000000, 104) at 0x7fcb18610df0 libc.so.6

malloc (0x7c2c00000000, 88) at 0x7fcb18610df0 libc.so.6
malloc (0x7c3c00000078, 120) at 0x7fcb18610df0 libc.so.6
malloc (0x7c5400000000, 168) at 0x7fcb18610df0 libc.so.6
malloc (0x7c3400000068, 104) at 0x7fcb18610df0 libc.so.6
malloc (0x7c2800000000, 80) at 0x7fcb18610df0 libc.so.6
malloc (0x7c6000000000, 192) at 0x7fcb18610df0 libc.so.6
malloc (0x7c0800000010, 12) at 0x7fcb1866bb4a libc.so.6 __strdup()
malloc (0x7c0800000020, 12) at 0x7fcb1866bb4a libc.so.6 __strdup()
malloc (0x7c0800000030, 12) at 0x7fcb1866bb4a libc.so.6 __strdup()
malloc (0x7c0800000040, 12) at 0x7fcb1866bb4a libc.so.6 __strdup()
malloc (0x7c0800000050, 12) at 0x7fcb1866bb4a libc.so.6 __strdup()
malloc (0x7c0800000060, 12) at 0x7fcb1866bb4a libc.so.6 __strdup()
malloc (0x7c0800000070, 12) at 0x7fcb1866bb4a libc.so.6 __strdup()
malloc (0x7c0800000080, 12) at 0x7fcb1866bb4a libc.so.6 __strdup()
malloc (0x7c0800000090, 12) at 0x7fcb1866bb4a libc.so.6 __strdup()
malloc (0x7c08000000a0, 12) at 0x7fcb1866bb4a libc.so.6 __strdup()
malloc (0x7c08000000b0, 12) at 0x7fcb1866bb4a libc.so.6 __strdup()
malloc (0x7c08000000c0, 12) at 0x7fcb1866bb4a libc.so.6 __strdup()
malloc (0x7c08000000d0, 12) at 0x7fcb1860ffec libc.so.6
free ( (nil), 0) at 0x7fcb186103d8 libc.so.6 setlocale()
free ( (nil), 0) at 0x7fcb186103e2 libc.so.6 setlocale()

The trailing free(NULL) GLIBC calls above might be legitimate (i.e. a variable that can either be NULL or something else depending on the setlocale function parameters), but they can also be dead code that you want to clean up. SLIMalloc lets you get that information effortlessly.

This shows how vital it is for long-lived applications to know exactly what the system (and other third-party libraries) are doing with the memory they use. You might decide to rewrite GLIBC functions to avoid such leaks, which potentially lead to (1) easy heap-exploitation targets and (2) memory fragmentation, causing (3) the inability of your applications to release memory back to the system (something that will end badly if your application is a long-lasting server process).

Whether or not you are using the SLIMalloc feature to detect and fix system memory leaks, it is
good to have it: memory leaks might also happen in your own applications and libraries.
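
A minimal sketch of that workflow (assuming getleaks and freeleaks take a heap handle, as the function list below suggests; run_worker is a hypothetical suspect):

// check a suspected portion of the program for leaks, assuming s_heap is
// the current thread's default heap handle
run_worker();        // hypothetical code suspected of leaking

getleaks(s_heap);    // list blocks not yet freed, with their origin
freeleaks(s_heap);   // reclaim them once you know they are real leaks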

(7) Let's now review the extended features implemented as functions. Most of these functions
check the validity and characteristics of pointers in real-time:

isgoodptr true if a pointer is valid (it belongs to allocated memory, including freed blocks)
isfreeableptr true if a pointer can be freed (belongs to still in use allocated memory)
ptr2heap a pointer to the heap that contains the pointer, or NULL if no such heap exists
ptr2block the pointer's block address that can be freed, if any
ptr2size the block size of the specified pointer, if it is valid
size2used how much is allocated for a given size (useful to choose the optimal size)
freeclass release memory to the system, any new class allocation will reallocate an area
heap_stats get per-heap statistics with the breakdown per-size, still used, or used in the past
getleaks list the specified heap's allocated blocks that have not yet been freed
freeleaks free the specified heap's allocated blocks that have not yet been freed
heap_trim return the specified heap's allocated memory to the system... but keep it available!
heap_reset free the specified heap's allocated memory (new allocations will reallocate it)
heap_kill free the specified heap and all its allocated memory

As a picture is worth a thousand words, here is an example of some of the above functions in action, first after malloc and then after free:

--- SMALL(1.9 KB) malloc(2047), memset(), free() -------------------
(2047 bytes requested, 2048 bytes allocated)
ptr:0x14000000000, freeable-block-addr:0x14000000000,
block-size:2048 bytes, freeable-ptr:yes good-ptr:yes
free() passed!
now, after free():
ptr:0x14000000000, freeable-block-addr:0x14000000000,
block-size:2048 bytes, freeable-ptr:no good-ptr:yes
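
A minimal sketch of the inspection calls that could produce output like the above (using the function names listed earlier; the printf format is illustrative, and the exact prototypes come from the SLIMalloc header):

#include <stdio.h>
#include <stdlib.h>

// print what SLIMalloc knows about a pointer, assuming the ptr2block(),
// ptr2size(), isfreeableptr() and isgoodptr() functions listed above
static void inspect(void *ptr)
{
   printf("ptr:%p, freeable-block-addr:%p,\n"
          "block-size:%zu bytes, freeable-ptr:%s good-ptr:%s\n",
          ptr, ptr2block(ptr), (size_t)ptr2size(ptr),
          isfreeableptr(ptr) ? "yes" : "no",
          isgoodptr(ptr) ? "yes" : "no");
}

int main(void)
{
   char *p = malloc(2047); // 2048 bytes allocated (next size class)
   inspect(p);             // freeable-ptr:yes good-ptr:yes
   free(p);
   inspect(p);             // freeable-ptr:no  good-ptr:yes
   return 0;
}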

Below, we have enabled the SLIMalloc opt.trace flag to trace allocations done by your own code or by third-parties (GLIBC, system libraries, third-party libraries, whether they are closed-source or open-source), even when this takes place before or after your program's main:

--- allocations performed by GLIBC popen() ---------------------------
malloc (0x7c8000000100, 256) at 0x7f76616b2c48 libc.so.6 popen()
free (0x7c8000000100, 256) at 0x7f76616b0a25 libc.so.6 fclose()

Since these flags can enable SLIMalloc options within the code at any given time during the life of a program, they can target a specific portion of the code (and even a single function call), limiting the need to dig through gigabytes of trace data to find the information you are looking for.
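
A minimal sketch of such targeted tracing (assuming s_heap is the current thread's default heap handle):

s_heap->opt.trace = true;  // start recording malloc/free call sites
setlocale(LC_ALL, "");     // the third-party call under investigation
s_heap->opt.trace = false; // stop: only this call's allocations are logged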

This saves a lot of time otherwise wasted on conjecture and on third-party tools that are not as flexible (practical, or fast) as just a couple of if/then/else tests in your code.

4. SLIMalloc Allocator Implementation

How is it possible to do more (features, performance, scalability) with less (code, CPU and
RAM)? Two factors have a direct influence: (1) the architecture, and (2) the implementation.

The architecture of SLIMalloc is based on that of SlimGuard (authored by US and UK academics). Very little of this architecture was changed, except for the ability to use per-thread and custom heaps, and to efficiently spot invalid pointers. The rest is made of either (drastic) optimizations or new features.

In this paper we only compare SLIMalloc to GLIBC (legacy) and to Microsoft Research's Mimalloc (2019), depicted by its authors as the epitome of allocators:

“We tested mimalloc against many other leading allocators over a wide range of
benchmarks and mimalloc consistently outperforms all others”.

Since Mimalloc offers a large range of what has been done before in terms of features, portability
and optimizations, we did not test older allocators – except one, the standard GLIBC allocator.

Why? Because with its non-secure design GLIBC has an advantage in performance and memory
savings. It is therefore quite a feat for a secure allocator to be noticeably faster and consume less
memory than GLIBC – especially if it also offers many more features.

5. Code-Size and Features: GLIBC / Microsoft Research Mimalloc / SLIMalloc

Source code

Allocator    Language   blank-lines   comment-lines        code-lines
GLIBC        C          2005          3333 (48% of code)   6957
Mimalloc     C          1709          2764 (33% of code)   8336
SLIMalloc    C          135           716 (60% of code)    1188

Features (security and advanced)

Feature                       GLIBC                      Mimalloc                     SLIMalloc
Spot invalid pointers         no                         too slow, unusable           real-time
Block allocation errors       no, crash                  very limited, crash          all, report, continue
Canary                        constant                   debug mode only              encoded
Guard-pages                   no                         mimalloc-page end            chosen density
Segregated metadata           no                         known-offset                 random by-design
Addr. randomization           no                         at free()                    picking random blocks
Zero-memory on free           no                         no                           yes
Delayed memory reuse          no                         via delayed free()           picking random blocks
Over-provisioning             no                         no                           by-design
Detect memory leaks           slow, clunky, requires     no, but can be done via      fast, locate (visible
                              debug symbols              traversal (without           symbols or debug
                                                         location info)               symbols) and fix
Double/invalid-free/realloc   abort / warn and/or        limited warning and          choice to warn-abort or
                              continue with corruption   blocking, crash for          to block-all, locate,
                                                         invalid realloc()            warn and continue clean

6. Disclaimer

TWD Industries AG is based in the greater Zurich area. Founded in 1998 to create and market network products, TWD has paid for this research to upgrade G-WAN, TWD's Web application server written in 2009 (for Windows and Linux), to fuel TWD's Global-WAN.

This work was done by one engineer over three months, on a part-time basis.

During this research several bugs, some of them leading to memory allocator crashes, have been
found in SlimGuard and Mimalloc, and responsibly reported to their respective authors.

7. Performance: GLIBC / Microsoft Research Mimalloc / SLIMalloc

HW: 6-Core MacPro (1x Intel Xeon CPU W3680 @ 3.33GHz), 8 GB RAM DDR3 1333 MHz
OS : Ubuntu 14.04.2 LTS (sponsors are welcome to offer a more recent machine and OS)

---------------------------------------------------------------------------------------------------------------------
(1) Microsoft Research / malloc STRESS-TEST (6 threads, small-blocks, no exchange)
---------------------------------------------------------------------------------------------------------------------
Note: Microsoft Research warns that this stress test “tries to reflect real-world workloads, but
execution can still depend on random thread scheduling; do not use this test as a benchmark”.

Yet all the other benchmarks Microsoft has used, as user-mode processes, are also subject to (1) the randomness of the kernel task-scheduler, on top of (2) the (ever-increasing) operating-system tasks running in the background, and (3) the overhead of the test itself (which often amounts to the majority of the execution time, instead of timing malloc/free operations).

A test should be able to push things to their limits so we can find where things break and improve them – and this is what this stress-test does. Don't be shy about using it.

GLIBC: allocated memory trimming (RSS) takes place by default and is fast but does not go as
far as SLIMalloc, even with malloc_trim(0).

Mimalloc: as Mimalloc's mi_collect did not do much, we tested Mimalloc with and without mi_option_page_reset and mi_option_segment_reset, which better trim the heaps (+25% execution time); we tried mi_option_reset_delay within the [0-10,000] range but did not notice any visible effect; finally, mi_reserve_huge_os_pages printed a failure message and provided no visible benefit (our test machine does not have huge pages set up at boot time).

SLIMalloc: the enabled-by-default opt.trim option provided the expected results on the RSS with
a 10-15% overhead (depending on the options) with this Microsoft Research stress-test.

Results:

Allocator    RSS peak   RSS         Stress Test
GLIBC        2.6 GiB    2.3 MiB     3.230 seconds (average)
Mimalloc     2.6 GiB    20-29 MiB   4.669 seconds (average)
SLIMalloc    2.3 GiB    1.8 MiB     2.279 seconds (average)

Interpretation:

GLIBC ..... the 2nd fastest, has the same “RSS peak” as Mimalloc with a good “RSS”.

MIMALLOC .. the slowest, has the same “RSS peak” as GLIBC, a 10 times higher “RSS”.

SLIMalloc . the fastest in all stress cases (exchanged pointers, large blocks), has
the lowest “RSS peak” value, and the lowest “RSS” value.

===============================================================
GLIBC
---------------------------------------------------------------
- THREADS:6, SCALE:5000%, ITER:1, LARGE:0, 1-SIZE:0
- total time: 3.245 seconds (hh:mm:ss 00:00:03)
- user CPU time ................ 5.739 sec (1.148 per thread)
- system CPU time .............. 4.350 sec (0.870 per thread)
- VM, current virtual memory ... 442904576 bytes (422.3 MB)
- RSS, current real RAM use .... 2437120 bytes (2.3 MB)
- RSS peak ..................... 2837585920 bytes (2.6 GB)
- page reclaims ................ 3858198528 bytes (3.6 GB)
- voluntary context switches ... 838439 (threads waiting, locked)
- involuntary context switches . 530 (time slice expired)
---------------------------------------------------------------
- total time: 3.222 seconds (hh:mm:ss 00:00:03)
- user CPU time ................ 5.838 sec (1.168 per thread)
- system CPU time .............. 4.172 sec (0.834 per thread)
- VM, current virtual memory ... 442904576 bytes (422.3 MB)
- RSS, current real RAM use .... 2482176 bytes (2.3 MB)
- RSS peak ..................... 2834841600 bytes (2.6 GB)
- page reclaims ................ 3860295680 bytes (3.6 GB)
- voluntary context switches ... 840270 (threads waiting, locked)
- involuntary context switches . 511 (time slice expired)
---------------------------------------------------------------
- total time: 3.228 seconds (hh:mm:ss 00:00:03)
- user CPU time ................ 5.793 sec (1.159 per thread)
- system CPU time .............. 4.258 sec (0.852 per thread)
- VM, current virtual memory ... 442904576 bytes (422.3 MB)
- RSS, current real RAM use .... 2445312 bytes (2.3 MB)
- RSS peak ..................... 2845052928 bytes (2.6 GB)
- page reclaims ................ 3860291584 bytes (3.6 GB)
- voluntary context switches ... 841017 (threads waiting, locked)
- involuntary context switches . 493 (time slice expired)
---------------------------------------------------------------
- total time: 3.224 seconds (hh:mm:ss 00:00:03)
- user CPU time ................ 5.182 sec (1.036 per thread)
- system CPU time .............. 4.828 sec (0.966 per thread)
- VM, current virtual memory ... 442904576 bytes (422.3 MB)
- RSS, current real RAM use .... 2428928 bytes (2.3 MB)
- RSS peak ..................... 2833170432 bytes (2.6 GB)
- page reclaims ................ 3860299776 bytes (3.6 GB)
- voluntary context switches ... 844973 (threads waiting, locked)
- involuntary context switches . 632 (time slice expired)
===============================================================

===============================================================
MIMalloc (secure:4)
---------------------------------------------------------------
- THREADS:6, SCALE:5000%, ITER:1, LARGE:0, 1-SIZE:0
- total time: 4.663 seconds (hh:mm:ss 00:00:04)
- user CPU time ................ 13.818 sec (2.764 per thread)
- system CPU time .............. 3.030 sec (0.606 per thread)
- VM, current virtual memory ... 308789248 bytes (294.4 MB)
- RSS, current real RAM use .... 21131264 bytes (20.1 MB)
- RSS peak ..................... 2844844032 bytes (2.6 GB)
- page reclaims ................ 2984943616 bytes (2.7 GB)
- voluntary context switches ... 135253 (threads waiting, locked)
- involuntary context switches . 1008 (time slice expired)
---------------------------------------------------------------
- total time: 4.694 seconds (hh:mm:ss 00:00:04)
- user CPU time ................ 13.863 sec (2.773 per thread)
- system CPU time .............. 3.009 sec (0.602 per thread)
- VM, current virtual memory ... 308789248 bytes (294.4 MB)
- RSS, current real RAM use .... 31285248 bytes (29.8 MB)
- RSS peak ..................... 2829500416 bytes (2.6 GB)
- page reclaims ................ 2979389440 bytes (2.7 GB)
- voluntary context switches ... 142320 (threads waiting, locked)
- involuntary context switches . 983 (time slice expired)
---------------------------------------------------------------
- total time: 4.651 seconds (hh:mm:ss 00:00:04)
- user CPU time ................ 13.739 sec (2.748 per thread)
- system CPU time .............. 3.040 sec (0.608 per thread)
- VM, current virtual memory ... 308789248 bytes (294.4 MB)
- RSS, current real RAM use .... 28696576 bytes (27.3 MB)
- RSS peak ..................... 2830995456 bytes (2.6 GB)
- page reclaims ................ 3002880000 bytes (2.8 GB)
- voluntary context switches ... 141482 (threads waiting, locked)
- involuntary context switches . 1029 (time slice expired)
---------------------------------------------------------------
- total time: 4.670 seconds (hh:mm:ss 00:00:04)
- user CPU time ................ 13.992 sec (2.798 per thread)
- system CPU time .............. 2.909 sec (0.582 per thread)
- VM, current virtual memory ... 308789248 bytes (294.4 MB)
- RSS, current real RAM use .... 25899008 bytes (24.7 MB)
- RSS peak ..................... 2840764416 bytes (2.6 GB)
- page reclaims ................ 2937954304 bytes (2.7 GB)
- voluntary context switches ... 135609 (threads waiting, locked)
- involuntary context switches . 991 (time slice expired)
===============================================================

===============================================================
SLIMalloc heap[default].opt(8): guardpages trim
---------------------------------------------------------------
- THREADS:6, SCALE:5000%, ITER:1, LARGE:0, 1-SIZE:0
- total time: 2.264 seconds (hh:mm:ss 00:00:02)
- user CPU time ................ 4.216 sec (0.843 per thread)
- system CPU time .............. 2.993 sec (0.599 per thread)
- VM, current virtual memory ... 17226702848 bytes (16.0 GB)
- RSS, current real RAM use .... 1925120 bytes (1.8 MB)
- RSS peak ..................... 2561155072 bytes (2.3 GB)
- page reclaims ................ 3875684352 bytes (3.6 GB)
- voluntary context switches ... 206893 (threads waiting, locked)
- involuntary context switches . 365 (time slice expired)
---------------------------------------------------------------
- total time: 2.279 seconds (hh:mm:ss 00:00:02)
- user CPU time ................ 3.528 sec (0.706 per thread)
- system CPU time .............. 3.694 sec (0.739 per thread)
- VM, current virtual memory ... 17226702848 bytes (16.0 GB)
- RSS, current real RAM use .... 1912832 bytes (1.8 MB)
- RSS peak ..................... 2563158016 bytes (2.3 GB)
- page reclaims ................ 3863744512 bytes (3.6 GB)
- voluntary context switches ... 206081 (threads waiting, locked)
- involuntary context switches . 407 (time slice expired)
---------------------------------------------------------------
- total time: 2.288 seconds (hh:mm:ss 00:00:02)
- user CPU time ................ 3.550 sec (0.710 per thread)
- system CPU time .............. 3.695 sec (0.739 per thread)
- VM, current virtual memory ... 17226702848 bytes (16.0 GB)
- RSS, current real RAM use .... 1957888 bytes (1.8 MB)
- RSS peak ..................... 2563497984 bytes (2.3 GB)
- page reclaims ................ 3858415616 bytes (3.6 GB)
- voluntary context switches ... 207324 (threads waiting, locked)
- involuntary context switches . 397 (time slice expired)
---------------------------------------------------------------
- total time: 2.284 seconds (hh:mm:ss 00:00:02)
- user CPU time ................ 4.161 sec (0.832 per thread)
- system CPU time .............. 3.079 sec (0.616 per thread)
- VM, current virtual memory ... 17226702848 bytes (16.0 GB)
- RSS, current real RAM use .... 1945600 bytes (1.8 MB)
- RSS peak ..................... 2558676992 bytes (2.3 GB)
- page reclaims ................ 3861565440 bytes (3.6 GB)
- voluntary context switches ... 206583 (threads waiting, locked)
- involuntary context switches . 475 (time slice expired)
===============================================================

Mimalloc heap stats:    peak      total     freed      unit     count
-----------------------------------------------------------------
normal   1:  13.4 mb   20.2 mb   20.2 mb       8 b     2.6 m
normal   4:  53.8 mb   81.2 mb   81.2 mb      32 b     2.6 m
normal   6:  80.7 mb  121.8 mb  121.8 mb      48 b     2.6 m
normal   8:     64 b      64 b      64 b      64 b         1
normal   9: 134.6 mb  203.0 mb  203.0 mb      80 b     2.6 m
normal  13: 269.0 mb  405.8 mb  405.8 mb     160 b     2.6 m
normal  17: 213.2 mb  213.2 mb  213.2 mb     320 b   698.9 k
normal  21: 426.7 mb  426.7 mb  426.7 mb     640 b   699.1 k
normal  23: 671.7 mb  687.4 mb  687.4 mb     896 b   804.4 k
normal  27:  61.8 mb   92.8 mb   92.8 mb    1.7 kb    54.3 k
normal  31: 123.1 mb  185.9 mb  185.9 mb    3.5 kb    54.4 k
normal  35: 249.1 mb  374.9 mb  374.9 mb    7.0 kb    54.8 k
normal  39: 489.9 mb  744.3 mb  744.3 mb   14.0 kb    54.4 k
normal  43: 395.7 mb  395.7 mb  395.7 mb   28.1 kb    14.4 k
normal  47: 780.8 mb  780.8 mb  780.8 mb   56.2 kb    14.2 k
normal  63:   5.2 mb    5.2 mb    5.2 mb  899.5 kb         6
normal  67:  10.5 mb   10.5 mb   10.5 mb    1.7 mb         6
normal  68:   2.0 mb    2.0 mb    2.0 mb    2.0 mb         1

SLIMalloc stats (one heap shown)

--- heap[5] ---------------------------
allocated block-sizes[in-use/used]:
      8[ 0/582541]
     16[ 0/583742]
     32[ 0/581327]
     64[ 0/583934]
    128[ 0/583772]
    256[ 0/232068]
    512[ 0/232677]
    800[ 0/12014]
   1600[ 0/12511]
   3200[ 0/12585]
   6400[ 0/12037]
  12800[ 0/12115]
  25600[ 0/ 5846]
  51200[ 0/ 7819]
 819200[ 0/ 1]
1638400[ 0/ 1]

Without trimming, SLIMalloc requires 1.9 seconds and Mimalloc 3.7 seconds; GLIBC has no such option.

---------------------------------------------------------------------------------------------------------------------
(2) Patricia Trie / Key-Value TEST (single-thread)
---------------------------------------------------------------------------------------------------------------------
As compared to the synthetic Microsoft Research malloc stress-test used earlier, a Key-Value
Store offers the best of both worlds (pressure on the allocator and a real-life use-case) if used
with many random-length keys (like the paragraphs of a very long book) that are added, sorted,
searched (top to bottom and in reverse order), modified, traversed, and freed.
===============================================================
GLIBC
---------------------------------------------------------------
- loaded 'Bible.txt', 114112 CR-terminated lines (paragraphs)
- user CPU time ................. 2.703 sec
- system CPU time ............... 0.020 sec
- VM, current virtual memory .... 14925824 bytes (14.2 MB)
- RSS, current real RAM use ..... 9723904 bytes (9.2 MB)
- RSS peak ...................... 15564800 bytes (14.8 MB)
- page reclaims ................. 12472320 bytes (11.9 MB)
- voluntary context switches .... 0 (threads waiting, locked)
- involuntary context switches .. 131 (time slice expired)
---------------------------------------------------------------
- user CPU time ................. 2.725 sec
- system CPU time ............... 0.004 sec
- VM, current virtual memory .... 14925824 bytes (14.2 MB)
- RSS, current real RAM use ..... 9891840 bytes (9.4 MB)
- RSS peak ...................... 15667200 bytes (14.9 MB)
- page reclaims ................. 12488704 bytes (11.9 MB)
- voluntary context switches .... 0 (threads waiting, locked)
- involuntary context switches .. 118 (time slice expired)
---------------------------------------------------------------
- user CPU time ................. 2.720 sec
- system CPU time ............... 0.004 sec
- VM, current virtual memory .... 14925824 bytes (14.2 MB)
- RSS, current real RAM use ..... 9838592 bytes (9.3 MB)
- RSS peak ...................... 15613952 bytes (14.9 MB)
- page reclaims ................. 12484608 bytes (11.9 MB)
- voluntary context switches .... 0 (threads waiting, locked)
- involuntary context switches .. 125 (time slice expired)
===============================================================

===============================================================
MIMalloc (secure:4)
---------------------------------------------------------------
- loaded 'Bible.txt', 114112 CR-terminated lines (paragraphs)
- user CPU time ................. 2.547 sec
- system CPU time ............... 0.024 sec
- VM, current virtual memory .... 275202048 bytes (262.4 MB)
- RSS, current real RAM use ..... 8392704 bytes (8.0 MB)
- RSS peak ...................... 18702336 bytes (17.8 MB)
- page reclaims ................. 39092224 bytes (37.2 MB)
- voluntary context switches .... 0 (threads waiting, locked)
- involuntary context switches .. 136 (time slice expired)
---------------------------------------------------------------
- user CPU time ................. 2.528 sec
- system CPU time ............... 0.040 sec
- VM, current virtual memory .... 275202048 bytes (262.4 MB)
- RSS, current real RAM use ..... 9179136 bytes (8.7 MB)
- RSS peak ...................... 19566592 bytes (18.6 MB)
- page reclaims ................. 39768064 bytes (37.9 MB)
- voluntary context switches .... 0 (threads waiting, locked)
- involuntary context switches .. 85 (time slice expired)
---------------------------------------------------------------
- user CPU time ................. 2.564 sec
- system CPU time ............... 0.016 sec
- VM, current virtual memory .... 275202048 bytes (262.4 MB)
- RSS, current real RAM use ..... 8507392 bytes (8.1 MB)
- RSS peak ...................... 18747392 bytes (17.9 MB)
- page reclaims ................. 39104512 bytes (37.3 MB)
- voluntary context switches .... 0 (threads waiting, locked)
- involuntary context switches .. 152 (time slice expired)
===============================================================

===============================================================
SLIMalloc heap[default].opt(8): guardpages trim
---------------------------------------------------------------
- loaded 'Bible.txt', 114112 CR-terminated lines (paragraphs)
- user CPU time ................. 1.274 sec
- system CPU time ............... 0.008 sec
- VM, current virtual memory .... 85929414656 bytes (16.0 GB)
- RSS, current real RAM use ..... 1683456 bytes (1.6 MB)
- RSS peak ...................... 14581760 bytes (13.9 MB)
- page reclaims ................. 9191424 bytes (8.7 MB)
- voluntary context switches .... 0 (threads waiting, locked)
- involuntary context switches .. 146 (time slice expired)
---------------------------------------------------------------
- user CPU time ................. 1.275 sec
- system CPU time ............... 0.004 sec
- VM, current virtual memory .... 85929414656 bytes (16.0 GB)
- RSS, current real RAM use ..... 1662976 bytes (1.6 MB)
- RSS peak ...................... 14561280 bytes (13.9 MB)
- page reclaims ................. 9179136 bytes (8.7 MB)
- voluntary context switches .... 39 (threads waiting, locked)
- involuntary context switches .. 143 (time slice expired)
---------------------------------------------------------------
- user CPU time ................. 1.269 sec
- system CPU time ............... 0.008 sec
- VM, current virtual memory .... 85929418752 bytes (16.0 GB)
- RSS, current real RAM use ..... 1683456 bytes (1.6 MB)
- RSS peak ...................... 14581760 bytes (13.9 MB)
- page reclaims ................. 11280384 bytes (10.7 MB)
- voluntary context switches .... 0 (threads waiting, locked)
- involuntary context switches .. 147 (time slice expired)
===============================================================

Mimalloc heap stats:    peak      total     freed      unit    count
-----------------------------------------------------------------
normal   1:     8 b      80 b      80 b       8 b        10
normal   4:  2.5 mb   25.7 mb   25.7 mb      32 b   843.1 k
normal   6: 388.3 kb   3.7 mb    3.7 mb      48 b    82.5 k
normal   8: 492.3 kb   4.7 mb    4.7 mb      64 b    78.4 k
normal   9:  4.5 mb   45.2 mb   45.2 mb      80 b   593.7 k
normal  63: 899.5 kb 899.5 kb  899.5 kb  899.5 kb         1

SLIMalloc stats

--- heap[default] ---------------------------
allocated block-sizes[in-use/used]:
     8[ 0/ 2]
    16[ 0/ 1038]
    24[ 0/91933]
    32[ 0/ 4594]
    40[ 0/ 4589]
    48[ 0/ 4504]
    56[ 0/ 4214]
    64[ 0/ 6504]
    72[ 0/59010]
917504[ 0/ 1]

SLIMalloc, despite its features, is significantly faster and requires less “RSS peak” (working)
memory as well as “RSS” (effective memory usage) at the end of the KV Store exercise.

Note that in this particular exercise (a KV Store) Mimalloc is slightly faster than GLIBC. This might be due to its ability to deliver better locality – an “invisible” feature whose value becomes tangible when you need it.

---------------------------------------------------------------------------------------------------------------------
(3) Intel Corp. & NMT University / EBIZZY TEST (multi-thread)
---------------------------------------------------------------------------------------------------------------------
Part of “Linux Test Project”, it was written by Intel Corp. and Val Henson from NMT University:
“Ebizzy is designed to replicate a common web search application server workload. A lot of
search applications have the basic pattern: (1) get a request to find a certain record, (2) index
into the chunk of memory that contains it, (3) copy it into another chunk, then (4) look it up via
binary search. The interesting parts of this workload are:
* large working set
* data alloc/copy/free cycle
* unpredictable data access patterns
The records per second should be as high as possible, and the system time as low as possible.”
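
For clarity, here is a compressed sketch of that allocate/copy/search/free pattern (this is not the actual ebizzy.c; cmp and the chunk parameters stand in for ebizzy's real state):

#include <stdlib.h>
#include <string.h>

static int cmp(const void *a, const void *b) // hypothetical record comparator
{ return *(const int*)a - *(const int*)b; }

// one simulated search request: (2) index into the chunk holding the
// record, (3) copy it, (4) look it up via binary search, then free it
static void search_request(const int *chunk, size_t n_ints, int key)
{
   int *copy = malloc(n_ints * sizeof *copy);      // data alloc...
   memcpy(copy, chunk, n_ints * sizeof *copy);     // ...copy...
   bsearch(&key, copy, n_ints, sizeof *copy, cmp); // unpredictable access
   free(copy);                                     // ...free cycle
}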
===============================================================
GLIBC
---------------------------------------------------------------
- 32033 records/s
- total time: 10.000 seconds
- user CPU time ................. 21.677 sec
- system CPU time ............... 38.456 sec
- VM, current virtual memory .... 713445376 bytes (680.4 MB)
- RSS, current real RAM use ..... 273051648 bytes (260.4 MB)
- RSS peak ...................... 274522112 bytes (261.8 MB)
- page reclaims ................. 126233833472 bytes (117.5 GB)
- voluntary context switches .... 57 (threads waiting, locked)
- involuntary context switches .. 4001 (time slice expired)
---------------------------------------------------------------
- 31852 records/s
- total time: 10.000 seconds
- user CPU time ................. 21.789 sec
- system CPU time ............... 38.287 sec
- VM, current virtual memory .... 713445376 bytes (680.4 MB)
- RSS, current real RAM use ..... 272961536 bytes (260.3 MB)
- RSS peak ...................... 274440192 bytes (261.7 MB)
- page reclaims ................. 125523292160 bytes (116.9 GB)
- voluntary context switches .... 71 (threads waiting, locked)
- involuntary context switches .. 2348 (time slice expired)
---------------------------------------------------------------
- 32188 records/s
- total time: 10.000 seconds
- user CPU time ................. 22.523 sec
- system CPU time ............... 37.541 sec
- VM, current virtual memory .... 713445376 bytes (680.4 MB)
- RSS, current real RAM use ..... 272908288 bytes (260.2 MB)
- RSS peak ...................... 274456576 bytes (261.7 MB)
- page reclaims ................. 126843703296 bytes (118.1 GB)
- voluntary context switches .... 67 (threads waiting, locked)
- involuntary context switches .. 3426 (time slice expired)
===============================================================

===============================================================
MIMalloc (secure:4)
---------------------------------------------------------------
- 10727 records/s
- total time: 10.000 seconds
- user CPU time ................. 59.954 sec
- system CPU time ............... 0.120 sec
- VM, current virtual memory .... 577224704 bytes (550.4 MB)
- RSS, current real RAM use ..... 307564544 bytes (293.3 MB)
- RSS peak ...................... 329601024 bytes (314.3 MB)
- page reclaims ................. 137371648 bytes (131.0 MB)
- voluntary context switches .... 45 (threads waiting, locked)
- involuntary context switches .. 3435 (time slice expired)
---------------------------------------------------------------
- 10761 records/s
- total time: 10.000 seconds
- user CPU time ................. 59.937 sec
- system CPU time ............... 0.100 sec
- VM, current virtual memory .... 577224704 bytes (550.4 MB)
- RSS, current real RAM use ..... 307916800 bytes (293.6 MB)
- RSS peak ...................... 330092544 bytes (314.8 MB)
- page reclaims ................. 137768960 bytes (131.3 MB)
- voluntary context switches .... 48 (threads waiting, locked)
- involuntary context switches .. 3520 (time slice expired)
---------------------------------------------------------------
- 10792 records/s
- total time: 10.000 seconds
- user CPU time ................. 59.964 sec
- system CPU time ............... 0.068 sec
- VM, current virtual memory .... 577224704 bytes (550.4 MB)
- RSS, current real RAM use ..... 331599872 bytes (316.2 MB)
- RSS peak ...................... 331608064 bytes (316.2 MB)
- page reclaims ................. 137781248 bytes (131.4 MB)
- voluntary context switches .... 33 (threads waiting, locked)
- involuntary context switches .. 3383 (time slice expired)
===============================================================

===============================================================
SLIMalloc heap[default].opt(8): guardpages trim
---------------------------------------------------------------
- 37257 records/s
- total time: 10.000 seconds
- user CPU time ................. 60.019 sec
- system CPU time ............... 0.104 sec
- VM, current virtual memory .... 77370953728 bytes (72.0 GB)
- RSS, current real RAM use ..... 270364672 bytes (257.8 MB)
- RSS peak ...................... 279605248 bytes (266.6 MB)
- page reclaims ................. 275443712 bytes (262.7 MB)
- voluntary context switches .... 78 (threads waiting, locked)
- involuntary context switches .. 2652 (time slice expired)
---------------------------------------------------------------
- 37224 records/s
- total time: 10.000 seconds
- user CPU time ................. 59.992 sec
- system CPU time ............... 0.136 sec
- VM, current virtual memory .... 77370953728 bytes (72.0 GB)
- RSS, current real RAM use ..... 270422016 bytes (257.9 MB)
- RSS peak ...................... 277475328 bytes (264.6 MB)
- page reclaims ................. 275439616 bytes (262.6 MB)
- voluntary context switches .... 91 (threads waiting, locked)
- involuntary context switches .. 2571 (time slice expired)
---------------------------------------------------------------
- 37259 records/s
- total time: 10.000 seconds
- user CPU time ................. 60.021 sec
- system CPU time ............... 0.108 sec
- VM, current virtual memory .... 77370953728 bytes (72.0 GB)
- RSS, current real RAM use ..... 270372864 bytes (257.8 MB)
- RSS peak ...................... 280129536 bytes (267.1 MB)
- page reclaims ................. 275447808 bytes (262.7 MB)
- voluntary context switches .... 80 (threads waiting, locked)
- involuntary context switches .. 2564 (time slice expired)
===============================================================

The final “RSS” is high for all allocators because the Ebizzy test does not
bother to free all the memory it has allocated. See (in our enhanced version
of ebizzy.c) how we free these blocks, find and free the GLIBC pthread_create
memory leaks, and finally reset the thread heaps after the threads were
gone(!) to get a final RSS of 1.9 MB with SLIMalloc. The other allocators do
not let you do that.

SLIMalloc has again the highest performance and the lowest RSS (real memory usage). GLIBC
is much closer to SLIMalloc this time, and Mimalloc is not performing well in this test.

8. Conclusion

Programming has always been about execution time. A memory allocator's scalability reduces the execution time of multi-threaded allocations. Its performance reduces the execution time of every allocation. Trimming (giving freed memory back to the OS) reduces execution time in the long term, and locality reduces execution time with short-term benefits that accumulate over the long term.

These properties not only save money by extracting more performance from the same hardware, they also enhance system stability and reliability – benefiting the whole ecosystem.

Security is an ever-rising concern, and the source of ever-increasing expenses. It is time to recognize that system components bear some responsibility in this exposure to risk. An operating system's “standard” memory allocator should be secure in 2020.

We have presented a new class of memory allocators designed for performance and which, at the
same time, delivers (1) the most advanced security features available, (2) an unprecedented real-
time invalid-pointer detection capability preventing allocator misuse such as double-free or
invalid-free/realloc errors, (3) new troubleshooting features to assist developers with the exact
location of allocation errors, (4) the detection, location and correction of memory leaks, (5)
tracing of allocation calls in third-party code, (6) a very small source-code implementation, (7)
higher performance than GLIBC and Mimalloc, (8) a novel architecture making room for
advanced features without compromising performance and scalability, (9) and the automatic
release of memory to the OS during or after heavy workloads. As far as we know, SLIMalloc is
the first scalable memory allocator able to catch and report invalid pointers in real-time.

This is the first version of SLIMalloc, and its authors believe that there is still room for tangible improvements and more features aimed at making machines safer and people's lives easier.
Stay tuned!

9. References

[1] “The GNU C library's (glibc's) malloc library” (1987-present)
    https://sourceware.org/glibc/wiki/MallocInternals

[2] “SlimGuard: A Secure and Memory Efficient Heap Allocator” (2019)
    Beichen Liu, Pierre Olivier, Binoy Ravindran

[3] “Mimalloc: Free List Sharding in Action” (2019)
    Daan Leijen, Benjamin Zorn, Leonardo de Moura, Microsoft Research

[4] “Guarder: A Tunable Secure Allocator” (2018)
    Sam Silvestro, Hongyu Liu, Tianyi Liu, Zhiqiang Lin, Tongping Liu

[5] “FreeGuard: A Faster Secure Heap Allocator” (2017)
    Sam Silvestro, Hongyu Liu, Corey Crosser, Zhiqiang Lin, Tongping Liu

[6] “Profiling a warehouse-scale computer” (2015)
    Svilen Kanev, Parthasarathy Ranganathan, Juan Pablo Darago, Kim Hazelwood,
    Tipp Moseley, Gu-Yeon Wei, David Brooks (Harvard University, Universidad
    de Buenos Aires, Google, Yahoo Labs)

[7] “Security vulnerabilities of the top ten programming languages: C, Java, C++,
    Objective-C, C#, PHP, Visual Basic, Python, Perl, and Ruby” (2015)
    Stephen Turner, Journal of Technology Research

[8] “SoK: Eternal War in Memory” (2013)
    László Szekeres, Mathias Payer, Tao Wei, Dawn Song

[9] “Watchdog: Hardware for Safe and Secure Manual Memory Management and Full
    Memory Safety” (2012)
    Santosh Nagarakatte, Milo Martin, Stephan A. Zdancewic

[10] “Enhanced Operating System Security Through Efficient and Fine-grained
     Address Space Randomization” (2012)
     Cristiano Giuffrida, Anton Kuijsten, Andrew S. Tanenbaum

[11] “DieHarder: Securing the Heap” (2011)
     Gene Novark, Emery D. Berger

[12] “Exploiting Memory Corruption Vulnerabilities in the Java Runtime” (2011)
     Joshua J. Drake, Black Hat Abu Dhabi

[13] “Heap Taichi: Exploiting Memory Allocation Granularity in Heap-Spraying
     Attacks” (2010)
     Yu Ding, Tao Wei, TieLei Wang, Zhenkai Liang, Wei Zou

[14] “Improving memory management security for C and C++” (2008)
     Yves Younan, Wouter Joosen, Frank Piessens, Hans Van den Eynden

[15] “A Memory Allocation Model For An Embedded Microkernel” (2007)
     Dhammika Elkaduwe, Philip Derrin, Kevin Elphinstone

[16] “DieHard: Probabilistic Memory Safety for Unsafe Languages” (2006)
     Emery D. Berger, Benjamin G. Zorn

[17] “Shredding Your Garbage: Reducing Data Lifetime Through Secure
     Deallocation” (2005)
     Jim Chow, Ben Pfaff, Tal Garfinkel, Mendel Rosenblum

[18] “Security of memory allocators for C and C++” (2005)
     Yves Younan, Wouter Joosen, Frank Piessens, Hans Van den Eynden

[19] “Hoard: A Scalable Memory Allocator for Multithreaded Applications” (2001)
     Emery D. Berger, Kathryn S. McKinley, Robert D. Blumofe, Paul R. Wilson
