
Memory consumption analysis for a functional and imperative language

Jérémie Salvucci and Emmanuel Chailloux

Sorbonne Universités,
Université Pierre et Marie Curie (Paris 06),
UMR 7606, Laboratoire d’Informatique de Paris 6,
4 Place Jussieu, 75005 Paris, France
jeremie.salvucci@lip6.fr
emmanuel.chailloux@lip6.fr

Abstract. The omnipresence of resource-constrained embedded systems makes them critical components. Programmers have to provide strong guarantees about their runtime behavior to make them reliable. Among these guarantees, a static upper bound on live memory at runtime is mandatory to avoid heap overflows. In this paper, we try to address this problem for a language à la ML mixing functional and imperative features.

1 Introduction
Resource consumption analysis started in the mid-1970s with METRIC [8], which targeted the worst-case execution time of programs written in a pure subset of Lisp. Based on recurrence relations, it can be adapted to memory consumption analysis. Since then, new methods have emerged from both the purely functional and the imperative communities. Recently, two powerful type systems have been proposed to obtain an upper bound on the memory allocated by purely functional programs: sized types [5], in the Embounded project developing HUME [7], and automatic amortized analysis for the RAML [4] language. Neither requires any action from the programmer to perform its analysis. This comes at a cost: only linear and polynomial bounds can be inferred. Aside from type systems, two projects target the Java language: COSTA [1], which relies on recurrence relations, and JConsume [2], which is based on invariants on iteration spaces that can be inferred in some cases but have to be provided by the programmer in complex cases.

Currently, dynamic allocation is often prohibited in industrial settings to avoid heap overflows. We propose to design, jointly, a statically typed functional and imperative language that relies on regions to manage memory, and a static analysis for it. This analysis combines automatic amortized analysis and invariants on iteration spaces: depending on the programming style used in a function, one of these analyses is applied. To avoid overly pessimistic estimates of live memory, regions allow us to take into account memory freed during execution. This would make it possible to use dynamic allocation at runtime while still avoiding heap overflows.
2 Language

In modern high-level languages, memory management is often performed by a garbage collector. Since its behavior is mainly driven by dynamic criteria, it is difficult to predict. Yet, to avoid overly pessimistic estimates of live memory, we need to take into account memory freed during execution.

To circumvent this problem, we developed a statically typed language à la ML equipped with a specific memory management mechanism: regions [6]. A region represents a set of data with similar lifespans. Regions were originally developed to bring lexical scoping to heap-allocated values, but the stack discipline they imposed caused memory leaks. Recent work [3] shows that adding capabilities to the system makes it more general without requiring such a discipline. A capability can be seen both as a permission to operate on a region and as a witness that the data within that region are still needed for the rest of the computation. This compile-time mechanism allows us to take freed memory into account.

Expressions                                    Description
⟨e⟩ ::= ()                                     unit
      | ⟨b⟩                                    booleans
      | ⟨n⟩                                    integers
      | ⟨x⟩                                    variables
      | fun x → ⟨e⟩ @ ρ                        functions
      | ⟨e_0⟩ ⟨e_1⟩                            function application
      | if ⟨e_c⟩ then ⟨e_t⟩ else ⟨e_f⟩         conditional
      | let ⟨x⟩ = ⟨e_a⟩ in ⟨e_b⟩               variable binding
      | let rec ⟨f⟩ ⟨x⟩ = ⟨e_b⟩ in ⟨e⟩         recursive binding (functions only)
      | (⟨e_a⟩, ⟨e_b⟩) @ ⟨ρ⟩                   couple construction
      | Π_i ⟨e⟩                                couple projections
      | ref ⟨e⟩ @ ⟨ρ⟩                          reference
      | ⟨e⟩ := ⟨e⟩                             assignment
      | !⟨e⟩                                   dereference
      | newrgn ()                              new region primitive
      | aliasrgn ρ in ⟨e⟩                      sharing a region handler
      | freergn ρ                              free region primitive

Fig. 1. Expressions

From the programmer's point of view, regions can be seen as a way to manage memory through four primitives: newrgn, @, aliasrgn and freergn. The correctness of these operations is ensured by the type system at compile time. As the syntax in figure 1 shows, each expression whose evaluation yields a heap-allocated value is annotated with @ to specify the region in which it is allocated at runtime. Each operation on region-allocated values needs the capability of the relevant region. The type system checks that the right capability can be presented; if it cannot, a type error is reported.
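To make the behavior of these primitives concrete, here is a small OCaml model of regions at runtime. It is only a sketch under our own assumptions: the names region, newrgn, alloc, freergn and Region_error are hypothetical, aliasrgn and capabilities are left out, and the conditions that the type system rules out statically (allocating in, or freeing, a region that has already been freed) are checked dynamically here instead.

```ocaml
(* Hypothetical runtime model of regions. The real language rules out
   misuse statically through capabilities; this sketch checks the same
   conditions dynamically, for illustration only. *)
type region = {
  name : string;
  mutable size : int;   (* units currently allocated in the region *)
  mutable live : bool;  (* false once freergn has been applied *)
}

exception Region_error of string

(* newrgn (): create a fresh, empty, live region *)
let newrgn name = { name; size = 0; live = true }

(* v @ r: record that [units] units are allocated in region [r] *)
let alloc r units =
  if not r.live then raise (Region_error ("allocation in freed region " ^ r.name));
  r.size <- r.size + units

(* freergn r: the whole region is reclaimed at once *)
let freergn r =
  if not r.live then raise (Region_error ("double free of " ^ r.name));
  r.live <- false;
  r.size <- 0
```

Freeing reclaims the whole region at once, which is what lets an analysis treat its size as zero from that point on.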
Types                              Description
⟨τ_s⟩ ::= unit                     singleton
        | bool                     boolean
        | int                      integer
        | α                        type variables
⟨τ⟩   ::= ⟨τ_s⟩
        | (⟨τ_a⟩ → ⟨τ_b⟩, ρ)       closures
        | (⟨τ_a⟩ × ⟨τ_b⟩, ρ)       couples
        | (ref ⟨τ⟩, ρ)             references
        | hnd ρ                    region handlers

Fig. 2. Types

To keep things concise, we do not introduce polymorphism in this extended abstract, but adding it would not prevent the analysis from working. The typing judgement has the following form:

$$C; \Gamma \vdash e : \tau; C'$$

where C is a set of capabilities and Γ a typing environment. It reads as follows: “given a set of capabilities C and a typing environment Γ, the expression e has type τ and returns a set of capabilities C′”.
There are two kinds of variables: those bound to stack values and those bound to region-allocated values. RVar is the rule for the latter. The type of such a variable (see figure 2) contains a region name, and the rule checks that accessing the corresponding region is still sound. The operator ⊕ is a set union with some constraints on linear capabilities.

$$\frac{\Gamma(x) = (\tau, r) \qquad C = r^{q} \oplus C'}{C; \Gamma \vdash x : (\tau, r); C}\;\text{(RVar)}$$
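The premise C = r^q ⊕ C′ of RVar asks that some capability for r be present in C. The OCaml sketch below models capability sets as maps from region names to qualifiers. The exact constraints that ⊕ places on linear capabilities are not spelled out in this abstract, so we assume, as a hypothesis, that a linear capability may not appear on both sides of the union; oplus, rvar_ok and Not_disjoint are hypothetical names.

```ocaml
(* Hypothetical model of capability sets: linear (r^1) or unrestricted (r^+) *)
type qual = Linear | Unrestricted

module SMap = Map.Make (String)

type caps = qual SMap.t  (* region name -> qualifier *)

exception Not_disjoint of string

(* C1 ⊕ C2, ASSUMING the constraint on linear capabilities is that a
   linear capability cannot occur on both sides (no duplication). *)
let oplus (c1 : caps) (c2 : caps) : caps =
  SMap.union
    (fun r q1 q2 ->
      match q1, q2 with
      | Unrestricted, Unrestricted -> Some Unrestricted
      | _ -> raise (Not_disjoint r))
    c1 c2

(* Premise of RVar: C = r^q ⊕ C' for some q, i.e. C holds a capability for r *)
let rvar_ok (c : caps) (r : string) : bool = SMap.mem r c
```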
Typing a function requires planning for future calls. For instance, if some free variables are captured, the relevant set of capabilities has to be presented at each call site. To perform this verification, the arrow type is annotated with C_in and C_out. C_in represents the set of capabilities needed to evaluate the function body and C_out the new set of capabilities once the evaluation is over. Moreover, we need to be sure that the closure does not capture any linear capabilities; the predicate unrestricted checks this.

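$$\frac{\Gamma(\rho) = \mathtt{hnd}\ r \qquad C_{in} \oplus C = r^{q} \oplus C' \qquad \mathit{unrestricted}(C_{in}) \qquad C_{in}; \Gamma, x : \tau_x \vdash e : \tau; C_{out}}{C_{in} \oplus C; \Gamma \vdash \mathtt{fun}\ x \to e\ @\ \rho : \tau_x \xrightarrow[C_{out}]{C_{in}} \tau; C_{in} \oplus C}\;\text{(Fun)}$$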

The application rule, App, follows directly from the function rule. At each call site, we check that the available capabilities cover C_in, the set of capabilities needed to evaluate the function body. C_out represents the capabilities that are available after typing the function body. Here ≤ can be seen as a subtyping relation: C_v has to allow at least as many operations as C_in.
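$$\frac{C; \Gamma \vdash e_f : \tau_x \xrightarrow[C_{out}]{C_{in}} \tau; C_f \qquad C_f; \Gamma \vdash e_v : \tau_x; C_v \qquad C_v \leq C_{in}}{C; \Gamma \vdash e_f\ e_v : \tau; C_v - (C_{in} - C_{out}) + (C_{out} - C_{in})}\;\text{(App)}$$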
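The capability bookkeeping in the conclusion of App can be sketched with ordinary sets. This hypothetical OCaml sketch ignores qualifiers and approximates C_v ≤ C_in by set inclusion (C_v must contain every capability of C_in); the names covers and app_result are ours.

```ocaml
(* Hypothetical sketch of rule App's capability arithmetic, ignoring
   qualifiers: a capability set is a plain set of region names. *)
module S = Set.Make (String)

(* Premise C_v <= C_in, approximated as "C_v contains all of C_in" *)
let covers cv cin = S.subset cin cv

(* Conclusion: C_v - (C_in - C_out) + (C_out - C_in) *)
let app_result ~cv ~cin ~cout =
  if not (covers cv cin) then invalid_arg "App: missing capabilities";
  S.union (S.diff cv (S.diff cin cout)) (S.diff cout cin)
```

For instance, with C_v = {r1, r2, r4}, C_in = {r1, r2} and C_out = {r2, r3}, the call consumes r1 (a region freed by the callee) and produces r3 (a region it created), leaving {r2, r3, r4}.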

To introduce capabilities into the system, the primitive newrgn has to be used. The new capability is qualified as linear: at creation, we know that it has not been shared. This is an important criterion for freeing a region.

$$\frac{r \notin C}{C; \Gamma \vdash \mathtt{newrgn}\ () : \mathtt{hnd}\ r; r^{1} \oplus C}\;\text{(New)}$$
Sometimes a capability has to be duplicated, in particular when the same region handler must be passed twice to a function. This is done with aliasrgn. Leaving the scope of this primitive requires some checks to ensure correctness.

$$\frac{r^{+} \oplus C; \Gamma \vdash e : \tau; r^{+} \oplus C'}{r^{1} \oplus C; \Gamma \vdash \mathtt{aliasrgn}\ \rho\ \mathtt{in}\ e : \tau; r^{1} \oplus C'}\;\text{(Alias)}$$
The most interesting rule for the analysis is Free. Linearity is the key point here: it ensures that the region handler is not shared, so the corresponding region can be freed soundly.

$$\frac{\Gamma(\rho) = \mathtt{hnd}\ r}{r^{1} \oplus C; \Gamma \vdash \mathtt{freergn}\ \rho : \mathit{unit}; C}\;\text{(Free)}$$
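Rules New, Alias and Free can be read as transitions on a capability set. The OCaml sketch below, with hypothetical names (rule_new, rule_alias, rule_free, Cap_error), represents a capability set as a map from region names to qualifiers and encodes the body of aliasrgn as a function on capability sets; it is an illustration of the rules' side conditions, not the authors' type checker.

```ocaml
(* Hypothetical sketch of rules New, Alias and Free on capability sets *)
type qual = Linear | Unrestricted
module SMap = Map.Make (String)

exception Cap_error of string

(* New: r must not be in C; the result is r^1 ⊕ C *)
let rule_new c r =
  if SMap.mem r c then raise (Cap_error ("region already exists: " ^ r));
  SMap.add r Linear c

(* Alias: from r^1 ⊕ C, run [body] under r^+ ⊕ C; the body must give back
   r^+ ⊕ C', and r^1 ⊕ C' is restored when the scope is left *)
let rule_alias c r body =
  match SMap.find_opt r c with
  | Some Linear ->
    let c' = body (SMap.add r Unrestricted c) in
    (match SMap.find_opt r c' with
     | Some Unrestricted -> SMap.add r Linear c'
     | _ -> raise (Cap_error ("aliased capability lost: " ^ r)))
  | _ -> raise (Cap_error ("alias needs a linear capability: " ^ r))

(* Free: r^1 ⊕ C gives C; a non-linear capability cannot be freed *)
let rule_free c r =
  match SMap.find_opt r c with
  | Some Linear -> SMap.remove r c
  | _ -> raise (Cap_error ("cannot free a non-linear capability: " ^ r))
```

Inside aliasrgn the capability for r is unrestricted, so rule_free rejects it; once the scope is left, r^1 is restored and the region can be freed.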

Hence, these rules give us information about the memory behavior of our programs, which is essential to perform a memory consumption analysis.

3 Analysis
The goal of the analysis is to provide, at compile time, an upper bound on live memory at runtime that takes freed memory into account. To do so, we rely on the correctness of the type system above. The analysis symbolically computes the sizes of the regions involved. It proceeds by combining several parameterized analyses, chosen according to the programming style used in each function.

We distinguish two programming styles: purely functional and imperative. To tell them apart, we rely on an effect system. If a function body performs a side effect, the corresponding function is considered impure. However, if the side effect is local to the function, the function is considered pure from the caller's point of view.
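A minimal sketch of this classification, assuming that the effect system summarizes each function body by the locations it writes to and by the subset of those locations that are local to it. The record effect_summary, the type style and the function classify are hypothetical; the actual effect system is not detailed in this abstract.

```ocaml
(* Hypothetical sketch of the pure/impure classification from a caller's
   point of view. ASSUMPTION: the effect system provides, for each
   function body, the locations it writes and those local to it. *)
module S = Set.Make (String)

type style = Pure | Impure

type effect_summary = {
  writes : S.t;  (* locations the body may modify *)
  local : S.t;   (* locations created by the body, invisible to callers *)
}

(* Pure for callers when every written location is local to the function *)
let classify e = if S.subset e.writes e.local then Pure else Impure
```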
Pure functions are analyzed with automatic amortized analysis. This analysis does not require any annotations to provide a polynomial bound on the memory consumption of the analyzed function. Imperative functions are analyzed with invariants on iteration spaces. These invariants are provided either by external tools or directly by the programmer. Both analyses return symbolic expressions describing the sizes of the regions. Thanks to the freergn primitive, we know when a region can safely be considered empty. The imperative part of the language can introduce side effects that are visible in the purely functional part of the program, so we need to track them to propagate them correctly. The region mechanism gives us the locations of these effects. With this information, we can safely combine the different results and obtain the upper bound on live memory.
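As a simplified illustration of how freed regions lower the bound, the OCaml sketch below assumes that the analyses have already produced concrete region sizes and that allocations and freergn calls are given as a trace in execution order; the bound is then the maximum, over the trace, of the total size of the regions still alive. The actual analysis works on symbolic expressions at compile time, and event and peak_live are hypothetical names.

```ocaml
(* Hypothetical sketch: peak live memory over a trace of region events,
   given in execution order, with already-instantiated sizes. *)
type event =
  | Alloc of string * int  (* n units allocated in the named region *)
  | Free of string          (* freergn on the named region *)

let peak_live (trace : event list) : int =
  let sizes = Hashtbl.create 8 in
  (* total size of the regions that are still alive *)
  let live () = Hashtbl.fold (fun _ n acc -> n + acc) sizes 0 in
  let peak = ref 0 in
  List.iter
    (fun ev ->
      (match ev with
       | Alloc (r, n) ->
         let old = Option.value (Hashtbl.find_opt sizes r) ~default:0 in
         Hashtbl.replace sizes r (old + n)
       | Free r -> Hashtbl.remove sizes r);  (* a freed region counts as empty *)
      peak := max !peak (live ()))
    trace;
  !peak
```

On the trace Alloc ("a", 5); Free "a"; Alloc ("b", 4), peak_live returns 5, whereas ignoring the freergn would give 9.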

4 Example

The following example shows how the analysis works. We assume that the language is extended with lists and pattern matching (this extension does not fundamentally change the language or the analysis). The main function is rev_append, which concatenates two lists by reversing the first one onto the second so that it is tail recursive. This function can be written in at least two different styles: purely functional and imperative.
The following program builds two regions, r and rr, and allocates two lists: ys in r and xs in rr. Then, it concatenates xs and ys with rev_append and frees the region rr.
let r = newrgn () in
let ys = [12; 15; 18] @ r in
let rr = newrgn () in
let xs = [3; 6; 9] @ rr in
let zs = rev_append xs ys r in
freergn rr

The functional version is written in a purely functional style, while the imperative version performs side effects on the reference rs during the computation. The effect system captures this difference, which allows different analyses to be performed and combined.

Functional rev_append:

let rec rev_append xs ys r =
  match xs with
  | Nil -> ys
  | Cons (h, t) ->
    let ys' = Cons (h, ys) @ r in
    rev_append t ys' r

Imperative rev_append:

let rev_append xs ys r =
  let rs = ref ys in
  let rec loop xs =
    match xs with
    | Nil -> ()
    | Cons (h, t) ->
      rs := Cons (h, !rs) @ r;
      loop t
  in loop xs; !rs
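For readers who want to run the two versions, here is a plain OCaml transliteration. OCaml has no regions, so the @ r annotations and the handler parameter r are dropped, and Nil/Cons become [] and (::); the function names rev_append_fun and rev_append_imp are ours.

```ocaml
(* Functional version: tail recursive; the accumulator ys grows by one
   cell per element of xs. *)
let rec rev_append_fun xs ys =
  match xs with
  | [] -> ys
  | h :: t -> rev_append_fun t (h :: ys)

(* Imperative version: the accumulator is the local reference rs. *)
let rev_append_imp xs ys =
  let rs = ref ys in
  let rec loop xs =
    match xs with
    | [] -> ()
    | h :: t ->
      rs := h :: !rs;
      loop t
  in
  loop xs;
  !rs
```

On the lists of the example, both return [9; 6; 3; 12; 15; 18], and both allocate one new cell per element of the first list.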
Here, each region contains the structure (the cells) of a list. The type of rev_append gives us information about its memory allocation behavior:

rev_append : (α list, r_a) → (α list, r_b) → hnd r_b → (α list, r_b)

The first list can be allocated in a region r_a and the second in a region r_b, and the returned list is allocated in region r_b. In the program above, r_a is rr and r_b is r, so zs lives in r and rr can be freed after the call. This information is useful to track the lifespans of the regions involved in the computation.
Pure functions are analyzed with automatic amortized analysis. It extracts a set of constraints from the function and then solves them, minimizing the resulting bound. In rev_append, the interesting part is the application of the data constructor Cons. If n is the amount of memory available before this application and n′ the amount available after it, the constraint $n \geq K_{Cons} + n'$ must be satisfied, where K_Cons is the amount of memory needed to allocate a list cell. In this case, the extracted cost is proportional to the length of the first list. Pure functions do not act on the program state, hence there is no information to propagate.
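The amortized argument can be checked on a run. In the hypothetical OCaml sketch below, the initial budget is a potential of k_cons per element of xs, and each Cons consumes k_cons units, so n ≥ K_Cons + n′ holds at every allocation exactly when the remaining budget n′ never becomes negative. The sketch does not infer the bound (the analysis derives it by solving constraints); it only checks that k_cons · |xs| suffices. The value k_cons = 1 is an arbitrary unit, and run_with_budget and bound are our names.

```ocaml
(* ASSUMED cost of one list cell, in abstract units *)
let k_cons = 1

(* Functional rev_append run against a memory budget n. Allocating a
   cell leaves n' = n - k_cons, so the constraint n >= k_cons + n' with
   n' >= 0 amounts to n >= k_cons at each Cons. *)
let rec run_with_budget xs ys n =
  match xs with
  | [] -> (ys, n)
  | h :: t ->
    if n < k_cons then failwith "budget exceeded";
    run_with_budget t (h :: ys) (n - k_cons)

(* Initial budget given by the potential: k_cons per element of xs *)
let bound xs = k_cons * List.length xs
```

With xs = [3; 6; 9], a budget of bound xs = 3 is used up exactly, while a budget of 2 fails at the third cell.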
The imperative version of rev_append is analyzed using invariants on iteration spaces. Here, the side effect is local to the function, so the amount of allocated memory is the only information propagated. The invariant is length !rs = size xs + size ys, where size ys is a constant. It is linear and could be obtained automatically; in other cases, we would rely on programmer annotations. From this invariant, we can deduce the amount of allocated memory, which is again proportional to the length of the first list.
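The invariant can also be checked at run time on the imperative version. In the OCaml sketch below, which is ours and hypothetical, we state it per iteration, with rest the unprocessed suffix of xs: length !rs + length rest = size xs + size ys. When rest is empty, this yields length !rs = size xs + size ys, from which the number of allocated cells, size xs, follows.

```ocaml
(* Imperative rev_append instrumented with a per-iteration invariant *)
let rev_append_checked xs ys =
  let size_xs = List.length xs and size_ys = List.length ys in
  let rs = ref ys in
  (* invariant: length !rs + length rest = size xs + size ys *)
  let check rest =
    assert (List.length !rs + List.length rest = size_xs + size_ys)
  in
  let rec loop rest =
    check rest;
    match rest with
    | [] -> ()
    | h :: t ->
      rs := h :: !rs;  (* one new cell allocated per iteration *)
      loop t
  in
  loop xs;
  !rs

(* Cells allocated by the loop: final length minus the cells of ys *)
let allocated_cells xs ys =
  List.length (rev_append_checked xs ys) - List.length ys
```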
In this example, both analyses return similar results. We can then instantiate the symbolic expressions to obtain the amount of memory needed to execute the program safely. Here, the region rr is freed at the end. If the program were longer, this region would be considered non-existent when analyzing the rest of the allocated memory.
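As a hypothetical instantiation of the symbolic sizes for the example program, the OCaml sketch below counts list cells only (the reference rs of the imperative version is ignored) and converts them with an assumed per-cell cost k_cons. Just before freergn rr, region r holds ys and the |xs| cells built by rev_append, while rr holds xs; afterwards, rr is treated as empty. The names sizes, before_free, after_free and required_memory are ours.

```ocaml
(* Region sizes of the example program, counted in list cells *)
type sizes = { r : int; rr : int }

(* Just before freergn rr: r holds ys plus the |xs| cells built by
   rev_append, rr holds xs *)
let before_free ~len_xs ~len_ys = { r = len_ys + len_xs; rr = len_xs }

(* After freergn rr, region rr is considered empty *)
let after_free s = { s with rr = 0 }

let total s = s.r + s.rr

(* Peak requirement, with an ASSUMED cost of k_cons units per cell *)
let required_memory ~k_cons ~len_xs ~len_ys =
  k_cons * total (before_free ~len_xs ~len_ys)
```

For the example, with |xs| = |ys| = 3, the peak is 9 cells before freergn rr and 6 afterwards, so the rest of a longer program would be analyzed against 6 live cells rather than 9.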

5 Conclusion

This analysis allows us to handle programs mixing functional and imperative features by combining analyses that work well on each style. These styles are distinguished at the function level by an effect system: automatic amortized analysis is applied to pure functions and invariants on iteration spaces to imperative ones. These analyses can be combined soundly thanks to a type system based on regions, which gives us the locations of side effects and the lifespans of the regions involved in a computation.
The correctness of this analysis relies on the correctness of the type system, which has been proved through progress and preservation lemmas. To validate this approach in practice, a prototype is currently being developed.
References
1. Albert, E., Arenas, P., Genaim, S., Puebla, G.: Closed-form upper bounds in static
cost analysis. J. Autom. Reasoning 46(2), 161–203 (2011)
2. Braberman, V.A., Garbervetsky, D., Hym, S., Yovine, S.: Summary-based inference
of quantitative bounds of live heap objects. Sci. Comput. Program. 92, 56–84 (2014)
3. Fluet, M., Morrisett, G., Ahmed, A.J.: Linear Regions Are All You Need. In: Proceedings of the 15th European Symposium on Programming, ESOP 06. pp. 7–21 (2006)
4. Hofmann, M., Jost, S.: Type-Based Amortised Heap-Space Analysis. In: Proceedings of the 15th European Symposium on Programming, ESOP 06. pp. 22–37 (2006)
5. Hughes, J., Pareto, L., Sabry, A.: Proving the correctness of reactive systems using sized types. In: Proceedings of the 23rd Symposium on Principles of Programming Languages, POPL 96. pp. 410–423 (1996)
6. Tofte, M., Talpin, J.P.: Implementation of the Typed Call-by-value λ-calculus using a Stack of Regions. In: Proceedings of the 21st Symposium on Principles of Programming Languages, POPL 94. pp. 188–201 (1994)
7. Vasconcelos, P.: Space cost analysis using sized types. Ph.D. thesis, University of St
Andrews (2008)
8. Wegbreit, B.: Mechanical program analysis. Commun. ACM 18(9), 528–539 (Sep
1975)
