Union and Cast in Deductive Verification ⋆

Yannick Moy1,2,3
France Télécom, Lannion, F-22307 INRIA Futurs, ProVal, Parc Orsay Université, F-91893 Lab. de Recherche en Informatique, Univ Paris-Sud, CNRS, Orsay, F-91405


Abstract. Deductive verification based on weakest-precondition calculus has proved effective at proving imperative programs, through a suitable encoding of memory as functional arrays (a.k.a. the Burstall-Bornat model). Unfortunately, this encoding of memory makes it impossible to support features like union and cast in C. We show that an interesting subset of those unions and casts can be encoded as structure subtyping, on which it is possible to extend the Burstall-Bornat encoding. We present an automatic translation from C to an intermediate language Jessie on which this extended interpretation is performed.

1 Introduction
Deductive verification aims at checking behavioral properties of programs, where behaviors are formally specified by logical annotations: function preconditions and postconditions, assertions, loop invariants, global invariants, etc. Verification tools based on deduction target mainly Java, annotated with JML [14] or C#, annotated with the Spec# language [2]. These tools generate verification conditions (VC) from logical annotations given as stylized comments and program statements, using Hoare-style weakest precondition method. Proof of these verification conditions can be automated in most cases using automatic theorem provers, or otherwise performed manually in an interactive theorem prover. Our background framework is given by the Why/Krakatoa/Caduceus family of tools [11] for the deductive verification of C and Java programs.

It is common in deductive verification to model the heap as a set of functional arrays indexed by pointers, also known as the Burstall-Bornat memory model [3, 10]. This model relies on the strong assumption that memory is strongly typed, as in a Java-like language where objects are allocated on the heap in a typed way (using new) and accessed only through their original type. This model makes it difficult to consider unions and casts which occur frequently in C, because these features allow reinterpreting typed memory through a different type. However, it is sometimes desirable to use union or cast and possible to do it in a safe way, as shown by the secure implementation for Managed Strings by CERT [4]. Safe use of union mandates that reading a variant field is only allowed if it was the last field written through. Safe use of cast relies on the layout of structures in memory and will be explained in Section 4.

In this paper, we introduce support for safe union and cast in deductive verification. Our aim is to support as much as possible those unions and casts that do not lead to

This research is partly supported by CIFRE contract 2005/973 with France Télécom company, and ANR RNTL ‘CAT’

reinterpreting memory at the byte level, but are used to simulate higher level constructs like discriminated unions or inheritance in a low level language like C. Considering unions as a special case of cast, these inheritance-related casts account for 99% of casts in C programs, according to [7]. We introduce the intermediate language Jessie as a cast-free, union-free subset of C, with added support for structure subtyping, similar to Java class subtyping, on which we perform deductive verification. We show how to transform C programs with the kind of casts and unions that we target into Jessie programs.

The remaining of this paper is organized as follows. Section 2 presents the Jessie language. Section 3 and Section 4 describe respectively our support of union and cast. Section 5 reports on our current implementation and Section 6 discusses related work. We finally conclude in Section 7.

2 From C to Jessie

Jessie is an intermediate language for the deductive verification of Java and C. To some extent, it incorporates C features not present in Java (e.g. pointer arithmetic) and Java features not present in C (e.g. exception mechanism). Jessie does not incorporate the low-level platform-dependent C features (e.g. pointer cast) as well as the dynamic method call of Java. Overall, this intermediate language is designed for being suitable for deductive verification and a target of translation from C and Java. Logical annotations are part of the language.

2.1 Illustrative Example

The following program illustrates the most important features in Jessie, with keywords in bold type.

    type Point = { real x; real y; }
    type ColorPoint = Point with {
      integer col;
      invariant color(this) = this.col >= 0 && this.col < 256;
    }
    unit darkens(Point[0] p) {
      if (p <: ColorPoint) {
        ColorPoint[0] q = p :> ColorPoint;
        q.col = q.col / 2;
      }
    }

First, it is possible to define subtypes of a parent type, using the keyword with. Subtypes inherit all their parent type fields, like in usual object-oriented languages. Arithmetic types integer and real are mathematical infinite precision integers and reals. Logical annotations can be added in structures as type invariants or in functions as preconditions and postconditions. Defining exactly when a type invariant holds, when it

may be violated and when it must be restored is an orthogonal problem. This should be dealt with a type invariant system, like the one in JML [13] or Boogie [2]. We assume we have one such invariant system in this paper, the simplest of which adds parameter type invariants as function preconditions and postconditions.

In Jessie, one may precise a pointer type with a range of valid indices. E.g., Pointer[0] is a pointer that can be safely dereferenced. Coercion, denoted :>, allows viewing an object of a type as an object of another type in the same hierarchy. If it is an upcast, i.e. from a subtype to a parent type, it can be used freely. If it is a downcast, i.e. from a parent type to a subtype, a VC is generated to prove its correctness. An instance-of operator, denoted <:, allows testing the type of an object in annotations.

Our goal is to prove C programs correct, i.e. that they follow their specification and they do not lead to errors (e.g. memory errors). C program statements are translated to Jessie statements and C logical annotations to Jessie logical annotations. In the following, we call object a value of a pointer type. By extension, we call object a value of type pointer in C, whose translation is a Jessie object.

2.2 Memory Model

The weakest precondition calculus is performed using a model of memory based on functional arrays. The interpretation language we use is strongly typed. The type of a pointer p has the form S pointer where S denotes the root of the Jessie type hierarchy to which p source type belongs. On our example, both pointers to Point and ColorPoint are interpreted as Point pointer. A structure field of type t in any structure of S hierarchy has type (t, S) memory. E.g., field col of structure ColorPoint has type (integer, Point) memory. This means that a field access e.f is interpreted as select(f, e) and a field update e.f = e′ is interpreted as store(f, e, e′) where select and store satisfy the theory of functional arrays:

    select(update(f, p, v), p) = v
    select(update(f, p, v), q) = select(f, q)   if p ≠ q

Functions select and store have polymorphic types:

    select : (τ, σ) memory, σ pointer → τ
    update : (τ, σ) memory, σ pointer, τ → (τ, σ) memory

To model the type hierarchy, we first introduce a logical type tag. Each source type in Jessie has a corresponding symbol of type tag that represents it. More precisely, for each type hierarchy with root S and each structure name T in that hierarchy, a symbol T_tag of type S tag represents source type T. E.g., Point_tag and ColorPoint_tag are symbols of type Point tag that represent respectively source types Point and ColorPoint. Dynamic types of pointers in hierarchy S are represented by a functional array S_tag_table which associates a S tag to each S pointer. A logical predicate instanceof of type (σ tag_table, σ pointer, σ tag) → boolean models the operator :>. For each type subtype T′ of T, an axiom is introduced:

    instanceof(a, p, T′) ⇒ instanceof(a, p, T)

Casting is just the identity function with a VC. Thus, p :> T is interpreted into:

    assert(instanceof(S_tag_table, p, T_tag)); p

Statement p :> T is equal to p under the assumption that p <: T. This axiomatization is simple enough that the VC generated are usually handled by automatic theorem provers. It is the case on our illustrative example, with any of the three automatic theorem provers we use: Simplify [8], Ergo [6] and Yices [9].

3 Union As Subtyping

3.1 Motivating Example

Fig. 1 shows a typical example of a C program where union feature is used in a principled way to simulate a discriminated union. An implicit invariant of type packet is that whenever field discr is zero, the variant field val1 of field uni is set, and whenever field discr is one, the variant field val2 of field uni is set. We manually introduce the following logical annotations as stylized comments in packet definition to denote just that.

    struct packet {
      [...]
      //@ invariant discr0(this) = this->discr == 0 ⇒ variantof(this->uni, val1)
      //@ invariant discr1(this) = this->discr == 1 ⇒ variantof(this->uni, val2)
    }

where variantof(p, f) means that pointer p of a union type t has currently variant f of t set.

From this annotated C program, it is possible to extract the corresponding Jessie program shown in Fig. 2. Before we describe how to do it automatically, we first do it step-by-step to show how our translation proceeds.

The natural way to see the relation between a union type and its variants in Jessie is through the subtyping relation. Stated otherwise, we translate a union type into an empty structure type and each of its variants as a subtype of the empty structure type just defined. In our example, we define an empty type packet_uni to represent the union, and two subtypes of packet_uni to represent the variants of the union.

    type packet_uni = { }
    type packet_uni_val1 = packet_uni with { integer val1; }
    type packet_uni_val2 = packet_uni with { real val2; }

Type packet is then just a record with a discr field of base type integer and a uni field of object type packet_uni. Type invariants discr0 and discr1 are also translated to reflect translated types.

    struct packet {
      int discr;
      union {
        int val1;
        float val2;
      } uni;
    };

    float get(struct packet *p) {
      switch (p->discr) {
        case 0: return p->uni.val1;
        case 1: return p->uni.val2;
      }
    }

Fig. 1. Discriminated Union in C

    type packet_uni = { }
    type packet_uni_val1 = packet_uni with { integer val1; }
    type packet_uni_val2 = packet_uni with { real val2; }
    type packet = {
      integer discr;
      packet_uni[0] uni;
      invariant discr0(this) = this.discr == 0 ⇒ this.uni <: packet_uni_val1;
      invariant discr1(this) = this.discr == 1 ⇒ this.uni <: packet_uni_val2;
    }

    real get(packet[0] p)
      requires discr0(p) && discr1(p);
      ensures discr0(p) && discr1(p);
    {
      switch (p.discr) {
        case 0: return (p.uni :> packet_uni_val1).val1;
        case 1: return (p.uni :> packet_uni_val2).val2;
      }
    }

Fig. 2. Discriminated Union in Jessie

    type packet = {
      integer discr;
      packet_uni[0] uni;
      invariant discr0(this) = this.discr == 0 ⇒ this.uni <: packet_uni_val1;
      invariant discr1(this) = this.discr == 1 ⇒ this.uni <: packet_uni_val2;
    }

Function get can then be translated with the appropriate coercion where accessing a field of the union. We also add the function preconditions and postconditions that are generated with a naive type invariant system. The result of the translation is shown in Fig. 2. On this Jessie program, our VC generator produces 11 VC that are all proved automatically.

3.2 General Translation Scheme

Although C type invariants are necessary for this proof to succeed, they are not needed for the translation from C to Jessie. Additionally, when there are type invariants, they need to be translated too. In particular, the notation variantof is translated to the instance-of operator. The second argument to variantof should be a field of a union. It is translated into the appropriate subtype that represents this variant in Jessie. The automatic translation goes as follows:

1. Translate each union type as a different empty type.
2. Translate variants of a union type as subtypes of its translation.
3. Translate union field reads as coercions to the appropriate type. E.g., x = u->f becomes x = (u :> T).f. This applies to both program statements and logical annotations, and leads to the generation of VC later on.
4. Translate union field writes as coercions to the appropriate type. E.g., u->f = x becomes (u :>= T).f = x. The new operator :>= is the strong coercion operator. It applies only to program statements, and does not lead to the generation of VC. Application of the strong coercion changes the dynamic type of pointer u, and may invalidate fields of u previously available.

With this simple translation scheme, it is the programmer's responsibility to provide the necessary logical annotations that account for a safe use of union. Ideally, only the type invariant needs to be written, as in our example. In more complex cases, it may be necessary to add function preconditions and postconditions, as well as loop invariants. All accesses to union fields should be proved correct by extracting from the program text the conditions that ensure a safe access, through weakest precondition computation.

It must be noted that we do not consider here uses of the address-of operator, as well as union assignment (when union is assigned as a whole). Section 5 reports on our current work to support those constructs.

4 Cast As Physical Subtyping

4.1 Motivating Example

Casts provide C programmers with additional flexibility. E.g., the program in Fig. 3 implements the same functionality as the one in Fig. 1, except types packet1 and packet2

    struct header { int discr; };
    struct packet1 { int discr; int val1; };
    struct packet2 { struct header head; float val2; };

    float get(struct header *p) {
      switch (p->discr) {
        case 0: return ((struct packet1*) p)->val1;
        case 1: return ((struct packet2*) p)->val2;
      }
    }

Fig. 3. Discriminated Cast in C

    type header = {
      integer discr;
      invariant discr0(this) = this.discr == 0 ⇒ this <: packet1;
      invariant discr1(this) = this.discr == 1 ⇒ this <: packet2;
    }
    and type packet1 = header with { integer val1; }
    and type packet2 = header with { real val2; }

    real get(header[0] p)
      requires discr0(p) && discr1(p);
      ensures discr0(p) && discr1(p);
    {
      switch (p.discr) {
        case 0: return (p :> packet1).val1;
        case 1: return (p :> packet2).val2;
      }
    }

Fig. 4. Discriminated Cast in Jessie

seem unrelated in this version with cast, whereas they corresponded to variants of the same union in the version with union. However, in this version too, casts are used in a principled way to simulate a discriminated union. This is what we call discriminated casts in C. Again, there is an implicit typing invariant for objects of type header: whenever the field discr of an object is zero, the object is of type packet1, and whenever the field discr of an object is one, the object is of type packet2. This can be made explicit using logical annotations as follows.

    struct packet1; // forward declaration
    struct packet2; // forward declaration
    struct header {
      int discr;
      //@ invariant discr0(this) = this->discr == 0 ⇒ typeof(this, struct packet1)
      //@ invariant discr1(this) = this->discr == 1 ⇒ typeof(this, struct packet2)
    }

where typeof(p, t) means that object p has type t.

From this annotated C program, it is possible to extract the corresponding Jessie program shown in Fig. 4. Before we describe how to do it automatically, we first do it step-by-step to show how our translation proceeds.

In the version with cast, it is not obvious which type should be root of the hierarchy. More generally, the form of the type hierarchy must be extracted from the program. In our example, types packet1 and packet2 can be seen as subtypes of type header, for different reasons.

The sequence of field types of type header is a prefix of the sequence of field types of type packet1. Although this is not mandated by the C standard [12] (contrary to a common belief, see e.g. [16]), according to all application binary interfaces (ABI) implemented in C compilers we know of, their layout on this common prefix is the same. Therefore, it is safe⁴ to view an object of type packet1 as an object of type header, which expresses a subtyping relation. For type packet2, things are simpler, as the standard mandates (6.7.2.1 §13) that an object of type packet2 can be safely cast into an object of its first field type header. Again, this expresses a subtyping relation. Both subtyping relations rely on some physical correspondence at the byte-level, which is also known as physical subtyping [18, 5].

⁴ We do not consider here the possibility for the compiler to optimize code by assuming a restricted form of aliasing, based on types. E.g., we forbid use of gcc with option -fstrict-aliasing.

Finally, in Jessie, type header has two subtypes packet1 and packet2. To allow the translation of type invariant in type header to refer to its subtypes, we make all three types mutually recursive.

    type header = {
      integer discr;
      invariant discr0(this) = this.discr == 0 ⇒ this <: packet1;
      invariant discr1(this) = this.discr == 1 ⇒ this <: packet2;
    }
    and type packet1 = header with { integer val1; }

    and type packet2 = header with { real val2; }

Translation of function get is very close to the one already seen in the version with union. The result of the translation is shown in Fig. 4. Like in the version with union, our VC generator produces 11 VC for this program to be correct, which are proved automatically.

4.2 General Translation Scheme

In order to automate the translation for casts, we define formally our notion of physical subtyping. Contrary to the work of [5], we do not rely on the actual layout of fields in memory on a particular platform, but on their order and type. More precisely, we assume there is a simple algorithm to compute the layout of a structure, that we sketch in Fig. 5. Size and alignment of types are manipulated symbolically, which makes our analysis platform-independent.

Definition 1. We call base types in C all types that are not C structures. An object may be of structure type or base type. An object of structure type has sub-objects. Sub-objects may themselves have sub-objects. We will call sub-objects of a type the sub-objects of an object of this type. We are interested in the set of sub-objects of base type contained transitively in an object. If the object is of base type, we include it in the set, at offset zero. □

Definition 2. The layout of type t is the set of its base type sub-objects, that are pairs consisting in an offset from the beginning of the object where the sub-object begins and a type, the type of the sub-object.

It must be noted that the layout of a type is platform-dependent, as it depends, among other things, on the size and alignment of types that are themselves platform-dependent in C.

Example 1. The layout of type header is (0, int) on every platform. On the contrary, the layout of types packet1 and packet2 can vary according to the platform. E.g., for packet1, it is of the form (0, int), (off, int) for some integer offset off.

Definition 3. A type t′ is a physical subtype of type t if the layout of t is a subset of the layout of t′.

This ensures accesses to a base type sub-object in memory through type t are also valid accesses if the original type of the object was type t′. Of course, the syntax for the access may not be the same, e.g. field names may differ.

A layout algorithm takes as input a list of types and returns a layout. We assume there exists such a layout algorithm A taking as input the list of field types of a structure. Function sizeof returns the size of a base type. Function alignof returns the alignment of a base type, while we assume for simplicity that alignof(struct) is the common alignment of structures, which is also the maximum alignment allowed. We could easily adopt a finer

notion of structure alignment, e.g. to account for smaller structure alignment on structures that have only short and char fields.

     1  init current-size to 0
     2  init current-layout to empty set
     3  define A on input type-list :
     4    increase current-size to multiple of alignof(struct)
     5    for each t in type-list do
     6      alignt = if t is a structure then alignof(struct) else alignof(t) fi
     7      increase current-size to multiple of alignt
     8      if t is a structure then
     9        call A on the field types of t
    10      else
    11        add (current-size, t) to current-layout
    12        increase current-size by sizeof(t)
    13      fi
    14    done
    15    increase current-size to multiple of alignof(struct)
    16  call A on field types of input structure

Fig. 5. Layout algorithm A

The outermost call to A on type t returns when current-layout contains the layout of t and current-size the size of t. Its first and last operations on lines 4 and 15 consist in increasing if necessary the current size to a multiple of the structure alignment. The increase is kept as small as possible, as on line 7. Its body is a loop (lines 5 to 14) that considers each type t in the list, increases the current size to fit the alignment of t (line 7) and either calls A recursively if t is a structure (line 9) or adds the new base type sub-object to the layout (line 11) before increasing the current size by the size of t (line 12). A calls itself recursively on line 9 on sub-structures.

Example 2. Assume the following platform-dependent requirements, as in gcc for x86 [16]: sizeof(int) = alignof(int) = 4, alignof(float) = 4, alignof(struct) = 4. On this platform, applying algorithm A gives the actual layouts of types header, packet1 and packet2. The layout of header is (0, int), the layout of packet1 is (0, int), (4, int) and the layout of packet2 is (0, int), (4, float). Consider now a different set of platform-dependent requirements: sizeof(int) = alignof(int) = 4, alignof(float) = 4, alignof(struct) = 8. On this other platform, the layout of header is still (0, int), the layout of packet1 is still (0, int), (4, int), but the layout of packet2 changes to (0, int), (8, float). □

Given a cast from pointer type to pointer type, we need to determine which of the pointed types is subtype of the other. As we do not know the exact size and alignment of types, A cannot be used directly. In the following, we build a layout algorithm from A which makes it possible to decide physical subtyping in a platform-independent way.

Let t and t′ be the pointed types. We denote the subtyping relation ⊳, with the interpretation that t ⊳ t′ means that t′ is a subtype of t. We define a dummy type ts to stand for the beginning and end of structures. We write {t1 ... tn} the type of a structure with field types t1 to tn in that order, denote tl a list of types, and ε the empty list of types. Comma is used to separate types in a list. We use pattern-matching in the rules we present below, with the rules that appear first being applied first.

Function flatten takes as input a type and returns a list of types (including pseudo-type ts) that represent the input type.

    flatten({t1 ... tn}) = ts, flatten(t1), ..., flatten(tn), ts
    flatten(t) = t

Function stutter removes repeated occurrences of pseudo-type ts.

    stutter(ts, ts, tl) = stutter(ts, tl)
    stutter(t, tl) = t, stutter(tl)
    stutter(ε) = ε

Function trim removes occurrences of ts at the beginning and end of a list of types.

    trim(ts, tl) = trim(tl)
    trim(tl, ts) = trim(tl)
    trim(tl) = tl

Function prefix takes two lists of types in argument (separated by a colon). It returns the common prefix of these two lists.

    prefix(t, tl : t, tl′) = t, prefix(tl : tl′)
    prefix(tl : tl′) = ε

Using these we can define ⊳. First we define a type list tl attached to t and a type list tl′ attached to t′.

    tl = trim(stutter(flatten(t)))
    tl′ = trim(stutter(flatten(t′)))
    t ⊳ t′ iff prefix(tl : tl′) = tl and tl ≠ tl′

In order to prove that relation ⊳ is the expected physical subtyping relation, we introduce the notion of layout type list and prove some properties of functions flatten, stutter and trim.

Definition 4. A layout type list for a type t is a list of types for which there exists a layout algorithm returning the layout of t.

The list of field types of t is a layout type list for t, since algorithm A applied to it returns the layout of t.
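The rules above and layout algorithm A of Fig. 5 can be run side by side. The following Python sketch is an added example, not code from the paper or from the Jessie tools; the Struct class, the function names and the value sizeof(float) = 4 (which Example 2 does not state) are assumptions made for illustration.

```python
from dataclasses import dataclass

TS = "ts"  # pseudo-type standing for the beginning and end of a structure


@dataclass(frozen=True)
class Struct:
    name: str
    fields: tuple  # field types in order: base type names (str) or Struct


def flatten(t):
    """flatten({t1 ... tn}) = ts, flatten(t1), ..., flatten(tn), ts."""
    if isinstance(t, Struct):
        out = [TS]
        for f in t.fields:
            out.extend(flatten(f))
        return out + [TS]
    return [t]


def stutter(tl):
    """Remove repeated occurrences of ts."""
    out = []
    for t in tl:
        if not (t == TS and out and out[-1] == TS):
            out.append(t)
    return out


def trim(tl):
    """Remove ts at the beginning and end of the list."""
    start, end = 0, len(tl)
    while start < end and tl[start] == TS:
        start += 1
    while end > start and tl[end - 1] == TS:
        end -= 1
    return tl[start:end]


def prefix(tl1, tl2):
    """Common prefix of two type lists."""
    out = []
    for a, b in zip(tl1, tl2):
        if a != b:
            break
        out.append(a)
    return out


def layout_type_list(t):
    return trim(stutter(flatten(t)))


def is_strict_subtype(t, t_prime):
    """t ⊳ t': t' is a strict physical subtype of t, whatever the platform."""
    tl, tl_p = layout_type_list(t), layout_type_list(t_prime)
    return prefix(tl, tl_p) == tl and tl != tl_p


def round_up(n, align):
    return -(-n // align) * align


def layout(struct, sizeof, alignof, struct_align):
    """Algorithm A of Fig. 5 on one concrete platform: (layout, size)."""
    state = {"size": 0, "layout": set()}

    def A(type_list):
        state["size"] = round_up(state["size"], struct_align)      # line 4
        for t in type_list:
            if isinstance(t, Struct):
                state["size"] = round_up(state["size"], struct_align)
                A(t.fields)                                        # line 9
            else:
                state["size"] = round_up(state["size"], alignof[t])
                state["layout"].add((state["size"], t))            # line 11
                state["size"] += sizeof[t]                         # line 12
        state["size"] = round_up(state["size"], struct_align)      # line 15

    A(struct.fields)
    return state["layout"], state["size"]


# The three structures of Fig. 3.
header = Struct("header", ("int",))
packet1 = Struct("packet1", ("int", "int"))
packet2 = Struct("packet2", (header, "float"))

# The two platforms of Example 2; sizeof(float) = 4 is an assumption.
SIZEOF = {"int": 4, "float": 4}
ALIGNOF = {"int": 4, "float": 4}


def physical_subtype_on(t, t_prime, struct_align):
    """Definition 3 checked on one platform: layout(t) ⊆ layout(t')."""
    lt, _ = layout(t, SIZEOF, ALIGNOF, struct_align)
    ltp, _ = layout(t_prime, SIZEOF, ALIGNOF, struct_align)
    return lt <= ltp
```

The platform-independent test is_strict_subtype never looks at sizes or alignments, while physical_subtype_on checks Definition 3 for one choice of alignof(struct).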

Lemma 1. flatten(t) is a layout type list for t.

Proof. Given the layout algorithm A associated to the list of field types of t, we build a layout algorithm A′ associated to flatten(t). Consider every type in the list in turn. When the current type is ts, increase the current size so that it is a multiple of the structure alignment. When the current type is not ts, do as in the body of A. The only difference between A and A′ is that A increases the current size to a multiple of the structure alignment at recursive calls entry and exit, which correspond to the occurrences of ts in flatten(t). Therefore, algorithm A′ applied to flatten(t) produces the same layout and size as algorithm A applied to the list of field types of t. □

Lemma 2. stutter(flatten(t)) is a layout type list for t.

Proof. The layout algorithm A′ associated to flatten(t) is also a layout algorithm associated to stutter(flatten(t)). Indeed, calling A′ repeatedly on ts does not increase the current size after the first call, since it is already a multiple of structure alignment. □

Lemma 3. trim(stutter(flatten(t))) is a layout type list for t.

Proof. Given the layout algorithm A′ associated to stutter(flatten(t)), we build a layout algorithm A″ associated to trim(stutter(flatten(t))). A″ simply calls A′, but it does not return the size of the structure together with its layout. Initial call to A′ on ts does not increase the current size, since it is zero, which is a multiple of structure alignment. Final call to A′ on ts might increase the current size, but A″ ignores it, therefore removing ts at the end of the list has no effect. □

Theorem 1. Subtyping relation ⊳ is the most precise platform-independent strict physical subtyping relation, assuming a layout algorithm A.

Proof. We show first that given the assumptions we made in A, ⊳ defines a strict physical subtyping relation, i.e. whenever t ⊳ t′, t′ is a strict physical subtype of t. According to lemma 3, tl is a layout type list for t and tl′ is a layout type list for t′, with associated layout algorithm A″. Then, A″ applied to tl returns all base type sub-objects of t, and A″ applied to tl′ returns all base type sub-objects of t′. Since tl is a prefix of tl′, algorithm A″ applied to tl′ returns a superset of the layout of t. Since tl is a strict prefix of tl′, and the difference cannot be only pseudo-types ts that would have been removed by trim, algorithm A″ applied to tl′ returns a strict superset of the layout of t. Therefore, t′ is a strict physical subtype of t.

We then show that ⊳ is the most precise strict physical subtyping relation given the assumptions we made in A. Suppose the contrary. Take t and t′ such that t′ is a strict physical subtype of t, but t ⊳ t′ does not hold. Take tl and tl′ the lists of types obtained from t and t′ as in the definition of ⊳. If tl = tl′, it is trivial that prefix(tl : tl′) = tl, and we reach a contradiction. Therefore, tl ≠ tl′ and prefix(tl : tl′) ≠ tl. Thus, tl and tl′ share a common prefix, and differ on a first type t1 for tl and t2 for tl′. If tl and tl′ differ on their first element, it must be a real type, because ts would have been removed by trim. Then, algorithm A″ returns a sub-object of base type t1 at offset zero for t and a sub-object of base type t2 ≠ t1 at offset zero for t′, which violates the fact t′ is a subtype of t. Therefore, the common prefix cannot be

empty. Then, given our lack of assumptions on base type sizes, algorithm A″ applied to tl and tl′ up to this prefix returns the same current size, an unknown strictly positive integer. In particular, we cannot know given the prefix if the current size reached is a multiple of any alignment. Then, either t1 and t2 are both real types, or one of them is the pseudo-type ts. Given our lack of assumptions on base type alignment, it is not possible to decide that t′ is a subtype of t. □

Example 3. Going back to our example, we compute the layout type list for header, packet1 and packet2.

    trim(stutter(flatten(header))) = int
    trim(stutter(flatten(packet1))) = int, int
    trim(stutter(flatten(packet2))) = int, ts, float

This makes header a parent type of packet1 and packet2, as expected. □

5 Implementation and Future Work

We have implemented a translation from C to Jessie in Frama-C, a C analysis toolbox based on CIL [15]. Generation of VC from Jessie programs is handled in the Why/Krakatoa/Caduceus family of tools [11] for the deductive verification of C and Java programs. Our implementation already handles the simple examples presented in this paper. We are currently extending it to support real programs, like the implementation for Managed Strings by CERT [4]. The Jessie language is itself currently developed to handle more features from C and Java.

As mentioned in Section 3, we do not support yet uses of the address-of operator and union assignment. We are working to support those, building on what is done for structures in Caduceus [10]. Translating the address-of operator applied to a union field should be straightforward. Following what is done in [10], we can manage to have structures always referred to via pointers. Then, the address-of operator applied on a dereference of pointer p yields the same pointer p. It is more complex to translate the address-of operator applied to a subfield of a union field, e.g. &u->f.x. The translation of [10] that inserts pointers to fields whose address is taken does not work well with our translation of union. For those cases, we consider introducing multiple inheritance in Jessie, so that the same pointer p can be used with a different type than those in the union hierarchy presented in this paper.

6 Related Work

Siff and others [18] collected uses of casts in large C programs. They introduce a platform-dependent notion of physical subtyping that is not directly comparable to ours. We present in Fig. 6 examples to illustrate our comparison. First, our presentation of physical subtyping is platform-independent. E.g., it is possible on some platform that struct T is a subtype of struct S while we cannot in general conclude the same on every platform. Secondly, their algorithm emphasizes equality of field names

and sub-structure boundaries for deciding subtyping, which is not needed for safe access. It is rather an additional requirement they impose as a good software engineering practice. E.g., even on a platform where int and structures have the same alignment (and thus struct T is a physical subtype of struct S), their algorithm fails to detect such subtyping relation. Even more convincing in that matter is the case of struct T and struct U. Our algorithm concludes that struct U is a subtype of struct T in a platform-independent way whereas their algorithm rejects such subtyping relation on any particular platform.

Fig. 6. Comparing Physical Subtyping Algorithms (definitions of struct S, struct T and struct U)

In [18], they use this notion of physical subtyping to statically classify casts in upcasts, downcasts or mismatch (remaining cases). Chandra and Reps use it in [5] to devise a physical type checking algorithm for C, that statically rejects programs that cannot be proved correct with respect to physical subtyping. In [16], Nita et al. extract from programs the implicit platform-dependent assumptions that ensure all casts are either upcasts or downcasts in the physical subtyping hierarchy. This produces constraints about the layout of structures and the size and alignment of types.

Our physical subtyping bears much resemblance with the one in CCured [7], although they only precise that care must be taken to account for structure padding without formal description and proof.

Jhala et al. have described in [17] an algorithm to automatically discover the type invariants that guarantee proper use of union as discriminated union in C. Although quite effective at discovering the proper invariants in large C programs, their method is limited to simple forms of invariants. As mentioned in their paper, their algorithm does not target programs which emulate inheritance using discriminated union and cast as we do. Moreover, we believe that the overhead of manually annotating union types is not big when doing program verification.

Another direction was taken by Tuch et al. in [20] to allow reasoning about C union and cast in deductive verification. They choose to work with a byte-level memory model (described in [19]), with lifting functions providing a typed view of the heap. It is possible in their model to reason about any cast and union, even those that reinterpret typed memory through a different type, at the cost of manually providing the rules for how the lifted views of the heap change during these unsafe operations. In their case, they do manual proofs in Isabelle/HOL.

In the context of Caduceus, Andronick presents in her thesis [1] a similar idea to interpret C unions in deductive verification. Writes to memory through a union type are performed in two steps: first,

Bart Jacobs. but also better suited for automatic proof. Acknowledgements I would like to thank Claude Marché and Pierre Crégut for their advisory work. 2006. June 2003. Robert DeLine. Therefore.cert. and use it to build a hierarchy of types based on the casts present in the program. CERT. 2. Mike Barnett. June Andronick. 2006.html. we assume a general algorithm to compute the layout of structures. Matthew Harren. Notes.fr/.Union and Cast in Deductive Verification 15 field is updated. In Mathematics of Program Construction. http: //ergo. Again. 3. http://www. it is sufficient to annotate the C program with a type invariant for the union to specify which variant should be accessible at any time. Leino. We deduce an algorithm to decide platform-independent physical subtyping. Jeremy Condit. Managed String Library. Richard Bornat. Scott McPeak. pages 102–126. References 1. Then. SIGSOFT Softw. Université Paris-Sud. The main problem is to define a physical subtyping relation in a platform-independent way. 7.org/secure-coding/ managedstring. Union used as discriminated union is easily translated into structure subtyping. Boogie: A modular reusable verifier for object-oriented programs. C programs rely on such subtyping in many cases where the C standard does not enforce it. 2000. pages 232–244. George C. 6. FMCO 2005. February 2006. . Satish Chandra and Thomas Reps. thus allowing only correctly typed access to memory. and sufficient in most cases. and K. Bor-Yuh Evan Chang. Modélisation et Vérification Formelles de Systèmes Embarqués dans les Cartes à Microprocesseur – Plate-Forme Java Card et Système d’Exploitation. Ergo automatic theorem prover. 7 Conclusion The classical Burstall-Bornat memory model for imperative languages does not directly fit with uses of union and cast in C. 5. and the anonymous referee for his useful remarks. PhD thesis. similar to what is done in C compilers. their use can be restricted to simulate inheritance in a low-level language like C. 
and Westley Weimer. rules for reinterpretation must be given manually. 4. we translate C programs with restricted unions and casts into Jessie programs on which we perform deductive verification. then a synchronization operation lifts the changes to other fields of the union. 1999. Sylvain Conchon and Evelyne Contejean. In Programming Language Design and Implementation.lri. Finally. Our structure subtyping feature is more restricted than both approaches. CCured in the real world. 24(5):66–75. Necula. Eng. Rustan M. Proving pointer programs in Hoare logic. Cast used as discriminated cast can be translated in a similar way. Physical type checking for c. where each variant of the union becomes a subtype. although with more difficulties. Although these features allow arbitrary reinterpretation of typed memory through different types.

8. David Detlefs, Greg Nelson, and James B. Saxe. Simplify: a theorem prover for program checking. J. ACM, 52(3):365–473, 2005.
9. Bruno Dutertre and Leonardo de Moura. The YICES SMT Solver. http://yices.csl.sri.com. Computer Science Laboratory, SRI International, 2006.
10. Jean-Christophe Filliâtre and Claude Marché. Multi-prover verification of C programs. In Jim Davies, Wolfram Schulte, and Mike Barnett, editors, 6th International Conference on Formal Engineering Methods, volume 3308 of Lecture Notes in Computer Science, pages 15–29, Seattle, WA, USA, November 2004. Springer.
11. Jean-Christophe Filliâtre and Claude Marché. The Why/Krakatoa/Caduceus platform for deductive program verification. In Werner Damm and Holger Hermanns, editors, 19th International Conference on Computer Aided Verification, Lecture Notes in Computer Science, Berlin, Germany, July 2007. Springer.
12. International Organization for Standardization (ISO). The ANSI C standard (C99). http://www.open-std.org/JTC1/SC22/WG14/www/docs/n1124.pdf.
13. Gary T. Leavens, Albert L. Baker, and Clyde Ruby. Preliminary design of JML: a behavioral interface specification language for Java. SIGSOFT Softw. Eng. Notes, 31(3):1–38, 2006.
14. Gary T. Leavens, K. Rustan M. Leino, Erik Poll, Clyde Ruby, and Bart Jacobs. JML: notations and tools supporting detailed design in Java. In Proc. OOPSLA ’00, pages 105–106, Minnesota, 2000.
15. George C. Necula, Scott McPeak, Shree Prakash Rahul, and Westley Weimer. CIL: Intermediate language and tools for analysis and transformation of C programs. In Proc. CC ’02, pages 213–228, London, UK, 2002.
16. Marius Nita, Dan Grossman, and Craig Chambers. A theory of implementation-dependent low-level software. http://www.cs.washington.edu/homes/djg/papers/tidlls.pdf, October 2006.
17. Ru-Gang Xu Ranjit Jhala, Rupak Majumdar. State of the union: Type inference via Craig interpolation. In Proc. TACAS’07, 2007.
18. Michael Siff, Satish Chandra, Thomas Ball, Krishna Kunchithapadam, and Thomas Reps. Coping with type casts in C. In Proc. ESEC/FSE-7, pages 180–198, London, UK, 1999.
19. Harvey Tuch and Gerwin Klein. A unified memory model for pointers. In Geoff Sutcliffe and Andrei Voronkov, editors, 12th International Conference on Logic for Programming Artificial Intelligence and Reasoning (LPAR-12), volume 3835 of Lecture Notes in Computer Science, pages 474–488, Jamaica, December 2005.
20. Harvey Tuch, Gerwin Klein, and Michael Norrish. Types, bytes, and separation logic. In Martin Hofmann and Matthias Felleisen, editors, Proc. 34th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL’07), pages 97–108, Nice, France, January 2007.