Professional Documents
Culture Documents
Xavier Leroy
INRIA Paris-Rocquencourt
X. Leroy (INRIA)
Compiler verication
DTP 2008
1 / 55
Source program
Compiler
Executable machine code Bugs in the compiler can lead to incorrect machine code being generated from a correct source program.
X. Leroy (INRIA)
Compiler verication
DTP 2008
2 / 55
Source program
Compiler
Executable machine code Non-critical sofware: Compiler bugs are negligible compared with those of the program itself.
X. Leroy (INRIA)
Compiler verication
DTP 2008
2 / 55
Source program
Compiler
Test
Critical software certied by systematic testing: What is tested: the executable code generated by the compiler. Compiler bugs are detected along with those of the program.
X. Leroy (INRIA) Compiler verication DTP 2008 2 / 55
Source program
Formal verication
Compiler
Executable machine code Critical software certied by formal methods: What is formally veried: the source code, not the executable code. Compiler bugs can invalidate the guarantees obtained by formal methods.
X. Leroy (INRIA) Compiler verication DTP 2008 2 / 55
Formal verication
Compiler
Executable machine code Formally veried compiler: Guarantees that the generated executable code behaves as prescribed by the semantics of the source program.
X. Leroy (INRIA) Compiler verication DTP 2008 2 / 55
In reality. . .
A great many tools are involved in the production and verication of critical software: Model checker Program prover Static analyzers Compiler Code reviews Test
Executable
X. Leroy (INRIA)
Compiler verication
DTP 2008
3 / 55
X. Leroy (INRIA)
Compiler verication
DTP 2008
4 / 55
Outline
1 2 3 4 5 6 7
Introduction: Can you trust your compiler? Formally veried compilers The Compcert experiment Programming the veried compiler Case study: construction of a control-ow graph Dependent types to the rescue? Perspectives
X. Leroy (INRIA)
Compiler verication
DTP 2008
5 / 55
Apply formal methods to the compiler itself to prove a semantic preservation property:
Theorem
For all source codes S, if the compiler generates machine code C from source S, without reporting a compilation error, then C behaves like S. Note: compilers are allowed to fail (ill-formed source code, or capacity exceeded).
X. Leroy (INRIA)
Compiler verication
DTP 2008
6 / 55
The observable behaviors of the source and compiled programs are identical: b, S b C b
X. Leroy (INRIA)
Compiler verication
DTP 2008
7 / 55
Compilers are allowed to rene behaviors, and replace undened behaviors by more specic behaviors. b, S b = (b, C b) (b , C b = S b ) If the target language is deterministic, this is implied by b, S b = C b
X. Leroy (INRIA)
Compiler verication
DTP 2008
8 / 55
Let Spec : behavior Prop be a functional specication for the program. If the source satises Spec, so does the compiled code. (b, S b b, S b = Spec(b)) = (b, C b b, C b = Spec(b)) Implied by the previous property (preservation of not going wrong behaviors). Special case: preservation of type and memory safety.
X. Leroy (INRIA)
Compiler verication
DTP 2008
9 / 55
Model the compiler as a function Comp : Source Code + Error and prove that S, C , b, Comp(S) = C = S C (observational equivalence) using a proof assistant. Note: complex data structures + recursive algorithms interactive program proof is a necessity.
X. Leroy (INRIA)
Compiler verication
DTP 2008
10 / 55
Validate a posteriori the results of compilation: Comp : Source Code + Error Validator : Source Code bool
If Comp(S) = C and Validator (S, C ) = true, success. Otherwise, error. It suces to prove that the validator is correct: S, C , Validator (S, C ) = true = S C The compiler itself needs not be proved.
X. Leroy (INRIA)
Compiler verication
DTP 2008
11 / 55
Comp : Source Prop Proof Code Certicate + Error Checker : Prop Code Certicate bool
If Comp(S, P) = (C , ) and Validator (P, C , ) = true, success. Otherwise, error. Assume that the checker is proved correct: P, C , , Checker (P, C , ) = true C |= P Enables the code consumer to check the validity of the compiled code without trusting the code producer and without having access to the source code. (Think mobile code.)
X. Leroy (INRIA) Compiler verication DTP 2008 12 / 55
Outline
1 2 3 4 5 6 7
Introduction: Can you trust your compiler? Formally veried compilers The Compcert experiment Programming the veried compiler Case study: construction of a control-ow graph Dependent types to the rescue? Perspectives
X. Leroy (INRIA)
Compiler verication
DTP 2008
13 / 55
Develop and prove correct a realistic compiler, usable for critical embedded software. Source language: a subset of C. Target language: PowerPC assembly. Generates reasonably compact and fast code some optimizations. This is software-proof codesign (as opposed to proving an existing compiler). The proof of semantic preservation is mechanized using the Coq proof assistant.
X. Leroy (INRIA) Compiler verication DTP 2008 14 / 55
X. Leroy (INRIA)
Compiler verication
DTP 2008
15 / 55
Clight
C#minor
Cminor
LTLin
LTL
RTL
Optimizations: constant propagation, common subexpressions
spilling, reloading calling conventions layout of of stack frames PowerPC code generation
Linear
Mach
PPC
X. Leroy (INRIA)
Compiler verication
DTP 2008
16 / 55
C source
AST C
simplications
AST Clight
Executable
assembling linking
Assembly
Not proved
X. Leroy (INRIA)
Compiler verication
DTP 2008
17 / 55
X. Leroy (INRIA)
Compiler verication
DTP 2008
18 / 55
Approximately 2.5 man.years and 40000 lines of Coq: 13% 8% 22% 50% Proof scripts 7% Misc
X. Leroy (INRIA)
Compiler verication
DTP 2008
19 / 55
Outline
1 2 3 4 5 6 7
Introduction: Can you trust your compiler? Formally veried compilers The Compcert experiment Programming the veried compiler Case study: construction of a control-ow graph Dependent types to the rescue? Perspectives
X. Leroy (INRIA)
Compiler verication
DTP 2008
20 / 55
Pros: + No need for a program logic; can reason directly over the program. + Can use rich types, incl. dependent types. . . Apparent cons that are pros if you believe in functional programming: Must get rid of errors and state, e.g. by using monads. Must use purely functional (persistent) data structures. Cons: Must prove that all functions terminate. . . . or hack around possible non-termination
(e.g. force termination on a compile-time error after 10N iterations)
X. Leroy (INRIA) Compiler verication DTP 2008 22 / 55
Coqs extraction mechanism produces executable Caml code from the specications. Execution speed is reasonable despite the overhead of purely functional data structure, Coqs binary integers, etc. Probably the biggest program ever extracted from a Coq development. Most of the Coq code is programmed in pedestrian ML style (HM types). A few uses of richer types (see later).
X. Leroy (INRIA)
Compiler verication
DTP 2008
24 / 55
Execution time
Nbody
Almabench
Mandelbrot
SHA1
Btree
Qsort
AES
X. Leroy (INRIA)
Compiler verication
DTP 2008
Spectral
25 / 55
FFT
Outline
1 2 3 4 5 6 7
Introduction: Can you trust your compiler? Formally veried compilers The Compcert experiment Programming the veried compiler Case study: construction of a control-ow graph Dependent types to the rescue? Perspectives
X. Leroy (INRIA)
Compiler verication
DTP 2008
26 / 55
The Cminor language: a low-level, much simplied C. Structured in functions, statements, and expressions. The RTL language: Machine-level instructions operating over variables and temporaries. Organized in a control-ow graph:
Nodes = instructions. Edge from I to J = J is a successor of I (J can execute just after I ).
X. Leroy (INRIA)
Compiler verication
DTP 2008
27 / 55
X. Leroy (INRIA)
Compiler verication
DTP 2008
28 / 55
X. Leroy (INRIA)
Compiler verication
DTP 2008
29 / 55
Idea: every expression or statement occurring in the Cminor source should correspond to a region of the control-ow graph.
Start node i1
in
End node
The instructions i1 , . . . , in in this region should compute the corresponding expression, leaving its value in a given temporary t, or execute the statement. A recursive decomposition of the control-ow graph.
X. Leroy (INRIA)
Compiler verication
DTP 2008
30 / 55
Examples of correspondences
e1 + e2 in r . . . s1 ; s2 . . . if (e) s1 else s2 s1 . . . while (e) s . . .
e1 in r1
e in r
e in r
if(r ) . . . e2 in r2 . . . s2 . . . r = r1 + r2 s1 . . . s2 . . .
if(r )
nop
X. Leroy (INRIA) Compiler verication DTP 2008 31 / 55
G : control-ow graph. : mapping Cminor variable RTL temporary. P: set of temporaries to preserve. G , , P e1 in r1 n, n1 G , , P {r1 } e2 in r2 n1 , n2 G (n2 ) = add(r , r1 , r2 , n ) r P / G , , P e1 + e2 in r n, n
X. Leroy (INRIA)
Compiler verication
DTP 2008
32 / 55
For statements: G ,
stmt node1 , node2 . G , , Codom() e in r n, n1 G (n1 ) = if(r , n2 , n ) G , s n2 , n3 G (n3 ) = nop(n) G, (while (e) s) n, n
G, G, G,
s1 n, n1 s2 n1 , n (s1 ; s2 ) n, n
X. Leroy (INRIA)
Compiler verication
DTP 2008
33 / 55
n, R, M . . . . . . t .t . . . .? . Inv Post .................................... n , R , M v, E , M Hypotheses: Left, a big-step evaluation of expression e to value v in Cminor. Top, invariant R(r ) = E (x) whenever (x) = r . Conclusions: Right, a sequence of transitions in the RTL semantics, from node n to n . Bottom, invariant plus postcondition: R (r ) = v r0 P, R (r0 ) = R(r0 ).
X. Leroy (INRIA) Compiler verication DTP 2008 34 / 55
e, E , M
The graph is built incrementally by recursive functions such as transl expr e r n Eect: add to the graph the instructions that evaluate expression e, deposit its result in temporary r , and branch to node n . Return rst node n of this sequence of instructions. If we were to write this function in ML, we would use: Mutable state to represent the current state of the CFG, the generators for fresh CFG nodes and fresh temporaries. Exceptions to report errors (e.g. unbound Cminor variables).
X. Leroy (INRIA)
Compiler verication
DTP 2008
35 / 55
X. Leroy (INRIA)
Compiler verication
DTP 2008
36 / 55
Fixpoint transl_expr (map: mapping) (a: expr) (rd: reg) (nd: node) {struct a}: mon node := match a with | Evar v => do r <- find_var map v; add_move r rd nd | Eop op al => do rl <- alloc_regs map al; do no <- add_instr (Iop op rl rd nd); transl_exprlist map al rl no ...
X. Leroy (INRIA)
Compiler verication
DTP 2008
38 / 55
then at any later state S (and in particular in the nal state), we still have st code(S ), , P expr in reg node1 , node2
X. Leroy (INRIA)
Compiler verication
DTP 2008
40 / 55
X. Leroy (INRIA)
Compiler verication
DTP 2008
41 / 55
Proving monotonicity
The properties on the state are guaranteed by the basic state operations (add a new node, etc), but we still have to show that all CFG construction operations, built on these basic operations, also guarantee these properties.
Lemma alloc_regs_ok: forall s map al rl s, state_wf s -> alloc_regs map al s = OK rl s -> state_wf s /\ state_incr s s. Lemma transl_expr_ok: forall s map a rd nd s ns s, state_wf s -> transl_expr map a rd nd s = OK ns s -> state_wf s /\ state_incr s s.
Long, boring proofs where Coqs proof automation doesnt work well. Dependent types can help!
X. Leroy (INRIA)
Compiler verication
DTP 2008
42 / 55
Outline
1 2 3 4 5 6 7
Introduction: Can you trust your compiler? Formally veried compilers The Compcert experiment Programming the veried compiler Case study: construction of a control-ow graph Dependent types to the rescue? Perspectives
X. Leroy (INRIA)
Compiler verication
DTP 2008
43 / 55
This property needs to be shown once for each basic function over the state (add a node, etc), but is then propagated for free through all RTL generation functions.
X. Leroy (INRIA)
Compiler verication
DTP 2008
44 / 55
X. Leroy (INRIA)
Compiler verication
DTP 2008
45 / 55
46 / 55
Dep. typed monad for each operation of the monad + ret and bind through any computation monadic
for each operation of the abstract type through any well-typed computation
Compiler verication
DTP 2008
48 / 55
we get
let bind f g = fun s -> match f s with | Error -> Error | OK a s -> match g a s with | Error -> Error | OK b s -> OK b s
(* useless match *)
An open question
Are there other examples of monads that would benet from this kind of dependent typing?
X. Leroy (INRIA)
Compiler verication
DTP 2008
50 / 55
Outline
1 2 3 4 5 6 7
Introduction: Can you trust your compiler? Formally veried compilers The Compcert experiment Programming the veried compiler Case study: construction of a control-ow graph Dependent types to the rescue? Perspectives
X. Leroy (INRIA)
Compiler verication
DTP 2008
51 / 55
Preliminary conclusions
At this stage of the Compcert experiment, the initial goal proving correct a realistic compiler appears feasible. Moreover, proof assistants such as Coq are adequate (but barely) for this task. Quite a bit remains to be done: Handle a larger subset of C. (E.g. with goto.) Deploy and prove correct more optimizations. (Loop optimizations, instruction scheduling, . . . ) Target other processors beyond the PowerPC. Test usability on real-world embedded codes.
X. Leroy (INRIA)
Compiler verication
DTP 2008
52 / 55
X. Leroy (INRIA)
Compiler verication
DTP 2008
53 / 55
X. Leroy (INRIA)
Compiler verication
DTP 2008
54 / 55
PPC
In progress: proof of a memory allocator and GC, written in Cminor, and interfaced with the compiler proof.
(T. Ramananandro, A. Tolmach)
Grand challenge: a bootstrap of Compcert and of the critical parts of Coq (kernel proof-checker + extraction) ?
X. Leroy (INRIA) Compiler verication DTP 2008 55 / 55