

Intermediate Representation

Prof. O. Nierstrasz

Thanks to Jens Palsberg and Tony Hosking for their kind permission to reuse and adapt the CS132 and CS502 lecture notes.

Roadmap

> Intermediate representations
> Example: IR trees for MiniJava

© Oscar Nierstrasz




Why use intermediate representations?

1. Software engineering principle
   - break compiler into manageable pieces
2. Simplifies retargeting to new host
   - isolates back end from front end
3. Simplifies support for multiple languages
   - different languages can share IR and back end
4. Enables machine-independent optimization
   - general techniques, multiple passes

IR scheme

•  front end produces IR
•  optimizer transforms IR into a more efficient program
•  back end transforms IR into target code

Kinds of IR

> Abstract syntax trees (AST)
> Linear operator form of tree (e.g., postfix notation)
> Directed acyclic graphs (DAG)
> Control flow graphs (CFG)
> Program dependence graphs (PDG)
> Static single assignment form (SSA)
> 3-address code
> Hybrid combinations

Categories of IR

> Structural
  - graphically oriented (trees, DAGs)
  - nodes and edges tend to be large
  - heavily used in source-to-source translators
> Linear
  - pseudo-code for abstract machine
  - large variation in level of abstraction
  - simple, compact data structures
  - easier to rearrange
> Hybrid
  - combination of graphs and linear code (e.g., CFGs)
  - attempt to achieve best of both worlds

Important IR properties

> Ease of generation
> Ease of manipulation
> Cost of manipulation
> Level of abstraction
> Freedom of expression (!)
> Size of typical procedure
> Original or derivative

Subtle design decisions in the IR can have far-reaching effects on the speed and effectiveness of the compiler!

Degree of exposed detail can be crucial.

Abstract syntax tree

An AST is a parse tree with nodes for most non-terminals removed.

Since the program is already parsed, non-terminals needed to establish precedence and associativity can be collapsed!

For the expression x - 2 * y, a linear operator form of the tree (postfix) would be: x 2 y * -
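The postfix form is simply a post-order walk of the AST. A minimal Python sketch, assuming a hypothetical two-class AST (Leaf and BinOp are illustrative names, not part of the lecture notes):

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Leaf:
    name: str            # identifier or literal, e.g. "x" or "2"


@dataclass(frozen=True)
class BinOp:
    op: str              # operator symbol, e.g. "-" or "*"
    left: "Node"
    right: "Node"


Node = Union[Leaf, BinOp]


def postfix(node: Node) -> str:
    """Post-order walk: both operands first, then the operator."""
    if isinstance(node, Leaf):
        return node.name
    return f"{postfix(node.left)} {postfix(node.right)} {node.op}"


# AST for x - 2 * y (precedence is encoded in the tree shape)
ast = BinOp("-", Leaf("x"), BinOp("*", Leaf("2"), Leaf("y")))
print(postfix(ast))  # x 2 y * -
```

No parentheses or precedence rules are needed in the output, because the tree shape already fixes the evaluation order.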

Directed acyclic graph

A DAG is an AST with unique, shared nodes for each value.

x := 2 * y + sin(2*x)
z := x / 2
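One common way to get unique, shared nodes is to intern every node in a table keyed by its operator and children. The sketch below is an illustration under that assumption (the Dag class and its node method are hypothetical names), applied to the two assignments above:

```python
class Dag:
    """Interns nodes so that structurally equal values share one node."""

    def __init__(self):
        self.ids = {}      # (op, *children) -> node id
        self.nodes = []    # node id -> (op, *children)

    def node(self, op, *kids):
        key = (op, *kids)
        if key not in self.ids:          # first time we see this value
            self.ids[key] = len(self.nodes)
            self.nodes.append(key)
        return self.ids[key]


dag = Dag()
x0 = dag.node("var", "x")                # value of x before the assignment
y = dag.node("var", "y")
two = dag.node("const", 2)

# x := 2 * y + sin(2 * x)
x1 = dag.node("+",
              dag.node("*", dag.node("const", 2), y),
              dag.node("sin", dag.node("*", dag.node("const", 2), x0)))
# z := x / 2   (x now refers to the node built for the first assignment)
z = dag.node("/", x1, dag.node("const", 2))

print(len(dag.nodes))  # 8
```

The constant 2 is requested four times but stored once, and the second statement reuses the whole node for the new value of x instead of copying it.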

Control flow graph

> A CFG models transfer of control in a program
  - nodes are basic blocks (straight-line blocks of code)
  - edges represent control flow (loops, if/else, goto …)

if x = y then
  S1
else
  S2
end
S3

Static single assignment (SSA)

> Each assignment to a temporary is given a unique name
  - All uses reached by that assignment are renamed
  - Compact representation
  - Useful for many kinds of compiler optimization …

x := 3        ⇒   x1 := 3
x := x + 1    ⇒   x2 := x1 + 1
x := 7        ⇒   x3 := 7
x := x*2      ⇒   x4 := x3*2

Ron Cytron, et al., “Efficiently computing static single assignment form and the control dependence graph,” ACM TOPLAS, 1991. doi:10.1145/115372.115320

http://en.wikipedia…
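For straight-line code, the renaming can be sketched in a few lines of Python. This is only an illustration: to_ssa is a hypothetical helper, statements are assumed to be (target, expression-string) pairs, and control flow (which would require φ-functions at join points) is deliberately not handled.

```python
import re


def to_ssa(stmts):
    """Rename straight-line assignments [(target, expr), ...] into SSA form."""
    version = {}                          # variable -> latest version number
    out = []
    for target, expr in stmts:
        def rename(match):
            # A use refers to the latest definition that reaches it.
            name = match.group(0)
            return f"{name}{version[name]}" if name in version else name

        rhs = re.sub(r"[A-Za-z_]\w*", rename, expr)
        version[target] = version.get(target, 0) + 1   # fresh name per assignment
        out.append(f"{target}{version[target]} := {rhs}")
    return out


program = [("x", "3"), ("x", "x + 1"), ("x", "7"), ("x", "x*2")]
for line in to_ssa(program):
    print(line)
# x1 := 3
# x2 := x1 + 1
# x3 := 7
# x4 := x3*2
```

The right-hand side is renamed before the target gets its new version, so a statement such as x := x + 1 reads the old name and defines the new one.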

3-address code

> Statements take the form: x = y op z
  - single operator and at most three names

x - 2 * y   ⇒   t1 = 2 * y
                t2 = x - t1

> Advantages:
  - compact form
  - names for intermediate values
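Generating 3-address code from an expression tree amounts to a post-order walk that allocates a fresh temporary for each operator. A minimal sketch, assuming expressions are nested tuples (op, left, right) with strings as leaves; gen_tac is a hypothetical name:

```python
from itertools import count


def gen_tac(expr, code, temps):
    """Append 3-address code for expr to code; return the name holding its value."""
    if isinstance(expr, str):             # leaf: variable or constant
        return expr
    op, left, right = expr
    l = gen_tac(left, code, temps)        # operands are evaluated left to right
    r = gen_tac(right, code, temps)
    t = f"t{next(temps)}"                 # fresh name for the intermediate value
    code.append(f"{t} = {l} {op} {r}")
    return t


code = []
result = gen_tac(("-", "x", ("*", "2", "y")), code, count(1))
print("\n".join(code))
# t1 = 2 * y
# t2 = x - t1
```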

Typical 3-address codes

assignments                       x = y op z
                                  x = op y
                                  x = y[i]
                                  x = y
branches                          goto L
conditional branches              if x relop y goto L
procedure calls                   param x
                                  param y
                                  call p
address and pointer assignments   x = &y
                                  *y = z

3-address code: two variants

Quadruples
•  simple record structure
•  easy to reorder
•  explicit names

Triples
•  table index is implicit name
•  only 3 fields
•  harder to reorder
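The difference between the two variants is easiest to see side by side. The sketch below is an assumption-laden illustration (Quad, Triple, Ref and quads_to_triples are hypothetical names, and each temporary is assumed to be defined exactly once); it converts quadruples into triples by replacing explicit result names with table indices:

```python
from typing import NamedTuple, Union


class Quad(NamedTuple):
    op: str
    arg1: str
    arg2: str
    result: str                    # explicit name for the result


class Ref(NamedTuple):
    index: int                     # row of the triple that computes the value


class Triple(NamedTuple):
    op: str
    arg1: Union[str, Ref]
    arg2: Union[str, Ref]          # no result field: the row index is the name


def quads_to_triples(quads):
    defined_at = {}                # temporary name -> row that defines it
    triples = []
    for row, q in enumerate(quads):
        a1 = Ref(defined_at[q.arg1]) if q.arg1 in defined_at else q.arg1
        a2 = Ref(defined_at[q.arg2]) if q.arg2 in defined_at else q.arg2
        triples.append(Triple(q.op, a1, a2))
        defined_at[q.result] = row
    return triples


quads = [Quad("*", "2", "y", "t1"), Quad("-", "x", "t1", "t2")]
triples = quads_to_triples(quads)
print(triples[1])  # Triple(op='-', arg1='x', arg2=Ref(index=0))
```

Because a triple refers to its operand by row number, moving a row invalidates every Ref that points at or past it, which is why triples are harder to reorder than quadruples.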

IR choices

> Other hybrids exist
  - combinations of graphs and linear codes
  - CFG with 3-address code for basic blocks
> Many variants used in practice
  - no widespread agreement
  - compilers may need several different IRs!
> Advice:
  - choose IR with right level of detail
  - keep manipulation costs in mind

Roadmap

> Intermediate representations
> Example: IR trees for MiniJava

IR trees: expressions

CONST i             integer constant i
NAME n              symbolic constant n
TEMP t              register t
BINOP e1 e2         +, -, etc.
MEM e               contents of word of memory at address e
CALL f [e1,…,en]    procedure call
ESEQ s e            expression sequence

NB: evaluation left to right

IR trees: statements

MOVE (TEMP t) e      evaluate e into temp t
MOVE (MEM e1) e2     evaluate e1 to address a, e2 to word at a
EXP e                evaluate e and discard
JUMP e [l1,…,ln]     transfer to address e with value l1 …
CJUMP e1 e2 t f      evaluate and compare e1 and e2, jump to t or f
SEQ s1 s2            statement sequence
LABEL n              define name n as current address (can use NAME(n) as jump address)

Converting between kinds of expressions

> Kinds of expressions:
  - Ex(exp): expressions (compute a value)
  - Nx(stm): statements (compute no value)
  - Cx.op(t,f): conditionals (jump to true/false destinations)
> Conversion operators:
  - cvtEx: convert to expression
  - cvtNx: convert to statement
  - cvtCx(t,f): convert to conditional

Variables, arrays and fields

Local variables:
  t ⇒ Ex(TEMP(t))

Array elements:
  e[i] ⇒ Ex(MEM(+(e.cvtEx(), ×(i.cvtEx(), CONST(w)))))
  where w is the target machine's word size

Object fields:
  e.f ⇒ Ex(MEM(+(e.cvtEx(), CONST(o))))
  where o is the byte offset of field f
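The address arithmetic in these translations can be made concrete with IR trees built as nested tuples. This is a simplified sketch: the constructor functions mirror the node names above, the Ex wrapper and cvtEx() calls are omitted, and the word size W = 4 is an assumption chosen only for the example.

```python
W = 4  # assumed word size in bytes (target dependent)


def CONST(i):
    return ("CONST", i)


def TEMP(t):
    return ("TEMP", t)


def MEM(e):
    return ("MEM", e)


def BINOP(op, e1, e2):
    return ("BINOP", op, e1, e2)


def array_elem(base, index, w=W):
    """e[i]  =>  MEM(+(e, *(i, CONST(w))))"""
    return MEM(BINOP("+", base, BINOP("*", index, CONST(w))))


def field(base, offset):
    """e.f  =>  MEM(+(e, CONST(o))), with o the byte offset of f"""
    return MEM(BINOP("+", base, CONST(offset)))


a_i = array_elem(TEMP("a"), TEMP("i"))
p_f = field(TEMP("p"), 8)
print(a_i)
print(p_f)
```

Both cases reduce to a MEM node around a computed address; only the way the offset is obtained differs (scaled index versus constant).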

MiniJava: string literals, object creation

String literals: allocate statically
          .word 11
  label:  .ascii "hello world"

  "hello world" ⇒ Ex(NAME(label))

Object creation: allocate object in heap
  new T() ⇒ Ex(CALL(NAME("new"), CONST(fields), NAME(label for T's vtable)))

Control structures

> Basic blocks:
  - maximal sequence of straight-line code without branches
  - label starts a new block
> Control structure translation:
  - control flow links up basic blocks
  - implementation requires bookkeeping
  - some care needed to produce good code!
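The two rules for basic blocks (a label starts a block, a branch ends one) can be sketched as a single pass over linear code. The instruction syntax below is an assumption made for the example: labels are strings ending in ":" and branches are strings starting with "jump" or "if".

```python
def basic_blocks(instrs):
    """Split a linear instruction list into basic blocks."""
    blocks, current = [], []
    for ins in instrs:
        if ins.endswith(":") and current:     # a label starts a new block
            blocks.append(current)
            current = []
        current.append(ins)
        if ins.startswith(("jump", "if")):    # a branch ends the current block
            blocks.append(current)
            current = []
    if current:                               # flush the final block
        blocks.append(current)
    return blocks


code = ["if not (c) jump done", "body:", "s", "if c jump body", "done:"]
for block in basic_blocks(code):
    print(block)
# ['if not (c) jump done']
# ['body:', 's', 'if c jump body']
# ['done:']
```

Linking the resulting blocks by their jump targets and fall-through successors would then give the edges of a control flow graph.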

while loops

while (c) s ⇒
  Nx(SEQ(SEQ(c.cvtCx(b,x), SEQ(LABEL(b), s.cvtNx())),
         SEQ(c.cvtCx(b,x), LABEL(x))))

for example:
        if not (c) jump done
  body: s
        if c jump body
  done:
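The same translation can be sketched with tuple-based IR trees. In this illustration the condition is assumed to be a Python function taking the true and false labels and returning a CJUMP statement (standing in for c.cvtCx(b,x)); translate_while, new_label and flatten are hypothetical helper names.

```python
from itertools import count

_labels = count()


def new_label(prefix):
    return f"{prefix}{next(_labels)}"


def SEQ(s1, s2):
    return ("SEQ", s1, s2)


def LABEL(n):
    return ("LABEL", n)


def translate_while(cond_cx, body_nx):
    """while (c) s: test once on entry, then test again after each iteration."""
    b, x = new_label("body"), new_label("done")
    return SEQ(SEQ(cond_cx(b, x), SEQ(LABEL(b), body_nx)),
               SEQ(cond_cx(b, x), LABEL(x)))


def flatten(stm):
    """Turn nested SEQs into a linear list of statements."""
    if stm[0] == "SEQ":
        return flatten(stm[1]) + flatten(stm[2])
    return [stm]


# while (i < 10) i = i + 1
cond = lambda t, f: ("CJUMP", "<", ("TEMP", "i"), ("CONST", 10), t, f)
body = ("MOVE", ("TEMP", "i"), ("BINOP", "+", ("TEMP", "i"), ("CONST", 1)))
loop = translate_while(cond, body)
for stm in flatten(loop):
    print(stm)
```

Emitting the test twice keeps a single conditional jump per iteration, at the cost of duplicating the condition code.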

Method calls

e0.m(e1,…,en) ⇒
  Ex(CALL(MEM(MEM(e0.cvtEx(), -w), m.index × w), e1.cvtEx(), …, en.cvtEx()))

case statements

> case E of V1: S1 … Vn: Sn end
  - evaluate E to V
  - find value V in case list
  - execute statement for found case
  - jump to statement after case
> Key issue: finding the right case
  - sequence of conditional jumps (small case set) ⇒ O(# cases)
  - binary search of ordered jump table (sparse case set) ⇒ O(log2 # cases)
  - hash table (dense case set) ⇒ O(1)

case statements: sample translation

        t := expr
        jump test
  L1:   code for S1
        jump next
  L2:   code for S2
        jump next
        …
  Ln:   code for Sn
        jump next
  test: if t = V1 jump L1
        if t = V2 jump L2
        …
        if t = Vn jump Ln
        code to raise exception
  next:
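The layout above (case bodies first, tests collected at the end) can be generated mechanically. A small sketch, assuming cases are given as (value, statement) pairs and the output is a list of pseudo-instruction strings; translate_case is a hypothetical name and corresponds to the "sequence of conditional jumps" strategy only.

```python
def translate_case(expr, cases):
    """Emit pseudo-code for: case expr of V1: S1 ... Vn: Sn end."""
    code = [f"t := {expr}", "jump test"]
    for i, (_, stmt) in enumerate(cases, start=1):
        code += [f"L{i}: code for {stmt}", "jump next"]     # one body per case
    code.append("test:")
    for i, (value, _) in enumerate(cases, start=1):
        code.append(f"if t = {value} jump L{i}")            # linear search: O(# cases)
    code += ["code to raise exception", "next:"]            # no case matched
    return code


listing = translate_case("E", [("V1", "S1"), ("V2", "S2")])
print("\n".join(listing))
```

Replacing the chain of tests with a binary search or a table lookup changes only the part after the test label; the case bodies and the jumps to next stay the same.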

Simplification

> After translation, simplify trees
  - No SEQ or ESEQ
  - CALL can only be subtree of EXP() or MOVE(TEMP t, …)
> Transformations:
  - Lift ESEQs up tree until they can become SEQs
  - turn SEQs into linear list

Linearizing trees

ESEQ(s1, ESEQ(s2, e))               =  ESEQ(SEQ(s1, s2), e)
BINOP(op, ESEQ(s, e1), e2)          =  ESEQ(s, BINOP(op, e1, e2))
MEM(ESEQ(s, e))                     =  ESEQ(s, MEM(e))
JUMP(ESEQ(s, e))                    =  SEQ(s, JUMP(e))
CJUMP(op, ESEQ(s, e1), e2, l1, l2)  =  SEQ(s, CJUMP(op, e1, e2, l1, l2))
BINOP(op, e1, ESEQ(s, e2))          =  ESEQ(MOVE(TEMP t, e1), ESEQ(s, BINOP(op, TEMP t, e2)))
CJUMP(op, e1, ESEQ(s, e2), l1, l2)  =  SEQ(MOVE(TEMP t, e1), SEQ(s, CJUMP(op, TEMP t, e2, l1, l2)))
MOVE(ESEQ(s, e1), e2)               =  SEQ(s, MOVE(e1, e2))
CALL(f, a)                          =  ESEQ(MOVE(TEMP t, CALL(f, a)), TEMP(t))
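A few of these rewrite rules (for ESEQ, MEM and BINOP) can be sketched as one recursive function that returns the statements to run first together with an ESEQ-free expression. This is an illustrative subset only: trees are nested tuples, lift is a hypothetical name, statements inside an ESEQ are assumed to be already simple, and the conservative rule is always used when the right operand carries statements.

```python
from itertools import count

_temps = count()


def lift(e):
    """Return (stmts, expr): statements to execute first, and an ESEQ-free expr."""
    kind = e[0]
    if kind == "ESEQ":                    # ESEQ(s1, ESEQ(s2, e)) = ESEQ(SEQ(s1, s2), e)
        _, s, inner = e
        stmts, inner = lift(inner)
        return [s] + stmts, inner
    if kind == "MEM":                     # MEM(ESEQ(s, e)) = ESEQ(s, MEM(e))
        stmts, addr = lift(e[1])
        return stmts, ("MEM", addr)
    if kind == "BINOP":
        _, op, e1, e2 = e
        s1, e1 = lift(e1)
        s2, e2 = lift(e2)
        if s2:                            # s2 might change e1: save e1 in a temp first
            t = ("TEMP", f"t{next(_temps)}")
            s1 = s1 + [("MOVE", t, e1)]
            e1 = t
        return s1 + s2, ("BINOP", op, e1, e2)
    return [], e                          # CONST, NAME, TEMP: nothing to lift


s = ("MOVE", ("TEMP", "x"), ("CONST", 5))
stmts, expr = lift(("BINOP", "+", ("TEMP", "x"), ("ESEQ", s, ("CONST", 2))))
print(stmts)
print(expr)
```

In the example, the old value of x is copied into a fresh temporary before s overwrites it, which is exactly why the rule for an ESEQ in the right operand needs the extra MOVE.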

What you should know!

✎  Why do most compilers need an intermediate representation for programs?
✎  What are the key tradeoffs between structural and linear IRs?
✎  What is a “basic block”?
✎  What are common strategies for representing case statements?

Can you answer these questions?

✎  Why can't a parser directly produce high-quality executable code?
✎  What criteria should drive your choice of an IR?
✎  What kind of IR does JTB generate?

License

> http://creativecommons.org/licenses/by-sa/2.5/

Attribution-ShareAlike 2.5

You are free:
•  to copy, distribute, display, and perform the work
•  to make derivative works
•  to make commercial use of the work

Under the following conditions:

Attribution. You must attribute the work in the manner specified by the author or licensor.

Share Alike. If you alter, transform, or build upon this work, you may distribute the resulting work only under a license identical to this one.

•  For any reuse or distribution, you must make clear to others the license terms of this work.
•  Any of these conditions can be waived if you get permission from the copyright holder.

Your fair use and other rights are in no way affected by the above.