Compiler Design

Textbook: Compilers: Principles, Techniques, and Tools, by Aho, Sethi, and Ullman.

Topics:
• Compiler phases
• Lexical analysis
• Syntax analysis
• Code generation

Homework: there will be two major programming assignments. They must be done independently.

Examinations: there will be two hourly exams and a final exam.

Grading: the total grade will be computed as follows:
• 20% for each hourly exam
• 15% for the homework
• 50% for the final exam

Overview of Compiler
• A compiler is a program (written in a high-level language) that converts (translates, compiles) a source program written in a high-level language into equivalent machine code:

    source program → compiler → machine code (object code)

What is a Compiler?
• Definition: a compiler is a program that translates one language into another.
• Usually, the translation takes place between a high-level language and a low-level language.
• Our first step, then, is to introduce some terminology.

Terminology
• Source language: the language that is being translated.
• Object language: the language into which the translation is being done.
• High-level language: a language that is far removed from the computer and close to the problem area(s) for which the language was designed.
• Low-level language: a language that is close to the machine (computer) on which it will run (execute).
• Machine language (sometimes called machine code): the language of a particular computer. It is usually not human-readable and is expressed in bits or hex.
• Intermediate language: a language that is used either
  - because it is a temporary step in the translation process, or
  - because it is neither particularly high- nor low-level and is the output of a translation.
• Assembly language: a language that translates almost one-to-one into machine language, but is in human-readable form.
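To make the "almost one-to-one" relationship between assembly language and machine language concrete, here is a minimal Python sketch of an assembler for a hypothetical four-instruction accumulator machine. The mnemonics (`LOAD`, `ADD`, `STORE`, `HALT`), the opcode numbers, and the encoding `opcode * 100 + address` are all invented for illustration and do not correspond to any real instruction set.

```python
# Hypothetical accumulator machine, invented for illustration only.
# Each machine word is encoded as opcode * 100 + operand address (0-99).
OPCODES = {"HALT": 0, "LOAD": 1, "ADD": 2, "STORE": 3}


def assemble(source: str) -> list[int]:
    """Translate each assembly line into exactly one machine word."""
    words = []
    for line in source.splitlines():
        line = line.split(";")[0].strip()      # drop comments and blank lines
        if not line:
            continue
        parts = line.split()
        mnemonic = parts[0].upper()
        if mnemonic not in OPCODES:
            raise ValueError(f"unknown mnemonic: {parts[0]}")
        operand = int(parts[1]) if len(parts) > 1 else 0
        if not 0 <= operand < 100:
            raise ValueError(f"address out of range: {operand}")
        words.append(OPCODES[mnemonic] * 100 + operand)
    return words


program = """
LOAD 10   ; acc = mem[10]
ADD 11    ; acc = acc + mem[11]
STORE 12  ; mem[12] = acc
HALT
"""
print(assemble(program))  # [110, 211, 312, 0]
```

Each source line yields exactly one machine word, which is what distinguishes an assembler from a compiler; a real assembler would also handle labels, data directives, and binary rather than decimal encodings.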
• Today, compilers are written in high-level languages (such as Java, C++, etc.).
• The earliest compilers were written in assembly language (e.g., the first FORTRAN compiler, released in 1957, and the first COBOL compilers, around 1960).
• Sometimes a compiler is written in the same language that it compiles. This is done through bootstrapping.

Why Should I Learn Compiler Construction?
• How do compilers work?
• How do computers work? (instruction set, registers, addressing modes, run-time data structures, ...)
• What machine code is generated for certain language constructs? (efficiency considerations)
• Getting "a feeling" for good language design

Why Compilers? A Brief History
• The first computers were "hard-wired".
• That is, they were collections of physical devices connected to one another in an assemblage designed to calculate particular kinds of results.
• For example, Babbage's Analytical Engine and his Difference Engine were assemblages of gears that solved numeric problems.
• For the first electronic computers (such as ENIAC), the primary driving force was the calculation of ballistics tables for artillery.
• Jacquard's loom is another example.
• And Hollerith's work for the US Census Bureau is another.
• In the late 1940s, John von Neumann "invented" the stored-program computer.
• The "invention" is the observation that, just as you can store data in the memory of a computer, the data can be machine instructions.
• Then the computer can not only take its instructions from memory, but can also modify the instructions in its memory...
• ...and, in fact, can write its own programs, storing them in memory.
• It quickly became apparent that the simplest way to store information in a computer was in the form of binary numbers.
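The stored-program idea can be illustrated with a toy machine in which a single Python list serves as memory for both instructions and data. This sketch assumes a hypothetical encoding (`opcode * 100 + address`, with opcodes HALT=0, LOAD=1, ADD=2, STORE=3); it is not modeled on any historical computer, and real machines store binary words rather than decimal numbers.

```python
# Hypothetical stored-program accumulator machine (illustration only).
# Instructions and data live in the same memory list.
HALT, LOAD, ADD, STORE = 0, 1, 2, 3


def run(memory: list[int], pc: int = 0) -> list[int]:
    """Fetch-decode-execute loop; mutates and returns `memory`."""
    acc = 0
    while True:
        word = memory[pc]                   # fetch the instruction from memory
        opcode, addr = divmod(word, 100)    # decode it
        pc += 1
        if opcode == HALT:
            return memory
        elif opcode == LOAD:
            acc = memory[addr]
        elif opcode == ADD:
            acc += memory[addr]
        elif opcode == STORE:
            memory[addr] = acc              # may overwrite data or code
        else:
            raise ValueError(f"bad opcode {opcode} at address {pc - 1}")


# Cells 0-3 hold the program (LOAD 4, ADD 5, STORE 6, HALT); cells 4-6 hold data.
memory = [104, 205, 306, 0, 40, 2, 0]
print(run(memory)[6])  # 42
```

Because `STORE` can target any address, a program could overwrite its own instructions, which is the self-modifying ability the notes attribute to stored-program computers.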
• So, to program a computer, you only needed to enter a sequence of binary numbers into memory and then tell the computer at which memory address to start execution.
• This was programming in machine language.
• Instructions (and data) were entered from a console, one word (in binary) at a time.
• This form of coding (note the word!) was quickly replaced by programming in assembly language.
• A program, called an assembler, was written (in machine language) to translate assembly language into machine language.
• After the first assembler was written, no one needed to code in machine language any longer.
• But coding in assembly language can still take many instructions...
• So the thought was: can we create a program that translates a more natural notation into assembly language or into machine language?

Why Compilers? A Brief History: Formal Languages
• At about the same time, in the mid-1950s, Noam Chomsky (M.I.T.) began investigating the formal structure of natural languages.
• His work led to the Chomsky hierarchy of type 0, 1, 2, and 3 languages and their associated grammars.
• The type 2 (context-free) grammars turned out to be very good at describing computer languages.
• And efficient ways to recognize the structure of a source program using a type 2 grammar were developed.
• Such recognition is called parsing.
• Very closely related to context-free grammars are the type 3 grammars.
• These are the regular grammars, which are equivalent in power to finite automata (and to regular expressions).
• An entire sub-branch of mathematics studies automata; it's called automata theory.
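Since type 3 grammars are equivalent to finite automata, simple token classes can be recognized by a deterministic finite automaton (DFA). The Python sketch below hand-codes a transition table for two token classes: identifiers (a letter or `_`, followed by letters, digits, or `_`) and unsigned integer constants. These token definitions are assumptions modeled loosely on C, not taken from any language specification.

```python
import string

LETTERS = set(string.ascii_letters + "_")
DIGITS = set(string.digits)


def char_class(ch: str) -> str:
    """Map a character to the input alphabet of the automaton."""
    if ch in LETTERS:
        return "letter"
    if ch in DIGITS:
        return "digit"
    return "other"


# Transition table: (state, character class) -> next state.
# A missing entry means the (implicit) dead state: the input is rejected.
TRANSITIONS = {
    ("start", "letter"): "ident",
    ("start", "digit"): "int",
    ("ident", "letter"): "ident",
    ("ident", "digit"): "ident",
    ("int", "digit"): "int",
}
ACCEPTING = {"ident": "identifier", "int": "integer"}


def classify(text: str) -> str:
    """Run the DFA over the whole string and report the accepted token class."""
    state = "start"
    for ch in text:
        state = TRANSITIONS.get((state, char_class(ch)))
        if state is None:
            return "reject"
    return ACCEPTING.get(state, "reject")


for word in ["my_name", "Your_Name", "1997", "9lives"]:
    print(word, "->", classify(word))
```

Here `classify("1997")` returns `"integer"`, while `classify("9lives")` returns `"reject"` because no transition leaves the `int` state on a letter. Scanner generators build tables like `TRANSITIONS` automatically from regular expressions.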
• It turns out that type 3 (regular) grammars are very good at describing the "atoms" used in computer languages.
• These "atoms" are the reserved words, symbols, and user-defined words (identifiers) used in a computer language.
• Recognizing atoms is called scanning (or lexing).
• By far the most difficult and complicated problem has been how to generate object code that is concise and, most importantly, executes efficiently.
• This is called "optimization".
• Far simpler are the front-end issues of scanning and parsing, i.e., recognizing the source code.
• This is because we have developed (semi-)automatic ways to create scanners and parsers, using scanner generators and parser generators.

Programs Related to Compilers
• Interpreters: directly execute the code as soon as it is recognized, usually statement by statement.
• Assemblers: translate assembly language into machine language.
• Macro assemblers: the same, but with (powerful) macro capabilities.
• Linkers: combine object modules to produce an executable module.
• Linkage editors: manage the linking process and can create and maintain object libraries.
• Loaders: load executable modules into memory and launch execution.
• Dynamic loaders: loaders that stay around during execution to handle the loading of DLLs (dynamic-link libraries).
• Preprocessors: usually a separate program whose input is source code and whose output is source code; they perform macro expansion, comment deletion, etc. Sometimes the preprocessor is the first phase of a compiler.
• Editors: allow the user to create and update source code.
• Smart editors: add syntax coloring, parenthesis balancing, etc.
• Debuggers: provide an environment in which code may be debugged, including single stepping, symbol tables, etc.
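An interpreter, unlike a compiler, executes each statement as soon as it has been recognized. The Python sketch below illustrates this for a hypothetical toy language with assignments, `print`, integer `+`, `-`, `*`, and parentheses. It uses recursive descent (one common parsing technique for context-free grammars, chosen here for brevity) and evaluates while parsing instead of building a tree; the grammar and all names are invented for this example.

```python
import re
from typing import Optional

# Toy grammar (hypothetical, for illustration only):
#   program   -> { statement }
#   statement -> "print" expr ";" | IDENT "=" expr ";"
#   expr      -> term { ("+" | "-") term }
#   term      -> factor { "*" factor }
#   factor    -> NUMBER | IDENT | "(" expr ")"
TOKEN_RE = re.compile(r"\s*(?:(\d+)|([A-Za-z_]\w*)|(\S))")


def tokenize(src: str) -> list[str]:
    return [num or name or sym for num, name, sym in TOKEN_RE.findall(src)]


class Interpreter:
    """Recursive-descent parser that executes each statement once it is recognized."""

    def __init__(self, src: str) -> None:
        self.tokens = tokenize(src)
        self.pos = 0
        self.env: dict[str, int] = {}      # current variable values
        self.output: list[int] = []        # values produced by print

    def peek(self) -> Optional[str]:
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self) -> str:
        tok = self.peek()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        self.pos += 1
        return tok

    def expect(self, tok: str) -> None:
        if self.advance() != tok:
            raise SyntaxError(f"expected {tok!r}")

    def run(self) -> list[int]:
        while self.peek() is not None:
            self.statement()
        return self.output

    def statement(self) -> None:
        if self.peek() == "print":
            self.advance()
            value = self.expr()
            self.expect(";")
            self.output.append(value)      # execute: the statement is fully recognized
        else:
            name = self.advance()
            if not name.isidentifier():
                raise SyntaxError(f"expected a variable name, got {name!r}")
            self.expect("=")
            value = self.expr()
            self.expect(";")
            self.env[name] = value         # execute the assignment immediately

    def expr(self) -> int:
        value = self.term()
        while self.peek() in ("+", "-"):
            op = self.advance()
            rhs = self.term()
            value = value + rhs if op == "+" else value - rhs
        return value

    def term(self) -> int:
        value = self.factor()
        while self.peek() == "*":
            self.advance()
            value *= self.factor()
        return value

    def factor(self) -> int:
        tok = self.advance()
        if tok.isdecimal():
            return int(tok)
        if tok == "(":
            value = self.expr()
            self.expect(")")
            return value
        if tok.isidentifier():
            if tok not in self.env:
                raise NameError(f"undefined variable {tok!r}")
            return self.env[tok]
        raise SyntaxError(f"unexpected token {tok!r}")


print(Interpreter("x = 2 + 3 * 4; y = (x - 4) * 2; print y; print x + y;").run())  # [20, 34]
```

Because execution happens during parsing, running `"print 1; print z;"` leaves `1` in the interpreter's `output` list before the second statement raises `NameError`.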
Programs Related to Compilers (continued)
• IDEs (integrated development environments): provide an integrated editor, debugger, and execution environment.
• Profilers: collect statistics about where programs spend their time during execution; important for optimizing at the source-code level.
• Project managers: programs that help software managers deal with hundreds or thousands of modules, build reports, etc.
• SCCS (source code control systems): provide multiple users with access to shared code in a controlled manner.

The Translation Process
• The translation process consists of a collection of phases, with the output of one phase feeding the input of the next.
• During this process, the original source code is transformed into a sequence of intermediate representations (IRs).

Phases of Compiler
Parallel to all other phases are two activities:
• Symbol table manipulation. The symbol table is one of the primary data structures that a compiler uses; it is used by all of the phases.
• Error detection and handling.

The Scanner
• The scanner reads the source program as a stream of characters and performs lexical analysis: it collects sequences of characters into meaningful units called tokens.
• The scanner may also create a symbol table and a literal table.

The Parser
• The parser reads the tokens produced by the scanner and performs syntactic analysis, creating an IR (a parse tree or a syntax tree) that shows the structure of the program.
• Syntax trees (abstract syntax trees) are condensed representations of the parse tree, with many irrelevant nodes eliminated.

The Semantic Analyzer
• The semantics of a program are its "meaning": what it is intended to accomplish.
• The semantic analyzer creates an intermediate data structure containing the part of this meaning that can be determined without running the program; this is the static semantics.
• The dynamic semantics of a program can only be determined by executing the program.
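Because every phase consults the symbol table, it is usually designed to support nested scopes. A minimal Python sketch, assuming a block-structured source language and a hypothetical attribute set (here only `type`), keeps a stack of dictionaries: declarations go into the innermost scope, and lookups search from the innermost scope outward. Production compilers often use a hash table with scope chaining instead; the stack of dictionaries is simply the shortest correct version.

```python
from typing import Any, Optional


class SymbolTable:
    """Scoped symbol table: a stack of dictionaries, innermost scope last."""

    def __init__(self) -> None:
        self.scopes: list[dict[str, dict[str, Any]]] = [{}]   # the global scope

    def enter_scope(self) -> None:
        self.scopes.append({})

    def exit_scope(self) -> None:
        if len(self.scopes) == 1:
            raise RuntimeError("cannot exit the global scope")
        self.scopes.pop()

    def declare(self, name: str, **attributes: Any) -> None:
        """Add `name` to the innermost scope; redeclaring in the same scope is an error."""
        scope = self.scopes[-1]
        if name in scope:
            raise ValueError(f"{name!r} is already declared in this scope")
        scope[name] = dict(attributes)

    def lookup(self, name: str) -> Optional[dict[str, Any]]:
        """Search from the innermost scope outward; None means undeclared."""
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        return None


table = SymbolTable()
table.declare("a", type="int")
table.declare("b", type="int")
table.enter_scope()
table.declare("a", type="float")    # shadows the outer a
print(table.lookup("a"))            # {'type': 'float'}
print(table.lookup("b"))            # {'type': 'int'}
table.exit_scope()
print(table.lookup("a"))            # {'type': 'int'}
print(table.lookup("c"))            # None
```

A semantic analyzer would report an "undeclared identifier" error whenever `lookup` returns `None`, and later phases could add attributes such as a storage address to the same entries.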
• An example of the static semantics of a program is the data types of the variables (and expressions).
• These static semantics are usually represented in the intermediate representations (IRs) as attributes.
• The IR is usually a tree "decorated" with these attributes.

(Source) Code Optimization
• Optimization may occur during several phases.
• Source code optimization rearranges the source (or the IR of the source) in order to produce more efficient code.
• E.g., an expression whose operands are all constants can be evaluated at compile time and replaced by its value.
• This is called constant folding.
• Duplicated computations can be saved as temporaries and their values reused.
• Recursion can be converted to iteration.
• Repeated (loop-invariant) calculations can be moved out of loops.
• The possibilities are endless...

The Code Generator
• The code generator takes the IR and generates code for the target machine.
• Here the details of how various numeric and non-numeric quantities are represented become important.
• E.g., word length, hardware stack, hardware calling conventions, memory access, etc.

The Target Code Optimizer
• The target code optimizer examines the emitted target code to see whether further opportunities for optimization exist, and then exploits them.
• E.g., reuse of registers, using a shift instruction to replace a multiplication or division by a power of two, etc.

Phases of the Compiler

    Source program → Lexical analyzer (scanner) → tokens → Syntax analyzer (parser) → parse tree → Semantic analyzer → abstract syntax tree with attributes

Sample Program Compiled
• Consider the example:

    int a, b;
    a = 100;
    b = f(a) + 3;

    Source program → Lexical analyzer → token stream

Tokens are entities, defined by the compiler writer, that are of interest to the compiler. A sequence of characters with a collective meaning is grouped together to form a token.
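A scanner for the sample program can be sketched with Python's standard `re` module, which here plays the role that a scanner generator's output would play in a real compiler. The token classes (`keyword`, `identifier`, `number`, `operator`, `punctuation`) are an assumed, deliberately small subset of C, coarser than a full C lexer, and the keyword list contains only `int`, `while`, and `for`. The alternatives are tried in order, so keywords win over identifiers and two-character operators win over one-character ones.

```python
import re

# Illustrative token specification, not a complete C lexer. Order matters.
TOKEN_SPEC = [
    ("keyword",     r"\b(?:int|while|for)\b"),
    ("identifier",  r"[A-Za-z_]\w*"),
    ("number",      r"\d+(?:\.\d+)?(?:[eE][+-]?\d+)?"),
    ("operator",    r"\+\+|--|==|<=|>=|[=+\-*<>]"),
    ("punctuation", r"[,;(){}]"),
    ("skip",        r"\s+"),
    ("mismatch",    r"."),
]
MASTER_RE = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def scan(source: str) -> list[tuple[str, str]]:
    """Return (token class, lexeme) pairs, skipping whitespace."""
    tokens = []
    for match in MASTER_RE.finditer(source):
        kind, lexeme = match.lastgroup, match.group()
        if kind == "skip":
            continue
        if kind == "mismatch":
            raise SyntaxError(f"unexpected character {lexeme!r} at offset {match.start()}")
        tokens.append((kind, lexeme))
    return tokens


for kind, lexeme in scan("int a, b; a = 100; b = f(a) + 3;"):
    print(f"{lexeme:>4}  {kind}")
```

For the sample program this produces 18 tokens; the `\b` anchors keep an identifier such as `integer` from being split into the keyword `int` plus `eger`.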
Examples of tokens:
• Single-character operators: = + - * > <
• Multi-character operators: ++ -- == <=
• Numeric constants: 1997 45.89 19.9e+7
• Keywords: int while for
• Identifiers: x my_name Your_Name a
• Homework: identify all token types in C programs.

Example Program Compiled (continued)
What are the tokens in the example?

    #    Token   Token type
    1    int     keyword
    2    a       identifier
    3    ,       comma
    4    b       identifier
    5    ;       semicolon
    6    a       identifier
    7    =       equal
    8    100     integer
    9    ;       semicolon
    10   b       identifier
    11   =       equal
    12   f       identifier
    13   (       left parenthesis
    14   a       identifier
    15   )       right parenthesis
    16   +       plus
    17   3       integer
    18   ;       semicolon

Example (continued)
The parser produces a parse tree. It is a heterogeneous tree (its nodes have different data types):

    root_node
    |-- stmt1: =
    |     |-- a
    |     `-- 100
    `-- stmt2: =
          |-- b
          `-- +
                |-- f
                |     `-- a
                `-- 3

Intermediate-Code Generation
Using temporary locations to save values:

    t1 = 100
    store t1, a
    load a, t2
    t3 = f(t2)
    t4 = t3 + 3
    store t4, b

Intermediate-Code Optimization
Eliminate unnecessary code (here, the reload of a, whose value is already in t1) or statements that will never be executed:

    t1 = 100
    store t1, a
    t3 = f(t1)
    t4 = t3 + 3
    store t4, b

Target-Code Generation
Machine code generated for some machine:

    r1 = 100
    store r1, 0x10
    jsr _f
    r2 = r0 + 3
    store r2, 0x16

Compiler Architecture
Single-pass vs. multi-pass architecture.

Single pass: all passes are interleaved, driven by the parser.

    (Figure: the parser uses the scanner, the semantic analyzer, and the code generator; all of them exchange data with the symbol table. Legend: A → B means A uses B.)

Multi-pass: each pass finishes before the next one starts.
• Saves main memory; passes communicate through files.
• Used if the language is complex or portability is important.

    source file → Scanner → Parser → ... → Code generator → object file

Front End & Back End
• Front end: the phases, or parts of phases, that depend on the source language (language dependent).
• Back end: the phases, or parts of phases, that depend on the target machine (machine dependent).
• E.g., front ends for Java, C, and Pascal can be paired with back ends for Pentium or PowerPC; any combination is possible.
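The intermediate code above follows a simple scheme: each subexpression's value goes into a fresh temporary, constants are used in place, and each assignment ends with a `store`. The Python sketch below reproduces that scheme over a hypothetical tuple-based syntax tree; the node kinds `num`, `var`, `call`, `binop`, and `assign` are invented here. It is one straightforward way to emit such code, not the method of any particular compiler.

```python
from itertools import count

# Hypothetical AST encoding (tuples), chosen only for this sketch:
#   ("num", value)   ("var", name)   ("call", fname, arg)
#   ("binop", op, left, right)       ("assign", name, expr)


def generate(statements: list[tuple]) -> list[str]:
    """Emit three-address code, using one fresh temporary per intermediate value."""
    code: list[str] = []
    temp_ids = count(1)

    def new_temp() -> str:
        return f"t{next(temp_ids)}"

    def operand(node: tuple) -> str:
        """Return a constant, or a temporary holding the value of `node`."""
        kind = node[0]
        if kind == "num":
            return str(node[1])                 # constants are used in place
        if kind == "var":
            t = new_temp()
            code.append(f"load {node[1]}, {t}")
            return t
        if kind == "call":
            arg = operand(node[2])
            t = new_temp()
            code.append(f"{t} = {node[1]}({arg})")
            return t
        if kind == "binop":
            left, right = operand(node[2]), operand(node[3])
            t = new_temp()
            code.append(f"{t} = {left} {node[1]} {right}")
            return t
        raise ValueError(f"unknown node kind {kind!r}")

    for kind, name, expr in statements:
        if kind != "assign":
            raise ValueError(f"unsupported statement {kind!r}")
        value = operand(expr)
        if expr[0] == "num":                    # store needs a temporary, not a literal
            t = new_temp()
            code.append(f"{t} = {value}")
            value = t
        code.append(f"store {value}, {name}")
    return code


program = [
    ("assign", "a", ("num", 100)),                                         # a = 100;
    ("assign", "b", ("binop", "+", ("call", "f", ("var", "a")), ("num", 3))),  # b = f(a) + 3;
]
print("\n".join(generate(program)))
```

For the sample program this prints the same six instructions listed under Intermediate-Code Generation, from `t1 = 100` through `store t4, b`. Removing the redundant `load a, t2`, as in the optimized version, would be a separate pass over this output.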
