
Low Level Virtual Machine C# Compiler
Senior Project Proposal

Prabir Shrestha (4915302)


Myo Min Zin (4845411)
Napaporn Wuthongcharernkun (4846824)
Agenda
§ Objective
§ Motivation
§ Scope
§ The Framework
§ Gantt Chart
§ Questions and Answers
Objective
[Figure: a naive compiler translates a source language into a target language]
Compiler Process

[Figure: the compiler process is split into a front-end and a back-end]
Motivation
§ Evolution of Computer Programming
§ Managed Code vs Unmanaged Code
§ Bulky .NET Framework
§ Operating Systems written in managed code
Motivation
§ Why Low Level Virtual Machine?
– Source Language independent
– Retargetable code generator
– Supports various architectures
• X86, PowerPC, ARM
– Open source
Low Level Virtual Machine
§ It is not
– a compiler
– a virtual machine like the JVM or the .NET Framework
§ It is
– a modular compiler infrastructure
• a collection of (C++) libraries and tools that help in building compilers, debuggers, program analyzers, etc.
Low Level Virtual Machine
§ Commonly referred to as LLVM
§ Started as an academic project at the University of Illinois in 2002
§ Current development is mainly by Apple Inc.
§ Projects related to LLVM
– Clang: C/C++ front-end; aims to replace gcc
– OpenGL engine in Mac OS X 10.5
– Used by Adobe Systems Inc., Nvidia, Sun
Scope
§ Keywords- Categories
§ Operators and Special Characters
§ Source Language Features
Scope Categories
§ Types

§ Conditionals

§ Loops
Scope Categories
§ Single Inheritance

§ Encapsulation

§ Overloading Operators

§ Method Overloading / Method Overriding
Scope Categories
§ Indexing & Properties (Accessor/Mutator)

§ Modifiers

§ Type Casting
Keywords
Operators and Special Characters
Source Language Feature Summary
§ Single class Inheritance
§ Encapsulation
§ Overloadable Operators
§ Method Overloading/Overriding
§ Properties (Accessors / Mutators)
The Framework
§ Overall Process
§ Scanner
§ Parser
§ Semantic Analyzer
§ Code Generator
§ Assembling and Linking
Overall Process
Scanner
§ Tokenization: identifying the tokens in the input stream.
§ Skips meaningless characters such as white space.
§ Lexical analysis: checking for lexical errors.
§ With the Coco/R tool, the scanner and the parser are generated at the same time.
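The scanner's three jobs (tokenizing, skipping white space, reporting lexical errors) can be illustrated with a small hand-written sketch. This is not the scanner Coco/R generates; the token classes, the keyword set and the function name `scan` are hypothetical and cover only a tiny C#-like subset.

```python
import re

# Hypothetical token classes for a tiny C#-like subset (an assumption for
# illustration, not the project's real token set).
TOKEN_SPEC = [
    ("SKIP",   r"[ \t\r\n]+|//[^\n]*"),   # white space and line comments
    ("NUMBER", r"\d+"),
    ("IDENT",  r"[A-Za-z_]\w*"),
    ("OP",     r"==|[+\-*/=<>;(){}]"),
]
KEYWORDS = {"class", "int", "if", "while", "return"}
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def scan(source):
    """Return (tokens, errors); each token is a (kind, text) pair."""
    tokens, errors, pos = [], [], 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            # No token class matches here: report a lexical error and move on.
            errors.append(f"lexical error at offset {pos}: {source[pos]!r}")
            pos += 1
            continue
        kind, text = match.lastgroup, match.group()
        if kind == "IDENT" and text in KEYWORDS:
            kind = "KEYWORD"
        if kind != "SKIP":               # meaningless characters are dropped
            tokens.append((kind, text))
        pos = match.end()
    return tokens, errors
```

Calling `scan("int x = 4 @ 2;")` yields the tokens for `int`, `x`, `=`, `4`, `2` and `;`, plus one lexical error for the `@` character.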
Parser
§ Syntax analysis is performed in this phase.
§ Coco/R generates a recursive descent parser.
– Top-down parsing method
– Procedure-like functions
– Generally, one procedure is generated for each production rule.

§ Accepts grammars in LL(k) form
– LL: Left-to-right scan, Leftmost derivation
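To make "one procedure per production rule" concrete, here is a minimal hand-written recursive descent parser for a hypothetical three-rule expression grammar. The grammar, the class name `Parser` and the tuple-based tree are assumptions for illustration; a Coco/R-generated parser is structured similarly but is not reproduced here.

```python
class Parser:
    """Recursive descent parser with one method per production.

    Expr   = Term   { ("+" | "-") Term }.
    Term   = Factor { ("*" | "/") Factor }.
    Factor = number | "(" Expr ")".
    """

    def __init__(self, tokens):
        self.tokens = list(tokens) + ["<eof>"]
        self.pos = 0

    @property
    def la(self):
        # A single lookahead token drives every decision, i.e. LL(1).
        return self.tokens[self.pos]

    def expect(self, token):
        if self.la != token:
            raise SyntaxError(f"expected {token!r}, found {self.la!r}")
        self.pos += 1

    def expr(self):
        node = self.term()
        while self.la in ("+", "-"):
            op = self.la
            self.pos += 1
            node = (op, node, self.term())
        return node

    def term(self):
        node = self.factor()
        while self.la in ("*", "/"):
            op = self.la
            self.pos += 1
            node = (op, node, self.factor())
        return node

    def factor(self):
        if self.la == "(":
            self.pos += 1
            node = self.expr()
            self.expect(")")
            return node
        if self.la.isdigit():
            value = int(self.la)
            self.pos += 1
            return value
        raise SyntaxError(f"number or '(' expected, found {self.la!r}")


def parse(tokens):
    parser = Parser(tokens)
    tree = parser.expr()
    parser.expect("<eof>")
    return tree
```

The input is consumed left to right, and each method expands the leftmost nonterminal first, which is what the two L's in LL refer to.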
Parsing
§ Parser Error-Recovery Techniques
– Synchronization
– Weak Symbols

§ Synchronization Technique
– SYNC symbols are placed at points in the grammar where the expected tokens are unlikely to be missing or mistyped.
– Upon error detection, the parser skips input symbols until it finds one that is expected at a synchronization point.
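The skipping behaviour can be sketched as follows. This is a simplified, hand-written illustration rather than Coco/R's generated code: the declaration shape (`keyword ident { }`), the set `SYNC_TOKENS` and the function name are assumptions chosen to keep the example short.

```python
# Tokens expected at the (hypothetical) synchronization point in front of a
# type declaration; "<eof>" guarantees that skipping always terminates.
SYNC_TOKENS = {"class", "struct", "enum", "<eof>"}


def parse_type_decls(tokens):
    """Parse { TypeDecl } with TypeDecl = SYNC keyword ident "{" "}".

    Returns (declarations, errors).
    """
    tokens = list(tokens) + ["<eof>"]
    pos, decls, errors = 0, [], []

    def synchronize(p):
        # Skip input symbols until one expected at the sync point is found.
        while tokens[p] not in SYNC_TOKENS:
            p += 1
        return p

    while tokens[pos] != "<eof>":
        if tokens[pos] not in SYNC_TOKENS:
            errors.append(f"unexpected {tokens[pos]!r} at token {pos}")
            pos = synchronize(pos)
            continue
        kind, rest = tokens[pos], tokens[pos + 1:pos + 4]
        well_formed = (
            len(rest) == 3
            and rest[0].isidentifier()
            and rest[0] not in SYNC_TOKENS
            and rest[1:] == ["{", "}"]
        )
        if well_formed:
            decls.append((kind, rest[0]))
            pos += 4
        else:
            # Report once, then resume at the next synchronization point.
            errors.append(f"malformed {kind} declaration at token {pos}")
            pos = synchronize(pos + 1)
    return decls, errors
```

Because parsing resumes at the next `class`, `struct` or `enum`, one malformed declaration produces a single error message and the declarations after it are still analyzed.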
Parsing Error Recovery Technique
§ Weak Symbols
– Placed in front of tokens that are error-prone, i.e. often misspelled or missing.
– When an error is encountered, the parser reports it and can jump to the next synchronization point.
Parsing Error Recovery Technique
§ Synchronization Example
TypeDecl =
  SYNC
  ( "class" ident [ClassBase] ClassBody [";"]
  | "struct" ident [Base] StructBody [";"]
  | "enum" ident [":" IntType] EnumBody [";"]
  ) .
§ Weak Symbols Example
EnumBody =
  "{" EnumMember { WEAK "," EnumMember } "}" .
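A rough sketch of how a weak separator might behave for the EnumBody rule is given below. It only approximates the idea (report the missing or mistyped comma, then continue with the next member if one can be found) and is not Coco/R's generated code. Enum members are assumed to be plain identifiers, and all function names are hypothetical.

```python
def parse_enum_body(tokens):
    """EnumBody = "{" EnumMember { WEAK "," EnumMember } "}".

    Returns (members, errors). Members are assumed to be plain identifiers.
    """
    tokens = list(tokens) + ["<eof>"]
    pos, members, errors = 0, [], []
    closers = ("}", "<eof>")

    def is_member(token):
        return token.isidentifier()

    def weak_separator():
        nonlocal pos
        if tokens[pos] == ",":            # separator present: consume it
            pos += 1
            return True
        if tokens[pos] in closers:        # legal end of the repetition
            return False
        errors.append(f'"," expected at token {pos}')
        # Skip to a token that can follow the separator or end the repetition.
        while not (is_member(tokens[pos]) or tokens[pos] in closers):
            pos += 1
        return is_member(tokens[pos])

    def member():
        nonlocal pos
        if is_member(tokens[pos]):
            members.append(tokens[pos])
            pos += 1
        else:
            errors.append(f"enum member expected at token {pos}")

    if tokens[pos] != "{":
        errors.append('"{" expected at token 0')
        return members, errors
    pos += 1
    member()
    while weak_separator():
        member()
    if tokens[pos] != "}":
        errors.append(f'"}}" expected at token {pos}')
    return members, errors
```

With the input `{ Red Green ; Blue }`, the sketch reports two separator errors but still collects all three members instead of abandoning the enum body at the first missing comma.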
Overall Process
Semantic Analyzer
§ A phase that follows parsing
§ Checks for semantic errors once lexical and syntax errors have been checked
§ Examples:
– type checks, scoping of variables, constant values not being changed, no redefinition of classes, methods, or member variables
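Several of these checks can be built on a scoped symbol table. The sketch below is a minimal illustration under simplifying assumptions: types are plain strings compared for equality, and the names `Scope`, `declare`, `lookup` and `check_assign` are hypothetical rather than taken from the project.

```python
class SemanticError(Exception):
    pass


class Scope:
    """One level of a nested symbol table."""

    def __init__(self, parent=None):
        self.parent = parent
        self.symbols = {}                 # name -> (type_name, is_const)

    def declare(self, name, type_name, is_const=False):
        # Redefinition within the same scope is an error.
        if name in self.symbols:
            raise SemanticError(f"redefinition of '{name}'")
        self.symbols[name] = (type_name, is_const)

    def lookup(self, name):
        # Scoping: search this scope first, then the enclosing ones.
        scope = self
        while scope is not None:
            if name in scope.symbols:
                return scope.symbols[name]
            scope = scope.parent
        raise SemanticError(f"'{name}' is not declared in this scope")

    def check_assign(self, name, value_type):
        type_name, is_const = self.lookup(name)
        if is_const:
            raise SemanticError(f"cannot assign to constant '{name}'")
        if type_name != value_type:
            raise SemanticError(
                f"cannot assign {value_type} to '{name}' of type {type_name}"
            )
```

A class scope would typically be the parent of each method scope, so a lookup that fails in a method body falls back to the member variables.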
Overall Process
Code Generator
§ Runs after AST construction and semantic analysis
§ Generates LLVM Intermediate Representation (IR)
Low Level Virtual Machine Intermediate Representation
§ Language and target independent
§ Designed to support multiple language front-ends
§ Represents the key operations of ordinary processors
§ Avoids machine-specific constraints
– physical registers, pipelining
Low Level Virtual Machine Intermediate Representation
§ Does not define runtime or operating system functions
– these are defined by runtime libraries
§ IR is a typed Virtual Instruction Set
– unbounded number of registers
– operations are low level
– checked for consistency
LLVM Instruction Set
§ Usually 3-address code
%temp2 = add i32 %temp0, %temp1
§ Instructions are typed
§ Instructions are polymorphic
§ Usually Static Single Assignment (SSA) form
– new register for each result
– uses phi (ɸ) functions
– code generator tries to store these
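As an illustration of typed three-address code with a fresh register for each result, the sketch below walks a small expression tree and emits textual LLVM IR. It builds strings only and does not use the LLVM libraries; the tuple-based tree, the helper names and the restriction to `i32` with `add`, `sub` and `mul` are assumptions made for brevity.

```python
import itertools

OPCODES = {"+": "add", "-": "sub", "*": "mul"}


def emit_expr(node, out, counter):
    """Append i32 instructions for `node` to `out`; return the result operand."""
    if isinstance(node, int):
        return str(node)                  # constants are used directly as operands
    if isinstance(node, str):
        return f"%{node}"                 # a named value such as a parameter
    op, lhs, rhs = node
    left = emit_expr(lhs, out, counter)
    right = emit_expr(rhs, out, counter)
    result = f"%temp{next(counter)}"      # SSA: a new register for each result
    out.append(f"{result} = {OPCODES[op]} i32 {left}, {right}")
    return result


def compile_function(name, params, body):
    """Return the text of an LLVM function that computes `body`."""
    out, counter = [], itertools.count()
    result = emit_expr(body, out, counter)
    args = ", ".join(f"i32 %{p}" for p in params)
    lines = [f"define i32 @{name}({args}) {{"]
    lines += [f"  {line}" for line in out]
    lines += [f"  ret i32 {result}", "}"]
    return "\n".join(lines)
```

For `compile_function("f", ["a", "b", "c"], ("*", ("+", "a", "b"), "c"))` the body consists of `%temp0 = add i32 %a, %b`, then `%temp1 = mul i32 %temp0, %c`, then `ret i32 %temp1`. Straight-line code like this needs no phi functions; they become necessary only when control-flow paths merge.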
Optimizations
Constant Folding
§ Simplifies constant expressions at compile time
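A minimal sketch of constant folding over a tuple-based expression tree follows. The tree shape `(op, lhs, rhs)` is an assumption for illustration, and the sketch ignores fixed-width integer overflow, which a real compiler would have to respect.

```python
import operator

FOLDABLE = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def fold(node):
    """Replace every subtree whose operands are all constants by its value."""
    if not isinstance(node, tuple):
        return node                       # a constant or a variable name
    op, lhs, rhs = node
    lhs, rhs = fold(lhs), fold(rhs)
    if isinstance(lhs, int) and isinstance(rhs, int):
        return FOLDABLE[op](lhs, rhs)     # computed once, at compile time
    return (op, lhs, rhs)
```

For example, `fold(("+", "x", ("*", 2, ("+", 3, 4))))` returns `("+", "x", 14)`: the constant subtree is evaluated, while the part that depends on `x` is left for run time.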
Optimizations
Constant Propagation
§ Substitutes the values of known constants into expressions at compile time
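The substitution step can be sketched on straight-line code in SSA form. The instruction encoding used here, tuples of the form `(dest, op, operand, ...)` with a hypothetical `const` opcode for loading a constant, is an assumption for illustration and not LLVM's in-memory representation.

```python
def propagate_constants(instructions):
    """Substitute registers with known constant values into later operands.

    Assumes SSA form: every register is assigned exactly once, so a known
    constant never has to be invalidated.
    """
    known = {}                            # register name -> constant value
    result = []
    for dest, op, *args in instructions:
        args = [known.get(a, a) if isinstance(a, str) else a for a in args]
        if op == "const":
            known[dest] = args[0]
        result.append((dest, op, *args))
    return result
```

Given `x = const 4`, `y = add x, n`, `z = mul y, x`, the pass rewrites the later instructions to `y = add 4, n` and `z = mul y, 4`. The definition of `x` then has no remaining uses and can be dropped by a later pass.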
Optimizations
Strength Reduction
§ A costly operation is replaced with an equivalent but less expensive operation
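One common instance is replacing an integer multiplication by a power of two with a left shift. The sketch below applies only that rewrite, on hypothetical `(dest, op, a, b)` instruction tuples; whether the shift is actually cheaper depends on the target machine.

```python
def reduce_strength(instructions):
    """Rewrite `mul x, 2**k` as `shl x, k`; leave everything else unchanged."""
    result = []
    for dest, op, a, b in instructions:
        if op == "mul" and isinstance(a, int) and not isinstance(b, int):
            a, b = b, a                   # multiplication commutes: constant goes second
        is_power_of_two = isinstance(b, int) and b > 0 and (b & (b - 1)) == 0
        if op == "mul" and is_power_of_two:
            result.append((dest, "shl", a, b.bit_length() - 1))
        else:
            result.append((dest, op, a, b))
    return result
```

For instance, `("t0", "mul", "x", 8)` becomes `("t0", "shl", "x", 3)`, while a multiplication by 6 is left alone because 6 is not a power of two.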
Optimizations
Elimination of Useless Instructions
§ Drops instructions that have no observable effect, i.e. whose result is never used and that do not modify memory
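A backward liveness walk is one simple way to find such instructions in straight-line code. The sketch below assumes hypothetical `(dest, op, operand, ...)` tuples and treats `store`, `call` and `ret` as the only operations with side effects; both choices are simplifications for illustration.

```python
SIDE_EFFECTS = {"store", "call", "ret"}   # assumed never removable


def eliminate_useless(instructions):
    """Drop instructions whose result is never used and that have no side effect."""
    live, kept = set(), []
    for dest, op, *args in reversed(instructions):
        if op in SIDE_EFFECTS or dest in live:
            kept.append((dest, op, *args))
            # Operands of a kept instruction are needed by it.
            live.update(a for a in args if isinstance(a, str))
    kept.reverse()
    return kept
```

Walking backwards means that once an unused instruction is dropped, the instructions that only fed it are found to be unused as well in the same pass.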
Overall Process
Assembling and Linking
User Phases
Gantt Chart
Questions and Answers