
WACC Compiler Report

Imperial College London


Department of Computing
Group 03
Andrei - Octavian Brabete, Ioan Budea, Adrian Catana, Alexandru Dan
December 9, 2016

Contents

1 Product
1.1 Getting started
1.2 Quality evaluation and correctness
1.3 Efficiency analysis
1.4 Extensibility

2 Project Management
2.1 Group workflow
2.2 Testing

3 Design Choices

4 Extension
4.1 Summary
4.2 Lexer
4.2.1 Regex engine
4.2.2 NFA and DFA
4.2.3 Integration with ANTLR and Optimization
4.2.4 Parallels ANTLR's lexer
4.3 Mark-Sweep Garbage Collector
4.3.1 Heap Internal Representation
4.3.2 Collection Phase
4.3.3 Profiling
4.4 Further Development

5 Conclusion

6 Appendix

1 Product

1.1 Getting started

WACC is a simple While language of the kind encountered in many program reasoning and verification courses (briefly
covered in the 2nd year Models of Computation course). It has the common features you would expect of a While-like language, such as program variables, simple expressions,
conditional branching, looping and no-ops. It also covers some extra features, such as simple types,
functions, arrays and basic pair creation on the heap.
In the first group meeting we decided which programming language to use to build the compiler.
We considered Haskell and C++, but we unanimously decided that Java would be our best
option, since we were all confident that we could write clean, high-quality code in it. We decided
to use the ANTLR library for the Lexer and Parser, even though one of our ideas was to implement
them from scratch (see the Lexer extension).

1.2 Quality evaluation and correctness

Even though we decided to write the core of our compiler in Java, the project ended up drawing on
many other languages. During these 6 weeks of work we had the opportunity to test our knowledge
of C, C++, Ruby, Shell, Python and Assembly, strengthening our confidence that we could link different
tools together (see the language graph from Git in the Appendix, Figure 1).
Frontend. After we had carefully written the grammar, ANTLR generated the Lexer and the
Parser classes. Our next task was the Semantic Analysis, which in this case meant implementing a
Visitor for the generated Abstract Syntax Tree. Since the main purpose of this part was checking types and
variable existence, we decided that each visitor method would return an instance of our own Type
structure, and we kept track of scopes using a stack of Symbol Tables. The Frontend works correctly and
we added an Error Handler feature: errors are not thrown, but saved in a list, and the compiler continues
to analyze the rest of the program.
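The scope stack and the error list described above can be sketched as follows. This is only an illustration under assumptions: the class name ScopedSymbolTable, its method names and the use of plain strings for types are hypothetical and are not taken from the project's code.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical names throughout; this is not the project's actual API.
class ScopedSymbolTable {
    // Innermost scope sits at the head of the deque.
    private final Deque<Map<String, String>> scopes = new ArrayDeque<>();
    // Semantic errors are collected here instead of being thrown.
    private final List<String> errors = new ArrayList<>();

    ScopedSymbolTable() { enterScope(); } // global scope

    void enterScope() { scopes.push(new HashMap<>()); }
    void exitScope() { scopes.pop(); }

    // Declares a variable in the innermost scope; a duplicate is recorded, not thrown.
    void declare(String name, String type) {
        if (scopes.peek().putIfAbsent(name, type) != null) {
            errors.add("redeclaration of " + name);
        }
    }

    // Looks a variable up from the innermost scope outwards.
    String lookup(String name) {
        for (Map<String, String> scope : scopes) {
            String type = scope.get(name);
            if (type != null) return type;
        }
        errors.add("undeclared variable " + name);
        return null;
    }

    // Type check: compare the two types returned by the visit calls.
    void checkAssignable(String expected, String actual) {
        if (expected != null && actual != null && !expected.equals(actual)) {
            errors.add("type mismatch: expected " + expected + ", got " + actual);
        }
    }

    List<String> errors() { return errors; }

    public static void main(String[] args) {
        ScopedSymbolTable table = new ScopedSymbolTable();
        table.declare("x", "int");
        table.enterScope();
        table.declare("x", "bool");            // shadows the outer x
        table.checkAssignable(table.lookup("x"), "int");
        table.exitScope();
        table.lookup("y");                     // recorded, analysis continues
        System.out.println(table.errors());
    }
}
```

Because errors are appended to a list, a single run can report several independent problems in the same WACC program.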
Backend. The main issue encountered in code generation was dealing with functions, as there
were some problems with the internal scoping, in particular with context references when visiting the function body.
Apart from that, the other libraries were well separated and open to extension. The
environment packages are well designed, the whole code base makes good use of visitors and it is
clearly decoupled (three levels of visitors, each responsible for specific parts).

1.3 Efficiency analysis

In terms of efficiency, the compiler uses two passes (the first one uses the fast kill principle), each
with its own role in the compilation. This strategy allows function definitions to be saved,
which is needed for mutual calls between functions. Overall, the total complexity of the compiler
remains linear.

1.4 Extensibility

Because there is a clear line of decoupling between Frontend and Backend, each can be changed and
extended independently. However, the Backend suffers from code repetition and the lack of an
intermediate code representation, and thus it needs refactoring before further improvements.

2 Project Management

2.1 Group workflow

Throughout the project, the main goal was to get everyone involved by splitting the
work as evenly as possible. The primary communication channel was face-to-face meetings,
which happened almost every day, and multiple times a day towards the end of each milestone.
These helped us to coordinate our group work efficiently. Because the architecture of the project and
the design choices were decided jointly, everyone was aware of the project workflow. The
modules were intended to be as independent as possible, but on the important parts the whole group
worked together in order to reduce the number of bugs and to ensure everyone knew the core of the
project. For the rest of the development process, pair programming was used extensively.

The main tools used for management of the project were Git (mainly branching for different purposes
and different implementation phases: developing, refactoring, profiling, testing etc.) and Slack. Slack
was important for group work coordination, such as progress tracking, keeping a list of unimplemented
features and sharing new ideas.

2.2 Testing

The objective for the testing phase was to automate local testing as much as possible in order to
test small individual parts independently, without the need to integrate the changes constantly. For
the Frontend part of the compiler, testing was done using Ruby scripts for both the valid and invalid
programs, which check whether any error appears on standard output after the program is compiled. For
the Backend milestone, the same approach was used, but this time the script sent HTTP requests to
the emulator website for both the reference implementation's assembly file and the assembly
file generated by our implementation, in order to check that the behavior was the same.
In the extension part, the garbage collector and lexer logic were tested using unit tests in both
Java and C++. For memory and time profiling of the garbage collector, the binary files were run on a
Raspberry Pi under Valgrind to check for memory leaks and to measure heap usage.
Although our group coordination was adequate most of the time, there are a couple of improvements
which could be made in future projects. More time could be spent before starting the implementation:
because we decided the whole architecture at once, not all the details were suitably covered,
so minor modifications were frequently needed. A project diagram may be useful to outline each individual
part properly. Furthermore, to save time, refactoring should be done more frequently and comments
in the code should be checked for notes about unimplemented features.

3 Design Choices

The main design pattern used for the first two milestones was the Visitor Pattern, as it allowed us to
separate our conventions and algorithms from the data structures created by the ANTLR
framework. This design choice proved suitable because the return value of every visit function
was an instance of our own type classes, so semantic checking came down to
simply comparing the results of the calls to the visit functions. In order to model the usage of rules
within the WACC language, inheritance was used to get access to the methods from the previous levels
of construction.
The static environment used in the Frontend and Backend made it easier to avoid the use of recursive
methods, to emulate the Visitor Interface and to adjust the manner of keeping the relevant information
for the other parts (for example, saving the Symbol Tables in Frontend).
Another design decision was taken for the Lexer extension. The Adapter allowed conversion from
our own defined Lexer Token to the Token type from ANTLR (see Integration with ANTLR and
Optimization section below).
The program also makes use of the Builder pattern. At different levels of our
implementation we had to generate specific classes according to various rules. For example, to generate tokens in rules,
we would instantiate a token as follows: new GoodToken().setValue(5).setType(Int).
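A minimal sketch of how such a fluent token class might look is given below. Only the call chain new GoodToken().setValue(5).setType(Int) comes from the text above; the field types, the Type enum and the getters are assumptions made for illustration.

```java
// Hypothetical reconstruction of the fluent style quoted above;
// the field types, the Type enum and the getters are assumptions.
class GoodToken {
    enum Type { INT, BOOL, IDENT }

    private Object value;
    private Type type;

    // Each setter returns this, so that calls can be chained.
    GoodToken setValue(Object value) { this.value = value; return this; }
    GoodToken setType(Type type) { this.type = type; return this; }

    Object getValue() { return value; }
    Type getType() { return type; }

    public static void main(String[] args) {
        GoodToken token = new GoodToken().setValue(5).setType(Type.INT);
        System.out.println(token.getType() + " " + token.getValue()); // prints "INT 5"
    }
}
```

Returning this from every setter is what lets a rule configure a token in a single expression.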
Using principles derived from Hexagonal Architecture, we removed the direct connection between the
Garbage Collector implementation and the Compiler implementation, a strategy that allowed unit testing
of the Mark and Sweep logic independently of the processor architecture.

4 Extension

4.1 Summary

In developing our extension we examined the variations in memory profile caused by our garbage collector and compared the runtime overhead of ANTLR's Lexer with that of our own implementation. The two extensions integrate well with our project, as we extended the compile command
to allow selecting alternatives for the garbage collector and the lexer using the optional -gc and -l flags.
Demonstrative examples can also be run via the class MainLexer and by running the profiling script with
one argument, the name of the WACC file.

4.2 Lexer

The main purpose of this extension is a detailed study of the efficiency of lexing strategies together with
an analysis of the efficiency of a specific tool (ANTLR). The final product is a linear-time, DFA-based, cached
lexer with running times of less than 1s.
After a careful case study of various lexers, we decided to build a linear-time lexer using a finite automaton. To build a finite automaton effectively, we realized we had to build our own regex engine, as the
Java default could not be used directly to build non-deterministic finite automata. Because the lexer parts
are highly cohesive, we decided upon a rather unusual workflow and design: building all the main parts
together, for efficiency reasons.

4.2.1 Regex engine

The regex engine had to be as lightweight as possible so that it could be translated easily to a finite automaton with the classical constructions. We took the original regex representation with 3 rules: composition,
repetition and alternation. To keep the design lightweight, we decided that a regex would be
a string that is processed into a regex tree.
At the design level, this decision resulted in the class being strongly coupled with the class that wraps
the grammar. The rules themselves had to be represented as Unicode characters so that they would not
be confused with regular characters in the regex. Another design was tried, but the former one proved
to give a 0.5s speed improvement.
LL(1) parsing is used to translate a rule into a regex tree. The chosen method for
implementing LL(1) parsing is indirect recursion. Each type of node in a regex tree has its own private
method, which calls the methods below it. The methods are ordered by the priority of the operator. This part
of the logic is simple and clean, with linear complexity. The final detail was treating isomorphic
regex trees as equal. The problem was solved using a low-collision hash code, keeping the comparison of
two isomorphic trees at O(N). This does not affect the final complexity, as the comparison appears in
few places.
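The recursive-descent scheme described above, with one private method per operator level, can be sketched as follows. This is an illustration under assumptions: the class RegexParser, the Node layout and the ASCII operator characters '|', '*', '(' and ')' are hypothetical (the project encodes its operators as special Unicode characters), and the hash code shown is the generic structural one rather than the project's own.

```java
import java.util.Objects;

// Hypothetical sketch; names and operator characters are assumptions.
class RegexParser {
    // kind is 'c' (literal), '.' (composition), '|' (alternation) or '*' (repetition).
    static final class Node {
        final char kind; final char symbol; final Node left; final Node right;
        Node(char kind, char symbol, Node left, Node right) {
            this.kind = kind; this.symbol = symbol; this.left = left; this.right = right;
        }
        // Structural equality, so that isomorphic trees compare equal.
        @Override public boolean equals(Object o) {
            if (!(o instanceof Node)) return false;
            Node n = (Node) o;
            return kind == n.kind && symbol == n.symbol
                && Objects.equals(left, n.left) && Objects.equals(right, n.right);
        }
        @Override public int hashCode() { return Objects.hash(kind, symbol, left, right); }
        @Override public String toString() {
            if (kind == 'c') return String.valueOf(symbol);
            if (kind == '*') return "(* " + left + ")";
            return "(" + kind + " " + left + " " + right + ")";
        }
    }

    private final String input;
    private int pos = 0;

    private RegexParser(String input) { this.input = input; }

    static Node parse(String regex) {
        RegexParser p = new RegexParser(regex);
        Node tree = p.alternation();
        if (p.pos != regex.length()) throw new IllegalArgumentException("unexpected input at " + p.pos);
        return tree;
    }

    private boolean more() { return pos < input.length(); }
    private char peek() { return input.charAt(pos); }

    // alternation := concatenation ('|' concatenation)*   (lowest priority)
    private Node alternation() {
        Node left = concatenation();
        while (more() && peek() == '|') {
            pos++;
            left = new Node('|', '\0', left, concatenation());
        }
        return left;
    }

    // concatenation := repetition+
    private Node concatenation() {
        Node left = repetition();
        while (more() && peek() != '|' && peek() != ')') {
            left = new Node('.', '\0', left, repetition());
        }
        return left;
    }

    // repetition := atom '*'*   (highest priority operator)
    private Node repetition() {
        Node node = atom();
        while (more() && peek() == '*') {
            pos++;
            node = new Node('*', '\0', node, null);
        }
        return node;
    }

    // atom := literal | '(' alternation ')'
    private Node atom() {
        if (!more()) throw new IllegalArgumentException("unexpected end of regex");
        char c = input.charAt(pos++);
        if (c == '(') {
            Node inner = alternation();
            if (!more() || input.charAt(pos++) != ')') throw new IllegalArgumentException("missing ')'");
            return inner;
        }
        if (c == ')' || c == '|' || c == '*') throw new IllegalArgumentException("unexpected '" + c + "'");
        return new Node('c', c, null, null);
    }

    public static void main(String[] args) {
        System.out.println(parse("a(b|c)*")); // prints "(. a (* (| b c)))"
    }
}
```

One token of lookahead (peek) is enough to choose the next method, which is what makes the grammar LL(1), and each character is consumed exactly once, which gives the linear parsing time.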

4.2.2 NFA and DFA

To transform the regex tree into a non-deterministic finite automaton (NFA) we used Thompson's
construction. We followed the usual workflow of the algorithm, returning a new NFA graph from each
rule application. This kept the construction linear. At this point we had a working lexer
that was tested using the JUnit framework. The complexity of matching was O(NM), where N is the
size of the matched string and M is the length of the regex. This follows from Thompson's construction,
as the number of edges is proportional to the length of the regex.
Deterministic finite automaton (DFA) construction was one of the most challenging parts of the extension. Power-set construction was implemented in-place, constructing a graph of O(2^N) nodes in the worst case (where N here is the number of NFA states) that yields
a matching complexity of O(N) in the length of the input. The graph was generated in two phases: the actual building, and the tightening of the accepting states. Most of the algorithms involved here were implemented using breadth-first search
with fast killing. Those algorithms include finding the closure of a node, finding the closure transitions
and pinning the accepting states. Due to the nature of the grammar, the worst-case bound overestimates the
number of nodes. More precisely, a partial grammar with over 40 rules that passes all tests could be
represented in only 552 nodes. This yielded a runtime of less than 3s for construction and under
1s for any file compilation (times including code generation and parsing).
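The power-set construction with breadth-first exploration can be sketched as below. This is an illustrative sketch, not the project's implementation: the class SubsetConstruction, the integer state numbering, the '\0' marker for epsilon edges and the single accepting NFA state are all assumptions.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch; the NFA encoding is an assumption.
class SubsetConstruction {
    static final char EPSILON = '\0'; // assumed marker for epsilon edges

    // NFA: state -> (symbol -> target states); state 0 is the start state.
    final Map<Integer, Map<Character, Set<Integer>>> nfa = new HashMap<>();
    final int nfaAccept;

    // DFA filled in by build(): one transition row per DFA state, in discovery order.
    final List<Map<Character, Integer>> dfa = new ArrayList<>();
    final List<Boolean> accepting = new ArrayList<>();

    SubsetConstruction(int nfaAccept) { this.nfaAccept = nfaAccept; }

    void addEdge(int from, char symbol, int to) {
        nfa.computeIfAbsent(from, k -> new HashMap<>())
           .computeIfAbsent(symbol, k -> new TreeSet<>()).add(to);
    }

    // Epsilon closure of a set of NFA states, computed with a work list.
    private Set<Integer> closure(Set<Integer> states) {
        Set<Integer> result = new TreeSet<>(states);
        Deque<Integer> work = new ArrayDeque<>(states);
        while (!work.isEmpty()) {
            int s = work.poll();
            Set<Integer> next = nfa.getOrDefault(s, Collections.emptyMap())
                                   .getOrDefault(EPSILON, Collections.emptySet());
            for (int t : next) {
                if (result.add(t)) work.add(t);
            }
        }
        return result;
    }

    // Breadth-first exploration: only reachable state sets are ever created.
    void build() {
        Map<Set<Integer>, Integer> ids = new HashMap<>();
        Deque<Set<Integer>> queue = new ArrayDeque<>();
        Set<Integer> start = closure(Collections.singleton(0));
        ids.put(start, 0);
        queue.add(start);
        dfa.add(new HashMap<>());
        accepting.add(start.contains(nfaAccept));
        while (!queue.isEmpty()) {
            Set<Integer> current = queue.poll();
            // Group the non-epsilon moves of all member states by symbol.
            Map<Character, Set<Integer>> moves = new HashMap<>();
            for (int s : current) {
                for (Map.Entry<Character, Set<Integer>> e
                        : nfa.getOrDefault(s, Collections.emptyMap()).entrySet()) {
                    if (e.getKey() != EPSILON) {
                        moves.computeIfAbsent(e.getKey(), k -> new TreeSet<>()).addAll(e.getValue());
                    }
                }
            }
            for (Map.Entry<Character, Set<Integer>> e : moves.entrySet()) {
                Set<Integer> next = closure(e.getValue());
                Integer id = ids.get(next);
                if (id == null) { // first time this state set is seen
                    id = dfa.size();
                    ids.put(next, id);
                    queue.add(next);
                    dfa.add(new HashMap<>());
                    accepting.add(next.contains(nfaAccept));
                }
                dfa.get(ids.get(current)).put(e.getKey(), id);
            }
        }
    }

    // Runs the DFA over the input: one table look-up per character.
    boolean matches(String input) {
        int state = 0;
        for (char c : input.toCharArray()) {
            Integer next = dfa.get(state).get(c);
            if (next == null) return false;
            state = next;
        }
        return accepting.get(state);
    }

    public static void main(String[] args) {
        // Hand-built NFA for a(b|c)* with accepting state 3.
        SubsetConstruction sc = new SubsetConstruction(3);
        sc.addEdge(0, 'a', 1);
        sc.addEdge(1, EPSILON, 2);
        sc.addEdge(2, 'b', 1);
        sc.addEdge(2, 'c', 1);
        sc.addEdge(1, EPSILON, 3);
        sc.build();
        System.out.println(sc.matches("abcb") + " " + sc.matches("ba")); // prints "true false"
    }
}
```

In this example the four NFA states collapse into two DFA states, which illustrates why the exponential worst-case bound overestimates the size of the graph for typical grammars.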

4.2.3 Integration with ANTLR and Optimization

The integration with ANTLR was done through an exhaustive study of the framework and by building our own
components without interfering with the automatically generated code. To do so, we implemented an
interface for a token source that was very similar to our own lexer interface. A
token in our lexer was represented by a pair of strings: the value and the type. To convert it to ANTLR's
representation, we used an adapter that mapped the token type from a string to an integer.
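The adapter idea can be sketched as follows. To stay self-contained, the sketch uses a local stand-in interface ParserToken rather than the real ANTLR types; ParserToken, LexerToken, TokenAdapterSketch and the token type names INT_LITER and IDENT are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; none of these types are the real ANTLR classes.
class TokenAdapterSketch {
    // Token as described in the text: a pair of strings.
    static final class LexerToken {
        final String value; final String type;
        LexerToken(String value, String type) { this.value = value; this.type = type; }
    }

    // Local stand-in for the parser-side token, which identifies types by integer.
    interface ParserToken {
        int getType();
        String getText();
    }

    private final Map<String, Integer> typeIds = new HashMap<>();

    // typeNames[i] is assumed to be known to the parser under integer id i + 1.
    TokenAdapterSketch(String... typeNames) {
        for (int i = 0; i < typeNames.length; i++) typeIds.put(typeNames[i], i + 1);
    }

    // Wraps a string-typed token in an object exposing an integer type.
    ParserToken adapt(LexerToken token) {
        Integer id = typeIds.get(token.type);
        if (id == null) throw new IllegalArgumentException("unknown token type " + token.type);
        final int type = id;
        return new ParserToken() {
            @Override public int getType() { return type; }
            @Override public String getText() { return token.value; }
        };
    }

    public static void main(String[] args) {
        TokenAdapterSketch adapter = new TokenAdapterSketch("INT_LITER", "IDENT");
        ParserToken t = adapter.adapt(new LexerToken("42", "INT_LITER"));
        System.out.println(t.getType() + " " + t.getText()); // prints "1 42"
    }
}
```

In a real integration the integer ids would have to agree with the ones in the parser that ANTLR generated from the grammar, so the lookup table would be derived from that vocabulary rather than from argument order.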
Rebuilding the DFA on every run is redundant, as it is the same for a fixed grammar. We cached the
construction by serializing the dependent classes and using a cache for the serialized data. This
improved the running time to under 1s.
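A load-or-build cache based on standard Java serialization might look like the sketch below. The class DfaCache, the method loadOrBuild and the cache file name are hypothetical, and a small HashMap stands in for the serialized DFA classes.

```java
import java.io.File;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.util.HashMap;
import java.util.function.Supplier;

// Hypothetical sketch; names and file layout are assumptions.
class DfaCache {
    // Returns the cached object if the file can be read; otherwise builds and stores it.
    @SuppressWarnings("unchecked")
    static <T extends Serializable> T loadOrBuild(File cacheFile, Supplier<T> builder) {
        if (cacheFile.exists()) {
            try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(cacheFile.toPath()))) {
                return (T) in.readObject();
            } catch (ClassNotFoundException | IOException e) {
                // Stale or corrupt cache: fall through and rebuild.
            }
        }
        T built = builder.get();
        try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(cacheFile.toPath()))) {
            out.writeObject(built);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return built;
    }

    public static void main(String[] args) {
        File file = new File(System.getProperty("java.io.tmpdir"), "dfa-cache-demo.ser");
        file.delete(); // start without a cache
        int[] builds = {0};
        Supplier<HashMap<String, Integer>> builder = () -> {
            builds[0]++; // counts how often the expensive construction runs
            HashMap<String, Integer> table = new HashMap<>();
            table.put("0:a", 1); // stand-in for a DFA transition table
            return table;
        };
        loadOrBuild(file, builder);                                   // builds and writes the cache
        HashMap<String, Integer> second = loadOrBuild(file, builder); // served from the cache
        System.out.println(builds[0] + " " + second.get("0:a"));      // prints "1 1"
        file.delete();
    }
}
```

The cache must be invalidated whenever the grammar changes; this sketch only rebuilds when the file is missing or unreadable.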

4.2.4 Parallels ANTLR's lexer

The extension has some coupling between the grammar and the regex engine, but besides that it is
open to extension and closed to modification. More effective caching could be achieved
via Java's class loader, so that the graph would be loaded directly from RAM
after class loading. This is partially similar to ANTLR's structure, as it directly encodes the states
into the code of the parser. On the grammar and regex engine sides, our approach is better for the
given purpose, as we keep a lightweight grammar that might improve performance with careful
implementation. The graph sizes are hard to compare, as ANTLR's lexer is coupled with the parser,
but on our side the graph edges are represented by maps. This is a nice abstraction, but a significant speed
deficiency, as arrays would have been much more efficient for multiple calls. Finally, we can conclude
that keeping a nice code base decreases speed, so a lexer and parser generator is the best
approach for generating fast products and keeping a clean workflow.

4.3 Mark-Sweep Garbage Collector

We chose a Mark-Sweep algorithm because it can reclaim cyclic references
between memory blocks. We concluded that this implementation would provide a
practical asymptotic computational complexity for a single collection of O(M(M + R)), where M
represents the number of heap-allocated memory blocks and R the number of nested references of a
block. We chose an object-oriented, cross-language implementation in which the generated
ARM assembly file calls C++ functions.
In order to intervene, the garbage collector needs to construct its cache of memory blocks and to
periodically crawl through them and check which ones should be kept. Therefore, we
decided to modify our code generator to call caching functions for newly allocated memory blocks.
Moreover, we dealt with the high-level processes that change the state of the heap:
1. Scoping
When entering a new scope we get hold of the stack pointer and the number of variables which
are on the stack. This way, we set the bounds of this scope, beyond which any memory block
belonging to it should be marked as not alive and all root blocks should be cached as non-root.
Any root block is reachable from the stack in the current scope.
2. Declaration
(a) Fresh heap allocation
On each memory allocation, an external function call to a cached malloc creates a new
memory block and adds it to the vector of blocks held by the garbage collector.
(b) Stack variable declaration
The new stack variable's address is added to the set of stack addresses of the referenced
memory block, which is marked as root.
3. Assignment
The address, references and block size of the right-hand side member are transferred to the one
on the left-hand side.

4.3.1 Heap Internal Representation

We chose to represent heap blocks by a C++ class containing bookkeeping data members that
indicate its size in bytes, whether it is a root and whether it should be collected. All stack addresses that reference
the block are stored in an ordered set, which provides logarithmic retrieval and makes it easy to chop off
stack addresses when exiting a scope. All outward references of the block are stored in an unordered
set, which provides average constant-time reference retrieval. The graph of memory blocks is stored as
an adjacency list in the garbage collector class.

4.3.2 Collection Phase

Collection is split into two phases: Mark and Sweep. During the first phase we perform a depth-first search from each
node, marking each dead block as ready for collection. A block is dead when it cannot be reached from
the stack via any path. The second step is to remove the dead blocks from the vector of cached blocks and to perform
the actual free of each heap address. This process does not allow multiple deallocations of the same block to occur.
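The two phases can be illustrated with the sketch below. It is written in Java purely for illustration (the project's collector is C++), and it uses a simplified variant that marks the live blocks by a depth-first search from the root blocks and treats every unmarked block as dead. The names MarkSweepSketch and Block, and the rule that a block is a root whenever it has at least one stack address, are assumptions.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch of mark and sweep over a graph of heap blocks.
class MarkSweepSketch {
    static final class Block {
        final String name;
        final Set<Long> stackAddresses = new TreeSet<>(); // ordered, as in the text
        final Set<Block> references = new HashSet<>();    // outward references
        boolean reachable;
        Block(String name) { this.name = name; }
        // Assumption: a block is a root while some stack variable refers to it.
        boolean isRoot() { return !stackAddresses.isEmpty(); }
    }

    final List<Block> blocks = new ArrayList<>(); // all cached blocks
    final List<String> freed = new ArrayList<>(); // log of swept blocks

    Block allocate(String name) {
        Block b = new Block(name);
        blocks.add(b);
        return b;
    }

    void collect() {
        // Mark: iterative depth-first search from every root block.
        for (Block b : blocks) b.reachable = false;
        Deque<Block> stack = new ArrayDeque<>();
        for (Block b : blocks) if (b.isRoot()) stack.push(b);
        while (!stack.isEmpty()) {
            Block b = stack.pop();
            if (b.reachable) continue; // already visited, also guards against cycles
            b.reachable = true;
            for (Block ref : b.references) stack.push(ref);
        }
        // Sweep: each unmarked block is removed from the list exactly once.
        for (Iterator<Block> it = blocks.iterator(); it.hasNext();) {
            Block b = it.next();
            if (!b.reachable) {
                freed.add(b.name); // stands in for the real free() call
                it.remove();
            }
        }
    }

    public static void main(String[] args) {
        MarkSweepSketch gc = new MarkSweepSketch();
        Block a = gc.allocate("a");
        Block b = gc.allocate("b");
        Block c = gc.allocate("c");
        Block d = gc.allocate("d");
        a.stackAddresses.add(0x1000L); // a is referenced from the stack
        a.references.add(b);           // b is reachable through a
        c.references.add(d);           // c and d form an unreachable cycle
        d.references.add(c);
        gc.collect();
        System.out.println(gc.freed);  // prints "[c, d]"
    }
}
```

The cycle between c and d is reclaimed even though each block is still referenced by the other, which is the property that motivates choosing mark and sweep. Removing a block from the list at the moment it is freed is what rules out a second deallocation.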

4.3.3 Profiling

The product is a stop-the-world garbage collector with a modifiable invocation rate. For testing, the
mark-sweep algorithm is run over the heap blocks after every two allocations, checking whether
any of the remaining blocks are unreachable and therefore eligible for removal. One of the analyzed
WACC programs consists of two nested while loops, the outer one allocating a one-dimensional array
int[] a = [1, 2, 3], and the inner one a two-dimensional array int[][] b = [a] (see Figure 2). Running the
Valgrind memory check tool on a Raspberry Pi, we obtained 854 allocs, 854 frees and 14,636 bytes
allocated. The result shows that not only the root blocks were freed, but also their inner references.
Using the Massif memory profiling tool, under the assumption that the cached memory blocks
and the actual heap blocks have the same lifetime, we obtained the graph in Figure 3. The
irregularly spaced snapshots show the memory consumption against the number of instructions executed.
One of the main difficulties in profiling and memory leak detection on cross-compiled code is the lack of
reliable tools. To get a general idea of how much memory our garbage collector consumes, we wrote a
Python script that launches a sub-process running an infinite-loop WACC program in qemu and measures
the amount of memory it takes. To do that we embedded bash commands inside the Python script and
plotted the final results. These plots helped us detect memory leaks and approximate the efficiency of
our program (see Figure 4 and Figure 5).

4.4 Further Development

The garbage collector can be improved firstly in terms of efficiency. As we have not used any caches
(for example scope caches for memory blocks), all operations on blocks involve a linear-time
look-up. A more intuitive approach would be to use a tree data structure, giving logarithmic
look-up complexity in the mark-sweep algorithm. Modifying the workflow could also yield an improvement
by using more advanced data structures such as disjoint-set structures (amortized
complexity of O(N α(N)) for N operations, where α is the inverse of the Ackermann function) and randomized treaps (expected O(log N) per operation). Moreover, the
aforementioned caches could lead to a whole new design for our garbage collector and transform
it into a generational one. This way, only the newer generations would go through the collection process, thus
reducing time overhead.
Running the garbage collector periodically in a stop-the-world manner causes non-uniform use
of computer resources (CPU cores); we therefore consider a
multi-threaded collector to be a worthwhile improvement.
The lexer can be made more efficient by using more lightweight data structures such as arrays and
by embedding the state cache inside the sources so that it is loaded during the class loading phase. The Frontend
can cover a wider range of program constructs by simply extending the grammar and adapting the corresponding
visitor-level functions. For a better-quality code generator, we could include an intermediate
phase that would perform smart register allocation through a graph coloring scheme. Furthermore, that
would open opportunities for code optimization algorithms.

5 Conclusion

Overall, the project proved to be challenging, but it highlighted some key, fundamental concepts from
the field of compilers. There were notable improvements in the design choices and in the overall project management compared to our previous projects. The extension milestone gave us a deeper understanding
of more advanced concepts and let us experiment with possible implementations, making this experience
even more interesting.

6 Appendix

Figure 1: Programming languages used in this project

Figure 2: WACC Program Used for Profiling

Figure 3: Heap Usage Spike Due to Periodic Garbage Collection

Figure 4: Infinite Loop Fragment Memory Usage with GC

Figure 5: Infinite Loop Fragment Memory Usage without GC