
ANTLR in SSP

Xingzhong Xu
Hong Man
Aug. 11

Outline

ANTLR
Abstract Syntax Tree
Code Equivalence (Code Re-hosting)
Future Work

What is ANTLR?
ANTLR, ANother Tool for Language
Recognition, is a language tool that
provides a framework for constructing
recognizers, interpreters, compilers,
and translators from grammatical
descriptions containing actions in a variety of
target languages.
-- antlr.org

Why use ANTLR?


SSP
Looking for a framework for understanding
signal processing source code semantically.

Classical analysis methods in CS

Code recognizer: lexer and parser
Interpreter: Cognitive Linguistic Modeling and
other syntax trees
Translator: code re-hosting for different targets
How Does ANTLR Work? - I

Lexer
Converts a sequence of characters into a sequence
of tokens.

Parser
Analyzes the sequence of tokens generated by
the lexer to determine its grammatical structure.

Abstract Syntax Tree
A tree representation of the abstract syntactic structure
of source code.
The syntax is abstract in the sense that it does not
represent every detail of the real syntax.
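
The lexer step can be illustrated with a minimal sketch in Python. This is not ANTLR output: the token names (NUMBER, ID, EQUAL and so on) and the tokenize function are assumptions made for illustration, and the input is a Matlab-style assignment of the kind used later in these slides.

```python
import re

# Hypothetical token names; an ANTLR lexer would take these from the grammar file.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("ID", r"[A-Za-z_]\w*"),
    ("EQUAL", r"="),
    ("LBRACK", r"\["),
    ("RBRACK", r"\]"),
    ("SEMI", r";"),
    ("COMMA", r","),
    ("WS", r"\s+"),
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text):
    """Convert a character sequence into a list of (kind, text) tokens."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at position {pos}")
        if match.lastgroup != "WS":  # whitespace separates tokens but is not one
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


print(tokenize("M4 = 1;"))
```

The parser then consumes this token list rather than raw characters, which is why whitespace can be discarded at this stage.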

Example

How Does ANTLR Work? - II

To generate the lexer, parser, and AST, we
need to analyze the structure of the
target code and write the corresponding ANTLR
grammar.
Example: Matrix Declaration in Matlab
M1 = [1 2 3; 4 5 6];
M2 = [1,2,3;4,5,6];
M3 = [M1;M2];
M4 = 1;

ANTLR Grammar - I
M1 = [1 2 3; 4 5 6];

Statement
[Variable] [Equal] [Expression] [Semicolon] (optional)

Expression
[Left Square Bracket] [Matrix] [Right Square Bracket], or [one digit]

Matrix
[Line] [Semicolon] [Line] [Semicolon] ...

Line
[digit] [comma] (optional) [digit] [comma] (optional) ...
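
The informal rules above can be sketched as a small hand-written recursive-descent parser in Python. This is an illustration under stated assumptions, not the ANTLR grammar from the slides: the node labels (ASSIGN, MATRIX, ROW, NUM, VAR) and all function names are hypothetical, and a line element is allowed to be a variable as well as a number so that M3 = [M1;M2]; parses.

```python
import re

# One regex for the whole lexer: numbers, identifiers, and single-character symbols.
TOKEN_RE = re.compile(r"\s*(?:(?P<NUMBER>\d+)|(?P<ID>[A-Za-z_]\w*)|(?P<OP>[=\[\];,]))")


def tokenize(text):
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected input at position {pos}")
        kind, value = m.lastgroup, m.group(m.lastgroup)
        # Symbols use their own text as the token kind, e.g. "=" or ";".
        tokens.append((value if kind == "OP" else kind, value))
        pos = m.end()
    return tokens


class Parser:
    def __init__(self, tokens):
        self.tokens, self.pos = tokens, 0

    def peek(self):
        return self.tokens[self.pos][0] if self.pos < len(self.tokens) else None

    def take(self, kind):
        if self.peek() != kind:
            raise SyntaxError(f"expected {kind}, found {self.peek()}")
        value = self.tokens[self.pos][1]
        self.pos += 1
        return value

    def statement(self):  # [Variable] [Equal] [Expression] [Semicolon] (optional)
        name = self.take("ID")
        self.take("=")
        value = self.expression()
        if self.peek() == ";":
            self.take(";")
        if self.peek() is not None:
            raise SyntaxError("unexpected trailing input")
        return ("ASSIGN", name, value)  # "=" and ";" are dropped from the tree

    def expression(self):  # "[" Matrix "]" or a single number
        if self.peek() == "[":
            self.take("[")
            rows = [self.line()]
            while self.peek() == ";":  # Matrix: lines separated by semicolons
                self.take(";")
                rows.append(self.line())
            self.take("]")
            return ("MATRIX", rows)
        return ("NUM", int(self.take("NUMBER")))

    def line(self):  # elements separated by optional commas
        items = [self.element()]
        while self.peek() in ("NUMBER", "ID", ","):
            if self.peek() == ",":
                self.take(",")
            items.append(self.element())
        return ("ROW", items)

    def element(self):
        if self.peek() == "ID":
            return ("VAR", self.take("ID"))
        return ("NUM", int(self.take("NUMBER")))


def parse(text):
    return Parser(tokenize(text)).statement()


print(parse("M1 = [1 2 3; 4 5 6];"))
print(parse("M3 = [M1;M2];"))
```

Because separators are not stored in the tree, the space-separated and comma-separated forms of the same matrix produce the same MATRIX subtree, which is the sense in which the syntax tree is abstract.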

ANTLR Grammar - II

Abstract Syntax Tree

M1 = [1 2 3; 4 5 6];


Abstract Syntax Tree

M2 = [1,2,3;4,5,6];


Abstract Syntax Tree


M3 = [M1;M2];


Abstract Syntax Tree

M4 = 1;


AST Example from GNU-Radio


Using ANTLR, several examples from the GNU Radio
code have been tested.
http://sites.google.com/site/stevensxingzhong/


Code Equivalence
To re-host code, we need:
A proper rule for abstracting the code.
The functionality of each code segment.

Methodology
Abstraction
Code segmentation
Functionality analysis
Replace each segment with equivalent code.

Current Methods in CS
Syntax-tree-based comparison
Generate an AST or another related abstract tree, then run a
tree-matching algorithm.
Use a hash function to map the tree structure and
simplify the matching.
Random-test comparison
Use a code chopper to segment the code.
Randomly test the input/output behavior of each segment.
By the Schwartz-Zippel lemma, enough random tests can
establish equivalent functionality with high probability.
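
A minimal sketch of the random-test comparison, assuming Python stand-ins for the code segments under test. The function names (fir_for, fir_while, fir_broken, probably_equivalent) and the trial count are hypothetical. Integer inputs are used so that results can be compared exactly, without floating-point rounding.

```python
import random


def fir_for(taps, samples):
    acc = 0
    for i in range(len(taps)):
        acc += taps[i] * samples[i]
    return acc


def fir_while(taps, samples):
    acc = 0
    i = 0
    while i < len(taps):
        acc += taps[i] * samples[i]
        i += 1
    return acc


def fir_broken(taps, samples):
    # Deliberately wrong: skips the last tap.
    acc = 0
    for i in range(len(taps) - 1):
        acc += taps[i] * samples[i]
    return acc


def probably_equivalent(f, g, trials=200, n=8, seed=0):
    """Return False on the first input where f and g disagree, True otherwise.

    Agreement on every trial is evidence of equivalence, not a proof.
    """
    rng = random.Random(seed)
    for _ in range(trials):
        # Nonzero inputs, so a skipped tap always changes the result.
        taps = [rng.randint(1, 100) for _ in range(n)]
        samples = [rng.randint(1, 100) for _ in range(n)]
        if f(taps, samples) != g(taps, samples):
            return False
    return True


print(probably_equivalent(fir_for, fir_while))   # True
print(probably_equivalent(fir_for, fir_broken))  # False
```

A single disagreement proves the two segments differ, while agreement only raises confidence, so the number of trials trades run time against the chance of a false match.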

Simplest Filter Example


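Take the simplest filter as an example. The following code
segments are intended to have exactly the same functionality.

for (i = 0; i < n; i++)
    acc0 += d_taps[i] * input[i];

for (i = 0; i < n; )
    acc0 += d_taps[i] * input[i++];

i = 0;
while (i < n)
    acc0 += d_taps[i] * input[i++];

i = 0;
for ( ; i < n; )
    acc0 += d_taps[i] * input[i++];

Each computes y_n = \sum_{i=0}^{n-1} h_i x_i.

Note that the variants using input[i++] read and modify i in one
expression without sequencing, which is undefined behavior in
standard C and C++. They match the first variant only if i is
incremented after d_taps[i] is read, for example by moving i++
into its own statement.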

Ordinary AST

for (i = 0; i < n; i++)


acc0 += d_taps[i] * input[i];


Modified AST
The ordinary AST is derived at the
programming-grammar level.
The modified AST follows the idea of semantic signal
processing and abstracts at the signal
processing domain level. For example:
for, while, do-while -> LOOP
+=, VAR = VAR + whatever ->
ACCUMULATE
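
The relabeling can be sketched in Python, using Python's own ast module as a stand-in for an ANTLR-generated tree of C code. That substitution is an assumption made to keep the example runnable. The function names and the OTHER label are hypothetical, and ast.unparse requires Python 3.9 or later.

```python
import ast


def abstract(stmt):
    """Map one Python statement node to a domain-level nested tuple."""
    # for and while both become LOOP.
    if isinstance(stmt, (ast.For, ast.While)):
        return ("LOOP", [abstract(s) for s in stmt.body])
    # VAR += expr becomes ACCUMULATE.
    if isinstance(stmt, ast.AugAssign) and isinstance(stmt.op, ast.Add):
        return ("ACCUMULATE", ast.unparse(stmt.target), ast.unparse(stmt.value))
    # VAR = VAR + expr becomes ACCUMULATE as well.
    if (
        isinstance(stmt, ast.Assign)
        and len(stmt.targets) == 1
        and isinstance(stmt.value, ast.BinOp)
        and isinstance(stmt.value.op, ast.Add)
        and ast.unparse(stmt.value.left) == ast.unparse(stmt.targets[0])
    ):
        return ("ACCUMULATE", ast.unparse(stmt.targets[0]), ast.unparse(stmt.value.right))
    return ("OTHER", ast.unparse(stmt))


def abstract_source(source):
    return [abstract(s) for s in ast.parse(source).body]


FOR_VERSION = "for i in range(n):\n    acc += taps[i] * x[i]\n"
WHILE_VERSION = "while i < n:\n    acc = acc + taps[i] * x[i]\n    i += 1\n"

print(abstract_source(FOR_VERSION))
print(abstract_source(WHILE_VERSION))
```

Both loops map to LOOP with the same first ACCUMULATE child. The while version keeps one extra ACCUMULATE node for its index update, so the two abstractions are close but not identical.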

Simplest Filter Example


for (i = 0; i < n; i++)
acc0 += d_taps[i] * input[i];


Simplest Filter Example

for (i = 0; i < n; )
    acc0 += d_taps[i] * input[i++];


Simplest Filter Example

i = 0;
while (i < n)
    acc0 += d_taps[i] * input[i++];

i = 0;
for ( ; i < n; )
    acc0 += d_taps[i] * input[i++];

Code Equivalence
Objective: determine from the syntax tree
whether code segments are
equivalent.
Abstraction
Tree matching

Perform code re-hosting.
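
One way to make tree matching cheap is to hash each subtree, so that equal subtrees can be found by comparing hashes. The following Python sketch assumes trees are nested tuples of the form (label, child, child, ...). The labels reuse the LOOP and ACCUMULATE abstraction from these slides, while the function names and the ASSIGN label are hypothetical. It is an illustration of the general idea, not the algorithm of any cited tool.

```python
import hashlib


def tree_hash(tree):
    """Structural hash of a tree given as (label, child, child, ...)."""
    label, *children = tree
    h = hashlib.sha256(label.encode())
    for child in children:
        h.update(b"(")
        h.update(tree_hash(child).encode())
        h.update(b")")
    return h.hexdigest()


def subtree_index(tree, index=None):
    """Map the hash of every subtree to that subtree."""
    if index is None:
        index = {}
    index[tree_hash(tree)] = tree
    for child in tree[1:]:
        subtree_index(child, index)
    return index


def common_subtrees(a, b):
    """Return the subtrees of a that also occur somewhere in b."""
    index_a, index_b = subtree_index(a), subtree_index(b)
    return [index_a[h] for h in index_a if h in index_b]


dot_product = ("LOOP", ("ACCUMULATE", ("MULTIPLY", ("ARRAY",), ("ARRAY",))))
same_shape = ("LOOP", ("ACCUMULATE", ("MULTIPLY", ("ARRAY",), ("ARRAY",))))
plain_store = ("LOOP", ("ASSIGN", ("MULTIPLY", ("ARRAY",), ("ARRAY",))))

print(tree_hash(dot_product) == tree_hash(same_shape))   # True
print(tree_hash(dot_product) == tree_hash(plain_store))  # False
print(common_subtrees(dot_product, plain_store))
```

Whole-tree hash equality detects exact structural matches. The shared-subtree list shows which parts still agree when the whole trees differ.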


Simplest Filter Example


for (int i = 0; i < noutput_items; i++) {
    gr_complex sum(0.0, 0.0);
    for (k = 0; k < l; k++)
        sum += d_taps[l-k-1]*in[j+k];
    out[i] = sum;
}

From gr_adaptive_fir_ccf.cc


Abstraction
The basic elements of the simplest filter
include:
LOOP
ACCUMULATION
MULTIPLY
ARRAY
MOVING INDEX


Similarity Tree Pattern


No abstraction can guarantee that functionally identical
code has precisely the same abstract
form. Therefore, we need to perform similarity-based
tree pattern recognition.
Trees that are similar enough are taken to be equivalent.
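
As one possible reading of "similar enough", the sketch below scores two abstract trees by the overlap of their parent-child label pairs (a Jaccard measure on multisets). The measure, the tree encoding as nested tuples, the INDEX and ASSIGN labels, and any threshold applied to the score are all assumptions made for illustration. The slides leave the choice of method as future work.

```python
from collections import Counter


def features(tree, parent="ROOT", bag=None):
    """Count (parent label, child label) pairs in a (label, child, ...) tree."""
    if bag is None:
        bag = Counter()
    label, *children = tree
    bag[(parent, label)] += 1
    for child in children:
        features(child, label, bag)
    return bag


def similarity(a, b):
    """Multiset Jaccard similarity of the two trees' edge features, in [0, 1]."""
    fa, fb = features(a), features(b)
    union = sum((fa | fb).values())
    return sum((fa & fb).values()) / union if union else 1.0


element = ("ARRAY", ("INDEX",))
# for-loop filter: LOOP containing one ACCUMULATE of a MULTIPLY of two arrays.
for_filter = ("LOOP", ("ACCUMULATE", ("MULTIPLY", element, element)))
# while-loop filter: same body plus a separate index update.
while_filter = ("LOOP", ("ACCUMULATE", ("MULTIPLY", element, element)),
                ("ACCUMULATE", ("INDEX",)))
# array copy: a loop with no accumulation or multiplication.
array_copy = ("LOOP", ("ASSIGN", element, element))

print(similarity(for_filter, while_filter))  # 7 shared pairs out of 9
print(similarity(for_filter, array_copy))    # 3 shared pairs out of 10
```

The two filter variants score well above the unrelated loop, so a threshold between the two scores would separate them in this small example. Choosing such a threshold for real code is an open question here.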


Future Work
Use ANTLR to generate lexers and
parsers for other languages for language
recognition.
Abstract the language into Cognitive
Linguistic Modeling.
Find a proper method for similarity-based
tree pattern recognition.


Reference
1. Parr, T. 2007. The Definitive ANTLR Reference: Building Domain-Specific Languages. Pragmatic Programmers.
2. http://www.antlr.org
3. http://www.stringtemplate.org
4. Jiang, L. and Su, Z. 2009. Automatic mining of functionally equivalent code fragments via random testing. In Proceedings of the Eighteenth International Symposium on Software Testing and Analysis.
5. Gabel, M., Jiang, L., and Su, Z. 2008. Scalable detection of semantic clones. In Proceedings of the 30th International Conference on Software Engineering.
6. Roy, C. K., Cordy, J. R., and Koschke, R. 2009. Comparison and evaluation of code clone detection techniques and tools: A qualitative approach. Science of Computer Programming.
7. Bertran, M., Babot, F., and Climent, A. 2005. An input/output semantics for distributed program equivalence reasoning. Electron. Notes Theor. Comput. Sci. 137, 1 (Jul. 2005).
