Xingzhong Xu
Hong Man
Aug. 11
Outline
ANTLR
Abstract Syntax Tree
Code Equivalence (Code Re-hosting)
Future Work
What is ANTLR?
ANTLR, ANother Tool for Language
Recognition, is a language tool that
provides a framework for constructing
recognizers, interpreters, compilers,
and translators from grammatical
descriptions containing actions in a variety of
target languages.
-- antlr.org
Parser
Analyzes the sequence of tokens generated
by the lexer to determine its grammatical structure.
Example
M1 = [1 2 3; 4 5 6];
M2 = [1,2,3;4,5,6];
M3 = [M1;M2];
M4 = 1;
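As a rough illustration of the lexer stage that feeds the parser, the sketch below splits one of the example statements into typed tokens. The token names (`ID`, `NUM`, `EQUAL`, and so on) and the `tokenize` helper are assumptions made for this sketch; they are not ANTLR output and not taken from the original grammar.

```python
import re

# Hypothetical token types for the matrix examples (names are assumptions).
TOKEN_SPEC = [
    ("ID", r"[A-Za-z_]\w*"),
    ("NUM", r"\d+"),
    ("EQUAL", r"="),
    ("LBRACK", r"\["),
    ("RBRACK", r"\]"),
    ("SEMI", r";"),
    ("COMMA", r","),
    ("WS", r"\s+"),
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text):
    """Return a list of (type, text) pairs, skipping whitespace."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at position {pos}")
        if match.lastgroup != "WS":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


if __name__ == "__main__":
    for token in tokenize("M1 = [1 2 3; 4 5 6];"):
        print(token)
```

The parser then works on this token list rather than on raw characters.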
ANTLR Grammar - I
M1 = [1 2 3; 4 5 6];
Statement
[Variable] [Equal] [Expression] [Semicolon] (optional)
Expression
[Left Square Bracket] [Matrix] [Right Square Bracket] or [one digit]
Matrix
[Line] [Semicolon] [Line] [Semicolon] ...
Line
[digit] [comma] (optional) [digit] [comma] (optional)
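The rules above can be sketched as a small hand-written recursive-descent parser. This is only an illustration of the rule structure, not the ANTLR-generated parser: the class and function names are hypothetical, a line element is assumed to be either a number or a variable name (so that `[M1;M2]` parses), and characters outside the grammar are silently skipped by the simple tokenizer.

```python
import re

IDENT = re.compile(r"[A-Za-z_]\w*")


def tokenize(text):
    # Identifiers, integers, and single-character punctuation; other characters are skipped.
    return re.findall(r"[A-Za-z_]\w*|\d+|[=\[\];,]", text)


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self, expected=None):
        token = self.peek()
        if token is None:
            raise SyntaxError("unexpected end of input")
        if expected is not None and token != expected:
            raise SyntaxError(f"expected {expected!r}, got {token!r}")
        self.pos += 1
        return token

    def statement(self):
        # Statement: Variable '=' Expression [';']
        name = self.take()
        if not IDENT.fullmatch(name):
            raise SyntaxError(f"expected a variable name, got {name!r}")
        self.take("=")
        value = self.expression()
        if self.peek() == ";":
            self.take(";")
        if self.peek() is not None:
            raise SyntaxError(f"unexpected token {self.peek()!r}")
        return ("assign", name, value)

    def expression(self):
        # Expression: '[' Matrix ']' or a single element
        if self.peek() == "[":
            self.take("[")
            rows = self.matrix()
            self.take("]")
            return ("matrix", rows)
        return self.element()

    def matrix(self):
        # Matrix: Line (';' Line)*
        rows = [self.line()]
        while self.peek() == ";":
            self.take(";")
            rows.append(self.line())
        return rows

    def line(self):
        # Line: element ([','] element)*  (the comma is optional)
        items = [self.element()]
        while self.peek() not in (None, ";", "]"):
            if self.peek() == ",":
                self.take(",")
            items.append(self.element())
        return items

    def element(self):
        token = self.take()
        if token.isdigit():
            return int(token)
        if IDENT.fullmatch(token):
            return token
        raise SyntaxError(f"expected a number or variable, got {token!r}")


def parse(text):
    return Parser(tokenize(text)).statement()


if __name__ == "__main__":
    print(parse("M1 = [1 2 3; 4 5 6];"))
    print(parse("M2 = [1,2,3;4,5,6];"))
```

Both spellings of the matrix produce the same row structure, which is the point of making the comma optional in the Line rule.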
ANTLR Grammar - II
M1 = [1 2 3; 4 5 6];
M2 = [1,2,3;4,5,6];
M4 = 1;
Code Equivalence
To re-host the code, we need:
The proper rule to abstract the code.
The functionality of the code segment.
Methodology
Abstraction
Code Segmentation
Functionality Analysis
Replace the segment with equivalent code.
Current Methods in CS
Syntax Tree based Comparison
Generate an AST or another related abstract tree, then run a
tree-matching algorithm.
Use a hash function to map the tree structure and
simplify the matching algorithm.
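One way to read the hashing idea is a bottom-up (Merkle-style) hash, where each node's digest is built from its label and its children's digests, so two subtrees can be compared by comparing two strings. The sketch below assumes trees are nested tuples of the form `(label, child, child, ...)`; that representation and the `tree_hash` name are assumptions for illustration, not a description of any particular tool.

```python
import hashlib


def tree_hash(node):
    """Hash a tree given as (label, child, child, ...) nested tuples."""
    label, *children = node
    digest = hashlib.sha256(label.encode("utf-8"))
    for child in children:
        # Child order matters: each child's digest is folded in sequentially.
        digest.update(b"(" + tree_hash(child).encode("ascii") + b")")
    return digest.hexdigest()


if __name__ == "__main__":
    first = ("LOOP", ("MULTIPLY", ("ARRAY",), ("ARRAY",)))
    second = ("LOOP", ("MULTIPLY", ("ARRAY",), ("ARRAY",)))
    print(tree_hash(first) == tree_hash(second))
```

Equal digests indicate identical trees up to hash collisions; storing the digest of every subtree in a dictionary would also allow shared subtrees to be found without pairwise comparison.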
Random Test Comparison
Code Chopper: segment the code.
Randomly test the input/output behavior.
Schwartz-Zippel lemma: with enough random tests, the
functionality can be determined with high probability.
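A minimal sketch of random input/output comparison is shown below. The two segments (`dot_for`, `dot_while`) and the checker `probably_equivalent` are hypothetical stand-ins written for this example. Agreement on random inputs is evidence of equivalence rather than a proof; the Schwartz-Zippel bound applies when the segments compute polynomials of their inputs, as these multiply-accumulate loops do.

```python
import random


def dot_for(taps, samples):
    acc = 0
    for i in range(len(taps)):
        acc += taps[i] * samples[i]
    return acc


def dot_while(taps, samples):
    acc = 0
    i = 0
    while i < len(taps):
        acc += taps[i] * samples[i]
        i += 1
    return acc


def probably_equivalent(f, g, trials=200, size=8, seed=0):
    """Return False on the first random input where f and g disagree, else True."""
    rng = random.Random(seed)
    for _ in range(trials):
        # Integer inputs keep the comparison exact (no floating-point rounding).
        taps = [rng.randint(-1000, 1000) for _ in range(size)]
        samples = [rng.randint(-1000, 1000) for _ in range(size)]
        if f(taps, samples) != g(taps, samples):
            return False
    return True


if __name__ == "__main__":
    print(probably_equivalent(dot_for, dot_while))
```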
input[i];
for (i = 0; i < n; )
    acc0 += d_taps[i] * input[i++];

i = 0;
while (i < n)
    acc0 += d_taps[i] * input[i++];

i = 0;
for ( ; i < n; )
    acc0 += d_taps[i] * input[i++];

$$y_n = \sum_{i=0}^{n-1} h_i x_i$$
Ordinary AST
Modified AST
The ordinary AST is derived at the
programming-grammar level.
The modified AST follows the idea of semantic signal
processing and abstracts constructs at the
signal-processing domain level. For example:
for, while, do-while -> LOOP
+=, VAR = VAR + whatever -> ACCUMULATE
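The relabeling step can be sketched as a recursive pass over the tree. The example assumes a tree of nested tuples `(label, child, ...)` and a hypothetical label table; the node names (`for`, `do_while`, `+=`) are illustrative and are not the labels produced by any specific ANTLR grammar.

```python
# Hypothetical mapping from grammar-level labels to domain-level labels.
ABSTRACT_LABEL = {
    "for": "LOOP",
    "while": "LOOP",
    "do_while": "LOOP",
    "+=": "ACCUMULATE",
}


def abstract(node):
    """Relabel a (label, child, ...) tuple tree with domain-level labels."""
    label, *children = node
    # VAR = VAR + expr is rewritten to ACCUMULATE(VAR, expr).
    if label == "=" and len(children) == 2:
        target, value = children
        if value[0] == "+" and len(value) == 3 and value[1] == target:
            return ("ACCUMULATE", abstract(target), abstract(value[2]))
    return (ABSTRACT_LABEL.get(label, label), *[abstract(c) for c in children])


if __name__ == "__main__":
    with_for = ("for", ("+=", ("acc0",), ("x",)))
    with_while = ("while", ("=", ("acc0",), ("+", ("acc0",), ("x",))))
    print(abstract(with_for))
    print(abstract(with_while))
```

Both inputs reduce to the same abstract tree, even though one uses `for` with `+=` and the other uses `while` with an explicit `VAR = VAR + ...` assignment.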
for (i = 0; i < n; )
    acc0 += d_taps[i] * input[i++];
i = 0;
while (i < n)
    acc0 += d_taps[i] * input[i++];

i = 0;
for ( ; i < n; )
    acc0 += d_taps[i] * input[i++];
Code Equivalence
Objective: determine from the syntax
tree whether code segments are
equivalent.
Abstraction
Tree matching.
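A very simple form of tree matching is to bring both trees into a canonical form and compare them for equality. The sketch below assumes nested-tuple trees `(label, child, ...)` and treats `MULTIPLY` as commutative so that operand order is ignored; the `COMMUTATIVE` set and the function names are assumptions for this illustration, and real tree-matching algorithms are considerably more tolerant of differences.

```python
# Assumption: operand order does not matter for these labels.
COMMUTATIVE = {"MULTIPLY"}


def canonical(node):
    """Return the tree with children of commutative nodes in sorted order."""
    label, *children = node
    children = [canonical(child) for child in children]
    if label in COMMUTATIVE:
        children.sort()
    return (label, *children)


def trees_match(first, second):
    return canonical(first) == canonical(second)


if __name__ == "__main__":
    taps_first = ("LOOP", ("ACCUMULATE", ("VAR",),
                           ("MULTIPLY", ("ARRAY", ("TAPS",)), ("ARRAY", ("INPUT",)))))
    input_first = ("LOOP", ("ACCUMULATE", ("VAR",),
                            ("MULTIPLY", ("ARRAY", ("INPUT",)), ("ARRAY", ("TAPS",)))))
    print(trees_match(taps_first, input_first))
```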
From gr_adaptive_fir_ccf.cc
Abstraction
The basic elements of the simplest filter
include:
LOOP
ACCUMULATION
MULTIPLY
ARRAY
MOVING INDEX
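To make the five elements concrete, the sketch below writes the multiply-accumulate loop in Python and marks where each element appears. It is an illustration only, not a transcription of the GNU Radio source; the name `fir_output` is hypothetical.

```python
def fir_output(taps, samples):
    """Compute sum(taps[i] * samples[i]) with each basic element marked."""
    acc = 0
    i = 0
    while i < len(taps):                 # LOOP
        product = taps[i] * samples[i]   # MULTIPLY of two ARRAY elements
        acc += product                   # ACCUMULATION
        i += 1                           # MOVING INDEX
    return acc


if __name__ == "__main__":
    print(fir_output([1, 2, 3], [4, 5, 6]))
```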
Future Work
Use ANTLR to generate lexers and
parsers for other languages for language
recognition.
Abstract the language into a cognitive
linguistic model.
Find a proper method for similarity-based
tree pattern recognition.