CS3215 – Software Engineering Project
Team 8 – Static Program Analyzer
Presentation – Purpose
 To highlight USPs of Team8’s SPA
 “To show how unique we are and why we chose to
be so”

 “The Missing Piece” to our Final Report
 A few ideas may have been overlooked in the report.
 Also explains more clearly the topics covered only briefly
 An Addendum to the Report
 Gives a clearer understanding of our ideas and project
Presentation – Overall Achievements
 All Project Requirements have been met.
 Modifies, Uses and Calls – stored in PKB data structures
 Parent, Follows, Next, Affects – computed on demand
 Clear and efficient Query Evaluation Strategy (QES).
 Optimizations both from PKB and PQL.
 Flexible and Extensible SPA
Presentation - Agenda
 Team PKB (First Half)
 Parser
 PKB Design
 PKB API - The ‘Narrow Passage Approach’ 
 PKB Optimization

 Team PQL (Second Half)
 Query Processor
 Query Evaluation Strategy
 Query Optimization Principles
The Parser – Basic
 Approach
 Recursive subroutines
 Token–based

 Robust
 Capable of handling parse exceptions
 Detailed Reporting using the global Exception
class
The Parser – Exceptions
 Exceptions reported
 Token Mismatch
 Reports found and expected tokens and the line number
 Invalid Variable Name
 Empty Program or Empty Statement List
 Procedure calling itself (Direct Recursion)
 Called procedure is not found
 Multiple procedures having the same identifier (name)

 Exceptions API is used consistently
The Parser – Exception Class
 Generic, global class
 Inherited classes used both in the Parser and the
Query Pre-processor
 Exception: One of the minor reusable components
of our SPA
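
A minimal sketch of how a shared exception hierarchy of this kind might look, written in Python purely for illustration. The class names (SPAException, TokenMismatch, QuerySyntaxError) and the expect() helper are hypothetical and do not come from the project.

```python
class SPAException(Exception):
    """Hypothetical shared base class for parser and query pre-processor errors."""

    def __init__(self, message, line=None):
        super().__init__(message)
        self.line = line  # source line number, when known


class TokenMismatch(SPAException):
    """Raised when the parser finds a token other than the one it expected."""

    def __init__(self, found, expected, line):
        super().__init__(
            f"line {line}: found '{found}', expected '{expected}'", line
        )
        self.found = found
        self.expected = expected


class QuerySyntaxError(SPAException):
    """Hypothetical error type that the query pre-processor could reuse."""


def expect(token, expected, line):
    """Return the token if it matches, otherwise report a mismatch."""
    if token != expected:
        raise TokenMismatch(token, expected, line)
    return token
```

Because both error types share one base class, a caller can catch SPAException once and report parser and query errors in the same way.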
PKB – Overview (1/3)
[Diagram: PKB tables and their APIs, alongside the CacheTable]
‘Crucial-Entity’ Tables: VarTable, ProcTable, StmtTable
Relationship Tables: ModTable, UsesTable, CallsTable
APIs: VarTable API, ProcTable API, StmtTable API, Modifies API, Uses API, Calls API
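PKB – Overview (2/3)
[Diagram: Abstract Syntax Tree and Control Flow Graph with their APIs]
Structures: Abstract Syntax Tree (AST API), Control Flow Graph (CFG API)
Relationship APIs: Follows API, Parent API, Next API, Affects API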
PKB – Overview (3/3)
 Crucial Entities Ec are a subset of all entities Eall (from the Entity Table)
 Conditions:
 Must be an argument of at least one relationship in the Relationship Table
 Cannot be a ‘derivative’ of another Ec, e.g. StmtList is a derivative of Statement

 Design Focus
 Speed – Data structures must enable quick data retrieval, which contributes to overall query speed
 Extensibility

 Design Choice – Hash Maps
PKB – Variable Table (1/3)
 ‘Two dimensional Hash’
 Index points to variable and variable points to a structure that includes index
 A KeyMapper keeps track of the keys
 Fast lookup: a hash map gives constant-time access on average

 Data is handled by the Modifies, Uses (friends) and Variable Table APIs

 Extensible – Can be easily extended to handle new relationships
 Take a look at the diagram (next slide)

 Procedure Table is quite similar except for Calls.
PKB – Variable Table (2/3)
Variable Name | Index | Stmts_used    | Stmt_modfd       | Proc_used | Proc_modfd
x             | 0     | 4, 5, 7, 8, 9 | 4, 5, 14, 15, 18 | P, Q      | Example, P
y             | 1     | 17, 18, 19    | -                | R         | -
z             | 2     | -             | 4, 5, 7, 8, 9    | -         | P
i             | 3     | 24, 25        | 4, 5, 7, 8, 9    | P         | R, Q
(Index through Proc_modfd together form the Variable Vector.)
PKB – Variable Table (3/3)
 Variable Table = Hash_Map(Variable Name, Variable Vector)
 Extensibility
 Assume new relationship Coexist(variable, variable)
 So, there is a relationship R such that entity ‘variable’ is one of
the arguments.
 Two new columns need to be added to the Variable Table.
 variable_coexisting, variable_coexisted
 Generic View: <entity_relationship>
 “Structure accommodates change” – Extensible design
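
The mapping from variable name to Variable Vector can be sketched as below. This is an illustration in Python, not the team's code: the class and method names are assumptions, the by_index list only stands in for the KeyMapper role, and the two Coexist fields show the hypothetical extension described above.

```python
from dataclasses import dataclass, field


@dataclass
class VariableVector:
    """One row of the Variable Table (fields follow the slide's columns)."""
    index: int
    stmts_used: set = field(default_factory=set)
    stmts_modfd: set = field(default_factory=set)
    procs_used: set = field(default_factory=set)
    procs_modfd: set = field(default_factory=set)
    # Assumed extension for the hypothetical Coexist(variable, variable):
    variable_coexisting: set = field(default_factory=set)
    variable_coexisted: set = field(default_factory=set)


class VariableTable:
    def __init__(self):
        self.by_name = {}   # variable name -> VariableVector
        self.by_index = []  # index -> variable name (stands in for the KeyMapper)

    def insert(self, name):
        """Return the index of a variable, adding it on first sight."""
        if name not in self.by_name:
            self.by_name[name] = VariableVector(index=len(self.by_index))
            self.by_index.append(name)
        return self.by_name[name].index

    def name_of(self, index):
        return self.by_index[index]


table = VariableTable()
for var in ["x", "y", "z", "i"]:
    table.insert(var)
table.by_name["x"].stmts_used.update({4, 5, 7, 8, 9})
table.by_name["x"].procs_modfd.update({"Example", "P"})
```

Adding the Coexist relationship only adds two fields to the row type; existing lookups by name or index are unaffected.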
PKB – ModifiesTable (1/3)
 ‘Hash Map with a Vector Key’
 Key is a vector of two values: the variable being modified and the identifier (from the Entity Table) of the entity modifying it
 Fast lookup: a hash map gives constant-time access on average

 Key is mapped to a Boolean Vector.
 Conserves Space. Each element = 1 bit only.

 Extensible – Can be easily extended to handle new entities
 Take a look at the diagram (next slide)

 UsesTable, Calls Table are quite similar
PKB – Modifies Table (2/3)
Modifies Key (VarName, Modifier_ID) -> Modifies Vector (Boolean Vector)

VarName | Modifier_ID | Boolean Vector
x       | 1           | 110
x       | 3           | 111101010110101010101000000
y       | 1           | 001
y       | 3           | 0000000000000000000000100

VarName is taken from the Variable Table.
Modifier_ID is taken from the Entity Table (1 = Procedure, 3 = Stmt).
For Modifier_ID 3, the ith bit corresponds to whether the ith statement modifies the variable or not.
PKB – ModifiesTable (3/3)
 Modifies Table = Hash_Map(<vector> Modifies Key, Modifies Vector)
 Extensibility
 Assume new entity Function such that Modifies is extended to
include Modifies(Function, variable)
 The Modifies Table need not be changed, nor does a new Modifies Table need to be created; the new entity only contributes a new Modifier_ID in the key
 “Structure accommodates change” – Extensible design
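
A rough Python sketch of a table keyed by (variable, modifier entity) with a bit vector as the value. It assumes an integer used as a bit set in place of a Boolean vector, and the FUNCTION entity id is invented here to illustrate the extension case; neither detail comes from the project.

```python
PROCEDURE, STMT = 1, 3  # Entity Table indices used on the slide
FUNCTION = 14           # hypothetical new entity id, not in the Entity Table


class ModifiesTable:
    """Maps (variable name, modifier entity id) to a bit vector."""

    def __init__(self):
        self.bits = {}  # (var_name, modifier_id) -> int used as a bit set

    def set_modifies(self, var, modifier_id, i):
        """Record that the i-th instance (1-based) of the entity modifies var."""
        key = (var, modifier_id)
        self.bits[key] = self.bits.get(key, 0) | (1 << (i - 1))

    def modifies(self, var, modifier_id, i):
        """True if the i-th instance of the entity modifies var."""
        vector = self.bits.get((var, modifier_id), 0)
        return bool((vector >> (i - 1)) & 1)


table = ModifiesTable()
table.set_modifies("x", STMT, 4)
table.set_modifies("y", STMT, 23)
table.set_modifies("x", FUNCTION, 1)  # new entity, same table, same code
```

The new entity shows up only as a different second component of the key, which is why the table structure itself stays unchanged.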
PKB API – The ‘Narrow Passage’ Approach (1/3)
 PKB API methods are covered by a ‘wrapper interface’
 Query Processor Access
 Restricted to the Entity Table, Relationship Table and the ‘wrapper’
interface – ‘Narrow Passage’
 Interface
 relationshipHandler(relationship, <vector> arguments)
 withHandler(<vector> arguments)
 patternHandler(<vector> arguments, pattern)

 Design - “Can be detached and added to the Evaluator”
PKB API – The ‘Narrow Passage’ Approach (2/3)
 Query Processor independent of the PKB subcomponents.
 Useful Scenario: Data structures are added and the PKB API widens.
 Without this approach, the Query Processor would need to change.
 With the Narrow Passage Approach (NPA), the PKB absorbs the change internally
 Query Processor just looks at the change in the Relationship Table

 Minimizing PKB’s public API methods
 ‘Taking burden off the Evaluator’

 Easier to cache relationship calls in the PKB.
 Covered under PKB’s Cache Table.
PKB API – The ‘Narrow Passage’ Approach (3/3)
 relationshipHandler(relationship, <vector> arguments)
 relationship: the relationship’s index from the Relationship Table
 arguments: each argument pairs a value with an entity from the Entity Table

 Ex. relationshipHandler(2, { {1, 12}, {v, 11} })
 A call to the Modifies relationship (index 2) with two arguments: the constant 1 (entity 12) and the non-constant variable v (entity 11)
 Modifies(1,v)
PKB Optimization – Cache Table (1/4)
 Traditional view of PKB – “Static Knowledge Base”
 Cache – Missing ‘Dynamic Knowledge’ Component
 Dynamic Knowledge
 Derived Knowledge or ‘Learn from Experience’ principle
 Alternative to storing the Follows, Parent, Affects and
Next relationship calls in the PKB in separate data
structures. (Why?)
 Caching is done in the relationshipHandler() interface
method.
 Controlled by Global Parameters (constants)
PKB Optimization – Cache Table (2/4)
 Eliminating ‘computation on demand’
 By pre-computing and storing all design abstractions in the PKB
 Not elegant
 Space – Doesn’t work out in the real world!
 Time – Quite high for Affects*
 Extensible – Difficult to extend when RelTable increases in size
 Is it worth it?
 No

 Fails when the number of queries Q is small and the program size N is large
PKB Optimization – Cache Table (3/4)
 With pre-computation, time T and PKB space S depend only on the program size N
 S, T are proportional to N

 This fails because time and PKB space should depend on both the program size N and the number of queries Q
 S, T are proportional to Q, N

 Our Cache Table introduces the Q factor for design abstractions computed on
demand – “Don’t pre-compute, store when calculated”
PKB Optimization – Cache Table (4/4)
 Global Parameters
 CACHE, 0 or 1
 CACHE_MAX_QUERIES, 0 to max
 MIN_CALLS_CACHE_OPEN, 0 to max
 Extremely useful when the User knows the number of queries that will be entered in a
particular ‘Querier’ session. (Which is the case mostly)

 Cache miss doesn’t cost much as the Cache table is also a hash map.
Cache Table = Hash_map(QueryObj, QueryResult);

Note: we call it ‘Query’, but it refers to the relationship call object and the relationship call result.
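
The "store when calculated" idea can be sketched as follows. The slides name the three global parameters but do not define their exact semantics, so the meanings used here are assumptions: CACHE switches caching on or off, CACHE_MAX_QUERIES caps the number of stored entries, and MIN_CALLS_CACHE_OPEN is the number of times a call must be seen before its result is stored. The CacheTable class and its lookup() method are likewise hypothetical.

```python
CACHE = 1                  # 0 disables caching entirely
CACHE_MAX_QUERIES = 1000   # assumed meaning: upper bound on stored entries
MIN_CALLS_CACHE_OPEN = 2   # assumed meaning: calls seen before storing a result


class CacheTable:
    def __init__(self, compute):
        self.compute = compute  # the on-demand computation being wrapped
        self.results = {}       # relationship call object -> call result
        self.calls = {}         # relationship call object -> times computed

    def lookup(self, relationship, arguments):
        key = (relationship, tuple(arguments))
        if CACHE and key in self.results:
            return self.results[key]              # cache hit
        result = self.compute(relationship, arguments)
        self.calls[key] = self.calls.get(key, 0) + 1
        if (CACHE and self.calls[key] >= MIN_CALLS_CACHE_OPEN
                and len(self.results) < CACHE_MAX_QUERIES):
            self.results[key] = result            # store once calculated
        return result
```

With these settings, the first two identical calls are computed on demand and every later one is answered from the hash map, so cost grows with the calls actually made rather than with program size alone.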
PKB Optimization – Other
 Restriction performed on Sets using the Entity Table
 The Entity Table included the complete set of values for each entity
 This set was pre-computed during parsing.
 Used to perform restriction to save on computation time.
 Ex. For Affects/Affects*, only assignment statements were to be used.
So, an intersection between all values of h
Entity Table
Index    | Name               | Attribute | Type    | Values
0        | program            | progName  | name    |
1        | procedure          | procName  | name    | P, Q, Example
2        | stmtList           |           |         |
3        | stmt               | stmt#     | integer | 1, 2, 3 … 25
4        | assign             | stmt#     | integer | 2, 4, 5, …
5        | call               | stmt#     | integer | 4, 3
6        | while              | stmt#     | integer | 6, 9
7        | if                 | stmt#     | integer | 10
8, 9, 10 | plus, minus, times |           |         |
11       | variable           | varName   | name    | x, y, z, i
12       | constant           | value     | value   |
13       | program line       | stmt#     | integer | 1, 2, 3 … 25
Values: program lines in which all instances of the entity are found.
Relationship Table
Index | Name     | Arguments | Type1   | Type2   | Entity1              | Entity2
0     | Calls    | 2         | name    | name    | 1                    | 1
1     | Calls*   | 2         | name    | name    | 1                    | 1
2     | Modifies | 2         | both    | name    | 1, 3, 4, 5, 6, 7, 13 | 11
3     | Uses     | 2         | both    | name    | 1, 3, 4, 5, 6, 7, 13 | 11
4     | Parent   | 2         | integer | integer | 3, 6, 7, 13          | 3, 4, 5, 6, 7, 13
5     | Parent*  | 2         | integer | integer | 3, 6, 7, 13          | 3, 4, 5, 6, 7, 13
6     | Follows  | 2         | integer | integer | 3, 4, 5, 6, 7, 13    | 3, 4, 5, 6, 7, 13
7     | Follows* | 2         | integer | integer | 3, 4, 5, 6, 7, 13    | 3, 4, 5, 6, 7, 13
8     | Next     | 2         | integer | integer | 3, 4, 5, 6, 7, 13    | 3, 4, 5, 6, 7, 13
The number of fields under Type and Entity varies with the number of arguments.
Query Evaluation Strategy - Interface Methods
 getValues(Entity) from Entity Table
 relationshipHandler(Name, Arg1, Arg2)
 patternHandler(Arg1, Arg2)
 withHandler(Arg1, Arg2)
Query Evaluation Strategy
Assume ‘Select’ variables are in the form ‘Select <s1, s2, …, sn>’.
 Step 1: Call getValues() to get all values of s1, s2, …, sn.
 Step 2: Call patternHandler() and withHandler() to filter the ‘Select’ values.
 Step 3: Form combinations of output results.
[e.g. s1={4,5}, s2={6,7}. <s1, s2> will form {4 6, 4 7, 5 6, 5 7}.]
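
Step 3 amounts to a Cartesian product of the per-synonym value lists. A small Python sketch, where the function name select_combinations is an assumption:

```python
from itertools import product


def select_combinations(candidates):
    """candidates: one list of surviving values per 'Select' synonym, in order."""
    return list(product(*candidates))


tuples = select_combinations([[4, 5], [6, 7]])
# tuples == [(4, 6), (4, 7), (5, 6), (5, 7)]
```

The number of tuples is the product of the list sizes, which is why filtering in Step 2 before combining keeps the result small.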
Query Evaluation Strategy
 Store non-’Select’ values in memory to aid in
finding ‘Select’ values as query answer.
 E.g. “program line n; assign a; Select a such
that Next*(13, n) and Affects*(a, n)”
 relationshipHandler() to find and store values
of n into memory, then use
relationshipHandler() to find values of a, given
n.
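
The two-stage evaluation in this example can be illustrated with toy data. The relation contents below are invented for the sketch and do not come from any real program; in the SPA itself both lookups would go through relationshipHandler().

```python
# Toy relation contents, invented for this sketch.
NEXT_STAR = {(13, 14), (13, 15), (14, 15)}
AFFECTS_STAR = {(2, 14), (9, 15), (4, 20)}


def select_a():
    """Evaluate: Select a such that Next*(13, n) and Affects*(a, n)."""
    # Next*(13, n): n is not selected, so its values are only kept in memory.
    n_values = {n for (src, n) in NEXT_STAR if src == 13}
    # Affects*(a, n): keep each a that is related to a stored value of n.
    return sorted({a for (a, n) in AFFECTS_STAR if n in n_values})


answer = select_a()
```

Here n takes the values 14 and 15, so only the assignments that affect one of those lines survive.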
Query Evaluation Strategy
 Constants: 2, 3, 5, “Example”, “x”
 Non-constants: declaration variables and ‘_’

 Step 4: For each ‘Relationship’ clause,
 Case 1: [Relationship](constant, constant)
 Case 2: [Relationship](constant, non-constant)
 Case 3: [Relationship](non-constant, constant)
 Case 4: [Relationship](non-constant, non-constant)
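
A short Python sketch of how a clause could be sorted into one of the four cases. It assumes arguments arrive as strings, with integers and double-quoted names counted as constants; both function names are hypothetical.

```python
import re


def is_constant(arg):
    """Integers and quoted names are constants; synonyms and '_' are not."""
    return bool(re.fullmatch(r'\d+|"[^"]*"', arg))


def clause_case(arg1, arg2):
    """Return the case number (1 to 4) for [Relationship](arg1, arg2)."""
    if is_constant(arg1):
        return 1 if is_constant(arg2) else 2
    return 3 if is_constant(arg2) else 4
```

For instance, clause_case("1", "v") falls under Case 2 and clause_case("a", "_") under Case 4.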
Query Evaluation Strategy
 Case 1

 Case 2 and 3,
a) Non-constant is placeholder.
b) Non-constant is in ‘Select’ values.
c) Non-constant is not in ‘Select’ values.
Query Evaluation Strategy
 Case 4,
a) Both non-constants are the same
i. Placeholder
ii. ‘Select’ values
iii. Non-’Select’ values
b) Both non-constants are different
i. Placeholder in either argument
ii. Both arguments in ‘Select’ values
iii. ‘Select’ value in either argument
iv. Both arguments not in ‘Select’ values
Query Evaluation Strategy
 Variables may appear in ‘With’ and ‘Pattern’ clauses without appearing in the ‘Select’ or ‘Relationship’ clauses.
 E.g. “stmt s, s1; Select s such that Follows(2,s) with s1.stmt#=5”
Query Pre-Processor
 Validates queries
 Transforms the query string into a query tree for efficient query optimization and evaluation
Example Query
 assign a, a1; stmt s, s1; Select <a, s> such
that Follows(a, s) and Next*(a1, s1) with
s.stmt# = 1 pattern a(_, _”x+z”_)
Query Validation
 Rules for PQL:
 Grammar table using static regular expressions
 Rules for relationships and entities stored in static
“tables”, e.g. RelTable, EntTable
 Rules are not “hardcoded”
Query Validation
 Check full syntax of query
 All variable synonyms are checked against
declared variable map
 Relationships are checked for correct number
of arguments, correct type of arguments, and
non-ambiguity
 Attribute references must correspond to those
of variable synonyms
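
The argument-count and argument-type checks can be driven directly from table rows rather than hardcoded rules. The Python sketch below copies three rows from the Relationship Table; the function name and error strings are assumptions, and constants and the ‘_’ placeholder are left out for brevity.

```python
# A few rows copied from the Relationship Table: name -> (Entity1, Entity2)
REL_TABLE = {
    "Calls":    ({1}, {1}),
    "Modifies": ({1, 3, 4, 5, 6, 7, 13}, {11}),
    "Follows":  ({3, 4, 5, 6, 7, 13}, {3, 4, 5, 6, 7, 13}),
}


def validate_relationship(name, arg_entities):
    """arg_entities: Entity Table index of each argument's declared type.

    Returns None when the clause is valid, otherwise an error message.
    """
    if name not in REL_TABLE:
        return f"unknown relationship {name}"
    allowed = REL_TABLE[name]
    if len(arg_entities) != len(allowed):
        return f"{name} expects {len(allowed)} arguments"
    for position, (entity, ok) in enumerate(zip(arg_entities, allowed), 1):
        if entity not in ok:
            return f"{name}: argument {position} has the wrong type"
    return None
```

Supporting a new relationship then means adding a row to the table, with no change to the validation code.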
Query Parsing
 Regular expressions in the grammar table are used to extract parts of the query
 Parsing is recursive, similar to parsing of
SIMPLE source code
 As parsing is done, query tree is built
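
As an illustration of regex-based extraction only (it is neither recursive nor does it build a query tree), the Python sketch below pulls the relationship clauses out of the example query shown earlier. The patterns are assumptions and are not the project's grammar table.

```python
import re

# Hypothetical grammar-table entry: one clause such as Next*(a1, s1).
REL_CLAUSE = re.compile(
    r"([A-Za-z]+\*?)\s*\(\s*([^,()]+?)\s*,\s*([^,()]+?)\s*\)"
)


def parse_such_that(query):
    """Return (name, arg1, arg2) for each relationship after 'such that'."""
    match = re.search(r"such that(.*?)(?=\bwith\b|\bpattern\b|$)", query, re.S)
    if match is None:
        return []
    return REL_CLAUSE.findall(match.group(1))


query = ('assign a, a1; stmt s, s1; Select <a, s> such that Follows(a, s) '
         'and Next*(a1, s1) with s.stmt# = 1 pattern a(_, _"x+z"_)')
clauses = parse_such_that(query)
```

Each extracted triple would become one relationship node when the query tree is built.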
PQL Optimization
 Basic strategy
 Reordering relationships for optimal linear
evaluation of relationships from left to right
 Left: Most restrictive
Right: Least restrictive
 Priority given to joins between relationships
compared to crosses between relationships
PQL Optimization
 Order by relationship types
 Follows vs. Affects
 Order by number of occurrences for
relationship argument types
 Follows*(a1, a2) vs. Follows*(a1, s1)
 Order by combinations of variables and
constants as relationship arguments
 Modifies(“First”, “x”) vs. Modifies(p, v)
 Order by the number of output variables among the relationship arguments
 Select <a, s> …; Follows(a1, s1) vs. Follows(a, s)
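
A minimal Python sketch of reordering clauses so that the most restrictive comes first, using only two of the criteria above. The cost numbers, the assumption that Follows is cheaper than Affects, and the choice to rank constants ahead of relationship type are all illustrative and not taken from the project.

```python
# Assumed relative costs; it is assumed here that Follows is cheaper than Affects.
RELATION_COST = {"Follows": 1, "Affects": 5}
DEFAULT_COST = 3


def is_constant(arg):
    """Integers and quoted names count as constants."""
    return arg.isdigit() or arg.startswith('"')


def sort_key(clause):
    """Fewer unknown arguments first, then cheaper relationship types."""
    name, arg1, arg2 = clause
    unknowns = sum(not is_constant(a) for a in (arg1, arg2))
    return (unknowns, RELATION_COST.get(name, DEFAULT_COST))


def reorder(clauses):
    """Most restrictive clause first; sorted() keeps ties in input order."""
    return sorted(clauses, key=sort_key)


ordered = reorder([
    ("Affects", "a", "a1"),
    ("Modifies", "p", "v"),
    ("Follows", "a", "s"),
    ("Modifies", '"First"', '"x"'),
])
```

With this key, Modifies("First", "x") is evaluated first and Affects(a, a1) last, matching the left-to-right, most-to-least restrictive strategy.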