Problem Overview: Language Analysis
Language Analysis
S.Takahashi
Problem
File.java → ??????? → File.class
What matters?
Well written
Works correctly
Syntax
Correct form
Formalism for describing correct form
Grammars - rules that describe how to
generate all correct programs
Semantics
Compilation stages
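Lexer: characters in, tokens out
Input characters: while (i <= 5) x = x + i;
Tokens: WHILE ABRIR_P ID("i") LTE INT(5) Cerrar_P ID("x") EQ ID("x") PLUS ID("i") PYC
(ABRIR_P: open parenthesis; Cerrar_P: close parenthesis; PYC: semicolon)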
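The lexer stage pictured on this slide turns the characters of `while (i <= 5) x = x + i;` into a token stream. A minimal sketch of such a lexer follows. The token names come from the slide; the regular expressions and the function name `tokenize` are assumptions made for illustration, not the course's implementation.

```python
import re

# Token names follow the slide; the patterns are assumptions.
# Order matters: the keyword is tried before ID, and "<=" before "<".
TOKEN_SPEC = [
    ("WHILE",    r"while\b"),
    ("INT",      r"\d+"),
    ("ID",       r"[A-Za-z_]\w*"),
    ("LTE",      r"<="),
    ("LT",       r"<"),
    ("EQ",       r"="),
    ("PLUS",     r"\+"),
    ("ABRIR_P",  r"\("),
    ("Cerrar_P", r"\)"),
    ("PYC",      r";"),
    ("SKIP",     r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{rx})" for name, rx in TOKEN_SPEC))

def tokenize(text):
    """Return the list of tokens for text, or raise SyntaxError."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at position {pos}")
        kind = m.lastgroup
        if kind == "INT":
            tokens.append(f"INT({m.group()})")
        elif kind == "ID":
            tokens.append(f'ID("{m.group()}")')
        elif kind != "SKIP":          # whitespace produces no token
            tokens.append(kind)
        pos = m.end()
    return tokens

tokens = tokenize("while (i <= 5) x = x + i;")
```

Note that the lexer only checks that each piece is a valid token; it does not check that the tokens appear in a meaningful order. That is the parser's job.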
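[Figure: a second Lexer example. The character stream of a while statement with the condition i < 5 enters the Lexer; the tokens shown include WHILE, ABRIR_P, ID("i"), LT, INT(5), Cerrar_P, ID("x"), EQ, ID("x"), PLUS, ID("i"), ID("temp") and INT(55). The input also contains a * character.]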
Example
tokens: ( Num(34) + Num(36) ) * Num(40)
Parser
result: OK!!!!
Example
tokens: ( Num(34) + Num(36) ) * Num(40)
Parser
result: 2800
Example
tokens: ( Num(34) + Num(36) ) * Num(40)
Parser
result (syntax tree):
      *
     / \
    +   40
   / \
  34  36
Example
tokens: ( Num(34) + Num(36) ) * Num(40)
Parser
result (stack code):
Push 34
Push 36
Plus
Push 40
Times
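The stack code on this slide (Push 34, Push 36, Plus, Push 40, Times) can be executed by a very small stack machine. The sketch below is only an illustration: the instruction names come from the slide, while the function name `run` and the string encoding of instructions are assumptions.

```python
def run(program):
    """Execute a list of instructions such as "Push 34" or "Plus"."""
    stack = []
    for instr in program:
        op, *args = instr.split()
        if op == "Push":
            stack.append(int(args[0]))
        elif op == "Plus":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "Times":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        else:
            raise ValueError(f"unknown instruction: {instr}")
    return stack.pop()

program = ["Push 34", "Push 36", "Plus", "Push 40", "Times"]
result = run(program)   # evaluates (34 + 36) * 40, i.e. 2800
```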
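Definition: Alphabet
A finite set of symbols.
Definition: String
A finite sequence of symbols of an alphabet. The empty string is written λ.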
Example
A = {a, b, c}
Strings over A: aab, acab, λ
Operations on strings
(σ is a symbol; ω and β are strings)
Length: #
#.λ = 0
#.σβ = 1 + #.β
Concatenation: ωβ
λβ = β
(σω)β = σ(ωβ)
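Both definitions above are recursive on the first symbol of a string, so they translate directly into recursive functions. In this sketch Python strings stand in for strings over an alphabet and `""` stands in for λ; the function names `length` and `concat` are chosen here for illustration.

```python
def length(s):
    # #.λ = 0
    if s == "":
        return 0
    # #.σβ = 1 + #.β  (s[0] plays the role of σ, s[1:] of β)
    return 1 + length(s[1:])

def concat(w, b):
    # λβ = β
    if w == "":
        return b
    # (σω)β = σ(ωβ)
    return w[0] + concat(w[1:], b)

word = concat("aab", "acab")
```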
Definition: Language
A set of strings over an alphabet.
All possible words of an alphabet
All strings of length 0
All strings of length 1
All strings of length 2
All strings of length 3
….
All strings of length n
….
Generalizing
All strings of length zero: A^0 = {λ}
All strings of length k+1:
A^(k+1) = {σω : σ ∈ A, ω ∈ A^k}
All strings of length greater than or equal to zero: A* = A^0 ∪ A^1 ∪ A^2 ∪ …
All strings of length greater than zero: A+ = A^1 ∪ A^2 ∪ A^3 ∪ …
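The inductive definition of the strings of length k can be sketched as a recursive function. The sets of all strings of length at least zero, or greater than zero, are infinite, so the helper `up_to` below only enumerates them up to a chosen length. Both function names and the `include_empty` flag are assumptions made for this example.

```python
def power(alphabet, k):
    """All strings of length k over the alphabet."""
    if k == 0:
        return {""}                       # only the empty string λ
    shorter = power(alphabet, k - 1)
    # prepend every symbol σ to every string ω of length k - 1
    return {sigma + omega for sigma in alphabet for omega in shorter}

def up_to(alphabet, n, include_empty=True):
    """Strings of length 0..n (or 1..n when include_empty is False)."""
    start = 0 if include_empty else 1
    result = set()
    for k in range(start, n + 1):
        result |= power(alphabet, k)
    return result

A = {"a", "b", "c"}
words = power(A, 2)    # the 9 strings of length 2 over {a, b, c}
```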
syntax
We need a formalism
GRAMMARS
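Definition: Grammar
G = (N, T, S, P): N is the set of non-terminals, T the set of terminals, S ∈ N the start symbol and P the set of productions.
Definition: Production
α → β, where:
• α ∈ (N ∪ T)*
• β ∈ (N ∪ T)*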
Derivable
Given two strings α0 and αn, we say that αn can be derived (in zero or more steps) from α0, written α0 ⇒* αn, if starting with α0 we can apply derivations to get to αn.
Formally: α0 ⇒* αn
• if α0 = αn, or
• if there is a string γ such that: α0 ⇒ γ and γ ⇒* αn
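The recursive definition of derivability suggests a search: apply every production in every possible position (one step), then repeat. The sketch below assumes single-character symbols and represents a production as a pair of Python strings. The grammar S → aSb, S → ab used at the end is a hypothetical example, not one from these slides. The `max_len` bound is an assumption that keeps the search finite; it is only safe for grammars whose productions never shorten a string.

```python
from collections import deque

def step(s, productions):
    """All strings reachable from s in exactly one derivation step."""
    out = set()
    for lhs, rhs in productions:
        start = s.find(lhs)
        while start != -1:
            out.add(s[:start] + rhs + s[start + len(lhs):])
            start = s.find(lhs, start + 1)
    return out

def derivable(src, dst, productions, max_len):
    """Breadth-first search for src =>* dst, ignoring strings longer than max_len."""
    seen = {src}
    queue = deque([src])
    while queue:
        s = queue.popleft()
        if s == dst:                 # zero steps also counts
            return True
        for nxt in step(s, productions):
            if len(nxt) <= max_len and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False

# Hypothetical grammar: S -> aSb, S -> ab
P = [("S", "aSb"), ("S", "ab")]
found = derivable("S", "aabb", P, max_len=4)
```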
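Generated Language
The language generated by a grammar G is the set of terminal strings that can be derived from its start symbol: L(G) = {ω ∈ T* : S ⇒* ω}
Example: Arithmetic expressions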
– A number is an arithmetic expression
– A variable is an arithmetic expression
– If A and B are arithmetic expressions the
following are also arithmetic expressions:
  • A+B
  • A*B
  • A-B
  • A/B
  • A%B
  • (A)
  • -A
The Grammar:
G = ({E}, {num, id, +, -, *, /, %, (, )}, E, P)
P:
– E → id
– E → num
– E → E+E
– E → E*E
– E → E-E
– E → E/E
– E → E%E
– E → (E)
– E → -E
The Grammar:
G = ({E,T,F,S}, {num, id, +, *, -, (, )}, E, P)
P:
– E → E+T
– E → T
– T → T*F
– T → F
– F → S
– F → -S
– S → (E)
– S → id
– S → num
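The layered grammar with E, T, F and S encodes precedence: * binds tighter than +, and unary minus binds tightest. One way to see this is a recursive-descent parser with one function per non-terminal. The sketch below is an illustration under stated assumptions, not the course's parser: the left-recursive rules E → E+T and T → T*F are rewritten as loops, tokens are given as a Python list (integers for num, strings for everything else), and identifiers are looked up in a hypothetical `env` dictionary.

```python
def parse(tokens, env=None):
    """Parse and evaluate a token list according to the E/T/F/S grammar."""
    env = env or {}
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        pos += 1

    def parse_E():              # E -> E+T | T, written as T (+ T)*
        value = parse_T()
        while peek() == "+":
            advance()
            value += parse_T()
        return value

    def parse_T():              # T -> T*F | F, written as F (* F)*
        value = parse_F()
        while peek() == "*":
            advance()
            value *= parse_F()
        return value

    def parse_F():              # F -> S | -S
        if peek() == "-":
            advance()
            return -parse_S()
        return parse_S()

    def parse_S():              # S -> (E) | id | num
        tok = peek()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        advance()
        if tok == "(":
            value = parse_E()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            advance()
            return value
        if isinstance(tok, int):
            return tok
        if isinstance(tok, str) and tok.isidentifier():
            return env[tok]
        raise SyntaxError(f"unexpected token {tok!r}")

    result = parse_E()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return result

value = parse(["(", 34, "+", 36, ")", "*", 40])   # 2800
```

Replacing the arithmetic in each function with tree construction or with emitted Push/Plus/Times instructions would give the other two parser outputs shown in the earlier examples.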
Example: 35 + (10+30*4)
Grammar:
E → E+T
E → T
T → T*F
T → F
F → (E)
F → id
F → num
Derivation:
E ⇒ E+T
⇒ E+F
⇒ E+(E)
⇒ E+(E+T)
⇒* E + ( E + num(30) * num(4) )
⇒ E + ( T + num(30) * num(4) )
⇒ E + ( F + num(30) * num(4) )
⇒ E + ( num(10) + num(30) * num(4) )
G=(N,T,S,P)
Unrestricted Grammar
Context Sensitive Grammar
Rules are of the form α → β where:
• #.α ≤ #.β
• S → λ is allowed only if S does not appear on the right of any production rule
Context Free Grammars
Productions are of the form A → β where:
• A ∈ N, β ∈ (N ∪ T)*
Regular Grammars
Right regular
• Left side is a single non-terminal; right side contains at most one non-terminal and it is the last symbol on the right.
• Rules are of the form A → αB or A → α where A ∈ N, B ∈ N, α ∈ T*
• Example: X → aX, X → Y, X → λ, Y → bbY, Y → c
Left regular
• Left side is a single non-terminal; right side contains at most one non-terminal and it is the first symbol.
• Rules are of the form A → Bα or A → α where A ∈ N, B ∈ N, α ∈ T*
• Example: X → Xa, X → Y, X → λ, Y → Ybb, Y → c
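The context-free and regular restrictions are purely syntactic conditions on the productions, so they can be checked mechanically. The sketch below assumes single-character symbols, represents each production as a pair of strings (with `""` for λ), and uses function names chosen for this example.

```python
def is_context_free(productions, nonterminals):
    # every left side is a single non-terminal
    return all(lhs in nonterminals for lhs, _ in productions)

def is_right_regular(productions, nonterminals):
    # context-free, and a non-terminal may only be the last symbol on the right
    return is_context_free(productions, nonterminals) and all(
        not any(sym in nonterminals for sym in rhs[:-1])
        for _, rhs in productions
    )

def is_left_regular(productions, nonterminals):
    # context-free, and a non-terminal may only be the first symbol on the right
    return is_context_free(productions, nonterminals) and all(
        not any(sym in nonterminals for sym in rhs[1:])
        for _, rhs in productions
    )

N = {"X", "Y"}
right = [("X", "aX"), ("X", "Y"), ("X", ""), ("Y", "bbY"), ("Y", "c")]
left = [("X", "Xa"), ("X", "Y"), ("X", ""), ("Y", "Ybb"), ("Y", "c")]
neither = [("X", "aXY"), ("X", ""), ("Y", "Ybb"), ("Y", "c")]
checks = (is_right_regular(right, N), is_left_regular(left, N), is_context_free(neither, N))
```

The third grammar is context-free but neither left nor right regular, because X → aXY has two non-terminals on its right side.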
Context-free grammars
S → (L)     X → aX      X → aXY
S → n       X → Y       X → λ
L → λ       X → λ       Y → Ybb
L → LS      Y → Ybb     Y → c
            Y → c
Context-sensitive grammars
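The right-hand side of each production must be at least as long as its left-hand side, unless the left-hand side is the start symbol, the right-hand side is λ, and the start symbol does not appear on the right-hand side of any production.
S → (L)     X → aX      X → aXY
S → n       X → Y       X → λ
L → λ       X → λ       Y → Ybb
L → LS      Y → Ybb     Y → c
            Y → c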
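Context-sensitive grammars
Original        Context-sensitive form
S → (L)         S → ()
S → n           S → (L)
L → λ           S → n
L → LS          L → LS
                L → S

Original        Context-sensitive form
X → aX          S → λ
X → Y           S → X
X → λ           X → aX
Y → Ybb         X → a
Y → c           X → Y
                Y → Ybb
                Y → c

Original        Context-sensitive form
X → aXY         S → λ
X → λ           S → X
Y → Ybb         X → aXY
Y → c           X → aY
                Y → Ybb
                Y → c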
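The length restriction on context-sensitive productions can also be checked mechanically. The sketch below uses the restriction as stated in these slides (no production shortens a string, except start → λ when the start symbol never appears on a right side). The function name and the pair-of-strings encoding, with `""` for λ, are assumptions; the two grammars tested are the list grammar shown on this slide before and after rewriting.

```python
def is_context_sensitive(productions, start):
    """Check the length restriction on every production."""
    start_on_rhs = any(start in rhs for _, rhs in productions)
    for lhs, rhs in productions:
        if lhs == start and rhs == "":
            if start_on_rhs:          # start -> λ only if start is never on a right side
                return False
        elif len(lhs) > len(rhs):     # the right side may not be shorter
            return False
    return True

original = [("S", "(L)"), ("S", "n"), ("L", ""), ("L", "LS")]
rewritten = [("S", "()"), ("S", "(L)"), ("S", "n"), ("L", "LS"), ("L", "S")]
before, after = is_context_sensitive(original, "S"), is_context_sensitive(rewritten, "S")
```

The original fails because of L → λ; the rewritten grammar removes that production and compensates with S → () and L → S.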
Abstract Machines
Turing Machine
• Finite number of states
• Infinite Tape
• At each step, depending on the current state and the symbol being read, it overwrites that symbol, moves to the right or to the left, and switches state.
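The step rule of a Turing machine can be sketched as a short simulator. Everything concrete here is an assumption made for illustration: the transition table format `(state, symbol) -> (new_state, written_symbol, move)`, the blank symbol `_`, the `max_steps` safety limit, and the example machine, which simply flips every bit of its input.

```python
def run_tm(transitions, tape, state, accept, blank="_", max_steps=10_000):
    """Run the machine and return (final_state, tape_contents)."""
    cells = dict(enumerate(tape))     # sparse tape, unbounded in both directions
    head = 0
    for _ in range(max_steps):
        if state in accept:
            break
        symbol = cells.get(head, blank)
        if (state, symbol) not in transitions:
            break                     # no applicable rule: the machine halts
        state, write, move = transitions[(state, symbol)]
        cells[head] = write           # overwrite the symbol that was read
        head += 1 if move == "R" else -1
    lo, hi = min(cells, default=0), max(cells, default=0)
    contents = "".join(cells.get(i, blank) for i in range(lo, hi + 1))
    return state, contents.strip(blank)

# Hypothetical machine: flip 0s and 1s, halt at the first blank.
flip = {
    ("q0", "0"): ("q0", "1", "R"),
    ("q0", "1"): ("q0", "0", "R"),
    ("q0", "_"): ("halt", "_", "L"),
}
final_state, output = run_tm(flip, "0110", "q0", accept={"halt"})
```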
Abstract Machines
Pushdown automaton
• Finite number of states
• Infinite tape, but starts with a finite
string, and cannot write on the tape.
• May have a separate tape for writing,
but it cannot read what it writes.
• Has a stack, it can read the top
elements of the stack, push onto the
stack and pop from the stack.
• At each step, depending on the current
state, what is being read, and symbols
on the top of the stack, it advances on
the input tape, pops from or pushes
onto the stack and switches state. If
there is an output tape, it also writes on
the output tape.
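A pushdown automaton reads its input once, left to right, and uses the stack as its only memory. The sketch below recognizes strings of the form a^n b^n (n ≥ 0), a standard example chosen here for illustration and not taken from these slides. The state names and the stack symbol "A" are assumptions, and there is no output tape.

```python
def accepts_anbn(text):
    """Recognize a^n b^n with two control states and a stack."""
    state = "reading_a"
    stack = []
    for ch in text:                   # advance one cell on the read-only input
        if state == "reading_a" and ch == "a":
            stack.append("A")         # remember one pending 'a'
        elif ch == "b" and stack:
            stack.pop()               # match one 'b' against a pending 'a'
            state = "reading_b"       # from now on no more 'a' is allowed
        else:
            return False
    return not stack                  # accept only if every 'a' was matched

ok = accepts_anbn("aabb")
```

A finite automaton cannot recognize this language, because it would need to count an unbounded number of a symbols without a stack.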
Abstract machines
Finite automaton
• Finite number of states
• Infinite tape, but starts with a finite
string, and cannot write on the tape.
• May have a separate tape for writing,
but it cannot read what it writes.
• At each step, depending on the current state and what is being read, it advances on the input tape and switches state. If there is an output tape, it also writes on the output tape.
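A finite automaton without an output tape is just a transition table plus a set of accepting states. The sketch below runs such a table over an input string. The example automaton is an assumption: it is one reading of the language of the right-regular example grammar X → aX, X → Y, X → λ, Y → bbY, Y → c, with state names invented for this example.

```python
def dfa_accepts(transitions, start, accepting, text):
    """Run a deterministic finite automaton over text."""
    state = start
    for ch in text:                       # advance on the input tape
        if (state, ch) not in transitions:
            return False                  # no move defined: reject
        state = transitions[(state, ch)]  # switch state
    return state in accepting

# States: "X" (reading a's), "Y1" (one b of a pair seen),
# "Y" (complete pairs seen), "Done" (final c read).
transitions = {
    ("X", "a"): "X",
    ("X", "b"): "Y1",
    ("X", "c"): "Done",
    ("Y1", "b"): "Y",
    ("Y", "b"): "Y1",
    ("Y", "c"): "Done",
}
accepted = dfa_accepts(transitions, "X", {"X", "Done"}, "aabbc")
```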
Chomsky Hierarchy