
Problem Overview

Language Analysis
S.Takahashi

Problem

File.java → ??????? → File.class

What matters?
• Well written
• Works correctly

Syntax

• Correct form
• Formalism for describing correct form
• Grammars: rules that describe how to generate all correct programs

Semantics

• Meaning of syntactic structures
• Describes programs’ behavior
• Describes how programs should be translated

Compilation stages

Characters → Lexer → tokens → Parser → result

while (i<=5) x = x+i ;

Characters → Lexer → tokens:

WHILE ABRIR_P ID("i") LTE INT(5) Cerrar_P ID("x") EQ ID("x") PLUS ID("i") PYC

(ABRIR_P: open parenthesis; Cerrar_P: close parenthesis; PYC: semicolon)

while (i<5) x = x+i 55 temp *

The lexer only groups characters into tokens, so it also tokenizes input that is not a well-formed statement.

Characters → Lexer → tokens:

WHILE ABRIR_P ID("i") LT INT(5) Cerrar_P ID("x") EQ ID("x") PLUS ID("i") INT(55) ID("temp") STAR
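The two lexer examples above can be reproduced with a small regular-expression tokenizer. This is only a sketch: the token names follow the slides, but the patterns, the STAR token name and the function name tokenize are assumptions made for illustration.

```python
import re

# Token names follow the slides (ABRIR_P = open parenthesis,
# Cerrar_P = close parenthesis, PYC = semicolon).
# The regular expressions themselves are assumptions.
TOKEN_SPEC = [
    ("WHILE",    r"while\b"),
    ("ID",       r"[A-Za-z_]\w*"),
    ("INT",      r"\d+"),
    ("LTE",      r"<="),
    ("LT",       r"<"),
    ("EQ",       r"="),
    ("PLUS",     r"\+"),
    ("STAR",     r"\*"),
    ("ABRIR_P",  r"\("),
    ("Cerrar_P", r"\)"),
    ("PYC",      r";"),
    ("SKIP",     r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text):
    """Turn a string of characters into a list of token descriptions."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r} at position {pos}")
        kind, lexeme = match.lastgroup, match.group()
        pos = match.end()
        if kind == "SKIP":
            continue                      # whitespace produces no token
        if kind == "ID":
            tokens.append(f'ID("{lexeme}")')
        elif kind == "INT":
            tokens.append(f"INT({lexeme})")
        else:
            tokens.append(kind)
    return tokens


print(tokenize("while (i<=5) x = x+i ;"))
```

Note that the tokenizer accepts the second input as well: checking that the token sequence forms a valid statement is the parser's job.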

Example

Chars → Lexer → tokens → Parser → result

Chars: (34+36)*40
Tokens: ( Num(34) + Num(36) ) * Num(40)

Example

tokens → Parser → result

Tokens: ( Num(34) + Num(36) ) * Num(40)
Result: OK!!!!

Example

tokens → Parser → result

Tokens: ( Num(34) + Num(36) ) * Num(40)
Result: 2800

Example

tokens → Parser → result

Tokens: ( Num(34) + Num(36) ) * Num(40)
Result (syntax tree):

        *
       / \
      +   40
     / \
   34   36

Example

tokens → Parser → Result

Tokens: ( Num(34) + Num(36) ) * Num(40)
Result (stack code):
Push 34
Push 36
Plus
Push 40
Times
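The parser results shown in these examples (acceptance, the value 2800, and the stack code) can be sketched with a small recursive-descent parser. The grammar used here (expr, term, factor with only + and *), the token representation (integers for Num tokens, strings for the rest) and the names parse and run are assumptions for illustration, not the course's implementation.

```python
def parse(tokens):
    """Return stack code for the token list, or raise SyntaxError."""
    code = []
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        pos += 1

    def expr():                 # expr -> term ('+' term)*
        term()
        while peek() == "+":
            advance()
            term()
            code.append("Plus")

    def term():                 # term -> factor ('*' factor)*
        factor()
        while peek() == "*":
            advance()
            factor()
            code.append("Times")

    def factor():               # factor -> Num | '(' expr ')'
        tok = peek()
        if tok == "(":
            advance()
            expr()
            if peek() != ")":
                raise SyntaxError("expected )")
            advance()
        elif isinstance(tok, int):
            advance()
            code.append(f"Push {tok}")
        else:
            raise SyntaxError(f"unexpected token {tok!r}")

    expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return code


def run(code):
    """Execute stack code and return the value left on the stack."""
    stack = []
    for instr in code:
        if instr == "Plus":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif instr == "Times":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        else:                   # "Push n"
            stack.append(int(instr.split()[1]))
    return stack.pop()


tokens = ["(", 34, "+", 36, ")", "*", 40]
code = parse(tokens)            # no exception means the input is accepted
print(code)
print(run(code))
```

Returning without an exception corresponds to the "OK" result, the returned list corresponds to the stack code, and running that code gives the numeric value.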

Definition: Alphabet

A finite set of symbols

• {a, b, c, d}
• {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}
• {identifier, number, +, -, *, /}
• {while, if, ‘{’, ‘}’, >, >=, <, <=}
• {TransType, BranchId, AccountNumber, Amount}

Definition: word or sequence

Finite sequence of symbols

Recursive definition (string over A)

• The empty string (λ) is a string over any alphabet
• If β is a string over A and σ ∈ A, then σβ is a string over A

Example

A = {id, num, +, -, *, /, (, )}

id * (id + id*num - num)

id * (id + id*num id id num( )

λ

Example

A= {a,b,c}

aab

acab
λ

Operations on strings

Length: #
• #.λ = 0
• #.σβ = 1 + #.β

Concatenation: ωβ
• λβ = β
• (σω)β = σ(ωβ)
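The two recursive definitions above translate almost literally into code. In this sketch a string is represented as a tuple of symbols and λ as the empty tuple; that representation and the names LAMBDA, length and concat are assumptions for illustration.

```python
LAMBDA = ()   # the empty string λ, represented as an empty tuple (assumption)


def length(s):
    """#.λ = 0 and #.σβ = 1 + #.β"""
    if s == LAMBDA:
        return 0
    return 1 + length(s[1:])


def concat(w, b):
    """λβ = β and (σω)β = σ(ωβ)"""
    if w == LAMBDA:
        return b
    return (w[0],) + concat(w[1:], b)


w = ("id", "*", "num")
b = ("+", "id")
print(length(w))
print(concat(w, b))
```

Using tuples rather than Python strings keeps multi-character symbols such as id and num as single symbols of the alphabet.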

Definition: Language

Set of strings

All possible words of an alphabet
• All strings of length 0
• All strings of length 1
• All strings of length 2
• All strings of length 3
• ….
• All strings of length n
• ….

Generalizing
• All strings of length zero: A^0 = {λ}
• All strings of length k+1: A^(k+1) = {σω : σ ∈ A, ω ∈ A^k}
• All strings of length greater than or equal to zero: A* = A^0 ∪ A^1 ∪ A^2 ∪ …
• All strings of length greater than zero: A+ = A^1 ∪ A^2 ∪ A^3 ∪ …
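The recursive construction of the sets of strings of each length can be sketched as follows. Strings are represented as tuples of symbols, and the function names power and star_up_to are assumptions for illustration; since the set of all strings is infinite, only the strings up to a given length are built.

```python
def power(alphabet, k):
    """All strings of length k over `alphabet`, as tuples of symbols."""
    if k == 0:
        return {()}                       # only the empty string λ
    shorter = power(alphabet, k - 1)      # all strings of length k-1
    return {(sigma,) + omega for sigma in alphabet for omega in shorter}


def star_up_to(alphabet, n):
    """All strings over `alphabet` of length at most n."""
    result = set()
    for k in range(n + 1):
        result |= power(alphabet, k)
    return result


A = {"a", "b", "c"}
print(len(power(A, 2)))        # 3 * 3 strings of length 2
print(len(star_up_to(A, 2)))   # 1 + 3 + 9 strings of length 0, 1 or 2
```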

Definition: Language over A

L is a language over A if L is a subset of A*.

How to define a language’s syntax
• We need a formalism

GRAMMARS

Grammars: formalism for defining syntax

Defines characteristics of strings in the language

Can be used to determine if a program is syntactically correct

May also be used in reverse engineering

Definition: Grammar

A grammar is a quadruple (N,T,S,P) where:

N: finite set of non-terminal symbols,
T: finite set of terminal symbols,
S: a non-terminal symbol (S ∈ N) called the start symbol,
P: set of productions.

Definition: Production

A rule of the form

α → β

where:

• α ∈ (N∪T)*
• β ∈ (N∪T)*

Definition: One step derivation

Given a grammar with a production ω → γ:
if we have a string αωδ, we can obtain αγδ by replacing ω with γ.
We say that αγδ is derived in one step from αωδ.
We write: αωδ ⇒ αγδ.

Derivable
Given two strings γ₀ and γₙ, we say that γₙ can be derived (in zero or more steps) from γ₀, written γ₀ ⇒* γₙ, if starting with γ₀ we can apply derivations to get to γₙ.
Formally: γ₀ ⇒* γₙ
• if γ₀ = γₙ
• if there is a string δ such that γ₀ ⇒ δ and δ ⇒* γₙ

Generated Language

Given a grammar G = (N,T,S,P), the language generated by G is the set of all strings of terminal symbols that can be derived from the start symbol.

L(G) = {ω : ω ∈ T*, S ⇒* ω}
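One way to make the definitions of derivation and generated language concrete is to enumerate derivations by brute force. The sketch below assumes one character per symbol, productions given as (left, right) pairs of strings, and productions that never shorten a string, so that anything longer than max_len can be discarded. The grammar S → aSb, S → ab and the names one_step and generated are hypothetical examples.

```python
from collections import deque


def one_step(form, productions):
    """Every string derivable from `form` in exactly one step."""
    results = set()
    for lhs, rhs in productions:
        start = form.find(lhs)
        while start != -1:
            # replace this occurrence of the left side with the right side
            results.add(form[:start] + rhs + form[start + len(lhs):])
            start = form.find(lhs, start + 1)
    return results


def generated(nonterminals, start_symbol, productions, max_len):
    """Strings of terminals derivable from the start symbol, up to max_len.

    Assumes no production shortens a string, so longer forms can be pruned.
    """
    seen = {start_symbol}
    queue = deque([start_symbol])
    language = set()
    while queue:
        form = queue.popleft()
        if all(symbol not in nonterminals for symbol in form):
            language.add(form)
        for nxt in one_step(form, productions):
            if len(nxt) <= max_len and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return language


# Hypothetical grammar: S -> aSb | ab
P = [("S", "aSb"), ("S", "ab")]
print(sorted(generated({"S"}, "S", P, 6)))
```

The breadth-first search visits each sentential form once, so it terminates because only finitely many strings of bounded length exist.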

Example: Arithmetic expressions
– A number is an arithmetic expression
– A variable is an arithmetic expression
– If A and B are arithmetic expressions, the following are also arithmetic expressions:
  • A+B
  • A*B
  • A-B
  • A/B
  • A%B
  • (A)
  • -A

The Grammar:
• G = ({E}, {num, id, +, -, *, /, %, (, )}, E, P)
• P:
– E → id
– E → num
– E → E+E
– E → E*E
– E → E-E
– E → E/E
– E → E%E
– E → (E)
– E → -E

The Grammar:
• G = ({E,T,F,S}, {num, id, +, *, (, ), -}, E, P)
• P:
– E → E+T
– E → T
– T → T*F
– T → F
– F → S
– F → -S
– S → (E)
– S → id
– S → num
Example: 35 + (10+30*4)

Grammar:
– E → E+T
– E → T
– T → T*F
– T → F
– F → (E)
– F → id
– F → num

Derivation:
E ⇒ E+T
⇒ E+F
⇒ E+( E )
⇒ E+( E+T )
⇒ E+( E+T*F )
⇒ E + ( E + T * num(4) )
⇒ E + ( E + F * num(4) )
⇒ E + ( E + num(30) * num(4) )
⇒ E + ( T + num(30) * num(4) )
⇒ E + ( F + num(30) * num(4) )
⇒ E + ( num(10) + num(30) * num(4) )
⇒ T + ( num(10) + num(30) * num(4) )
⇒ F + ( num(10) + num(30) * num(4) )
⇒ num(35) + ( num(10) + num(30) * num(4) )
Types of grammars: G=(N,T,S,P)

• Unrestricted Grammar
• Context Sensitive Grammar: rules are of the form β → γ where:
  – #.γ ≥ #.β
  – S → λ is allowed, only if S does not appear on the right of any production rule
• Context Free Grammars: productions are of the form A → β where:
  – A ∈ N, β ∈ (N∪T)*
• Regular Grammars: must be context free and:
  – Right regular: productions are of the form A → αB or A → α where A ∈ N, B ∈ N, α ∈ T*
  – Left regular: productions are of the form A → Bα or A → α where A ∈ N, B ∈ N, α ∈ T*

Regular Grammars
• Right regular
  – Left side is a single non-terminal; right side contains at most one non-terminal and it is the last symbol on the right.
  – Rules are of the form A → αB or A → α, where A ∈ N, B ∈ N, α ∈ T*
  – Example: X → aX, X → Y, X → λ, Y → bbY, Y → c

• Left regular
  – Left side is a single non-terminal; right side contains at most one non-terminal and it is the first symbol.
  – Rules are of the form A → Bα or A → α, where A ∈ N, B ∈ N, α ∈ T*
  – Example: X → Xa, X → Y, X → λ, Y → Ybb, Y → c

Context-free grammars

Left-hand side is a single non-terminal

Example 1: S → (L), S → n, L → λ, L → LS
Example 2: X → aX, X → Y, X → λ, Y → Ybb, Y → c
Example 3: X → aXY, X → λ, Y → Ybb, Y → c

Context-sensitive grammars

The right-hand side has to be at least as long as the left-hand side, unless the production is S → λ, where S is the start symbol and S does not appear on the right-hand side of any production.

Grammar 1 (S is the start symbol): S → (L), S → n, L → λ, L → LS
Grammar 2 (X is the start symbol): X → aX, X → Y, X → λ, Y → Ybb, Y → c
Grammar 3 (X is the start symbol): X → aXY, X → λ, Y → Ybb, Y → c

Each of these grammars has a λ-production that breaks the restriction: L → λ in the first, and X → λ in the other two, where the start symbol X also appears on a right-hand side.

Context-sensitive grammars

Equivalent grammars that satisfy the length restriction (a λ-production is kept only for a start symbol that does not appear on any right-hand side):

Grammar 1
Original: S → (L), S → n, L → λ, L → LS
Rewritten: S → (), S → (L), S → n, L → LS, L → S

Grammar 2
Original: X → aX, X → Y, X → λ, Y → Ybb, Y → c
Rewritten: S → λ, S → X, X → aX, X → a, X → Y, Y → Ybb, Y → c

Grammar 3
Original: X → aXY, X → λ, Y → Ybb, Y → c
Rewritten: S → λ, S → X, X → aXY, X → aY, Y → Ybb, Y → c
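The restrictions on production forms described in the preceding slides can be checked mechanically. The following sketch assumes one character per symbol and productions given as (left, right) pairs of strings, with "" standing for λ; the function name classify and the last two test grammars are hypothetical. It reports the most restrictive type whose rules the grammar satisfies, and reports "right regular" when a grammar is both right and left regular.

```python
def classify(nonterminals, start, productions):
    """Most restrictive grammar type satisfied by the productions."""
    def terminals_only(s):
        return all(symbol not in nonterminals for symbol in s)

    # Context-free: every left side is a single non-terminal.
    if all(lhs in nonterminals for lhs, _ in productions):
        # Right regular: only the last right-side symbol may be a non-terminal.
        if all(terminals_only(rhs[:-1]) for _, rhs in productions):
            return "right regular"
        # Left regular: only the first right-side symbol may be a non-terminal.
        if all(terminals_only(rhs[1:]) for _, rhs in productions):
            return "left regular"
        return "context-free"

    # Context-sensitive: right side at least as long as left side, except
    # start -> λ when the start symbol is on no right side.
    start_on_right = any(start in rhs for _, rhs in productions)

    def allowed(lhs, rhs):
        if len(rhs) >= len(lhs):
            return True
        return lhs == start and rhs == "" and not start_on_right

    if all(allowed(lhs, rhs) for lhs, rhs in productions):
        return "context-sensitive"
    return "unrestricted"


right = [("X", "aX"), ("X", "Y"), ("X", ""), ("Y", "bbY"), ("Y", "c")]
left = [("X", "Xa"), ("X", "Y"), ("X", ""), ("Y", "Ybb"), ("Y", "c")]
lists = [("S", "(L)"), ("S", "n"), ("L", ""), ("L", "LS")]
print(classify({"X", "Y"}, "X", right))
print(classify({"X", "Y"}, "X", left))
print(classify({"S", "L"}, "S", lists))
```

Because the check is purely syntactic, it classifies the grammar as written, not the language: a language may have a more restrictive grammar than the one supplied.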

Abstract Machines

Turing Machine
• Finite number of states
• Infinite tape
• At each step, depending on the current state and what is being read, it overwrites the symbol that was read, moves to the right or to the left and switches state.

Abstract Machines

Linearly bounded automaton

• Finite number of states
• Bounded tape: the machine can only use a portion of the tape whose length is a linear function of the input length.
• At each step, depending on the current state and what is being read, it overwrites the symbol that was read, moves to the right or to the left and switches state.

Abstract Machines
Pushdown automaton
• Finite number of states
• Infinite tape, but starts with a finite string, and cannot write on the tape.
• May have a separate tape for writing, but it cannot read what it writes.
• Has a stack: it can read the top elements of the stack, push onto the stack and pop from the stack.
• At each step, depending on the current state, what is being read, and symbols on the top of the stack, it advances on the input tape, pops from or pushes onto the stack and switches state. If there is an output tape, it also writes on the output tape.
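The step rule of a pushdown automaton can be sketched with a transition table keyed by state, input symbol and top of stack. The machine below is a hypothetical example that accepts strings of n a's followed by n b's (n ≥ 1); the state names, the bottom marker Z and the acceptance condition (state q1 with only Z left on the stack) are assumptions for illustration, and there is no output tape.

```python
# (state, input symbol, top of stack) -> (next state, symbols pushed after popping the top)
DELTA = {
    ("q0", "a", "Z"): ("q0", ["Z", "A"]),   # first a: keep the marker, push A
    ("q0", "a", "A"): ("q0", ["A", "A"]),   # further a's: push one more A
    ("q0", "b", "A"): ("q1", []),           # first b: pop an A, switch state
    ("q1", "b", "A"): ("q1", []),           # further b's: pop an A
}


def accepts(text):
    """Run the automaton over `text`, reading it left to right exactly once."""
    state, stack = "q0", ["Z"]
    for symbol in text:
        if not stack:
            return False
        key = (state, symbol, stack[-1])
        if key not in DELTA:
            return False                    # no transition: reject
        state, pushed = DELTA[key]
        stack.pop()
        stack.extend(pushed)
    return state == "q1" and stack == ["Z"]


print(accepts("aabb"), accepts("aab"), accepts("abab"))
```

The stack is what lets the machine match the number of b's against the number of a's, which a finite number of states alone cannot do for unbounded n.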

Abstract machines
Finite automaton
• Finite number of states
• Infinite tape, but starts with a finite string, and cannot write on the tape.
• May have a separate tape for writing, but it cannot read what it writes.
• At each step, depending on the current state and what is being read, it advances on the input tape and switches state. If there is an output tape, it also writes on the output tape.
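A finite automaton without an output tape reduces to a transition table over (state, symbol) pairs. The sketch below is a hypothetical deterministic automaton for the language of the right regular grammar X → aX, X → Y, X → λ, Y → bbY, Y → c, that is, any number of a's, optionally followed by an even number of b's and a single c. The state names and the function name accepts are assumptions for illustration.

```python
START = "X"
ACCEPTING = {"X", "DONE"}       # X accepts because of X -> λ

# (state, input symbol) -> next state
DELTA = {
    ("X", "a"): "X",            # X -> aX
    ("X", "b"): "B",            # first b of a pair (via X -> Y, Y -> bbY)
    ("X", "c"): "DONE",         # via X -> Y, Y -> c
    ("B", "b"): "Y",            # second b of a pair
    ("Y", "b"): "B",            # Y -> bbY
    ("Y", "c"): "DONE",         # Y -> c
}


def accepts(text):
    """Advance over the input one symbol at a time, switching state."""
    state = START
    for symbol in text:
        state = DELTA.get((state, symbol))
        if state is None:
            return False        # no transition: reject
    return state in ACCEPTING


print(accepts("aabbc"), accepts("aa"), accepts("abb"))
```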

Chomsky Hierarchy

Language                  Generated by                 Recognized by
Regular                   Regular grammar              Finite state automaton
Context-free              Context-free grammar         Pushdown automaton
Context-sensitive         Context-sensitive grammar    Linearly bounded automaton
Recursively enumerable    Unrestricted grammar         Turing Machine

Compilation process phases

characters → Lexer → tokens → Parser → Result

Lexer: regular language
Parser: context-free grammar
