
BNF and EBNF in Automata Theory

The document discusses BNF (Backus-Naur Form) and EBNF (Extended Backus-Naur Form), which are notational ways to represent grammars. BNF provides a standard format using angle brackets and vertical bars. EBNF extends BNF with additional symbols like Kleene star and cross to make grammars more concise. Several examples are given to illustrate BNF and EBNF representations of grammars for numbers, expressions, and identifiers. Regular expressions are also introduced as a compact way to represent languages using operations like concatenation and repetition.

Uploaded by

ALi FarHan
Copyright
© Attribution Non-Commercial (BY-NC)

Theory of Automata Lecture Notes: 2. BNF & EBNF

Text Book: Introduction to Computer Theory by Daniel I. A. Cohen

2. Backus Naur Form (BNF) & Extended Backus Naur Form (EBNF)
BNF is a notation for representing grammars. Before BNF, programming languages were specified ambiguously, i.e., in plain English or with production rules written in differing styles, as shown in the first three grammar examples in the lecture notes of 1. Introduction & Theory of Formal Languages. The notation was introduced by John Backus (who led the development of FORTRAN, FORmula TRANslator, one of the first high-level languages, begun in 1954) and refined by the Danish computer scientist Peter Naur for the ALGOL 60 report. BNF provides a standard format for grammar representation. It uses angle brackets < > for non-terminals; terminals are written without angle brackets. The vertical bar symbol | keeps its usual meaning of "or" (Cohen pp-241). For example, the grammar for a number may be given as:

Example-1:

    <number> ::= <digit> | <number> <digit>
    <digit>  ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

where ::= means "is defined as" and | means "or".

< > marks a non-terminal (number and digit are non-terminals). A non-terminal is a term that has a further definition, so it appears on the left-hand side of some production. Symbols without angle brackets are terminals (0 to 9 are terminals). Terminals don't have further definitions, so they never appear on the left-hand side of any production. Before BNF, non-terminals were represented by capital letters and terminals by lower-case letters.

Example-2: Some common BNF examples are:

    <while loop>           ::= while ( <condition> ) <statement>
    <assignment statement> ::= <variable> = <expression>
    <statement list>       ::= <statement> | <statement list> <statement>
    <unsigned integer>     ::= <digit> | <unsigned integer> <digit>

In the above BNF examples, the terminals are the keyword while, the parentheses ( and ), and the = sign; everything written in angle brackets is a non-terminal.
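The recursive rule of Example-1 can be read directly as a membership test. The following is a minimal sketch, not part of the original notes; the function names `is_digit` and `is_number` are chosen here for illustration only.

```python
DIGITS = set("0123456789")

def is_digit(s: str) -> bool:
    # <digit> ::= 0 | 1 | 2 | ... | 9
    return len(s) == 1 and s in DIGITS

def is_number(s: str) -> bool:
    # <number> ::= <digit> | <number> <digit>
    if is_digit(s):
        return True
    # Second alternative: everything but the last symbol must be a <number>,
    # and the last symbol must be a <digit>.
    return len(s) > 1 and is_number(s[:-1]) and is_digit(s[-1])

print(is_number("2024"))  # a string of digits is derivable from <number>
print(is_number("12a"))   # 'a' is not a <digit>, so the derivation fails
```

Each alternative of the production becomes one branch of the function, which is why the terminals (the digits) are the only place where actual characters are compared.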

Inst: Dr. Mohammed Yousuf Khan


BNF for Mathematical Expressions

Example-3:

    <expression> ::= <expression> + <term> | <expression> - <term> | <term>
    <term>       ::= <term> * <factor> | <term> / <factor> | <factor>
    <factor>     ::= <primary> ^ <factor> | <primary>
    <primary>    ::= <primary> | <element>
    <element>    ::= ( <expression> ) | <variable> | <number>
    <variable>   ::= a | b | c | ... | z
    <number>     ::= 0 | 1 | 2 | ... | 9
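The layering of the expression grammar (expression, term, factor) is what gives * and / higher precedence than + and -, and the shape of each rule decides associativity. The sketch below is one possible recursive-descent reading of Example-3, under stated assumptions: the left-recursive rules are rewritten as loops (a standard transformation, not something the notes prescribe), <primary> is treated as simply an <element>, and the function name `parse` and the tuple-based tree format are hypothetical choices made for this illustration.

```python
LETTERS = set("abcdefghijklmnopqrstuvwxyz")   # <variable>
DIGITS = set("0123456789")                    # <number> (a single digit)

def parse(text: str):
    """Return a nested tuple (operator, left, right) for the input expression."""
    tokens = [ch for ch in text if not ch.isspace()]
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        tok = peek()
        pos += 1
        return tok

    def expression():
        # <expression> ::= <expression> + <term> | <expression> - <term> | <term>
        node = term()
        while peek() in ("+", "-"):          # loop replaces the left recursion
            op = take()
            node = (op, node, term())        # groups to the left
        return node

    def term():
        # <term> ::= <term> * <factor> | <term> / <factor> | <factor>
        node = factor()
        while peek() in ("*", "/"):
            op = take()
            node = (op, node, factor())
        return node

    def factor():
        # <factor> ::= <primary> ^ <factor> | <primary>   (right-recursive)
        node = element()
        if peek() == "^":
            take()
            node = ("^", node, factor())     # groups to the right
        return node

    def element():
        # <element> ::= ( <expression> ) | <variable> | <number>
        tok = take()
        if tok == "(":
            node = expression()
            if take() != ")":
                raise SyntaxError("expected ')'")
            return node
        if tok in LETTERS or tok in DIGITS:
            return tok
        raise SyntaxError(f"unexpected symbol {tok!r}")

    tree = expression()
    if peek() is not None:
        raise SyntaxError(f"unexpected symbol {peek()!r}")
    return tree

print(parse("a+b*c"))   # * binds tighter than +
print(parse("a-b-c"))   # - groups to the left
print(parse("2^3^2"))   # ^ groups to the right
```

The left-recursive rules yield left-to-right grouping for +, -, * and /, while the right-recursive <factor> rule makes ^ group from the right.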

What is EBNF? EBNF stands for Extended Backus-Naur Form. After the appearance of BNF, many people added their own extensions to make grammars shorter to write. The repetition operators used in these extensions come from the mathematician S. C. Kleene and are known as the Kleene star * and the Kleene cross +. EBNF is an extension of BNF that makes expressing grammars more convenient, so an EBNF grammar is usually more concise than the equivalent BNF. EBNF is no more powerful than BNF; that is, anything that can be expressed in EBNF can also be expressed in BNF. EBNF is widely used to define programming languages. The symbols used in EBNF are:

    *    (the Kleene star): closure, means 0 or more occurrences
    +    (the Kleene cross): positive closure, means 1 or more occurrences
    ?    means 0 or 1 occurrence
    ( )  parentheses, used for grouping

The notations used in BNF and in EBNF are also known as the meta symbols of BNF. Some examples of BNF simplification using EBNF are listed below.

Example-4: The grammar for a signed or unsigned number is given as:

BNF:

    <expr>   ::= - <num> | <num>
    <num>    ::= <digits> | <digits> . <digits>
    <digits> ::= <digit> | <digit> <digits>
    <digit>  ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

EBNF:

    <expr>  ::= - ? <digit>+ (. <digit>+)?
    <digit> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

EBNF makes the grammar much more concise: an optional -, one or more digits, then an optional decimal point followed by one or more digits. This grammar generates numbers like 1, 1.4, 33, -6.735, etc.
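The EBNF operators ?, + and grouping have direct counterparts in the pattern syntax of Python's standard `re` module, so the EBNF rule of Example-4 can be tried out almost symbol for symbol. This is an illustrative sketch; the names `NUMBER` and `is_expr` are not from the notes.

```python
import re

# <expr> ::= - ? <digit>+ (. <digit>+)?
# The decimal point is escaped because an unescaped '.' means "any character" in re.
NUMBER = re.compile(r"-?[0-9]+(\.[0-9]+)?")

def is_expr(s: str) -> bool:
    # fullmatch requires the whole string to be derivable, not just a prefix
    return NUMBER.fullmatch(s) is not None

for s in ["1", "1.4", "33", "-6.735", "1.", ".5"]:
    print(s, is_expr(s))
```

The last two inputs are rejected because the grammar demands at least one digit on both sides of the decimal point.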


Example-5: Some more examples are:

    BNF:  <expr> ::= <digits>
          <digits> ::= <digit> | <digit> <digits>
    EBNF: <expr> ::= <digit>+

    BNF:  <expr> ::= <digits> | empty
    EBNF: <expr> ::= <digit>*

    BNF:  <id> ::= <letter> | <id><letter> | <id><digit>
    EBNF: <id> ::= <letter> (<letter> | <digit>)*

Some objective examples: Write Λ for the null string (Cohen's notation). If a language L does not contain the word Λ and we want to add Λ to L, then the union of sets, denoted by + (or ∪), forms a new language L + {Λ}. This new language L + {Λ} is not the same as the language L. A language with no words is represented by φ, so the statement "Λ is a word in the language φ" is False, since the language φ has no words at all. This means that L + φ is the same as L, because no new words have been added. The two examples may be summarized as follows. If the language L does not contain Λ, and the language with no words is represented by φ, then:

    (i)  L + {Λ} is the same as the language L. (T/F) False
    (ii) L + φ is the same as the language L. (T/F) True
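The difference between adding the null string and adding the empty language can be checked with ordinary Python sets, where the empty string `""` plays the role of the null string. This is a small sketch; the sample language `L` below is a hypothetical choice, and any language without the null string would do.

```python
NULL_STRING = ""        # the null string (a word of length 0)
EMPTY_LANGUAGE = set()  # the language with no words at all

L = {"x", "xx", "xxx"}  # hypothetical sample language that lacks the null string

with_null = L | {NULL_STRING}      # L + {null string}
with_empty = L | EMPTY_LANGUAGE    # L + (empty language)

print(with_null == L)    # False: one new word (the null string) was added
print(with_empty == L)   # True: no new words were added
```

The set `{""}` has one element while `set()` has none, which is exactly the distinction between the two statements (i) and (ii).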

Regular expression or Regular language: A regular expression (r.e.) is a compact way of representing a language. An r.e. describes a finite or infinite set of strings. Such sets of strings are built by applying three types of operations on sets: concatenation, repetition and alternation (choice). The symbols that appear in an r.e. are the letters of the alphabet Σ, the null string Λ, parentheses ( ), the star * and the plus sign +.

Defining languages with concatenation, alternation and repetition:

Concatenation: Concatenation means forming a string by writing the first string followed by the second. Concatenating two words means writing them down side by side, with no space between them, to produce a new word.

    If u = xx and v = xxx then uv = xxxxx
    If u = abb and v = aa then uv = abbaa
    length(uv) = length(u) + length(v)

w^i denotes the concatenation of i copies of the string w. If w = 01 then:

    w^0 = Λ (the null string)
    w^1 = 01
    w^2 = 0101
    w^3 = 010101

Using the language-defining rules: A language definition should either let us test a string of alphabet letters to see whether it is a valid word, or tell us how to construct all the words in the language by some clear procedure.

Example-6: Let Σ = {x} be an alphabet having only one letter, x. We can define a language L1 in which any non-empty string of alphabet characters is a word:

    L1 = {x, xx, xxx, xxxx, ...}, i.e. L1 = {x^n for n = 1, 2, 3, ...}

As per our definition, L1 does not include the null string. Consider a language L2 that does not include the null string either:

    L2 = {x, xxx, xxxxx, xxxxxxx, ...} = {x^odd} = {x^(2n+1) for n = 0, 1, 2, 3, ...}

If a = xxx and b = xxxxx then both words are in L2, but their concatenation ab = xxxxxxxx is not in L2. Remember that x^n concatenated with x^m is x^(n+m). Neither L1 nor L2 includes the null string (the string of no letters). Now consider L3, which includes Λ:

    L3 = {Λ, x, xx, xxx, xxxx, ...} = {x^n for n = 0, 1, 2, ...}

Here x^n means the string of n x's, so x^0 means zero x's, i.e. the string of no letters. (It is not like basic algebra, where x^0 = 1; here x^0 = Λ.)

Note: If a = xxx then length(a) = 3. The null string has length(Λ) = 0.

Repetition: Repetition (zero or more) is the closure over *. For a given alphabet Σ, Σ* defines a language in which any string of letters from Σ is a word, including the null string. Examples (Cohen pp-14):

    If Σ = {x} then Σ* = {Λ, x, xx, xxx, ...}

    If Σ = {0, 1} then Σ* = {Λ, 0, 1, 00, 01, 10, 11, 000, 001, ...}
    If Σ = {a, b, c} then Σ* = {Λ, a, b, c, aa, ab, ac, ba, bb, bc, ca, cb, cc, aaa, ...}

If S is a set of words, then S* means the set of all finite strings formed by concatenating words from S, where any word may be used as often as we like, and where the null string Λ is also included.

Example-7 (Cohen pp-15): S = {aa, b}

    S* = {Λ plus any word composed of factors of aa and b} (rule)
       = {Λ plus all strings of a's and b's in which the a's occur in even clumps} (rule)
       = {Λ, b, aa, bb, aab, baa, bbb, aaaa, aabb, baab, bbaa, bbbb, aaaab, aabaa, aabbb, baaaa, baabb, bbaab, bbbaa, bbbbb, ...}

Here the string aabaaab is not in S*, since it has a clump of a's of length 3.

Example-8 (Cohen pp-15): S = {a, ab}

    S* = {Λ plus any word composed of factors of a and ab} (rule)
       = {Λ plus all strings of a's and b's except those that start with b and those that contain a double b} (rule)
       = {Λ, a, aa, ab, aaa, aab, aba, aaaa, aaab, aaba, abaa, abab, aaaaa, aaaab, aaaba, aabaa, aabab, abaaa, abaab, ababa, ...}

For each word in S*, every b must have an a immediately to its left, so the substring bb is impossible (another rule). Now check whether the string abaab is in S*. It can be written as (ab)(a)(ab), and all three factors are in the set S. The factorization in this example is unique, but in some cases it is not. For example:

    S = {xx, xxx}
    S* = {Λ and all strings of more than one x} = {x^n for n = 0, 2, 3, 4, 5, ...} = {Λ, xx, xxx, xxxx, xxxxx, xxxxxx, ...}

The word x is not in the language S*, but xxxxxxx is in this closure. Three different factorizations are possible here: (xx)(xx)(xxx), (xx)(xxx)(xx) or (xxx)(xx)(xx).

If the alphabet has no letters, then its closure (*) is the language with the null string as its only word, because Λ is always a word in a closure. If Σ = φ (the empty set) then Σ* = {Λ}, and if S = {Λ} then S* = {Λ} also (Cohen pp-16).

For any word w in any language, if length(w) = 0 then w = Λ. (T/F) True
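Whether a word belongs to the closure of a finite set S, and in how many ways it factors, can be decided mechanically. The sketch below uses a simple dynamic-programming count over prefixes; this technique and the names `count_factorizations` and `in_closure` are illustrative choices, not taken from Cohen's text.

```python
def count_factorizations(word: str, S: set) -> int:
    """Number of ways to write `word` as a concatenation of words from S."""
    factors = [f for f in S if f]      # the null string adds no new words, so skip it
    ways = [0] * (len(word) + 1)       # ways[i] counts factorizations of word[:i]
    ways[0] = 1                        # the null string: concatenation of zero factors
    for i in range(1, len(word) + 1):
        for f in factors:
            # f can be the last factor of word[:i] if word[:i] ends with f
            if i >= len(f) and word[i - len(f):i] == f:
                ways[i] += ways[i - len(f)]
    return ways[len(word)]

def in_closure(word: str, S: set) -> bool:
    return count_factorizations(word, S) > 0

print(in_closure("aabaaab", {"aa", "b"}))               # clump of three a's
print(count_factorizations("abaab", {"a", "ab"}))       # (ab)(a)(ab) only
print(count_factorizations("xxxxxxx", {"xx", "xxx"}))   # several factorizations
```

A count of zero means the word is not in the closure; a count above one means the factorization is not unique, as with the word of seven x's.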


Let the alphabet be Σ = {a, b}; then:

    The r.e. a | b denotes the set {a, b}.
    The r.e. (a | b)(a | b) denotes {aa, ab, ba, bb}, that is, the set of all strings of a's and b's of length 2. It can also be written as aa | ab | ba | bb.
    The r.e. a* denotes the set of all strings of zero or more a's, that is {Λ, a, aa, aaa, ...}, where Λ is the null string.
    The r.e. (a | b)* denotes the set of all strings containing zero or more instances of a or b, that is {Λ, a, b, aa, ab, ba, bb, aaa, ...}.
    The r.e. a+ denotes the set of all strings of one or more a's.

Example-10: Some examples on the closure * (Λ denotes the null string):

    ab*      All words of the form one a followed by some number of b's (maybe no b's at all): {a, ab, abb, abbb, ...}
    (ab)*    Λ or ab or abab or ababab ...
    ab*a     {aa, aba, abba, abbba, ...}
    a*b*     {Λ, a, b, aa, ab, bb, aaa, aab, ...}

If Σ = {x}:

    L = {x, xx, xxx, xxxx, ...} means L = {x^n | n = 1, 2, 3, ...}
    Σ* = {Λ, x, xx, xxx, ...} means Σ* = {x^n | n = 0, 1, 2, 3, ...}

Alternation: The notation | denotes "either ... or" (alternation). It is also written as + for choice.

Example-11:

    M | N is interpreted as M or N (also written as M + N).
    ab | cde is interpreted as (ab) | (cde), i.e. ab or cde.
    a | bc* is interpreted as (a) | (b(c)*).
    (a | b)(a | b) denotes {aa, ab, ba, bb}, or aa | ab | ba | bb, i.e. the set of all strings of a's and b's of length 2.
    (a | b)* denotes the set of all strings containing zero or more instances of a or b.
    ab*(c | de)* denotes a single a, followed by zero or more b's, followed by zero or more of either c or de, i.e. a, ab, ac, abb, abc, acc, ade, abbb, abbc, abcc, abde, accc, ...

Example-12: Some more solved examples:

    The language whose only words are the single letters a and b is given as L = language(a + b).
    The language of all words that have at least two a's can be written as b*ab*a(a+b)*.
    The set of all possible strings of letters from the alphabet Σ = {a, b}, including the null string, is represented as (a+b)*.
    The language of all strings of a's and b's in which all the a's (if any) come before all the b's is language(a*b*) = {Λ, a, b, aa, ab, bb, aaa, aab, abb, bbb, ...}.
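The word lists in Examples 10 to 12 can be reproduced by brute force: generate every string over the alphabet up to some length and keep those that the regular expression matches in full. The sketch below uses Python's `re` module, which writes alternation as | (it has no + for union, so Cohen's a+b becomes a|b). The helper name `language` and the length bound are assumptions made for this illustration.

```python
import re
from itertools import product

def language(pattern: str, alphabet: str, max_len: int) -> list:
    """All words over `alphabet` of length <= max_len matched entirely by `pattern`."""
    regex = re.compile(pattern)
    words = []
    for n in range(max_len + 1):                  # n = 0 yields the null string ""
        for letters in product(alphabet, repeat=n):
            w = "".join(letters)
            if regex.fullmatch(w):
                words.append(w)
    return words

print(language("a*b*", "ab", 2))      # a's before b's
print(language("(ab)*", "ab", 4))     # repetitions of ab
print(language("a|bc*", "abc", 3))    # read as (a)|(b(c)*)

# b*ab*a(a+b)* should describe exactly the words with at least two a's
two_as = language("b*ab*a(a|b)*", "ab", 4)
by_count = [w for w in language("(a|b)*", "ab", 4) if w.count("a") >= 2]
print(two_as == by_count)
```

Only finitely many words are listed, so the check confirms the claims up to the chosen length rather than proving them for all words.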


a*b* ≠ (ab)*: the right-hand language contains the word abab, but the left-hand language does not.

    a*b*  = {Λ, a, b, aa, ab, bb, aaa, aab, abb, bbb, ...}
    (ab)* = {Λ, ab, abab, ababab, ...}

If L = φ (the empty language), then L* = {Λ}, where Λ is the null string. The Kleene closure L* of a language L always produces an infinite language unless L is empty or L = {Λ}.

Kleene closure S* of a language S:

    Rule 1: Λ is in S*. All words in S are in S*.
    Rule 2: If x and y are in S*, their concatenation xy is also in S*.

If Λ is a member of L, then L* = L+. Otherwise L+ = L* - {Λ}.

Example-13: If Σ = {x} then Σ+ = {x, xx, xxx, ...}, i.e. all concatenations of one or more letters, which is the language L1. Some more examples:

    S = {xx, xxx}
    S+ = {xx, xxx, xxxx, ...} does not contain Λ.
    S* = {Λ, xx, xxx, xxxx, ...} contains Λ.

If S contains the word Λ initially, then Λ is in S+.

Example-14 (Cohen pp-17): S = {w1, w2, w3}

    S+ = {w1, w2, w3, w1w1, w1w2, w1w3, w2w1, w2w2, w2w3, ...}

If w1 = aa, w2 = bbb and w3 = Λ (the null string), then, noting the rule xΛ = Λx = x:

    S+ = {aa, bbb, Λ, aaaa, aabbb, ...}

If S is a set of strings not including Λ, then S* will include Λ but S+ will not. If S is a language including Λ, then S+ and S* both contain Λ, and so S+ = S*. The notations * and + are related by L+ ⊆ L* and, by definition, L* = {Λ} ∪ L+.

Prove that L+ ≠ L* - {Λ} when Λ is in L (since Λ is in L+ if Λ is in L).

Proof: Let L = {Λ, x}. Then

    L+ = {Λ, x, xx, xxx, ...}
    L* = {Λ, x, xx, xxx, ...}

L includes Λ initially, so L+ and L* both include Λ, i.e. L+ = L*. Subtracting Λ from L* gives

    L* - {Λ} = {Λ, x, xx, xxx, ...} - {Λ} = {x, xx, xxx, xxxx, ...}

which is not equal to L+. So L+ ≠ L* - {Λ} (here L+ = L*); proved.

Try: L+ = L* - {Λ} is True iff Λ is not in L.
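The relationship between S+ and S* can be explored on finite samples by generating every word up to a length bound. In the sketch below the empty Python string stands for the null string; the functions `concat`, `plus` and `star` and the bound `max_len` are illustrative assumptions, and only words up to that length are compared.

```python
def concat(A: set, B: set, max_len: int) -> set:
    """Pairwise concatenations ab with a in A and b in B, truncated to max_len."""
    return {a + b for a in A for b in B if len(a + b) <= max_len}

def plus(L: set, max_len: int) -> set:
    """Words of L+ (one or more factors from L) of length <= max_len."""
    result = {w for w in L if len(w) <= max_len}
    while True:
        bigger = result | concat(result, L, max_len)
        if bigger == result:          # nothing new can be formed: done
            return result
        result = bigger

def star(L: set, max_len: int) -> set:
    """Words of L* of length <= max_len: L+ together with the null string."""
    return plus(L, max_len) | {""}

without_null = {"x"}
with_null = {"", "x"}

print(plus(without_null, 3) == star(without_null, 3) - {""})   # null string not in L
print(plus(with_null, 3) == star(with_null, 3))                # null string in L
```

Stopping when no new word appears is safe here because every prefix of a short word is itself short, so truncating at `max_len` never discards a needed intermediate word.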

Try some more: In what follows, take L = {x} by default; L does not include the null string Λ initially. (L+ ⊆ L* holds for all L.)

    L+ = {x, xx, xxx, ...}
    L* = {Λ, x, xx, xxx, ...}

L+ = L L* = L* L for all L.

Proof (illustrated with L = {x}; Λ is the null string):

    L L* = {x}{Λ, x, xx, xxx, ...} = {x, xx, xxx, xxxx, ...} = L+
    L* L = {Λ, x, xx, xxx, ...}{x} = {x, xx, xxx, xxxx, ...} = L+

So L L* = L* L = L+; proved.

L L+ = L+ L.

Proof:

    L L+ = {x}{x, xx, xxx, ...} = {xx, xxx, xxxx, ...}
    L+ L = {x, xx, xxx, ...}{x} = {xx, xxx, xxxx, ...}

So L L+ = L+ L; proved.

For {Λ}: {Λ}+ = {Λ} and {Λ}* = {Λ}.

Proof: If L contains only Λ, then L = {Λ}, and

    L+ = {Λ}+ = {Λ}
    L* = {Λ}* = {Λ}

This shows that if L contains Λ, so do L* and L+.

(L+)+ = L+ ; (L+)* = L*

Proof:

    L+ = {x, xx, xxx, ...}
    (L+)+ = {x, xx, xxx, ...}+ = {x, xx, xxx, ...} = L+
    (L+)* = {x, xx, xxx, ...}* = {Λ, x, xx, xxx, ...} = L*

(L+)+ = L+ ; (L+)* = L* : proved.

(L*)+ = L* ; (L*)* = L*

Proof:

    L* = {Λ, x, xx, xxx, ...}
    (L*)+ = {Λ, x, xx, xxx, ...}+ = {Λ, x, xx, xxx, ...} = L*
    (L*)* = {Λ, x, xx, xxx, ...}* = {Λ, x, xx, xxx, ...} = L*

(L*)+ = L* ; (L*)* = L* : proved.
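The identities above are argued on the single example L = {x}; a quick bounded check in code can at least confirm them for that example and for other small languages. The sketch below builds the closures as unions of powers of L, truncated at a length bound. The helper names (`concat`, `closure`, `star`, `plus`) and the bound `N` are assumptions for illustration, and a bounded check is evidence, not a proof.

```python
def concat(A: set, B: set, max_len: int) -> set:
    """Pairwise concatenations of words from A and B, kept only up to max_len."""
    return {a + b for a in A for b in B if len(a + b) <= max_len}

def closure(L: set, max_len: int, include_zero: bool) -> set:
    """Union of the powers of L (from power 0 or power 1), truncated to max_len."""
    layer = {""}                                  # zeroth power: the null string alone
    result = {""} if include_zero else set()
    for _ in range(max(max_len, 1)):
        layer = concat(layer, L, max_len)         # next power of L
        result |= layer
    return result

def star(L: set, max_len: int) -> set:
    return closure(L, max_len, include_zero=True)

def plus(L: set, max_len: int) -> set:
    return closure(L, max_len, include_zero=False)

N = 5
L = {"x"}
print(concat(L, star(L, N), N) == plus(L, N))            # L L* = L+
print(concat(star(L, N), L, N) == plus(L, N))            # L* L = L+
print(concat(L, plus(L, N), N) == concat(plus(L, N), L, N))
print(plus(plus(L, N), N) == plus(L, N))                 # (L+)+ = L+
print(star(plus(L, N), N) == star(L, N))                 # (L+)* = L*
print(plus(star(L, N), N) == star(L, N))                 # (L*)+ = L*
print(star(star(L, N), N) == star(L, N))                 # (L*)* = L*
```

Replacing `L` with another small set such as `{"a", "ab"}` repeats the same checks on a different language.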


The notations * and + can also be defined as:

    $L^{*} = \bigcup_{i=0}^{\infty} L^{i}$ ; L* consists of all words formed by concatenating a finite number, possibly zero, of words from L.

    $L^{+} = \bigcup_{i=1}^{\infty} L^{i}$ ; L+ consists of all words formed by concatenating a finite number, never zero, of words from L.

Q. If L ⊆ D* and D is defined as the set of digits excluding 2, 3, 4, ..., 9, then L may contain Λ (the null string), 0, 1, 01, 000, ...

Sol: D = {0, 1}, so D* = {Λ, 0, 1, 00, 11, 01, 10, 000, ...}. So L, being a subset of D*, may contain Λ, 0, 1, 00, 11, 01, 10, 000, ... Proved.

In terms of precedence: concatenation has higher precedence than alternation, i.e. ab|cde is interpreted as (ab)|(cde). Closure * has higher precedence than either concatenation or alternation, i.e. a|bc* is interpreted as (a)|(b(c)*). If a and b are two strings, then the concatenation of a and b is written as ab.

Exercises: A few grammar exercises to try on your own. (The alphabet in each case is {a, b}.)

    a) Define a grammar for the language of strings with one or more a's followed by zero or more b's.
    b) Define a grammar for even-length palindromes.
    c) Define a grammar for strings where the number of a's is equal to the number of b's.
    d) Define a grammar where the number of a's is not equal to the number of b's. (Hint: Think about it as two separate cases...)


Example-15: Examples of r.e. (Λ denotes the null string, d a digit):

1. The regular expression

    (+ | - | Λ) dd*

represents +, - or the empty string (Λ), followed by a single digit (d), followed by zero or more further digits (d*). Examples: +2, -4, 4, 6547.

2. Consider the regular expression

    (+ | - | Λ) (dd*.d* | d*.dd*) (Λ | (E (+ | - | Λ) dd*))

where E marks the exponent. This r.e. represents a floating-point number. It may be clearer if the r.e. is divided into three parts:

    Part-1: (+ | - | Λ) represents the optional sign.
    Part-2: (dd*.d* | d*.dd*) is the mantissa; it must contain a decimal point and at least one digit before or after the decimal point.
    Part-3: (Λ | (E (+ | - | Λ) dd*)) represents the optional exponent part: E, followed by an optional sign and at least one digit.

Try it for 4.5E-2. Try to modify Part-3 for 4.57E-2.43; it may be: (Λ | (E (+ | - | Λ) d*(.?)dd*)). Remember that parentheses may be used freely in an r.e. to keep the relative ordering of the operations clear.

Example-16: Some solved examples:

Question 1: A set of identifiers is the set of strings of letters and digits beginning with a letter. Give the regular definition of the set.

Ans:

    letter → A | B | C | ... | Z | a | b | c | ... | z
    digit  → 0 | 1 | 2 | ... | 9
    id     → letter (letter | digit)*

Question 2: Give the regular definition for the following signed or unsigned numbers: 5280, -3.2, 39.37, 6.336E4, 1.894E-4

Ans:

    num               → sign digits optional-fraction optional-exponent
    digits            → digit+
    optional-fraction → (. digits)?
    optional-exponent → (E (+ | -)? digits)?
    digit             → 0 | 1 | ... | 9
    sign              → + | - | Λ

Try to write the above grammar in a proper EBNF format!
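Both regular expressions of Example-15 can be tried out with Python's `re` module. In the sketch below each alternative with the null string becomes an optional item (?), d becomes the character class [0-9], and the decimal point is escaped; the names `SIGNED_INTEGER`, `FLOATING` and `matches` are hypothetical and chosen only for this illustration.

```python
import re

# (+ | - | null) d d*
SIGNED_INTEGER = re.compile(r"[+-]?[0-9][0-9]*")

# (+ | - | null) (dd*.d* | d*.dd*) (null | (E (+ | - | null) dd*))
FLOATING = re.compile(
    r"[+-]?"                               # Part-1: optional sign
    r"([0-9][0-9]*\.[0-9]*|[0-9]*\.[0-9][0-9]*)"  # Part-2: mantissa with a decimal point
    r"(E[+-]?[0-9][0-9]*)?"                # Part-3: optional exponent
)

def matches(regex, s: str) -> bool:
    # fullmatch: the whole string must be described by the expression
    return regex.fullmatch(s) is not None

for s in ["+2", "-4", "4", "6547"]:
    print(s, matches(SIGNED_INTEGER, s))

for s in ["4.5E-2", "+3.", ".5", "42", "."]:
    print(s, matches(FLOATING, s))
```

The string 42 is rejected by the floating-point expression because Part-2 requires a decimal point, and a lone decimal point is rejected because at least one digit must appear next to it.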