
Languages

Introduction to languages
There are two types of languages:
▪ Formal languages (syntactic languages)
▪ Concerned with form only, not meaning

▪ Informal languages (semantic languages)
▪ Concerned with both form and meaning (human languages)
▪ "I ate one apple" (OK in form and meaning)
▪ "I ate one sky" (OK in form, not OK in meaning)
▪ Computers deal with form only, not with meaning (diction, poetic metaphor)
Alphabets
➢ A finite, non-empty set of symbols (called letters) is called
an alphabet. It is denoted by Σ (Greek letter sigma).
Example:
Σ={a,b}
Σ={0,1} (important, as this is the alphabet that computers
work with)
Σ={i, j, k}
Decimal numbers alphabet Σ = {0,1,2,…,9}
Binary numbers alphabet Σ = {0,1}
Strings
A finite sequence of symbols (letters) from some alphabet, i.e. a
concatenation of finitely many letters, is called a string.

Example
If Σ = {a, b}, then the following are strings over Σ:
a, abab, aaabb, ababababababababab

Empty string or null string (a string with no characters)

➢ The empty string is the string with zero occurrences of
symbols. It may be regarded as a string over any alphabet.
➢ It is denoted by Λ (capital Greek letter lambda).
Words
Words are strings that are permissible in the language.
Example
If Σ= {x} then a language L can be defined as
L = {x^n : n = 1, 2, 3, …} or L = {x, xx, xxx, …}
Here x, xx, … are the words of L.
The power notation is used to represent multiple occurrences of a
string; e.g. a^3 = aaa, a^2 = aa, etc.
Note
All words are strings, but not all strings are words.
Words are only those strings that the language permits; e.g. in English,
"only" is a word but "noly" is not.
Language: a set of valid strings (words) over a finite alphabet.
Example:
Alphabet: Σ = {a, b, c, …, z}
Strings: cat, dog, house
Language: {cat, dog, house}
Language Definitions
◼ Powers of an alphabet
▪ If Σ is an alphabet, we can express the set of all strings of a certain
length from that alphabet by using an exponential notation.
▪ Σ^k is defined to be the set of strings of length k, each of whose symbols is
in Σ.
◼ For example, given the alphabet Σ = {0,1,2}, then:
▪ Σ^0 = {Λ}
▪ Σ^1 = {0,1,2}
▪ Σ^2 = {00,01,02,10,11,12,20,21,22}
▪ Σ^3 = {000,001,002,…,222}

◼ Note that Σ and Σ^1 are different. The first is the alphabet; its
members are the symbols 0, 1, 2. The second is a set of strings; its
members are the strings 0, 1, 2, each a string of length 1.
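The definition of Σ^k can be checked mechanically. Below is a minimal Python sketch, assuming every letter is a one-character string and using the empty string "" to stand for Λ; the function name `power` is our own choice, not from the slides.

```python
from itertools import product

def power(sigma, k):
    """Return Sigma^k: the set of all strings of length k over sigma.

    Assumes one-character letters; "" plays the role of Lambda.
    """
    # product(..., repeat=k) yields every k-tuple of letters.
    # For k = 0 it yields one empty tuple, so Sigma^0 = {""} = {Lambda}.
    return {"".join(letters) for letters in product(sorted(sigma), repeat=k)}

sigma = {"0", "1", "2"}
print(sorted(power(sigma, 0)))  # ['']  i.e. {Lambda}
print(sorted(power(sigma, 1)))  # ['0', '1', '2']
print(len(power(sigma, 2)))     # 9
print(len(power(sigma, 3)))     # 27
```

In Python a letter and a string of length 1 are the same kind of object, so the Σ versus Σ^1 distinction drawn above is conceptual and does not show up in this code.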
Length of Strings
The length of a string s, denoted by |s|, is the number of letters (symbols)
from the alphabet in the string.
Example
Σ={a,b}
s = ababa
|s| = 5
Example
Σ= {a, B, b, d}
s = BaBbabBd
Tokenizing = (B)(a)(B)(b)(a)(b)(B)(d)
|s| = 8

Example: |"hello world"| = 11 (the space counts as one symbol)
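Computing |s| amounts to tokenizing s into letters of Σ and counting the tokens. A minimal sketch, assuming every letter of Σ is a single character (so tokenizing is just splitting into characters); `tokenize` and `length` are hypothetical helper names.

```python
def tokenize(s, sigma):
    """Split s into letters of sigma (assumes one-character letters)."""
    for ch in s:
        if ch not in sigma:
            raise ValueError(f"{ch!r} is not a letter of the alphabet")
    return list(s)

def length(s, sigma):
    """|s|: the number of alphabet symbols in s."""
    return len(tokenize(s, sigma))

sigma = {"a", "B", "b", "d"}
print(tokenize("BaBbabBd", sigma))  # ['B', 'a', 'B', 'b', 'a', 'b', 'B', 'd']
print(length("BaBbabBd", sigma))    # 8
print(length("", sigma))            # 0, the length of Lambda
```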


Reverse of a String
The reverse of a string s, denoted by Rev(s) or s^r, is obtained
by writing the letters of s in reverse order.
Example
If s=abc is a string defined over Σ={a,b,c}
then Rev(s) or s^r = cba
Example
Σ= {a, B, b, d}
s = BaBbabBd
Rev(s)=dBbabBaB
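Reversal can be sketched the same way. Assuming one-character letters, Python slicing reverses a string directly; `rev` is an illustrative name.

```python
def rev(s):
    """Rev(s): the letters of s in reverse order (one-character letters assumed)."""
    return s[::-1]

print(rev("abc"))       # cba
print(rev("BaBbabBd"))  # dBbabBaB
```

If the letters of Σ were longer than one character, one would reverse the list of tokens rather than the individual characters.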
String Terminology
◼ Substring - A sequence of consecutive symbols from a
string. "beg" is a substring of "begin".
◼ Prefix - A substring that comes at the beginning of a string.
◼ Suffix - A substring that comes at the end of the string.
◼To concatenate strings, we will simply put them right next to
one another.
▪ Example:
▪ If x and y are strings, where x=001 and y=111 then xy = 001111
▪ For any string w, Λw = wΛ = w; that is, Λ is the identity for concatenation.
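These terms map onto ordinary Python string operations. A minimal sketch with illustrative helper names; it assumes Λ and w itself count as a prefix and suffix of w, which the definitions above do not state explicitly.

```python
def is_substring(x, w):
    """True if x occurs as consecutive symbols inside w."""
    return x in w

def is_prefix(x, w):
    return w.startswith(x)

def is_suffix(x, w):
    return w.endswith(x)

def prefixes(w):
    """All prefixes of w, shortest first: Lambda (""), ..., w itself."""
    return [w[:i] for i in range(len(w) + 1)]

def suffixes(w):
    """All suffixes of w, longest first: w itself, ..., Lambda ("")."""
    return [w[i:] for i in range(len(w) + 1)]

x, y = "001", "111"
print(x + y)                          # 001111  (concatenation xy)
print(is_substring("beg", "begin"))   # True
print(prefixes("abc"))                # ['', 'a', 'ab', 'abc']
print(suffixes("abc"))                # ['abc', 'bc', 'c', '']

w = "abc"
print("" + w == w + "" == w)          # True: Lambda w = w Lambda = w
```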
Language Definition - Problem

◼ A problem is the question of deciding whether a
given string is a member of some particular
language.
▪ More colloquially, any problem can be expressed as a question of
membership in a language.
Set-Forming Notation
◼ A notation we will commonly use to define languages is by
a “set-former”:
{ w | something about w }
◼ The expression is read “the set of words w such that
(whatever is said about w to the right of the vertical bar).”
◼ For example:
▪ {w | w consists of an equal number of 0’s and 1’s }.
▪ {w | w is a binary integer that is prime }
▪ {0^n 1 | n ≥ 0}. This includes 1, 01, 001, 0001, 00001, etc.
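Since a problem is a membership question, each set-former language above can be read as a yes/no test on a string. A minimal Python sketch; the function names are hypothetical, and for the prime example we assume leading zeros are allowed (e.g. 0101 counts as 5), which the slide does not specify.

```python
def equal_01(w):
    """{w | w consists of an equal number of 0's and 1's}, over {0, 1}."""
    return set(w) <= {"0", "1"} and w.count("0") == w.count("1")

def is_prime(n):
    """Trial division primality test."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

def binary_prime(w):
    """{w | w is a binary integer that is prime} (leading zeros allowed)."""
    return w != "" and set(w) <= {"0", "1"} and is_prime(int(w, 2))

def zeros_then_one(w):
    """{0^n 1 | n >= 0}: any number of 0's followed by exactly one 1."""
    return w.endswith("1") and set(w[:-1]) <= {"0"}

print(equal_01("0110"), equal_01("001"))              # True False
print(binary_prime("101"), binary_prime("100"))       # True False
print(zeros_then_one("0001"), zeros_then_one("0010")) # True False
```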
Defining Languages
How can we define languages?

The languages can be defined in different ways:
1. Descriptive definition
2. Recursive definition
3. Regular Expressions (RE)
4. Finite Automaton (FA)
Descriptive definition of language
The language is defined by describing the conditions imposed on
its words.
Example
The language L of strings of odd length, defined over Σ={a},
can be written as
L={a, aaa, aaaaa,…..}
Languages…examples
Example
The language L of strings that do not start with a, defined over
Σ ={a,b,c}, can be written as
L = {Λ, b, c, ba, bb, bc, ca, cb, cc, …}
Example
The language L of strings of length 2, defined over Σ ={0,1,2}, can be
written as
L= {00, 01, 02,10, 11,12, 20, 21, 22}
Example
The language L of strings ending in 0, defined over Σ ={0,1}, can be
written as
L = {0, 00, 10, 000, 010, 100, 110, …}
Example
The language EQUAL, of strings with the number of a’s equal to the number of
b’s, defined over Σ={a, b}, can be written as:
{Λ, ab, ba, aabb, abab, abba, baab, baba, bbaa, …}
Example
The language EVEN-EVEN, of strings with even number of a’s and
even number of b’s, defined over Σ={a, b}, can be written as
{Λ, aa, bb, aaaa, aabb, abab, abba, baab, baba, bbaa, bbbb,…}
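The descriptive definitions of EQUAL and EVEN-EVEN translate directly into membership tests, and listing the short words that pass each test reproduces the lists above. A minimal sketch; `words_up_to` and the predicate names are illustrative, and "" stands for Λ.

```python
from itertools import product

def in_equal(w):
    """EQUAL: as many a's as b's (w over {a, b})."""
    return w.count("a") == w.count("b")

def in_even_even(w):
    """EVEN-EVEN: an even number of a's and an even number of b's."""
    return w.count("a") % 2 == 0 and w.count("b") % 2 == 0

def words_up_to(max_len, predicate, sigma="ab"):
    """Words of length <= max_len over sigma satisfying predicate, shortest first."""
    result = []
    for n in range(max_len + 1):
        for letters in product(sigma, repeat=n):
            w = "".join(letters)
            if predicate(w):
                result.append(w)
    return result

print(words_up_to(4, in_equal))
# ['', 'ab', 'ba', 'aabb', 'abab', 'abba', 'baab', 'baba', 'bbaa']
print(words_up_to(4, in_even_even))
# ['', 'aa', 'bb', 'aaaa', 'aabb', 'abab', 'abba', 'baab', 'baba', 'bbaa', 'bbbb']
```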
Example
The language INTEGER, of strings defined over
Σ={-,0,1,2,3,4,5,6,7,8,9}, can be written as
INTEGER = {…,-2, -1, 0, 1, 2,…}
Example
The language EVEN, of strings defined over Σ={-,0,1,2,3,4,5,6,7,8,9},
can be written as
EVEN = { …,-4, -2, 0, 2, 4,…}
Example
The language a^n b^n, of strings defined over Σ={a, b}, as
{a^n b^n : n=1, 2, 3,…}, can be written as
{ab, aabb, aaabbb, aaaabbbb,…}
Example
The language a^n b^n a^n, of strings defined over Σ={a, b}, as
{a^n b^n a^n : n=1,2,3,…}, can be written as
{aba, aabbaa, aaabbbaaa, aaaabbbbaaaa,…}
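Membership in a^n b^n and a^n b^n a^n can be tested by reading n off the length of the word and comparing the word against the only string it could be. A minimal sketch with hypothetical function names, assuming n ≥ 1 as in the definitions.

```python
def in_anbn(w):
    """Membership in {a^n b^n : n = 1, 2, 3, ...}."""
    n = len(w) // 2
    return n >= 1 and w == "a" * n + "b" * n

def in_anbnan(w):
    """Membership in {a^n b^n a^n : n = 1, 2, 3, ...}."""
    n = len(w) // 3
    return n >= 1 and w == "a" * n + "b" * n + "a" * n

# The first four words of a^n b^n.
print(["a" * n + "b" * n for n in range(1, 5)])
# ['ab', 'aabb', 'aaabbb', 'aaaabbbb']
print(in_anbn("aabb"), in_anbn("abab"))          # True False
print(in_anbnan("aabbaa"), in_anbnan("aabba"))   # True False
```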
Example
The language factorial, of strings defined over Σ={0,1,2,3,4,5,6,7,8,9},
consisting of the values n! written in decimal, can be written as
{1, 2, 6, 24, 120,…}
Example
The language FACTORIAL, of strings defined over Σ={a}, as
{a^(n!) : n=1,2,3,…}, can be written as
{a, aa, aaaaaa,…}. Note that the language FACTORIAL can
be defined over any single-letter alphabet.
Example
The language DOUBLEFACTORIAL, of strings defined over Σ={a, b}, as
{a^(n!) b^(n!) : n=1,2,3,…}, can be written as
{ab, aabb, aaaaaabbbbbb,…}
Example
The language SQUARE, of strings defined over Σ={a}, as
{a^(n^2) : n=1,2,3,…}, can be written as
{a, aaaa, aaaaaaaaa,…}
Example
The language DOUBLESQUARE, of strings defined over Σ={a,b}, as
{a^(n^2) b^(n^2) : n=1,2,3,…}, can be written as
{ab, aaaabbbb, aaaaaaaaabbbbbbbbb,…}
Example
The language PRIME, of strings defined over Σ={a}, as
{a^p : p is prime}, can be written as
{aa, aaa, aaaaa, aaaaaaa, aaaaaaaaaaa, …}
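The first few words of FACTORIAL, DOUBLEFACTORIAL, SQUARE, DOUBLESQUARE and PRIME can be generated directly from their exponents. A minimal sketch using Python's standard `math.factorial`; the trial-division prime test is a simple illustrative choice, not an efficient one.

```python
from math import factorial

def is_prime(p):
    """Trial division: True if p is a prime number."""
    return p >= 2 and all(p % d != 0 for d in range(2, int(p ** 0.5) + 1))

# First three words of each language (n = 1, 2, 3).
factorial_words = ["a" * factorial(n) for n in range(1, 4)]
double_factorial_words = ["a" * factorial(n) + "b" * factorial(n) for n in range(1, 4)]
square_words = ["a" * n ** 2 for n in range(1, 4)]
double_square_words = ["a" * n ** 2 + "b" * n ** 2 for n in range(1, 4)]

# PRIME: a^p for the primes p up to 11.
prime_words = ["a" * p for p in range(2, 12) if is_prime(p)]

print(factorial_words)                # ['a', 'aa', 'aaaaaa']
print(square_words)                   # ['a', 'aaaa', 'aaaaaaaaa']
print([len(w) for w in prime_words])  # [2, 3, 5, 7, 11]
```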
PALINDROME
The language consisting of Λ and the strings s defined over Σ such that
Rev(s) = s.
Note that the words of PALINDROME are called
palindromes.
Example
For Σ={a, b},
PALINDROME={Λ , a, b, aa, bb, aaa, aba, bab, bbb, ...}

◼ never odd or even (a palindrome if spaces are ignored)
◼ radar
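The condition Rev(s) = s can be tested directly, and filtering all short strings by it reproduces the list for Σ = {a, b}. A minimal sketch assuming one-character letters; the function names are illustrative.

```python
from itertools import product

def is_palindrome(s):
    """True if Rev(s) == s."""
    return s == s[::-1]

def palindromes_up_to(max_len, sigma="ab"):
    """Palindromes over sigma of length <= max_len, shortest first."""
    words = []
    for n in range(max_len + 1):
        for t in product(sigma, repeat=n):
            w = "".join(t)
            if is_palindrome(w):
                words.append(w)
    return words

print(palindromes_up_to(3))
# ['', 'a', 'b', 'aa', 'bb', 'aaa', 'aba', 'bab', 'bbb']
print(is_palindrome("radar"))                               # True
print(is_palindrome("never odd or even".replace(" ", "")))  # True
```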
Note
Number of strings of length m defined over an alphabet of n
letters is n^m.
Examples
➢ The language of strings of length 2, defined over Σ={a,b}, is
L={aa, ab, ba, bb}, i.e. number of strings = 2^2 = 4
➢ The language of strings of length 3, defined over Σ={a,b}, is
L={aaa, aab, aba, baa, abb, bab, bba, bbb}, i.e. number of
strings = 2^3 = 8
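The count n^m follows because each of the m positions can independently hold any of the n letters. A minimal sketch that lists the strings explicitly and compares the count with n^m; `count_strings` is an illustrative name.

```python
from itertools import product

def count_strings(n_letters, m):
    """Number of strings of length m over an alphabet of n_letters letters: n^m."""
    return n_letters ** m

sigma = "ab"
for m in (2, 3):
    listed = ["".join(t) for t in product(sigma, repeat=m)]
    print(m, len(listed), count_strings(len(sigma), m))
# 2 4 4
# 3 8 8
```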
Kleene Closure or Kleene Star
➢ Given Σ, the Kleene star (Kleene closure) of the
alphabet Σ, denoted by Σ*, is the collection of all strings
defined over Σ, including Λ.
➢ If Σ = {a, b}, then Σ* contains Λ and every string of any
length formed from the letters of Σ in any order.
➢ Σ* = {Λ, a, b, aa, ab, ba, bb, aaa, aab, …}
➢ The Kleene star closure can be defined over any set of strings.
◼ Lexicographic order: notice that we listed the words of a
language in size order (i.e., words of shortest length first),
and then listed all the words of the same length
alphabetically.
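Because Σ* is infinite, a program can only enumerate it, never store it. A minimal sketch of a generator that yields Σ* in the lexicographic (size, then alphabetical) order described above, assuming one-character letters; `kleene_star` is an illustrative name.

```python
from itertools import count, islice, product

def kleene_star(sigma):
    """Yield every string of Sigma*: by length first, then alphabetically.

    Assumes one-character letters; "" (Lambda) comes first.
    """
    letters = sorted(sigma)
    for n in count(0):                        # lengths 0, 1, 2, ... forever
        for t in product(letters, repeat=n):  # alphabetical within one length
            yield "".join(t)

print(list(islice(kleene_star({"a", "b"}), 9)))
# ['', 'a', 'b', 'aa', 'ab', 'ba', 'bb', 'aaa', 'aab']
```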
Kleene Closure
◼ Example: Let Σ = {a, ab}. Then
◼ Σ* = {Λ plus any word composed of factors of a and ab},
◼ Or
◼ Σ* = {Λ plus all strings of a’s and b’s except those that start with
b and those that contain a double b},
◼ Or
◼ Σ* = {Λ, a, aa, ab, aaa, aab, aba, aaaa, aaab, aaba, abaa, abab,
aaaaa, aaaab, aaaba, aabaa, aabab, abaaa, abaab, ababa, …}
◼ Note that in every word of Σ*, each b must have an a immediately
to its left, so neither the double b (bb) nor any string starting
with b is possible.

Theory of Automata 22
How to prove a certain word is
in the closure language Σ*
◼ We must show how it can be written as a concatenation of
words from the base set Σ.

◼ In the previous example, to show that abaab is in Σ*, we can
factor it as follows:
abaab = (ab)(a)(ab)
◼ These three factors are all in the set Σ, therefore their
concatenation is in Σ *.
◼ Note that the parentheses, ( ), are used for the sole purpose of
demarcating the ends of factors.
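The factoring argument can be automated: search for a way to split the word into pieces from the base set. A minimal sketch using memoized backtracking (our choice of technique, not one given in the slides); `factor` is a hypothetical name, and Λ is dropped from the base set so the search cannot loop on empty factors.

```python
from functools import lru_cache

def factor(word, base):
    """Return one factorization of word into strings from base, or None.

    Lambda ("") factors as the empty list [].
    """
    pieces = sorted(set(base) - {""}, key=len, reverse=True)

    @lru_cache(maxsize=None)
    def from_index(i):
        # Factorization of word[i:] as a tuple of pieces, or None.
        if i == len(word):
            return ()
        for p in pieces:
            if word.startswith(p, i):
                rest = from_index(i + len(p))
                if rest is not None:
                    return (p,) + rest
        return None

    result = from_index(0)
    return None if result is None else list(result)

print(factor("abaab", {"a", "ab"}))  # ['ab', 'a', 'ab']
print(factor("abba", {"a", "ab"}))   # None, since bb cannot be built
```

Trying longer pieces first only affects which factorization is reported; the search backtracks over every piece, so it finds a factorization whenever one exists.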

Examples
Example:
If Σ = {x}
Then Σ* = {Λ, x, xx, xxx, xxxx, ….}
Example:
If Σ = {0,1}
Then Σ* = {Λ, 0, 1, 00, 01, 10, 11, ….}
Example:
If Σ = {a, c}
Then Σ* = {Λ, a, c, aa, ac, ca, cc, aaa, aac, aca, acc, caa, ….}
Note

➢ The Kleene star closure of a set containing at least one
non-empty string generates an infinite language.
➢ An infinite language is one that contains infinitely many
words, each of finite length.
Positive Closure
PLUS Operation (+)
➢ The plus operation is the same as the Kleene star closure except that
it does not automatically include Λ (the null string).
Example
If Σ = {0,1}
Then Σ+ = {0, 1, 00, 01, 10, 11, ….}
Example
If Σ = {aab, c}
Then Σ+ = {aab, c, aabaab, aabc, caab, cc, ….}
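The listing of Σ+ for Σ = {aab, c} groups words by how many factors they use rather than strictly by length. A minimal sketch that reproduces that grouping; `plus_by_factors` is a hypothetical name, the base set is passed as a list so the output order is predictable, and since Σ+ is infinite only words with at most `max_factors` factors are produced.

```python
from itertools import product

def plus_by_factors(base, max_factors):
    """Words of base+ made from 1..max_factors factors, grouped by factor count.

    base is a list of strings; a word reachable by several factorizations
    is listed only once.
    """
    words = []
    for k in range(1, max_factors + 1):
        for factors in product(base, repeat=k):
            w = "".join(factors)
            if w not in words:
                words.append(w)
    return words

print(plus_by_factors(["0", "1"], 2))    # ['0', '1', '00', '01', '10', '11']
print(plus_by_factors(["aab", "c"], 2))  # ['aab', 'c', 'aabaab', 'aabc', 'caab', 'cc']
```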
Remark
➢ Note that the Kleene star can also be applied to a single string,
i.e. a* can be considered to be all possible strings defined
over {a}, which shows that a* generates Λ, a, aa, aaa, …
➢ Also note that a+ can be considered to be all possible non-empty
strings defined over {a}, which shows that a+
generates a, aa, aaa, aaaa, …

➢ When would Σ* and Σ+ be the same?


Remember that:
Sets: ∅ = { } ≠ {Λ}
Set size: |{ }| = |∅| = 0
Set size: |{Λ}| = 1
String length: |Λ| = 0
◼ Λ is the word that has no letters, and ∅ is the language that
has no words.
◼ Is Λ a part of ∅?
