
From Wikipedia, the free encyclopedia

In computer science, in the area of formal language theory, frequent use is made of a variety
of string functions; however, the notation used is different from that used in computer
programming, and some commonly used functions in the theoretical realm are rarely used when
programming. This article defines some of these basic terms.
Contents

1 Strings and languages

2 Alphabet of a string

3 String substitution

4 String homomorphism

5 String projection

6 Right quotient

7 Syntactic relation

8 Right cancellation

9 Prefixes

10 See also

11 Notes

12 References

Strings and languages

A string is a finite sequence of characters. The empty string is denoted by $\varepsilon$. The
concatenation of two strings $s$ and $t$ is denoted by $s \cdot t$, or shorter by $st$.
Concatenating with the empty string makes no difference: $s \cdot \varepsilon = s = \varepsilon \cdot s$.
Concatenation of strings is associative: $s \cdot (t \cdot u) = (s \cdot t) \cdot u$.

For example, $(\text{b} \cdot \text{l}) \cdot (\varepsilon \cdot \text{ah}) = \text{bl} \cdot \text{ah} = \text{blah}$.

A language is a finite or infinite set of strings. Besides the usual set operations like union,
intersection etc., concatenation can be applied to languages: if both $S$ and $T$ are languages,
their concatenation $S \cdot T$ is defined as the set of concatenations of any string from $S$
and any string from $T$, formally $S \cdot T = \{ s \cdot t \mid s \in S \land t \in T \}$.
Again, the concatenation dot $\cdot$ is often omitted for shortness.

The language $\{\varepsilon\}$ consisting of just the empty string is to be distinguished from the
empty language $\{\}$. Concatenating any language with the former doesn't make any change:
$S \cdot \{\varepsilon\} = S = \{\varepsilon\} \cdot S$, while concatenating with the latter always
yields the empty language: $S \cdot \{\} = \{\} = \{\} \cdot S$. Concatenation of languages is
associative: $S \cdot (T \cdot U) = (S \cdot T) \cdot U$.

For example, abbreviating $D = \{0, 1, 2, 3, 4, 5, 6, 7, 8, 9\}$, the set of all three-digit
decimal numbers is obtained as $D \cdot D \cdot D$. The set of all decimal numbers of arbitrary
length is an example of an infinite language.
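The concatenation of finite languages can be tried out directly. The following is a minimal sketch, assuming a language is modelled as a Python set of `str` and the empty string stands for the empty word; the names `concat`, `EPSILON_ONLY` and `EMPTY` are illustrative and not part of the article.

```python
def concat(S, T):
    """Concatenation of two finite languages given as sets of strings."""
    return {s + t for s in S for t in T}

# The digit language D from the example above.
D = {str(d) for d in range(10)}

# All three-digit decimal numerals, obtained as D . D . D.
three_digit = concat(concat(D, D), D)

EPSILON_ONLY = {""}   # the language containing only the empty string
EMPTY = set()         # the empty language

# Concatenating with {epsilon} changes nothing; with {} it yields {}.
unchanged = concat(D, EPSILON_ONLY)
wiped_out = concat(D, EMPTY)
```

Only finite languages can be represented this way; an infinite language such as the set of all decimal numerals would need a membership test or a generator instead.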

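Alphabet of a string
The alphabet of a string is the set of all of the characters that occur in a particular string. If $s$ is
a string, its alphabet is denoted by $\operatorname{Alph}(s)$.

The alphabet of a language $S$ is the set of all characters that occur in any string of $S$,
formally: $\operatorname{Alph}(S) = \bigcup_{s \in S} \operatorname{Alph}(s)$.

For example, the set $\{\text{a}, \text{c}, \text{o}\}$ is the alphabet of the string cacao, and the
set $D = \{0, 1, \dots, 9\}$ of decimal digits introduced above is the alphabet of the above
language $D \cdot D \cdot D$ as well as of the language of all decimal numbers.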
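As a small illustration, the two alphabet definitions translate almost literally into set operations. This is a sketch under the assumption that languages are finite Python sets of strings; `alph_string` and `alph_language` are hypothetical helper names.

```python
def alph_string(s):
    """Alph(s): the set of characters occurring in the string s."""
    return set(s)

def alph_language(S):
    """Alph(S): the union of Alph(s) over all strings s in the finite language S."""
    return set().union(*(alph_string(s) for s in S))

cacao_alphabet = alph_string("cacao")
small_language_alphabet = alph_language({"12", "31"})
```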

String substitution
Let $L$ be a language, and let $\Sigma$ be its alphabet. A string substitution or simply
a substitution is a mapping $f$ that maps letters in $\Sigma$ to languages (possibly in a different
alphabet). Thus, for example, given a letter $a \in \Sigma$, one has $f(a) = L_a$ where
$L_a \subseteq \Delta^*$ is some language whose alphabet is $\Delta$. This mapping may be extended to strings as
$$f(\varepsilon) = \{\varepsilon\}$$
for the empty string $\varepsilon$, and
$$f(sa) = f(s) f(a)$$
for a string $s \in L$ and a letter $a \in \Sigma$. String substitutions may be extended to entire languages as
$$f(L) = \bigcup_{s \in L} f(s)$$ [1]

Regular languages are closed under string substitution. That is, if each letter of
a regular language is substituted by another regular language, the result is still a
regular language.[2] Similarly, context-free languages are closed under string
substitution.[3][note 1]

A simple example is the conversion $f_{uc}(\cdot)$ to upper case, which may be defined
e.g. as follows:

| letter | mapped to language $f_{uc}(x)$ | remark |
| a | { A } | map lower-case char to corresponding upper-case char |
| A | { A } | map upper-case char to itself |
| ß | { SS } | no upper-case char available, map to two-char string |
| 0 | { $\varepsilon$ } | map digit to empty string |
| ! | { } | forbid punctuation, map to empty language |
| ... | | similar for other chars |

For the extension of $f_{uc}$ to strings, we have e.g.

$f_{uc}(\text{Straße}) = \{\text{S}\} \cdot \{\text{T}\} \cdot \{\text{R}\} \cdot \{\text{A}\} \cdot \{\text{SS}\} \cdot \{\text{E}\} = \{\text{STRASSE}\}$,

$f_{uc}(\text{u2}) = \{\text{U}\} \cdot \{\varepsilon\} = \{\text{U}\}$, and

$f_{uc}(\text{Go!}) = \{\text{G}\} \cdot \{\text{O}\} \cdot \{\} = \{\}$.

For the extension of $f_{uc}$ to languages, we have e.g.

$f_{uc}(\{\text{Straße}, \text{u2}, \text{Go!}\}) = \{\text{STRASSE}\} \cup \{\text{U}\} \cup \{\} = \{\text{STRASSE}, \text{U}\}$.

Another example is the conversion of an EBCDIC-encoded string to ASCII.
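The upper-case substitution from the table above can be sketched in a few lines. This is only an illustration under stated assumptions: languages are finite Python sets of strings, and the helper names `f_uc`, `substitute_string` and `substitute_language` are made up for this example. The letter-level rule approximates the table with Python's built-in character tests.

```python
def concat(S, T):
    """Concatenation of two finite languages."""
    return {s + t for s in S for t in T}

def f_uc(ch):
    """Letter-level substitution mirroring the table: one letter -> a language."""
    if ch == "ß":
        return {"SS"}          # no single upper-case char, use a two-char string
    if ch.isalpha():
        return {ch.upper()}    # lower case -> upper case, upper case -> itself
    if ch.isdigit():
        return {""}            # digits map to the language {epsilon}
    return set()               # everything else (punctuation) -> empty language

def substitute_string(f, s):
    """Extension to strings: f(epsilon) = {epsilon}, f(sa) = f(s) f(a)."""
    result = {""}
    for ch in s:
        result = concat(result, f(ch))
    return result

def substitute_language(f, L):
    """Extension to languages: the union of f(s) over all s in L."""
    out = set()
    for s in L:
        out |= substitute_string(f, s)
    return out
```

Because the empty language absorbs everything under concatenation, a single forbidden character makes the image of the whole string empty, which is what happens for the string `Go!`.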

String homomorphism
A string homomorphism (often referred to simply as
a homomorphism in formal language theory) is a string substitution such that
each letter is replaced by a single string. That is, $f(a) = s$, where $s$ is a string, for
each letter $a$.[note 2][4]

String homomorphisms are monoid morphisms on the free monoid, preserving
the binary operation of string concatenation. Given a language $L$, the set $f(L)$ is
called the homomorphic image of $L$. The inverse homomorphic image of a
string $s$ is defined as
$$f^{-1}(s) = \{ w \mid f(w) = s \}$$
while the inverse homomorphic image of a language $L$ is defined as
$$f^{-1}(L) = \{ s \mid f(s) \in L \}$$
In general, $f(f^{-1}(L)) \neq L$, while one does have
$$f(f^{-1}(L)) \subseteq L$$
and
$$L \subseteq f^{-1}(f(L))$$
for any language $L$.

The class of regular languages is closed under
homomorphisms and inverse homomorphisms.[5] Similarly, the
context-free languages are closed under homomorphisms[note 3]
and inverse homomorphisms.[6]

A string homomorphism is said to be $\varepsilon$-free (or e-free) if $f(a) \neq \varepsilon$
for all $a$ in the alphabet $\Sigma$. Simple single-letter substitution
ciphers are examples of ($\varepsilon$-free) string homomorphisms.

An example string homomorphism $g_{uc}$ can also be obtained by
defining it similarly to the above substitution: $g_{uc}(\text{a}) = \text{A}$, ...,
$g_{uc}(\text{0}) = \varepsilon$, but leaving $g_{uc}$ undefined on punctuation
chars. Restricting attention to strings of lower-case letters, examples of
inverse homomorphic images are

$g_{uc}^{-1}(\{\text{SSS}\}) = \{\text{sss}, \text{sß}, \text{ßs}\}$, since
$g_{uc}(\text{sss}) = g_{uc}(\text{sß}) = g_{uc}(\text{ßs}) = \text{SSS}$, and

$g_{uc}^{-1}(\{\text{A}, \text{bb}\}) = \{\text{a}\}$, since $g_{uc}(\text{a}) = \text{A}$, while bb
cannot be reached by $g_{uc}$.

For the latter language, $g_{uc}(g_{uc}^{-1}(\{\text{A}, \text{bb}\})) = g_{uc}(\{\text{a}\}) =
\{\text{A}\} \neq \{\text{A}, \text{bb}\}$. The homomorphism $g_{uc}$ is not $\varepsilon$-free, since
it maps e.g. 0 to $\varepsilon$.
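Inverse homomorphic images can be explored by brute force when the alphabet is small. The sketch below assumes a homomorphism is given as a Python dict from single letters to strings, and it only searches preimages up to a chosen length bound, so it is not a general algorithm. The map `h` is an illustrative two-letter fragment, not the full $g_{uc}$.

```python
from itertools import product

def apply_hom(h, w):
    """Extend the letter map h to strings: h(w1 w2 ... wn) = h(w1) h(w2) ... h(wn)."""
    return "".join(h[ch] for ch in w)

def inverse_image(h, L, max_len):
    """All strings w over the domain of h with len(w) <= max_len and h(w) in L."""
    letters = sorted(h)
    return {
        "".join(w)
        for n in range(max_len + 1)
        for w in product(letters, repeat=n)
        if apply_hom(h, "".join(w)) in L
    }

# Illustrative fragment: only the letters s and ß are in the domain.
h = {"s": "S", "ß": "SS"}

preimage_sss = inverse_image(h, {"SSS"}, 3)
preimage_mixed = inverse_image(h, {"S", "bb"}, 3)
image_of_preimage = {apply_hom(h, w) for w in preimage_mixed}
```

Here `image_of_preimage` is a proper subset of `{"S", "bb"}`, which illustrates that applying $f$ after $f^{-1}$ can lose the strings that $f$ never reaches.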

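String projection
If $s$ is a string, and $\Sigma$ is an alphabet, the string
projection of $s$ is the string that results by removing all letters
which are not in $\Sigma$. It is written as $\pi_\Sigma(s)$. It is formally defined
by removal of letters from the right hand side:
$$\pi_\Sigma(s) = \begin{cases} \varepsilon & \text{if } s = \varepsilon \\ \pi_\Sigma(t) & \text{if } s = ta \text{ and } a \notin \Sigma \\ \pi_\Sigma(t)\,a & \text{if } s = ta \text{ and } a \in \Sigma \end{cases}$$
Here $\varepsilon$ denotes the empty string. The projection of a string
is essentially the same as a projection in relational algebra.

String projection may be promoted to the projection of a
language. Given a formal language $L$, its projection is given
by
$$\pi_\Sigma(L) = \{ \pi_\Sigma(s) \mid s \in L \}$$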
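A short sketch may help relate the recursive definition of projection to everyday string filtering. It assumes alphabets and finite languages are Python sets; the names `project_rec`, `project` and `project_language` are illustrative.

```python
def project_rec(s, sigma):
    """Projection following the recursive definition, peeling letters off the right."""
    if s == "":
        return ""
    t, a = s[:-1], s[-1]
    return project_rec(t, sigma) + a if a in sigma else project_rec(t, sigma)

def project(s, sigma):
    """Equivalent iterative form: keep exactly the letters that lie in sigma."""
    return "".join(ch for ch in s if ch in sigma)

def project_language(L, sigma):
    """Projection of a finite language: project every string in it."""
    return {project(s, sigma) for s in L}
```

The recursive version is mainly of expository interest; for long strings the iterative form avoids Python's recursion limit.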

Right quotient
The right quotient of a letter $a$ from a string $s$ is the
truncation of the letter $a$ in the string $s$, from the right
hand side. It is denoted as $s/a$. If the string does not
have $a$ on the right hand side, the result is the empty
string. Thus:
$$(sa)/b = \begin{cases} s & \text{if } a = b \\ \varepsilon & \text{if } a \neq b \end{cases}$$
The quotient of the empty string may be taken:
$$\varepsilon / a = \varepsilon$$
Similarly, given a subset $S \subset M$ of a monoid $M$, one may define the quotient
subset as
$$S/a = \{ s \in M \mid sa \in S \}$$
Left quotients may be defined similarly,
with operations taking place on the left of a
string.
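The two flavours of right quotient behave differently on a mismatch, which a small sketch makes visible. It assumes `a` is a single character and that subsets are finite Python sets of strings; the function names are hypothetical.

```python
def right_quotient(s, a):
    """s/a for a string s and a single letter a, following the case definition."""
    return s[:-1] if s.endswith(a) else ""

def right_quotient_set(S, a):
    """S/a = { s | sa in S } for a finite set of strings S and a single letter a."""
    return {w[:-1] for w in S if w.endswith(a)}

matched = right_quotient("abc", "c")
mismatched = right_quotient("abc", "b")
set_quotient = right_quotient_set({"ab", "cb", "ba"}, "b")
```

On a mismatch the string-level quotient returns the empty string, whereas the set-level quotient simply contributes no element for that string.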

Syntactic relation
The right quotient of a subset $S \subset M$ of a monoid $M$ defines an equivalence
relation, called the right syntactic relation of $S$. It is given by
$$\sim_S \;=\; \{ (s, t) \in M \times M \mid S/s = S/t \}$$
The relation is clearly of finite index (has a finite number of equivalence
classes) if and only if the family of right quotients is finite; that is, if
$$\{ S/m \mid m \in M \}$$
is finite. In this case, $S$ is a recognizable language, that is, a
language that can be recognized by a finite state automaton. This is
discussed in greater detail in the article on syntactic monoids.

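Right cancellation
The right cancellation of a letter $a$ from a string $s$ is the
removal of the first occurrence of the letter $a$ in the string $s$, starting
from the right hand side. It is denoted as $s \div a$ and is
recursively defined as
$$(sa) \div b = \begin{cases} s & \text{if } a = b \\ (s \div b)\,a & \text{if } a \neq b \end{cases}$$
The empty string is always cancellable:
$$\varepsilon \div a = \varepsilon$$
Clearly, right cancellation and projection commute:
$$\pi_\Sigma(s) \div a = \pi_\Sigma(s \div a)$$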
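Right cancellation amounts to deleting the last occurrence of a letter, and the commutation with projection can be spot-checked on small inputs. This is a sketch assuming single-character letters and alphabets given as Python sets; `right_cancel` and `project` are illustrative names.

```python
from itertools import product

def right_cancel(s, a):
    """Remove the rightmost occurrence of the letter a from s, if there is one."""
    i = s.rfind(a)
    return s if i < 0 else s[:i] + s[i + 1:]

def project(s, sigma):
    """String projection: keep only the letters that lie in sigma."""
    return "".join(ch for ch in s if ch in sigma)

def commutes_on(letters, sigma, max_len):
    """Check project(s) / a == project(s / a) for all short strings over `letters`."""
    for n in range(max_len + 1):
        for tup in product(letters, repeat=n):
            s = "".join(tup)
            for a in letters:
                if right_cancel(project(s, sigma), a) != project(right_cancel(s, a), sigma):
                    return False
    return True
```

An exhaustive check over short strings is of course no proof, but it is a cheap way to catch a mistake in either definition.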

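Prefixes
The prefixes of a string is the set of all prefixes to a string,
with respect to a given language:
$$\operatorname{Pref}_L(s) = \{ t \mid s = tu \text{ for } t, u \in \operatorname{Alph}(L)^* \}$$
where $s \in L$.

The prefix closure of a language is
$$\operatorname{Pref}(L) = \bigcup_{s \in L} \operatorname{Pref}_L(s) = \{ t \mid s = tu;\ s \in L;\ t, u \in \operatorname{Alph}(L)^* \}$$
Example: if $L = \{\text{abc}\}$, then $\operatorname{Pref}(L) = \{\varepsilon, \text{a}, \text{ab}, \text{abc}\}$.

A language is called prefix closed if $\operatorname{Pref}(L) = L$.

The prefix closure operator is idempotent:
$$\operatorname{Pref}(\operatorname{Pref}(L)) = \operatorname{Pref}(L)$$
The prefix relation is a binary relation $\sqsubseteq$ such that $s \sqsubseteq t$ if and only if
$s \in \operatorname{Pref}_L(t)$. This relation is a particular example of a prefix order.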
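The prefix notions above are easy to experiment with on finite languages. The following is a minimal sketch assuming languages are Python sets of strings; the function names are chosen for this example only.

```python
def prefixes(s):
    """All prefixes of s, from the empty string up to s itself."""
    return {s[:i] for i in range(len(s) + 1)}

def prefix_closure(L):
    """Pref(L): the union of the prefix sets of all strings in the finite language L."""
    return set().union(*(prefixes(s) for s in L))

def is_prefix_closed(L):
    """A language is prefix closed if taking its prefix closure changes nothing."""
    return prefix_closure(L) == set(L)

def is_prefix_of(s, t):
    """The prefix relation: s is a prefix of t."""
    return t.startswith(s)

closure = prefix_closure({"abc"})
```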

CmSc 365 Theory of Computation

Alphabets and Languages

Definitions

Operations on strings

Languages

Operations on languages

Problems

Learning goals
Exam-like problems

1. Some definitions and properties

Alphabet: A finite set of symbols.
E.g. {a,b,c,x,y,z}, {0,1}, {0,1,2,3,4,5,6,7,8,9}
String over an alphabet: A finite sequence of symbols from the alphabet.
E.g.: thisisastring - over {a,b,c,...,z}
01011 - over {0,1}
3786 - over {0,1,2,3,4,5,6,7,8,9}
A string with one symbol only = symbol itself
Empty string - no symbols, notation: e
Note: we use the letters a,b,c,...,w,x,y,z both for naming strings and for writing instances of strings.
Usually for names of strings we use the last letters: w, x, y, z
Thus x = abc means that abc is a string and we call it x.
Length of a string - its length as a sequence (the number of symbols)
If w = abcd, |w| = 4
If w = classroom, |w| = 9
We can match a position in a string with the symbol there:
If w = classroom, w(3) = a, w(4) = s, and w(5) = s
To be able to distinguish between identical symbols, we refer to them as different occurrences of the
same symbol.

2. Operations on strings
Concatenation: combines two strings by putting them one after the other.
E.g. x = abc, y = mnop, then $x \circ y$ = abcmnop, or simply xy = abcmnop
The concatenation of the empty string with any other string gives the string itself:
$x \circ e = e \circ x = x$
Substring: If w is a string, then v is a substring of w if there exist strings x and y such that w = xvy.
x is called a prefix, and y is called a suffix of w.

The i-th concatenation of a string with itself is defined in the following way:
$w^0 = e$
$w^{i+1} = w^i \circ w$ for each $i \geq 0$.
So $w^1 = w$, and $\text{bang}^2$ = bangbang
Kleene star operation on strings: Let w be a string. w* is the set of strings obtained
by applying any number of concatenations of w with itself, including the empty string.
Example: a* = { e, a, aa, aaa, aaaa, aaaaa, ... }
Reversal of a string w, denoted $w^R$, is the string spelled backwards.
Formal definition:
If w is a string of length 0, then $w^R = w = e$
If w is a string of length n+1 > 0, then w = ua for some $a \in \Sigma$, and $w^R = a u^R$.
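The recursive definitions of string power and reversal can be transcribed almost word for word. This is a sketch for illustration, with the empty Python string playing the role of e and hypothetical function names; since w* is infinite for nonempty w, only a finite slice of it is built.

```python
def power(w, i):
    """w^i by the recursive definition: w^0 = e, w^(i+1) = w^i w."""
    return "" if i == 0 else power(w, i - 1) + w

def star_up_to(w, n):
    """The finite part {w^0, w^1, ..., w^n} of the set w*."""
    return {power(w, i) for i in range(n + 1)}

def reverse(w):
    """w^R by the recursive definition: e^R = e, (ua)^R = a u^R."""
    return "" if w == "" else w[-1] + reverse(w[:-1])

bangbang = power("bang", 2)
```

In everyday Python one would write `w * i` and `w[::-1]` instead; the recursive forms are shown only because they follow the definitions step by step.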

3. Languages
If $\Sigma$ is an alphabet, then $\Sigma^*$ is the set of all strings over $\Sigma$.
Language: any set of strings over an alphabet $\Sigma$, i.e. any subset of $\Sigma^*$.
$\Sigma^*$ is a countably infinite set. Its elements can be ordered in the following way:
a. The alphabet is a finite set, so we can order the symbols in some way.
b. The set $\Sigma^*$ can be partitioned into disjoint sets with respect to the length of the strings (there
are infinitely many strings, however each string has a finite length).
c. For each $k \geq 0$ we enumerate all strings of length k before all strings of length k+1. This
means that we first order the strings of length 0 (this is the empty string), then strings of
length 1, then of length 2, etc.
d. The strings of length k (there are $n^k$ of them, where n is the number of symbols in the alphabet)
are enumerated lexicographically:
$a_{i_1} a_{i_2} \dots a_{i_k}$ precedes $a_{j_1} a_{j_2} \dots a_{j_k}$
if for some m, $0 \leq m \leq k-1$, we have $i_h = j_h$ for $h = 1, \dots, m$
and $i_{m+1} < j_{m+1}$. Note that $i_h = j_h$ means that $a_{i_h}$ is the same symbol as $a_{j_h}$.
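The ordering described in steps a to d can be turned into a generator that visits every string exactly once. The sketch below assumes the alphabet is passed as an ordered sequence of single-character symbols; `enumerate_strings` is a hypothetical name.

```python
from itertools import count, islice, product

def enumerate_strings(alphabet):
    """Yield every string over `alphabet`: shorter strings first,
    strings of equal length in lexicographic order."""
    symbols = list(alphabet)      # the list order is the chosen symbol order (step a)
    for k in count(0):            # lengths 0, 1, 2, ... (steps b and c)
        for tup in product(symbols, repeat=k):   # lexicographic within length k (step d)
            yield "".join(tup)

# The first seven binary strings in this order.
first_seven = list(islice(enumerate_strings("01"), 7))
```

Every string appears at some finite position of this enumeration, which is exactly what countability of the set of all strings requires.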

4. Operations on languages
Languages are sets, so the operations union, intersection and difference are applicable. There are two
operations specific to languages:
Concatenation of languages
Concatenation of languages is defined in the following way:
If $L_1$ and $L_2$ are languages, then $L = L_1 \circ L_2$ (or simply $L_1 L_2$) is the set:
$L = \{ w \in \Sigma^* : w = x \circ y,\ x \in L_1,\ y \in L_2 \}$
i.e. L consists of all possible concatenations between strings in $L_1$ and strings in $L_2$.
Concatenation of languages corresponds to the Cartesian product of sets.
Kleene star of a language L: the set of all strings obtained by concatenating zero or more strings from L. It
is denoted by L*.

If we consider $\Sigma$ as a language, then $\Sigma^*$ would be the Kleene star of that language.
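Both language operations can be sketched for finite languages. The code assumes languages are Python sets of strings, and it builds only the part of L* that uses at most n factors, since L* itself is usually infinite; the names are illustrative.

```python
def concat(L1, L2):
    """Concatenation: one string x + y for every pair (x, y) in L1 x L2."""
    return {x + y for x in L1 for y in L2}

def kleene_star_up_to(L, n):
    """Strings obtained by concatenating at most n strings from L."""
    result = {""}     # zero factors give the empty string
    layer = {""}
    for _ in range(n):
        layer = concat(layer, L)   # strings built from exactly one more factor
        result |= layer
    return result

# Different pairs can yield the same string: a+bb and ab+b both give abb.
collapsed = concat({"a", "ab"}, {"b", "bb"})
```

The example `collapsed` has three elements although the Cartesian product has four pairs, so the correspondence with the Cartesian product is a mapping from pairs onto strings rather than a one-to-one match.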

5. Problems
a. Is the set of all possible meaningful English sentences countable?
b. Is the set of all possible meaningless English sentences countable?
c. Define the relation < between words so that it describes the ordering of words in a dictionary.
Solution
The word $w_i$ precedes the word $w_j$ in the dictionary iff one of the following is true:
a. There is a nonempty string x, such that $w_j = w_i x$.
This means that $w_i$ is a prefix of $w_j$, e.g. class and classroom.
b. $w_i = x a_i y_i$ and $w_j = x a_j y_j$, where x, $y_i$ and $y_j$ are strings (may be empty)
and $a_i$ and $a_j$ are letters in the alphabet such that $a_i < a_j$.
Examples:
car and cat: x = ca, $a_i$ = r, $a_j$ = t, r < t, $y_i$ and $y_j$ are empty.
result and theory: x is empty, $a_i$ = r, $a_j$ = t, r < t, $y_i$ = esult, $y_j$ = heory
stack and string: x = st, $a_i$ = a, $a_j$ = r, a < r, $y_i$ = ck, $y_j$ = ing

d. If the alphabet is {0,1} and L is the language containing strings of the type
0, 1, 01, 001, 0001, 011, 00011, 00001111111, ..., i.e. zeros to the left and 1s to the right,
how can this language be formally defined?
Solution
We can describe the language L as the following set:
$L = \{ 0^n 1^m : n \geq 0,\ m \geq 0 \}$
Note that this definition assumes that the empty string is a member of the language, while
the problem did not say this explicitly. The convention is to assume the empty string to be in
the language, and only if we want to consider a language without the empty string, to say
this explicitly.
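The solutions to problems c and d can be checked mechanically. The sketch below assumes words are Python strings whose letters are ordered by the built-in character comparison; `precedes` and `in_L` are hypothetical names, and the regular expression is one possible way to test membership, not part of the original solution.

```python
import re

def precedes(wi, wj):
    """True if wi comes strictly before wj in dictionary order."""
    # Case a: wi is a proper prefix of wj.
    if len(wi) < len(wj) and wj.startswith(wi):
        return True
    # Case b: after a common prefix x, the next letter of wi is the smaller one.
    for ai, aj in zip(wi, wj):
        if ai != aj:
            return ai < aj
    return False

def in_L(w):
    """Membership in L = {0^n 1^m : n >= 0, m >= 0}: zeros first, then ones."""
    return re.fullmatch(r"0*1*", w) is not None
```

For plain lower-case words `precedes(u, v)` agrees with Python's own `u < v`, and `in_L("")` is true, matching the convention that the empty string belongs to L.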

Learning goals

Know the definitions of the concepts described above

Know the operations on strings and languages

Exam-like problems
1. Concatenation of languages corresponds to Cartesian products of sets. Explain why.
2. Give an example of a string w such that $w^3 = w^4$
3. Give an example of a string w such that $w^i = w^{i+1}$, where i is nonnegative.
4. Let L1 = {a}*, L2 = {b}*. Give the set representation for L1 L2, L1 L2


Created by Lydia Sinapova