
In computer science, in the area of formal language theory, frequent use is made of a variety
of string functions; however, the notation used is different from that used for computer
programming, and some commonly used functions in the theoretical realm are rarely used when
programming.

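## Strings and languages

A string is a finite sequence of characters. The empty string is denoted by $\varepsilon$. The concatenation of two strings $s$ and $t$ is denoted by $s \cdot t$, or shorter by $st$. Concatenating with the empty string makes no difference: $s \cdot \varepsilon = s = \varepsilon \cdot s$. Concatenation of strings is associative: $s \cdot (t \cdot u) = (s \cdot t) \cdot u$.

For example, $(\langle \text{b} \rangle \cdot \langle \text{l} \rangle) \cdot (\varepsilon \cdot \langle \text{ah} \rangle) = \langle \text{bl} \rangle \cdot \langle \text{ah} \rangle = \langle \text{blah} \rangle$.

A language is a finite or infinite set of strings. Besides the usual set operations like union, intersection etc., concatenation can be applied to languages: if both $S$ and $T$ are languages, their concatenation $S \cdot T$ is defined as the set of concatenations of any string from $S$ and any string from $T$, formally
$$S \cdot T = \{ s \cdot t \mid s \in S \wedge t \in T \}.$$

The language $\{\varepsilon\}$ consisting of just the empty string is to be distinguished from the empty language $\{\}$. Concatenating any language with the former does not change it: $S \cdot \{\varepsilon\} = S = \{\varepsilon\} \cdot S$, while concatenating with the latter always yields the empty language: $S \cdot \{\} = \{\} = \{\} \cdot S$. Concatenation of languages is associative: $S \cdot (T \cdot U) = (S \cdot T) \cdot U$.

For example, abbreviating $D = \{\langle 0 \rangle, \langle 1 \rangle, \dots, \langle 9 \rangle\}$, the set of all strings of exactly three decimal digits is obtained as $D \cdot D \cdot D$. The set of all decimal numbers of arbitrary length is an example of an infinite language.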
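To make the identities above concrete, here is a minimal Python sketch for finite languages only. It assumes strings are Python `str` values, $\varepsilon$ is `""`, and a finite language is a Python `set`; the helper name `concat` is hypothetical. Infinite languages, such as the set of all decimal numbers, cannot be stored this way.

```python
from itertools import product

EPS = ""  # the empty string epsilon


def concat(S: set[str], T: set[str]) -> set[str]:
    """Concatenation S . T = { s t | s in S and t in T } of two finite languages."""
    return {s + t for s, t in product(S, T)}


# D = {0, 1, ..., 9}; D . D . D is the set of all strings of exactly three digits.
D = {str(d) for d in range(10)}
three_digit = concat(concat(D, D), D)
print(len(three_digit))  # 1000

S, T, U = {"a", "bc"}, {"d"}, {EPS, "e"}
assert concat(S, {EPS}) == S == concat({EPS}, S)            # {eps} leaves S unchanged
assert concat(S, set()) == set() == concat(set(), S)        # {} always yields {}
assert concat(S, concat(T, U)) == concat(concat(S, T), U)   # associativity
```

The asserts at the end restate the identity, annihilator, and associativity laws for one sample choice of $S$, $T$, $U$; they illustrate the laws rather than prove them.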

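## Alphabet of a string

The alphabet of a string is the set of all of the characters that occur in a particular string. If $s$ is a string, its alphabet is denoted by $\operatorname{Alph}(s)$. The alphabet of a language $S$ is the set of all characters that occur in any string of $S$, formally:
$$\operatorname{Alph}(S) = \bigcup_{s \in S} \operatorname{Alph}(s).$$
For example, $\operatorname{Alph}(\langle \text{cacao} \rangle) = \{\langle \text{a} \rangle, \langle \text{c} \rangle, \langle \text{o} \rangle\}$, and the above $D$ is the alphabet of the above language $D \cdot D \cdot D$, and also of the language of all decimal numbers.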
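Both definitions are direct set constructions. A minimal Python sketch, assuming strings are `str` values and finite languages are sets; `alph` and `alph_language` are hypothetical helper names:

```python
def alph(s: str) -> set[str]:
    """Alph(s): the set of characters occurring in the string s."""
    return set(s)


def alph_language(S: set[str]) -> set[str]:
    """Alph(S): the union of Alph(s) over all strings s in the language S."""
    result: set[str] = set()
    for s in S:
        result |= alph(s)
    return result


print(sorted(alph("cacao")))                 # ['a', 'c', 'o']
print(sorted(alph_language({"12", "907"})))  # ['0', '1', '2', '7', '9']
```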

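## String substitution

Let $L$ be a language, and let $\Sigma$ be its alphabet. A string substitution or simply a substitution is a mapping $f$ that maps letters in $\Sigma$ to languages (possibly over a different alphabet). Thus, for example, given a letter $a \in \Sigma$, one has $f(a) = L_a$, where $L_a \subseteq \Delta^*$ is some language whose alphabet is $\Delta$. This mapping may be extended to strings as
$$f(\varepsilon) = \{\varepsilon\}$$
for the empty string $\varepsilon$, and
$$f(sa) = f(s) \cdot f(a)$$
for every string $s \in \Sigma^*$ and letter $a \in \Sigma$. String substitutions may be extended to entire languages as $f(L) = \bigcup_{s \in L} f(s)$.[1]

Regular languages are closed under string substitution. That is, if each letter in the alphabet of a regular language is replaced by a regular language, the result is still a regular language.[2] Similarly, context-free languages are closed under string substitution.[3][note 1]

A simple example is the conversion $f_{uc}(\cdot)$ to upper case, which may be defined, e.g., as follows:

| letter $x$ | mapped to language $f_{uc}(x)$ | remark |
|---|---|---|
| $\langle \text{a} \rangle$ | $\{\langle \text{A} \rangle\}$ | map a lowercase letter to the corresponding uppercase letter |
| $\langle \text{A} \rangle$ | $\{\langle \text{A} \rangle\}$ | map an uppercase letter to itself |
| $\langle \text{ß} \rangle$ | $\{\langle \text{SS} \rangle\}$ | no uppercase letter available; map to a two-letter string |
| $\langle 0 \rangle$ | $\{\varepsilon\}$ | map a digit to the empty string |
| $\langle ! \rangle$ | $\{\}$ | forbid punctuation; map it to the empty language |
| ... | ... | similarly for the other characters |

Extended to languages, this gives, for example, $f_{uc}(\{\langle \text{Straße} \rangle, \langle \text{u2} \rangle, \langle \text{Go!} \rangle\}) = \{\langle \text{STRASSE} \rangle\} \cup \{\langle \text{U} \rangle\} \cup \{\} = \{\langle \text{STRASSE} \rangle, \langle \text{U} \rangle\}$.

Another example is the conversion of an EBCDIC-encoded string to ASCII.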
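The extension of a substitution from letters to strings and languages can be sketched as follows. This is a minimal Python illustration, assuming finite languages are sets of `str` and that the table above is encoded by the hypothetical function `f_uc` (ASCII letters, digits, and ß only); `subst_string` and `subst_language` are likewise hypothetical names.

```python
EPS = ""  # the empty string epsilon


def concat(S: set[str], T: set[str]) -> set[str]:
    """Concatenation of two finite languages."""
    return {s + t for s in S for t in T}


def f_uc(x: str) -> set[str]:
    """Hypothetical encoding of the upper-casing table (ASCII letters, digits, ß)."""
    if x == "ß":
        return {"SS"}          # no single uppercase letter: map to a two-letter string
    if x.isascii() and x.isalpha():
        return {x.upper()}     # a -> {A}, A -> {A}, ...
    if x.isascii() and x.isdigit():
        return {EPS}           # digits -> {eps}
    return set()               # punctuation and anything else -> the empty language


def subst_string(f, s: str) -> set[str]:
    """Extend f to strings: f(eps) = {eps}, f(s a) = f(s) . f(a)."""
    result = {EPS}
    for a in s:
        result = concat(result, f(a))
    return result


def subst_language(f, L: set[str]) -> set[str]:
    """Extend f to languages: f(L) is the union of f(s) over all s in L."""
    out: set[str] = set()
    for s in L:
        out |= subst_string(f, s)
    return out


print(sorted(subst_language(f_uc, {"Straße", "u2", "Go!"})))  # ['STRASSE', 'U']
print(sorted(subst_string(lambda c: {"0", "1"}, "ab")))       # ['00', '01', '10', '11']
```

The second call uses a substitution that maps every letter to a two-string language, showing that, unlike $f_{uc}$, a substitution can turn one string into several.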

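## String homomorphism

A string homomorphism (often referred to simply as a homomorphism in formal language theory) is a string substitution such that each letter is replaced by a single string. That is, $f(a) = s$, where $s$ is a string, for each letter $a$.[note 2][4]

String homomorphisms are monoid morphisms on the free monoid, preserving the empty string and the binary operation of string concatenation. Given a language $L$, the set $f(L)$ is called the homomorphic image of $L$. The inverse homomorphic image of a string $s$ is defined as
$$f^{-1}(s) = \{ w \mid f(w) = s \},$$
while the inverse homomorphic image of a language $L$ is defined as
$$f^{-1}(L) = \{ s \mid f(s) \in L \}.$$
In general, $f(f^{-1}(L)) \neq L$, while one does have
$$f(f^{-1}(L)) \subseteq L \quad \text{and} \quad L \subseteq f^{-1}(f(L))$$
for any language $L$.

The class of regular languages is closed under homomorphisms and inverse homomorphisms.[5] Similarly, the context-free languages are closed under homomorphisms.[note 3]

A string homomorphism is said to be $\varepsilon$-free (or e-free) if $f(a) \neq \varepsilon$ for all $a$ in the alphabet $\Sigma$. Simple single-letter substitution ciphers are examples of ($\varepsilon$-free) string homomorphisms.

An example string homomorphism $g_{uc}$ can also be obtained by defining it similarly to the above substitution: $g_{uc}(\langle \text{a} \rangle) = \langle \text{A} \rangle$, ..., $g_{uc}(\langle 0 \rangle) = \varepsilon$, but leaving $g_{uc}$ undefined on punctuation characters. Examples of inverse homomorphic images are $g_{uc}^{-1}(\{\langle \text{SSS} \rangle\}) \supseteq \{\langle \text{sss} \rangle, \langle \text{sß} \rangle, \langle \text{ßs} \rangle\}$, since $g_{uc}(\langle \text{sss} \rangle) = g_{uc}(\langle \text{sß} \rangle) = g_{uc}(\langle \text{ßs} \rangle) = \langle \text{SSS} \rangle$, and $\langle \text{a} \rangle \in g_{uc}^{-1}(\{\langle \text{A} \rangle, \langle \text{bb} \rangle\})$, since $g_{uc}(\langle \text{a} \rangle) = \langle \text{A} \rangle$, while $\langle \text{bb} \rangle$ cannot be reached by $g_{uc}$. Both preimages are in fact infinite: $g_{uc}$ maps uppercase letters to themselves and erases digits, so strings such as $\langle \text{SSS} \rangle$, $\langle \text{0sß} \rangle$, or $\langle \text{A0} \rangle$ belong to them as well.

For the latter language, $g_{uc}(g_{uc}^{-1}(\{\langle \text{A} \rangle, \langle \text{bb} \rangle\})) = \{\langle \text{A} \rangle\} \neq \{\langle \text{A} \rangle, \langle \text{bb} \rangle\}$. The homomorphism $g_{uc}$ is not $\varepsilon$-free, since it maps, e.g., $\langle 0 \rangle$ to $\varepsilon$.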
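Because inverse homomorphic images are usually infinite, code can only inspect a finite slice of them. The following minimal Python sketch assumes the hypothetical encoding `g_uc` (returning `None` where the homomorphism is undefined) and enumerates all candidate preimages over a chosen alphabet up to a length bound; `apply_hom` and `inverse_image` are hypothetical names, and the brute-force search is only meant for tiny examples.

```python
from itertools import product
from typing import Callable, Optional

EPS = ""  # the empty string epsilon


def g_uc(x: str) -> Optional[str]:
    """Hypothetical g_uc: letters to uppercase, ß to SS, digits to eps, punctuation undefined."""
    if x == "ß":
        return "SS"
    if x.isascii() and x.isalpha():
        return x.upper()
    if x.isascii() and x.isdigit():
        return EPS
    return None


def apply_hom(g: Callable[[str], Optional[str]], s: str) -> Optional[str]:
    """g(s) = g(a1) ... g(an); None if g is undefined on some letter of s."""
    parts = []
    for a in s:
        image = g(a)
        if image is None:
            return None
        parts.append(image)
    return "".join(parts)


def inverse_image(g, L: set[str], alphabet: str, max_len: int) -> set[str]:
    """Finite slice of g^-1(L): all w over `alphabet` with |w| <= max_len and g(w) in L."""
    return {
        w
        for n in range(max_len + 1)
        for w in map("".join, product(alphabet, repeat=n))
        if apply_hom(g, w) in L
    }


print(sorted(inverse_image(g_uc, {"SSS"}, "sß", 3)))   # ['sss', 'sß', 'ßs']
pre = inverse_image(g_uc, {"A", "bb"}, "ab", 2)
print(pre, {apply_hom(g_uc, w) for w in pre})          # {'a'} {'A'}
```

Widening the search alphabet to `"sS0ß"` makes the slice of $g_{uc}^{-1}(\{\langle \text{SSS} \rangle\})$ also contain `"SSS"` and `"0sß"`, which illustrates why the full preimage is infinite.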

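## String projection

If $s$ is a string, and $\Sigma$ is an alphabet, the string projection of $s$ is the string that results by removing all letters that are not in $\Sigma$. It is written as $\pi_\Sigma(s)$. It is formally defined by removal of letters from the right-hand side:
$$\pi_\Sigma(s) = \begin{cases} \varepsilon & \text{if } s = \varepsilon, \\ \pi_\Sigma(t) & \text{if } s = ta \text{ and } a \notin \Sigma, \\ \pi_\Sigma(t)\,a & \text{if } s = ta \text{ and } a \in \Sigma. \end{cases}$$
Here $\varepsilon$ denotes the empty string. The projection of a string is essentially the same as a projection in relational algebra.

String projection may be promoted to the projection of a language. Given a formal language $L$, its projection is given by
$$\pi_\Sigma(L) = \{ \pi_\Sigma(s) \mid s \in L \}.$$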
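The recursive definition translates case by case into code. A minimal Python sketch, assuming $\Sigma$ is given as a set of one-character strings; `project` and `project_language` are hypothetical names. The recursion mirrors the definition; for long strings the equivalent one-liner `"".join(c for c in s if c in sigma)` avoids Python's recursion limit.

```python
def project(s: str, sigma: set[str]) -> str:
    """pi_Sigma(s), following the recursive definition (removal from the right)."""
    if s == "":
        return ""                        # pi(eps) = eps
    t, a = s[:-1], s[-1]                 # s = t a
    if a in sigma:
        return project(t, sigma) + a     # a in Sigma: keep it
    return project(t, sigma)             # a not in Sigma: drop it


def project_language(L: set[str], sigma: set[str]) -> set[str]:
    """pi_Sigma(L) = { pi_Sigma(s) | s in L }."""
    return {project(s, sigma) for s in L}


print(project("abracadabra", {"a", "b"}))                           # abaaaba
print(sorted(project_language({"abc", "cab", "xyz"}, {"a", "b"})))  # ['', 'ab']
```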

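## Right quotient

The right quotient of a letter $a$ from a string $s$ is the truncation of the letter $a$ in the string $s$, from the right-hand side. It is denoted as $s / a$. If the string does not have $a$ on the right-hand side, the result is the empty string. Thus:
$$(sa)/b = \begin{cases} s & \text{if } a = b, \\ \varepsilon & \text{if } a \neq b. \end{cases}$$
The quotient of the empty string may be taken: $\varepsilon / a = \varepsilon$. Similarly, given a subset $S \subseteq M$ of a monoid $M$, one may define the quotient subset as
$$S/a = \{ s \in M \mid sa \in S \}.$$
Left quotients may be defined similarly, with operations taking place on the left of a string.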
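A minimal Python sketch of the string and subset quotients, assuming the monoid is the free monoid of Python strings and $S$ is a finite set; all function names are hypothetical. Note that `right_quotient` follows the convention stated above (result $\varepsilon$ when $s$ does not end in $a$), which differs from treating the quotient as undefined in that case.

```python
def right_quotient(s: str, a: str) -> str:
    """s / a for a single letter a: drop a trailing a; otherwise (including s = eps) return eps."""
    assert len(a) == 1, "a must be a single letter"
    return s[:-1] if s.endswith(a) else ""


def quotient_subset(S: set[str], a: str) -> set[str]:
    """S / a = { u | u a in S } for a finite subset S of the free monoid."""
    assert len(a) == 1, "a must be a single letter"
    return {s[:-1] for s in S if s.endswith(a)}


def left_quotient_subset(S: set[str], a: str) -> set[str]:
    """Left-hand analogue: { u | a u in S }."""
    assert len(a) == 1, "a must be a single letter"
    return {s[1:] for s in S if s.startswith(a)}


print(right_quotient("cacao", "o"))                         # caca
print(sorted(quotient_subset({"ab", "b", "ba"}, "b")))      # ['', 'a']
print(sorted(left_quotient_subset({"ab", "b", "ba"}, "b")))  # ['', 'a']
```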

## Syntactic relation

The right quotient of a subset $S \subseteq M$ of a monoid $M$ defines an equivalence relation, called the right syntactic relation of $S$. It is given by
$$\sim_S \;=\; \{ (s, t) \in M \times M \mid S/s = S/t \},$$
where $S/m = \{ u \in M \mid um \in S \}$ extends the quotient from letters to arbitrary elements $m \in M$. The relation is clearly of finite index (has a finite number of equivalence classes) if and only if the family of right quotients is finite; that is, if $\{ S/m \mid m \in M \}$ is finite. In this case, $S$ is a recognizable language, that is, a language that can be recognized by a finite-state automaton. This is discussed in greater detail in the article on syntactic monoids.
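For a finite language, the classes of $\sim_S$ can be explored by brute force. The following minimal Python sketch assumes $M$ is the free monoid over a small alphabet and only examines strings up to a length bound, so it shows a finite window of the relation rather than the whole relation; `quotient` and `syntactic_classes` are hypothetical names.

```python
from itertools import product


def quotient(S: set[str], m: str) -> frozenset[str]:
    """S / m = { u | u m in S } for a finite language S and any string m (m = eps gives S)."""
    return frozenset(s[: len(s) - len(m)] for s in S if s.endswith(m))


def syntactic_classes(S: set[str], alphabet: str, max_len: int) -> dict:
    """Group all strings of length <= max_len by their right quotient S/m.
    Two strings land in the same group exactly when they are related by ~_S."""
    classes: dict[frozenset[str], list[str]] = {}
    for n in range(max_len + 1):
        for m in map("".join, product(alphabet, repeat=n)):
            classes.setdefault(quotient(S, m), []).append(m)
    return classes


for q, members in syntactic_classes({"ab", "b"}, "ab", 2).items():
    print(sorted(q), members)
```

For $S = \{\langle \text{ab} \rangle, \langle \text{b} \rangle\}$ this yields four classes. For any finite $S$, a non-empty quotient $S/m$ requires $m$ to be a suffix of some string in $S$, so only finitely many quotients exist, consistent with finite languages being recognizable.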

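## Right cancellation

The right cancellation of a letter $a$ from a string $s$ is the removal of the first occurrence of the letter $a$ in the string $s$, starting from the right-hand side. It is denoted as $s \div a$ and is recursively defined as
$$(sa) \div b = \begin{cases} s & \text{if } a = b, \\ (s \div b)\,a & \text{if } a \neq b. \end{cases}$$
The empty string is always cancellable: $\varepsilon \div a = \varepsilon$.

Right cancellation and projection commute, since the rightmost occurrence of a letter $a \in \Sigma$ in $s$ is also its rightmost occurrence in $\pi_\Sigma(s)$, and cancelling a letter $a \notin \Sigma$ changes neither side:
$$\pi_\Sigma(s) \div a = \pi_\Sigma(s \div a).$$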
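A minimal Python sketch of right cancellation that follows the recursive definition, plus a brute-force check of the commutation law on all short strings over a three-letter alphabet. Strings are assumed to be Python `str` values; `right_cancel` and `project` are hypothetical names. The check is an illustration on a finite sample, not a proof.

```python
from itertools import product


def right_cancel(s: str, a: str) -> str:
    """s ÷ a: remove the rightmost occurrence of the letter a (s unchanged if a does not occur)."""
    if s == "":
        return ""                          # eps ÷ a = eps
    t, last = s[:-1], s[-1]                # s = t last
    if last == a:
        return t                           # (t a) ÷ a = t
    return right_cancel(t, a) + last       # (t c) ÷ a = (t ÷ a) c  for c != a


def project(s: str, sigma: set[str]) -> str:
    """pi_Sigma(s): keep only the letters of s that lie in sigma."""
    return "".join(c for c in s if c in sigma)


print(right_cancel("banana", "n"))  # banaa
print(right_cancel("banana", "x"))  # banana

# Check pi(s) ÷ a == pi(s ÷ a) on all strings of length <= 4 over {a, b, c}.
sigma = {"a", "b"}
for n in range(5):
    for s in map("".join, product("abc", repeat=n)):
        for a in "abc":
            assert right_cancel(project(s, sigma), a) == project(right_cancel(s, a), sigma)
```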

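## Prefixes

The prefixes of a string form the set of all prefixes of that string, with respect to a given language:
$$\operatorname{Pref}_L(s) = \{ t \mid s = tu \text{ for } t, u \in \operatorname{Alph}(L)^* \},$$
where $s \in L$.

The prefix closure of a language is
$$\operatorname{Pref}(L) = \bigcup_{s \in L} \operatorname{Pref}_L(s) = \{ t \mid s = tu;\ s \in L;\ t, u \in \operatorname{Alph}(L)^* \}.$$
Example: if $L = \{\langle \text{abc} \rangle\}$, then $\operatorname{Pref}(L) = \{\varepsilon, \langle \text{a} \rangle, \langle \text{ab} \rangle, \langle \text{abc} \rangle\}$.

A language is called prefix closed if $\operatorname{Pref}(L) = L$.

The prefix closure operator is idempotent:
$$\operatorname{Pref}(\operatorname{Pref}(L)) = \operatorname{Pref}(L).$$
The prefix relation is a binary relation $\sqsubseteq$ such that $s \sqsubseteq t$ if and only if $s \in \operatorname{Pref}_L(t)$. This relation is a particular example of a prefix order.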
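A minimal Python sketch of prefixes, prefix closure, and the prefix relation for finite languages, assuming strings are `str` values; all function names are hypothetical. Since $s \in L$, every split $s = tu$ automatically has $t, u \in \operatorname{Alph}(L)^*$, so the code simply takes all slices `s[:i]`.

```python
def prefixes(s: str) -> set[str]:
    """Pref(s): all t with s = t u, including eps and s itself."""
    return {s[:i] for i in range(len(s) + 1)}


def prefix_closure(L: set[str]) -> set[str]:
    """Pref(L): the union of Pref(s) over all s in L."""
    out: set[str] = set()
    for s in L:
        out |= prefixes(s)
    return out


def is_prefix_closed(L: set[str]) -> bool:
    """L is prefix closed iff Pref(L) = L."""
    return prefix_closure(L) == L


def is_prefix(s: str, t: str) -> bool:
    """The prefix relation: s ⊑ t iff s is a prefix of t."""
    return t.startswith(s)


L = {"abc"}
P = prefix_closure(L)
print(sorted(P))                                 # ['', 'a', 'ab', 'abc']
print(prefix_closure(P) == P)                    # True (idempotence)
print(is_prefix_closed(L), is_prefix_closed(P))  # False True
```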

## Alphabets and Languages

Definitions

Operations on strings

Languages

Operations on languages

Problems

Learning goals
Exam-like problems

## 1. Some definitions and properties

Alphabet: A finite set of symbols.
E.g. {a,b,c,x,y,z}, {0,1}, {0,1,2,3,4,5,6,7,8,9}
String over an alphabet: A finite sequence of symbols from the alphabet.
E.g.: thisisastring - over {a,b,c,…,z}
01011 - over {0,1}
3786 - over {0,1,2,3,4,5,6,7,8,9}
A string with one symbol only = the symbol itself
Empty string - no symbols, notation: e
Note: we use the letters a, b, c, …, w, x, y, z both for naming strings and for writing instances of strings.
Usually for names of strings we use the last letters: w, x, y, z.
Thus x = abc means that abc is a string and we call it x.
Length of a string - its length as a sequence (the number of symbols)
If w = abcd, |w| = 4
If w = classroom, |w| = 9
We can match a position in a string with the symbol there:
If w = classroom, w(3) = a, w(4) = s, and w(5) = s
To be able to distinguish between identical symbols at different positions, we refer to them as different occurrences of the
same symbol.

## 2. Operations on strings

Concatenation: combines two strings by putting them one after the other.
E.g. x = abc, y = mnop, then x ∘ y = abcmnop, or simply xy = abcmnop
The concatenation of the empty string with any other string gives the string itself:
xe = ex = x
Substring: If w is a string, then v is a substring of w if there exist strings x and y such that w = xvy.
Here x is called a prefix, and y a suffix, of w.
The i-th power of a string (its i-fold concatenation with itself) is defined in the following way:
$w^0 = e$
$w^{i+1} = w^i w$ for each $i \ge 0$.
So $w^1 = w$, and $\text{bang}^2 = \text{bangbang}$.
Kleene star operation on strings: Let w be a string. w* is the set of strings obtained
by applying any number of concatenations of w with itself, including the empty string.
Example: a* = { e, a, aa, aaa, aaaa, aaaaa, … }
Reversal of a string w, denoted $w^R$, is the string spelled backwards.
Formal definition:
If w is a string of length 0, then $w^R = w = e$.
If w is a string of length n+1 > 0, then w = ua for some a ∈ Σ, and $w^R = a u^R$.
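The recursive definitions above translate almost line by line into code. A minimal Python sketch, assuming strings are Python `str` values with e represented as `""`; `power`, `reverse`, and `star_prefix` are hypothetical helper names (Python's built-ins `w * i` and `w[::-1]` do the same job as the first two). Since w* is infinite, only an initial part of it can be listed.

```python
def power(w: str, i: int) -> str:
    """w^i by the recursive definition: w^0 = e, w^(i+1) = w^i w."""
    result = ""                # w^0 = e, the empty string
    for _ in range(i):
        result = result + w    # w^(k+1) = w^k w
    return result


def reverse(w: str) -> str:
    """w^R: e^R = e, and (u a)^R = a u^R."""
    if w == "":
        return ""
    u, a = w[:-1], w[-1]       # w = u a
    return a + reverse(u)


def star_prefix(w: str, n: int) -> list[str]:
    """The first n + 1 elements of the infinite set w*: w^0, w^1, ..., w^n."""
    return [power(w, i) for i in range(n + 1)]


w = "classroom"
print(len(w), w[3 - 1])       # 9 a  (positions in the notes start at 1, Python indices at 0)
print(power("bang", 2))       # bangbang
print(reverse(w))             # moorssalc
print(star_prefix("a", 3))    # ['', 'a', 'aa', 'aaa']
```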

## 3. Languages

If Σ is an alphabet, then Σ* is the set of all strings over Σ.
Language: any set of strings over an alphabet Σ, i.e. any subset of Σ*.
For a nonempty alphabet Σ, Σ* is a countably infinite set. Its elements can be ordered in the following way:
a. The alphabet Σ is a finite set, so we can order its symbols in some way.
b. The set Σ* can be partitioned into disjoint sets with respect to the length of the strings (there
are infinitely many strings, but each string has a finite length).
c. For each k ≥ 0, we enumerate all strings of length k before all strings of length k+1. This
means that we first order the strings of length 0 (this is only the empty string), then the strings of
length 1, then those of length 2, etc.
d. The strings of length k (there are $n^k$ of them if Σ = {a_1, …, a_n}) are enumerated lexicographically:
$a_{i_1}a_{i_2}\cdots a_{i_k}$ precedes $a_{j_1}a_{j_2}\cdots a_{j_k}$
if for some m with $0 \le m \le k-1$ we have $i_h = j_h$ for $h = 1, \dots, m$
and $i_{m+1} < j_{m+1}$. Note that $i_h = j_h$ means that $a_{i_h}$ is the same symbol as $a_{j_h}$.
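The ordering in steps a to d (often called shortlex or length-lexicographic order) can be produced by a generator. A minimal Python sketch, assuming a nonempty alphabet given as a string whose character order is the chosen symbol order; `enumerate_sigma_star` is a hypothetical name. `itertools.product(..., repeat=k)` yields tuples in exactly the lexicographic order of step d. Every string appears at some finite position, which is what makes Σ* countable.

```python
from itertools import count, islice, product
from typing import Iterator


def enumerate_sigma_star(alphabet: str) -> Iterator[str]:
    """Yield every string over a nonempty `alphabet` exactly once: shorter strings
    first (step c), equal-length strings lexicographically (step d), using the
    order in which the symbols are listed in `alphabet` (step a)."""
    for k in count(0):
        for letters in product(alphabet, repeat=k):
            yield "".join(letters)


print(list(islice(enumerate_sigma_star("ab"), 7)))
# ['', 'a', 'b', 'aa', 'ab', 'ba', 'bb']
```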

## 4. Operations on languages

Languages are sets, so the operations union, intersection and difference are applicable. There are two
operations specific to languages:
Concatenation of languages
Concatenation of languages is defined in the following way:
If $L_1$ and $L_2$ are languages, then $L = L_1 \circ L_2$ (or simply $L_1 L_2$) is the set:
$L = \{ w \in \Sigma^* : w = xy,\ x \in L_1,\ y \in L_2 \}$
i.e. L consists of all possible concatenations of strings in $L_1$ with strings in $L_2$.
Concatenation of languages corresponds to the Cartesian product of sets: each pair (x, y) ∈ $L_1 \times L_2$ yields the string xy, although different pairs may yield the same string.
Kleene star of a language L: the set of all strings obtained by concatenating zero or more strings from L. It
is denoted by L*.
If we consider Σ as a language (the set of its one-symbol strings), then Σ* is the Kleene star of that language.
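A minimal Python sketch of both operations for finite languages, assuming languages are sets of `str` with e as `""`; `concat` and `star_up_to` are hypothetical names. Since L* is usually infinite, `star_up_to` only collects its strings up to a length bound. The example shows the Cartesian-product correspondence: four pairs in $L_1 \times L_2$ produce only three strings, because ("a", "bb") and ("ab", "b") both give "abb".

```python
from itertools import product


def concat(L1: set[str], L2: set[str]) -> set[str]:
    """L1 L2 = { xy : x in L1, y in L2 }: one string per pair of the Cartesian product."""
    return {x + y for x, y in product(L1, L2)}


def star_up_to(L: set[str], max_len: int) -> set[str]:
    """The strings of L* of length <= max_len (L* itself is usually infinite)."""
    result = {""}      # zero strings concatenated: the empty string e
    frontier = {""}
    while frontier:
        # extend by one more string of L, keeping only new strings that are short enough
        frontier = {w + x for w in frontier for x in L if len(w) + len(x) <= max_len} - result
        result |= frontier
    return result


L1, L2 = {"a", "ab"}, {"b", "bb"}
print(len(set(product(L1, L2))), sorted(concat(L1, L2)))  # 4 ['ab', 'abb', 'abbb']
print(sorted(star_up_to({"ab"}, 4)))                       # ['', 'ab', 'abab']
print(len(star_up_to({"0", "1"}, 2)))                      # 7
```

Subtracting `result` from each new frontier guarantees termination even when L contains e, since appending e never produces a new string.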

## 5. Problems

a. Is the set of all possible meaningful English sentences countable?
b. Is the set of all possible meaningless English sentences countable?
c. Define the relation < between words so that it describes the ordering of words in a dictionary.
Solution
The word $w_i$ precedes the word $w_j$ in the dictionary iff one of the following is true:
a. There is a nonempty string x such that $w_j = w_i x$.
This means that $w_i$ is a proper prefix of $w_j$, e.g. class and classroom.
b. $w_i = x a_i y_i$ and $w_j = x a_j y_j$, where x, $y_i$ and $y_j$ are strings (possibly empty)
and $a_i$ and $a_j$ are letters of the alphabet such that $a_i < a_j$.
Examples:
car and cat: x = ca, $a_i$ = r, $a_j$ = t, r < t, $y_i$ and $y_j$ are empty.
result and theory: x is empty, $a_i$ = r, $a_j$ = t, r < t, $y_i$ = esult, $y_j$ = heory
stack and string: x = st, $a_i$ = a, $a_j$ = r, a < r, $y_i$ = ck, $y_j$ = ing
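Cases a and b of the solution can be checked mechanically. A minimal Python sketch, assuming words are Python `str` values and letters are compared by their character codes (which gives the usual alphabetical order for lowercase ASCII letters); `precedes` is a hypothetical name. For such words it agrees with Python's built-in `<` on strings. Note that this dictionary order differs from the length-first enumeration in section 3: here "ab" precedes "b", while there "b" comes first.

```python
def precedes(wi: str, wj: str) -> bool:
    """True iff wi comes strictly before wj in dictionary order (cases a and b)."""
    for ci, cj in zip(wi, wj):
        if ci != cj:
            return ci < cj          # case b: first differing letters ai = ci, aj = cj
    return len(wi) < len(wj)        # case a: wi is a proper prefix of wj


pairs = [("class", "classroom"), ("car", "cat"), ("result", "theory"), ("stack", "string")]
print(all(precedes(wi, wj) for wi, wj in pairs))   # True
print(any(precedes(wj, wi) for wi, wj in pairs))   # False
```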

d. If the alphabet is {0,1} and L is the language containing strings of the type
0, 1, 01, 001, 0001, 011, 00011, 00001111111, i.e. zeros to the left and 1s to the right,
how can this language be formally defined?
Solution
We can describe the language L as the following set:
$L = \{ 0^n 1^m : n \ge 0,\ m \ge 0 \}$
Note that this definition assumes that the empty string is a member of the language (take n = m = 0), while
the problem did not say this explicitly. The convention is to assume the empty string to be in
the language, and to say so explicitly only if we want to consider a language without the empty string.
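The set definition can be cross-checked against the informal description "zeros to the left and 1s to the right". A minimal Python sketch, assuming strings are `str` values; `in_L` and `members_up_to` are hypothetical names. The membership test uses the observation that a string over {0,1} has that shape exactly when no 1 is immediately followed by a 0.

```python
from itertools import product


def in_L(w: str) -> bool:
    """Membership in L = { 0^n 1^m : n >= 0, m >= 0 }: a string over {0, 1}
    with no substring "10", i.e. no 1 ever followed by a 0."""
    return set(w) <= {"0", "1"} and "10" not in w


def members_up_to(max_len: int) -> set[str]:
    """{ 0^n 1^m : n + m <= max_len }, built directly from the set definition."""
    return {"0" * n + "1" * m for n in range(max_len + 1) for m in range(max_len + 1 - n)}


print(in_L(""), in_L("00001111111"), in_L("010"))   # True True False

# Every binary string of length <= 4 is in L exactly when the set definition generates it.
all_short = {"".join(p) for k in range(5) for p in product("01", repeat=k)}
assert {w for w in all_short if in_L(w)} == members_up_to(4)
```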

## Learning goals

Know the operations on strings and languages

## Exam-like problems
1. Concatenation of languages corresponds to Cartesian products of sets. Explain why.
2. Give an example of a string w such that $w^3 = w^4$.
3. Give an example of a string w such that $w^i = w^{i+1}$, where i is a nonnegative integer.
4. Let L1 = {a}*, L2 = {b}*. Give the set representation for L1 L2, L1 L2