On the Language of Primitive Partial Words

Ananda Chandra Nayak and Kalpesh Kapoor
Department of Mathematics
Indian Institute of Technology Guwahati, Guwahati, India
{n.ananda,kalpesh}@iitg.ernet.in

Abstract. A partial word is a word that may contain holes, also known
as "do not know" symbols; each hole can be replaced by any letter
of the underlying alphabet. We study the relation between the language
of primitive partial words and the conventional language classes of the
Chomsky hierarchy, viz. regular, linear and deterministic context-free. We
prove that the language of primitive partial words over an alphabet
having at least two letters is not regular, not linear and not
deterministic context-free. We also give a 2DPDA that recognizes the
language of primitive partial words.
Keywords: Combinatorics on words, Partial word, Primitive word,
Regular, Linear, Deterministic Context-free, 2DPDA

1 Introduction

Let $\Sigma$ be a finite alphabet. We assume that $\Sigma$ is a nontrivial alphabet, which
means that it has at least two distinct symbols. A total word (referred to simply as a word) $u = a_0 a_1 a_2 \cdots a_{n-1}$ of length $n$ can be defined by a total function
$u : \{0, \ldots, n-1\} \to \Sigma$ where each $a_i \in \Sigma$. We use string and word interchangeably. The set $\Sigma^*$ is the free monoid generated by $\Sigma$, which contains all the
strings. The length of a string $u$ is the number of symbols contained in it and is
denoted by $|u|$. The string of length zero (also referred to as the empty string)
is denoted by $\lambda$. The set of all words of length $n$ over $\Sigma$ is denoted by $\Sigma^n$. We
define $\Sigma^* = \bigcup_{n \in \mathbb{N}} \Sigma^n$ where $\Sigma^0 = \{\lambda\}$, and $\Sigma^+ = \Sigma^* \setminus \{\lambda\}$. A language $L$ over
$\Sigma$ is a subset of $\Sigma^*$.
A partial word $u$ of length $n$ over the alphabet $\Sigma$ can be defined by a partial
function $u : \{0, \ldots, n-1\} \to \Sigma$. Along with the usual symbols, the partial word $u$ contains some "do not know"
symbols known as holes. For $0 \le i < n$, if $u(i)$ is
defined, then we say $i \in D(u)$ (the domain of $u$), otherwise $i \in H(u)$ (the set of
holes). A word is a partial word without any holes. If $u$ is a partial word of length
$n$ over $\Sigma$, then the companion of $u$ is the total function $u_\diamond : \{0, \ldots, n-1\} \to \Sigma \cup \{\diamond\}$
defined by $u_\diamond(i) = u(i)$ if $i \in D(u)$ and $u_\diamond(i) = \diamond$ otherwise [4]. We
denote the language of partial words with an arbitrary number of holes by $\Sigma_p^*$. The
set $\Sigma_p^*$ is also a monoid under the operation of concatenation, where $\lambda$ serves as
the identity. Peter Leupold [14] has given some connections between partial
words and the languages in the Chomsky hierarchy [17].
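The definitions of the domain, the set of holes and the companion can be made concrete with a few lines of Python. This is only an illustrative sketch: the marker `HOLE` and the function names `companion`, `domain` and `holes` are hypothetical, and a partial word is assumed to be given as a dictionary from positions to letters together with its length.

```python
HOLE = "♦"  # assumed marker for a hole; not part of the alphabet

def companion(u, n):
    """Companion of a partial word u of length n.

    u is a dict {position: letter}; positions missing from u are holes.
    """
    return "".join(u.get(i, HOLE) for i in range(n))

def domain(word):
    """D(u): positions of the companion string that carry a letter."""
    return {i for i, c in enumerate(word) if c != HOLE}

def holes(word):
    """H(u): positions of the companion string that are holes."""
    return {i for i, c in enumerate(word) if c == HOLE}

# A partial word of length 4 defined only at positions 0 and 2.
u = {0: "a", 2: "b"}
u_comp = companion(u, 4)
```

The later sketches work directly on companion strings, since the companion is a total function and is easier to manipulate.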

The language of primitive words plays a vital role in various fields such as coding theory, formal languages and applications, and combinatorics on words [20]. The language of primitive words $Q$ has been extensively studied and many facts have been proved about the relation of $Q$ with the conventional formal language classes. The language $Q$ is known to be not regular [7], not linear [12] and not DCFL [16], but it is a context sensitive language [13]. However, it is still an open question whether the language of primitive words $Q$ is context-free or not [19, 15, 8, 9].

The outline of the rest of the paper is as follows. The next section contains some basic concepts and describes the hierarchy of languages of partial words. In Section 3, we prove that the language of primitive partial words is not regular, not linear and not deterministic context-free. In Section 4, we give a 2DPDA that accepts the language of primitive partial words and partial Lyndon words.

2 The Language of Primitive Partial Words

A word $u$ is said to be primitive if it cannot be represented as a nontrivial power of another word, that is, if $u = v^n$ then $u = v$ and $n = 1$. The language containing all primitive words over an alphabet is represented as $Q$. We know that $Q \cup \overline{Q} = \Sigma^*$ where $\overline{Q}$ is the set of all non-primitive words.

We briefly mention two basic concepts, containment and compatibility, that are required to extend the definition of primitivity to partial words [3]. If $u$ and $v$ are two partial words of equal length, then $u$ is said to be contained in $v$ if all elements in $D(u)$ are also in $D(v)$ and $u(i) = v(i)$ for all $i \in D(u)$. It is denoted by $u \subset v$. The partial words $u$ and $v$ are called compatible if there exists a partial word $w$ such that $u \subset w$ and $v \subset w$. The compatibility of $u$ and $v$ is denoted by $u \uparrow v$. Note that containment is not a symmetric relation whereas compatibility is a symmetric relation.

A partial word $u$ is said to be primitive if there does not exist a word $v$ such that $u \subset v^n$, $n \ge 2$. Note that if $u$ is primitive and $u \subset v$, then $v$ is primitive as well [3].

Lemma 1 ([20]). Every nonempty word $w$ can be expressed uniquely in the form $w = x^n$, where $n \ge 1$ and $x$ is primitive.

If $w = x^n$ and $x$ is primitive, then $x$ is called the primitive root of $w$. Thus, for every total word there is a word which is considered as its root. Similarly, for each partial word a set of words can be considered as root. For example, $a\diamond \subset ab$ and $a\diamond \subset a^2$. Hence $a\diamond$ is contained in each word of $\{ab, a^2\}$. Thus, $\sqrt{a\diamond} = \{ab, a\}$.

We denote by $W_i(\Sigma)$ the set of words over the alphabet $\Sigma$ with at most $i$ holes [4]. It is easy to see the following relation.

$$W_0(\Sigma) \subset W_1(\Sigma) \subset W_2(\Sigma) \subset \cdots \subset W_i(\Sigma) \subset \cdots$$

where $W_0(\Sigma) = \Sigma^*$. We put $W(\Sigma) = \bigcup_{i \ge 0} W_i(\Sigma)$. Let us denote the language of primitive partial words with at most $i$ holes as $Q_p^i$. We denote the language of partial words with at most $i$ holes as $\Sigma_i^*$ and the language of partial words with an arbitrary number of holes as $\Sigma_p^*$ over the alphabet $\Sigma \cup \{\diamond\}$. Therefore, we define

$$\Sigma_p^* = \Sigma_0^* \cup \Sigma_1^* \cup \Sigma_2^* \cup \Sigma_3^* \cup \cdots$$

We denote the language of primitive partial words as $Q_p$. Thus, $Q_p = Q_p^0 \cup Q_p^1 \cup Q_p^2 \cup Q_p^3 \cup \cdots$. The language of partial words with at most $i$ holes can be viewed in a similar way as the Chomsky hierarchy of formal languages. The following figure shows the hierarchy of languages of partial words.

[Figure: nested regions $\Sigma_0^*$, $\Sigma_1^*$, $\Sigma_2^*$, each split as $\Sigma_i^* = Q_p^i \cup \overline{Q_p^i}$, with $Q_p^0 = Q$.]
Fig. 1. The hierarchy of language of partial words

The root of a partial word $w$ is defined as follows.

$$\sqrt{w} = \{p \mid p \text{ is primitive and total and there exists } k \text{ such that } w \subset p^k\}$$

For a language $L$ of partial words, we define $\sqrt{L} = \{\sqrt{w} \mid w \in L\}$. Let us recall the relation between the finiteness of the set of root terms with the set of regular languages.

Theorem 2 ([14]). A regular language has finite root if and only if it can be described by a root term.

Theorem 3 ([14]). All context-free languages with finite root are regular.
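The notions of containment, compatibility, primitivity and root introduced in this section can be checked on small examples by brute force. The following Python sketch is only an illustration under stated assumptions: partial words are written as companion strings with the hypothetical marker `HOLE`, all function names are made up for this example, and the alphabet is assumed to be $\{a, b\}$.

```python
from itertools import product

HOLE = "♦"  # assumed marker for a hole

def contained_in(u, v):
    """u ⊂ v: same length, and every defined position of u agrees with v."""
    return len(u) == len(v) and all(a == HOLE or a == b for a, b in zip(u, v))

def compatible(u, v):
    """u ↑ v: same length, and no position carries two different letters."""
    return len(u) == len(v) and all(
        a == HOLE or b == HOLE or a == b for a, b in zip(u, v)
    )

def is_primitive(u):
    """True if u is nonempty and u ⊂ v^k holds for no word v and no k >= 2."""
    n = len(u)
    if n == 0:
        return False
    for d in range(1, n):
        # u ⊂ v^(n/d) with |v| = d exactly when every residue class
        # modulo d carries at most one distinct letter.
        if n % d == 0 and all(
            len({u[i] for i in range(r, n, d)} - {HOLE}) <= 1 for r in range(d)
        ):
            return False
    return True

def root(u, alphabet="ab"):
    """Brute-force root: primitive total words p with u ⊂ p^k for some k."""
    n = len(u)
    result = set()
    for d in range(1, n + 1):
        if n % d:
            continue
        for letters in product(alphabet, repeat=d):
            p = "".join(letters)
            if is_primitive(p) and contained_in(u, p * (n // d)):
                result.add(p)
    return result
```

For the partial word $a\diamond$ discussed above, `root("a♦")` yields the set containing `a` and `ab`, in agreement with $\sqrt{a\diamond} = \{ab, a\}$.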

. .4 A. . there exists a positive integer xj > 1 such that xy xj z = ak a(n−j)xj aj−k ♦am ♦am b = a(n−j)xj +j ♦am ♦am b = am ♦am ♦am b ⊂ (am b)3 ∈ / Qp which is a contradiction. For any fixed integer k. j = 0. . Theorem 6. We use the similar idea as used in [7]. where CF = {w | w is a partial word over {a. . and (iii) xy i z ∈ L for all i ≥ 0. xj > 1. 3 Properties of the Language of Primitive Partial Words In this section. (ii) |xy| ≤ n. not linear as well as not Deterministic Context-free Language (DCFL). x2 . Suppose that the language of primitive partial words Qp is regular. we shall prove that the language of primitive partial words Qp is not regular. is not regular. So there exist a decomposition of w into x. It is easy to observe that the language of primitive partial words with at most i holes. . then it must satisfy the other conditions of pumping Lemma for regular languages. 2. . b} that has a critical factorization} It has been proved that CF is not regular [2] but CF is context sensitive [6]. (i) |y| > 0. First we recall the pumping lemma for regular languages and linear languages and also some of the properties of DCFL which are required to prove our result. the language of primitive partial words Qp and its relation with the languages in Chomsky hierarchy. |y| > 0 and xy i z ∈ Qp for all i ≥ 0. Lemma 5 ([7]). . . Qp is not regular. n − 1}. For a regular language L. there exist a positive integer m such that the equation system (k − j)xj + j = m. . This is because pumping up a hole in a word having at most i holes will give a word which is not in Qip . Lemma 4 (Pumping Lemma for Regular Languages [10]). Note that w is a primitive partial word over Σ ∪ {♦}. where |Σ| ≥ 2 and a 6= b. z such that w = xyz. In this paper we study a different language viz. a special language class of partial words CF ⊆ Σp∗ is defined. Proof. Since w ∈ Qp and |w| ≥ n. k − 1 has a nontrivial solution with appropriate positive integers x1 . 
So there exist a natural number n > 0 depending upon the number of states of finite automaton for Qp . C. 1. t u . The following result proves the claim in general. y. . Qip . Kapoor In [5]. there exists an integer n > 0 such that for every word w ∈ L with |w| ≥ n. z = aj−k ♦am ♦am b. Nayak and K. Let x = ak . Hence the language of primitive partial words Qp is not regular. m > n. It is an open problem whether CF is context-free or not [5]. . y = a(n−j) . Consider the partial word w = an ♦am ♦am b. there exist a decomposition of w as w = xyz such that the following conditions holds. Now choose i = xj and since we know by Lemma 5 that for every j ∈ {0. 1.
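The two facts about the witness word used in the proof of Theorem 6 can be checked mechanically for small parameters. The sketch below is a hedged sanity check, not part of the proof: `HOLE`, `witness` and `check` are hypothetical names, and only the values $1 \le n < m \le 8$ are examined.

```python
HOLE = "♦"  # assumed marker for a hole

def contained_in(u, v):
    """u ⊂ v on companion strings."""
    return len(u) == len(v) and all(a == HOLE or a == b for a, b in zip(u, v))

def is_primitive(u):
    """True if u is nonempty and contained in no proper power of a word."""
    n = len(u)
    if n == 0:
        return False
    for d in range(1, n):
        if n % d == 0 and all(
            len({u[i] for i in range(r, n, d)} - {HOLE}) <= 1 for r in range(d)
        ):
            return False
    return True

def witness(n, m):
    """The word a^n ♦ a^m ♦ a^m b from the proof."""
    return "a" * n + HOLE + "a" * m + HOLE + "a" * m + "b"

def check(max_m=8):
    """Check both claims for all 1 <= n < m <= max_m."""
    for m in range(1, max_m + 1):
        pumped = witness(m, m)  # the pumped word a^m ♦ a^m ♦ a^m b
        cube = ("a" * m + "b") * 3
        # The pumped word is contained in (a^m b)^3, hence not primitive.
        if not contained_in(pumped, cube) or is_primitive(pumped):
            return False
        # The original witness with n < m is primitive.
        if not all(is_primitive(witness(n, m)) for n in range(1, m)):
            return False
    return True
```

The check only confirms the two endpoints of the argument (the starting word is in $Q_p$, the pumped word is not); the existence of a suitable exponent $x_j$ still rests on Lemma 5.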

Next, we prove that the language of primitive partial words $Q_p$ is not linear. Let us recall the pumping lemma for linear languages.

Lemma 7 (Pumping Lemma for Linear Languages [11]). Let $L$ be a linear language. There exists an integer $n$ such that any word $p \in L$ with $|p| \ge n$ admits a factorization $p = uvwxy$ satisfying the following conditions: (i) $uv^m w x^m y \in L$ for all $m \in \mathbb{N}$, (ii) $|vx| > 0$, and (iii) $|uvxy| \le n$.

Theorem 8. The language of primitive partial words $Q_p$ is not linear.

Proof. Suppose that the language $Q_p$ is linear. Let $n > 0$ be an integer. Consider the partial word $s = a^n b^m a^m \diamond^n \in Q_p$ with $m > n$ and $a \ne b$. Since $|s| \ge n$, there exists a factorization of $s$ into $u$, $v$, $w$, $x$, $y$ that satisfies the conditions of the pumping lemma for linear languages. Let $u = a^i$, $v = a^j$, $w = a^r b^m a^m \diamond^q$, $x = \diamond^k$, $y = \diamond^l$ such that $i + j + k + l \le n$, $j + k > 0$ and $i, j, k, l, r, q \ge 0$. Also, $i + j + r = q + k + l = n$. So $0 < i + j + k + l \le n$. Now it must be that $uv^t w x^t y \in Q_p$ for all $t \in \mathbb{N}$. In particular,

$$uv^t w x^t y = a^{i+tj+r} b^m a^m \diamond^{q+tk+l} = a^m b^m a^m \diamond^m \subset (a^m b^m)^2 \notin Q_p.$$

It will happen that $i + tj + r = m = q + tk + l$ because if we consider the left hand side equality we get

$$i + tj + r = m \Rightarrow i + j + r + (t - 1)j = m \Rightarrow n + (t - 1)j = m \Rightarrow t = \frac{m - n}{j} + 1.$$

It is true that there will be some integers $m$, $n$ and $j$ such that $t$ is an integer. Therefore, $uv^t w x^t y = a^m b^m a^m \diamond^m \subset (a^m b^m)^2$ for some $t$. Hence $Q_p$ is not linear. □

Next we prove that the language of primitive partial words is not a deterministic context-free language. We will use the closure properties of DCFLs; in particular, the set of DCFLs is closed under complementation.

Theorem 9. $Q_p$ is not deterministic context-free.

Proof. Suppose that the language of primitive partial words $Q_p$ is deterministic context-free. As the set of DCFLs is closed under complementation, $\overline{Q_p}$ (the set of partial periodic words), that is, the complement of $Q_p$, is also a DCFL. Also, we know that the intersection of a DCFL with a regular language is also a DCFL. So,

$$\overline{Q_p} \cap \{a^* b^* a^* b^*\} = \{a^n b^m a^n b^m \mid m, n \in \mathbb{N}\}$$

is also a DCFL. But the language $\{a^n b^m a^n b^m \mid m, n \in \mathbb{N}\}$ is not a Context Free Language (CFL), which can be proved by using the pumping lemma for CFLs. Hence it is a contradiction that the language of primitive partial words $Q_p$ is a DCFL. Therefore the language $Q_p$ is not deterministic context-free. □
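The arithmetic in the proof of Theorem 8 can be replayed on one concrete factorization. The following Python sketch is a worked instance only: the values $n = 4$, $m = 7$ and the factorization are chosen for illustration, and the example assumes $j = k$ so that the single exponent $t = \frac{m-n}{j} + 1$ balances both the block of $a$'s and the block of holes. All names (`HOLE`, `pumped`, and so on) are hypothetical.

```python
HOLE = "♦"  # assumed marker for a hole

def contained_in(u, v):
    """u ⊂ v on companion strings."""
    return len(u) == len(v) and all(a == HOLE or a == b for a, b in zip(u, v))

def is_primitive(u):
    """True if u is nonempty and contained in no proper power of a word."""
    n = len(u)
    if n == 0:
        return False
    for d in range(1, n):
        if n % d == 0 and all(
            len({u[i] for i in range(r, n, d)} - {HOLE}) <= 1 for r in range(d)
        ):
            return False
    return True

def pumped(i, j, r, q, k, l, m, t):
    """u v^t w x^t y with u=a^i, v=a^j, w=a^r b^m a^m ♦^q, x=♦^k, y=♦^l."""
    return ("a" * i + "a" * (j * t) + "a" * r + "b" * m + "a" * m
            + HOLE * q + HOLE * (k * t) + HOLE * l)

# Illustrative parameters (assumptions of this sketch, with j == k).
n, m = 4, 7
i, j, r = 1, 1, 2          # i + j + r == n
q, k, l = 2, 1, 1          # q + k + l == n
s = pumped(i, j, r, q, k, l, m, 1)   # t = 1 gives back s = a^n b^m a^m ♦^n
t = (m - n) // j + 1                 # exponent from the derivation above
s_t = pumped(i, j, r, q, k, l, m, t)
square = ("a" * m + "b" * m) * 2
```

With these values `s` is primitive, while `s_t` equals $a^7 b^7 a^7 \diamond^7$ and is contained in $(a^7 b^7)^2$, so it is not primitive.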

4 2DPDA For $Q_p$

In this section we define partial Lyndon words, denoted by $L_p$, and prove that the language of primitive partial words with at most one hole, $Q_p^1$, is accepted by a Two-way Pushdown Automaton (2DPDA). Next, we show that the language of primitive partial words with at least two holes is accepted by a 2DPDA. Let us recall the basic definition of 2DPDA and some of the results which are required in the proof. A 2DPDA is the same as an ordinary DPDA but with an additional ability to move its input head in both directions. We use $Z_0$ as the bottom of stack and one left end and one right end symbol as $\vdash$ and $\dashv$, respectively.

Definition 10 ([1]). A 2DPDA, $P$, consists of a 7-tuple $P = \langle S, I, T, \delta, s_0, Z_0, s_t \rangle$, where
(a) $S$ is the set of states of the finite control,
(b) $I$ is the input alphabet (excluding $\vdash$ and $\dashv$),
(c) $T$ is the pushdown list alphabet (excluding $Z_0$),
(d) $\delta$ is a mapping on $(S - \{s_t\}) \times (I \cup \{\vdash, \dashv\}) \times (T \cup \{Z_0\})$. The value of $\delta(s, a, A)$, if defined, is of one of the forms $(s', d, \text{push } B)$, $(s', d)$, or $(s', d, \text{pop})$ where $s' \in S$, $B \in T$ and $d \in \{-1, 0, +1\}$. We assume a 2DPDA makes no moves from the final state $s_t$ and certain other states may have no moves defined,
(e) $s_0 \in S$ is the initial state of the finite control,
(f) $Z_0$ is the special symbol that indicates the bottom of the pushdown list,
(g) $s_t$ is the designated final state.

In the above definition $\delta(s, a, A)$ indicates the transition function of the machine when it is in the state $s$, the input head scans a symbol $a$, and the pushdown list has the symbol $A$ on top. There are three possible actions.

$$\delta(s, a, A) = \begin{cases} (s', d, \text{push } B) & \text{provided } B \ne Z_0 \\ (s', d) & \\ (s', d, \text{pop}) & \text{if } A \ne Z_0 \end{cases}$$

In these transitions, the machine enters state $s'$ and moves its input head in the direction $d$, where $d = -1$, $+1$ or $0$, to indicate to move its head to left, right or remain stationary, respectively. The operation push $B$ means add the symbol $B$ on the top of the pushdown list and pop means remove the topmost symbol from the pushdown list.

Definition 11 ([3]). Let $k$ and $l$ be two positive integers satisfying $k \le l$. For $0 \le i < k + l$, we define the sequence of $i$ relative to $k$, $l$ as $\mathrm{seq}_{k,l}(i) = (i_0, i_1, \ldots, i_{n+1})$ where
(a) $i_0 = i = i_{n+1}$,
(b) For $1 \le j \le n$, $i_j \ne i$,
(c) For $1 \le j \le n + 1$, $i_j$ is defined as

$$i_j = \begin{cases} i_{j-1} + k & \text{if } i_{j-1} < l, \\ i_{j-1} - l & \text{otherwise.} \end{cases}$$

Definition 12 ([4]). Let $k$ and $l$ be two positive integers satisfying $k \le l$ and let $z$ be a partial word of length $(k + l)$. We say that $z$ is $(k, l)$-special if there exists $0 \le i \le \gcd(k, l)$ such that $\mathrm{seq}_{k,l}(i) = (i_0, i_1, \ldots, i_{n+1})$ contains at least two positions that are holes of $z$ while $z(i_0) z(i_1) \cdots z(i_{n+1})$ is not 1-periodic.

Several facts about primitive partial words are known; we recall some of them which will be useful below.

Lemma 13 ([4]). Let $u$ and $v$ be two nonempty partial words such that $uv$ contains at most one hole. If $uv \uparrow vu$, then there exists a word $w$ such that $u \subset w^m$ and $v \subset w^n$ for some integers $m$, $n$.

Lemma 14 ([4]). Let $u$ and $v$ be two nonempty partial words such that $|u| \le |v|$. If $uv \uparrow vu$ and $uv$ is not $(|u|, |v|)$-special, then there exists a word $w$ such that $u \subset w^m$ and $v \subset w^n$ for some integers $m$, $n$.

Definition 15. The words $u$ and $v$ commute if and only if they are contained in powers of the same word, that is, $uv \uparrow vu$ if there exists a word $w$ such that $u \subset w^m$ and $v \subset w^n$ for some integers $m$, $n$.

A partial word $w \in \Sigma^+$ is primitive if and only if it is not contained in two non-empty commuting words, that is,

$$w \in Q_p \Leftrightarrow w \ne \lambda \text{ and } \forall u, v \in \Sigma^* : (w \subset uv \text{ and } w \subset vu \Rightarrow \lambda \in \{u, v\}).$$

Lemma 16 ([4]). Let $u$ be a partial word with one hole. Then $u$ is primitive if and only if $uu \uparrow xuy$ for some partial words $x$, $y$ implies $x = \varepsilon$ or $y = \varepsilon$, where $\varepsilon$ denotes the empty word.

Lemma 17 ([4]). Let $u$ be a partial word with at least two holes.
I. If $uu \uparrow xuy$ for some nonempty partial words $x$ and $y$ satisfying $|x| \le |y|$, then the following hold:
(a) If $|x| = |y|$, then $u$ is not primitive.
(b) If $u$ is not $(|x|, |y|)$-special, then $u$ is not primitive (it is contained in a power of a word of length $|x|$).
(c) If $u$ is $(|x|, |y|)$-special, then $u$ is not contained in a power of a word of length $|x|$.
II. If $uu \uparrow xuy$ for some partial words $x$, $y$ implies $x = \varepsilon$ or $y = \varepsilon$, then $u$ is primitive.

Definition 18. A primitive partial word $w$ is a partial Lyndon word if and only if it is minimal in its conjugate class (with respect to lexicographic order where we assume $a < b < \cdots < \diamond$).

Lemma 19. The cardinality of the class of conjugates of a primitive partial word $w$ of length $n$ is $n$.
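As a side illustration of Definitions 11 and 12, the sequence $\mathrm{seq}_{k,l}(i)$ and the $(k, l)$-special property can be computed directly. The Python sketch below is a rough aid under stated assumptions: partial words are companion strings with the hypothetical marker `HOLE`, the names `seq` and `is_special` are made up, and "1-periodic" is read as "all defined letters along the sequence are equal".

```python
from math import gcd

HOLE = "♦"  # assumed marker for a hole

def seq(k, l, i):
    """seq_{k,l}(i) = (i_0, ..., i_{n+1}) with i_0 = i = i_{n+1}.

    Each step adds k when the current position is below l and
    subtracts l otherwise, i.e. it adds k modulo k + l.
    """
    positions = [i]
    cur = i
    while True:
        cur = cur + k if cur < l else cur - l
        positions.append(cur)
        if cur == i:
            return positions

def is_special(z, k, l):
    """(k, l)-special test for a partial word z of length k + l."""
    if not (0 < k <= l and len(z) == k + l):
        raise ValueError("need 0 < k <= l and len(z) == k + l")
    for i in range(gcd(k, l) + 1):      # 0 <= i <= gcd(k, l), as in the text
        positions = seq(k, l, i)
        cycle = positions[:-1]          # drop the repeated end point
        hole_count = sum(1 for p in cycle if z[p] == HOLE)
        letters = {z[p] for p in cycle if z[p] != HOLE}
        # Special: at least two holes on the sequence, and the letters
        # along it are not all equal (not 1-periodic).
        if hole_count >= 2 and len(letters) > 1:
            return True
    return False
```

For instance, with $k = 1$ and $l = 3$ the sequence of $0$ visits every position of a word of length 4, so `is_special("a♦b♦", 1, 3)` holds, whereas `"a♦a♦"` is not $(1, 3)$-special because its letters along the sequence are all equal.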

Proof. Let $w$ be a primitive partial word over the alphabet $\Sigma \cup \{\diamond\}$ of length $n$. We can see that a partial word is contained in a set of words. We know that if a partial word $u$ is contained in another word $v$ and $u$ is primitive then $v$ is also primitive, that is, if $v$ is not primitive then $u$ is also not primitive. Since $w$ is primitive and we know that the set of primitive words is closed under cyclic permutation [20], each of the conjugates of $w$ is also primitive. So every new partial word which is generated by cyclic permutation of $w$ is a conjugate of $w$. So the number of such conjugates of $w$ is $n$. This proves our claim. □

In [18], a 2DPDA for the language of primitive words $Q$ is presented. We show that a 2DPDA can also be constructed for $Q_p^1$ by using the same idea used in [18].

Theorem 20. $Q_p^1$ is accepted by a 2DPDA.

Proof. The informal idea is as follows: Let $P$ be a 2DPDA. Let $\vdash w \dashv$ be the input partial word with at most one hole augmented with two end markers. If $w = \lambda$, $P$ rejects. If not, then $P$ moves its head towards $\dashv$, pushes all symbols of $w$ onto the stack and pops one symbol. Again $P$ moves to $\dashv$. $P$ skips the last symbol of $w$ and pushes the remaining symbols of $w$ onto its stack. If we write the input $w = xw'y$ with $x, y \in \Sigma \cup \{\diamond\}$ (assuming $|w| \ge 2$), the contents of the stack will be:

$$w'yxw'Z_0$$

The automaton $P$ compares $w$ with the pushdown contents one symbol at a time. If the symbols match (assuming that $a \ne b$ and $\diamond = a$ for any $a \in \Sigma$), the head moves right and $P$ pops the pushdown. If in this way the entire word $w$ is completely scanned and is compatible to a factor of $w'yxw'$, $P$ rejects (by assuming that the hole $\diamond$ is compatible to any of the symbols in $\Sigma$). If a mismatch is encountered $P$ moves the input head back to $\vdash$ and pushes the symbols scanned during this move. Then $P$ pops the topmost symbol from the pushdown and repeats the process. If the pushdown becomes empty then $P$ accepts. Hence, by Lemma 16, the 2DPDA accepts $w$ if and only if $w$ is not compatible to a factor of $w'yxw'$. □

Observe that the above proof method does not work for the set of primitive partial words with at least two holes. A counter example is given below.

Example 21. Let $w = a\diamond b\diamond$. So by the method described in Theorem 20 we have $x = a$, $y = \diamond$ and $w' = \diamond b$. Now $w'yxw' = \diamond b \diamond a \diamond b$ and $w$ is compatible to a factor of $w'yxw'$. However, if $w = a\diamond b\diamond$, then $w$ is contained only in the words of $\{aaba, aabb, abba, abbb\}$, all of which are primitive.

Next we extend Theorem 20 to show that the language $Q_p$ with at least two holes is also recognizable by a 2DPDA by using a similar idea as in the proof of Theorem 20.

Theorem 22. The language of primitive partial words $Q_p$ with at least two holes is accepted by a 2DPDA.
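As an aside on Theorem 20 and Example 21, the acceptance condition of the automaton $P$ can be imitated with ordinary string operations. The Python sketch below does not simulate a 2DPDA; it only evaluates the condition "$w$ is compatible with no factor of $w'yxw'$" and, as a second variant in the spirit of the idea of checking every total word containing $w$, applies the same test to each completion. The names and the alphabet $\{a, b\}$ are assumptions of the sketch.

```python
from itertools import product

HOLE = "♦"  # assumed marker for a hole

def compatible(u, v):
    """u ↑ v on companion strings of equal length."""
    return len(u) == len(v) and all(
        a == HOLE or b == HOLE or a == b for a, b in zip(u, v)
    )

def is_primitive(u):
    """Reference answer by brute force over all candidate periods."""
    n = len(u)
    if n == 0:
        return False
    for d in range(1, n):
        if n % d == 0 and all(
            len({u[i] for i in range(r, n, d)} - {HOLE}) <= 1 for r in range(d)
        ):
            return False
    return True

def passes_shift_test(w):
    """Condition checked by P: w is nonempty and compatible with no
    factor of w'yxw', which is ww without its first and last symbol."""
    if not w:
        return False
    inner = (w + w)[1:-1]
    n = len(w)
    return not any(
        compatible(w, inner[s:s + n]) for s in range(len(inner) - n + 1)
    )

def completions(w, alphabet="ab"):
    """All total words over the assumed alphabet that contain w."""
    slots = [alphabet if c == HOLE else c for c in w]
    return ("".join(p) for p in product(*slots))

def accepts_by_completions(w):
    """Accept iff every total word containing w passes the shift test."""
    return bool(w) and all(passes_shift_test(t) for t in completions(w))
```

On partial words with at most one hole the shift test agrees with the brute-force reference, while for $w = a\diamond b\diamond$ the shift test fails although $w$ is primitive, which is exactly the situation of Example 21; the completion-based variant accepts this word.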

Proof. The informal idea is as follows: since $w$ is a partial word, first compute the set of words in which $w$ is contained. Let $A$ be a 2DPDA. Similar to the previous theorem, let $\vdash w \dashv$ be the input where $w \in (\Sigma \cup \{\diamond\})^*$, that is, a partial word augmented with two end markers which is the input to $A$. If $w = \lambda$ then $A$ rejects. Otherwise it computes the set of words $\{w_1, w_2, \ldots, w_k\}$ in which $w$ is contained. Now we give each of the $w_i$, $1 \le i \le k$, as input to $A$ one at a time, and if at least one of the $w_i$ is rejected by $A$, then the partial word $w$ is rejected and the process stops; otherwise the same process continues.

Suppose $\vdash w_1 \dashv$ is the input to $A$. $A$ advances its input head to $\dashv$, pushes all of $w_1$ onto its store and pops one symbol. Then $A$ moves its head to $\dashv$ again, skips the last symbol of $w_1$ and pushes the remainder of $w_1$ onto its pushdown store. If we write $w_1 = xw''y$ where $x, y \in \Sigma$ with $|w_1| \ge 2$, the contents of the pushdown are of the form:

$$w''yxw''Z_0$$

The automaton $A$ compares $w_1$ with the pushdown contents symbol by symbol. If the symbols match, the head moves right and $A$ pops the pushdown. If in this way $w_1$ is completely scanned, $A$ rejects. If a mismatch occurs $A$ moves the input head back to $\vdash$ and pushes the symbols scanned during this move. Then $A$ pops one symbol and repeats the process. If the pushdown is empty then $A$ chooses the next word $w_2$ from the set and continues. The automaton $A$ accepts the partial word $w$ if and only if each of the words in the set $\{w_1, w_2, \ldots, w_k\}$ is accepted by $A$. □

Lemma 23. The language of partial Lyndon words $L_p$ is accepted by a 2DPDA.

Proof. The key idea is based on the fact that a partial Lyndon word $w$ is smaller than each of its proper right factors. Let us describe an automaton $B$ that accepts $L_p$. $B$ scans the input $w$ and pushes the symbols of $w$ onto the pushdown while scanning. Then $B$ compares the topmost symbol of the stack with $w$. If the symbol in $w$ and the topmost symbol of the stack match (in particular, a don't know symbol matches only with a don't know symbol), then $B$ pops the top symbol of the stack and moves its head one position right. If the symbol on the stack is greater than the symbol in the input then $B$ returns to $\vdash$ and pushes the symbols onto the stack during the move. The procedure is then repeated. If $B$ encounters a symbol $a$ in the stack and a symbol $b$ in $w$ with $a < b$ (assuming the comparison $a < b < \cdots < \diamond$) or reaches the bottom most symbol $Z_0$, then it rejects. If $B$ reaches the symbol $\vdash$ in the input and some symbols are left in the pushdown then $B$ accepts. □

References

1. Aho, A.V., Hopcroft, J.E.: Design & Analysis of Computer Algorithms. Pearson Education India (1974)

2. Blanchet-Sadri, F., Zhang, J.: On the critical factorization theorem. Preprint
3. Blanchet-Sadri, F.: Primitive partial words. Discrete Applied Mathematics 148(3), 195–213 (2005)
4. Blanchet-Sadri, F.: Algorithmic combinatorics on partial words. CRC Press (2007)
5. Blanchet-Sadri, F.: Open problems on partial words. In: New Developments in Formal Languages and Applications, pp. 11–58. Springer (2008)
6. Blanchet-Sadri, F., Davis, C.D., Dodge, J., Mercaş, R., Moorefield, M.: Unbordered partial words. Discrete Applied Mathematics 157(5), 890–900 (2009)
7. Dömösi, P., Horváth, G.: The language of primitive words is not regular: two simple proofs. Bulletin of European Association for Theoretical Computer Science 87, 191–194 (2005)
8. Dömösi, P., Horváth, S., Ito, M., Kászonyi, L., Katsura, M.: Formal languages consisting of primitive words. In: Fundamentals of Computation Theory, pp. 194–203. Springer (1993)
9. Dömösi, P., Ito, M., Marcus, S.: Marcus contextual languages consisting of primitive words. Discrete Mathematics 308(21), 4877–4881 (2008)
10. Hopcroft, J.E., Motwani, R., Ullman, J.D.: Introduction to automata theory, languages, and computation. ACM SIGACT News 32(1), 60–65 (2001)
11. Horváth, G., Nagy, B.: Pumping lemmas for linear and nonlinear context-free languages. arXiv preprint arXiv:1012.0023 (2010)
12. Horváth, S.: Strong interchangeability and nonlinearity of primitive words. In: Proc. Workshop AMAST Workshop on Algebraic Methods in Language Processing, vol. 95, pp. 173–178 (1995)
13. Kunimochi, Y.: A context sensitive grammar generating the set of all primitive words. In: Algebras, Languages, Computations and their Applications, vol. 1562, pp. 143–145 (2007)
14. Leupold, P.: Languages of partial words. Grammars 7, 179–192 (2004)
15. Leupold, P.: Primitive words are unavoidable for context-free languages. In: Language and Automata Theory and Applications, pp. 403–413. Springer (2010)
16. Lischke, G.: Primitive words and roots of words. arXiv preprint arXiv:1104.4427 (2011)
17. Luce, R.D., Bush, R.R., Galanter, E., Chomsky, N., Miller, G.A.: Handbook of mathematical psychology. Handbook of Mathematical Psychology (1963)
18. Petersen, H.: The ambiguity of primitive words. In: Symposium on Theoretical Aspects of Computer Science, pp. 679–690. Springer (1994)
19. Petersen, H.: On the language of primitive words. Theoretical Computer Science 161(1), 141–156 (1996)
20. Shallit, J.: A second course in formal languages and automata theory. Cambridge University Press (2008)