Professional Documents
Culture Documents
(A)
School of Computer Science, University of Waterloo, Waterloo, ON, N2L 3G1 Canada
Department of Computer Science, University of Western Ontario
London, Ontario, N6A 5B7, Canada
lila.kari@uwaterloo.ca
(B)
Department of Mathematics, Indian Institute of Technology Madras
Chennai, Tamilnadu, 600036, India
ma16ipf01@smail.iitm.ac.in (M.S. Kulkarni)
kmahalingam@iitm.ac.in (K. Mahalingam)
ABSTRACT
We introduce the concept of relative Watson-Crick primitivity of words and its gener-
alization, the relative θ-primitivity of words, where θ is a morphic or an antimorphic
involution. Similar to relatively prime integers which do not share any common fac-
tors, we call two words u and v relatively θ-primitive if they do not share a common
θ-primitive root. We study some combinatorial properties of relatively θ-primitive
words, as well as establish relations between each of the two words u and v and the
result of some binary word operation between u and v, from this perspective.
1. Introduction
Periodicity and its opposite, primitivity of words, are fundamental properties of words
in combinatorics on words and formal language theory. In addition, the detection of
repetitions in strings plays an important role in, e.g., pattern matching and text com-
pression [1, 2, 14]. On the other hand, tandem repeats in DNA, that is, sequences
of two or more contiguous, approximate copies of a pattern, occur in the genomes of
both eukaryotic and prokaryotic organisms, and have biological and medical signifi-
cance. In this paper we bring together these concepts from two different fields with
the notion of relatively prime numbers from number theory, to define the relative
Watson-Crick primitivity of words.
Recall that, in the framework of formal language theory, a single DNA strand – a
sequence of nucleotides that can be of four different kinds, Adenine, Cytosine, Guanine
or Thymine –, can be modelled as a word w over the alphabet ∆ = {A, C, G, T }. DNA
strands can be either single-stranded or double-stranded, with the latter being formed
2 L. Kari, M.S. Kulkarni, K. Mahalingam
when two single strands that are Watson-Crick complementary bind to each other,
to form the familiar double-helix. The fact that the Watson-Crick complement of
a DNA single strand is its reverse complement, wherein A complements and binds
to T and viceversa, while G complements and binds to C and viceversa, has been
traditionally modelled by a morphic involution combined with the mirror image [13]
or, most often, by an antimorphic involution θ [6]. An antimorphic involution is a
function θ on an alphabet Σ∗ that is an antimorphism, θ(xy) = θ(y)θ(x), ∀x, y ∈ Σ∗
(this models the “reverse” part), and an involution, θ2 = id, the identity (this models
the “complement” part, in that the complement of the complement of a letter is the
original). With this formalism, the Watson-Crick complement of a word u is its image
θ∆ (u) through the involution θ∆ : ∆ −→ ∆ defined as θ∆ (A) = T and θ∆ (G) = C, and
naturally extended to an antimorphism of Σ∗ . In this paper we call an (anti)morphic
involution an involution that is either a morphism or an antimorphism.
Due to the fact that an (anti)morphic involution is a bijection, a word u and
its image θ(u) through an (anti)morphic involution θ are retrievable from one an-
other, and can be considered in some sense “identical”. This idea, of extending the
notion of identity to more general functions such as (anti)morphic involutions, led
to natural generalizations of classical notions in combinatorics of words to notions
such as pseudo-periodicity and pseudo-primitivity [3, 11], pseudo-palindromes [4, 9],
Watson-Crick conjugate and commutative words [8], pseudoknot-bordered words [10],
pseudo-repetitions [5], etc.
In this paper, we introduce the concept of relative Watson-Crick primitivity of
words and its generalization, the relative θ-primitivity of words, where θ is any
(anti)antimorphic involution. In the same way in which the concept of primitive
word was inspired by the notion of prime number, this is a concept inspired by the
notion of relatively prime integers in number theory. In this setting, two words are
said to be relatively θ-primitive if they do not share a common θ-primitive root. Note
that, if θ is the identity function on Σ, extended to a morphic involution on Σ∗ , then
we obtain a particular case, that of relatively primitive words, i.e., words that do not
share a common primitive root.
The paper is organized as follows: In Section 2, we introduce some basic definitions,
notations and results that are used throughout the paper. In Section 3, we introduce
the concept of relatively θ-primitive words for (anti)morphic involutions θ, and study
some combinatorial properties of this relation. The relation between the result of
a binary word operation between two words u and v, and either u or v, is studied
in Section 4, for various binary operations such as shuffle, perfect shuffle, and θ-
catenation. We end with concluding remarks in Section 5.
An alphabet Σ is a finite non-empty set of symbols, and Σ∗ denotes the set of all words
over Σ including the empty word λ, while Σ+ is the set of all non-empty words over
Σ. The length of a word u ∈ Σ∗ (i.e., the number of symbols in a word) is denoted
by |u|. We denote by |u|a the number of occurrences of a letter a in u. By Σm we
denote the set of all words of length m > 0 over Σ. A language L is a subset of Σ∗ .
Relative Watson-Crick primitivity of words 3
The following analogous results from [8] provide a characterization of words that
θ-commute or are θ-conjugate.
Proposition 4. [8] Let u, v ∈ Σ+ be two words such that u θ-commutes with v, that
is, uv = θ(v)u.
4 L. Kari, M.S. Kulkarni, K. Mahalingam
In this section we introduce and investigate the concept of relatively θ-primitive words
for a given (anti)morphic involution θ.
Note that if (u, v)θ 6= λ, then (u, v)θ is the common θ-primitive root of u and v.
If ρθ (u) = ρθ (v) = x, then u and v are not relatively θ-primitive, and this is
denoted by u 6⊥θ v.
If θ∆ is the Watson-Crick antimorphic involution on ∆∗ , where ∆ is the DNA
alphabet, then two words that are relatively θ∆ -primitive are called relatively Watson-
Crick primitive. If θ is the identity function on Σ extended to a morphism of Σ∗ ,
then two words that are relatively θ-primitive are called relatively primitive, and this
is denoted by u ⊥ v.
Observe that, for all u ∈ Σ∗ , (u, λ)θ = (λ, u)θ = λ. The following lemma follows
directly from Definition 6.
However, the relation ⊥θ is transitive on Qθ . Note that the relation 6⊥θ is reflexive
on Σ+ and symmetric on Σ∗ . The following lemma shows that the relation 6⊥θ is
transitive.
Proof. Since (u, v)θ = x, ρθ (u) = ρθ (v) = x. Also (v, w)θ = y implies ρθ (v) =
ρθ (w) = y, where x, y ∈ Σ+ . Since the θ-primitive root of a word is unique, this
further implies that x = y and (u, w)θ = x.
It is clear from (I) and (IV) of Lemma 8, and Lemma 10 that the relation 6⊥θ is an
equivalence relation on Σ+ .
The concept of a common θ-primitive root can be extended to n ≥ 2 words,
by defining the common θ-primitive root of words u1 , u2 , . . . , un ∈ Σ+ to be
(u1 , u2 , . . . , un )θ = x iff ρθ (ui ) = x for all 1 ≤ i ≤ n, and to be λ if no such x
exists. The following result is an immediate consequence of Lemma 10.
Example 14. Let Σ = {a, b, c} and θ be an antimorphic involution such that θ(a) =
c and vice versa, and θ(b) = b. Let x = acbbac and y = acb. Note that, x, y ∈ Q and
hence x ⊥ y. But x = yθ(y) for y ∈ Qθ and hence ρθ (x) = y, i.e., x 6⊥θ y.
In the following, we consider two words v and w such that v θ-commutes with
w, and give conditions under which a word u is relatively θ-primitive with words
v and w. We say that two nonempty sets of nonempty words {u1 , u2 , . . . , un } and
{v1 , v2 , . . . , vm } are relatively θ-primitive, and we denote this by {u1 , u2 , . . . un } ⊥θ
{v1 , v2 , . . . vm }, where n, m ≥ 1 and ui , vj ∈ Σ+ for 1 ≤ i ≤ n, 1 ≤ j ≤ m, iff ui and
vj are relatively θ-primitive for all 1 ≤ i ≤ n and 1 ≤ j ≤ m.
Proof. If u ⊥θ v then (u, v)θ ⊥θ w and ((u, v)θ , w)θ = λ. We have the following two
cases:
Case (1): If v ⊥θ w then u ⊥θ (v, w)θ and (u, (v, w)θ )θ = λ.
Case (2): If v 6⊥θ w, i.e., ρθ (v) = ρθ (w) = x, with x ∈ Σ+ , then (u, (v, w)θ )θ =
(u, x)θ = λ, since u ⊥θ v. Thus, u ⊥θ (v, w)θ and (u, (v, w)θ )θ = λ.
If, on the other hand, u 6⊥θ v, then let ρθ (u) = ρθ (v) = x0 with x0 ∈ Σ+ . Then we
have the following two cases:
Relative Watson-Crick primitivity of words 7
Case (2): β = θ(x)γ where γ ∈ {x, θ(x)}∗ , i.e., vu = x2 θ(x1 )θ(x2 )γαx1 = x1 x2 σ.
Then x2 θ(x1 ) = x1 x2 , i.e., the word x2 θ-commutes with θ(x1 ). Thus by Proposition 4
either x2 , θ(x1 ) ∈ p+ for p ∈ Pθ , or θ(x1 ) = [qθ(q)]m and x2 = θ(q)[qθ(q)]n for some
q ∈ Σ+ , m ≥ 1 and n ≥ 0. This further implies that for some k ≥ 2, x = x1 x2 = pk ,
that is x ∈/ Qθ , or that x ∈ θ(q){q, θ(q)}+ ∈ / Qθ , a contradiction.
Consider now the second possibility, where u ends inside θ(x), that is, u = αθ(x1 ),
v = θ(x2 )β, for some x1 , x2 ∈ Σ+ with x = x1 x2 , and let β 6= λ. Then, as stated
before, because u starts with x, we must have α 6= λ, and we have following two cases:
Case (1’): β = xγ where γ ∈ {x, θ(x)}∗ , i.e., vu = θ(x2 )x1 x2 γαθ(x1 ) = x1 x2 σ.
Then x1 x2 = θ(x2 )x1 , i.e., x1 θ-commutes with x2 . Thus we get a similar contradic-
tion as that of Case (2).
Case (2’): β = θ(x)γ where γ ∈ {x, θ(x)}∗ , i.e., vu = θ(x2 )θ(x1 )θ(x2 )γαθ(x1 ) =
x1 x2 σ. Then x1 x2 = θ(x2 )θ(x1 ). Thus x1 is a θ-conjugate of θ(x1 ) and, by Propo-
sition 3, there exists p, q ∈ Σ∗ such that x1 = pq and either θ(x1 ) = qθ(p) and
x2 = (θ(p)θ(q)pq)i θ(p), or θ(x1 ) = θ(q)p and x2 = (θ(p)θ(q)pq)i θ(p)θ(q)p for some
i ≥ 0.
Case (2’(a)): Let θ(x1 ) = qθ(p) and x2 = (θ(p)θ(q)pq)i θ(p). Then θ(p)θ(q) = qθ(p),
which implies that either p = λ, q 6= λ, θ(q) = q, or that q = λ, p 6= λ, or that
p, q ∈ Σ+ and θ(p) θ-commutes with θ(q).
In the first case, p = λ, q 6= λ, we have x1 = q, θ(x1 ) = q, x2 = (θ(q)q)i = q 2i
for some i ≥ 0. As x2 6= λ we have i 6= 0, and x1 x2 = q 2i+1 , q ∈ Pθ ∩ Σ+ , which
contradict the θ-primitivity of x = x1 x2 .
In the second case, q = λ, p 6= λ, we have x1 = p, θ(x1 ) = θ(p), and x2 =
(θ(p)p)i θ(p) for some i ≥ 0. This further implies x1 x2 = p(θ(p)p)i θ(p), p ∈ Σ+ which
contradicts the θ-primitivity of x = x1 x2 .
In the third case, where p, q ∈ Σ+ and θ(p) θ-commutes with θ(q) , by Proposition 4,
either θ(p), θ(q) ∈ s+ for s ∈ Pθ , or θ(q) = [tθ(t)]m and θ(p) = θ(t)[tθ(t)]n for
some t ∈ Σ+ , m ≥ 1 and n ≥ 0. This further implies that either for k ≥ 3, x =
x1 x2 = pq(θ(p)θ(q)pq)i θ(p) = sk ∈ / Qθ , or x ∈ t{t, θ(t)}+ ∈ / Qθ , both leading to
contradictions.
Case(2’(b)): Let θ(x1 ) = θ(q)p and x2 = (θ(p)θ(q)pq)i θ(p)θ(q)p. Then θ(q)p =
θ(p)θ(q), which implies that either p = λ, q 6= λ, or that q = λ, p 6= λ, p = θ(p), or
that p, q ∈ Σ+ and θ(q) θ-commutes with p. In all three cases we reach contradictions
similar to those of Case (2’(a)).
Since the two possibilities where u ends inside of x, u = αx1 , v = x2 β and β 6= λ,
or u ends inside of θ(x), u = αθ(x1 ), v = θ(x2 )β and β 6= λ both led to contradictions,
if these kinds of decompositions occur, we can only have β = λ.
Thus either u ends inside of x, with u = αx1 and v = x2 , i.e., uv = αx1 x2 = αx
or, alternatively, u ends inside of θ(x), with u = αθ(x1 ) and v = θ(x2 ), and uv =
αθ(x1 )θ(x2 ) = αθ(x).
In the first situation, if α = λ then uv = x1 x2 and vu = x2 x1 which, along with the
fact ρθ (uv) = ρθ (vu) = x, imply x1 x2 = x2 x1 . This further implies x1 , x2 ∈ p+ for
p ∈ Σ+ , i.e., x ∈ / Qθ , a contradiction. If α 6= λ, that is, α = xγ with γ ∈ {x, θ(x)}∗ ,
then uv = αx = x1 x2 γx, vu = x2 x1 x2 γx1 . Since ρθ (uv) = ρθ (vu) = x, x1 x2 = x2 x1
which implies x1 , x2 ∈ p+ for p ∈ Σ+ which further implies x ∈ / Qθ , a contradiction.
Relative Watson-Crick primitivity of words 9
In the second situation, if α = λ then uv = θ(x1 )θ(x2 ) and vu = θ(x2 )θ(x1 ) which,
along with the fact that ρθ (uv) = ρθ (vu) = x implies θ(x1 x2 ) = θ(x2 x1 ) = x1 x2 .
This further implies x1 x2 = x2 x1 which leads to x1 , x2 ∈ p+ for p ∈ Σ+ , i.e., x ∈/ Qθ ,
a contradiction. If α 6= λ, i.e., α = xγ with γ ∈ {x, θ(x)}∗ , then uv = αθ(x) =
x1 x2 γθ(x), vu = θ(x2 )x1 x2 γθ(x1 ). Since ρθ (uv) = ρθ (vu) = x, x1 x2 = θ(x2 )x1 , that
is, x1 θ-commutes with x2 . We arrive at a similar contradiction as that of Case (2).
Thus all possible cases where u, v 6∈ x{x, θ(x)}∗ led to contradictions. The only
remaining possibility is u, v ∈ x{x, θ(x)}∗ , which implies that ρθ (u) = ρθ (v) = x.
The converse is straightforward.
Recall that, given a word w ∈ Σ+ , w = a1 a2 . . . ak , k ≥ 1, and numbers 1 ≤ i ≤
j ≤ k, we denote by w[i..j] the subword of w that starts with ai and ends with aj ,
that is, the word ai . . . aj . The following result [5] provides an algorithm to determine
whether or not a given word w ∈ Σ+ is θ-primitive, and finds all the θ-primitive
factors (subwords) of w.
Case (4): There exist triplets (1, |u|, k) and (1, |v|, k 0 ) such that u[1..|u|] ∈ {t, θ(t)}k
0
and v[1..|v|] ∈ {t0 , θ(t0 )}k . Let u = t1 t2 · · · tk and v = t01 t02 · · · t0k0 where tl ∈ {t, θ(t)}
for 1 ≤ l ≤ k, and tm ∈ {t0 , θ(t0 )} for 1 ≤ m ≤ k 0 . If t1 6= t01 then u ⊥θ v, else u 6⊥θ v.
0
In the previous section, we have seen various properties of two binary relations on
words : relative θ-primitivity, u ⊥θ v, vs. not having common θ-primitive root u 6⊥θ v.
Given two relatively θ-primitive words u, v ∈ Σ+ , we now investigate the relationship
between the words u, v, and the result of the application of a word operation to u and
v. In this setting we consider various binary word operations such as perfect shuffle,
shuffle and θ-catenation. Given u, v ∈ Σ+ , define their shuffle u v as
u v = {u1 v1 · · · uk vk | k ≥ 1, u = u1 u2 · · · uk , v = v1 v2 · · · vk , ui , vi ∈ Σ∗ for 1 ≤ i ≤ k}.
Similarly, the perfect shuffle of two words u and v of the same length, |u| = |v| = k,
k ≥ 1, is defined as
u p v = a1 b1 · · · ak bk where u = a1 a2 · · · ak , v = b1 b2 · · · bk , ai , bi ∈ Σ for 1 ≤ i ≤ k.
In the following proposition we show that there cannot exist two θ-palindromes
of equal length that are relatively θ-primitive, with the property that their perfect
shuffle is a θ-palindrome.
Proof. Assume that u p v ∈ Pθ such that |u| = |v| = m.
Case (1): m is even, i.e., m = 2k for some k ≥ 1. Let u = a1 a2 · · · ak · · · a2k ,
implying θ(u) = θ(a2k ) · · · θ(ak ) · · · θ(a2 )θ(a1 ). Since u is a θ-palindrome, we have
that u = a1 a2 · · · ak θ(ak ) · · · θ(a1 ) where ai ∈ Σ for 1 ≤ i ≤ k. Similarly, v =
b1 b2 · · · bk θ(bk ) · · · θ(b1 ) where bj ∈ Σ for 1 ≤ j ≤ k. Then,
Since u p v ∈ Pθ , θ(bk ) = θ(ak ), . . . , θ(b1 ) = θ(a1 ) which implies bi = ai for 1 ≤ i ≤ k
which further implies u = v, a contradiction since u ⊥θ v.
Case (2): m is odd, i.e., m = 2k + 1 for some k ≥ 1. Since u is a θ-
palindrome, we have that u is of the form u = a1 a2 · · · ak a0 ak+1 · · · a2k = θ(u) =
θ(a2k ) · · · θ(a0 ) · · · θ(a2 )θ(a1 ). Thus u = a1 a2 · · · ak a0 θ(ak ) · · · θ(a1 ) where ai , a0 ∈ Σ
for 1 ≤ i ≤ k and θ(a0 ) = a0 . Similarly, v = b1 b2 · · · bk b0 θ(bk ) · · · θ(b1 ). where bj , b0 ∈ Σ
for 1 ≤ j ≤ k and θ(b0 ) = b0 . Then,
Since u p v ∈ Pθ , we should have b0 = θ(a0 ) = a0 , θ(bk ) = θ(ak )...θ(b1 ) = θ(a1 ) which
imply b0 = a0 , bi = ai for 1 ≤ i ≤ k which further implies u = v, a contradiction since
u ⊥θ v.
Since both the cases lead to a contradiction, u p v ∈ / Pθ .
In Proposition 23, we will prove that, under certain conditions, if two equi-length
words u and v are relatively θ-primitive then u and any word in the shuffle (u v)
are relatively θ-primitive, and the same holds for v. For a letter a ∈ Σ we denote
by |u|a,θ(a) the number of occurrences of a’s and θ(a)’s in u (see [3]). Note that,
for a word u ∈ Σ∗ , a letter a ∈ Σ, and an (anti)morphic involution θ, we have that
|u|a,θ(a) = |θ(u)|a,θ(a) .
Proof. Assume that u 6⊥θ v, i.e., ρθ (u) = ρθ (v) = x, x ∈ Σ+ . Then u, v ∈ x{x, θ(x)}∗
and, since |u| = |v|, we have that u = xx1 x2 . . . xm−1 and v = xy1 y2 . . . ym−1 , where
m ≥ 1 and xi , yi ∈ {x, θ(x)}, 1 ≤ i ≤ m−1. For all a ∈ Σ, since |x|a,θ(a) = |θ(x)|a,θ(a) ,
we have that |u|a,θ(a) = m|x|a,θ(a) = |v|a,θ(a) . This contradicts the hypothesis that
there exists a ∈ Σ such that |u|a,θ(a) 6= |v|a,θ(a) . Thus u ⊥θ v.
Proof. Assume for the sake of contradiction that there exists w ∈ u v such that
either w 6⊥θ u or w 6⊥θ v. Without loss of generality, assume w 6⊥θ u. Then ρθ (w) =
ρθ (u) = x, x ∈ Σ+ . Since ρθ (u) = x, we have that u = xx1 x2 · · · xm−1 for some
m ≥ 1 and xi ∈ {x, θ(x)} for 1 ≤ i ≤ m − 1. Since ρθ (w) = x, we have that
w = xx1 x2 · · · xk−1 for some k ≥ 1 and xj ∈ {x, θ(x)} for 1 ≤ j ≤ k − 1. Note that,
since w ∈ u v and |u| = |v|, we have that k = 2m.
Let a be an arbitrary letter in Σ. We have that |u|a,θ(a) = m|x|a,θ(a) , and |w|a,θ(a) =
2m|x|a,θ(a) . Moreover, since w ∈ u v we have that |w|a,θ(a) = |u|a,θ(a) + |v|a,θ(a) .
This, further implies that |v|a,θ(a) = |w|a,θ(a) − |u|a,θ(a) = 2m|x|a,θ(a) − m|x|a,θ(a) =
m|x|a,θ(a) = |u|a,θ(a) , for all a ∈ Σ. This contradicts the hypothesis that there exists
a letter a ∈ Σ such that |u|a,θ(a) 6= |v|a,θ(a) . Hence (u v) ⊥θ u.
u v = {uv, uθ(v)}.
Example 25. Let Σ = {a, b, c, d} such that θ(a) = b, θ(c) = d and vice versa for an
antimorphic involution θ. Then, for u = ac, v = db and u v = {acdb, acac}. We
have that u ⊥θ v but ρθ (acdb) = ρθ (acac) = ac and thus acdb 6⊥θ u and acac 6⊥θ u.
We observe in the above example that u 6⊥θ θ(v), and hence for uv to be relatively
θ-primitive with u, it is necessary to have the condition that u ⊥ {v, θ(v)}.
Proof. Assume that for some x ∈ u v, x 6⊥θ u, i.e., ρθ (x) = ρθ (u) = y. This
implies that either uv ∈ y{y, θ(y)}∗ or uθ(v) ∈ y{y, θ(y)}∗ . Since u ∈ y{y, θ(y)}∗ ,
this further implies that either v ∈ {y, θ(y)}∗ or θ(v) ∈ {y, θ(y)}∗ . Thus, either v 6⊥θ u
or θ(v) 6⊥θ u, a contradiction. Hence, for all x ∈ u v, we have that x ⊥θ u.
Proof. Assume that for some x ∈ u v, x 6⊥θ v, i.e., ρθ (x) = ρθ (v) = y. Then either
uv ∈ y{y, θ(y)}∗ or uθ(v) ∈ y{y, θ(y)}∗ which further implies that u ∈ y{y, θ(y)}∗ .
Thus u 6⊥θ v, a contradiction. Hence, for all x ∈ u v, we have that x ⊥θ v.
For a given (anti)morphic involution θ and a word x ∈ Σ+ , consider the language
of all words w that are relatively θ-primitive with x,
Lθ,λ (x) = {w ∈ Σ∗ |w ⊥θ x}
as well as its complement, the language of words that have a common θ-primitive root
with x
Lθ (x) = {w ∈ Σ∗ |w 6⊥θ x}.
Note that, for a given(anti)morphic involution θ and x ∈ Σ+ we have that Lθ,λ (x)∪
Lθ (x) = Σ∗ . In addition, the language Lθ (x) can be described using the θ-catenation
power. Indeed, define [7] the iterated θ-catenation i , for i ≥ 0, as L1 0 L2 = L1
and L1 i L2 = (L1 i−1 L2 ) L2 . Then the i-th -power of a non-empty language
L is defined as L(0) = λ and L(i) = L i−1 L for i ≥ 1. We can now describe the
language of all words that have a common θ-primitive root with x ∈ Σ+ as
Finally, note that there does not exist any language L that is independent with
respect to ⊥θ , and that the language of all θ-primitive words, Qθ , is independent with
respect to 6⊥θ .
References