
Theoretical Computer Science 578 (2015) 2–12


On the state complexity of partial word DFAs


Eric Balkanski (a), F. Blanchet-Sadri (b,∗), Matthew Kilgore (c), B.J. Wyatt (b)

(a) Department of Mathematical Sciences, Carnegie Mellon University, Wean Hall 6113, Pittsburgh, PA 15213, USA
(b) Department of Computer Science, University of North Carolina, P.O. Box 26170, Greensboro, NC 27402-6170, USA
(c) Department of Mathematics, Lehigh University, 14 East Packer Avenue, Bethlehem, PA 18015, USA

Article info

Article history:
Received 28 October 2013
Received in revised form 19 May 2014
Accepted 14 January 2015
Available online 19 January 2015
Keywords:
Automata and formal languages
State complexity
Regular languages
Partial languages
Partial words
Deterministic finite automata
Non-deterministic finite automata

Abstract
Recently, Dassow et al. connected partial words and regular languages. Partial words are sequences in which some positions may be undefined, represented with a hole symbol ⋄. If we restrict what the symbol ⋄ can represent, we can use partial words to compress the representation of regular languages. Doing so allows the creation of so-called ⋄-DFAs, smaller than the DFAs recognizing the original language L, which recognize the compressed language. However, the ⋄-DFAs may be larger than the NFAs recognizing L. In this paper, we investigate a question of Dassow et al. as to how these sizes are related.
© 2015 Elsevier B.V. All rights reserved.

1. Introduction

The study of regular languages dates back to McCulloch and Pitts' investigation of neuron nets (1943) and has been extensively developing since (for a survey see, e.g., [2]). Regular languages can be represented by deterministic finite automata, DFAs, by non-deterministic finite automata, NFAs, and by regular expressions. They have found a number of important applications such as compiler design. There are well-known algorithms to convert a given NFA to an equivalent DFA and to minimize a given DFA, i.e., find an equivalent DFA with as few states as possible (see, e.g., [3]). It turns out that there are languages accepted by DFAs that have 2^n states while their equivalent NFAs only have n states, these DFAs with 2^n states being optimal, i.e., minimal.

Let Σ be a finite alphabet of letters. A (full) word over Σ is a sequence of letters from Σ. We denote by Σ* the set of all words over Σ, the free monoid generated by Σ under the concatenation of words where the empty word ε serves as the identity. A language L over Σ is a subset of Σ*. It is regular if it is recognized by a DFA or an NFA. A DFA is a 5-tuple M = (Q, Σ, δ, q_0, F), where Q is a set of states, δ : Q × Σ → Q is the transition function, q_0 ∈ Q is the start state, and F ⊆ Q is the set of final or accept states. In an NFA, δ maps Q × Σ to 2^Q. We call |Q| the state complexity of the automaton.

Recently, Dassow et al. [4] connected regular languages and partial words. Partial words first appeared in 1974 and are also known under the name of strings with don't cares [5]. In 1999, Berstel and Boasson [6] initiated their combinatorics under the name of partial words. Since then, many combinatorial properties and algorithms have been developed (see,

This material is based upon work supported by the National Science Foundation under Grant No. DMS-1060775. Part of this paper was presented at CIAA'13 [1]. We thank the referees of a preliminary version of this paper for their very valuable comments and suggestions.
∗ Corresponding author.
E-mail address: blanchet@uncg.edu (F. Blanchet-Sadri).

http://dx.doi.org/10.1016/j.tcs.2015.01.021
0304-3975/© 2015 Elsevier B.V. All rights reserved.


e.g., [7]). One of Dassow et al.'s motivations was to compress DFAs into smaller machines, called partial word DFAs or ⋄-DFAs, which may have transitions labelled by ⋄, a "don't care" symbol that can replace letters of the alphabet. This feature provides a type of restricted nondeterminism that allows ⋄-DFAs to fall between NFAs and DFAs in regards to state complexity. In this paper, we solve several problems raised by Dassow et al., providing a better understanding of the structure of the ⋄-DFAs, their relation to DFAs and NFAs, and the efficiency of using them to accept regular languages.

More precisely, setting Σ⋄ = Σ ∪ {⋄}, where ⋄ ∉ Σ represents an undefined position, or a hole, and matches every letter of Σ, a partial word over Σ is a sequence of symbols from Σ⋄ (a full word is a partial word without holes). For example, ab⋄bbc⋄ba is a partial word with two holes over {a, b, c}. Denoting the set of all partial words over Σ by Σ⋄*, a partial language L⋄ over Σ is a subset of Σ⋄*. It is regular if it is regular when being considered over Σ⋄. In other words, we define languages of partial words, or partial languages, by treating ⋄ as a letter. A regular partial language over Σ, subset of Σ⋄*, can be recognized by a ⋄-DFA, i.e., a DFA of the form (Q, Σ⋄, δ, q_0, F). Partial languages over Σ, subsets of Σ⋄*, can be transformed to languages over Σ, subsets of Σ*, by using ⋄-substitutions over Σ. A ⋄-substitution σ : Σ⋄* → 2^{Σ*} satisfies σ(a) = {a} for all a ∈ Σ, σ(⋄) ⊆ Σ, and σ(uv) = σ(u)σ(v) for u, v ∈ Σ⋄*. As a result, σ is fully defined by σ(⋄); e.g., if σ(⋄) = {a, b} and L⋄ = {⋄b, ⋄c} then σ(L⋄) = {ab, bb, ac, bc}. If we consider this process in reverse, we can compress languages into partial languages.
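Applied to a finite partial language, a ⋄-substitution just expands every hole independently. The following Python sketch (ours, not from the paper; the hole symbol is written as `?`) reproduces the example σ(⋄) = {a, b} applied to {⋄b, ⋄c}:

```python
from itertools import product

HOLE = "?"  # stands in for the hole symbol

def substitute_word(w, hole_image):
    """All full words obtained by replacing each hole in w
    by a letter from hole_image (the set sigma(hole))."""
    choices = [sorted(hole_image) if c == HOLE else [c] for c in w]
    return {"".join(p) for p in product(*choices)}

def substitute_language(partial_lang, hole_image):
    """sigma(L): union of the expansions of every partial word in L."""
    full = set()
    for w in partial_lang:
        full |= substitute_word(w, hole_image)
    return full

print(substitute_language({"?b", "?c"}, {"a", "b"}))
# yields {'ab', 'bb', 'ac', 'bc'}, as in the example above
```

Note that since σ(a) = {a} for letters, only the holes branch; a word with k holes expands to at most |σ(⋄)|^k full words.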
Given a regular language L, L ⊆ Σ*, the minimal state complexity among all ⋄-DFAs that accept partial languages L⋄, L⋄ ⊆ Σ⋄*, such that σ(L⋄) = L for some ⋄-substitution σ, is referred to as min⋄-DFA(L). We consider the following question from Dassow et al. [4]: Is there a regular language L such that min⋄-DFA(L) is (strictly) less than minDFA(L), the minimal state complexity of a DFA accepting L? Reference [4, Theorem 4] states that for every regular language L, we have minDFA(L) ≥ min⋄-DFA(L) ≥ minNFA(L), where minNFA(L) denotes the minimal state complexity of an NFA accepting L, and there exist regular languages L such that minDFA(L) > min⋄-DFA(L) > minNFA(L). On the other hand, [4, Theorem 5] states that if n ≥ 3 is an integer, regular languages L and L′ exist such that min⋄-DFA(L) ≤ n + 1, minDFA(L) = 2^n − 2^{n−2}, minNFA(L′) ≤ 2n + 1, and min⋄-DFA(L′) ≥ 2^n − 2^{n−2}. This has been the first step towards analyzing the sets:

D_n = {m | there exists L such that min⋄-DFA(L) = n and minDFA(L) = m},

N_n = {m | there exists L such that min⋄-DFA(L) = n and minNFA(L) = m}.

Hence, D_n describes the increase in state complexity when using a DFA instead of a ⋄-DFA, while N_n describes the decrease in state complexity when using an NFA instead of a ⋄-DFA.

Our paper, whose focus is the analysis of D_n and N_n, is organized as follows. We obtain in Section 2 values belonging to D_n by looking at specific types of regular languages, followed by values belonging to N_n in Section 3. We show, in particular, that 2^n − 1 is the least upper bound for values in D_n when we consider languages with some arbitrarily long words. Due to the nature of NFAs, generating a sequence of minimal NFAs from a ⋄-DFA is difficult. However, in the case minDFA(L) > min⋄-DFA(L) = minNFA(L), we show how to use concatenation of languages to create an L′ with systematic differences between min⋄-DFA(L′) and minNFA(L′). We also develop a way of applying integer partitions to obtain such values. We conclude with some remarks in Section 4.
2. Constructs for D_n

This section provides some values for D_n by analyzing several classes of regular languages. In the description of the transition function of our DFAs and ⋄-DFAs, all the transitions lead to the error state (a sink non-final state) unless otherwise stated. Also, in our figures, the error state and transitions leading to it have been removed for clarity. We will often refer to the following algorithm.

Given a ⋄-DFA M⋄ = (Q⋄, Σ⋄, δ⋄, q_0, F⋄) and a ⋄-substitution σ, Algorithm 1 gives a minimal DFA that accepts σ(L(M⋄)):

• Build an NFA N = (Q⋄, Σ, δ, q_0, F⋄) that accepts σ(L(M⋄)), where δ(q, a) = {δ⋄(q, a)} if a ∈ Σ \ σ(⋄) and δ(q, a) = {δ⋄(q, a), δ⋄(q, ⋄)} if a ∈ σ(⋄).
• Convert N to an equivalent minimal DFA.
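Algorithm 1 can be prototyped directly. The sketch below is ours (states are integers, the hole is written `?`, the error state is an explicit sink), and it stops at the reachable subset automaton, omitting the final minimization step. It is run on a small ⋄-DFA for {⋄b, ⋄c} with σ(⋄) = {a, b}:

```python
HOLE = "?"

def diamond_dfa_to_nfa(delta_d, hole_image):
    """Step 1 of Algorithm 1: delta_d maps (state, symbol) -> state,
    with symbols including HOLE; returns the NFA transition function."""
    def delta(q, a):
        targets = {delta_d[(q, a)]}
        if a in hole_image:            # a may also have been produced by a hole
            targets.add(delta_d[(q, HOLE)])
        return targets
    return delta

def determinize(delta, start, alphabet):
    """Step 2 (subset construction; minimization omitted in this sketch)."""
    start_set = frozenset({start})
    states, todo, trans = {start_set}, [start_set], {}
    while todo:
        P = todo.pop()
        for a in alphabet:
            R = frozenset(q2 for q in P for q2 in delta(q, a))
            trans[(P, a)] = R
            if R not in states:
                states.add(R)
                todo.append(R)
    return states, trans

# Demo: a diamond-DFA accepting {?b, ?c}; state 2 accepts, state 3 is the sink.
alphabet = ["a", "b", "c"]
delta_d = {(q, s): 3 for q in range(4) for s in alphabet + [HOLE]}
delta_d[(0, HOLE)] = 1                 # 0 --hole--> 1
delta_d[(1, "b")] = 2                  # 1 --b--> 2 (accept)
delta_d[(1, "c")] = 2

delta = diamond_dfa_to_nfa(delta_d, {"a", "b"})
states, trans = determinize(delta, 0, alphabet)

def accepts(word):
    P = frozenset({0})
    for a in word:
        P = trans[(P, a)]
    return 2 in P                      # accepting iff old accept state 2 is in P

print(sorted(w for w in ("ab", "bb", "ac", "bc", "cb", "aa") if accepts(w)))
# ['ab', 'ac', 'bb', 'bc']
```

The accepted set is exactly σ({⋄b, ⋄c}) = {ab, bb, ac, bc}, matching the earlier example.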
2.1. Languages of words of equal length
First, we look at languages of words of equal length. We give three constructs. Our first construct is illustrated in Fig. 1.
Theorem 1. For n ≥ 1, ⌊(n − 1)/3⌋² + ⌊(n − 1)/3⌋ + 2 + ((n − 1) mod 3) ∈ D_n.

Proof. Let d = ⌊(n − 1)/3⌋ and r = (n − 1) mod 3. We define the DFA M = (Q, Σ, δ, q_0, F) as follows:

• Q = {(0, 0)} ∪ Q_1 ∪ Q_2 ∪ Q_3, where q_0 = (0, 0), Q_1 = {(1, i) | 0 ≤ i < d}, Q_2 = {(2, i) | 1 ≤ i < d²}, and Q_3 = {(i, 0) | 3 ≤ i ≤ r + 4}, F = {(r + 3, 0)}, and (r + 4, 0) is the error state;
• Σ = {a_0, ..., a_{d−1}} ∪ {b_1, ..., b_{d−1}} ∪ {b_{d+1}, ..., b_{2d−1}} ∪ {c};


Fig. 1. DFA M (left) and ⋄-DFA M⋄ (right) from Theorem 1, n = 11. The error states are omitted.

Fig. 2. DFA M (left) and ⋄-DFA M⋄ (right) from Theorem 2, n = 11, x = 4. The error states are omitted.

• δ is defined as follows:
  δ((0, 0), a_i) = (1, i) for all a_i ∈ Σ,
  δ((1, i), a_j) = (2, id + j) if (1, i), (2, id + j) ∈ Q and a_j ∈ Σ,
  δ((2, i), b_j) = (3, 0) if (2, i) ∈ Q, b_j ∈ Σ, j = ⌊i/d⌋ or j = (i mod d) + d,
  δ((i, 0), c) = (i + 1, 0) for 3 ≤ i ≤ 2 + r.

Observe that L = L(M) = {a_i a_j b_k c^r | a_i, a_j, b_k ∈ Σ; k = i or k = d + j} and that M contains |Q_1| + |Q_2| + |Q_3| + 1 = d + d² + r + 2 states. Furthermore, each state from Q_2 is reached after M reads a unique two-letter sequence, and (3, 0) is reachable from each state of Q_2 for a unique subset of letters, so M is minimal. We can build a ⋄-DFA M⋄ such that σ(L(M⋄)) = L for σ(⋄) = {a_i | a_i ∈ Σ}. Let M⋄ = (Q⋄, Σ⋄, δ⋄, q_0, F⋄), where:

• Q⋄ = {(0, 0)} ∪ Q_1 ∪ Q′_2 ∪ Q_3, where Q′_2 = {(2, i) | 1 ≤ i < 2d, i ≠ d}, q_0 = (0, 0), F⋄ = {(r + 3, 0)}, and the error state is (r + 4, 0);
• δ⋄ is defined as follows:
  δ⋄((0, 0), ⋄) = (1, 0) and δ⋄((0, 0), a_i) = (1, i) for all a_i ∈ Σ, i ≠ 0,
  δ⋄((1, 0), a_i) = (2, d + i) and δ⋄((1, i), ⋄) = (2, i) for 1 ≤ i < d,
  δ⋄((2, i), b_i) = (3, 0) for all (2, i) ∈ Q⋄,
  δ⋄((i, 0), c) = (i + 1, 0) for 3 ≤ i ≤ 2 + r.

Then L(M⋄) = {⋄a_i b_j c^r | a_i, b_j ∈ Σ; j = d + i} ∪ {a_i ⋄b_j c^r | a_i, b_j ∈ Σ; j = i}. Furthermore, M⋄ is minimal with 3d + r + 1 = n states over all ⋄-substitutions since each pair of states (1, i), (2, j) is necessary. If i = 0, the pair is needed for ⋄a_{j−d} b_j; otherwise, it is needed for a_i ⋄b_j. □
Our next construct is illustrated in Fig. 2.
Theorem 2. For n ≥ 1, if x = ⌊(√(1 + 8(n − 1)) − 1)/2⌋ then 2^x + n − 1 − x(x + 1)/2 ∈ D_n.

Proof. We start by writing n as n = r + Σ_{i=1}^{x} i such that 1 ≤ r ≤ x + 1 (from [8], x is as stated). Let M = (Q, Σ, δ, q_0, F) be the DFA defined as follows:


Fig. 3. DFA M (top) and ⋄-DFA M⋄ (bottom) from Theorem 3, k = 3, l = 1, r = 0. The error states are omitted.

• Q = {(i, j) | 0 ≤ i < x, 0 ≤ j < 2^i, (i, j) ≠ (x − 1, 0)} ∪ {(i, 0) | x ≤ i ≤ x + r}, q_0 = (0, 0), F = {(x + r − 1, 0)}, and (x + r, 0) is the error state;
• Σ = {a_0, a_1, c} ∪ {b_i | 1 ≤ i < x};
• δ is defined as follows:
  δ((i, j), a_k) = (i + 1, 2j + k) for all (i, j), (i + 1, 2j + k) ∈ Q, a_k ∈ Σ, i ≠ x − 1, with the exception of δ((x − 2, 0), a_0) = (x + r, 0),
  δ((x − 1, i), b_j) = (x, 0) for all (x − 1, i) ∈ Q, b_j ∈ Σ where the jth digit from the right in the binary representation of i is a 1,
  δ((i, 0), c) = (i + 1, 0) for x ≤ i < x + r.

Each word accepted by M can be written in the form w = u b_i c^{r−1}, where u is a word of length x − 1 over {a_0, a_1} except for a_0^{x−1}, and b_i belongs to some subset of Σ unique for each u. This implies that M is minimal with 2^x + n − 1 − x(x + 1)/2 states. We can build the minimal equivalent ⋄-DFA for σ(⋄) = {a_0, a_1}, giving M⋄ = (Q⋄, Σ⋄, δ⋄, q_0, F⋄) with n states as follows:

• Q⋄ = {(i, j) | 0 ≤ i < x, 0 ≤ j ≤ i, (i, j) ≠ (x − 1, 0)} ∪ {(i, 0) | x ≤ i ≤ x + r}, q_0 = (0, 0), F⋄ = {(x + r − 1, 0)}, and (x + r, 0) is the error state;
• δ⋄ is defined as follows:
  δ⋄((i, 0), a_1) = (i + 1, i + 1) for 0 ≤ i < x − 1,
  δ⋄((i, j), ⋄) = (i + 1, j) for all (i, j) ∈ Q⋄ \ {(x − 2, 0)} where i < x − 1,
  δ⋄((x − 1, i), b_{x−i}) = (x, 0) for 1 ≤ i < x,
  δ⋄((x + i, 0), c) = (x + i + 1, 0) for 0 ≤ i < r − 1.

Observe that L(M⋄) = {⋄^{x−i−1} a_1 ⋄^{i−1} b_i c^{r−1} | 1 ≤ i < x}, so σ(L(M⋄)) = L(M). Each accepted word consists of a unique prefix of length x − 1 paired with a unique b_i, and r states are needed for the suffix c^{r−1}, which implies that M⋄ is minimal over all ⋄-substitutions. Note that |Q⋄| = (Σ_{i=1}^{x} i) + r = n. □
The two previous constructs both used an alphabet of variable size. Our next construct restricts this to a constant k. It is
illustrated in Fig. 3.
Theorem 3. For k > 1 and l, r ≥ 0, let n = k(k + 2l + 3)/2 + r + 2. Then 2^{k+1} + l(2^k − 1) + r ∈ D_n.
Proof. We start by defining M = (Q, Σ, δ, q_0, F) as follows:

• Q = Q_1 ∪ Q_2 ∪ Q_3, where Q_1 = {(i, j) | 0 ≤ i ≤ k and 0 ≤ j < 2^i} \ {(k, 0)}, Q_2 = {(i, j) | k < i ≤ k + l and 1 ≤ j < 2^k}, Q_3 = {(i, 0) | k + l < i ≤ k + l + r + 2}, q_0 = (0, 0), F = {(k + l + r + 1, 0)}, and (k + l + r + 2, 0) is the error state;
• Σ = {a_0, ..., a_{k−1}};


Fig. 4. DFA M (top) and ⋄-DFA M⋄ (bottom) from Theorem 4, n = 7 and m = 15 (l = 3, r = 1). The error states are omitted.

• δ is defined as follows:
  δ((i, j), a_0) = (i + 1, 2j) for all (i, j) ∈ Q, 0 ≤ i < k, except (k − 1, 0),
  δ((i, j), a_1) = (i + 1, 2j + 1) for all (i, j) ∈ Q, 0 ≤ i < k,
  δ((i, j), a_0) = (i + 1, j) for all (i, j) ∈ Q, k ≤ i < k + l,
  δ((k + l, i), a_{j−1}) = (k + l + 1, 0) for all (k + l, i) ∈ Q and a_{j−1} ∈ Σ where the jth digit from the right in the binary representation of i is 1,
  δ((i, 0), a_0) = (i + 1, 0) for k + l + 1 ≤ i ≤ k + l + r.

Each word accepted by M can be written as x y a_i z, where x is a word of length k over {a_0, a_1} with at least one a_1, y = a_0^l, z = a_0^r, and rev(x)[i] = a_1, where rev(x) denotes the reversal of x. Each x corresponds to a different non-empty subset of Σ representing which letters a_i are allowed in position k + l (numbering of positions starts at 0), meaning that each prefix xy must be represented by a unique state in M, which accounts for the states in Q_1 and Q_2. The set Q_3 contains (k + l + 1, 0), which M reaches after reading any prefix x y a_i of an accepted word, along with the r states needed to spell z and the error state. Thus, M is minimal with 2^{k+1} + l(2^k − 1) + r states.

We now define a ⋄-DFA M⋄ = (Q⋄, Σ⋄, δ⋄, q_0, F⋄) as follows:

• Q⋄ = Q′_1 ∪ Q′_2 ∪ Q_3 where Q′_1 = {(i, j) | 0 ≤ i ≤ k and 0 ≤ j ≤ i} \ {(k, 0)}, Q′_2 = {(i, j) | k < i ≤ k + l and 0 < j ≤ k}, Q_3 = {(i, 0) | k + l < i ≤ k + l + r + 2}, q_0 = (0, 0), F⋄ = {(k + l + r + 1, 0)}, and (k + l + r + 2, 0) is the error state;
• δ⋄ is defined as follows:
  δ⋄((i, j), ⋄) = (i + 1, j) for all (i, j) ∈ Q⋄, 0 ≤ i < k, except for (i, j) = (k − 1, 0),
  δ⋄((i, 0), a_1) = (i + 1, i + 1) for all (i, 0) ∈ Q⋄, 0 ≤ i < k,
  δ⋄((i, j), a_0) = (i + 1, j) for all (i, j) ∈ Q⋄, k ≤ i < k + l,
  δ⋄((k + l, i), a_{k−i}) = (k + l + 1, 0) for all (k + l, i) ∈ Q⋄ and a_{k−i} ∈ Σ,
  δ⋄((i, 0), a_0) = (i + 1, 0) for k + l + 1 ≤ i ≤ k + l + r.

We have σ(L(M⋄)) = L for σ(⋄) = {a_0, a_1}. State (k + l + 1, 0) is reachable from each state (k + l, i) through a single, uniquely labelled transition. Thus, M⋄ is minimal with n states. □
2.2. Languages of words of bounded length
Next, we look at languages of words of bounded length. The following theorem is illustrated in Fig. 4.
Theorem 4. For n ≥ 3, [n, n + (n − 2)(n − 3)/2] ⊆ D_n.

Proof. Write m = n + r + Σ_{i=l}^{n−3} i for the lowest value of l ≥ 1 such that r ≥ 0. Let M = (Q, Σ, δ, q_0, F) be defined as follows:

• Σ = {a_0, a_r} ∪ {a_i | l ≤ i ≤ n − 3};
• Q = {(i, 0) | 0 ≤ i < n} ∪ {(i, j) | a_j ∈ Σ and 1 ≤ i ≤ j}, q_0 = (0, 0), F = {(n − 2, 0)} ∪ {(i, i) | i ≠ 0, (i, i) ∈ Q}, and (n − 1, 0) is the error state;
• δ is defined by δ((0, 0), a_i) = (1, i) for all a_i ∈ Σ where i > 0, δ((i, j), a_0) = (i + 1, j) for all (i, j) ∈ Q, i ≠ j, and δ((i, i), a_0) = (i + 1, 0) for all (i, i) ∈ Q.


Fig. 5. ⋄-DFA R_7 (top, if the dashed edges are seen as solid) and minimal DFA for L_7 (bottom, if the dotted element is ignored and the dashed edges are seen as solid), where 0 = a, 1 = b, 2 = c, 3 = d and σ(⋄) = {a_0, a_2, a_3, a_4, a_5, b_3, b_4, b_5, c_4, c_5, d_5}. The error states are omitted.

Then L(M) = {a_i a_0^{n−3} | a_i ∈ Σ} ∪ {a_i a_0^{i−1} | a_i ∈ Σ, i ≠ 0}. For each a_i ∈ Σ, i ≠ 0, M requires i states. These are added to the error state and the n − 1 states needed for a_0^{n−2}. Thus, M is minimal with m states. Let M⋄ = (Q⋄, Σ⋄, δ⋄, q_0, F⋄), where Q⋄ = {i | 0 ≤ i < n}, q_0 = 0, F⋄ = {n − 2}, and n − 1 is the error state; δ⋄ is defined by δ⋄(0, ⋄) = 1, δ⋄(0, a_i) = n − 1 − i for all a_i ∈ Σ, i > 0, and δ⋄(i, a_0) = i + 1 for 1 ≤ i < n − 1. For σ(⋄) = Σ, we have σ(L(M⋄)) = L(M). Furthermore, M⋄ needs n − 1 states to accept ⋄a_0^{n−3} ∈ L(M⋄), so M⋄ is minimal with n states. □
Theorem 4 gives elements of D_n close to its lower bound. To find an upper bound, we look at a specific class of machines. Let n ≥ 2 and let

R_n = ({0, ..., n − 1}, {a_0} ∪ {(i)_j | 2 ≤ i + 2 ≤ j ≤ n − 2} ∪ {⋄}, δ⋄, 0, {n − 2})    (1)

be the ⋄-DFA where n − 1 is the error state, and δ⋄ is defined by δ⋄(i, ⋄) = i + 1 for 0 ≤ i < n − 2 and δ⋄(i, (i)_j) = j for all (i)_j ∈ Σ. Fig. 5 gives an example when n = 7. Set L_n = σ(L(R_n)), where σ is the ⋄-substitution that maps ⋄ to the alphabet. Note that R_n is minimal for L(R_n), since we need at least n − 1 states to accept words of length n − 2 without accepting longer strings. Furthermore, R_n is minimal for σ, as each letter (i)_j encodes a transition between a unique pair of states (i, j). This also implies that R_n is minimal for any ⋄-substitution. The next two theorems look at the minimal DFA that accepts L_n. We refer the reader to Fig. 5 to visualize the ideas behind the proofs.
Referring to Fig. 5, in the DFA, each explicitly labelled transition is for the indicated letters. From each state, there is one transition that is not labelled; this represents the transition for each letter not explicitly labelled in a different transition from that state. (For example, from state 0, a_3 transitions to {1, 3}, a_2 transitions to {1, 2}, a_4 transitions to {1, 4}, a_5 transitions to {1, 5}, and all other letters a_0, b_3, b_4, b_5, c_4, c_5, d_5 transition to {1}.) We introduce a new letter, e, into the alphabet and add a new state, {2, 3, 4, 5}, along with a transition from {1, 3} to {2, 3, 4, 5} for e. We want to alter the ⋄-DFA to accommodate this. So we add a transition for e from 1 to 3 and from 3 to 5 (represented by dashed edges). All other states transition to the error state for e. Now consider the string a_3 e. We get four partial words that produce a_3 e after substitution: a_3 e, a_3 ⋄, ⋄e, and ⋄⋄. When the ⋄-DFA reads the first, it halts in state 5; on the second, it halts in 4; on the third, it halts in 3; and for the fourth, it halts in 2, which matches the added state {2, 3, 4, 5}. Finally, we need to consider the effect of adding e and the described transitions to the ⋄-DFA: does it change the corresponding minimal DFA in other ways? To show that it does not, all transitions with dashed edges in the DFA represent the transitions for e; e.g., from state {2, 3}, an e transitions to {3, 4, 5}.
Theorem 5. Let Fib be the Fibonacci sequence defined by Fib(1) = Fib(2) = 1 and, for n ≥ 2, Fib(n + 1) = Fib(n) + Fib(n − 1). Then for n ≥ 1, Fib(n + 1) ∈ D_n.


Proof. For n ≥ 2, applying Algorithm 1, convert M⋄ = R_n to a minimal DFA M = (Q, Σ, δ, q_0, F) that accepts L_n, where Q ⊆ 2^{{0,...,n−1}}. For each state {i} ∈ Q for 0 ≤ i ≤ n − 2, M requires additional states to represent each possible subset of one or more states of {i + 1, ..., n − 2} that M⋄ could reach in i transitions. Thus M is minimal with number of states

1 + Σ_{i=0}^{n−2} Σ_{j=0}^{min{i, n−2−i}} C(n − 2 − i, j) = Fib(n + 1),

where the summand 1 refers to the error state and where the inside sum refers to the number of states with minimal element i (C(·,·) denotes the binomial coefficient). □
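The closing identity can be sanity-checked numerically (our script, not from the paper):

```python
from math import comb

def fib(k):
    """Fib(1) = Fib(2) = 1, Fib(k+1) = Fib(k) + Fib(k-1)."""
    a, b = 1, 1
    for _ in range(k - 2):
        a, b = b, a + b
    return b

def state_count(n):
    """1 (error state) + states grouped by their minimal element i."""
    return 1 + sum(comb(n - 2 - i, j)
                   for i in range(n - 1)               # 0 <= i <= n-2
                   for j in range(min(i, n - 2 - i) + 1))

for n in range(2, 25):
    assert state_count(n) == fib(n + 1)
print("identity verified for n = 2..24")
```

For instance, n = 7 gives 21 = Fib(8) states for the minimal DFA accepting L_7.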
We can use the machine M from the proof of Theorem 5 to construct a machine that gives the least upper bound for D_n for languages of words of bounded length.

Theorem 6. For n ≥ 3,

1 + Σ_{i=0}^{n−2} Σ_{j=0}^{min{2^i − 1, n−2−i}} C(n − 2 − i, j) ∈ D_n.

Proof. Let M and M⋄ be as in Theorem 5 and let σ(⋄) = Σ = {a_0} ∪ {(i)_j | 2 ≤ i + 2 ≤ j ≤ n − 2}. For each state P ∈ Q \ {{n − 2}, {n − 1}}, consider the set P′ ⊆ Q of possible states to which M transitions when reading one symbol. For each state i in P, if i + 1 ≤ n − 1 then i + 1 is in each state p ∈ P′, as M must track, for each symbol, the state M⋄ would reach from a current state on reading ⋄. At most, for each i in P, i ≤ n − 4, p may contain one additional state j, i + 2 ≤ j ≤ n − 2, to track non-⋄ transitions in M⋄ that do not end in the error state. Thus for each i, 1 ≤ i ≤ n − 2, M needs at maximum one state corresponding to each non-empty subset of {i, ..., n − 2}. Counting the number of resulting states gives our result as the upper bound for m in this case.

For n ≥ 7, M contains fewer states than our bound. However, we can modify M to produce a machine M_1 that reaches this bound. Let N = (Q′, Σ′, δ′, q_0, F′) be the NFA accepting L(M) such that δ′(q, a) = {δ(q, a)} for each (q, a). Set Q′ = Q ∪ S_2 ∪ ⋯ ∪ S_{n−4}, where S_i is the set of all subsets of {0, ..., n − 2} of size no greater than 2^i containing i as its lowest member. Then for each state P ∈ Q′ that is unreachable, we add a letter α ∉ Σ to our alphabet and define δ′ for α as follows. Let i be the lowest state in P. We select a state P_1 ∈ Q′ such that P_1 is a subset of P of size |P| − 1 that contains i. (If no such P_1 yet exists, we move to a different P; eventually, such a P_1 is generated with this method.) We then select a state P_2 ∈ Q′ such that there is a transition δ′(P_2, a) = P_1 for some a ∈ Σ′, a ≠ a_0. We then define δ′(P_2, α) = P.

We convert N into a minimal equivalent DFA M_1 = (Q_1, Σ_1, δ_1, {q_0}, F_1), which has the same number of states as our bound. Then for σ(⋄) = Σ_1, we derive a minimal ⋄-DFA M⋄_1 so that L(M_1) = σ(L(M⋄_1)). Note that M⋄_1 and M⋄ have the same number of states, and the transition functions are identical for all states and symbols common to both machines. For each added letter α, all but two transitions lead to the error state. □
Our next result restricts the alphabet size to 2.
Theorem 7. For n ≥ 1,

(⌊n/2⌋(⌊n/2⌋ + 1) + ⌊(n − 1)/2⌋(⌊(n − 1)/2⌋ + 1))/2 + 1 ∈ D_n.

Proof. Let M⋄ = (Q⋄, Σ⋄, δ⋄, q_0, F⋄) be the ⋄-DFA, where Q⋄ = {0, ..., n − 1}, Σ = {a, b}, q_0 = 0, F⋄ = {n − 2}, n − 1 is the error state, and δ⋄ is defined by δ⋄(i, ⋄) = i + 1 for i < n − 1, and δ⋄(i, b) = i + 2 for i < n − 2. For a word w over Σ, let |w|_a and |w|_b be the number of occurrences in w of a and b, respectively. Observe that σ(L(M⋄)) = {w | |w|_a + |w|_b ≤ n − 2 ≤ |w|_a + 2|w|_b} for σ(⋄) = Σ, and that M⋄ is minimal, as a^{n−2} ∈ σ(L(M⋄)), but a^i ∉ σ(L(M⋄)) for i > n − 2. Next, let M be the minimal DFA constructed for σ(L(M⋄)) using Algorithm 1. Then M has a state corresponding to {i} ∈ 2^{Q⋄} for each 0 ≤ i ≤ n − 2 to accept a^{n−2}. For each b read by M from a state corresponding to a subset {i} of Q⋄ where 0 ≤ i ≤ n − 4, M has a state corresponding to {i + 1, i + 2} to represent the states in Q⋄ that M⋄ reaches from i after reading either a ⋄ or a b. Continuing this way, M has a state corresponding to each subset {i + j, i + j + 1, ..., i + 2j} of Q⋄, where 0 ≤ i ≤ n − 2, 0 ≤ j ≤ ⌊(n − 2 − i)/2⌋. Thus we can calculate the number of non-error states in M. If we add 1 to this number for the error state, we get our result. □
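Since a subset {i + j, ..., i + 2j} determines the pair (i, j) through its minimum and maximum, the non-error states can be enumerated directly and compared against the closed form of Theorem 7 (a check of ours):

```python
def closed_form(n):
    h, g = n // 2, (n - 1) // 2
    # h*(h+1) and g*(g+1) are both even, so integer division is exact
    return (h * (h + 1) + g * (g + 1)) // 2 + 1

def direct_count(n):
    """Enumerate the subsets {i+j, ..., i+2j}, then add the error state."""
    subsets = {tuple(range(i + j, i + 2 * j + 1))
               for i in range(n - 1)                   # 0 <= i <= n-2
               for j in range((n - 2 - i) // 2 + 1)}   # 0 <= j <= (n-2-i)/2
    return len(subsets) + 1

for n in range(3, 60):
    assert closed_form(n) == direct_count(n)
print("Theorem 7 count verified for n = 3..59")
```

For example, n = 4 yields the five states {0}, {1}, {2}, {1, 2}, plus the error state.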
2.3. Languages with some arbitrarily long words
Finally, we look at languages with some arbitrarily long words.
Theorem 8. For n ≥ 1, 2^n − 1 is the least upper bound for m ∈ D_n.


Proof. Let M⋄ be a minimal ⋄-DFA with ⋄-substitution σ. If we convert this to a minimal DFA accepting σ(L(M⋄)) using Algorithm 1, the resulting DFA has at most 2^n − 1 states, one for each non-empty subset of the set of states in M⋄. Thus an upper bound for m ∈ D_n is 2^n − 1.

We show that there exists a regular language L such that min⋄-DFA(L) = n and minDFA(L) = 2^n − 1. Let M⋄ = (Q⋄, Σ⋄, δ⋄, q_0, F⋄) with Q⋄ = {0, ..., n − 1}, Σ = {a, b}, q_0 = 0, F⋄ = {n − 1}, and δ⋄ defined by δ⋄(i, c) = i + 1 for 0 ≤ i < n − 1, c ∈ {⋄, a}; δ⋄(n − 1, c) = 0 for c ∈ {⋄, a}; and δ⋄(i, b) = 0 for 0 ≤ i < n. Then M⋄ is minimal, since ⋄^{n−1} ∈ L(M⋄) but ⋄^i ∉ L(M⋄) for 0 ≤ i < n − 1. After constructing the minimal DFA M = (Q, Σ, δ, {q_0}, F) using Algorithm 1 for σ(⋄) = {a, b}, we claim that all non-empty subsets of Q⋄ are states in Q. To show this, we construct a word that ends in any non-empty subset P of Q⋄. Let P = {p_0, ..., p_x} with p_0 < ⋯ < p_x. We start with a^{p_x}. Then create the word w by replacing the a in each position p_x − p_i − 1, 0 ≤ i < x, with b.

We show that w ends in state P by first showing that for each p_i ∈ P, some partial word w′ exists such that w ∈ σ(w′) and M⋄ halts in p_i when reading w′. First, suppose p_i = p_x. Since |w| = p_x, let w′ = ⋄^{p_x}. For w′, M⋄ halts in p_x. Now, suppose p_i ≠ p_x. Let w′ = ⋄^{p_x − p_i − 1} b ⋄^{p_i}. After reading ⋄^{p_x − p_i − 1}, M⋄ is in state p_x − p_i − 1, then in state 0 for b, and then in state p_i after reading ⋄^{p_i}.

Now suppose a partial word w′ exists such that w ∈ σ(w′) where M⋄ halts in p for p ∉ P. Suppose p > p_x. Each state i ∈ Q⋄ is only reachable after i transitions and |w′| = p_x, so M⋄ cannot reach p after reading w′. Now suppose p < p_x. Then M⋄ needs to be in state 0 after reading p_x − p symbols to end in p, so we must have w′[p_x − p − 1] = b. However, w[p_x − p − 1] = a, a contradiction.

Furthermore, no states of Q are equivalent, as each word w ends in a unique state of Q. Therefore, M has 2^n − 1 states, and 2^n − 1 ∈ D_n. □
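For the specific M⋄ of this proof, the subset construction can be run to confirm the blow-up for small n (our encoding; since the proof shows the reached subsets are pairwise inequivalent, counting reachable subsets suffices):

```python
HOLE = "?"

def theorem8_machine(n):
    """Transition table of the diamond-DFA of Theorem 8 over {a, b}."""
    delta = {}
    for i in range(n):
        delta[(i, HOLE)] = delta[(i, "a")] = (i + 1) % n
        delta[(i, "b")] = 0
    return delta

def reachable_subsets(n):
    delta = theorem8_machine(n)
    def step(P, c):
        # Algorithm 1: every letter may also stand for a hole,
        # since sigma(hole) = {a, b} covers the whole alphabet.
        return frozenset({delta[(q, c)] for q in P} |
                         {delta[(q, HOLE)] for q in P})
    start = frozenset({0})
    seen, todo = {start}, [start]
    while todo:
        P = todo.pop()
        for c in "ab":
            R = step(P, c)
            if R not in seen:
                seen.add(R)
                todo.append(R)
    return seen

for n in range(2, 9):
    assert len(reachable_subsets(n)) == 2 ** n - 1
print("all 2^n - 1 non-empty subsets reached for n = 2..8")
```

For n = 3, for instance, the seven reachable subsets are {0}, {1}, {2}, {0,1}, {0,2}, {1,2}, {0,1,2}.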
To further study intervals in D_n, we look at the following class of ⋄-DFAs. For n ≥ 2 and 0 ≤ r < n, let

R_{n,r}{s_1, ..., s_k} = ({0, ..., n − 1}, {a_0, a_1, ..., a_k} ∪ {⋄}, δ⋄, 0, {n − 1})    (2)

be the ⋄-DFA where {s_1, ..., s_k} is a set of tuples whose first member is a letter a_i, distinct from a_0, followed by one or more states in ascending order, and where δ⋄(q, a_i) = 0 for all (q, a_i) that occur in the same tuple, δ⋄(i, ⋄) = i + 1 for 0 ≤ i ≤ n − 2, δ⋄(n − 1, ⋄) = r, and δ⋄(q, a_i) = δ⋄(q, ⋄) for all other (q, a_i). Since R_{n,r}{} is minimal for any ⋄-substitution, and since ⋄ and non-⋄ transitions from any state end in the same state, Algorithm 1 converts R_{n,r}{} to a minimal DFA with exactly n states. The next result looks at ⋄-DFAs of the form R_{n,r}{(a_1, 0)}.
Theorem 9. For n ≥ 2 and 0 ≤ i < n, n + (n − 1)i ∈ D_n.

Proof. Let a_0 = a and a_1 = b, let r = n − i − 1, let σ(⋄) = Σ = {a, b}, and let M⋄ be the ⋄-DFA R_{n,r}{(b, 0)} = ({0, ..., n − 1}, {a, b} ∪ {⋄}, δ⋄, 0, {n − 1}), where δ⋄ is given by:

          ⋄       a       b
  0       1       1       0
  1       2       2       2
  ...     ...     ...     ...
  n − 2   n − 1   n − 1   n − 1
  n − 1   r       r       r

Using Algorithm 1, let M = (Q, Σ, δ, {0}, F) be the minimal DFA accepting σ(L(M⋄)). For all words over Σ of length less than n, M must halt in some state P ∈ Q, a subset of consecutive states of {0, ..., n − 1}. Moreover, any state P ∈ Q of consecutive states of {0, ..., n − 1}, with minimal element p, is reached by M when reading b^q a^p for some q ≥ 0. Also, any accept states in Q that are subsets of {0, ..., n − 1} of size n − r or greater are equivalent, as are any non-accept states that are subsets of size n − r or greater such that the n − r greatest values in each set are identical. This implies that M requires Σ_{j=n−i}^{n} j states for words of length less than n.

For words of length n or greater, M may halt in a state P ∈ Q that is not a subset of consecutive states of {0, ..., n − 1}, as for some r < p < n − 1, it is possible to have r, n − 1 ∈ P but p ∉ P. This only occurs when a transition from a state P with n − 1 ∈ P occurs, in which case M moves to a state P′ containing r, corresponding to δ⋄(n − 1, σ′) for all σ′ ∈ Σ⋄. Thus, all states can be considered subsets of consecutive values if we consider r consecutive to n − 1 or, in other words, if we allow values from n − 1 to r to wrap around to each other. This means that M requires Σ_{j=1}^{i−1} j states for words of length n or greater. Therefore,

Σ_{j=n−i}^{n} j + Σ_{j=1}^{i−1} j = n + (n − 1)i ∈ D_n. □
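Theorem 9 can be validated end to end for small parameters: build R_{n,r}{(b, 0)}, apply Algorithm 1 (subset construction followed by a Moore-style partition refinement for minimization), and compare the resulting state count with n + (n − 1)i. A self-contained sketch in our encoding:

```python
HOLE = "?"

def machine(n, i):
    """The diamond-DFA R_{n,r}{(b,0)} with r = n - i - 1."""
    r = n - i - 1
    d = {}
    for q in range(n):
        nxt = q + 1 if q < n - 1 else r
        for s in (HOLE, "a", "b"):
            d[(q, s)] = nxt
    d[(0, "b")] = 0               # the tuple (b, 0): b sends state 0 to 0
    return d

def min_dfa_size(n, i):
    d = machine(n, i)
    def step(P, c):               # Algorithm 1 with sigma(hole) = {a, b}
        return frozenset({d[(q, c)] for q in P} | {d[(q, HOLE)] for q in P})
    start = frozenset({0})
    states, todo, trans = {start}, [start], {}
    while todo:
        P = todo.pop()
        for c in "ab":
            R = step(P, c)
            trans[(P, c)] = R
            if R not in states:
                states.add(R)
                todo.append(R)
    # Moore refinement; a subset state accepts iff it contains n - 1.
    block = {P: (n - 1) in P for P in states}
    while True:
        sig = {P: (block[P], block[trans[(P, "a")]], block[trans[(P, "b")]])
               for P in states}
        ids = {s: k for k, s in enumerate(sorted(set(sig.values())))}
        new = {P: ids[sig[P]] for P in states}
        if len(set(new.values())) == len(set(block.values())):
            return len(set(new.values()))
        block = new

for n in range(2, 7):
    assert all(min_dfa_size(n, i) == n + (n - 1) * i for i in range(n))
print("Theorem 9 verified for n = 2..6, 0 <= i < n")
```

For example, n = 3, i = 1 yields a minimal DFA with 5 = 3 + 2·1 states.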

3. Constructs for N_n

Let Σ be an alphabet, and let Σ_i = {a_i | a ∈ Σ} for all integers i, i > 0. Let σ_i : Σ → Σ_i be such that a ↦ a_i, and let #_j be a symbol in no Σ_i, for all i and j. Given a language L over Σ, the ith product of L and the ith #-product of L are, respectively, the languages

π_i(L) = ∏_{j=1}^{i} σ_j(L),        π#_i(L) = σ_1(L) ∏_{j=2}^{i} {#_{j−1}} σ_j(L).

In general, we call any construct of this form, languages over different alphabets concatenated with # symbols, a #-concatenation. With these definitions in hand, we obtain our first interval for N_n.
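For a finite L, both products can be materialized directly. A small Python sketch (our own encoding: words are tuples, indexed letters are rendered as strings like 'a1', separators as '#1'):

```python
def rename(L, i):
    """sigma_i: attach index i to every letter of every word."""
    return {tuple(f"{c}{i}" for c in w) for w in L}

def concat(A, B):
    """Concatenation of two finite languages of tuple-words."""
    return {u + v for u in A for v in B}

def product_i(L, i):
    """pi_i(L) = sigma_1(L) sigma_2(L) ... sigma_i(L)."""
    out = {()}
    for j in range(1, i + 1):
        out = concat(out, rename(L, j))
    return out

def hash_product_i(L, i):
    """pi#_i(L): the same product, with separator #_{j-1} before sigma_j(L)."""
    out = rename(L, 1)
    for j in range(2, i + 1):
        out = concat(out, {(f"#{j-1}",)})
        out = concat(out, rename(L, j))
    return out

L = {("b",), ("a", "a")}
print(len(product_i(L, 2)), len(hash_product_i(L, 2)))   # 4 4
```

Each of the 2·2 = 4 words of π_2(L) pairs a renamed copy of a word of L with a renamed copy over the next alphabet; in the #-product a separator such as #1 marks the boundary.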
Theorem 10. For n > 0, [n − ⌊(n − 1)/3⌋, n] ⊆ N_n.

Proof. Let L = {aa, ba, b} be a language over Σ = {a, b}. A minimal NFA recognizing π_i(L) is defined as having 2i + 1 states, q_0, ..., q_{2i}, with accept state q_{2i}, start state q_0, and transition function δ defined by δ(q_{2j}, b_{j+1}) = {q_{2j+1}, q_{2(j+1)}}, δ(q_{2j}, a_{j+1}) = {q_{2j+1}}, and δ(q_{2j+1}, a_{j+1}) = {q_{2(j+1)}} for j < i. It is easy to see this is minimal: the number of states is equal to the maximal length of the words plus one. A minimal ⋄-DFA recognizing π_i(L) is defined as having 3i + 1 states, q_0, ..., q_{3i−1} and q_err, with accept states q_{3i−1} and q_{3i−2}, start state q_0, and transition function δ⋄ defined as follows:

• δ⋄(q_0, b_1) = q_2, δ⋄(q_0, ⋄) = q_1, and δ⋄(q_1, a_1) = q_2;
• δ⋄(q_{3j−1}, a_{j+1}) = q_{3j}, δ⋄(q_{3j−1}, b_{j+1}) = q_{3j+1}, δ⋄(q_{3j}, a_{j+1}) = q_{3(j+1)−1}, and δ⋄(q_{3j+1}, a_{j+1}) = q_{3(j+1)−1} for 0 < j < i;
• δ⋄(q_{3j+1}, a_{j+2}) = δ⋄(q_{3(j+1)−1}, a_{j+2}) and δ⋄(q_{3j+1}, b_{j+2}) = δ⋄(q_{3(j+1)−1}, b_{j+2}) for 0 < j < i − 1.

The ⋄-substitution corresponds to σ(⋄) = Σ_1 = {a_1, b_1} here. It is easy to see that this is minimal.

Now, fix n; take any i ≤ ⌊(n − 1)/3⌋. We can write n = 3i + r + 1, for some r ≥ 0. Let {α_j}_{0≤j≤r} be a set of symbols not in the alphabet of π_i(L). Minimal NFA and ⋄-DFA recognizing π_i(L) ∪ {α_0 ⋯ α_r} can clearly be obtained by adding to each a series of states q′_0 = q_0, q′_1, ..., q′_r, and q′_{r+1} = q_{2i} and q′_{r+1} = q_{3i−1} respectively, with δ(q′_j, α_j) = q′_{j+1} for 0 ≤ j ≤ r. Hence, for i ≤ ⌊(n − 1)/3⌋, we can produce a ⋄-DFA of size n = 3i + r + 1 which reduces to an NFA of size 2i + r + 1 = n − i. □

Theorem 10 gives an interval for N_n based on π_i(L), and Theorem 12 will give an interval for N_n based on π#_i(L), where no ⋄-substitutions exist over multiple Σ_j's. To do this, we first need the following lemma, which establishes some relationships between minNFA(π#_i(L)), min⋄-DFA(π#_i(L)), minNFA(L), min⋄-DFA(L), and minDFA(L).
Lemma 1. Let L, L′ be languages recognized by minimal NFAs N = (Q, Σ, δ, q_0, F) and N′ = (Q′, Σ′, δ′, q′_0, F′), where Σ ∩ Σ′ = ∅. Moreover, let # ∉ Σ ∪ Σ′. Then L″ = L{#}L′ is recognized by the minimal NFA N″ = (Q ∪ Q′, Σ ∪ Σ′ ∪ {#}, δ″, q_0, F′), where δ″(q, a) = δ(q, a) if q ∈ Q and a ∈ Σ; δ″(q, a) = δ′(q, a) if q ∈ Q′ and a ∈ Σ′; δ″(q, #) = {q′_0} if q ∈ F; and δ″(q, a) = ∅ otherwise. Consequently, the following hold:

1. For any L, minNFA(π#_i(L)) = i · minNFA(L);
2. Let L_1, ..., L_n be languages whose minimal DFAs have no error states and whose alphabets are pairwise disjoint, and, without loss of generality, let minDFA(L_1) − min⋄-DFA(L_1) ≥ ⋯ ≥ minDFA(L_n) − min⋄-DFA(L_n). Then

min⋄-DFA(L_1 {#_1} L_2 {#_2} ⋯ L_n) = 1 + min⋄-DFA(L_1) + Σ_{i=2}^{n} minDFA(L_i).

This lemma allows us to obtain the following linear bound.

Theorem 11. Let L be a language whose minimal DFA has no error state. Moreover, assume min_⋄DFA(L) = min_NFA(L). Fix some n and j, 0 < j ≤ (n − min_⋄DFA(L) − 1)/min_DFA(L). Then n − j(min_DFA(L) − min_⋄DFA(L)) − 1 ∈ N_n.

Proof. Since 0 < j ≤ (n − min_⋄DFA(L) − 1)/min_DFA(L), we can write n = 1 + min_⋄DFA(L) + j · min_DFA(L) + r for some r ≥ 0. Then, by Lemma 1(2), this corresponds to n = min_⋄DFA(⊔_{j+1}(L) ∪ {w}), where w is a word corresponding to an r-length chain of states, as we used in the proof of Theorem 10. We also have min_NFA(⊔_{j+1}(L) ∪ {w}) = (j + 1) · min_⋄DFA(L) + r using Lemma 1(1) and our assumption that min_⋄DFA(L) = min_NFA(L). Alternatively,

min_NFA(⊔_{j+1}(L) ∪ {w}) = n − j(min_DFA(L) − min_⋄DFA(L)) − 1.

Our result follows. □
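The size bookkeeping in this proof can be checked mechanically. In the sketch below (ours), d and D stand for min_⋄DFA(L) and min_DFA(L): if n = 1 + d + jD + r, then the NFA size (j + 1)d + r always equals n − j(D − d) − 1.

```python
# Sanity check (ours) of the arithmetic in the proof of Theorem 11:
# if n = 1 + d + j*D + r, then (j+1)*d + r == n - j*(D - d) - 1.
def nfa_size(d, j, r):
    return (j + 1) * d + r          # (j+1) copies of the size-d NFA plus an r-chain

def claimed_bound(d, D, j, r):
    n = 1 + d + j * D + r           # the minimal ⋄-DFA size realized via Lemma 1(2)
    return n - j * (D - d) - 1

for d in range(1, 6):
    for D in range(d, 10):
        for j in range(1, 5):
            for r in range(0, 4):
                assert nfa_size(d, j, r) == claimed_bound(d, D, j, r)
```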

The above linear bounds can be improved, albeit with a loss of clarity in the overall construct. Consider the interval of values obtained in Theorem 4. Fix an integer x. The minimal integer y such that x ≤ y + (y − 2)(y − 3)/2 is clearly n_x = ⌈(3 + √(8x − 15))/2⌉, for x ≥ 4. Associate with x and n_x the corresponding DFAs and ⋄-DFAs used in the proof of Theorem 4, i.e., let L_{n,m} be the language in the proof with minimal ⋄-DFA size n and minimal DFA size m. If we replace each ⋄-transition in the minimal ⋄-DFA and remove the error state, we get a minimal NFA of size n − 1 accepting L_{n,m} (this NFA must be minimal since the maximal length of a word in L_{n,m} is n − 2). Noting that all deterministic automata in question have error states, we get, using Lemma 1, that

min_⋄DFA(⊔_i(L_{n_x,x})) = 1 + (min_⋄DFA(L_{n_x,x}) − 1) + (i − 1)(min_DFA(L_{n_x,x}) − 1) = n_x + (i − 1)(x − 1),
min_NFA(⊔_i(L_{n_x,x})) = i · min_NFA(L_{n_x,x}) = i(n_x − 1).

Fig. 6. Integer partitions λ = (6, 4, 1, 1) (left) and λ^T = (4, 2, 2, 2, 1, 1) (right).
We next prove our interval based on ⊔_i(L_{n_x,x}).

Theorem 12. For n > n_x ≥ 4, [n − (x − n_x)⌊(n − n_x)/(x − 1)⌋ − 1, n] ⊆ N_n.

Proof. For any n and fixed x, write n = n_x + (i − 1)(x − 1) + r, for some 0 ≤ r < x − 1, which is realizable as a minimal ⋄-DFA by appending to the minimal ⋄-DFA accepting ⊔_i(L_{n_x,x}) an arbitrary chain of states of length r, using letters not in the alphabet of ⊔_i(L_{n_x,x}), similar to what we did in the proof of Theorem 10. This leads to a minimal NFA of size i(n_x − 1) + r, giving the lower bound n − (x − n_x)⌊(n − n_x)/(x − 1)⌋ − 1 if we solve for i. Anything in the interval up to the upper bound can be obtained by decreasing i or replacing occurrences of L_{n_x,x} with L_{n_x,x−j} (for some j) and in turn adding additional chains of states of length r, to maintain the size of the ⋄-DFA. □
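The bookkeeping here can also be verified mechanically (our own check, not the paper's): with n = n_x + (i − 1)(x − 1) + r, the NFA size i(n_x − 1) + r equals n − (i − 1)(x − n_x) − 1, which is smallest when i is maximal, i.e., when i − 1 = ⌊(n − n_x)/(x − 1)⌋.

```python
# Check (ours): with n = nx + (i-1)*(x-1) + r and 0 <= r < x-1,
# the NFA size i*(nx-1) + r equals n - (i-1)*(x - nx) - 1.
def nfa_size(nx, i, r):
    return i * (nx - 1) + r

for nx in range(4, 10):
    for x in range(nx + 1, 15):
        for i in range(1, 6):
            for r in range(0, x - 1):
                n = nx + (i - 1) * (x - 1) + r
                assert nfa_size(nx, i, r) == n - (i - 1) * (x - nx) - 1
```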
We can obtain even lower bounds by considering the sequence of DFAs defined in Theorem 8. Recall that for any n ≥ 1, we have a minimal DFA, which we call M_n, of size 2^n − 1; the equivalent minimal ⋄-DFA, M'_n, has size n. Applying Algorithm 1 to M'_n, the resulting NFA of size n is also minimal. Let n_0 ≥ n_1 ≥ · · · ≥ n_k be a sequence of integers and consider

min_⋄DFA(L(M_{n_0}){#_1}L(M_{n_1}) · · · {#_k}L(M_{n_k})) = 1 + n_0 + Σ_{i=1}^{k} (2^{n_i} − 1),   (3)

where the equality comes from Lemma 1(2). Iteratively applying Lemma 1 gives

min_NFA(L(M_{n_0}){#_1}L(M_{n_1}) · · · {#_k}L(M_{n_k})) = Σ_{i=0}^{k} n_i.   (4)

To understand the difference between (3) and (4) in greater depth, let us view (n_1, . . . , n_k) as an integer partition λ, or as a Young diagram in which each cell is assigned a value (see, e.g., [9]). In this case, the ith column of λ has each cell valued at 2^{i−1}. Transposing about y = x gives the diagram corresponding to the transpose of λ, λ^T = (m_1, . . . , m_{n_1}), in which the ith row has each cell valued at 2^{i−1}. Note that m_1 = k and there are, for each i, m_i terms of 2^{i−1}. Fig. 6 gives an example of an integer partition and its transpose. Define σ(λ^T) = Σ_{i=1}^{n_1} 2^{i−1} m_i = Σ_{i=1}^{k} (2^{n_i} − 1) and ρ(λ) = Σ_{i=1}^{k} n_i.
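The transpose and the two statistics above can be computed directly. The following sketch (our own helper names) reproduces the Fig. 6 example λ = (6, 4, 1, 1), λ^T = (4, 2, 2, 2, 1, 1), and checks that the two expressions defining the first statistic agree.

```python
# Partition transpose and the two statistics from the text (helper names ours).
def transpose(parts):
    # the i-th part of the transpose counts how many parts are >= i
    return [sum(1 for p in parts if p >= i) for i in range(1, max(parts) + 1)]

def sigma(parts):
    # sum over rows i of 2^(i-1) per cell: sigma(mu) = sum_i 2^(i-1) * m_i
    return sum(2 ** i * m for i, m in enumerate(parts))

def rho(parts):
    # total number of cells in the Young diagram
    return sum(parts)

lam = [6, 4, 1, 1]
lam_t = transpose(lam)                # [4, 2, 2, 2, 1, 1], as in Fig. 6
# sigma(lam^T) = sum_i (2^(n_i) - 1) over the parts n_i of lam
assert sigma(lam_t) == sum(2 ** n - 1 for n in lam)
assert rho(lam) == rho(lam_t)         # rho is invariant under transposition
```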
Given this, we can view the language L described in (3) and (4), i.e., L = L(M_{n_0}){#_1}L(M_{n_1}) · · · {#_k}L(M_{n_k}), as being defined by the integer n_0 and the partition λ = (n_1, . . . , n_k) with n_0 ≥ n_1. This gives

min_⋄DFA(L) = 1 + n_0 + σ(λ^T)  and  min_NFA(L) = n_0 + ρ(λ).
To further understand this, we must consider the following sub-problem: if σ(λ) = n, what are the possible values of ρ(λ)? To proceed here, we define the sequence p_n recursively as follows: if n = 2^k − 1 for some k, then p_n = k; otherwise, letting n = m + (2^k − 1) for k maximal, p_n = k + p_m. This sequence provides the minimum of the possible values of ρ(λ).
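The recursion for p_n translates directly into code; a short sketch (function name ours):

```python
def p(n):
    """The sequence p_n: p_{2^k - 1} = k; otherwise, with k maximal such that
    2^k - 1 < n and m = n - (2^k - 1), p_n = k + p_m."""
    k = (n + 1).bit_length() - 1      # largest k with 2^k - 1 <= n
    if n == 2 ** k - 1:
        return k
    return k + p(n - (2 ** k - 1))
```

For example, p_7 = 3 (since 7 = 2^3 − 1), while p_8 = 3 + p_1 = 4.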
Theorem 13. If σ(λ) = n, then ρ(λ) ≥ p_n. Consequently, for all n and k = ⌊log_2(n + 1)⌋, k + p_n ∈ N_{1+k+n}.
Proof. To show that p_n is obtainable, we prove that the following partition, λ_n, satisfies σ(λ_n) = n and ρ(λ_n) = p_n: if n = 2^k − 1 for some k, λ_n = (1^k); otherwise, letting n = m + (2^k − 1) for k maximal, λ_n = λ_{2^k−1} + λ_m. Here, the sum of two partitions is the partition obtained by adding the summands term by term; (1^k) is the k-tuple of ones. Clearly, for partitions λ and λ', σ(λ + λ') = σ(λ) + σ(λ') and ρ(λ + λ') = ρ(λ) + ρ(λ'). By construction, σ(λ_n) = n and ρ(λ_n) = p_n. To see this, if n = 2^k − 1 for some k, σ(λ_n) = σ((1^k)) = Σ_{i=1}^{k} 2^{i−1} = 2^k − 1 = n and ρ(λ_n) = ρ((1^k)) = Σ_{i=1}^{k} 1 = k = p_n. Otherwise,

σ(λ_n) = σ(λ_{2^k−1}) + σ(λ_m) = σ((1^k)) + σ(λ_m) = (2^k − 1) + m = n,
ρ(λ_n) = ρ(λ_{2^k−1}) + ρ(λ_m) = ρ((1^k)) + ρ(λ_m) = k + p_m = p_n.

To show that p_n, or λ_n, is minimal, we can proceed inductively.

From the above, each p_n is obtainable by a partition of size k, where k is the maximal integer with n ≥ 2^k − 1; alternatively, k = ⌊log_2(n + 1)⌋. Fixing n, we get k + p_n ∈ N_{1+k+n}. □
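The construction in this proof can be checked numerically. The sketch below (our own code, self-contained) builds λ_n by term-by-term partition addition and verifies σ(λ_n) = n and ρ(λ_n) = p_n for small n.

```python
from itertools import zip_longest

def p(n):
    # p_{2^k - 1} = k; otherwise p_n = k + p_m with n = m + (2^k - 1), k maximal
    k = (n + 1).bit_length() - 1
    return k if n == 2 ** k - 1 else k + p(n - (2 ** k - 1))

def add(lam, mu):
    # term-by-term sum of partitions, padding the shorter one with zeros
    return [a + b for a, b in zip_longest(lam, mu, fillvalue=0)]

def lam_n(n):
    # lambda_{2^k - 1} = (1^k); otherwise lambda_n = lambda_{2^k - 1} + lambda_m
    k = (n + 1).bit_length() - 1
    return [1] * k if n == 2 ** k - 1 else add([1] * k, lam_n(n - (2 ** k - 1)))

def sigma(parts):
    # sigma(mu) = sum_i 2^(i-1) * m_i over the parts of mu
    return sum(2 ** i * m for i, m in enumerate(parts))

for n in range(1, 200):
    assert sigma(lam_n(n)) == n and sum(lam_n(n)) == p(n)
```

For instance, λ_4 = (1, 1) + (1) = (2, 1), with σ((2, 1)) = 2 + 2 = 4 and ρ((2, 1)) = 3 = p_4.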
4. Conclusion
For languages of words of equal length, Theorem 2 gives the maximum element in D_n found so far and Theorem 3 gives that maximum element when we restrict to a constant alphabet size. For languages of words of bounded length, Theorem 6 gives the least upper bound for elements in D_n based on minimal ⋄-DFAs of the form (1), and Theorem 7 gives the maximum element found so far when we restrict to a binary alphabet. For languages with words of arbitrary length, Theorem 8 gives the least upper bound of 2^n − 1 for elements in D_n, a bound that can be achieved over a binary alphabet. We conjecture that for n ≥ 1, [n, 2^n − 1] ⊆ D_n. This conjecture has been verified for all 1 ≤ n ≤ 7 based on all our constructs from Section 2.
In Section 3, via products, Theorem 10 gives an interval for N_n. If we replace products with #-concatenations, Theorem 12 increases the interval further. Theorem 13 does not give an interval, but an isolated point not previously achieved. With the exception of this latter result, all of our bounds are linear. Some of our constructs satisfy min_⋄DFA(L) = min_NFA(L), ignoring error states. As noted earlier, this is a requirement for #-concatenations to produce meaningful bounds. Constructs without this restriction are often too large to be useful.
References
[1] E. Balkanski, F. Blanchet-Sadri, M. Kilgore, B.J. Wyatt, Partial word DFAs, in: S. Konstantinidis (Ed.), 18th International Conference on Implementation and Application of Automata, CIAA 2013, Halifax, Nova Scotia, Canada, in: Lecture Notes in Computer Science, vol. 7982, Springer-Verlag, Berlin, Heidelberg, 2013, pp. 36–47.
[2] S. Yu, Regular languages, in: G. Rozenberg, A. Salomaa (Eds.), Handbook of Formal Languages, vol. 1, Springer-Verlag, Berlin, 1997, pp. 41–110, Ch. 2.
[3] J.E. Hopcroft, R. Motwani, J.D. Ullman, Introduction to Automata Theory, Languages, and Computation, International Edition, 2nd ed., Addison-Wesley, 2003.
[4] J. Dassow, F. Manea, R. Mercaş, Regular languages of partial words, Inform. Sci. 268 (2014) 290–304.
[5] M. Fischer, M. Paterson, String matching and other products, in: R. Karp (Ed.), 7th SIAM-AMS Complexity of Computation, 1974, pp. 113–125.
[6] J. Berstel, L. Boasson, Partial words and a theorem of Fine and Wilf, Theoret. Comput. Sci. 218 (1999) 135–141.
[7] F. Blanchet-Sadri, Algorithmic Combinatorics on Partial Words, Chapman & Hall/CRC Press, Boca Raton, FL, 2008.
[8] N.J.A. Sloane, The On-Line Encyclopedia of Integer Sequences, http://oeis.org.
[9] G.E. Andrews, K. Eriksson, Integer Partitions, Cambridge University Press, 2004.
