Counting Markov Types

Philippe Jacquet (1), Charles Knessl (2)†, and Wojciech Szpankowski (3)‡

(1) INRIA, Rocquencourt, 78153 Le Chesnay Cedex, France. Philippe.Jacquet@inria.fr
(2) Dept. Math. Stat. & Compt. Sci., University of Illinois at Chicago, Chicago, IL 60607-7045, U.S.A. knessl@uic.edu
(3) Department of Computer Science, Purdue University, West Lafayette, IN, USA. spa@cs.purdue.edu
The method of types is one of the most popular techniques in information theory and combinatorics. Two sequences of equal length have the same type if they have identical empirical distributions. In this paper, we focus on Markov types, that is, sequences generated by a Markov source (of order one). We note that sequences having the same Markov type share the same so-called balanced frequency matrix that counts the number of distinct pairs of symbols. We enumerate the number of Markov types for sequences of length $n$ over an alphabet of size $m$. This turns out to be asymptotically equivalent to the number of balanced frequency matrices, as well as to the number of solutions of a special system of linear Diophantine equations and the number of balanced directed multigraphs. For fixed $m$ we prove that the number of Markov types is asymptotically equal to
\[ d(m)\,\frac{n^{m^2-m}}{(m^2-m)!}, \]
where $d(m)$ is a constant for which we give an integral representation. For $m \to \infty$ we conclude that asymptotically the number of types is equivalent to
\[ \frac{\sqrt{2}\, m^{3m/2} e^{m^2}}{m^{2m^2}\, 2^m\, \pi^{m/2}}\, n^{m^2-m}, \]
provided that $m = o(n^{1/4})$ (however, our techniques work for $m = o(\sqrt{n})$). These findings are derived by analytical techniques ranging from multidimensional generating functions to the saddle point method.

Keywords: Markov types, integer matrices, linear Diophantine equations, multidimensional generating functions, saddle point method
1 Introduction
The method of types is one of the most popular and useful techniques in information theory (e.g., source
and channel coding) and combinatorics. Two sequences of equal length are of the same type if they
have identical empirical distributions. The essence of the method of types was known for some time
in probability and statistical physics. But only in the 1970s did Csiszár and his group develop a general method and make it a basic tool of the information theory of discrete memoryless systems [5]; see also [4; 6; 9; 11; 12; 14; 20; 21].

† This author was supported by NSF Grant DMS-0800568 and NSA Grant H98230-08-1-0092.
‡ The work of this author was supported by NSF STC Grant CCF-0939370, NSF Grants DMS-0800568 and CCF-0830140, and NSA Grant H98230-08-1-0092.
In this paper, we discuss Markov types. For concreteness, we first focus on Markov sources of order one. Let $\mathcal{A} = \{1, 2, \ldots, m\}$ be an $m$-ary alphabet, and consider a class of Markov distributions $\mathcal{P}_n(m)$ on $\mathcal{A}^n$, $n \ge 1$. We often simply write $\mathcal{P}_n := \mathcal{P}_n(m)$. Throughout, we study sequences $x^n = x_1 \ldots x_n$ of length $n$. For a distribution $P \in \mathcal{P}_n$ the type class of $x^n$ is defined as
\[ \mathcal{T}_n(x^n) = \{ y^n : P(x^n) = P(y^n) \}, \]
that is, all sequences having the same distribution as $P(x^n)$. Clearly, $\bigcup_{x^n} \mathcal{T}_n(x^n) = \mathcal{A}^n$. The number of type classes, or equivalently, the number of distinct distributions, is therefore equal to the cardinality of $\mathcal{P}_n(m)$, which we often abbreviate to $|\mathcal{P}_n| := |\mathcal{P}_n(m)|$. We aim at deriving an asymptotic expression for $|\mathcal{P}_n|$ for fixed or large $m$ when $n \to \infty$. For example, for binary memoryless sources, there are $|\mathcal{P}_n(2)| = n+1$ type classes, and a class consisting of all sequences containing $k$ 1's has cardinality $|\mathcal{T}_n(x^n)| = \binom{n}{k}$.
Markov sources $X_t$ and the corresponding Markov types are completely characterized by their transition probabilities, which we denote by $P$; that is, $P = \{p_{ij}\}_{i,j \in \mathcal{A}}$ is the transition matrix with $p_{ij} = P(X_{t+1} = j \mid X_t = i)$. The empirical distribution of a sequence $x^n$ (with some fixed initial state) is
\[ P(x^n) = \prod_{i,j \in \mathcal{A}} p_{ij}^{k_{ij}}, \]
where $k_{ij}$ is the number of pairs $(ij) \in \mathcal{A}^2$ in the sequence $x^n$. For example, $P(01011) = p_{01}^2 p_{10} p_{11}$. In passing we should point out that in the above $p_{ij}$ can be viewed either as "a formal indeterminate" or, better, as the ratio of the number of pairs $(ij) \in \mathcal{A}^2$ to the length of the string (empirical distribution).
For a fixed transition matrix $P$, the frequency matrix $\mathbf{k} = \{k_{ij}\}_{i,j \in \mathcal{A}}$ defines a type class. This integer matrix satisfies two properties:
\[ \sum_{i,j \in \mathcal{A}} k_{ij} = n - 1, \]
and additionally, for any $i \in \mathcal{A}$ [11; 21],
\[ \sum_{j=1}^{m} k_{ij} = \sum_{j=1}^{m} k_{ji} \pm \delta(x_1 = x_n), \qquad \forall i \in \mathcal{A}, \]
where $\delta(A) = 1$ when $A$ is true and zero otherwise. The last property is called the flow conservation property and is a consequence of the fact that the number of pairs starting with symbol $i \in \mathcal{A}$ must equal the number of pairs ending with symbol $i \in \mathcal{A}$, with the possible exception of the last pair. To avoid this exception, throughout we only consider cyclic strings in which the first element $x_1$ follows the last $x_n$. Thus, we consider integer matrices $\mathbf{k} = [k_{ij}]$ satisfying the following two properties:
\[ \sum_{i,j \in \mathcal{A}} k_{ij} = n, \tag{1} \]
\[ \sum_{j=1}^{m} k_{ij} = \sum_{j=1}^{m} k_{ji}, \qquad \forall i \in \mathcal{A}. \tag{2} \]
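The balance conditions (1)-(2) are easy to check computationally on a cyclic string. The following sketch is our illustration (the function names are ours, not from the paper): it builds the frequency matrix of a cyclic string and verifies both properties.

```python
from collections import Counter

def cyclic_frequency_matrix(x):
    """Pair counts k[(a, b)] of a cyclic string x: the last symbol pairs with the first."""
    n = len(x)
    k = Counter()
    for t in range(n):
        k[(x[t], x[(t + 1) % n])] += 1  # wrap around at the end
    return k

def is_balanced(k, alphabet, n):
    """Check property (1): entries sum to n, and property (2): flow conservation."""
    total_ok = sum(k.values()) == n
    flow_ok = all(
        sum(k[(i, j)] for j in alphabet) == sum(k[(j, i)] for j in alphabet)
        for i in alphabet
    )
    return total_ok and flow_ok
```

For instance, the cyclic string 01011 from the example above yields $k_{01} = 2$, $k_{10} = 2$, $k_{11} = 1$, which is balanced.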
Such integer matrices $\mathbf{k}$ will be called balanced frequency matrices. We shall call (2) the "conservation law" equation. In this paper, we enumerate the number of Markov types $|\mathcal{P}_n|$, which is, for $n \to \infty$, asymptotically the same as the number of distinct balanced frequency matrices satisfying (1) and (2). We denote the number of these solutions by $|\mathcal{F}_n(m)|$.
Example. Let us first consider a binary Markov source. The balanced frequency matrix is of the following form
\[ \mathbf{k} = \begin{bmatrix} k_{11} & k_{12} \\ k_{21} & k_{22} \end{bmatrix}, \]
where the nonnegative integers $k_{ij}$ satisfy
\[ k_{11} + k_{12} + k_{21} + k_{22} = n, \qquad k_{12} = k_{21}. \]
The above system of linear equations can be reduced to
\[ k_{11} + 2k_{12} + k_{22} = n, \tag{3} \]
and the enumeration of Markov types over a binary alphabet reduces to finding the number of nonnegative solutions of (3). The answer is obviously
\[ |\mathcal{F}_n| = \sum_{k_{12}=0}^{\lfloor n/2 \rfloor} (n - 2k_{12} + 1) = \left( \left\lfloor \frac{n}{2} \right\rfloor + 1 \right) \left( n - \left\lfloor \frac{n}{2} \right\rfloor + 1 \right) = \frac{n^2}{4} + O(n). \]
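This count is simple to confirm by direct enumeration; the sketch below (our illustration) sums over $k_{12}$ exactly as in the derivation above and compares with the closed form.

```python
def count_binary_types(n):
    """Number of nonnegative solutions of k11 + 2*k12 + k22 = n, i.e. |F_n| for m = 2."""
    # for each k12 there are n - 2*k12 + 1 choices of (k11, k22)
    return sum(n - 2 * k12 + 1 for k12 in range(n // 2 + 1))

def closed_form(n):
    """The product form (floor(n/2) + 1) * (n - floor(n/2) + 1) from the text."""
    return (n // 2 + 1) * (n - n // 2 + 1)
```

For example, `count_binary_types(100)` equals `closed_form(100)`, i.e. $51^2 = 2601 = 100^2/4 + 100 + 1$.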
Let us now look at the $m = 3$ case. The balanced frequency matrix has nine elements $\{k_{ij}\}_{i,j \in \{1,2,3\}}$, and they satisfy
\[ k_{11} + k_{12} + k_{13} + k_{21} + k_{22} + k_{23} + k_{31} + k_{32} + k_{33} = n, \]
\[ k_{12} + k_{13} = k_{21} + k_{31}, \]
\[ k_{12} + k_{32} = k_{21} + k_{23}, \]
\[ k_{13} + k_{23} = k_{31} + k_{32}. \]
How many nonnegative integer solutions does the above system of linear equations have? We shall show that it is asymptotically $\frac{n^6}{12 \cdot 6!}$.
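For small $n$, the number of solutions of this system can be verified by brute force. The sketch below (our illustration) enumerates the six off-diagonal entries, checks the conservation equations, and counts the $\binom{n-s+2}{2}$ ways to fill the free diagonal.

```python
from itertools import product
from math import comb

def count_m3_types(n):
    """Brute-force count of nonnegative 3x3 balanced matrices with entry sum n."""
    total = 0
    r = range(n + 1)
    for k12, k13, k21, k23, k31, k32 in product(r, repeat=6):
        s = k12 + k13 + k21 + k23 + k31 + k32
        if s > n:
            continue
        # conservation law (2) for i = 1, 2, 3 (one of the three is redundant)
        if k12 + k13 != k21 + k31:
            continue
        if k12 + k32 != k21 + k23:
            continue
        if k13 + k23 != k31 + k32:
            continue
        # diagonal (k11, k22, k33) is unconstrained: compositions of n - s into 3 parts
        total += comb(n - s + 2, 2)
    return total
```

The counts begin $1, 3, 9, 21, 45$ for $n = 0, 1, \ldots, 4$.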
Our goal is to enumerate the number of Markov classes, that is, to find (asymptotically) the cardinality of $\mathcal{P}_n$. Our previous example demonstrated that this number is asymptotically equivalent to the number of nonnegative integer solutions to the system of linear equations (1)-(2). Such an enumeration, for a general class of systems of homogeneous Diophantine equations, was investigated in Chap. 4.6 of Stanley's book [16] (cf. also [10]). Stanley developed a general theory to construct the associated generating function. However, ultimately only the denominator of this generating function is given in a semi-explicit form in [16], thus allowing the author to derive only the growth rate of the number of integer solutions.
In this paper, we propose an approach based on previous work of Jacquet and Szpankowski [11], where analytic techniques such as multidimensional generating functions and the saddle point method were used. This allows us to derive precise asymptotic results. In particular, for fixed $m$ we prove that the number of Markov types is asymptotically equal to
\[ |\mathcal{P}_n| \sim d(m)\,\frac{n^{m^2-m}}{(m^2-m)!}, \qquad n \to \infty, \]
where $d(m)$ is a constant for which we only found an integral representation.§ For large $m \to \infty$ with $m^4 = o(n)$ we find that asymptotically the number of types is
\[ |\mathcal{P}_n| \sim \frac{\sqrt{2}\, m^{3m/2} e^{m^2}}{m^{2m^2}\, 2^m\, \pi^{m/2}}\, n^{m^2-m}. \]
However, our techniques also allow us to derive asymptotics for $m^2 = o(n)$. In passing we observe that the number of Markov types is often needed for minimax redundancy evaluation [1; 2; 11; 13; 15; 19].
Markov types were studied in a series of papers; see [11; 12; 20; 21]. However, the existing literature mostly concentrates on finding the cardinality of a Markov type class, that is, $|\mathcal{T}(x^n)|$, with the exception of Martín et al. [12]. In particular, Whittle [21] already in 1955 computed $|\mathcal{T}(x^n)|$ for Markov chains of order one. Regarding the number of types, it was known for a long while [4; 5; 6] that it grows polynomially, but only in [20] did Weinberger et al. mention (without proof) that $|\mathcal{P}_n| = \Theta(n^{m^2-m})$. This was recently rigorously proved by Martín et al. in [12] for tree sources (which include Markov sources) for fixed $m$. However, the constant was never identified. We accomplish this here, as well as present asymptotic results for large $m$.
The paper is organized as follows. In Section 2 we formulate precisely our problem and present our
main results for fixed and large m. The proofs are given in Section 3.
2 Main Results
In this section we present our main results and some of their consequences. We start with some notation.
Throughout the paper, we let $\mathcal{F}$ be the set of all integer matrices $\mathbf{k}$ satisfying the conservation law equation (2), that is,
\[ \sum_{j=1}^{m} k_{ij} = \sum_{j=1}^{m} k_{ji}, \qquad \forall i \in \mathcal{A}. \]
For a given $n$, we let $\mathcal{F}_n$ be the subset of $\mathcal{F}$ consisting of matrices $\mathbf{k}$ for which the balance equations (1) and (2) hold.
We first make some general observations about generating functions over matrices, and summarize some results obtained in [11]. In general, let $g_{\mathbf{k}}$ be a sequence of scalars indexed by matrices $\mathbf{k}$, and define the generating function
\[ G(\mathbf{z}) = \sum_{\mathbf{k}} g_{\mathbf{k}}\, \mathbf{z}^{\mathbf{k}}, \]
where the summation is over all integer matrices and $\mathbf{z} = \{z_{ij}\}_{i,j \in \mathcal{A}}$ is an $m \times m$ matrix that we often denote simply as $\mathbf{z} = [z_{ij}]$ (assuming the indices $i$ and $j$ run from 1 to $m$). Here $\mathbf{z}^{\mathbf{k}} = \prod_{i,j} z_{ij}^{k_{ij}}$, where $k_{ij}$ is the entry in row $i$, column $j$ of the matrix $\mathbf{k}$. We denote by
\[ G^*(\mathbf{z}) = \sum_{\mathbf{k} \in \mathcal{F}} g_{\mathbf{k}}\, \mathbf{z}^{\mathbf{k}} = \sum_{n \ge 0}\ \sum_{\mathbf{k} \in \mathcal{F}_n} g_{\mathbf{k}}\, \mathbf{z}^{\mathbf{k}} \]
the $\mathcal{F}$-generating function of $g_{\mathbf{k}}$, that is, the generating function of $g_{\mathbf{k}}$ over matrices $\mathbf{k} \in \mathcal{F}$ satisfying the balance equations (1) and (2). The following useful lemma is proved in [11], but for completeness we repeat it here. Let $[z_{ij}\, x_j/x_i]$ be the matrix $\Delta^{-1}(\mathbf{x})\,\mathbf{z}\,\Delta(\mathbf{x})$, where $\Delta(\mathbf{x}) = \mathrm{diag}(x_1, \ldots, x_m)$ is the diagonal matrix with elements $x_1, \ldots, x_m$; that is, the element $z_{ij}$ of $\mathbf{z}$ is replaced by $z_{ij}\, x_j/x_i$.

§ It is a simple exercise to extend these results to Markov chains of order $r$. Indeed, one needs to replace $m$ by $m^r$.
Lemma 1 Let $G(\mathbf{z}) = \sum_{\mathbf{k}} g_{\mathbf{k}}\, \mathbf{z}^{\mathbf{k}}$ be the generating function of a complex matrix $\mathbf{z}$. Then
\[ G^*(\mathbf{z}) := \sum_{n \ge 0}\ \sum_{\mathbf{k} \in \mathcal{F}_n} g_{\mathbf{k}}\, \mathbf{z}^{\mathbf{k}} = \left( \frac{1}{2i\pi} \right)^{m} \oint \frac{dx_1}{x_1} \cdots \oint \frac{dx_m}{x_m}\; G([z_{ij}\, x_j/x_i]) \tag{4} \]
with the convention that the $ij$-th coefficient of $[z_{ij}\, x_j/x_i]$ is $z_{ij}\, x_j/x_i$, and $i = \sqrt{-1}$. In other words, $[z_{ij}\, x_j/x_i] = \Delta^{-1}(\mathbf{x})\,\mathbf{z}\,\Delta(\mathbf{x})$ where $\Delta(\mathbf{x}) = \mathrm{diag}(x_1, \ldots, x_m)$. By the change of variables $x_i = \exp(i\theta_i)$ we also have
\[ G^*(\mathbf{z}) = \frac{1}{(2\pi)^m} \int_{-\pi}^{\pi} d\theta_1 \cdots \int_{-\pi}^{\pi} d\theta_m\; G([z_{ij} \exp((\theta_j - \theta_i)i)]), \]
where $[z_{ij} \exp((\theta_j - \theta_i)i)] = \exp(-i\Delta(\boldsymbol{\theta}))\,\mathbf{z}\,\exp(i\Delta(\boldsymbol{\theta}))$.
Proof. Observe that
\[ G(\Delta^{-1}(\mathbf{x})\,\mathbf{z}\,\Delta(\mathbf{x})) = G([z_{ij}\, x_j/x_i]) = \sum_{\mathbf{k}} g_{\mathbf{k}}\, \mathbf{z}^{\mathbf{k}} \prod_{i=1}^{m} x_i^{\sum_j k_{ji} - \sum_j k_{ij}}. \tag{5} \]
Therefore, $G^*(\mathbf{z})$ is the coefficient of $G([z_{ij}\, x_j/x_i])$ at $x_1^0 x_2^0 \cdots x_m^0$, since $\sum_j k_{ji} - \sum_j k_{ij} = 0$ for matrices $\mathbf{k} \in \mathcal{F}$. We write this in short as $G^*(\mathbf{z}) = [x_1^0 \cdots x_m^0]\, G([z_{ij}\, x_j/x_i])$. The result follows from the Cauchy coefficient formula (cf. [18]).
We consider the number of solutions to (1) and (2), which is asymptotically equivalent to the number of Markov types $|\mathcal{P}_n(m)|$ over the alphabet $\mathcal{A}$, and whose generating function is
\[ F_m^*(z) = \sum_{n \ge 0} |\mathcal{F}_n(m)|\, z^n. \]
Then, applying the above lemma with $z_{ij} = z$, we conclude that
\[ F_m^*(z) = \frac{1}{(1-z)^m}\, [x_1^0 x_2^0 \cdots x_m^0] \prod_{i \neq j} \left( 1 - z\,\frac{x_i}{x_j} \right)^{-1} \tag{6} \]
and thus, by the Cauchy formula,
\[ |\mathcal{F}_n(m)| = [z^n]\, F_m^*(z) = \frac{1}{2\pi i} \oint \frac{F_m^*(z)}{z^{n+1}}\, dz. \]
In the next section we evaluate this expression asymptotically to yield the following main result of this paper.
Theorem 1 (i) For fixed $m$ and $n \to \infty$ the number of Markov types is
\[ |\mathcal{P}_n(m)| = d(m)\,\frac{n^{m^2-m}}{(m^2-m)!} + O(n^{m^2-m-1}), \tag{7} \]
where $d(m)$ is a constant that can also be expressed by the following integral:
\[ d(m) = \frac{1}{(2\pi)^{m-1}} \underbrace{\int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty}}_{(m-1)\text{-fold}}\ \prod_{j=1}^{m-1} \frac{1}{1+\phi_j^2} \cdot \prod_{k \neq \ell} \frac{1}{1+(\phi_k - \phi_\ell)^2}\; d\phi_1\, d\phi_2 \cdots d\phi_{m-1}. \tag{8} \]
(ii) When $m \to \infty$ we find that
\[ |\mathcal{P}_n(m)| \sim \frac{\sqrt{2}\, m^{3m/2} e^{m^2}}{m^{2m^2}\, 2^m\, \pi^{m/2}} \cdot n^{m^2-m} \tag{9} \]
provided that $m^4 = o(n)$.
Remark 1. It is easy to count the number of matrices $\mathbf{k}$ satisfying only equation (1), that is, $\sum_{ij} k_{ij} = n$. Indeed, in this case it coincides with the number of nonnegative integer solutions of (1), which turns out to be the number of combinations with repetitions (the number of ways of selecting $m^2$ objects from $n$), that is,
\[ \binom{n+m^2-1}{n} = \binom{n+m^2-1}{m^2-1} \sim n^{m^2-1}. \]
Thus the conservation law equation (2) decreases the above by a factor of $\Theta(n^{m-1})$.
Remark 2. The evaluation of the integral (8) is quite cumbersome, but for small values of $m$ we computed it to find that
\[ |\mathcal{P}_n(2)| \sim \frac{1}{2}\,\frac{n^2}{2!}, \tag{10} \]
\[ |\mathcal{P}_n(3)| \sim \frac{1}{12}\,\frac{n^6}{6!}, \tag{11} \]
\[ |\mathcal{P}_n(4)| \sim \frac{1}{96}\,\frac{n^{12}}{12!}, \tag{12} \]
\[ |\mathcal{P}_n(5)| \sim \frac{37}{34560}\,\frac{n^{20}}{20!} \tag{13} \]
for large $n$. It appears that the coefficients of $n^{m^2-m}$ are rational numbers, though we have no proof of this.
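The constant $d(3) = 1/12$ behind (11) can be confirmed by numerically evaluating the integral (8) for $m = 3$. The sketch below (our illustration; the grid size is an arbitrary choice) substitutes $\phi = \tan a$, which absorbs the factors $1/(1+\phi_j^2)$ into the measure, and applies the midpoint rule.

```python
from math import tan, pi

def d3_numeric(N=400):
    """Midpoint-rule estimate of d(3) from (8): (2*pi)^(-2) times the double
    integral of 1/((1+x^2)(1+y^2)(1+(x-y)^2)) over R^2. Substituting
    x = tan(a), y = tan(b) maps the domain to (-pi/2, pi/2)^2 and absorbs
    the first two factors into da db."""
    h = pi / N
    total = 0.0
    for i in range(N):
        ta = tan(-pi / 2 + (i + 0.5) * h)
        for j in range(N):
            tb = tan(-pi / 2 + (j + 0.5) * h)
            total += 1.0 / (1.0 + (ta - tb) ** 2)
    return total * h * h / (2 * pi) ** 2
```

The $m = 2$ case of (8) reduces to $\frac{1}{2\pi}\int d\phi/(1+\phi^2) = 1/2$ exactly, consistent with (10).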
Remark 3. We now compare the coefficient at $n^{m^2-m}$ for fixed $m$ in (7) with its asymptotic counterpart in (9). They are shown in Table 1. Observe the extremely small values of these constants even for relatively small $m$.
Tab. 1: Constants at $n^{m^2-m}$ for fixed $m$ and large $m$.

  m | constant in (7)       | constant in (9)
  2 | 2.500000000 · 10^-1   | 1.920140832 · 10^-1
  3 | 1.157407407 · 10^-4   | 9.315659368 · 10^-5
  4 | 2.174662186 · 10^-11  | 1.767043356 · 10^-11
  5 | 4.400513659 · 10^-22  | 3.577891782 · 10^-22
3 Analysis and Proofs
In this section we prove Theorem 1. Our starting formula is (6), which we repeat below:
\[ F_m^*(z) = \frac{1}{(1-z)^m}\, [x_1^0 x_2^0 \cdots x_m^0] \prod_{i \neq j} \left( 1 - z\,\frac{x_i}{x_j} \right)^{-1}. \tag{14} \]
We first compute this explicitly for m = 2, 3, 4, 5.
When $m = 2$, we have
\[ F_2^*(z) = \frac{1}{(1-z)^2}\, [x_1^0 x_2^0] \left( \frac{1}{1 - z\, x_1/x_2} \cdot \frac{1}{1 - z\, x_2/x_1} \right). \tag{15} \]
Let us set $A = x_1/x_2$, so we need the coefficient of $A^0$ in $(1-Az)^{-1}(1-z/A)^{-1}$. Using a partial fractions expansion in $A$, we have
\[ \frac{1}{1-Az} \cdot \frac{1}{1-z/A} = \frac{1}{1-z^2} \left( \frac{1}{1-Az} + \frac{z}{A-z} \right). \]
For definiteness, we can assume that $|z| < |A| < |1/z|$, so that the coefficient of $A^0$ in $(1-Az)^{-1}$ is one and that in $z(A-z)^{-1}$ is zero. Hence, $F_2^*(z) = (1-z)^{-2}(1-z^2)^{-1} = (1+z)^{-1}(1-z)^{-3}$ and
\[ |\mathcal{P}_n(2)| \sim |\mathcal{F}_n(2)| = \frac{1}{2\pi i} \oint \frac{1}{z^{n+1}} \cdot \frac{1}{1+z} \cdot \frac{1}{(1-z)^3}\, dz = \frac{n^2}{4} + n + \frac{3}{4} + \frac{1}{8}\left[1 + (-1)^n\right] \sim \frac{1}{2}\,\frac{n^2}{2!}, \qquad n \to \infty. \tag{16} \]
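The closed form (16) can be cross-checked by expanding $(1+z)^{-1}(1-z)^{-3}$ as a power series via the linear recurrence implied by its denominator. The helper below is our illustration, not part of the original derivation.

```python
def series_coeffs(num, den, N):
    """First N Taylor coefficients of num(z)/den(z); requires den[0] == 1."""
    c = []
    for n in range(N):
        s = num[n] if n < len(num) else 0
        s -= sum(den[k] * c[n - k] for k in range(1, min(n, len(den) - 1) + 1))
        c.append(s)
    return c

# (1+z)(1-z)^3 = 1 - 2z + 2z^3 - z^4, so F2*(z) = 1/(1 - 2z + 2z^3 - z^4)
coeffs = series_coeffs([1], [1, -2, 0, 2, -1], 20)
```

The coefficients begin $1, 2, 4, 6, 9$, in agreement with $n^2/4 + n + 3/4 + \frac{1}{8}[1+(-1)^n]$.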
For $m \ge 3$ we use recursive partial fractions expansions. When $m = 3$ we set $x_1/x_2 = A$ and $x_1/x_3 = B$, so that we wish to compute
\[ [A^0 B^0] \left( \frac{1}{1-zA} \cdot \frac{1}{1-z/A} \cdot \frac{1}{1-Bz} \cdot \frac{1}{1-z/B} \cdot \frac{1}{1-Az/B} \cdot \frac{1}{1-Bz/A} \right). \tag{17} \]
First we do a partial fractions expansion in the $A$ variable, for fixed $B$ and $z$. Thus the factor inside the parentheses in (17) becomes
\[
\frac{1}{1-zA} \cdot \frac{1}{1-z^2} \cdot \frac{1}{1-Bz} \cdot \frac{1}{1-z/B} \cdot \frac{1}{1-1/B} \cdot \frac{1}{1-Bz^2}
\;+\; \frac{z}{A-z} \cdot \frac{1}{1-z^2} \cdot \frac{1}{1-Bz} \cdot \frac{1}{1-z/B} \cdot \frac{1}{1-z^2/B} \cdot \frac{1}{1-B}
\]
\[
\;+\; \frac{1}{1-Az/B} \cdot \frac{1}{1-B} \cdot \frac{1}{1-z^2/B} \cdot \frac{1}{1-Bz} \cdot \frac{1}{1-z/B} \cdot \frac{1}{1-z^2}
\;+\; \frac{Bz}{A-Bz} \cdot \frac{1}{1-Bz^2} \cdot \frac{1}{1-1/B} \cdot \frac{1}{1-Bz} \cdot \frac{1}{1-z/B} \cdot \frac{1}{1-z^2}. \tag{18}
\]
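This four-term decomposition can be verified numerically: each coefficient is the evaluation of the remaining factors at the corresponding pole in $A$. The sketch below is our restatement of that computation; the test points for $z$, $A$, $B$ are arbitrary nondegenerate values.

```python
def product_form(z, A, B):
    """The six-factor product inside (17), viewed as a function of A."""
    return 1.0 / ((1 - z * A) * (1 - z / A) * (1 - B * z) * (1 - z / B)
                  * (1 - A * z / B) * (1 - B * z / A))

def expansion(z, A, B):
    """Partial fractions in A: coefficients are the remaining A-dependent
    factors evaluated at the four poles A = 1/z, z, B/z, B*z."""
    g = 1.0 / ((1 - B * z) * (1 - z / B))                      # A-independent factors
    c1 = 1.0 / ((1 - z * z) * (1 - 1 / B) * (1 - B * z * z))   # at A = 1/z
    c2 = 1.0 / ((1 - z * z) * (1 - z * z / B) * (1 - B))       # at A = z
    c3 = 1.0 / ((1 - B) * (1 - z * z / B) * (1 - z * z))       # at A = B/z
    c4 = 1.0 / ((1 - B * z * z) * (1 - 1 / B) * (1 - z * z))   # at A = B*z
    return g * (c1 / (1 - z * A) + c2 * z / (A - z)
                + c3 / (1 - A * z / B) + c4 * B * z / (A - B * z))
```

The two functions agree to machine precision wherever both are defined.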
The coefficient of $A^0$ in the first term of (18) is
\[ \frac{1}{1-z^2} \cdot \frac{1}{1-Bz} \cdot \frac{1}{1-z/B} \cdot \frac{1}{1-1/B} \cdot \frac{1}{1-Bz^2}, \tag{19} \]
and that in the third term is
\[ \frac{1}{1-B} \cdot \frac{1}{1-z^2/B} \cdot \frac{1}{1-Bz} \cdot \frac{1}{1-z/B} \cdot \frac{1}{1-z^2}, \tag{20} \]
while the coefficients of $A^0$ are zero in the second and fourth terms. Combining (19) and (20), we must now compute
\[ [B^0] \left( \frac{1+z^2}{1-z^2} \cdot \frac{1}{1-Bz} \cdot \frac{1}{1-z/B} \cdot \frac{1}{1-Bz^2} \cdot \frac{1}{1-z^2/B} \right). \tag{21} \]
Now expanding (21) by a partial fractions expansion in $B$ leads to
\[
\frac{1+z^2}{1-z^2}\, [B^0] \left(
\frac{1}{1-Bz} \cdot \frac{1}{1-z^2} \cdot \frac{1}{1-z} \cdot \frac{1}{1-z^3}
+ \frac{z}{B-z} \cdot \frac{1}{1-z^2} \cdot \frac{1}{1-z^3} \cdot \frac{1}{1-z}
+ \frac{1}{1-1/z} \cdot \frac{1}{1-z^3} \cdot \frac{1}{1-Bz^2} \cdot \frac{1}{1-z^4}
+ \frac{z^2}{B-z^2} \cdot \frac{1}{1-z^3} \cdot \frac{1}{1-1/z} \cdot \frac{1}{1-z^4}
\right)
\]
\[
= \frac{1+z^2}{1-z^2} \left(
\frac{1}{1-z^2} \cdot \frac{1}{1-z} \cdot \frac{1}{1-z^3}
+ \frac{-z}{1-z} \cdot \frac{1}{1-z^3} \cdot \frac{1}{1-z^4}
\right)
= \frac{1-z+z^2}{(1-z)^4 (1+z)^2 (1+z+z^2)}.
\]
Hence,
\[ F_3^*(z) = \frac{1-z+z^2}{(1-z)^7 (1+z)^2 (1+z+z^2)}. \]
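As a sanity check, expanding this closed form as a power series reproduces the counts of balanced $3 \times 3$ frequency matrices for small $n$; the coefficients begin $1, 3, 9, 21, 45$. The polynomial helpers below are our illustration.

```python
def poly_mul(p, q):
    """Multiply two polynomials given as ascending coefficient lists."""
    r = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return r

def poly_pow(p, e):
    r = [1]
    for _ in range(e):
        r = poly_mul(r, p)
    return r

def series_div(num, den, N):
    """First N Taylor coefficients of num(z)/den(z); requires den[0] == 1."""
    c = []
    for n in range(N):
        s = num[n] if n < len(num) else 0
        s -= sum(den[k] * c[n - k] for k in range(1, min(n, len(den) - 1) + 1))
        c.append(s)
    return c

# denominator (1-z)^7 (1+z)^2 (1+z+z^2), numerator 1 - z + z^2
den = poly_mul(poly_mul(poly_pow([1, -1], 7), poly_pow([1, 1], 2)), [1, 1, 1])
f3_coeffs = series_div([1, -1, 1], den, 8)
```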
For $z \to 1$, $F_3^*(z) \sim \frac{1}{12}(1-z)^{-7}$, so that
\[ |\mathcal{P}_n(3)| \sim \frac{1}{12}\,\frac{n^6}{6!}, \qquad n \to \infty. \tag{22} \]
Using similar recursive partial fractions expansions, with the help of the symbolic computation program MAPLE, we find that for $m = 4$ and $m = 5$
\[ F_4^*(z) = \frac{z^8 - 2z^7 + 3z^6 + 2z^5 - 2z^4 + 2z^3 + 3z^2 - 2z + 1}{(1-z)^{13} (1+z)^5 (1+z^2)(1+z+z^2)^2} \tag{23} \]
Tab. 2: Poles and their orders for various $m$.

  m \ root | 1  | -1 | e^{±2πi/3} | ±i | e^{±2πi/5} | e^{±4πi/5}
  2        | 3  | 1  | --         | -- | --         | --
  3        | 7  | 2  | 1          | -- | --         | --
  4        | 13 | 5  | 2          | 1  | --         | --
  5        | 21 | 8  | 4          | 2  | 1          | 1
and
\[ F_5^*(z) = \frac{Q(z)}{(1-z)^{21} (1+z)^8 (1+z^2)^2 (1+z+z^2)^4 (1+z+z^2+z^3+z^4)}, \tag{24} \]
where
\[ Q(z) = z^{20} - 3z^{19} + 7z^{18} + 3z^{17} + 2z^{16} + 17z^{15} + 35z^{14} + 29z^{13} + 45z^{12} + 50z^{11} + 72z^{10} + 50z^9 + 45z^8 + 29z^7 + 35z^6 + 17z^5 + 2z^4 + 3z^3 + 7z^2 - 3z + 1. \]
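The constants $1/96$ and $37/34560$ quoted below in (25) follow by evaluating the numerator polynomials and the non-$(1-z)$ denominator factors at $z = 1$; exact rational arithmetic confirms them. The sketch below is our verification.

```python
from fractions import Fraction

def poly_at(coeffs, x):
    """Evaluate a polynomial given by its ascending coefficient list."""
    return sum(Fraction(c) * x ** k for k, c in enumerate(coeffs))

# numerator of F4*(z) in (23), ascending powers of z
p4 = [1, -2, 3, 2, -2, 2, 3, -2, 1]
# Q(z) from (24), ascending powers of z
q5 = [1, -3, 7, 3, 2, 17, 35, 29, 45, 50, 72,
      50, 45, 29, 35, 17, 2, 3, 7, -3, 1]

# remaining denominator factors at z = 1: (1+z)^5 (1+z^2) (1+z+z^2)^2, etc.
d4 = poly_at(p4, 1) / (2 ** 5 * 2 * 3 ** 2)
d5 = poly_at(q5, 1) / (2 ** 8 * 2 ** 2 * 3 ** 4 * 5)
```

Here `d4` reduces to 1/96 and `d5` to 37/34560.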
These results show that it is unlikely that a simple formula can be found for $F_m^*(z)$ for general $m$. By expanding (23) and (24) near $z = 1$ we conclude that, as $n \to \infty$,
\[ |\mathcal{P}_n(4)| \sim \frac{1}{96}\,\frac{n^{12}}{12!}, \qquad |\mathcal{P}_n(5)| \sim \frac{37}{34560}\,\frac{n^{20}}{20!}. \tag{25} \]
It is easy to show inductively that at $z = 1$, $F_m^*(z)$ has a pole of order $m^2 - m + 1$, and that its other singularities are poles at roots of unity of order smaller than $m^2 - m + 1$. These poles and their orders are given in Table 2.
Thus, for $n \to \infty$, we have
\[ |\mathcal{P}_n(m)| \sim d(m)\,\frac{n^{m^2-m}}{(m^2-m)!}, \tag{26} \]
where
\[ d(m) = \lim_{z \to 1} \left[ (1-z)^{m^2-m+1} F_m^*(z) \right]. \]
However, there seems to be no simple formula for the sequence of constants $d(m)$. We proceed to characterize $d(m)$ as an $(m-1)$-fold integral.
First consider the simple case $m = 2$. Setting $A = e^{i\Phi}$ and using a Cauchy integral, we have
\[ [A^0]\, \frac{1}{1-z/A} \cdot \frac{1}{1-Az} = \frac{1}{2\pi} \int_{-\pi}^{\pi} \frac{d\Phi}{1 - 2z\cos\Phi + z^2}. \]
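The identity used here, $[A^0](1-z/A)^{-1}(1-Az)^{-1} = (1-z^2)^{-1}$, can be checked by numerical quadrature of the right-hand side; for smooth periodic integrands the midpoint rule converges extremely fast. The sketch below is our illustration.

```python
from math import cos, pi

def poisson_integral(z, N=20000):
    """Midpoint-rule estimate of (1/2pi) * integral over [-pi, pi] of
    dPhi / (1 - 2 z cos(Phi) + z^2), valid for |z| < 1."""
    h = 2 * pi / N
    return sum(h / (1 - 2 * z * cos(-pi + (k + 0.5) * h) + z * z)
               for k in range(N)) / (2 * pi)
```

For $|z| < 1$ the result matches $1/(1-z^2)$ to machine precision.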
Now set $z = 1 - \delta$ and expand the integral for $z \to 1$. The major contribution comes from $\Phi \approx 0$; scaling $\Phi = \delta\phi$ and using the Taylor expansion $1 - 2(1-\delta)\cos(\delta\phi) + (1-\delta)^2 = \delta^2[1 + \phi^2] + O(\delta^3)$, we find that
\[ F_2^*(z) \sim \frac{1}{\delta^2} \cdot \frac{1}{2\pi} \int_{-\infty}^{\infty} \frac{\delta\, d\phi}{\delta^2 [1+\phi^2]} = \frac{1}{2} \cdot \frac{1}{\delta^3}, \qquad \delta \to 0. \]
When $m = 3$, we use (17) and the Cauchy integral formula with $A = e^{i\Phi}$ and $B = e^{i\Psi}$ to get
\[ \frac{1}{(2\pi)^2} \int_{-\pi}^{\pi} \int_{-\pi}^{\pi} \frac{1}{1-2z\cos\Phi+z^2} \cdot \frac{1}{1-2z\cos\Psi+z^2} \cdot \frac{1}{1-2z\cos(\Phi-\Psi)+z^2}\, d\Phi\, d\Psi. \]
Again expanding the above for $z = 1-\delta \to 1$ with $\Phi = \delta\phi = O(\delta)$ and $\Psi = \delta\psi = O(\delta)$, we obtain the leading order approximation
\[ \frac{1}{\delta^4} \cdot \frac{1}{(2\pi)^2} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \frac{1}{1+\phi^2} \cdot \frac{1}{1+\psi^2} \cdot \frac{1}{1+(\phi-\psi)^2}\, d\phi\, d\psi = \frac{1}{\delta^4} \cdot \frac{1}{12}. \]
Thus as $z \to 1$, $F_3^*(z) \sim \frac{1}{12}\,\delta^{-7} = \frac{1}{12}(1-z)^{-7}$, which also follows from the exact generating function.
For general $m$ a completely analogous calculation shows that, as $\delta = 1-z \to 0$, $F_m^*(z) \sim \delta^{m-m^2-1}\, d(m)$, where
\[ d(m) = \frac{1}{(2\pi)^{m-1}} \underbrace{\int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty}}_{(m-1)\text{-fold}}\ \prod_{j=1}^{m-1} \frac{1}{1+\phi_j^2} \cdot \prod_{k \neq \ell} \frac{1}{1+(\phi_k - \phi_\ell)^2}\; d\phi_1\, d\phi_2 \cdots d\phi_{m-1}. \tag{27} \]
We have verified that for $m = 4$ and $m = 5$ the integral agrees with our previous results. The second product in the above is over all distinct pairs $(k, \ell)$, so it may also be written as
\[ \prod_{\ell=1}^{m-2}\ \prod_{k=\ell+1}^{m-1} \frac{1}{1+(\phi_k - \phi_\ell)^2}. \tag{28} \]
This completes the proof of part (i) of Theorem 1.
We now use the saddle point method to prove part (ii) of Theorem 1. Since
\[ \sum_{k_{ij}} \left( z\,\frac{z_i}{z_j} \right)^{k_{ij}} = \left( 1 - z\,\frac{z_i}{z_j} \right)^{-1}, \]
setting $z_i = e^{i\theta_i}$ we find that
\[ F_m^*(z) = \frac{1}{(2i\pi)^m} \oint \cdots \oint\ \prod_{i,j} \left( 1 - z\,\frac{z_i}{z_j} \right)^{-1} \frac{dz_1}{z_1} \cdots \frac{dz_m}{z_m} \tag{29} \]
\[ = (2\pi)^{-m} \int_{-\pi}^{\pi} \cdots \int_{-\pi}^{\pi}\ \prod_{i,j} \bigl( 1 - z\exp(i(\theta_i - \theta_j)) \bigr)^{-1}\, d\theta_1 \cdots d\theta_m. \tag{30} \]
Noticing that the expression $\prod_{i,j} (1 - z\exp(i(\theta_i - \theta_j)))^{-1}$ does not change when the $\theta_i$ are all incremented by the same value, one can integrate over $\theta_1$ to obtain
\[ F_m^*(z) = (2\pi)^{-m+1} \int_{-\pi}^{\pi} \cdots \int_{-\pi}^{\pi}\ \prod_{i>1} \bigl( 1 - z\exp(i\theta_i) \bigr)^{-1} \bigl( 1 - z\exp(-i\theta_i) \bigr)^{-1} \times \frac{1}{1-z} \prod_{i>1,\, j>1} \bigl( 1 - z\exp(i(\theta_i - \theta_j)) \bigr)^{-1}\, d\theta_2 \cdots d\theta_m. \tag{31} \]
Now let
\[ L(z, \theta_2, \ldots, \theta_m) = \log(1-z) + \sum_{i>1} \log\bigl[ (1 - z\exp(i\theta_i))(1 - z\exp(-i\theta_i)) \bigr] + \sum_{i>1,\, j>1} \log\bigl( 1 - z\exp(i(\theta_i - \theta_j)) \bigr). \]
An alternative form of the above is
\[ L(z, \theta_2, \ldots, \theta_m) = \log(1-z) + \sum_{i=2}^{m} \log(1 - 2z\cos\theta_i + z^2) + \frac{1}{2} \sum_{i=2}^{m} \sum_{j=2}^{m} \log(1 - 2z\cos(\theta_i - \theta_j) + z^2). \]
Notice that $L(z, 0, \ldots, 0) = m^2 \log(1-z)$. Hence
\[ |\mathcal{F}_n(m)| = \frac{1}{i(2\pi)^m} \oint \int_{-\pi}^{\pi} \cdots \int_{-\pi}^{\pi} \exp(-L(z, \theta_2, \ldots, \theta_m)) \frac{dz}{z^{n+1}}\, d\theta_2 \cdots d\theta_m. \tag{32} \]
In order to find the asymptotics of this integral we use the multidimensional saddle point method [18]. The quantity $L(z, \theta_2, \ldots, \theta_m) + n\log z$ attains its minimum value at $(\theta_2, \ldots, \theta_m) = (0, \ldots, 0)$ and $z = z_n = \frac{n}{m^2+n}$. The minimum value is therefore
\[ m^2 \log(1-z_n) + n\log z_n = m^2 \log\!\left( \frac{m^2}{m^2+n} \right) + n\log\!\left( \frac{n}{m^2+n} \right), \]
or
\[ m^2 \log(m^2) + n\log(n) - (m^2+n)\log(m^2+n). \]
Then
\[ m^2 \log(1-z_n) + n\log z_n = m^2 \log m^2 - m^2 \log n - m^2 + O(m^4/n), \]
provided that $m^4 = o(n)$.
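The saddle point location follows from the first-order condition: the derivative of $m^2\log(1-z) + n\log z$ is $-m^2/(1-z) + n/z$, which vanishes exactly at $z_n = n/(m^2+n)$. A small numeric sketch (ours):

```python
from math import log

def h(z, m, n):
    """The exponent m^2 * log(1-z) + n * log(z) analyzed in the text."""
    return m * m * log(1 - z) + n * log(z)

def dh(z, m, n):
    """Its z-derivative, -m^2/(1-z) + n/z, which vanishes at z_n = n/(m^2+n)."""
    return -m * m / (1 - z) + n / z
```

At $z_n$ the value of `h` matches $m^2\log m^2 + n\log n - (m^2+n)\log(m^2+n)$, as derived above.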
After some computation it turns out that at the point $(z, \theta_2, \ldots, \theta_m) = (z_n, 0, \ldots, 0)$:
\[ \frac{\partial^2}{\partial z^2} L(z, \theta_2, \ldots, \theta_m) = -\frac{m^2}{(1-z_n)^2}, \]
\[ \frac{\partial^2}{\partial z\, \partial\theta_i} L(z, \theta_2, \ldots, \theta_m) = 0 \quad \forall i, \]
\[ \frac{\partial^2}{\partial\theta_i^2} L(z, \theta_2, \ldots, \theta_m) = \frac{2(m-1) z_n}{(1-z_n)^2} \quad \forall i, \]
\[ \frac{\partial^2}{\partial\theta_i\, \partial\theta_j} L(z, \theta_2, \ldots, \theta_m) = -\frac{2 z_n}{(1-z_n)^2} \quad \forall i \neq j,\ m \ge 3. \]
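These second derivatives can be confirmed by finite differences for a concrete case, say $m = 3$. The sketch below is our illustration (the finite-difference helper is ours); it compares numerical second derivatives of $L$ against the closed forms above.

```python
from math import log, cos

def L(z, th):
    """L(z, theta_2, ..., theta_m) for m = len(th) + 1, as defined above."""
    val = log(1 - z)
    val += sum(log(1 - 2 * z * cos(t) + z * z) for t in th)
    val += 0.5 * sum(log(1 - 2 * z * cos(a - b) + z * z) for a in th for b in th)
    return val

def hess(f, x, i, j, h=1e-5):
    """Central finite-difference estimate of the (i, j) second derivative of f."""
    if i == j:
        xp = list(x); xp[i] += h
        xm = list(x); xm[i] -= h
        return (f(xp) - 2 * f(x) + f(xm)) / (h * h)
    xpp = list(x); xpp[i] += h; xpp[j] += h
    xpm = list(x); xpm[i] += h; xpm[j] -= h
    xmp = list(x); xmp[i] -= h; xmp[j] += h
    xmm = list(x); xmm[i] -= h; xmm[j] -= h
    return (f(xpp) - f(xpm) - f(xmp) + f(xmm)) / (4 * h * h)

m, n = 3, 500
zn = n / (m * m + n)
f = lambda v: L(v[0], v[1:])
x0 = [zn, 0.0, 0.0]
```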
In other words, the second derivative matrix $\mathbf{Q}_2$ of $L(z, \theta_2, \ldots, \theta_m) + n\log z$ at $(z, \theta_2, \ldots, \theta_m) = (z_n, 0, \ldots, 0)$ is
\[ \mathbf{Q}_2 = \left( -\frac{m^2}{(1-z_n)^2} - \frac{n}{z_n^2} \right) \mathbf{u}_z \otimes \mathbf{u}_z + \frac{2m z_n}{(1-z_n)^2}\, \mathbf{I}_\theta - \frac{2 z_n}{(1-z_n)^2}\, \mathbf{u}_\theta \otimes \mathbf{u}_\theta, \]
where $\mathbf{u}_z = (1, 0, \ldots, 0)$, $\mathbf{u}_\theta = (0, 1, \ldots, 1)$, and
\[ \mathbf{I}_\theta = \mathbf{I} - \mathbf{u}_z \otimes \mathbf{u}_z, \]
i.e., the identity restricted to the $\theta$ components. In the above, $\otimes$ is the tensor product (in our case, a product of two vectors resulting in a matrix). For example,
\[ \mathbf{u}_\theta \otimes \mathbf{u}_\theta = \begin{pmatrix} 0 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 1 \\ \vdots & \vdots & & \vdots \\ 0 & 1 & \cdots & 1 \end{pmatrix}. \]
An application of the saddle point method yields
\[ |\mathcal{F}_n(m)| \sim \frac{1}{(2\pi)^{m/2}\, z_n \sqrt{\det(\mathbf{Q}_2)}} \exp\bigl( -m^2 \log(1-z_n) - n\log z_n \bigr), \]
where $\det(\cdot)$ denotes the determinant of a matrix. Since
\[ |\det(\mathbf{Q}_2)| = \left( \frac{m^2}{(1-z_n)^2} + \frac{n}{z_n^2} \right) \left( \frac{z_n}{(1-z_n)^2} \right)^{m-1} 2^{m-1} m^{m-2} \sim n^{2m}\, m^{-3m}\, 2^{m-1}, \]
we find that, for $m^4 = o(n)$,
\[ |\mathcal{P}_n(m)| \sim |\mathcal{F}_n(m)| \sim \left( m^{-2m^2 + 3m/2}\, e^{m^2}\, 2^{-m}\, \pi^{-m/2}\, \sqrt{2} \right) n^{m^2-m}, \]
and this completes the proof. The condition $m^4 = o(n)$ is needed since we used the approximation of $m^2 \log(1-z_n)$ below (32).
References
[1] K. Atteson, The Asymptotic Redundancy of Bayes Rules for Markov Chains, IEEE Trans. on Information Theory, 45, 2104-2109, 1999.
[2] A. Barron, J. Rissanen, and B. Yu, The Minimum Description Length Principle in Coding and Modeling, IEEE Trans. Information Theory, 44, 2743-2760, 1998.
[3] P. Billingsley, Statistical Methods in Markov Chains, Ann. Math. Statistics, 32, 12-40, 1961.
[4] T. Cover and J.A. Thomas, Elements of Information Theory, John Wiley & Sons, New York, 1991.
[5] I. Csiszár and J. Körner, Information Theory: Coding Theorems for Discrete Memoryless Systems, Academic Press, New York, 1981.
[6] I. Csiszár, The Method of Types, IEEE Trans. Information Theory, 44, 2505-2523, 1998.
[7] L. D. Davisson, Universal Noiseless Coding, IEEE Trans. Information Theory, 19, 783-795, 1973.
[8] L. D. Davisson, Minimax Noiseless Universal coding for Markov Sources, IEEE Trans. Information
Theory, 29, 211 - 215, 1983.
[9] M. Drmota and W. Szpankowski, Precise Minimax Redundancy and Regret, IEEE Trans. Information Theory, 50, 2686-2707, 2004.
[10] P. Flajolet and R. Sedgewick, Analytic Combinatorics, Cambridge University Press, Cambridge,
2008.
[11] P. Jacquet and W. Szpankowski, Markov Types and Minimax Redundancy for Markov Sources, IEEE
Trans. Information Theory, 50, 1393-1402, 2004.
[12] A. Martín, G. Seroussi, and M. J. Weinberger, Type classes of tree models, Proc. ISIT 2007, Nice, France, 2007.
[13] J. Rissanen, Fisher Information and Stochastic Complexity, IEEE Trans. Information Theory, 42,
40-47, 1996.
[14] G. Seroussi, On Universal Types, IEEE Trans. Information Theory, 52, 171-189, 2006.
[15] P. Shields, Universal Redundancy Rates Do Not Exist, IEEE Trans. Information Theory, 39, 520-524, 1993.
[16] R. Stanley, Enumerative Combinatorics, Vol. II, Cambridge University Press, Cambridge, 1999.
[17] W. Szpankowski, On Asymptotics of Certain Recurrences Arising in Universal Coding, Problems of
Information Transmission, 34, 55-61, 1998.
[18] W. Szpankowski, Average Case Analysis of Algorithms on Sequences, Wiley, New York, 2001.
[19] Q. Xie, A. Barron, Asymptotic Minimax Regret for Data Compression, Gambling, and Prediction,
IEEE Trans. Information Theory, 46, 431-445, 2000.
[20] M. J. Weinberger, N. Merhav, and M. Feder, Optimal sequential probability assignment for individual sequences, IEEE Trans. Inf. Theory, 40, 384-396, 1994.
[21] P. Whittle, Some Distribution and Moment Formulae for Markov Chain, J. Roy. Stat. Soc., Ser. B., 17, 235-242, 1955.

(2) j=1 ∑ ki j j=1 ∑ k ji.2 Philippe Jacquet and Charles Knessl and Wojciech Szpankowski method and made it a basic tool of information theory of discrete memoryless systems [5]. see also [4. throughout we only consider cyclic strings in which the first element x1 follows the last xn . j=1 m m ∀i ∈ A . The empirical distribution of a sequence xn (with some fixed initial state) is P(xn ) = i. For concreteness. For a fixed transition matrix P. we study sequences xn = x1 . 20. This matrix is an integer matrix satisfying two properties: i. k Markov sources Xt and corresponding Markov types are completely characterized by their transition probabilities that we denote by P. 6. . 14. j∈A m ∑ ki j = n. The number of type classes. m . for binary memoryless sources. j∈A defines a type class. = (1) ∀i∈A. and additionally for any i ∈ A [11. Let A = {1. there are |P n (2)| = n + 1 type classes. xn of length n. the frequency matrix k = {ki j }i. j∈A ∏ pi ji j . . Clearly. and a class consisting of all sequences containing k 1’s has cardinality |T n (xn )| = n . or equivalently. In this paper. all sequences having the same distribution as P(xn ). j∈A ∑ ki j = n − 1. 21] j=1 ∑ ki j = ∑ k ji ± δ(x1 = xn). For example. To avoid this exception. 2. we consider integer matrices k = [ki j ] satisfying the following two properties i. the number of distinct distributions. 11. 12. and consider a class of Markov distributions P n (m) on A n . is therefore equal to the cardinality of P n (m) that we also often simplify to |P n | := |P n (m)|. m} be an m-ary alphabet. For a distribution P ∈ P n the type class of xn is defined as T n (xn ) = {yn : P(xn ) = P(yn )}. 9. that is P = {pi j }i. For example. . we discuss Markov types. xn T n (xn ) = A n . Throughout. We often simply write P n := P n (m). 
The last property is called the flow conservation property and is a consequence of the fact that the number of pairs starting with symbols i ∈ A must equal to the number of pairs ending with symbol i ∈ A with the possible exception of the last pair. Thus. . we first focus on Markov sources of order one. k where ki j is the number of pairs (i j) ∈ A 2 in the sequence xn . P(01011) = p2 p10 p11 . We aim at deriving an asymptotic expression for |P n | for fixed or large m when n → ∞. . j∈A is the transition matrix of pi j = P(Xt+1 = j|Xt = i). In 01 passing we should point out that in the above pi j can be viewed either as “a formal indeterminate” or better as the ratio of the number of pairs (i j) ∈ A 2 to the length of the string (empirical distribution). . where δ(A) = 1 when A is true and zero otherwise. 21]. that is. n ≥ 1.

3} . also [10]). How many nonnegative integer solutions does the above system of linear equations have? We shall show n6 that it is asymptotically 12·6! . thus allowing the author to derive the growth rate of the number of integer solutions. We call the number of these solutions F n (m). The balanced frequency matrix has nine elements {ki j }i. ultimately only the denominator of this generating function is given in a semi-explicit form in [16]. Example. Let’s first consider a binary Markov source. = k21 . The balanced frequency matrix is of the following form k11 k12 k= k21 k22 where the nonnegative integers ki j satisfy k11 + k12 + k21 + k22 k12 The above system of linear equations can be reduced to k11 + 2k12 + k22 = n. that is. Our previous example demonstrated that this number is asymptotically equivalent to the number of nonnegative integer solutions to the system of linear equations (1)-(2). Stanley developed a general theory to construct the associated generating function. Our goal is to enumerate the number of Markov classes. Such an enumeration. to find (asymptotically) the cardinality of |P n |. In this paper. We shall call (2) the “conservation law” equation. (3) = n.6 of Stanley’s book [16] (cf. for n → ∞.2. for a general class of system of homogeneous diophantine equations. . and the enumeration of Markov types over a binary alphabet reduces to finding the number of nonnegative solutions of (3). we enumerate the number of Markov types |P n | which is. However. 4. + 1 (n − + 1) = 2 2 4 Let’s now look at the m = 3 case.Counting Markov Types 3 Such integer matrices k will be called balanced frequency matrices. The answer is obviously |F n | = = ⌊n⌋ 2 k12 =0 ∑ (n − 2k12 + 1) n n n2 + O(n). was investigated in Chap. j∈{1. and they satisfy k11 + k12 + k13 + k21 + k22 + k23 + k31 + k32 + k33 k12 + k13 k12 + k32 k13 + k23 = n = k21 + k31 = k21 + k23 = k31 + k32 . 
asymptotically the same as the number of distinct balanced frequency matrices satisfying (1) and (2).

[12]. The proofs are given in Section 3. for fixed m we prove that the number of Markov types is asymptotically equal to nm −m . we propose an approach based on previous work of Jacquet and Szpankowski [11] where analytic techniques such as multidimensional generating functions and the saddle point method were used. j=1 ∑ ki j = ∑ k ji .§ For large m → ∞ with m4 = o(n) we find that asymptotically the number of types is √ 3m/2 m2 2m e 2 |P n | ∼ nm −m . existing literature mostly concentrates on finding the cardinality of a Markov type class. 12. In particular. Markov types were studied in a series of papers. 11. let gk be a sequence of scalars indexed by matrices k and define the generating function G(z) = ∑ gk zk k § It is a simple exercise to extend these results to r order Markov chains. This allows us to derive precise asymptotic results. In Section 2 we formulate precisely our problem and present our main results for fixed and large m. it was known for a long while [4. 20. However. |P n | ∼ d(m) 2 (m − m)! 2 n → ∞. 2m2 2m πm/2 m However. as well as present asymptotic results for large m. one needs to replace m by mr . The paper is organized as follows. see [11. 13. 19]. we let F n be a subset of F consisting of matrices k such that the balance equations (1) and (2) hold. that is. and summarize some results obtained in [11]. We accomplish it here. For a given n. This was recently rigorously proved by Martin et al. Throughout the paper. but only in [20] Weinberger et al. . In particular. 21]. our techniques also allow us to derive asymptotics for m2 = o(n). that is. 2 Main Results In this section we present our main results and some of their consequences. In general. 15. we let F be the set of all integer matrices k satisfying the conservation law equation (2). j=1 m m ∀i∈A.4 Philippe Jacquet and Charles Knessl and Wojciech Szpankowski In this paper. 2. 5. |T (xn )| with an exception of Martin et al. mentioned (without proof) that |P n | = Θ(nm −m ). 
We first make some general observations about generating functions over matrices. In passing we observe that the number of Markov types are often needed for minimax redundancy evaluation [1. However. 6] that they grow 2 polynomially. in [12] for tree sources (that include Markov sources) for fixed m. Whittle [21] already in 1955 computed |T (xn )| for Markov chains of order one Regarding the number of types. We start with some notation. where d(m) is a constant for which we only found an integral representation. Indeed. the constant was never identified.

Then xj dxm G([zi j ]) (4) xm xi n≥0 k∈F n √ x x x with the convention that the i j-th coefficient of [zi j xij ] is zi j xij . Then applying the above lemma with zi j = zxi /x j we conclude that ∗ Fm (z) = xi 1 [x0 x0 · · · x0 ] ∏ 1 − z m m 1 2 (1 − z) xj i= j −1 (6) and thus. . by the Cauchy formula. . whose generating function is ∗ Fm (z) = n≥0 ∑ |F n (m)|zn . G(∆−1 (x)z∆(x)) = G([zi j x m xj ∑ k ji −∑ j ki j . . . xm ) is a diagonal matrix with elements x1 . Let [zi j x ij ] be the matrix ∆−1 (x)z∆(x) where ∆(x) = diag(x1 . . . ]) = ∑ gk zk ∏ xi j xi i=1 k (5) Therefore. [zi j xij ] = ∆−1 (x)z∆(x) where ∆(x) = diag(x1 . and i = −1. that is. xm ). j∈A is an m × m matrix that we often ∑ g k zk = ∑ ∑ g k zk n≥0 k∈F n the F -generating function of gk . j zi ji j where ki j is the entry in row i column j in the matrix k. By the change of variables xi = exp(iθi ) we also have G∗ (z) := 1 2iπ m ∑ ∑ g k zk = dx1 ··· x1 G∗ (z) = 1 (2π)m π −π dθ1 · · · π −π dθm G([zi j exp((θ j − θi )i)] Proof. which is is asymptotically equivalent to the number of Markov types |P n (m)| over the alphabet A . The result follows from the Cauchy coefficient m 1 formula (cf. . Lemma 1 Let G(z) = ∑k gk zk be the generating function of a complex matrix z. . In other words. The following useful lemma is proved in [11] but for completeness we x repeat it here. . ∗ Fm (z) 1 dz. Observe that where [zi j exp(θ j − θi )] = exp(−∆(θ))z exp(∆(θ)). [18]). . m 1 2 x We write it in shortly as G∗ (z) = x0 · · · x0 g([zi j xij ]).Counting Markov Types k 5 denote simply as z = [zi j ] (assuming the indices i and j run from 1 to m). We denote by G∗ (z) = k∈F where the summation is over all integer matrices and z = {zi j }i. G∗ (z) is the coefficient of G([zi j xij ]) at x0 x0 · · · x0 since ∑ j k ji − ∑ j ki j = 0 for matrices k ∈ F . Here zk = ∏i. that is. . the generating function of gk over matrices k ∈ F satisfying the balance equations (1) and (2). 
Thus, by the Cauchy formula,

|F_n(m)| = [z^n] F*_m(z) = (1/(2πi)) ∮ F*_m(z) dz/z^{n+1}.

In the next section we evaluate this expression asymptotically to yield the following main result of this paper.
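The quantity |F_n(m)| can also be computed by brute force for toy sizes, which is a useful cross-check on everything that follows. The sketch below (our own code, exponential in m², so only usable for very small m and n) enumerates nonnegative m×m integer matrices with total sum n whose row sums equal the corresponding column sums:

```python
from itertools import product

def balanced_matrix_count(m, n):
    """|F_n(m)|: number of m x m nonnegative integer matrices with total
    sum n satisfying the balance equations (row sum i = column sum i).
    Pure brute force over all (n+1)^(m*m) candidate matrices."""
    count = 0
    for cells in product(range(n + 1), repeat=m * m):
        if sum(cells) != n:
            continue
        k = [cells[i * m:(i + 1) * m] for i in range(m)]
        if all(sum(k[i]) == sum(row[i] for row in k) for i in range(m)):
            count += 1
    return count

print([balanced_matrix_count(2, n) for n in range(5)])  # → [1, 2, 4, 6, 9]
```

For m = 2 these counts agree with the explicit computation of F*_2(z) carried out below.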

Theorem 1 (i) For fixed m and n → ∞, the number of Markov types is

|P_n(m)| = d(m) n^{m²−m}/(m²−m)! + O(n^{m²−m−1}),    (7)

where d(m) is a constant that can also be expressed by the following (m−1)-fold integral:

d(m) = (1/(2π)^{m−1}) ∫_{−∞}^{∞} ··· ∫_{−∞}^{∞} ∏_{j=1}^{m−1} 1/(1 + φ_j²) · ∏_{1≤k<ℓ≤m−1} 1/(1 + (φ_k − φ_ℓ)²) dφ_1 dφ_2 ··· dφ_{m−1}.    (8)

(ii) When m → ∞ we find that

|P_n(m)| ∼ (√2 m^{3m/2} e^{m²})/(m^{2m²} 2^m π^{m/2}) · n^{m²−m},    (9)

provided that m⁴ = o(n).

Remark 1. It is easy to count the number of matrices k satisfying only equation (1), that is, ∑_{ij} k_ij = n. Indeed, it coincides with the number of integer solutions of (1), which is the number of combinations with repetitions (the number of ways of distributing n objects among m² cells), that is,

binom(n + m² − 1, m² − 1) ∼ n^{m²−1}/(m² − 1)!.

Thus the conservation law equation (2) decreases the above by the factor Θ(n^{m−1}).

Remark 2. The evaluation of the integral (8) is quite cumbersome, but for small values of m we computed it to find that

|P_n(2)| ∼ (1/2) · n²/2!    (10)
|P_n(3)| ∼ (1/12) · n⁶/6!    (11)
|P_n(4)| ∼ (1/96) · n¹²/12!    (12)
|P_n(5)| ∼ (37/34560) · n²⁰/20!    (13)

for large n. It appears that the coefficients of n^{m²−m} are rational numbers, though we have no proof of this.

Remark 3. We now compare the coefficient at n^{m²−m} for fixed m in (7) with its asymptotic counterpart in (9). They are shown in Table 1. Observe the extremely small values of these constants even for relatively small m.
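Remark 1 is easy to verify numerically. This sketch (names are ours) compares the combinations-with-repetitions count for equation (1) alone with its asymptotic n^{m²−1}/(m²−1)!; the ratio approaches 1 as n grows:

```python
from math import comb, factorial

def unconstrained_count(m, n):
    """Number of m x m nonnegative integer matrices with total sum n,
    i.e. solutions of equation (1) only: C(n + m^2 - 1, m^2 - 1)."""
    return comb(n + m * m - 1, m * m - 1)

m = 2
for n in (10, 100, 1000):
    asym = n ** (m * m - 1) / factorial(m * m - 1)
    print(n, unconstrained_count(m, n), round(unconstrained_count(m, n) / asym, 3))
```

The balanced count grows only like n^{m²−m}, a factor Θ(n^{m−1}) smaller, exactly as the remark states.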

Tab. 1: Constants at n^{m²−m} for fixed m and large m.

m | constant in (7) | constant in (9)
2 | 2.500000000·10^{−1} | 1.920140832·10^{−1}
3 | 1.157407407·10^{−4} | 9.315659368·10^{−5}
4 | 2.174662186·10^{−11} | 1.767043356·10^{−11}
5 | 4.400513659·10^{−22} | 3.577891782·10^{−22}

3 Analysis and Proofs

In this section we prove Theorem 1. Our starting formula is (6), which we repeat below:

F*_m(z) = [x_1^0 x_2^0 ··· x_m^0] (1/(1−z)^m) ∏_{i≠j} (1 − z x_i/x_j)^{−1}.    (14)

We first compute this explicitly for m = 2. When m = 2, we have

F*_2(z) = (1/(1−z)²) [x_1^0 x_2^0] (1 − z x_1/x_2)^{−1} (1 − z x_2/x_1)^{−1}.    (15)

Let us set A = x_1/x_2, so we need the coefficient of A^0 in (1 − Az)^{−1}(1 − z/A)^{−1}. Using a partial fractions expansion in A, we have

1/((1 − Az)(1 − z/A)) = (1/(1 − z²)) [1/(1 − Az) + z/(A − z)].

For definiteness, we can assume that |z| < |A| < |1/z|, so that the coefficient of A^0 in (1 − Az)^{−1} is one and that in z(A − z)^{−1} is zero. Hence,

F*_2(z) = (1 − z)^{−2}(1 − z²)^{−1} = (1 + z)^{−1}(1 − z)^{−3}

and

|P_n(2)| ∼ |F_n(2)| = (1/(2πi)) ∮ (1/((1 + z)(1 − z)³)) dz/z^{n+1}
= n²/4 + n + 3/4 + (1/8)[1 + (−1)^n] ∼ (1/2) · n²/2!,  n → ∞.    (16)

For m ≥ 3 we use recursive partial fractions expansions. When m = 3 we set x_1/x_2 = A and x_1/x_3 = B, so that we wish to compute

[A^0 B^0] (1/(1 − zA)) (1/(1 − z/A)) (1/(1 − Bz)) (1/(1 − z/B)) (1/(1 − Az/B)) (1/(1 − Bz/A)).    (17)
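The m = 2 closed form above is easy to double-check: expanding F*_2(z) = 1/((1+z)(1−z)³) as a power series via the standard linear recurrence for rational generating functions reproduces the exact formula in (16). A sketch (`series_coeffs` is our helper):

```python
def series_coeffs(num, den, N):
    """Power-series coefficients of num(z)/den(z) up to z^N,
    given coefficient lists with den[0] == 1."""
    c = []
    for n in range(N + 1):
        s = num[n] if n < len(num) else 0
        s -= sum(den[k] * c[n - k] for k in range(1, min(n, len(den) - 1) + 1))
        c.append(s)
    return c

# (1+z)(1-z)^3 = 1 - 2z + 2z^3 - z^4, so F2*(z) = 1/(1 - 2z + 2z^3 - z^4)
coeffs = series_coeffs([1], [1, -2, 0, 2, -1], 6)
closed = [n * n / 4 + n + 3 / 4 + (1 + (-1) ** n) / 8 for n in range(7)]
print(coeffs)   # → [1, 2, 4, 6, 9, 12, 16]
print(closed)
```

The two lists agree term by term, and the leading behavior n²/4 matches (1/2)·n²/2!.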

First we perform a partial fractions expansion in the A variable, for fixed B and z. The A-dependent factors in (17) have simple poles at A = 1/z, z, B/z and Bz, and

1/((1 − zA)(1 − z/A)(1 − Az/B)(1 − Bz/A))
= (1/((1 − z²)(1 − 1/B)(1 − Bz²))) (1/(1 − zA)) + r_2/(A − z)
+ (1/((1 − z²)(1 − B)(1 − z²/B))) (1/(1 − Az/B)) + r_4/(A − Bz),    (18)

where r_2 and r_4 are the residues at A = z and A = Bz, whose values we will not need. In the annulus |z| < |A| < |1/z| the terms r_2/(A − z) and r_4/(A − Bz) contain no A^0 term, that is, the coefficients of A^0 are zero in the second and fourth terms. Multiplying by the remaining factors (1 − Bz)^{−1}(1 − z/B)^{−1}, the coefficient of A^0 coming from the first term in (18) is

(1/(1 − z²)) (1/(1 − 1/B)) (1/(1 − Bz²)) (1/(1 − Bz)) (1/(1 − z/B)),    (19)

and that coming from the third term is

(1/(1 − z²)) (1/(1 − B)) (1/(1 − z²/B)) (1/(1 − Bz)) (1/(1 − z/B)).    (20)

Combining (19) and (20), an elementary simplification shows that we must now compute

[B^0] ((1 + z²)/(1 − z²)) (1/(1 − Bz)) (1/(1 − z/B)) (1/(1 − Bz²)) (1/(1 − z²/B)).    (21)

Now expanding (21) by a partial fractions expansion in B, only the poles at B = 1/z and B = 1/z² contribute to the coefficient of B^0, which leads to

((1 + z²)/(1 − z²)) [ 1/((1 − z)(1 − z²)(1 − z³)) + 1/((1 − 1/z)(1 − z³)(1 − z⁴)) ].

Hence, restoring the factor (1 − z)^{−3} and using 1/(1 − 1/z) = −z/(1 − z) and 1 − z⁴ = (1 − z²)(1 + z²),

F*_3(z) = ((1 + z²)/((1 − z)³(1 − z²))) [ 1/((1 − z)(1 − z²)(1 − z³)) − z/((1 − z)(1 − z³)(1 − z⁴)) ]
= (1 − z + z²)/((1 − z)⁷ (1 + z)² (1 + z + z²)).

For z → 1,

F*_3(z) ∼ (1/12)(1 − z)^{−7},

so that

|P_n(3)| ∼ (1/12) · n⁶/6!,  n → ∞.    (22)

Using similar recursive partial fractions expansions, with the help of the symbolic computation program MAPLE, we find that for m = 4 and m = 5

F*_4(z) = (z⁸ − 2z⁷ + 3z⁶ + 2z⁵ − 2z⁴ + 2z³ + 3z² − 2z + 1)/((1 − z)¹³ (1 + z)⁵ (1 + z²) (1 + z + z²)²)    (23)
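The closed form just derived for F*_3(z) can be checked by expanding it as a power series; the first coefficients count balanced 3×3 matrices, and the tail matches the asymptotics (22). A sketch (helper names ours):

```python
def poly_mul(p, q):
    """Multiply two polynomials given as ascending coefficient lists."""
    r = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return r

def series_coeffs(num, den, N):
    """Power-series coefficients of num(z)/den(z) up to z^N (den[0] == 1)."""
    c = []
    for n in range(N + 1):
        s = num[n] if n < len(num) else 0
        s -= sum(den[k] * c[n - k] for k in range(1, min(n, len(den) - 1) + 1))
        c.append(s)
    return c

# Build the denominator (1-z)^7 (1+z)^2 (1+z+z^2) of F3* by convolution.
den = [1]
for poly, mult in (([1, -1], 7), ([1, 1], 2), ([1, 1, 1], 1)):
    for _ in range(mult):
        den = poly_mul(den, poly)
coeffs = series_coeffs([1, -1, 1], den, 400)
print(coeffs[:5])                      # → [1, 3, 9, 21, 45]
print(coeffs[400] * 8640 / 400 ** 6)   # tends to 1, since 12 * 6! = 8640
```

The leading values 1, 3, 9, 21, 45 agree with direct enumeration of balanced 3×3 matrices of total sum 0, ..., 4.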

and

F*_5(z) = Q(z)/((1 − z)²¹ (1 + z)⁸ (1 + z²)² (1 + z + z²)⁴ (1 + z + z² + z³ + z⁴)),    (24)

where

Q(z) = z²⁰ − 3z¹⁹ + 7z¹⁸ + 3z¹⁷ + 2z¹⁶ + 17z¹⁵ + 35z¹⁴ + 29z¹³ + 45z¹² + 50z¹¹ + 72z¹⁰ + 50z⁹ + 45z⁸ + 29z⁷ + 35z⁶ + 17z⁵ + 2z⁴ + 3z³ + 7z² − 3z + 1.

By expanding (23) and (24) near z = 1 we conclude that, as n → ∞,

|P_n(4)| ∼ (1/96) · n¹²/12!,   |P_n(5)| ∼ (37/34560) · n²⁰/20!.    (25)

These results show that it is unlikely that a simple formula can be found for F*_m(z) for general m. It is easy to show inductively that at z = 1, F*_m(z) has a pole of order m² − m + 1, and that its other singularities are poles at roots of unity of order < m² − m + 1. These poles and their orders are given in Table 2.

Tab. 2: Poles and their orders for various m.

m \ root | 1 | −1 | e^{±2πi/3} | ±i | e^{±2πi/5} | e^{±4πi/5}
2 | 3 | 1 | – | – | – | –
3 | 7 | 2 | 1 | – | – | –
4 | 13 | 5 | 2 | 1 | – | –
5 | 21 | 8 | 4 | 2 | 1 | 1

Thus, for n → ∞, we have

|P_n(m)| ∼ d(m) n^{m²−m}/(m² − m)!,    (26)

where d(m) = lim_{z→1} (1 − z)^{m²−m+1} F*_m(z). However, there seems to be no simple formula for the sequence of constants d(m). We proceed to characterize d(m) as an (m−1)-fold integral. First consider the simple case m = 2. Setting A = e^{iΦ} and using a Cauchy integral, we have

[A^0] (1/((1 − z/A)(1 − Az))) = (1/2π) ∫_{−π}^{π} dΦ/(1 − 2z cos Φ + z²).
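Since the non-(1−z) factors of each F*_m(z) do not vanish at z = 1, the constants d(m) = lim_{z→1}(1−z)^{m²−m+1} F*_m(z) are obtained by simply evaluating the numerator and those remaining factors at z = 1. The sketch below (our helper `limit_constant`) recovers the constants behind (10)–(13) exactly:

```python
from fractions import Fraction

def limit_constant(num, factors):
    """Evaluate num(1) / prod(poly(1)^mult), i.e. the limit of
    (1-z)^(pole order) * F*_m(z) as z -> 1, in exact arithmetic."""
    num_at_1 = sum(Fraction(c) for c in num)
    den_at_1 = Fraction(1)
    for poly, mult in factors:
        den_at_1 *= sum(Fraction(c) for c in poly) ** mult
    return num_at_1 / den_at_1

d2 = limit_constant([1], [([1, 1], 1)])                          # from F2*
d3 = limit_constant([1, -1, 1], [([1, 1], 2), ([1, 1, 1], 1)])   # from F3*
d4 = limit_constant([1, -2, 3, 2, -2, 2, 3, -2, 1],
                    [([1, 1], 5), ([1, 0, 1], 1), ([1, 1, 1], 2)])
Q = [1, -3, 7, 3, 2, 17, 35, 29, 45, 50, 72, 50, 45, 29, 35, 17, 2, 3, 7, -3, 1]
d5 = limit_constant(Q, [([1, 1], 8), ([1, 0, 1], 2),
                        ([1, 1, 1], 4), ([1, 1, 1, 1, 1], 1)])
print(d2, d3, d4, d5)  # → 1/2 1/12 1/96 37/34560
```

Dividing d(m) by (m² − m)! reproduces the "constant in (7)" column of Table 1, e.g. d(3)/6! = 1/8640 ≈ 1.1574·10^{−4}.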

Now set z = 1 − δ and expand the integral for z → 1 (δ → 0). The major contribution comes from Φ ≈ 0; scaling Φ = δφ and using the Taylor expansion

1 − 2(1 − δ) cos(δφ) + (1 − δ)² = δ²[1 + φ²] + O(δ³),

we find that

F*_2(z) ∼ (1/δ²) (1/2π) ∫_{−∞}^{∞} (δ dφ)/(δ²[1 + φ²]) = (1/2) · (1/δ³),  δ → 0,

which agrees with the exact generating function F*_2(z) = (1 + z)^{−1}(1 − z)^{−3}.

When m = 3, we use (17) and the Cauchy integral formula with A = e^{iΦ} and B = e^{iΨ} to get

(1/(2π)²) ∫_{−π}^{π} ∫_{−π}^{π} (1/(1 − 2z cos Φ + z²)) (1/(1 − 2z cos Ψ + z²)) (1/(1 − 2z cos(Φ − Ψ) + z²)) dΦ dΨ.

Again expanding the above for z = 1 − δ → 1, with Φ = δφ = O(δ) and Ψ = δψ = O(δ), we obtain the leading order approximation (1/δ⁴)(1/(2π)²) ∫∫ dφ dψ/([1 + φ²][1 + ψ²][1 + (φ − ψ)²]) for this double integral. Thus, restoring the factor (1 − z)^{−3}, as z → 1,

F*_3(z) ∼ (1/δ⁷) (1/(2π)²) ∫_{−∞}^{∞} ∫_{−∞}^{∞} (1/(1 + φ²)) (1/(1 + ψ²)) (1/(1 + (φ − ψ)²)) dφ dψ = (1/12) · (1/δ⁷),    (27)

which also follows from the exact generating function.

For general m a completely analogous calculation shows that, as δ = 1 − z → 0, F*_m(z) ∼ d(m) δ^{−(m²−m+1)}, where

d(m) = (1/(2π)^{m−1}) ∫_{−∞}^{∞} ··· ∫_{−∞}^{∞} ∏_{j=1}^{m−1} 1/(1 + φ_j²) · ∏_{k≠ℓ} 1/(1 + (φ_k − φ_ℓ)²) dφ_1 dφ_2 ··· dφ_{m−1},    (28)

and the integral is (m−1)-fold. The second product in the above is over all distinct pairs (k, ℓ), so that it may also be written as

∏_{ℓ=1}^{m−2} ∏_{k=ℓ+1}^{m−1} 1/(1 + (φ_k − φ_ℓ)²).

We have verified that for m = 4 and m = 5 the integral agrees with our previous results. This completes the proof of part (i) of Theorem 1.

We now use the saddle point method to prove part (ii) of Theorem 1. Since

∑_{k_ij ≥ 0} (z z_i/z_j)^{k_ij} = (1 − z z_i/z_j)^{−1},

and setting z_i = e^{iθ_i}, we find that

F*_m(z) = (1/(2iπ)^m) ∮ ··· ∮ ∏_{i,j} (1 − z z_i/z_j)^{−1} dz_1/z_1 ··· dz_m/z_m    (29)
= (2π)^{−m} ∫_{−π}^{π} ··· ∫_{−π}^{π} ∏_{i,j} (1 − z exp(i(θ_i − θ_j)))^{−1} dθ_1 ··· dθ_m.    (30)
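The integral representation (28) can be checked numerically. Substituting φ = tan(u) turns each Cauchy weight dφ/(1 + φ²) into a flat weight du on (−π/2, π/2), leaving a bounded integrand, after which a plain midpoint rule converges well. A sketch for m = 3 (grid size and tolerance are arbitrary choices of ours):

```python
from math import tan, pi

def d3_quadrature(n=1200):
    """Midpoint-rule evaluation of the (m-1)-fold integral (28) for m = 3,
    after the substitutions phi = tan(u), psi = tan(v)."""
    h = pi / n
    tans = [tan(-pi / 2 + (i + 0.5) * h) for i in range(n)]
    total = 0.0
    for tu in tans:
        total += sum(1.0 / (1.0 + (tu - tv) ** 2) for tv in tans)
    return total * h * h / (2 * pi) ** 2

print(d3_quadrature())  # close to d(3) = 1/12 ≈ 0.083333
```

The same substitution works for larger m, though the cost grows like n^{m−1} grid points.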

By noticing that the expression ∏_{i,j}(1 − z exp(i(θ_i − θ_j)))^{−1} does not change when the θ_i are all incremented by the same value, one can integrate over θ_1 to obtain

F*_m(z) = (2π)^{−m+1} ∫_{−π}^{π} ··· ∫_{−π}^{π} (1/(1 − z)) ∏_{i>1} (1 − z exp(iθ_i))^{−1} (1 − z exp(−iθ_i))^{−1} ∏_{i>1, j>1} (1 − z exp(i(θ_i − θ_j)))^{−1} dθ_2 ··· dθ_m.    (31)

Let now

L(z, θ_2, ..., θ_m) = log(1 − z) + ∑_{i>1} log (1 − z exp(iθ_i))(1 − z exp(−iθ_i)) + ∑_{i>1, j>1} log(1 − z exp(i(θ_i − θ_j))).

An alternative form of the above is

L(z, θ_2, ..., θ_m) = log(1 − z) + ∑_{i=2}^{m} log(1 − 2z cos θ_i + z²) + (1/2) ∑_{i=2}^{m} ∑_{j=2}^{m} log(1 − 2z cos(θ_i − θ_j) + z²).

Notice that L(z, 0, ..., 0) = m² log(1 − z). Hence

|F_n(m)| = (1/(i(2π)^m)) ∮ ∫_{−π}^{π} ··· ∫_{−π}^{π} exp(−L(z, θ_2, ..., θ_m)) dz dθ_2 ··· dθ_m / z^{n+1}.    (32)

In order to find the asymptotics of this integral we use the multidimensional saddle point method [18]. The quantity L(z, θ_2, ..., θ_m) + n log z attains its minimum value at (θ_2, ..., θ_m) = (0, ..., 0) and z = z_n = n/(m² + n). The minimum value is therefore

m² log(1 − z_n) + n log z_n = m² log(m²/(m² + n)) + n log(n/(m² + n)),

or

m² log(m²) + n log(n) − (m² + n) log(m² + n).

Then

m² log(1 − z_n) + n log z_n = m² log m² − m² log n − m² + O(m⁴/n),

provided that m² = o(n). After computations it turns out that at the point (z, θ_2, ..., θ_m) = (z_n, 0, ..., 0):

∀i: (∂²/∂θ_i²) L(z, θ_2, ..., θ_m) = 2(m − 1) z_n/(1 − z_n)²,
∀i ≠ j: (∂²/∂θ_i ∂θ_j) L(z, θ_2, ..., θ_m) = −2 z_n/(1 − z_n)²,  m ≥ 3,
∀i: (∂²/∂z ∂θ_i) L(z, θ_2, ..., θ_m) = 0,
(∂²/∂z²) L(z, θ_2, ..., θ_m) = −m²/(1 − z_n)².
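The saddle point data above admit a quick numerical check (a sketch with our own helper names): z_n = n/(m² + n) is indeed a critical point of m² log(1 − z) + n log z, and the second-derivative matrix assembled from the entries just listed, with the n log z term included, has determinant −(m²/(1 − z_n)² + n/z_n²)(2z_n/(1 − z_n)²)^{m−1} m^{m−2}, which is the factorization used in the saddle point evaluation:

```python
from math import log

def exponent(z, m, n):
    # L(z, 0, ..., 0) + n log z = m^2 log(1 - z) + n log z
    return m * m * log(1 - z) + n * log(z)

def det_hessian(m, n):
    """Determinant of the (m x m) second-derivative matrix of L + n log z
    at (z_n, 0, ..., 0), built from the entries listed above and computed
    by plain Gaussian elimination with partial pivoting."""
    z = n / (m * m + n)
    a = 2 * z / (1 - z) ** 2
    H = [[0.0] * m for _ in range(m)]   # variables (z, theta_2, ..., theta_m)
    H[0][0] = -(m * m / (1 - z) ** 2 + n / z ** 2)
    for i in range(1, m):
        for j in range(1, m):
            H[i][j] = (m - 1) * a if i == j else -a
    det, M = 1.0, [row[:] for row in H]
    for c in range(m):
        p = max(range(c, m), key=lambda r: abs(M[r][c]))
        if p != c:
            M[c], M[p] = M[p], M[c]
            det = -det
        det *= M[c][c]
        for r in range(c + 1, m):
            f = M[r][c] / M[c][c]
            for k in range(c, m):
                M[r][k] -= f * M[c][k]
    return det, z, a

m, n = 3, 1000
zn = n / (m * m + n)
eps = 1e-7
deriv = (exponent(zn + eps, m, n) - exponent(zn - eps, m, n)) / (2 * eps)
det, z, a = det_hessian(m, n)
closed = -(m * m / (1 - z) ** 2 + n / z ** 2) * a ** (m - 1) * m ** (m - 2)
print(deriv, det, closed)
```

The numerical derivative vanishes at z_n, and the eliminated determinant matches the closed product for every m and n tried.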

In the above and below, ⊗ denotes the tensor product (in our case, the product of two vectors resulting in a matrix). The second derivative matrix Q_2 of L(z, θ_2, ..., θ_m) + n log z at (z, θ_2, ..., θ_m) = (z_n, 0, ..., 0) is therefore

Q_2 = −(m²/(1 − z_n)² + n/(z_n)²) u_z ⊗ u_z + (2m z_n/(1 − z_n)²) I_θ − (2(m − 1) z_n/(1 − z_n)²) u_θ ⊗ u_θ,

where u_z = (1, 0, ..., 0), u_θ = (1/√(m − 1)) (0, 1, ..., 1), and I_θ = I − u_z ⊗ u_z, that is, the identity restricted to the θ components. For example,

u_θ ⊗ u_θ = (1/(m − 1)) ×
[ 0 0 ··· 0 ]
[ 0 1 ··· 1 ]
[ ⋮ ⋮ ⋱ ⋮ ]
[ 0 1 ··· 1 ].

An application of the saddle point method yields

|F_n(m)| ∼ (1/((2π)^{m/2} z_n √|det(Q_2)|)) exp(−m² log(1 − z_n) − n log z_n),

where det(·) denotes the determinant of a matrix. Since

|det(Q_2)| = (m²/(1 − z_n)² + n/(z_n)²) · (2 z_n/(1 − z_n)²)^{m−1} · m^{m−2} ∼ n^{2m} m^{−3m} 2^{m−1},

we find that for m⁴ = o(n)

|P_n(m)| ∼ |F_n(m)| ∼ m^{−2m²+3m/2} √2 e^{m²} 2^{−m} π^{−m/2} n^{m²−m}.

The condition m⁴ = o(n) is needed since we used the approximation for m² log(1 − z_n) below (32), and this completes the proof.

References

[1] K. Atteson, The Asymptotic Redundancy of Bayes Rules for Markov Chains, IEEE Trans. Information Theory, 45, 2104-2109, 1999.
[2] A. Barron, J. Rissanen, and B. Yu, The Minimum Description Length Principle in Coding and Modeling, IEEE Trans. Information Theory, 44, 2743-2760, 1998.
[3] P. Billingsley, Statistical Methods in Markov Chains, Ann. Math. Statistics, 32, 12-40, 1961.
[4] T. Cover and J. Thomas, Elements of Information Theory, John Wiley & Sons, New York, 1991.

[5] I. Csiszár, The Method of Types, IEEE Trans. Information Theory, 44, 2505-2523, 1998.
[6] I. Csiszár and J. Körner, Information Theory: Coding Theorems for Discrete Memoryless Systems, Academic Press, New York, 1981.
[7] L. Davisson, Universal Noiseless Coding, IEEE Trans. Information Theory, 19, 783-795, 1973.
[8] L. Davisson, Minimax Noiseless Universal Coding for Markov Sources, IEEE Trans. Information Theory, 29, 211-215, 1983.
[9] M. Drmota and W. Szpankowski, Precise Minimax Redundancy and Regret, IEEE Trans. Information Theory, 50, 2686-2707, 2004.
[10] P. Flajolet and R. Sedgewick, Analytic Combinatorics, Cambridge University Press, Cambridge, 2008.
[11] P. Jacquet and W. Szpankowski, Markov Types and Minimax Redundancy for Markov Sources, IEEE Trans. Information Theory, 50, 1393-1402, 2004.
[12] A. Martín, G. Seroussi, and M. J. Weinberger, Type Classes of Tree Models, Proc. ISIT 2007, Nice, France, 2007.
[13] J. Rissanen, Fisher Information and Stochastic Complexity, IEEE Trans. Information Theory, 42, 40-47, 1996.
[14] G. Seroussi, On Universal Types, IEEE Trans. Information Theory, 52, 171-189, 2006.
[15] P. Shields, Universal Redundancy Rates Do Not Exist, IEEE Trans. Information Theory, 39, 520-524, 1993.
[16] R. Stanley, Enumerative Combinatorics, Vol. II, Cambridge University Press, Cambridge, 1999.
[17] W. Szpankowski, On Asymptotics of Certain Recurrences Arising in Universal Coding, Problems of Information Transmission, 34, 55-61, 1998.
[18] W. Szpankowski, Average Case Analysis of Algorithms on Sequences, Wiley, New York, 2001.
[19] Q. Xie and A. Barron, Asymptotic Minimax Regret for Data Compression, Gambling, and Prediction, IEEE Trans. Information Theory, 46, 431-445, 2000.
[20] M. Weinberger, N. Merhav, and M. Feder, Optimal Sequential Probability Assignment for Individual Sequences, IEEE Trans. Information Theory, 40, 384-396, 1994.
[21] P. Whittle, Some Distribution and Moment Formulæ for Markov Chain, J. Roy. Stat. Soc., Ser. B, 17, 235-242, 1955.
