Mixing
Springer-Verlag
New York Berlin Heidelberg London Paris
Tokyo Hong Kong Barcelona Budapest
Paul Doukhan
Department of Economy
University of Cergy-Pontoise
33, Bd. du Port
95011 Cergy-Pontoise
France
All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher (Springer-Verlag New York, Inc., 175 Fifth Avenue, New York, NY 10010, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden.
The use of general descriptive names, trade names, trademarks, etc., in this publication, even if the former are not especially identified, is not to be taken as a sign that such names, as understood by the Trade Marks and Merchandise Marks Act, may accordingly be used freely by anyone.
987654321
to Geraldine
Introduction
These notes are devoted to the study of mixing theory. The underlying goal is to provide statisticians dealing with problems involving weak dependence with a powerful and easy-to-use tool. Up to now, this approach to dependence has been mainly considered from an abstract point of view. For an excellent review on this subject we refer to:
The aim of this work is to study applications of these results. We obtain bounds for the decay of the mixing coefficient sequences associated with random processes or random fields which are actually used in Statistics. In some cases we give counterexamples which show that some frequently held ideas are wrong. Indeed, it would be of little interest to study a probabilistic technique with no field of application.
This work is divided into two parts. In the first part we focus on the definitions and probabilistic properties of mixing theory. The second part describes the mixing properties of classical processes and random fields. Let us describe the contents of these two parts in more detail.
PART 1. Strong mixing is the weakest mixing condition used in statistics, even though it is much stronger than the mixing conditions used in ergodic theory (2). We shall essentially focus on the strong mixing properties of processes and fields, and only note some differences between the various properties. Note in particular that mixing for random processes and mixing for random fields are very distinct notions. The corresponding conditions are classical in the case of processes. For the sake of simplicity, we adopt a single definition of a mixing random field; the notations are very close, and the reader should take care to distinguish the mixing coefficient sequences associated with random fields from those associated with random processes. We do not intend to give very extensive results or bibliography, which may be found in the previously cited collective work. The Doctoral Dissertation of A.V. Bulinskii (3) and the book by B. Nahapetian (4) deal mainly with the case of random fields. X. Guyon (1992) presents the main statistical aspects of the theory of random fields. The other main basic references are included in the Bibliography at the end of this volume.
PART 2. The second part presents a review of examples. It also generalizes various known results in order to unify them. For instance, mixing properties of new models in financial mathematics and of linear or Gaussian fields are studied here in detail. Probabilistic results may also require a certain knowledge of the decay rate of the mixing sequences. Thus the results presented determine the rate of decay of the mixing sequences associated with the classical models, in an optimal way in some cases. We show that most of the processes or fields used in Statistics that satisfy some stability or ergodicity property are strongly mixing. The classes of processes investigated are Gaussian Random Fields, Gibbs Markov Fields and General Linear Random Fields. The properties of discrete time Markov Processes yield the properties of ARMA, Bilinear, ARCH and GARCH processes, as well as their non-linear versions. In order to investigate Continuous Time Processes we recall topics concerning the hypermixing property.
This study is mainly self-contained but for the basic notions of Probability Theory and Mathematical Analysis. For the sake of brevity we do not present proofs of all the results. Rather, we only give them when they are easy or when they correspond to fundamental results, in order to introduce the reader to some of the subtleties of mixing theory. We also include the proofs of results which are not well known. We intend to give an introductory exposition of mixing techniques, with references that point the reader to significant extensions as well as to the proofs of the results included here.
Results are numbered linearly inside each chapter; thus Proposition 4 in chapter 1.5 will be referred to as Proposition 1.5.4 in another chapter and simply as Proposition 4 inside chapter 1.5. Bibliographical comments at the end of each chapter indicate developments as well as the sources of previous results. The final bibliography indicates the relationship between the references and the related chapters of this volume. An index and a list of notations may be useful tools for the reader. The end of a proof is identified by the symbol ∎.
I want to express my gratitude to all those who helped me while I was writing this work, especially Xavier Guyon, Abdelkader Mokkadem and Emmanuel Rio. Their results are widely developed in these notes. Many other colleagues and friends helped me, providing for instance adequate references or performing calculations included here, such as Denis Bosq, Jean Bretagnolle, Alexander Bulinskii, Hans Föllmer, Ildar Ibragimov, José Rafael León, Joaquín Ortega, Alexander Tsybakov and Sergei Utev. My orthography was essentially smoothed by Joaquín Ortega. Marie-Claude Viano as well as the Referee have really improved the quality of the manuscript by their very attentive reading.
$C_0(\mathbb{R}^d)$: space of real valued continuous functions on $\mathbb{R}^d$, with limit 0 at $\infty$.
$c_X(k; a, b) = \sup\{c_X(A; B);\ A, B \subset T,\ |A| \le a,\ |B| \le b,\ d(A, B) \ge k\}$: mixing coefficient for random fields.
$\liminf A_n = \bigcup_{n=1}^{\infty} \bigcap_{m=n}^{\infty} A_m$.
$\limsup A_n = \bigcap_{n=1}^{\infty} \bigcup_{m=n}^{\infty} A_m$.
$\mathbb{R}^d$: d-dimensional Euclidean space.
$\mu \otimes \nu$: tensor product of the measures $\mu$ and $\nu$, defined on a product σ-algebra.
$O(a_t)$: a function such that $\limsup_{t \to \infty} |O(a_t)|/a_t < \infty$; Landau's notation.
$o(a_t)$: a function such that $\lim_{t \to \infty} o(a_t)/a_t = 0$; Landau's notation.
$\mathbb{R}$: set of real numbers.
$|\cdot|$: denotes either the standard norm of a normed space or the cardinal number of a finite set.
In this part, we intend to present some of the most important properties of mixing
processes and random fields. For the sake of simplicity we shall omit some proofs as well as
some obvious generalizations. Our aim is to give some important asymptotic results as well as
some useful tools leading to them.
In the first section we present some of the measures of dependence between σ-fields and their immediate properties.
In section 1.2 we present the reconstruction results of Berbee and Bradley for dependent random variables, and we give the fundamental covariance inequalities for dependent σ-fields. Together with the reconstruction results, these are the only available basic tools of mixing techniques.
We are then in a position to define mixing processes and mixing random fields in section 1.3. Mixing random fields may be defined in several distinct ways; only one of them will be presented, because it only involves the metric structure of the index set. We also present some important relations linking the various notions of mixing.
Section 1.4 is devoted to obtaining tools for mixing theory, that is, inequalities for the moments of sums of mixing processes and mixing random fields. We give in detail a mixing analogue of the Rosenthal inequality, and we give without proof other inequalities for moments of sums. We also present exponential inequalities which generalize those of Hoeffding and Bernstein to mixing cases. After that we recall the maximal inequalities given by Billingsley and by Moricz, Serfling and Stout in the case of processes, and by Moricz in the case of random fields, as well as Ottaviani's inequalities.
Coefficient (1) is called the strong mixing or α-mixing coefficient (see (2)).
Coefficient (2) is called the absolute regularity or β-mixing coefficient; it may be rewritten as

(2') $\beta(\mathcal{U}, \mathcal{V}) = \frac{1}{2}\,\sup\,\sum_{i,j} |\mathbb{P}(U_i \cap V_j) - \mathbb{P}(U_i)\,\mathbb{P}(V_j)|,$

where the supremum is taken over all the finite partitions $(U_i)$, $(V_j)$ of $\Omega$ with $U_i \in \mathcal{U}$ and $V_j \in \mathcal{V}$. Write $\mathbb{P}_{\mathcal{U}}$ for the restriction of $\mathbb{P}$ to the σ-algebra $\mathcal{U}$. This relation may also be written as $\beta(\mathcal{U}, \mathcal{V}) = \|\mathbb{P}_{\mathcal{U} \otimes \mathcal{V}} - \mathbb{P}_{\mathcal{U}} \otimes \mathbb{P}_{\mathcal{V}}\|_{\mathrm{Var}}$, and thus $\beta(\mathcal{U}, \mathcal{V})$ is also the supremum of $\int W\, [d\mathbb{P}_{\mathcal{U} \otimes \mathcal{V}} - d(\mathbb{P}_{\mathcal{U}} \otimes \mathbb{P}_{\mathcal{V}})]$ over random variables $W$ defined on the product probability space $(\Omega \times \Omega, \mathcal{U} \otimes \mathcal{V})$ with $0 \le W \le 1$.
The coefficient (3) is the uniform mixing or φ-mixing coefficient. One can prove that

(3') $\varphi(\mathcal{U}, \mathcal{V}) = \operatorname{ess\,sup}\{|\mathbb{P}(V \mid \mathcal{U}) - \mathbb{P}(V)|;\ V \in \mathcal{V}\}.$

The coefficient (4) is the *-mixing or ψ-mixing coefficient. It may be shown that

(4') $\psi(\mathcal{U}, \mathcal{V}) = \operatorname{ess\,sup}\Big\{\frac{1}{\mathbb{P}(V)}\,|\mathbb{P}(V \mid \mathcal{U}) - \mathbb{P}(V)|;\ V \in \mathcal{V},\ \mathbb{P}(V) \neq 0\Big\}.$
Remark 1. The ranges of the previous dependence coefficients are given by the following inequalities:

$0 \le \alpha(\mathcal{U}, \mathcal{V}) \le \tfrac14, \quad 0 \le \varphi(\mathcal{U}, \mathcal{V}) \le 1, \quad 0 \le \psi(\mathcal{U}, \mathcal{V}) \le \infty, \quad 0 \le \rho(\mathcal{U}, \mathcal{V}) \le 1.$

Moreover, if one of them vanishes then the related σ-fields $\mathcal{U}$ and $\mathcal{V}$ are independent.
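As an illustration (not from the text), the four coefficients can be evaluated exactly on a small finite example by direct enumeration; the probability space, the two partitions and all numbers below are hypothetical choices. The computation also checks the classical comparisons $2\alpha \le \beta \le \varphi$.

```python
from itertools import combinations

# Hypothetical example: Omega = {0,1,2,3}; the sigma-field U is generated
# by the partition {{0,1},{2,3}}, and V by {{0,2},{1,3}}.
p = {0: 0.4, 1: 0.1, 2: 0.1, 3: 0.4}
U_part = [{0, 1}, {2, 3}]
V_part = [{0, 2}, {1, 3}]

def prob(event):
    return sum(p[w] for w in event)

def events(partition):
    # all events of the sigma-field generated by a finite partition
    out = []
    for r in range(len(partition) + 1):
        for comb in combinations(partition, r):
            out.append(set().union(*comb) if comb else set())
    return out

# (1)  alpha = sup |P(U n V) - P(U)P(V)|
alpha = max(abs(prob(u & v) - prob(u) * prob(v))
            for u in events(U_part) for v in events(V_part))

# (2') beta: the sup over partitions is attained at the finest (generating) one
beta = 0.5 * sum(abs(prob(u & v) - prob(u) * prob(v))
                 for u in U_part for v in V_part)

# (3') phi = sup |P(V | U) - P(V)|, over U with P(U) > 0
phi = max(abs(prob(u & v) / prob(u) - prob(v))
          for u in events(U_part) if prob(u) > 0
          for v in events(V_part))

# (4') psi = sup |P(V | U) - P(V)| / P(V), over P(U), P(V) > 0
psi = max(abs(prob(u & v) / prob(u) - prob(v)) / prob(v)
          for u in events(U_part) if prob(u) > 0
          for v in events(V_part) if prob(v) > 0)
```

On this example the enumeration gives $\alpha = 0.15$, $\beta = 0.3$, $\varphi = 0.3$ and $\psi = 0.6$, consistent with the ranges of Remark 1.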
The inequality $\rho(\mathcal{U}, \mathcal{V}) \le 2\,\varphi^{1/2}(\mathcal{U}, \mathcal{V})$ is a consequence of Theorem 3-(3), § 1.2.2, for the case $p = q = 2$. The symmetric inequality $\rho(\mathcal{U}, \mathcal{V}) \le 2\,\varphi^{1/2}(\mathcal{U}, \mathcal{V})\,\varphi^{1/2}(\mathcal{V}, \mathcal{U})$ is proved using the same arguments (see Peligrad (1983)). Let $X = \sum_{i=1}^{I} x_i\,\mathbb{1}_{U_i}$ and $Y = \sum_{j=1}^{J} y_j\,\mathbb{1}_{V_j}$ be simple $\mathcal{U}$- and $\mathcal{V}$-measurable random variables; Schwarz's inequality yields

$|\mathrm{Cov}(X, Y)|^2 \le \Big[\sum_i x_i^2\,\mathbb{P}(U_i)\Big]\Big[\sum_i \mathbb{P}(U_i)\Big\{\sum_j |y_j|\,|\mathbb{P}(V_j \mid U_i) - \mathbb{P}(V_j)|\Big\}^2\Big]$

$\le \mathbb{E}X^2\,\Big[\sum_i \mathbb{P}(U_i)\Big(\sum_j y_j^2\,|\mathbb{P}(V_j \mid U_i) - \mathbb{P}(V_j)|\Big)\Big(\sum_j |\mathbb{P}(V_j \mid U_i) - \mathbb{P}(V_j)|\Big)\Big]$

$\le \mathbb{E}X^2\,\mathbb{E}Y^2\,\max_i \sum_j |\mathbb{P}(V_j \mid U_i) - \mathbb{P}(V_j)|\ \max_j \sum_i |\mathbb{P}(U_i \mid V_j) - \mathbb{P}(U_i)|.$

Let $A_i$ (resp. $B_i$) be the union of those $V_j$ for which $\mathbb{P}(V_j \mid U_i) - \mathbb{P}(V_j) \ge 0$ (resp. $< 0$); then

$\sum_j |\mathbb{P}(V_j \mid U_i) - \mathbb{P}(V_j)| = |\mathbb{P}(A_i \mid U_i) - \mathbb{P}(A_i)| + |\mathbb{P}(B_i \mid U_i) - \mathbb{P}(B_i)| \le 2\,\varphi(\mathcal{U}, \mathcal{V}).$

A symmetry argument completes the proof. ∎
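This proof can also be checked numerically. When each σ-field is generated by a single set, every square-integrable measurable variable is an affine function of the corresponding indicator, so ρ reduces to the absolute correlation of two indicators; the sketch below (on a hypothetical four-point space) verifies Peligrad's bound $\rho \le 2\,\varphi^{1/2}(\mathcal{U}, \mathcal{V})\,\varphi^{1/2}(\mathcal{V}, \mathcal{U})$ together with the elementary inequality $4\alpha \le \rho$.

```python
# Hypothetical four-point space; U and V generate the two sigma-fields.
p = {0: 0.4, 1: 0.1, 2: 0.1, 3: 0.4}
U, Uc, V, Vc = {0, 1}, {2, 3}, {0, 2}, {1, 3}

def prob(e):
    return sum(p[w] for w in e)

cov = prob(U & V) - prob(U) * prob(V)
alpha = abs(cov)  # the sup defining alpha is attained at the generators here
# rho = |Corr(1_U, 1_V)|: each sigma-field has a single generating set
rho = abs(cov) / (prob(U) * prob(Uc) * prob(V) * prob(Vc)) ** 0.5
# phi in both directions, maximized over the conditioning atoms
phi_uv = max(abs(prob(u & v) / prob(u) - prob(v))
             for u in (U, Uc) for v in (V, Vc))
phi_vu = max(abs(prob(u & v) / prob(v) - prob(u))
             for u in (U, Uc) for v in (V, Vc))
```

On this example $\rho = 0.6$ and both bounds hold with equality, showing they cannot be improved in general.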
We also recall the following result in Bradley (1986); part (b) is due to Csáki & Fischer (1963).

Theorem 1. Let $\mathcal{U}_n$ and $\mathcal{V}_n$ denote two sequences of σ-fields such that the $(\mathcal{U}_n \vee \mathcal{V}_n)_{n \ge 1}$ are independent; then

(a) $c\big(\bigvee_n \mathcal{U}_n,\ \bigvee_n \mathcal{V}_n\big) \le \sum_{n=1}^{\infty} c(\mathcal{U}_n, \mathcal{V}_n)$, if $c = \alpha$;

(b) $c\big(\bigvee_n \mathcal{U}_n,\ \bigvee_n \mathcal{V}_n\big) = \sup_n c(\mathcal{U}_n, \mathcal{V}_n)$, if $c = \rho$.
This theorem may be used in order to symmetrize mixing random variables. Let c denote the dependence coefficient between the σ-fields generated by random variables X and Y. It is usual to consider an independent copy (X′, Y′) of the random variable (X, Y) in order to use Paul Lévy's symmetrization inequality relating the tails of X and X − X′. The previous result relates the dependence coefficients associated with the σ-algebras σ(X), σ(Y) to those associated with the couples of σ-algebras σ(X, X′), σ(Y, Y′) or σ(X − X′) and σ(Y − Y′).
Many other dependence coefficients have been introduced. For instance, consider for a, b ≥ 0

(6) $\alpha_{a,b}(\mathcal{U}, \mathcal{V}) = \sup\Big\{\frac{|\mathbb{P}(U)\,\mathbb{P}(V) - \mathbb{P}(U \cap V)|}{\mathbb{P}^a(U)\,\mathbb{P}^b(V)};\ U \in \mathcal{U},\ V \in \mathcal{V},\ \mathbb{P}(U)\,\mathbb{P}(V) \neq 0\Big\}.$

In the sequel $(\Omega, \mathcal{A}, \mathbb{P})$ will always denote the underlying probability space.
The coefficient α was introduced by Rosenblatt (1956). The β-mixing coefficient, introduced by Kolmogorov, first appeared in the paper by Wolkonski & Rozanov (1959). Ibragimov (1962) introduced the coefficient φ; see also Ibragimov & Rozanov (1978). Blum, Hanson & Koopmans (1963) introduced the *-mixing coefficient. Hirschfeld (1935) and Gebelein (1941) introduced the coefficient ρ, and Kolmogorov & Rozanov (1960) defined the corresponding dependence condition. Moreover Bradley (1983, 1985, 1987), Peligrad (1983) and Bulinskii (1984, 1987, 1989) introduced various related measures of dependence. Among them, let us also mention the Information Regularity Coefficients introduced by Wolkonski & Rozanov (1959) and related to the classical measure of entropy. Statuljavichus (1983) describes an almost Markov regularity coefficient, a variant of which is later used extensively in Veijanen (1989). Further information may be found in Bradley (1986) and Bulinskii (1984 & 1989 b). Bradley & Bryc (1985) and Bulinskii (1987) define $\alpha_{a,b}$.
This part presents the basic tools concerning weakly dependent a-fields. These tools are
the reconstruction techniques and the covariance inequalities. They are summed up in the two
forthcoming subsections.
Let E and F be two Polish spaces and (X, Y) some E×F-valued random variable. We shall set $\beta = \beta(\sigma(X), \sigma(Y))$ and $\alpha = \alpha(\sigma(X), \sigma(Y))$ for the mixing coefficients relative to the σ-algebras generated by X and Y. We assume throughout this section that the probability space on which X and Y are defined is rich enough to define another random variable with uniform distribution on the interval [0, 1] and independent of X and Y.

Theorem 1. A random variable Y* can be defined with the same probability distribution as Y, independent of X and such that $\mathbb{P}(Y \neq Y^*) = \beta$. For some measurable function f on E×F×[0, 1] and some uniform random variable Δ on the interval, Y* takes the form Y* = f(X, Y, Δ).
Lemma 1. Let T be a finite set with cardinality N and let λ and μ be two probability measures on T. There exists a probability distribution ν on T×T with marginals λ and μ such that $\nu(\{(t, t)\}) = \lambda(\{t\}) \wedge \mu(\{t\})$.

Proof of Lemma 1. Order the set $T = \{t_1, \dots, t_N\}$ in such a way that $\lambda_i \le \mu_i$ for $i = 1, \dots, k$ and $\lambda_i > \mu_i$ for $i = k+1, \dots, N$, where $\lambda_i = \lambda(\{t_i\})$ and $\mu_i = \mu(\{t_i\})$ for $i = 1, \dots, N$. The relation $a = \sum_{i=1}^{k} (\mu_i - \lambda_i) = \sum_{i=k+1}^{N} (\lambda_i - \mu_i)$ follows from the fact that the total mass of a probability measure is equal to 1. We define the N×N matrix $Q = (q_{ij})$ by setting

$q_{ij} = (\lambda_i - \mu_i)(\mu_j - \lambda_j)/a$, if $j \le k$ and $i > k$,
$q_{ij} = \lambda_i \wedge \mu_i$, if $i = j \le N$,
$q_{ij} = 0$, otherwise.
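Assuming the entries of Q given above, the construction is easy to implement and test: the matrix must have marginals λ and μ, and its off-diagonal mass must equal the defect a, which is the total variation distance between the two measures. All data below are hypothetical.

```python
def coupling_matrix(lam, mu):
    """Maximal coupling of two probability vectors on a finite set,
    following the construction of Lemma 1 (assumes lam != mu)."""
    n = len(lam)
    small = [j for j in range(n) if lam[j] <= mu[j]]
    large = [i for i in range(n) if lam[i] > mu[i]]
    a = sum(mu[j] - lam[j] for j in small)   # total excess mass
    q = [[0.0] * n for _ in range(n)]
    for i in range(n):
        q[i][i] = min(lam[i], mu[i])         # nu({(t, t)}) = lam ^ mu
    for i in large:
        for j in small:
            q[i][j] = (lam[i] - mu[i]) * (mu[j] - lam[j]) / a
    return q

lam = [0.1, 0.2, 0.3, 0.4]
mu = [0.25, 0.25, 0.25, 0.25]
q = coupling_matrix(lam, mu)
rows = [sum(r) for r in q]                                 # first marginal
cols = [sum(q[i][j] for i in range(4)) for j in range(4)]  # second marginal
off_diag = 1.0 - sum(q[i][i] for i in range(4))            # P(coupled values differ)
```

Here `off_diag` equals the defect a = 0.2, illustrating that the coupling realizes the total variation distance.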
Sketch of the proof of Theorem 1. Let A be a Borel set in E such that $\mathbb{P}(X \in A) > 0$. We first assume that Y is atomic, and let $\mathfrak{B} = \{B_1, \dots, B_N\}$ be the set of Y's atoms. Set $T = \{1, \dots, N\}$, $\lambda_{A,i} = \frac{\mathbb{P}((X \in A) \cap (Y \in B_i))}{\mathbb{P}(X \in A)}$ and $\mu_i = \mathbb{P}(Y \in B_i)$. The probability distribution $\nu_A$ built as in Lemma 1 on $T^2$ satisfies

$c(\mathfrak{B}, A) = \|\lambda_A - \mu\|_{\mathrm{Var}} = \frac12 \sum_{i=1}^{N} \frac{|\mathbb{P}((X \in A) \cap (Y \in B_i)) - \mathbb{P}(X \in A)\,\mathbb{P}(Y \in B_i)|}{\mathbb{P}(X \in A)}.$

Setting $\mathfrak{D}(A \times B_i \times B_j) = \nu_{A,i,j}\,\mathbb{P}(X \in A)$ one obtains a distribution on E×F². Note that $\mathbb{P}((X \in A) \cap (Y \in B_i)) = \mathfrak{D}(A \times B_i \times F)$ and $\mathbb{P}(X \in A)\,\mathbb{P}(Y \in B_i) = \mathfrak{D}(A \times F \times B_i)$. There exists an F-valued random variable Y* such that (X, Y, Y*) has the distribution $\mathfrak{D}$. From Lemma 1, we obtain that $\mathbb{P}(Y \neq Y^* \mid X \in A) = c(\mathfrak{B}, A)$, hence

$\mathbb{P}(Y \neq Y^*) = \sup \sum_k c(\mathfrak{B}, A_k)\,\mathbb{P}(X \in A_k) = \beta.$

The supremum is considered over measurable partitions $\{A_k\}$ of E. To extend the proof to non-atomic random variables Y, choose (1) a sequence $\mathfrak{B}_n$ of finite measurable partitions of F such that
Theorem 2. Let r and q be positive numbers. If Y is a real random variable with moments up to order r, then a random variable Y* can be defined with the same probability distribution as Y, independent of X and such that

$\mathbb{P}(|Y - Y^*| \ge q) \le 18\,\Big(\frac{\alpha^{2r}\,\mathbb{E}|Y|^r}{q^r}\Big)^{1/(2r+1)}.$

For some measurable function f on E×F×[0, 1] and some uniform random variable Δ on the interval, Y* takes the form Y* = f(X, Y, Δ).

We shall only sketch the proof of this result, which may be found in Bradley (1983 b). The following lemma gives a sharp inverse bound relating β and α when Y is a discrete random variable.

1 Proceed as in Bryc (1982). From the tightness of Y's distribution choose a compact set $K_n$ such that $\mathbb{P}(Y \notin K_n) \le n^{-1}$. Compactness allows to determine elements $(x_{i,n})$ of $K_n$ such that the $n^{-1}$-balls centered at those points form a finite covering of $K_n$. Let $\mathfrak{B}_n$ denote the union of such balls; then $\lim_n \mathbb{E}(Y \mid \mathfrak{B}_n) = Y$ in probability. Some subsequence of $\mathfrak{B}_n$ is thus convenient.
Lemma 2. If the probability distribution of Y is atomic with N atoms, then $\beta \le \sqrt{8N\alpha}$.

Proof of Lemma 2. Szarek (1976)'s bound in the Khinchin inequality leads Bradley (1983 b) to prove that from a real valued matrix $A = (a_{ij})_{1 \le i \le M,\ 1 \le j \le N}$ it is possible to extract subsets $S \subset \{1, \dots, M\}$ and $T \subset \{1, \dots, N\}$ such that $\big|\sum_{S \times T} a_{ij}\big| \ge \frac{1}{3\sqrt{MN}} \sum_{i,j} |a_{ij}|$. Let now $B_1, \dots, B_N$
Proof of Theorem 2. Using Lemma 2 and Theorem 1, Bradley proves that if $H_1, \dots, H_N$ is a measurable partition of the support of Y's probability distribution in $\mathbb{R}$ with $\mathbb{P}(Y \in H_i) > 0$, then it is possible to construct Y* with the required properties and such that $\mathbb{P}(\nexists\, i:\ Y, Y^* \in H_i) \le \sqrt{8N\alpha}$. For this, use the discrete random variable $Y_1 = \sum_{i=1}^{N} y_i\,\mathbb{1}_{\{Y \in H_i\}}$, where $y_i$ is chosen in $H_i$. Let now a be some positive real number, m some integer and $N = 2m + 3$. Set $H_i = \big[\frac{a}{2} + (i-1)a,\ \frac{a}{2} + ia\big[$ for $|i| \le m$, $H_{-m-1} = \big]{-\infty},\ -\frac{a}{2} - ma\big[$ and $H_{m+1} = \big[\frac{a}{2} + ma,\ +\infty\big[$. Then

$\mathbb{P}(|Y - Y^*| \ge a) \le \sqrt{8N\alpha} + 2\,\mathbb{P}\big(|Y| \ge \tfrac{a}{2} + ma\big).$

Markov's inequality provides a bound for the second term. Choosing the best possible values of a and m yields the result. ∎
Let X and Y be random variables measurable with respect to $\mathcal{U}$ and $\mathcal{V}$ respectively. Recall that we denote $\|X\|_p = (\mathbb{E}|X|^p)^{1/p}$ for $p < \infty$, and $\|X\|_\infty = \operatorname{ess\,sup} |X|$. An essential property of the mixing coefficients defined in § 1.1 is given by the following covariance inequalities.

Theorem 3.
(1) $|\mathrm{Cov}(X, Y)| \le 8\,\alpha^{1/r}(\mathcal{U}, \mathcal{V})\, \|X\|_p\, \|Y\|_q$, for any $p, q, r \ge 1$ with $\frac1p + \frac1q + \frac1r = 1$.
(3) $|\mathrm{Cov}(X, Y)| \le 2\,\varphi^{1/p}(\mathcal{U}, \mathcal{V})\, \|X\|_p\, \|Y\|_q$, for any $p, q \ge 1$ with $\frac1p + \frac1q = 1$.
(4) $|\mathrm{Cov}(X, Y)| \le \psi(\mathcal{U}, \mathcal{V})\, \|X\|_1\, \|Y\|_1$.
(5) $|\mathrm{Cov}(X, Y)| \le \rho(\mathcal{U}, \mathcal{V})\, \|X\|_2\, \|Y\|_2$.
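These inequalities can be checked mechanically on a toy example. The sketch below uses a hypothetical four-point space with σ-fields generated by U = {0, 1} and V = {0, 2}; X and Y are the indicators of the generating sets, the coefficients are computed by enumeration, and (1) is verified with p = q = r = 3, (3) with p = q = 2, and (5) directly.

```python
# Hypothetical 4-point example: atoms 0..3 with the weights below.
p = {0: 0.4, 1: 0.1, 2: 0.1, 3: 0.4}
U, Uc, V, Vc = {0, 1}, {2, 3}, {0, 2}, {1, 3}

def prob(e):
    return sum(p[w] for w in e)

cov = prob(U & V) - prob(U) * prob(V)       # Cov(1_U, 1_V) = 0.15
alpha = abs(cov)                            # sup attained at the generators here
phi = max(abs(prob(u & v) / prob(u) - prob(v))
          for u in (U, Uc) for v in (V, Vc))
rho = abs(cov) / (prob(U) * prob(Uc) * prob(V) * prob(Vc)) ** 0.5

def norm(ind_set, r):
    # L^r norm of an indicator: ||1_A||_r = P(A)^(1/r)
    return prob(ind_set) ** (1.0 / r)

bound1 = 8 * alpha ** (1 / 3) * norm(U, 3) * norm(V, 3)  # (1), p = q = r = 3
bound3 = 2 * phi ** 0.5 * norm(U, 2) * norm(V, 2)        # (3), p = q = 2
bound5 = rho * norm(U, 2) * norm(V, 2)                   # (5)
```

Inequality (5) is sharpest here: the bound 0.3 against a covariance of 0.15, while (1) is the loosest, reflecting the inhomogeneity discussed in Remark 3 below.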
Remark 1. For the sake of homogeneity with § 1.1, we do not give any inequality (2) concerning β-mixing; the power of this notion lies in the reconstruction Theorem 1, § 1.2.1. Inequality (1) is due to Wolkonski & Rozanov (1959) in the case $p = q = \infty$ and to Davydov (1970) in the present form. Davydov (1968) proved it in the weaker form where the constant 8 is replaced by 12. Inequality (3), due to Ibragimov (1962), may be found in Billingsley (1968, p. 170). Inequality (4) is due to Blum, Hanson & Koopmans (1963). Inequality (5) is evident taking into account the definition of ρ.
Remark 2. Almost all of these inequalities are easily extended to separable Hilbert space valued random variables. However, the inhomogeneous mixing inequality (1) is much more complicated to extend in this setting. This was achieved by Dehling (1983), using deep Banach space theory arguments. The factor 8 is then replaced by 15 in inequality (1).
Remark 3. In Bulinskii (1987, 1990), Bulinskii replaces $L^p$-norms in (1) by convenient Luxemburg norms (see (2)). He obtains inequalities in Orlicz spaces. The loss due to the inhomogeneity of inequality (1) is thus reduced as much as possible. That is, let $\Phi \in \mathfrak{F}_p$ and $\Psi \in \mathfrak{F}_q$ for $\frac1p + \frac1q = 1$. Setting $e(t) = t\,\Phi^{-1}(t^{-1})\,\Psi^{-1}(t^{-1})$ for $\Phi^{-1}(t) = \inf\{s \ge 0;\ \Phi(s) \ge t\}$, the following inequality due to Bulinskii (1987) extends inequality (1) (see (3)):

$|\mathrm{Cov}(X, Y)| \le 10\, e(\alpha(\mathcal{U}, \mathcal{V}))\, \|X\|_\Phi\, \|Y\|_\Psi.$

We do not present those results in detail because the constraint of getting better covariance inequalities yields a big loss on the function of the mixing coefficient; see Bulinskii, Doukhan (1987). For instance if $\Phi(t) = t^p \ln^U(U \vee t)$ and $\Psi(t) = t^q \ln^V(V \vee t)$ are as previously, for U and V big enough we get $e(t) = O(\ln^{-w}(W \vee t^{-1}))$ for $w = \frac{U}{p} + \frac{V}{q}$ and W big enough. It is worth mentioning that before the systematic work of Bulinskii, Herrndorf (1985) used Orlicz norms in order to prove central limit results.
Proof of Lemma 3. Let $u = \operatorname{sign}(\mathbb{E}(X \mid \mathcal{V}) - \mathbb{E}X)$ and $v = \operatorname{sign}(\mathbb{E}(Y \mid \mathcal{U}) - \mathbb{E}Y)$ (using notation (4)); then

$|\mathrm{Cov}(X, Y)| = |\mathbb{E}(X\,(\mathbb{E}(Y \mid \mathcal{U}) - \mathbb{E}Y))| \le \|X\|_\infty\, \mathbb{E}\,|\mathbb{E}(Y \mid \mathcal{U}) - \mathbb{E}Y| = \|X\|_\infty\, \mathbb{E}\big(v\,(\mathbb{E}(Y \mid \mathcal{U}) - \mathbb{E}Y)\big) \le \|X\|_\infty\, |\mathbb{E}(vY) - \mathbb{E}v\,\mathbb{E}Y|.$

Similar arguments lead to $|\mathrm{Cov}(X, Y)| \le \|X\|_\infty\, \|Y\|_\infty\, |\mathbb{E}vu - \mathbb{E}v\,\mathbb{E}u|$.
Let now $U^+ = \{u = 1\}$, $U^- = \{u = -1\}$, $V^+ = \{v = 1\}$ and $V^- = \{v = -1\}$; then
Hence $|\mathbb{E}vu - \mathbb{E}v\,\mathbb{E}u| \le 4\,\alpha(\mathcal{U}, \mathcal{V})$. ∎

2 Set $\mathfrak{F}_p = \{\Phi: \mathbb{R}^+ \to \mathbb{R}^+;\ \Phi(0) = 0,\ \Phi \not\equiv 0$ and convex, $x^{-p}\Phi(x) \uparrow\}$ for $p > 1$. Luxemburg norms are defined for $\Phi \in \mathfrak{F}_p$ by $\|X\|_\Phi = \inf\{t > 0;\ \mathbb{E}\,\Phi(|X|/t) \le 1\}$.
3 In the case of separable Hilbert valued rvs we replace, in Bulinskii, Doukhan (1987), the factor 10 by 16.
4 $\operatorname{sign}(x) = -1, 0$, or $1$ according to whether $x < 0$, $x = 0$ or $x > 0$.
Now Markov's inequality leads to $\mathbb{E}\big(|X|\,\mathbb{1}_{\{|X| > a\}}\big) \le \mathbb{E}|X|^p / a^{p-1}$. Set $\mathbb{E}|X|^p / a^p = \alpha(\mathcal{U}, \mathcal{V})$; then

$|\mathrm{Cov}(X, Y)| \le 6\,\alpha^{1 - 1/p}(\mathcal{U}, \mathcal{V})\, \|Y\|_\infty\, \|X\|_p.$ ∎
Alternative proof for (ii) and (iii). Define $\bar{X} = X\,\mathbb{1}_{\{|X| \le a\}}$, $\bar{Y} = Y\,\mathbb{1}_{\{|Y| \le b\}}$, $\tilde{X} = X - \bar{X}$ and $\tilde{Y} = Y - \bar{Y}$. Write $\mathrm{Cov}(X, Y) = \mathrm{Cov}(\bar{X}, \bar{Y}) + \mathrm{Cov}(\bar{X}, \tilde{Y}) + \mathrm{Cov}(\tilde{X}, \bar{Y}) + \mathrm{Cov}(\tilde{X}, \tilde{Y})$, which follows from the bilinearity of the covariance. Hence

$|\mathrm{Cov}(X, Y)| \le 2ab\,\Big(2\,\alpha(\mathcal{U}, \mathcal{V}) + \frac{\mathbb{E}|Y|^q}{b^q} + \frac{\mathbb{E}|X|^p}{a^p} + \Big\{\frac{\|X\|_p\,\|Y\|_q}{ab}\Big\}^{r/(r-1)}\Big).$

Choosing the best constants a and b in this expression would lead to a constant smaller than 8 in (1). The value 8 is obtained for $\frac{\mathbb{E}|Y|^q}{b^q} = \frac{\mathbb{E}|X|^p}{a^p} = 2\,\alpha(\mathcal{U}, \mathcal{V})$. ∎
Markov chains. The related paper by Bosq (1991) is also of interest. We also omit the results of Gordin (unpublished, Vilnius conference, 1975) and Gordin (1969). He approximates mixing sequences by martingales, having in view a CLT as in McLeish's result, Theorem 1, § 1.5.1; see Hall & Heyde (1980).
The previous results in § 1.2.2 may be found in the initial papers. See Blum, Hanson & Koopmans (1963), Davydov (1970), Rosenblatt (1956), Ibragimov & Rozanov (1974) and Wolkonski & Rozanov (1959). Additional information may be found in Billingsley (1968), Doob (1953), Hall & Heyde (1980), Ibragimov & Linnik (1974), Iosifescu (1980), Rosenblatt (1971) and Roussas & Ioannides (1987). Bradley (1986) and Peligrad (1986) present other measures of dependence. Bulinskii (1987) proves sharp inequalities in Orlicz spaces. The work of Dehling (1983) extends the mixing inequality for covariances to separable Hilbert spaces in the difficult strong mixing case; Bulinskii, Doukhan (1987) extend this result to Orlicz spaces. Finally Rio (1994) proves a new covariance inequality (5) which does not involve $L^p$-norms but integrals of quantile functions with respect to the distribution generated by the strong mixing sequence: this inequality is sharper than Davydov's (1970).
5 Set $Q_X(u) = \inf\{t;\ \mathbb{P}(|X| > t) \le u\}$; then $|\mathrm{Cov}(X, Y)| \le 2 \int_0^{2\alpha} Q_X(u)\, Q_Y(u)\, du$ for rvs X, Y with a finite such integral.
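Rio's bound in the footnote can be illustrated on a toy pair of indicators with $\mathbb{P}(U) = \mathbb{P}(V) = 1/2$ and $\alpha = |\mathrm{Cov}(\mathbb{1}_U, \mathbb{1}_V)| = 0.15$ (hypothetical numbers); the quantile functions are then step functions and the integral is elementary.

```python
# X = 1_U, Y = 1_V with P(U) = P(V) = 1/2; alpha = cov = 0.15 (hypothetical).
alpha = 0.15
cov = 0.15

def Q_indicator(p_one):
    # quantile function of |1_A|: equals 1 for u < P(A), then drops to 0
    return lambda u: 1.0 if u < p_one else 0.0

Qx, Qy = Q_indicator(0.5), Q_indicator(0.5)

# midpoint rule for 2 * integral_0^{2 alpha} Qx(u) Qy(u) du
n = 10000
h = 2 * alpha / n
integral = h * sum(Qx((k + 0.5) * h) * Qy((k + 0.5) * h) for k in range(n))
bound = 2 * integral
```

Since 2α = 0.3 < 1/2, the integrand is identically 1 on the integration range and the bound is 0.6, comfortably above the covariance 0.15.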
1.3. Mixing
Once the measures of dependence between two σ-algebras have been introduced, various notions of mixing may be defined for general processes. A multitude of definitions of a mixing random field can be introduced; we shall focus on the simplest in § 1.3.1. The definition of a mixing random process proposed in § 1.3.2 is the classical one. Useful definitions from ergodic theory are recalled in § 1.3.3, and relations between the mixing notions for processes and fields are given in § 1.3.4.
We shall thus write $\alpha_X(k; u, v)$, $\beta_X(k; u, v)$, $\rho_X(k; u, v)$, $\varphi_X(k; u, v)$ or $\psi_X(k; u, v)$. Many other coefficients could be introduced here, depending on the structure of the index set T. Note also that $c_X(k; u, v)$ is a decreasing function with respect to k and an increasing function with respect to u and v. We shall make use of a classical convention, setting $c_X(k; u, \infty) = \sup_v c_X(k; u, v)$ and $c_X(k) = c_X(k; \infty, \infty)$. The main interest of those coefficients is perhaps the fact that they only depend on σ-fields. Thus considering $Y = (Y_t)$ with $Y_t = f_t(X_t)$ instead of $X = (X_t)$ may only make the corresponding coefficients decrease. Another definition of mixing (see e.g. Bradley (1986)) does not satisfy this property.
Let $X = (X_t)_{t \in T}$ be a real second order random field, and set $L_X(A)$ for the closure in $L^2(\Omega)$ of $\mathrm{Span}\{X_t;\ t \in A\}$, the vector space spanned by $X_A$. A measure of dependence of X is defined using the linear correlation coefficient or r-mixing coefficient

$r_X(A, B) = \sup\{|\mathrm{Corr}(U, V)|;\ U \in L_X(A),\ V \in L_X(B)\}.$

This is a second order property, only depending on first or second order moments of X. This mixing notion is the cosine of the angle between the linear spans of past and future in the Hilbert space $L^2(\Omega)$. The inequality $r_X(A, B) \le \rho_X(A, B)$ is clear, and a reverse inequality holds in the Gaussian case (§ 2.1); indeed in this case the whole distribution of a random field is determined by second order properties. Linear correlation mixing can be defined following the same lines as for the other coefficients.
$c_X(k; u, v) = \sup\big\{c(\sigma(X_A), \sigma(X_B));\ A, B \subset T,\ |A| \le u,\ |B| \le v,\ d(A, B) \ge k\big\}.$
Remark 1. Bulinskii (1987) gives a wider definition for random fields indexed by $\mathbb{Z}^d$, in which the separation condition $d(A, B) \ge k$ is replaced by $d(C(A), C(B)) \ge k$ for the convex hulls C(A) and C(B) of the index sets, still with $|A| \le u$ and $|B| \le v$. This notion is thus really weaker than the classical one given before. However this complicated notion does not appear to be of interest to us, because the current random fields used in statistics fit our definition of mixing. This kind of unnatural generalization will thus be systematically omitted.
An important inequality was recently proved by Bradley (1991 a). It shows that α-mixing and ρ-mixing (uniform with respect to u, v) are equivalent conditions for stationary random fields indexed by $\mathbb{Z}^d$ for d > 1.

Theorem 1. If $X = (X_t)_{t \in \mathbb{Z}^d}$ is a strictly stationary random field with the mixing property $\lim_{k \to \infty} \alpha_X(k; \infty, \infty) = 0$, then

$\alpha_X(k; \infty, \infty) \le \rho_X(k; \infty, \infty) \le 2\pi\,\alpha_X(k; \infty, \infty).$
Sketch of proof. This inequality is well known in the Gaussian case (see § 2.1), and thus it will be proved here using a Central Limit Theorem argument. Let ε > 0 be an arbitrarily small real number; then there exist A and B, two finite subsets of T separated by a distance k, with $\rho(A, B) \ge \rho(k; \infty, \infty) - \varepsilon$. There also exist adapted and normalized rvs $X = f(X_A)$ and $Y = g(X_B)$ with $\mathbb{E}XY = \rho(A, B)$. Assume first that $d \neq 1$. There is some direction e orthogonal to the one minimizing the distance between the subsets A and B. The shifted rvs $X_i = f(X_{A + l(i)e})$, $Y_i = g(X_{B + l(i)e})$ are equally distributed and almost independent if $l(i)$ increases very fast to infinity with i. If d = 1 the rvs $X_i$ and $Y_i$ take the same form, but e denotes here a positive number larger than the diameter of A∪B. Those rvs obey a CLT, and Theorem 2.1.1 may now be used with the Gaussian limits of $\frac{1}{\sqrt{n}} \sum_{i=1}^{n} X_i$ and $\frac{1}{\sqrt{n}} \sum_{i=1}^{n} Y_i$.
The following result in Bradley (1989) shows that β-mixing is a trivial notion in the case of random fields indexed by $\mathbb{Z}^d$ if dependence on the cardinality of the subsets considered is not allowed. This will appear clearly all along the examples.

Sketch of proof. In fact, Bradley shows that $\beta_X(k; \infty, \infty) = 0$ or 1. The proof is similar to that of Theorem 1, and we assume as well first that $d \neq 1$. It runs as follows. Assume that $\beta(X(A), X(B)) > 0$ for some finite subsets A and B separated by a distance at least k. There

1 A random field $(X_t)_{t \in T}$ is said to be m-dependent if for any subsets S, S′ ⊂ T, d(S, S′) ≥ m implies that $X_S$ is independent of $X_{S'}$; this implies in turn that all the dependence measures $c_X(m; u, v) = 0$. If $(Z_n)$ is an independent sequence, then for any finite non-zero sequence $(a_1, \dots, a_m)$ the moving average process $X_n = a_1 Z_n + \dots + a_m Z_{n-m+1}$ is m-dependent but not (m−1)-dependent.
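The m-dependence stated in the footnote is pure bookkeeping about which innovations enter each coordinate; a minimal sketch:

```python
# X_n = a_1 Z_n + ... + a_m Z_{n-m+1} uses the innovation indices
# {n-m+1, ..., n}; X_0 and X_k therefore share an innovation iff k <= m-1,
# so blocks of coordinates at distance >= m are built from disjoint
# (hence independent) sets of inputs.
m = 4  # hypothetical order

def innovation_indices(n):
    return set(range(n - m + 1, n + 1))

overlap = {k: bool(innovation_indices(0) & innovation_indices(k))
           for k in range(2 * m)}
```

The dictionary `overlap` is True exactly for lags k ≤ m − 1, matching the claim that the process is m-dependent but not (m−1)-dependent.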
the total variation on $X(A \cup B)$ of the difference between the image distribution $P = \mathbb{P}_{X_{A \cup B}}$ and the product $Q = \mathbb{P}_{X_A} \otimes \mathbb{P}_{X_B}$. Under the distribution P, the sequence $(Y_i)$ is ergodic, so that $\lim_{n \to \infty} \frac{1}{n}(Y_1 + \dots + Y_n) = P(A \cap B)$, P-a.s. The same holds for Q. This and the relation $Q(A \cap B) = P(A)P(B) \neq P(A \cap B)$ imply $P\big[\lim_{n \to \infty} \frac{1}{n}(Y_1 + \dots + Y_n) = P(A \cap B)\big] = 1$ and
Remarks 2. The previous argument does not work in the case of mixing processes (this notion, defined for d = 1 in section § 1.3.2, does not allow to consider interlaced subsets A and B; Example 3 provides a counterexample), nor without the mixing assumption (set $X_t = B$ for some binomial random variable B with a small parameter). The part d = 1 in the proofs of Theorems 1 & 2 seems to be new and was suggested by an anonymous referee.
In most of the interesting cases, we shall see that there is no dependence over u or v, and we shall thus write $c_{X,k} = \sup\{c_{X,k;u,v};\ u, v \ge 0\}$.
[Figure 1.3.3. Definition of a mixing coefficient for processes: a block A of coordinates up to time t and a block B after time t + k.]
The previous definitions are non-overlapping, as the following examples show.
Example 1. There exist stationary processes which are α-mixing without being ρ-mixing, or ρ-mixing without being φ-mixing. Examples are given in Bradley (1981 a, 1980).
Example 3. For stationary Markov processes (see § 2.4), Bradley (1986) states that $c_n = c(\sigma(X_0), \sigma(X_n))$ for the previous measures of dependence. Moreover, if a ρ, φ or ψ-mixing condition holds, then the decay of the corresponding sequence is geometric. A geometrically ergodic Markov process which is not Doeblin recurrent is ρ-mixing and not φ-mixing. The simple AR-process $X_{n+1} = \frac12 X_n + Z_n$ for some independent and identically $\mathcal{N}(0, 1)$-distributed sequence $(Z_n)$ is such an example. However, if the distribution of $(Z_n)$ is binomial, then the process is not even α-mixing; see Andrews (1984). It is possible to construct a stationary Markov process with denumerable states such that the decay of $\alpha_n$ is less than geometric; see Davydov (1973), examples 1 and 2, or Kesten & O'Brien (1976), corollary 1. Such a process is α-mixing and not ρ-mixing in view of the previous results. Moreover, Rosenblatt (1971) gives an example of a ρ-mixing real valued stationary Markov process which fails to be φ-mixing. Finally we recall the celebrated example of continued fractions introduced by Lévy; see e.g. Billingsley (1968). Any real number x in the interval ]0, 1[ may be expanded in a unique way in the form of a continued fraction.
Consider the probability distribution with the density $[\ln 2\,(1 + x)]^{-1}$ on the interval ]0, 1[. It is shown in Philipp (1970) that the associated process $(X_1, X_2, \dots)$ is ψ-mixing and that the ψ-mixing sequence has a geometric decay.
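The distribution with this density is the Gauss measure, which is invariant under the continued-fraction shift $T(x) = 1/x - \lfloor 1/x \rfloor$; the invariance amounts to the transfer-operator identity $h(x) = \sum_{n \ge 1} h(1/(n+x))/(n+x)^2$, whose series telescopes exactly. A numerical sketch (truncation level chosen arbitrarily):

```python
import math

def h(x):
    # density of the Gauss measure on ]0,1[
    return 1.0 / (math.log(2.0) * (1.0 + x))

def transfer(x, terms=200000):
    # sum over the inverse branches x -> 1/(n + x) of the Gauss map
    return sum(h(1.0 / (n + x)) / (n + x) ** 2 for n in range(1, terms + 1))

# invariance: transfer(x) == h(x) up to the O(1/terms) truncation error
err = abs(transfer(0.3) - h(0.3))

# h integrates to 1 on ]0,1[ (midpoint rule)
n = 20000
total = sum(h((k + 0.5) / n) for k in range(n)) / n
```

The telescoping is exact: $h(1/(n+x))/(n+x)^2 = \frac{1}{\ln 2}\big(\frac{1}{n+x} - \frac{1}{n+x+1}\big)$, so the truncated sum differs from $h(x)$ only by the tail of the series.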
$\psi\text{-mixing} \Rightarrow \varphi\text{-mixing} \Rightarrow \begin{cases} \beta\text{-mixing} \Rightarrow \alpha\text{-mixing} \\ \rho\text{-mixing} \Rightarrow \alpha\text{-mixing} \end{cases}$
Asymptotic results concerning mixing coefficients for mixing sequences are included in the work of Bradley (see Bradley (1986) for a review). In particular it may be shown that the mixing sequences have very few asymptotic possibilities (that is, few possible limiting values) under ergodic theory mixing assumptions: they equal 1 or decrease to 0.
$\mathbb{P}(E_{m,r,p}) \le \frac{1 + o(1)}{\sqrt{\pi m}}$, where the o is uniform with respect to r. Set $D = ]w, u + w]$,
b) Bulinskii and Zhurbenko (1976) gave such an example of a random field for which $\lim_{k \to \infty} \alpha_X(k; u, v) = 0$ and $\alpha_X(k; \infty, \infty) = \frac14$. This example is also based on Bernoulli sequences. Let $(\eta_n)_{n \in \mathbb{Z}}$ be, as before, an independent and identically distributed sequence of Bernoulli random variables; they set $X_{n(k)} = \eta_{a(k)}\,\eta_{b(k)}\,\eta_{c(k)}$, where $a(k) = 4^{k-1}$, $b(k) = 2 \cdot 4^{k-1}$, $c(k) = 3 \cdot 4^{k-1}$ and $n(k) = 4^k - 1$ for k > 1, and else $a(1) = 0$, $b(1) = 1$, $c(1) = 2$, $n(1) = 3$, and $X_n = \eta_n$ if n is not an element of the sequence n(k). Here,
2 As a matter of fact, part 2 is mainly devoted to giving useful examples of processes such that $\lim_{k \to \infty} c_{X,k} = 0$.
3 That is: $\mathbb{P}(\varepsilon_n = 1) = \mathbb{P}(\varepsilon_n = -1) = \frac12$ and $\mathbb{P}(\eta_n = 1) = \mathbb{P}(\eta_n = -1) = \frac12$.
Let $X = (X_n)_{n \in \mathbb{Z}}$ be a stationary process; for convenience, assume that the underlying probability space is the canonical space $\mathbb{R}^{\mathbb{Z}}$ equipped with the probability $\mathbb{P}$ defining the distribution of X on the Borel σ-field $\mathfrak{B}$ of $\mathbb{R}^{\mathbb{Z}}$. Let θ be the shift operator defined by $(\theta X)_n = X_{n+1}$.
$X = (X_n)_{n \in \mathbb{Z}}$ is mixing in the sense of ergodic theory if for any events A, B in $\mathfrak{B}$, $\lim_{n \to \infty} \mathbb{P}(A \cap \theta^n B) = \mathbb{P}(A)\,\mathbb{P}(B)$.
$X = (X_n)_{n \in \mathbb{Z}}$ is ergodic if the σ-field of invariant events is trivial (if the event A is invariant by θ then $\mathbb{P}(A) = 0$ or $\mathbb{P}(A) = 1$).
Let $\mathfrak{B}(\mathbb{R}^{\mathbb{Z}^-})$ denote the Borel field on $\mathbb{R}^{\mathbb{Z}^-}$, the set of bilateral sequences vanishing for nonnegative values of the time index. $X = (X_n)_{n \in \mathbb{Z}}$ is uniformly ergodic if $\lim_{n \to \infty} \sup_{A, B \in \mathfrak{B}(\mathbb{R}^{\mathbb{Z}^-})} \big|\frac{1}{n} \sum_{k=1}^{n} \mathbb{P}(A \cap \theta^k B) - \mathbb{P}(A)\,\mathbb{P}(B)\big| = 0$.
$X = (X_n)_{n \in \mathbb{Z}}$ is regular if the tail σ-field $X_{-\infty} = \bigcap_t \sigma(X_s;\ s \le t)$ is trivial.
Now mixing in the sense of ergodic theory implies ergodicity. Regularity implies mixing in the
sense of ergodic theory. Strong mixing implies regularity as well as uniform ergodicity.
Uniform ergodicity implies ergodicity, and a mixing uniformly ergodic stationary process is
strongly mixing (Rosenblatt, 1972).
We introduce now a terminology used mainly in the case of Markov processes (see § 2.4). Let N_n(A) = Σ_{k=0}^{n−1} 1_A(X_k) be the time spent by the process X in the measurable set A until time n. This defines a measure, setting N_n(f) = Σ_{k=0}^{n−1} f(X_k) for any random variable f on (E, ℰ). The process X is said to be recurrent if there exists a σ-finite nonnegative measure μ such that for any μ-integrable bounded random variables f and g on (E, ℰ) with ∫g(x) μ(dx) ≠ 0 we have the following ergodic theorem:
N_n(f)/N_n(g) → ∫f(x) μ(dx) / ∫g(x) μ(dx)   a.s., as n → ∞.
The process X is said to be positive recurrent if μ is bounded, and null recurrent otherwise; μ is called a stationary distribution in the positive recurrent case.
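The ratio ergodic theorem above can be illustrated on a two-state positive recurrent Markov chain; the transition matrix and the test functions f, g below are illustrative assumptions:

```python
import numpy as np

P = np.array([[0.9, 0.1],
              [0.4, 0.6]])          # transition matrix; stationary law mu = (0.8, 0.2)
f = np.array([1.0, 3.0])            # test functions on the state space E = {0, 1}
g = np.array([2.0, 1.0])

rng = np.random.default_rng(1)
n = 100_000
u = rng.random(n)                   # uniforms driving the chain
state, Nf, Ng = 0, 0.0, 0.0
for i in range(n):
    Nf += f[state]                  # occupation sums N_n(f), N_n(g)
    Ng += g[state]
    state = 0 if u[i] < P[state, 0] else 1
mu = np.array([0.8, 0.2])           # solves mu P = mu
ratio, limit = Nf / Ng, (f @ mu) / (g @ mu)
```

On a long path, `ratio` approaches `limit`, the ratio of the μ-integrals.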
22 Mixing
1.3.4. Miscellany
Note that c_{X,k;u,v} ≤ c_X(k; u, v); thus a mixing field on the index set ℕ or ℤ is always a mixing sequence. Very interesting converse results follow from Takahata (1986).
Theorem 3. Let (X_t)_{t∈T} be a random process indexed by some subset T of ℝ; then
α_X(k; u, v) ≤ 3(v − 1) β_{X,k;u∨v,u∨v} and φ_X(k; u, v) ≤ 3(v − 1) ψ_{X,k;u∨v,u∨v}.
A β-mixing (resp. ψ-mixing) random sequence is an α-mixing (resp. φ-mixing) random field (see (4)).
From now on we shall essentially focus on the strong mixing properties of processes and fields. Such properties are the weakest which can be used in statistics, even if they are much stronger than those generally used in ergodic theory. Indeed, in order to construct tests of hypotheses, statisticians need limit theorems in distribution.
Authors cited in the previous chapters have naturally defined the corresponding notions of mixing structures, as may be seen by reading the papers or the books (6) by Billingsley
(1968), Blum, Hanson & Koopmans (1963), Bradley & Bryc (1985), Bradley (1983, 1986,
1987), Bulinskii (1989), Doob (1953), Gebelein (1941) and Hirschfeld (1935), Ibragimov &
Linnik (1971) [they detail the links between the various notions in ergodic theory and mixing],
Ibragimov (1962, 1975), Ibragimov & Rozanov (1978), Iosifescu (1980), Peligrad (1983,
1986), Rosenblatt (1956, 1985), Roussas & Ioannides (1987), Wolkonski & Rozanov (1959).
A detailed description of examples in Davydov (1973) and in Herrndorf (1983) is given in
Nahapetian (1991). Linear correlation coefficients are not intrinsic and the present exposition is
short; for this see Bradley (1986) and Bulinskii (1989) and Remarks 1.5.1 & 1.5.2 or
§ 2.1.2., moreover Bradley (1985) relates p-mixing with strong mixing.
Takahata (1986) relates the mixing properties of fields to those of processes. Bradley (1986)
and Bulinskii (1989) give various examples proving that the notions presented are sharp in the
sense that it is possible to construct processes satisfying a notion of mixing and not another,
subject to the natural restrictions given by the Proposition 1.1.1.
I think it is also important to recall here some alternative ways of defining mixing, introduced in Gastwirth & Rubin (1975) (see also Hall & Heyde (1980)); they involve norms of linear operators, following Rosenblatt (1973). Withers (1981) is interested in linear measurable functionals of the initial process. Mac Leish (1975) introduced a generalization of martingales integrating the mixing processes. Finally, in view of statistical applications, we refer to Duflo (1990) and Meyn & Tweedie (1992), where the ergodicity assumption is replaced by the weaker one of stability for Markov processes.
Properties: Tools 25
1.4. Tools
This chapter is devoted to giving and/or recalling the most important tools known in the field of mixing theory. All of them follow from the results in § 1.2. Three subsections include respectively:
The main interest of Rosenthal moment inequalities is their use for triangular arrays; they lead to evaluations of the oscillation of empirical processes (see for instance Doukhan, Portal (1987) or Massart (1987)) and give the right bound for integrated moments of nonparametric estimators (see for instance Doukhan, Portal (1983), Doukhan & Bulinskii (1987), or Doukhan (1991)).
Let (Y_t)_{t∈T} be a finite family of real random variables; we shall use the following notations. Set
L(μ, ε, T) = Σ_{t∈T} (E|Y_t|^{μ+ε})^{μ/(μ+ε)} = Σ_{t∈T} ||Y_t||_{μ+ε}^μ,
and
D(τ, ε, T) = L(τ, 0, T) if 0 < τ ≤ 1, ε ≥ 0,
D(τ, ε, T) = L(τ, ε, T) if 1 < τ ≤ 2,
D(τ, ε, T) = Max{L(τ, ε, T), [L(2, ε, T)]^{τ/2}} if τ > 2, for ε > 0.
Example 1. Set M_{μ,ε} = sup_{t∈T} ||Y_t||_{μ+ε}; then L(μ, ε, T) ≤ |T| M_{μ,ε}^μ. Thus, in this case,
D(τ, ε, T) ≤ Max{|T| M_{τ,ε}^τ, |T|^{τ/2} M_{2,ε}^τ}.
The second term has order |T|^{τ/2} M_{2,ε}^τ in the stationary situation. Let us set precise bounds on these expressions in the special case of triangular arrays Y_t = f_n(X_t) and T = {1, ..., n}. Assume that (X_t)_{t∈ℤ} is a stationary sequence, let f be a density function on the real line and f_n(x) = n^a f(n^a x); then ||f_n(X_0)||_{μ+ε}^μ = O(n^{aμ − aμ/(μ+ε)}) if X_0 has a bounded density. Different choices of a yield very different behaviours of D(τ, ε, T).
This example comes from kernel estimation; it shows the interest of such a complicated
formulation of the forthcoming results.
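The definitions above translate directly into a small computation; in this sketch the helper `moments` and the toy moment values are assumed for illustration only:

```python
def L(moments, mu, eps):
    """L(mu, eps, T) = sum_t (E|Y_t|^(mu+eps))^(mu/(mu+eps));
    `moments(p)` returns the list of E|Y_t|^p over t in T."""
    return sum(m ** (mu / (mu + eps)) for m in moments(mu + eps))

def D(moments, tau, eps):
    """D(tau, eps, T) with the three regimes of the definition above."""
    if tau <= 1:
        return L(moments, tau, 0.0)
    if tau <= 2:
        return L(moments, tau, eps)
    return max(L(moments, tau, eps), L(moments, 2.0, eps) ** (tau / 2.0))

n, eps = 100, 1.0
moments = lambda p: [2.0 ** p] * n   # toy case: E|Y_t|^p = 2^p for every t
val = D(moments, 4.0, eps)           # the term [L(2, eps, T)]^2 dominates here
```

In this toy case L(4, 1, T) = 1600 while [L(2, 1, T)]² = 160000, so the second term of the maximum dominates, as in the stationary situation of Example 1.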
(1) Σ_{r=1}^∞ s_r b_r^{c−u} [α_Y(r; u, v)]^{ε/(c+ε)} < ∞.
Here s_r and b_r denote the maximal value of the cardinal number of a ring with thickness 1 and radius r, or of a ball with radius r, in the metric of the index set I of Y. Moreover condition (1) will be empty for τ ≤ 1, and then we shall set c = 0.
For the case of ℤ^d consider the metric |z| = max_i |z_i| for z = (z_1, ..., z_d) in ℤ^d. It is easy to see that there are constants θ_d and σ_d > 0, only depending on d, with b_r ≤ θ_d r^d and s_r ≤ σ_d r^{d−1}.
Theorem 1. Assume the previous strong mixing assumption (1) for some τ, ε > 0 and let c be the smallest even integer such that c ≥ τ. Let T be any finite subset of ℤ^d. If Y_t belongs to L^{τ+ε} and is centered for t ∈ T, then there is a constant C, only depending on τ and on the mixing coefficients α_Y(r; u, v) of Y for u + v ≤ c, such that
E|Σ_{t∈T} Y_t|^τ ≤ C D(τ, ε, T).
The case of processes is analogous (1). In fact strong mixing random fields with index set ℤ are strong mixing random processes. However, the order structure of the line yields a similar result. Consider the mixing assumption
(2) ∃ ε > 0, ∃ c ∈ 2ℕ, c ≥ τ : Σ_{r=1}^∞ (r+1)^{c−2} [α_{Y,r}]^{ε/(c+ε)} < ∞.
Theorem 2. Assuming the previous strong mixing assumption (2), let T be a finite subset of ℕ such that if t ∈ T, then Y_t belongs to L^{τ+ε} and is centered. Then there is some constant C, only depending on τ and on the mixing coefficients α_{Y,r} of Y, such that
E|Σ_{t∈T} Y_t|^τ ≤ C D(τ, ε, T).
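Theorem 2 can be made concrete by simulation: for a stationary AR(1) sequence, which is geometrically strongly mixing, D(4, ε, T) has order n² in the stationary bounded-moment case, so E|S_n|^4 should grow like n². The model and all sample sizes below are illustrative assumptions:

```python
import numpy as np

def fourth_moment_of_sum(a, n, reps, rng):
    """Estimate E|X_1 + ... + X_n|^4 for a stationary Gaussian AR(1) with coefficient a."""
    s = np.zeros(reps)
    x = rng.normal(size=reps) / np.sqrt(1.0 - a * a)   # stationary start
    for _ in range(n):
        s += x
        x = a * x + rng.normal(size=reps)
    return np.mean(s ** 4)

rng = np.random.default_rng(2)
m1 = fourth_moment_of_sum(0.5, 100, 20_000, rng)
m2 = fourth_moment_of_sum(0.5, 400, 20_000, rng)
growth = m2 / m1        # the n^2 prediction gives about (400/100)^2 = 16
```

The observed growth factor is close to 16, consistent with the order n^{τ/2} of D(τ, ε, T) for τ = 4.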
Remarks 1. For 0 < τ ≤ 1 the inequality (x + y)^τ ≤ x^τ + y^τ holds for x, y ≥ 0; it concludes the proof of the previous results. In this case C = 1, ε = 0, and this inequality has nothing to do with mixing. Else the previous results are a consequence of Uteev (1984), Bulinskii, Doukhan (1990), Doukhan, León & Portal (1984), and Doukhan, Portal (1983,
1 In fact the strong mixing random fields with index set ℤ are strong mixing random processes, see Theorem 1.3.3.
1987) for τ > 1, and are proved below. They are extensions of the Rosenthal inequality. That is also the case of the weaker and partial results in Dasgupta (1988). This work is perhaps the first one of this kind (2).
In the independent case, the result in Theorems 1 and 2 is called Rosenthal's inequality; see Petrov (1975) or Hall & Heyde (1980). It holds with ε = 0 and it is optimal. The only loss here is the fact that ε > 0. Moreover, a sharp constant C with order τ^{τ/2} is given in this case.
The following interpolation lemma, due to Uteev (1984), will allow us to consider only the case when τ is an even integer. Remark that, contrary to Uteev, we do not allow here the value ε = 0 (3).
Let F = (𝔉_i)_{1≤i≤n} be a family of sub-σ-algebras of 𝒜, and B be a separable Banach space. A family of centered random variables η = (η_i)_{1≤i≤n}, defined on a space (Ω, 𝒜, ℙ), is said to be (F, B)-adapted if the random variable η_i is B-valued and 𝔉_i-measurable.
Lemma 1. Assume that, for some v ≥ 1 and a fixed constant c, any family η = (η_i)_{1≤i≤n}, centered and (F, B)-adapted, satisfies
E||Σ_{i=1}^n η_i||^v ≤ c Q(v, δ, η).
Then if δ > 0, for any t ≤ v, there is a constant C = C(c, v, δ, t) = c 2^{4v((v−t)/δ+1)} such that any centered, (F, B)-adapted family φ = (φ_i)_{1≤i≤n} satisfies
E||Σ_{i=1}^n φ_i||^t ≤ C Q(t, δ, φ).
Proof of Lemma 1. Set Q = Q(t, δ, φ), y = Q^{1/t}, and write φ_i = ψ_i + η_i with ψ_i = T_i − E T_i, η_i = Y_i − E Y_i, T_i = φ_i 1{||φ_i|| ≤ y} and Y_i = φ_i 1{||φ_i|| > y}, for i = 1, ..., n. The convexity of [x → x^t] yields
2 An attentive reading shows that the manuscript was first proposed in 1982.
3 This point was not clear in the proof by Uteev and the present proof comes from personal discussions with
A. Bulinskii.
E||Σ_{j=1}^n η_j||^v ≤ 2^{v−1} ( E|Σ_{j=1}^n (||η_j|| − E||η_j||)|^v + (Σ_{j=1}^n E||η_j||)^v ).
u) M(v, δ, ψ) = Σ_{i=1}^n (E||ψ_i||^{v+δ})^{v/(v+δ)} ≤ 2^v Σ_{i=1}^n (E||T_i||^{u(v+δ)})^{v/(u(v+δ))},
where we define u = v(t+δ)/(t(v+δ)) ≥ 1, so that E||T_i||^{u(v+δ)} ≤ y^{u(v+δ)−(t+δ)} E||φ_i||^{t+δ} and
M(v, δ, ψ) ≤ 2^v (Q(t, δ, φ))^{v/t}.
This inequality yields, considering the different cases, Q(v, δ, ψ) ≤ 2^v Q^{v/t}.
v) M(v, δ, ξ) = Σ_{i=1}^n (E| ||η_i||^v − E||η_i||^v |^{(v+δ)/v})^{v/(v+δ)} ≤ 2^v Σ_{i=1}^n (E||η_i||^{v+δ})^{v/(v+δ)}, and analogously
M(v, δ, ξ) ≤ 2^{v+t} Σ_{i=1}^n (E||Y_i||^{t+δ})^{t/(t+δ)} ≤ c 2^{v+t} Q.
w) W = (Σ_{j=1}^n E||Y_j||)^v ≤ 2^v (Σ_{j=1}^n (E||Y_j||)^{t/v})^v.
t ≥ v − δ yields (E||Y_j||)^{t/v} = (ℙ(||φ_j|| > y) E(||φ_j|| | ||φ_j|| > y))^{t/v}.
We get from the Markov inequality (E||Y_j||)^{t/v} ≤ ℙ^{t/v−1}(||φ_j|| > y) (E(||φ_j|| | ||φ_j|| > y))^{t/v}.
The Jensen inequality implies (E||Y_j||)^{t/v} ≤ ℙ^{t/v}(||φ_j|| > y) y^{t/v−1} E(||φ_j|| | ||φ_j|| > y).
The last inequality is rewritten as (E||Y_j||)^{t/v} ≤ ℙ^{t/v−1}(||φ_j|| > y) y^{t/v−1} E(||φ_j|| 1{||φ_j|| > y}).
We obtain from the Hölder inequality (E||Y_j||)^{t/v} ≤ ℙ^{t/v−1+δ/(t+δ)}(||φ_j|| > y) y^{t/v−1} (E||φ_j||^{t+δ})^{t/(t+δ)}.
Now t/v − 1 + δ/(t+δ) ≥ 0 leads to (E||Y_j||)^{t/v} ≤ y^{t/v−1} (E||φ_j||^{t+δ})^{t/(t+δ)} for j such that ℙ(||φ_j|| > y) ≠ 0; else this inequality is trivial. Thus W ≤ 2^v Q.
Add the previous inequalities to get the lemma with C(t) = c 2^{4v} if t ≥ v − δ. For general t < v, if k ≥ (v−t)/δ is some integer, we get by recurrence C(t) = c 2^{4vk}. A suitable constant is thus C = c 2^{4v((v−t)/δ+1)}. ∎
The following computational lemma will also be very useful for our purpose; it is an extension of previous results in Doukhan, Portal (1983, 1987) or in Doukhan (1992).
Proof of Lemma 2. Replacing Y_t by Y_t / L_c^{1/c} leads to L_c = 1 and D_c = D(c, ε, T) = L_c ∨ 1, with L_c = L(c, ε, T), for ε > 0 and where c ≥ 2 is an integer. Note that D_a D_b = L_a L_b ∨ L_a ∨ L_b ∨ 1 is the maximum value of four terms. Set c = a + b. We have to bound L_a by some function of L_c. In order to do this, note that Hölder's inequality implies, with u = (c+ε)(a−2)/(c−2) and v = (2+ε)(c−a)/(c−2),
E|Y|^{a+ε} ≤ (E|Y|^{c+ε})^{u/(c+ε)} (E|Y|^{2+ε})^{v/(2+ε)}.
Set r = ua/(c(a+ε)) = (a/c)·((a−2)/(c−2))·((c+ε)/(a+ε)) and A = Σ_{t∈T} ||Y_t||_{2+ε}^{s/(1−r)}, with s = (a(c−a)/(c−2))·((2+ε)/(a+ε)). It follows that L_a ≤ A^{1−r} (L_c)^r.
Remark that 1 − r ≥ 0, to deduce that A ≤ 1 and r ≤ a/c, so that L_a ≤ (L_c)^{a/c}. Thus D_a D_b ≤ L_{a+b} ∨ 1 = D_{a+b}, concluding the proof of this lemma. ∎
Let t_c be a point in T and assume that it realizes the maximum distance to the other points t_1, ..., t_c−1. Let ν = (c+ε)/u, μ = (c+ε)/(c−u) and ξ = (c+ε)/ε. The mixing inequality for covariances yields, with these exponents, a bound involving [α_Y(r; u, c−u)]^{ε/(c+ε)}; also
||Y_s^u||_ν ||Y_{t_1} ⋯ Y_{t_{c−u}}||_μ ≤ ||Y_s||_{c+ε}^u Π_{i=1}^{c−u} ||Y_{t_i}||_{c+ε}.
Now for each τ = (t_1, ..., t_{c−1}) consider the index s for which ||Y_s||_{c+ε} takes its maximal value, to see that
Σ_{s∈T} Σ_{τ∈T(s,c−u,r)} ||Y_s||_{c+ε}^u Π_{i=1}^{c−u−1} ||Y_{t_i}||_{c+ε} ≤ 2(c−u) s_r b_r^{c−u−1} Σ_{s∈T} ||Y_s||_{c+ε}^c ≤ 2(c−u) s_r b_r^{c−u−1} M_c(T).
Proof of Theorem 2. Let T = {1, 2, ..., n}, and assume first that c = 2q ≥ 2 is an even integer in assumption (2); denote by C the bound for the sum of the corresponding series. Let r = r(t_1, ..., t_c) be the largest interval among successive points in the sequence {t_1, ..., t_c}, r = t_{m+1} − t_m (r = Max_{1≤i<c} (t_{i+1} − t_i)); we set ν = (c+ε)/m, μ = (c+ε)/(c−m) and ξ = (c+ε)/ε. Using the Davydov inequality for covariances and the Hölder inequality yields the required bound. Lemma 2 implies the result for even integers, and Lemma 1 extends it to arbitrary nonnegative real exponents. ∎
Remark 2. Assume that ||X_t||_{τ+ε} ≤ M; then D(τ, ε, T) ≤ n^{τ/2} M^τ, and if the mixing assumption
Σ_{r=1}^∞ (r+1)^{τ/2−1} [α_X(r)]^{δ/(τ+δ)} < ∞
holds, then Yokoyama (1980) has shown that there is a constant K, not depending on the moments of X, with E|X_1 + ⋯ + X_n|^τ ≤ K M^τ n^{τ/2}. This result relaxes the previous mixing assumption; however the inequality obtained does not have the form of the Rosenthal inequality.
In the case where the random variables are bounded by 1, Yokoyama (1980) shows that E|X_1 + ⋯ + X_n|^τ ≤ K n^{τ/2} (the author gives there a distinct interpolation argument) under the weaker mixing assumption Σ_{r=1}^∞ (r+1)^{τ/2−1} α_X(r) < ∞. Extensions of this result may be found in Nahapetian (1991).
In the case where the random variables are bounded by 1, we use the inequality in the proof of Theorems 1 and 2 with different values of (ξ, μ, ν) to get the following result.
Theorem 3. Set now R(c, ε, T) = Max{Σ_{t∈T} ||Y_t||_∞^c, D(c, ε, T)}; then we have, for any even integer c,
E|X_1 + ⋯ + X_n|^c ≤ const · R(c, ε, T).
Here X is either a random field such that Σ_{r=1}^∞ (r+1)^{cd−du+d−1} [α_X(r; u, v)]^{ε/(2+ε)} < ∞ for u, v ≥ 2 with u + v ≤ c, or X is a random process such that Σ_{r=1}^∞ (r+1)^{c−2} [α_X(r)]^{ε/(2+ε)} < ∞.
All the results indicated for random sequences naturally hold for random fields via evident modifications of the assumptions.
Remark 3. If Y is a vector valued random field with τ-order moments and values in a separable Hilbert space H, Theorem 2 holds using Dehling's inequalities for covariances (in Dehling (1983)) and the development given in Doukhan, León & Portal (1984); note that for any γ ≤ τ + ε, E||Y_t||^γ ≤ (E||Y_t||^{τ+ε})^{γ/(τ+ε)}. Thus it is easily shown that
E||Σ_{t∈T} Y_t||^τ ≤ c |T|^{τ/2} Max_{t∈T} ||Y_t||_{τ+ε}^τ,
extending Yokoyama's result to Hilbert space valued random variables.
Remark 4. Let Y be a φ-mixing stationary and centered random sequence with finite τ-order moments and such that Σ_{r≥1} φ_r^{ε/(2+ε)} < ∞. Ibragimov (1962) proved the inequality E|Y_1 + ⋯ + Y_n|^τ ≤ K n^{τ/2}; the mixing assumption on the convergence rate of the mixing sequence is very weak because the author shows that for ρ-mixing sequences
E|Y_1 + ⋯ + Y_n|^{2+δ} ≤ K (E|Y_1 + ⋯ + Y_n|^2)^{(2+δ)/2}.
Yokoyama (1980) shows that such a result does not directly hold in the strong mixing case. However all the results proved here in the α-mixing case extend to the φ-mixing case for even integers, changing only the corresponding assumptions, putting now ε = 0 and using mixing assumptions analogous to those of Theorems 1 & 2:
Σ_{r=1}^∞ (r+1)^{c−2} [φ_X(r)]^{1/2} < ∞.
The case of bounded random variables given in Theorem 3 is now given, for u, v ≥ 2 and u + v ≤ c, under
Σ_{r=1}^∞ (r+1)^{cd−du+d−1} [φ_X(r; u, v)]^{1/2} < ∞.
We do not know how to extend this result to arbitrary exponents c ≥ 1 because Uteev's interpolation lemma does not work for δ = 0. The same problem arises for Rosenthal inequalities in Orlicz spaces when strong mixing holds. Bulinskii & Doukhan (1987) prove such an inequality only for even numbers.
Applications to W.L.L.N.
Remark 5. If Y is a stationary and centered random sequence, the WLLN holds with the rate n^{−τ/2} for processes with finite (τ+ε)-order moments.
The estimate f_n(x) = (1/n) Σ_{r=1}^n k^{(n)}(x − X_r) estimates the density function of the marginal distribution of the process X. Assume that k is a compactly supported function with integral 1 on the real line, and that k^{(n)} denotes its rescaled version. The bias of such estimates converges to 0, uniformly over compact subsets of ℝ, under a regularity assumption on f. The variance of this estimate is controlled using Theorem 2 applied to the adapted random variables Y_n(x) = k^{(n)}(x − X_n) − E k^{(n)}(x − X_n). The previous results show that S_n(x) = Σ_{r=1}^n Y_r(x) has τ-th order moments of order n^{τ/2}, up to a power of the rescaling factor depending on ε; in the case of φ-mixing set ε = 0 for even integers τ under a suitable mixing assumption. The loss is decreasing for increasing values of ε. A generalization is given for any moment of such estimates in Doukhan (1992).
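The kernel density estimate just discussed can be sketched as follows; the Epanechnikov kernel, the bandwidth n^{−1/5} and the AR(1) data are illustrative assumptions:

```python
import numpy as np

def kde(sample, grid, h):
    """f_n(x) = (1/(n h)) sum_r k((x - X_r)/h), with the Epanechnikov kernel k."""
    u = (grid[:, None] - sample[None, :]) / h
    k = np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u * u), 0.0)  # compact support, integral 1
    return k.mean(axis=1) / h

rng = np.random.default_rng(3)
a, n = 0.5, 50_000
e = rng.normal(size=n)
x = np.empty(n)
x[0] = e[0] / np.sqrt(1.0 - a * a)     # marginal law is N(0, 1/(1 - a^2))
for t in range(1, n):
    x[t] = a * x[t - 1] + e[t]

est = kde(x, np.array([0.0]), h=n ** (-0.2))   # estimate at the single point x = 0
true = 1.0 / np.sqrt(2.0 * np.pi / (1.0 - a * a))
```

Despite the dependence, the estimate at 0 is close to the true marginal density, as the variance control through the moment inequalities suggests.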
Proof of Proposition 1. The proof of Theorem 2 implies, for any integer p, the inequalities
E|Σ_{t=1}^n X_t|^{2p} ≤ C_{2p} Max{nσ², (nσ²)^p} for constants C_{2p} ≤ 8 C(2p) + Σ_{m=1}^{2p−1} C_{2p}^m C_m C_{2p−m},
where C(p) = Σ_{r=1}^∞ (r+1)^{p−2} φ_r^{1/2} < ∞ and σ² = sup_n E X_n² in the uniform mixing case.
Note that σ² ≤ sup_n (E|X_n|^{2+ε})^{2/(2+ε)} in the first case, and combinatorial arguments lead, in the case of a geometric decrease of the mixing sequences (see Doukhan, Portal (1983), Doukhan, León & Portal (1984) and Massart (1987)), to C_{2p} ≤ (A p/(1−a))^{4p} and C_{2p} ≤ (A p)^{4p} respectively, for constants A > 0 and 0 ≤ a < 1. Now Markov's inequality implies that if nσ² ≥ 1, then for a constant B > 0,
ℙ(|Σ_{t=1}^n X_t| ≥ x σ^{1−ε} √n) ≤ (B p²/x)^{2p} in the strong mixing case and
ℙ(|Σ_{t=1}^n X_t| ≥ x σ √n) ≤ (B p²/x)^{2p} in the uniform mixing case.
Optimizing the previous inequalities implies the result; see Doukhan, Portal (1983 a, 1987) and Massart (1987) for more details. ∎
Statuljavichus & Yackimavicius (1989) give very precise similar results using the cumulant sums technique. This method is developed in Nahapetian (1991).
We now indicate a result due to Collomb (1984); the proof of this result is based on a direct grouping argument.
Let |X_t| ≤ 1 be such that Σ_{n=1}^∞ √φ_n < ∞; set b^{−1} = 8(1 + 4 Σ_{n=1}^∞ √φ_n) and a = 2 exp(3√e). If n and x satisfy nσ² ≥ 1 and 0 ≤ x ≤ σ√n/(8 b k_n), for k_n = Inf{k; k ≥ σ√n φ_k}, then
ℙ(|Σ_{t=1}^n X_t| ≥ x √n σ) ≤ a exp{−b x²}.
Remarks 7. A close result is shown in Bosq (1975). This result was extended in Carbon (1983) to the strong mixing case. We do not state it in its complete form because of its complexity. We only present it for examples of decay rates of strong mixing sequences. For any 0 < a < 1, there is some b > 0 such that for n big enough,
a) if α_n ≤ v^{e^n} for 0 ≤ v < 1, then ℙ(|Σ_{t=1}^n X_t| ≥ x √n) ≤ 2 exp{−b n^{1/2−a} x},
b) if α_n ≤ v^n for 0 ≤ v < 1, then ℙ(|Σ_{t=1}^n X_t| ≥ x √n) ≤ 2 exp{−b n^{−a/2} x}, and
c) if α_n ≤ n^{−v} for v > 0, then ℙ(|Σ_{t=1}^n X_t| ≥ x √n) ≤ 2 exp{−b x^{1−a}}.
Assumption (a) seems inadequate but it leads to a good result, while the more frequent case (b) leads to a loss n^ε in the exponential factor and a gain, replacing √x by x. A recent result due to Bosq (1991), using the reconstruction results of § 1.2, sharply improves the inequality of Carbon (1983). The main problem in this inequality seems to be a minorization assumption. The equilibrium is not good for a O(√n) deviation in the case (c).
We now present results which seem much closer to their independent analogues for the case of β-mixing and φ-mixing processes. In order to fix ideas, let us first set the general assumptions. The sequence X_1, X_2, ... is a sequence of real valued random variables satisfying some of the forthcoming assumptions:
(i) ∀ t ∈ ℕ*, E X_t = 0.
(ii) ∃ σ² ∈ ℝ₊*, ∀ n, m ∈ ℕ*: (1/m) E(X_{n+1} + ⋯ + X_{n+m})² ≤ σ².
(iii) ∀ t ∈ ℕ*, |X_t| ≤ M.
(iv) ∀ t ∈ ℕ*, E|X_t|^r ≤ M^r.
Remark 8. (ii) is the only assumption related to stationarity. Note that it holds under (i) and the bound
σ² = (1 + 2 Σ_{n=0}^∞ α_n^{δ/(2+δ)}) M^{2δ/(2+δ)} sup_n (E X_n²)^{2/(2+δ)} < ∞.
If (iii) is not assumed but (iv) holds, then the M^{2δ/(2+δ)} factor must be replaced by M^{2rδ/((r−2)(2+δ))}. We do not present an explicit use of condition (iv); however a truncation argument and the use of the Borel-Cantelli lemma classically yield exponential inequalities under this assumption. An explicit example of this technique is given in Uteev (1985).
Now if (iii) is satisfied and α_n ≤ a e^{−bn} for some positive constants, then (ii) still holds for some constant c only depending on a, b and M, with
σ² = c sup_n E X_n² [1 ∨ ln^w(1/E X_n²)]
for any w > 1. That may be shown using Remark 3 of § 1.2.2 and Bulinskii & Doukhan (1987: Proposition 5; 1990: formula (19)).
A better result may be shown by a direct argument following the previous remark. For this, optimize the previous bound with respect to δ and note that Σ_{n=0}^∞ α_n^{δ/(2+δ)} ≤ const/δ. We get the bound
σ² = const · sup_n E X_n² [1 ∨ ln(1/E X_n²)].
Those assumptions naturally extend to the case of a random field (X_t)_{t∈ℤ^d}; assumption (ii) becomes (1/|A|) E(Σ_{t∈A} X_t)² ≤ σ². The previous remarks still hold. In the case of a ρ-mixing random field, we set σ² = [Σ_{n=0}^∞ c_d n^{d−1} ρ(n; 1, 1)] sup_t E X_t² < ∞, and similarly in the case of an α-mixing random field.
Using now the same tools as Bosq (1991) together with the classical Bernstein inequality, we get the following Bernstein inequality, in which only a loss of ln n is observed in the geometric mixing case.
Theorem 4. Assume the sequence satisfies the β-mixing condition and (i), (ii), (iii). Then for any ε > 0, with θ defined accordingly, and for any nonnegative real number q ≤ n/(1+θ), we have:
A first reduction for the proofs. Set x_t = X_{[t]+1}; the continuous time process x = (x_t) thus obtained satisfies of course the corresponding assumptions: for nonnegative real numbers s and t they now write (i') |x_t| ≤ 1, (ii') E x_t = 0, (iii') (1/t) E(∫_s^{s+t} x_u du)² ≤ σ², and, for any dependence measure c, c_{x,t} ≤ c_{X,[t]−1}. We shall denote by T(u, v) the integral T(u, v) = ∫_u^v x_t dt. The alternative assumption to (iii), namely (iv) E|X_t|^r ≤ 1 for some r > 2, becomes (iv') E|x_t|^r ≤ 1. We shall write a = 1 under assumption (iii) and a = n^{1/r} under assumption (iv). Moreover a rescaling allows us to consider only the case M = 1. This transformation will allow the use of blocking techniques without divisibility precautions.
4 In fact c_d ≥ Max_{n>0} Card{z ∈ ℤ^d : n < ||z|| < n+1} / n^{d−1}; for instance c_d = d 2^{d−1} is a convenient choice when ||z|| = max_{1≤j≤d} |z_j| for z = (z_1, ..., z_d).
Let S_n = ∫_0^n x_u du, and define, for any integer l and any real number θ, q = n/(l(1+θ)); we set
U_i = ∫_{(i−1)q(1+θ)}^{(i−1)q(1+θ)+q} x_u du and V_i = ∫_{(i−1)q(1+θ)+q}^{iq(1+θ)} x_u du, for 1 ≤ i ≤ l.
Then S_n = Σ_{i=1}^l (U_i + V_i). Set A_n = Σ_{i=1}^l U_i and B_n = Σ_{i=1}^l V_i; then for any u, v > 0,
ℙ(|S_n| ≥ u + v) ≤ ℙ(|A_n| ≥ u) + ℙ(|B_n| ≥ v).
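In discrete time, the blocking scheme above can be sketched as follows; the block length q and the gap length (θq in the theorem) are illustrative assumptions:

```python
import numpy as np

def blocks(x, q, gap):
    """Split x into alternating long blocks (length q) and gaps (length gap);
    return the block sums U_i and the gap sums V_i."""
    U, V, i = [], [], 0
    while i + q <= len(x):
        U.append(x[i:i + q].sum())
        V.append(x[i + q:i + q + gap].sum())
        i += q + gap
    return np.array(U), np.array(V)

x = np.arange(20, dtype=float)        # stand-in for X_1, ..., X_n
U, V = blocks(x, q=4, gap=2)
recovered = U.sum() + V.sum()         # the sum over the covered range
```

The decomposition S_n = Σ(U_i + V_i) is exact on the covered range, which is what makes the union bound ℙ(|S_n| ≥ u+v) ≤ ℙ(|A_n| ≥ u) + ℙ(|B_n| ≥ v) available.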
Proof of Theorem 4. In the β-mixing case, apply the reconstruction Theorem 2.1.1 to get independent random sequences (U_i*)_{1≤i≤l}, (V_i*)_{1≤i≤l} such that ℙ(U_i* ≠ U_i) ≤ β_{[θq]−1} and U_i* has the same distribution as U_i (resp. ℙ(V_i* ≠ V_i) ≤ β_{[θq]−1} and V_i* has the same distribution as V_i) (5). Bernstein's inequality applied to those sequences yields, with A_n* = Σ_{i=1}^l U_i* and B_n* = Σ_{i=1}^l V_i*,
ℙ(|A_n*| ≥ u) ≤ 2 exp{−u² / (2(na/(1+θ) + (1/3) q a u))}, and analogously
ℙ(|B_n*| ≥ v) ≤ 2 exp{−v² / (2(nθa/(1+θ) + (1/3) q θ a v))}.
5 For this, use the induction technique and apply Theorem 1.2.1 at each step r, using the random variables X = (U_i*)_{1≤i≤r} and Y = U_{r+1}.
The event Γ on which the starred and unstarred blocks differ then satisfies ℙ(Γ) ≤ 2(n/q) β_{[θq]−1}. Hence
ℙ(|S_n| ≥ x) ≤ (2n/q) β_{[θq]−1} + 4 exp{−x² / (2(nσ² + (1/3) q x))}
for the case of bounded random variables. ∎
The case of a random field follows analogously. Here the mixing coefficient is a function of the cardinality, such that β(n−1; a, b) ≤ (a∧b)^r c_n. Set S_n = Σ_{i_1=1}^{n_1} ⋯ Σ_{i_d=1}^{n_d} X_{i_1,...,i_d} for n = (n_1, ..., n_d). Using the same arguments as before, this may be rewritten S_n = ∫⋯∫ x_{s_1,...,s_d} ds_1 ⋯ ds_d, and now S_n = Σ_{j=1}^{2^d} A_j, where the A_j's are integrals of x_t on unions of l^d rectangles with an area less than or equal to (θ^{p(j)}/2^d) N for j > 1 (with 1 ≤ p(j) ≤ d) and equal to (1 − θ/2)^d N for j = 1, where N = n_1 ⋯ n_d; moreover the previous rectangles are separated by at least θq (for j fixed); thus there are variables (U_{i,j})_{1≤j≤2^d, 1≤i≤l^d} with A_j = Σ_{i=1}^{l^d} U_{i,j}, and for instance
U_{i,1} = ∫_{(i_1−1)(1+θ)q_1}^{(i_1−1)(1+θ)q_1+q_1} ⋯ ∫_{(i_d−1)(1+θ)q_d}^{(i_d−1)(1+θ)q_d+q_d} x_s ds,
if i → (i_1, ..., i_d) is a bijection {1, ..., l^d} → {1, ..., l}^d. That means that the reconstruction techniques still enable us to provide the representation of A_j as A_j* = A_j outside of a set of probability 2^d (N/Q) β([θq]−1; [2^d Q], N), with q_i = n_i/((1+θ)l), Q = q_1 ⋯ q_d and q = min_i q_i.
6 The o(.) terms come from the fact that l = n/((1+θ)q) ∈ ℕ.
The conclusion follows as in the one dimensional case, choosing x = u(1 − 2^d √θ/(θ+1)) for j ≠ 1, and similarly for j = 1; hence λ = 1 − ε for ε small enough.
The following result comes essentially from Lin (1989). The loss with respect to independence is now no more a function of n but only a logarithm of σ^{−1} in the case of a geometric decay of the mixing sequence.
Theorem 5. Assume the sequence satisfies the φ-mixing condition and (i), (ii), (iii); then there are constants a, b > 0 with
a) ℙ(|Σ_{t=1}^n X_t| ≥ x √n) ≤ a exp{−b x²} if the sequence n φ_n is bounded and,
b) ℙ(|Σ_{t=1}^n X_t| ≥ x √n) ≤ a exp{−b x² / (σ² + x M ψ(σ²)/√n)} if lim_{n→∞} n φ_n = 0, where
ψ(t) = Inf{p ∈ ℕ; p φ_p ≤ t}.
Note that if φ_n ≤ u v^n for 0 ≤ v < 1 and u > 0, then ψ(σ²) = C ln σ^{−1}, and if φ_n ≤ u n^{−v} for v > 0, then ψ(σ²) = C σ^{2/(1−v)} for some constant C only depending on u and v. In inequality b), the constants involved do not depend on M and d.
Proof of Theorem 5 (7). Let x be in [0, 1] and q = [√n/x]; we first group the random variables {X_1, ..., X_n} in 2l blocks, with l the integer such that 2l − 1 < n/q ≤ 2l.
Set U_i = Σ_{k=iq+1}^{iq+q} X_k (see (8)), T_n = Σ_i U_{2i} and T_n' = Σ_i U_{2i+1}; it is clear that S_n = T_n + T_n'.
Now, ln E exp{x S_n} ≤ (1/2)(ln E exp{2x T_n} + ln E exp{2x T_n'}). We bound one of these terms, since bounds will be analogous for both (note that stationarity is not assumed, and thus the terms are not equal). Set Z_m = Σ_{i=1}^m U_{2i} and L_m = ln E exp{2x Z_m}; the inequality (3) for covariances in Theorem 3 of § 1.2.2 yields, with p = 1 and q = ∞,
L_{m+1} ≤ L_m + ln(2 φ_q e^{2qx} + sup_{t>0} E exp{2x U_t}), and thus, for 1 ≤ m ≤ n/(2q),
ln E exp{2x T_n} ≤ (n/(2q)) ln(2 φ_q e^{2qx} + sup_{t>0} E exp{2x U_t}).
The first point follows from a rough bound ln E exp{x S_n} ≤ c n x² for 0 ≤ x ≤ 1 and Markov's inequality. For the second point assume first that x ψ(σ²) ≥ 1; then ln E exp{x S_n} ≤ 20 n x² σ². Else, we obtain ln E exp{x S_n} ≤ 20 n x ψ(σ²). ∎
Remark 9. Introducing a disequilibrium between even and odd blocks leads to an inequality of the same form, where b is as close as we want to 1/2, as in the case of Theorem 4. Li has achieved this in a paper written in Chinese, quoted in Lin (1989).
Remark 10. The interest of such sharp inequalities is to get a bounded Law of the Iterated Logarithm,
limsup_{n→∞} |S_n| / √(2 n ln(ln n) σ²) ≤ C a.s.
The Law of the Iterated Logarithm may be found in Nahapetian (1991).
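The bounded Law of the Iterated Logarithm of Remark 10 can be watched on a simulated path; independent ±1 signs (a degenerate case of mixing) and the starting index n ≥ 10 are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.choice([-1.0, 1.0], size=1_000_000)    # bounded, centered signs
s = np.cumsum(x)                               # partial sums S_n
n = np.arange(1, x.size + 1)
mask = n >= 10                                 # ln ln n requires n >= 3; start at 10
norm = np.abs(s[mask]) / np.sqrt(2.0 * n[mask] * np.log(np.log(n[mask])))
peak = norm.max()                              # stays bounded, as the LIL predicts
```

Along the whole path the normalized sums remain of order 1, never drifting to infinity, which is exactly the bounded LIL statement.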
Maximal inequalities are the basic tool to get almost sure asymptotic results, such as Laws of Large Numbers, Laws of the Iterated Logarithm and Strong Invariance Principles. Since martingale maximal inequalities do not work in the weakly dependent case, we present suitable partial inequalities. We first recall the general results in Moricz, Serfling, Stout (1982) for the case of sequences and in Moricz (1983) for the case of random fields. They yield maximal inequalities from the suitable non-maximal inequalities in § 1.4.1 and 1.4.2. Their application in the mixing case is easy. After that, we derive maximal inequalities for mixing processes, generalizing Ottaviani's inequality in Reznick (1968) and proved by Massart (1987).
we get the result with the constant A(n) = (1 − Q^{1/τ} 2^{−(a−1)/τ})^{−τ} if a > 1, and if a = Q = 1 the result is shown by recurrence over v for n = 2^v, using the first inequality and A(n) = (1 + ln n/ln 2).
b) Choose q > 1 with B = χ(q) small enough, and set A = 2^p for 1/p + 1/q = 1; the result holds for n = 1; assuming that it holds for n < N we prove it for n = N.
For 1 ≤ m ≤ N, we have E e^{tM(1,N)} ≤ E e^{tM(1,m−1)} + E e^{t|S(1,m)|} e^{tM(m+1,N)}, and we choose m as in a). Hölder's inequality implies, with the induction hypothesis,
E e^{tM(1,m−1)} ≤ (E e^{tqM(1,m−1)})^{1/q} ≤ (A K e^{B φ(qt) g(1,m−1)})^{1/q} ≤ A^{1/q} K e^{B Q φ(qt) g(1,N)/(2q)} ≤ A^{1/q} K e^{B φ(t) g(1,N)/2},
using the definition of q, and analogously,
E e^{t|S(1,m)|} e^{tM(m+1,N)} ≤ (E e^{tp|S(1,m)|})^{1/p} (E e^{tqM(m+1,N)})^{1/q}
≤ (K e^{φ(pt) g(1,m)})^{1/p} (A K e^{B φ(qt) g(m+1,N)})^{1/q}
≤ A^{1/q} K e^{B g(1,N)[φ(pt)/(pB) + 2φ(qt)/(qQ)]}
≤ A^{1/q} K e^{B φ(t) g(1,N)[χ(p)/(pB) + χ(q)/(qQ)]}.
c) The proof is similar and may be found in Moricz, Serfling, Stout (1982). ∎
Let X be a random field and M(R) = Max_{k≤m} |S(R(b, k))|. We say that a function f defined on rectangles is subadditive if f(R) ≥ 0 and f(R_1) + f(R_2) ≤ f(R) for rectangles R, R_1 and R_2 with R_1 = R(b, m') and R_2 = R(b', m'') if R = R(b, m), and where we define b', m' and m'' as b' = (b_1, ..., b_{j−1}, b_j + p_j, b_{j+1}, ..., b_d), m' = (m_1, ..., m_{j−1}, p_j, m_{j+1}, ..., m_d) and m'' = (m_1, ..., m_{j−1}, m_j − p_j, m_{j+1}, ..., m_d) for 1 ≤ p_j < m_j, 1 ≤ j ≤ d.
E[M(R)]^τ ≤ 3^{d(τ−1)} f(R) (h(f(R), m))^τ and E[M(R)]^τ ≤ (3/2)^τ f(R) (h(f(R), m))^τ, with h built on the products ln₂ m_1 ⋯ ln₂ m_d.
Applications to S.L.L.N.
Remark 12. If Y is a stationary and centered random sequence, then the SLLN holds with the rate n^{−τ/2} for processes with finite (τ+ε)-order moments. The loss with respect to the independent case is small for big τ and small ε.
Theorem 8 is stated under the same assumptions: if either the sequence is φ-mixing, or it is α-mixing with α_n = O(n^{−r}) for some r > 2 + δ, then there exist positive constants C, c with
ℙ(max_{1≤k≤n} |S_k| > x) ≤ ℙ(|S_n| > x − 2y√n) + C n^{−t}, for x > 0, y > c,
and respectively t = δ/2 in the φ-mixing case and 0 < t ≤ r(2+δ)/(2(2+δ+r)) in the α-mixing case.
Proof of Theorem 8. In the φ-mixing case let m = min{n; φ_n ≤ 1/4} and C = 4(2 m^{2+δ}/c) E|X_1|^{2+δ}, and in the α-mixing case let β = (δ − 2t)/(2(2+δ)) and s = (1+t)/β (s < r).
Doob (1953), Ibragimov and Yokoyama (1980), and Yoshihara (1976) proved inequalities for
arbitrary moments of sums of <1>, p or a-mixing sequences in a form where constants are not
explicit or not useful to work with triangular arrays. Billingsley (1968) obtained the Rosenthal
bound for the fourth order moment of a <I>-mixing sequence. The same technique led Doukhan
& Portal (1983 a, 1987) and independently Uteev (1985) to obtain the Rosenthal inequality for
strong mixing sequences. This was extended to random fields in Doukhan, Leon & Portal
44 Mixing
(1984) and Orlicz space analogues are proved in Bulinskii & Doukhan (1987). The case of the
variance of sums in random fields is worked out in Neaderhouser (1978).
The first exponential inequality was probably given in Blum, Hanson and Koopmanns (1963).
They proved a Hoeffding's inequality for 'V-mixing random variables. Bosq (1975) obtained a
first <j>-mixing exponential inequality who proposed exponential inequalities through a direct
proof. It was improved in Collomb (1984) and in Carbon (1983). Gyarfi et al. (1989) use them
extensively for non-parametric curve estimation. After that, Bosq (1991) used the
reconstruction results to get a sharp exponential inequality . The same trick is used here to get
l3-mixing Bernstein inequalities. In every previously recalled work, the exponential inequalities
are proved using grouping arguments. The Rosenthal moment inequalities led Doukhan & Portal (1983) to a uniform exponential inequality presented in Proposition 1; this inequality is extended to the case of random fields in Doukhan, León & Portal (1984) and improved in the strong mixing case in Massart (1987). Analogous results are given in Utev (1985). At present Lin Zhengyan (1989) seems to give the best exponential inequalities known for strong mixing or φ-mixing sequences.
General maximal inequalities are given in Móricz (1983), Móricz, Serfling & Stout (1982) and Serfling (1968). Ottaviani inequalities for mixing sequences proved in Reznick (1968) may be found in weaker forms in Rosenblatt (1956), Wolkonski & Rozanov (1959) and Ibragimov (1962), or in Billingsley (1968), Iosifescu & Teodorescu (1969), McLeish (1975), Yoshihara (1978), Hall & Heyde (1980), Yokoyama (1980) and Utev (1984). They are improved in Doukhan & León (1987) and Massart (1987). Recall also that Nahapetian (1991) proves various limit results which develop the ones proposed here.
Properties: Central Limit Theorems
(1) the sequence X is φ-mixing, Σ_{n=0}^∞ φ_n^{1/2} < ∞ and E|X_1|² < ∞, or
(2) the sequence X is φ-mixing, Σ_{n=0}^∞ φ_n^{(δ+1)/(2+δ)} < ∞ and E|X_1|^{2+δ} < ∞ for some δ > 0, or
(3) the sequence X is α-mixing, Σ_{n=0}^∞ α_n^{δ/(2+δ)} < ∞ and E|X_1|^{2+δ} < ∞ for some δ > 0.
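As a quick numerical illustration (ours, not the book's), under a polynomial decay α_n = O(n^{-a}) — an assumed rate for the sake of example — condition (3) reduces to a δ/(2+δ) > 1:

```python
# Sketch (ours): for polynomial decay alpha_n ~ n**(-a), condition (3),
#   sum_n alpha_n**(delta/(2+delta)) < infinity,
# holds exactly when a*delta/(2+delta) > 1, since sum n**(-p) converges iff p > 1.

def condition3_exponent(a, delta):
    """Exponent of n in alpha_n**(delta/(2+delta)) when alpha_n = n**(-a)."""
    return a * delta / (2.0 + delta)

def condition3_holds(a, delta):
    return condition3_exponent(a, delta) > 1.0

# With delta = 1 the series converges iff a > 3.
print(condition3_holds(4.0, 1.0))   # True
print(condition3_holds(2.0, 1.0))   # False
```

Note that for δ = 1 this recovers the threshold a > 3 which reappears later for bounded variables.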
The proofs rely on the expression σ_n² = n E|X_1|² + 2 Σ_{k=1}^{n−1} (n − k) E X_1 X_{k+1} and on the covariance inequalities of § 1.2.1.
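The variance expansion above is elementary and can be checked directly; the sketch below (our own illustration, using the exact AR(1) autocovariance γ(k) = φ^{|k|}/(1−φ²) as an assumed example model) compares it with a brute-force double sum over the covariance matrix.

```python
# Illustration (ours, not the book's): check the identity
#   Var S_n = n*gamma(0) + 2*sum_{k=1}^{n-1} (n-k)*gamma(k)
# against the direct double sum  sum_{i,j} gamma(i-j),
# for the AR(1) autocovariance gamma(k) = phi**|k| / (1 - phi**2).
phi, n = 0.6, 50
gamma = lambda k: phi ** abs(k) / (1 - phi ** 2)

closed = n * gamma(0) + 2 * sum((n - k) * gamma(k) for k in range(1, n))
direct = sum(gamma(i - j) for i in range(n) for j in range(n))
print(abs(closed - direct) < 1e-9)  # True
```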
Assume that σ² > 0; then Ibragimov (1962) shows that the CLT holds under condition (3). Moreover Ibragimov (1975) proves the CLT if σ_n² = n L(n) for some slowly varying function L and either Σ_{n=0}^∞ ρ_{2^n} < ∞ or E|X_1|^{2+δ} < ∞ for some δ > 0; he conjectures that the CLT still holds if the ρ-mixing assumption replaces the last one. Billingsley (1968) shows that the CLT holds under condition (1). McLeish (1975 b) shows that an invariance principle holds under condition (2). We also recall the results of Serfling (1968) based on martingale techniques. Let X = (X_t)_{t∈Z} be a process centered at expectation and such that E|X_{a+1} + … + X_{a+n}|² behaves like nA for big n, uniformly with respect to a. Then the mixing assumption implies the CLT.
Remark 1. Note that Bradley (1981 b) provides a sufficient condition for such linear growth of the partial sums of a stationary process. Let (X_t) be a stationary real valued process and set as usual S_n = Σ_{t=1}^n X_t. Set now r_k = sup_{Y,Z} |Corr(Y, Z)| for the maximal correlation of random variables Y = Σ_{t≤T} a_t X_t and Z = Σ_{t≥T+k} b_t X_t defined with the help of real valued sequences a_t and b_t with a finite number of nonzero elements. Then a sufficient condition for n^{−1} Var S_n to converge to a positive limit is that r_k < 1 for some k; this assumption is easily provided for ρ-mixing processes since r_k ≤ ρ_k (see § 1.3).
A result due to Oodaira and Yoshihara (1972) (see the bibliographical comments for the paternity of Theorem 1) relaxes the previous conditions. It makes use of the deep ergodic theoretic argument of Gordin (1969 a).
Theorem 1. If Σ_{n=0}^∞ α_n^{δ/(2+δ)} < ∞, E|X_1|^{2+δ} < ∞ for some δ > 0, and σ² = E|X_1|² + 2 Σ_{k=2}^∞ E X_1 X_k > 0, then the sequence of processes {S_[nt]/(σ√n); t ∈ [0, 1]} converges in the Skorohod topology to a standard Brownian motion W on [0, 1].
Recall that the Donsker CLT holds for any independent and identically distributed sequence if the random variables have finite variance, see Petrov (1975). Note that no strong mixing condition (1) implies the CLT without an assumption like E|X_1|^{2+δ} < ∞ for some δ > 0. That is quite natural in view of the inhomogeneous covariance inequalities in § 1.2.2, and Herrndorf (1983) gives an example of a strongly mixing stationary sequence with arbitrarily fast decay of the mixing sequence, E|X_1|² < ∞, and such that the CLT does not hold.
It is still possible to generalize the moment assumption E|X_1|^{2+δ} < ∞ to E Ψ(|X_1|) < ∞ for convex functions Ψ with θ(x) = x^{−2} Ψ(x) increasing and lim_{|x|→∞} θ(x) = ∞. Under the mixing condition Σ_{n=0}^∞ α_n Ψ^{−1}(α_n) < ∞, the CLT holds (2). If Ψ(x) = x² ln^a(max{x, b}) (with b > b(a) big enough), then the CLT holds if a > 1 under the assumption of a geometric decay of the mixing sequence. Now if the function Ψ increases very fast at infinity, the condition Σ_{n=0}^∞ α_n Ψ^{−1}(α_n) < ∞ is as close as wanted to the mixing condition for bounded random variables, Σ_{n=0}^∞ α_n < ∞, since Ψ^{−1} has a very slow behavior at 0 (e.g. Ψ(x) = e^{ax} yields Ψ^{−1}(u) = a^{−1} ln u). As a final consideration on this topic, a necessary and sufficient condition
1 The sense of a strong mixing condition is the one defined in § 1.3.2. This is Ibragimov's conjecture.
2 Use the result in Herrndorf (1983) and the covariance inequality in Orlicz spaces from Bulinskii (1987,
1989).
for the CLT to hold for a strong mixing process is given by Jakubowski & Szewczak (1990). It is based on Gordin's argument. We do not recall it since its assumptions cannot be checked using only the existence of moments and the decay of the mixing sequence.
In Doukhan, Massart & Rio (1994), we give an optimal result [see (3)] under the strong mixing assumption, at least for an arithmetic decay of the mixing sequence. The CLT problem is described under alternative mixing assumptions in Nahapetian (1991). For instance, it holds under ψ-mixing under the additional assumption σ_n² ≥ cn.
The case of a ρ-mixing sequence is of interest since very weak assumptions are needed for the CLT to hold.
b) If Σ_{n=1}^∞ ρ_n < ∞, X has a continuous spectral density f, σ_n² = 2πn f(0)(1 + o(1)) and f(0) ≠ 0, then the sequence S_n/σ_n converges to a standard Gaussian random variable.
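The relation σ_n² = 2πn f(0)(1 + o(1)) can be illustrated numerically (our own sketch, for an assumed AR(1) example with unit innovations, where 2π f(0) = Σ_k γ(k) = 1/(1−φ)²):

```python
# Sketch (ours): for AR(1), X_t = phi*X_{t-1} + e_t with unit innovation variance,
# gamma(k) = phi**|k|/(1 - phi**2) and the spectral density satisfies
#   2*pi*f(0) = sum_k gamma(k) = 1/(1 - phi)**2,
# so Var(S_n)/n should approach this limit.
phi = 0.5
gamma = lambda k: phi ** abs(k) / (1 - phi ** 2)
limit = 1.0 / (1 - phi) ** 2          # 2*pi*f(0) = 4 here

def var_sn_over_n(n):
    return (n * gamma(0) + 2 * sum((n - k) * gamma(k) for k in range(1, n))) / n

print(round(limit, 6))                           # 4.0
print(abs(var_sn_over_n(5000) - limit) < 0.01)   # True
```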
If X = (X_t)_{t∈Z^d} is a real valued, stationary, centered at expectation and mixing random field, things do not work the same way. Let A_n be an increasing sequence of finite subsets of Z^d. Set S_n = Σ_{t∈A_n} X_t; now σ_n² = E|S_n|² behaves like |A_n| if the regularity condition
(4) lim_{n→∞} |∂A_n|/|A_n| = 0
holds [see (4)], as well as a condition analogous to (2): the field X is α-mixing, α_{n,1,∞} = o(n^{−d}), E|X_1|^{2+δ} < ∞ for some δ > 0 and, for a + b ≤ 4,
3 Set Q(u) = inf{t; P(|X_0| > t) ≤ u} and let α^{−1} denote the inverse function of the mixing sequence; then the condition ∫_0^1 α^{−1}(u) Q²(u) du < ∞ implies the conclusion of Theorem 1. Under the moment assumption of Theorem 1, the Donsker CLT holds if Σ_{n=1}^∞ n^{2/δ} α_n < ∞. This assumption is weaker than the previous one. If E Ψ(|X_1|) < ∞ then the functional CLT holds as long as Σ_{n=1}^∞ ζ(n) α_n < ∞, where ζ denotes the inverse function of Ψ.
4 It means that the sequence A_n does not increase in only one direction; ∂A_n denotes the boundary of A_n.
Σ_{n=0}^∞ n^{d−1} α_{n,a,b} < ∞.
Remarks 2. The case of random fields is considered in Gorodetskii (1984) for various mixing conditions. Bradley (1992) proves a CLT for a centered and weakly stationary random field X without any mixing rate assumption. Assume that ρ_X(n) → 0 and that the spectral density f of the random field satisfies a suitable condition; then Σ_{[1,n]^d} X_t / ||Σ_{[1,n]^d} X_t||_2 is asymptotically normal. A sufficient condition for the second assumption to hold is r(1) < 1 and r(n) → 0, where we denote (see Remark 1) r(n) = sup_{Y,Z} |Corr(Y, Z)| with Y = Σ_U a_t X_t and Z = Σ_V b_t X_t for finite subsets U and V of Z^d distant at least n.
The assumption of stationarity has mainly a technical origin and it seems that the previous results may be extended to the non stationary situation. For instance Guyon (1992) extends Theorem 3 to the non stationary case. The following result is also a consequence of Theorem 9.6 in Bulinskii (1990, p. 84).
Theorem 3bis. Assume that there is a δ > 0 such that the following conditions hold:
Σ_{n=0}^∞ n^{d−1} α_{n,1,1}^{δ/(2+δ)} < ∞, and
for a + b ≤ 4, Σ_{n=0}^∞ n^{d−1} α_{n,a,b} < ∞;
then limsup_{n→∞} |A_n|^{−1} Σ_{i,j∈A_n} |Cov(X_i, X_j)| < ∞.
If moreover liminf_{n→∞} |A_n|^{−1} σ_n² > 0, then the sequence S_n/σ_n converges to a standard Gaussian random variable.
0 < σ² < ∞.
If α_n = O(n^{−β(2+δ)(1+δ)/δ²}) for some β > 0 then Δ_n = O(n^{−δ(β−1)/(2(β+1))}).
If α_n = O(e^{−βn}) for some β > 0 then Δ_n = O(ln^{1+δ} n · n^{−δ/2}).
If ρ_n = O(e^{−βn}) for some β > 0 then Δ_n = O(ln^{1+δ/2} n · n^{−(δ∧1)/2}).
If the mixing coefficients satisfy α(n; u, v) = O(e^{−an}) uniformly with respect to u, v and E|X_t|^{2+δ} < ∞, then Guyon & Richardson (1984, Theorem 2) prove a bound for Δ_n, defined as previously.
If the strong mixing coefficients satisfy α(n; u, v) = O(n^{−a}) uniformly with respect to u, v and E|X_t|^{2+δ} < ∞, then Guyon & Richardson (1984, Theorem 3) prove that Δ_n = O(n^{−χ}) with χ = (δ∧1)·2(b−1)/(2b+δ∧1) and b = aδ(δ∧1)/(2d(2+δ)((δ+1)∧2)).
Bulinskii (1987) proves that, in the case where the mixing sequence α(n; u, v) depends in a polynomial way on u and v, the results are analogous. This extension includes the case of a linear growth with respect to u and v proved in Takahata (1983).
In Bulinskii & Doukhan (1990), we investigate the case of low moment assumptions of the kind E|X_t| ln^δ(|X_t|∨1) < ∞. For geometrically mixing stationary random fields with α(n; u, v) ≤ (u+v)^b e^{−an} we get Δ_n = O(ln^{−δ/2} σ_n). In the ψ-mixing case with ψ(n; u, v) ≤ (u+v)^b e^{−an}, the convergence rate obtained is the one given in the independent and identically distributed case by Petrov (1975), Δ_n = O(ln^{−δ} σ_n). Note that such convergence rates are really useful. Indeed, as remarked in Reznick (1968), the rate Δ_n = O(ln^{−(1+ε)} n) yields the LIL for ε > 0. This idea is also used for instance in Dehling (1983) and in Doukhan & León (1989). Finally the results essentially still extend to the non stationary case as shown independently in Bulinskii (1990) and Guyon (1992).
In this section we consider a generic case. Let Z(n) = (Z_t^{(n)})_{t∈T} be a sequence of R^d valued random processes; their finite dimensional distributions take the form Z(n){t_1, …, t_h} = (Z_{t_1}^{(n)}, …, Z_{t_h}^{(n)}). We shall make use of the Prohorov metric (5), π, between two probability distributions, which is compatible with the convergence in distribution. The classical proofs of functional limit theorems (in distribution) always have two steps (6). The first one is the convergence of the finite dimensional marginal distributions and the second is the tightness of the sequence of processes considered. The problem is analogous if one wants to get a rate of convergence. T is a Polish space; assume that for any finite subset U of T, Z(n){U} converges to the finite dimensional distribution Z{U} of some continuous process Z. The distance in the metric of uniform convergence between Z(n) and Z is bounded by the oscillation ω(ε) of both processes on balls of T with radius ε > 0, and the sup of the distances of their finite dimensional marginal distributions over subsets U with cardinal h = h(ε) such that h balls with radius ε cover the set T:
π(Z(n), Z) ≤ sup_U π(Z(n){U}, Z{U}) + 2ω(ε).
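On a finite metric space the Prohorov distance of footnote (5), π(P, Q) = inf{ε; P(A) ≤ Q(A^ε) + ε for every A}, can be computed by brute force; the sketch below is our own illustration (the grid search over ε is an implementation choice, not part of the text):

```python
from itertools import combinations

# Brute-force Prohorov distance on a finite metric space (our sketch):
#   pi(P, Q) = inf{eps; P(A) <= Q(A^eps) + eps for all subsets A},
# with A^eps = {x; d(x, A) <= eps}.  We scan candidate eps values.
def prohorov(points, d, P, Q, steps=1000):
    n = len(points)
    subsets = [c for r in range(1, n + 1) for c in combinations(range(n), r)]
    cands = sorted({d(p, q) for p in points for q in points}
                   | {i / steps for i in range(steps + 1)})
    for eps in cands:
        if all(sum(P[i] for i in A)
               <= sum(Q[j] for j in range(n)
                      if min(d(points[j], points[a]) for a in A) <= eps) + eps
               for A in subsets):
            return eps
    return 1.0

dist = lambda x, y: abs(x - y)
print(prohorov([0.0, 1.0], dist, [0.5, 0.5], [0.7, 0.3]))  # 0.2
print(prohorov([0.0, 1.0], dist, [1.0, 0.0], [0.0, 1.0]))  # 1.0
```

The one-sided condition suffices here since the resulting infimum is symmetric in P and Q.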
Let Z = (Z_t)_{t∈T} be an R^d valued and strong mixing random process. The finite dimensional distributions of Z take the form X = (X_1, …, X_h) = (Z_{t_1}, …, Z_{t_h}), where E|X_i|^{2+δ} ≤ M^{2+δ} if, for any t in T, E|Z_t|^{2+δ} ≤ M^{2+δ}. Let F = (F_j)_{j≥1} be a filtration such that for some C > 0 and 0 ≤ a < 1,
(1) α_n = sup_j α(F_1^j, F_{j+n}^∞) ≤ C a^n,
where F_i^j denotes the σ-algebra generated by F_i, F_{i+1}, …, F_j. Now X_j^k is an R^k-valued (equipped with its Euclidean norm) and F_j-measurable random vector such that the process (X_j^k)_{j≥1} is stationary for any k and such that, for some δ > 0,
(2) ||X_j^k||_{2+δ} ≤ A_k where A_k ≥ a_0 > 0.
This implies that, for some constant C only depending on the mixing coefficients, and any finite subset I of N [see (7)],
(3) E|Σ_{i∈I} X_i^k|^{2+γ} ≤ C {Card(I)}^{1+γ/2} A_k^{2+γ} if δ > γ.
5 The Prohorov distance between two probability distributions P, Q on a Polish space E metrized by d is defined, if A^ε = {x ∈ E; d(x, A) ≤ ε}, by π(P, Q) = inf{ε; ∀A, P(A) ≤ Q(A^ε) + ε}. Borel subsets A may be restricted to the closed ones.
6 Except for Hungarian constructions; see Komlós et al. (1975).
7 In view of Theorem 1.4.2, an assumption less restrictive than (1) for (3) to hold is Σ_{r=1}^∞ r² α_r^{ε/(4+ε)} < ∞ for some ε < δ − γ.
Denote by π_k the Prohorov metric between two probability distributions on R^k equipped with its Euclidean norm. In this section we bound π(n) = π_{k(n)}(D(S_n^{k(n)}), N_{k(n)}(0, Σ_{k(n)})), where N_k(0, Σ) denotes the k dimensional Gaussian probability distribution with covariance Σ. Define two sequences converging to infinity, p = p(n) and q = q(n), and set l = [n/(p+q)]. We shall omit the index n, writing k for k(n), S = S_n^{k(n)} and A for A_{k(n)}; c will denote a positive constant which may change from one inequality to another. Also π_k(D(A), D(B)) will simply be denoted by π_k(A, B) for two random variables A and B with distributions D(A) and D(B).
We may group the random variables p^{−1/2} X_j^k in such a way that S = U + V, where V is the sum of less than lq + q such terms and U = Σ_{j=1}^l U_j is the sum of q-distant random variables such that U_j has the same distribution as U_1 = S_p. Using the Berbee & Bradley reconstruction results of § 1.2.1, we consider the sum of independent and identically distributed random variables U* = Σ_{j=1}^l U_j*, where U_j* has the same distribution as U_1 and, if γ = δ − ε [see (8)],
(4) P(|U_j − U_j*| ≥ e) ≤ 18 (α_q² (E|U_j|^{2+γ})^{1/(2+γ)} / e)^{(2+γ)/(2γ+5)}.
If we balance both terms we get
π_k(U_j*, U_j) ≤ c (α_q² (E|U_j|^{2+γ})^{1/(2+γ)})^{(2+γ)/{(2γ+5)(3γ+7)}}.
Now π_k(U*, U) ≤ l π_k(U_j*, U_j), thus according to (3) (resp. see (9))
(5) π_k(U*, U) ≤ c (l^4 α_q² A²)^{(2+γ)/{(2γ+5)(3γ+7)}}.
In the same way π_k(S, U) ≤ (E|V|^{2+γ})^{1/(3+γ)}:
(6) π_k(S, U) ≤ c [A² (q/p + 1/l)]^{(2+γ)/{2(3+γ)}}.
Let W, W′, W″ be respectively N_k(0, Σ), N_k(0, Σ_k^p) and N_k(0, (lp/n) Σ_k^p) random variables. Following Doukhan, León & Portal (1985) yields π_k(W, W″) ≤ (1 − lp/n) ||Σ_k^p||_1 + ||Σ_k − Σ_k^p||_1 and
(7)
Assuming that (q/p) A² is bounded and p² = o(qn), we see that the term (6) is negligible with respect to the term (7). A truncated version of Yurinskii's result, given in Massart (1987), yields π_k(U*, W″) ≤ c [ln^{1/2} n + ln^{1/2}(l E|U_1|^{2+γ})] (k l E|U_1|^{2+γ})^{1/4} if 0 < γ ≤ 1:
(8) π_k(U*, W″) ≤ c ln^{1/2} n k^{1/4} n^{−γ/8} A^{(2+γ)/4}.
Thus
(9) π(n) ≤ c {ln^{1/2} n k^{1/4} n^{−γ/8} A^{(2+γ)/4} + (q/p)^{(2+γ)/(2(3+γ))}} + π_k(U*, U).
Set q = [Q ln n] and p = [{n^γ k^{−2} A^{2(2+γ)}}^{(3+γ)/(8+7γ+γ²)}]; then if Q is big enough the term (5) is negligible and
(10) π(n) ≤ c ln^{1/2} n {n^{−γ/2} k A^{2(1+γ)}}^{(2+γ)/(8+7γ+γ²)}.
Theorem 5. Assume (1), (2), and (4). If lim_{n→∞} n^{−u} k^{2(3+γ)} A^{2(2+γ)(3+γ)} = 0 for some u > (8+7γ+γ²)/2, then (10) holds. For γ = 1, if lim_{n→∞} n^{−u} k² A^6 = 0 for some u > 1, then
π(n) ≤ c ln^{1/2} n n^{−3/32} k^{3/16} A^{3/4}.
Remark 3. The control of π(n) is effective if λ + 2(2+γ)μ < (3+γ)/2 when k(n) = [n^λ] and A_{k(n)} = [n^μ]. If γ = 1 this condition is rewritten λ + 6μ < 2.
If one gets a coordinatewise control of X_j^k = (X_{j,1}^k, …, X_{j,k}^k)_{j≥1}, that is ||X_{j,h}^k||_{2+δ} ≤ M for h = 1, …, k, then A_k ≤ k^{1/2} M and the last condition is rewritten λ < (3+γ)/[2(2+γ)]. If γ = 1, λ < 1/2: the dimension must increase more slowly than √n.
In the independent case Yurinskii (1977) obtains π(n) ≤ c ln^{1/2} n k^{5/8} n^{−1/8}, while the result here writes π(n) ≤ c ln^{1/2} n k^{9/16} n^{−3/32}.
We now present generalizations of Theorem 5 which also extend the results in Doukhan, León & Portal (1985).
Arithmetic mixing cases. According to the footnote before inequality (3), we have to assume the convergence of the series Σ_{r=1}^∞ r² α_r^{ε/(4+ε)} < ∞ for some ε < δ − γ.
α-mixing: If q^a α_q is a bounded sequence, set q = [n^h] with h = (3γ+7)(2γ+5)/(4a(2+γ)); then a factor n^{h(2+γ)/(2(3+γ))} appears in inequality (10), that is, for γ = 1, a factor n^{35/(12a)}. If the random variables are bounded (δ = ∞), we also have to assume a > 3.
In the same way, if q^b β_q is a bounded sequence, a factor n^{3/(8b)} appears. If the random variables are bounded (δ = ∞) we also have to assume b > 3.
Σ_{k=0}^∞ ρ_k < ∞. A good bibliography is given in Hall & Heyde (1980) and Peligrad (1986). Rates of convergence in the CLT are given by Tikhomirov (1980), Guyon & Richardson (1984), Bulinskii (1989) and Bulinskii & Doukhan (1990). Note that invariance principles are reported in Philipp (1985). Such results obviously yield convergence rates in the functional CLT. Unfortunately, up to now, they are far from being optimal. In the independent and identically distributed case, recall that Komlós, Major & Tusnády (1975) make explicit the strong rate for the invariance principle; it is close to n^{−1/2}.
The case of the CLT with explicit dependence on the dimension originated in Yurinskii (1977) for the independent and identically distributed case and was studied by Dehling (1983), Doukhan & Portal (1983, 1987), Doukhan, León & Portal (1984, 1985) and Massart (1987). It gives rise to invariance principles for the empirical measure. See Philipp (1986) for a bibliography and, for instance, Doukhan & León (1989), Doukhan, León & Portal (1987), or Massart (1987) for additional convergence rates. We do not recall here the functional CLT results obtained this way for the empirical measure.
2. Examples
Our aim in this part is to make clear the mixing properties of random processes and fields classically used in probability and statistics. Reviews concerning examples of mixing sequences and fields are given in Bradley (1986), in Iosifescu (1980) and in Roussas & Ioannides (1987). Unnatural examples and counterexamples will not be considered. As far as possible, we present explicit conditions on the parameters of the proposed models for a given decay of the mixing coefficients to hold. In the discrete time Markov case, for instance, we shall not make use of the properties of potential kernels because they are practically impossible to verify. In order to avoid repetitions, we shall consider in the same section random processes and fields of the same kind.
Section 2.1 contains an extension of the results in Wolkonski and Rozanov (1959, 1961) communicated by Ibragimov and published in Doukhan & Guyon (1991). It concerns Gaussian random fields.
Section 2.2 contains part of the results in Guyon (1986) and Georgii (1988) stemming from Dobrushin (1970), Künsch (1982), and Föllmer (1988). We present there the case of mixing Gibbs random fields.
Section 2.3 presents in detail an extension of Gorodetskii's (1977) result published in Doukhan & Guyon (1991) and concerning the mixing properties of linear random fields.
Section 2.4 contains properties of Markov chains with general state spaces. Davydov (1973) characterized the mixing properties of Markov processes. After a quick review of general results, we give sufficient conditions for geometric φ-mixing deduced from Doob (1953) [see (1)] and for geometric β-mixing from Mokkadem (1985-1990) [see (2)]. As shown in subsequent subsections, those results lead to results concerning the most familiar time series, see Mokkadem (1986, 1987), as well as their natural generalizations, see Doukhan & Tsybakov (1993) and Ango Nze (1992).
Section 2.5 gives some properties of continuous time processes. We mainly consider Markov processes and diffusions. After this, we point out the properties of hypermixing, see Chiyonobu & Kusuoka (1988). Those properties lead to a satisfactory large deviation principle. They are linked to the mixing properties of Gaussian processes and to more analytic properties of Markov processes such as hypercontractivity, ultracontractivity, the logarithmic Sobolev inequality and the spectral gap. In this case, the results in Bakry & Emery (1985) yield simple and explicit sufficient conditions for hypermixing.
We consider, in this part, a random field X, that is a family of random variables X = (X_t)_{t∈T} indexed by some metric space T and defined on a probability space (Ω, A, P), taking values in a measurable space (E, ℰ) (E is called the state space); usually E = R and ℰ = B(R). We shall denote by P(Ω, A) the set of probability measures on (Ω, A) and by M(Ω, A) the set of measures on (Ω, A). If T = R, N or Z then X is a process, otherwise it is a general random field.
1 Doob provides sufficient conditions for Doeblin recurrence of a Markov chain. φ-mixing was not defined there but in Ibragimov (1962).
2 The general theory of R-recurrence for Markov chains from Tweedie (1974) led Nummelin & Tuominen (1982) to sufficient geometric ergodicity conditions involving additional irreducibility assumptions. Mokkadem obtains sufficient conditions for those assumptions. Markov chains are thus shown to be geometrically β-mixing.
Examples: Gaussian Fields
After some general results relating the mixing properties of a Gaussian random field, we propose an explicit bound of the mixing coefficients of such a random field based on the approximation properties of its spectral density in § 2.1.1. In § 2.1.2 more precise results characterize the decay of such coefficients for Gaussian processes. In this chapter, X = (X_t)_{t∈T} denotes a stationary Gaussian random field indexed by some metric group T.
The first result proves that the φ-mixing condition is in this case highly restrictive.
Proof. We only need to prove the right member inequality for α_X(A, B) ≤ 1/4. Let ε > 0. There exist normalized Gaussian random variables x and y, measurable with respect to X_A and X_B, such that r = E xy ≥ ρ_X(A, B) − ε. For this use Lemma 2.1.1, proved independently below. Set U = {x > 0} and V = {y > 0}. A direct computation yields
P(U∩V) = 1/4 + (Arcsin r)/(2π) and P(U) P(V) = 1/4.
The inequality (Arcsin r)/(2π) ≤ α_X(A, B) follows. Hence we have proved that
ρ_X(A, B) − ε ≤ r ≤ sin(2π α_X(A, B)) ≤ 2π α_X(A, B). ∎
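The orthant identity P(U∩V) = 1/4 + (Arcsin r)/(2π) used in the proof is easy to check by simulation; the following Monte Carlo sketch is our own illustration (sample size and seed are arbitrary choices):

```python
import numpy as np

# Monte Carlo check (ours) of the Gaussian orthant identity:
# for standardized jointly Gaussian (x, y) with correlation r,
#   P(x > 0, y > 0) = 1/4 + arcsin(r)/(2*pi).
rng = np.random.default_rng(0)
r, n = 0.5, 400_000
x = rng.standard_normal(n)
z = rng.standard_normal(n)
y = r * x + np.sqrt(1 - r * r) * z          # Corr(x, y) = r
mc = np.mean((x > 0) & (y > 0))
exact = 0.25 + np.arcsin(r) / (2 * np.pi)   # = 1/3 for r = 1/2
print(abs(mc - exact) < 0.01)               # True
```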
Let g be a continuous function which does not vanish on the torus T^d = [0, 2π]^d, with Fourier expansion g(z) = Σ_{t∈Z^d} g_t z^t. Let Z = (Z_t)_{t∈Z^d} be a Gaussian white noise. Consider X = (X_t)_{t∈Z^d}, where X_t = Σ_{s∈Z^d} g_{t−s} Z_s. Then f(λ) = |Σ_{t∈Z^d} g_t e^{iλ·t}|² is clearly bounded below by some a > 0 over T^d. Consider the Fourier expansion of f, f(λ) = Σ_{t∈Z^d} c_t e^{iλ·t} with c_t = Σ_{s∈Z^d} g_s g_{s−t}. Noting that Δ_k(f) = inf{||f − P||_∞} ≤ ||f − P_k||_∞ ≤ Σ_{|t|>k} |c_t|, where P_k(λ) = Σ_{|t|≤k} c_t e^{iλ·t}, Theorem 2 and Corollary 1 yield
α_X(k) ≤ ρ_X(k) ≤ c(f) Σ_{|s|≥k} |Σ_{t∈Z^d} g_t g_{t−s}| ≤ c(f) sup_t |g_t| Σ_{|s|≥k} Σ_{|t|≥|s|/2} |g_t|.
Corollary 2. Assume that X = (X_t)_{t∈Z^d} is a stationary Gaussian random field such that Cov(X_0, X_s) = O(|s|^{−λ}) for some λ > d and the spectral density of X is bounded below; then α_X(k) = O(k^{d−λ}).
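Behind Corollary 2 is the elementary tail estimate Σ_{|s|≥k} |s|^{−λ} = O(k^{d−λ}); a quick numerical sanity check in dimension d = 1 (our own sketch, with arbitrarily chosen λ and k):

```python
# Sanity check (ours) of the tail estimate behind Corollary 2, in d = 1:
# sum_{|s| >= k} |s|**(-lam) is of order k**(1-lam).  For lam = 3 the
# two-sided tail is squeezed between the integral bounds 1/k**2 and 1/(k-1)**2.
lam, k = 3, 100
tail = 2 * sum(s ** (-lam) for s in range(k, 10 ** 6))   # truncated two-sided tail
print(1 / k ** 2 < tail < 1 / (k - 1) ** 2)              # True
```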
lim_{n→∞} β_{X,n}/b(n) ≤ 1/a.
Proof of Theorem 2. First note that it is enough to prove the following bound for any k-separated subsets A and B in Z^d:
ρ_X(A, B) ≤ c(f) Δ_k(f).
Proof of Lemma 1. We propose here a different proof of this classical result shown in Kolmogorov and Rozanov (1960). Consider first finite subsets A and B with |A| = n ≤ m = |B|. Up to a linear transformation we can assume the random vectors X_A and X_B to be normalized, with Cov X_A = I_n, Cov X_B = I_m and Cov(X_A, X_B) = (D_n, 0_{n,m−n}); D_n is a diagonal (n, n)-matrix with entries ρ_1 ≥ … ≥ ρ_n ≥ 0, and 0_{n,m−n} is the zero matrix of order (n, m−n). Canonical analysis results yield such a decomposition. Set A = {i_1, …, i_n}, B = {j_1, …, j_m}. Let g_A(X_A) ∈ L²(X_A); then, H_i denoting the i-th degree Hermite polynomial,
g_A(x_A) = Σ_{i∈N^A} a_i H_i(x_A), with H_i(x_A) = Π_{k∈A} H_{i_k}(x_k), where i = (i_k; k ∈ A).
Now E g_A(X_A) = 0 is rewritten as a_0 = 0, and E g_A²(X_A) = 1 is rewritten as Σ_{i∈N^A} a_i²/i! = 1, with i! = Π_{k∈A} i_k!.
Analogously g_B(X_B) ∈ L²(X_B) may be represented as
g_B(x_B) = Σ_{j∈N^B} b_j H_j(x_B) = g′_B(x_B) + g″_B(x_B), with g′_B(x_B) = Σ_{j=(i,0)} b_j H_j(x_B).
The last summation extends to those j = (i, 0) ∈ N^{B′}×{0} depending only on {x_{j_k}, k ≤ n}, where B′ = {j_1, …, j_n} and 0 is the null element of N^{B\B′}. Hence
E g_A(X_A) g_B(X_B) = E g_A(X_A) g′_B(X_B) = Σ_{i∈N^A} a_i b_i ρ^i/i!, where ρ^i = Π_{k∈A} ρ_k^{i_k}.
The previous inequality yields
|E g_A(X_A) g_B(X_B)| ≤ ρ_1 Σ_{i∈N^A} |a_i b_i|/i! ≤ ρ_1 = E X_{i_1} X_{j_1}.
b) Let φ_A ∈ N_A(X), ψ_B ∈ N_B(X). Provide L² with the scalar products and induced norms
⟨φ, ψ⟩_2 = ∫ φ ψ̄ dλ and ⟨φ, ψ⟩_f = ∫ φ ψ̄ f dλ.
Then ||φ||_2² ≤ a^{−1} ||φ||_f². Note that E φ_A² = ||φ_A||_f² and
⟨φ_A, ψ_B⟩ = ∫ [Σ_{t∈A−B} γ_t e^{iλt}] [P(λ) + (f(λ) − P(λ))] dλ = ∫ [Σ_{t∈A−B} γ_t e^{iλt}] [f(λ) − P(λ)] dλ,
for any (k − 1)-th degree polynomial P. Indeed, |t| ≥ k for t ∈ A − B [the set of differences between elements in A and in B]. Thus, since ||φ_A||_2 ≤ a^{−1/2} ||φ_A||_f ≤ a^{−1/2},
⟨φ_A, ψ_B⟩ = ⟨φ_A [f − P], ψ_B⟩ ≤ ||f − P||_∞ ||φ_A||_2 ||ψ_B||_2 ≤ a^{−1} Δ_k(f). ∎
Note also that ρ_X(A, B) may be estimated without the minorization condition on f. Let
c(A, f) = sup{||φ_A||_2; ||φ_A||_f = 1} = sup{Σ_{t∈A} c_t²; Σ_{s,t∈A} c_s c_t r_{t−s} = 1} < ∞;
the previous proof shows that if the set {f > 0} has nonzero Lebesgue measure then ρ_X(A, B) ≤ c(A, f) c(B, f) Δ_k(f). Unfortunately we do not know general explicit bounds for c(A, f). Such bounds could avoid the minorization condition on f.
The problem was solved by Helson & Sarason (1967) for discrete time processes and by Hayashi (1981) and Dominguez (1989) for continuous time processes. Let μ denote the spectral measure of a stationary Gaussian process (X_t); μ is defined on [0, 2π] if t ∈ Z and μ is defined on R if t ∈ R. If μ is not absolutely continuous, then the process (X_t) is not regular and thus it is not mixing. Lemma 1 and the proof of Theorem 2 imply that if μ is absolutely continuous with a density w(x),
ρ_{X,t} = sup |∫ f_1(x) f_2(x) e^{itx} w(x) dx|.
The supremum is considered over f_i, i = 1, 2, which are respectively polynomials in e^{isx} for s ≥ 0 and in e^{isx} for s ≤ 0, with ∫ |f_i(x)|² w(x) dx ≤ 1. The problem is to characterize the set W of weight functions w such that lim_{t→∞} ρ_{X,t} = 0.
In the case of functions on [0, 2π], introduce the set W_0 of weight functions w such that for any ε > 0 there are functions r, s, t in L²([0, 2π], dx) with r continuous on the torus, ||s||_∞ < ε, ||t||_∞ < ε and such that ln w = r + s̃ + t. We denote by s̃ the harmonic conjugate of s if s is square integrable (1).
Note that any positive or negative power of a function w in W_0 is integrable; this may allow one to detect non mixing processes.
Ibragimov & Rozanov (1978) prove that for a stationary Gaussian discrete time process, β-mixing is equivalent to the fact that f(x) = |P(e^{ix})|² exp{Σ_{j=−∞}^∞ a_j e^{ijx}} for some polynomial P and some real sequence (a_j) with Σ_{j=−∞}^∞ |j| a_j² < ∞.
The previous results are optimal. Ibragimov & Solev (1969) give an example of a stationary strongly mixing Gaussian process which is not absolutely regular. Ibragimov & Rozanov (1978) also give such an example. It is associated with the spectral density
f(x) = exp{Σ_{j=0}^∞ 2^{−j} e^{i 2^{2j} x}}.
The example f(x) = exp{Σ_{|j|≥2} e^{ijx}/(j ln |j|)} in Ibragimov & Rozanov (1978) shows that discontinuity of the spectral density at the origin does not occur only for long range dependent sequences (see Rosenblatt (1985)), since the corresponding Gaussian process is strongly mixing.
Characterizing the spectral measure for a specified decay rate of the mixing sequence is another problem proposed by Ibragimov and solved by Dominguez (1989) using the extension techniques initiated by M. Cotlar (Caracas, Venezuela).
Let v_t be a function decreasing to zero at infinity and let μ be the spectral measure of the stationary Gaussian process (X_t)_{t∈R}; in order that ρ_t = O(v_t) for t → ∞ it is necessary and sufficient that there exists a_0 ≥ 0 such that for all a ≥ a_0 there are r_a ∈ H¹ and t_a such that μ(dx) = |r_a(x)| e^{t_a(x)} dx and ||t_a||_∞ = O(v_a), ||s_a||_∞ = O(v_a) for s_a(x) = Arg(r_a(x) e^{−iax}).
Dominguez (1990) and Cheng (1992) prove multivariate analogues of this result. Moreover Cheng does not directly consider Gaussian random fields but only second order stationary random fields. Indeed, if one considers linear correlation coefficients, all the previous results extend to this framework, and the cosines of the angles of vector spaces spanned by a random field may be attained by using spectral properties.
Yaglom (1963) proposed the characterization problem solved by Helson & Sarason (1967) for discrete time processes and by Hayashi (1981) for continuous time processes. Dominguez (1989) characterized the decay of the mixing coefficients for continuous time processes, and Dominguez (1990) (see also Cheng (1992)) proves a matricial extension of the Helson-Sarason theorem and a characterization of some multivariate linearly completely regular processes.
Examples: Gibbs Fields
We begin in § 2.2.1 with the Dobrushin theory for random fields defined through conditional marginal distributions. The comparison result of § 2.2.1.1 yields Dobrushin's uniqueness condition (§ 2.2.1.2), and then a mixing condition arises in § 2.2.1.3. The fundamental example of such random fields is the Markov field case (§ 2.2.2); it is described in terms of potentials in § 2.2.2.1. The non compact case is evoked in § 2.2.3 with the examples of point processes in § 2.2.3.1 and diffusion based random fields in § 2.2.3.2.
We follow here the presentation proposed in Georgii (1988) and in Künsch (1982) for the results of Dobrushin (1970). A simplified version of those results is provided in Guyon (1992). Let T be some denumerable set; T is called the parameter space, e.g. T = Z^d. The canonical version of X is defined on the product space (Ω, A) given by Ω = E^T, A = ℰ^{⊗T}, X_t(ω) = ω_t for ω = (ω_t)_{t∈T}; E is a Polish space called the state space. A probability measure m on (Ω, A) defines the distribution of the random field (X_t).
Set 𝒱 = {V ⊂ T; 0 < |V| < ∞}. A is the smallest σ-algebra on Ω containing the cylinder events
A_V = {{X_V ∈ U}; U ∈ ℰ^{⊗V}}, V ∈ 𝒱.
Let (R, ℛ), (S, 𝒮) and (U, 𝒰) be measurable spaces. A kernel from (S, 𝒮) to (R, ℛ) is a function x: ℛ × S → R_+ such that x(·|s) is a measure on (R, ℛ) if s ∈ S and x(V|·) is 𝒮-measurable if V ∈ ℛ. If x(R|·) = 1, x is called a probability kernel from 𝒮 to ℛ. If λ is a kernel from (U, 𝒰) to (S, 𝒮), then λx is the kernel from (U, 𝒰) to (R, ℛ) defined by λx(Z|u) = ∫ λ(ds|u) x(Z|s), Z ∈ ℛ. ||·||_Var is the norm in variation of signed measures (see (1)). The probability kernel x is said to be proper if R = S, ℛ ⊂ 𝒮 and x(V|·) = 1_V for V ∈ ℛ. The measures m on (S, 𝒮) are mapped to measures μ = mx on (R, ℛ) by the relation mx(V) = ∫ m(ds) x(V|s). The conditional probability μ(V|𝒮′) is the conditional expectation E_μ(1_V|𝒮′), μ-a.s.
A specification p with parameter space T and state space (E, ℰ) is a family of proper probability kernels p = (p_V)_{V∈𝒱}, where p_V(A, x) is defined for A ∈ A_V, x ∈ E^{T\V}. Following Föllmer (1975), we also define the associated kernels π_V(A, x), for A ∈ A, x ∈ Ω, by the relations:
π_V(·, x) is a probability measure on (Ω, A) which equals δ_x on A_{T\V} and p_V(·, x_{T\V}) on A_V.
The last condition is a consistency condition. Let C(Ω) denote the space of continuous and bounded functions on Ω equipped with the uniform norm. The specification will be assumed to be continuous, that is π_V f ∈ C(Ω) if f ∈ C(Ω), with π_V f(x) = ∫ π_V(dy, x) f(y).
A Gibbs state μ for a specification p is a probability measure on (Ω, A) with μ(A|A_{T\V}) = p_V(A|·) μ-a.s. for A ∈ A and V ∈ 𝒱 — that is μ(π_V f) = μ(f) for f ∈ C(Ω). The set of Gibbs states,
G(p) = {m ∈ P(Ω, A); m(A|A_{T\V}) = p_V(A|·) m-a.s., ∀ A ∈ A, ∀ V ∈ 𝒱},
is convex. If the state space E is compact, Dobrushin (1970) proves the existence of Gibbs states (2), and that G(p) is compact. If W ∈ 𝒱 and x ∈ E^W, y ∈ E^{T\W}, we define (xy) as the element of Ω with corresponding coordinates x and y. Extremal points in G(p) are the measures with a trivial tail field. In the case T = Z^d, extremal points in the set G_s(p) of stationary Gibbs distributions are exactly the ergodic stationary measures in G(p).
In the case where G(p) has exactly one element, this measure is called the Gibbs state of the system. We say that there is no phase transition.
Set χ_{s,t} = Σ_{n=0}^∞ (Γ^n)_{s,t}.
2 Indeed any sequence π_{V_n}(·|x) has a weakly convergent subsequence; choosing a sequence of finite V_n increasing to T leads to the result. The compactness assumption may be relaxed: see Georgii (1988).
Set β_s = ½ sup_x ‖p_s¹(· | x) − p_s²(· | x)‖_Var and ρ_a(f) = sup_{x,y} |f(x) − f(y)|, the supremum being taken over configurations x, y ∈ Ω which agree outside a (x_u = y_u for u ≠ a), for f ∈ C(Ω). For any x, y in Ω and f in C(Ω) we have |f(x) − f(y)| ≤ Σ_{t∈T} ρ_t(f).
∀ t ∈ T, Σ_{s∈T} γ_{s,t} ≤ a < 1, and μ_i ∈ 𝒢(p^i).
Lemma 1. Set, for any s fixed in T, a(s) = (a_t(s)) ∈ ℝ^T with a_t(s) = a_t if s ≠ t and a_s(s) = a_s + Σ_{u≠s} a_u γ_{u,s}; then a(s) is an estimate for μ₁ and μ₂.
Sketch of the Proof. Indeed the fact that μ₁ and μ₂ are Gibbs states for specifications implies
 |μ₁(f) − μ₂(f)| ≤ |μ₁(f) − μ₁(π_t f)| + |μ₁(π_t f) − μ₂(π_t f)|
  ≤ β_t ρ_t(f) + Σ_{s≠t} a_s ρ_s(π_t f)
  ≤ β_t ρ_t(f) + Σ_{s≠t} a_s ρ_s(f) + Σ_{s≠t} a_s γ_{s,t} ρ_t(f).
The second inequality follows from ρ_s(π_t f) ≤ ρ_s(f) + γ_{s,t} ρ_t(f) if s ≠ t, and ρ_t(π_t f) = 0.
Sketch of Theorem 1's Proof. Define now a by a_t = 1, a_s = 0 for s ≠ t, and apply Lemma 1 for various well-chosen values of t. A contradiction can be exhibited, allowing one to complete the proof (see Künsch 1982, Theorem 2.1). ∎
Define the influence matrix of p by Γ = (γ_{s,t})_{s,t} with γ_{s,t} = ½ sup_{x,y} ‖p_s(· | x) − p_s(· | y)‖_Var, where x and y in Ω are subject to the restriction x_u = y_u for u ≠ t, and s, t ∈ T. The following condition of Dobrushin gives the uniqueness of the Gibbs measure.
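The contraction behind Dobrushin's condition can be illustrated numerically. The sketch below is my own illustration, not from the text: it builds a small synthetic influence matrix Γ on a finite index set whose column sums stay below a < 1, sums the Neumann series χ = Σ_{n≥0} Γⁿ, and checks that χ agrees with (I − Γ)^{-1} and has column sums bounded by 1/(1 − a).

```python
# Sketch (not from the text): finite index set T = {0,...,N-1}, synthetic
# nonnegative influence matrix Gamma with column sums <= a < 1 (Dobrushin's
# condition); chi = sum_n Gamma^n then has column sums <= 1/(1-a).
N = 6
a = 0.5
# gamma[s][t] models the influence of site t on site s; here a band matrix.
gamma = [[(a / 2 if abs(s - t) == 1 else 0.0) for t in range(N)] for s in range(N)]

def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))] for i in range(len(A))]

def mat_add(A, B):
    return [[A[i][j] + B[i][j] for j in range(len(A[0]))] for i in range(len(A))]

identity = [[1.0 if i == j else 0.0 for j in range(N)] for i in range(N)]

# chi = sum_{n>=0} Gamma^n, truncated; the neglected tail is O(a^n / (1-a)).
chi, power = identity, identity
for _ in range(200):
    power = mat_mul(power, gamma)
    chi = mat_add(chi, power)

col_sums = [sum(chi[s][t] for s in range(N)) for t in range(N)]
# each column sum is bounded by the geometric series 1/(1-a) = 2
```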
Proof. Consider only two elements μ₁ and μ₂ of 𝒢(p) in Theorem 1 to prove Theorem 2. ∎
The underlying Gibbs measure will then be written μ: 𝒢(p) = {μ}. Dobrushin's uniqueness condition is a sufficient condition for mixing properties to hold. This gives an idea of the limitation of the mixing techniques: e.g. the physical phenomena which exhibit a phase transition are not covered.
66 Mixing
∀ V ∈ 𝒱, ∀ x, y ∈ Ω,
 |π_V f(x) − π_V f(y)| ≤ Σ_{s,t∈V} Σ_{u∉V} γ_{u,s} χ_{s,t} ρ_t(f) + Σ_{u∉V} ρ_u(f),
for a continuous specification p and a corresponding Gibbs measure μ (see (3)).
Let V, W be disjoint subsets of T, with V finite; then φ(V, W) = sup{|μ(A | B) − μ(A)|}, for A ∈ 𝒜_V and B ∈ 𝒜_W, may now be bounded. Then the following inequality holds.
Consider the infimum of this expression with respect to f and use the inequality 𝟙_{s=t} ≤ χ_{s,t} to conclude.
Proof. The dominated convergence theorem implies a bound for the mixing coefficients: for fixed t, Σ_{u∉V} γ_{u,t} converges to 0 when V increases to T. Hence the preceding inequality gives the result. ∎
Remark 1. It may be shown that the φ-mixing assumption implies uniqueness of the Gibbs measure.
3 Indeed, fix V; p_V is a Gibbs state on Ω′ = E^V for the specification (p_W)_{W⊂V}, thus Theorem 1 gives
 |π_V f(x) − π_V f(y)| ≤ ∫ f(ux)(p_V(du | x) − p_V(du | y)) + ∫ (f(ux) − f(uy)) p_V(du | y)
  ≤ Σ_{s,t∈V} Σ_{u∉V} γ_{u,s} χ_{s,t} ρ_t(f) + Σ_{u∉V} ρ_u(f).
Hence (Γⁿ)_{s,t} = 0 if d(s, t) ≥ nk, while Σ_{s∈T} (Γⁿ)_{s,t} ≤ aⁿ. This and χ_{s,t} = Σ_{n≥0} (Γⁿ)_{s,t} together imply
 Σ_{s: d(s,t)≥rk} χ_{s,t} ≤ Σ_{n>r} Σ_{s∈T} (Γⁿ)_{s,t} ≤ a^{r+1}/(1 − a).
Let now A, B ∈ 𝒱 and let r be an integer with d(B, A^c) > rk; the triangle inequality shows that
 φ(A, B) ≤ Σ_{t∈B} Σ_{s∈T\{t}} Σ_{u: d(u,B)>(r−1)k} γ_{u,s} χ_{s,t}
and
 φ(A, B) ≤ a |B| sup_{t∈B} Σ_{s: d(s,B)>(r−1)k} χ_{s,t} ≤ (1/(1−a)) |B| a^{d(A,B)/k}. ∎
2.2.2.1. Potentials
Let ν be an a priori single-spin measure, that is a measure on E; we may then define, for any finite subset V of T, the measure on E^V
 p_V(dy | x) = (1/Z(x)) exp{ − Σ_{W⊂T, W∩V≠∅} I_W(yx) } ν^V(dy),  V finite in T.
Remark 2. Set ψ₀(𝒰, 𝒱) = sup P(U ∩ V)/(P(U) P(V)), together with its companion defined with inf in place of sup, where the extrema are taken over U ∈ 𝒰 and V ∈ 𝒱 such that P(U) P(V) ≠ 0. Bryc (1992, Theorem 6.1) provides a ψ₀-mixing criterion for stationary Gibbs Markov fields in terms of the interaction potential (4). These coefficients are clearly related to ψ-mixing coefficients. Since it does not take the same form as the previous ones we do not present it in detail, but it gives rise to large deviations results.
Example 1. If X is a k-Markovian discrete-valued random field such that each point may be visited, that is μ({x}) > 0, then Hammersley & Clifford prove that there is such a potential function (see Besag (1974) or Guyon (1992)).
Lemma 2. Let μ₀ be an arbitrary measure on Ω and define dμ_h = e^h dμ₀ / ∫ e^h dμ₀ for h ∈ C(Ω); then ‖μ_g − μ_h‖_Var ≤ ‖g − h‖_∞.
This Lemma (see (5)) yields ‖p_s(· | x) − p_s(· | y)‖_Var ≤ Σ_{V∋s,t} ‖I_V‖_∞, s, t ∈ T, where x_u = y_u for u ≠ t. Simon (1979) proves that Dobrushin's uniqueness condition holds here if
 α = Max_t Σ_{V∋t} (|V| − 1) ‖I_V‖_∞ < 1.
This condition may be weakened to sup_t Σ_{V∋t} (|V| − 1)(Max I_V − Min I_V) < 1, or by replacing (I_V) by (I_V + H) for an arbitrary constant H. Now Theorem 4 asserts the geometric decay of the associated φ-mixing sequence.
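Lemma 2 can be checked numerically on a finite state space. The sketch below is my own illustration, not from the text: it tilts a reference measure μ₀ by bounded functions g and h and verifies that the variation distance between the tilted measures never exceeds ‖g − h‖_∞.

```python
# Numerical sanity check (mine) of Lemma 2 on a finite state space:
# mu_h is proportional to e^h d(mu_0); compare the variation distance of two
# tilted measures with the sup-norm distance of the tilting functions.
import math, random

random.seed(0)

def tilt(mu0, h):
    # normalized measure with density proportional to e^h w.r.t. mu0
    w = [m * math.exp(v) for m, v in zip(mu0, h)]
    z = sum(w)
    return [x / z for x in w]

n = 5
mu0 = [1.0 / n] * n  # arbitrary reference measure (here uniform)
for _ in range(100):
    g = [random.uniform(-2, 2) for _ in range(n)]
    h = [random.uniform(-2, 2) for _ in range(n)]
    var_dist = sum(abs(x - y) for x, y in zip(tilt(mu0, g), tilt(mu0, h)))
    assert var_dist <= max(abs(x - y) for x, y in zip(g, h)) + 1e-12
```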
Example 2. Set N(t) = {s ∈ T; 0 < d(s, t) ≤ 1}; N(t) is called the set of neighbours of t in T. Assume that I_V = 0 for |V| > 2, with I_{t}(x) = a_t x_t and I_{s,t}(x) = b_{s,t} x_s x_t for s ≠ t. Uniqueness holds as soon as
 sup_t Σ_{s∈N(t)} |b_{s,t}| ‖x‖²_∞ ≤ α < 1.
If E ⊂ [a, b] and |a| ≤ |b|, this means ‖x‖²_∞ ≤ b² (using the weakened condition, b² may even be replaced by b² − a²).
Assume now that the distribution μ is k-Markovian on T and has a density Q with respect to some positive measure ν on E^T. A finite subset C of T is said to be a clique if either |C| = 1 or any couple s, t in C satisfies d(s, t) ≤ k. Hammersley & Clifford, and Besag (1974), have proved that if the distribution P does not vanish on the set of configurations E^T, then it derives from a potential supported by the cliques.
(‖f‖_∞ = 1, because Δν_t(f) = ν_t(fq) − ν_t(f) ν_t(q); Schwarz inequality now leads to the desired result.)
Example 3. Let T = ℤ².
(i) Triangular grid: random Markov fields with 6 nearest neighbours. Here |N(s)| = 6 and cliques have the form
 C₁ = {{s}; s ∈ T}, C₂ = {{s, t}; d(s, t) = 1, s, t ∈ T} or
 C₃ = {{s, t, u}; d(s, t) = 1, d(s, u) = 1, d(u, t) = 1, s, t, u ∈ T}.
A Markov field is constructed by setting, for neighbours s, t, u,
 I_{s} = f(x_s), I_{s,t} = g(x_s, x_t), I_{s,t,u} = h(x_s, x_t, x_u).
An isotropic and stationary field may be constructed by setting, for neighbours s, t, u, I_{s} = a x_s, I_{s,t} = b x_s x_t, I_{s,t,u} = c x_s x_t x_u. Uniqueness holds if 3|b| + 6|c| < 1.
Triangular grid
Figure 2.2.1.
(ii) Hexagonal grid: random Markov fields with 3 nearest neighbours. |N(s)| = 3 and cliques have the form {s} and {s, t}; a Markov field is constructed by setting, for neighbours s, t,
 I_{s} = f(x_s), I_{s,t} = g(x_s, x_t).
Hexagonal grid
Figure 2.2.2.
(iii) Rectangular grids: random Markov fields with 4 and 8 nearest neighbours on a regular grid (k = 1, T = ℤ²).
Figure 2.2.3.
Set R(S, S′) = Inf_Q ∫ r(x, y) Q(dx, dy), the infimum being taken over distributions Q with marginals S and S′. One can prove that
 R(S, S′) = Sup { |∫ f(x) S(dx) − ∫ f(x) S′(dx)| / δ(f) }, with δ(f) = Sup_{x≠y} |f(x) − f(y)| / r(x, y).
Usually r is chosen as the discrete metric, r(x, y) = 1 if x ≠ y and r(x, x) = 0. Dobrushin's uniqueness condition still holds when ‖p_s(· | x) − p_s(· | y)‖_Var is replaced by (1/r(x_t, y_t)) R(p_s(· | x), p_s(· | y)) in the definition of γ_{s,t}, and analogously for the β_s (see Dobrushin (1970), Künsch (1982) and Föllmer (1982)). Moreover Theorem 3 extends, replacing φ-mixing coefficients by α-mixing coefficients. Set Γ = (γ_{s,t}) and Λ = (χ_{s,t}) with χ_{s,t} = Σ_{n≥0} (Γⁿ)_{s,t}. The assumption a = sup_t Σ_{s∈T} γ_{s,t} < 1 implies the geometric strong mixing condition analogous to that of Theorem 3, since χ_{s,t} is here given by a geometric series.
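For the discrete metric, R(S, S′) reduces to half the variation norm. A minimal sketch (my own, assuming a finite state space) exhibits the optimal coupling, which keeps the common mass min(S_i, S′_i) in place and moves the rest at cost 1:

```python
# Illustration (assumptions: finite state space, discrete metric r(x,y) = 1
# for x != y, r(x,x) = 0). The coupling keeping min(S_i, S'_i) on the diagonal
# attains R(S, S') = 1 - sum_i min(S_i, S'_i) = (1/2) sum_i |S_i - S'_i|.
def vasershtein_discrete(S, Sp):
    # mass kept in place costs 0; the off-diagonal remainder all costs 1
    return 1.0 - sum(min(a, b) for a, b in zip(S, Sp))

S = [0.5, 0.3, 0.2]
Sp = [0.2, 0.2, 0.6]
cost = vasershtein_discrete(S, Sp)
tv_half = 0.5 * sum(abs(a - b) for a, b in zip(S, Sp))
assert abs(cost - tv_half) < 1e-12  # matches half the variation norm
```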
An important case is that of translation-invariant Gibbs measures on ℤ^d. In this case the specifications are invariant under parameter translations. Set γ_{s−t} = γ_{s,t} and χ_{s−t} = χ_{s,t}; then
 α(A, B) ≤ a₂ Σ_{s∈A} Σ_{t∈B} χ_{s−t}, with a₂ = sup_{t∈T} ∫ μ(dx) inf_{y∈E} ∫ r²(x_t, y) p_t(dx_t | x).
Assuming a decay χ_s = O(|s|^{−c−d−ε}) for some ε > 0 yields α(k; a, b) ≤ const · a₂ · [a∧b] · k^{−c}.
We now recall the definition of a point process on ℝ^d; ℝ^d is the set where particles live. The extension to an arbitrary state space is given in Preston (1976, § 6), and the following presentation is the one in Jensen (1990). Set ℬ the Borel σ-field on ℝ^d and 𝒞 ⊂ ℬ the set of bounded Borel sets. A point process is a random variable with values in the set S of locally finite integer-valued measures:
choose h > 0 and set A_i = [h(i − 𝟙/2), h(i + 𝟙/2)[ for i ∈ ℤ^d, where 𝟙 = (1, …, 1). Then ℝ^d = ∪_{i∈ℤ^d} A_i, and S is isomorphic to Π_{i∈ℤ^d} S(A_i) equipped with the σ-field Π_{i∈ℤ^d} ℬ(A_i), and thus
 x = Σ_{j=1}^n δ_{x(j)} and y = Σ_{j=1}^m δ_{y(j)},
with perhaps repetitions in the sequences x(·) and y(·). The Vasershtein metric is defined with the help of r(·, ·). For A in ℬ we also define x(A) = {x(j) ∈ A; j ≥ 1}, the set of particles defined by x lying in A, and x_A the restriction of x.
For example, point processes may be defined in terms of a pair-potential φ: ℝ^d → ℝ with
2.2.3.2. Diffusions
We only recall some of the ideas in Deuschel (1986), following Föllmer (1988). Let P be the distribution of an infinite-dimensional diffusion process:
P is a probability distribution on (C[0, 1])^T; the state space is here E = C[0, 1] and the metric r considered is the uniform metric.
If y = (x_t)_{t≠s}, then the local specification of the previous random field is given by
where P* denotes the Wiener measure and P_s the s-marginal of the distribution P of X.
Let b_s have the specific form b_s(x) = ∇_s f_s(x), where f_s(x) = i_s(x_s) + Σ_{t≠s} j_{st}(x_s, x_t).
Note that ‖∂_t f_s(x)‖ ≤ 2‖∇_t i_s‖_∞ + ‖∇_t j_{st}‖_∞, where ∂_t denotes the partial Fréchet derivative of the function f_s on (C[0, 1])^{N²(s)}.
In order to apply the Dobrushin contraction technique, Deuschel (1986) shows that
 R(π_s(· | ω), π_s(· | ω′)) ≤ Σ_{t∈N²(s)} γ_{st} ‖ω(t) − ω′(t)‖, with γ_{st} ≤ c_t ‖∂_s f_t(x)‖.
Now, the non-compact extension of Theorem 4 yields the φ-mixing condition under the assumption
 a = sup_t Σ_{s∈T} γ_{s,t} < 1.
The initial results concerning mixing properties of Gibbs fields are due to Dobrushin (1968, 1970). They are improved in Künsch (1982), and the conditions on the potentials come from Simon (1979). The present presentation of specifications comes from Föllmer (1975). Georgii's (1988) book is very complete concerning Gibbs states and proposes an extensive description of mixing properties. See also Sinai (1982) and Prum (1986) for related results. Simplified results are provided in Guyon (1986, 1992). Föllmer (1988) proposes an approach to Gibbs random fields with applications to infinite-dimensional diffusion processes.
The examples of point processes and diffusions come from Preston (1976), Jensen (1990) and Föllmer (1988), and the example of potentials comes from Simon (1979) and Guyon (1986, 1992).
The mixing properties of lattice systems are also given in related works by Eberlein & Csenki (1979), Hegerfeldt & Nappi (1977), Nahapetian (1980) and Neaderhouser (1978). Bowen (1975) proves ψ-mixing sufficient conditions for Ising models. Nahapetian (1991) proposes an extensive discussion of most of the results in the present chapter.
Examples: Linear Fields 75
This chapter proposes sufficient strong mixing conditions for random fields X = (X_t)_{t∈ℤ^d} defined linearly by X_t = Σ_{s∈ℤ^d} g_{t,s} Z_s. The results proposed are announced in Doukhan & Guyon (1991) and concern cases where Z is either an independent (§ 2.3.1) or a mixing random field (§ 2.3.2). Usual results impose an invertibility assumption, which may be omitted using a change of the innovation process Z. Moreover the random field may be non-causal. We provide explicit bounds for the mixing coefficients, which usually depend on the cardinality of the subsets considered, at least for d ≥ 2. We compare the results with the result in § 2.1 for Gaussian random fields. The results are proved through lemmas of independent interest in § 2.3.3. Finally § 2.3.4 proposes a motivation for the study of such random fields, extensions, as well as some miscellaneous results.
Let Z = (Z_t)_{t∈ℤ^d} be a real-valued random field, where ℤ^d is equipped with the uniform norm defined by |t| = Max{|t₁|, …, |t_d|} for t = (t₁, …, t_d); d(t, C) denotes the distance between t and the subset C of ℤ^d.
(1) X_t = Σ_{s∈ℤ^d} g_{t,s} Z_s.
We assume the existence of constants δ > 0 and M > 0 such that, for t in ℤ^d,
Set ρ = min{1, δ}. Convergence of the series (1) in the L^δ sense involves the following uniformity condition on the coefficients (g_{t,s})_{t,s∈ℤ^d}:
(3)
Lemma 1. The distribution of the random field X in (1) is well defined under assumptions (2) and (3). Moreover the finite-dimensional distributions of X are limits in L^δ of those of the moving-average random fields Xⁿ defined by Xⁿ_t = Σ_{|s|≤n} g_{t,s} Z_s.
To quantify assumption (3) we introduce the following notations, for constants m ≥ 0, η > 0, t ∈ ℤ^d and C ⊂ ℤ^d:
 a_{m,η} = sup_t Σ_{|s−t|>m} |g_{t,s}|^η, A_{C,t,η} = Σ_{s∉C} |g_{t,s}|^η.
Note that if d(t, C) ≥ m, then A_{C_m,t,η} ≤ a_{m,η}, where C_m = {t ∈ ℤ^d; d(t, C) ≤ m} denotes the m-order neighbourhood of a subset C ⊂ ℤ^d.
Remark 1. Assume that the infinite-dimensional matrix (g_{t,s}) is almost diagonal, say there exists some integer m ≥ 0 such that |s − t| > m implies g_{t,s} = 0; the series is then locally finite. Let c_Z(k; a, b) denote any of the mixing coefficient sequences associated to the random field Z; then c_X(k; a, b) ≤ c_Z(k − 2m; a, b); see (1). For this, only note the inclusion of the σ-fields 𝒳_A ⊂ 𝒵_{A_m}. Our aim is to extend this kind of result to arbitrary operators (g_{t,s}).
In view of (2), assume now that
(4) sup_{t∈ℤ^d} Σ_{s∈ℤ^d} |g_{t,s}|^ρ = γ < ∞.
Let 𝒞 ⊂ ℝ^{ℤ^d} be the space of bounded real sequences on ℤ^d, equipped with the norm ‖x‖_∞ = sup{|x_t|; t ∈ ℤ^d} for x = (x_t)_{t∈ℤ^d}. The linear mapping G is defined on 𝒞 by
 (Gx)_t = Σ_{s∈ℤ^d} g_{t,s} x_s.
We denote by I the identity operator on 𝒞, and ℝ^A is considered indifferently as the usual product space as well as the subspace of ℝ^{ℤ^d} defined by {x ∈ ℝ^{ℤ^d}; x_t = 0, t ∉ A}.
Counterexample 1. Let X = (X_t)_{t∈ℤ} be the process defined by X = (1 − B)^ρ Z, where B is the shift operator and Z = (Z_t)_{t∈ℤ} is an independent and identically distributed sequence (3). Then X is not mixing if ρ > 4 is not an integer.
Example 1. The linear operator U on 𝒞, defined for t ∈ ℤ^d by (Ux)_t = Σ_{s∈ℤ^d} u_{t,s} x_s, where
 sup_{t∈ℤ^d} Σ_{s≠t} |u_{t,s}| < 1 and |u_{t,t}| = 1,
is invertible and bicontinuous. Its inverse takes the form
1 Such a random field is only a finite moving average of Z. Mixing holds more generally if g_t is a family of measurable functions defined on the space of sequences on ℤ^d such that g_t(z) only depends on z_s for |s − t| ≤ m.
3 Let (a_t) be the formal series defined by (1 − z)^ρ = Σ_{t=0}^∞ a_t z^t. The process X is defined by X_t = Σ_s a_{t−s} Z_s.
(Hx)_t = h_t x_t, with 0 < a ≤ |h_t| ≤ A < ∞ for t ∈ ℤ^d, has the same properties. Let G be a product of operators of the form of H, of U and of its inverse; then condition (5) still holds.
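The inverse announced in Example 1 is the Neumann series U^{-1} = Σ_{k≥0} (−N)^k when U = I + N and the off-diagonal part N satisfies sup_t Σ_{s≠t} |u_{t,s}| < 1, so that N has ℓ^∞-operator norm below 1. A finite-dimensional sketch (my own illustration, not the book's operator):

```python
# Finite-dimensional sketch: U = I + N with row l1 norms of N below 1, so the
# Neumann iteration x_{k+1} = b - N x_k converges to the solution of U x = b.
n = 5
N_mat = [[0.0] * n for _ in range(n)]
for t in range(n):
    for s in range(n):
        if s != t:
            N_mat[t][s] = 0.8 / (n - 1) * ((-1) ** (t + s))  # row l1 norm = 0.8 < 1

def mat_vec(A, x):
    return [sum(A[i][j] * x[j] for j in range(len(x))) for i in range(len(A))]

def apply_U_inv(b, terms=200):
    # x = sum_k (-N)^k b, computed iteratively
    x = b[:]
    for _ in range(terms):
        x = [bi - yi for bi, yi in zip(b, mat_vec(N_mat, x))]
    return x

b = [1.0, -2.0, 0.5, 3.0, -1.0]
x = apply_U_inv(b)
# check that (I + N) x recovers b
residual = max(abs(xi + yi - bi) for xi, yi, bi in zip(x, mat_vec(N_mat, x), b))
assert residual < 1e-9
```

The geometric convergence of the iteration is exactly the bicontinuity claimed for U: the inverse is bounded by 1/(1 − ‖N‖).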
We shall begin with the assumption that Z is an independent random field; afterwards we shall consider a mixing assumption.
We also assume the existence of a density p_t of the marginal Z_t of Z and of a constant c > 0 with
(6) ∫_ℝ |p_t(z + x) − p_t(z)| dz ≤ c |x|, ∀ x ∈ ℝ.
Note that this condition (condition (i) in Gorodetskii, 1977) holds if the random field Z is independent and identically distributed and its marginal density is of bounded variation over the real line (6).
4 To prove this, use Schwarz inequality and the summability of the series Σ_{t∈ℤ^d} |t|^{2k} |g_t|² and Σ_{t∈ℤ^d} |t|^{−2k}.
5 The integer part of X_t r^{−t} is equal to X₀; if r = ½, X₀'s marginal distribution is Lebesgue measure on [0, 1].
6 Recall that a function p is of bounded variation over the real line if and only if there exists some constant C < ∞ such that any sequence x₀ < x₁ < … < x_p satisfies Σ_{i=1}^p |p(x_i) − p(x_{i−1})| ≤ C.
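Condition (6) can be checked numerically for a concrete density. The sketch below (my own illustration) verifies the L¹ Lipschitz property for a standard Gaussian marginal, for which the total variation of the density, 2p(0) = 2/√(2π), provides an admissible constant c:

```python
# Numerical illustration (mine) of condition (6) for the standard Gaussian
# density: the L1 shift modulus is Lipschitz with constant at most the total
# variation of p, here 2*p(0) = 2/sqrt(2*pi).
import math

def p(z):
    return math.exp(-z * z / 2.0) / math.sqrt(2.0 * math.pi)

def l1_shift(x, lo=-12.0, hi=12.0, n=20000):
    # midpoint rule for the integral of |p(z + x) - p(z)|
    h = (hi - lo) / n
    return sum(abs(p(z + x) - p(z)) for z in (lo + (i + 0.5) * h for i in range(n))) * h

c = 2.0 / math.sqrt(2.0 * math.pi)  # total variation of p
for x in (0.2, 0.5, 1.0):
    assert l1_shift(x) <= c * x + 1e-5
```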
78 Mixing
Theorem 1. Assume that conditions (2)–(6) hold for the linear random field X given in (1) in terms of the independent random field Z. Then, if k is big enough, the mixing coefficient sequences (α_X) relative to X = (X_t)_{t∈ℤ^d} satisfy the following for some constant ζ only depending on δ and γ.
i) For any finite subsets A and B of T,
 α_X(A, B) ≤ ζ { Σ_{t∈A} 𝔖_t(δ, A_k)^{1/(1+δ)} + Σ_{t∈B} 𝔖_t(δ, B_k)^{1/(1+δ)} }.
ii) For any a, b > 0,
 α_X(2k; a, b) ≤ ζ (a + b) N(k, δ).
The dependence on the cardinalities a and b in bound ii) of Theorem 1 may be weakened for hypercubes A and B of the form Π_{i=1}^d [u_i, v_i] in ℤ^d. Set
 W(k, δ) = Σ_{m=k}^∞ ( a_{m,δ}^{1/(1+δ)} ∨ L(a_{m,2}) ).
Corollary 1. Let d = 1 and consider the mixing coefficient sequence of the process X. Assume that Z is independent and that the assumptions of Theorem 1 hold. If k is large enough, there exists a positive constant ζ with
 α_{X,2k} ≤ ζ W(k, δ).
n
such hypercubes, C, K(C) = ~ .I1. [vrUj] we obtain 3'e (C, m, f.l) ::;; K(C) L.
H(a m.Il ).
1=1 J;Ct m=k
Dependence with respect to the cardinality is weakened, indeed in this setting K(C) = ICI I - lld if
C = [u, v]d.
m=k m=k
11(1+0)
from the choice H(x) = x vL(x) for such 2k-distant hypercubes A and B.
Example 3. If |g_{t,s}| ≤ g(|t − s|) for some nonnegative and nonincreasing function g, then a_{m,η} ≤ ∫_{m−1}^∞ x^{d−1} g^η(x) dx. Assume for instance that, for x ≥ 1, g(x) ≤ c x^{−ρ} r^x for some constants c, ρ > 0 and 0 ≤ r < 1. Direct calculations imply a_{m,η} = O(m^{d−ηρ} r^{ηm}), N(k, δ) = O(√g(k) k^{d+1} ln k) and W(k, δ) = O(√g(k) k^{(d+3)/2} ln k).
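For a concrete kernel the tail coefficients can be computed in closed form. The following sketch (my own worked example, d = 1 and g_{t,s} = r^{|t−s|}) compares a brute-force evaluation of a_{m,η} with the geometric-series formula 2 r^{η(m+1)}/(1 − r^η), which indeed decays geometrically in m:

```python
# Worked check (mine, d = 1): for g_{t,s} = r^{|t-s|} the tail coefficient
# a_{m,eta} = sup_t sum_{|s-t|>m} |g_{t,s}|^eta equals 2 r^{eta(m+1)}/(1-r^eta).
r, eta = 0.7, 1.5

def a_tail(m, cutoff=2000):
    # brute-force truncation of the series sum_{|j|>m} r^{eta |j|}
    return 2.0 * sum(r ** (eta * j) for j in range(m + 1, cutoff))

for m in (0, 3, 10):
    closed = 2.0 * r ** (eta * (m + 1)) / (1.0 - r ** eta)
    assert abs(a_tail(m) - closed) < 1e-10
```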
If X_t = Σ_{s∈ℤ^d} g_{t−s} Z_s for a Gaussian white noise Z = (Z_t)_{t∈ℤ^d}, and if ĝ, ĝ(z) = Σ_{t∈ℤ^d} g_t z^t, is a non-vanishing continuous function on the torus 𝕋^d, we have seen in Example 2.1.1 that, with a = inf_{z∈𝕋^d} |ĝ(z)|², we get similar bounds from Theorems 2.3.1 or 2.1.2, except for the cardinality (|A| + |B|) factor. Let for example |g_{t,s}| ≤ const · r^{|t−s|}. Using Theorem 2.1.2, the inequality α_X(k) ≤ (const/a) sup_t |g_t| Σ_{|s|≥k} Σ_{|t|≥|s|/2} |g_t| yields α_X(n; a, b) = O(n^{d+1} r^{n/2}), while α_X(n; a, b) = O((a + b) n^{(d+1)/2} r^{n/2}) comes from Theorem 1.
A simple modification of the proof of Theorem 1 yields the case of L²-linear random fields.
Proposition 1. If Z is independent and if conditions (2)–(6) hold with δ = ρ = 2, then there exists a constant ζ > 0 with
 α_X(2k; a, b) ≤ ζ (a + b) a_{k,2}^{1/3}.
The following result proves that one may expect more than strong mixing in the case of causal processes (see Pham & Tran (1985)).
Theorem 2. Assume that d = 1 and that conditions (2)–(6) hold for some independent random sequence Z. Let moreover X be a causal process (g_{t,s} = 0 for s > t). Then, if k is big enough, the mixing coefficient sequences (β_X) relative to X = (X_t)_{t∈ℤ} satisfy, for some constant ζ only depending on δ and γ,
 β_{X,k} ≤ ζ W(k, δ).
Note that no absolute-regularity sufficient condition has been proved (to our knowledge) for the more general case of non-causal processes.
(8)
If 0 < δ ≤ 1, the convergent-series assumption will be omitted and we set τ = 0.
In this case, assume that the finite marginal distributions of Z are absolutely continuous. Let C be a finite subset of ℤ^d. The density of Z_C conditional on 𝒵_{C^c}, p_C(z | 𝒵_{C^c}), satisfies, for some constant c > 0 independent of C and for any x = (x_t; t ∈ C) ∈ ℝ^C,
(9) ∫_{ℝ^C} ess-sup_{𝒵_{C^c}} |p_C(z + x | 𝒵_{C^c}) − p_C(z | 𝒵_{C^c})| dz ≤ c Σ_{t∈C} |x_t|.
Example 4. Assumption (9) holds if, for t ∈ C, the conditional density p_{t,C} of Z_t knowing that Z_C = y satisfies
(9′) ∫ sup_y |p_{t,C}(z + x, y) − p_{t,C}(z, y)| dz ≤ c |x|, ∀ x ∈ ℝ.
If Z is specified by its conditional marginal distributions, this condition is natural (see § 2.2); the underlying measure λ may be chosen different from the Lebesgue measure. For a Markov field Z, it is enough to write (9′) for a fixed neighbourhood C of t, and sufficient mixing conditions based on conditional marginal distributions are known: Dobrushin's and Simon's weak dependence conditions described in § 2.2 provide such properties. For a Gibbsian model, the conditional density p_{t,C} is
 p_{t,C}(z, y) = e^{z h(y)} / ∫_E e^{x h(y)} λ(dx),
for an underlying measure λ. If the random field Z is independent, this is only condition (6).
In the mixing-innovation setting, it is natural, in view of Theorem 1.4.1, to replace the notations (7) by
(10) 𝔖′_t(τ, C) = A_{C,t,τ}^{1/(1+τ)} ∨ A_{C,t,2}^{1/2(1+τ)}, M(m, τ) = a_{m,τ}^{1/(1+τ)} ∨ a_{m,2}^{1/2(1+τ)}.
If δ ≤ 2, we set 𝔖′_t(τ, C) = A_{C,t,τ}^{1/(1+τ)} and M(m, τ) = a_{m,τ}^{1/(1+τ)}.
Theorem 3. Assume that conditions (2)–(5), (8), (9) hold for the random field Z. Then, if m is big enough, the mixing coefficient sequences (α_X) relative to X = (X_t)_{t∈ℤ^d} satisfy the following for some constant ζ only depending on δ and γ.
i) For any finite subsets A and B of T,
 α_X(A, B) ≤ ζ { Σ_{t∈A} 𝔖′_t(τ, A_m)^{1/(1+τ)} + Σ_{t∈B} 𝔖′_t(τ, B_m)^{1/(1+τ)} } + α_Z(A_m, B_m).
ii) For any a, b > 0 and k ≥ 2m,
 α_X(k; a, b) ≤ ζ (a + b) M(m, τ) + α_Z(k − 2m; a(2m+1)^d, b(2m+1)^d).
The same remarks as after Theorem 1 still apply, and analogues of Corollary 1 are obtained in the same way. The following simple corollaries of Theorem 3 may be useful.
Corollary 2. Assume that Z is p-dependent and that the assumptions in Theorem 3 hold. Let m be the least integer greater than (k − p)/2; then
 α_X(A, B) ≤ ζ { Σ_{t∈A} 𝔖_t(τ, A_m) + Σ_{t∈B} 𝔖_t(τ, B_m) } and α_X(k; a, b) ≤ ζ (a + b) N(m, τ).
Corollary 3. If M(m, τ) ≤ const · e^{−μm} and α_Z(p; u, v) ≤ const · (u + v)^γ e^{−αp} for some μ, α, γ > 0, and the assumptions in Theorem 3 hold, then there exists a constant ζ > 0 with
 α_X(k; a, b) ≤ ζ (a + b) k^{γd/(2α+μ)} e^{−μαk/(2α+μ)}.
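The exponent in Corollary 3 comes from equilibrating the two terms of Theorem 3 ii): choosing m so that e^{−μm} and e^{−α(k−2m)} match gives m* = αk/(2α + μ), and the common exponent is then μαk/(2α + μ). A short numerical check of this arithmetic (my own, with arbitrary constants):

```python
# Balancing the two error terms of Theorem 3 ii) (my own arithmetic):
# solve mu*m = alpha*(k - 2m)  =>  m* = alpha*k/(2*alpha + mu),
# which yields the common exponent mu*alpha*k/(2*alpha + mu) of Corollary 3.
mu, alpha, k = 0.8, 0.5, 100.0
m_star = alpha * k / (2.0 * alpha + mu)
# the two exponents coincide at the balance point
assert abs(mu * m_star - alpha * (k - 2.0 * m_star)) < 1e-9
# and equal the announced rate
assert abs(mu * m_star - mu * alpha * k / (2.0 * alpha + mu)) < 1e-9
```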
Remark 3. Corollary 2 allows one to consider moving-average parts in X without using the invertibility condition (8). For instance, if Z_t = ε_t − ε_{t−v}, for v fixed in ℤ^d and some white noise ε, we write X = GZ = GRε for some non-invertible operator R. Dependence of the field Z thus allows one to weaken the invertibility assumption (5). Corollary 3 may be used in the case of a Gibbs field Z.
2.3.3. Proofs
Proof of Lemma 1. Convergence of the series defining X holds in the L^δ sense. L^δ is a complete metric space, normed by ‖U‖_δ = [𝔼|U|^δ]^{1/δ} if δ ≥ 1; for δ < 1 it is only metrized with the translation-invariant distance d_δ(U, V) = 𝔼|U − V|^δ. Now Xⁿ is a Cauchy sequence in L^δ; indeed we have, by assumptions (2) and (3),
Proof of Theorems 1 & 3. Point ii) clearly follows from i) and from the inequality A_{C_m,t,η} ≤ a_{m,η}, valid if t ∈ C. From this inequality we obtain
 Σ_{t∈C} 𝔖_t(δ, C_m)^{1/(1+δ)} ≤ |C| N(m, δ), Σ_{t∈C} 𝔖′_t(τ, C_m)^{1/(1+τ)} ≤ |C| M(m, τ).
Let E ⊂ ℝ^A, F ⊂ ℝ^B be any Borel sets; in order to prove the results we have to give precise bounds for the expressions
 Δ = P(X_A ∈ E, X_B ∈ F) − P(X_A ∈ E) P(X_B ∈ F).
The proof is based on the following decomposition of the random field X as the sum of a moving-average random field and a small remainder term with respect to the total variation. Let C be a finite subset of ℤ^d; we define W_t and R_t (7) as
7 An additional index C should be added, but we shall write for simplicity W_t = W_t^C and W_C = (W_t^C)_{t∈C}.
We are now in a position to state the two following lemmas; their use is fundamental for proving the bounds on mixing coefficient sequences.
Lemma 2. Let C be a finite subset of ℤ^d. Assume that conditions (3), (4), (5) and (9) hold. Let S be a measurable subset of ℝ^C. For m big enough there exists a constant κ such that, for any r_C in ℝ^C, the following relation holds:
 ess-sup_{𝒵_{(C_m)^c}} |P(W_C ∈ S − r_C | 𝒵_{(C_m)^c}) − P(W_C ∈ S | 𝒵_{(C_m)^c})| ≤ κ Σ_{t∈C} |r_t|.
Note that, under the assumption of independence of the random field Z, conditioning is useless in Lemma 2 and assumption (9) reduces to assumption (6).
Lemma 3. Let C be a finite subset of ℤ^d. There exists a constant κ such that, for any measurable subsets U and S of Ω and ℝ^C, the following relations hold.
i) Assume the assumptions in Theorem 1; then
Proof of Lemma 3. Set Δ = P((X_C ∈ S) ∩ U) − P((W_C ∈ S) ∩ U). For any family of
Hence |Δ| ≤ P(R_C ∉ H) + κ Σ_{t∈C} η_t; it only remains to bound the first term of this inequality.
i) The first term in this inequality is bounded, under the assumptions of Theorem 1, using the Nagaev–Fuk inequality (Fuk & Nagaev (1971), Corollary 4):
 P(|R_t| ≥ η_t) ≤ η_t^{−δ} 𝔪_t(δ, C_m).
ii) Under the assumptions of Theorem 3, the first term in this inequality is bounded using Theorem 1.4.1 on the finite set C, which yields
 P(|R_t| ≥ η_t) ≤ η_t^{−τ} 𝔖′_t(τ, C_m).
Now Lemma 2 implies, respectively, that |Δ| ≤ κ Σ_{t∈C} [η_t + η_t^{−δ} 𝔪_t(δ, C_m)] for some constant κ > 0, or |Δ| ≤ κ Σ_{t∈C} [η_t + η_t^{−τ} 𝔖′_t(τ, C_m)]. Equilibrating both terms yields (*) and (**) for some constant κ. ∎
Figure 2.3.1.
Apart from Δ₃, which may be bounded by α_Z(A_m, B_m), each of those terms takes the form Δ = P((X_C ∈ S) ∩ U) − P((W_C ∈ S) ∩ U) for some event U. Repeated use of (**) in Lemma 3 yields Theorem 3; Theorem 1 follows using (*) instead of (**), and here Δ₃ = 0. ∎
Note that if one introduces two parameters m and m′ instead of m, Theorem 3 may be strengthened in the case when α_Z(k; a, b) only depends on a ∧ b.
Proof of Proposition 1. The only modification with respect to the proof of Theorem 1 is the simpler control
 P(|R_t| ≥ η_t) ≤ η_t^{−2} 𝔼R_t².
It yields the result with 𝔼R_t² ≤ a_{m,2} and η_t = a_{m,2}^{1/3}. ∎
Proof of Corollaries 1 & 2. The Nagaev–Fuk inequality is used for p-dependent fields, and a direct optimisation yields Corollary 2. ∎
Proof of Theorem 2. Let us note a modification of Theorem 1 for the case of random processes (d = 1). The process is causal, i.e. g_{t,s} = 0 if s > t. In this case we may consider, by a time shift, A = ℤ⁻ and B = [k, +∞[; in this case X_B = W_B + R_B with
 W_t = Σ_{s>m} g_{t,s} Z_s, R_t = Σ_{s≤m} g_{t,s} Z_s if t ∈ B (0 ≤ m < k).
This implies that W_B is independent of X_A and R_B; hence
 Δ = Δ(E, F) = ∫_{E×ℝ^B} [P(W_B ∈ F − r) − P(W_B ∈ F)] P_{X_A, R_B}(dx, dr).
2.3.4. Miscellany
Example 2 motivates the previous linear representation of the random field X. We now recall some general features concerning second-order random fields given in Rozanov (1967) and Guyon (1992). A second-order stationary random field (X_t)_{t∈ℤ^d} is a random field with 𝔼X_t = m and Cov(X₀, X_t) = Cov(X_s, X_{s+t}) for any s, t in ℤ^d. Let Φ be the family of finite subsets of ℤ^d and H(X, F) be the linear subspace of L²(Ω) spanned by (X_t)_{t∈F}.
The random field is said to be Φ-regular if ∩_{F∈Φ} H(X, F) = {0}, and Φ-singular if there is F₀ in Φ with ∩_{F∈Φ} H(X, F) = H(X, F₀). It may be proved that any second-order stationary random field (X_t)_{t∈ℤ^d} may be decomposed in a unique orthogonal way as the sum X = X^(r) + X^(s) of a Φ-regular random field X^(r) and a Φ-singular random field X^(s).
Let Φ = {F ⊂ ℤ^d; F^c is finite}; X is a Φ-regular random field if and only if its spectral measure is absolutely continuous and its spectral density f is such that ∫ |P(x)|²/f(x) dx < ∞ for some trigonometric polynomial P. The random field X may then be represented as X_t = Σ g_{t−s} Z_s for some coloured noise Z (this representation is non-causal if d = 1). Let d = 1 and Φ = {F = [a, +∞[; a ∈ ℤ}; the same representation holds for some white noise and is causal if the process X is Φ-regular; moreover Φ-regularity is equivalent to the absolute continuity of the spectral measure together with the condition ∫ ln f(x) dx > −∞.
The fact that non-Gaussian linear processes are easier to identify than Gaussian ones (see Rosenblatt (1985), Theorem 5, p. 46) gives additional interest to the present results.
Note that the previous mixing results may be extended in the following multidimensional way. Assume that the random variables Z_t are vector valued, say in a Banach space 𝔹; the coefficients g_{t,s} then define a continuous linear operator over 𝔹, and all of the assumptions on the linear sequence (g_{t,s}) are rewritten in terms of the underlying norm of 𝔹. The interesting case seems to be 𝔹 = ℝ^d; however, changing X to Y = (Y_t)_{t∈ℤ^{d+1}}, with Y_{t+k𝟙} = X_{t,k} the k-th coordinate of X_t and 𝟙 = (0, …, 0, 1), leads to analogous multidimensional results.
Following Pham & Tran (1985), the previous results may be adapted to vector-valued random fields in ℝ^d. For this, it is enough to consider the elements g_{t,s} as d×d matrices and to replace moduli by norms of operators on d×d matrices. However, considering infinite-dimensional vector spaces is not a simple adaptation, because in that case Lebesgue measure cannot be used in the proof; see the use of condition (5) in the proof. The case of function spaces has a special interest. Strong mixing can also be replaced by absolute regularity; see Pham & Tran (1985).
The β-mixing condition of Z = (Z_n)_{n∈ℤ} is related in Denker & Keller (1986) to that of the process X = (X_n)_{n∈ℤ} if X_t = f(Z_t, Z_{t−1}, …) is a stationary random process generated by some function f such that, for some constants C > 0 and 0 ≤ a < 1,
 |f(x₁, …, x_n, y₁, y₂, …) − f(x₁, …, x_n, z₁, z₂, …)| ≤ C aⁿ.
Survey of the literature
Because of their particular structure, allowing a direct approach to most of the limit results, works concerning the mixing properties of linear random fields came relatively late. Withers (1981) studied a CLT for this particular class of processes, defining a mixing condition adapted to linear processes. Gorodetskii (1977) corrected an incomplete proof in Chanda (1974) giving the strong mixing properties of linear causal sequences. The same processes are also shown to be absolutely regular in Pham & Tran (1985), and the results are extended there to vector-valued processes (see also Athreya & Pantula (1986)). The results presented here are the ones in Doukhan & Guyon (1991); causality as well as independence of the innovation are omitted. Gorodetskii (1977) and Rosenblatt (1980) proposed counterexamples 1 and 2; they show the importance of the invertibility and absolute continuity assumptions.
Among the important literature concerning classical time series, Kesten & O'Brien (1976) propose a review of mixing processes such as linear ones. Mixing conditions for infinite-memory systems are proposed by Denker & Keller (1986). Mokkadem (1990) proves the mixing properties of ARMA processes, which are an important class of linear processes. Basic textbooks for the general theory of random fields are Rozanov (1967) and Guyon (1992).
Examples: Markov Processes 87
This chapter is divided into three main parts. We first present results for general Markov processes; some consequences are provided in § 2.4.0.1 for a class of nonlinear processes, dynamical systems are explored in § 2.4.0.2, and a class of nonhomogeneous processes is considered in § 2.4.0.3. The main consequences are provided in two sections devoted to polynomial processes (§ 2.4.1) and explicit examples of nonlinear processes (§ 2.4.2).
We first recall that Q is called a probability kernel on the countably generated measurable space (E, ℰ) if Q(x, ·) is a probability measure on (E, ℰ) for each x ∈ E, and x → Q(x, F) is a measurable function on (E, ℰ) for F ∈ ℰ.
We use again the notations of § 2.2. Let Q, R be two probability kernels, f be a measurable function and γ be a signed measure; we define QR, Qf and γQ as the probability kernel, the measurable function and the signed measure defined, when this makes sense, for x in E and F in ℰ, by
 QR(x, F) = ∫ Q(x, dy) R(y, F),
A Markov process X = (X_t)_{t∈T} is defined by a measure space (E, ℰ) and some probability space (Ω, 𝒜, P), via a transition probability kernel P_s^t(x, dy) such that P_s^t(x, F) is a regular version of the conditional probability P(X_t ∈ F | X_s = x) for any F in ℰ. Therefore, according to the usual definition, conditional probability with respect to the past is equal to conditional probability with respect to the immediate past.
The process is said to be homogeneous if P_s^t(x, F) only depends on (t − s); in this case we shall write P^h = P_s^{s+h}. We consider from now on a discrete-time, homogeneous Markov chain X = (X_t)_{t∈ℕ}. The n-step transition kernel is defined by
Pⁿ(x, F) = P(X_n ∈ F | X₀ = x) — set P = P¹. For a point x of E we write 𝔼_x for the expectation operator conditional on X₀ = x.
Davydov (1973) proves the following fundamental explicit bounds (1) for the mixing coefficients β_n and φ_n.
1 (1), (2) are the transcriptions, in terms of transition operators, of the relations, valid for Markov chains:
 β_n = 𝔼 sup_B |P(B | X_k) − P(B)|, φ_n = sup_k ess-sup sup_B |P(B | X_k = x) − P(B)|, B ∈ σ(X_{k+n}).
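The coefficients of footnote (1) can be computed explicitly for a two-state chain. The sketch below (my own example, not from the text) checks that the rows of Pⁿ approach the stationary law π at the geometric rate |1 − p − q|ⁿ, the second eigenvalue of P.

```python
# Illustration (mine): two-state chain P = [[1-p, p], [q, 1-q]], stationary
# law pi = (q, p)/(p+q); the variation norm of (row 0 of P^n) - pi equals
# 2 * pi_1 * |1 - p - q|^n, so phi_n decays geometrically.
p, q = 0.3, 0.2
P = [[1 - p, p], [q, 1 - q]]
pi = [q / (p + q), p / (p + q)]
lam = 1.0 - p - q  # second eigenvalue of P

def mat_mul2(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

Pn = [[1.0, 0.0], [0.0, 1.0]]
for n in range(1, 15):
    Pn = mat_mul2(Pn, P)
    var_norm = sum(abs(Pn[0][j] - pi[j]) for j in range(2))
    assert abs(var_norm - 2.0 * pi[1] * abs(lam) ** n) < 1e-12
```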
We first propose a sufficient condition for geometric decrease of the <I>-mixing coefficients of
the Markov chain (Xn) with transition probabilities P.
Theorem 1 (Ueno (1960), Davydov (1973)). Let (X_n) be a Markov chain and μ be some nonnegative measure with non-zero mass μ_0. Assume that there exists some integer r such that for any x in E, P^r(x, A) ≥ μ(A); then
∀ x, x′ ∈ E,  ‖P^r(x, ·) − P^r(x′, ·)‖_Var ≤ 2(1 − μ_0),
∀ n ∈ ℕ, ∀ x, x′ ∈ E,  ‖P^n(x, ·) − P^n(x′, ·)‖_Var ≤ 2(1 − μ_0)^{⌊n/r⌋} = 2η_n.
There exists a probability measure π on E such that
Sketch of Theorem 1's proof. The measure P^r(x, ·) − μ is nonnegative and, up to a factor 2, its mass equals its variation norm, so the first point follows. The other points are deduced classically, using the multiplicative properties of transition kernels and inequality (2). See Orey (1971) or Doob (1953). ∎
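On a finite state space the Doeblin minorization of Theorem 1 and the resulting contraction can be checked numerically; a sketch with r = 1, taking for μ the componentwise minimum of the rows (the chain itself is an arbitrary illustration):

```python
import numpy as np

# Arbitrary illustrative chain; with r = 1 the componentwise minimum of the
# rows is a minorizing measure mu, so Doeblin's condition holds.
P = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.6, 0.2],
              [0.3, 0.3, 0.4]])

mu = P.min(axis=0)   # mu(A) <= P(x, A) for every x
mu0 = mu.sum()       # its non-zero mass mu_0

def tv_diam(Pn):
    """max over x, x' of the variation norm ||P^n(x,.) - P^n(x',.)||."""
    m = len(Pn)
    return max(np.abs(Pn[i] - Pn[j]).sum() for i in range(m) for j in range(m))

for n in range(1, 8):
    Pn = np.linalg.matrix_power(P, n)
    # Theorem 1 with r = 1: the diameter is at most 2 (1 - mu_0)^n.
    assert tv_diam(Pn) <= 2 * (1 - mu0) ** n + 1e-12
```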
Remarks 1. For the case of non homogeneous Markov chains, replacing the assumption P^r(x, A) ≥ μ(A) by P_s^{s+r}(x, A) ≥ μ(A) yields φ-mixing, and φ_n ≤ 2η_n whatever the initial distribution is.
A necessary and sufficient condition is given in Blum, Hanson & Koopmans (1963) for a stationary Markov process (X_n)_{n≥0} to be ψ-mixing. (X_n)_{n≥0} is ψ-mixing if and only if the probability transition kernel is absolutely continuous w.r.t. the stationary marginal distribution π and there are an integer m and a real number a, 0 < a < 1, such that the density p_m(x, y) of the kernel P^m(x, ·) satisfies π⊗π({(x, y) ∈ E²; |p_m(x, y) − 1| > a}) = 0. This determines *-mixing sequences which are not φ-mixing (see § 1.3.). An example is the integer-valued Markov chain given by the transition matrix P = (p_{i,j}) with p_{i,j} = (1/2)^j + (δ_{i,j} − δ_{i+1,j})(1/2)^{i+j}.
probability measure. Let L_0^p = {f ∈ L^p(E, μ); ∫ μ(dx) f(x) = 0}; the Markov chain X is said to be L^p(E, μ)-mixing if
a_p(n) = sup_{f∈L_0^p} ‖P^n f‖_p / ‖f‖_p → 0.
In this case a_p(n) decreases geometrically to zero, because the operator norm based on L^p(E, μ) is an algebra norm. Moreover b_n = sup_{f∈L_0^∞} ‖P^n f‖_1 / ‖f‖_∞ has the same decay as the strong mixing coefficients α_n, and α_n ≤ b_n. An L^p(E, μ)-mixing Markov chain is strongly mixing with a geometric decay. Doeblin's condition, and consequently geometric φ-mixing, holds if a_∞(n) → 0. Moreover L²(E, μ)-mixing is equivalent to ρ-mixing, thus a ρ-mixing Markov process is geometrically ρ-mixing.
The Markov process is called recurrent if it is recurrent in the sense given in § 1.3.3. with the same σ-finite nonnegative measure μ. X is said to be positive recurrent if μ is bounded, and null recurrent otherwise. The measure μ is invariant and unique up to a positive multiplicative constant.
Assume now that there exists a unique invariant probability measure π. The chain will be called aperiodic positive recurrent if ‖P^n(x, ·) − π‖_Var → 0 π-a.s. We shall indifferently call ergodic an aperiodic positive recurrent chain, in view of the ergodic theorem provided by the definition; the terminology of stationary distribution is justified by the fact that if the initial distribution of the process is π, the process X is stationary. Orey (1971) and Nummelin & Tuominen (1982) prove sufficient conditions for the following relations to hold π-a.s. for some function A(x) > 0 with ∫ A(x) π(dx) < ∞.
Relations (3) and (4) are known as geometric ergodicity and Riemannian recurrence with order κ if κ is an integer. With the identity (1) they imply respectively β_n = O(η^n) and β_n = O(n^{−κ}) if the chain is stationary. However these relations do not necessarily hold whatever the initial distribution of the process is. More precisely this condition will depend, in a nontrivial sense, on the degree of relationship between the initial distribution and the stationary one (2).
A C-set is a set C in ℰ such that for some σ-finite nonnegative measure m and some integer r, P^r(x, F) ≥ m(F) for any x in C and F in ℰ (3).
Let τ_C be the hitting time of the C-set C, τ_C = inf{t > 0, X_t ∈ C}; we consider the relations
(3′) sup_{x∈C} E_x e^{aτ_C} < ∞,
(4′) sup_{x∈C} E_x τ_C^{κ+1} < ∞.
The relations (3′) and (4′) are proved to imply (3) and (4) respectively in Nummelin (1984) and Duflo (1990), using Orey (1971). Moreover sup_{x∈C} E_x τ_C < ∞ implies aperiodic positive recurrence. Now sufficient conditions for those conditions to hold for some C-set are stated in terms of Lyapounov functions. We first recall the following definitions.
(3) In the terminology of Orey (1971); the terminologies small set, or petit set (the French translation of the previous one), are also often used.
(d) Period of irreducible Markov chains: Orey (1971) proves, under the irreducibility assumption, that there is a measurable partition {C_1, ..., C_d, F} of E, with m(F) = 0,
P(x, C_{i+1}) = 1, ∀ x ∈ C_i, the index i being considered modulo d.
If d = 1, the Markov chain X is said to be aperiodic.
We are now in a position to state the following Lyapounov function test criterion (see also Nummelin (1984)).
The forthcoming Theorem 3 from Mokkadem (1990) provides simple sufficient conditions for
m-irreducibility and aperiodicity in this result. It is extensively used in § 2.4.0.2., § 2.4.1,
§ 2.4.2.1, § 2.4.2.3.
A Harris recurrent Markov chain admits a unique invariant nonnegative measure π, and m is absolutely continuous with respect to π. Davydov (1973) proves (see also Athreya & Pantula (1986 b)) that an aperiodic and Harris recurrent Markov chain is absolutely regular if π is a finite measure. More precisely, absolute regularity (or β-mixing) holds for a Harris recurrent Markov chain with a finite invariant measure if the distribution of V_n has support in F and no more than one of the C_j's. This holds whatever the initial distribution is.
If ∏_{j=1}^∞ u_j = 0 the chain is recurrent with invariant probability π such that π_a = π_b = 1/2. If moreover the initial distribution is concentrated on {a, b}, the chain is ψ-mixing by the previous remarks. If the initial distribution is concentrated on {1}, the chain is β-mixing; it is ψ-mixing iff sup_j u_j < 1. This provides an example of a Markov β-mixing chain which is not ψ-mixing.
b) φ-mixing does not imply the recurrence of a Markov chain. Let E = ℕ and consider the one-step transition probability P built from shifted copies of the block U = (1/2 1/2): from each state the chain moves, with probability 1/2 each, to one of two states further along ℕ.
Let us now state assumptions useful in the following result. Let E be a separable topological space endowed with its Borel σ-field. Assume that for some subspace S of E and some nonnegative and σ-finite measure μ with μ(S) > 0 we have ℙ(∀ n ∈ ℕ, X_n ∈ S) = 1; this means that the process lives on S. Set
(H1) For any compact subset K of E and any Borel subset N of S such that μ(N) = 0, there is an integer r with ∀ x ∈ K∩S, P^r(x, N) = 0.
(H2) For any compact subset K of E and any Borel subset A of S such that μ(A) ≠ 0, there is an integer s with inf_{x∈K∩S} P^s(x, A) > 0.
(H3) There are a nonnegative and measurable function g (the Lyapounov function) on S, a compact subset K of E and positive constants A, ε > 0 and 0 < ρ < 1 with μ(K∩S) > 0,
∀ x ∈ S\K, E(g(X_{n+1}) | X_n = x) ≤ ρ g(x) − ε,
∀ x ∈ K∩S, E(g(X_{n+1}) | X_n = x) ≤ A.
(H4) There are a nonnegative and measurable function h on S, a compact subset K′ of E and a positive constant A with μ(K′∩S) > 0,
∀ x ∈ S\K′, E(h(X_{n+1}) | X_n = x) ≤ h(x) − 1,
∀ x ∈ K′∩S, E(h(X_{n+1}) | X_n = x) ≤ A.
b) Assumptions (H2) and (H4) imply positive recurrence. By a result in Jain & Jamison (1967) there is an excessive measure π ≫ μ_S. Consider a compact K with μ_S(K) > 0; apply (H2) to K and a set A with μ_S(A) > 0 and π(A) < ∞; then π(K∩S) < ∞, indeed
π(A) ≥ ∫_{K∩S} π(dx) P^s(x, A) ≥ π(K∩S) inf_{x∈K∩S} P^s(x, A).
Hence subsets K∩S with K compact and μ(K∩S) > 0 are test sets for Tweedie's ergodicity criterion (H4).
c) Assumption (H1), μ_S-irreducibility and positive recurrence imply that π is equivalent to μ_S, and Harris positive recurrence. We only have to prove that π ≪ μ_S. Let K be an arbitrary compact in E, and N with μ_S(N) = 0; then there is an integer r such that P^r(x, N) = 0 on K∩S. Hence π(N) = ∫_S π(dx) P^r(x, N) ≤ π(S\K) is arbitrarily small, as shown by the regularity of the measure π. Notice thus that subsets K∩S with K compact and μ(K∩S) > 0 are C-sets in the sense of Nummelin & Tuominen (1982), since the measures π and μ_S are equivalent. Using theorems 2.3 and 2.5 in Revuz (1984), chapter 3, we see that there is some null set N with μ_S(N) = 0 and such that S\N is absorbing; thus F(x, A) = 1 for x in S\N and μ_S(A) > 0, where
F(x, A) = ℙ_x(∪_{n=1}^∞ {X_n ∈ A}).
We still have to show that F(x, A) = 1 for x in N, and for this use (H1) with K = {x} and the relation
F(x, A) ≥ P^r(x, A) + ∫_{S\A} P^r(x, dy) F(y, A).
d) Assumptions (H1), (H2), (H3) and positive recurrence imply the conclusions of Theorem 3. Now use the ergodicity criterion in Nummelin & Tuominen (1982) and equality (1) to conclude. ∎
Remarks 2. Meyn & Tweedie (1992)'s theorem 8.1 proves, with the identity (1′) in § 1.1, that α_n = O(η^n) for some 0 < η < 1 under the assumptions of Theorem 3, if the initial distribution is a Dirac mass at some point. If the initial distribution ν_0 is absolutely continuous with respect to π and satisfies ∫ g dν_0 < ∞, then β_n = O(η^n) (use theorem 6.3 of Meyn & Tweedie (1992), which proves that A(x) = O(g(x)) in inequality (3), and the related remarks).
Examples: Markov Processes 93
We follow here the presentation in Mokkadem (1987). Let (X_n) be the ℝ^d-valued Markov chain defined by the recurrence X_{n+1} = f(X_n) + e_{n+1}. Here f is a measurable function on ℝ^d while (e_n) is a sequence of independent and identically distributed random variables. Such processes are in the class of Markov systems defined e.g. in Meyn & Tweedie (1992).
(IA) ∃ r ∈ ℕ: P^r(x, ·) ∼ λ and inf_{x∈K} P^r(x, A) > 0 for any Borel set A with λ(A) > 0.
This condition implies irreducibility and aperiodicity. Suppose that e_n is absolutely continuous and write g for its density. Here the transition density of the process writes p(x, y) = g(y − f(x)). Hence condition (IA) follows from the following assumption.
Proposition 1. (i) If (IA) and (U′) hold and |f(t)| ≤ r|t| if |t| > M, for some M > 0 and 0 ≤ r < 1, then the conclusions of Theorem 3 hold.
(ii) Without assuming (U′) but only the uniform integrability of (|e_t|^s)_t and the condition E|f(t) + e_t|^s ≤ r|t|^s if |t| > M, for some M > 0 and 0 ≤ r < 1, then the conclusions of Theorem 3 hold.
(iii) If (IA) and (U″) hold and |f(t)| ≤ |t| − A if |t| > M, for some M > 0 and A > E|e_1|, then the conclusions of Theorem 3 hold.
(CT3) The distribution of e_n is absolutely continuous with respect to Lebesgue measure and its density is positive on some interval around the origin.
Theorem 4 (Tong 1990). Assume that f is continuous over ℝ^d and continuously differentiable in a neighbourhood of the origin. Suppose that conditions (CT1), (CT2), (CT3), (CT4), (CT5), and (CT6) hold. The Markov process defined by (CT) is then geometrically ergodic.
For this, use Theorem 2 with the Lyapounov function g(x) = sup_{n≥0} {e^{cn} |f^{(n)}(x)|} in order to prove that assumption (H3) holds with S = ℝ^d. The rest of the proof uses the results in Feigin and Tweedie (1985). The main difference with Theorem 3 is that no localization of the Markov chain on a subset may be considered. However, as quoted in Tong (1990), Theorem 4 may be used to prove the geometric ergodicity of various classes of Markov processes instead of Theorem 3. A partial converse to this result is also provided in Tong (1990).
Proposition 2 (Tong 1990). Suppose that conditions (CT2), (CT5), (CT6), (CT7) and (CT8) hold. For any initial distribution of X_0, the process defined by (CT) satisfies |X_n| → ∞.
The philosophy of the previous results justifies them: the stability properties of the dynamical system (DS) are essentially the same as those of the noisy system (CT). The point is, however, that only very strong contractivity conditions are considered in these results.
2.4.0.3. Annealing
We recall here some basic facts in the case of nonhomogeneous discrete valued Markov chains. Such chains are used in annealing techniques for Gibbs fields, see § 2.2. We follow here the lines of Föllmer (1988) and Guyon (1992).
Annealing techniques have thus been introduced for computational reasons. The finite case |Ω| = w < ∞ is the only one interesting to investigate. Here T is finite, as well as E. The density of the probability distribution μ with respect to the product discrete counting measure ν takes the form
dμ/dν(ω) = (1/Z) exp{−I(ω)}.
We set c(P) = ½ sup_{i,j} Σ_k |P(i,k) − P(j,k)| = ½ sup_{i,j} ‖P(i,·) − P(j,·)‖_Var for a Markov transition matrix P.
The chain is said to be weakly ergodic if for any n ≥ 1, lim_{m→∞} c(Q_n^m) = 0, and strongly ergodic if there is a distribution μ_∞ such that, for any n ≥ 1, lim_{m→∞} sup_μ ‖μ Q_n^m − μ_∞‖_Var = 0.
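The coefficient c(P) above is the Dobrushin contraction coefficient; it is submultiplicative along products of transition matrices, which is what the ergodicity criteria rely on. A minimal sketch (the random stochastic matrices are arbitrary illustrations):

```python
import numpy as np

def c(P):
    """Dobrushin contraction coefficient: half the max L1 distance between rows."""
    m = len(P)
    return 0.5 * max(np.abs(P[i] - P[j]).sum()
                     for i in range(m) for j in range(m))

rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(4), size=4)   # two arbitrary stochastic matrices
R = rng.dirichlet(np.ones(4), size=4)

# c is submultiplicative: c(PR) <= c(P) c(R), and 0 <= c <= 1.
assert c(P @ R) <= c(P) * c(R) + 1e-12
assert 0.0 <= c(P) <= 1.0
```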
Assume that μ_n P_n = μ_n; then weak ergodicity implies strong ergodicity if moreover
Σ_n ‖μ_{n+1} − μ_n‖_Var < ∞.
The behaviour of (1/n) Σ_{i=1}^n f(X_i) is that of ∫ f(ω) μ_∞(dω) if the following conditions hold. Let c_n = max_{1≤i≤n} c(P_i); then the weak law of large numbers holds if lim_{n→∞} n(1 − c_n) = ∞, the strong law of large numbers holds if Σ_{n=1}^∞ n^{−2}(1 − c_n)^{−2} < ∞, and the central limit theorem holds if lim_{n→∞} n^{1/3}(1 − c_n) = ∞.
Simulation algorithms for the distribution μ follow from such results for a Markov chain with μ P_n = μ; indeed, strong ergodicity implies lim_{n→∞} ℙ(X_n = x | X_0 = x_0) = μ(x).
The first example is the Metropolis dynamics. Choose any symmetric matrix Q on Ω. Set P_n = P with P(i,j) = Q(i,j) if i ≠ j and μ(j) ≥ μ(i), P(i,j) = Q(i,j) μ(j)/μ(i) if i ≠ j and μ(j) < μ(i), and P(i,i) = 1 − Σ_{k≠i} P(i,k). Rewrite this as P(i,j) = Q(i,j) exp{−(I(j) − I(i))⁺} for i ≠ j.
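As a sketch, the Metropolis matrix can be built explicitly on a small finite state space and its invariance under μ checked; the energy I, the uniform proposal Q and the state space size are arbitrary illustrations:

```python
import numpy as np

I_energy = np.array([0.0, 1.0, 2.0, 0.5])   # hypothetical energy I(i)
mu = np.exp(-I_energy)
mu /= mu.sum()                              # Gibbs measure mu proportional to exp(-I)

w = 4
Q = np.full((w, w), 1.0 / w)                # any symmetric proposal matrix

# P(i, j) = Q(i, j) exp{-(I(j) - I(i))^+} for i != j; the diagonal fills the rest.
P = np.zeros((w, w))
for i in range(w):
    for j in range(w):
        if i != j:
            P[i, j] = Q[i, j] * np.exp(-max(I_energy[j] - I_energy[i], 0.0))
    P[i, i] = 1.0 - P[i].sum()

# Detailed balance mu(i) P(i,j) = mu(j) P(j,i) implies mu P = mu.
assert np.allclose(mu @ P, mu)
```

The invariance follows from detailed balance, which the construction enforces by symmetry of Q.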
The other fundamental example is the Gibbs sampler. It is defined for a sequence s_n in T such that each element of T is visited infinitely often (e.g. s_n ≡ n [N] if T = {1, ..., N}). In this case, set P_n(i,j) = 1_{{i^{(n)} = j^{(n)}}} μ(j_{s_n} | i^{(n)}), where we have set i = (i_1, ..., i_r) ∈ Ω and i^{(n)} = (i_t)_{t≠s_n}, and analogously for j. In this case
lim_{n→∞} ℙ(X_n = x | X_0 = x_0) = μ(x) if μ(x) > 0.
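A sketch of the systematic-scan Gibbs sampler on a tiny binary configuration space (the sites, the energy and the scan s_n = n mod N are illustrative assumptions): at step n only coordinate s_n is refreshed, drawn from the conditional law of μ given the other coordinates.

```python
import random
import math

N = 3                                   # sites T = {0, ..., N-1}

def I_energy(cfg):                      # hypothetical interaction energy
    return -sum(1.0 if cfg[t] == cfg[(t + 1) % N] else 0.0 for t in range(N))

def mu(cfg):                            # unnormalized Gibbs weight exp(-I)
    return math.exp(-I_energy(cfg))

def gibbs_step(cfg, s):
    """Redraw coordinate s from mu(. | other coordinates fixed)."""
    weights = []
    for v in (0, 1):
        new = list(cfg)
        new[s] = v
        weights.append(mu(tuple(new)))
    u = random.random() * sum(weights)
    cfg = list(cfg)
    cfg[s] = 0 if u < weights[0] else 1
    return tuple(cfg)

random.seed(0)
x = (0, 1, 0)
for n in range(10_000):
    x = gibbs_step(x, n % N)            # s_n = n mod N visits every site
```

Each kernel P_n leaves μ invariant, and here μ(x) > 0 for every configuration, so the convergence statement above applies.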
Consider the general polynomial AR process (Z_n) with values in a smooth algebraic variety (4) E; it is defined by an independent and identically distributed sequence (e_n)_{n∈ℤ} with values in a smooth algebraic variety F, a polynomial function φ: E×F → E and the recurrence relation
(PAR) Z_{n+1} = φ(Z_n, e_{n+1}), n = 1, 2, ...
The marginal distribution of the sequence (e_n)_{n∈ℤ} is assumed to be absolutely continuous with respect to a Lebesguian measure μ_F (5), and its support M = {f > 0} is defined by its density f. Let φ_e(z) = φ(z, e), S the semigroup of polynomial applications generated by the φ_e, and S_z the orbit of a point z in E under this semigroup (S_z = {φ_{e_1}∘⋯∘φ_{e_k}(z); k ∈ ℕ, e_i ∈ F}). We shall assume (H3) and
(A1) ∃ T ∈ E, ∃ a ∈ F, ∀ x ∈ E: T ∈ cl(S_x), T = φ(T, a); T is said to be an attraction point for the chain.
(A2) The sequence R_k(x) = φ_{e_n}∘⋯∘φ_{e_{n−k}}(x) converges in mean of order s to a limit independent of x, for some real number s > 0.
(A1) holds true if φ_a^k = φ_a∘⋯∘φ_a (k times) is Lipschitzian with order (strictly) less than 1. Let Z_n be the limit in distribution of R_k(x) under assumption (A2); then the process (Z_n)_{n∈ℤ} is stationary, satisfies the recurrence relation Z_{n+1} = φ(Z_n, e_{n+1}), and its marginal distribution π is invariant by the Markov chain (Z_n). (H3) and (H4) imply (A2).
(4) Recall that an algebraic subset E in ℝ^d is a subset defined for polynomials F_1, ..., F_r in ℝ[X_1, ..., X_d] by E = {x ∈ ℝ^d; F_1(x) = 0, ..., F_r(x) = 0}. A smooth algebraic variety E is now an analytic manifold which is an algebraic set and which is not the union of two proper algebraic subsets.
(5) If F is a submanifold of ℝ^d it is the induced area measure.
Let us set some definitions before stating the result. The vector space spanned by S_T is called the Euclidean space of the process (Z_n)_{n∈ℤ}. Let now φ^k(z, e_1, ..., e_k) = φ_{e_1}∘⋯∘φ_{e_k}(z) be the iterated polynomial function; we set D_k = φ^k(T, F^k). Note that S_T = ∪_k D_k, and (H2) implies that the process (Z_n)_{n∈ℤ} is in the closure of S_T, that is π(cl(S_T)) = 1. Let W_k be the closure in Zariski's topology (6) of D_k; the sequence W_k is increasing, thus (7) it is constant, W_k = W, for k ≥ k_0; W is called the algebraic variety of the states of the process (Z_n)_{n∈ℤ}.
Theorem 5. Under the assumptions (A1), (A2) and (H3), the equation Z_{n+1} = φ(Z_n, e_{n+1}) defines on the orbit S_T of T a geometrically ergodic Harris recurrent chain, and its invariant probability π is equivalent to the restriction μ_S of μ to S_T.
Moreover, if the process (Z_n) is stationary then it is geometrically absolutely regular and E g(Z_n) = ∫ g(x) π(dx) < ∞.
If M is open, then the Markov chain is Harris recurrent and geometrically ergodic.
The proof of this result is based on heavy algebraic geometry arguments and may be found in
Mokkadem (1990). We are here much more interested in the applications of this result to
tractable models.
Until the end of this subsection we shall only consider affine models defined on E = ℝ^p by Z_{n+1} = A(e_{n+1}) Z_n + b(e_{n+1}), n = 1, 2, ..., where A is a polynomial function with p×p matrix values and b is an ℝ^p-valued polynomial function. Let 0 be a point in F; we shall make the assumptions
(A′1) The eigenvalues of A(0) are inside the open unit disk.
Corollary 1. Under assumptions (A′1), (A′2) and (H3), the conclusions of Theorem 5 hold. Moreover assumption (A′1) implies (A′2) and (H3) with g(x) = |x|^s + 1.
(6) Zariski's topology is generated by elementary closed sets which are the zero sets of polynomials.
(7) An increasing sequence of algebraic varieties in ℝ^d is stationary.
(BIL)
Example 2. If
A = (0 1 0; 0 0 0; 0 0 0), B = (0 0 0; 0 1 0; 0 0 0), c = (0, 1, 0)ᵀ,
then (c, Ac, Bc) does not span ℝ³. In this case Theorem 4 cannot be used to prove geometric ergodicity, since the process does not fill the whole space.
Assume that the ℝ^l-valued process (Y_t)_{t∈ℤ} satisfies the recurrence relation
(ARMA) Σ_{i=0}^p B_i Y_{t−i} = Σ_{j=0}^q A_j ε_{t−j},
for some independent and identically distributed ℝ^r-valued sequence (ε_t) of centered random variables; B_i is an l×l real matrix for i = 0, ..., p, A_j is an l×r real matrix for j = 0, ..., q, and B_0 is the identity l×l matrix. Define for z ∈ ℂ the matrices P(z) = Σ_{i=0}^p B_i z^i, Q(z) = Σ_{j=0}^q A_j z^j, and assume that
(S) The zeros of the polynomial P_1(z) = det P(z) have modulus bigger than 1.
If (S) holds then the equation (ARMA) has a unique solution which is stationary and takes the form Y_t = Σ_{k=0}^∞ C_k ε_{t−k}. A first way to get sufficient mixing conditions is to use the results in section 2.3; we shall prefer the simpler one concerning white noise (ε_t) given in Mokkadem (1990). It is also based on the previous Theorem 5.
Theorem 6. Under the stationarity assumption (S) the vectorial ARMA(p, q) model (ARMA) is geometrically absolutely regular if the marginal distribution of (ε_t) is centered at expectation and dominated by a Lebesguian measure on a smooth algebraic subvariety V of ℝ^r containing 0.
Sketch of the proof. Think of a Lebesguian measure as the Lebesgue measure on a linear submanifold of ℝ^d or, more generally, as its representations in the maps defining a Riemannian manifold. We shall only sketch the proof of this result, based on the Markovian representation of Y: for some X_t in ℝ^k with k = max{p, q+1},
X_t = F X_{t−1} + G ε_t,  Y_t = H X_t;
here F, G, H are real matrices such that the nonzero eigenvalues of F are the inverses of the roots of P_1, X is stationary, and ε_t and {X_{t−1}, X_{t−2}, ...} are independent. It is enough to prove the result for the process X; the algebraic variety considered is the subspace S = Σ_{p≥0} F^p G(ℝ^r). A suitable function g is given with the help of a Jordan decomposition of the restriction of F to S, with
F v_{j,1} = λ_j v_{j,1},  F v_{j,i} = λ_j v_{j,i} + v_{j,i−1} for i > 1, i = 1, ..., I_j, j = 1, ..., J,
and |λ_j| < r < 1 for j = 1, ..., J. It follows that E(g(X_{t+1}) | X_t = x) ≤ r^s g(x) + A. ∎
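For a scalar ARMA(1,1), the Markovian representation X_t = F X_{t−1} + G ε_t, Y_t = H X_t with k = max{p, q+1} = 2 can be written down directly; the coefficients below are arbitrary illustrations:

```python
import numpy as np

b, a = 0.5, 0.3                       # hypothetical AR and MA coefficients
# One Markovian representation of Y_t - b Y_{t-1} = eps_t + a eps_{t-1}:
F = np.array([[b, 1.0],
              [0.0, 0.0]])
G = np.array([1.0, a])
H = np.array([1.0, 0.0])

rng = np.random.default_rng(0)
eps = rng.normal(size=1000)
X = np.zeros(2)
Y = np.empty(1000)
for t in range(1000):
    X = F @ X + G * eps[t]            # X_t = F X_{t-1} + G eps_t
    Y[t] = H @ X                      # Y_t = H X_t

# The nonzero eigenvalue of F is b, the inverse of the zero 1/b of P(z) = 1 - b z.
assert np.isclose(max(abs(np.linalg.eigvals(F))), abs(b))
```

One can check on the simulated path that Y indeed satisfies the ARMA recurrence, which is how such a representation is usually validated.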
Ango Nze (1992) proves the following robustness result. It is stated as a Corollary since the technique used in the proof is the same.
Corollary 3. Assume that (ε_t) is a sequence of independent and identically distributed real valued random variables, centered at expectation, with common distribution equivalent to λ. Assume also that the polynomial P(x) = x^p − Σ_{i=1}^p a_i x^{p−i} has its roots inside the open unit disk of ℂ. Let g be a measurable function g: ℝ^p → ℝ such that |g(x)| ≤ a|x| for |x| > A and bounded on the set {|x| ≤ A}, for some A ≥ 0. Then there exists some a_0 > 0 such that the conclusions of Theorem 6 hold if a ≤ a_0.
Theorem 6 states this result for a = 0 if the distribution of (ε_t) is absolutely continuous with respect to λ. Unfortunately the value of a_0 is not directly related to the distance of the zeros of P to the unit circle; note however that this result holds for any a in the case of a bounded function g.
where f: ℝ^{kd}×𝕄^q → ℝ^d; d, k and q ≥ 1 are integers, and {x_n}, {ξ_n} are independent sequences of independent and identically distributed random variables with distributions L and M, valued in ℝ^d and in a separable Banach space (𝕄, ‖·‖_𝕄) respectively. Let |·|, |·|_𝕄 denote respectively norms on ℝ^d and 𝕄. On the state space ℝ^{kd}×𝕄^q of the process, we set the norm |·| defined by
|θ| = max{|u_1|, ..., |u_k|, |v_1|_𝕄, ..., |v_q|_𝕄} if θ = (u_1, ..., u_k, v_1, ..., v_q) ∈ ℝ^{kd}×𝕄^q.
The aim here is to state the mixing properties of such models under assumptions that can be checked in terms of a priori information about the problem (i.e. in terms of the properties of the distributions L and M of {x_n} and {ξ_n} and of the function f). We study probabilistic properties of ARX Markov chains, such as irreducibility, aperiodicity, and ergodicity. From the general results of § 2.4 we obtain α-mixing and φ-mixing sufficient conditions if the initial distribution of the process is a Dirac mass, and β-mixing and φ-mixing under a stationarity assumption. Note that the initial condition now involves the values of x_0, ..., x_{1−q}.
This measure is very singular, and thus it is clear that not much can be expected directly from the one-step transition measure. Thus we compute the r-th power transition kernel P^r(θ, A). Define
θ[u_1^*, ..., u_r^*, v_1^*, ..., v_r^*] = (u_1^*, ..., u_r^*, u_1, ..., u_{k−r}, v_1^*, ..., v_r^*, v_1, ..., v_{q−r}) for r ≤ min{k, q},
θ[u_1^*, ..., u_r^*, v_1^*, ..., v_r^*] = (u_1^*, ..., u_k^*, v_1^*, ..., v_r^*, v_1, ..., v_{q−r}) for k ≤ r ≤ q,
and θ[u_1^*, ..., u_r^*, v_1^*, ..., v_r^*] is defined similarly in the other cases; we have
Lemma 1. Using the previous notations and setting U_j = ℝ^d if j > k and V_j = 𝕄 if j > q, we have
P^r(θ, A) = ∫_{U_{r+1}(θ)} L(du_1^*) ∫_{U_r(θ[u_1^*, v_1^*])} L(du_2^*) ⋯ ∫_{U_3(θ[u_1^*, ..., u_{r−2}^*, v_1^*, ..., v_{r−2}^*])} L(du_{r−1}^*)
× ∫_{V_r} M(dv_1^*) ⋯ ∫_{V_2} M(dv_{r−1}^*) M(V_1) L(U_1(θ[u_1^*, ..., u_{r−1}^*, v_1^*, ..., v_{r−1}^*])).
Proof. Use recurrence and the relation P^{m+1}(θ, A) = ∫ P(θ, dθ′) P^m(θ′, A). Here θ′ = θ[u_1^*, v_1^*], and the integral only depends on the random variables u_1^*, v_1^*, except for Dirac masses. ∎
Let us consider some consequences of Lemma 1. We directly obtain the following sufficient condition for (H2).
Proposition 3. Under the irreducibility assumption (A), the Markov chain {X_n} is aperiodic (i.e. d = 1 in relation (d)).
Proof. Let p be a period in condition (d) and S denote the support of the distribution M; then the form of the transition probability kernel P yields
θ = (u_1, ..., u_k, v_1, ..., v_q) ∈ A_i ⟹ ℝ^d×{u_1, ..., u_{k−1}}×S×{v_1, ..., v_{q−1}} ⊂ A_{i+1}.
Using now the recurrence device we deduce easily that A_i = ℝ^{kd}×S^q, thus p = 1 and the chain is aperiodic; indeed ℝ^{kd}×S^q is the support of the distribution of θ. ∎
Geometric ergodicity is still considered in two different ways. The first way is a direct application of Lemma 1, leading to Doeblin's condition, which is a strong assumption. Denote by B(a, r) the closed ball centered at a and with radius r in ℝ^d.
Proposition 4. Assume that f is bounded and L ≥ a 1_{B(0, α+‖f‖_∞)} λ for some a, α > 0. Then if r ≥ max{k, q},
P^r(θ, A) ≥ μ(A) for some nonnegative measure μ with nonzero mass μ_0.
Proof. The result is proved for product Borel sets using Lemma 1, and the inequality follows with μ(A) = a^k ∫_{A∩B} λ(du_1) ⋯ λ(du_k) M(dv_1) ⋯ M(dv_q), which holds if B = (B(0, α))^k×𝕄^q. ∎
Remarks 4. The result still holds if (x_n) takes values in an arbitrary Polish space. The process (Y_n) defined by Y_n = f_n(Y_{n−1}, ..., Y_{n−k}, x_n, ..., x_{n−q+1}) + ξ_n is geometrically φ-mixing for any initial condition {Y_0, ..., Y_{1−k}} if (f_t) is a uniformly bounded family of measurable functions.
Irreducibility and aperiodicity assumptions follow from Proposition 3 under condition (A). We are thus in position to prove ergodicity criteria using Theorem 3. E.g. this result shows that ergodicity and geometric ergodicity follow from the existence of a nonnegative, measurable and locally bounded Lyapounov function g such that, for some x_0 > 0, c > 0, 0 < ρ < 1 and any θ with |θ| > x_0, E{g(X_{n+1}) | X_n = θ} ≤ ρ g(θ) − c.
In order to establish these results we introduce the following assumption on the function f. We assume that there exist nonnegative constants a_1, ..., a_k, a locally bounded and measurable function h: 𝕄 → ℝ₊, and positive constants x_0, c, such that
(ff) |f(θ)| ≤ Σ_{i=1}^k a_i |u_i| + Σ_{j=1}^q h(v_j) − c if |θ| > x_0.
Theorem 7. Assume that assumptions (A) and (ff) hold and E h(x_1) + E|ξ_1| < ∞. Assume that the unique nonnegative real zero ρ of the polynomial P(z) = 1 − a_1 z^{−1} − ⋯ − a_k z^{−k} satisfies ρ ≤ 1.
i) Then the process X is ergodic if E|ξ_1| + q E h(x_1) < c.
ii) The process X is geometrically ergodic if ρ < 1. Hence if the process X is stationary then Y is a geometrically β-mixing process.
Recall that if the distribution of ξ_1 is equivalent to Lebesgue measure then condition (A) trivially holds.
Proof. The polynomial P has only one positive zero ρ, as is shown in Pólya & Szegő (1972), § III-1-2, p. 106. Assumption (A) in Lemma 2 follows from Proposition 1, so that the only nontrivial point to prove in Theorem 3 is the existence of a locally bounded and nonnegative function g and suitable constants ε, 0 < ρ ≤ 1 and x_0 > 0 with
(L) E{g(X_{n+1}) | X_n = θ} ≤ ρ g(θ) − ε, if |θ| > x_0.
Set g(θ) = Σ_{i=1}^k α_i |u_i| + Σ_{j=1}^q β_j h(v_j) for positive constants α_i and β_j to be made precise later, with α_1 = 1. Let |θ| > x_0; note that if X_n = θ = (u_1, ..., u_k, v_1, ..., v_q) is fixed, then X_{n+1} = (Y_n, u_1, ..., u_{k−1}, x_{n+1}, v_1, ..., v_{q−1}) for Y_n = f(θ) + ξ_n. Hence
E{g(X_{n+1}) | X_n = θ} ≤ Σ_{i=1}^k α_i′ |u_i| + Σ_{j=1}^q β_j′ h(v_j) + E|ξ_n| + β_1 E h(x_{n+1}) − c,
where α_i′ = a_i + α_{i+1} for 1 ≤ i < k, α_k′ = a_k, and β_j′ = 1 + β_{j+1} for 1 ≤ j < q, β_q′ = 1.
Let us show that there exist some {α_i, β_j} with
α_i′ = ρ α_i, β_j′ ≤ ρ β_j for 1 ≤ i ≤ k and 1 ≤ j ≤ q.
The first equalities yield iteratively P(ρ) = 0 as well as an appropriate choice of coefficients α_i > 0 for i < k. Now choose β_j′ = ρ β_j for j < q and β_q′ ≤ ρ β_q, or equivalently 1 ≤ ρ β_q and ρ β_j = 1 + β_{j+1} for j < q. We get β_j = ρ^{j−1} β_1 − (1 + ⋯ + ρ^{j−2}) for j < q. Those relations hold if ρ^q β_1 ≥ 1 + ⋯ + ρ^{q−1}. Since 1 + ⋯ + ρ^{q−1} ≤ q when ρ ≤ 1, this is the case if we set β_1 = q/ρ^q.
Thus we have determined coefficients {α_i, β_j} such that the Lyapounov inequality (L) holds. The two cases considered in Theorem 7 now have to be treated separately.
Assume first that ρ < 1; then (L) holds for some real number ε: replacing ρ by some greater value smaller than 1 and increasing x_0 yields condition (L).
If now ρ = 1 then β_1 = q is a suitable choice; hence (L) holds for some ε > 0 and ρ if E|ξ_n| + q E h(x_{n+1}) < c. ∎
Remarks 5. If the distributions L and M are absolutely continuous, then so are π and P^n(θ, ·). Moreover, if μ and μ_n are the densities of π and P^n(θ, ·), then the second part of Theorem 7 asserts that ∫ |μ(θ) − μ_n(θ)| dθ = O(η^n).
Remarks 6. Pólya & Szegő (1972, § III-1-2, p. 106) propose a bound M for the positive zero of P. This bound is defined by constants c_i ≥ 0 such that c_1 + ⋯ + c_k ≤ 1, for 1 ≤ i ≤ k.
Remarks 7. For q = 0, ARX processes are simply nonlinear AR(k) processes. They are defined by
Y_n = f(Y_{n−1}, ..., Y_{n−k}) + ξ_n.
If k = 1, the condition in part two of Theorem 7 writes limsup_{|x|→∞} |f(x)|/|x| = ρ < 1.
k
In the AR(k) process Yn = L b i Yn-i + ~n' the best possible bound in condition (ff) is
i=I
k
If(8)1 ~ L Ibilluil. The roots of the polynomial Q(z) = zk - b l zk-I + ... - b k have a
i=I
modulus less or equal to the only positive zero p of P(z) = zk - Ibll zk-t - ... - Ibkl.
Mokkadem (1990) Theorem 6 proves geometric ergodicity of the process if the roots of Q lie
inside the unit disk. In fact Polya & Szego (1972, § I1I-1-2) show that (2 11k - l)p ~ /.1 ~ P
if is /.1 the largest modulus of a zero of Q. The previous loss between assumption conceming
the zeros of Q and R seems to be essential in view of the form of majorization chosen; this also
explain the interest of Corollary 3.
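The comparison between the zeros of Q and the positive zero ρ of P is easy to check numerically; the AR(3) coefficients below are an arbitrary illustration:

```python
import numpy as np

b = np.array([0.4, -0.3, 0.2])                   # hypothetical AR(3) coefficients b_i
Q_coeffs = np.concatenate(([1.0], -b))           # Q(z) = z^3 - b1 z^2 - b2 z - b3
P_coeffs = np.concatenate(([1.0], -np.abs(b)))   # P(z) = z^3 - |b1| z^2 - |b2| z - |b3|

# rho: the unique positive real zero of P; mu: largest modulus of a zero of Q.
rho = max(r.real for r in np.roots(P_coeffs)
          if abs(r.imag) < 1e-8 and r.real > 0)
mu_ = max(abs(np.roots(Q_coeffs)))

k = 3
assert mu_ <= rho + 1e-9                         # mu <= rho
assert (2 ** (1 / k) - 1) * rho <= mu_ + 1e-9    # Polya & Szego lower bound
```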
Example 3. Let (X_n) be the real valued Markov process defined, for some independent and identically distributed sequence (ξ_n) with an absolutely continuous distribution and E|ξ_1| < ∞, by the recurrence relation
(TAR(1)) X_{n+1} = [φ_1 1_{{X_{n−p} ≤ 0}} + φ_2 1_{{X_{n−p} > 0}}] X_n + ξ_n, n = 1, 2, ...
Petrucelli & Woolford (1984) prove for p = 0 that a necessary and sufficient condition for geometric ergodicity to hold is φ_1 < 1, φ_2 < 1 and φ_1 φ_2 < 1. Chen & Tsay (1991) extend (10) this result to p ≥ 1: the condition is transformed by adding φ_1^s φ_2^t < 1 and φ_2^s φ_1^t < 1 for some explicit integers s, t depending only on p. It is interesting to note that Theorem 7 only yields the more restrictive sufficient condition |φ_1| < 1, |φ_2| < 1.
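A sketch of the (TAR(1)) recurrence for p = 0; the coefficients below satisfy the Petrucelli & Woolford condition although |φ_2| > 1, so the condition derived from Theorem 7 fails while the chain is still stable (the values and the Gaussian noise are arbitrary illustrations):

```python
import numpy as np

phi1, phi2 = 0.9, -1.05          # phi1 < 1, phi2 < 1, phi1 * phi2 < 1
rng = np.random.default_rng(1)

x = 0.0
traj = []
for n in range(50_000):
    phi = phi1 if x <= 0 else phi2
    # X_{n+1} = [phi1 1{X_n <= 0} + phi2 1{X_n > 0}] X_n + xi_n
    x = phi * x + rng.normal()
    traj.append(x)

traj = np.array(traj)
# Although |phi2| > 1, the two-step contraction |phi1 phi2| < 1 keeps the
# trajectory stochastically stable, in line with geometric ergodicity.
```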
In the more general situation of a multiplicative (11) locally σ-compact group (E, ·), the particular case of the AR(1) nonlinear process is investigated more precisely. The presentation in Doukhan & Ghindes (1980) is followed here.
The AR(1) nonlinear process is defined by X_0 (X_0 ∈ E) and the recurrence relation
X_n = f(X_{n−1})·ξ_n, n = 1, 2, ...
where f: E → E is measurable with respect to the Borel σ-field ℬ(E) on E and {ξ_n} is a sequence of independent and identically distributed random variables with marginal distribution L. The reference measure is the left Haar measure λ. Additional probabilistic properties of the Markov chain, such as irreducibility, aperiodicity and ergodicity, are given below. The transition probabilities are
P(x, A) = L([f(x)]^{−1}·A) for x ∈ E, A ∈ ℬ(E).
Let C(E) (resp. M(E)) be the set of bounded and continuous (resp. measurable) real functions on the topological space E; X has the Feller (resp. strong Feller) property if P(C(E)) ⊂ C(E) (resp. P(M(E)) ⊂ C(E)). Note that if f is continuous then the Markov chain is aperiodic and has the Feller property; if moreover L ≪ λ then it has the strong Feller property.
Consider for 0 ≤ z < 1 the kernel G_z(x, A) = Σ_{n=0}^∞ z^n P^n(x, A) and, for any Borel set A, the set of the points from which A can be reached, I(A) = {x ∈ E, G_z(x, A) > 0}.
X is said to be irreducible in the open sets (IO, for short) if I(O) = E for any nonempty open set O in E. Write S for the support of the measure L and say that L has ACP if L has a nontrivial absolutely continuous part with respect to λ.
Proof. Note first that there are some Borel set B and some constant b > 0 with L(A) ≥ b λ(A∩B) for any Borel set A.
a) Assume that A ∈ ℬ(E) satisfies λ(A) ≠ 0 and O is a nonempty open set in E; then
G_z(x, A) ≥ ∫ G_z(x, dy) G_z(y, A) ≥ b ∫ G_z(x, dy) λ(([f(y)]^{−1}·A)∩B).
The facts that f is continuous and onto, and the continuity of the function 1_A ∗ 1_B, yield the result.
b) Note that I(O) is an open set; then considering J = I(O)^c yields a contradiction, because P(x, J) = 1 for x in J and f(J)·S ⊂ J. ∎
(10) The sufficient condition of the previous result is proved using Theorem 2 for the iterated Markov process (X_{nd})_n. The converse is proved directly, by showing that otherwise the process is explosive.
(11) The multiplicative notation means that no commutativity assumption is made. It thus includes the use of classical groups, such as the orthogonal group in the space of square matrices. We denote by 1 the unit of E.
X is said to be recurrent in the open sets (RO, for short) if G_1(x, O) = ∞ for any nonempty open set O in E and any x in E.
X is called λ-recurrent if the taboo probability (12) satisfies U(x, A) = 1 for any Borel set A and any x outside of a λ-nullset N(A) (13). From now on we assume in this subsection that E is a metric space with a distance d invariant under translations. Usual Markov tools yield
Lemma 4. Assume that there is a compact set K in E such that G j(x, K) = 00 for any x in E.
Iffis continuous and onto, L has an ACP and (*) holds then X is A-recurrent.
Proof. Use the results in Revuz (1984, chapter 2.7) to prove that a 1-recurrent and λ-irreducible process is λ-recurrent. In the present framework these notions are equivalent, as is shown using
U(x, A) = P(x, A) + ∫_{Aᶜ} P(x, dy) U(y, A) ≥ 1 − P(⋂_{j=1}^{n} {X_j ∈ N(A)} | X₀ = x).
Now it is natural to set R_x = {y ∈ E; G₁(x, B(y, ε)) = ∞, ∀ ε > 0}. Let ε > 0, a ∈ R_x, s ∈ S. The precompactness of K determines a point in R_x as the intersection of balls with radius tending to 0. By continuity of f there exists ε' ≥ ε with
G₁(x, B(f(a).s, 2ε')) ≥ ∫_{B(a,ε)} G₁(x, dy) L(B([f(y)]⁻¹.f(a).s, 2ε')).
Hence G₁(x, B(f(a).s, 2ε')) ≥ G₁(x, B(a, ε)) L(B(s, ε)), since the triangular inequality yields B(s, ε) ⊂ B([f(y)]⁻¹.f(a).s, 2ε'). The assumptions on f imply f(R_x).S ⊂ R_x and the proof runs classically. ∎
Using renewal theory techniques yields the following result in the case E = ℝ.
13 That is, coming from any point x outside of N(A), the process visits infinitely often any Borel set A.
Ergodicity is considered in the same way as recurrence. Let E_x be the ergodicity set coming from the point x, defined by
E_x = {y ∈ E; limsup_{n→∞} (1/n) Σ_{k=1}^{n} P^k(x, B(y, ε)) > 0, ∀ ε > 0}.
Lemma 5. If f is continuous and onto, L has an ACP and (*) holds, then X is ergodic if moreover there is a compact subset K of E such that for any x in E
(**) G₁(x, K) = ∞ and limsup_{n→∞} (1/n) Σ_{k=1}^{n} P^k(x, K) > 0.
The following Lemma yields condition (**) in order to derive explicit ergodicity results. The first result is elementary, while the second one uses renewal theory.
Lemma 6. If one of the following assumptions holds, there is some compact K and some integer n₀ with inf_{n≥n₀} (1/n) Σ_{k=1}^{n} P^k(1, K) > 0.
a) There exist two sequences of compact sets (H_n) and (K_n) with Σ_{n=1}^{∞} L(H_nᶜ) < ∞, and
b) The closed balls B_c(x, r) in E are compact and there are constants a, T > 0 with f(B_c(1, r)) ⊂ B_c(1, r−a) for r > T. The distribution of d(1, X₁) is not arithmetic (14) and satisfies 𝔼 d(1, X₁) < ∞.
Remark 8. Results of § 2.4.2.1 easily extend to the present framework, but the corresponding results are not recalled in detail in this subsection. We only mention that geometric ergodicity - Theorem 7 - holds if E = ℝ^d and |f(x)| ≤ k|x| for some k < 1 and |x| big enough. Doeblin's condition holds assuming that L has an ACP and f(E) is relatively compact - see Theorem 1.
The vectorial autoregressive model with heteroscedastic errors is defined by the recurrence relation
(ARCH) X_{t+1} = f(X_t) + g(X_t) ε_{t+1}
where X₀ ∈ ℝ^d, g: ℝ^d → GL_d(ℝ) (15) and f: ℝ^d → ℝ^d are measurable functions and (ε_t) is an independent and identically distributed sequence with distribution L. We assume that L(f⁻¹(Z)) = 0, where Z = {x ∈ ℝ^d; det g(x) = 0}. If L is equivalent to Lebesgue measure
14 A probability distribution on ℝ is called arithmetic if it is supported by aℤ for some real number a.
15 The space GL_d(ℝ) of invertible real d×d matrices is equipped with some algebra norm [written ‖.‖] derived from a norm on ℝ^d [written |.|]; this means ‖A‖ = sup{|Ax|; |x| ≤ 1}.
on a neighbourhood of the origin in ℝ^d, then the process (X_t) is an aperiodic and λ-irreducible Markov chain.
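The (ARCH) recursion is straightforward to simulate. The sketch below uses hypothetical scalar (d = 1) coefficients f(x) = 0.5x and g(x) = √(1 + 0.25x²), chosen so that (|f(x)|² + g²(x))/|x|² tends to 0.5 < 1, with standard Gaussian innovations:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical scalar coefficients: |f(x)|^2 + g(x)^2 = 1 + 0.5 x^2,
# so the geometric-ergodicity ratio tends to 0.5 < 1 (here V = 1).
def f(x):
    return 0.5 * x

def g(x):
    return np.sqrt(1.0 + 0.25 * x * x)

T = 100_000
X = np.empty(T)
X[0] = 10.0                       # arbitrary start, quickly forgotten
for t in range(T - 1):
    X[t + 1] = f(X[t]) + g(X[t]) * rng.standard_normal()

# The stationary variance solves v = 0.25 v + 1 + 0.25 v, i.e. v = 2.
v = float(np.var(X[1000:]))
assert 1.5 < v < 2.5
```

The fast forgetting of the starting point X₀ = 10 and the finite empirical variance illustrate the ergodic behaviour; the parameter choices are illustrations only.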
An alternative condition for i) can be given in the Euclidean case. If the process (ε_t) is centered at expectation and its covariance matrix V satisfies
limsup_{|x|→∞} (|f(x)|² + tr[V g'(x) g(x)]) / |x|² < 1,
then the process (X_t) is geometrically ergodic. For this, set g(x) = 1 + |x|² in Mokkadem's criterion (Theorem 3). |.| denotes here the Euclidean norm of ℝ^d: |x| = √(x₁² + … + x_d²).
Proof. Using the method in Theorem 3 with h₁(x) = (1 + |x|)^s and h₂(x) = e^{a|x|} yields respectively the following inequalities, valid for |x| big enough and some 0 ≤ ρ < 1:
𝔼(h₁(X_{t+1}) | X_t = x) = 𝔼(1 + |f(x) + g(x)ε_{t+1}|)^s ≤ (1 + |f(x)| + ‖g(x)‖ 𝔼|ε₁|^s)^s ≤ ρ (1 + |x|)^s,
𝔼(h₂(X_{t+1}) | X_t = x) = 𝔼 exp(a|f(x) + g(x)ε_{t+1}|) ≤ exp(a|f(x)| + ln γ(x)) ≤ ρ h₂(x). ∎
Note that here again, Theorem 4 does not apply. Moreover, ergodicity criteria also result from Theorem 3 by relaxing strict inequalities into weak ones.
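A drift inequality of the kind used in the proof above can be checked by Monte Carlo on a toy model; the choices f(x) = 0.5x, g ≡ 1, s = 1 and Gaussian noise below are hypothetical illustrations:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical scalar model: f(x) = 0.5 x, g(x) = 1, noise N(0,1),
# and the test function h1(x) = (1 + |x|)^s with s = 1.
def h1(x):
    return 1.0 + np.abs(x)

def cond_exp_h1(x, n=200_000):
    """Monte Carlo estimate of E(h1(X_{t+1}) | X_t = x)."""
    eps = rng.standard_normal(n)
    return float(np.mean(h1(0.5 * x + eps)))

# For |x| big enough, E(h1(X_{t+1}) | X_t = x) <= rho * h1(x), rho < 1:
for x0 in (10.0, 50.0, 200.0):
    ratio = cond_exp_h1(x0) / h1(x0)
    assert ratio < 0.9      # the ratio approaches 0.5 as |x| grows
```

The ratio stabilizing strictly below 1 for large |x| is exactly the drift condition delivering geometric ergodicity.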
Examples 5.
a) Engle (1982) introduced the ARCH linear univariate model given by
b) TARCH models were introduced in Zakoian (1990), who considers the recursive equation
X_{t+1} = (a + b X_t 1_{X_t>0} − c X_t 1_{X_t<0}) ε_{t+1}, with a, b, c > 0,
and a second order independent and identically distributed sequence (ε_t). The process is geometrically ergodic if 0 < b ∨ c < 1.
Y_{t+1} = h(Y_t) + η_{t+1},
where the independent and identically distributed sequences (ε_t) and (η_t) are independent, and the functions f: ℝ×ℝ → ℝ, h: ℝ → ℝ, g: ℝ → ℝ are measurable and g satisfies inf_{y∈ℝ} |g(y)| > 0 (see Nelson (1990)).
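The TARCH recursion of b) is easy to simulate; a sketch with hypothetical parameters a = 1, b = 0.5, c = 0.7 (so b ∨ c = 0.7 < 1) and Gaussian innovations:

```python
import numpy as np

rng = np.random.default_rng(2)

# TARCH recursion with hypothetical parameters a = 1, b = 0.5, c = 0.7,
# so b v c = 0.7 < 1, and standard normal innovations.
a, b, c = 1.0, 0.5, 0.7
T = 200_000
X = np.zeros(T)
for t in range(T - 1):
    coef = a + b * X[t] * (X[t] > 0) - c * X[t] * (X[t] < 0)
    X[t + 1] = coef * rng.standard_normal()

# Geometric ergodicity: the path settles into a stationary regime with
# finite moments; negative values get a stronger response (c > b).
m = float(np.mean(np.abs(X[1000:])))
assert 1.2 < m < 1.9
```

The asserted range follows from the stationarity fixed point m = 𝔼|ε|(1 + (b+c)m/2) ≈ 1.53 for these illustrative parameters.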
The β-mixing and φ-mixing coefficients were linked to the properties of transition kernels in Davydov (1973). Rosenblatt (1972) studied the behaviour of the α-mixing coefficients. Rosenblatt (1971, 1972) proposed an approach to mixing using operator theory.
The first mixing properties of general state space Markov chains were stated for Doeblin recurrent Markov chains in Doob (1953) and Ueno (1960) (Theorem 1). Following this approach, Iosifescu & Teodorescu (1969) proved a simple φ-mixing condition for learning systems.
Foster (1953) and Pitman (1974) studied convergence rates to the invariant measure for Markov chains with values in a discrete state space and respectively propose necessary conditions for (3') and (4') to hold.
The general probabilistic properties of recurrent Markov chains are studied in Doob (1953), Jain & Jamison (1967), Orey (1971), and Revuz (1984). The approach of fundamental interest here is the one in Tweedie (1974, 1975, 1983) and in Nummelin & Tuominen (1982) (Theorem 2). These authors proved explicit sufficient mixing conditions - rates of decay for the mixing sequences (see also Nummelin (1984) and the more general results in Meyn & Tweedie (1992)). Those results are widely used in Mokkadem (1985, 1986, 1987) for models of statistical interest (§ 2.4.1). Mokkadem (1990) (Theorem 3) proves explicit sufficient conditions for ergodicity and geometric ergodicity based on simple assumptions on the transition probabilities. Duflo (1990) and Meyn & Tweedie (1992) give conditions for the assumption of stability - weaker than ergodicity - in the case of nonlinear models using Lyapounov functions; she also gives ergodicity criteria based on the Lyapounov functions technique. Borovkov (1989, 1990) proposes a Lyapounov functions approach to the ergodicity properties of Markov chains.
The nonlinear AR processes are studied in Doukhan & Ghindes (1980) and in Doukhan & Tsybakov (1993); those papers are the source of the results in § 2.4.2.2 and § 2.4.2.1. The results for nonlinear financial processes studied in § 2.4.2.3 come from Ango Nze (1992). Related processes, called doubly stochastic processes (16), are studied with the same point of
16 The simplest doubly stochastic models are described, for a Markov process (Z_t) and an independent innovation e_t, as the solution of a coupled system
X_{t+1} = Z_{t+1} X_t + e_t.
The process (X_t, Z_t) is still Markovian, and the first authors cited provide sufficient geometric ergodicity conditions in the case when (Z_t) is an AR(1) or an MA(1) process. The second reference provides ergodicity and stationarity sufficient conditions in a more general setting.
Moreover, Doukhan & Ghindes (1980, 1981) and Chan & Tong (1985) propose an approach linking as much as possible the properties of the deterministic system to those of nonlinear AR(1) processes. Asymptotic relations between a dynamical system and the first order nonlinear autoregressive processes associated with it are given in Doukhan & Ghindes (1980); the asymptotics is there with respect to the fact that the distribution of the noise converges to the Dirac mass at point 0. No decisive result seems to be known in this case (17). See also Tong (1990), and the papers by Cheng & Tong (1992) or Nychka et al. (1992). The latter authors determine the dependence of the behaviour of the process on the initial conditions. Related results are in Pham (1986), Tuan (1986), Tong (1990) and Diebolt & Guegan (1990, 1991). We did not use the results in Tong (1990) to prove the geometric ergodicity properties of various classes of models, since they are restricted to real valued exciting noise and do not yield results for every class of models considered here. However those results are really nice, since one may really expect to relate the properties of a dynamical system to the nonlinear autoregressive processes naturally associated to it. Unfortunately, in order to get a general result, Chan & Tong (1985) or Tong (1990) are led to work with a set of restrictive assumptions. Several proofs in this Chapter may be simplified (for particular cases) using this result.
Denker & Keller (1986) investigate the case of infinite memory systems. Reviews are given in Roussas & Ioannides (1987), Athreya & Pantula (1986), as well as in Hernandez-Lerma et al. (1991).
17 One may expect that the invariant measure π_σ of the Markov process X_{n+1} = f(X_n) + σ ε_n converges to some invariant measure π for the associated dynamical system, i.e. π(f⁻¹(B)) = π(B); for definiteness, suppose those processes to be ergodic for any σ. If f is continuous, the limit points of π_σ are invariant by f, but no convergence result has been proved in general.
Examples: Continuous Time Processes 111
The notions of mixing still work here, considering the time dependence structure of continuous time processes. We first present some results concerning continuous time Markov processes in § 2.5.1, while § 2.5.2 is devoted to their description in terms of operators. General diffusion processes are described in § 2.5.3. We recall the strong notion of hypermixing in § 2.5.4. Sufficient conditions for it to hold are the fundamental Bakry & Emery hypercontractivity criterion (§ 2.5.5) and the ultracontractivity condition (§ 2.5.6). We detail explicit examples of processes in those sections. The last subsection, § 2.5.7, indicates some references concerning general stochastic differential equations (SDE). This section obviously does not describe every kind of continuous time mixing processes; e.g. § 2.1.2 and § 2.2.3 present other classes of such random processes.
Let X = (X_t)_{t∈ℝ⁺} be an E-valued homogeneous Markov process with a regular version of the conditional probability ℙ(X_t ∈ A | X_s = x) = P_{t−s}(x, A) for some transition probability function on the Polish space (E, 𝔅(E)). We assume the classical continuity assumption P_t(x, .) ⇒ δ_x, as t → 0, to hold. The process is said to satisfy Dynkin's regularity assumption if for any ε > 0 and any compact set K in ℝ^d
lim_{t→0} sup_{x∈K} ℙ(|X_t − x| > ε | X₀ = x) = 0.
This is a continuity assumption meaning that lim_{t→0} P_t = I.
2.5.2. Operators
We give more details on the presentation of Markov processes with the help of operators. Assume that X is stationary with marginal invariant distribution μ; then the Hilbert space L²(μ) - with scalar product (.,.) and norm ‖.‖ - is separable and contains some dense subspace D(L). D(L) is called the domain of the infinitesimal operator L of the Markov process. Set P_x(A) = ℙ(A | X₀ = x).
L is a linear unbounded and non-negative operator defined by the fact that, for f in D(L), the process M^f = (M_t^f)_{t≥0} is a local P_x-martingale, where M_t^f = f(X_t) − ∫₀ᵗ Lf(X_s) ds. It is defined as L = lim_{t→0⁺} (I − P_t)/t.
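On a finite state space both relations — the semigroup P_t = e^{−Lt} and L = lim_{t→0⁺}(I − P_t)/t — can be checked with a matrix exponential; the 3-state generator below is a hypothetical illustration:

```python
import numpy as np
from scipy.linalg import expm

# Hypothetical 3-state Q-matrix; in the convention of the text, L = -Q
# is the (non-negative) infinitesimal operator and P_t = e^{-Lt}.
Q = np.array([[-1.0, 0.7, 0.3],
              [0.5, -1.2, 0.7],
              [0.2, 0.8, -1.0]])
L = -Q

def P(t):
    return expm(-L * t)           # transition semigroup

# L = lim_{t -> 0+} (I - P_t)/t, checked at a small t:
t = 1e-6
assert np.allclose((np.eye(3) - P(t)) / t, L, atol=1e-4)

# Semigroup property P_{s+t} = P_s P_t:
assert np.allclose(P(0.3 + 0.4), P(0.3) @ P(0.4))
```

The first-order Taylor expansion of the matrix exponential makes the error in the finite-difference quotient of order t, which the tolerance accounts for.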
P_t f = Σ_{m=0}^{∞} e^{−λ_m t} (f, f_m) f_m;
in the vocabulary of operators, it means that P_t = e^{−Lt}. Assumption (iii) implies the ergodicity of the process X, see Bhattacharya (1982, proposition 2.2). Now ‖P_t f‖ ≤ e^{−tλ₁} ‖f‖ if (f, 1) = 0, thus Rosenblatt's (1971) results (see § 2.4) imply
Proposition 1. The process X is mixing with α_t ≤ e^{−tλ₁} if assumptions (i), (ii) and (iii) hold.
We also define the Green operator G by Gf = ∫₀^∞ P_t f dt on the subspace of centered functions,
σ ∈ [liminf_{n→∞} a_n, liminf_{n→∞} b_n) with a_n = λ_n + μ_n − √(λ_{n−1}μ_n) − √(λ_nμ_{n+1}) and
b_n = (1/n) Σ_{i=0}^{n} [λ_i + μ_i − 2√(λ_{i−1}μ_i)].
Now L is the generator of a diffusion on E if, for any x in E, there exist (X_t, Ω, ℱ, ℱ_t), for some probability space (Ω, ℱ) and some process (X_t) on E with X₀ = x, adapted to the filtration (ℱ_t), such that for any f in 𝒜, (M_t^f) is a local martingale, where
M_t^f = f(X_t) − f(x) − ∫₀ᵗ Lf(X_s) ds.
In that case the process (X_t) is said continuous if t ↦ f(X_t) is a continuous real function for f in 𝒜. Moreover, d⟨M^f, M^g⟩_t = 2 Γ(f, g)(X_t) dt.
It is proved in Bakry & Emery (1985) that
In order to prove the converse, they first prove an interesting localization lemma. If for any f, g in 𝒜, Γ(f², g) = 2 f Γ(f, g), then for any f and g in 𝒜, f = 0 on the set {g ≠ 0} implies Lf = 0 on the set {g ≠ 0}.
The second energy form is defined as Γ₂: 𝒜×𝒜 → 𝒜 such that
∀ f, g ∈ 𝒜: Γ₂(f, g) = ½ [L Γ(f, g) − Γ(f, Lg) − Γ(Lf, g)].
selfadjoint (1); indeed
∫_E Γ(f, g) dμ = ∫_E f Lg dμ and L1 = 0 yield ∫_E Γ₂(f, g) dμ = ∫_E Lf Lg dμ.
If Lf = 0 only for constant functions f (assumption (iii) in § 2.5.2), the process X is said to be ergodic. In this case P_t f converges μ-a.s. to ∫ f dμ.
Examples 3 presented below are taken from Bakry & Emery (1985).
Examples 3.a) If E = ℝ, 𝒜 = C^∞, and Lf = a f″ + b f′, then Γ(f, g) = a f′g′. If L is the generator of a diffusion process then a ≥ 0 and L = H² + βH for Hf = αf′, with α = −√a and a suitable first order term β, and Γ₂(f, f) = (H²f)² − (Hβ)(Hf)².
For the particular d-dimensional case, following the presentation in Karatzas & Shreve (1988), let σ(x) = (σ_ij(x))_{i≤d, j≤r} and b(x) = (b_i(x))_{i≤d} be respectively the dispersion and the drift matrices of the stochastic differential equation. The functions σ_ij(x) and b_i(x) are assumed to be locally Hölder continuous from ℝ^d into ℝ, for 1 ≤ i ≤ d, 1 ≤ j ≤ r. W = (W_t) is an r-dimensional Brownian motion.
(D) dX_t = b(X_t) dt + σ(X_t) dW_t.
There is a continuous process satisfying (D) if there is a constant K such that for (x, y) in ℝ^d×ℝ^d
‖σ(x) − σ(y)‖ + ‖b(x) − b(y)‖ ≤ K|x − y| and ‖σ(x)‖² + ‖b(x)‖² ≤ K(1 + ‖x‖²).
The d×d matrix defined by a(x) = σ(x) σᵀ(x) is called the diffusion matrix. Set
∀ f ∈ C²(ℝ^d), Lf(x) = ½ Σ_{i=1}^{d} Σ_{j=1}^{d} a_ij(x) ∂²f(x)/∂x_i∂x_j + Σ_{i=1}^{d} b_i(x) ∂f(x)/∂x_i.
A solution of (D) satisfies that M_t^f = f(X_t) − f(X₀) − ∫₀ᵗ Lf(X_s) ds is a continuous local martingale. It is a martingale if σ is bounded on the support of f.
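A solution of (D) can be approximated with the Euler–Maruyama scheme; the coefficients below are hypothetical but satisfy the Lipschitz and linear growth conditions above (d = r = 1):

```python
import numpy as np

rng = np.random.default_rng(3)

# Euler-Maruyama scheme for (D): dX_t = b(X_t) dt + sigma(X_t) dW_t.
# The coefficients are hypothetical, Lipschitz and of linear growth.
def b(x):
    return -x                        # mean-reverting drift

def sigma(x):
    return 1.0 + 0.1 * np.sin(x)     # bounded dispersion

T, n = 10.0, 10_000
dt = T / n
X = np.empty(n + 1)
X[0] = 2.0
for k in range(n):
    dW = np.sqrt(dt) * rng.standard_normal()
    X[k + 1] = X[k] + b(X[k]) * dt + sigma(X[k]) * dW
```

Evaluating f(X_t) − f(X₀) − ∫₀ᵗ Lf(X_s) ds along such simulated paths, for a smooth compactly supported f, gives a numerical check of the local martingale property.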
Before we describe the hypercontractivity properties of diffusion processes, we first recall some simple results yielding directly mixing properties.
Example 4. Let (W_t)_{t≥0} be a Brownian motion on ℝ^d; the process X = (X_t)_{t≥0} is solution of the equation dX_t = b(X_t) dt + dW_t. Recall results in Doukhan & León (1986) yielding properties (i), (ii) and (iii) for such processes. Let C_K^k be the space of k-times continuously differentiable functions on ℝ^d with a compact support and ∇f = (∂f/∂x₁, …, ∂f/∂x_d). For μ(dx) = e²(x) dx, define the bilinear form B on C_0^∞ by B(f, g) = ∫ ∇f.∇g dμ; its domain D(∇) is a Hilbert space with the norm ‖f‖_∇ = (B(f, f) + ‖f‖²)^{1/2}.
If the form B is closed there is a self adjoint operator L on D(∇) with B(f, g) = (Lf, g). If ∇e ∈ L²_loc(ℝ^d) then C_0^∞ ⊂ D(∇) and Lf = −Δf − b.∇f for f ∈ C_0^∞, with b = 2 e⁻¹ ∇e, and Hf = −Δf + cf with c = e⁻¹ Δe.
The operators L and H have the same spectrum; their domains satisfy D(L) ⊂ L²(μ), D(H) ⊂ L²(ℝ^d) via the transformation L(e⁻¹ f) = e⁻¹ Hf for f in C_0^∞, and the spectrum is discrete if lim_{|x|→∞} c(x) = ∞. Grouping these considerations yields
This result applies for multidimensional Ornstein-Uhlenbeck processes dX_t + cX_t dt = dW_t; in this case c(x) = γ|x|² with γ > 0.
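For the one-dimensional Ornstein–Uhlenbeck process dX_t + cX_t dt = dW_t the stationary autocovariance is e^{−ct}/(2c), so the exponential mixing rate of Proposition 1 is visible empirically; c = 1 below is an illustrative choice and the discretization is exact:

```python
import numpy as np

rng = np.random.default_rng(4)

# Exact discretization of the stationary OU process dX_t + c X_t dt = dW_t
# on a time grid of step dt; c = 1 is a hypothetical illustration.
c, dt, n = 1.0, 0.02, 1_000_000
rho = np.exp(-c * dt)
var = 1.0 / (2.0 * c)                 # stationary variance
X = np.empty(n)
X[0] = rng.normal(0.0, np.sqrt(var))
xi = rng.normal(0.0, np.sqrt(var * (1.0 - rho ** 2)), size=n)
for k in range(n - 1):
    X[k + 1] = rho * X[k] + xi[k + 1]

# Empirical autocovariance at lag t vs. the exact value e^{-ct}/(2c),
# in line with the exponential mixing bound alpha_t <= e^{-t lambda_1}.
for t in (0.5, 1.0, 2.0):
    lag = int(round(t / dt))
    emp = float(np.mean(X[:-lag] * X[lag:]))
    assert abs(emp - np.exp(-c * t) * var) < 0.04
```

For a Gaussian process the autocovariance controls the strong mixing coefficients, which is why this decay is a faithful proxy for α_t here.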
m(dx) = (1/b²(x)) exp{∫₀ˣ 2 a(y)/b²(y) dy} dx is a finite measure, and then μ(dx) = m(dx)/∫ m(dy).
Assume now that E = ℝ^d and let L = ½ Σ_{i,j=1}^{d} a_ij(x) ∂²/∂x_i∂x_j + Σ_{i=1}^{d} b_i(x) ∂/∂x_i be an elliptic operator.
The matrix A(x) = (a_ij(x))_{1≤i,j≤d} is symmetric positive definite for x in ℝ^d and continuous as a function of x; the functions b_i(x) are Borel measurable and bounded on compact subsets of ℝ^d.
If moreover the coefficients b_i(x) are bounded, for each x in ℝ^d, Bhattacharya (1978) shows that there exists an unique probability measure P_x on Ω with P_x(X₀ = x) = 1 and that for any function f twice continuously differentiable on ℝ^d the process M^f = (M_t^f)_{t≥0} is a P_x-martingale, where M_t^f = f(X_t) − ∫₀ᵗ Lf(X_s) ds. The process X is a diffusion with generator L. This process satisfies Dynkin's regularity assumption. Define now for y = x − z and r > 0
A_z(x) = Σ_{i,j=1}^{d} a_ij(x) y_i y_j / |y|², B(x) = tr A(x) = Σ_{i=1}^{d} a_ii(x), C_z(x) = 2 Σ_{i=1}^{d} y_i b_i(x),
α_z(r) = inf_{|y|=r} [B(x) − A_z(x) + C_z(x)] / A_z(x), β_z(r) = sup_{|y|=r} [B(x) − A_z(x) + C_z(x)] / A_z(x).
Examples 3.b) If E = ℝ^d, 𝒜 = C^∞, and Lf = Δf, we already have noticed that
Γ(f, g) = ∇f.∇g = Σ_{i=1}^{d} (D_i f)(D_i g) and Γ₂(f, g) = Σ_{i,j=1}^{d} (D_iD_j f)(D_iD_j g),
where D_i denotes the partial derivation operator D_i = ∂/∂x_i.
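These identities can be verified symbolically in dimension one, starting from the usual carré du champ definitions Γ(f, g) = ½[L(fg) − fLg − gLf] and Γ₂(f, g) = ½[LΓ(f, g) − Γ(f, Lg) − Γ(Lf, g)]:

```python
import sympy as sp

x = sp.symbols('x')
f = sp.Function('f')(x)
g = sp.Function('g')(x)

def L(h):                      # generator Lh = h'' (Example 3.b, d = 1)
    return sp.diff(h, x, 2)

def Gamma(u, v):               # carre du champ operator
    return sp.expand((L(u * v) - u * L(v) - v * L(u)) / 2)

def Gamma2(u, v):              # second energy form
    return sp.expand((L(Gamma(u, v)) - Gamma(u, L(v)) - Gamma(L(u), v)) / 2)

# Gamma(f, g) = f'g' and Gamma_2(f, g) = f''g'':
assert sp.simplify(Gamma(f, g) - sp.diff(f, x) * sp.diff(g, x)) == 0
assert sp.simplify(Gamma2(f, g) - sp.diff(f, x, 2) * sp.diff(g, x, 2)) == 0
```

The same computation in d variables reproduces Σ (D_i f)(D_i g) and Σ (D_iD_j f)(D_iD_j g), at the cost of looping over coordinates.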
Examples 3.c) More generally, let E be an abstract state space equipped with an algebra of functions 𝒜; a linear application D: 𝒜 → 𝒜 is called a derivation if the identity D(fg) = f Dg + g Df holds on 𝒜. Consider now (k+1) linear derivations D₀, …, D_k; a linear operator L may be defined as
L = D₀ + Σ_{i=1}^{k} D_i².
Then we write, analogously to b), Γ(f, g) = Σ_{i=1}^{k} (D_i f)(D_i g). If the Lie brackets satisfy
[L, D_i] = LD_i − D_iL = a D_i for some a ∈ 𝒜 independent of i, an easy calculation yields
Γ₂(f, g) = Σ_{i,j=1}^{k} (D_iD_j f)(D_iD_j g). In the previous case a = 0. Other nontrivial cases follow in the same way. A linear derivation will be a first order differential operator in the examples proposed.
d) C^∞ manifolds. Assume that the state space is a C^∞ manifold, 𝒜 = C^∞, and the operator L writes locally as L = Σ_{i=1}^{d} Σ_{j=1}^{d} g^{ij} D_ij + Σ_{i=1}^{d} X^i D_i - with Einstein's notation, it is rewritten L = g^{ij} D_ij + X^i D_i - where (g^{ij}) denotes the inverse matrix of (g_ij). (g^{ij}) defines a Riemannian metric (2) as soon as (g^{ij}(x)) is a symmetric positive definite matrix for any point x ∈ E. For this Riemannian structure, the Laplace-Beltrami operator takes the form
Δf = g^{ij}(D_ij f − Γ^k_ij D_k f) (3), where (Γ^k_ij) denotes the Christoffel symbol (4). It yields L = Δ + X for some linear derivation X.
In this case Γ(f, g) = (grad f | grad g) if (. | .) denotes the associated inner Riemannian product. Now if X = grad h (it is the case with h = ln dμ/dν, with μ the invariant measure of
2 The metric which assigns the distance Σ_{i,j=1}^{d} g_ij(x) dx_i dx_j to the points x and x + dx.
3 That is Δf = Σ_{i=1}^{d} Σ_{j=1}^{d} g^{ij} (D_ij f − Σ_{k=1}^{d} Γ^k_ij D_k f).
4 Γ^k_ij = ½ Σ_{l=1}^{d} g^{kl} [∂g_lj/∂x_i + ∂g_il/∂x_j − ∂g_ij/∂x_l].
the process and ν the Riemann measure in the Markov case), then, if Ric = (Ric_ij) denotes the Ricci curvature (5) linear operator and Hess f denotes the operator (6) determined locally as
(Hess f)_ij = D_ij f − Σ_{k=1}^{d} Γ^k_ij D_k f,
we get
Γ₂(f, g) = (Hess f | Hess g) + (Ric − Hess h)(grad f, grad g).
2.5.4. Hypermixing
The strong notion of mixing called hypermixing for discrete or continuous time processes arose from Large Deviations Theory (see Chiyonobu & Kusuoka (1988)).
This paragraph contains some of the results in Deuschel & Stroock (1989) as well as their notations. Let E be a Polish space and Ω a space of left limited and right continuous E-valued trajectories w(.) on ℝ (usually named cadlag functions). Thus Ω equipped with the Skorohod topology is still a Polish space. We set ℱ(I) for the σ-algebra generated in Ω by {w(s), s ∈ I} for closed sets I. Write I < J for closed intervals I = [a, b] and J = [c, d] if a < b < c < d. ℳ_s(Ω) is the space of stationary measures on Ω; for t in ℝ and w in Ω we define wᵗ(s) = w(s) if |s| ≤ t and wᵗ(s+2t) = wᵗ(s) for s ∈ ℝ (the 2t-periodically extended function), and θ_t w(s) = w(s+t) the translation map on Ω.
P ∈ ℳ_s(Ω) is said to be hypermixing if
(H.1) There is a decreasing function ρ(t) > 1 defined on [c, +∞[ for some c > 0 such that:
lim_{t→∞} t(ρ(t) − 1) = 0 and ‖f₁ ⋯ f_n‖₁ ≤ ‖f₁‖_{ρ(t)} ⋯ ‖f_n‖_{ρ(t)},
where f_j is ℱ(I_j)-measurable and the I_j are t-distant intervals with I_j < I_{j+1}.
(H.2) There are decreasing functions γ(t) > 1 and c(t) defined on [c, +∞[ such that
lim_{t→∞} t(γ(t) − 1) = 0, lim_{t→∞} c(t) = 0, γ(t)⁻¹ + γ'(t)⁻¹ = 1, and
‖𝔼^{I₂} f‖_{γ'(t)} ≤ c(t) ‖f‖_{γ(t)} for intervals I₁, I₂ distant at least t and any ℱ(I₁)-measurable function f with 𝔼_P f = 0, where 𝔼^I f is the conditional expectation of f with respect to P given ℱ(I).
The previous definition is the one given in Chiyonobu & Kusuoka (1988). In fact Deuschel & Stroock (1989) give an alternative definition replacing (H.2) by (H.2')
(H.2') There are decreasing functions γ(t) > 1 and c'(t) defined on [c, +∞[ with
lim_{t→∞} t(γ(t) − 1) = 0, lim_{t→∞} c'(t) = 0, and |Cov(f₁, f₂)| ≤ c'(t) ‖f₁‖_{γ(t)} ‖f₂‖_{γ(t)},
where f_j is ℱ(I_j)-measurable and (I_j)_{j=1,2} are t-distant intervals.
Both definitions are equivalent (7). An interesting simple consequence is the following. More precisely, considering f_j = 1_{A_j} − ℙ(A_j) in relation (H.2') leads, for t-distant measurable subsets A₁ and A₂, to
|ℙ(A₁∩A₂) − ℙ(A₁)ℙ(A₂)| ≤ c'(t) ℙ^{1/γ(t)}(A₁) ℙ^{1/γ(t)}(A₂).
Remark 1. The previous proof also implies that a c'(t)-hypermixing process is also α_{a,b}-mixing (see § 1.1) for any a, b ∈ [0,1[, with the same decay rate.
Chiyonobu & Kusuoka (1988) give examples of such hypermixing processes. The first one is a stationary vector valued Gaussian process such that lim_{t→∞} t(ρ(t) − 1) = 0. The second one is the ε-Markov case. Hypercontractivity and hypermixing are closely related in this case. We only present the Markov case.
2.5.5. Hypercontractivity
Let P_t be the transition probability of a stationary Markov process with stationary distribution μ and denote ‖f‖_p = (∫ |f|^p dμ)^{1/p}; then the process is hypermixing iff
‖P_T‖_{2,4} = sup_f ‖P_T f‖₄/‖f‖₂ = 1 for some T > 1; it is then called μ-hypercontractive. Moreover, c(t) = O(e^{−ct}), γ(t) = 1 + O(e^{−γt}) and ρ(t) = 1 + O(e^{−ρt}) for some constants c, γ, ρ > 0.
Proposition 5. Assume that some s, t > 0 satisfy
P_s(x, dy) = p_s(x, y) μ(dy) and P_t(x, dy) = p_t(x, y) μ(dy).
The strong mixing property holds with a geometric decay rate of the mixing coefficients if ∫(∫ p_s²(x, y) μ(dy)) μ(dx) < ∞. Hypermixing holds if moreover p_t(x, y) ≥ a > 0, for some a > 0 and any x, y ∈ E.
The example of symmetric diffusions is widely investigated in Deuschel & Stroock (1989), and the decay of c(t) may be checked using the results in Korzeniowski (1987). The second part of this result is proved in Deuschel & Stroock (1989); the same lines of proof yield the first part of this result (sufficient strong mixing condition).
checked. This implies that a hypermixing process is not necessarily φ-mixing. E.g. the Ornstein-Uhlenbeck process is stationary and Gaussian, thus Proposition 2.1.1 shows that φ-mixing implies m-dependence, yielding a contradiction. That means also that the α_{a,0}-mixing condition may hold for a arbitrarily close to 1 without α_{1,0}-mixing - that is φ-mixing - holding. Neveu (1976) presents an alternative proof of this result.
We present below the powerful Bakry & Emery (1985) Γ₂-criterion. This criterion yields simple sufficient mixing conditions. 𝒜₊ denotes the subset of functions f in 𝒜 with inf_E f > 0, or equivalently the functions f > 0 with ln f ∈ 𝒜.
The following fundamental lemma will yield simple hypercontractivity criterions. Set for this
‖P_t‖_{p,q} = sup_{f∈L^p(μ)} ‖P_t f‖_q/‖f‖_p = sup_{f∈𝒜₊} ‖P_t f‖_q/‖f‖_p. The ‖.‖_p are considered with respect to the spaces L^p(μ). U will denote the convex function U(x) = x ln x.
Lemma 1. For Λ > 0 fixed, the following properties are equivalent:
(1) ∀ p > 1, ∀ t ≥ 0, [1 ≤ q ≤ 1 + (p−1)e^{Λt}] ⇒ [‖P_t‖_{p,q} ≤ 1].
(2) ∃ p > 1, ∀ t ≥ 0, [1 ≤ q ≤ 1 + (p−1)e^{Λt}] ⇒ [‖P_t‖_{p,q} ≤ 1].
(3) ∀ t ≥ 0, ∀ q ∈ [1, e^{Λt}], ∀ f ∈ 𝒜₊, ‖exp(P_t(ln f))‖_q ≤ ‖f‖₁.
(4) ∀ t ≥ 0, ∀ f ∈ 𝒜₊, ∫ U(P_t f) dμ ≤ e^{−Λt} ∫ U(f) dμ + (1 − e^{−Λt}) U(∫ f dμ).
Theorem 2. If the process is ergodic and ∀ f ∈ 𝒜: Γ₂(f, f) ≥ (Λ/2) Γ(f, f), then hypercontractivity holds with the constant Λ.
Theorem 3. If the process is ergodic, and there are constants a > 0, b ≥ 0 such that, ∀ f ∈ 𝒜,
a) b < 1 and Γ₂(f, f) ≥ a Γ(f, f) + b (Lf)², then hypercontractivity holds with the constant Λ = 2a/(1−b);
b) 1 < b < 4 and Γ₂(f, f) ≥ a Γ(f, f) + b (Lf)², then hypercontractivity holds with the constant Λ = 2a/(b−1).
Note that the restriction b < 4 is, according to Bakry & Emery, a technical one.
Here Γ₂(f, f) ≥ (d−1) Γ(f, f), yielding Λ = 2(d−1) if L = Δ. However Mueller & Weissler (1982) showed that Λ = 2d is the suitable hypercontractivity constant; Theorem 3 gives the right constant with the inequality ‖Hess f‖² ≥ (1/d)(Δf)². If E is a d-dimensional Riemannian manifold with a positive curvature, say that the eigenvalues of Ric are greater or equal to c, the same computations yield Λ = 2cd/(d−1). Rothaus (1981) proves hypercontractivity for the Brownian motion on general Riemannian manifolds (not necessarily with positive curvature). The explicit bound set here allows to consider more general diffusion processes with generator Δ + X, where the first order differential operator X satisfies a suitable Lipschitz condition related to the assumption in Theorem 3 a) by
X⊗X ≤ (1/b − d)(Ricc(L) − a g) and b d ≤ 1.
In local coordinates, this means that [(1/b − d)(Ricc(L)^{ij} − a g^{ij}) − X^i X^j]_{ij} is a nonnegative matrix. Other examples may be found in Bakry & Emery (1985).
2.5.6. Ultracontractivity
We give in this section some of the results in Davies (1988). Assume now that P_t = e^{−Lt} is a symmetric Markov semigroup on L²(E, μ(dx)) for some Borel measure μ on the second countable locally compact space E.
P is said to be ultracontractive if c_t = ‖P_t‖_{∞,2} < ∞ for all t > 0. Let K(t, x, y) be the kernel associated to the semigroup: e^{−Lt} f(x) = ∫ K(t, x, y) f(y) μ(dy).
Davies shows that K(t, x, y) ≤ c²_{t/2} and conversely, if 0 ≤ K(t, x, y) ≤ a_t, then P is ultracontractive with c_t ≤ a_{t/2}^{1/2}.
In the particular case of the heat kernel, L = −Δ on L²(ℝ^d, dx), K(t, x, y) = φ_t(x − y) with φ_t(x) = (4πt)^{−d/2} e^{−x²/4t}. Here a_t = ‖φ_t‖_∞ = (4πt)^{−d/2} and c_t = ‖φ_t‖₂ = (8πt)^{−d/4}, thus the previous bounds are sharp.
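Both heat kernel constants can be checked numerically in dimension d = 1 (the grid below is an arbitrary discretization of ℝ):

```python
import numpy as np

# Heat kernel phi_t(x) = (4*pi*t)^(-d/2) * exp(-x^2/(4t)) with d = 1.
t = 0.7
x = np.linspace(-40.0, 40.0, 400_001)     # fine grid including x = 0
dx = x[1] - x[0]
phi = (4.0 * np.pi * t) ** -0.5 * np.exp(-x ** 2 / (4.0 * t))

sup_norm = float(phi.max())                       # a_t = (4 pi t)^{-d/2}
l2_norm = float(np.sqrt(np.sum(phi ** 2) * dx))   # c_t = (8 pi t)^{-d/4}

assert abs(sup_norm - (4.0 * np.pi * t) ** -0.5) < 1e-12
assert abs(l2_norm - (8.0 * np.pi * t) ** -0.25) < 1e-6
```

The L² value follows from ∫φ_t² dx = (4πt)^{−d}(2πt)^{d/2} = (8πt)^{−d/2}, which the Riemann sum reproduces to high accuracy because the integrand decays as a Gaussian.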
The Sobolev logarithmic constant β(ε) is defined as the infimum of the β's such that for each f ∈ Dom(L)∩L²(E)
∫ f² ln f dx ≤ ε (Lf, f) + β(ε) ‖f‖₂² + ‖f‖₂² ln ‖f‖₂,
or equivalently, ∫ f²(x) ln (f(x)/‖f‖₂) dx ≤ ε (Lf, f) + β(ε) ‖f‖₂².
Gross (1975) proved (see also Deuschel & Stroock (1988)) the following:
Sobolev's logarithmic inequality holds with β(t) such that c_t = ‖P_t‖_{∞,2} = e^{β(t)} if the semigroup is ultracontractive. Reciprocally, if Sobolev's logarithmic inequality holds for a continuous and decreasing function β(ε), then the semigroup e^{−Lt} is ultracontractive with c_t = ‖P_t‖_{∞,2} = e^{M(t)} and M(t) = (1/t) ∫₀ᵗ β(ε) dε. Recall that hypercontractivity holds with β(ε) = 0.
If L is strictly elliptic on E ⊂ ℝ^d then 0 ≤ K(t, x, y) ≤ c t^{−d/2} and β(ε) ≤ c' − (d/4) ln ε.
Q(f) = ∫ (Σ_{i,j} a_ij (∂f/∂x_i)(∂f/∂x_j) + V |f|²) dx.
Assume that there is some C² function φ on E with Lφ ≤ 0. The operator U_φ defined by [U_φ f = φ f] is unitary from L²(E, φ²(x) dx) to L²(E, dx). It allows transportation of the Markov structure on L²(E, dx), replacing the operator L on L²(E, dx) by some operator H on L²(E, φ²(x) dx). Results in the previous subparagraphs may now be used to get ultracontractivity and thus hypermixing.
dZ(t)/dt = ε F(t, Z(t), ω) for ε > 0.
Under some suitable regularity assumptions such processes converge to a diffusion process.
The techniques used there are based on the strong mixing properties of such equations. See e.g.
Papanicolaou & Varadhan (1973), Cogburn & Hersh (1973), Kesten & Papanicolaou (1979) or
Gerencser (1989). The previous authors give various conditions under which the random
function has such mixing properties. This approach explains the link between deterministic
dynamical systems and a related noisy system.
Khas'minskii (1960) and Bhattacharya (1978, 1982) prove recurrence properties of diffusion processes. Carmona (1979) and Molcanov (1981) study the spectrum of Schrödinger operators. Chen Mufa (1990) is interested in the spectrum of diffusion processes. Baxter & Brosamler (1976) and Bolthausen (1982) studied the particular case of the Brownian motion on a Riemannian manifold.
Chiyonobu & Kusuoka (1988) define the hypermixing properties in view of a theorem of large deviations; they also present examples.
Bakry & Emery (1985) prove fundamental explicit sufficient hypercontractivity conditions for very general diffusion processes - those results are extended in Bakry (1991): the Γ₂-criterion. Deuschel & Stroock (1989) present other hypermixing sufficient conditions. Davies (1989) studies this property from the operators viewpoint, linking the infimum of the spectrum, the Sobolev logarithmic inequality and the notion of ultracontractivity in the case of Schrödinger operators.
Papanicolaou & Varadhan (1973), Cogburn & Hersh (1973), Kesten & Papanicolaou (1979), Rosenblatt (1987) and Gerencser (1989) prove limit theorems for SDE related to mixing techniques.
Bibliography
After each reference we indicate the chapter of this volume with which it is related, in underlined characters; e.g. § 1. § 2.3. § 2.5. means that the corresponding reference is related to each paragraph in chapter 1 and to the subsections 2.3 and 2.5. The titles of monographs are written in italic characters. The titles of reviews are classically abbreviated.
Aaronson J., Denker M. (1991) On the FLIL for certain ψ-mixing processes and infinite measure preserving transformations. C. R. Acad. Sci. Paris, Serie I, 313, 471-475.
Athreya K. B., Pantula S. G. (1986 b) Mixing properties of Harris chains and autoregressive processes. J. Appl. Probab. 23, 880-892. § 1.3. § 2.4.
Baxter J. R., Brosamler G. A. (1976) Energy and the law of the iterated logarithm. Math. Scand. 38, 115-136.
Berkes I., Philipp W. (1977) An almost sure invariance principle for the empirical distribution of mixing random variables. Z. Wahrsch. Verw. Gebiete 41, 115-137.
Bhattacharya R. N. (1982 a) On the functional central limit theorem and the law of iterated logarithm for Markov processes. Z. Wahrsch. Verw. Gebiete 60, 185-201.
Bolthausen E. (1982) On the central limit theorem for stationary mixing random fields. Ann. Probab. 10-4, 1047-1050. § 2.5.
Bradley R. C. (1987) Identical mixing rates. Probab. Th. Rel. Fields 74, 497-503.
Bryc W. (1992) On the large deviation principle for stationary weakly dependent random fields. Ann. Probab. 20, 2, 1004-1030.
Bulinskii A. V., Zhurbenko I.G. (1976) A central limit theorem for additive
random functions. Theory Probab. AppI. 21,4,687-697. § 1.3, § 2.4,
Chan K. S., Tong H. (1985) On the use of the deterministic Lyapounov function
for the ergodicity of stochastic difference equations, Adv. in AppI. Probab, 17, 666-
678.~
Doukhan P., Guyon X. (1991) Mixing for linear random fields. C. R. Acad. Sci.
Paris, Serie 1,313, 46S-470....ill
Doukhan P., Leon J. (1986) Invariance principles for the empirical measure of a
mixing sequence and for the local time of a Markov process; in Strasbourg,
Conference of Probability in Banach Spaces 1985; L.N.M. 1993,4-21. Springer-
Verlag, Berlin.~
Doukhan P., Leon J. (1989) Cumulants for mixing sequences and applications
to empirical spectral density. Probab. Math. Stat., 10.1, 11-26. § 1.4. § I.S.
Doukhan P., Leon J., Portal F. (1984) Vitesse de convergence dans le
théorème central limite pour des variables aléatoires mélangeantes à valeurs dans un
espace de Hilbert. C. R. Acad. Sci. Paris, Série I, 298, 305-308. § 1.4, § 1.5.
Doukhan P., Leon J., Portal F. (1985) Calcul de la vitesse de convergence
dans le théorème central limite vis à vis des distances de Dudley, Lévy et Prokhorov.
Probab. Math. Stat. 6.2, 19-27.
Doukhan P., Leon J., Portal F. (1987) Principe d'invariance faible pour la
mesure empirique d'une suite de variables aléatoires dépendantes. Probab. Th. Rel.
Fields 76, 51-70.
Doukhan P., Massart P., Rio E. (1994) The functional central limit theorem
for strongly mixing processes. To appear in Ann. I.H.P. § 1.4, § 1.5.
Doukhan P., Portal F. (1983 a) Moments de variables aléatoires mélangeantes. C.
R. Acad. Sci. Paris, Série I, 297, 129-132.
Doukhan P., Portal F. (1983 b) Principe d'invariance faible pour un processus
empirique dans un cadre multidimensionnel et mélangeant. C. R. Acad. Sci. Paris,
Série I, 297, 505-508.
Doukhan P., Portal F. (1987) Principe d'invariance faible pour la fonction de
répartition empirique dans un cadre multidimensionnel et mélangeant. Probab. Math.
Statist. 8.2, 117-132. § 1.4, § 1.5.
Doukhan P., Tsybakov A. (1993) Estimation in non parametric A.R.X.
principle for weakly dependent random fields. Soviet Math. Dokl. 29, 3, 529-532.
Gross L. (1975) Logarithmic Sobolev Inequalities. Amer. J. of Math. 97,
1061-1083.
Neaderhouser C. C. (1978 b) Some limit theorems for random fields. Comm.
Math. Phys. 61, 293-305. § 1.5, § 2.2.
Nelson D. (1990) Stationarity and persistence in the GARCH(1,1) model.
Preprint, University of Chicago.
Nelson E. (1973) The free Markov field. J. Funct. Anal.
Orey S. (1971) Limit theorems for Markov chain transition probabilities. Van Nostrand,
London.
Peligrad M. (1986) Recent advances in the central limit theorem and its weak
invariance principle for mixing sequences of random variables; in Dependence in
probability and statistics, a survey of recent results. Oberwolfach, 1985, Birkhäuser.
§ 1.1, § 1.3, § 1.5.
Pham T. D., Tran L. T. (1985) Some mixing properties of time series models.
Stochastic Process. Appl. 19, 297-303.
Philipp W. (1970) Some metrical theorems in number theory II. Duke Math.
J. 37, 447-458.
Pitman J.W. (1974) Uniform rates of convergence for Markov chain transition
probabilities. Z. Wahrsch. Verw. Gebiete 29, 193-227.
Preston C. (1976) Random fields. L.N.M. 534. Springer-Verlag, Berlin.
Rio E. (1994) A new covariance inequality for strongly mixing processes. To appear in
Ann. I.H.P.
Sinai Ya. G. (1982) Theory of phase transitions: rigorous results. Pergamon
Press, N.Y.
Stein Ch. (1973) A bound for the error in the normal approximation of a sum of dependent
random variables. Proc. 6th Berkeley Symp. Math. Stat. and Probab. 2, 583-602.
Takahata H. (1983) On the rates in the central limit theorem for weakly
dependent random fields. Z. Wahrsch. Verw. Gebiete 64, 445-456.
Tjostheim D. (1986) Some doubly stochastic time series models. J. Time Series
Anal. 7, 51-72.
Utev S. (1984) Inequalities and estimates of the convergence rate for the
weakly dependent case. Adv. in Probab. Th. 1985, Novosibirsk. § 1.4, § 1.5.
Van Doorn E. A. (1985) Conditions for exponential ergodicity and bounds for the
decay parameter of a birth-death process. Adv. in Appl. Probab. 17, 514-530.
fields and mixing processes. Statistical studies 9. Finnish Stat. Soc., Helsinki.
§ 1.4. § 1.5.
Index

This index is divided into a specific index for mixing coefficients and a general index.

Mixing coefficients

c-mixing 16; 18
*-mixing 3; 88
α-mixing 3; 17; 32; 45; 57; 80
αa,b-mixing 5; 119
β-mixing 3; 17; 36; 59; 90
φ-mixing 3; 19; 32; 39; 45; 57; 68; 88; 112
ρ-mixing 17; 19; 35; 47; 57; 89
r-mixing 15
ψ-mixing 3; 19; 88

General index

1-recurrent 105
absolute regularity 3; 7; 90
admissible 67
affine models 97
algebraic variety 96
annealing 95
aperiodic 90; 104
aperiodic positive recurrent 89
AR(1) nonlinear process 104
ARCH 100
arithmetic 106
ARMA 99
ARX nonlinear process 100
Bernstein inequality 33; 36
bilinear model 98
birth-death 112
Brownian motion 112
C-set 89
causal 79; 85
central limit theorem 45
Chapman-Kolmogorov 87
clique 69
configuration 71
covariance inequalities 10
dependence 63
derivation 116
diffusion 116
diffusion process 72; 115
Dobrushin's condition 65
Doeblin recurrent 88
dynamical system 93
elliptic operator 115
equilibrium 94
ergodic 19; 21; 64; 89; 104; 114
Feller 104
Vol. 85: P. Doukhan, Mixing: Properties and Examples. XI, 142 pages, 1994.
General Remarks