Distribution: IMPA
Estrada Dona Castorina, 110
22460-320 Rio de Janeiro, RJ
E-mail: ddic@impa.br
ISBN: 978-85-244-0411-5
http://www.impa.br
Contents
3 Estimation concepts 33
3.1 Empirical estimates . . . . . . . . . . . . . . . . . . . . 33
3.2 Contrasts . . . . . . . . . . . . . . . . . . . . . . . . . 36
3.3 Functional estimation . . . . . . . . . . . . . . . . . . 38
3.4 Division trick . . . . . . . . . . . . . . . . . . . . . . . 46
3.5 A semi-parametric test . . . . . . . . . . . . . . . . . . 50
4 Stationarity 53
4.1 Stationarity . . . . . . . . . . . . . . . . . . . . . . . . 54
4.2 Spectral representation . . . . . . . . . . . . . . . . . . 57
4.3 Range and spectral density . . . . . . . . . . . . . . . 64
4.3.1 Limit variance . . . . . . . . . . . . . . . . . . 67
4.3.2 Cramér-Wold representation . . . . . . . . . . 69
4.4 Spectral estimation . . . . . . . . . . . . . . . . . . . . 70
Appendices 238
A 239
A.1 Probability . . . . . . . . . . . . . . . . . . . . . . . . 239
A.1.1 Notations . . . . . . . . . . . . . . . . . . . . . 239
A.1.2 Random variables . . . . . . . . . . . . . . . . 240
A.2 Distributions . . . . . . . . . . . . . . . . . . . . . . . 247
A.2.1 Normal distribution . . . . . . . . . . . . . . . 247
A.2.2 Multivariate Gaussians . . . . . . . . . . . . . . 251
A.2.3 γ-distributions . . . . . . . . . . . . . . . . . . 252
A.3 Convergence . . . . . . . . . . . . . . . . . . . . . . . . 256
A.3.1 Convergence in distribution . . . . . . . . . . . 256
A.3.2 Convergence in probability . . . . . . . . . . . 259
A.3.3 Almost sure convergence . . . . . . . . . . . . . 260
Objectives
Time series arise naturally when data are sampled in time, but many other physical situations also produce such evolutions indexed by integers. We aim to provide some tools for the study of such statistical models. The purpose of these lectures is introductory, and no systematic treatment is attempted here.
These notes are divided into three Parts, each containing four Chapters, plus an Appendix.
1. Independence and Stationarity.
Even though this part mainly addresses topics from the independent world, the choice of subjects is deliberately biased towards those that extend easily to a dependent setting.
(a) Independence.
This is a central concept in these notes, so some simple comments concerning independence are collected in a separate chapter. For instance, we mention the elementary counter-examples involving independence. Other examples relating orthogonality to independence may be found in Chapter 8 and in Appendix § A.2.1.
(b) Gaussian convergence and moments.
A special emphasis is placed on the Lindeberg method, which easily extends to a dependent setting. Applications of the central limit theorem are proved in the independent setting. Moment and exponential inequalities related to Gaussian convergence are also derived.
(c) Estimation concepts.
Classical estimation techniques, such as empirical estimates, contrasts, and non-parametric methods, are introduced. Kernel density estimates are described in some detail as an application of the previous results, in view of their extension to time series in a later chapter.
(d) Stationarity.
The notions of stationarity are essential for the spectral analysis of time series. [Brockwell and Davis, 1991] use filtering techniques in order to return to such a simple stationary case; indeed, this assumption is rarely satisfied by raw observations. Weak and strong stationarity are considered together with examples.
Paul Doukhan
Part I
Independence and stationarity
Chapter 1
Independence
Two events A and B are independent if
$$P(A \cap B) = P(A)P(B).$$
• Cov(X, Y) = 0;
• moreover, if |X| is not a.s. constant, then X, Y are not independent.
An important use of this Exercise is provided in Exercise 23. Note that both Gaussian and associated vectors share the same property; see Appendix A.2.1 and Chapter 8 respectively. Exercise 3 has tight assumptions, as the following suggests:
Exercise 4. Exhibit random variables X ∈ {0, ±1}, Y ∈ {0, 1}, not independent but orthogonal anyway, i.e. Cov(X, Y) = 0.
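One standard construction solving Exercise 4 (our particular choice of distribution, not spelled out in the notes) is:
$$X \text{ uniform on } \{-1, 0, 1\}, \qquad Y = X^2.$$
Then $\operatorname{Cov}(X, Y) = EX^3 - EX\,EY = 0$, while $P(X = 0,\ Y = 1) = 0 \ne P(X=0)P(Y=1) = \tfrac13 \cdot \tfrac23$, so X and Y are orthogonal but not independent.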
Chapter 2
Gaussian convergence and inequalities
bounds:
$$|E(g(U) - g(V))| \le 3\sum_{i=1}^{k} E\,|U_i|^2\big(\|g''\|_\infty \wedge (\|g'''\|_\infty |U_i|)\big) \le 3\sum_{i=1}^{k}\|g''\|_\infty^{1-\varepsilon}\,\|g'''\|_\infty^{\varepsilon}\,E|U_i|^{2+\varepsilon}.$$
Then:
$$\sum_k \zeta_{n,k} \xrightarrow{\ \mathcal L\ }_{n\to\infty} \mathcal N(0, \sigma^2).$$
Proof. In the first inequality from Lemma 2.1.1, set $U_k = \zeta_{n,k}\mathbf 1_{\{|\zeta_{n,k}| \le \varepsilon\}}$ for a convenient $\varepsilon > 0$; then the first assumption implies that
$$\sup_n \sum_k E\zeta_{n,k}^2 \equiv C < \infty,$$
and thus, setting $\zeta_n = \sum_k \zeta_{n,k}$, we derive
$$\sum_{k=1}^{n} E|U_k|^3 \le C\,\varepsilon.$$
$$\sqrt{n}\,(M_n - M) \xrightarrow{\ \mathcal L\ }_{n\to\infty} \mathcal N\Big(0, \frac{1}{4\gamma^2}\Big).$$
Proof. Notice that $P(\sqrt n(M_n - M) \le x) = P(M_n \le M + x/\sqrt n)$ is the probability that $N+1$ observations $Y_i$ (among the $n = 2N+1$ considered) satisfy $Y_i \le M + x/\sqrt n$:
$$P\big(\sqrt n(M_n - M) \le x\big) = P\Big(\sum_{i=1}^{n}\mathbf 1_{\{Y_i \le M + x/\sqrt n\}} \ge N+1\Big).$$
Setting $p_n = P(Y_1 \le M + x/\sqrt n)$ and
$$X_{i,n} = \frac{\mathbf 1_{\{Y_i \le M + x/\sqrt n\}} - p_n}{\sqrt{np_n(1-p_n)}}$$
yields
$$P\big(\sqrt n(M_n - M) \le x\big) = P\Big(s_n \le \sum_{i=1}^{n} X_{i,n}\Big), \qquad s_n = \frac{N+1-np_n}{\sqrt{np_n(1-p_n)}}.$$
and non-decreasing on $\mathbb R_+$). For each $k > 0$ there exists $M_k > 0$, which we may choose non-decreasing, such that $E|X_0|\mathbf 1_{\{|X_0| \ge M_k\}} \le \frac{1}{k^2}$. Set $H(x) = kx^2$ for $M_k \le |x| < M_{k+1}$.
with
$$\Delta_{n,p}(u) = P\left(\frac{S_n - np}{\sqrt{np(1-p)}} \le u\right) - \Phi(u).$$
np(1 − p)
[Figure: density plots for X1 (left) and Y1 (right).]
$$\big|f_{u,\eta}^{(k)}(x)\big| = \frac{1}{\eta^k}\Big|f^{(k)}\Big(u + \frac{x}{\eta}\Big)\Big| \le \frac{\|f^{(k)}\|_\infty}{\eta^k}, \qquad \text{for } k = 0, 1, 2 \text{ or } 3.$$
For the second function k may be chosen arbitrarily large. This allows us to conclude.
Now, from the centering conditions, we see that the terms T vanish except when $i_1 = i_2, \ldots, i_{2p-1} = i_{2p}$, since otherwise some index i would be isolated and the corresponding term would vanish by independence. For $A = \{i_2, i_4, \ldots, i_{2p}\}$, which takes $n^p$ possible values, one needs to sum according to Card(A). If all those indices are equal, $T = EX_0^{2p}$, and there are n such terms,
Hints.
1.
$$g_n(x) = \sum_{k=0}^{n}\binom{n}{k}x^k(1-x)^{n-k}\,g\Big(\frac{k}{n}\Big).$$
2. Prove that $x(1-x) \le \frac14$ if $0 \le x \le 1$.
3. Set $t > 0$ arbitrary and $A_{n,p} = (|S_{n,p} - p| > t)$; then:
$$\|g_n - g\|_\infty \le \frac{\|g\|_\infty}{2nt^2} + ct^\gamma.$$
Setting $t^{2+\gamma} = \frac{\|g\|_\infty}{2cn}$ provides a rate $n^{-\gamma/(2+\gamma)}$.
5. From Lemma 2.2.1, $E(S_{n,x} - g_n(x))^{2m} \le c\,EX_{1,x}^{2m}\,n^{-m}$.
6. Now
$$\|g_n - g\|_\infty \le \frac{c\|g\|_\infty}{2n^m t^{2m}} + ct^\gamma;$$
setting $t^{2m+\gamma} = \frac{C}{n^m}$, a rate $n^{-m\gamma/(2m+\gamma)}$ is now provided.
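As a quick illustration of item 1, here is a minimal sketch computing the Bernstein polynomial $g_n$ of a function g (the test function and degree are arbitrary choices of ours):

```python
import numpy as np
from scipy.special import comb

def bernstein(g, n, x):
    """g_n(x) = sum_k C(n,k) x^k (1-x)^(n-k) g(k/n)."""
    k = np.arange(n + 1)
    weights = comb(n, k) * x**k * (1.0 - x)**(n - k)
    return np.sum(weights * g(k / n))

g = lambda u: np.abs(u - 0.5)       # a Lipschitz test function
print(bernstein(g, 200, 0.3))       # close to g(0.3) = 0.2
```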
Analogously we obtain:
Lemma 2.2.2 (Hoeffding). Let $R_1, \ldots, R_n$ be independent Rademacher random variables (i.e. $P(R_i = \pm 1) = \frac12$). For real numbers $a_1, \ldots, a_n$ set
$$\xi = \sum_{i=1}^{n} a_i R_i, \qquad \text{and assume that} \qquad \sum_{i=1}^{n} a_i^2 \le c.$$
Then:
1. $P(\xi \ge x) \le e^{-x^2/(2c)}$, for all $x \ge 0$,
2. $P(|\xi| \ge x) \le 2e^{-x^2/(2c)}$, for all $x \ge 0$, and
3. $Ee^{\xi^2/(4c)} \le 2$.
$$\operatorname{ch} s \equiv \frac{e^s + e^{-s}}{2} \le e^{s^2/2}.$$
Indeed the two previous functions may be expanded as analytic functions on the real line, $\mathbb R$, and:
$$\operatorname{ch} s = \sum_{k=0}^{\infty}\frac{s^{2k}}{(2k)!}, \qquad e^{s^2/2} = \sum_{k=0}^{\infty}\frac{s^{2k}}{2^k\,k!}.$$
$$(k+1)(k+2)\cdots(k+k) \ge (2\cdot1)(2\cdot1)\cdots(2\cdot1) = 2^k.$$
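A quick Monte Carlo sanity check of item 2 (a sketch; the weights and sample size are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
a = np.ones(100) / 10.0                   # weights with sum a_i^2 = 1, so c = 1
c = np.sum(a**2)
x = 2.0

# simulate xi = sum a_i R_i for Rademacher R_i
R = rng.choice([-1.0, 1.0], size=(100_000, a.size))
xi = R @ a

emp = np.mean(np.abs(xi) >= x)            # empirical tail probability
bound = 2 * np.exp(-x**2 / (2 * c))       # Hoeffding bound 2 e^{-x^2/(2c)}
print(emp, bound)                         # empirical tail <= bound
```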
Here Fubini-Tonelli justifies the first inequalities, while the last inequality is a consequence of relation 2).
$$Ee^{tR} \le \frac12\big(e^t + e^{-t}\big),$$
thus Hoeffding instantaneously extends to sums $\sum_i a_iR_i$ for centered independent random variables $R_i$ with values in $[-1,1]$.
If $\xi = \sum_{i=1}^{n} Y_i$, then for each $x \ge 0$ the Bennett inequality holds:
$$P(|\xi| \ge x) \le 2e^{-\frac{x^2}{2V}B\left(\frac{Mx}{V}\right)}, \qquad \text{with } B(t) = \frac{2}{t^2}\big((1+t)\log(1+t) - t\big).$$
The first inequality follows from $EY_i = 0$ and $|EY_i^k| \le M^{k-2}EY_i^2$ for each $k > 1$. From independence and the Markov inequality we then obtain:
$$P(\xi \ge x) \le e^{Vg(t) - xt}.$$
$$(1+t)\log(1+t) - t \ge \frac{t^2}{2(1+t/3)}.$$
satisfies moreover
$$f'(0) = 0, \qquad f''(t) = \frac13\big((1+t)\log(1+t) - t\big) \ge 0,$$
which entails $f(0) = 0$ and $f'(t) \ge 0$,
and
$$Eg(|\xi|) \le g(3V/M) + \frac{8M}{3}\int_{9V/4M^2}^{\infty} g'(4Mx/3)\,e^{-x}\,dx,$$
with $x = 3z/4M$. Hence if $g(x) = |x|^p$ for some $p > 0$,
$$Eg(|\xi|) \le (3V/M)^p + 2p(4M/3)^p\int_{9V/4M^2}^{\infty} x^{p-1}e^{-x}\,dx.$$
Chapter 3
Estimation concepts
$$\overline X = \frac1n(X_1 + \cdots + X_n) \to_{n\to\infty} EX_0, \quad \text{a.s.}$$
[Figure 3.1: empirical means Y/N against n, illustrating the LLN.]
Remark 3.1.1. Note that the ergodic Theorem 9.1.1 shows that the simple assumption $E|X_0| < \infty$ already ensures this SLLN. For clarity of exposition we state this result as a simple consequence of the previous Marcinkiewicz-Zygmund inequality in Lemma 2.2.1. Convergence in this LLN is illustrated in Figure 3.1.
Proof. Let $\varepsilon > 0$ be arbitrary; then the Markov inequality entails
$$P(|\overline X_n| \ge \varepsilon) \le C\,\frac{EX_0^4}{\varepsilon^4 n^2}.$$
Thus the series $\sum_n P(|\overline X_n| \ge \varepsilon)$ converges,
[Figure: empirical distribution function Fn(x).]
$$|F_n(x) - F(x)| \le |F_n(x) - F_n(x_i)| + |F_n(x_i) - F(x_i)| + |F(x_i) - F(x)| < 3\varepsilon.$$
3.2 Contrasts
Assume that an independent identically distributed sample takes values in a Banach space E and admits a marginal distribution in a class $(P_\theta)_{\theta\in\Theta}$.
$$\widehat\theta(X) = \operatorname{Argmin}_{\theta\in\Theta}\,\rho(X,\theta) \tag{3.1}$$
If $\Theta \subset \mathbb R^d$ is open and the function $\theta\mapsto\rho(X,\theta)$ is differentiable, the estimate $\widehat\theta(X)$ of the parameter $\theta_0$ satisfies
$$\nabla\rho\big(X, \widehat\theta(X)\big) = 0 \tag{3.2}$$
(usually this is easier to check than (3.1)).
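A minimal numerical sketch of (3.1)-(3.2), here with a Gaussian negative log-likelihood as the contrast (the model and optimizer are illustrative choices of ours, not the text's prescription):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
X = rng.normal(loc=2.0, scale=1.5, size=500)   # sample from P_theta0

def rho(theta, x):
    """Contrast rho(X, theta): Gaussian negative log-likelihood."""
    mu, log_sigma = theta
    sigma = np.exp(log_sigma)                  # log-parametrized to stay positive
    return np.sum(0.5 * ((x - mu) / sigma) ** 2 + np.log(sigma))

res = minimize(rho, x0=np.array([0.0, 0.0]), args=(X,))
mu_hat, sigma_hat = res.x[0], np.exp(res.x[1])
print(mu_hat, sigma_hat)                       # close to (2.0, 1.5)
```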
where
$$\widehat c_j = \frac1n\sum_{i=1}^{n} e_{j,m}(X_i)$$
is the empirical unbiased estimate of $Ee_{j,m}(X_0)$ (which means that the mean of those estimates is the coefficient to be fitted). Such estimators are empirical estimators of the orthogonal projection $f_m$ of f onto the vector space spanned by $(e_{j,m})_{1\le j\le m}$; they are known as projection density estimators of f. In order to make them consistent one needs to choose a sequence of parameters $m \equiv m_n \uparrow \infty$. Such general classes of estimates are thus reasonable and may be proved to work.
Anyway, we propose to develop an alternative smoothing technique, since in this case an asymptotic expansion of the bias may be exhibited (note that the wavelet-type estimator corrects this real problem of projection estimators).
Let $g_h$ be an approximation of the Dirac measure as $h \downarrow 0$; we form the convolution $F_n \star g_h(x)$ to get the forthcoming kernel-type estimates of the density:
Definition 3.3.1. Let $(Y_n)$ be a real-valued, independent identically distributed sequence such that $Y_0$ admits a density f on $\mathbb R$. If $K : \mathbb R \to \mathbb R$ denotes a function such that
$$\int_{\mathbb R}(1+|K(y)|)\,|K(y)|\,dy < \infty, \qquad \int_{\mathbb R}K(y)\,dy = 1,$$
a kernel estimator of f is defined through a sequence $h = h_n \to_{n\to\infty} 0$ by:
$$\widehat f(y) = \frac{1}{nh}\sum_{j=1}^{n}K\Big(\frac{Y_j - y}{h}\Big).$$
Figure 3.3 reports the competitive behaviors of a histogram and of a kernel density estimate ($n = 32$, $h = 2.477$). A first result allows us to bound the bias of such estimators:
Lemma 3.3.1. Let g denote a bounded density of some probability distribution with moments up to order $p\in\mathbb N^*$; then there exists a polynomial P with degree $\le p$ such that $K = Pg$ is a kernel satisfying
$$\int_{\mathbb R} y^jK(y)\,dy = \begin{cases}1, & \text{if } j = 0,\ p,\\ 0, & \text{if } 1 \le j < p.\end{cases}$$
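A direct sketch of Definition 3.3.1 (the Gaussian kernel and bandwidth here are arbitrary choices of ours):

```python
import numpy as np

def kde(y_grid, Y, h, K=lambda u: np.exp(-u**2 / 2) / np.sqrt(2 * np.pi)):
    """Kernel density estimate f_hat(y) = (nh)^-1 sum_j K((Y_j - y)/h)."""
    U = (Y[:, None] - y_grid[None, :]) / h
    return K(U).sum(axis=0) / (len(Y) * h)

rng = np.random.default_rng(2)
Y = rng.normal(size=200)
grid = np.linspace(-3, 3, 121)
f_hat = kde(grid, Y, h=0.4)        # estimates the standard normal density
```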
[Figure 3.3: histogram (left, Frequency axis) and kernel density estimate (right, Density axis) of the same sample.]
$(nh_n)^{1/2}$
Remark 3.3.4. In fact, better results can be proved by using the Bernstein exponential inequality, but the present section is only introductory, aiming to provide some statistical applications to be developed later under dependence.
and
$$EZ_j^2 \sim \sqrt{\frac{h}{n}}\;f(y)\int K^2(s)\,ds,$$
completing the proof of the first inequality. The moment inequality relies on the fact that setting $u = t/\sqrt{f(x)}$ yields:
$$2\int t^p e^{-t^2/(cf(x))}\,dt = 2f(x)^{\frac{p+1}{2}}\int u^p e^{-u^2/c}\,du < \infty.$$
One may indeed check that each term in the sum above admits a variance equivalent to its second moment, and the usual change of variable $u = y + th$ yields:
$$E\,K'\Big(\frac{Y_j - y}{h}\Big)^2 = h\int K'^2(t)\,f(y+th)\,dt \sim_{h\to0} h\,f(y)\int K'^2(t)\,dt.$$
We derive analogously
$$E\,K'\Big(\frac{Y_j - y}{h}\Big) = O(h),$$
$$r' = \frac{g'f - gf'}{f^2}.$$
In the general case, for the previous relation (3.6) to hold, we prove:
$$\frac{1}{1-z} = 1 + \frac{z}{1-z} = 1 + z + \frac{z^2}{1-z},$$
thus, since $0 \le a \le 1$:
$$\Big|\frac{1}{1-z} - 1\Big| \le \max\Big(|z| + \frac{|z|^2}{|1-z|},\ \frac{|z|}{|1-z|}\Big) \le |z| + \frac{|z|\max(1,|z|)}{|1-z|} \le |z| + \frac{|z|^{1+a}}{|1-z|}.$$
It follows for $R = N/D$ that
$$|R_n - R| \le \Big|R_n - \frac{N_n}{D}\Big| + \frac{|N_n - N|}{D} = |N_n|\Big|\frac{1}{D_n} - \frac1D\Big| + \frac1D|N_n - N| = \frac{|N_n|}{D}\Big|\frac{D}{D_n} - 1\Big| + \frac1D|N_n - N| = \frac{|N_n|}{D}\Big|\frac{1}{1-z} - 1\Big| + \frac1D|N_n - N| \le \frac{|zN_n|}{D} + |R_n||z|^{1+a} + \frac1D|N_n - N|.$$
Thus
$$\|R_n - R\|_m \le A + B + C,$$
$$A = \frac{1}{D^2}\|(D_n - D)N_n\|_m, \qquad B = \frac{1}{D^{2+a}}\big\|R_n|D_n - D|^{1+a}\big\|_m, \qquad C = \frac1D\|N_n - N\|_m,$$
and for constants denoted $c, c', c'', \ldots > 0$:
$$A \le \frac{N}{D^2}\|D_n - D\|_m + \frac{1}{D^2}\|(D_n - D)(N_n - N)\|_m \le c\big(v_n + \|(D_n - D)(N_n - N)\|_m\big) \le c\big(v_n + \|N_n - N\|_p\|D_n - D\|_q\big) \le c'(v_n + v_n^2) \quad (\text{since } p, q \ge 2m) \le c'''v_n.$$
Now note that since (3.7) writes $R_n$ as a convex combination, we obtain $|R_n| \le \max_{1\le i\le n}|U_i|$; thus an idea of Gilles Pisier entails:
$$E|R_n|^s \le \big(E|R_n|^t\big)^{s/t} \le (nM^t)^{s/t}, \qquad \text{for } 1 \le s \le t.$$
Now the Hölder inequality writes
$$\|YZ\|_m \le \|Y\|_{um}\,\|Z\|_{vm}, \qquad \text{if } \frac1u + \frac1v = 1, \tag{3.10}$$
thus
$$1 + a = \frac{ms}{p(s-m)} \implies a = \frac{(m-p)s + pm}{p(s-m)};$$
we need $v_n^a\,n^{1/t} = O(1)$, and the result follows since $m \le q$, $C \le v_n/D$.
it should imply:
Hint. First, conditions (3.8) follow from independence, and $v_n^a\,n^{1/t} = n^{\frac1t - \frac a2}$ is bounded in case
$$t \ge \frac{2p(s-m)}{(m-p)s + pm}.$$
$$\widehat\theta_n = \int f_{n,h}^2(x)\,w(x)\,dx \tag{3.11}$$
$$\widehat\theta_n - \theta = \widehat\theta_n - \theta_n + \theta_n - \theta = \int\big(f_{n,h}(x) - Ef_{n,h}(x)\big)^2 w(x)\,dx + \int\big(f_{n,h}(x) - Ef_{n,h}(x)\big)\big(2Ef_{n,h}(x)\big)w(x)\,dx + \int\big((Ef_{n,h}(x))^2 - f^2(x)\big)w(x)\,dx$$
$$= \int\big(f_{n,h}(x) - Ef_{n,h}(x)\big)\big(2Ef_{n,h}(x)\big)w(x)\,dx + O\Big(\frac{1}{nh} + h^2\Big).$$
Proof. Set $v(x) = 2f(x)w(x)$; the remarks above reduce the study to
$$\int\big(f_{n,h}(x) - Ef_{n,h}(x)\big)v(x)\,dx = \frac1n\sum_{i=1}^{n}\big(v(X_i) - Ev(X_i) + \Delta_i - E\Delta_i\big).$$
comparable conditions.
For this we leave as an exercise that $f'_{n,h}$ is also a convergent estimate of $f'$ and is asymptotically Gaussian (with normalization $\sqrt{nh^3}$). Differentiability of the map $(u,v)\mapsto u^2/v$ yields an affine approximation of this non-linear functional of the couple $(f_{n,h}, f'_{n,h})$.
• Using bivariate iid samples $(X_n, Y_n)$ yields estimation of the regression function:
$$r'' = \frac{D(f, g, f', g', f'', g'')}{f^3} = 0,$$
Chapter 4
Stationarity
[Figure 4.1: Nile annual flow against year.]
Some bases for the theory of time series are given below. Time series are sequences $(X_n)_{n\in\mathbb Z}$ of random variables defined on a probability space (always denoted by $(\Omega, \mathcal A, P)$) and with values in a measurable space $(E, \mathcal E)$. Another extension, mainly avoided in these notes, is that of random fields $(X_n)_{n\in\mathbb Z^d}$. This means that we will never hesitate to assume that sequences of independent random variables can be defined on the same probability
space.
A classical example of a time series, typically regarded as non-linear, is the Nile flooding data, see Figure 4.1; some financial data are displayed in Figure 4.2 (with daily and longer-duration observations), for which stationarity seems more problematic.
[Figure 4.2: AGF share price over one day (left) and in continuous quotation (right).]
4.1 Stationarity
Definition 4.1.1 (strict stationarity). A random sequence (Xn )n∈Z
is strictly stationary if, for each k ≥ 0, the distribution of the vector
(Xl , . . . , Xl+k ) does not depend on l ∈ Z.
Definition 4.1.2 (weak stationarity). A random sequence $(X_n)_{n\in\mathbb Z}$ is second-order stationary if $EX_l^2 < \infty$ and:
$$EX_l = EX_0 \quad\text{and}\quad \operatorname{Cov}(X_l, X_{k+l}) = \operatorname{Cov}(X_0, X_k), \qquad \text{for each } l, k \in \mathbb Z.$$
We shall denote by m the common mean of $X_n$ and by $r(k) = \operatorname{Cov}(X_0, X_k)$ the covariance of such a process. In other words, $(X_n)_{n\in\mathbb Z}$ is strictly stationary if for each $k, l \in \mathbb N$ and each continuous and bounded function $h : \mathbb R^{k+1} \to \mathbb R$:
Under the Gaussian assumption we will see that both notions coincide; this is not true in general, however.
[Figure: price of AGF-B.TO against year.]
If $h : [-\pi,\pi] \to \mathbb R$ is continuous:
$$\int_{-\pi}^{\pi}h(\lambda)\,dG(\lambda) = \int_{-\pi}^{\pi}h(\lambda)\,\mu(d\lambda).$$
Proof. Set
$$g_n(\lambda) = \frac{1}{2\pi n}\sum_{s=0}^{n-1}\sum_{t=0}^{n-1}r_{t-s}\,e^{-i(t-s)\lambda} = \frac{1}{2\pi}\sum_{j=-(n-1)}^{n-1}\Big(1 - \frac{|j|}{n}\Big)r_j\,e^{-ij\lambda},$$
and $G_n(\lambda) = \int_{-\pi}^{\lambda}g_n(u)\,du$; then relation (4.1) implies $g_n(u) \ge 0$, hence $G_n$ is continuous, non-decreasing, and $G_n(\pi) = r_0$. From a compactness argument, some subsequence $G_{n'}$ of $G_n$ converges ($^1$). Note that $dG_n(\lambda) = g_n(\lambda)\,d\lambda$; then
$$\Big(1 - \frac{|k|}{n}\Big)r_k = \int_{-\pi}^{\pi}e^{ik\lambda}\,dG_n(\lambda).$$
one computes
$$\operatorname{Cov}(X_s, X_t) = EX_sX_t = r_{s-t} = \sum_{j=1}^{k}|a_j|^2 e^{i(s-t)b_j}.$$
$$X_n = \xi_n + a\xi_{n-1}.$$
in two steps:
• If g is a step function, $g(\lambda) = g_s$ for $\lambda_{s-1} < \lambda \le \lambda_s$ (with $-\pi = \lambda_0 \le \lambda_1 \le \cdots \le \lambda_S = \pi$), $0 < s \le S$, set
$$\int g(\lambda)\,dZ(\lambda) = \sum_{s=1}^{S}g_s\,Z([\lambda_{s-1}, \lambda_s]).$$
Notice that
$$E\Big|\int_{-\pi}^{\pi}g(\lambda)\,dZ(\lambda)\Big|^2 = \sum_{s,t}g_sg_t\,EZ([\lambda_{s-1},\lambda_s])Z([\lambda_{t-1},\lambda_t])$$
then the sequence $(Z_n(I))_{n\ge1}$ is Cauchy in $L^2(\Omega, \mathcal A, P)$ since, for $n > m$,
$$E|Z_n(I) - Z_m(I)|^2 = \frac{1}{4\pi^2}E\Big|\sum_{m<|j|\le n}X_j\int_a^b e^{-iju}\,du\Big|^2 = \int_{-\pi}^{\pi}|h_n - h_m|^2\,dG.$$
$^2$ Recall that monotonic functions admit limits on the left and on the right at each point; thus the non-empty open intervals $(f(x-), f(x+))$ are disjoint in $\mathbb R$. Choose a rational number in each of them to derive this result.
$$g(\lambda) = \frac{\sigma^2}{2\pi},$$
then the sequence
$$\xi_n = \int_{-\pi}^{\pi}e^{in\lambda}\,Z(d\lambda),$$
$$Z([0,\lambda]) = \frac{\sigma^2}{2\pi}\,W(\lambda),$$
with W the Brownian motion. Here Gaussianity of the white noise also implies its independence, and it is an independent identically distributed sequence (a strict white noise). If $\lambda\mapsto Z([0,\lambda])$ admits independent increments, the sequence $\xi_n$ is again a strict white noise. A weak white noise is associated with random spectral measures with orthogonal increments.
• If
$$X_n = \sum_{k=-\infty}^{\infty}c_k\xi_{n-k}, \qquad \sum_{k=-\infty}^{\infty}c_k^2 < \infty,$$
Proof. The first claim follows from the bilinearity of covariance:
$$\operatorname{Cov}(Y_0, Y_k) = \sum_{m=-\infty}^{\infty}\sum_{j=-\infty}^{\infty}c_jc_{j-m}\,r_{k+m}.$$
$$E|X_1 + \cdots + X_n|^2 = \sum_{s=1}^{n}\sum_{t=1}^{n}EX_sX_t = \sum_{s=1}^{n}\sum_{t=1}^{n}r_{t-s}.$$
Thus:
$$E|X_1 + \cdots + X_n|^2 = \sum_{|k|<n}(n - |k|)\,r_k. \tag{4.2}$$
For each $\varepsilon > 0$ there exists K such that $|r_k| < \varepsilon$ for $|k| > K$. Split the expression
$$\sum_{|k|<n}|k||r_k| \le \sum_{|k|<K}|k||r_k| + \varepsilon n.$$
Note that the terms of the previous sum do not all involve the same number of elements: namely, the k-th element of the sum runs over $n - |k|$ terms, which makes this estimate biased. A variant of the previous estimate which is unbiased now writes as
$$\widehat\sigma_n^2 = \sum_{k=-m_n}^{m_n}\frac{1}{n-|k|}\sum_{i=1\vee k}^{n\wedge(n+k)}X_iX_{i+k}. \tag{4.5}$$
$$X_n = EX_0 + \sum_{k=0}^{\infty}c_k\xi_{n-k}. \tag{4.6}$$
[Figure: sample ACF of the Nile series, lags 0-25.]
Definition 4.4.1. For a centered and second-order stationary $(X_t)_{t\in\mathbb Z}$ define the periodogram:
$$I_n(\lambda) = \frac{1}{2\pi n}\Big|\sum_{k=1}^{n}X_k e^{-ik\lambda}\Big|^2 = \frac{1}{2\pi}\sum_{|\ell|<n}\widehat r_n(\ell)\,e^{-i\ell\lambda}.$$
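A naive sketch of Definition 4.4.1 (an FFT would be the efficient route; the white-noise input is an arbitrary test case of ours):

```python
import numpy as np

def periodogram(x, lams):
    """I_n(lambda) = |sum_k x_k e^{-ik lambda}|^2 / (2 pi n)."""
    n = len(x)
    k = np.arange(1, n + 1)
    dft = np.array([np.sum(x * np.exp(-1j * k * lam)) for lam in lams])
    return np.abs(dft) ** 2 / (2 * np.pi * n)

rng = np.random.default_rng(3)
x = rng.normal(size=512)                 # white noise: flat spectrum sigma^2/(2 pi)
lams = np.linspace(-np.pi, np.pi, 257)
I = periodogram(x - x.mean(), lams)
```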
4.6 Subsampling
Besides the model-based bootstrap techniques of Section 11.3, this section aims at explaining the specific features of resampling under dependence. Namely, assume that a limit theorem holds for a sequence
$$t_m(X_1, \ldots, X_m) \to_{m\to\infty} T.$$
$$\widehat K(g) \to_{n\to\infty} Eg(T) \quad \text{(in probability).}$$
Part II
Chapter 5
Gaussian chaos
$$Y = m + RZ \sim \mathcal N_d(m, \Sigma).$$
As an application: if
$$\big(\Gamma(t_i, t_j)\big)_{1\le i,j\le n}$$
satisfies (4.1) for all possible choices $t_i \in T$, then there exists a Gaussian process with covariance Γ.
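A minimal sketch of the representation $Y = m + RZ$, taking R as the Cholesky factor of Σ (the particular Σ is an arbitrary choice of ours):

```python
import numpy as np

def sample_mvn(m, Sigma, n, rng):
    """Draw Y = m + R Z with R the Cholesky factor of Sigma, Z standard normal."""
    R = np.linalg.cholesky(Sigma)
    Z = rng.normal(size=(n, len(m)))
    return m + Z @ R.T

rng = np.random.default_rng(4)
Sigma = np.array([[1.0, 0.6], [0.6, 2.0]])
Y = sample_mvn(np.array([0.0, 1.0]), Sigma, n=1000, rng=rng)
print(np.cov(Y.T))        # approximately Sigma
```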
$$\sum_{i=1}^{n}\sum_{j=1}^{n}|t_i|^{2H}u_iu_j = -\sum_{i=0}^{n}|t_i|^{2H}u_iu_0 = -\sum_{i=0}^{n}|t_i - t_0|^{2H}u_iu_0.$$
Analogously
$$\sum_{i=1}^{n}\sum_{j=1}^{n}|t_j|^{2H}u_iu_j = -\sum_{j=0}^{n}|t_j - t_0|^{2H}u_0u_j,$$
hence
$$A = -\sum_{i=0}^{n}\sum_{j=0}^{n}|t_i - t_j|^{2H}u_iu_j.$$
$B_\varepsilon \sim A$, as $\varepsilon \downarrow 0$.
• Step 3. For each $\varepsilon > 0$ and $H \in (0, 1]$, there exists a real random variable ξ with
$$\phi_\xi(t) = Ee^{it\xi} = e^{-|t|^{2H}}$$
$$\cdots = |s - t|^{2H}.$$
Remark 5.1.3. Regularity properties of the fBm are clear from Figures 5.1 and 5.3, representing its trajectories for H = 0.3 and H = 0.9 respectively, while the corresponding fractional Gaussian noises are shown in Figures 5.2 and 5.4: clearly, the bigger H, the more regular the trajectory. From the previous result, in the latter case derivatives almost exist, while a "real" white-noise effect appears for H = 0.3. Compare with real Gaussian noise in Figure A.2.
[Figure 5.1: fractional Brownian motion trajectory, H = 0.3. Figure 5.2: the corresponding fractional Gaussian noise.]
[Figure 5.3: fractional Brownian motion trajectory, H = 0.9. Figure 5.4: the corresponding fractional Gaussian noise.]
For each $a > 0$,
$$(Z(at))_{t\in\mathbb R} = (a^HZ(t))_{t\in\mathbb R}, \quad \text{in distribution.}$$
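A minimal sketch simulating fBm directly from its covariance $\frac12(s^{2H} + t^{2H} - |s-t|^{2H})$ via a Cholesky factorization (exact but O(n³); the small jitter only guards numerical positive-definiteness):

```python
import numpy as np

def fbm(n, H, rng):
    """Exact fBm sample on t = 1..n via Cholesky of its covariance matrix."""
    t = np.arange(1, n + 1, dtype=float)
    s, tt = np.meshgrid(t, t)
    cov = 0.5 * (s**(2 * H) + tt**(2 * H) - np.abs(s - tt)**(2 * H))
    L = np.linalg.cholesky(cov + 1e-12 * np.eye(n))
    return L @ rng.normal(size=n)

rng = np.random.default_rng(5)
path = fbm(500, H=0.9, rng=rng)     # visibly smoother than for H = 0.3
```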
[Figure: Hermite polynomials H_n(x), n = 0, ..., 5, on [-6, 6].]
$$\frac{d^kH_k(x)}{dx^k} = k!, \quad \text{hence} \quad EH_k^2(N) = k!$$
First
$$k\int H_{k-1}(x)H_l(x)\varphi(x)\,dx = \begin{cases}0, & \text{if } l < k-1,\\ k\,(k-1)! = k!, & \text{if } l = k-1.\end{cases}$$
$$= (-1)^{l+1}\int H_k(x)\varphi^{(l+1)}(x)\,dx = \int H_k(x)H_{l+1}(x)\varphi(x)\,dx,$$
$$\varphi'(x) = -x\varphi(x).$$
From the definition $\varphi^{(k)}(x) = (-1)^kH_k(x)\varphi(x)$, hence the previous expression rewrites
$$H_{k+1}(x) = xH_k(x) - H_k'(x).$$
Differentiating the relation $\varphi'(x) = -x\varphi(x)$ k times with the Leibniz formula yields $H_k'(x) = kH_{k-1}(x)$, thus
$$H_{k+1}(x) = xH_k(x) - kH_{k-1}(x).$$
The formula follows from comparing the two previous expressions of $H_{k+1}$.
Remark 5.2.3. If one knew how to prove that the series $x\mapsto g(x,z)$ converges in a more accurate space and is differentiable for each z, then it would be easy to deduce
$$\frac{\partial}{\partial x}g(x, z) = zg(x, z),$$
and the function $x\mapsto e^{zx - z^2/2}$ satisfies the same partial differential equation. In both cases $Eg(N, z) = 1$ implies (5.2) for all $x\in\mathbb R$, $z\in\mathbb C$. An alternative proof of (5.3) would follow; unfortunately, we do not know such a result.
Proofs.
2. As the degree of $P_n(x) - xP_{n-1}(x)$ is $< n-1$, one may write its expansion
$$(P_n, P_k)_p = \frac{1}{n!}\int_0^{\infty}P_k(x)\frac{d^n}{dx^n}\big(x^ne^{-x}\big)\,dx = \frac{(-1)^n}{n!}\int_0^{\infty}P_k^{(n)}(x)\,e^{-x}\,dx.$$
and
$$Eg(Y_1)g(Y_2) = \sum_{k=0}^{\infty}\frac{|g_k|^2}{k!}\,r^k,$$
$$\operatorname{Cov}\big(g(Y_1), g(Y_2)\big) = \sum_{k=1}^{\infty}\frac{|g_k|^2}{k!}\,r^k.$$
Thus, in case $\sum_l|r_l| < \infty$, each series $R_k = \sum_l r_l^k$ converges (for $k\ge1$) because $|r_l| \le r_0 = 1$, and
$$E\Big(\sum_{j=1}^{n}g(Y_j)\Big)^2 \sim n\sum_{k=m(g)}^{\infty}\frac{R_k|g_k|^2}{k!} = O(n).$$
If only
$$S \equiv \sum_l|r_l|^{m(g)} < \infty,$$
with $m(g)$ the Hermite rank of Definition 5.2.3, then the previous claim still holds true; indeed, all the series $R_k$ are then convergent for $k\ge m(g)$. Also remark that the Cauchy-Schwarz inequality implies $|r(\ell)| \le 1 \equiv EY_0^2$, thus
$$|r(\ell)|^k \le |r(\ell)|^{m(g)} \quad \text{if } k\ge m(g).$$
This expression is
$$\operatorname{Var}F_n(x) = O\Big(\frac1n\Big), \quad \text{as } n\to\infty, \quad \text{if } \sum_l|r_l| < \infty.$$
If now $\sum_l|r_l| = \infty$, its order of magnitude is
$$\operatorname{Var}F_n(x) = O\left(\frac1n\sum_{|l|<n}\Big(1 - \frac{|l|}{n}\Big)r_l\right).$$
If $\Sigma = (r_{i,j})_{1\le i,j\le k}$ with $r_{i,i} = 1$, we thus get the heat equation by differentiation:
$$\frac{\partial f(y,\Sigma)}{\partial r_{i,j}} = \frac{\partial^2 f(y,\Sigma)}{\partial y_i\,\partial y_j}, \qquad \text{if } i\ne j.$$
Also set
$$n_{i,j} = n_{j,i}\ \text{ if } i > j, \qquad s_{n,i} = \sum_{j\ne i}n_{i,j};$$
then, denoting
$$f(y, I_k) = \prod_{i=1}^{k}\varphi(y_i),$$
we get
where
$$\phi(y) = \prod_{i=1}^{k}\varphi(y_i)$$
denotes the density of a random vector $\mathcal N_k(0, I_k)$, and the previous sums extend to all integer multi-indices
$$n = (n_{i,j})_{1\le i<j\le k}.$$
Indeed $s_{n,i}$ is the number of occurrences of $y_i$ in the second identity. Relation (5.5) thus implies
$$E\prod_{i=1}^{k}H_{s_i}(Y_i) = \sum_n\frac{r^n}{n!}\prod_{i=1}^{k}\int_{-\infty}^{\infty}H_{s_{n,i}}(y_i)H_{s_i}(y_i)\varphi(y_i)\,dy_i,$$
for sums
Precisely, the first line of the arrays may be divided into $k-1$ parts with respective sizes $n_{1,2}, \ldots, n_{1,k}$. The number of such multi-indices is also the number of arrays satisfying the constraints $s_{n,i} = s_i$.
if
$$\sum_k|r_k|^m < \infty, \qquad r(k) = EY_0Y_k,$$
variance
$$\|f\|_2 = \Big(\int_0^{\infty}f^2(t)\,dt\Big)^{1/2}.$$
We aim at defining it first for step functions, noticing that for such functions $f\mapsto I_1(f)$ is an isometry $L^2(\mathbb R_+)\to L^2(\Omega,\mathcal A,P)$:
$$\|f\|_2 = \Big(\int_0^{\infty}f^2(t)\,dt\Big)^{1/2} = \big(EI_1(f)^2\big)^{1/2} = \|I_1(f)\|_{L^2(\Omega,\mathcal A,P)}.$$
Those spaces are naturally equipped with their natural Hilbert norms
$$\|h\|_{H_k}^2 = \int_{\mathbb R^k}h^2(t)\,dt_1\cdots dt_k,$$
$$\|\operatorname{Sym}(h)\|_{H_k} \le \|h\|_{H_k}.$$
$$\{(t_1,\ldots,t_k)\in\mathbb R^k,\ t_1\le\cdots\le t_k\}$$
$$H_k(I_1(f)) = I_k(f^{\otimes k}),$$
$$\{W(t),\ t\ge0\}.$$
$$f\otimes_pg(t_1,\ldots,t_{m+k-2p}) = \int_{\mathbb R^p}f(t_1,\ldots,t_{k-p},s_1,\ldots,s_p)\,g(t_{k-p+1},\ldots,t_{k+m-2p},s_1,\ldots,s_p)\,ds_1\cdots ds_p$$
The Ito formula writes in this case as the forthcoming formula, and the other two formulas are also useful:
$$I_k(f)I_m(g) = \sum_{p=0}^{k\wedge m}p!\binom kp\binom mp I_{k+m-2p}(f\otimes_pg).$$
$^1$ Many thanks to Ivan Nourdin for his essential and friendly help.
$$\frac{(k+m)!}{k!\,m!}\,\|\operatorname{Sym}(f\otimes g)\|_{H_{k+m}}^2 = \|f\|_{H_k}^2\|g\|_{H_m}^2 + \sum_{q=1}^{k\wedge m}\binom kq\binom mq\|f\otimes_qg\|_{H_{k+m-2q}}^2.$$
is equivalent to
$$\lim_{n\to\infty}\|\operatorname{Sym}(f_n\otimes_pf_n)\|_{H_{2(k-p)}}^2 = 0, \qquad \forall p\in\{1,\ldots,k-1\}.$$
We now present the most impressive rigidity result of this Nualart-Peccati-Tudor theory.
Theorem 5.2.1. Assume that a sequence $f_n\in H_k$ satisfies
$$\lim_n\|f_n\|_{H_k} = 1;$$
then
$$I_k(f_n)\xrightarrow{\text{in law}}_{n\to\infty}\mathcal N(0,1) \iff \lim_nEI_k^4(f_n) = 3.$$
Now set
$$\psi_n(t) = e^{t^2/2}\,E\exp\big(itI_k(f_n)\big),$$
then
$$\psi_n'(t) = t\psi_n(t) + ie^{t^2/2}E\big(I_k(f_n)\exp(itI_k(f_n))\big) = te^{t^2/2}\,E\Big(\Big(1 - \int_{-\infty}^{\infty}I_{k-1}^2\big(f_n(\cdot,s)\big)\,ds\Big)\exp\big(itI_k(f_n)\big)\Big).$$
Then:
$$|\psi_n'(t)| \le te^{t^2/2}\,E\Big|1 - \int_{-\infty}^{\infty}I_{k-1}^2\big(f_n(\cdot,s)\big)\,ds\Big|.$$
Note that
$$E\int_{-\infty}^{\infty}I_{k-1}^2\big(f_n(\cdot,s)\big)\,ds = 1,$$
Chapter 6
Linear processes
implies that the previous series converges if $E|\xi_0|^m < \infty$ for some $m\in(0,1]$ (if $E|\xi_0| < \infty$ then $m = 1$); this series then converges in probability.
$^1$ Usually this is only an $L^2$-stationary white noise sequence and not an independent sequence.
denoting $\widetilde c = (\widetilde c_k)_{k\in\mathbb Z}$ with $\widetilde c_k = c_{-k}$. Remark that
$$\sum_k|r_k| \le \Big(\sum_k|c_k|\Big)^2,$$
thus this series converges when
$$\sum_k|c_k| < \infty.$$
We thus obtain:
Proposition 6.1.1. Let $(X_t)$ be a linear process defined by eqn. (6.1) (with iid inputs $\xi_n$); then the above series converges a.s., the process is stationary, and it is in $L^m$ if either
$$E|\xi_0|^m < \infty, \qquad \sum_k|c_k|^m < \infty, \qquad 0 < m \le 1,$$
or it is causal and
$$E|\xi_0|^2 < \infty, \qquad \sum_k|c_k|^2 < \infty, \qquad m = 2.$$
In the latter case the covariance of the process writes as in eqn. (6.2). The series of covariances converges if
$$\sum_{k=0}^{\infty}|c_k| < \infty.$$
Using the backward operator B, the previous causal models also write
$$X = g(B)\xi, \qquad g(z) = \sum_{k=0}^{\infty}c_kz^k, \quad |z| < 1.$$
is written
$$\alpha(B)X_t = \beta(B)\xi_t.$$
Trajectories of such ARMA models are reported in Figure 6.1.
[Figure 6.1: simulated ARMA trajectory.]
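A minimal recursion-based sketch producing such trajectories (the coefficients and burn-in length are arbitrary choices of ours):

```python
import numpy as np

def arma(n, a, b, rng, burn=200):
    """Simulate alpha(B)X = beta(B)xi, i.e.
    X_t = a1 X_{t-1} + ... + ap X_{t-p} + xi_t + b1 xi_{t-1} + ... + bq xi_{t-q}."""
    p, q = len(a), len(b)
    xi = rng.normal(size=n + burn)
    x = np.zeros(n + burn)
    for t in range(max(p, q + 1), n + burn):
        x[t] = np.dot(a, x[t - p:t][::-1]) + xi[t] + np.dot(b, xi[t - q:t][::-1])
    return x[burn:]

rng = np.random.default_rng(6)
x = arma(1000, a=[0.7], b=[0.4], rng=rng)   # an ARMA(1,1) path
```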
Then:
$$X_t = \sum_{j=0}^{\infty}c_j\xi_{t-j},$$
$$\sum_{j=0}^{\infty}c_jz^j = \frac{\beta(z)}{\alpha(z)},$$
with
$$\alpha(z) = 1 - a_1z - \cdots - a_pz^p = \Big(1 - \frac{z}{r_1}\Big)\cdots\Big(1 - \frac{z}{r_p}\Big).$$
If the roots $r_1,\ldots,r_p$ of the polynomial α satisfy
$$|r_1| > 1, \ldots, |r_p| > 1,$$
then the function $1/\alpha$ is analytic for
$$|z| < \min\{|r_1|,\ldots,|r_p|\},$$
and thus on a neighborhood of the closed complex unit disk. For example,
$$\Big(1 - \frac{z}{r_1}\Big)^{-1} = \sum_{l=0}^{\infty}r_1^{-l}z^l.$$
Moreover, the analyticity of the function $\beta/\alpha$ on some disk $D(0, 1+\varepsilon)$ implies $|c_k| \le Ce^{-\gamma k}$.
$$\widehat R_p\,\widehat a = \widehat r_p, \qquad \widehat\sigma^2 = \widehat r_0 - \widehat a\,'\,\widehat r_p.$$
Remark 6.3.1 (ARMA case). Those equations also extend to ARMA models, but besides the previous estimates, CLT results are optimal under AR-modelling; see again [Brockwell and Davis, 1991], Chapter 8.
Remark 6.3.2 (nonlinear models). Extensions to the case of weak white noise are used; for example, nonlinear models such as ARCH models are such white noises, and a linear process with such an input may also be considered. In the forthcoming chapter we describe some elementary versions of this idea.
Remark 6.3.3 (Durbin-Levinson algorithm).
From such an estimation, a plug-in one-step-ahead prediction of the process writes:
$$\widehat X_t = \widehat a_1X_{t-1} + \cdots + \widehat a_pX_{t-p},$$
once the parameters have been estimated from the data $X_0,\ldots,X_{t-1}$. Two-step-ahead predictions are similar, now replacing $X_t$ by $\widehat X_t$ in the previous relation:
$$\widehat X_{t+1} = \widehat a_1\widehat X_t + \widehat a_2X_{t-1} + \cdots + \widehat a_pX_{t-p+1}.$$
$$\Delta^dX_t = \xi_t.$$
$$X_t = X_0 + \xi_1 + \cdots + \xi_t,$$
$$\Delta^2X_t = \Delta(\Delta X_t) = \xi_t,$$
$$\Delta^dX_t = \xi_t \ \text{ for } d\in\mathbb N.$$
Then:
$$b_j = \frac{\Gamma(j+d)}{\Gamma(j+1)\Gamma(d)} = \prod_{k=1}^{j}\frac{k-1+d}{k}. \tag{6.5}$$
which indeed implies $b_j \sim \frac{1}{\Gamma(d)}j^{d-1}$ as $j\to\infty$. For $-\frac12 < d < \frac12$, define the operators $\Delta^{\pm d}$.
Moreover:
$$r(0) = \sigma^2\,\frac{\Gamma(1-2d)}{\Gamma^2(1-d)}, \qquad \sigma^2 = E\xi_0^2,$$
$$r(k) = \sigma^2\,\frac{\Gamma(k+d)\,\Gamma(1-2d)}{\Gamma(k-d+1)\,\Gamma(d)\,\Gamma(1-d)} \sim \sigma^2\,\frac{\Gamma(1-2d)}{\Gamma(d)\,\Gamma(1-d)}\,|k|^{2d-1} \quad \text{as } |k|\to\infty.$$
Thus
$$\sum_k|r(k)| = \infty \iff d\in\Big(0, \frac12\Big).$$
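A minimal sketch of the coefficients (6.5), computed through the recursion $b_j = b_{j-1}(j-1+d)/j$ (the truncation level is an arbitrary choice of ours):

```python
import numpy as np

def frac_diff_coeffs(d, J):
    """b_j = prod_{k=1}^{j} (k-1+d)/k (eqn. 6.5); b_j ~ j^(d-1)/Gamma(d)."""
    b = np.ones(J + 1)
    for j in range(1, J + 1):
        b[j] = b[j - 1] * (j - 1 + d) / j
    return b

b = frac_diff_coeffs(0.3, 1000)
# truncated FARIMA(0,d,0) sample: X_t = sum_j b_j xi_{t-j}
```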
[Figure: simulated FARIMA(0,d,0) trajectories for d = 0.01, 0.1, 0.2, 0.3, 0.4, 0.49.]
Let Z denote the random spectral measure associated with the white noise $\xi_t$.
[Figure: sample ACFs of the simulated FARIMA trajectories.]
Then
$$X_t = \int_{-\pi}^{\pi}e^{it\lambda}\,(1 - e^{-i\lambda})^{-d}\,Z(d\lambda)$$
and
$$g_X(\lambda) = \frac{\sigma^2}{2\pi}\,|1 - e^{-i\lambda}|^{-2d} = \frac{\sigma^2}{2\pi}\Big(4\sin^2\frac{\lambda}{2}\Big)^{-d}.$$
Remark 6.4.2 (Simulation). Such integral representations are used
to simulate such models. In the case of Gaussian inputs, the spectral process is Gaussian with independent increments, which makes the simulation trick possible by providing independent random variables with a given distribution, see Remark A.1.4. This idea extends to any process with independent increments, such as the unit Poisson process. Another possibility for simulating such time series is of course to truncate the corresponding series; the problem is that the simulation is then only approximate.
If $d < \frac12$ the process is causal and well defined when the roots of α are not inside the unit disk. It is invertible if $d > -\frac12$ and the roots of β are outside the unit disk. Indeed, in this case $\xi_t = \gamma(B)X_t$ for a function γ analytic on the unit disk
$$D(0,1) = \{z\in\mathbb C\,/\,|z| < 1\}.$$
Let again Z denote the random spectral measure associated with the white noise $\xi_t$; then
$$X_t = \int_{-\pi}^{\pi}e^{it\lambda}\,\big(1 - e^{-i\lambda}\big)^{-d}\,\frac{\beta(e^{i\lambda})}{\alpha(e^{i\lambda})}\,Z(d\lambda).$$
Thus
$$g_X(\lambda) = \frac{\sigma^2}{2\pi}\left|\frac{\beta(e^{i\lambda})}{\alpha(e^{i\lambda})}\right|^2|1 - e^{-i\lambda}|^{-2d}.$$
6.6 Extensions
Clearly, for any meromorphic function $\gamma : \mathbb C\to\mathbb C$ without singularities on $D(0,1)$ and with finitely many singularities on the unit circle, we may define a process
$$X_t = \gamma(B)\xi_t.$$
In case $1/\gamma$ satisfies the same assumptions, the process is invertible (zeros replacing singularities). Singularities $\ne 1$ on the unit circle are called periodic long-range singularities. Now let $(c_{k,n})_{k,n\in\mathbb Z}$ be a sequence of real numbers; analogously to eqn. (6.1), we may define non-stationary linear processes by
$$X_n = \sum_{k=-\infty}^{\infty}c_{k,n}\,\xi_{n-k}.$$
Chapter 7
Non-linear processes
For Volterra models of higher order, Appell polynomials $A_s(\xi_n)$ replace $\xi_n^2 - \sigma^2$ in order to take into account the repetitions in diagonal terms.
Exercise 18. Suppose (without loss of generality) that $E\big(\xi_n^{(k,j)}\big)^2 = 1$; then
$$EX_0^{(k)}X_n^{(l)} = 0, \qquad \text{if } k\ne l,$$
$$EX_0^{(k)}X_0^{(k)} = \sum_{j_1<j_2<\cdots<j_k}\Big(c_{j_1,\ldots,j_k}^{(k)}\Big)^2,$$
$$EX_0^{(k)}X_n^{(k)} = \sum_{j_1<j_2<\cdots<j_k}c_{j_1,\ldots,j_k}^{(k)}\,c_{n+j_1,\ldots,n+j_k}^{(k)}.$$
Hence
$$\begin{aligned}A_0(x) &= 1,\\ A_1(x) &= x - E\xi_0,\\ A_2(x) &= x^2 - 2E\xi_0\,x + 2(E\xi_0)^2 - E\xi_0^2,\\ &\ \,\vdots\\ A_k(x) &= x^k + \cdots\end{aligned}$$
Set $g = fP$; then
$$EA_k(\xi_0)P(\xi_0) = \int_{-\infty}^{\infty}A_k(x)g(x)\,dx.$$
for series which converge uniformly over compact subsets of the disk $D_\tau$. Conversely, for a sequence such that
$$c_n = Eg^{(n)}(\xi),$$
$$\frac{\partial}{\partial x_i}A_{n_1,\ldots,n_k}(x_1,\ldots,x_k) = n_i\,A_{n_1,\ldots,n_i-1,\ldots,n_k}(x_1,\ldots,x_k), \qquad 1\le i\le k,$$
$$EA_{n_1,\ldots,n_k}(\xi) = 1 \ \text{ if } n_1+\cdots+n_k = 0, \text{ and } 0 \text{ otherwise.}$$
The previous remark then implies that the main term in this equality converges as $m\uparrow\infty$, and its $L^p$-norm is thus bounded above by some $A > 0$. Now this also entails $(1-\alpha)\|X_0\|_p \le A$.
Hints.
1. From the independence of $\xi_t$ and $X_{t-1}$ and eqn. (7.1): $EX_1 = aE\xi_0$, $EX_1^2 = E(a + b\xi_0)^2EX_0^2 + E\xi_0^2$; hence $M(1 - (a^2 + b^2)) = 1$, moreover $C = EX_0X_1 = aM$.
2. The previous relations are rewritten accurately:
$$C = aM, \qquad M^2\big(1 - (a^2+b^2)\big) = M^2(1 - b^2) - C^2 = M,$$
[Figure: simulated bilinear process and its sample ACF.]
hence $Mb^2 = M^2 - C^2 - M$, thus
$$a = \frac CM, \qquad b = \sqrt{\frac{M^2 - C^2 - M}{M}}.$$
See the results in § 7.3.3 for a formal justification and the consistency of those estimators; namely, the ergodic theorem applies to prove a.s. consistency of those estimates (Corollary 9.1.3), and a $\sqrt n$-CLT also applies to get asymptotic confidence intervals.
$$X_n = h(\xi_n)X_{n-1} + \xi_n$$
$$X_n = H_nX_{n-1} + \xi_n$$
$$\begin{aligned}X_n &= \xi_n + H_nX_{n-1}\\ &= \xi_n + H_n(\xi_{n-1} + H_{n-1}X_{n-2}) = \xi_n + \xi_{n-1}H_n + X_{n-2}H_nH_{n-1}\\ &= \xi_n + \xi_{n-1}H_n + (\xi_{n-2} + H_{n-2}X_{n-3})H_nH_{n-1}\\ &= \xi_n + \xi_{n-1}H_n + \xi_{n-2}H_nH_{n-1} + H_{n-2}X_{n-3}H_nH_{n-1}\\ &= \cdots\end{aligned}$$
If $\sum_{k=0}^{\infty}\big\|\xi_{n-k}\prod_{j<k}H_{n-j}\big\|_p < \infty$, then the following series converges in $L^p$ and defines a solution of the previous recursion:
$$X_n = \sum_{k=0}^{\infty}\xi_{n-k}\prod_{j<k}H_{n-j}.$$
In the case when $H_n = (\xi_n, \xi_{n-1},\ldots,\xi_{n-r+1})$, the factors $\xi_{n-k}$ and $\prod_{j<k}H_{n-j}$ are not independent anymore, which requires additional moment conditions:
$$\Big\|\xi_{n-k}\prod_{j<k}H_{n-j}\Big\|_p = \Big\|\xi_{n-k}\prod_{k-r<j<k}H_{n-j}\Big\|_p\,\Big\|\prod_{j=0}^{k-r}H_{n-j}\Big\|_p \le \Big\|\xi_{n-k}\prod_{k-r<j<k}H_{n-j}\Big\|_p\,\|H_0\|_{pr}^{(\ell-1)r}, \quad \text{if } k = \ell r,$$
for $q, q'\in[1,+\infty]$ with $\frac1q + \frac{1}{q'} = 1$.
7.2.2 LARCH(∞)-models
Theorem 7.2.1. Consider the general non-Markov model solution of the LARCH(∞) recurrence equation:
$$X_n = \Big(b_0 + \sum_{j=1}^{\infty}b_jX_{n-j}\Big)\xi_n.$$
Under the condition
$$\|\xi_0\|_p\sum_{k=1}^{\infty}|b_k| < 1.$$
If now the variables $\xi_n$ are centered and admit a finite variance, the previous representation still holds in $L^2$ if
$$E\xi_0^2\sum_{k=1}^{\infty}b_k^2 < 1.$$
$$X_t = M(X_{t-1}, \xi_t). \tag{7.2}$$
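A minimal sketch iterating the recursion (7.2); the contraction M chosen here is an arbitrary AR(1)-type example of ours:

```python
import numpy as np

def iterate_model(M, x0, xi, burn=500):
    """Iterate X_t = M(X_{t-1}, xi_t); burn-in approximates the stationary law."""
    x, out = x0, []
    for t, e in enumerate(xi):
        x = M(x, e)
        if t >= burn:
            out.append(x)
    return np.array(out)

rng = np.random.default_rng(7)
xi = rng.normal(size=5000)
path = iterate_model(lambda x, e: 0.5 * x + e, 0.0, xi)  # a contracting example
```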
is a (measurable) kernel.
From a recursion
$$E\big\|U_0^{(n)} - U_0^{(n+1)}\big\|^p \le a^{np}\,E\|M(u_0, \zeta_0) - u_0\|^p.$$
Hence $U_0^{(n)}$ converges in $L^p$ ($n\to\infty$) to a random variable $U_0\in L^p$. Moreover, $U_0^{(n)}$ is measurable wrt the σ-algebra generated by $\{\xi_t,\ t\le0\}$, hence this is also the case for $U_0$; $U_0$ may thus be represented as a function $U_0 = H(\xi_0, \xi_{-1}, \ldots)$ of this sequence. Then the sequence $X_t = H(\xi_t, \xi_{t-1}, \xi_{t-2}, \ldots)$ is a stationary solution of the previous recursion.
Now the sequences $(U_t^{(0)})_t$ and $(U_t^{(1)})_t$ satisfy
$$U_0^{(0)} = u_0, \qquad U_1^{(0)} = M(u_0, \xi_1) = H(\xi_1, 0, 0, \ldots), \qquad U_2^{(0)} = M\big(M(u_0, \xi_1), \xi_2\big) = H(\xi_2, \xi_1, 0, 0, \ldots).$$
Analogously
$$U_t^{(1)} = H(\xi_t, \xi_{t-1}, \ldots, \xi_1, \xi_0, 0, 0, \ldots).$$
Hence $\gamma_n = E^{1/p}\big\|U_n^{(0)} - U_n^{(1)}\big\|^p \le a\gamma_{n-1}$ and
$$\gamma_n \le a^n\gamma_0 = a^n\,E^{1/p}\|M(u_0, \zeta_0) - u_0\|^p. \tag{7.5}$$
7.3.1 AR-ARCH models
Proposition 7.3.1. Let d = 1, E = R and set
and if p ≥ 1 in case
and the last rectangle term simply vanishes since $E\xi_0 = 0$; this allows us to improve the previous bound, indeed
$$(\operatorname{Lip}A)^2 + E\xi_0^2\,(\operatorname{Lip}B)^2 \le \big(\operatorname{Lip}A + \|\xi_0\|_2\operatorname{Lip}B\big)^2.$$
[Figure: simulated AR-ARCH trajectory.]
$$X_n = A(X_{n-1}) + \xi_n, \qquad X_n = B(X_{n-1})\,\xi_n.$$
note that the above recursion is just the simplest way to get such conditional distributions for Gaussian innovations.
• ARCH(2) models are solutions of the equations
$$X_t = \sigma_t\xi_t, \qquad \sigma_t^2 = \alpha^2 + \beta^2X_{t-1}^2 + \gamma^2X_{t-2}^2.$$
[Figure 7.3: simulated trajectory with volatility clustering.]
• GARCH(1,1)-type models are solutions of
$$X_t = \sigma_t\xi_t, \qquad \sigma_t^2 = \alpha^2 + \beta^2X_{t-1}^2 + \gamma^2\sigma_{t-1}^2.$$
It is clear through iteration that one may rewrite such models as
$$\sigma_t^2 = \alpha^2 + \sum_{k=1}^{\infty}\beta_k^2X_{t-k}^2,$$
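A minimal sketch simulating the GARCH(1,1)-type recursion above (parameters chosen by us so that β² + γ² < 1; initializing at the stationary variance level is a convenience):

```python
import numpy as np

def garch11(n, alpha2, beta2, gamma2, rng, burn=500):
    """X_t = sigma_t xi_t, sigma_t^2 = alpha^2 + beta^2 X_{t-1}^2 + gamma^2 sigma_{t-1}^2."""
    x = np.zeros(n + burn)
    s2 = alpha2 / (1 - beta2 - gamma2)       # stationary variance level
    for t in range(1, n + burn):
        s2 = alpha2 + beta2 * x[t - 1]**2 + gamma2 * s2
        x[t] = np.sqrt(s2) * rng.normal()
    return x[burn:]

rng = np.random.default_rng(8)
x = garch11(1000, alpha2=0.1, beta2=0.2, gamma2=0.7, rng=rng)
```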
and such models have also been designed for financial purposes, for their clustering aspects; trajectories may be seen in Figure 7.3.
$$EX_t^2 = \big(\beta + \gamma^2EX_{t-1}^2\big)\|\xi_t\|_2^2 = \beta\|\xi_t\|_2^2 + EX_{t-1}^2 \ (^2).$$
For the AR-LARCH models with centered inputs, the limit condition
$$\alpha^2 + \gamma^2E\xi_0^2 = 1$$
analogously implies that no solution of this equation has a second-order moment. Also, there exists an $L^p$ solution of this equation for p small enough when $|\xi_0|$ is not constant. This is simple to see if either α or γ $= 0$.
Proof. |δ|kξ0 kp < 1 is the contraction constant in this case. Now the
solution of the equation is the limit of a polynomial in the innovations
and thus writes as a Bernoulli shift in Lp .
[Figure: simulated trajectory and its sample ACF.]
$$Z_t = \xi_t\,\sigma_\theta(X_{t-1})$$
density writes
$$\pi_\theta(x, y) = \frac{1}{\sqrt{2\pi(\beta + \delta x)^2}}\exp\Big(-\frac12\frac{y^2}{(\beta + \delta x)^2}\Big),$$
the QMLE of $(Z_t)$ is now the couple $\theta = (\beta, \delta)$ minimizing the expression
$$L_\theta(Z_1,\ldots,Z_n) = \sum_{t=2}^{n}\left(\frac{Z_t^2}{(\beta + \delta Z_{t-1})^2} + \log(\beta + \delta Z_{t-1})^2\right).$$
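A minimal sketch of this QMLE criterion (the optimizer and starting point are arbitrary choices of ours):

```python
import numpy as np
from scipy.optimize import minimize

def qmle_loss(theta, z):
    """L_theta = sum_t z_t^2/(beta + delta z_{t-1})^2 + log (beta + delta z_{t-1})^2."""
    beta, delta = theta
    s = beta + delta * z[:-1]
    return np.sum(z[1:]**2 / s**2 + np.log(s**2))

# given data z:  theta_hat = minimize(qmle_loss, x0=[1.0, 0.1], args=(z,)).x
```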
Proofs.
$$M = EZ_0^2 = \frac{\nu\beta}{1 - \nu\delta^2}.$$
$$\ell = \nu EZ_0(\beta + \delta Z_0)^2 = \nu\big(2\beta\delta EZ_0^2 + \delta^2EZ_0^3\big) = 2\nu\beta\delta EZ_0^2 = \frac{2\nu^2\beta^2\delta}{1 - \nu\delta^2}. \tag{7.8}$$
4. From independence
All the possible cases when moments exist have thus been considered.
$$\widehat\delta = 1 - \frac{1}{\widehat m}$$
$$\frac{m^2(1 - \delta^2)}{1 - \nu\delta^2} = M.$$
Proof. The relations $\beta, \delta > 0$ imply, given its existence, that $\ell > 0$. Now eqn. (7.8) together with $\nu\beta = M(1 - \delta^2)$ entails $\ell(1 - \nu\delta^2) = 2\delta M$; thus δ is the positive solution of the second-order equation
$$\nu\ell\delta^2 + 2\delta M - \ell = 0,$$
hence $\delta = -1 + \sqrt{1 + \nu\ell}$.
$$\beta = \frac M\nu\Big(1 - \big(-1 + \sqrt{1+\nu\ell}\big)^2\Big) = \frac M\nu\Big(2\sqrt{1+\nu\ell} - (1+\nu\ell)\Big).$$
$$\beta = \frac{M}{\nu(1 - \nu\delta^2)}$$
$$P = \frac{\eta\beta(\beta^2 + 3\delta^2M)}{1 - \eta\delta^3}.$$
• $E\xi_t^{(i)}\xi_t^{(j)} = 0$ if $i\ne j$, in case $i, j \ge 1$, and
• $P\big(\xi_t^{(0)}\notin\{1, 2, \ldots, D\}\big) = 0$.
If the functions $M_1,\ldots,M_D$ are Lipschitz on $\mathbb R$, satisfying assumptions (7.3) and (7.4) with constants $a_j > 0$ for each $j = 1,\ldots,D$:
$$\big\|M_j\big(u, \xi_0^{(j)}\big) - M_j\big(v, \xi_0^{(j)}\big)\big\|_p \le a_j\|u - v\|, \quad (\forall u, v\in\mathbb R^d)$$
$$\big\|M_j\big(u, \xi_0^{(j)}\big)\big\|_p < \infty. \quad (\exists u\in\mathbb R^d)$$
We set
$$M\big(u, z^{(1)},\ldots,z^{(D)}\big) = \sum_{j=1}^{D}M_j\big(u, z^{(j)}\big)\,\mathbf 1\big(z^{(0)} = j\big),$$
Branching models
[Figure: simulated branching-type model trajectory.]
• if $D = 3$ and $\xi_0^{(1)} = 1 - \xi_0^{(2)} \sim b(p)$ is again independent of the centered random variable $\xi_0^{(3)}\in L^2$, we get random regime models if $A_3 \equiv 1$, and, if $E\big(\xi_0^{(3)}\big)^2 < \infty$, the contraction condition writes as
$$a = p\,(\operatorname{Lip}A_1)^2 + (1-p)\,(\operatorname{Lip}A_2)^2 < 1.$$
This model is defined through the recursion
$$X_n = \begin{cases}A_1(X_{n-1}) + \xi_n^{(3)}, & \text{if } \xi_n^{(1)} = 1,\\ A_2(X_{n-1}) + \xi_n^{(3)}, & \text{if } \xi_n^{(1)} = 0.\end{cases}$$
[Figure: simulated integer-valued autoregressive trajectory.]
thus
$$\|M(y, \xi_0) - M(x, \xi_0)\|_p \le a|y - x|.$$
Many other integer-valued models follow the same idea, e.g. the bilinear model
$$X_t = a\circ X_{t-1} + b\circ(X_{t-1}\zeta_t) + \zeta_t.$$
Generalized Linear Models (GLM) are derived from [Kedem and Fokianos, 2002]. Assume that $(V(u))_{u\in U}$ is a process defined on a Banach space U equipped with a norm $\|\cdot\|$, and that $f : E\times U\to E$ is a function; then
• $M((0,0), P)$ is a vector with a first random coordinate $\mathcal P(f(0,0))$ and a deterministic second coordinate $f(0,0)$: it thus admits moments, e.g. of order 1.
•
thus
then the relations $(1+\varepsilon)a < 1$ and $(1+\varepsilon)b < \varepsilon$ together imply relation (7.3).
Then, some cases may be considered:
• stability holds if $\operatorname{Lip}f < \frac12$ (set $\varepsilon = 1$);
• if $f(x,\ell) \equiv g(x)$ only depends on x (analogously to ARCH cases), the stability condition holds if $\operatorname{Lip}g < 1$ ($\varepsilon = 0$);
• if $f(x,\ell) \equiv g(\ell)$ only depends on ℓ (analogously to the MA case), the stability condition holds if $\operatorname{Lip}g < 1$ (with a large ε).
Exercise 21. In equation (7.9) consider the function $f(x,\ell) = a + bx + c\ell$. Assume that the coefficients are such that a stationary solution $(X_t, \lambda_t)_t$ of the equation exists in $L^2$. Then $EX_0 = E\lambda_0 = \frac{a}{1-(b+c)}$: setting $\mu = EX_0$, indeed $\mu = a + (b+c)\mu$. As before, such considerations are useful to fit the model.
with
$$\zeta_n^p = E\big\|H\big(\widehat\xi(n)\big) - H\big(\widehat\xi(n-1)\big)\big\|^p. \tag{7.13}$$
$$\Phi : L^p(\nu^+)\to L^p(\nu^+), \qquad H\mapsto K,$$
with
$$K(v_0, v_1, \ldots) = M\big(H(v_1, v_2, \ldots), v_0\big).$$
Conditions (7.4) and (7.3) allow us to prove that $K\in L^p(\nu^+)$ if $H\in L^p(\nu^+)$ (for this, condition wrt $\xi_0$ and use the triangle inequality). Consider the fixed point e as an element of $L^p$:
$$\|K\|_p = E^{1/p}|M(H(\xi_1,\ldots), \xi_0)|^p \le E^{1/p}|M(e, \xi_0)|^p + a\|H - e\|_p.$$
$$\|K - K'\|_p \le a\|H - H'\|_p.$$
7.4.2 Coupling
This section makes explicit ways to couple such Bernoulli shifts. Decorrelation rates are also deduced. This will allow us to derive quantitative laws of large numbers for expressions of statistical interest. Those ideas are widely developed later in Chapter 9, in order to understand how to derive limit theorems in distribution.
Definition 7.4.3. Assume that a Bernoulli shift satisfies $\lim_n\delta_n^{(p)} = 0$ with the above definition (7.14); then it will be called $L^p$-dependent.
Remark 7.4.2. Replacing $\widetilde\xi(n)$ by
$$\widehat\xi(n)_k = \begin{cases}\xi_k, & \text{if } |k|\ne n,\\ \xi_k', & \text{if } |k| = n,\end{cases}$$
leads to the fruitful physical measure of dependence of Wei Biao Wu. The two previous proposals are couplings in the sense that they leave unchanged the marginal distribution of the Bernoulli shift. Another alternative is to set
$$\xi'(n)_k = \begin{cases}\xi_k, & \text{if } |k|\le n,\\ 0, & \text{if } |k| > n.\end{cases}$$
Now the factors 4 and 2 arise from the fact that covariances are expectations of a product minus a product of expectations: the same bounds are provided for both terms.
$$\delta_{k,Y} \le L\,\delta_{k,X}.$$
Proof. Set
$$g_{x,\varepsilon}(u) = \begin{cases}1, & \text{if } u\le x-\varepsilon,\\ 0, & \text{for } u\ge x,\\ \text{affine}, & \text{else.}\end{cases}$$
Moreover
$$\big|\delta_{k,Y_{x,\varepsilon}}^2 - \delta_{k,Y_x}^2\big| \le 2P\big(X_0\in[x-\varepsilon, x]\big) \le 2C\varepsilon^c.$$
$^3$ Those are the simplest discontinuous functions: they are classes of functions with only one singularity. More general functions with finitely many discontinuities may be considered analogously.
So
$$\delta_{k,Y_{x,\varepsilon}}^2 \le \delta_{k,X}^2/\varepsilon^2 + 2C\varepsilon^c.$$
We then conclude with $\varepsilon^{c+2} = \delta_{k,X}^2/(2C)$.
Remark 7.4.5. Note that higher-order moment inequalities are derived from analogous ideas, as in Chapter 12.
Chapter 8
Associated processes
8.1 Association
Definition 8.1.1. A random vector $X\in\mathbb R^p$ is associated if, for all measurable functions $f, g : \mathbb R^p\to\mathbb R$ with $E|f(X)|^2 < \infty$ and $E|g(X)|^2 < \infty$ such that f, g are coordinatewise non-decreasing,
$$\operatorname{Cov}\big(f(X), g(X)\big) \ge 0.$$
Definition 8.1.2. A random process $(X_t)_{t\in T}$ is associated if the vector $(X_t)_{t\in F}$ is associated for each finite $F\subset T$.
$$f_1(x) = a_1x_1 + \cdots + a_px_p$$
$$-a_1(y_1 - x_1) - \cdots - a_p(y_p - x_p) \le f(y) - f(x) \le a_1(y_1 - x_1) + \cdots + a_p(y_p - x_p),$$
if those functions are such that $f(X), g(X), f_1(X), g_1(X)\in L^2$ and $f \preceq f_1$, $g \preceq g_1$.
Proof. The 4 covariances
$$\operatorname{Cov}\big(f(X) + af_1(X),\ g(X) + bg_1(X)\big)$$
then:
$$\big|\operatorname{Cov}\big(f(Y), g(Z)\big)\big| \le \sum_{i=1}^{u}\sum_{j=1}^{v}a_ib_j\operatorname{Cov}(Y_i, Z_j) \tag{8.1}$$
Remark 8.3.1. We thus derive that for each associated random vector in $L^2$:
• Independence. If the vectors Y, Z admit pairwise orthogonal components, then they are stochastically independent, as in the Gaussian case.
• Quasi-independence.
$$\big|\operatorname{Cov}\big(f(Y), g(Z)\big)\big| \le \operatorname{Lip}f\cdot\operatorname{Lip}g\sum_{i=1}^{u}\sum_{j=1}^{v}\operatorname{Cov}(Y_i, Z_j) \le uv\,\operatorname{Lip}f\cdot\operatorname{Lip}g\,\max_{1\le i\le u}\max_{1\le j\le v}\operatorname{Cov}(Y_i, Z_j)$$
holds for the stationary and associated process $(X_n)_{n\in\mathbb Z}$; then
$$\frac{1}{\sqrt n}\sum_{k=1}^{[nt]}X_k \to \sigma W_t \quad \text{in the Skorokhod space } D[0,1].$$
Note that the condition precisely extends that obtained in the independent identically distributed case, since it reduces to $EX_0^2 < \infty$ in that case. It cannot be improved, which also makes such conditions attractive.
Part III
Dependences
A first chapter in this part begins with the (central) ergodic theorem, which asserts that the strong law of large numbers (SLLN) holds for the partial-sum process of most of the previously introduced models. Assume that the unknown parameter of a stationary sequence $(X_n)_{n\in\mathbb Z}$ is $\theta = EX_0$; this writes as:
$$\overline X_n = \frac1n(X_1 + \cdots + X_n) \to_{n\to\infty} \theta, \quad \text{a.s.}$$
The question of convergence rates in this result is solved for the forthcoming dependence types of stationary sequences. Two additional chapters detail, as much as possible, more precise asymptotic results useful for statistical applications. According to whether the sequences are LRD or SRD, very different asymptotic behaviors will be seen to occur, including the corresponding rates:
$$n^{\alpha}\big(\overline X_n - \theta\big) \to_{n\to\infty} Z, \quad \text{in distribution,}$$
with $\alpha = \frac12$, or $\alpha < \frac12$, according to whether SRD or LRD holds. Asymptotic confidence bounds may thus be derived: namely, set $\tau > 0$ a confidence level; then, if there exists $t > 0$ with $P(Z\le t) = 1 - \tau$:
This also yields goodness-of-fit tests for this mean parameter θ.
Chapter 9
Dependence
satisfies
$$T\Big(\prod_{k\in\mathbb Z}A_k\Big) = \prod_{k\in\mathbb Z}A_{k+1}.$$
Proof of Proposition 9.1.1. Let $\overline C$ denote the closure (in $L^2(\Omega, \mathcal A, P)$) of the convex hull C of
$$E = \{f\circ T^k\,/\,k\in\mathbb Z\},\ (^1).$$
From the orthogonal projection theorem (see e.g. théorème 3.81, page 124 in [Doukhan and Sifre, 2001]) there exists a unique $\overline f\in\overline C$ with
$$\|\overline f\|_2 = \inf\{\|g\|_2\,/\,g\in\overline C\}.$$
If one proves
$$\|R_n(f)\|_2 \to_{n\to\infty} \|\overline f\|_2$$
$^1$ $C = \Big\{\sum_{i=1}^{I}a_ix_i;\ a_i\ge0,\ x_i\in E,\ \sum_{i=1}^{I}a_i = 1,\ I\ge1\Big\}$
In order to prove
$$\|R_n(f)\|_2 \to_{n\to\infty} \|\overline f\|_2,$$
consider a convex combination
$$g = \sum_{|j|\le k}a_j\,f\circ T^j\in C, \qquad \text{with } \|g\|_2 \le \|\overline f\|_2 + \varepsilon.$$
Hence
$$\|\overline f\|_2 \le \limsup_n\|R_n(f)\|_2 \le \|\overline f\|_2 + \varepsilon,$$
$$R_n(f) \xrightarrow{\ L^1\ }_{n\to\infty} E_{\mathcal I}f.$$
The ergodic theorem (the aim of this section) is also based on the next inequality:
Then:
$$E\big(f\circ T\cdot\mathbf 1_{S_n^+(f)>0}\big) \ge 0.$$
$$S_k(f) \le f\circ T + S_n^+(f)\circ T.$$
$$S_n^+(f) = \max_{1\le k\le n}S_k(f).$$
Thus
This entails
Now
Hence
$$\frac{Ef\vee0}{c} \ge \frac{E\,f\circ T\,\mathbf 1_{S_n^+(f-c)>0}}{c} \ge P\big(S_n^+(f-c) > 0\big).$$
Thus
Hence
$$\frac{Ef\vee0}{c} \ge P\Big(\max_{1\le k\le n}\big(R_k(f) - c\big) > 0\Big).$$
The result follows from summing the previous inequalities and letting $n\to\infty$. Indeed $|f| = f\vee0 - f\wedge0$ and $P(R - c > 0) + P(R + c < 0) = P(|R| > c)$ for each random variable R.
$$R_n(f) \to_{n\to\infty} E_{\mathcal I}f, \quad \text{a.s.}$$
Hence
$$\big|R_n(g - R_m(g))\big| \le \frac{\|g\|_\infty}{nm}\sum_{j=1}^{m}2j = \frac{(m+1)\|g\|_\infty}{n}.$$
Thus
Hence
1) The Markov inequality implies $E_{\mathcal I}(g_m - f)\xrightarrow{P}_{m\to\infty}0$. Indeed
$$E\big|E_{\mathcal I}(g_m - f)\big| \le \|g_m - f\|_1.$$
2) Let $A_m = \sup_{n\ge1}|R_n(f - g_m)|$; then from Lemma 9.1.1:
$$P(A_m > c) \le \frac1c\|f - g_m\|_1.$$
The previous relations 1) and 2) imply
$$P\Big(\limsup_n\big|R_n(f) - E_{\mathcal I}f\big| > c\Big) = 0.$$
The result still holds for each bounded function by a density argument. Now Corollary 9.1.3 entails $E_{\mathcal J}f(X) = Ef(X)$, and ergodicity follows.
Forthcoming examples follow this scheme:
• A Gaussian stationary sequence is ergodic if its covariance $r_n\to0$. Note that this condition seems necessary, since e.g. the constant sequence $X_n = \xi_0\sim\mathcal N(0,1)$ is not ergodic.
Assume $X_0\sim\mathcal N(0,1)$. If the Hermite expansion of f writes
$$f = \sum_{k=0}^{\infty}c_kH_k,$$
then
$$\operatorname{Cov}\big(f(X_0), f(X_n)\big) = \sum_{k=1}^{\infty}\frac{c_k^2}{k!}\,r_n^k = G(r_n).$$
9.2 Range
We aim at providing some ideas yielding definitions for the range of a process. Namely, we advocate defining it according to a possible limit theorem. As claimed at the beginning of the present Part III, a definition through a limit theorem in distribution allows us to define an asymptotic confidence interval for testing a mean through the simplest frequentist empirical mean. After Theorem 9.1.1 this is
Chapter 10
Long-range dependence phenomena were first exhibited by Hurst for hydrology purposes. The phenomenon occurs from the superposition of independent sources; namely, confluent rivers provide this behavior. Such aggregation procedures produce this new phenomenon. In the present chapter we address the Gaussian and linear cases, as well as the case of functions of such processes, where such LRD phenomena occur. Due to the technical difficulties, we restrict ourselves to the initial example of Rosenblatt for functions of Gaussian processes. Finally, we also describe a few additional extensions.
The most elementary example is that of Gaussian processes. We follow the presentation of [Rosenblatt, 1985], who discovered long-range dependent behaviors; he considered models of instantaneous functions of a Gaussian process.
$$r_k \sim ck^{-\beta} \quad \text{as } k\to\infty,$$
and
$$Z_n(t) \sim \mathcal N\Big(0, \frac{\operatorname{Var}S_{[nt]}}{\operatorname{Var}S_n}\Big).$$
• Hence, if $\beta > 1$, $\operatorname{Var}S_n \sim n\sigma^2$, the sequence is SRD, and
$$\frac1n\operatorname{Var}S_{[nt]} \to t\sigma^2.$$
Now $Z_n$ converges to a Brownian motion with variance
$$\sigma^2 = \sum_{k=-\infty}^{\infty}r_k.$$
•
$$\operatorname{Var}\Big(\sum_{k=1}^{n}g(X_k)\Big) = O(n),$$
$$\frac{1}{\sqrt n}\sum_{k=1}^{[nt]}g(X_k) \to_{n\to\infty} \sigma W_t, \quad \text{in } D[0,1].$$
$$\operatorname{Var}\Big(\sum_{k=1}^{n}g(X_k)\Big) = O\big(n^{2-m(g)\beta}\big).$$
Here $1 - \frac{m(g)\beta}{2} > \frac12$, and the convergence still holds:
$$\frac{1}{n^{1-m(g)\beta/2}}\sum_{k=1}^{n}g(X_k) \to_{n\to\infty} Z_r,$$
The simple observation that $\operatorname{trace}(R_n) = n$ follows from the fact that $R_n$'s diagonal elements equal 1; we thus deduce that
$$e^{-2tn^{\beta}} = \exp\big(-2tn^{\beta-1}\operatorname{trace}R_n\big) = \exp\Big(-\sum_{i=1}^{n}(2tn^{\beta-1})\lambda_{i,n}\Big).$$
Thus:
$$Ee^{tU_n} = \exp\Big(-\frac12\sum_{i=1}^{n}\big(\log(1 - 2tn^{\beta-1}\lambda_{i,n}) + 2tn^{\beta-1}\lambda_{i,n}\big)\Big) = \exp\Big(\frac12\sum_{k=2}^{\infty}\frac1k\big(2tn^{\beta-1}\big)^k\operatorname{trace}R_n^k\Big).$$
thus $2|t|\,n^{\beta-1}\lambda_{i,n} < 1$ if $|t| < c$ for some constant $c > 0$. Thus all the necessary convergences hold in order to justify the above calculation if we assume $|t|\le c/2$. Hence, if t is small enough:
$$Ee^{tU_n} \to_{n\to\infty} \exp\Big(\frac12\sum_{k=2}^{\infty}\frac{c_k}{k}(2t)^k\Big)$$
and thus the method of moments does not apply to prove convergence in law (see Theorem 12.1.1). [Dobrushin and Major, 1979] introduced weaker convergences for sequences of multiple Ito integrals in order to derive "non-central limit theorems".
Proof. Set
$$S_n(x) = \frac{n^{\alpha}}{\sqrt{nL(n)}}\sum_{t=1}^{n}\big(\mathbf 1_{\{X_t\le x\}} - F(x) + f(x)X_t\big).$$
Then
$$\operatorname{Var}\big(S_n(y) - S_n(x)\big) \le \frac{n}{n^{1+\gamma}}\,\mu([x,y]),$$
where μ is a finite measure on $\mathbb R$. Then a chaining argument is used.
Remark 10.5.1. [Ho and Hsing, 1996] extend the expansion of the reduction principle:
$$S_{n,p}(x) = \frac{n^{p\alpha/2}}{\sqrt{nL(n)}}\sum_{t=1}^{n}\Big(\mathbf 1_{\{X_t\le x\}} - F(x) - \sum_{r=0}^{p}(-1)^rF^{(r)}(x)\,Y_{n,r}\Big),$$
$$Y_{n,r} = \sum_{t=1}^{n}\sum_{1\le j_1<\ldots<j_r}\prod_{s=1}^{r}b_{j_s}\xi_{t-j_s}.$$
where $(\zeta_u)_{u\in\mathbb Z^d}$ is an i.i.d. random field with zero mean and variance 1, $b_u = B_0(u/|u|)\,|u|^{-(d+\alpha)/2}$ for $u\in\mathbb Z^d$, $0 < \alpha < d$, and $B_0$ is a continuous function on the sphere.
Let $A_n = [1,n]^d\cap\mathbb Z^d$ and $F_n(x) = \frac{1}{n^d}\sum_{t\in A_n}\mathbf 1_{\{X_t\le x\}}$. Such models also admit LRD behaviors if the iid sequence $(\xi_t)$ is centered and
$$E\xi_0^2\sum_{j=1}^{\infty}b_j^2 < 1,$$
but
$$\sum_{j=1}^{\infty}|b_j| = \infty.$$
$$a_j(t) = \frac{d_{t-1}}{1}\cdot\frac{d_{t-2}+1}{2}\cdot\frac{d_{t-3}+2}{3}\cdots\frac{d_{t-j}+j-1}{j},$$
$$b_j(t) = \frac{d_{t-1}}{1}\cdot\frac{d_{t-j}+1}{2}\cdot\frac{d_{t-j+1}+2}{3}\cdots\frac{d_{t-2}+j-1}{j}.$$
$$n^{-d-1/2}S_n(t) \xrightarrow{\ D[0,1]\ }_{n\to\infty} c_d\,h_\infty'(0)\,B_d(t),$$
$$n^{-2d}S_n^{(2)}(t) \xrightarrow{\ D[0,1]\ }_{n\to\infty} c_d\,h_\infty''(0)\,Z_d(t).$$
Chapter 11
The aim of this chapter is only to fix some simple ideas. We investigate here conditions on time series such that the standard limit theorems obtained for independent identically distributed sequences still hold.
After a general introduction to weak dependence conditions, an example shows that the most classical strong-mixing condition from [Rosenblatt, 1956] may fail to hold, see [Andrews, 1984].
Now, when dealing with any weak dependence condition (including strong mixing), additional decay rates and moment conditions are necessary to ensure CLTs; thus decay rates need to be known. Coupling arguments, as proposed in § 7.4.2, are widely used for this.
Finally, to make clearer the need for decay rates, we explain how CLTs may be proved under such assumptions.
The monograph [Dedecker et al., 2007] is used as the reference for weak dependence, and formal results should be sought there.
i1 ≤ · · · ≤ iu ≤ j1 − r ≤ j1 ≤ · · · ≤ jv .
Hence, if we assume that the initial values of the model are in the unit interval, the remainder term is $\le 2^{-1-p}\to_{p\to\infty}0$.
[Figure: sample ACF of 'model1'.]
This does not matter much, since such an event has zero probability. The marginals of this process are easily proved to be uniformly distributed on [0,1]: for example, choosing an interval with dyadic endpoints makes it evident, and such intervals generate the Borel sigma-field of [0,1]. Now the previous condition writes in terms of the sigma-algebras generated by the processes, and $X_{t-1}$ is the fractional part of $2X_t$, which implies the inclusion of the sigma-algebras generated by marginals of such processes. More precisely,
$$X_0 = 0.\xi_0\xi_{-1}\xi_{-2}\ldots,$$
and
$$X_r = 0.\xi_r\xi_{r-1}\xi_{r-2}\cdots\xi_0\xi_{-1}\xi_{-2}\ldots$$
(binary expansions); thus the event $A = (X_0\le\frac12)$ writes as $\xi_0 = 0$, and thus $P(A) = \frac12$; now there exists a measurable function (namely the r-th iterate of $x\mapsto\operatorname{frac}(2x)$) such that $X_0 = f_r(X_r)$, and thus $A = f_r^{-1}([0,\frac12])\in\sigma(X_r)$.
If $r = 1$ then
$$A = \Big(X_1\in\Big[0, \frac14\Big]\cup\Big[\frac12, \frac34\Big]\Big),$$
and if $r = 2$ we also easily check that
$$A = \Big(X_2\in\Big[0, \frac18\Big]\cup\Big[\frac14, \frac38\Big]\cup\Big[\frac12, \frac58\Big]\cup\Big[\frac34, \frac78\Big]\Big).$$
More generally, $A = B_r$ with $B_r = (X_r\in I_r)$, where $I_r$ is the union of $2^r$ intervals with dyadic endpoints, all of length $2^{-r-1}$. Thus
$$\alpha_r \ge \sup_{A\in\sigma(X_0),\,B\in\sigma(X_r)}\big|P(A\cap B) - P(A)P(B)\big| \ge P(A\cap B_r) - P(A)P(B_r) = \frac14.$$
Such examples prove that strong-mixing-type notions are not enough to cover a wide class of statistical models.
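A minimal simulation sketch of this example; the recursion $X_t = (X_{t-1} + \xi_t)/2$ with Bernoulli(1/2) digits $\xi_t$ is the standard form of Andrews' counter-example (the sample size and seed are illustrative choices of ours):

```python
import numpy as np

rng = np.random.default_rng(9)
n = 10_000
xi = rng.integers(0, 2, size=n)      # Bernoulli(1/2) binary digits
x = np.zeros(n)
for t in range(1, n):
    x[t] = 0.5 * (x[t - 1] + xi[t])  # so X_{t-1} = frac(2 X_t)

# marginals are uniform on [0,1]; yet the past is a deterministic
# function of the present, so strong mixing fails
print(np.mean(x <= 0.5))             # about 1/2
```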
$$G_k = g_k(X_1, \ldots, X_k) \to_{k\to\infty} \Gamma.$$
$$X_n = aX_{n-1} + \xi_n, \tag{11.3}$$
then it is simple to check that $X_s^{(p)}$ and $X_t^{(p)}$ are independent if $|t - s| > 2p$; in case $r > 2p$ this also implies that $f'$ and $g'$ are independent, where $f' = f\big(X_{i_1}^{(p)},\ldots,X_{i_u}^{(p)}\big)$ and $g' = g\big(X_{j_1}^{(p)},\ldots,X_{j_v}^{(p)}\big)$, and thus
• $|\operatorname{Cov}(f, g)| \le v\,\operatorname{Lip}g\cdot\varepsilon_r$, where respectively $\varepsilon_r = 2E|\xi_0|\sum_{i>r}|a_i|$ for the causal case, $a_i = 0$ if $i < 0$.
the sequence $(\xi_i)_{|i|\le r}$ is obtained by setting 0 for indices with $|i| > r$. Then $\varepsilon_r\downarrow0$ as $r\uparrow\infty$ ($^1$), and the forthcoming conditions $\psi_\theta$ or $\psi_\eta$ apply according to whether the Bernoulli shift is causal or not. Here, respectively,
Under the condition $B = \|\xi_0\|_p\sum_{l=1}^{\infty}|b_l| < 1$, it is simple to derive, by independence of all those products, that $\|S_n^{(k)}\|_p \le |b_0|a^k$. Now set $S_n^{(k,L)}$ for the finite sum where each of the indices satisfies $1\le l_1,\ldots,l_k\le L$; then, analogously,
$$\|S_n^{(k)} - S_n^{(k,L)}\|_p \le k|b_0|a^{k-1}B_L, \qquad \text{with } B_L = \|\xi_0\|_p\sum_{l>L}|b_l|,$$
where the factor k comes from the fact that, in order that only the tail of a series appears, it may appear at any position in those multiple sums. Turning now to $p = 1$, we approximate $X_n$ by the following (LK)-dependent sequence.
E.g., if $b_l = 0$ for $l > L$ large enough, then $\theta_r \le Ca^{r/L}$; if $B_L$ decays geometrically to 0, then $\theta_r \le Ce^{-c\sqrt r}$; and if $B_L \le cL^{-\beta}$, then $\theta_r \le c'r^{-\beta}$.
• Either Gaussian processes or associated random processes (in
L2 ) are κ−weakly dependent because of Lemma 8.1. Here
θr ≤ αr .
Set
$$T(k) = \sum_{j=1}^{k}\operatorname{Cov}\big(e^{i\langle t, X_1+\cdots+X_{j-1}\rangle},\ e^{i\langle t, X_j\rangle}\big).$$
Then
$$\Delta_k \le T(k) + 6k\|t\|^{2+\delta}A_k.$$
Proof. Going ahead with the proof of Lemma 2.1.1, we only reconsider the bound of $E\delta_j$; for this, let a random variable $U_j^*$ be independent of all the other random variables already considered, with the same distribution as $U_j$. Then we decompose:
$$\delta_j = \big(g(Z_j + U_j) - g(Z_j + U_j^*)\big) + \big(g(Z_j + U_j^*) - g(Z_j + V_j)\big).$$
The second term admits the bound provided in Lemma 2.1.1, which writes as stated above since for $f(x) = e^{i\langle t, x\rangle}$ one easily derives that $\|f^{(p)}\|_\infty = |t|^p$. Now the first term is the "dependent" one, and from the independence of the V's and the multiplicative property of the exponential:
$$\big|E\big(g(Z_j + U_j) - g(Z_j + U_j^*)\big)\big| \le \big|Eg(U_1+\cdots+U_{j-1})\big(g(U_j) - g(U_j^*)\big)\big| = \big|\operatorname{Cov}\big(g(U_1+\cdots+U_{j-1}), g(U_j)\big)\big|.$$
$$q = q(n) \ll p = p(n) \ll n, \quad \text{as } n\uparrow\infty.$$
The variables U are almost independent since they are all distant by at least q, so that Lemma 11.5.1 may be applied.
Theorem 11.5.1 (Dedecker & Rio, 1998). Let $(X_n)_{n\in\mathbb Z}$ be an ergodic stationary sequence with $EX_n = 0$, $EX_n^2 = 1$. Set $S_n = X_1 + \cdots + X_n$. Assume that the random series $\sum_{n=0}^{\infty}X_0\,E\big(X_n\mid\sigma(X_k\,/\,k\le0)\big)$ converges in $L^1$. Then the sequence $E(X_0^2 + 2X_0S_n)$ converges to some $\sigma^2$, and
$$\frac{1}{\sqrt n}S_{[nt]} \to_{n\to\infty} \sigma W_t, \quad \text{in distribution in } D[0,1]\ (^2).$$
1. The first point follows from the Strong Law of Large Numbers.
2. Ergodic theorem (Corollary 9.1.3) does not hold since the limit
is non-deterministic, which proves non-ergodicity.
Chapter 12
then
$$U_n \to_{n\to\infty} U, \quad \text{in law,}$$
if moreover U admits an analytic Laplace transform around 0; this holds if there exists $\alpha > 0$ with $Ee^{\alpha|U|} < \infty$.
12.1.1 Notations
Let $Y = (Y_1, \ldots, Y_k)\in\mathbb R^k$ be a random vector; we set
$$\phi_Y(t) = Ee^{it\cdot Y} = E\exp\Big(i\sum_{j=1}^{k}t_jY_j\Big), \qquad m_p(Y) = EY_1^{p_1}\cdots Y_k^{p_k},$$
where
$$p = (p_1, \ldots, p_k), \quad t = (t_1, \ldots, t_k)\in\mathbb R^k, \qquad |p| = p_1+\cdots+p_k = r, \quad E\big(|Y_1|^r+\cdots+|Y_k|^r\big) < \infty.$$
Denote
$$p! = p_1!\cdots p_k!, \qquad t^p = t_1^{p_1}\cdots t_k^{p_k},$$
The previous sums are taken over all the partitions $\mu_1, \ldots, \mu_u$ of the set $\{1, \ldots, k\}$. The Taylor expansion of $s\mapsto\log(1+s)$ as $t\to0$ successively yields
$$\phi_Y(t) = 1 + \sum_{0<|p|\le r}\frac{i^{|p|}}{p!}m_p(Y)\,t^p + o(|t|^r),$$
$$\log\phi_Y(t) = \sum_{u=1}^{r}\frac{(-1)^{u-1}}{u}\Big(\sum_{0<|p|\le r}\frac{i^{|p|}}{p!}m_p(Y)\,t^p\Big)^u + o(|t|^r) = \sum_{u=1}^{r}\frac{(-1)^{u-1}}{u}\sum_{\substack{0<|p|\le r\\ p_1+\cdots+p_u=p}}\frac{(it)^{|p|}}{p!}\prod_{j=1}^{u}m_{p_j}(Y) + o(|t|^r).$$
tition satisfy
$$N(k, u) = \sum_{\mu_1,\ldots,\mu_u}N_u(\mu_1,\ldots,\mu_u) = \sum_{j=1}^{u-1}C_k^j\,(u-j)^{k-1}, \qquad \text{and}\qquad \sum_{u=1}^{k}N(k, u) = (k-1)!.$$
$$|\kappa(Y_1,\ldots,Y_k)| \le M_k, \tag{12.3}$$
$$M_kM_l \le M_{k+l}, \qquad \text{for } k, l\ge2. \tag{12.4}$$
Hence:
$$\prod_{i=1}^{u}\kappa\big(Y_1,\ldots,Y_{p_u}\big) \le M_{p_1+\cdots+p_u}. \tag{12.5}$$
Proof of Lemma 12.1.1. The second point follows from the inequality $a!\,b! \le (a+b)!$, also written $C_{a+b}^a \ge 1$; the first comes from Lemma 12.1.2 and the forthcoming Lemma.
Lemma 12.1.2. For each $j, p\ge1$ and for all real-valued random variables,
$$\|c(\xi_j, \xi_{j-1}, \ldots, \xi_1)\|_p \le 2^j\max_{1\le i\le j}\|\xi_i\|_{jp}^j \quad (\text{with } \|\xi\|_q = E^{1/q}|\xi|^q).$$
Thus, using the recursion assumption for the pair $(q, j-1)$ where $q = pj/(j-1)$, the Minkowski and Hölder inequalities give
$$|\kappa(Y)| \le \sum_{u=1}^{k}\sum_{\mu_1,\ldots,\mu_u}N_u(\mu_1,\ldots,\mu_u)\prod_{i=1}^{u}2^{\#\mu_i-1}\|Y_0\|_{\#\mu_i}^{\#\mu_i} \le \sum_{u=1}^{k}2^{k-u}N(k, u)\,\|Y_0\|_k^k \le 2^{k-1}\|Y_0\|_k^k\sum_{u=1}^{k}N(k, u) = 2^{k-1}(k-1)!\,\|Y_0\|_k^k.$$
Define

\kappa_q(t_2, \dots, t_q) = \kappa_{(1,\dots,1)}(X_0, X_{t_2}, \dots, X_{t_q}).

The forthcoming decomposition explains the way cumulants behave as covariances. Precisely it proves that cumulants κ_Q(X_{k_1}, …, X_{k_Q}) are small if for some index l the lag k_{l+1} − k_l is large. Here k_1 ≤ ··· ≤ k_Q and a weak dependence condition will be assumed. This is also a natural extension of an important property of cumulants: a cumulant vanishes if it is derived from a couple of independent vectors.
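This vanishing property is easy to check numerically from the combinatorial formula κ(Y_1,…,Y_k) = Σ_π (−1)^{u−1}(u−1)! Π_{B∈π} E Π_{i∈B} Y_i, the sum running over all partitions π = {μ_1,…,μ_u} of {1,…,k}. The following sketch (illustrative; the helper names are ours) estimates a joint cumulant from simulated data:

import math
import numpy as np

def set_partitions(items):
    """Yield all partitions of the list `items` into nonempty blocks."""
    if len(items) == 1:
        yield [items]
        return
    first, rest = items[0], items[1:]
    for p in set_partitions(rest):
        for i in range(len(p)):
            yield p[:i] + [[first] + p[i]] + p[i + 1:]
        yield [[first]] + p

def joint_cumulant(rows):
    """kappa(Y_1,...,Y_k): sum over partitions of (-1)^(u-1) (u-1)! times mixed moments."""
    rows = np.asarray(rows)
    k = rows.shape[0]
    out = 0.0
    for p in set_partitions(list(range(k))):
        u = len(p)
        prod = 1.0
        for block in p:
            prod *= np.prod(rows[block], axis=0).mean()
        out += (-1) ** (u - 1) * math.factorial(u - 1) * prod
    return out

rng = np.random.default_rng(2)
X = rng.standard_normal(100_000)
Z = rng.standard_normal(100_000)      # independent of X
print(joint_cumulant([X, X, Z]))      # ~ 0: the vector splits into independent parts
print(joint_cumulant([X, X, X, X]))   # ~ 0: cumulants of order > 2 of a Gaussian vanish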
sequence (t_1, …, t_p). Define the alternative dependence coefficient:

\kappa_p(r) = \max_{\substack{t_1 \le \cdots \le t_p \\ r(t_1,\dots,t_p) \ge r}} \big| \kappa_p(X_{t_1}, \dots, X_{t_p}) \big|. \quad (12.8)

\kappa_{X,Q}(r) \le c_{X,Q}(r) + \sum_{s=2}^{Q-2} \Big(\frac{Q}{2}\Big)^{Q-s+1} M_{Q-s}\, \kappa_{X,s}(r).

\kappa(X_{k_1}, \dots, X_{k_Q}) = \mathrm{Cov}(X_{\eta(k)}, X_{\bar\eta(k)}) - \sum_{u} \sum_{\{\mu\}} \kappa_{\nu_\mu(k)}\, K_{\mu,k}, \quad (12.9)

with K_{\mu,k} = \prod_{\mu_i \ne \nu_u} \kappa_{\mu_i(k)}, where the previous sum extends to all partitions μ = {μ_1, …, μ_u} of {1, …, Q} such that, for some i ∈ [1, u], μ_i ∩ ν ≠ ∅ and μ_i ∩ \bar\nu ≠ ∅.

From r_{\nu_\mu}(k) ≥ r(k) it is easy to derive |\kappa_{\nu_\mu(k)}| \le \kappa_{X,\#\nu_\mu}(r). This allows to let the size of lags increase. With Lemma 12.1.1 we arrive at |M_\mu| \le M_{Q-\#\nu_\mu} as in (12.5) and
\big|\kappa(X_{k_1}, \dots, X_{k_Q})\big| \le C_{X,Q}(r) + \sum_{u=2}^{[Q/2]} \sum_{\mu_1,\dots,\mu_u} (u-1)!\, M_{Q-\#\nu_\mu}\, |\kappa_{\nu_\mu(k)}(X)|

\le C_{X,Q}(r) + \sum_{u=2}^{[Q/2]} \sum_{s=2}^{Q-2} (u-1)!\, M_{Q-s}\, \kappa_{X,s}(r) \sum_{\substack{\mu_1,\dots,\mu_u \\ \#\nu_\mu = s}} 1

\le C_{X,Q}(r) + \sum_{u=2}^{[Q/2]} \sum_{s=2}^{Q-2} (u-1)!\, (u-1)^{Q-s}\, M_{Q-s}\, \kappa_{X,s}(r)

\le C_{X,Q}(r) + \sum_{s=2}^{Q-2} \frac{1}{Q-s+1} \Big(\frac{Q}{2}\Big)^{Q-s+1} M_{Q-s}\, \kappa_{X,s}(r).

The last step uses the inequality

\sum_{u=1}^{U} (u-1)^p \le \frac{U^{p+1}}{p+1}.

Writing the resulting bound as

\kappa_{X,Q}(r) \le c_{X,Q}(r) + \sum_{s=2}^{Q-2} B_{Q,s}\, \kappa_{X,s}(r),

then
\kappa_{X,6}(r) \le c_{X,6}(r) + B_{6,4}\,\kappa_{X,4}(r) + B_{6,3}\,\kappa_{X,3}(r) + B_{6,2}\,\kappa_{X,2}(r)

\le c_{X,6}(r) + B_{6,4}\big(c_{X,4}(r) + B_{4,2}\, c_{X,2}(r)\big) + B_{6,3}\, c_{X,3}(r) + B_{6,2}\, c_{X,2}(r)

\le c_{X,6}(r) + B_{6,4}\, c_{X,4}(r) + B_{6,3}\, c_{X,3}(r) + (B_{6,2} + B_{6,4} B_{4,2})\, c_{X,2}(r).
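The mechanical unrolling of the recursion κ_{X,Q}(r) ≤ c_{X,Q}(r) + Σ_s B_{Q,s} κ_{X,s}(r) into a bound involving only the c_{X,s}(r)'s can be automated. A sketch, with placeholder numerical values for the constants B_{Q,s} (which are not those of the text):

def c_coefficients(Q, B, memo=None):
    """Coefficients gamma_s such that kappa_{X,Q}(r) <= sum_s gamma_s c_{X,s}(r),
    obtained by unrolling kappa_{X,Q} <= c_{X,Q} + sum_{s=2}^{Q-2} B[(Q,s)] kappa_{X,s}."""
    if memo is None:
        memo = {}
    if Q not in memo:
        coef = {Q: 1.0}
        for s in range(2, Q - 1):                  # s = 2, ..., Q-2
            b = B.get((Q, s), 0.0)
            if b:
                for l, c in c_coefficients(s, B, memo).items():
                    coef[l] = coef.get(l, 0.0) + b * c
        memo[Q] = coef
    return memo[Q]

# With B[(6,4)] = B[(6,3)] = B[(6,2)] = B[(4,2)] = 1 one recovers the display above:
# the coefficient of c_{X,2}(r) is B_{6,2} + B_{6,4} B_{4,2} = 2.
B = {(6, 4): 1.0, (6, 3): 1.0, (6, 2): 1.0, (4, 2): 1.0}
print(c_coefficients(6, B))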
Remark 12.2.1.

c_{X,Q}(r) \le \sum_{s=2}^{Q} B_{Q,s}\, \kappa_{X,s}(r).

c_{X,Q}(r) \le \widetilde{A}_Q\, \kappa^*_{X,Q}(r), \qquad \kappa^*_{X,Q}(r) = \max_{2 \le l \le Q} \kappa^*_{X,l}(r)\, \mu^{Q-l}.
Example 12.2.2. Constants A_Q are not explicit, but tighter bounds are derived from the previous proof for small values of Q.
E\Big| \sum_{j=1}^{n} X_j \Big|^{p} \le \sum_{u=1}^{q} n^u\, \gamma_u, \quad with \quad \gamma_u = \sum_{p_1 + \cdots + p_u = p} \frac{p!}{p_1! \cdots p_u!}\, \kappa_{p_1} \cdots \kappa_{p_u}. \quad (12.11)

|E(X_1 + \cdots + X_n)^p| = \Big| \sum_{1 \le k_1, \dots, k_p \le n} E X_{k_1} \cdots X_{k_p} \Big| \le p!\, A_{p,n} \equiv p! \sum_{1 \le k_1 \le \cdots \le k_p \le n} \big| E X_{k_1} \cdots X_{k_p} \big|
A_{p,n} = \sum_{1 \le k_1, \dots, k_p \le n} \sum_{u=1}^{p} \sum_{\mu_1, \dots, \mu_u} \prod_{j=1}^{u} \kappa_{\mu_j(k)}(X)

= \sum_{u=1}^{p} \sum_{\mu_1, \dots, \mu_u} \sum_{1 \le k_1, \dots, k_p \le n} \prod_{j=1}^{u} \kappa_{\mu_j(k)}(X)

= \sum_{r=1}^{p} \sum_{p_1 + \cdots + p_r = p} \frac{p!}{p_1! \cdots p_r!} \times \prod_{u=1}^{r} \sum_{1 \le k_1, \dots, k_{p_u} \le n} \kappa_{p_u}(X_{k_1}, \dots, X_{k_{p_u}}) \quad (12.13)

|A_{p,n}| \le \sum_{u=1}^{q} n^u \sum_{p_1 + \cdots + p_u = p} \frac{p!}{p_1! \cdots p_u!} \prod_{j=1}^{u} \kappa_{p_j} \quad (12.14)
Identity (12.13) follows from a change of variable and takes into account the fact that the number of partitions of {1, …, p} into u sets with respective cardinalities p_1, …, p_u is the multinomial coefficient. For λ ∈ N one may deduce from the stationarity of X that

\sum_{1 \le k_1, \dots, k_\lambda \le n} |\kappa_\lambda(X_{k_1}, \dots, X_{k_\lambda})| \le n\, \kappa_\lambda.

Cumulants with order 1 always vanish, and non-zero terms are those with p_1, …, p_u ≥ 2, thus 2u ≤ p and u ≤ q. We thus get (12.14), hence a bound of the form
\sum_{u=1}^{q} C_{p,u}\, n^u.
Generally if p = 2q or p = 2q + 1 we obtain

A_{p,n} \le \sum_{j=1}^{q} c_{j,n}\, n^j,
with c_{j,n} a polynomial with respect to expressions C^{(q)}_{i,n} for i ≤ j. Precisely this is a linear combination of expressions \prod_{s=1}^{t} C^{(q_s)}_{i_s,n} with i_1 + \cdots + i_t = j and q_1 + \cdots + q_t = p. Hence for c_{X,p}(r) = O(r^{-q}) one deduces the Marcinkiewicz–Zygmund inequality

E(X_1 + \cdots + X_n)^p = O(n^q).
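A quick Monte Carlo sanity check of this Marcinkiewicz–Zygmund scaling in the simplest (iid) case: for p = 4, q = 2 and iid standard normals, E S_n^4 = 3n², so E S_n^4 / n² should stabilize near 3 (an illustrative sketch, not the dependent setting of the text):

import numpy as np

rng = np.random.default_rng(3)
for n in (100, 400, 1600):
    S = rng.standard_normal((4000, n)).sum(axis=1)   # 4000 replicas of S_n
    print(n, (S**4).mean() / n**2)                   # ~ 3: E S_n^4 = O(n^2), q = 2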
\mathrm{Lip}\, h \le l\, 2^{p-1} \cdot \frac{\mathrm{Lip}\, K}{h_n}, \quad if we denote:

h(t_1, \dots, t_l) = \prod_{j=1}^{l} \Big( K\Big(\frac{t_j - x}{h_n}\Big) - E K\Big(\frac{X_j - x}{h_n}\Big) \Big).

If for each n > 0 the joint density f_n(x, y) of the couple (X_0, X_n) exists and

\|f_n(\cdot, \cdot)\|_\infty \le M. \quad (12.15)
Exercise 26.

u \wedge v \le u^{\alpha} v^{1-\alpha}, \quad if\ u, v \ge 0,\ 0 \le \alpha \le 1. \quad (12.16)

(2) Check that Proposition 7.3.1 implies the existence of a stationary distribution, and that the relation f(x) = \int_{\mathbb{R}} p(x, y) f(y)\, dy with p(x, y) = g(y - r(x)) implies M = \|g\|_\infty.
Proof. First note that c_{U,2}(0) \sim h_n f(x) \int_{\mathbb{R}} K^2(s)\, ds, up to some constant; and from relation (12.16) one derives

c_{U,2}(r) \le C\Big( \frac{\theta_r}{h_n^2} \wedge h_n \Big) \le C\, h_n^{1-3\alpha}\, \theta_r^{\alpha}.

The assumption a > 3 implies that there exists some α < 1/3 such that

\sum_{r=1}^{\infty} \theta_r^{\alpha} < \infty.
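The order of the variance term can be checked by simulation in the simplest iid Gaussian case, where Var f̂(x) ≈ f(x) ∫K² / (n h_n); with a Gaussian kernel, ∫K² = 1/(2√π). An illustrative sketch with arbitrary numerical choices:

import numpy as np

rng = np.random.default_rng(4)
n, reps, x = 2000, 3000, 0.0
h = n ** (-1 / 5)                      # a standard bandwidth choice

def kde_at(data, x, h):
    u = (x - data) / h
    return (np.exp(-u**2 / 2) / np.sqrt(2 * np.pi)).mean() / h  # Gaussian kernel

est = np.array([kde_at(rng.standard_normal(n), x, h) for _ in range(reps)])
f_x = 1 / np.sqrt(2 * np.pi)            # true density at x = 0
int_K2 = 1 / (2 * np.sqrt(np.pi))       # integral of K^2 for the Gaussian kernel
print(n * h * est.var(), f_x * int_K2)  # both ~ 0.11: variance of order 1/(n h_n)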
This bound has order (nh_n)^{-p/2} for even p and (nh_n)^{-(p-1)/2} if p is odd. Consider now some even integer p > 2. Almost sure convergence of such estimates also follows from the Markov inequality and the Borel–Cantelli lemma in case

\sum_{n=1}^{\infty} \frac{1}{(nh_n)^{p/2}} < \infty,

which concludes. For instance, with h_n = n^{-1/5} the series becomes \sum_n n^{-2p/5}, convergent for each even p ≥ 4.
u + v = p, i_1 ≤ ··· ≤ i_u, j_1 ≤ ··· ≤ j_v with j_1 − i_u = r allow to bound higher order moments. Set I_1 = h(X_{i_1}), …, I_u = h(X_{i_u}), J_1 = h(X_{j_1}), … Since those functions are bounded and Lip h = Lip K/h_n, using the following inequalities yields e.g. under η−weak dependence:
Appendix A

A.1 Probability

A.1.1 Notations

For a space E, recall that a sigma-algebra E is a subset of P(E) (the set of the subsets of E) such that

• ∅ ∈ E,
• ∀A ∈ E: A^c ∈ E, where we denote by A^c = E \ A the complementary set of A,
• ∀A_n ∈ E, n = 1, 2, 3, …: \bigcup_{n\in\mathbb{N}} A_n ∈ E.

Moreover, a probability measure P is monotonically continuous: for A_i ∈ E, i = 1, 2, 3, …,

A_1 \subset A_2 \subset \cdots \;\Rightarrow\; \lim_{n\to\infty} P(A_n) = P(A), \qquad A = \bigcup_{n=1}^{\infty} A_n.
…sigma-algebra containing all the open sets; it thus also contains intersections of open sets, but also much more complicated sets.
(2) Anyway we assume that students will rectify by themselves the numerous …
…with values in [0, +∞); in this case integrals take values in the space [0, +∞].
X \ge 0 \Longrightarrow EX \ge 0. \quad (A.1)

For V ≥ 0 and A = {V ≥ u},

EV = EV 1_A + EV 1_{A^c} \ge EV 1_A \ge u\, P(A),

so that P(V ≥ u) ≤ EV/u (Markov inequality).
Proof. We begin with the case d = 1. In this case we assume that C = (a, b) is an interval; then g : (a, b) → R is differentiable except possibly on some denumerable set. Moreover at each point of C the left and right derivatives exist (at the extremities, only one of them may be defined); moreover for any x, y, z ∈ C, if x < y < z then the difference quotients increase:

\frac{g(y) - g(x)}{y - x} \le \frac{g(z) - g(y)}{z - y},

with

g'(y\pm) = \lim_{h \to 0^+} \frac{g(y \pm h) - g(y)}{\pm h};

then for each x_0 ∈ C choose any u ∈ [g'(x_0−), g'(x_0+)]; the affine function

f(x) = u(x - x_0) + g(x_0)

satisfies f ≤ g and f(x_0) = g(x_0) by convexity. Thus each convex function g is the upper bound of the affine functions f ≤ g. From linearity of integrals, f(EZ) = Ef(Z), and thus f(EZ) ≤ Eg(Z). Now the relation \sup_f f(EZ) = g(EZ) allows to conclude.
If now d ≥ 1 then, from the most elementary variant of the Hahn–Banach theorem (4), the same representation of g holds and the proof is the same; see Figure A.1.
Remark A.1.1.
• This inequality is an equality for each affine function.
• The inequality is strict if g is strictly convex and Z is not a.s. constant. The case of power functions is investigated in Lemma 7.3.1.
• Let B ⊂ A be a sub-σ-algebra of A; a conditional variant of this inequality writes (5):

E^{B} g(Z) \ge g\big( E^{B} Z \big). \quad (A.3)
\phi_X(t) = E e^{it\cdot X}, \quad \forall t \in \mathbb{R}^d.

(Dom(L_X) is the set of those z such that this expression is well defined.)

Remark A.1.2. First note that the characteristic function always exists and φ_X(t) = L_X(it). If 0 is interior to the domain of definition of L_X then this function is analytic around 0, as well as φ_X. Thus interchanging derivation and integrals is legitimate:

\frac{\partial}{\partial t_j} \phi(0) = i \cdot E X_j.

Moreover Fourier integral theory implies that inversion is possible, and thus in this case φ_X determines X's distribution.
X_1 + \cdots + X_n \sim B(n, p).
Proof.
1. Let z ≥ 0; then z = \int_0^\infty 1I_{(z \ge t)}\, dt. Set λ the Lebesgue measure on the line. Without any convergence assumption on those integrals, the Fubini–Tonelli theorem applies to the non-negative function (t, ω) ↦ 1I_{(Z(ω) \ge t)}. This allows to conclude.
2. First, for X, Y ≥ 0 the same trick as above works and

E XY = \int_0^\infty \int_0^\infty P(X \ge s,\, Y \ge t)\, ds\, dt.

For signed variables consider X^\pm, Y^\pm. The formula holds for each of them and

P(X \ge s) = P(X^+ \ge s)\ if\ s \ge 0, \qquad P(X \ge s) = 1 - P(X^- > -s)\ if\ s < 0.

Now for an arbitrary couple of real valued random variables, Cov(X, Y) writes as a linear combination of four such integrals with respective coefficients ±1.
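The first identity, EZ = ∫_0^∞ P(Z ≥ t) dt, is easy to confirm numerically; a toy check with an exponential law of mean 2 (all numerical choices illustrative):

import numpy as np

rng = np.random.default_rng(5)
Z = rng.exponential(scale=2.0, size=20000)            # EZ = 2
t = np.linspace(0, 40, 801)
tail = (Z[None, :] >= t[:, None]).mean(axis=1)        # empirical P(Z >= t)
integral = np.sum(0.5 * (tail[1:] + tail[:-1]) * np.diff(t))  # trapezoidal rule
print(integral, Z.mean())                             # both ~ 2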
A.2 Distributions
We provide a short introduction to Gaussians, with first the standard normal and then its vector extension, essential to define Gaussian processes, and finally γ−laws which permit many explicit calculations.

\varphi(x) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{x^2}{2}}
wrt Lebesgue measure on R. The norming factor \sqrt{2\pi} may be checked through the computation of a square as follows:

\Big( \int_{-\infty}^{\infty} e^{-\frac{x^2}{2}}\, dx \Big)^2 = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} e^{-\frac{x^2+y^2}{2}}\, dx\, dy = \int_0^{2\pi} d\theta \int_0^{\infty} e^{-\frac{r^2}{2}}\, r\, dr = 2\pi.

From the analyticity of φ_N over the whole complex plane C, the distribution of a Normal random variable is determined by its characteristic function.
The density and the characteristic function of such distributions are derived from linear changes in variable:

f_Y(y) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(y-m)^2}{2\sigma^2}}, \qquad \phi_Y(t) = e^{itm}\, e^{-\frac{1}{2}\sigma^2 t^2}.

Gaussian samples, Gaussian densities and a Normal repartition function are reproduced in Figures A.2, A.3, and A.4.
[Figure A.2: a Gaussian sample, plotted on the range −3 to 3.]

\gamma(t) = \gamma_2\Big( \frac{t}{\sqrt{2}} \Big),
[Figure A.3: the standard Gaussian density dnorm(x), plotted for −3 ≤ x ≤ 3.]
from independence. To prove that this characterizes Gaussians, it may be shown that the log-characteristic function is a second degree polynomial. With this formula, a simple recursion entails that there exists a con- […]

u \in \mathbb{R}^k, \qquad \Sigma = E(Y - EY)(Y - EY)',
• Density
Through a change in variables: if Σ is invertible then Y admits a density on R^k:

f_Y(y) = \frac{1}{\sqrt{(2\pi)^k \det \Sigma}}\, e^{-\frac{1}{2}(y - EY)^t \Sigma^{-1} (y - EY)} \quad (A.6)

• Characteristic function
Even for Σ non invertible we may write Y = EY + RZ. Thus for each s ∈ R^k:

\phi_Y(s) = E e^{is\cdot Y} = e^{is\cdot EY}\, E e^{is\cdot RZ} = e^{is\cdot EY}\, E e^{iZ\cdot R^t s} = e^{is\cdot EY - \frac{1}{2}(R^t s)\cdot(R^t s)},

so that

\phi_Y(s) = e^{is\cdot EY - \frac{1}{2} s^t \Sigma s}. \quad (A.7)

• Conditioning
Let (X, Y) ∼ N_{a+b}(0, Σ) be a Gaussian vector with covariance matrix written in blocks

\begin{pmatrix} I_a & C \\ C' & B \end{pmatrix}

for some symmetric positive definite matrix B (b × b) and a rectangular matrix C of order a × b. Then Z = Y − C'X is orthogonal to X; hence from Gaussianity of this vector they are independent. Thus E(Y|X) = C'X.
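A numerical sketch of this conditioning rule, with hypothetical sizes a = 2, b = 1 and a hypothetical matrix C (the claim checked is that Z = Y − C'X is uncorrelated with X):

import numpy as np

rng = np.random.default_rng(6)
a, b, n = 2, 1, 200000
C = np.array([[0.6], [0.3]])                           # hypothetical a x b matrix
Sigma = np.block([[np.eye(a), C], [C.T, np.eye(b)]])   # blocks (I_a, C; C', B) with B = I_b
XY = rng.multivariate_normal(np.zeros(a + b), Sigma, size=n)
X, Y = XY[:, :a], XY[:, a:]
Z = Y - X @ C                                          # Z = Y - C'X
print(np.corrcoef(Z[:, 0], X[:, 0])[0, 1])             # ~ 0
print(np.corrcoef(Z[:, 0], X[:, 1])[0, 1])             # ~ 0: hence E(Y|X) = C'X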
A.2.3 γ−distributions
As an example of the previous sections we introduce another important class of distributions.
Definition A.2.3. The Euler function Γ of the first kind is defined over ]0, +∞[ by the relation

\Gamma(t) = \int_0^{\infty} e^{-x} x^{t-1}\, dx.
Integration by parts together with the relation \frac{d}{dx} x^t = t x^{t-1} entails

\Gamma(t + 1) = \int_0^{\infty} \Big\{ -\frac{d}{dx} e^{-x} \Big\} x^t\, dx = \big[ (-e^{-x}) x^t \big]_0^{\infty} - t \int_0^{\infty} (-e^{-x}) x^{t-1}\, dx = t\, \Gamma(t).

Lemma A.2.2. Let t > 0; then Γ(t + 1) = tΓ(t) and Γ(k) = (k − 1)! for each k ∈ N* (with the convention 0! = 1).
Proof. The function f_{a,b} is integrable around infinity in case b > 0, and the integral converges at 0 if a > 0. Since a density integrates to 1, we compare both integrals to get:

c_{a,b}^{-1} = \int_0^{\infty} e^{-bx} x^{a-1}\, dx = b^{-a} \int_0^{\infty} e^{-y} y^{a-1}\, dy = b^{-a}\, \Gamma(a),
Lemma A.2.3. Let Z ∼ γ(a, b); then for m > 0 and ℜ(u) < b:

E Z^m = \frac{\Gamma(a + m)}{b^m\, \Gamma(a)}, \qquad L_{a,b}(u) = E e^{uZ} = \Big( \frac{b}{b - u} \Big)^{a}.

Proofs.

E Z^m = c_{a,b} \int_0^{\infty} x^m e^{-bx} x^{a-1}\, dx = \frac{c_{a,b}}{c_{a+m,b}} = \frac{\Gamma(a + m)}{b^m\, \Gamma(a)}.
This is an analytic function in case ℜ(u) < b, since the integral defining L_{a,b}(u) is absolutely convergent because of

\big| e^{(u-b)x} x^{a-1} \big| = e^{(\Re u - b)x} x^{a-1}.

Z + Z' ∼ γ(a + a', b): for ℜu < b the result follows from the uniqueness of Laplace transforms in case they are analytic on a domain with non-empty interior.

B(a, a') = \frac{\Gamma(a)\Gamma(a')}{\Gamma(a + a')}.
is well defined; indeed such integrals converge at the origin since a > 0, and at point 1 due to the fact that a' > 0. Let g be a continuous and bounded function; then for independent Z ∼ γ(a, b) and Z' ∼ γ(a', b) one derives:

E g(Z + Z') = \int_0^{\infty} \int_0^{\infty} g(z + z') f_{a,b}(z) f_{a',b}(z')\, dz\, dz'

= \int_0^{\infty} g(u)\, du \int_0^{u} f_{a,b}(z) f_{a',b}(u - z)\, dz

= c_{a,b}\, c_{a',b} \int_0^{\infty} e^{-bu} g(u)\, du \int_0^{u} z^{a-1} (u - z)^{a'-1}\, dz

= c_{a,b}\, c_{a',b}\, B(a, a') \int_0^{\infty} u^{a+a'-1} e^{-bu} g(u)\, du \quad (with\ z = ut)

= \frac{b^{a+a'}\, B(a, a')}{\Gamma(a)\Gamma(a')} \int_0^{\infty} u^{a+a'-1} e^{-bu} g(u)\, du,

so that B(a, a') = \frac{\Gamma(a)\Gamma(a')}{\Gamma(a + a')}.
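The addition rule Z + Z' ∼ γ(a + a', b) is also easy to check by simulation, comparing the empirical mean and variance of Z + Z' to (a + a')/b and (a + a')/b²; numerical values below are illustrative:

import numpy as np

rng = np.random.default_rng(7)
a, a2, b, n = 1.5, 2.5, 2.0, 100000
S = rng.gamma(shape=a, scale=1 / b, size=n) + rng.gamma(shape=a2, scale=1 / b, size=n)
print(S.mean(), (a + a2) / b)        # ~ 2.0: mean of gamma(a + a', b)
print(S.var(), (a + a2) / b**2)      # ~ 1.0: variance of gamma(a + a', b)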
T_k = N_1^2 + \cdots + N_k^2
E g(T_1) = E g(N^2)
= \int_{-\infty}^{\infty} g(x^2)\, e^{-x^2/2}\, \frac{dx}{\sqrt{2\pi}}
= 2 \int_0^{\infty} g(x^2)\, e^{-x^2/2}\, \frac{dx}{\sqrt{2\pi}}
= 2 \int_0^{\infty} g(z)\, \frac{1}{2\sqrt{z}}\, e^{-z/2}\, \frac{dz}{\sqrt{2\pi}} \quad (with\ z = x^2)
= \int_0^{\infty} g(z)\, z^{\frac{1}{2}-1}\, e^{-z/2}\, \frac{dz}{\sqrt{2\pi}}.

Thus the density function of T_1's distribution is z^{\frac{1}{2}-1} e^{-z/2}/\sqrt{2\pi} for z ≥ 0. Up to a constant this is the density f_{\frac{1}{2},\frac{1}{2}} of a law γ(1/2, 1/2). Since they are both densities it implies c_{\frac{1}{2},\frac{1}{2}} = 1/\sqrt{2\pi}, and thus Γ(1/2) = \sqrt{\pi}.

Now addition formulas allow to conclude for k > 1 that S_k ∼ γ(k, λ) and T_k ∼ χ²_k ∼ γ(k/2, 1/2). The last result follows from the previous result, Lemma A.2.3, since

E T_1^p = \frac{\Gamma(\frac{1}{2} + p)}{2^{-p}\, \Gamma(\frac{1}{2})},

but it needs some additional effort. A simpler way to proceed is to use relation A.4 and compare both sides of the expansion of E e^{itN} = e^{-t^2/2},

E e^{itN} = \sum_{m} \frac{1}{m!} (it)^m\, E N^m, \qquad e^{-t^2/2} = \sum_{p} \frac{1}{p!} \Big( \frac{-t^2}{2} \Big)^{p}.

Clearly the parity of the characteristic function implies that all odd moments vanish. Now for m = 2p, we obtain:

\frac{((-t^2)/2)^p}{p!} = E N^{2p}\, (-1)^p\, \frac{t^{2p}}{(2p)!},

hence E N^{2p} = (2p)!/(2^p\, p!).
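A direct Monte Carlo check of this even-moment formula EN^{2p} = (2p)!/(2^p p!) (so EN² = 1, EN⁴ = 3, EN⁶ = 15):

import math
import numpy as np

rng = np.random.default_rng(8)
N = rng.standard_normal(2_000_000)
for p in (1, 2, 3):
    exact = math.factorial(2 * p) / (2**p * math.factorial(p))  # (2p)!/(2^p p!)
    print(2 * p, (N ** (2 * p)).mean().round(3), exact)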
A.3 Convergence
A.3.1 Convergence in distribution
We consider a sequence of random variables Xn and a random variable X
with values in an arbitrary complete separable metric space (E, d).
This metric makes D[0, 1] complete and separable. We shall not prove this, but it is simple to prove that \lim_n \delta(g_t, g_{t + \frac{1}{n}}) = 0. Note that δ ≤ d; thus for example the function f ↦ \sup_{0\le t\le 1} f(t) is a continuous function on the space (D[0, 1], δ). Convergence in this space is addressed in [Billingsley, 1999]; it is not in the scope of those notes. A criterion for the convergence of empirical repartition functions Z_n(t) = n^{-\frac{1}{2}}(F_n(t) − t) of a stationary sequence with uniform marginal distribution (see in [Dedecker et al., 2007]) is
(7) i.e. bijective continuous functions with a continuous inverse.
258 [CHAP. A:
tinuous. Indeed, from Heine theorem, a continuous over a compact set is uni-
formly continuous.
9 From a convolution approximation with a bounded and indefinitely derivable
On the probability space ((0, 1], B((0, 1]), λ), the sequence X_n = 1I_{A_n} follows the same Bernoulli distribution b(1/2); thus it converges in distribution to X_0. Now the sequence X_n does not converge in probability since

\lambda\Big( A_n \cap \Big(0, \frac{1}{2}\Big] \Big) = \frac{1}{4} < \frac{1}{2} = \lambda(A_0).

P(|X_n| > \varepsilon) \le \frac{E|Z|^q}{n^q\, \varepsilon^q} \to_{n\to\infty} 0,

for each ε > 0 in case q ∈ (0, p[.
Lemma A.3.3 (Borel–Cantelli). If (B_n)_{n∈N} is a sequence of events such that \sum_n P(B_n) < ∞ then

P\big( \limsup_n B_n \big) = 0.

Remark A.3.5.
• If X_n → X in probability then some subsequence of X_n also converges a.s. Indeed for each k > 0, \lim_n P(kZ_n > 1) = 0 with Z_n = |X_n − X|. It is thus possible to extract a subsequence φ_k(m) such that

\sum_m P\big( k Z_{\varphi_k(m)} > 1 \big) < \infty.
Bibliography
[Bardet et al., 2006] Bardet, J. M., Doukhan, P., Lang, G., and Ragache, N. (2006). Dependent Lindeberg central limit theorem and some applications. ESAIM P&S, 12:154–171.
[Dedecker et al., 2007] Dedecker, J., Doukhan, P., Lang, G., León,
J. R., Louhichi, S., and Prieur, C. (2007). Weak dependence:
With Examples and Applications. Lecture Notes in Statistics 190,
Springer-Verlag, New York.
[Giraitis et al., 2012] Giraitis, L., Koul, H., and Surgailis, D. (2012).
Large sample inference for long memory processes. Imperial College
Press.
[Ho and Hsing, 1996] Ho, H.-C. and Hsing, T. (1996). On the asymptotic expansion of the empirical process of long memory moving averages. Ann. Stat., 24:992–1024.
[van der Vaart, 1998] van der Vaart, A. (1998). Asymptotic statistics.
Cambridge series in statistical and probabilistic mathematics.
[van der Vaart and Wellner, 1998] van der Vaart, A. and Wellner,
J. A. (1998). Weak convergence and empirical processes. Springer
Verlag series in statistics.
Index
law, exponential, 245, 246, 255
law, Gaussian, 217, 220, 245, 247, 248
law, normal, 245
law, Poisson, 152, 245
law, Rademacher, 216
law, uniform, 245, 246
least squares estimator (LSE), 37
limit superior, 261
Lindeberg, 17, 19
Lindeberg, dependent, 214
linear process, 109
long range dependence (LRD), 64, 171, 187, 196, 198
Markov chain, 133
Markov chain, stable, 133
maximum likelihood estimator (MLE), 37
mean, 110, 112, 149, 171, 182, 241
measured space, 239
Mehler formula, 96
memory model, 128, 133
model selection, 37
moment, 131, 139, 142, 143, 145, 153, 161, 201, 219
moment inequality, 24, 213
moment method, 142, 148
multispectral density, 219
non-linear AR–model, 73, 137, 153
operator, Steutel-van Harn, 149
operator, thinning, 149
periodogram, 70, 219
Poisson process, 121, 134, 152
probability space, 239
random measure, 60, 61
random variable, 241
range, 171, 182
regression, non-parametric, 44
regression, random design, 44
resampling, 74, 207
short range dependence (SRD), 65, 171, 183, 189
sigma-algebra, 239, 240
Skorohod space, 184, 257
spectral estimation, 71, 72, 219
spectral representation, 61
stationarity, second order, 54
stationarity, strict, 54–56, 123, 132, 135, 153, 156, 168, 171, 179, 184, 187
stationarity, weak, 54, 55, 61, 64, 67, 69
stochastic volatility model, 137
strong mixing, 145, 202, 207
subsampling, 74, 206, 217, 238
unbiased, 35, 36, 39, 68, 98
Volterra expansion, 123, 124, 128, 211
weak dependence, 159, 164, 168, 201, 202, 208, 217, 225
Weierstrass theorem, 26
Whittle estimator, 72, 140
Wold decomposition, 69
Yule-Walker equation, 115, 140
The main focus of this book is to present examples and tools for modeling non-linear time series. The volume is divided into 3 parts and an Appendix.

The main standard tools of probability and statistics which directly apply to the time series case are rapidly described in order to get a wide panel of modeling possibilities. For example, functional estimation and the bootstrap are rapidly recalled. Stationarity is first introduced, and this volume will keep this assumption for clarity.

We then describe some tools from Gaussian chaos. Then we give a fast tour of linear time series models. Nonlinearity then appears through polynomial or chaotic models for which explicit expansions are available. Then we turn to Markov and non-Markov non-linear models; we provide standard examples such as ARCH-type and integer valued models... and their estimation. We also develop a useful tool: Bernoulli shifts time series models.

A third part addresses more theoretical tools, with first the ergodic theorem, which is seen as the first step for statistics of time series. Then distributional range is defined to get generic tools for limit theory under Long or Short Range dependence (LRD/SRD). Examples of LRD behaviors are made explicit. More general techniques to prove (central) limit theorems are described under SRD; mixing and weak dependence are also recalled. Finally, moment techniques are described together with their relations to cumulant sums, as well as an application to kernel type estimation.

A short Appendix recalls basic facts of probability theory, and useful laws issued from the Gaussian laws are discussed.

The text is illustrated with simulations, and an index aims at helping the reader. This is an advanced master course. Expected readers are thus either mathematicians intending to enter the field of time series or statisticians wanting to get a more mathematical background.